Minimally Entangled Typical Thermal State Algorithms

We discuss a method based on sampling minimally entangled typical thermal states (METTS) that can simulate finite temperature quantum systems with a computational cost comparable to ground state DMRG. Detailed implementations of each step of the method are presented, along with efficient algorithms for working with matrix product states and matrix product operators. We furthermore explore how properties of METTS can reveal characteristic order and excitations of systems and discuss why METTS form an efficient basis for sampling. Finally, we explore the extent to which the average entanglement of a METTS ensemble is minimal.


I. INTRODUCTION
According to the elementary principles of quantum statistical mechanics, the average of an observable is found by computing While performing such a calculation directly is usually intractable for quantum many body systems, one can still approximate the expectation value of A by strategies based on sampling. For instance, many Quantum Monte Carlo methods take advantage of the Trotter decomposition of ρ to express an expectation value as a sum over virtual classical paths, which can then be numerically sampled. In such approaches, the act of randomly choosing a path corresponds to sampling both the thermal and quantum fluctations together. However, it is also possible to sample the thermal fluctuations only, randomly selecting quantum states rather than classical ones. This may be done by expanding the trace in Eq. (1) in terms of an orthonormal basis |i such that the expectation value of A takes the form In the first line we have used the cyclic property of the trace and in the second line the set of normalized states |φ(i) are defined as |φ(i) = P (i) −1/2 e −βH/2 |i together with the (unnormalized) probability distribution One may then estimate A by sampling the states |φ(i) with probability P (i)/Z and averaging the expectation values φ(i)|A|φ(i) computed at each step. Though such a procedure could be carried out using any orthonormal basis of states |i , not every choice would lead to an efficient algorithm. In particular, the computational cost of working with a quantum state goes up the more entangled it is, as measured by the von Neumann entanglement entropy across bipartitions of the system. Therefore a natural choice for the basis |i is the set of classical product states (CPS). These are states with wavefunctions of the form |i = |i 1 |i 2 |i 3 . . . |i N , where the i j label states in a local basis that may be chosen arbitrarily for each site j. As a result of their product structure, CPS have an entanglement entropy that is exactly zero. One therefore might expect that, of all ensembles of states |φ(i) , those produced from CPS have the least entanglement. Moreover, such |φ(i) have all of the properties one intuitively expects of typical states of a thermal system. At high temperature they are effectively classical, while at lower temperatures they spontaneously break symmetries of the Hamiltonian in systems that exhibit long range order. They do not exhibit unphysical non-local correlations, and remain factorized over parts of the system which do not interact. And, expectations of observables φ(i)|A|φ(i) lie very close to the thermal average. For such reasons as well as others that we will discuss below, these states have therefore been designated minimally entangled typical thermal states, or METTS. 1 Not only does each METTS provide a good characterization of the thermal ensemble, but there exists a simple algorithm, the pure state algorithm, for sampling many of them efficiently. This is acheived by a random walk through the set of METTS where the next state is constructed from a CPS obtained by measuring, or collapsing, the previous one. This collapsing step automatically ensures that the METTS are sampled with the correct distribution, and it allows one to optimize the the sampling process by an appropriate choice of basis. Moreover, collapsing each state may be seen as sampling the quantum fluctuations of the system, a fact that can be used to accelerate the calculation of certain physical observables.
In this paper, we elaborate upon the implementation of the pure state algorithm for sampling METTS and its usage in realistic simulations. First, in section II we provide a more detailed explanation of the pure state algorithm and its implementation. We begin with a discussion of various methods for carrying out the imaginary time evolution implicit in Eq. (3), as it is the most computationally intensive part of the algorithm and the step where the most significant approximations must be made. Then, we turn to the measurement of physical observables. While it is most convenient to calculate quantities like the total energy using matrix product operators (MPOs), we will show how local observables can be calculated more efficiently by taking advantage of the redundancy inherent in the matrix product state (MPS) formalism. We then conclude the section by discussing an efficient method for collapsing a pure state into a CPS. The basis into which the CPS is measured can be arbitrary, and we will demonstrate how this freedom may be exploited both to improve the sampling autocorrelation time and to help calculate challenging observables.
In section III we turn our focus to the properties of the METTS themselves. Properties of individual METTS can provide insight into the order that is present in a system as well as characteristic thermal fluctuations. On the other hand, studying the entire METTS ensemble will show us why it is such an efficient basis for sampling. Finally, we quantify the sense in which METTS are minimally entangled and conjecture that although there exist thermal decompositions with less entanglement than METTS at some temperatures, no decomposition can outperform an optimal METTS ensemble at all temperatures.
Throughout, whenever we find it helpful to show the results of an explicit calculation, we will choose as our model of interest the S = 1 bilinear-biquadratic chain with Hamiltonian In particular we will focus on two special cases of this model, namely θ = 0, the S = 1 Heisenberg antiferromagnet and θ = arctan[1/3], the AKLT model posessing an S = 1 valence bond solid ground state. 2 At T = 0, both models are in the same phase, the Haldane phase, which is characterized by, among other things, a gap to all excitations 3 and a doubly degenerate entanglement spectrum. 4

II. PRODUCING METTS WITH THE PURE STATE METHOD
The pure state method consists of the following simple algorithm for producing a series of METTS such that they are correctly distributed with probability P (i)/Z: 1. Choose a CPS |i .
We will shortly discuss how to carry out these steps in practice. However, let us first see why producing METTS in this way is guaranteed to give the correct distribution.
Consider the ensemble of all CPS |i initially distributed with probability P (i)/Z. If we randomly choose a CPS |i from this ensemble and then follow the steps above, the probability of measuring a particular CPS |j at the end of step 3 will be That the resulting ensemble of all such |j have the same distribution as the original ensemble is a consequence of the detailed balance condition Therefore the ensemble of CPS with distribution P (i)/Z is a fixed point of this process. Finally then, because each METTS |φ(i) is produced deterministically from a CPS |i in step 2, they will be generated with probability P (i)/Z as well.
In addition to generating METTS with the correct distribution, the pure state method has other important properties that make it advantageous for performing simulations. First, it may be defined in a completely general way, allowing one to choose a specific implementation based on the problem of interest. Next, it does not rely on a rejection method to perform sampling. That is, every METTS produced (in what is the costliest step of the algorithm) can be used to perform measurements. And, the algorithm is readily parallelizable since its sampling method is that of a Markovian random walk.
Most importantly, the computational cost of the pure state algorithm is only modestly greater than that of ground state DMRG. The typical bond dimension m of the matrix product state representing each METTS produced by the imaginary time evolution step ranges from m = 1 for β = 0 to m = m 0 for very large β, where m 0 is the bond dimension necessary to represent the ground state using DMRG. And, as shown below, the imaginary time evolution for a system with N sites scales as N m 3 per timestep. Therefore the cost of producing each METTS scales as N m 3 β (ground state DMRG scales as N m 3 ). The CPS collapse step is much less costly, scaling only as as N m 2 . We have therefore been able to reach significantly lower temperatures with the METTS approach than with previous DMRG-based finite temperature methods. For example, Feiguin and White were able to simulate the S = 1 Heisenberg chain down to about T = 0.05 using the ancilla method, where about m = 500 DMRG states had to be kept at the lowest temperatures when using a truncation error cutoff of 1×10 −10 . This is because their matrix product states had to encode both the system and environment spins. 5 In contrast, to reach comparable temperatures using METTS requires only about m = 60 states when using the same cutoff. This has allowed us to produce accurate results down to at least T = 0.02, as shown in Fig. 1. And, since the computational cost of METTS is comparable to that of ground state DMRG, we expect that it can treat comparably large ladders and two-dimensional systems. This promises to be of great value for studying frustrated and fermionic models beyond one dimension -especially models with non-trivial phases at finite temperature.
Having defined the pure state method in a general setting, let us look more closely at each step. We will discuss how to produce and sample METTS in detail and demonstrate that there is great flexibility within the basic algorithm that can allow us to optimize it for our system of interest.

A. Imaginary Time Evolution
Since the imaginary time evolution is both the costliest step of the algorithm and the source of the most numerical error, we will begin by discussing some of the best methods for evolving a state in imaginary time and their relative advantages and disadvantages with regards to producing METTS.
In what follows, we will find it convenient to represent wavefunctions as matrix product states (MPS) and, in certain cases, operators as matrix product operators (MPO). A brief review of this formalism may be found in the Appendix.

Trotter-Suzuki Method
For a one dimensional system, the simplest way to implement time evolution is to use a Trotter-Suzuki decomposition of the time evolution operator, evolving each bond one at a time. When using this method for our 1D benchmark calculations, we chose to do a secondorder breakup as in the time-dependent DRMG method of White and Feiguin 6 . Assuming that the Hamiltonian consists of a sum of nearest-neighbor bond hamiltonians H = n H n , decompose e −τ H as We emphasize here that the labeling of the terms in the Hamiltonian is arbitrary. However, for a 1D system with nearest-neighbor interactions it is convenient to let H n refer to the interactions on the n th bond, making the operator decomposition above perfectly suited for a single DMRG sweep. Because each factor, or 'gate', in this breakup is a local operator, it can be computed exactly such that the only error in this treatment of the time evolution operator is the finite time step Trotter error. Then, each gate may be applied directly to the MPS representing the state of the system by using DMRG to expose the two sites around the bond to be time evolved.
While the Trotter-Suzuki method is accurate and convenient, it is only directly applicable to systems with (a) FIG. 2: Diagrams illustrating (a) the structure of a swap gate tensor, which is a product of two identity tensors and exchanges the states of two lattice sites and (b) the application of swap gates to an MPS, allowing a Trotter time evolution gate to act on effective nearest-neighbor sites while actually evolving a system with longer range interactions.
nearest-neighbor interactions along the path taken by the MPS through the sites of the lattice. This makes it hard to apply to 2D systems, including those with nearestneighbor interactions. Even quasi-1D ladders and 1D chains with longer range interactions cannot be treated by the above method without some modifications.
Various methods have therefore been developed to extend Trotter time evolution to such systems, for example Feiguin and White's method based on a Runge-Kutta differential equation solver. 7 However, such methods turn out to be too inefficient for our purposes. So, we will discuss some alternative methods below that are more effective for these challenging systems.

Trotter-Suzuki with Swap Gates
The first method extends the Trotter-Suzuki method by utilizing swap gates, a tool that is familiar in quantum computing. A swap gate S(ij) exchanges the states on two identical sites i and j. For a spin or boson system, S(ij) = δ sis j δ sj s i ; the diagrammatic representation of which is shown in Fig. 2a. Swap gates can be used for fermionic systems as well. In the case of spinless fermions, for instance, S(ij) takes the same form as for bosons, but with an overall minus sign multiplying the case in which both sites are occupied. Now, a generic Trotter-Suzuki breakup of the time evolution operator for a Hamiltonian containing only one site or two site interaction terms can be written in the form were the sites acted on by a given gate K ij are not necessarily nearest-neighbors along the path taken by the MPS through the lattice. However, K ij can be written in terms of a gate that does act on neighboring sites by using the identity where the swap operators Θ can be defined as a product of nearest-neighbor swap gates as The operator K (ij) j−1 j is the exponentiation of the same two-body term in the Hamiltonian as K ij , but acts instead on the sites j − 1 and j.
To make use of the identity above for the purpose of applying the single gate K ij , one first applies the swap gates composing Θ j−1 i to the original MPS |ψ , yielding an effective MPS |ψ . Then, the gate K (ij) j−1 j may be acted on |ψ as a local operator. Finally, the swap gates composing Θ i j−1 are applied, restoring the original order of the sites. .
Finally, when applying many Trotter gates K ij to an MPS, it is useful to observe that because S(ij) 2 = 1, the ordering of the Trotter gates in the decomposition can be chosen such that many of the swap gates cancel. Whenever this is the case, the repeated pair of gates can simply be omitted from the algorithm, leading to a significant speedup.
One way to arrange the Trotter decomposition to take advantage of this cancellation is as follows: taking j > i, order the K ij first by i, from left to right and then for each i, by j from left to right. Then all of the gates are applied in a single sweep that is arranged as a nested loop, where i is iterated over in the outer loop and j in the inner loop. In practice, all of the gates can be produced before applying any of them to the MPS so that cancellations can be found and the corresponding pairs of gates omitted. Even when taking advantage of cancellations, however, this method scales as L x L 2 y m 3 for an L x × L y ladder instead of scaling proportionally to the number of sites as in the case of a chain.

Imaginary Time Evolution with MPOs
The remaining methods for time evolution we will discuss are based upon matrix product operators, or MPOs. But, before we turn to the formation of the time evolution MPO itself, let us first consider how to apply it to an MPS.
A concept that will be useful is that of the orthogonality center of an MPS. If we take the center to be at site i, then all of the MPS matrices A sj such that j < i have been made left orthogonal in the sense that and all of the matrices A sj such that j > i have been made right orthogonal such that These conditions are illustrated in Fig. 3.
When an MPS has a well defined orthogonality center, the set of A matrices to its left and right define an orthonormal basis for each half of the system. The matrix at the orthogonality center then defines the coefficients of the wavefunction in this basis, that is where the site index s i = 1, 2 . . . , d while the bond indices α j = 1, 2 . . . , m. Alternatively, when performing two-site operations one can can take the orthogonality center to be a bond that can optionally include the sites adjacent to it. When performing the usual SVD or density matrix diagonalization in DMRG, one takes the tensor ψ(α i−1 s i s i+1 α i+1 ) representing the wavefunction and splits it into matrices A si and A si+1 . These matrices give an optimal represention of the tensor ψ and because they define expansion coefficients of the wavefunction in an orthonormal basis, the truncation of the wavefunction is also optimal. If there was no orthogonality center, or an SVD was performed away from the center, such an operation would optimally represent a local tensor but not an optimal truncation of the wavefunction.
In applying an MPO W to an MPS |ψ A one destroys its orthogonality. Apart from issues of efficiency, the simplest approach to forming the product W |ψ A would be as follows. Define the matrices B as  To guarantee proper truncation in this naive approach, one would first sweep left to right, performing SVDs to make the basis to the left orthogonal, but leaving the bond dimension as mk since the basis to the right is not orthogonal. Once at the right edge, the SVDs in the reverse sweep could truncate the bond dimension to m or to some specified truncation error. However, this procedure scales as N m 3 k 3 and would be highly inefficient if k ∼ 10 − 100.
Verstraete and Cirac recommended a better approach (in a different context) that scales as N m 3 k + N m 2 k 2 based on fitting an MPS |ψ B to the product W |ψ A . 8 Form a random MPS |ψ B of bond dimension m and orthogonalize it to have any arbitrary orthogonality center. The optimal two-site wavefunction ψ(β i−1 s i s i+1 β i+1 ) at this center is then found by forming tensors L and R representing the product in the basis for the left and right halves of the system and combining them with the local W and A matrices as in Fig. 4. This ψ B minimizes || |ψ B − W |ψ A || 2 . One may then split ψ B into matrices B si and B si+1 and sweep back and forth through the system, repeating this process until it converges.
Verstraete and Cirac originally advocated a single-site version of this approach but we have found the two-site version better -for example, the value of m can be automatically set to achieve a specified truncation error much more readily, particularly if one is keeping track of con- served quantum numbers. However, even when using the two-site version, this fitting procedure may fail to converge for systems with non-nearest-neighbor interactions in a reasonable number of iterations. The reason seems to be related to the "sticking" problem of DMRG. When two distant sites are strongly entangled, the method has trouble building up this entanglement when improving the state near each site one at a time. On the other hand, the naive method above does not have such a sticking problem.
We have therefore developed an alternative approach, the zip-up algorithm, that has similar scaling to the fitting method but which does not get stuck. We begin by identifying the MPO W with an MPS |W by combining its primed and unprimed (i.e. bra and ket) site indices. The orthogonality centers of |ψ A and |W are then moved to the first site. This arrangement guarantees that the product of W and |ψ at the first site has a basis to the right that, while not orthogonal, is not drastically ill-conditioned. By this we mean that while some linear dependence of the basis can exist, no basis state is expected to have a norm that is significantly bigger than one (norms much less than one are less of a problem).
Then, starting from the first site, we form the product MPS |ψ B as follows. Define the tensor C 1 as Thinking of C 1 as a matrix C s1 (ω1α1) , where (ω 1 α 1 ) is a combined or 'fat' index, one may compute its Singular Value Decomposition (SVD) as At this step, it is crucial to perform a truncation, keeping only the m largest singular values contained in Λ 1 . This may be done by setting m explicitly or by specifying a cutoff on the truncation error. After truncating, the first matrix of |ψ B is defined as B s1 β1 = U s1β1 and is therefore left orthogonal by construction.
The remaining B matrices may now be found recursively. To go on to the next step, one computes this time thinking of the result as a matrix C (β1s2)(ω2α2) and computing its SVD to find U (β1s2)β2 . The matrix B s2 may then be set to B s2 β1β2 = U (β1s2)β2 and the process continued by defining C 3 = C t3 β2ω3α3 analagously in terms of U 2 , C 2 , A 3 and W 3 .
For clarity, the construction of the C tensors is depicted graphically in Fig. 5. Asymptotically, if one assumes that k < m, this algorithm scales as N m 3 kd.
We note that when performing the SVDs in the above procedure, it is important to use a truncation error cutoff versus a fixed m during the initial left to right sweep to compensate for the modest loss of efficiency stemming from the fact that each truncation optimizes a local tensor, not |ψ B itself. Once the computation of |ψ B reaches the right boundary, the return SVD is fully efficient since the B matrices are left orthogonal and during this return sweep m can be reduced to an optimum value. For typical 2d clusters, we find that such a procedure leads to an intermediatem that is about twice m.
Finally, it may be useful to combine the above two approaches. One could perform an initial left to right sweep using the zip-up algorithm with a fixedm m to obtain an initial guess for |ψ B and then switch to the fitting algorithm for the return sweep (and possibly additional sweeps) until convergence is reached.

Taylor Series Construction of the Time Evolution MPO
Having discussed the application of an MPO to an MPS, let us now turn to the construction of an MPO representation of the time evolution operator K τ = e −τ H . The first method we will discuss relies on expanding K τ in a Taylor series.
To construct the MPO for K τ using this approach, one begins by constructing an MPO representation of the Hamiltonian H. While it is sometimes possible to work out the MPO for H explicity, such as in the case of translationally invariant 1D systems, more complex systems can be treated by constructing MPOs for each term in the Hamiltonian and summing them together (see the Appendix for further details on adding and multiplying MPOs).
Having found an MPO representation of H, the Taylor series approach begins with breaking the timestep τ into small fractions = τ /n. One may then approximate K = e − H in a Taylor series which is truncated at a high order p such that the error is controlled by p . Finally, this series may be summed by utilizing MPO multiplication and addition algorithms. In doing so, it is important to arrange the computational steps so as not to repeat any expensive calculations such as computing powers of H. One such arrangement is given by the following algorithm.
Compute the MPO W 0 = − H. Then compute the MPOs W j recursively as where j = 1, . . . , p. The timestep operator itself is then given by K = 1 + W p . Finally, the MPO for K τ may be found by computing the n th power of K using the zip-up method for MPO multiplication described in the Appendix. In doing so, it is helpful to choose n to be a power of two so that K τ may be computed by by repeatedly squaring K . In this way fewer MPO multiplications are required.

Trotter-Suzuki Construction of the Time Evolution MPO
An alternative method for constructing the time evolution MPO K = e −τ H is to make use of the Trotter-Suzuki decomposition of Eq. (9), where K is approximated as a product of gates K ij . But instead of applying the gates directly to the MPS representing the system, one combines them together to form an MPO for K. Now, explicitly writing an MPO representation for a gate K ij can be challenging as it is not a simple product of single site operators and will not generally act on sites adjacent to each other in the MPS. However, one solution is to use the swap gate technique described above. One first works out the set of swap gates and local K (ij) j−1 j gate operators that form K and removes any redundant gates. Then the MPO for K may be computed by creating an identity MPO and applying the gates directly to it, taking care to truncate the MPO bond dimension after each gate is applied. Or, if even greater accuracy is required, one may follow the steps above to construct an MPO for K = e − H with = τ /2 n and then use the zip-up method to square it n times and obtain K.
An important advantage of this Trotter-Suzuki method over the Taylor series approach is that errors are easier to identify and control. In particular, we have found that the parameters controlling the error in the Taylor series method must be reduced whenever the number of lattice sites N is increased in order to maintain the same accuracy. This can be traced to the fact that the Hamiltonian generally possesses at least one eigenvalue that scales as N , while other eigenvalues scale as N p with p < 1.
In contrast, the Trotter-Suzuki approach to computing K does not suffer from this difficulty since one works with gates acting only on pairs of sites, and such gates may be constructed exactly. The only sources of error that remain, then, are the finite timestep Trotter error and the truncation error in the construction of the MPO.

B. Measurement of Observables
After producing a METTS, one wants to measure a number of bulk quantities, such as the energy, as well as local observables, such as the magnetization on each site. For bulk observables that consist of local terms summed over the entire lattice, it is usually convenient to use an MPO representation. We must therefore find an efficient way to compute expectation values of such MPOs.
On the other hand, it is more efficient to work with local observables explicitly, and to compute their expectation values only in terms of the orthogonality center of the MPS. And, as we will see later, this second approach is particularly important as it is the key to performing an efficient CPS collapse in the last step of the pure state method.

Computing the Expectation Value of an MPO
Operators formed by summing over local terms, most notably the Hamiltonian, can generally be written as an MPO with relatively small bond dimension k. Since we will need to compute the expectation value of these MPOs with respect to MPS that have a relatively large bond dimension m, let us see how to do this efficiently.
Consider computing the matrix element ψ A |Ŵ |ψ B whereŴ is an MPO of bond dimension k, and |ψ A and |ψ B are MPS of bond dimension m. Of course, this reduces to an expectation value if |ψ A = |ψ B .
One starts the computation at the beginning of the chain, defining L 1 as Then L 2 can be produced from L 1 by computing Here the order of operations in performing the sum is crucial to ensure good scaling -in particular one should not sum over both the α 1 and β 1 indices simultaneously. One then proceeds to define L 3 in terms of L 2 in a similar manner until the end of the chain is reached. In this way, one computes the matrix element exactly as there is no truncation step. If the steps of the algorithm are followed in the order given above and we assume that k < m, the computation scales as N m 3 kd.

Computing the Expectation Value of a Local Operator
While any operator may applied to a state by representing it as an MPO, it is often more efficient to compute the action of local operators on an MPS explicitly. The basic idea is shift the orthogonality center of the MPS (defined in section II A 3 above) such that it lies within the set of sites on which the local operator acts. This way any MPS matrix on which the operator acts trivially is excluded from the computation.
For example, consider computing of the expectation value of a single site operatorÔ i acting on site i. If we take the orthogonality center to be the site i, then because the matrices A sj for all j < i obey the left orthogonality condition Eq. (13) and the matrices A sj for all j > i obey the right orthogonality condition Eq. (14), the expectation value ofÔ i reduces to Or, if we are interested in applying the operator to the state to obtainÔ i |ψ A , this can be done by making the replacement while keeping all other A matrices fixed. Expectation values of multi-site local operators such as ψ A |Ô iÔi+1 |ψ A may be calculated in a similar way. Assuming that the orthogonality center is at site i or i + 1, one now includes both the matrices A si and A si+1 . However, in contrast to the single-site computation, it is important in such multi-site expectation values to break up the tensor contractions such that the scaling of any one operation never exceeds m 3 d.
Finally, while each local expectation value may be computed with a scaling no worse than m 3 d, one usually wants to perform such a calculation at each site in the system. Since this necessarily involved shifting of the orthogonality center, the overall scaling will then be N m 3 d 2 as an SVD must be computed at each step.

Estimation of Errors
While the methods discussed above give an efficient way to calculate observables for a single METTS, one must also consider the properties of the METTS ensemble when estimating averages and computing error bars. This is because correlations exist both between sequential METTS and between related observables within a METTS.
As we will discuss in the next section, the autocorrelation time of the METTS pure state method can be made quite short (less than 5 steps or so) by properly choosing the basis into which the CPS are collapsed. However, when estimating the error in a series of expectation values, it is still best to remove the effects of correlations by using a binning procedure as follows.
After calculating expectation values of an observable A, one collects sequential values into bins whose size is larger than the autocorrelation time. The average value within each bin is then computed. Each bin average may now be taken to be statistically independent and the error estimated as the standard error of these bin averages.
However, for quantities such as the specific heat or magnetic susceptibility, given by , computing the error as the sum of the errors in the first and second moments greatly overestimates the true error even when using binning. This can be traced to the fact that, from one step of the pure state algorithm to the next, the first and second moments of a bulk observable A are strongly correlated. That is, if A is relatively large for a particular METTS, so is A 2 .
A simple way to deal with this difficulty is therefore to use a resampling method, such as the bootstrap method but with correlations taken into account. In other words, from the entire set of A expectation values, one randomly samples a new set of the same size as the original but with replacement, such that a value may appear more than once. One then calculates A for this resample. But then, instead of repeating this procedure independently for A 2 , one resamples instead the same subset of METTS. So, if expectation value of A calculated from a particular METTS is included in the resample n times, the corresponding expectation value of A 2 is included n times as well.
The bootstrap method then consists of repeating the above process a large number of times, each time obtaining an estimate for the second cumulant A 2 − A 2 . If the process is repeated enough times, the resulting distribution of second cumulant estimates should be approximately Gaussian. Finally, the error in the second cumulant may be estimated as the standard deviation of this Gaussian distribution.

C. Collapsing METTS into Classical Product States
After generating a METTS by imaginary time evolution and calculating observables of interest, the METTS must be collapsed into a CPS so that a new METTS can be generated with the correct probability. This can be done in any basis, a fact which one can exploit to reduce the autocorrelation time of the pure state algorithm and to accelerate the measurement of certain observables.
It is also important, for reasons of efficiency, to carry out this collapse for each site one at a time. We will therefore discuss how to collapse a METTS in practice and show that the resulting algorithm scales as N m 2 d 2 .

CPS Collapse Algorithm
Since collapsing METTS is simplest for the case of a spin 1/2 model, let us begin by considering this case before discussing the general method.
Say one wishes to collapse the spin of a pure state |Ψ at site i along then axis. Define the states |+ and |− at site i such thatn · S i |± = ± 1 2 |± . Then the probability of finding the spin at site i to be in the |+ state is p + = Ψ|n · S i |Ψ + 1 2 and to be in the |− state is After determining the new state for site i using a random number generator, the actual collapse is enacted by the projection Finally, once this procedure is carried out for every site in the system, one obtains a CPS |Ψ with probability | Ψ |Ψ | 2 . Now consider the more general case of a lattice system where the local Hilbert space at each site i can be expanded in an orthonormal basis |m i with m = 1, ..., d i . We again emphasize that this basis can be chosen arbitrarily for each site i and for each step of the algorithm.
Define P i (m) = |m i m| i as the projection operator into the state |m i . Then the probability of finding site i to be in the state |m i is just p m = Ψ|P i (m)|Ψ .
Having selecting a particular state |m i according to this probability, one completes the collapse by making the replacement Before discussing the detailed implementation of this method for MPS, let us conclude with some explicit examples of projection operators. From the above discussion, we can immediately identify the projectors for the S = 1/2 basis states |+ and |− as Or, if we consider an S = 1 system and take |m i with m =1, 0, 1 to be the eigenstates ofn · S, the P i (m) may be written explicitly as

Collapsing an MPS Into a CPS
Having defined the CPS collapse algorithm in a general setting, let us see how to apply it to a state |ψ A represented as an MPS. First, assume that the orthogonality center of the MPS has been chosen to be the first site. In other words, assume that the matrices A sj such that j = 2, 3, . . . , N have been made right orthogonal as in Eq. (14).
One begins the collapse by first computing the expectation values of the projectors P 1 (m). As discussed in section II B, the existence of a well defined orthogonality center allows us to compute these expectation values in terms of the matrix A s1 alone. Explicitly, one computes Once the new state of site 1 is chosen to be |s 1 = |m 1 , say, with probability p m1 = ψ A |P 1 (m 1 )|ψ A , the state could be collapsed by naively applying the projector P 1 (m 1 ) to it. However, while the action of a generic single-site operator would be followed by an SVD to orthogonalize site 1 and turn site 2 into the center site, the fact that the CPS collapse operators are projectors allows for an even more efficient algorithm.
Because the repeated action of single-site projectors on any state produces a product state and a product state can be represented as an MPS with bond dimension 1, if the above step of acting a projector P 1 (m 1 ) on site 1 and left orthogonalizing A s1 were performed, the new bond index α 1 connecting A s1 to A s2 could be truncated such that it only takes on one value. Anticipating this result, one can therefore directly replace A s1 with the 1 × 1 matrix A s1 = s 1 |m 1 . Finally then, to ensure that P 1 (m 1 )|ψ A still describes the same state of the remaining sites not acted on by the projector, one must replace A s2 according to That is, the label m 1 plays the role of a new bond index that runs over just one value. In fact, because it plays a trivial role, this label can be dropped with the result that the new state of the system is a product of a singlesite wavefunction for site 1 and an N − 1 site MPS for Step Number the remaining sites. Having made this identification, the second site now lies on the boundary of an MPS and can be collapsed in exactly the same way as the first. Although applying a local operator typically scales as m 3 d 2 per site, the considerations above show that the CPS collapse step of the pure state algorithm can be optimized even further. The result is an algorithm that scales as m 2 d 2 where the costliest step is the formation of the new A matrix in Eqn. (30).

Choosing the CPS Basis for Spin Models
When collapsing a METTS into a CPS, one has the freedom to choose the CPS basis arbitrarily for each METTS and at each site. We would therefore like to exploit this freedom first of all to ensure ergodicity and also to minimize correlations among the METTS.
One simple strategy for improving ergodicity is to pick a random quantization axisn for each site at every step, and collapse the spins into the basis ofn · S eigenstates. Then if the Hamiltonian conserves a quantity such as the totalẑ magnetization S z tot , such random measurements will generate CPS which explore many different S z tot sectors. Moreover, random projections do not explicitly break rotational symmetry in the sense of favoring a particular axis from the outset. And, we have found that in practice random projections work quite well and give a short autocorrelation time as shown in Fig. 6.
In contrast, one could instead collapse every site along the same axis, sayẑ. This has a number of advantages, including being simpler to code and making measurements of diagonal operators easier to perform, as we will discuss below. One particularly nice feature of using aẑonly basis is that, since all of the collapsed spins lie along the same axis, the resulting CPS can be easily visualized and plotted.
For example, in Figs. 7 and 8 we show the CPS obtained after collapsing subsequent METTS in simulations of the Heisenberg and AKLT models and mark points at which the spins fail to follow the diluted Néel pattern that underlies the string order in the Haldane phase. It is interesting to see the number of defects decrease as the temperature is lowered, and to observe that for the AKLT case, they disappear entirely as T → 0.
A serious drawback to theẑ-only approach, however, is that it leads to problems with ergodicity. In practice, using theẑ basis only tends to lengthen the autocorrelation time, reducing the efficiency of the algorithm for calculating thermal averages. For example, one can clearly see strong correlations in the positions of the defects plotted in Figs 7a and 8a. And, correlations in energy measurements can persist over many steps as shown in Fig. 6.
Since we would like to retain the advantages of theẑonly basis but cannot afford to sacrifice ergodicity, one solution is to collapse METTS along theẑ axis only on even numbered steps. If we then collapse along random axes after odd steps, we can maintain a short autocorrelation time.
This compromise suggests an even better approach, however. Instead of switching to random axis CPS collapses, we can use a basis that is maximally mixed relative to the S z eigenstates. For example, in a S = 1/2 system one would perform odd step collapses along thê x axis. As S x eigenstates are in an equal superposition of S z eigenstates, the autocorrelation time is expected to be minimal since any memory of the previous CPS will be completely erased.
Likewise, for S = 1 systems, one can perform odd step collapses in the maximally mixed basis given by As shown in Fig. 6, using this basis on odd steps reduces correlation effects even more effectively than the random axis approach, for which the autocorrelation time is already quite short.

Diagonal Measurements with Product States
While the cost of computing expectation values of quantities such as the energy or the magnetization is quite low for each METTS produced, other measurements can be very costly.
For example, we may want to measure real space spin correlations C(∆) = i S i · S i+∆ . One approach would be to create an MPO that encodes the sum over all pairs of spin operators separated by ∆. Or, we can work with MPOs for the individual spin operators and perform the sum above for each METTS. In practice however, we have found that both approaches are very costly because of both the time needed to construct the MPOs and to compute the expectation values.
But, assuming we are studying a spin rotationally invariant system, only the S z i S z j correlations are needed as the others are equal by symmetry. And, since S z i S z j is diagonal in the product basis of S z eigenstates, we can take advantage of this fact to perform correlator measurements using the CPS collapsed from each METTS instead of using the METTS themselves. As long as the operator we wish to calculate is diagonal in the basis of the collapsed CPS, we will still obtain the correct thermal average. So, if one performsẑ-only collapses as discussed in the previous section, the operator content of measuring an diagonal observable such as C z (∆) = i S z i S z i+∆ can be ignored entirely. That is, if we collapse a METTS into a CPS |Ψ in theẑ basis such that |Ψ = |s 1 |s 2 |s 3 . . . |s N , then we immediately have C z (∆) = i s i s i+∆ . One can make even better use of the resources expended in producing each METTS by re-collapsing it multiple times and measuring each resulting CPS. This procedure also produces the correct thermal average for diagonal observables and is much more efficient for large β than collapsing each METTS only once. By using the pure state algorithm in this way, one explicitly samples not only the thermal fluctuations, but the quantum fluctuations as well.
Finally, since the above method relies on performing CPS collapses along theẑ axis, it is susceptible to the same ergodicity issues mentioned earlier. However, ergodicity can be restored by alternating betweenẑ axis collapses on even steps and random or maximally mixed collapses on odd steps. And, for the S = 1/2 case, since the maximally mixed basis is just thex basis, we can perform measurements of quantities like C(∆) for rotationally invariant models on every step by treating thex basis as an effectiveẑ basis.

III. PROPERTIES OF METTS
While the pure state algorithm is effective for computing thermal averages, it also generates an entire wavefunction at each step that may be thought of as a snapshot of the system. Every METTS it produces can be used to simultaneously calculate many different observables and look for characteristic fluctuations or short range order.
Studying the properties of METTS can also help us to understand why relatively few METTS are needed to accurately estimate observables. By identifying the METTS ensemble which maximizes this sampling efficiency for a simple model, we will discover that while the entanglement entropy of such an ensemble is not always optimal, it may be minimal in a more restricted sense. Throughout both chains, antiferromagnetic spin correlations are clearly visible but thermal fluctuations are also apparent. For example, the AKLT chain in Fig. 10 shows longitudinal fluctuations in the magnitude of S around site 40. In the inset, one can see that the local bond energy is also correspondingly higher at this point in the chain. This may be understood as a fluctuation away from the spin liquid ground state in which S = 0.
A second type of thermal fluctuation that is visible is a twist in the staggered magnetization, such as the one near site 30 of the Heisenberg chain of Fig. 9. Near this site, the spin length is also shortened and the local bond energy is relatively low. After the twist, a second region of correlated spins persists for a few correlation lengths until another twist occurs near site 70.
It is interesting to observe that since each METTS is a pure state, the von Neumann entanglement entropy for bipartitions of the system about each bond is well defined, and we normalize it here such that a spin 1/2 singlet has an entropy of 1. As Figs. 9 and 10 show, in the regions where the spins are more classical and the bond energy is relatively higher, the entanglement entropy is lower, signifying weaker quantum fluctuations.

B. Energy Measurements and the Efficiency of METTS
Let us now turn our attention from the properties of individual METTS to ensembles of METTS. An impor- tant concept will be that of a thermal decomposition, by which we mean a set of pure states |φ(i) and weights P (i) such that In fact, all density matrices can be decomposed in this way and the |φ(i) need not be orthogonal, though they should be normalized. From Eqs. (3) and (4), one sees that every METTS ensemble corresponds to such a decomposition. Implicit in Eq. (33) is a division of thermal effects into classical fluctuations quantified by the weights P (i), and quantum fluctuations present within the states |φ(i) , although this distinction is somewhat arbitrary as there exist many different thermal decompositions. But, because we are interested in sampling the |φ(i) rather than calculating them all, our sampling process will be especially efficient when most of the thermal effects can be captured within the states themselves, rather than in their distribution.
To make this intuition more precise, let us consider the energy fluctuations within a given thermal decomposition. Using the notation A i = φ(i)|A|φ(i) , we may write the specific heat as However, the fact that we are working with an ensemble of pure states also allows us to define a 'quantum specific heat' that quantifies how much each state |φ(i) contributes to the full specific heat on average. Now it turns out that, up to a constant factor, the difference between the quantum specific heat and the total specific heat is nothing but the variance of the expectation value of the energy within the ensemble, that is So, the smaller this difference, the fewer the number of states that must be sampled to estimate the energy to a given accuracy. Now turning to a particular choice of a thermal decomposition, let us first consider the exact energy eigenstates of the system, taking |φ(i) = | i and P (i) = e −β i . Since H is diagonal in this basis, it follows that H 2 i = H 2 i . Therefore the quantum specific heat is precisely zero and, as a result, the variance in the energy is maximal. So even if one could sample the exact energy eigenstates of the system, they would make an exceedingly poor basis for the purpose of estimating the average energy.
Next, consider a decomposition of the thermal ensemble in terms of METTS. From the calculation of both the specific heat and the quantum specific heat shown in Fig. 11 for the S = 1 Heisenberg chain, we can see that, even near the peak where the difference between C v and C (q) v is greatest, the quantum specific heat is only smaller than the total by about a factor of 0.8. This indicates that the METTS decomposition we sampled is quite close to being optimal. Physically, we may attribute this efficiency to the fact that, for a given temperature T , each METTS has a significant overlap with all of the energy eigenstates | i such that i T .
Finally then, let us see what would happen if we use a decomposition of the thermal ensemble in which this overlap with the energy eigenstates is maximal. Consider the overcomplete, unnormalized basis of states each parameterized by the set Θ = {θ i }. Then the normalized states given by form a decomposition of a thermal ensemble with uniform weights P (Θ) = const. If we now imagine sampling even just one such state, the resulting estimate of the energy would be exact, since In a similar manner, one may obtain the exact specific heat from just one of these states as well, which implies that C (q) v = C v for this decomposition. Formally, then, this particular basis is the most optimal one for sampling the average energy of the system. However, constructing even one state |Θ amounts to diagonalizing the entire Hamiltonian, which would render a sampling approach irrelevant.
In the end, then, the METTS decomposition achieves both a remarkably high sampling efficiency in practice, as quantified by C v − C q v , while also being one of the least costly decompositions to produce.

C. In What Sense are METTS Minimally
Entangled?
Consider a system consisting of just two spin 1/2 moments, A and B in a pure state |ψ . We call |ψ entangled if it cannot be factorized, that is, if it is not a CPS.
To compute the entanglement entropy S[ψ] one first forms the density matrix ρ A describing spin A alone by tracing out the states of B ρ A = Tr B |ψ ψ| . (40) One may define ρ B likewise. In terms of these reduced density matrices, S[ψ] is then At finite temperature, however, A and B are not in a pure state |ψ , but in the mixed state described by the density matrix ρ = 1 Z e −βH . While for a mixed state one can no longer define an entanglement entropy in the same manner, one could choose a particular decomposition of ρ as in Eq. (33) and compute its average entanglement entropy However, S φ [ρ] turns out to depend upon which decomposition we choose. A more meaningful measure of mixed state entanglement is the entanglement of formation E[ρ] defined as the minimum value of S φ [ρ] amongst all possible decompositions 9 While such a minimization would be difficult to compute in general, an exact expression has been found by Wootters for the case of a two qubit system. 10  Before we make this comparison, though, let us review Wootters' method for computing the entanglement of formation. First define the concurrence of a pure state |ψ of two qubits as where |ψ is the time reversal conjugate of |ψ . Concurrence is an alternative measure of entanglement, and may be used to compute the standard von Neumann entanglement entropy Eq. (41) through the relation . Now for a general mixed state of two qubits, it may be shown that there exists at least one optimal decomposition of ρ which saturates the minimum in Eq. (43) and which consists of pure states all having the same concurrence, and hence the same entanglement entropy. This minimal concurrence value is given by where the λ i are the square roots of the eigenvalues of the matrix ρ (σ y ⊗ σ y )ρ * (σ y ⊗ σ y ). It therefore follows that the entanglement of formation of a two qubit system is given by Equipped with this explicit formula for the entanglement of formation, let us now consider a specific system, namely the two site S = 1/2 Heisenberg model with Hamiltonian and take the antiferromagnetic J > 0 case. As β → ∞, the unique ground state |ψ 0 of H is the singlet formed out the the spins A and B, which has an entanglement entropy S[ψ 0 ] = 1. So, the entanglement of formation must also be E[ρ] = 1 at zero temperature. In the opposite limit as β goes to zero, E[ρ] must go to zero. As shown in Fig. 12, it actually reaches zero at a finite temperature of βJ = ln(3). Thus for any smaller value of βJ, the density matrix is separable and may be decomposed entirely in terms of CPS . 11 Now let us compute the average entanglement of a METTS decomposition of this system. To do so, we must first choose the underlying CPS |i to be used in producing our METTS. One possible choice is to take the set |i to be the complete basis ofẑ eigenstates. Because this yields a decomposition consisting of only four METTS we can calculate them exactly. The resulting average entanglement entropy of this "ZZ METTS" ensemble is plotted in Fig. 12.
Note that we could have also used the pure state algorithm to sample this decomposition by collapsing each METTS into theẑ basis on every site and then thex basis on every site on alternate steps. The METTS produced after either type of collapse can then be used to measure the entanglement since the system is rotationally invariant. Collapsing into theẑ basis alone, on the other hand, will not sample all four METTS since the Hamiltonian conserves total S z .
We may also choose CPS bases other than the S z eigenstates and thereby find METTS decompositions with even lower average entanglement. Consider the XZ METTS derived from the orthonormal CPS basis {| ↑→ , | ↑← , | ↓→ , | ↓← }, where up and down arrows denote S z eigenstates and left and right arrows denote S x eigenstates. As shown in Fig. 12, the average entanglement of these METTS are uniformly lower than the ZZ METTS.
What is more, the variance of both energy and entropy is exactly zero for the XZ METTS ensemble. This follows from the fact that any XZ METTS can be transformed into any other by global spin rotations that commute with H. Therefore we can conclude that the XZ METTS are the optimal METTS decomposition for this system, at least up to a global spin rotation. Note that one may also sample the XZ METTS with the pure state method by collapsing them into thex andẑ bases on alternating steps and on alternating sites. Regardless of the basis we choose, however, one can see from Fig. 12 that the average entanglement of each METTS decomposition goes to zero smoothly as β → 0 and always exceeds E[ρ]. Therefore, because even the optimal METTS decomposition is greater than E[ρ] for any finite temperature, we can conclude that METTS are not minimally entangled.
However, we may ask if the optimal METTS decomposition is instead minimally entangled in a more limited sense. For this purpose it will be helpful to look at a special class of thermal density matrix decompositions. Beginning from a decomposition at some particular temperature β 0 one can generate a decomposition for ρ at a different temperature β by making use of the relation where the pure states defining the new decomposition are given by We may consider this new decomposition the extension of the decomposition of Eq. (49), or an extended decomposition for short. Note that the entanglement of an extended decomposition is always an analytic function of β. For our two qubit system then, this implies that no extended decomposition can have an entanglement equal to E[ρ] at all temperatures. In fact, we would like to conjucture that no extended decomposition can even have an entanglement below the optimal METTS decomposition at all temperatures. To see why this may be the case, let us attempt to construct a counterexample for the two qubit system. At any temperature β 0 , we can construct an optimal decomposition, following Wootters, such that the entropy of each state |ψ i is equal to the entanglement of formation. Then, because the extensions of these decompositions will have the least possible entanglement at β 0 , we expect that they will also have low entanglement at other temperatures. Now, there can be multiple optimal decompositions for each starting temperature β 0 . However, we have found that there is always one whose extension is less entangled than the others and has zero entropy variance for all temperatures, not just β 0 . As shown in Fig. 13, while such optimal extensions have less entanglement than the optimal METTS at low temperatures, they eventually exceed the optimal METTS as the temperature is increased. Moreover, the less entangled these extensions become at high temperature, the more they resemble the optimal METTS ensemble itself.
To check that this behavior is general, we can also study the two qubit system in a field with Hamiltonian Here we focus on the case J > 0 and h ≥ 0. For small fields, the temperature dependence of the entanglement of formation E[ρ] remains similar to the zero field case. Then at h = J, the system undergoes a first-order quantum phase transition such that the ground state is no longer a maximally entangled singlet, but the fully polarized CPS with zero entanglement. And once h is greater than J, the entanglement of formation no longer behaves as a monotonic function of temperature. Regardless of the value of h, though, E[ρ] is strictly zero for βJ ≤ ln (3) indicating that ρ becomes separable above this temperature.
In Fig. 14 we have taken a particular value of the field h > J and plotted the entropies of various optimal extensions. As before, each extension either exceeds the optimal METTS entanglement over a significant temperature range or else approaches the optimal METTS decomposition ever more closely as this range is reduced. Also, it is interesting to note that in the presence of a field it is the XY METTS (generated from S x and S y eigenstates) which now form the optimal METTS decomposition, at least up to a rotation about theẑ axis. This is because the XZ METTS are no longer required to be physically identical by the symmetries of the Hamiltonian and will exhibit a finite entropy variance.
Going forward, it will be interesting to explore the extent to which the observations made here hold for larger systems and more complex models. It would be especially useful to know if an optimal METTS decomposition with ideal sampling efficiency always exists and whether it can be efficiently computed.
And although the average entanglement entropy of a METTS decomposition is not an entanglement measure in a strict sense (for instance, it is not zero when ρ is separable), 12 it is conceivable that its scaling behavior could reveal non-trivial information about a system. Whether or not one must find an optimal METTS decomposition to access such information remains to be seen. But, because the METTS pure state algorithm is efficient enough to simulate moderately large two-dimensional systems, the entanglement properties of METTS could turn out to be a useful tool in characterizing exotic phases or critical points.

Appendix: Working with Matrix Product States and Matrix Product Operators
A matrix product state (MPS) is a scheme for representing a wavefunction, either exactly or approximately, by writing its amplitudes in some particular basis as a product of matrices. A general MPS may be written as where the site labels s i take on values s i = 1, 2, . . . , d labeling the local basis.
For an adjacent pair of matrices A si µν A si+1 νλ , if the index ν summed between the matrices takes on values 1, 2, . . . , m one says that the bond between matrices i and i + 1 has bond dimension m. Though m can vary from bond to bond, in the following discussion we assume that it has a fixed value for a given MPS. And, while in the presence of periodic boundary conditions one must trace over the above product of matrices in order to reduce it to a scalar amplitude, in what follows we work exclusively with open boundary conditions such that A s1 and A s N are just 1 × m and m × 1 matrices, respectively.
A matrix product operator (MPO) is defined similarly to an MPS except that the decomposition is over a basis of local operators. A general MPO may therefore be written aŝ where for open boundary conditions the dimensions of the first and last matrix are again 1 × m and m × 1.

Creating MPOs
The simplest MPOs to create are those describing products of local operators, that isŴ = Π jÔj . In particular, one can represent a single site operatorÔ i this way by takingÔ j = 1 for all j = i. If the operatorŴ is such a site-factorized product, its MPO representation has a simple product structure as well. Each W tisi is actually just a 1 × 1 matrix defined by W tisi = t i |Ô i |s i .
Another simple class of MPOs are those describing translationally invariant sums of operators in one dimension in which case the matrices W can be taken to be identical (except at the boundaries) and of lower triangular form. For example, the MPO of the operator S z tot = N i=1 S z i may be represented in terms of the bulk matrices W tisi = δ tisi 0 (S z i ) tisi δ tisi together with the boundary matrices W t1s1 = [0 δ t1s1 ] and W t N s N = [δ t N s N 0] T . Translationally invariant one dimensional Hamiltonians can likewise be represented by translationally invariant lower triangular MPOs of bond dimension 3 and higher. 14 However, there may not be always such a simple, explicit prescription for operators of interest, in particular Hamiltonians of two dimensional systems. For use in such cases, we present below general algorithms for the addition and multiplication of arbitrary MPOs.
Equipped with these algorithms, the MPO representation of any Hamiltonian that is a sum of local terms may be found by first obtaining the MPOs for each term as described above and then summing them together. The MPO multiplication algorithm is then useful for forming other important operators such as H 2 for specific heat measurements or e −τ H for imaginary time evolution.

Adding MPS and MPOs
First consider adding two MPS |ψ A and |ψ B with bond dimensions m A and m B . If |ψ C = |ψ A + |ψ B , then which in block form is More formally, one may write this definition of C si as where the χ indices run over values 1, 2, . . . , (m A + m B ).
After constructing the C matrices as above, the addition has been carried out exactly. However, one usually wants to work with an MPS for |ψ C that has a bond dimension comparable to m A or m B , not m A + m B . One therefore usually truncates the MPS representation for |ψ C as follows.
Treating C 1 = C s1 χ1 as a matrix C s1χ1 , compute its SVD During this step, only m C singular values are kept such that χ 1 = 1, 2, . . . , m C . Finally, replace C 1 with C s1 χ 1 = U s1χ 1 and replace C 2 with C s2 χ 1 χ2 = Λ χ 1 V † χ 1 χ1 C s2 χ1χ2 . The next bond in the truncated MPS for |ψ C is now found by treating the new C 2 as a matrix C (s2χ 1 )χ2 and computing its SVD. The matrix C 2 is then set to U 2 and C 3 is multiplied by Λ 2 V † 2 just as C 2 was in the first step. This truncation process may then be likewise repeated for every remaining C matrix.
The addition of MPOs proceeds in complete analogy to the addition of MPS. To compute the MPOẐ such thatẐ =Ŵ +X, one defines the matrices Z as where the only difference from the MPS case is that there are now d 2 matrices to be added instead of d. Thinking of the pair of indices s i and t i as a fat index (s i t i ), one may likewise truncate the matrices Z by performing sequential SVDs as above.
The MPS addition algorithm above has a computational cost of N m 3 d 2 for m A = m B = m C = m. Similarly, the addition of MPOs scales as N k 3 d 4 where k is the MPO bond dimension.

Multiplying MPOs
Consider the task of multiplying two MPOsŴ andX with the goal of producing a product MPOẐ =ŴX. Let these MPOs be written as products of W , X and Z matrices with bond dimension k.
This multiplication may be carried out one site at a time in a process analogous to the zip-up algorithm discussed above. Therefore, the first step is to map each MPO to an MPS by combining its t and s (or primed and unprimed) indices at each site into a fat index. These MPS should then be transformed such that their orthogonality center is at the first site, and then mapped back into MPOs. In doing so, one ensures that in the truncation step below, the basis of states surrounding the local tensor is not overly ill-conditioned. However, because this basis will not be orthonormal, it is important to use a truncation error cutoff instead of a fixed k to determine the final MPO bond dimension. Now, to carry out the multiplication beginning with the first site, define the tensor R 1 as Thinking of R 1 as a matrix R (s1t1)(ω1ξ1) , compute its SVD to obtain where the singular values are truncated such that only the k largest are kept. The first matrix ofẐ may now be defined as Z s1t1 ζ1 = U (s1t1)ζ1 . The remaining Z matrices may now be found by proceeding recursively. Define R 2 in terms of R 1 as (62) Again, R 2 may be thought of as a matrix and its SVD computed as The second matrix ofẐ may now be defined as Z s2t2 ζ1ζ2 = U (s2t2ζ1)ζ2 and the algorithm continued by defining R 3 analagously to R 2 above in terms of U 2 , R 2 , W 3 and X 3 .
If the contractions in Eq. (62) are carried out in the order indicated, this algorithm for multiplying MPOs scales as N k 4 d 3 with the most expensive step being the construction of the R tensors.