A Scalable Maximum Likelihood Method for Quantum State Tomography

The principle of maximum likelihood reconstruction has proven to yield satisfactory results in the context of quantum state tomography for many-body systems of moderate system sizes. Until recently, however, quantum state tomography has been considered to be infeasible for systems consisting of a large number of subsystems due to the exponential growth of the Hilbert space dimension with the number of constituents. Several reconstruction schemes have been proposed since then to overcome the two main obstacles in quantum many-body tomography: experiment time and postprocessing resources. Here we discuss one strategy to address these limitations for the maximum likelihood principle by adopting a particular state representation to merge a well established reconstruction algorithm maximizing the likelihood with techniques known from quantum many-body theory.


I. INTRODUCTION
The ability to prepare and manipulate quantum mechanical states is crucial for implementing quantum information processing and hence for building quantum computers [1]. Thus, it is important to verify the algorithms realized on quantum mechanical systems by conducting measurements on the system. This is achieved by means of quantum state tomography [2,3], quantum process tomography [4] and quantum detector tomography [5,6] which together are capable of characterizing the three main stages of a quantum experiment. Here we concentrate on the task of quantum state tomography for an informationally incomplete set of observables. Deducing a quantum mechanical description of the system from these measurements is by no means trivial and what is the most appropriate method to invert the experimental data is still a matter of controversy [7,8]. The strategies of quantum state tomography range from directly inverting the measurements [9] to statistical approaches known from classical probability theory such as maximum likelihood estimation [9,10] and Bayesian inference [8,11]. The latter methods where introduced into the field of quantum state tomography to circumvent the possibility of negative eigenvalues of the state estimation arising by inverting Born's rule for relative frequencies, i.e., noisy data. In principle, maximum likelihood estimation and Bayesian inference divide the approach to statistical inference into two schools. The former is the method at hand for the frequentist statisticians where probabilities are interpreted as the infinite limit of relative frequencies obtained in a measurement process. Principally, the probabilities can be determined solely by repeating the measurements on the system infinitely often. For Bayesian statisticians, probabilities are a degree of belief in a certain event. The prior probability distribution imposed by the observer is then updated exploiting additional knowledge (i.e., measurements) of the system. Both principles have their advantages and drawbacks in the field of quantum state tomography. While maximum likelihood estimation is biased towards rank deficient states, no prior knowledge is required. In contrast, in Bayesian inference it is often not clear how to choose the prior distribution, whereas the estimates will have full rank [8,11]. In this manuscript, we will focus on the frequentist approach to quantum state tomography.
Maximum likelihood estimation is currently the method of choice and has proven to yield satisfactory results for moderate system sizes [12]. However, as the system sizes realized in the laboratory steadily increase [12][13][14][15][16], new techniques for performing quantum state tomography on large system sizes are required due to the exponential growth both in the number of required measurements and in the classical post-processing that is required to connect the measured data with a description of the quantum state. Recently, several scalable schemes for verifying the state in the laboratory by a fidelity estimation or learning the state via a concrete state reconstruction have been presented [17][18][19][20][21][22][23][24][25][26]. The efficiency of the scalable schemes stems from the exploitation of additional structure of the considered states such as a low rank, a local nature of correlations or a specific form of the state. If these assumptions are true, they allow for a reduction of the number of parameters which have to be determined experimentally and, simultaneously, for a reduction of the experiment time.
To obtain an estimate of the state within these models, the scalable tomography schemes come along with efficiently implementable reconstruction algorithms to reduce the required post-processing resources. The algorithms are designed, for instance, to optimize a fit function over pure states [24], to adopt maximum likelihood estimation to permutationally invariant states, which can be represented efficiently [25], or to invert local measurements to obtain a (not necessarily positive) state estimate [26].
In this manuscript we strive to push the applicability of large scale state tomography further towards an efficient implementation of a maximum likelihood estimation scheme exploiting both the spirit of maximum likelihood reconstruction and adopting a specific representation of the states. We are considering one-dimensional quantum systems composed of N finite-dimensional d-level subsystems (e.g., qubits with d = 2). To realize the scalable tomography scheme we show how to implement a well established fixed point algorithm [7] maximizing the likelihood function by means of matrix product states and operators [27][28][29]. Note that in this manuscript we restrict to the iterative fixed point algorithm for maximizing the likelihood function. One could, however, also adopt a direct maximization of the likelihood function using convex optimization tools together with a suitable representation of the considered states (this has been demonstrated, e.g., for arXiv:1308.2395v2 [quant-ph] 28 Jan 2014 permutationally invariant systems in [25]). This manuscript is organized as follows. We start by reviewing the maximum likelihood approach to quantum state tomography in section II. Then, we introduce the fixed point algorithm and discuss a modification of the latter when a pure state estimate is desired. In section III we show that all operations necessary for executing the maximum likelihood algorithm can be implemented by means of matrix product states and operators. Here, we solely discuss the individual computational steps and postpone details to the Appendix. If a certain operator comprising the POVM elements of the measurement setting has a matrix product operator representation of low bond-dimension, we show how this can be done efficiently. In the Appendix we discuss how this operator can be recast to fit into the framework of matrix product states and operators for two specific measurement settings. In the last section of this manuscript, section IV, we present numerical results demonstrating the performance of the algorithm for numerically simulated states.

II. QUANTUM STATE TOMOGRAPHY VIA MAXIMUM LIKELIHOOD ESTIMATION
In this section we review the maximum likelihood approach to quantum state tomography and discuss a fixed point algorithm which maximizes the likelihood function [3,7]. We further motivate a modification of the algorithm when a pure state estimate is desired. The latter reformulation reduces the computational cost of the algorithm since it only requires matrix-vector multiplications instead of matrix-matrix multiplications.
Quantum state tomography is a procedure for estimating the density matrix from repeated measurements on M identically prepared quantum systems. Let {Π i }, i ∈ ∆, be the set of all POVM elements corresponding to the measurements performed on the system where ∆ is an index set associated to these measurements [30]. In the measurement process we record the number of times n i the outcomeΠ i is obtained for all i ∈ ∆, i.e., M = i∈∆ n i . Now, let D be the state space of the physical system under consideration. According to Born's rule, the conditional probability of measuring outcome i given stateˆ ∈ D is equal to The joint probability p(n|ˆ ) of registering the specific outcomes n i , i ∈ ∆, is given by The likelihood function L(ˆ ) is interpreted as the conditional probability of measuring on the stateˆ when registering outcome n, i.e., it is considered to be a function on D for a specific outcome of a measurement series. The maximum likelihood estimateˆ ML is the element of the state space D which maximizes the likelihood function. Therefore, it is considered as the most likely state given a set of measurement outcomes. Instead of maximizing the likelihood function L(ˆ ) di-rectly, one maximizes the log-likelihood function which yields the same estimateˆ ML due to the fact that the likelihood function L(ˆ ) is positive on the state space and the logarithm is strictly increasing. The resulting function reveals the property of being concave and since we are optimizing over a convex set we are left with a convex optimization problem [31]. It is well known that the stateˆ ML maximizing the log-likelihood satisfiesˆ ML =R(ˆ M L )ˆ ML where the positive operatorR(ˆ ) is given by [3] Since both the operatorR(ˆ ) andˆ are Hermitian, the extremal equationˆ ML =R(ˆ M L )ˆ ML can be transformed to yield a well-established fixed point algorithm [3] where the operationR(ˆ k )ˆ kR (ˆ k ) preserves the positivity of the current iteration, the function N : H d N ×d N → D mapping Hermitian matrices to states denotes normalization to trace one and the algorithm is usually initialized by an unbiased initial state such as the completely mixed state. Note that, although heuristically convergent, there is no analytical proof that the fixed point algorithm (4) converges to the maximum of the log-likelihood function. Diluting the operator R(ˆ ) to (1 + R (ˆ ))/(1 + ), one can show that for 1 the log-likelihood increases in each step of the fixed point algorithm at the expense of the rate of convergence [7]. Since the log-likelihood function is concave, this guarantees that the diluted fixed point algorithm converges to the maximum.
Under the assumption that the state in the laboratory is pure, the fixed point algorithm can be rewritten to optimize only over pure states. In particular, iterating will yield a pure estimate of the desired state. Note, however, that restricting the state space to rank one matrices alters the optimization problem to being non-convex. This implies that although the objective function, i.e., the log-likelihood, is still concave the feasible set, i.e., the set of rank one matrices, is non-convex and hence it is not guaranteed that the global maximum is attained by increasing the log-likelihood in every step, both in the standard formulation and the diluted version of this pure state algorithm. Nevertheless, as we will see, iterating only over pure states yields satisfactory results in practice even for noisy data. Now, of course, in a realistic setting the state prepared in the laboratory will never be pure. A pure state estimate, however, can nevertheless be used to certify whether the (possibly mixed) state implemented in the laboratory is close to this pure estimate by means of the techniques described in [24]. With this, one can construct a lower bound on the fidelity of the pure state estimate with respect to the actual state only by means of the (experimentally determined) reduced density matrices.

III. MERGING MATRIX PRODUCT OPERATORS WITH THE MAXIMUM LIKELIHOOD ALGORITHM
In this section, we introduce the family of matrix product states and operators [27][28][29] and further show that the iterative fixed point algorithm (4) can be implemented using this formalism. We only give a brief summary of the individual steps and postpone details to the Appendix.
Letˆ ∈ D be a state describing a physical system comprising N d-level subsystems. Then,ˆ can be written aŝ where {P (α k ) k } is an operator basis of C d×d for site k, α k = 1, . . . , d 2 enumerates the basis elements per site and P k [α k ] ∈ C D k ×D k+1 are complex matrices for k = 1, . . . , N with D 1 = 1 and D N +1 = 1. Every state can be brought to this form and the exponentially growing dimension of the Hilbert space (with respect to the number of subsystems N ) can be covered by exponentially growing matrix dimensions D k . However, many interesting quantum states (in particular living on one-dimensional structures) can be represented with a moderate number of parameters. Examples, amongst others, are ground states and thermal states of gapped local Hamiltonians [32][33][34][35][36][37]. These states have in common that they can be approximated very well by states with low bond-dimension D = max k D k . We call states of the form of equation (6) matrix product operators or matrix product states depending on {P (α k ) k } being a basis of C d×d or C d [28,29] (note that every matrix product state with bond-dimension D is a matrix product operator with bond-dimension at most D 2 ) and collect such states with maximum bond-dimension equal to D in the set M D .
Next, we list the operations required to run the algorithm and then proceed by pointing out the corresponding matrix product operator equivalents. In what follows, we restrict ourselves to a POVM which contains solely observables of tensor product form, i.e.,Π i =π i 1 ⊗ · · · ⊗π i N with i ∈ ∆. The fixed point algorithm is commonly initialized with the completely mixed state. Then, in the k th iteration, the algorithm: 2. constructs the operatorR(ˆ k ), 3. multiplies the current iterationˆ k from both sides witĥ R(ˆ k ), and 4. normalizes the output to obtain iteration k + 1.
Wheneverˆ k ,R(ˆ k ) and henceR(ˆ k )ˆ kR (ˆ k ) may be written as matrix product operators of small bond-dimension, all these steps may be performed efficiently (for a detailed outline of the different operations that are required we refer to the literature [28,29] and Appendix A): We now outline how the four steps above may be performed efficiently. First, computing expectation values of product observables of the formΠ =π 1 ⊗ · · · ⊗π N is straightforward: Secondly, the operatorR(ˆ k ) has to be determined. This operator acts on the state space and hence, generally, has the same dimension asˆ k . To efficiently implement the maximum likelihood algorithm we therefore need to represent this operator as a matrix product operator with small bond-dimension. SinceR(ˆ k ) is basically a weighted sum of the POVM elements, this restricts the set of measurements for which this is possible. To give an example for which this can be achieved, consider POVM elementsΠ i k of the form where k = 1, . . . , N − R + 1 labels all blocks of R consecutive spins and i enumerates the set of local POVM elements. Key point for the efficient representation of the opera-torR(ˆ k ) is that with this specific kind of measurement set-up this operator is in fact a local Hamiltonian. But local Hamiltonians are matrix product operators of low bond-dimension and henceR(ˆ k ) can be stored efficiently in its matrix product operator representation (see Appendix B for further details). Note again that every measurement setting where the operator R(ˆ k ) has an efficient matrix product operator representation with small bond-dimension is suitable for the efficient implementation of the maximum likelihood algorithm and the above form is merely an example. One may also add global observables to the POVM set, a scenario which we consider for the reconstruction of a GHZ-type state below, with a discussion of the construction of the operatorR(ˆ k ) in Appendix C. Thirdly, to compute the next iteration of the algorithm (4) we need to multiply the current iterationˆ k with the operator R(ˆ k ) from both sides. Both objects are stored as matrix product operators. Multiplying two matrix product operators with bond-dimensions D 1 and D 2 results in a matrix product operator of dimension D 1 · D 2 , i.e., bond-dimensions multiply. To keep the bond-dimension at a certain level (e.g., D), we compress the current iterationR(ˆ k )ˆ kR (ˆ k ) that has been obtained after multiplication of the previous estimateˆ k with the operatorR(ˆ k ), i.e., we solve [28] where · denotes the Hilbert-Schmidt norm and M D comprises all matrix product operators with bond-dimension D, see Appendix A for a detailed discussion. Obviously, solving equation (9) is a crucial point in the algorithm. If the operator is not compressible to a low bond-dimension, the algorithm will not be efficient. To recognize this issue in the execution of the algorithm, we choose a method which allows us to track the error made in the compression, i.e., the algorithm aborts when the minimum in equation (9) is not zero (or greater than a prescribed tolerance). Hence, if the norm difference does not converge to zero, we either choose a larger bond-dimension or abort the maximum likelihood algorithm (i.e., throw a flag and abort the tomography scheme). Fourthly, in the final step of one iteration we have to normalize the current estimate to trace one. Again, this can be done efficiently since the trace of a matrix product operator can be computed directly using equation (7). Normalization is then equivalent to dividing each matrix by a fraction of the initial trace.

IV. NUMERICAL SIMULATIONS
In this section we present results of our implementation of the maximum likelihood algorithm iterating over matrix product states and operators [38]. The results suggest that applying maximum likelihood reconstruction to system sizes where the conventional implementations of full tomography fail due to (i) the number of required measurement settings and (ii) the limited post-processing resources is still manageable for appropriate POVM settings.
We are considering a one-dimensional chain of quantum systems implemented on N d-level systems with d = 2, i.e., qubits. We let the POVM elements be of the local form with k = 1, . . . , N − R + 1 labelling the block of size R, i = 1, . . . , 3 R enumerating the basis rotation (all combinations of orientations along the X, Y and Z direction) and j = 1, . . . , 2 R denoting the corresponding projectors per basis orientation on the eigenbasis of the respective Pauli spin operators. This corresponds to measurements on blocks of R contiguous spins. For fixed R, the experimental effort, and hence the associated experiment time, grows linearly in the number of subsystems N reducing the exponential scaling of the total number of measurements from M = 3 N m to a linearly scaling M = 3 R m(N − R + 1). Here, m denotes the number of measurements per basis orientation. We verify the performance of the maximum likelihood algorithm optimizing over matrix product operators (and matrix product states) for thermal states and ground states of nextneighbour Hamiltonians and for GHZ-type states. For the latter, we need to enlarge the above POVM, resulting in a set of measurements determining the GHZ-type state uniquely and a total number of measurements M = 3 R m(N −R +1)+Km, where K is the cardinality of the set of additional (global) measurements.
We simulate the measurements by drawing m times from the exact multinomial distributions corresponding to the 1 2 ..,k+R−1 Figure 1: The considered measurements on blocks of R consecutive spins. Forπ i,j k,...,k+R−1 see equation (11). Here, k = 1, . . . , N − R + 1 labels the block of size R, i = 1, . . . , 3 R enumerates the basis rotation (all combinations of orientations along the X, Y and Z direction) and j = 1, . . . , 2 R denotes the corresponding projectors per basis orientation on the eigenbasis of the respective Pauli spin operators.
POVM elements specifying each basis orientation. Hence, the input to the reconstruction scheme are the relative frequencies f i,j k = n i,j k /m for all i, j and k approximating the exact probability distributions. The reconstructed statesˆ rec are compared to the exact states by means of the renormalized Hilbert- The first example outlines the performance of the algorithm (4) iterating over matrix product operators for thermal states of random next-neighbour Hamiltonians, where the Hermitian matricesr i,i+1 act on sites i and i + 1.
The real and imaginary parts of these matrices are drawn from a Gaussian distribution with zero mean and standard deviation one. Thermal statesˆ = e −βĤ /Z are obtained by an imaginary time evolution using the time evolving block-decimation algorithm [39,40]. From these states, we compute the exact probability distributions corresponding to all contiguous blocks of size R for local tomographically complete measurements, simulate the measurements and reconstruct a state estimateˆ rec . Results are shown in figure 2. The plot indicates that, as expected, (i) the accuracy of the reconstruction scheme increases with an increasing number of measurements and (ii) extending the range R on which measurements are performed decreases the error of the reconstructed estimate. Notably, the mean error for the reconstruction of thermal states corresponding to 45 different random Hamiltonians of the form of equation (12) with R = 3 and perfect measurements, i.e., m = ∞, is of order 10 −3 .
Ground states of Hamiltonians of the form of equation (12) serve as our second example. We obtain the ground states by minimizing the expectation value of the Hamiltonian with respect to a matrix product state with given bonddimension [41]. Sweeping through the chain and optimizing the matrices in the matrix product state site by site such that in each iteration the expectation value with respect to the Hamiltonian decreases, we will end up in a stationary point which serves as an approximation to the exact ground state [28,42]. Again, we compute the exact expectation values corresponding to local measurements on all blocks of R sites, simulate the measurements by drawing m times from the exact multinomial distributions and reconstruct the state using the maximum likelihood algorithm. Here, we are optimizing only over pure states using the iterative algorithm in equation (5). Figure 3 presents the results. Even for very small m we obtain a mean fidelity larger than 0.80 for N = 20 qubits. Moreover, for the exact probabilities (m = ∞) the mean fidelity is close to 1.00 after 5000 iterations indicating that iterating only over pure states does not get stuck in local minima. Note that the iterative algorithm is known to converge very slowly in comparison to other maximization techniques for the likelihood function [25]. Finally, let us consider GHZ-type states of the form where the number of sites N is even and φ ∈ [0; 2π) is a relative phase. These states are not uniquely determined by local measurements. The relative phase φ is a global feature and hence, besides local measurements, we need to perform some global measurements to fully determine the state. This could be done by additionally measuring the operatorsX ⊗N andŶ ⊗X ⊗N −1 since the expectation values f (|ψ, fix the local phase. Note that these expectation values can be determined experimentally for large system sizes because only a simultaneous rotation of all spins is required plus a single site addressing to rotate one spin (e.g., the spin on the very left-hand side) along an orthogonal direction. We define the POVM {Π i }, i = 1, . . . , 4, withΠ 1 = (1 +X ⊗N )/4, Π 2 = (1 −X ⊗N )/4,Π 3 = (1 +Ŷ ⊗X ⊗N −1 )/4 and Π 4 = (1 −Ŷ ⊗X ⊗N −1 )/4. These operators are positive and sum to one. Incorporating these operators into the POVM describing the local measurements can be done by a straightforward normalization of the operators. Let us denote the index set of the full POVM set by ∆ = ∆ 1 ∪ ∆ 2 where ∆ 1 comprises the elements corresponding to local measurements and ∆ 2 comprises the four elements defined above. To apply the maximum likelihood algorithm for matrix product operators to this measurement scheme, it remains to show that the operatorR(ˆ ), see equation (3), can be written as a matrix product operator with small bond-dimension D. In fact, we findR whereR k (ˆ ) = 1 M i∈∆ k ni piΠ i for k = 1, 2. As before, R 1 (ˆ ) corresponds to a matrix product operator with small bond-dimension D 1 due to its local character. Moreover, R 2 (ˆ ) can be written as a matrix product operator with bonddimension D 2 = 2 such thatR(ˆ ) is a matrix product operator with bond-dimension D = D 1 + 2 (see Appendix C for further details). We simulate measurements on the exact GHZ-type state (which is a matrix product state with bonddimension D GHZ = 2) for R = 2 and use the algorithm iterating over matrix product operators to obtain a state estimate. Results are shown in figure 4. This example illustrates that this method is also applicable to states that are not uniquely determined by local measurements only. In general, however, the experimentalist has to have some prior knowledge about the state he intends to implement on a physical system to decide whether additional measurements are necessary. The results suggest that even for a very small number of measurements where each considered basis rotation is measured m = 100 times, the algorithm is capable of reconstructing GHZ-type states very accurately.

V. CONCLUSIONS
We presented a scalable maximum likelihood algorithm for quantum state tomography. The only restrictions for the appli-cability is that the set of measurement operators is polynomial in the system size and that the individual POVM elements are of tensor product form, both of which are anyway generally desirable from an experimental point of view. The reconstruction technique relies on a well-established fixed point algorithm for maximizing the likelihood function. We have shown that this algorithm can be generalized to iterate over matrix product states and operators and hence yields a scalable reconstruction algorithm for quantum state tomography. Of course, for a general state where the measurements do not uniquely specify the state, the fixed point of the scalable maximum likelihood algorithm will not always come close to the true state. If the state is uniquely determined by the measured POVM elements, however, we provided numerical evidence that the algorithm chooses a state estimate which is close to the true state.
We observed that the convergence of the algorithm is very slow caused by the flat log-likelihood function when measurements are done only on a small subset of the full Hilbert space. Moreover, while our first and by no means optimal implementation is capable of dealing with measurements on blocks of size R ≥ 2 the execution of the algorithm becomes rapidly slow with increasing R. An improvement of the implementation is left for future work. For instance, one could think of directly maximizing the concave objective function (e.g., by gradient descend methods) when the states are represented as matrix product states or in any other tailored efficient form. Concretely, restricting to permutationally invariant states and directly maximizing the likelihood, [25] has shown to yield superior performance when high accuracy is required.
The numerical results, however, suggest that the generalization of the iterative algorithm to matrix product operators is able to find state estimates which are close to the exact states already for a small number of measurements, which we verified for thermal and ground states of random next-neighbour Hamiltonians and for GHZ-type states. Furthermore, restricting the algorithm to optimize only over pure states has shown to yield satisfactory results under the assumption that the desired state is pure. This is remarkable since one iterates over a non-convex set. To certify the reconstructed pure estimate one could use the techniques described in [24] to find a lower bound on the fidelity of the state in the laboratory with respect to the pure state estimate. The results presented in this manuscript add into recent research for large scale quantum tomography and push the applicability of quantum state tomography to large system sizes.
We gratefully acknowledge R. Rosenbach for the supply of results of the TEBD algorithm. Computations were performed on the bwGRiD [43]. This work was supported by the Alexander von Humboldt Foundation, the EU Integrating project SIQS, the EU STReP EQUAM and the BMBF Verbundprojekt QuOReP.
In the first part of the Appendix we review some basic properties and specific operations applied to matrix product operators (for a general overview we refer to [28,29] where the ideas discussed in the remainder of this section of the Appendix are presented in far more detail). We focus on the operations required for generalizing the maximum likelihood fixed point algorithm.
To set the notation, we recall the definition of a matrix product operator. For all sites k = 1, . . . , N and α k = 1, . . . , d 2 , an operator basis for site k, e.g., for spin-1/2 particles the orthonormal Pauli spin basis, i.e.,P is called matrix product operator if D = max k D k is low with respect to the state space dimension. Note, of course, that the notation of a low bond-dimension is a bit vague. The efficient simulation of a matrix product operator of bond-dimension D certainly depends on the computational resources available for the post-processing.

Basic Matrix Product Operator Manipulations
In the first subsection of Appendix A we present a derivation of equation (7) in the main text, yielding an efficient strategy for computing expectation values of product observables with respect to matrix product operators. Further, we show how to multiply matrix product operators and give a specific representation of the completely mixed state. LetΠ =π 1 ⊗ · · · ⊗π N be a product observable acting on N d-level subsystems. The expectation value ofΠ with respect to the matrix product operatorˆ is determined via In the derivation we used the fact that for systems i and i + 1 we havê Naively multiplying two matrix product operators with bond-dimensions D 1 and D 2 yields a matrix product operator with bond-dimension at most D 1 · D 2 . In particular, given the matrix product operatorŝ = α1,...,α N Now, for every site k = 1, . . . , N , {P (α k ) k } denotes an operator basis such that we haveP ]. Consequently, the product of two matrix product operators giveŝ Hence, the product of two matrix product operators with bond-dimensions D 1 and D 2 is itself a matrix product operator with bond-dimension at most D = D 1 · D 2 . The maximum likelihood algorithm is initialized by the completely mixed state. Choosing the orthonormal Pauli spin basis in the matrix product operator representation, this state corresponds to a matrix product operator with bond-dimension one. One possible choice of the 1 × 1 matrices defining the completely mixed state is , for α k = 2, . . . , d 2 and k = 1, . . . , N.
Therefore, initializing the matrix product operator formulation of the maximum likelihood algorithm is straightforward.

Compressing Matrix Product Operators
We have seen that the bond-dimension of the current iterationˆ k increases in each step of the algorithm due to the matrix product operator multiplication with the operatorR(ˆ k ). Hence, it is essential to introduce subroutines to keep the bonddimension of the state estimate low. This is done by compressing the matrix product state to a state with smaller bond-dimension at each iteration of the tomography algorithm [28]. In the second subsection of Appendix A we describe the compression of a matrix product operatorˆ with bond-dimension D 1 to a matrix product operatorσ with bond-dimension D 2 < D 1 . For that, we show how to solve the optimization problemσ see equation (9) in the main text. Before we describe the iterative compression scheme, let us first introduce a convenient representation of the matrix product operator. For this, note that the matrices {P k [α k ]} are not unique since inserting 1 = U U † between sites k and k + 1 will not alter the state, but the corresponding matrices {P k [α k ]} and {P k+1 [α k+1 ]}. This freedom in the choice of the matrices allows (by a successive singular value decomposition starting from the left-and right-hand side of the chain) to write the matrix product operator asˆ where the matrices on sites l = 1, . . . , k − 1 satisfy and where the matrices on sites l = k + 1, . . . , N satisfy We call matrix product operators satisfying these relations k-normal matrix product operators [28]. Further, note that for knormal matrix product operators the purity, i.e., the squared Hilbert-Schmidt norm ˆ 2 = tr[ˆ †ˆ ], can be computed by be a matrix product operator with bond-dimension D 1 (not necessarily in k-normal form) andσ be a matrix product operator with bond-dimension D 2 < D 1 satisfyingσ = argmin[ ˆ −ˆ 2 |ˆ ∈ M D2 ], (A.14) i.e.,σ is the matrix product operator with bond-dimension D 2 < D 1 closest toˆ with respect to the Hilbert-Schmidt norm (hence a compression of the latter). To determineσ, we use a procedure where the minimization is performed iteratively by sweeping through the chain several times while optimizing the compressed matrices site by site [28], i.e., we minimize  16) given in k-normal form while keeping all the other matrices fixed. Note that only the last two terms in (A.15) contribute to the minimization such that Since the stateˆ is given in k-normal form, we find Hence, the extremal point is given by Sweeping through the chain back and forth several times will lead to a convergence of this procedure. Since we are minimizing the norm distance in an iterative manner, this scheme might get stuck in local minima [28]. Therefore we monitor the norm difference by exploiting that after updating the matrices on site k we have With this, we can either abort the algorithm if the norm difference does not converge to zero or increase the bond-dimension and redo the compression. To avoid the attraction of local minima, one can consider two sites instead of only one site in each step of the compression algorithm. Decomposing the optimized matrix products living on two sites by means of a singular value decomposition yields the matrices living on the single sites and hence the matrix product structure is preserved [28].
can be written as a matrix product operator of low bond-dimension. Now, let the POVM elements {Π i }, i ∈ ∆ 1 , only act non-trivially on a subset of R consecutive sites. Then, we find where we consider n operators on each site and comprise all coefficients of the operatorR(ˆ ) in c i [α 1 , . . . , α R ]. Note, that the operatorŜ i+R−1 only acts on sites i, i + 1, . . . , i + R − 1 and that, for the moment, we do not require that {Ŝ is a basis on site k = 1, . . . , N . In this measurement setting, the operatorR(ˆ ) obeys the form of a R-local Hamiltonian. It is well known that R-local Hamiltonians are matrix product operators [28]. Before we show how to convert a R-local Hamiltonian into its matrix product operator representation, let us discuss an alternative representation of matrix product operators. Letˆ be a matrix product operator of bond-dimension D. Then [28] = α1,...,α N acts only on system k and where i 1 = 1 and i N +1 = 1. We will refer to this representation of the matrix product operators as the operator-valued matrix product operator representation. Note that, given the operator-valued matrices, one can find the matrices of the matrix product operator representation by inversion, i.e., We now set out to find an operator-valued matrix product operator representation of the R-local HamiltonianR(ˆ ), see equation (B.2). For this, we rewrite equation (B.2) aŝ for all i = 1, . . . , N − R + 1. Let us illustrate the operator-valued matrices forR(ˆ ) in the case of next-neighbour interaction, i.e., we measure only on all blocks of two contiguous sites and hence R = 2. It is easily verified that the operator-valued matrices take on the form [28] (1) kQ for sites 2, . . . , N − 1 andR The rules to obtain the operator-valued matrices for R-local Hamiltonians can be generalized straightforwardly. The dimensions of the resulting operator-matrices grow according to d(2 + where d is the on-site dimension, n the number of considered operators per site (i.e., the cardinality of the set {Ŝ (α) }) and R the number of consecutive sites in one block on which measurements are performed. Equation (B.4) allows to compute the matrices of the standard matrix product operator representation resulting in a bond-dimension of at most d(2 + R−1 i=1 n i ). For the quantum state tomography of qubits (d = 2) we have n = 6 since every basis rotation (orientation along X,Y and Z) allows for two spin orientations (spin up or down in the respective basis). Consequently, for realistic quantum state tomography settings the bond-dimension of the operatorR(ˆ ) still grows rapidly when measured on large block sizes R. Heuristically, it turns out that the so constructed matrix product operator representation of the operatorR(ˆ ) is not optimal and that one can find a representation with smaller bond-dimension by compressing the corresponding matrices.
Note that with the strategy discussed above one constructs an exact matrix product operator representation of the operator R(ˆ ). Of course, to obtain a matrix product operator approximation one could simply add the individual terms of the opera-torR(ˆ ) which are often of matrix product operator structure or straightforwardly converted into the latter. Here, the bonddimensions of the individual terms add to the overall bond-dimension. To keep the bond-dimension at a certain level, one could compress this operator as described in Appendix A to keep the bond-dimension fixed if one exceeds a predetermined threshold. This provides a general strategy to obtain the matrix product operator representation of the operatorR(ˆ ).

Appendix C: GHZ-type States
In this section of the Appendix we discuss GHZ-type states of the form |ψ N (φ) = [|0 ⊗N/2 |1 ⊗N/2 + e iφ |1 ⊗N/2 |0 ⊗N/2 ]/ √ 2 (C.1) in terms of matrix product states and show how to represent the operatorR(ˆ ) comprising the POVM elements efficiently as a matrix product operator. Recall that for the GHZ-type state one needs to incorporate two global observables into the set of POVM elements, see equations (14) and (15) of the main text.
The GHZ-type State as a Matrix Product State In this subsection we set out to find a matrix product state representation of the GHZ-type state. First, note that Similarly, we can find a matrix product state representation with bond-dimension D = 1 for the second term in the GHZ-type state. Further, adding two matrix product states with bond-dimension D 1 = 1 = D 2 results in a matrix product state with bond-dimension D = 2. Finally, we have for the GHZ-type state whereR 1 (ˆ ) contains the local measurements andR 2 (ˆ ) the four global POVM elements. Now one agrees thatR(ˆ ) is indeed a matrix product operator with small bond-dimension. The first term is a matrix product operator with small bond-dimension D 1 due to its locality (see Appendix B) and the global POVM elements are all matrix product operators with bond-dimensions equal to 1. Hence,R(ˆ ) is a matrix product operator with bond-dimension equal to D 1 + 4. In fact, we can even find a representation ofR 2 (ˆ ) with a smaller bond-dimension. We havê In this notation, the second and third terms can be combined to a matrix product operator with bond-dimension 1 such thatR 2 (ˆ ) has bond-dimension D 2 = 2. Consequently,R(ˆ ) is a matrix product operator with bond-dimension D 1 + 2. The matrices of the matrix product operator representation of c 1 ·X ⊗N + c 2 ·Ŷ ⊗X ⊗N −1 can be chosen as such thatR 2 (ˆ ) is a sum of two matrix product operators with bond-dimensions equal to 1.