Permutationally invariant state reconstruction

Feasible tomography schemes for large particle numbers must possess, besides an appropriate data acquisition protocol, also an efficient way to reconstruct the density operator from the observed finite data set. Since state reconstruction typically requires the solution of a non-linear large-scale optimization problem, this is a major challenge in the design of scalable tomography schemes. Here we present an efficient state reconstruction scheme for permutationally invariant quantum state tomography. It works for all common state-of-the-art reconstruction principles, including, in particular, maximum likelihood and least squares methods, which are the preferred choices in today's experiments. This high efficiency is achieved by greatly reducing the dimensionality of the problem employing a particular representation of permutationally invariant states known from spin coupling combined with convex optimization, which has clear advantages regarding speed, control and accuracy in comparison to commonly employed numerical routines. First prototype implementations easily allow reconstruction of a state of 20 qubits in a few minutes on a standard computer.


Introduction
Full information about the experimental state of a quantum system is naturally highly desirable because it enables one to determine the mean value of each observable and thus also of each other property of the quantum state. Abstractly, such a complete description is given for example by the density operator, a positive semidefinite matrix ρ with unit trace. Quantum state tomography [1] refers to the task to determine the density operator for a previously unknown quantum state by means of appropriate measurements. Via the respective outcomes more and more information about the true state generating the data is collected up to the point where this information uniquely specifies the particular state. Quantum state tomography has successfully been applied in many experiments using different physical systems, including trapped ions [2] or photons [3] as prominent instances.
Unfortunately, tomography comes with a very high price due to the exponential scaling of the number of parameters required to describe composed quantum systems. For an N qubit system the total number of parameters of the associated density operator is 4 N − 1 and any standard tomography protocol is naturally designed to determine all these variables. The most common scheme used in experiments [4] consists of locally measuring in the basis of all different Pauli operators and requires an overall amount of 3 N different measurement settings with 2 N distinct outcomes each. Other schemes, e.g., using an informationally complete measurement [5] locally would require just one setting but, nevertheless, the statistics for 4 N different outcomes. Hence the important figure of merit to compare different methods is given by the combination of settings and outcomes.
For such a scaling, the methods rapidly become intractable, already for present state-of-the-art experiments: Recording for example the data of 14 trapped ions [6], currently the record for quantum registers, would require about 150 days, although 100 measurement outcomes can be collected for a single setting in only about three seconds. In photonic experiments this scaling is even worse because count rates are typically much lower, e.g., in recent eight-photon experiments [7,8] a coincidence of single click events occurs only at the order of minutes, hence it would require about seven years to collect an adequate data set. This directly shows that more sophisticated tomography techniques are mandatory.
New tomography protocols equipped with better scaling behaviour exploit the idea that the measurement scheme is explicitly optimized only for particular kinds of states rather than for all possible ones. If the true unknown state lies within this designed target class then full information about the state can be obtained with much less effort, and if the underlying density operator is not a member then a certificate signals that tomography is impossible in this given case. Recent results along this direction include tomography schemes designed for states with a low rank [9,10,11], particularly for important low rank states like matrix product [12] or multi-scale entanglement renormalization ansatz states [13]. Other schemes include a tomography scheme based on information criteria [14] or-the topic of this manuscript-states with permutation invariance [15].
However it needs to be stressed that in any real experiment all these tomography schemes must cope with another, purely statistical challenge: Since only a finite number of measurements can be carried out in any experiment one cannot access the true probabilities predicted by quantum mechanics p k = tr(ρ true M k ), operator M k describing the measurements, but merely relative frequencies f k . Though these deviations might be small, the approximation f k ≈ p k causes severe problems in the actual state reconstruction process, i.e., the task to determine the density operator from the observed data. If one naïvely uses the frequencies according to Born's rule f k = tr(ρ lin M k ) and solves for the unknown operatorρ lin , then, apart from possible inconsistencies in the set of linear equations, the reconstructed operatorρ lin ≥ 0 is often not a valid density operator anymore because some of its eigenvalues are negative. Hence in such cases this reconstruction called linear inversion delivers an unreasonable answer. It should be kept in mind that inconsistencies can also be due to systematic errors, e.g., if the true measurements are aligned wrongly relative to the respective operator representation [16,17], but such effects are typically ignored.
Therefore, statistical state reconstruction relies on other principles than linear inversion. In general, these methods require the solution of a non-linear optimization problem, which is much harder to solve than just a system of linear equations. For large system sizes this becomes, besides the exponential scaling of the number of settings and outcomes, a second major problem, again due to the exponential scaling of the number of parameters of the density operator. In fact for the current tomography record of eight ions in a trap [2], this actual reconstruction took even longer than the experiment itself (one week versus a couple of hours). Hence, feasible quantum state tomography schemes for large systems must, in addition to an efficient measurement procedure, possess also an efficient state reconstruction algorithm, otherwise they are not scalable.
In this paper we develop a scalable reconstruction algorithm for the proposed permutationally invariant tomography scheme [15]. It works for common reconstruction principles, including, among others, maximum likelihood and least squares methods. This scheme becomes possible once more by taking advantage of the particular symmetry of this special kind of states, which provides an efficient and operational way to store, characterize and even process those states. This method enables a large dimension reduction in the underlying optimization problem such that it gets into the feasible regime. The final low dimensional optimization is performed via non-linear convex optimization which offers great advantages in contrast to commonly used numerical routines, in particular regarding numerical stability and accuracy. Already a first prototype implementation of this algorithm allows state reconstruction for 20 qubits in a few minutes on a standard desktop computer.
The outline of the manuscript is as follows: Section 2 summarizes the background on permutationally invariant tomography and on statistical state reconstruction. The key method is explained in Sec. 3 and highlighted via examples in Sec. 4, that are generated by our current implementation. Section 5 collects all the technical details: the mentioned toolbox, more notes about convex optimization, additional information about the pretest or certificate and the measurement optimization, both addressed for large qubit numbers. Finally, we conclude and provide an outlook on further directions in Sec. 6.

Permutationally invariant tomography
Permutationally invariant tomography has been introduced as a scalable reconstruction protocol for multi-qubit systems in Ref. [15]. It is designed for all states of the system that remain invariant under all possible interchanges of its different particles. Abstractly such a permutationally invariant state ρ PI of N qubits can be expressed in the form, where V (p) is the unitary operator which permutes the N different subsystems according to the particular permutation p and the summation runs over all possible elements of the permutation group S N . Many important states, like Greenberger-Horne-Zeilinger states or Dicke states fall within this special class. As shown in Ref. [15], full information of such states can be obtained by using in total N +2 N = (N 2 + 3N + 2)/2 different local binary measurement settings, while for each setting only the count rates of (N + 1) different outcomes need to be registered. This finally leads to a cubic scaling in contrast to the exponential scaling of standard tomography schemes.
The measurement strategy that attains this number runs as follows: Each setting is described by a unit vectorâ ∈ Ê 3 which defines associated eigenstates |i a of the traceless operatorâ · σ. Each party locally measures in this basis and registers the outcomes "0" or "1" respectively. The permutationally invariant part can be reconstructed from the collective outcomes, i.e., only the number of "0" or "1" results at the different parties matters but not the individual site information. The corresponding coarsegrained measurements are given by with k = 0, . . . , N, and where the summation p ′ is over all permutations that give distinct terms. In total one needs the above stated number of different settingsâ. These settings can be optimized in order to minimize the total variance which provides an advantage in contrast to random selection. In addition to this measurement strategy there is also a pretest which estimates the "closeness" of the true, unknown state with respect to all permutationally invariant states from just a few measurement results [15]. This provides a way to test in advance whether permutationally invariant tomography is a good method for the unknown state.
Restricting to the permutationally invariant part of a density operator has already been discussed in the literature; for example for spins in a Stern-Gerlach experiment [18] or in terms of the polarization density operator [19,20,21]. Here, due to the restricted class of possible measurements, only the permutationally invariant part of, in principle distinguishable, particles is accessible [20,21]. This is a strong conceptual difference to permutationally invariant tomography where one intentionally constrains itself to this invariant part. Nevertheless it should be emphasized that the employed techniques are similar.

Statistical state reconstruction
Since standard linear inversion of the observed data typically results in unreasonable estimates as explained in the introduction, one employs other principles for actual state reconstruction. In general one uses a certain fit function F (ρ) that penalizes deviations between the observed frequencies f k and the true probabilities predicted by quantum mechanics p k (ρ) = tr(ρM k ) if the state of the system is ρ. The reconstructed density operatorρ is then given by the (often unique) state that minimizes this fit function, hence the reconstructed state is precisely the one which best fits the observed data. Since the optimization explicitly restricts to physical density operators this assures validity of the final estimateρ in contrast to linear inversion. Depending on the functional form of the fit function different reconstruction principles are distinguished. A list of the most common choices is provided in Tab. 2.2.
Reconstruction principle Fit function F (ρ) Least Squares [23] Table 1. Common reconstruction principles and their corresponding fit functions F used in the optimization given by Eq. (4); see text for further details.
The presumably best-known and most often employed method is called maximum likelihood principle [22]. Given a set of measured frequencies f k the maximum likelihood stateρ ml is exactly the one with the highest probability to generate these data. Other common fit functions, usually employed in photonic state reconstruction, are different variants of least squares [4,23] that originate from the likelihood function using Gaussian approximations for the probabilities. There this is often also called maximum likelihood principle but we distinguish these, indeed different, functions here explicitly. Typically the weights in the least squares function are set to be w k = 1/f k because f k represents an estimate of the variance in a multinomial distribution, cf. the free least squares principle. However, this leads to a strong bias if the counts rates are extremal, e.g., if one of the outcomes is never observed this method naturally leads to difficulties. A method to circumvent this is given by the free least squares function [4] or using improved error analysis for rare events. Let us stress that all these principles have the property that if linear inversion delivers a valid estimateρ lin ≥ 0, it is also the estimate given by these reconstruction principles ‡.
Finally, hedged maximum likelihood [24] represents a method that circumvents low rank state estimates. Via this one obtains more reasonable error bars using parametric bootstrapping methods [25]; for other error estimates we refer to the recently introduced confidence regions for quantum states [26,27]. In principle many more fit functions are possible, like generic loss functions [28], but considering these is out of the scope of this work.

Method
From the previous it is apparent that permutationally invariant state reconstruction requires the solution of for the preferred fit function. This large-scale optimization becomes feasible along the following lines: First, one reduces the dimensionality of the underlying optimization problem because one cannot work with full density operators anymore. This requires an operational way to characterize permutationally invariant states ρ PI ≥ 0 over which the optimization is performed, and additionally demands an efficient way to compute probabilities p k (ρ PI ) which appear in the fit function. Second, one needs a method to perform the final optimization. We employ convex optimization for this task.

Reduction of the dimensionality
This reduction relies on an efficient toolbox to handle permutationally invariant states, which exploits the particular symmetry. Here we explain this method and the final structure; for more details see Sec. 5.1. These techniques are well-proven and established; we employ and adapt them here for the permutationally invariant tomography scheme such that we finally reach state reconstruction of larger qubits.
The methods of this toolbox are obtained via the concept of spin coupling that describes how individual, distinguishable spins couple to a combined system if they become indistinguishable. Since we deal with qubits we only need to focus on spin-1/2 particles. In the simplest case, two spin-1/2 particles can couple to a spin-1 system, called the triplet, if both spins are aligned symmetric, or to a spin-0 state, the singlet, if the spins are aligned anti-symmetric. Abstractly, this can be denoted as ‡ For the least squares fit functions this follows because F = 0 in this case and clearly F ≥ 0 for those functions. In the case of the likelihood it follows from positivity of the classical relative entropy between probability distributions.
If one considers now three spins, then of course all spins can point in the same direction giving a total spin-3/2 system. There is also to a spin-1/2 system possible if two particles form already a spin-0 and the remaining one stays untouched. This can be achieved however by more than one possibility, in fact by two inequivalent choices §, and is expressed by 2 This scheme can be extended to N spin-1/2 particles to obtain the following decomposition of the total Hilbert space, where the summation runs over different total spin numbers j = j min , j min + 1, . . . , N/2 starting from j min ∈ {0, 1/2} depending on whether N is even or odd. Here, H j are called the spin Hilbert spaces with dimensions dim(H j ) = 2j + 1, while K j are referred as multiplicative spaces that account for the different possibilities to obtain a spin-j state. They are generally of a much larger dimension, cf. Eq. (22). Permutationally invariant states have a simpler form on this Hilbert space decomposition, namely with density operators ρ j called spin states and according probabilities p j . Thus a permutationally invariant density operator only contains non-trivial parts on the spin Hilbert spaces while carrying no information on the multiplicative spaces. Note further that there are no coherences between different spin states. This means that any permutationally invariant state can be parsed into a block structure as schematically depicted in Fig. 1. The main diagonal is built up by unnormalized spin statesρ j = p j ρ j / dim(K j ), which appear several times, the number being equal to the dimension of the corresponding multiplicative space. This block-decomposition represents a natural way to treat permutationally invariant states and has for example been employed already in the aforementioned related works of permutationally invariant tomography [18,19,20,21] but also in other contexts [29,30]. This structure shows that if we work with permutationally invariant states we do not need to consider the full density operator but rather that it is sufficient to deal only with this ensemble of spin states. Therefore we identify from now on This provides already an efficient way to store and to visualize such states. More importantly, it also enables an operational way to characterize valid states, since any § More precisely, all states of the form |ψ = V (p) |ψ − ⊗|0 with p being any possible permutation, are states of total spin j = 1/2 and projection m = 1/2 to the collective spin operators J i = Figure 1. Block decomposition for a generic permutationally invariant state as given by Eq. (7) permutationally invariant operator ρ PI represents a true state if and only if all these spin operators ρ j are density operators and p j a probability distribution. This is in contrast to the generalized Bloch vector employed in the original proposal of permutationally invariant tomography [15] given by Eq. (52), which also provides efficient storage and processing of permutationally invariant states, but where Bloch vectors of physical states are not as straightforward to characterize. Via this identification one can demonstrate once more the origin of the cubic scaling of the permutationally invariant tomography scheme. The largest spin state is of dimension N + 1 which requires parameters on the order of N 2 for characterization. Since one has on the order of N of these states this shows results in a cubic scaling.
Fixing the ensemble of spin states as parametrization it is now required to obtain an efficient procedure to compute the probabilities p a k (ρ PI ) for the optimized measurement scheme. This is achieved as follows: First let us stress that a similar block decomposition as given by Eq. (7) holds for all permutationally invariant operators. Hence also the measurements M a k given by Eq. (2) can be cast into this form. Using the convention leads to Therefore the problem is shifted to the computation of the spin-j operators M a k,j for each settingâ. As we show in Prop. 5.2 below, using the idea that measurements can be transformed into each other by a local operation U a |i = |i a this provides the relation Here M e 3 k,j corresponds to the measurement in the standard basis (that need to be computed once) and W a j is a unitary transformation determined by the rotation U a . This connection is given by where S l,j stands for the spin operators in dimension 2j + 1. This finally provides the efficient way to compute probabilities.

Optimization
As a second step one still needs to cope with the optimization itself. Although there are different numerical routines for statistical state reconstruction like maximum likelihood [31] or least squares [4,32], we prefer non-linear convex optimization [33] to obtain the final solution. Quantum state reconstruction problems are known to be convex [32,34], but convex optimization has hardly been used for this task. However, convex optimization possesses several advantages: First of all it is a systematic approach that works for any convex fit function, including maximum likelihood and least squares.
In contrast to other algorithms such as the fixed-point algorithm proposed in Ref. [31] it gives a precise stopping condition via an appropriate error control (see, however Ref. [35]) and still exploits all the favourable, convex, structure in comparison to reparametrization ideas as in Ref. [4]. Moreover it is guaranteed to find the global optimum and the obtained accuracy is typically much higher than with other methods. Quantum state reconstruction as defined via Eq. (4) can be formulated as a convex optimization problem as follows: All fit functions listed in Tab. 2.2 are convex on the set of states. Via a linear parametrization of the density operator ρ(x) = ½/ dim(H) + x i B i , using an appropriate operator basis B i such that normalization is fulfilled directly, the required optimization problem becomes and a linear matrix inequality as constraint, i.e., precisely the structure of a non-linear convex optimization problem [33].
For permutationally invariant states one uses ρ(x) = ⊕ jρj (x) withρ j = p j ρ j by using an appropriate block-diagonal operator basis B i ; therefore we continue this discussion with the more general form. The optimization given by Eq. (13) can be performed for instance with the help of a barrier function [33] . Rather than considering the constrained problem one solves Let us stress that both least squares options can be parsed into a simpler convex problem, called semidefinite program, as for instance shown in Ref. [32,23], but that this does not work with the true maximum likelihood function to our best knowledge. the unconstrained convex task given by where the constraint is now directly included in the objective function. This so-called barrier term acts precisely as its name suggests: If one tries to leave the strictly feasible set, i.e., all parameters x that satisfy ρ(x) > 0, one always reaches a point where at least one of the eigenvalues vanishes. Since the barrier term is large within this neighbourhood, in fact singular at the crossing, it penalizes points close to the border and thus ensures that one searches for an optimum well inside the region where the constraint is satisfied. The penalty parameter t > 0 plays the role of a scaling factor. If it becomes very small the effect of the barrier term becomes negligible within the strictly feasible set and only remains at the border. Therefore a solution of Eq. (14) with a very small value of t provides an excellent approximation to the real solution.
As shown in Sec. 5.2 this statement can be made more precise by which follows from convexity and which relates the true solution x sol of the original problem given by Eq. (13) to the solution x t sol of the unconstrained problem with penalty parameter t. This condition represents the above mentioned error control and serves as a stopping condition, i.e., as a quantitative error bound for a given t. Note that for permutationally invariant tomography dim(H) is not the dimension of the true N-qubit Hilbert space but instead the dimension of ρ(x) = ⊕ jρj (x), i.e., j=j min (2j + 1) = (N + 1)(N + 2j min + 1)/4 which increases only quadratically.
Small penalty parameters are approached by an iterative process: For a given starting point x n start and a certain value of the parameter t n > 0 one solves Eq. (14). Its solution will be the starting point for x n+1 start = x n sol for the next unconstrained optimization with a lower penalty parameter t n+1 < t n . As starting point for the first iteration we employ t 0 = 1 and the point x 0 start corresponding to the totally mixed state. This procedure is repeated until one has reached small enough penalty parameters. The penalty parameter is decreased step-wise. Then each unconstrained problem can be solved very efficiently since one starts already quite close to the true solution.
Let us point out that via the above mentioned barrier method one additionally obtains solutions to the hedged state reconstruction with β = t since the unconstrained problem given by Eq. (14) is precisely the fit function of the hedged version of Tab. 2.2.
Finally for comparative purposes we also like to mention the iterative fixedpoint algorithm of Ref. [36]; for a modification see Ref. [37]. It is designed for maximum likelihood estimation and is straightforward to implement since it only requires matrix multiplication, however, it has deficits regarding control and accuracy. For permutationally invariant tomography the algorithm can be adapted as follows: Given a valid iterate ρ n PI characterized by the ensemble of spin statesρ n j = p n j ρ n j one evaluates the probabilities p a k (ρ n PI ) using Eq. (10). Next, one computes the operators which determine the next iterateρ n+1 j = R n jρ n j R n † j /N with N = j tr(R n jρ n j R n † j ). This iteration is started for example from the totally mixed state and repeated until a sufficiently good solution is obtained.

Examples
The two methods from the previous section are employed in a prototype implementation under MATLAB, which already enables state reconstruction of about 20 qubits on a standard desktop computer.
The current algorithm is tested along the following lines: For a randomly generated permutationally invariant state ρ true PI we compute the true probabilities p a k,true for the chosen measurement settings. Rather than sampling we set the observed frequencies equal to this distribution, i.e., f a k = p a k,true . In this way linear inversion would return the original state, hence also each other reconstruction principle from Tab. 2.2 has this state as solution. We now start the algorithm and compare the trace distance between the analytic solution ρ true PI and the state after n iterations ρ n PI . This distance 1 2 tr |ρ true PI − ρ n PI | quantifies the probability with which the two states, the true analytic solution and the iterate after n steps in the algorithm, could be distinguished [38].  A typical representative of this process is depicted in Fig. 2 for 12 qubits using optimized settings. The randomly generated state ρ true PI was chosen to lie at the boundary of the state space since such rank-reduced solutions better resemble the case of state reconstruction of real data. More precisely, each spin state of the true density operator is given by a pure state ρ true j = |ψ j ψ j | chosen according to the Haar measure, while the p j are selected by the symmetric Dirichlet distribution with concentration measure α = 1/2 [39]. As apparent the algorithm behaves similar for all three reconstruction principles and rapidly obtains a good solution after about 70 iterations. The steps in this plot are points where the penalty parameter is reduced by a factor of 10 starting from t = 1 and decreased down to t = 10 −10 . The slight rise after these points comes from the fact that we plot the trace distance and not the actual function (fit-function plus penalty term) that is minimized. Convex Optimization Iterative fixed-point algorithm Figure 3. Comparison of the maximum likelihood principle between convex optimization and the iterative fixed-point algorithm for the described testing procedure with respect to accuracy and algorithm time. Figure 3 shows a similar comparison for the maximum likelihood reconstruction of 20 qubits but now plotted versus algorithm time ¶. For comparison we include the performance of the iterative fixed-point algorithm, which requires much more iterations in general (3000 in this case vs. about 90 for convex optimization). Let us emphasise that a similar behaviour between these two algorithms appears also for smaller qubit numbers. As one can see, convex optimization delivers a faster and in particular more accurate solution. In contrast, the iterative fixed-point algorithm shows a bad convergence rate although it initially starts off better. This was one of the main reasons for us to switch to convex optimization.
The current algorithm times of this test are listed in Tab. 2 which are averaged over ¶ All simulations were performed on an Intel Core i5-650, 3. 50 randomly generated states. Thus already this prototype implementation enables state reconstruction of larger qubit numbers in moderate times. The small time difference between reconstruction principles is because least squares as a quadratic fit function provides some advantages in the implementation. More details about this difference are given in Sec. 5. Table 2 also contains the algorithm times for reconstructions using simulated frequencies f a k = n a k /N r . For each setting they are deduced from the count rates n a k sampled from a multinomial distribution using the true event distribution p a k,true and N r = 1000 repetitions. The true probabilities correspond to the same states as already employed in the algorithm test. From the table one observes that state reconstruction for data with count statistics requires only slightly more time than the algorithm test with the correct probabilities. We attribute this to the fact that a few more iterations are typically required in order to achieve the desired accuracy.
Finally let us perform the reconstruction of a simulated experiment of N = 14 qubits. Suppose that one intends to create a Dicke state |D k,N as given by Eq. (27), but that the preparation suffers from some imperfections such that at best one can prepare states of the form where p = 0.6 characterizes some asymmetry. As the true state prepared in the experiment we now model some further imperfection in the form of an additional small misalignment U ⊗N , with U = exp(−iθσ y /2), θ = 0.2, and some permutationally invariant noise σ PI (chosen via the aforementioned method but using Hilbert-Schmidt instead of the Haar measure), i.e., The frequencies are obtained via sampling from the state given by Eq. (18) using intentionally only N r = 200 repetitions per setting (to see some differences). Finally, we reconstruct the state according to the maximum likelihood principle. Figure 4 shows the tomography bar plots of one of these examples for the largest spin state p j ρ j , j = N/2 = 7 for both states. Though this state might be artificial this example should highlight once more that this state reconstruction algorithm works also for realistic data and for qubit sizes where clearly any non-tailored state reconstruction scheme would fail. Moreover, it demonstrates that the spin ensemble p j ρ j represents a very convenient graphical representation of such states (compared to a 2 14 × 2 14 matrix in this case).

Reduction
Let us first give more details regarding the reduction step. This starts by recalling a group theoretic result summarized in the next section, which is then used to show how the stated simplifications with respect to states and measurements are obtained.

Background
Consider the following two unitary representations defined on the N qubit Hilbert space: The permutation or symmetric group V (p) which is defined by their action onto a standard tensor product basis by V (p) |i 1 , . . . , i N = |i p −1 (1) , . . . , i p −1 (N ) according to the given permutation p, and the tensor product representation W (U) = U ⊗N of the special unitary group. A result known as the Schur-Weyl duality [40,41] states that the action of these two groups is dual, which means that the total Hilbert space can be divided into blocks on which the two representations commute. More precisely one has Here V j and W j are respective irreducible representations, and j min ∈ {0, 1/2} depending on whether N is even or odd. The dimensions of the appearing Hilbert spaces are dim(H j ) = 2j + 1 and for all j < N/2 and dim(K N/2 ) = 1. Let us note that Eq. (20) already ensures the block-diagonal structure of permutationally invariant operators, while Eq. (21) becomes important for the measurement computation. A basis of the Hilbert space H j ⊗ K j is formed by the spin states |j, m, α = |j, m ⊗ |α j with m = −j, . . . , j and α j = 1, . . . , dim(K j ). These are obtained by starting with the states having the largest spin number m = j, which are given by for all α ≥ 2. The coefficients c j,p must ensure that the states |j, j, α are orthogonal, otherwise their choice is completely free since the detailed structure of different α's is not important. The full basis is obtained by subsequently applying the ladder operator J − = N n=1 σ −;n to decrease the spin number m. Here σ −;n refers to the operator with σ − = (σ x − iσ y )/2 on the n-th qubit and identity on the rest. Thus in total the basis becomes with appropriate normalizations N . Note that the subspace corresponding to the highest spin number j = N/2 is also called the symmetric subspace, which contains many important states like Greenberger-Horne-Zeilinger or Dicke states, which using the spin states read as

Permutationally invariant states and measurement operators
Let us now employ this result in order to derive a generic form for permutationally invariant states; we give the proof for completeness.
hence it is fully characterized already by the ensemble p j ρ j . Moreover ρ PI is a density operator if and only if all ρ j are density operators and p j a probability distribution.
Proof. The proposition follows using the representation V (p) given by Eq. (20) in the definition of the states Eq. (1) and then applying Schur's lemma [40,41]. This lemma states that any linear operator A from K j to K i which commutes with all elements p of the group V i (p)A = AV j (p) must either be zero if i and j label different irreducible representations or A = c½ if they are unitarily equivalent. Since The normalization can be checked taking the trace on both sides. Adding appropriate identities provides 1 where B should now be a linear operator from H j ⊗ K j to H i ⊗ K i . Finally let P j denote the projector onto H j ⊗ K j and using Eq. (30) delivers which provides the general structure. The state characterization part follows because positivity of a block-matrix is equivalent to positivity of each block.
Next let us concentrate on the measurement part. Though the block decomposition follows already from the previous proposition, it is here more important to obtain an efficient computation of each measurement block for the selected setting.

Proposition 5.2 (Measurement operators). The POVM elements M a k as defined in
Eq. (2) for any local settingâ ∈ Ê 3 can be decomposed as M a The unitary is given by W j (U a ) = exp(−i l t l S l,j ) using the spin operators S l,j in dimension 2j + 1, while the coefficients t l are determined by U a = exp(−i l t l σ l /2) which satisfiesâ · σ = U a σ z U † a . For the measurement in the standard basisâ =ê 3 one gets if −j ≤ N/2 − k ≤ j and zero otherwise.
Proof. The basic idea is to consider the measurement in an arbitrary local basisâ by a rotation followed by the collective measurement in the standard basis. The block decomposition is obtained as follows The first step holds because U a satisfies |i a i| = U a |i i| U † a , while the block decomposition of the standard basis measurement M e 3 k is employed afterwards. In the last part one uses the tensor product representation given by Eq. (21).
Since one knows that W j is irreducible it can be uniquely written in terms of its Lie algebra representation dW j as W j (U a ) = W j (e −iX ) = e −idW j (X) , which is given by the spin operators in this case, i.e., dW j (σ l /2) = S l,j [42].
Thus it is left to compute the measurement blocks M e 3 k,j for the standard basis. Note it is sufficient to evaluate M k,j ⊗ |1 j 1 j | such that one can employ the spin basis states |j, m, 1 as introduced in Sec. 5.1.1. At first note that M e 3 k,j exactly contains k projections onto |0 , while each basis state |j, m, 1 possesses (N/2+m) zeros. Therefore one obtains M e 3 k,j |j, m, 1 ∝ δ k,N/2+m |j, m, 1 . Since each POVM has to resolve the identity this is only possible if each M e 3 k,j is the stated rank-1 projector on the basis states.
Finally one still needs to express U a = exp(−i l t l σ l /2) for the chosen settinĝ a ∈ Ê 3 . Since this can be related to a familiar rotation [42] these coefficients can be expressed as t l = (θn) l via a rotation about an angle θ around the axisn. Since this rotation should turnê 3 intoâ its parameters are given bŷ andn =ê 1 (or any other orthogonal vector) ifâ = ±ê 3 .

Convex optimization
In this part we collect some more details regarding the described convex optimization algorithm; for a complete coverage we refer to the book [33]. Each unconstrained optimization given by Eq. (14) is solved via a damped Newton algorithm. The minimization of f (x) = F [ρ(x)] − t log[det ρ(x)] is obtained by an iterative process. In order to determine a search direction at a given iterate x n one minimizes the quadratic approximation This reduces to solving a linear set of equation called the Newton equation which determines the search direction ∆x nt . The steplength s for the next iterate x n+1 = x n + s∆x nt is chosen by a backtracking line search. Here one picks the largest s = max k∈AE β k with β ∈ (0, 1) such that the iterate stays feasible ρ(x n+1 ) > 0 and that the function value decreases sufficiently, i.e., f (x n+1 ) ≤ f (x n ) + αs∇f (x n ) T ∆x nt with α ∈ (0, 0.5). The process is stopped if one has reached an appropriate solution, which can be identified by ∇f (x n ) 2 ≤ ǫ. If the initial point x start is already sufficiently close to the true solution then the whole algorithm converges quadratically, i.e., the precision gets doubled at each step. At this point let us give the gradient and Hessian of the appearing functions. For the barrier term ψ(x) = − log[det ρ(x)] restricted to the positive domain ρ(x) = ½/ dim(H) + i x i B i > 0 one gets [33] ∂ψ(x) Equation (44) shows that the Hessian of the penalty term ∇ 2 ψ(x) > 0 is positive definite, such that ψ(x) is indeed convex. The derivatives of the preferred fit function can be computed directly. For instance using the likelihood function The bottleneck of such an algorithm is the actual computation of the second derivatives. Although the expansion coefficients of each measurement tr(B j M k ) can be computed in advance, it is still necessary to compute Eq. (46) anew at each point x due to the dependence of p k (x). With respect to that the least squares fit function bears a great advantage since its Hessian is constant, i.e., ∂ j ∂ i F ls (x) = 2 k w k tr(B j M k ) tr(B i M k ), such that one saves time on this part.
At last let us comment on the optimality conditions, known as the Karush-Kuhn-Tucker conditions [33]. A given x ⋆ is the global solution of the convex problem given by Eq. (13) if and only if + there exists an additional Lagrange multiplier Z ⋆ such that the pair ( Given the solution x t sol of the corresponding unconstrained problem with penalty parameter t it follows from ∇f (x t sol ) = 0 using Eq. (43) that the gradient conditions are satisfied with Λ t = tρ(x t sol ) −1 being the Lagrange multiplier. This pair (x t sol , Λ t ) also satisfies Eq. (48); only the duality gap condition tr[Λ t ρ(x t sol )] = t dim(H) > 0 does not hold exactly. However this quantity appears in the following inequality ≤ min Here one used that x t sol is the solution of F (x) − tr[Λ t ρ(x)] because the gradient vanishes (and the solution is not at the border), and tr[Λ t ρ(x)] ≥ 0 for the inequality. This is the stated accuracy given by Eq. (15) which relates the function value of x t sol to the true solution x sol . merit characterizing how well a given permutationally invariant target state ρ tar can be reconstructed. This is motivated as follows: A permutationally invariant state ρ PI is uniquely described by its generalized Bloch vector [15] defined as with natural numbers satisfying k + l + m + n = N. Consequently, one possible figure of merit is given by the total error of all Bloch vector elements, i.e., more precisely by Here the multinomial coefficient weights the error of each Bloch vector by its number of appearance in a generic Pauli product decomposition. The error of each Bloch vector element must now be related to the performed measurements. For that note that each Bloch vector element can be expressed as + Sufficiency holds under the Slater regularity condition that demands a strictly feasible point ρ(x) > 0, which naturally holds for state reconstruction problems. using appropriate coefficients c klmn i and the expectation values of [(â i · σ) ⊗N −n ⊗ ½ ⊗n ] PI which can be computed from the coarse-grained measurement outcomes M a i k of settinĝ a i as given by Eq. (2) using linear combinations. Assuming independent errors one gets The detailed form of the error expression E 2 ρtar [(â i · σ) ⊗N −n ⊗ ½ ⊗n ] PI may depend on the actual physical realization, but we assume the following form where ∆ ρ [A] = tr(ρA 2 ) − [tr(ρA)] 2 is the standard variance and K a state-independent factor. This form fits for example well to the common error model in photonic experiments where count rates are assumed to follow a Poissonian distribution. For more details on this derivation we refer to Ref. [15]. For large qubit numbers N this optimization is carried out iteratively. Starting from randomly chosen measurement directions or from vectors which are uniformly distributed according to some frame potential [43], one searches for updates according toâ Hereâ n i is the current iterate, p < 1 a probability close to 1 andr i are randomly chosen unit vectors. If this new set of directionsâ ′ i leads to a smaller total error E 2 total (â ′ i , ρ tar ) than the previous set then this new measurement settings form the next iterateâ n+1 i =â n i , otherwise this procedure is carried out once more. This process is repeated until the total error does not decrease any more. This way of optimizing the measurements requires a method to compute the total error E 2 total (â i , ρ tar ) for a given set of measurementsâ i . Using the generalized Bloch vector or the spin ensemble this computation can be carried out again efficiently for larger qubit numbers N.
Though this algorithm is not proven to attain the true global optimum it is still a straightforward technique to obtain good settings. In the end this is often sufficient, recalling that the true unknown state can deviate from the target state and that one considers "just" an overall single figure of merit. For our simulations we always use the optimized settings for the totally mixed state.
Regarding this point we finally like to add that if one does not employ the minimal number of measurement settings, but rather an over-complete set, e.g., four times as much settings but four times fewer measurements per setting, then the procedure is quite insensitive to the chosen measurement directions. Hence, in many practical situations the search for optimal directions might not even be necessary and randomly chosen measurement directions suffice equally well.

Statistical pretest
Via the pretest one estimates the fidelity between the true ρ true and the best permutationally invariant state F PI (ρ true ) = max ρ PI ≥0 tr( √ ρ true ρ PI √ ρ true ) using only measurement results from a few settingsâ ∈ T , e.g., employing only Depending on this quantity one decides whether permutationally invariant tomography is worth to continue. As explained in detail in Ref. [15] this fidelity can be bounded by with an operator Z = z a k M a k being built up by the performed measurements M a k given by Eq.
(2) and satisfying Z ≤ P sym , where P sym denotes the projector onto the symmetric subspace. The expansion coefficients z a k should be optimized to attain the best lower bound. For a given target state ρ tar this problem can be cast into a semidefinite program [33,44] that can be solved efficiently using standard numerical routines. However for larger qubit numbers one must again employ the block structure of the measurement operators as given by Eq. (9) to handle the operator inequality. Note that the projector on the symmetric subspace has a Block structure P sym,j = δ j,N/2 ½. Then the final problem reads as If one experimentally implements this pretest one must account for additional statistical fluctuations. For the chosen Z one can employ the sample meanZ = z a k f a k using the observed frequencies f a k = n a k /N R in N R repetitions of settingâ, as an estimate of the true expectation value tr(ρ true Z). This sample meanZ will fluctuate around the true mean but large deviations will become less likely, such that for an appropriately chosen ǫ the quantity sign(Z−ǫ)(Z−ǫ) 2 is a lower bound to the true fidelity at the desired confidence level. The proof essentially uses the techniques employed in Refs. [45,46]. Proposition 5.3 (Statistical pretest). For any Z = z a k M a k ≤ P sym letZ = z a k n a k /N R denote the sample mean using N R repetitions for settingâ ∈ T . If the data are generated by the state ρ true then where z a max/min are the respective optima for settingâ over all outcomes k.
Proof. The given statement follows along Here the first inequality holds because the set of outcomes satisfying {n a k : tr(ρ true Z) ≥ Z − ǫ} is a subset of {n a k : F PI (ρ true ) ≥ sign(Z − ǫ)(Z − ǫ) 2 } using Eq. (58). In the last inequality we use Hoeffdings tail inequality [47] to bound Prob tr(ρ true Z) ≤Z − ǫ .
Note that this pretest can also be applied after the whole tomography scheme in which case the projector P sym = z a k M a k becomes accessible. Moreover, let us point out that a strong statistical significance, or a low ǫ respectively, might not be achieved with the best expectation value as given by Eq. (59) [48]; hence optimizing Z for a rather mixed state is often better.
Finally let us remark that the pretest can be improved by additional projectors, see supplementary material of Ref. [15]. This leads to the bound F PI (ρ true ) ≥ j p 2 j with p j being the weight of the corresponding spin-j state of the permutationally invariant part of ρ true as given in Eq. (7). From this expression one sees that this test only works well for states having a rather large weight on one of these spin states. Others, like the totally mixed state, while clearly being permutationally invariant, are not identified as states close to the permutationally invariant subspace. This is in stark contrast to compressed sensing where the certificate succeeds for the whole class of low rank target states [10].

Entanglement and MaxLik-MaxEnt principle
Following the last comment from the previous section, we want to argue that even in the case of an inefficient certificate, permutationally invariant state reconstruction as given by Eq. (5) is meaningful. At first we would like to emphasize that the permutationally invariant part of any density operator represents a fair representative to investigate the entanglement properties of the true, unknown state. This is because the transformation given Eq. (1) can be achieved by local operations and classical communications, whereby the amount of entanglement cannot increase. Thus if the permutationally invariant part of the density operator is entangled this holds true also for the real state.
Second, permutationally invariant state reconstruction also represents the solution of the maximum-likelihood maximum-entropy principle as introduced in Ref. [49], which goes as follows: If the performed measurements are not tomographically complete then there is, in general, not a single stateρ ml that maximizes the likelihood but rather a complete set of them. In order to single out a "good" representative, Ref. [49] proposes to choose the state which has the largest entropy, which, according to the Jaynes principle [50], is the special state for which one has the fewest information.
Proposition 5.4 (Permutationally invariant MaxLik-MaxEnt principle). Using the described permutationally invariant tomography scheme, the reconstructed permutationally invariant state given by Eq. (5) (with the likelihood function) is also the solution of the maximum-likelihood maximum-entropy principle.
Proof. Since the measurements given by Eq. (2) are invariant and tomographically complete for permutationally invariant states, all density operators with the same spin ensemble asρ PI have the same maximum likelihood. According to Ref. [50] the state with maximal entropy and consistent with a given set of expectation values for operators M a k has the form ρ ∝ exp( a,k λ a k M a k ). The Lagrange multipliers λ a k ∈ Ê must be chosen such that the given expectation values match. However, because all M a k are permutationally invariant we can employ the block decomposition given by Eq. (9) and finally obtain exp( a,k λ a k M a k ) = exp(⊕ j a,k λ a k M a k,j ⊗ ½) = ⊕ j exp( a k λ a k M a k,j ) ⊗ ½.
Hence we obtain the same structure asρ PI , which therefore is also the state with maximum entropy.

Conclusion and outline
In this manuscript we provided all necessary ingredients to carry out permutationally invariant tomography [15] in experiments with large qubit numbers. This includes, besides scheme specific tasks like the statistical pretest and the optimization of the measurement settings, in particular the state reconstruction part. Accounting for statistical fluctuations due to a finite amount of data, this reconstruction demands the solution of a non-linear large-scale optimization problem. We achieve this by first using a convenient toolbox to store, characterize and process permutationally invariant states, which largely reduces the dimension of the underlying problem and second by using convex optimization, which is superior compared to commonly used numerical routines in many respects. This makes permutationally invariant tomography a complete tomography method requiring only moderate measurement and data analysis effort.
There are many questions one may pursue in this direction: First, let us stress that the current prototype implementation is still not optimal. As explained, the bottleneck is the computation of the second derivatives, hence we strongly believe that Hessian free-optimization, like quasi-Newton or conjugate gradients [51], or the use of other barrier functions more tailored to linear matrix inequalities [52] are likely to push the reconstruction limit further. Second, it is natural to try to exploit other symmetries in the development of "symmetry" tomography protocols, i.e., tomography should work for all states that remain invariant under the action of a specific group. Clearly any symmetry decreases the number of state dependent parameters, but the challenge is to devise efficient local measurement strategies. Interesting classes here are graphdiagonal [53] or, more general, locally maximally entangleable states [54], translation or shift-invariant states [55], or U ⊗N invariant states [56,57]. Third, it is worth to investigate to what extent particularly designed state tomography protocols are useful in further tasks, like process tomography for quantum gates. For instance, permutationally invariant tomography might be unable to resolve the Toffoli gate [58] directly, but since the operation on all N target qubits is symmetric, a permutationally invariant resolution of this subspace (and the additional control qubit) might be sufficient. Finally let us point out that permutationally invariant tomography can be further restricted to the symmetric subspace, which often contains the desired states. This is reasonable since we have seen that the pretest is only good if the unknown state has a large weight in one of the spin states. However since the symmetric subspace grows only linearly with the number of particles, this tomography scheme can analyse even much more qubits efficiently.