Coarse-graining in retrodictive quantum state tomography

Quantum state tomography often operates in the highly idealised scenario of assuming perfect measurements. The errors implied by such an approach are entwined with other imperfections relating to the information processing protocol or application of interest. We consider the problem of retrodicting the quantum state of a system, existing prior to the application of random but known phase errors, allowing those errors to be separated and removed. The continuously random nature of the errors implies that there is only one click per measurement outcome -- a feature having a drastically adverse effect on data-processing times. We provide a thorough analysis of coarse-graining under various reconstruction algorithms, finding dramatic increases in speed for only modest sacrifices in fidelity.


I. INTRODUCTION
Accurate quantum state reconstruction from finite data is a fundamental tool in quantum information science. Continued development of experimental tomography protocols and data-processing algorithms has improved both the accuracy and the computational time required to produce state estimates in the face of the rapid increase in complexity of quantum systems. Despite being a mature field of research, quantum tomography, covering state (QST), process (QPT) and detector tomography, still suffers from outstanding problems, such as state preparation and measurement (SPAM) errors. Whilst SPAM errors can be mitigated to some extent by using gate set tomography (GST) for gate characterisation, the latter is significantly resource-intensive (requiring 4000 measurements to estimate a complete gate set, whereas only 256 are required to reconstruct a 2-qubit gate using QPT 1 ). In this work we deal with a particular type of SPAM error caused, for example, by noisy detector readout or by mis-calibrated measurement apparatuses. Measurement errors may be systematic or random, and tend to reduce the fidelity of the tomogram with respect to the true state ρ. If the errors in a general quantum information processing protocol are known on a shot-by-shot basis, they may generally be compensated for by additional quantum control. The irreversible nature of the quantum detection process, however, means that post-measurement knowledge of the errors is insufficient for such compensation.
Such a situation may be modelled by a semi-malevolent agent intervening in the experiment, applying random evolutions ρ → U_θ ρ U_θ† that are only revealed to the experimenter after they have made their measurements. For concreteness, we take U_θ = cos(θ/2) I + i sin(θ/2) σ_z, with σ_z the usual Pauli operator and ρ the system density matrix when no errors occur. Although the errors cannot be corrected in the sense of a fault-tolerant quantum protocol, it is possible to retrodict the quantum state which existed before the errors were applied. Since the success probability of a fixed measurement operator M is p_θ = tr(M [U_θ ρ U_θ†]) = tr([U_θ† M U_θ] ρ), moving from the Schrödinger to the Heisenberg picture makes the situation equivalent to performing tomography on an ideal preparation ρ with random measurements (see Fig. 1). The retrodiction is useful because ρ may still contain other sources of error, which may then be separately estimated 2,3 .
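The picture-equivalence above is easy to check numerically. The following sketch (with an arbitrary example state and the projector onto |+⟩ as stand-ins) verifies that applying the phase error to the state or absorbing it into the measurement operator gives the same click probability:

```python
import numpy as np

# Pauli-Z and identity for a single qubit
I2 = np.eye(2, dtype=complex)
sz = np.diag([1.0, -1.0]).astype(complex)

def U(theta):
    """Phase error U_theta = cos(theta/2) I + i sin(theta/2) sigma_z."""
    return np.cos(theta / 2) * I2 + 1j * np.sin(theta / 2) * sz

# Example state and measurement (projector onto |+>)
rho = np.array([[0.6, 0.2 + 0.1j], [0.2 - 0.1j, 0.4]])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
M = np.outer(plus, plus.conj())

theta = 0.7
# Schroedinger picture: the error acts on the state
p_schrodinger = np.trace(M @ U(theta) @ rho @ U(theta).conj().T)
# Heisenberg picture: the error is absorbed into the measurement operator
p_heisenberg = np.trace(U(theta).conj().T @ M @ U(theta) @ rho)
assert np.isclose(p_schrodinger, p_heisenberg)
```

The equality follows from the cyclic property of the trace, so it holds for any state, measurement operator, and phase.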
A concrete example of such a situation comes from the field of photonic cluster state generation. A single emitter, e.g. a natural atom or quantum dot, will spontaneously undergo radiative decay at a random delay after excitation. The emitted photons are entangled with the emitter in such a way that repeated resonant control of the emitter's spin state, together with further excitations, causes the subsequent chain of emitted photons to be generated in a linear cluster state 4-6 : a key resource 7 for measurement-based quantum computation 8 . Such schemes rely on an external magnetic field orthogonal to the optical axis 4-6 . Due to the non-zero lifetime τ_decay of the emitter, the spin precesses at an angular frequency ω_l for a random interval. We may thus think of nature applying a random phase to the spin, which is transferred to the emitted photon but revealed to the experimenter immediately upon detection. The task of estimating the density matrix ρ of the photonic cluster state in the limit of τ_decay → 0 is precisely the problem of retrodictive quantum state tomography outlined above.
For the technique to work, it is necessary that the effective measurement operators are known: in the precessing-spin example, this information is revealed by the arrival time of the photon, the angular precession frequency ω_l , and the time-of-flight of the photon to the detector. Because of the continuous nature of the distribution over θ, the measurement record has the following 'sparsity' feature: measurement operators will never be repeated, meaning that at most one click is attributed to each outcome. In this paper we show that retrodictive tomography is successful in spite of this feature, and go on to investigate the merits and demerits of coarse-graining: a technique which removes sparsity by introducing a finite number of discrete bins into which the measurement results are aggregated. Our numerical simulations reveal that fidelity degrades monotonically as the number of bins is reduced, but that this is accompanied by a drastic improvement in algorithm run-time. As well as being a choice available to the tomographer, coarse-graining can also be considered as one way of simulating imperfect knowledge about the errors θ. Intuitively, a Bayesian shot-by-shot approach is a natural paradigm to tackle the sparse tomography problem, making use of prior knowledge to process additional data obtained as more measurements are performed. However, the binning approach (discussed in Sec. II) cannot be applied to this technique straightforwardly. Thus, the Bayesian approach, as we shall see, suffers from being computationally expensive, but will still be used as a benchmark for the Maximum Likelihood techniques which follow.
In Section II, we describe qualitatively how the sparse and coarse-grained QST methods work, outlining our methods for simulating tomographic datasets and assessing the performance of reconstruction algorithms. In Section III we introduce Bayesian estimation, along with an algorithm relying on a Monte Carlo implementation, followed by an outline of the Maximum Likelihood (ML) principle in Section IV, and an assessment of two distinct implementation algorithms. Section V treats normally distributed measurement operators, and we draw our conclusions in Section VI. Finally, we give the full details of the algorithms used, along with some additional results, in the Appendix.

II. SPARSE AND BINNED TOMOGRAPHY
The distribution p(θ) (supported on [0, 2π)) of effective measurement operators depends on the physical scenario: in the example of cluster-state generation in the hole-spin system of Ref. 5, when the precession time is much shorter than the emission time, p(θ) ≈ 1/(2π). In such a case, the coherences of the reconstructed state would be completely washed out by conventional QST techniques (which make no use of the knowledge of the errors θ). For the more general case of photon emission from spin-bearing emitters, however, the exponential distribution p(θ) ∝ e^(−θ/µ) (with mean µ = λ^(−1), where λ is the rate parameter) is more adequate to describe the spread of operators. Other distributions may be treated similarly, meaning that our analysis applies to a wider range of physical scenarios, although the measurement operators may then be clustered to a greater or lesser degree, having an effect on the accuracy of the retrodicted tomogram. The normal distribution p(θ) ∝ e^(−θ²/2σ²) (σ being the standard deviation) is considered in Section V, while as µ → ∞ we recover the uniform distribution.

FIG. 1. Bloch sphere representation of the problem in the context of a precessing qubit. a) In the Schrödinger picture, the state (purple) gains a random phase (dots) prior to every measurement, with the measurement bases given by the arrows. b) In the Heisenberg picture, the state is static while the measurement operators are distributed randomly. The detector clicks can then be gathered in several bins on the Bloch sphere (coloured segments) to be used for coarse-grained state reconstruction. c) Graphical depiction of exponentially distributed phases, for various means. d) Graphical depiction of normally distributed phases, for various distribution widths.
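Sampling such phase distributions is straightforward. A minimal sketch for the exponential case (assuming, as one possible convention, that the tail beyond 2π is wrapped back onto [0, 2π)):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_phases(n, mu):
    """Draw n phase errors from p(theta) ~ exp(-theta/mu), wrapped onto [0, 2*pi)."""
    return rng.exponential(scale=mu, size=n) % (2 * np.pi)

thetas = sample_phases(10_000, mu=np.pi / 8)
# For small mu the phases cluster near zero; as mu -> infinity the wrapped
# distribution tends to the uniform distribution on [0, 2*pi).
```

The normal case follows analogously with `rng.normal` in place of `rng.exponential`.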
By using the Heisenberg picture (as in the previous section), the tomographic protocol is equivalent to reconstructing some unknown state ρ with the following set of positive (projective) measurement operators:

P = { |φ⟩⟨φ| : φ ∈ [0, 2π) } ∪ { |↑⟩⟨↑| , |↓⟩⟨↓| },   |φ⟩ = (|↑⟩ + e^(iφ) |↓⟩)/√2,   (1)

noting that the values of φ play less of a role as the spread of θ increases. Because |φ⟩⟨φ| + |φ+π⟩⟨φ+π| = I, this set may be considered a POVM (Positive Operator Valued Measure) upon appropriate normalisation (in the sense that the sum of all operators is proportional to the identity).
We generated pseudo-tomographic data for a fixed ρ by drawing N/2 unique values of θ ∈ [0, 2π) from p(θ). We then simulate a single Bernoulli trial for each measurement operator, assigning the event to M_θi with probability p_i = tr(ρ M_θi), and to the orthogonal operator with the complementary probability. The measurement record then consists of a (multi)set of N/2 + 2 measurement operators: the N/2 operators perpendicular to the 'precession' axis each have multiplicity n_i = 1 (i.e. the record is 'sparse'), while the two orthogonal operators parallel to the 'precession' axis have a joint multiplicity of N/2. For the latter, N/2 measurements are split between the two projections along the precession axis. Optionally, we modify the measurement record by a process of coarse-graining or 'binning', resulting in a lower number N_b < N/2 of coarse-grained measurement operators, e.g.
projecting onto states evenly distributed around the equator of the Bloch sphere (see Fig. 1), with multiplicities ñ_j that simply accumulate the events according to the bin that they fall within (the bins being intervals centred on θ_j and of width 2π/N_b , as shown graphically in Fig. 1). Other binning schemes are possible, including those that depend on the original measurement record 9 . We then run different reconstruction algorithms (to be introduced below) on the coarse-grained measurement record, to give a quantum state estimate or 'tomogram' ρ_est. The running time of the algorithm is noted, and the fidelity of the tomogram computed:

F(ρ, ρ_est) = [ tr √( √ρ ρ_est √ρ ) ]² ,

which is a measure of the distance between the true state and the retrodicted tomogram. The procedure was then repeated for distinct, randomly generated (but full rank) ρ, and we collected statistics to summarise the typical performance.
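The data-generation and binning steps above can be sketched as follows (helper names are illustrative; only the equatorial operators are simulated for brevity, and complement clicks are folded onto the phase φ + π using I − M_φ = |φ+π⟩⟨φ+π|):

```python
import numpy as np

rng = np.random.default_rng(1)

def equatorial_projector(phi):
    """Projector onto |phi> = (|up> + e^{i phi}|down>)/sqrt(2)."""
    ket = np.array([1.0, np.exp(1j * phi)]) / np.sqrt(2)
    return np.outer(ket, ket.conj())

def simulate_clicks(rho, thetas):
    """One Bernoulli trial per phase: +1 for a click on M_theta,
    -1 for its orthogonal complement I - M_theta."""
    clicks = []
    for th in thetas:
        p = np.trace(equatorial_projector(th) @ rho).real
        clicks.append(1 if rng.random() < p else -1)
    return clicks

def bin_record(thetas, clicks, n_bins):
    """Coarse-grain: fold complement clicks onto phi + pi, then histogram
    the equator into n_bins equal-width bins (centres and multiplicities)."""
    thetas = np.asarray(thetas, dtype=float)
    phis = np.where(np.asarray(clicks) == 1, thetas, thetas + np.pi) % (2 * np.pi)
    edges = np.linspace(0.0, 2.0 * np.pi, n_bins + 1)
    counts, _ = np.histogram(phis, bins=edges)
    return 0.5 * (edges[:-1] + edges[1:]), counts

rho = 0.5 * np.array([[1.0, 0.9], [0.9, 1.0]])   # nearly pure |+> state
thetas = rng.uniform(0.0, 2.0 * np.pi, size=500)
clicks = simulate_clicks(rho, thetas)
centres, counts = bin_record(thetas, clicks, n_bins=16)
assert counts.sum() == 500   # every click lands in exactly one bin
```

Reconstruction algorithms then see only the N_b bin-centre projectors with their multiplicities, rather than 500 distinct single-click operators.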
Counter to intuition, using sparse tomography without any binning works remarkably well. However, algorithm running time tends to scale badly with N (since the calculation of the cost function and its gradient involves a contribution from each of the N distinct operators); hence our proposed coarse-grained approach. The remainder of the paper is dedicated to investigating the dependence of fidelity and run time on N_b , for different reconstruction algorithms. As N_b , N → ∞, the sparse and coarse-grained approaches are expected to give the same fidelities.

III. NON-ADAPTIVE BAYESIAN TOMOGRAPHY
The Bayesian approach was introduced in the field of quantum tomography [10-15], and is an ongoing theoretical and experimental research topic [16-18]. This approach offers numerous advantages over other techniques, such as the use of online information available to the experimentalist after each measurement. Furthermore, Bayesian inference was also shown to be optimal with respect to any strictly proper scoring rule derived from Bregman distances 16,19,20 (near-optimal if the infidelity is used as a loss function instead 21 ), with the ability to track fidelity bounds online 21 (allowing for feedback to minimise the number of required measurements), as well as giving robust region estimates 22 and allowing for model selection/averaging. Thus the Bayesian approach shall be used as a benchmark for the other techniques discussed in this work. Our implementation follows closely the approaches used in Refs. 17 and 23. For a Bayesian update scheme, we start with an initial prior probability density p(ρ) over the feasible state space (usually uninformed due to the absence of additional knowledge, resulting in a uniform prior). After obtaining a new measurement datum D, the posterior distribution p(ρ|D) is then built using the likelihood function L(ρ; D) as

p(ρ|D) ∝ L(ρ; D) p(ρ).   (4)

Typically, Bayesian tomography schemes would then make use of the narrower posterior and additional criteria (for example, Shannon information 23 ) to infer the next optimal measurement setting 17,23 . However, since we do not have control over which measurement to perform next, this latter step of the Bayesian scheme cannot be applied. Although we do not make use of any criteria to track the narrowing of the sample, one could still use the covariance of the narrowed posterior to indicate when a sufficiently precise estimate has been found. Despite the simple form of Eq. 4, the analytical evaluation of the posterior is seldom feasible, and hence the latter is typically replaced with an approximation. To this end, several Markov Chain Monte Carlo (MCMC) techniques have been adopted, including the Metropolis-Hastings algorithm 16 . However, these MCMC techniques tend to be computationally expensive, with decreasing acceptance probabilities at each sampling step, leading to more samples being discarded as additional data is obtained.

FIG. 2. a) Initial uninformed prior (orange), with the mean of the distribution shown in green, and the true state to be reconstructed in red. b) Final posterior (orange) after 2000 measurements, where the marker size indicates the relative particle weights. c) and d) show the [σ_x , σ_z ] projection of the prior and posterior, respectively, as a visual aid. As more measurements are performed, most of the original particle weights drop to zero, requiring resampling for a more accurate prediction without requiring an excessive number of particles to begin with.
Furthermore, these methods require the assumption of a normal posterior, which is not always the case in state tomography. The Sequential Monte Carlo (SMC) technique 24,25 , on the other hand, only requires the computation of a single term of the likelihood to update the weights of the approximate distribution with each measurement 23 . In this approach, adopting the notation of Ref. 23, the posterior after the i-th measurement is approximated by a number P of randomly sampled particles, {ρ_p}, and their corresponding weights {w_p}. Suppose our current (prior) knowledge is given by the dataset {D_i} = {α_j : 1 ≤ j ≤ i, α_j ∈ P}, where the set P is defined in (1). If the next projection phase is, without loss of generality, θ_{i+1} (that is, α_{i+1} = M_{θ_{i+1}}), then, following Ref. 23 and using Bayes' rule (Eq. 4), we can write the approximation for the next posterior as

p(ρ | D_{i+1}) ≈ Σ_p w_p^{(i+1)} δ(ρ − ρ_p),   w_p^{(i+1)} ∝ w_p^{(i)} P(M_{θ_{i+1}} | ρ_p),

where P(M_{θ_{i+1}} | ρ_p) = Tr(M_{θ_{i+1}} ρ_p). In our numerical simulations, we perform the first N/2 measurements along the z-axis (that is, using projection operators {|↑⟩⟨↑| , |↓⟩⟨↓|}), followed by the remaining N/2 measurements along the Bloch equatorial plane. As more measurements are performed, narrowing the particle distribution, most of the weights drop to zero, which can be remedied by resampling from the new posterior distribution 23 . Finally, the Bayes estimator ρ_est can be extracted from the mean of the final posterior approximation. In Fig. 2 we show the above steps graphically, emphasising the use of resampling to obtain an accurate posterior.
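The SMC weight update can be sketched in a few lines. In this illustrative sketch, Ginibre-sampled particles stand in for a uniform prior, the helper names are our own, and the resampling step is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_qubit_state():
    """Random full-rank density matrix sampled from the Ginibre ensemble."""
    G = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def smc_update(particles, weights, M):
    """One Bayes update (Eq. 4): reweight each particle by its likelihood tr(M rho_p)."""
    likelihoods = np.array([np.trace(M @ rho).real for rho in particles])
    new_weights = weights * likelihoods
    return new_weights / new_weights.sum()

P = 200
particles = [random_qubit_state() for _ in range(P)]
weights = np.full(P, 1.0 / P)

M = np.diag([1.0, 0.0])          # one click on the projector |up><up|
weights = smc_update(particles, weights, M)
rho_est = sum(w * rho for w, rho in zip(weights, particles))  # Bayes mean estimate
```

After many updates most weights collapse towards zero, which is exactly the point at which a resampling step (as in Fig. 2) becomes necessary.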
We numerically benchmarked the Bayesian technique, using a uniform prior 26 . An example is shown in Fig. 2, and further results are summarised in Fig. 3. Despite the fact that we cannot decide which measurement to perform next, our random-basis measurements can be seen to give good convergence after 2000 measurements with 1000 particles.

IV. MAXIMUM LIKELIHOOD ESTIMATION
A common alternative approach to state estimation is to produce a tomogram ρ_est which maximises the likelihood function. Naive approaches may result in an invalid tomogram (having, for example, negative eigenvalues). The search for the best fit to the data, therefore, should be constrained to the allowed state space of trace-one positive semidefinite matrices [27-29]. Previously (in the Bayesian method) this was ensured by choosing a prior distribution supported only in the allowed state space. Here, the prior is not modelled, but we consider two alternative approaches: A) the constraints are enforced by a non-linear parametrization of the density matrix, and B) the constraints are enforced periodically in the course of an iterative gradient descent procedure, allowing for temporary violations [29-31]. Given a density matrix ρ, the likelihood function to be maximised has the form

L(ρ) ∝ Π_j p_j^{n_j} ,   p_j = tr(M_j ρ),

with equality holding up to an irrelevant proportionality constant. For sparse tomography, the product would be over N exponentiated probabilities p_j , with each n_j taking a binary value of either 0 or 1. Due to the monotonicity of the logarithm, maximising the likelihood function is identical to minimising the negative of its logarithm [which we refer to as the cost function C(ρ)], given by

C(ρ) = − Σ_j n_j ln p_j ,   (8)

where we took the normalising constant to identity. Recall that the sparse tomography limit is recovered when N_b = N and n_j = 1. In the limit of a large number of detections per measurement, the probability of obtaining the j-th measurement outcome can be approximated by a Gaussian distribution 32,33 , with the estimated number of detections for the j-th measurement given by n̄_j = N p_j . Since this approximation clearly fails for the sparse case, due to the binary nature of the n_j 's, we do not make it.
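The cost function C(ρ) is straightforward to evaluate. A minimal sketch (the small clip guarding against log(0) is an implementation choice of ours):

```python
import numpy as np

def cost(rho, projectors, counts):
    """Negative log-likelihood C(rho) = -sum_j n_j ln p_j with p_j = tr(M_j rho)."""
    C = 0.0
    for M, n in zip(projectors, counts):
        p = np.trace(M @ rho).real
        C -= n * np.log(max(p, 1e-12))   # clip guards against log(0)
    return C
```

For example, for the maximally mixed state and a single click on |↑⟩⟨↑| the cost is −ln(1/2) = ln 2.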

A. Cholesky factorisation
In this section we implement a Cholesky-like decomposition of the density matrix in order to minimise Eq. 8 [32-34], allowing us to use Python's SciPy least-squares solver on a 1D array 35 . One can easily show that any qubit density matrix ρ allows for a decomposition of the form

ρ = T†T / Tr(T†T),

where T is the lower triangular matrix

T = [[ t_1 , 0 ], [ t_3 + i t_4 , t_2 ]],

with t = (t_1, t_2, t_3, t_4) being the array over which the minimisation search is performed. In particular, we can use this decomposition to calculate n̄_j ∝ p_j = Tr(|φ_j⟩⟨φ_j| T†T)/Tr(T†T). Generalising this parametrisation to m qubits, T becomes a 2^m × 2^m lower triangular complex matrix specified by 4^m real parameters, and hence the search needs to be done over a real array of length 4^m .
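A sketch of this parametrisation for a single qubit (the ordering of the t_i within T follows the matrix above; any ordering works as long as it is used consistently):

```python
import numpy as np

def rho_from_t(t):
    """rho = T^dag T / tr(T^dag T) for lower-triangular T built from 4 reals."""
    T = np.array([[t[0], 0.0],
                  [t[2] + 1j * t[3], t[1]]])
    rho = T.conj().T @ T
    return rho / np.trace(rho).real

rho = rho_from_t([0.8, 0.3, 0.1, -0.2])
```

By construction the result has unit trace and is positive semidefinite, so the optimiser can roam freely over the unconstrained real array t.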
Having formulated a decomposition guaranteeing a valid density matrix, the problem can be recast as a least-squares minimisation 32,33 in order to find the minimum of the negative log likelihood, as the latter may be written down as

C(t) = Σ_j f_j(t)² ,

FIG. 3. Full rank, single qubit reconstruction. The random phases were sampled from an exponential distribution with µ = π/8. As expected, the coarse-grained approach returns slightly higher infidelities (shown on the x-axis). The algorithm running times (y-axis) for the sparse approach scale linearly with the number of measurement repetitions. On the other hand, the computation times for the binned approach remain the same, within error bars, with increased repetitions, as the number of projective operators used for reconstruction is the same for all repetition numbers. The results from the Bayesian method are also shown for comparison. While the Bayesian approach offers higher fidelity estimates for lower measurement numbers N (star), the infidelity is higher compared to the sparse PGDB for higher N , and the corresponding computation time heavily offsets any advantages gained in fidelity by the Bayesian approach. The black arrow indicates the direction of the trend as the number of measurement events increases: infidelity decreasing at the expense of higher computation time, whilst the grey arrows on the axes point towards the ideal region of low infidelity and computation time.
where, for the general case of a multinomial probability distribution, using Eq. (8) we get

f_j(t) = √( − n_j ln p_j(t) ).

Despite the non-linear parametrisation, this optimization problem was shown to have a single global solution 34 , meaning that all local minimizers lead to the same solution minimizing the negative log likelihood.
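Putting the pieces together, a toy reconstruction with SciPy's least-squares solver might look as follows. The two-outcome record and the residual form r_j = √(−n_j ln p_j) (whose squared sum reproduces the cost of Eq. 8) are illustrative assumptions of this sketch:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(t, projectors, counts):
    """Residuals r_j with sum_j r_j^2 = -sum_j n_j ln p_j, the cost of Eq. (8)."""
    T = np.array([[t[0], 0.0], [t[2] + 1j * t[3], t[1]]])
    rho = T.conj().T @ T
    rho /= np.trace(rho).real
    r = []
    for M, n in zip(projectors, counts):
        p = max(np.trace(M @ rho).real, 1e-12)
        r.append(np.sqrt(max(-n * np.log(p), 0.0)))
    return np.array(r)

# Toy binned record: 90 clicks on |up><up| and 10 on |down><down|
P_up, P_down = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
sol = least_squares(residuals, x0=[0.7, 0.7, 0.0, 0.0],
                    args=([P_up, P_down], [90, 10]))

# Rebuild the maximum-likelihood estimate from the optimal parameters
T = np.array([[sol.x[0], 0.0], [sol.x[2] + 1j * sol.x[3], sol.x[1]]])
rho_est = T.conj().T @ T
rho_est /= np.trace(rho_est).real
```

For this record the maximum-likelihood population of |↑⟩ is 90/100, so ρ_est[0,0] converges to roughly 0.9.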
In Fig. 4 we show the results for single-qubit reconstruction. As expected, the fidelity of the reconstructed density matrix increases with the number of Bloch sphere partitions. This is also the case for a two-qubit reconstruction, as we show in Appendix A.

B. Projected gradient descent
Gradient descent algorithms rely on following the path of steepest descent of the cost function, in this case Eq. (8), starting from a well-chosen initial estimate. If left unconstrained in the convex space of d × d matrices (where d is the Hilbert space dimension), the resulting estimate ρ_est might lie outside the convex subspace of unit-trace, positive semidefinite matrices, leading to an unphysical estimate. Hence, a projection back to the physical subspace, minimising the distance as measured by a matrix norm (such as projection of the spectrum onto the unit simplex 30,31,34 ), is employed, giving rise to projected gradient descent (PGD) algorithms. Iterating this process leads to convergence of the cost function to a minimum below a predefined threshold. A unique solution satisfying the appropriate constraints and minimising the cost function is then guaranteed as long as the latter is a continuously differentiable convex function of the density matrix. Eq. (8) is convex but not continuously differentiable, but this tends not to pose a problem in practice, as discussed in Ref. 29. Choosing the projection of ρ to be that of its spectrum onto the unit simplex (which we refer to as P_S), the PGD algorithm update can be written as

ρ^(k+1) = P_S( ρ^(k) − γ ∇C(ρ^(k)) ),

for a fixed step size γ. As is commonplace, we supplement the PGD algorithm with a backtracking line search (PGDB) based on the Armijo-Goldstein condition to loosely optimise the maximum step size for each descent iteration 30,31,34 . The estimate at the k-th PGDB iteration can thus be written as

ρ^(k+1) = P_S( ρ^(k) − α^{m_k} γ ∇C(ρ^(k)) ),

where α ∈ [0, 1] is the line search parameter to be roughly optimised at each step. We assess the impact of binning on the PGDB algorithm, with Fig. 5a showing the trade-off between computation time and fidelity for µ = π/8. Fig. 5b, on the other hand, shows the relation between computation time and infidelity for various numbers of bins N_b and events N for normally distributed phases, showing a similar trend to the exponentially spread phases.
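The spectrum projection P_S and one fixed-step PGD update can be sketched as follows (the step size γ and the toy record are illustrative; the backtracking line search is omitted):

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a real vector onto the probability simplex."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho_idx = np.nonzero(u + (1.0 - css) / j > 0)[0][-1]
    tau = (css[rho_idx] - 1.0) / (rho_idx + 1.0)
    return np.maximum(v - tau, 0.0)

def P_S(H):
    """Project a Hermitian matrix onto the unit-trace PSD set by projecting
    its spectrum onto the unit simplex."""
    vals, vecs = np.linalg.eigh(H)
    return (vecs * project_to_simplex(vals)) @ vecs.conj().T

def pgd_step(rho, projectors, counts, gamma=0.01):
    """One fixed-step PGD update for the cost of Eq. (8), whose gradient is
    grad C(rho) = -sum_j n_j M_j / tr(M_j rho)."""
    grad = sum(-n * M / max(np.trace(M @ rho).real, 1e-12)
               for M, n in zip(projectors, counts))
    return P_S(rho - gamma * grad)

P_up, P_down = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
rho_new = pgd_step(np.eye(2, dtype=complex) / 2, [P_up, P_down], [90, 10])
```

Each iterate is therefore a valid density matrix, even though the intermediate gradient step may temporarily leave the physical set.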
Appendix C shows a closer analysis of the exponential data, with first and third quartiles for the infidelity, and standard-deviation error bars for computation times. As expected, within standard-deviation error, the binned approach gives slightly lower fidelities than the sparse one. This difference, however, is well justified when considering the significant reduction in computation time shown in Fig. 5a. The trends in Fig. 5a, both for computation time and infidelity, are similar to those shown in Fig. 4 for the Cholesky method. However, our numerical simulations clearly show that lower reconstruction times are achieved using the PGDB technique. In Fig. 6, we show how the infidelity varies with increasing mean µ for various values of the bin number N_b .

V. CONDITION NUMBERS
Using a single basis for reconstruction along the plane of precession, the higher the spread of the distribution, the higher the fidelity one expects, as the effective rotated bases sample larger portions of the Bloch plane. For lower spreads, the additional phase knowledge does not contribute considerably, and hence incomplete Pauli tomography (in which only x- and z-basis measurements are performed) is recovered. This can be seen in Fig. 7, showing the behaviour of the condition number κ(A) of the measurement matrix A for increasing N , where A is the matrix whose rows are the vectorised projectors,

A = ( vec(Π̂_1), vec(Π̂_2), . . . )^T ,

and the projectors Π̂_i make up the set P in Eq. (1) 31,36 . The condition number decreases significantly with increasing standard deviation of the distribution, meaning that sampling distributions with larger spreads results in a better-conditioned measurement matrix.
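This conditioning effect is easy to reproduce numerically. The sketch below builds A from vectorised equatorial projectors plus an equal number of z-axis projectors (mirroring the N/2 z-axis measurements described in Sec. III; helper names are our own) and compares κ(A) for narrow and wide phase spreads:

```python
import numpy as np

rng = np.random.default_rng(3)

def equatorial_projector(phi):
    ket = np.array([1.0, np.exp(1j * phi)]) / np.sqrt(2)
    return np.outer(ket, ket.conj())

def condition_number(phis):
    """kappa(A) for A whose rows are vectorised projectors: the equatorial
    projectors at the sampled phases plus an equal number of z-axis ones."""
    projs = [equatorial_projector(p) for p in phis]
    n = len(projs)
    projs += [np.diag([1.0, 0.0])] * (n // 2) + [np.diag([0.0, 1.0])] * (n // 2)
    A = np.array([M.flatten() for M in projs])
    s = np.linalg.svd(A, compute_uv=False)
    return s.max() / s.min()

# Narrow spread: rotated bases cluster near sigma_x and barely sense sigma_y
narrow = condition_number(rng.normal(0.0, 0.1, 200) % (2 * np.pi))
# Wide spread: bases cover the equator, approaching the complete-Pauli value
wide = condition_number(rng.normal(0.0, 2.0, 200) % (2 * np.pi))
assert wide < narrow
```

As σ grows, the equatorial projectors cover the full Bloch equator and κ(A) approaches the complete-Pauli value of 2 quoted in Fig. 7.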

VI. CONCLUSION
QST is still an active area of experimental and theoretical research, allowing the reconstruction of quantum states from finite experimental data. In this work, we implemented several QST algorithms in the presence of phase errors which are only known after the system is measured. We showed, with a simple modification, how the unaffected state may be retrodicted using such knowledge. Furthermore, we demonstrated that, at a small cost in fidelity, the reconstruction time can be significantly decreased. All data in this work was generated and visualised using Python and the QuTiP package 37,38 . The Bayesian approach was adapted from Ref. 20.

FIG. 7. Sparse tomography condition number (shown above for N = 2 × 10^4) decreases (improves) as the standard deviation σ of the normally distributed phases increases. The red bars indicate one-sigma uncertainty. When σ is high, we recover the limit of many measurements distributed evenly around the equator of the Bloch sphere. In this situation, we obtain the same condition number, κ(A) = 2, as in the case of complete Pauli measurements 36 .

Excerpt of the resampling algorithm:
6: Select particle x_j with probability w_j
7: µ_i = a x_j + (1 − a) µ    ▷ Mean for new particle location
8: Pick x_i randomly from N(µ_i, Σ)    ▷ Draw new, shifted, particle

In this section we re-state the performance of PGDB for the exponential distribution, but in an alternative format with error bars: see Fig. A.3.

FIG. A.3. a) Full rank, single qubit reconstruction using gradient descent, averaged over 1000 trials. The random phases were sampled from an exponential distribution with µ = π/8. As expected, the coarse-grained approach returns slightly higher infidelities. b) Algorithm running times for the unsorted and coarse-grained approaches. The unsorted approach scales linearly with the number of measurement repetitions. The coarse-grained approaches, within the standard deviation, do not scale with increased repetitions, as the number of projective operators used for reconstruction is the same for all repetition numbers.