When quantum state tomography benefits from willful ignorance

We show that quantum state tomography with perfect knowledge of the measurement apparatus proves to be, in some instances, inferior to strategies that discard all information about the measurement at hand, as in the case of data pattern tomography. In those scenarios, a larger uncertainty about the measurement is traded for a smaller uncertainty about the reconstructed signal. This effect is more pronounced for minimal or nearly minimal informationally complete measurement settings, which are of utmost practical importance.


Introduction
The goal of quantum state tomography (QST) is to infer a reliable estimate of the state of a quantum system from a suitable set of measurements performed on a finite set of identical copies of the system [1,2]. This technique has evolved from the first theoretical and experimental concepts to a fairly standard method [3,4].
The rapid progress in quantum-enhanced technologies entails the use of increasingly complicated systems, which in turn demand ever more sophisticated measurements. Although efficient procedures for QST are available, a persistent problem faced by contemporary QST is the complete characterization of the measurement. Developing efficient and precise methods for calibrating quantum detectors is particularly challenging; this is precisely the objective of quantum detector tomography (QDT) [5,6,7].
The standard QDT reconstructs the action of the measurement from the outcome statistics obtained in response to a complete set of certified input probes [8,9]. This approach has been successfully applied to a variety of situations [10,11,12,13,14,15,16,17]. However, QDT soon becomes exceedingly onerous and impractical as the number of detector outcomes grows.
This drawback can be partially overcome with more sophisticated calibration schemes that exploit intrinsic quantum resources, such as entanglement. Examples include the absolute calibration method based on twin beams [18,19,20] and self-testing or blind tomography [21,22,23]. Trading knowledge of the probes for information about the measurement yields the concept of self-calibrating tomography [24,25,26,27,28,29].
When the measurement itself is of no interest, the expensive detector calibration can be avoided by directly fitting the measured data with the detector responses (patterns) to a set of known quantum states. This is the essence of data pattern tomography (DPT) [30,31,32,33,34], which has been implemented with remarkable success [35,36,37]. In this way, QST is accomplished without any prior knowledge of the measurement, avoiding an unnecessary waste of resources. Moreover, this approach is insensitive to imperfections of the setup, because they are automatically accounted for by the data patterns.
A quantitative comparison of the DPT and QDT+QST strategies was carried out in [33], where it was demonstrated that DPT outperformed QDT+QST in many regimes of practical importance. Here, we go a step further by showing that DPT can even outperform QST supplemented with perfect knowledge about the measurement apparatus. This is somewhat counterintuitive, as one would expect a complete description of the measurement to give a substantial advantage over an approximation spoiled by measurement noise. Showing that this is not universally true is the main result of the present paper.
We proceed by analyzing the distinctions between QDT and DPT and by giving an intuitive explanation of the above-mentioned unexpected behavior. The rationale lies in the subtle effects of the omnipresent noise: the noisy data corresponding to the probes allow for a better fit of the noisy data of an unknown state than the exact knowledge of the detection process does. In this sense, the exact information about the detection is only an apparent advantage, which causes a bias in the reconstruction scheme. For simplicity, we adopt linear inversion (being aware that it is not the optimal strategy), as it allows for a deeper comprehension of the physics involved. We confirm our theoretical findings with numerical simulations of realistic setups.

Quantum state tomography with unknown detectors
Let us first set the stage for our analysis. We deal with an input signal described by a true density matrix $\rho$. For a $d$-dimensional quantum system, the positive semidefinite, unit-trace $d \times d$ matrix $\rho$ requires $n \equiv d^2 - 1$ independent real numbers for its specification. The following analysis also applies to continuous-variable systems, provided the state space can be truncated in a suitable computational basis so that all significant state components are contained within that finite-dimensional representation.
QST aims at estimating $\rho$ from measurements performed on identically prepared copies of the system under certification. In general, these measurements are represented by positive operator-valued measures (POVMs) [38]: sets of Hermitian operators $\{\Pi_j\}$, with $\Pi_j \geq 0$ and $\sum_j \Pi_j = \mathbb{1}$, such that each POVM element represents a single output channel of the measuring apparatus. We take every measurement as yielding $m$ distinct outcomes, the probability of detecting the $j$th output being given by Born's rule, $p_j = \operatorname{Tr}(\rho \Pi_j)$.
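We can expand both $\rho$ and the POVM $\{\Pi_j\}$ in a suitable operator basis. A very convenient choice is a traceless Hermitian basis $\{\Gamma_k\}$ ($k = 1, \ldots, n$) satisfying $\operatorname{Tr}(\Gamma_k) = 0$ and $\operatorname{Tr}(\Gamma_k \Gamma_\ell) = \delta_{k\ell}$. This set coincides with the orthonormalized generators of the Lie algebra su($d$) of the associated symmetry group SU($d$). Writing $\rho = \mathbb{1}/d + \sum_k r_k \Gamma_k$, we directly get

$$ \mathbf{p} = A\,\mathbf{r}, \tag{2.1} $$

where we have omitted the trivial constant $\operatorname{Tr}(\Pi_j)/d$, $r_k = \operatorname{Tr}(\rho \Gamma_k)$, and $A_{jk} = \operatorname{Tr}(\Pi_j \Gamma_k)$. $A$ is a unique $m \times n$ real matrix that describes the explicit relation between the theoretical probabilities $\mathbf{p}$ and the state parameters $\mathbf{r}$.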
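As a minimal sketch of equation (2.1), the Python/NumPy code below builds one possible orthonormal traceless Hermitian basis (generalized Gell-Mann matrices rescaled so that $\operatorname{Tr}(\Gamma_k\Gamma_\ell) = \delta_{k\ell}$), evaluates $A_{jk} = \operatorname{Tr}(\Pi_j\Gamma_k)$ and $r_k = \operatorname{Tr}(\rho\Gamma_k)$, and checks that $p_j = \operatorname{Tr}(\Pi_j)/d + (A\mathbf{r})_j$. The helper names, the random qutrit state, and the random 9-outcome POVM used for the check are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def traceless_basis(d):
    """Orthonormal traceless Hermitian basis: generalized Gell-Mann matrices
    rescaled so that Tr(G_k G_l) = delta_kl. Returns d**2 - 1 matrices."""
    basis = []
    for a in range(d):
        for b in range(a + 1, d):
            sym = np.zeros((d, d), dtype=complex)
            sym[a, b] = sym[b, a] = 1 / np.sqrt(2)
            asym = np.zeros((d, d), dtype=complex)
            asym[a, b], asym[b, a] = -1j / np.sqrt(2), 1j / np.sqrt(2)
            basis += [sym, asym]
    for l in range(1, d):
        diag = np.zeros(d)
        diag[:l] = 1.0
        diag[l] = -l
        basis.append(np.diag(diag / np.sqrt(l * (l + 1))).astype(complex))
    return basis

def design_matrix(povm, basis):
    """A_jk = Tr(Pi_j G_k); real because Pi_j and G_k are Hermitian."""
    return np.array([[np.trace(P @ G).real for G in basis] for P in povm])

def state_vector(rho, basis):
    """r_k = Tr(rho G_k), so that rho = 1/d + sum_k r_k G_k."""
    return np.array([np.trace(rho @ G).real for G in basis])

# Illustrative check: random qutrit state and a random 9-outcome POVM
# (square-root normalization of random vectors, an assumed construction).
rng = np.random.default_rng(0)
d, m = 3, 9
vecs = rng.normal(size=(m, d)) + 1j * rng.normal(size=(m, d))
w, U = np.linalg.eigh(sum(np.outer(v, v.conj()) for v in vecs))
S_inv_sqrt = U @ np.diag(w ** -0.5) @ U.conj().T
povm = [S_inv_sqrt @ np.outer(v, v.conj()) @ S_inv_sqrt for v in vecs]

X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = X @ X.conj().T
rho /= np.trace(rho).real                  # positive, unit trace

basis = traceless_basis(d)
A = design_matrix(povm, basis)
r = state_vector(rho, basis)
p_born = np.array([np.trace(rho @ P).real for P in povm])
p_linear = np.array([np.trace(P).real / d for P in povm]) + A @ r
print(np.allclose(p_born, p_linear))       # True
```

The constant term $\operatorname{Tr}(\Pi_j)/d$ is the part omitted in (2.1); it carries no information about $\mathbf{r}$.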
In the presence of noise and with a finite number of copies, the collected relative frequencies of the individual measurement outcomes, which we denote by $\mathbf{f}$, deviate from their expected values $\mathbf{p}$. The ultimate goal of tomography is to infer the unknown signal parameters $\mathbf{r}$ from the measured noisy data $\mathbf{f}$. A number of different techniques are available, all of them providing a sensible inversion of equation (2.1). To keep the analysis as simple as possible while retaining the fundamental features of the problem, we adopt here linear inversion, which can be accomplished by using either the ordinary least-squares estimator (OLS) or the generalized least-squares estimator (GLS) [39]:

$$ \hat{\mathbf{r}}_{\mathrm{OLS}} = A^{+}\mathbf{f}, \qquad \hat{\mathbf{r}}_{\mathrm{GLS}} = (C^{-1}A)^{+}\, C^{-1}\mathbf{f}. \tag{2.2} $$

Henceforth, the hat denotes an estimator, $G = CC^{\dagger}$ is the data covariance matrix, and the superscripts $\dagger$ and $+$ denote Hermitian conjugation and the Moore-Penrose pseudoinverse [40,41,42], respectively. We just mention that GLS is the best linear unbiased estimator under the assumption of zero-mean noise, $\langle \mathbf{f} \rangle = \mathbf{p}$ (with $\langle \cdot \rangle$ denoting the average over data), which we adopt throughout this paper, whereas OLS is a handy estimator for small and medium-sized data sets, when a reliable estimation of the data covariances is not possible [43].
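The two linear estimators described above can be sketched in a few lines of NumPy. This is a minimal illustration on a generic synthetic linear model (a random real $A$ and independent Gaussian noise with a known diagonal covariance $G$, both assumptions), not on the tomographic model itself; the Cholesky factor supplies one valid choice of $C$ with $G = CC^{\dagger}$.

```python
import numpy as np

def ols(A, f):
    """Ordinary least squares: r_hat = A^+ f."""
    return np.linalg.pinv(A) @ f

def gls(A, f, G):
    """Generalized least squares: r_hat = (C^-1 A)^+ C^-1 f with G = C C^T.
    Multiplying by C^-1 whitens the noise, after which OLS is applied."""
    C = np.linalg.cholesky(G)          # lower-triangular factor, G = C @ C.T
    A_w = np.linalg.solve(C, A)        # C^-1 A
    f_w = np.linalg.solve(C, f)        # C^-1 f
    return np.linalg.pinv(A_w) @ f_w

rng = np.random.default_rng(1)
m, n = 12, 8
A = rng.normal(size=(m, n))                   # hypothetical design matrix
r_true = rng.normal(size=n)
sigmas = rng.uniform(0.01, 0.2, size=m)       # unequal noise levels (assumed)
G = np.diag(sigmas ** 2)
f = A @ r_true + sigmas * rng.normal(size=m)  # zero-mean noise, <f> = p

r_ols, r_gls = ols(A, f), gls(A, f, G)

# For full-column-rank A, GLS equals (A^T G^-1 A)^-1 A^T G^-1 f.
Gi = np.linalg.inv(G)
r_gls_closed = np.linalg.solve(A.T @ Gi @ A, A.T @ Gi @ f)
print(np.allclose(r_gls, r_gls_closed))       # True
```

Whitening with $C^{-1}$ and then applying the pseudoinverse is algebraically the same as the familiar weighted normal equations, but it avoids forming $G^{-1}$ explicitly.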
It is important to underline that linear inversion methods do not guarantee the positivity of the reported density matrix $\hat{\rho}$. As positivity constraints serve as a kind of regularization, this leads to a slightly worse performance of OLS and GLS on rank-deficient states compared to more elaborate, statistically motivated inversions, such as maximum likelihood [44] or Bayesian methods [45]. However, linear methods are sufficient for our purposes. First, our preliminary analysis indicates that the effect discussed here persists for any positivity-constrained estimation. Second, the differences between constrained and unconstrained estimators quickly disappear as the amount of data measured on realistic (i.e., full-rank) quantum states grows.
Assume next that QST is carried out with an unknown measurement apparatus. This may happen either because the details of the measurement are not known or because we choose to discard such information. Two conceptually different ways of dealing with this task are at hand. In both of them, a set of $M$ known probes $\mathbf{r}^{(\alpha)}$ ($\alpha = 1, \ldots, M$) is measured and the corresponding data $\mathbf{f}^{(\alpha)}$, called patterns, are collected. Arranging these probes and patterns columnwise, we get the $n \times M$ probe matrix $R$ and the $m \times M$ pattern matrix $F$.
In what follows, we are interested in informationally complete schemes, so that any quantum state $\rho$ can be unambiguously assigned to the corresponding theoretical probabilities $\mathbf{p}$. This requires the matrix $A$ to have at least $n = d^2 - 1$ linearly independent rows and, for the probes to determine the measurement, the probe matrix $R$ to have rank $n$, which means that $m, M \geq n$. In particular, the minimal or nearly minimal case $m \approx n$ is of special interest, because a small number of measurement outputs improves the feasibility of the scheme. Highly overcomplete schemes may perform better, but their practical implementation can become a formidable task.
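Informational completeness can be checked numerically through the rank of $A$. The sketch below is a hypothetical qubit example ($d = 2$, $n = 3$, basis $\Gamma_k = \sigma_k/\sqrt{2}$) that compares a projective measurement in the computational basis with the six-outcome Pauli measurement, whose elements are the Pauli eigenprojectors weighted by 1/3; only the latter reaches rank $n$.

```python
import numpy as np

# Orthonormal traceless basis for a qubit: Pauli matrices / sqrt(2).
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
basis = [s / np.sqrt(2) for s in (sx, sy, sz)]
I2 = np.eye(2)

def design_matrix(povm):
    """A_jk = Tr(Pi_j Gamma_k)."""
    return np.array([[np.trace(P @ G).real for G in basis] for P in povm])

def is_informationally_complete(povm, n=3):
    """True iff A has n linearly independent rows (rank n)."""
    return np.linalg.matrix_rank(design_matrix(povm)) == n

# Projective measurement in the computational basis: only sensitive to sigma_z.
z_basis = [(I2 + sz) / 2, (I2 - sz) / 2]
# Six-outcome Pauli POVM: eigenprojectors of sx, sy, sz, each weighted by 1/3.
pauli6 = [(I2 + sign * s) / 6 for s in (sx, sy, sz) for sign in (1, -1)]

print(np.linalg.matrix_rank(design_matrix(z_basis)))  # 1
print(np.linalg.matrix_rank(design_matrix(pauli6)))   # 3
```

With $m = 6$ outcomes for $n = 3$ parameters, the Pauli POVM is informationally complete but not minimal; a minimal scheme would have $m$ close to $n$.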
The two protocols mentioned above are as follows.

(a) Detector-tomography-assisted quantum state tomography (DQST). In DQST, we first implement QDT to estimate the measurement matrix from the measured probes; that is, we have to solve $F = AR$. Using OLS, the detector is thus specified by

$$ \hat{A}_s = F R^{+}, \tag{2.3} $$

where we have attached the subscript $s$ to stress that we are working with this standard tomographic procedure. The next step is QST, which is tantamount to solving (2.1) for the signal $\mathbf{r}$. If we use OLS (the generalization to GLS is straightforward), we get

$$ \hat{\mathbf{r}}_s = \hat{A}_s^{+}\mathbf{f} = (F R^{+})^{+}\mathbf{f}. \tag{2.4} $$

(b) Data pattern tomography (DPT). With the same set of probes and patterns, in DPT we bypass QDT and construct a best fit of the data $\mathbf{f}$ in terms of the patterns $F$ and fitting coefficients $\mathbf{x}$, i.e., $F\mathbf{x} \simeq \mathbf{f}$, which results in

$$ \hat{\mathbf{x}} = F^{+}\mathbf{f}. \tag{2.5} $$

Notice that when there are fewer measurement channels than probes, $m < M$, the linear system $F\mathbf{x} \simeq \mathbf{f}$ is underdetermined, and hence the OLS and GLS estimators of $\mathbf{x}$ coincide. The final state estimate is formed by combining the probes:

$$ \hat{\mathbf{r}}_p = R\,\hat{\mathbf{x}} = R F^{+}\mathbf{f}. \tag{2.6} $$

Here, we use the subscript $p$ to denote the DPT results. From (2.6) it is clear that the effective measurement matrix in this method reads

$$ \hat{A}_p = (R F^{+})^{+}. \tag{2.7} $$

We stress once more that we are presenting a linear version of DPT. In real applications, the data fitting is subject to the constraint $\hat{\rho} \geq 0$ to produce a physically meaningful estimate.

A glance at equations (2.4) and (2.6) immediately shows that both methods become equivalent whenever

$$ (F R^{+})^{+} = R F^{+}. \tag{2.8} $$

This always holds for regular matrix inverses, but not necessarily for pseudoinverses. As demonstrated in [33], in the limit of many probes the QDT estimate (2.3) converges to the true measurement matrix,

$$ \lim_{M \to \infty} \hat{A}_s = A, \qquad \text{hence} \qquad \lim_{M \to \infty} \hat{A}_s^{+} = A^{+}. \tag{2.9} $$

So, by increasing the number of probes, QDT converges to the true design matrix $A$ and DQST becomes equivalent to QST with perfect knowledge about the measurement apparatus; this does not come as a surprise.
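The linear DQST estimate (2.4) and the linear DPT estimate (2.6) differ only in where the pseudoinverse is taken. The sketch below uses random real matrices as stand-ins for $A$, $R$, and the unknown signal, with additive Gaussian noise on patterns and data (all assumptions for illustration; `simulate` is a hypothetical helper). It checks two statements from the text. With noiseless patterns, $R$ of full row rank, and $A$ of full column rank, both methods recover the signal exactly. When $m = n = M$, so that the pseudoinverses become regular inverses, (2.8) holds and the two estimates coincide even with noise.

```python
import numpy as np

def dqst(F, R, f):
    """Linear DQST: QDT gives A_s = F R^+, then QST gives r_s = A_s^+ f."""
    A_s = F @ np.linalg.pinv(R)
    return np.linalg.pinv(A_s) @ f

def dpt(F, R, f):
    """Linear DPT: fit f ~ F x with x = F^+ f, then r_p = R x = R F^+ f."""
    x = np.linalg.pinv(F) @ f
    return R @ x

def simulate(m, n, M, noise, rng):
    """Hypothetical setup: design A (m x n), probes R (n x M),
    patterns F = A R + noise, data f = A r + noise for a random signal r."""
    A = rng.normal(size=(m, n))
    R = rng.normal(size=(n, M))
    r = rng.normal(size=n)
    F = A @ R + noise * rng.normal(size=(m, M))
    f = A @ r + noise * rng.normal(size=m)
    return F, R, f, r

rng = np.random.default_rng(2)

# Noiseless case: both strategies return the true signal.
F, R, f, r = simulate(m=10, n=8, M=30, noise=0.0, rng=rng)
print(np.allclose(dqst(F, R, f), r), np.allclose(dpt(F, R, f), r))  # True True

# Noisy, overcomplete case: (F R^+)^+ differs from R F^+, so the estimates differ.
F, R, f, r = simulate(m=10, n=8, M=30, noise=0.1, rng=rng)
print(np.linalg.norm(dqst(F, R, f) - dpt(F, R, f)))

# Square, invertible case m = n = M: (2.8) holds with regular inverses.
F, R, f, r = simulate(m=8, n=8, M=8, noise=0.1, rng=rng)
print(np.allclose(dqst(F, R, f), dpt(F, R, f)))  # True
```

The noiseless case works because $(AR)^{+} = R^{+}A^{+}$ when $A$ has full column rank and $R$ has full row rank, so both estimators reduce to $A^{+}A\,\mathbf{r} = \mathbf{r}$.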
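Taking the same limit of the effective inverse detector matrix $\hat{A}_p^{+} = R F^{+}$ of DPT gives

$$ \lim_{M \to \infty} \hat{A}_p^{+} = \lim_{M \to \infty} R F^{+} \neq A^{+}, \tag{2.10} $$

where the inequality is due to the violation of (2.8). The DPT inversion $\hat{A}_p^{+}$ in general differs from (is biased with respect to) the true inverse $A^{+}$.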
What is crucial for us is that introducing such a bias can be beneficial and improve the performance of DPT with respect to DQST. Notice that the effective inverse $\hat{A}_p^{+}$ is the OLS solution to the problem $XF \simeq R$:

$$ \hat{A}_p^{+} = R F^{+} = \arg\min_{X} \lVert X F - R \rVert^{2}. $$

By the properties of OLS [39], the DPT inverse thus provides the best (least-squares) reconstruction of the probes from the measured patterns. Hence, in the limit of large $M$, where the distance of any unknown state from the set of probes can be made arbitrarily small, DPT becomes optimal. As shown below with numerical simulations, moderate values of $M$ are sufficient to see a significant advantage of DPT over DQST for nearly minimal informationally complete settings.
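The statement that $RF^{+}$ solves the least-squares problem $XF \simeq R$ can be checked directly. At the minimizer, the residual $XF - R$ is orthogonal to the rows of $F$, and no other $X$ yields a smaller residual; this includes the DQST inverse $(FR^{+})^{+}$. The random matrices below are illustrative stand-ins for a noisy calibration (an assumption), with Gaussian pattern noise.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, M, noise = 10, 8, 40, 0.2
A = rng.normal(size=(m, n))                       # hypothetical design matrix
R = rng.normal(size=(n, M))                       # probe matrix
F = A @ R + noise * rng.normal(size=(m, M))       # noisy patterns

X_dpt = R @ np.linalg.pinv(F)                     # effective DPT inverse R F^+
X_dqst = np.linalg.pinv(F @ np.linalg.pinv(R))    # DQST inverse (F R^+)^+

def residual(X):
    """Frobenius norm of the probe-reconstruction error ||X F - R||."""
    return np.linalg.norm(X @ F - R)

# Normal equations of min_X ||X F - R||^2: (X F - R) F^T = 0 at the optimum
# (up to floating-point round-off, hence the loose tolerance).
grad = (X_dpt @ F - R) @ F.T
print(np.allclose(grad, 0, atol=1e-6))            # True
print(residual(X_dpt) <= residual(X_dqst))        # True
print(np.allclose(X_dpt, X_dqst))                 # False: (2.8) is violated
```

The DQST inverse reconstructs the probes less well from the noisy patterns; this is the sense in which DPT is "trained" on the noise.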

Resolution limits
The time-honored Cramér-Rao lower bound (CRLB) [46,47] can be used to bound the mean square error $\mathcal{E}^2$ of any unbiased estimator $\hat{\rho}$ of the true density matrix $\rho$:

$$ \mathcal{E}^2 = \big\langle \operatorname{Tr}[(\hat{\rho} - \rho)^2] \big\rangle \geq \operatorname{Tr}(\mathcal{F}^{-1}), $$

where $\mathcal{F}$ is the (classical) Fisher information matrix [48] associated with a given measurement and true state,

$$ \mathcal{F}_{k\ell} = \left\langle \frac{\partial \ln L}{\partial r_k}\, \frac{\partial \ln L}{\partial r_\ell} \right\rangle. $$

Here, $L$ is the likelihood, i.e., the probability of registering data $\mathbf{f}$ given the true state $\mathbf{r}$. For example, for Poissonian statistics with $N$ events, each one with probability $p_j$, we have

$$ L \propto \prod_j \frac{(N p_j)^{N f_j}}{(N f_j)!}\, e^{-N p_j}, $$

and hence

$$ \mathcal{F}_{k\ell} = N \sum_j \frac{A_{jk} A_{j\ell}}{p_j}. $$

In the limit of many probes, $M \to \infty$, the QDT step of DQST converges towards the true measurement matrix, so DQST is expected to attain the bound of QST with perfect knowledge of the measurement apparatus. In this limit, however, the DPT estimator remains biased, and the CRLB must be suitably modified [49]. There are many instances where a biased estimator displays better performance than predicted by the unbiased CRLB [50,51,52]. A particularly relevant one is the Rayleigh curse in estimating the separation between two incoherent pointlike sources [53,54].
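For Poissonian counts, the Fisher matrix reduces to $N A^{T}\,\mathrm{diag}(1/\mathbf{p})\,A$, and the bound on $\sum_k \langle(\hat r_k - r_k)^2\rangle = \langle\operatorname{Tr}[(\hat\rho - \rho)^2]\rangle$ is the trace of its inverse. The minimal sketch below evaluates this for an illustrative qubit example that we assume for this purpose (six-outcome Pauli POVM, arbitrary Bloch vector). The result can be compared with the closed form $3\sum_k(1 - b_k^2)/(2N)$, which follows for this particular POVM because each Pauli axis receives $N/3$ counts.

```python
import numpy as np

def fisher_poisson(A, p, N):
    """Fisher matrix for independent Poisson counts with means N p_j, where p
    depends on r only through A: F_kl = N sum_j A_jk A_jl / p_j."""
    return N * A.T @ (A / p[:, None])

def crlb(A, p, N):
    """Cramer-Rao bound on E^2 = sum_k <(r_hat_k - r_k)^2> = <Tr[(rho_hat - rho)^2]>."""
    return np.trace(np.linalg.inv(fisher_poisson(A, p, N)))

# Illustrative qubit example: six-outcome Pauli POVM, basis Gamma_k = sigma_k / sqrt(2).
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = (sx, sy, sz)
basis = [s / np.sqrt(2) for s in paulis]
povm = [(np.eye(2) + sign * s) / 6 for s in paulis for sign in (1, -1)]
A = np.array([[np.trace(P @ G).real for G in basis] for P in povm])

bloch = np.array([0.3, -0.2, 0.5])                       # assumed true state
rho = (np.eye(2) + sum(b * s for b, s in zip(bloch, paulis))) / 2
p = np.array([np.trace(rho @ P).real for P in povm])     # Born rule

N = 10_000
print(crlb(A, p, N))
print(3 * np.sum(1 - bloch**2) / (2 * N))                # closed form, same value
```

The $1/N$ scaling of the bound is explicit in the prefactor of the Fisher matrix.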

Examples and discussion
The excellent performance of DPT for nearly minimal informationally complete measurements can be illustrated with numerical experiments. We use random square-root measurements defined as

$$ \Pi_j = S^{-1/2}\, |\phi_j\rangle\langle\phi_j|\, S^{-1/2}, \qquad S = \sum_{j=1}^{m} |\phi_j\rangle\langle\phi_j|, $$

where $|\phi_j\rangle$ are randomly generated Haar-distributed pure states and $j = 1, \ldots, m$. This has been proposed as a pretty good measurement for distinguishing quantum states [55] and is known to be optimal [56]. In our case, we take $m = 40$ outputs applied to a six-dimensional quantum system prepared in a random pure state with a 10% admixture of the maximally mixed state to make it full rank. Notice that the measurement is nearly minimal, $m = 40 \approx 6^2$. We assume Poissonian detection statistics with $N$ total detection events and calculate the mean square error of the DPT and DQST estimates from 1000 repeated data acquisitions. We also calculate the CRLB for QST with perfect knowledge of the measurement apparatus. To reach the accuracy of DPT, the standard DQST technique requires about twice as many registered detections.
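The measurement and state used in these simulations can be generated along the lines below. This is a sketch, not the authors' code, and it rests on several assumptions. Haar-random pure states are drawn as normalized complex Gaussian vectors. The 10% admixture is read as $\rho = 0.9\,|\psi\rangle\langle\psi| + 0.1\,\mathbb{1}/d$. The value of $N$ and the random seed are arbitrary choices.

```python
import numpy as np

def haar_state(d, rng):
    """Haar-distributed pure state: a normalized complex Gaussian vector."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def square_root_measurement(states):
    """Pi_j = S^{-1/2} |phi_j><phi_j| S^{-1/2} with S = sum_j |phi_j><phi_j|."""
    projs = [np.outer(v, v.conj()) for v in states]
    w, U = np.linalg.eigh(sum(projs))                 # S is Hermitian, positive definite
    S_inv_sqrt = U @ np.diag(w ** -0.5) @ U.conj().T
    return [S_inv_sqrt @ P @ S_inv_sqrt for P in projs]

rng = np.random.default_rng(4)
d, m, N = 6, 40, 10**5                                # N: assumed number of events
povm = square_root_measurement([haar_state(d, rng) for _ in range(m)])

psi = haar_state(d, rng)
rho = 0.9 * np.outer(psi, psi.conj()) + 0.1 * np.eye(d) / d   # full-rank state
p = np.array([np.trace(rho @ P).real for P in povm])          # Born rule

# One simulated data acquisition with Poissonian counts of mean N p_j.
counts = rng.poisson(N * p)
f = counts / N                                                 # relative frequencies
```

Repeating the last two lines with fresh random numbers gives the independent data acquisitions over which mean square errors are averaged.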
Notice also that, for moderate and large sets of probes, the DPT errors are squeezed below the CRLB. This means that, instead of using perfect information about the measurement apparatus for QST, the better strategy is to discard that information, probe the measurement device anew with a sufficiently large probe set, and use the DPT approach. In fact, since the CRLB applies to any unbiased estimator, any unbiased QST based on the true design matrix is outperformed by DPT in that regime.
The observed CRLB violation can be explained by the bias inherent to DPT data processing. This is illustrated in figure 2, where the bias of the effective DPT measurement estimator $\hat{A}_p = (R F^{+})^{+}$ is shown for different values of $M$ and $N$. The bias tends to zero as the sample size $N$ increases. However, for any fixed noise strength, the bias is nonzero and remains so even for large sets of probes. This bias, in turn, makes the DPT state estimator biased, so the standard CRLB does not apply. The apparently wrong measurement estimate is of no concern in DPT, as we are interested not in the description of the measurement apparatus but only in the final state estimate.
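The bias of $\hat{A}_p = (RF^{+})^{+}$ can be estimated by averaging over repeated noisy calibrations. The sketch below is a simplified stand-in for figure 2. It assumes additive Gaussian pattern noise of strength `noise` instead of Poissonian counts, and random real matrices for $A$ and $R$; `mean_bias` is a hypothetical helper. Under these assumptions, the QDT estimate $\hat{A}_s = FR^{+}$ is unbiased, and its averaged error reflects only Monte Carlo fluctuations. The DPT estimate $\hat{A}_p$ should instead keep a systematic deviation that, in line with the text, does not fade as $M$ grows at fixed noise.

```python
import numpy as np

def mean_bias(A, M, noise, trials, rng):
    """Frobenius norms of the average errors of A_s = F R^+ and
    A_p = (R F^+)^+ over repeated calibrations with fresh pattern noise."""
    m, n = A.shape
    R = rng.normal(size=(n, M))                      # fixed probe set
    R_pinv = np.linalg.pinv(R)
    err_s = np.zeros_like(A)
    err_p = np.zeros_like(A)
    for _ in range(trials):
        F = A @ R + noise * rng.normal(size=(m, M))  # noisy patterns
        err_s += F @ R_pinv - A
        err_p += np.linalg.pinv(R @ np.linalg.pinv(F)) - A
    return np.linalg.norm(err_s / trials), np.linalg.norm(err_p / trials)

rng = np.random.default_rng(5)
A = rng.normal(size=(6, 4))                          # hypothetical design matrix
for M in (50, 200, 800):
    b_s, b_p = mean_bias(A, M, noise=0.5, trials=300, rng=rng)
    print(f"M = {M:4d}: |bias A_s| = {b_s:.4f}, |bias A_p| = {b_p:.4f}")
```

The persistence of the DPT bias has a simple origin. Both $RR^{T}$ and the noise contribution to $FF^{T}$ grow linearly with $M$, so their ratio, which controls the deviation, stays fixed.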
Having seen the performance gap between DPT and DQST grow with the noise strength, we note that a similar effect can be observed for badly conditioned measurements, as can be appreciated in figure 3. Here, the noise strength is fixed and we take a large set of probes to simulate the asymptotic limit. Increasing the condition number of the true design matrix makes the linear system (2.1) ill-posed; this is reflected in the CRLB and, as a consequence, in the rapid growth of the estimation errors of the standard DQST technique. The alternative DPT approach is much less sensitive to the measurement quality and, owing to its biased nature, can produce errors not only far below those of DQST but also far below the CRLB derived for perfect knowledge about the measurement apparatus.
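The sensitivity of linear inversion to ill-conditioning can be made concrete for OLS with white noise of variance $\sigma^2$. In that case the estimator covariance is $\sigma^2 (A^{T}A)^{-1}$, so the mean square error equals $\sigma^2 \sum_i 1/s_i^2$ over the singular values $s_i$ of $A$. The sketch below builds hypothetical $40 \times 35$ design matrices with a prescribed condition number $\kappa = s_{\max}/s_{\min}$. The shape matches $m = 40$ and $n = 35$ for $d = 6$, but the construction is an illustration, not the measurements behind figure 3. For large $\kappa$ the error is dominated by $1/s_{\min}^2$ and grows roughly as $\kappa^2$.

```python
import numpy as np

def design_with_condition(m, n, kappa, rng):
    """Hypothetical m x n matrix with singular values spaced geometrically
    between 1 and 1/kappa, so that cond(A) = kappa."""
    U, _ = np.linalg.qr(rng.normal(size=(m, n)))   # m x n, orthonormal columns
    V, _ = np.linalg.qr(rng.normal(size=(n, n)))   # n x n orthogonal
    s = np.geomspace(1.0, 1.0 / kappa, n)
    return U @ np.diag(s) @ V.T

def ols_mse(A, sigma):
    """Mean square error of r_hat = A^+ f under white noise of variance sigma^2:
    Tr[sigma^2 (A^T A)^{-1}] = sigma^2 * sum_i 1 / s_i^2."""
    return sigma**2 * np.trace(np.linalg.inv(A.T @ A))

rng = np.random.default_rng(8)
for kappa in (1e1, 1e2, 1e3):
    A = design_with_condition(40, 35, kappa, rng)
    print(f"cond = {np.linalg.cond(A):.1e}, OLS MSE = {ols_mse(A, 1e-3):.2e}")
```

This covers only the unbiased, perfectly known-$A$ side of the comparison. Reproducing the DPT curve would additionally require simulated probes and patterns.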
In view of figures 1 and 3, we arrive at a counterintuitive conclusion: to get a better QST in some scenarios, one is advised to discard the true description of the measurement apparatus and replace it with a crude and biased approximation. The resolution of this apparent contradiction is as follows. In the DPT approach, the noisy measured data are matched to noisy data patterns. In this way, the estimation technique is trained by noisy patterns to handle the noisy data better than if we used the true (correct) description of the measurement apparatus, the latter being equivalent to matching the noisy data to noiseless patterns. Interestingly, training the estimation procedure to cope with the noise in an optimal way seems equivalent to using a wrong (biased) description of the measurement device for QST. Errors introduced at an intermediate stage improve the final result!

Concluding remarks
In summary, we have demonstrated the remarkable performance of DPT. In some regimes of practical interest, probing an unknown measurement device with a sufficiently large set of probes turns out to be a better strategy than relying on perfect knowledge about the measurement apparatus. In other words, discarding the true information at hand and replacing it with a crude approximation leads to better final results. This is possible because of the way DPT self-adapts to the omnipresent noise. We confirmed our findings with numerical simulations of random square-root measurements. Similar behavior is expected for other kinds of measurements and noise.