High posterior density ellipsoids of quantum states

Regions of quantum states generalize the classical notion of error bars. High posterior density (HPD) credible regions are the most powerful region estimators. However, they are in general intractably hard to construct. This paper reports on a numerical approximation to HPD regions, used to test a much more computationally and conceptually convenient class of regions: posterior covariance ellipsoids (PCEs). PCEs are defined via the covariance matrix of the posterior probability distribution over states. Here it is shown that PCEs are near optimal for the example of Pauli measurements on multiple qubits. Moreover, the algorithm produces accurate PCE regions even when there is uncertainty in the model.


I. INTRODUCTION
Characterizing a physical system at the most fundamental level requires specifying its quantum mechanical state. Arriving at such a description from measured data is called quantum tomography and the output of such a process is often a single point in the space of allowed parameters. In theory, by considering an infinite amount of data, a unique state can be identified. In practice, however, only a finite amount of data can be obtained. In such cases, it is impossible for a single reported state to coincide with the true state. In classical data fitting, error bars give a measure of the accuracy of the estimate. In the quantum state tomography setting, regions generalize this concept [1]. A region of quantum states should colloquially be understood to contain the true state with high probability (with the exact interpretation depending subtly on how the region is constructed).
Although this idea is quite simple, formalizing the concept of region estimation is not straightforward and there exist many competing alternatives, each with its own set of advantages and drawbacks. For example, bootstrap resampling is a common technique to produce error bars in tomographic experiments. However, as in [2] for example, bootstrapping has so far been used exclusively to calculate statistics on quantities derived from state estimates, such as fidelity to some target state. Bootstrapping is conceptually simple and easy to implement. However, the error bars bootstrapping produces come with no guarantees, and it can grossly underestimate errors for estimators which produce states near the boundary [3]. The Fisher information (matrix) is most often used, via the Cramér-Rao inequality, to lower bound the variance of unbiased estimators. In this sense, it gives the errors one would expect in the asymptotic limit, provided an efficient estimator is used. In terms of regions, the Fisher information matrix is also asymptotically the inverse of the covariance matrix of the posterior distribution of parameters, which in turn defines an error ellipse [4]. For most problems, however, even computing the Fisher information numerically is intractable. In some estimation strategies, such as compressed sensing [5] (also [6,7]), the estimate of the state comes with a certificate. That is, an estimated state is provided along with an upper bound on the distance to the true state. This implicitly defines a ball in state space centered on the estimated state. However, the statistical meaning of this ball is not clear, nor does a ball provide information about correlated errors. Confidence regions [3,8] (and comparable constructions [9]) are the most stringently defined regions from a statistical perspective. These regions would be ideal if they admitted a computationally tractable method of construction.
There is one overarching theme to notice here: trade-offs. On one end of the spectrum is conceptual simplicity, ease of implementation and computational tractability; on the other is statistical rigor, precision, accuracy and optimality. Here we take the approach of constructing statistically rigorous region estimators via a numerical algorithm which possesses tunable parameters to control the trade-off between optimality and computational efficiency. The regions we construct here are approximations to high posterior density credible regions and are in some sense the Bayesian analogs of confidence regions. To aid in the descriptive simplicity of the regions, we use ellipsoids, which are well understood and easy to conceptualize geometrical objects. Moreover, ellipsoids provide a useful summary of how the state parameters are correlated.
The numerical algorithm we use is a Monte Carlo approximation to Bayesian inference and has been used in the tomographic estimation of one and two qubit states [10] as well as in the continuous measurement of a qubit [11]. It has also been recently used to construct point and region estimates of Hamiltonian parameters in [12,13]. In particular, the region used was the ellipse defined by the posterior covariance matrix. Here we show that the same method can be applied to quantum states and, more importantly, that such regions approximate high posterior density credible regions.
One of the major advantages of this approach is that it naturally accommodates the possibility of unknown errors in modeling. For example, we might assume that the source of quantum states is fixed when it is not; or, we might assume that the measurements are known exactly when they are not. Previous analyses of errors in quantum state estimation have focused on assessing their effect [14]; detecting their presence [15,16]; and, most recently, selecting the best model for them [17,18]. Here we demonstrate that our approach can estimate and construct regions for both the quantum mechanical state and the error model parameters simultaneously. That is, our algorithm produces a region in the space defined by all parameters.
This work is outlined as follows. In section II we present the precise problem and its theoretical solution. In section III we give the numerical algorithm which constructs the regions. In section IV we describe the examples used to test the method, and in section V we present the results of the numerical experiments. Finally, we conclude in section VI with a discussion.

II. BAYESIAN LEARNING AND CREDIBLE REGIONS
Each quantum mechanical problem specification produces a probability distribution

Pr(d_k | x; c_k),    (1)

where d_k is the data obtained and c_k are the experimental designs (or controls) chosen for measurement k, and where x is a vector parameterizing the system of interest. However, we are ultimately interested in Pr(x|D; C), the probability distribution of the model parameters x given the experimental data. We obtain this using Bayes' rule,

Pr(x|D; C) = Pr(D|x; C) Pr(x) / Pr(D|C),    (2)

where Pr(x) is the prior, which encodes any a priori knowledge of the model parameters. The final term Pr(D|C) can simply be thought of as a normalization factor. Since each measurement is statistically independent given x, the likelihood factorizes,

Pr(D|x; C) = ∏_k Pr(d_k|x; c_k),    (3)

so the processing of the data can be done on- or off-line. That is, we can sequentially update the probability distribution as the data arrive or post-process it afterward. In many scenarios the mean of the posterior distribution is the optimal point estimator [19]. But we desire more than point estimates. In this scenario, the most powerful region estimators are high posterior density (HPD) regions. Let 1_X be the indicator function of the set X. That is,

1_X(x) = 1 if x ∈ X, and 0 otherwise.    (4)

Then, the probability that x lies in some set X is

Pr(x ∈ X) = ∫ 1_X(x) Pr(x|D; C) dx.    (5)

The set X is an α-credible region if

Pr(x ∈ X) ≥ 1 − α.    (6)

That is, a set is an α-credible region if no more than α probability mass lies outside the region or, equivalently, at least 1 − α probability mass is contained within the region. The set X is an HPD α-credible region if

X = {x : Pr(x|D; C) ≥ y},    (7)

where y is the largest number such that X is an α-credible region. Under natural notions of "size", HPD α-credible regions are the smallest (that is, most powerful) α-credible regions. Being optimal, HPD regions are the solutions to computationally intractable optimization problems, at least if we require a deterministic algorithm. In this work, we will explore a Monte Carlo algorithm to numerically approximate HPD regions. In particular, we use posterior covariance ellipsoids (PCEs).
These are defined via the mean and covariance matrix of the posterior probability distribution as follows:

X̂_PCE = {x : (x − μ)^T Σ^{-1} (x − μ) ≤ z_α},    (8)

where μ = E[x|D; C] is the posterior mean, Σ = Cov[x|D; C] is the posterior covariance matrix and z_α is the 1 − α quantile of the χ² distribution with d degrees of freedom [20], the values of which are readily available in countless tables. Note that this is simply the covariance ellipse under a Gaussian approximation to the posterior Pr(x|D; C). In addition to being HPD credible regions in the asymptotic limit, PCEs are computationally tractable and, even for modest numbers of experiments, they are remarkably close in size and coverage probability to true HPD regions. The remainder of this work is devoted to detailing the algorithm and demonstrating the above claims via simulated experiments on qubits.
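As a concrete illustration, the PCE above can be computed directly from a set of weighted posterior samples. The sketch below is ours, not from any particular library; the χ² quantile z_α is passed in as an argument (it could be obtained, for example, from scipy.stats.chi2.ppf(1 - alpha, d)).

```python
import numpy as np

def posterior_covariance_ellipsoid(particles, weights, z_alpha):
    """Posterior covariance ellipsoid (PCE) from weighted samples.

    particles: (n, d) array of posterior samples x_j
    weights:   (n,) normalized weights w_j
    z_alpha:   chi-squared quantile with d degrees of freedom

    Returns (mu, A) defining the ellipsoid {x : (x - mu)^T A^-1 (x - mu) <= 1}.
    """
    mu = weights @ particles                            # posterior mean
    centered = particles - mu
    sigma = (weights[:, None] * centered).T @ centered  # posterior covariance
    return mu, z_alpha * sigma                          # scale by the quantile
```

The returned pair (mu, A) is exactly the (center, shape matrix) description of an ellipsoid used throughout this paper.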

III. CONSTRUCTION OF THE REGIONS
Our numerical algorithm fits within the subclass of Monte Carlo methods called sequential Monte Carlo or SMC. We approximate the posterior distribution by a weighted sum of delta-functions,

Pr(x|D; C) ≈ Σ_{j=1}^n w_j δ(x − x_j),    (9)

where the weights at each step are iteratively calculated from the previous step via

w_j ↦ w_j Pr(d_k|x_j; c_k),    (10)

followed by a normalization step. The elements of the set {x_j}_{j=1}^n are called particles and are initially chosen by sampling the prior distribution Pr(x), with each weight set to w_j = 1/n. In the equations above, n is the number of particles and controls the accuracy of the approximation.
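A single update-and-normalize step of the particle weights is a one-line application of Bayes' rule. The following minimal sketch (the function name is ours) shows the operation:

```python
import numpy as np

def smc_update(weights, likelihoods):
    """One sequential Monte Carlo (Bayes) update of the particle weights.

    weights:     (n,) current weights w_j
    likelihoods: (n,) values Pr(d_k | x_j; c_k) evaluated at each particle
    """
    w = weights * likelihoods  # unnormalized Bayes rule
    return w / w.sum()         # normalization step
```

Sequential processing is then just repeated calls to this function, one per datum.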
Note that the approximation in equation (9) is not a particularly good one per se (we are approximating a continuous function by a discrete one, after all). However, it does allow us to calculate some quantities of interest with arbitrary accuracy. Like all Monte Carlo algorithms, it is designed to approximate expectation values:

E[f(x)|D; C] = ∫ f(x) Pr(x|D; C) dx ≈ Σ_{j=1}^n w_j f(x_j).    (11)

In other words, it allows us to efficiently evaluate difficult multidimensional integrals with respect to the measure defined by the posterior distribution.
The SMC approximation provides a nearly trivial method to approximate HPD credible regions, which surprisingly has been overlooked. Since the SMC approximated distribution (9) is discrete, the credible regions will be (at least initially) discrete sets. In particular, the HPD α-credible set, X̂_SMC, is defined by the following construction:
1. Sort the particles by weight, relabeling so that w_i ≥ w_l whenever i < l;
2. Collect particles (starting with the highest weighted) until the sum of the collected particle weights is at least 1 − α.
The resulting set is X̂_SMC.
The proof that this is an HPD α-credible set is as follows. Assume the particles are sorted as above. Begin with the highest weighted particle x_1 with weight w_1. Then, the set X_1 = {x_1} clearly has weight w_1, and the largest y satisfying equation (7), in the definition of HPD credible regions, is y = w_1. Now take the set X_{1,2} = {x_1, x_2} with weight w_1 + w_2. The largest y is now y = w_2. Iterate this process until we reach the first weight w_{n_α} such that the set X_{1,2,...,n_α} = {x_1, x_2, ..., x_{n_α}} satisfies Σ_{j=1}^{n_α} w_j ≥ 1 − α. This set has largest y = w_{n_α}. The set X_{1,2,...,n_α} is clearly an α-credible set, but it is also an HPD α-credible set, since any y > w_{n_α} results in a set excluding all particles x_j with j ≥ n_α, which necessarily has weight less than 1 − α.
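The construction above amounts to a descending sort followed by a cumulative sum. A minimal sketch (assuming normalized weights; the function name is ours):

```python
import numpy as np

def hpd_particles(particles, weights, alpha):
    """Discrete HPD alpha-credible set: the highest-weight particles,
    accumulated until their total weight reaches at least 1 - alpha."""
    order = np.argsort(weights)[::-1]              # sort by weight, descending
    cum = np.cumsum(weights[order])
    n_alpha = np.searchsorted(cum, 1 - alpha) + 1  # first index reaching 1 - alpha
    return particles[order[:n_alpha]]
```

The returned array is the set X̂_SMC whose enclosing region is constructed in the next steps.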
The immediate problem with X̂_SMC is that it is a discrete set of points: while it is an HPD α-credible set for the SMC approximated distribution, any discrete set has zero measure according to the true posterior Pr(x|D; C). The resolution is quite simple. Suppose we have some region X̂ which contains X̂_SMC. Then, according to the SMC approximation (11),

Pr(x ∈ X̂) ≈ Σ_{j=1}^n w_j 1_X̂(x_j) ≥ 1 − α,

since 1_X̂(x_j) = 1 for all x_j in X̂_SMC. Thus, any region enclosing the points in X̂_SMC will be an α-credible region.
But we do not want just any α-credible region. The HPD requirement is conceptually similar to asking for the region to be as small as possible while maintaining 1 − α weight. If we assume (relaxed later on) that the credible regions are convex, then we seek the smallest convex set containing X̂_SMC. This is the convex hull of X̂_SMC:

X̂_hull = conv(X̂_SMC).    (12)

Since X̂_hull is a convex polytope in R^{d²−1}, it is most compactly represented by a list of its vertices, which in the absolute worst case is the entire set of SMC particles. That is, we require n(d² − 1) numbers to specify X̂_hull. Although certain classes of convex polytopes contain many symmetries and are easy to conceptualize geometrically, specifying the vertices of the convex hull of a random set of points is not the most convenient representation for credible regions.
The most efficient way to describe this hull would be the smallest ball containing it, since a ball is specified by a location and a single radial parameter. However, a ball does not account for large covariances in the posterior distribution. To account for these, we use ellipsoidal regions

E(c, A) = {x : (x − c)^T A^{-1} (x − c) ≤ 1},    (13)

where c is the center of the ellipsoid and A its covariance matrix. Crucially, we want the smallest ellipsoid containing the hull, the so-called minimum volume enclosing ellipsoid (MVEE):

X̂_MVEE = argmin {vol(E(c, A)) : E(c, A) ⊇ X̂_hull}.    (14)

To numerically construct the MVEE, we use the algorithm of Khachiyan [22]. To summarize, X̂_MVEE is the numerical approximation to the HPD credible region. The posterior covariance regions, X̂_PCE, defined earlier in equation (8), are far less computationally intensive to construct than X̂_MVEE and are expected to be HPD credible regions in the asymptotic limit. In order to show that they are also approximately HPD credible regions for finite data, we compare X̂_MVEE and X̂_PCE in a number of examples, and we also look at performance in cases where limited computational resources prohibit constructing X̂_MVEE over many simulations.
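Khachiyan's algorithm is short enough to sketch. The version below is a standard textbook rendering of the first-order update on lifted points, not the authors' code; it returns the center c and shape matrix A of the ellipsoid {x : (x − c)^T A^{-1} (x − c) ≤ 1}.

```python
import numpy as np

def mvee(points, tol=1e-4):
    """Minimum volume enclosing ellipsoid via Khachiyan's algorithm.

    points: (n, d) array, e.g. the vertices of the convex hull.
    Returns (c, A) with ellipsoid {x : (x - c)^T A^-1 (x - c) <= 1}.
    """
    n, d = points.shape
    Q = np.vstack([points.T, np.ones(n)])  # lift points to dimension d + 1
    u = np.ones(n) / n                     # weights over the points
    err = tol + 1.0
    while err > tol:
        X = Q @ (u[:, None] * Q.T)
        # M_i = q_i^T X^-1 q_i, i.e. the diagonal of Q^T X^-1 Q
        M = np.einsum("ji,jk,ki->i", Q, np.linalg.inv(X), Q)
        j = int(np.argmax(M))
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        u_new = (1.0 - step) * u
        u_new[j] += step
        err = np.linalg.norm(u_new - u)
        u = u_new
    c = points.T @ u                                      # center
    S = points.T @ (u[:, None] * points) - np.outer(c, c)
    return c, d * S                                       # shape matrix A = d * S
```

For well-conditioned point sets this converges quickly; the tolerance tol plays the role of the tunable accuracy parameter discussed in the introduction.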
Finally, we note that to compare the sizes of various ellipsoids, we will use the volume

vol(E(c, A)) = π^{d/2} / Γ(d/2 + 1) · √(det A),    (15)

where d is the dimension of the parameter space and Γ is the well-known Gamma function (Γ(n) = (n − 1)! for positive integers n).
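The volume formula is a one-liner and a useful sanity check on region sizes; a small sketch (interface ours):

```python
import math
import numpy as np

def ellipsoid_volume(A):
    """Volume of {x : (x - c)^T A^-1 (x - c) <= 1} for a d x d shape matrix A:
    vol = pi^(d/2) / Gamma(d/2 + 1) * sqrt(det A)."""
    d = A.shape[0]
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * math.sqrt(np.linalg.det(A))
```

For example, A = I in d = 2 is the unit disk, with volume π.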

IV. EXAMPLES
Consider repeated preparations of a qubit subjected to random Pauli measurements. We label the Pauli operators {σ_0, σ_1, σ_2, σ_3} such that an arbitrary state can be written

ρ = (1/2) Σ_{k=0}^3 x_k σ_k,    (16)

where x_0 = 1 and

x_k = Tr(ρ σ_k).    (17)
For many qubits, the situation is similar. The reconstruction is given by

ρ = (1/2^n) Σ_k x_k σ_k,    (18)

where σ_k = σ_{k_1} ⊗ σ_{k_2} ⊗ ··· ⊗ σ_{k_n} and we index by k = k_1 + 4 k_2 + 4² k_3 + ··· + 4^{n−1} k_n. Then the parameterization is equivalent to that in equation (17). Since each Pauli squares to the identity, σ_k² = 1, each individual measurement has two possible outcomes, which we label d ∈ {+1, −1} for the +1 and −1 eigenvalues. The likelihood function of a single measurement is then

Pr(d|x; σ_k) = (1 + d x_k)/2.    (19)

We also consider the effect of errors. We will not assume a particular model for the errors, since any error model, for our two-outcome measurements, manifests as a bit flip or, equivalently, a randomization channel. For simplicity we assume the process is symmetric, so we have a single parameter η, called the visibility, which has the following effect on the likelihood function:

Pr(d|x; σ_k, η) = η (1 + d x_k)/2 + (1 − η)/2.    (20)

We consider two cases: the visibility is known and fixed, or the visibility is unknown but still fixed run-to-run. In the former case, the task is to compute the PCE X̂_PCE for the state only, while in the latter case the task is to compute the PCE over the joint distribution of (x, η). If only a region of states is desired, we can orthogonally project the PCE onto the subspace of the parameter space defining the state (and similarly for η). Hence, we will have separate marginal PCEs for the state and the visibility. Examples of how the regions are constructed are presented in figures 8 and 9.
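The likelihood with visibility is simple to encode. A minimal sketch (function name ours), with d ∈ {+1, −1} and x_k = Tr(ρσ_k):

```python
def likelihood(d, x_k, eta=1.0):
    """Pr(d | x; sigma_k, eta) for a two-outcome Pauli measurement:
    the ideal likelihood (1 + d*x_k)/2 mixed with a fair coin flip,
    weighted by the visibility eta."""
    return eta * (1 + d * x_k) / 2 + (1 - eta) / 2
```

At η = 1 this reduces to the ideal likelihood; at η = 0 every outcome is a fair coin flip regardless of the state.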

V. RESULTS
We first look at a comparison of X̂_MVEE and X̂_PCE. These results are presented in figures 1, 2 and 3. In figures 1 and 2, the sizes of the two classes of regions are compared. Initially, the volume of the PCE is no smaller than that of the entire parameter space, which is to be expected since it is motivated by asymptotic normality. However, it rapidly converges in volume to, and becomes slightly smaller than, X̂_MVEE. Both sets of regions decrease in size at the same rate as a function of the number of measurements. This suggests that the PCEs are approximate HPD credible regions. This is important because, as opposed to all other region estimators, PCE regions are computationally tractable. That the PCEs remain valid in higher dimensions is shown in figure 4, where the probability for the state to lie in the constructed PCE region is shown to be consistent with the target of 95% containment probability for two and three qubits subjected to random Pauli measurements.
In the above mentioned cases, the visibility η was assumed to be known. In figure 5, the case of unknown visibility is considered. When the visibility is known relatively accurately, the PCE captures the state and visibility accurately. However, as the initial variance in the prior on the visibility increases, the ability of the PCE to capture both the state and the visibility diminishes. Surprisingly, the PCE still finds the state even when it cannot resolve the visibility accurately.
The problem is easily identified: it is the assumption that the posterior has a single mode with a convex HPD credible region. To illustrate the problem graphically, we need to reduce the dimensionality. To this end, we assume the state is of the form ρ = p|0⟩⟨0| + (1 − p)|1⟩⟨1|, with p unknown and to be estimated along with the visibility η. A typical example of a posterior distribution and the possible regions is shown in figure 6. There are two things to note: (1) the posterior distribution has two modes; (2) even within each mode, the distribution is not well approximated by a Gaussian. Both of these are due to the degeneracy in the posterior distribution of (p, η) arising from the symmetry of p and η in the likelihood function. For example, an outcome could equally well be explained by a large p and small η as by a small p and large η.
The problem of many modes in the posterior can be resolved by reporting disjoint ellipsoidal regions, one for each mode. The discrete set of highest weighted SMC points X̂_SMC naturally finds itself within the modes. Given this set, the task is then to identify which particles belong to which mode. In machine learning parlance, this is the problem of clustering. Many solutions to this problem exist, each with its own set of advantages and drawbacks. Here we have used DBSCAN [23,24], as it seems to require the fewest assumptions.
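For completeness, the density-based clustering idea can be sketched in a few lines. The toy implementation below illustrates only DBSCAN's core loop (labels of -1 mark noise); in practice a library implementation, such as scikit-learn's DBSCAN, should be preferred:

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Toy density-based clustering in the spirit of DBSCAN.
    points: (n, d) array; eps: neighborhood radius; min_pts: density threshold.
    Returns an integer label per point; -1 marks noise."""
    n = len(points)
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    # pairwise distances (fine for the modest particle counts used here)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        seeds = list(np.flatnonzero(dist[i] <= eps))
        if len(seeds) < min_pts:
            continue                # not a core point; remains noise for now
        labels[i] = cluster
        while seeds:                # grow the cluster from its core points
            j = seeds.pop()
            if not visited[j]:
                visited[j] = True
                neighbors = np.flatnonzero(dist[j] <= eps)
                if len(neighbors) >= min_pts:
                    seeds.extend(neighbors)
            if labels[j] == -1:
                labels[j] = cluster
        cluster += 1
    return labels
```

Given the cluster labels, one MVEE or PCE is then constructed per cluster of particles.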
When the visibility is known fairly well, only the first problem, disjoint regions, is automatically resolved. In other words, it is not likely that the PCE will be the optimal region estimator unless the visibility is relatively well known. Practically, when the noise is known with some, but not perfect, accuracy, the MVEE region X̂_MVEE still behaves properly even when the PCE region X̂_PCE does not. This is demonstrated in figure 7, where we see that X̂_MVEE contains the true state with the correct probability but X̂_PCE does not. In practice, then, the recommendation is to identify whether the problem specifies convex credible regions. If so, then the PCE is the appropriate choice; if not, then a clustering algorithm should be used to identify the modes of the distribution first.

VI. CONCLUSION
For the three qubit data shown in figure 4, the constructed PCEs were ellipsoids in a 4³ − 1 = 63 dimensional parameter space. That these simulations were performed on an average laptop computer lends credence to the claim that the regions discussed here, constructed via the SMC algorithm, are a computationally attractive solution to the problem of providing region estimators of quantum states. For the implementation given in [21], further optimization of the code itself would be required to move beyond 100-or-so parameters on conventional computing resources. The alternative is the "throw money at it" solution: run the code on high performance computing machines.
The key obstacle to a fully scalable solution is the curse of dimensionality, which is ever-present in quantum theory. The exponential growth of the parameter space presents a challenge both in representing the quantum state and in simulating the dynamics of the quantum system. Both of these obstacles can be overcome within the computational framework presented here. To address them, we can follow the many ingenious methods already developed for reducing the complexity of identifying and characterizing quantum states and processes. These include identifying stabilizer states [25]; tomography for matrix product states [26]; tomography for permutationally invariant states [27]; learning local Hamiltonians [28]; tomography for low-rank states via compressed sensing [5]; and tomography for multi-scale entangled states [29]. These techniques employ efficient simulation algorithms which propagate efficient representations of the state vector to calculate the probabilities defining the likelihood function. Physical constraints and careful engineering lead to drastic dimensional reductions, which can be used alongside additional prior information in the Bayesian method.
The above mentioned methods for reducing the complexity of estimating parameters rely on the notion of strong simulation, where the likelihood function is computed exactly. On the other hand is weak simulation, where the likelihood is not computed but is instead sampled from. The distinction between strong and weak simulation has been a topic of recent interest in quantum computational complexity, where it has been shown, for example, that there exist subtheories of quantum mechanics which admit efficient weak simulation but do not allow for efficient strong simulation [30,31]. It has recently been shown that the Bayesian sequential Monte Carlo algorithm can also be used in the case where one only has access to a weak simulator [32].

FIG. 5: In the upper left, the visibility is known (this is identical to figure 3). In all other panels, the visibility is unknown (but known to lie in the specified interval, with a uniform prior). For both the state and the visibility parameter, a marginal posterior covariance ellipse is constructed and tested against the true state and parameter.

FIG. 6: Construction of regions for a qubit (known to be of the form ρ = p|0⟩⟨0| + (1 − p)|1⟩⟨1|) with unknown visibility subjected to 1000 randomly selected Z measurements. On the left, we have the initial 1000 particles (blue) randomly selected according to a uniform prior and the randomly generated "true" state (red). In the middle figure, we have the posterior SMC particle cloud after 1000 randomly selected Z measurements along with the following regions: the green line is the convex hull of those highest weighted particles comprising at least 95% of the particle weight (this is X̂_hull); the red ellipse is X̂_MVEE, the smallest ellipse containing X̂_hull; and, in black, is the ellipse defined by the estimated covariance matrix of the particle cloud, X̂_PCE. When the posterior is disjoint, all regions poorly approximate the HPD credible region. On the right, the same distribution of particles is shown along with the convex hull and MVEE regions after the modes of the distribution have been identified via the DBSCAN clustering algorithm.
Finally, we might find ourselves with a quantum system which does not admit efficient strong nor weak simulation. In that case, it still may be possible to efficiently characterize the system using a trusted quantum simulator. For the case of estimating dynamical parameters, it has been shown that the Bayesian SMC approach can also perform estimation tasks efficiently using a quantum simulator [13].
In this work we have considered supplementing point estimates of quantum states with regions of state space. These regions contain the true state with a pre-determined probability, up to the tolerance of the numerical algorithm. The numerical algorithm has tunable parameters which trade accuracy for computational efficiency and thus can be set based on the desired optimality and the available computational resources. When the noise is known with relatively high accuracy, the optimal regions are the posterior covariance ellipsoids. When the noise is unknown, more complex techniques are available to construct ellipsoids which capture the state. In any case, the constructed regions are ellipsoids, which are easily described and conceptualized.
In the context of classical statistics, quantum state estimation can simply be thought of as overly ambitious parameter estimation. That is, quantum state estimation is just classical parameter estimation with a specific model and, perhaps, odd-looking constraints. The point is that the framework presented here for region estimation is suitable for any parameter estimation problem. In particular, we have already shown that additional noise on the measurements can be estimated simultaneously with the unknown state. More generally, the framework possesses a beautiful modularity which allows arbitrary statistical models to be learned.
FIG. 8: Construction of regions for a rebit subjected to 100 randomly selected X and Y measurements. On the left, we have the initial 100 particles (blue) randomly selected according to the Hilbert-Schmidt prior and the randomly generated "true" state (red). In the middle figure, we have the posterior SMC particle cloud after 100 randomly selected X and Y measurements; the estimated state, the mean of the distribution, is shown in teal. The larger weighted particles are represented as larger dots. On the right, the same distribution of particles is presented along with the regions discussed in the text. The green line is the convex hull of those highest weighted particles comprising at least 95% of the particle weight (this is X̂_hull). The red ellipse is X̂_MVEE, the smallest ellipse containing X̂_hull. Finally, in black, is the ellipse defined by the estimated covariance matrix of the particle cloud, X̂_PCE. These objects are blown up below the figure to show detail.
FIG. 9: Construction of regions for a qubit subjected to 20 randomly selected Pauli measurements using the SMC approximation with 100 particles. On the left, we have the initial 100 particles (blue) randomly selected according to the Hilbert-Schmidt prior and the randomly generated "true" state (red). In the middle and right figures, we have the posterior SMC particle cloud after 20 randomly selected Pauli measurements; the estimated state, the mean of the distribution, is shown in teal. The larger weighted particles are represented as larger dots. In the middle figure, the gray object is the convex hull of those highest weighted particles comprising at least 95% of the particle weight (this is X̂_hull). In the right figure, the blue ellipsoid is X̂_MVEE, the smallest ellipsoid containing X̂_hull, while the red ellipsoid is the posterior covariance ellipsoid X̂_PCE.