Effect of nonnegativity on estimation errors in one-qubit state tomography with finite data

We analyze the behavior of estimation errors evaluated by two loss functions, the Hilbert-Schmidt distance and infidelity, in one-qubit state tomography with finite data. We show numerically that there can be a large gap between the estimation errors and those predicted by an asymptotic analysis. The origin of this discrepancy is the existence of the boundary in the state space imposed by the requirement that density matrices be nonnegative (positive semidefinite). We derive an explicit form of a function reproducing the behavior of the estimation errors with high accuracy by introducing two approximations: a Gaussian approximation of the multinomial distributions of outcomes, and linearizing the boundary. This function gives us an intuition for the behavior of the expected losses for finite data sets. We show that this function can be used to determine the amount of data necessary for the estimation to be treated reliably with the asymptotic theory. We give an explicit expression for this amount, which exhibits strong sensitivity to the true quantum state as well as the choice of measurement.


I. INTRODUCTION
Quantum tomography has become a standard measurement technique in quantum physics. It is especially important in the field of quantum information as it is used for the confirmation of successful experimental implementation of quantum protocols. For example, it can be used to confirm that the quantum states required in a quantum information protocol are sufficiently closed to their theoretical targets [1]. In practice, experimental data obtained from tomographic measurements are used to assign a mathematical description to an unknown quantum state or operation, called an estimate. Statistically, this is a constrained multi-parameter estimation problem -the quantum estimation problem -where we assume we are given a finite number of identical copies of a quantum state or process, we perform measurements whose mathematical description is assumed to be known, and from the outcome statistics we make our estimate. Due to the probabilistic behavior of the measurement outcomes and the finiteness of the number of measurement trials, there always exist statistical errors in any quantum estimate. The size of the error depends on the choice of the measurements, known as the experimental design, and the estimation algorithm, known as the estimator. A standard combination in quantum information is that of quantum tomography and maximum likelihood estimation [1]. In order to compare estimation schemes, it is therefore important to evaluate precisely the size of * sugiyama@eve.phys.s.u-tokyo.ac.jp † turner@phys.s.u-tokyo.ac.jp ‡ murao@phys.s.u-tokyo.ac.jp the estimation error for a given combination of experimental design and estimator.
For evaluating the size of the estimation error, we introduce a distance-like function, called a loss function, between the estimate and the true operator. One way to evaluate estimation errors using a loss function is an expected loss, which is the statistical expectation value of the loss function over all possible data sets. In quantum information experiments, the infidelity (one minus the fidelity) and the trace distance are often used as loss functions for state estimation. These evaluations are often performed in the theoretical limit of infinite data, called the asymptotic regime. The asymptotic behavior of these expected losses for this combination has been studied very well [2,3]. Using the asymptotic theory of parameter estimation, we can show that for a sufficiently large number of measurement trials, N , there is a lower bound of the expected losses, called the Cramér-Rao bound. It is known that a maximum likelihood estimator achieves the Cramér-Rao bound asymptotically, and that those expected losses decrease as O(1/N ).
In practice of course, no experiment produces infinitely many data, and there are problems in applying the asymptotic theory of expected losses to finite data sets. First of all, the Cramér-Rao inequality holds only for a specific class of estimators, namely those that are unbiased. A maximum likelihood estimator is asymptotically unbiased, but is not unbiased for finite N , so the expected losses can be smaller than the bound for finite N . Especially when the purity of the true density matrix becomes high, the bias becomes larger. This is due to the boundary in the parameter space imposed by the condition that density matrices be positive semidefinite, and the expected losses can deviate significantly from the asymptotic behavior [4,5]. A natural question is then to ask at what value of N the expected losses begin to behave asymptotically. If N is large enough for the effect of the bias to be negligible, we can safely apply the asymptotic theory for evaluating the estimation error in an experiment. However, in general, determining the effects of the bias is a difficult problem.
In this paper, we analyze the effect of the bias caused by the parameter space boundary in one-qubit state tomography using a maximum likelihood estimator. In section II, we briefly review quantum state tomography and the asymptotic theory. In section III, we analyze the boundary effect theoretically. Applying ideas from classical statistical estimation theory, we derive an approximate form of the expected losses for finite N . In section IV, we analyze the boundary effect numerically, giving the results of our pseudo-random numerical experiments. These indicate that the function we derived reproduces the behavior of the expected losses for finite N more precisely than the Cramér-Rao bound. This makes it possible to predict the point at which the behavior of the expected infidelity becomes effectively asymptotic. We conclude in section V.

II. QUANTUM STATE TOMOGRAPHY AND ASYMPTOTIC ESTIMATION THEORY
In this section, we give a brief review of known results in quantum state tomography and asymptotic estimation theory. The purpose of quantum state tomography is to identify the density matrix characterizing the state of a quantum system of interest. Here we only consider states of a single qubit. Let H be the 2-dimensional Hilbert space C 2 and S(C 2 ) be the set of all positive semidefinite density matrices acting on H. Such a density matrix ρ can be parametrized as where ½ is the identity matrix on C 2 , σ = (σ 1 , σ 2 , σ 3 ) T is the vector of Pauli matrices, and s ∈ R 3 , s ≤ 1, is called the Bloch vector. Let us define the parameter space S := {s| ρ(s) ∈ S(C 2 )}. Identifying the true density matrix ρ ∈ S(C 2 ) is equivalent to identifying the true parameter s ∈ S. Let Π = {Π x } x∈X denote the POVM characterizing the measurement apparatus used in the tomographic experiment, where X is the set of measurement outcomes. Like a density matrix, a POVM can be parametrized as where (v x , w x ) ∈ R 4 . When the true density matrix is ρ(s), Born's Rule tells us that the probability distribution describing the tomographic experiment is given by where Tr denotes the trace operation with respect to C 2 . We assume that in the experiment we prepare identical copies of an unknown state ρ(s). We perform N measurement trials and obtain a data set x N = (x 1 , . . . , x N ), where x i ∈ X is the outcome observed in the i-th trial.
Let N x denote the number of times that outcome x occurs in x N , then f N (x) := N x /N is the relative frequency of x for the data set x N . In the relative frequency interpretation of probability, one has that in the limit of N → ∞, f N (x) converges to the true probability p(x|s).
has a unique solution ρ ′ for arbitrary ρ ∈ S(H) [6]. This condition is equivalent to that of the POVM Π being a basis for the set of all Hermitian matrices on H. For finite N , the relative frequency and true probability are generally not the same, i.e., there is unavoidable statistical error, and we need to choose an estimation procedure that takes the experimental result x N to a density matrix, that is, we need an estimator. It is natural to consider a linear estimator, which demands that we find a 2 × 2 matrix ρ li N satisfying However, Eq. There is therefore a 'gap' between B and C, consisting of unphysical linear estimates. When the true Bloch parameter s is in the interior of B and N becomes sufficiently large, the probability that linear estimates are out of B becomes negligibly small. However, when the Bloch vector is on the boundary of B, or when N is not sufficiently large, the effect of unphysical linear estimates cannot be ignored. A maximum likelihood estimator ρ ml is one way to address these problems. The estimated density matrix and the Bloch vector are defined as It can be shown that when ρ li N ∈ S(H), ρ li N = ρ ml N holds [7].
In order to evaluate the precision of estimates, we introduce a loss function. A loss function ∆ is a map from S(H) × S(H) to R such that (i) ∀ρ, σ ∈ S(H), ∆(ρ, σ) ≥ 0, and (ii) ∀ρ ∈ O, ∆(ρ, ρ) = 0. For example, the trace-distance and the infidelity (one minus the fidelity) are loss functions for density matrices. For our loss functions, we use both the squared Hilbert-Schmidt distance ∆ HS and the infidelity ∆ IF [8] defined as The Hilbert-Schmidt distance is a normalized Euclidean distance in the parameter space, and the infidelity is a conventional loss function used in experiments. We note that the Hilbert-Schmidt distance coincides with the trace distance in one-qubit systems, but it does not in general.
The outcomes of quantum measurements are random variables, and the value of the loss function between an estimate and the true density matrix is also a random variable. Thus, in order to evaluate the precision of a general estimator ρ est (not the estimate) for the true density matrix, we use the statistical expectation value of the loss function, called an expected loss (sometimes called a risk function) [? ]. The explicit form is given bȳ The value of the expected loss depends on the choice of the estimator as well as the true density matrix. The latter is of course unknown in an experiment, and one way to eliminate its dependence is to average over all possible true states where µ is a probability measure on S. The purpose of this paper is to clarify the behavior of expected losses for true states close to or on the boundary of B, so we focus not on average but pointwise expected losses for those states.
Let us assume that s < 1. For any unbiased estimator s est and any positive semidefinite matrix H s , the inequalitȳ ∆ N (s est |s) holds, where is called the Fisher matrix and tr denotes the trace operation with respect to the parameter space Ê 3 . Equation (14) is called the Cramér-Rao inequality, and it holds not only for one-qubit state tomography, but also for arbitrary finite dimensional parameter estimation problems under some regularity condition [12]. The matrix F s is a 3 × 3 positive semidefinite matrix for s ∈ Ê 3 . It is known that a maximum likelihood estimator asymptotically achieves the equality of Eq.(14) [12]. From the explicit formulas for the squared Hilbert-Schmidt distance and infidelity in Eqs. (9) and (11), we have where I is the identity matrix on Ê 3 . Therefore when we use the Hilbert-Schmidt distance as our loss function, we substitute H s in Eq. (14)  The Cramér-Rao inequality is often used to evaluate the estimation errors of a maximum likelihood estimator, but there are problems applying the bound to evaluating the expected losses for finite data sets. The inequality holds for unbiased estimators, which maximum likelihood estimators are asymptotically. However, they are not unbiased for finite N , because of the existence of the boundary in the parameter space. It has been shown numerically that for values of N in typical experiments the Cramér-Rao bound cannot be applied [5]. Hence, we are motivated to investigate the behavior of expected losses in parameter spaces with boundaries for finite data sets. We undertake this investigation for one-qubit in the next section.

III. THEORETICAL ANALYSIS
In this section, we derive a function which approximates the expected losses of the squared Hilbert-Schmidt distance and infidelity for finite data sets.

A. Two approximations
In general, the explicit form of expected losses with finite data sets is extremely complicated. In this paper, we try to derive not the exact form but a simpler function which reproduces the behavior of the true function accurately enough to help us understand the boundary effect. In order to accomplish this, we introduce two approximations. First, we approximate the multinomial distri-bution generated by successive trials by a Gaussian distribution. Second, we approximate the spherical boundary by a plane tangent to its boundary.
From the central limit theorem, we can readily prove that the distribution of a linear estimator s li converges to a Gaussian distribution with mean s and covariance matrix F −1 s . For finite N , we approximate the true probability distribution by the Gaussian distribution We will refer to this as the Gaussian distribution approximation (GDA). Because the approximation of the multinomial distribution by the GDA becomes better as each outcome probability grows sufficiently larger than 0, the expected losses under the GDA should be closer to the true expected losses the farther the true Bloch vector is from alignment with the axes in the Bloch sphere defined by the measurement. For a one-qubit system, the boundary between the physical and unphysical regions of the state space is a sphere with unit radius. Despite its simplicity, it is difficult to derive the explicit formula of a maximum likelihood estimator even in this case. Indeed, this is a major contributor to the general complexity of the expected loss behavior in quantum tomography. We therefore choose the simplest possible way to approximate the boundary, namely by replacing it with a plane in the state space. Suppose that the true Bloch vector is s ∈ B. The boundary of the Bloch ball, ∂B, is represented as We approximate this by the tangent plane to the sphere at the point e s := s/ s , represented as and so the approximated parameter space is represented as We will refer to this as the linear boundary approximation (LBA). The LBA is a specific case of tangent cone methods in statistical estimation theory which have been developed and used for analyzing models with constrained parameters in classical statistical estimation theory [13,14]. It is known that the distribution of a maximum likelihood estimator in a constrained parameter estimation problem converges to the Gaussian distribution with a boundary approximated by a tangent cone [14]. Therefore it is guaranteed that the expected losses approximated by the GDA and LBA converge to their true values in the limit of infinite data.

B. Approximated maximum likelihood estimator
In [14], it is proved that the distribution of a maximum likelihood estimator in a constrained parameter estimation problem converges to the distribution of the following vector s ml N := argmin s ′ ∈Ds (s li N − s ′ ) · F s (s li N − s ′ ). (23) By using the Lagrange multiplier method, we can derive the approximated maximum likelihood estimates as We note thats ml N depends on the true parameter s, and so by definition it is not an estimator -it is a vector introduced for the purpose of approximating expected losses of a maximum likelihood estimator. Intuitively, it takes the value of the linear estimate if that estimate is physical, and if it is unphysical a correction vector is added to bring it back within the physical region.

C. Expected squared Hilbert-Schmidt distance
From a straightforward calculation using formulas for Gaussian integrals, we can derive the approximate expected squared Hilbert-Schmidt distance.
where erfc[a] : is the complementary error function and is a typical scale for the number of trials. By using the Cramér-Rao inequality, Eq. (14), we can prove that e s · F −1 s e s /N is the variance of the linear estimates s li N in the e s direction of the Bloch sphere. When N is sufficiently large, most of the distribution of linear estimates is included in D s and the effect of the boundary becomes negligible. Roughly speaking, this condition is represented as e s · F −1 s e s /N ≪ (1 − s ) 2 , where the right hand side is the squared Euclidean distance between s and D s . This can be rewritten as We interpret N * as a reasonable benchmark for judging whether most of the distribution of the linear estimates is included in the physical region or not. The factor of 2 in Eq. (27) comes from the Gaussian integration, though in defining N * it is fairly arbitrary as it makes precise what we mean by 'most' in the preceding sentence. Thus, in order to justify the use of the Cramér-Rao bound for evaluating the estimation error, the number of measurement trials, N , must be larger than N * . When s < 1, in the limit of N → ∞, erfc[ N/N * ] decreases exponentially fast. This can be readily shown by using the asymptotic expansion [15], Therefore we can see that the approximate expected squared Hilbert-Schmidt distance converges to the Cramér-Rao bound. On the other hand, when s = 1, the first and second terms disappear and we obtain where we assumed that F s < ∞ for a Bloch vector s with s = 1. This is smaller than the Cramér-Rao bound, and this implies that when the true state is pure, a maximum likelihood estimator can break the Cramér-Rao bound even in the asymptotic region.

D. Expected infidelity
In order to analyze the expected infidelity, we take the Taylor expansion of the infidelity around the true Bloch vector s up to the second order. The explicit form is in Eq. (18). Again, using formulas for Gaussian integrals we can derive the approximate expected infidelity. When s < 1, ∆ IF N (s ml |s) where is the projection matrix onto the subspace orthogonal to s, and A − is the Moore-Penrose generalized inverse of a matrix A. From the argument above, we can see that the approximate expected infidelity converges to the Cramér-Rao bound in the limit of large N . When s = 1, the infidelity is a 1st order function of s, given by ∆ IF (s, s ′ ) = 1 2 (1 − s · s ′ ), and there are no 2nd-order terms. Consequently, the Hesse matrix of the infidelity H IF s diverges at s = 1. Therefore we cannot apply the Cramér-Rao inequality to the infidelity for pure states. By calculating the expectation value of the approximate estimators ml N , we can obtain (33)

IV. NUMERICAL ANALYSIS
We performed Monte Carlo simulations of one-qubit state tomography using three orthogonal projective (XYZ) measurements. Our task is to estimate the density matrix of the one-qubit system, where the true state can be pure or mixed. We choose a maximum likelihood estimator, and we used a Newton-Raphson method to solve the (log-)likelihood equation with the completely mixed state s = 0 as the initial point of the iteration. When the procedure returned a candidate point outside of the Bloch sphere, we chose the previous point (within the sphere) as the estimate.
The POVM corresponding to three orthogonal projective measurements is given by where |α± are the eigenstates of σ α with eigenvalue ±1. The Fisher matrix and its inverse are given by In Secs. IV A and IV B, we show the plots for two loss functions: the squared Hilbert-Schmidt distance ∆ HS and the infidelity ∆ IF . The pointwise expected losses ∆ N (s ml |s) and the approximated functions∆ N (s ml |s) introduced in Sec. III are compared, and the accuracy of those approximations are discussed. Table I is a list of true Bloch vectors s for the figures shown in the following subsections, along with the numerical values of N * for each s. We chose two Bloch radii, r = 0.9, 0.99, and two set of angles (θ, φ) = (0, 0), (π/4, π/4) as the true Bloch vector s. For a fixed r, the case with angles (0, 0) corresponds to one of the best case scenarios because the Bloch vector is along one measurement axis, while the (π/4, π/4) case corresponds to a worst case scenarios because the Bloch vector is equidistant from all the measurement axes. The explicit form of N * for the Fisher matrix in Eq. (35) is There are two terms which contribute to the divergence at s = 1, and near this value the first term behaves as (1−r) 2 . Therefore N * for a true Bloch vector whose direction is along one of the measurement axes becomes smaller than that for a true Bloch vector whose direction is not. This difference caused by the alignment of measurement axes becomes larger as the purity of ρ(s) becomes higher.
The terms caused by the boundary in Eqs. (25), (30), (31), (33) start to decrease exponentially fast after N becomes larger than N * . We expect that the simulated and approximated plots start to converge to the Cramér-Rao bound after N becomes larger than N * . In all figures, the line styles are as follows: a solid (black) line for the numerically simulated expected loss∆ N (s ml |s), a dashed (red) line for the approximate expected loss∆ N (s ml |s) given in Eqs (25), (30), (31), (33), a chain (green) line for the Cramér-Rao bound, and a dotted (black) vertical line for N * .

A. Expected squared Hilbert-Schmidt distance
The Cramér-Rao bound of the expected squared Hilbert-Schmidt distance is given by (37) Figure 2 shows the pointwise expected squared Hilbert-Schmidt distance∆ HS N plotted against the number of trials N (the horizontal and vertical axes are both logarithmic scale). The panels (EHS-1) and (EHS-2) are for the true Bloch vector s given by (r, θ, φ) = (0.99, π/4, π/4) and (r, θ, φ) = (1, π/4, π/4), respectively, so that the former is (slightly) mixed, while the latter is pure. The panel (EHS-1) shows that our approximation in Eq. (25) converges to the simulated plot, and both the simulated and approximated plots converge to the Cramér-Rao bound of Eq. (37) as N becomes large. The same behavior is observed for other mixed true states. On the other hand, panel (EHS-2) shows a different behavior; our approximation in Eq. (30) converges to the simulated plot, but the simulated and approximated plots do not converge to the Cramér-Rao bound. This indicates that for pure states, our approximation better captures the behavior of the expected loss than does the Cramér-Rao bound. As mentioned around Eq. (30), the reason for this is that the center of the distribution of the linear estimates for a pure state will always be on the boundary of the Bloch sphere, so that about a half of the distribution will always be in the unphysical region. This prohibits a maximum likelihood estimator from ever converging to the Cramér-Rao bound.

B. Expected infidelity
The infidelity is a nonlinear function of the states, and we must approximate the Cramér-Rao bound in this case; doing so up to second order gives Figure 3 shows the pointwise expected infidelitȳ ∆ IF N plotted against the number of measurement trials N : (EIF-1), (EIF-2), (EIF-3), (EIF-4) are for the true Bloch vector s given by (r, θ, φ) = (0.9, 0, 0), (0.9, π/4, π/4), (0.99, 0, 0), and (0.99, π/4, π/4), respectively. Thus panels (EIF-1,2) and panels (EIF-3,4) ) are for true states with the same  direction of the true Bloch vector, while panels (EIF-2) and (EIF-4) are for the case that all of the measurement axes are as far as possible from the true Bloch vector. Figure 3 shows that N * is a good benchmark for the number of trials required for the simulated plot to start to converge to the Cramér-Rao bound, and so we can say that in order to justify the use of the asymptotic theory, N must be larger than N * . Figure 3 indicates that the angle dependency of the expected infidelity becomes larger as the purity becomes higher. When the true state is far from all measurement axes, the accuracy of our approximation is higher than that of the Cramér-Rao bound. For N smaller than about 10 000 (the 'low N region'), the accuracy of our approximation is low (though still higher than that of the Cramer-Rao bound). We believe that the main reason for our approximation's poor performance in this low N region is the second order approximation of the infidelity, and that higher orders would improve the accuracy here. However, in the high N region the approximation can be seen to capture the behavior of the curve far better than the Cramér-Rao bound. Figure 4 shows the pointwise expected infidelity∆ IF N against the number of measurement trials N for the true Bloch vector s given by (r, θ, φ) = (1, π/4, π/4). For pure true states, the expected infidelity decreases as O( √ N ), and Fig. 4 shows that the expected infidelity converges to the approximate function.

V. CONCLUSIONS
In this paper, we analyzed expected losses in onequbit state tomography for finite data sets. We derived an explicit formula of the expected squared Hilbert-Schmidt distance and the expected infidelity between a tomographic maximum likelihood estimate and the true state under two approximations: a Gaussian distribution matched to the moments of the asymptotic multinomial distribution, and a linearization of the parameter space boundary imposed by the positivity of quantum states. We performed Monte Carlo simulations of one-qubit state tomography and evaluated the accuracy of the approximation formulas by comparing them to the numerical results. The numerical comparison shows that our approximation reproduces the behavior in the nonasymptotic regime much better than the asymptotic theory, and the typical number of measurement trials derived from the approximation is a reasonable threshold after which the expected loss starts to converge to the asymptotic behavior.

ACKNOWLEDGMENTS
T. S. would like to thank R. Blume-Kohout and C. Ferrie for their correspondence, as well as F. Tanaka for helpful discussion on mathematical statistics. Additionally, P. S. T. would like to thank D. Mahler, L. Rozema and A. Steinberg for discussions pointing out the importance of this problem. This work was supported by JSPS Research Fellowships for Young Scientists (22-7564) and Project for Developing Innovation Systems of the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan.