Gradient-based stopping rules for maximum-likelihood quantum-state tomography

When performing maximum-likelihood quantum-state tomography, one must find the quantum state that maximizes the likelihood of the state given the observed measurement results on identically prepared systems. The optimization is usually performed with iterative algorithms. This paper provides a gradient-based upper bound on the ratio of the true maximum likelihood to the likelihood of the state at the current iteration, regardless of the particular algorithm used. This bound is useful for formulating stopping rules for halting the iterations of maximization algorithms. We discuss such stopping rules in the context of determining confidence regions from log-likelihood differences when the differences are approximately chi-squared distributed.


Introduction
Quantum-state tomography is a statistical procedure for estimating the quantum state ρ_true of a quantum system. One prepares many identical copies of the system, measures each copy independently, and uses the measurement results to estimate ρ_true. One useful way to make the estimate is to find the state ρ_ML with the maximum likelihood for the measurement results [1]. The estimation then becomes an optimization problem, which is usually solved numerically with iterative methods. A stopping rule is needed to decide when to halt the iterations; otherwise one risks stopping before the calculation has reached a point 'near enough' to ρ_ML, or wasting time on unnecessary iterations. Using the difference between the states or the likelihoods achieved in successive iterations is unreliable, especially if the maximization algorithm suffers from slow convergence. Ideally, the stopping rule should specify 'near enough' in a statistically relevant way: if there is large statistical uncertainty, high numerical precision may not be necessary. In this paper we give such stopping rules, which depend on an upper bound on the ratio of the true maximum likelihood to the likelihood of the state achieved at the current iteration. The bound and stopping rules can be applied to any iterative likelihood-maximization algorithm. The bound is particularly useful when a likelihood ratio is used to assign uncertainties to the inferred state. This paper begins with the derivation of a bound on the ratio of the true maximum likelihood to the likelihood of a particular state. One may stop iterations when this ratio is sufficiently small. We next give a brief review of Wilks's Theorem, which gives the probability distribution of the likelihood-ratio statistic. We use this theorem to give some rules of thumb for when one should halt iterations, in three contexts: (1) Point estimation. The goal is to obtain a point estimate for which the likelihood of the estimate is at least as large as the expectation value of the likelihood of the true state with respect to the data. (2) Confidence regions for states. Here the goal is to construct a confidence region for the true state at a given significance level. (3) Confidence regions for expectation values. In this case, we wish to construct a confidence region for expectation values of observables of the state. Our stopping criteria are formulated using Wilks's Theorem, which may not always apply in quantum-state tomography experiments. We then present a numerical example of our likelihood-ratio bound using simulated optical homodyne measurements.

Likelihood ratio bound
Suppose N quantum systems are prepared, each in the state with density matrix ρ_true. For each copy i, the experimenter chooses an observable to measure. We label each observable choice with θ_i for i = 1, ..., N. The measurements yield results x_i. Corresponding to each choice-and-result combination, there is a positive-operator-valued-measure element Π(x_i|θ_i) = Π_i. For finite-dimensional systems, N may exceed the number of possible measurement choice-and-result combinations, so many of the Π_i will equal one another. However, for infinite-dimensional systems, the possible measurement results may be infinite and continuous, so the Π_i may all be different. The probability to observe x when measuring θ is p(x|θ) = Tr[ρ_true Π(x|θ)]. The likelihood for observing the sequence of measurement results {x_i : i = 1, ..., N}, as a function of the candidate density matrix ρ, is

𝓛(ρ) = ∏_{i=1}^{N} Tr[ρ Π_i].

The goal of maximum-likelihood quantum-state tomography is to maximize this function to obtain the state ρ_ML as the estimate of the true state ρ_true. The optimization is easier if we focus instead on the natural logarithm of the likelihood (the 'log-likelihood'):

L(ρ) = ln 𝓛(ρ) = Σ_{i=1}^{N} ln Tr[ρ Π_i].

The same density matrix maximizes both the likelihood and the log-likelihood. Fortunately, the log-likelihood is concave over the (convex) set of density matrices. One can show that it is concave by using the concavity of the logarithm, the linear dependence of the individual event probabilities on the density matrix, and the fact that the total log-likelihood is a positively weighted sum of logarithms of these probabilities. This concavity simplifies the maximization. Several maximization methods are described in Refs. [1,2,3,4]. These methods use iterative schemes, producing a new density matrix ρ_k after the kth iteration.
After iteration k, we would like to place an upper bound on L(ρ_ML) − L(ρ_k). Consider the density matrix ρ_ε = (1 − ε)ρ_k + ερ_ML, where 0 ≤ ε ≤ 1. Because the log-likelihood is concave, for any choice of ε,

L(ρ_ε) − L(ρ_k) ≤ ε dL(ρ_ε)/dε|_{ε=0} = ε (Tr[ρ_ML R(ρ_k)] − N), where R(ρ) ≡ Σ_{i=1}^{N} Π_i / Tr[ρ Π_i].

In particular, when ε = 1,

L(ρ_ML) − L(ρ_k) ≤ Tr[ρ_ML R(ρ_k)] − N.

This is the same matrix R that is used in the 'RρR' algorithm described in [3]. Of course, we do not know ρ_ML, so we find an upper bound on Tr[ρ_ML R(ρ_k)] by maximizing Tr[σR(ρ_k)] over all density matrices σ:

Tr[ρ_ML R(ρ_k)] ≤ max_σ Tr[σR(ρ_k)] = λ_max[R(ρ_k)],

where λ_max[R(ρ_k)] is the largest eigenvalue of R(ρ_k). This maximum is achieved for σ equal to the pure density matrix corresponding to the eigenstate of R(ρ_k) with the largest eigenvalue. Thus

L(ρ_ML) − L(ρ_k) ≤ λ_max[R(ρ_k)] − N ≡ r_k.

After exponentiation, we obtain

𝓛(ρ_ML)/𝓛(ρ_k) ≤ e^{r_k}.

Thus one may stop iterations when r_k is less than a predetermined bound specified by a stopping rule. Specific bounds depend on context, as we discuss below.
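As an illustration (not code from this paper), the bound r_k is straightforward to compute numerically. The sketch below, in Python with NumPy, assumes hypothetical inputs `povm` (a list of POVM elements, with repeated outcomes grouped) and `counts` (the number of times each outcome was observed); the function names are our own.

```python
import numpy as np

def log_likelihood(rho, povm, counts):
    """Log-likelihood L(rho) = sum_i n_i ln Tr[rho Pi_i], grouping repeated outcomes."""
    return sum(n * np.log(np.trace(rho @ P).real) for P, n in zip(povm, counts))

def r_bound(rho_k, povm, counts):
    """Gradient-based bound r_k = lambda_max[R(rho_k)] - N on L(rho_ML) - L(rho_k)."""
    N = sum(counts)
    R = sum(n * P / np.trace(rho_k @ P).real for P, n in zip(povm, counts))
    # R is Hermitian; symmetrize for numerical safety before diagonalizing
    lam_max = np.linalg.eigvalsh((R + R.conj().T) / 2)[-1]
    return lam_max - N
```

Because L(ρ) ≤ L(ρ_ML) for every density matrix ρ, the bound implies L(ρ) − L(ρ_k) ≤ r_k for all ρ, which provides a simple numerical sanity check.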
The above ideas can also be adapted to a simple gradient-ascent maximization procedure, as follows. Initialize the procedure with some state ρ_0, perhaps the fully mixed state. At each iteration, set σ equal to the pure state corresponding to the eigenstate of R(ρ_k) with the largest eigenvalue. Then use a one-dimensional optimization procedure to find the ε maximizing L((1 − ε)ρ_k + εσ) and set ρ_{k+1} equal to the maximizing combination. However, such a procedure can have slow convergence because it uses only the slope of the log-likelihood function and not its curvature.
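A minimal sketch of this gradient-ascent (conditional-gradient) procedure, again with assumed `povm`/`counts` inputs and a crude grid search standing in for the one-dimensional optimization in ε:

```python
import numpy as np

def log_lik(rho, povm, counts):
    total = 0.0
    for P, n in zip(povm, counts):
        p = np.trace(rho @ P).real
        total += n * np.log(max(p, 1e-300))  # guard against log(0) at epsilon = 1
    return total

def gradient_ascent(povm, counts, dim, tol=1e-4, max_iter=2000):
    """Maximize the log-likelihood; stop when the bound r_k falls below tol."""
    N = sum(counts)
    rho = np.eye(dim) / dim              # start from the fully mixed state
    eps_grid = np.logspace(-5, 0, 41)    # candidate step sizes in (0, 1]
    r_k = np.inf
    for _ in range(max_iter):
        R = sum(n * P / np.trace(rho @ P).real for P, n in zip(povm, counts))
        vals, vecs = np.linalg.eigh((R + R.conj().T) / 2)
        r_k = vals[-1] - N               # gradient-based bound on L(rho_ML) - L(rho)
        if r_k < tol:
            break
        v = vecs[:, -1:]                 # eigenstate with the largest eigenvalue
        sigma = v @ v.conj().T
        lls = [log_lik((1 - e) * rho + e * sigma, povm, counts) for e in eps_grid]
        e = eps_grid[int(np.argmax(lls))]
        rho = (1 - e) * rho + e * sigma
    return rho, r_k
```

For qubit data consisting of Z- and X-basis counts, the maximum-likelihood state can be checked against the closed-form answer, since the likelihood depends only on the corresponding Bloch components.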

Review of Wilks's theorem
Of course, the stopping rule, that is, the value of r_k below which one can halt iterations, depends on how the estimate is used. In the following we discuss the use of L(ρ_k) and r_k to establish two types of confidence regions related to our estimate. The asymptotic theory of likelihood-ratio tests provides guidelines. A key technique is the application of Wilks's Theorem; see Ref. [5] and section 6.4 of Ref. [6]. Wilks's Theorem states that, under appropriate assumptions, for two sets of models H_0 ⊆ H specified by h_0 and h free parameters, respectively, the random variable

D(H_0|X) ≡ 2[L(H_ML|X) − L(H_0,ML|X)]

converges in distribution to χ²(h − h_0) as the number of observations grows. Here, L(H_0,ML|X) and L(H_ML|X) are the maximum log-likelihoods over H_0 and H, respectively, and we assume that the true state is in the interior of H_0 with respect to the parametrization. The parametrization must be sufficiently well behaved; see the references above. We can apply this to hypotheses consisting of linear spaces of density matrices parametrized with respect to a linear basis, provided the true density matrix is not too near the boundary, that is, has no statistically near-zero eigenvalues.

Point estimate stopping rule
As a first approximation, to be refined below, we intuit that little further information about ρ_true is obtained once

L(ρ_ML|{x_i}) − L(ρ_k|{x_i}) ≤ ⟨L(ρ_ML|X) − L(ρ_true|X)⟩,

where ⟨·⟩ is the expectation value of the enclosed random variable and X is a random vector of length N distributed according to ρ_true. If ρ_true is an interior point of the space of density matrices and N is sufficiently large, the expectation of L(ρ_ML|X) − L(ρ_true|X) can be approximated by an application of Wilks's Theorem. That is, let H consist of all density matrices of dimension d; H has d² − 1 free parameters. Let H_0 contain only one element, ρ_true. Then the random variable D(ρ_true|X) = 2[L(ρ_ML|X) − L(ρ_true|X)] converges in distribution to χ²(d² − 1). This distribution has expectation d² − 1, so

⟨L(ρ_ML|X) − L(ρ_true|X)⟩ ≈ (d² − 1)/2.

According to this intuition, one can stop iterations when r_k is less than a fraction of (d² − 1)/2. To make this intuition more precise, a reasonable stopping rule can be based on the requirement that ρ_k be in a confidence region for the true state at a reasonable level of significance s. Such a confidence region can be constructed from likelihood-ratio hypothesis tests with level of significance s. This confidence region is defined as the set of density matrices (or other parameters) ρ for which the observations {x_i} and the associated likelihood ratio would not lead us to reject the hypothesis that ρ is ρ_true at level of significance s; see theorem 7.2 in Ref. [6]. Here we reject the hypothesis that ρ is ρ_true if the observed log-likelihood difference D(ρ|{x_i}) has a p-value less than s, where the p-value is the probability that the state would (if it were the true state) produce a value of D(ρ|X) at least as large as the observed value. In general, p-values are associated with a random variable, in this case D(ρ|X). For brevity, we omit mention of the random variable when it is clear from context.
According to Wilks's Theorem, we can calculate a state's p-value as the integral

p = ∫_{D(ρ|{x_i})}^{∞} f(u) du,

where f(u) is the probability density function of χ²(d² − 1). Notice that smaller values of D(ρ|{x_i}) correspond to larger p-values. Let t be the value of D(ρ|{x_i}) that gives a p-value equal to s, that is, s = ∫_t^∞ f(u) du. Thus the confidence region at level of significance s is {ρ : D(ρ|{x_i}) ≤ t}. We can ensure that our estimate ρ_k is in a confidence region at a predetermined level of significance by stopping when r_k is sufficiently small. The level of significance determines the statistical closeness of ρ_k to ρ_true: higher levels of significance imply closer ρ_k. For example, a level of significance of 0.5 requires that 2r_k be at most the median of the χ²(d² − 1) distribution. The mean and variance of the χ²(f) distribution are f and 2f, respectively, and as f increases, χ²(f) converges in distribution to a Gaussian with that mean and variance [7]. Therefore, to ensure that ρ_k is in a confidence region for the true state with s ∼ 0.5, for large d, one may stop iterations when r_k is below (d² − 1)/2.
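Converting a chosen significance level s into the threshold t requires the chi-squared tail. As an illustration (not from the paper), the Wilson-Hilferty approximation computes this with only the standard normal distribution and is accurate for degrees of freedom as large as d² − 1; the function name is our own.

```python
from statistics import NormalDist

def chi2_threshold(s, dof):
    """Approximate t such that P(chi2(dof) >= t) = s (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - s)  # upper-tail standard normal quantile
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3

d = 10
print(chi2_threshold(0.32, d * d - 1))  # about 105.0
print(chi2_threshold(0.05, d * d - 1))  # about 123.2
```

These values match the thresholds used in the worked examples below to within the accuracy of the approximation.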

State confidence region stopping rule
Another potential use of ρ_k is to construct a confidence region of states based on L(ρ_k|{x_i}). When determining confidence regions, rather than a statistically good approximation of ρ_true, it is not enough to ensure that ρ_k is statistically close to ρ_ML. Because ρ_ML is not known exactly, we construct a confidence region at level of significance s by replacing L(ρ_ML|{x_i}) in the conventional definition of the likelihood-ratio confidence region with the log-likelihood of our estimate, L(ρ_k|{x_i}). As we explain below, this confidence region contains the conventional one. For the confidence region to be a good approximation of the maximum-likelihood confidence region requires that r_k be a small fraction of √(2(d² − 1))/2, half the standard deviation of χ²(d² − 1). This rule ensures that approximate p-values computed from 2[L(ρ_k|{x_i}) − L(ρ|{x_i})] and the actual p-values computed from D(ρ|{x_i}) are sufficiently close. As a numerical example, consider tomography of a d = 10 quantum system, where we wish to construct the confidence region at level of significance 0.32. In this case, √(2(d² − 1))/2 = 7.04, and the threshold for D(ρ|{x_i}) is t = 105.04. Suppose we stop iterations when r_k ≤ 2. If we construct a confidence region as the set of ρ for which 2[L(ρ_k|{x_i}) − L(ρ|{x_i})] ≤ t, the region includes all ρ with p-values above 0.32, but it may contain states with p-values as low as 0.23, because the true value of D(ρ|{x_i}) can be as large as 2[L(ρ_k|{x_i}) + r_k − L(ρ|{x_i})] = 105.04 + (2 × 2) for those states. Note that the p-values included in the region are not data dependent, provided we choose the stopping rule beforehand. If we stop at r_k ≤ 1.5 and set the significance level at 0.05, corresponding to a threshold t = 123.22, the confidence region may contain states with p-values as low as 0.03.
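The worst-case p-values quoted in this example can be checked numerically. A sketch, again using the Wilson-Hilferty approximation to the χ²(99) tail (so the digits are approximate):

```python
from statistics import NormalDist

def chi2_pvalue(x, dof):
    """Approximate upper-tail probability P(chi2(dof) >= x) (Wilson-Hilferty)."""
    a = 2.0 / (9.0 * dof)
    z = ((x / dof) ** (1.0 / 3.0) - (1.0 - a)) / a ** 0.5
    return 1.0 - NormalDist().cdf(z)

dof = 10 * 10 - 1
# Stopping at r_k <= 2 (resp. 1.5): D(rho|{x_i}) can exceed the threshold by 2 * r_k.
print(chi2_pvalue(105.04 + 2 * 2.0, dof))  # about 0.23
print(chi2_pvalue(123.22 + 2 * 1.5, dof))  # about 0.03
```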

Expectation value confidence interval stopping rule
Another way to utilize tomographic data is to estimate expectation values such as Tr(ρ_true A) and give a confidence interval for the estimate at a given level of significance s. Let F(f) = {ρ : Tr(ρA) = f} be a level set for Tr(ρA). The dimensionality of this level set is d² − 2. Let φ_ML,f be the state in F(f) maximizing the likelihood. To establish a confidence region for f via a likelihood-ratio test, we use the statistic D(φ_ML,f|X). For each f there is an associated p-value (the p-value of the log-likelihood difference between ρ_ML and φ_ML,f), and all f's with p-value at least s are in the confidence region. By Wilks's Theorem, the statistic D(φ_ML,f|X) has the distribution χ²(1). Let t be the maximum value of D(φ_ML,f|{x_i}) for which f is a member of the confidence interval at significance level s. Following the discussion above, t is related to s through the integral of the χ²(1) distribution. The confidence region for f is C = {f : D(φ_ML,f|{x_i}) ≤ t}. It is necessary to adapt the stopping rule to the maximum-likelihood problem constrained to F(f). It is not practical to compute L(φ_ML,f) for all f, nor is it necessary to do so. We may use the Lagrange-multiplier technique to compute L(φ_ML,f). With λ as the Lagrange multiplier, we maximize K(ρ, λ) = L(ρ) + λ Tr(ρA). This function is still concave over the full space of density matrices and can be maximized by the same methods as L(ρ) after replacing R(ρ) with R(ρ) + λA. In the standard Lagrange-multiplier technique, one usually solves an equation for the value of λ that corresponds to the desired constraint f. Solving such an equation in this case would be difficult, and we do not know the desired f in advance. We need to approximate the values of f that are the limits of the confidence interval.
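The penalized maximization just described requires only replacing R(ρ) with R(ρ) + λA in an existing maximizer. The sketch below (our illustration, not the authors' code) adapts a simple conditional-gradient loop; `povm`, `counts`, and the helper names are assumptions, and the stopping bound is the generalization of r_k to K(ρ, λ).

```python
import numpy as np

def log_lik(rho, povm, counts):
    return sum(n * np.log(max(np.trace(rho @ P).real, 1e-300))
               for P, n in zip(povm, counts))

def maximize_K(povm, counts, A, lam, dim, tol=1e-4, max_iter=2000):
    """Maximize K(rho, lam) = L(rho) + lam * Tr(rho A); return (phi, f_lambda)."""
    N = sum(counts)
    rho = np.eye(dim) / dim
    eps_grid = np.logspace(-5, 0, 41)
    for _ in range(max_iter):
        R = sum(n * P / np.trace(rho @ P).real for P, n in zip(povm, counts)) + lam * A
        vals, vecs = np.linalg.eigh((R + R.conj().T) / 2)
        # generalized bound on the maximum possible increase of K at this iterate
        r = vals[-1] - (N + lam * np.trace(rho @ A).real)
        if r < tol:
            break
        v = vecs[:, -1:]
        sigma = v @ v.conj().T
        def K_of(e):
            mix = (1 - e) * rho + e * sigma
            return log_lik(mix, povm, counts) + lam * np.trace(mix @ A).real
        e = eps_grid[int(np.argmax([K_of(e) for e in eps_grid]))]
        rho = (1 - e) * rho + e * sigma
    return rho, np.trace(rho @ A).real
```

Scanning λ and recording f_λ = Tr(φ_ML,λ A) traces out candidate endpoints for the confidence interval; λ = 0 recovers the unconstrained ρ_ML.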
To accomplish this, we choose a value for λ and maximize K(ρ, λ) to find a state φ_ML,λ that is the maximum-likelihood state obeying the constraint Tr(φ_ML,λ A) = f_λ, where f_λ depends on the choice of λ. If φ_ML,λ has the desired log-likelihood difference t, f_λ marks one boundary of the confidence interval. If not, we search for the desired λ by re-maximizing K(ρ, λ) with different choices of λ. This search is simplified by the observation that λ is monotonically related to the log-likelihood of φ_ML,λ. This follows from concavity of the log-likelihood: the maximum log-likelihood L(f) on the level set F(f) is a concave function of f, and −λ is the slope of L(f) at f = f_λ. Given an iterative method for maximizing K(ρ, λ), the upper bound on the log-likelihood derived from r_k generalizes, yielding a bound r(φ_j) on the maximum possible increase in K(ρ, λ) at the jth iterate φ_j. Let f_j = Tr(φ_j A). Since K(ρ, λ) is constant on the level sets F(f), r(φ_j) is a bound on L(φ_ML,f_j) − L(φ_j). Given the iterate found after stopping, we can bound the true value of the desired log-likelihood difference by

D_lb ≡ 2[L(ρ_k|{x_i}) − L(φ_j|{x_i}) − r(φ_j)] ≤ D(φ_ML,f_j|{x_i}) ≤ 2[L(ρ_k|{x_i}) + r(ρ_k) − L(φ_j|{x_i})].

For a conservative approximation of the desired confidence interval, we run the iterative method with a stopping rule, seeking lower and upper bounds f_j for which D_lb is close to t. To ensure a conservative estimate, it should be at least t. To avoid an unnecessarily large confidence interval, we should ensure that r(ρ_k) and r(φ_j) are sufficiently small fractions of √2, the standard deviation of χ²(1). For example, suppose that we wish to approximate a confidence interval at significance level 0.32. The threshold for D(φ_ML,f_j|{x_i}) is t = 0.99. Suppose that we stop iterations when r(ρ_k), r(φ_j) ≤ 0.3 and set the confidence interval according to {f : D_lb ≤ t}. Then the confidence interval includes all f's with p-values larger than 0.32 and may contain f's with p-values as low as 0.21. If we set the threshold at t = 3.84, according to a significance level of 0.05, and stop at r(ρ_k), r(φ_j) ≤ 0.2, the confidence interval may contain f's with p-values as low as 0.04.
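The χ²(1) numbers in this example can be reproduced from the standard normal distribution, since a χ²(1) variable is the square of a standard normal Z. A sketch, under the reading that the worst-case statistic exceeds the threshold by roughly twice the stopping bound:

```python
from statistics import NormalDist

nd = NormalDist()

def chi2_1_threshold(s):
    """t with P(chi2(1) >= t) = s, exact via chi2(1) = Z**2."""
    return nd.inv_cdf(1.0 - s / 2.0) ** 2

def chi2_1_pvalue(x):
    """Exact upper-tail probability P(chi2(1) >= x)."""
    return 2.0 * (1.0 - nd.cdf(x ** 0.5))

print(chi2_1_threshold(0.32))         # about 0.99
print(chi2_1_threshold(0.05))         # about 3.84
print(chi2_1_pvalue(0.99 + 2 * 0.3))  # about 0.21
print(chi2_1_pvalue(3.84 + 2 * 0.2))  # about 0.04
```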

Numerical simulation
To illustrate the behaviour of the bound used for the stopping rules, we simulated homodyne measurements [8] of a state created by sending a superposition of optical coherent states (|α⟩ + |−α⟩, unnormalized, with α = 1) through a medium whose transmissivity is 80%. The homodyne measurements are 90% efficient. The Hilbert space was truncated at 10 photons. Results are shown in Fig. 1, where we have used the RρR algorithm to maximize the likelihood. To make this figure, we computed 1122 iterations and assigned ρ_ML = ρ_1122; further iterations suffered from numerical errors. After an initial phase of very fast likelihood increase, the convergence rate drops significantly. As expected, L(ρ_ML) − L(ρ_k) decreases with each iteration, but r_k sometimes increases. There is a significant gap between r_k and L(ρ_ML) − L(ρ_k), so it would be helpful to find tighter bounds to prevent unnecessary iterations. Perhaps the bound could be tightened using a higher-order expansion of the log-likelihood function in ε. Without a reliable stopping rule such as one based on r_k, a simple strategy is to stop iterations when the difference between the density matrices obtained in successive iterations is very small. For comparison, the figure includes a plot of the trace distance between ρ_k and ρ_{k+1}. According to the rough guidelines given above, if we want to use the result of this computation to obtain a confidence interval for an expectation value of the true state, we might halt iterations when r_k ≤ 0.1, at which point the trace distance between ρ_k and ρ_{k+1} is 3.6 × 10⁻⁷. In general, the relationship between r_k and the trace distance depends on the situation.

Conclusion
Our bounds on the likelihood ratios hold regardless of Wilks's Theorem, but we have used Wilks's Theorem to construct the confidence regions described above.
Wilks's Theorem must be applied carefully (if at all) when performing quantum state tomography. In particular the techniques discussed above cannot be used if the true state has eigenvalues that are statistically close to 0 or if there is insufficient data for the limiting distributions to be good approximations. Both of these situations are common in applications of tomography and can result in bad confidence regions and excessive biases. For example, we encountered such difficulties analyzing the data reported in Ref. [9]; see the discussion in this reference's supplementary materials. When Wilks's Theorem cannot be applied, one must resort to other techniques such as the robust bounds on log-likelihood differences described in [10,11] or parametric or nonparametric bootstrap [12] for estimating statistical errors and confidence regions. The bootstrap methods require running maximum-likelihood algorithms on many simulated or resampled data sets. Judicious use of one of the stopping rules given above can significantly reduce the number of iterations required when optimizing the likelihood, thereby making it possible to implement bootstrap with more resampled data sets to obtain better estimates. However, if bias in the maximum-likelihood estimate is large, confidence regions constructed by bootstrap may also be unreliable [12].
We have presented an upper bound r_k on the log-likelihood difference between the maximum-likelihood state and the state found so far by an iterative algorithm for maximum-likelihood tomography. The bound is easily computed from the gradient of the log-likelihood function and can be used in stopping rules for confidence regions or for decisions that use the likelihood ratio as a test statistic.