Beating noise with abstention in state estimation

We address the problem of estimating pure qubit states with non-ideal (noisy) measurements in the multiple-copy scenario, where the data consist of a number N of identically prepared qubits. We show that the average fidelity of the estimates can increase significantly if the estimation protocol allows for inconclusive answers, or abstentions. We present the optimal such protocol and compute its fidelity for a given probability of abstention. The improvement over standard estimation, without abstention, can be viewed as an effective noise reduction. These and other results are exemplified for small values of N. For asymptotically large N, we derive analytical expressions for the fidelity and the probability of abstention, and show that for a fixed fidelity gain the probability of providing an estimate (the acceptance rate) decreases with N at an exponential rate given by a Kullback-Leibler (relative) entropy. As a byproduct, we obtain an asymptotic expression, in terms of this very entropy, for the probability that a system of N qubits, all prepared in the same state, has a given total angular momentum. We also discuss an extreme situation where noise increases with N and where estimation with abstention provides a particularly significant improvement over the standard approach.


Introduction
Knowing the state of a system is a key task in quantum information processing. An unknown quantum state can only be unveiled by means of measurements. These, however, provide only partial knowledge about the system and, furthermore, this information gain always comes at the expense of destroying the state. Only when a reasonably large number N of identically prepared copies of the system is available is an accurate estimation of the state possible. For a given N, the aim is then to find the measurement protocol that yields the best estimate of the input state.
The standard estimation optimization problem is suited to a situation where, say, an experimentalist is confronted with an unknown state of a system of which she is asked to provide an estimate, based, of course, on the results of a measurement of her choice. A quantitative assessment of her performance is usually given by the expected value of the fidelity (or some other distinguishability measure) between the unknown input and her guess (see below). Hence, it is implicitly assumed that the experimentalist is obliged to provide such a guess regardless of the measurement outcome she obtains. In this context, many results have been obtained in recent years in a large variety of scenarios [1,2,3,4,5,6,7,8,9,10,11,12,13].
Here, we study a variation of this setting suited to a situation where the experimentalist is allowed to decide whether to provide a guess or to abstain from doing so. Of course, this decision cannot be based on the actual state of the system (which is unknown by definition) but rather on the result of a measurement. This relaxation of the original setting is very useful because it enables the experimentalist to postselect her measurement outcomes in order to provide a more accurate guess. That is, the possibility of abstaining enables her to discard instances where the measurement outcome turns out not to be informative enough. We will find that abstention can provide an important advantage, especially in noisy scenarios. The problem of 'state estimation with abstention' is especially relevant in situations where the experimentalist can afford to re-run the experiment, i.e., she can easily prepare a new instance of the problem, or where she prioritizes having high-quality estimates.
Post-selection is a widely used tool in quantum information, particularly in experimental scenarios, where one has special demands or constraints. A form of abstention has already been explored in state discrimination [14], another important quantum statistical inference primitive. Discrimination aims at identifying in which, out of a set of known quantum states, a system has been prepared. Two fundamental approaches are usually considered: the so-called 'minimum-error' discrimination, where the experimentalist always has to provide a conclusive answer, at the expense, of course, of being wrong with a certain probability [15], and 'unambiguous discrimination', where no errors are permitted, but instead an inconclusive answer (abstention) may be given with some probability [16]. By varying the allowed rate of inconclusive answers [17,18,19,20], we may go from one approach to the other [21,22,23]. The possibility of abstaining has been studied in [24] for phase estimation, and in [25] for direction estimation with arbitrary pure input signals. In both cases the results show a significant improvement over the standard (without abstention) approach.
In this work we consider the optimal estimation of a completely unknown pure qubit state when N copies of it are available for measurement and a certain rate Q of abstention is permitted (probability and rate will be used interchangeably throughout the paper). We will show that in an ideal noise-free scenario abstention does not improve the estimation accuracy. However, it does in a realistic noisy scenario, as claimed above. Here we consider a simplified model where noisy measurements are replaced by local depolarizing channels followed by ideal measurements.
The paper is organised as follows. In the next section we consider estimation without abstention. More precisely, we obtain the protocol that gives the best estimate of the state of a qubit based upon non-ideal measurements on N independent and identically prepared systems. In Section 3, estimation with abstention is introduced, and the optimal protocol for a fixed value of the abstention rate Q is obtained. We study the asymptotic regime of large N and derive the corresponding maximum fidelity and probability of abstention. As an example, we also consider a scenario where abstention gives a drastic improvement. This is the case when noise increases with N in such a way that the fidelity of the estimation approaches a finite value less than one as N becomes large. We close the paper with some brief conclusions and an outlook for future work.

No abstention
Let us consider N copies of a completely unknown pure qubit state |n⟩ (throughout the paper n will denote a unit Bloch vector) that we wish to estimate by performing a realistic, and therefore noisy, quantum measurement. We model it as an ideal measurement preceded by the single-qubit depolarizing channel acting on every copy:

$$\mathcal{E}(\rho) = (1-\eta)\,\rho + \frac{\eta}{3}\left(\sigma_x \rho\, \sigma_x + \sigma_y \rho\, \sigma_y + \sigma_z \rho\, \sigma_z\right), \tag{1}$$

where with probability 1 − η no error occurs, while with probability η the state is affected by either a bit-flip (σ_x), a phase-flip (σ_z), or both (σ_y). This error probability η is assumed to be known by the experimentalist; therefore, for the purpose of analyzing the effects of noise in the estimation process, we will transfer its effect to the input states and optimize the estimation protocol over ideal measurements. Hence, we will consider input states of the form

$$\rho(\vec n) = r\,|\vec n\rangle\langle\vec n| + (1-r)\,\frac{\mathbb{1}}{2}, \qquad r = 1 - \frac{4}{3}\,\eta. \tag{2}$$

In words, we will assume that the input states either remain unchanged with probability r or become completely randomized with probability 1 − r = (4/3)η. The original problem is thus equivalent to the estimation of a pure state |n⟩ (or of a uniformly distributed Bloch vector n) based upon the outcomes of an appropriate ideal measurement on N copies of the mixed state ρ(n) in Eq. (2), i.e., on the state ρ(n)^{⊗N} ≡ τ(n). For each measurement outcome χ an estimate |n_χ⟩ is provided according to some guessing rule χ → |n_χ⟩. We quantify the quality of the estimate by means of the squared overlap

$$f(\vec n, \vec n_\chi) = |\langle \vec n|\vec n_\chi\rangle|^2 = \frac{1 + \vec n\cdot\vec n_\chi}{2}, \tag{3}$$

also known as the fidelity. The overall quality of the estimation protocol is then given by the average fidelity

$$F = \sum_\chi \int d\vec n\; f(\vec n, \vec n_\chi)\, p(\chi|\vec n), \tag{4}$$

where dn = sin θ dθ dφ/(4π) is the uniform probability distribution on the two-sphere and p(χ|n) is the conditional probability of obtaining the outcome χ if the input state is τ(n). This probability is given by the Born rule p(χ|n) = tr[Π_χ τ(n)], where Π_χ ≥ 0 are the elements of a Positive Operator Valued Measure (POVM).
They satisfy the completeness relation Σ_χ Π_χ = 1, where 1 denotes the identity operator in the space spanned by the input states {τ(n)}. The index χ may be discrete, continuous, or both.
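The equivalence between the channel in Eq. (1) and the mixed input states of Eq. (2) is easy to check numerically. The following Python/NumPy sketch is only an illustration (the helper names `depolarize` and `pure_state` are ours, not part of any library): it applies Eq. (1) to a randomly oriented pure state and compares the result with r|n⟩⟨n| + (1 − r)1/2, with r = 1 − (4/3)η.

```python
import numpy as np

# Pauli matrices: sigma_x (bit flip), sigma_y (both flips), sigma_z (phase flip)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def depolarize(rho, eta):
    """Channel of Eq. (1): no error w.p. 1 - eta; X, Y or Z w.p. eta/3 each."""
    return (1 - eta) * rho + (eta / 3) * (X @ rho @ X + Y @ rho @ Y + Z @ rho @ Z)


def pure_state(theta, phi):
    """Projector |n><n| for the Bloch vector with polar angles (theta, phi)."""
    ket = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
    return np.outer(ket, ket.conj())


rng = np.random.default_rng(seed=1)
P = pure_state(rng.uniform(0, np.pi), rng.uniform(0, 2 * np.pi))
eta = 0.3
r = 1 - 4 * eta / 3                  # purity of Eq. (2)
noisy = depolarize(P, eta)           # left-hand side: channel applied to |n><n|
model = r * P + (1 - r) * I2 / 2     # right-hand side: Eq. (2)
print(np.allclose(noisy, model))     # True
```

Since σ_xρσ_x + σ_yρσ_y + σ_zρσ_z = 2·1 − ρ for any single-qubit density matrix, the agreement holds for every η, including the fully depolarizing value η = 3/4 (r = 0).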
A protocol (i.e., a measurement {Π_χ} and a guessing rule χ → |n_χ⟩) is said to be optimal if it maximizes F. For pure states, r = 1, the maximum fidelity is well known [3]:

$$F = \frac{N+1}{N+2}. \tag{5}$$

It is also known that the (continuous) covariant POVM

$$\Pi(\vec s) = (N+1)\; U(\vec s)\, |JJ\rangle\langle JJ|\, U^\dagger(\vec s), \qquad \int d\vec s\; \Pi(\vec s) = \mathbb{1}, \tag{6}$$

(with the obvious guessing rule Π(s) → |s⟩) is optimal. In (6), we use the standard notation, where {|jm⟩}_{m=−j}^{j} is the eigenbasis of the total angular momentum operators J² and J_z. We denote by U(s) = [u(s)]^{⊗N}, u(s) ∈ SU(2), (the unitary representation of) the rotation that maps the unit (Bloch) vector ẑ into s [thus u(s)|½ ½⟩ = |s⟩], and we have also introduced the definition J ≡ N/2. Note that the POVM {Π(s)} acts on the symmetric subspace of largest total angular momentum J, of dimension 2J + 1 = N + 1. In terms of J, (5) can also be written as

$$F = \frac{1}{2}\left(1 + \frac{J}{J+1}\right). \tag{7}$$

Mixed states span a much larger Hilbert space and the computation becomes more involved. It greatly simplifies in the total angular momentum basis, where the input state τ(n) is block-diagonal [9]. We have

$$\tau(\vec n) = \bigoplus_j \bigoplus_{\alpha=1}^{n_j} p_{j\alpha}\, \tau_{j\alpha}(\vec n), \tag{8}$$

where τ_jα(n) is the normalized mixed state

$$\tau_{j\alpha}(\vec n) = \frac{1}{C_j}\; U(\vec n)\left(\sum_{m=-j}^{j} R^{m}\, |jm;\alpha\rangle\langle jm;\alpha|\right) U^\dagger(\vec n), \tag{9}$$

with the definitions

$$R \equiv \frac{1+r}{1-r}, \qquad C_j \equiv \sum_{m=-j}^{j} R^{m}. \tag{10}$$

The additional index α, where α = 1, 2, ..., n_j, labels the various occurrences of the irreducible representation of total angular momentum j. The multiplicity n_j is given by [26,9]

$$n_j = \frac{(2j+1)\, N!}{(J-j)!\,(J+j+1)!}. \tag{11}$$

In the sum (8), j runs from j_min = 0 (j_min = 1/2) for N even (odd) up to the maximum total angular momentum J, in contrast to the pure-state case, where only the maximum value J appears. The numbers p_jα > 0 are the probabilities that the state τ(n) has quantum numbers j and α, i.e., p_jα = tr[1_jα τ(n)], where 1_jα = Σ_{m=−j}^{j} |jm;α⟩⟨jm;α| is the projector onto the corresponding eigenspace.
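The projector onto the whole subspace of total angular momentum j is then

$$\mathbb{1}_j = \sum_{\alpha=1}^{n_j} \mathbb{1}_{j\alpha}. \tag{12}$$

Since the input state is invariant under permutations of the individual qubits, representations with the same j are mere repetitions of the same representation; they contribute a multiplicative factor of n_j to the fidelity through the marginal probability p_j = Σ_α p_jα, which reads

$$p_j = \frac{n_j}{r}\left(\frac{1-r^2}{4}\right)^{J-j}\left[\left(\frac{1+r}{2}\right)^{2j+1} - \left(\frac{1-r}{2}\right)^{2j+1}\right]. \tag{13}$$

One can easily check that Σ_j p_j = 1, as it should be. Because of the block-diagonal form of the input states, an obvious optimal measurement consists of a direct sum of covariant POVMs,

$$\Pi(\vec s) = \bigoplus_j \bigoplus_{\alpha=1}^{n_j} \Pi_{j\alpha}(\vec s), \tag{14}$$

where each of them is a straightforward generalization of Eq. (6):

$$\Pi_{j\alpha}(\vec s) = (2j+1)\; U(\vec s)\, |jj;\alpha\rangle\langle jj;\alpha|\, U^\dagger(\vec s). \tag{15}$$

One can easily check that the completeness condition ∫ds Π(s) = 1 holds. The total fidelity then is

$$F = \frac{1}{2}\sum_j p_j\,(1 + \Delta_j), \tag{16}$$

where [1]

$$\Delta_j = \frac{\langle J_z\rangle_j}{j+1}, \qquad \langle J_z\rangle_j \equiv \mathrm{tr}\left[J_z\, \tau_j(\hat z)\right], \tag{17}$$

with τ_j(ẑ) being any one of the normalized states defined in Eq. (9) (say, the one with α = 1). A straightforward calculation gives

$$\langle J_z\rangle_j = \frac{2j+1}{2}\;\frac{R^{2j+1}+1}{R^{2j+1}-1} - \frac{1}{2r}. \tag{18}$$

Notice that for pure states one has R → ∞, and in turn ⟨J_z⟩_J → J, in agreement with Eq. (7). As will be shown in the next section, for asymptotically large N the probability p_j peaks at a value j ≃ rJ, which gives the dominant and subdominant contributions to the sum in (16). Up to order 1/N, and discarding exponentially vanishing contributions [e.g., ∼ R^{−rJ}], the asymptotic fidelity turns out to be

$$F \simeq 1 - \frac{1+r}{2r^2 N}. \tag{19}$$

This result is interesting on its own and, to the best of our knowledge, has not been presented before. Note that for pure states (r = 1) Eq. (19) agrees with the asymptotic expression of the fidelity in Eq. (5), F ≃ 1 − 1/N.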
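A small Python sketch of Eqs. (11), (13) and (16) to (18) makes these formulas concrete. The function names are illustrative assumptions. To keep r = 0 and r = 1 usable, p_j is computed from the equivalent sum n_j [(1 − r²)/4]^{J−j} Σ_m [(1+r)/2]^{j+m} [(1−r)/2]^{j−m}, which coincides with Eq. (13), and ⟨J_z⟩_j is obtained by direct summation over m; the closed form (18) is kept for comparison. The last lines compare the exact no-abstention fidelity with the asymptotic formula (19) and with the pure-state result (5).

```python
from math import comb


def blocks(N, r):
    """Return [(j, p_j, Delta_j)] for N copies of purity r, with j ascending.

    p_j follows Eq. (13), rewritten as n_j (ab)^(J-j) sum_m a^(j+m) b^(j-m)
    with a, b = (1 +/- r)/2 (no division by r). Delta_j is Eq. (17), with
    <J_z>_j obtained by summing over m instead of using Eq. (18).
    """
    a, b = (1 + r) / 2, (1 - r) / 2             # eigenvalues of rho(n)
    out = []
    for k in range(N // 2, -1, -1):              # k = J - j, so j increases
        j2 = N - 2 * k                           # 2j, always an integer
        n_j = comb(N, k) - (comb(N, k - 1) if k else 0)         # Eq. (11)
        w = [a ** (j2 - i) * b ** i for i in range(j2 + 1)]     # weight of m = j - i
        p_j = n_j * (a * b) ** k * sum(w)
        jz = sum((j2 / 2 - i) * wi for i, wi in enumerate(w)) / sum(w)
        out.append((j2 / 2, p_j, jz / (j2 / 2 + 1)))
    return out


def fidelity(N, r):
    """Eq. (16): F = (1/2) sum_j p_j (1 + Delta_j), without abstention."""
    return sum(p * (1 + d) / 2 for _, p, d in blocks(N, r))


def jz_closed_form(j, r):
    """Eq. (18), valid for 0 < r < 1."""
    R = (1 + r) / (1 - r)
    t = R ** (2 * j + 1)
    return (2 * j + 1) / 2 * (t + 1) / (t - 1) - 1 / (2 * r)


N, r = 200, 0.5
print(sum(p for _, p, _ in blocks(N, r)))               # normalization of p_j
print(fidelity(N, r), 1 - (1 + r) / (2 * r**2 * N))     # exact vs. Eq. (19)
print(fidelity(10, 1.0), 11 / 12)                        # pure states, Eq. (5)
```

Summing over m directly trades a little speed for robustness: it never divides by r and never raises R to large powers, so it also covers the noiseless limit r = 1.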

Abstention
In this section we focus on estimation protocols where the experimentalist is allowed not to produce an answer, or to abstain, if the outcome of the measurement she performed cannot provide a good enough estimate of the unknown state. Clearly, the optimal F cannot decrease when abstentions are excluded from the average, since abstaining at random, independently of the outcome, leaves F unchanged. In noisy scenarios, such as the one considered in this paper, F actually increases, as will be shown below. Our aim is to quantify this gain and find the optimal protocol. In our approach the probability of abstention, Q, is kept fixed rather than unrestricted, since in practical situations one usually cannot afford to discard an unlimited amount of resources/state preparations.

General framework
To enable the possibility of abstaining, the POVM representing the measurement must include the abstention operator, which we denote by Π_0, in addition to the operators {Π_χ}, each of them associated with a specific estimate |n_χ⟩. Thus, the completeness relation reads

$$\Pi_0 + \sum_\chi \Pi_\chi = \mathbb{1}. \tag{20}$$

The probability of abstention (abstention rate) and that of producing an estimate (acceptance rate) are then given, respectively, by

$$Q = \int d\vec n\; \mathrm{tr}\left[\Pi_0\, \tau(\vec n)\right], \qquad \bar Q = 1 - Q = \sum_\chi \int d\vec n\; \mathrm{tr}\left[\Pi_\chi\, \tau(\vec n)\right], \tag{21}$$

and the mean fidelity defined in (4) now becomes

$$F(Q) = \frac{1}{\bar Q}\sum_\chi \int d\vec n\; f(\vec n, \vec n_\chi)\, p(\chi|\vec n), \tag{22}$$

where the sum does not include the Π_0 operator and the normalization by Q̄ takes into account the abstentions excluded from the average. We next note that for any unitary transformation U of the type defined after Eq. (6), the operators {UΠ_χU†, UΠ_0U†} give the same values of Q and F(Q) as the original set {Π_χ, Π_0}, provided we change the guessing rule as n_χ → R_U n_χ, where R_U is the SO(3) rotation whose unitary representation is U. Therefore, by simply averaging over U, Π_0 (the set {Π_χ}) can always be chosen to be SU(2) invariant (covariant). In other words, with no loss of generality the POVM elements that provide a guess |s⟩ can be chosen as

$$\tilde\Pi(\vec s) = U(\vec s)\, \Pi\, U^\dagger(\vec s), \tag{23}$$

where Π ≥ 0 is the so-called seed of the POVM (in particular, note that Π̃(ẑ) = Π). The abstention operator then reads

$$\Pi_0 = \mathbb{1} - \int d\vec s\; \tilde\Pi(\vec s), \tag{24}$$

which is manifestly rotationally invariant (as claimed above). It is thus proportional to the identity on each invariant subspace,

$$\Pi_0 = \sum_j a_j\, \mathbb{1}_j, \tag{25}$$

where a_j are coefficients that satisfy the condition 0 ≤ a_j ≤ 1 and 1_j is defined in Eq. (12). Here we have used the permutation invariance of the input state to fix, without loss of generality, a_jα = a_j for all α. We can also choose Π̃(s) to have the block-diagonal form of the input state τ(n), namely,

$$\tilde\Pi(\vec s) = \bigoplus_j \bigoplus_{\alpha=1}^{n_j} \tilde\Pi_{j\alpha}(\vec s). \tag{26}$$

For given {a_j}, the optimality of Π_jα(s), defined in Eq. (15), clearly ensures that

$$\tilde\Pi_{j\alpha}(\vec s) = (1 - a_j)\, \Pi_{j\alpha}(\vec s) \tag{27}$$

are also optimal for estimation with abstention.
Recalling that the label α is immaterial, aside from the multiplicative factor n_j, we have from (21) that the abstention probability is simply

$$Q = \sum_j a_j\, p_j, \tag{28}$$

where p_j is given in Eq. (13). The coefficients a_j can be understood as the probabilities of abstention conditioned on the input state having total angular momentum j, i.e., a_j = p(abstention|j). Similarly, for a given j, the probability of producing an estimate, or accepting, is ā_j = 1 − a_j = p(acceptance|j). From Eq. (22) we obtain

$$F(Q) = \frac{1}{2}\left(1 + \bar\Delta\right), \tag{29}$$

where

$$\bar\Delta = \frac{1}{\bar Q}\sum_j \bar a_j\, p_j\, \Delta_j, \qquad \bar Q = \sum_j \bar a_j\, p_j, \tag{30}$$

and the quantity Δ_j is given in Eqs. (17) and (18). Thus, we are only left with the free parameters a_j, which have to be optimized in order to maximize F(Q), subject to the constraints 0 ≤ a_j ≤ 1 and (28). As might be expected, one can show that Δ_j is a monotonically increasing function of j, i.e., Δ_{j−1} < Δ_j; therefore the largest contribution to the fidelity is given by Δ_J. This corresponds to ā_J = 1 and ā_j = 0 for j < J. Hence, for an unrestricted probability of abstention, the optimal protocol discards any contribution with j < J. This protocol, however, would provide an estimate with a probability that decreases exponentially with N for r < 1, as p_J ≃ (1/r)[(1 + r)/2]^{N+1}. Notice that in a noiseless scenario, r = 1, there is only the contribution j = J, which is already the optimal one, and therefore abstention is of no use in such a case. Clearly, for finite Q there can be contributions from other total angular momentum eigenspaces (j < J) compatible with Eq. (28). Since Q̄ = 1 − Q is fixed, maximizing Δ̄ in Eq. (30) amounts to assigning the available acceptance probability to the blocks with the largest Δ_j; recalling the monotonicity of Δ_j, it follows from Eqs. (29) and (30) that there must exist an angular momentum threshold j* such that ā_j = 0 (ā_j = 1) if j < j* (j > j*). The value j* is determined through Eq. (28) by

$$\sum_{j<j^*} p_j \;\le\; Q \;<\; \sum_{j\le j^*} p_j. \tag{31}$$

Thus, we have

$$a_j = \begin{cases} 1, & j < j^*, \\[4pt] \dfrac{Q - \sum_{j'<j^*} p_{j'}}{p_{j^*}}, & j = j^*, \\[4pt] 0, & j > j^*. \end{cases} \tag{32}$$

In more physical language, the optimal strategy actually consists of two successive measurements. The experimentalist first measures the total angular momentum j of the input state τ(n) and decides to abstain (provide a guess) if j < j* (j > j*).
If j = j*, she simply decides at random by tossing a biased (Bernoulli) coin that comes up heads with probability a_{j*}: if heads (tails) show up, she abstains (provides a guess). In order to provide the actual guess, if she decides to do so, she performs the optimal POVM measurement {Π(s)} [or just {⊕_α Π_jα(s)}] in Eq. (14) on the state ⊕_α τ_jα(n) that resulted from the first measurement.
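The threshold rule amounts to a greedy allocation: the abstention budget Q is spent on the blocks with the smallest Δ_j first, with a partial (coin-toss) abstention at j*. The following Python sketch implements that reading of Eqs. (28) to (32); the function names are hypothetical, and the `blocks` helper repeats the construction of p_j and Δ_j from Eqs. (11), (13) and (17) so that the snippet runs on its own. It prints the relative fidelity gain F(Q)/F(0) − 1 for a few abstention rates, up to Q_crit = 1 − p_J, beyond which no further gain is possible.

```python
from math import comb


def blocks(N, r):
    """[(j, p_j, Delta_j)] with j ascending, as in Eqs. (11), (13) and (17)."""
    a, b = (1 + r) / 2, (1 - r) / 2
    out = []
    for k in range(N // 2, -1, -1):              # k = J - j
        j2 = N - 2 * k                           # 2j
        n_j = comb(N, k) - (comb(N, k - 1) if k else 0)
        w = [a ** (j2 - i) * b ** i for i in range(j2 + 1)]
        jz = sum((j2 / 2 - i) * wi for i, wi in enumerate(w)) / sum(w)
        out.append((j2 / 2, n_j * (a * b) ** k * sum(w), jz / (j2 / 2 + 1)))
    return out


def abstention_fidelity(N, r, Q):
    """F(Q) of the threshold protocol, Eqs. (29)-(32), for 0 <= Q < 1.

    The budget Q is spent on the lowest-j blocks first (a_j = 1 below j*,
    a fractional a_j* at the threshold, a_j = 0 above), which is optimal
    because Delta_j increases with j.
    """
    budget, q_bar, weighted = Q, 0.0, 0.0
    for _, p_j, d_j in blocks(N, r):             # ascending j, hence ascending Delta_j
        a_j = 1.0 if p_j == 0 else min(1.0, max(0.0, budget / p_j))
        budget -= a_j * p_j                      # remaining abstention budget, Eq. (28)
        accepted = (1 - a_j) * p_j               # \bar a_j p_j
        q_bar += accepted
        weighted += accepted * d_j
    return 0.5 * (1 + weighted / q_bar)


N, r = 6, 0.3
F0 = abstention_fidelity(N, r, 0.0)
q_crit = 1 - blocks(N, r)[-1][1]                 # Q_crit = 1 - p_J
for Q in (0.2, 0.4, 0.6, q_crit):
    gain = abstention_fidelity(N, r, Q) / F0 - 1
    print(f"Q = {Q:.3f}   gain = {gain:.3f}")
```

The per-outcome coin toss at j = j* is what makes the achievable (Q, F) curve continuous between the kinks of the deterministic thresholds.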

Small number of copies
In Fig. 1 we plot the fidelity gain due to abstention, [F(Q) − F(0)]/F(0) = ΔF/F(0), vs. Q for N = 6 and N = 8, and purities r = 0.3 and r = 0.7. The structure of Eq. (32) is apparent from these plots: at Q = 0 (a_j = 0 for all j) there is, naturally, no gain; kinks sequentially appear at the precise values of Q where a new coefficient a_j in (32) becomes positive (and j* increases by one); the curves are convex between successive kinks, where the coefficient that has most recently become positive, a_{j*}, keeps increasing. This pattern repeats until the abstention rate Q reaches the critical value

$$Q_{\rm crit} = 1 - p_J, \tag{33}$$

at which j* = J [p_J is given by Eq. (13)]. Increasing Q further does not provide any additional gain, as the flat plateaus of Fig. 1 illustrate. This is so because the optimal abstention protocol can be viewed as a filtering process in which the low angular momentum components of the input state are filtered out. Hence, keeping only the maximum value j = J is the optimal filtering, beyond which no further improvement is possible. Fig. 1(a) shows that in noisy scenarios, e.g. r = 0.3, abstention can increase the fidelity quite notably, up to 15%. For higher purities the gain is more moderate, as shown in Fig. 1(b): the enhancement in this case is about 4-5%, but with an abstention rate slightly above 50%. Further results are shown in Fig. 2, where we plot the fidelity gain as a function of the number of copies N for abstention rates larger than Q_crit, and for various values of the purity r.
All the curves have a maximum at a value of N that varies with the purity. The lower the purity, the higher the value of N at which the maximum occurs (e.g., for r = 0.3 the maximum gain occurs at N = 12; for r = 0.1 the maximum is off scale at the right of the figure).
As we have seen, the possibility of abstaining enables us to reach values of the fidelity that otherwise could only be attained with lower levels of noise. To quantify this effective reduction of noise, let us define an effective purity r_eff by the implicit equation F(r_eff, N, 0) = F(r, N, Q). That is, for an estimation setting given by r, N, and Q, r_eff is the purity of the input states that would provide the same fidelity if the standard strategy without abstention (Q = 0) were used instead. Since r is related to the error probability η of our model of noisy measurements in (1), an increase of the effective purity corresponds to an effective reduction of the amount of noise in the measurement through the relation η_eff = (3/4)(1 − r_eff). Figure 3 shows a plot of the effective purity r_eff as a function of Q for various values of r and N. As can be seen, r_eff increases faster at low values of N, but it saturates earlier (lower Q_crit), reaching a lower value. For low N and for a wide range of purities, 0.1 ≲ r ≲ 0.9, we observe a constant effective increase of the purity, r_eff ≈ r + 0.2, for reasonable values of the abstention rate Q. As N increases, one has to go to higher values of the abstention rate, Q ∼ Q_crit, to obtain a significant gain. Hence, a moderate abstention rate is most effective in noisy scenarios when a small, but fair, number of copies is available.
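The implicit definition of r_eff can be solved numerically. The sketch below (hypothetical helper names; it repeats the `blocks` and `abstention_fidelity` helpers so that it is self-contained) finds r_eff by bisection on [r, 1], under the assumption, not proven here, that the no-abstention fidelity F(r, N, 0) increases monotonically with r, and reports the corresponding η_eff = (3/4)(1 − r_eff).

```python
from math import comb


def blocks(N, r):
    """[(j, p_j, Delta_j)] with j ascending, as in Eqs. (11), (13) and (17)."""
    a, b = (1 + r) / 2, (1 - r) / 2
    out = []
    for k in range(N // 2, -1, -1):
        j2 = N - 2 * k
        n_j = comb(N, k) - (comb(N, k - 1) if k else 0)
        w = [a ** (j2 - i) * b ** i for i in range(j2 + 1)]
        jz = sum((j2 / 2 - i) * wi for i, wi in enumerate(w)) / sum(w)
        out.append((j2 / 2, n_j * (a * b) ** k * sum(w), jz / (j2 / 2 + 1)))
    return out


def abstention_fidelity(N, r, Q):
    """Optimal F(Q): abstain on the lowest-j blocks first (threshold rule)."""
    budget, q_bar, weighted = Q, 0.0, 0.0
    for _, p_j, d_j in blocks(N, r):
        a_j = 1.0 if p_j == 0 else min(1.0, max(0.0, budget / p_j))
        budget -= a_j * p_j
        q_bar += (1 - a_j) * p_j
        weighted += (1 - a_j) * p_j * d_j
    return 0.5 * (1 + weighted / q_bar)


def effective_purity(N, r, Q, tol=1e-10):
    """Solve F(r_eff, N, 0) = F(r, N, Q) for r_eff in [r, 1] by bisection.

    Assumes F(., N, 0) is increasing in the purity.
    """
    target = abstention_fidelity(N, r, Q)
    lo, hi = r, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if abstention_fidelity(N, mid, 0.0) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


N, r = 6, 0.3
for Q in (0.0, 0.3, 0.6):
    r_eff = effective_purity(N, r, Q)
    print(f"Q = {Q:.1f}   r_eff = {r_eff:.3f}   eta_eff = {0.75 * (1 - r_eff):.3f}")
```

Bisection is chosen here only for its robustness; any bracketing root finder would do, since the bracket [r, 1] always contains the solution when F(r, N, Q) does not exceed the noiseless value (N + 1)/(N + 2).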
Finally, let us point out that the protocol we have presented requires a projection onto the total angular momentum eigenspaces. This is a non-local measurement that can nonetheless be implemented efficiently [27]. In a more extreme scenario where there is no restriction on the abstention rate, one can attain the maximum fidelity with an even simpler strategy: perform a local Stern-Gerlach measurement on every qubit (say, of the z-component of the spin) and abstain unless all outcomes agree. This strategy renders an abstention probability of Q = 1 − [(1 + r)/2]^N, which might be comparable to Q_crit in Eq. (33).

Asymptotic regime
We next compute analytical expressions for the fidelity in the large N limit. Here it is useful to define the variable

$$x = \frac{j}{J}, \tag{34}$$

which becomes continuous in the limit N → ∞ (J → ∞). In this case, we can replace p_j by the continuous probability distribution on [0, 1] defined by

$$p(x) = J\, p_{j = xJ}, \tag{35}$$

so that ∫_0^1 dx p(x) = 1 as N goes to infinity. Eq. (29) can then be approximated by its continuous version, which reads

$$F(Q) \simeq \frac{1}{2}\left[1 + \frac{1}{\bar Q}\int_0^1 dx\; \bar a(x)\, p(x)\, \Delta(x)\right], \tag{36}$$

where

$$\Delta(x) \equiv \Delta_{j = xJ}, \tag{37}$$

and we recall that Δ_j is given in Eqs. (17) and (18). From Eq. (32) we see that asymptotically ā_j becomes the step function θ(x − x*), where x* = j*/J, and we have used the standard definition

$$\theta(y) = \begin{cases} 1, & y \ge 0, \\ 0, & y < 0. \end{cases} \tag{38}$$

With this, Δ̄ in Eq. (30) becomes

$$\bar\Delta \simeq \frac{1}{\bar Q}\int_{x^*}^1 dx\; p(x)\, \Delta(x), \tag{39}$$

and, in turn,

$$F(Q) \simeq \frac{1}{2}\left[1 + \frac{1}{\bar Q}\int_{x^*}^1 dx\; p(x)\, \Delta(x)\right]. \tag{40}$$

It also follows from (28) that

$$Q \simeq \int_0^{x^*} dx\; p(x), \qquad \bar Q \simeq \int_{x^*}^1 dx\; p(x). \tag{41}$$

At this point, we need a good approximation to p(x) that enables us to obtain the explicit form of the asymptotic fidelity. From Eq. (13), and using the Stirling formula, we obtain

$$p(x) \simeq \frac{(1+r)\,x}{r\,(1+x)}\,\sqrt{\frac{N}{2\pi(1-x^2)}}\;\exp\!\left[-N\, H\!\left(\tfrac{1+x}{2}\,\Big\|\,\tfrac{1+r}{2}\right)\right], \tag{42}$$

where

$$H(s\|t) = s\ln\frac{s}{t} + (1-s)\ln\frac{1-s}{1-t} \tag{43}$$

is the (binary) relative entropy, and the approximation is valid for both x and r in the open unit interval (0, 1). The appearance of a relative entropy in Eq. (42) can be understood as follows. Our N-copy input state (diagonal in the eigenbasis of the individual spin components along n) can be thought of as the distribution of N identical coin tosses with bias (1 + r)/2. From the theory of types [28] it is well known that, to first order in the exponent, the probability of getting k heads is exp[−N H(f‖(1 + r)/2)], where H is the Kullback-Leibler distance (or relative entropy) between the empirical distribution {f, 1 − f}, with f = k/N, and the underlying distribution {(1 + r)/2, (1 − r)/2}. The number of heads k is in one-to-one correspondence with the magnetic quantum number, m = k − J, and the conditional probability p(j|m) is strongly peaked at j = m, as one can easily check. It follows that the probability that the input state has total angular momentum j, given by p(j) = Σ_m p(j|m)p(m), is asymptotically determined by the distribution p(m), which has a convenient expression in terms of the relative entropy between the empirical and the underlying distributions of up/down outcomes. From Eq. (42) it follows that p(x) is peaked at the value x = r, i.e., at j = rJ, as shown in Fig. 4 and stated without proof in Sec. 2. Actually, around the peak, x ∼ r, the exponent becomes quadratic and p(x) approaches the Gaussian distribution

$$p(x) \simeq \sqrt{\frac{N}{2\pi(1-r^2)}}\;\exp\!\left[-\frac{N(x-r)^2}{2(1-r^2)}\right], \tag{44}$$

as also follows from the central limit theorem, whereas it falls off exponentially elsewhere.
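To see how accurate the Stirling/relative-entropy approximation is at moderate N, the following Python sketch compares the exact p(x) = J p_j, with p_j taken from Eq. (13), against the approximate form written out in `p_asymptotic` (the prefactor and exponent of Eq. (42) as derived from Eq. (13)). The names and the choice N = 400, r = 0.5 are illustrative assumptions.

```python
from math import comb, exp, log, pi, sqrt


def p_exact(N, r, j):
    """p_j of Eq. (13); j must satisfy J - j = non-negative integer, 0 < r < 1."""
    J = N / 2
    k = round(J - j)
    n_j = comb(N, k) - (comb(N, k - 1) if k else 0)          # Eq. (11)
    a, b = (1 + r) / 2, (1 - r) / 2
    return n_j * (a * b) ** k * (a ** (2 * j + 1) - b ** (2 * j + 1)) / r


def rel_entropy(s, t):
    """Binary relative entropy H(s||t), natural logarithm."""
    return s * log(s / t) + (1 - s) * log((1 - s) / (1 - t))


def p_asymptotic(N, r, x):
    """Stirling approximation of p(x): prefactor times exp[-N H((1+x)/2 || (1+r)/2)]."""
    pref = (1 + r) * x / (r * (1 + x)) * sqrt(N / (2 * pi * (1 - x * x)))
    return pref * exp(-N * rel_entropy((1 + x) / 2, (1 + r) / 2))


N, r = 400, 0.5
J = N // 2
for x in (0.3, 0.5, 0.7):
    j = round(x * J)
    exact = J * p_exact(N, r, j)                 # p(x) = J p_j, Eq. (35)
    approx = p_asymptotic(N, r, j / J)
    print(f"x = {x}: exact {exact:.4e}   approx {approx:.4e}   ratio {exact / approx:.4f}")
```

The ratio stays close to one both at the peak x = r and in the exponentially suppressed tails, which is what the asymptotic analysis relies on.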
It is now apparent that, asymptotically, abstention has a negligible impact if only components with j below rJ are filtered out (x* < r), since the main contribution to the fidelity, which comes from the peak around x ≃ r, is not excluded from the integral in Eq. (40) (only the left, exponentially decaying tail is). For the same reason [see Eq. (41)], Q̄ ≃ 1 (the abstention rate Q is exponentially small), and Eq. (40) yields

$$F \simeq 1 - \frac{1+r}{2r^2 N}, \tag{45}$$

which is the same expression as the asymptotic fidelity of the protocol without abstention, Eq. (19). It is then clear that, in order to have a discernible improvement in the fidelity, the abstention threshold x* must lie to the right of the peak of the probability distribution. The fidelity in (40) can then be written as

$$F(Q) \simeq \frac{1}{2}\left[1 + \Delta(x^*)\right], \tag{46}$$

where we have used that for x ≥ x* > r and for large enough N, p(x) falls off exponentially, so that the integral is dominated by the region near its lower limit and can be approximated by the value of the integrand there. By the very same argument, Eq. (41) gives

$$\bar Q \simeq \int_{x^*}^1 dx\; p(x) \asymp p(x^*), \tag{47}$$

where ≍ denotes equality to leading exponential order; this has also been used in (46). Using now (42), we obtain that in the asymptotic limit of many copies the rate at which our protocol provides a guess is

$$\bar Q \asymp \exp\!\left[-N\, H\!\left(\tfrac{1+x^*}{2}\,\Big\|\,\tfrac{1+r}{2}\right)\right]. \tag{48}$$

Recalling Eqs. (17) and (18), we obtain the optimal fidelity

$$F(Q) \simeq 1 - \frac{1+r}{2\,r\,x^*\,N}, \tag{49}$$

for a value of Q given by (48). For x* = r the results (19) and (45) are recovered, whereas for x* → 1 (Q ≥ Q_crit) the maximum average fidelity is attained:

$$F \simeq 1 - \frac{1+r}{2rN}. \tag{50}$$

The advantage provided by our estimation-with-abstention protocol can be quantified by the effective number of copies that the standard protocol without abstention would require to achieve the same fidelity: N_eff = (x*/r)N, where x* ∈ [r, 1) is determined by the abstention rate Q through (48).
For high noise levels (low purity, r ≪ 1) our protocol provides an important saving of resources/copies, as N_eff/N approaches 1/r ≫ 1 when x* → 1, whereas for nearly ideal detectors (r ≈ 1) the saving in this asymptotic regime is more modest.
Alternatively, the advantage discussed above can also be quantified by the effective measurement-noise reduction or, equivalently, by the effective purity r_eff (see Sec. 3.2). Using (49), one can easily find a simple expression for the effective purity in the asymptotic limit and for a large abstention rate (x* → 1): r_eff = [r + √(4r + 5r²)]/[2(1 + r)]. In the limit of very low noise levels the error probability η [recall Eq. (1)] is effectively reduced by a factor of three, i.e., η_eff = η/3, while in the opposite limit of very noisy measurements one finds r_eff ≃ √r.
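These asymptotic statements can be cross-checked in a few lines. The Python sketch below (illustrative names) assumes the asymptotic forms written in its comments: F ≃ 1 − (1 + r)/(2 r x* N) for the protocol with threshold x*, which for x* = r reduces to the standard fidelity 1 − (1 + r)/(2 r² N) and is consistent with N_eff = (x*/r)N. It verifies that the quoted r_eff makes the standard protocol match the x* → 1 fidelity, and checks the two limits η_eff/η → 1/3 and r_eff ≃ √r.

```python
from math import sqrt


def asymptotic_fidelity(N, r, x_star):
    """Assumed asymptotic form F ~ 1 - (1 + r) / (2 r x* N), valid for r <= x* < 1."""
    return 1 - (1 + r) / (2 * r * x_star * N)


def r_eff_large_abstention(r):
    """Effective purity for x* -> 1 quoted in the text."""
    return (r + sqrt(4 * r + 5 * r * r)) / (2 * (1 + r))


N = 1000
for r in (0.01, 0.3, 0.9, 0.999):
    re = r_eff_large_abstention(r)
    standard = asymptotic_fidelity(N, re, re)     # no abstention, purity r_eff
    abstain = asymptotic_fidelity(N, r, 1.0)      # abstention with x* -> 1, purity r
    print(f"r = {r}: r_eff = {re:.4f}, F_std(r_eff) = {standard:.6f}, F(r, x*->1) = {abstain:.6f}")

eps = 1e-6                                        # low noise: eta_eff / eta -> 1/3
print((1 - r_eff_large_abstention(1 - eps)) / eps)
print(r_eff_large_abstention(1e-8) / sqrt(1e-8))  # high noise: r_eff / sqrt(r) -> 1
```

Because η = (3/4)(1 − r), the ratio (1 − r_eff)/(1 − r) printed above is exactly η_eff/η.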

Other regimes
In the previous subsection we have seen how a gain in fidelity can be obtained provided the acceptance rate Q̄ falls off exponentially as N becomes very large. Here we give an example where this gain takes place even at finite Q̄. At a fixed noise level (purity r), the fidelity is an increasing function of N. However, one could imagine an experimental setup where the noise also increases (the purity decreases) with N. If this is so, the asymptotic fidelity could be strictly less than one; in other words, perfect estimation could be unattainable even with unbounded resources. This is the case in our example, where we assume that r = a/√N, a being a positive constant. Notice that the threshold x* must also scale as 1/√N in order to have a reasonably low abstention rate. Therefore, it is convenient to use the new variable ξ = √N x = √N j/J = 2j/√N instead. The probability distribution in this new variable is

$$p(\xi) = \frac{\sqrt N}{2}\; p_{j = \sqrt N\,\xi/2}. \tag{51}$$

Recalling Eq. (13) and using the Stirling formula, this equation gives

$$p(\xi) \simeq \frac{\xi}{a\sqrt{2\pi}}\left[e^{-(\xi-a)^2/2} - e^{-(\xi+a)^2/2}\right] \tag{52}$$

to leading order in inverse powers of N. The subleading terms are of order N^{−1/2} and will be neglected here. For a given threshold value ξ* = 2j*/√N the abstention rate is

$$Q = \frac{1}{2}\left[\mathrm{erf}\,\xi^*_+ + \mathrm{erf}\,\xi^*_-\right] - \frac{e^{-(\xi^*_-)^2} - e^{-(\xi^*_+)^2}}{a\sqrt{2\pi}}, \tag{53}$$

where ξ*_± = (ξ* ± a)/√2 and erf x is the error function. From Eqs. (17) and (18) we have, in this same regime and at leading order,

$$\Delta(\xi) \simeq \coth(a\xi) - \frac{1}{a\xi}. \tag{54}$$

With the above, the fidelity (29) [or rather, the counterpart of (36)] is

$$F = \frac{1}{2}\left[1 + \frac{1}{1-Q}\int_{\xi^*}^{\infty} d\xi\; p(\xi)\, \Delta(\xi)\right], \tag{55}$$

where the last integral can be computed to be

$$\Delta^* \equiv \int_{\xi^*}^{\infty} d\xi\; p(\xi)\,\Delta(\xi) = \frac{e^{-(\xi^*_-)^2} + e^{-(\xi^*_+)^2}}{a\sqrt{2\pi}} + \frac{1}{2}\left(1 - \frac{1}{a^2}\right)\left[\mathrm{erf}\,\xi^*_+ - \mathrm{erf}\,\xi^*_-\right]. \tag{56}$$

We can finally write the fidelity as

$$F = \frac{1}{2}\left(1 + \frac{\Delta^*}{1-Q}\right). \tag{57}$$

As shown in (53) and (56), both Q and Δ* are functions of the filtering threshold ξ*, which is just a properly scaled version of the original threshold j*. Finding the maximum fidelity for a given rate of abstention Q requires inverting Eq. (53) to obtain ξ*(Q); this cannot be done analytically, and one has to resort to numerical methods. In Fig. 5 we plot F as a function of Q for a = 1.
The increase of the fidelity in the asymptotic regime of large N is clearly seen: e.g., an abstention rate of 50% yields a rise of about 10%, and it goes up to about 30% for higher (but still reasonable) values of Q. The figure also shows the agreement between the approximate form of the fidelity given by Eqs. (53) to (57) and the numerical evaluation of its exact expression in (29).
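The a = 1 curve can be reproduced from closed-form leading-order expressions. The following sketch, in plain Python with the standard `math` module (function names are hypothetical), assumes the forms written out in the code: the density p(ξ) = ξ[e^{−(ξ−a)²/2} − e^{−(ξ+a)²/2}]/(a√(2π)), Δ(ξ) = coth(aξ) − 1/(aξ), the error-function expressions for Q(ξ*) and Δ*, and F = ½[1 + Δ*/(1 − Q)]. As the text indicates, Q(ξ*) is inverted numerically, here by bisection; subleading O(N^{−1/2}) corrections are ignored.

```python
from math import erf, exp, pi, sqrt, tanh


def p_xi(xi, a):
    """Leading-order density of xi = 2j/sqrt(N) when r = a/sqrt(N)."""
    return xi / (a * sqrt(2 * pi)) * (exp(-(xi - a) ** 2 / 2) - exp(-(xi + a) ** 2 / 2))


def delta_xi(xi, a):
    """coth(a xi) - 1/(a xi), using its series y/3 - y^3/45 for small y = a xi."""
    y = a * xi
    return y / 3 - y ** 3 / 45 if y < 1e-3 else 1 / tanh(y) - 1 / y


def abstention_rate(xs, a):
    """Q as a function of the scaled threshold xi* = xs (integral of p_xi over [0, xs])."""
    sp, sm = (xs + a) / sqrt(2), (xs - a) / sqrt(2)
    return 0.5 * (erf(sp) + erf(sm)) - (exp(-sm**2) - exp(-sp**2)) / (a * sqrt(2 * pi))


def delta_star(xs, a):
    """Integral of p_xi * delta_xi from xi* = xs to infinity."""
    sp, sm = (xs + a) / sqrt(2), (xs - a) / sqrt(2)
    return ((exp(-sm**2) + exp(-sp**2)) / (a * sqrt(2 * pi))
            + 0.5 * (1 - 1 / a**2) * (erf(sp) - erf(sm)))


def fidelity_at_rate(Q, a, tol=1e-12):
    """F = (1 + Delta*/(1 - Q))/2 at abstention rate 0 <= Q < 1, inverting Q(xi*) by bisection."""
    lo, hi = 0.0, a + 40.0                       # Q(lo) = 0 and Q(hi) is numerically 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if abstention_rate(mid, a) < Q:
            lo = mid
        else:
            hi = mid
    xs = 0.5 * (lo + hi)
    return 0.5 * (1 + delta_star(xs, a) / (1 - abstention_rate(xs, a)))


a = 1.0
F0 = fidelity_at_rate(0.0, a)
for Q in (0.25, 0.5, 0.75, 0.9):
    F = fidelity_at_rate(Q, a)
    print(f"Q = {Q:.2f}   F = {F:.4f}   gain = {F / F0 - 1:.3f}")
```

Since Q(ξ*) increases monotonically from 0 to 1, bisection on ξ* is guaranteed to converge; the closed forms can be checked against direct numerical integration of p(ξ) and p(ξ)Δ(ξ).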
It should be noted that in the regime described here an increase of the number of copies N cannot reproduce the fidelity improvement that results from increasing the abstention rate (no N_eff can be defined in this regime); thus, abstention appears to be the only means by which one can improve estimation.

Conclusions
In this work we have addressed optimal estimation of pure qubit states when abstention from providing an outcome is allowed. We have considered a reasonably realistic multiple-copy scenario, where a sample of N identically prepared systems goes through a non-ideal (noisy) measurement process. We have shown that in the limit of zero noise, abstention does not help to improve estimation (nor does it hamper it). However, abstention turns out to counterbalance the adverse effect of errors in a noisy measurement process. We have shown that in general abstention is most useful for inputs of few copies and for error rates of the order of a few percent. E.g., for N = 6 and an error probability η = 0.5 (per qubit), one can easily attain fidelity gains of the order of 15% with an abstention rate of Q = 4/5. As N increases, one needs to allow for higher abstention rates to obtain a significant improvement. We have given analytical asymptotic expressions for the fidelity, valid in the limit of a large number of copies. In this limit, abstention can have the effect of increasing the number of copies by a constant factor, N_eff/N = x*/r (x* > r), with an acceptance rate Q̄ given by the relative entropy: −(1/N) log Q̄ ≃ H[(1 + x*)/2 ‖ (1 + r)/2]. For low levels of noise this amounts to reducing the error probability η by a factor of up to three.
We have also considered a scenario where the noise (per qubit) increases with the number of copies in such a way that perfect estimation is unattainable (lim_{N→∞} F < 1). In this case one can obtain a significant enhancement of the asymptotic fidelity (few percent) even for finite abstention probabilities Q < 1. Moreover, in such a scenario abstention appears to be the only way to improve estimation.
In broader parameter estimation contexts, where, e.g., phase or direction information is encoded in more general many-particle states [10], abstention may have a much more dramatic effect. These issues will be analyzed in a separate publication [25].