Optimal signal states for quantum detectors
arXiv:1103.2365v3 [quant-ph] 23 Jul 2011

Quantum detectors provide information about quantum systems by establishing correlations between certain properties of those systems and a set of macroscopically distinct states of the corresponding measurement devices. A natural question of fundamental significance is how much information a quantum detector can extract from the quantum system it is applied to. In the present paper we address this question within a precise framework: given a quantum detector implementing a specific generalized quantum measurement, what is the optimal performance achievable with it for a concrete information readout task, and what is the optimal way to encode information in the quantum system in order to achieve this performance? We consider some of the most common information transmission tasks - the Bayes cost problem (of which minimal error discrimination is a special case), unambiguous message discrimination, and the maximal mutual information. We provide general solutions to the Bayesian and unambiguous discrimination problems. We also show that the maximal mutual information has an interpretation of a capacity of the measurement, and derive various properties that it satisfies, including its relation to the accessible information of an ensemble of states, and its form in the case of a group-covariant measurement. We illustrate our results with the example of a noisy two-level symmetric informationally complete measurement, for whose capacity we give analytical proofs of optimality. The framework presented here provides a natural way to characterize generalized quantum measurements in terms of their information readout capabilities.


I. INTRODUCTION
Quantum detectors provide the interface between the microscopic world of quantum phenomena and the world of macroscopically distinct events that we observe. A quantum detector is a device that interacts with the system under observation in a way that establishes correlations between certain properties of the system and a set of macroscopically distinct (orthogonal) states of the device. A general quantum detector can be described by a positive operator-valued measure (POVM), i.e., a set of positive operators {E_i}, E_i ≥ 0, i = 1, ..., M, summing up to the identity, ∑_i E_i = I. For an input state ρ, the probability that the measurement yields outcome j is given by the Born rule, p_j(ρ) = Tr{ρE_j}.
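As a quick numerical illustration of these definitions, the sketch below builds a hypothetical three-outcome "trine" qubit POVM (our own example, not one from the text) and evaluates the Born rule for it:

```python
import numpy as np

def trine_povm():
    """Hypothetical example: E_k = (2/3)|phi_k><phi_k| with three real qubit
    vectors |phi_k> spaced 60 degrees apart, so that sum_k E_k = I."""
    elements = []
    for k in range(3):
        theta = k * np.pi / 3  # 0, 60, 120 degrees
        phi = np.array([np.cos(theta), np.sin(theta)])
        elements.append((2 / 3) * np.outer(phi, phi))
    return elements

def born_probabilities(rho, povm):
    """Born rule: p_j(rho) = Tr(rho E_j)."""
    return np.array([np.trace(rho @ E).real for E in povm])

povm = trine_povm()
assert np.allclose(sum(povm), np.eye(2))  # completeness: sum_j E_j = I
assert all(np.linalg.eigvalsh(E).min() >= -1e-12 for E in povm)  # positivity

rho = np.array([[1.0, 0.0], [0.0, 0.0]])  # pure input state |0><0|
p = born_probabilities(rho, povm)
```

For this input the outcome distribution is (2/3, 1/6, 1/6), which sums to one as it must for any POVM.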
A natural question is to what extent a given quantum detector is able to provide information about the system it is used to observe. This question can be conveniently formulated in the context of a quantum communication scenario, where a sender (Alice) tries to send messages to a receiver (Bob) who is constrained to read those messages using the quantum detector in question. Concretely, let the source of classical information that Alice wants to communicate to Bob be characterized by a probability distribution π_i > 0, i = 1, ..., N, ∑_i π_i = 1, that specifies the probability of each classical message i. Alice encodes the different messages into quantum states via an encoding map i → ρ_i, and Bob reads the information by performing the POVM measurement. If there are no constraints on the way Alice can prepare the signal states and these states can reach Bob undisturbed (i.e., Alice and Bob are connected through a noiseless channel), then the optimal performance they can achieve for a given task can be regarded as quantifying the readout capabilities of the measurement with respect to that task. In this respect, a problem of primary importance is to find the optimal encoding (or signal states ρ_i) for which the detector achieves its optimal performance.
The problem just outlined bears strong similarities to the problem of quantum state discrimination [1][2][3][4][5][6][7][8], where the encoding of Alice is fixed and Bob's task is to decide which message he has received by optimizing his measurement. In fact, we will see below that the two problems can be regarded as dual to each other due to the symmetry that exists between the input ensembles and the POVM measurements. This allows one to adapt results from quantum state discrimination to the problem at hand. However, since in quantum state discrimination the space over which we optimize is more constrained due to the completeness relation ∑_i E_i = I, it turns out that in many cases the problem of optimal signal states for quantum detectors is easier to solve.
Besides its application for characterizing detectors, the problem considered here is of natural practical interest for quantum communication, since generating different signal states [9] can be experimentally more accessible than performing different measurements. A quantum detector is usually fixed, while a preparation device, although possibly also based on a fixed (but nondestructive!) measurement, can be used together with post-selection, which provides additional flexibility to the preparation process. Furthermore, in the case of communication through a noiseless channel, any operation at the receiver's side prior to the detector can equally be done as part of the preparation strategy.
In this paper, we consider the above problem from the perspective of three different information transmission tasks: the task of optimal Bayes cost message discrimination (of which the well known problem of minimum error discrimination is a special case), unambiguous message discrimination, and the maximal mutual information. Due to the simplification mentioned above, we are able to provide solutions to the Bayesian and unambiguous discrimination problems in the general case. For the maximal mutual information, we show that this quantity is equal to the classical capacity of the quantum-to-classical channel corresponding to the measurement, which we term the "capacity of the measurement". This quantity provides a general figure of merit for the information readout capabilities of a detector. Based on its relation to the accessible information [6], we prove a result similar to Davies's theorem [2] (Proposition 2), which shows that the optimal ensemble can be chosen to consist of at most d² pure states, where d is the dimension of the system. For a group covariant measurement, we obtain that the problem is equivalent to that of the accessible information of a group covariant ensemble of states. We apply our results to the case of a noisy two-level symmetric informationally complete measurement, for whose capacity we give analytical proofs of optimality.

II. THE BAYES COST PROBLEM
In the Bayes cost problem, one is interested in minimizing an average cost function of the form

⟨C⟩ = ∑_{ij} C_ij P_ij,   (1)

where P_ij = Tr(π_i ρ_i E_j) are the joint probabilities for input i and measurement outcome j, and C_ij ≥ 0 are the elements of the cost matrix (C_ij is the cost of choosing hypothesis j when hypothesis i is true). In what we will refer to as the straight version of this problem, one assumes that the encoding i → ρ_i is given, and the task is to find the measurement {E_j} that minimizes the quantity in Eq. (1) [1]. An example of a Bayes cost problem is that of minimum error discrimination, i.e., minimizing the probability of incorrectly identifying the message. In this case, the probability of an error is given by p_err = ∑_{i≠j} P_ij, i.e., the elements of the cost matrix are C_ij = 1 − δ_ij. Here we are concerned with the opposite scenario, which we will refer to as the reverse problem: we assume that the receiver has an apparatus that implements a particular POVM measurement, and we ask what the optimal way to encode the classical messages into quantum states is so that, using only the given POVM measurement and possibly some side information processing, the receiver will identify the message at the lowest cost. This side information processing involves finding the optimal way of choosing hypothesis i when the measurement outcome k takes place, and includes the possibility of following a mixed strategy, i.e., assigning a hypothesis i randomly according to some prescribed probability distribution, which might of course depend on the outcome k. In other words, the receiver can use the given POVM {E_k} to obtain new POVM measurements with elements of the form

Ẽ_j = ∑_k p(j|k) E_k,   (2)

where 0 ≤ p(j|k) ≤ 1 are conditional probabilities, ∑_j p(j|k) = 1. Up to renormalization of the cost matrix, we can assume that 0 ≤ C_ij ≤ 1.
Hence, the problem is equivalent to that of maximizing the quantity

B(P) = ∑_{ij} B_ij P_ij,   B_ij ≡ 1 − C_ij.   (3)

For a given POVM measurement {E_i}, consider some encoding and decoding strategies given by the map i → ρ_i and the conditional probability distribution p(j|k), respectively. For these strategies, the quantity B(P) reads

B(P) = ∑_{j,k} p(j|k) ∑_i B_ij π_i Tr(ρ_i E_k).   (4)

Define j*(k) to be a value of j for which the quantity ∑_i B_ij π_i Tr(ρ_i E_k), for a fixed k, is maximal (if there are two or more such values, we can pick any one of them). Then

B(P) ≤ ∑_k ∑_i B_{ij*(k)} π_i Tr(ρ_i E_k),   (5)

which is achievable by choosing p(j|k) = δ_{jj*(k)}. We see that for the purpose of achieving the maximum in Eq. (3), the receiver does not need a mixed strategy, i.e., the maximum can be achieved by choosing all conditional probabilities p(j|k) to be either 0 or 1. This means that the receiver can associate more than one measurement outcome E_k with the same hypothesis j, but it does not help to associate two or more hypotheses with the same outcome. Note that this means, in particular, that in the case when the number of possible messages N is greater than the number M of different outcomes of the POVM, the best strategy is not to attempt to detect certain messages at all. In fact (see below), even when M ≤ N, it may be advantageous to group different POVM elements for the detection of a single state.
Let K_j denote the set of those indices k for which j*(k) = j, i.e., the indices k for which the outcomes E_k are associated with hypothesis j. Note that the sets K_j are non-intersecting as shown above and that some sets may be empty. In other words, the set of possible assignments corresponds to that of all possible ways to distribute M elements into N groups {K^α_j}_{j=1}^N, where the index α labels each of the N^M distributions. Then for any such choice we have

B(P) = ∑_i π_i Tr(ρ_i ∑_j B_ij Ẽ^α_j),   (6)

where Ẽ^α_j = ∑_{k∈K^α_j} E_k. The maximum of this quantity is achieved when each of the signal states ρ_i is chosen to be an eigenstate corresponding to the maximal eigenvalue of the operator ∑_j B_ij Ẽ^α_j, which we will denote by λ_max(∑_j B_ij Ẽ^α_j). Defining the vectors π = {π_1, ..., π_N} and

λ^α = {λ_max(∑_j B_1j Ẽ^α_j), ..., λ_max(∑_j B_Nj Ẽ^α_j)},   (7)

we can hence write

max B(P) = max_α π · λ^α.   (8)

We thus see that the problem reduces to that of finding the sets K_j for which the quantity in Eq. (8) is maximal. The corresponding partition specifies which outcomes k of the POVM measurement the receiver has to associate with a given classical message j. The optimal encoding strategy is to encode each classical message i into an eigenstate ρ^max_i corresponding to the maximal eigenvalue of ∑_j B_ij Ẽ^α_j at the optimal α (note that these states can always be chosen to be pure).
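The grouping optimization just described is a finite search and can be prototyped by brute force. The sketch below (our own illustration; the trine POVM is a hypothetical example, and B_ij = δ_ij corresponds to minimum error discrimination) evaluates the grouped payoff of Eq. (8) for every assignment of outcomes to messages:

```python
import numpy as np
from itertools import product

def max_bayes_payoff(povm, priors, B):
    """Brute-force the N^M groupings of the M POVM outcomes into N sets and
    return the maximum of sum_i pi_i * lambda_max(sum_j B_ij E~_j^alpha),
    together with the optimal assignment outcome k -> message j."""
    M, N, d = len(povm), len(priors), povm[0].shape[0]
    best_val, best_assign = -np.inf, None
    for assign in product(range(N), repeat=M):
        grouped = [sum((povm[k] for k in range(M) if assign[k] == j),
                       np.zeros((d, d))) for j in range(N)]
        val = sum(priors[i] *
                  np.linalg.eigvalsh(sum(B[i, j] * grouped[j]
                                         for j in range(N)))[-1]
                  for i in range(N))
        if val > best_val:
            best_val, best_assign = val, assign
    return best_val, best_assign

# Hypothetical trine POVM: E_k = (2/3)|phi_k><phi_k|, vectors 60 degrees apart.
povm = [(2 / 3) * np.outer(v, v) for v in
        (np.array([np.cos(k * np.pi / 3), np.sin(k * np.pi / 3)])
         for k in range(3))]

B = np.eye(3)  # minimum error discrimination: B_ij = delta_ij
val_flat, _ = max_bayes_payoff(povm, [1 / 3, 1 / 3, 1 / 3], B)
val_skew, _ = max_bayes_payoff(povm, [0.9, 0.05, 0.05], B)
```

With flat priors the best strategy keeps the three outcomes separate, giving (1/3)∑_k λ_max(E_k) = 2/3, while with the skewed priors the search finds that grouping two outcomes onto the most probable message does better.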
The described optimization procedure involves calculating and comparing a finite set of quantities. In contrast, the straight version of the problem in the general case is a linear program that requires maximization over a continuous set. Even though the task of finding the optimal encoding for a given decoding POVM exhibits an apparent similarity with the problem of finding the optimal POVM for a given encoding (see the symmetry of the cost function (1) with respect to interchanging the POVM elements and the input states), an important difference between the straight and reverse problems is that the quantities over which we maximize in the straight version have to satisfy the constraint ∑_j E_j = I, whereas in the reverse case there is no constraint on the signal states ρ_i.
Observe that in the case when N < M, the above optimal strategy requires at least one of the messages to be associated with multiple measurement outcomes. However, as mentioned earlier, even in the case when N ≥ M, it may be advantageous to associate more than one outcome of the POVM with the same state. For example, in the problem of minimum error discrimination, two POVM elements may have very similar (or even identical) maximal eigenvalues and corresponding maximal eigenstates, while the prior probabilities of the different input messages may differ significantly. Then it is not difficult to see (see examples in the last section) that associating the two measurement outcomes in question with two different messages would be worse than associating both of them with one of the messages, namely the one that has a higher prior probability.
Note that the special case of minimum error discrimination with a given POVM has been previously studied in Ref. [11] as part of the problem of optimal encoding of classical information in a quantum system for minimal error discrimination when both the encoding and the measurement can be optimized. However, the solution provided in Ref. [11] for a fixed POVM is not truly optimal since it assumes that different outcomes must be associated with different states.
We remark that in certain cases it may be possible to simplify the general procedure described above. For example, in the problem of minimum error discrimination, when the prior distribution is flat, π_i = 1/N, i = 1, ..., N, and M ≤ N, all we need to do is encode M of the N different possible messages into the eigenstates corresponding to the maximal eigenvalues of the different POVM elements. In this case, associating multiple measurement outcomes with the same message does not help, since λ_max(Ẽ^α_j) = λ_max(∑_{k∈K^α_j} E_k) ≤ ∑_{k∈K^α_j} λ_max(E_k).

For a binary source, the minimum error probability can be written in a particularly simple form. In this case, the POVM grouping is {Ẽ^α, I − Ẽ^α}. We start by discussing the unbiased case (i.e., π_1 = π_2 = 1/2), for which

p_s = (1/2)[Tr(ρ_1 Ẽ^α) + Tr(ρ_2 (I − Ẽ^α))] = 1/2 + (1/2) Tr[(ρ_1 − ρ_2) Ẽ^α].   (9)

The maximum occurs when ρ_1 and ρ_2 are the states corresponding to the largest and lowest eigenvalue of Ẽ^α, respectively. The difference between these two values is known as the spread of a matrix, defined for a generic matrix A as Spr(A) = max_{ij} |λ_i − λ_j|, where λ_i are the characteristic roots of A [10]. Hence,

p_s = (1/2)[1 + max_α Spr(Ẽ^α)].   (10)

Notice the resemblance with the well known Helstrom state discrimination formula [1], where the trace distance has been replaced by the (semi-norm) spread. From Eq. (8), the success probability for arbitrary priors reads

p_s = max_α [π_1 λ_max(Ẽ^α) + π_2 (1 − λ_min(Ẽ^α))].   (11)

It is clear that when one signal is given with a prior probability larger than the success probability attained by a two-outcome POVM {E, I − E}, it pays to assign all outcomes to the most probable signal. In other words, the measurement does not add information to our prior knowledge, and the optimal grouping results in the trivial POVM {I, 0}. The transition occurs at π_1 = p_s. More explicitly, the trivial POVM is optimal if

π_1/π_2 ≥ [1 − λ_min(E)]/[1 − λ_max(E)].   (12)

Notice that if λ_max(E) = 1, it is always advantageous to perform the measurement, irrespective of the prior probabilities.
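A minimal sketch of the binary case (our own, with a hypothetical diagonal POVM element), comparing the nontrivial groupings against the trivial one:

```python
import numpy as np

def spread(A):
    """Spr(A) = max_ij |lambda_i - lambda_j| for a Hermitian matrix A."""
    w = np.linalg.eigvalsh(A)
    return float(w[-1] - w[0])

def binary_success(E, pi1):
    """Best success probability for two messages read out with the groupings
    {E, I - E}, {I - E, E}, and the trivial grouping {I, 0}."""
    w = np.linalg.eigvalsh(E)
    lmin, lmax = float(w[0]), float(w[-1])
    nontrivial = max(pi1 * lmax + (1 - pi1) * (1 - lmin),
                     pi1 * (1 - lmin) + (1 - pi1) * lmax)
    return max(nontrivial, pi1, 1 - pi1)

E = np.diag([0.8, 0.3])  # hypothetical element of a two-outcome POVM {E, I - E}
p_unbiased = binary_success(E, 0.5)
p_skewed = binary_success(E, 0.95)
```

In the unbiased case this reproduces p_s = [1 + Spr(E)]/2, and for a sufficiently skewed prior the trivial grouping takes over, consistent with the threshold condition above.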

III. UNAMBIGUOUS MESSAGE DISCRIMINATION
Unambiguous quantum state discrimination [3-5, 7, 8] concerns the task of identifying which out of a set of possible states one has received in such a way as to ensure no error whenever a conclusive answer is given. In general, such conclusive answers cannot always be given, and the problem consists in maximizing the probability with which they occur.
Let {E_i} be the POVM the receiver has been provided with, and let us allow, as in the previous section, some side information processing that will result in new POVMs {Ẽ_j} [see Eq. (2)]. For the purpose of unambiguously identifying a given set of messages i = 1, ..., N, encoded in the quantum states ρ_i, i = 1, ..., N, these POVMs must consist of N + 1 elements: Ẽ_1, ..., Ẽ_N, representing the conclusive answers, and an additional element Ẽ_? that represents the inconclusive one. It must hold that

Tr(ρ_i Ẽ_j) = 0, ∀ i ≠ j, i, j = 1, ..., N,   (13)

since errors are not allowed in conclusive answers. Any of the elements Ẽ_1, ..., Ẽ_N, Ẽ_? can be the zero operator as a special case. One can readily see that all the conditional probabilities p(j|k) that define {Ẽ_i} in terms of the original POVM through Eq. (2) can be taken to be either 0 or 1, as for the Bayes cost problem, i.e., {Ẽ_i} can be taken to be sums of certain subsets of the original POVM elements. This is so because there is no way one can unambiguously identify two or more messages that have been associated with a given Ẽ_i if the corresponding outcome takes place. (If some outcome k occurs with zero probability, we can add E_k to any of the elements Ẽ_1, ..., Ẽ_N, Ẽ_?, as this would not change the probabilities of the respective outcomes.) Similarly, if E_k is randomly associated with both a given message i [i.e., 0 < p(i|k)] and the inconclusive answer [i.e., 0 < p(?|k)], the probability of success would increase with the choice p(i|k) = 1.
Thus, for the unambiguous discrimination of N input states ρ_i, i = 1, ..., N, each occurring with prior probability π_i, consider some grouping of the original POVM elements into N + 1 elements, Ẽ^α_1, ..., Ẽ^α_N, Ẽ^α_?, where, as in the previous section, α labels the various grouping possibilities. Condition (13) requires that

ρ_i ∈ ∩_{j≠i} ker Ẽ^α_j ≡ K^α_i, i = 1, ..., N.   (14)

Conversely, if each ρ_i is chosen to belong to this intersection (assuming it is non-empty), then unambiguous discrimination would be possible with probability

p_s(α) = ∑_i π_i Tr(ρ_i Ẽ^α_i).   (15)

Let P^α_i denote the projector on K^α_i. Note that this projector can be easily computed because (14) can be written as K^α_i = ker(I − Ẽ^α_i − Ẽ^α_?), and, since ρ_i is supported on K^α_i, we have Tr(ρ_i Ẽ^α_i) = Tr(ρ_i P^α_i Ẽ^α_i P^α_i). We can therefore maximize each of the traces by choosing ρ_i to be an eigenstate of P^α_i Ẽ^α_i P^α_i with maximal eigenvalue. Let us denote this eigenvalue by λ_max(P^α_i Ẽ^α_i P^α_i). Then, we have

max p_s(α) = π · s^α,   (16)

where, as before, π = {π_1, π_2, ..., π_N}, and s^α is the vector of the values {λ_max(P^α_i Ẽ^α_i P^α_i)} in decreasing order of value (this, actually, defines the labeling of the POVM elements Ẽ^α_j). Note that this ordering ensures maximization of the overlap π · s^α, assuming the messages are labeled so that π_1 ≥ π_2 ≥ ⋯ ≥ π_N. The probability of success of the optimal message discrimination protocol is

p_s = max_α π · s^α.   (17)

Here α takes (N + 1)^M/N! different values, namely, the number of different ways of distributing M POVM elements into N + 1 sets, where the sums of the elements in each of these sets are Ẽ^α_1, ..., Ẽ^α_N, Ẽ^α_? respectively (N! takes into account the specific labeling defined above). Note that certain sets may be empty, i.e., we allow some of the new POVM elements to be the zero operator (the corresponding message will never be identified in these cases).
To compute p_s we may consider the following procedure. Pick a grouping α and construct each of the projectors P^α_i on the intersection K^α_i for i = 1, 2, ..., N. If some K^α_i is empty, terminate the calculation and consider a different grouping α′. If there is an empty intersection for all α, the problem does not have a solution (other than the trivial Ẽ_? = I), which means that the given POVM {E_i} cannot be used to unambiguously discriminate N messages. For each grouping such that K^α_i ≠ ∅, i = 1, 2, ..., N, compute s^α and pick the one, α*, that maximizes (17). Optimal detection is attained with the POVM measurement {Ẽ^{α*}_1, ..., Ẽ^{α*}_N, Ẽ^{α*}_?}, and the optimal encoding of each classical message i is provided by an eigenstate of P^{α*}_i Ẽ^{α*}_i P^{α*}_i with maximal eigenvalue (note that the states ρ_i can always be chosen to be pure).
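This procedure can be prototyped directly. The sketch below (our own illustration, reusing a hypothetical trine POVM) enumerates the groupings, builds the kernel projectors, and evaluates π · s^α:

```python
import numpy as np
from itertools import product

def kernel_projector(A, tol=1e-9):
    """Projector onto ker A for a Hermitian A >= 0."""
    w, V = np.linalg.eigh(A)
    K = V[:, w < tol]
    return K @ K.conj().T

def unambiguous_success(povm, priors):
    """Search the groupings of M outcomes into N conclusive sets plus an
    inconclusive one; return the best sum_i pi_i lambda_max(P_i E~_i P_i),
    with P_i projecting onto the intersection of the kernels of the other
    conclusive elements."""
    M, N, d = len(povm), len(priors), povm[0].shape[0]
    I, best = np.eye(d), 0.0
    for assign in product(range(N + 1), repeat=M):  # group N = inconclusive
        g = [sum((povm[k] for k in range(M) if assign[k] == j),
                 np.zeros((d, d))) for j in range(N + 1)]
        val, ok = 0.0, True
        for i in range(N):
            # intersection of kernels of E~_j, j != i: ker(I - E~_i - E~_?)
            P = kernel_projector(I - g[i] - g[N])
            if np.allclose(P, 0):  # empty intersection: grouping invalid
                ok = False
                break
            val += priors[i] * np.linalg.eigvalsh(P @ g[i] @ P)[-1]
        if ok and val > best:
            best = val
    return best

povm = [(2 / 3) * np.outer(v, v) for v in
        (np.array([np.cos(k * np.pi / 3), np.sin(k * np.pi / 3)])
         for k in range(3))]
p_s = unambiguous_success(povm, [0.5, 0.5])
```

For this trine example the optimal grouping uses one outcome per conclusive answer and the remaining one as the inconclusive element, giving p_s = 1/2.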
The above solution to the reverse unambiguous discrimination problem works for any POVM. In contrast, there is no known solution to the straight version of the same problem for an arbitrary ensemble of mixed input states (see, e.g., Ref. [8]). As in the case of minimum error discrimination, there are certain similarities between the problem of finding the optimal encoding for a given POVM and that of finding the optimal POVM for a given encoding: for the latter, the POVM elements {Ẽ_i} have to be chosen such that Ẽ_i ∈ ∩_{j≠i} ker ρ_j, which resembles the condition ρ_i ∈ ∩_{j≠i} ker Ẽ_j in the reverse problem. Furthermore, in the two problems one has to maximize the same success probability, ∑_i π_i Tr(ρ_i Ẽ_i), in which states and POVM elements play essentially the same role (they are interchangeable). Recall, however, that in the straight case the optimization has the additional constraint ∑_{i=1}^N Ẽ_i ≤ I, which makes the problem more difficult.

IV. MUTUAL INFORMATION
The problems considered in the previous sections characterize the ability of a POVM measurement to perform certain information readout tasks (e.g., minimum error discrimination or unambiguous message discrimination) with respect to a given source of classical messages described by the prior probabilities {π_i}. These results are strongly dependent on the source. For example, if the source consists of only a single message, each of the tasks can be accomplished with unit probability using any measurement. Such a source, however, is trivial as it contains no information. In this section, we consider a source-independent characterization of the ability of a measurement to extract information, provided by the maximum mutual information that can be established between the sender and the receiver over all possible sources and encodings at the sender's side, for the given POVM measurement at the receiver's side.
Consider an information source characterized by the probability distribution {π_i}, i = 1, ..., N, and an encoding i → ρ_i. The joint probabilities of the input messages and the outcomes of the POVM measurement {E_j}, j = 1, ..., M, are

P_ij = Tr(π_i ρ_i E_j).   (18)

The mutual information between the input and the output is given by

I(P) = ∑_{ij} P_ij log [P_ij / ((∑_k P_ik)(∑_l P_lj))].   (19)

We will be interested in the maximum of I(P) over all possible source distributions {π_i} and encoding strategies i → ρ_i,

C({E_j}) ≡ max_{{π_i, ρ_i}} I(P).   (20)

Note that, according to the data processing inequality, post-processing of information at the receiver's side cannot increase the mutual information, so in this case it cannot help to group POVM elements (or randomize outcomes).
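These definitions translate directly into code; a minimal sketch (our own, using a hypothetical orthogonal encoding as a sanity check):

```python
import numpy as np

def joint_matrix(priors, states, povm):
    """P_ij = Tr(pi_i rho_i E_j), Eq. (18)."""
    return np.array([[pi * np.trace(rho @ E).real for E in povm]
                     for pi, rho in zip(priors, states)])

def mutual_information(P):
    """I(P) in bits: sum_ij P_ij log2[P_ij / (row_i * col_j)]."""
    P = np.asarray(P, dtype=float)
    outer = P.sum(axis=1, keepdims=True) @ P.sum(axis=0, keepdims=True)
    m = P > 0
    return float((P[m] * np.log2(P[m] / outer[m])).sum())

# Orthogonal encoding read out by a projective measurement carries one bit.
states = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
I_bits = mutual_information(joint_matrix([0.5, 0.5], states, povm))
```

A product distribution gives zero mutual information, while the orthogonal example above attains exactly one bit.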
As shown by the following proposition, C({E_i}) has a natural interpretation as the capacity of the measurement {E_i}, which for all practical purposes can be modeled by a quantum channel of the form E(ρ) = ∑_j Tr(ρE_j)|j⟩⟨j|, where {|j⟩} are orthogonal states that carry the classical information about the outcome of the measurement.
Proposition 1. C({E_i}) is equal to the classical capacity of the channel

E(ρ) = ∑_j Tr(ρE_j)|j⟩⟨j|.   (21)

Proof. It is known [12,13] that the classical capacity of a quantum channel M over independent uses of the channel (i.e., when no entanglement between multiple inputs to the channel is allowed) is given by the quantity

χ(M) = max_{{π_i, ρ_i}} [S(M(∑_i π_i ρ_i)) − ∑_i π_i S(M(ρ_i))],   (22)

where S(ρ) = −Tr(ρ log ρ) denotes the von Neumann entropy of the state ρ. The general capacity of the channel, allowing possibly entangled inputs, is

C(M) = lim_{n→∞} χ(M^⊗n)/n,   (23)

where M^⊗n denotes n uses of the channel. For entanglement-breaking channels [14], such as the quantum-to-classical channel E(ρ) above, it has been shown that the quantity χ(E) is additive [15-17], in particular χ(E ⊗ E) = 2χ(E), which implies that

C(E) = χ(E).   (24)

Furthermore, for any input ensemble {π_i, ρ_i}, the channel E(ρ) outputs an ensemble of commuting quantum states, {π_i, E(ρ_i)}, and for such an ensemble it is easy to verify that the quantity

S(E(∑_i π_i ρ_i)) − ∑_i π_i S(E(ρ_i))

is equal to the mutual information in Eq. (19). The proposition then follows from the definitions (20) and (22).

A comment is in order here. The classical capacity of a channel is the maximum rate at which information can be transmitted reliably through the channel in the limit of infinitely many uses. Since the optimal measurement for extracting information from the channel E(ρ) is a projective measurement in the basis {|j⟩}, which preceded by E(ρ) is equivalent to the POVM measurement {E_j}, the quantity C({E_j}) is equal to the maximum rate at which information can be read reliably using the POVM {E_j}.
Corollary 1. We have

C({E_i}) ≥ A({Ȇ_i}),   (25)

where Ȇ_i ≡ E_i/d, and A({Ȇ_i}) denotes the accessible information of the ensemble with priors m_i = Tr(E_i)/d and states Ē_i = E_i/(m_i d).

Observe that we can write the joint probability (18) in the symmetric form

P_ij = d Tr[(π_i ρ_i) Ȇ_j].   (26)

Note, however, that the two problems are not identical, as the operators {E_i} satisfy a stronger constraint than the operators {ρ_i}: ∑_i E_i = I. (A strict duality transformation between signal ensembles and POVM measurements has been established in Refs. [18,19]. We will not be concerned with that correspondence here.) The above suggests that certain results in the study of the accessible information of an ensemble of states may prove useful for the study of the capacity of a measurement. For example, the symmetry of the problems and the difference in constraints imply the inequality (25). Therefore, any known lower bound on A can be used to obtain a lower bound on C. For example, the lower bound obtained in Ref. [20] yields

C({E_i}) ≥ Q(I/d) − ∑_i m_i Q(Ē_i),   (27)

where m_i = Tr(E_i)/d, Ē_i = E_i/(m_i d), and Q(ρ) is the subentropy of a density matrix ρ, which in terms of the eigenvalues λ_k of ρ reads [20]

Q(ρ) = −∑_k [∏_{l≠k} λ_k/(λ_k − λ_l)] λ_k log λ_k   (28)

(if two or more eigenvalues are equal, one takes the limit as they become equal).
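The subentropy in this bound is a simple function of the eigenvalues. The sketch below (our own) evaluates the standard Jozsa-Robb-Wootters expression in bits, assuming a nondegenerate spectrum:

```python
import numpy as np

def subentropy(eigvals):
    """Q(rho) = -sum_k [prod_{l != k} lam_k/(lam_k - lam_l)] lam_k log2(lam_k),
    valid for distinct nonzero eigenvalues (degenerate spectra need a limit)."""
    lam = np.asarray(eigvals, dtype=float)
    Q = 0.0
    for k, lk in enumerate(lam):
        coeff = np.prod([lk / (lk - ll)
                         for l, ll in enumerate(lam) if l != k])
        Q -= coeff * lk * np.log2(lk)
    return float(Q)

def shannon(eigvals):
    """Von Neumann entropy of a diagonal spectrum, in bits."""
    lam = np.asarray(eigvals, dtype=float)
    return float(-(lam * np.log2(lam)).sum())

Q = subentropy([0.75, 0.25])  # hypothetical qubit spectrum
S = shannon([0.75, 0.25])
```

The subentropy always lies between zero and the von Neumann entropy, which the example respects.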
Similarly, one may wonder if the Holevo quantity [21], which provides a simple upper bound on the accessible information A({Ȇ_i}), could also provide a useful bound for the capacity C({E_i}). As we will see below, however, this quantity is neither an upper nor a lower bound to C({E_i}).
Proposition 2. The maximum in Eq. (20) can be achieved with an ensemble of pure input states ρ_i = |ψ_i⟩⟨ψ_i|. Furthermore, the number N of input states can be made to satisfy d ≤ N ≤ d².
This proposition is similar to Theorem 3 in Ref. [2], where it is shown that for a given ensemble of input states, the optimal POVM measurement can be taken to have rank-one POVM elements whose number M satisfies d ≤ M ≤ d².
Proof. As noted in Ref. [2], I(P) is a convex function over the convex set of N × M probability matrices P with fixed row sums. By a similar argument, I(P) is a convex function over the convex set of N × M probability matrices P with fixed column sums. This implies that if P′ is an (N − 1) × M probability matrix obtained from P by replacing two rows by their row sum, then I(P′) ≤ I(P), with equality when the two rows are proportional. Therefore, for any input ensemble {π_i, ρ_i}, where ρ_i = ∑_k p_ik |ψ_ik⟩⟨ψ_ik|, we can consider the pure-state ensemble {π_i p_ik, |ψ_ik⟩⟨ψ_ik|}, which has mutual information with the output no less than that of {π_i, ρ_i}. (Note that we can assume that no two states |ψ_ik⟩⟨ψ_ik| are identical, since if they are, we can combine them into a single state with prior probability equal to the sum of their prior probabilities, which does not change the mutual information.) Hence, the maximum in Eq. (20) is attained for an ensemble of distinct pure states.
Next, observe that Eq. (20) can be written as

C({E_j}) = max_ρ max_{{π_i, ψ_i}_ρ} I(P),   (31)

where the left maximization is over all density matrices ρ, and the right maximization is over all ensembles {π_i, ψ_i}_ρ of pure states ψ_i ≡ |ψ_i⟩⟨ψ_i| whose averages are equal to ρ, ∑_i π_i |ψ_i⟩⟨ψ_i| = ρ. (We note that the quantity max_{{π_i,ψ_i}_ρ} I(P) for a fixed ρ has been previously considered in relation to methods for obtaining bounds on the mutual information [19].) Following closely the proof in Ref. [2], we will show that for any ρ, max_{{π_i,ψ_i}_ρ} I(P) can be achieved by an ensemble of at most d² states. Indeed, the latter maximization is equivalent to a maximization over the convex set Y of probability distributions with finite support on the set of pure states, whose average is equal to ρ. Note that the different ensembles {π_i, ψ_i}_ρ give rise to joint probability matrices P with fixed column sums equal to Tr(ρE_j), which according to the convexity property pointed out earlier implies that I(P) is a convex function on Y. Hence, the maximum is achieved for an extreme point of Y, which by Caratheodory's theorem can be shown to be a probability distribution whose support has ≤ 1 + dim A points, where A is the convex set of density operators of which the pure states we are considering are extreme points. Since dim A = d² − 1, we obtain N ≤ d².
To show that in general d ≤ N, we will use the fact that for every d there are certain types of POVMs for which the optimal ρ in Eq. (31) is full-rank (in particular, we will show below (Theorem 1) that when the POVM is covariant under an irreducible representation of a finite group, the maximum in Eq. (31) is achieved for ρ = I/d).
We next consider the case of a group covariant POVM measurement, which is dual to the problem of accessible information for a group covariant input ensemble [2]. For this purpose, we need to introduce some terminology. Let S denote the set of all states on a Hilbert space H of dimension d. Following Ref. [2], we will regard a representation R of a group G as a homomorphism from G to the affine automorphisms of S, where every such automorphism is representable in the form α(ρ) = UρU†, with U being a unitary or an antiunitary operator (we will consider the action of R automatically extended to all operators over H by linearity). A representation of G is irreducible if the only G-invariant point of S is I/d.
We will say that the POVM {E_j}, j = 1, ..., M, is G-covariant if there exists a surjection f : G → {E_j}, where we denote f(g) := E_g, such that R_g(E_h) = E_{gh}, ∀g, h ∈ G. Note that every element E_j must equal E_g for at least one g ∈ G, but this correspondence may be degenerate, i.e., a given E_j may be associated with two or more elements of the group. The fact that G is a group implies that this degeneracy must be the same for every element E_j, and hence M must be a divisor of |G|.
Theorem 1 (The group covariant case). If the POVM {E_j} is covariant with respect to the finite group G that has an irreducible representation R on S, then there exists a pure state |ψ⟩⟨ψ|, ⟨ψ|ψ⟩ = 1, such that the maximum in Eq. (20) is achieved by the covariant ensemble of pure input states {|G|^{-1}, R*_g(|ψ⟩⟨ψ|)}, where |G| is the number of elements of G, and R* denotes the representation of G dual to R. The capacity of {E_j} is

C({E_j}) = log M + max_{|ψ⟩} ∑_{j=1}^M Tr(|ψ⟩⟨ψ|E_j) log Tr(|ψ⟩⟨ψ|E_j).   (32)

Proof. Let {π_i, ψ_i} be an ensemble of pure input states that maximizes the mutual information for the given covariant POVM measurement {E_j}. Construct a new input ensemble {π̃_ig, ψ̃_ig}, where

π̃_ig = |G|^{-1} π_i,   ψ̃_ig = R*_g(ψ_i).   (33)

The new probability matrix P̃ obtained using this ensemble has the form

P̃ = |G|^{-1} (P_1^T, P_2^T, ..., P_{|G|}^T)^T,   (34)

where each of the probability matrices P_1, P_2, ..., P_{|G|} is obtained from P by a permutation of the rows and columns of P, and the column sums of P̃ are all equal to |G|^{-1}. A straightforward calculation shows that the new probability matrix yields a value for the mutual information which is no less than that obtained for P, i.e.,

I(P̃) ≥ I(P).   (35)

Now, consider the covariant input ensembles

{|G|^{-1}, ψ̃_ig}_{g∈G},   i = 1, ..., N.   (36)

Let us denote the probability matrices that each of these ensembles yields by P̃_i. Since the ensemble {π̃_ig, ψ̃_ig} is a convex combination of the ensembles {|G|^{-1}, ψ̃_ig}_{g∈G} with weights π_i, and the mutual information is a convex function of the input ensemble, we obtain

I(P) ≤ I(P̃) ≤ ∑_i π_i I(P̃_i) ≤ max_i I(P̃_i),   (37)

i.e., the maximum in Eq. (20) is achieved for one of the covariant input ensembles {|G|^{-1}, ψ̃_ig}_{g∈G}, which has the form stated in the theorem. The value of the capacity [Eq. (32)] is obtained by a straightforward calculation, taking into account the possible degeneracy in the correspondence between the group elements and the POVM elements.
Notice that since the average of a group covariant ensemble is G-invariant, from the irreducibility of R it follows that ∑_g |G|^{-1} ψ̃_ig = I/d. This shows that indeed for every d there are POVM measurements that require at least d optimal input states, as argued in the proof of Proposition 2.
Comment. The optimal "seed" |ψ⟩⟨ψ| may be such that the input ensemble {|G|^{-1}, R*_g(|ψ⟩⟨ψ|)} contains identical states, i.e., it may be that R*_g(|ψ⟩⟨ψ|) = R*_h(|ψ⟩⟨ψ|) for certain g ≠ h. The fact that G is a group implies that each maximal set of identical states in the ensemble must contain the same number of elements (and hence the number N of distinct states in the ensemble must be a divisor of |G|). It is straightforward to see that the ensemble {N^{-1}, |ψ_i⟩⟨ψ_i|} obtained from {|G|^{-1}, R*_g(|ψ⟩⟨ψ|)} by identifying the identical states and redefining their probabilities as the sum of the original probabilities is also optimal. This is because the joint probabilities resulting from the input ensemble {N^{-1}, |ψ_i⟩⟨ψ_i|} can be transformed into those resulting from {|G|^{-1}, R*_g(|ψ⟩⟨ψ|)} by local post-processing on the sender's side, which cannot increase the mutual information. Hence, the number of states in the optimal ensemble in general may be smaller than |G| (just as the number of outcomes of a group covariant POVM may be smaller than |G|). This is the case, for example, with the optimal ensemble for the two-dimensional SIC-POVM studied in Section V C, which has 4 elements while the symmetry group has 12.

Corollary 2. In the group covariant case, we have

C({E_i}) = A({Ȇ_i}).   (38)

Moreover, if the POVM measurement {F_j} optimizes the mutual information for the input ensemble {Ȇ_i}, then the input ensemble {F̆_j}, where F̆_j ≡ F_j/d, optimizes the mutual information for the measurement {E_i}.
Since under this symmetry the problem is equivalent to that of accessible information of a covariant input ensemble, any known results in the latter case can be applied here (see, e.g., Ref. [2]). In particular, in Section V C we calculate the capacity of the two-dimensional SIC-POVM.
Another important case in which calculating the capacity of a measurement reduces to a well-known problem is that of a POVM {E_i} with commuting elements, [E_i, E_j] = 0, ∀i, j. In this case, we can assume that the optimal signal states ρ_i are diagonal in the common eigenbasis of {E_j}, since for any ρ, the state ρ′ = diag(ρ_nn), where ρ_nn are the diagonal elements of ρ in that eigenbasis, yields the same joint probabilities Tr(ρE_j). Furthermore, as we saw in the proof of Proposition 2, the optimal input ensemble can be taken to consist of the eigenstates of all ρ_i, which means that the maximum in Eq. (20) is achieved for an ensemble of input states drawn from the common eigenbasis of {E_i}. Hence the joint probabilities are P_ij = π_i λ^i_j, where λ^i_j is the eigenvalue of E_j corresponding to the i-th common eigenvector, and the problem reduces to finding max_{π_i} I(P), which is the capacity of the classical channel described by the conditional probability matrix p(j|i) = λ^i_j.

Note that a measurement with two outcomes necessarily has commuting POVM elements, i.e., the capacity of a two-outcome measurement is always equal to the capacity of a classical channel with a binary output. Thus, for example, the capacity of a two-outcome qubit measurement with elements E_1 = diag(α, β), E_2 = diag(1 − α, 1 − β) in some basis can be obtained from the closed-form expression for the capacity of a general binary channel [31], in which h(p) = −p log p − (1 − p) log(1 − p) is the entropy of a binary source; the optimal prior distribution in this case has the form {p, 1 − p}, with p determined by α and β.

We can now see that, as mentioned earlier, the naively constructed Holevo quantity in general is neither an upper nor a lower bound on C({E_i}). Indeed, it is known that the accessible information of an ensemble of density matrices is equal to the Holevo quantity of the ensemble if and only if all density matrices in the ensemble commute, in which case the maximal value of the mutual information is attained for a projective measurement in the common eigenbasis of the input ensemble.
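As a sanity check, the capacity of such a two-outcome measurement can be computed numerically by maximizing the mutual information over the binary prior. The sketch below is ours, not from the text: the ternary-search optimizer and the example values α = 0.9, β = 0.1 are illustrative choices; for this symmetric case the result must reduce to the familiar 1 − h(0.1).

```python
import math

def h(p):
    """Binary entropy in bits; h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_info(p, a, b):
    """I between the input {state 1 w.p. p, state 2 w.p. 1-p} and the outcome,
    where p(outcome 1 | state 1) = a and p(outcome 1 | state 2) = b."""
    q = p * a + (1 - p) * b          # total probability of outcome 1
    return h(q) - p * h(a) - (1 - p) * h(b)

def binary_capacity(a, b, iters=200):
    """Maximize the concave function I(p) over the prior p by ternary search."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if mutual_info(m1, a, b) < mutual_info(m2, a, b):
            lo = m1
        else:
            hi = m2
    return mutual_info((lo + hi) / 2, a, b)

# Symmetric example alpha = 0.9, beta = 0.1: capacity = 1 - h(0.1)
print(binary_capacity(0.9, 0.1))  # ≈ 0.531
```

Since I(p) is concave in the prior, the one-dimensional ternary search converges to the global maximum.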
From the symmetry of the problem we see that for a POVM with commuting elements, the quantity S(Σ_i m_i Ē_i) − Σ_i m_i S(Ē_i) is equal to the mutual information between the equiprobable input ensemble of common eigenstates of {E_i} and the outcomes of the measurement {E_i}. However, from Eq. (40) it can be seen that an equiprobable prior distribution is generally suboptimal in this case, i.e., the quantity S(Σ_i m_i Ē_i) − Σ_i m_i S(Ē_i) can be strictly smaller than C({E_i}). On the other hand, in the group covariant case we have C({E_i}) equal to the accessible information of the corresponding equiprobable ensemble, as stated in Corollary 2.

We remark that the maximal possible mutual information for an input ensemble of states on a Hilbert space of dimension d and any POVM measurement is log d. This can easily be seen from Holevo's upper bound on the accessible information [21]. Moreover, this value is achievable only by an ensemble of pure commuting input states that sum up to the maximally mixed state, i.e., by an equiprobable ensemble of orthogonal basis states.
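The suboptimality of the equiprobable prior for a commuting two-outcome POVM is easy to exhibit numerically. In the sketch below the asymmetric values α = 0.9, β = 0.3 are our illustrative choice: the mutual information at the uniform prior falls strictly short of the maximum over priors.

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p <= 0 or p >= 1 else -p*math.log2(p) - (1-p)*math.log2(1-p)

def I(p, a, b):
    """Mutual information for prior {p, 1-p} and conditionals a, b."""
    q = p*a + (1-p)*b
    return h(q) - p*h(a) - (1-p)*h(b)

# Two-outcome qubit POVM E1 = diag(alpha, beta), E2 = I - E1; the inputs are
# the common eigenstates |0>, |1>, so p(outcome 1 | input i) = alpha, beta.
alpha, beta = 0.9, 0.3   # an asymmetric example (illustrative values)

equiprobable = I(0.5, alpha, beta)                       # the "naive" quantity
best = max(I(k/10000, alpha, beta) for k in range(10001))  # grid over priors
print(equiprobable, best)   # the uniform prior falls short of the capacity
assert best > equiprobable + 1e-4
```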
The unique optimal measurement for such an ensemble is a projective measurement onto the basis in question. Conversely, any rank-one projective measurement has capacity log d, which is achieved by the equiprobable input ensemble of the corresponding basis states. Hence, rank-one projective measurements have the highest capacity.
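This can be checked directly: a rank-one projective measurement applied to the equiprobable ensemble of the same basis states generates the joint probability matrix P_ij = δ_ij/d, whose mutual information is exactly log d. A minimal numerical check (the dimension d = 5 is an arbitrary choice):

```python
import math

d = 5  # any dimension
# Equiprobable ensemble of the d orthogonal basis states, measured by the
# rank-one projective measurement onto the same basis: P_ij = delta_ij / d.
P = [[(1.0/d if i == j else 0.0) for j in range(d)] for i in range(d)]

def mutual_information(P):
    """I(X;Y) in bits for a joint probability matrix P."""
    pi = [sum(row) for row in P]                                   # input marginal
    pj = [sum(P[i][j] for i in range(len(P))) for j in range(len(P[0]))]
    return sum(P[i][j] * math.log2(P[i][j] / (pi[i] * pj[j]))
               for i in range(len(P)) for j in range(len(P[0])) if P[i][j] > 0)

print(mutual_information(P))  # log2(5) ≈ 2.3219
assert abs(mutual_information(P) - math.log2(d)) < 1e-12
```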

V. EXAMPLE: THE SIC-POVM ON A QUBIT
In this section, we apply the above results to the case of a symmetric informationally complete (SIC) POVM on a qubit, as well as to a noisy, or unsharp, version of this POVM. A SIC-POVM [22,23] in dimension d consists of a set of d² rank-one positive operators, E_i = (1/d)|ψ_i⟩⟨ψ_i|, where the pure states |ψ_i⟩ are such that |⟨ψ_i|ψ_j⟩|² = 1/(d + 1) for i ≠ j. The measurement is called "complete" in the sense that its statistics are sufficient for full tomography of any quantum state [24,25]. SIC-POVMs are of particular interest due to their various applications in quantum information, including quantum tomography [26], quantum cryptography [27], and the foundations of quantum mechanics [28].
Up to a change of basis, the POVM elements of such a measurement for d = 2 can be written as E_i = (1/4)(I + n_i · σ), i = 1, ..., 4, (43) where σ is the vector of Pauli matrices and the unit Bloch vectors n_i point at the vertices of a regular tetrahedron, n_i · n_j = −1/3 for i ≠ j. In order to illustrate the relation between the "sharpness" of a measurement and its ability to read out information, we will consider a more general, noisy version of the above SIC-POVM, where each outcome is mixed with some amount of white noise: E_i(η) = η E_i + (1 − η) I/4, i = 1, 2, 3, 4, 0 ≤ η ≤ 1. (44)
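The structure of this family of POVMs is easy to verify numerically. The sketch below uses one conventional choice of tetrahedron vectors (the explicit form (43) is fixed only up to a global rotation, so this choice is an assumption) and checks completeness, positivity, and the defining overlap condition |⟨ψ_i|ψ_j⟩|² = 1/3.

```python
import numpy as np

# Pauli matrices and one conventional choice of tetrahedron Bloch vectors
# (any rigid rotation of these vectors works equally well).
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
n = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

def E(i, eta=1.0):
    """Noisy SIC-POVM element E_i(eta) = eta*E_i + (1 - eta)*I/4."""
    Ei = 0.25 * (I2 + n[i, 0]*sx + n[i, 1]*sy + n[i, 2]*sz)
    return eta * Ei + (1 - eta) * I2 / 4

for eta in (1.0, 0.5, 0.0):
    S = sum(E(i, eta) for i in range(4))
    assert np.allclose(S, I2)                                    # completeness
    assert all(np.linalg.eigvalsh(E(i, eta))[0] >= -1e-12
               for i in range(4))                                # positivity

# Symmetry: |<psi_i|psi_j>|^2 = (1 + n_i . n_j)/2 = 1/3 for i != j
for i in range(4):
    for j in range(4):
        if i != j:
            assert abs((1 + n[i] @ n[j]) / 2 - 1/3) < 1e-12
```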
When η = 1, the measurement reduces to the ideal SIC-POVM [E_i(1) ≡ E_i], while as η → 0 the measurement becomes infinitesimally weak [29], approaching a trivial measurement whose outcomes each occur with probability 1/4 independently of the input state. In this sense, η can be regarded as parameterizing the "sharpness" or "strength" of the measurement [30].

A. Minimum error discrimination
For simplicity, let us start with the noiseless SIC-POVM (41). Given the symmetry of the problem, it is enough to consider four groupings, α ∈ {A, B, C, D}: A leaves all four outcomes ungrouped; B groups one pair of outcomes and leaves the remaining two sharp; C groups the outcomes into two pairs; and D groups three outcomes together and leaves one sharp. It is understood that the corresponding vectors are padded with extra zeros if the number of signal states exceeds four (N > 4). For equiprobable signals, π_i = 1/N, the optimal success probability is p_s = 1/2 + 1/(2√3) for N = 2, p_s = 1/3 + (1 + 1/√3)/6 for N = 3, and p_s = 2/N for N ≥ 4, attained by the groupings C, B and A, respectively. That is, for four signals (N = 4) no grouping is necessary, and the signal states have to be chosen to point along the directions of the SIC-POVM (43). Any additional signals (N > 4) can be assigned to arbitrary states and will never contribute to the success probability. For N = 3 one has to group two POVM elements, leading to an unsharp effective measurement, and leave the remaining two outcomes ungrouped (i.e., sharp). In that case the three signals lie on a plane: two signals point along, say, n_1 and n_2 (corresponding to the sharp POVM elements), and the third points along −(n_1 + n_2). For N = 2 the optimal strategy is to encode the signals into orthogonal states pointing along the directions resulting from the pairwise groupings, e.g., n_1 + n_2 and n_3 + n_4 = −(n_1 + n_2).
In Figure 1 we show the optimality regions for N = 3 and different priors. Within the region π_1 ≥ π_2 ≥ π_3, delimited by a dashed outline in the figure, we observe that the set of points where each particular grouping is dominant is a convex polytope; the corresponding maximum success probabilities follow from the eigenvalue expressions above evaluated at the respective groupings. Note that regions C and D correspond to groupings where no outcome is assigned to the third signal state. This illustrates the fact that there are cases (regions C and D) where it pays not to assign any measurement outcome to some of the messages (i = 3 in this example), even though the source emits them with non-zero prior probability. In particular, if the source is strongly biased towards one message (π_1 ≥ 1/√3 in this example), all but one measurement outcome will be assigned to it (message i = 1).
In order to study the effect of noise, η < 1 in (44), one proceeds along the same lines as above. We first note that since the noise is isotropic, the optimal signal states, i.e., the eigenvectors with maximum eigenvalue of the sums of POVM elements in each grouping A to D, are the same as in the sharp case, and thus independent of the sharpness parameter η. Their corresponding maximum eigenvalues in (46) now acquire a noisy component that scales with the number k_αi of elements in those sums. More precisely, the vectors of eigenvalues now have components η(s_α)_i + k_αi (1 − η)/4. For equiprobable signals, π_i = 1/N, the optimal groupings are those that are optimal in the sharp case; thus, they do not depend on η, only on the number N of input states. The maximum success probabilities are now: p_s = 1/2 + η/(2√3) for N = 2, p_s = 1/3 + η(1 + 1/√3)/6 for N = 3, and p_s = (1 + η)/N for N ≥ 4.
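These success probabilities can be confirmed numerically: for equiprobable signals, the optimal p_s of a grouping is the average of the largest eigenvalues of the summed POVM elements. The sketch below (using the same conventional tetrahedron vectors as before, an assumption up to a basis choice) checks the three formulas for several values of η.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
n = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

def E(i, eta):
    """Noisy SIC-POVM element E_i(eta)."""
    Ei = 0.25 * (I2 + n[i, 0]*sx + n[i, 1]*sy + n[i, 2]*sz)
    return eta * Ei + (1 - eta) * I2 / 4

def p_success(groups, eta):
    """Equiprobable signals: p_s = (1/N) * sum over groups of the largest
    eigenvalue of the summed POVM elements (optimal signal = top eigenvector)."""
    N = len(groups)
    return sum(np.linalg.eigvalsh(sum(E(i, eta) for i in g))[-1]
               for g in groups) / N

for eta in (1.0, 0.7, 0.3):
    # grouping C (N = 2), grouping B (N = 3), grouping A (N = 4)
    assert abs(p_success([(0, 1), (2, 3)], eta)
               - (1/2 + eta/(2*np.sqrt(3)))) < 1e-12
    assert abs(p_success([(0, 1), (2,), (3,)], eta)
               - (1/3 + eta*(1 + 1/np.sqrt(3))/6)) < 1e-12
    assert abs(p_success([(0,), (1,), (2,), (3,)], eta)
               - (1 + eta)/4) < 1e-12
```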
In more generic cases, when the source emits symbols with arbitrary prior probabilities, the regions of optimality do depend on the noise or sharpness parameter η. For ternary sources, N = 3, it is straightforward to show that the overall structure of the optimality regions remains that of Figure 1, but the point P(η) where B, C and D intersect moves monotonically from P(1) = {2√3 − 3, 9 − 5√3} (when the POVM is sharp) to P(0) = {1/3, 1/3} (when it is maximally unsharp).

B. Unambiguous discrimination
We now turn to unambiguous discrimination with the SIC-POVM on a qubit. Clearly, the slightest amount of noise (η < 1) ruins any possibility of unambiguous discrimination, since then any signal state can trigger each of the outcomes with a non-zero probability. We thus concentrate on the ideal, sharp SIC-POVM. In a two-dimensional Hilbert space one can only hope to unambiguously discriminate two states (N = 2, 0 < π_1 < 1), hence grouping A can be excluded as it has too many outcomes. Moreover, we need only consider groupings B and D, since only they contain at least one rank-one POVM element and therefore have a nontrivial kernel. If grouping B is used, two messages can be unambiguously identified by choosing the signals in the kernels of E_3 and E_4, respectively, that is, ρ_1 = (I − n_3 · σ)/2 and ρ_2 = (I − n_4 · σ)/2, so that outcome 4 can only be triggered by ρ_1 and outcome 3 only by ρ_2 (i.e., Ẽ_1 = E_4, Ẽ_2 = E_3 and Ẽ_? = E_1 + E_2). This leads to a probability of successful identification P_s = 1/3, which is independent of the prior probabilities {π_1, π_2}. Proceeding along the same lines, one finds that with grouping D one can only unambiguously identify the state ρ_1 = (I + n_4 · σ)/2 with Ẽ_1 = E_4, by excluding ρ_2 = (I − n_4 · σ)/2 (i.e., ρ_2 ∈ ker Ẽ_1), while all other outcomes of the original POVM are necessarily inconclusive (Ẽ_? = I − E_4). Obviously, no outcome is associated with message i = 2 (Ẽ_2 = 0). The success probability is P_s = π_1/2, which beats that of grouping B for π_1 > 2/3.
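Both success probabilities follow from simple Bloch-vector algebra, Tr(ρE_j) = (1 + v · n_j)/4 for ρ with Bloch vector v, and can be verified numerically (the tetrahedron vectors below are a conventional choice, and the indices are zero-based):

```python
import numpy as np

n = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

def tr_rho_E(v, m):
    """Tr(rho E_m) for rho = (I + v.sigma)/2 and E_m = (I + n_m.sigma)/4."""
    return (1 + v @ n[m]) / 4

# Grouping B: signals in ker E_3 and ker E_4; outcome 4 identifies rho_1,
# outcome 3 identifies rho_2, outcomes 1 and 2 are inconclusive.
v1, v2 = -n[2], -n[3]                    # Bloch vectors of rho_1, rho_2
assert abs(tr_rho_E(v2, 3)) < 1e-12      # rho_2 never triggers outcome 4
assert abs(tr_rho_E(v1, 2)) < 1e-12      # rho_1 never triggers outcome 3
P_B = lambda pi1: pi1 * tr_rho_E(v1, 3) + (1 - pi1) * tr_rho_E(v2, 2)
assert abs(P_B(0.2) - 1/3) < 1e-12       # P_s = 1/3 for any prior
assert abs(P_B(0.9) - 1/3) < 1e-12

# Grouping D: only rho_1 = (I + n_4.sigma)/2 is identified (by outcome 4);
# rho_2 = (I - n_4.sigma)/2 lies in ker E_4. Success probability pi_1 / 2.
P_D = lambda pi1: pi1 * tr_rho_E(n[3], 3)
assert abs(P_D(0.8) - 0.4) < 1e-12
# D beats B exactly when pi_1 > 2/3:
assert P_D(0.7) > P_B(0.7) and P_D(0.6) < P_B(0.6)
```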

C. Mutual information
The SIC-POVM on a qubit, including its noisy version, is covariant under the tetrahedral group (indeed, the tips of the Bloch vectors (43) corresponding to the POVM elements define the vertices of a tetrahedron). Therefore, according to Theorem 1 in Section IV, the mutual information for this POVM is maximized by an ensemble of pure input states possessing the same symmetry. Its maximal value, i.e., the capacity of the measurement, is given by Eq. (32) for a state ψ from the optimal ensemble (all other states in the ensemble are obtained from ψ by applying operators of the symmetry group, i.e., ψ plays the role of a "seed" for the ensemble).
Theorem 2 (Capacity of the noisy two-level SIC-POVM). For every value of η ∈ [0, 1], the seed ψ that maximizes expression (32) can be chosen such that its Bloch vector is anti-parallel to the Bloch vector of any one of the four POVM elements (44), i.e., v = −n_j. The capacity of the (generally noisy) SIC-POVM is C_η = log 4 + [(1 − η)/4] log[(1 − η)/4] + [(3 + η)/4] log[(3 + η)/12]. (50) This result, which applies to both the straight and reverse formulations of the problem, is interesting in its own right. As far as we are aware, previous results (for η = 1) relied on numerical optimization [2]. Here we provide an analytical proof for 0 ≤ η ≤ 1.
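Since the output distribution of a covariant input ensemble is uniform (2 bits of output entropy), the mutual information of the anti-aligned ensemble {1/4, v_i = −n_i} reduces to 2 minus the conditional entropy of a single row, p(j|i) = (1 − η)/4 for j = i and (3 + η)/12 for j ≠ i. The sketch below evaluates this quantity in bits and checks the limiting values and monotonicity; that it equals the capacity C_η is the content of Theorem 2 (the function name C below is our shorthand).

```python
import math

def C(eta):
    """Mutual information (bits) of the anti-aligned covariant ensemble
    {1/4, v_i = -n_i} measured with the noisy SIC-POVM:
    p(j|i) = (1 - eta)/4 for j = i and (3 + eta)/12 for j != i."""
    cond_entropy = 0.0
    for p in [(1 - eta)/4] + 3*[(3 + eta)/12]:
        if p > 0:
            cond_entropy -= p * math.log2(p)
    return 2.0 - cond_entropy   # output marginal is uniform: H(out) = 2 bits

# Matches the closed form (50):
#   C_eta = 2 + (1-eta)/4 * log2((1-eta)/4) + (3+eta)/4 * log2((3+eta)/12)
assert abs(C(1.0) - math.log2(4/3)) < 1e-12   # ideal SIC-POVM
assert abs(C(0.0)) < 1e-12                    # trivial measurement
vals = [C(k/100) for k in range(101)]
assert all(x < y + 1e-15 for x, y in zip(vals, vals[1:]))  # monotone in eta
```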
Proof. Let us define the auxiliary quantities entering Eqs. (51)-(53). We will first show that the inequality (51) holds for −1 ≤ t ≤ 1 and 0 ≤ η ≤ 1; here h′ denotes the derivative of h with respect to its argument.
We can now turn to proving (51). We assume that η > 0, since η = 0 is a trivial case. If f(t) = h(ηt) − ℘_η(t), then it follows from the expression for its derivative that there is only one value of t at which f′(t) vanishes. Using (53), we then see that the desired bound follows. This bound is attained with any one of the four choices v = −n_j. The value of the capacity (50) is obtained by a straightforward substitution.

Note that in the minimum error scenario the optimal signal ensemble is such that each state and its corresponding POVM element have maximum overlap (i.e., they are aligned with each other). In contrast, here we find that it pays to have a signal ensemble in which each state would be excluded by one of the POVM outcomes in the absence of noise (i.e., states and POVM elements are anti-aligned). This configuration minimizes the (average) conditional entropy of the output (the POVM outcomes) given the input signal ensemble [recall that the mutual information (32) can be obtained by subtracting this conditional entropy from the entropy of the output, which is constant here].
As expected, the capacity attains its maximal value C_1 = log(4/3) for η = 1 (the ideal SIC-POVM) and monotonically decreases towards 0 as η approaches 0. Note that, as pointed out in Corollary 2, the capacity of such a group covariant POVM is equal to the accessible information of an equiprobable ensemble of states proportional to the original POVM elements. The latter problem, in the case η = 1, was studied in Ref. [2], where it was shown that the accessible information of the corresponding ensemble is A = log(4/3), which is equal to C_1. The capacity of the ideal SIC-POVM has also been obtained previously in Ref. [19] by a different approach.

VI. CONCLUSION
In summary, we have studied the problem of optimal signal states for information readout with a given quantum detector. We considered some of the most common information transmission problems: the Bayes cost problem, unambiguous message discrimination, and the maximal mutual information. We provided solutions to the Bayesian and unambiguous discrimination problems. We also showed that the maximal mutual information is equal to the classical capacity of the measurement and studied its properties in certain special cases. For a group covariant measurement, we showed that the problem is equivalent to the problem of the accessible information of a group covariant ensemble of states. As an example, we applied our results for the different discrimination strategies to the case of a SIC-POVM on a qubit, including a noisy version of that POVM.
An interesting question for future investigation is if and under what conditions the optimal solutions provided here are unique. Another question of significant interest would be to obtain an upper bound on the capacity of a measurement. We provided a lower bound, obtained from a lower bound on the accessible information, but that bound could likely be improved. It would also be interesting to investigate the continuity properties of the optimal quantities considered in this paper. For example, if two measurements are close in terms of the distance functions introduced in Ref. [32], are their capacities also close?
Finally, we note that the capacity of a POVM provides a very natural and source-independent means of quantitatively characterizing a generalized quantum measurement. However, it cannot serve as the unique figure of merit against which measurement devices should be benchmarked. Ultimately, the performance of a given measurement apparatus strongly depends on the task it is meant to accomplish. For instance, a noisy Stern-Gerlach measurement might have a higher capacity than an ideal SIC-POVM; however, it would be misleading to claim that such a Stern-Gerlach measurement outperforms the SIC-POVM, since the latter can carry out tasks, such as full single-qubit tomography or unambiguous state discrimination, that are impossible to achieve with the former.
Note added. Almost simultaneously with the posting of this paper, two concurrent works appeared, by M. Dall'Arno, G. M. D'Ariano, and M. F. Sacchi (Ref. [33]) and by A. S. Holevo (Ref. [34]), which also introduce and study the capacity of a POVM measurement.