Quantum probability rule: a generalisation of the theorems of Gleason and Busch

Busch's theorem deriving the standard quantum probability rule can be regarded as a more general form of Gleason's theorem. Here we show that a further generalisation is possible by reducing the number of quantum postulates used by Busch. We do not assume that the positive measurement outcome operators are effects or that they form a probability operator measure. We derive a more general probability rule from which the standard rule can be obtained from the normal laws of probability when there is no measurement outcome information available, without the need for further quantum postulates. Our general probability rule has prediction-retrodiction symmetry and we show how it may be applied in quantum communications and in retrodictive quantum theory.


Introduction
In the probabilistic interpretation of quantum measurement we have on one hand the physical process of preparing a system in some state and then performing a measurement procedure with the outcomes recorded, allowing probabilities which depend both on the measurement procedure and on the preparation process to be determined from the records of many experiments. On the other hand we have the mathematics of Hilbert space entities. To link the two it is axiomatic that there must be some postulate connecting a Hilbert space entity with something physical. The standard quantum probability rule that does this has been highly successful for predicting the outcomes of measurements. This rule could simply be accepted as the required postulate but it may be possible to obtain a better understanding of quantum theory if the rule could be deduced from more fundamental quantum postulates. Gleason's theorem shows, given reasonable assumptions, that quantum probabilities must be expressible as expectation values of projectors or, more precisely, as the trace of the product of a projector and a density operator [1]. This fundamental theorem is of central importance in quantum theory but although it is discussed in some textbooks [2,3] a derivation of it rarely appears, doubtless because of the complexity of Gleason's proof.
Busch has provided a remarkable extension of Gleason's theorem [4]. It is remarkable in three ways: (i) it applies to state spaces of any dimension whereas Gleason's proof only applies for dimensions greater than two, (ii) it extends Gleason's proof by including generalised measurements [3,5,6,7] as well as projective ones and (iii) it is far simpler than Gleason's original proof [4].
Busch associates an outcome m from a measurement with an effect $\hat{E}$, that is, a positive operator bounded above by the identity, which can therefore be an element of a probability operator measure (POM), often also referred to as a positive operator-valued measure (POVM). He equates the measurement outcome probability p(m|s) for a system prepared in some state s with the value of a function $v(\hat{E})$, which he requires to have the following three properties:

(P1) $0 \le v(\hat{E}) \le 1$,
(P2) $v(\hat{I}) = 1$,
(P3) $v(\hat{E} + \hat{F} + \cdots) = v(\hat{E}) + v(\hat{F}) + \cdots$,

where $\hat{F}, \ldots$ are also effects. The sum of the effects in (P3) must not exceed $\hat{I}$. It should be emphasised that these properties are familiar in the theory of generalised measurements, but there they are derived on the basis of quantum theory [5,6]. Busch's aim was, and indeed ours is, rather different: the intention is to postulate these properties as axioms and to derive quantum probabilities from them. It is not difficult to show from the normalisation condition for the probabilities of all possible outcomes, combined with (P2) and the additivity condition (P3), that the sum of the effects must be the identity, that is, the unit operator. Thus the effects representing all possible outcomes for a system in state s form a POM. These conditions are also consistent with the probability p(m|s) given by $v(\hat{E})$ being non-contextual, in the sense that it has this value independently of the particular POM to which $\hat{E}$ belongs; that is, it is independent of the particular measuring device as long as the outcome is represented by $\hat{E}$. To see this, let $\hat{E}$ belong to two different POMs whose remaining elements are $\hat{F}_1, \hat{F}_2, \ldots$ and $\hat{G}_1, \hat{G}_2, \ldots$, corresponding to measuring devices f and g respectively. Then from normalisation and additivity we have

$$p(m|s,f) = 1 - \sum_i v(\hat{F}_i) = 1 - v\Big(\sum_i \hat{F}_i\Big), \qquad (1)$$

with a corresponding expression for p(m|s,g). Because the elements of each POM must sum to the unit operator, $\sum_i \hat{F}_i = \sum_j \hat{G}_j$, so p(m|s,f) = p(m|s,g).
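The non-contextuality argument can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses NumPy, takes v to be the standard rule Tr(Ê ρ̂) purely as a stand-in (this is the form Busch derives, not one of his postulates), and uses hypothetical qubit effects chosen so that the same effect Ê completes two different POMs.

```python
import numpy as np

def v(effect, rho):
    # Illustrative stand-in for Busch's v: the standard rule Tr(E rho).
    # Busch derives this form; here it only serves to exercise (P1)-(P3).
    return float(np.real(np.trace(effect @ rho)))

I2 = np.eye(2)
rho = np.array([[0.7, 0.2], [0.2, 0.3]])    # hypothetical qubit state
E = np.diag([0.3, 0.6])                      # effect shared by both devices

# Hypothetical device f: two further (non-commuting) effects completing the POM
F = [np.array([[0.4, 0.1], [0.1, 0.2]]),
     np.array([[0.3, -0.1], [-0.1, 0.2]])]
# Hypothetical device g: a single further effect completing the POM
G = [I2 - E]

# Normalisation plus additivity, as in Eq. (1)
p_f = 1 - v(sum(F), rho)
p_g = 1 - v(sum(G), rho)
```

Both routes give the same probability for the outcome represented by Ê, which also equals v(Ê) itself.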
Busch's property (P1) is a property of probabilities in general. His quantum postulates, that is, those that concern Hilbert space operators, lie in (P2), (P3) and the association of a measurement outcome with an effect operator $\hat{E}$. In this paper we drop (P2) and weaken the effect postulate so that it becomes a positive-operator postulate. This means we are not assuming that the operators representing the measurement outcomes are elements of a POM, and so we no longer need to assume that they are effects. We do, however, assume that they are bounded positive operators, and we adopt an additivity postulate similar to (P3), but we no longer require the sum of the measurement outcome operators to be $\le \hat{I}$. We find that with this reduced number of quantum postulates it is possible to derive a probability rule that is more general than the standard rule $\mathrm{Tr}(\hat{E}_i\hat{\rho})$. Furthermore, we find that we can then deduce the standard rule from the general rule by the use of the normal laws of probability.

General Probability Rule
A measurement procedure for the determination of probabilities from a record of many experiments involving preparation and measurement will include a chosen measurement device and the method for recording the results obtained from it. For example, two measurement events, such as a zero and a one photocount event, might be recorded as separate events or as a single event described as "less than two photocounts". As another example, some experiments might not be recorded because of a post-selection procedure, whereby an experiment is ignored in the event of a particular measurement outcome. We also include in the measurement procedure any means by which information can be obtained that affects the possibility of a recorded event. This can include posterior knowledge. For example, if it is known that a photodetector will be damaged if subjected to more than a certain number of photons, then finding the detector undamaged after the detection event eliminates the possibility of a recording of a larger number of photons. For our purposes here it is sufficient to specify a measurement procedure x mathematically by the set of possible recorded measurement events $\{m_1, m_2, \ldots\}$ that can be obtained from it. We shall not assume that the probability that a recorded measurement event is $m_i$ is non-contextual with respect to the measurement procedure x, so we write this probability as $p(m_i|s,x)$ to show that it may depend on x as well as on the state s.
Our first postulate is that, for a given measurement procedure x, each possible recorded event $m_i$ can be associated with a positive bounded Hilbert space‡ operator $\hat{M}_i$ in such a way that $p(m_i|s,x)$ is proportional to some function $u(\hat{M}_i)$ of this operator, that is,

$$p(m_i|s,x) = Q(s,x)\,u(\hat{M}_i), \qquad (2)$$

where the proportionality factor Q(s,x) is the same for all $\hat{M}_i$ of the set of operators $\{\hat{M}_1, \hat{M}_2, \ldots\}$, which we can now use to specify the measurement procedure x. We are not assuming that Q(s,x) is independent of the measurement procedure itself or of the particular state s. We note that any set of positive bounded operators $\hat{M}_i$, to which we refer as measurement operators, can mathematically define a measurement procedure and that, while some measurement procedures have reasonably straightforward physical realizations, others may not. The function $u(\hat{M}_i)$ may in general be a complex number, $\exp(i\theta_i)\,w(\hat{M}_i)$ say, where $w(\hat{M}_i)$ is a positive number. The positivity of $p(m_i|s,x)$ for all $m_i$ then requires $Q(s,x)\exp(i\theta_i)$ to be positive for all i, which in turn requires all the $\theta_i$ to have the same value, which we write as $\theta$. Thus we can, from (2), write our first postulate in the form

$$p(m_i|s,x) = N(s,x)\,w(\hat{M}_i), \qquad (3)$$

where N(s,x) is the positive normalisation factor $Q(s,x)\exp(i\theta)$. Our second postulate is that the positive function $w(\hat{A})$ of any positive bounded operator is additive, that is,

$$w(\hat{A} + \hat{B} + \cdots) = w(\hat{A}) + w(\hat{B}) + \cdots \qquad (4)$$

for all positive bounded operators $\hat{A}, \hat{B}, \ldots$. We use a method similar to that used by Busch to show, first, that this additivity postulate implies linearity with respect to non-negative rational numbers. From additivity we have, for positive integers r and n,

$$w\Big(\frac{r}{n}\hat{A}\Big) = r\,w\Big(\frac{1}{n}\hat{A}\Big) = \frac{r}{n}\,w(\hat{A}). \qquad (5)$$

We can then use the additivity and positivity of $w(\hat{A})$ in a limiting argument similar to that used by Busch, who showed that $\alpha v(\hat{E}) = v(\alpha\hat{E})$, where $\alpha$ is real and $0 \le \alpha \le 1$.
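One way to make the limiting argument explicit is sketched below. It assumes only additivity (4), the positivity of w and the rational result (5); it is offered as an illustration of the step, not necessarily the exact form of Busch's argument.

```latex
% Monotonicity from additivity and positivity: if \hat{B} - \hat{A} \ge 0, then
w(\hat{B}) = w(\hat{A}) + w(\hat{B} - \hat{A}) \ge w(\hat{A}).
% For real \alpha \ge 0 choose rationals q_1 \le \alpha \le q_2; then by Eq. (5)
q_1\, w(\hat{A}) = w(q_1 \hat{A}) \le w(\alpha \hat{A}) \le w(q_2 \hat{A}) = q_2\, w(\hat{A}).
% Letting q_1 \to \alpha from below and q_2 \to \alpha from above gives
w(\alpha \hat{A}) = \alpha\, w(\hat{A}), \qquad \alpha \ge 0 .
```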
In our case we find that $\alpha w(\hat{A}) = w(\alpha\hat{A})$, where $\alpha$ is any non-negative real number. Combining this result with additivity we obtain the linearity relation

$$w\Big(\sum_i \alpha_i \hat{A}_i\Big) = \sum_i \alpha_i\, w(\hat{A}_i), \qquad \alpha_i \ge 0. \qquad (6)$$

‡ It suffices, for our purpose, to consider only state spaces of finite dimension. In this way we avoid complications such as observables with continuous spectra. We may incorporate such observables by means of a suitable limiting process, but such considerations would detract from the essential simplicity of the point that we are trying to make.
We are now in a position to prove our first main result. The measurement operator $\hat{M}_i$ is a positive operator, so we can write it in the diagonal form

$$\hat{M}_i = \sum_\ell \lambda^i_\ell\, |\lambda^i_\ell\rangle\langle\lambda^i_\ell|, \qquad (7)$$

where $\{|\lambda^i_\ell\rangle\}$ are the eigenstates of $\hat{M}_i$ and $\lambda^i_\ell = \mathrm{Tr}(\hat{M}_i |\lambda^i_\ell\rangle\langle\lambda^i_\ell|) \ge 0$ are the corresponding eigenvalues, which are all non-negative. We should note that the positive operators $\{\hat{M}_i\}$ will, in general, not commute with one another and will therefore not share a common set of eigenvectors. It follows, using our linearity condition (6), that

$$w(\hat{M}_i) = \sum_\ell \lambda^i_\ell\, w(|\lambda^i_\ell\rangle\langle\lambda^i_\ell|). \qquad (8)$$

The $w(|\lambda^i_\ell\rangle\langle\lambda^i_\ell|)$ are simply positive numbers, however, and hence we can write

$$w(\hat{M}_i) = \mathrm{Tr}(\hat{M}_i \hat{R}_i), \qquad (9)$$

where $\hat{R}_i$ is a positive operator whose diagonal elements in the $\{|\lambda^i_\ell\rangle\}$ basis are $w(|\lambda^i_\ell\rangle\langle\lambda^i_\ell|)$. Equation (9) gives no such information about the off-diagonal elements of $\hat{R}_i$, so this operator is not completely determined by Eq. (9), but we can exploit the linearity relation (6) to show that $\hat{R}_i$ must be independent of $\hat{M}_i$, as follows. Linearity and Eq. (9) require that $w(\hat{M}_1) + w(\hat{M}_2)$ equals $\mathrm{Tr}[(\hat{M}_1 + \hat{M}_2)\hat{R}_{12}]$, so we must be able to write $w(\hat{M}_1)$ and $w(\hat{M}_2)$ in the form $\mathrm{Tr}(\hat{M}_1\hat{R}_{12})$ and $\mathrm{Tr}(\hat{M}_2\hat{R}_{12})$ respectively, where the common operator $\hat{R}_{12}$ has diagonal elements $w(|\lambda^1_\ell\rangle\langle\lambda^1_\ell|)$ in the $\{|\lambda^1_\ell\rangle\}$ basis and $w(|\lambda^2_\ell\rangle\langle\lambda^2_\ell|)$ in the $\{|\lambda^2_\ell\rangle\}$ basis. We can combine $\hat{M}_1$ with any other positive bounded operators to form a set defining a measurement procedure, so the common operator $\hat{R}$ must have diagonal elements $w(|\lambda_\ell\rangle\langle\lambda_\ell|)$ in any basis $\{|\lambda_\ell\rangle\}$ and is thus independent of any particular $\hat{M}_i$. We can then write

$$w(\hat{M}_i) = \mathrm{Tr}(\hat{M}_i \hat{R}) \qquad (10)$$

for all $\hat{M}_i$, showing that the probability that a measurement event is $m_i$ depends both on the associated measurement operator and on an independent operator, which it is natural to associate physically with the preparation process. We can show that the common operator $\hat{R}$ is unique by using the lemma that two operators having the same diagonal elements in all bases must be equal. We prove this lemma in the Appendix.
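The steps from Eq. (7) to Eq. (10), and the uniqueness lemma, can be illustrated numerically. In this minimal NumPy sketch the operator R (standing in for $\hat{R}$) is a hypothetical random positive matrix, and w is accessed only through its values on projectors, $w(|v\rangle\langle v|) = \langle v|\hat{R}|v\rangle$. The second part recovers R from diagonal elements along many random unit vectors by least squares; this is a numerical illustration of the lemma, not the proof given in the Appendix.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

def random_positive(dim):
    # B B^dagger is positive semidefinite for any complex matrix B
    B = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return B @ B.conj().T

def random_unit_vector(dim):
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

R = random_positive(d)   # hypothetical preparation operator (assumed)

def w_projector(vec):
    # w(|v><v|) is only available as the diagonal element <v|R|v>
    return float(np.real(vec.conj() @ R @ vec))

def w_via_spectrum(M):
    # Eqs. (7)-(8): expand M in its eigenbasis and use linearity
    evals, evecs = np.linalg.eigh(M)
    return sum(lam * w_projector(evecs[:, l]) for l, lam in enumerate(evals))

M = random_positive(d)
lhs = w_via_spectrum(M)
rhs = float(np.real(np.trace(M @ R)))   # Eq. (10)

# Lemma illustration: diagonal elements <v|R|v> for enough vectors fix R.
vecs = [random_unit_vector(d) for _ in range(2 * d * d)]
A = np.array([np.outer(v.conj(), v).flatten() for v in vecs])  # row: conj(v_j) v_k
b = np.array([w_projector(v) for v in vecs])
R_rec = np.linalg.lstsq(A, b, rcond=None)[0].reshape(d, d)
```

Using $2d^2$ random vectors rather than "all bases" is a convenience: generic rank-one projectors span the full operator space, so the linear system has a unique solution.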
To obtain the probability $p(m_i|s,x)$ we require the proportionality factor N(s,x), which can be found from the normalisation condition that the probabilities of all possible outcomes sum to unity. This yields

$$p(m_i|s,x) = \frac{\mathrm{Tr}(\hat{M}_i\hat{R})}{\mathrm{Tr}(\hat{X}\hat{R})}, \qquad (11)$$

where $\hat{X} = \sum_j \hat{M}_j$. Dividing the numerator and denominator by $\mathrm{Tr}(\hat{R})$ yields our general probability law

$$p(m_i|s,x) = \frac{\mathrm{Tr}(\hat{M}_i\hat{\rho})}{\mathrm{Tr}(\hat{X}\hat{\rho})}, \qquad \hat{\rho} = \frac{\hat{R}}{\mathrm{Tr}(\hat{R})}. \qquad (12)$$

We note that $\hat{X}$ depends only on the possible recorded measurement outcomes, thereby characterising the particular measurement procedure x, leaving the unit-trace positive operator $\hat{\rho}$ as a density operator to characterise the state s. This is the first main result of the paper: if we reduce the number of Busch's quantum postulates by discarding (P2) and by relaxing the assumption that the operator representing a measurement outcome must be an effect to its simply being a positive bounded operator, we arrive at a probability law according to which any set of positive operators (with finite eigenvalues) can provide a set of probabilities, these probabilities being calculated using (12). Before proceeding, we give a simple illustration of the meaning of our second postulate, the additivity postulate. The measurement procedure x enters into (12) only through the sum $\hat{X}$. Consider a particular measuring device with, among other measurement events $m_3, m_4, \ldots$, the events $m_1$ and $m_2$ corresponding to $\hat{M}_1$ and $\hat{M}_2$ if these are recorded separately. If we record these events together as the single event "$m_1$ or $m_2$", our additivity postulate implies that the corresponding measurement operator is $\hat{M}_1 + \hat{M}_2$. The sum $\hat{X}$ is thus the same whether the measurement procedure involves separately recorded events or a single combined event. As a result, while $p(m_3|s,x)$ depends on whether $m_4$ is a possible recorded event or not, it does not depend on whether $m_1$ and $m_2$ are recorded together or separately.
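The general law (12) and the combined-event illustration can be sketched in a few lines of NumPy. The function name `general_rule` and the qubit operators are hypothetical choices made for this example; the measurement operators are positive but deliberately do not sum to the identity.

```python
import numpy as np

def general_rule(ops, rho):
    # Eq. (12): p(m_i|s,x) = Tr(M_i rho) / Tr(X rho), with X = sum_j M_j
    X = sum(ops)
    denom = np.real(np.trace(X @ rho))
    return [float(np.real(np.trace(M @ rho)) / denom) for M in ops]

rho = np.array([[0.6, 0.2], [0.2, 0.4]])      # hypothetical qubit state
M1 = np.diag([2.0, 0.0])
M2 = np.array([[1.0, 1.0], [1.0, 1.0]])
M3 = np.diag([0.0, 3.0])                       # M1 + M2 + M3 is not the identity

separate = general_rule([M1, M2, M3], rho)     # m1, m2, m3 recorded apart
combined = general_rule([M1 + M2, M3], rho)    # "m1 or m2" recorded as one event
dropped = general_rule([M1, M3], rho)          # m2 no longer a possible record
```

Here the probability of $m_3$ is 1.2/3.8 whether $m_1$ and $m_2$ are recorded separately or together, but it becomes 0.5 once $m_2$ is removed from the set of possible recorded events, as described in the text.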

Standard probability rule
It remains for us to determine the physical meaning of our general probability law. In doing so we arrive, very naturally, at a Bayesian interpretation. Consider the case where we know that a number of possible states $s_k$, with density operators $\hat{\rho}_k$, have probabilities $p_k$ of being the prepared state. The state s based on this knowledge will have a density operator $\hat{\rho} = \sum_k p_k\hat{\rho}_k$ representing the average, or a priori, density operator, and the probability of the recorded event being $m_i$ will be given by (12). If the state actually prepared was $s_k$, say, then in place of (12) we would have a different probability

$$p(m_i|s_k,x) = \frac{\mathrm{Tr}(\hat{M}_i\hat{\rho}_k)}{\mathrm{Tr}(\hat{X}\hat{\rho}_k)}. \qquad (13)$$

We should be able to obtain (12) as a sum of these quantities, suitably weighted by probabilities:

$$p(m_i|s,x) = \sum_k P_k\, p(m_i|s_k,x). \qquad (14)$$

For this to hold in general we need only set

$$P_k = p_k\,\frac{\mathrm{Tr}(\hat{X}\hat{\rho}_k)}{\mathrm{Tr}(\hat{X}\hat{\rho})}. \qquad (15)$$

The fact that both the $P_k$ and the $p_k$ are probabilities means that their ratio is a likelihood [8], which we can interpret as the likelihood of $s_k$ given x:

$$\frac{P_k}{p_k} = \frac{\mathrm{Tr}(\hat{X}\hat{\rho}_k)}{\mathrm{Tr}(\hat{X}\hat{\rho})}. \qquad (16)$$

In order to adopt this interpretation it is necessary to interpret $P_k$ as an a posteriori probability based on some knowledge relating to the recorded outcome of the measurement. Specifically, this will be knowledge affecting the possibility that some outcomes may occur. As $\hat{X}$ is the sum of the operators representing the possible recorded measurement outcomes, its value will depend on this knowledge, and by Eq. (15) so will $P_k$. The simplest example is where the actual outcome itself is known to be $m_i$, say, and then $\hat{X} = \hat{M}_i$, as no other outcomes are possible any longer. We then find from Eq. (12) that the a posteriori probability that the outcome is $m_i$ is unity, as it must be. Another example is where joint events (s, m), showing the input state and the consequent measurement outcome, are recorded after a known post-selection procedure has rejected some joint events containing particular measurement outcomes. This has the effect of reducing the number of possible recorded outcomes and thus the sum of the operators representing them. In this context $P_k$ is just the probability that the state in a recorded joint event is $s_k$. We shall express the a posteriori nature of $P_k$ by writing it as $P(s_k|x)$, that is, the probability that state $s_k$ was prepared in a recorded experiment, conditioned on the measurement outcome being one of those whose operators appear in the (possibly reduced) posterior expression for $\hat{X}$. This leads us in turn to interpret $p(m_i|s,x)$ in Eq. (12) as

$$p(m_i|s,x) = \sum_k P(s_k|x)\, p(m_i|s_k,x), \qquad (17)$$

which is consistent with Bayesian probability, confirming our interpretation of $P(s_k|x)$.
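The Bayesian decomposition can be checked for a small post-selected example. In this NumPy sketch the ensemble, the priors and the two surviving outcome operators are all hypothetical; post-selection is modelled simply by keeping only those operators in $\hat{X}$, so $\hat{X} \ne \hat{I}$ and the a posteriori weights of Eq. (15) differ from the priors.

```python
import numpy as np

def tr(A, B):
    return float(np.real(np.trace(A @ B)))

# Hypothetical ensemble: two qubit states with prior probabilities p_k
rhos = [np.array([[1.0, 0.0], [0.0, 0.0]]),
        np.array([[0.5, 0.5], [0.5, 0.5]])]
priors = [0.25, 0.75]
rho = sum(p * r for p, r in zip(priors, rhos))   # a priori density operator

# Hypothetical post-selected procedure: only these outcome operators survive
Ms = [np.array([[1.0, 0.0], [0.0, 0.0]]), np.array([[0.0, 0.0], [0.0, 2.0]])]
X = sum(Ms)

posterior = [p * tr(X, r) / tr(X, rho) for p, r in zip(priors, rhos)]  # Eq. (15)

def prob(M, state):
    return tr(M, state) / tr(X, state)           # Eqs. (12) and (13)

direct = [prob(M, rho) for M in Ms]
mixed = [sum(P * prob(M, r) for P, r in zip(posterior, rhos)) for M in Ms]  # Eq. (17)
```

The posterior weights (about 0.18 and 0.82) sum to one, and the weighted sum reproduces the probabilities obtained directly from the a priori density operator.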
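If there is no post-selection and no posterior knowledge about measurement results that can eliminate or reduce the possibility of particular measurement events, and thus of the preparation events that may have produced them, then the a posteriori probability $P(s_k|x)$ that any state $s_k$ has occurred must be equal to the a priori probability $p_k$ that this state occurs. In this case, setting $P(s_k|x) = p_k$ in Eq. (15), we have

$$\mathrm{Tr}(\hat{X}\hat{\rho}_k) = \mathrm{Tr}(\hat{X}\hat{\rho}) \qquad (18)$$

for every possible state $s_k$. Taking as possible states any state $\hat{\rho}_1$ and the state $\hat{\rho}_2 = \hat{U}\hat{\rho}_1\hat{U}^\dagger$ obtained from it by an arbitrary unitary transformation $\hat{U}$, this requires

$$\mathrm{Tr}(\hat{X}\hat{\rho}_1) = \mathrm{Tr}(\hat{U}^\dagger\hat{X}\hat{U}\hat{\rho}_1). \qquad (19)$$

For this to hold for any $\hat{\rho}_1$, $\hat{X}$ must commute with any $\hat{U}$ and must therefore be proportional to the unit operator, that is, $\hat{X} = K\hat{I}$. Then our general probability rule (12) becomes the standard, or restricted, probability law

$$p(m_i|s) = \mathrm{Tr}(\hat{E}_i\hat{\rho}), \qquad (20)$$

where the operators $\hat{E}_i = \hat{M}_i/K$ satisfy

$$\sum_i \hat{E}_i = \hat{I}. \qquad (21)$$

The $\hat{E}_i$ are therefore effects and form a POM. In Busch's notation $p(m_i|s)$ equals $v(\hat{E}_i)$.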
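The reduction to the standard rule can be illustrated with hypothetical qubit operators chosen so that their sum is $K\hat{I}$. The NumPy sketch below checks that the general rule then coincides with $\mathrm{Tr}(\hat{E}_i\hat{\rho})$ for $\hat{E}_i = \hat{M}_i/K$, that these operators form a POM of effects, and, for contrast, that a non-scalar $\hat{X}$ assigns different values of $\mathrm{Tr}(\hat{X}\hat{\rho}_k)$ to two unitarily related states (here related by a bit flip), so condition (18) cannot hold for it in general.

```python
import numpy as np

def tr(A, B):
    return float(np.real(np.trace(A @ B)))

K = 2.5
# Hypothetical measurement operators whose sum is K times the identity
Ms = [np.array([[1.5, 0.5], [0.5, 0.5]]),
      np.array([[1.0, -0.5], [-0.5, 2.0]])]
X = sum(Ms)
rho = np.array([[0.7, 0.1], [0.1, 0.3]])             # hypothetical state

general = [tr(M, rho) / tr(X, rho) for M in Ms]     # general rule, Eq. (12)
E = [M / K for M in Ms]                              # rescaled operators
standard = [tr(Ei, rho) for Ei in E]                 # standard rule, Eq. (20)

# Contrast: a non-scalar X separates rho_1 = |0><0| from U rho_1 U^dagger
X_bad = np.diag([1.0, 3.0])
U = np.array([[0.0, 1.0], [1.0, 0.0]])               # bit flip (assumed example)
rho1 = np.diag([1.0, 0.0])
rho2 = U @ rho1 @ U.conj().T
```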
Writing $v(\hat{E}_i)$ for the left side of Eq. (20), summing both sides over i and using additivity together with $\sum_i \hat{E}_i = \hat{I}$ gives $v(\hat{I}) = 1$, which is condition (P2). We see that this condition is a result of our approach, obtained from our general formula (12) by the usual rules of probability, rather than an additional quantum postulate.

Applications
It is natural to ask whether there are any applications of our more general probability formula (12). Here we present three such applications. An obvious, but often overlooked, one is to measurement probabilities when we have some (incomplete) information about the measurement outcome. It is often the case in quantum optics experiments, for example, that we restrict our attention to probabilities given some future event, such as in a two-photon cascade in which the detection of one photon is used to herald the emission of another [9]. In such cases $\hat{X}$ will be restricted to only those measurement event operators $\hat{M}_i$ that include the heralding event.
A second example arises in the theory of quantum communications [7]. Here a transmitting party, Alice, selects from a set of possible states $s_k$, with density operators $\hat{\rho}_k$ and prior selection probabilities $p_k$, and sends a quantum system prepared in this state to a receiving party, Bob. Bob's task is to determine from a measurement, as well as possible, the state prepared by Alice. As he knows from the measurement that the outcome is $m_j$, corresponding to $\hat{M}_j$ say, he knows that $\hat{X}$ contains just this single term; that is, his knowledge has eliminated the possibility of all other terms. He can therefore simply write the sum of the possible terms as $\hat{X} = \hat{M}_j$ and obtain from (15) the a posteriori, or retrodictive, probability that Alice sent the system in state $s_k$:

$$P(s_k|m_j) = \frac{\mathrm{Tr}(\hat{M}_j\hat{\rho}_k p_k)}{\mathrm{Tr}(\hat{M}_j\hat{\rho})}. \qquad (22)$$
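Bob's retrodiction (22) is easy to evaluate numerically. In this NumPy sketch the alphabet ($|0\rangle$ and $|+\rangle$ with equal priors) and Bob's computational-basis measurement are hypothetical choices. It also checks two points made in the text: the overall scale of $\hat{M}_j$ drops out because $\hat{X} = \hat{M}_j$, and the result agrees with Bayes' theorem applied to the standard rule.

```python
import numpy as np

def tr(A, B):
    return float(np.real(np.trace(A @ B)))

# Hypothetical alphabet: |0> and |+> sent with equal prior probability
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
rhos = [np.outer(ket0, ket0), np.outer(ketp, ketp)]
priors = [0.5, 0.5]
rho = sum(p * r for p, r in zip(priors, rhos))

# Bob's POM: projective measurement in the computational basis (assumed)
E = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

def retrodict(M_j):
    # Eq. (22): P(s_k|m_j) = Tr(M_j rho_k p_k) / Tr(M_j rho)
    return [tr(M_j, p * r) / tr(M_j, rho) for p, r in zip(priors, rhos)]

post = retrodict(E[1])                 # Bob obtained the second outcome
post_scaled = retrodict(7.0 * E[1])    # overall scale of M_j is irrelevant

# Bayes' theorem with the standard rule Tr(E_j rho_k), for comparison
joint = [p * tr(E[1], r) for p, r in zip(priors, rhos)]
bayes = [jv / sum(joint) for jv in joint]
```

With this choice the second outcome is impossible for $|0\rangle$, so Bob retrodicts $|+\rangle$ with certainty.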
We note that retrodictive probabilities such as this can also be found by using Bayes' theorem in conjunction with the usual expression for the quantum probability $\mathrm{Tr}(\hat{E}_j\hat{\rho})$ [10]. Peres [11] has described an expression equivalent to Eq. (22) as the only retrodictive form that can be legitimately derived from conventional quantum mechanics. Here, however, there is no need to add a Bayes rule; it is already contained in the general probability law (12) expressed in the form (15). We also note that there is a symmetry between the retrodictive form of our probability law (22) and the predictive form (12), which we write here as

$$p(m_k|s_j,x) = \frac{\mathrm{Tr}(\hat{\rho}_j\hat{M}_k)}{\mathrm{Tr}(\hat{\rho}_j\hat{X})}, \qquad (23)$$

with $\hat{\rho}_k p_k$ in (22) corresponding to $\hat{M}_k$ in (23), $\hat{M}_j$ in (22) corresponding to $\hat{\rho}_j$ in (23), and thus $\hat{\rho} = \sum_k p_k\hat{\rho}_k$ in (22) corresponding to $\hat{X} = \sum_k \hat{M}_k$ in (23). This gives Bob an alternative and equivalent way to retrodict: he defines a density operator $\hat{\rho}_j = \hat{M}_j/\mathrm{Tr}(\hat{M}_j)$ for a "retrodictive state", writes $\hat{M}_k$ as $\hat{\rho}_k p_k$ and $\hat{X}$ as $\hat{\rho}$, and then substitutes these into the right side of the predictive formula (23) to obtain the retrodictive expression (22). In this way the general probability rule (12) can be used for both prediction and retrodiction without the need to invoke Bayes' theorem, which is already effectively contained in the law. If there is a time interval between preparation and measurement, then Alice would need to allow for the evolution of her predictive state during this interval to calculate the probability of a measurement event, and Bob would need to allow for the retroevolution of the retrodictive state to retrodict a preparation event.
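The substitution described above can be verified directly. This NumPy sketch uses a hypothetical pair of mixed qubit states, hypothetical priors and an unnormalised, non-projective outcome operator for Bob; the helper `predict` is simply Eq. (23), and feeding it the retrodictive state together with the operators $p_k\hat{\rho}_k$ reproduces Eq. (22).

```python
import numpy as np

def tr(A, B):
    return float(np.real(np.trace(A @ B)))

def predict(state, ops):
    # Predictive form, Eq. (23): Tr(rho_j M_k) / Tr(rho_j X), X = sum_k M_k
    X = sum(ops)
    return [tr(state, M) / tr(state, X) for M in ops]

# Hypothetical preparation ensemble and outcome operator
rhos = [np.array([[0.9, 0.1], [0.1, 0.1]]),
        np.array([[0.2, -0.3], [-0.3, 0.8]])]
priors = [0.4, 0.6]
rho = sum(p * r for p, r in zip(priors, rhos))
M_j = np.array([[2.0, 0.5], [0.5, 1.0]])   # Bob's (unnormalised) outcome operator

# Direct retrodiction, Eq. (22)
direct = [tr(M_j, p * r) / tr(M_j, rho) for p, r in zip(priors, rhos)]

# Symmetric route: retrodictive state M_j / Tr(M_j), 'outcome operators' p_k rho_k
retro_state = M_j / np.trace(M_j)
via_symmetry = predict(retro_state, [p * r for p, r in zip(priors, rhos)])
```

The two lists agree because the sum of the operators $p_k\hat{\rho}_k$ is $\hat{\rho}$ and the factor $\mathrm{Tr}(\hat{M}_j)$ cancels between numerator and denominator.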
Our final example completes the resolution of a long-standing controversy in retrodictive quantum theory [12]. In retrodictive quantum theory we assign a retroevolving quantum state on the basis of a later measurement and can use this to ask questions about, among other things, initial preparation events. It has been suggested that we can only apply quantum retrodiction if there is no prior information about the preparation event, so that the prior initial density operator has an unbiased form and is proportional to the identity operator [13,14]. This is a result of attempting to find a retrodictive formula by making the restricted predictive probability $\mathrm{Tr}(\hat{E}_i\hat{\rho})$ symmetric or causally neutral [15], or by using a time-reversed form of Gleason's theorem [14]. From the symmetry inherent in our general probability rule, which reduces to the restricted predictive form when $\hat{X} \propto \hat{I}$, it is easy to see from the correspondence between $\hat{X}$ and $\hat{\rho}$ above that our general retrodictive formula will reduce to the restricted retrodictive form when $\hat{\rho} \propto \hat{I}$. To obtain the general, and far more useful, retrodictive probability formula from the causal neutrality of a predictive formula, it is necessary to start with the general predictive form. For this reason it is also inadequate to use a time-reversed form of Gleason's theorem. Looked at from another viewpoint, retrodicted preparation probabilities are quite often contextual, depending on what other states could possibly be prepared. For example, if photon number states are being prepared, there is some limit set by the amount of energy available or simply by the difficulty in preparing some states§. Thus time-reversed theorems incorporating non-contextuality are inappropriate for a general treatment.

§ This type of situation is considered by Dressel and Jordan [16], who use a symmetric formulation of quantum theory based on quantum instruments (basically corresponding to measurement devices) to derive predictive, retrodictive and "interdictive" states.
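The reduction to the restricted retrodictive form when $\hat{\rho} \propto \hat{I}$ can be illustrated as follows. In this NumPy sketch the four-state qubit alphabet, the priors and Bob's outcome operator are hypothetical. The "restricted" form is modelled, as an assumption mirroring $\hat{E}_i = \hat{M}_i/K$, by treating $d\,p_k\hat{\rho}_k$ as preparation effects evaluated in the retrodictive state; these sum to the identity only when $\hat{\rho} = \hat{I}/d$.

```python
import numpy as np

def tr(A, B):
    return float(np.real(np.trace(A @ B)))

# Hypothetical alphabet: the four qubit states |0>, |1>, |+>, |->
kets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
        np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]
rhos = [np.outer(k, k) for k in kets]
M_j = np.array([[0.8, 0.3], [0.3, 0.4]])   # Bob's outcome operator (assumed)
retro_state = M_j / np.trace(M_j)
d = 2

def general_retro(priors):
    # General retrodictive formula, Eq. (22)
    rho = sum(p * r for p, r in zip(priors, rhos))
    return [tr(M_j, p * r) / tr(M_j, rho) for p, r in zip(priors, rhos)]

def restricted_retro(priors):
    # Assumed restricted form: Tr(rho_retro Lambda_k), Lambda_k = d p_k rho_k,
    # which is a POM only when sum_k p_k rho_k = I / d
    return [tr(retro_state, d * p * r) for p, r in zip(priors, rhos)]

unbiased = [0.25] * 4             # a priori density operator I/2
biased = [0.7, 0.1, 0.1, 0.1]     # a priori density operator diag(0.8, 0.2)
```

With unbiased priors both forms agree; with biased priors the restricted form no longer yields normalised probabilities (its values sum to 1.2 here), while the general formula remains normalised.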

Conclusion
We should note that it is also possible to derive a relationship between Bayes' theorem, predictive and retrodictive quantum theory based on an assumed expression for measurement and preparation probabilities in which preparation and measurement operators appear symmetrically [17,18]. Our general probability rule as derived in this paper, however, enables us to arrive at the correct expression for retrodictive probabilities without postulating a symmetric form for the probabilities and thus may be regarded as a more fundamental approach that formally justifies this earlier work.
Busch has relaxed Gleason's postulate that measurement outcomes must be represented by projectors by allowing measurement outcomes to be represented by effects. In this paper we have further relaxed this to the postulate that the probability of a measurement outcome for a particular input state and measurement procedure is proportional to a positive additive function of a bounded positive operator. By allowing the proportionality constant to depend not just on the state but also on the measurement procedure, including choice of measurement device, we are explicitly not assuming non-contextuality in relation to other possible measurement outcomes. Any set of positive operators (with strictly finite eigenvalues) can represent measurement outcomes and can be used to calculate the probabilities of these outcomes. The usually adopted requirement that these operators must sum to the identity is not assumed but follows from our approach for the case when there is no prior information about the measurement outcome. This resulting standard, or restricted, probability formula is seen to be a special case of a more general causally-neutral symmetric formula, which can be used for both prediction, that is finding probabilities of measurement outcomes, and for retrodiction involving finding the probabilities of preparation events. When used for prediction, the formula is applicable even when there is partial knowledge of possible measurement outcomes as may occur when post-selection is involved or when there is simply incomplete reporting of outcomes that have occurred. Retrodictive probabilities can, of course, be calculated from the usual restricted formula by employing Bayes' theorem but there is no need to invoke Bayes' theorem when using the general formula. Important examples of the use of our general formula include quantum communications, retrodictive quantum theory and where there is prior agreed postselection of measurement results.