Imprecise probability for non-commuting observables

It is known that non-commuting observables in quantum mechanics do not have joint probability. This statement refers to the precise (additive) probability model. I show that the joint distribution of any non-commuting pair of variables can be quantified via upper and lower probabilities, i.e. the joint probability is described by an interval instead of a number (imprecise probability). I propose transparent axioms from which the upper and lower probability operators follow. They depend only on the non-commuting observables and revert to the usual expression for the commuting case.

My purpose here is to propose a transparent set of conditions (axioms) that lead to quantum lower and upper joint probabilities. They depend only on the involved non-commuting observables (and on the quantum state).
Previous work. In 1967 Prugovecki tried to describe the joint probability of two non-commuting observables in a way that resembles imprecise probabilities [24]. But his expression was not correct (it can still be negative) [12]; see also [14] in this context. In 1991 Suppes and Zanotti proposed a local upper probability model for the standard setup of Bell inequalities (two entangled spins) [25]; see also [26,27]. The formulation was given in the classical event space of hidden variables, and it is not unique even for the particular case considered. It violates classical observability conditions for the imprecise probability [25,28,47]. In particular, no lower probability exists in this scheme. Despite such drawbacks, the pertinent message of [25] is that one should attempt quantum applications of upper probabilities that go beyond their classical axioms. More recently, Galvan attempted to employ (classical) imprecise probabilities for describing quantum dynamics in configuration space [31]. For a general discussion of quantum versus classical probabilities see [32].
Notations. All operators (matrices) live in a finite-dimensional Hilbert space H. For two hermitean operators Y and Z, Y ≥ Z (larger or equal) means that all eigenvalues of Y − Z are non-negative, i.e. $\langle\psi|(Y - Z)|\psi\rangle \ge 0$ for any $|\psi\rangle \in H$. The direct sum Y ⊕ Z of two operators refers to the block-diagonal matrix:

$$Y \oplus Z = \begin{pmatrix} Y & 0 \\ 0 & Z \end{pmatrix}.$$

The range ran(Y) of Y is the subspace of vectors $Y|\psi\rangle$, where $|\psi\rangle \in H$. For orthogonal (sub)spaces A and B, the space A ⊕ B is formed by all vectors $|a\rangle + |b\rangle$, where $|a\rangle \in A$ and $|b\rangle \in B$. I is the unity operator of H. $I_n$ and $0_n$ are the n × n unity and zero matrices, respectively.

Axioms for quantum imprecise probability. Existing axioms for imprecise probability are formulated on a classical event space with the usual notions of conjunction, disjunction and complementation [44-47]; see section 2 of the Supplementary Material for a reminder. For quantum probability it is natural to start from a Hilbert space and introduce upper and lower probabilities as operators. The axioms below require only the most basic features of upper and lower probability and demand their consistency with the quantum joint probability whenever the latter is well-defined.
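The usual quantum probability can be defined over (hermitean) projectors $P = P^2$ [33,34]. A projector generalizes the classical notion of characteristic function. Each P uniquely relates to its eigenspace ran(P). P refers to a set of hermitean operators $\{\mathcal{P}\}$:

$$[P, \mathcal{P}] \equiv P\mathcal{P} - \mathcal{P}P = 0, \qquad (1)$$

i.e. P is a projector to an eigenspace of $\mathcal{P}$ or to a direct sum of such eigenspaces; in other words, P refers to an eigenvalue of $\mathcal{P}$ or to a union of several eigenvalues. The quantum (precise and additive) probability to observe P = 1 is tr[ρP], where the density matrix 0 ≤ ρ ≤ I defines the quantum state [4,5,33,34]. Let Q be another projector which refers to the set $\{\mathcal{Q}\}$ of observables. Generally, [P, Q] ≠ 0. Given the density matrix ρ, we seek upper and lower joint probabilities of P and Q (i.e. of the corresponding eigenvalues of $\mathcal{P}$ and $\mathcal{Q}$):

$$\overline{p}(\rho; P, Q) = \mathrm{tr}(\rho\,\overline{\omega}(P, Q)), \qquad \underline{p}(\rho; P, Q) = \mathrm{tr}(\rho\,\underline{\omega}(P, Q)), \qquad (2)$$

where $\overline{\omega}(P, Q)$ and $\underline{\omega}(P, Q)$ are hermitean operators. Their dependence on P and Q can be expressed via Taylor series. We impose the following conditions (axioms):

$$0 \le \underline{\omega}(P, Q) \le \overline{\omega}(P, Q) \le I, \qquad (3)$$

$$\omega(P, Q) = \omega(Q, P) \quad \text{for } \omega = \overline{\omega}, \underline{\omega}, \qquad (4)$$

$$\omega(P, Q) = PQ \quad \text{if } [P, Q] = 0, \qquad (5)$$

$$\mathrm{tr}(\rho\,\underline{\omega}(P, Q)) \le \mathrm{tr}(\rho P Q) \le \mathrm{tr}(\rho\,\overline{\omega}(P, Q)) \quad \text{if } [\rho, P] = 0 \text{ or } [\rho, Q] = 0, \qquad (6)$$

$$[\omega(P, Q), Q] = [\omega(P, Q), P] = 0, \quad \omega = \overline{\omega}, \underline{\omega}. \qquad (7)$$

Eq. (2) implies that $\overline{p}$ and $\underline{p}$ depend on $\{\mathcal{P}\}$ and $\{\mathcal{Q}\}$ only through P and Q. This non-contextuality feature holds also for the ordinary (one-variable) quantum probability [37,38]. Provided that the operators $\overline{\omega}$ and $\underline{\omega}$ are found, $\overline{p}$ and $\underline{p}$ can be found in the usual way of quantum averages.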
Conditions (3) stem from $0 \le \underline{p}(\rho; P, Q) \le \overline{p}(\rho; P, Q) \le 1$, which is demanded for all density matrices ρ. Eq. (4) is the symmetry condition necessary for the joint probability. Eq. (5) is reversion to the commuting case. In particular, (5) ensures $\overline{\omega}(P, 0) = \underline{\omega}(P, 0) = 0$ and $\overline{\omega}(P, I) = \underline{\omega}(P, I) = P$. Since Q = I imposes no restriction (it holds on any state), the latter equality is the reproduction of the marginal probability (which cannot be recovered by summation, since the probability model is not additive).
Finally, (7) means that ω(P, Q) (for $\omega = \overline{\omega}, \underline{\omega}$) can be measured simultaneously and precisely with P or with Q (on any quantum state), a natural condition for the joint probability (operators).
If there are several candidates satisfying (3-7) we shall naturally select the ones providing the largest lower probability and the smallest upper probability.
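CS-representation will be our main tool. Given the projectors P and Q, the Hilbert space H can be represented as a direct sum [39-41] [see also section 3 of the Supplementary Material]

$$H = H' \oplus H_{11} \oplus H_{10} \oplus H_{01} \oplus H_{00}, \qquad (8)$$

where the sub-space $H_{\alpha\beta}$ of dimension $m_{\alpha\beta}$ is formed by common eigenvectors of P and Q having eigenvalue α (for P) and β (for Q). Depending on P and Q every sub-space can be absent; all of them can be present only for dim H ≥ 6. Now $H_{11}$ = ran(P) ∩ ran(Q) is the intersection of the ranges of P and Q. H′ has even dimension 2m [40,41]; this is the only sub-space in (8) that is not formed by common eigenvectors of P and Q. There exists a unitary transformation

$$\hat{P} = U P U^\dagger, \qquad \hat{Q} = U Q U^\dagger, \qquad (9)$$

so that $\hat{P}$ and $\hat{Q}$ get the following block-diagonal form related to (8) [40]:

$$\hat{P} = P' \oplus I_{m_{11}} \oplus I_{m_{10}} \oplus 0_{m_{01}} \oplus 0_{m_{00}}, \qquad P' = \begin{pmatrix} I_m & 0 \\ 0 & 0_m \end{pmatrix}, \qquad (10)$$

$$\hat{Q} = Q' \oplus I_{m_{11}} \oplus 0_{m_{10}} \oplus I_{m_{01}} \oplus 0_{m_{00}}, \qquad Q' = \begin{pmatrix} C^2 & CS \\ CS & S^2 \end{pmatrix}, \qquad (11)$$

where C and S are invertible square matrices of the same size holding

$$C^2 + S^2 = I_m, \qquad [C, S] = 0. \qquad (12)$$

Now ran(P′) and ran(Q′) are sub-spaces of H′. One has C = cos T and S = sin T, where T is the operator analogue of the angle between two spaces. The sub-spaces $H_{\alpha\beta}$ are absent if P and Q do not have any common eigenvector. This, in particular, happens in dim(H) = 2.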
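To make the operator angle T more concrete, the following NumPy sketch (an illustration, not part of the original derivation) computes the principal angles between ran(P) and ran(Q) from the singular values of A†B, where A and B hold orthonormal bases of the two ranges. The example subspaces, the angles `a`, `b` and the helper name `principal_cosines` are assumptions chosen here; the check relies on the identification C = cos T stated in the text.

```python
import numpy as np

def principal_cosines(A, B):
    """Cosines of the principal angles between span(A) and span(B).

    A and B hold orthonormal basis vectors as columns.
    """
    return np.linalg.svd(A.conj().T @ B, compute_uv=False)

# Hypothetical example in a 4-dimensional real space:
# ran(P) = span(e1, e2); ran(Q) is tilted away from it by angles a and b.
a, b = np.pi / 3, np.pi / 4
e = np.eye(4)
A = np.column_stack([e[0], e[1]])
B = np.column_stack([np.cos(a) * e[0] + np.sin(a) * e[2],
                     np.cos(b) * e[1] + np.sin(b) * e[3]])

P = A @ A.T   # projector onto ran(P)
Q = B @ B.T   # projector onto ran(Q)

cosines = np.sort(principal_cosines(A, B))
# The two largest eigenvalues of PQP should equal cos^2 of the principal angles,
# i.e. the eigenvalues of C^2 in the CS-representation.
eig_PQP = np.sort(np.linalg.eigvalsh(P @ Q @ P))[-2:]
```

In this example there are no common eigenvectors, so the whole space plays the role of H′ with m = 2.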
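The main result. Note that if (3-7) hold for P and Q, they hold as well for $\hat{P}$ and $\hat{Q}$, because $\omega(P, Q) = U^\dagger \omega(\hat{P}, \hat{Q}) U$ for $\omega = \overline{\omega}, \underline{\omega}$. Section 4 of the Supplementary Material shows how to get $\overline{\omega}(\hat{P}, \hat{Q})$ and $\underline{\omega}(\hat{P}, \hat{Q})$ from (3-7) and (10,11). In the decomposition $H' \oplus H_{11} \oplus H_{10} \oplus H_{01} \oplus H_{00}$ they read:

$$\overline{\omega}(\hat{P}, \hat{Q}) = \begin{pmatrix} C^2 & 0 \\ 0 & C^2 \end{pmatrix} \oplus I_{m_{11}} \oplus 0_{m_{10}} \oplus 0_{m_{01}} \oplus 0_{m_{00}}, \qquad (13)$$

$$\underline{\omega}(\hat{P}, \hat{Q}) = 0_{2m} \oplus I_{m_{11}} \oplus 0_{m_{10}} \oplus 0_{m_{01}} \oplus 0_{m_{00}}. \qquad (14)$$

Let g(P, Q) = g(Q, P) be the projector onto the intersection ran(P) ∩ ran(Q) of ran(P) and ran(Q). We now return from (13,14,9) to the original projectors P and Q [see section 4 of the Supplementary Material] and obtain the main formulas:

$$\underline{\omega}(P, Q) = g(P, Q), \qquad (15)$$

$$\overline{\omega}(P, Q) = I - (P - Q)^2 - g(I - P, I - Q). \qquad (16)$$

For [P, Q] = 0, g(P, Q) = PQ, and we revert to $\overline{\omega}(P, Q) = \underline{\omega}(P, Q) = PQ$. Note that $[P, (P - Q)^2] = [Q, (P - Q)^2] = 0$.

Physical meaning of $\underline{\omega}$ and $\overline{\omega}$. When looking for a joint probability defined over two projectors P and Q, one wonders whether it is not just some (operator) mean of P and Q. For ordinary numbers a ≥ 0 and b ≥ 0 there are 3 means: arithmetic $\frac{a+b}{2}$, geometric $\sqrt{ab}$ and harmonic $\frac{2ab}{a+b}$. Now (15) is precisely the operator harmonic mean of P and Q [50]

$$g(P, Q) = 2P(P + Q)^{-}Q = 2Q(P + Q)^{-}P,$$

where $A^{-}$ is the inverse of A if it exists, otherwise it is the pseudo-inverse; see section 5 of the Supplementary Material for various representations of $\underline{\omega}(P, Q)$ and $\overline{\omega}(P, Q)$. A more familiar formula is

$$g(P, Q) = \lim_{n\to\infty} Q(PQ)^n = \lim_{n\to\infty} P(QP)^n.$$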
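As a numerical sanity check, the sketch below evaluates both operators for a small example and tests conditions (3) and (7). It assumes that the lower operator is the intersection projector g(P, Q), computed from the harmonic-mean formula 2P(P+Q)⁻Q, and that the upper operator has the form I − (P − Q)² − g(I − P, I − Q); the function names, the 4-dimensional example and the angle `theta` are illustrative choices, not taken from the paper.

```python
import numpy as np

def g(P, Q):
    """Projector onto ran(P) ∩ ran(Q) via the harmonic mean 2 P (P+Q)^- Q."""
    return 2 * P @ np.linalg.pinv(P + Q, rcond=1e-10) @ Q

def omega_lower(P, Q):
    return g(P, Q)

def omega_upper(P, Q):
    # Assumed form of the upper operator: I - (P-Q)^2 - g(I-P, I-Q).
    I = np.eye(P.shape[0])
    D = P - Q
    return I - D @ D - g(I - P, I - Q)

def comm(X, Y):
    return X @ Y - Y @ X

# Hypothetical example: ran(P) = span(e1, e2), ran(Q) = span(e1, v),
# where v is e2 rotated towards e3 by theta. The ranges share only e1.
theta = np.pi / 3
e = np.eye(4)
v = np.cos(theta) * e[1] + np.sin(theta) * e[2]
P = np.outer(e[0], e[0]) + np.outer(e[1], e[1])
Q = np.outer(e[0], e[0]) + np.outer(v, v)

lo = omega_lower(P, Q)
up = omega_upper(P, Q)
width_eigs = np.linalg.eigvalsh(up - lo)   # should be non-negative, cf. (3)
```

Here the lower operator is the projector onto e1, while the upper operator additionally carries the weight cos²θ on the two-dimensional block spanned by e2 and e3.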
The intersection projector g(P, Q) appears in [33][34][35][36][37]. It was stressed that g(P, Q) cannot be a joint probability for non-commutative P and Q [17]. Its meaning is clear by now: it is the lower probability for P and Q. g(P, Q) is non-zero only if tr(P ) ≥ 2 (or tr(Q) ≥ 2), since two different rays cross only at zero.
Let us now turn to $\overline{\omega}$. The transition probability between 2 pure states is determined by the squared cosine of the angle between them: $|\langle\psi|\phi\rangle|^2 = \cos^2\theta_{\phi\psi}$. Eq. (13) shows that $\overline{\omega}(\hat{P}, \hat{Q})$ depends on $C^2 = \cos^2 T$, where T is the operator angle between $\hat{P}$ and $\hat{Q}$. Note from (10,11) that the eigenvalues λ of PQ which hold 0 < λ < 1 are the eigenvalues of $C^2$, and, as seen from (13), they are also (doubly-degenerate) eigenvalues of $\overline{\omega}(P, Q)$. Thus we have a physical interpretation not only for tr(PQ) (transition probability), but also for the eigenvalues of PQ (PQ and QP have the same eigenvalues).
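The double degeneracy can be checked numerically. The sketch below draws two random rank-2 projectors in a 5-dimensional space (an arbitrary choice with a fixed seed) and compares the spectrum of the upper operator, assumed here to be I − (P − Q)² − g(I − P, I − Q), with the doubled eigenvalues of PQ that lie in (0, 1). The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projector(dim, rank):
    """Projector onto the span of `rank` random complex vectors."""
    X = rng.normal(size=(dim, rank)) + 1j * rng.normal(size=(dim, rank))
    q, _ = np.linalg.qr(X)
    return q @ q.conj().T

def g(P, Q):
    """Projector onto ran(P) ∩ ran(Q) via 2 P (P+Q)^- Q."""
    return 2 * P @ np.linalg.pinv(P + Q, rcond=1e-10) @ Q

dim = 5
P, Q = random_projector(dim, 2), random_projector(dim, 2)
I = np.eye(dim)

upper = I - (P - Q) @ (P - Q) - g(I - P, I - Q)
upper = (upper + upper.conj().T) / 2        # remove rounding asymmetry

# PQP is hermitean and has the same non-zero eigenvalues as PQ.
lam = np.linalg.eigvalsh(P @ Q @ P)
lam = lam[lam > 1e-9]                        # eigenvalues in (0, 1)

expected = np.sort(np.concatenate([lam, lam, np.zeros(dim - 2 * lam.size)]))
spectrum = np.linalg.eigvalsh(upper)
```

For generic projectors of this kind, each eigenvalue of PQ in (0, 1) appears twice in `spectrum`, and the remaining eigenvalue is zero.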
The operator $\overline{\omega}(P, Q) - \underline{\omega}(P, Q)$ quantifies the uncertainty of the joint probability; the physical meaning of this characteristic of non-commutativity is new.
which nullifies if and only if $\overline{p} = \overline{p}'$ and $\underline{p} = \underline{p}'$, and which reduces to the ordinary distance |p − p′| for usual (precise) probabilities. Now

$$\mathrm{tr}(\rho\,\underline{\omega}(P_1, Q_1)) > \mathrm{tr}(\rho\,\overline{\omega}(P, Q)) \qquad (21)$$

means that the pair of projectors $(P_1, Q_1)$ is surely more probable (on ρ) than (P, Q); see section 7 of the Supplementary Material for examples. Note from (15,16) that if

$$\mathrm{tr}(\rho\,\omega(P, Q)) > \mathrm{tr}(\rho\,\omega(I - P, I - Q)) \qquad (22)$$

holds for $\omega = \underline{\omega}$, then it also holds for $\omega = \overline{\omega}$ (and vice versa). Though in a weaker sense than (21), (22) means that P and Q together are more probable than neither of them together (which is the pair (I − P, I − Q)). Eqs. (21,22) are examples of comparative (modal) probability statements; see [30] in this context.

what is the most convenient way of defining averages with respect to quantum imprecise probability, or are there even more general axioms that involve the density matrix non-linearly and reduce to the linear situation when (5, 6) (effective commutativity) holds.
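The equivalence behind the comparative statement (22) can be illustrated numerically: under the assumed forms lower = g(P, Q) and upper = I − (P − Q)² − g(I − P, I − Q), the difference between the operators for the pairs (P, Q) and (I − P, I − Q) is the same for the upper and the lower operator. The projectors, the angle and the diagonal state `rho` below are hypothetical examples.

```python
import numpy as np

def g(P, Q):
    """Projector onto ran(P) ∩ ran(Q) via 2 P (P+Q)^- Q."""
    return 2 * P @ np.linalg.pinv(P + Q, rcond=1e-10) @ Q

def lower(P, Q):
    return g(P, Q)

def upper(P, Q):
    I = np.eye(P.shape[0])
    return I - (P - Q) @ (P - Q) - g(I - P, I - Q)

theta = 0.7
e = np.eye(4)
v = np.cos(theta) * e[1] + np.sin(theta) * e[2]
P = np.outer(e[0], e[0]) + np.outer(e[1], e[1])
Q = np.outer(e[0], e[0]) + np.outer(v, v)
I = np.eye(4)

# Operator differences entering (22) for the upper and the lower operator.
gap_upper = upper(P, Q) - upper(I - P, I - Q)
gap_lower = lower(P, Q) - lower(I - P, I - Q)

rho = np.diag([0.4, 0.3, 0.2, 0.1])          # hypothetical density matrix
value_lower = np.trace(rho @ gap_lower)
value_upper = np.trace(rho @ gap_upper)
```

Since the two gaps coincide as operators, the inequality holds for the upper operator exactly when it holds for the lower one, whatever the state.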

SUPPLEMENTARY MATERIAL
Imprecise probability for non-commuting observables
by Armen E. Allahverdyan

This Supplementary Material consists of seven sections. All of them can be read independently of each other. Sections 1, 2 and 3 recall, respectively, the no-go statements for the joint quantum probability, generalized axiomatics for the imprecise probability and the CS-representation. This material is not new, but is presented in a focused form, adapted from several different sources.
Section 4 contains the derivation of the main result, while sections 5 and 6 demonstrate various features of quantum imprecise probability.
Section 7 illustrates it with simple physical examples.

NOTATIONS
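We first of all recall the employed notations. All operators (matrices) live in a finite-dimensional Hilbert space H. For two hermitean operators Y and Z, Y ≥ Z means that all eigenvalues of Y − Z are non-negative, i.e. $\langle\psi|(Y - Z)|\psi\rangle \ge 0$ for any $|\psi\rangle \in H$. The direct sum Y ⊕ Z of two operators refers to the following block-diagonal matrix:

$$Y \oplus Z = \begin{pmatrix} Y & 0 \\ 0 & Z \end{pmatrix}.$$

I is the unity operator of H. ker(Y) is the subspace of vectors $|\phi\rangle$ with $Y|\phi\rangle = 0$. $I_n$ and $0_n$ are the n × n unity and zero matrices, respectively. In the direct sum of two sub-spaces, H ⊕ G, it is always understood that H and G are orthogonal. The vector sum of (not necessarily orthogonal) sub-spaces A and B will be denoted as A + B. This space is formed by all vectors $|\psi\rangle + |\phi\rangle$, where $|\psi\rangle \in A$ and $|\phi\rangle \in B$.

Given two sets of non-commuting hermitean projectors:

$$\sum_k P_k = I, \qquad P_i P_k = \delta_{ik} P_k, \qquad (23)$$

$$\sum_i Q_i = I, \qquad Q_i Q_k = \delta_{ik} Q_k, \qquad (24)$$

we are looking for non-negative operators $\Pi_{ik} \ge 0$ such that for an arbitrary density matrix ρ

$$\sum_{ik} \mathrm{tr}(\rho\,\Pi_{ik}) = 1, \qquad \sum_k \mathrm{tr}(\rho\,\Pi_{ik}) = \mathrm{tr}(\rho\,Q_i), \qquad \sum_i \mathrm{tr}(\rho\,\Pi_{ik}) = \mathrm{tr}(\rho\,P_k). \qquad (25)$$

These relations imply

$$\sum_{ik} \Pi_{ik} = I, \qquad \sum_k \Pi_{ik} = Q_i, \qquad \sum_i \Pi_{ik} = P_k. \qquad (26)$$

Now the second (third) relation in (26) implies $\mathrm{ran}(\Pi_{ik}) \subseteq \mathrm{ran}(Q_i)$ ($\mathrm{ran}(\Pi_{ik}) \subseteq \mathrm{ran}(P_k)$). Hence $\mathrm{ran}(\Pi_{ik}) \subseteq \mathrm{ran}(Q_i) \cap \mathrm{ran}(P_k)$. Thus, if $\mathrm{ran}(Q_i) \cap \mathrm{ran}(P_k) = 0$ (e.g. when $P_k$ and $Q_i$ are one-dimensional), then $\Pi_{ik} = 0$, which means that the sought joint probability does not exist.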
If $\mathrm{ran}(Q_i) \cap \mathrm{ran}(P_k) \neq 0$, then the largest $\Pi_{ik}$ that holds the second and third relation in (26) is the projection $g(P_k, Q_i)$ on $\mathrm{ran}(Q_i) \cap \mathrm{ran}(P_k)$. However, the first relation in (26) is still impossible to satisfy (for $[P_k, Q_i] \neq 0$), as seen from the superadditivity feature (103).

1.2 Two-time probability (as a candidate for the joint probability)

Given (23,24), we can carry out two successive measurements. First (second) we measure a quantity whose eigen-projectors are $\{P_k\}$ ($\{Q_i\}$). This results in the following joint probability for the measurement results [ρ is the density matrix]

$$\mathrm{tr}(\rho\,P_k Q_i P_k). \qquad (28)$$

Likewise, if we first measure $\{Q_i\}$ and then $\{P_k\}$, we obtain a quantity that generally differs from (28):

$$\mathrm{tr}(\rho\,Q_i P_k Q_i). \qquad (29)$$

If we attempt to consider (29) [or (28)] as a joint additive probability for $P_k$ and $Q_i$, we note that (29) [and likewise (28)] reproduces correctly only one marginal:

$$\sum_k \mathrm{tr}(\rho\,Q_i P_k Q_i) = \mathrm{tr}(\rho\,Q_i), \qquad \sum_i \mathrm{tr}(\rho\,Q_i P_k Q_i) \neq \mathrm{tr}(\rho\,P_k). \qquad (30)$$

One can attempt to interpret the mean of (28) and (29) as a non-additive probability:

$$\mu(\rho; P_k, Q_i) = \tfrac{1}{2}\,\mathrm{tr}\big(\rho\,[P_k Q_i P_k + Q_i P_k Q_i]\big). \qquad (31)$$

This object is linear over ρ, symmetric (with respect to interchanging $P_k$ and $Q_i$), non-negative, and reduces to the additive joint probability for $[P_k, Q_i] = 0$. The relation $\mu(\rho; P_k, I) = \mathrm{tr}(\rho P_k)$ can be interpreted as consistency with the correct marginals (once $\mu(\rho; P_k, Q_i)$ is regarded as a non-additive probability, there is no point in insisting that the marginals are obtained in the additive way). However, the additive joint probability $\mathrm{tr}(\rho P_k Q_i)$ is well-defined also for $[\rho, P_k] = 0$ (or for $[\rho, Q_i] = 0$). If $[\rho, P_k] = 0$ holds, $\mu(\rho; P_k, Q_i)$ is not consistent with $\mathrm{tr}(\rho P_k Q_i)$, i.e. depending on ρ, $P_k$ and $Q_i$ both

$$\mu(\rho; P_k, Q_i) > \mathrm{tr}(\rho P_k Q_i) \quad \text{and} \quad \mu(\rho; P_k, Q_i) < \mathrm{tr}(\rho P_k Q_i) \qquad (32)$$

are possible.
To summarize, the two-time measurement results do not qualify as the additive joint probability, first because they are not unique (two different expressions (28) and (29) are possible), and second because they do not reproduce the correct marginals. If we take the mean of two expressions (28) and (29) and attempt to interpret it as a non-additive probability, it is not compatible with the joint probability, whenever the latter is well-defined.
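The points of this summary can be illustrated for a single qubit. The sketch assumes the standard projective-measurement expressions tr(ρ P_k Q_i P_k) and tr(ρ Q_i P_k Q_i) for the two measurement orders; the particular choice of bases and of the state (which commutes with every P_k) is a hypothetical example.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)

def proj(v):
    return np.outer(v, v.conj())

P = [proj(ket0), proj(ket1)]      # the set {P_k}
Q = [proj(plus), proj(minus)]     # the set {Q_i}
rho = proj(ket0)                  # commutes with every P_k

def p_first_P(k, i):
    """P_k is measured first, then Q_i: tr(rho P_k Q_i P_k)."""
    return np.trace(rho @ P[k] @ Q[i] @ P[k]).real

def p_first_Q(k, i):
    """Q_i is measured first, then P_k: tr(rho Q_i P_k Q_i)."""
    return np.trace(rho @ Q[i] @ P[k] @ Q[i]).real

# The two orders disagree, so the two-time probability is not unique.
order_PQ, order_QP = p_first_P(0, 0), p_first_Q(0, 0)

# Mean of the two orders versus the well-defined joint probability tr(rho P Q).
mu = 0.5 * (order_PQ + order_QP)
precise = np.trace(rho @ P[0] @ Q[0]).real

# Marginal of P_0 obtained by summing the Q-first expression over i.
marginal_from_Q_first = sum(p_first_Q(0, i) for i in range(2))
true_marginal = np.trace(rho @ P[0]).real
```

In this example the mean falls below tr(ρPQ), even though ρ commutes with P, and the Q-first order gives 1/2 instead of the correct marginal 1.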

Generalized Kolmogorov's axioms
Given the full set of events Ω, $\overline{p}(\cdot)$ and $\underline{p}(\cdot)$ defined over sub-sets A, B, ... of Ω (including the empty set {0}) satisfy [45-47]: where Ω − A includes all elements of Ω that are not in A, and where A ∩ B means the intersection of two sets; A ∩ B = {0} holds for elementary events.

Joint probability
The joint probabilities of A and B are now defined as
Employing the distributivity feature which holds for any triple $A_1$, $A_2$, $A_3$, we obtain from (36,37) for

The main theorem
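Let Q′ and P′ be two subspaces of the Hilbert space H′ that hold ($Q'^{\perp}$ is the orthogonal complement of Q′)

$$Q' \cap P' = Q' \cap P'^{\perp} = Q'^{\perp} \cap P' = Q'^{\perp} \cap P'^{\perp} = \{0\}. \qquad (54)$$

The simplest example realizing (54) is when Q′ and P′ are one-dimensional subspaces of a two-dimensional H′. Let $\hat{Q}'$ and $\hat{P}'$ be projectors onto Q′ and P′, respectively. Now $I - \hat{Q}'$ is the projector onto $Q'^{\perp}$, and let $g(\hat{P}', \hat{Q}')$ be the projector onto Q′ ∩ P′. Employing the known formula (see e.g. [41])

$$\mathrm{tr}\big(\hat{Q} - g(\hat{Q}, I - \hat{P})\big) = \mathrm{tr}\big(\hat{P} - g(\hat{P}, I - \hat{Q})\big),$$

we get from (54)

$$\mathrm{tr}\,\hat{Q}' = \mathrm{tr}\,\hat{P}' = \mathrm{tr}(I - \hat{Q}') = \mathrm{tr}(I - \hat{P}'), \qquad (56)$$

which means that dim H′ should be even for (54) to hold.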
Here is the statement of the CS-representation [40]: after a unitary transformation $\hat{Q}'$ and $\hat{P}'$ can be presented as where all blocks in (57) have the same dimension m.
To prove (57), note that $\hat{Q}'$ and $\hat{P}'$ can be written as [cf. (56)] Next, let us show that We transform as where $K = V^\dagger V' U$. We shall now employ the fact that the last matrix in (61) is a projector: The first and second relations in (62) show that $[K, \hat{B}] = [L, \hat{B}] = 0$. Then the third relation produces $\hat{B}(K + L - 1) = 0$. Since $\hat{B} > 0$ (due to $\ker(\hat{B}) = 0$), we conclude that K + L = 1. The rest is obvious.

General form of the CS representation
The above derivation of (57) assumed conditions (54). More generally, the Hilbert space H can be represented as a direct sum [39-41]

$$H = H' \oplus H_{11} \oplus H_{10} \oplus H_{01} \oplus H_{00}, \qquad (68)$$

where the sub-space $H_{\alpha\beta}$ of dimension $m_{\alpha\beta}$ is formed by common eigenvectors of P and Q having eigenvalue α (for P) and β (for Q). Depending on P and Q every sub-space can be absent; all of them can be present only for dim H ≥ 6. Now $H_{11}$ = ran(P) ∩ ran(Q). H′ has even dimension 2m [40,41]; this is the only sub-space that is not formed by common eigenvectors of P and Q.
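The dimensions of the sub-spaces in this decomposition can be read off numerically. The sketch below uses the fact that a vector lies in both ran(A) and ran(B) exactly when it is an eigenvector of A + B with eigenvalue 2; the function names and the 4-dimensional example are illustrative assumptions.

```python
import numpy as np

def dim_intersection(A, B, tol=1e-9):
    """dim(ran(A) ∩ ran(B)) = multiplicity of the eigenvalue 2 of A + B."""
    return int(np.sum(np.linalg.eigvalsh(A + B) > 2 - tol))

def cs_dimensions(P, Q):
    """Dimensions m_ab of the common-eigenvector sub-spaces and of H'."""
    I = np.eye(P.shape[0])
    m = {"11": dim_intersection(P, Q),
         "10": dim_intersection(P, I - Q),
         "01": dim_intersection(I - P, Q),
         "00": dim_intersection(I - P, I - Q)}
    # Whatever is left over is the even-dimensional sub-space H'.
    m["prime"] = P.shape[0] - sum(m.values())
    return m

# Hypothetical example: ran(P) = span(e1, e2), ran(Q) = span(e1, v).
theta = np.pi / 3
e = np.eye(4)
v = np.cos(theta) * e[1] + np.sin(theta) * e[2]
P = np.outer(e[0], e[0]) + np.outer(e[1], e[1])
Q = np.outer(e[0], e[0]) + np.outer(v, v)

dims = cs_dimensions(P, Q)
```

For this example one common eigenvector has eigenvalues (1, 1), one has (0, 0), and the remaining two dimensions form H′ with m = 1.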

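After a unitary transformation

$$\hat{P} = U P U^\dagger, \qquad \hat{Q} = U Q U^\dagger, \qquad (69)$$

$\hat{P}$ and $\hat{Q}$ get the following block-diagonal form that is related to (68) [40] and that generalizes (57):

$$\hat{P} = P' \oplus I_{m_{11}} \oplus I_{m_{10}} \oplus 0_{m_{01}} \oplus 0_{m_{00}}, \qquad P' = \begin{pmatrix} I_m & 0 \\ 0 & 0_m \end{pmatrix}, \qquad (70)$$

$$\hat{Q} = Q' \oplus I_{m_{11}} \oplus 0_{m_{10}} \oplus I_{m_{01}} \oplus 0_{m_{00}}, \qquad Q' = \begin{pmatrix} C^2 & CS \\ CS & S^2 \end{pmatrix}, \qquad (71)$$

where C and S are invertible square matrices of the same size holding

$$C^2 + S^2 = I_m, \qquad [C, S] = 0. \qquad (72)$$

The sub-space H′ refers to P′ and Q′. If $\hat{P}$ and $\hat{Q}$ do not have any common eigenvector, $\hat{P} = P'$ and $\hat{Q} = Q'$.

We start with representation (70, 71) and axioms (3-7) of the main text. These axioms hold for $\hat{P}$, $\hat{Q}$ and $\hat{\rho} = U \rho U^\dagger$ [see (69)] instead of P, Q and ρ, because $\omega(\hat{P}, \hat{Q}) = U \omega(P, Q) U^\dagger$ for $\omega = \overline{\omega}, \underline{\omega}$ (recall that $\overline{\omega}(P, Q)$ and $\underline{\omega}(P, Q)$ are Taylor expandable). Hence we now search for $\overline{\omega}(\hat{P}, \hat{Q})$ and $\underline{\omega}(\hat{P}, \hat{Q})$.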

Representations for the upper probability operator
Let us turn to a more detailed investigation of (90). Note from (70, 71, 84) that I − g(I − P, I − Q) is the projector to ran(P) + ran(Q), where ran(P) + ran(Q) is the vector sum of two sub-spaces. Note the following representation [42]: where The third equality in (91) is the obvious feature of the pseudo-inverse. The first equality in (91) follows from the fact that $(P + Q)^{-}(P + Q)$ is the projector on ran(P + Q) and the known relation [42]: ran(P + Q) = ran(P) + ran(Q).
Note another representation for the projector to ran(P) + ran(Q) [34] I − g(I − P, where I − g(I − P, I − Q) equals the minimal hermitean operator A that holds the 2 conditions after |.
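The statement that $(P + Q)^{-}(P + Q)$ and I − g(I − P, I − Q) both project onto ran(P) + ran(Q) can be checked on a small example. The sketch uses NumPy's pseudo-inverse and the harmonic-mean formula for g; the projectors and the angle are hypothetical choices.

```python
import numpy as np

def g(P, Q):
    """Projector onto ran(P) ∩ ran(Q) via 2 P (P+Q)^- Q."""
    return 2 * P @ np.linalg.pinv(P + Q, rcond=1e-10) @ Q

theta = np.pi / 5
e = np.eye(4)
v = np.cos(theta) * e[1] + np.sin(theta) * e[2]
P = np.outer(e[0], e[0]) + np.outer(e[1], e[1])   # ran(P) = span(e1, e2)
Q = np.outer(e[0], e[0]) + np.outer(v, v)         # ran(Q) = span(e1, v)
I = np.eye(4)

# Projector onto ran(P + Q) from the pseudo-inverse.
sum_proj_pinv = np.linalg.pinv(P + Q, rcond=1e-10) @ (P + Q)
# The same projector written through the intersection of the complements.
sum_proj_g = I - g(I - P, I - Q)
```

Both expressions give the projector onto span(e1, e2, e3), which is the vector sum of the two ranges in this example.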

Direct relation between the eigenvalues of P − Q and P Q
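We can now prove directly (i.e. without employing the CS representation) that there is a direct relation between the eigenvalues of $\overline{\omega}(P, Q)$ and PQ. Let $|x\rangle$ be an eigenvector of the hermitean operator P − Q:

$$(P - Q)|x\rangle = \lambda|x\rangle, \qquad (100)$$

where −1 ≤ λ ≤ 1 is the eigenvalue. Multiplying both sides of (100) by P (by Q) and using $P^2 = P$ ($Q^2 = Q$) we get

$$PQ|x\rangle = (1 - \lambda)P|x\rangle, \qquad QP|x\rangle = (1 + \lambda)Q|x\rangle, \qquad (101)$$

which then implies

$$PQ\,P|x\rangle = (1 - \lambda^2)\,P|x\rangle, \qquad QP\,Q|x\rangle = (1 - \lambda^2)\,Q|x\rangle. \qquad (102)$$

Thus $P|x\rangle$ ($Q|x\rangle$) is an eigenvector of PQ (QP) with eigenvalue $1 - \lambda^2$. As seen from (101), the 2d linear space $\mathrm{Span}(P|x\rangle, Q|x\rangle)$ formed by all superpositions of $P|x\rangle$ and $Q|x\rangle$ remains invariant under the action of both P and Q. Together with tr(P − Q) = 0 this means that if (100) holds, then P − Q has eigenvalue −λ with the eigenvector living in $\mathrm{Span}(P|x\rangle, Q|x\rangle)$.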
Further details on the relation between P Q and P − Q can be looked up in [2].
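The eigenvalue relation derived above lends itself to a direct numerical check. The sketch below takes two random real rank-2 projectors in four dimensions (an arbitrary choice with a fixed seed), diagonalizes P − Q, and measures how well each P|x⟩ satisfies the eigenvalue equation of PQ with eigenvalue 1 − λ². The helper name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_projector(dim, rank):
    """Projector onto the span of `rank` random real vectors."""
    q, _ = np.linalg.qr(rng.normal(size=(dim, rank)))
    return q @ q.T

P, Q = random_projector(4, 2), random_projector(4, 2)
lams, X = np.linalg.eigh(P - Q)        # eigenpairs of the hermitean P - Q

residuals = []
for lam, x in zip(lams, X.T):
    Px = P @ x
    # PQ (P x) should equal (1 - lam^2) (P x).
    residuals.append(np.linalg.norm(P @ Q @ Px - (1 - lam ** 2) * Px))
max_residual = max(residuals)

# With tr(P - Q) = 0 the eigenvalues are expected to come in pairs (lam, -lam).
paired = np.allclose(lams, -lams[::-1])
```

The residuals vanish up to rounding for every eigenpair, including those for which P|x⟩ happens to be zero.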
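The lower probability operator $\underline{\omega}(P, Q) = g(P, Q)$ is operator superadditive:

$$g(P, Q + K) \ge g(P, Q) + g(P, K) \quad \text{for } QK = 0. \qquad (103)$$

Likewise, $\overline{\omega}(P, Q)$ is operator subadditive, but under an additional condition:

$$\overline{\omega}(P, Q + K) \le \overline{\omega}(P, Q) + \overline{\omega}(P, K) \quad \text{for } Q + K = I. \qquad (104)$$

They are the analogues of classical features (36) and (37), respectively. Note that (103) and (36) are valid under the same conditions, since QK = 0 is the analogue of A ∩ B = {0}. In that sense the correspondence between (104) and (37) is more limited, since Q + K = I is more restrictive than QK = 0. We focus on deriving (103), since (104) is derived in the same way. Note from (97) that g(P, Q) ≤ Q and g(P, K) ≤ K imply g(P, Q) + g(P, K) ≤ Q + K. Since QK = 0, g(P, Q) + g(P, K) ≤ P. Using (97) for g(P, Q + K) we obtain (103).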
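A minimal example shows that the superadditivity of g can be strict. The sketch assumes the inequality g(P, Q) + g(P, K) ≤ g(P, Q + K) for QK = 0 and computes g through the harmonic-mean formula; the three-dimensional projectors are hypothetical choices.

```python
import numpy as np

def g(P, Q):
    """Projector onto ran(P) ∩ ran(Q) via 2 P (P+Q)^- Q."""
    return 2 * P @ np.linalg.pinv(P + Q, rcond=1e-10) @ Q

e = np.eye(3)
Q = np.outer(e[0], e[0])
K = np.outer(e[1], e[1])              # QK = 0
v = (e[0] + e[1]) / np.sqrt(2)
P = np.outer(v, v)                    # ray inside ran(Q + K), oblique to both

lhs = g(P, Q) + g(P, K)               # both intersections are trivial
rhs = g(P, Q + K)                     # ran(P) lies inside ran(Q + K)
gap_eigs = np.linalg.eigvalsh(rhs - lhs)
```

Here the left-hand side vanishes while the right-hand side equals P, so the difference is a non-zero positive operator.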

Two-dimensional Hilbert space
It should be clear from (82, 83) that in two-dimensional Hilbert space, any lower probability operator $\underline{\omega}(P, Q)$ is zero (since two rays overlap only at zero), while the upper probability operator $\overline{\omega}(P, Q) = \overline{p}(\rho; P, Q)$ just reduces to the transition probability (i.e. to a number) tr(PQ). Thus for the present case both $\overline{p}$ and $\underline{p}$ do not depend on ρ.
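The two-dimensional case can be verified in a few lines. The sketch assumes the forms lower = g(P, Q) and upper = I − (P − Q)² − g(I − P, I − Q) for two non-commuting rays; the angle `theta` and the two test states are arbitrary illustrative choices.

```python
import numpy as np

def g(A, B):
    """Projector onto ran(A) ∩ ran(B) via 2 A (A+B)^- B."""
    return 2 * A @ np.linalg.pinv(A + B, rcond=1e-10) @ B

theta = 0.4
u = np.array([1.0, 0.0])
w = np.array([np.cos(theta), np.sin(theta)])
P, Q = np.outer(u, u), np.outer(w, w)      # two distinct rays
I = np.eye(2)

lower = g(P, Q)
upper = I - (P - Q) @ (P - Q) - g(I - P, I - Q)
transition = np.trace(P @ Q)               # equals cos^2(theta)

# Two different (hypothetical) states give the same upper probability.
rho_a = np.diag([0.9, 0.1])
rho_b = np.diag([0.2, 0.8])
p_upper_a = np.trace(rho_a @ upper)
p_upper_b = np.trace(rho_b @ upper)
```

The upper operator is proportional to the unity operator, which is why the upper probability is the same for every state.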

Spin 1
The 3 × 3 matrices for the spin components read Now $P^{a}_{\pm 1, 0}$ for a = x, y, z are the one-dimensional projectors to the eigenspace with eigenvalues ±1 or 0 of $L_a$: where the zero components are orthogonal to each other: Other overlaps are simple as well (α ≠ β):

$$\mathrm{tr}(P^{\alpha}_j P^{\beta}_k) = \begin{cases} 1/4 & \text{if } j \neq 0 \text{ and } k \neq 0, \\ 1/2 & \text{if } j = 0 \text{ or } k = 0 \text{ but not both}, \\ 0 & \text{if } j = 0 \text{ and } k = 0. \end{cases}$$
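The overlap table can be reproduced numerically. The sketch below assumes the standard spin-1 matrices in the basis where L_z is diagonal (with ħ = 1); this choice of basis is an assumption made here, and the overlaps do not depend on it. The helper names are hypothetical.

```python
import numpy as np

s = 1 / np.sqrt(2)
# Standard spin-1 matrices in the L_z eigenbasis (an assumed convention).
L = {"x": s * np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=complex),
     "y": s * np.array([[0, -1j, 0], [1j, 0, -1j], [0, 1j, 0]]),
     "z": np.diag([1, 0, -1]).astype(complex)}

def eigenprojectors(M):
    """Map each eigenvalue (-1, 0, +1) of M to its one-dimensional projector."""
    vals, vecs = np.linalg.eigh(M)
    return {int(round(val)): np.outer(vecs[:, n], vecs[:, n].conj())
            for n, val in enumerate(vals)}

proj = {a: eigenprojectors(M) for a, M in L.items()}

def overlap(a, j, b, k):
    """tr(P^a_j P^b_k) for axes a != b and eigenvalues j, k in {-1, 0, 1}."""
    return np.trace(proj[a][j] @ proj[b][k]).real

def expected(j, k):
    if j != 0 and k != 0:
        return 0.25
    if j == 0 and k == 0:
        return 0.0
    return 0.5

# Largest deviation from the table over all pairs of different axes.
max_deviation = max(abs(overlap(a, j, b, k) - expected(j, k))
                    for a in "xyz" for b in "xyz" if a != b
                    for j in (-1, 0, 1) for k in (-1, 0, 1))
```

The deviation is at the level of rounding errors, in agreement with the three cases listed above.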
Such examples can be easily continued, e.g.