Quantum correlations in the temporal CHSH scenario

We consider a temporal version of the CHSH scenario using projective measurements on a single quantum system. It is known that quantum correlations in this scenario are fundamentally more general than correlations obtainable with the assumptions of macroscopic realism and non-invasive measurements. In this work, we also educe some fundamental limitations of these quantum correlations. One result is that a set of correlators can appear in the temporal CHSH scenario if and only if it can appear in the usual spatial CHSH scenario. In particular, we derive the validity of the Tsirelson bound and the impossibility of PR-box behavior. The strength of possible signaling also turns out to be surprisingly limited, giving a maximal communication capacity of approximately 0.32 bits. We also find a temporal version of Hardy's nonlocality paradox with a maximal quantum value of 1/4.


Introduction
Quantum theory displays many counterintuitive features which are in stark contrast to our everyday experiences in the macroscopic world. Possibly the most extreme of these is the collapse of the wavefunction due to measurement; its contentious interpretation has given rise to the measurement problem. The only way to observe and study wavefunction collapse and its consequences is to conduct measurements on the collapsed wavefunction. Therefore, in order to gain a better understanding of what the collapse means and how it occurs, one has to study repeated measurements on the same quantum system, both from a theoretical and from an experimental perspective. This is the motivation for our work on temporal quantum correlations. In theories different from orthodox quantum mechanics, for example when wavefunction collapse is not absolutely instantaneous [Pea99], the properties of temporal correlations are likely to be different from those presented here.
Quantum correlations have mostly been investigated for scenarios of several spacelike separated parties sharing some nonlocal correlations. The simplest situation one can consider here is the Clauser-Horne-Shimony-Holt (CHSH) [CHSH69] scenario: two parties, commonly dubbed Alice and Bob, each operate with a physical system of their own, on which they each conduct one of two dichotomic (i.e. two-valued) measurements. On the one hand, quantum theory entails phenomena that cannot be achieved classically: many entangled quantum states let Alice and Bob observe correlations between their measurements which cannot be explained by classical models defined in terms of local hidden variables; this non-classicality can be detected by observing violations of the CHSH inequalities. These inequalities precisely characterize those correlations having local hidden variable models. Furthermore, Hardy's nonlocality paradox [Har93] shows that this feature is not solely a quantitative trait of the joint outcome probabilities: it proves that there also exists a qualitative difference between quantum correlations and the realm of local hidden variable models. On the other hand, it has been found that there are nevertheless strict limitations on which correlations can be observed with quantum-mechanical systems. The Popescu-Rohrlich box (PR-box) is a joint probability distribution that is consistent with the causality principle of no-signaling, and yet such a PR-box cannot be constructed in a quantum-mechanical world. This can be seen most directly from the Tsirelson bound, which specifies the maximal quantum violation of the CHSH inequalities.
In this paper, we study a temporal version of the CHSH scenario. We may imagine a single physical system in a laboratory, on which the two experimentalists Alice and Bob can conduct their measurements. However, it so happens that their work shifts do not intersect, and Alice leaves the lab before Bob arrives. Now it is known that Alice, during her shift, has measured one of the two ±1-valued observables $a_1$ or $a_2$, and likewise, Bob will measure one of the two ±1-valued observables $b_1$ or $b_2$. It is crucial to assume that Alice only conducts one of the two projective measurements $a_1$ and $a_2$, so that she cannot disturb the system and its natural dynamics in any other way. Then which joint probability distributions for the measurement outcomes can possibly arise in this way? In the following sections we answer certain aspects of this question. Just like in the spatial case, we find both fundamental possibilities achievable by such quantum correlations, and fundamental limitations on these quantum correlations. There are analogues of all the spatial phenomena mentioned in the previous paragraph: impossibility of hidden variable models (following [Lap06], no locality or non-invasiveness assumption is actually needed), a version of Hardy's paradox which turns out to be stronger than in the spatial scenario, the possibility of signaling in a limited form, impossibility of the PR-box, and the Tsirelson bound. Moreover, although the set of joint probabilities realizable by spatial quantum correlations is strictly contained in the set of joint probabilities realizable by temporal quantum correlations, we find that the set of realizable correlators is the same in the temporal case as in the spatial case.
There has been a considerable amount of previous work on the properties of temporal quantum correlations. In particular, the Leggett-Garg inequalities [LG85] characterize the probabilistic hidden variable models for the scenario in which one measures two-time correlators between three ±1-valued observables, and it is known that these can be violated quantum-mechanically. In contrast to spacelike separated situations, it is not necessary here to have more than one observable for each "party", i.e. at each point in time, since the observables at different points in time need not commute, leading to specifically quantum phenomena. Very recently, Avis, Hayden and Wilde [AHW] have classified all tight Leggett-Garg inequalities for the two-time correlators between any number of dichotomic observables as precisely the facets of the cut polytope. Some other relevant references include [Lap06] and [BTCV].

Joint probabilities in the temporal CHSH scenario
We start with several statements about temporal correlations between projective quantum measurements of ±1-valued observables. Then we describe the temporal CHSH scenario, which has been outlined in the introduction, in a little more detail.
2.1 Setting the stage. Consider a single quantum system with an underlying Hilbert space $\mathcal{H}$ and dynamics described by the Hamiltonian $H$. Furthermore, we have ±1-valued, i.e. dichotomic, observables $a$ and $b$, which are hermitian operators on $\mathcal{H}$ with the property
$$a^2 = b^2 = \mathbb{1}.$$
Note that we can bring any pair of two-valued observables into this form by relabelling the outcomes as $+1$ and $-1$. Now Alice measures $a$ at time $t_A$ and Bob measures $b$ at time $t_B$. Both measurements are assumed to be perfect projective von Neumann measurements, so that the state collapses to an eigenstate of the corresponding observable upon the measurement. This assumption is relevant for Alice since it limits the way in which her measurement $a$ can influence the system; we will see in paragraph 2.3 that if we allowed arbitrary generalized measurements (Lüders measurements) for Alice, then any set of joint outcome probabilities without signaling from Bob to Alice could be modelled even with commuting Kraus operators, i.e. with a classical probabilistic system. For Bob, however, the assumption of projective measurements is not essential: since his post-measurement state does not get measured, it is irrelevant and only his outcome probabilities matter. Concerning these, we can always enlarge the Hilbert space to turn any POVM into a projective measurement while preserving the outcome probabilities. We take the system to be in the pure initial state $|\psi\rangle$ just before Alice's measurement at time $t_A$. The assumption of a pure initial state is merely for notational convenience, and all following calculations would also apply mutatis mutandis to the case of a mixed initial state. Note also that in case of a mixed initial state described by a density operator $\rho$ on $\mathcal{H}$, we can replace it by a purification $|\psi\rangle$ on $\mathcal{H} \otimes \mathcal{H}'$ for some $\mathcal{H}'$, while replacing the observables $a$ and $b$ by $a \otimes \mathbb{1}$ and $b \otimes \mathbb{1}$. This retains all joint outcome probabilities.
When working in the Heisenberg picture, the unitary evolution of the state is trivial, while Bob's observables evolve according to
$$b' \equiv e^{-iH(t_B - t_A)}\, b\, e^{iH(t_B - t_A)}.$$
Since the observable $b$ was arbitrary, the evolved observable $b'$ is also just an arbitrary ±1-valued observable on $\mathcal{H}$. Hence as far as the existence of quantum-mechanical models for joint probabilities is concerned, the dynamics is irrelevant. In particular, we will choose $H = 0$ for simplicity, so that $b' = b$. Then wavefunction collapse is the only "dynamics" present in our formalism.
2.2 Joint probabilities and correlators. Now we calculate the joint probabilities in terms of $a$, $b$ and $|\psi\rangle$. For the ±1-valued observable $a$, the projection operator onto the $+1$-eigenspace and the projection operator onto the $-1$-eigenspace are given by, respectively,
$$\frac{\mathbb{1} + a}{2} \quad\text{and}\quad \frac{\mathbb{1} - a}{2},$$
and in the same way for $b$. Using the Born rule together with the projection postulate shows that the joint probability for Alice to get the outcome $r \in \{-1, +1\}$ and for Bob to get the outcome $s \in \{-1, +1\}$ reads
$$P(r,s) = \langle\psi|\,\frac{\mathbb{1} + ra}{2}\,\frac{\mathbb{1} + sb}{2}\,\frac{\mathbb{1} + ra}{2}\,|\psi\rangle = \frac14 + \frac14\, r\,\langle\psi|a|\psi\rangle + \frac18\, s\,\langle\psi|b|\psi\rangle + \frac18\, rs\,\langle\psi|\{a,b\}|\psi\rangle + \frac18\, s\,\langle\psi|aba|\psi\rangle. \tag{1}$$
In this expression, $\{\cdot,\cdot\}$ denotes the anticommutator of two operators. $P(r,s)$ is the probability that Alice observes the outcome $r$, multiplied by the probability that Bob gets the outcome $s$ upon measuring the state of the system after state collapse due to Alice's outcome being $r$.
We also consider correlators, which are defined as
$$C \equiv \sum_{r,s} rs\, P(r,s). \tag{2}$$
Using (1), the correlator can be expressed as
$$C = \frac12\,\langle\psi|\{a,b\}|\psi\rangle, \tag{3}$$
which is intuitive since only the $rs$-term in equation (1) suggests any kind of correlation between the outcomes. So strangely, even though our scenario has a clear temporal order, the correlators do not depend on who measures first! As far as we can see, this curious property does not generalize to observables with more than two outcomes or to scenarios with more than two parties. Note that when we use the term "correlation", we simply mean "specification of joint outcome probabilities for all allowed choices of observables", while the notion of "correlator" refers only to the quantity (3).
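As a numerical sanity check of the closed form (1) and of the order-independence of the correlator, the following sketch compares the direct Born-rule computation $\|Q_s P_r \psi\|^2$ with the five-term expansion for one qubit example. The particular state and measurement directions are arbitrary illustrative choices and are not taken from the paper; the helper names are likewise hypothetical.

```python
import cmath
import math

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(A, v):
    """Matrix-vector product."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in A]

def inner(u, v):
    """Inner product <u|v>, antilinear in the first argument."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

def expect(A, psi):
    """Expectation value <psi|A|psi> of a hermitian matrix A."""
    return inner(psi, apply(A, psi)).real

def qubit_observable(n):
    """The +-1-valued observable n . sigma for a real unit vector n."""
    x, y, z = n
    return [[z, x - 1j * y], [x + 1j * y, -z]]

def projector(a, r):
    """Projector (1 + r a)/2 onto the r-eigenspace of a, with r in {+1, -1}."""
    n = len(a)
    return [[((1 if i == j else 0) + r * a[i][j]) / 2 for j in range(n)]
            for i in range(n)]

def joint_direct(a, b, psi, r, s):
    """Born rule with projection postulate: || Q_s P_r psi ||^2."""
    v = apply(projector(b, s), apply(projector(a, r), psi))
    return inner(v, v).real

def joint_formula(a, b, psi, r, s):
    """Right-hand side of equation (1)."""
    ab, ba = matmul(a, b), matmul(b, a)
    anti = [[ab[i][j] + ba[i][j] for j in range(len(a))] for i in range(len(a))]
    aba = matmul(a, ba)
    return (1 / 4 + r * expect(a, psi) / 4 + s * expect(b, psi) / 8
            + r * s * expect(anti, psi) / 8 + s * expect(aba, psi) / 8)

def correlator(a, b, psi):
    """Definition (2): sum over outcomes of r * s * P(r, s)."""
    return sum(r * s * joint_direct(a, b, psi, r, s)
               for r in (+1, -1) for s in (+1, -1))

# Arbitrary illustrative data (an assumption of this sketch, not from the paper).
n_a = (math.sin(0.4), 0.0, math.cos(0.4))
n_b = (math.sin(1.3) * math.cos(0.5), math.sin(1.3) * math.sin(0.5), math.cos(1.3))
a, b = qubit_observable(n_a), qubit_observable(n_b)
psi = [math.cos(0.3), cmath.exp(0.7j) * math.sin(0.3)]

for r in (+1, -1):
    for s in (+1, -1):
        print(r, s, joint_direct(a, b, psi, r, s), joint_formula(a, b, psi, r, s))
print("C(a,b) =", correlator(a, b, psi), " C(b,a) =", correlator(b, a, psi))
```

The joint probabilities themselves change when Alice and Bob swap roles, while the correlator does not; for a qubit it reduces to the dot product of the two measurement directions.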

2.3 The CHSH scenario. In the CHSH scenario, Alice and Bob both have an independent choice between two observables. While Alice can select either the observable $a_1$ or the observable $a_2$, Bob has the freedom to measure either $b_1$ or $b_2$. For each of the resulting four choices, we obtain a distribution of joint probabilities of the form (1). We will use the notation
$$P(r,s|k,l) \tag{4}$$
to denote the probability that Alice gets the outcome $r \in \{-1,+1\}$ and Bob gets the outcome $s \in \{-1,+1\}$, given that Alice measures $a_k$ and Bob measures $b_l$. Finally, we will use the notation $C_{kl}$ for the correlator between $a_k$ and $b_l$. As announced in paragraph 2.1, it will now be proven that any set of probabilities (4) has a quantum-mechanical representation in terms of generalized measurements (Lüders measurements) for Alice, under the assumption that these probabilities satisfy causality in the sense that there is no backward signaling from Bob to Alice. This intuitive condition means that the joint probabilities can be factorized as
$$P(r,s|k,l) = P_B(s|r;k,l)\, P_A(r|k) \tag{5}$$
where $P_A(r|k)$ designates the outcome probabilities for Alice's measurement alone, and these are assumed to be independent of Bob's data $l$ and $s$. On the other hand, Bob's conditional outcome probabilities $P_B(s|r;k,l)$ may well depend on Alice's data in an arbitrary way. Condition (5) is necessary for the existence of a representation via generalized measurements, since the product representation (5) is essentially how one would typically calculate the joint probabilities starting from the quantum-mechanical data: first determine Alice's outcome probabilities $P_A(r|k)$ given the initial state $|\psi\rangle$, then calculate Bob's outcome probabilities $P_B(s|r;k,l)$, and finally multiply these two probabilities to obtain the desired result. Bob's probabilities $P_B(s|r;k,l)$ depend on the system's quantum state after Alice's measurement, and this state in turn is determined by $k$, $r$ and $|\psi\rangle$.
Conversely, in order to find a quantum-mechanical representation for an arbitrary such set of probabilities, consider a five-dimensional Hilbert space with orthonormal basis $|0\rangle, |1^+\rangle, |1^-\rangle, |2^+\rangle, |2^-\rangle$. We take the initial state of the system to be $|\psi\rangle = |0\rangle$. There exist generalized measurements such that the state after Alice's measurement is $|1^+\rangle$ if she measured $a_1$ and obtained a $+1$ outcome, and it is $|1^-\rangle$ if she measured $a_1$ and obtained a $-1$ outcome, and similarly for $|2^+\rangle$ and $|2^-\rangle$. Concretely, one can implement such measurements for example by using Kraus operators of the form
$$V_k^r = \sqrt{P_A(r|k)}\;|k^r\rangle\langle 0| + \ldots$$
as describing the measurement of $a_k$. The first term guarantees that the post-measurement state of $V_k^r$ is the desired $|k^r\rangle$ and that the given measurement statistics are reproduced, both on the initial state $|\psi\rangle = |0\rangle$. (The other terms are merely needed for satisfaction of the completeness relation.) Bob's measurements $b_1$ and $b_2$ are represented by POVMs; since Bob's post-measurement state does not get observed, we do not have to specify any Kraus operators implementing these POVMs. By construction, these POVMs reproduce the desired outcome probabilities $P_B(s|r;k,l)$ on the corresponding states $|k^r\rangle$. This ends the construction of a quantum-mechanical model with generalized measurements for (5). Some final remarks: since neither the initial state nor any post-measurement state is a superposition of basis states, this construction effectively yields a classical stochastic system. The trick in the construction is that Alice's post-measurement state keeps track of both her measurement setting and her outcome. This conditional state collapse to mutually orthogonal states would not be possible if we only allowed projective measurements for Alice.
2.4 Temporal hidden variable models. Using the assumptions of what they called "macroscopic realism" and "non-invasiveness", Leggett and Garg [LG85] derived an inequality which is satisfied by temporal correlations in hidden variable models, but violated by certain temporal quantum correlations. Macroscopic realism is the assumption that the system is, at each instant in time, definitely situated in one of several distinct states. This system state determines all measurement outcomes exactly; in this sense, all observables possess preexisting definite values. This is thought to apply to macroscopic objects in particular, hence the name "macroscopic realism", or more succinctly "macrorealism". The crucial assumption now is non-invasiveness: this postulates that a measurement does not disturb the state of the system. There is an additional hidden assumption which has been made explicit and dubbed "induction" by Leggett [Leg08]: it is understood that the state of the system at time $t$ is sufficient information to calculate the outcomes of all future measurements. (In other words, causality only propagates forward in time.) All of these assumptions seem rather natural when dealing with macroscopic systems. In a manner analogous to the spatial case, one can now use these premises to derive (see [Leg08], compare [BTCV]) the temporal CHSH inequality:
$$C_{11} + C_{12} + C_{21} - C_{22} \le 2. \tag{6}$$
On the other hand, it is known that this inequality can be violated by certain quantum correlations [BTCV]. This is an exciting area due to promising prospects of using such results for testing the applicability of quantum theory in the macroscopic domain [PLMN+10]. We will get back to hidden variable models in section 6.
2.5 Comparison to the spatial scenario. In general, the non-invasiveness assumption for hidden variable models is the exact analogue of locality in the spatial case. In both cases, the distribution of joint measurement outcomes is a probabilistic combination (i.e. a convex combination) of a collection of realistic models; a realistic model in turn is described by a hidden variable $\lambda$, constant over space and time, which determines all the outcomes of all possible measurements in a definite way. Therefore, there is absolutely no difference between local hidden variable models in spatial scenarios and non-invasive hidden variable models in temporal scenarios. So the reason that one considers inequalities characterizing hidden variable models for temporal scenarios which are different from those in the spatial case is not that the hidden variable models are different; they are the same. The reason is that the quantum-mechanical correlations are very different and strongly depend on whether one considers a spatial scenario or a temporal scenario. Although the Leggett-Garg inequality is perfectly valid as a spatial Bell inequality in a three-party scenario, it is not interesting in this case: since there is only one observable per party, no quantum violations are possible, and likewise no violations by more general no-signaling theories.
Let us also note that any set of joint outcome probabilities for a spatial Bell test can also appear in the temporal scenario. Mathematically, this follows from the fact that we recover exactly the spatial joint probabilities by taking a and b in (1) to operate on separate tensor factors. Physically, this is clear since we can just think of Alice's and Bob's spatially separated quantum systems as a single quantum system, and then simply imagine that Alice conducts her measurement first, with Bob's measurement operating at a later time.
To end the comparison with spatial scenarios, let us recast (3) in the following form:
Proposition 2.1. While a spatial correlator is given by the expectation value of the tensor product of the observables, a temporal correlator is given by half the expectation value of the anticommutator of the observables:
$$C_{\mathrm{spatial}} = \langle\psi|\,a \otimes b\,|\psi\rangle, \qquad C_{\mathrm{temporal}} = \frac12\,\langle\psi|\{a,b\}|\psi\rangle.$$
The qubit case and beyond. As a first example of temporal quantum correlations, we consider a single qubit in the Bloch sphere picture. This case has also been treated in [BTCV].
Let the system have an initial state given in terms of the Bloch vector $\vec v$. A dichotomic observable is described by a unit vector $\vec a \in \mathbb{R}^3$, such that the probability for getting the outcome $r \in \{-1,+1\}$ on the state $\vec v$ is given by
$$\frac12\,(1 + r\,\vec a \cdot \vec v). \tag{7}$$
In case the outcome $r$ has been observed, the state has collapsed to $r\vec a$.
The dynamics of the qubit between $t_A$ and $t_B$ in this representation is specified by a rotation matrix $R \in SO(3)$, such that the state prior to Bob's measurement is $R(r\vec a) = r R(\vec a)$. Then given that Alice obtained the outcome $r$, the probability for Bob to get the outcome $s$ is consequently
$$\frac12\,(1 + rs\,\vec b \cdot R(\vec a)). \tag{8}$$
After multiplying the two expressions (7) and (8) to get the joint probability and summing over $r$ and $s$ with the appropriate sign, the correlator explicitly reads, according to the definition (2),
$$C = \vec b \cdot R(\vec a).$$
So remarkably, this correlator does not depend on the initial state, which is due to the collapse after Alice has measured, and the structure of the correlator as a particular linear combination of joint probabilities. This correlator is very similar to the correlator known from maximally entangled two-qubit states, and therefore we can now find the maximal qubit value using simple techniques. The CHSH quantity then reads:
$$S_{\mathrm{qubit}} = R(\vec a_1) \cdot (\vec b_1 + \vec b_2) + R(\vec a_2) \cdot (\vec b_1 - \vec b_2).$$
For finding its maximum, note that since the vectors $\vec b_l$ are normalized, the vectors in the brackets are orthogonal. Moreover, $|\vec b_1 + \vec b_2|^2 + |\vec b_1 - \vec b_2|^2 = 4$, and so we can introduce two new orthogonal normalized vectors $\vec b_+$ and $\vec b_-$ such that $\vec b_1 + \vec b_2 = 2\cos\alpha\,\vec b_+$ and $\vec b_1 - \vec b_2 = 2\sin\alpha\,\vec b_-$ for some angle $\alpha$. Plugging this into the expression for $S_{\mathrm{qubit}}$ and optimizing over the $R(\vec a_i)$, which are also normalized vectors, gives $2\cos\alpha + 2\sin\alpha$; maximizing over $\alpha$ yields the Tsirelson bound of $2\sqrt{2}$, which is therefore the maximal value achievable with a qubit. In particular, this violates the bound (6), confirming that quantum theory cannot be equivalent to a probabilistic hidden variable theory with preexisting values for all observables and repeatable measurements.
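The optimization just described can be illustrated with a few lines of code. The sketch below takes $R$ to be the identity (i.e. $H = 0$), uses the qubit correlator $C_{kl} = \vec a_k \cdot \vec b_l$, and evaluates the CHSH quantity for settings obtained from the construction with $\alpha = \pi/4$. The coarse scan over coplanar settings is only a plausibility check and not a proof of the bound; all function names are assumptions of this sketch.

```python
import math
from itertools import product

def dot(u, v):
    """Euclidean inner product of two Bloch vectors."""
    return sum(x * y for x, y in zip(u, v))

def chsh(a1, a2, b1, b2):
    """S = C11 + C12 + C21 - C22 with C_kl = a_k . b_l (taking R = identity)."""
    return dot(a1, b1) + dot(a1, b2) + dot(a2, b1) - dot(a2, b2)

def in_plane(theta):
    """Unit Bloch vector in the x-z plane at angle theta from the z axis."""
    return (math.sin(theta), 0.0, math.cos(theta))

# Settings from the construction in the text with alpha = pi/4:
# b1 + b2 is parallel to a1 and b1 - b2 is parallel to a2.
a1, a2 = in_plane(0.0), in_plane(math.pi / 2)
b1, b2 = in_plane(math.pi / 4), in_plane(-math.pi / 4)
s_opt = chsh(a1, a2, b1, b2)

# Coarse scan over coplanar settings; a sanity check only.
angles = [2 * math.pi * j / 16 for j in range(16)]
s_scan = max(chsh(*(in_plane(t) for t in ts)) for ts in product(angles, repeat=4))

print("S for the chosen settings:", s_opt)
print("largest S found in scan:  ", s_scan)
print("2*sqrt(2):                ", 2 * math.sqrt(2))
```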
All the concrete examples of temporal quantum correlations which we will consider in the following sections are modelled on qubits. So here let us quickly demonstrate that not all quantum correlations in the temporal CHSH scenario can arise from qubit data. Consider a qutrit system with orthonormal basis $\{|0\rangle, |1\rangle, |2\rangle\}$, and the following prescriptions:
• the initial state is $|\psi\rangle = |0\rangle$,
• $a_1$ measures whether the system is in the state $|0\rangle + |1\rangle$,
• $a_2$ measures whether the system is in the state $|0\rangle + |2\rangle$,
• $b_1$ measures whether the system is in the state $|2\rangle$,
• $b_2$ is any dichotomic observable.
This system has the following properties: Alice's outcomes both have probability 1/2, independent of whether she chooses $a_1$ or $a_2$. But her choice drastically affects Bob's prospects upon measuring $b_1$: when Alice chooses $a_1$, he will definitely observe a $-1$ outcome; however, when Alice chooses $a_2$, his outcome will be uniformly random and independent of hers. Such behavior is impossible in a qubit system: one would necessarily need to have $b_1 = -\mathbb{1}$, otherwise Bob's outcome could not be definite after Alice's non-trivial measurement of $a_1$. But then obviously his outcome would also have to be a definite $-1$ when Alice measured $a_2$, which it is not allowed to be. It would be interesting to try and turn this into a dimension witness in the sense of [BPA+08].
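The stated properties of the qutrit example can be checked directly. In the sketch below, the states $|0\rangle + |1\rangle$ and $|0\rangle + |2\rangle$ are normalized by $1/\sqrt{2}$, and "measures whether the system is in the state $|\phi\rangle$" is read as the dichotomic observable whose $+1$ projector is $|\phi\rangle\langle\phi|$; this reading and the helper names are assumptions of the sketch.

```python
import math

def rank_one(v):
    """Projector |v><v| for a real unit vector v."""
    return [[x * y for y in v] for x in v]

def complement(P):
    """Projector 1 - P onto the orthogonal complement."""
    n = len(P)
    return [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)]
            for i in range(n)]

def apply(M, v):
    """Matrix-vector product."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in M]

def joint(plus_a, plus_b, psi):
    """P(r, s) = || Q_s P_r psi ||^2, given the +1 projectors of a and b."""
    table = {}
    for r, A in ((+1, plus_a), (-1, complement(plus_a))):
        for s, B in ((+1, plus_b), (-1, complement(plus_b))):
            w = apply(B, apply(A, psi))
            table[(r, s)] = sum(x * x for x in w)
    return table

h = 1 / math.sqrt(2)
psi = [1.0, 0.0, 0.0]                   # initial state |0>
a1_plus = rank_one([h, h, 0.0])         # (|0> + |1>)/sqrt(2)
a2_plus = rank_one([h, 0.0, h])         # (|0> + |2>)/sqrt(2)
b1_plus = rank_one([0.0, 0.0, 1.0])     # |2>

p1 = joint(a1_plus, b1_plus, psi)       # Alice measures a1, Bob measures b1
p2 = joint(a2_plus, b1_plus, psi)       # Alice measures a2, Bob measures b1
print("a1 then b1:", p1)
print("a2 then b1:", p2)
```

After $a_1$, both of Alice's post-measurement states are orthogonal to $|2\rangle$, so Bob's $+1$ outcome has probability zero; after $a_2$, all four joint outcomes have probability 1/4.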

Correlator space and the Tsirelson bound
We may ask whether the temporal correlators satisfy the Tsirelson bound generally, or whether this just holds for the case of a qubit system. From the qubit case we know that the Tsirelson bound can be attained; but a priori, some temporal quantum correlations might be so strong that even the Tsirelson inequality is violated. What we mean here by correlator space is the set of quadruples $(C_{11}, C_{12}, C_{21}, C_{22})$ which can appear as correlators between Alice's and Bob's measurements in a quantum-mechanical world. Recall that the correlators are defined as
$$C_{kl} = \sum_{r,s} rs\, P(r,s|k,l)$$
so that there is a linear map from probability space down to correlator space. Obviously, taking the projection of a point from probability space down to correlator space throws away some data, so specifying the four correlators is not sufficient for knowing the full set of joint probabilities. Yet the correlators contain precious information about the system, for example the maximal violation of the CHSH inequality, and they are also related to the possibility of producing PR-box behavior (see section 4). For the remaining part of this section, we will consider the scenario in which Alice has a choice between $m \in \mathbb{N}$ dichotomic observables, while Bob has a choice between $n \in \mathbb{N}$ dichotomic observables. Even in this generality, it is not hard to use the techniques of Tsirelson for showing that, in correlator space, the temporal quantum region coincides with the spatial quantum region. Tsirelson has proven in his paper [Tsi85] that the following three statements are equivalent, for any given matrix of correlators $(C_{kl})_{k=1,\ldots,m}^{l=1,\ldots,n}$:
(a) There exists a $C^*$-algebra $\mathcal{A}$ with identity, hermitian elements $a_1, \ldots, a_m, b_1, \ldots, b_n \in \mathcal{A}$ and a state $f$ on $\mathcal{A}$ such that for any $k, l$, we have
$$a_k b_l = b_l a_k; \quad -\mathbb{1} \le a_k \le \mathbb{1}; \quad -\mathbb{1} \le b_l \le \mathbb{1}; \quad f(a_k b_l) = C_{kl}.$$
(b) There exist Hilbert spaces $\mathcal{H}_a$ and $\mathcal{H}_b$ together with Hermitian operators $a_1, \ldots, a_m \in \mathcal{B}(\mathcal{H}_a)$, $b_1, \ldots, b_n \in \mathcal{B}(\mathcal{H}_b)$ and a density matrix $\rho$ on $\mathcal{H}_a \otimes \mathcal{H}_b$ such that
$$a_k^2 = \mathbb{1}; \quad b_l^2 = \mathbb{1}; \quad \mathrm{tr}\big(\rho\,(a_k \otimes b_l)\big) = C_{kl}.$$
(c) In the Euclidean space of dimension $\min(m,n)$, there exist vectors $x_1, \ldots, x_m, y_1, \ldots, y_n$ such that
$$|x_k| \le 1; \quad |y_l| \le 1; \quad \langle x_k, y_l\rangle = C_{kl} \quad \forall k, l.$$
Proposition 3.1. These conditions are also equivalent to the following two:
(a') There exists a $C^*$-algebra $\mathcal{A}$ with identity, hermitian elements $a_1, \ldots, a_m, b_1, \ldots, b_n \in \mathcal{A}$ and a state $f$ on $\mathcal{A}$ such that for any $k, l$ we have
$$-\mathbb{1} \le a_k \le \mathbb{1}; \quad -\mathbb{1} \le b_l \le \mathbb{1}; \quad f\big(\tfrac12\{a_k, b_l\}\big) = C_{kl}.$$
(b') There exists a Hilbert space $\mathcal{H}$ together with Hermitian operators $a_1, \ldots, a_m, b_1, \ldots, b_n \in \mathcal{B}(\mathcal{H})$ and a density matrix $\rho$ on $\mathcal{H}$ such that
$$a_k^2 = \mathbb{1}; \quad b_l^2 = \mathbb{1}; \quad \mathrm{tr}\big(\rho\,\tfrac12\{a_k, b_l\}\big) = C_{kl}.$$
Proof. Given the data as in (b), it is clear that they also satisfy (b') if we take $\mathcal{H} = \mathcal{H}_a \otimes \mathcal{H}_b$ and rename $a_k \otimes \mathbb{1}$ to $a_k$ and $\mathbb{1} \otimes b_l$ to $b_l$.

The implication (b')⇒(a') easily follows by choosing $\mathcal{A} = \mathcal{B}(\mathcal{H})$ and $f(x) = \mathrm{tr}(\rho x)$.
To close the circle of implications, we will now check that (a')⇒(c). But this works in exactly the same way as Tsirelson's own proof [Tsi85] that (a)⇒(c): start with the finite-dimensional vector space defined to be the $\mathbb{R}$-linear span of the $a_k$ and the $b_l$. This vector space carries an inner product, possibly degenerate, which is defined as
$$\langle x, y\rangle \equiv f\big(\tfrac12\{x, y^*\}\big) = \mathrm{Re}\, f(y^* x).$$
After quotienting out the null space, this inner product becomes positive definite and produces a Euclidean space such that
$$\langle a_k, b_l\rangle = C_{kl}$$
as required. Now just as in [Tsi85], all the requirements of (c) are satisfied, except that the dimension of the space has to be at most $\min(m,n)$. This can also be easily achieved by orthogonal projection of the vectors $x_k \equiv a_k$ onto the subspace spanned by the vectors $y_l \equiv b_l$, or the other way around.
By (3), we have therefore proven the following result:
Theorem 3.2. A matrix of correlators $(C_{kl})_{k=1,\ldots,m}^{l=1,\ldots,n}$ can appear as temporal correlations between dichotomic projective measurements on the same system if and only if it can appear as spatial correlations between dichotomic measurements on two spatially separated entangled systems.
In particular, this implies that the Tsirelson bound (10) is indeed generally valid in our temporal setting.

Impossibility of PR-box correlations
We say that a PR-box correlation is a set of joint probabilities $P(r,s|k,l)$ which has the property that the outcomes $r$ and $s$ differ if and only if $k = l = 2$. This property is equivalent to the requirement that the four correlators (3) are given by
$$C_{kl} = (-1)^{(k-1)(l-1)}, \quad\text{i.e.}\quad C_{11} = C_{12} = C_{21} = +1, \quad C_{22} = -1.$$
Correlations of this form could be used e.g. to achieve optimal better-than-quantum performance in two-party XOR games (see e.g. [CHTW04]). When the joint probabilities $P(r,s|k,l)$ are assumed to be no-signaling, then this requirement actually fixes all values for the probabilities uniquely; however, this does not apply here, as our temporal scenario allows signaling from Alice to Bob. Starting from (3), we now determine when a correlator $C_{kl}$ can have a value of $\pm 1$, which is equivalent to
$$\langle\psi|a_k b_l|\psi\rangle + \langle\psi|b_l a_k|\psi\rangle = \pm 2.$$
But now since the absolute value of each term is $\le 1$, and becomes 1 if and only if $|\psi\rangle$ is an eigenstate of the respective operator, it follows that PR-box behavior requires $|\psi\rangle$ to be an eigenstate of the following form:
$$b_l a_k|\psi\rangle = a_k b_l|\psi\rangle = (-1)^{(k-1)(l-1)}|\psi\rangle.$$
But these equations, together with $a_k^2 = b_l^2 = \mathbb{1}$, imply
$$|\psi\rangle = a_2 b_1|\psi\rangle = a_2 b_1 (b_1 a_1)|\psi\rangle = a_2 a_1|\psi\rangle = a_2 a_1 (a_1 b_2)|\psi\rangle = a_2 b_2|\psi\rangle = -|\psi\rangle,$$
which is impossible for any $|\psi\rangle \ne 0$. Therefore, PR-box behavior is impossible even for the temporal quantum correlations which we consider here. We could also have concluded this from theorem 3.2.

Strength of signaling
In our Bell-test scenario, the backward no-signaling equations
$$P(r,-1|k,1) + P(r,+1|k,1) = P(r,-1|k,2) + P(r,+1|k,2) \quad \forall r, k$$
are still true: the marginal probability governing Alice's measurement cannot possibly depend on the measurement setting of Bob. However, the forward no-signaling equations
$$P(-1,s|1,l) + P(+1,s|1,l) = P(-1,s|2,l) + P(+1,s|2,l) \quad \forall s, l \tag{13}$$
are typically violated, since the choice of measurement for Alice influences the system state after her measurement, and therefore changes the outcome probabilities for Bob. Effectively, what Bob sees is not exactly the initial state $|\psi\rangle$, but $|\psi\rangle$ after undergoing decoherence due to Alice's measurement.
It is an interesting question to ask how much the no-signaling equations (13) can be violated by our quantum-mechanical setup. This is why we want to look at the deviations from (13) and determine how large they can possibly be in quantum theory. Since each of these four possible quantities involves only one fixed measurement setting $l$ of Bob, we will disregard Bob's choice for the rest of this section, and assume that he simply measures any dichotomic observable $b$. The joint probabilities we then consider are of the form $P(r,s|k)$. Then the two signaling quantities are
$$S_+ \equiv P(+1,+1|1) + P(-1,+1|1) - P(+1,+1|2) - P(-1,+1|2),$$
$$S_- \equiv P(+1,-1|1) + P(-1,-1|1) - P(+1,-1|2) - P(-1,-1|2).$$
Due to the total outcome probability for each choice of measurement being 1, it necessarily holds that $S_+ + S_- = 0$, independent of whether the system is quantum or not. Therefore the interesting question now is: which values of $S_+$ are achievable by quantum mechanics? This is what we are going to answer here. A priori, $S_+$ can be expected to attain all the values in the interval $[-1,+1]$. The extreme values of $-1$ and $+1$ correspond to perfect signaling in the sense that Bob can definitely tell which measurement Alice had chosen. This can be interpreted as a classical communication channel with a capacity of 1 bit.
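In quantum theory, however, summing (1) over Alice's outcome $r$ gives Bob's marginal probability $P(+1,+1|k) + P(-1,+1|k) = \frac12 + \frac14\langle\psi|b|\psi\rangle + \frac14\langle\psi|a_k b a_k|\psi\rangle$, so that
$$|S_+| = \frac14\,\big|\langle\psi|a_1 b a_1|\psi\rangle - \langle\psi|a_2 b a_2|\psi\rangle\big| \le \frac14\,\Big(\big|\langle\psi|a_1 b a_1|\psi\rangle\big| + \big|\langle\psi|a_2 b a_2|\psi\rangle\big|\Big).$$
Each term within the absolute value brackets in turn can be bounded by 1, since it is the expectation value of a ±1-valued observable, so that the bound $|S_+| \le 1/2$ follows. Conversely, since the set of allowed values for $S_+$ needs to be convex, it is sufficient to show that the values $+1/2$ and $-1/2$ can be attained. For attaining the value $+1/2$, we can choose the qubit protocol (16), where a direct calculation shows that this indeed has the required property.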
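The following sketch evaluates $S_+$ for a qubit, using the product of (7) and (8) with $R$ the identity as the joint probability. The concrete choice below (initial Bloch vector and $b$ along $z$, $a_1$ along $z$, $a_2$ along $x$) is an assumption of this sketch: it is one choice that gives $E(s|1) = 1$ and $E(s|2) = 0$, and it is not claimed to coincide with the paper's protocol (16). The random sampling only illustrates the bound $|S_+| \le 1/2$ on qubits and does not prove it.

```python
import math
import random

def dot(u, v):
    """Euclidean inner product of two Bloch vectors."""
    return sum(x * y for x, y in zip(u, v))

def joint(v, a, b, r, s):
    """P(r, s): product of (7) and (8) with R = identity, for unit vectors v, a, b."""
    return 0.5 * (1 + r * dot(a, v)) * 0.5 * (1 + r * s * dot(b, a))

def bob_marginal(v, a, b, s):
    """Bob's probability for outcome s after Alice has measured a."""
    return sum(joint(v, a, b, r, s) for r in (+1, -1))

def s_plus(v, a1, a2, b):
    """Signaling quantity S_+ = P(s=+1 | a1) - P(s=+1 | a2)."""
    return bob_marginal(v, a1, b, +1) - bob_marginal(v, a2, b, +1)

def bob_expectation(v, a, b):
    """E(s | Alice measured a)."""
    return bob_marginal(v, a, b, +1) - bob_marginal(v, a, b, -1)

def random_unit(rng):
    """Random unit vector in R^3 (normalized Gaussian sample)."""
    while True:
        w = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-9:
            return [x / norm for x in w]

# Hypothetical illustrative choice (not necessarily the paper's protocol (16)).
z, x = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
value = s_plus(z, z, x, z)
e1, e2 = bob_expectation(z, z, z), bob_expectation(z, x, z)

rng = random.Random(0)
largest = max(
    abs(s_plus(random_unit(rng), random_unit(rng), random_unit(rng), random_unit(rng)))
    for _ in range(5000)
)
print("S_+ =", value, " E(s|1) =", e1, " E(s|2) =", e2)
print("largest |S_+| among random qubit settings:", largest)
```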
As was already mentioned briefly, we may also consider the signaling strength in terms of the information which Bob's measurement outcome contains about Alice's choice of setting. This is encoded in the two probabilities $P(s|k) \equiv P(-1,s|k) + P(+1,s|k)$, which define a classical communication channel from the input alphabet $k \in \{1,2\}$ to the output alphabet $s \in \{-1,+1\}$. Since Bob's outcome is only dichotomic, we can equivalently consider the expectation value of his measurement, and the question then is: which pairs $(E(s|1), E(s|2))$ can occur quantum-mechanically, and how does this bound the classical capacity with which Alice can use her measurements in order to send information to Bob? The answer to this question is given in the following theorem: a pair $(E(s|1), E(s|2))$ can occur quantum-mechanically if and only if $|E(s|1) - E(s|2)| \le 1$. The maximal communication capacity is $\log_2(5/4) \approx 0.32$ bits, which can be achieved using the qubit protocol (16).
This result is illustrated in figure 1.
Proof. Since $E(s|k) = 2P(+1|k) - 1$, we have $E(s|1) - E(s|2) = 2S_+$, and the constraint $|E(s|1) - E(s|2)| \le 1$ immediately follows from $|S_+| \le 1/2$. On the other hand, the qubit protocol (16) achieves $E(s|1) = 1$, $E(s|2) = 0$, which is one of the four non-trivial vertices of the convex polygon shown in figure 1. The other three vertices can be attained by the same protocol after possibly switching $s \leftrightarrow -s$ and $a_1 \leftrightarrow a_2$. Now since the quantum region has to be convex, and the polygon is the smallest convex set containing its vertices, it follows that $|E(s|1) - E(s|2)| \le 1$ is also sufficient for the existence of a quantum-mechanical model. Now we get to the capacity statement. Since classical communication capacity is a convex function of the transition probabilities, we know that the maximal capacity is attained at the polygon's vertices. Since the four non-trivial vertices are all simple permutations of the protocol (16), the corresponding channels have equal capacity, and it is sufficient to calculate the capacity achievable by the data (16). A direct calculation shows that the optimal input distribution is a relative frequency of 3/5 for $a_1$ and 2/5 for $a_2$, resulting in a mutual information of $\log_2(5/4) \approx 0.32$ bits.
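The capacity value can be reproduced with a short calculation. The channel at the vertex $E(s|1) = 1$, $E(s|2) = 0$ maps the input $a_1$ to $s = +1$ with certainty and the input $a_2$ to a uniformly random $s$. The sketch below computes the mutual information as a function of the probability $q$ of choosing $a_1$ and maximizes it on a grid; the grid search and the function names are choices of this sketch, not of the paper.

```python
import math

def h2(p):
    """Binary entropy in bits, with the convention 0 * log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information(q):
    """I(k; s) in bits when a_1 is chosen with probability q.

    Channel: a_1 gives s = +1 with certainty, a_2 gives a uniformly random s.
    """
    p_plus = q + (1 - q) / 2          # Bob's total probability for s = +1
    h_output = h2(p_plus)             # H(s)
    h_conditional = (1 - q) * 1.0     # H(s|k): one full bit of noise for input a_2
    return h_output - h_conditional

grid = [j / 1000 for j in range(1001)]
best_q = max(grid, key=mutual_information)
capacity = mutual_information(best_q)

print("optimal frequency of a_1:", best_q)
print("capacity estimate (bits):", capacity)
print("log2(5/4):               ", math.log2(5 / 4))
```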

A temporal version of Hardy's nonlocality paradox
Hardy's paradox [Mer94] occurs when the joint probabilities have the following properties:
$$\begin{aligned} P(+1,+1|1,1) &= 0 \\ P(-1,+1|1,2) &= 0 \\ P(+1,-1|2,1) &= 0 \\ P(+1,+1|2,2) &> 0 \end{aligned} \tag{18}$$
This is impossible in any realistic theory where Alice's measurements are non-invasive. We note that the only relevant information contained in hidden variables lies in the preexisting values of all relevant observables. Hence any (stochastic) hidden variable model is given by a statistical mixture of the 16 realistic states
$$a_1^{\pm}\, a_2^{\pm}\, b_1^{\pm}\, b_2^{\pm} \tag{19}$$
where in this notation (from [Lap06]), each sign stands for the corresponding measurement outcome it determines with certainty, and the four signs are independent of each other. By the assumption $P(+1,+1|2,2) > 0$, we know that this statistical mixture contains at least one state of the form $a_1^{\pm}\, a_2^{+}\, b_1^{\pm}\, b_2^{+}$.
But now due to $P(-1,+1|1,2) = 0$, this cannot be one of the two states $a_1^{-}\, a_2^{+}\, b_1^{\pm}\, b_2^{+}$. Likewise by $P(+1,-1|2,1) = 0$, it cannot be one of the two states $a_1^{\pm}\, a_2^{+}\, b_1^{-}\, b_2^{+}$. Therefore, the statistical mixture of realistic states necessarily contains the state $a_1^{+}\, a_2^{+}\, b_1^{+}\, b_2^{+}$, but this contradicts the assumption $P(+1,+1|1,1) = 0$. Therefore, the existence of joint probabilities with the property (18) exhibits a rather strong form of contextuality. Note that this kind of reasoning applies to a spatial as well as to a temporal Bell test scenario. In fact, (18) is indeed realizable in quantum theory, and it is known that the maximal value for $P(+1,+1|2,2)$ in a spatial scenario is approximately 0.09 [Mer94]. Here we would like to determine the maximal possible value of $P(+1,+1|2,2)$ in the temporal CHSH scenario. Again, since joint probabilities for the temporal case comprise those of the spatial case, the maximal temporally realizable value of $P(+1,+1|2,2)$ has to be at least 0.09. We will now proceed to show that one can achieve a substantially higher value than this. This shows again that temporal quantum correlations are often stronger than spatial quantum correlations.
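The hidden variable argument above is a finite case analysis and can be replayed by brute force. The sketch below enumerates the 16 realistic states, discards those that would give non-zero weight to one of the three vanishing probabilities in Hardy's conditions, and checks that no surviving state has $a_2 = +1$ and $b_2 = +1$. The function name and the tuple encoding $(a_1, a_2, b_1, b_2)$ are assumptions of this sketch.

```python
from itertools import product

def surviving_states():
    """Realistic states (a1, a2, b1, b2) compatible with the three zero conditions."""
    kept = []
    for a1, a2, b1, b2 in product((+1, -1), repeat=4):
        if a1 == +1 and b1 == +1:
            continue  # would contribute to P(+1,+1|1,1), which must vanish
        if a1 == -1 and b2 == +1:
            continue  # would contribute to P(-1,+1|1,2), which must vanish
        if a2 == +1 and b1 == -1:
            continue  # would contribute to P(+1,-1|2,1), which must vanish
        kept.append((a1, a2, b1, b2))
    return kept

states = surviving_states()
hardy_states = [st for st in states if st[1] == +1 and st[3] == +1]

print("surviving realistic states:", states)
print("states contributing to P(+1,+1|2,2):", hardy_states)
```

Since the list of contributing states is empty, any mixture of realistic states obeying the three zero conditions has $P(+1,+1|2,2) = 0$.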

Proof.
In order for a probability $P(r,s|k,l)$ like (1) to vanish, one needs that
• either Alice's outcome $r$ by itself is already impossible to occur, i.e. the other outcome $-r$ occurs with certainty. This means that the initial state is a $(-r)$-eigenstate of $a_k$, $a_k|\psi\rangle = -r|\psi\rangle$.