Three-dimensionality of space and the quantum bit: an information-theoretic approach

It is sometimes pointed out as a curiosity that the state space of quantum two-level systems, i.e. the qubit, and actual physical space are both three-dimensional and Euclidean. In this paper, we suggest an information-theoretic analysis of this relationship, by proving a particular mathematical result: suppose that physics takes place in d spatial dimensions, and that some events happen probabilistically (not assuming quantum theory in any way). Furthermore, suppose there are systems that carry "minimal amounts of direction information", interacting via some continuous reversible time evolution. We prove that this uniquely determines spatial dimension d = 3 and quantum theory on two qubits (including entanglement and unitary time evolution), and that it allows observers to infer local spatial geometry from probability measurements.


I. INTRODUCTION
The fact that the state space of quantum two-level systems (the Bloch ball) and physical space are both three-dimensional and Euclidean has been regarded as a remarkable coincidence for many years, provoking interesting ideas and lines of research. Building on this observation, von Weizsäcker [1,2] constructed his "ur theory" as an attempt to derive spacetime from quantum mechanics. Similarly, Penrose's twistor theory [3] was built on the idea that the geometry of physical and quantum state space are fundamentally related, which was elaborated further by Wootters [4], who pointed out the relation between quantum state distinguishability and geometry.
The idea that the quantum bit state space and physical space are somehow logically intertwined has become a widespread paradigm, cf. [5]. But what is the exact relationship: which one of the two determines the other? Could a similarly nice relationship also exist in other dimensions, or is there something special about d = 3?
The goal of this paper is to offer a particular information-theoretic analysis of these questions: we show that a certain natural interplay between geometry and probability is only possible if space has three dimensions, and if outcome probabilities of measurements are exactly as predicted by quantum theory. This result suggests exploring the idea that neither quantum theory nor spacetime is separately fundamental, but that both might have a common information-theoretic origin.
Our approach rests on some natural background assumptions. Suppose that physics takes place in d spatial dimensions (and one time dimension), and some of the physical processes involve probabilities. That is, there exist experiments with random outcomes: we can imagine that physicists, or nature, prepare physical systems in certain states, and later on, measurements on the systems reveal outcomes with certain probabilities. We do not assume that those probabilities are necessarily described by quantum theory.
Then we consider the situation depicted in Fig. 1: we have two agents, traditionally called Alice and Bob. Alice's goal is to send some spatial direction (that is, a unit vector $x \in \mathbb{R}^d$) to Bob, by encoding it into some state ω(x) and sending a physical system that carries this state. We assume that they do not share a common coordinate system, such that Alice cannot simply send a classical description of x. Bob holds a measurement device that can be rotated in space, which he can apply to the state that he received, obtaining one of finitely many possible outcomes. The outcome probabilities depend continuously on the device's spatial orientation. Furthermore, suppose that the following four postulates are satisfied:
1. Alice can encode any spatial direction $x \in \mathbb{R}^d$ into some state such that Bob is able to retrieve x in the limit of many copies.
FIG. 1: The situation considered in this paper. Bob holds a macroscopic measurement device that he can rotate in d-dimensional space; its orientation in space is thus described by a unit vector ("direction") y ∈ R d . Alice's goal is to send a spatial direction x ∈ R d , |x| = 1, to Bob, by encoding it into a suitable state ω(x). After obtaining the state, Bob measures it with his device, obtaining one of several possible measurement outcomes with some probability (indicated by a flashing lightbulb in the picture). After obtaining many identical copies of ω(x), and measuring it in many different directions y ∈ R d , Bob is supposed to estimate Alice's direction x, such that his guess becomes arbitrarily close to Alice's actual choice in the limit of infinitely many copies. We assume that Alice and Bob have agreed on an arbitrary protocol beforehand, but they do not share a common coordinate system, such that Alice cannot simply send a classical description of x.
like information causality [6], where postulates of impossibility of certain information-theoretic tasks are exploited to derive properties of physical theories. These approaches also have successful historical examples, like the postulate of impossibility to build a perpetuum mobile of the second kind in thermodynamics. The approach in this paper may be interpreted as the application of novel mathematical tools to the old question of the relation between geometry and probability. These tools have their origin in the recent wave of axiomatizations of quantum theory [7][8][9][10][11][12], in particular in Hardy's seminal work [8], and are inspired by recent work on quantum reference frames [13][14][15][16][17].
The first part of this paper consists of an introduction to one of these tools, which is the framework of convex state spaces, generalizing quantum theory in a natural way. Then, the first two postulates will be defined in more detail, and will be used to derive the state space of a single system. Finally, joint state spaces and the remaining two postulates will be discussed in detail, yielding our main result. Throughout the paper, only the main ideas and proof sketches are given; the full proofs are deferred to the appendix.
Consider the simple physical setup in Fig. 2. We have a preparation device, which, whenever it is operated, generates an instance of a physical system (for example, a particle). We assume that we can operate the preparation device as often as we want (say, by pressing a button on the device, or by waiting until a periodic physical process has completed another cycle). In the end, the system can be measured, by applying one of several possible measurement devices with a finite number of outcomes.
The intuition is that the device prepares the system in a certain fixed state ω; operating the preparation device several times produces many independent copies of ω. To define exactly what we mean by that, consider any fixed measurement device M. If M is applied to the preparation device's output, we assume that we get one of k different measurement outcomes probabilistically, where k ∈ N is an arbitrary natural number (in Fig. 2, we have k = 2, represented by the two dots). The probability to obtain the i-th outcome (where 1 ≤ i ≤ k) is denoted $M^{(i)}(\omega)$, such that $\sum_{i=1}^{k} M^{(i)}(\omega) = 1$.
Suppose we have two devices, both preparing the same type of physical system, but in two different states ϕ and ω. Then we can use them to build a new device that performs a random preparation: it prepares state ω with probability p, and state ϕ with probability 1 − p. The resulting state will be denoted pω + (1 − p)ϕ. This is a convex combination of ω and ϕ. If we apply measurement M to that state, we will get the i-th outcome with probability $M^{(i)}\big(p\omega + (1-p)\phi\big) = p\,M^{(i)}(\omega) + (1-p)\,M^{(i)}(\phi)$ by the basic laws of probability theory.
In summary, we see that states ω of some physical system are elements of a real affine space which we denote by some capital letter, say A; single measurement outcomes are described by affine maps $M^{(i)} : A \to \mathbb{R}$ which yield values between 0 and 1 for every state. Maps of this kind will be called effects. Full measurements are described by a collection of effects $\{M^{(i)}\}_{i=1}^{k}$ that sum to unity if applied to any state. The set of all possible states of the corresponding physical system will be denoted $\Omega_A$, the state space. It is a bounded subset of A. We have just seen that $\omega \in \Omega_A$ and $\phi \in \Omega_A$ imply that $p\omega + (1-p)\phi \in \Omega_A$ for all 0 ≤ p ≤ 1; this means that $\Omega_A$ is convex. We will only consider finite-dimensional state spaces in this paper. Since outcome probabilities can only ever be determined to finite precision, we may (and will) assume that $\Omega_A$ is topologically closed.
As a simple example, consider a physical system which resembles a classical bit, or coin. We can perform a measurement by looking whether the coin shows heads or tails; think of a two-outcome device which yields the first outcome if the coin shows heads, and the second otherwise. The possible states are then characterized by the probability p ∈ [0, 1] of obtaining heads. The state space becomes a line segment, with all states being probabilistic mixtures of two pure states that yield either heads or tails deterministically, see Fig. 3a).
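The affine behaviour of effects under random preparations can be checked on the coin example with a few lines of code. The following is only an illustrative sketch: the helper names `mix` and `affine_effect` and the numerical values are hypothetical choices, not part of the framework itself. Exact rational arithmetic is used so that the identity holds without rounding.

```python
from fractions import Fraction

def mix(p, omega, phi):
    """Convex combination p*omega + (1-p)*phi of two state vectors."""
    return [p * w + (1 - p) * f for w, f in zip(omega, phi)]

def affine_effect(weights, offset):
    """Build an affine map: state -> offset + <weights, state>."""
    def effect(state):
        return offset + sum(w * s for w, s in zip(weights, state))
    return effect

# Classical coin: the state is the single number p_heads (illustrative parametrization).
heads = affine_effect([Fraction(1)], Fraction(0))    # M^(1)(omega) = p_heads
tails = affine_effect([Fraction(-1)], Fraction(1))   # M^(2)(omega) = 1 - p_heads

omega = [Fraction(9, 10)]   # coin biased towards heads
phi = [Fraction(1, 5)]      # coin biased towards tails
p = Fraction(1, 4)          # probability of preparing omega
mixed = mix(p, omega, phi)

# Probability of "heads" on the mixture versus the mixture of probabilities.
lhs = heads(mixed)
rhs = p * heads(omega) + (1 - p) * heads(phi)
print(lhs, rhs, heads(mixed) + tails(mixed))
```

Both sides agree, and the two effects sum to one on every state, as required of a full measurement.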
FIG. 3: Examples of convex state spaces: a) is a classical bit, b) and c) are classical 3- and 4-level systems, d) is a quantum bit, e), f) and g) are neither classical nor quantum, even though e) can naturally be embedded in a qubit. Note that quantum n-level systems for n ≥ 3 are not balls [36].
The state spaces of a classical three- and four-level system are also shown in Fig. 3, b) and c): they are an equilateral triangle and a tetrahedron, respectively. In general, the state space of a classical n-level system is the set of all probability distributions $(p_1, \ldots, p_n)$, which is an (n−1)-dimensional simplex.
Quantum state spaces look quite different. Quantum bits, the states of spin-1/2 particles, can be described by 2 × 2 complex density matrices ρ. These can always be written in the form ρ = (1 + r · σ)/2, where r is an ordinary real vector in R 3 with | r| ≤ 1, and σ = (σ x , σ y , σ z ) denotes the Pauli matrices [37]. We can consider r = (r x , r y , r z ) as the state of the qubit. Thus, the state space is a three-dimensional unit ball as shown in Fig. 3d). A spin measurement in the z-direction may be described by the two effects M (1) ( r) = (1 + r z )/2 and M (2) ( r) = (1 − r z )/2, for example, where the two outcomes correspond to "spin up" and "spin down", respectively.
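As a small numerical illustration of the Bloch representation, the sketch below builds ρ = (1 + r · σ)/2 from a Bloch vector, recovers r via $r_k = \mathrm{tr}(\rho\,\sigma_k)$, and compares the z-measurement effects with the diagonal entries of ρ. The function names and the example vector are hypothetical; only the Python standard library is used.

```python
def density_from_bloch(r):
    """rho = (1 + r . sigma) / 2 as a 2x2 nested list of complex numbers."""
    rx, ry, rz = r
    return [
        [complex((1 + rz) / 2), complex(rx, -ry) / 2],
        [complex(rx, ry) / 2, complex((1 - rz) / 2)],
    ]

def bloch_from_density(rho):
    """Recover the Bloch vector via r_k = tr(rho sigma_k)."""
    rx = (rho[0][1] + rho[1][0]).real
    ry = (1j * (rho[0][1] - rho[1][0])).real
    rz = (rho[0][0] - rho[1][1]).real
    return (rx, ry, rz)

def spin_z_effects(r):
    """The two effects of a spin measurement in z-direction: (spin up, spin down)."""
    return ((1 + r[2]) / 2, (1 - r[2]) / 2)

r = (0.3, -0.4, 0.5)               # example Bloch vector with |r| < 1
rho = density_from_bloch(r)
r_back = bloch_from_density(rho)
p_up, p_down = spin_z_effects(r)
print(r_back, p_up, p_down)
```

The probabilities for "spin up" and "spin down" coincide with the diagonal entries of ρ in the z-basis.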
However, the state space of a quantum n-level system is only a ball for n = 2; for n ≥ 3, quantum state spaces are not balls, but intricate compact convex sets of dimension $n^2 - 1$ [36,39].
Given any state space $\Omega_A$, all effects, i.e. affine maps $M : A \to \mathbb{R}$ with M(ω) ∈ [0, 1], describe outcomes of conceivable measurement devices. We can work out the set of these maps from a description of $\Omega_A$. In general, some of these measurements might be physically impossible to implement; in order to describe a physical system, we have to specify which ones are possible and which ones are not. From the effects, we can construct expectation values of observables, simply called observables in the following. These are arbitrary affine maps $h : A \to \mathbb{R}$; in quantum theory, they are maps of the form $\rho \mapsto \mathrm{tr}(\rho H)$, where $H = H^\dagger$ is any self-adjoint matrix. One way to measure an observable (on many copies of a state) is to write it as a linear combination of effects, $h = \sum_i h_i M_i$ with $h_i \in \mathbb{R}$, and to measure the effects $M_i$ on different copies (in general, they may not be jointly measurable on a single copy and thus be outcomes of different measurement devices).
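For the qubit, the decomposition of an observable into effects can be made explicit. The sketch below assumes an observable of the form H = h0·1 + n · σ, whose expectation value on a state with Bloch vector r is h0 + n · r, and rewrites it as a linear combination of the six spin effects along the coordinate axes. The effects along x, y and z belong to different measurement devices, so they would be estimated on different copies. All names are hypothetical.

```python
def effect(axis, sign, r):
    """Probability of outcome `sign` (+1 or -1) for a spin measurement along a coordinate axis."""
    return (1 + sign * r[axis]) / 2

def observable_direct(h0, n, r):
    """Expectation value h0 + n . r of H = h0*1 + n . sigma on Bloch vector r."""
    return h0 + sum(nk * rk for nk, rk in zip(n, r))

def observable_from_effects(h0, n, r):
    """Same expectation value, written as a linear combination of effects."""
    # The constant part uses the unit effect M_{+z} + M_{-z} = 1.
    total = h0 * (effect(2, +1, r) + effect(2, -1, r))
    # Each component n_k multiplies the difference M_{+k} - M_{-k} = r_k.
    for k in range(3):
        total += n[k] * (effect(k, +1, r) - effect(k, -1, r))
    return total

r = (0.3, -0.4, 0.5)
h0, n = 0.25, (1.0, -2.0, 0.5)
direct = observable_direct(h0, n, r)
combined = observable_from_effects(h0, n, r)
print(direct, combined)
```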
Similarly, we can describe reversible transformations of a physical system: these are physical processes that take a state to another state, and may be inverted by another physical process (in quantum theory, these are the unitaries, mapping ρ to $U\rho U^\dagger$). Since they must respect probabilistic mixtures, they must also be affine maps. Due to reversibility, they map the state space $\Omega_A$ onto itself; that is, they are symmetries of the state space. The set of reversible transformations on A is a closed subgroup $G_A$ of all symmetries of $\Omega_A$.

III. SINGLE SYSTEMS: POSTULATES 1 AND 2
We consider a particular situation where measurements take place in d-dimensional space, with one time dimension. For simplicity, we assume that there is a fixed flat background space, such that there is a unique way to transport vectors from one laboratory A to another distant laboratory B (however, we think that our results may apply to more general situations). We will also assume that all physical operations considered in the following, such as measurements, are performed locally in a way such that all parties (particles, measurement devices etc.) are relative to each other at rest [72]. Thus, we do not have to consider conceivable relativistic effects.
In general, there may be many different kinds of physical systems described by convex state spaces. We now assume that there exists a particular type of physical system which, in a sense to be made precise, behaves like a "unit of direction information". We will call these systems "direction bits" (later on, we show that they are effectively 2-level systems, therefore "bits", cf. Lemma 19 in the appendix). We will not specify by what type of physical object they are carried -a direction bit could, for example, correspond to the internal degrees of freedom of a particle, or it could be something completely different. We will only assume that a direction bit may come in different states (matching the framework described above), with a state space denoted Ω d .

FIG. 4: We assume that direction bits can be measured by some macroscopic measurement device, which yields one of several outcomes i ∈ {1, . . . , k} probabilistically. Due to symmetry, its modus operandi depends only on a vector $y \in \mathbb{R}^d$, |y| = 1, specifying its "direction" in the local laboratory frame. The probability $M^{(i)}_y(\omega)$ to obtain the i-th outcome depends only on the direction bit state ω and continuously on the direction y. The device can be rotated in space according to any rotation R ∈ SO(d). In the rotated reference frame of the device, this corresponds to a reversible transformation on the direction bit.
We assume that direction bits can be measured by a certain type of measurement device with a finite number of outcomes. As shown in Fig. 4, we imagine that the device is implemented as a macroscopic, massive object which can be rotated arbitrarily, i.e. can be subjected to any SO(d) rotation. Due to some symmetry of the device, its orientation in space (locally in the lab) may be described by a unit vector y ∈ R d , |y| = 1, choosing some arbitrary but fixed coordinate system in the local laboratory. Instead of naively thinking of the whole device as "pointing in direction y", we may also think that one of the device's components is a vectorial physical quantity which determines the type of measurement that is performed. A standard example in three dimensions is given by a Stern-Gerlach device, where y is the direction of inhomogeneity of a magnetic field.
The case d = 1 is special, because SO(1) = {1} is trivial, and thus no one-dimensional rotation can map the unit vector +1 ∈ R 1 to the unit vector −1 ∈ R 1 . In order to allow Bob to collimate his device in all directions also in d = 1, we will thus silently replace SO(1) by O(1) = {1, −1} in all of the following.
Since the measurement which is performed by the device may depend on its direction y in space, it is denoted M y . In the following, by a "direction", we shall always mean a unit vector in R d . For obvious physical reasons, we assume that the outcome probabilities M (i) y (ω) are continuous in the direction y.
Physically, we assume that we can apply a rotation R ∈ SO(d) to the measurement device without touching the direction bit; this transforms $M_y$ to $M_{Ry}$, but leaves the bit's state ω invariant. The fact that the outcome probabilities are altered, from $M^{(i)}_y(\omega)$ to $M^{(i)}_{Ry}(\omega)$, should be understood as a result of the change in the relative orientation of the bit and the device. Thus, even though direction bits are considered as informational "black boxes" with arbitrary physical realization, we are forced to adopt the interpretation that direction bits carry actual physical geometrical orientation.
This enforces a certain duality that is familiar from quantum mechanics. Suppose that, after rotating the measurement device by R, we do not perform the measurement, but instead rotate the joint system of direction bit and measurement device back by R −1 . If it is physically unclear how to do this in practice, we can just imagine performing a passive coordinate transformation.
Since this transformation does not change the relative direction of the system and measurement apparatus, it does not alter the outcome probabilities. However, by changing to the new coordinate system, $M_{Ry}$ has been transformed back to $M_y$, hence the direction bit state must have changed from ω to some other state ω′ such that $M^{(i)}_y(\omega') = M^{(i)}_{Ry}(\omega)$. The state transformation ω → ω′ can be physically undone (by rotating the joint system again by R), hence it must be an element of the group of reversible transformations on $\Omega_d$. We call it $G_{R^{-1}}$, such that we can switch from the "Heisenberg" to the "Schrödinger" picture via
$$M^{(i)}_{Ry}(\omega) = M^{(i)}_y\big(G_{R^{-1}}\,\omega\big).$$
Clearly $G_R \circ G_S = G_{RS}$; in other words, the map $R \mapsto G_R$ is a group representation of SO(d) on the direction bit state space.
Now suppose we have a situation where two agents (Alice and Bob) reside in distant laboratories as depicted in Fig. 1. Imagine that Alice holds an actual physical vector $x \in \mathbb{R}^d$ (all vectors and rotations will be denoted with respect to Alice's local coordinate system in the following), and she would like to send this geometric information to Bob. Since Alice and Bob have never met, they have never agreed on a common coordinate system. Thus, it is useless for Bob if Alice sends him a classical description of x, because he does not know what coordinate system the description is referring to.
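The duality between rotating the device and transforming the state can be illustrated numerically. The sketch below works in d = 2 and assumes, purely for illustration, a ball-shaped state space on which $G_R$ acts as the rotation R itself and on which the effect has the form c + (a/2)⟨y, ω⟩; neither assumption has been derived at this point of the argument, and all names are hypothetical.

```python
import math

def rotation(theta):
    """2x2 rotation matrix for angle theta (an element of SO(2))."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def apply(R, v):
    """Apply a 2x2 matrix to a 2-vector."""
    return (R[0][0] * v[0] + R[0][1] * v[1], R[1][0] * v[0] + R[1][1] * v[1])

def effect(y, omega, c=0.5, a=1.0):
    """Assumed effect M_y(omega) = c + (a/2) <y, omega> on a ball state space."""
    return c + 0.5 * a * (y[0] * omega[0] + y[1] * omega[1])

theta, second_angle = 0.7, -1.9
R, R_inv, S = rotation(theta), rotation(-theta), rotation(second_angle)
y = (1.0, 0.0)          # device direction
omega = (0.2, -0.5)     # state vector inside the unit disc

# "Heisenberg" picture: rotate the device. "Schroedinger" picture: rotate the state back.
heisenberg = effect(apply(R, y), omega)
schroedinger = effect(y, apply(R_inv, omega))

# Representation property G_R G_S = G_{RS}.
composed = apply(R, apply(S, omega))
direct = apply(rotation(theta + second_angle), omega)
print(heisenberg, schroedinger)
```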
However, if Bob holds a measurement device as in Fig. 4, Alice can send him a direction bit in some state ω. As usual in information theory (taking into account the statistical definition of states), we analyze the properties of a single state ω by considering many identical copies of that state. So suppose Alice sends many independent copies of ω to Bob. On every copy, Bob can measure in a different direction, and he may find that some outcome probabilities are varying over the different directions y ∈ R d , |y| = 1. This breaks rotational symmetry, and so may be used by Alice to send physical direction information to Bob.
However, Alice cannot send information about the length of the vector x to Bob, if we assume that Bob can only rotate his device (as in Fig. 4) and not more. Thus, restricting to the situation in Fig. 1, we state that Alice's task is to send a direction vector x ∈ R d , |x| = 1, to Bob, by encoding it into some state.

Postulate 1 (Encoding).
There is a protocol (as in Fig. 1) which allows Alice to encode all spatial directions x ∈ R d , |x| = 1, into states ω(x) ∈ Ω d , such that Bob is able to retrieve x in the limit of many copies.
Denote the coordinates of some vector $x \in \mathbb{R}^d$ in Bob's local coordinate system by $x_B$. Then we stipulate that after obtaining n copies of ω(x), Bob makes a guess $x_B^{(n)}$ of $x_B$ (based on his previous measurement outcomes) such that $x_B^{(n)} \to x_B$ for n → ∞ with probability one. For obvious physical reasons, we assume that Alice's encoding $x \mapsto \omega(x)$ is continuous [73]. Moreover, Bob measures each direction bit individually and only once (we may imagine that direction bits get destroyed upon measurement [74]).
In principle, direction bits might carry further additional information that can be read out in measurements. As a naive example, the physical system that Alice transmits could be a simple wristwatch, with the watch hand pointing in the direction that Alice is intending to send. However, a wristwatch is hardly "economical" for this task: it carries a large amount of additional information, like the details of its head shape etc. Our second postulate says that direction bits should be "minimal" in their ability to carry directional information: any attempt to encode further information can only succeed at the expense of losing some of the directional information.
Postulate 2 (Minimality). No protocol allows Alice to encode any further information into the state without adding noise to the directional information.
To spell out the mathematical details, we need to define what it means that one state ϕ is noisier in its directional information than another state ω. First, by directional information of ϕ, we mean the probability functions $M^{(i)}_z(\phi)$ as seen by the direction bit measurement device. If we have two states ϕ, ω with directional probabilities related by a rotation, i.e. $M^{(i)}_z(\phi) = M^{(i)}_{Rz}(\omega)$ for some rotation R ∈ SO(d), all directions z and all i, we argue that both states are equally noisy in this respect: they both contain the same "amount of asymmetry", just pointing in different directions.
We could additionally say that ϕ and ω are equally noisy if H(ϕ) = H(ω) for some entropy measure H; however, there is no unique entropy definition for arbitrary state spaces [41][42][43], and entropy measures acquire meaning only relative to certain operationally defined tasks, which is a complication we want to avoid. Therefore, we define ϕ to be at least as noisy in its directional information as ω if its directional probabilities are statistical mixtures of those of ω and other states that are equally noisy as ω; that is, if there are statistical weights $\lambda_j > 0$, $\sum_j \lambda_j = 1$, and rotations $R_j \in SO(d)$ such that for all outcomes i and all directions z,
$$M^{(i)}_z(\phi) = \sum_j \lambda_j\, M^{(i)}_{R_j z}(\omega). \qquad (1)$$
Clearly, ϕ is noisier than ω in its directional information if it is at least as noisy, and at the same time not equally noisy as ω. In Definition 8 and following in Appendix A, we show that this notion is a natural generalization of the majorization relation [37] from classical probability theory and quantum theory. It also has a natural interpretation in terms of resource theories [14,38]: for any given ω, the probability functions $z \mapsto M^{(i)}_z(\omega)$, or rather their directional asymmetry, constitute a resource for Bob. One resource is less useful (that is, more noisy) than the other if it can be obtained from the other by "free" operations; in this case, by tossing a coin and performing a random rotation.
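The "free" operation of tossing a coin and applying a random rotation can be illustrated in a toy model. The sketch below again assumes, for illustration only, a two-dimensional ball state space with effect c + (a/2)⟨z, ω⟩ on which $G_R$ acts as R. Under these assumptions, the mixture $\sum_j \lambda_j G_{R_j^{-1}} \omega$ reproduces the directional probabilities on the right-hand side of eq. (1), and its vector is shorter than that of ω. Weights, angles and names are hypothetical.

```python
import math

def rotate(theta, v):
    """Rotate the 2-vector v by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def norm(v):
    return math.hypot(v[0], v[1])

def effect(z, omega, c=0.5, a=1.0):
    """Assumed effect M_z(omega) = c + (a/2) <z, omega>."""
    return c + 0.5 * a * (z[0] * omega[0] + z[1] * omega[1])

def mix_rotated(weights, angles, omega):
    """State vector of sum_j lambda_j G_{R_j^{-1}} omega (coin toss plus random rotation)."""
    parts = [rotate(-t, omega) for t in angles]
    return (sum(w * p[0] for w, p in zip(weights, parts)),
            sum(w * p[1] for w, p in zip(weights, parts)))

weights = [0.5, 0.3, 0.2]      # statistical weights lambda_j, summing to one
angles = [0.0, 1.0, -2.0]      # rotation angles of R_j
omega = (0.9, 0.0)
phi = mix_rotated(weights, angles, omega)

# Check eq. (1) for one test direction z.
z = (math.cos(0.4), math.sin(0.4))
lhs = effect(z, phi)
rhs = sum(w * effect(rotate(t, z), omega) for w, t in zip(weights, angles))
print(norm(phi), norm(omega), lhs, rhs)
```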
Suppose we had a protocol that satisfied Postulate 1, and two states ϕ ≠ ω that would work as a possible encoding of some direction x, in the sense that Bob would in both cases decode direction x in the limit of obtaining infinitely many copies. Then, by choosing to send either ϕ or ω, Alice could send an additional classical bit to Bob. Postulate 2 says that this is only possible at the expense of adding noise; that is, one of the two states must be noisier in its directional information than the other.
Our goal is to determine the shape of the convex state space of a direction bit, using only Postulates 1 and 2 and the physical background assumptions (Postulates 3 and 4 will be considered later). To this end, suppose Alice encodes some direction x according to some protocol into a state ω(x) and sends many copies of it to Bob. If the protocol satisfies Postulates 1 and 2, Bob will be able to decode x. Now suppose that Bob secretly performed a rotation R ∈ SO(d) on his laboratory before the protocol started. Since the protocol must work regardless of the relative orientation of Alice and Bob, Bob will still succeed in obtaining an accurate estimate of x as before.
As we have seen, applying R to a measurement device can be replaced by applying $G_{R^{-1}}$ to the direction bit state. Therefore, the following implementation will also allow Bob to guess x:
• Apply $G_{R^{-1}}$ to every incoming direction bit; measure as in the protocol above.
• After obtaining n copies, determine the guess $x^{(n)}$ given by the protocol above.
• To compensate for the missing lab rotation, output the guess $R\,x^{(n)}$.
Suppose that R is in the stabilizer subgroup of x, i.e. Rx = x. Then the lines above prove that the original protocol also works if Alice sends the state $G_{R^{-1}}(\omega(x))$ to Bob instead of ω(x). But these states are equally noisy in their directional information, hence Postulate 2 implies that they are equal. In other words, we have shown the following: For any encoding $x \mapsto \omega(x)$, the state ω(x) is invariant with respect to all rotations that leave x invariant.
Thus, for every i, the probability $M^{(i)}_y(\omega(x))$ is a function of the first component of y only, in coordinates where x is the first basis vector. As we show in Lemma 12 in Appendix A, this has the following consequence: there is at least one measurement outcome (call it $i_0$) and one direction y such that $M^{(i_0)}_y(\omega(x)) > M^{(i_0)}_{-y}(\omega(x))$; otherwise, the state ω(x) would be "too symmetric" to allow the transmission of direction information, and Postulate 1 would be violated.
Fix this $i_0$. For every direction y that satisfies the inequality above (there might be more than one), we define a new state ω′(y) by averaging over the stabilizer group [44]:
$$\omega'(y) := \int_{\{R \in SO(d)\,:\, Ry = y\}} G_R\,\omega(x)\, dR.$$
From now on, we are only interested in the outcome $i_0$, and use the abbreviation $M_z := M^{(i_0)}_z$. Furthermore, for all states ω and directions z, we set
$$L_z(\omega) := M_z(\omega) - M_{-z}(\omega).$$
In particular, we obtain $L_y(\omega'(y)) = L_y(\omega(x)) > 0$. As we prove in Lemma 13 in Appendix A, if y is chosen in a clever way, then this is the maximal possible value: there is a particular choice of y such that the map $z \mapsto L_z(\omega'(y))$ attains its unique global maximum at z = y.
This property allows us to construct a new protocol for Alice and Bob to transmit direction information. Fix this particular choice of y, and ω′(y). For all other directions z ≠ y, define ω′(z) by rotating ω′(y) appropriately, i.e. $\omega'(z) := G_S\,\omega'(y)$, where S ∈ SO(d) is any rotation with Sy = z. The new protocol works as follows:
• Alice encodes some direction x into the state ω′(x) and sends many copies of it to Bob.
• By measuring the received copies, Bob determines a good estimate of the function $f(z) := M_z(\omega'(x))$. Bob's guess is the vector z for which $f(z) - f(-z)$ is maximal.
Remarkably, from an arbitrary protocol to transmit direction information, we have constructed a standard protocol. This involves a difference $L_z(\omega) = M_z(\omega) - M_{-z}(\omega)$, which has a striking similarity to the spin-1/2 angular momentum expectation value in quantum mechanics [75]: this expression is the expectation value of a random variable which assigns "+1" to direction z, and "−1" to direction (−z).
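The standard protocol can be simulated in a toy model. The sketch below takes d = 2 and assumes, for illustration only, that the "yes" probability has the form f(z) = c + (a/2) cos(angle between z and x); Bob scans a finite grid of device directions, estimates f(z) and f(−z) from relative frequencies, and outputs the grid direction with the largest difference. Grid size, shot number, seed and all names are hypothetical choices.

```python
import math
import random

def yes_probability(z_angle, x_angle, c=0.5, a=0.8):
    """Assumed model f(z) = c + (a/2) cos(z_angle - x_angle), d = 2."""
    return c + 0.5 * a * math.cos(z_angle - x_angle)

def estimate_frequency(p, shots, rng):
    """Relative frequency of 'yes' in `shots` trials; the exact value p if shots is None."""
    if shots is None:
        return p
    return sum(rng.random() < p for _ in range(shots)) / shots

def bob_guess(x_angle, grid_size=36, shots=None, seed=0):
    """Return the grid angle z that maximizes the estimate of f(z) - f(-z)."""
    rng = random.Random(seed)
    best_angle, best_score = None, -math.inf
    for k in range(grid_size):
        z = 2 * math.pi * k / grid_size
        f_plus = estimate_frequency(yes_probability(z, x_angle), shots, rng)
        f_minus = estimate_frequency(yes_probability(z + math.pi, x_angle), shots, rng)
        score = f_plus - f_minus
        if score > best_score:
            best_angle, best_score = z, score
    return best_angle

def angular_error(alpha, beta):
    """Absolute angle difference, wrapped to [0, pi]."""
    return abs((alpha - beta + math.pi) % (2 * math.pi) - math.pi)

x_angle = 2 * math.pi * 5 / 36                    # Alice's direction (on the grid)
exact_guess = bob_guess(x_angle)                  # infinitely many copies
sampled_guess = bob_guess(x_angle, shots=2000)    # finitely many copies per direction
print(angular_error(exact_guess, x_angle), angular_error(sampled_guess, x_angle))
```

With exact probabilities the guess coincides with Alice's direction; with finitely many copies it is only close, and the precision is additionally limited by the grid spacing.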
Since the new protocol allows Alice to transmit arbitrary spatial directions to Bob, it must satisfy Postulate 2. Thus, if we have two states ω and ϕ that satisfy $L_y(\omega) > L_z(\omega)$ and $L_y(\phi) > L_z(\phi)$ for all z ≠ y, they must either be equal, or one must be strictly noisier than the other, as in eq. (1) (exchanging the names ω ↔ ϕ if it is the other way round). As we prove in Lemma 7 in Appendix A, this equation implies for the states themselves that $\phi = \sum_j \lambda_j\, G_{R_j^{-1}}\,\omega$, as states turn out to be uniquely determined by their directional information. Since both ω and ϕ can be used as codewords for direction y in our standard protocol, our intermediate result above implies that $G_R\,\omega = \omega$ and $G_R\,\phi = \phi$ for all R ∈ SO(d) with Ry = y.
Suppose that furthermore the maxima agree, i.e. $L_y(\omega) = L_y(\phi) =: M$ holds. Then
$$M = L_y(\phi) = \sum_j \lambda_j\, L_{R_j y}(\omega) \le \sum_j \lambda_j\, L_y(\omega) = M.$$
This is only possible if $L_{R_j y}(\omega) = M$ for all j, and thus $R_j y = y$ by construction. But then $G_{R_j}\omega = \omega$ for all j as just mentioned, and applying $G_{R_j^{-1}}$ to both sides and substituting into the relation between ϕ and ω proves that ϕ = ω. Thus, if two states encode the same direction in our standard protocol, with the same maximal value of L, they must agree. This property will now be used to determine the state space of a direction bit.
From now on, x and y will denote arbitrary directions, disregarding the special choices above. Call any state ω with the property that $L_x(\omega) > L_y(\omega)$ for all y ≠ x a codeword for x. The codewords ω′(x) constructed above are in general not the "optimal" ones for the standard protocol: we might be able to find "better" ones, ω″(x), with $L_x(\omega''(x)) > L_x(\omega'(x))$. The previous inequality can be interpreted as saying that the ω″(x) let Bob determine x more quickly than the codewords ω′(x) in the standard protocol above, because the difference in probabilities is larger and statistically visible after transmission of fewer direction bits.
As we show in Lemma 15 in Appendix A, there is an "optimal" set of codewords which we call $\omega_x$, with the property that $L_x(\omega_x) \ge L_x(\omega''(x))$ for all other codewords ω″(x). The codewords for different directions are related by rotations: if y = Rx for R ∈ SO(d), then $\omega_y = G_R\,\omega_x$. Furthermore, there is a constant 0 < a ≤ 1 such that $L_y(\omega_y) = a$ for all y; we call a the direction bit's visibility parameter.
Given $\omega_x$, we can define a "uniform noise" state which we call the maximally mixed state µ:
$$\mu := \int_{SO(d)} G_R\,\omega_x\, dR.$$
Since all $\omega_y$ are related by rotations, µ is independent of the initial choice of x. As this is an integral over the invariant Haar measure, there is a constant c ∈ (0, 1) such that $M_y(\mu) = c$ for all y. We call c the direction bit's noise parameter.
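The averaging that defines µ can be mimicked numerically in d = 2, where the Haar measure on SO(2) is the uniform measure on rotation angles. The sketch below assumes, for illustration only, that $G_R$ acts on state vectors as the rotation R and that effects have the form c + (a/2)⟨y, ω⟩; for vectors, the average over N ≥ 2 equally spaced angles then coincides with the integral. Parameter values and names are hypothetical.

```python
import math

def rotate(theta, v):
    """Rotate the 2-vector v by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def haar_average_so2(omega, samples=360):
    """Average of G_R omega over `samples` equally spaced rotations in SO(2)."""
    sx = sy = 0.0
    for k in range(samples):
        vx, vy = rotate(2 * math.pi * k / samples, omega)
        sx += vx
        sy += vy
    return (sx / samples, sy / samples)

def effect(y, omega, c=0.4, a=0.6):
    """Assumed effect M_y(omega) = c + (a/2) <y, omega>."""
    return c + 0.5 * a * (y[0] * omega[0] + y[1] * omega[1])

omega_x = (1.0, 0.0)
mu = haar_average_so2(omega_x)
mu_other = haar_average_so2((math.cos(1.0), math.sin(1.0)))  # different initial x

# M_y(mu) is the same constant c for every device direction y.
values = [effect((math.cos(t), math.sin(t)), mu) for t in (0.0, 0.9, 2.5, 4.0)]
print(mu, values)
```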
Now suppose ω is any state which is a codeword for some direction x. Then $\lambda := L_x(\omega)/a$ is in the interval (0, 1]. Thus, $\omega' := \lambda\,\omega_x + (1-\lambda)\,\mu$ is a valid state, and it is easy to see that it is also a codeword for x. But $L_x(\omega') = L_x(\omega)$ (because $L_x(\omega_x) = a$ and $L_x(\mu) = c - c = 0$), and so the intermediate result above implies that ω = ω′. Since every state can be approximated arbitrarily well by some codeword, we have proven that every state ω can be written in the form
$$\omega = \lambda\,\omega_x + (1-\lambda)\,\mu, \qquad 0 \le \lambda \le 1,$$
for some direction x.
We are free to reparametrize the state space $\Omega_d$ via some affine map $\varphi : \mathbb{R}^D \to \mathbb{R}^D$, where D is the dimension of $\Omega_d$: replacing states via $\omega \mapsto \hat\omega := \varphi(\omega)$, effects via $M \mapsto \hat M := M \circ \varphi^{-1}$ and transformations via $G \mapsto \hat G := \varphi \circ G \circ \varphi^{-1}$ does not change any probabilities or physical predictions. Basic group representation theory [44] tells us that we can choose φ such that the transformed group $\hat G$ acts linearly and contains only orthogonal matrices, and the transformed states $\hat\omega_x$ (for different x), being connected by reversible transformations, all have the same Euclidean norm 1. Moreover, the maximally mixed state $\hat\mu$, being invariant with respect to all transformations, becomes the zero vector.
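In the reparametrized picture, where the maximally mixed state is the zero vector and the optimal codewords are unit vectors, the decomposition of a state into codeword and noise becomes elementary. The following sketch (for d = 3, with hypothetical function names) reads off λ as the Euclidean norm of the state vector and the codeword as its normalized direction.

```python
import math

def decompose(v):
    """Split a state vector v (|v| <= 1) into (lam, omega_x) with v = lam*omega_x + (1-lam)*0."""
    lam = math.sqrt(sum(c * c for c in v))
    if lam == 0.0:
        # The maximally mixed state: any direction works with lam = 0.
        return 0.0, (1.0, 0.0, 0.0)
    return lam, tuple(c / lam for c in v)

def recombine(lam, omega_x, mu=(0.0, 0.0, 0.0)):
    """Convex mixture lam*omega_x + (1-lam)*mu."""
    return tuple(lam * w + (1 - lam) * m for w, m in zip(omega_x, mu))

v = (0.2, -0.1, 0.4)
lam, omega_x = decompose(v)
v_back = recombine(lam, omega_x)
print(lam, omega_x, v_back)
```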
Since all states $\hat\omega$ are convex mixtures of some $\hat\omega_x$ and $\hat\mu$, we obtain the situation depicted in Fig. 5: the transformed state space $\hat\Omega_d$ is a compact convex subset of the D-dimensional unit ball, with all $\hat\omega_x$ on the surface and $\hat\mu = 0$ in the center.
It is easy to see that the maximally mixed state $\hat\mu$ is in the interior of $\hat\Omega_d$, since it is a mixture of all pure states. Hence there is some ball of radius ε > 0 around $\hat\mu = 0$ which is fully contained in $\hat\Omega_d$. Thus, if $v \in \mathbb{R}^D$ is any unit vector, then εv/2 must be a valid state in $\hat\Omega_d$. As we have proven above, there is some 0 ≤ λ ≤ 1 and some direction $x \in \mathbb{R}^d$ such that $\varepsilon v/2 = \lambda\,\hat\omega_x + (1-\lambda)\,\hat\mu$. This is only possible if ε = 2λ and $v = \hat\omega_x$; in other words, $v \in \hat\Omega_d$. This proves that $\hat\Omega_d$ is the full D-dimensional unit ball. By construction, the map $x \mapsto \hat\omega_x$ is a homeomorphism from the unit sphere in $\mathbb{R}^d$ to the unit sphere in $\mathbb{R}^D$. This proves that D = d.
FIG. 5: After a reparametrization, we obtain that the direction bit state space $\hat\Omega_d$ is a compact convex subset of a unit ball. Since the maximally mixed state $\hat\mu$ is in the interior, there is some ε > 0 such that the state space contains a full ε-ball around the origin $\hat\mu = 0$. But we have proven that all states $\hat\omega$ are convex combinations of $\hat\mu$ and some state $\hat\omega_x$ with $|\hat\omega_x| = 1$, thus $\hat\omega_x$ must lie on the line starting at $\hat\mu = 0$ and crossing $\hat\omega$. Consequently, all points on the sphere must be contained in the state space, and we obtain the full unit ball. By dimension counting, it is d-dimensional.
This shows that a direction bit cannot be described by a classical probability distribution: it must carry a non-classical state space, exhibiting uncertainty relations among d independent, mutually complementary measurements. Probabilistic systems of this type, i.e. ball state spaces, have been studied before [45][46][47]. In quantum physics as we know it, there is only one kind of system with a ball state space: it is the qubit, a quantum 2-level state space. It is three-dimensional, which coincides with the spatial dimension, confirming the result we just proved. By classifying the affine maps from the ball to [0, 1], it is easy to check that we must have
$$M_x(\omega) = c + \frac{a}{2}\,\langle \hat\omega_x, \hat\omega \rangle.$$
In the familiar three-dimensional case, if c = 1/2 and a = 1, this describes a quantum spin measurement in direction x; if c ≠ 1/2 or a < 1, it is a noisy spin measurement.
To see why ball state spaces satisfy Postulate 2, note first that two states ϕ, ω, with corresponding "Bloch vectors" $\hat\phi$, $\hat\omega$ in the d-dimensional Euclidean unit ball, are equally noisy in their directional information if and only if $|\hat\phi| = |\hat\omega|$; similarly, ϕ is noisier than ω if and only if $|\hat\phi| < |\hat\omega|$ (in the case d = 3, where the state space is a qubit, this condition becomes $\mathrm{tr}(\phi^2) < \mathrm{tr}(\omega^2)$). For any protocol, and any spherical shell of fixed Bloch vector norm, there is a one-to-one correspondence of states in that shell and spatial directions that these states encode. Thus, if two different states encode the same direction, they must have different norms, and so one is noisier than the other. We say more about the different possible protocols in Lemma 21 in the appendix.
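For the qubit (d = 3), the relation between Bloch vector norm and purity, and the form of an ideal spin measurement, can be verified directly from density matrices. The sketch below uses the standard identities tr(ρ²) = (1 + |r|²)/2 and tr(ρ P_x) = (1 + ⟨x, r⟩)/2 with the projector P_x = (1 + x · σ)/2; the helper names and example vectors are hypothetical.

```python
def density_from_bloch(r):
    """rho = (1 + r . sigma) / 2 as a 2x2 nested list of complex numbers."""
    rx, ry, rz = r
    return [
        [complex((1 + rz) / 2), complex(rx, -ry) / 2],
        [complex(rx, ry) / 2, complex((1 - rz) / 2)],
    ]

def purity(rho):
    """tr(rho^2) for a Hermitian 2x2 matrix: the sum of |rho_ij|^2."""
    return sum(abs(rho[i][j]) ** 2 for i in range(2) for j in range(2))

def trace_of_product(A, B):
    """tr(A B) for 2x2 matrices."""
    return sum(A[i][j] * B[j][i] for i in range(2) for j in range(2))

def norm_squared(r):
    return sum(c * c for c in r)

omega_hat = (0.6, 0.0, 0.3)     # less noisy state (longer Bloch vector)
phi_hat = (0.1, 0.2, -0.2)      # noisier state (shorter Bloch vector)
rho_omega = density_from_bloch(omega_hat)
rho_phi = density_from_bloch(phi_hat)

# Ideal spin measurement in direction x, i.e. c = 1/2 and a = 1.
x = (0.0, 0.6, 0.8)
p_yes = trace_of_product(rho_omega, density_from_bloch(x)).real
expected = 0.5 + 0.5 * sum(xk * wk for xk, wk in zip(x, omega_hat))
print(purity(rho_phi), purity(rho_omega), p_yes)
```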

IV. FRAME BITS INSTEAD OF DIRECTION BITS?
Before we formulate Postulates 3 and 4 and prove more properties of direction bits, let us reconsider one basic assumption. As depicted in Fig. 4, we have assumed that the orientation of a measurement device in space is given by a direction vector, implicitly assuming some internal rotational symmetry of the device. What if we drop this assumption? In general, the orientation of a massive body in R d is given by an orthonormal frame, that is, by some oriented orthonormal basis that can be written in the form of an orthogonal matrix X ∈ SO(d), instead of a unit vector x ∈ R d . Thus, an interesting question is what happens if we repeat the calculations above, formulating analogues of the postulates for "frame bits" instead of direction bits.
While we have to leave the general answer open, we can give the answer in a particular special case. First, note that for direction bits as considered above, our calculations show that Alice and Bob can also apply the following protocol:
• Alice encodes spatial directions x ∈ R d into the particular states ω x .
• Bob holds a two-outcome measurement device, where the first outcome is described by the effect M y , with y the direction in which the device is pointing. Upon receiving (many copies of) some state ω, Bob looks for the spatial direction y in which M y (ω) is maximal, which will be his guess of Alice's direction x.
Effectively, the device that Bob holds asks the yes-no question "Is it this direction that Alice encoded?" The actual direction is the one in which the probability to obtain "yes" is maximal.
Let us now ask whether we can implement the analogous protocol for the case that Alice wants to send a frame X ∈ SO(d). The main idea is that X ∈ SO(d) is used to denote the spatial orientation of an orthonormal frame attached to an object, with the i-th orthonormal vector of the frame pointing in the i-th direction specified by X. The protocol is depicted in Fig. 6.
We formulate analogues of Postulates 1 and 2 (encoding and minimality) for this setup, and consider them only in the special case of this protocol. As we show in Appendix B, a calculation very similar to the one above proves that the "frame bit" state space must be generated by pure states ω Y , labelled now by orthogonal matrices Y ∈ SO(d) that are connected by rotations. In complete analogy to above, every state ω can be written in the form ω = λω Y + (1 − λ)µ, where 0 ≤ λ ≤ 1, µ is a maximally mixed state, and Y ∈ SO(d) some frame. Thus, the frame bit state space must also be a Euclidean ball of some dimension D.
At this point, we run into a topological problem: similarly as for direction bits, the map X → ω X turns out to be a homeomorphism, this time from SO(d) to the unit sphere on R D . Since SO(d) is not simply connected for d ≥ 2, but the unit sphere in R D is simply connected for D ≥ 3, this is only possible if D = 2 and thus (from dimension counting) d = 2. Thus, we have proven that there is no convex state space that allows implementation of this protocol, while satisfying analogues of Postulates 1 and 2 -unless d = 2, where a frame is the same as a spatial direction, and the setup reduces to the concept of the direction bit. (We will rule out d = 2 in Section VI, using two further postulates.)

V. SPATIAL GEOMETRY FROM PROBABILITY MEASUREMENTS
Before continuing our derivation, we take another slight detour by asking for the relationship between the geometry of physical space and state space.
As indicated in Fig. 4, our setting assumes that macroscopic objects can be physically rotated. The implicit assumption behind this is that local physical space in the laboratory is a vector space with a Euclidean structure, that is, with an inner product that determines the notion of a rotation as a linear orthogonal map and, at the same time, allows one to compute angles between vectors.
We assume that physical rotations R ∈ SO(d) have representations G R ∈ SO(d) ⊆ G A on the direction bit state space A. As we show in Lemma 23 in Appendix A, group theory dictates that the map R → G R is linear and, moreover, that it is of the form G R = ORO −1 for some orthogonal matrix O. Thus, there is automatically a correspondence between the vector space and Euclidean structures of state space and physical space. This has an interesting consequence: it implies that observers can measure physical angles by measuring probabilities. In other words: even if an observer has no meter stick to measure physical angles, she may infer physical angles from probabilities.
FIG. 6: The "frame bit" setup for the special case discussed in Section IV. Instead of a spatial direction, Alice's goal is now to send an orthonormal frame X ∈ SO(d) to Bob, by transmitting some state ω(X) of some arbitrary convex state space. Bob holds a macroscopic binary measurement device that can be rotated arbitrarily in space. In contrast to the situation in Fig. 1 for direction bits, Bob's measurement device does not possess any rotational symmetry, such that its working is specified by the orthonormal basis that defines its spatial orientation, Y ∈ SO(d). Alice and Bob have agreed on the protocol that Bob detects the frame Y ∈ SO(d) in which the "yes"-probability is maximal. This will be his guess for Alice's frame X ∈ SO(d). In contrast to the "direction bit" situation, we prove that there is no convex state space that allows this protocol while at the same time satisfying the analogue of Postulate 2, except for spatial dimension d = 2, where frames and directions coincide.

In Appendix C, we give a boot-strapped protocol that allows observers to determine angles from probability measurements on direction bits. This method generalizes the simple quantum-mechanical insight that electrons polarized spin-up in a fixed direction yield spin-up in another direction (at relative angle θ) with probability $\cos^2(\theta/2)$. This structural coincidence (which is in particular true for quantum theory) seems remarkable beyond the specific derivation in this paper. Clearly, in this work, we start with postulates that assume a Euclidean structure in physical space, and obtain the ball state space with its reversible rotation transformations as a consequence. It is then not very surprising, though mathematically not trivial, that observers can use this state space structure to obtain information on spatial angles.
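The spin-1/2 relation quoted in this section can be inverted to turn an observed frequency into an angle. The sketch below does only that; it is not the boot-strapped protocol of Appendix C, and the function names and the `counts_up`/`shots` parameters are hypothetical.

```python
import math

def spin_up_probability(theta):
    """Probability of spin-up along a direction at relative angle theta."""
    return math.cos(theta / 2.0) ** 2

def angle_from_probability(p):
    """Invert p = cos^2(theta/2) for theta in [0, pi]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return 2.0 * math.acos(math.sqrt(p))

def estimate_angle(counts_up, shots):
    """Estimate the angle from the relative frequency of 'up' outcomes."""
    return angle_from_probability(counts_up / shots)
```

For finite statistics the estimate inherits the sampling error of the frequency `counts_up / shots`, which this sketch does not model.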
However, irrespective of the specific construction in this paper, it is interesting to speculate whether the physically fundamental order of logic (if there is any) might actually be reversed. In Example 39 in Appendix C, we give a modification of the direction bit setup where this speculation can be shown to make sense.
In this example, space is described by a topological manifold, and Bob's local laboratory space does not have a vector space structure or inner product to begin with. We then assume that there are physical processes that can in a certain sense be interpreted as generalized rotations of a device, yielding reversible transformations on some convex state space. Under specific conditions, we show that Bob can use the coefficients of the measurement outcome probabilities in the space of effects to define natural coordinates on his local physical space.
In these new coordinates, the generalized rotations act linearly and orthogonally on the devices, establishing spatial Euclidean structure that was not assumed to be there in the first place. Even though our particular example is not meant to describe an actual physical mechanism, it is tempting to speculate whether the Euclidean structure of tangent space might be fundamentally inherited from the convexity of probabilities.

VI. PAIRS OF SYSTEMS: POSTULATES 3 & 4
Consider two (distinguishable) direction bits A and B; taken together, they form a joint system AB, described by some convex state space Ω AB . In the usual formulation of quantum theory, the joint state space Ω AB would be given by the density matrices on the tensor product Hilbert space -however, in this paper, we do not assume quantum theory.
In full generality, for two state spaces $\Omega_A$ and $\Omega_B$, the framework of convex state spaces allows infinitely many possible ways to combine them into some $\Omega_{AB}$, restricted only by a few physically obvious constraints. One of them says that if $\omega^A \in \Omega_A$ and $\omega^B \in \Omega_B$ are two local states, then there is a joint state $\omega^A\omega^B \in \Omega_{AB}$ which describes the independent local preparation of both states on the subsystems, analogous to product states in quantum theory. Similarly, if $M^A$ and $M^B$ are measurement outcomes (i.e. effects) on A and B, then by assumption there is a global effect $M^A M^B$ which asks whether both measurement outcomes have occurred jointly. It satisfies in particular

$(M^A M^B)(\omega^A\omega^B) = M^A(\omega^A)\, M^B(\omega^B)$

and can be shown to respect the no-signalling conditions [20]. Furthermore, we assume that the local state space A (B) is closed with respect to postselection according to measurement outcomes on B (A); for details see Appendix A.
In physics, we are often interested in expectation values of observables such as energy or angular momentum. Classical as well as quantum physics have an important structural property regarding composite systems: suppose we have two systems A and B of the same type, and h is a single-system observable (in quantum theory, where states $\omega^A$ are density matrices, this would be a map $h(\omega^A) = \mathrm{tr}(\omega^A H)$, where $H = H^\dagger$). Suppose we are interested in the sum of this observable on systems A and B. This defines a new observable $h^{(2)}$ on pairs of systems, where

$h^{(2)}(\omega^A\omega^B) = h(\omega^A) + h(\omega^B)$  (4)

for uncorrelated states. What if we want to evaluate $h^{(2)}(\omega^{AB})$ for correlated (possibly entangled) states $\omega^{AB}$? In quantum theory, there is a unique way to do this, because eq. (4) uniquely determines $h^{(2)}$ and its action on all states. We necessarily have

$h^{(2)}(\omega^{AB}) = \mathrm{tr}\big(\omega^{AB}\,(H \otimes \mathbb{1} + \mathbb{1} \otimes H)\big)$.

Arguably, this uniqueness is an important property of composite systems: if it failed, it would not be clear how to add up observables on composite systems (for example, there would be no unique notion of a "non-interacting sum of Hamiltonians", and thus no unique physical way to combine systems in trivial non-interacting ways). We promote this property to a postulate.
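For the quantum case, the additivity of a summed observable on uncorrelated states can be checked numerically. This is a minimal sketch assuming the standard form H ⊗ 1 + 1 ⊗ H; the helper names and the example matrix `H` are illustrative assumptions.

```python
import numpy as np

def random_density_matrix(n, rng):
    """Random n x n density matrix (positive semi-definite, unit trace)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def expectation(rho, observable):
    """Expectation value tr(rho * observable) of a Hermitian observable."""
    return float(np.real(np.trace(rho @ observable)))

def summed_observable(h):
    """Two-system observable H (x) 1 + 1 (x) H built from a single-system H."""
    identity = np.eye(h.shape[0])
    return np.kron(h, identity) + np.kron(identity, h)

rng = np.random.default_rng(0)
H = np.array([[1.0, 0.5], [0.5, -1.0]])  # example single-qubit observable
rho_a = random_density_matrix(2, rng)
rho_b = random_density_matrix(2, rng)

# Additivity on an uncorrelated (product) state.
product_value = expectation(np.kron(rho_a, rho_b), summed_observable(H))
local_sum = expectation(rho_a, H) + expectation(rho_b, H)

# The same matrix also assigns a value to an entangled (Bell) state.
bell_vector = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
bell_value = expectation(np.outer(bell_vector, bell_vector), summed_observable(H))
```

Here `product_value` and `local_sum` agree up to rounding, and `bell_value` equals the sum of the expectations on the two (maximally mixed) marginals.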

Postulate 3 (Sums of observables).
If h is any direction bit observable, then there is a unique observable $h^{(2)}$ on pairs of direction bits such that

$h^{(2)}(\omega^A\omega^B) = h(\omega^A) + h(\omega^B)$.

It is easy to see that Postulate 3 holds true if and only if the uncorrelated states $\omega^A\omega^B$ linearly span the global state space $\Omega_{AB}$. Thus, Postulate 3 is equivalent to a condition that is usually called "local tomography" in the literature [8]: it says that joint states on AB are uniquely characterized by the local measurement outcome probabilities on A and B and their correlations. Denoting the dimension of $\Omega_A$ by $d_A$, this is also equivalent to

$d_{AB} + 1 = (d_A + 1)(d_B + 1)$.  (5)

The global state space $\Omega_{AB}$ carries its own group of reversible transformations $\mathcal{G}_{AB}$. We assume that Alice and Bob may still apply their local reversible transformations $G^A$ and $G^B$ to the joint system, which yields a global transformation $G^A G^B$. Due to Postulate 3, this transformation is uniquely determined by its action on uncorrelated states:

$(G^A G^B)(\omega^A\omega^B) = (G^A\omega^A)(G^B\omega^B)$.

This shows that Postulate 3 also has geometric significance: suppose we decide to carry out a local coordinate transformation; in our case, this is a rotation R ∈ SO(d). This transformation acts on states of direction bits via $\omega^A \mapsto G_R\,\omega^A$. The third postulate now tells us that this uniquely determines the coordinate transformation map on (correlated) pairs of systems: they are transformed via $\omega^{AB} \mapsto (G_R G_R)\,\omega^{AB}$, which is the only possible linear map that transforms $\omega^A\omega^B$ into $(G_R\omega^A)(G_R\omega^B)$.

Every pair of state spaces $\Omega_A$ and $\Omega_B$ can be combined into a joint state space $\Omega_{AB}$ in accordance with Postulate 3: the "smallest" possible choice (denoted $\Omega^{\min}_{AB}$) is to define it as the convex hull of all product states $\omega^A\omega^B$. On the other hand, the "largest" possible choice (denoted $\Omega^{\max}_{AB}$) is to allow all vectors $\omega^{AB}$ such that all local measurements yield valid probabilities, even after postselection [48][49][50]. Every compact convex set $\Omega_{AB}$ which satisfies

$\Omega^{\min}_{AB} \subseteq \Omega_{AB} \subseteq \Omega^{\max}_{AB}$

is then a possible choice of the global state space, as long as local reversible transformations map $\Omega_{AB}$ into itself.
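Local tomography can be illustrated for two qubits by checking that product states span the full space of 4 × 4 Hermitian matrices. The sketch below assumes a particular (hypothetical) choice of four local pure states and computes the rank of the resulting families; the helper names are not from the paper.

```python
import itertools
import numpy as np

def pure_state(vec):
    """Projector onto the normalized vector vec."""
    v = np.asarray(vec, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

# Four qubit states whose density matrices are linearly independent.
LOCAL_STATES = [
    pure_state([1, 0]),
    pure_state([0, 1]),
    pure_state([1, 1]),
    pure_state([1, 1j]),
]

def span_dimension(matrices):
    """Dimension of the linear span of a list of Hermitian matrices."""
    stacked = np.array([m.reshape(-1) for m in matrices])
    return int(np.linalg.matrix_rank(stacked))

def composite_dimension(d_a, d_b):
    """Normalized joint state space dimension under local tomography."""
    return (d_a + 1) * (d_b + 1) - 1

PRODUCT_STATES = [np.kron(a, b) for a, b in itertools.product(LOCAL_STATES, repeat=2)]
```

The 16 product states span a 16-dimensional linear space of unnormalized states; fixing the normalization removes one dimension, in agreement with `composite_dimension(3, 3)`.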
In quantum theory, Ω min AB turns out to be the set of unentangled states, while the actual global quantum state space Ω AB lies strictly in between Ω min AB and Ω max AB . Composites of convex state spaces have been extensively studied in the quantum information literature. Some of this interest is due to the fact that many of these state spaces contain states with non-local correlations that are stronger than those allowed by quantum theory. For example, if Ω A = Ω B is the square state space as in Fig. 3f), then the composite Ω max AB is the no-signalling polytope for two binary measurements on two parties, containing PR box states which violate the Bell-CHSH inequality more strongly than any quantum state [20, 51-53]. This example also illustrates that the convex state spaces formalism describes a vast landscape of theories with physical properties that can be very different from those of quantum theory.
In the case of two direction bits A and B, where the local state spaces are d-balls, there are also many possible choices of the global state space Ω AB in accordance with Postulate 3, including Ω min AB and Ω max AB . Our fourth and final postulate now states that this global state space allows for continuous reversible interaction.

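Postulate 4 (Interaction). For two direction bits A and B, there is a continuous one-parameter group of transformations $\{T^{AB}_t\}_{t\in\mathbb{R}} \subseteq \mathcal{G}_{AB}$ such that $T^{AB}_t$ is not a product of local transformations for at least one time t.
The group $T^{AB}_t$ describes continuous reversible time evolution in a closed system of two direction bits: if we start at time t = 0 with a product state $\omega^A\omega^B$, then the state at time t will be $\omega^{AB}(t) := T^{AB}_t(\omega^A\omega^B)$. If we had $T^{AB}_t = G^A_t G^B_t$ for all times t, then the global state would remain a product state forever: $\omega^{AB}(t) = (G^A_t\omega^A)(G^B_t\omega^B)$. In this case, the two direction bits could never become correlated; there would be no interaction. Postulate 4 excludes this: it states that there is at least one time $t \in \mathbb{R}$ such that $T^{AB}_t$ is not of this product form.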
The global transformations $T^{AB}_t$ and the local transformations $G^A G^B$ with $G^A, G^B \in SO(d)$ generate a Lie subgroup of $\mathcal{G}_{AB}$; we call it $\mathcal{H}_{AB}$. Due to (5), it is a matrix Lie group acting on $\mathbb{R}^{(d+1)(d+1)-1}$. The corresponding Lie algebra is called $\mathfrak{h}_{AB}$. Let X be some element of $\mathfrak{h}_{AB}$, and consider the circuit in Fig. 7. It depicts the outcome probability of a product measurement on an evolved product state,

$f(t) := (M^A_x M^B_y)\big(e^{tX}\,\omega^A_x\omega^B_y\big)$.

As we show in Lemma 24 in Appendix A, we may assume without loss of generality that the direction bit state space has noise parameter c = 1/2 and visibility parameter a = 1. This is the "noiseless case", where spin measurements give probabilities $M_x(\omega_{-x}) = 0$ and $M_x(\omega_x) = 1$, implying in particular that f(0) = 1 for the circuit in Fig. 7. Since this is the maximal possible value, we must have $f'(0) = 0$ and also $f''(0) \le 0$. Thus

$(M^A_x M^B_y)\big(X\,\omega^A_x\omega^B_y\big) = 0, \qquad (M^A_x M^B_y)\big(X^2\,\omega^A_x\omega^B_y\big) \le 0$

for all $x, y \in \mathbb{R}^d$ with |x| = |y| = 1. By considering other circuits of this kind, we obtain a long list of constraint equalities and inequalities which must be satisfied by all global Lie algebra elements X.
FIG. 7: Circuit model which yields constraints for the global Lie algebra elements X. We prepare a pure product state ω A x ω B y , apply the transformation exp(tX), and perform a product measurement M A x M B y . Since this gives probability 1 for t = 0, and probabilities cannot be larger than 1 for other (small) t, this implies that the derivative at t = 0 must vanish, and the second derivative must be non-positive.
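The two derivative constraints can be checked numerically in the quantum two-qubit case, where the evolution is a unitary conjugation. The following is a sketch under that assumption only: it uses a random Hermitian generator `HAMILTONIAN`, the product state |00⟩, and finite differences with a hypothetical step size; none of these choices come from the paper.

```python
import numpy as np

def evolve(rho, hamiltonian, t):
    """Unitary conjugation rho -> U rho U^dagger with U = exp(-i H t)."""
    evals, evecs = np.linalg.eigh(hamiltonian)
    u = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T
    return u @ rho @ u.conj().T

def f(t, hamiltonian, projector):
    """Probability that the evolved pure product state passes the same product test."""
    return float(np.real(np.trace(projector @ evolve(projector, hamiltonian, t))))

def first_derivative(hamiltonian, projector, step=1e-4):
    """Central finite-difference estimate of f'(0)."""
    return (f(step, hamiltonian, projector) - f(-step, hamiltonian, projector)) / (2 * step)

def second_derivative(hamiltonian, projector, step=1e-4):
    """Central finite-difference estimate of f''(0)."""
    return (f(step, hamiltonian, projector) - 2 * f(0.0, hamiltonian, projector)
            + f(-step, hamiltonian, projector)) / step ** 2

rng = np.random.default_rng(1)
g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
HAMILTONIAN = (g + g.conj().T) / 2             # random Hermitian generator
PROJECTOR = np.diag([1.0, 0.0, 0.0, 0.0])      # pure product state |00><00|

# Energy variance in |00>; for unitary evolution one expects f''(0) = -2 * variance.
VARIANCE = float(np.real((HAMILTONIAN @ HAMILTONIAN)[0, 0] - HAMILTONIAN[0, 0] ** 2))
```

In this example the first derivative vanishes and the second derivative is non-positive, as required by the argument; the identity f''(0) = −2·Var(H) is a standard quantum-mechanical fact used here only as a cross-check.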
Surprisingly, as shown in Appendix A and in [54], if d ≠ 3, then the only matrices X which satisfy all constraints are those of the form X = X A + X B . These elements generate non-interacting time evolution of product form: exp(tX) = exp(tX A ) exp(tX B ). Thus, if d ≠ 3, H AB contains only product transformations, and Postulate 4 cannot be satisfied.
Theorem 2. From Postulates 1-4 it follows that the spatial dimension must be d = 3.
The main reason why d = 3 is special becomes visible by inspection of the proof in [54]. It boils down to the group-theoretic fact that (at least for d ≥ 3) the subgroup of SO(d) which fixes a given vector (that is, SO(d − 1)) is Abelian only if d = 3. In other words, the fact that rotations commute in two dimensions, but not in higher dimensions is the main reason why d = 3 survives. The cases d = 1 and d = 2 are special as well, but are ruled out in the proof for other reasons.
It remains to show that we actually get quantum theory for two direction bits if d = 3. We already know that the dimension of the global state space is dim Ω AB = (d + 1)(d + 1) − 1 = 15, which agrees with the number of real parameters in a complex 4 × 4 density matrix. Thus, we can embed Ω AB in the real space of Hermitian 4 × 4 matrices of unit trace. Now we have global Lie algebra elements X ∈ h AB that are not just sums of local generators, i.e. X ≠ X A + X B . However, as shown in [55], these elements are still highly restricted: they generate unitary conjugations, i.e. transformations of the form ρ → U ρU † .
By Postulate 4, at least one of these unitaries must be entangling. Moreover, all local unitary transformations are possible (in the ball representation, these are the rotations in SO(3)). It is a well-known fact from quantum computation [56] that a set of unitaries of this kind generates the set of all unitaries; that is, every map of the form ρ → U ρU † must be contained in the global transformation group H AB ⊆ G AB .
The orbit of this group on pure product states generates all pure quantum states, and one can show [55] that there cannot be any additional non-quantum states. Thus, we have recovered the state space of quantum theory on two qubits. Due to positivity, all effects must be quantum effects. In the noisy case (i.e. c ≠ 1/2 or a < 1), not all quantum effects may actually be implementable; that is, we might have a restricted set of measurements. We have thus proven:
Theorem 3. From Postulates 1-4, it follows that the state space of two direction bits is two-qubit quantum state space (i.e. the set of 4 × 4 density matrices), and time evolution is given by a one-parameter group of unitaries, ρ → U (t)ρU (t) † .
As a simple consequence, there exists some 4×4 Hermitian matrix H such that U (t) = exp(−iHt), i.e. a Hamiltonian which generates time evolution according to the Schrödinger equation.
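The one-parameter group structure can be made concrete with a small numerical sketch. The matrix `H_EXAMPLE` below is an arbitrary Hermitian matrix chosen for illustration, and `unitary` is a hypothetical helper that exponentiates it via its eigendecomposition.

```python
import numpy as np

def unitary(hamiltonian, t):
    """U(t) = exp(-i H t), computed from the eigendecomposition of Hermitian H."""
    evals, evecs = np.linalg.eigh(hamiltonian)
    return evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T

# Example 4 x 4 Hermitian matrix: a non-local ZZ coupling plus a local X term.
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
H_EXAMPLE = np.kron(Z, Z) + 0.3 * np.kron(X, np.eye(2))

def group_law_error(hamiltonian, s, t):
    """Deviation from U(s) U(t) = U(s + t), measured in the Frobenius norm."""
    lhs = unitary(hamiltonian, s) @ unitary(hamiltonian, t)
    return float(np.linalg.norm(lhs - unitary(hamiltonian, s + t)))

def unitarity_error(hamiltonian, t):
    """Deviation from U(t) U(t)^dagger = 1."""
    u = unitary(hamiltonian, t)
    return float(np.linalg.norm(u @ u.conj().T - np.eye(u.shape[0])))
```

Both error measures stay at the level of floating-point rounding, reflecting that the maps ρ → U(t)ρU(t)† form a one-parameter group.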

VII. CONCLUSIONS
We have derived two facts about physics from information-theoretic postulates: the three-dimensionality of space [57], and the fact that probabilities of measurement outcomes for some systems are described by quantum theory. In order to do this, we assumed that there exist "reasonable" physical systems which, in a certain sense, carry minimal amounts of directional information.
Our result supports and clarifies the point of view that the geometric structure of spacetime and the probabilistic structure of quantum theory are closely intertwined, similar in spirit to [1][2][3][4][58][59][60]. As one can see in Fig. 3, this conclusion becomes particularly obvious in the context of convex state spaces. This interrelation is not only axiomatic, but also operational: as we have shown in Sec. V, observers can measure -or even define -physical angles by measuring probabilities.
Furthermore, these findings suggest exploring possible generalizations: the approach to construct state spaces from physical symmetry properties [70], together with minimality assumptions, might reproduce quantum systems of higher spin, or even physically interesting nonquantum state spaces that have so far remained unexplored.
In summary, there seem to be two possible interpretations of the results in this paper. First, the results might simply be mathematical coincidence, without any deep physical reason underlying them. This is perfectly conceivable; in this case, the main contribution of this paper is a detailed analysis of the structural fit between quantum theory and spacetime. Second, the results might point to an actual logical relation between geometry and probability that arises from some unknown fundamental physics, such as quantum gravity.
If the second possibility turned out to be true, this would suggest an exciting speculation, stated also in [61,62]: In many approaches to quantum gravity, the smoothness and/or three-dimensionality of space is considered to be only an approximation. But then, given the close relation between smooth Euclidean space and the qubit, maybe the universe's probabilistic theory is only approximately quantum? Taking this idea seriously would suggest going beyond the usual "quantization of geometry" paradigm.
Appendix A: Characterization of all direction bit state spaces
The proof consists of four steps: first, we prove that the direction bit state space is a Euclidean ball (possibly noisy, that is, with a restricted set of measurements). Then we show that the noisy case can always be reduced to the noiseless case. Given this, the results from Ref. [54] do most of the work: they show that only d = 3 is possible. As a last step, in order to obtain quantum theory for d = 3, we refer to the results in Ref. [55].
We start with a formal definition of state spaces. As we have motivated in the main text, the set of normalized states on any system is a compact convex set. To simplify the calculations, it makes sense to start right away with the full set of unnormalized states, which will be all vectors of the form λ · ω, where ω is a normalized state, and λ ≥ 0. This yields a cone in the sense of convex geometry -that is, a subset C of a vector space with the property that x ∈ C implies λx ∈ C for all λ ≥ 0.
For reasons of brevity, we will not give a detailed explanation and motivation of all definitions. For more discussion, we refer the reader to the references mentioned in the main text, in particular to Chapter 3 in [33]. Furthermore, we define the dual cone A * + as the set of all linear functionals which are non-negative on A + , which implies E A ⊆ A * + . The set of all ω ∈ A + with U A (ω) = 1 will be denoted Ω A and is called the set of normalized states.
The requirement dim(E A ) = dim(A) has a simple physical motivation: if dim(E A ) < dim(A), then we would have states ω ≠ ϕ that would yield the same outcome probabilities for all possible measurements, which would make it meaningless to call them "different states" in the first place.
To save some ink, we will usually just write A for the state space, instead of writing the full tuple. However, keep in mind that the choice of a state space comes also with a choice of A + , U A and E A .
Given any measurement with an arbitrary number of outcomes, the probability of one of the outcomes, if measured on some state $\omega \in \Omega_A$, will be a real number in [0, 1]. The map M which takes the state ω to the corresponding probability M(ω) must be linear, since statistical mixtures of states must yield mixtures of probabilities. In principle, every linear functional $M \in \mathcal{E}^{\max}_A$ may describe a measurement outcome probability, where

$\mathcal{E}^{\max}_A := \{ M \in A^* \mid 0 \le M(\omega) \le 1 \text{ for all } \omega \in \Omega_A \}$.

However, one may imagine that it might be physically impossible to implement measurement devices for all these linear functionals. This is why the set $\mathcal{E}_A$ is introduced in the definition above: it is meant to describe the collection of all possible effects that may actually be implemented in measurements. Clearly, we have $\mathcal{E}_A \subseteq \mathcal{E}^{\max}_A$. In some publications (e.g. [23]), it is assumed that $\mathcal{E}_A = \mathcal{E}^{\max}_A$, but not in this paper. In other words, we are not assuming the "no-restriction hypothesis" here [24]. The possibility to have $\mathcal{E}_A \neq \mathcal{E}^{\max}_A$ describes situations, as we will see below, where all measurements on a direction bit are by necessity intrinsically noisy.
As an example, in finite-dimensional n-level quantum theory,
• A is the real vector space of Hermitian matrices on C n ,
• A + is the set of positive semi-definite matrices on C n ,
• U A (ρ) = tr(ρ) is the trace functional,
• E A is the set of all maps of the form ρ → tr(ρM ), with M ≥ 0 a positive semi-definite matrix,
• Ω A is the set of density matrices on C n .
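Similarly, the state space of classical n-level probability theory is $(B, B_+, U_B, \mathcal{E}_B)$, where
• $B = \mathbb{R}^n$,
• $B_+$ is the set of all vectors $p \in \mathbb{R}^n$ with all $p_i \ge 0$,
• $U_B(p) = p_1 + p_2 + \ldots + p_n$,
• $\mathcal{E}_B$ is the set of all maps $p \mapsto p \cdot q$ with $q = (q_1, \ldots, q_n)$ such that all $q_i \ge 0$, where · denotes the Euclidean inner product,
• $\Omega_B$ is the set of all probability distributions: $\Omega_B = \{ p \in \mathbb{R}^n \mid p_i \ge 0,\ \sum_i p_i = 1 \}$.
In both classical probability theory and quantum theory, all effects are allowed.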
We would like to talk about reversible transformations on state spaces. To this end, we define the following: a tuple $(A, A_+, U_A, \mathcal{E}_A, \mathcal{G}_A)$, where $(A, A_+, U_A, \mathcal{E}_A)$ is a state space, and $\mathcal{G}_A$ is a compact (possibly finite) group of linear maps on A, is called a dynamical state space, if every $G \in \mathcal{G}_A$ satisfies

$G(\Omega_A) = \Omega_A$ and $\mathcal{E}_A \circ G = \mathcal{E}_A$.

These two conditions say that reversible transformations must respect the set of normalized states and the set of allowed effects. It is easy to see that the first condition implies that $A^*_+ \circ G = A^*_+$ for all $G \in \mathcal{G}_A$. In quantum theory, $\mathcal{G}_A$ is the group of all maps of the form $\rho \mapsto U\rho U^\dagger$, with U unitary. In classical probability theory, $\mathcal{G}_B$ is a representation of the permutation group. Specifically, for every permutation π, there is a reversible transformation $G_\pi$ with $G_\pi(p_1, \ldots, p_n) = (p_{\pi(1)}, \ldots, p_{\pi(n)})$.
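The classical example can be sketched in a few lines: a permutation acts on a probability vector, preserves normalization, and pulls an effect back to another effect. Permutations are written as 0-based tuples here, and all function names are illustrative assumptions.

```python
import numpy as np

def permute(p, pi):
    """Apply G_pi: the i-th entry of the result is p[pi[i]] (0-based)."""
    p = np.asarray(p, dtype=float)
    return p[list(pi)]

def unit_effect(p):
    """Normalization functional U_B(p) = p_1 + ... + p_n."""
    return float(np.sum(p))

def is_normalized_state(p, tol=1e-12):
    """True if p is a probability distribution up to tolerance."""
    p = np.asarray(p, dtype=float)
    return bool(np.all(p >= -tol) and abs(unit_effect(p) - 1.0) <= tol)

def pulled_back_effect(q, pi):
    """Vector q' such that q . G_pi(p) = q' . p for every p."""
    q = np.asarray(q, dtype=float)
    return q[np.argsort(pi)]
```

For instance, `permute([0.5, 0.3, 0.2], (2, 0, 1))` reorders the entries without changing their sum, so normalized states are mapped to normalized states, and `pulled_back_effect` shows that the set of effects is mapped onto itself as well.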
Here is a rigorous definition of equivalence of state spaces:
Definition 6 (Equivalent state spaces). Two state spaces A and B are equivalent if there exists a bijective linear map L : A → B such that the following conditions are satisfied:

$L(A_+) = B_+, \qquad U_B \circ L = U_A, \qquad \mathcal{E}_B \circ L = \mathcal{E}_A$.

Two dynamical state spaces A and B are equivalent, if they are equivalent as state spaces and additionally satisfy

$\mathcal{G}_B = L \circ \mathcal{G}_A \circ L^{-1}$.

This is clearly an equivalence relation. If two (dynamical) state spaces are equivalent, they are indistinguishable in all their physical properties.

Now we show how the notion of noisiness in Postulate 2 can be seen as a special case of "group majorization", a natural definition of noisiness with respect to a group that encompasses the classical and quantum cases in the obvious way. This definition is well-known in the mathematics literature [25]; we rephrase it in Definition 8 below in the context of convex state spaces. We start by showing a simple consequence of Postulate 2.
Lemma 7. Suppose that ω and ϕ are both possible encodings of the same direction x ∈ R d , |x| = 1, in some protocol that satisfies Postulate 1. From Postulate 2, it follows that there exist 0 < λ j ≤ 1, j λ j = 1, and rotations or with ϕ and ω interchanged. If ϕ = ω then this is a proper convex combination.
Proof. According to Postulate 2, the assumptions of this lemma imply or vice versa (in the latter case, rename ϕ and ω to fit this formula). Set ω : z (ϕ) for all z. But then ω could be used as a replacement for ϕ in the protocol, namely, as yet another codeword for direction x. Moreover, ω and ϕ are by construction equally noisy in their directional information, so Postulate 2 implies that they must be equal. Now we show how this fits into a majorization framework.
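Definition 8 (Group majorization). Let A be any dynamical state space, and H a compact subgroup of $\mathcal{G}_A$. Then we define a relation $\preceq_H$ on $\Omega_A$ in the following way: for $\omega, \varphi \in \Omega_A$, it holds $\omega \preceq_H \varphi$ if and only if there are $\lambda_i \ge 0$, i = 1, . . . , n, with $\sum_i \lambda_i = 1$, and transformations $T_i \in H$ such that

$\omega = \sum_{i=1}^n \lambda_i T_i \varphi$.  (A2)

We write $\omega \preceq \varphi$ if and only if $\omega \preceq_{\mathcal{G}_A} \varphi$.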
Proof. Property (ii) is trivial, by setting $\lambda_1 = 1$ and $T_1 = \mathbb{1}$ in (A2). If $\omega \preceq_H \varphi$, then $T_1\omega = \sum_i \lambda_i\,(T_1 T_i T_2^{-1})\,T_2\varphi \preceq_H T_2\varphi$ if $T_1, T_2 \in H$, proving (iv). If additionally $\varphi \preceq_H \rho$ such that $\varphi = \sum_j \lambda'_j T'_j \rho$, then $\omega = \sum_{ij} \lambda_i \lambda'_j T_i T'_j \rho \preceq_H \rho$. This proves (i). It remains to prove (iii). To this end, introduce an inner product $\langle\cdot,\cdot\rangle$ on A which is invariant with respect to $\mathcal{G}_A$ (and thus H), i.e.

$\langle x, y\rangle = \langle Tx, Ty\rangle$ for all $x, y \in A$, $T \in \mathcal{G}_A$.

Moreover, let $\|\cdot\|$ be the corresponding norm. Then (A2) and the triangle inequality yield

$\|\omega\| \le \sum_i \lambda_i \|T_i\varphi\| = \|\varphi\|$.

Thus, if both $\omega \preceq_H \varphi$ and $\varphi \preceq_H \omega$, then $\|\omega\| = \|\varphi\| = \|T_i\varphi\| =: r$. Let $S_r$ be the sphere of radius r, then (A2) says that $\omega \in S_r$ is a convex combination of the $T_i\varphi \in S_r$. Geometrically, it is clear that this is only possible if $T_i\varphi = \omega$ whenever $\lambda_i \neq 0$ (formally, it follows from the fact that all boundary points of the ball are exposed points). Setting $T := T_i$ for any of these i proves (iii).

Now we see how our definition of noisiness from Postulate 2 fits naturally into the well-known notion of majorization. In the case of quantum theory (with the full unitary group), it follows from [37, Thm. 12.13] that our relation $\preceq$ is identical to Nielsen's majorization relation on density matrices. From Lemma 7, we obtain the following:

Given two state spaces A and B, we would like to define a composite state space AB which, according to Postulate 3, satisfies the local tomography property [10]: states on AB are uniquely characterized by the statistics of local measurements. Eq. (5) in the main text translates into dim(AB) = dim A dim B; thus, we may choose the vector space AB to be the tensor product A ⊗ B. This will turn out to be a handy choice: we can represent independent preparations $\omega^A\omega^B$ by products $\omega^A \otimes \omega^B$. We get the following definition:

Definition 11 (Locally tomographic composite). Given two dynamical state spaces A and B, a dynamical state space $(AB, (AB)_+, U_{AB}, \mathcal{E}_{AB}, \mathcal{G}_{AB})$ will be called a composite of A and B, if the following conditions are satisfied:
• the linear space which carries the state space is AB = A ⊗ B,
• for every $N^B \in \mathcal{E}_B$ and $\omega^{AB} \in (AB)_+$, the vector $\omega^A_{\rm cond}$ ("conditional state") defined by

$\omega^A_{\rm cond} := (\mathrm{id}_A \otimes N^B)(\omega^{AB})$  (A3)

is a valid state, i.e. $\omega^A_{\rm cond} \in A_+$, and similarly for A and B interchanged.
Note that eq. (A3) is automatically satisfied if all effects on A are allowed.
It means that we cannot get "new" states outside of $\Omega_A$ by preparing global states and postselecting on local measurement outcomes. Similarly, we might demand that any map of the form $\omega^A \mapsto M^{AB}(\omega^A \otimes \omega^B)$ for a fixed bipartite effect $M^{AB}$ and fixed state $\omega^B$ is itself a valid effect on A. If this is violated, then the set of possible local measurements on A is increased by composing it with the other system B. However, since we do not need this condition in the following, we decided not to have it as part of the definition in order to have a result which is as general as possible.
By setting $N^B := U_B$ in eq. (A3), we obtain the conditional state which Alice sees if Bob does not perform any measurement. This is the reduced state $\omega^A$ of $\omega^{AB} \in \Omega_{AB}$, satisfying $M^A(\omega^A) = (M^A \otimes U_B)(\omega^{AB})$ for all effects $M^A$ on A. Thus, Definition 11 ensures that global states have valid reduced states (marginals).
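In the quantum case, conditional and reduced states are given by partial traces, which can be sketched as follows. The index conventions, the function names, and the Bell-state example are assumptions made for illustration; the unnormalized conditional state is taken to be tr_B[(1 ⊗ N) ρ].

```python
import numpy as np

def conditional_state_a(rho_ab, effect_b, dim_a, dim_b):
    """Unnormalized state on A after outcome effect_b on B: tr_B[(1 (x) N) rho]."""
    tensor = rho_ab.reshape(dim_a, dim_b, dim_a, dim_b)  # indices (i, b, j, c)
    return np.einsum("ibjc,cb->ij", tensor, effect_b)

def reduced_state_a(rho_ab, dim_a, dim_b):
    """Marginal on A: the conditional state for the unit effect on B."""
    return conditional_state_a(rho_ab, np.eye(dim_b), dim_a, dim_b)

def probability(rho, effect):
    """Outcome probability tr(rho * effect)."""
    return float(np.real(np.trace(rho @ effect)))

bell_vector = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
BELL = np.outer(bell_vector, bell_vector)   # maximally entangled two-qubit state
N_ZERO = np.diag([1.0, 0.0])                # effect "outcome 0" on B
M_PLUS = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])  # effect "outcome +" on A
```

With these definitions, the probability of a local effect on the conditional state equals the joint probability of the product effect on the global state, and the marginal of the Bell state is maximally mixed.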
We continue by proving two claims in the main text in the following two lemmas: Lemma 12. With the notation of the main text (in particular, x = (1, 0, . . . , 0) T ), there is some outcome i 0 and some direction y ∈ R d , |y| = 1, such that M −y (ω(x)).
Proof. As we have shown in the main text, the probabilities M  1. Bob's laboratory is aligned in exactly the same way as Alice's -that is, both share the same coordinate system (maybe by chance). In this case, Bob's coordinates x B of x are the same as Alice's: x B = (1, 0, . . . , 0) T .
2. Compared to Alice's laboratory, Bob's lab is oriented differently, namely it is rotated by S relative to Alice. In this case, Bob's coordinates x B of x are x B = (−1, 0, . . . , 0) T .
Since Alice does not know which of the two situations (or any of the infinitely other possible ones) apply to Bob's laboratory, her encoding x → ω(x) must work in both cases. However, due to M Sy (ω(x)) for all i and y, Bob sees exactly the same outcome probabilities in both cases, leading with probability one to the same estimate x B . This contradicts the soundness of the protocol, i.e. Postulate 1.
Lemma 13. Let i 0 be any outcome that satisfies the statement of Lemma 12. Then there is some direction y ∈ R d , |y| = 1, such that the state ω (y) :=

R∈SO(d):Ry=y
has the property that the map z → L z (ω (y)) attains its unique global maximum at z = y.

R∈SO(d):Ry=y
We have to show that this is strictly less than max . To this end, we define a continuous path on the surface of the d-dimensional unit ball. We will assume that z * 1 > 0; the case z * 1 < 0 is treated analogously (z * 1 = 0 is excluded from , let z(t) ∈ R d be some vector with |z(t)| = 1 such that t → z(t) is continuous, z(−z * 1 ) = −y, z(z * 1 ) = y, and such that the first component of z(t) equals t. If z ∈ {−y, y}, then there is some t ∈ (−z * 1 , z * 1 ) such that |z − y| = |z(t) − y|. Hence there is some R ∈ SO(d) with Ry = y such that R −1 z = z(t), and since |t| < |z * 1 |, we have But this expression appears in (A5): the integrand is upper-bounded by max for all R, and is strictly less than max for the rotation R that we have just found. This proves that L z (ω (y)) < max . Now we are ready to give a thorough definition of a "direction bit". It is arguably difficult to formalize Postulates 1 and 2 from the main text into a rigorous mathematical definition: rigorously defining what is meant by a "protocol" seems hardly worth the effort (the result would be long and not very illuminating); similarly, a formalization of the physical intuition about spatial symmetries (rotating the device versus the direction bit etc.), as used in the initial stage of the proof, seems over the top for the purpose of this paper. Instead, we use two consequences of Postulates 1 and 2, called Assumptions 1 and 2, as derived in the main at an intermediate stage of the proof, to write down a definition of direction bits. This avoids talking about the physical background situation, but ensures that all the "convex state space" argumentation rests on rigorous mathematical grounds.
The meaning of the assumptions is as follows. Assumption 1 states that the standard protocol which we have constructed in the main text works: there is some state $\omega$ which may serve as a codeword for some direction $x$ in the standard protocol. This is because the quantity $L_y(\omega)$, i.e. the difference of probabilities in directions $y$ and $-y$, has its unique maximum at $y = x$. Assumption 2 formalizes the consequence of applying Postulate 2 to the special case of the standard protocol, proven in the main text: if two states encode the same direction in the standard protocol, with the same maximal value of $L$, they must agree. Assumption 3 subsumes Postulates 3 and 4.
• Assumption 1: There exists $\omega \in \Omega_A$ with $L_x(\omega) > L_y(\omega)$ for all $y \neq x$, where $L_y := M_y - M_{-y}$.
• Assumption 2: Suppose that $\omega, \omega' \in \Omega_A$ are states with the property that the maps $y \mapsto L_y(\omega)$ and $y \mapsto L_y(\omega')$ both have a unique maximum in the same direction $y_0$, and the maximal value is the same: $L_{y_0}(\omega) = L_{y_0}(\omega')$. Then $\omega = \omega'$.
• Assumption 3: There exists a locally tomographic composite $AB$ of $A$ and $B := A$ with the property that $G_{AB}$ contains a one-parameter subgroup $\{G^{AB}_t\}_{t \in \mathbb{R}}$ for which there exists $t \in \mathbb{R}$ such that $G^{AB}_t$ cannot be written in the form $G_A \otimes G_B$ with $G_A \in G_A$ and $G_B \in G_B$.

Now the claims of the main text will be proven in detail.
Lemma 15. Let $A$ be a direction bit for spatial dimension $d$, with distinguished direction $x \in \mathbb{R}^d$. Then there is a constant $0 < a \le 1$, which we call the visibility parameter, with the following property: Moreover, for every $y \in \mathbb{R}^d$ with $|y| = 1$, there is a unique state $\omega_y \in \Omega_A$ such that $L_y(\omega_y) = a$ and $L_z(\omega_y) < a$ for all $z \neq y$. Furthermore, $G_S\,\omega_y = \omega_{Sy}$ and $M_y \circ G_S = M_{S^{-1}y}$ for all $S \in SO(d)$ (resp. $S \in O(1)$ if $d = 1$), and the maps $y \mapsto \omega_y$ and $R \mapsto G_R$ are both homeomorphisms into their images (in the subspace topology).
Proof. Throughout this proof, if $d = 1$, all appearances of $SO(d)$ shall be replaced by $O(1)$. Let $x \in \mathbb{R}^d$ be the direction bit's distinguished direction (cf. Definition 14), and $M \equiv M_x$ the distinguished effect. Let $\omega \in \Omega_A$ be any state with $L_x(\omega) > L_y(\omega)$ for all $y \neq x$ (it follows that $L_x(\omega) > 0$). Let $\omega' \in \Omega_A$ be any other state satisfying $L_x(\omega') > L_y(\omega')$ for all $y \neq x$ and at the same time $L_x(\omega') > L_x(\omega)$ (if no such state exists, we are done: just set $a := L_x(\omega)$ and $\omega_x := \omega$). Define the state $\mu := \int_{R \in SO(d)} G_R\,\omega\, dR$. By invariance of the Haar measure, there is a constant $\beta \ge 0$ such that $M_y(\mu) = \beta$ for all $y$, and thus $L_y(\mu) = 0$. Set $\lambda := 1 - L_x(\omega)/L_x(\omega') \in (0, 1)$ and $\varphi := \lambda\mu + (1 - \lambda)\omega'$; then by construction $L_x(\varphi) > L_y(\varphi)$ for all $y \neq x$, and $L_x(\varphi) = L_x(\omega)$. Thus, Assumption 2 implies that $\omega = \varphi = \lambda\mu + (1 - \lambda)\omega'$. In summary, all states that have $x \in \mathbb{R}^d$ as their unique maximizing direction of $L_\bullet$ lie on the line which starts at $\mu$ and extends through $\omega$ to infinity.
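The role of the mixing weight $\lambda$ can be made explicit with a short worked computation. It uses only the linearity of $L_y$ and the fact that $L_y(\mu) = 0$ for all $y$, and it writes $\omega'$ for the second codeword state with the larger value of $L_x$:

```latex
\begin{align*}
L_y(\varphi) &= \lambda\,L_y(\mu) + (1-\lambda)\,L_y(\omega')
              = (1-\lambda)\,L_y(\omega') \qquad \text{for all } y,\\
1-\lambda &= \frac{L_x(\omega)}{L_x(\omega')}
  \;\Longrightarrow\; L_x(\varphi) = L_x(\omega),\\
L_x(\varphi) - L_y(\varphi) &= (1-\lambda)\bigl(L_x(\omega') - L_y(\omega')\bigr) > 0
  \qquad \text{for all } y \neq x .
\end{align*}
```

So $\varphi$ and $\omega$ have the same unique maximizing direction $x$ and the same maximal value, which is exactly the situation covered by Assumption 2.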
Since the state space is compact, this line will hit the topological boundary of $\Omega_A$ in some state that we call $\omega_x$. By construction, there is some $\lambda \in [0, 1)$ such that $\omega = \lambda\mu + (1 - \lambda)\omega_x$. But then, $L_x(\omega) > L_y(\omega)$ for all $y \neq x$ implies the analogous strict inequality for $\omega_x$. Set $a := L_x(\omega_x)$; then it has the claimed property. For every $y \in \mathbb{R}^d$ with $|y| = 1$, choose some $R \in SO(d)$ with $Rx = y$, and set $\omega_y := G_R\,\omega_x$. Let $z \neq y$ be an arbitrary vector with $|z| = 1$, and let $S \in SO(d)$ be any transformation with $Sx = z$. Then $z \neq Rx$, hence $R^{-1}Sx \neq x$, and so $L_z(\omega_y) = L_{Sx}(G_R\,\omega_x) = L_{R^{-1}Sx}(\omega_x) < a = L_x(\omega_x) = L_y(\omega_y)$. It follows directly from Assumption 2 that $\omega_y$ is the unique state with these two properties. Recalling Definition 14, we also have $M_y \circ G_S = M_{S^{-1}y}$. A simple calculation also shows that $\omega' := G_S\,\omega_y$ has the properties $L_{Sy}(\omega') = a$ and $L_z(\omega') < a$ for all $z \neq Sy$, which shows that $\omega' = \omega_{Sy}$.

Next we show that the map $y \mapsto \omega_y$ is continuous. To this end, let $\{y_n\}_{n \in \mathbb{N}}$ be a sequence of vectors in $\mathbb{R}^d$ with $|y_n| = 1$ which converges to some vector $y$. Clearly, we can find a sequence of orthogonal linear maps $\{R_n\}_{n \in \mathbb{N}}$ with $R_n y_n = y$ and $R_n \to \mathbf{1}$ as $n \to \infty$. By continuity of the group representation, we have $G_{R_n} \to \mathbf{1}$, and thus $\|\omega_y - \omega_{y_n}\| = \|(G_{R_n} - \mathbf{1})\,\omega_{y_n}\| \to 0$, since the state space is compact. Since $\omega_y \neq \omega_z$ for $y \neq z$, the map $y \mapsto \omega_y$ is a continuous injective map from the compact unit sphere in $\mathbb{R}^d$ to its image. Thus [63,64], it is a homeomorphism into its image.
Similarly, the calculations above show that $R \neq S$ implies $G_R \neq G_S$. Since the map $R \mapsto G_R$ is continuous and $SO(d)$ is compact, it is a homeomorphism into its image.
The next lemma also serves as a definition of the maximally mixed state.
The resulting state µ does not depend on the choice of x. Moreover, there is a constant 0 < c < 1 such that M x (µ) = c for all x ∈ R d with |x| = 1, and G R µ = µ for all R ∈ SO(d) (resp. R ∈ O(1) if d = 1). We call c the noise parameter of the direction bit A.
Proof. If d ≥ 2, it follows from G R ω y = ω Ry that the definition of µ does not depend on the choice of x. The identity G R µ = µ follows from the invariance of the Haar measure. Set c := M x (µ), and let y ∈ R d be any vector with |y| = 1. Proof. Let ω be an arbitrary direction bit state. By compactness of the unit sphere and continuity, there exists x ∈ R d , |x| = 1 such that L x (ω) ≥ L y (ω) for all y ∈ R d with |y| = 1 (there may be several maximizers x; we choose one of them arbitrarily). For 0 < ε < 1, define ω ε : , so Assumption 2 proves that ω ε = ω ε . Since λ := lim ε→0 λ ε exists (and equals L x (ω)/a), we can take the limit ε → 0 of this equation and obtain ω = λω x + (1 − λ)µ.
Since |ω x | = 1, this is only possible if λ = ε/2 and v =ω x . But this implies that v ∈Ω B . We conclude that all unit vectors are contained inΩ B -thus, by convexity,Ω B is the full unit ball. Since all points on its surface are of the formω x for some direction x, Lemma 15 implies that the map x →ω x is a homeomorphism from the unit sphere in R d to the unit sphere in R D . It follows that d = D.
where α ∈ R, β > 0, andv x ∈ R d is some unit vector. First, M x (µ) = c andμ = 0 implies α = c. For every rotation R ∈ SO(d), acting on direction bit states via G R , denote byĜ R the corresponding transformation in the ball picture, i.e.
We know that $\hat G_R \in SO(d)$, too. For arbitrary directions $y \in \mathbb{R}^d$, $|y| = 1$, choose $R \in SO(d)$ with $Rx = y$, then This expression attains its maximum in $y$ for $y = x$, thus $\hat v_x = \hat\omega_x$. It follows that $a = L_x(\omega_x) = 2\beta\langle\hat\omega_x, \hat\omega_x\rangle = 2\beta$, hence $\beta = a/2$, and we obtain $M_x(\omega) = c + (a/2)\langle\hat\omega_x, \hat\omega\rangle$. Due to eq. (A6), the analogous equation holds true for all other directions $y \neq x$. Thus, we know that all these $M_y$ must be allowed effects, i.e. elements of $E_B$. In the following, we always assume that we have chosen the ball representation right from the start, such that $E_x = M_x$.

Lemma 18 implies that direction bits have at most two perfectly distinguishable states in their state space. This justifies the name "direction bits". In more detail, if $A$ is any state space, call a set of states $\omega_1, \ldots, \omega_n \in \Omega_A$ perfectly distinguishable if there are effects $E_1, \ldots, E_n \in E_A$ with $E_1 + \ldots + E_n = U_A$ such that $E_i(\omega_j) = \delta_{i,j}$, that is, 1 if $i = j$ and 0 otherwise. The maximal size of any set of perfectly distinguishable states will be called the capacity $N_A$ [8,18]. In the special case of a quantum system, $N_A$ equals the system's Hilbert space dimension. The following lemma is well-known in the context of general probabilistic theories; we give the proof for completeness. It says that ball state spaces of any dimension $d$ are bits, i.e. have capacity $N = 2$; this includes classical bits ($d = 1$) and quantum bits ($d = 3$) as special cases.

Lemma 19. If $A$ is a Euclidean ball state space with all effects allowed, then it has capacity $N_A = 2$, i.e. it is a generalized bit.
Proof. Let $r \in \mathbb{R}^d$ be any unit vector, $|r| = 1$. Set $\omega_1 := (1, r)^T \in \Omega_A$ and $\omega_2 := (1, -r)^T \in \Omega_A$; then the two functionals $E_1(\omega) := \tfrac{1}{2}(1 + \langle r, \hat\omega\rangle)$ and $E_2(\omega) := \tfrac{1}{2}(1 - \langle r, \hat\omega\rangle)$, for $\omega = (1, \hat\omega)^T$, are effects in $E_A$ that perfectly distinguish $\omega_1$ and $\omega_2$ and sum up to $U_A$. Thus $N_A \ge 2$. Suppose there are $n \ge 3$ perfectly distinguishable states $\omega_1, \ldots, \omega_n \in \Omega_A$, with corresponding effects $E_1, \ldots, E_n$. Consider the hyperplane $H := \{x \in \mathbb{R}^{d+1} \mid E_1(x) = 0\}$; it is a support hyperplane [65] of $\Omega_A$. Furthermore, since $\omega_2, \ldots, \omega_n \in H$, it contains more than one point of $\Omega_A$, so $H \cap \Omega_A$ is a face of $\Omega_A$ that contains more than one point. However, all proper faces of Euclidean balls contain only one point; we obtain a contradiction. Hence $N_A \le 2$.
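The first half of this proof can be checked numerically. The following is a minimal sketch, assuming the ball representation $\omega = (1, \hat\omega)$ with $|\hat\omega| \le 1$ and the two antipodal effects $E_\pm(\omega) = \tfrac12(1 \pm \langle r, \hat\omega\rangle)$; the function names `effect` and `random_ball_state` and the choice $d = 5$ are illustrative only and do not come from the text.

```python
import math
import random

def effect(sign, r, state):
    """Evaluate E_±(ω) = (1 ± <r, ω̂>)/2 on a normalized state ω = (1, ω̂)."""
    norm_component, bloch = state[0], state[1:]
    return 0.5 * (norm_component + sign * sum(ri * wi for ri, wi in zip(r, bloch)))

def random_ball_state(d, rng):
    """Return a normalized state (1, ω̂) with ω̂ drawn from the unit ball in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    length = math.sqrt(sum(x * x for x in v))
    radius = rng.random()
    return [1.0] + [radius * x / length for x in v]

d = 5                                  # illustrative dimension (assumption)
r = [1.0] + [0.0] * (d - 1)            # some unit vector r
omega_1 = [1.0] + r                    # pure state (1, r)
omega_2 = [1.0] + [-x for x in r]      # antipodal pure state (1, -r)

# E_+ and E_- perfectly distinguish the two antipodal states.
distinguishes = (
    effect(+1, r, omega_1) == 1.0 and effect(+1, r, omega_2) == 0.0
    and effect(-1, r, omega_1) == 0.0 and effect(-1, r, omega_2) == 1.0
)

# On randomly sampled states, both functionals give probabilities that sum to one.
rng = random.Random(0)
samples = [random_ball_state(d, rng) for _ in range(1000)]
valid_probabilities = all(
    0.0 <= effect(s, r, w) <= 1.0 for w in samples for s in (+1, -1)
)
sums_to_unit = all(
    abs(effect(+1, r, w) + effect(-1, r, w) - 1.0) < 1e-12 for w in samples
)
print(distinguishes, valid_probabilities, sums_to_unit)
```

The sketch only illustrates the lower bound $N_A \ge 2$; the upper bound relies on the face structure of the ball and is not something a sampling check can establish.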
According to Lemma 18, direction bit state spaces are Euclidean balls. In the case that all effects are allowed (which, as we show later, corresponds to the "noiseless" case with visibility and noise parameters a = 1 and c = 1/2), they have therefore capacity N = 2, i.e. they are in fact bits as the name suggests. If not all effects are allowed, then direction bits are noisy versions of bits (formally they have capacity N = 1). Thus, in contrast to von Weizsäcker [1], we do not assume from the beginning that our physical systems under consideration are 2-level systems, but we prove this from the postulates.
Proof. We know that $a > 0$ due to Lemma 15. In $M_x(\omega) = c + (a/2)\langle\hat\omega_x, \hat\omega\rangle$, the inner product can attain any value in the interval $[-1, 1]$ by choosing $\hat\omega$ in the unit ball appropriately. But $M_x(\omega)$ is an outcome probability, hence in the interval $[0, 1]$. Working out the corresponding inequalities proves the claimed constraint on $a$.
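For concreteness, the inequalities referred to in this proof can be worked out as follows. This is a sketch of what the argument yields from $M_x(\omega) = c + (a/2)\langle\hat\omega_x, \hat\omega\rangle$ alone; the exact wording of the constraint in the statement being proven may differ.

```latex
\begin{align*}
\min_{\hat\omega} M_x(\omega) &= c - \frac{a}{2} \;\ge\; 0
   &&\Longrightarrow\quad a \le 2c,\\
\max_{\hat\omega} M_x(\omega) &= c + \frac{a}{2} \;\le\; 1
   &&\Longrightarrow\quad a \le 2(1-c),\\
\text{hence}\qquad 0 < a &\le 2\min\{c,\,1-c\} \;\le\; 1 .
\end{align*}
```

In particular, the maximal visibility $a = 1$ is only compatible with the noise parameter $c = 1/2$.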
Now that we know that direction bit state spaces are unit balls, we can say a bit more on the set of possible protocols satisfying Postulates 1 and 2. Surprisingly, dimension d = 2 turns out to be special, as illustrated in Fig. 8 below.
Lemma 21. Consider any protocol satisfying Postulates 1 and 2, under the additional requirement that every state $\omega \neq \mu$ may be used to encode some direction $x(\omega) \in \mathbb{R}^d$, $|x(\omega)| = 1$, such that Bob's decoding $\omega \mapsto x(\omega)$ is a continuous map. If $d \neq 2$, then there is an orthogonal matrix $O \in O(d)$ such that $x(\omega) = O\hat\omega/|\hat\omega|$; that is, up to a fixed rotation (and possibly reflection), physical directions are encoded into Bloch vectors that point into the corresponding direction in state space. In particular, for $d \neq 2$, if $\omega$ and $\varphi$ encode the same physical direction, then there is $\lambda \in [0, 1]$ such that $\omega = \lambda\varphi + (1 - \lambda)\mu$ or vice versa (i.e. with $\omega$ and $\varphi$ exchanged); that is, one of the states is obtained from the other by adding uniform noise.
Proof. Suppose that $d = 1$. Then every state $\omega \neq \mu$ has a one-dimensional "Bloch vector" $\hat\omega \in [-1, 1] \setminus \{0\}$. There are two possible directions, $+1$ and $-1$, which have to be encoded in accordance with Postulate 1. If this is done in a continuous way, the only two possibilities are $x(\omega) = \hat\omega/|\hat\omega|$ and $x(\omega) = -\hat\omega/|\hat\omega|$, proving the claim.

Now consider the cases $d \ge 3$. For every $x \in S^{d-1}$, define the stabilizer subgroup $G_x := \{R \in SO(d) \mid Rx = x\}$. Let $\omega \neq \mu$ be an arbitrary state. From the main text, we know that $R \in G_{x(\omega)}$ implies that $G_R\,\omega = \omega$, hence $\hat G_R\hat\omega = \hat\omega = O^{-1}RO\hat\omega$ with some orthogonal matrix $O$, so $R \in G_{O\hat\omega}$. We get $G_{x(\omega)} \subseteq G_{O\hat\omega}$. Since both groups are isomorphic to $SO(d-1)$, this implies equality. Since $d \ge 3$, this in turn implies that $x(\omega)$ is parallel to $O\hat\omega$; thus, there is a sign $\sigma(\omega) \in \{-1, +1\}$ such that $x(\omega) = \sigma(\omega)\,O\hat\omega/|\hat\omega|$, and the sign cannot depend on $\omega$ due to continuity of $\omega \mapsto x(\omega)$. This proves the claim after possibly redefining $O \to (-O)$.

FIG. 8: In $d = 2$ spatial dimensions, Alice and Bob may use a protocol that is unavailable in other dimensions: they may agree that Bob decodes mixed states with a purity-dependent rotation. That is, if Bob obtains many copies of some state $\omega$ and determines $r := |\hat\omega|$, his output will be $R_r x$, where $x$ is the direction encoded in the pure state with Bloch vector $\hat\omega/r$, and $R_r \in SO(d)$ is a rotation depending on $r$. The figure shows possible level sets of states in the disc state space that encode the same spatial direction. This strategy is impossible in higher dimensions, because without any shared reference frame, Alice and Bob will not be able to agree on a 2-dimensional reference subspace which carries the corresponding rotation. As a result, the level sets must be straight lines for $d \ge 3$, as proven in Lemma 21.

We now prove a technical lemma which is related to the claim in the main text that the angles inferred from state space must agree with those in physical space (discussed in more detail in Appendix C below).
We show that the map x →ω x which maps direction vectors x ∈ R d , |x| = 1, to pure states' Bloch vectorsω x is linear: there is some orthogonal matrix O ∈ O(d) such thatω x = Ox. This follows from the following lemma: Proof. According to Lemma 23, there is an orthogonal matrix O ∈ O(d) withĜ R = ORO −1 for all R ∈ SO(d). The lemma will be proven by distinguishing several cases.
First, consider the case that $d$ is odd. Let $x \in \mathbb{R}^d$, $|x| = 1$, be arbitrary; then there is some $R \in SO(d)$ for which the multiples of $x$ are the only eigenvectors of eigenvalue 1, i.e. $Ry = y$ with $y \in \mathbb{R}^d$ is equivalent to $y = \alpha x$ with $\alpha \in \mathbb{R}$. But then $\hat\omega_x = \hat\omega_{Rx} = \hat G_R\hat\omega_x = ORO^{-1}\hat\omega_x$, i.e. $R\,(O^{-1}\hat\omega_x) = O^{-1}\hat\omega_x$, and so $O^{-1}\hat\omega_x \in \{-x, x\}$. Since this is true for every direction $x$, and the map $x \mapsto \hat\omega_x$ is continuous, we either have $\hat\omega_x = Ox$ for all directions $x$ (in which case the lemma is proven), or $\hat\omega_x = -Ox$ for all directions $x$, in which case we can replace $O$ by $(-O)$ and obtain the statement of the lemma as well.
Next, consider the case d = 2. For every x ∈ R d with |x| = 1, define We (1,0) T . As a map x →ω x , this is manifestly linear. Since it preserves the Euclidean norm, it must be orthogonal.
Finally, consider the cases of even d ≥ 4. Let S ⊂ R d be any 2-dimensional subspace. Suppose that x ∈ S. Clearly, there is R ∈ SO(d) which acts as the identity on S (and nowhere else), i.e. y ∈ S ⇔ Ry = y. But x ∈ S, hence x , and thus O −1ω x ∈ S. Now let S, S be two 2-dimensional subspaces with S ∩ S = span{x}. Since x ∈ S and x ∈ S , we have O −1ω Proof. Since every continuous homomorphism of Lie groups is analytic [66], φ induces an automorphism on the Lie algebra so(d), uniquely determined by its action on the neighborhood of the identity. But not every automorphism of a Lie algebra g necessarily induces an automorphism on the corresponding group G. Ref. [67] contains the automorphisms of the Lie algebras so(d), and in what follows, we figure out which of these correspond to automorphisms of SO(d).
One particular type of automorphism, for both $G$ and $\mathfrak{g}$, is conjugation by group elements, that is, $X \mapsto gXg^{-1}$ where $g \in G$. These are called inner automorphisms. Proposition D.40 from [67] (page 498) tells us that all automorphisms of a Lie algebra $\mathfrak{g}$ are generated by the inner automorphisms times the symmetries of the associated Dynkin diagram. The Dynkin diagram of $so(2n + 1)$ has no symmetries, hence all the corresponding automorphisms are inner. This proves the lemma for odd dimension.
Exercise 22.25 in [67] (page 362 with answer on page 529) states that for n ≥ 5, the symmetry of the Dynkin diagram of so(2n) is implemented by a conjugation X → P XP −1 , with P ∈ O(2n). This proves the lemma for even dimension d ≥ 10. In what follows we consider separately the cases d = 2, 4, 6, 8.
Case $d = 2$. The Lie algebra $so(2)$ is a one-dimensional real vector space with trivial commutator. Hence, the automorphisms are $X \mapsto \alpha X$ for any real $\alpha \neq 0$. It is easy to see that among these, the only ones which induce an automorphism of $SO(2)$ are the identity and $\alpha = -1$. The second one can be implemented as a conjugation $X \mapsto PXP^{-1}$ by a reflection $P \in O(2)$.
Case $d = 4$. On page 274 of Ref. [67], it is shown that $so(4) \cong su(2) \oplus su(2) \cong so(3) \oplus so(3)$. Hence, all the automorphisms of $so(4)$ are those of $so(3)$, which as shown above are inner, together with the exchange of the two summands in $so(3) \oplus so(3)$, which can also be implemented by conjugation.
Case $d = 6$. The standard representation of $SO(6)$ is equivalent to the antisymmetric product of two copies of the standard representation of $SU(4)$ (see page 284 in [67]), which is irreducible. This also implies that this representation of $SU(4)$ is real, and hence equivalent to its dual (or complex conjugated) representation (see page 218 in [67]). Exercise 22.25 in [67] (page 362 with answer on page 529) shows that the symmetries of the Dynkin diagram of $SU(4)$ are implemented by complex conjugation $X \mapsto X^*$. Since the representation of $SU(4)$ equivalent to the standard representation of $SO(6)$ is real, complex conjugation leaves the algebra and the group invariant. So, the only automorphisms of $so(6)$ and $SO(6)$ are inner.
Case $d = 8$. In this case the Dynkin diagram has the larger symmetry called triality. Section 20.3 in [67] shows that this symmetry permutes the defining representation of $so(8)$ and the two fundamental spin representations $S^+$, $S^-$. This cannot be a symmetry of $SO(8)$, since the exponentiation of $S^+$ or $S^-$ gives the group $Spin(8)$, which is different from $SO(8)$. So the only nontrivial automorphisms of $SO(8)$ are inner.
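The case $d = 2$ is simple enough to verify directly. The following minimal sketch checks that conjugating a planar rotation by a reflection inverts the rotation angle, which is the group-level version of the Lie algebra automorphism $\alpha = -1$. The specific reflection $P = \mathrm{diag}(1, -1)$ and the helper names are choices made here for illustration; any reflection in $O(2)$ would serve.

```python
import math

def rotation(theta):
    """2x2 rotation matrix by angle theta, an element of SO(2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Illustrative reflection in O(2) with determinant -1; it is its own inverse.
P = [[1.0, 0.0], [0.0, -1.0]]

def conjugate_by_reflection(R):
    """Return P R P^{-1}, using P^{-1} = P."""
    return matmul(matmul(P, R), P)

def max_abs_diff(A, B):
    """Largest entrywise deviation between two 2x2 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

# Conjugation by P maps R(theta) to R(-theta) on a grid of test angles.
angles = [0.1 * k for k in range(-30, 31)]
inverts_rotations = all(
    max_abs_diff(conjugate_by_reflection(rotation(t)), rotation(-t)) < 1e-12
    for t in angles
)
print(inverts_rotations)
```

Since $P$ has determinant $-1$, this automorphism of $SO(2)$ is a conjugation by an element of $O(2)$ rather than of $SO(2)$ itself, which is why the orthogonal matrix in the lemma is taken from $O(d)$.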
We will now show that it is sufficient to consider "optimal" direction bits, i.e. ones with visibility parameter $a = 1$ and noise parameter $c = 1/2$. The idea is to take the state space $A$ of a direction bit with $c \neq 1/2$ and/or $a < 1$ and to modify it by allowing all effects in $A^*_+$. The bipartite state space of two modified direction bits is then defined as the orbit of $G_{AB}$ on the product states and effects. However, it has to be shown that this results in a valid state space; in particular, all probabilities must be non-negative. This is shown in the following lemma. It uses the definition of $E^{\max}_A$ as given in eq. (A1).
Lemma 24. Suppose that A = (A, A + , U A , E A , G A ) is a direction bit for spatial dimension d with arbitrary visibility and noise parameters a and c, with joint state space for two direction bits AB = (A ⊗ B, (AB) + , U A ⊗ U B , E AB , G AB ). Then is a direction bit for spatial dimension d with visibility parameter a = 1 and noise parameter c = 1/2, with a possible state space of two direction bits given by , and G ∈ G AB . Proof. Throughout the proof, if d = 1, replace SO(d) by O(1). Clearly, A is a valid state space. We know that to every direction x ∈ R d , there is a state ω x ∈ Ω A such that M x (ω) = c + (a/2) ω x ,ω for all ω ∈ Ω A . The linear map M x (ω) : It is easy to check that Let M be the effect from Definition 14, and M the effect on A related to it by the previous equation. Using the notation from Definition 14, where R ∈ SO(d) is a rotation with y = Rx, we have Thus, the prerequisites and Assumptions 1 and 2 from Definition 14 are satisfied for A . In order to show that Assumption 3 holds true, we have to prove that A B is a valid composite of A and B . Clearly, Since Ω AB is compact, so must be Ω A B . Since A + ⊗ B + spans the full space A ⊗ B, so does (A B ) + . This shows that (A B ) + is a proper cone.
According to Definition 11, all that remains to do is to show that (E max Clearly this is a closed convex set spanning A ⊗ B. It remains to show that all its elements are non-negative and no larger than one on Ω A B ; by convexity, we only have to show this for the elements and G ∈ G AB . Finally, convexity for the state cone additionally implies that it is sufficient to prove that The set E max A is easy to characterize: for every effect M ∈ E max A , there are λ, κ ∈ R and x ∈ R d with |x| = 1 such that M(ω) = λ ( ω x ,ω + 1)+κ for all ω ∈ Ω A . A negative sign of λ can be removed by the substitutionω x → −ω x =ω −x , so we may assume λ ≥ 0. Since M(ω) ∈ [0, 1] for all ω ∈ Ω A , we get 0 ≤ κ ≤ 1 − 2λ and λ ≤ 1/2. It follows that We can express N ∈ A * + in an analogous form, replacing λ, κ, x by λ , κ , y. Since conditional states are included in the local state spaces by definition, eq. (A3) and Corollary 20 imply for every ω AB ∈ Ω AB that where ω A cond is the conditional state on A after having obtained M y on B, and ω B is the marginal on B. If ω A ∈ Ω A , ω B ∈ Ω B , and G ∈ G AB , then by definition the vector ω AB := G(ω A ⊗ ω B ) is a valid state in Ω AB . Using that U A ⊗ M y (ω AB ) = M y (ω B ) with ω B the marginal of ω AB , the expression in eq. (A7) can be lower-bounded by An analogous calculation yields the upper bound M ⊗ N (ω AB ) ≤ (2λ + κ)(2λ + κ ) ≤ 1. We have proven that A B is, as given, a valid composite state space. Finally, We obtain an immediate consequence: There is no direction bit for spatial dimension d = 1.
Proof. Suppose that $A$ is a direction bit for spatial dimension $d = 1$, and $A'$ is its optimal modification from Lemma 24, with $A'B'$ the composite state space of two modified direction bits. We know that $\Omega_{A'B'}$ contains at least all combinations of product states; that is, On the other hand, $E_{A'B'}$ contains all product effects. That is, if $\omega_{A'B'} \in \Omega_{A'B'}$, then for all possible choices of signs. It is easy to see that this is only possible if $\omega_{A'B'} \in \Phi_{A'B'}$: the four inequalities give the half-space representation [68] of the tetrahedron $\Phi_{A'B'}$. It follows that $\Omega_{A'B'} \subseteq \Phi_{A'B'}$, and thus equality of these sets: the state space for two modified direction bits is a tetrahedron, that is, a classical four-level system. It has only finitely many (four) pure states; thus, $G_{A'B'} = G_{AB}$ must be a finite group. This contradicts Assumption 3 on direction bits.

All dimensions $d \ge 2$ for the ideal case $a = 1$ and $c = 1/2$ have been examined in [54]: there it is shown that for $d \neq 3$, all possible composites of $d$-dimensional ball state spaces have transformation groups that are non-interacting, therefore contradicting Postulate 4 resp. Assumption 3 in the definition of direction bits. (In Appendix D, we give a simplification of a key lemma in [54] for the special case of this paper.) Lemma 24 extends this result to noisy direction bits with $c \neq 1/2$ and/or $a < 1$:

Theorem 26. There are no direction bits for spatial dimensions $d \neq 3$.
As mentioned in [54], we prove in [55] that the only possible composite of two noiseless 3-dimensional ball state spaces (up to equivalence), under the assumptions of Definition 11, is quantum theory on two qubits. Now we show that this extends to noisy 3-dimensional balls, with the only difference that the set of effects might get reduced:

Theorem 27. Every direction bit for spatial dimension $d = 3$ (regardless of visibility and noise parameters $a$ and $c$) can be represented as a "noisy qubit", where
• $A$ is the real vector space of Hermitian $2 \times 2$ matrices,
• $A_+$ is the set of positive semidefinite complex $2 \times 2$ matrices,
• the unit functional $U_A$ is the map $\rho \mapsto \mathrm{tr}(\rho)$,
• $E_A$ is a subset of the quantum effects, containing all maps of the form $\rho \mapsto \mathrm{tr}(\rho M)$, with $M$ a positive semidefinite $2 \times 2$ matrix satisfying $\mathrm{tr}(M) = 2c$ and the operator inequality $M \le (c + a/2) \cdot \mathbf{1}$,
• $G_A$ is the projective unitary group, $\rho \mapsto U\rho U^\dagger$ with $U \in SU(2)$.

The joint state space of two direction bits is then by necessity as follows:
• $A \otimes B$ is the real vector space of Hermitian $4 \times 4$ matrices,
• $(AB)_+$ is the set of positive semidefinite $4 \times 4$ matrices,
• the unit functional $U_{AB} = U_A \otimes U_B$ is the map $\rho \mapsto \mathrm{tr}(\rho)$,
• $E_{AB}$ is some subset of the quantum effects $\rho \mapsto \mathrm{tr}(\rho M)$ with $M$ a positive semidefinite $4 \times 4$ matrix, $0 \le M \le \mathbf{1}$,
• $G_{AB}$ is the projective unitary group, $\rho \mapsto U\rho U^\dagger$ with $U \in SU(4)$.
Proof. The standard Bloch representation of a qubit shows that the vector space $A$ as well as $A_+$ and $U_A$ can be chosen in the claimed form. In the ball representation, it is clear that the only possibilities for $G_A$ are $SO(3)$ and $O(3)$. The noiseless version $A'$ from Lemma 24 will have the same transformation group. However, it is shown in [55] that $O(3)$ is impossible if we want to construct a global state space $A'B'$ with interaction out of noiseless 3-balls (in a nutshell, the group $O(3)$ would introduce partial transpositions on $A'B'$ which yield negative probabilities). Thus, the group must be $SO(3)$, which (in the chosen representation) is the projective unitary group. It is easy to confirm that the special effects $M_x$, $x \in \mathbb{R}^3$, $|x| = 1$, can be represented in the following way: for every $x$, there is a complex unit vector $|\varphi_x\rangle \in \mathbb{C}^2$ with $M_x(\rho) = \mathrm{tr}(\rho M)$, where $M = (c - a/2)\,\mathbf{1} + a\,|\varphi_x\rangle\langle\varphi_x|$. These matrices have trace $\mathrm{tr}(M) = 2c$. The convex hull of all these matrices is a subset of $E_A$.

As has been shown in [55], the noiseless composite state space $A'B'$ equals quantum theory on two qubits. Since $G_{A'B'} = G_{AB}$ in the construction of Lemma 24, it follows that $G_{AB}$ must be the projective unitary group as claimed (that the vector space is $AB = A \otimes B$ and $U_{AB} = U_A \otimes U_B$ follows directly from the definition of a composite, Definition 11). Since the unitary group generates the set of all quantum states from a pure product state, it follows that the quantum state space of two qubits, $\Omega_Q := \{\rho \in A \otimes B \mid \mathrm{tr}(\rho) = 1, \rho \ge 0\}$, is contained in $\Omega_{AB}$. Suppose there was any $\sigma \in \Omega_{AB} \setminus \Omega_Q$; then this would be a Hermitian matrix with at least one negative eigenvalue. Using an appropriate unitary $U$, this matrix can be diagonalized and brought into the form $U\sigma U^\dagger = \sum_{i,j=0}^{1} \lambda_{i,j}\,|i\rangle\langle i| \otimes |j\rangle\langle j|$ with $\lambda_{0,0} < 0$, denoting by $\{|0\rangle, |1\rangle\}$ a basis of $\mathbb{C}^2$. Using the linear functional $M(\rho) := N(\rho) := \langle 0|\rho|0\rangle$, we get $M \otimes N(U\sigma U^\dagger) = \lambda_{0,0} < 0$. However, this contradicts ineq. (A9), which shows that all noiseless product quantum measurements $M \otimes N$ on all bipartite states $\omega_{AB} \in \Omega_{AB}$ must yield non-negative probabilities. Therefore $\Omega_{AB} = \Omega_Q$, hence $(AB)_+$ is the set of positive semidefinite $(4 \times 4)$-matrices.
We do not really know what E AB is: since all its elements must be non-negative on all quantum states, it must be a subset of the quantum effects. Since there are no further conditions on E AB in Definition 11, it could possibly coincide with the set of quantum effects, or be a proper subset. All we know is that it contains the unitary orbit of all allowed product effects.
This proves the claim.
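The single-system part of Theorem 27 can be illustrated numerically. The sketch below assumes the matrix form $M = (c - a/2)\,\mathbf{1} + a\,|\varphi_x\rangle\langle\varphi_x|$ for the noisy effect in direction $x$, together with the standard Bloch parametrization $\rho = (\mathbf{1} + \vec r \cdot \vec\sigma)/2$; the helper names and the sample values $a = 0.6$, $c = 0.4$ are illustrative assumptions. It checks that $\mathrm{tr}(\rho M) = c + (a/2)\langle x, r\rangle$ and that $\mathrm{tr}(M) = 2c$.

```python
import cmath
import math

def ket_from_direction(x):
    """Qubit state vector whose Bloch vector is the unit vector x in R^3."""
    theta = math.acos(max(-1.0, min(1.0, x[2])))
    phi = math.atan2(x[1], x[0])
    return [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]

def effect_matrix(x, a, c):
    """Assumed noisy effect M = (c - a/2) 1 + a |phi_x><phi_x|."""
    k = ket_from_direction(x)
    return [[(c - a / 2) * (1.0 if i == j else 0.0) + a * k[i] * k[j].conjugate()
             for j in range(2)] for i in range(2)]

def density_matrix(r):
    """Density matrix (1 + r . sigma)/2 for a Bloch vector r with |r| <= 1."""
    rx, ry, rz = r
    return [[(1 + rz) / 2, (rx - 1j * ry) / 2],
            [(rx + 1j * ry) / 2, (1 - rz) / 2]]

def trace_product(A, B):
    """tr(A B) for 2x2 matrices given as nested lists."""
    return sum(A[i][j] * B[j][i] for i in range(2) for j in range(2))

a, c = 0.6, 0.4                       # sample parameters with a <= 2 min(c, 1 - c)
x = [1 / 3, 2 / 3, 2 / 3]             # measurement direction, |x| = 1
r = [0.3, 0.2, 0.5]                   # Bloch vector of a mixed state

M = effect_matrix(x, a, c)
probability = trace_product(density_matrix(r), M).real
expected = c + (a / 2) * sum(xi * ri for xi, ri in zip(x, r))
trace_of_M = (M[0][0] + M[1][1]).real
print(probability, expected, trace_of_M)
```

The two eigenvalues of $M$ are $c - a/2$ and $c + a/2$, which matches the operator inequality $M \le (c + a/2)\cdot\mathbf{1}$ in the theorem and the positivity requirement $a \le 2c$.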
In the following, it will turn out to be useful to introduce some abbreviations. Call any state $\omega$ with the property $M_X(\omega) > M_Y(\omega)$ for all $Y \neq X$ a codeword for $X$. Furthermore, for every state $\omega$, define Note that $\Delta$ is continuous, but in general non-linear. Clearly $\Delta(G_R\,\omega) = \Delta(\omega)$ for all $R \in SO(d)$. Furthermore, we have

Proof. Let $\psi := \sum_i \lambda_i\omega_i$, let $Y$ be some frame with $M_{\max}(\psi) = M_Y(\psi)$, and let $Z$ be some frame with $M_{\min}(\psi) = M_Z(\psi)$. Then Assume now $\lambda_i > 0$ for all $i$. Inspecting the single inequality in the chain above proves the claimed condition for equality.

Assumptions 1' and 2' imply the following:

Lemma 31. Suppose that $\omega$ and $\omega'$ are both codewords for the same $X \in SO(d)$, and $\Delta(\omega) = \Delta(\omega')$. Then $\omega = \omega'$.
Proof. Suppose that ω = ω . Then Assumption 2' implies that ω = j λ j G R −1 j ω for 0 < λ j < 1, j λ j = 1 and rotations R j ∈ SO(d) with R j = R k for j = k (if vice versa, rename ω ↔ ω ). Using Lemma 30 we get j ω) for all j. Since we have equality by assumption, it follows that for all j.
But since $\omega$ by assumption has a unique maximizing direction, we must have $R_j Y = R_k Y$ for all $j, k$ and thus $R_j = R_k$, which is a contradiction.

Now we prove the existence of a unique maximally mixed state, and a bit more: There is a unique state $\mu$ such that $c := M_Y(\mu)$ is constant in $Y \in SO(d)$. Moreover, if $\omega$ and $\omega'$ are both codewords for $X$ with $\Delta(\omega') < \Delta(\omega)$, then $\omega' = \lambda\omega + (1 - \lambda)\mu$ for $\lambda := \Delta(\omega')/\Delta(\omega)$.
The following lemma is the frame bit analogue of Lemma 15: There is a constant 0 < b ≤ 1 which we call intensity parameter such that for all X ∈ SO(d) b = max{∆(ω) | ω is a codeword for X}.
Moreover, for every X ∈ SO(d), there is a unique codeword ω X for X such that ∆(ω X ) = b, and we have ω Y = G Y X −1 ω X for all X, Y ∈ SO(d).
Proof. Fix any $X \in SO(d)$. Lemma 32 implies that all codewords for $X$ lie on the line which starts at the maximally mixed state $\mu$, crosses the state $\omega(X)$, and extends to infinity. Since the state space is compact and convex, there is a unique state $\omega_X$ at which this line crosses the state space's boundary. By construction, it has the maximal value of $\Delta$ among all codewords for $X$. Set $b := \Delta(\omega_X)$. For all $Y \in SO(d)$, define $\omega_Y := G_{YX^{-1}}\,\omega_X$. It is easy to check that $\omega_Y$ is a codeword for $Y$, and $\Delta(\omega_Y) = b$. If there was any other codeword $\omega'_Y \neq \omega_Y$ for $Y$ with $\Delta(\omega'_Y) \ge b$, then $\omega'_X := G_{XY^{-1}}\,\omega'_Y$ would be a codeword for $X$ with $\Delta(\omega'_X) \ge b$ and $\omega'_X \neq \omega_X$, which is impossible.
Exactly the same argumentation as in Lemma 18 (including the introduction of "Bloch vectors" $\hat\omega$ for states $\omega$) now proves the following:

Lemma 35. The frame bit state space is equivalent to a Euclidean $D$-dimensional unit ball, and the map $X \mapsto \hat\omega_X$ is a homeomorphism of $SO(d)$ to the unit sphere $S^{D-1}$.
Since SO(d) is not simply connected for d ≥ 2, but S D−1 is simply connected for D ≥ 3, this is only possible if D = 2 and thus (from dimension counting) d = 2. But in this case, frames and directions coincide.
Theorem 36. "Frame bit" state spaces allowing the protocol in Fig. 6, while at the same time satisfying Assumptions 1' and 2' above, do not exist -unless d = 2, where they coincide with direction bits.
At first sight, this result may seem surprising, in particular Lemma 35 which says that the frame bit state space must be a Euclidean unit ball, exactly as the direction bit state space. The first obvious guess, before doing any calculations, would have been that the pure normalized frame states ω X with X ∈ SO(d) can simply be parametrized by the matrix X as their "Bloch vector", i.e.ω X = X, similarly asω x = x for directions x up to an orthogonal transformation (cf. Lemma 22).
We will now illustrate that this first guess does not work: it results in a state space that satisfies Assumption 1', but not Assumption 2', confirming Theorem 36. We only discuss the simplest non-trivial case d = 3. Surprisingly, in this case, it turns out that our naive guess reproduces 4-level quantum theory over the real numbers.
Example 37. Suppose we define a state space $\Omega$ with the orthogonal matrices $X \in SO(3)$ as the pure states. As usual, we have to add a component for the normalization, such that $\Omega$ becomes $\Omega := \mathrm{conv}\{(1, X) \mid X \in SO(3)\}$. The vector space that carries the cone of unnormalized states is 10-dimensional; by construction, for every $X \in SO(3)$, we have a pure state $\omega_X = (1, X) \in \Omega$. Every (mixed) state $\omega \in \Omega$ is then of the form $\omega = (1, M)$ with $M \in \mathbb{R}^{3 \times 3}$ some matrix which, according to [69, Corollary 5.2], has operator norm $\|M\|_\infty \le 1$. We denote this state by $\omega_M$.
According to [70, Proposition 4.1], the full state space $\Omega$ is an orbitope which can be parametrized in the following way. Denote the normalized state space of 4-level quantum theory over the reals by $\Omega^{4,\mathbb{R}}_{QM}$; that is,

Bob starts by choosing two arbitrary preparation devices at random, preparing two unknown states $\omega_1$, $\omega_2$. Generically, the corresponding Bloch vectors $\hat\omega_1$, $\hat\omega_2$ will be linearly independent (if, for some reason, $\hat\omega_1$ and $\hat\omega_2$ turn out to be (close to) linearly dependent, the protocol will fail and Bob will have to start again). Now Bob determines $M_x(\omega_1) = c + (a/2)\langle\hat\omega_x, \hat\omega_1\rangle$ for many different directions $x$ by repeated measurements. He never knows which direction $x$ he is currently measuring, but by trying out many different directions, he can determine a good estimate of $\max_x M_x(\omega_1) = c + (a/2)|\hat\omega_1|$ and thus of $|\hat\omega_1|$, and he may rotate his device into a direction which is very close to the actually maximizing direction $x$ that satisfies $\hat\omega_x = \hat\omega_1/|\hat\omega_1|$ (still, without knowing any coordinate description of $x$ or $\hat\omega_x$).
If Bob holds a second measurement device which points in another unknown direction $z$, he may do the same thing, and altogether compute the angle $\angle(\hat\omega_y, \hat\omega_z)$ between the two directions' Bloch vectors. While there was some freedom to assign an orthonormal frame to establish coordinates for $\hat\omega_1$ and $\hat\omega_2$, this angle is independent of the specific choice of frame.
As shown in Lemma 22 in Appendix A, there exists some orthogonal transformation $O \in O(d)$ such that $\hat\omega_x = Ox$ for all directions $x$. Thus, if Bob's space carries a metric, such that there is an actual physical angle $\angle(y, z)$ between the two devices' directions, we have $\angle(y, z) = \angle(\hat\omega_y, \hat\omega_z)$, and the angle that Bob determines by probability measurements must agree with the actual physical angle.

Now we give a protocol which allows an observer to determine the angle between different direction bit measurement devices (or different settings of the same device), by means of probability measurements, in arbitrary dimensions $d \ge 2$. The protocol will yield more or less accurate estimates of the corresponding angle, depending on the statistical effort that the observer spends to obtain probability estimates. We assume that the observer knows the outcome $i_0$ as defined in the main text, as well as the visibility and noise parameters $a$ and $c$, and the spatial dimension $d$.
Protocol 38. In $d \geq 2$ spatial dimensions, an observer (called Bob) can estimate the angle $\angle(y, z)$ between two given measurement devices $M_y$ and $M_z$ (acting on systems according to Postulates 1 and 2) by the following protocol:
1. Bob randomly selects d direction bit preparation devices that he finds in his lab (or in nature), preparing (unknown) direction bit states $\omega_1, \ldots, \omega_d$. The protocol assumes that the corresponding Bloch vectors $\vec\omega_1, \ldots, \vec\omega_d$ are linearly independent, which is generically the case. Otherwise, the protocol fails and has to be repeated.
2. For every $i = 1, \ldots, d$, Bob measures $\omega_i$ in many different (unknown) directions $x \in \mathbb{R}^d$, $|x| = 1$. This way, he determines $\max_x M_x(\omega_i) = c + (a/2)|\vec\omega_i|$, and he can rotate his device close to the (unknown) maximizing direction $x_i$ where $\vec\omega_{x_i} = \vec\omega_i/|\vec\omega_i|$, setting the device up to perform the measurement $M_{x_i}$.
3. For all $i, j = 1, \ldots, d$, Bob measures $M_{x_i}(\omega_j) = c + (a/2)\langle \vec\omega_i/|\vec\omega_i|, \vec\omega_j \rangle$. Together with the values $|\vec\omega_i|$ from the previous step, this yields the Gram matrix X with entries $X_{ij} := \langle \vec\omega_i, \vec\omega_j \rangle$.
4. Bob computes any matrix S that solves the equation $S^T S = X$. A solution of this kind exists: any matrix S whose columns are the coordinate representations of $\vec\omega_1, \ldots, \vec\omega_d$ in some orthonormal basis is a solution. Conversely, it follows from the polar decomposition that every solution is of this form. Hence, in this step of the protocol, Bob obtains the coordinates of $\vec\omega_1, \ldots, \vec\omega_d$ in some orthonormal basis.
5. For any pair of measurement devices pointing in directions y and z, Bob can determine the coordinates of $\vec\omega_y$ and $\vec\omega_z$ in the previously obtained orthonormal basis by measuring $M_y(\omega_i) = c + (a/2)\langle \vec\omega_y, \vec\omega_i \rangle$ and $M_z(\omega_i)$ for $i = 1, \ldots, d$, and can therefore compute $\angle(\vec\omega_y, \vec\omega_z)$. But according to Lemma 22, there is some orthogonal matrix O such that $\vec\omega_y = Oy$ and $\vec\omega_z = Oz$, hence this angle equals $\angle(y, z)$.
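The linear-algebra content of the protocol can be sketched in a few lines. The following simulation is an illustration under several assumptions that are not part of the protocol itself: probabilities are exact rather than estimated from finitely many runs, the Gram matrix X is taken to be $X_{ij} = \langle \vec\omega_i, \vec\omega_j \rangle$, and a Cholesky factorization is used as one convenient way to solve $S^T S = X$. The function name `estimate_angle`, the hidden matrix `O`, and all numerical values are hypothetical.

```python
import numpy as np

def estimate_angle(states, O, y, z, a, c):
    """Idealized estimate of the angle between device directions y and z.

    states: (d, d) array whose columns are the hidden Bloch vectors.
    O:      hidden orthogonal matrix with omega_x = O x; it is only used
            to generate the probabilities that Bob would observe.
    """
    # Step 2: the maximal probability c + (a/2)|omega_i| reveals |omega_i|.
    norms = np.linalg.norm(states, axis=0)
    p_max = c + (a / 2.0) * norms
    norms_seen = 2.0 * (p_max - c) / a
    # Step 3 (assumed form): probabilities M_{x_i}(omega_j), where the
    # device direction x_i has Bloch vector omega_i / |omega_i|.
    M = c + (a / 2.0) * (states / norms).T @ states
    X = (2.0 / a) * (M - c) * norms_seen[:, None]   # X_ij = <omega_i, omega_j>
    X = 0.5 * (X + X.T)                             # symmetrize against rounding
    # Step 4: one solution of S^T S = X (Cholesky: X = L L^T, so S = L^T).
    S = np.linalg.cholesky(X).T
    # Step 5: M_y(omega_i) gives <omega_y, omega_i> = (S^T w_y)_i.
    def coords(direction):
        p = c + (a / 2.0) * states.T @ (O @ direction)
        return np.linalg.solve(S.T, 2.0 * (p - c) / a)
    wy, wz = coords(y), coords(z)
    cos = wy @ wz / (np.linalg.norm(wy) * np.linalg.norm(wz))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical lab: three linearly independent Bloch vectors (columns).
states = np.array([[0.6, 0.2, 0.1],
                   [0.0, 0.5, 0.3],
                   [0.0, 0.0, 0.4]])
rng = np.random.default_rng(1)
O, _ = np.linalg.qr(rng.normal(size=(3, 3)))        # hidden orthogonal map
y = np.array([1.0, 0.0, 0.0])
z = np.array([np.cos(0.6), np.sin(0.6), 0.0])       # physical angle 0.6 rad
theta_est = estimate_angle(states, O, y, z, a=0.8, c=0.5)
```

The coordinates obtained in step 4 differ from the hidden ones by an orthogonal transformation, which is why the recovered angle does not depend on the particular solution S.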
As announced in Section C, we now give a modification of the direction bit setup, showing that physical space can in some situations inherit its linear and Euclidean structure from state space. The following example is not meant to describe actual physics in our universe; it is simply a "proof of principle" demonstrating the mechanism under very specific conditions.
Example 39. Imagine an observer Bob in d-dimensional space, which is simply a topological manifold M. Bob's local laboratory is assumed to reside in a (small) part of this manifold, in the vicinity of some point $p \in M$. We assume that there are systems C (say, internal degrees of freedom of particles) described by a convex state space $\Omega_C$ which is also d-dimensional, but not necessarily a Euclidean ball.
We also assume that there is an analogue of a direction bit measurement device which can "point in different directions" and can be "rotated". However, since M does not carry a metric tensor, Bob's local laboratory space does not carry an inner product (there is not even the notion of a tangent space to begin with). Thus, there is no literal notion of direction vectors or rotations, and we have to define what we mean by these notions in a generalized sense.
We do this by assuming that there is a special (small) open neighborhood U of p that is homeomorphic to a d-dimensional Euclidean ball, with a topological boundary $\partial U$ homeomorphic to the (d − 1)-sphere $S^{d-1}$, such that for every $x \in \partial U$ there is an effect $E_x \in \mathcal{E}_C$ which describes the first outcome of the "device pointing in direction x". Formally, we only assume that the measurement device can be in different macroscopic states indexed by $x \in \partial U$. The concrete physical interpretation will be left completely open, with the wording "pointing in direction x" chosen only to supply a more concrete mental picture.
For simplicity, let us assume that we have a 2-outcome device, with outcomes labelled "yes" and "no", such that $E_x(\omega)$ yields the probability of outcome "yes" if the device "points in direction x" and is applied to the state $\omega \in \Omega_C$. For obvious physical reasons, $E_x$ should be continuous in x. A sketch is given in Fig. 9. We make an additional important assumption, namely that the effects determine the space points; that is, if $x \neq y$, then $E_x \neq E_y$.
FIG. 9: The topological manifold M and a neighborhood U of a point p with a boundary that is homeomorphic to a (d − 1)-sphere. The different "directions" $x \in \partial U$ in which a measurement device may be oriented are sketched as black arrows, illustrating the intuition that a measurement device "points" in the corresponding direction. However, the elements $x \in \partial U$ are in general not vectors in any mathematically well-defined sense; they are only meant to label the different possible states of the macroscopic measurement device, leading to different types of measurements.
The analog of a "rotation" is then any physical transformation which takes a measurement device pointing in direction $x \in \partial U$ to point in some other direction $y = H(x) \in \partial U$. Which transformations with corresponding maps H are actually possible depends on the physics in Bob's universe. To comply with some of our intuition on rotations, we only consider those transformations that are continuous and can be physically reversed by some inverse transformation of this kind. We will assume that the relevant physical quantities (measurement devices, particles etc.) are exactly as before if H and then $H^{-1}$ are applied (however, there may be parts of the universe that have changed in this process; for example, a distant observer may have noticed the applications of H and then of $H^{-1}$ and kept some memory of this).
Since these transformations map $\partial U$ continuously onto itself, and $\partial U$ is homeomorphic to the (d − 1)-sphere $S^{d-1}$, we obtain a subgroup $\mathcal{H}$ of homeomorphisms of $\partial U$, which can also be seen as a subgroup of homeomorphisms of the unit sphere $S^{d-1}$. So far, there is no reason why the transformations $H \in \mathcal{H}$ should act linearly; this notion does not even have any meaning at this point. We assume that these transformations allow Bob to collimate his device in any "direction" $x \in \partial U$ that he likes; in other words, $\mathcal{H}$ acts transitively on $\partial U$.
But now suppose that some of the transformations $H \in \mathcal{H}$ have an impact on the measured outcome probabilities: the probability to see the first outcome may change if the measurement device is transformed via H. If it makes sense that Bob undergoes the transformation H together with the measurement device such that the device has not changed from his perspective (that is, a "joint rotation"), he will model the net effect on the probabilities by some other transformation $H'$ that acts on the state ω instead. It must satisfy the equation
$$E_{H(x)}(\omega) = E_x(H'\omega) \qquad \text{(C1)}$$
for all $x \in \partial U$, $\omega \in \Omega_C$.
We will assume that this is possible. Due to the probabilistic interpretation of states, the map $H'$ must be linear. Because of the assumed reversibility of H, it must be a reversible transformation, i.e. $H' \in \mathcal{G}_C$. We do not yet know whether this equation and linearity determine $H'$ uniquely. Let $\omega \in \Omega_C$ be any pure state, and define $\mu := \int_{\mathcal{G}_C} G(\omega)\, dG$ (which may well depend on the choice of ω). Then in particular $H'\mu = \mu$ for all possible $H'$, since $H' \in \mathcal{G}_C$. Thus, if $x, y \in \partial U$ are arbitrary, there is some $H \in \mathcal{H}$ such that $H(x) = y$, and $E_y(\mu) = E_{H(x)}(\mu) = E_x(H'\mu) = E_x(\mu) =: m \geq 0$.
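The invariance of µ under every reversible transformation can be spelled out as a short worked step. The sketch below writes $H'$ for the state-space map associated with H and assumes that dG denotes the normalized Haar measure on the compact group $\mathcal{G}_C$, which the text does not state explicitly; the symbol K is introduced here only for the substitution.

```latex
\begin{align*}
H'\mu &= H' \int_{\mathcal{G}_C} G(\omega)\, dG
       = \int_{\mathcal{G}_C} (H' G)(\omega)\, dG
       && \text{(linearity of } H'\text{)} \\
      &= \int_{\mathcal{G}_C} K(\omega)\, dK
       && \text{(substitute } K := H' G,\ dK = dG\text{)} \\
      &= \mu .
\end{align*}
```

The substitution uses only that $H' \in \mathcal{G}_C$ and that the Haar measure is invariant under left multiplication by group elements.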
Thus, all $E_x$ lie in the d-dimensional affine subspace $A := \{E \mid E(\mu) = m\}$ of the (d + 1)-dimensional dual space $C^*$. Let $A' \subseteq A$ be the affine span of all $E_x$, and $d' := \dim A'$. Let $h : \partial U \to S^{d-1}$ be a homeomorphism and $j : A' \to \mathbb{R}^{d'}$ an invertible affine map. Then the map $s \mapsto j(E_{h^{-1}(s)})$ is a continuous injective map from $S^{d-1}$ to $\mathbb{R}^{d'}$. Due to Lemma 40, we must have $d' \geq d = \dim A$, and so $A' = A$. Suppose that m = 0; in this case, relabel the two outcomes of the device ("yes" ↔ "no"), such that the new $E_x$ satisfy $E_x(\mu) = 1$ for all $x \in \partial U$. Thus, we may assume that m > 0, such that A is not a linear subspace. Consequently, the $E_x$ linearly span $C^*$, and so eq. (C1) determines $H'$ uniquely.
If H does not alter the outcome probabilities, the corresponding map $H'$ will be the identity map; in particular, the assignment $H \mapsto H'$ need not be injective. Again, since the $E_x$ span $C^*$, the map $E_x \mapsto E_{H(x)} = E_x \circ H' =: L_H(E_x)$ extends to a linear invertible map $L_H$ from the dual space $C^*$ to itself.
Let $\mathcal{H}'$ be the topological closure of the group of all $H'$, where $H \in \mathcal{H}$. Since it is a closed subset of the compact group of reversible transformations of C, it must itself be compact. Let $y_1, \ldots, y_{d+1} \in \partial U$ be any set of points such that $E_{y_1}, \ldots, E_{y_{d+1}}$ is a basis of $C^*$. Define the subspace S of $C^*$ by $S := \{f \in C^* \mid f(\mu) = 0\}$.
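Then the functionals $E_{y_i} - m\,U_C$ span S, and we can find d of them which constitute a basis of S. Call these functionals $F_1, \ldots, F_d$ (in some arbitrary order). Now we can define a coordinate map $\Lambda : S \to \mathbb{R}^d$ via
$$\Lambda\Big(\sum_{i=1}^d \alpha_i F_i\Big) := (\alpha_1, \ldots, \alpha_d)^T.$$
This allows us to define coordinates $\lambda(x)$ of space points $x \in \partial U$ via
$$\lambda(x) := \Lambda(E_x - m\,U_C).$$
What is the action of H in these coordinates? Since $L_H(U_C) = U_C$, we have
$$\lambda(H(x)) = \Lambda\big(L_H(E_x) - m\,U_C\big) = \Lambda \circ L_H\,(E_x - m\,U_C) = \big(\Lambda \circ L_H \circ \Lambda^{-1}\big)\,\lambda(x). \qquad \text{(C2)}$$
In other words, we have constructed a fictitious d-dimensional linear space such that all $x \in \partial U$ can be represented as elements of this linear space, and all transformations $H \in \mathcal{H}$ act linearly (represented by $\Lambda \circ L_H \circ \Lambda^{-1}$). This vector space structure is inherited from the convexity of probabilities.
Define the group $\mathcal{L}$ as the topological closure of $\{\Lambda \circ L_H \circ \Lambda^{-1} \mid H \in \mathcal{H}\}$. Due to $L_H(E) = E \circ H'$, compactness of $\mathcal{H}'$ implies compactness of $\mathcal{L}$. Thus, there is an inner product on $\mathbb{R}^d$ such that $\langle v, w \rangle = \langle Lv, Lw \rangle$ for all $v, w \in \mathbb{R}^d$ and $L \in \mathcal{L}$. With respect to this inner product, $\mathcal{L}$ is a subgroup of SO(d). Hence eq. (C2) implies that $\|\lambda(H(x))\| = \|\lambda(x)\|$ for the corresponding norm, and since $\mathcal{H}$ is transitive on $\partial U$, we obtain that there exists r > 0 such that $\|\lambda(x)\| = r$ for all $x \in \partial U$.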
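The existence of an invariant inner product for a compact matrix group is usually shown by averaging an arbitrary inner product over the group. The following sketch illustrates this for a finite group, where the Haar integral reduces to a plain average; the group of the text is in general continuous, so this is only a stand-in. The function name `invariant_gram` and the example matrices are hypothetical.

```python
import numpy as np

def invariant_gram(group):
    """Average L^T L over a finite matrix group.

    Returns a symmetric positive definite matrix P such that the inner
    product <v, w> := v^T P w satisfies <Lv, Lw> = <v, w> for all L.
    """
    return sum(L.T @ L for L in group) / len(group)

# Hypothetical example: rotations by multiples of 90 degrees, written in
# a skewed (non-orthonormal) basis so that they do not look orthogonal.
R = np.array([[0.0, -1.0], [1.0, 0.0]])
B = np.array([[2.0, 1.0], [0.0, 1.0]])    # invertible change of basis
B_inv = np.linalg.inv(B)
group = [B @ np.linalg.matrix_power(R, k) @ B_inv for k in range(4)]

P = invariant_gram(group)
# Invariance check: L^T P L should reproduce P for every group element.
invariant = all(np.allclose(L.T @ P @ L, P) for L in group)
```

Invariance holds because right multiplication by a fixed group element only permutes the terms of the average.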
Let $h : \partial U \to S^{d-1}$ be a homeomorphism; then $(1/r) \cdot \lambda \circ h^{-1}$ is a continuous injective map from the sphere $S^{d-1}$ into itself. According to Lemma 41, it must be surjective; in other words, the set $\{\lambda(x) \mid x \in \partial U\}$ is the full sphere of radius r in $\mathbb{R}^d$. Since $\mathcal{L}$ is transitive on this sphere, we see that $\mathcal{L}$ acts irreducibly on $\mathbb{R}^d$; hence the inner product $\langle \cdot, \cdot \rangle$ is in fact unique.
In other words, we have obtained a unique Euclidean structure on our vector space representation of ∂U , inherited from the group of reversible transformations on state space.
Thus, we obtain H (ω) =H −1ω . In summary, for two states $\varphi, \omega \in \Omega_C$, we have
In other words, $\mathcal{H}'$ acts on the subspace containing the $\vec\omega$ as the group $\mathcal{L}$; since $\mathcal{L}$ is transitive on the unit sphere, this implies that the set of "Bloch vectors" $\{\vec\omega \mid \omega \in \Omega_C\}$ is a Euclidean ball (of some radius), and equivalent to a d-dimensional Euclidean unit ball, exactly as direction bits are. Thus, Bob may use Protocol 38 to determine angles between different orientations of his measurement device. In retrospect, we also see that the maximally mixed state µ is unique (it is the center of the ball); hence the linear structure that we have constructed is unique as well.
The following lemmas have been used in Example 39.
Proof. Suppose there were a non-surjective continuous injective map f from $S^{d-1}$ to itself. Let $s \in S^{d-1}$ be any point which is not attained by f; then f can be interpreted as a continuous injective map from $S^{d-1}$ to the punctured sphere $S^{d-1} \setminus \{s\}$, which is well known [63] to be homeomorphic to $\mathbb{R}^{d-1}$. Let $g : S^{d-1} \setminus \{s\} \to \mathbb{R}^{d-1}$ be a corresponding homeomorphism; then $g \circ f$ is a continuous injective map from $S^{d-1}$ to $\mathbb{R}^{d-1}$, contradicting Lemma 40.
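One standard choice for the homeomorphism g between the punctured sphere and $\mathbb{R}^{d-1}$ is stereographic projection. The sketch below assumes, without loss of generality, that the removed point s is the "north pole" $(0, \ldots, 0, 1)$; the function names are hypothetical, and the proof itself does not depend on this particular choice of g.

```python
import numpy as np

def stereographic(p):
    """Map a unit vector p in R^d (not the north pole) to R^(d-1)."""
    return p[:-1] / (1.0 - p[-1])

def inverse_stereographic(u):
    """Map u in R^(d-1) back to the unit sphere minus the north pole."""
    s = u @ u
    return np.append(2.0 * u, s - 1.0) / (s + 1.0)

p = np.array([0.6, 0.0, -0.8])            # a point on the 2-sphere
u = stereographic(p)                      # its image in the plane
p_back = inverse_stereographic(u)         # round trip back to the sphere
```

Both maps are continuous and mutually inverse, which is what makes the punctured sphere homeomorphic to Euclidean space of one dimension less.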
Appendix D: Simplified proof of the result of Section 4.3 in Ref. [54] for SO(d)
In [54], it has been shown that two noiseless d-dimensional Euclidean ball state spaces can be combined into an interacting, joint state space if and only if d = 3. In that paper, we considered the general case of ball state spaces with any compact group of reversible transformations which is transitive on the unit sphere. Here, however, we are only interested in the special case that the group of reversible transformations on a direction bit contains the full special orthogonal group SO(d), as established in Lemma 18. It turns out that this simplifies the proof of a key lemma in [54] significantly.
Here, we give the simplified proof, as a reference for readers who would like to follow the argument in [54]. Therefore, we do not introduce the relevant notation here in the appendix, but refer the reader to the introductory chapters of [54], and just use the notation that has been introduced there.
Proof. Since W is antisymmetric, so are X and Y, which are thus generators of rotations. By assumption, we can perform the rotations exp(tX) ⊗ exp(tŶ) on the joint system, which are generated by