A derivation of quantum theory from physical requirements

Quantum theory is usually formulated in terms of abstract mathematical postulates, involving Hilbert spaces, state vectors, and unitary operators. In this work, we show that the full formalism of quantum theory can instead be derived from five simple physical requirements, based on elementary assumptions about preparation, transformations and measurements. This is more similar to the usual formulation of special relativity, where two simple physical requirements -- the principles of relativity and light speed invariance -- are used to derive the mathematical structure of Minkowski space-time. Our derivation provides insights into the physical origin of the structure of quantum state spaces (including a group-theoretic explanation of the Bloch ball and its three-dimensionality), and it suggests several natural possibilities to construct consistent modifications of quantum theory.


I. INTRODUCTION
Quantum theory is usually formulated by postulating the mathematical structure and representation of states, transformations, and measurements. The general physical consequences that follow (like violation of Bell-type inequalities [1], the possibility of performing state tomography with local measurements, or factorization of integers in polynomial time [2]) come as theorems which use the postulates as premises. In this work, this procedure is reversed: we impose five simple physical requirements, and this suffices to single out quantum theory and derive its mathematical formalism uniquely. This is more similar to the usual formulation of special relativity, where two simple physical requirements (the principles of relativity and light speed invariance) are used to derive the mathematical structure of Minkowski space-time and its transformations.
The requirements can be schematically stated as:
1. In systems that carry one bit of information, each state is characterized by a finite set of outcome probabilities.
2. The state of a composite system is characterized by the statistics of measurements on the individual components.
3. All systems that effectively carry the same amount of information have equivalent state spaces.
4. Any pure state of a system can be reversibly transformed into any other.
5. In systems that carry one bit of information, all mathematically well-defined measurements are allowed by the theory.
These requirements are imposed on the framework of generalized probabilistic theories [3][4][5][6][7][8][9], which already assumes that some operational notions (preparation, mixture, measurement, and counting relative frequencies of measurement outcomes) make sense. Due to its conceptual simplicity, this framework leaves room for an infinitude of possible theories, allowing for weaker- or stronger-than-quantum non-locality [6][10][11][12][13][14]. In this work, we show that quantum theory (QT) and classical probability theory (CPT) are very special among those theories: they are the only general probabilistic theories that satisfy the five requirements stated above. The non-uniqueness of the solution is not a problem: since CPT is embedded in QT, QT is the most general theory satisfying the requirements. One can also proceed as Hardy in [4]: if Requirement 4 is strengthened by imposing continuity of the reversible transformations, then CPT is ruled out and QT is the only theory satisfying the requirements. This strengthening can be justified by the continuity of time evolution of physical systems.
It is conceivable that in the future, another theory may replace or generalize QT. Such a theory must violate at least one of our assumptions. The clear meaning of our requirements allows one to explore potential features of such a theory in a straightforward way. The relaxation of each of our requirements constitutes a different way to go beyond QT.
The search for alternative axiomatizations of quantum theory (QT) is an old topic that goes back to Birkhoff and von Neumann [8], and has been approached in many different ways: extending propositional logic [7,8], using operational primitives [3][4][5][6][9], searching for information-theoretic principles [5,6,10,11][19][20][21], building upon the phenomenon of quantum nonlocality [6][10][11][12][13]. Alfsen and Shultz [22] have accomplished a complete characterization of the state spaces of QT from a geometric point of view, but the result does not seem to have an immediate physical meaning. In particular, the fact that the state space of a generalized bit is a three-dimensional ball is an assumption there, while here it is derived from physical requirements.
This work is particularly close to [4,19], from which it takes some material. More concretely, the multiplicativity of capacities and the Simplicity Axiom from [4] are replaced by Requirement 5. In comparison with [19], the fact that each state of a generalized bit is the mixture of two distinguishable ones, the maximality of the group of reversible transformations and its orthogonality, and the multiplicativity of capacities, are also replaced by Requirement 5.
Summary of the paper. Section II contains an introduction to the framework of generalized probabilistic theories, where some elementary results are stated without proof. In Section III the five requirements and their significance are explained in full detail. Section IV is the core of this work. It contains the characterization of all theories compatible with the requirements, concluding that the only possibilities are CPT and QT. The Conclusion (Section V) recapitulates the results and adds some remarks. The Appendix contains all lemmas and their proofs.

II. GENERALIZED PROBABILISTIC THEORIES
In CPT there can always be a joint probability distribution for all random variables under consideration. The framework of generalized probabilistic theories (GPTs), also called the convex operational framework, generalizes this by allowing the possibility of random variables that cannot have a joint probability distribution, or cannot be simultaneously measured (like noncommuting observables in QT).
This framework assumes that at some level there is a classical reality, where it makes sense to talk about experimentalists performing basic operations such as preparations, mixtures, measurements, and counting relative frequencies of outcomes. These are the primary concepts of this framework. It also provides a unified way for all GPTs to represent states, transformations and measurements. A particular GPT specifies which of these are allowed, but it does not specify their correspondence with actual experimental setups. On its own, a GPT can still make nontrivial predictions, such as the maximal violation of a Bell inequality [1], the complexity-theoretic computational power [2,18], and in general, all information-theoretic properties of the theory [6].
The framework of GPTs can be stated in different ways, but all lead to the same formalism [3][4][5][6][7][8][9]. This formalism is presented in this section at a very basic level, providing some elementary results without proofs.
A. States
FIG. 1: [...] there are the preparation, transformation and measurement devices. As soon as the release button is pressed, the preparation device outputs a physical system in the state specified by the knobs. The next device performs the transformation specified by its knobs (which in particular can be "do nothing"). The device on the right performs the measurement specified by its knobs, and the outcome ($x$ or $\bar{x}$) is indicated by the corresponding light.
[...] preparation, transformation and measurement devices, the relative frequencies of the outcomes tend to a unique probability distribution (in the large-sample limit).
The probability of a measurement outcome $x$ is denoted by $p(x)$. This outcome can be associated with a binary measurement which tells whether $x$ happens or not (the complementary event $\bar{x}$ has probability $p(\bar{x}) = 1 - p(x)$). The above definition of system allows one to associate with each preparation procedure a list of probabilities for the outcomes of all measurements that can be performed on a system. As we show in Subsection IV C below, our requirements imply that all these probabilities $p(x)$ are determined by a finite set of them; the smallest such set is used to represent the state
$$\psi = \begin{bmatrix} 1 \\ p(x_1) \\ \vdots \\ p(x_d) \end{bmatrix} = \begin{bmatrix} \psi^0 \\ \psi^1 \\ \vdots \\ \psi^d \end{bmatrix} \in \mathbb{R}^{d+1}. \tag{1}$$
The measurement outcomes that characterize the state, $x_1, \ldots, x_d$, are called fiducial, and in general there is more than one set of them (for example, a spin-1/2 particle in QT is characterized by the spin in any 3 linearly independent directions). Note that each of the fiducial outcomes can correspond to a different measurement. The redundant component $\psi^0 = 1$ is reminiscent of QT, where one of the diagonal entries of a density matrix is redundant, since they sum up to 1. In fact, $\psi^0 \neq 1$ is sometimes used to represent unnormalized states, but not in this paper, where only normalized states are considered. The redundant component $\psi^0$ allows the use of the tensor-product formalism in composite systems (Subsection II D), which simplifies the notation.
The set of all allowed states $S$ is convex [23], because if $\psi_1, \psi_2 \in S$ then one can prepare $\psi_1$ with probability $q$ and $\psi_2$ with probability $1-q$, effectively preparing the state $q\psi_1 + (1-q)\psi_2$. The number of fiducial probabilities $d$ is equal to the (affine) dimension of $S$; otherwise one fiducial probability would be functionally related to the others, and hence redundant.
Suppose there is an $\mathbb{R}^{d+1}$-vector $\psi \notin S$ which is in the topological closure of $S$, that is, $\psi$ can be approximated by states $\psi' \in S$ to arbitrary accuracy. Since there is no observable physical difference between perfect preparation and arbitrarily good preparation, we will consider $\psi$ to be a valid state and add it to the state space. This does not change the physical predictions of the theory, but it has the mathematical consequence that state spaces become topologically closed. Since state vectors (1) are bounded, and we are in finite dimensions (shown in Subsection IV C), state spaces $S$ are compact convex sets [23].
The pure states of a state space $S$ are the ones that cannot be written as mixtures $\psi = q\psi_1 + (1-q)\psi_2$ with $\psi_1 \neq \psi_2$ and $0 < q < 1$. Since $S$ is compact and convex, all states are mixtures of pure states [23].
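The convexity argument can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical system with $d = 2$ fiducial outcomes; the function name `mix` and the example vectors are illustrative and not taken from the paper. It shows that mixing acts component by component on state vectors of the form (1) and leaves the redundant component equal to 1.

```python
from fractions import Fraction


def mix(q, psi1, psi2):
    """Return the mixture q*psi1 + (1-q)*psi2, computed componentwise."""
    if not 0 <= q <= 1:
        raise ValueError("q must be a probability")
    return [q * a + (1 - q) * b for a, b in zip(psi1, psi2)]


# Hypothetical states [psi^0, p(x_1), p(x_2)]; the first entry is always 1.
psi1 = [Fraction(1), Fraction(1), Fraction(0)]
psi2 = [Fraction(1), Fraction(0), Fraction(1)]

# Prepare psi1 with probability 1/4 and psi2 with probability 3/4.
mixed = mix(Fraction(1, 4), psi1, psi2)
```

Exact fractions are used only to keep the arithmetic free of rounding; floating-point entries would work the same way.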

B. Measurements
The probability of measurement outcome $x$ when the system is in state $\psi \in S$ is given by a function $\Omega_x(\psi)$. Suppose the system is prepared in the mixture $q\psi_1 + (1-q)\psi_2$; then the relative frequency of outcome $x$ does not depend on whether the label of the actual preparation $\psi_k$ is ignored before or after the measurement, hence
$$\Omega_x\big(q\psi_1 + (1-q)\psi_2\big) = q\,\Omega_x(\psi_1) + (1-q)\,\Omega_x(\psi_2).$$
This means that the function $\Omega_x$ is affine on $S$. The redundant component $\psi^0$ in (1) allows one to write this function as a linear map $\Omega_x : \mathbb{R}^{d+1} \to \mathbb{R}$ [3,6].
An effect is a linear map $\Omega : \mathbb{R}^{d+1} \to \mathbb{R}$ such that $\Omega(\psi) \in [0, 1]$ for all states $\psi \in S$. Every function $\Omega_x$ associated to an outcome probability $p(x)$ is an effect. The converse is not necessarily true: the framework of GPTs allows one to construct theories where some effects do not represent possible measurement outcomes. These restrictions are analogous to superselection rules, where some (mathematically well-defined) states are not allowed by the physical theory. This is related to Requirement 5. A tight effect $\Omega$ is one for which there are two states $\psi_0, \psi_1 \in S$ satisfying $\Omega(\psi_0) = 0$ and $\Omega(\psi_1) = 1$.
An $n$-outcome measurement is specified by $n$ effects $\Omega_1, \ldots, \Omega_n$ such that $\Omega_1(\psi) + \cdots + \Omega_n(\psi) = 1$ for all $\psi \in S$. The number $\Omega_a(\psi)$ is the probability of outcome $a$ when the measurement is performed on the state $\psi$. The states $\psi_1, \ldots, \psi_n$ are distinguishable if there is an $n$-outcome measurement such that $\Omega_a(\psi_b) = \delta_{a,b}$, where
$$\delta_{a,b} = \begin{cases} 1 & \text{if } a = b, \\ 0 & \text{if } a \neq b. \end{cases}$$
The capacity of a state space $S$ is the size of the largest family of distinguishable states, and is denoted by $c$. This is the amount of classical information that can be transmitted by the corresponding type of system in a single-shot error-free procedure. (In QT the capacity of a system is the dimension of its corresponding Hilbert space, which must not be confused with the dimension $d = c^2 - 1$ of the state space, that is, of the set of $c \times c$ complex matrices that are positive and have unit trace.) A complete measurement on $S$ is one capable of distinguishing $c$ states.
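As an illustration of effects, measurements and distinguishability, here is a minimal sketch that assumes a classical system with capacity $c = 3$, written with a redundant first component and two fiducial probabilities $p(1)$, $p(2)$. The helper names (`apply_effect`, `is_normalized`, `distinguishes`) are hypothetical. Note how the third effect needs the redundant component in order to be written as a linear map.

```python
def apply_effect(omega, psi):
    """Evaluate the linear map omega on the state vector psi."""
    return sum(w * s for w, s in zip(omega, psi))


# States [psi^0, p(1), p(2)]; outcome 3 has probability 1 - p(1) - p(2).
states = [[1, 1, 0], [1, 0, 1], [1, 0, 0]]

# Effects as row vectors; the last one is psi^0 - p(1) - p(2).
effects = [[0, 1, 0], [0, 0, 1], [1, -1, -1]]


def is_normalized(effects, psi):
    """Check that the outcome probabilities sum to one on psi."""
    return sum(apply_effect(omega, psi) for omega in effects) == 1


def distinguishes(effects, states):
    """Check Omega_a(psi_b) == delta_{a,b} for all pairs (a, b)."""
    return all(
        apply_effect(effects[a], states[b]) == (1 if a == b else 0)
        for a in range(len(effects))
        for b in range(len(states))
    )
```

Since this measurement distinguishes three states of a system whose capacity is assumed to be three, it is a complete measurement in the sense defined above.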

C. Transformations
Each type of system has associated to it a state space, a set of measurements, and a set of transformations. A transformation $T$ is a map $T : S \to S$. As for measurements, if a state is prepared as a mixture $q\psi_1 + (1-q)\psi_2$, it does not matter whether the label of the actual preparation $\psi_k$ is ignored before or after the transformation. Hence
$$T\big(q\psi_1 + (1-q)\psi_2\big) = q\,T(\psi_1) + (1-q)\,T(\psi_2),$$
which implies that $T$ is an affine map. The redundant component $\psi^0$ in (1) allows one to extend $T$ to a linear map $T : \mathbb{R}^{d+1} \to \mathbb{R}^{d+1}$ [3,6].
A transformation T is reversible if its inverse T −1 exists and belongs to the set of transformations allowed by the theory. The set of (allowed) reversible transformations of a particular state space S forms a group G. For the same reason as for the state space itself, we will assume that the group of reversible transformations is topologically closed. Previously we have seen that a state space S is bounded, hence the corresponding group of transformations G is bounded, too. In summary, groups of transformations are compact [24].
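The role of the redundant component in turning affine maps into linear ones can be made concrete. The sketch below assumes an affine map acting on the fiducial probabilities as $p \mapsto Ap + b$ and builds the corresponding $(d+1) \times (d+1)$ matrix; the function names and the bit-flip example are illustrative assumptions, not constructions from the paper.

```python
def extend_affine(A, b):
    """Build the linear extension of p -> A p + b acting on [1, p_1, ..., p_d]."""
    d = len(b)
    top_row = [1] + [0] * d          # keeps the redundant component equal to 1
    return [top_row] + [[b[i]] + list(A[i]) for i in range(d)]


def apply_map(M, psi):
    """Multiply the matrix M by the column vector psi."""
    return [sum(m * s for m, s in zip(row, psi)) for row in M]


# Hypothetical example: the flip p -> 1 - p on a classical bit (d = 1).
flip = extend_affine([[-1]], [1])
flipped = apply_map(flip, [1, 0.25])
```

Applying `flip` twice returns the original state, so in this toy example the transformation is its own inverse and hence reversible.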

D. Composite systems
Definition of composite system. Two systems A, B constitute a composite system, denoted AB, if a measurement for A together with a measurement for B uniquely specifies a measurement for AB. This means that if x and y are measurement outcomes on A and B respectively, the pair (x, y) specifies a unique measurement outcome on AB, whose probability distribution p(x, y) does not depend on the temporal order in which the subsystems are measured.
The fact that subsystems are themselves systems implies that each has a well-defined reduced state $\psi_A$, $\psi_B$ which does not depend on which transformations and measurements are performed on the other subsystem (see definition of system in Subsection II A). This is often referred to as no-signaling. Let $x_1, \ldots, x_{d_A}$ be the fiducial measurements of system A, and $y_1, \ldots, y_{d_B}$ the ones of B. The no-signaling constraints are
$$p(x_i) = p(x_i, y_j) + p(x_i, \bar{y}_j), \qquad p(y_j) = p(x_i, y_j) + p(\bar{x}_i, y_j) \tag{2}$$
for all $i, j$.
An assumption which is often postulated additionally in the GPT context is Requirement 2, which says that the state of a composite system is completely characterized by the statistics of measurements on the subsystems, that is, $p(x, y)$. This and no-signaling (2) imply that states in AB can be represented on the tensor product vector space [3] as
$$\psi_{AB} = \big[\,1,\ p(x_1), \ldots, p(x_{d_A}),\ p(y_1), \ldots, p(y_{d_B}),\ p(x_1, y_1),\ p(x_1, y_2), \ldots, p(x_{d_A}, y_{d_B})\,\big]^T \in \mathbb{R}^{d_A+1} \otimes \mathbb{R}^{d_B+1}. \tag{3}$$
The joint probability of two arbitrary local measurement outcomes $x, y$ is given by
$$p(x, y) = (\Omega_x \otimes \Omega_y)(\psi_{AB}), \tag{4}$$
where $\Omega_x$ is the effect representing $x$ in A, that is $p(x) = \Omega_x(\psi_A)$, and analogously for $\Omega_y$ [3]. (The term "local" is used when referring to subsystems, and has nothing to do with spatial locations.) In other words, if $(\Omega^A_1, \ldots, \Omega^A_n)$ and $(\Omega^B_1, \ldots, \Omega^B_m)$ are measurements on A and B, then $\{\Omega^A_a \otimes \Omega^B_b : a = 1, \ldots, n;\ b = 1, \ldots, m\}$ defines a measurement on AB with $nm$ outcomes. Local transformations act on the global state as
$$\psi_{AB} \to (T_A \otimes T_B)(\psi_{AB}),$$
where $T_A$ is the matrix that represents the transformation in A, and analogously for $T_B$ [3]. The reduced states are obtained from $\psi_{AB}$ by picking the right components (3). Alternatively, reduced states can be defined by
$$\Omega_x(\psi_A) = (\Omega_x \otimes \mathbf{1})(\psi_{AB}) \quad \text{for all effects } \Omega_x \text{ on A},$$
where $\mathbf{1}$ is the unit effect, $\mathbf{1}(\psi) = 1$ for all states $\psi$. The reduced state $\psi_A$ must belong to the state space of subsystem A, denoted $S_A$, and any state in $S_A$ must be the reduction of a state from $S_{AB}$. (Analogously for subsystem B.) This implies that all product states are contained in $S_{AB}$ [3], and similarly, all tensor products of local measurements and transformations are allowed on AB.
Given two fixed state spaces $S_A$ and $S_B$, the previous discussion imposes constraints on the state space of the composite system $S_{AB}$. However, there are still many different possible joint state spaces $S_{AB}$, and some of them allow for larger violations of Bell inequalities than QT. In fact, this has been extensively studied [5,6][10][11][12][13][14], and is one of the reasons for the popularity of generalized probabilistic theories.
Nothing prevents Bob's system from being composite itself; hence one can recursively extend the definition of composite system and formulas (3), (4), (5), and (7) to more parties.
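The tensor-product bookkeeping can be sketched for the simplest case of a product state of two bits. The example assumes that joint probabilities are computed as $(\Omega_x \otimes \Omega_y)(\psi_{AB})$ and that reduced states are obtained with the unit effect, as suggested by the discussion above; the Kronecker ordering of components used here is a convenience of the sketch and may differ from the ordering in the paper's equation (3). All names are hypothetical.

```python
def kron(u, v):
    """Kronecker product of two vectors (states or effects)."""
    return [a * b for a in u for b in v]


def apply_effect(omega, psi):
    """Evaluate the linear map omega on the state vector psi."""
    return sum(w * s for w, s in zip(omega, psi))


# Two bits in representation [1, p(x)] with d_A = d_B = 1.
psi_A = [1, 0.25]
psi_B = [1, 0.5]
omega_x = [0, 1]   # effect returning p(x) on A
omega_y = [0, 1]   # effect returning p(y) on B
unit = [1, 0]      # unit effect: returns 1 on every normalized state

psi_AB = kron(psi_A, psi_B)                              # product state
p_xy = apply_effect(kron(omega_x, omega_y), psi_AB)      # joint probability
p_x = apply_effect(kron(omega_x, unit), psi_AB)          # marginal on A
```

The product state has $(d_A + 1)(d_B + 1) = 4$ components, and the marginal computed with the unit effect reproduces $p(x)$ of the reduced state, as no-signaling demands.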

E. Equivalent state spaces
Let $L : S \to S'$ be an invertible affine map. If all states are transformed as $\psi \to L(\psi)$, and all effects on $S$ are transformed as $\Omega \to \Omega \circ L^{-1}$, then the outcome probabilities $\Omega(\psi)$ are kept unchanged. Analogously, if all transformations on $S$ are mapped as $T \to L \circ T \circ L^{-1}$ then their action on the states is the same. The new state space $S'$, together with the transformed effects and transformations, is then just a different representation of $S$. In this case, we call $S$ and $S'$ equivalent. In the new representation, the entries of $\psi$ need not be probabilities as in (1), but it may have other advantages. In this work, several representations are used.
In the standard formalism of QT, states are represented by density matrices; however, they can also be represented as in (1).
Changing the set of fiducial measurements is a particular type of $L$-transformation. For example, if the components of the Bloch vector (of a quantum spin-1/2 particle) correspond to spin measurements in non-orthogonal directions, then the Bloch sphere becomes an ellipsoid.
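That a change of representation leaves all outcome probabilities unchanged can be checked on a small example. The sketch below assumes a bit in the representation $[1, p]$ and a hypothetical map `L_map` sending $[1, p]$ to $[1, 2p - 1]$; helper names are illustrative. The transformed state has a negative entry, which shows that entries in the new representation need not be probabilities.

```python
from fractions import Fraction as F


def matvec(M, v):
    """Matrix times column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def vecmat(w, M):
    """Row vector times matrix, i.e. the composition of the effect w with M."""
    return [sum(w[i] * M[i][j] for i in range(len(w))) for j in range(len(M[0]))]


def inverse_2x2(M):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


L_map = [[F(1), F(0)], [F(-1), F(2)]]        # [1, p] -> [1, 2p - 1]
psi = [F(1), F(1, 4)]                        # state with p = 1/4
omega = [F(0), F(1)]                         # effect returning p

psi_new = matvec(L_map, psi)                 # state in the new representation
omega_new = vecmat(omega, inverse_2x2(L_map))  # effect composed with L^{-1}

p_old = sum(w * s for w, s in zip(omega, psi))
p_new = sum(w * s for w, s in zip(omega_new, psi_new))
```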

F. Instances of generalized probabilistic theories
QT is an instance of GPT, and can be specified as follows. The state space $S_c$ with capacity $c$ is equivalent to the set of complex $c \times c$ matrices $\rho$ such that $\rho \geq 0$ and $\mathrm{tr}\,\rho = 1$. This set has dimension $d_c = c^2 - 1$, and its pure states are rank-one. The effects on $S_c$ have the form $\Omega(\rho) = \mathrm{tr}(M\rho)$, where $M$ is a complex $c \times c$ matrix such that $0 \leq M \leq I$. The reversible transformations act as $\rho \to V \rho V^\dagger$ with $V \in \mathrm{SU}(c)$. The capacity of a composite system AB is the product of the capacities of the subsystems: $c_{AB} = c_A c_B$.
CPT is another instance of GPT, and can be specified as follows. The state space $S_c$ with capacity $c$ is equivalent to the set of $c$-outcome probability distributions $[p(1), \ldots, p(c)]$, which has dimension $d_c = c - 1$ (in geometric terms, each $S_c$ is a simplex). The pure states are the deterministic distributions $p(a) = \delta_{a,b}$ with $b = 1, \ldots, c$. The $c$-outcome measurement with effects $\Omega_a(\psi) = p(a)$ for $a = 1, \ldots, c$ distinguishes the $c$ pure states, hence it is complete. Any other measurement is a function of this one. The reversible transformations act by permuting the entries of the state $[p(1), \ldots, p(c)]$. The capacity of a composite system is also $c_{AB} = c_A c_B$. Note that CPT can be obtained by restricting the states of QT to diagonal matrices. In other words, CPT is embedded in QT.
An instance of GPT that is not observed in nature is generalized no-signaling theory [6], colloquially called boxworld. By definition, state spaces contain all correlations (3) satisfying the no-signaling constraints (2). Such state spaces have finitely many pure states, and some of them violate Bell inequalities more strongly than any quantum state [12]. The effects in boxworld are all generated by products of local effects. The group of reversible transformations consists only of relabellings of local measurements and their outcomes, permutations of subsystems, and combinations thereof [14].

III. THE REQUIREMENTS
This section contains the precise statement of the requirements, each followed by explanations about its significance.

Requirement 1 (Finiteness).
A state space with capacity c = 2 has finite dimension d.
If this did not hold, the characterization of a state of a generalized bit would require infinitely many outcome probabilities, making state estimation impossible. It is shown below that this requirement, together with the others, implies that all state spaces with finite capacity c have finite dimension.
Requirement 2 (Local tomography). The state of a composite system AB is completely characterized by the statistics of measurements on the subsystems A, B.
In other words, state tomography [3] can be performed locally. This is equivalent to the constraint
$$d_{AB} + 1 = (d_A + 1)(d_B + 1) \tag{8}$$
[3,4]. This requirement can be recursively extended to more parties by letting subsystems A, B be themselves composite.

Requirement 3 (Equivalence of subspaces).
Let $S_c$ and $S_{c-1}$ be systems with capacities $c$ and $c - 1$, respectively. If $\Omega_1, \ldots, \Omega_c$ is a complete measurement on $S_c$, then the set of states
$$\{\psi \in S_c : \Omega_c(\psi) = 0\} \tag{9}$$
is equivalent to $S_{c-1}$.
The notions of complete measurements and equivalent state spaces are defined in Subsections II B and II E. In particular, equivalence of $S_{c-1}$ and (9) implies that all measurements and reversible transformations on one of them can be implemented on the other. This requirement, first introduced in [4], implies that all state spaces with the same capacity are equivalent: if $S_{c-1}$ and $\tilde{S}_{c-1}$ are state spaces with capacity $c - 1$, then both are equivalent to (9), hence they are equivalent to each other. In other words, the only property that characterizes the type of system is the capacity for carrying information. If we start with $S_c$ and apply Requirement 3 recursively, we get a more general formulation: consider any subset of outcomes $\{a_1, \ldots, a_{c'}\} \subseteq \{1, \ldots, c\}$ of the complete measurement $\Omega_1, \ldots, \Omega_c$; then the set of states $\psi \in S_c$ with
$$\Omega_{a_1}(\psi) + \cdots + \Omega_{a_{c'}}(\psi) = 1 \tag{10}$$
is equivalent to the state space $S_{c'}$ with capacity $c'$. This provides an onion-like structure for all state spaces:
$$S_1 \subset S_2 \subset S_3 \subset \cdots$$
The particular structure of QT simplifies the task of assigning a state space to a physical system or experimental setup. It is not necessary to consider all possible states of the system, but only the ones relevant for the context being analyzed. For example, an atom is sometimes modeled with a state space having two distinguishable states ($c = 2$), even though its constituents have many more degrees of freedom. In particular, if we know that only two energy levels are populated with nonzero probability, we can ignore all others and effectively get a genuine quantum 2-level state space. In a theory where this is not true, the effective state space might depend on how many unpopulated energy levels are ignored, or on the detailed internal state of the electron, for example. In order to avoid pathologies like this, we postulate Requirement 3.
Requirement 4. Any pure state of a system can be reversibly transformed into any other.
The set of reversible transformations of a state space $S_c$ forms a group, denoted $G_c$. This group endows $S_c$ with a symmetry, which makes all pure states equivalent. A group $G_c$ is said to be continuous if it is topologically connected: any transformation is the composition of many infinitesimal ones [24]. Hardy invokes the continuity of time evolution in physical systems to justify the continuity of reversible transformations [3,4]; in this case, state spaces $S_c$ must have infinitely many pure states, which rules out CPT and singles out QT. However, all the analysis in this work is done without imposing continuity, since we find it very interesting that the only theory with state spaces having finitely many pure states, and satisfying the requirements, is CPT.
Requirement 5 (All measurements allowed). All effects on S 2 are outcome probabilities of possible measurements.
It is shown below that, in combination with the other requirements, this implies that all effects on all state spaces (with arbitrary c) appear as outcome probabilities of measurements in the resulting theory. Note that Requirement 5 has non-trivial consequences in conjunction with the other requirements: adding effects as allowed measurements to a physical theory extends the applicability of Requirement 3.
For completeness, we would like to mention that Requirement 5 can be replaced by the following postulate, which was first put forward in an interesting paper that appeared after completion of this work [34]. It calls a state "completely mixed" if it is in the relative interior of state space. See Lemma 9 in the appendix for how the proof of our main result has to be modified in this case.
Requirement 5' [34]. If a state is not completely mixed, then there exists at least one state that can be perfectly distinguished from it.

IV. CHARACTERIZATION OF ALL THEORIES SATISFYING THE REQUIREMENTS
A. The maximally-mixed state
We use the following notation: the system with capacity $c$ has state space $S_c$ with dimension $d_c$ and group of reversible transformations $G_c$. The group $G_c$ is compact (Section II C), and hence has a normalized invariant Haar measure [26]. This allows us to define the maximally-mixed state
$$\mu_c = \int_{G_c} G(\psi)\, dG, \tag{11}$$
where $\psi \in S_c$ is an arbitrary pure state. It follows from Requirement 4 that the resulting state $\mu_c$ does not depend on the choice of the pure state $\psi$. By construction, the maximally-mixed state is invariant:
$$G(\mu_c) = \mu_c \quad \text{for all } G \in G_c. \tag{12}$$
Moreover, Lemma 1 shows that it is the only invariant state in $S_c$ (this lemma and all others are stated and proven in the appendix).
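For a finite group, the normalized Haar measure assigns equal weight to every group element, so the group average of a pure state becomes a plain arithmetic mean. The following minimal sketch assumes the CPT case of Subsection II F, where states are probability vectors $[p(1), \ldots, p(c)]$ and the reversible transformations are all permutations of the entries; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def maximally_mixed(pure_state):
    """Average a pure state over all permutations of its entries."""
    c = len(pure_state)
    group = list(permutations(range(c)))
    total = [Fraction(0)] * c
    for perm in group:
        # The permuted state has entry pure_state[perm[i]] at position i.
        for i, j in enumerate(perm):
            total[i] += pure_state[j]
    return [t / len(group) for t in total]


mu_from_first = maximally_mixed([1, 0, 0])
mu_from_last = maximally_mixed([0, 0, 1])
```

Both calls return the uniform distribution, illustrating that the average does not depend on which pure state is chosen and that the result is invariant under every permutation.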

B. The generalized bit
A generalized bit is a system with capacity two. For any state $\psi \in S_2$ in the standard representation (1), its Bloch representation is defined by
$$\hat{\psi}^i = 2\,(\psi^i - \mu_2^i), \qquad i = 1, \ldots, d_2. \tag{13}$$
States in the Bloch representation do not have the redundant component $\psi^0$, so equations (4, 5, 7) become less simple. The invertible map $L : S_2 \to \hat{S}_2$ is affine but not linear; hence, effects in the Bloch representation ($\hat{\Omega} = \Omega \circ L^{-1}$) are affine but not necessarily linear.
The same applies to transformations ($\hat{G} = L \circ G \circ L^{-1}$); however, the maximally-mixed state in the Bloch representation is the null vector $\hat{\mu}_2 = 0$, therefore (12) becomes $\hat{G}(0) = 0$, which implies that $\hat{G}$ acts linearly (as a matrix).
FIG. 2: The left figure is a state space whose boundary consists of facets (like $\Omega = 1$). Each facet contains infinitely many states ($\Omega = 1$ contains $\psi_1$, $\psi_2$ and all $\psi_{\rm mix} = q\psi_1 + (1-q)\psi_2$). The right figure is a state space whose boundary has no facets. Any state space has supporting hyperplanes containing a unique state (like $\Omega_{\rm one} = 1$ in both figures).
Theorem 1. A state in $\hat{S}_2$ is pure if and only if it belongs to the boundary $\partial\hat{S}_2$.
Proof. In any convex set, pure states belong to the boundary [23]. Let us see the converse.
It is shown in [27] that any compact convex set has a supporting hyperplane containing exactly one point of the set. Translated to our language: there is a tight effect $\hat{\Omega}_{\rm one}$ on $\hat{S}_2$ such that only one state $\hat{\varphi}_{\rm one} \in \hat{S}_2$ satisfies $\hat{\Omega}_{\rm one}(\hat{\varphi}_{\rm one}) = 1$; this is illustrated in FIG. 2. According to Requirement 5, the effect $\hat{\Omega}_{\rm one}$ corresponds to a valid measurement outcome, and so does $\hat{\mathbf{1}} - \hat{\Omega}_{\rm one}$, where $\hat{\mathbf{1}}(\hat{\psi}) = 1$ for all $\hat{\psi} \in \hat{S}_2$. Thus, the two effects $\hat{\Omega}_{\rm one}$ and $\hat{\mathbf{1}} - \hat{\Omega}_{\rm one}$ define a complete measurement on $\hat{S}_2$. Imposing Requirement 3 on the single outcome $\hat{\Omega}_{\rm one}$ constrains the state space with unit capacity $\hat{S}_1$ to contain only one state.
Suppose there is a point in the boundary $\hat{\varphi}_{\rm mix} \in \partial\hat{S}_2$ which is not pure: $\hat{\varphi}_{\rm mix} = q\hat{\varphi}_1 + (1-q)\hat{\varphi}_2$ with $\hat{\varphi}_1 \neq \hat{\varphi}_2$ and $0 < q < 1$. Every point in the boundary of a compact convex set has a supporting hyperplane which contains it [23]. In our language: there is a tight effect $\hat{\Omega}$ on $\hat{S}_2$ such that $\hat{\Omega}(\hat{\varphi}_{\rm mix}) = 1$. The affine function $\hat{\Omega}$ is bounded, $\hat{\Omega}(\hat{\varphi}) \leq 1$ for any $\hat{\varphi} \in \hat{S}_2$, which implies $\hat{\Omega}(\hat{\varphi}_1) = \hat{\Omega}(\hat{\varphi}_2) = 1$; this is illustrated in FIG. 2. Like $\hat{\Omega}_{\rm one}$, the effect $\hat{\Omega}$ defines a complete measurement, and Requirement 3 can be imposed on the single outcome $\hat{\Omega}$, implying that $\hat{S}_1$ contains more than one state. This is in contradiction with the previous paragraph; hence, all points in the boundary are pure.
For the case $d_2 = 1$, the state space $S_2$ is a segment (a 1-dimensional ball), hence the previous and next theorems are trivial. For $d_2 > 1$, the previous theorem implies that $S_2$ contains infinitely many pure states. The next theorem recovers the (quantum-like) Bloch sphere with a yet unknown dimension $d_2$.
Theorem 2. There is a set of fiducial measurements for which $\hat{S}_2$ is a $d_2$-dimensional unit ball.
Proof. Lemma 2 shows that there is an invertible real matrix $S$ such that for each $\hat{G} \in \hat{G}_2$ the matrix $S\hat{G}S^{-1}$ is orthogonal. Let us redefine the set $\hat{S}_2$ by transforming the states as $\hat{\varphi} \to \hat{\varphi}' = qS\hat{\varphi}$, where the number $q > 0$ is chosen such that all pure states are unit vectors, $|\hat{\varphi}'|^2 = \hat{\varphi}'^T\hat{\varphi}' = 1$. This is possible because in the transformed state space, all pure states are related by orthogonal matrices ($S\hat{G}S^{-1}$), which preserve the norm. Since Theorem 1 also applies to the redefined set $\hat{S}'_2$, it must be a unit ball. In what follows we define a new set of fiducial measurements $x'_i$ such that the Bloch representation (13) associated to the new fiducial probabilities $p(x'_i)$ coincides with the redefinition $\hat{\varphi}'$. Requirement 5 tells us that in $\hat{S}'_2$ all tight effects are allowed measurements. For each unit vector $\hat{\nu} \in \mathbb{R}^{d_2}$ the function $\hat{\Omega}_{\hat{\nu}}(\hat{\varphi}') = (1 + \hat{\nu}^T\hat{\varphi}')/2$ is a tight effect on the unit ball, and conversely, all tight effects on the unit ball are of this form. The new set of fiducial measurements corresponds to the effects $\hat{\Omega}_{\hat{\nu}_i}$, where $\hat{\nu}_1, \ldots, \hat{\nu}_{d_2}$ is a fixed orthonormal basis for $\mathbb{R}^{d_2}$. For any state $\hat{\varphi}'$ the new fiducial probabilities are $p(x'_i) = \hat{\Omega}_{\hat{\nu}_i}(\hat{\varphi}') = (1 + \hat{\varphi}'^i)/2$, where $\hat{\varphi}'^i = \hat{\nu}_i^T\hat{\varphi}'$ is the $i$-th component of $\hat{\varphi}'$ in this basis. This is just (13) with the new fiducial measurements (note that $\hat{\mu}'_2 = 0$ and $\mu'^i_2 = \hat{\Omega}_{x'_i}(0) = 1/2$).
In the rest of the paper, we will use the representation derived in Theorem 2 above, where the generalized bit is represented by a unit ball. Moreover, we will drop the prime in $\hat{S}'_2$, $x'_i$, $\hat{\varphi}'$ used in the proof, and simply write $\hat{S}_2$, $x_i$, $\hat{\varphi}$.
As argued above, for each pure state $\varphi \in S_2$ there is a binary measurement with associated effect
$$\hat{\Omega}_{\varphi}(\hat{\psi}) = \frac{1 + \hat{\varphi}^T\hat{\psi}}{2},$$
such that $\hat{\Omega}_{\varphi}(\hat{\varphi}) = 1$ and $\hat{\Omega}_{\varphi}(-\hat{\varphi}) = 0$. In summary, there is a correspondence between tight effects and pure states in $S_2$, and each pure state belongs to a distinguishable pair $\{\hat{\varphi}, -\hat{\varphi}\}$.
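The correspondence between pure states and tight effects on the unit ball can be checked directly from the formula $(1 + \hat{\nu}^T\hat{\varphi})/2$ quoted in the proof of Theorem 2. The sketch below uses a three-dimensional ball purely as an example, since at this stage of the argument the dimension $d_2$ is still undetermined; the function name and the chosen vector are illustrative assumptions.

```python
from fractions import Fraction as F


def ball_effect(nu, phi):
    """Tight effect on the unit ball: (1 + nu . phi) / 2."""
    return (1 + sum(n * p for n, p in zip(nu, phi))) / 2


# A pure state (unit vector) and its antipode; 9/25 + 16/25 = 1.
phi = [F(3, 5), F(0), F(4, 5)]
minus_phi = [-x for x in phi]
center = [F(0), F(0), F(0)]   # maximally-mixed state in the Bloch representation

p_same = ball_effect(phi, phi)
p_opposite = ball_effect(phi, minus_phi)
p_center = ball_effect(phi, center)
```

The effect associated with `phi` gives probability 1 on `phi`, 0 on its antipode, and 1/2 on the center of the ball, so `phi` and `minus_phi` form a perfectly distinguishable pair.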

C. Capacity and dimension
Requirements 1, 2 and 3 imply that a state space with finite capacity $c$ has finite dimension $d_c$, which generalizes Requirement 1. To see this, consider a system composed of $m$ generalized bits, with state space denoted by $S_2^{\times m}$. Since $d_2$ is finite, equation (8) implies that $S_2^{\times m}$ has finite dimension. Due to the fact that perfectly distinguishable states are linearly independent, its capacity, denoted $c_m$, must be finite, too. Since systems with the same capacity are equivalent, we must have $c_m \neq c_n$ for $m \neq n$, and the sequence of integers $c_1, c_2, \ldots$ is unbounded. For any capacity $c$ there is a value of $m$ such that $c \leq c_m$, hence by Requirement 3 we have $S_c \subset S_2^{\times m}$, which implies that $S_c$ is finite-dimensional.
In QT, the maximally-mixed state (11) has two convenient properties. First property: if $\mu_A$ and $\mu_B$ are the maximally-mixed states of systems A and B, then the maximally-mixed state of the composite system AB is
$$\mu_{AB} = \mu_A \otimes \mu_B. \tag{16}$$
Second property: in the state space $S_c$, there are $c$ pure distinguishable states $\psi_1, \ldots, \psi_c \in S_c$ such that
$$\mu_c = \frac{1}{c}\sum_{a=1}^{c} \psi_a. \tag{17}$$
Lemmas 3 and 5 show that these two properties hold for every theory satisfying our requirements. The following theorem exploits these properties to show that the capacity is multiplicative (one of the axioms in [4]).
Theorem 3. If $c_A$ and $c_B$ are the capacities of systems A and B, then the capacity of the composite system AB is
$$c_{AB} = c_A c_B. \tag{18}$$
Proof. Equation (17) allows us to write the maximally-mixed states of systems A and B as
$$\mu_A = \frac{1}{c_A}\sum_{a=1}^{c_A} \varphi^A_a, \qquad \mu_B = \frac{1}{c_B}\sum_{b=1}^{c_B} \varphi^B_b,$$
where $\varphi^A_1, \ldots, \varphi^A_{c_A} \in S_A$ are pure and distinguishable, and $\varphi^B_1, \ldots, \varphi^B_{c_B} \in S_B$ are pure and distinguishable, too. This and equation (16) imply
$$\mu_{AB} = \frac{1}{c_A c_B}\sum_{a=1}^{c_A}\sum_{b=1}^{c_B} \varphi^A_a \otimes \varphi^B_b. \tag{19}$$
All states $\varphi^A_a \otimes \varphi^B_b \in S_{AB}$ are distinguishable with the tensor-product measurement, therefore
$$c_{AB} \geq c_A c_B. \tag{20}$$
Let $(\Omega_1, \ldots, \Omega_{c_{AB}})$ be a complete measurement on AB which distinguishes the states $\psi_1, \ldots, \psi_{c_{AB}} \in S_{AB}$; that is, $\Omega_k(\psi_{k'}) = \delta_{k,k'}$. According to Lemma 4 these states can be chosen to be pure. Since $\sum_{k=1}^{c_{AB}} \Omega_k(\mu_{AB}) = 1$, there is at least one value of $k$, denoted $k_0$, such that
$$\Omega_{k_0}(\mu_{AB}) \leq \frac{1}{c_{AB}}. \tag{21}$$
The product of pure states $\varphi^A_1 \otimes \varphi^B_1$ is pure [3], hence Requirement 4 tells us that there is a reversible transformation $G \in G_{AB}$ such that $G(\psi_{k_0}) = \varphi^A_1 \otimes \varphi^B_1$. The measurement $(\Omega_1 \circ G^{-1}, \ldots, \Omega_{c_{AB}} \circ G^{-1})$ distinguishes the states $G(\psi_1), \ldots, G(\psi_{c_{AB}})$. Inequality (21), the invariance of $\mu_{AB}$, expansion (19), the positivity of probabilities, and $(\Omega_{k_0} \circ G^{-1})(\varphi^A_1 \otimes \varphi^B_1) = \Omega_{k_0}(\psi_{k_0}) = 1$ imply
$$\frac{1}{c_{AB}} \geq \Omega_{k_0}(\mu_{AB}) = (\Omega_{k_0} \circ G^{-1})(\mu_{AB}) = \frac{1}{c_A c_B}\sum_{a,b} (\Omega_{k_0} \circ G^{-1})(\varphi^A_a \otimes \varphi^B_b) \geq \frac{1}{c_A c_B}.$$
This and (20) imply (18).
It is shown in [4] that the two multiplicativity formulas (8) and (18) imply the existence of a positive integer $r$ such that, for any $c$, the state space $S_c$ has dimension
$$d_c = c^r - 1. \tag{22}$$
The integer $r$ is a constant of the theory, with values $r = 1$ for CPT and $r = 2$ for QT.
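The compatibility between the dimension formula and the two multiplicativity rules is easy to verify numerically. The sketch below assumes the relation has the form $d_c = c^r - 1$, which matches the values quoted in the text ($d_c = c - 1$ for CPT and $d_c = c^2 - 1$ for QT), together with $c_{AB} = c_A c_B$ and $d_{AB} + 1 = (d_A + 1)(d_B + 1)$. It checks consistency only; it does not reproduce the derivation in [4], and the function names are hypothetical.

```python
def dimension(c, r):
    """State-space dimension for capacity c, assuming d_c = c**r - 1."""
    return c ** r - 1


def local_tomography_holds(c_a, c_b, r):
    """Check d_AB + 1 == (d_A + 1)(d_B + 1) when capacities multiply."""
    d_a = dimension(c_a, r)
    d_b = dimension(c_b, r)
    d_ab = dimension(c_a * c_b, r)
    return d_ab + 1 == (d_a + 1) * (d_b + 1)


# Scan a few capacities and exponents.
all_consistent = all(
    local_tomography_holds(c_a, c_b, r)
    for r in range(1, 5)
    for c_a in range(2, 6)
    for c_b in range(2, 6)
)
```

For the generalized bit, `dimension(2, r)` equals $2^r - 1$, which is always odd; with $r = 2$ it gives the three-dimensional Bloch ball.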

D. Recovering classical probability theory
Let us consider all theories with $d_2 = 1$. In this case, equation (22) becomes $d_c = c - 1$. In [4], it is shown that the only GPT with this relation between capacity and dimension is CPT, as described in Subsection II F. We reproduce the proof for completeness.
Proof. Let $S_c$ be a state space and $(\Omega_1, \ldots, \Omega_c)$ a complete measurement which distinguishes the states $\psi_1, \ldots, \psi_c \in S_c$. The vectors $\psi_1, \ldots, \psi_c \in \mathbb{R}^c$ are linearly independent; otherwise $\psi_a = \sum_{b \neq a} t_b \psi_b$ and $1 = \Omega_a(\psi_a) = \sum_{b \neq a} t_b \Omega_a(\psi_b) = 0$ gives a contradiction. Therefore, any state $\psi \in S_c \subseteq \mathbb{R}^c$ can be written in this basis as $\psi = \sum_a q_a \psi_a$, where $q_a = \Omega_a(\psi)$ turns out to be the probability of outcome $a$. The numbers $(q_1, \ldots, q_c)$ constitute a probability distribution, hence there is a one-to-one correspondence between states in $S_c$ and $c$-outcome probability distributions. This kind of set is called a $d_c$-simplex. A similar argument shows that the effects $\Omega_1, \ldots, \Omega_c$ are linearly independent. Hence, any effect $\Omega$ on $S_c$ can be written as $\Omega = \sum_a h_a \Omega_a$, and the constraint $0 \leq \Omega(\psi_a) \leq 1$ implies $0 \leq h_a \leq 1$. In other words, every measurement on $S_c$ is generated by the complete one.
Every reversible transformation on $S_c$ is a symmetry of the $d_c$-simplex, that is, a permutation of pure states. Due to Requirement 4, there is a reversible transformation on the bit $S_2$ which exchanges the two pure states. Using Requirement 3 inductively: if there is a transformation on $S_{c-1}$ which exchanges two pure states and leaves the rest invariant, this transposition can be implemented on $S_c$, also leaving all other pure states invariant. Therefore, all transpositions can be implemented in $S_c$, and those generate the full group of permutations.
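The last step, that transpositions generate the full permutation group, can be verified by brute force for small $c$. The sketch below is only an illustration of this group-theoretic fact; the helper names are hypothetical, and pure states are labelled $0, \ldots, c-1$.

```python
from math import factorial


def transposition(c, i, j):
    """Permutation of c labels that exchanges i and j and fixes the rest."""
    p = list(range(c))
    p[i], p[j] = p[j], p[i]
    return tuple(p)


def compose(p, q):
    """Return the permutation 'p after q'."""
    return tuple(p[q[k]] for k in range(len(q)))


def generated_group(generators):
    """Closure of the generators under composition (finite group, so
    repeated left multiplication starting from the identity suffices)."""
    identity = tuple(range(len(generators[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def all_transpositions(c):
    return [transposition(c, i, j) for i in range(c) for j in range(i + 1, c)]
```

For example, `len(generated_group(all_transpositions(4)))` equals `factorial(4)`, the order of the full permutation group on four pure states.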

E. Reversible transformations for the generalized bit
In the rest of the paper, only theories with $d_2 > 1$ are considered. Theorem 2 shows that $\hat{S}_2$ is a $d_2$-dimensional unit ball. Equation (22) for $c = 2$ implies that $d_2$ is odd. The pure states in $\hat{S}_2$ are the unit vectors in $\mathbb{R}^{d_2}$. A reversible transformation $\hat{G} \in \hat{\mathcal{G}}_2$ maps pure states onto pure states, hence it preserves the norm and has to be an orthogonal matrix, $\hat{G}^T = \hat{G}^{-1}$. Therefore $\hat{\mathcal{G}}_2$ is a subgroup of the orthogonal group $O(d_2)$. Requirement 4 imposes that for any pair of unit vectors $\hat\varphi, \hat\varphi'$ there is $\hat{G} \in \hat{\mathcal{G}}_2$ such that $\hat{G}(\hat\varphi) = \hat\varphi'$. In other words, $\hat{\mathcal{G}}_2$ is transitive on the sphere [28,29]. According to Lemma 6, if $\hat{\mathcal{G}}_2$ is transitive on the sphere, then the largest connected subgroup $\hat{\mathcal{C}}_2 \subseteq \hat{\mathcal{G}}_2$ is also transitive on the sphere. The matrix group $\hat{\mathcal{C}}_2$ is compact and connected, hence a Lie group (Theorem 7.31 in [24]). The classification of all connected compact Lie groups that are transitive on the sphere is done in [28,29]. For odd $d_2$, the only possibility is $\hat{\mathcal{C}}_2 = SO(d_2)$, except for $d_2 = 7$, where there are additional possibilities:
$$\hat{\mathcal{C}}_2 = M G_2 M^{-1} \tag{23}$$
for any $M \in O(7)$, where $G_2$ is the fundamental representation of the smallest exceptional Lie group [30]. For even $d_2$, there are many more possibilities [28,29], but equation (22) implies that $d_2$ must be odd.
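Transitivity of $SO(d)$ on the unit sphere can be made concrete with an explicit rotation. The following sketch is an illustration only and plays no role in the classification cited above: it builds, for two non-antipodal unit vectors $u, v \in \mathbb{R}^d$, the matrix $R = I - (u+v)(u+v)^T/(1 + u \cdot v) + 2 v u^T$, which rotates in the plane spanned by $u$ and $v$ and satisfies $R u = v$. All function names are hypothetical.

```python
import math


def normalize(x):
    n = math.sqrt(sum(t * t for t in x))
    return [t / n for t in x]


def rotation_between(u, v):
    """Return R with R u = v, acting as a rotation in the plane span(u, v).

    Uses R = I - (u+v)(u+v)^T / (1 + u.v) + 2 v u^T, valid for unit vectors
    with u.v != -1 (for antipodal vectors the rotation plane is not unique).
    """
    c = sum(a * b for a, b in zip(u, v))
    if abs(1.0 + c) < 1e-12:
        raise ValueError("antipodal vectors: rotation plane is not unique")
    w = [a + b for a, b in zip(u, v)]
    d = len(u)
    return [[(1.0 if i == j else 0.0) - w[i] * w[j] / (1.0 + c) + 2.0 * v[i] * u[j]
             for j in range(d)] for i in range(d)]


def apply(m, x):
    """Matrix-vector product."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]


def gram(m):
    """Return m^T m, which is the identity for an orthogonal matrix."""
    d = len(m)
    return [[sum(m[k][i] * m[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]
```

Applying `rotation_between(u, v)` to `u` returns `v` up to rounding, and `gram` of the result is numerically the identity, so any pure state of the ball can be mapped onto any other by an orthogonal transformation.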

F. Two generalized bits
The joint state space of two $S_2$ systems is denoted by $S_{2,2}$. The multiplicativity of the capacity (18) implies that $S_{2,2}$ is equivalent to $S_4$. However, we write $S_{2,2}$ to emphasize the bipartite structure.
In what follows, instead of using the standard representation for bipartite systems (3), we generalize the Bloch representation to two generalized bits. A state $\psi_{AB} \in S_{2,2}$ has Bloch representation
$$\hat\psi_{AB} = [\alpha, \beta, C], \tag{24}$$
with components $\alpha_i$, $\beta_j$ and $C_{ij}$ for $i, j = 1, \ldots, d_2$. Note that $\alpha = \hat\psi_A$ and $\beta = \hat\psi_B$ are the reduced states in the Bloch representation (13). The correlation matrix $C$ characterizes the correlations between subsystems. Product states have Bloch representation with rank-one correlation matrix. In QT, where $d_2 = 3$, two-qubit density matrices are often represented by $[\alpha, \beta, C]$ through formula (41). Definition (24) implies
$$\widehat{\psi_A \otimes \psi_B} = [\hat\psi_A, \hat\psi_B, \hat\psi_A \hat\psi_B^T] . \tag{25}$$
The invertible map $L[\psi_{AB}] = \hat\psi_{AB}$ defined by (24) also determines the Bloch representation of effects, $\hat\Omega = \Omega \circ L^{-1}$, and in particular that of the tensor product of two effects of the form (15). The map $L$ also determines the action of reversible transformations in the Bloch representation. Since $L$ is affine but not linear, the action $\hat{G} = L \circ G \circ L^{-1}$ need not be linear. Identities (16,25) and $\hat\mu_2 = 0$ imply that the maximally-mixed state in $\hat{S}_{2,2}$ is $\hat\mu_{2,2} = [\hat\mu_2, \hat\mu_2, \hat\mu_2 \hat\mu_2^T] = 0$. This and (12) imply that transformations in $\hat{\mathcal{G}}_{2,2}$ act on the generic vector $[\alpha, \beta, C]$ as matrices. In particular, local transformations $G_A, G_B \in \mathcal{G}_2$ act as
$$\widehat{G_A \otimes G_B}\,[\alpha, \beta, C] = [\hat{G}_A \alpha, \hat{G}_B \beta, \hat{G}_A C \hat{G}_B^T] . \tag{28}$$
Subsection IV E concludes that $\hat{\mathcal{G}}_2$ consists of orthogonal matrices, and Lemma 8 shows that all transformations in $\hat{\mathcal{G}}_{2,2}$ are orthogonal, too. Orthogonal matrices preserve the norm of vectors, therefore all pure states $\psi \in S_{2,2}$ satisfy
$$|\alpha|^2 + |\beta|^2 + \mathrm{tr}(C^T C) = 3 . \tag{29}$$
The constant on the right-hand side can be obtained by letting $\hat\psi = [\alpha, \alpha, \alpha\alpha^T]$ with $|\alpha| = 1$.
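The normalization constant for pure states can be checked on product states. The sketch below assumes, as in the text, that a product state has Bloch representation $[\alpha, \beta, \alpha\beta^T]$, and computes the squared Euclidean norm of the vector $[\alpha, \beta, C]$; the function names are illustrative.

```python
import math


def product_bloch(alpha, beta):
    """Bloch representation [alpha, beta, C] of a product state, with the
    rank-one correlation matrix C = alpha beta^T."""
    correlation = [[a * b for b in beta] for a in alpha]
    return alpha, beta, correlation


def squared_norm(alpha, beta, correlation):
    """|alpha|^2 + |beta|^2 + tr(C^T C) for the vector [alpha, beta, C]."""
    return (sum(a * a for a in alpha)
            + sum(b * b for b in beta)
            + sum(x * x for row in correlation for x in row))


def unit(x):
    n = math.sqrt(sum(t * t for t in x))
    return [t / n for t in x]
```

For unit vectors $\alpha$ and $\beta$ of any dimension the result is $1 + 1 + |\alpha|^2 |\beta|^2 = 3$, whereas the maximally-mixed state $[0, 0, 0]$ has norm zero.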

G. Consistency in the subspaces of two generalized bits
In this subsection we use a trick introduced in [19]: to impose the equivalence between a particular subspace of S 2,2 and S 2 (Requirement 3).
Theorem 5. The state space of a generalized bit has dimension three (d 2 = 3).
Proof. Recall that the case under consideration is odd $d_2$ larger than one. The space $S'_2 \subset S_{2,2}$ is equivalent to $S_2$, which is a $d_2$-dimensional unit ball. If $\varphi_{0,0}$ and $\varphi_{1,1}$ are considered the poles of this ball, then the equator is the set of states $\psi_{eq}$ such that $\Omega_{0,0}(\psi_{eq}) = \Omega_{1,1}(\psi_{eq}) = 1/2$. Equations (30,31) tell us that equator states have $\alpha_1 = \beta_1 = 0$, and are then parametrized as in (32) by $\bar\alpha, \bar\beta, \bar\gamma, \bar\tau \in \mathbb{R}^{d_2-1}$ and $\bar{C} \in \mathbb{R}^{(d_2-1)\times(d_2-1)}$. Consider the action of $G_A \otimes I$ for $G_A \in \mathcal{G}_2$ on an equator state $\psi_{eq}$. Since $\hat{\mathcal{G}}_2$ is transitive on the unit sphere, if $\bar\gamma \neq 0$ then there is some $\hat{G}_A \in \hat{\mathcal{G}}_2$ such that the transformed correlation matrix is in contradiction with (26). Therefore $\bar\gamma = 0$, and by a similar argument $\bar\tau = 0$. The stabilizer of $\hat\nu_1$ is the largest subgroup $\hat{\mathcal{H}}_2 \subset \hat{\mathcal{G}}_2$ which leaves $\hat\nu_1$ invariant (Subsection IV E). For any pair $H_A, H_B \in \mathcal{H}_2$ the identity $\Omega_{a,b} \circ (H_A \otimes H_B) = \Omega_{a,b}$ holds, which implies that if $\psi_{eq}$ belongs to the equator (32) then its image under $H_A \otimes H_B$ also belongs to the equator. The equator is a unit ball of dimension $d_2 - 1$. Since the set (33) is a subset of the equator, the dimension of its affine span is at most $d_2 - 1$.
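From now on, only the case $d_2 = 3$ is considered, with the two possibilities $\bar{\mathcal{H}}_2 = O(2)$ and $\bar{\mathcal{H}}_2 = SO(2)$. Let us see that the first case is impossible. The group $\bar{\mathcal{H}}_2 = O(2)$ is irreducible in $\mathbb{C}^2$, therefore $\bar{\mathcal{H}}_2 \boxtimes \bar{\mathcal{H}}_2$ is irreducible in $(\mathbb{C}^2)^{\otimes 2}$. Three paragraphs above it is shown that $\bar{C} \neq 0$, hence the set (34) has dimension $(d_2 - 1)^2$, which is a lower bound for the dimension of (33), and which is larger than the allowed one ($d_2 - 1 = 2$).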
Let us address the second case. The group $\bar{\mathcal{H}}_2 = SO(2)$ is irreducible in $\mathbb{R}^2$ but reducible in $\mathbb{C}^2$, so the previous argument does not hold. The vector space of $2 \times 2$ real matrices decomposes into the subspace generated by rotations $R_+$ (35) and the one generated by reflections $R_-$ (36), where $\det R_\pm = \pm 1$. For any pair $\bar{H}_A, \bar{H}_B \in \bar{\mathcal{H}}_2$ the matrix $\bar{H}_A R_\pm \bar{H}_B^T$ stays in the subspace generated by $R_\pm$. Depending on whether $\bar{C}$ is in the subspace generated by $R_+$ from (35) or by $R_-$ from (36), the states in the equator of $S'_2$ are either $\hat\psi^+_{eq}$ or $\hat\psi^-_{eq}$. The proportionality constants in $\bar{C} \propto R_\pm$ are fixed by normalization (29). It turns out that the symmetric case $\hat\psi^-_{eq}$ and the antisymmetric case $\hat\psi^+_{eq}$ correspond to different representations of the same physical theory; that is, the corresponding state spaces (together with measurements and transformations) are equivalent in the sense of Subsection II E. To see this, define the linear map $\hat\tau : \hat{S}_2 \to \hat{S}_2$ as $\hat\tau(\alpha_1, \alpha_2, \alpha_3)^T := (\alpha_1, \alpha_2, -\alpha_3)^T$; that is, a reflection in the Bloch ball. The equivalence transformation is defined as $L := \tau \otimes I$ (in quantum information terms, this is a "partial transposition"). This map respects the tensor product structure, leaves the set of product states invariant, and satisfies $\hat{L}(\hat\psi^+_{eq}) = \hat\psi^-_{eq}$ [19]. In other words, we have reduced the discussion of the antisymmetric theory to that of the symmetric theory [35], which will be considered for the rest of the paper. The orthogonality of the matrices in $\hat{\mathcal{G}}_{2,2}$ implies that $\hat{S}'_2$ is a 3-dimensional ball, and not just affinely related to it. Hence all states on the surface of the ball $\hat{S}'_2 \subset \hat{S}_{2,2}$ can be parametrized in polar coordinates $u \in [0, \pi)$ and $v \in [0, 2\pi)$. These states cannot be written as proper mixtures of other states from $\hat{S}'_2$. It is easy to see that this implies that they are pure states in $\hat{S}_{2,2}$.

H. The Hermitian representation
In this subsection, a new (more familiar) representation is introduced, where states in $S_2$ are represented by $2 \times 2$ Hermitian matrices. For any state $\psi \in S_2$ in the standard representation (1), define the linear map $L$ of (38). The Pauli matrices $\sigma_1, \sigma_2, \sigma_3$ together with the identity $I$ constitute an orthogonal basis for the real vector space of Hermitian matrices. In terms of the Bloch representation, the map (38) has the familiar form
$$L[\psi] = \frac{1}{2}\Big( I + \sum_{i=1}^{3} \hat\psi_i \sigma_i \Big) . \tag{39}$$
All positive unit-trace $2 \times 2$ Hermitian matrices can be written in this way with $\hat\psi$ in the unit ball. Since $\hat{S}_2$ is a 3-dimensional unit ball, the set $L[S_2]$ is the set of quantum states. The extreme points of $L[S_2]$ are the rank-one projectors: each pure state $\psi \in S_2$ satisfies $L[\psi] = |\psi\rangle\langle\psi|$, where the vector $|\psi\rangle \in \mathbb{C}^2$ is defined up to a global phase. Effect (15) associated to the pure state $\varphi \in S_2$ is
$$\Omega_\varphi(\psi) = \mathrm{tr}\big( |\varphi\rangle\langle\varphi| \, L[\psi] \big) .$$
Note that the state $\varphi$ and its associated effect $\Omega_\varphi$ are both represented by $|\varphi\rangle\langle\varphi|$. The action of a reversible transformation $\hat{G} \in \hat{\mathcal{G}}_2 = SO(3)$ in the Hermitian representation is
$$L[G(\psi)] = U L[\psi] U^\dagger , \qquad U \sigma_i U^\dagger = \sum_j \hat{G}_{ji} \sigma_j , \tag{40}$$
where $U \in SU(2)$ and $\hat{G}_{ji}$ are the matrix components (equation VII.5.12 in [26]). In summary, the generalized bit in all theories satisfying $d_2 > 1$ and the requirements is equivalent to the qubit in QT.
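The correspondence between the unit ball and the set of qubit density matrices can be explored numerically. The following is a minimal sketch, assuming the standard Bloch form $\rho = \frac{1}{2}(I + \sum_i \hat\psi_i \sigma_i)$ with the usual Pauli matrices; the function names are hypothetical.

```python
import math

# Pauli matrices in the usual convention (an assumption of this sketch).
SIGMA = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)


def hermitian_rep(bloch):
    """rho = (I + sum_i bloch[i] * sigma_i) / 2 for a Bloch vector in R^3."""
    rho = [[0.5 + 0j, 0j], [0j, 0.5 + 0j]]
    for b, s in zip(bloch, SIGMA):
        for i in range(2):
            for j in range(2):
                rho[i][j] += 0.5 * b * s[i][j]
    return rho


def eigenvalues(rho):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return ((tr - disc) / 2.0, (tr + disc) / 2.0)


def is_projector(rho, tol=1e-12):
    """Check rho @ rho == rho, which holds for rank-one projectors."""
    sq = [[sum(rho[i][k] * rho[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return all(abs(sq[i][j] - rho[i][j]) < tol for i in range(2) for j in range(2))
```

The eigenvalues are $(1 \pm |\hat\psi|)/2$, so the matrix is positive exactly when $|\hat\psi| \leq 1$, and it is a rank-one projector exactly when $\hat\psi$ lies on the surface of the ball.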

I. Reconstructing quantum theory
In this subsection, the main result of this work is proved. First, let us introduce some notation.
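In QT, the state space with capacity $c$ and the corresponding group of reversible transformations are
$$S^Q_c = \{ \rho \in \mathbb{C}^{c \times c} : \rho \geq 0, \ \mathrm{tr}\,\rho = 1 \} , \qquad \mathcal{G}^Q_c = \{ \rho \mapsto U \rho U^\dagger : U \in SU(c) \} .$$
The joint state space of $m$ generalized bits is denoted by $S_{2^{\times m}}$, and the corresponding group of reversible transformations by $\mathcal{G}_{2^{\times m}}$. The Hermitian representation of a state $\psi \in S_{2^{\times m}}$ is defined to be $L^{\otimes m}[\psi]$, where $L^{\otimes m} := L \otimes \cdots \otimes L$, and $L$ is defined in (38). The map $L^{\otimes m}$ acts independently on each tensor factor, hence it translates the tensor product structure from the standard representation (4,5,7) to the Hermitian one. For example, if $\varphi \in S_2$ is a pure state, then $L^{\otimes m}[\varphi^{\otimes m}] = |\varphi\rangle\langle\varphi|^{\otimes m}$. The notation $S^H$ and $\mathcal{G}^H$ for state spaces and transformation groups in the Hermitian representation will be useful. The Hermitian representation of a state $\hat\psi_{AB} = [\alpha, \beta, C] \in \hat{S}_{2,2}$ is
$$\rho_{AB} = \frac{1}{4}\Big( I \otimes I + \sum_i \alpha_i \, \sigma_i \otimes I + \sum_j \beta_j \, I \otimes \sigma_j + \sum_{i,j} C_{ij} \, \sigma_i \otimes \sigma_j \Big) . \tag{41}$$
The action of local transformations $G_A, G_B \in \mathcal{G}_2$ on $\psi_{AB} \in S_{2,2}$ is
$$\rho_{AB} \mapsto (U_A \otimes U_B) \, \rho_{AB} \, (U_A \otimes U_B)^\dagger ,$$
where $\rho_{AB} = L^{\otimes 2}[\psi_{AB}]$ and $U_A, U_B \in SU(2)$ are related to $G_A, G_B$ via (40). Now we are ready to prove the following.
Theorem 6. The only GPT with $d_2 > 1$ satisfying Requirements 1-5 is quantum theory.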
We have seen that all quantum states $S^Q_4$ are contained in $S^H_4$, but can there be other states? If so, the associated Hermitian matrices should have a negative eigenvalue (note that all states in the Hermitian representation (41) have unit trace). If $\rho$ has a negative eigenvalue and $|\psi\rangle$ is the corresponding eigenvector, then the associated measurement outcome (45) has negative probability. Hence, we conclude that $S^Q_4 = S^H_4$, and similarly for the measurements.
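The negative-probability argument can be illustrated on a concrete unit-trace Hermitian matrix. The sketch below assumes the standard two-qubit form $\rho = \frac{1}{4}(I \otimes I + \sum_i \alpha_i \sigma_i \otimes I + \sum_j \beta_j I \otimes \sigma_j + \sum_{ij} C_{ij} \sigma_i \otimes \sigma_j)$ for formula (41) and the usual Pauli matrices. The choice $[0, 0, +I]$ is a hypothetical example chosen for this illustration, not a state discussed in the text.

```python
import math

I2 = ((1, 0), (0, 1))
SIGMA = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)


def kron(a, b):
    """Kronecker product of two 2x2 matrices (row index 2*i + k)."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def two_qubit_matrix(alpha, beta, correlation):
    """Hermitian 4x4 matrix built from [alpha, beta, C] in the assumed form of (41)."""
    rho = [[0j] * 4 for _ in range(4)]

    def add(coeff, m):
        for r in range(4):
            for c in range(4):
                rho[r][c] += 0.25 * coeff * m[r][c]

    add(1.0, kron(I2, I2))
    for i in range(3):
        add(alpha[i], kron(SIGMA[i], I2))
        add(beta[i], kron(I2, SIGMA[i]))
        for j in range(3):
            add(correlation[i][j], kron(SIGMA[i], SIGMA[j]))
    return rho


def outcome_probability(vec, rho):
    """<vec| rho |vec> for a normalized vector vec in C^4."""
    return sum(vec[r].conjugate() * rho[r][c] * vec[c]
               for r in range(4) for c in range(4)).real


ZERO = [0.0, 0.0, 0.0]
PLUS_ID = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
MINUS_ID = [[-1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
SINGLET = [0.0, 1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0), 0.0]
```

With these definitions, `outcome_probability(SINGLET, two_qubit_matrix(ZERO, ZERO, MINUS_ID))` is 1 (the matrix is the singlet projector), while replacing `MINUS_ID` by `PLUS_ID` gives $-1/2$: the effect associated with the singlet projector would assign a negative probability, so $[0, 0, +I]$ cannot be a state.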
All reversible transformations $H \in \mathcal{G}^H_4$ map pure states to pure states, that is, rank-one projectors to rank-one projectors. According to Wigner's Theorem [31], every map of this kind can be written as $H(|\psi\rangle\langle\psi|) = (U|\psi\rangle)(U|\psi\rangle)^\dagger$, where $U$ is either unitary or anti-unitary. If $U$ is anti-unitary, it follows from Wigner's normal form [32] that there is a two-dimensional $U$-invariant subspace spanned by two orthonormal vectors $|\theta_0\rangle, |\theta_1\rangle \in \mathbb{C}^4$ such that $U(t_0|\theta_0\rangle + t_1|\theta_1\rangle)$ equals either $\bar{t}_0|\theta_0\rangle + \bar{t}_1|\theta_1\rangle$ or $\bar{t}_1 e^{is}|\theta_0\rangle + \bar{t}_0 e^{-is}|\theta_1\rangle$ for some $s \in \mathbb{R}$. In both cases, $U$ acts as a reflection in the corresponding Bloch ball, which contradicts Requirement 3 because we know that $\hat{\mathcal{G}}_2 = SO(3)$. Therefore $\mathcal{G}^H_4 \subseteq \mathcal{G}^Q_4$. We know that $\mathcal{G}^H_{2,2}$ contains all local unitaries. Since this group is transitive on the pure states, it contains at least one unitary which maps a product state to an entangled state. It is well-known [33] that this implies that the corresponding group of unitaries constitutes a universal gate set for quantum computation; that is, it generates every unitary operation on 2 qubits. This proves that $\mathcal{G}^H_4 = \mathcal{G}^Q_4$.
Consider $m$ generalized bits as a composite system. From the previously discussed case of $S_{2,2}$, we know that every unitary operation on every pair of generalized bits is an allowed transformation on $S^H_{2^{\times m}}$. But two-qubit unitaries generate all unitary transformations [33], hence $\mathcal{G}^Q_{2^m} \subseteq \mathcal{G}^H_{2^{\times m}}$. By applying all these unitaries to $|\varphi\rangle\langle\varphi|^{\otimes m}$, all pure quantum states are generated, hence $S^Q_{2^m} \subseteq S^H_{2^{\times m}}$. Reasoning as in the $S_{2,2}$ case, for every rank-one projector $|\psi\rangle\langle\psi|$ acting on $\mathbb{C}^{2^m}$, the associated effect which maps $\rho \in S^H_{2^{\times m}}$ to $\mathrm{tr}(|\psi\rangle\langle\psi|\rho)$ is an allowed measurement outcome on $S^H_{2^{\times m}}$. This implies that all matrices in $S^H_{2^{\times m}}$ have non-negative eigenvalues, therefore $S^H_{2^{\times m}} = S^Q_{2^m}$ and $\mathcal{G}^H_{2^{\times m}} = \mathcal{G}^Q_{2^m}$.
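The statement that an anti-unitary acts as a reflection of the Bloch ball can be seen in the simplest case, complex conjugation in the computational basis, which maps $\rho$ to its entrywise complex conjugate. The following sketch assumes the usual Pauli convention $\rho = \frac{1}{2}(I + x\sigma_1 + y\sigma_2 + z\sigma_3)$ and does not cover the general Wigner normal form; the function names are illustrative.

```python
def density_matrix(x, y, z):
    """Qubit matrix (I + x*sigma_1 + y*sigma_2 + z*sigma_3) / 2."""
    return [[(1 + z) / 2, (x - 1j * y) / 2],
            [(x + 1j * y) / 2, (1 - z) / 2]]


def bloch_vector(rho):
    """Recover (x, y, z) from the matrix entries."""
    return (2 * rho[0][1].real,
            -2 * rho[0][1].imag,
            (rho[0][0] - rho[1][1]).real)


def complex_conjugate(rho):
    """Action of the anti-unitary 'complex conjugation' on a density matrix."""
    return [[entry.conjugate() for entry in row] for row in rho]


def reflected(x, y, z):
    """Bloch vector after complex conjugation of the density matrix."""
    return bloch_vector(complex_conjugate(density_matrix(x, y, z)))
```

The induced map on the Bloch ball is $(x, y, z) \mapsto (x, -y, z)$, that is, the matrix $\mathrm{diag}(1, -1, 1)$ with determinant $-1$, which lies in $O(3)$ but not in $SO(3)$.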
The remaining cases of capacities $c$ that are not powers of two are treated by applying Requirement 3, using that $S_c \subset S_{2^m}$ for large enough $m$.

V. CONCLUSION
We have imposed five physical requirements on the framework of generalized probabilistic theories. These requirements are simple and have a clear physical meaning in terms of basic operational procedures. It is shown that the only theories compatible with them are CPT and QT. If Requirement 4 is strengthened by imposing the continuity of reversible transformations, then the only theory that survives is QT. Any other theory violates at least one of the requirements, hence the relaxation of each one constitutes a different way to go beyond QT.
The standard formulation of QT includes two postulates which do not follow from our requirements: (i) the update rule for the state after a measurement, and (ii) the Schrödinger equation. If desired, these can be incorporated in our derivation of QT by imposing the following two extra requirements: (i) if a system is measured twice "in rapid succession" with the same measurement, the same outcome is obtained both times [4], and (ii) closed systems evolve reversibly and continuously in time.
This derivation of QT contains two steps which deserve a special mention. First, a direct consequence of Requirement 3 is that $S_2$ is fully surrounded by pure states, which together with Requirement 4 implies that $S_2$ is a ball. Second, this ball has dimension three, since $d = 3$ is the only value for which $SO(d - 1)$ is reducible in $\mathbb{C}^{d-1}$.
Modifications and generalizations of QT are of interest in themselves, and could be essential in order to construct a QT of gravity. Some well-known attempts [15,16] have shown that straightforward modifications of QT's mathematical formalism quickly lead to inconsistencies, such as superluminal signaling [17]. This work provides an alternative way to proceed. We have shown that the Hilbert space formalism of QT follows from five simple physical requirements. This gives five different consistent ways to go beyond QT, each obtained by relaxing one of our requirements.
[35] As a physical interpretation of the antisymmetric case, consider two observers who have never met before, but who have independently built devices to measure spin-1/2 particles in three orthogonal directions. If they never had the chance to agree on a common "handedness" of spatial coordinate systems, and happen to have chosen two different orientations, they will measure antisymmetric correlation matrices on shared quantum states. The "three-bit no-go result" from [19] can be interpreted as follows: if there is a third observer, then it is impossible that every pair of parties measures antisymmetric correlation matrices.
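Appendix A: Lemmas
Lemma 1. In any state space $S_c$, the only state $\psi \in S_c$ which is invariant under all reversible transformations,
$$G(\psi) = \psi \quad \text{for all } G \in \mathcal{G}_c , \tag{A1}$$
is the maximally-mixed state $\mu_c$, defined in (11).
Proof. Suppose $\psi \in S_c$ satisfies (A1). Any state can be written as a mixture of pure states: $\psi = \sum_k q_k \psi_k$. Normalization $\int_{\mathcal{G}_c} dG = 1$, condition (A1), the linearity of $G$, the purity of all $\psi_k$, the definition of $\mu_c$, and $\sum_k q_k = 1$ imply
$$\psi = \int_{\mathcal{G}_c} dG \, G(\psi) = \sum_k q_k \int_{\mathcal{G}_c} dG \, G(\psi_k) = \sum_k q_k \, \mu_c = \mu_c ,$$
which proves the claim.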
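The averaging argument can be illustrated in classical probability theory, where the reversible transformations are the permutations of the pure states. The sketch below is an illustration only: it replaces the Haar integral by a uniform average over the finite permutation group, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def group_average(state):
    """Average of a probability vector over all permutations of its entries.

    The uniform average over the finite group plays the role of the
    normalized integral over reversible transformations.
    """
    c = len(state)
    perms = list(permutations(range(c)))
    total = [Fraction(0)] * c
    for p in perms:
        for k in range(c):
            # The permutation p moves the weight of pure state k to p[k].
            total[p[k]] += Fraction(state[k])
    return [t / len(perms) for t in total]
```

For any input distribution the average is the uniform distribution, which is the only permutation-invariant state of the simplex; for instance, `group_average([Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)])` returns three entries equal to `Fraction(1, 3)`.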

Lemma 2. If $\mathcal{G}$ is a compact real matrix group, then there is a real matrix $S > 0$ such that for each $G \in \mathcal{G}$ the matrix $S G S^{-1}$ is orthogonal.
Proof. Since the group $\mathcal{G}$ is compact, there is an invariant Haar measure [26], which allows us to define
$$P = \int_{\mathcal{G}} dG \, G^T G .$$
Since each $G$ is invertible, the matrix $G^T G$ is strictly positive, and so is $P$. Define $S = \sqrt{P} > 0$, where both $S$ and $S^{-1}$ are real and symmetric. By the invariance of the Haar measure, $G^T P G = P$ for any $G \in \mathcal{G}$, hence $(S G S^{-1})^T (S G S^{-1}) = S^{-1} G^T P G S^{-1} = I$, which implies orthogonality.
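The construction in this proof can be tried out on a small example. The sketch below is a hedged illustration under two assumptions that are not part of the original text: the group is a hypothetical finite one (quarter-turn rotations of the plane conjugated by a non-orthogonal matrix `A`), so the Haar integral becomes a uniform average, and the square root of the positive $2 \times 2$ matrix is computed with the Cayley-Hamilton formula $\sqrt{P} = (P + \sqrt{\det P}\, I)/\sqrt{\mathrm{tr} P + 2\sqrt{\det P}}$.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def sqrt_spd(p):
    """Square root of a symmetric positive 2x2 matrix (Cayley-Hamilton)."""
    s = math.sqrt(p[0][0] * p[1][1] - p[0][1] * p[1][0])
    t = math.sqrt(p[0][0] + p[1][1] + 2.0 * s)
    return [[(p[i][j] + (s if i == j else 0.0)) / t for j in range(2)]
            for i in range(2)]


# Hypothetical example group: quarter turns conjugated by a non-orthogonal A.
A = [[2.0, 1.0], [0.0, 1.0]]
ROTATIONS = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, -1.0], [1.0, 0.0]],
             [[-1.0, 0.0], [0.0, -1.0]], [[0.0, 1.0], [-1.0, 0.0]]]
GROUP = [matmul(matmul(A, r), inverse(A)) for r in ROTATIONS]


def invariant_form(group):
    """P = uniform average of G^T G over the group (finite stand-in for Haar)."""
    p = [[0.0, 0.0], [0.0, 0.0]]
    for g in group:
        gtg = matmul(transpose(g), g)
        for i in range(2):
            for j in range(2):
                p[i][j] += gtg[i][j] / len(group)
    return p


S = sqrt_spd(invariant_form(GROUP))
CONJUGATED = [matmul(matmul(S, g), inverse(S)) for g in GROUP]
```

The elements of `GROUP` are not orthogonal, but every matrix in `CONJUGATED` satisfies $M^T M = I$ up to rounding, as the lemma states.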
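Lemma 3. If $\mu_A$ and $\mu_B$ are the maximally-mixed states of the state spaces $S_A$ and $S_B$, then the maximally-mixed state of the composite system $S_{AB}$ is
$$\mu_{AB} = \mu_A \otimes \mu_B .$$
Proof. The pure states $\psi^A$ in $S_A$ linearly span $\mathbb{R}^{d_A+1}$, and the pure states $\psi^B$ in $S_B$ linearly span $\mathbb{R}^{d_B+1}$. Therefore, pure product states $\psi^A \otimes \psi^B$ span $\mathbb{R}^{d_A+1} \otimes \mathbb{R}^{d_B+1}$. In particular, the maximally-mixed state (11) of $S_{AB}$ can be written as
$$\mu_{AB} = \sum_{a,b} t_{a,b} \, \psi^A_a \otimes \psi^B_b , \tag{A2}$$
where $t_{a,b} \in \mathbb{R}$ are not necessarily positive coefficients, and all $\psi^A_a, \psi^B_b$ are pure. From definition (1), the first component of the vector equality (A2) implies $\sum_{a,b} t_{a,b} = 1$. The maximally-mixed state is invariant under all reversible transformations, in particular the local ones:
$$\mu_{AB} = \int dG_A \int dG_B \, (G_A \otimes G_B)(\mu_{AB}) = \sum_{a,b} t_{a,b} \Big( \int dG_A \, G_A(\psi^A_a) \Big) \otimes \Big( \int dG_B \, G_B(\psi^B_b) \Big) = \sum_{a,b} t_{a,b} \, \mu_A \otimes \mu_B = \mu_A \otimes \mu_B ,$$
where the same tricks from Lemma 1 have been used.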
Proof. By definition, for each tight effect $\Omega$ there is a (not necessarily pure) state $\psi'$ such that $\Omega(\psi') = 1$. Every $\psi'$ can be written as a mixture of pure states $\psi_k$, that is, $\psi' = \sum_k q_k \psi_k$ with $q_k > 0$ and $\sum_k q_k = 1$. Effects are linear functions such that $\Omega(\psi) \leq 1$ for any state $\psi$. Therefore, all pure states $\psi_k$ in the above decomposition must satisfy $\Omega(\psi_k) = 1$.
To prove the second part, let $\psi'_1, \ldots, \psi'_n$ be the states that are distinguished by the measurement, that is, $\Omega_a(\psi'_b) = \delta_{a,b}$. Every $\psi'_b$ can be written as a convex combination of pure states, $\psi'_b = \sum_k q_k \psi_{b,k}$. But effects are linear functions such that $0 \leq \Omega(\psi) \leq 1$ for any state $\psi$. Hence $\Omega(\psi'_b) = 0$ is only possible if $\Omega(\psi_{b,k}) = 0$ for all $k$, and similarly for the case $\Omega(\psi'_b) = 1$. It follows that $\Omega_a(\psi_{b,1}) = \delta_{a,b}$.
Lemma 5. If $S_c$ is a state space with capacity $c \geq 1$ and $\mu_c$ the corresponding maximally-mixed state, then there are $c$ pure distinguishable states $\psi_1, \ldots, \psi_c \in S_c$ such that
$$\mu_c = \frac{1}{c} \sum_{k=1}^{c} \psi_k .$$
Proof. Since $S_1$ contains a single state, the claim is trivially true for $c = 1$. Since $\hat{S}_2$ is the $d_2$-dimensional unit ball, two antipodal points $\hat\varphi_1$ and $\hat\varphi_2 = -\hat\varphi_1$ are pure, distinguishable and satisfy
$$\mu_2 = \frac{1}{2} (\varphi_1 + \varphi_2) .$$
Now, consider the joint state space of $n$ generalized bits, denoted $S_{2^{\times n}}$. Lemma 3 and (A3) imply that the maximally-mixed state of $S_{2^{\times n}}$ is
$$\mu^{(n)} = (\mu_2)^{\otimes n} = \frac{1}{2^n} \sum_{a_i \in \{1,2\}} \varphi_{a_1} \otimes \cdots \otimes \varphi_{a_n} .$$
This and (A5) imply $c_n = 2^n$. This together with (A4) shows the assertion of the lemma for state spaces whose capacity is a power of two. The rest of the cases are shown by induction. Let us prove that if the claim of the lemma holds for a state space with capacity $c$, with $c > 1$, then it holds for a state space with capacity $c - 1$ too. The induction hypothesis tells us that there is a complete measurement $(\Omega_1, \ldots, \Omega_c)$ which distinguishes the pure states $\psi_1, \ldots, \psi_c \in S_c$, and $\mu_c = \frac{1}{c} \sum_{k=1}^{c} \psi_k$.
Lemma 8. Reversible transformations for two generalized bits in the Bloch representation (24) are orthogonal.
Proof. In Subsection IV F the Bloch representation for two generalized bits is defined, and it is argued that reversible transformations in $\hat{\mathcal{G}}_{2,2}$ act on $[\alpha, \beta, C]$ as matrices. In particular, local transformations (28) are
$$\widehat{G_A \otimes G_B} = \begin{pmatrix} \hat{G}_A & 0 & 0 \\ 0 & \hat{G}_B & 0 \\ 0 & 0 & \hat{G}_A \otimes \hat{G}_B \end{pmatrix} , \tag{A9}$$
where each diagonal block acts on an entry of $[\alpha, \beta, C]$, and $\hat{G}_A, \hat{G}_B \in \hat{\mathcal{G}}_2$. In Subsection IV E it is argued that $\hat{\mathcal{G}}_2 \subseteq O(d_2)$, hence local transformations (A9) are orthogonal. Lemma 2 shows the existence of a real matrix $S > 0$ such that for any $\hat{G} \in \hat{\mathcal{G}}_{2,2}$ the matrix $S \hat{G} S^{-1}$ is orthogonal. In particular, $S \, \widehat{G_A \otimes G_B} \, S^{-1}$ is orthogonal for all local transformations, which implies the commutation relation
$$\big[ S^2 , \widehat{G_A \otimes G_B} \big] = 0 . \tag{A10}$$
Subsection IV E concludes that $d_2$ is odd, and that $SO(d_2) \subseteq \hat{\mathcal{G}}_2$ except when $d_2 = 7$, where $M G_2 M^{-1} \subseteq \hat{\mathcal{G}}_2$ and $G_2$ is the fundamental representation of the smallest exceptional Lie group [30]. For $d_2 \geq 3$ these groups act irreducibly in $\mathbb{C}^{d_2}$, hence $\hat{\mathcal{G}}_2$ acts irreducibly in $\mathbb{C}^{d_2}$ too [30]. The first two diagonal blocks in (A9) are irreducible. The exterior tensor product of two irreducible representations (in $\mathbb{C}^d$) is also an irreducible representation, hence the third diagonal block in (A9) is also irreducible. This together with (A10) implies that
$$S = \begin{pmatrix} a I & 0 & 0 \\ 0 & b I & 0 \\ 0 & 0 & s I \end{pmatrix}$$
for some $a, b, s > 0$ (Schur's Lemma [30]).
According to Lemma 7, for each unit vector $\alpha \in \hat{S}_2$ there is a transformation $G_{swap} \in \mathcal{G}_{2,2}$ such that $\hat{G}_{swap}[\alpha, 0, 0] = [0, \alpha, 0]$. Since $S \hat{G}_{swap} S^{-1}$ is orthogonal, the vectors $[\alpha, 0, 0]$ and $[0, b a^{-1} \alpha, 0]$ have the same modulus, hence $a = b$. Also, there is a transformation $G_{cnot} \in \mathcal{G}_{2,2}$ such that […]
Proof. Assume Requirement 5', but not Requirement 5. It follows that $S_1$ contains a single state only: if it contained more than one state, there would exist some $\psi \in S_1$ which is not completely mixed, which would then be distinguishable from some other state, contradicting that $S_1$ has capacity 1. Requirement 5 is used in the proof of Theorem 1. This proof is easily modified to comply with Requirement 5' instead: adopting the notation from the proof, the state $\varphi_{mix}$ is not completely mixed, thus distinguishable from some other state. This proves the existence of some $\hat\Omega$ with the claimed properties, and the other arguments remain unchanged, proving that $\hat{S}_2$ can be represented as a unit ball. No pure state in $\hat{S}_2$ is completely mixed, hence each one has a corresponding tight effect which is physically allowed. But for every state on the surface of the ball, there exists exactly one tight effect which gives probability one for that state. Hence, all these effects must be allowed, and since they generate the set of all effects, this proves Requirement 5 for use in (14) and the rest of the paper.