Thermodynamics and the structure of quantum theory

Despite its enormous empirical success, the formalism of quantum theory still raises fundamental questions: why is nature described in terms of complex Hilbert spaces, and what modifications of it could we reasonably expect to find in some regimes of physics? Here we address these questions by studying how compatibility with thermodynamics constrains the structure of quantum theory. We employ two postulates that any probabilistic theory with reasonable thermodynamic behaviour should arguably satisfy. In the framework of generalised probabilistic theories, we show that these postulates already imply important aspects of quantum theory, like self-duality and analogues of projective measurements, subspaces and eigenvalues. However, they may still admit a class of theories beyond quantum mechanics. Using a thought experiment by von Neumann, we show that these theories admit a consistent thermodynamic notion of entropy, and prove that the second law holds for projective measurements and mixing procedures. Furthermore, we study additional entropy-like quantities based on measurement probabilities and convex decomposition probabilities, and uncover a relation between one of these quantities and Sorkin’s notion of higher-order interference.


I. INTRODUCTION
Quantum mechanics has existed for about 100 years now, but despite its enormous success in experiment and application, the meaning and origin of its counterintuitive formalism are still widely considered difficult to grasp. Many attempts to put quantum mechanics on a more intuitive footing have been made over the decades, including the development of a variety of interpretations of quantum physics (such as the many-worlds interpretation [1], Bohmian mechanics [2], QBism [3], and many others [4]), and a thorough analysis of its departure from classical physics (as in Bell's Theorem [5] or in careful definitions of notions of contextuality [6]). In more recent years, researchers, mostly coming from and inspired by the field of quantum information processing (early examples include [21,22,51]), have taken as a starting point the set of all probabilistic theories. Quantum theory is one of them and can be uniquely determined by specifying some of its characteristic properties [53] (as in e.g. [19,43,51,54,55,57-61]).
While the origins of this framework date back at least to the 1960s [15,16,18], it was the development of quantum information theory with its emphasis on simple operational setups that led to a new wave of interest in "generalized probabilistic theories" (GPTs) [51,52]. This framework turned out to be very fruitful for fundamental investigations of quantum theory's information-theoretic and operational properties. For example, GPTs make it possible to contrast quantum information theory with other possible theories of information processing, and in this way to gain a deeper understanding of its characteristic properties in terms of computation or communication.
In a complementary approach, there has been a wave of attempts to find simple physical principles that single out quantum correlations from the set of all nonsignalling correlations in the device-independent formalism [70]. These include non-trivial communication complexity [71], macroscopic locality [72], or information causality [73]. However, none of these principles so far turns out to yield the set of quantum correlations exactly. This led to the discovery of "almost quantum correlations" [75] which are more general than those allowed by quantum theory, but satisfy all the aforementioned principles. Almost quantum correlations seem to appear naturally in the context of quantum gravity [77].
A relation to other fields of physics can also be drawn from information causality, which can be understood as the requirement that a notion of entropy [66-69] exists which has some natural properties like the data-processing inequality [74]. These emergent connections to entropy and quantum gravity are particularly interesting since they point to an area of physics where modifications of quantum theory are well-motivated: Jacobson's results [78] and holographic duality [79] relate thermodynamics, entanglement, and (quantum) gravity, and modifying quantum theory has been discussed as a means to overcome apparent paradoxes in black-hole physics [80].
While generalized probabilistic theories provide a way to generalize quantum theory and to study more general correlations and physical theories, they still leave open the question as to which principles should guide us in applying the GPT formalism for this purpose. The considerations above suggest taking, as a guideline for such modifications, the principle that they support a well-behaved notion of thermodynamics. As A. Einstein [32] put it, "A theory is the more impressive the greater the simplicity of its premises, the more different kinds of things it relates, and the more extended its area of applicability. Therefore the deep impression that classical thermodynamics made upon me. It is the only physical theory of universal content which I am convinced will never be overthrown, within the framework of applicability of its basic concepts." Along similar lines, A. Eddington [33] argued that "The law that entropy always increases holds, I think, the supreme position among the laws of Nature. If someone points out to you that your pet theory of the universe is in disagreement with Maxwell's equations then so much the worse for Maxwell's equations. If it is found to be contradicted by observation well, these experimentalists do bungle things sometimes. But if your theory is found to be against the second law of thermodynamics I can give you no hope; there is nothing for it but to collapse in deepest humiliation." Here we take this point of view seriously. We investigate what kinds of probabilistic theories, including but not limited to quantum theory, could peacefully coexist with thermodynamics. We present two postulates that formalize important physical properties which can be expected to hold in any such theory. On the one hand, these two postulates allow for a class of theories more general than quantum or classical theory, which thus describes potential alternative physics consistent with important parts of thermodynamics as we know it. 
Indeed, by considering a thought experiment originally conceived by von Neumann, we show that these theories all give rise to a unique, consistent form of thermodynamical entropy. Furthermore, we show that this entropy satisfies several other important properties, including two instances of the second law. On the other hand, we show that these postulates already imply many structural properties which are also present in quantum theory, for example self-duality and the existence of analogues of projective measurements, observables, eigenvalues and eigenspaces.
In summary, our analysis shows that important structural aspects of quantum and classical theory are already implied by these aspects of thermodynamics, but on the other hand it suggests that there is still some "elbow room" for modification within these limits dictated by thermodynamics.
Thermodynamics in GPTs has been considered in some earlier works. In [35,36], the authors introduced a notion of (Rényi-2-)entanglement entropy, and studied the phenomenon of thermalization by entanglement [37][38][39] and the black-hole information problem (in particular the Page curve [40]) in generalizations of quantum theory. Hänggi and Wehner [46] have related the uncertainty principle to the second law in the framework of GPTs. Chiribella and Scandolo ([45,47], see also [48]) have considered the notion of diagonalization and majorization in general theories, leading to a resource-theoretic approach to thermodynamics in GPTs. There are various connections between their results and ours, but there are essential differences. In particular, they assume the purification postulate (which is arguably a strong assumption that in particular excludes classical thermodynamics), whereas we are not making any assumption on composition of systems whatsoever, and in this sense work in a more general framework. Furthermore, while Chiribella and Scandolo take a resource-theoretic approach motivated by quantum information theory, our analysis relies on a more traditional thermodynamical thought experiment (namely von Neumann's). We presented results related to some of those in the present paper in the conference proceedings [31]; here we use different assumptions and obtain additional results.
Our paper is organized as follows. We start with an overview of the framework of generalized probabilistic theories. Then we present von Neumann's thought experiment on thermodynamic entropy, and a modification of it due to Petz [42]. Although it relies on very mild assumptions, it already rules out all theories that admit a state space known as the gbit or squit (a square-shaped state space that can be used to describe one of the two local subsystems of a composite system known as the PR-box [83], exhibiting stronger-than-quantum correlations). Then we present our two postulates, and show that they imply many structural features of quantum theory. We show that theories that satisfy both postulates behave consistently in von Neumann's thought experiment and admit a notion of thermodynamic entropy which satisfies versions of the second law.
Because entropies are an important bridge between information theory and thermodynamics, in the final section we investigate the consequences of our postulates for generalizations of quantities of known significance in quantum thermodynamics [30], defined by applying Rényi entropies to probabilities in convex decompositions of a state, or of measurements made on a state. In particular, we show a relation between max-entropy and Sorkin's notion of higher-order interference [76]: equality of the preparation and measurement based max-entropies implies the absence of higher-order interference. Most proofs are deferred to the appendix. Several results of this paper have been announced in the Master thesis of one of the authors [34].

II. THE MATHEMATICAL FRAMEWORK
Our results are obtained in the framework of generalized probabilistic theories (GPTs) [51,52,55,85,88]. The goal of this framework is to capture all probabilistic theories, i.e. all theories that use states to make predictions for probabilities of measurement outcomes.
FIG. 1. An example state space, A, modelling a so-called "gbit" [52], which is often used to describe one half of a PR-box. The operational setup is depicted on the left, and the mathematical formulation is sketched on the right. An agent ("Alice") holds a black box $\omega$ into which she can input one bit, $a \in \{0, 1\}$, and from which she obtains one output, $x \in \{1, 2\}$. The box is described by a conditional probability $p(x|a)$. In the GPT framework, $\omega$ becomes an actual state, i.e. an element of some state space $\Omega$. Concretely, $\omega = (1, p(1|0), p(1|1)) \in \mathbb{R}^3$, where the first entry 1 is used to describe the normalization, $p(1|0) + p(2|0) = p(1|1) + p(2|1) = 1$. In this case, all probabilities are allowed by definition, so that the state space $\Omega$ becomes the square, i.e. the points $(1, s, t)$ with $0 \le s, t \le 1$. Alice's input $a$ is interpreted as a "choice of measurement", and the two measurements are $\{e^{(a)}_{x=1}, e^{(a)}_{x=2}\}$ for $a = 0, 1$, with $e^{(a)}_x(\omega) = p(x|a)$, so that $\sum_x e^{(a)}_x(\omega) = 1$ for all states $\omega \in \Omega$. If we describe effects by vectors by using the standard inner product, we have, for example, $e^{(a=0)}_{x=1} = (0, 1, 0)$. There are four pure states, labelled $\omega_1, \ldots, \omega_4$. Every pure state $\omega_i$ is perfectly distinguishable from every other pure state $\omega_j$ for $j \neq i$, but no more than two of them are jointly distinguishable in a single measurement. More generally, every state on one side of the square is perfectly distinguishable from every state on the opposite side. The unit effect is $u_A = (1, 0, 0)$.
Although the framework is based on very weak and natural assumptions, we can only provide a short introduction of the main notions and results here. For more detailed explanations of the framework, see e.g. [34,51,52,55,86,87]. The framework contains quantum theory and also the application of probability theory to classical physics, often referred to as classical probability theory, as special cases. It also contains theories which differ substantially from classical or quantum probability theory, for example boxworld [52], which allows superstrong nonlocality, and theories that allow higher-order interference [76].
A central notion is that of the state and the set of states, the state space $\Omega_A$. A state contains all information necessary to calculate all probabilities for all outcomes of all possible measurements. One possible and convenient representation would be to simply list the probabilities of a set of "fiducial" measurement outcomes which is sufficient to calculate all outcome probabilities for all measurements [51,52]. An example is given in Figure 1.
It is possible to create statistical mixtures of states: let us assume a black box device randomly prepares a state $\omega_1$ with probability $p_1$ and a state $\omega_2$ with probability $p_2$. In agreement with the representation of states as lists of probabilities and the law of total probability, the appropriate state to describe the resulting measurement statistics is $\omega = p_1 \omega_1 + p_2 \omega_2$. This means that the state space $\Omega_A$ is convex and is embedded into a real vector space $A$ (to be described below). Due to the interpretation of states as lists of probabilities (which are between 0 and 1) we demand that $\Omega_A$ is bounded. Any state that cannot be written as a convex decomposition of other states is called a pure state. As pure states cannot be interpreted as statistical mixtures of other states, they are also called states of maximal knowledge. Furthermore, there is no physical distinction between states that can be prepared exactly, and states that can be prepared to arbitrary accuracy. Thus, we also assume that $\Omega_A$ is topologically closed. In order to not obscure the physics by the mathematical technicalities introduced by infinite dimensions, we will assume that $A$ is finite-dimensional. Thus $\Omega_A$ is compact. Consequently, every state can be obtained as a statistical mixture of finitely many pure states [89].
Furthermore, it turns out to be convenient to introduce unnormalized states $\omega$, defined as the nonnegative multiples of normalized states. They form a closed convex cone $A_+ := \mathbb{R}_{\ge 0} \cdot \Omega_A$. For simplicity of description, we choose the vector space containing the cone of states to be of minimal dimension, i.e. $\mathrm{span}(A_+) = A$.
We introduce the normalization functional $u_A : A \to \mathbb{R}$ which attains the value one on all normalized states, i.e. $u_A(\omega) = 1$ for all $\omega \in \Omega_A$. It is linear, non-negative on the whole cone, zero only for the origin, and $\omega \in A_+$ is an element of $\Omega_A$ if and only if $u_A(\omega) = 1$. The normalization $u_A(\omega)$ can be interpreted as the probability of success of the preparation procedure. For states with $u_A(\omega) < 1$, the preparation succeeds with probability $u_A(\omega)$. The states with normalization $> 1$ do not have a physical interpretation, but adding them allows us to take full advantage of the notion of cones from convex geometry.
Effects are functionals that map (sub)normalized states to probabilities, i.e. into $[0, 1]$. To each measurement outcome we assign an effect that calculates the outcome probability for any state. Effects have to be linear for consistency with the statistical mixture interpretation of convex combinations of states. A measurement (with $n$ outcomes) is a collection of effects $e_1, \ldots, e_n$ such that $e_1 + \ldots + e_n = u_A$. Its interpretation is that performing the measurement on some state $\omega \in \Omega_A$ yields outcome $i$ with probability $e_i(\omega)$.
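The notions of state, mixture, effect and measurement can be made concrete for the gbit of Figure 1. The following is a minimal sketch, not part of the original formalism: the helper names `apply` and `mix`, the dictionary layout, and the cyclic ordering of the four corners in `PURE` are assumptions chosen for illustration.

```python
# Gbit sketch: states are vectors (1, s, t) with 0 <= s, t <= 1,
# and effects act on states through the standard inner product.

def apply(effect, state):
    """Probability that the linear effect assigns to the state."""
    return sum(e * w for e, w in zip(effect, state))

def mix(weights, states):
    """Convex mixture sum_j p_j * omega_j, computed componentwise."""
    return tuple(sum(p * s[k] for p, s in zip(weights, states)) for k in range(3))

U = (1, 0, 0)  # unit effect u_A

# EFFECTS[a][x]: effect for outcome x in {1, 2} of measurement a in {0, 1}.
EFFECTS = {
    0: {1: (0, 1, 0), 2: (1, -1, 0)},
    1: {1: (0, 0, 1), 2: (1, 0, -1)},
}

# The four corners of the square (the labelling order is an assumption).
PURE = [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)]

# A mixture of two opposite corners, and its outcome probabilities p(x|a).
omega = mix([0.25, 0.75], [PURE[0], PURE[2]])
probs = {a: {x: apply(e, omega) for x, e in outcomes.items()}
         for a, outcomes in EFFECTS.items()}
print(omega, probs)
```

Because effects are linear, the probabilities computed for `omega` coincide with the corresponding mixture of the probabilities of the two pure states, and the two effects of each measurement add up to the unit effect.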
A set of states $\omega_1, \ldots, \omega_n$ is called perfectly distinguishable if there exists a measurement $e_1, \ldots, e_n$ such that $e_i(\omega_j) = \delta_{ij}$, that is, 1 if $i = j$ and 0 otherwise. A collection of $n$ perfectly distinguishable pure states is called an $n$-frame, and a frame is called maximal if it has the maximal number $n$ of elements possible in the given state space. In quantum theory, for example, the maximal frames are exactly the orthonormal bases of Hilbert space. In more detail, a frame on an $N$-dimensional quantum system is given by $\omega_1 = |\psi_1\rangle\langle\psi_1|, \ldots, \omega_N = |\psi_N\rangle\langle\psi_N|$, where $|\psi_1\rangle, \ldots, |\psi_N\rangle$ are orthonormal basis vectors.
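As an illustration of this definition, the sketch below checks which of the two gbit measurements from Figure 1 perfectly distinguishes each pair of pure states. The corner labelling in `PURE` and the function names are assumptions made for this example only.

```python
from itertools import combinations

# Corners of the gbit square, labelled cyclically (an assumed labelling).
PURE = {1: (1, 0, 0), 2: (1, 1, 0), 3: (1, 1, 1), 4: (1, 0, 1)}

# The two measurements of the gbit, each a pair of effects summing to (1, 0, 0).
MEASUREMENTS = {
    0: [(0, 1, 0), (1, -1, 0)],
    1: [(0, 0, 1), (1, 0, -1)],
}

def apply(effect, state):
    return sum(e * w for e, w in zip(effect, state))

def distinguishes(effects, states):
    """True if the outcomes can be assigned so that e_i(omega_j) = delta_ij."""
    a, b = states
    for e1, e2 in (effects, effects[::-1]):  # try both outcome assignments
        if (apply(e1, a), apply(e2, a), apply(e1, b), apply(e2, b)) == (1, 0, 0, 1):
            return True
    return False

def distinguishing_measurement(i, j):
    """Return a measurement label that separates omega_i and omega_j, or None."""
    for label, effects in MEASUREMENTS.items():
        if distinguishes(effects, (PURE[i], PURE[j])):
            return label
    return None

pairs = {(i, j): distinguishing_measurement(i, j) for i, j in combinations(PURE, 2)}
print(pairs)
```

Every pair of distinct corners is separated by at least one of the two measurements, so each pair forms a 2-frame in the sense defined above.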
Transformations are maps $T : A \to A$ that map states to states, i.e. $T(A_+) \subseteq A_+$. Like effects, they have to be linear in order to preserve statistical mixtures. They cannot increase the total probability, but are allowed to decrease it (as is the case, for example, for a filter), thus $u_A(T(\omega)) \le u_A(\omega)$ for all $\omega \in A_+$. Instruments$^1$ [84] are collections of transformations $T_j$ such that $\sum_j u_A \circ T_j = u_A$. If an instrument is applied to a state $\omega$, one obtains outcome $j$ (and post-measurement state $T_j(\omega)/p_j$) with probability $p_j := u_A(T_j(\omega))$. Each instrument corresponds to a measurement given by the effects $u_A \circ T_j$. We will say it "induces" this measurement.
The framework of GPTs does not assume a priori that all mathematically well-defined states, transformations and measurements can actually be physically implemented. Here, we will assume that a measurement constructed from physically allowed effects is also physically allowed. Moreover, we assume that the set of allowed effects has the same dimension as $A_+$, because otherwise there would be distinct states that could not be distinguished by any measurement.

III. VON NEUMANN'S THOUGHT EXPERIMENT
The following thought experiment has been applied by von Neumann [41] to find a notion of thermodynamic entropy for quantum states ρ. The result turns out to equal von Neumann entropy, H(ρ) = −tr(ρ log ρ). We apply the thought experiment to a wider class of probabilistic theories.
We adopt the physical picture used by von Neumann [41] to describe the thought experiment$^2$; we will comment on some idealizations used in this model at the end of this section. We consider a GPT ensemble $[S_1, \ldots, S_N]$, where $S_i$ denotes the $i$-th physical system, and $N_j$ of the systems are in state $\omega_j$, where $j = 1, \ldots, n$ and $\sum_j N_j = N$. This ensemble is described by the state $\omega = \sum_{j=1}^n p_j \omega_j$, where $p_j = N_j/N$, which describes the effective state of a system that is drawn uniformly at random from the ensemble. We introduce $N$ small, indistinguishable, hollow boxes$^3$, and we put each ensemble system $S_i$ into one of the boxes such that the system is completely isolated from the outside. Furthermore, we assume that the boxes form an ideal gas, which will allow us to use the ideal gas laws in the following derivation. This gas will be called the $\omega$-gas.
$^1$ Some authors have recently begun referring to instruments as operations, but long-standing convention in quantum information theory (including [50]) uses the term "operation" for the quantum case of what we are calling transformations (which are completely positive maps). Also, Davies and Lewis [84] define instrument more generally, to allow for continuously-indexed transformations, where we only consider finite collections $T_j$.
$^2$ Our thought experiment is identical to von Neumann's, up to two differences: first, we translate all quantum notions to more general GPT notions; second, while von Neumann implements the transition from (5) to (6) in Figure 2 via sequences of projections, we implement this transition directly via reversible transformations.
We will denote the total thermodynamic entropy of a system by H, with a subscript which may indicate whether it is the total entropy of a gas, which potentially depends both on the states of the GPT systems in the boxes and on the classical degrees of freedom (positions, momenta) of the boxes, or just the entropy of the GPT or of the classical degrees of freedom.
At first we need to investigate how the entropy of the gas and of the ensemble are related to each other because later on, we will only consider the gas. So we consider also a second GPT ensemble $[S'_1, \ldots, S'_N]$ (described by $\omega' \in \Omega_A$) implanted into a gas in the same way. At temperature $T = 0$, the movement of the boxes freezes out and we are left with the GPT ensembles. In this case, the thermodynamic entropies of the gases and the GPT ensembles must satisfy $H_{\omega\text{-gas}} - H_{\omega'\text{-gas}} = H_{\omega\text{-ensemble}} - H_{\omega'\text{-ensemble}}$. Remember that the heat capacity is $C = \delta Q/dT$, and as the gases only differ in their internal systems, which are isolated, $C$ is the same for both gases. With $dH = \delta Q/T$ we thus find that heating both gases from $0$ to $T$ adds the same entropy $\int_0^T (C/T')\, dT'$ to each, so that
$$H_{\omega\text{-gas}}(T) - H_{\omega'\text{-gas}}(T) = H_{\omega\text{-gas}}(0) - H_{\omega'\text{-gas}}(0) = H_{\omega\text{-ensemble}} - H_{\omega'\text{-ensemble}}$$
at every temperature $T$.
The central tool for the thought experiment is a semi-permeable membrane. Whenever a box reaches the membrane, the membrane opens that box and measures the internal system. Depending on the result, a window is opened to let the box pass, or the window remains closed. It is crucial to note that this membrane will not cause problems in the style of Maxwell's demon, as was already discussed by von Neumann himself, because the membrane does not distinguish between its two sides.
Now we begin with the experiment itself; see Figure 2. We consider a state $\omega = \sum_{j=1}^n p_j \omega_j$ where the $\omega_j$ are perfectly distinguishable pure states, and $p_j = N_j/N$, where $N_j$ boxes contain a system in the state $\omega_j$. We assume that the $\omega$-gas is confined in a container of volume $V$. Let there be a second container which is identical to the first one, but empty. The containers are merged together, with the wall of the non-empty container that separates the containers replaced by a semi-permeable membrane which lets only $\omega_1$ pass. At the opposite wall of the non-empty container we insert a semi-permeable membrane which only blocks $\omega_1$. The solid wall in the middle and the outer semi-permeable membrane are moved at constant distance until the solid wall hits the other end.
Once this is accomplished, i.e. in stage 4) in Fig. 2, one container has all $\omega_1$-boxes and the other one contains all the rest. Note that this procedure is possible without performing any work, as can be seen via Dalton's Law [90]: the work needed to push the semi-permeable membrane against the $\omega_1$-gas can be recollected at the other side from the moving solid wall, which is pushed by the $\omega_1$-gas into empty space. Thus we have separated the $\omega_1$-boxes from the rest. We repeat a similar procedure until all the $\omega_j$-gases are separated into separate containers of volume $V$.
Next we compress the containers isothermally to the volumes $p_j V$, respectively. Denoting the pressure by $P$, and using the ideal gas law, we obtain the required work
$$W = -\sum_j \int_V^{p_j V} P \, dV = -\sum_j N_j k_B T \int_V^{p_j V} \frac{dV}{V} = -N k_B T \sum_j p_j \log p_j,$$
where $\log$ denotes the natural logarithm. As the temperature and thus the internal energy remain constant, we extract heat $N k_B T \sum_j p_j \log p_j$.
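The compression step can be checked numerically. The sketch below assumes the ideal-gas pressure $P = N_j k_B T / V$ for container $j$, integrates $P\,dV$ with a simple midpoint rule, and compares the total with the closed form $-N k_B T \sum_j p_j \log p_j$. The function names, the box counts and the temperature are illustrative assumptions.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def compression_work(n_boxes, p, temperature, volume=1.0, steps=100_000):
    """Work done on an ideal gas of n_boxes particles compressed
    isothermally from `volume` to p * volume (midpoint rule)."""
    v_final = p * volume
    dv = (volume - v_final) / steps
    work = 0.0
    for k in range(steps):
        v = v_final + (k + 0.5) * dv
        work += n_boxes * K_B * temperature / v * dv  # P dV with P = N_j k_B T / V
    return work

def total_work_closed_form(counts, temperature):
    """Closed form -N k_B T sum_j p_j log p_j with p_j = N_j / N."""
    n = sum(counts)
    return -n * K_B * temperature * sum(c / n * math.log(c / n) for c in counts)

counts = [600, 300, 100]   # hypothetical numbers N_j of boxes per pure state
T = 300.0                  # hypothetical temperature in kelvin
numeric = sum(compression_work(c, c / sum(counts), T) for c in counts)
closed = total_work_closed_form(counts, T)
print(numeric, closed)
```

The two values agree up to the discretization error of the quadrature, and the work is positive because every $\log p_j$ is negative.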
At this point, we have achieved that every container contains a pure state $\omega_j$. We now transform every $\omega_j$ to another pure state $\omega'$ which we choose to be the same for all containers. This is achieved by opening the boxes and applying a reversible transformation $T_j$ in every container $j$ which satisfies $T_j \omega_j = \omega'$. These transformations exist due to Postulate 1. Since the same transformation $T_j$ is applied to all small boxes in any given container $j$ (without conditioning on the content of the small box), this operation is thermodynamically reversible. Now we merge the containers, ending with a pure $\omega'$-gas in the same condition as the initial $\omega$-gas. This merging is reversible, because the density is not changed and because all states are the same, so one can just put in the walls again. The only step that caused an entropy difference was the isothermal compression. Thus, the difference of the entropies between the $\omega$-gas and the $\omega'$-gas (which are equal to the entropies of the respective GPT ensembles) is $N k_B \sum_j p_j \log p_j$. Therefore $H_{\omega\text{-ensemble}} = H_{\omega'\text{-ensemble}} - N k_B \sum_j p_j \log p_j$. If we assume that pure states have entropy zero, we thus end up with
$$H_{\omega\text{-ensemble}} = -N k_B \sum_j p_j \log p_j$$
and with the following entropy per system of the ensemble:
$$H(\omega) = -k_B \sum_j p_j \log p_j. \qquad (2)$$
In summary, we have made the following assumptions to arrive at this notion of thermodynamic entropy:
Assumptions 1.
(a) Every (mixed) state can be prepared as an ensemble/statistical mixture of perfectly distinguishable pure states.
(b) A measurement that perfectly distinguishes those pure states can be implemented as a semi-permeable membrane, which in particular does not disturb the pure states that it distinguishes.
(c) All pure states can be reversibly transformed into each other.
(d) Thermodynamical entropy $H$ is continuous in the state. (Since ensembles must have rational coefficients $p_j = N_j/N$, we need this to approximate arbitrary states in the thought experiment.)
(e) All pure states have entropy zero.
A generalized version of the thought experiment presented by Petz [42] is applicable to more general decompositions: suppose that $\omega_1, \ldots, \omega_n \in \Omega_A$ are perfectly distinguishable, but not necessarily pure. Let $p_1, \ldots, p_n$ be a probability distribution. Then Petz' thought experiment implies that
$$H\Big(\sum_j p_j \omega_j\Big) = \sum_j p_j H(\omega_j) - k_B \sum_j p_j \log p_j. \qquad (3)$$
The main idea is that steps 1)-5) of von Neumann's thought experiment can be run even if the perfectly distinguishable states $\omega_1, \ldots, \omega_n$ are mixed and not pure (as long as the membrane will still keep them undisturbed). Then the entropy of the state in 5) can be computed by making an additional extensivity assumption: denote the GPT entropy of an $\omega$-ensemble of $N$ particles in a volume $V$ by $H_{\omega\text{-ensemble}}(N, V)$, then this assumption is that
$$H_{\omega\text{-ensemble}}(\lambda N, \lambda V) = \lambda \, H_{\omega\text{-ensemble}}(N, V)$$
for $\lambda \ge 0$. Assuming in addition that the entropy of the $n$ containers adds up, the total entropy of the configuration in step 5) is $N \sum_j p_j H(\omega_j)$, from which Petz obtains (3). While this approach needs this additional extensivity assumption, it does not need to postulate that all pure states can be reversibly transformed into each other (in contrast to von Neumann's version). Under the assumption that all pure states have entropy zero, it reproduces eq. (2) as a special case.
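As a sanity check of the mixing rule $H(\sum_j p_j \omega_j) = \sum_j p_j H(\omega_j) - k_B \sum_j p_j \log p_j$ in the simplest setting, the sketch below uses classical probability theory, where perfectly distinguishable mixed states are probability vectors with disjoint supports and $H$ is the Shannon entropy. Units with $k_B = 1$, the example distributions and the helper names are assumptions made for illustration.

```python
import math

def shannon(dist):
    """Shannon entropy in natural units (k_B = 1), skipping zero entries."""
    return -sum(q * math.log(q) for q in dist if q > 0)

def mix(weights, dists):
    """Convex mixture of probability vectors of equal length."""
    return [sum(p * d[k] for p, d in zip(weights, dists))
            for k in range(len(dists[0]))]

# Two classical mixed states with disjoint supports (perfectly distinguishable).
omega_1 = [0.5, 0.5, 0.0, 0.0, 0.0]
omega_2 = [0.0, 0.0, 0.2, 0.3, 0.5]
p = [0.4, 0.6]

lhs = shannon(mix(p, [omega_1, omega_2]))
rhs = sum(pj * shannon(w) for pj, w in zip(p, [omega_1, omega_2])) + shannon(p)
print(lhs, rhs)
```

In this classical case the rule is the familiar grouping property of Shannon entropy, so both sides agree for any choice of weights.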
We conclude this section with a few comments on the idealizations used in the thought experiments above. The use of gases in which the exact numbers of particles with each internal state is known parallels von Neumann's argument in [41]. We rarely if ever have such precise knowledge of particle numbers in real physical gases, so our argument involves a strong idealization, but one that is common in thermodynamics and that has also been made by von Neumann.$^4$ Although fluctuations in work are significant for small particle numbers, in the thermodynamic limit of large numbers of particles there is concentration about the expected value given, in von Neumann's protocol, by the von Neumann entropy, and therefore our arguments (and von Neumann's) have the most physical relevance in this large-$N$ situation. This is of course true for classical thermodynamics as well; indeed, the use made of the ideal gas law and Dalton's law in von Neumann's argument are additional places where large $N$ is needed if one wants fluctuations to be negligible. We expect finer-grained considerations to be required for a thorough study of fluctuations in finite systems, which is one reason for interest in the additional entropic measures studied in Subsection V.6, but von Neumann's argument does not concern these finer-grained aspects of the thermodynamics of finite systems.

IV. WHY THE "GBIT" IS RULED OUT
In Section II, we have introduced the "gbit", a system for which the state space Ω is a square. Gbits are particularly interesting because they correspond to "one half" of a Popescu-Rohrlich box [83] which exhibits correlations that are stronger than those allowed by quantum theory [70]. One might wonder whether the thought experiments of Section III allow us to define a notion of thermodynamic entropy for the gbit. We will now show that this is not the case, which can be seen as a thermodynamical argument for why we do not see superstrong correlations of the Popescu-Rohrlich type in our universe.
Since not all states of a gbit can be written as a mixture of perfectly distinguishable pure states, von Neumann's original thought experiment cannot be of direct use here. However, we may resort to Petz' version: every mixed state ω of a gbit can be written as a mixture of perfectly distinguishable mixed states, as illustrated in Figure 3. Furthermore, the other crucial assumption on the state space is satisfied, too: for every pair of perfectly distinguishable mixed states, there is an instrument (a "membrane") that distinguishes those states without disturbing them. We even have that all pure states can be reversibly transformed into each other (namely by a rotation of the square).
Thus, we can analyze the behavior of a gbit state space in Petz' version of the thought experiment. Any continuous notion of thermodynamic entropy $H$ consistent with this thought experiment would thus have to satisfy (3). However, we will now show that the gbit does not admit any notion of entropy that satisfies (3). Consider different decompositions of the state $\omega = \frac{1}{2}\omega_a + \frac{1}{2}\omega_b$ in the center of the square, where $\omega_a = p\,\omega_1 + (1 - p)\,\omega_2$ as well as $\omega_b = p\,\omega_3 + (1 - p)\,\omega_4$. It is geometrically clear that every choice of $0 < p < 1$ corresponds to a valid decomposition. We find (applying Eq. (3) to $\omega$ for the first equality, and to $\omega_a$ and $\omega_b$ for the second):
$$H(\omega) = \tfrac{1}{2} H(\omega_a) + \tfrac{1}{2} H(\omega_b) + k_B \log 2 = \tfrac{1}{2}\big[p\,H(\omega_1) + (1-p)\,H(\omega_2) + p\,H(\omega_3) + (1-p)\,H(\omega_4)\big] - k_B\big[p \log p + (1-p)\log(1-p)\big] + k_B \log 2.$$
This expression can never be constant in $p$, no matter what value of entropy of the four pure states $H(\omega_i)$ we assume: the terms involving the $H(\omega_i)$ are linear in $p$, whereas the binary entropy term is strictly concave. Thus, the entropy $H(\omega)$ of the center state $\omega$ is not well-defined, since it depends on the choice of decomposition.
FIG. 3. In an attempt to define a notion of thermodynamic entropy for the gbit, we can decompose any state into perfectly distinguishable states. This is done in two steps, as explained in the main text.
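The dependence on the decomposition can be made explicit numerically. The sketch below applies the mixing rule $H(\sum_j p_j \omega_j) = \sum_j p_j H(\omega_j) - k_B \sum_j p_j \log p_j$ twice to the center state, assuming that $\omega_b = p\,\omega_3 + (1-p)\,\omega_4$ lies on the side opposite to $\omega_a$, and working in units with $k_B = 1$. The function names and the trial values for the pure-state entropies are illustrative assumptions.

```python
import math

def binary_entropy(p):
    """-(p log p + (1 - p) log(1 - p)) for 0 < p < 1, natural logarithm."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def center_entropy(p, h_pure, k_b=1.0):
    """Entropy assigned to the centre state by the two-step decomposition
    with parameter p, given trial entropies h_pure of the four pure states."""
    h1, h2, h3, h4 = h_pure
    h_a = p * h1 + (1 - p) * h2 + k_b * binary_entropy(p)  # rule applied to omega_a
    h_b = p * h3 + (1 - p) * h4 + k_b * binary_entropy(p)  # rule applied to omega_b
    return 0.5 * h_a + 0.5 * h_b + k_b * math.log(2)       # rule applied to omega

# Two different trial assignments of pure-state entropies.
for h_pure in [(0, 0, 0, 0), (1.0, 0.2, 0.5, 0.3)]:
    values = [center_entropy(p, h_pure) for p in (0.1, 0.3, 0.5)]
    print(h_pure, values)
```

For both trial assignments the three values differ, in line with the argument that a linear function of $p$ cannot cancel the strictly concave binary entropy term.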
In other words, the structure of the gbit state space enforces that any meaningful notion of thermodynamic entropy H will not only be a function of the state, but a function of the ensemble that represents the state. If a state ω is represented by different ensembles, then this will in general give different values of entropy.
So what goes wrong for the gbit? Clearly, all we can say with certainty is that the combination of assumptions made in von Neumann's thought experiment turns out not to yield a unique notion of entropy, while a deeper physical interpretation seems only possible under further assumptions on the interplay between the gbit and the thermodynamic operations. However, a comparison with quantum theory motivates at least one further speculative attempt at interpretation. In the example above, we have decomposed a state $\omega$ into two perfectly distinguishable states $\omega_a$ and $\omega_b$, which can themselves be decomposed into pairs of perfectly distinguishable states $\omega_1$ and $\omega_2$, or $\omega_3$ and $\omega_4$ respectively. In quantum theory, this would only be possible if $\omega_a$ and $\omega_b$ are orthogonal, which would then imply that all four states $\omega_1, \ldots, \omega_4$ are pairwise orthogonal. This would enforce that there exists a unique projective measurement (a "membrane") that distinguishes all these four states jointly. This membrane could feature in von Neumann's thought experiment (or other similar thermodynamical settings), yielding a unique notion of thermodynamic entropy.
On the other hand, in the gbit, the four pure states $\omega_1, \ldots, \omega_4$ are not jointly perfectly distinguishable. Hence there is no canonical choice of "membrane" that could be used in the thought experiment to define a unique natural notion of entropy for the gbit states. Entropy will be "contextual", depending on the choice of membrane or ensemble decomposition that is used in any given specific thermodynamical setting. Therefore, the implication "pairwise distinguishability $\Rightarrow$ joint distinguishability", which is true for quantum theory, has thermodynamic relevance. This implication, if suitably interpreted, leads to the "exclusivity principle" [7,8,91], namely that the sum of the probabilities of pairwise exclusive propositions cannot exceed 1 (in this case these propositions correspond to the outcomes of the jointly distinguishing measurement). This suggests that the exclusivity principle, which has so far been considered only in the realm of contextuality, may be thermodynamically relevant. This observation is also closely related to the notion of "dimension mismatch" described in [82], and to orthomodularity in quantum logic (see for example [23]).
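One way to see why the four corners cannot be jointly distinguished is a linear-dependence argument, sketched below. With the corners labelled cyclically (an assumption of this example), the two diagonals have the same midpoint, so every linear effect $e$ must satisfy $e(\omega_1) + e(\omega_3) = e(\omega_2) + e(\omega_4)$, which is incompatible with $e_i(\omega_j) = \delta_{ij}$. The helper names are hypothetical.

```python
# Corners of the gbit square, labelled cyclically (an assumed labelling).
PURE = {1: (1, 0, 0), 2: (1, 1, 0), 3: (1, 1, 1), 4: (1, 0, 1)}

def add(v, w):
    return tuple(a + b for a, b in zip(v, w))

# Both diagonals sum to the same vector: omega_1 + omega_3 = omega_2 + omega_4.
diag_13 = add(PURE[1], PURE[3])
diag_24 = add(PURE[2], PURE[4])

def consistent_with_linearity(targets):
    """targets[j] is the desired value e(omega_j); linearity together with
    the relation above forces e(w1) + e(w3) == e(w2) + e(w4)."""
    return targets[1] + targets[3] == targets[2] + targets[4]

# Desired values e_i(omega_j) = delta_ij for a jointly distinguishing measurement.
delta = {i: {j: int(i == j) for j in PURE} for i in PURE}
feasible = {i: consistent_with_linearity(delta[i]) for i in PURE}

# A two-outcome effect separating one side of the square from the opposite side.
side_effect_ok = consistent_with_linearity({1: 1, 2: 1, 3: 0, 4: 0})
print(diag_13 == diag_24, feasible, side_effect_ok)
```

None of the four required effects is compatible with linearity, whereas an effect separating two opposite sides is, matching the statement that every pair of pure states is distinguishable but no four-outcome measurement distinguishes all corners at once.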

V.1. The two postulates
In this section we introduce the two postulates that express key operational concepts from thermodynamics. The first postulate is motivated by the universality of thermodynamics and the distinction between microscopic and macroscopic behaviour. At first we consider the universality of thermodynamics, in the sense that thermodynamics is a very general theory whose basic principles can be applied to many possible implementations, as already noticed by N. Carnot [44]: "In order to consider in the most general way the principle of the production of motion by heat, it must be considered independently of any mechanism or any particular agent. It is necessary to establish principles applicable not only to steam engines but to all imaginable heat-engines, whatever the working substance and whatever the method by which it is operated." Recalling von Neumann's thought experiment in the case of quantum theory, we can think of thermodynamical protocols (which will ultimately also include heat engines) as acting on a given ensemble, defined as a probabilistic mixture of pure states chosen from a fixed basis. If we interpret ensembles with different choices of basis as different "working substances", then Carnot's principle should apply: protocols that can be implemented on one ensemble (say, ensemble 1) can also be implemented on the other (say, ensemble 2).^5 In quantum theory, this universality is ensured by the existence of unitary transformations: all orthonormal bases can be translated into each other by a unitary and therefore reversible map. In this sense, the state of ensemble 1 can in principle be transferred to ensemble 2, then the thermodynamic protocol of ensemble 2 can be performed (if we have also transformed the projectors describing the membranes accordingly), and then one can transform back. Even if this cannot always be achieved in practice, the corresponding unitary symmetry of the quantum state space (considered as passive transformations between different descriptions) enforces the aforementioned universality.^6 This universality of implementation, as well as independence of the choice of labels and descriptions, should continue to hold in all generalized theories that we consider. An orthonormal basis from quantum theory is nothing else than a set of perfectly distinguishable pure states, i.e. an n-frame. Therefore, in our generalized theories, we expect that this universality of implementation is achieved by the existence of reversible transformations that, in analogy to unitary maps, transform any given n-frame into any other:

Postulate 1: For each $n \in \mathbb{N}$, all sets of $n$ perfectly distinguishable pure states are equivalent. That is, if $\{\omega_1, \ldots, \omega_n\}$ and $\{\varphi_1, \ldots, \varphi_n\}$ are two such sets, then there exists a reversible transformation $T$ with $T\omega_j = \varphi_j$ for all $j$.

Furthermore, Postulate 1 expresses a physical property that is crucial for thermodynamics: that of microscopic reversibility. Many characteristic properties of thermodynamics arise from limited experimental access to the microscopic degrees of freedom, which by themselves undergo reversible time evolution. This reversibility, for example, forbids evolving two microstates into one, which is at the heart of the nondecrease of entropy. If the experimenter had full access to the microscopic degrees of freedom, then he or she could convert any state of maximal knowledge to any other one as long as he or she preserved distinguishability. Postulate 1 formalizes this microscopic basis of thermodynamics by demanding the existence of "enough" distinguishability-preserving, microscopic transformations $T$, which can be understood as reversible time evolutions.

^5 Here we only consider ensembles of identical Hilbert space dimensions. If the dimensions are different (say, 2 versus 3), then one can implement different sets of protocols on the ensembles (say, ones involving semipermeable membranes that distinguish 3 alternatives in the latter, but not the former case). One could then still discuss a notion of universality in Carnot's spirit, by referring to the equivalence of, say, a state space with $N = 3$ alternatives to a subspace of a state space with $N = 2 \times 2$ alternatives, but we will not discuss this further here.

^6 In classical thermodynamics, the analog of a choice of basis is the labelling of the distinguishable configurations. Clearly, the availability of thermodynamic protocols does not change under relabelling.
Postulate 1 has substantial information-theoretical justifications and consequences. The basic concepts of both thermodynamics and information processing are independent of the choice of implementation. For information processing this is formalized by the Turing machine which admits a multitude of physical realizations. Perfectly distinguishable pure states can be taken as bits, and Postulate 1 expresses that all bits (or their higher-dimensional analogues) are equivalent. It is for this reason that Postulate 1 was called generalized bit symmetry in [34], and its restriction to pairs of distinguishable states was called bit symmetry in [64]. Starting with Landauer's principle, "thermodynamics of computation" [92] has become a fruitful paradigm that relates the two apparently disjoint fields. The two complementary interpretations of Postulate 1 are one instance of this.
Now we turn to our second postulate. We are looking for theories very similar to the thermodynamics we are used to; thus it is essential that we can adopt basic notions of standard thermodynamics unchanged or with only very small alterations. Two such notions of great importance are (Shannon) entropy $S = -k_B \sum_j p_j \log p_j$ and majorization theory. In classical and quantum thermodynamics, these notions operate on the coefficients in a decomposition of a state into perfectly distinguishable pure states (in quantum theory, the eigenvalues). In order to not change thermodynamic theory too much, we would also like this to be possible in our more general state spaces. Thus, we demand that every state has a convex decomposition into perfectly distinguishable pure states.
Note that this was indeed one of our assumptions in von Neumann's thought experiment in Section III. There, it allowed us to realize any state $\omega$ as a "quasiclassical ensemble", i.e. as an ensemble of states that behave like classical labels. This gives us a further justification of our second postulate: thermodynamic (thought) experiments require that states have an ensemble interpretation. An unambiguous notion of "counting of microstates" demands that the ensembles consist of perfectly distinguishable, pure states. Without this, obtaining a phenomenological thermodynamics for which the theory is the underlying microscopic theory seems problematic. Thus, our second postulate is

Postulate 2: Every state $\omega \in \Omega_A$ has a convex decomposition $\omega = \sum_j p_j \omega_j$ into perfectly distinguishable pure states $\omega_j$.
It is tempting to interpret the two postulates as reflecting the microscopic and the macroscopic aspects of thermodynamics, respectively: while Postulate 1 describes microscopic reversibility of the pure states that may describe single particles in thermodynamics, Postulate 2 ensures that mixed states can be interpreted macroscopically as descriptions of quasiclassical ensembles, composed of a large number of particles that are separately in unknown but distinguishable microstates.
We will not introduce any further postulates. In particular, we will not make any assumptions on the composition of systems. All our results are therefore independent from notions like tomographic locality [51] (which is arguably dispensable in many important situations [81]) or purification [56] (which is a rather strong assumption); we do not assume either of the two.

V.2. Some consequences of Postulates 1 and 2
Postulates 1 and 2 have been analyzed in [43], but in a different context: instead of investigating thermodynamics, the goal in [43] was to obtain a reconstruction of quantum theory, by supplementing Postulates 1 and 2 with further postulates. Some of the insights from [43] will be important here, and are therefore briefly discussed below. Starting with Subsection V.4, we will also obtain new results that are interesting in a thermodynamic context.
Moreover, the cone of unnormalized states becomes self-dual with this choice of inner product. In particular, every effect $e$ can be taken as a vector in $A_+$, such that $e(\omega) = \langle e, \omega \rangle$. In standard quantum theory, this is the Hilbert-Schmidt inner product on the real vector space of Hermitian matrices: $\langle X, Y \rangle = \mathrm{tr}(XY)$ for $X = X^\dagger$, $Y = Y^\dagger$.
Quantum theory has more structure: the convex set of density matrices $\Omega_A$ has faces^7, and these faces are in one-to-one correspondence to subspaces of Hilbert space (namely, a face $F$ contains all density matrices that have support on the corresponding Hilbert subspace). To every face $F$, we can associate a number $|F|$ which is the dimension of the corresponding Hilbert subspace, and $F \subsetneq G$ implies $|F| < |G|$. Every face $F$ is generated by $|F|$ pure and perfectly distinguishable states in $F$ (an $|F|$-frame in $F$), and every (smaller) frame that is a subset of $F$ can be completed, or extended, to a frame which has $|F|$ elements and thus generates $F$.
In all theories that satisfy Postulates 1 and 2, all these properties hold in complete analogy [43]. However, since faces no longer correspond to Hilbert spaces, the numbers $|F|$ do not have an interpretation as the dimension of a subspace. Instead, we call $|F|$ the rank of $F$. If von Neumann's thought experiment is supposed to make sense for these theories, we need a way to formalize the working of a semipermeable membrane, which in quantum theory is done via projective measurements.
Since we are dealing with unnormalized states, the corresponding analog in GPTs will be formulated in terms of the set of unnormalized states $A_+$. As one can see in the case of the gbit, it is not automatic that we have any notion of "projective measurements" for any given state space. However, Postulates 1 and 2 turn out to ensure that projective measurements exist. For any face $F$ of $A_+$ (the non-negative multiples of the corresponding face of $\Omega_A$), consider the orthogonal projector $P_F$ onto the span of $F$. One can show that $P_F$ is positive, i.e. maps (unnormalized) states to (unnormalized) states [43]. Moreover, $P_F$ does not disturb the states in the face $F$.
Thus, to a given set of mutually orthogonal faces $F_1, \ldots, F_m$ such that $|F_1| + \ldots + |F_m| = N_A$, we can associate an instrument with transformations $T_i := P_{F_i}$, which describes a projective measurement, as in a semipermeable membrane. Transformation $T_i$ leaves the states in face $F_i$ unperturbed, but fully blocks out states in the other faces, i.e. $T_i \omega = 0$ for $\omega \in F_j$, $i \neq j$. In standard quantum theory, these transformations are $P_{F_i} \rho = \pi_i \rho \pi_i$, where $\pi_i$ is the orthogonal Hilbert space projector onto the $i$-th Hilbert subspace. The rank condition becomes $\mathrm{tr}(\pi_1) + \ldots + \mathrm{tr}(\pi_m) = N_A$ (the total Hilbert space dimension), and mutual orthogonality is $\pi_i \pi_j = \delta_{ij} \pi_i$. We will show in Subsection V.4 that the mutually orthogonal faces replace the eigenspaces from quantum theory and that the projective measurement described here can be interpreted as measuring an observable.
The Hilbert space projector $\pi_i$ therefore also has an interpretation as an effect in standard quantum theory: it yields the probability of outcome $i$ in the projective measurement on a state $\rho$, namely $\mathrm{tr}(\pi_i \rho)$. The analogous effect in a GPT that satisfies Postulates 1 and 2, corresponding to a face $F$, is
$$u_F := P_F u_A$$
(identifying the effect $u_A$ with a vector via the inner product). The effect $u_F$ is sometimes called the "projective unit" of $F$. In quantum theory, we can write $\pi_i = \sum_j |\psi_j\rangle\langle\psi_j|$, where the $|\psi_j\rangle$ are an orthonormal basis of the corresponding Hilbert subspace. The same turns out to be true in our GPTs: we have
$$u_F = \sum_{j=1}^{|F|} \omega_j, \tag{4}$$
where $\omega_1, \ldots, \omega_{|F|}$ is any frame that generates $F$. Therefore, the probability to obtain outcome $i$ in the projective measurement above on state $\omega$ is $\langle u_{F_i}, \omega \rangle = \langle u_A, P_{F_i} \omega \rangle$.

V.3. State spaces satisfying Postulates 1 and 2
It is easy to see that both quantum and classical state spaces satisfy Postulates 1 and 2. By a "classical state space", we mean a state space that consists of discrete probability distributions. Concretely, for any number $N \in \mathbb{N}$ of mutually exclusive alternatives, consider the state space
$$\Omega_N = \Big\{ (p_1, \ldots, p_N) \;\Big|\; p_i \ge 0,\ \sum_i p_i = 1 \Big\}.$$
Any pure state is given by a deterministic probability vector, i.e. $\omega_i = (0, \ldots, 0, 1, 0, \ldots, 0)$ (where 1 is in the $i$-th place). If we have two equally sized sets of such vectors (as in Postulate 1), then there is always a permutation that maps one set to the other. In fact, the reversible transformations correspond to the permutations of the entries. Postulate 2 is then simply the statement that
$$p = (p_1, \ldots, p_N) = \sum_{i=1}^N p_i \omega_i.$$

Which state spaces are there, in addition to standard complex quantum theory and classical probability theory, that satisfy Postulates 1 and 2? We think that this question is very difficult to answer. Thus, we formulate the following

Open Problem 1. Classify all state spaces that satisfy Postulates 1 and 2.
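Returning to the classical state spaces described at the start of this subsection, the following is a minimal sketch of how the two postulates can be checked by hand for N = 4. The helper names (`pure_state`, `frame_permutation`, `apply`, `decompose`) are illustrative assumptions and do not come from the paper or from any library.

```python
def pure_state(i, n):
    """Deterministic probability vector with a 1 in position i."""
    return tuple(1.0 if k == i else 0.0 for k in range(n))


def frame_permutation(frame_a, frame_b, n):
    """Return a permutation sigma of range(n) with sigma[a_j] = b_j.

    frame_a and frame_b are equally long lists of distinct indices;
    index i stands for the pure state pure_state(i, n).
    """
    sigma = dict(zip(frame_a, frame_b))
    # Extend to a full permutation: unused sources go to unused targets.
    rest_src = [i for i in range(n) if i not in frame_a]
    rest_dst = [i for i in range(n) if i not in frame_b]
    sigma.update(zip(rest_src, rest_dst))
    return [sigma[i] for i in range(n)]


def apply(sigma, p):
    """Reversible transformation: move the weight at entry i to entry sigma[i]."""
    q = [0.0] * len(p)
    for i, weight in enumerate(p):
        q[sigma[i]] = weight
    return tuple(q)


def decompose(p):
    """Postulate 2: write p as sum_i p_i * pure_state(i)."""
    n = len(p)
    return [(p_i, pure_state(i, n)) for i, p_i in enumerate(p) if p_i > 0]


n = 4
# Postulate 1: map the 2-frame (omega_0, omega_1) to the 2-frame (omega_3, omega_2).
sigma = frame_permutation([0, 1], [3, 2], n)
mapped = [apply(sigma, pure_state(i, n)) for i in (0, 1)]

# Postulate 2: a mixed state is a convex combination of deterministic vectors.
p = (0.5, 0.25, 0.25, 0.0)
recombined = tuple(sum(w * s[k] for w, s in decompose(p)) for k in range(n))
```

The sketch only illustrates the classical case; it says nothing about Open Problem 1, which concerns state spaces beyond the classical and quantum ones.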
From the results in [43], we know which state spaces satisfy Postulates 1 and 2 and one additional property: the absence of third-order interference. The notion of higher-order interference has been introduced by Sorkin [76], and has since been the subject of intense theoretical [93,95,96] and experimental [97][98][99][100][101][102] interest.
For the main idea, think of three mutually exclusive alternatives in quantum theory (such as three slits in a triple-slit experiment), described by orthogonal projectors $\pi_1, \pi_2, \pi_3$. The event that alternative 1 or alternative 2 takes place is described by the projector $\pi_{12} = \pi_1 + \pi_2$; similarly, we have $\pi_{13}$, $\pi_{23}$ and $\pi_{123}$. Their actions on density matrices are described by superoperators $\rho \mapsto P_{12}(\rho) := \pi_{12} \rho \pi_{12}$ (and similarly for the other projectors). As a consequence, we obtain that $P_{12} \neq P_1 + P_2$, which expresses the phenomenon of interference. However, it is easy to check that
$$P_{123} = P_{12} + P_{13} + P_{23} - P_1 - P_2 - P_3, \tag{5}$$
which means that interference over three alternatives can be reduced to contributions from interferences of pairs of alternatives. Similar identities hold for an arbitrary number $n \ge 4$ of alternatives: quantum theory admits only pairwise interference, and no "third-order interference" which would be characterized by a violation of this equality. In the context of Postulates 1 and 2, we have an analogous notion of orthogonal projectors, and thus we can consider (5) and its generalization to $n \ge 4$ alternatives on a state space with $N \ge n$ perfectly distinguishable states. Postulating this "absence of third-order interference" in addition to Postulates 1 and 2 gives us the following:

Theorem 2 (Lemma 33 in [43]). The possible state spaces which satisfy Postulates 1 and 2 and which do not admit third-order interference, in addition to classical state spaces, are the following. First, for $N \ge 4$ perfectly distinguishable states, there are only three possibilities:
• Standard complex quantum theory.
• Quantum theory over the real numbers. That is, only real entries are allowed in the N × N density matrices.
• Quantum theory over the quaternions. The state spaces are the self-adjoint $N \times N$ quaternionic matrices of unit trace.
For N = 3 perfectly distinguishable states, all of the above and one exceptional solution are possible, namely quantum theory over the octonions (but only for the case of 3 × 3 unit trace density matrices). Mathematically, these examples correspond to the state spaces of the finite-dimensional irreducible formally real Jordan algebras [24,43]. We do not know whether there are theories that satisfy Postulates 1 and 2 but admit higher-order interference and therefore do not appear on this list. In Theorem 12, we will show that the question whether a theory has third-order interference is related to the properties of its Rényi entropies.
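The identity (5) for three alternatives can be checked numerically in the quantum case. The sketch below assumes three "slits" represented by the coordinate projectors of a three-dimensional real Hilbert space and an arbitrarily chosen real symmetric density matrix `rho`; all function names are hypothetical and chosen for this illustration only.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def projector(slits, n=3):
    """Diagonal projector onto the span of the basis vectors listed in `slits`."""
    return [[1.0 if (i == j and i in slits) else 0.0 for j in range(n)]
            for i in range(n)]


def P(slits, rho):
    """Superoperator rho -> pi rho pi for the projector pi of the open slits."""
    pi = projector(slits)
    return matmul(matmul(pi, rho), pi)


def signed_sum(terms):
    """Entrywise sum of sign * matrix over a list of (sign, matrix) pairs."""
    n = len(terms[0][1])
    return [[sum(sign * m[i][j] for sign, m in terms) for j in range(n)]
            for i in range(n)]


# An arbitrary real density matrix with coherences between all three slits.
rho = [[0.5, 0.2, 0.1],
       [0.2, 0.3, 0.05],
       [0.1, 0.05, 0.2]]

# Second-order interference term: P_12 - P_1 - P_2 (nonzero in quantum theory).
second_order = signed_sum([(1, P({0, 1}, rho)), (-1, P({0}, rho)), (-1, P({1}, rho))])

# Third-order term: P_123 - P_12 - P_13 - P_23 + P_1 + P_2 + P_3 (vanishes).
third_order = signed_sum([
    (1, P({0, 1, 2}, rho)),
    (-1, P({0, 1}, rho)), (-1, P({0, 2}, rho)), (-1, P({1, 2}, rho)),
    (1, P({0}, rho)), (1, P({1}, rho)), (1, P({2}, rho)),
])
```

Here `second_order` retains the off-diagonal coherence between the first two slits, while every entry of `third_order` is zero, as (5) requires.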

V.4. Observables and diagonalization
A central part of physics are observables and how they can be measured. In standard quantum theory, we can introduce observables in two different ways, which both equivalently lead to the prescription that observables are described by Hermitian operators/matrices.
First, in finite dimensions, we can characterize observables as those objects that linearly assign real expectation values to states. In the case of quantum theory it follows that observables are represented by matrices X, and Hermiticity X = X † implies that expectation values tr(ρX) are always real. Linearity is enforced by the statistical interpretation of states, for the same reason that effects in GPTs are linear.
Second, we can introduce observables by saying that there is a projective measurement $\pi_1, \ldots, \pi_n$ that measures this observable, and which has outcomes $x_1, \ldots, x_n \in \mathbb{R}$. This leads to the Hermitian operator $X = \sum_{i=1}^n x_i \pi_i$. Since every Hermitian operator can be diagonalized, these two definitions are equivalent.
Our two postulates provide the structure to introduce observables in a completely analogous way. First, using the inner product, we can define observables as linear maps of the form $\omega \mapsto \langle x, \omega \rangle$ and thus identify them with elements $x \in A$ of the vector space that carries the states (as in quantum theory, where this vector space is the space of Hermitian matrices). As noticed in [62], every such vector has a representation of the form
$$x = \sum_i x_i u_i,$$
where the $u_i$ are projective units corresponding to mutually orthogonal faces $F_i$, $x_i \in \mathbb{R}$, and $x_i \neq x_j$ for $i \neq j$. The analogy with quantum theory goes even further: due to (4), we have $x = \sum_i x_i \sum_j \omega_j^{(i)}$, where $\omega_1^{(i)}, \ldots, \omega_{|F_i|}^{(i)}$ is any frame that generates $F_i$, in analogy to $X = \sum_i x_i \sum_j |\psi_j^{(i)}\rangle\langle\psi_j^{(i)}|$ in standard quantum theory. In analogy to quantum theory we will call the $F_i$ eigenfaces and the $x_i$ eigenvalues. To further justify this terminology, note that the $x_i$ are eigenvalues of the map $\sum_i x_i P_i$, where $P_i$ are the orthogonal projectors onto the spans of the faces $F_i$.

Theorem 3. If Postulates 1 and 2 hold, then every element $x \in A$ has a representation of the form $x = \sum_{j=1}^n x_j u_j$, where the $x_j \in \mathbb{R}$ are pairwise different and the $u_j$ are the projective units of pairwise orthogonal faces $F_j$ such that $\sum_j u_j = u_A$. This decomposition $x = \sum_{j=1}^n x_j u_j$ is unique up to relabelling. In analogy to quantum theory, we will call the $x_j$ eigenvalues and the $F_j$ eigenfaces.
Furthermore, for every real function $f$ with suitable domain of definition, we can define
$$f(x) := \sum_{j=1}^n f(x_j)\, u_j \tag{7}$$
as in spectral calculus. If $P_j$ is the orthogonal projector onto the span of $F_j$, then $(P_1, \ldots, P_n)$ is a well-defined instrument with induced measurement $(u_1, \ldots, u_n)$ which leaves the elements of $\mathrm{span}(F_j)$ invariant: $P_j \omega = \omega$ for all $\omega \in \mathrm{span}(F_j)$. In analogy to quantum theory, we will call this instrument the projective measurement of the observable $x$.
We will give a proof in the appendix.^8 Eq. (7) allows us to define a notion of entropy, in full analogy to quantum mechanics.

Definition 4 (Spectral entropy). If $A$ is a state space that satisfies Postulates 1 and 2, we define the spectral entropy for any state $\omega \in \Omega_A$ as
$$S(\omega) := -\sum_i p_i \log p_i,$$
where $\omega = \sum_i p_i \omega_i$ is any convex decomposition of $\omega$ into pure and perfectly distinguishable states $\omega_i$, and $0 \log 0 := 0$.
Theorem 3 tells us that this definition is independent of the choice of decomposition: it is easy to check that
$$S(\omega) = -\langle \omega, \log \omega \rangle,$$
where $\log \omega$ is understood in the sense of spectral calculus as in (7). The right-hand side is manifestly independent of the decomposition. It can also be written $S(\omega) = u_A(\eta(\omega))$, where $\eta(x) = -x \log x$ for $x > 0$ and $\eta(0) = 0$. In particular, $\omega$ is a pure state $\Leftrightarrow$ $S(\omega) = 0$.
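To make Definition 4 concrete in the familiar quantum case, the following sketch evaluates the spectral entropy of a real qubit density matrix from its eigenvalues, which are the coefficients of its decomposition into perfectly distinguishable (orthogonal) pure states. The restriction to real symmetric 2 × 2 matrices and the function names are assumptions made for brevity; entropies are in nats.

```python
import math


def shannon(probs):
    """Shannon entropy in nats, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def spectral_weights(a, b, d):
    """Eigenvalues of the real symmetric density matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    # Clamp tiny negative round-off in the smaller eigenvalue.
    return (mean + radius, max(mean - radius, 0.0))


def spectral_entropy(a, b, d):
    """S(omega) = -sum_i p_i log p_i over the spectral weights p_i."""
    return shannon(spectral_weights(a, b, d))


s_pure = spectral_entropy(0.5, 0.5, 0.5)     # |+><+|, a pure state
s_mixed = spectral_entropy(0.5, 0.0, 0.5)    # maximally mixed state
s_omega = spectral_entropy(0.75, 0.25, 0.25) # (|0><0| + |+><+|) / 2

# The weights (1/2, 1/2) of the non-orthogonal decomposition of the last
# state are NOT its spectral weights, so their Shannon entropy differs.
naive = shannon((0.5, 0.5))
```

The last comparison illustrates why the definition insists on perfectly distinguishable pure states: the mixture of the non-orthogonal states |0⟩ and |+⟩ with equal weights has spectral entropy strictly below log 2.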

V.5. Thermodynamics in the context of Postulates 1 and 2
If a state space satisfies Postulates 1 and 2, then it also satisfies all the assumptions that we have made in von Neumann's thought experiment. It is easy to check all items in Assumptions 1: (a) is simply Postulate 2, and (c) is a consequence of Postulate 1. As we have seen in the previous section, our two postulates imply that we have orthogonal projectors sharing important properties with those of standard quantum theory. If we make the physical assumption that we can actually implement them by means of semipermeable membranes (as in quantum theory), we obtain (b). Item (e) is the same as (8). Note that assumption (d) is not a mathematical assumption about the state space, but a physical assumption about thermodynamic entropy. This shows part of the following (the full proof will be given in the appendix):

^8 This can also be obtained by combining the fact that Postulates 1 and 2 imply the state space is projective (first part of Theorem 17 in [43]) and self-dual (Proposition 3 in [43]) with results such as Theorem 8.64 in [24].

Observation 5. Von Neumann's thought experiment, as explained in Section III, can be run for every state space that satisfies Postulates 1 and 2. The notion of thermodynamic entropy $H$ that one obtains from that thought experiment turns out to equal spectral entropy $S$ as given in Definition 4,

H(ω) = S(ω) for all states ω.
This is consistent with Assumptions 1. Furthermore, it is also consistent with Petz' version of the thought experiment, because spectral entropy satisfies
$$S(\omega) = \sum_j p_j S(\omega_j) - \sum_j p_j \log p_j$$
for every convex decomposition $\omega = \sum_j p_j \omega_j$ of $\omega$ into perfectly distinguishable, not necessarily pure states $\omega_j$.
Thus, spectral entropy S gives meaningful and consistent physical predictions in situations like von Neumann's and Petz' thought experiments. However, we clearly do not know whether S is a consistent notion of physical entropy in all thermodynamical situations.
It turns out that there are further properties of S that encourage its physical interpretation as a thermodynamical entropy. In particular, we will now show that the second law holds in two important situations.
We start by considering projective measurements $P_1, \ldots, P_n$. Projective measurements can model semipermeable membranes as in von Neumann's thought experiment, or they describe the measurement of an observable as explained in Subsection V.4. Consider the action of this measurement on a given state $\omega$. With probabilities $(u_A \circ P_j)(\omega)$, this measurement yields the outcome $j$ with post-measurement state $\omega_j := P_j \omega / (u_A \circ P_j)(\omega)$. Performing this measurement on every particle of an ensemble (without learning the outcomes) yields a new ensemble, described by the post-measurement state
$$\omega' = \sum_j (u_A \circ P_j)(\omega)\, \omega_j = \sum_j P_j \omega.$$
Projective measurements do not decrease the entropy of the ensemble:

Theorem 6. Suppose Postulates 1 and 2 are satisfied. Let $P_1, \ldots, P_n$ be orthogonal projectors which form a valid instrument. Then the induced measurement with post-measurement ensemble state $\omega' = \sum_j P_j \omega$ does not decrease entropy: $S(\omega') \ge S(\omega)$.

The proof will be given in the appendix. As in standard quantum theory, projectors $P_j$ form a valid instrument if and only if they are mutually orthogonal, i.e. $P_i P_j = \delta_{ij} P_i$, and complete: $\sum_j u_A \circ P_j = u_A$.

Another important manifestation of the second law is in mixing procedures. As in von Neumann's thought experiment, let the $j$-th tank contain an $N_j$-particle gas that represents an $\omega_j$-ensemble. Furthermore, assume that all the gases are at the same pressure and density. Identifying thermodynamic entropy $H$ with spectral entropy $S$ (as suggested by Observation 5), the entropy of the GPT-ensemble in tank $j$ is $N_j S(\omega_j)$, where $S$ is the entropy per system. Thus the total GPT-entropy is $\sum_j N_j S(\omega_j)$. We remove the walls and let the gases mix. Then we put the walls back in. Now all the tanks contain gases hosting $\sum_j \frac{N_j}{N} \omega_j$-ensembles at the same conditions as before, where $N = \sum_j N_j$. The total GPT-entropy in the end is given by $N\, S\big(\sum_j \frac{N_j}{N} \omega_j\big)$. As the gases in the tanks have the same density, volume, temperature and pressure as before, the only difference in entropy is due to the GPT-ensembles. The second law requires that the entropy does not decrease in this process, i.e. that $\sum_j N_j S(\omega_j) \le N\, S\big(\sum_j \frac{N_j}{N} \omega_j\big)$ and thus $\sum_j \frac{N_j}{N} S(\omega_j) \le S\big(\sum_j \frac{N_j}{N} \omega_j\big)$. The following theorem shows that our two postulates guarantee that this is true:

Theorem 7. Assume Postulates 1 and 2. Then entropy is concave, i.e. for $\omega_1, \ldots, \omega_n \in \Omega_A$ and $p_1, \ldots, p_n$ a probability distribution, we have
$$S\Big(\sum_j p_j \omega_j\Big) \ge \sum_j p_j S(\omega_j). \tag{10}$$

Thus, the second law automatically holds for mixing processes. One way to prove (10) is to see that $S$ equals "measurement entropy" as we will show in Subsection V.6, proven to be concave in [66] and [67]. However, there is a simpler proof that uses a notion of relative entropy, which is an important notion in its own right.
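Definition 8 (Relative entropy). For state spaces $A$ satisfying Postulates 1 and 2, we define the relative entropy of two states $\omega, \varphi \in \Omega_A$ as
$$S(\omega \| \varphi) := -S(\omega) - \langle \omega, \log \varphi \rangle.$$
Here, for $\varphi = \sum_j q_j \varphi_j$ any decomposition into a maximal frame, $\log \varphi = \sum_j \log(q_j)\, \varphi_j$ according to Theorem 3. (As in quantum theory, this can be infinite if there are $q_j = 0$ such that $\langle \omega, \varphi_j \rangle \neq 0$.)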
A notion of relative entropy in GPTs has also been defined in Scandolo's Master thesis [48], but under different assumptions, as discussed in the introduction. Relative entropy continues to satisfy Klein's inequality, a fact that is useful in proving Theorem 7. The proof is similar to that within standard quantum theory and deferred to the appendix.

Theorem 9 (Klein's inequality). For all $\omega, \varphi \in \Omega_A$, the relative entropy satisfies $S(\omega \| \varphi) \ge 0$.
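Klein's inequality can be used to give a simple proof of Theorem 7: for $\omega := \sum_j p_j \omega_j$, applying it in the form $-S(\omega_j) - \langle \omega_j, \log \omega \rangle \ge 0$ to each $\omega_j$ and using $S(\omega) = -\langle \omega, \log \omega \rangle$ gives
$$0 \le \sum_j p_j \big( -S(\omega_j) - \langle \omega_j, \log \omega \rangle \big) = -\sum_j p_j S(\omega_j) - \langle \omega, \log \omega \rangle = S(\omega) - \sum_j p_j S(\omega_j).$$

Given all the calculations in this subsection in terms of orthogonal projections, it may seem at first sight as if every statement or calculation in quantum theory can be analogously made in the more general state spaces that satisfy Postulates 1 and 2. However, this may not quite be true, as the fact that the following is an open problem shows:

Open Problem 2. For state spaces satisfying Postulates 1 and 2, if $\omega$ is a pure state, and $P$ an orthogonal projection, then is $P\omega$ also (up to normalization) a pure state?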
In classical and quantum state spaces, the answer is "yes", but we do not know if a positive answer follows from Postulates 1 and 2 alone. We will return to this problem in Theorem 12.
Note that Chiribella and Scandolo have applied similar techniques and found beautiful results, including some which are comparable to some of ours, in [45,Section 7] (see also [48]). They derive diagonalizability of states from a very different set of postulates.
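The two second-law statements of this subsection (Theorem 6 for projective measurements and Theorem 7 for mixing) can be illustrated numerically in standard quantum theory. The sketch below is restricted, as an assumption for simplicity, to real qubit density matrices and to a measurement in the computational basis; the helper names are hypothetical.

```python
import math


def shannon(probs):
    """Shannon entropy in nats, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def eigenvalues(rho):
    """Eigenvalues of a real symmetric 2x2 matrix ((a, b), (b, d))."""
    (a, b), (_, d) = rho
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return (mean + radius, mean - radius)


def entropy(rho):
    """Spectral entropy of a real qubit density matrix."""
    return shannon(max(p, 0.0) for p in eigenvalues(rho))


def mix(weights, states):
    """Convex combination sum_j p_j * rho_j of 2x2 matrices."""
    return tuple(
        tuple(sum(w * s[i][j] for w, s in zip(weights, states)) for j in range(2))
        for i in range(2)
    )


def measure_z(rho):
    """Post-measurement ensemble state sum_j pi_j rho pi_j in the z basis."""
    return ((rho[0][0], 0.0), (0.0, rho[1][1]))


zero = ((1.0, 0.0), (0.0, 0.0))  # |0><0|
plus = ((0.5, 0.5), (0.5, 0.5))  # |+><+|
weights = (0.5, 0.5)
omega = mix(weights, (zero, plus))

# Theorem 7 (concavity): S(sum_j p_j omega_j) - sum_j p_j S(omega_j) >= 0.
mixing_gain = entropy(omega) - sum(
    w * entropy(s) for w, s in zip(weights, (zero, plus))
)

# Theorem 6: the projective measurement does not decrease entropy.
measurement_gain = entropy(measure_z(omega)) - entropy(omega)
```

Both `mixing_gain` and `measurement_gain` come out non-negative for this example. This is only a consistency check within quantum theory, not evidence for the general theorems.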

V.6. Information-theoretic entropies and their relation to physics
So far we have considered entropy from a thermodynamic perspective. But entropies also arise in information theory, and as the GPT framework is mostly studied in quantum information theory, there have indeed been many results on entropy from an information-theoretic perspective. Our exposition will mainly follow [66], but has also been given in a slightly different formalism in [67]. Consider two measurements $e = (e_1, \ldots, e_n)$ and $f = (f_1, \ldots, f_m)$ that are related by a map $M: \{1, \ldots, n\} \to \{1, \ldots, m\}$ such that $f_k = \sum_{j: M(j) = k} e_j$. If $M$ is bijective, then the measurement $f$ is simply a relabelling of $e$. If there exists a $k$ with $M(j) \neq k$ for all $j$, then because of the normalization of the $e$-measurement, $f_k = 0$, i.e. $f_k$ corresponds to a trivial outcome that never happens. If $M$ is not injective, then $f$ is a coarse-graining of $e$ (or vice versa, $e$ a refinement of $f$) in the sense that $f$ is obtained from $e$ by collecting several outcomes of $e$ and giving them a common new outcome label (and by possibly adding the 0-effect a few times), see Figure 5.
In this sense, we do not care about which of the $e_j$ triggered the new effect. However, there exist trivial refinements/coarse-grainings: for those, $e_j \propto f_{M(j)}$ for all $j$. We write $e_j = p_j f_{M(j)}$. Then such a measurement can be obtained by performing $f$, and if outcome $k$ is triggered, we activate a classical random number generator which generates the final outcome $j$ among those $j$ with $M(j) = k$ with probability $p_j$. Thus, a trivial refinement does not yield any additional information about the GPT-system. We call a measurement fine-grained if it does not have any non-trivial refinements. The set of fine-grained measurements on any state space $A$ is denoted $E^*_A$.

Now we consider the Rényi entropies [65], which are defined for probability distributions $p = (p_1, \ldots, p_n)$ as
$$H_\alpha(p) := \frac{1}{1 - \alpha} \log \Big( \sum_j p_j^\alpha \Big) \quad \text{for } \alpha \in (0, 1) \cup (1, \infty),$$
with the limiting cases $H_0(p) := \log |\mathrm{supp}\, p|$ and $H_\infty(p) := -\log \max_j p_j$; the limit $H_1(p) := \lim_{\alpha \to 1} H_\alpha(p) = -\sum_j p_j \log p_j$ is just the regular Shannon entropy $H$. For $\alpha \in [0, \infty]$ and GPTs satisfying Postulates 1 and 2, we generalize the classical Rényi entropies:
$$S_\alpha(\omega) := H_\alpha(p),$$
where $\omega = \sum_j p_j \omega_j$ is any decomposition into perfectly distinguishable pure states. According to Theorem 3, the result is independent of the choice of decomposition. We have $S_1 = S$, the spectral entropy of Definition 4.
Following [66], for every $\alpha \in [0, \infty]$ and $\omega \in \Omega_A$, we define the order-$\alpha$ Rényi measurement entropy as
$$\widehat{S}_\alpha(\omega) := \inf_{e \in E^*_A} H_\alpha\big( e_1(\omega), \ldots, e_n(\omega) \big),$$
where $H_\alpha$ on the right-hand side denotes the classical Rényi entropy. The order-$\alpha$ Rényi decomposition entropy is defined as
$$\check{S}_\alpha(\omega) := \inf_{\omega = \sum_j q_j \varphi_j} H_\alpha(q),$$
where the infimum is over all convex decompositions of $\omega$ into pure states $\varphi_j \in \Omega_A$.

The idea of measurement entropy is to characterize the state before a measurement. For example, in quantum theory, particles prepared in a state $|\psi\rangle$ which all give the same result in energy measurements would be said to be in an energy eigenstate. If instead we performed a position measurement, the resulting distribution of positions would have non-zero entropy. However, this entropy would arguably not come from the initial state, but from the measurement process itself due to the uncertainty principle.
Suppose we would like to prepare a state $\omega$ by using states of maximal knowledge (i.e. pure states) $\varphi_j$, and a random number generator which gives output $j$ with probability $p_j$. Then the decomposition entropy quantifies the smallest information content (entropy) of a random number generator that would be necessary to build such a device. For more detailed operational interpretations of measurement and decomposition entropy, in particular for $\alpha = 1$, see [66,67].

Note that in quantum theory, measurement, decomposition and spectral Rényi entropies all coincide, with the $\alpha = 1$ case giving von Neumann entropy, $S(\omega) = -\mathrm{tr}(\omega \log \omega)$.
Our first result is that the spectral and measurement definitions of the entropies agree:

Theorem 10. Consider any state space $A$ which satisfies Postulates 1 and 2. Then the Rényi entropies $S_\alpha$ and the Rényi measurement entropies $\widehat{S}_\alpha$ coincide, and upper-bound the Rényi decomposition entropy $\check{S}_\alpha$, i.e.
$$\check{S}_\alpha(\omega) \le \widehat{S}_\alpha(\omega) = S_\alpha(\omega) \quad \text{for all } \omega \in \Omega_A.$$
In particular, for $\alpha = 1$, the measurement entropy $\widehat{S}$ is the same as the spectral entropy $S$ from Definition 4, which we have identified with thermodynamical entropy $H$ in Observation 5.

The inequality $\check{S}_\alpha \le S_\alpha$ is easy to see: for a decomposition $\omega = \sum_i p_i \omega_i$ into perfectly distinguishable pure states $\omega_i$, the states $\omega_i$ can also be seen as a fine-grained measurement, yielding outcome probabilities $p_i$. So taking the infimum over all decompositions gives at most $H_\alpha(p) = S_\alpha(\omega)$. The equality between $\widehat{S}_\alpha$ and $S_\alpha$ is shown in the appendix.
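Theorem 11. Consider any state space $A$ which satisfies Postulates 1 and 2. Then $\check{S}_2 = S_2$ and $\check{S}_\infty = S_\infty$.

Proof. To give the reader an idea of the kind of arguments involved, we present the proof for $S_2$, but defer the proof for $S_\infty$ to the appendix. If $\omega = \sum_j p_j \omega_j$ is any convex decomposition into a maximal set of perfectly distinguishable pure states (without loss of generality $p_1 \ge p_2 \ge \ldots$), and $\omega = \sum_j q_j \varphi_j$ any (other) convex decomposition into pure states $\varphi_j$ (also with $q_1 \ge q_2 \ge \ldots$), then
$$\sum_j p_j^2 = \langle \omega, \omega \rangle = \sum_{j,k} q_j q_k \langle \varphi_j, \varphi_k \rangle \ge \sum_j q_j^2,$$
since $\langle \varphi_j, \varphi_j \rangle = 1$ and $\langle \varphi_j, \varphi_k \rangle \ge 0$. Hence $S_2(\omega) = -\log \sum_j p_j^2 \le -\log \sum_j q_j^2$, and since $\check{S}_2(\omega)$ is defined as the infimum over the right-hand side, we obtain that $\check{S}_2(\omega) \ge S_2(\omega)$; we find the converse inequality in Theorem 10.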
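The classical Rényi entropies and the inequality used in the proof above can be illustrated with a small qubit example. This is a sketch under stated assumptions: the function `renyi` is a hypothetical helper implementing the textbook definition of the classical Rényi entropy (in nats), and the state is the equal mixture of the non-orthogonal pure states |0⟩ and |+⟩, whose eigenvalues are 1/2 ± √(1/8).

```python
import math


def renyi(probs, alpha):
    """Classical Renyi entropy H_alpha in nats, for alpha in [0, infinity]."""
    probs = [p for p in probs if p > 0]
    if alpha == 0:
        return math.log(len(probs))                   # max-entropy
    if alpha == 1:
        return -sum(p * math.log(p) for p in probs)   # Shannon entropy
    if alpha == math.inf:
        return -math.log(max(probs))                  # min-entropy
    return math.log(sum(p ** alpha for p in probs)) / (1 - alpha)


# Spectral weights of omega = (|0><0| + |+><+|) / 2.
p = (0.5 + math.sqrt(0.125), 0.5 - math.sqrt(0.125))
# Weights of the non-orthogonal pure-state decomposition of the same state.
q = (0.5, 0.5)

s2_spectral = renyi(p, 2)        # S_2(omega)
h2_decomposition = renyi(q, 2)   # H_2 of this particular decomposition
```

For this state, the non-orthogonal decomposition has larger order-2 entropy than the spectral one, in line with the claim that no pure-state decomposition can go below $S_2(\omega)$.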
We do not know whether the same identity holds for the most interesting case $\alpha = 1$, the case of standard thermodynamic entropy $S = S_1$. In the max-entropy case $\alpha = 0$, however, we have a surprising relation to higher-order interference:

Theorem 12. Consider a state space satisfying Postulates 1 and 2. Then the following statements are all equivalent:
(i) The state space does not have third-order interference.
(ii) The decomposition and spectral max-entropies coincide, i.e. $\check{S}_0(\omega) = S_0(\omega)$ for all states $\omega$.
(iii) The state space is either classical, or one on the list of Theorem 2.
(iv) If ω is a pure state and P F any orthogonal projection onto any face F , then P F ω is a multiple of a pure state.
(v) The "atomic covering property" of quantum logic holds.
The equivalences (i) ⇔ (iii) ⇔ (iv) ⇔ (v) are shown in [43]; our new result is the equivalence to (ii), which is shown in the appendix.
Absence of third-order interference is meant in the sense of eq. (5), as introduced originally by Sorkin [76]: only pairs of mutually exclusive alternatives can possibly interfere. It is interesting that this is related to an information-theoretic property of max-entropy S 0 , as given in (ii). We do not currently know whether S 0 (or, in particular, the identity ofŠ 0 and S 0 ) has any thermodynamic relevance in the class of theories that we are considering, but it certainly does within quantum theory, where it attains operational meaning in single-shot thermodynamics [28,29].
As (iii) shows, this theorem is closely related to Open Problem 1: it gives properties of conceivable state spaces that satisfy Postulates 1 and 2, but are not on the list of known examples (namely, they do not satisfy any of (i)-(v)). Similarly, (iv) shows the relation of higher-order interference to Open Problem 2, and (v) relates all these items to quantum logic. In fact, one can show that Postulates 1 and 2 imply that the set of faces of the state space has the structure of an orthomodular lattice, which is often seen as the definition of quantum logic. For readers who are familiar with the terminology of quantum logic, we give some additional remarks in Subsection VII.2 in the appendix.

VI. CONCLUSIONS
As discussed in the introduction, many works (dating back at least to the 1950s) have considered quantum theory as just one particular example of a probabilistic theory: a single point in a large space of theories that contains classical probability theory, as well as many other possibilities that are non-quantum and non-classical. More recent works have focused on the information-theoretic properties of quantum theory, for example deriving quantum theory as the unique structure that satisfies a number of information-theoretic postulates.
Rather than attempt a derivation of quantum theory from postulates, this paper has examined the thermodynamic properties of quantum theory and of those theories that are similar enough to quantum theory to admit a good definition of thermodynamic entropy, and of some version of the Second Law. Postulate 1 states that there is a reversible transformation between any two sets of n distinguishable pure states. This can be thought of as an expression of the universality of the representation of information, in particular that a choice of basis is arbitrary, and also allows for reversible microscopic dynamics, as is crucial for thermodynamics. Postulate 2 states that every state can be written as a convex mixture of perfectly distinguishable pure states. This ensures that a mixed state describing an ensemble of many particles can be treated as if each particle has an unknown microstate, drawn from a set of distinguishable possibilities.
Much follows from Postulates 1 and 2, without needing to assume any other aspects of the standard formalism of quantum theory. In order to derive thermodynamic conclusions, we considered the argument originally employed by von Neumann in his derivation of the mathematical expression for the thermodynamic entropy of a quantum state. The argument involves a thought experiment with a gas of quantum particles in a box, and semi-permeable membranes that allow a particle to pass or not depending on the outcome of a quantum measurement. By applying the same thought experiment, we showed that given any theory satisfying Postulates 1 and 2, there is a unique expression for the thermodynamic entropy, equal to both the spectral entropy and the measurement entropy. By way of contrast, a fictitious system defined by a square state space, which arises as Alice's local system of an entangled pair producing stronger-than-quantum "PR box" correlations, does not satisfy either Postulate. This system, the gbit, does not admit a sensible notion of thermodynamic entropy, at least not one that is given to it by the von Neumann or Petz arguments. While many works have discussed the inability of quantum theory to produce arbitrarily strong nonlocal correlations, this connection with thermodynamics deserves further investigation. It would be very interesting, for example, if Tsirelson's bound on the strength of quantum nonlocal correlations could be derived from a thermodynamic argument.
There are many other consequences of Postulates 1 and 2 for both thermodynamic and information-theoretic entropies. For example, a form of the Second Law holds in that neither projective measurements nor mixing procedures can decrease the thermodynamic entropy. The spectral and measurement order-$\alpha$ Rényi entropies coincide for any $\alpha$. The spectral and decomposition order-$\alpha$ Rényi entropies coincide for $\alpha = 2$ or $\infty$. An open question is whether any theory satisfying Postulates 1 and 2 is completely satisfactory from the thermodynamic point of view. While the von Neumann and Petz arguments can be run with no trouble in the presence of Postulates 1 and 2 as we have shown, there could still be a different physical scenario, in which theories would fail to exhibit sensible behaviour unless they have even more of the structure of quantum theory.
Finally, another major open question is whether quantum-like theories exist, satisfying Postulates 1 and 2, that are distinct from quantum theory in that they admit higher-order interference. Roughly speaking, this means that three or more possibilities can interfere in order to produce an overall amplitude, unlike in quantum theory, where different possibilities only interfere in pairs. We extend the results of Ref. [43], where it was shown that in the context of Postulates 1 and 2 the existence of higher-order interference is equivalent to each of three other statements. We provide an equivalent entropic condition: there is higher-order interference if and only if the measurement and decomposition versions of the max entropy do not coincide.
Our understanding of quantum theory would be greatly improved if higher-order interference could be ruled out by simple information-theoretic, thermodynamic, or other physical arguments. On the other hand, if theories with higher-order interference exist and are eminently sensible, an immediate question is whether an experimental test could be performed to distinguish such a theory from quantum theory. While previous experiments [97][98][99][100][101][102] only tested for a zero versus non-zero value of higher-order interference, sensible higher-order theories that satisfy Postulates 1 and 2 (if they exist) could help to inform future experiments by supplying concrete models that can be tested against standard quantum theory.

[…] Leifer for many useful discussions, and we are grateful to the participants of the "Foundations of Physics working group" at Western University for helpful feedback. We would also like to thank Giulio Chiribella and Carlo Maria Scandolo for coordinating the arXiv posting of their work with us. This research was supported in part by Perimeter Institute for Theoretical Physics. Research at Perimeter Institute is supported by the Government of Canada through the Department of Innovation, Science and Economic Development Canada and by the Province of Ontario through the Ministry of Research, Innovation and Science. This research was undertaken, in part, thanks to funding from the Canada Research Chairs program. This research was supported by the FQXi Large Grant "Thermodynamic vs. information theoretic entropies in probabilistic theories". HB thanks the Riemann Center for Geometry and Physics at the Institute for Theoretical Physics, Leibniz University Hannover, for support as a visiting fellow during part of the time this paper was in preparation.

VII.1.1. Proof that observables are well-defined
In this appendix, a decomposition of a state into perfectly distinguishable pure states (which always exists due to Postulate 2) will be called a "classical decomposition".
Lemma 13. Assume Postulates 1 and 2. Let $F \neq \{0\}$ be a face of $A_+$ and $\omega \in \Omega_A \cap F$. Then there exists a classical decomposition $\omega = \sum_j p_j \omega_j$ with $\omega_j \in F$ for all $j$.
Proof. Let $\omega = \sum_j p_j \omega_j$ be a classical decomposition with $p_j \neq 0$. As $\omega \in F$ and $F$ is a face, $\omega_j \in F$ for all $j$.
Proof of Theorem 3. Let $x \in A$ be arbitrary. By Lemma 5.46 from [62] there exists a frame $\{\omega_j\}$ and $x_j \in \mathbb{R}$ such that $x = \sum_j x_j \omega_j$. We extend $\{\omega_j\}$ to a maximal frame by adding $x_j := 0$ for the new indices $j$. Now we group together the $j$ with the same $x_j$ value, and by relabelling we find that $x = \sum_{k=1}^n x_k \sum_i \omega_{k;i}$, where the $x_k$ are the pairwise different values of the $x_j$ and the $\omega_{k;i}$ are the $\omega_j$ that belong to this $x_j$ value. For any given $k$, the $\omega_{k;i}$ generate a face $F_k$ with projective unit $u_k = \sum_i \omega_{k;i}$. Therefore we find a decomposition $x = \sum_{k=1}^n x_k u_k$ with $x_k$ pairwise different real numbers, $u_k$ order units of faces $F_k$, and $\sum_{k=1}^n u_k = u_A$.
Now we show that the faces $F_k$ are mutually orthogonal. Let $\omega \in F_k$ be an arbitrary normalized state. By Lemma 13 it has a classical decomposition $\omega = \sum_j p_j \omega^{(k)}_j$ which uses only pure states $\omega^{(k)}_j \in F_k$. Wlog we assume that these pure states form a generating frame of $F_k$, by extending the frame and adding $p_j = 0$ to the decomposition. Consider another face $F_m$, i.e. $m \neq k$. As for $\omega$, let $\omega' \in F_m$ be an arbitrary normalized state and $\omega' = \sum_j q_j \omega^{(m)}_j$ […] As $\omega \in F_k$ and $\omega' \in F_m$ were arbitrary (normalized) states, this implies that $F_k$ and $F_m$ are orthogonal. As $k \neq m$ were arbitrary, all the faces are mutually orthogonal.
Now we will show that the decomposition $x = \sum_j x_j u_j$ is unique. So assume there are two decompositions […] with $a_j \in \mathbb{R}$ pairwise different and projective units $u_j$ whose generating pure states $\omega_{j;i}$ form a maximal frame; in particular they add up to $u_A$ (likewise for $b$). Therefore: […] For the remaining indices, we construct an inductive proof: choose $L \in \mathbb{R}$ large enough such that $a_1 + L > \max\{a_{n_a}, b_{n_b}\}$, and define $x' := x + L \cdot u^{(a)}$ […] Furthermore defining $a'_1 := a_2$, $a'_2 := a_3$, ..., $a'_{n_a} := a_1 + L$, $u$ […]
At last we construct the projective measurement that corresponds to measuring the observable $x$: for $F_k$, let $P_k$ be the orthogonal projector onto the span of $F_k$ (in particular, $P_k : A \to \mathrm{span}(F_k)$ is surjective).
We know that these projectors are positive and linear and satisfy $u_A \circ P_k = u_k$. Furthermore $0 \le u_k = u_A \circ P_k \le u_A$ and $\sum_k u_A \circ P_k = \sum_k u_k = u_A$, i.e. we obtain a well-defined measurement; therefore the $P_k$ form a well-defined instrument. As they are projectors, the $P_k$ leave the elements of $F_k$ unchanged.
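The grouping step in the proof of Theorem 3 can be illustrated in the simplest setting. The sketch below assumes a classical (commuting) situation in which both the observable and the state are given by their coefficients on one common maximal frame. The function names and the example numbers are hypothetical and only serve to show how equal coefficients are collected into faces and how the projective units yield outcome probabilities.

```python
from collections import defaultdict

def group_into_faces(coefficients):
    """Collect frame indices j that share the same value x_j.

    Each group plays the role of one face F_k; its projective unit u_k
    is the sum of the frame elements with these indices.
    """
    faces = defaultdict(list)
    for j, x in enumerate(coefficients):
        faces[x].append(j)
    return dict(faces)

def outcome_probabilities(faces, state_weights):
    """Probability u_k(omega) of each outcome x_k for a state on the same frame."""
    return {x: sum(state_weights[j] for j in idx) for x, idx in faces.items()}

# Hypothetical observable x = sum_j x_j omega_j on a maximal frame of size 4.
x_coefficients = [1.0, 1.0, -1.0, 0.0]
# Hypothetical state omega = sum_j p_j omega_j on the same frame.
p = [0.4, 0.1, 0.3, 0.2]

faces = group_into_faces(x_coefficients)
probs = outcome_probabilities(faces, p)
expectation = sum(x * prob for x, prob in probs.items())
```

The outcome probabilities sum to one because the projective units add up to the order unit, and the expectation value agrees with $\sum_j x_j p_j$.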

VII.1.2. Proof of Observation 5
In order to show that $H(\omega) = S(\omega)$ is consistent with Assumptions 1, we only have to show that $\omega \mapsto S(\omega)$ is continuous, to comply with assumption (d). According to Theorem 10 (which we will prove below), the spectral entropy $S(\omega)$ equals the measurement entropy $\hat{S}(\omega)$. But it is well-known [67] and easy to see from its definition that $\hat{S}$ is continuous.
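It remains to show eq. (9). So let $\omega = \sum_i p_i \omega_i$ be any decomposition of $\omega$ into perfectly distinguishable, not necessarily pure states $\omega_i$. Decompose all the $\omega_i$ into perfectly distinguishable pure states $\omega^{(i)}_k$, i.e. $\omega_i = \sum_k q^{(i)}_k \omega^{(i)}_k$. Perfectly distinguishable states live in orthogonal faces, thus $\langle \omega_i, \omega_j \rangle = 0$ for $i \neq j$ (note that this is a conclusion that follows from Postulates 1 and 2, but could not be drawn from bit symmetry alone in [64]). Thus, we also have $\langle \omega^{(i)}_k, \omega^{(j)}_l \rangle = \delta_{ij} \delta_{kl}$, so that $\omega = \sum_{i,k} p_i q^{(i)}_k \omega^{(i)}_k$ is a classical decomposition, and therefore
$S(\omega) = -\sum_{i,k} p_i q^{(i)}_k \log\big(p_i q^{(i)}_k\big) = -\sum_i p_i \log p_i + \sum_i p_i S(\omega_i)$.
This completes the proof of Observation 5.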

VII.1.4. Proof of Klein's Inequality and the Second Law for projective measurements
We consider an ensemble of systems described by an arbitrary state $\omega \in \Omega_A$. To all systems of this ensemble we apply a projective measurement described by orthogonal projectors $P_a$ which form an instrument, resulting in a new ensemble state $\omega' = \sum_a P_a \omega$. The $P_a$ project onto the linear span of faces $F_a$ that replace the eigenspaces from quantum theory. We want to show that the measurement cannot decrease the entropy of the ensemble, i.e.

$S(\omega') \ge S(\omega)$.
We decompose the proof into several steps. Our basic idea follows the proof of a similar statement for quantum theory in [50]: We reduce the proof of the Second Law to Klein's inequality. But as we do not have access to an underlying pure state Hilbert space, we will need to use a different argument for why Klein's inequality implies the Second Law for projective measurements.
We first prove Klein's inequality, adapting the proof of [50]. We note that a similar proof has also been found by Scandolo [48], albeit under different assumptions.
Proof of Theorem 9. We consider two arbitrary states $\omega$, $\nu$ with classical decompositions $\omega = \sum_j p_j \omega_j$, $\nu = \sum_k q_k \nu_k$, where wlog the $\omega_j$ and the $\nu_k$ form maximal frames. We define the matrix $P_{jk} := \langle \omega_j, \nu_k \rangle$. All its components are non-negative, i.e. $P_{jk} \ge 0$, because the scalar product itself is non-negative for all states. As all maximal frames have the same size, the matrix is a square matrix; as maximal frames sum to $u_A$, the rows and columns sum to one: $\sum_j P_{jk} = \sum_k P_{jk} = 1$. We define $r_j := \sum_k P_{jk} q_k$. Note that the $r_j$ form a probability distribution: $r_j \ge 0$ and $\sum_j r_j = \sum_k \sum_j P_{jk} q_k = \sum_k q_k = 1$. Using the strict concavity of the logarithm, we find $\sum_k P_{jk} \log q_k \le \log r_j$. Therefore we get
$S(\omega \| \nu) = -S(\omega) - \langle \omega, \log \nu \rangle = \sum_j p_j \log p_j - \sum_{j,k} p_j P_{jk} \log q_k \ge \sum_j p_j (\log p_j - \log r_j)$.
We recognize the last expression as the classical relative entropy of the probability distributions $p_j$ and $r_j$. This classical relative entropy has the important property that it is never negative. Thus $S(\omega \| \nu) \ge 0$.
To keep the main proof readable, we state some technical parts as lemmas.
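The chain of inequalities in the proof of Klein's inequality only involves the two probability vectors and the bistochastic overlap matrix, so it can be checked numerically. The following is a minimal sketch, assuming that an arbitrary bistochastic matrix stands in for the overlaps $P_{jk} = \langle \omega_j, \nu_k \rangle$; the function name and the example numbers are illustrative and not taken from the paper.

```python
import math

def klein_terms(p, q, P):
    """Return (relative entropy, classical lower bound) for spectra p, q.

    P[j][k] stands in for the overlap <omega_j, nu_k>; it is assumed to be
    bistochastic, and q is assumed to be strictly positive.
    """
    n = len(p)
    # r_j = sum_k P_jk q_k, as defined in the proof.
    r = [sum(P[j][k] * q[k] for k in range(n)) for j in range(n)]
    # -S(omega) = sum_j p_j log p_j
    entropy_term = sum(pj * math.log(pj) for pj in p if pj > 0)
    # <omega, log nu> = sum_jk p_j P_jk log q_k
    cross_term = sum(
        p[j] * P[j][k] * math.log(q[k])
        for j in range(n) for k in range(n)
        if p[j] > 0 and P[j][k] > 0
    )
    relative_entropy = entropy_term - cross_term
    # Classical relative entropy of the distributions p and r.
    classical_bound = sum(
        p[j] * math.log(p[j] / r[j]) for j in range(n) if p[j] > 0
    )
    return relative_entropy, classical_bound

# Hypothetical example data.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
P = [[0.50, 0.50, 0.0],
     [0.25, 0.25, 0.5],
     [0.25, 0.25, 0.5]]

rel, bound = klein_terms(p, q, P)
```

With these inputs the relative entropy is at least as large as the classical bound, which in turn is non-negative. Choosing `P` as the identity and `q = p` makes both quantities vanish.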

Lemma 14.
Assume Postulates 1 and 2. Consider orthogonal projectors $P_j$ which form an instrument. Then the $P_j$ are mutually orthogonal: $P_j P_k = \delta_{jk} P_k$.
Proof. We prove $P_k P_j \omega = 0$ for all $\omega \in A$, $j \neq k$. If $P_j \omega = 0$ this is trivial, so from now on assume $P_j \omega \neq 0$. As the cone is generating (i.e. $\mathrm{Span}(A_+) = A$) and the projectors are linear, it is sufficient to show $P_k P_j \omega = 0$ for all $\omega \in A_+$.
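As $P_j$ is positive, $P_j \omega \neq 0$ implies that $(u_A \circ P_j)(\omega) > 0$ because only the zero-state is normalized to 0. Using $u_A = u_A \circ (\sum_k P_k) = \sum_k u_A \circ P_k$ and $P_j P_j = P_j$, we find $u_A(P_j \omega) = \sum_k u_A(P_k P_j \omega) = u_A(P_j \omega) + \sum_{k \neq j} u_A(P_k P_j \omega)$, hence $\sum_{k \neq j} u_A(P_k P_j \omega) = 0$. As the projectors are positive and only the zero-state is normalized to 0, this shows $P_k P_j \omega = 0$ for $k \neq j$.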
Lemma 15. Assume Postulates 1 and 2. Consider an orthogonal projector $P$ which projects onto the linear span of a face $F$ of $A_+$. Then for all states $\omega \in A_+$ we find $P\omega \in F$.
Proof. From basic convex geometry (see e.g. Proposition 2.10 in [63]), we know that $F = \mathrm{span}(F) \cap A_+$. Since $P$ is positive, we have $P\omega \in A_+$; furthermore, since $P$ projects onto $\mathrm{span}(F)$, we have $P\omega \in \mathrm{span}(F)$, thus $P\omega \in F$.
Proof of Theorem 6. We know that $S(\omega \| \omega') = -S(\omega) - \langle \omega, \log \omega' \rangle \ge 0$. As in Theorem 11.9 from [50], we claim $-\langle \omega, \log \omega' \rangle = S(\omega')$ and therefore $-S(\omega) + S(\omega') \ge 0$. Thus we only have to prove $-\langle \omega, \log \omega' \rangle = S(\omega')$. But as we do not have access to an underlying pure state Hilbert space, our proof is different from [50]. By Lemma 14, the $P_a$ are mutually orthogonal, i.e. $P_a P_b = \delta_{ab} P_b$. By symmetry of the $P_a$ also the $P_a \omega$ are mutually orthogonal: $\langle P_a \omega, P_b \omega \rangle = \langle \omega, P_a P_b \omega \rangle = 0$ for $a \neq b$. This also shows that the $F_a$ are mutually orthogonal. If $P_a \omega = 0$ we use the decomposition $P_a \omega = u_A(P_a \omega) \sum_k r_{ak} \omega_{ak}$ with $r_{ak} = \delta_{ak}$ and $\omega_{ak}$ an arbitrary generating frame of $F_a$. If $P_a \omega \neq 0$, then $\frac{P_a \omega}{u_A(P_a \omega)} \in F_a \cap \Omega_A$ and by Lemma 13, there is a classical decomposition $\frac{P_a \omega}{u_A(P_a \omega)} = \sum_k r_{ak} \omega_{ak}$ with $\omega_{ak} \in F_a$. We complete the $\omega_{ak}$ to generating frames of the $F_a$ by adding terms with $r_{ak} = 0$. As we are using classical decompositions/frames, we know $\langle \omega_{aj}, \omega_{ak} \rangle = \delta_{jk}$. Furthermore, as the $F_a$ are mutually orthogonal, we know $\langle \omega_{aj}, \omega_{bk} \rangle = 0$ for $a \neq b$.
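We note that the $\omega_{aj}$ form a maximal frame. For $a \neq b$ we have $P_b \omega_{aj} = P_b P_a \omega_{aj} = 0$, so we have a classical decomposition $\omega' = \sum_a P_a \omega = \sum_a \sum_j u_A(P_a \omega) r_{aj} \omega_{aj}$ with $\omega_{aj}$ a maximal frame that satisfies $P_a \omega_{bj} = \delta_{ab} \omega_{bj}$. Note that we do not need to normalize $\omega'$ as the measurement itself is required to be normalized. Using
$\sum_a P_a \log \omega' = \sum_{b,j} \log\big(u_A(P_b \omega) r_{bj}\big) \sum_a P_a \omega_{bj} = \sum_{b,j} \log\big(u_A(P_b \omega) r_{bj}\big) \omega_{bj} = \log \omega'$
and
$-S(\omega') = \sum_{b,j} u_A(P_b \omega) r_{bj} \log\big(u_A(P_b \omega) r_{bj}\big) = \langle \omega', \log \omega' \rangle$
as well as the symmetry of the $P_a$, we finally find:
$-S(\omega') = \langle \omega', \log \omega' \rangle = \langle \sum_a P_a \omega, \log \omega' \rangle = \langle \omega, \sum_a P_a \log \omega' \rangle = \langle \omega, \log \omega' \rangle$.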
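In the quantum special case, the statement $S(\omega') \ge S(\omega)$ can be checked directly. The sketch below assumes a qubit in Bloch-vector form and a projective measurement in the $z$ basis, for which $\omega' = \sum_a P_a \omega$ simply keeps the $z$ component of the Bloch vector. The function names and the example state are illustrative assumptions and are not part of the general proof.

```python
import math

def binary_entropy(x):
    """Shannon entropy (natural log) of the distribution (x, 1 - x)."""
    return -sum(t * math.log(t) for t in (x, 1.0 - x) if t > 0)

def qubit_entropy(r):
    """Spectral entropy of a qubit state with Bloch vector r.

    The coefficients of the classical decomposition are (1 +/- |r|) / 2.
    """
    length = math.sqrt(sum(a * a for a in r))
    return binary_entropy((1.0 + length) / 2.0)

def measure_z(r):
    """Post-measurement ensemble state for a projective z measurement.

    Summing the two projected components removes the x and y parts
    of the Bloch vector and keeps the z part.
    """
    return [0.0, 0.0, r[2]]

# Hypothetical input state.
omega = [0.6, 0.0, 0.5]
omega_prime = measure_z(omega)

entropy_before = qubit_entropy(omega)
entropy_after = qubit_entropy(omega_prime)
```

Since the Bloch vector can only shrink under this map, `entropy_after` is never smaller than `entropy_before`; the two agree exactly when the state was already diagonal in the measured basis.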
VII.1.5. Proof that measurement and spectral entropies are identical
In the main text we encountered different ways to define the entropy. One of them is to adapt classical entropy definitions by using the coefficients of a classical decomposition. Another is to adapt classical entropy definitions by using measurement probabilities and minimizing over all fine-grained measurements. Here we will show that in the context of Postulates 1 and 2, these two concepts yield the same Rényi entropies.
To prove this, we will first analyze fine-grained measurements in further detail. The results will allow us to reproduce the quantum proof found in [66] for our GPTs.
Lemma 16. Assume Postulates 1 and 2. Consider an arbitrary fine-grained measurement $(e_1, ..., e_n)$. Then for all $j$ there exist some $c_j \in [0, 1]$ and a pure state $\omega_j \in \Omega_A$ such that $e_j = c_j \langle \omega_j, \cdot \rangle$.
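Proof. If $e_j = 0$, we can just take $c_j = 0$ and any pure state $\omega_j$. So from now on assume $e_j \neq 0$.
Because of self-duality there exists some $\omega' \in A_+$ such that $\langle \omega', \cdot \rangle = e_j$. As $e_j \neq 0$, also $\omega' \neq 0$ and therefore $u_A(\omega') \neq 0$. With $A_+ = \mathbb{R}_{\ge 0} \cdot \Omega_A$ and $c_j := u_A(\omega') > 0$ there exists an $\omega \in \Omega_A$ such that $\omega' = c_j \cdot \omega$. We want to prove that $\omega$ is pure, so assume it was not pure. Then it has a classical decomposition $\omega = \sum_{k=0}^N p_k \omega_k$ with $p_k > 0$ and $N \ge 1$. By relabelling we can assume $j = n$, i.e. we consider $e_n = c_j \langle \omega, \cdot \rangle = \sum_{k=0}^N c_j p_k \langle \omega_k, \cdot \rangle$. Define $e'_i := e_i$ for $i < n$ and $e'_{n+k} := c_j p_k \langle \omega_k, \cdot \rangle$ for $k = 0, ..., N$. Thus the measurement $(e'_1, ..., e'_{n+N})$ is a refinement of $(e_1, ..., e_n)$. With $e'_n(\omega_0) = c_j p_0 = e_n(\omega_0)$ and $e'_n(\omega_1) = 0 \neq e_n(\omega_1)$ we find that $e'_n$ is not proportional to $e_n$, thus the fine-graining is non-trivial. This is in contradiction to our assumptions. Thus $\omega$ has to be pure. Furthermore $1 = u_A(\omega) \ge e_j(\omega) = c_j \langle \omega, \omega \rangle = c_j$. So in total we have found that $e_j = c_j \langle \omega, \cdot \rangle$ with $\omega \in \Omega_A$ pure and $c_j \in [0, 1]$.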

Lemma 17.
Assume Postulates 1 and 2. Let $\omega \in \Omega_A$ and $\omega = \sum_{j=1}^d p_j \omega_j$ be a decomposition into a maximal frame. Then the measurement that perfectly distinguishes the $\omega_j$ (i.e. $e_k(\omega_j) = \delta_{jk}$) can be chosen to be fine-grained.
Proof. Define $e_j := \langle \omega_j, \cdot \rangle$. As maximal frames add up to the order unit, this is a well-defined measurement and it satisfies $e_j(\omega_k) = \delta_{jk}$. It remains to show that this measurement is fine-grained.
Consider a fine-graining $e'_j$ with $e_i = \sum_{\{j | M(j) = i\}} e'_j$. By self-duality, there exist $c_j \ge 0$ and $\omega'_j \in \Omega_A$ such that $e'_j = c_j \langle \omega'_j, \cdot \rangle$ and therefore $\sum_{\{j | M(j) = k\}} c_j \omega'_j = \omega_k$. As $1 = u_A(\omega_k) = \sum_{\{j | M(j) = k\}} c_j u_A(\omega'_j) = \sum_{\{j | M(j) = k\}} c_j$, we find that $\sum_{\{j | M(j) = k\}} c_j \omega'_j = \omega_k$ is a convex decomposition of a pure state. This requires $c_j = 0$ or $\omega'_j = \omega_k$. In both cases $e'_j = c_j \langle \omega_k, \cdot \rangle = c_j e_k$ holds true for all $j$ with $M(j) = k$. Therefore, the fine-graining is trivial.
Lemma 18. Assume Postulates 1 and 2, and let $(e_1, ..., e_N)$ be a fine-grained measurement. Furthermore, consider a state $\omega \in \Omega_A$ with classical decomposition $\omega = \sum_{j=1}^d p_j \omega_j$ into a maximal frame. Define the vector $q := (e_j(\omega))_{1 \le j \le N}$ of outcome probabilities and the $N$-component vector $p = (p_1, ..., p_d, 0, ..., 0) \in \mathbb{R}^N$. Then $q \prec p$, i.e. there exists a bistochastic $N \times N$-matrix $M$ with $q = M p$.
Proof. By Lemma 16 there exist $c_j \in [0, 1]$ and pure $\omega'_j \in \Omega_A$ such that $e_j = c_j \langle \omega'_j, \cdot \rangle$. Define $q_l := e_l(\omega) = c_l \langle \omega'_l, \omega \rangle$. Using […]
Now we come to the proof of the theorem:
Proof of Theorem 10. Consider an arbitrary fine-grained measurement $(e_1, ..., e_N)$ and an arbitrary state $\omega \in \Omega_A$ with classical decomposition $\omega = \sum_{j=1}^d p_j \omega_j$ into a maximal frame. Define $q_l := e_l(\omega)$ and the $N$-component vector $p = (p_1, ..., p_d, 0, ..., 0)$. Let $M$ be the bistochastic matrix from Lemma 18 with $q = M \cdot p$. By Birkhoff's theorem, it is a convex combination of permutation matrices, i.e. $M = \sum_{\sigma \in S_N} a_\sigma P_\sigma$ for a probability distribution $a_\sigma$ and permutation matrices $P_\sigma$. Wlog we only consider the Shannon entropy; the proof for the Rényi entropies works exactly the same way. As the Shannon entropy is Schur-concave and invariant under permutations, $H(q) = H(M p) \ge H(p)$. Furthermore $H(p) = -\sum_{j=1}^d p_j \log p_j = S(\omega)$ is the entropy of a measurement that perfectly distinguishes the $\omega_j$, i.e. $e_j(\omega_k) = \delta_{jk}$. Because of Lemma 17, such a measurement can be chosen to be fine-grained. Therefore the measurement entropy satisfies $\inf_{e \in E^*} H(e(\omega)) = H(p) = S(\omega)$.
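The majorization step in the proof of Theorem 10 is purely classical and can be illustrated numerically. The following sketch assumes a bistochastic matrix built as a convex combination of permutation matrices, as in Birkhoff's theorem, and checks that applying it to a probability vector cannot lower the Shannon or order-2 Rényi entropy. The helper names, weights and permutations are hypothetical choices for illustration.

```python
import math

def shannon(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi(p, alpha):
    """Order-alpha Renyi entropy for alpha different from 1."""
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

def bistochastic_from_permutations(weights):
    """Build M = sum_sigma a_sigma P_sigma from {permutation tuple: weight}.

    A permutation sigma is a tuple with sigma[col] = row, so P_sigma maps
    the basis vector number col to the basis vector number sigma[col].
    """
    n = len(next(iter(weights)))
    M = [[0.0] * n for _ in range(n)]
    for sigma, a in weights.items():
        for col, row in enumerate(sigma):
            M[row][col] += a
    return M

def apply(M, p):
    """Matrix-vector product q = M p."""
    return [sum(M[i][j] * p[j] for j in range(len(p))) for i in range(len(M))]

# Spectrum padded with a zero, as for the N-component vector p in the proof.
p = [0.6, 0.3, 0.1, 0.0]
weights = {(0, 1, 2, 3): 0.5, (1, 0, 3, 2): 0.3, (3, 2, 1, 0): 0.2}

M = bistochastic_from_permutations(weights)
q = apply(M, p)
```

The resulting `q` is majorized by `p`, so every Schur-concave function, including the Shannon and Rényi entropies, takes a value on `q` that is at least its value on `p`.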
VII.1.6. Proof of Theorem 12
As mentioned in the main text, the equivalences (i) ⇔ (iii) ⇔ (iv) ⇔ (v) are shown in [43]. We will now prove the equivalence (ii) ⇔ (v), which proves Theorem 12. Taking into account Theorem 10, and formulating the atomic covering property in the context of theories that satisfy Postulates 1 and 2, it remains to show the equivalence of the following two statements: […]

[…] may be considered more technical than substantive, and always holds in finite dimension) and also assumed lattice dimension 4 or greater, so Hilbert spaces of dimension 3 or less were not dealt with, nor were spin factors or the exceptional Jordan algebra. These low-dimensional cases also satisfy Piron's and Ludwig's premises, but a theorem ruling out other instances satisfying them appears to be lacking.