The de Finetti theorem for test spaces

We prove a de Finetti theorem for exchangeable sequences of states on test spaces, where a test space is a generalization of the sample space of classical probability theory and the Hilbert space of quantum theory. The standard classical and quantum de Finetti theorems are obtained as special cases. By working in a test space framework, the common features that are responsible for the existence of these theorems are elucidated. In addition, the test space framework is general enough to imply a de Finetti theorem for classical processes. We conclude by discussing the ways in which our assumptions may fail, leading to probabilistic models that do not have a de Finetti theorem.


I. INTRODUCTION
There are many scenarios involving probabilistic reasoning about a large number of systems, where it is important that these systems can be regarded as identical and independent. Classical parameter estimation and quantum tomography, in which a source is calibrated by measuring separately a number of systems it has produced, provide excellent examples. More generally, much of experimental science involves repetition of an experiment, with conclusions drawn from observed relative frequencies, where the conclusions are only valid under the assumptions of identity and independence of the separate trials. A problem that arises, therefore, is what justifies these assumptions? De Finetti theorems are designed to answer this question.
De Finetti theorems have a fundamental significance for the subjective Bayesian interpretation of probabilities. They enable the subjective Bayesian to explain why a rational agent treats a sequence of trials as identical and independent, without appealing to a notion of objective chance, or to unknown probabilities. In addition to this fundamental significance, they are also important technical tools. For example, quantum de Finetti theorems have been applied to various problems in quantum information theory, including proofs of the security of quantum key distribution [1], algorithms for deciding the separability of bipartite quantum states [2,3] and global optimization of the maximum output purity of quantum channels [4].
In this paper, we provide a de Finetti theorem for nonsignalling states on test spaces. A test space is a generalization of the sample space of classical probability theory and the Hilbert space of quantum theory. The de Finetti theorem for test spaces includes the classical and quantum theorems as special cases. It also implies a de Finetti theorem for classical processes, which can also be viewed as a theorem about states in theories that exhibit "superquantum" correlations, as have been studied recently in quantum information [5,6,7,8,9,10,11,12,13,14,15,16,17].
The remainder of this paper is structured as follows. In §II, the classical and quantum de Finetti theorems are reviewed and their relevance for Bayesian statistics is discussed. In §III, the test space framework is introduced and its connection to the convex sets framework used in [18,19] is explained. §IV states the de Finetti theorem for test spaces and §V outlines some of its consequences, including the classical and quantum theorems, and the theorem for classical processes. §VI discusses the role of the various assumptions of the test space framework and outlines some more general scenarios in which the theorem fails. §VII concludes.
* Electronic address: j.barrett@bristol.ac.uk
† Electronic address: matt@mattleifer.info

II. THE CLASSICAL AND QUANTUM DE FINETTI THEOREMS
De Finetti introduced his theorem in the context of a subjective Bayesian approach to probability theory [20,21]. In this approach, probabilities are not defined as limiting relative frequencies, nor as objective properties of the physical world. Instead they are measures of the degrees of belief of a decision making agent. An immediate question is, why should degrees of belief be represented by real numbers obeying the Kolmogorov axioms? De Finetti's answer is provided by his famous Dutch Book argument [20,21].$^{1}$ But a second question is, why do the usual rules of statistical inference apply to these quantities? In particular, how and why should relative frequencies be used to update probability assignments?
Consider an experiment in which a trial with $d$ possible outcomes is repeated $n$ times. Under a standard sort of analysis, the trials are first judged to be independent and identically distributed, so that the probability of getting the outcome sequence $x_1, \ldots, x_n$ is given by $P_n(x_1, \ldots, x_n) = p(x_1) \times \cdots \times p(x_n)$, for some "unknown" probability distribution $p$. The distribution $p$ is a parameter to be estimated.
To the subjective Bayesian, however, this is problematic since no sense can be given to an "unknown" probability. De Finetti provides an alternative analysis. If the trial can in principle be repeated an arbitrary number of times, then the joint distribution over outcome sequences $P_n$ should be defined for any $n$, so consider an infinite sequence of distributions $P_1, P_2, \ldots$. Suppose that for the first $n$ trials, the agent is indifferent as to whether any further trials are actually performed or not, and that the agent is also indifferent as to the order in which the outcomes are reported. This suggests that the sequence $P_1, P_2, \ldots$ should satisfy the following.
Definition 1. $P_n$ is symmetric if and only if it is invariant under permutations of the $n$ tests. That is, $P_n(x_1, \ldots, x_n) = P_n(x_{\pi(1)}, \ldots, x_{\pi(n)})$ for all permutations $\pi : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$.
Definition 2. The sequence $P_1, P_2, \ldots$ is exchangeable if and only if 1. $\forall n$, $P_n$ is symmetric, 2. $\forall n$, $P_n(x_1, \ldots, x_n) = \sum_{x_{n+1}} P_{n+1}(x_1, \ldots, x_n, x_{n+1})$.
Theorem 1 (de Finetti's representation theorem [21,24]). If the sequence $P_1, P_2, \ldots$ is exchangeable, then $P_n$ can be written in the form
$$P_n(x_1, \ldots, x_n) = \int_{\Delta_d} d\mu(p)\, p(x_1) \cdots p(x_n), \tag{1}$$
where $\Delta_d$ is the set of all probability distributions over the outcomes $\{1, \ldots, d\}$, $\mu$ is a probability measure on $\Delta_d$, $\mu$ is independent of $n$, and $\mu$ is unique.
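The representation in Theorem 1 can be checked numerically in the simplest case, where $\mu$ is supported on finitely many points of $\Delta_d$. The following is a minimal sketch under that assumption; the function names (`mixture_of_iid`, `is_symmetric`, `marginalizes_to`) and the particular weights and distributions are illustrative choices, not taken from the text. It builds $P_n$ as a mixture of i.i.d. distributions and verifies permutation symmetry and consistency under marginalization of the last trial.

```python
import itertools
import numpy as np

def mixture_of_iid(n, weights, dists):
    """P_n(x_1..x_n) = sum_j weights[j] * prod_i dists[j][x_i].

    This is the de Finetti form with a discrete measure mu (an assumption
    made for illustration; the theorem allows any probability measure).
    """
    d = len(dists[0])
    return {
        xs: sum(w * np.prod([p[x] for x in xs]) for w, p in zip(weights, dists))
        for xs in itertools.product(range(d), repeat=n)
    }

def is_symmetric(P, n, tol=1e-12):
    """Check invariance of P under every permutation of the n trials."""
    return all(
        abs(P[xs] - P[tuple(xs[i] for i in perm)]) < tol
        for xs in P
        for perm in itertools.permutations(range(n))
    )

def marginalizes_to(P_next, P, d, tol=1e-12):
    """Check that summing out the last trial of P_next reproduces P."""
    return all(
        abs(P[xs] - sum(P_next[xs + (x,)] for x in range(d))) < tol for xs in P
    )

# Hypothetical example: two coins with different biases, mixed 30/70.
weights = [0.3, 0.7]
dists = [np.array([0.9, 0.1]), np.array([0.2, 0.8])]
P1 = mixture_of_iid(1, weights, dists)
P2 = mixture_of_iid(2, weights, dists)
P3 = mixture_of_iid(3, weights, dists)

print(is_symmetric(P3, 3), marginalizes_to(P3, P2, 2))
```

Note that `P2[(0, 0)]` differs from `P1[(0,)] ** 2`: the trials are exchangeable but not independent, which is exactly the situation the theorem addresses.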
Theorem 1 is the classical de Finetti theorem for infinite sequences and a finite number of outcomes. It shows that if the sequence $P_1, P_2, \ldots$ is exchangeable, then $P_n$ can be written as if it were generated via a probability distribution $\mu$ over unknown probabilities $p$. In particular, if the first $m < n$ trials are performed, and standard Bayesian updating applied directly to the joint distribution $P_n$, one finds that the posterior probability for the remaining $n - m$ trials is given by
$$P_n(x_{m+1}, \ldots, x_n \mid x_1, \ldots, x_m) = \int_{\Delta_d} d\mu(p \mid x_1, \ldots, x_m)\, p(x_{m+1}) \cdots p(x_n), \tag{2}$$
where $\mu$ is updated as if Bayesian conditioning had been performed on an unknown parameter $p$ directly:
$$d\mu(p \mid x_1, \ldots, x_m) = \frac{p(x_1) \cdots p(x_m)\, d\mu(p)}{P_m(x_1, \ldots, x_m)}. \tag{3}$$
$^{1}$ See [22,23].
The quantum de Finetti theorem is a generalization of the classical theorem. It was first presented in Refs. [25,26], and a simpler proof given in Refs. [27,28]. Whereas the classical theorem concerned the outcome probabilities of a test that could be repeated an arbitrarily large number of times, the quantum theorem concerns the joint state of an arbitrarily large number of quantum systems. Suppose that each of these systems is associated with a $d$-dimensional Hilbert space $\mathcal{H}_d$. The joint state of $n$ systems is then a density operator on the tensor product Hilbert space $\mathcal{H}_d^{\otimes n}$. Exchangeability is defined for a sequence of states $\omega_1, \omega_2, \ldots$, where $\omega_n$ is a state of $n$ systems.
Definition 3. The state $\omega_n$ is symmetric if and only if $\mathrm{Tr}(Q_1 \otimes \cdots \otimes Q_n\, \omega_n) = \mathrm{Tr}(Q_{\pi(1)} \otimes \cdots \otimes Q_{\pi(n)}\, \omega_n)$, for all permutations $\pi : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$, and for any projection operators $Q_1, \ldots, Q_n$.
Note that this is equivalent to requiring that $\omega_n = S_\pi\, \omega_n\, S_\pi^\dagger$ for all permutations $\pi$, where $S_\pi$ is the operator that permutes the $n$ systems according to $\pi$.
Definition 4. The sequence $\omega_1, \omega_2, \ldots$ is exchangeable if and only if 1. $\forall n$, $\omega_n$ is symmetric, 2. $\forall n$, $\omega_n = \mathrm{Tr}_{n+1}(\omega_{n+1})$, where $\mathrm{Tr}_{n+1}$ is the partial trace over the $(n+1)$st system.
Theorem 2 (the quantum de Finetti theorem). If the sequence $\omega_1, \omega_2, \ldots$ is exchangeable, then $\omega_n$ can be written in the form
$$\omega_n = \int_{\Omega} d\mu(\rho)\, \rho^{\otimes n}, \tag{4}$$
where $\Omega$ is the set of density operators on $\mathcal{H}_d$, $\mu$ is a probability measure on $\Omega$, $\mu$ is independent of $n$, and $\mu$ is unique.
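As a sanity check on the quantum definitions, one can verify numerically that a finite mixture of product states $\rho \otimes \cdots \otimes \rho$ satisfies both conditions of Definition 4. The sketch below assumes qubits and a measure supported on two states; the helper names (`density`, `mixture_state`, `permute_systems`, `trace_out_last`) are hypothetical and chosen only for this illustration.

```python
import itertools
import numpy as np

def density(bloch):
    """Qubit density operator with the given Bloch vector (x, y, z)."""
    x, y, z = bloch
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    return 0.5 * (np.eye(2) + x * sx + y * sy + z * sz)

def mixture_state(n, weights, rhos):
    """sum_j weights[j] * rhos[j] tensored with itself n times."""
    total = 0
    for w, rho in zip(weights, rhos):
        term = np.array([[1.0 + 0j]])
        for _ in range(n):
            term = np.kron(term, rho)
        total = total + w * term
    return total

def permute_systems(omega, perm, d=2):
    """Reorder the n tensor factors of omega according to perm."""
    n = len(perm)
    t = omega.reshape([d] * (2 * n))
    axes = list(perm) + [n + p for p in perm]  # permute row and column indices alike
    return t.transpose(axes).reshape(d ** n, d ** n)

def trace_out_last(omega, n, d=2):
    """Partial trace over the last of n d-dimensional systems."""
    m = d ** (n - 1)
    return np.einsum("iaja->ij", omega.reshape(m, d, m, d))

# Hypothetical measure: equal weight on the +z and +x eigenstates.
weights = [0.5, 0.5]
rhos = [density((0, 0, 1)), density((1, 0, 0))]
omega2 = mixture_state(2, weights, rhos)
omega3 = mixture_state(3, weights, rhos)

symmetric = all(
    np.allclose(permute_systems(omega3, perm), omega3)
    for perm in itertools.permutations(range(3))
)
consistent = np.allclose(trace_out_last(omega3, 3), omega2)
print(symmetric, consistent)
```

This only illustrates the easy direction (states of the de Finetti form are exchangeable); the content of Theorem 2 is the converse.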
From a fundamental point of view, the quantum de Finetti theorem is particularly important for those approaches to quantum theory that take a subjective view of the quantum state [29,30,31,32,33]. These approaches are closely related to the Bayesian view of probabilities. A quantum state is taken to represent the degrees of belief of an agent, where these might be beliefs about the potential outcomes of measurements. In this case, the notion of an unknown quantum state, so prevalent in the literature, becomes problematic. The quantum de Finetti theorem shows how to dispense with this notion, at least in some situations.
Both of the theorems just presented assume finite sample spaces (or finite dimensional Hilbert spaces), and concern a number of trials or systems that tends to infinity. Both have been generalized in a number of ways. Classical theorems for a finite number of trials are discussed in [34,35,36] and quantum theorems for a finite number of systems in [1,37,38,39,40]. In addition to the quantum de Finetti theorem, there is also a de Finetti theorem for quantum operations [28,41], for representations of unitary groups [40,42] and for unitarily invariant quantum states [43].

III. TEST SPACES
The classical de Finetti theorem involves probabilities of outcome sequences for a test that can in principle be repeated an arbitrarily large number of times. The quantum de Finetti theorem involves the joint quantum state of a number of quantum systems that can in principle be arbitrarily large. Both of these can be viewed as special cases of a more general scenario. Suppose that a single system is associated with a number of different possible tests, which are mutually exclusive in the sense that only one can be performed at a time. The classical case is recovered when there is in fact only one such test, and the quantum case when the tests correspond to the different possible measurements on a quantum system. A state is an assignment of probabilities to the outcomes of all the different possible tests.
Assume further that given n systems, a test for each system can be independently chosen, and that a joint state is an assignment of probabilities to outcome sequences for each possible sequence of tests. Given all this, it is possible to define exchangeability for a sequence of states and to prove a de Finetti representation theorem. These ideas are formalized in this section, with the representation theorem given in the next.

A. Single systems
The technical notion that we use to describe a single system is that of a test space. Test spaces were introduced with the explicit purpose of describing probabilistic models more general than classical and quantum theory, but including both as special cases [44,45].
Definition 5. A test space $A$ is a pair $(E, S)$, with $E$ a set, and $S$ a set of countable subsets of $E$ that covers $E$.
The idea is that each element of E is a possible outcome of a test. Each set s ∈ S corresponds to a possible test, with the elements of s being the outcomes of that test. The sets in S may overlap, thus the definition of a test space is designed to allow for the possibility that outcomes of two different tests are identified. The sets E and S themselves may have any cardinality, but the definition stipulates that the outcomes of any particular test are countable.
For finite E, a test space can be conveniently summarized by a Greechie diagram [46]. Each element of E is represented by a circle, and tests are represented by connecting the corresponding set of circles with a continuous line. Examples of Greechie diagrams are given in Figs. 1-3.
A state defines probabilities for the outcomes of each test such that (i) these probabilities sum to 1 for each test, and (ii) if an outcome appears in more than one test it gets the same probability in each case.
Definition 6. A state on a test space $A = (E, S)$ is a map $\omega : E \to [0, 1]$ such that $\sum_{e \in s} \omega(e) = 1$ for every test $s \in S$.
Given a test space $A$, write the set of all possible states $\Omega(A)$. Note that it is easy to construct test spaces for which $\Omega(A)$ is the empty set, or for which a particular outcome has probability 0 in all states, or which may for similar reasons be judged unsatisfactory. Extra assumptions would rule these out, but here there is no need.
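For finite $E$, the notions of test space and state translate directly into a small data structure. The following sketch is one possible encoding, with hypothetical function names (`is_test_space`, `is_state`) and a made-up example of two overlapping three-outcome tests; exact rational arithmetic is used so that the normalization check is an equality rather than a tolerance test. Because a state is stored as a single map on $E$, condition (ii) holds automatically.

```python
from fractions import Fraction

def is_test_space(E, S):
    """Check that every test is a subset of E and that the tests cover E."""
    return all(s <= E for s in S) and set().union(*S) == E

def is_state(omega, E, S):
    """Check that omega maps E into [0, 1] and sums to 1 on every test."""
    return (
        set(omega) == E
        and all(0 <= omega[e] <= 1 for e in E)
        and all(sum(omega[e] for e in s) == 1 for s in S)
    )

# Classical case: a single test containing every outcome.
E_cl = {"1", "2", "3"}
S_cl = [frozenset(E_cl)]

# Hypothetical non-classical example: two tests sharing the outcome "c".
E_ov = {"a", "b", "c", "d", "e"}
S_ov = [frozenset({"a", "b", "c"}), frozenset({"c", "d", "e"})]

omega = {
    "a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4),
    "d": Fraction(1, 2), "e": Fraction(1, 4),
}
# Raising the weight of "c" alone breaks normalization on both tests.
not_a_state = dict(omega, c=Fraction(1, 2))

print(is_state(omega, E_ov, S_ov), is_state(not_a_state, E_ov, S_ov))
```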
The set of possible state spaces of a test space is generic, in the sense that every finite dimensional compact convex set arises as the set of states for some test space.$^{2}$ As noted above, discrete classical probability theory is recovered when there is only one test, i.e., when $A$ is of the form $(E, \{E\})$ (see Fig. 2). It is also possible to recover classical probability theory over an arbitrary measurable set [48]. Quantum theory with projective measurements is recovered when $A = (\mathcal{P}(\mathcal{H}), \mathcal{M}(\mathcal{H}))$, with $\mathcal{P}(\mathcal{H})$ the set of projection operators on a Hilbert space $\mathcal{H}$ and $\mathcal{M}(\mathcal{H})$ the set of projective decompositions of the identity. In this case, Gleason's theorem [49] implies that each state corresponds to a density operator $\rho$, with the probability assigned to projector $P$ given by $\mathrm{Tr}(\rho P)$.$^{3}$
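It will be useful to define arbitrary linear combinations of states. Given states $\omega_1, \ldots, \omega_k \in \Omega(A)$ and real numbers $r_1, \ldots, r_k$, the linear combination $\sum_i r_i \omega_i$ is the function on $E$ defined pointwise by
$$\Big(\sum_{i=1}^{k} r_i\, \omega_i\Big)(e) = \sum_{i=1}^{k} r_i\, \omega_i(e) \quad \forall e \in E.$$
The set of all linear combinations of states is a vector space denoted $V(A)$. In the case of quantum theory, for example, $V(A)$ is the real vector space of Hermitian operators on the Hilbert space. Clearly, $\Omega(A)$ is a convex subset of $V(A)$.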
Since $\Omega(A)$ by definition spans $V(A)$, they have equal dimension. Importantly, from here on we assume that $V(A)$ is finite dimensional.
Finally, let $V^*(A)$ be the vector space dual to $V(A)$, that is the set of all linear maps $V(A) \to \mathbb{R}$. Note that each outcome of a test, that is each $e \in E$, can be uniquely identified with a map $\tilde{e} \in V^*(A)$ such that $\tilde{e}(\omega) = \omega(e)$ $\forall \omega \in \Omega(A)$. Under this identification one can write $\omega(e)$ and $e(\omega)$ interchangeably, according to whether states are viewed as assigning probabilities to outcomes of tests, or vice versa. The set $E$ can be viewed as a subset of $V^*(A)$, and it is easy to see that the span of $E$ is equal to $V^*(A)$.
Before moving on to composite systems, we briefly note that there is an alternative approach to operational probabilistic theories that has recently been used to investigate the information theoretic properties of such theories [6,18,19]. In this approach one starts with a compact convex set $\Omega$, to be interpreted as a space of states, and defines measurement outcomes to be the set of affine functionals $f : \Omega \to [0, 1]$. The present work could easily have been formulated in this framework, but it would be odd to do so from a subjective Bayesian point of view. If the states are supposed to represent degrees of belief then it makes sense to start with the objects that they are degrees of belief about, i.e. the tests, rather than the states themselves. There is no loss of generality in working with test spaces, since the result of [47] implies that any compact convex set can arise as the state space of a test space in the finite dimensional case.
$^{2}$ In [47] this was proved for the state space of an orthomodular lattice, but the set of all finite ortho-partitions of unity of the lattice is a test space that has the same state space.
$^{3}$ This holds provided the dimension of the Hilbert space is $\geq 3$. A generalization of Gleason's theorem to encompass positive operator valued (POV) measurements does hold for dimension 2 [50,51], but the formalism of test spaces is not general enough to encompass these measurements. This is because a POV decomposition of the identity can contain multiple instances of the same term, such as $\{I/2, I/2\}$, and states are constrained to assign the same probability to each instance. In the test space formalism, different outcomes of the same test are always considered distinct. For precisely this reason, generalizations of test spaces known as effect-test spaces have been studied that do encompass quantum POV measurements [52,53]. However, there is no real need to consider them here because we are primarily concerned with properties of states, and the set of possible state spaces of an effect-test space is no more general than that of a test space.

B. Multi-partite systems
In order to consider multi-partite systems, one needs to consider the composition of test spaces and the definition of joint states. Suppose that two systems A and B are associated with test spaces A = (E, S) and B = (F, T ). If the combined system is regarded as a system in and of itself, then it too should be associated with a test space. But how is this constructed, and how is it related to A and B? Nothing that has been said with respect to single systems implies a unique answer, so further assumptions are needed.
Suppose that given separate systems A and B, it is possible to perform any test $s$ on system A simultaneously with any test $t$ on system B. A joint state assigns probabilities to pairs $(e, f)$ of outcomes. In particular, if $e \in s$ and $e \in s'$, then the probability of obtaining $(e, f)$ does not depend on whether the tests performed are $s$ and $t$, or $s'$ and $t$. Similarly if $f \in t$ and $f \in t'$. Suppose further that a specification of the joint probability for all outcome pairs serves to define the joint state uniquely.
This motivates the following.
Definition 7. Given two test spaces, $A = (E, S)$ and $B = (F, T)$, the Cartesian product, $A \times B$, is a new test space whose set of outcomes is the set theoretic Cartesian product $E \times F$, and whose set of tests is $\{s \times t \mid s \in S, t \in T\}$, where $s \times t$ is again the set theoretic Cartesian product.
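From Definition 6, a state on $A \times B$ is a map $E \times F \to [0, 1]$, with probabilities summing to 1 for each pair of tests $(s, t)$.
Definition 8. A state $\omega$ on $A \times B$ is nonsignalling if and only if, for all $s, s' \in S$ and all $f \in F$, $\sum_{e \in s} \omega(e, f) = \sum_{e' \in s'} \omega(e', f)$, and, for all $t, t' \in T$ and all $e \in E$, $\sum_{f \in t} \omega(e, f) = \sum_{f' \in t'} \omega(e, f')$.
If a state is nonsignalling, then the marginal probability of obtaining outcome $e$ for test $s$ does not depend on which B test is performed. This means that a marginal state $\omega_A \in \Omega(A)$ can be defined such that
$$\omega_A(e) = \sum_{f \in t} \omega(e, f),$$
where the right hand side does not depend on the choice of $t$. Similarly, one can define a marginal $\omega_B \in \Omega(B)$.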
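The nonsignalling condition and the resulting marginals are easy to check mechanically for finite test spaces. The sketch below is an illustration under stated assumptions: each local test space has two non-overlapping two-outcome tests, outcomes are labelled `(setting, result)`, and the function names are hypothetical. It contrasts the Popescu-Rohrlich box, which is nonsignalling, with a state in which the second system's result copies the first system's choice of test.

```python
import itertools

def product_tests(S, T):
    """Tests of the Cartesian product: all s x t with s in S and t in T."""
    return [frozenset(itertools.product(s, t)) for s in S for t in T]

def is_nonsignalling(omega, S, T, tol=1e-12):
    """Check that each marginal is independent of the test on the other side."""
    for e in set().union(*S):
        sums = [sum(omega[(e, f)] for f in t) for t in T]
        if max(sums) - min(sums) > tol:
            return False
    for f in set().union(*T):
        sums = [sum(omega[(e, f)] for e in s) for s in S]
        if max(sums) - min(sums) > tol:
            return False
    return True

def marginal_A(omega, S, T):
    """Marginal on the first system; any test t works if omega is nonsignalling."""
    t = T[0]
    return {e: sum(omega[(e, f)] for f in t) for e in set().union(*S)}

# Two settings, each a two-outcome test; outcomes are (setting, result).
S = [frozenset({(y, 0), (y, 1)}) for y in (0, 1)]
E = set().union(*S)

def pr_box(e, f):
    (x, a), (y, b) = e, f
    return 0.5 if (a ^ b) == (x & y) else 0.0

def copies_setting(e, f):
    (x, a), (y, b) = e, f
    return 1.0 if (a == 0 and b == x) else 0.0

pr = {(e, f): pr_box(e, f) for e in E for f in E}
sig = {(e, f): copies_setting(e, f) for e in E for f in E}

print(is_nonsignalling(pr, S, S), is_nonsignalling(sig, S, S))
print(marginal_A(pr, S, S))
```

Both `pr` and `sig` are valid states on the Cartesian product (they are normalized on every pair of tests), but only `pr` admits well-defined marginals on both systems.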

Definition 9. Given $\omega_A \in \Omega(A)$ and $\omega_B \in \Omega(B)$, the direct product, $\omega_A \otimes \omega_B$, is the state on $A \times B$ defined by $(\omega_A \otimes \omega_B)(e, f) = \omega_A(e)\, \omega_B(f)$.
The definition of the Cartesian product of test spaces is valid for any pair of test spaces, including the case in which one of them is itself a Cartesian product. Considering three test spaces, $A = (E, S)$, $B = (F, T)$, and $C = (G, U)$, it is easy to see that $(A \times B) \times C$ is isomorphic to $A \times (B \times C)$. Thus one can simply write $A \times B \times C$, with states on $A \times B \times C$ identified with maps $E \times F \times G \to [0, 1]$. The product $A \times A \times \cdots \times A$, where there are $n$ terms in the product, can be written $A^{\times n}$.
The notion of a nonsignalling state has been defined with respect to bipartite decompositions. It extends readily to the case of an n-fold product.
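Definition 10. Let $\omega$ be a state on $A_1 \times \cdots \times A_n$, and let $\alpha = \{i_1, \ldots, i_k\}$ be a subset of $\{1, \ldots, n\}$. Then $\omega$ is n-fold nonsignalling iff
$$\sum_{e_{i_1} \in t_{i_1}, \ldots, e_{i_k} \in t_{i_k}} \omega(e_1, \ldots, e_n) = \sum_{e_{i_1} \in t'_{i_1}, \ldots, e_{i_k} \in t'_{i_k}} \omega(e_1, \ldots, e_n)$$
for all $\alpha$, for all $e_j$ with $j \notin \alpha$, and for all tests $t_{i_1}, \ldots, t_{i_k}$ and $t'_{i_1}, \ldots, t'_{i_k}$, where $t_{i_j}$ and $t'_{i_j}$ are tests of $A_{i_j}$.
Finally, this will be useful:
Lemma 1. Consider the test space $A_1 \times \cdots \times A_n$. The direct product states, of the form $\omega_1 \otimes \cdots \otimes \omega_n$, span a subspace of $V(A_1 \times \cdots \times A_n)$, and the subspace can be identified with $V(A_1) \otimes \cdots \otimes V(A_n)$. If a joint state $\omega$ is n-fold nonsignalling, then $\omega \in V(A_1) \otimes \cdots \otimes V(A_n)$, i.e., $\omega$ can be written as a linear combination of direct products.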
Proof. Begin with the $n = 2$ case. The tensor product $V(A_1) \otimes V(A_2)$ can be defined as the set of bilinear maps $V^*(A_1) \times V^*(A_2) \to \mathbb{R}$. Any direct product $\omega_1 \otimes \omega_2$ defines such a map via $(a, b) \mapsto a(\omega_1)\, b(\omega_2)$, and it is straightforward that these span $V(A_1) \otimes V(A_2)$. Now consider a nonsignalling joint state $\omega$. The fact that $\omega$ is nonsignalling permits the definition of the marginal state $\omega_1$. Define the conditional state $\omega_{2|e}$ such that $\omega_{2|e}(f)$ is the probability of outcome $f$ on system 2, given that outcome $e$ was obtained on system 1. Thus $\omega(e, f) = \omega_1(e)\, \omega_{2|e}(f)$, for all $(e, f) \in E \times F$. Note that $\omega_{2|e} \in \Omega(A_2)$. Now suppose that $f$ and $\{g_i\}$ are elements of $F$ such that, considered as elements of $V^*(A_2)$, $f = \sum_i r_i g_i$. Thus $\omega_2(f) = \sum_i r_i\, \omega_2(g_i)$ for all $\omega_2 \in \Omega(A_2)$. Then $\omega(e, f) = \omega_1(e)\, \omega_{2|e}(f) = \sum_i r_i\, \omega_1(e)\, \omega_{2|e}(g_i) = \sum_i r_i\, \omega(e, g_i)$. A value $\omega(e, b)$ can now be defined for arbitrary $b \in V^*(A_2)$ by linear extension. Similar reasoning concludes that $\omega(e, b)$ is linear in the first argument, hence can be extended to $\omega(a, b)$ for $a \in V^*(A_1)$. So $\omega$ defines a bilinear map $V^*(A_1) \times V^*(A_2) \to \mathbb{R}$ as required. The extension to general $n$ is straightforward.
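Lemma 1 can be illustrated numerically with a state that is nonsignalling but far from a product: the Popescu-Rohrlich box. The sketch below assumes the same toy test space as a two-setting, two-outcome box (outcomes labelled `(setting, result)`), represents states as vectors indexed by outcome pairs, and uses least squares to express the box as a real linear combination of products of deterministic local states. All variable names are illustrative.

```python
import numpy as np

# Outcomes of the local test space, labelled (setting, result).
outcomes = [(x, a) for x in (0, 1) for a in (0, 1)]

def deterministic_state(r0, r1):
    """Local state that answers r0 to setting 0 and r1 to setting 1."""
    return np.array([1.0 if a == (r0, r1)[x] else 0.0 for (x, a) in outcomes])

local_states = [deterministic_state(r0, r1) for r0 in (0, 1) for r1 in (0, 1)]

# Each row is a direct product state, flattened over pairs of outcomes.
products = np.array([np.kron(u, v) for u in local_states for v in local_states])

# The PR box as a vector in the same ordering of outcome pairs.
pr = np.array([
    0.5 if (a ^ b) == (x & y) else 0.0
    for (x, a) in outcomes
    for (y, b) in outcomes
])

# Solve sum_k coeffs[k] * products[k] = pr in the least-squares sense.
coeffs, *_ = np.linalg.lstsq(products.T, pr, rcond=None)
residual = np.linalg.norm(products.T @ coeffs - pr)

print("rank of product states:", np.linalg.matrix_rank(products))
print("residual:", residual)
print("most negative coefficient:", coeffs.min())
```

The residual vanishes up to rounding, so the box lies in the span of the product states, as the lemma asserts. Some coefficients are necessarily negative: a decomposition with only nonnegative coefficients would be a convex mixture of local deterministic states, which the box is not.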
Note that if we have two quantum test spaces $A = (\mathcal{P}(\mathcal{H}_A), \mathcal{M}(\mathcal{H}_A))$ and $B = (\mathcal{P}(\mathcal{H}_B), \mathcal{M}(\mathcal{H}_B))$ then the space of nonsignalling states on $A \times B$ is larger than the state space of $AB = (\mathcal{P}(\mathcal{H}_A \otimes \mathcal{H}_B), \mathcal{M}(\mathcal{H}_A \otimes \mathcal{H}_B))$. To see this, recall that a nonsignalling state on $A \times B$ only has to assign nonnegative probabilities for all possible choices of local measurements on A and B, whereas the test space $AB$ includes joint measurements, such as the Bell measurement. Thus, the criteria to be a state on $AB$ are more restrictive than those for nonsignalling states on $A \times B$. Indeed, if we take a state on $AB$ and perform a positive, but not completely positive, map on system A then the result is still a valid state on $A \times B$, but not on $AB$ in general, e.g. consider performing a partial transpose on a Bell state. Nevertheless, the state space of $AB$ is still a convex subset of the nonsignalling states on $A \times B$, which is enough to apply our theorem. More generally, one might want to consider rules for composing subsystems that yield a convex subset of the nonsignalling states on the Cartesian product for arbitrary test spaces.
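The partial transpose example can be made concrete for two qubits. The following sketch (with hypothetical helper names, and random sampling standing in for "all local projective measurements") applies the partial transpose to a Bell state and checks two things: the resulting operator assigns nonnegative values to products of local projectors, yet it has a negative eigenvalue, so it would assign a negative probability to an outcome of some joint measurement.

```python
import numpy as np

# Bell state (|00> + |11>)/sqrt(2) as a density operator on two qubits.
phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
bell = np.outer(phi, phi.conj())

def partial_transpose_A(rho, dA=2, dB=2):
    """Transpose the first tensor factor of rho."""
    t = rho.reshape(dA, dB, dA, dB)
    return t.transpose(2, 1, 0, 3).reshape(dA * dB, dA * dB)

def random_projector(rng, d=2):
    """Rank-one projector onto a random unit vector."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    v /= np.linalg.norm(v)
    return np.outer(v, v.conj())

rng = np.random.default_rng(0)
w = partial_transpose_A(bell)

# Smallest value assigned to a sampled product of local projectors.
min_local = min(
    np.trace(np.kron(random_projector(rng), random_projector(rng)) @ w).real
    for _ in range(1000)
)
# Smallest eigenvalue, i.e. the worst case over joint rank-one projectors.
min_eig = np.linalg.eigvalsh(w).min()

print("min over sampled local projectors:", min_local)
print("min eigenvalue:", min_eig)
```

The sampling is only suggestive for the local case; the exact statement follows from $\mathrm{Tr}[(P \otimes Q)\, \rho^{T_A}] = \mathrm{Tr}[(P^{T} \otimes Q)\, \rho] \geq 0$.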

IV. A DE FINETTI THEOREM FOR TEST SPACES
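Given a system associated with a test space $A = (E, S)$, suppose that $n$ copies are associated with the product $A^{\times n}$. Given an infinite sequence of states $\omega_1, \omega_2, \ldots$, where $\omega_n \in \Omega(A^{\times n})$, it is possible to define symmetry and exchangeability in a manner similar to the classical and quantum cases. The main difference is that here, the definition of exchangeability involves the extra condition that the states are nonsignalling.
Definition 11. A state $\omega_n \in \Omega(A^{\times n})$ is symmetric if and only if $\omega_n(e_1, \ldots, e_n) = \omega_n(e_{\pi(1)}, \ldots, e_{\pi(n)})$ for all permutations $\pi : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$ and all $e_1, \ldots, e_n \in E$.
Definition 12. The sequence $\omega_1, \omega_2, \ldots$ is exchangeable if and only if 1. $\forall n$, $\omega_n$ is symmetric, 2. $\forall n$, $\omega_n$ is n-fold nonsignalling, 3. $\forall n$, $\omega_n(e_1, \ldots, e_n) = \sum_{e_{n+1} \in s} \omega_{n+1}(e_1, \ldots, e_n, e_{n+1})$ for any test $s \in S$.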

Theorem 3 (The de Finetti theorem for test spaces).
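Suppose that the sequence $\omega_1, \omega_2, \ldots$, where $\omega_n \in \Omega(A^{\times n})$, is exchangeable. Then $\omega_n$ can be written in the form
$$\omega_n = \int_{\Omega(A)} d\mu(\omega)\, \omega^{\otimes n}, \tag{9}$$
where $\omega^{\otimes n} = \omega \otimes \cdots \otimes \omega$ with $n$ factors, $\mu$ is a probability measure on $\Omega(A)$, $\mu$ is independent of $n$, and $\mu$ is unique.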
Proof. The proof is an adaptation of the proof of the quantum de Finetti theorem due to Caves et al. [27].
Recall that in quantum theory, an informationally complete measurement is a positive operator-valued (POV) measurement such that if the outcome probabilities are all known, then the state is determined uniquely. The strategy of Caves et al. is to generate a classical distribution by considering an informationally complete POV measurement performed separately on each quantum system. Applying the classical de Finetti theorem to the distribution of outcome sequences allows the form of the quantum state to be inferred. In our context, the test space $A$ need not include a test that is informationally complete for the state space $\Omega(A)$. But for the purposes of proof, this does not matter. All that is needed is a corresponding mathematical construction.
Lemma 2. There exists a set $M = \{a_1, \ldots, a_d\} \subset V^*(A)$, with $d$ the dimension of $V(A)$, such that the $a_i$ are linearly independent and, for all $\omega \in \Omega(A)$, $0 \leq a_i(\omega) \leq 1$ and $\sum_i a_i(\omega) = 1$.
Proof. This result is not new. It is also used in Ref. [18], and we give the same proof. A more general version is proven in Ref. [54].
The set $M$ will play the role of an informationally complete measurement. The linear independence of the $a_i$ means that a state $\omega$ is determined uniquely by the values $a_i(\omega)$. The idea now is that given a nonsignalling $\omega_n \in \Omega(A^{\times n})$, one can at least imagine a measurement corresponding to $M$ performed separately on each system. The probability of obtaining an outcome sequence $(a_{i_1}, \ldots, a_{i_n})$ for a direct product state $\omega_1 \otimes \cdots \otimes \omega_n$ is defined as $a_{i_1}(\omega_1) \cdots a_{i_n}(\omega_n)$. Recalling Lemma 1, according to which a nonsignalling state can be written as a linear combination of direct product states, the probability of the sequence $(a_{i_1}, \ldots, a_{i_n})$ for an arbitrary nonsignalling state can be defined by linear extension. Note that this way, the sequence $(a_{i_1}, \ldots, a_{i_n})$ corresponds to a vector $a_{i_1} \otimes \cdots \otimes a_{i_n} \in (V^*(A))^{\otimes n}$, such that the probability is given by $(a_{i_1} \otimes \cdots \otimes a_{i_n})(\omega_n)$. The vectors $a_{i_1} \otimes \cdots \otimes a_{i_n}$ are linearly independent and span the tensor product space $(V^*(A))^{\otimes n}$.
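This means that the joint measurement $M^{\times n}$ is informationally complete for the nonsignalling $n$-partite system.
Write $P_n(a_{i_1}, \ldots, a_{i_n}) = (a_{i_1} \otimes \cdots \otimes a_{i_n})(\omega_n)$ for the resulting distribution over outcome sequences. Following the strategy described above, the classical de Finetti theorem (Theorem 1) is applied to the sequence $P_1, P_2, \ldots$, giving
$$P_n(a_{i_1}, \ldots, a_{i_n}) = \int_{\Delta_d} d\mu(p)\, p(a_{i_1}) \cdots p(a_{i_n}), \tag{13}$$
where $\Delta_d$ is the set of probability distributions over the elements of $M$.
For each such distribution $p$, there is a unique $\omega_p \in V(A)$ such that $a_i(\omega_p) = p(a_i)$ (where uniqueness follows from the linear independence of the $a_i$). Thus Eq. (13) can be rewritten
$$(a_{i_1} \otimes \cdots \otimes a_{i_n})(\omega_n) = \int_{\Delta_d} d\mu(p)\, a_{i_1}(\omega_p) \cdots a_{i_n}(\omega_p). \tag{14}$$
Since the joint measurement is informationally complete for the nonsignalling $n$-partite states, this implies
$$\omega_n = \int_{\Delta_d} d\mu(p)\, \omega_p^{\otimes n}. \tag{15}$$
This is not quite sufficient to establish Theorem 3. The right hand side of Eq. (15) is an integral over all $\omega_p \in V(A)$ satisfying $0 \leq a_i(\omega_p) \leq 1$ and $\sum_i a_i(\omega_p) = 1$. That is, it is an integral over all $\omega_p$ that return sensible probabilities for the imaginary informationally complete measurement. But in general, there are $\omega_p$ satisfying these conditions that are not valid states because they return a value $< 0$ for some element of the test space. It remains to show that the integral can be restricted to those $\omega_p \in \Omega(A)$.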
To this end, consider an $\omega_p \in V(A)$ such that $0 \leq a_i(\omega_p) \leq 1$ and $\sum_i a_i(\omega_p) = 1$, but $\omega_p(e) < 0$ for some $e \in E$. It must be the case that $e \in s$ for some test $s$. Let the other elements of $s$ be $\{f_1, \ldots, f_k\}$ and note that $\sum_i \omega_p(f_i) > 1$. There must exist an $\epsilon > 0$ and a neighborhood $N$ of $\omega_p$ in $V(A)$ such that $\sum_i \omega(f_i) > 1 + \epsilon$ for all $\omega \in N$. Let $\tilde{N}$ be the subset of $\Delta_d$ such that $p(a_i) = a_i(\omega)$ for some $\omega \in N$. The expression (15) holds for any $n$, so suppose that $n$ is even and that the test $s$ is performed on each of the $n$ systems. The probability that outcome $e$ is never obtained is defined by $\omega_n$ and is given by
$$\begin{aligned} \sum_{j_1, \ldots, j_n} \omega_n(f_{j_1}, \ldots, f_{j_n}) &= \int_{\Delta_d} d\mu(p) \Big(\sum_i \omega_p(f_i)\Big)^{n} \\ &= \int_{\Delta_d \setminus \tilde{N}} d\mu(p) \Big(\sum_i \omega_p(f_i)\Big)^{n} + \int_{\tilde{N}} d\mu(p) \Big(\sum_i \omega_p(f_i)\Big)^{n} \\ &\geq \mu(\tilde{N})\, (1 + \epsilon)^{n}, \end{aligned} \tag{16}$$
where we have used the fact that the first term in the second line is $\geq 0$ if $n$ is even. For large enough $n$ this expression is $> 1$ unless $\mu(\tilde{N}) = 0$. This holds for any $\omega_p \notin \Omega(A)$. It follows that, with a suitable redefinition of $\mu$,
$$\omega_n = \int_{\Omega(A)} d\mu(\omega)\, \omega^{\otimes n}. \tag{17}$$

V. CONSEQUENCES OF THEOREM 3
The de Finetti theorem for test spaces is rather general. Very little goes into the definition of a test space itself. In fact, in the finite dimensional case we are considering, the state space of an individual system may be an arbitrary compact convex set. Thus the most substantive assumptions that go into the theorem are those that concern how test spaces combine when joint systems are considered.
In the classical case, the test space $(E, \{E\})$ contains a single test and the state space for an individual system is $\Delta_d$, where $d$ is the number of elements of $E$. In this case, the nonsignalling condition is redundant because there is just a single test on each system and hence no freedom to choose alternative measurements. The Cartesian product of classical test spaces corresponds to the usual Cartesian product of sample spaces and so Theorem 3 reduces to Theorem 1 straightforwardly.
The quantum case is a little more subtle because the state of $n$ quantum systems belongs to the state space of $(\mathcal{P}(\mathcal{H}^{\otimes n}), \mathcal{M}(\mathcal{H}^{\otimes n}))$, rather than the state space of the Cartesian product of $n$ test spaces of the form $(\mathcal{P}(\mathcal{H}), \mathcal{M}(\mathcal{H}))$. Nevertheless, a quantum state defined by a density operator on $\mathcal{H}^{\otimes n}$ is uniquely specified by the probabilities for measurement outcomes of the form $Q_1 \otimes Q_2 \otimes \cdots \otimes Q_n$, where the $Q_i$ are projection operators. It follows that the states on $(\mathcal{P}(\mathcal{H}^{\otimes n}), \mathcal{M}(\mathcal{H}^{\otimes n}))$ can be identified with a subset of the nonsignalling states on $(\mathcal{P}(\mathcal{H}), \mathcal{M}(\mathcal{H}))^{\times n}$. Hence Theorem 3 implies the quantum de Finetti theorem.
As a further illustration of the generality of the result, we note that a theorem for classical processes, or conditional probabilities, can also be viewed as a special case of Theorem 3. A process $C$ can be thought of as taking an input $Y$ into an output $X$, where $Y$ takes values in $\{1, \ldots, k\}$ and $X$ takes values in $\{1, \ldots, d\}$. The process can be defined as the set of conditional probabilities of the form $P(X = x \mid Y = y)$ (abbreviated $P(x|y)$). Clearly, the set of all such $C$ can be regarded as the set of states on a test space $(E, S)$, where $E$ consists of ordered pairs, $E = \{(x, y)\}_{x=1,\ldots,d,\; y=1,\ldots,k}$, and $S = \{s_y\}_{y=1,\ldots,k}$ with $s_y = \{(x, y)\}_{x=1,\ldots,d}$ (see Fig. 3). The state space of this test space is isomorphic to the set of conditional probability distributions via the identification $\omega((x, y)) = P(x|y)$.
More generally, a process $C_n$ takes inputs $Y_1, \ldots, Y_n$ into outputs $X_1, \ldots, X_n$, where each $Y_i$ takes values in $\{1, \ldots, k\}$, and each $X_i$ takes values in $\{1, \ldots, d\}$. Such a $C_n$ can be defined as the set of conditional probabilities of the form $P_n(x_1, \ldots, x_n \mid y_1, \ldots, y_n)$, and can also be identified with a state on the Cartesian product $(E, S)^{\times n}$. The n-fold nonsignalling and symmetry conditions can be defined for $C_n$ exactly as they are in Definitions 10 and 11. Exchangeability for a sequence $C_1, C_2, \ldots$ can be defined exactly as in Definition 12. Theorem 3 becomes the following theorem for classical processes.
Theorem 4. If the sequence $C_1, C_2, \ldots$ is exchangeable, then the conditional probabilities defining $C_n$ can be written in the form
$$P_n(x_1, \ldots, x_n \mid y_1, \ldots, y_n) = \int_{\Omega} d\mu(p)\, p(x_1|y_1) \cdots p(x_n|y_n), \tag{18}$$
where $\Omega$ is the set of processes that take a single input $Y$ into a single output $X$, $\mu$ is a probability measure on $\Omega$, $\mu$ is independent of $n$, and $\mu$ is unique.
A couple of remarks regarding Theorem 4 might be helpful.
First, we described Theorem 4 as a special case of Theorem 3. It is worth noting the sense in which it is strictly less general. After all, any state on any test space could be thought of as a classical process, which takes an input (choice of test) into an output (outcome of test). The point is that, with this identification applied to a generic test space A, not all sets of conditional probabilities p(x|y) will correspond to valid states on A. It is only if A has the special feature that the tests are non-overlapping that this will be the case. The integral in Eq. (18) ranges over all processes of the form p(x|y), whereas in Eq. (9), it is important that the integral ranges only over Ω(A).
Second, the de Finetti theorem for classical processes can be viewed as a de Finetti theorem for states in a theory that is non-classical and non-quantum, but admits the most general type of correlations compatible with the nonsignalling requirement. This theory is discussed in [6], where it is called Generalized Non-signalling Theory. It admits superquantum correlations, which have been discussed in the quantum information literature under the name Popescu-Rohrlich, or nonlocal, boxes. Of course, as in the quantum case, exchangeable states do not actually exhibit such correlations, since the de Finetti theorem shows that they are separable.
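The last remark can be illustrated with the CHSH expression for binary inputs and outputs. The sketch below is an assumption-laden toy: it takes $k = d = 2$, labels values by 0 and 1 rather than 1 and 2, uses a randomly generated discrete measure in place of $\mu$, and the function names are hypothetical. It compares the CHSH value of a two-copy process of the de Finetti form with that of a Popescu-Rohrlich box.

```python
import itertools
import numpy as np

def chsh(P):
    """CHSH value of P[x1, x2, y1, y2] = P(x1, x2 | y1, y2), all indices binary."""
    total = 0.0
    for y1, y2 in itertools.product((0, 1), repeat=2):
        corr = sum(
            (-1) ** (x1 ^ x2) * P[x1, x2, y1, y2]
            for x1, x2 in itertools.product((0, 1), repeat=2)
        )
        total += -corr if (y1 == 1 and y2 == 1) else corr
    return total

def random_process(rng):
    """Random single-system process p[x, y] = p(x|y)."""
    q = rng.random(2)  # q[y] = p(0|y)
    return np.array([q, 1 - q])

def product_process(p):
    """Two-copy product p(x1|y1) p(x2|y2)."""
    return np.einsum("ac,bd->abcd", p, p)

# A discrete stand-in for the measure mu: five random processes.
rng = np.random.default_rng(1)
weights = rng.dirichlet(np.ones(5))
mixture = sum(w * product_process(random_process(rng)) for w in weights)

# Popescu-Rohrlich box: outputs satisfy x1 XOR x2 = y1 AND y2.
pr = np.zeros((2, 2, 2, 2))
for x1, x2, y1, y2 in itertools.product((0, 1), repeat=4):
    if (x1 ^ x2) == (y1 & y2):
        pr[x1, x2, y1, y2] = 0.5

print("CHSH of de Finetti-form process:", chsh(mixture))
print("CHSH of PR box:", chsh(pr))
```

The mixture never exceeds the local bound of 2 in absolute value, whereas the PR box reaches the algebraic maximum of 4. The PR box is symmetric and nonsignalling, so it follows from Theorem 4 that it cannot be the two-system member of an exchangeable sequence.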

VI. WHEN DOES A DE FINETTI-TYPE THEOREM NOT HOLD?
The de Finetti theorem for test spaces holds thanks to a number of assumptions concerning how systems combine to make joint systems. One of these is the nonsignalling condition for joint states. Others are encoded in the formal definition of a Cartesian product of test spaces. It is interesting to see what happens when these assumptions are relaxed, so in this section we present a number of cases where the theorem fails.

A. The nonsignalling condition
First, the assumption that the joint states are nonsignalling is crucial not only in the proof, but in the very definition of exchangeability. In general, if a state $\omega \in \Omega(A_1 \times A_2)$ is signalling, then it is not possible to define marginal states $\omega_1$ and $\omega_2$. But exchangeability requires that, given the $(n+1)$th state in the sequence, the marginal state of the first $n$ systems should be defined and should equal the $n$th state. In fact, if a state $\omega \in \Omega(A_1 \times A_2)$ is only signalling in one direction then one of the marginals can be defined. For example, if probabilities of outcomes for system 2 depend on which test was performed on system 1, but not vice versa, then $\omega_1$ is well defined. But such a state is not symmetric, thus could not form part of an exchangeable sequence. Arguably, the possibility of performing tests on one system that do not affect the other is part and parcel of what we mean when we speak of separate systems (or separate trials).

B. Simultaneous measurements
Implicit in the definition of the Cartesian product of test spaces is the idea that a test on one system can be regarded as simultaneous with a test on the other system. One can certainly imagine rules for combining systems where this is not the case. As a very simple example, consider two classical bits that have combined in the following strange manner. If bit 1 is measured before bit 2, then the bits are found to be 00 or 11 with equal probability. On the other hand, if bit 2 is measured before bit 1, then the outcomes are 01 and 10 with equal probability. Note that a suitable no-signalling condition is satisfied and it is possible to define marginal states for these bits. With more complicated test spaces one can construct examples like this which are also symmetric (see footnote 4). We leave open the status of the de Finetti theorem in such cases.

Footnote 4: Consider the test space A, with outcomes E = {a, b, c, d}, and two tests s_1 = {a, b} and s_2 = {c, d}. Suppose that there are two systems, A and B, each described by the test space A, and that they have combined as follows. If the test s_1 is performed on both systems, the outcomes are completely random and uncorrelated. If the test s_2 is performed on both systems, then the outcomes are again completely random and uncorrelated. On the other hand, if the test s_1 is performed on either one of the systems, followed by s_2 on the other, then the joint outcomes are ac or bd with equal probability. If the test s_2 is performed on either one of the systems, followed by s_1 on the other, then the joint outcomes are ad or bc with equal probability. It is clear that this peculiar bipartite system does not allow signalling and is also invariant under a permutation of the two systems.

C. Extra degrees of freedom
Another assumption that is implicit in the Cartesian product of test spaces is that the joint state of two systems is completely specified by the probabilities for the joint outcomes (e, f) of each pair of local tests (s, t). One can construct theories in which a joint state does indeed determine such probabilities, but is not completely specified by them. There are extra degrees of freedom, bound up in the two systems, which are inaccessible unless some kind of joint operation involving both systems at once is performed.
As discussed in [27], a clear example of this is provided by a modification of quantum theory in which real Hilbert spaces are used rather than complex Hilbert spaces. States and observables correspond to real symmetric, rather than complex Hermitian, operators. For a 2-dimensional system (a rebit), there are measurements corresponding to x- and z-spin but not y-spin. In this case, if σ_y is the usual Pauli matrix, then (1/4)(I ⊗ I + σ_y ⊗ σ_y) is an allowed state of two rebits. But the only way it can be distinguished from (1/4)(I ⊗ I) is if a joint observable such as σ_y ⊗ σ_y is measured. Note that in real quantum theory, this really is a joint observable: it cannot be measured via a separate σ_y measurement on each system.
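The rebit example can be checked numerically. The sketch below is an illustration only, written in plain Python with matrices as nested lists; all helper names are introduced here and do not come from the paper or from [27]. It assumes that the local statistics of a two-rebit state are fixed by the expectation values of products of the real observables I, σ_x and σ_z, and compares those values for the two states in the text before evaluating the joint observable σ_y ⊗ σ_y.

```python
from itertools import product

# 2x2 matrices as nested lists (pure Python, no external dependencies).
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def expect(rho, op):
    """Expectation value Tr(rho op)."""
    n = len(rho)
    return sum(rho[i][j] * op[j][i] for i in range(n) for j in range(n))

def is_real(m, tol=1e-12):
    """True if every matrix entry has a negligible imaginary part."""
    return all(abs(complex(x).imag) <= tol for row in m for x in row)

# The two states compared in the text.
rho_yy = scale(0.25, add(kron(I2, I2), kron(SY, SY)))
rho_mixed = scale(0.25, kron(I2, I2))

# Single-system observables available for a rebit: real symmetric matrices only.
REBIT_LOCAL = (I2, SX, SZ)

def local_statistics_agree(rho_a, rho_b, tol=1e-12):
    """Compare expectation values of all products of local rebit observables."""
    return all(abs(expect(rho_a, kron(a, b)) - expect(rho_b, kron(a, b))) <= tol
               for a, b in product(REBIT_LOCAL, repeat=2))

print(is_real(rho_yy))                             # rho_yy is a real matrix
print(local_statistics_agree(rho_yy, rho_mixed))   # local tests cannot tell them apart
print(expect(rho_yy, kron(SY, SY)).real,           # the joint observable can
      expect(rho_mixed, kron(SY, SY)).real)
```

Under these assumptions the two states agree on every product of local rebit observables and differ only in the expectation value of σ_y ⊗ σ_y.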
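In [27] it is shown explicitly that the de Finetti theorem fails in real quantum theory. If ω_n is defined by
\[ \omega_n = \frac{1}{2}\left(\frac{I + \sigma_y}{2}\right)^{\otimes n} + \frac{1}{2}\left(\frac{I - \sigma_y}{2}\right)^{\otimes n} \tag{19} \]
then it is real and symmetric, and the sequence ω_1, ω_2, . . . is exchangeable. But by the de Finetti theorem for complex quantum theory, the right-hand side of Eq. (19) is the unique de Finetti representation for this sequence. Since the operators (I ± σ_y)/2 are not real, they are not states of real quantum theory, and so the sequence has no de Finetti representation in terms of rebit product states.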
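The properties claimed for this sequence can be checked for small n. The following sketch assumes that Eq. (19) reads ω_n = (1/2)((I + σ_y)/2)^{⊗n} + (1/2)((I − σ_y)/2)^{⊗n}; the helper names (`kron_power`, `omega`, `trace_out_last`) are hypothetical and chosen only for this illustration. It checks that each ω_n is a real matrix, that tracing out the last system of ω_{n+1} returns ω_n, and that the individual factors (I ± σ_y)/2 are not real.

```python
PLUS = [[0.5, -0.5j], [0.5j, 0.5]]    # (I + sigma_y) / 2
MINUS = [[0.5, 0.5j], [-0.5j, 0.5]]   # (I - sigma_y) / 2

def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def kron_power(m, n):
    """n-fold tensor power of m (the 1x1 identity for n = 0)."""
    result = [[1]]
    for _ in range(n):
        result = kron(result, m)
    return result

def omega(n):
    """Equal mixture of the n-fold tensor powers of PLUS and MINUS."""
    p, q = kron_power(PLUS, n), kron_power(MINUS, n)
    return [[0.5 * (x + y) for x, y in zip(rp, rq)] for rp, rq in zip(p, q)]

def trace_out_last(rho):
    """Partial trace over the last two-dimensional system."""
    d = len(rho) // 2
    return [[rho[2 * i][2 * j] + rho[2 * i + 1][2 * j + 1] for j in range(d)]
            for i in range(d)]

def is_real(m, tol=1e-12):
    return all(abs(complex(x).imag) <= tol for row in m for x in row)

def matrices_close(a, b, tol=1e-12):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

for n in range(1, 4):
    # Each omega_n is real, and omega_{n+1} reduces to omega_n.
    print(n, is_real(omega(n)), matrices_close(trace_out_last(omega(n + 1)), omega(n)))

# The factors appearing in the representation are not real matrices.
print(is_real(PLUS), is_real(MINUS))
```

For n = 2 the same construction reproduces the two-rebit state (1/4)(I ⊗ I + σ_y ⊗ σ_y) discussed above, which gives a quick consistency check on the assumed form of Eq. (19).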

VII. CONCLUSION
In this paper, an infinite de Finetti theorem for test spaces has been presented, which generalizes both the classical and quantum de Finetti theorems. To illustrate the generality of the result, we have shown that a de Finetti theorem for classical processes, which may also be interpreted as a de Finetti theorem for nonlocal boxes, follows as a special case.
From a practical point of view, proving theorems for test spaces, rather than just for quantum theory, confers significant advantages. Not only do we achieve a unification of the classical and quantum results, but we also obtain results that apply to essentially arbitrary convex sets. This is potentially relevant when technological limitations prevent the preparation of arbitrary quantum states of certain systems, so that there is an effective restriction to a convex subset.
From a foundational point of view, this work can be seen as part of the project of understanding what is responsible for the enhanced information processing power of quantum theory, and of the project of deriving quantum theory from information-theoretic axioms. In particular, if one adopts a subjective Bayesian approach to probability, it might be desirable to impose the requirement that, in any reasonable theory, one should be able to make sense of the idea of reconstructing an unknown state of a system by making repeated measurements. Having a de Finetti theorem for test spaces means that this does indeed make sense for theories in this framework, and that the existing approaches to Bayesian state tomography in quantum theory would generalize straightforwardly.
There are various directions for future work. It would be useful to produce a finite de Finetti theorem for test spaces. It would also be useful to establish whether a (finite or infinite) de Finetti theorem holds without the assumption made here of finite dimensionality of state spaces. Finally, as discussed in Section VI B, it might be interesting to explore the status of de Finetti-type theorems in cases where systems combine in non-standard ways.
Note added. Related results have recently been obtained by M. Christandl and B. Toner [55], who derive a de Finetti theorem for classical processes, analogous to Theorem 4 of the present work, but extended to the finite case.