Generalised probabilistic theories and conic extensions of polytopes

Generalized probabilistic theories (GPT) provide a general framework that includes classical and quantum theories. A GPT is described by a cone $C$ and its dual $C^*$. We show that whether some one-way communication complexity problems can be solved within a GPT is equivalent to the recently introduced cone factorisation of the corresponding communication matrix $M$. We also prove an analogue of Holevo's theorem: when the cone $C$ is contained in $\mathbb{R}^{n}$, the classical capacity of the channel realised by sending GPT states and measuring them is bounded by $\log n$. Polytopes and optimising functions over polytopes arise in many areas of discrete mathematics. A conic extension of a polytope is the intersection of a cone $C$ with an affine subspace whose projection onto the original space yields the desired polytope. Extensions of polytopes can sometimes be much simpler geometric objects than the polytope itself. The existence of a conic extension of a polytope is equivalent to that of a cone factorisation of the slack matrix of the polytope, on the same cone. We show that all $0/1$ polytopes whose vertices can be recognized by a polynomial size circuit, which includes as a special case the travelling salesman polytope and many other polytopes from combinatorial optimisation, have small conic extension complexity when the cone is the completely positive cone. Using recent exponential lower bounds on the linear extension complexity of polytopes, this provides an exponential gap between the communication complexity of GPT based on the completely positive cone and classical communication complexity, and a conjectured exponential gap with quantum communication complexity. Our work thus relates the communication complexity of generalisations of quantum theory to questions of mainstream interest in the area of combinatorial optimisation.


Introduction
Generalised Probabilistic Theories (GPT) [49,28,31,34,42,39,9,22,23,50] are a framework that allows generalisations of both classical and quantum theories. In its simplest form a GPT is given by a closed convex cone $C$ that defines the state space, by the dual cone $C^*$ that defines the measurement space, and by a unit $u \in C^*$ that normalises the states. Upon adding a sufficient set of axioms one restricts to classical or quantum theory. But using only a subset of the axioms provides a framework in which more general theories can be studied. Many phenomena considered uniquely quantum, such as no-cloning and no-broadcasting, trade-off between state disturbance and measurement, properties associated with entanglement, teleportation, remote steering of ensembles, and properties of entropy, already appear at the level of GPT, see e.g. [46,47,9,4,5,6,8,7]. Related lines of enquiry have shown that non local theories obeying no-signalling have "quantum" properties such as intrinsic randomness, impossibility of cloning, secret key generation, see e.g. [51,10,2].
Ideally one would hope to find a set of simple and physically intuitive axioms that naturally restrict GPT to quantum theory [35]. Information and complexity theory provide a possible line of approach by providing criteria that can be used to rule out classes of theories. The development of quantum information [53] shows that perfectly consistent complexity theories alternative to classical are possible. On the other hand it has been shown that unlimited supply of maximally non local boxes makes communication complexity trivial [27,44], which can be taken as an argument for why such correlations are not physical. More recently, the principle of information causality was shown to be violated by many non local correlations [54].
Independently of the above, considerable work has been devoted to understanding the geometry and extension complexity of polytopes [24,43,59]. For instance, the polytope associated with the Travelling Salesman Problem (TSP) is the convex hull of all points in $\{0, 1\}^{\binom{n}{2}}$ that correspond to a Hamiltonian cycle in the complete $n$-vertex graph $K_n$. Solving the TSP is equivalent to linear optimisation over the TSP polytope. Representing the set of feasible solutions of a problem by a polytope forms the basis of a standard and powerful methodology in combinatorial optimisation, see, e.g., [58].
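To make the definition of the TSP polytope concrete, the following Python sketch enumerates its vertices for a small complete graph. The function name `tsp_vertices` and the edge ordering are illustrative assumptions, not notation from the text; brute-force enumeration is only feasible for very small $n$.

```python
from itertools import combinations, permutations

def tsp_vertices(n):
    """Return (edges, vertices): the coordinates and the 0/1 incidence
    vectors of all Hamiltonian cycles of the complete graph K_n."""
    edges = list(combinations(range(n), 2))       # binom(n, 2) coordinates
    index = {e: k for k, e in enumerate(edges)}
    vertices = set()
    # Fix node 0 as the starting point; each cycle still appears twice
    # (once per direction), and the set removes the duplicates.
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        vec = [0] * len(edges)
        for k in range(n):
            a, b = tour[k], tour[(k + 1) % n]
            vec[index[(min(a, b), max(a, b))]] = 1
        vertices.add(tuple(vec))
    return edges, vertices

edges, verts = tsp_vertices(5)
print(len(edges), len(verts))
```

The TSP polytope is the convex hull of the vectors in `verts`; the number of vertices grows as $(n-1)!/2$, which is why compact extended formulations are of interest.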
Many polytopes of interest have exponentially many facets, which makes them difficult to use directly. An extension (or lift) of a polytope is a geometric object in a larger dimensional space whose projection onto the original space yields the desired polytope. This is related to the concept of extended formulation, which refers to the description of the extension, here a system of linear equations plus one (conic) constraint. Linear extensions of polytopes are given in terms of linear programs. Semidefinite and conic extensions of polytopes are given in terms of semidefinite programs and conic programs. The extension may be much simpler than the original polytope, see e.g. [24]. This motivates the definition of the linear (semidefinite; conic) extension complexity of a polytope as the minimum size of a linear (semidefinite; conic) program expressing the polytope, in terms of the dimension of the cone. When the extension complexity is small, optimisation problems that seem difficult over the original polytope may become simple over the extended formulation.
It was shown in [60,36] that the existence of a linear (semidefinite; conic) extension of a polytope is essentially equivalent to certain linear (PSD; conic) factorisations of a matrix associated to the polytope, called the slack matrix. The slack matrix records for each pair (v, F ) of vertex v and facet F of the polytope the corresponding algebraic distance. Specifically the matrix M has a cone factorisation M = T U if T is a matrix whose rows belong to the cone C and U is a matrix whose columns belong to the dual cone C * . When the cone C is the nonnegative orthant or the cone of PSD matrices (positive semidefinite matrices), one obtains the nonnegative and PSD factorisations of the matrix.
As shown in [32,33], the size of a nonnegative (PSD) factorisation of the matrix M is equal, up to a small additive constant, to the number of classical (quantum) bits that must be sent in a randomized one-way communication complexity scenario with nonnegative outputs that computes the matrix in expectation. Conversely, the existence of such a communication complexity protocol implies the corresponding factorisation of the matrix M . Note that the communication complexity scenario used here differs from the one most often used in the literature since on the one hand we require that the matrix M be reproduced exactly (we tolerate no error), but on the other hand it must only be reproduced on average (the protocol could for instance output 0 a large fraction of the time).
Inspired by the connection between PSD factorisation and quantum communication complexity, it was shown in [33] that the linear extension complexity of some important polytopes from combinatorial optimisation, including the correlation polytope and TSP polytope, is exponential. Extensions and strengthenings of this result can be found in [16,18,17,3,55,21].
Understanding the semidefinite extension complexity of polytopes is an important research question [36,33,37,38]. In particular since semidefinite programming is in P, a small semidefinite extension of the TSP polytope with efficiently computable coefficients would imply that P = NP. It is therefore reasonable to conjecture that the semidefinite extension complexity of polytopes such as the TSP polytope is exponential. This conjecture is supported by the counting argument of [19] (based on the earlier work of [57]) that shows that some 0/1 polytopes have large semidefinite extension complexity.
In the present work we connect the above two areas of study. First we give an operational meaning to the cone factorisation of a matrix M , for an arbitrary cone C: it is equivalent (up to the communication of a single classical bit) to the existence of a randomized one-way communication complexity scenario with nonnegative outputs that computes the matrix M in expectation when the states and measurements are described by the GPT associated to cone C. This generalises the operational interpretation of the nonnegative and PSD factorisations in terms of classical and quantum communication complexity.
In order to understand the implications of this result, it is important to have an upper bound on how much classical information can be stored in a state of a GPT. The analogous result stating that at most one classical bit can be stored in a quantum bit is known as Holevo's theorem [40], and underlies much of quantum information theory. Indeed only in the presence of such a bound can one give meaning to communication complexity of the corresponding GPT. Our second result is to provide such a bound: namely we show that if a GPT is associated to a cone C ⊂ R n , then the states of this GPT can store at most log n classical bits. To prove this result, we use the fact that the space of measurements in a GPT is convex, and then prove that the extremal measurements have at most n non-zero outcomes. This characterisation of GPT extremal measurements is to our knowledge new and of interest in itself. It generalises a well known characterisation of extremal quantum measurements [26].
We then consider the specific case of the copositive cone $C_n = \{X \mid y^\intercal X y \geq 0,\ \forall y \in \mathbb{R}^n_+\}$ and its dual the completely positive cone $C^*_n = \{X = \sum_{i=1}^{k} y_i y_i^\intercal \mid y_i \in \mathbb{R}^n_+\}$ (throughout, the elements of $\mathbb{R}^n$ are column vectors). From the point of view of quantum information, the completely positive cone $C^*_n$ can be viewed as the space of $n \times n$ density matrices that can be expressed as convex combinations of pure states with real nonnegative coefficients (in a specific basis). It is well known that completely positive (copositive) programming, that is, maximising a linear function over the intersection of an affine subspace and the completely positive (copositive) cone, is NP-hard [13]. In other words, the very complicated geometry of the completely positive and copositive cones allows one to efficiently encode NP-complete decision problems.
We show here that all polynomially definable 0/1 polytopes have polynomial size completely positive extension complexity. Such a polytope is a polytope whose vertices form a subset of $\{0, 1\}^d$ that can be recognized by a circuit of size poly($d$). To prove this result we proceed in two steps. First we show, extending the work of Maximenko [52], that all polynomially definable 0/1 polytopes are projections of faces of the correlation polytope $\mathrm{COR}(n) = \mathrm{conv}\{aa^\intercal \mid a \in \{0, 1\}^n\}$, with $n$ = poly($d$). Second, exploiting results of Burer [20], we show that the correlation polytope has a polynomial size completely positive extension, that is, COR($n$) is given by the projection of the intersection of $C^*_{\mathrm{poly}(n)}$ with an affine subspace. This result is interesting by itself because small completely positive (or copositive) programming formulations have been found for a large number of combinatorial optimisation problems, see, e.g. [56,13,48,20]. We show that virtually all combinatorial optimisation problems share this property. In fact, Burer [20] asks: "Other than the handful of problems listed above, what types of problems can be represented as COPs [copositive programs] or as CPPs [completely positive programs]?". Our result can be viewed as an answer to this question: all combinatorial optimisation problems (integer linear programs with 0/1 variables) such that testing the feasibility of a solution can be done efficiently, can be efficiently represented as completely positive programs.
Using the correspondence outlined above, this result implies an exponential gap between the communication complexity of GPT based on the completely positive cone and classical communication complexity. In view of the very plausible conjecture mentioned above, one also anticipates an exponential gap with quantum communication.
We now return to the problem of introducing sets of axioms that reduce to quantum theory. In general there will be an interplay between structural axioms that define the mathematical framework (e.g. that states are vectors in R n ), physical axioms (e.g. that convex combinations of states are states), and information theoretic axioms (e.g. that certain communication complexity tasks are impossible, or that secure key distribution is possible). In particular the present work suggests that even at the very basic level of GPT, complexity arguments could be used to rule out certain theories. Indeed our results show that GPT based on the completely positive cone C * n provides exponential saving over classical (conjectured quantum) communication, and this could be used to rule out this theory. (There are probably many other reasons to rule out GPT based on C * n , but these would invoke other axioms, related for instance to transformations between states). The present work can thus be viewed as a step along the program of [35,15] who wish to use as much as possible information theory type axioms to restrict possible physical theories.
As a concluding remark, we note that there have already been a number of results in classical complexity that were obtained through quantum arguments, or inspired by quantum information, see e.g. [45,1,33] and the review [30]. Here the same kind of connection occurs, but with ideas and arguments inspired by the foundations of quantum mechanics, and in particular generalisations of quantum theory. The connection arose very naturally during the development of the present work: we first realised that the recently introduced cone factorisation of matrices could be given an operational interpretation within the context of GPT, and then explored to what extent the completely positive cone would provide an interesting example, which finally led to new results in combinatorial optimisation.
The reader mainly interested in the foundation of physics aspects should concentrate on Sections 2 to 5. On the other hand, the reader interested in the combinatorial optimisation aspects should go to the self contained Section 6.

General formulation
We work in $\mathbb{R}^n$ with the usual scalar product which we denote $\langle \cdot, \cdot \rangle$. Let $C \subset \mathbb{R}^n$ be a proper cone (i.e. $C$ is a closed, pointed and full-dimensional cone), and denote by $C^* = \{x \in \mathbb{R}^n \mid \forall y \in C : \langle x, y \rangle \geq 0\}$ its dual. Notice that $C^*$ is again a proper cone.
An element $\omega \in C$ is an unnormalised state. An element $e \in C^*$ is an unnormalised effect. The unit effect $u \in C^*$ is an interior point of the dual cone. Thus $\langle u, \omega \rangle > 0$ for every non-zero $\omega \in C$.
Normalised states are states $\omega \in C$ such that $\langle \omega, u \rangle = 1$. Any unnormalised state $\omega \neq 0$ can be rescaled $\omega \to \omega / \langle \omega, u \rangle$ to become a normalised state. Normalised states form a closed convex set. Convex combinations of states correspond to probabilistic mixture: for $0 \leq p \leq 1$, $p\omega_1 + (1 - p)\omega_2$ can be interpreted as the state obtained by preparing $\omega_1$ with probability $p$ and preparing $\omega_2$ with probability $1 - p$.
A measurement $M$ is a finite set of effects that sum to the unit effect: $M = \{e_i \in C^* \mid \sum_i e_i = u\}$. Note that any effect $e \in C^*$, $e \neq 0$, can be rescaled so that $\{\lambda e, u - \lambda e\}$ is a measurement since $u$ is an interior point.
The above construction allows one to study one-way communication scenarios as follows. One party, Alice, prepares a (normalised) state $\omega$ and sends it to another party, Bob, who carries out a measurement $M$ on the state. The probability that Bob obtains outcome $i$ is $P(i|\omega) = \langle \omega, e_i \rangle$. These are indeed probabilities since from the definitions, $P(i|\omega) \geq 0$ and $\sum_i P(i|\omega) = 1$.
A Generalised Probabilistic Theory, in the simple form used here, is therefore defined using the above construction by a proper cone C ⊂ R n and a unit u ∈ C * . We denote it GPT(C,u).

Classical theory.
Classical theory corresponds to the case where the cone $C = \{x \in \mathbb{R}^n \mid \forall i : x_i \geq 0\}$ is the nonnegative orthant and the unit effect is $u = (1, 1, \ldots, 1)^\intercal$. Then the normalised states belong to the simplex $\Delta_n = \{x \in \mathbb{R}^n \mid \forall i : x_i \geq 0, \sum_i x_i = 1\}$. This simplex has $n$ extreme points $\omega_i = (0, 0, \ldots, 1, 0, \ldots, 0)^\intercal$. Any normalised state $\omega$ has a unique decomposition as a convex combination of the extreme points $\omega = \sum_i p_i \omega_i$ with $p_i \geq 0$ for all $i$ and $\sum_i p_i = 1$. One can therefore view $\omega$ as a probability distribution over the extreme points.
The dual cone C * is also the nonnegative orthant. There is a canonical measurement with effects e i = (0, 0, . . . , 1, 0, . . . , 0) ⊺ , i = 1, . . . , n. The probability that one gets result i in state ω is p i , i.e. the probability that the system was in extreme point ω i .
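As a small numerical illustration of the classical case, the sketch below takes a normalised state in the simplex and evaluates the outcome probabilities of the canonical measurement as scalar products between state and effects. The helper `inner` and the particular state are assumptions chosen for the example.

```python
from fractions import Fraction as F

def inner(x, y):
    """Usual scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

n = 3
u = [F(1)] * n                                   # unit effect (1, ..., 1)
omega = [F(1, 2), F(1, 3), F(1, 6)]              # a normalised classical state
# Canonical measurement: the i-th effect is the i-th unit vector.
effects = [[F(int(i == j)) for j in range(n)] for i in range(n)]

normalisation = inner(u, omega)                  # should equal 1
effect_sum = [sum(col) for col in zip(*effects)] # should equal u
probs = [inner(e, omega) for e in effects]       # outcome probabilities
print(probs)
```

Here the probabilities simply reproduce the coordinates of the state, matching the reading of a classical state as a distribution over extreme points.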

Quantum theory
Quantum theory corresponds to the case where the cone $C_{PSD} = C^*_{PSD}$ is the set of positive semidefinite hermitian matrices. If the Hilbert space dimension is $d$ (over the complex numbers), then $C_{PSD}$ is a proper full-dimensional cone in the space of all $d \times d$ hermitian matrices (this time over the reals). Thus $n = d + 2\binom{d}{2} = d^2$ here. The scalar product can be written as $\langle \omega, e \rangle = \mathrm{Tr}(\omega e)$. The unit effect is the identity matrix $u = I$. A state $\omega \in C_{PSD}$ is normalised if $\mathrm{Tr}(\omega) = 1$. The extreme states are called pure states. They correspond to rank 1 positive semidefinite matrices with unit trace. A measurement $M = \{e_i \in C_{PSD} \mid \sum_i e_i = I\}$ is called a Positive Operator Valued Measure (POVM).

GPT based on the completely positive and copositive cones
Let $S_d$ denote the set of all $d \times d$ real symmetric matrices. The cone of completely positive matrices is the set of matrices $$C^*_d = \Big\{X \in S_d \;\Big|\; X = \sum_{i=1}^{k} y_i y_i^\intercal,\ y_i \in \mathbb{R}^d_+\Big\}.$$ It can be thought of as the restriction of quantum theory to states with real nonnegative coefficients (in a preferred basis), since any matrix in $C^*_d$ is the convex combination of pure states with nonnegative real coefficients.
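Its dual (relative to the scalar product $\langle X, Y \rangle = \mathrm{Tr}(XY)$) is the cone of copositive matrices $$C_d = \{X \in S_d \mid y^\intercal X y \geq 0,\ \forall y \in \mathbb{R}^d_+\}.$$ We can take the unit effect to be the unit matrix $u = I$. Alternatively we could take the unit effect to be $u = J$, the matrix with all entries 1. Let $f = (1, \ldots, 1)^\intercal$. Since $J = f f^\intercal$ it is also an (unnormalised) state. In this case the normalised pure states are of the form $\omega = y y^\intercal$ with $y \in \mathbb{R}^d_+$ and $\langle J, y y^\intercal \rangle = (f^\intercal y)^2 = 1$. Note that the state space is strictly smaller than the effect space.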
The dual of the copositive cone is the completely positive cone (as implied by the traditional notation used above). Hence we could also consider the dual theory where C d constitutes the state space and C * d the set of measurements. The unit effect could be taken to be I or J as above. In this case the effect space is smaller than the state space. (However it is the case where the set of states is C * d that will interest us here).
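The pairing between the two cones can be checked on a tiny example. The sketch below uses a $2 \times 2$ matrix that is copositive but not positive semidefinite, pairs it with a pure completely positive state, and verifies the normalisation for the choice $u = J$. The matrix `M`, the vector `y` and the helper names are illustrative assumptions; the grid test is only a sanity check, not a proof of copositivity.

```python
from fractions import Fraction as F
from itertools import product

def outer(y):
    """Rank-one matrix y y^T."""
    return [[a * b for b in y] for a in y]

def tr_inner(X, Y):
    """Scalar product <X, Y> = Tr(XY)."""
    n = len(X)
    return sum(X[i][j] * Y[j][i] for i in range(n) for j in range(n))

def quad(M, y):
    """Quadratic form y^T M y, computed as <M, y y^T>."""
    return tr_inner(M, outer(y))

M = [[0, 1], [1, 0]]          # y^T M y = 2*y1*y2, nonnegative when y >= 0
grid = [F(k, 4) for k in range(5)]
copositive_on_grid = all(quad(M, y) >= 0 for y in product(grid, repeat=2))
not_psd = quad(M, (1, -1)) < 0           # a sign-indefinite direction exists

J = [[1, 1], [1, 1]]                     # unit effect u = J
y = (F(1, 3), F(2, 3))                   # nonnegative, entries sum to 1
state = outer(y)                         # pure state in the completely positive cone
norm = tr_inner(J, state)                # equals (f^T y)^2
pairing = tr_inner(M, state)             # nonnegative pairing of effect and state
print(norm, pairing)
```

The example shows why the effect cone is larger than the state cone here: `M` is a valid unnormalised effect although it is not positive semidefinite.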

Refining measurements
A measurement $M' = \{e'_{ij}\}$ is a refinement of a measurement $M = \{e_i\}$ if $\sum_j e'_{ij} = e_i$ for all $i$. That is, $M'$ is a refinement of $M$ if carrying out measurement $M$ is equivalent to carrying out measurement $M'$ and then forgetting part of the information contained in the outcome (in the notation above, forgetting the label $j$ of the outcome $i, j$, and keeping only label $i$). Carrying out the refined measurement $M'$ will in general provide more information than carrying out the original measurement $M$.
An extremal vector $v$ in a proper cone $C$ is defined by the property that for any $w \in C$, $w \leq_C v$ implies $w = \lambda v$ for some $\lambda \geq 0$, where we use the notion of generalized inequality: $x \leq_C y$ if and only if $y - x \in C$.
Lemma 1. Any measurement $M$ can be refined to a measurement $M'$ whose effects are all extremal vectors of $C^*$.
This follows immediately from the Krein-Milman theorem [11] adapted to cones, which states that any vector $v$ in a cone $C \subset \mathbb{R}^n$ can be written as a nonnegative combination of extremal vectors. Moreover, by Carathéodory's theorem [11], $v$ can in fact be written as a nonnegative combination of at most $n$ extremal vectors.

Convex combinations of measurements
A measurement $M = \{e_i\}$ is a convex combination of measurements $M_1 = \{f_i\}$ and $M_2 = \{g_i\}$ if there exists $0 \leq p \leq 1$ such that $e_i = p f_i + (1 - p) g_i$ for all $i$. Note that some of the $e_i$, $f_i$, $g_i$ may be equal to zero. Operationally this means that the measurement $M$ can be realized by carrying out measurement $M_1$ with probability $p$ and measurement $M_2$ with probability $1 - p$, and then keeping only the label of the outcome, but forgetting which of the two measurements was in fact realized. Carrying out measurement $M_1$ with probability $p$ and measurement $M_2$ with probability $1 - p$ will in general provide more information than carrying out the original measurement $M$.

Extremal measurements
Theorem 1. Any measurement $M$ can be refined to a measurement $M'$ such that $M'$ can be written as the convex combination of measurements each of which has at most $n$ nonzero effects which are all extremal vectors of $C^*$.
The analogue of this result for quantum measurements is well known. In a Hilbert space of dimension $d$, the extremal POVMs with more than $d^2$ elements have only $d^2$ nonzero elements, and these elements are rank one projectors [26]. Note that if one fixes the number of outcomes of the POVM (rather than first carrying out refinement as above), then the structure of extremal POVMs is more complicated, see [26].
Proof of Theorem 1. Given any measurement $M$, we use Lemma 1 to construct a refined measurement, all of whose effects are extremal. Hence from now on, we suppose that the measurement $M$ is composed of $m$ extremal effects. If the number of effects $m \leq n$, then the assertion is trivial. So assume $m > n$, and that all $e_i \neq 0$.
By Carathéodory's Theorem [11], there are coefficients $\lambda_i \geq 0$ such that $\sum_i \lambda_i e_i = u$ and at most $n$ of the $\lambda_i$'s are nonzero. We denote $\lambda_{\max} = \max_i \lambda_i$. First we observe that $\lambda_{\max} > 1$. To prove this, suppose that $\lambda_{\max} \leq 1$. Then using the notion of generalized inequality and the fact that $M$ has $m > n$ nonzero effects we have $u = \sum_i \lambda_i e_i \leq_{C^*} \sum_{i : \lambda_i \neq 0} e_i <_{C^*} \sum_i e_i = u$, a contradiction. We define the measurement $M_1 = \{f_i \in C^* \mid i = 1, \ldots, m\}$ with effects $f_i = \lambda_i e_i$ if $\lambda_i \neq 0$, and $f_i = 0$ otherwise. Note that measurement $M_1$ has at most $n$ nonzero effects.
We define measurement $M_2 = \{g_i \in C^* \mid i = 1, \ldots, m\}$ with effects $g_i = \frac{\lambda_{\max} - \lambda_i}{\lambda_{\max} - 1}\, e_i$. We note that $M_2$ is a legitimate measurement, since $g_i \in C^*$ (use $\lambda_{\max} > 1$) and $\sum_i g_i = u$. We note that measurement $M_2$ has at most $m - 1$ nonzero effects. Indeed for all $i$ such that $\lambda_i = \lambda_{\max}$ (by definition of $\lambda_{\max}$ there is at least one such $i$), we have $g_i = 0$.
Finally, measurement $M$ is the convex combination of measurements $M_1$ and $M_2$ with weights $p = 1/\lambda_{\max}$ and $1 - p = (\lambda_{\max} - 1)/\lambda_{\max}$ respectively, since $\frac{1}{\lambda_{\max}} f_i + \frac{\lambda_{\max} - 1}{\lambda_{\max}}\, g_i = e_i$ for all $i$. (These weights are nonnegative since $\lambda_{\max} > 1$.) Note that from the above construction all the effects of $M_1$ and of $M_2$ are proportional to the effects in $M$, hence the effects of $M_1$ and $M_2$ are extremal vectors of $C^*$. We have thus written measurement $M$ as a convex combination of two measurements, one of which has at most $n$ nonzero outcomes, and the other at most $m - 1$ nonzero outcomes, both of which consist only of extremal vectors of $C^*$. By iterating the argument, $M$ can be written as a convex combination of measurements with at most $n$ nonzero effects, all of which are extremal vectors of $C^*$.
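One step of the decomposition used in the proof can be traced on a classical toy example in $\mathbb{R}^2$ with $m = 3 > n = 2$ effects. The sketch assumes the choice $g_i = \frac{\lambda_{\max} - \lambda_i}{\lambda_{\max} - 1} e_i$ and weight $p = 1/\lambda_{\max}$, which is the choice forced by the requirements stated in the proof (effects of $M_2$ proportional to those of $M$, vanishing where $\lambda_i = \lambda_{\max}$, and summing to $u$). The effects and coefficients below are assumptions made for illustration.

```python
from fractions import Fraction as F

def scale(c, v):
    return tuple(c * a for a in v)

def vsum(vectors):
    return tuple(sum(col) for col in zip(*vectors))

u = (F(1), F(1))                                  # unit effect, nonnegative orthant
M = [(F(1, 2), F(0)), (F(1, 2), F(0)), (F(0), F(1))]   # m = 3 extremal effects
lam = [F(2), F(0), F(1)]                          # at most n = 2 nonzero, sum lam_i e_i = u

lam_max = max(lam)
M1 = [scale(l, e) for l, e in zip(lam, M)]        # f_i = lam_i e_i
M2 = [scale((lam_max - l) / (lam_max - 1), e) for l, e in zip(lam, M)]  # g_i
p = 1 / lam_max                                   # weight of M1

# Recombine: p f_i + (1 - p) g_i should give back e_i.
mix = [tuple(p * f + (1 - p) * g for f, g in zip(fi, gi))
       for fi, gi in zip(M1, M2)]

nonzero_M1 = sum(1 for e in M1 if any(e))
nonzero_M2 = sum(1 for e in M2 if any(e))
print(lam_max, p, nonzero_M1, nonzero_M2)
```

In this instance both $M_1$ and $M_2$ already have at most $n$ nonzero effects, so a single step suffices; in general the step is repeated on $M_2$.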

Holevo bound for GPT
How much classical information can be stored or transmitted using states ω ∈ C of a generalized probabilistic theory? The corresponding result in quantum information states that at most one classical bit can be stored in a quantum bit. This is known as Holevo's theorem [40] and it underlies much of quantum information theory. For instance, for communication complexity problems it is the benchmark that allows meaningful comparison between sending classical and quantum information [53].
To answer this question we consider the following scenario, formulated using a GPT(C, u): Alice receives some classical input x, distributed according to some probability distribution p(x). She encodes it into a state ω(x) ∈ C which she sends to Bob. Bob carries out a measurement M , obtaining outcome y. The classical capacity of the channel is the mutual entropy I(X; Y ) between x and y, maximized over the probability distribution p(x), the coding ω(x) and the measurement M .
The Holevo capacity of a noiseless channel defined by GPT($C$, $u$) is $\max_{p(x),\, \omega(x),\, M} I(X; Y)$. Operationally, it corresponds to choosing an encoding and measurement that maximises the classical capacity of the channel.
Assuming that Alice and Bob share a noiseless channel for states in some GPT($C$, $u$), we can frame the problem as follows. Alice receives input $x \in \{0, 1\}^k$ and Bob receives input $y \in \{0, 1\}^\ell$. Upon receiving her input, Alice prepares a normalised state $\omega(x) \in C$ that she sends to Bob. Bob carries out a measurement $M(y) = \{e_i(y) \in C^*\}$ on the state prepared by Alice. He obtains result $i$ with probability $P(i) = \langle \omega(x), e_i(y) \rangle$. Bob then produces an output $r(i, y)$ that depends on the result $i$ of his measurement and on $y$. We require that the result output by Bob is always nonnegative: $r(i, y) \geq 0$. We further require that the expectation of the outputs satisfies $E(r|xy) = \sum_i r(i, y) \langle \omega(x), e_i(y) \rangle = C_{xy}$, where $C_{xy} \geq 0$ is the communication matrix.
Let us relax the constraints on Alice slightly, and allow Alice to send to Bob a subnormalised state, $\langle u, \omega(x) \rangle \leq 1$. Physically, this can be done by providing Alice with an extra classical communication channel of capacity 1 bit. Alice then sends the state $\omega(x)$ to Bob, and uses the extra bit to tell Bob whether he must output 0, or carry out the procedure outlined above.
This result generalises the link between nonnegative factorisation of a matrix and classical communication complexity problem [60,32], and between PSD factorisation and quantum communication complexity [33], to arbitrary cones and GPT communication complexity. Note that this result is independent of the choice of unit u ∈ C * , as long as u is an interior point of C * .
Proof of Theorem 3. Let us denote by $b \in \{0, 1\}$ the extra bit sent by Alice, and by $p(b|x)$ the probability that the bit is 0 or 1. The average output given inputs $x, y$ is then $E(r|xy) = p(b = 1|x) \sum_i r(i, y) \langle \omega(x), e_i(y) \rangle$. Let us define $\tilde{\omega}(x) = p(b = 1|x)\, \omega(x) \in C$ and $r(y) = \sum_i r(i, y) e_i(y) \in C^*$. Then producing the communication matrix $C_{xy}$ implies that we can write $C_{xy} = \langle \tilde{\omega}(x), r(y) \rangle$.
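The passage from a protocol to a cone factorisation can be followed numerically in the classical case (nonnegative orthant in $\mathbb{R}^2$). The sketch below computes the expected output of a toy protocol directly, then again through the subnormalised states $p(b=1|x)\,\omega(x)$ and the aggregated effects $\sum_i r(i,y) e_i(y)$, and compares the two. All states, measurements, outputs and the names `omega_t` and `r_vec` are assumptions made for this example.

```python
from fractions import Fraction as F

def inner(a, b):
    return sum(s * t for s, t in zip(a, b))

# Alice: normalised states and probability of sending b = 1, per input x.
omega = {0: (F(1), F(0)), 1: (F(1, 4), F(3, 4))}
p_b1 = {0: F(1), 1: F(1, 2)}
# Bob: a measurement (effects summing to u = (1, 1)) and outputs, per input y.
effects = {0: [(F(1), F(0)), (F(0), F(1))],
           1: [(F(1, 2), F(1, 2)), (F(1, 2), F(1, 2))]}
outputs = {0: [F(2), F(0)], 1: [F(1), F(3)]}

# Expected output computed outcome by outcome.
C_protocol = {
    (x, y): p_b1[x] * sum(r * inner(omega[x], e)
                          for r, e in zip(outputs[y], effects[y]))
    for x in omega for y in effects
}

# Cone factorisation: subnormalised states in C, aggregated effects in C*.
omega_t = {x: tuple(p_b1[x] * w for w in omega[x]) for x in omega}
r_vec = {y: tuple(sum(r * e[k] for r, e in zip(outputs[y], effects[y]))
                  for k in range(2))
         for y in effects}
C_factor = {(x, y): inner(omega_t[x], r_vec[y]) for x in omega for y in effects}
print(C_factor)
```

Both computations give the same nonnegative matrix, which is the content of the identity at the end of the proof.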
In the next section we discuss one-way communication complexity when the states belong to the completely positive cone, and the effects to the copositive cone. We exhibit a communication matrix, specifically, the slack matrix of the correlation polytope COR(n), that can be realised by sending states in C * d with log d = O(log n). On the other hand this problem requires Ω(n) classical bits of communication. It is highly plausible that it also cannot be achieved using a logarithmic number of quantum bits of communication (the contrary would come close to proving that P = NP, as we discuss below).

Conic extensions of polytopes
A polytope $P \subseteq \mathbb{R}^d$ can be described either as the convex hull of a finite set of points $V = \{v_1, \ldots, v_m\} \subseteq \mathbb{R}^d$ or as the set of solutions of a finite system of linear inequalities $Ax \geq b$, where $A \in \mathbb{R}^{n \times d}$, $b \in \mathbb{R}^n$, provided that this set of solutions is bounded (see [61] for a thorough treatment of the subject). Thus $P$ has the following inner and outer descriptions: $$P = \mathrm{conv}(V) = \{x \in \mathbb{R}^d \mid Ax \geq b\}.$$ The slack matrix $S \in \mathbb{R}^{m \times n}$ of the polytope with respect to the above descriptions is the matrix obtained by computing by how much each vertex satisfies each inequality, i.e., it is given by $S_{ij} = A_j v_i - b_j$, where $A_j$ is the $j$th row of $A$. By definition all elements of a slack matrix are nonnegative. (We remark that, compared to some previous work, in particular [33], we work here with a transposed slack matrix; this turns out to be more natural in the present context.)
Let $C \subseteq \mathbb{R}^k$ be an arbitrary closed convex cone and $C^*$ its dual cone. A conic extension of the polytope $P$ is a set $Q = \{(x, y) \in \mathbb{R}^{d+k} \mid Ex + Fy = g,\ y \in C\}$ where $E \in \mathbb{R}^{p \times d}$, $F \in \mathbb{R}^{p \times k}$, and $g \in \mathbb{R}^p$ such that the projection of $Q$ into the $x$-space equals $P$: $$P = \{x \in \mathbb{R}^d \mid \exists y \in \mathbb{R}^k : (x, y) \in Q\}.$$ The extension is called proper if the affine subspace defined by $Ex + Fy = g$ contains some interior point of $C$.
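The definition of the slack matrix, $S_{ij} = A_j v_i - b_j$ with rows indexed by vertices and columns by inequalities, can be illustrated on the unit square. The function name `slack_matrix` and the chosen inequality description are assumptions for the example.

```python
def slack_matrix(vertices, A, b):
    """S[i][j] = A_j . v_i - b_j for the description {x : A x >= b}."""
    return [[sum(a_k * v_k for a_k, v_k in zip(row, v)) - b_j
             for row, b_j in zip(A, b)]
            for v in vertices]

# Unit square: vertices (inner description) and inequalities (outer description).
V = [(0, 0), (1, 0), (1, 1), (0, 1)]
A = [(1, 0), (0, 1), (-1, 0), (0, -1)]   # x1 >= 0, x2 >= 0, -x1 >= -1, -x2 >= -1
b = [0, 0, -1, -1]

S = slack_matrix(V, A, b)
for row in S:
    print(row)
```

Every entry is nonnegative, and an entry is zero exactly when the vertex lies on the corresponding facet.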
Given cone C, the existence of a conic extension of the polytope P is essentially equivalent to the existence of a cone factorisation of the slack matrix of P [36], in the following sense.
Theorem 4 (Gouveia, Parrilo and Thomas [36]). Let $P$ be a polytope that is neither empty nor a point, and $S$ be any slack matrix of $P$.
• If $P$ admits a proper conic extension with respect to cone $C$, then its slack matrix $S$ admits a cone factorisation on cones $C$ and $C^*$.
• Conversely, if $S$ admits a cone factorisation on $C$ and $C^*$ then $P$ admits a (not necessarily proper) conic extension with respect to $C$.
Gouveia et al. [36] show that the technical condition of being proper can be removed if cone $C$ is nice, that is, $C^* + F^\perp$ is closed for all faces $F$ of $C$. For instance, it is known that the nonnegative orthant and the PSD cone are both nice.
From Theorem 3, it follows that the existence of a conic extension with respect to $C$ is also (essentially) equivalent to the existence of a one-way communication protocol using states belonging to cone $C$ and effects belonging to the dual cone $C^*$, that produces as average output the slack matrix.

Correlation polytope
The correlation polytope COR($n$) is defined as the convex hull of all the rank-1 binary symmetric matrices of size $n \times n$. In other words, $$\mathrm{COR}(n) = \mathrm{conv}\{aa^\intercal \mid a \in \{0, 1\}^n\}.$$ It is shown in [33] that any linear extension of the correlation polytope has size $2^{\Omega(n)}$, that is, there exists a constant $\alpha > 0$ such that any linear extension of the correlation polytope has size at least $2^{\alpha n}$. By linear extension we mean that the cone $C$ is taken to be the nonnegative orthant $\mathbb{R}^k_+$. This implies that any one-way classical communication protocol with nonnegative outputs that produces the slack matrix of COR($n$) in expectation requires at least $\alpha n$ classical bits, i.e., the dimension of the space in which the classical information is coded is at least $2^{\alpha n}$.
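For small $n$ the vertices of the correlation polytope can be listed explicitly, and a linear function can be maximised over COR($n$) by brute force over those vertices. The sketch below does this for $n = 2$; the function name `cor_vertices` and the objective matrix `Q` are assumptions for illustration, and the enumeration is exponential in $n$.

```python
from itertools import product

def cor_vertices(n):
    """All matrices a a^T with a in {0,1}^n, as nested tuples."""
    return [tuple(tuple(ai * aj for aj in a) for ai in a)
            for a in product((0, 1), repeat=n)]

def frobenius(Q, Z):
    """Linear objective <Q, Z> = sum_ij Q_ij Z_ij."""
    return sum(q * z for q_row, z_row in zip(Q, Z) for q, z in zip(q_row, z_row))

n = 2
verts = cor_vertices(n)
Q = [[1, -2], [-2, 1]]
# A linear function attains its maximum over a polytope at a vertex.
best = max(frobenius(Q, Z) for Z in verts)
print(len(verts), best)
```

Each vertex is a symmetric 0/1 matrix whose diagonal is the vector $a$ itself, since $a_i^2 = a_i$ for binary entries.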
It is reasonable to conjecture that there does not exist a semidefinite extension of the correlation polytope of size poly($n$). By semidefinite extension of size $k$ we mean that the cone $C$ is taken to be the cone of $k \times k$ (real) PSD matrices. Indeed if such a polynomial size semidefinite extension exists, and if an approximation to the coefficients defining this PSD extension could be computed in polynomial time, then there would exist a polynomial time algorithm for maximisation of a linear function over the correlation polytope (since semidefinite programming is in P). But maximising a linear function over COR($n$) is NP-complete. Hence this would imply that P = NP. Just the existence of a poly($n$)-size semidefinite extension of COR($n$) would imply NP ⊆ P/poly, as follows from the results of Briët, Dadush and Pokutta [19].
If this conjecture is true, and COR(n) does not have a poly(n)-size PSD extension, then there are no quantum one-way communication protocols using quantum states belonging to a Hilbert space of size poly(n) with nonnegative outputs that produce the slack matrix of COR(n) in expectation. This follows from the relation between PSD extensions, PSD factorisation, and quantum communication given in [33]. By contrast, if one places oneself in the context of a GPT wherein states belong to the completely positive cone and effects to the copositive cone, then there exists a one-way communication protocol with nonnegative outputs that produces in expectation the slack matrix of the correlation polytope, and in which the information is coded in a space of dimension poly(n). Hence this provides an exponential saving with respect to classical communication, and a conjectured super-polynomial saving with respect to quantum communication.
Similarly to semidefinite extensions, we say that the size of a completely positive extension is k if it is relative to the cone of all k × k completely positive matrices.
Proof. Consider an optimization problem of the form
Suppose that for every $x$ that satisfies $x_j \geq 0$ for all $j = 1, \dots, n$ and $a_i^\intercal x = b_i$ for all $i = 1, \dots, m$ we have that $x_j \leq 1$ for all $j \in B$. Burer [20] has shown that, under this assumption, the above problem can be rewritten as the following conic program:
where $\bullet$ denotes the Frobenius product (so $Q \bullet X = \mathrm{Tr}(QX) = \langle Q, X \rangle$) and $C^*_{1+n}$ denotes the completely positive cone generated by the matrices $zz^\intercal$, where $z \in \mathbb{R}^{1+n}_{\geq 0}$.
Now, optimizing over the correlation polytope COR(n) can be modeled as the following minimisation problem:
where $e_i$ denotes the $i$th unit vector in $\mathbb{R}^{2n}$. It is easy to see that for every $x$ that satisfies $(e_i + e_{i+n})^\intercal x = 1$ for all $i = 1, \dots, n$ and $x_j \geq 0$ for all $j = 1, \dots, 2n$, we have that $0 \leq x_j \leq 1$ for all $j = 1, \dots, n$, since these conditions are just a rephrasing of $x_i + x_{i+n} = 1$, $x_i \geq 0$, $x_{i+n} \geq 0$ for $i = 1, \dots, n$. Thus the above formulation satisfies the condition for Burer's result. Therefore, we obtain an equivalent optimization program over the completely positive cone:
Finally, this conic program can be rewritten using a new matrix variable $Y = (y_{ij})_{i,j=0,\dots,2n} = \begin{pmatrix} 1 & x^\intercal \\ x & X \end{pmatrix}$ and symmetrizing as program (P), a minimisation over $Y \in C^*_{1+n+n}$.
Since this holds for every possible choice of $Q = (q_{ij})$, we see that $\mathrm{COR}(n) = \{ Z = (z_{ij})_{i,j=1,\dots,n} \mid \exists Y = (y_{ij})_{i,j=0,\dots,2n} : z_{ij} - y_{ij} = 0 \ \forall i, j = 1, \dots, n, \text{ (7)-(10)} \}$. Therefore, the correlation polytope COR(n) has a poly(n)-size completely positive extension.
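The lifting used in this proof can be checked numerically on small instances. The following Python sketch is only an illustration under our own assumptions: for a 0/1 vector $b$ it takes $x = (b, 1 - b)$, which is what the constraints $x_i + x_{i+n} = 1$ suggest, and forms the rank-one matrix $Y = vv^\intercal$ with $v = (1, x)$. The helper name `lifted_matrix` is hypothetical. Since $v$ is entrywise nonnegative, $Y$ is completely positive by construction.

```python
import itertools
import numpy as np

def lifted_matrix(b):
    """Return Y = v v^T with v = (1, b, 1 - b) for a 0/1 vector b.

    Assumption (ours, for illustration): the second half of x is the
    complement of the first half, so that x_i + x_{i+n} = 1.
    """
    b = np.asarray(b, dtype=float)
    v = np.concatenate(([1.0], b, 1.0 - b))
    return np.outer(v, v)

def check_lifting(n):
    """Check the linear conditions quoted in the text for every b in {0,1}^n."""
    for b in itertools.product([0, 1], repeat=n):
        Y = lifted_matrix(b)
        x = Y[0, 1:]                      # the vector x sits in row 0 of Y
        # (e_i + e_{i+n})^T x = 1 for i = 1, ..., n
        assert np.allclose(x[:n] + x[n:], 1.0)
        # hence 0 <= x_j <= 1 for all j
        assert np.all((x >= 0) & (x <= 1))
        # the leading n x n block of X = Y[1:, 1:] is the vertex b b^T of COR(n)
        assert np.allclose(Y[1:n + 1, 1:n + 1], np.outer(b, b))
    return True

check_lifting(3)
```

The check only confirms that each vertex $bb^\intercal$ of COR(n) is the projection of such a rank-one matrix; it does not verify the converse inclusion, which is the content of Burer's result.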
We point out that the completely positive extension of COR(n) constructed in the proof of the previous theorem is known not to be proper, that is, there is no $Y$ in the interior of $C^*_{1+2n}$ satisfying (7)-(10). This was observed by Burer [20]. Thus Theorem 4 does not imply the existence of a cone factorisation for the slack matrix of COR(n) over the completely positive cone $C^*_{1+2n}$. We now proceed to construct such a factorisation following another route.
First, we write down the dual of program (P) above:
Call $M = M(\alpha, \beta, \gamma, \delta)$ the left-hand side of the conic constraint above. We claim that there is a choice of the variables of (D) such that $M$ is in the interior of $C_{1+2n}$. In order to prove this, let $\beta_i = \delta_i = 0$ and $\gamma_i = \alpha$ for all $i = 1, \dots, n$. Moreover, choose $\alpha < 0$ such that $\lambda - \frac{\alpha}{2} \geq 0$ for all eigenvalues $\lambda$ of $Q$.
It is known that any symmetric matrix $M \in \mathbb{R}^{(1+2n) \times (1+2n)}$ is in the interior of the copositive cone $C_{1+2n}$ if and only if $\xi^\intercal M \xi > 0$ for every $\xi \in \mathbb{R}^{1+2n}_{\geq 0}$ that is not the all-zero vector [12,29]. With our choice of $\alpha$, $\beta$, $\gamma$ and $\delta$, we have:
By choosing $\alpha < 0$ with a large enough absolute value, we see that this quantity is always strictly positive when $(\xi, x) \in \mathbb{R}^{1+2n}_{\geq 0}$ is different from the all-zero vector. So $M$ is in the interior of $C_{1+2n}$. This implies that Slater's condition is satisfied for (D), and therefore strong duality holds (see, e.g., [14]). Since (P) is a reformulation of an optimisation problem over the correlation polytope, it is bounded and has a finite optimum value. Then, by strong duality, (D) has an optimum value equal to the optimum value of (P).
Later, we will need the fact that the optimum value of (D) is attained. Unfortunately, this is in general not implied by Slater's condition for (D). (It would be implied by Slater's condition for (P), but as noted before (P) is known to lack this property [20].) We prove it by using the specific structure of (D).
Lemma 2. The optimum value of (D) is attained.
Proof. Let $\kappa \leq 0$ denote the optimum value of (D). Fix any bound $B < \kappa$ and consider the set $F_B$ of feasible solutions of (D) of value at least $B$. It suffices to prove that $F_B$ is bounded. Indeed, $F_B$ is closed and, by strong duality, we know that there exists a sequence of points of $F_B$ whose objective value converges to $\kappa$. If $F_B$ is bounded, then it is compact and the sequence admits a subsequence that converges in $F_B$. The limit of the subsequence is an optimal solution of (D).
Thus we find the following bounds:
It is routine to obtain upper and lower bounds on $\delta_i$ for each $i = 1, \dots, n$ from the bounds on the other variables, using (13) and (19). We obtain
Thus $F_B$ is bounded, because it is contained in a box.
We will now use this to give a completely positive factorisation for the slack matrix of the correlation polytope COR(n). Let $S(n)$ denote the slack matrix of COR(n), with respect to its vertices and facet-defining inequalities. For each given vertex $aa^\intercal$ of COR(n), Bob creates a completely positive matrix $Y$ which clearly lies in the feasible region of (P). For a given facet-defining inequality $\sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij} y_{ij} \geq \kappa$, Alice uses the coefficients $\alpha$, $\beta$, $\gamma$ and $\delta$ from an optimal solution of (D) (which exists by Lemma 2) to define a copositive matrix $M = M(\alpha, \beta, \gamma, \delta)$ for the given facet. We then have
where the middle equality follows from the fact that $Y$ is feasible for (P) and the last equality follows from the optimality of the dual solution given by $\alpha$, $\beta$, $\gamma$ and $\delta$. Thus $Y \bullet M$ is the entry of the slack matrix $S = S(n)$ of COR(n) corresponding to the given facet $\sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij} y_{ij} \geq \kappa$ and given vertex $aa^\intercal$. The different matrices $Y$ (one for each vertex) and $M$ (one for each facet) give us a factorisation of the slack matrix $S(n)$ of the correlation polytope over the completely positive cone $C^*_{1+2n}$.
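To make the target of this factorisation concrete, the following Python sketch computes, by brute force, the row of the slack matrix associated with a valid inequality $\sum_{i,j} q_{ij} y_{ij} \geq \kappa$, where $\kappa$ is taken to be the minimum of $b^\intercal Q b$ over $b \in \{0,1\}^n$. It does not construct the matrices $Y$ and $M$ of the protocol; it only evaluates the numbers that $Y \bullet M$ must reproduce. The function names are hypothetical, and the code does not test whether the inequality is facet-defining.

```python
import itertools
import numpy as np

def vertices(n):
    """All 0/1 vectors b of length n; each gives the vertex b b^T of COR(n)."""
    return [np.array(b) for b in itertools.product([0, 1], repeat=n)]

def slack_row(Q):
    """Slack of every vertex b b^T in the inequality sum_ij q_ij y_ij >= kappa.

    kappa is the tightest right-hand side, i.e. the minimum of b^T Q b.
    The running time is exponential in n, so this is for tiny examples only.
    """
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    values = np.array([b @ Q @ b for b in vertices(n)])
    kappa = values.min()
    return kappa, values - kappa

# Example: Q encodes y_11 - y_12 >= 0, a facet of COR(2).
kappa, slack = slack_row([[1.0, -0.5], [-0.5, 0.0]])
print(kappa, slack)   # vertices are ordered b = 00, 01, 10, 11
```

Every entry is nonnegative by construction, and at least one entry in each row is zero because $\kappa$ is attained at some vertex.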

Polynomially definable 0/1-polytopes
A polytope $P \subseteq \mathbb{R}^d$ is called a 0/1-polytope if all its vertices are in $\{0, 1\}^d$. Now fix a polynomial $p = p(d)$. Informally, we say that a 0/1-polytope $P$ is "$p(d)$-definable" if there exists a predicate defining the vertex set of $P$ that is efficient in the sense that it can be implemented by a circuit of "size" at most $p(d)$. Formally, a 0/1-polytope $P \subseteq \mathbb{R}^d$ is said to be $p(d)$-definable if there exists a Boolean circuit $C(x, y)$ with $k + d$ inputs $x \in \{0, 1\}^k$, $y \in \{0, 1\}^d$, one output and at most $p(d)$ gates, such that $P = P(C, x) := \mathrm{conv}\{ y \in \{0, 1\}^d \mid C(x, y) = 1 \}$ for some $x \in \{0, 1\}^k$. The idea is that the circuit $C(x, y)$ checks whether $y \in \{0, 1\}^d$ is a vertex of $P$ or not, given advice $x \in \{0, 1\}^k$. The bits of $x$ give side information that is used to define the vertex set of $P$.
For instance, the stable set polytope of an $n$-vertex graph $G$ is $2n^2$-definable because there exists a circuit $C(x, y)$ with $k = \binom{n}{2}$ plus $d = n$ inputs, one output and $3\binom{n}{2} + 1 \leq 2n^2$ gates ($2\binom{n}{2} + 1$ AND gates and $\binom{n}{2}$ NOT gates) that checks whether $y$ is the incidence vector of a stable set $S$ in $G$, given the incidence vector $x$ of the edge set of the $n$-vertex graph $G$. Thus, when $x$ encodes $G$ and $y$ encodes $S$, the circuit $C$ simply checks whether for each edge $ij$ of $G$ we have $i \notin S$ or $j \notin S$. In some cases, the advice bits are not necessary and we can let $k = 0$. For instance, for the case of the correlation polytope, one can easily design a circuit $C(y)$ for $d = n^2$ with $O(d) = O(n^2)$ gates that tests whether $y \in \{0, 1\}^d$ is a binary correlation vector $y = bb^\intercal$ for some $b \in \{0, 1\}^n$.
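The stable set example can be sketched in a few lines of Python. This is only an illustration: the function name `stable_set_circuit` and the encoding of the advice $x$ as a dictionary indexed by pairs $i < j$ are our own choices. Each pair contributes two AND gates and one NOT gate, and a final AND of unbounded fan-in combines the results, which matches the gate count quoted above.

```python
from itertools import combinations, product

def stable_set_circuit(x, y):
    """Return 1 if y is the incidence vector of a stable set, else 0.

    x maps each pair (i, j) with i < j to 1 if ij is an edge and 0 otherwise
    (the advice bits); y is a list of n bits (the candidate vertex).
    """
    n = len(y)
    out = 1
    for i, j in combinations(range(n), 2):
        both_chosen = y[i] & y[j]           # AND gate
        violated = both_chosen & x[(i, j)]  # AND gate
        ok = 1 - violated                   # NOT gate
        out &= ok                           # input to the final AND gate
    return out

# Path graph 0 - 1 - 2: edges {0,1} and {1,2}.
path = {(0, 1): 1, (0, 2): 0, (1, 2): 1}
stable = [y for y in product([0, 1], repeat=3)
          if stable_set_circuit(path, list(y)) == 1]
print(stable)
```

For the path on three vertices the accepted vectors are exactly the incidence vectors of the five stable sets, and their convex hull is the stable set polytope of that graph.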
Often one does not consider single polytopes but families of 0/1-polytopes defined in various dimensions. An example of such a family is that of all correlation polytopes, that is, $\{\mathrm{COR}(n) \mid n \geq 1\}$. We say that such a family of 0/1-polytopes is polynomially definable if all its members that live in $\mathbb{R}^d$ are $p(d)$-definable, for the same polynomial $p(d)$. Examples of polynomially definable families of polytopes include the correlation polytopes (or cut polytopes, since they are linearly equivalent to the correlation polytopes), and many others such as travelling salesman polytopes, stable set polytopes, and so on. As a matter of fact, most commonly studied families of 0/1-polytopes are polynomially definable, since they have the property that recognising vertices can be done efficiently.
Here, we slightly strengthen an interesting observation of Maximenko [52] about correlation polytopes. He proves that every p(d)-definable 0/1-polytope can be obtained as the projection of some face of some correlation polytope COR(n) with n polynomial in d. In this sense, correlation polytopes are "universal objects". In fact, Maximenko restricts himself to the case where the predicate defining the 0/1-polytope has no side input x, that is, k = 0. (He also uses the cut polytopes, which are linearly equivalent to the correlation polytopes, and this makes the proof a bit more complicated). We give a short proof of the unabridged version of his result, see Theorem 6.
In order to make the proof of Theorem 6 as short and transparent as possible, we assume here that the circuit C is implemented using only NOR gates (with fan-in 2 and unbounded fan-out).
Recall that $\mathrm{NOR}(z_i, z_j) = 1$ if and only if $z_i = z_j = 0$. It is known that any circuit using standard gates (OR, AND, NOT, ...) can be transformed into such a circuit, with only a polynomial blow-up in size (constant blow-up for circuits with bounded fan-in). In particular, $\mathrm{NOT}(z_i) = \mathrm{NOR}(z_i, z_i)$ and we allow pairs of parallel arcs in the circuit to be able to repeat inputs.
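The standard gates can be expressed through NOR with a constant number of gates each, which is the source of the constant blow-up for bounded fan-in. The following Python sketch shows one common set of identities (the function names are ours) and checks them over all inputs.

```python
def NOR(a, b):
    """NOR(a, b) = 1 if and only if a = b = 0."""
    return 1 - (a | b)

def NOT(a):
    # One NOR gate with a repeated input.
    return NOR(a, a)

def OR(a, b):
    # Two NOR gates: negate the output of NOR.
    return NOT(NOR(a, b))

def AND(a, b):
    # Three NOR gates: De Morgan's law, a AND b = NOR(NOT a, NOT b).
    return NOR(NOT(a), NOT(b))

# Exhaustive check against Python's bitwise operators on 0/1 inputs.
for a in (0, 1):
    assert NOT(a) == 1 - a
    for b in (0, 1):
        assert OR(a, b) == (a | b)
        assert AND(a, b) == (a & b)
```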
Finally, remark that, although we do not put any explicit bound on the size of advice x, it is clear that the circuit C(x, y) can only read at most 2p(d) many bits of x since it has at most p(d) gates.
Theorem 6. Every $p(d)$-definable 0/1-polytope $P \subseteq \mathbb{R}^d$ is a projection of a face of the correlation polytope COR(n), with $n \leq d + p(d)$.
Proof. Suppose $P = P(C, x)$ for some circuit $C(x, y)$ with at most $p(d)$ gates, and some $x \in \{0, 1\}^k$. Assume that $P$ is not empty, otherwise the result is trivial. We create $n \leq d + p(d)$ new Boolean variables which we denote as $z_{i,i}$ for $i = 1, \dots, n$. The first $d$ variables $z_{i,i}$ represent the variables $y_i$, and the remaining variables, at most $p(d)$ of them, represent the output values of the NOR gates. With a slight abuse of notation, we also denote the variables of the space in which COR(n) is defined as $z_{i,j}$, where $i, j \in \{1, \dots, n\}$. Notice that none of the variables $z_{i,j}$ corresponds to the advice bits $x_\ell$, which are considered as constants on which the polytope $P$ depends.
For defining a face F of COR(n) that projects to P , we intersect COR(n) with p(d) + 1 valid hyperplanes, one for each NOR gate and one for the output of the circuit.
For each NOR gate $z_{k,k} = \mathrm{NOR}(z_{i,i}, z_{j,j})$ we add the equation
This equation defines a hyperplane that is valid for COR(n) because the left-hand side is always less than or equal to 1 (recall for instance that $z_{i,i} = 1$ and $z_{j,j} = 1$ implies $z_{i,j} = 1$), and equals 1 for a vertex of COR(n) if and only if $(z_{i,i}, z_{j,j}, z_{k,k}) \in \{(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)\}$, i.e. if and only if $z_{k,k} = \mathrm{NOR}(z_{i,i}, z_{j,j})$. For a NOR gate involving one constant (e.g., one of the advice bits), that is, a NOR gate of the form $z_{k,k} = \mathrm{NOR}(z_{i,i}, c)$, we have $z_{k,k} = \mathrm{NOT}(z_{i,i})$ if $c = 0$ and $z_{k,k} = 0$ if $c = 1$. Gates of this type can also be easily simulated by valid hyperplanes similar to (23). For instance we can use:
$z_{i,i} + z_{k,k} - 2z_{i,k} = 1$ (when $c = 0$); (24)
$z_{k,k} = 0$ (when $c = 1$). (25)
The output of a NOR gate involving two constants (e.g., two advice bits) is simply considered as a new constant whose value is a function of $x \in \{0, 1\}^k$.
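The claims about these gate equations are easy to test on vertices of COR(n), where $z_{i,j} = b_i b_j$ for some $b \in \{0, 1\}^n$. The Python sketch below checks equation (24) exhaustively. For the two-input gate it checks one linear form that has the properties stated in the text; the coefficients in `lhs_nor` are our own assumption, chosen for illustration, and are not claimed to coincide with the equation used in the proof.

```python
from itertools import product

def lhs_not(bi, bk):
    """Left-hand side of (24) at a vertex: z_ii + z_kk - 2 z_ik."""
    return bi + bk - 2 * bi * bk

def lhs_nor(bi, bj, bk):
    """Hypothetical left-hand side for a two-input NOR gate (assumed form):
    z_ii + z_jj + z_kk - z_ij - 2 z_ik - 2 z_jk."""
    return bi + bj + bk - bi * bj - 2 * bi * bk - 2 * bj * bk

# (24) is valid (<= 1) and tight exactly when b_k = NOT(b_i).
for bi, bk in product([0, 1], repeat=2):
    assert lhs_not(bi, bk) <= 1
    assert (lhs_not(bi, bk) == 1) == (bk == 1 - bi)

# The assumed form is valid and tight exactly when b_k = NOR(b_i, b_j).
tight = set()
for bi, bj, bk in product([0, 1], repeat=3):
    assert lhs_nor(bi, bj, bk) <= 1
    if lhs_nor(bi, bj, bk) == 1:
        tight.add((bi, bj, bk))
assert tight == {(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)}
```

Because both forms are linear in the entries $z_{i,j}$, checking them on vertices suffices for validity over the whole polytope.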
Finally, assuming that $z_{n,n}$ represents the output of the circuit, we add the equation $z_{n,n} = 1$ (26), which defines a valid hyperplane. The face $F$ is thus the intersection of COR(n) and the hyperplanes given by equations (23)-(26). From the above construction it follows that $y \in \{0, 1\}^d$ is a vertex of $P$ (that is, $C(x, y) = 1$) if and only if $y_i = z_{i,i}$ for $i = 1, \dots, d$ for some vertex $z$ of $F$. Therefore, the image of $F$ by the projection to the variables $z_{i,i}$ for $i \in \{1, \dots, d\}$ is exactly $P$.
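At the level of vertices, the construction can be simulated directly. The Python sketch below, with hypothetical names and a 0-based gate encoding of our own, enumerates the vectors $b \in \{0, 1\}^n$ whose gate variables are consistent with the NOR gates and whose last variable equals 1, and then projects onto the first $d$ coordinates. It works with the gate semantics rather than with the hyperplane equations themselves, and it ignores advice bits.

```python
from itertools import product

def face_projection(d, gates):
    """Project the vertices selected by the gate constraints onto y.

    gates is a list of pairs (i, j) of 0-based variable indices; gate number t
    defines variable d + t as NOR(variable i, variable j). The last variable
    is the circuit output and is required to equal 1.
    """
    n = d + len(gates)
    selected = set()
    for b in product([0, 1], repeat=n):
        consistent = all(b[d + t] == 1 - (b[i] | b[j])
                         for t, (i, j) in enumerate(gates))
        if consistent and b[n - 1] == 1:
            selected.add(b[:d])
    return selected

# Circuit for OR(y_1, y_2): g_3 = NOR(y_1, y_2), g_4 = NOR(g_3, g_3).
print(sorted(face_projection(2, [(0, 1), (2, 2)])))
```

For this circuit $d = 2$ and $n = 4$, and the projected points are exactly the 0/1 vectors accepted by the circuit, in line with the bound $n \leq d + p(d)$.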
By combining this with Theorem 5, we obtain the following result.
This proves that virtually all problems of interest in combinatorial optimisation can be expressed in an economical way as conic programs over the completely positive cone, which is striking given the large number of papers establishing this for individual problems, e.g. [56,13,48,20]. Here, we consider a combinatorial optimisation problem as the task of finding, in a (possibly implicitly described) collection $\mathcal{F}$ of subsets of a finite universe $U$ whose elements have weights $w(u) \in \mathbb{R}$, a set $F \in \mathcal{F}$ such that $w(F) = \sum_{u \in F} w(u)$ is maximum (minimum), which generically corresponds to maximizing (minimizing) a linear function over a 0/1 polytope. (More general would be to optimize over integer polyhedra, that is, polyhedra whose vertices have integer coordinates; this is part of discrete optimisation.) In terms of GPTs, the theorem supplies a large number of communication problems that are easy for GPTs based on completely positive / copositive cones, but hard in the classical case.