Causal structures from entropic information: Geometry and novel scenarios

The fields of quantum non-locality in physics, and causal discovery in machine learning, both face the problem of deciding whether observed data is compatible with a presumed causal relationship between the variables (for example, a local hidden variable model). Traditionally, Bell inequalities have been used to describe the restrictions imposed by causal structures on marginal distributions. However, some structures give rise to non-convex constraints on the accessible data, and it has recently been noted that linear inequalities on the observable entropies capture these situations more naturally. In this paper, we show the versatility of the entropic approach by greatly expanding the set of scenarios for which entropic constraints are known. For the first time, we treat Bell scenarios involving multiple parties and multiple observables per party. Going beyond the usual Bell setup, we exhibit inequalities for scenarios with extra conditional independence assumptions, as well as a limited amount of shared randomness between the parties. Many of our results are based on a geometric observation: Bell polytopes for two-outcome measurements can be naturally embedded into the convex cone of attainable marginal entropies. Thus, any entropic inequality can be translated into one valid for probabilities. In some situations the converse also holds, which provides us with a rich source of candidate entropic inequalities.


I. INTRODUCTION
The starting point of this paper is the question: what can be inferred about the causal relationship of a collection of random variables from a restricted set of observations? To phrase this problem more precisely, we need to introduce the notions of a marginal scenario and a causal structure, the two pieces of data which specify the instances we will be considering.
A marginal scenario describes which sets of random variables are jointly observable. Joint observations might be constrained for a variety of reasons. In quantum non-locality, these reasons are physical: random variables corresponding to non-commuting observables cannot always be jointly measured. In general, there might also be practical reasons; for instance, we have no access to the variable describing the genetic disposition of a patient both to become a smoker and to develop lung cancer (not least because we do not know whether such a genetic influence exists).
For the purpose of this paper, a causal structure is a list of linear constraints on the (conditional) mutual information between sets of random variables. For example, in the familiar Bell scenario, one commonly demands that the measurement choices of Alice and Bob, X and Y respectively, are independent of the hidden variable λ: I(X, Y : λ) = 0. Relaxing this constraint to demand only that the correlations be small, I(X, Y : λ) ≤ ε, would still be linear in the mutual information and thus an element of a causal structure according to our definition. By allowing for arbitrary linear constraints, we go slightly beyond the way the notion of "causal structure" is commonly formalized in the field of causal inference [1,2]. There, the combinatorial structure of a directed acyclic graph (DAG) is used to encode certain sets of conditional mutual informations that are assumed to vanish. Our approach subsumes and extends this.
With every given causal structure, we can associate the set of marginal distributions that are compatible with it. If we observe a data point that lies outside that region, we can exclude the presumed causal structure as a valid model for the observed data. This logical structure (characterizing the global properties compatible with local observations) is an instance of a marginal problem, which occurs frequently both in classical [3] and in quantum probability [4][5][6][7].
In quantum non-locality [8], the focus has traditionally been on settings for which the sets of compatible marginal distributions happen to be convex polytopes. In that case, checking whether an observed marginal distribution is compatible with the causal model reduces to verifying that none of the inequalities associated with the facets of the polytope is violated; these are the Bell inequalities [9,10]. However, the non-convex nature of mutual information means that the marginal regions arising for more general causal structures are, at best, non-trivial algebraic varieties. A few such examples have been treated in the quantum literature, including bilocality scenarios [11] or scenarios that allow for correlations between Alice's and Bob's measurement choices and the hidden variable [12,13] (cf. also Section V).
A priori, it is unclear whether these more complicated marginal regions allow for an explicit description that is tractable from an analytic and computational point of view. It is this problem that entropic methods greatly simplify. Indeed, as indicated above, (conditional) independence constraints are linear in terms of Shannon entropies. As a result, the marginal regions of general causal structures turn out to possess natural descriptions in terms of linear entropic inequalities.
The set of all joint entropies (without any causal constraints and prior to marginalization) has been analyzed extensively in information theory [14,15]. While it is known to be a convex cone, its precise form is still not explicitly understood. For practical purposes, it is often replaced by an outer approximation: the convex Shannon cone, which is defined by a finite number of explicit Shannon-type inequalities. All marginals of the Shannon cone can, in principle, be found computationally using linear programming [15]. What is more, causal structures merely amount to further linear constraints and can therefore be included in a natural way. An additional nice feature of entropic inequalities is that they are valid for variables with any number of outcomes. This stands in stark contrast to the usual approach, for which increasing the number of outcomes of the marginal scenario increases the dimension and complexity of the correlation polytope; in practice this means that new inequalities need to be derived and tailored to the specific number of outcomes under consideration. On the negative side, entropic inequalities provide, in principle, only a necessary condition for the solution of the marginal problem [16]. In spite of that, entropic inequalities are known to be fine enough to distinguish, for example, different causal structures [17,18] or to witness non-locality and contextuality [19][20][21][22][23][24][25][26].
In spite of its potential applications, the entropic approach to the marginal problem has been little explored. In particular, no entropic inequalities are known for Bell scenarios involving more than two parties or many measurement settings. Another problem well suited to be tackled with entropies is the one where the amount of shared randomness between the parties involved in a Bell test is bounded. Commonly, shared randomness is assumed to be a free and boundless resource, but quantitative considerations about how much of it is actually necessary to reproduce certain quantum correlations can give useful insights that would be extremely hard to obtain with the usual approaches.
These are the kinds of problems we address in this paper. In Sec. II we start by defining the entropic cone described by all Shannon-type inequalities. In Sec. III we state known results about convex cones that are used in Sec. IV to prove a theorem showing that, for marginal scenarios without statistical independence constraints, any Shannon-type inequality is also valid for the probabilities if a proper translation is made. We also show that the converse is in general not true, by providing a counterexample of an inequality for probabilities that is not valid for Shannon entropies. Inspired by these results, in Sec. V we derive the entropic version of the Collins-Gisin inequalities [27], also considering the effects of bounded shared randomness between the parties. In Sec. VI we derive a multipartite generalization of the entropic inequality originally derived by Braunstein and Caves for the bipartite case [19]. In Sec. VII we computationally apply the Fourier-Motzkin (FM) algorithm to derive entropic inequalities for several different scenarios, including marginal models that also include statistical independencies and the effects of bounded shared randomness. We discuss our findings in Sec. VIII; technical results and proofs can be found in the Appendices.

A. Marginal Scenarios
Given a set of variables X_1, ..., X_n, a marginal scenario is a collection of certain subsets of them: those subsets of variables that can be jointly measured. In the case of Bell scenarios, the marginal scenario arises by imposing space-like separation between some of the observables. Clearly, a subset of a jointly measurable set is still jointly measurable. Formally (see [20,22] for further details), a marginal scenario M is a collection of subsets of {X_1, ..., X_n} that is closed under taking subsets. In practice, some joint statistics are measured for every S ∈ M. For example, if the variables X_i and X_j are jointly measurable ({X_i, X_j} =: S ∈ M), one can access P(X_i = x_i, X_j = x_j), the probability of obtaining the outcomes X_i = x_i and X_j = x_j. These marginal probabilities determine in particular the marginal Shannon entropy

H(X_S) = −Σ_{x_S} P(X_S = x_S) log P(X_S = x_S).

As first noticed in [19], the existence of a joint distribution for all variables X_1, ..., X_n implies that the marginal Shannon entropies satisfy certain inequalities, which may be violated by measurement statistics originating from quantum experiments. Below, we recall how to compute these inequalities in general, potentially in the presence of extra causal constraints.

B. Entropy cones
For the purpose of this section, fix a number n and some joint distribution for n random variables X_1, ..., X_n. We denote the set of indices of the random variables by [n] = {1, ..., n} and its powerset (i.e., the set of subsets) by 2^[n]. For every subset S ∈ 2^[n] of indices, let X_S be the tuple of observables (X_i)_{i∈S} and H(S) := H(X_S) the associated marginal entropy. With this convention, the entropy becomes a function on the power set. The linear space of all real-valued set functions is of course isomorphic to R^{2^n}, with a basis {e_S | S ∈ 2^[n]} labeled by subsets. We denote that vector space by R_n and will henceforth not distinguish between real-valued set functions and elements of R_n. For every vector h ∈ R_n and S ∈ 2^[n], we denote by h_S the component of h with respect to the basis vector e_S.
The region {h ∈ R_n | h_S = H(S) for some entropy function H} of vectors in R_n that correspond to entropies has been researched extensively in information theory [15]. It is known to be a convex cone (cf. Section III), but an explicit description has not yet been found. However, several properties of entropy functions are well understood. These are, respectively, monotonicity, sub-modularity, and a normalization condition:

H(S) ≤ H(T) for S ⊆ T,
H(S ∪ T) + H(S ∩ T) ≤ H(S) + H(T),    (1)
H(∅) = 0,

for all S, T ∈ 2^[n]. The set of inequalities (1) are known as the elementary inequalities in information theory, or the polymatroidal axioms. An inequality that follows from the elementary ones is called a Shannon-type inequality. The region defined by the Shannon-type inequalities is the Shannon cone Γ_n, a polyhedral closed convex cone. Clearly, it is an outer approximation to the true entropy cone. Since the latter is not yet fully characterized, we will work for the remainder of this paper solely in terms of the Shannon cone. (This relaxation implies that, while all inequalities we derive below are valid for any true entropy vector, they may fail to be tight.) For future reference, we re-state this definition more formally: the Shannon cone Γ_n is the set of vectors h ∈ R_n that satisfy the elementary inequalities (1).
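The polymatroidal axioms are straightforward to check computationally. The following sketch (plain Python; the helper names are hypothetical, not code from this paper) builds the entropy vector h ∈ R_n of a given joint distribution and verifies the elementary inequalities (1); by the discussion above, every true entropy vector must pass this test:

```python
from itertools import combinations
from math import log2

def subsets(n):
    """All subsets of [n] = {0, ..., n-1} as frozensets."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def marginal(p, S):
    """Marginalize a joint distribution (dict: outcome tuple -> probability) onto S."""
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in sorted(S))
        out[key] = out.get(key, 0.0) + px
    return out

def H(dist):
    """Shannon entropy in bits."""
    return -sum(q * log2(q) for q in dist.values() if q > 0)

def entropy_vector(p, n):
    """The vector h in R_n: one coordinate H(X_S) for every subset S of [n]."""
    return {S: H(marginal(p, S)) for S in subsets(n)}

def in_shannon_cone(h, n, tol=1e-9):
    """Check normalization, monotonicity and submodularity -- the polymatroidal axioms (1)."""
    ok = abs(h[frozenset()]) < tol
    for S in subsets(n):
        for T in subsets(n):
            ok = ok and h[S | T] + h[S & T] <= h[S] + h[T] + tol
            if S <= T:
                ok = ok and h[S] <= h[T] + tol
    return ok

# Two perfectly correlated uniform bits: H(X0) = H(X1) = H(X0, X1) = 1
p = {(0, 0): 0.5, (1, 1): 0.5}
h = entropy_vector(p, 2)
assert in_shannon_cone(h, 2)
```
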
We now return to descriptions involving a marginal scenario M. Given a point h : 2^[n] → R, one computes the restriction h|_M : M → R by dismissing the values h(S) for all S ∈ 2^[n] \ M. The entropic cone bounding the correlations in M is thus the projection of Γ_n along the map R^{2^[n]} → R^M that throws away the coordinates not corresponding to observable quantities. This set is also a convex cone, which we denote by Γ_M. Given an inequality description of Γ_M, deciding whether a marginal model can be extended is very simple, since one only needs to check whether it satisfies all the defining inequalities. In other terms, if a marginal model violates an inequality derived purely from combinations of the polymatroidal axioms, then it cannot arise from a joint probability distribution.
To determine the projection Γ_M, a natural possibility would be to calculate the extremal rays of Γ_n and dismiss their irrelevant coordinates. However, determining all the extremal rays of the cone Γ_n is a very hard problem, with explicit solutions known only for a few cases [28][29][30].
To determine Γ_M in practice, we instead start with the inequality description (1) of Γ_n and then apply Fourier-Motzkin (FM) elimination [31], a standard method for calculating the inequality description of the projection of a polyhedral cone.
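A single FM elimination step can be sketched in a few lines (a minimal implementation for systems of homogeneous inequalities a · x ≥ 0, not the optimized routine one would use in practice; the function name is hypothetical). As a toy example, eliminating the unobservable joint entropy H_AB from the elementary inequalities of two variables leaves just the positivity of the marginal entropies:

```python
def fm_eliminate(ineqs, k):
    """One Fourier-Motzkin step on a system {a : a . x >= 0}: returns inequalities
    implied by the system whose coefficient at position k is zero."""
    zero, pos, neg = [], [], []
    for a in ineqs:
        (zero if a[k] == 0 else pos if a[k] > 0 else neg).append(a)
    out = list(zero)
    for ap in pos:
        for an in neg:
            # the non-negative combination (-an[k]) * ap + ap[k] * an cancels coordinate k
            out.append(tuple(-an[k] * ap[i] + ap[k] * an[i] for i in range(len(ap))))
    return out

# Coordinates: (H_A, H_B, H_AB).  Elementary inequalities for two variables:
system = [
    (-1, 0, 1),   # monotonicity:  H_AB >= H_A
    (0, -1, 1),   # monotonicity:  H_AB >= H_B
    (1, 1, -1),   # submodularity: H_A + H_B >= H_AB
]
projected = fm_eliminate(system, 2)   # eliminate the coordinate H_AB
# The projection onto (H_A, H_B) is just positivity of the marginal entropies:
assert set(projected) == {(0, 1, 0), (1, 0, 0)}
```

Iterating this step over all non-observable coordinates yields the inequality description of Γ_M, exactly as described in the text (modulo the redundancy removal any practical implementation would add).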

C. Inequalities for marginal entropies
To illustrate the general method, we begin by considering the simplest non-trivial Bell scenario, corresponding to the CHSH scenario [32] and consisting of two parties, say A and B, who can each measure one out of two observables, {A_0, A_1} and {B_0, B_1} respectively. This corresponds to the marginal scenario consisting of the jointly observable sets {A_i, B_j} for i, j = 0, 1, together with their subsets. As shown in [20,22], the only non-trivial Shannon-type entropic inequality (up to symmetries) is the inequality derived by Braunstein and Caves [19], the entropic CHSH, given by

H_{A1B1} + H_{A0} + H_{B0} ≤ H_{A0B0} + H_{A0B1} + H_{A1B0},    (2)

where here and in the following we employ the notation H(A_i B_j) = H_{AiBj} (and similarly for any number of variables) to avoid lengthy expressions.
In Ref. [19] this inequality was derived using the chain rule for entropies. However, as just discussed, any Shannon-type inequality can be derived from the elementary set of inequalities (1). To illustrate the general procedure, we consider how to obtain the entropic inequality (2) by performing a FM elimination of the non-observable variables appearing in the set of elementary inequalities. To derive the entropic CHSH inequality (2) it is sufficient to combine the two sub-modularity inequalities

H_{A0B0} + H_{A0B1} ≥ H_{A0B0B1} + H_{A0},
H_{A1B0} + H_{B0B1} ≥ H_{A1B0B1} + H_{B0}.

Using that H_{A0B0B1} ≥ H_{B0B1} and H_{A1B0B1} ≥ H_{A1B1} we get exactly (2). Note, however, that these two last monotonicity inequalities are not in the elemental set, the minimal subset of (1) that generates all others. To obtain, for instance, H_{A0B0B1} ≥ H_{B0B1} from the basic ones, we combine the elemental monotonicity H_{A0A1B0B1} ≥ H_{A1B0B1} with the sub-modularity H_{A0B0B1} + H_{A1B0B1} ≥ H_{A0A1B0B1} + H_{B0B1}. It is clear that, in general, any monotonicity inequality follows from the basic ones in this way. One should note the similarity of CHSH_E with the usual CHSH inequality in terms of probabilities [27,32], which can be expressed as

q_{A0B0} + q_{A0B1} + q_{A1B0} − q_{A1B1} − q_{A0} − q_{B0} ≤ 0,    (7)

with q_{AiBj} being the probability of both parties obtaining the outcome 0 if the measurement settings i, j are used, and similarly for the marginals q_{Ai} and q_{Bj}. We see that the two inequalities are equivalent if one simply makes the replacement H_{AiBj} → −q_{AiBj}. Based on this simple observation, we give in Sec. IV a formal explanation of the similarities between the probabilistic and entropic inequalities.
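Since (2) is a Shannon-type inequality, it must hold for every joint distribution of the four variables, which can be checked by sampling. The sketch below (hypothetical helper names; the symmetry variant H_{A1B1} + H_{A0} + H_{B0} ≤ H_{A0B0} + H_{A0B1} + H_{A1B0} of the entropic CHSH is assumed) evaluates the inequality on random joint distributions:

```python
from itertools import product
from math import log2
import random

def H(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marg(p, idx):
    """Marginal of a joint distribution over (A0, A1, B0, B1) onto the given indices."""
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + px
    return out

def entropic_chsh(p):
    """H_{A1B1} + H_{A0} + H_{B0} - H_{A0B0} - H_{A0B1} - H_{A1B0};
    a Shannon-type expression, hence <= 0 for every joint distribution p."""
    return (H(marg(p, (1, 3))) + H(marg(p, (0,))) + H(marg(p, (2,)))
            - H(marg(p, (0, 2))) - H(marg(p, (0, 3))) - H(marg(p, (1, 2))))

random.seed(1)
for _ in range(200):
    w = {x: random.random() for x in product((0, 1), repeat=4)}
    tot = sum(w.values())
    p = {x: v / tot for x, v in w.items()}
    assert entropic_chsh(p) <= 1e-9   # never violated by a joint distribution
```
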

D. The role of causal structures
Bell's theorem is usually associated with the incompatibility of quantum correlations with a natural causal structure for space-like separated events. However, in the derivation of the entropic inequality (2) no explicit mention of a causal structure has been made: inequality (2) is valid for any set of four jointly distributed variables. The only assumption made up to this point is the validity of classical probability theory or, in other terms, the existence of a well-defined joint probability distribution p(A_0 = a_0, A_1 = a_1, B_0 = b_0, B_1 = b_1). Bell's theorem can be seen as a recipe for interpreting the variables appearing in (2) or (7) as physically observable quantities.
We recall the usual argument. Bell's theorem assumes a description of marginal models in which there exists a hidden variable λ that subsumes all the information the variables A_0, A_1, B_0, and B_1 may depend on. This is the realism assumption in Bell's construction, assuring that all the variables have well-defined values prior to any measurement. At each run of the experiment, Alice and Bob independently choose which variable they will locally access by tossing, respectively, uncorrelated coins X and Y: if X = 0 Alice measures the observable associated with A_0, if X = 1 she measures A_1 (and similarly for Bob). Because in general A_0 and A_1 (and similarly B_0 and B_1) are associated with non-commuting observables, quantum mechanics prohibits them from being jointly measured. The compatibility between A_i and B_j is guaranteed by invoking the assumption of locality, stating that space-like separated events are not causally connected. Note, however, that for example A_0 is in principle not an observable quantity; rather, what Alice observes is A_0 conditioned on the fact that X = 0. If X is correlated with λ, the value of A_0 could potentially have been different had Alice chosen to measure A_1.

FIG. 1. Directed acyclic graph (DAG) representing the causal structure associated with a bipartite Bell experiment [9]. The associated marginal scenario consists of all sets of variables that do not contain two different observables of the same party. For instance, if X and Y are both dichotomic with x = 0, 1 and y = 0, 1, we have the CHSH scenario [33], characterized by the marginal scenario described in Sec. II C.
Here enters the final assumption in Bell's theorem, that of measurement independence, stating that X and Y are independent of the hidden variable λ. Together, the three assumptions of Bell's theorem imply the causal structure shown in Fig. 1.
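The measurement-independence condition I(X, Y : λ) = 0 holds automatically in any model with the causal structure of Fig. 1, as a small simulation illustrates (a sketch with hypothetical helper names; λ is taken to have four values and the local response functions are chosen at random):

```python
from itertools import product
from math import log2
import random

def H(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def mutual_info(p, idx1, idx2):
    """I(U:V) = H(U) + H(V) - H(UV) computed from a joint distribution."""
    def marg(idx):
        out = {}
        for x, px in p.items():
            key = tuple(x[i] for i in idx)
            out[key] = out.get(key, 0.0) + px
        return out
    return H(marg(idx1)) + H(marg(idx2)) - H(marg(idx1 + idx2))

# A local hidden-variable model for the DAG of Fig. 1: lambda, X, Y independent
# and uniform, A = f(X, lambda), B = g(Y, lambda).  Variable order: (X, Y, lam, A, B).
random.seed(0)
f = {(x, l): random.randint(0, 1) for x in (0, 1) for l in range(4)}
g = {(y, l): random.randint(0, 1) for y in (0, 1) for l in range(4)}
p = {}
for x, y, l in product((0, 1), (0, 1), range(4)):
    a, b = f[(x, l)], g[(y, l)]
    p[(x, y, l, a, b)] = p.get((x, y, l, a, b), 0.0) + 0.5 * 0.5 * 0.25

# Measurement independence holds by construction: I(XY : lambda) = 0
assert abs(mutual_info(p, (0, 1), (2,))) < 1e-9
```
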

III. CONVEX CONES
In this section, we state several basic facts about closed convex cones and their duals. Detailed background and proofs can be found in [34]. General texts on convexity that also treat cones are [35,36]. All cones that appear in this paper are closed and convex, so we will at times drop these attributes.
A closed convex cone C is a subset of R^n that is
1. topologically closed: closure(C) = C,
2. convex: x, y ∈ C implies λx + (1 − λ)y ∈ C for all λ ∈ [0, 1],
3. a cone: x ∈ C implies λx ∈ C for all λ ≥ 0.
A simple example is given in Figure 2.
The simplest types of cones are rays, i.e., sets of the form {λv | λ ≥ 0} for some vector v ∈ R^n. Let L ⊂ C be a ray contained in a closed convex cone C. It is an extremal ray if it cannot be written as a non-trivial convex combination of elements of C, i.e., if for all x, y ∈ C, whenever ½(x + y) ∈ L, we already have x, y ∈ L. Under a technical assumption, closed convex cones are the convex hull of their extremal rays. To state the assumption, we need to introduce the notion of a base. A base is a convex subset B ⊂ C of a convex cone C such

that 0 ∉ B and every non-zero element v ∈ C is uniquely of the form v = λb with λ > 0 and b ∈ B. Not every cone admits a base (C = R^n ⊂ R^n, e.g., does not). However, cones which have a compact base are the convex hull of their extremal rays [35, Chapter 9]. This will be true for all cones that we deal with in this paper.

FIG. 2. A closed convex cone in R^3. The extremal rays are labeled by a, b, c, while B designates a base. One of the three facets is shaded and labeled F.
In this sense, it is sufficient to specify the extremal rays in order to specify C. Thus, cones that have only finitely many extremal rays are of particular interest. A cone has this property if and only if it is the region in R^n specified by finitely many linear homogeneous inequalities [34, Chapter 3.4]. Such cones are called polyhedral. The closure of the set of all achievable entropy vectors is now known not to be polyhedral [37]. However, the cone Γ_n is manifestly defined by finitely many inequalities and hence polyhedral. The same is true for all other cones that we will be working with.
There is a powerful notion of duality for closed convex cones. Let C be such a cone. The dual cone (also polar cone) C* is the set of all homogeneous linear inequalities valid on C:

C* = {f ∈ R^n | ⟨f, x⟩ ≥ 0 for all x ∈ C}.    (8)

In this language, the set of Shannon-type inequalities is just the dual cone Γ_n* of Γ_n, and the elementary inequalities (1) generate the extremal rays of Γ_n*. We will need the following properties of the duality operation: 1. By the Bipolar Theorem, (C*)* = C for every closed convex cone C [35, Chapter 4]. In particular, a cone is completely specified by its dual. 2. Duality reverses inclusions: if C ⊆ D, then D* ⊆ C*.
3. Dual cones transform "contragradiently": let C be a closed convex cone and D a linear map. Then C′ := D(C) is again a convex cone and

(C′)* = (D^T)^{−1}(C*),

where D^T is the adjoint of D. In particular, f ∈ (C′)* if and only if D^T f ∈ C*.
Proof of Property 3. Let C be a closed convex cone, D a linear map, and C′ = D(C). Then f ∈ (C′)* if and only if ⟨f, Dx⟩ ≥ 0 for all x ∈ C, which holds if and only if ⟨D^T f, x⟩ ≥ 0 for all x ∈ C, i.e., if and only if D^T f ∈ C*.

IV. THE CORRESPONDENCE BETWEEN PROBABILISTIC AND ENTROPIC INEQUALITIES
In this section, we present a simple geometric construction that explains and generalizes the connection, observed above, between the entropic CHSH_E inequality and the usual CHSH inequality. We will find that the set of probability distributions for n binary experiments can be embedded into the cone Γ_n of set functions fulfilling the polymatroidal axioms. Dually, it follows that every linear inequality valid for Γ_n can be turned into an inequality valid for probability distributions. The linear map that connects the two types of inequalities will turn out to send CHSH_E to CHSH, thus providing a geometric explanation for the observed coincidence. Figure 3 provides a high-level roadmap through the succession of convex cones that appear in the argument.
We start by considering various ways of representing the probability distribution of n binary random variables X_1, ..., X_n. Most naturally, the distribution is given by a function on binary strings of length n with the interpretation that p(x) = P(X_1 = x_1, ..., X_n = x_n). Let x be an n-bit string. The string is characterized by the set A ∈ 2^[n] of the positions where it equals 0. Hence we can equivalently consider p as a function on the set of subsets of [n]:

p_A := P(X_A = 0, X_{A^C} = 1),

where, again, X_A are those components of the random vector X whose indices appear in the set A. With this convention, p can be seen as an element of the real vector space R_n over the powerset of [n].

FIG. 3. The various cones appearing in the argument that entropic inequalities can be mapped to Bell inequalities. We start with the positive orthant P_n of R_n, which is the cone over the set of probability distributions. The Möbius transform M sends it linearly and bijectively to a cone Q_n (Q_n, and all further cones that appear later, happen to be sub-cones of the positive orthant; thus, in this way, our two-dimensional sketch is faithful). The set of Möbius-transformed distributions is a sub-cone of S_n, a cone which fulfills a set of "inverted" polymatroidal axioms. The latter cone can be embedded linearly into the Shannon cone Γ_n. Using cone duality (8), we can invert the chain above and embed the dual cone Γ_n* into Q_n*. That yields the main claim of this section. Note that the initial Möbius transform is not strictly necessary to arrive at D^T(Γ_n*) ⊂ Q_n*. We have stated it primarily to clarify the geometric nature of Q_n (i.e., as an orthant, up to a linear isomorphism).

More precisely, it is
an element of the non-negative orthant of R_n, and every element of that orthant corresponds to a (not necessarily normalized) distribution. We denote the non-negative orthant of R_n by

P_n := {p ∈ R_n | p_A ≥ 0 for all A ∈ 2^[n]}.

The reason we found it necessary to elaborate on this rather straightforward correspondence is that the CHSH inequality (7) is given in terms of a different parametrization of probability distributions, which we can now explicitly connect to the standard one. Indeed, the quantities appearing in (7) are

q_A := P(X_A = 0) = Σ_{B ⊇ A} p_B.    (9)

Equation (9) defines a linear map M : R_n → R_n such that q = Mp. A priori, it is not clear that M is invertible, i.e., that one can specify a distribution in terms of the "q-vector" above. However, that turns out to be true. In essence, the relation is given by the Möbius inversion formula [38, Chapter 6].

Lemma 3. The map M is invertible. Its inverse is given by

(M^{−1} q)_A = Σ_{B ⊆ A^C} (−1)^{|B|} q_{A∪B}.

The superscript C stands, of course, for the set complement within [n].
Proof. A few manipulations bring the problem into a standard form of the Möbius transformation (we use the notions of [38, Chapter 6.6]). Using (9) and repeatedly re-labeling the sets one sums over, the claim reduces to the Möbius inversion formula [38, Eq. (6.10)], which yields the stated relation up to an additional reparametrization A → A^C.
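The Möbius transform and its inverse can be verified numerically. The sketch below (hypothetical helper names) implements q_A = Σ_{B ⊇ A} p_B and the inversion formula p_A = Σ_{B ⊆ A^C} (−1)^{|B|} q_{A∪B}, and checks the round trip on a random distribution:

```python
from itertools import combinations
import random

def subsets(n):
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def moebius(p, n):
    """q_A = P(X_A = 0) = sum over B >= A of p_B (p indexed by zero-sets B)."""
    return {A: sum(pB for B, pB in p.items() if A <= B) for A in subsets(n)}

def moebius_inverse(q, n):
    """p_A = sum over B subseteq A^C of (-1)^|B| q_{A union B} (Moebius inversion)."""
    full = frozenset(range(n))
    return {A: sum((-1) ** len(B) * q[A | B]
                   for B in subsets(n) if B <= (full - A))
            for A in subsets(n)}

# Round trip on a random (normalized) distribution over the subsets of [3]
random.seed(2)
n = 3
w = {A: random.random() for A in subsets(n)}
tot = sum(w.values())
p = {A: v / tot for A, v in w.items()}
q = moebius(p, n)
p_back = moebius_inverse(q, n)
assert all(abs(p[A] - p_back[A]) < 1e-9 for A in subsets(n))
```
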
The set of non-negative distributions in the q-representation is thus the Möbius transform of the non-negative orthant. We denote it by

Q_n := M(P_n).

The significance of Q_n is that its elements fulfill a set of "inverted" polymatroidal axioms. In order to state this precisely, we have to introduce yet another (and final!) cone.
Definition and Lemma 4. The cone S_n is the set of vectors s ∈ R_n that are
1. non-negative: s_A ≥ 0,
2. decreasing: s_A ≤ s_B whenever B ⊆ A,
3. super-modular: s_{A∪B} + s_{A∩B} ≥ s_A + s_B,
for all A, B ∈ 2^[n]. It holds that Q_n ⊂ S_n.
Proof. Let q = M p be the Möbius transform of a probability distribution. We will verify the properties 1. -3. in turn. Since they are obviously invariant under re-scaling by a positive number, this suffices to conclude Q n ⊂ S n .
Positivity follows directly from the definition of a probability. Property (2) is likewise a straight-forward consequence of (9): If B ⊂ A, then the probability that all X B are simultaneously zero is certainly larger than or equal to the probability that even all X A are equal to zero.
As for super-modularity: for any event E, let δ(E) be the indicator function that takes the value 1 if E occurs and 0 otherwise. The inequality

δ(X_{A∪B} = 0) + δ(X_{A∩B} = 0) ≥ δ(X_A = 0) + δ(X_B = 0)

holds with probability one. Indeed, as soon as one of the terms on the right-hand side (r.h.s.) is one, δ(X_{A∩B} = 0) will also be one; if both terms on the r.h.s. are one, then so are both summands on the left-hand side. Super-modularity now follows by taking expectations on both sides.
The remainder of the argument will proceed as follows: We observe that there is a linear map D that sends S_n into Γ_n. It then follows from elementary convex geometry (Section III) that the dual map D^T sends linear inequalities valid on Γ_n (i.e., Shannon-type inequalities) to linear inequalities valid on S_n. Since Q_n ⊂ S_n, the inequalities also hold for Möbius-transformed probability distributions. The following statements make this precise.
Lemma 5. Let D : R_n → R_n be defined by

(Ds)_∅ = 0,   (Ds)_A = s_∅ − s_A for A ≠ ∅.

Then D(S_n) ⊆ Γ_n.

Proof. Let s ∈ S_n. For A ⊆ B, the inequality (Ds)_A = s_∅ − s_A ≤ s_∅ − s_B = (Ds)_B follows from the fact that vectors s ∈ S_n have decreasing components. The sub-modularity of D(s) follows from the super-modularity of s ∈ S_n. Next, (Ds)_A = s_∅ − s_A ≥ 0 again follows from the fact that s is decreasing. Finally, (Ds)_∅ = 0 by definition. These observations show, respectively, that D(s) is monotonically increasing, sub-modular, non-negative, and that its ∅-component is zero. These are the defining properties of the Shannon cone. Hence D(s) ∈ Γ_n.
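Lemma 5 can be checked numerically: Möbius-transforming a random distribution and applying the map D (taken here as (Ds)_A = s_∅ − s_A, consistent with the discussion in Sec. IV A) always lands inside the Shannon cone. A sketch with hypothetical helper names:

```python
from itertools import combinations
import random

def subsets(n):
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def D(s):
    """(Ds)_A = s_emptyset - s_A, with (Ds)_emptyset = 0."""
    e = frozenset()
    return {A: 0.0 if A == e else s[e] - s[A] for A in s}

def polymatroid(h, n, tol=1e-9):
    """Normalization, monotonicity and submodularity: the defining properties of Gamma_n."""
    ok = abs(h[frozenset()]) < tol
    for S in subsets(n):
        for T in subsets(n):
            ok = ok and h[S | T] + h[S & T] <= h[S] + h[T] + tol
            if S <= T:
                ok = ok and h[S] <= h[T] + tol
    return ok

# Moebius-transform a random distribution (q_A = P(X_A = 0)) and check D(q) in Gamma_n
random.seed(3)
n = 3
w = {A: random.random() for A in subsets(n)}
tot = sum(w.values())
p = {A: v / tot for A, v in w.items()}          # p indexed by zero-sets
q = {A: sum(pB for B, pB in p.items() if A <= B) for A in subsets(n)}
assert polymatroid(D(q), n)
```
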
We thus find that any Shannon-type inequality can be mapped to an inequality valid for Möbius-transformed probability distributions:

Corollary 6. Let M be a marginal scenario and let f ∈ (Γ_M)* be a Shannon-type inequality. Then D^T(f) ∈ (Q_n)*, i.e., the inequality D^T(f) holds for Möbius-transformed probability distributions.
Proof. We combine properties 2. and 3. of cone duality as stated in Section III with Lemmas 4 and 5 to obtain

D^T(Γ_n*) ⊆ S_n* ⊆ Q_n*.

Since (Γ_M)* ⊂ Γ_n* for any marginal scenario M, we are done.

A. Discussion
The space R_n is equipped with a basis e_A, A ∈ 2^[n], labeled by subsets of {1, ..., n}. If one orders the basis in any way such that e_∅ is the first element, then the linear map D acts as (Ds)_∅ = 0 and (Ds)_A = s_∅ − s_A, and its transpose is given by (D^T f)_∅ = Σ_{A≠∅} f_A and (D^T f)_A = −f_A for A ≠ ∅. Written as a vector, the entropic CHSH_E inequality (2) has coefficients +1 on H_{A0B0}, H_{A0B1}, H_{A1B0} and −1 on H_{A1B1}, H_{A0}, H_{B0}. Because the coefficients sum to zero, (D^T f)_∅ = 0 and hence D^T f = −f, which is the vector representing the ordinary CHSH inequality (7). We have thus indeed geometrically explained the coincidence observed initially. We remark that the inclusion D^T(Γ_n*) ⊂ Q_n* is not tight in general, i.e., it is not the case that all inequalities for Q_n can be obtained from those for Γ_n. Geometrically, this would be surprising, as Q_n is just an orthant, while Γ_n seems to be a more complicated geometric object. It is indeed simple to find explicit counter-examples. Considering a specific inequality, for instance the Mermin inequality for tripartite correlations [39], it is possible to gain a better intuition. This is an example of an inequality that is valid on q-functions but cannot be translated into an entropic inequality. The reason is that its derivation requires 32 independent inequalities (arising from positivity of certain probability distributions): one is the positivity of q_Ω ≡ q_{{A0,A1,B0,B1,C0,C1}}, three correspond to the decreasing property, six are super-modularities, and the remaining 22 cannot be translated into Shannon-type inequalities.
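The action of D^T on the entropic CHSH can be made concrete. The sketch below (hypothetical names; inequality (2) is taken in the symmetry variant with coefficients +1 on the pair entropies H_{A0B0}, H_{A0B1}, H_{A1B0} and −1 on H_{A1B1}, H_{A0}, H_{B0}) verifies that the coefficients sum to zero and that D^T f = −f:

```python
def D_transpose(f):
    """(D^T f)_emptyset = sum of the non-empty coefficients; (D^T f)_A = -f_A otherwise."""
    e = frozenset()
    out = {A: -c for A, c in f.items() if A != e}
    out[e] = sum(c for A, c in f.items() if A != e)
    return out

# Entropic CHSH as a coefficient vector f (the inequality reads <f, h> >= 0):
f = {
    frozenset({'A0', 'B0'}): 1, frozenset({'A0', 'B1'}): 1, frozenset({'A1', 'B0'}): 1,
    frozenset({'A1', 'B1'}): -1, frozenset({'A0'}): -1, frozenset({'B0'}): -1,
}
g = D_transpose(f)
assert g[frozenset()] == 0                   # the coefficients of f sum to zero
assert all(g[A] == -f[A] for A in f)         # D^T f = -f: the probabilistic CHSH (7)
```
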

V. COLLINS-GISIN ENTROPIC INEQUALITIES WITH AND WITHOUT BOUNDED SHARED RANDOMNESS
In this section we derive an entropic version of the Collins-Gisin (CG) inequalities [27], concerning a bipartite scenario where each party, say Alice and Bob, can choose between m measurement settings. We further derive a different version of these inequalities that takes into account a bounded amount of shared randomness between the parties.
The CG inequalities are typically written in the form I_mm22 ≤ 0, where for m = 2 this corresponds to the CHSH inequality [33]. The notation I_mm22 stresses that each party has access to m possible measurement settings with 2 outcomes each. For m = 3, it has been shown that these inequalities are useful since they can detect the non-locality of states that cannot be detected by the CHSH inequality [27]. Moreover, as shown in [40], the I_mm22 inequalities are tight Bell inequalities, that is, they correspond to facets of the local polytope.
The I_mm22 inequality can be written compactly using a matrix notation: introducing the coefficient matrix c with entries c_{ij} = 1 for i + j ≤ m − 1, c_{ij} = −1 for i + j = m, and c_{ij} = 0 otherwise, we set

I_mm22 = Σ_{i,j=0}^{m−1} c_{ij} q_{AiBj} − q_{A0} − Σ_{j=0}^{m−2} (m − 1 − j) q_{Bj}.    (17)

Using this matrix notation, the I_mm22 inequality can be written as

I_mm22 ≤ 0.    (18)

It is important to stress the difference between the way one proves the validity of an entropic inequality and the validity of a probability inequality. In general, to prove that a probability inequality is valid, one uses the information about the extreme points of the local polytope, that is, all the deterministic functions assigning values to the outcomes. In turn, as stressed before, little is known about the extremal rays of the Shannon cone (apart from simple cases [28][29][30]). In the absence of information about the extremal rays, the only way we can prove that an entropic inequality is valid is to use the linear programming approach of Yeung [15]. If the extremal rays are known, an approach very similar to the one used for correlation polytopes [10] can also be used in the entropic case (see Appendix B for further details).
We start by considering the case m = 3. From Corollary 6 and the corresponding translation rule H_{AiBj} → −q_{AiBj}, one could expect the entropic analogue of I_3322, which we label I^E_33, to be given by

I^E_33 = −H_{A0B0} − H_{A0B1} − H_{A0B2} − H_{A1B0} − H_{A1B1} + H_{A1B2} − H_{A2B0} + H_{A2B1} + H_{A0} + 2H_{B0} + H_{B1} ≤ 0.

This is indeed the case, as this inequality can be obtained by combining suitable sub-modularity inequalities from the basic set together with the monotonicity inequalities H_{A2B0B1} ≥ H_{A2B1}, H_{A0A1B2} ≥ H_{A1B2} and H_{A0A1B0B1} ≥ H_{B0B1} (remember that all monotonicity inequalities can be obtained from the basic inequalities). The notation I^E_mm stresses that each party A and B has access to m possible measurement settings with any number of possible outcomes, in contrast to the I_mm22 inequalities, which are only valid for dichotomic observables. As discussed in the introduction, this outcome-size independence is an advantage of the entropic inequalities over the probabilistic ones.
In Appendix C it is proven, proceeding with a FM elimination similar to the one sketched above, that the CG inequalities remain valid for entropies if one simply applies the transformation rule H_{AiBj} → −q_{AiBj}, that is,

I^E_mm ≤ 0,    (25)

where we have used a notation similar to the one in (17). From Corollary 6, this also implies that the I_mm22 inequalities (18) can be derived relying exclusively on the inverse polymatroidal axioms.
Given the inequality (25), the first question one needs to answer is whether it is able to witness nonlocal correlations. For the usual CG inequality (18), the maximal violation is achieved by a nonlocal non-signalling distribution $p_m$ that can be understood as a generalization of the paradigmatic PR box [41] to $m$ measurement settings. If we directly compute the value of $I^E_{mm}$ for the distribution $p_m$, we find no violation. This is no surprise, since entropies are unable to distinguish between correlations and anticorrelations; for example, $p_m$ is entropically equivalent to a classically correlated distribution. In order to find violations of the entropic inequalities, one needs a way of entropically distinguishing correlations from anticorrelations. As shown in [16], one way to do that is to make use of shared randomness between the parties. Consider two distributions $p_c$ and $p_a$ that have, respectively, correlated outputs ($a \oplus b = 0$) and anticorrelated outputs ($a \oplus b = 1$), whatever the inputs. Entropically, both distributions are indistinguishable, but if we allow the parties to make use of some extra shared randomness then we can tell the two apart. For example, mixing each distribution with equal probability with an independent copy of $p_c$, we see that $p_c$ remains unchanged while $p_a$ is turned into an uncorrelated distribution. Similarly, if we mix $\frac{1}{2} p_m + \frac{1}{2} p_c$ we find $I^E_{mm} = m - 1$, a violation of the entropic inequality that can be proven to be optimal; that is, in some sense (allowing the use of shared randomness), the maximally nonlocal probability distribution is also the maximally entropically nonlocal one.
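Both the entropic blindness to the sign of the correlation and the mixing trick are easy to reproduce numerically. The sketch below is ours: $p_c$ and $p_a$ are the perfectly correlated and anticorrelated two-bit distributions, and mixing with $p_c$ (the convex mixture the shared bit induces) separates them.

```python
import numpy as np

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_info(P):
    """I(A:B) in bits for a joint distribution P[a, b]."""
    return H(P.sum(axis=1)) + H(P.sum(axis=0)) - H(P)

p_c = np.array([[0.5, 0.0], [0.0, 0.5]])   # correlated: a = b
p_a = np.array([[0.0, 0.5], [0.5, 0.0]])   # anticorrelated: a = b XOR 1

# Entropically indistinguishable: both carry one bit of mutual information.
print(mutual_info(p_c), mutual_info(p_a))   # 1.0 1.0

# Mixing with a copy of p_c tells them apart:
print(mutual_info(0.5 * p_c + 0.5 * p_c))   # 1.0  (p_c is unchanged)
print(mutual_info(0.5 * p_a + 0.5 * p_c))   # 0.0  (p_a becomes uncorrelated)
```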
To prove the maximal violation of $I^E_{mm} \leq 0$, we first consider the maximum algebraic value that the operator $I^E_{mm}$ can achieve, which turns out to coincide with the one obtained for the distribution $\frac{1}{2} p_m + \frac{1}{2} p_c$. In order to understand the maximal violations of $I^E_{mm}$, let us rewrite it in terms of mutual informations. Using that $I_{A_i:B_j} - H_{A_i} \leq 0$ and $I_{A_0:B_0} - H_{B_0} \leq 0$, we see that the maximum violation is given by
$$I^E_{mm} = -\sum_{i=0}^{m-2} I_{A_{1+i}:B_{m-1-i}} + \sum_{i=0}^{m-2} I_{A_{1+i}:B_{m-2-i}} \leq \sum_{i=0}^{m-2} H_{A_{1+i}}.$$
For dichotomic observables it turns out that the maximal violation is given by $I^E_{mm} \leq m - 1$.

A. Entropic CG inequality with bounded shared randomness
Under the locality and realism assumptions, any correlation displayed between A and B can only arise through the hidden variable $\lambda$; the variable $\lambda$ is the common ancestor of all the observable quantities. That means that any correlation between A and B must be screened off once we know the actual value of $\lambda$. Mathematically, this corresponds to saying that $I_{A_0A_1:B_0B_1|\lambda} = 0$ (and similarly for all subsets, for example $I_{A_0:B_0|\lambda} = 0$), or, in other terms, $H_{A_0A_1B_0B_1|\lambda} = H_{A_0A_1|\lambda} + H_{B_0B_1|\lambda}$, which can be rewritten as $H_{A_0A_1B_0B_1\lambda} + H_\lambda = H_{A_0A_1\lambda} + H_{B_0B_1\lambda}$. Remember that the mutual information between two variables can be expressed in terms of Shannon entropies as $I_{A:B} = H_A + H_B - H_{AB}$ and, similarly, $I_{A:B|\lambda} = H_{A\lambda} + H_{B\lambda} - H_{AB\lambda} - H_\lambda$. Note that we allow the parties access to local randomness, that is, $H_{A_0A_1|\lambda}$ (and similarly for B) is not necessarily equal to 0. Our aim is to bound the entropy of the hidden variable as $H_\lambda \leq C$.
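For concreteness, the screening-off condition can be checked numerically on a toy model (ours, not from the paper) in which $\lambda$ is a uniform bit copied by both parties: A and B are maximally correlated, yet conditioning on $\lambda$ removes all correlation.

```python
import numpy as np

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# p(a, b, lam): lam is a uniform bit and both parties output a = b = lam.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 1, 1] = 0.5

H_A     = H(P.sum(axis=(1, 2)))
H_B     = H(P.sum(axis=(0, 2)))
H_lam   = H(P.sum(axis=(0, 1)))
H_AB    = H(P.sum(axis=2))
H_Alam  = H(P.sum(axis=1))
H_Blam  = H(P.sum(axis=0))
H_ABlam = H(P)

I_AB     = H_A + H_B - H_AB                    # = 1 bit of correlation
I_AB_lam = H_Alam + H_Blam - H_ABlam - H_lam   # = 0: screened off by lam
print(I_AB, I_AB_lam)
```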
Such a restriction fits naturally into the entropic approach to marginal models, since the considerations about finite shared randomness are equivalent to extra linear constraints that still define an entropic cone. In practice, we start by considering all the polymatroidal axioms describing the cone $\Gamma_{n+1}$, corresponding to the $n$ variables of the marginal scenario plus the hidden variable $\lambda$. To this set of basic inequalities we add the ones carrying the information about the causal structure of the experiment, namely that $\lambda$ is the only common ancestor of all the space-like separated variables, together with the inequality bounding the entropy of the hidden variable. Formally, this means we add the corresponding inequalities to the set of basic inequalities, where $A = (A_0, \ldots, A_{m-1})$ and $B = (B_0, \ldots, B_{m-1})$. The first step in the FM elimination is to eliminate the hidden variable $\lambda$. Note that since (31) is the only inequality that depends on $C$, after the FM elimination of $\lambda$ any non-trivial inequality depending on the amount of shared randomness must appear as the sum of this inequality with some of the others. To begin with, we now prove inequality (32), where again $A = (A_0, \ldots, A_{m-1})$ and $B = (B_0, \ldots, B_{m-1})$.
To obtain (32), we add the independence condition $I_{A:B|\lambda} = 0$ to one basic submodularity inequality, one basic monotonicity inequality and the bound on $H_\lambda$. Note that in the limit $C = 0$, combined with the subadditivity $H_A + H_B \geq H_{AB}$, this forces $H_A + H_B = H_{AB}$; that is, no correlations between A and B are possible, as one should expect. We have checked computationally that (32) is the only extra facet inequality, beyond the usual basic set, that one obtains after eliminating $\lambda$ in the CHSH scenario. We believe this is still the case for scenarios with more measurement settings, but we do not have a formal proof of that. Note that all the terms appearing in (32) involve non-observable quantities and should therefore be eliminated. Our approach here is to add basic inequalities in such a way that all the non-observable quantities are eliminated. Combining suitable basic inequalities with inequality (32), we obtain an inequality that one can regard as the entropic CHSH with bounded shared randomness. For general $m$, as proven in the Appendix, one obtains an inequality that can be regarded as the entropic CG inequality with bounded shared randomness, where once more we have used a matrix notation similar to the one in (17). In terms of mutual informations, the inequality can be written using the matrix notation (29). This means that, independently of how many measurement settings one employs, the inequality will be saturated using no more than two bits of shared randomness.
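The $C = 0$ limit is one instance of a standard data-processing argument: whenever $\lambda$ screens off A from B, $I_{A:B} \leq I_{A:B\lambda} = I_{A:\lambda} + I_{A:B|\lambda} = I_{A:\lambda} \leq H_\lambda$, so bounded shared randomness directly bounds the observable correlations. The sketch below (our construction) verifies this numerically for random screened-off models.

```python
import numpy as np

rng = np.random.default_rng(1)

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def random_screened_model(nA=3, nB=3, nL=4):
    """Random p(a,b,lam) with I(A:B|lam) = 0, built as p(lam)p(a|lam)p(b|lam)."""
    w  = rng.dirichlet(np.ones(nL))               # p(lam)
    pa = rng.dirichlet(np.ones(nA), size=nL)      # p(a|lam)
    pb = rng.dirichlet(np.ones(nB), size=nL)      # p(b|lam)
    return np.einsum('l,la,lb->abl', w, pa, pb)

for _ in range(200):
    P = random_screened_model()
    I_AB = (H(P.sum(axis=(1, 2))) + H(P.sum(axis=(0, 2))) - H(P.sum(axis=2)))
    H_lam = H(P.sum(axis=(0, 1)))
    assert I_AB <= H_lam + 1e-9   # correlations bounded by shared randomness
print("I(A:B) <= H(lam) held in all sampled models")
```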

VI. MULTIPARTITE SCENARIOS
We start by considering the simplest multipartite scenario, consisting of 3 parties with 2 measurement settings each. In terms of the correlation polytope, it is known that there are 46 different classes of inequalities [42]. As we discuss in Sec. VII, the FM elimination required to obtain the entropic inequalities bounding this marginal scenario is too demanding, and we were not able to finish the computation.
To circumvent this limitation, we derive a non-trivial inequality using the chain rule for entropies, an approach similar to the one originally employed to derive the entropic CHSH inequality [19]. Remember that a marginal model in accordance with an LHV description guarantees the existence of the full joint probability distribution $p(a_{x=1}, b_{y=1}, c_{z=1}, a_{x=0}, b_{y=0}, c_{z=0})$, with $x$, $y$ and $z$ describing the measurement choices; for example, $x = 0$ corresponds to Alice measuring the observable $A_0$. The existence of the full joint distribution in turn implies the existence of the full joint entropy $H(A_1, B_1, C_1, A_0, B_0, C_0)$. Using the chain rule for entropies, together with the monotonicity of the Shannon entropy and the fact that conditioning on a variable cannot increase the entropy (that is, $H_A \leq H_{AB}$ and $H_{A|BC} \leq H_{A|C}$), we obtain the inequality $M_3 \leq 0$. Note that the chain rule and the two other properties just mentioned are Shannon-type relations, and as such the inequality $M_3 \leq 0$ can also be derived from the basic set of inequalities (1). Given the inequality (44), the first question one needs to answer is whether it is able to detect genuine tripartite nonlocal correlations. To show that it is, we consider the two kinds of genuine tripartite nonlocal correlations introduced in [43]: the distribution $p_1$ and
$$p_2(a, b, c|x, y, z) = \begin{cases} 1/4\,, & a \oplus b \oplus c = xy \oplus xz \oplus yz \\ 0\,, & \text{otherwise.} \end{cases}$$
(46) If we compute the value of $M_3$ for these distributions, we find no violation. As mentioned before, this comes as no surprise, since entropies are unable to distinguish between correlations and anticorrelations; for example, both distributions $p_1$ and $p_2$ are entropically equivalent to a classically correlated distribution. As discussed before, one way to distinguish correlation from anticorrelation from the entropic perspective is to use classical shared randomness. If we mix the distributions $p_1$ and $p_2$ with $p_c$, for example with equal probability $1/2$, one can straightforwardly compute the value of the operator to be $M_3 = 1$ for both distributions, a violation of the inequality (44) that therefore witnesses the nonlocal behaviour of the distributions. A nice feature of the inequality (44) is that it can easily be generalized to any number of parties $N$. Once more, just making use of the chain rule, the monotonicity of the Shannon entropy and the fact that conditioning on a variable cannot increase the entropy, we arrive at (48), where now we have used the notation $X^j_i$ to label the $i$-th observable of the $j$-th party, with $i \in \{0, 1\}$ and $j = 1, \ldots, N$, and where the operator $P$ stands for the sum over all the different permutations of the parties. It is easy to see that (48) is violated by a generalization of the distribution (45) to more parties if we just mix it with the classically correlated distribution ($x_1 \oplus \cdots \oplus x_N = 0$). A nice feature of the entropic inequalities is that they can readily be applied to marginal scenarios with an arbitrary number of outcomes. This is in sharp contrast to the usual Bell inequality approach, where increasing the number of outcomes also increases the complexity and dimension of the correlation polytopes. To our knowledge, very few inequalities have been derived for multipartite Bell scenarios with many outcomes; in particular, in Ref.
[44] tripartite inequalities have been derived for any number of outcomes, but, as the authors stress, there is no straightforward generalization of their methods to more parties (note also Ref. [45], where, however, the inequalities involve products of observables from the same party and therefore have no direct application to Bell scenarios). Entropic inequalities may prove a useful tool in such cases. We have briefly explored this possibility by looking for quantum violations of the inequality (48) using multidimensional GHZ states and employing the Fourier-transformed measurements used in [46]. We have considered $N = 3, \ldots, 10$ and $d = 2, \ldots, 10$ and found that the violation of (48) increases with both $N$ and $d$. As numerically noted in [44], the maximal quantum violation of the inequalities considered there can be reached only using systems with local Hilbert space dimension exceeding the number of measurement outcomes, which suggests that this kind of inequality can be used as a multipartite dimension witness [47]. We believe this is an interesting line of research one may pursue in the entropic approach.
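The two entropic facts driving the multipartite derivation above, the chain rule and the fact that conditioning cannot increase entropy, are easy to confirm numerically on a random three-variable distribution. This is our sketch, not the paper's computation.

```python
import numpy as np

rng = np.random.default_rng(2)

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

# Random joint distribution of three binary variables X1, X2, X3.
P = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)

H1   = H(P.sum(axis=(1, 2)))   # H(X1)
H12  = H(P.sum(axis=2))        # H(X1 X2)
H123 = H(P)                    # H(X1 X2 X3)

# Chain rule: H(X1 X2 X3) = H(X1) + H(X2|X1) + H(X3|X1 X2).
chain = H1 + (H12 - H1) + (H123 - H12)
assert abs(H123 - chain) < 1e-9

# Conditioning cannot increase entropy: H(X3|X1 X2) <= H(X3|X2).
H23 = H(P.sum(axis=0))         # H(X2 X3)
H2  = H(P.sum(axis=(0, 2)))    # H(X2)
assert H123 - H12 <= H23 - H2 + 1e-9
print("chain rule and conditioning checks passed")
```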

VII. COMPUTATIONAL RESULTS
In Sec. V we used a specific combination of the basic inequalities to derive the entropic inequalities (28) and (41). In principle, however, different combinations could give rise to different classes of entropic inequalities. To understand what other classes of inequalities one gets, we rely in this section on computational results. Using standard software to perform the Fourier-Motzkin elimination, we computed all classes of entropic inequalities for the simplest marginal models, those for which the computation can be expected to finish in a reasonable time.
It turns out that even for very simple scenarios involving more than 5 variables, the computations are already too large to finish. To go beyond this limitation, one needs to further simplify the set of basic inequalities. To do so, we follow the approach proposed in Ref. [48] for usual Bell inequalities. Let us begin by considering a bipartite scenario with $m$ measurement settings for Alice and $n$ for Bob. The existence of a classical description for all pairwise observables is equivalent to the existence of classical descriptions for the $n$ subsystems $\{A_0, \ldots, A_{m-1}, B_j\}$, with $j = 0, \ldots, n-1$, coinciding on $\{A_0, \ldots, A_{m-1}\}$. To find all the entropic inequalities for the marginal scenario, it is then sufficient to start from the union of the basic sets of inequalities defining each of the $n$ subsystems (see Fig. 4). A further simplification is possible, since for each of the subsystems (indexed by $j$) it is sufficient to consider only the inequalities involving the subsets of variables $\{\{A_i, B_j\}, \{A_0, \ldots, A_{m-1}\}\}$ with $i = 0, \ldots, m-1$. That is, in practice we start with the set of Shannon-type inequalities describing the cone of $\{A_0, \ldots, A_{m-1}, B_j\}$ and project it down to the cone describing $\{\{A_i, B_j\}, \{A_0, \ldots, A_{m-1}\}\}$. With this simplification, we were able to fully characterize bipartite marginal models involving up to 7 variables, also accounting for the effects of bounded shared randomness.
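A bare-bones version of the elimination step itself illustrates how a variable is projected out: inequalities in which the variable appears with opposite signs are scaled and summed so that it cancels. The sketch below (ours, with exact rational arithmetic) eliminates one variable from a tiny toy system; real entropic computations apply the same step repeatedly via dedicated software.

```python
from fractions import Fraction

def fm_eliminate(ineqs, k):
    """Fourier-Motzkin step: eliminate variable k from inequalities
    c . x <= 0, each given as a tuple of coefficients.  Returns the
    projected system over the remaining variables."""
    zero = [c for c in ineqs if c[k] == 0]
    pos  = [c for c in ineqs if c[k] > 0]
    neg  = [c for c in ineqs if c[k] < 0]
    drop = lambda c: tuple(ci for i, ci in enumerate(c) if i != k)
    out = [drop(c) for c in zero]
    for p in pos:
        for n in neg:
            # scale both rows so the k-coefficients are +-1, then add
            comb = [Fraction(pi, p[k]) + Fraction(ni, -n[k])
                    for pi, ni in zip(p, n)]
            out.append(drop(comb))
    return out

# Toy system over (x, y, z):  x - y <= 0  and  y - z <= 0.
projected = fm_eliminate([(1, -1, 0), (0, 1, -1)], k=1)
print(projected)   # [(Fraction(1, 1), Fraction(-1, 1))], i.e. x - z <= 0
```

The combinatorial blow-up mentioned in the text comes from the `pos x neg` product: each elimination step can square the number of inequalities.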
Using similar simplifications, we also obtain inequalities for marginal scenarios involving statistical independencies. Details are given in Sec. VII B below.
To characterize the entropic cone of a multipartite marginal model, we can in principle proceed as before, first simplifying the set of basic inequalities. For example, for 3 parties, similarly to the bipartite case, we can restrict the initial set of inequalities to the ones describing the two subsystems $\{A_0, A_1, B_0, B_1, C_0\}$ and $\{A_0, A_1, B_0, B_1, C_1\}$, where $A_i$, $B_j$ and $C_k$ with $i, j, k \in \{0, 1\}$ describe the measurement choices available to the parties. However, even with this simplification, the FM elimination still demanded too many computational resources and we were not able to finish the computation. This highlights the value of analytical derivations such as the one in Sec. VI.

A. Bipartite scenario
In the simplest case, given by the CHSH scenario ($m_a = m_b = 2$, that is, 2 measurement settings for Alice and Bob), it is known [20,22] that the only class of non-trivial inequalities is given by the entropic CHSH. Using the computational approach described above, it follows that in the case ($m_a = 2$, $m_b = 3$) there are still only entropic CHSH inequalities, in full analogy with the probabilistic case. However, differences from the probabilistic case already start to appear for ($m_a = m_b = 3$). For probabilities there are only two different classes of non-trivial inequalities, the CHSH and the $I_{3322}$ inequality [27]. For entropies, however, there are 4 classes of non-trivial tight inequalities, shown in Table I. Inequalities 3 and 4 correspond, respectively, to $I^E_{22}$ and $I^E_{33}$. By Theorem 6, all the inequalities are also valid in probability space after the proper translation is made; however, inequalities 5 and 6 do not correspond to tight inequalities of the correlation polytope. For the scenario ($m_a = 3$, $m_b = 4$), 5 new classes of inequalities have been found, inequalities 7 to 11 in Table I. We have also performed the same computation while bounding the amount of shared randomness, following the idea described in Sec. V A. First of all, we note that, as one should expect, the inequalities derived in the absence of any restriction on $H_\lambda$ still define facets of the entropic cone. For the inequalities depending on $H_\lambda$, it follows that in the ($m_a = m_b = 2$) and ($m_a = 2$, $m_b = 3$) cases the only inequalities bounding the shared randomness are the ones given in Table II, inequalities 1 to 3. For the case ($m_a = m_b = 3$), the problem is already too demanding and we were not able to finish the computation. However, a further simplification is possible, as shown in Fig. 5. We consider the union of the basic inequalities for the sets $\{A_0, A_1, A_2, B_j, \lambda\}$ and $\{A_j, B_0, B_1, B_2, \lambda\}$ with $j = 0, 1, 2$, coinciding on $\{A_0, A_1, A_2\}$ and $\{B_0, B_1, B_2\}$.

FIG. 5. (Color online) Graphical representation (figure on the left) of bipartite Bell scenarios: each vertex represents an observable and edges connect observables that are jointly measurable. The dotted red edges represent the fact that all the correlations between the parties must be mediated by the hidden variable $\lambda$, that is, $I_{A:B|\lambda} = 0$. The existence of a classical description for all pairwise observables is equivalent to the existence of classical descriptions for the $n$ subsystems (figure on the right), $\{A_0, \ldots, A_{m-1}, B_j, \lambda\}$ with $j = 0, \ldots, n-1$, coinciding on $\{A_0, \ldots, A_{m-1}, \lambda\}$.
For each of these sets of inequalities, parameterized by $j$, we first project down to the cones describing $\{\{A_i, B_j\}, \{A_0, A_1, A_2, \lambda\}\}$ and $\{\{A_i, B_j\}, \{B_0, B_1, B_2, \lambda\}\}$ with $i, j = 0, 1, 2$. Then we proceed to the final FM elimination, removing the non-observable variables. With this simplification we were able to finish the computation, and 26 new classes of inequalities were found, inequalities 4 to 29 in Table II. All the inequalities in Table II share a rather remarkable feature: for all of them it is not difficult to prove that the maximal value achievable by local correlations is given by $H_{B_0} + \min\{H_{A_0}, H_{B_0}\}$. That is, for up to 3 measurement settings and dichotomic observables, any local distribution needs no more than 2 bits of shared randomness to be simulated. As we discuss in Sec. VIII, it seems improbable that, as the number of measurement settings increases, 2 bits of shared randomness would still suffice to simulate any local distribution, especially those arising from entangled states. It would be very interesting to find classes of entropic inequalities that would in principle require more bits of shared randomness to simulate local distributions.

B. Scenarios with statistical independencies between the hidden variables
Consider three random variables A, B and C, characterizing for instance some traits of three different languages [17]. From the observed data one concludes that the three variables are pairwise maximally correlated, for example $I_{A:B} = I_{A:C} = I_{B:C} = H_A = H_B = H_C = H_{ABC}$. Furthermore, no conditional independencies can be inferred from the data. The question is then: are the observed correlations compatible with a causal structure involving no common ancestor to all three variables (Fig. 6 on the right)? Or is a common ancestor needed to explain the data (Fig. 6 on the left)? A direct application of causal discovery algorithms [1,2,49] would try to distinguish between the two causal structures, but since no conditional independencies are imposed and by the principle of minimality (Occam's razor), the algorithm would return the causal structure on the right as the answer. But clearly this is wrong, because this causal structure implies that (for the observed data) if A is maximally correlated with B, then it should be completely uncorrelated with C.
The entropic approach offers a surprisingly simple solution in this case. It is not difficult to show that the causal structure on the left of Fig. 6 implies a bound on the correlations given by $I_{A:B} + I_{A:C} \leq H_A$ (and permutations thereof) [18]. The observed data clearly violates this inequality, meaning that it cannot be explained by the corresponding causal structure. That is, an ancestor common to all three variables is required to explain the observed probability distribution. It is interesting to note that this same scenario has been considered from two very different perspectives: purely causal inference [17], but also quantum non-locality [18,50].
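The violation is easy to reproduce numerically for the perfectly correlated data described above. The sketch below (ours) evaluates $I_{A:B} + I_{A:C}$ against $H_A$ for the distribution with $A = B = C$ uniform.

```python
import numpy as np

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def I(Pab):
    """Mutual information in bits for a bipartite joint Pab[a, b]."""
    return H(Pab.sum(axis=1)) + H(Pab.sum(axis=0)) - H(Pab)

# Perfectly correlated binary triple: A = B = C, each uniform.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 1, 1] = 0.5

I_AB = I(P.sum(axis=2))
I_AC = I(P.sum(axis=1))
H_A  = H(P.sum(axis=(1, 2)))

# 2.0 > 1.0: the pairwise-ancestor structure is ruled out.
print(I_AB + I_AC, ">", H_A)
```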
The causal structure depicted on the right of Fig. 6 implies many statistical independencies; for example, $I_{\lambda_1:\lambda_2} = I_{\lambda_1:\lambda_3} = I_{\lambda_2:\lambda_3} = 0$ and $I_{A:B|\lambda_1} = I_{A:C|\lambda_2} = I_{B:C|\lambda_3} = 0$. Using all the available constraints, many of the variables can be eliminated. A final FM elimination shows that 3 different classes of non-trivial entropic inequalities completely characterize the marginal scenario. Note that inequality (51) is exactly the one obtained in [18]; our derivation shows that it is indeed a tight Shannon-type inequality. However, there are two other inequivalent classes, inequalities (52) and (53), that were not known before.
Another interesting case is that of a common ancestor to all the variables, as depicted on the left of Fig. 6, but now with the entropy of the common ancestor bounded as $H_\lambda \leq C$. In this case we find that the only non-trivial inequality is given by (54); the relevant bound in (54) is $H_A$, that is, as expected, the distribution needs no more than $H_A$ bits of shared randomness to be achieved.

VIII. DISCUSSION
In this work we have explored the entropic approach to marginal problems, gathering several results that we believe may pave the way to a better understanding and a more systematic application of entropic inequalities in a wide range of settings. In the following paragraphs we summarize and briefly discuss our findings, with special attention to the open problems and possibilities that we believe deserve future investigation.
We have shown a correspondence between Shannon-type inequalities and inequalities in probability space, stating that any Shannon-type inequality is also a valid probabilistic inequality once a very simple translation is made. This correspondence formally explains the similarities, observed for the n-cycle marginal scenario (which has the CHSH scenario as a particular case) [20,22,51], between the entropic inequalities and their probabilistic versions. For the n-cycle scenario, all the non-trivial Shannon entropic inequalities have an exact correspondence in probability space [16,22]. This is not true in general, however, since not all probabilistic inequalities define valid Shannon entropic inequalities; there are probability inequalities that cannot be translated into a Shannon-type entropic inequality. Also, as mentioned before, not all valid entropic inequalities are of the Shannon type. Could it be that, by taking non-Shannon-type inequalities into account, a deeper correspondence between entropic and probabilistic inequalities can be established? The use of non-Shannon-type inequalities is also interesting from a practical perspective, since by taking them into account one may in principle obtain more restrictive inequalities, bounding the set of allowed correlations more tightly.
Based on the correspondence between entropic and probabilistic inequalities, we analytically proved the entropic version of the Collins-Gisin inequalities [27], valid for a bipartite scenario where each party has access to $m$ measurements ($m_a = m_b = m$). The entropic inequalities have the advantage of being valid for observables with any number of outcomes, while the CG inequalities are specially tailored to dichotomic ones. Moreover, for the scenarios ($m_a = m_b = 2$), ($m_a = m_b = 3$) and ($m_a = 3$, $m_b = 4$), we have computationally (through FM elimination) derived all the entropic inequalities and shown that there are, respectively, 3, 6 and 11 inequivalent classes of inequalities. For the computational results, since the FM elimination generally produces as output a huge list of redundant inequalities, we have also proven a result that allows one to check, given the list of extremal points and half-lines of the convex set, whether an inequality corresponds to a facet or not (see Appendix B).
We have also considered, for the bipartite case with ($m_a = m_b = 2$) and ($m_a = m_b = 3$), the effects of bounded shared randomness, and shown that in these cases any local distribution with dichotomic outcomes can be entropically simulated with at most two bits of shared randomness. This is a very interesting point that deserves further investigation. It seems implausible that, as the number of measurement settings increases, any local distribution would still require at most two bits of shared randomness. To further understand this, we analytically proved a different entropic version of the CG inequalities (for arbitrary $m$) in which the effects of bounded shared randomness are taken into account. These inequalities, however, still have the surprising property that no more than two bits of shared randomness are necessary to entropically simulate any local distribution. This is only one class of inequalities, though, and there are possibly many more as $m$ increases. If one can find other classes of inequalities, one could investigate, for example, the shared randomness requirements for simulating the correlations of a Werner state $\varrho_W = v|\Phi^+\rangle\langle\Phi^+| + (1 - v)\mathbb{I}/4$, parameterized by the visibility $v$ [52]. In the region where the state is known to violate some Bell inequality [53], since the state is nonlocal, even an infinite amount of shared randomness is not sufficient to reproduce the correlations. There are, however, two interesting regions where not much is known. First, for $1/3 < v < 1/K_G \approx 0.66$ the state is entangled but local [54]. Since the state is entangled, one may expect that more shared randomness would be required, but how much? The most interesting region is of course $1/K_G < v < v_{\mathrm{Ver}} \approx 0.7056$, where it is not known whether the state is nonlocal or not. What are the shared randomness requirements for correlations obtained in this region?
Generalizing the approach followed in [19], we derived entropic inequalities for multipartite marginal scenarios consisting of any number of parties, each having access to two observables with any number of measurement outcomes. Using specific projective measurements, we have numerically shown that the violation of these inequalities for multidimensional multipartite GHZ states increases with both the size and the local dimension of the state. An interesting perspective is the possible use of these inequalities as multipartite dimension witnesses, similarly to what has been suggested in [44].
Finally, we have considered a scenario involving conditional independencies, where the question is to decide whether a given correlation between the observable quantities is compatible with a causal structure involving only pairwise common ancestors. A natural question is how to generalize the obtained results to the case of many observable quantities and different configurations of common ancestors [17]. An interesting related problem would be to understand relaxations of the bilocality assumption in entanglement swapping experiments [11], for example allowing correlations between the hidden variables while keeping the bilocality at the level of the observed quantities. Similarly, one could use entropic inequalities to relax the measurement independence assumption [12,13], which states that the measurement choices made by the parties are independent of the hidden variable.

IX. ACKNOWLEDGEMENTS
It is a pleasure to thank Dominik Janzing for insightful discussions about causal structures. We would also like to thank A. Acín and J. B. Brask for pointing out the potential application of bounded-shared-randomness inequalities to Werner states. Our work is supported by the Excellence Initiative of the German Federal and State Governments (Grant ZUK 43).
Every convex set admits two dual representations: in terms of extremal points and half-lines, or in terms of inequalities (half-spaces) defining its facets. As discussed in Sec. II, not much is known about the extremal half-lines/points of the Shannon-type entropic cone, and in order to derive entropic inequalities for a given marginal scenario one needs to rely on the FM elimination. One problem that arises is that the FM elimination usually outputs many redundant inequalities not corresponding to facets of the cone (for the scenarios we consider in Sec. VII, typically several thousand). That is, among the huge list of inequalities obtained via the FM elimination, we need to find the minimal set describing the marginal scenario, the one consisting of facets of the entropic cone. One way to find this minimal set is to solve a linear program: whenever a given inequality can be expressed as a linear combination (with positive coefficients) of other inequalities, it can be safely eliminated. However, given the typical case we face, of sets containing a huge number of inequalities, this approach soon becomes unfeasible. Notwithstanding the difficulty of characterizing the extremal rays/points of the Shannon-type entropic cone, for most of the marginal scenarios we consider computationally we were also able to obtain a list of them. In order to derive the minimal set of inequalities, and to further understand the structure of the entropic cones, we rely instead on the information provided by the extremal points and half-lines.
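The linear-combination redundancy test is straightforward to sketch. In the toy example below (ours, for homogeneous inequalities $a \cdot h \geq 0$), the generating set forms a square system, so membership in the cone of positive combinations reduces to a linear solve; in a realistic setting one would instead solve a linear program, and it is the sheer number of such programs that makes the approach unfeasible.

```python
import numpy as np

# Homogeneous inequalities  a . h >= 0, given by their coefficient rows.
core = np.array([[1.0, 0.0],    # x >= 0
                 [0.0, 1.0]])   # y >= 0

def redundant(target, generators):
    """target is redundant if it is a nonnegative combination of the
    generators.  The generator matrix is square here, so the LP
    reduces to solving a linear system and checking the signs."""
    lam = np.linalg.solve(np.asarray(generators, dtype=float).T,
                          np.asarray(target, dtype=float))
    return bool((lam >= -1e-9).all())

print(redundant([1.0, 1.0], core))   # True:  x + y >= 0 is implied
print(redundant([2.0, 1.0], core))   # True:  2x + y >= 0 is implied
print(redundant([1.0, -1.0], core))  # False: x - y >= 0 cuts new ground
```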
Given the extreme points, the extreme directions and a list of inequalities satisfied by all points of some polyhedral set, it is easy to decide which of these inequalities belong to a minimal list characterizing the polyhedral set.
Before proving this fact in general, let us first look at the example of the unbounded closed polyhedral set in two-dimensional Euclidean space defined by the inequalities
$$x \geq 0, \quad y \geq 0 \quad \text{and} \quad -x + y + 1 \geq 0, \tag{B1}$$
displayed in Fig. 7. Its extreme points are $p_1 = (0, 0)$ and $p_2 = (1, 0)$ and its extreme directions are $v_1 = (0, 1)$ and $v_2 = (1, 1)$. Given a list of the three defining inequalities plus, let us say, the valid inequalities $x + 1 \geq 0$ and $x + y \geq 0$, we want to reduce it to the minimal list containing only the three defining inequalities.

FIG. 7. The convex set (blue color) bounded by the inequalities $x \geq 0$, $y \geq 0$ and $-x + y + 1 \geq 0$, each of which generates a facet of the set. The inequalities $x + 1 \geq 0$ and $x + y \geq 0$ are indicated in red and black, respectively; while valid (they can be obtained as linear combinations of the generating set), they are not facets of the convex set.
In this two-dimensional example, one can guess that a necessary and sufficient condition for a valid inequality to belong to the minimal list is that it is saturated by a one-dimensional subset of the polyhedral set.
We start with $x \geq 0$ and see that it is saturated by $p_1$ and the half-line $p_1 + \lambda v_1$, $\lambda \geq 0$. The intersection of our polyhedral set with all points in $\mathbb{R}^2$ saturating the inequality is therefore the half-line $p_1 + \lambda v_1$. Its dimension is one, and thus the inequality belongs to the minimal list.
Consider now the inequality $y \geq 0$. We find that $p_1$ and $p_2$ saturate it, so the one-dimensional object $\mathrm{conv}\{p_1, p_2\}$ is the set of points saturating the inequality. The third defining inequality is also saturated by a one-dimensional subset of our set, namely the half-line $p_2 + \lambda v_2$, $\lambda \geq 0$.
In turn, the inequality $x + y \geq 0$ is saturated only by $p_1$. So we found a zero-dimensional subset and disregard this inequality. The last inequality, $x + 1 \geq 0$, is not saturated by any point, so we found the empty set and also disregard this inequality. Clearly, the inequalities not belonging to the minimal list are already implied by those that do.
To begin the general case, we note that every closed convex set $P$ in $\mathbb{R}^n$ that contains no lines is the convex hull of its extreme points and extreme half-lines (corollary 2.6.15 of [55]). In our case, however, after the FM elimination performed with PORTA [56], it is not the extreme points and extreme half-lines we are given, but the extreme points and extreme directions. The set containing all half-lines arising from the combination of all extreme points with all extreme directions clearly contains the set of all extreme half-lines, since a half-line cannot be extreme if it arises from the combination of a non-extreme point with an extreme direction. Formally speaking, if $x = \alpha x_1 + \beta x_2$ with $\alpha, \beta \geq 0$ and $\alpha + \beta = 1$, then $hl = x + \lambda y = \alpha(x_1 + \lambda y) + \beta(x_2 + \lambda y) = \alpha\, hl_1 + \beta\, hl_2$, with extreme points $x_1, x_2$, extreme direction $y$ and extreme half-lines $hl_1, hl_2$. Every inequality $\langle x|h_i\rangle \leq C_i$ corresponds to a hyperplane $H_i = \{x \mid \langle x|h_i\rangle = C_i\}$ that divides the space into two closed half-spaces.

TABLE II. All classes of entropic bipartite inequalities with bounded shared randomness for $m_a = 2, 3$ and $m_b = 2, 3$. We have listed the coefficients of one inequality in each row, and all inequalities are of the form $\leq c$.

Definition 7. Let $P$ be a closed convex set of dimension $r$ and $H$ some hyperplane with $P \subseteq H^-$; then $F := P \cap H$ is called an exposed face. An exposed face of dimension $r - 1$ is called a facet.
Note that the name exposed face is justified by the fact that F indeed is a face of P .
With the following lemma it is easy to check whether an inequality corresponds to a facet.

Lemma 8. Let $P = \mathrm{conv}\{A\}$, with $A$ the set of extreme points and extreme half-lines of $P$, and let $H$ be some hyperplane with $P \subseteq H^-$. Then $F := P \cap H = \mathrm{conv}\{A \cap H\}$.

Proof.
$$F = P \cap H = \{x \mid x = \sum_k \alpha_k y_k,\ \langle \sum_k \alpha_k y_k | h \rangle = C,\ \alpha_k \geq 0,\ \sum_k \alpha_k = 1,\ y_k \in A\}.$$
We know that $P \subseteq H^-$ and so $A \subseteq H^-$, which means that $\langle y_k|h\rangle \leq C$ for every $k$. Thus the condition
$$\langle \sum_k \alpha_k y_k | h \rangle = C \tag{B4}$$
can only be fulfilled for every convex combination if $\langle y_k|h\rangle = C$ for all $k$. And so
$$F = P \cap H = \mathrm{conv}\{A\} \cap H = \{x \mid x = \sum_k \alpha_k y_k,\ \langle y_k|h\rangle = C,\ \alpha_k \geq 0,\ \sum_k \alpha_k = 1,\ y_k \in A\} = \mathrm{conv}\{A \cap H\}. \tag{B5}$$
For every inequality it is easy to find the set $A \cap H = \{x \mid \langle x|h\rangle = C,\ x \in A\}$, i.e., the set of all extreme points and half-lines of $P$ that saturate the inequality.
Still, we have to check the dimension of $F$. It is known (theorem 4.1.3 of [55]) that $F$, as the convex combination of its extreme points and extreme half-lines, can be written as the direct sum of the convex combination of its extreme points $p_i$ and the cone of its extreme directions $v_i$. We see that $F$ is some linear combination of the vectors $(p_i - p_1)$ and $v_i$; the dimension of $F$ is thus equal to the rank of the matrix with columns $(p_i - p_1)$ and $v_i$.
In summary, what one has to do is find all extreme points and directions that saturate a given inequality and calculate the rank of this matrix. If and only if the rank equals $r - 1$ does the inequality induce a facet.
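The criterion can be sketched directly on the two-dimensional appendix example. The code below is ours; note that it counts a direction as belonging to a face via the simple saturation test $\langle h, v\rangle = 0$, which suffices for this example.

```python
import numpy as np

# Appendix example: extreme points and extreme directions of the set
# defined by x >= 0, y >= 0 and -x + y + 1 >= 0.
points = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]   # p1, p2
dirs   = [np.array([0.0, 1.0]), np.array([1.0, 1.0])]   # v1, v2
r = 2   # dimension of the polyhedral set

def is_facet(h, c):
    """Does <h, x> + c >= 0 induce a facet?  Collect the saturating
    extreme points (<h,p> + c = 0) and directions (<h,v> = 0); the
    inequality is a facet iff they span an (r-1)-dimensional face."""
    sat_p = [p for p in points if abs(h @ p + c) < 1e-9]
    if not sat_p:                       # empty face
        return False
    sat_v = [v for v in dirs if abs(h @ v) < 1e-9]
    cols = [p - sat_p[0] for p in sat_p[1:]] + sat_v
    if not cols:                        # zero-dimensional face
        return False
    return bool(np.linalg.matrix_rank(np.array(cols)) == r - 1)

cases = [((np.array([ 1.0, 0.0]), 0.0), True),   # x >= 0
         ((np.array([ 0.0, 1.0]), 0.0), True),   # y >= 0
         ((np.array([-1.0, 1.0]), 1.0), True),   # -x + y + 1 >= 0
         ((np.array([ 1.0, 1.0]), 0.0), False),  # x + y >= 0: valid, no facet
         ((np.array([ 1.0, 0.0]), 1.0), False)]  # x + 1 >= 0: valid, no facet
for (h, c), expected in cases:
    assert is_facet(h, c) == expected
print("facet classification matches the appendix discussion")
```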