Non-Shannon inequalities in the entropy vector approach to causal structures

A causal structure is a relationship between observed variables that in general restricts the possible correlations between them. This relationship can be mediated by unobserved systems, modelled by random variables in the classical case or joint quantum systems in the quantum case. One way to differentiate between the correlations realisable by two different causal structures is to use entropy vectors, i.e., vectors whose components correspond to the entropies of each subset of the observed variables. To date, the starting point for deriving entropic constraints within causal structures has been the set of so-called Shannon inequalities (positivity of entropy, conditional entropy and conditional mutual information). In the present work we investigate what happens when non-Shannon entropic inequalities are included as well. We show that in general these lead to tighter outer approximations of the set of realisable entropy vectors and hence enable a sharper distinction of different causal structures. Since non-Shannon inequalities can only be applied amongst classical variables, it might be expected that their use enables an entropic distinction between classical and quantum causal structures. However, this remains an open question. We also introduce techniques for deriving inner approximations to the allowed sets of entropy vectors for a given causal structure. These are useful for proving tightness of outer approximations or for finding interesting regions of entropy space. We illustrate these techniques in several scenarios, including the triangle causal structure.


Introduction
A common challenge in science is to make predictions based on incomplete information. Full details of the mechanism by which correlations between two or more variables come about are often not apparent, and there may be several competing causal explanations. Experimentation with interventions is one way to decide between the candidate explanations [51]. However, in many situations such intervention is difficult (or unethical), for instance if certain involved systems are outside our control.
Considering a particular causal structure generally imposes restrictions on the set of correlations that can be produced. A well-known example of such a constraint is a Bell inequality [5]. That such relations can be violated using measurements on quantum states motivates the consideration of more general quantum causal structures. Correlations that can be generated in such structures but not in their classical analogue are the basis for several important cryptographic tasks [28], in particular for device-independent protocols for key distribution [1,3,43,60] or the generation of private randomness [21,22,47,55]. In a cryptographic scenario, an adversary is usually able to exert influence at particular points in the protocol, which can be conveniently encoded using a causal structure. Characterising the set of possible classical, quantum and post-quantum correlations within a specific causal structure provides a basis to understand further tasks and possible quantum and post-quantum advantages, which were initially studied in specific cases [7,9,20,23,34].
For a general causal structure with unobserved variables, deciding whether a given set of correlations can be generated is computationally difficult and only feasible for small examples [33,39]. One way to get around this is to use entropy to simplify the characterisation of the corresponding sets of correlations [11-16, 32, 36, 38, 46, 53, 58]. Rather than looking at the distributions themselves, we consider entropy vectors whose components are the joint entropies of each subset of the observed variables. This often has the advantage that the set of entropies realisable in a given causal structure is convex, in contrast to the set of compatible distributions. In addition, the causal constraints can be represented by linear relations between entropies instead of polynomial constraints. It is also significant that entropic constraints on possible correlations in a causal structure are independent of the dimension of the involved random variables. Hence, the method enables the derivation of constraints that are valid for arbitrarily large alphabet sizes of all involved observed and unobserved systems. These properties make entropy vectors a convenient means to distinguish different causal structures in many situations.
In this paper we report the use of non-Shannon inequalities for distinguishing causal structures. After a short outline of the entropy vector approach and after introducing the necessary notation in Section 2, we go on to show in Section 3 that non-Shannon inequalities play a central role in the distinction of causal structures. This is illustrated with the triangle causal structure (Section 3.1), one of the simplest causal structures in which there is a separation between classical and quantum at the level of correlations. For this example, we present numerous new entropic constraints, which involve several infinite families of valid inequalities, that together form the tightest entropic characterisation of the classical triangle causal structure known to date. This also leads us to disprove a claim that previously known entropic approximations to this causal structure were tight [14,16]. Whether our new inequalities are sufficient to separate classical and quantum versions of causal structures is left as an open problem.
In Section 3.2, we analyse a number of other causal structures, taking into account non-Shannon inequalities for their entropic characterisation. These inequalities are relevant for distinguishing different classical causal structures as well as for settling the question of whether there is a classical-quantum separation in the entropy vector approach.
We further analyse the role of non-Shannon inequalities for the entropic characterisation of the causal structure relevant in the context of information causality [50] in Section 3.3, where the combination of non-Shannon inequalities with postselection allows us to derive numerous new entropy inequalities.
In Section 4, we provide the first inner approximations to the entropy cones of causal structures. These are useful for certifying that particular entropy vectors are realisable in a causal structure as well as for showing tightness of an entropic outer approximation in some cases (see Section 3.2 for examples). In cases where the outer approximation is not tight (or not known to be tight), an inner approximation that shares some extremal rays with the outer approximation allows the identification of parts of the boundary of the true entropy cone as well as regions where identifying the cone's boundary requires further analysis.
For comparison with the classical case, we also briefly consider non-Shannon inequalities in the context of quantum and hybrid causal structures in Section 5, which is illustrated with the example of the triangle causal structure, before concluding in Section 6.

Entropic cones and the entropy vector approach to causal structures
In this section, we briefly outline the entropy vector approach and introduce the required notation. An elaborate introduction to the topic can for instance be found in the review [62].

Entropic cones
For a set of n jointly distributed random variables Ω = {X_1, X_2, . . . , X_n} taking values in the alphabet X_Ω = X_1 × X_2 × · · · × X_n, we denote the set of all possible joint probability distributions as P_n. For a set of variables with joint distribution P_Ω ∈ P_n, its Shannon entropy [57] is H(Ω) := − ∑_{x ∈ X_Ω} P_Ω(x) log₂ P_Ω(x). The Shannon entropy of Ω and of all its subsets can be expressed in an entropy vector in R^{2^n − 1}, H(P) := (H(X_1), H(X_2), . . . , H(X_n), H(X_1 X_2), H(X_1 X_3), . . . , H(X_1 X_2 . . . X_n)).
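As a minimal illustration of these definitions, the entropy vector of a small joint distribution can be computed directly; the Python sketch below is an illustrative aid (not part of the formal development), with distributions represented as dictionaries from outcome tuples to probabilities:

```python
import itertools
import math

def shannon_entropy(joint, axes):
    """Shannon entropy (in bits) of the marginal of `joint` on the given axes."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def entropy_vector(joint, n):
    """Entropy vector of n variables: H of every non-empty subset,
    ordered by subset size and then lexicographically (2^n - 1 components)."""
    subsets = []
    for r in range(1, n + 1):
        subsets.extend(itertools.combinations(range(n), r))
    return {s: shannon_entropy(joint, s) for s in subsets}

# two perfectly correlated uniform bits
joint = {(0, 0): 0.5, (1, 1): 0.5}
v = entropy_vector(joint, 2)
```

For this distribution, all three components H(X_1), H(X_2) and H(X_1 X_2) equal one bit.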
The set of all possible entropy vectors is denoted Γ*_n, and its closure, Γ̄*_n, is a convex cone [68].² While for n ≤ 3 the entropy cone Γ̄*_n is polyhedral³ [69], an infinite number of linear inequalities is required to characterise it for n ≥ 4 [42]. Hence, considering approximations to Γ̄*_n is common practice.

Approximations to Γ*_n
Before specifying approximations to Γ*_n, we define a few quantities that are relevant in the following. The conditional entropy of two disjoint subsets X_S, X_T ⊆ Ω is defined as H(X_S | X_T) := H(X_S X_T) − H(X_T), and for three mutually disjoint subsets X_S, X_T, X_U ⊆ Ω the conditional mutual information of X_S and X_T conditioned on X_U is I(X_S : X_T | X_U) := H(X_S X_U) + H(X_T X_U) − H(X_S X_T X_U) − H(X_U). Note that the entropy of the empty set is H(∅) = 0, so that H(X_S) = H(X_S | ∅), for example. Two other entropic quantities we will make use of in this article are the interaction information [44] of three mutually disjoint subsets X_S, X_T, X_U ⊆ Ω, I(X_S : X_T : X_U) := I(X_S : X_T) − I(X_S : X_T | X_U), and the Ingleton quantity of four mutually disjoint subsets X_S, X_T, X_U, X_V ⊆ Ω, I_ING(X_S, X_T; X_U, X_V) := I(X_S : X_T | X_U) + I(X_S : X_T | X_V) + I(X_U : X_V) − I(X_S : X_T). (1)
For any entropy vector of a joint distribution of the random variables Ω, the following Shannon inequalities hold:
• For any X_S ⊆ Ω, H(X_S) ≥ 0.
• For any disjoint X_S, X_T ⊆ Ω, H(X_S | X_T) ≥ 0.
• For any mutually disjoint X_S, X_T, X_U ⊆ Ω, I(X_S : X_T | X_U) ≥ 0.
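For finite distributions, the quantities above and the Shannon inequalities can be checked numerically. The following Python sketch is an illustrative aid (distributions again represented as dictionaries from outcome tuples to probabilities), not material from the text:

```python
import math

def marginal_entropy(joint, axes):
    """Shannon entropy of the marginal on the given variable indices."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in sorted(axes))
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def H(joint, S, T=()):
    """Conditional entropy H(X_S | X_T) = H(X_S X_T) - H(X_T); H(empty) = 0."""
    ST = tuple(set(S) | set(T))
    hST = marginal_entropy(joint, ST) if ST else 0.0
    hT = marginal_entropy(joint, T) if T else 0.0
    return hST - hT

def I(joint, S, T, U=()):
    """Conditional mutual information I(X_S : X_T | X_U)."""
    return H(joint, S, U) - H(joint, S, tuple(set(T) | set(U)))

def ingleton(joint, S, T, U, V):
    """Ingleton quantity I_ING(X_S, X_T; X_U, X_V) as in Eq. (1)."""
    return I(joint, S, T, U) + I(joint, S, T, V) + I(joint, U, V) - I(joint, S, T)

# example: X3 = X1 XOR X2 with X1, X2 independent uniform bits
joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
# positivity of conditional entropy for each single variable
mono_ok = all(H(joint, (i,), tuple(j for j in (0, 1, 2) if j != i)) >= -1e-9
              for i in (0, 1, 2))
```

Here I(X_1 : X_2) = 0 while I(X_1 : X_2 | X_3) = 1, so conditioning can create correlation; this is the same phenomenon the interaction information quantifies.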
They are known to constrain a convex polyhedral cone, the Shannon cone, Γ_n [66]. Because the Shannon inequalities hold for any entropy vector, we have Γ̄*_n ⊆ Γ_n. The first entropy inequality that is not of Shannon type was found in [68] and is presented in the following.

2 The closure is taken because there is not in general a good reason to put an upper bound on the alphabet sizes, and it is known that Γ*_n ≠ Γ̄*_n for n ≥ 3 [68].
3 In fact Γ̄*_3 equals the corresponding Shannon cone, Γ_3, introduced below.
Proposition 1 (Zhang & Yeung). For any four discrete random variables X_1, X_2, X_3 and X_4 the following inequality holds:
I(X_3 : X_4) + I(X_3 : X_1 X_2) + 3 I(X_1 : X_2 | X_3) + I(X_1 : X_2 | X_4) − 2 I(X_1 : X_2) ≥ 0.
In the following, the lhs of this inequality is abbreviated as ♦_{X_1 X_2 X_3 X_4}. The first account of infinite families of inequalities was given in [42].
Proposition 2 (Matúš). Let X_1, X_2, X_3 and X_4 be random variables and let s ∈ N. Then the two families of inequalities (2) and (3) hold. For s = 1 both inequalities are equivalent to ♦_{X_1 X_2 X_3 X_4} ≥ 0. For the current state of the art on non-Shannon inequalities we refer to [27]. To our knowledge, all known non-Shannon entropy inequalities in four variables that are not (known to be) rendered redundant by tighter ones can be written as the sum of the Ingleton quantity and (conditional) mutual information terms [25,27,41,42,61,69].
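Since the Zhang-Yeung inequality is valid for every four-variable distribution, it can be sanity-checked by random sampling. The sketch below uses one standard form of the inequality, 2I(X_1:X_2) ≤ I(X_3:X_4) + I(X_3:X_1X_2) + 3I(X_1:X_2|X_3) + I(X_1:X_2|X_4); the specific written form and the check itself are illustrative assumptions rather than material from the text:

```python
import itertools
import math
import random

def entropy(joint, axes):
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def mi(joint, S, T, U=()):
    """I(X_S : X_T | X_U) = H(X_S X_U) + H(X_T X_U) - H(X_S X_T X_U) - H(X_U)."""
    SU, TU = tuple(S) + tuple(U), tuple(T) + tuple(U)
    STU = tuple(S) + tuple(T) + tuple(U)
    hU = entropy(joint, U) if U else 0.0
    return entropy(joint, SU) + entropy(joint, TU) - entropy(joint, STU) - hU

def zy_gap(joint):
    """Slack of the Zhang-Yeung inequality (>= 0 for every distribution)."""
    return (mi(joint, (2,), (3,)) + mi(joint, (2,), (0, 1))
            + 3 * mi(joint, (0,), (1,), (2,)) + mi(joint, (0,), (1,), (3,))
            - 2 * mi(joint, (0,), (1,)))

random.seed(0)
violations = 0
for _ in range(100):
    w = [random.random() for _ in range(16)]  # random 4-bit joint distribution
    z = sum(w)
    joint = {o: wi / z for o, wi in zip(itertools.product((0, 1), repeat=4), w)}
    if zy_gap(joint) < -1e-9:
        violations += 1
```

No violations occur, as the inequality is a theorem; sampling of this kind is only a consistency check, never a proof.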
Complementary to outer approximations (such as the Shannon cone, Γ_n) it is also interesting to consider inner approximations, Γ^I_n, to the n-variable entropy cone Γ*_n. Such approximations can be defined in terms of so-called linear rank inequalities, which are inequalities that hold for the dimensions of subspaces of vector spaces [37]. Entropic inequalities imply linear rank inequalities but the converse does not hold [35], which is why using (the entropic analogue of) linear rank inequalities gives an inner approximation. In the case of n = 4 the Shannon inequalities and the Ingleton inequality, i.e., I_ING(X_1, X_2; X_3, X_4) ≥ 0 (and its permutations), define such an inner approximation Γ^I_4 [37]. Γ^I_5 is defined by the Shannon inequalities, all instances of the Ingleton inequality and 24 additional classes of inequalities [26]. For 6 or more variables, a complete list of all linear rank inequalities is not known, nor is it known whether such a list would be finite. A list of over a billion inequalities (counting permutations) has been found [24].

The entropy vector approach to causal structures
A causal structure, C, is a set of variables arranged in a directed acyclic graph (DAG). The parents, X^↓, of a variable X in a DAG are the variables from which an arrow points directly at X, and the descendants, X^↑, of X are all variables that may be reached from X along a directed path within the DAG. We use C^C and C^Q to denote the classical and the quantum version of a causal structure respectively.

Classical causal structures
The graph of a classical causal structure, C^C, with random variables X_1, X_2, . . . , X_n, encodes the independence relations of X_1, X_2, . . . , X_n in the sense that the distribution P_{X_1 X_2 ... X_n} is said to be compatible with C^C if it can be decomposed as P_{X_1 X_2 ... X_n} = ∏_{i=1}^{n} P_{X_i | X_i^↓}. This interpretation of classical causal structures follows the theory of Bayesian networks [51]. The set of all compatible distributions is in the following denoted P(C^C). The compatibility requirement is equivalent to the condition that for each variable X_i,

I(X_i : X_i^↑̄ | X_i^↓) = 0, (5)

where X_i^↑̄ denotes the non-descendants of X_i, i.e., all variables in the causal structure except for the variable itself and its descendants. The entropic description of causal structures was first considered in [13,32]. The n equalities (5) restrict the n-variable entropy cone Γ*_n to the cone of all entropy vectors compatible with C^C, denoted Γ*(C^C). An outer approximation to Γ*(C^C) is constructed by supplementing Γ_n with the same n equalities, which leads to the cone Γ(C^C). When k out of the n variables of C^C are observed, we take these to be the first k variables, X_1, X_2, . . . , X_k, without loss of generality. For k < n we are then interested in deriving constraints for the observed variables only. For a compatible distribution, this is achieved by marginalising over the unobserved variables X_{k+1}, X_{k+2}, . . . , X_n, which yields a distribution in the set of all compatible marginal distributions, P_{X_1 X_2 ··· X_k} ∈ P^M(C^C). Entropically, marginalisation corresponds to a projection of the entropy cone to the corresponding k-variable marginal cone Γ*^M(C^C) ⊆ R^{2^k − 1}, which is obtained by dropping all components involving any of the n − k unobserved variables from each vector in Γ*(C^C). The outer approximation Γ(C^C) can be analogously projected to an approximation Γ^M(C^C) ⊇ Γ*^M(C^C), which can be computed by expressing Γ(C^C) by means of its bounding hyperplanes and applying a Fourier-Motzkin elimination algorithm to the system of linear inequalities [64].
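A distribution compatible with a classical causal structure can be generated by sampling the conditional distributions appearing in the factorisation. The following Python sketch (the helper `sample_compatible` and the parent sets shown are illustrative assumptions, not notation from the text) builds such a joint distribution for a DAG given in topological order:

```python
import itertools
import random

def sample_compatible(parents, card, seed=0):
    """Random distribution compatible with a classical causal structure.

    `parents` maps each variable to the tuple of its parents (variables must
    be listed in topological order); every variable has `card` outcomes.
    Returns the joint distribution as a dict over outcome tuples."""
    rng = random.Random(seed)
    names = list(parents)
    tables = {}
    for x in names:  # draw a random conditional distribution P(x | parents)
        for pa in itertools.product(range(card), repeat=len(parents[x])):
            w = [rng.random() for _ in range(card)]
            z = sum(w)
            tables[(x, pa)] = [wi / z for wi in w]
    joint = {}
    for outcome in itertools.product(range(card), repeat=len(names)):
        vals = dict(zip(names, outcome))
        p = 1.0
        for x in names:  # product of the conditionals, as in the factorisation
            pa = tuple(vals[y] for y in parents[x])
            p *= tables[(x, pa)][vals[x]]
        joint[outcome] = p
    return joint

# triangle-like structure: sources A, B, C; X sees B and C, Y sees A and C,
# Z sees A and B (an illustrative choice of parent sets)
triangle = {"A": (), "B": (), "C": (),
            "X": ("B", "C"), "Y": ("A", "C"), "Z": ("A", "B")}
joint = sample_compatible(triangle, card=2)
```

Marginalising `joint` over the unobserved A, B and C then yields a compatible observed distribution, whose entropy vector lies in the corresponding marginal cone.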

Quantum causal structures
A quantum causal structure C^Q differs from its classical analogue in that the unobserved nodes correspond to quantum systems. Here, we only consider causal structures with two generations of nodes, where the nodes of the first generation are unobserved quantum systems and the nodes of the second generation represent observed (classical) variables. Note that this also allows for the description of causal structures with observed input nodes, as is illustrated in Figure 1.

Figure 1: (a) A quantum causal structure with an observed input node, X_1, meaning a parentless node from which there is only one arrow to another observed node, X_2. For such a structure there always exists another (quantum) causal structure, (b), that allows for exactly the same correlations and where the observed input is replaced by a shared quantum parent of X_1 and X_2. To simulate any correlations in (a) within scenario (b), we can use a quantum system that sends perfectly correlated classical states to both nodes X_1 and X_2, distributed as X_1. Conversely, any correlations obtained in scenario (b) can be created in scenario (a) by having a random variable X_1 sent to node X_2, where the relevant quantum states (the reduced states that would be present in (b) conditioned on the value of X_1) are locally generated. Note that these considerations are not restricted to quantum causal structures but apply also in the classical case (or even if considering states from a generalised probabilistic theory).
For such causal structures, each edge has an associated Hilbert space, which can be labelled by the parent and child, e.g., for a DAG with an edge X → Y, there is an associated H_{XY}. Each unobserved node is labelled by a quantum state, a density operator on the tensor product of the Hilbert spaces associated with the edges originating at that node. For each observed node there is an associated POVM that acts on the tensor product of the Hilbert spaces associated with the edges that meet at that node. The distributions, P ∈ P^M(C^Q), of the observed variables that are compatible with a causal structure C^Q are those resulting from performing the specified POVMs on the relevant systems via the Born rule.
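To make the Born-rule prescription concrete, the following numpy sketch computes the observed distribution for a minimal two-node scenario in the spirit of Figure 1(b); the choice of shared state and of projective measurements is purely illustrative:

```python
import numpy as np

# maximally entangled two-qubit state shared between the two observed nodes
phi = np.zeros(4)
phi[0] = phi[3] = 1 / np.sqrt(2)
rho = np.outer(phi, phi)  # density operator on C^2 (x) C^2

# projective measurement in the computational basis at each node
proj = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

# Born rule: P(x1, x2) = tr[(M_{x1} (x) M_{x2}) rho]
P = {(x1, x2): float(np.trace(np.kron(proj[x1], proj[x2]) @ rho).real)
     for x1 in (0, 1) for x2 in (0, 1)}
```

The resulting P is the perfectly correlated distribution P(0, 0) = P(1, 1) = 1/2, which, as discussed for Figure 1, could equally be produced by a classical common cause.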
A technique to analyse these sets entropically was proposed by Chaves et al. [16] and is outlined in the following; the idea of considering entropy cones of multi-party quantum states goes back to Pippenger [54]. The set of compatible observed distributions P ∈ P^M(C^Q) can be mapped to a set of compatible entropy vectors, the closure of which is denoted Γ*^M(C^Q). To approximate this set, a system is assigned to each observed variable as well as to each outgoing edge of each unobserved node. As opposed to the classical case, where we can always define a joint distribution over all variables in a causal structure C^C, there is in general no joint quantum state over all systems in C^Q. In particular, the systems corresponding to the edges that meet at an observed node do not coexist with the outcome at that node and hence there is no joint quantum state from which a joint entropy could be derived. The approach is therefore based on a notion of coexistence: two systems are said to coexist if neither is a quantum ancestor of the other in C^Q, and a set of systems that pairwise coexist forms a coexisting set. For each coexisting set, X_S ⊆ Ω, the von Neumann entropy H(X_S) := − tr(ρ_{X_S} log₂ ρ_{X_S}) of the joint state ρ_{X_S} is defined; all of these von Neumann entropies are considered as components of an entropy vector.
For each coexisting set, the entropies of all its subsets as well as all conditional mutual informations of its systems are positive [40]. The conditional entropy may not be positive in general, but for three mutually disjoint subsets X_S, X_T, X_U of a coexisting set, weak monotonicity, H(X_S | X_T) + H(X_S | X_U) ≥ 0, holds instead. These three types of inequality hold for the components of any entropy vector. For the von Neumann entropy of a multi-party quantum state no additional entropy inequalities are known. It has been suggested, however, that any classical 'balanced entropy inequality' [10] (which includes all known non-Shannon inequalities) may also hold for multi-party quantum states [8]. It is worth remarking that the lack of a joint state for all nodes within a quantum causal structure would restrict the applicability of such inequalities in the causal context if they were to hold. In many circumstances the conditional entropy of certain sets of systems is known to be positive, e.g. if all systems in a coexisting set are classical. Such constraints on the entropy vectors are also added (see [62] for further details). The causal restrictions encoded in the graph are accounted for by the condition that two subsets of a coexisting set are independent (and hence have zero mutual information between them) if they have no shared ancestors. To relate the entropies of systems in different coexisting sets, data processing inequalities (DPIs) are used: let ρ_{X_S X_T} ∈ S(H_{X_S} ⊗ H_{X_T}) and let E be a completely positive trace preserving (CPTP) map on S(H_{X_T}) leading to a state ρ'_{X_S X_T}. Then I(X_S : X_T)_{ρ'} ≤ I(X_S : X_T)_ρ. Results on the redundancy of certain DPIs have been presented in [62]. All constraints on the possible entropy vectors taken together define a polyhedral cone, which we denote Γ(C^Q). Its projection to the observed variables, Γ^M(C^Q), is an outer approximation to Γ*^M(C^Q) that can be computed from Γ(C^Q) with a Fourier-Motzkin elimination algorithm [64].
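The failure of positivity of conditional entropy and the validity of weak monotonicity can both be observed numerically. The sketch below (the partial-trace helper is an illustrative implementation, not taken from the text) draws a random pure three-party state and evaluates H(A|B) + H(A|C):

```python
import numpy as np

def von_neumann(rho):
    """Von Neumann entropy (in bits) of a density operator."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

def reduced(rho, keep, dims):
    """Partial trace of `rho`, keeping only the subsystems listed in `keep`."""
    n = len(dims)
    rho = rho.reshape(dims * 2)
    for i in sorted((i for i in range(n) if i not in keep), reverse=True):
        rho = np.trace(rho, axis1=i, axis2=i + rho.ndim // 2)
    d = int(np.prod([dims[i] for i in keep]))
    return rho.reshape(d, d)

rng = np.random.default_rng(1)
psi = rng.normal(size=8) + 1j * rng.normal(size=8)  # random pure 3-qubit state
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())
dims = [2, 2, 2]

hAB = von_neumann(reduced(rho, [0, 1], dims))
hAC = von_neumann(reduced(rho, [0, 2], dims))
hB = von_neumann(reduced(rho, [1], dims))
hC = von_neumann(reduced(rho, [2], dims))
weak_mono = (hAB - hB) + (hAC - hC)  # H(A|B) + H(A|C), always >= 0
```

For a generic pure state the individual conditional entropies H(A|B) and H(A|C) can each be negative, while their sum stays non-negative.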

Improving current entropic characterisations with non-Shannon inequalities
In this section we show how non-Shannon inequalities allow us to improve the previous outer approximations to the entropy cones of classical causal structures. We give an improved entropic description of the triangle causal structure of Figure 2(e) (Section 3.1), discuss the application of non-Shannon inequalities to further causal structures (Section 3.2) and demonstrate that non-Shannon inequalities are also applicable in combination with post-selection, using information causality as an example (Section 3.3).
The computational procedure that we use in order to derive these new inequalities is roughly outlined in the following: (1) we take the Shannon inequalities for the joint distribution of all variables in a causal structure C^C; (2) we add a set of valid non-Shannon inequalities to these; (3) we add all conditional independence equalities that are implied by C^C; (4) we eliminate all entropies of unobserved variables from the full set of inequalities (by means of a Fourier-Motzkin elimination algorithm [64]), which leads to constraints on the entropies of the observed variables only.
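Step (4) can be sketched as repeated single-variable Fourier-Motzkin elimination. The toy Python implementation below (without the redundancy-removal rules discussed later in this section) works on homogeneous systems a · h ≥ 0 and is illustrated by projecting the two-variable Shannon cone, with coordinates (H(X_1), H(X_2), H(X_1 X_2)), onto its first two coordinates:

```python
from fractions import Fraction

def fourier_motzkin(ineqs, var):
    """Eliminate coordinate `var` from a system of inequalities a . h >= 0.

    `ineqs` is a list of coefficient tuples; the projected system is returned
    without redundancy removal, so the output may contain implied rows."""
    keep, pos, neg = [], [], []
    for a in ineqs:
        (keep if a[var] == 0 else pos if a[var] > 0 else neg).append(a)
    for p in pos:
        for q in neg:
            # positive combination chosen so that the `var` coefficient cancels
            keep.append(tuple(Fraction(-q[var]) * pi + Fraction(p[var]) * qi
                              for pi, qi in zip(p, q)))
    # drop the eliminated coordinate
    return [tuple(c for i, c in enumerate(a) if i != var) for a in keep]

# two-variable Shannon cone: H(X1X2) >= H(X1), H(X1X2) >= H(X2),
# H(X1) + H(X2) >= H(X1X2); eliminate the joint entropy (coordinate 2)
two_var_shannon = [(-1, 0, 1), (0, -1, 1), (1, 1, -1)]
projected = fourier_motzkin(two_var_shannon, 2)
```

The projection returns H(X_2) ≥ 0 and H(X_1) ≥ 0, i.e., the marginal cone of the single-variable entropies.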
Note that the same procedure, but omitting step (2), corresponds to the computation of Γ^M(C^C) as in [13,15,32] and outlined in Section 2.2.1. Thus, the inclusion of step (2) is responsible for the new constraints. In addition to deriving entropy inequalities computationally, we also provide analytic derivations of (infinite families of) new inequalities.

Figure 2: Assuming no ancestral relations between any of the three observed variables X, Y and Z (i.e., no member of {X, Y, Z} is an ancestor of any other), the above are the only possible causal structures (up to relabelling). A, B and C correspond to unobserved variables.

Improved outer approximation to the entropy cone of the classical triangle scenario
The triangle causal structure, called C_3, is one of the simplest examples with interesting features [6,14,16,31,36]. It can be used when three parties make observations, X, Y and Z respectively, on systems, A, B and C, that are shared pairwise between them. This may for instance be realised in a communication protocol where three parties aim to obtain (correlated) data while interacting in pairs and without ever having interacted as a group.
C_3 is one of only five distinct causal structures involving three observed random variables that exhibit no ancestral relations between the observed variables (cf. Figure 2). All except the causal structures (c) and (e) may be distinguished by looking at the independences among the observed variables, X, Y and Z, listed in Table 1. However, while the causal structure of Figure 2(c) does not impose any restrictions on the compatible P_XYZ, the distributions that are compatible with the triangle causal structure of Figure 2(e) obey additional constraints [58].⁹ This illustrates that causal structures encode more than the observed independences. Furthermore, C_3 is unique among these five causal structures in that it is the only one featuring quantum correlations that are not classically reproducible, i.e., P^M(C_3^C) ⊊ P^M(C_3^Q), as proven in Ref. [31] (see Section 5.1 for further details regarding the quantum scenario).¹⁰

9 For instance, perfectly correlated bits X, Y and Z, i.e., those with joint distribution P_XYZ(0, 0, 0) = P_XYZ(1, 1, 1) = 1/2, are not achievable in this causal structure. This is not only true classically, but also in any generalised probabilistic theory [36,58].
10 In structures (a), (b) and (c), all joint distributions are allowed for the variables that share a common cause in the classical case. Hence, quantum systems do not enable any stronger correlations. This is because, for any quantum state ρ_A shared at A and measured later, the correlations can be classically reproduced if A sends out the same classical output statistics to the parties directly. In structure (d) no non-classical quantum correlations exist either [31]. This is also fairly intuitive: the quantum measurements performed at X and Y could be equivalently performed at the sources B and A respectively, such that these sources distribute cq-states of the form ∑_x P_X(x) |x⟩⟨x| ⊗ ρ^x_{BZ} and ∑_y P_Y(y) |y⟩⟨y| ⊗ ρ^y_{AZ} instead. The same correlations can be achieved classically by taking random variables B = X and A = Y (these being distributed according to P_X and P_Y). Since ρ^x_{BZ} and ρ^y_{AZ} are functions of X and Y, the statistics formed by measuring such states can be computed classically via a probabilistic function (this function could be made deterministic by taking B = (X, W), where W is distributed appropriately).

In the following, we derive new and improved outer approximations to Γ*^M(C_3^C) by using non-Shannon entropy inequalities. These show that the Shannon approximation to Γ*^M(C_3^C) is not tight. We remark that our findings contradict the considerations of [14,16], which together argue that in the marginal scenario there is no separation between the Shannon cone and the classical entropy cone, i.e., they argue that Γ*^M(C_3^C) = Γ^M(C_3^C), which would imply that non-Shannon inequalities are irrelevant. For further discussion of the discrepancy with [14,16], see Appendix A.

The set of all observed distributions compatible with C_3^C is the set of P_XYZ of the form P_XYZ(x, y, z) = ∑_{a,b,c} P_A(a) P_B(b) P_C(c) P_{X|BC}(x|b, c) P_{Y|AC}(y|a, c) P_{Z|AB}(z|a, b). The set of compatible entropy vectors is denoted Γ*^M(C_3^C), and its closure is a convex cone (cf. [62]). The Shannon outer approximation, Γ^M(C_3^C), was explicitly computed by Chaves et al. [14,16] and is characterised by the following three equivalence classes of inequalities
(where permutations of X, Y and Z lead to a total of 7 inequalities). We now show that tighter outer approximations of the set of achievable entropy vectors in the marginal scenario of the triangle, Γ*^M(C_3^C), can be derived by using non-Shannon type inequalities. However, there are infinitely many such linear entropy inequalities. To restrict the number of inequalities to be considered, the following reasoning can be applied. As mentioned in Section 2.1, all known non-Shannon entropy inequalities for four variables can be written as the sum of the Ingleton quantity (1) and (conditional) mutual information terms. Since the latter are always positive, any non-Shannon inequality is irrelevant (i.e., implied by existing ones) whenever the causal restrictions imply that the Ingleton term is non-negative. This significantly reduces the choices of variable sets for which the known additional inequalities may be relevant.
Example 1. Consider Proposition 1 with (X_1, X_2, X_3, X_4) = (A, B, C, X). The corresponding inequality is I(C : X) + I(C : AB) + 3 I(A : B | C) + I(A : B | X) ≥ 2 I(A : B). Whenever a causal structure C^C implies I(A : B) = 0, i.e., independence of A and B, the above inequality is implied by the Shannon inequalities and the independence constraint I(A : B) = 0. Hence it cannot improve our outer approximation.
The following proposition restricts the permutations of each non-Shannon inequality that may be relevant for the derivation of our improved approximations to Γ*^M(C_3^C).

Proposition 3. Consider an entropy inequality on four variables that enforces the non-negativity of a positive linear combination of the Ingleton quantity (1) and (conditional) mutual information terms. This inequality is implied by the Shannon inequalities and the conditional independences of C_3^C (i.e., I(A : XBC) = 0, I(X : Y ZA | BC) = 0 and appropriate permutations) for all choices of four out of the six involved random variables, except up to exchange of X_1 and X_2 or exchange of X_3 and X_4.

14 Recall that an explicit linear description of the entropy cone is generally only available for causal structures with up to three nodes. In particular, such a description is not available for Γ*(C_3^C), which involves six nodes. Hence, it is impossible to directly compute Γ*^M(C_3^C) with a variable elimination algorithm.
All known irredundant non-Shannon inequalities satisfy the conditions of this proposition. Note also that the application of non-Shannon inequalities to subsets of four out of the six random variables in C_3^C does not encompass all possible applications of these inequalities. Specifically, each inequality can also be applied to sets of five or to all six random variables, where the joint distributions of some sets of two or three random variables are interpreted as those of one of the four random variables in the non-Shannon inequality. We have not looked into such configurations.
Proof. For four random variables X_1, X_2, X_3 and X_4, the Ingleton inequality (9) can be equivalently rewritten in four more ways, in each of which one of the mutual informations I(X_1 : X_3), I(X_1 : X_4), I(X_2 : X_3) and I(X_2 : X_4) takes the place of I(X_1 : X_2) as the only negative term. For the inequality (9) not to be implied by the Shannon inequalities and the conditional independences, we need X_1, X_2, X_3 and X_4 to be such that
I(X_1 : X_2) ≠ 0, I(X_1 : X_3) ≠ 0, I(X_1 : X_4) ≠ 0, I(X_2 : X_3) ≠ 0 and I(X_2 : X_4) ≠ 0 (11)
hold simultaneously. If the conditional independences of C_3^C imply that one of these mutual informations is zero, then the Ingleton inequality can be expressed as a positive linear combination of (conditional) mutual information terms in one of its five equivalent forms and the corresponding non-Shannon inequality is redundant.
For the five constraints (11) to hold simultaneously, X_1 and X_2 have to be correlated with one another as well as with two further variables. This excludes the independent sources A, B and C as candidates for X_1 and X_2; therefore X_1, X_2 ∈ {X, Y, Z}. Furthermore, the variables X_3 and X_4 have to be correlated with both X_1 and X_2. This excludes the two variables in {A, B, C} that do not lie between X_1 and X_2 in C_3^C. Hence, for each choice of X_1 and X_2, the variables X_3 and X_4 have to be chosen as the remaining element of {X, Y, Z} and the variable positioned opposite it in C_3^C. In summary, (X_1, X_2)(X_3, X_4) can only be (X, Y)(Z, C), (X, Z)(Y, B) or (Y, Z)(X, A), up to permutations of the variables within a tuple.
If we were to take one 4-variable non-Shannon inequality into account and apply it to any subset of four out of the total of six random variables in the causal structure, this would leave us with 360 permutations of the inequality (if the inequality is not invariant under the permutation of any of the four involved variables). Proposition 3 reduces this to only 12 (potentially) irredundant permutations.
For each non-Shannon inequality, these 12 permutations are candidates for improving the outer approximation to Γ*^M(C_3^C). We remark here that for most known non-Shannon inequalities, several of these 12 permutations can be shown to be redundant.¹⁵ Despite accounting for this reduction in the permutations of each inequality, the number of different inequalities to be considered is infinite, and any outer approximation to Γ*^M(C_3^C) could (potentially) be tightened further by including additional inequalities.
In principle, the more inequalities that are added, the better the approximation to Γ*^M(C_3^C). However, adding too many inequalities at a time renders the task of marginalising infeasible. Applied to a system of n_0 inequalities, the Fourier-Motzkin algorithm can yield up to n_0²/4 inequalities in the first elimination step. Iterating the procedure for n steps produces up to 4(n_0/4)^{2^n} inequalities. To avoid this doubly exponential behaviour, the elimination algorithm can be adapted by implementing a few rules to remove some of the many redundant inequalities produced in each step. These rules are collectively known as Černikov rules [17,18] and are comprehensively explained in [4]. It is known, however, that the number of necessary inequalities can still grow exponentially [48]. That said, the worst case scaling may not be exhibited in our case. In fact, the inequalities defining Γ(C_3^C) contain few variables each and thus lead to far fewer than the maximal number of inequalities. However, computational resources still limit us to adding a relatively small number of different supplementary inequalities to the standard Shannon cone at a time.
We have used the previously outlined technique to compute tighter outer approximations to Γ*_M(C^C_3) by including a manageable number of non-Shannon inequalities at a time:

Case 1: We include the inequality from Proposition 1 as well as all six inequalities from [25], applied to all subsets of four out of the six variables of C^C_3. This leads to 45 classes of inequalities, of which 41 are not part of the outer approximation Γ_M(C^C_3).

Case 2: We include the inequalities of the form given in (2) and (3) for s = 1, 2, 3 and for all subsets of four out of the six variables in C^C_3. In this case, we find 114 classes of inequalities, of which 110 are not part of the outer approximation Γ_M(C^C_3).

In each case, all classes (together with the number of members in each class) are provided as Supplementary Information.

… for the marginal scenario then some of these inequalities may be redundant for our purposes.

We have compared our new approximations to the Shannon outer approximation by sampling uniformly over the surface of the positive sector of the unit hypersphere around 0 in R^7 [49].^16 A measure for the hyperdimensional solid angle enclosed by these approximations is given in terms of the fraction, α, of sampled points that lie within the respective cones. We have sampled 3.2 × 10^9 points for each cone. The resulting estimates for α show that the difference between the three approximations is relatively small: the hyperdimensional solid angles encompassed by the cones of the Case 1 and Case 2 approximations are both roughly 93% of that of the Shannon cone. An explicit entropy vector that lies in the Shannon approximation but in neither of the new outer approximations to Γ*_M(C^C_3) is (11, 14, 14, 20, 20, 23, 28). We also derive some valid families of inequalities.
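The sampling estimate of the fraction α can be sketched as follows. As a stand-in for the actual cones (whose defining inequalities are not all reproduced here), we use the nine elemental Shannon inequalities of three variables, which cut out the Shannon cone in the same seven-dimensional space; all names are ours:

```python
import numpy as np

# Entropy-vector components ordered (H_X, H_Y, H_Z, H_XY, H_XZ, H_YZ, H_XYZ).
# The nine elemental Shannon inequalities of three variables, rows A with A v >= 0.
A = np.array([
    [0, 0, 0, -1, 0, 0, 1],   # H(XYZ) - H(XY) >= 0
    [0, 0, 0, 0, -1, 0, 1],   # H(XYZ) - H(XZ) >= 0
    [0, 0, 0, 0, 0, -1, 1],   # H(XYZ) - H(YZ) >= 0
    [1, 1, 0, -1, 0, 0, 0],   # I(X:Y) >= 0
    [1, 0, 1, 0, -1, 0, 0],   # I(X:Z) >= 0
    [0, 1, 1, 0, 0, -1, 0],   # I(Y:Z) >= 0
    [0, 0, -1, 0, 1, 1, -1],  # I(X:Y|Z) >= 0
    [0, -1, 0, 1, 0, 1, -1],  # I(X:Z|Y) >= 0
    [-1, 0, 0, 1, 1, 0, -1],  # I(Y:Z|X) >= 0
])

def solid_angle_fraction(A, n_samples=100_000, seed=0):
    """Estimate the fraction alpha of the positive sector of the unit
    hypersphere that lies inside the cone {v : A v >= 0}, by sampling
    uniformly over that sector (absolute values of Gaussian vectors,
    normalised)."""
    rng = np.random.default_rng(seed)
    v = np.abs(rng.standard_normal((n_samples, A.shape[1])))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    inside = np.all(v @ A.T >= 0, axis=1)
    return inside.mean()
```

Comparing two cones amounts to running this with the two inequality matrices and the same samples; the ratio of the two estimates of α is the relative solid angle quoted in the text.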

Proposition 4. All entropy vectors compatible with the triangle causal structure satisfy the families of inequalities (12), (13) and (14) for all s ∈ N. The same holds for all permutations of X, Y and Z.

The proof of this proposition can be found in Appendix B.

16 I.e., from the set {v ∈ R^7 : v_i ≥ 0 for all i, ||v|| = 1}.
Further families of inequalities can be derived by separately considering different inequalities from a family, e.g. the same permutation of (2) for each s ∈ N, and combining them with the same Shannon inequalities to obtain new constraints on the marginal scenario by means of the Fourier-Motzkin elimination algorithm. Tighter inequalities are often obtained by combining several permutations of an inequality (2).
Combining instances of (2) for several s ∈ N leads to an even larger number of new inequalities, which render many of the families derived with the previously explained method redundant. For the few orders s up to which we were able to run our calculations, the families (12) and (13) from Proposition 4 were the only two for which none of the inequalities were implied by others. Similar considerations can be applied to (3) (from which (14) is derived) and to further families of inequalities [27].
One might imagine that adding genuine five- and six-variable inequalities to Γ(C^C_3) leads to further entropy inequalities for C^C_3. It turns out that applying the five- and six-variable inequalities from [41,69] to five and six variables of the triangle causal structure respectively does not lead to a tighter outer approximation to Γ*_M(C^C_3) than the inequality from Proposition 1. This can be shown by expanding the inequalities into a linear combination of mutual information terms and applying a similar reasoning to that in the proof of Proposition 3. As they are not particularly instructive, the technical details of these arguments are omitted here. The same is not known to hold for the inequality derived in [67].

Conjecture 5. Infinitely many linear inequalities are needed to characterise Γ*_M(C^C_3).

Our main evidence for this is that the families of inequalities (2), used by Matúš to prove that the analogue of this conjecture holds for Γ*_4, lead to infinite families of inequalities for C^C_3 after marginalising (cf. Proposition 4). The curve constructed by Matúš in Ref. [42] to prove his statement for Γ*_4 can be adapted to our scenario, which can be used to show that the inequalities (12) are independent. However, we were not able to show that this curve can be realised with entropy vectors that are compatible with the triangle causal structure, and hence we cannot exclude the possibility that the marginal cone is polyhedral. The infinite families of inequalities (cf. Proposition 4) that we obtained from Matúš's original family may indicate that this region of entropy space retains a non-polyhedral segment after the causal constraints are included and the set is projected to the marginal scenario. However, it could be that non-polyhedral boundary regions do not survive the mapping to entropy vectors for C^C_3. If this were the case then (most of) our infinite set of inequalities would be rendered redundant by another inequality.

Application of non-Shannon inequalities to various causal structures
The concept of a generalised DAG was introduced in [36], the idea being to have a framework in which classical, quantum and even more general systems can be shared by unobserved nodes. For the details, we refer to the original paper. The part that is of interest here is that the authors of Ref. [36] list 21 generalised DAGs with up to six nodes for which there may be a separation between the correlations realisable classically and quantum mechanically, i.e., between P_M(C^C) and P_M(C^Q) [36,53].^17 We analyse these from an entropic perspective, looking for a causal structure C in which there is a separation between Γ*_M(C^C) and Γ*_M(C^Q). Among these structures there are three that have fewer than six nodes, displayed in Figure 4. For these three, we find that the vertices of the corresponding Shannon cone, Γ_M(C^C), are achievable with entropy vectors of classical probability distributions compatible with the causal structure, from which it follows that this cone is equal to the entropy cone Γ*_M(C^C). (This can also be shown by computing an inner approximation to the corresponding entropy cones and showing that the inner and outer approximations coincide, e.g. by employing linear rank inequalities as outlined in Section 4.) Our results also imply that the consideration of non-Shannon inequalities cannot lead to any further constraints in these three causal structures. In the following, we furthermore show that there is no entropic separation between classical and quantum versions of these causal structures.

Proposition 6. Let C be any of the causal structures shown in Figure 4. Then Γ*_M(C^C) = Γ*_M(C^Q).
Remark 7. Note that there are causal structures involving up to five variables that reduce to those shown in Figure 4 under the reduction rules from [36]. Our proof does not rule out that these exhibit a classical-to-quantum separation.
Further details, including the proof of Proposition 6, are given in Appendix C.
The 18 remaining example causal structures involve six variables. For all of them we have found that several instances of the non-Shannon inequality from Proposition 1 lead to tighter entropic constraints for the classical marginal scenarios than those listed in [36].
For the causal structures with four observed variables, instances of this inequality are relevant even without considering the unobserved nodes. These instances thus hold whether the unobserved nodes are classical or quantum. Hence, they allow us to tighten the outer approximations to the sets of achievable entropy vectors in both cases, in contrast to non-Shannon inequalities that are applied to unobserved classical variables (for which the quantum analogue is not known to hold).^18 The above considerations have not enabled us to show a separation between the achievable entropy vectors in the classical and quantum cases, hence we are left with the following open problem.
Open Problem 8. Find a causal structure C with a set of observed nodes M in which the sets Γ*_M(C^C) and Γ*_M(C^Q) are provably different, or show that this can never occur.

Application of non-Shannon inequalities with post-selection
In the discussion so far we have not considered a related technique that allows for post-selection on particular outcomes of certain variables. The idea of doing this first appeared in [7], based on results by Fine [29,30], and was later generalised [11-13, 16, 32, 53]. We refer to [62] for an explanation of this technique.
Here we illustrate that non-Shannon inequalities can be used in combination with post-selection by discussing a specific example relevant for information causality [50]. Information causality is an information-theoretic principle obeyed by classical and quantum physics but not by general probabilistic theories in which there are correlations that violate Tsirelson's bound [19], e.g. generalised non-signalling theory [2], which allows PR-boxes as a resource [56,59]. The principle is stated in terms of the optimal performance of two parties in a game, which we describe below, and is quantified in terms of an entropic quantity.
Alice holds two pieces of information,^19 X_0 and X_1. She can send classical information Z to Bob, who is later given a message R indicating whether he should guess X_0 or X_1. Bob's guess is denoted Y. Alice and Bob are able to use a pre-shared resource (depicted as A) to help them. The relevant causal structure of the game is displayed in Figure 5(a), and it is often analysed after post-selecting on the value of R, which can be done using the causal structure of Figure 5(b) (note that in the quantum case the variables Y|R=0 and Y|R=1 do not coexist, so it does not make sense to consider IC^Q_R; instead a restricted set of entropies needs to be considered, see later). A theory is said to obey information causality if, for all pre-shared resources allowed by the theory,

I(X_0 : Y|R=0) + I(X_1 : Y|R=1) ≤ H(Z).

A stronger set of entropic constraints for this causal structure was found in [16], including the relation (15), which holds for both classical and quantum shared resources.^20

19 In general the game is formulated for more pieces of information, but we restrict to two here for simplicity.
20 Because the existence of a joint distribution of Y|R=0 and Y|R=1 with appropriate marginals is not clear in the quantum case, the two variables have to be interpreted as alternatives and are part of different coexisting sets. Therefore, the analysis of IC^C_R does not carry over to the quantum case; a separate analysis is required there.

Figure 5: (a) Causal structure underlying the Information Causality game, IC. Alice holds a database, here made up of two bits X_0 and X_1. These need not be independent, which is expressed by a potential causal influence from X_0 to X_1. She is then allowed to send a message Z to Bob, who, depending on which bit R a referee asks for, takes a guess Y of either X_0 or X_1. Alice and Bob may have shared some resources (represented by A) before performing the protocol, either some classical randomness, a quantum system, or a resource from a more general non-signalling theory, which Alice may use in order to choose her message and Bob may use to make his guess. (b) The effective causal structure of the Information Causality game after post-selecting on binary R, labelled IC_R. This causal structure shares some of its marginal distributions with conditional distributions of IC, i.e., if we use P for the distribution in IC_R and Q for that in IC, then the marginals of P coincide with the corresponding conditionals of Q.

We show that using non-Shannon inequalities leads to a tighter outer approximation of the information causality scenario in the case of a classical shared resource. Considering just the inequality from Proposition 1 (and permutations) has led us to derive a total of 265 classes of entropy inequalities, including the 52 classes that were obtained without non-Shannon constraints in [16] (a list of all 265 classes together with the number of representatives of each class is available as Supplementary Information). Moreover, we expect further non-Shannon inequalities to lead to numerous additional constraints, potentially rendering our inequalities redundant. In principle, infinite families of inequalities, similar to those found in Proposition 4 for the triangle scenario, could also be derived here.

In the quantum case, we can only apply the non-Shannon inequalities to the two coexisting sets of exclusively classical variables {X_0, X_1, Z, Y|R=0} and {X_0, X_1, Z, Y|R=1}, which means that we can impose a set of 24 additional constraints (including permutations) just by adding all permutations of the inequality from Proposition 1 to the outer approximation that is obtained without these (no further variable elimination is required).
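As a sanity check, one can verify the standard entropic information causality bound I(X_0:Y|R=0) + I(X_1:Y|R=1) ≤ H(Z) from [50] for a simple classical strategy. This sketch assumes that formulation; the strategy and all names are ours:

```python
import numpy as np
from itertools import product

def H(p):
    """Shannon entropy (bits) of a probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_info(pxy):
    """I(X:Y) from a joint probability table pxy."""
    return H(pxy.sum(axis=1)) + H(pxy.sum(axis=0)) - H(pxy)

# Toy classical strategy: X0, X1 uniform independent bits, Alice sends
# Z = X0 (one bit), and Bob outputs Y = Z whatever R is.
p_x0y = np.zeros((2, 2))  # joint distribution of (X0, Y) given R = 0
p_x1y = np.zeros((2, 2))  # joint distribution of (X1, Y) given R = 1
for x0, x1 in product(range(2), repeat=2):
    y = x0  # Bob's guess equals the message Z = X0
    p_x0y[x0, y] += 0.25
    p_x1y[x1, y] += 0.25

h_z = 1.0  # Z = X0 is a uniform bit
lhs = mutual_info(p_x0y) + mutual_info(p_x1y)
print(lhs, lhs <= h_z + 1e-9)  # 1.0 True: the bound is saturated
```

Here I(X_0:Y|R=0) = 1 and I(X_1:Y|R=1) = 0, so this "send one bit in the clear" strategy saturates the bound; strategies using a PR-box resource would violate it.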
It is worth pointing out that although our results (in the form of new inequalities) imply that previous entropic characterisations of IC_R were not tight, the inequality (15) is not rendered redundant by our new inequalities.

Inner approximations to the entropy cones of causal structures
To complement the outer approximations, it is sometimes useful to consider inner approximations to the entropy cones of causal structures. This is particularly valuable when one can show that inner and outer approximations coincide, as they then identify the actual boundary of the entropy cone. Examples of this are the three causal structures of Figure 4, also discussed in the previous section. Hence, inner and outer approximations together serve as a relatively simple means of identifying the boundary of certain entropy cones. Such findings also immediately imply that non-Shannon inequalities are irrelevant for improving on the outer approximation to the entropy cone for the causal structure in question.
Furthermore, we can often find inner approximations that share extremal rays with the outer approximations derived from the Shannon and independence constraints (even when the two do not coincide). They hence allow us to identify the regions of entropy space where our approximations are tight, and those regions where there is a gap between inner and outer approximations.^21 Such a gap can be explored, e.g. by using non-Shannon inequalities, as was explained in the previous section.
Inner approximations also serve as a tool to decide whether entropy vectors are suitable for certifying the unattainability of particular distributions that are suspected not to be achievable within the causal structure at hand. If such a distribution leads to an entropy vector within an inner approximation to the entropy cone in question, this means either that the distribution is in fact achievable within the causal structure, or that the causal structure allows for another distribution with the same entropy vector (or an arbitrarily good approximation of such). Hence, to determine whether the distribution in question is achievable, switching to a more fine-grained method (see for example [53,65]) is necessary.

21 Such a comparison of inner and outer approximations can be performed for the entropy cone of a causal structure including its unobserved variables, i.e., before marginalisation, as well as for the respective approximations to its marginal cone, which we are mainly interested in here.
In the following we show how inner approximations can be found in different scenarios.

Techniques to find inner approximations for causal structures with up to five observed variables
For a causal structure, C, that involves a total of four or five variables, inner approximations to its entropy cone can be derived from Γ^I_4 or Γ^I_5 respectively (as defined in Section 2.1), combined with the conditional independence constraints of C^C, which together constrain a cone Γ^I(C^C). An inner approximation to the corresponding marginal scenario, Γ^I_M(C^C), is then obtained from Γ^I(C^C) with a Fourier-Motzkin elimination, as for outer approximations. It is guaranteed that Γ^I_M(C^C) is an inner approximation to Γ*_M(C^C), as it is a projection of an inner approximation Γ^I(C^C) ⊆ Γ*(C^C). Hence, inner approximations can be straightforwardly computed for such causal structures. Examples where this applies are the three causal structures of Figure 4.

Implementing all relevant linear rank inequalities of four and five variables (which includes their permutations and the application of the Ingleton inequality to each four-variable subset, as well as grouping several variables into one) [26] and then performing a variable elimination may be impractical and computationally challenging for certain causal structures. Furthermore, for causal structures that involve more than five nodes, not all possible linear rank inequalities are known and their number may even be infinite [24]. It is therefore useful to derive inner approximations by other methods. For a causal structure, C, the following methods are examples of how to derive inner approximations, Γ^I_M(C^C):

• Construct (random) entropy vectors from distributions compatible with C^C and take their convex hull.
• Take the vertices of Γ_M(C^C) that are reproducible with distributions compatible with the causal structure; their convex hull is an inner approximation.
• Take the outer approximation to the classical causal structure, Γ(C^C), as a starting point and add a manageable number of linear rank inequalities to derive further constraints. These inequalities may be employed either before or after marginalising, which leads to different cones.^22 The convex hull of the reproducible rays is an inner approximation.
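The first of these methods can be sketched for the triangle causal structure as follows; the choice of binary sources and deterministic functions is ours, made for simplicity:

```python
import numpy as np
from itertools import product

def entropy(p):
    """Shannon entropy (bits) of a probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def random_triangle_vector(rng, d=2):
    """Entropy vector (H_X, H_Y, H_Z, H_XY, H_XZ, H_YZ, H_XYZ) of the
    observed variables for a random classical strategy in the triangle
    causal structure: independent uniform sources A, B, C and random
    deterministic functions X(B, C), Y(A, C), Z(A, B)."""
    fX = rng.integers(d, size=(d, d))
    fY = rng.integers(d, size=(d, d))
    fZ = rng.integers(d, size=(d, d))
    p = np.zeros((d, d, d))
    for a, b, c in product(range(d), repeat=3):
        p[fX[b, c], fY[a, c], fZ[a, b]] += 1.0 / d**3
    return np.array([
        entropy(p.sum(axis=(1, 2))),  # H(X)
        entropy(p.sum(axis=(0, 2))),  # H(Y)
        entropy(p.sum(axis=(0, 1))),  # H(Z)
        entropy(p.sum(axis=2)),       # H(XY)
        entropy(p.sum(axis=1)),       # H(XZ)
        entropy(p.sum(axis=0)),       # H(YZ)
        entropy(p),                   # H(XYZ)
    ])

rng = np.random.default_rng(1)
vectors = [random_triangle_vector(rng) for _ in range(100)]
# The convex hull of such vectors (computable e.g. with
# scipy.spatial.ConvexHull) is an inner approximation to the marginal cone.
```

Noisy (non-deterministic) response functions and larger source alphabets enlarge the set of sampled vectors and hence the resulting inner approximation.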
For the three examples of Figure 4 it is rather straightforward to recover all extremal rays of the outer approximation to the marginal scenario, Γ_M(C^C) (cf. also Appendix C), i.e., the second method above is effective.
Overall, we found that whenever the extremal rays are not all straightforwardly recovered, the third method is effective. This is our preferred technique because, by starting out with extremal rays of the Shannon cone, we obtain approximations that in some regions are already tight (as opposed to the first method), and, at the same time, adding linear rank inequalities helps us identify those extremal rays that are likely to be reproducible with distributions in C^C (this may help us avoid dropping reproducible rays in some situations). The entropy cones obtained in this way are not necessarily inner approximations, and, if they are, they have to be proven as such, for example by explicitly constructing distributions that reproduce entropy vectors on each of the extremal rays (as with the second method above). However, in all our examples this method allowed us to recover a cone of which all extremal rays were easily seen to be reproducible after adding only a few linear rank inequalities to Γ(C^C). (If this were not the case one could still drop irreproducible rays from the resulting cones to obtain an inner approximation.) The method is illustrated in the example below.

22 If, for instance, all linear rank inequalities in up to k observed variables are added after marginalisation, the resulting cone corresponds to the intersection Γ_M(C^C) ∩ Γ^I_k, where k is the number of observed variables.
We also remark here that, in order to improve on inner approximations obtained with the second or third method above, the first method is applicable. A detailed exposition of this is presented in Appendix D.
The third method can also be applied to causal structures with more than five variables. For the first few causal structures from [36] we have recovered inner approximations by adding the Ingleton inequality to Γ_M(C^C), i.e., by taking the intersection Γ_M(C^C) ∩ Γ^I_k (the extremal rays, as well as distributions recovering entropy vectors on each of them, are available as Supplementary Information).
In the following we give a detailed analysis of the inner approximation to the triangle causal structure and compare this to the outer approximations presented in previous sections.

Example: Inner approximation to Γ*_M(C^C_3)
Here, we derive an inner approximation to the entropy cone compatible with C^C_3. An inner approximation to Γ*_6 in terms of linear rank inequalities is not available (see also Section 2.1). Nonetheless, we are able to derive an inner approximation to Γ*_M(C^C_3) by relying on Ingleton's inequality. In the following, we apply (4) to every subset of four out of the six random variables of C^C_3, taking all permutations into account. We concisely write these inequalities in a matrix M_I and consider the cone that they delimit together with the Shannon and conditional independence constraints of C^C_3. When marginalising this cone we obtain a cone delimited by the three-variable Shannon inequalities and (16).^25

Proposition 9. This cone is an inner approximation to the marginal entropy cone of the triangle causal structure, Γ*_M(C^C_3).

The proof of Proposition 9 proceeds as follows. There are only three instances of the Ingleton inequality that are not implied by the remaining constraints. For any distribution P whose entropy vector H(P) lies in the marginalised cone, one can construct a distribution P' compatible with C^C_3 such that H(P') = H(P). The correlations of Figure 6, realised in the quantum version of the triangle causal structure, C^Q_3, which will be considered in detail in Section 5.1, are one such example: they are not in P_M(C^C_3), but their entropy vector nevertheless satisfies (16). Our argument implies that there must be another distribution realisable in C^C_3 with the same entropy vector.^26
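Whether the entropy vector of a concrete four-variable distribution satisfies an instance of Ingleton's inequality can be checked directly. This sketch assumes the standard entropic form of the Ingleton expression; the function names are ours:

```python
import numpy as np

def marginal_entropy(p, axes):
    """Shannon entropy (bits) of the marginal of joint array p on `axes`."""
    drop = tuple(i for i in range(p.ndim) if i not in axes)
    m = p.sum(axis=drop).ravel()
    m = m[m > 0]
    return -np.sum(m * np.log2(m))

def ingleton(p):
    """Ingleton expression I(Z:W) + I(X:Y|Z) + I(X:Y|W) - I(X:Y) for a
    joint distribution p over axes (0, 1, 2, 3) = (X, Y, Z, W).  It is
    non-negative on linearly representable entropy vectors, so its sign
    on a concrete distribution indicates whether the corresponding
    entropy vector can lie in an Ingleton-based inner approximation."""
    H = lambda axes: marginal_entropy(p, axes)
    i_zw = H((2,)) + H((3,)) - H((2, 3))
    i_xy_z = H((0, 2)) + H((1, 2)) - H((2,)) - H((0, 1, 2))
    i_xy_w = H((0, 3)) + H((1, 3)) - H((3,)) - H((0, 1, 3))
    i_xy = H((0,)) + H((1,)) - H((0, 1))
    return i_zw + i_xy_z + i_xy_w - i_xy
```

For four independent uniform bits every term vanishes, while for X and Y independent with Z = W = X the expression evaluates to I(Z:W) = 1 bit.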

Non-Shannon inequalities in the quantum and hybrid triangle causal structures
In this section, we compare classical and quantum versions of the triangle causal structure (the distinction reflecting the nature of the unobserved nodes). We also consider hybrid scenarios, in which some of the unobserved systems are restricted to be classical while others are quantum.
These turn out to be insightful for understanding the gap between classical and quantum causal structures.We also analyse whether non-Shannon inequalities lead to improved entropic characterisations in these cases.

Quantum triangle scenario
It was first shown in Ref. [31] that there are joint distributions among the three observed variables X, Y and Z in C^Q_3 that cannot be reproduced in C^C_3, based on the CHSH scenario (see Figure 6 and Appendix F for the details). Hence C^Q_3 might also lead to a larger set of compatible entropy vectors than C^C_3. Entropically, C^Q_3 can be analysed with the technique outlined in Section 2.2.2. An outer approximation, Γ_M(C^Q_3), to the set of achievable entropy vectors, Γ*_M(C^Q_3), was constructed in [16]. It comprises the Shannon inequalities for the jointly distributed X, Y and Z and the additional inequality (17), as well as its permutations in X, Y and Z [16].
It is natural to ask whether tighter approximations to Γ*_M(C^Q_3) can be found by a procedure similar to the one that led to tighter approximations in the classical case. Unfortunately, we do not know of any similar inequalities for the von Neumann entropy of multi-party quantum states. Furthermore, even if the known non-Shannon inequalities were to hold for the von Neumann entropy, we would not be able to use them to add constraints to C^Q_3, due to the lack of large enough sets of coexisting, interdependent variables.^27

Open Problem 10. Do the closures of the sets of compatible entropy vectors coincide in the classical and the quantum triangle scenarios, i.e., does the closure of Γ*_M(C^C_3) equal the closure of Γ*_M(C^Q_3)?

Note that if this were to be answered in the affirmative, it would point towards deficiencies of the current entropic techniques for approximating Γ*_M(C^Q_3), which are not able to recover any additional inequalities similar to the non-Shannon inequalities found in the classical case.
One way to solve this problem would be to find an entropy vector compatible with C^Q_3 that lies outside one of our outer approximations to Γ*_M(C^C_3). Random searches in which the sources A, B and C distribute up to four qubits each did not yield violations. However, the evidence from these random searches against a separation of the classical and the quantum sets is relatively weak. For one, our classical outer approximations might be so loose that they contain Γ*_M(C^Q_3). To counter this, we have attempted to randomly search for vectors that lie outside the inner approximation Γ^I_M(C^C_3). In spite of the fact that we know such vectors exist, we were unable to randomly find any. This shows the weakness of random searching, and also that the region we are looking for (if it even exists) is small with respect to our sampling measure.^28

A natural candidate for an entropy vector that might violate some of our classical inequalities is the one corresponding to the CHSH correlations that were shown not to be reproducible in C^C_3 in Ref. [31] (detailed in Figure 6, where Z = (A', B'), and in Appendix F). However, the corresponding entropy vector lies inside Γ^I_M(C^C_3), so is classically reproducible. This particular distribution is also achievable in the causal structure P_4 (a causal structure equivalent to the one in Figure 4(b)). Any distribution compatible with P_4 may be realised in C_3 by choosing one of the variables, e.g. Z, to have two outputs, one depending only on the input from node A and the other depending only on the input from B. Distributions realisable in P^Q_4 or P^C_4 are thus always realisable in C^Q_3 or C^C_3 respectively. According to the results of [63], all entropy vectors realised with distributions in P^Q_4 are also classically achievable, i.e., realisable in P^C_4 (at least asymptotically). Hence, no distribution in P^Q_4 can violate any of the classical entropy inequalities valid for C^C_3.

28 This is not a statement about the geometric extent of this region (for instance in terms of a hyperdimensional solid angle, as was previously considered for inner and outer approximations to Γ*_M(C^C_3)). Instead, since we are sampling quantum states here, and since these are not in one-to-one correspondence with the entropy vectors, it is a statement about the fraction of states and measurements that may produce entropy vectors outside Γ^I_M(C^C_3) (in low dimensions) according to our sampling distribution. This must be a very small proportion of states and measurements (we did not sample any). Note also that if there is a gap between … Hence, constructing a vector in the first gap by sampling quantum states is even more difficult than for the second.

Figure 6: Correlations realised in the quantum triangle causal structure that are not reproducible in C^C_3 [31]. The observed variables X = (X̃, B̃) and Y = (Ỹ, Ã) are chosen such that P_{X̃Ỹ|ÃB̃} maximally violates the CHSH inequality [20]. Z = (A', B') is such that B' = B̃ = B and A' = Ã = A. In essence, the reason that this cannot be realised in the causal structure C^C_3 is the CHSH violation. Note though that it is also important that information about A is present in both Y and Z (and analogously for B), otherwise the correlations could be mocked up. In Proposition 11, we prove that a strategy where Z = AND(A', B') also leads to correlations that cannot be classically realised (see Appendix F for further details).

A way that might still allow us to use our knowledge of quantum correlations that are not classically reproducible in the Bell scenario to violate our entropic constraints on Γ*_M(C^C_3) is to process the inputs to all three nodes X, Y and Z, so as to get around the results of [63].^29 In the following, we generalise the distribution that was utilised in Ref. [31] to show that there is a separation between the achievable distributions in C^C_3 and C^Q_3, to a scenario with local processing at each output node. This also allows us to reduce the required dimension of the output at Z for which one can provably detect a difference between classical and quantum distributions from two bits to one bit.

Proposition 11. There are non-classical quantum correlations in C_3 in the case where X and Y output two bits each while Z outputs only one.
A proof of Proposition 11 can be found in Appendix F. It is interesting insofar as the example in [31] relies on a Bell inequality violation. Given this, one might have expected that all information about the measurement choices in the Bell setup, Ã and B̃, has to be exposed at the observed node Z. Proposition 11 shows that this is not the case.

29 Two distributions that share the same entropy vector can be very different and hence may be separated by local processing.
Nonetheless, we find that the entropy vector used to prove this proposition does not violate our classical inequalities. We have also taken Z to be determined by different functions of A and B, and have additionally considered local processing of X and Y. However, even after such post-processing, for instance by applying all possible functions from two bits to one, we have not been able to detect any violations of the classical entropic bounds.

Note that vectors outside Γ^I_M(C^C_3) can be constructed with appropriate post-processing of the (quantum) distribution. A possible way to achieve this is to apply AND or OR functions appropriately. One may for instance consider the quantum scenario detailed above, and take X = AND(X̃, B̃), Y = AND(Ỹ, Ã) and Z = OR(A', B'). This renders the interaction information of the entropy vector of the joint distribution of X, Y and Z positive, so the vector is not in Γ^I_M(C^C_3).

We have similarly tried to violate our entropy inequalities by relying on games other than the CHSH scenario, for which we know that there is distinctive quantum behaviour (i.e., a separation at the level of correlations); these include input states and measurements known to lead to violations of the chained Bell inequalities [7] or the Mermin-Peres magic square game [45,52], all with post-processing at (X, Y and) Z.
We have further considered scenarios where all three parties measure entangled states and use the measurement outputs as inputs for further measurements. We have also attempted to incorporate functions known to lead to a positive interaction information in the classical case, as well as functions from two bits to one in general, into these scenarios. None of these attempts has led to a violation of the classical inequalities so far. In a number of scenarios we have also considered shared PR-boxes instead of entangled states, again without detecting any violations of the inequalities. In most cases the corresponding entropy vectors have a negative interaction information, and hence lie in Γ^I_M(C^C_3), so can be realised with a classical distribution as well, as in the case of the correlations mentioned at the end of Section 4.2.
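The interaction information test used above can be made concrete. The two example distributions below are ours, not the post-processed quantum distribution of the text; they merely illustrate both signs:

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def interaction_info(p):
    """Interaction information I(X:Y) - I(X:Y|Z) of a joint array p
    over (X, Y, Z).  In the sense used in the surrounding discussion,
    a positive value places the entropy vector outside the
    Ingleton-based inner approximation."""
    h_x = H(p.sum(axis=(1, 2)))
    h_y = H(p.sum(axis=(0, 2)))
    h_z = H(p.sum(axis=(0, 1)))
    h_xz = H(p.sum(axis=1))
    h_yz = H(p.sum(axis=0))
    h_xy = H(p.sum(axis=2))
    h_xyz = H(p)
    i_xy = h_x + h_y - h_xy
    i_xy_given_z = h_xz + h_yz - h_z - h_xyz
    return i_xy - i_xy_given_z

# Z = XOR(X, Y) with X, Y uniform bits: interaction information is -1.
p_xor = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        p_xor[x, y, x ^ y] = 0.25

# Perfectly correlated X = Y = Z: interaction information is +1.
p_corr = np.zeros((2, 2, 2))
p_corr[0, 0, 0] = p_corr[1, 1, 1] = 0.5
```

Screening such entropy vectors by the sign of this single quantity is a cheap first filter before checking the full set of classical inequalities.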

Hybrid triangle scenarios
In a hybrid causal structure some of the unobserved nodes are allowed to be quantum, whereas others are restricted to be classical. One motivation for this is that sharing entanglement over large distances is challenging due to noise, so two distant observations might be assumed to have a classical cause while nearby ones could have quantum causes. In the case of the causal structure C_3, there are two such hybrid scenarios: either one or two of the three unobserved systems can be chosen to be classical, whereas the others are quantum. We call these two causal structures C^CQQ_3 and C^CCQ_3 respectively. In the following, we approximate the sets of compatible entropy vectors for both scenarios and show that in hybrid versions of the triangle causal structure non-Shannon inequalities are relevant.

C^CQQ_3 scenario
In this scenario one of the unobserved systems is classical (we take this to be A). The techniques introduced in Sections 2.2.1 and 2.2.2 allow us to compute approximations of the set of allowed entropy vectors. We find Γ_M(C^CQQ_3) = Γ_M(C^Q_3), i.e., the outer approximation to Γ*_M(C^CQQ_3) obtained without taking non-Shannon inequalities into account coincides with the outer approximation to Γ*_M(C^Q_3). However, unlike in the fully quantum case C^Q_3, non-Shannon constraints can be included for C^CQQ_3, for instance the inequality from Proposition 1 with certain variable choices. This results in a tighter approximation to Γ*_M(C^CQQ_3), which comprises the Shannon inequalities for three variables, the constraint (17), and additional inequalities. Further non-Shannon constraints could also be exploited to improve these approximations. Hence, some of the extremal rays of the Shannon outer approximation Γ_M(C^Q_3) are provably not achievable if A, B and C do not all share entangled states. Note that this does not imply that the sets of achievable entropy vectors in C^Q_3 and C^CQQ_3 differ. However, the difference in their outer approximations may prove useful for analysing whether there is a difference between the two.
If one were to prove by other means that there is no such difference, the inequalities for C^CQQ_3 would give us a way to better approximate the set of achievable entropy vectors of C^Q_3.

C^CCQ_3 scenario
In this scenario we take A and B to be classical. This scenario can be understood as a Bell scenario where the measurement choices of the two parties are unobserved and processed into one single observed output, Z.^31 The distributions from Section 5.1 that are provably not reproducible in C^C_3 can be generated in this causal structure. Its entropic analysis thus restricts the violations of our classical inequalities we may hope to achieve with such distributions. To approximate the set of compatible entropy vectors of this scenario, Γ*_M(C^CCQ_3), we proceed analogously to the C^Q_3 and C^CQQ_3 scenarios above. However, the result differs and leads to a tighter cone even without considering non-Shannon inequalities, i.e., Γ_M(C^CCQ_3) ⊊ Γ_M(C^Q_3). The approximation is given by the three-variable Shannon inequalities and the additional inequalities (18), up to permutations of X, Y and Z in the first inequality and of Y and Z in the second. Note that these five inequalities are a subset of the seven inequalities (8) delimiting Γ_M(C^C_3). The non-Shannon inequalities ♦_XZBY ≥ 0, ♦_YZAX ≥ 0 and ♦_YZXA ≥ 0 lead to the additional inequalities (19) (including permutations of X and Y in the last inequality).^30 They render the second inequality (and its permutations) in (18) redundant, while the first remains (for all of its permutations). Note that the first inequality of (19) is also present in the approximation derived for C^CQQ_3. As in the previous example, further constraints could likely be derived by considering additional non-Shannon inequalities.

Conclusions
We have shown that non-Shannon inequalities tighten the entropic approximations of the classical entropy cones in many causal structures, including the triangle scenario and the causal structure relevant for information causality. Our newly derived inequalities improve on the entropic distinction of these from other (classical) causal structures, which is of interest for inferring (classical) causal relations. They also constitute a set of restrictions on the classical entropy cones that we cannot derive in the quantum case, which may point towards differences between the sets of achievable entropy vectors in the classical and quantum cases.
Since it is known from the Bell scenario that quantum correlations can be detected by considering the entropies of the variables in a post-selected causal structure [7], our analysis of the information causality scenario is the one that is most likely to be useful for this purpose. In this context, non-Shannon inequalities may also be important with regard to the discussion of whether entropic techniques may even be sufficient for certifying classical reproducibility in certain scenarios, a question that has previously been explored for the CHSH scenario in Ref. [11].

[30] The second of these inequalities can be easily derived from ♦_YZXA ≥ 0 and the conditional independences, analogously to Proposition 4. To derive the first inequality, on the other hand, several inequalities have to be combined.

[31] Note that even though the sets of achievable entropy vectors in the classical and quantum cases coincide in the Bell scenario (cf. Section 3.2 and [63]), this may not be the case here, as very different distributions, which may be separated by local processing, can lead to the same entropy vector in the classical and quantum cases.
While the entropy vector approach is known to be a useful means for distinguishing different classical causal structures, its ability to differentiate between classical and quantum versions of the same causal structure is known to be limited [63]. The present work has unveiled further limitations of the approach: for all causal structures classified in [36] we found either that the sets of achievable entropy vectors in the classical and quantum cases coincide (for the causal structures of Figure 4), or that non-Shannon inequalities play a role in their characterisation, leaving us unable to make such a statement.
One of the reasons why it is difficult to make such a statement when non-Shannon inequalities play a role is our relatively poor understanding of the structure of entropy space. Even in the absence of a causal structure we lack a tight characterisation of the set of allowed entropy vectors for four random variables. In the quantum case, it is an important open problem whether any further general constraints on the von Neumann entropy exist. This partly explains our inability to show whether there is some causal structure in which the described entropy vector approach can be useful for distinguishing classical and quantum.
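To make the role of non-Shannon constraints concrete, the sketch below numerically checks the original four-variable non-Shannon inequality of Zhang and Yeung, 2I(C:D) ≤ I(A:B) + I(A:CD) + 3I(C:D|A) + I(C:D|B), on randomly drawn distributions. This is an illustration only; the inequality used in Proposition 1 of the present work may differ in form.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p, axes):
    """Shannon entropy (bits) of the marginal of joint pmf p on the given axes."""
    other = tuple(i for i in range(p.ndim) if i not in axes)
    m = np.ravel(p if not other else p.sum(axis=other))
    m = m[m > 0]
    return float(-(m * np.log2(m)).sum())

def cmi(p, a, b, c=()):
    """I(a : b | c) computed from subset entropies."""
    a, b, c = set(a), set(b), set(c)
    return (entropy(p, tuple(a | c)) + entropy(p, tuple(b | c))
            - entropy(p, tuple(c)) - entropy(p, tuple(a | b | c)))

# Axes: 0 = A, 1 = B, 2 = C, 3 = D
min_slack = float("inf")
for _ in range(200):
    p = rng.random((2, 2, 2, 2))
    p /= p.sum()
    slack = (cmi(p, [0], [1]) + cmi(p, [0], [2, 3])
             + 3 * cmi(p, [2], [3], [0]) + cmi(p, [2], [3], [1])
             - 2 * cmi(p, [2], [3]))
    min_slack = min(min_slack, slack)

print(min_slack >= -1e-9)  # True: the inequality holds for every distribution
```

The slack is always non-negative, as it must be for a valid entropy inequality, yet the inequality does not follow from the Shannon inequalities alone.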
Behind all this is the question of whether there is a novel technique that allows for an efficient and accurate way to distinguish classical and quantum versions of the same causal structure. Such a technique needs to simplify the description of the set of allowed distributions but remain complex enough to retain the distinctive features of classical, quantum and post-quantum probability distributions. Identifying such a quantity would provide further insight into the meaning of cause in quantum mechanics.

Acknowledgements
searches in Fortran. We are grateful to Matthew Pusey for alerting us to an error in an earlier draft. This work was supported by the Engineering and Physical Sciences Research Council through a First Grant (no. EP/P016588/1) and the Quantum Communications Hub (grant no. EP/M013472/1).
A Discussion of discrepancy with [14,16]
Our new approximations to Γ*_M(C^C_3) presented in Section 3.1 contradict the claim in [14,16] that Γ_M(C^C_3) coincides with the closure of Γ*_M(C^C_3). This appendix reviews these results and explains the discrepancy.
In [14,16], the inequalities defining the set Γ_M(C^C_3) as well as its vertex description were calculated. Furthermore, probability distributions P ∈ P_M that achieve the rays of Γ_M(C^C_3) were presented in the Supplementary Information of [16]. However, it was not shown there that the corresponding distributions, P, lie in P_M(C^C_3), and hence that the corresponding entropy vectors are achievable in C^C_3. Our results imply that this is not always the case, and that three of the extremal rays of Γ_M(C^C_3) cannot lie within Γ*_M(C^C_3), specifically the ray containing the vector v = (2, 3, 3, 4, 4, 5, 6) and its permutations. In the Supplementary Information of [16], v is shown to be achieved with the probability distribution (20), where x ∈ {1, . . ., 4}, y, z ∈ {1, . . ., 8} and ⊕ denotes addition modulo 2. This means that P_XYZ(x, y, z) = 1/64 if and only if either x, y and z are all odd or they are all even. This distribution can be mapped to the perfect correlations of (7) by locally mapping all odd outcomes to 1 and all even outcomes to 0 at X, Y and Z. Since perfect correlations are known not to be achievable in P_M(C^C_3), the distribution (20) is not compatible with the triangle causal structure.
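That this distribution realises the entropy vector v = (2, 3, 3, 4, 4, 5, 6) can be checked directly. A sketch, with the variable ranges as stated above:

```python
import math
from itertools import product

# Distribution (20): P(x, y, z) = 1/64 iff x, y and z all have the same parity,
# with x in {1,...,4} and y, z in {1,...,8}.
joint = {}
for x, y, z in product(range(1, 5), range(1, 9), range(1, 9)):
    if x % 2 == y % 2 == z % 2:
        joint[(x, y, z)] = 1 / 64

def H(joint, keep):
    """Entropy (bits) of the marginal on the coordinate positions in `keep`."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

# Components ordered as (H(X), H(Y), H(Z), H(XY), H(XZ), H(YZ), H(XYZ))
vec = [H(joint, k) for k in [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]]
print([round(h, 6) for h in vec])  # [2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 6.0]
```

Of course, this only confirms that v is the entropy vector of a valid distribution; as argued above, that distribution does not lie in the triangle causal structure.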
This resolves the apparent contradiction of our results with those from [14,16]. What was shown there is that all vectors v ∈ Γ_M(C^C_3) can be written as the entropy vector of a valid probability distribution, or approximated arbitrarily well by such, but not necessarily by one that is achievable in the triangle causal structure.

B Infinite families of inequalities
Infinite families of inequalities may be derived to tighten the entropic approximation to Γ*_M(C^C_3). Here we give the proofs for the three examples provided in Proposition 4. However, there are numerous other examples that can be derived in a similar way.
The families (12) and (13) are derived from (2) by combining the inequalities for one s-value at a time with Shannon and conditional independence constraints. These are the only families derived from (2) in this way for which none of the resulting inequalities are rendered redundant by those found in the calculations for Case 2 in Section 3.1.
Proof of Proposition 4. We tackle the three inequalities separately. (12): The instance of inequality (2) with (X_1, X_2, X_3, X_4) = (X, Y, Z, C) can be rewritten as (21). Applying I(X : Y|C) = 0 and I(Z : C) = 0, all terms containing the variable C cancel and we recover (12).

H(XY), H(XZ), H(YZ), H(XYZ)) that arise from Shannon and independence inequalities. Two of the constraints this elimination yields are the inequalities (22) and (23). We now use (22) to remove H(CY) from (21), which yields (24). With (23), H(CX) and H(C) are eliminated from (24), which concludes the proof for this case.

(14): In a similar manner as for the family (13), we consider inequality (3) with variable choices (X_1, X_2, X_3, X_4) = (X, Y, C, Z) and the independences I(X : Y|C) = 0 and I(Z : C) = 0 to obtain a further inequality. We also consider two inequalities that are obtained from marginalising Γ_C

C Entropy cones for causal structures with up to five variables
For most causal structures with up to five nodes, the sets of compatible distributions generated with classical and quantum resources are identical, and hence, so are their entropic cones [36]. Ref. [36] reports one causal structure with four nodes (Figure 4(a)) and 96 causal structures with five nodes for which this equivalence does not hold. These were reduced to the three causal structures shown in Figure 4, using reduction criteria. In the following, we show that, for the three causal structures in question, the classical and quantum entropy cones coincide. Note that this does not imply that the same holds true for the remaining 94 causal structures. An example where we have not been able to establish this is the causal structure ÎC.
Proof of Proposition 6. We begin by showing that for the causal structures shown in Figure 4, the classical entropy cones coincide with the corresponding Shannon approximations. For the instrumental scenario of Figure 4(a) this is shown in Example 2; for the Bell scenario of Figure 4(b) this was previously shown in [63]. In the following, we hence consider Figure 4(c).
The Shannon inequalities and independence constraints lead to an outer approximation that is the conic hull of eleven vectors, (1)-(11), denoted as lists of their components. The following strategies confirm that all of the extremal rays are achievable within the causal structure and, hence, that we have found the associated entropy cone. Note that ⊕ denotes addition modulo 2.
• The entropy vectors (1) and (2) are recovered by choosing A and B to be uniform bits and
• (3) is recovered by letting A and B be uniform bits and X = A ⊕ B, Y = B, Z = A.
• The entropy vector (4) is recovered by letting A and B be uniform bits and X = A ⊕ B, Y = (B, X), Z = (A, X).
• Let A and B be uniform bits and let
• Let X be a uniform bit and Y = X = Z to recover (6).
• To recover vectors (7) and (8), A or B is taken to be a uniform bit, and X = A = Z or X = B = Y respectively. The remaining variable is deterministic.
• Entropy vectors (9)-(11) are obtained by choosing X, Y or Z respectively to be a uniform bit and the other two variables to take a value deterministically.
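These strategies can be verified mechanically. The sketch below (assuming the usual component ordering (H(X), H(Y), H(Z), H(XY), H(XZ), H(YZ), H(XYZ))) computes the entropy vectors for strategies (3) and (4) and checks that both saturate the inequality I(Y : Z|X) ≤ H(X) relevant for Figure 4(c):

```python
import math
from itertools import product

def entropy_vector(strategy):
    """Entropy vector (H(X), H(Y), H(Z), H(XY), H(XZ), H(YZ), H(XYZ)) of the
    observed variables when A and B are independent uniform bits."""
    joint = {}
    for a, b in product([0, 1], repeat=2):
        outcome = strategy(a, b)
        joint[outcome] = joint.get(outcome, 0.0) + 0.25

    def H(keep):
        marg = {}
        for outcome, p in joint.items():
            key = tuple(outcome[i] for i in keep)
            marg[key] = marg.get(key, 0.0) + p
        return -sum(p * math.log2(p) for p in marg.values() if p > 0)

    return [H(k) for k in [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]]

# Strategy (3): X = A xor B, Y = B, Z = A
v3 = entropy_vector(lambda a, b: (a ^ b, b, a))
# Strategy (4): X = A xor B, Y = (B, X), Z = (A, X)
v4 = entropy_vector(lambda a, b: (a ^ b, (b, a ^ b), (a, a ^ b)))

for v in (v3, v4):
    hx, hxy, hxz, hxyz = v[0], v[3], v[4], v[6]
    # I(Y:Z|X) = H(XY) + H(XZ) - H(X) - H(XYZ)
    print(round(hxy + hxz - hx - hxyz, 6) == round(hx, 6))  # True: saturated
```

Strategy (3), for instance, gives the vector (1, 1, 1, 2, 2, 2, 2) with I(Y : Z|X) = 1 = H(X).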
We next show that in all three examples the Shannon outer approximation also coincides with the set of compatible entropy vectors in the quantum case, Γ*_M(C^Q). For this, we rely on the facet description of the respective cones and show that each of the inequalities also holds in the quantum case.
1. For the instrumental scenario of Figure 4(a) the only inequality in addition to the Shannon inequalities for three observed variables is I(X : ZY) ≤ H(Z) [36]. This holds in the quantum case by a chain of inequalities: the first step is a DPI, then we use submodularity, the independence of X and A_Y, and monotonicity for the cq-state ρ_{XZA_Y}.
2. For the Bell scenario of Figure 4(b) and the causal structure of Figure 4(c), the remaining facet inequalities also hold in the quantum case; the arguments are given in the discussion accompanying Figure 4.

Computing an outer approximation in terms of Shannon and independence (in)equalities, as well as including all permutations of the Ingleton inequality for four of the five variables, yields a cone with 46 extremal rays. In the following, we list an entropy vector on each such extremal ray, with components (H(X_0), H(X_1), H(Z), H(Y), H(X_0X_1), H(X_0Z), H(X_0Y), H(X_1Z), H(X_1Y), H(ZY), H(X_0X_1Z), H(X_0X_1Y), H(X_0ZY), H(X_1ZY), H(X_0X_1ZY)), where rays that are obtained from others by permuting X_0 and X_1 are omitted.
We have identified probability distributions compatible with ÎC^C that reproduce vertices on each of the rays. Hence the convex hull of these rays is an inner approximation to Γ*_M(ÎC^C). It is characterised by 23 classes of inequalities, giving a total number of 35 inequalities when including permutations. In the following we list distributions recovering one vector on each extremal ray (again not listing strategies for the rays that are obtained from others by permuting X_0 and X_1). For this purpose, let C_1, C_2, C_3, C_4, C_5 and C_6 be random bits and let ⊕ denote addition mod 2.
The searches for these distributions were performed by hand; they could, however, also be straightforwardly automated.
The Shannon outer approximation to Γ*_M(ÎC^C) shares the 46 extremal rays of the inner approximation (given above) but has six additional ones; in the following we list one vector on each such ray, omitting rays obtained through permutation of X_0 and X_1 as above. We can show that these vectors are all outside Γ*_M(ÎC^C) by resorting to non-Shannon inequalities. The Shannon outer approximation is characterised by 19 classes of inequalities, or a total of 29 inequalities including permutations.

E Proof of Proposition 9
In the following we prove Proposition 9. First, we have computed the seven extremal rays of Γ^I_M(C^C_3) and list one vector on each such extremal ray, ordering the components as usual as (H(X), H(Y), H(Z), H(XY), H(XZ), H(YZ), H(XYZ)). To verify extremality, note that in seven dimensions seven inequalities can lead to at most seven extremal rays (choosing six of the seven to be saturated). One can then check that each of the claimed rays saturates six of the seven inequalities constraining Γ^I_M(C^C_3).

F Proof of Proposition 11
In this section, we prove Proposition 11 based on reasoning from [31]. There, the idea is that in C^Q_3 one can take X and Y to correspond to two bits each, which we call (X̃, B) and (Ỹ, Ã) respectively. The quantum state corresponding to node C is a maximally entangled state |Ψ⟩_C = (1/√2)(|01⟩ − |10⟩), the first half of which is the subsystem C_X and the second half C_Y. A and B can be taken to be uniform classical bits. We introduce Π_θ = |θ⟩⟨θ|, where |θ⟩ = cos(θ/2)|0⟩ + sin(θ/2)|1⟩, and four POVMs built from such projectors (specified with Table 1); with these choices P_{X̃Ỹ|AB} violates the CHSH inequality [20]. The observed variables are then X = (X̃, B), Y = (Ỹ, Ã) and Z = (A′, B′), with the correlations set up such that B′ = B and A′ = Ã = A. In essence, the reason that this cannot be realised in the causal structure C^C_3 is the CHSH violation. Note though that it is also important that information about A is present in both Y and Z (and analogously for B).
If, for example, we consider the same scenario but with Y = Ỹ, then we could mock up the correlations classically. This can be done by removing A, replacing B with (B_1, B_2) and taking B_1, B_2 and C to each be a uniform random bit. We can then take Y = C, Z = (B_1, B_2) and X = (f(C, B_1, B_2), B_1), where f is chosen appropriately. Since f can depend on all of the other observed variables it can generate any correlations between them. In the causal structure C_3, taking Ã = A ensures that these are shared through A and hence information about them cannot be used to generate X.
Our Proposition 11 requires a restriction of Z to one bit of information, which we prove to be possible in the following.
Proof of Proposition 11. First, since all classical distributions can be realised using quantum systems, P_M(C^C_3) ⊆ P_M(C^Q_3). We now show that

Figure 3: Triangle causal structure C_3. Three observed random variables X, Y and Z have pairwise common causes. In the classical case these common causes are random variables, A, B and C, while in the quantum case they are replaced by quantum systems, (A_Y, A_Z), (B_X, B_Z) and (C_X, C_Y).

Figure 4: Three causal structures, C, for which the outer approximation Γ_M(C^C) tightly approximates the classical entropy cone Γ*_M(C^C), which also coincides with Γ*_M(C^Q). The observed variables are labelled W, X, Y and Z; the unobserved nodes are called A and B.

Example 2 (Inner approximation to the instrumental scenario). For the classical instrumental scenario, C_I of Figure 4(a), we can compute an inner approximation by adding the conditional independence constraints I(A : X) = 0 and I(X : Y|AZ) = 0 to the Ingleton cone Γ^I_4, as prescribed above. We can, however, also directly prove that Γ*_M(C^C_I) = Γ_M(C^C_I) by showing that all permutations of the Ingleton inequality are implied by Shannon and conditional independence constraints and, hence, inner and outer approximations coincide for C^C_I. Since I(A : X) = 0, I_ING(A, X; Y, Z) ≥ 0 is immediately implied by Shannon and independence constraints. Furthermore, the rewritings of I_ING according to (10) imply that I(A : X) = 0 (together with the Shannon inequalities) implies all permutations of the Ingleton inequality except for I_ING(Y, Z; A, X) ≥ 0. We can rewrite I_ING(Y, Z; A, X) = I(Y:Z|A) + I(Y:X|Z) + I(X:A|Y) − I(X:Y|A) = I(Y:X|Z) + I(X:A|Y) + I(Y:Z|AX) − I(X:Y|AZ), the positivity of which is hence implied by the Shannon inequalities and the independence constraint I(X : Y|AZ) = 0.
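The chain-rule identity used in the last rewriting can be confirmed numerically on random four-variable distributions. A sketch, with the axis-to-variable assignment being our own choice:

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy(p, axes):
    """Shannon entropy (bits) of the marginal of joint pmf p on the given axes."""
    other = tuple(i for i in range(p.ndim) if i not in axes)
    m = np.ravel(p if not other else p.sum(axis=other))
    m = m[m > 0]
    return float(-(m * np.log2(m)).sum())

def cmi(p, a, b, c=()):
    """I(a : b | c) computed from subset entropies."""
    a, b, c = set(a), set(b), set(c)
    return (entropy(p, tuple(a | c)) + entropy(p, tuple(b | c))
            - entropy(p, tuple(c)) - entropy(p, tuple(a | b | c)))

# Axes: 0 = Y, 1 = Z, 2 = A, 3 = X
max_gap = 0.0
for _ in range(100):
    p = rng.random((2, 2, 2, 2))
    p /= p.sum()
    # I(Y:Z|A) + I(Y:X|Z) + I(X:A|Y) - I(X:Y|A)
    lhs = (cmi(p, [0], [1], [2]) + cmi(p, [0], [3], [1])
           + cmi(p, [3], [2], [0]) - cmi(p, [3], [0], [2]))
    # I(Y:X|Z) + I(X:A|Y) + I(Y:Z|AX) - I(X:Y|AZ)
    rhs = (cmi(p, [0], [3], [1]) + cmi(p, [3], [2], [0])
           + cmi(p, [0], [1], [2, 3]) - cmi(p, [3], [0], [1, 2]))
    max_gap = max(max_gap, abs(lhs - rhs))

print(max_gap < 1e-9)  # True: the two rewritings of I_ING(Y,Z;A,X) agree
```

The identity reduces to the chain rule I(Y:XZ|A) = I(Y:X|A) + I(Y:Z|AX) = I(Y:Z|A) + I(Y:X|AZ).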

Example 3. Consider the classical causal structure of Figure 5(a) and remove the node R to give a 5-variable causal structure, ÎC^C. We can in principle consider all linear rank inequalities of five random variables combined with all Shannon inequalities and the conditional independence constraints, which would give an inner approximation, Γ^I_M(ÎC^C), to the entropy cone Γ*_M(ÎC^C). This procedure would involve an impractically large number of inequalities. Instead, we can consider the outer approximation in terms of Shannon inequalities and conditional independence constraints, Γ_M(ÎC^C), and intersect this cone with the Ingleton cone for the four observed variables, Γ^I_4, i.e., we add all permutations of the Ingleton inequality for the four observed variables to Γ_M(ÎC^C). This is easily obtained but does not result in any restrictions beyond those of the Shannon outer approximation, which is characterised by 52 extremal rays. Adding the Ingleton inequality for all subsets of four out of the five random variables to Γ(ÎC^C) before performing the variable elimination, only 46 extremal rays are recovered. These are straightforward to reproduce with entropy vectors in ÎC^C.

where M_{I,M}(C^C_3) contains only one inequality,

I(X : Y : Z) ≥ 0. (16)

(This relation can also be analytically derived from the Ingleton inequality and the conditional independence constraints of C^C_3.) Inequality (16) renders the three Shannon inequalities of the form I(X : Y|Z) ≥ 0 redundant. Γ^I_M(C^C_3) is thus fully characterised by the six remaining three-variable Shannon inequalities (constraining Γ_3) and (16).

[25] …achievable in ÎC^C, because they violate the entropy inequalities we obtain when taking non-Shannon inequalities into account in the computation of the outer approximations to Γ…

Figure 6: Scenario involving unobserved quantum systems, leading to a distribution which is not reproducible with classical A, B and C [31]. The observed variables X = (X̃, B) and Y = (Ỹ, Ã) are chosen such that P_{X̃Ỹ|AB} maximally violates the CHSH inequality [20]. Z = (A′, B′) is such that B′ = B and A′ = Ã = A. In essence, the reason that this cannot be realised in the causal structure C^C_3 is the CHSH violation. Note though that it is also important that information about A is present in both Y and Z (and analogously for B), otherwise the correlations could be mocked up. In Proposition 11, we prove that a strategy where Z = AND(A′, B′) also leads to correlations that cannot be classically realised (see Appendix F for further details).

2. For the Bell scenario of Figure 4(b) the only constraints (in addition to the four-variable Shannon inequalities) are the independences I(W : YZ) = 0 and I(Z : WX) = 0, which hold in the quantum case.

3. For the causal structure of Figure 4(c) the only additional inequality is I(Y : Z|X) ≤ H(X) [36]. This holds in the quantum case because

I(Y : Z|X) ≤ I(Y : B_Z|X) ≤ I(A_Y : B_Z|X) ≤ H(A_Y X) + H(B_Z X) − H(X) − H(A_Y B_Z) = H(A_Y X) + H(B_Z X) − H(X) − H(A_Y) − H(B_Z) ≤ H(X),

where the first two inequalities are DPIs and the third holds by monotonicity. The equality holds because A_Y and B_Z are independent, and the last inequality follows from two submodularity constraints.

D Inner approximation to Γ*_M(ÎC^C)

Table 1: Distributions compatible with the three-variable causal structures displayed in Figure

E_0 = {Π_0, Π_π}, E_1 = {Π_{π/2}, Π_{3π/2}}, F_0 = {Π_{π/4}, Π_{5π/4}}, F_1 = {Π_{3π/4}, Π_{7π/4}}. Consider a measurement on the C_X subsystem with POVM E_B (i.e., if B = 0 then E_0 is measured and otherwise E_1), and likewise a measurement on C_Y with POVM F_A. Let us denote the corresponding outcomes X̃ and Ỹ. With this choice P_{X̃Ỹ|AB} maximally violates the CHSH inequality.
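As a consistency check of this construction, the following sketch computes the CHSH value for the maximally entangled state with the measurement angles above. The sign convention in the CHSH combination is our choice:

```python
import numpy as np

def ket(theta):
    """|theta> = cos(theta/2)|0> + sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def observable(theta):
    """Pi_theta - Pi_(theta+pi): the +1/-1-valued measurement along |theta>."""
    k, k_orth = ket(theta), ket(theta + np.pi)
    return np.outer(k, k) - np.outer(k_orth, k_orth)

# Maximally entangled state Psi_C = (|01> - |10>)/sqrt(2)
psi = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)

def correlator(a, b):
    """<A(a) x B(b)> on the shared state."""
    return float(psi @ np.kron(observable(a), observable(b)) @ psi)

# E_B measured on C_X (angles 0 and pi/2), F_A measured on C_Y (pi/4 and 3*pi/4)
E = {0: 0.0, 1: np.pi / 2}
F = {0: np.pi / 4, 1: 3 * np.pi / 4}
S = (correlator(E[0], F[0]) - correlator(E[0], F[1])
     + correlator(E[1], F[0]) + correlator(E[1], F[1]))
print(abs(S))  # 2*sqrt(2) ~ 2.828 > 2: maximal CHSH violation
```

For this state the correlator is −cos(a − b), from which the four terms combine to magnitude 2√2, the Tsirelson bound.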