Algebra of Nonlocal Boxes and the Collapse of Communication Complexity

Communication complexity quantifies how difficult it is for two distant computers to evaluate a function f(X, Y), where the strings X and Y are distributed to the first and second computer, respectively, under the constraint of exchanging as few bits as possible. Surprisingly, some nonlocal boxes, which are resources shared by the two computers, are so powerful that they allow communication complexity to collapse, in the sense that any Boolean function f can be correctly estimated with the exchange of only one bit of communication. The Popescu-Rohrlich (PR) box is an example of such a collapsing resource, but a comprehensive description of the set of collapsing nonlocal boxes remains elusive. In this work, we carry out an algebraic study of the structure of wirings connecting nonlocal boxes, thus defining the notion of the "product of boxes" P ⊠ Q, and we show related associativity and commutativity results. This gives rise to the notion of the "orbit of a box", unveiling surprising geometrical properties about the alignment and parallelism of distilled boxes. The power of this new framework is that it allows us to prove previously-reported numerical observations concerning the best way to wire consecutive boxes, and to numerically and analytically recover recently-identified noisy PR boxes that collapse communication complexity for different types of noise models.

Nonlocal boxes (NLBs) were introduced by Popescu and Rohrlich in 1994 as a theoretical generalization of quantum correlations [38]. When Alice and Bob share a pair of entangled states |Ψ⟩, each of them can choose to measure their state in a certain basis depending on some "instructions" x, y ∈ {0, ..., p}, and then each of them can encode their outcomes in respectively a, b ∈ {0, ..., q}. Similarly, a two-party NLB is a "black box" shared between Alice and Bob, with some inputs x, y and some outputs a, b, and with the rule that Alice has access only to the left part and Bob only to the right part, see Figure 1. This way, we only study the statistics produced by the "hidden state" inside of the box and not the physical theory describing that state, which is the reason why nonlocal boxes are said to be device-independent. In this work, we consider one of the simplest scenarios, the CHSH scenario named after Clauser, Horne, Shimony and Holt [15], where p = q = 1 and with two parties Alice and Bob (for more general scenarios, see [3,5,13,41]).
Generally, nonlocal boxes are non-signalling, meaning that they respect the relativistic constraint of no faster-than-light communication between parties (although there is a recent interest in partially-signalling scenarios [41]). These non-signalling boxes form a set NS defined by the positivity and normalization conditions

$$ \forall a, b, x, y \in \{0,1\}, \quad P(a, b \,|\, x, y) \ge 0, \qquad \forall x, y \in \{0,1\}, \quad \sum_{a, b} P(a, b \,|\, x, y) = 1, \tag{1} $$

together with the conditions that the marginals of each party are independent of the other party's question:

$$ \forall a, x \in \{0,1\}, \quad \sum_{b} P(a, b \,|\, x, 0) = \sum_{b} P(a, b \,|\, x, 1), \tag{2} $$

$$ \forall b, y \in \{0,1\}, \quad \sum_{a} P(a, b \,|\, 0, y) = \sum_{a} P(a, b \,|\, 1, y). \tag{3} $$

The physical interpretation of Equations (2) and (3) is that Alice and Bob are space-like separated, so it would take more time for a light ray to move from Bob to Alice than the time needed for Alice to do her protocol and to receive her output of the box, and therefore Alice's marginal does not depend on Bob's input. The best-known example of a non-signalling box is the PR box, named after Popescu and Rohrlich [38]. This box is designed to perfectly win at the CHSH-game [15], i.e. the box produces outputs a, b such that a ⊕ b = xy with probability 1, where the symbol "⊕" stands for the sum modulo 2. More precisely, given an input pair (x, y), there are two possibilities for the outputs: either (a = 0, b = xy) or (a = 1, b = xy ⊕ 1), each being output with probability 1/2 by the PR box. In fact, it is possible to show that the PR box thus defined is the only box of NS that perfectly wins at the CHSH-game. Let us give two other examples of boxes: (i) the fully mixed box I, which outputs purely random bits a and b; and (ii) the deterministic boxes P_0, P_1 that always output (0, 0), (1, 1) respectively, independently of the inputs. All these boxes are in NS and can be written as conditional probability distributions:

$$ \mathrm{PR}(a, b \,|\, x, y) := \tfrac{1}{2}\, \mathbb{1}_{a \oplus b = xy}, \qquad I(a, b \,|\, x, y) := \tfrac{1}{4}, \qquad P_0(a, b \,|\, x, y) := \mathbb{1}_{a = b = 0}, \qquad P_1(a, b \,|\, x, y) := \mathbb{1}_{a = b = 1}. \tag{4} $$
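The definitions above can be checked numerically. The following is a minimal sketch, assuming the storage convention `box[a, b, x, y] = P(a, b | x, y)`; the helper names `make_box`, `is_non_signalling` and `chsh_win_probability` are hypothetical and introduced only for illustration. It builds the four example boxes and tests the positivity, normalization and non-signalling conditions, as well as the probability of winning the CHSH game under uniformly random inputs.

```python
import itertools
import numpy as np

BITS = (0, 1)

def make_box(rule):
    """Build a 2x2x2x2 array with box[a, b, x, y] = rule(a, b, x, y)."""
    box = np.zeros((2, 2, 2, 2))
    for a, b, x, y in itertools.product(BITS, repeat=4):
        box[a, b, x, y] = rule(a, b, x, y)
    return box

# The four example boxes: PR box, fully mixed box, and two deterministic boxes.
PR = make_box(lambda a, b, x, y: 0.5 if (a ^ b) == (x & y) else 0.0)
I = make_box(lambda a, b, x, y: 0.25)
P0 = make_box(lambda a, b, x, y: 1.0 if (a, b) == (0, 0) else 0.0)
P1 = make_box(lambda a, b, x, y: 1.0 if (a, b) == (1, 1) else 0.0)

def is_non_signalling(box, tol=1e-12):
    """Check positivity, normalization, and independence of each marginal."""
    nonneg = bool(np.all(box >= -tol))
    normalized = np.allclose(box.sum(axis=(0, 1)), 1.0)
    alice = box.sum(axis=1)  # alice[a, x, y]: Alice's marginal
    bob = box.sum(axis=0)    # bob[b, x, y]: Bob's marginal
    alice_ok = np.allclose(alice[:, :, 0], alice[:, :, 1])  # independent of y
    bob_ok = np.allclose(bob[:, 0, :], bob[:, 1, :])        # independent of x
    return nonneg and normalized and alice_ok and bob_ok

def chsh_win_probability(box):
    """Probability that a XOR b = x AND y, for uniformly random inputs x, y."""
    total = 0.0
    for a, b, x, y in itertools.product(BITS, repeat=4):
        if (a ^ b) == (x & y):
            total += box[a, b, x, y]
    return total / 4.0

for name, box in [("PR", PR), ("I", I), ("P0", P0), ("P1", P1)]:
    print(name, is_non_signalling(box), chsh_win_probability(box))
```

A box such as `make_box(lambda a, b, x, y: 1.0 if (a == y and b == x) else 0.0)`, in which each party outputs the other party's input, is normalized but fails the marginal checks, which illustrates why the non-signalling conditions are needed in addition to normalization.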
For more details on correlation sets and their separation, see [26]. Interestingly, some nonlocal boxes collapse what is called communication complexity (CC), a notion introduced by Yao in [45] and reviewed in [27,40] that quantifies the difficulty of performing a distributed computation. Say we want to evaluate a Boolean function f : {0, 1}^n × {0, 1}^m → {0, 1} using two distant computers, where the first computer receives as input X ∈ {0, 1}^n, and the other computer receives as input Y ∈ {0, 1}^m. The CC of f is then defined as the minimal number of bits that the computers need to communicate in order for the first computer to output the value f(X, Y). For instance, when n = m = 2, X = (x_1, x_2), Y = (y_1, y_2), the CC of f_1 := x_1 · (y_1 ⊕ y_2) equals 1, using the communication bit y_1 ⊕ y_2, whereas it is possible to show that the CC of f_2 := x_1 · y_1 ⊕ x_2 · y_2 equals 2, using communication bits y_1 and y_2; therefore f_2 is more complex than f_1 in the sense of CC. Yao also introduced in [45] a probabilistic version of CC, in which the computers can access shared randomness, and where for all X, Y the first computer has to output the correct value f(X, Y) with probability at least p > 1/2 (p being independent of X and Y). Now, if a nonlocal box P is used in the protocol to compute the value f(X, Y), we say that the box P collapses communication complexity if there exists a fixed p > 1/2 for which any Boolean function f, with arbitrary input size, can be correctly computed with only one bit of communication and probability p. In this definition, an arbitrary number of copies of the box P can be used in the protocol. Such a collapse is strongly believed to be unachievable in Nature since it would imply the absurdity that a single bit of communication is sufficient to distantly estimate any value of any Boolean function f [6,10,12,17]. For more details on the link between nonlocal boxes and communication complexity, see [7].
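To make the role of the PR box concrete, here is a small simulation sketch of the standard one-bit protocol for the inner-product function f_2 = x_1 · y_1 ⊕ x_2 · y_2, which needs two bits without nonlocal resources. The function names `pr_box` and `one_bit_protocol` are hypothetical, and the simulation samples both outputs of each box in one place for convenience, whereas in an actual protocol each party only sees its own side. One perfect PR box is used per coordinate, and Bob sends a single bit to Alice.

```python
import itertools
import random

def pr_box(x, y, rng):
    """Sample (a, b) from a perfect PR box: a is uniform and a XOR b = x AND y."""
    a = rng.randint(0, 1)
    b = a ^ (x & y)
    return a, b

def one_bit_protocol(X, Y, rng):
    """Compute the inner product of X and Y mod 2 with one bit from Bob to Alice."""
    alice_parity = 0
    bob_parity = 0
    for x, y in zip(X, Y):
        a, b = pr_box(x, y, rng)  # Alice inputs x_i, Bob inputs y_i
        alice_parity ^= a         # Alice keeps the parity of her outputs
        bob_parity ^= b           # Bob keeps the parity of his outputs
    message = bob_parity          # the only communicated bit
    return alice_parity ^ message

def f2(X, Y):
    """Reference value x_1 y_1 XOR x_2 y_2."""
    return (X[0] & Y[0]) ^ (X[1] & Y[1])

rng = random.Random(0)
for X in itertools.product((0, 1), repeat=2):
    for Y in itertools.product((0, 1), repeat=2):
        print(X, Y, one_bit_protocol(X, Y, rng), f2(X, Y))
```

The output is always correct because the XOR of all pairs a_i ⊕ b_i equals the XOR of the products x_i · y_i, regardless of the random values of the a_i.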
Open question. Among the four examples of boxes listed in Equation (4), only the PR box collapses communication complexity [17], meaning that this box is very "powerful". We also know that some noisy versions of the PR box collapse CC for different types of noise [8,10,11,12,20]. On the other hand, we know that quantum correlations do not collapse communication complexity [16], and neither does a slightly wider set named "almost quantum correlations" [34]. To this day, the question is still open whether the remaining non-signalling boxes are collapsing, meaning that there is still a gap to be filled. We refer to [8, Fig. 2] for a figure that summarizes the situation.
Recent results. Two recent papers make progress on the question above. In [11], Brito, Moreno, Rai and Chaves study correlation distillation in quantum voids [39], which are subsets of a face of NS where all nonlocal points are non-quantum. They prove strong distillation properties in 1- and 2-dimensional quantum voids, and deduce that these regions collapse communication complexity, which partially answers the open question. More recently, in [20], Eftaxias, Weilenmann, and Colbeck propose a sequential algorithm to find a suitable sequence of wirings to collapse communication complexity, where a wiring is defined as a connection between boxes that allows the creation of a new box out of copies of a box (see Section 1). This makes it possible to numerically determine a collapsing region of nonlocal boxes, which is again a partial answer to the open question.
Our Results. We provide a new mathematical framework and algorithms in working towards addressing the open question. The ideas are based in part on the M.Sc. thesis of one of the authors [7].
(i) We introduce a new framework that we call the algebra of boxes. After recalling the definition of a wiring W, we use it to introduce a product of boxes P ⊠_W Q. This leads to a natural embedding of the non-signalling set NS in an algebra, which we call an algebra of boxes and for which we characterize associativity and commutativity (see Proposition 7). This gives an algebraic perspective on protocols for correlation distillation: for instance, the non-associativity of the algebra of boxes tells us that the order in which the boxes are wired matters.
(ii) This framework gives rise to the fascinating notion of what we call the orbit of a box. The orbit of P ∈ NS is roughly the set of all possible boxes that can be produced by wiring arbitrarily many copies of P. This allows interesting visualizations of the hidden structure of boxes (see Figure 7), and surprisingly we observe that these orbits satisfy strong alignment and parallelism properties as shown in Theorem 10 and Corollary 11. Moreover, we derive the expression of the highest CHSH-valued box of the (tilted) orbit in Theorem 13, which confirms the numerical intuition reported in [20, Appendix B], and for which we derive an insightful linear-time algorithm that is exponentially more efficient than the naive exponential-time computation of the entire orbit. In addition, we recover in Theorem 15 a result similar to that of [20], stating that those methods lead to finding collapsing boxes via the recursive application of the multiplication • ⊠ P on the right.
(iii) We provide algorithms in our GitHub page [9] for the following task: given a box P that we want to show is collapsing, find an appropriate wiring W such that the orbit contains a collapsing box. The idea is to repeat several times in parallel a variant of the Gradient Descent Algorithm in order to find the most appropriate wiring W. These algorithms allow us to recover in Figure 10 new collapsing areas similar to those in [20, Figure 3].
(iv) In Theorems 22 and 28, we show that our framework also allows us to recover some analytical results with a new proof based on the algebra of boxes: some triangles in the boundary of NS are collapsing [11]. To that end, the algorithms of (iii) above were run in order to identify a convenient wiring. Moreover, in Corollary 26, we recover a result from [39] with a new proof, based on communication complexity, showing that the triangle joining PR, P_0, P_1 is a "quantum void".
Further Comparison with Recent Results. A contribution of our work is in providing a new algebraic framework for a unified perspective on three recent results [11,20,39]. Compared to [20], our work concurrently and independently derives the existence of new collapsing boxes using right multiplication (Theorem 15) and we report a similar numerical result (Figure 10). However, as detailed in Remark 18, we note that our method is very different: instead of maximizing each party's half wiring separately, we optimize over the whole set of wirings; this underscores the complementarity of the works. Moreover, as mentioned above, our Theorem 13 analytically confirms a numerical intuition reported in [20, Appendix B]. As for [11], we recover some of their results in Theorems 22 and 28, and although our proof reproduces previously-known analytical collapsing areas, we view our contribution as a new approach based on the algebra of boxes. Regarding [39], we recover one of their results in Corollary 26 with a new proof, based on communication complexity, showing that the triangle joining PR, P_0, P_1 is a "quantum void". In summary, our new algebraic viewpoint unifies the results of [20] (see Subsection 4.1), [11] (see Subsection 4.2), and [39] (see Subsection 4.2).
Structure. This work is divided into four sections, a conclusion, and some appendices:
• Section 1: given a wiring W, we introduce the product of nonlocal boxes P ⊠_W Q and we study the new framework of algebra of boxes.
• Section 2: we define the notion of the orbit of a box P, which consists of all boxes produced using copies of P and the product ⊠ previously defined, and we investigate its surprising geometric structure.
• Section 3: we present algorithms for the following task: given a box P that we want to show is collapsing, find an appropriate wiring W such that the orbit contains a collapsing box. These algorithms are based on gradient descent methods and are entirely accessible via our GitHub page [9].
• Section 4: we find collapsing boxes in two different ways: (i) numerically, using the algorithms of the previous section, and (ii) analytically, using the algebra of boxes and the orbit of a box.
• Conclusion: we conclude with some discussion, open problems, and avenues for future work.
• Appendices: we complete this work with supplementary figures and proofs.

Algebra of Boxes
The set of non-signalling boxes NS is the compact convex subset of the vector space B = {P : {0, 1}^4 → R} satisfying Equations (1), (2), (3). In this section, we propose to endow the vector space B with a multiplication ⊠_W, so that B becomes an algebra that we call algebra of boxes and that we denote B_W. To that end, we recall the notion of wiring W (deterministic then mixed), which, for the sake of simplicity, we define for only two boxes being connected (see Remark 6 for more generality). Then we provide some typical examples of wirings from the literature, and we finally introduce the algebra of boxes and characterize its associativity and commutativity.

Intuition Behind Wirings
Given two non-signalling boxes P and Q, it is possible to build a new box by wiring them together. This notion of wiring has attracted great interest in the last two decades, especially with the following two goals: (i) attempting nonlocality distillation, i.e. we want to build a box that is "strongly nonlocal" starting from some boxes that are "weakly nonlocal" [6,11,12,18,19,20,21,22,24,32]; (ii) finding sets that are closed under wirings, because it is argued that a consistent physical theory should, in principle, be closed under natural simple operations such as wirings [1,6,28,33,34].
As one might guess, a wiring simply connects some outputs to some inputs under some rules, and it applies some pre- and post-processing operations to the carried bits. An example wiring is presented in Figure 2 (a), where the wiring indeed connects some outputs to some inputs, but is counter-intuitive at first, since Alice and Bob do not use their share of the boxes in the same order: while Alice uses P then Q, Bob uses Q then P. This independence of the choice of the box order for each player generalizes quantum mechanics, in the sense that if Alice and Bob were sharing two entangled pairs instead of two nonlocal boxes, Alice would be able to measure her first particle and then the second one, while Bob would be able to do the converse, and they would still receive the outputs "instantaneously". Now, as in the quantum case, Alice receives an answer from the box P instantaneously even if Bob has not yet inputted a bit in his side of P, and she can use the output a_1 as a parametrization for the input x_2 of the box Q; similarly for Bob. This "instantaneous-answer" property of a box is typical of non-signalling correlations, as modelled by Equations (2) and (3) saying that Alice's marginal is independent of Bob's input, and vice-versa. Note that a wiring cannot link Alice's side to Bob's side, nor the opposite, since otherwise it could create a signalling box: there would be communication between parties.

Deterministic Wirings
Two boxes P and Q can be wired as in Figure 2 (b), using functions f_i and g_j depending on the global entries x and y and on the outputs a_k and b_ℓ of the boxes. Nevertheless, to be a valid wiring, the inputs on Alice's side must be in a valid order: the input x_2 of Q can depend on the output a_1 of P only if the input x_1 of P does not depend on the output a_2 of Q; the same holds true on Bob's side. In other words, the functions f_1(x, a_2) and f_2(x, a_1) cannot both depend on a_2 and a_1 respectively for the same value of x, and similarly for g_1(y, b_2) and g_2(y, b_1). These conditions are formalized in (5) and (6) of the following definition:

Definition 1 (Deterministic wiring). A deterministic wiring W between two boxes P, Q ∈ NS consists of six Boolean functions f_1, f_2, g_1, g_2 : {0, 1}^2 → {0, 1} and f_3, g_3 : {0, 1}^3 → {0, 1} satisfying the non-cyclicity conditions:

$$ \forall x, \quad \big( f_1(x, 0) - f_1(x, 1) \big) \big( f_2(x, 0) - f_2(x, 1) \big) = 0, \tag{5} $$

$$ \forall y, \quad \big( g_1(y, 0) - g_1(y, 1) \big) \big( g_2(y, 0) - g_2(y, 1) \big) = 0. \tag{6} $$

Given a wiring W and two boxes P, Q ∈ NS, we obtain a new box that we denote P ⊠_W Q. Formally, this new box is defined as the following conditional probability distribution:

$$ (P \boxtimes_W Q)(a, b \,|\, x, y) := \sum_{a_1, a_2, b_1, b_2} P\big(a_1, b_1 \,|\, f_1(x, a_2), g_1(y, b_2)\big) \times Q\big(a_2, b_2 \,|\, f_2(x, a_1), g_2(y, b_1)\big) \times \mathbb{1}_{a = f_3(x, a_1, a_2)} \times \mathbb{1}_{b = g_3(y, b_1, b_2)}, \tag{7} $$

where the symbol 1 stands for the indicator function, taking value 1 if the indexed condition is satisfied, and 0 otherwise. It is important to specify the condition P, Q ∈ NS, since in that case P ⊠_W Q is indeed a conditional probability distribution as shown in the proof of Fact 3; otherwise, if one only requires P and Q to be conditional probability distributions (not necessarily lying in NS), then it might happen that the product P ⊠_W Q is not a well-defined probability distribution: consider for example P = Q = 1_{a=y} 1_{b=x} with a suitable deterministic wiring.

Definition 2 (Closed under wirings). A set X ⊆ NS is said to be closed under wirings if for all boxes P, Q in X and all wirings W, the new box P ⊠_W Q is in X as well.

For the sake of completeness, we recall the fact that the non-signalling polytope NS is an example of a set that is closed under wirings.

Fact 3. [1] NS is closed under deterministic wirings.
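Proof. We need to show that the box P ⊠_W Q given in (7) is a well-defined conditional probability distribution that satisfies the non-signalling conditions (2) and (3). First, by non-negativity of P and Q, the new box P ⊠_W Q is non-negative as well. Now, fix x, y ∈ {0, 1}. The non-cyclicity conditions (5) and (6) tell us that f_1 or f_2 is constant in the second variable, and similarly for g_1 and g_2. Without loss of generality, up to changing the roles of both f_1, g_1 with respectively f_2, g_2, we only need to consider the following two non-exclusive cases:
• Case 1: the functions f_1(x, a_2) and g_1(y, b_2) are constant in the second variable, and we denote them f_1(x) and g_1(y);
• Case 2: the functions f_2(x, a_1) and g_1(y, b_2) are constant in the second variable, and we denote them f_2(x) and g_1(y).
Below, we write P_A(a_1 | x_1) := Σ_{b_1} P(a_1, b_1 | x_1, y_1) and P_B(b_1 | y_1) := Σ_{a_1} P(a_1, b_1 | x_1, y_1) for the marginals of P, which are well-defined by (2) and (3), and similarly Q_A and Q_B for Q.
In Case 1, we see that coefficients sum to one by normalization of P and Q:

$$ \sum_{a, b} (P \boxtimes_W Q)(a, b \,|\, x, y) = \sum_{a_1, b_1} P\big(a_1, b_1 \,|\, f_1(x), g_1(y)\big) \sum_{a_2, b_2} Q\big(a_2, b_2 \,|\, f_2(x, a_1), g_2(y, b_1)\big) = \sum_{a_1, b_1} P\big(a_1, b_1 \,|\, f_1(x), g_1(y)\big) = 1. $$

In Case 2, using the non-signalling conditions on P and Q, we see that coefficients sum to one again:

$$ \sum_{a, b} (P \boxtimes_W Q)(a, b \,|\, x, y) = \sum_{a_2, b_1} \Big[ \sum_{a_1} P\big(a_1, b_1 \,|\, f_1(x, a_2), g_1(y)\big) \Big] \Big[ \sum_{b_2} Q\big(a_2, b_2 \,|\, f_2(x), g_2(y, b_1)\big) \Big] = \sum_{b_1} P_B\big(b_1 \,|\, g_1(y)\big) \sum_{a_2} Q_A\big(a_2 \,|\, f_2(x)\big) = 1. $$

Hence P ⊠_W Q is a conditional probability distribution. It only remains to check that P ⊠_W Q satisfies the non-signalling conditions (2) and (3). Fix x, a ∈ {0, 1}. In Case 1, we have for all y ∈ {0, 1}:

$$ \sum_{b} (P \boxtimes_W Q)(a, b \,|\, x, y) = \sum_{a_1, a_2, b_1} P\big(a_1, b_1 \,|\, f_1(x), g_1(y)\big)\, Q_A\big(a_2 \,|\, f_2(x, a_1)\big)\, \mathbb{1}_{a = f_3(x, a_1, a_2)} = \sum_{a_1, a_2} P_A\big(a_1 \,|\, f_1(x)\big)\, Q_A\big(a_2 \,|\, f_2(x, a_1)\big)\, \mathbb{1}_{a = f_3(x, a_1, a_2)}, $$

i.e. the result does not depend on y, which means that the marginal in b is well-defined. This is similar in Case 2, changing f_1(x) into f_1(x, a_2) and f_2(x, a_1) into f_2(x). Hence the first non-signalling condition (2) is satisfied, and the other one (3) follows in a similar way.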
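The product of boxes under a deterministic wiring can be sketched in a few lines. The code below is a minimal illustration, assuming the storage convention `box[a, b, x, y] = P(a, b | x, y)` and a wiring given as a tuple of six Python functions `(f1, f2, f3, g1, g2, g3)`; these conventions and the names `box_product`, `is_non_cyclic`, `is_non_signalling` and `ADAPTIVE` are hypothetical. The adaptive wiring used as an example (first box fed with the global inputs, second box fed with the product of the global input and the first output, outputs added modulo 2) is only an illustrative choice.

```python
import itertools
import numpy as np

BITS = (0, 1)

def is_non_cyclic(h1, h2):
    """Non-cyclicity: for each global input z, h1(z, .) or h2(z, .) is constant."""
    return all((h1(z, 0) - h1(z, 1)) * (h2(z, 0) - h2(z, 1)) == 0 for z in BITS)

def box_product(P, Q, wiring):
    """Wire P and Q: sum over internal outputs, route mass to the final outputs."""
    f1, f2, f3, g1, g2, g3 = wiring
    if not (is_non_cyclic(f1, f2) and is_non_cyclic(g1, g2)):
        raise ValueError("wiring violates the non-cyclicity conditions")
    R = np.zeros((2, 2, 2, 2))
    for x, y, a1, a2, b1, b2 in itertools.product(BITS, repeat=6):
        weight = (P[a1, b1, f1(x, a2), g1(y, b2)]
                  * Q[a2, b2, f2(x, a1), g2(y, b1)])
        R[f3(x, a1, a2), g3(y, b1, b2), x, y] += weight
    return R

def is_non_signalling(box, tol=1e-12):
    """Positivity, normalization, and independence of each party's marginal."""
    alice, bob = box.sum(axis=1), box.sum(axis=0)
    return (bool(np.all(box >= -tol))
            and np.allclose(box.sum(axis=(0, 1)), 1.0)
            and np.allclose(alice[:, :, 0], alice[:, :, 1])
            and np.allclose(bob[:, 0, :], bob[:, 1, :]))

# Illustrative adaptive wiring (an assumption of this sketch).
ADAPTIVE = (
    lambda x, a2: x,            # f1: input of P on Alice's side
    lambda x, a1: x & a1,       # f2: input of Q on Alice's side
    lambda x, a1, a2: a1 ^ a2,  # f3: Alice's final output
    lambda y, b2: y,            # g1: input of P on Bob's side
    lambda y, b1: y & b1,       # g2: input of Q on Bob's side
    lambda y, b1, b2: b1 ^ b2,  # g3: Bob's final output
)

PR = np.zeros((2, 2, 2, 2))
for a, b, x, y in itertools.product(BITS, repeat=4):
    if (a ^ b) == (x & y):
        PR[a, b, x, y] = 0.5
I = np.full((2, 2, 2, 2), 0.25)

noisy = 0.8 * PR + 0.2 * I  # a noisy PR box, chosen arbitrarily
product = box_product(noisy, noisy, ADAPTIVE)
print(is_non_signalling(product))
```

A pair such as `f1 = lambda x, a2: a2` and `f2 = lambda x, a1: a1` fails `is_non_cyclic`, since each box would then wait for the other box's output on Alice's side.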

Mixed Wirings
Using local randomness, one can generalize the class of deterministic wirings to the one of mixed wirings.The difference is that the functions f i and g j take values in [0, 1] instead of {0, 1}.
For instance, if f_1(x, a_2) = p ∈ [0, 1] for some fixed bits x and a_2, it means that Alice uses a Bernoulli distribution B(p) to input the bit 1 with probability p, or the bit 0 with probability 1 − p. In other words, we have 32 Bernoulli variables (B_1, ..., B_32), whose parameters are stored in W = (f_1(0, 0), f_1(0, 1), ..., g_3(1, 1, 1)) ∈ R^32, and the box product P ⊠ Q becomes the expected value of the deterministic wirings:

$$ P \boxtimes_W Q := \mathbb{E}\big[ P \boxtimes_{(B_1, \dots, B_{32})} Q \big]. \tag{8} $$

Note that this generalization of wirings does not change the definition of nonlocal boxes: the inputs and outputs of a box are still classical bits, not any real number between 0 and 1. In order to ensure a well-defined local order for both Alice and Bob, we will need to add a dependence relation between the variables B_i, namely the non-cyclicity condition, as for the deterministic wirings:

Definition 4 (Mixed wiring). A mixed wiring W between two boxes P, Q ∈ NS consists of six functions f_1, f_2, g_1, g_2 : {0, 1}^2 → [0, 1] and f_3, g_3 : {0, 1}^3 → [0, 1] satisfying the non-cyclicity conditions (5) and (6). Mixed wirings form a set that we denote W.

The set of mixed wirings W is not convex because the non-cyclicity conditions (5) and (6) are non-affine equalities. For instance, consider the wirings W, W′ with all coefficients 0 except the one corresponding to respectively f_1(0, 0) = 1, f′_2(0, 0) = 1; each of these wirings satisfies the non-cyclicity conditions (5) and (6), but the mean W″ = (W + W′)/2 does not: hence the non-convexity of W. The expression of P ⊠_W Q is the same as in (7), where box entries and indicator functions evaluated at [0, 1]-valued arguments are understood as expectations over the corresponding Bernoulli variables. Using the probabilistic point of view of Equation (8), we see that a mixed wiring is a convex combination of deterministic wirings (the realization of Bernoulli variables being either 0 or 1). Hence, by linearity of the expectation E, we deduce from Fact 3 that the set NS is closed under mixed wirings:

Fact 5. NS is closed under mixed wirings.

Remark 6. Note that the formalism presented here is deliberately not the most general one, since this simpler version is enough to state our results in the next sections. For a more general framework, see [6]. For instance, here we require a deterministic local box order: knowing x, Alice perfectly knows which box she will use first, and similarly for Bob knowing y, but a more general mixed wiring would consist in setting a probability distribution on the different permutations of Alice's boxes and another one on Bob's boxes. In addition, here we only defined wirings of depth 2, but it is possible to have more complex wirings using k nonlocal boxes, thus obtaining a wiring of depth k.
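The non-convexity example can be verified directly. The sketch below assumes that the functions f_1 and f_2 of a mixed wiring are stored as 2×2 arrays indexed by (x, a); the helper `non_cyclicity_defect` is a hypothetical name. It evaluates the left-hand side of the non-cyclicity condition on Alice's side for W, W′ and their mean.

```python
import numpy as np

def non_cyclicity_defect(h1, h2):
    """Left-hand side of the non-cyclicity condition, one entry per global input."""
    return (h1[:, 0] - h1[:, 1]) * (h2[:, 0] - h2[:, 1])

# W: all coefficients zero except f1(0, 0) = 1.
f1_W = np.zeros((2, 2))
f1_W[0, 0] = 1.0
f2_W = np.zeros((2, 2))

# W': all coefficients zero except f2(0, 0) = 1.
f1_Wp = np.zeros((2, 2))
f2_Wp = np.zeros((2, 2))
f2_Wp[0, 0] = 1.0

# Mean W'' = (W + W') / 2.
f1_mean = (f1_W + f1_Wp) / 2
f2_mean = (f2_W + f2_Wp) / 2

print(non_cyclicity_defect(f1_W, f2_W))        # vanishes: W is a valid wiring
print(non_cyclicity_defect(f1_Wp, f2_Wp))      # vanishes: W' is a valid wiring
print(non_cyclicity_defect(f1_mean, f2_mean))  # non-zero at x = 0
```

For the mean, both f_1(0, ·) and f_2(0, ·) are non-constant, so the product of the two differences at x = 0 equals 1/2 · 1/2 = 1/4.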

Typical Examples of Wirings
We now review typical wirings that are studied in the literature.See Figure 3 for an illustration of these wirings.Note that all of these wirings are deterministic wirings.
Example 0. The trivial wiring W_triv is defined as the wiring that does "nothing", in the sense that it outputs exactly the global inputs: (a, b) = (x, y). Similarly, the linear wiring W_lin simply connects the output of a box to the input of the box immediately below.
Example 1. In [22], Forster et al. introduce a wiring W_⊕ in order to distill nonlocality. It consists in setting boxes in parallel and in taking the sum mod 2 of the outputs.
Example 2. In [12], Brunner and Skrzypczyk enhance the wiring from Example 1 in order to obtain a better distillation protocol of nonlocality. Their wiring W_BS is adaptive, in the sense that boxes are no longer in parallel, but the second box's inputs are the respective products of the general inputs x, y with the outputs a_1, b_1 of the previous box. Their new protocol is so powerful that it can reduce the noise of any correlated box (defined as a convex combination of PR and SR) to an arbitrarily low level, so that the PR box is almost perfectly simulated. As a consequence, communication complexity collapses; see the next section for more details.
Example 3. In [1], Allcock et al. study two variants of the previous wirings. First, their "distillation wiring" W_dist is similar to the one in Example 2: it is adaptive and it also gives a good distillation protocol of correlated boxes. Second, their "AND wiring" W_∧ resembles the one in Example 1: boxes are set in parallel, but we take the product of the outputs instead of the sum.
Example 4. In [24], Høyer and Rashid study the depth-k generalizations of the wirings from Examples 1, 2 and 3: they wire k boxes instead of only two. They also give an example of a depth-3 protocol W_depth3 that extends the known region of distillable boxes. Note that in this work, our study is limited to depth-2 wirings.
Example 5. More recently, in [32], Naik et al. defined the "OR-AND wiring" W_∨∧ in order to distill nonlocality of quantum correlations. That wiring is a mix of the ones in Examples 1 and 3: it consists in setting boxes in parallel and in taking the maximum (the "OR") of Alice's outputs and the minimum (the "AND") of Bob's outputs.

Algebra of Boxes Induced by a Wiring
Let B be the vector space of all the functions {0, 1} 4 → R, and consider a mixed wiring W.
As defined in Equation (9), the operation ⊠_W is bilinear, so the vector space B equipped with the product ⊠_W is actually an algebra, which we denote by B_W for that specific wiring W ∈ W. Its dimension is dim B_W = 2^4 = 16. Note that the (affine) dimension of the non-signalling polytope NS ⊆ B_W is dim NS = 16 − 8 = 8 because there are 8 dependent variables in the affine conditions (1), (2), (3); see [4] for a more general expression.
Multiplication Table. In order to better understand the behavior of the box product ⊠, it is interesting to compute the product of some basic boxes: for instance the boxes PR, P_0, P_1, I defined in Equation (4). In Figure 4, we present the multiplication table for the wiring W_BS from [12]. By bilinearity of the box multiplication, this table shows that the convex hull Conv{PR, P_0, P_1} is stable under ⊠. On the contrary, observe that the convex hull Conv{PR, P_0, P_1, I} is not stable under ⊠: the product I ⊠ PR gives Q_1 := (1/4) PR − (1/8) (P_0 + P_1) + I, which is out of the convex hull (nevertheless the affine hull Aff{PR, P_0, P_1, I} is stable under ⊠). Notice that we show in Proposition 21 that actually Conv{PR, P_0, P_1} = NS ∩ Aff{PR, P_0, P_1}. From this table, one may postulate that P_0 is a right identity in the sense that P ⊠ P_0 = P for all P in NS, and it is indeed true as a simple consequence of formula (7). One may similarly verify that I is a right fixed point, in the sense that P ⊠ I = I for all P in NS, as it is possible to guess from the table. See all the multiplication tables of the typical depth-2 wirings in Appendix C.
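Some entries of the multiplication table can be recomputed numerically. The sketch below assumes the storage convention `box[a, b, x, y] = P(a, b | x, y)` and encodes W_BS as described in Example 2, under the additional assumption that the first box receives the global inputs (x, y) and that the final outputs are a_1 ⊕ a_2 and b_1 ⊕ b_2; the function name `product_bs` is hypothetical. It checks the value of I ⊠ PR, the right-identity property of P_0 and the right-fixed-point property of I on an arbitrarily chosen test box.

```python
import itertools
import numpy as np

BITS = (0, 1)

def product_bs(P, Q):
    """Box product for the adaptive wiring W_BS (as assumed in this sketch)."""
    R = np.zeros((2, 2, 2, 2))
    for x, y, a1, a2, b1, b2 in itertools.product(BITS, repeat=6):
        # First box P gets (x, y); second box Q gets (x * a1, y * b1).
        weight = P[a1, b1, x, y] * Q[a2, b2, x & a1, y & b1]
        R[a1 ^ a2, b1 ^ b2, x, y] += weight
    return R

PR = np.zeros((2, 2, 2, 2))
for a, b, x, y in itertools.product(BITS, repeat=4):
    if (a ^ b) == (x & y):
        PR[a, b, x, y] = 0.5
I = np.full((2, 2, 2, 2), 0.25)
P0 = np.zeros((2, 2, 2, 2))
P0[0, 0, :, :] = 1.0
P1 = np.zeros((2, 2, 2, 2))
P1[1, 1, :, :] = 1.0

# Claimed value of I ⊠ PR, which lies outside Conv{PR, P0, P1, I}.
Q1 = 0.25 * PR - 0.125 * (P0 + P1) + I
print(np.allclose(product_bs(I, PR), Q1))

# Right identity and right fixed point, tested on an arbitrary mixture.
test_box = 0.6 * PR + 0.3 * P1 + 0.1 * I
print(np.allclose(product_bs(test_box, P0), test_box))
print(np.allclose(product_bs(test_box, I), I))
```

The same function can be used to compare P_0 ⊠ PR with PR ⊠ P_0, or (P_0 ⊠ P_1) ⊠ PR with P_0 ⊠ (P_1 ⊠ PR), to observe that the order of the factors and the placement of parentheses change the result for this wiring.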

Non-Commutativity and Non-Associativity. A direct consequence of the multiplication table in Figure 4 is that the algebra B_{W_BS} induced by the wiring W_BS is non-commutative (P_0 ⊠ PR ≠ PR ⊠ P_0) and non-associative ((P_0 ⊠ P_1) ⊠ PR ≠ P_0 ⊠ (P_1 ⊠ PR)). This non-associativity is at the root of interesting remarks, see drawings of the orbit of a box in the next section, Figure 7. Similarly, the algebra induced by the wiring W_dist is both non-commutative and non-associative, but on the contrary, the algebras induced by W ∈ {W_triv, W_⊕, W_∧, W_∨∧} are both commutative and associative. One may wonder if there exist induced algebras that are associative but not commutative, and the converse. To that end, here is a characterization of commutativity and associativity in a simple case where boxes are set in parallel and with the same input functions:

Proposition 7 (Characterization of commutativity and associativity). Assume W is a wiring such that f_1 = f_2 = f(x) and g_1 = g_2 = g(y). Then:
(i) The algebra B_W is commutative if and only if f_3 and g_3 are symmetric in the last two variables, in the sense that f_3(x, a_1, a_2) = f_3(x, a_2, a_1) for all x, a_1, a_2, and similarly for g_3.
If in addition f(x) = x and g(y) = y:
(ii) The algebra B_W is associative if and only if f_3(x, f_3(x, a_1, a_2), a_3) = f_3(x, a_1, f_3(x, a_2, a_3)) for all x, a_1, a_2, a_3, and similarly for g_3.

Proof. (i) First, from the expression (7), see that for all bits a, b, x, y and any boxes P, Q in B_W, we have:

$$ (P \boxtimes_W Q - Q \boxtimes_W P)(a, b \,|\, x, y) = \sum_{a_1, a_2, b_1, b_2} P\big(a_1, b_1 \,|\, f(x), g(y)\big)\, Q\big(a_2, b_2 \,|\, f(x), g(y)\big) \Big[ \mathbb{1}_{a = f_3(x, a_1, a_2)} \mathbb{1}_{b = g_3(y, b_1, b_2)} - \mathbb{1}_{a = f_3(x, a_2, a_1)} \mathbb{1}_{b = g_3(y, b_2, b_1)} \Big]. $$

Hence, if f_3 and g_3 are both symmetric in the last two variables, then the difference is null and the algebra is commutative. Conversely, suppose that B_W is commutative, so that the left-hand side is null. Taking probability distributions P and Q that are always positive (such as I), we have that the difference in the right-hand side has to be null for all x, y, a, b, a_1, a_2, b_1, b_2. Fix x, a_1, a_2 and consider a := f_3(x, a_1, a_2), and similarly fix y, b_1, b_2 and consider b := g_3(y, b_1, b_2). We obtain 1 − 1_{a=f_3(x,a_2,a_1)} 1_{b=g_3(y,b_2,b_1)} = 0, which means that both indicator functions are equal to 1, and therefore both subscript equalities hold. Hence, this being true for any fixed x, a_1, a_2 and y, b_1, b_2, we obtain that f_3 and g_3 are symmetric as wanted.
(ii) From (7) again, we have for all bits a, b, x, y and any boxes P, Q, R in B_W:

$$ \big( (P \boxtimes_W Q) \boxtimes_W R - P \boxtimes_W (Q \boxtimes_W R) \big)(a, b \,|\, x, y) = \sum_{\substack{a_1, a_2, a_3 \\ b_1, b_2, b_3}} P(a_1, b_1 \,|\, x, y)\, Q(a_2, b_2 \,|\, x, y)\, R(a_3, b_3 \,|\, x, y) \Big[ \mathbb{1}_{a = f_3(x, f_3(x, a_1, a_2), a_3)} \mathbb{1}_{b = g_3(y, g_3(y, b_1, b_2), b_3)} - \mathbb{1}_{a = f_3(x, a_1, f_3(x, a_2, a_3))} \mathbb{1}_{b = g_3(y, b_1, g_3(y, b_2, b_3))} \Big]. $$

A similar proof with double implication as in (i) applies, hence the associativity criterion follows.
Now, it is easy to build an associative non-commutative induced algebra B_{W′}. Consider the wiring W′ given by f_1(x, a_2) = f_2(x, a_1) = x, and g_1(y, b_2) = g_2(y, b_1) = y, and f_3(x, a_1, a_2) := a_1, and g_3(y, b_1, b_2) := b_1. This wiring satisfies the condition (ii) of the proposition and does not satisfy the condition (i), hence it is as wanted. Conversely, with similar arguments, a commutative non-associative algebra B_{W″} is induced by a wiring W″ defined by the same f_1, f_2, g_1, g_2 and by output functions f_3, g_3 that satisfy the condition (i) but not the condition (ii). Therefore, we obtain the table in Figure 5.
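The two criteria of Proposition 7 are finite checks over truth tables, so they are easy to test by brute force. The sketch below is a hedged illustration: the helper names are hypothetical, and the NAND output function is only one possible example of a symmetric but non-associative choice, not necessarily the one used for W″ in the text.

```python
import itertools

BITS = (0, 1)

def is_symmetric(f3):
    """Criterion (i): f3(x, a1, a2) = f3(x, a2, a1) for all bits."""
    return all(f3(x, a1, a2) == f3(x, a2, a1)
               for x, a1, a2 in itertools.product(BITS, repeat=3))

def is_associative(f3):
    """Criterion (ii): f3(x, f3(x, a1, a2), a3) = f3(x, a1, f3(x, a2, a3))."""
    return all(f3(x, f3(x, a1, a2), a3) == f3(x, a1, f3(x, a2, a3))
               for x, a1, a2, a3 in itertools.product(BITS, repeat=4))

# Candidate output functions (the same function is assumed on Bob's side).
candidates = {
    "xor": lambda x, a1, a2: a1 ^ a2,          # parallel boxes, sum mod 2
    "and": lambda x, a1, a2: a1 & a2,          # parallel boxes, product
    "first": lambda x, a1, a2: a1,             # projection, as in W'
    "nand": lambda x, a1, a2: 1 - (a1 & a2),   # illustrative choice only
}

for name, f3 in candidates.items():
    print(name, is_symmetric(f3), is_associative(f3))
```

Under the hypotheses of the proposition (f(x) = x, g(y) = y, and the same output function on both sides), the projection gives an associative, non-commutative algebra, while NAND gives a commutative, non-associative one.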

Orbit of a Box
In this section, we study the set of all boxes that can be generated given many copies of a starting box P and a wiring W. After introducing the orbit of a box, we provide some consequences for communication complexity. Subsequently, we study a particular example, W_BS, with which we find collapsing boxes in Subsection 4.2, and then we give some general remarks about other orbits. Finally, we conclude this section by giving the technical proof of the theorem stating that the "best" parenthesization is the multiplication on the right.

Definition
Given multiple copies of a non-signalling box P ∈ NS and of a (mixed) wiring W, Alice and Bob can produce many other boxes, e.g.(P ⊠ W P) ⊠ W P or P ⊠ W (P ⊠ W P). All of these new boxes are again non-signalling because NS is closed under wirings, see Lemma 5. We call orbit of the box P (induced by the wiring W) the set of all of these possible new boxes: Orbit W (P) := boxes Q ∈ NS that can be produced by using finitely many times the box P and the wiring W where Orbit (k) (P) is called the orbit of depth k of P (or simply k-orbit), defined as: W (P) := all possible products with k times the term P, using the multiplication ⊠ W .
When the context is clear, we overload the notation and write Orbit and Orbit (k) respectively.In general, these k-orbits are not singletons for k ≥ 3 since the algebra B W induced by W is not necessarily associative and commutative (see Figure 5).Actually, up to multiplicity, the cardinal # Orbit (k) is exactly the number of parenthesizations with k terms, which is the Catalan number , which grows exponentially fast.Here are the 3and 4orbits: Orbit (3) (P) = (P ⊠ P) ⊠ P, P ⊠ (P ⊠ P) , Orbit (4) (P) = (P ⊠ P) ⊠ P ⊠ P, P ⊠ (P ⊠ P) ⊠ P, (P ⊠ P) ⊠ (P ⊠ P), P ⊠ (P ⊠ P) ⊠ P , P ⊠ P ⊠ (P ⊠ P) .
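Note that a k-orbit (k ≥ 2) can be inductively computed using orbits of lower depth:

$$\mathrm{Orbit}^{(k)}(P) = \bigcup_{\ell = 1}^{k-1} \mathrm{Orbit}^{(\ell)}(P) \boxtimes \mathrm{Orbit}^{(k-\ell)}(P),$$

where the product of two sets of boxes denotes the set of all products of an element of the first set with an element of the second one. This is the same recurrence relation as that of Catalan numbers.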
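The inductive description of the k-orbit can be illustrated symbolically. The following is a minimal sketch, not taken from the paper: the function names `orbit_words` and `catalan` are hypothetical, the symbol `*` stands for an unspecified box product, and the code only enumerates parenthesizations (with multiplicity) without computing any actual box.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def orbit_words(k):
    """Return every parenthesization of k copies of P, as strings.

    Mirrors the recurrence: a word of depth k is a product of a word of
    depth left_size with a word of depth k - left_size.
    """
    if k == 1:
        return ("P",)
    words = []
    for left_size in range(1, k):
        for left in orbit_words(left_size):
            for right in orbit_words(k - left_size):
                words.append(f"({left}*{right})")
    return tuple(words)


def catalan(n):
    """n-th Catalan number, C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


for k in range(1, 7):
    # The number of words of depth k should match C_{k-1}.
    print(k, len(orbit_words(k)), catalan(k - 1))
```

Since the number of words grows like a Catalan number, this brute-force enumeration is only practical for small depths.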

Consequences to Communication Complexity
Assume Alice and Bob are given infinitely many copies of a nonlocal box P, and assume they want to distantly compute (in finite time) the value of a Boolean function $f(X, Y)$, where $X, Y \in \{0, 1\}^n$ are strings that are known by Alice and Bob respectively. Among all the possible protocols they can try in order to succeed, they can wire their copies of P in order to produce a "better" box. For example, starting from a noisy box P, Alice and Bob can try to produce a box that is closer to the "perfect box" PR, which satisfies $a \oplus b = xy$ without noise. Such a protocol is called a distillation protocol [12].
Find Collapsing Boxes Using the Orbit. Imagine Alice and Bob are able to produce a collapsing box Q after applying wirings to copies of a starting box P. Then they can use that new box Q to distantly compute the value $f(X, Y)$, which means that they have a protocol to collapse communication complexity and therefore that P is collapsing. This point of view is particularly interesting since it implies that it is sufficient to find a single collapsing box in the union $\bigcup_W \mathrm{Orbit}_W(P)$ to deduce that P is collapsing as well. See an illustration in Figure 6 (a).
Find Collapsing Boxes Using a Cone. Once we find a collapsing box P, we can deduce many other collapsing boxes: there is a convex cone with apex P that is collapsing as well. More precisely, given a box P, denote $C_P$ the convex cone of boxes R for which there exists a local correlation $L \in \mathcal{L}$ and a convex coefficient $\lambda \in [0, 1]$ such that $P = \lambda R + (1 - \lambda) L$. We claim that if P is collapsing, then any $R \in C_P$ is collapsing as well. Indeed, assume Alice and Bob are given copies of a box R. Then, they can use shared randomness to produce the wanted box L and the wanted convex coefficient λ, so that they can generate the box P with the relation $P = \lambda R + (1 - \lambda) L$. Now, as P is collapsing, they have a protocol that collapses communication complexity, hence R is collapsing as well. See an illustration in Figure 6 (b). In the study of collapsing boxes, notice that it is standard to assume that shared randomness is a "free" resource; for instance Brassard et al. made that choice in [10] in their collapsing protocol.
Combining arguments from these last two paragraphs and by the fact that Alice and Bob can make a convex combination of boxes using shared randomness, we deduce a sufficient criterion for a box to collapse communication complexity: If ∃ Q ∈ C that is collapsing, then P is also collapsing.
Figure 6: Orbits that collapse communication complexity.

Case Study: Orbit of W BS
In this subsection, we focus our attention on the wiring $W_{BS}$ inspired by Brunner and Skrzypczyk [12]. Denote ⊠ the corresponding box multiplication. Define the shared randomness box as $SR := (P_0 + P_1)/2$; it is designed to output a pair (a, b) such that a = b, uniformly and independently of the inputs. From the multiplication table in Figure 4, one can see that the 2-dimensional affine space $\mathcal{A} := \mathrm{Aff}\{PR, SR, I\}$ is stable under ⊠. As a consequence, the orbit Orbit(P) of any box P in $\mathcal{A}$ is itself included in $\mathcal{A}$, and as $\mathcal{A}$ is two-dimensional, it is particularly easy to draw the orbit of a box in that case. We represent an orbit in Figure 7. The quantum area $\mathcal{Q}$ (in pink) is drawn using formulas from [30]. Dark green represents the collapsing area that was found by Brassard et al. in [10], which consists of all the boxes with CHSH-value higher than $\frac{3+\sqrt{6}}{6} \approx 0.91$. The orbit is drawn in yellow and orange dots; observe that it intersects the collapsing area in dark green, so Proposition 8 tells us that the starting box P is collapsing. The black circles represent the boxes that were studied in [12], obtained by "pairwise" multiplications: P, P ⊠ P, (P ⊠ P) ⊠ (P ⊠ P), etc. Each iteration is the wiring of two copies of the previous iteration, which gives a subset of our orbit. As displayed in the drawing and detailed in the proof of Theorem 15, our method allows us to find a larger set of collapsing boxes P.
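To make the product ⊠ concrete, here is a minimal sketch that applies a wiring to two boxes stored as tables of probabilities. It assumes the usual form of the distillation protocol of [12]: the first box receives (x, y), the second box receives (x·a1, y·b1), and the outputs are a1 ⊕ a2 and b1 ⊕ b2. The ordering convention and all function names (`box`, `mix`, `bs_product`, `chsh`, `coords`) are assumptions made for illustration, and the structure constants it produces are computed here rather than copied from Figure 4.

```python
from fractions import Fraction as Fr
from itertools import product

BITS = (0, 1)
KEYS = list(product(BITS, repeat=4))  # index order: (a, b, x, y)


def box(rule):
    """Build a box as a dict (a, b, x, y) -> P(a, b | x, y)."""
    return {k: Fr(rule(*k)) for k in KEYS}


PR = box(lambda a, b, x, y: Fr(1, 2) if a ^ b == x * y else 0)
SR = box(lambda a, b, x, y: Fr(1, 2) if a == b else 0)
I = box(lambda a, b, x, y: Fr(1, 4))


def mix(*terms):
    """Affine combination of boxes, given as (weight, box) pairs."""
    return {k: sum(Fr(w) * b[k] for w, b in terms) for k in KEYS}


def bs_product(P, Q):
    """Assumed W_BS product: P gets (x, y), Q gets (x*a1, y*b1),
    and the final outputs are a1 XOR a2 and b1 XOR b2."""
    R = dict.fromkeys(KEYS, Fr(0))
    for x, y, a1, b1, a2, b2 in product(BITS, repeat=6):
        R[(a1 ^ a2, b1 ^ b2, x, y)] += P[(a1, b1, x, y)] * Q[(a2, b2, x * a1, y * b1)]
    return R


def chsh(P):
    """Winning probability of the CHSH game (a XOR b = x*y), uniform inputs."""
    return sum(p for (a, b, x, y), p in P.items() if a ^ b == x * y) / 4


def coords(P):
    """Convex coordinates (c1, c2, c3) in the basis (PR, SR, I).
    Only meaningful for boxes lying in Aff{PR, SR, I}."""
    c3 = 4 * P[(0, 1, 0, 0)]
    c1 = 2 * P[(0, 1, 1, 1)] - c3 / 2
    return (c1, 1 - c1 - c3, c3)


for name, X in (("PR", PR), ("SR", SR), ("I", I)):
    for name2, Y in (("PR", PR), ("SR", SR), ("I", I)):
        print(f"{name} x {name2} ->", coords(bs_product(X, Y)))
```

Exact rational arithmetic is used so that equalities between boxes can be tested without rounding issues.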
By definition of the affine space $\mathcal{A}$, any box $A \in \mathcal{A}$ can be uniquely written as $A = c_1(A)\,PR + c_2(A)\,SR + c_3(A)\,I$ for some real coefficients $c_i(A)$ that sum to 1, called convex coordinates of A in the affine basis {PR, SR, I}. An interesting aspect of convex coordinates is that they give a simple characterization of the lines that are parallel to the diagonal line $L_D := \mathrm{Aff}\{PR, SR\}$: for $A \neq B \in \mathcal{A}$,

$$\mathrm{Aff}\{A, B\} \parallel L_D \iff c_3(A) = c_3(B). \qquad (10)$$

Moreover, in our case, we have an additional interesting property of the third convex coordinate:

Lemma 9. For any $A, B \in \mathcal{A}$, we have $c_3(A \boxtimes B) = c_3(A) + c_3(B) - c_3(A)\,c_3(B)$, or equivalently $1 - c_3(A \boxtimes B) = (1 - c_3(A))(1 - c_3(B))$.

Proof. The multiplication table induced by the wiring $W_{BS}$ [12] is the one displayed in Figure 4, where each cell displays the result of P ⊠ Q. For $A, B \in \mathcal{A}$ whose coefficients $c_i$ are denoted $a_1, a_2, a_3$ and $b_1, b_2, b_3$ for the sake of readability, we use the bilinearity of the product ⊠ to expand $A \boxtimes B$ as a sum of the nine products of basis elements weighted by $a_i b_j$. Hence, using the normalization property of coefficients $\sum_i a_i = \sum_j b_j = 1$, the third coefficient simplifies as $c_3(A \boxtimes B) = a_3 + b_3 - a_3 b_3$.

Now, interestingly, we observe that the points of a given k-orbit are all aligned, and we even know the equation of the line:

Theorem 10 (Alignment). For any k ≥ 1 and $P \in \mathcal{A}$, the points of $\mathrm{Orbit}^{(k)}(P)$ are all aligned on a line $L_k$ whose expression in convex coordinates is given by:

$$L_k := \{ A \in \mathcal{A} : c_3(A) = 1 - (1 - c_3(P))^k \}.$$

Proof. We prove by induction on k ≥ 1 that $\mathrm{Orbit}^{(k)} \subseteq L_k$. For k = 1, the 1-orbit contains only one element, namely P, which obviously satisfies $c_3(P) = 1 - (1 - c_3(P))$, so P indeed belongs to $L_1$. Now, assume the result holds up to some integer k ≥ 1, and let $Q \in \mathrm{Orbit}^{(k+1)}$. By definition, the box Q can be written as $Q = Q_1 \boxtimes Q_2$ with $Q_1 \in \mathrm{Orbit}^{(\ell)}$ and $Q_2 \in \mathrm{Orbit}^{(k+1-\ell)}$ for some $1 \leq \ell \leq k$. By the induction hypothesis, we know that $1 - c_3(Q_1) = (1 - c_3(P))^{\ell}$ and $1 - c_3(Q_2) = (1 - c_3(P))^{k+1-\ell}$. Then using Lemma 9, we obtain:

$$1 - c_3(Q) = (1 - c_3(Q_1))(1 - c_3(Q_2)) = (1 - c_3(P))^{k+1},$$

which means that Q belongs to the line $L_{k+1}$.

As a consequence, we see that all the points of the k-orbit have the same third convex coefficient, so using the equivalence given in Equation (10), we obtain:

Corollary 11 (Parallelism). The supporting lines $L_k$ of all the orbits $\mathrm{Orbit}^{(k)}$ are parallel to the diagonal line $L_D := \mathrm{Aff}\{PR, SR\}$, i.e. $L_k \parallel L_D$ for all k ≥ 1. In particular, all the orbits are parallel to each other: $L_k \parallel L_{k'}$ for all k, k′ ≥ 1.

Moreover, looking closely at the sequence of coefficients $1 - (1 - c_3(P))^k$ and noticing that the diagonal line $L_D$ is defined by the equation $c_3(A) = 0$, we see that:

Corollary 12 (Orbits move to the left). Assume $P \notin L_D$. Then the orbits are more and more distant from the diagonal line as k grows. Moreover, the sequence of lines $(L_k)_k$ tends to the line $L_\infty$ defined by the equation $c_3(A) = 1$, which is exactly the line passing through I and parallel to the diagonal $L_D$.
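The alignment property can be checked numerically on small orbits. The sketch below works directly in convex coordinates (c1, c2, c3). The structure constants in `T` are an assumption: they were obtained by applying the wiring of [12] by hand with the convention "left factor is wired first", and are not copied from the paper's multiplication table. The names `T`, `mul` and `orbit` are hypothetical.

```python
from fractions import Fraction as Fr

HALF, QUARTER = Fr(1, 2), Fr(1, 4)

# Assumed structure constants in the basis (PR, SR, I):
# T[i][j] holds the convex coordinates of X_i * X_j.
T = [
    [(1, 0, 0), (1, 0, 0), (0, 0, 1)],              # PR*PR, PR*SR, PR*I
    [(HALF, HALF, 0), (0, 1, 0), (0, 0, 1)],        # SR*PR, SR*SR, SR*I
    [(QUARTER, -QUARTER, 1), (0, 0, 1), (0, 0, 1)],  # I*PR,  I*SR,  I*I
]


def mul(a, b):
    """Bilinear extension of T to arbitrary convex coordinates."""
    return tuple(
        sum(a[i] * b[j] * T[i][j][n] for i in range(3) for j in range(3))
        for n in range(3)
    )


def orbit(p, k):
    """All products of k copies of p, one per parenthesization."""
    if k == 1:
        return [p]
    return [
        mul(left, right)
        for size in range(1, k)
        for left in orbit(p, size)
        for right in orbit(p, k - size)
    ]


p = (Fr(47, 100), Fr(127, 250), Fr(11, 500))  # an arbitrary box of the plane
for k in range(1, 6):
    third = {q[2] for q in orbit(p, k)}
    # A single value of c3 per depth means the k-orbit lies on one line.
    print(k, third, 1 - (1 - p[2]) ** k)
```

With exact fractions, each depth shows a single value of the third coordinate, equal to 1 − (1 − c3(P))^k.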
It takes a lot of computational time to draw the k-orbits of a box P as k grows, since it requires computing as many elements as the Catalan number $C_{k-1} = \frac{1}{k}\binom{2k-2}{k-1}$, which grows exponentially. However, our goal is not to compute the whole orbit, but simply to determine whether or not the orbit intersects the known collapsing area (dark green). To that end, one may notice that it is enough to compute the "highest" box of each k-orbit in the y-coordinate (see Figure 7) and to check whether those "highest" boxes intersect the collapsing area (dark green area). This is the purpose of the following proposition, which displays a simple expression of the "highest" box of each k-orbit, and which allows much faster tests of whether a box P is collapsing, without computing all the points of the orbit. We prove this result only in a subset of the orbit, which we call the tilted orbit, which is easier to manipulate in inductions, and which is defined by $\widetilde{\mathrm{Orbit}}^{(1)}(P) := \{P\}$ and for k ≥ 2:

$$\widetilde{\mathrm{Orbit}}^{(k)}(P) := \{ Q \boxtimes P,\; P \boxtimes Q \;:\; Q \in \widetilde{\mathrm{Orbit}}^{(k-1)}(P) \}.$$

Note that the cardinality of that set is $\#\widetilde{\mathrm{Orbit}}^{(k+1)} = 2^k$, up to multiplicity. We call CHSH-value the y-coordinate, indicating how "high" a box is:

$$\mathrm{CHSH}(P) := \frac{1}{4} \sum_{a \oplus b = xy} P(a, b \mid x, y).$$

We say that a tilted orbit distills the CHSH-value if it contains a box Q such that CHSH(Q) ≥ CHSH(P).
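In the following theorem, we present the expression of the best parenthesization in terms of CHSH-value, which confirms the numerical intuition reported in [20, Appendix B]:

Theorem 13 (Highest box). Let $P \in \mathcal{A}$ be a box, and let k ≥ 2 be an integer such that the tilted (k − 1)-orbit distills the CHSH-value. Then the highest CHSH-value of $\widetilde{\mathrm{Orbit}}^{(k)}(P)$ is achieved at a box whose expression is the product of k times P on the right:

$$\max_{Q \in \widetilde{\mathrm{Orbit}}^{(k)}(P)} \mathrm{CHSH}(Q) = \mathrm{CHSH}(P^{\boxtimes k}), \quad \text{where } P^{\boxtimes k} := (((P \boxtimes P) \boxtimes P) \cdots ) \boxtimes P.$$

Proof. See Subsection 2.5.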

Conjecture 14 (Dyck paths).
We conjecture that the same result actually holds without the tilde, i.e. the right multiplication $P^{\boxtimes k}$ gives the highest CHSH-value of $\mathrm{Orbit}^{(k)}(P)$, as observed numerically.

An idea of the proof could be to use Dyck paths. Each time we open/close a parenthesis, the path goes up/down respectively, which produces a certain Dyck path. The statement to be proved is that each time we convert a ∨ into a ∧, the CHSH-value is non-decreasing. Then, we would have that the best Dyck path is necessarily the one that always goes up first and then always goes down, which corresponds to the multiplication of boxes on the right.
As previously mentioned, the next theorem is concurrent and independent of the work of [20]:

Theorem 15 (New collapsing boxes). The techniques described in Subsection 2.2 allow the discovery of new collapsing boxes. See new collapsing areas in Figure 10.
Proof. See Figure 7 for an intuition of the proof. Take the starting box P with coordinates (0.627, 0.862) in the affine plane $\mathcal{A} = \mathrm{Aff}\{PR, SR, I\}$, where the coordinate system is given by the CHSH′- and CHSH-values of P. On the one hand, the tilted orbit of P intersects the collapsing area that was found in [10] (in dark green), since for instance $\mathrm{CHSH}(P^{\boxtimes 5}) \approx 0.913 > 0.908 \approx \frac{3+\sqrt{6}}{6}$, so P is collapsing by Proposition 8. On the other hand, this box P does not lie in any of the previously-known collapsing areas from [8,10,11,12,17] (to the best of our knowledge, these five references are the only previous results showing a collapse of communication complexity, in addition to [20] which concurrently and independently found a similar result to ours as mentioned before). Indeed, it is not in the collapsing areas from [10,17] since CHSH(P) = 0.862 < 0.908, nor is it in the collapsing area from [8] since A + B ≈ 14.13 < 16 (using the authors' notation). Nor is the box P in any of the collapsing regions found in [11], since it does not belong to the boundary $\partial\mathcal{NS}$ of the non-signalling set. The last area to check is the one from [12], which was numerically found. From a box P, they define a sequence of boxes using "pairwise" multiplications, $Q_1 := P$ and $Q_{n+1} := Q_n \boxtimes Q_n$, and they check whether or not there exists an integer n such that $\mathrm{CHSH}(Q_n) > \frac{3+\sqrt{6}}{6}$. But, for our starting box P, none of the $Q_n$ satisfy this inequality: indeed, for 1 ≤ n ≤ 5, it is possible to check it by hand, for n = 6 we have $Q_6 \in \mathcal{L}$, and for n ≥ 7 we also have $Q_n \in \mathcal{L}$ since $\mathcal{L}$ is closed under wirings [1]. Hence our example P is a new collapsing box.
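The two numerical claims of this proof can be replayed in a few lines. The sketch below is illustrative only: it converts the coordinates (CHSH′, CHSH) into convex coordinates using the positions PR : (1/2, 1), SR : (3/4, 3/4), I : (1/2, 1/2), and the product `mul` encodes assumed structure constants obtained by applying the wiring of [12] by hand (left factor wired first), not the paper's own code. All function names are hypothetical.

```python
from math import sqrt


def from_xy(x, y):
    """Convex coordinates (c1, c2, c3) from the (CHSH', CHSH) coordinates."""
    c1 = 2 * (y - x)
    c2 = 4 * x - 2
    return (c1, c2, 1 - c1 - c2)


def chsh(c):
    """CHSH-value, using CHSH(PR) = 1, CHSH(SR) = 3/4, CHSH(I) = 1/2."""
    return c[0] + 0.75 * c[1] + 0.5 * c[2]


def mul(a, b):
    """Assumed W_BS product in convex coordinates (inputs sum to 1)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    c1 = a1 * (b1 + b2) + a2 * b1 / 2 + a3 * b1 / 4
    c2 = a2 * (b1 / 2 + b2) - a3 * b1 / 4
    c3 = a3 + b3 - a3 * b3
    return (c1, c2, c3)


threshold = (3 + sqrt(6)) / 6
p = from_xy(0.627, 0.862)

right = [p]  # P, (P*P), ((P*P)*P), ...: multiplication on the right
pairwise = [p]  # Q_1 = P, Q_{n+1} = Q_n * Q_n
for _ in range(4):
    right.append(mul(right[-1], p))
    pairwise.append(mul(pairwise[-1], pairwise[-1]))

print("threshold      :", round(threshold, 4))
print("right powers   :", [round(chsh(q), 4) for q in right])
print("pairwise powers:", [round(chsh(q), 4) for q in pairwise])
```

In this run the right powers cross the threshold while the first five pairwise iterates stay below it, in line with the argument above.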

Some Other Orbits
In the previous subsection, we studied specifically the orbit of the wiring $W_{BS}$ in the slice of $\mathcal{NS}$ passing through the boxes PR, SR, I. Here we comment on some examples of other orbits in three different ways: (i) it is possible to study the same wiring $W_{BS}$ but in different slices of $\mathcal{NS}$; (ii) it is possible to study a wiring other than $W_{BS}$ but to keep the same slice as in the previous subsection; (iii) it is possible to change both the wiring and the slice. See Appendix A for many drawings.
(i) We keep the wiring $W_{BS}$ and we consider the slice of $\mathcal{NS}$ passing through PR, $P_0$, $P_1$. Notice that we prove in Proposition 21 that this slice is actually precisely the convex hull of the three points. We draw two examples of such an orbit in Figure 8, with two different starting boxes. We observe that both of them seem to recover the alignment and parallelism properties that we showed in Theorem 10 and its corollary. Again, these lines all seem parallel to what we previously called the diagonal line $L_D$, which is defined as the line passing through PR and $SR = \frac{1}{2}(P_0 + P_1)$. Notice that we show in Theorem 22 that all the boxes of this triangle are actually collapsing, except the ones in the segment $\mathrm{Conv}\{P_0, P_1\}$, drawn in pink.
(ii) Among the "typical" wirings defined in Subsection 1.4, the only ones that stabilize the plane Aff{PR, SR, I} are $W_\oplus$ and $W_{BS}$, see Appendix C. This is why, for these two wirings, we can conveniently draw the orbits in a plane. The orbit of $W_\oplus$ is drawn in Appendix A (a). We observe that each k-orbit contains only one element, which is not surprising since we know from Figure 5 that its induced algebra is associative, meaning that the choice of parenthesization does not lead to a different result. In the same appendix, we also draw the orbit for three other wirings. Surprisingly, we observe that the three new orbits look the same as the orbit of $W_{BS}$.
(iii) We add the slice PR, $P_0$, $P_1$ for each wiring of Appendix A. We observe that the alignment and parallelism properties still seem to hold in those cases, as before. Moreover, we see that the example (d) distills the CHSH-value better than the other examples in that slice.

Proof of Theorem 13
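Recall that $SR := (P_0 + P_1)/2$ is the shared randomness box. Given a non-signalling box $P \in \mathcal{NS}$, its CHSH- and CHSH′-values are defined as follows:

$$\mathrm{CHSH}(P) := \frac{1}{4} \sum_{a \oplus b = xy} P(a, b \mid x, y), \qquad \mathrm{CHSH}'(P) := \frac{1}{4} \sum_{a \oplus b = (x \oplus 1)(y \oplus 1)} P(a, b \mid x, y).$$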
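For example, we have CHSH(PR) = 1, CHSH(SR) = 3/4 and CHSH(I) = 1/2. Denote $\mathcal{A}$ the affine space $\mathcal{A} := \mathrm{Aff}\{PR, SR, I\}$, and denote $\tilde{\mathcal{A}} \subseteq \mathcal{A}$ the set of boxes P in the convex hull Conv{PR, SR, I} whose CHSH-value is ≥ 3/4. We will prove our results in $\tilde{\mathcal{A}}$; by symmetry of the problem, similar results also hold in other areas, such as $2I - \tilde{\mathcal{A}}$, the reflection of $\tilde{\mathcal{A}}$ through I.

Lemma 16 (Multiplying by P preserves the CHSH-value order). Let $P \in \tilde{\mathcal{A}}$, and let $Q \neq R \in \mathcal{A}$ such that the line Aff{Q, R} is parallel to the diagonal line $L_D := \mathrm{Aff}\{PR, SR\}$. We have:

$$\mathrm{CHSH}(Q) \geq \mathrm{CHSH}(R) \;\Longrightarrow\; \mathrm{CHSH}(Q \boxtimes P) \geq \mathrm{CHSH}(R \boxtimes P) \;\text{ and }\; \mathrm{CHSH}(P \boxtimes Q) \geq \mathrm{CHSH}(P \boxtimes R).$$

Proof. As the box P lies in $\tilde{\mathcal{A}}$, it is of the form $P = p_1 PR + p_2 SR + (1 - p_1 - p_2) I$ for some coefficients $p_1, p_2 \geq 0$ such that $p_1 + p_2 \leq 1$. Rewrite it as the column vector $P = \binom{p_1}{p_2}$, and similarly denote $Q = \binom{q_1}{q_2}$ and $R = \binom{r_1}{r_2}$ for some coefficients $q_i, r_j \in \mathbb{R}$. By the parallelism assumption, the vectors Q − R and PR − SR have to be collinear, i.e. there must exist some $\lambda \in \mathbb{R}$ such that $Q - R = \lambda (PR - SR)$, so we may rewrite the second coefficient of R as $r_2 = q_1 + q_2 - r_1$. With this notation, we can use the linearity of the function CHSH(·) to see that the condition CHSH(Q) ≥ CHSH(R) simplifies to $q_1 - r_1 \geq 0$. Now, using the multiplication table from Figure 4 and bilinearity of ⊠, we may compute the differences $\mathrm{CHSH}(Q \boxtimes P) - \mathrm{CHSH}(R \boxtimes P)$ and $\mathrm{CHSH}(P \boxtimes Q) - \mathrm{CHSH}(P \boxtimes R)$: both are equal to $q_1 - r_1$ multiplied by a non-negative combination of the coefficients of P, which gives the desired result.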
Lemma 17 (Right multiplication gives better CHSH-value). For any $P \in \tilde{\mathcal{A}}$ and $Q \in \mathrm{Orbit}(P)$, we have:

$$\mathrm{CHSH}(Q) \geq \mathrm{CHSH}(P) \;\Longrightarrow\; \mathrm{CHSH}(Q \boxtimes P) \geq \mathrm{CHSH}(P \boxtimes Q).$$

Proof. Use the coordinate system (x, y) given by the CHSH′- and CHSH-values respectively in order to write P and Q with coordinates $(x_P, y_P)$ and $(x_Q, y_Q)$. For instance we have PR : (1/2, 1), SR : (3/4, 3/4) and I : (1/2, 1/2). Use the multiplication table from Equation (11) and apply the bilinearity of ⊠ in order to express the difference $f_P(x_Q, y_Q) := \mathrm{CHSH}(Q \boxtimes P) - \mathrm{CHSH}(P \boxtimes Q)$ in these coordinates. For any fixed $P \in \tilde{\mathcal{A}}$, we want to show that $f_P(x_Q, y_Q) \geq 0$. By construction, we know that $P \in L_1$ and $Q \in L_k$ for some k ≥ 1, so by Corollary 12 we have $x_Q + y_Q \leq x_P + y_P$, which we may rewrite as $x_Q \leq x_P + y_P - y_Q$. As P lies in $\tilde{\mathcal{A}}$, we have $y_P \geq 3/4$, so the first partial derivative is non-positive, $\partial f_P / \partial x_Q \leq 0$, which means that the function $f_P(\cdot, y_Q)$ is non-increasing over $\mathbb{R}$ for any fixed $y_Q$. It yields the following inequalities:

$$f_P(x_Q, y_Q) \geq f_P(x_P + y_P - y_Q,\, y_Q) \geq 0,$$

where the middle quantity factors as a product of two non-negative factors: the first one is non-negative using the hypothesis CHSH(Q) ≥ CHSH(P), and the second one is non-negative using $x_P \geq 1/2$ and $y_P \geq 3/4$ since $P \in \tilde{\mathcal{A}}$. Hence $f_P$ is non-negative and we obtain the wanted result.
Recall that the set $\widetilde{\mathrm{Orbit}}^{(k)}(P)$ is called the tilted k-orbit of the box P and contains some boxes Q that are generated by applying a wiring to copies of P. We say that this tilted k-orbit distills the CHSH-value if it contains a box Q such that CHSH(Q) ≥ CHSH(P). In that distilling scenario, we can compute the expression of a box achieving the best CHSH-value:

Statement (Theorem 13). Let $P \in \mathcal{A}$ be a box, and let k ≥ 2 be an integer such that the tilted (k − 1)-orbit distills the CHSH-value. Then the highest CHSH-value of $\widetilde{\mathrm{Orbit}}^{(k)}(P)$ is achieved at a box whose expression is the product of k times P on the right, $P^{\boxtimes k} := (((P \boxtimes P) \boxtimes P) \cdots) \boxtimes P$.

Proof. We prove the result by induction on k ≥ 2. It is obviously true for k = 2 since $\widetilde{\mathrm{Orbit}}^{(2)}(P)$ only contains P ⊠ P. Now, fix k ≥ 2 and assume $\mathrm{CHSH}(P^{\boxtimes k}) \geq \mathrm{CHSH}(Q)$ for any Q in the tilted k-orbit (induction hypothesis). Assume as well that $\mathrm{CHSH}(P^{\boxtimes k}) \geq \mathrm{CHSH}(P)$ (distillation hypothesis). We want to show that:

$$\mathrm{CHSH}(P^{\boxtimes k+1}) \geq \mathrm{CHSH}(Q \boxtimes P) \quad \text{and} \quad \mathrm{CHSH}(P^{\boxtimes k+1}) \geq \mathrm{CHSH}(P \boxtimes Q)$$

for all Q in the tilted k-orbit. The first inequality follows from Lemma 16 using the relation $P^{\boxtimes k+1} = P^{\boxtimes k} \boxtimes P$ and the induction hypothesis. For the other inequality, start from $\mathrm{CHSH}(P^{\boxtimes k+1}) = \mathrm{CHSH}(P^{\boxtimes k} \boxtimes P)$ and apply Lemma 17 (with the distillation hypothesis) in order to get $\geq \mathrm{CHSH}(P \boxtimes P^{\boxtimes k})$. Then conclude using Lemma 16 and the induction hypothesis in order to obtain $\geq \mathrm{CHSH}(P \boxtimes Q)$ for any Q in the tilted k-orbit.
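As a sanity check of the statement, one can enumerate small tilted orbits exactly and compare their best CHSH-value with that of the right product. This is a sketch under assumptions: the tilted orbit is taken to be built by multiplying the previous tilted orbit by P on either side, the product `mul` uses structure constants derived by hand from the wiring of [12] (left factor wired first), and the names `tilted_orbit` and `right_power` are hypothetical.

```python
from fractions import Fraction as Fr


def from_xy(x, y):
    """Convex coordinates in (PR, SR, I) from the (CHSH', CHSH) coordinates."""
    c1 = 2 * (y - x)
    c2 = 4 * x - 2
    return (c1, c2, 1 - c1 - c2)


def chsh(c):
    return c[0] + Fr(3, 4) * c[1] + c[2] / 2


def mul(a, b):
    """Assumed W_BS product in convex coordinates (inputs sum to 1)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (
        a1 * (b1 + b2) + a2 * b1 / 2 + a3 * b1 / 4,
        a2 * (b1 / 2 + b2) - a3 * b1 / 4,
        a3 + b3 - a3 * b3,
    )


def tilted_orbit(p, k):
    """Boxes obtained by repeatedly multiplying by p on the right or left."""
    if k == 1:
        return [p]
    previous = tilted_orbit(p, k - 1)
    return [mul(q, p) for q in previous] + [mul(p, q) for q in previous]


def right_power(p, k):
    """Product of k copies of p, always multiplying on the right."""
    q = p
    for _ in range(k - 1):
        q = mul(q, p)
    return q


p = from_xy(Fr(627, 1000), Fr(862, 1000))
for k in range(2, 6):
    best = max(chsh(q) for q in tilted_orbit(p, k))
    print(k, float(best), float(chsh(right_power(p, k))))
```

For this starting box the two printed columns coincide at every depth tested, which is what the theorem predicts as long as the distillation hypothesis holds.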

Numerical Optimization on the Set of Wirings
We saw in the previous section that, given a non-signalling box P, there may exist a wiring W that sufficiently distills the box P in order to collapse communication complexity. The question we address in this section is the following: if the box P is fixed, how can we find a wiring W good enough to collapse communication complexity (when it is possible)? The difficulty is that, for each input x, y ∈ {0, 1}, there are 82 possible deterministic wirings [42], leading to a total number of $82^4 \approx 4.5 \times 10^7$ possible deterministic wirings. So a naive discrete optimization over deterministic wirings seems inefficient, in addition to being ill-adapted to mixed wirings. To that end, we present two optimization algorithms: (i) an algorithm that tests many different combinations of wirings and that is suitable for numerical simulations, and (ii) another one that finds a "uniform" collapsing wiring W in a whole region of boxes, which is appropriate for deriving an analytical proof (see the next section). This section might be skipped at first reading as it is more technical. See our GitHub page for the details of the algorithms [9].
Remark 18 (Comparison with [11,20]). We now compare and contrast our methods with two recent works that also study optimization over wirings:
(i) In [11], the authors suggest reducing the $82^4 \approx 4.5 \times 10^7$ possible deterministic wirings for Alice and Bob to only 3152 by simply considering the ones that preserve the PR box, i.e. wirings W such that $PR \boxtimes_W PR = PR$, and then doing a discrete optimization over that smaller set. This smaller set encompasses for instance the wirings $W_{BS}$, $W_{dist}$ but discards $W_\oplus$, $W_\wedge$, $W_{\vee\wedge}$ (see definitions in Subsection 1.4). This technique allows them to analytically prove that many new areas of boxes are collapsing.
(ii) In [20], the authors mix two types of algorithms: one for Bob, and then one for Alice. First, they fix Alice's half-wiring and compute a discrete optimization over the $82^2 = 6724$ possible deterministic wirings on Bob's side. Then they fix Bob's half-wiring and apply linear programming to optimize Alice's half-wiring. This allows them to numerically find new collapsing boxes.
(iii) In our work, we use an efficient variant of the Gradient Descent algorithm, based on Line Search methods, frequent resets and parallel descents. A limitation in the method from [11] could come from the fact that many wirings are discarded, and a limitation in [20] could come from the fact that the "best" wiring for the pair (Alice, Bob) might be better than the best one for Bob composed with the best one for Alice. This is why we choose to take our feasible set to be the entire set of mixed wirings $\mathcal{W} \subseteq [0, 1]^{32}$. This way, we recover both the numerical results of [20] (see Subsection 4.1) and the analytical results of [11] (see Subsection 4.2).

Goals of the Algorithms
Task A. In order to prove that a box P is collapsing, a particular case of Proposition 8 says that it is enough to find a finite sequence of wirings $(W_1, \ldots, W_N)$ such that the following box is collapsing:

$$(((P \boxtimes_{W_1} P) \boxtimes_{W_2} P) \cdots ) \boxtimes_{W_N} P.$$

Note that we need to specify the parenthesization because the different products $\boxtimes_{W_i}$ are potentially non-associative. Among the numerous possibilities, we choose the parenthesization on the left because it is easy to implement and because it is the best one when the wiring is $W_{BS}$, see Theorem 13. This algorithm consists in an iterative construction of the sequence $(W_i)_i$: first, find a wiring $W_1$ such that the CHSH-value of the box $P_2 := P \boxtimes_{W_1} P$ is high enough, then find $W_2$ such that the CHSH-value of the box $P_3 := P_2 \boxtimes_{W_2} P$ is high enough, and so on until the N-th iteration. If the CHSH-value of the box $P_{N+1}$ is above the threshold $\frac{3+\sqrt{6}}{6} \approx 0.91$, we know that communication complexity collapses [10], so the starting box P is collapsing as well. Otherwise, we cannot conclude whether P is collapsing or not.
Task B. The goal of this algorithm is essentially the same as the first one, but we add a strong constraint: we want all the $W_i$ to be the same wiring W, so that the box under study is the right power $P^{\boxtimes_W N}$. In that sense, this is a "uniform" version of the first algorithm. The interest of this algorithm is that it helps to give analytical proofs (see Section 4): if the value $\mathrm{CHSH}(P^{\boxtimes_W N})$ is above the threshold ≈ 0.91 for some N, then by continuity of $\boxtimes_W$, there is an open neighborhood around P such that for any Q close enough to P we also have that $\mathrm{CHSH}(Q^{\boxtimes_W N})$ is above the threshold, and therefore the whole neighborhood of P is collapsing. This technique will help to discover wide collapsing areas.

Toy Example (N = 1)
In this subsection, we treat the case when there is only one product $\boxtimes_W$ between two boxes $Q, P \in \mathcal{NS}$. We detail the maximization algorithm we use: a Projected Gradient Descent. The optimization problem consists in finding $W^*$ as follows:

$$W^* \in \underset{W \in \mathcal{W}}{\mathrm{argmax}}\; \Phi(W),$$

where the objective function is $\Phi(W) := \mathrm{CHSH}(Q \boxtimes_W P)$ for some fixed non-signalling boxes Q, P, and where $\mathcal{W}$ is the set of mixed wirings introduced in Definition 4, which we recall below.
The Constraint $W \in \mathcal{W}$. Recall that a mixed wiring W between two boxes $Q, P \in \mathcal{NS}$ is the data of six functions $f_1, f_2, g_1, g_2 : \{0,1\}^2 \to [0,1]$ and $f_3, g_3 : \{0,1\}^3 \to [0,1]$. Recall that the corresponding diagram can be found in Figure 2 (b), and that mixed wirings form a set that we denote $\mathcal{W}$. In our algorithms, we view W as a real vector with $4 \times 2^2 + 2 \times 2^3 = 32$ variables. This vector stores each value of each function. In order to satisfy the normalization constraint that the $f_i, g_j$ take values in [0, 1], and the non-cyclicity conditions (13) and (14) (which are non-linear conditions), we implement a projection function $\mathrm{proj} : \mathbb{R}^{32} \to \mathbb{R}^{32}$ in Algorithm 1. Notice that our real code is written in a vectorized fashion and is difficult to read as such, so we only present the idea here. Moreover, we use the package PyTorch [36] for automatic differentiation.
Algorithm 1: Projection function proj on the feasible set W. Vectorized version in our GitHub page [9].

Naive Gradient Descent
In order to gain insight into the complexity of the optimization problem, we begin by studying a basic algorithm, the Projected Gradient Descent, with a small learning rate (α ≪ 1) and a lot of iterations (K ≫ 1). We obtain a histogram of the frequency of the different results, see Figure 9 (a).
Projected Gradient Descent. We implement a "projected" version of the Gradient Descent algorithm in order to satisfy the constraint $W \in \mathcal{W}$ at each step. It simply means that each iterate is projected on the feasible set:

$$W_{k+1} := \mathrm{proj}\big(W_k + \alpha \nabla \Phi(W_k)\big),$$

where α ∈ ℝ is the learning rate (the sign of the step is chosen so as to increase Φ, since we are maximizing). Our implementation can be found in Algorithm 2. We compute the gradient of the objective function using the automatic differentiation Python package torch.autograd, which provides us with the commands backward and grad. As we do not have a good intuition of what could be a good wiring W in $\mathcal{W}$ to start with given a fixed box P, we take a random initialization: $W_0$ is uniformly generated in the hypercube $[0, 1]^{32}$. As such, the vector $W_0$ is not necessarily a well-defined mixed wiring since it does not necessarily satisfy the non-cyclicity conditions (13) and (14), but this problem is fixed after one iteration of the Projected Gradient Descent algorithm since the wiring is then projected. Alternatively, one can also directly apply proj to $W_0$. The notation $W \sim \mathcal{U}(X)$ means that we uniformly generate W in the set X.
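The projected update can be sketched on a toy problem. The example below is a simplified stand-in, not the paper's Algorithm 2: the projection only clips each coordinate to [0, 1] (the non-cyclicity conditions are omitted), the objective is a hypothetical concave function with a hand-written gradient instead of the CHSH objective differentiated with PyTorch, and all names are illustrative.

```python
import random


def proj(w):
    """Stand-in projection: clip every coordinate to [0, 1].
    The paper's projection also enforces non-cyclicity; omitted here."""
    return [min(1.0, max(0.0, wi)) for wi in w]


def projected_gradient_ascent(grad, w0, alpha=0.01, iters=10_000, tol=1e-6):
    """Iterate w <- proj(w + alpha * grad(w)) until the step is below tol."""
    w = proj(w0)
    for _ in range(iters):
        g = grad(w)
        w_next = proj([wi + alpha * gi for wi, gi in zip(w, g)])
        if max(abs(a - b) for a, b in zip(w, w_next)) < tol:
            return w_next
        w = w_next
    return w


# Hypothetical toy objective: Phi(w) = -sum_i (w_i - t_i)^2.
# Two targets lie outside [0, 1], so the projection is active at the optimum.
TARGET = [0.3, 1.4, -0.2]


def toy_grad(w):
    return [-2.0 * (wi - ti) for wi, ti in zip(w, TARGET)]


random.seed(0)
w0 = [random.random() for _ in TARGET]  # uniform start in the hypercube
w_out = projected_gradient_ascent(toy_grad, w0)
print(w_out)
```

On this concave toy problem every starting point reaches the same maximizer, unlike the actual wiring objective, where the starting point matters.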
Estimating the Proportion of "Good" Outputs. We use Algorithm 2 with a learning rate α = 0.01, a number of iterations of $K = 10^6$, a tolerance of $\varepsilon = 10^{-6}$, and we obtain the histogram presented in Figure 9 (a). Recall that the objective function is $\Phi(W) := \mathrm{CHSH}(Q \boxtimes_W P)$; this histogram is drawn with Q = P = pPR + qSR + (1 − p − q)I, where p = 0.39 and q = 0.6. The number of reruns is $m = 10^3$, done simultaneously in parallel, which is faster than doing m descents one after another. We observe that the results concentrate on certain discrete values. These values correspond to different attractive points in different basins of attraction (recall that the initial W is taken uniformly at random in $[0, 1]^{32}$). As we want to maximize Φ, we are interested in the highest concentrated value ≈ 0.87. In that example, we observe that the proportion of starting wirings whose output $W_{out}$ reaches this highest value is only $\chi \approx 2/10^3 = 0.2\%$ using this basic Gradient Descent algorithm. This information tells us that the function Φ is difficult to maximize, which is why we present a more efficient algorithm in the following subsection.

More Efficient Algorithm: Line Search with Resets
In this subsubsection, we present a variant of the Gradient Descent algorithm called Line Search, which we enhance with frequent resets of bad outcomes. See [35] for a standard reference book in numerical optimization. The idea of this algorithm is, instead of always keeping the same α, to estimate the best coefficient $\alpha_k$ at each step of the descent. As we observed in the previous subsection, the proportion χ of "good" starting wirings is very small, which is why we apply frequent resets: we do $m = 10^3$ descents in parallel but only $K_{reset} = 100$ steps, then we keep only the best m · χ wirings and we reset all the others to a new random initialization. Then we repeat that procedure but we reset fewer wirings (say, at the j-th repetition, keep for instance the best j · m · χ wirings), and we repeat this procedure 1/χ times. In the end, most of the wirings should be in the good basin of attraction, so we can apply one final run of line search, with many more steps so that it converges to the attractor. See Algorithm 3; the results are displayed in Figure 9 (b).
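The combination of line search and resets can be illustrated on a toy multimodal problem. This is a simplified sketch and not the paper's Algorithm 3: the step size is picked from a small fixed grid, a constant fraction of walkers is kept at each round (instead of a growing one), the feasible set is the plain hypercube, and the objective, its gradient and all names are hypothetical.

```python
import random


def clip(w):
    return [min(1.0, max(0.0, wi)) for wi in w]


def line_search_step(phi, grad, w, alphas=(1.0, 0.3, 0.1, 0.03, 0.01)):
    """Try several step sizes along the gradient; keep the best improvement."""
    g = grad(w)
    candidates = [clip([wi + a * gi for wi, gi in zip(w, g)]) for a in alphas]
    best = max(candidates, key=phi)
    return best if phi(best) > phi(w) else w


def search_with_resets(phi, grad, dim, m=50, k_reset=20, rounds=5, keep=0.2, seed=0):
    """Run m short searches, keep the best ones, re-draw the others, repeat."""
    rng = random.Random(seed)

    def fresh():
        return [rng.random() for _ in range(dim)]

    walkers = [fresh() for _ in range(m)]
    n_keep = max(1, int(keep * m))
    for _ in range(rounds):
        for _ in range(k_reset):
            walkers = [line_search_step(phi, grad, w) for w in walkers]
        walkers.sort(key=phi, reverse=True)
        walkers = walkers[:n_keep] + [fresh() for _ in range(m - n_keep)]
    best = max(walkers, key=phi)
    for _ in range(10 * k_reset):  # final, longer run on the best walker
        best = line_search_step(phi, grad, best)
    return best


# Hypothetical toy objective with 2^dim local maxima at the corners of the
# hypercube; the global maximum (value 0.36 * dim) is at (1, ..., 1).
def toy_phi(w):
    return sum((wi - 0.4) ** 2 for wi in w)


def toy_grad(w):
    return [2.0 * (wi - 0.4) for wi in w]


w_best = search_with_resets(toy_phi, toy_grad, dim=4)
print(w_best, toy_phi(w_best))
```

Only a fraction of the random starting points lie in the basin of the global maximum here, which is the situation the resets are meant to address.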

Task A
Algorithm A is presented in Algorithm 4; it simply consists in applying the toy case Q ⊠ P from the previous subsection recursively N times. We want to find a sequence of wirings $W_1, \ldots, W_N$ such that the CHSH-value of the resulting box is above the following threshold: $\frac{3+\sqrt{6}}{6} \approx 0.91$.
A consequence is that the box P collapses communication complexity [10]. Notice that for some boxes $P \in \mathcal{NS}$, it might not be possible to find such a sequence of wirings because it is impossible to distill them by any means. This algorithm is used in Subsection 4.1 in order to plot the new regions of collapsing nonlocal boxes.

For Task B, the corresponding algorithm involves an integer M, which is a hyper-parameter. Typically, we take M ≤ N because it is a lot faster to evaluate the N-th power of P than to optimize the N-th power of P. See the details in Algorithm 5.

New Collapsing Boxes
In this section, we present collapsing boxes found in two different ways: (i) first with a numerical approach, using the algorithms (Section 3); (ii) then with an analytical approach, using the algebra of boxes (Section 1) and the orbit of a box (Section 2).

Numerical Results
Using Algorithm 4, which addresses Task A, we obtain many collapsing boxes. Some samples are drawn in Figure 10 on some slices of the non-signalling set $\mathcal{NS}$, but note that this algorithm also applies more generally to any desired slice. As previously mentioned, this work is concurrent and independent of the work of [20]. In the drawings, some boxes are denoted $P_L$ and $P_{NL}$; let us recall their definition here. The local set $\mathcal{L}$ and the non-signalling set $\mathcal{NS}$ are polytopes, i.e. the convex hull of a finite number of extremal points. The first set $\mathcal{L}$ admits exactly 16 extremal points, called local extreme points and denoted $P_L^{\mu,\nu,\sigma,\tau}$, where µ, ν, σ, τ ∈ {0, 1}. These 16 points are also extremal points of $\mathcal{NS}$, together with 8 additional extremal points, called non-local extreme points and denoted $P_{NL}^{\mu,\nu,\sigma}$. They are defined as follows [2,4]:
• Local extremal points: $P_L^{\mu,\nu,\sigma,\tau}(a, b \mid x, y) := 1$ if $a = \mu x \oplus \nu$ and $b = \sigma y \oplus \tau$, and 0 otherwise.
• Nonlocal extremal points: $P_{NL}^{\mu,\nu,\sigma}(a, b \mid x, y) := 1/2$ if $a \oplus b = xy \oplus \mu x \oplus \nu y \oplus \sigma$, and 0 otherwise.
Note that $PR = P_{NL}^{000}$, $P_0 = P_L^{0000}$ and $P_1 = P_L^{0101}$ (we remove the commas in the superscripts for the sake of simplicity of notation).
Observe in Figure 10 that, depending on the chosen slice, the collapsing area does not always have the same "shape" nor the same "area". Moreover, notice in the graphs that there seems to exist a collapsing area in the neighborhood below the diagonal segments joining PR and respectively SR, $P_0$, $P_1$. This is actually true. Indeed, we analytically show below in Theorem 22 that those three segments are collapsing, and we also know that the box product $P \boxtimes_W Q$ is continuous in P and Q for any W (it is even bilinear, recall the expression in (9)), so distillation protocols are continuous and in some sense the orbits are also "continuous"; hence there exists an open neighborhood below these diagonal segments that collapses communication complexity.
Remark 20 (Continuous extension of a finite collapsing set). The algorithm only provides us with finitely many collapsing boxes, but we can still deduce a continuous "extension" of that collapsing set. Indeed, as explained in Subsection 2.2, if we know that a box $P \in \mathcal{NS}$ collapses communication complexity, then we also know that the cone $C_P$ introduced there is collapsing.

Figure 10: New collapsing boxes (see Theorem 15). The three graphs have the same color legend, displayed at the center, and they are all configured with the same algorithm parameters ($K_{reset}$, χ, m, M, N), detailed at the top. We adopt the following convention: (i) boxes that are numerically determined are drawn with dots, (ii) boxes that are analytically determined are drawn in plain regions (there exist explicit equations describing those regions). Notice that the left drawing is similar to [20, Figure 3], which was found using a different algorithm as detailed in Remark 18. The quantum set $\mathcal{Q}$ (in pink) is drawn using formulas from [30].

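The Triangle PR, P 0 , P 1 is Collapsing
In this subsection, we extend the result [12] from Brunner and Skrzypczyk, who showed that any box in the segment joining the boxes PR and $SR := (P_0 + P_1)/2$ is collapsing (except the box SR, which is classical). In the following theorem, we recover a result from [11] stating that any box in the triangle joining the boxes PR, $P_0$, $P_1$ is collapsing (except the boxes in the segment joining $P_0$ and $P_1$, which are classical), with a new proof, based on the algebra of boxes. Recall that PR is the non-signalling box that outputs (a, b) such that a ⊕ b = xy when (x, y) is inputted, and $P_0$ and $P_1$ are respectively the deterministic boxes that output (0, 0) and (1, 1) independently of the inputs. Recall also that the convex hull of a set $\{Q_1, \ldots, Q_N\}$ is the set of all possible convex combinations of those $Q_i$:

$$\mathrm{Conv}\{Q_1, \ldots, Q_N\} := \Big\{ \sum_i \lambda_i Q_i \;:\; \lambda_i \geq 0,\; \sum_i \lambda_i = 1 \Big\},$$

and the affine hull of $\{Q_1, \ldots, Q_N\}$ has the same definition but without the non-negativity constraint:

$$\mathrm{Aff}\{Q_1, \ldots, Q_N\} := \Big\{ \sum_i \lambda_i Q_i \;:\; \lambda_i \in \mathbb{R},\; \sum_i \lambda_i = 1 \Big\}.$$

Theorem 22. Denote $T := \mathrm{Conv}\{PR, P_0, P_1\}$ the triangle joining PR, $P_0$ and $P_1$. Any box of T that is not in the segment $\mathrm{Conv}\{P_0, P_1\}$ is collapsing.

Proof. Write the boxes of T as $P_{\alpha,\beta} := \alpha\, PR + \beta\, P_0 + (1 - \alpha - \beta)\, P_1$, and denote ⊠ the product induced by the wiring $W_{BS}$ (see definition in Subsection 1.4). By bilinearity of ⊠ and using the multiplication table in Figure 4, computations lead to $P_{\alpha,\beta} \boxtimes P_{\alpha_0,\beta_0} = P_{\tilde\alpha, \tilde\beta}$ where $(\tilde\alpha, \tilde\beta)$ is an affine function of $(\alpha, \beta)$, of the form $x \mapsto A x + b$ for a 2 × 2 matrix A and a vector b that depend only on $(\alpha_0, \beta_0)$. From this remark, we define the following sequence: $u_0 := (\alpha_0, \beta_0)$ and $u_{k+1} := A u_k + b$, so that $P_{u_{k+1}} = P_{u_k} \boxtimes P_{u_0}$. We easily identify that ℓ := (1, 0) is a fixed point of $x \mapsto A x + b$, so it yields:

$$u_k - \ell = A (u_{k-1} - \ell) = A^k (u_0 - \ell),$$

where the last equality follows from an induction on k. But the matrix A admits exactly two distinct eigenvalues $\lambda_1 = 1 - \alpha_0/2$ and $\lambda_2 = -1 + \alpha_0 + 2\beta_0$. So A is diagonalizable, and its power reads $A^k = S\, \mathrm{diag}(\lambda_1^k, \lambda_2^k)\, S^{-1}$, where S is an invertible matrix. Since $\alpha_0 > 0$, both eigenvalues have modulus strictly smaller than 1. Hence, from the above equation, the sequence $(u_k)_k$ tends to ℓ, and by continuity we have that the sequence of boxes $(P_{u_k})_k \subseteq \mathbb{R}^{16}$ converges to $P_\ell = PR$. But Brassard et al. showed that there is an open neighborhood around PR that collapses communication complexity [10]. Therefore, we know that the sequence $(P_{u_k})_k$ reaches this collapsing neighborhood for some k large enough, and using Proposition 8 we conclude that any starting box $P_{u_0} \in T$ with $\alpha_0 > 0$ is indeed collapsing.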
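The convergence towards PR under right multiplication can be observed numerically. The sketch below is illustrative and rests on assumptions: the product is taken to be the usual form of the protocol of [12] (first box fed with (x, y), second box fed with (x·a1, y·b1), outputs XORed), the parametrization of the triangle puts the weight β on P0, and all names are hypothetical. It also runs the left multiplication for comparison.

```python
from itertools import product

BITS = (0, 1)
KEYS = list(product(BITS, repeat=4))  # index order: (a, b, x, y)


def box(rule):
    return {k: float(rule(*k)) for k in KEYS}


PR = box(lambda a, b, x, y: 0.5 if a ^ b == x * y else 0.0)
P0 = box(lambda a, b, x, y: 1.0 if (a, b) == (0, 0) else 0.0)
P1 = box(lambda a, b, x, y: 1.0 if (a, b) == (1, 1) else 0.0)


def triangle_box(alpha, beta):
    """alpha * PR + beta * P0 + (1 - alpha - beta) * P1 (assumed convention)."""
    gamma = 1.0 - alpha - beta
    return {k: alpha * PR[k] + beta * P0[k] + gamma * P1[k] for k in KEYS}


def bs_product(P, Q):
    """Assumed W_BS product: P gets (x, y), Q gets (x*a1, y*b1)."""
    R = dict.fromkeys(KEYS, 0.0)
    for x, y, a1, b1, a2, b2 in product(BITS, repeat=6):
        R[(a1 ^ a2, b1 ^ b2, x, y)] += P[(a1, b1, x, y)] * Q[(a2, b2, x * a1, y * b1)]
    return R


def chsh(P):
    return sum(p for (a, b, x, y), p in P.items() if a ^ b == x * y) / 4


def iterate(u0, steps, side):
    """CHSH-values along right (current * start) or left (start * current)."""
    start = triangle_box(*u0)
    current, values = start, [chsh(start)]
    for _ in range(steps):
        current = bs_product(current, start) if side == "right" else bs_product(start, current)
        values.append(chsh(current))
    return values


right = iterate((0.2, 0.5), 100, "right")
left = iterate((0.2, 0.5), 100, "left")
print("right:", right[0], right[10], right[-1])
print("left :", left[0], left[10], left[-1])
```

With this starting point the right iterates approach CHSH-value 1, whereas the left iterates settle at a strictly smaller value.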
Remark 23 (Why is Conv{P0, P1} non-collapsing?). It is not surprising that the boundary segment Conv{P0, P1} of T is not in the collapsing area, because this segment is included in the local set L, which is itself included in the quantum set Q, for which it is known that communication complexity does not collapse [16].
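As a sanity check on this remark (not a proof of locality), one can compute the winning probability at the CHSH game, where the players win when a ⊕ b = xy on uniform inputs. The sketch below uses a hypothetical array layout P[a, b, x, y] and hypothetical helper names. Every box on the segment Conv{P0, P1} wins with probability exactly 3/4, which is the classical bound, whereas mixing in some weight α of PR raises this value to 3/4 + α/4.

```python
import itertools
import numpy as np

def pr_box():
    """PR box as an array P[a, b, x, y]: uniform over outputs with a XOR b = x AND y."""
    P = np.zeros((2, 2, 2, 2))
    for a, b, x, y in itertools.product((0, 1), repeat=4):
        if (a ^ b) == (x & y):
            P[a, b, x, y] = 0.5
    return P

def constant_box(c):
    """Deterministic box that outputs (c, c) whatever the inputs."""
    P = np.zeros((2, 2, 2, 2))
    P[c, c, :, :] = 1.0
    return P

def chsh_win_probability(P):
    """Probability that a XOR b = x AND y when (x, y) is uniformly random."""
    wins = 0.0
    for a, b, x, y in itertools.product((0, 1), repeat=4):
        if (a ^ b) == (x & y):
            wins += P[a, b, x, y] / 4.0
    return wins

# Boxes on the segment Conv{P0, P1}.
segment_values = [
    chsh_win_probability(t * constant_box(0) + (1 - t) * constant_box(1))
    for t in np.linspace(0.0, 1.0, 5)
]
# A box inside the triangle, with weight 0.4 on PR.
triangle_value = chsh_win_probability(
    0.4 * pr_box() + 0.3 * constant_box(0) + 0.3 * constant_box(1)
)
print(segment_values, triangle_value)
```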
Remark 24 (Left multiplication does not give the same result). In the proof, we defined our sequence of boxes (P_{u_k})_k based on right multiplication. One could instead try left multiplication: P_{u_{k+1}} = P_{u_0} ⊠ P_{u_k}. In that case, similar computations lead to an affine iteration with a matrix A′ and a limit point ℓ′ whose expression involves a division by α0 + β0 (no division by 0 since α0 > 0). The matrix A′ is already in triangular form, so its eigenvalues λ′ can be read off its diagonal. But in that case P_{ℓ′} is not the PR box, so we cannot apply Ref. [10] to build a collapsing protocol from any starting box.
Remark 25 (Pairwise multiplication gives the same result). It is also possible to try pairwise multiplication: P_{u_{k+1}} = P_{u_k} ⊠ P_{u_k}, which is the way the authors of [12] originally proved that the segment Conv{PR, SR}\{SR} is collapsing. But pairwise multiplication does not behave as well as right multiplication: the iterations u_k = (α_k, β_k) are no longer affine. Nevertheless, the result still holds with pairwise multiplication, but the proof we found is much more technical; see Appendix B.
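The pairwise iteration can also be explored numerically. This is a hedged sketch under the same assumptions as before, none of which is taken from the paper's code: W_BS feeds (x·a1, y·b1) to the second box and XORs the outputs, boxes are arrays P[a, b, x, y], and P_{α,β} = α PR + β P0 + (1 − α − β) P1. Since PR wins the CHSH game with probability 1 and P0, P1 with probability 3/4, the weight α_k on PR is recovered as 4·(win probability − 3/4).

```python
import itertools
import numpy as np

BITS = (0, 1)

def pr_box():
    """PR box: uniform over outputs with a XOR b = x AND y."""
    P = np.zeros((2, 2, 2, 2))
    for a, b, x, y in itertools.product(BITS, repeat=4):
        if (a ^ b) == (x & y):
            P[a, b, x, y] = 0.5
    return P

def constant_box(c):
    """Deterministic box that outputs (c, c) whatever the inputs."""
    P = np.zeros((2, 2, 2, 2))
    P[c, c, :, :] = 1.0
    return P

def product(P, Q):
    """Box product under the assumed wiring: Q is queried with (x*a1, y*b1)."""
    R = np.zeros((2, 2, 2, 2))
    for x, y, a1, b1, a2, b2 in itertools.product(BITS, repeat=6):
        R[a1 ^ a2, b1 ^ b2, x, y] += P[a1, b1, x, y] * Q[a2, b2, x & a1, y & b1]
    return R

def pr_weight(P):
    """Weight alpha on PR, valid for boxes in Aff{PR, P0, P1}."""
    win = sum(P[a, b, x, y] / 4.0
              for a, b, x, y in itertools.product(BITS, repeat=4)
              if (a ^ b) == (x & y))
    return 4.0 * (win - 0.75)

# Starting box P_{0.1, 0.3} (arbitrary choice inside the triangle).
current = 0.1 * pr_box() + 0.3 * constant_box(0) + 0.6 * constant_box(1)
alphas = [pr_weight(current)]
for _ in range(30):
    current = product(current, current)  # pairwise multiplication
    alphas.append(pr_weight(current))
print(alphas[-1])
```

In this run the weights α_k increase towards 1, which is consistent with the convergence to PR established in Appendix B.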
The intersection of the quantum set Q with the boundary ∂NS of the non-signalling set was recently studied in [14]. Moreover, the notion of quantum void, which consists of a subset of ∂NS in which all quantum correlations are actually local, was introduced and studied in [11,39]. A direct corollary of the previous theorem allows us to single out the quantum correlations of the face C = Conv{PR, P0, P1}: they are exactly the ones in the segment Conv{P0, P1}. Indeed, on the one hand, it is known that quantum correlations do not collapse communication complexity [16], and on the other hand, local correlations are particular cases of quantum correlations. We thus recover the following statement from [39] with a new proof, based on communication complexity:

Corollary 26. The face C = Conv{PR, P0, P1} ⊆ ∂NS is a quantum void:

Other Collapsing Triangles
The fact that the triangle T := Conv{PR, P0, P1}\Conv{P0, P1} is collapsing induces many further collapsing triangles. In this subsection, we give some examples of such collapsing triangles, thus recovering some results of [11] with a new proof, based on the algebra of boxes.
Proposition 27. Let P, Q, R be three boxes. If there exists a wiring W ∈ W that induces the multiplication table below, then the triangle Conv{P, Q, R}\Conv{Q, R} is collapsing.
Then, applying the same proof gives the desired result.
Theorem 28. All the triangles drawn in Figure 13 are collapsing.
The wirings of Figure 13 are arbitrary examples of collapsing wirings that were obtained using Algorithm 5; many more wirings can be found in other triangles of NS using the same algorithm, which is accessible via our GitHub page [9]. Notice that these wirings are all different from the ones used in the proof of [11]. Now, an interesting problem would be to better understand the structure of the set W so that, given a triangle in NS, we know how to construct a collapsing wiring W without using a search algorithm.

Conclusion
Our new algebraic perspective on nonlocal boxes allowed us to discover a surprising structure of what we called the orbit of a box, with some strong alignment and parallelism properties (see Figure 7). As a consequence, this deeper understanding of the algebraic structure of nonlocal boxes enabled us to recover in Section 4 some collapsing regions of NS that were recently also found in [11,20], for instance boxes in the triangle joining PR, P0, P1 (except boxes in the segment joining P0 and P1). The importance of our results is emphasized by considering the many known impossibility results [6,31,43]. Note also that, according to our present intuition of Nature [6,10,12,17], a direct consequence is that the collapsing boxes we presented are unlikely to exist in Nature.
We made advances towards answering the open question of determining which nonlocal boxes do indeed collapse communication complexity, but there is still a gap to be filled: for instance, as plotted in light blue in Figure 10, there are still regions of NS for which it is unknown whether there is a collapse of communication complexity.
Further work includes the study of the "square root" of a box: here we introduced and studied the properties of the product P ⊠ P, but one might be interested in finding all the boxes Q such that Q ⊠_W Q = P for some wiring W ∈ W. If one knows that P is collapsing, then the wiring W yields a collapsing protocol, thus all the square roots Q are also collapsing.
Another interesting problem is the one mentioned at the end of Section 4. Given a triangle of boxes in NS, we already provided in Section 3 and in our GitHub page [9] an algorithm that searches for a wiring W that makes this triangle collapse communication complexity. Now, it would be interesting to address the question of how to construct such a collapsing wiring W without using a search algorithm, simply by knowing the target triangle of NS. This could then help to better understand the structure of the set W and, importantly, protocols for correlation distillation.
In Subsection 4.2, we give examples of collapsing sets of dimension d = 2 (triangles). An open question from [11] consists in finding higher-dimensional collapsing sets. For d = 3, the authors provide explicit examples of fully distillable sets, but it is still unknown whether these sets are collapsing. For d ≥ 4, they prove that fully distillable quantum voids are not possible.
Another interesting problem would be to generalize and study the notion of "product of boxes" with more than two boxes. Indeed, our study is limited to wirings connecting two boxes, but it is possible to consider more general wirings, connecting n boxes, which are known to be strictly more powerful [20]. As a consequence, it may be that the multi-product of boxes W(P1, ..., Pn) gives rise to orbit structures similar to the one we found in Figure 7, which would be useful in the study of n-box distillation protocols.
In this work, we chose to study the principle of communication complexity. Although this principle alone cannot rule out the quantum set [34], a clever idea would be to combine it with other principles, such as nonlocal computation [29], information causality [25,37], macroscopic locality [33], local orthogonality [23], nonlocality swapping [44], and many-box locality [46], in working towards a comprehensive information-based description of Quantum Mechanics.

A Drawing of Some Orbits
We present below some examples of orbits to illustrate Subsection 2.4, using different wirings, each time in two different slices of NS. Each orbit is drawn with depth going until k = 12. The game G is defined by the winning rule a = 0 and b = y.
In other words, we showed that α∞ is a fixed point of f_{β∞}. Now, suppose by contradiction that α∞ were different from µ*, i.e. that α∞ < µ*. Then (ii) and (iii) would imply either the contradiction α∞ < f_{β∞}(α∞), or the inclusion (f_{β∞}(α∞), g_{α∞}(β∞)) ∈ Int(K); but then (ii) would give f_{β∞}(α∞) < f_{β∞} ∘ f_{β∞}(α∞), which is again the contradiction α∞ < f_{β∞}(α∞). Hence α∞ = µ* and we obtain the desired result.

where K_n := ∆ ∩ {α ≥ 1/n}. Denote by ⊠ the box product induced by the wiring W_BS inspired from [12]. Fix (α0, β0) ∈ ∆, and fix an integer n ≥ 1 large enough so that (α0, β0) ∈ K_n. We want to build a protocol that collapses communication complexity using the starting box P_{α0,β0} as a resource. To do so, we will define a sequence of boxes (P_{αk,βk})_k that are eventually collapsing for k large enough. First, see that the closure T̄ of the triangle T is stable under ⊠, because T̄ equals NS ∩ Aff{PR, P0, P1} (see Proposition 21) and the two intersected sets are stable under ⊠.⁷ Moreover, by bilinearity of ⊠ and using the multiplication table in Figure 4, we obtain P_{α,β} ⊠ P_{α,β} = P_{α̃,β̃}, which implies that (α̃, β̃) ∈ Int(K_n). Finally, we may assume that α0 < 1 (otherwise α0 = 1 and then P_{α0,β0} = PR, which is collapsing), and all the conditions of Lemma 29 are satisfied. The lemma tells us that the sequence (αk)_k tends to µ* = 1, so (P_{αk,βk})_k ⊆ R^16 tends to PR. But we know from [10] that there is a non-empty open set around PR that is collapsing. So there exists a finite k for which the box P_{αk,βk} is collapsing, and we obtain the desired collapsing protocol.

Find an algorithm to compute a multiplication table in our GitHub page [9]. Recall the definitions of P_L and P_NL in Equation (15).

Figure 1: Representation of a nonlocal box.

Figure 2: (a) Example of wiring between two boxes P and Q. (b) General wiring between two boxes P and Q.

Figure 4: Multiplication table of the operation ⊠_{W_BS} induced by the wiring from [12]. Each cell displays the result of P ⊠ Q. The box Q1 at the bottom left is Q1 := (1/4) PR − (1/8)(P0 + P1) + I. Further multiplication tables are available in Appendix C.

Figure 5: Associativity and commutativity of the induced algebra B W , depending on the wiring W displayed in the table cell.

Figure 7: Example of a box orbit, drawn for depth up to k = 12. The quantum area Q (in pink) is drawn using formulas from [30]. Dark green represents the collapsing area that was found by Brassard et al. in [10], which consists of all the boxes with CHSH-value higher than 3+

4 Indeed, for A ≠ B ∈ A whose convex coefficients are respectively a1, a2, a3 and b1, b2, b3, saying that the line Aff{A, B} is parallel to the line Aff{PR, SR} is equivalent to knowing that there exists a scalar λ ∈ R* such that A − B = λ (PR − SR), i.e. we have three equations: a1 − b1 = λ, a2 − b2 = −λ, and a3 − b3 = 0. Now, using the normalization condition Σ_i a_i = Σ_j b_j = 1 and after removing redundant equations, this condition simplifies to a3 = b3, as claimed.

Figure 8: Orbit of W_BS in a different slice than in Subsection 2.3: here we consider the slice of NS passing through PR, P0, and P1. We represent the orbit with two different starting boxes. Each orbit is drawn with depth going until k = 12. The game G is defined by the winning rule a = 0 and b = y.

Figure 9: Histogram of the evaluations of the objective function Φ applied at the output Wout of (a) Algorithm 2 and (b) Algorithm 3. As expected, we observe that the latter is more efficient than the former in maximizing Φ, for equivalent computation duration.

Figure 10: In orange are drawn the collapsing boxes outputted by Algorithm 4. Each drawing represents a different slice of NS; the extreme points of the triangles indicate which slice is drawn, and the definition of the boxes P_NL can be found in Equation (15). The three graphs have the same color legend, displayed at the center, and they are all configured with the same algorithm parameters (K_reset, χ, m, M, N), detailed at the top. We adopt the following convention: (i) boxes that are numerically determined are drawn with dots, (ii) boxes that are analytically determined are drawn in plain regions (there exist explicit equations describing those regions). Notice that the left drawing is similar to [20, Figure 3], which was found using a different algorithm as detailed in Remark 18. The quantum set Q (in pink) is drawn using formulas from [30]. References: [BBLMTU06]=[10], [BS09]=[12], [BBP23]=[8].

Figure 13: Examples of collapsing triangles, together with wirings that can be used in Proposition 27 to show a collapse of communication complexity.The definition of the boxes P L and P NL can be found in Equation (15).

Find below the multiplication tables of ⊠_W for different wirings W. Each cell displays the result of P ⊠_W Q, where P lies in the first column and Q lies in the first row.

P10 := P 0100 L tested box power, L ∈ N maximal optimized box power, (K_reset, χ, m, M) parameters for Algorithm 3.