Witnessing causal nonseparability

Our common understanding of the physical world deeply relies on the notion that events are ordered with respect to some time parameter, with past events serving as causes for future ones. Nonetheless, it was recently found that it is possible to formulate quantum mechanics without any reference to a global time or causal structure. The resulting framework includes new kinds of quantum resources that allow performing tasks (in particular, the violation of causal inequalities) which are impossible for events ordered according to a global causal order. However, no physical implementation of such resources is known. Here we show that a recently demonstrated resource for quantum computation, the quantum switch, is a genuine example of "indefinite causal order". We do this by introducing a new tool, the causal witness, which can detect the causal nonseparability of any quantum resource that is incompatible with a definite causal order. We show, however, that the quantum switch does not violate any causal inequality.


I. INTRODUCTION
It is commonly assumed that information is processed through a series of operations which are performed according to a specific order. This is justified by the assumption of a global, underlying time parameter according to which all operations can be ordered. A convenient representation of this structure is that of a circuit [1], Fig. 1(a), in which systems are "wires" that connect "boxes", which represent operations performed on the systems. At a more abstract level, a circuit only imposes a given causal structure between operations, as the time order between operations that can be performed in parallel is irrelevant. The circuit framework is also ubiquitous in the study of quantum foundations to formalize generalized, possibly post-quantum, probabilistic theories [2][3][4][5].
It has been suggested that such a framework might be too restrictive to encompass the most general kinds of information processing allowed by quantum physics [6]. For example, one can consider protocols in which the order between different operations is controlled by a quantum degree of freedom. It has been shown that such protocols, which exploit a so-called "quantum switch", not only provide a computational advantage over standard, time-ordered ones [7,8], but are also physically realizable, and a first experimental proof of principle has recently been demonstrated [9]. At a more fundamental level, an underlying time or causal order might not be well-defined in a theory that combines the dynamical causal structure of general relativity and the probabilistic nature of quantum mechanics [10][11][12].
It is therefore natural to ask what the most general resources allowed by quantum mechanics beyond the circuit model are. In Ref. [13] the process matrix formalism was proposed as a general framework to describe resources that can be accessed in "local laboratories" and which are locally in agreement with quantum physics, Fig. 1(b).
Causal relations are defined operationally in this formalism. If, for example, through appropriate state preparations, an agent A can influence the outcomes of measurements performed by an agent B, whereas B is never able to influence A, then A causally precedes B by definition and, in this case, the physical resources available to them can in fact be represented as a circuit. A first example of a resource that cannot be represented as a circuit is a probabilistic mixture of circuits: a definite order still exists between A and B in each run of an experiment, but which order is realized in a given run is only specified according to some probability distribution. Resources compatible with a definite causal order, in this broader sense, are called causally separable. Surprisingly, the formalism also allows for causally nonseparable resources, which are incompatible with any definite order between operations. It was found that a set of agents with access to a specific causally nonseparable resource could perform a task, the violation of a causal inequality, which is impossible for arbitrary causally ordered strategies, even allowing probabilistic mixtures of orders [13]. However, there is no physical interpretation for such resources and no physically realizable protocol is known which can violate a causal inequality.
Figure 1(b): A process matrix formalizes a resource in which the order between operations may not be fixed. A probabilistic mixture of different orders is an example of a process matrix that does not correspond to a circuit. Still, in this case operations are performed in a well-defined order in each experimental run; the most general resource with this property is called causally separable. The process matrix formalism also allows for the more general case of causally nonseparable resources [13].
It is therefore not completely clear what is the precise relation between "quantum correlations with no causal order", which violate causal inequalities, and physically implementable resources, such as the quantum switch, which outperform causally ordered ones. To understand this relation, a crucial observation is that the causal inequalities are device-independent constraints: they are formulated independently of the physics of the systems or the specific apparatuses employed. On the other hand, the tasks discussed in Refs. [7,8] include additional assumptions, as for example that in each laboratory quantum systems of a definite dimension have to be used. It is clear that, given additional restrictions, it is more difficult for causally-ordered agents to perform certain tasks and, consequently, it can be easier to detect the lack of causal order in a physical resource.
The aim of the present work is to develop a general framework for the device-dependent detection of causal nonseparability. The central tool we introduce is what we call a causal witness, which represents a set of quantum operations, such as unitaries, channels, state preparations, and measurements, whose expectation value is non-negative as long as all the operations are performed in a definite causal order, i.e., as long as only causally separable resources are used. The observation of a negative expectation value is thus sufficient to conclude that the operations were not performed in a definite order. The concept is analogous to that of an entanglement witness: an observable that has a non-negative expectation value for separable states but can have a negative expectation value for specific entangled states.
We find that, for every causally nonseparable process, it is possible to construct a causal witness that detects it. Importantly, and differently from the case of entanglement witnesses, it is possible to use this method to write necessary and sufficient conditions for causal separability in a form that can be checked efficiently using semidefinite programming (SDP).
The tools developed are applied to the study of the quantum switch as a resource within the process matrix formalism. We show that, indeed, the quantum switch corresponds to a causally nonseparable process. We show that the protocol of Ref. [7] can be reformulated as a causal witness which detects the causal nonseparability of the quantum switch. We also find new, more efficient witnesses, which could be useful for experimental implementations.
We finally address the question of whether the quantum switch can pass any device-independent test of causal nonseparability. As it turns out, this is not possible: we prove that a broad class of resources, including the quantum switch, cannot violate any causal inequality.
The paper is organized as follows: In Section II, we review the process matrix formalism, giving a convenient characterization of general and causally separable process matrices for the cases of interest. In Section III, we introduce and characterize the central concept of causal witness, and we present efficient algorithms for finding witnesses and for proving the causal (non)separability of a general process matrix. In Section IV we formalise the quantum switch as a process matrix. We proceed to prove its causal nonseparability in Section V, through the use of causal witnesses. One such witness is the task proposed in Ref. [7], that we optimize to increase its resistance to noise. Finally, we clarify in Section VI the link between causal witnesses and causal inequalities and show that the quantum switch cannot violate any causal inequality.

II. THE PROCESS MATRIX FORMALISM
In the general scenario we consider in this paper, N parties $A^i$ establish correlations by exchanging physical systems between their laboratories. Each party opens their laboratory only once to let an incoming system enter and to send an outgoing system out; they can act on these systems by performing an arbitrary operation in their local laboratory, which can yield different measurement outcomes. The causal relations between the parties (i.e., the ordering of events) are not a priori specified. The most general situation compatible with the assumption that the operations performed in each local laboratory can be described by the quantum formalism can be conveniently represented in the "process matrix" formalism introduced in Ref. [13]. This extends the "comb" formalism of Ref. [14], which describes causally ordered quantum networks. The aim of the formalism is to characterize all possible probability distributions that can be obtained in our general scenario. The key concept is that of a process, which can be understood as the external resource determining the statistics of the local operations, and which generalizes both the notions of quantum state and of quantum channel. The process matrix is a useful mathematical representation of such a concept. We shall use these two terms interchangeably.

A. Local operations
Each party A acts in a local quantum laboratory, which can be identified by an input Hilbert space $\mathcal{H}^{A_I}$ and an output Hilbert space $\mathcal{H}^{A_O}$. The dimensions $d_{A_I}$ and $d_{A_O}$ of input and output spaces do not have to be equal, as ancillary systems can be added or discarded during an operation; we shall nevertheless assume throughout the paper that all Hilbert spaces are finite-dimensional. According to quantum theory, the most general local operation is described by a completely positive (CP), trace non-increasing map $\mathcal{M}^A : A_I \to A_O$ [15], where we write $A_I$, respectively $A_O$, for the space of hermitian linear operators over the Hilbert space $\mathcal{H}^{A_I}$, resp. $\mathcal{H}^{A_O}$. Examples of CP maps are deterministic operations, such as unitaries or quantum channels, or (generalized) measurements. In general, a label a, denoting the measurement outcome, is associated with the CP map $\mathcal{M}^A_a$. The choice of operation (e.g. of measurement setting) is represented by an instrument [16], which is defined as the collection $\mathcal{J}^A = \{\mathcal{M}^A_a\}_{a=1}^m$ of CP maps associated to all measurement outcomes, characterized by the property that $\sum_{a=1}^m \mathcal{M}^A_a$ is CP and trace-preserving (CPTP). An instrument generalizes the notion of POVM (positive operator-valued measure) to include the transformations applied to the system; it reduces to a POVM for 1-dimensional output spaces. When the choice of operation is described by a classical variable x, we will express such a dependence explicitly as $\mathcal{J}^A_x = \{\mathcal{M}^A_{a|x}\}_{a=1}^m$.
A convenient representation of CP maps is given by the Choi-Jamiołkowski (CJ) isomorphism [17]. For a CP map $\mathcal{M}^A_a : A_I \to A_O$, its corresponding CJ matrix is defined here as $M^{A_I A_O}_a := [(\mathcal{I} \otimes \mathcal{M}^A_a)(|1\rangle\rangle\langle\langle 1|)]^T \in A_I \otimes A_O$, (2) where $\mathcal{I}$ is the identity map, $|1\rangle\rangle := \sum_j |j\rangle \otimes |j\rangle \in \mathcal{H}^{A_I} \otimes \mathcal{H}^{A_I}$ is a non-normalized maximally entangled state, and T denotes matrix transposition in the chosen orthonormal basis $\{|j\rangle\}$ of $\mathcal{H}^{A_I}$.
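As a concrete illustration of the CJ representation, the following minimal numpy sketch builds the CJ matrix of a CP map given by Kraus operators. It assumes the convention $M = [(\mathcal{I} \otimes \mathcal{M})(|1\rangle\rangle\langle\langle 1|)]^T$ with $|1\rangle\rangle = \sum_j |jj\rangle$ and the input space as the first tensor factor. The helper names `choi` and `trace_out_output` are ours and purely illustrative.

```python
import numpy as np

def choi(kraus, d_in):
    """CJ matrix of the CP map rho -> sum_k K rho K^dagger (illustrative helper).

    Assumed convention: M = [(id (x) map)(|1>><<1|)]^T, |1>> = sum_j |jj>,
    with the input space A_I as the first tensor factor.
    """
    d_out = kraus[0].shape[0]
    M = np.zeros((d_in * d_out, d_in * d_out), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            E = np.zeros((d_in, d_in), dtype=complex)
            E[i, j] = 1.0                                  # matrix unit |i><j|
            out = sum(K @ E @ K.conj().T for K in kraus)   # map applied to |i><j|
            M += np.kron(E, out)
    return M.T

def trace_out_output(M, d_in, d_out):
    """Partial trace of a CJ matrix over the output space A_O."""
    return np.einsum('iaja->ij', M.reshape(d_in, d_out, d_in, d_out))

# A unitary channel (Hadamard): a single CPTP map.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
M_H = choi([H], 2)

# A projective Z measurement: an instrument made of two CP maps.
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
M_0, M_1 = choi([P0], 2), choi([P1], 2)
```

In this representation a map is CP when its CJ matrix is positive semidefinite, and trace-preserving when the partial trace over the output space is the identity on the input space. The individual matrices `M_0` and `M_1` are only trace non-increasing, while their sum satisfies the CPTP condition, as required of an instrument.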

B. Process matrices
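As discussed in Ref. [13], requiring that quantum mechanics holds locally implies that the probability that the N parties $A^i$ observe the outcomes $a_1, \ldots, a_N$, for a choice of operations $x_1, \ldots, x_N$, is a multilinear function $P(\mathcal{M}^{A^1}_{a_1|x_1}, \ldots, \mathcal{M}^{A^N}_{a_N|x_N})$ of the corresponding CP maps. Using the CJ representation, it was shown that these probabilities can then be expressed as $P(\mathcal{M}^{A^1}_{a_1|x_1}, \ldots, \mathcal{M}^{A^N}_{a_N|x_N}) = \mathrm{tr}[(M^{A^1_I A^1_O}_{a_1|x_1} \otimes \cdots \otimes M^{A^N_I A^N_O}_{a_N|x_N})\, W]$ (3) for some hermitian operator $W \in A^1_I \otimes A^1_O \otimes \cdots \otimes A^N_I \otimes A^N_O$ called a process matrix, which describes the general quantum resource connecting the local laboratories.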
The set of valid process matrices is defined by requiring that probabilities are well-defined (that is, they must be non-negative and must sum up to 1) for all possible operations, including operations that involve, in each laboratory, local interactions with ancillary systems that may be entangled with the other laboratories. As we show in Appendix B, these conditions are equivalent to $W \ge 0$, (4) $\mathrm{tr}\,W = d_O$, (5) $W = L_V(W)$, (6) where $d_O = d_{A^1_O} \cdots d_{A^N_O}$, and $L_V$ is a projector onto the linear subspace $\mathcal{L}_V$ of valid processes. We will denote the closed convex cone of non-normalized processes defined by (4) and (6) by $\mathcal{W}$.
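In the case of two parties A (Alice) and B (Bob), see Figure 2, these conditions on $W \in A_I \otimes A_O \otimes B_I \otimes B_O$ are equivalent to $W \ge 0$, (7) $\mathrm{tr}\,W = d_{A_O} d_{B_O}$, (8) ${}_{B_I B_O}W = {}_{A_O B_I B_O}W$, (9) ${}_{A_I A_O}W = {}_{A_I A_O B_O}W$, (10) $W = {}_{B_O}W + {}_{A_O}W - {}_{A_O B_O}W$, (11) where (here and throughout the paper) the operator ${}_X\cdot$ denotes the CPTP map consisting in tracing out the subsystem X and replacing it by the normalized identity operator, formally defined as ${}_X W = \frac{1^X}{d_X} \otimes \mathrm{tr}_X W$. (12)
Figure 2: Representation of a bipartite process matrix W, connecting Alice's ($A_O$) and Bob's ($B_O$) output systems to their input systems ($A_I$ and $B_I$).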
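The bipartite validity conditions can be checked numerically. The sketch below is a minimal numpy illustration, assuming four qubit systems ordered as $A_I, A_O, B_I, B_O$ and assuming that the linear constraints read ${}_{B_I B_O}W = {}_{A_O B_I B_O}W$, ${}_{A_I A_O}W = {}_{A_I A_O B_O}W$ and $W = {}_{B_O}W + {}_{A_O}W - {}_{A_O B_O}W$. The function names `trace_replace` and `is_valid_bipartite` are hypothetical.

```python
import numpy as np

AI, AO, BI, BO = 0, 1, 2, 3          # assumed ordering of the tensor factors
DIMS = (2, 2, 2, 2)                  # all systems are qubits in this sketch

def trace_replace(W, systems, dims=DIMS):
    """Map W -> _X W: trace out the subsystems X and put back identity / d_X."""
    n = len(dims)
    T = W.reshape(list(dims) * 2)
    for s in systems:
        red = np.trace(T, axis1=s, axis2=n + s)              # partial trace over s
        T = np.multiply.outer(red, np.eye(dims[s]) / dims[s])
        T = np.moveaxis(T, [-2, -1], [s, n + s])              # identity back in slot s
    D = int(np.prod(dims))
    return T.reshape(D, D)

def is_valid_bipartite(W, dims=DIMS, tol=1e-9):
    """Check positivity, normalisation and the three linear constraints."""
    t = lambda X: trace_replace(W, X, dims)
    checks = [
        np.linalg.eigvalsh(W).min() >= -tol,                  # W >= 0
        abs(np.trace(W).real - dims[AO] * dims[BO]) <= tol,   # tr W = d_AO d_BO
        np.allclose(t([BI, BO]), t([AO, BI, BO]), atol=tol),
        np.allclose(t([AI, AO]), t([AI, AO, BO]), atol=tol),
        np.allclose(W, t([BO]) + t([AO]) - t([AO, BO]), atol=tol),
    ]
    return all(checks)

I2, Z = np.eye(2), np.diag([1.0, -1.0])
ket0 = np.diag([1.0, 0.0])                                    # |0><0|
bell = np.zeros((4, 4))
bell[np.ix_([0, 3], [0, 3])] = 1.0                            # |1>><<1| on two qubits

# Alice receives |0>, and her output is sent to Bob through an identity channel.
W_channel = np.kron(ket0, np.kron(bell, I2))
# Conditioning on the state of Alice's output: positive and normalised, yet invalid.
W_bad = np.kron(I2, np.kron(I2 + Z, np.kron(I2, I2))) / 4
```

Here `is_valid_bipartite(W_channel)` holds, whereas `W_bad` passes the positivity and trace conditions but violates the first linear constraint, so it does not yield normalised probabilities for all local operations.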

Non-signalling and 1-way-signalling process matrices
Two important particular cases of process matrices may shed light on the above definition. The first case is when the process matrix does not allow for any signalling, and the second one is when it allows for signalling only in one fixed direction between the parties. They are discussed in more detail in Appendix A 2.
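The first case is described by process matrices W satisfying $W = \rho^{A^1_I \ldots A^N_I} \otimes 1^{A^1_O \ldots A^N_O}$, (13) where $\rho^{A^1_I \ldots A^N_I}$ is a density matrix representing an ordinary quantum state. In this case, the probability rule (3) reduces to the standard Born rule $P(\mathcal{M}^{A^1}_{a_1|x_1}, \ldots, \mathcal{M}^{A^N}_{a_N|x_N}) = \mathrm{tr}[(E^{A^1_I}_{a_1|x_1} \otimes \cdots \otimes E^{A^N_I}_{a_N|x_N})\, \rho^{A^1_I \ldots A^N_I}]$, (14) where $E^{A^i_I}_{a_i|x_i} := \mathrm{tr}_{A^i_O} M^{A^i_I A^i_O}_{a_i|x_i}$ are the corresponding POVM elements.
The second case, of which the first one is a particular case, is described by process matrices W satisfying $W = {}_{A^N_O}W$, ${}_{A^N_I A^N_O}W = {}_{A^{N-1}_O A^N_I A^N_O}W$, ..., ${}_{A^2_I A^2_O \ldots A^N_I A^N_O}W = {}_{A^1_O A^2_I A^2_O \ldots A^N_I A^N_O}W$. (15) These conditions, first found in [14,18], mean that party $A^i$ can only signal to party $A^j$ if i < j. The process is therefore compatible with the causal order $A^1 \prec A^2 \prec \cdots \prec A^N$. When this is the case, we write as a mnemonic $W^{A^1 \prec \cdots \prec A^N}$. Process matrices of this form (and the obvious permutations) are called causally ordered. As shown in Refs. [14,18], they correspond to standard (causally ordered) quantum circuits and can be implemented as quantum channels with memory between the parties.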
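A bipartite process matrix $W^{sep}$ is called causally separable if it can be written as a convex combination of causally ordered process matrices, $W^{sep} = q\,W^{A \prec B} + (1 - q)\,W^{B \prec A}$, (18) with 0 ≤ q ≤ 1. Ignoring the normalization constraint, the set of causally separable process matrices is a convex cone, which we denote by $\mathcal{W}^{sep}$. A process matrix that cannot be decomposed as in (18) is called causally nonseparable.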
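To make the notion of a mixture of causal orders concrete, the following numpy sketch builds two causally ordered qubit processes (an identity channel from A to B, and one from B to A) and mixes them with a weight q. It assumes that, in the bipartite case, compatibility with A ≺ B amounts to $W = {}_{B_O}W$ and compatibility with B ≺ A to $W = {}_{A_O}W$. The helpers `trace_replace` and `embed`, and the value q = 0.3, are illustrative choices only.

```python
import numpy as np

AI, AO, BI, BO = 0, 1, 2, 3          # assumed ordering of the tensor factors
DIMS = (2, 2, 2, 2)

def trace_replace(W, systems, dims=DIMS):
    """Map W -> _X W: trace out the subsystems X and put back identity / d_X."""
    n = len(dims)
    T = W.reshape(list(dims) * 2)
    for s in systems:
        red = np.trace(T, axis1=s, axis2=n + s)
        T = np.multiply.outer(red, np.eye(dims[s]) / dims[s])
        T = np.moveaxis(T, [-2, -1], [s, n + s])
    D = int(np.prod(dims))
    return T.reshape(D, D)

def embed(a_i, a_o, b_i, b_o):
    """Tensor product of single-qubit operators on A_I, A_O, B_I, B_O."""
    return np.kron(np.kron(a_i, a_o), np.kron(b_i, b_o))

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
ket0 = np.diag([1.0, 0.0])

# Identity channel as a process: |1>><<1| = (1/2)(II + XX - YY + ZZ).
paulis = [(I2, 1), (X, 1), (Y, -1), (Z, 1)]
W_AB = sum(s / 2 * embed(ket0, P, P, I2) for P, s in paulis)   # A_O -> B_I, A gets |0>
W_BA = sum(s / 2 * embed(P, I2, ket0, P) for P, s in paulis)   # B_O -> A_I, B gets |0>

q = 0.3
W_sep = q * W_AB + (1 - q) * W_BA    # causally separable by construction
```

Each component satisfies its own ordering condition, whereas the mixture `W_sep` satisfies neither of them: it is not causally ordered, although it is causally separable by construction.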

Tripartite causally separable processes
In this paper we will define tripartite causal separability only for processes where the output space of the third party C (Charlie) is trivial, i.e., $d_{C_O} = 1$ (see Figure 3). As C cannot signal to the other parties, every process of this kind is compatible with C being last. Thus, only two causal orders are relevant in this case: $A \prec B \prec C$ and $B \prec A \prec C$. The conditions for process matrices being compatible with these orders follow from equation (15). Since, for each order, these conditions together define a linear subspace, we can write them more succinctly as $W = L_{A \prec B \prec C}(W)$ and $W = L_{B \prec A \prec C}(W)$, where $L_{A \prec B \prec C}$ and $L_{B \prec A \prec C}$ are the projectors onto the aforementioned subspaces. Therefore, when C's output space is trivial, we will call a tripartite process matrix $W^{sep}$ causally separable if it is of the form $W^{sep} = q\,W^{A \prec B \prec C} + (1 - q)\,W^{B \prec A \prec C}$ (27) with 0 ≤ q ≤ 1. Ignoring the normalization constraint, this defines a convex cone $\mathcal{W}^{sep}_{3C}$. We will use this definition in Section V to show that a recently introduced tripartite quantum resource, which yields information-processing advantages with respect to causally ordered processes [7,8], is causally nonseparable.
The generalization of the notion of causal separability to a larger number of parties, with arbitrary dimensions of the output spaces, is not trivial. The reason is that one can consider situations in which an agent, through her local operations, could modify a classical variable that determines the causal order of agents in her future. In such a "classical switch", operations would still be causally ordered in each run of an experiment, but it wouldn't be possible to write the corresponding process matrix as a mixture of causally ordered ones. As this issue does not affect the cases treated here, we shall not consider it further. A more detailed analysis will be presented in an upcoming work [19].

A. Definition and characterization
In this section we develop mathematical tools to identify, in the bipartite case, which process matrices are causally separable and which are not. In analogy with entanglement witnesses [20], we call a hermitian operator S a causal witness (or simply a witness) if $\mathrm{tr}[S\,W^{sep}] \ge 0$ (28) for every causally separable process matrix $W^{sep}$. This definition is motivated by the separating hyperplane theorem [21]: since the set of causally separable processes is closed and convex, for every causally nonseparable process matrix $W^{ns}$ there exists a causal witness $S_{W^{ns}}$ such that $\mathrm{tr}[S_{W^{ns}} W^{ns}] < 0$.
To construct a witness for a given nonseparable process, we will start by characterizing the set of all causal witnesses in terms of linear constraints on a convex cone. This will allow us to cast the problem of finding a witness as an SDP problem. First, note that (28) is equivalent to $\mathrm{tr}[S\,W^{A \prec B}] \ge 0$ for all $W^{A \prec B}$, (29a) $\mathrm{tr}[S\,W^{B \prec A}] \ge 0$ for all $W^{B \prec A}$. (29b) Let us focus on condition (29a). Using Eq. (17) and noting that for any valid process matrix W, ${}_{B_O}W$ is a valid causally ordered process matrix compatible with the order A ≺ B, one finds that (29a) is equivalent to $\mathrm{tr}[S\,{}_{B_O}W] \ge 0$ for all $W \in \mathcal{W}$. Thinking of the trace as the Hilbert-Schmidt inner product and noting that the map ${}_{B_O}\cdot$ is self-dual, we have that $\mathrm{tr}[S\,{}_{B_O}W] = \mathrm{tr}[{}_{B_O}S\,W]$, and it is sufficient that ${}_{B_O}S \ge 0$ for the right-hand side to be non-negative for all valid W. An analogous argument shows that ${}_{A_O}S \ge 0$ is sufficient to satisfy condition (29b). We conclude that for S to be a causal witness, it is sufficient that ${}_{B_O}S \ge 0$ and ${}_{A_O}S \ge 0$. Note also that adding an operator $S^\perp$ belonging to the orthogonal complement $\mathcal{L}_V^\perp$ of $\mathcal{L}_V$ to any witness S gives another valid witness, since $\mathrm{tr}[(S + S^\perp)W] = \mathrm{tr}[SW]$ for any valid process matrix W. It turns out that this suffices to completely characterize the set of causal witnesses, as stated in the following theorem:

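Theorem 1. A hermitian operator $S \in A_I \otimes A_O \otimes B_I \otimes B_O$ is a causal witness if and only if S can be written as $S = S_P + S^\perp$, where $S_P$ and $S^\perp$ are hermitian operators such that ${}_{A_O}S_P \ge 0$, ${}_{B_O}S_P \ge 0$, and $L_V(S^\perp) = 0$.
The rather technical proof of this theorem is relegated to Appendix C. This theorem provides a characterization of the closed convex cone of causal witnesses $\mathcal{S}$.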
Since $S^\perp$ does not change the expectation value $\mathrm{tr}[SW]$, it can freely be chosen to be, for instance, $S^\perp = L_V(S_P) - S_P$, so that $S = L_V(S_P)$. This has the effect of restricting witnesses to the subspace of valid processes $\mathcal{L}_V$, where they have the following characterization:

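Corollary 2. A hermitian operator $S \in \mathcal{L}_V$ is a causal witness if and only if there exists a hermitian operator $S_P \in A_I \otimes A_O \otimes B_I \otimes B_O$ such that ${}_{A_O}S_P \ge 0$, ${}_{B_O}S_P \ge 0$, and $S = L_V(S_P)$.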
This restricted set of causal witnesses is also a closed convex cone, which we denote by S V = S ∩ L V .
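The characterization above lends itself to a direct numerical check. The following numpy sketch, for four qubits ordered as $A_I, A_O, B_I, B_O$, tests the two positivity conditions on a candidate $S_P$ and returns $S = L_V(S_P)$. The explicit bipartite form of the projector $L_V$ used here is our assumption, chosen so that it keeps exactly the operators compatible with the linear validity constraints. All function names and the example operator `S_P` are hypothetical.

```python
import numpy as np

AI, AO, BI, BO = 0, 1, 2, 3          # assumed ordering of the tensor factors
DIMS = (2, 2, 2, 2)

def trace_replace(W, systems, dims=DIMS):
    """Map W -> _X W: trace out the subsystems X and put back identity / d_X."""
    n = len(dims)
    T = W.reshape(list(dims) * 2)
    for s in systems:
        red = np.trace(T, axis1=s, axis2=n + s)
        T = np.multiply.outer(red, np.eye(dims[s]) / dims[s])
        T = np.moveaxis(T, [-2, -1], [s, n + s])
    D = int(np.prod(dims))
    return T.reshape(D, D)

def project_LV(W):
    """Projector onto the subspace of valid bipartite processes (assumed form)."""
    t = lambda X: trace_replace(W, X)
    return (t([AO]) + t([BO]) - t([AO, BO])
            - t([BI, BO]) + t([AO, BI, BO])
            - t([AI, AO]) + t([AI, AO, BO]))

def is_psd(M, tol=1e-9):
    return np.linalg.eigvalsh(M).min() >= -tol

def witness_from(S_P):
    """Return S = L_V(S_P) if _{A_O}S_P >= 0 and _{B_O}S_P >= 0, else None."""
    if is_psd(trace_replace(S_P, [AO])) and is_psd(trace_replace(S_P, [BO])):
        return project_LV(S_P)
    return None

def embed(a_i, a_o, b_i, b_o):
    return np.kron(np.kron(a_i, a_o), np.kron(b_i, b_o))

I2, X, Z = np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]]), np.diag([1.0, -1.0])
T1 = embed(I2, Z, Z, I2)             # Z on A_O and B_I
T2 = embed(Z, I2, X, Z)              # Z on A_I, X on B_I, Z on B_O

# Hypothetical candidate: the last term lies outside the valid subspace.
S_P = (np.eye(16) - T1 - T2) / 4 + 0.2 * embed(I2, Z, I2, Z)
S = witness_from(S_P)
S_perp = S - S_P                     # component removed by the projection

# A causally ordered process: A receives |0>, identity channel from A_O to B_I.
bell = np.zeros((4, 4))
bell[np.ix_([0, 3], [0, 3])] = 1.0
W_channel = np.kron(np.diag([1.0, 0.0]), np.kron(bell, I2))
```

The resulting `S` is not positive semidefinite, so it is a nontrivial witness, yet its expectation value on the causally ordered process `W_channel` is non-negative. Since `S_perp` is annihilated by the projector, `S` and `S_P` give the same expectation value on any valid process.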
One could define witnesses as belonging to S V instead of S, since both sets are as powerful in detecting causal nonseparability. However, some physically motivated witnesses, such as those presented in Section V B (for the tripartite case), do not belong to S V , which is why we use the more general definition that witnesses belong to S.

B. Finding causal witnesses
The previous characterization of the convex cone of causal witnesses allows one to efficiently check the causal nonseparability of any process matrix W through algorithms for semidefinite programming (SDP) [22]. They output a causal witness if W is causally nonseparable, and an explicit decomposition in terms of causally ordered process matrices otherwise.
The idea is simply to minimize $\mathrm{tr}[SW]$ over the cone of causal witnesses $\mathcal{S}_V$, and check whether we obtain a negative value or not. Note that in order to make $\mathrm{tr}[SW]$ lower bounded (to avoid getting a value −∞ for causally nonseparable process matrices) a normalisation constraint on the witnesses has to be imposed. This normalisation is arbitrary (any constraint that makes $\mathcal{S}_V$ compact suffices) and different normalisation choices give rise to different interpretations for the value of $\mathrm{tr}[SW]$. We shall normalise the witnesses by imposing that $\mathrm{tr}[S\,\Omega] \le 1$ for every (normalised) process matrix Ω, since $-\mathrm{tr}[SW]$ can then be interpreted as a measure of causal nonseparability, as we shall see later in this subsection. In order to be able to use it in the SDP problem we still need to write this normalisation as a conic constraint. To do so, we extend the constraint $\mathrm{tr}[S\,\Omega] \le 1$ to non-normalised process matrices by linearity: $\mathrm{tr}[S\,\Omega] \le \mathrm{tr}[\Omega]/d_O$, which is equivalent to $\mathrm{tr}[(1/d_O - S)\,\Omega] \ge 0$ for all $\Omega \in \mathcal{W}$. Recalling that S is assumed to be in $\mathcal{S}_V \subset \mathcal{L}_V$, this means that $1/d_O - S \in \mathcal{W}^*_V := \mathcal{W}^* \cap \mathcal{L}_V$, where $\mathcal{W}^*$ is the dual cone of $\mathcal{W}$, that is, the cone of hermitian operators that have non-negative trace with process matrices.
To test the causal nonseparability of a given process matrix W, we are thus led to define the following SDP problem: $\min\ \mathrm{tr}[SW]$ subject to $S \in \mathcal{S}_V$ and $1/d_O - S \in \mathcal{W}^*_V$, (38) which is written explicitly in terms of positive semidefinite constraints in Appendix D.
If the solution of the SDP problem (38) leads to a negative expectation value of S, one can conclude that W is causally nonseparable, since SDP algorithms can be guaranteed to find the optimal solution [22]. In such a case, the optimal solution $S^*$ provides an explicit witness to verify the causal nonseparability of W. On the other hand, if $\mathrm{tr}[S^* W] = 0$, one concludes that W is causally separable, and an explicit decomposition of W into causally ordered processes is given by the SDP problem dual to (38) (this can be seen explicitly from the representation of the SDP problem (39) given in Appendix D). As shown in Appendix E, this dual is $\min\ \mathrm{tr}[\Omega]/d_O$ subject to $W + \Omega \in \mathcal{W}^{sep}$ and $\Omega \in \mathcal{W}$, (39) where $\mathcal{W}^{sep}$ is the cone of non-normalized causally separable process matrices, as previously defined. Furthermore, the optimal value $\mathrm{tr}[\Omega^*]/d_O$ of problem (39) is related to the optimal value $\mathrm{tr}[S^* W]$ of problem (38) through $\mathrm{tr}[\Omega^*]/d_O = -\mathrm{tr}[S^* W]$. This gives an operational meaning to $-\mathrm{tr}[S^* W]$. As shown in Appendix E, this quantity corresponds to the minimal λ ≥ 0 such that $\frac{1}{1+\lambda}(W + \lambda\,\Omega)$ is causally separable, optimized over all valid, normalised processes Ω. In other words, it quantifies the resistance of W to the worst-case noise. This is an analogue of the measure of entanglement called generalised robustness, which quantifies the resistance of the entanglement of a quantum state to worst-case noise [23]. It turns out that for our case the interpretation of $-\mathrm{tr}[S^* W]$ as a measure of causal nonseparability is also tenable, as it respects some simple axioms that we propose in Appendix F.
For this reason, we define the generalised robustness of a process W as $R_g(W) := -\mathrm{tr}[S^* W]$. Again in analogy with the case of entanglement measures, one can also define the random robustness [24] of W as its resistance to "white noise", which can be defined as the process that sends maximally mixed states to each laboratory, independently of the local operations: $1^\circ := 1^{A_I A_O B_I B_O}/(d_{A_I} d_{B_I})$. The optimal witness with respect to random robustness can be found by solving an SDP problem analogous to (38): $\min\ \mathrm{tr}[SW]$ subject to $S \in \mathcal{S}_V$ and $\mathrm{tr}[S\,1^\circ] = 1$, (44) whose dual is $\min\ \lambda$ subject to $W + \lambda\,1^\circ \in \mathcal{W}^{sep}$, (45) and random robustness itself is defined as $R_r(W) := -\mathrm{tr}[S^* W]$, (46) where $\mathrm{tr}[S^* W]$ is now the optimal value of the problem (44). This quantity can be used to compare witnesses in scenarios where white noise is an appropriate noise model. However, it cannot be interpreted as a proper measure of causal nonseparability, as it does not respect all the axioms we propose in Appendix F; more specifically, it is not monotonic under local operations.
A geometrical interpretation of the results of this section is shown in Figure 4.

C. Implementing causal witnesses
Once a causal witness S has been obtained for a given causally nonseparable process matrix W, a natural question is how to "measure" it, i.e., how to access the quantity tr[S W] -and, in particular, check its sign -experimentally.
To do so, note that as $S \in A_I \otimes A_O \otimes B_I \otimes B_O$ is a hermitian operator, it can always be decomposed as a linear combination of the form $S = \sum_{x,y,a,b} \gamma_{x,y,a,b}\, M^{A_I A_O}_{a|x} \otimes M^{B_I B_O}_{b|y}$, (47) where $\gamma_{x,y,a,b}$ are real coefficients and $M^{A_I A_O}_{a|x}$, $M^{B_I B_O}_{b|y}$ are the CJ matrices of CP maps that A and B can implement. By the probability rule (3), the quantity $\mathrm{tr}[SW]$ can then be obtained by implementing these maps and combining the resulting probabilities as $\mathrm{tr}[SW] = \sum_{x,y,a,b} \gamma_{x,y,a,b}\, P(\mathcal{M}^A_{a|x}, \mathcal{M}^B_{b|y})$. (48)
Figure 4: Here we schematically represent the set of normalised process matrices in $\mathcal{W}$ by the red ellipse and the set of normalised causally separable processes in $\mathcal{W}^{sep}$ by the blue ellipse. Since the latter set is closed and convex, any causally nonseparable process W is separated from it by a hyperplane, corresponding to an operator S which we call a causal witness. In the figure we represent two such causal witnesses, $S_{R_g}$ and $S_{R_r}$, that represent two different ways to quantify how far W is from being causally separable. $-\mathrm{tr}(S_{R_g} W)$ measures the generalised robustness of W, which is its resistance to the worst-case noise Ω. Geometrically, the generalised robustness of W is given by the ratio of distances $d(W, W_g)/d(W_g, \Omega)$, where $W_g$ is the causally separable process closest to W on the depicted line. In its turn, $-\mathrm{tr}(S_{R_r} W)$, the random robustness of W, is its resistance to the "white noise" $1^\circ$. Geometrically, it is given by the analogous ratio $d(W, W_r)/d(W_r, 1^\circ)$, where $W_r$ is again the causally separable process closest to W on the depicted line. $S_{R_g}$ and $S_{R_r}$ are the optimal solutions of the SDP problems (38) and (44).
The decomposition (47) is not unique. Furthermore, as noted before we can add to any witness S a term S ⊥ such that L V (S ⊥ ) = 0 without changing its validity or its trace with any valid process. Hence, it actually suffices to find a decomposition for S + S ⊥ for some arbitrary S ⊥ , implement the corresponding maps, and combine their statistics as above.

D. Example
Let us now illustrate the above considerations on an explicit example. Ref. [13] introduced the following process matrix, for a case where all incoming and outgoing systems of A and B are 2-dimensional (qubit) systems (i.e., $d_{A_I} = d_{A_O} = d_{B_I} = d_{B_O} = 2$): $W_{OCB} := \frac{1}{4}\left[1 + \frac{1}{\sqrt{2}}\left(Z^{A_O} Z^{B_I} + Z^{A_I} X^{B_I} Z^{B_O}\right)\right]$, where Z and X are the Pauli matrices, and tensor products are implicit. One can easily check that $W_{OCB} \ge 0$, that $\mathrm{tr}[W_{OCB}] = 4 = d_O$, and that $W_{OCB}$ satisfies Eqs. (9)-(11), which ensures that it is indeed a valid process matrix. It was shown that $W_{OCB}$ allows for a violation of a causal inequality (see Section VI), which implies that it is causally nonseparable.
The concept of causal witnesses introduced here allows us to prove the causal nonseparability of $W_{OCB}$ more directly. Solving the SDP problem (44) with YALMIP [25] and the solver MOSEK [26], we obtained, up to numerical precision, the optimal witness with respect to random robustness $S_{OCB} = \frac{1}{4}\left[1 - \left(Z^{A_O} Z^{B_I} + Z^{A_I} X^{B_I} Z^{B_O}\right)\right]$, for which $\mathrm{tr}[S_{OCB} W_{OCB}] = 1 - \sqrt{2} = -R_r(W_{OCB}) < 0$ (where $R_r$ is the random robustness as defined in Equation (46)). This proves that $W_{OCB}$ is causally nonseparable. This also implies that the process matrices of the form $W_{OCB}(\lambda) := \frac{1}{1+\lambda}\left(W_{OCB} + \lambda\,1^\circ\right)$ are causally nonseparable for $\lambda < \sqrt{2} - 1$ (and their causal nonseparability is then witnessed by $S_{OCB}$). Conversely, the solution of the SDP problem (45) provides an explicit decomposition for $W_{OCB}(R_r(W_{OCB}))$ (as can be seen when writing (45) in a form similar to Eq. (D6)), from which we can derive an explicit decomposition for all $W_{OCB}(\lambda)$ for $\lambda \ge \sqrt{2} - 1$, as $W_{OCB}(\lambda) = \frac{1}{2} W^{A \prec B}_{OCB}(\lambda) + \frac{1}{2} W^{B \prec A}_{OCB}(\lambda)$, (53) where $W^{A \prec B}_{OCB}(\lambda) = \frac{1}{4}\left[1 + \frac{\sqrt{2}}{1+\lambda} Z^{A_O} Z^{B_I}\right]$ and $W^{B \prec A}_{OCB}(\lambda) = \frac{1}{4}\left[1 + \frac{\sqrt{2}}{1+\lambda} Z^{A_I} X^{B_I} Z^{B_O}\right]$ are causally ordered process matrices. (Note that for $\lambda < \sqrt{2} - 1$, $W^{A \prec B}_{OCB}(\lambda)$ and $W^{B \prec A}_{OCB}(\lambda)$ as defined above would not be positive semidefinite, which explains why Eq. (53) then fails to provide a valid causally separable decomposition of $W_{OCB}(\lambda)$.)
To measure the witness $S_{OCB}$ and obtain the quantity $\mathrm{tr}[S_{OCB} \cdot W]$ experimentally, one can for instance decompose it in the following way: define, for x, y, y′, a, b = 0, 1, the CJ matrices $M^{A_I A_O}_{a|x}$ and $M^{B_I B_O}_{b|y,y'}$ of Eqs. (56)-(58), which represent measure-and-prepare maps (see Appendix A 1). One can then check that $S_{OCB}$ decomposes as in Eq. (47) in terms of these CJ matrices, with coefficients involving the Kronecker delta $\delta_{j,k}$. Thus, one can compute $\mathrm{tr}[S_{OCB} \cdot W]$ by performing the maps above on W and combining the probabilities $P(a, b|x, y, y') = \mathrm{tr}[(M^{A_I A_O}_{a|x} \otimes M^{B_I B_O}_{b|y,y'}) \cdot W]$ accordingly. As one may recognize, the choice of CP maps in (56)-(58) is the same as that considered in Ref. [13], so that the experimental procedure proposed here to measure the witness $S_{OCB}$ would be the same as that suggested in [13] to violate a causal inequality. (Compared to Ref. [13], we exchanged in the present paper the notations x, y and a, b for inputs and outputs, so as to use here the same notations as most of the recent works on quantum and nonlocal correlations [27]. Furthermore, in [13] the state sent out by B when y′ = 1 was arbitrary, while here we fixed it to be $1^{B_O}/2$.) The labels x, y, y′, a, b can be considered as inputs and outputs for the above maps (which indeed form valid instruments, i.e., $\sum_a \mathcal{M}^A_{a|x}$ and $\sum_b \mathcal{M}^B_{b|y,y'}$ are CPTP for all x, y, y′). As it turns out, in the causal inequality of Ref. [13] the probabilities $P(a, b|x, y, y')$ are actually combined in precisely the same way as above: namely, $\mathrm{tr}[G_{OCB} \cdot W]$ above can be identified with the probability $p_{succ}$ of winning the corresponding "causal game", when the inputs x, y, y′ = 0, 1 are given with equal probabilities. Remarkably, in this particular case the bounds of the causal witness $S_{OCB}$ and of the causal inequality (62) coincide, i.e., $\mathrm{tr}[S_{OCB} \cdot W] \ge 0$ if and only if $p_{succ} = \mathrm{tr}[G_{OCB} \cdot W] \le 3/4$, where 3/4 is the upper bound on $p_{succ}$ for any causal correlation (as defined in Section VI below). Furthermore, the noise threshold below which the noisy process matrix $W_{OCB}(\lambda)$ (53) can violate the causal inequality is the same as the threshold $R_r(W_{OCB})$ below which $W_{OCB}(\lambda)$ is causally nonseparable, as already noted in Ref. [28]. This is however not a general property of causal witnesses and causal inequalities: similarly to the case of entanglement vs. quantum nonlocality and of entanglement witnesses vs. Bell inequalities [27], there exist causally nonseparable process matrices that cannot yield any violation of any causal inequality, while there always exists a causal witness that detects their causal nonseparability. We will come back to this issue in Section VI below, with an explicit example in the tripartite case.
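The numbers quoted in this example can be reproduced with a few lines of numpy. The sketch below assumes the explicit forms $W_{OCB} = \frac14[1 + (Z^{A_O}Z^{B_I} + Z^{A_I}X^{B_I}Z^{B_O})/\sqrt2]$ and $S_{OCB} = \frac14[1 - (Z^{A_O}Z^{B_I} + Z^{A_I}X^{B_I}Z^{B_O})]$, white noise given by the identity divided by $d_I = 4$, and the ordering $A_I, A_O, B_I, B_O$ of the four qubits. The function names are hypothetical, and no SDP solver is involved: the code only evaluates the witness and the candidate decomposition.

```python
import numpy as np

def embed(a_i, a_o, b_i, b_o):
    """Tensor product of single-qubit operators on A_I, A_O, B_I, B_O."""
    return np.kron(np.kron(a_i, a_o), np.kron(b_i, b_o))

I2, X, Z = np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]]), np.diag([1.0, -1.0])
Id = np.eye(16)
T1 = embed(I2, Z, Z, I2)             # Z on A_O and B_I
T2 = embed(Z, I2, X, Z)              # Z on A_I, X on B_I, Z on B_O

W_ocb = (Id + (T1 + T2) / np.sqrt(2)) / 4    # assumed form of the process matrix
S_ocb = (Id - T1 - T2) / 4                   # assumed form of the witness
white = Id / 4                               # white noise: identity / d_I

def noisy(lam):
    """Process mixed with white noise, (W + lam * white) / (1 + lam)."""
    return (W_ocb + lam * white) / (1 + lam)

def witness_value(lam):
    """Expectation value of the witness on the noisy process."""
    return np.trace(S_ocb @ noisy(lam)).real

def ordered_parts(lam):
    """Candidate causally ordered components of the noisy process."""
    c = np.sqrt(2) / (1 + lam)
    return (Id + c * T1) / 4, (Id + c * T2) / 4

lam_star = np.sqrt(2) - 1                    # noise threshold discussed in the text
```

Under these assumptions `witness_value(0)` equals $1 - \sqrt2$, and the witness value changes sign exactly at `lam_star`. Above that threshold both matrices returned by `ordered_parts` are positive semidefinite and average to the noisy process, while below it they acquire negative eigenvalues.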

A. The quantum switch
It has recently been suggested that quantum computation can be extended beyond the framework of quantum circuits, which enforces a fixed order between the execution of quantum gates. The main idea is that the order in which gates are performed can be coherently controlled by a quantum system. The new resource that allows for such a control is the quantum switch, first proposed in Ref. [6]. It works as follows: consider a two-qubit system, composed of a control and of a target qubit. Two parties A and B act on the target qubit with the unitaries $U_A$, $U_B$ respectively. If the control qubit is prepared in the state $|0\rangle$, $U_A$ is applied to the target before $U_B$, while if the control is in state $|1\rangle$ the two unitaries are applied in the reversed order. The global unitary, acting on both the target and control qubits, is thus $V = |0\rangle\langle 0| \otimes U_B U_A + |1\rangle\langle 1| \otimes U_A U_B$, (63) where the first factor in each tensor product acts on the control system and the second factor acts on the target. For an initial state $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) \otimes |\psi\rangle$ of the control-target system, one gets, after applying V, the state $\frac{1}{\sqrt{2}}\left(|0\rangle \otimes U_B U_A |\psi\rangle + |1\rangle \otimes U_A U_B |\psi\rangle\right)$, which can be interpreted as having applied the two unitaries on the target in a "superposition of orders" (see footnote 6).
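Note that if the control system is discarded, one is left with the mixed state $\frac{1}{2} U_B U_A |\psi\rangle\langle\psi| U_A^\dagger U_B^\dagger + \frac{1}{2} U_A U_B |\psi\rangle\langle\psi| U_B^\dagger U_A^\dagger$. (64) This can be produced by randomly exchanging the order in which $U_A$ and $U_B$ are applied and thus can be seen as an equal mixture of causally ordered processes.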
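The action of the quantum switch on unitaries is easy to simulate. The numpy sketch below assumes the global unitary $V = |0\rangle\langle 0| \otimes U_B U_A + |1\rangle\langle 1| \otimes U_A U_B$ with the control as the first tensor factor and the control prepared in $(|0\rangle + |1\rangle)/\sqrt2$. The target state `psi` and all function names are arbitrary illustrative choices.

```python
import numpy as np

def switch_unitary(U_A, U_B):
    """Control |0>: U_A then U_B; control |1>: U_B then U_A (control is first factor)."""
    P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
    return np.kron(P0, U_B @ U_A) + np.kron(P1, U_A @ U_B)

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)
psi = np.array([0.6, 0.8])           # arbitrary normalised target state

def output_state(U_A, U_B):
    """State of control and target after the switch, control initially |+>."""
    return switch_unitary(U_A, U_B) @ np.kron(plus, psi)

def control_minus_probability(U_A, U_B):
    """Probability of finding the control qubit in |-> after the switch."""
    proj = np.kron(minus.reshape(1, 2), np.eye(2))    # <-| on the control
    amp = proj @ output_state(U_A, U_B)
    return float(np.vdot(amp, amp).real)

def reduced_target(U_A, U_B):
    """Target state after discarding the control qubit."""
    out = output_state(U_A, U_B).reshape(2, 2)        # indices: control, target
    return out.T @ out.conj()
```

For commuting unitaries the control stays in $|+\rangle$, while for anticommuting ones (such as X and Z) it ends up in $|-\rangle$ with certainty, which is the kind of order-sensitive signature that measurements on the control can reveal. Discarding the control instead leaves the equal mixture of the two orders, as `reduced_target` shows.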
To make the situation more interesting, we shall be led to introduce a third party, C, who can perform measurements on the control qubit (and possibly also on the target qubit) in order to define a causally nonseparable process (using the definition (27)) using quantum control of causal order.

B. Process matrix representation of the quantum switch
For our purposes, we can formally represent the quantum switch (with fixed input state) as a tripartite process matrix: the two parties A and B each perform an arbitrary CP map on the target qubit, while C performs an arbitrary two-qubit POVM measurement on the resulting control-target state (with no outgoing system). The dimensions of the input and output systems of the local laboratories are therefore

$$d_{A_I} = d_{A_O} = d_{B_I} = d_{B_O} = 2, \quad d_{C_I} = 4, \quad d_{C_O} = 1. \qquad (65)$$

For clarity, we shall divide C's input space as $C_I = C_I^c \otimes C_I^t$, where $C_I^c$ and $C_I^t$ refer to the control and target qubits, respectively (with therefore $d_{C_I^c} = d_{C_I^t} = 2$). In order to describe the process matrix of the quantum switch, we are first going to make use of the "pure" version of the formalism, described in Appendix A 2. An identity channel from a party's output space $A_O$ to another party's input space $B_I$ is described, as a process matrix, by the projector onto the "process vector" $|1\rangle\rangle^{A_O B_I} = \sum_j |j\rangle^{A_O} |j\rangle^{B_I}$.

6 Since any CP map can be purified to a unitary evolution by introducing an ancillary system and a projective measurement on some subsystem of the original system and ancilla, the notion of superposition of orders can be easily extended from unitary operations to arbitrary CP maps by introducing an ancillary register for each party.

The situation where A receives a state $|\psi\rangle$, performs an arbitrary operation on it, and sends the output directly to B through an identity channel, who in turn sends the output of his operation to $C_I^t$, is represented by the process vector $|\psi\rangle^{A_I} |1\rangle\rangle^{A_O B_I} |1\rangle\rangle^{B_O C_I^t}$. Then the quantum switch, with the control qubit initially in the state $\frac{|0\rangle + |1\rangle}{\sqrt{2}}$ and the target qubit in the state $|\psi\rangle$, is represented by the process matrix $|w\rangle\langle w|$, where

$$|w\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle^{C_I^c} |\psi\rangle^{A_I} |1\rangle\rangle^{A_O B_I} |1\rangle\rangle^{B_O C_I^t} + |1\rangle^{C_I^c} |\psi\rangle^{B_I} |1\rangle\rangle^{B_O A_I} |1\rangle\rangle^{A_O C_I^t} \right). \qquad (66)$$

This can be checked by noting that

$$\left( \langle\langle U_A^*|^{A_I A_O} \otimes \langle\langle U_B^*|^{B_I B_O} \right) |w\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle \otimes U_B U_A |\psi\rangle + |1\rangle \otimes U_A U_B |\psi\rangle \right),$$

where $|U^*\rangle\rangle$ denotes the CJ vector of a unitary $U$ (see Appendix A 1). Note that the process (66) itself is clearly causally nonseparable 7, since i) it is a superposition of a pure process only compatible with the order $A \prec B \prec C$ and a pure process only compatible with the order $B \prec A \prec C$ and ii) it is a projector onto a pure vector, thus it cannot be written as a nontrivial mixture of causally ordered processes.
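The consistency between the process vector of the switch and the output of the global unitary can be verified by brute-force contraction. The sketch below, in plain Python, assumes the convention $|U^*\rangle\rangle = 1 \otimes U^* |1\rangle\rangle$ from Appendix A 1, so that the bra $\langle\langle U^*|$ has components $U_{\text{out},\text{in}}$; the function names, the unitaries $X$ and $H$, and the target state are illustrative assumptions.

```python
from itertools import product
from math import sqrt

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def w_component(psi, c, ai, ao, bi, bo, ct):
    """Component of |w> on the basis vector |c>|ai>|ao>|bi>|bo>|ct>."""
    if c == 0:
        # psi enters A, identity channel A_O -> B_I, identity channel B_O -> C_t
        return psi[ai] * (ao == bi) * (bo == ct) / sqrt(2)
    # psi enters B, identity channel B_O -> A_I, identity channel A_O -> C_t
    return psi[bi] * (bo == ai) * (ao == ct) / sqrt(2)

def contract(UA, UB, psi):
    """Contract |w> with the bras of the CJ vectors of U_A and U_B.

    Returns out[c][ct]: amplitudes on the control (c) and target (ct) qubits.
    """
    out = [[0, 0], [0, 0]]
    for c, ai, ao, bi, bo, ct in product(range(2), repeat=6):
        # the bra of the CJ vector of U has components U[out][in]
        out[c][ct] += UA[ao][ai] * UB[bo][bi] * w_component(psi, c, ai, ao, bi, bo, ct)
    return out

s = 1 / sqrt(2)
X = [[0, 1], [1, 0]]
H = [[s, s], [s, -s]]
psi = [0.6, 0.8j]                      # normalised target state (assumption)

amps = contract(X, H, psi)
order_AB = [s * a for a in matvec(H, matvec(X, psi))]   # U_B U_A |psi> / sqrt(2)
order_BA = [s * a for a in matvec(X, matvec(H, psi))]   # U_A U_B |psi> / sqrt(2)
```

The control-0 row of `amps` matches `order_AB` and the control-1 row matches `order_BA`, i.e. the contraction reproduces the state obtained by applying the switch unitary.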
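From Eq. (66), one finds (using the facts that $\operatorname{tr}_{C_I^t} |1\rangle\rangle\langle\langle 1|^{B_O C_I^t} = 1^{B_O}$ and $\operatorname{tr}_{C_I^t} |1\rangle\rangle\langle\langle 1|^{A_O C_I^t} = 1^{A_O}$)

$$\operatorname{tr}_{C_I} |w\rangle\langle w| = \frac{1}{2} W^{A \prec B} + \frac{1}{2} W^{B \prec A},$$

where

$$W^{A \prec B} = |\psi\rangle\langle\psi|^{A_I} \otimes |1\rangle\rangle\langle\langle 1|^{A_O B_I} \otimes 1^{B_O}, \qquad W^{B \prec A} = |\psi\rangle\langle\psi|^{B_I} \otimes |1\rangle\rangle\langle\langle 1|^{B_O A_I} \otimes 1^{A_O}$$

are (bipartite) causally ordered process matrices; $\operatorname{tr}_{C_I} |w\rangle\langle w|$ indeed describes the situation of Eq. (64). For some information-processing tasks, the quantum switch is known to provide an advantage over causally ordered processes [7,8], even when C ignores the target system and only measures the control system. We will thus restrict our attention to witnesses of the form $S^{C_I^c} \otimes 1^{C_I^t}$, which can simplify the analysis and the experimental implementation. The reduced process we will be dealing with is the partial trace of the quantum switch (66) over the target system:

$$W_{\text{switch}} = \operatorname{tr}_{C_I^t} |w\rangle\langle w|. \qquad (71)$$

Note that the proof of causal nonseparability based on the purity of the switch does not extend to the reduced switch (71), since it is not an extremal process. We will therefore use the framework of causal witnesses to show that the reduced switch is also causally nonseparable.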

V. WITNESSES FOR THE QUANTUM SWITCH
Since the quantum switch is a tripartite process where $d_{C_O} = 1$, we can use definition (27) to study its causal (non)separability. In this tripartite situation, we will define causal witnesses to be the Hermitian operators $S$ such that

$$\operatorname{tr}(S\, W^{\text{sep}}) \ge 0 \qquad (72)$$

for every causally separable process $W^{\text{sep}}$ in the cone $\mathcal{W}^{\text{sep}}_{3C}$. The set of causal witnesses is thus the cone dual to $\mathcal{W}^{\text{sep}}_{3C}$, which we denote by $\mathcal{S}_{3C}$, or $\mathcal{S}_{3C,V}$ when restricted to $\mathcal{L}_V$. The characterization of $\mathcal{S}_{3C}$ is given by the following theorem:

is a causal witness if and only if S can be written as
where with L A≺B≺C and L B≺A≺C as defined in Subsection II B 3.
The proof is given in Appendix G. This characterization allows us to cast the problem of finding a witness for the quantum switch (or in fact for any process $W$ with $d_{C_O} = 1$) as an SDP problem (76) analogous to (38), in which $\mathcal{W}^*_{3C,V} := \mathcal{W}^*_{3C} \cap \mathcal{L}_V$, with $\mathcal{W}^*_{3C}$ the dual of the cone $\mathcal{W}_{3C}$ of (non-normalized) tripartite process matrices with $d_{C_O} = 1$.
Analogously to problems (38)-(39), problem (76) has a dual SDP problem (77), and the optimal values of (76) and (77) respect the duality relation (40), which allows us to interpret $-\operatorname{tr}(S^* W)$ as the generalised robustness also in this case. Furthermore, (76) and (77) respect the assumptions of the Duality Theorem, and therefore SDP algorithms can find their optimal solutions efficiently. We shall, however, omit the proofs, as they are simply a slight modification of the ones already presented in Appendix E.

A. Optimal witness
To find the optimal generalised robustness witness for the quantum switch we need to solve SDP problem (76), providing $W_{\text{switch}}$ from Eq. (71) as an argument. Solving it using YALMIP and the solver MOSEK we obtain a witness $S_{\text{optimal}}$ numerically; the generalised robustness of the quantum switch is found to be approximately 0.5454. Later in this section we will compare this number to that obtained from non-optimal witnesses. For this purpose, we shall use the amount of worst-case noise tolerated by a witness, i.e., the amount of worst-case noise that can be added to the quantum switch before the witness can no longer detect its causal nonseparability. When the witness is optimal, this number reduces to the generalised robustness of the quantum switch.

B. Chiribella's witness
In Ref. [7] Chiribella proposed an information-processing task for which the quantum switch has an advantage over causally ordered processes. We want to understand what this advantage means, and how it relates to causal nonseparability. To that end we present a slightly modified version of his task and show how it can be understood as a causal witness.
Our version of the task is as follows: Alice (party A) receives a qubit in her lab, applies a unitary U A to it, and sends it away. Bob (party B) receives a qubit in his lab, applies a unitary U B to it, and sends it away. We assume that in each run of the experiment, U A and U B either commute or anticommute. Charlie (party C) receives a qubit in his lab, and makes a measurement on it to decide whether U A and U B commute or anticommute.
To construct a causal witness in relation to this task, we start with the Choi-Jamiołkowski representation of the actions of the parties: Alice applying a unitary $U_A$, Bob applying a unitary $U_B$, and Charlie obtaining the result $\pm$ when measuring in the $|\pm\rangle = \frac{|0\rangle \pm |1\rangle}{\sqrt{2}}$ basis.
Using the CJ representations $|U_A^*\rangle\rangle$ and $|U_B^*\rangle\rangle$ of $U_A$ and $U_B$ (see Appendix A 1), the corresponding operator is

$$G_\pm(U_A, U_B) = |\pm\rangle\langle\pm|^{C_I^c} \otimes |U_A^*\rangle\rangle\langle\langle U_A^*|^{A_I A_O} \otimes |U_B^*\rangle\rangle\langle\langle U_B^*|^{B_I B_O}.$$

The witness corresponding to the task is obtained by averaging over the cases where Charlie obtains $+$ when Alice and Bob apply commuting unitaries, and the cases where Charlie obtains $-$ when Alice and Bob apply anticommuting unitaries:

$$G_{\text{Chiribella}} = \frac{1}{2} \int d\mu_{[\,,]}\, G_+(U_A, U_B) + \frac{1}{2} \int d\mu_{\{\,,\}}\, G_-(U_A, U_B), \qquad (80)$$

where $d\mu_{[\,,]}$ is a measure over commuting unitaries, and $d\mu_{\{\,,\}}$ is a measure over anticommuting unitaries (we assume here that the cases where $U_A$ and $U_B$ commute and anticommute each appear with probability $\frac{1}{2}$). The probability of success in this task when the parties are using a strategy described by a process matrix $W$ is then

$$p_{\text{succ}} = \operatorname{tr}(G_{\text{Chiribella}}\, W).$$

It is easy to check that for any choice of measures $d\mu_{[\,,]}$, $d\mu_{\{\,,\}}$ the probability of success is 1 when $W = W_{\text{switch}}$. The maximal probability of success for causally separable processes, however, depends crucially on the measures $d\mu_{[\,,]}$ and $d\mu_{\{\,,\}}$. If we were to choose, for example, measures that only produce pairs of Pauli matrices, then there is a causally separable circuit 8 that can decide the commutativity or anticommutativity with probability 1.
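That the switch succeeds with probability 1 can be seen directly at the level of state vectors: for commuting unitaries the two branches of the switch output coincide and the control ends in $|+\rangle$, while for anticommuting unitaries the branches differ by a sign and the control ends in $|-\rangle$. The following is a minimal sketch in plain Python; the specific unitaries and target state are illustrative assumptions and the function name is hypothetical.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def prob_plus(UA, UB, psi):
    """Probability that Charlie finds the control qubit in |+> after the switch.

    The switch output is (|0> b0 + |1> b1)/sqrt(2); projecting the control
    onto <+| leaves the unnormalised target vector (b0 + b1)/2.
    """
    b0 = matvec(UB, matvec(UA, psi))   # control |0>: U_B U_A |psi>
    b1 = matvec(UA, matvec(UB, psi))   # control |1>: U_A U_B |psi>
    return sum(abs((x + y) / 2) ** 2 for x, y in zip(b0, b1))

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
S = [[1, 0], [0, 1j]]                  # phase gate, commutes with Z
psi = [0.6, 0.8j]                      # normalised target state (assumption)

p_plus_commuting = prob_plus(Z, S, psi)        # Z and S commute
p_plus_anticommuting = prob_plus(X, Z, psi)    # X and Z anticommute

# Charlie guesses "commute" on outcome + and "anticommute" on outcome -
p_success = 0.5 * p_plus_commuting + 0.5 * (1 - p_plus_anticommuting)
```

With these inputs Charlie's outcome is fully determined by whether the unitaries commute, so `p_success` equals 1 up to rounding.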
To avoid this problem we will first choose measures that can produce any pair of commuting or anticommuting unitaries (modulo global phases). Specifically, we choose the commuting measure $d\mu_{[\,,]}$ to pick commuting unitaries that are diagonal in a common basis defined by a unitary $U$, where $U$ is uniformly distributed according to the Haar measure, and the phases $\theta_i$ are uniformly distributed in the interval $[0, 2\pi]$. For the anticommuting measure $d\mu_{\{\,,\}}$, we will use $U_A = V X V^\dagger$ and $U_B = V Z V^\dagger$, where $V$ is also a Haar-random unitary (and $X$ and $Z$ are the Pauli matrices) 9. With these measures $G_{\text{Chiribella}}$ turns out to be a valid causal witness, as the maximal probability of success for causally separable processes $p^{\text{sep}}_{\text{succ}}$ is bounded below one. To calculate it we need to solve an SDP problem that maximises $\operatorname{tr}(G_{\text{Chiribella}}\, W^{\text{sep}})$ over all normalised causally separable processes $W^{\text{sep}}$. Solving it with YALMIP and MOSEK, we obtain a value of $p^{\text{sep}}_{\text{succ}}$ strictly below one. The amount of worst-case noise that $G_{\text{Chiribella}}$ can tolerate is 0.0766, which is much worse than the 0.5454 tolerated by $S_{\text{optimal}}$. An issue with $G_{\text{Chiribella}}$ is that it would take an infinite number of measurements to estimate each term of the sum in (80). Furthermore $d\mu_{[\,,]}$ and $d\mu_{\{\,,\}}$ were chosen arbitrarily, while it would be preferable to have a justification for the choice of a particular measure. Both problems are solved by restricting the unitaries $U_A$ and $U_B$ to come from a finite set. In this way we only need to perform a finite number of measurements to estimate the witness, and it is possible to optimize the measures over commuting and anticommuting unitaries through SDP problems.
The best witness we found is obtained by choosing the following ten unitaries: [...] ($Y$ being the third Pauli matrix), and defining the witness $G_{\text{finite}}$ to be [...] where $U_k \in \mathcal{G}$, and $q$ are the input probability distributions over commuting and anticommuting unitaries. Optimizing the weights $q$ through an SDP problem gives a tolerance to worst-case noise of 0.1507, which is higher than $G_{\text{Chiribella}}$'s 0.0766, but still lower than $S_{\text{optimal}}$'s 0.5454.

9 It turns out that with this choice of measures the witness $G_{\text{Chiribella}}$ is the same as we would obtain by translating the task from Ref. [7] directly into the language of causal witnesses; the only difference, then, is that in [7] the witness was decomposed in terms of measurements and repreparations, whereas we decomposed it using unitaries only.

We want to emphasize that the witnesses obtained in this subsection are equivalent to the ones defined through (72) at the beginning of the present section, the only difference being the arbitrary choice of the causal bound being $\ge 0$ vs $\le p^{\text{sep}}_{\text{succ}}$. More precisely, let $G$ be a witness such that

$$\operatorname{tr}(G\, W^{\text{sep}}) \le p^{\text{sep}}_{\text{succ}}$$

for every (normalised) causally separable $W^{\text{sep}}$ and

$$T_0 \le \operatorname{tr}(G\, W) \le T_1$$

for every (normalised) process matrix $W$. Then

$$S = \frac{p^{\text{sep}}_{\text{succ}}\, 1/d_O - G}{p^{\text{sep}}_{\text{succ}} - T_0}$$

is a valid generalised robustness witness. Furthermore, if $S$ is the optimal witness for some process matrix $W$ that saturates the upper bound $\operatorname{tr}(G\, W) = T_1$, it follows that

$$-\operatorname{tr}(S\, W) = \frac{T_1 - p^{\text{sep}}_{\text{succ}}}{p^{\text{sep}}_{\text{succ}} - T_0}. \qquad (91)$$

When $G$ is either $G_{\text{finite}}$ or $G_{\text{Chiribella}}$, we have that $T_0 = 0$ and $T_1 = 1$. And even though they are not optimal witnesses for $W_{\text{switch}}$, the relationship between $p^{\text{sep}}_{\text{succ}}$ and resistance to worst-case noise is valid for them, i.e., for both $G_{\text{finite}}$ and $G_{\text{Chiribella}}$ the resistance to worst-case noise is equal to $1/p^{\text{sep}}_{\text{succ}} - 1$, as given by (91).
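The relation between the causally separable success probability and the resistance to worst-case noise, $1/p^{\text{sep}}_{\text{succ}} - 1$, can be inverted to recover the success probabilities implied by the quoted noise tolerances. The snippet below is a small worked example in Python under the stated assumption $T_0 = 0$, $T_1 = 1$; the function names are hypothetical, and the resulting probabilities are derived here from the quoted tolerances rather than taken from the SDP output.

```python
def noise_tolerance(p_sep):
    """Resistance to worst-case noise for a task witness with T0 = 0 and T1 = 1."""
    return 1 / p_sep - 1

def p_sep_from_tolerance(r):
    """Inverse relation: causally separable success probability implied by r."""
    return 1 / (1 + r)

# Noise tolerances quoted in the text
p_chiribella = p_sep_from_tolerance(0.0766)   # Haar-averaged witness
p_finite = p_sep_from_tolerance(0.1507)       # ten-unitary witness

# A lower separable success probability means a larger gap to the switch
# (which succeeds with probability 1) and hence a higher noise tolerance.
gap_chiribella = 1 - p_chiribella
gap_finite = 1 - p_finite
```

Under this relation the Haar-averaged witness corresponds to a separable success probability of roughly 0.93, and the finite-set witness to roughly 0.87.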

VI. CAUSAL INEQUALITIES
The notion of causal separability considered above relies on the quantum description of the local laboratories. One may ask what are the constraints imposed by a definite causal structure regardless of the specific description, or even the physics governing the devices performing the local operations. To study such restrictions, we will make use of so-called causal inequalities [13], which bound the possible correlations that can be established between events following a definite causal order. The violation of a causal inequality gives a stronger, device-independent signature of lack of causal order than the measurement of a witness. It is natural to ask whether it is possible to use the quantum switch to violate a causal inequality; we show below that this is not the case.

A. Device-independent causal relations
We still consider a multipartite scenario in which a set of $N$ parties $\{A_i\}_{i=1}^N$ are located in different, separated laboratories. Each party can perform operations and obtain measurement outcomes. Contrary to the previous case however, we do not consider here any particular physical description of what happens in each lab; the "settings" for the operations in the different laboratories and the measurement outcomes are labelled by some classical variables $x_i$ and $a_i$ (with $1 \le i \le N$), respectively; for simplicity we assume that the $x_i$'s and $a_i$'s take a finite number of values. Defining the vector of settings $\vec{x} = (x_1, \ldots, x_N)$ and the vector of outcomes $\vec{a} = (a_1, \ldots, a_N)$, the device-independent description of the correlations established in such an experiment is encoded in the conditional probability $P(\vec{a}|\vec{x})$.
Causal inequalities [13] are constraints on $P(\vec{a}|\vec{x})$ derived from the assumption that there exists an underlying causal structure defining the order between parties. To be more precise, let us represent the causal order in which the parties act by a permutation $\sigma$, defined such that party $i$ acts before party $j$ if and only if $\sigma(i) < \sigma(j)$. This leads to a total ordering of the parties, namely $A_{\sigma(1)} \prec A_{\sigma(2)} \prec \ldots \prec A_{\sigma(N)}$. We then say that a probability distribution $P(\vec{a}|\vec{x})$ is compatible with the causal order $\sigma$ if no party signals to those before her 10, namely if for every $i$ the marginal distribution of $a_{\sigma(1)}, \ldots, a_{\sigma(i)}$ does not depend on the inputs $x_{\sigma(j)}$ with $j > i$; i.e.,

$$P(a_{\sigma(1)}, \ldots, a_{\sigma(i)} | x_1, \ldots, x_N) = P(a_{\sigma(1)}, \ldots, a_{\sigma(i)} | x_{\sigma(1)}, \ldots, x_{\sigma(i)}). \qquad (93)$$

A probability distribution that is compatible with at least one causal order $\sigma$ is said to be causally ordered.
More generally, we allow the parties to share randomness to agree on a specific order of sending signals between them before the inputs of the game are given to them. This allows for convex combinations of causally ordered probability distributions:

$$P(\vec{a}|\vec{x}) = \sum_\sigma q_\sigma P_\sigma(\vec{a}|\vec{x}), \qquad q_\sigma \ge 0, \quad \sum_\sigma q_\sigma = 1, \qquad (94)$$

where each $P_\sigma$ is compatible with a fixed order $\sigma$. These are still not the most general correlations compatible with the assumption of a definite causal structure, as one party could control the causal order of a set of parties in its future [19,29,30]. Correlations compatible with this most general scenario of definite causal order are called simply causal. In the bipartite case, the set of causal correlations forms a convex polytope, delimited by a finite number of facets that define causal inequalities [31]. The explicit definition of causal correlations in the general N-partite case is, however, rather cumbersome, and for the purposes of this article it will be enough to consider probability distributions of the form (94), which is a sufficient (although not necessary) condition for causal separability.
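For two parties with binary settings and outcomes, compatibility with a causal order is a simple no-signalling-to-the-past check on marginals. The sketch below, in plain Python, uses two hypothetical deterministic strategies (in the first B outputs A's setting, in the second A outputs B's setting) and their equal mixture; the function names and the example distributions are illustrative assumptions, not taken from the original work.

```python
from itertools import product

BITS = (0, 1)

def marginal_a(P, a, x, y):
    return sum(P[(a, b, x, y)] for b in BITS)

def marginal_b(P, b, x, y):
    return sum(P[(a, b, x, y)] for a in BITS)

def compatible_A_before_B(P, tol=1e-12):
    # A acts first: P(a | x, y) must not depend on B's setting y
    return all(abs(marginal_a(P, a, x, 0) - marginal_a(P, a, x, 1)) < tol
               for a, x in product(BITS, repeat=2))

def compatible_B_before_A(P, tol=1e-12):
    # B acts first: P(b | x, y) must not depend on A's setting x
    return all(abs(marginal_b(P, b, 0, y) - marginal_b(P, b, 1, y)) < tol
               for b, y in product(BITS, repeat=2))

def mix(q, P, Q):
    # convex combination q P + (1 - q) Q of two distributions
    return {k: q * P[k] + (1 - q) * Q[k] for k in P}

keys = list(product(BITS, repeat=4))   # tuples (a, b, x, y)

# A before B: A outputs 0, B outputs A's setting x
P_AB = {(a, b, x, y): float(a == 0 and b == x) for a, b, x, y in keys}
# B before A: B outputs 0, A outputs B's setting y
P_BA = {(a, b, x, y): float(b == 0 and a == y) for a, b, x, y in keys}
# Shared randomness picks one of the two orders with probability 1/2
P_mix = mix(0.5, P_AB, P_BA)
```

The mixture `P_mix` has the convex-combination form by construction, yet it is compatible with neither fixed order on its own, which is why such mixtures have to be allowed explicitly.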
As causally separable processes can only generate causal correlations, the violation of a causal inequality can also be used to detect the causal nonseparability of a process. While causal witnesses are device-dependent and can only detect causal nonseparability if each party trusts the implementation of her operations, causal inequalities are completely device-independent: even if each party distrusts her laboratory, they can still detect causal nonseparability from the statistics of their experimental outcomes, if those violate a causal inequality. While for every causally nonseparable process there is a causal witness that will detect its nonseparability, there are causally nonseparable processes that cannot be used to violate any causal inequality: in the next subsection we will prove that the quantum switch provides such an example. There is an analogy here with entanglement witnesses, which allow for a device-dependent way of detecting entanglement, and Bell inequalities, which provide a device-independent certification of entanglement ("nonlocality") [27]. The important difference is that states violating Bell inequalities are physically implementable, while no example of a physically implementable process violating causal inequalities is known.

B. Quantum control of orders and causal inequalities
One might first wonder if the quantum switch allows for a causal inequality violation between A and B (such as the bipartite causal inequalities of Refs. [13, 31]); this is however clearly not the case since, as pointed out before, ignoring (i.e., tracing out) the third party C makes the process matrix of the quantum switch causally separable.
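One might still hope that the quantum switch can be used to violate a tripartite inequality (see e.g. [30]), explicitly involving party C; as it turns out, this is also impossible, as a consequence of the following theorem 11:

11 A similar conclusion based on the same example has been obtained by Oreshkov and Giarmatzi independently of the other authors of this paper and is presented in Ref. [19].

Theorem 4. Consider $N+1$ parties $A_1, \ldots, A_N, C$ with settings $\{x_1, \ldots, x_N, z\}$ and outcomes $\{a_1, \ldots, a_N, c\}$. If the probability distribution $P(\vec{a}, c|\vec{x}, z)$ is such that

1. the marginal distribution of the first $N$ parties does not depend on $z$, i.e.,
$$P(\vec{a}|\vec{x}, z) := \sum_c P(\vec{a}, c|\vec{x}, z) = P(\vec{a}|\vec{x}), \qquad (95)$$
2. $P(\vec{a}|\vec{x}) = \sum_\sigma q_\sigma P_\sigma(\vec{a}|\vec{x})$, where $q_\sigma \ge 0$, $\sum_\sigma q_\sigma = 1$, and the probability distributions $P_\sigma$ are causally ordered,

then the full $(N+1)$-partite probability distribution $P(\vec{a}, c|\vec{x}, z)$ is causal.

Proof. Using Bayes' rule and the assumptions of the theorem, we can write

$$P(\vec{a}, c|\vec{x}, z) = P(\vec{a}|\vec{x}, z)\, P(c|\vec{a}, \vec{x}, z) = \sum_\sigma q_\sigma P_\sigma(\vec{a}|\vec{x})\, P(c|\vec{a}, \vec{x}, z) = \sum_\sigma q_\sigma P_\sigma(\vec{a}, c|\vec{x}, z),$$

where $P_\sigma(\vec{a}, c|\vec{x}, z) := P_\sigma(\vec{a}|\vec{x})\, P(c|\vec{a}, \vec{x}, z)$ is compatible with the causal order in which the parties $A_i$ act in the order $\sigma$ and C acts last.

To see that the correlations generated by the quantum switch (Eq. (66)) respect assumptions 1. and 2. of the previous theorem, let us calculate the marginal probability distribution defined in Eq. (95) through the generalized Born rule (3). Writing $M^A_{a|x}$ and $M^B_{b|y}$ for the CJ matrices of A's and B's operations and $\{M^C_{c|z}\}_c$ for C's POVM, which satisfies $\sum_c M^C_{c|z} = 1^{C_I}$ for every $z$, we find

$$P(a, b|x, y, z) = \sum_c \operatorname{tr}\left[ \left( M^A_{a|x} \otimes M^B_{b|y} \otimes M^C_{c|z} \right) |w\rangle\langle w| \right] = \operatorname{tr}\left[ \left( M^A_{a|x} \otimes M^B_{b|y} \right) \operatorname{tr}_{C_I} |w\rangle\langle w| \right].$$

This implies that $P(a, b|x, y, z)$ does not depend on $z$, as required. As argued before, tracing out C from the process matrix representing the quantum switch leads to a causally separable process matrix of the form $W^{AB} = \frac{1}{2} W^{A \prec B} + \frac{1}{2} W^{B \prec A}$ with causally ordered process matrices $W^{A \prec B}$ and $W^{B \prec A}$, which can only generate causally ordered probability distributions $P_{A \prec B}$ and $P_{B \prec A}$. Hence, $P(a, b|x, y, z)$ can be decomposed as $\frac{1}{2} P_{A \prec B}(a, b|x, y, z) + \frac{1}{2} P_{B \prec A}(a, b|x, y, z)$, so that the second assumption of Theorem 4 is also satisfied. Therefore, the quantum switch represents an example of a causally nonseparable process that can only generate causal correlations, and hence cannot be used to violate any causal inequality 12. It is noteworthy that all the examples of causally nonseparable processes for which a physical interpretation is known, including those generated by space-time superpositions [32], fall into this category. This raises the question of whether causally nonseparable processes that do violate causal inequalities can be physically implemented at all.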

VII. CONCLUSION
The process matrix formalism was originally conceived as a rather speculative extension of quantum mechanics to possibly include the indefinite causal structures expected in a quantized theory of gravity [10]. The results of this work show that, in fact, it is a natural framework to study a class of quantum resources which cannot be captured by the circuit model, but nonetheless are physically realizable and can provide powerful computational advantages. We have shown that the quantum switch, a recently demonstrated resource for quantum computation, can be conveniently represented as a causally nonseparable process matrix. We have also presented causal witnesses that can verify the causal nonseparability of the switch. As they only require performing unitaries in a "superposition of orders" and a final measurement of a control qubit, such witnesses can be easily implemented in quantum-optics setups, such as the one employed in Ref. [9].
The theory of causal witnesses developed here closely resembles the theory of entanglement witnesses. In both cases, one is interested in finding ways to certify that a resource is outside some convex set: the set of separable states in the latter case, that of causally separable process matrices in the former. Following this analogy, causal inequalities can be seen as the counterpart to Bell inequalities, as they both provide device-independent tests regarding the existence of some classical variable: local hidden variables for measurement outcomes in one case, classical variables determining the causal order in the other. A significant difference between the two frameworks is that the problem of determining causal separability can be solved numerically with efficient algorithms, whereas characterizing entanglement has been proven to be an NP-hard problem [33]. As one could expect from the analogy with entanglement, there exist causally nonseparable processes that cannot violate causal inequalities. What is striking, in the case of process matrices, is that a physical interpretation is known only for resources in this category. As one of the main open problems in this field is the characterization of physical process matrices, it is tempting to speculate whether the (im)possibility to violate causal inequalities could provide useful guidance in this respect.

12 Note that Theorem 4 implies that this is also true for the N-partite generalization of the quantum switch defined in [8].

tion; the Templeton World Charity Foundation (grant TWCF 0064/AB38); the French National Research Agency through the 'Retour Post-Doctorants' program (ANR-13-PDOC-0026); and the European Commission through a Marie Curie International Incoming Fellowship (PIIF-GA-2013-623456).

Appendix A: Details of the formalism
Here we explore in more detail the properties of the Choi-Jamiołkowski (CJ) isomorphism and of the process matrix formalism. Note that other existing definitions of the CJ isomorphism differ by a transposition or a partial transposition from the one given here, which follows the convention in [13] and allows a direct identification of non-signaling processes with quantum states.

1. Choi-Jamiołkowski isomorphism
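a. Pure CJ isomorphism. It is convenient to distinguish two versions of the CJ isomorphism: one for maps over density matrices and one for linear operators on pure states. The latter, the "pure CJ isomorphism", can be represented via the "double-ket" notation [34,35]. For a linear operator $A: \mathcal{H}^{A_I} \to \mathcal{H}^{A_O}$, its CJ representation is defined as

$$|A^*\rangle\rangle^{A_I A_O} := 1 \otimes A^* |1\rangle\rangle, \qquad (A1)$$

where

$$|1\rangle\rangle := \sum_j |j\rangle^{A_I} |j\rangle^{A_I} \in \mathcal{H}^{A_I} \otimes \mathcal{H}^{A_I}$$

(with also, of course, the usual notation $\langle\langle 1| = |1\rangle\rangle^\dagger$), and the complex conjugation $*$ is defined with respect to the chosen orthonormal basis $\{|j\rangle^{A_I}\}$ of $\mathcal{H}^{A_I}$. The inverse map is given by

$$A = \sum_j \left[ \left( \langle j|^{A_I} \otimes 1^{A_O} \right) |A^*\rangle\rangle \right]^* \langle j|^{A_I}.$$

We say that $|A^*\rangle\rangle$ is the CJ representation (or CJ vector) of $A$. The cumbersome complex conjugation in the definition allows us to have a simpler representation for the process matrix.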
b. Maximally entangled states and unitaries. Consider here the case where the input and output spaces have equal dimensions, $d_{A_I} = d_{A_O}$. The state obtained by applying a local unitary to one subsystem of a maximally entangled state is also maximally entangled. Conversely, it is possible to generate any (bipartite) maximally entangled state by applying a local unitary to one subsystem of a reference maximally entangled state. Therefore, the CJ vector $|U^*\rangle\rangle^{A_I A_O} = 1 \otimes U^* |1\rangle\rangle$ is maximally entangled if and only if $U$ is a unitary. More explicitly, an operator with matrix elements $u_{jk}$ is unitary if and only if $\sum_l u_{jl} u^*_{kl} = \sum_l u^*_{lk} u_{lj} = \delta_{j,k}$ for all $j, k$. One can check that this is also a necessary and sufficient condition for $|U^*\rangle\rangle = \sum_{jk} u^*_{kj} |j\rangle |k\rangle$ to be maximally entangled.
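The equivalence between unitarity and maximal entanglement of the CJ vector can be checked by computing the reduced state on the input system. The following is a minimal sketch in plain Python, assuming the unnormalised convention $|A^*\rangle\rangle = 1 \otimes A^* |1\rangle\rangle$ used here, for which the reduced input state of a unitary is exactly the identity; the function names and example matrices are illustrative assumptions.

```python
from math import sqrt

def cj_vector(A):
    """Components v[j][k] of the CJ vector sum_j |j> (A^* |j>), i.e. conj(A[k][j])."""
    d_out, d_in = len(A), len(A[0])
    return [[complex(A[k][j]).conjugate() for k in range(d_out)] for j in range(d_in)]

def reduced_input_state(v):
    """Partial trace of |v><v| over the output system; equals A^dagger A."""
    d_in, d_out = len(v), len(v[0])
    return [[sum(v[j][k] * v[jp][k].conjugate() for k in range(d_out))
             for jp in range(d_in)] for j in range(d_in)]

def is_identity(M, tol=1e-12):
    return all(abs(M[i][j] - (i == j)) < tol
               for i in range(len(M)) for j in range(len(M)))

s = 1 / sqrt(2)
hadamard = [[s, s], [s, -s]]           # unitary
phase = [[1, 0], [0, 1j]]              # unitary
shear = [[1, 1], [0, 1]]               # invertible but not unitary

unitary_checks = [is_identity(reduced_input_state(cj_vector(U)))
                  for U in (hadamard, phase)]
shear_check = is_identity(reduced_input_state(cj_vector(shear)))
```

Both unitaries give a reduced state equal to the identity (maximally mixed up to normalisation), while the non-unitary shear does not.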
c. Measurement-preparation. Another useful linear operator is $|\psi\rangle\langle\phi|$, which describes the observation of an outcome $|\phi\rangle$ in a projective measurement, followed by the repreparation of a state $|\psi\rangle$. Plugging this into the definition (A1), we find the CJ representation

$$|\phi\rangle^{A_I} \otimes |\psi^*\rangle^{A_O}.$$

Reciprocally, every pure product CJ vector represents a measurement-preparation operation. An important particular case is when $|\psi\rangle = |\phi\rangle$, which corresponds to the ideal non-demolition von Neumann measurement, with CJ vector $|\phi\rangle^{A_I} \otimes |\phi^*\rangle^{A_O}$.

13 Superscripts on CJ vectors and CJ matrices indicate the systems they refer to (they may be omitted when the context makes it clear enough).

Note that a CP map can be part of an instrument only if it is trace-non-increasing, a condition that translates to $\operatorname{tr}_{A_O} M^{A_I A_O} \le 1^{A_I}$. A useful example is the CPTP map $\mathcal{M}_A(\sigma) = \rho \operatorname{tr} \sigma$, which corresponds to the preparation of a (normalized) state $\rho$ independently of the input state $\sigma$. Its CJ representation is found to be

$$M^{A_I A_O} = 1^{A_I} \otimes (\rho^{A_O})^T. \qquad (A12)$$

2. Process matrices
Here we discuss in more detail some examples and properties of process matrices.
a. Quantum states. Consider a bipartite process matrix of the form

$$W = \rho^{A_I B_I} \otimes 1^{A_O B_O}. \qquad (A15)$$

According to the generalized Born rule, Eq. (3), the probability for the two parties A and B to perform trace non-increasing CP maps with CJ matrices $M^{A_I A_O}$ and $M^{B_I B_O}$, respectively, is given by

$$\operatorname{tr}\left[ \left( M^{A_I A_O} \otimes M^{B_I B_O} \right) W \right] = \operatorname{tr}\left[ \left( E^{A_I} \otimes E^{B_I} \right) \rho^{A_I B_I} \right],$$

where $E^{A_I} = \operatorname{tr}_{A_O} M^{A_I A_O}$ and $E^{B_I} = \operatorname{tr}_{B_O} M^{B_I B_O}$. Notice that a process matrix of this form does not allow signalling in either direction and therefore, being compatible with both $A \prec B$ and $B \prec A$, it is causally separable. This is irrespective of the state $\rho$, which can be entangled or separable. Note also the difference between the process matrix (A15) and the CJ representation of state preparation, Eq. (A12).
b. Channels. Consider a bipartite situation where a party A only performs state preparations, while the second party B only performs measurements. In this case, the local laboratory of A is characterized by a trivial input space, $d_{A_I} = 1$, while B has a trivial output space, $d_{B_O} = 1$. The process matrix shared by A and B, which represents here a quantum channel, is then defined on the space $A_O \otimes B_I \ni W$. The probability that B observes a POVM element $E$ when A prepares a state $\rho$ is given by

$$P(E|\rho) = \operatorname{tr}\left[ \left( \rho^T \otimes E \right) W \right] = \operatorname{tr}\left[ E \operatorname{tr}_{A_O}\left[ \left( \rho^T \otimes 1^{B_I} \right) W \right] \right],$$

where we used (A12) and (A13) for the local operations. This is equivalent to saying that B measures $E$ in the state $\operatorname{tr}_{A_O}\left[ \left( \rho^T \otimes 1^{B_I} \right) W \right]$. Comparing this with the inverse CJ transformation (A9), we find that the process matrix $W$ corresponds to a channel with CJ representation $W^T$. In other words, a channel $\mathcal{C}$ from $A_O$ to $B_I$ is represented by the process matrix

$$W = \sum_{jk} |j\rangle\langle k|^{A_O} \otimes \mathcal{C}(|j\rangle\langle k|)^{B_I}. \qquad (A18)$$

Note that the CJ representation of a channel, Eq. (A7), differs by a transposition from the corresponding process matrix (A18).
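The Born rule for a channel process matrix can be compared numerically with the direct expression $\operatorname{tr}[E\, \mathcal{C}(\rho)]$. The sketch below, in plain Python, makes several assumptions that should be read as such: the channel is unitary, $\mathcal{C}(\sigma) = U \sigma U^\dagger$; the preparation of $\rho$ enters through the CJ matrix $\rho^T$ and the POVM element through $E$; and the process matrix is $W = \sum_{jk} |j\rangle\langle k| \otimes \mathcal{C}(|j\rangle\langle k|)$. All function names are hypothetical.

```python
from math import sqrt

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def dagger(A):
    return [[complex(x).conjugate() for x in col] for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def channel_process_matrix(U):
    """Entries <j,m| W |k,n> of W = sum_jk |j><k| (x) U|j><k|U^dagger."""
    d = len(U)
    return {(j, m, k, n): U[m][j] * complex(U[n][k]).conjugate()
            for j in range(d) for m in range(d)
            for k in range(d) for n in range(d)}

def born_probability(rho, E, W):
    """tr[(rho^T (x) E) W], with rho^T on A_O and E on B_I (assumed convention)."""
    d = len(rho)
    return sum(rho[j][k] * E[n][m] * W[(j, m, k, n)]
               for j in range(d) for m in range(d)
               for k in range(d) for n in range(d))

s = 1 / sqrt(2)
H = [[s, s], [s, -s]]
rho = [[0.7, 0.2j], [-0.2j, 0.3]]      # a valid qubit density matrix
E_plus = [[0.5, 0.5], [0.5, 0.5]]      # projector onto |+>

W = channel_process_matrix(H)
p_process = born_probability(rho, E_plus, W)
p_direct = trace(matmul(E_plus, matmul(H, matmul(rho, dagger(H)))))

# Sanity case: H maps |0> to |+>, so the outcome + is certain
p_certain = born_probability([[1, 0], [0, 0]], E_plus, W)
```

The two expressions `p_process` and `p_direct` coincide, illustrating that under these conventions the process matrix of a channel reproduces the usual prepare-evolve-measure statistics.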
With the usual generalized Born rule (3), the reduced process matrix gives the probability for the remaining $N - 1$ parties to measure arbitrary CP maps, given that the $j$-th party performs the CPTP map with CJ matrix $M^{A^j_I A^j_O}$. The dependence of the reduced process matrix on this map accounts for the possibility of signalling: the remaining parties observe different probability distributions depending on the choice of CPTP map performed by party $j$. As an example, consider a process matrix of the form (A18). If A prepares a state $\rho$, the reduced process matrix for B is

$$\operatorname{tr}_{A_O}\left[ \left( \rho^T \otimes 1^{B_I} \right) W \right] = \mathcal{C}(\rho).$$

Thus, for a process that represents a channel from A to B, the reduced process for B, given that A prepares $\rho$, is simply the channel applied to $\rho$, as should be expected.
d. Pure process matrices. In some cases, the process matrix turns out to be a rank-one projector: $W = |w\rangle\langle w|$ for some "process vector" $|w\rangle$. If the CJ operators representing the local operations are also rank-one projectors, as is the case for unitaries and projective measurements followed by pure repreparations, it is convenient to work at the level of vectors and of probability amplitudes: given the local operations $A_1, \ldots, A_N$ represented by the CJ vectors $|A_1^*\rangle\rangle, \ldots, |A_N^*\rangle\rangle$, the overall probability amplitude is given (up to a global phase, which we choose to be 0) by

$$\left( \langle\langle A_1^*| \otimes \cdots \otimes \langle\langle A_N^*| \right) |w\rangle.$$

The probability is then obtained as the modulus square of the amplitude and conforms to the general expression (3). Given that party $j$ performs the unitary $U_j$, the reduced process is clearly given by the partial scalar product $\langle\langle U_j^*|^{A^j_I A^j_O}\, |w\rangle$. The process matrix describing a unitary channel $U$ from $A_O$ to $B_I$ is of particular interest. Using (A18), we find that it is given by

$$W = |U\rangle\rangle\langle\langle U|^{A_O B_I}, \qquad |U\rangle\rangle := 1 \otimes U |1\rangle\rangle.$$

Note again the difference between this expression and the CJ representation (A1). Generalizing this to a sequence of parties $A_1, \ldots, A_N$, with the output of party $j$ connected to the input of party $j + 1$ via the unitary $U_j$, we find

$$|w\rangle = |U_1\rangle\rangle^{A^1_O A^2_I} \otimes \cdots \otimes |U_{N-1}\rangle\rangle^{A^{N-1}_O A^N_I}.$$

Appendix B: Valid process matrices
The conditions for an operator $W \in A_I \otimes A_O \otimes B_I \otimes B_O$ to be a valid process matrix were first found in Ref. [13], where they were formulated in a basis-dependent way. Here we derive the equivalent characterization of valid process matrices given in Eqs. (4)-(6); we formulate it in a basis-independent way, which we find to be more convenient for our purposes.
We present below the derivation in the bipartite case, and also write explicitly, for ease of reference, the characterization in the tripartite case. The N-partite case follows from a straightforward generalization.

Bipartite process matrices
Recall that a given operator is a valid process matrix if and only if it yields, through the generalized Born rule (3), only well-defined probabilities -that is, the probabilities must be non-negative and must sum up to 1.
Non-negativity. As recalled previously, a map is completely positive if and only if its CJ representation is positive semidefinite. Including the possibility that A and B's operations involve interactions with a (possibly entangled) ancillary system in a state $\rho^{A'_I B'_I}$, the non-negativity of probabilities is thus equivalent to 14

$$\operatorname{tr}\left[ \left( M^{A'_I A_I A_O} \otimes M^{B'_I B_I B_O} \right) \left( \rho^{A'_I B'_I} \otimes W \right) \right] \ge 0 \quad \text{for all } M^{A'_I A_I A_O}, M^{B'_I B_I B_O}, \rho^{A'_I B'_I} \ge 0. \qquad (B1)$$

14 Note that ignoring the possibility of an ancillary system, one would only find that $W$ must be "positive on pure tensors" (with respect to the partition $A_I A_O / B_I B_O$), a class strictly larger than positive semidefinite matrices [37].

[...] Requiring that its value is non-negative for all $M^{B'_I B_I B_O} \ge 0$ implies that $W$ must be positive semidefinite.
Reciprocally, $W \ge 0$ clearly implies that (B1) is satisfied. Hence, the non-negativity of probabilities is equivalent to $W$ being positive semidefinite, Eq. (4).
Normalization. The fact that probabilities must sum up to 1 for all instruments is equivalent to the constraint that the probability of realization of any CPTP map is 1. Now, recall that a CP map $\mathcal{M}_A$ is trace-preserving if and only if its CJ representation satisfies $\operatorname{tr}_{A_O} M^{A_I A_O} = 1^{A_I}$. Ignoring here for simplicity the possible use of an ancillary system (which leads to the same conclusion 15), the normalization of probabilities is thus equivalent to

$$\operatorname{tr}\left[ \left( M^{A_I A_O} \otimes M^{B_I B_O} \right) W \right] = 1 \quad \text{for all } M^{A_I A_O}, M^{B_I B_O} \ge 0 \text{ with } \operatorname{tr}_{A_O} M^{A_I A_O} = 1^{A_I},\ \operatorname{tr}_{B_O} M^{B_I B_O} = 1^{B_I}. \qquad (B2)$$

[...] satisfy the above normalization constraints (where from now on we omit the superscripts to reduce clutter), we find that Eq. (B2) is equivalent to [...]. For $x = y = 0$, this yields the normalization condition of Eq. (5), $\operatorname{tr} W = d_{A_O} d_{B_O}$. For $y = 0$ and $x = 0$, respectively, this in turn implies [...], which then imply [...], which we can rewrite as [...], which are conditions (9)-(11). Note that each condition (B8)-(B10) defines a linear subspace, and the intersection of these three linear subspaces is the smallest subspace that contains all valid bipartite process matrices, which we denote by $\mathcal{L}_V$ 17.

17 Note that although we do not write that explicitly, the projectors we define below (e.g. $L_V$), and of course the subspaces they define (e.g. $\mathcal{L}_V$), depend on the number of parties $N$.
The projector onto this subspace, $L_V$, shall be used quite often in the paper, so it is useful to find an explicit expression for it. To do that, first we rewrite conditions (B11)-(B13) explicitly as projections onto subspaces, i.e., as $W = L_A(W)$, $W = L_B(W)$, and $W = L_{AB}(W)$, where the projectors $L_A$, $L_B$, and $L_{AB}$ are given by [...]. Since the three projectors above commute, the projector onto the intersection of their subspaces $\mathcal{L}_V$ is given simply by the composition of $L_A$, $L_B$, and $L_{AB}$, i.e., $L_V = L_A \circ L_B \circ L_{AB}$, which, after simplification, can be written as [...]. Summing up, we conclude that an operator $W \in A_I \otimes A_O \otimes B_I \otimes B_O$ is a valid process matrix if and only if $W \ge 0$, $\operatorname{tr} W = d_{A_O} d_{B_O}$, and $W = L_V(W)$, as in Eqs. (4)-(6).

Tripartite process matrices
A similar reasoning leads to the conclusion that an where the maps L A , L B , L C , L AB , L AC , L BC , and L ABC are now commuting projectors onto linear subspaces of where we used the shorthand notation for a sum over products of subsystems X with coefficients α X (and with 1 W := W). The constraints in (B21) are equivalent to where the map L V is obtained here by composing the 7 maps L A , L B , L C , L AB , L AC , L BC , and L ABC . One finds in this tripartite case, after simplification, which defines a projector onto the linear subspace

N-partite process matrices
The generalization to the N-partite case is rather straightforward. We find that an operator $W$ [...] is a valid N-partite process matrix if and only if [...], and for all $2^N - 1$ non-empty subsets $X$ of $\{1, \ldots, N\}$, [...]. Note that the $2^N - 1$ maps $L_X$ are commuting projectors onto linear subspaces of [...], where the map $L_V$ is obtained this time by composing the $2^N - 1$ maps $L_X$. More explicitly, one finds in the N-partite case the general expression [...], which again defines a projector onto the linear subspace [...]. For $X \ne X'$, note that there exists (at least one) $i_0$ such that the product $P_X P_{X'}$ contains the factor (1 [...]

Proof. The "only if" direction is straightforward (simply replace $q\, W^{A \prec B} \to W^{A \prec B}$ and $(1 - q)\, W^{B \prec A} \to W^{B \prec A}$ to go from (18) to (C4), so that $W^{A \prec B}$ and $W^{B \prec A}$ in (C4) are not normalized).
To see that the converse also holds, first note that $W^{A \prec B} \ge 0$ and $W^{B \prec A} \ge 0$ imply that $W \ge 0$, so that $W$ is indeed a valid process matrix. Note furthermore that $W^{B \prec A} = {}_{A_O} W^{B \prec A}$ implies that $W^{B \prec A}$ satisfies (9). Since $W \in \mathcal{L}_V$ also satisfies (9), so does $W^{A \prec B} = W - W^{B \prec A}$. Similarly, $W^{A \prec B} = {}_{B_O} W^{A \prec B}$, together with the assumption that $W \in \mathcal{L}_V$, implies that both $W^{A \prec B}$ and $W^{B \prec A}$ satisfy (10). Lastly, $W^{A \prec B} = {}_{B_O} W^{A \prec B}$ and $W^{B \prec A} = {}_{A_O} W^{B \prec A}$ directly imply that both $W^{A \prec B}$ and $W^{B \prec A}$ satisfy (11). All in all, this shows that $W^{A \prec B}$ and $W^{B \prec A}$ are, up to normalization (which can easily be dealt with as above so as to recover the form (18)), valid causally ordered process matrices.
We are now in a position to prove Theorem 1: where P is the self-dual cone of positive semidefinite matrices and L B O and L A O are the linear subspaces Taking then the dual of W sep using the duality relations (C2)-(C3), we get that the cone of causal witnesses is Focusing on (P ∩ L B O ) * , using again the duality relations (C2)-(C3), we see that where the last equality is stating the fact that S = S + + Putting Eqs. (C14)-(C16) together, we see that a causal witness can be written as

Appendix D: Explicit positive semidefinite constraints
For the convenience of the reader, we present the SDP problems (38) and (39) with all conic constraints rewritten in terms of the positive semidefinite cone, to facilitate implementation. To rewrite (38), we need a characterisation of the cones S V and W * V . The first one is given by Corollary 2. The second one is obtained as follows: since W = P ∩ L V , we have that where Equation (D4) follows from an argument analogous to the one used to derive Corollary 2.
With this characterisation, the SDP problem (38) then becomes min tr(SW) To rewrite the SDP problem (39), we use the characterisation of W sep given in Lemma 5: Note that we could use directly the definition of W sep from Section II B 2, which would give us a slightly more complicated SDP problem.

Appendix E: Duality for conic problems
In this appendix we show that the two problems defined in Section III B are SDP problems and they are dual to each other. We show, furthermore, that the Duality Theorem applies to them, which implies that the optimal solutions can be found efficiently and that Equation (40)  where K * ⊂ E ′ is the cone dual to K, L ⊥ ⊂ E is the orthogonal complement to L, L + b ⊂ E and L ⊥ + c ⊂ E ′ are affine subspaces. (P) and (D) are called, respectively, the primal and dual problems associated with the above data.
We want our SDP problems to measure how much worst-case noise needs to be added to a given process matrix W to make it causally separable, i.e., the minimal λ ≥ 0 for which
(W + λΩ)/(1 + λ)
is a causally separable process, optimized over all valid (normalised) processes Ω. First note that we can get rid of the quadratic variable λΩ by defining Ω̃ = λΩ, which makes the objective λ equal to tr Ω̃/d_O (since a normalised process satisfies tr Ω = d_O). Remembering also that the normalisation 1/(1 + λ) is irrelevant for conic constraints, the problem reduces to minimizing tr Ω̃/d_O such that W + Ω̃ ∈ W^sep and Ω̃ ∈ W. To translate this SDP problem into the language of Definition 6, let us define and the inner product ⟨(S, Σ), (W, Ω)⟩ = tr(SW) + tr(ΣΩ).
With these definitions, and denoting by x = (ω, Ω) its variable, the primal SDP problem becomes which indeed corresponds to the SDP problem (39).
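The change of variables behind this reformulation can be sanity-checked on a toy example. The sketch below is an illustration only: it assumes that both W and Ω are the white-noise operator identity/d_I with qubit input and output spaces (a choice made so that everything stays diagonal), stores the diagonals as exact fractions, and confirms that the rescaled noise λΩ has trace λ d_O while the mixture (W + λΩ)/(1 + λ) keeps the normalisation tr = d_O. All helper names are hypothetical.

```python
from fractions import Fraction

# Toy dimensions (assumed): every local input/output space is a qubit.
d_AI = d_AO = d_BI = d_BO = 2
d_I = d_AI * d_BI          # total input dimension
d_O = d_AO * d_BO          # total output dimension
dim = d_I * d_O            # dimension of the full operator space

def white_noise_diagonal():
    """Diagonal of the operator identity/d_I, which has trace d_O."""
    return [Fraction(1, d_I)] * dim

def trace(diag):
    """Trace of a diagonal operator given by its list of diagonal entries."""
    return sum(diag)

def mix_with_noise(w_diag, omega_diag, lam):
    """Return the diagonals of lam*Omega and of (W + lam*Omega)/(1 + lam)."""
    omega_scaled = [lam * x for x in omega_diag]
    mixture = [(w + o) / (1 + lam) for w, o in zip(w_diag, omega_scaled)]
    return omega_scaled, mixture

W = white_noise_diagonal()
Omega = white_noise_diagonal()
lam = Fraction(3, 2)
omega_scaled, mixture = mix_with_noise(W, Omega, lam)

print(trace(omega_scaled) / d_O == lam)  # the linear objective recovers lambda
print(trace(mixture) == d_O)             # the mixture stays normalised
```

The point of the rescaling is that the objective becomes linear in the new variable, which is what allows the problem to be stated as a conic program.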
To construct the dual SDP problem, first note that where we used the property that (K_1 × K_2)* = K_1* × K_2* in equation (E11). Denoting by y = (S, Σ) its variable, the dual SDP problem is then which corresponds to the SDP problem (38). Let us emphasize that here the duals of W^sep and W are, respectively, S_V and W*_V, instead of S and W*, which is a consequence of choosing the vector space L_V × L_V. We did this because, as subsets of A_I ⊗ A_O ⊗ B_I ⊗ B_O, the cones W^sep and W (and therefore K = W^sep × W) have empty interiors, and therefore these cones would not satisfy the requirements of Definition 6. This is problematic because the duals of cones with empty interior are not pointed (in our case, S and W* are not pointed), and algorithms that solve SDP problems are numerically unstable when optimizing over non-pointed cones.
This definition is indeed satisfied by the cones we chose, i.e., W^sep × W ⊆ L_V × L_V is indeed pointed and has nonempty interior, as we shall check now. A pointed cone K is a cone such that K ∩ (−K) = {0}. This is indeed satisfied for W^sep × W, as both cones require their elements to be positive semidefinite, and W ≥ 0 and −W ≥ 0 imply that W = 0. To show that W^sep × W has nonempty interior, it is enough to find an operator that belongs to int W^sep. This is done through the following lemma:
Proof. Since Ω ∈ L_V, the discussion in Section II B 2 implies that the operators are causally ordered (in the sense that they satisfy Eq. (17) and the analogous relation for the order B ≺ A, respectively), and so are the operators Since, furthermore, we have that 1 • + Ω ∈ W^sep if W^{A≺B} and W^{B≺A} are positive semidefinite. This is the case if where ‖·‖ is the standard operator norm (i.e., the maximum singular value). To be able to enforce that, first note that ω^{A≺B} and ω^{B≺A} are orthogonal, and therefore Pythagoras' theorem implies that
This concludes the proof that problems (38) and (39) are SDP problems dual to each other. We shall now proceed to show that the Duality Theorem (Theorem 4.2.1 in [22]) applies to them:
Theorem 8. Let (P), (D) be a primal-dual pair of conic problems as defined above, and let the pair be such that
1. The set of primal solutions K ∩ (L + b) intersects int K;
2. The set of dual solutions K* ∩ (L⊥ + c) intersects int K*;
3. ⟨c, x⟩ is lower bounded for all x ∈ K ∩ (L + b).
Then both the primal and the dual problems are solvable, and the optimal solutions x * and y * satisfy the relation Let us check that for the SDP problems (E9) and (E13), the three assumptions of the Duality Theorem are indeed satisfied.
To see that 1. is satisfied, we need to find Ω ∈ int W such that W + Ω ∈ int W^sep. Take Ω = λ1 • ; then W + λ1 • ∈ int W^sep iff (1/λ) W + 1 • ∈ int W^sep. Using Lemma 7, we conclude that this is true if it is enough to choose to satisfy inequality (E23), and we're done.
To see that 2. is satisfied, we need to exhibit a witness S such that S ∈ int S_V and 1/d_O − S ∈ int W*_V. Since the cone P ∩ L_V of positive semidefinite matrices in L_V is a full-dimensional subset of both S_V and W*_V, it is enough to find an operator S such that S > 0 and 1/d_O − S > 0. One can take S = 1/(2d_O).
To see that 3. is satisfied, note that Ω ≥ 0 implies that tr Ω/d O ≥ 0.
All in all, the three assumptions of the Duality Theorem above are thus satisfied. Applying the identity (E22) to our pair of conic problems, we have, for the optimal solutions Ω* and S*: tr[Ω*]/d_O = −tr[S* W], as claimed in Eq. (40). As discussed in Sec. III B, a value tr[Ω*]/d_O = −tr[S* W] > 0 guarantees that the process matrix W is causally nonseparable, and the solution S* of the dual problem provides an explicit causal witness; a value tr[Ω*]/d_O = 0 proves that the process matrix W is causally separable, and the primal problem provides a decomposition of W in terms of causally ordered process matrices W^{A≺B} and W^{B≺A} (again, this is easier to see in the representation of the primal problem shown in (D6)).

Appendix F: Measuring causal nonseparability
A causal witness can be used not only to detect the causal nonseparability of a given process, but also to measure it. This is analogous to the situation with entanglement witnesses and entanglement measures [38]. First of all, we need to define what we mean by a measure of causal nonseparability. In analogy with the case of entanglement, we suggest that a proper measure of causal nonseparability N should satisfy the following properties:
Discrimination: N(W) ≥ 0 for every process matrix W, with N(W) = 0 if and only if W is causally separable.
Convexity: N(∑_i p_i W_i) ≤ ∑_i p_i N(W_i) for any process matrices W_i and any p_i ≥ 0, with ∑_i p_i = 1.
Monotony: N($(W)) ≤ N(W), where $(W) is any process obtainable from W by composing it with local CPTP maps.
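The Convexity property can be probed numerically for any candidate measure. The sketch below is a toy illustration and not the SDP of the main text: it assumes a measure of the form N(w) = max over a fixed, finite set of "witness" vectors s of −s·w, which mimics the structure of a quantity obtained by optimising −tr[SW] over witnesses. The witness vectors, the sample points, and the function names are all made up for the example.

```python
import random

# Hypothetical finite set of "witnesses"; the zero vector keeps the measure non-negative.
WITNESSES = [(0.0, 0.0, 0.0), (1.0, -2.0, 0.5), (-1.0, 0.5, 0.5), (0.3, 0.3, -1.0)]

def dot(s, w):
    return sum(a * b for a, b in zip(s, w))

def toy_measure(w):
    """Largest value of -s.w over the witness set (a maximum of linear functions)."""
    return max(-dot(s, w) for s in WITNESSES)

def convexity_gap(points, weights):
    """sum_i p_i N(w_i) - N(sum_i p_i w_i); non-negative for a convex measure."""
    n = len(points[0])
    mixed = [sum(p * w[k] for p, w in zip(weights, points)) for k in range(n)]
    average = sum(p * toy_measure(w) for p, w in zip(weights, points))
    return average - toy_measure(mixed)

rng = random.Random(0)
gaps = []
for _ in range(1000):
    points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
    raw = [rng.random() for _ in range(4)]
    weights = [r / sum(raw) for r in raw]   # probabilities p_i summing to one
    gaps.append(convexity_gap(points, weights))

# Up to floating-point rounding, no sampled mixture violates convexity.
print(min(gaps) >= -1e-9)
```

A maximum of linear functions is always convex, so the check passes for any choice of witness set; a random search of this kind can only falsify convexity, never prove it.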
Now we shall prove that both R g (W) and R r (W) as defined in equations (42) and (46) respect the properties of Discrimination and Convexity, whereas R g (W) respects Monotony but R r (W) does not.
Discrimination follows from the definition of the SDP problems (38)-(39) and (44)-(45). Note that since they satisfy the assumptions of the Duality Theorem (Theorem 8), there are algorithms that actually find the optimal solutions efficiently.
To demonstrate Convexity, let us denote by S_W the optimal witness for a given process matrix W; because of its optimality, one has, for any process matrices W_i and any p_i ≥ 0,
tr[S_{W_i} W_i] ≤ tr[S_{∑_j p_j W_j} W_i],
and therefore
−tr[S_{∑_j p_j W_j} ∑_i p_i W_i] ≤ −∑_i p_i tr[S_{W_i} W_i],
that is, N(∑_i p_i W_i) ≤ ∑_i p_i N(W_i).
Now we show that Monotony does hold for R_g(W). For that, first we need to define the map $(·) that composes a process W with local operations. More specifically, the map $(·) composes a process with the CPTP map M A 1 applied to Alice's input, the CPTP map M A 3 applied to Alice's output, the CPTP map M B 1 applied to Bob's input, and the CPTP map M B 3 applied to Bob's output. We can then define $(·) as the map such that for all processes W and all CP maps C A 2 and C B 2 we have that where is the Choi-Jamiołkowski operator of the composition of each party's operations. The processes W and $(W) are illustrated in Figure 5.
Figure 5. (a) The situation where the parties share a bipartite process W (in red) and apply the CPTP maps M X 1 and M X 3 (in blue) to their inputs and outputs can be equivalently described by (b) a single bipartite process $(W) (in red).
It follows from this definition that $(W) is a valid process. To see this, note that the validity of W implies that the probabilities are positive and normalised. By definition, these are equal to the probabilities and the arguments in Appendix B show that requiring the probabilities P(C A 2 , C B 2 ) to be positive and normalised is enough to imply the validity of the process $(W).
Furthermore, if W is causally separable so is $(W). This follows from the linearity of $(·) and from the fact that $(·) preserves the causal order when applied to a causally ordered process, which follows directly from the analogous property for quantum combs [14].
We want to show that for all $(·) (i.e., for all CPTP maps M A 1, M A 3, M B 1, and M B 3),
R_g($(W)) = −tr[S_{$(W)} $(W)] = −tr[$*(S_{$(W)}) W] ≤ −tr[S_W W] = R_g(W)
(where $* is the dual map of $), which follows from the optimality of S_W if $*(S_{$(W)}) is a valid causal witness that respects the normalisation condition for generalised robustness (as defined in SDP problem (38)). Therefore, we need to show it has the two following properties: tr[$*(S_{$(W)}) W^sep] ≥ 0 for every causally separable process matrix W^sep, and 1/d_O − $*(S_{$(W)}) ∈ W*. The first one follows from duality,
tr[$*(S_{$(W)}) W^sep] = tr[S_{$(W)} $(W^sep)], (F12)
and the fact that $(W^sep) is causally separable and S_{$(W)} is a causal witness. The second one is equivalent to
tr[Ω]/d_O − tr[$*(S_{$(W)}) Ω] ≥ 0
for every (not necessarily normalised) process matrix Ω. From duality and linearity this is equivalent to
tr[Ω]/d_O − tr[S_{$(W)} $(Ω)] ≥ 0,
and this follows from the fact that $(·) is trace-preserving and that 1/d_O − S_{$(W)} ∈ W* (which is the normalization condition from the SDP problem (38)).
An analogous proof fails for random robustness, as the dual map $*(·) can increase the trace of a witness, and therefore make it fail to satisfy the normalisation condition for SDP problem (44). To show that R_r(W) does not in fact satisfy Monotony, it is enough to find a process and local operations such that R_r($(W)) > R_r(W).
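The remark that a dual map can increase the trace of a witness can be seen in the smallest possible setting. The sketch below is a toy, single-qubit illustration (an assumption made for brevity, not the construction used in the text): the trace-preserving "reset" map sends ρ to tr(ρ)|0⟩⟨0|, its dual with respect to the trace inner product sends S to ⟨0|S|0⟩ times the identity, and applying the dual to S = |0⟩⟨0| doubles the trace. The function names are hypothetical.

```python
def trace(m):
    """Trace of a 2x2 matrix stored as nested lists."""
    return m[0][0] + m[1][1]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def reset(rho):
    """Trace-preserving map: discard the qubit and prepare |0><0|."""
    t = trace(rho)
    return [[t, 0], [0, 0]]

def reset_dual(s):
    """Dual map, fixed by the relation tr[reset_dual(S) rho] = tr[S reset(rho)]."""
    s00 = s[0][0]
    return [[s00, 0], [0, s00]]

rho = [[3, 1], [1, 2]]   # arbitrary (unnormalised) test operator
S = [[1, 0], [0, 0]]     # the projector |0><0|

# Duality relation between the map and its dual.
print(trace(matmul(reset_dual(S), rho)) == trace(matmul(S, reset(rho))))
# The map preserves the trace of rho ...
print(trace(reset(rho)) == trace(rho))
# ... but its dual changes the trace of S.
print(trace(S), trace(reset_dual(S)))
```

A normalisation condition that bounds the trace of a witness is therefore not automatically inherited by the image of that witness under the dual of a trace-preserving map.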
A concrete counterexample can be obtained by considering W_OCB and S_OCB from Section III D. Let be the process obtained from W_OCB by adding a maximally mixed qubit to Alice's input space. Then its random robustness is (up to numerical precision) where is its optimal random robustness witness. Now, we can obtain the process from W_1 simply by discarding the system in Alice's input space A′_I and replacing it with |0⟩⟨0|, which is clearly a local operation. Then its random robustness is (up to numerical precision) R_r($(W_1)) = −tr[S_{$(W_1)} $(W_1)] = 2(√2 − 1), (F19) where S_{$(W_1)} = S_{W_1}. Thus we have shown that