Quantum causal modelling

Causal modelling provides a powerful set of tools for identifying causal structure from observed correlations. It is well known that such techniques fail for quantum systems unless one introduces `spooky' hidden mechanisms. Whether a genuinely quantum framework for discovering causal structure can be formulated has remained an open question. Here we introduce such a framework: we define quantum analogues for many of the core features of classical causal modelling techniques, including the causal Markov condition and faithfulness. Built on the process matrix formalism, the framework naturally extends to generalised structures with indefinite causal order.


I. INTRODUCTION
A. Why cause

Causality often appears as an axiom in the formulation of physical theories, typically as the assumption that effects cannot precede their causes. However, what is often left unclear is which general principles should be used to define causal relations and relata.
Recent technical developments in causal modelling provide a consistent mathematical formalism that relates causal influence to the possibility of signalling [1, 2]. This approach explains why causation is not 'merely correlation' and has shed light on contentious issues across a variety of scientific fields [3–5].
Causal models rest on essentially two concepts: the existence of autonomous mechanisms, responsible for producing the observed correlations, and the possibility of interventions, which modify part of the mechanism according to some factor external to the model. It is the latter notion that differentiates causal relations from simple correlations. Together, mechanisms and interventions define what is known as the causal Markov condition, a constraint imposed on the structure of a causal model.
Causal models have proved tremendously useful for understanding causal relations across a broad range of disciplines. They provide a powerful mathematical toolbox for efficiently extracting causal information from observed data, Figs. 1, 2. They enable scientists to predict, manipulate and explain. The application of such models to physics represents a promising direction for a deeper and unified understanding of causality [6].
Unfortunately, one cannot straightforwardly apply these methods to quantum systems. Causal models presume the existence of objective properties that can be observed and manipulated locally. Such assumptions are incompatible with quantum mechanics, as most prominently demonstrated by Bell's theorem [7]. Of course, one can apply the classical methodology, but only at the expense of introducing hidden, fine-tuned [8] mechanisms, such as action at a distance [9] or retrocausality [10]. Although seemingly conservative, such approaches violate the very pillars of causal reasoning: they posit the existence of mechanisms that cannot be discovered and variables that cannot be controlled.
An alternative approach is to reformulate causal models from the ground up, in a way that makes direct use of the quantum formalism. Indeed, quantum theory can be seen as a theory of interventions [11] and, from the perspective of quantum information, causal relations are identified with signalling, in direct agreement with the intuitions underlying classical models. Additionally, quantum circuits [12] are often interpreted as being representative of causal structure. This suggests that quantum causal models should be rather natural within quantum theory. However, despite much work in this direction [13–22], a general framework for quantum causal models has not been found. As identified in Ref. [21], a key missing step is the formulation of a quantum version of the causal Markov condition. In classical models, this condition enables one to use observational and interventionist inference to deduce causal structure from data. Arguably, a formalism for quantum causal modelling would require a similar condition.

Figure: The rain fills the stream that moves the wheel, activating the mechanism that grinds the grains to produce flour.

B. This work
Here a complete framework for quantum causal modelling is presented. The framework rests on quantum definitions of mechanisms and spatio-temporally localised interventions. A quantum version of the causal Markov condition is defined, allowing for the possibility of causal discovery, namely, the identification of causal structure from empirical data. Classical causal models are recovered as a limiting case of the formalism. Furthermore, the formalism allows for natural quantum extensions of classical concepts, such as faithfulness and the distinction between direct and indirect causes.
In the present framework, causal relations are identified with the possibility of signalling, that is, with the possibility of modifying the statistics of an event through intervention on a different event. The role of interventions was not clearly identified in previous works, which rather attempted to generalise various aspects of 'causally neutral' Bayesian networks [13, 14, 16–21]. Without a definition of intervention, it is not clear in what sense such generalisations should be thought of as causal.
Although some notion of interventions is present in Ref. [15], in that formalism not all correlations have a causal explanation, as a new form of 'contemporaneous' correlation is postulated to account for entanglement. By contrast, in the causal models introduced here all correlations arise from direct, indirect, or common causes.
Classical causal models use probability distributions and it is natural to ask whether quantum causal models can be defined by simply replacing such distributions with density matrices. However, a classical joint probability distribution can describe common-cause as well as direct-cause relations, whilst a joint density matrix cannot describe direct-cause relations between quantum events [16].
Attempts have been made to generalise density matrices in order to solve this problem [16,23]. In such attempts each quantum event is associated with a single Hilbert space, and it has been shown that such approaches face serious problems [24]. In contrast, the present work identifies a quantum event with a local operation and thus is associated with a pair of Hilbert spaces, describing respectively the input and the output of the operation. This follows the quantum networks [25] and process matrix [26] formalisms. Ultimately, 'splitting' the Hilbert space of an event is what allows for a unified representation of arbitrary causal structures. This fact was also recognised in Ref. [22], which presented a special case of the general framework introduced here.
Other recent works have focussed on the correlations between classical variables produced in experiments involving quantum systems [18–20]. These works adopt a device-independent perspective, where the observed data contain no information about the functioning of, or the physical theory describing, the instruments used. Such frameworks cannot distinguish direct from indirect causes, or more generally formulate a quantum version of the causal Markov condition. Such limitations are overcome here by adopting a device-dependent quantum formalism, where the description of events contains all relevant information regarding the physical devices that produced them.
The work is organised as follows. In Sec. II the framework of classical causal models is briefly reviewed. The quantum formalism is introduced in Sec. III, where Markov quantum causal models are defined and it is shown that non-Markovian, causally ordered models can always be reduced to Markovian ones. In Sec. IV, faithful causal models are defined and the basic tools for quantum causal discovery are introduced. It is shown in Sec. V that classical causal models are recovered as a limiting case of quantum ones. Possible extensions of the formalism to indefinite and cyclic causal structures are briefly discussed in Sec. VI. An introduction to the framework presented here, considered from a broader philosophical perspective, can also be found in Ref. [27].

II. CLASSICAL CAUSAL MODELS
The brief review of classical causal models presented here is based on the monograph by Pearl [2].

A. Mechanisms
In classical physics, an event is identified with the value assumed by some property of a system at a given moment in time (denoted x, y, . . . ). Potential events are described by random variables (X, Y, . . . ), with values distributed according to some probability distribution Pr(X = x, Y = y, . . . ). A classical mechanism is defined by a function Y = f(X) relating random variables. The variables are depicted as circles and the mechanisms as connecting arrows. External noise is represented by an additional variable Λ with an associated probability distribution. Noise variables are typically not drawn explicitly, and the arrows are then interpreted as generic stochastic processes, defined by a conditional probability P(Y|X).
The short-hand P (x) ≡ Pr(X = x) is used when there is no risk of confusion.
The building blocks of a causal model are mechanisms that connect events to each other according to classical laws of dynamics, see Fig. 3. Deterministic laws imply a functional dependence between variables,

Y = f(X).

Unknown noise affecting the mechanism can be modelled as an additional variable Λ with an associated probability distribution P(λ). The functional relation is then Y = f(X, Λ), which defines a stochastic mechanism that transforms X to Y according to the conditional probability

P(y|x) = Σ_λ δ_{y, f(x,λ)} P(λ),

where δ_{ab} is the Kronecker delta.
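As a concrete illustration, the induced conditional probability P(y|x) = Σ_λ δ_{y,f(x,λ)} P(λ) can be computed directly. The following is a minimal numpy sketch in which the deterministic law f and the noise distribution are illustrative choices, not taken from the text.

```python
import numpy as np

# Illustrative stochastic mechanism Y = f(X, Lambda): f is XOR and the
# noise Lambda flips the bit with probability 0.1 (arbitrary choices).
p_noise = np.array([0.9, 0.1])      # P(Lambda = 0), P(Lambda = 1)
f = lambda x, lam: x ^ lam          # deterministic law Y = f(X, Lambda)

# Induced conditional probability P(y|x) = sum_lam delta_{y, f(x, lam)} P(lam)
P = np.zeros((2, 2))                # P[y, x]
for x in (0, 1):
    for lam in (0, 1):
        P[f(x, lam), x] += p_noise[lam]

print(P[:, 0])  # distribution of Y given X = 0: [0.9 0.1]
```

Each column of P is a normalised distribution over Y, as required of a stochastic mechanism.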

B. Interventions
What distinguishes a causal mechanism from a generic conditional probability is the possibility of changing it via intervention, Fig. 4. Causal relations are indeed defined by the possibility of intervening on a variable in the model and forcing it to assume some chosen value, thereby overriding the natural process that generates it. If X is a cause of Y , then setting X to a particular value will not change the conditional probability P (y|x). On the other hand, setting Y to a specific value requires breaking the process that generates Y , thus removing the correlations between X and Y .
More generally, an intervention on a variable Y can be any modification of the mechanism that produces Y, without necessarily breaking all pre-existing causal influences. This can be modelled by some control variable I_Y that parametrises the mechanism producing Y. Intervention variables can be treated as random variables themselves, thus causal mechanisms are defined as conditional probabilities of the form

P(y|x, I_Y).

If the only type of intervention considered is "set Y to value y", the intervention variable takes as possible values idle and do(y), for all possible values y of Y. The 'do' intervention is formalised as

P(y|x, I_Y = do(y')) = δ_{y y'},    P(y|x, I_Y = idle) = P*(y|x),

where P* represents the 'natural' probability, i.e. the one observed if no intervention is made, often simply written as P(y|x). It is of no particular significance whether the intervention is performed by a human agent, a pre-programmed device, or nature itself. Intervention variables simply represent parameters that have to be fixed independently of the variables included in the model, in order to extract predictions from it. The possibility of separating internal variables from external parameters is the fundamental assumption that allows for a causal interpretation of the model. Accordingly, a variable X is interpreted as a cause of another variable Y if correlations exist between I_X and Y.

Figure 4. A causal mechanism can be altered through an external intervention. An intervention can be represented as an additional random variable I_Y, which is commonly not drawn explicitly.
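The distinction between 'idle' and 'do' interventions can be made concrete in a few lines; here the natural mechanism P*(y|x) is an illustrative table, not taken from the text.

```python
import numpy as np

# P*(y|x): the natural mechanism (illustrative numbers), rows y, columns x
P_nat = np.array([[0.9, 0.2],
                  [0.1, 0.8]])

def P(y, x, intervention=("idle",)):
    """Causal mechanism P(y | x, I_Y) with I_Y either 'idle' or ('do', y')."""
    if intervention[0] == "idle":
        return P_nat[y, x]                    # natural probability P*(y|x)
    y_forced = intervention[1]
    return 1.0 if y == y_forced else 0.0      # P(y|x, do(y')) = delta_{y,y'}

# Under do(Y=1), Y no longer depends on X: the X-Y correlation is broken.
print(P(1, 0, ("do", 1)), P(1, 1, ("do", 1)))  # 1.0 1.0
```

Note how forcing Y makes its distribution independent of x, which is exactly what distinguishes a causal mechanism from a bare conditional probability.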

C. Causal models
A causal model captures the causal relations between a set of variables in terms of mechanisms and interventions, Fig. 5. The qualitative structure of cause-effect relations in a model defines a causal structure, which is represented as a directed acyclic graph.

A directed acyclic graph (DAG) is a directed graph that contains no directed cycles.
A functional causal model is defined by a set of functions that determine the observed variables given their direct causes and possibly some noise. The Markov assumption states that all noise variables are uncorrelated, leading to the following definition. A causal model is given by:

1. A directed acyclic graph with vertices X_1, . . . , X_n;

2. A list of conditional probabilities for each variable given its parents, P(x_j|pa_j), j = 1, . . . , n, where the set of parents PA_j of a vertex X_j in a DAG is defined as the set of vertices X_k with an edge from X_k to X_j.

The conditional probabilities P(x_j|pa_j) represent autonomous causal mechanisms that can be modified through interventions. As mentioned above, the intervention variables are often left implicit, i.e. it is understood that P(x_j|pa_j) ≡ Pr(X_j = x_j | PA_j = pa_j, I_j = idle).
A causal model generates a probability distribution for the observed random variables,

P(x_1, . . . , x_n) = Π_j P(x_j|pa_j).    (7)

Decomposition (7) is called the Markov condition and can be used to test the compatibility of given data with a causal structure, even in the absence of interventions. However, without additional assumptions, the Markov condition does not identify a unique DAG and the causal structure remains underdetermined. Explicitly, the causal information of a model is encoded in the conditional probability

P(x_1, . . . , x_n | i_1, . . . , i_n) = Π_j P(x_j|pa_j, i_j),    (8)

which reduces to condition (7) when I_j = idle for all j.

Figure 6. Latent variables. (a) A probability P(x1, x2, x3) that does not satisfy the Markov condition can be seen as an unknown mechanism that connects the variables. In this case, it is natural to look for an extended model (b), including latent variables (here X4) such that P(x1, x2, x3, x4) does satisfy the Markov condition for some causal structure.
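For a chain X1 → X2 → X3, the Markov factorisation and the conditional independences it implies can be checked numerically; the conditional probability tables below are illustrative choices.

```python
import numpy as np

# Markov factorisation P(x1, x2, x3) = P(x1) P(x2|x1) P(x3|x2) for the
# DAG X1 -> X2 -> X3 (binary variables, illustrative tables).
P1   = np.array([0.5, 0.5])                    # P(x1)
P2_1 = np.array([[0.9, 0.1], [0.1, 0.9]])      # P(x2|x1), rows x2, cols x1
P3_2 = np.array([[0.8, 0.3], [0.2, 0.7]])      # P(x3|x2), rows x3, cols x2

joint = np.einsum("a,ba,cb->abc", P1, P2_1, P3_2)   # joint[x1, x2, x3]

# The factorisation implies X3 is independent of X1 given X2:
cond = joint / joint.sum(axis=2, keepdims=True)      # P(x3 | x1, x2)
print(np.allclose(cond[0], cond[1]))  # True
```

This is the observational test mentioned in the text: a joint distribution that fails such independence checks cannot factorise over the candidate DAG.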
Eq. (8) can be called the causal Markov condition, although this name is often given to condition (7), leaving intervention variables and the causal interpretation implicit. Eq. (8) can also be used to define causal models in the language of influence diagrams [28]. If a set of variables Y_1, . . . , Y_r in a causal model is not observed, the marginal probability P(x_1, . . . , x_n) = Σ_{y_1,...,y_r} P(x_1, . . . , x_n, y_1, . . . , y_r) does not necessarily satisfy the Markov condition, and the unobserved variables are called latent, see Fig. 6. Given a probability that does not fulfil the Markov condition, it is always possible to extend the set of variables to include latent variables that restore the condition. Implicit in such a move is the assumption that latent variables can be accessed at least in principle and their causal role verified through interventions. In the context of quantum correlations, such an extension leads to hidden variable models. However, a causal interpretation of such models implies a deviation from quantum mechanics: intervening on hidden variables would allow signalling at a distance, in contradiction with quantum predictions. Models in which the hidden variables cannot be accessed, even in principle, are a logical possibility, but they do not have a causal interpretation in the interventionist sense that underpins classical causal modelling.

III. QUANTUM CAUSAL MODELS
The process matrix formalism reviewed below was introduced in Ref. [26] as an extension of causally ordered quantum networks [25, 29, 30]. Only finite-dimensional systems are considered here; a generalisation to infinite dimensions was studied in Ref. [31].

A. Quantum events
In contrast to classical systems, quantum systems do not possess objective properties that can be assigned values prior to and independently of measurement. Thus, the relata of causal relations are not classical random variables: a genuinely quantum notion of event is needed. A quantum event can be identified with all available information about a system localised in space and time. Typical quantum events are "the spin was found aligned with the z axis" and "the spin was rotated by π around the z axis". A combination of the type "the spin was found aligned with the z axis and then reprepared aligned with the x axis" can be considered as an event too. In general, a quantum event is associated with an operation, which can be deterministic (a fixed transformation) or non-deterministic (one associated with the outcome of a measurement).
Formally, such a quantum operation is represented by a completely positive (CP) map M : A_I → A_O, where the input and output spaces A_I and A_O are the spaces of linear operators over the input and output Hilbert spaces H_{A_I} and H_{A_O}, respectively (here identified with the corresponding matrix spaces). Complete positivity means that, for arbitrary dimensions of an ancillary system A', the map I_{A'} ⊗ M transforms positive operators into positive operators, where I_{A'} is the identity map on A'. Input and output spaces can have different dimensions d_{A_I}, d_{A_O}, as ancillas can be added or discarded.
Using the Choi-Jamiołkowski isomorphism [32, 33], a CP map M : A_I → A_O can be represented as a matrix,

M^{A_I A_O} := [Σ_{jk} |j⟩⟨k|^{A_I} ⊗ M(|j⟩⟨k|)^{A_O}]^T,    (9)

where {|j⟩} is an orthonormal basis of H_{A_I} and T denotes transposition. CP maps will be identified with their Choi-Jamiołkowski representation unless otherwise stated.

Figure 7. Local laboratory. A local laboratory represents a quantum system in a region of space-time with space-like boundaries. Past and future boundaries are identified with an input and an output state space, A_I and A_O, respectively. The local laboratory is identified with the product space A_I ⊗ A_O. A quantum event is a quantum operation that takes place in the local laboratory and is represented as a completely positive map from input to output space.
An example of a quantum event is a projective measurement that yields as an outcome some pure state |ψ⟩ and reprepares the system in the same state. As a CP map, this is represented as

[ψ]^{A_I} ⊗ [ψ*]^{A_O},

where the notation [ψ] := |ψ⟩⟨ψ| is used [34] and * denotes complex conjugation in the chosen basis. Another example is a transformation of the system occurring with unit probability, for example as defined by a unitary matrix U. Introducing the notation [35, 36]

|U⟩⟩^{XY} := Σ_j |j⟩^X ⊗ U|j⟩^Y,    [[U]] := |U⟩⟩⟨⟨U|,    (12)

the local event corresponding to the unitary transformation U is given by

[[U*]]^{A_I A_O}.    (13)

A more general deterministic transformation is a CP and trace-preserving (CPTP) map, represented as a matrix satisfying

tr_{A_O} M^{A_I A_O} = 1^{A_I}.

A CPTP map is also called a quantum channel. The space of potential events is identified with the set of CP maps between an input (A_I) and an output (A_O) space, which is isomorphic to (the cone of positive matrices in) the space A_I ⊗ A_O. Input and output spaces can be respectively identified with the past and the future space-like boundaries of the space-time region where the event takes place, Fig. 7, as in Oeckl's 'general boundary' formalism [37]. The space of potential events, which plays a role analogous to a classical random variable, is called a local laboratory.
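These representations are easy to verify numerically. The sketch below assumes the transposed Choi convention of Eq. (9) as reconstructed above, and checks, for U the Hadamard gate, that the event of a unitary channel is the projector [[U*]] and that the trace-preserving condition holds.

```python
import numpy as np

def choi(channel, d_in, d_out):
    """Representation (9): full transpose of sum_jk |j><k| (x) M(|j><k|)."""
    C = np.zeros((d_in * d_out, d_in * d_out), dtype=complex)
    for j in range(d_in):
        for k in range(d_in):
            E = np.zeros((d_in, d_in), dtype=complex)
            E[j, k] = 1.0
            C += np.kron(E, channel(E))
    return C.T

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)          # Hadamard unitary
M = choi(lambda s: H @ s @ H.conj().T, 2, 2)

# |U*>> = sum_j |j> (x) U*|j>, and [[U*]] = |U*>><<U*|
ketUstar = sum(np.kron(np.eye(2)[:, [j]], H.conj() @ np.eye(2)[:, [j]])
               for j in range(2))
print(np.allclose(M, ketUstar @ ketUstar.conj().T))   # True

# CPTP condition: tr_{A_O} M^{A_I A_O} = 1^{A_I}
tr_out = M.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)
print(np.allclose(tr_out, np.eye(2)))                 # True
```

The partial trace is taken by reshaping the 4x4 matrix into a rank-4 tensor and tracing over the output indices.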
Whilst the name implies some anthropocentric element, the term 'local laboratory' is taken to represent an observer-independent spatio-temporal region where quantum events can take place. In direct analogy to the classical case [38], one can ask whether it is possible to entirely remove the concept of the observer, though we shall not enter into such discussions here.

B. Mechanisms
As in the classical case, a causal model is built on the notion of physical mechanisms that are responsible for mediating causal influences between events. A quantum mechanism maps the output space of a local laboratory, say A_O, to the input space of another one, say B_I. The analogue of a deterministic mechanism is a unitary map U, which transforms states as ρ → UρU†. External noise can be described by an interaction with an environment which is then traced out, leading to the more general definition of a mechanism as a CPTP map.
In the present approach, CPTP maps representing events in local laboratories are distinguished from CPTP maps representing connecting mechanisms. This distinction is reflected in a different representation of the two: a CPTP map T corresponding to a connecting mechanism is described by the transpose of representation (9),

T^{A_O B_I} := Σ_{jk} |j⟩⟨k|^{A_O} ⊗ T(|j⟩⟨k|)^{B_I} = (M_T^{A_O B_I})^{T_{A_O} T_{B_I}},

where M_T denotes representation (9) of T and T_X denotes partial transposition on subsystem X.
As a simple example, a unitary transformation from the output space of A to the input space of B is represented by the projector [[U]]^{A_O B_I}, the transpose of (13). A mechanism that connects a space of dimension 1 to the input of a local laboratory is given by a density matrix ρ^{A_I} ≥ 0, tr ρ = 1, which corresponds to the situation where the state ρ is sent to the input space A_I.

C. Interventions
Recall that the possibility of intervention is required to give causal models empirical meaning. Quantum interventions can be formalised using the notion of an instrument [39], which represents the collection of all possible events that can be observed given a specific choice of probing the system. An intervention is thus defined as a choice of instrument. Given a local laboratory A_I ⊗ A_O, an instrument is formally defined as a set of CP maps J = {M_i^{A_I A_O}} that sum up to a CPTP map,

tr_{A_O} Σ_i M_i^{A_I A_O} = 1^{A_I}.

Footnote 5: In fact, the direct analogue of a deterministic mechanism is an isometry V, for which V†V = 1 but it is not necessarily true that VV† = 1. Unitaries represent reversible deterministic mechanisms.
(The trace-preserving condition for the sum guarantees that probabilities sum up to 1.) A typical example of an instrument is the measurement of the incoming system according to a positive operator valued measure (POVM) {E_i}, followed by the preparation of a state ρ. Using representation (9), this is given by

M_i^{A_I A_O} = E_i^{A_I} ⊗ ρ^{T A_O}.

By keeping the POVM fixed and choosing different states, one breaks the flow of information through the local laboratory and obtains the equivalent of classical 'do' interventions. As discussed in Sec. V below, classical 'idle' interventions correspond to quantum projective measurements in a fixed basis. These, however, are not interpreted as passive observations in the quantum formalism. Indeed, it is one of the essential features of quantum mechanics that measurements necessarily disturb the system.
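A minimal numerical check that a measure-and-reprepare instrument sums to a CPTP map; the POVM and the reprepared state below are illustrative, and the transpose on ρ follows the convention used here.

```python
import numpy as np

# Measure-and-reprepare instrument on a qubit: Z-basis POVM {|0><0|, |1><1|},
# reprepared state rho = |+><+| (illustrative). Each element: E_i (x) rho^T.
plus = np.array([[1.0], [1.0]]) / np.sqrt(2)
rho = plus @ plus.T
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
instrument = [np.kron(E, rho.T) for E in povm]

# The CP maps must sum to a CPTP map: tr_{A_O} sum_i M_i = 1^{A_I}
total = sum(instrument)
tr_out = total.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)
print(np.allclose(tr_out, np.eye(2)))  # True
```

Each element is manifestly positive semidefinite, and the completeness of the POVM together with tr ρ = 1 delivers the trace-preserving condition.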

D. Process matrices and causal relations
The general situation of interest is described by a set of local laboratories L_1, . . . , L_n, interpreted as representing n disjoint space-time regions, each bounded by two space-like surfaces. In an individual run of an experiment, instruments J_1^{L_1}, . . . , J_n^{L_n} are implemented in these local laboratories and the corresponding events recorded. The events are described by CP maps M_1^{L_1}, . . . , M_n^{L_n}. By assuming the local validity of quantum mechanics, with no further assumption about causal relations between local laboratories, it is possible to prove that the probability for such events to occur is given by the generalised Born rule [26]

P(M_1^{L_1}, . . . , M_n^{L_n}) = tr[(M_1^{L_1} ⊗ · · · ⊗ M_n^{L_n}) W^{L_1...L_n}],    (20)

where W^{L_1...L_n} ≥ 0 is called the process matrix, Fig. 8, and represents the information about the outside world available in the local laboratories (see footnote 6). In particular, the process matrix encodes all the information about the causal relations between laboratories. As in the classical case, causal relations are defined by signalling. By definition, a laboratory L_h can signal to a laboratory L_k if there exists a choice of instruments in the laboratories such that the probability of the events in L_k depends on the instrument chosen in L_h. Signalling between sets of laboratories is similarly defined. By definition, a local laboratory A represents a cause for a distinct laboratory B if A can signal to B.

Footnote 6: To be more precise, the conditional probability is defined as Eq. (20) when all the maps belong to the corresponding instruments, M_j ∈ J_j for j = 1, . . . , n, and it is 0 otherwise.
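The signalling criterion can be illustrated with a two-laboratory process in which A's output reaches B through an identity channel. The sketch assumes the conventions reconstructed above (measure-and-reprepare events E ⊗ ρ^T, identity wires [[1]]); all states and POVMs are illustrative choices.

```python
import numpy as np

# Process: a state rho_in enters A; A's output is sent to B by an identity
# channel; B's output is discarded: W = rho^{A_I} (x) [[1]]^{A_O B_I} (x) 1^{B_O}.
I2 = np.eye(2)
ket1 = I2.reshape(4, 1)                         # |1>> = sum_j |jj>
rho_in = np.diag([1.0, 0.0])
W = np.kron(np.kron(rho_in, ket1 @ ket1.T), I2)

def prob(Ea, prep_a, Eb, prep_b):
    """Generalised Born rule (20) with measure-and-reprepare events E (x) rho^T."""
    M = np.kron(np.kron(Ea, prep_a.T), np.kron(Eb, prep_b.T))
    return np.trace(M @ W).real

z0, z1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
# B's outcome statistics depend on the state A reprepares: A signals to B.
pB_given_a0 = sum(prob(E, z0, z0, z0) for E in (z0, z1))
pB_given_a1 = sum(prob(E, z1, z0, z0) for E in (z0, z1))
print(pB_given_a0, pB_given_a1)  # 1.0 0.0
```

Summing over A's outcomes but not its repreparation choice shows that B's marginal changes with A's intervention, while the reverse marginal at A is insensitive to B.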
In classical causal models there is an important distinction between direct and indirect causes. Intuitively, a direct cause always influences its effect, no matter what else is changed in the model. This can be formalised as follows: a laboratory A is a direct cause of a distinct laboratory B if A can signal to B for arbitrary choices of interventions in all the remaining laboratories; if A can signal to B only for some such choices, A is an indirect cause of B.

E. Examples
Before defining quantum causal models in full generality, it is useful to go through some explicit examples.

Single laboratory
Consider first the case of a single laboratory A, with only a non-trivial input space A_I. In this case, process matrices reduce to W^{A_I} ≡ ρ^{A_I} ≥ 0, CP maps reduce to POVM elements E^{A_I} ≥ 0, and the generalised Born rule (20) reduces to the ordinary Born rule, P(E^{A_I}) = tr(Eρ). This describes the situation where laboratory A receives a quantum state ρ from the outside environment and performs a measurement on it. The environment, responsible for preparing the state ρ, describes some part of the world that is not under experimental control. Of course, it is always possible to consider a different scenario in which the state preparation is controlled. This is formalised by introducing a second laboratory B, with only an output space B_O, in which the state is prepared. Now B has the possibility to choose from different instruments, i.e. prepare different states ρ_i. Recall that, as local events, such state preparations are represented as

ρ_i^{T B_O}.

The process matrix now describes how the two laboratories are connected: B is connected to A via the identity channel,

W^{B_O A_I} = [[1]]^{B_O A_I},

where notation (12) is used. The generalised Born rule (20) now reads

P(E^{A_I}, ρ_i^{T B_O}) = tr(E ρ_i).

This reduces to the previous single-laboratory case if B prepares ρ_i ≡ ρ. As the probability for any E ≠ 1 depends non-trivially on ρ_i, B represents a cause for A in this example.
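The reduction of the generalised Born rule to tr(Eρ_i) can be checked numerically for a qubit, with the conventions as reconstructed above; the state and POVM element are illustrative.

```python
import numpy as np

# B prepares rho_i (event rho_i^T on B_O); A measures POVM element E on A_I;
# the two are linked by the identity channel W = [[1]]^{B_O A_I}.
ket1 = np.eye(2).reshape(4, 1)                  # |1>> = sum_j |jj>
W = ket1 @ ket1.T

rho_i = np.array([[0.75, 0.25], [0.25, 0.25]])  # illustrative state
E = np.diag([1.0, 0.0])                         # POVM element |0><0|

P = np.trace(np.kron(rho_i.T, E) @ W).real
print(P, np.trace(E @ rho_i))  # both equal 0.75
```

The transpose on the prepared state is exactly compensated by the identity wire, recovering the ordinary Born rule.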

Common cause
The next example consists of two laboratories, A = A_I ⊗ A_O and B = B_I ⊗ B_O, with non-trivial input and output spaces. A process matrix of the form

W = ρ^{A_I B_I} ⊗ 1^{A_O B_O}

describes the situation where the two laboratories have no causal influence over each other (Fig. 9). The generalised Born rule reduces to

P(M_1^{A}, M_2^{B}) = tr[(E_1^{A_I} ⊗ E_2^{B_I}) ρ],    (26)

where E_1^{A_I} := tr_{A_O} M_1^{A_I A_O} and E_2^{B_I} := tr_{B_O} M_2^{B_I B_O}. Thus, the above process matrix describes a bipartite state shared by the two laboratories. As before, one can allow for control over the state preparation by introducing a third laboratory C with only a non-trivial output space C_O. As C can prepare bipartite states, its output space must decompose as a tensor product, C_O = C_O^1 ⊗ C_O^2, with the two factors connected to A_I and B_I respectively via identity channels. It is easy to see that, when the instrument preparing the state ρ is chosen in C, the generalised Born rule reduces to expression (26).
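The reduced rule for a common-cause process is easily evaluated. Below ρ is a Bell state, so the two laboratories see perfectly correlated Z-basis outcomes with uniform marginals (an illustrative choice; the reduction to the POVM elements follows the reconstruction above).

```python
import numpy as np

# Common cause: W = rho^{A_I B_I} (x) 1^{A_O B_O}; the statistics depend
# only on the POVM elements E = tr_{A_O} M.  Here rho is a Bell state.
bell = np.array([1.0, 0.0, 0.0, 1.0]).reshape(4, 1) / np.sqrt(2)
rho_AB = bell @ bell.T

def prob(E_A, E_B):                 # reduced rule tr[(E_A (x) E_B) rho]
    return np.trace(np.kron(E_A, E_B) @ rho_AB).real

z0, z1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
print(prob(z0, z0), prob(z0, z1))  # 0.5 0.0
```

No choice of operation in A's laboratory changes B's marginal here, consistent with the absence of causal influence between the two.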
As a particular case, the state ρ can be entangled, and A and B can perform measurements that violate Bell inequalities. The common cause for such correlations is simply associated with laboratory C, i.e. with the possibility of preparing different states.

Direct and indirect cause

Still in the scenario with two laboratories, suppose now that the output of A is connected to the input of B through a unitary channel U, so that A is a cause of B. The unitary evolution can equivalently be included in the model as an event in an additional laboratory C placed between A and B: the unitary evolution U is then represented by the single-element instrument [[U*]]^{C_I C_O} (see Eq. (13)). It is now possible to consider interventions in the additional laboratory C. For example, C can implement an instrument that breaks the flow of information from A to B, e.g. the maximally noisy channel, which reprepares the maximally mixed state 1^{C_O}/d_{C_O}. In this case, no choice of instrument at A can affect the probability for any POVM element in B. Thus, A is a direct cause of B given the process W^{AB}, but it is an indirect cause given the process W^{ABC}. As in the classical case, whether a causal relation is direct or indirect is relative to the set of variables included in the model.

Figure 10. Outgoing edges. To each edge is associated a source space. The output space of a local laboratory is the tensor product of the source spaces associated with all outgoing edges.

F. Markov quantum causal models
As for classical causal models, it is useful to depict causal relations graphically. Local laboratories are represented as nodes and the causal mechanisms connecting them as arrows.
Multiple arrows departing from a single node represent different physical systems. Thus the output space of a laboratory factorises as a tensor product, with a tensor factor associated with each outgoing arrow, see Fig. 10. Formally, a source space S_e is associated with the origin of each directed edge e, such that each output space factorises as A_O = ⊗_{e∈out_A} S_e, where out_A is the set of edges departing from A.
As with classical models, arrows should represent a generic possibility of causal influence (and not just the undisturbed transfer of a system, as in the representation of quantum circuits). Thus, a graph with arrows that connect A to C and B to C represents quantum systems exiting from A and B and, possibly after interaction, entering C. In other words, the graph represents a generic quantum channel T^{(A_O B_O) C_I} from the outputs of laboratories A and B to the input of laboratory C, Fig. 11. When there are no outgoing edges from a laboratory, the output space can be understood as connecting to the trivial space (of dimension one). As the only CPTP map from a space D_O to the trivial space is T^{D_O} = 1^{D_O}, a process matrix must be equal to the identity on the output space of all such laboratories. A quantum causal model can be defined as a set of unitaries that connect (a subsystem of) the output spaces of the parents of a laboratory, plus some noisy environment, to the input space of the laboratory (possibly after discarding part of the system).

Figure 11. Incoming edges. The parent space PS_C of a local laboratory C is the tensor product of the source spaces associated with all the incoming edges. (In the picture, PS_C = A_O ⊗ B_O.) The incoming edges represent a quantum channel from the parent space to the input space.
Recall for classical causal models, the causal Markov condition is guaranteed by the assumption that all environmental noise remains uncorrelated. Also known as the independent noise assumption, this feature of causal modelling is retained in the quantum case. Thus a quantum causal model is a list of quantum channels, the structure of which forms a DAG, Fig. 12.

Definition 4 (MQCM). Given a set of local laboratories L_1, . . . , L_n, a Markov quantum causal model is defined by:

1. A directed acyclic graph G with the laboratories L_1, . . . , L_n as vertices;

2. For each directed edge e, a source space S_e, such that the output space of each laboratory factorises as O_j = ⊗_{e∈out_j} S_e, where out_j is the set of edges departing from the vertex L_j;

3. W is a process matrix of the form

W = 1^{O_D} ⊗ (⊗_{j=1}^{n} T_j^{PS_j I_j}),    (29)

where O_D := ⊗_{k∈D} O_k is the output space of the laboratories with no outgoing edges, D := {k | out_k = ∅}; PS_j := ⊗_{e∈in_j} S_e is the parent space associated with laboratory L_j, with in_j the set of incoming edges to L_j; and each T_j^{PS_j I_j} ≥ 0 satisfies

tr_{I_j} T_j^{PS_j I_j} = 1^{PS_j}.    (31)

The matrices T_j in Eq. (31) define the quantum channels that connect local laboratories to each other and replace the conditional probabilities that define causal mechanisms in a classical model. A process matrix with the structure (29) is said to factorise over the DAG G. As this factorisation is a direct analogue of the causal Markov condition (8), it is natural to name (29) the quantum causal Markov condition.
It is useful to relate Markov causal models to Markovian time evolution, a concept familiar to most physicists [40]. Consider a DAG L_1 → L_2 → · · · → L_n, with d_{I_j} = d_{O_j} (the generalisation to different dimensions is straightforward). The quantum causal Markov condition (29) reduces to

W = ρ^{I_1} ⊗ T_2^{O_1 I_2} ⊗ · · · ⊗ T_n^{O_{n-1} I_n} ⊗ 1^{O_n}.

If all laboratories except L_k perform the identity channel, [[1]]^{L_j} for j ≠ k, and L_k performs a POVM measurement {E_i^{I_k}}, the generalised Born rule (20) reduces to

P(E_i) = tr[E_i (T_k ∘ T_{k-1} ∘ · · · ∘ T_2)(ρ_1)],    (33)

where ∘ denotes composition of maps and, with a slight abuse of notation, T_j here denotes the channel whose representation enters W. The composition

T_k ∘ T_{k-1} ∘ · · · ∘ T_2    (34)

defines a discrete-time Markovian evolution, while Eq. (33) shows that the k-th laboratory receives the result of such an evolution applied to the initial state ρ_1, provided that no intermediate operation is performed. Channels of the form (34) are also called divisible [41].
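The composition of connecting channels can be illustrated with depolarising qubit channels, whose composition is again a depolarising channel; the parameters below are illustrative choices.

```python
import numpy as np

def depolarize(p):
    """CPTP map: with probability p, replace the state by the maximally mixed 1/2."""
    return lambda s: (1 - p) * s + p * np.trace(s) * np.eye(2) / 2

rho1 = np.diag([1.0, 0.0])          # initial state entering the chain
T2, T3 = depolarize(0.2), depolarize(0.5)

# The state reaching the last laboratory is the composition T3 o T2 applied
# to rho1; divisibility: T3 o T2 is depolarizing with 1 - p = (1-0.2)(1-0.5).
rho_out = T3(T2(rho1))
print(np.allclose(rho_out, depolarize(1 - 0.8 * 0.5)(rho1)))  # True
```

This is the discrete-time Markovian evolution described in the text: each step acts only on the output of the previous one, with no memory carried by the environment.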

G. Latent laboratories and non-Markovian models
A quantum process can involve unobserved events. Such events may correspond to naturally occurring evolution in the environment surrounding the observed events, or to non-selective measurements, with unknown outcomes, performed by adversarial agents. A local laboratory in which the events are not observed will be called latent, in direct analogy to latent variables in classical models.
Given a process matrix W over the observed laboratories L_1, . . . , L_n and the latent laboratories L̄_1, . . . , L̄_m, the description of the observed laboratories reduces to

W^{L_1...L_n} = tr_{L̄}[(1^{L_1...L_n} ⊗ C_1^{L̄_1} ⊗ · · · ⊗ C_m^{L̄_m}) W],    (35)

for some CPTP maps C_1^{L̄_1}, . . . , C_m^{L̄_m}, where tr_{L̄} denotes the partial trace over all the laboratories in L̄ = {L̄_1, . . . , L̄_m}.
In Eq. (35), W^{L_1...L_n} is called the reduced process matrix [42, 43] and provides a full description of the physical situation for the observed laboratories, once the CPTP maps in the latent laboratories are fixed. The observed laboratories can now be linked by non-Markovian evolution, possibly with initial system-environment correlations: all the information regarding the initial state and evolution of the environment is encoded in the latent laboratories. A formalism closely related to the one described here has in fact been developed for the study of non-Markovian dynamics [44, 45].
A process matrix is called causally ordered if it is possible to define a relation of partial order among the laboratories such that signalling from a laboratory L_1 to another laboratory L_2 is possible only if L_1 precedes L_2 according to the assigned partial order. An important result from the theory of quantum networks [25,29] is that every causally ordered process matrix (formally equivalent to a quantum strategy or a quantum comb) can be realised by combining a sequence of quantum channels with memory. This implies that every causally ordered process matrix has a Markovian causal explanation. In other words, causally ordered non-Markovian models can always be reduced to Markovian ones through the introduction of latent laboratories.
An MQCM can be further extended by including connecting mechanisms as locally observed events. There is a general procedure to rewrite a connecting mechanism as a local event. Consider a local laboratory A = A_I ⊗ A_O, with parent space PS^A = ⊗_{e∈in_A} S_e, and a process matrix that factorises as W = T^{PS^A A_I} ⊗ W′, where T^{PS^A A_I} is the mechanism to be included as an event. The extended model then contains an additional laboratory in which the only available operation is the CPTP map T^{PS^A A_I} itself.

With sufficient resources, it is in principle possible to take control over all the quantum channels that define an MQCM. In this scenario, all nontrivial evolution takes place in local laboratories and can in principle be controlled, while the process matrix only describes connections between laboratories. In the resulting causal model, each edge e is associated with both a source space S_e and a target space R_e, such that S_e ≅ R_e, O_j = ⊗_{e∈out_j} S_e and I_j = ⊗_{e∈in_j} R_e for every j. The process matrix is then a product of identity channels along the edges,

W = ⊗_e [[1]]^{S_e R_e}.   (36)

This process represents the set of 'wires' obtained from a quantum circuit after removing all the gates. Thus it is possible to give a causal interpretation to quantum circuits and related structures [46][47][48] by allowing the possibility of replacing each gate with an arbitrary operation.

IV. CAUSAL DISCOVERY
The success of classical causal models is largely due to the existence of efficient tools for discovering causal structures. Apart from practical utility, causal discovery has foundational significance, as it gives empirical meaning to causal structures. This section shows that it is always possible to uniquely determine the causal structure of a Markov quantum causal model from experimental observations. The assumptions required are similar to those imposed on classical causal models. The question of whether this can be done efficiently will not be addressed here.

A. Process-matrix tomography
A process matrix can always be reconstructed using informationally complete instruments. This is directly analogous to informationally complete measurements [49][50][51]. For this purpose, note that a process matrix W^{L_1 ⋯ L_n} can be formally regarded as a density matrix with a different normalisation:

ρ^{L_1 ⋯ L_n} := W^{L_1 ⋯ L_n} / d_O,   (37)

where d_O = ∏_{j=1}^n d_{O_j} is the product of the dimensions of all output spaces. The matrix (37) is normalised as a quantum state, tr ρ = 1, because process matrices are normalised as tr W = d_O [26,42]. It is always possible to find an informationally complete POVM with product elements E^{L_1}_{x_1} ⊗ ⋯ ⊗ E^{L_n}_{x_n}, Σ_{x_j} E^{L_j}_{x_j} = 1^{L_j}, such that ρ is a function of the probability distribution

P(x_1, …, x_n) = tr[(E^{L_1}_{x_1} ⊗ ⋯ ⊗ E^{L_n}_{x_n}) ρ^{L_1 ⋯ L_n}].   (38)

This means that W is also a function of the same distribution. Define now M_{x_j} := E_{x_j} / d_{O_j} for j = 1, …, n. As the sets J_j := {M_{x_j}}_{x_j} are properly normalised instruments, the statistics (38) can be obtained by implementing such instruments on the original process matrix:

P(x_1, …, x_n) = tr[(M^{L_1}_{x_1} ⊗ ⋯ ⊗ M^{L_n}_{x_n}) W^{L_1 ⋯ L_n}].   (39)

Thus, it is possible to reconstruct W by measuring the instruments J_1, …, J_n. An example of this procedure is the causal tomography considered in Ref. [22]. Alternatively, one can use the fact that a process matrix defines a CPTP map from all outputs, ⊗_{j=1}^n O_j, to all inputs, ⊗_{j=1}^n I_j, and such a CPTP map can be reconstructed using conventional process tomography [52]. Informationally complete testers [25] also allow reconstructing a process matrix, although in general they require a priori knowledge of the causal order between the laboratories. In the context of causal discovery, such knowledge is not available and thus arbitrary testers cannot be used.
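The linear-inversion step behind informationally complete reconstruction can be sketched for the simplest case, a single-qubit density matrix, standing in for the ρ of Eq. (37). The tetrahedron POVM and the least-squares inversion below are illustrative choices, not prescribed by the text.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
PAULIS = [X, Y, Z]

# Tetrahedron directions define a symmetric informationally complete POVM:
# E_k = (1 + s_k . sigma) / 4, with sum_k E_k = identity.
dirs = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
povm = [(I2 + sum(s[i] * PAULIS[i] for i in range(3))) / 4 for s in dirs]

def probabilities(rho):
    """Born-rule statistics p_k = tr(E_k rho) of the IC POVM."""
    return np.array([np.trace(E @ rho).real for E in povm])

def reconstruct(p):
    """Linear inversion: solve p_k = 1/4 + (1/4) s_k . r for the Bloch vector r."""
    r, *_ = np.linalg.lstsq(dirs / 4, p - 1 / 4, rcond=None)
    return (I2 + sum(r[i] * PAULIS[i] for i in range(3))) / 2

rho = np.array([[0.7, 0.2 + 0.1j], [0.2 - 0.1j, 0.3]])  # arbitrary test state
rho_hat = reconstruct(probabilities(rho))
```

The same linear-inversion pattern extends to product POVMs on the full space of a process matrix, at the cost of exponentially many coefficients.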
Given a process with a Markovian causal explanation, it is in principle possible to gain control over all connecting mechanisms and, through tomography, reconstruct the process matrix of 'wires' (36). This matrix contains full information about the causal structure. However, this may well be unfeasible in practice; the methods discussed below can provide information about causal structure in the absence of full control.

B. Discovery using Hilbert-Schmidt basis
Consider a space of linear operators over a Hilbert space with finite dimension d_X, X ≡ L(H_X). A Hilbert-Schmidt (HS) basis is a set of self-adjoint matrices {σ^X_μ}_{μ=0}^{d_X^2 − 1} with σ_0 = 1 and tr(σ_μ σ_ν) = d_X δ_{μν}. This implies tr σ_j = 0 for j ≥ 1. 7 Local operations and process matrices can be expanded in HS bases. For example, a generic CP map for a local laboratory A = A_I ⊗ A_O decomposes as

M^{A_I A_O} = Σ_{μν} v_{μν} σ^{A_I}_μ ⊗ σ^{A_O}_ν.   (40)

Following Ref. [26], the following terminology applies.
7 The Pauli matrices, together with the identity, are an example for d = 2.
• The term proportional to the identity, v_00 1, is called of type 1.
• Terms equal to the identity on all subsystems except a subsystem X are called of type X. For example, the terms v_{0j} 1^{A_I} ⊗ σ^{A_O}_j, j ≥ 1, are of type A_O.
• Terms equal to the identity on all subsystems except X_1 ⊗ ⋯ ⊗ X_k are called of type X_1 ⋯ X_k. For example, the terms v_{jl} σ^{A_I}_j ⊗ σ^{A_O}_l, j, l ≥ 1, are of type A_I A_O.
• The sum of terms of different types X_1, …, X_r is called of type (X_1 + ⋯ + X_r). For example, a generic CP map (40) has terms of type (1 + A_I + A_O + A_I A_O).
Some simple algebra can be applied to the above notation, such as 1 · X = X and (1 + A_I)(1 + A_O) = 1 + A_I + A_O + A_I A_O. The HS decomposition can be used to characterise correlations and causal relations. For example, given a bipartite process matrix W^{AB}, terms of type A_O B_I allow signalling from A to B. Terms of type A_I B_I, on the other hand, are responsible for common-cause correlations when POVM elements E^{A_I} ⊗ F^{B_I} are measured. For a quantum channel T^{A_O B_I}, the trace-preserving condition, Eq. (17), implies that terms of type A_O vanish. Consequently, a process matrix that factorises over a DAG G, Eq. (29), has to satisfy a set of constraints. It is thus possible to test the compatibility of a process matrix with a DAG G by looking for the terms excluded by G: if an excluded term is found, the process matrix is not compatible with G. Note that, although the constraints are formulated with reference to an HS basis, they are in fact basis independent, as any local change of HS basis does not change the type of the terms contained in a process matrix. Such conditions can also be reformulated as basis-independent linear constraints, following the methods introduced in [42], but this technique will not be discussed here.
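The vanishing of type-A_O terms for a trace-preserving map can be checked numerically by expanding a Choi matrix in a product Pauli basis. In the sketch below, the amplitude-damping channel and the Choi-matrix convention are illustrative assumptions.

```python
import numpy as np
from itertools import product

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
HS = [I2, X, Y, Z]  # a Hilbert-Schmidt basis for d = 2 (sigma_0 = identity)

def hs_coefficients(M):
    """Expand a two-qubit matrix M^{A_O B_I} in the product HS basis."""
    d = 4  # d_{A_O} * d_{B_I}
    return {(m, n): np.trace(M @ np.kron(HS[m], HS[n])).real / d
            for m, n in product(range(4), repeat=2)}

def choi(kraus):
    """Choi matrix C^{A_O B_I} = sum_ij |i><j| ⊗ E(|i><j|) of a channel E."""
    C = np.zeros((4, 4), dtype=complex)
    for i in range(2):
        for j in range(2):
            Eij = np.zeros((2, 2), dtype=complex); Eij[i, j] = 1
            C += np.kron(Eij, sum(K @ Eij @ K.conj().T for K in kraus))
    return C

g = 0.3  # amplitude-damping parameter, an arbitrary illustrative choice
kraus = [np.diag([1, np.sqrt(1 - g)]), np.array([[0, np.sqrt(g)], [0, 0]])]
v = hs_coefficients(choi(kraus))

# Type-A_O coefficients: nontrivial on A_O, identity on B_I.
type_AO = [v[(m, 0)] for m in range(1, 4)]
```

Trace preservation means tr_{B_I} C = 1^{A_O}, so every coefficient v_{m0} with m ≥ 1 is proportional to tr σ_m = 0, exactly the constraint used for compatibility testing above.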

C. Faithfulness
A general process matrix might factorise over more than one DAG. Consider the example of Fig. 13(a),

W^{AB} = ρ^{A_I} 1^{A_O} ρ^{B_I} 1^{B_O},

with tr ρ^A = tr ρ^B = 1 (tensor product symbols understood). This can be factorised over the DAG with no edges, but also over the DAG A → B, using the 'discard and prepare' channel T^{A_O B_I} = 1^{A_O} ρ^{B_I}, and, symmetrically, over B → A. A DAG G thus only constrains which HS terms a process matrix factorising over it, Eq. (29), may contain. Such terms are said to be implied by G.
The terms implied by a DAG can be readily identified. To each laboratory L_j is associated a quantum channel connecting the parent space PS_j to the input space I_j. Such a channel is given by a matrix that satisfies the trace-preserving condition, Eq. (17), and therefore contains, besides the identity, only terms nontrivial on I_j.
In particular, every edge e ending on a laboratory L_j implies terms of type S_e I_j. Recall that, for an edge e connecting a laboratory L_k to a laboratory L_j, the source space S_e is a subsystem of the output space O_k. At the price of a slight abuse of terminology, we will refer to terms of type S_e I_j, with e ∈ out_k ∩ in_j, simply as terms of type O_k I_j.
We can thus say that

a term of type O_k I_j is implied if and only if L_k is a parent of L_j in the DAG.   (45)

An MQCM ⟨G, W⟩ is called faithful if W contains all the terms implied by G, i.e. if all implied terms appear with non-vanishing coefficients; W is then said to be faithful to G. A non-faithful causal model is said to be fine-tuned. A process matrix W is fine-tuned if it only factorises for fine-tuned models, while it is faithful if it is faithful to some DAG.
For faithful causal models, the relations between laboratories in a DAG directly correspond to causal relations. In Appendix A, the following theorem is proved.

Theorem 1. Given a faithful causal model ⟨G, W⟩ for a set of laboratories L_1, …, L_n, a laboratory L_k is a direct cause of a laboratory L_j if and only if L_k is a parent of L_j in G.
The 'only if' part of the theorem is also true for non-faithful models. This property can be phrased as: children are able to screen off the causal influence of their parents from the rest of the world. That is, there exist instruments for the child laboratories that break the causal connections from their common parents to any other laboratory.

D. Discovery of faithful causal structures
The task of causal discovery is to find a causal structure, i.e. a DAG G, which can explain a set of empirical data. Faithful quantum causal models have two important properties in this respect: i) the causal structure of a faithful MQCM can be discovered unambiguously; ii) almost all causal models are faithful, in an appropriate measure-theoretic sense. These points are formalised in the following theorems.

Theorem 2. If two MQCMs ⟨G, W⟩ and ⟨G′, W⟩ for the same process matrix W are both faithful, then G = G′.
Proof. As G and G′ have the same set of vertices, the only thing to prove is that they have the same set of edges. This is equivalent to proving that each laboratory has the same set of parents in G and G′. The latter is a direct consequence of the fact that W contains a term of type O_k I_j if and only if L_k is a parent of L_j, Eq. (45).
Thus, the HS terms contained in a faithful process matrix uniquely identify a causal structure. The HS terms can be fully determined if in each laboratory an informationally complete instrument is available. If only a subset of the HS terms is measured, the observed data can be compatible with more than one faithful causal model. This is analogous to the classical case: if arbitrary interventions are not available, it is not possible to fully characterise causal structure.
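The edge-detection step behind this argument can be sketched for two laboratories: an identity channel from A_O to B_I produces nonzero HS terms of type A_O B_I, while no term of type B_O A_I appears, so only the edge A → B is inferred. The specific state, channel, and tensor-factor ordering below are illustrative assumptions.

```python
import numpy as np
from itertools import product

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
HS = [I2, X, Y, Z]

def kron_all(ops):
    out = np.array([[1.0 + 0j]])
    for o in ops:
        out = np.kron(out, o)
    return out

def has_term(W, pattern):
    """True if W contains an HS term nontrivial exactly on the marked slots.

    pattern runs over (A_I, A_O, B_I, B_O); 1 = nontrivial, 0 = identity."""
    for idx in product(range(4), repeat=4):
        if tuple(int(m > 0) for m in idx) != pattern:
            continue
        coeff = np.trace(W @ kron_all([HS[m] for m in idx])) / 16
        if abs(coeff) > 1e-9:
            return True
    return False

# Process matrix for A -> B: a state on A_I, the identity channel from
# A_O to B_I, and the term 1^{B_O} for the childless output of B.
rho = np.array([[1, 0], [0, 0]], dtype=complex)          # rho^{A_I}
T = np.zeros((4, 4), dtype=complex)                      # Choi of identity
for i in range(2):
    for j in range(2):
        T[3 * i, 3 * j] = 1                              # sum_ij |ii><jj|
W = np.kron(np.kron(rho, T), I2)                         # A_I A_O B_I B_O

edge_A_to_B = has_term(W, (0, 1, 1, 0))   # type A_O B_I: signalling A -> B
edge_B_to_A = has_term(W, (1, 0, 0, 1))   # type B_O A_I: would mean B -> A
```

With informationally complete instruments, the same term-scanning strategy extends to any number of laboratories, one output-input pair at a time.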
The next theorem shows that, given a sufficiently regular prior knowledge on the causal structure of a model, there is vanishing probability that the model is fine-tuned (see Ref. [53] for the classical analogue).

Theorem 3. Given a set of laboratories L, a DAG G = ⟨L, E⟩ and a nonsingular measure on the space of process matrices that factorise over G, the set of fine-tuned process matrices has measure zero.
Proof. The set of process matrices that factorise over G can be parametrised by m real parameters, where m is the number of HS terms implied by G (excluding the term of type 1, which has a constant coefficient). A nonsingular measure on this set is a measure on R m that is non-singular with respect to the Lebesgue measure. The set of fine-tuned process matrices is defined by having exactly vanishing coefficients for some of the implied HS terms, which is a zero-measure set for every non-singular measure.
Note that there is only a finite number of DAGs with the same vertex set. Thus, a nonsingular measure over the space of causal models for a given set of laboratories decomposes as a finite sum of nonsingular measures for the individual DAGs. This implies the following corollary: given a set of laboratories and a nonsingular measure over the space of causal models for it, the set of fine-tuned causal models has measure zero.
The above results indicate that, given a set of possible causal explanations, a faithful one should be preferred because, unless additional information is available, a vanishing probability is assigned to fine-tuned models.
In the presence of latent laboratories, faithfulness might not be sufficient to single out a unique causal model. In this case, further assumptions would be needed to decide among a set of faithful causal explanations.

V. CLASSICAL LIMIT
It is important that classical causal models can be recovered from quantum ones in an appropriate limit. To ensure we are recovering classical models that can be considered causal, it is necessary to show that these models support identification via classical interventions. Recall that this is the hallmark of the classical causal modelling methodology. As will be shown in this section, to recover classical causal models it is sufficient that all available local operations can be described as classical interventions. The particular case of classical models conditioned on 'idle' interventions, or equivalently 'purely observational' Bayesian networks, is recovered by further restricting local operations to projective measurements in a fixed basis 8. It is crucial that all classical causal structures can be recovered in this way. This was not possible, for example, in the framework of Ref. [20], where only a subset of DAGs was recovered in the classical limit.
Quantum systems effectively behave as classical ones in the limit where it is only possible to access and manipulate states in a fixed basis that factorises over separated systems (called pointer basis). Whether this condition is enforced by decoherence [54], collapse models [55], 'fuzzy measurements' [56], or in other ways, will not be discussed here. Rather, it will be shown that, provided the transition to classicality takes place, Markov quantum causal models reduce to classical ones.
For a local laboratory L_j = I_j ⊗ O_j in an MQCM, with output space factorised according to the outgoing edges, O_j = ⊗_{e∈out_j} S_e, consider CP maps diagonal in the pointer basis,

M^{L_j}_{x_j, i_j} = Σ_{z_j, o_j} P(x_j, o_j | z_j, i_j) |z_j⟩⟨z_j|^{I_j} ⊗ |o_j⟩⟨o_j|^{O_j}.   (46)

In this interpretation, x_j are the observed variables and i_j are intervention variables. z_j and o_j, on the other hand, are latent variables, as they are not observed directly. In order to recover a causal model for the variables x_j, without such additional latent variables, it is necessary to further restrict the possible local operations. In particular, the variables o_j, responsible for transmitting causal influence out of the laboratory, should be influenced directly only by the observed variables x_j (and only indirectly by i_j and z_j):

P(x_j, o_j | z_j, i_j) = P(o_j | x_j) P(x_j | z_j, i_j).   (47)

Given the above conditions, the statistics generated by a Markov quantum causal model are equivalent to those generated by a classical causal model with the same causal structure. The proof of the following theorem is in Appendix B.

Theorem 5. The probability distribution P(x_1, …, x_n), obtained from CP maps of the form (46) with probabilities of the form (47), satisfies the causal Markov condition (8),

P(x_1, …, x_n) = ∏_j P(x_j | pa_j),   (48)

for a DAG G_c isomorphic to G. G_c has random variables X_1, …, X_n as vertices, where X_j takes as values the labels x_j of the local measurement outcomes.
It is also possible to show that each classical causal model can be reproduced as the classical limit of an MQCM with the same causal structure, i.e. with an isomorphic DAG. Replace each random variable X_j with a local laboratory L_j = I_j ⊗ O_j, where I_j has a basis element |x_j⟩ for each value of X_j and O_j is the tensor product of a copy of I_j for each outgoing edge. The process matrix is then the product of channels diagonal in the pointer basis,

W = ⊗_{j=1}^n T^{PS_j I_j},   T^{PS_j I_j} = Σ_{ps_j, z_j} P(z_j | ps_j) |ps_j⟩⟨ps_j|^{PS_j} ⊗ |z_j⟩⟨z_j|^{I_j},

with coefficients P(z_j | ps_j) = Pr(X_j = z_j | PA_j = ps_j, I_j = idle) given by the classical model. The 'observed' probability distribution (where no interventions are made, from the perspective of the classical model) is obtained when all instruments are restricted to projective measurements in the pointer basis, with the observed value copied to the output:

M^{L_j}_{x_j} = |x_j⟩⟨x_j|^{I_j} ⊗ |x_j ⋯ x_j⟩⟨x_j ⋯ x_j|^{O_j}.

Interventions in the classical model correspond to more general diagonal operations of the form (46). In particular, 'do' interventions are realised by ignoring the input and preparing the chosen value on the output space:

M^{L_j}_{do(x_j)} = 1^{I_j} ⊗ |x_j ⋯ x_j⟩⟨x_j ⋯ x_j|^{O_j}.

The split of classical nodes into input and output reproduces the single world intervention graphs (SWIGs) studied in the context of classical causal models [57].
The result above not only shows that classical causal models can be subsumed by quantum ones, but also provides a direct way to apply classical tools to quantum models. Indeed, classical models are recovered under restrictions on the performed operations. Such restrictions can arise because of uncontrolled factors, such as environmental decoherence, but it is also possible simply to choose to use only instruments compatible with a classical description. Thus, all properties of classical causal models directly extend to quantum ones, conditioned on the choice of classical instruments. For example, in classical causal models, if X_3 is the only common cause of X_1 and X_2, it follows from the causal Markov condition that X_1 and X_2 are conditionally independent given X_3: P(X_1, X_2 | X_3) = P(X_1 | X_3) P(X_2 | X_3), where it is understood that such a property holds conditioned on the 'idle' intervention. The same property holds for quantum causal models, conditioned on the choice of appropriate instruments: if L_3 is a quantum common cause of L_1 and L_2, then P(x_1, x_2 | x_3) = P(x_1 | x_3) P(x_2 | x_3) for all instruments J^{L_1}, J^{L_2}, J^{L_3} satisfying conditions (46), (47).
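The gap between conditioning and the 'do' interventions of the classical limit can be illustrated with a purely classical toy model, enumerated exactly. The conditional probability tables below are arbitrary illustrative choices.

```python
from itertools import product

# Toy classical model: Z -> X, Z -> Y and X -> Y (Z is a common cause).
P_Z = {0: 0.5, 1: 0.5}
P_X_given_Z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}   # P(X=x | Z=z)
P_Y_given_XZ = {(x, z): {0: 1 - p, 1: p}
                for (x, z), p in {(0, 0): 0.1, (0, 1): 0.5,
                                  (1, 0): 0.5, (1, 1): 0.9}.items()}

def joint(do_x=None):
    """Joint P(Z, X, Y), optionally under the intervention do(X = do_x)."""
    pr = {}
    for z, x, y in product((0, 1), repeat=3):
        # Intervening replaces the mechanism for X, cutting the Z -> X edge.
        px = (1.0 if x == do_x else 0.0) if do_x is not None \
             else P_X_given_Z[z][x]
        pr[(z, x, y)] = P_Z[z] * px * P_Y_given_XZ[(x, z)][y]
    return pr

def p_y1_given_x(pr, x):
    num = sum(p for (z, xx, y), p in pr.items() if xx == x and y == 1)
    den = sum(p for (z, xx, y), p in pr.items() if xx == x)
    return num / den

observe = p_y1_given_x(joint(), 1)          # P(Y=1 | X=1): biased by Z
intervene = p_y1_given_x(joint(do_x=1), 1)  # P(Y=1 | do(X=1))
```

Here conditioning gives 0.86 while intervening gives 0.70: the observational conditional is inflated by the common cause, which is exactly the distinction the intervention variables i_j encode.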

VI. BEYOND DEFINITE AND ACYCLIC CAUSAL STRUCTURES
The results of the previous section show that the causal structure of a Markov process, represented by a DAG, remains unaltered in the quantum-to-classical transition. Thus, the quantum causal models discussed above are 'quantum' only by virtue of the physics describing the systems that convey causal influence. The causal structure itself is no different from that of classical models.
The formalism, however, can be naturally extended to more general structures. Probabilistic mixtures of causal structures are a natural extension. Owing to the linearity of the generalised Born rule, Eq. (20), these are represented as

W = Σ_G p(G) W^{(G)},   (51)

where the sum is over DAGs G, p(G) is a probability distribution, and W^{(G)} is a process matrix that factorises over G.
A process of the form (51) is still compatible with a definite, albeit unknown, causal structure, which can be understood as an objective property existing independently of the operations performed. The natural next step is to look for causal structures that might be, in some sense, indefinite. This idea is motivated by the possibility that space-time itself has quantum properties in a theory of quantum gravity [58,59].
An example is the 'quantisation' of the mixture (51). Given a set of laboratories L = {L_j = I_j ⊗ O_j}_{j=1}^n, introduce two additional laboratories, C ≡ C_O (with trivial input) and D ≡ D_I (with trivial output). Let W^{(G)} = |w^{(G)}⟩⟨w^{(G)}| be a rank-one projector which factorises over the DAG G (this is the case if all mechanisms are unitary matrices or pure-state preparations). A quantum-controlled causal structure is then given by the process matrix

W = Σ_{G,G′} |G⟩⟨G′|^{C_O} ⊗ |w^{(G)}⟩⟨w^{(G′)}| ⊗ |G⟩⟨G′|^{D_I},   (52)

where the basis vectors |G⟩ range over the set of controlled DAGs. If C prepares a state |G⟩^{C_O}, the laboratories in L find themselves in the causal relations dictated by G (and the control system is transferred undisturbed to D). However, if C prepares an arbitrary superposition Σ_G ψ_G |G⟩^{C_O}, and D also measures in a superposition basis, one can expect causal relations that are incompatible with any DAG 9. Indeed, an example of a quantum-controlled causal structure is the quantum switch [60,61], where the above sum ranges over DAGs of the type L_{σ(1)} → L_{σ(2)} → ⋯ → L_{σ(n)}, with σ a permutation of n elements. This type of resource can be shown to be incompatible with any definite causal order [42,43], and thus in particular with any DAG. Additionally, the quantum switch can provide an advantage for several tasks [62][63][64] and an experimental proof-of-principle was recently demonstrated [65], showing the potential practical relevance of quantum causal structures.
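The interference between the two causal orders in the quantum switch can be sketched directly on state vectors, representing the switch on the enlarged Hilbert space. The choice of anticommuting unitaries X and Z, and of the control in |+⟩, are illustrative assumptions.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)

def switch_state(U, V, psi):
    """Quantum switch of unitaries U, V on |psi>, with control in |+>."""
    branch0 = np.kron(np.array([1, 0]), U @ V @ psi)   # control |0>: V then U
    branch1 = np.kron(np.array([0, 1]), V @ U @ psi)   # control |1>: U then V
    return (branch0 + branch1) / np.sqrt(2)

psi = np.array([1, 0], dtype=complex)
out = switch_state(X, Z, psi)

# Probability of finding the control in |+>. The |+> outcome has amplitude
# proportional to (UV + VU)|psi>, which vanishes when U and V anticommute:
# the two causal orders interfere destructively.
plus = np.kron(np.array([1, 1]) / np.sqrt(2), np.eye(2))
p_plus = np.linalg.norm(plus @ out) ** 2
```

Measuring the control in the superposition basis thus certifies interference between the L_1 → L_2 and L_2 → L_1 amplitudes: for anticommuting gates the control is found in |−⟩ with certainty.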
The process (52) can be interpreted as the superposition of different amplitudes, each corresponding to a directed, acyclic causal structure. It is an interesting question whether more general causal models can also be understood in a similar way. An indication that this might not be the case is the fact that processes of the form (52) cannot violate causal inequalities [42,43] whereas more general processes allowed by the formalism can [26,66]. Intriguingly, this is also true for classical systems: there exist causal models, locally compatible with classical physics, that are incompatible with any causal order or mixture of causal orders [67,68].
The possibility of closed time-like curves (CTCs) in general relativity [69] motivates considering models that contain directed cycles. The process matrix formalism provides a natural framework for studying CTCs as cyclic causal structures, as the possibility of interventions, and thus the causal interpretation of the model, does not rely on the acyclicity property. It is not clear, however, whether indefinite and cyclic causal structures are in fact separate concepts. Furthermore, it is unclear whether and how core notions such as the causal Markov condition and faithfulness should be generalised beyond definite, acyclic causal structures.

VII. CONCLUSIONS
This work has foundational implications insofar as it shows that quantum mechanics has a causal interpretation in a similar manner to classical mechanics. Cause-effect relations are identified with correlations between controlled and observed events, and a causal structure is a set of transformations that define a DAG.
The findings presented here do not reproduce the wealth of results from the literature on classical causal models. However, the surprising similarity between the frameworks suggests that further techniques from classical causal modelling may be applicable to the quantum case. 'Quantising' causal models ought to be a rich and promising field of research.
Causal discovery plays a prominent role in classical machine learning and ought to play a similarly pivotal role in the emerging field of quantum machine learning [70]. The mathematical formalism for quantum causal discovery introduced here can provide a foundation for further development in this direction.
Appendix A: Proof of theorem 1
If L_k is a parent of L_h, a faithful process matrix W also contains a term of type O_k I_h. As such a term can be used for signalling, L_k has causal influence over L_h for any instruments in the remaining laboratories; thus L_k is a direct cause of L_h. Conversely, assume that L_k is not a parent of L_h. Then every HS term nontrivial on O_k must also be nontrivial on an input I_j such that L_k is a parent of L_j (and thus j ≠ h). Let σ^{O_k I_j X} be the corresponding element of the HS basis (where X is any additional subsystem on which σ is nontrivial). For the CPTP map C^{L_j} = 1^{L_j}/d_{O_j} (the maximally noisy channel), and recalling that HS basis elements have vanishing partial trace over any subsystem on which they are nontrivial, this term is removed by the partial trace in Eq. (35). This means that the reduced process matrix obtained when L_j performs the maximally noisy channel does not contain the HS term O_k I_j X. If the maximally noisy channel is performed in every laboratory that has L_k as a parent, the resulting reduced process does not contain any term of the form O_k X, which means that no signalling from L_k is possible in the reduced model. Thus, L_k is not a direct cause of any laboratory of which it is not a parent.

Appendix B: Proof of theorem 5
The probabilities (48) are equivalently generated by a process matrix diagonal in the pointer basis. Using the quantum causal Markov condition (29), this matrix can be written as a product of CPTP maps diagonal in the pointer basis, where pa_j := {x_k | (L_k, L_j) ∈ E} is the set of outcomes associated with the parents of node j with respect to the original graph G.