Causal Inference using Graphical Models with the R Package pcalg

The pcalg package for R (R Development Core Team (2010))is a tool for estimating intervention eﬀects when the true underlying causal structure is unknown. To this end, pcalg contains functions for estimating the causal structure using graphical models (functions skeleton , pc and fci ) and functions for estimating intervention eﬀects given an estimate of the causal structure (functions ida and idaFast ). In this document, we will give a brief overview of the methodological background of estimating intervention eﬀects. Moreover, we will demonstrate how the package pcalg can be used for estimating intervention eﬀects and graphical models using examples.


Introduction
Understanding cause-effect relationships between variables is of primary interest in many fields of science.Usually, experimental intervention is used to find these relationships.In many settings, however, experiments are infeasible because of time, cost or ethical constraints.
Recently, we proposed and mathematically justified a statistical method (IDA -intervention calculus when a directed acyclic graph is absent) to obtain bounds on total causal effects based on observational data under some assumptions (see Maathuis, Kalisch, and Bühlmann (2009)).Furthermore, we presented an experimental validation of our method on a large-scale biological system (see Maathuis, Colombo, Kalisch, and Bühlmann (2010)).
For further validation and broader use of this method, well documented and easy to use software is indispensable.Therefore, we wrote the R package pcalg, which incorporates the above mentioned method (IDA).
The objective of this paper is to introduce the R package pcalg and explain the range of functions.
To get started, we show how two of the main functions can be used in a typical application.Suppose we have a system described by some variables and many observations of this system.Furthermore, assume that it seems plausible that there are no hidden variables and no feedback loops in the underlying causal system.The causal structure of such a system can be conveniently represented by a directed acyclic graph (DAG), where each node represents a variable and each arrow represents a direct cause.We are interested in the change of variable Y if we changed the variable X by intervention, i.e., we seek the causal effect of X on Y.To fix ideas, we have simulated an example data set with p = 8 continuous variables with Gaussian noise and n = 5000 observations, which we will now analyse.First, we load the package pcalg and the data set.
R> library("pcalg") R> data("gmG") In the next step, we use the function pc to produce an estimate of the underlying causal structure.Since this function is based on conditional independence tests, we need to define two things.First, we need a function that can compute conditional independence tests in a way that is suitable for the data at hand.For standard data types (Gaussian, discrete and binary) we provide predefined functions.See the example section in the help file of pc for more details.Secondly, we need a summary of the data (sufficient statistic) on which the conditional independence function can work.Each conditional independence test can be performed at a certain significance level alpha.This can be treated as a tuning parameter.In the following code, we use the predefined function gaussCItest() as conditional independence test and create the corresponding sufficient statistic, consisting of the correlation matrix of the data and the sample size.Then we use the function pc() to estimate the causal structure and plot the result.
R> suffStat <-list(C = cor(gmG$x), n = nrow(gmG$x)) R> pc.fit <-pc(suffStat, indepTest = gaussCItest, + p = ncol(gmG$x), alpha = 0.01) R> stopifnot(require(Rgraphviz))# needed for all our graph plots R> par(mfrow = c(1,2)) R> plot(gmG$g, main = "") R> plot(pc.fit,main = "") As can be seen in Fig. 1, there are directed and bidirected edges in the estimated causal structure.The directed edges show the presence and direction of direct causal effects.The direction of the bidirected edges, however, could not be decided by our method.Thus, they represent some uncertainty in the resulting model.A fundamental property of estimating DAGs is that some uncertainty of this kind sometimes remains, even if an infinite amount of data is available.
Based on the inferred causal structure, we can estimate the causal effect of an intervention.Denote the variable corresponding to node i in the graph by V i .For example, suppose, we would increase variable V 1 by external intervention by one unit.The recorded increase in variable V 6 is the (total) causal effect of V 1 on V 6 .More precisely, the causal effect If the causal relationships are linear, the causal effect is independent of x.
Since the causal structure was not identified uniquely in our example, we cannot expect to get a unique number for the causal effect.Instead, we will get a set of possible causal effects.This set can be computed by using the function ida().To provide full quantitative information, we need to pass the covariance matrix in addition to the estimated causal structure.
R> ida (1, 6, cov(gmG$x), pc.fit@graph) [1] 0.75364 0.54878 Since we simulated the data, we know that the true value of the causal effect is 0.71.Thus, one of the two estimates is indeed close to the true value.Since both values are larger than zero, we can conclude that variable V 1 has a positive causal effect on variable V 6 .Thus, we can always estimate a lower bound to the absolute value of the causal effect.(Note that at this point we have no p-value to control the sampling error).
If we would like to know the effect of a unit increase in variable V 1 on variables V 4 , V 5 and V 6 , we could simply call ida three times.However, a faster way is to call the function idaFast(), which was tailored for such situations.
R> idaFast (1, c(4,5,6), cov(gmG$x), pc.fit@graph) [,1] [,2] 4 0.01027 0.012014 5 0.23875 0.017935 6 0.75364 0.548776 Each row in the output shows the estimated set of possible causal effects on the target variable indicated by the row names.The true values for the causal effects are 0, 0.2, 0.71 for variables V 4 , V 5 and V 6 , respectively.The first row, corresponding to variable V 4 , quite accurately indicates a causal effect that is very close to zero or no effect at all.The second row of the output, corresponding to variable V 5 , is rather uninformative: Although one entry comes close to the true value, the other estimate is close to zero.Thus, we cannot be sure if there is a causal effect at all.The third row, corresponding to V 6 was already discussed above.

Methodological background
Our proposed method consists of two major steps.In the first step, the causal structure is estimated.This is done by estimating a graphical model.A graphical model is a map of the dependence structure of the data and can thus be an interesting object by itself.In the second step, we use the estimated causal structure and do-calculus (see Pearl (2000)) to calculate bounds on causal effects.

Estimating graphical models
Graphical models can be thought of as maps of dependence structures of a given probability distribution or a sample thereof (see for example Lauritzen (1996)).In order to illustrate the analogy, let us consider a road map.In order to be able to use a road map, one needs two given factors.First, one needs the physical map with symbols such as dots and lines.Second, one needs a rule for interpreting the symbols.For instance, a railroad map and a map for electric circuits might look very much alike, but their interpretation differs a lot.In the same sense, a graphical model is a map.First, a graphical model consists of a graph with dots, lines and potentially edge marks like arrowheads or circles.Second, a graphical model always comes with a rule for interpreting this graph.In general, nodes in the graph represent (random) variables and edges represent some kind of dependence.

Without hidden and selection variables
An example of a graphical model is the DAG model.The physical map here is a graph consisting of nodes and arrows (only one arrowhead per line) connecting the nodes.As a further restriction, the arrows must be directed in a way, so that it is not possible to trace a circle when following the arrowheads.The interpretation rule for this graph is called dseparation.This rule is a bit intricate and we refer the reader to Lauritzen (1996) for more details.The interpretation rule for this graph can be used in the following way when given a DAG model: If two nodes X and Y are d-separated by a set of nodes S, then the corresponding random variables X and Y are conditionally independent given the set of random variables S. For the following, we will only deal with distributions where the following holds: For each distribution, it is possible to find a DAG, whose list of d-separation relations perfectly matches the list of conditional independencies of the distribution.Such distributions are called faithful.It has been shown that the set of distributions that are faithful is the overwhelming majority (Meek (1995)), so that the assumption does not seem to be very strict in practice.
Since the DAG model encodes conditional independences, it seems plausible that information on the latter helps to infer aspects of the former.This intuition is made precise in the PC algorithm (see Spirtes, Glymour, and Scheines (2000); PC stands for the initials of its inventors Peter Spirtes and Clark Glymour) which was proven to reconstruct the structure of the underlying DAG model given a conditional independence oracle up to some ambiguity (equivalence class) to be discussed below.In practice, the conditional independence oracle is replaced by a statistical test for conditional independence.For situations without hidden variables and under some further conditions it has been shown that the PC algorithm using statistical tests instead of an independence oracle is computationally feasible and consistent even for very high-dimensional, sparse DAGs (see Kalisch and Bühlmann (2007)).
It is possible that several DAGs encode the same list of conditional independencies.One can show that such DAGs must share certain properties.To be more precise, we have to define a v-structure as the subgraph i → j ← k on the nodes i, j and k where i and k are not connected.Furthermore, let the skeleton of a DAG be the graph that is obtained by removing all arrowheads from the DAG.It was shown that two DAGs encode the same conditional independence statements if and only if the corresponding DAGs have the same skeleton and the same v-structures (see Verma and Pearl (1991)).Such DAGs are called Markov-equivalent.In this way, the space of DAGs can be partitioned into equivalence classes, where all members of an equivalence class encode the same conditional independence information.Conversely, if given a conditional independence oracle, one can only determine a DAG up to its equivalence class.Therefore, the PC algorithm can not determine the DAG uniquely, but only the corresponding equivalence class of the DAG.Each DAG in this equivalence class would be equally valid to describe the conditional independence information.The equivalence class can be visualized by a graph that has the same skeleton as every DAG in the equivalence class and directed edges only where all DAGs in the equivalence class have the same directed edge.
Arrows that point into one direction for some DAGs in the equivalence class and in the other direction for other DAGs in the equivalence class are visualized by bidirected edges (sometimes, undirected edges are used instead).This graph is called completed partially directed acyclic graph (CPDAG).

Algorithm 1 Outline of the PC-algorithm
Input: Vertex set V, conditional independence information, significance level α Output: Estimated CPDAG Ĝ, separation sets Ŝ Edge types: →, − (P1) Form the complete undirected graph on the vertex set V (P2) Test conditional independence given subsets of adjacency sets at a given significance level α and delete edges if conditional independent (P3) Orient v-structures (P4) Orient remaining edges.
We will now describe the PC-algorithm, which is shown in Algorithm 1, in more detail.The PC-algorithm starts with a complete undirected graph, G 0 , as stated in (P1) of Algorithm 1.In stage (P2), a series of conditional independence tests is done and edges are deleted in the following way.First, all pairs of nodes are tested for marginal independence.If two nodes i and j are judged to be marginally independent at level α, the edge between them is deleted and the empty set is saved as separation sets Ŝ[i, j] and Ŝ [j, i].After all pairs have been tested for marginal independence and some edges might have been removed, a graph results which we denote by G 1 .In the second step, all pairs of nodes (i, j) still adjacent in G 1 are tested for conditional independence given any single node in adj(G 1 , i) \ {j} or adj(G 1 , j) \ {i} (adj(G, i) denotes the set of nodes in graph G that are directly connected to node i by some edge) .If there is any node k that makes i, j conditionally independent, the edge between i and j is removed and node k is saved as separation sets (sepset) Ŝ[i, j] and Ŝ [j, i].If all adjacent pairs have been tested given one neighboring node, a new graph results which we denote by G 2 .The algorithm continues in this way by increasing the size of the conditioning set step by step.The algorithm stops if all neighborhoods in the current graph are smaller than the size of the conditioning set.The result is the skeleton in which every edge is still undirected.Within (P3), each triple of vertices (i, k, j) such that the pairs (i, k) and (j, k) are each adjacent in the skeleton but (i, j) are not, is oriented based on the information saved in the conditioning sets Ŝ[i, j] and Ŝ Finally, in (P4) it may be possible to direct some of the remaining edges, since one can deduce that one of the two possible directions of the edge is invalid because it introduces a new v-structure or a directed cycle.Such edges are found by repeatedly applying rules described in Spirtes et al. (2000), p.85.The resulting output is the equivalence class (CPDAG) that describes the conditional independence information in the data, in which every edge is either undirected (or bidirected, which has the same meaning in the context of the CPDAG) or directed.

With hidden or selection variables
When discovering causal relations from nonexperimental data, two difficulties arise.One is the problem of hidden (or latent) variables: Factors influencing two or more measured variables may not themselves be measured.The other is the problem of selection bias: Values of the variables or features under study may themselves influence whether a unit is included in the data sample.
In the case of hidden or selection variables, one could still visualize the underlying causal structure with a DAG that includes all observed, hidden and selection variables.However, when inferring the DAG from observational data, we do not know all hidden and selection variables.Therefore, it is practically unfeasible to find the true underlying DAG.
A more modest goal could be to find a structure that represents all conditional independence relationships among the observed variables given the selection variables (but not the latent and selection variables) of the underlying causal structure.It turns out that this is possible.However, the resulting object will in general not be a DAG for the following reason.Suppose, we have a DAG including observational, latent and selection variables and we would like to visualize the conditional independencies among the observed variables only.We could marginalize out all latent variables and condition on all selection variables.It turns out that the resulting list of conditional independencies can in general not be represented by a DAG, since DAGs are not closed under marginalization or conditioning and are therefore not suited for representing the conditional independence structure on the observed variables only (see Richardson and Spirtes (2002)).
A class of graphical independence models that is closed under marginalization and conditioning and that contains all DAG models is the class of ancestral graphs.A detailed discussion of this class of graphs can be found in Richardson and Spirtes (2002).In this text, we only give a brief introduction to ancestral graphs.Ancestral graphs have nodes, which represent random variables and edges which represent some kind of dependence.The edges can be either directed, undirected or bidirected (note that in the context of ancestral graphs, undirected and bidirected edges do not mean the same).There are two rules that restrict the direction of edges: 1: If i and j are joined by an edge with an arrowhead at i, then there is no directed path from i to j.
2: There are no arrowheads present at a vertex which is an endpoint of an undirected edge.
Maximal ancestral graphs (MAG), which we will use from now on, also obey a third rule: 3: Every missing edge corresponds to a conditional independence.
The conditional independence statements of MAGs can be read off using the concept of mseparation, which is almost identical to the concept of d-separation in DAGs.Furthermore, part of the causal information in the underlying DAG is represented in the MAG.If in the MAG there is an edge between node i and node j with an arrowhead at node i, then there is no directed path from node i to node j nor to any of the selection variable in the underlying DAG.If, on the other hand, there is a tail at node i, then there is a directed path from node i to node j or to one of the selection variables in the underlying DAG.
Recall that finding a unique DAG from an independence oracle is in general impossible.Therefore, one only reports on the equivalence class of DAGs in which the true DAG must lie.The equivalence class is visualized using a CPDAG.The same is true for MAGs: Finding a unique MAG from an independence oracle is in general impossible.One only reports on the equivalence class in which the true MAG lies.
An equivalence class of a MAG can be uniquely represented by a Partial ancestral graph (PAG) (see Zhang (2008)).A PAG contains the following types of edges: o-o, o-, o->, ->, <->, -.Roughly, the bidirected edges come from hidden variables, and the undirected edges come from selection variables.The edges have the following interpretation: (i) There is an edge between x and y if and only if x and y are conditionally dependent given S for all sets S consisting of all selection variables and a subset of the observed variables; (ii) a tail on an edge means that this tail is present in all MAGs in the equivalence class; (iii) an arrowhead on an edge means that this arrowhead is present in all MAGs in the equivalence class; (iv) a o-edgemark means that there is a at least one MAG in the equivalence class where the edgemark is a tail, and at least one where the edgemark is an arrowhead.
An algorithm for finding the PAG given an independence oracle is the FCI algorithm (see Spirtes et al. (2000)).This algorithm was slightly extended and proven to be sound and complete in Zhang (2008).FCI is very similar to PC but makes additional conditional independence tests and uses more orientation rules (see section 3.3 for more details).We refer the reader to Zhang (2008) for a detailed discussion of the FCI algorithm.

Estimating bounds on causal effects
One way of quantifying the causal effect of variable X on Y is to measure the state of Y if X is forced to take value X = x and compare this to the value of Y if X is forced to take the value X = x+1 or X = x+δ.If X and Y are random variables, forcing X = x could have the effect of changing the distribution of Y .Following the conventions in Pearl (2000), the resulting distribution after manipulation is denoted by P To illustrate this, imagine the following simplistic situation.Suppose we observe every hour a particular spot on the street.The random variable X denotes whether it rained during that hour (X = 1 if it rained, X = 0 otherwise).The random variable Y denotes whether the street was wet or damp at the end of that hour we looked (Y = 1 if it was wet, Y = 0 otherwise).If we assume P (X = 1) = 0.1 (rather dry region), P (Y = 1|X = 1) = 0.99 (the street is almost always still wet when we look after and it rained during the last hour) and P (Y = 1|X = 0) = 0.02 (other reasons for making the street wet are rare), we can compute the conditional probability P (X = 1|Y = 1) = 0.85.So, if we observe the street to be wet, the probability that there was rain in the last hour is about 0.85.However, if we take a garden hose and force the street to be wet at a randomly chosen hour, we get P (X = 1|do(Y = 1)) = P (X = 1) = 0.1.Thus, the distribution of the random variable describing rain is quite different when making an observation and making an intervention.Oftentimes, only the change of the target distribution under intervention is reported.We will use the change in mean, i.e. ∂ ∂x E[Y |do(X = x)], as measure for the causal effect for Gaussian random variables.For Gaussian random variable Y , E[Y |do(X = x)] depends linearly on x.Therefore, the derivative is constant which means that the causal effect does not depend on x.For binary random variables (with domain {0, 1}) we use The goal in the remaining section is to estimate the effect of an intervention if only observational data is available.

Without hidden and selection variables
If the causal structure is a known DAG and there are no hidden and selection variables, Pearl (2000) (Th 3.4.1)suggested a set of inference rules known as "do-calculus" whose application transforms an expression involving a "do" into an expression involving only conditional distributions.Thus, information on the interventional distribution can be obtained by using information obtained by observations and knowledge of the underlying causal structure.Shpitser and Pearl (2008) restructured the rules of "do-calculus" into algorithmic form and showed that they are complete in a sense that, if the algorithm doesn't find a transformation, there exists none.
Unfortunately, the causal structure is rarely known in practice.Under some assumptions, Pearl (2000) showed (Th.1.4.1) that there is a link between causal structures and graphical models.Roughly speaking, if the underlying causal structure is a DAG, we observe data generated from this DAG and then estimate a DAG model (i.e. a graphical model) on this data, the estimated CPDAG represents the equivalence class of the DAG model describing the causal structure.This holds if we have enough samples and assuming that the true underlying causal structure is indeed a DAG without latent or selection variables.Note that even given an infinite amount of data, we usually cannot identify the true DAG itself, but only its equivalence class.Every DAG in this equivalence class is equally likely to represent the true causal structure.Taking this into account, we conceptually apply the do-calculus on each DAG within the equivalence class and thus obtain a possible causal effect for each DAG in the equivalence class (in practice, we developed a local method that is faster but yields a similar result; see 3.4 for more details).Therefore, even if we have an infinite amount of observations we can only report on a multiset of possible causal values (it is a multiset rather than a set because it can contain duplicate values).One of these values is the true causal effect.Despite the inherent ambiguity, this result can oftentimes still be very useful, when the set has certain properties (e.g.all values are much larger than zero).
In addition to this fundamental limitation in estimating a causal effect, errors due to finite sample size blur the result as with every statistical method.Thus, in a real world situation, we only get an estimate of the set of possible causal values.It was shown that this estimate is consistent under some assumptions by Maathuis et al. (2009).The method was termed IDA.
It has recently been shown empirically that despite the described fundamental limitations in identifying the causal effect uniquely and potential violations of the underlying assumptions, the method performs well in identifying the most important causal effects in a highdimensional yeast gene expression data set (see Maathuis et al. (2010)).
At the moment, we can not yet estimate causal effects when hidden variables or selection variables are present.

Summary of assumptions
For all proposed methods, we assume that the data is faithful to the unknown underlying causal DAG.For the individual methods, further assumptions are made.

PC-algorithm:
No hidden or selection variables; consistent in high-dimensional settings (the number of variables grows with the sample size) if the underlying DAG is sparse, the data is multivariate Normal and satisfies some regularity conditions on the partial correlations, and α is taken to zero appropriately.See Kalisch and Bühlmann (2007) for full details.Consistency in a standard asymptotic regime with a fixed number of variables follows as a special case.
FCI-algorithm: Consistent in high-dimensional settings if the underlying MAG is sparse (in a stronger sense than what is needed for PC), the data is multivariate Normal and satisfies some regularity conditions on the partial correlations, and α is taken to zero appropriately.See Colombo, Maathuis, Kalisch, and Richardson (2010) for full details.Consistency in a standard asymptotic regime with a fixed number of variables follows as a special case.
IDA: No hidden or selection variables; all conditional expectations are linear; consistent in high-dimensional settings if the underlying DAG is sparse, the data is multivariate Normal and satisfies some regularity conditions on the partial correlations and conditional variances, and α is taken to zero appropriately.See Maathuis et al. (2009) for full details.

Package pcalg
This package has three goals.First, it is intended to provide fast, flexible and reliable implementations of the PC and FCI algorithms.Second, it is intended to provide a tool for estimating the causal effect among continuous variables given a causal structure using a form of the do-calculus known as back-door criterium.Finally, combining these two applications of the package, it is possible to estimate causal effects when no causal structure is known and hidden or selection variables are absent.In the following, we describe the main functions of our package for achieving these goals.The functions skeleton, pc and fci are intended for estimating graphical models.The functions ida and idaFast are intended for estimating causal effects when the causal structure is known or was estimated.
Alternatives to this package for estimating graphical models in R include: bnlearn, deal, gRain, gRbase and gRc.

Skeleton
The function skeleton() implements (P1) and (P2) from Algorithm 1.The function can be called with the following arguments skeleton(suffStat, indepTest, p, alpha, verbose = FALSE, fixedGaps = NULL, fixedEdges = NULL, NAdelete = TRUE, m.max = Inf) As was shown in section 2.1, the main task in finding the skeleton is to compute and test several conditional independencies.To keep the function flexible, skeleton takes as argument a function indepTest() that performs these conditional independence tests and returns a pvalue.All information that is needed in the conditional independence test can be passed in the argument suffStat.The only exceptions are the number of variables p and the significance level α for the conditional independence tests, which are passed separately.For convenience, we have preprogrammed versions of indepTest for Gaussian data (gaussCItest), discrete data (disCItest), and binary data (binCItest).Each of these independence test functions needs different arguments as input, described in the respective help files.For example, when using gaussCItest, the input has to be a list containing the correlation matrix and the sample size of the data.In the following code, we estimate the skeleton on the data set gmG (which consists of p = 8 variables and n = 5000 samples) and plot the results.The estimated skeleton and the true underlying DAG are shown in Fig. 2.
To give another example, we show how to fit a skeleton to the example data set gmD (which consists of p = 5 discrete variables with 3, 2, 3, 4 and 2 levels and n = 10000 samples).The predefined test function disCItest() is based on the G 2 statistic and takes as input a list containing the data matrix, a vector specifying the number of levels for each variable and an option which indicates if the degrees of freedom must be lowered by one for each zero count.Finally, we plot the result.The estimated skeleton and the true underlying DAG are shown in Fig. 3.
The information in fixedGaps and fixedEdges is used as follows.The gaps given in fixedGaps are introduced in the very beginning of the algorithm by removing the corresponding edges from the complete undirected graph.Thus, these edges are guaranteed to be absent in the resulting graph.Pairs (i, j) in fixedEdges are skipped in all steps of the algorithm, so that these edges are guaranteed to be present in the resulting graph.
If indepTest returns NA and the option NAdelete is TRUE, the corresponding edge is deleted.If this option is FALSE, the edge is not deleted.
The argument m.max is the maximal size of the conditioning sets that are considered in the conditional independence tests in (P2) of Algorithm 1.
Throughout, the function works with the column positions of the variables in the adjacency matrix, and not with the names of the variables.
Sampling errors (or hidden variables) can lead to conflicting information about edge directions.Sampling errors or hidden variables can also lead to invalid CPDAGs, meaning that there does not exist a DAG that has the same skeleton and v-structures as the graph found by the algorithm.An example of this is an undirected cycle consisting of the edges a − b − c − d and d − a.In this case it is impossible to direct the edges without creating a cycle or a new vstructure.The optional argument u2pd specifies what should be done in such a situation.If it is set to "relaxed", the algorithm simply outputs the invalid CPDAG.If u2pd is set to "rand", all direction information is discarded and a random DAG is generated on the skeleton.The corresponding CPDAG is then returned.If u2pd is set to "retry", up to 100 combinations of possible directions of the ambiguous edges are tried, and the first combination that results in a valid CPDAG is chosen.If no valid combination is found, an arbitrary CPDAG is generated on the skeleton as with u2pd = "rand".
By setting the argument conservative = TRUE, the conservative PC algorithm is chosen.The conservative PC is a slight variation of the PC algorithm and is intended to be more robust against sampling errors.After the skeleton is computed, all potential v-structures a − b − c are checked in the following way.We test whether a and c are conditionally independent given any subset of the neighbors of a or any subset of the neighbors of c.If b is in no such conditioning set (and not in the original sepset) or in all such conditioning sets (and in the original sepset), no further action is taken and the usual PC is continued.If, however, b is in only some conditioning sets, the triple a−b−c is marked as "unfaithful".If in the conservative step there is no subset among the neighbors that makes a and c conditionally independent, the triple is also marked as "unfaithful".An unfaithful triple is not oriented as a v-structure.Furthermore, no later orientation rule that needs to know whether a − b − c is a v-structure or not is applied.(For more details, see the help file of the internal function pc.cons.intern()which is called with the argument version.unf= c(2,2).) As with the skeleton, the PC algorithm works with the column positions of the variables in the adjacency matrix, and not with the names of the variables.When plotting the object, undirected and bidirected edges are equivalent.

fci
This function implements a generalization of the PC algorithm (see section 3.2), in the sense that it allows arbitrarily many latent and selection variables.Under the assumption that the data are faithful to a DAG that includes all latent and selection variables, the FCI algorithm (Fast Causal Inference algorithm) estimates the equivalence class of MAGs that describe the conditional independence relationships between the observed variables given the selection variables.
The first part of the FCI algorithm is analogous to the PC algorithm.It starts with a complete undirected graph and estimates an initial skeleton using the function skeleton().All edges of this skeleton are of the form o-o. Now, the v-structures are directed as in the PC algorithm.Due to the presence of hidden variables, it is no longer sufficient to consider only subsets of the neighborhoods of nodes x and y to decide whether the edge x − y should be removed.Therefore, the initial skeleton may contain some superfluous edges.These edges are removed in the next step of the algorithm.To decide whether edge x o-o y should be removed, one computes Possible-D-SEP(x) and Possible-D-SEP(y) and performs conditional independence tests of x and y given all possible subsets of Possible-D-SEP(x) and of Possible-D-SEP(y) (see helpfile of function pdsep()).Subsequently, all edges are transformed into o-o again and the v-structures are determined (using information in sepset).Finally, as many undetermined edge marks (o) as possible are determined using (a subset of) the 10 orientation rules given by Zhang (2008).
The option doPdsep indicates whether Possible-D-SEP should be computed for all nodes, and all subsets of Possible-D-SEP are considered as conditioning sets in the conditional independence tests.If FALSE, Possible-D-SEP is not computed, so that the algorithm simplifies to the Modified PC algorithm of Spirtes et al. (2000).
The argument conservative can be used to invoke the conservative FCI which consists of two parts.In the first part (done if conservative [1] is TRUE), we call the internal function pc.cons.internal()with argument version.unf= c(1,2) after computing the skeleton.This is a slight variation of the conservative PC (which used version.unf= c(2,2)): If a is independent of c given some S in the skeleton (i.e., the edge a − c dropped out), but a and c remain dependent given all subsets of neighbors of either a or c, we will call a − b − c "faithful".This is because in the FCI algorithm, the true separating set might be outside the neighborhoods of a or c.In the second part (done if conservative[2] is TRUE), we call pc.cons.internal(..., version.unf= c(1,2)) again after Possible-D-Sep was found and the graph potentially lost some edges.Therefore, new triples might have occurred.If this second part is done, the resulting information on sepset and faithful triples overwrites the previous and will be used for the subsequent orientation rules.
By setting the argument biCC = TRUE, only nodes on paths between a and c are considered to be in sepset(a,c).This method uses biconnected components to find all paths between nodes a and c.
The argument cons.rulesmanages the way in which the information about unfaithful triples affects the orientation rules for directing edges.If cons.rules= TRUE, an orientation rule that needs information on definite non-colliders is only applied, if the corresponding subgraph relevant for the rule does not involve an unfaithful triple.Using the argument labels, one can pass names for the vertices of the estimated graph.By default, this argument is set to NA, in which case the node names as.character(1:p) are used.As an example, we estimate the PAG of a graph with five nodes using the function fci() and the predefined function gaussCItest() as conditional independence test.Finally, we plot the result.Node V 1 is latent.R> data("gmL") R> suffStat1 <-list(C = cor(gmL$x), n = nrow(gmL$x)) R> pag.est <-fci(suffStat1, indepTest= gaussCItest, + p = ncol(gmL$x), alpha = 0.01, labels = as.character(2:5))R> par(mfrow = c(1,2)) R> plot(gmL$g, main = "") R> plot(pag.est)A note on implementation: As pc() and fci() are similar in the result they produce, their resulting values are of (S4) class pcAlgo and fciAlgo respectively, both of which extend the class (of their "communalities"), gAlgo.

ida
It is assumed that we have observational data that are multivariate Gaussian and faithful to the true (but unknown) underlying causal DAG (without hidden variables).Under these assumptions, ida estimates the multiset of possible total causal effects of x on y, where the total causal effect is defined via Pearl's do-calculus as ∂ ∂z E[Y |do(X = z)] (this value does not depend on z, since Gaussianity implies that conditional expectations are linear).We estimate a multiset of possible total causal effects instead of the unique total causal effect, since it is typically impossible to identify the latter when the true underlying causal DAG is unknown (even with an infinite amount of data).
To illustrate this, consider the following example.We have data from seven Gaussian variables with a causal structure given on the left of Fig. 6.We assume that the causal structure is unknown and want to infer the causal effect of V 2 on V 5 .First, we estimate the equivalence class of DAGs that describe the conditional independence relationships in the data, using the function pc() (see section 3.2).R> data("gmI") R> suffStat <-list(C = cor(gmI$x), n = nrow(gmI$x)) R> pc.fit <-pc(suffStat, indepTest=gaussCItest, + p = ncol(gmI$x), alpha = 0.01) Comparing the true DAG with the CPDAG in Fig. 6, we see that the CPDAG and the true DAG have the same skeleton.Moreover, the directed edges in the CPDAG are also directed in that way in the true DAG.Three edges in the CPDAG are bidirected.Recall that undirected and bidirected edges bear the same meaning in a CPDAG, so they can be used interchangeably.Since there are three undirected edges in the CPDAG, there might be up to 2 3 = 8 DAGs in the corresponding equivalence class.However, the undirected edges V 2 −V 3 −V 5 can be formed to a new v-structure.As mentioned in section 2.1, DAGs in the equivalence class must have exactly the same v-structures as the corresponding CPDAG.Thus, V 2 − V 3 − V 5 can only be directed as and not as V 2 → V 3 ← V 5 , since this would introduce a new colliding v-structure.There is only one remaining undirected edge (V 1 − V 6 ), which can be directed in two ways.Thus, there are six valid DAGs (i.e. they have no new v-structure and no directed cycle) and these form the equivalence class represented by the CPDAG.In Fig. 7, all DAGs in the equivalence class are shown.DAG 6 is the true DAG.
For each DAG G in the equivalence class, we apply Pearl's do-calculus to estimate the total causal effect of x on y.Since we assume Gaussianity, this can be done via a simple linear regression: If y is not a parent of x, we take the regression coefficient of x in the regression lm(y ~x + pa(x)), where pa(x) denotes the parents of x in the DAG G; if y is a parent of x in G, we set the estimated causal effect to zero.
If the equivalence class contains k DAGs, this will yield k estimated total causal effects.Since we do not know which DAG is the true causal DAG, we do not know which estimated total causal effect of x on y is the correct one.Therefore, we return the entire multiset of k estimated effects.
In our example, there are six DAGs in the equivalence class.Therefore, the function ida() (with method = "global") will produce six possible values of causal effects, one for each DAG.
R> ida(2, 5, cov(gmI$x), pc.fit@graph, method = "global", verbose = FALSE) [1] -0.0049012 -0.0049012 0.5421360 -0.0049012 -0.0049012 0.5421360 Among these six values, there are only two unique values: −0.0049 and 0.5421.This is because we compute lm(V5 ~V2 + pa(V2)) for each DAG and report the regression coefficient of V 2 .Note that there are only two possible parent sets of V 2 in all six DAGs: In DAGs 3 and 6, there are no parents of V 2 .In DAGs 1, 2, 4 and 5, however, the parent of V 2 is V 3 .Thus, exactly the same regressions are computed for DAGs 3 and 6, and the same regressions are computed for DAGs 1, 2, 4 and 5. Therefore, we end up with two unique values, one of which occurs twice, while the other occurs four times.
Since the data was simulated from a model, we know that the true value of the causal effect of V 2 on V 5 is 0.5529.Thus, one of the two unique values is indeed very close to the true causal effect (the slight discrepancy is due to sampling error).
The function ida() can be called as ida(x.pos,y.pos, mcov, graphEst, method = "local", y.notparent = FALSE, verbose = FALSE, all.dags= NA) where x.pos denotes the column position of the cause variable, y.pos denotes the column position of the effect variable, mcov is the covariance matrix of the original data, and graphEst is a graph object describing the causal structure (this could be given by experts or estimated by pc()).
If method="global", the method is carried out as described above, where all DAGs in the equivalence class of the estimated CPDAG are computed.This method is suitable for small graphs (say, up to 10 nodes).The DAGs can (but need not) be precomputed using the function allDags() and passed via argument all.dags.
If method="local", we do not determine all DAGs in the equivalence class of the CPDAG.Instead, we only consider the local neighborhood of x in the CPDAG.In particular, we consider all possible directions of undirected edges that have x as endpoint, such that no new v-structure is created.For each such configuration, we estimate the total causal effect of x on y as above, using linear regression.
At first sight, it is not clear that such a local configuration corresponds to a DAG in the equivalence class of the CPDAG, since it may be impossible to direct the remaining undirected edges without creating a directed cycle or a v-structure.However, Maathuis et al. (2009) showed that there is at least one DAG in the equivalence class for each such local configuration.As a result, it follows that the multisets of total causal effects of the global and the local method have the same unique values.They may, however, have different multiplicities.
Recall, that in the example using the global method, we obtained two unique values with multiplicities two and four yielding six numbers in total.Applying the local method, we obtain the same unique values, but the multiplicities are 1 for both values.
R> ida(2,5, cov(gmI$x), pc.fit@graph, method = "local") [1] 0.5421360 -0.0049012One can take summary measures of the multiset.For example, the minimum absolute value provides a lower bound on the size of the true causal effect: If the minimum absolute value of all values in the multiset is larger than one, then we know that the size of the absolute true causal effect (up to sampling error) must be larger than one.The fact that the unique values of the multisets of the global and local method are identical implies that summary measures of the multiset that only depend on the unique values (such as the minimum absolute value) can be found using the local method.
In some applications, it is clear that some variable is definitively not an effect but it might be a cause.Consider for example a bacterium producing a certain substance, taking the amount of produced substance as response variable.Knocking out genes in the bacterium might change the ability to produce the substance.By measuring the expression levels of genes, we want to know which genes have a causal effect on the product.In this setting, it is clear that the amount of substance is the effect and the activity of the genes is the cause.Thus in the causal structure, the response variable cannot be a parent node of any variable describing the expression level of genes.This background knowledge can be easily incorporated: By setting the option y.notparent = TRUE, all edges in the CPDAG connected to the response variable (no matter whether directed or undirected) are overwritten so that they are directed towards the response variable at the end.

idaFast
In some applications it is desirable to estimate the causal effect of one variable on a set of response variables.In these situations, the function idaFast() should be used.Imagine for example, that we have data on several variables, that we have no background knowledge about the causal effects among the variables and that we want to estimate the causal effect of each variable onto each other variable.To this end, we could consider for each variable the problem: What is the causal effect of this variable on all other variables.Of course, one could solve the problem by using ida on each pair of variables.However, there is a more efficient way which uses the fact that a linear regression of a fixed set of explanatory variables on several different response variables can be computed very efficiently.
The function idaFast() can be called with the following arguments idaFast(x.pos,y.pos.set,mcov, graphEst) The arguments x.pos, mcov, graphEst have the same meaning as the corresponding arguments in ida.The argument y.pos.set is a vector containing the column positions of all response variables of interest.
6 -0.0058532 -0.0062310 7 1.0086138 0.8970766 3.6.Using a user specific conditional independence test In some cases it might be desirable to use a user specific conditional independence test instead of the provided ones.The pcalg package allows the use of any conditional independence test defined by the user.In this section, we illustrate this feature by using a conditional independence test for Gaussian data that is not predefined in the package.
The functions skeleton, pc and fci all need the argument indepTest, a function of the form indepTest (x, y, S, suffStat) to test conditional independence relationships.This function must return the p-value of the conditional independence test of x and y given S and some information on the data in the form of a sufficient statistic (this might be simply the data frame with the original data), where x, y, S indicate column positions of the original data matrix.We will show an example that illustrates how to construct such a function.
A simple way to compute the partial correlation of x and y given S for some data is to solve the two associated linear regression problems x ∼ S and y ∼ S, get the residuals, and calculate the correlation between the residuals.Finally, a correlation test between the residuals yields a p-value that can be returned.The argument suffStat is an arbitrary object containing several pieces of information that are all used within the function to produce the p-value.In the predefined function gaussCItest() for example, a list containing the correlation matrix and the number of observations is passed.This has the advantage that any favorite (e.g.robust) method of computing the correlation matrix can be used before partial correlations are computed.Oftentimes, however, it will suffice to just pass the complete data set in suffStat.We will choose this simple method in our example.An implementation of the function myCItest() could look like this.R> pc.myfit <-pc(suffStat = gmG$x, indepTest = myCItest, p = 8, alpha = 0.01) R> par(mfrow = c(1,2)) R> plot(pc.fit,main = "") R> plot(pc.myfit,main = "") As expected, the resulting CPDAG (see Fig. 8) is the same as in section 3.2 where we used the function gaussCItest() as conditional independence test.Note however that using gaussCItest is considerably faster than using myCItest (on our computer 0.059 seconds

Figure 1 :
Figure 1: True (left) and estimated (right) causal structure represented by a DAG.

Figure 2 :
Figure 2: True DAG (left) and estimated skeleton (right) fitted on the Gaussian data set provided by the package.

Figure 3 :
Figure 3: True DAG (left) and estimated skeleton (right) fitted on the discrete data set provided by the package.
For example, one may find that a − b − c and b − c − d should both be directed as v-structures.This gives conflicting information about the edge b − c, since it should be directed as b ← c in v-structure a → b ← c, while it should be directed as b → c in v-structure b → c ← d.In such cases, we simply overwrite the directions of the conflicting edge.In the example above this means that we obtain a → b → c ← d if a − b − c was visited first, and a → b ← c ← d if b − c − d was visited first.

Figure 4 :
Figure 4: True DAG (left) and estimated CPDAG (right) fitted on the Gaussian data set provided by the package.

Figure 5 :
Figure 5: True data generating DAG (left) and estimated PAG (right).Variable 1 of the data generating DAG is latent.

Figure 8 :
Figure 8: The estimated CPDAGs using the predefined (left) and the user specified (right) conditional independence test are identical.