Multilevel testing of constraints induced by structural equation modeling in fMRI effective connectivity analysis: A proof of concept

In functional MRI (fMRI), effective connectivity analysis aims at inferring the causal influences that brain regions exert on one another. A common method for this type of analysis is structural equation modeling (SEM). We here propose a novel method to test the validity of a given model of structural equation. Given a structural model in the form of a directed graph, the method extracts the set of all constraints of conditional independence induced by the absence of links between pairs of regions in the model and tests for their validity in a Bayesian framework, either individually (constraint by constraint), jointly (e.g., by gathering all constraints associated with a given missing link), or globally (i.e., all constraints associated with the structural model). This approach has two main advantages. First, it only tests what is testable from observational data and does allow for false causal interpretation. Second, it makes it possible to test each constraint (or group of constraints) separately and, therefore, quantify in what measure each constraint (or, e..g., missing link) is respected in the data. We validate our approach using a simulation study and illustrate its potential benefits through the reanalysis of published data.


Introduction
A key concept in investigating functional brain interactions in blood oxygen level dependent (BOLD) functional magnetic resonance imaging (fMRI) is effective connectivity, which has been defined as the influence that regions exert on one another (Friston, 1994).There are various methods to deal with effective connectivity in fMRI, including, but not restricted to, structural equation modeling (SEM) (McIntosh and Gonzalez-Lima, 1994;Gonzalez-Lima and McIntosh, 1995;Büchel and Friston, 1997;Bullmore et al., 2000;James et al., 2009;Ramsey et al., 2010), dynamic causal modeling (Friston et al., 2003;Stephan et al., 2007;Daunizeau et al., 2011;Lohmann et al., 2012;Friston et al., 2017) and Granger causality (Goebel et al., 2003;Roebroeck et al., 2005;Hu et al., 2011;Barrett and Barnett, 2013;Davey et al., 2013); for partial reviews, see, e.g., Valdes-Sosa et al. (2011); Stephan and Roebroeck (2012); Bielczyk et al. (2019); Jovellar and Doudet (2019).We here focus on SEM (Bollen, 1989;Hoyle, 2012;Civelek, 2018;Davvetas et al., 2020), which relies on expressing the time course y i (t) of one region i as a linear function of the time course of other regions, y i (t) = j̸ =i λ ij y q (t) + e i (t). (1) The model parameters (λ ij ) are then estimated by optimization of a global cost function that usually depends on the data through the sample covariance matrix only (Cudeck et al., 1993;Bullmore et al., 2000).While of great practical interest, blind extraction of the model's structure from data quickly becomes intractable as the number of regions increases.Methods have been proposed that provide heuristics to explore the set of potential models and provides a "best fit" according to some (also global) metric (Bullmore et al., 2000;Ramsey et al., 2010).
There are two main issues with most methods that have been proposed so far.First, they put the emphasis on links (arrows).By contrast, given observational data, a structural model can only be defined in terms of its constraints of conditional independence, which characterize the missing links.It is thus not the links themselves but these missing links that are characteristic of a structural model in the face of data.As a consequence, it can happen that different models (i.e., with different sets of links) generate the same sets of constraints of conditional independence.These models, called observationally equivalent (Stelzl, 1986;Verma and Pearl, 1990;Mayekawa, 1994;Shipley, 2000b, Section 8.11;Pearl, 2001a, Section 5.2.3), have identical values of data fit regardless of the measure and, therefore, cannot be distinguished from data alone.This means that methods dealing with links do not compare models but classes of (observationally equivalent) models.Application of the main methods of structural equation modeling without being aware of this major feature might give the neuroscientist the false impression that a unique optimal causal model has been found and lead to an erroneous physiological interpretation of such causality.
Another issue is that most methods propose models based on global measures.They therefore base their decision on how a given model globally fits the data.The larger the number of links, the more complex the search space and the smaller the weigh of each link into these global schemes.By contrast, one is often interested in a very specific link, set of links, or pathway and, for instance, the influence of a given task on it.Even in a more general setting, it would still be of interest to be able to test different parts of a model separately, in order to determine where the structural model can be trusted and where it could be improved.
In the present paper, we propose a novel method that avoids the two above-mentioned issues.Given a structural model, it extracts the set of all constraints of conditional independence induced by the absence of links between pairs of regions in the model and tests for their validity in a Bayesian framework, either individually (constraint by constraint), jointly (e.g., by gathering all constraints associated with a given missing link), or even globally (i.e., all constraints associated with a structural model).In other words, we use the relevance of the constraints associated with a structural model as a proxy for the relevance of the structural model itself.With such an approach, observationally equivalent models have the same sets of conditional independence constraints and, therefore, the same set of constraints is tested.Also, it makes it possible to test each constraint (or group of constraints) separately and, therefore, quantify in what measure each constraint (or, e.g., missing link) is respected in the data.This feature gives the neuroscientist a key feedback as to which assumptions underlying the model can be deemed correct and which ones seem to contradict the data.We validate our approach on synthetic data and illustrate its behavior by a reanalysis of published experimental data, showing its potential added value.
We illustrate the approach on a dataset and two structural models originating from Bullmore et al. (2000).This example has several advantages.It includes a limited number of regions and therefore allows for an exhaustive description of the procedure.Also, it deals with cyclic graphs, which are common in neuroscience/neuroimaging while being quite challenging from a graphtheoretic perspective.Finally, it has already been investigated using other approaches, including SEM (Bullmore et al., 2000) and partial correlation (Marrelec et al., 2007;Marrelec and Benali, 2009), giving us a good view of the informative content and limits of the model and data.We use this study to (i) exemplify the extraction of conditional independence constraints from real-life SEMs; (ii) provide a generative model for our synthetic data; and (iii) reanalyze their experimental data.
The outline of the manuscript is the following.In Section 2, we introduce and develop the method.In Section 3, we assess its validity using a simulation study.In Section 4, we reanalyze the experimental data from Bullmore et al. (2000).Further issues are discussed in Section 5.

Method
In this section, we introduce the general frameworks of directed graphs (Section 2.1) and graphical representation of SEMs (Section 2.2).We then delve into the extraction of individual conditional independence constraints, first stating the general procedure (Section 2.3), then applying it to the two structural models of Bullmore et al. (2000) (Section 2.4).In the particular case of multivariate normal distributions, we show that the contraints have a simple expression in terms of conditional correlation coefficients (Section 2.5).Finally, we provide a numerical Bayesian inference procedure to assess the validity of individual constraints and show how it can be used to also test joint and global constraints (Section 2.6).

Directed graphs
SEMs are often represented in the form of directed graphs and the method that we propose strongly relies on such a representation.In the present section, we quickly introduce the basic notions relative to the theory of directed graphs.For more information, the interested reader can refer to Shipley (2000a) or Pearl (2001b).
A graph is defined by the set of its vertices (or nodes)-here brain regions-, R, and the set of its edges, or arrows, A. Edges stand for effective connections of the type i → j from brain region i to brain region j.A path from region i to region j is a sequence of regions such that two consecutive regions are connected by an edge.The path is said to be directed if it is possible to go from region i to region j only by following arrows (i.e., from tail to head); it is undirected if the direction of arrows is ignored.j is a descendant of i if there exists a directed path from i to j; in that case, we also say that i is an ancestor of j.A collider c is a region that has arrows pointing to it, i.e., forming a pattern of the form i → c ← j.For an application of this terminology to a toy example, see Figure 1.
A directed graph is acyclic if no directed path connects a vertex to itself; otherwise it is cyclic.The theoretical properties of acyclic graphs are much better understood than those of cyclic graphs.However, models originating from neuroscience are quite often cyclic, so both types of graphs need to be taken into account.
A key concept in directed graphs is d-separation (Pearl, 1988), which itself relies on the notion of path blocking.A path between regions i and j is blocked given a set of regions S if there is a region k on the path for which one of two conditions holds: • k is a collider (i.e., → k ←) and neither k nor any of its descendants is in S.
Two regions i and j are then said to be d-separated given a set of regions S, denoted dsep(i, j|S), if and only if all paths between regions i and j are blocked by S. d-separation is a formal way of describing whether S can block the flow of information between regions.An illustration is given in Figure 1.

Graphical representation of SEMs
As mentioned earlier, directed graphs are a convenient and powerful way to represent SEMs.Assume that the state of every region i in a network R of R regions is quantified by a variable y i and that the global, R-dimensional variable y = (y 1 , . . ., y R ), describing the state of the whole Figure 1: Example of directed graph and SEM.Top: Directed acyclic graph.In this example, we have R = {1, 2, 3, 4, 5, 6} and A = {(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)}. 1 → 2 → 4 → 5 is a directed path from 1 to 5, 1 → 2 → 4 ← 3 an undirected path between 1 and 3. 4 and 5 are descendant of 2. 1, 2, and 3 are ancestors of 4. 4 is a collider, since both 2 and 3 are pointing to it.The path 1 → 2 → 4 → 5 is blocked by 2, since 2 is a non-collider which is on the path; it is also blocked by {2, 3} for the same reason.However, this path is not blocked by 3 alone, since 3 is a non-collider but is not on the path.The path 2 → 4 ← 3 is blocked by ∅ (the empty set), since the only node on the path is 4, which is a collider and does not belong to ∅.However, this path is neither blocked by 4 nor by 5, since 4 is a collider that is on the path and 5 is its descendant. 1 and 4 are d-separated by {2, 3}, since both paths connecting 1 and 4 (i.e., 1 → 2 → 4 and 1 → 3 → 4) are blocked by {2, 3}; however, they are not d-separated by 2 only, since this node does not block the path 1 → 3 → 4. Inverting the link 1 → 2 yields a graph that is observationally equivalent, as does inverting the link 1 → 3. Bottom: Structural equation model whose graphical representation is given by the top graph.
network, is modeled by a SEM with relationships of the form of Equation (1).Note that λ ij reads as the part of the signal of i that is contributed by region j.Then y can be associated with a directed graph, where each variable y i is represented by a vertex i, and where there is an arrow from region j to node i whenever λ ij ̸ = 0.

Constraints imposed by a structural model
It has been shown that for a wide variety of structural models, including acyclic causal models, as well as cyclic causal models in which all variables are discrete or in which functional relationships are linear, each relationship of d-separation can be translated into a relationship of conditional independence (Shipley, 2000a, Section 2.8).More precisely, every time two regions i and j are d-separated by a set of regions S, then y i and y j must be conditionally independent given y S , denoted y i ⊥ ⊥y j |y S . (2) Since each relationship of d-separation within an SEM is mirrored by a relationship of conditional independence, extracting the set of all relationships of d-separation for a model yields all relationships of conditional independence that must be verified in the data if the model is correct.Note that all these relationships are not necessarily independent from one another, as some statements can be predicted from others.In particular, we expect constraints involving the same pair of variables (i, j) to be highly correlated.We here decide to keep and test all constraints.This question is further discussed in Section 5.

Example
We use two structural models from the literature (Bullmore et al., 2000) to exemplify the extraction of conditional independence constraints from real-life directed graphs.Bullmore et al. (2000) considered SEM analysis of a task requiring semantic decision and subvocal rehearsal.The following R = 5 left hemispheric cortical regions of interest were selected: the ventral extrastriate cortex (VEC), the prefrontal cortex (PFC), the supplementary motor area (SMA), the inferior frontal gyrus (IFG), and the inferior parietal lobule (IPL).Thus, the set of vertices is given by R = {VEC, PFC, SMA, IFG, IPL} .
In their study, they introduced two structural models.They first proposed a plausible structural model based on anatomical and functional considerations.The resulting model, henceforth referred to as the "theoretically preferred model" (or "TP" model; see Figure 2, left).They also applied a procedure implemented in the LISREL proprietary software package1 , yielding a "best fit model", henceforth referred to as such (or "BF" model).The resulting model is schematized in Figure 2 and "best fit" (BF) model (right) from Bullmore et al. (2000).
We are now in position to extract the conditional independence constraints from the TP model (Section 2.4.1) and the BF model (Section 2.4.2).

Theoretically preferred model
For the theoretically preferred model, the following links are missing: between VEC and SMA; VEC and IFG; PFC and IFG, PFC and IPL, SMA and IPL.Reviewing all paths between these pairs of regions (see Table 1) leads to the following sets of d-separation and corresponding constraints on conditional correlation-• Between VEC and SMA: There are three potential paths between both regions.PFC must be in the conditioning set, otherwise VEC → PFC → SMA is not blocked.IFG must also be in the conditioning set, otherwise VEC ← IPL ← IFG ← SMA is not blocked.Since IFG also blocks VEC → IPL ← IFG ← SMA, the set { PFC, IFG } is sufficient.Adding IPL in the conditioning set does not change the conclusion.We therefore have the following two constraints: • Between VEC and IFG: IPL must be in the set, otherwise IFG → IPL → VEC is not blocked.However, IPL activates the path IFG → IPL ← VEC, with no possibility to block it with a non-collider.As a consequence, no set of regions can d-separate VEC and IFG.
• Between PFC and IFG: SMA must be in the conditioning set to block PFC → SMA → IFG.VEC must also be in the conditioning set to block both PFC ← VEC ← IPL ← IFG and PFC ← VEC → IPL ← IFG.Adding IPL keep the conclusion unchanged.We therefore have the following two constraints: • Between PFC and IPL: VEC must be in the conditioning set to block both PFC ← VEC ← IPL and PFC ← VEC → IPL.As to the first path, it is blocked by either SMA or IFG, or the two.As a consequence, we have three constraints: • Between SMA and IPL: IFG must be in the conditioning set to block the first path.Both the second and third path are blocked by VEC and/or PFC.We consequently have the following constraints:

Best fit model
For the best fit model, the following links are missing: between VEC and SMA; VEC and IFG; and PFC and IPL.Reviewing all paths between these pairs of regions (see Table 1) leads to- • Between VEC and SMA: PFC and IPL must be in the conditioning set, otherwise either one of the first two paths connecting these regions is not blocked.This set is sufficient, since both nodes also block the third path.Adding IFG in the conditioning set does not change the conclusion.We therefore have the following two constraints: • Between VEC and IFG: for the same reason as above, PFC and IPL must be in the conditioning set.This set is sufficient, since both nodes also block the third path.Adding SMA in the conditioning set does not change the conclusion.We therefore have the following two constraints: • Between PFC and IPL: VEC, SMA, and IFG must all be in the conditioning set to block any of the three paths.As a consequence, we have one constraint: • Between SMA and IFG: PFC must be in the conditioning set to block the first path.This node also blocks the second and fourth path.However, PFC is also a descendant of IPL, which is a collider in the third path.As a consequence, it unblocks the third path, with no possibility to block it, since it has only one intermediary node (which is the collider).
As a consequence, no set of regions can d-separate SMA and IFG.

Case of multivariate normal distributions
A common assumption in fMRI is that the data follow a multivariate normal distribution.In the framework of SEMs, this is a direct consequence of the assumption that the noise components e i (t) are Gaussian distributed.If y is multivariate normal with covariance matrix Σ, conditional independence is characterized by a zero conditional correlation.Each relationship of d-separation dsep(i, j|S) that can be read off the directed graph associated with the SEM can therefore also be associated with one constraint of the form ρ i,j|S = Corr[y i , y j |y S ] = 0. (3) This coefficient of conditional correlation can be obtained from Σ as follows.First, we discard the part of Σ not related to T = S ∪ {i, j}.We then partition the remaining matrix Σ T as Σ {i,j} is the part of the covariance matrix that is specific of regions i and j, Σ S the part of the covariance matrix that is specific to regions in S, while Σ {i,j},S contains the covariances between regions in S on the one hand and, on the other hand, regions i and j.We then compute the 2-by-2 conditional covariance matrix of {i, j} given S as (Anderson, 1958, Section 2.5.1) and the corresponding conditional correlation coefficients by normalization of the covariance coefficient,

Inference
Assume that the SEM leads to the expression of K constraints of the form given by Equation (3), each constraint C k being expressed as Starting from a dataset D = {y 1 , . . ., y N } of N samples assumed to be independent and identically distributed (i.i.d.) realizations of a multivariate normal distribution with unknown mean µ and covariance matrix Σ, the goal of the present section is to propose a way to simultaneously assess the validity of all constraints.To this aim, we use a standard Bayesian analysis complemented with a numerical sampling scheme, as described here.For the sake of simplicity, we denote the set of all conditional correlation coefficients of interest.In a Bayesian analysis, the information brought by the data is summarized by the posterior distribution of the parameters given the data, p(ρ|D).While this distribution cannot be expressed in closed form in the present case, it can easily be approximated using a standard result regarding the posterior distribution of the covariance matrix, p(Σ|D) (Section 2.6.1),together with a numerical sampling scheme (Section 2.6.2).From there, hypothesis testing can be performed at the individual, joint, or global level (Section 2.6.3).(7)

Posterior distribution of ρ
We start by numerically sampling Σ from its posterior distribution, Equation ( 7), which can easily be done (Gelman et al., 1998, Appendix A).From each sample Σ [l] , l = 1, . . ., L (e.g., L = 10 5 ) and each constraint C k , one can then compute the conditional correlation coefficient ρ associated with C k using the procedure detailed in Section 2.5 and leading to Equation ( 4).This procedure yields a sample from the posterior distribution of interest, p(ρ|D).

Hypothesis testing
We are now in position to provide a general test that uses the numerical sample from p(ρ|D) to test all (i.e., individual, joint, and global) constraints induced by a model in a similar fashion.This test is based on a (multivariate) normal approximation of the distributions of interest.A (multivariate) normal distribution is fully characterized by its mean and variance.In our case, approximations for the posterior mean and covariance matrix can readily be computed using their sample counterparts.Let ρ be the K-dimensional (1 ≤ K ≤ K) subvector of ρ that we wish to test, and c, respectively V , be the sample mean, respectively variance (or covariance matrix), of the numerical sample (ρ [l] ) approximating p(ρ|D).We then define deviance as It is a squared Mahalanobis distance which characterizes how ρ is close to c.If ρ were normal distributed, then d(ρ) would provide the contours of constant probability for ρ and would be chi-square distributed with K degrees of freedom.We finally define p as the probability that d(ρ) < d(0).p is a measure of how plausible the null hypothesis ρ = 0 is (Tanner, 1994;Kershaw et al., 1999;Marrelec et al., 2003).This probability could be computed using the chi-square approximation.Here, we rather rely on the numerical sample and approximate p as the fraction of samples for which d(ρ [l] ) is smaller than Finally, if a decision is required, a significance level α can be set (e.g., α = 0.05).All tests such that p < α are declared significant, and the corresponding null hypotheses are rejected.

Simulation study
We assessed the validity of our approach by applying it to synthetic data.The data generation process, the analysis, the evaluation scheme, and the main results are presented in Sections 3.1, 3.2, 3.3, and 3.4, respectively.

Data generation
We generated synthetic data using the TP and BF models introduced earlier (see Section 2.4).
For the model parameters (path coefficients and noise variances), we used the values that were inferred by Bullmore et al. (2000) from their data and are summarized in our Table 2.For each model, we generated 1000 sample covariance matrices, for a total of 1000 (samples per structural model) ×2 (models) = 2000 sample covariance matrices.

Analysis
For each sample covariance matrix, we tested the relevance of the constraints originating from both the theoretically preferred model and the best fit model.More specifically, for each sample covariance matrix and model, we tested the validity of the following constraints: • For the theoretically preferred model: 10 individual constraints, 3 joint constraints, and 1 global constraint; • For the best fit model: 5 individual constraints, 3 joint constraints, and 1 global constraint.
Each constraint was tested by computing the corresponding p as detailed in Section 2.6.

Evaluation
For each model used to generate the data (theoretically preferred model and best fit model) and each model providing the constraints tested (again, theoretically preferred model and best fit model), we computed two summaries: • the 5% percentile for p, i.e., the value p 5% for which we have p < p 5% for 5% of the 1 000 samples (i.e., for 50 values); • the fraction f 0.05 of tests declared significant at a significance level of α = 0.05.
When analyzing data generated according to a structural model with the constraints associated with the same model (i.e., testing constraints from the theoretically preferred model using data generated according to the theoretically preferred model, or testing constraints from the best fit model using data generated according to the best fit model), we expect to have p 5% ≈ 0.05 and f 0.05 ≈ 0.05.

Results
Results of simulation study can be found in Table 3.When testing the constraints of the theoretically preferred model on data generated using the theoretically preferred model, the values of p 5% and f 0.05 were both relatively close to the expected value of 0.05 for the individual constraints (range for p 5% : 0.029-0.052;range for f 0.05 : 0.029-0.060)and the joint constraints (range: 0.044-0.060),but quite different for the global constraint (p 5% = 0.252, f 0.05 = 0.004).
When testing the constraints associated with the theoretically preferred model with data generated using the best fit model, values where much lower for p 5% (range: < 0.001 to 0.038) and much larger for f 0.05 (range 0.072-0.941)for all types of constraints.By contrast, when testing the constraints associated with the best fit model, we found limited difference in results between data generated using the theoretically preferred model (range for p 5% : 0.0163-0.116;range for f 0.05 : 0.010-0.117,all constraint types) and data generated using the best fit model (range for p 5% : 0.028-0.148;range for f 0.05 : 0.005-0.072,all constraint types).

Experimental data
We finally used the experimental data provided in Bullmore et al. (2000) to reanalyze their study and provide new insight into the structural modeling.

Data
Each of the 5 regions mentioned in Section 2.4 was associated with a time course of length T = 96 time samples.The sample correlation matrix between the time series was given in Bullmore et al. (2000) and is reported in Table 4.

Analysis
We applied our procedure to the real data to infer in what measure they support the existence of the theoretically preferred model and/or the best fit model.In particular, while the theoretically preferred model and the best fit model differed both structurally (different set of arrows) and numerically (different path coefficients for arrows that are common to both models), Bullmore et al. (2000) concluded that the data did not contain enough evidence to enable one to discard the theoretically preferred model as being significantly different from the best fit model.We wanted to qualify this statement.

Results
See Figures 3 and 4 for an example of output from the approximate sampling scheme.The significance of the different constraints are reported in Table 5 and summarized in graphical form in Figure 5.While the global set of constraints could not be rejected for the theoretically preferred model (p = 0.171), some specific constraints (such as C 5 , C 8 and C 9 ; also C 3 close to significance), or sets of constraints (such as J 3 , corresponding to a lack of connection between PFC and IPL and J 4 , corresponding to a lack of connection between SMA and IPL), were found Table 3: Simulation study.Results of analysis: p 5% , the 5% percentile interval for p, as well as f 0.05 , the fraction of samples that were declared significant for a significance level of α = 0.05.

Discussion
In the present paper, we proposed a novel method to test the validity of a given model of structural equation.it extracts the set of all constraints of conditional independence induced by the absence of links between regions in the model and tests for their validity in a Bayesian framework, either individually (constraint by constraint), jointly (by gathering all constraints associated with a given missing link), or globally (all constraints associated with a structural model).We illustrated the approach on a dataset and two structural models.With a simulation study, we showed the power and limits of the method.Finally, we applied the method to the real data.This approach is unique in that it avoids several issues that are typical of usual SEM inference methods.First, it does not mislead the user into making incorrect conclusions regarding the causal pattern of the model, as observationally equivalent models will lead to the same conclusion.Second, it makes it possible to test constraints at the level desired by the user: either at the scale of a single constraint, a set of constraints corresponding to a missing link, and all constraints specific of the structural model.
A key point in the procedure is the determination of the constraints of conditional independence entailed by the structural model (Acid and de Campos, 2013).Efficient methods have been proposed to extract such constraints in a principled fashion.For acyclic graphs, a method was proposed by Pearl (1988).Another, more general approach was introduced by Shipley (2000b).Both approaches rely on the fact that the set of all constraints entailed by a structural model are not necessarily independent from one another, as some statements can be predicted from others.For instance, we could expect constraints involving the same pair of variables (i, j) to be highly correlated.The two methods mentioned above were developed to avoid this redundancy and to extract a (not necessarily unique) smallest subset, called basis, which still generates all existing relationships.We here departed from this approach and, instead, used brute force to extract and deal with all relationships of conditional independence.We reasoned that the redundancy contained in the set of all constraints could be beneficial to the inference process.Indeed, discarding some constraints is tantamount to discarding information about the model, which is not desired.Also, when dealing with real data, we can expect some constraints to be better respected than others in the data; which constraints are best to investigate a model is therefore the result of an interaction between model and data-i.e., it is itself an inference process, which we did not take the risk of separating from the main task.The application we introduced confirmed the strong redundancy between constraints related to the same missing link (e.g., for the theoretically preferred model: C 3 and C 4 , corresponding to the missing connection between PFC and IFG; C 5 , C 6 , and C 7 , corresponding to the missing connection between PFC and IPL), as observed by visual inspection of either the scatterplots (see Figure 4) or the correlation matrices (see Figure 6).The application also brought evidence in favor of keeping all constraints, as various relationships corresponding to the same missing link were found to have quite different significance levels.For instance, in the theoretically preferred model, C 5 , C 6 and C 7 , related to the same missing connection between PFC and IPL, had p-values ranging from 0.020 to 0.192.
This exhaustive way of extracting constraints may limit the generalization of the method to larger systems, as its scalability with respect to the number of regions remains to be investigated.In particular, we are not sure whether there exists an efficient algorithm that extracts all relationships of conditional independence from a given structural model.
Note that, in the process of extracting constraints of conditional independence, a missing link does not necessarily translate into a constraint.This is in particular true in cyclic graphs, i.e., in the presence of loops or feedbacks.For instance, the absence of a link between IFG and VE in the theoretically preferred model was not associated with a constraint.Similarly, in the best fit model, the missing link between SMA and IFG was not associated with a constraint as their unique parent PFC was also a descendant.
To assess the validity of our approach, we used a simulation study where we generated data according to the two models under investigation (theoretically preferred model and best fit model).Interestingly, this showed evidence of a differential behavior depending on the model being tested.Our method was able to discriminate synthetic data generated according to the theoretically preferred model from synthetic data generated from the best fit model in the case of constraints associated with the theoretically preferred model.By contrast, it was not able to do so in the case of constraints associated with the best fit model.The approach introduced in the present manuscript strongly relies on conditional correlation.Conditional correlation was introduced in fMRI functional connectivity analysis as a data-driven way to extract information from data that is closer to effective connectivity than pairwise correlation (Marrelec et al., 2005a, Marrelec et al., 2005b).A particular emphasis was put on partial correlation, a particular type of conditional correlation where the conditioning set is the rest of the variables (Marrelec et al., 2006(Marrelec et al., , 2007;;Marrelec and Benali, 2009;Marrelec et al., 2009;Smith et al., 2011).By contrast, the present method is clearly model-based, as it requires the knowledge of a potential model to start with.
For instance, consider PFC and IPL.In the theoretically preferred model, the absence of a link between both regions was deemed to be implausible in the face of the data (p = 0.017), while it was found to agree with the data based on the best fit model (p = 0.188).
The validity of using SEM in fMRI as well as its strengths and limits, and even the very possibility to access information relative to effective connectivity, are topics that are still discussed (Gonçalves et al., 2001;Gonçalves and Hall, 2003;Ramsey et al., 2010;Friston, 2011).Still, SEM is a classical way to deal with effective connectivity, and we reasoned that improving this method such that it avoids the pitfall of observationally equivalent models and provides for a finer view into local relationships could only be beneficial to the community of SEM users.We also hope that our contribution will broaden the usability of SEM and help the neuroscientists improve the causal information that they can extract from data.

2. 6 . 1
Posterior distribution of ΣAssuming a noninformative Jeffreys prior for the covariance matrix and letting m be the sample average, − m)(y n − m) t , the posterior distribution of Σ is inverse Wishart with N −1 degrees of freedom and scale matrix S(Gelman et al., 1998, Section 3

Figure 3 :
Figure 3: Experimental data.Results from inference corresponding to the three individual constraints relative to the missing link between PFC and IPL in the theoretically preferred model: C 5 (top), C 6 (middle) and C 7 (bottom).Left: histograms of the samples approximating the posterior distributions of the conditional correlation coefficients, and corresponding normal approximations.Right: histograms of the samples approximating the posterior distributions of the deviances obtained from Equation (8), and corresponding chi-square approximations.

Figure 4 :Figure 5 :
Figure 4: Experimental data.Results from inference corresponding to the joint constraints relative to the missing link between PFC and IFG (left) as well as between PFC and IPL (right) in the theoretically preferred model.Scatterplot of the conditional correlation coefficients associated with J 2 and J 3 .The color codes − log 10 d(ρ), thresholded at 3. The black dot stands for ρ = 0. Theoretically preferred model (TP) Best fit model (BF)

Figure 6 :
Figure 6: Experimental data.Value of correlation between conditional correlation coefficients approximated with the sampling scheme.

Table 1 :
Example of structural models.Review of missing connections and corresponding paths in the theoretically preferred (TP) model and the best fit (BF) model.

Table 2 :
Bullmore et al., 2000)eter values used for data generation: path coefficients for the theoretically preferred model (top) and the best fit model (middle), as well as residual variance (bottom) (values fromBullmore et al., 2000).

Table 4 :
Bullmore et al. (2000)le correlation matrix of the data set examined inBullmore et al. (2000)(lower triangular matrix) and corresponding sample partial correlation matrix (upper triangular matrix, italics).

Table 5 :
Experimental data.Result of inference: significance values for the different constraints entailed by the theoretically preferred model and the best fit model.Values lower than a threshold of α = 0.05 are emphasized in bold, while values larger but close to α are in italics.