Synergy and redundancy in the Granger causal analysis of dynamical networks

We analyze by means of Granger causality the effect of synergy and redundancy in the inference (from time series data) of the information flow between subsystems of a complex network. Whilst we show that fully conditioned Granger causality is not affected by synergy, the pairwise analysis fails to put in evidence synergetic effects. In cases when the number of samples is low, thus making the fully conditioned approach unfeasible, we show that partially conditioned Granger causality is an effective approach if the set of conditioning variables is properly chosen. We consider here two different strategies (based either on informational content for the candidate driver or on selecting the variables with highest pairwise influences) for partially conditioned Granger causality and show that depending on the data structure either one or the other might be valid. On the other hand, we observe that fully conditioned approaches do not work well in presence of redundancy, thus suggesting the strategy of separating the pairwise links in two subsets: those corresponding to indirect connections of the fully conditioned Granger causality (which should thus be excluded) and links that can be ascribed to redundancy effects and, together with the results from the fully connected approach, provide a better description of the causality pattern in presence of redundancy. We finally apply these methods to two different real datasets. First, analyzing electrophysiological data from an epileptic brain, we show that synergetic effects are dominant just before seizure occurrences. Second, our analysis applied to gene expression time series from HeLa culture shows that the underlying regulatory networks are characterized by both redundancy and synergy.


INTRODUCTION
Living organisms can be modeled as an ensemble of complex physiological systems, each with its own regulatory mechanism and all continuously interacting between them [1].Therefore the problem of the interactions inference within and between these modules is a crucial issue.Over the last years the interaction structure of many complex systems has been mapped in terms of networks, which have been succesfully addressed using tools from statistical physics [2].Dynamical networks have modeled physiological behavior in many applications; examples range from networks of neurons [3], genetic networks [4], protein interaction nets [5] and metabolic networks [6][7][8].
The inference of dynamical networks from time series data is related to the estimation of the information flow between variables.Granger causality (GC) [9] has emerged as a major tool to address this issue.This approach is based on prediction: if the prediction error of the first time series is reduced by including measurements from the second one in the linear regression model, then the second time series is said to have a Granger causal influence on the first one.It has been shown that GC is equivalent to transfer entropy [10] in the Gaussian approximation [11].
The pairwise Granger analysis (PWGC) consists in assessing GC between each pair of variables, independently of the rest of the system.It is well know that the pairwise analysis cannot distinguish between direct and indirect interactions among variables.The most straightforward extension, the conditioning approach [15], removes indirect causalities among variables which are included in the data-set by looking at how the predictive power of the driver on the target decreases when the conditioning variable is removed.In situations where the number of samples is limited and a fully conditioned approach is unfeasible, a partially conditioned approach is a workable alternative, consisting in conditioning on a small number of variables, chosen as the most informative ones for the driver node; this approach leads to results very close to those obtained with a fully conditioned analysis and even more accurate in the presence of a small number of samples [16][17][18].We remark that the use of partially conditioned Granger causality (PCGC) may be useful also in non-stationary conditions, where the GC pattern is to be estimated on short time windows.
Sometimes though a fully conditioned (CGC) approach can encounter conceptual limitations, on top of the practical and computational ones: in the presence of redundant variables the application of the standard analysis leads to underestimation of influences [31].Redundancy in a multiplet of variables may arise if some channels are all influenced by another signal that is included in the regression; another source of redundancy may be synchronization phenomena in a group of variables, without the need of an external influence.Redundancy manifests itself through an high degree of correlation in a group of variables, both for instantaneous and lagged influences.Several approaches have been proposed in order to reduce dimensionality in multivariate sets eliminating redundancy, relying on generalized variance [20], principal components analysis [21], or the Granger causality itself [22].
A complementary concept to redundancy is synergy.The presence of synergetic effects is well known in sociological and psychological modeling, where suppressors are the name given to variables that increase the predictive validity of another variable after its inclusion into a linear regression equation [38], see [39] for examples of easily explainable suppressor variables in multiple regression research.In [12] an expansion of the information flow has been proposed to put in evidence redundant and synergetic multiplets of variables.Other information-based approach have also addressed the issue of collective influence [13,14].The purpose of this paper is to provide evidence that in addition to the problem related to indirect influence, PWGC shows another relevant pitfall: it fails to detect synergetic effects in the information flow, in order words it does not account for the presence of subsets of variables that provide some information about the future of a given target when all the variables are used in the regression model.We remark that since it processes the system as a whole, CGC evidences synergetic effects; when the number of samples is low, also PCGC can detect synergetic effects provided an adequate selection for the conditioning variables.
The paper is organized as follows.In the next section we briefly recall the concepts of GC and PCGC.In section III we describe some toy systems illustrating how redundancy can affect the results by CGC, whilst indirect interactions and synergy are the main problems inherent to PWGC.In section IV we provide evidence of synergetic effects in epilepsy: we analyze electroencephalographic recordings from an epileptic patient corresponding to ten seconds before the seizure onset; we show that the putative seizure onset corresponds to two contacts acting as synergetic variables driving the rest of the system.The pattern of information transfer drives to the actual seizure onset only when synergy is correctly considered.
In section V we propose an approach that combines PWGC and GCC to evidence the pairwise influences due only to redundancy.The conditioned GC pattern is used to partition the pairwise links in two sets: those which can be justified as indirect influences between the measured variables, according to CGC, and those which are not explained as indirect relationships.The unexplained pairwise links, presumably due to redundancy, are thus retained to complement the information transfer pattern discovered by CGC.In cases where the number of samples is so low that a fully multivariate approach is unfeasible, PCGC may be applied.We also address here the issue of variables selection for PCGC; our concluding strategy is the following: for each target variable, the selected variables are those sending the highest amount of information to that node as revealed by a pairwise analysis.This new selection strategy works more efficient when the interaction graph has a tree structure; in presence of redundancy the selection based on the mutual information with the candidate driver provides results closer to those obtained by CGC.We finally apply the proposed approach on time series of gene expressions, extracted from a data-set from the HeLa culture.Section VI summarizes our conclusions.

GRANGER CAUSALITY
Granger causality is a powerful and widespread data-driven approach to determine whether and how two time series exert direct dynamical influences on each other [23].A convenient nonlinear generalization of GC has been implemented through a kernel algorithm which embeds data into a Hilbert space, and searches for linear Granger causality in that space [31]; the embedding is performed implicitly, by specifying the inner product between pairs of points [25], and a statistical procedure is used to avoid over-fitting.
Quantitatively, let us consider n time series {x α (t)} α=1,...,n ; the lagged state vectors are denoted m being the window length.Let ǫ (x α |X) be the mean squared error prediction of x α on the basis of all the vectors X (corresponding to the kernel approach described in [26]).The multivariate Granger causality index δ mv (β → α) is defined as follows: consider the prediction of x α on the basis of all the variables but X β and the the prediction of x α using all the variables, then the GC is the (normalized) variation of the error in the two conditions, i.e.
In [27] it has been shown that not all the kernels are suitable to estimate GC.Two important classes of kernels which can be used to construct nonlinear GC measures are the inhomogeneous polynomial kernel (whose features are all the monomials in the input variables up to the p-th degree; p = 1 corresponds to linear Granger causality) and the Gaussian kernel.
The pairwise Granger causality is given by: The conditioned Granger causality is defined as follows.Let Y be the variables in X, excluding X α and X β , then When Y is only a subset of the total number of variables in X not containing X α and X β , we have the Partially Conditioned Granger Causality.In [16] the set Y is chosen as the most informative for X β .Here we will also consider an alternative strategy: fixed a small number k, we select Y = {X γ } as the k variables with the maximal pairwise GC δ bv (γ → α) w.r.t. that target node, excluding X β .

EXAMPLES
In this section we provide some typical examples to remark possible problems that pairwise and fully conditioned analysis may encounter.

Indirect GC among measured variables
We consider here the following lattice of ten unidirectionally coupled logistic maps, with and with i = 1, . . ., 10.The transfer function is given by f (x) = 1 − 1.8x 2 .In this system the first map is evolving autonomously, whilst the other maps are influenced by the previous one with coupling ρ, thus forming a cascade of interactions.In figure 1a we plot as a function of ρ the number of GC interactions found by PWGC and CGC, using the method described in [31] with the inhomogeneous polynomial kernel of degree two.The CGC output is the correct one (nine links) whilst the PWGC output also accounts for indirect causalities and therefore fails to provide the underlying network of interactions.On this example we have also tested PCGC, see figure 1b.We considered just one conditioning variable, chosen according to the two strategies described above.Firstly we consider the most informative w.r.t. the candidate driver, as described in [16]; we call this strategy information based (IB).Secondly, we choose the variable with maximal bivariate causality to the target node, a pairwise based (PB) rule.The PB strategy yields the correct result in this example, whilst the IB one fails when only one conditioning variable is used and requires more than one conditioning variables to provide the correct output.This occurrence is due to the tree topology of the interactions in this example, which favors PB selecting by construction the parents of each node.

Redundancy due to a hidden source
We show here how redundancy constitutes a problem for CGC.Let h(t) be a zero mean and unit variance hidden Gaussian variable, influencing n variables x i (t) = h(t − 1) + sη i (t), where {η} are unit variance Gaussian noise and s controls the noise level.Let w(t) = h(t − 2) + sρ 0 (t) be another variable who is influenced by h but with a larger delay.In figure 1c we depict both the linear PWGC and the linear CGC from one of the x's to w (note that h is not used in the regression model).As n increases, the conditioned GC vanishes as a consequence of redundancy.The GC relation which is found in the pairwise analysis is not justified on the basis of the conditioned output, hence revealing the presence of the hidden source.The CGC pattern alone is not a good description of the information flow in this system.
Considering the influence 3 → 4, the CGC reveals that 3 is influencing 4, whilst PWGC fails to detect this causal relationship, see figure 1d, where we use the method described in [31] with the inhomogeneous polynomial kernel of degree two.This example shows that PWGC fails to detect synergetic contributions.

SYNERGETIC EFFECTS IN EPILEPTIC BRAIN
As a real example we consider intracranial EEG recordings from a patient with drug-resistant epilepsy with an implanted array of 8 × 8 cortical electrodes (CE) and two depth electrodes (DE) with six contacts each.The data are available in [32] and further details on the dataset is given in [33].Data were sampled at 400 Hz.We will consider here a dataset recorded in the preictal period, during 10 seconds preceding the seizure onset.To handle this data, we used linear Granger causality with m equal to five.In figure 2 we depict the PWGC between DEs (panel a), from DEs to CEs (panel b), between CEs (panel c) and from CEs to DEs (panel d).We note a complex pattern of bivariate interactions among CEs, whilst the first DE seems to be the subcortical drive to the cortex.We remark that there is no PWGC from the last two contacts of the second DE (channels 11 and 12) to CEs and neither to the contacts of the first DE.In figure 3 we depict the CGC among DEs (panel a), from DEs to CEs (panel b), among CEs (panel c) and from CEs to DEs (panel d).The scenario in the conditioned case is clear: the contacts 11 and 12, from the second DE, are the drivers both for the cortex and for the subcortical region associated to the first DE.These two contacts can be then associated to the seizure onset zone (SOZ).The high pairwise GC strength among CEs is due to redundancy, as they are all driven by the same subcortical source.Since the contact 12 is also driving the contact 11, see figure 3a, we conclude that the contact 12 is the closest to the SOZ, and that the contact 11 is a suppressor variable for it, because it is necessary to include it in the regression model to put in evidence the influence of 12 on the rest of the system.We stress that the influence from contact 12 to the rest of the system emerges only in the CGC and it is neglected by PWGC: these variables are synergetically influencing the dynamics of the system.To our knowledge this is the first time that synergetic effects are found in relation with epilepsy.
On this data we also apply PCGC using one conditioning variable.The results are depicted in figure 4: using the IB strategy we obtain a pattern very close to the one from CGC, while this is not the case of PB.These results suggest that IB works better in presence of redundancy.

COMBINING PAIRWISE AND CONDITIONED GRANGER CAUSALITY PATTERNS.
In the last sections we have shown that the drawback of CGC has a the poor performance in presence of redundancy.On the other hand, that some information about redundancy may be obtained from the PWGC pattern.We develop here a strategy to combine the two approaches: to find the pairwise links that are consistent with the CGC pattern, explained as indirect interactions of CGC.The remaining pairwise links (those which are not consistent with CGC) are retained for further consideration, and their disappearance from the fully conditioned pattern is ascribed to a confounding effect of redundancy (too high level of equal time correlation in multiplets of variables).
Let ∆ be the matrix from CGC (or PCGC).Let ∆ * be the matrix from PWGC.Let these matrices be evaluated using a model of order m.
The matrix contains paths of length α + β with delays in the range [−βm + α, . . ., −β + αm].We define M α,β to be compatible with a pairwise influence if this is equivalent to The elements of the matrix M αβ describe a situation where two nodes receive a common input from a third node which is α steps backward in time from one node, and β steps backward in time from the other node.The circuit corresponding to M 2,1 is represented in figure 5a.The matrix F α = ∆ α , with α ≥ 1, contains paths of length α with delays in the range [α, . . ., αm].F α is compatible with a pairwise GC if α ≤ m.The elements of the matrix F α describe a situation where a node sends information to another node through a cascade of α links (the circuit corresponding to F 2 is depicted in figure 5b.
Let us now consider the matrix where the first sum is over compatible matrices.If B ij is non vanishing, there exists at least one information flow path from i to j which is compatible with a pairwise GC i → j.Each term in B corresponds to a an informational circuit which may give rise to an indirect link.
In the pairwise matrix ∆ * we set to zero all the elements such that B ij > 0 (pruning).The resulting matrix ∆ * contains links which are not explained by the multivariate pattern, and thus should be ascribed to redundancy effects.
For m = 1 we have that the only terms in the first sum are those with α = β + 1, so the first non trivial terms are For m = 2, the simplest terms are: Since, due to the finite number of samples, a mediated interaction is more unlikely to be detected (by the pairwise analysis) if it corresponds to a long path, we limit the sum in the matrix B to the simplest terms, otherwise too many pairwise interactions would be explained and thus pulled out of consideration.
APPLICATION TO GENE EXPRESSION DATA.
The following example shows how the proposed approach works.HeLa [28] is a famous cell culture, isolated from a human uterine cervical carcinoma in 1951.HeLa cells have acquired cellular immortality, in that the normal mechanisms of programmed cell death after a certain number of divisions have somehow been switched off.We consider the HeLa cell gene expression data of [29].Data corresponds to 94 genes and 48 time points, with an hour interval separating two successive readings (the HeLa cell cycle lasts 16 hours).The 94 genes were selected from the full data set described in [30], on the basis of the association with cell cycle regulation and tumor development.We apply linear PWGC and linear CGC (using just another conditioning variable, and using both the selection strategies IB and PB described in Section III).We remark that the CGC approach is unfeasible in this case due to the limited number of samples.Varying the threshold for correlations, the number of links retrieved by both the pariwise and the partially conditioned approach increase as shown in figure 6.A thershold value has to be chosen e.g., as the one which corresponds to the expected number of GC relationships in the system.PCGC leads to a relevant number of interactions which are not detected by the pairwise approach: this is a clear signature of synergy.The low number of samples here allowed us just to use one conditioning variable, and therefore to analyze only circuits of three variables; a closely related analysis, see [37], has been proposed to study how a gene modulates the interaction between two other genes.
Moreover, using the matrix B 1 to detect the indirect links, we plot in figure 6c the percentage of bivariate links that are consistent with PCGC.These links correspond to a circuit like the one described in figure 5a; the remaining bivariate links may be due to redundancy effects and thus deserve further consideration, their number is plotted in figure 6d.In this example, the two strategies gave similar results and because the CGC is not available, it is not possible to determine which strategy yields an output resembling the CGC one.On the other hand, as far as the search for synergetic effects is concerned, we observe that the synergetic interactions found by PCGC with the two strategies are not coinciding, indeed only 10% of all the synergetic interaction are found by both strategies.This suggests that in order to search for suppressors, several sets of conditioning variables should be used in CGC in order to explore more possible alternative pathways, especially when there is not a priori information.

FIG. 1 :
FIG. 1: (a) The average number of links as a function of the coupling ρ, over 100 runs, retrieved by BVGC and FMGC on the coupled map lattice described in the text.(b) On the coupled map lattice, the average error in the reconstructed causality pattern by PCGC, obtained by the IB strategy and by BP, is plotted versus the coupling ρ.Errors are averaged over 100 runs.(c) For the example dealing with redundancy, FMGC and BVGC are plotted versus the number of variables.(d) For the example dealing with sinergy, FMGC and BVGC are plotted versus the coupling ρ.