Granger Causality Network Reconstruction of Conductance-Based Integrate-and-Fire Neuronal Systems

Reconstruction of anatomical connectivity from the measured dynamical activities of coupled neurons is one of the fundamental issues in understanding the structure-function relationship of neuronal circuitry. Many approaches have been developed to address this issue based on either electrical or metabolic data observed in experiments. The Granger causality (GC) analysis remains one of the major approaches for exploring the dynamical causal connectivity among individual neurons or neuronal populations. However, it is yet to be clarified how such causal connectivity, i.e., the GC connectivity, can be mapped to the underlying anatomical connectivity in neuronal networks. We perform the GC analysis on conductance-based integrate-and-fire (IF) neuronal networks to obtain their causal connectivity. Through numerical experiments, we find that the underlying synaptic connectivity among individual neurons or subnetworks can be successfully reconstructed from the GC connectivity constructed from voltage time series. Furthermore, this reconstruction is insensitive to dynamical regimes and can be achieved without perturbing the system or requiring prior knowledge of neuronal model parameters. Surprisingly, the synaptic connectivity can even be reconstructed merely from the raster of the system, i.e., the spike timings of neurons. Using spike-triggered correlation techniques, we establish a direct mapping between the causal connectivity and the synaptic connectivity for conductance-based IF neuronal networks, and show that the GC is quadratically related to the coupling strength. The theoretical approach we develop here may provide a framework for examining the validity of the GC analysis in other settings.

In the theoretical definition of GC, one assumes that the data length of the time series and the regression order in the autoregressive processes are both infinite [1]. In practice, however, measured time series in experiments have only finite length. With finite-length data, several important issues have to be addressed. The first is how to choose a correct regression order given that the total length of data is finite; it has been shown that an improper choice of regression order can cause large errors in linear regression models [2,3]. The second is to determine whether a computed causal influence between time series is statistically significant or merely arises from the statistical error introduced by finite-length data and finite regression orders in the numerical computation of GC.
In the following, we discuss the methodology to address the above issues, along with the numerical algorithm for evaluating GC from a given set of empirical data. Consider $N$ time series collected in the vector $X_t = [x_1(t), x_2(t), \cdots, x_N(t)]^T$, each component of which has zero mean. Assuming that $X_t$ is stationary, for a given regression order $m$, we can construct the following regressive equations [2,3]
$$X_t = \sum_{i=1}^{m} A_i X_{t-i} + E^{(m)}(t), \qquad (1)$$
where $A_i$ are $N \times N$ regression-coefficient matrices ($i = 1, 2, \cdots, m$) and $E^{(m)}(t)$ is a vector of residuals with zero mean. The covariance matrix of $E^{(m)}(t)$, denoted by $\Sigma_m$, is given by
$$\Sigma_m = \langle E^{(m)}(t)\, E^{(m)}(t)^T \rangle. \qquad (2)$$
Note that the correlation between $E^{(m)}(t)$ and $X_{t-k}$ (for any $1 \le k \le m$) is zero. Therefore, by multiplying $X_{t-k}^T$ ($k = 1, 2, \cdots, m$) on both sides of Eqs. (1) and then taking expectations, we obtain the Yule-Walker equations [2,3]
$$C_{-k} = \sum_{i=1}^{m} A_i C_{i-k}, \qquad k = 1, 2, \cdots, m, \qquad (3)$$
where $C_\tau = \langle X_t X_{t+\tau}^T \rangle$ is the covariance matrix of $X_t$ at lag $\tau$ and $C_{-\tau} = C_\tau^T$ for $\tau \ge 0$.
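As an illustrative numerical sketch (a Python/NumPy fragment of our own, not part of the original derivation; all variable names are hypothetical), the lag covariance matrices $C_\tau$ can be estimated from a zero-mean $N \times L$ data array as follows:

```python
import numpy as np

def lag_cov(X, tau):
    """Estimate C_tau = <X_t X_{t+tau}^T> from a zero-mean N x L array X."""
    N, L = X.shape
    if tau < 0:
        return lag_cov(X, -tau).T          # C_{-tau} = C_tau^T
    return X[:, :L - tau] @ X[:, tau:].T / (L - tau)

# toy data: two coupled AR(1) series with unit-variance driving noise
rng = np.random.default_rng(0)
L = 10000
x = np.zeros((2, L))
for t in range(1, L):
    x[0, t] = 0.5 * x[0, t - 1] + rng.standard_normal()
    x[1, t] = 0.4 * x[1, t - 1] + 0.2 * x[0, t - 1] + rng.standard_normal()
x -= x.mean(axis=1, keepdims=True)         # enforce zero mean

C0 = lag_cov(x, 0)
C1 = lag_cov(x, 1)
```

Negative lags are obtained by transposition, which mirrors the identity $C_{-\tau} = C_\tau^T$ used above.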
To obtain the regression coefficients $A_1, A_2, \cdots, A_m$, we rewrite Eqs. (3), using $C_{-k} = C_k^T$, in the transposed form
$$\sum_{i=1}^{m} C_{k-i}\, A_i^T = C_k, \qquad k = 1, 2, \cdots, m. \qquad (4)$$
For simplicity, we denote the symmetric coefficient matrix of linear equations (4) by
$$\Omega_{m\times m} = \begin{bmatrix} C_0 & C_{-1} & \cdots & C_{1-m} \\ C_1 & C_0 & \cdots & C_{2-m} \\ \vdots & \vdots & \ddots & \vdots \\ C_{m-1} & C_{m-2} & \cdots & C_0 \end{bmatrix}, \qquad (5)$$
the stacked regression-coefficient matrices by
$$\Lambda_m = [A_1^T, A_2^T, \cdots, A_m^T]^T, \qquad (6)$$
and the right-hand side of Eqs. (4) by
$$B_m = [C_1^T, C_2^T, \cdots, C_m^T]^T. \qquad (7)$$
Equations (4) then become $\Omega_{m\times m} \Lambda_m = B_m$. Note that (i) Eqs. (4) contain a total of $mN^2$ unknown regression coefficients in $\Lambda_m$ and exactly the same number of linear equations; (ii) the coefficient matrix $\Omega_{m\times m}$ in linear equations (4) should, in general, be positive definite when the data length $L$ is infinite.
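The block structure of linear equations (4) can be sketched as follows (an illustrative Python/NumPy fragment, assuming the lag covariances $C_\tau$ are available in a mapping keyed by $\tau$; names are hypothetical):

```python
import numpy as np

def yule_walker_solve(C, m, N):
    """Solve Omega_{m x m} Lambda_m = B_m for the VAR coefficients A_1..A_m.
    C is a mapping tau -> C_tau covering tau = -(m-1)..m; the (j, k) block
    of Omega is C_{j-k}, Lambda stacks A_i^T, and B stacks C_1..C_m."""
    Omega = np.block([[C[j - k] for k in range(m)] for j in range(m)])
    B = np.vstack([C[k] for k in range(1, m + 1)])
    Lam = np.linalg.solve(Omega, B)
    return [Lam[i * N:(i + 1) * N, :].T for i in range(m)]

# sanity check on a VAR(1) process with known coefficient matrix A
A = np.array([[0.5, 0.0], [0.2, 0.4]])
# stationary covariance C_0 solves C_0 = A C_0 A^T + I for unit noise
vecC0 = np.linalg.solve(np.eye(4) - np.kron(A, A), np.eye(2).ravel())
C0 = vecC0.reshape(2, 2)
C = {0: C0, 1: C0 @ A.T, -1: A @ C0}       # C_1 = C_0 A^T, C_{-1} = C_1^T
A_hat = yule_walker_solve(C, m=1, N=2)[0]
```

With exact covariances, the solver recovers the coefficient matrix exactly; with estimated covariances it yields the least-squares VAR fit.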

Construction of the positive definite matrix $\Omega_{m\times m}$
As is well known, the properties of the coefficient matrix, e.g., symmetry and positive definiteness, play an important role in determining the robustness of numerical solutions to linear equations [4,5]. Mathematically, the coefficient matrix $\Omega_{m\times m}$ in Eqs. (4) is positive definite when the length of time series $L$ is infinite. In numerical computations, however, if we first compute each covariance matrix $C_k$, $k = 0, 1, \cdots, m-1$, and then assemble the coefficient matrix $\Omega_{m\times m}$ as in Eqs. (4) and (5), then, due to statistical effects caused by the finite data length, $\Omega_{m\times m}$ is not guaranteed to be positive definite. Therefore, when solving linear equations (4) for the regression coefficients $\Lambda_m$, one may obtain an incorrect inference of the causal connectivity between neurons.
To numerically construct $\Omega_{m\times m}$ in a way that preserves positive definiteness, we define $Y_t = [X_{t+m-1}^T, X_{t+m-2}^T, \cdots, X_t^T]^T$, which is an $mN$-dimensional vector. The coefficient matrix $\Omega_{m\times m}$ can then be obtained by
$$\Omega_{m\times m} = \langle Y_t Y_t^T \rangle. \qquad (8)$$
In our cases, numerical simulations have shown that the coefficient matrix $\Omega_{m\times m}$ constructed as in Eq. (8) is always positive definite when the length of time series $L$ is finite ($L > mN$). Equations (4) can then be solved by many robust numerical algorithms, such as the conjugate gradient method, the Cholesky decomposition method, or the successive over-relaxation method [4,5,6].
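A sketch of this construction (illustrative Python/NumPy, not the authors' code): $\Omega_{m\times m}$ is built directly as a sample average of $Y_t Y_t^T$, which makes it a Gram matrix and hence positive semidefinite by construction, and the system is then solved by Cholesky decomposition:

```python
import numpy as np

def fit_var(X, m):
    """Fit a VAR(m) model to a zero-mean N x L array X by forming
    Omega = <Y_t Y_t^T> from stacked lag vectors, a Gram matrix that is
    positive (semi)definite by construction, then solving by Cholesky
    decomposition.  Returns the coefficients A_1..A_m and Sigma_m."""
    N, L = X.shape
    T = L - m                                   # usable samples
    # rows of Y stack [X_{t+m-1}; ...; X_t]; block i holds lag i+1
    Y = np.vstack([X[:, m - 1 - i: m - 1 - i + T] for i in range(m)])
    Xf = X[:, m:]                               # regression targets
    Omega = Y @ Y.T / T
    B = Y @ Xf.T / T
    cf = np.linalg.cholesky(Omega)              # fails unless Omega is PD
    Lam = np.linalg.solve(cf.T, np.linalg.solve(cf, B))
    A = [Lam[i * N:(i + 1) * N, :].T for i in range(m)]
    E = Xf - sum(A[i] @ X[:, m - 1 - i: m - 1 - i + T] for i in range(m))
    return A, E @ E.T / T                       # coefficients, Sigma_m

# recover a known VAR(1) from simulated data using an over-large order m=2
rng = np.random.default_rng(1)
A_true = np.array([[0.5, 0.0], [0.2, 0.4]])
L = 20000
x = np.zeros((2, L))
for t in range(1, L):
    x[:, t] = A_true @ x[:, t - 1] + rng.standard_normal(2)
x -= x.mean(axis=1, keepdims=True)
A_est, Sigma = fit_var(x, m=2)
```

For the toy VAR(1) data, the lag-1 coefficient block approximates the true matrix, the superfluous lag-2 block is near zero, and the residual covariance approximates the identity of the unit driving noise.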

Regression order
To obtain a good estimate of GC, it is important to determine a correct regression order $m$ in linear regression models [7]. Several criteria based on information theory have been proposed [8,9]. One is the Akaike Information Criterion (AIC), in which the regression order is chosen by minimizing the AIC function with respect to $m$,
$$\mathrm{AIC}(m) = \ln \det(\Sigma_m) + \frac{2 m N^2}{L}, \qquad (9)$$
where $\det(\Sigma_m)$ is the determinant of the covariance matrix $\Sigma_m$ defined in Eq. (2). The other is the Bayesian Information Criterion (BIC), in which $m$ is chosen by minimizing the BIC function
$$\mathrm{BIC}(m) = \ln \det(\Sigma_m) + \frac{m N^2 \ln L}{L}. \qquad (10)$$
It has been shown that the AIC function sometimes fails to achieve a minimum when the total number of data points $L$ is large, whereas the BIC criterion compensates for the large number of data points and usually performs better in neural applications [1]. In our study, both the AIC and BIC criteria have been tested to determine the regression order $m$; the order determined by the AIC criterion is usually larger than that obtained by the BIC criterion. However, the final inference of causal interactions between time series is the same for both criteria.
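An order-selection loop along these lines might look as follows (illustrative Python/NumPy sketch; we use common normalized forms of the AIC and BIC functions, whose exact normalization varies across references):

```python
import numpy as np

def residual_cov(X, m):
    """Residual covariance Sigma_m of a least-squares VAR(m) fit."""
    N, L = X.shape
    T = L - m
    Y = np.vstack([X[:, m - 1 - i: m - 1 - i + T] for i in range(m)])
    Xf = X[:, m:]
    Lam, *_ = np.linalg.lstsq(Y.T, Xf.T, rcond=None)
    E = Xf - Lam.T @ Y
    return E @ E.T / T

def select_order(X, m_max):
    """Return the orders minimizing the AIC and BIC functions."""
    N, L = X.shape
    aic, bic = [], []
    for m in range(1, m_max + 1):
        ld = np.log(np.linalg.det(residual_cov(X, m)))
        aic.append(ld + 2 * m * N**2 / L)
        bic.append(ld + m * N**2 * np.log(L) / L)
    return 1 + int(np.argmin(aic)), 1 + int(np.argmin(bic))

# a VAR(2) test signal: both criteria should detect the lag-2 structure
rng = np.random.default_rng(3)
L = 5000
x = np.zeros((2, L))
for t in range(2, L):
    x[0, t] = 0.5 * x[0, t - 1] - 0.4 * x[0, t - 2] + rng.standard_normal()
    x[1, t] = 0.3 * x[1, t - 1] + 0.2 * x[0, t - 2] + rng.standard_normal()
x -= x.mean(axis=1, keepdims=True)
m_aic, m_bic = select_order(x, m_max=8)
```

Because the BIC penalty per order exceeds the AIC penalty, the order selected by AIC is never smaller than that selected by BIC, consistent with the observation above.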

Statistical significance
In numerical computations, due to the finite data length $L$ and finite regression order $m$, we can only obtain an estimate of GC, denoted by $\hat{F}$. Therefore, we should perform significance tests to determine whether a calculated nonzero $\hat{F}$ is statistically significant or merely arises from statistical error. Under the null hypothesis that there is no causal influence from $x_2$ to $x_1$, conventional large-sample distribution theory [10,2,3] shows that $\hat{F}_{x_2 \to x_1}$ is asymptotically $\chi^2$ distributed as
$$L \hat{F}_{x_2 \to x_1} \sim \chi^2(m). \qquad (11)$$
We can then obtain a threshold for the GC magnitude from $x_2$ to $x_1$ (similarly for the GC $\hat{F}_{x_1 \to x_2}$) through a $p$-value test [11,2,3]. In most of our cases, $p$ is chosen to be 0.001 in numerical simulations.
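For the bivariate case, the estimate $\hat{F}$ and its $\chi^2$ significance test can be sketched as follows (illustrative Python sketch using NumPy and SciPy; here GC is computed as the log-ratio of the restricted to the full residual sums of squares, and the effective sample length is used in place of $L$):

```python
import numpy as np
from scipy import stats

def gc_pair(x1, x2, m, p_level=0.001):
    """Estimated GC from x2 to x1 at order m, its p-value, and the
    GC threshold from the chi^2(m) null distribution (bivariate sketch)."""
    T = len(x1) - m                              # effective sample length
    own = np.vstack([x1[m - 1 - i: m - 1 - i + T] for i in range(m)])
    other = np.vstack([x2[m - 1 - i: m - 1 - i + T] for i in range(m)])
    y = x1[m:]
    def rss(D):                                  # residual sum of squares
        coef, *_ = np.linalg.lstsq(D.T, y, rcond=None)
        r = y - D.T @ coef
        return r @ r
    F_hat = np.log(rss(own) / rss(np.vstack([own, other])))
    p = stats.chi2.sf(T * F_hat, df=m)           # T * F_hat ~ chi^2(m) under H0
    threshold = stats.chi2.isf(p_level, df=m) / T
    return F_hat, p, threshold

# x2 drives x1; the reverse influence is absent
rng = np.random.default_rng(2)
L = 5000
x1, x2 = np.zeros(L), np.zeros(L)
for t in range(1, L):
    x2[t] = 0.5 * x2[t - 1] + rng.standard_normal()
    x1[t] = 0.3 * x1[t - 1] + 0.3 * x2[t - 1] + rng.standard_normal()
x1 -= x1.mean(); x2 -= x2.mean()
F_fwd, p_fwd, thr = gc_pair(x1, x2, m=3)         # x2 -> x1: present
F_rev, p_rev, _ = gc_pair(x2, x1, m=3)           # x1 -> x2: absent
```

On the toy data the driven direction exceeds the $p = 0.001$ threshold while the absent direction produces a much smaller GC value.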
In addition, one may wish to obtain a confidence interval for the theoretically defined but unknown GC, denoted by $F$ [11,12]. The statistical significance can be similarly analyzed by using the $\chi^2$ statistic, which is asymptotically monotonically related to the $F$-statistic [11,13]. For ease of discussion, we consider the significance test for two time series $x_1(t)$ and $x_2(t)$; this test can be straightforwardly extended to the multivariate case.
Using these approximations, one can obtain an approximate 95% confidence interval for $F_{x_1 \to x_2}$ and $F_{x_2 \to x_1}$, expressed through two functions $\phi_l(\cdot)$ and $\phi_r(\cdot)$ that map the estimated GC to the left and right endpoints of the interval.

Minimal data length for GC reconstruction
Here, we discuss the minimal data length needed for a successful GC reconstruction. As an illustration, we study the two-neuron network shown in Fig. 1A. From conventional large-sample distribution theory [2,3,10], the numerically computed GC values $\hat{F}_{x_1 \to x_2}$ and $\hat{F}_{x_2 \to x_1}$ are asymptotically $\chi^2$ distributed as shown in Eq. (11), and we denote their probability density functions by $\rho_1(x)$ (with causal influence) and $\rho_0(x)$ (without causal influence), respectively. As discussed in the main text, we can determine a GC threshold $F_T$ by performing the $p$-value test. The probability of correct topology reconstruction for this network is therefore
$$P[\mathrm{correct}] = \int_{F_T}^{\infty} \rho_1(x)\, dx \int_{0}^{F_T} \rho_0(x)\, dx. \qquad (12)$$
Note that the distributions $\rho_1(x)$ and $\rho_0(x)$ depend on the theoretical GC values $F_{x_1 \to x_2}$ and $F_{x_2 \to x_1}$, respectively. As discussed above, $F_{x_1 \to x_2}$ and $F_{x_2 \to x_1}$ are theoretically defined using time series of infinite length. Therefore, we compute GC values from sufficiently long time series ($\sim 200$ min) until our numerical results converge and well approximate the theoretical GC values. For a given level of correctness $P[\mathrm{correct}]$ (e.g., 90%), Eq. (12) is an implicit function of the data length [see Eq. (11)], and we can solve it using an iterative method to obtain the minimal data length $L_{\min}$ for GC reconstruction, as shown in Fig. S8.
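Under the same large-sample theory, $L\hat{F}$ is approximately $\chi^2(m)$ distributed in the absence of coupling and noncentral $\chi^2(m, L F)$ distributed in its presence, so $P[\mathrm{correct}]$ can be evaluated and inverted for $L$ numerically. The following is an illustrative Python/SciPy sketch (the noncentral-$\chi^2$ form of $\rho_1$, the value of $F$, and all names are our assumptions, not taken from the text):

```python
import numpy as np
from scipy import stats

def p_correct(L, F_true, m, p_level=0.001):
    """Probability of correct two-neuron topology reconstruction with L
    samples: the existing connection must exceed the GC threshold while
    the absent one stays below it.  L*F_hat is taken as chi^2(m) without
    coupling and noncentral chi^2(m, L*F_true) with coupling."""
    thr = stats.chi2.isf(p_level, df=m)          # threshold on L * F_hat
    detect = stats.ncx2.sf(thr, df=m, nc=L * F_true)
    reject = stats.chi2.cdf(thr, df=m)           # = 1 - p_level
    return detect * reject

def minimal_length(F_true, m, target=0.9):
    """Smallest L with P[correct] >= target (doubling, then bisection);
    p_correct is monotonically increasing in L."""
    lo, hi = 1, 2
    while p_correct(hi, F_true, m) < target:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if p_correct(mid, F_true, m) >= target:
            hi = mid
        else:
            lo = mid
    return hi

L_min = minimal_length(F_true=0.01, m=5)         # F_true: illustrative value
```

In practice $F_{\mathrm{true}}$ would be the GC value estimated from the long ($\sim 200$ min) simulations described above.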