Dynamic causal models of steady-state responses

In this paper, we describe a dynamic causal model (DCM) of steady-state responses in electrophysiological data that are summarised in terms of their cross-spectral density. These spectral data-features are generated by a biologically plausible, neural-mass model of coupled electromagnetic sources; where each source comprises three sub-populations. Under linearity and stationarity assumptions, the model's biophysical parameters (e.g., post-synaptic receptor density and time constants) prescribe the cross-spectral density of responses measured directly (e.g., local field potentials) or indirectly through some lead-field (e.g., electroencephalographic and magnetoencephalographic data). Inversion of the ensuing DCM provides conditional probabilities on the synaptic parameters of intrinsic and extrinsic connections in the underlying neuronal network. This means we can make inferences about synaptic physiology, as well as changes induced by pharmacological or behavioural manipulations, using the cross-spectral density of invasive or non-invasive electrophysiological recordings. In this paper, we focus on the form of the model, its inversion and validation using synthetic and real data. We conclude with an illustrative application to multi-channel local field potential data acquired during a learning experiment in mice.


Introduction
This paper is concerned with modelling steady-state or (quasi) stationary responses recorded electrophysiologically using invasive or non-invasive techniques. Critically, the models are parameterised in terms of neurophysiologically meaningful parameters, describing the physiology and connectivity of coupled neuronal populations subtending observed responses. The model generates or predicts the cross-spectral density of observed responses, which are a simple but comprehensive summary of steady-state dynamics under linearity and stationarity assumptions. Furthermore, these cross-spectral features can be extracted quickly and simply from empirical data. In this paper, we describe the model and its inversion, with a focus on system identifiability and the validity of the proposed approach. This method is demonstrated using local field potentials (LFP) recorded from Pavlovian fear conditioned mice. In subsequent papers, we will apply the model to LFP data recorded during pharmacological experiments.
The approach described below represents the denouement of previous work on dynamic causal modelling of spectral responses. In Moran et al., (2007), we described how neural-mass models, used originally to model evoked responses in the electroencephalogram model inversion given these features. In the second section, we address the face validity of the model, using synthetic data to establish that both the form of the model and its key parameters can be recovered in terms of conditional probability densities. The parameters we look at are those that determine post-synaptic sensitivity to glutamate from extrinsic and intrinsic afferents. In the final section, we repeat the analysis of synthetic data using multi-channel LFP data from mice, acquired during cued recall of a conditioned fear memory. This section tries to establish the construct validity of DCM in relation to the previous analyses of functional connectivity using crosscorrelogram analysis. These show an increase in the coupling between the hippocampus and amygdala using responses induced by conditioned fear-stimuli. We try to replicate this finding and, critically, extend it to establish the changes in directed connections that mediate this increased coupling.

The dynamic causal model
In this section, we describe the model of cross-spectral density responses. Much of this material is based on linear systems theory and the differential equations that constitute our neural-mass model of underlying dynamics. We will use a tutorial style and refer interested readers to appendices and previous descriptions of the neural-mass model for details. We first consider the generative model for crossspectral density and then describe how these cross-spectral features are evaluated. Finally, we review model inversion and inference.

A generative model for cross-spectral density
Under stationarity assumptions, one can summarize arbitrarily long electrophysiological recordings from multi-channel data in terms of cross-spectral density matrices, g(ω) c at frequency ω (radians per second). Heuristically, these can be considered as a covariance matrix at each frequency of interest. As such, these second-order data-features specify, completely, the second-order moments of the data under Gaussian assumptions. Cross-spectral density is useful because it represents the important information, in long time-series, compactly. Furthermore, it brings our data modelling into the domain of conventional spectral analysis and linear systems theory. The use of linear systems theory to derive the predicted spectral response from a non-linear dynamical system assumes that changes in the (neuronal) states of the system can be approximated with small perturbations around some fixed-point. This assumption depends on the experimental design and is more easily motivated when data are harvested during periods of limited perturbations to the subject's neuronal state. In short, we discount the possibility of phase-transitions and bifurcations (e.g., oscillatory dynamics) due to the non-linear properties of cortical macrocolumns (e.g. Breakspear et al., 2006).

The neural mass model
The underlying dynamic causal model is defined by the equations of motion : x t ð Þ = f x;u ð Þ at the neuronal level. In this context, they correspond to a neural-mass model that has been used extensively in the causal modelling of EEG and MEG data and has been described previously for modelling spectral responses (Moran et al., 2007(Moran et al., , 2008. This model ascribes three sub-populations to each neuronal source, corresponding roughly to spiny stellate input cells, deep pyramidal output cells and inhibitory interneurons. Following standard neuroanatomic rules (Felleman and Van Essen 1991), we distinguish between forward connections (targeting spiny stellate cells), backward connections (targeting pyramidal cells and inhibitory interneurons with slower kinetics) and lateral connections (targeting all subpopulations); see Fig. 1 and Moran et al., (2007). Each neuronal source could be regarded as a three-layer structure, in which spiny stellate cells occupy the granular layer, while infragranular and supragranular layers contain both pyramidal cells and inhibitory interneurons.
Each subpopulation is modelled with pairs of first-order differential equations of the following form: The column vectors x v and x I , correspond to the mean voltages and currents, where each element corresponds to the hidden state of the subpopulation at each source. These differential equations implement a convolution of a subpopulation's presynaptic input to produce a postsynaptic response. The output of each source is modelled as a mixture of the depolarisation of each subpopulation. Due to the orientation of deep pyramidal cell dendrites, tangential to the cortical surface, this population tends to dominate LFP recordings. We accommodate this by making the output of each source, g(x) a weighted mixture of x v with weights of 60% for the pyramidal subpopulation and 20% for the others. The presynaptic input to each subpopulation comprises endogenous, E(x), and exogenous, C(u), components

Endogenous inputs
In a DCM comprising s sources, endogenous input E(x) is a weighted mixture of the mean firing rates in other subpopulations (see Fig. 1). These firing rates are a sigmoid activation function of depolarisation, which we approximate with a linear gain function; S(x i ) = Sx i ∈ ℜ sx1 . Firing rates provide endogenous inputs from subpopulations that are intrinsic or extrinsic to the source. Subpopulations within each source are coupled by intrinsic connections, whose strengths are parameterised by γ = {γ 1 ,…,γ 5 }. These endogenous intrinsic connections can arise from any subpopulation and present with small delays. Conversely, endogenous extrinsic connections arise only from the excitatory pyramidal cells of other sources and effect a longer delay than intrinsic connections. The strengths of these connections are parameterised by the forward, backward and lateral extrinsic connection matrices A F ∈ ℜ sxs , A B ∈ ℜ sxs and A L ∈ ℜ sxs respectively. The postsynaptic efficacy of connections is encoded by the maximum amplitude of postsynaptic potentials H e,i = diag(H 1 ,…,H s ) (note the subscripts in Fig. 1) and by the rate-constants of postsynaptic potentials, κ = diag(κ 1 ,…,κ s ) for each source. The rate-constants are lumped representations of passive membrane properties and other spatially distributed dynamics in the dendritic tree.

Exogenous inputs
Exogenous inputs C(u) = Cu are scaled by the exogenous input matrix C ∈ ℜ sxs so that each source-specific innovation u(t) ∈ ℜ sx1 excites the spiny stellate subpopulation. We parameterise the spectral density of this exogenous input, g(ω) u , in terms of white (α) and pink (β) spectral components:

Neuronal responses
The cross-spectral density is a description of the dependencies among the observed outputs of these neuronal sources. We will consider a linear mapping from s sources to c channels. In EEG and MEG this mapping is a lead-field or gain-matrix function, L(θ) ∈ ℜ cxs , of unknown spatial parameters, θ, such as source location and orientation. Generally, this function rests upon the solution of a well-posed electromagnetic forward model. For invasive LFP recordings that are obtained directly from the neuronal sources, this mapping is a leading diagonal gainmatrix, L =diag(θ 1 ,...θ s ) where the parameters model electrode-specific gains. The observed output at channel i is thus S i (t)=L i g(x), where g(x) is the source output (a mixture of depolarisations) and L i represents the i-th lead-field or row of the gain-matrix. In other words, L = ℜ 1xs is the change in observed potential caused by changes in source activity. These observed outputs can now be used in a generative model of source cross-spectral measures.

Cross-spectral density
The neuronal model comprises a network of neuronal sources, each of which generates stationary time-series in a set of recording channels. These steady-state dynamics are expressed, in the frequency domain, as cross-spectral densities, g ij (ω), at radial frequencies ω, between channels i and j. Under linear systems theory, the cross-spectral density induced by the k-th input or innovation u k (t), is simply the cross-transfer function Γ ij k (ω) times the spectral density of that innovation, g k (ω) u . This transfer function is the cross-product of the Fourier transforms of the corresponding first-order kernels, κ i k (t) and κ i k (t) and in the case of i =j may be regarded as the modulation or self-transfer function).
The convolution kernels mediate the effect of the k-th input, at time t in the past, on the current response recorded at each channel. In general, they can be regarded as impulse response functions and describe the output at the i-th channel, S i (t), produced by a spike of the k-th exogenous input, u k (t). The kernel for each channel obtains analytically from the Jacobian ℑ = ∂f/∂x describing how the system's hidden neuronal states, x(t), couple inputs to outputs. For channel i, and input k the kernel is This means the kernels are analytic functions of ẋ t ð Þ = f x; u ð Þand s(t)=Lg(x); the network's equations of motion and output function respectively. The use of the chain rule follows from the fact that the only way past inputs can affect current channel outputs is through the hidden states. It is these states that confer memory on the system. In Appendix A, we present an alternative derivation of the cross-spectral density using the Laplace transform of the dynamics in state-space form. This gives a more compact, if less intuitive, series of expressions that are equivalent to the kernel expansion. In this form, the Jacobian is known as the state transition matrix. To accommodate endogenous input delays between different sources and intrinsic transmission delays between different populations within one source, we augment the Jacobian using a Hadamard product; Ip I + τI ð Þ −1 I, which is based on a Taylor approximation to the effect of delays, τ (see Appendix A.1 of David et al., 2006 for details).
To furnish a likelihood model for observed data-features we include a cross-spectral density ψ ij induced by channel noise and add a random observation error to the predicted cross-spectral density. Schematic of the source model with intrinsic connections. This schematic includes the differential equations describing the motion of hidden electrophysiological states. Each source is modelled with three subpopulations (pyramidal, spiny-stellate and inhibitory interneurons) as described in (Jansen and Rit, 1995). In this figure these subpopulations have been assigned to granular and agranular cortical layers, which receive forward, backward and lateral connections from extrinsic sources in the network.
Finally, we apply a square root transform to the observed and predicted densities to render the observation error approximately Gaussian. Crossspectral densities will asymptote to a Wishart distribution at a large sample limit (Brillinger, 1969). However, when averaging each cross or auto-spectral frequency variate across multiple trials, one can appeal to the central limit theorem and assume a near normal distribution. In cases where multiple realisations are limited (see Empirical Demonstration below) the square-root transform renders a Gaussian assumption more valid (see Kiebel et al., 2005 for a comprehensive treatment). The advantage of being able to assume Gaussian errors is that we can invert the model using established variational techniques under something called the Laplace assumption ; this means the current DCM is inverted using exactly the same scheme as all the other DCMs of neurophysiological data we have described.
The spectral densities, ψ c and ψ s model the contributions of common noise sources (e.g., a common reference channel) and channel-specific noise respectively. As with the neuronal innovations we parameterise these spectral densities as an unknown mixture of white and pink components. The observation error ɛ~N(0, Σ (λ)) has a covariance function, Σ(λ) = exp(λ)V(ω), where λ are unknown hyperparameters and V(ω) encodes correlations over frequencies 1 . Eqs.
(1) to (5) specify the predicted cross-spectral density between any two channels given the parameters of the observation model {α, β, λ, θ} and the neuronal state equations, {κ, H, γ, A, C}. This means that the cross-spectral density is an analytic function of the parameters ϑ = {α, β, κ,H, γ, A, C, λ,θ} and specifies the likelihood p(g c |ϑ) of observing any given pattern of cross-spectral densities at any frequency. When this likelihood function is supplemented with a prior density on the parameters, p(ϑ) (see Moran et al., 2007 and Table 1), we have a full probabilistic generative model for crossspectral density features p(g c ,ϑ) = p(g c |ϑ) p(ϑ) that is specified in terms of biophysical parameters. Next, we look at how to extract the data features this model predicts.
Evaluating the cross-spectral density The assumptions above establish a generative model for crossspectral features of observed data under linearity and local stationarity assumptions. To invert or fit this model we need to perform an initial feature selection on the raw LFP or M/EEG data. In this section, we describe this procedure, using a vector auto-regression (VAR) model of the multi-channel data and comment briefly on its advantages over alternative schemes. We use a p-order VAR-model of the channel data y, to estimate the underlying auto-regression coefficients A(p) ∈ ℜ cxc (where c is the number of channels 2 ).
Here the channel data at the n-th time point,y n , represents a signal vector over channels. The autoregressive coefficients A (k) are estimated using both auto-and cross-time-series components. These, along with an estimated channel noise covariance, E ij provide a direct estimate of the cross-spectral density, g ij (ω) c = f(A(p)), using the following transform: The estimation of the auto-regression coefficients, A (k) ∈ A(p) uses the spectral toolbox in SPM (http://www.fil.ion.ucl.ac.uk) that allows for Bayesian point estimators of A(p), under various priors on the coefficients. Details concerning the Bayesian estimation of the VARcoefficients can be found in Roberts and Penny (2002). Briefly, this entails a variational approach that estimates the posterior densities of the coefficients. This posterior density is approximated in terms of its conditional mean and covariance; p(A|y, p)= N(μ A ,Σ A ). These moments are optimised through hyperparameters v E and v A (with Gamma hyperpriors; Γ(10 3 , 10 − 3 )) encoding the precision of the innovations e and the prior precision, respectively 3 : Equation 7 uses the posterior mean of the coefficients to provide the cross-spectral density features.
1 In our work, we use an AR(1) autoregression model of errors over frequencies, with an AR coefficient of one half and ensure that the error covariance components associated with the cross-spectral density between channels i and j are the same as the corresponding component for the cross-spectral density between channels j and i. 2 For computational expediency, if there are more than eight channels, we project the data and predictions onto an eight-dimensional subspace defined by the principal components of the prior covariance matrix in channel space where σ i 2 is the prior variance of the i-th spatial or gain parameter.
Pink noise common to all channels Lead-field gain π λ = 0 σ λ = 1 λ Noise hyperparameter Neuronal sources κ e/i Excitatory/inhibitory rate constants π κ e = 4 ms − 1 π κ i = 16 ms − 1 Excitatory/inhibitory maximum post-synaptic potentials Intrinsic connections π γ 1 = 128 In practice, the non-negative parameters of this model are given log-normal priors, by assuming a Gaussian density on a scale parameter, , and π i is the prior expectation and σ i 2 is its log-normal dispersion.
The advantage of our parametric approach is its structural equivalence to the generative model itself. We use uninformative priors but place formal constraints on the estimation of cross-spectral density through the order p of the VAR-model. This has important regularising properties when estimating the spectral features. Alternatively, non-parametric methods could be used to quantify the crossspectral density; e.g., a fast Fourier transform (FFT). However, in the case of a priori information regarding model order, several advantages exist for parametric approaches over the conventional FFT. One inherent problem of the FFT is its limited ability to distinguish between signal components at neighbouring frequencies. This resolution in Hertz is roughly reciprocal to the time interval in seconds, over which data are sampled. This is particularly problematic for short time segments where low delta (2-4 Hz) or theta (4-8 Hz) activity may be of interest. Secondly, when long data sequences are evaluated, averaging methods using a windowed FFT must trade-off spectral leakage and masking from side-lobes with broadening in the main lobe, which further decreases resolution. These limitations can be overcome using an AR model since frequencies can be estimated at any frequency point up to the Nyquist rate, and do not require windowing to obtain average steady-state estimates (Kay and Marple, 1981). The principle concern in using these AR methods is frequency splitting (the appearance of a spurious spectral peak), that ensues with overestimation of the model order (Spyers-Ashby et al., 1998). However, we can avoid this problem by exploiting our neural mass model: principled constraints on the order are furnished by the DCM above and follow from the fact that the order of the underlying VAR process is prescribed by the number of hidden neuronal states in the DCM. Heuristically, if one considers a single source, the evolution of its hidden states can be expressed as a p-variate VAR(1) process where η(t) corresponds to exogenous input convolved with the system's kernel. Alternatively, we can represent this process with a univariate AR(p) process on a single state. Because there is a bijective mapping between source activity and measurement space, the multivariate data can be represented as a VAR(p) process. We provide a formal argument in Appendix B for interested readers. The number of hidden states per source is twelve (see Fig. 1) and this places an upper bound on the order of the VAR model 4 . The relationship between the VAR model order and the number of hidden states can be illustrated in terms of the log-evidence ln p(y|p) for VAR models with different orders: we convolved a mixture of pink and white noise innovations with the DCM's first-order kernel (using the prior expectations) and used these synthetic data to invert a series of VAR models of increasing order. Fig. 2 shows the ensuing model evidence jumps to a high value when the order reaches twelve, with smaller increases thereafter.

Model inversion and inference
Model inversion means estimating the conditional density of the unknown model parameters p(ϑ|g c ,m) given the VAR-based crossspectral density features g c for any model m defined by the network architecture and priors on the parameters, p(ϑ|m). These unknown parameters include (i) the biophysical parameters of the neural-mass model, (ii) parameters controlling the spectral density of the neuronal innovations and channel noise, (iii) gain parameters and (iv) hyperparameters controlling the amplitude of the observation error in Eq. (5). The model is inverted using standard variational approaches described in previous publications and summarised . These procedures use a variational scheme in which the conditional density is optimized under a fixed-form (Laplace) assumption. This optimisation entails maximising a free-energy bound on the log-evidence, 1n p(g c |m). Once optimised, this bound can be used as an approximate log-evidence for model comparison in the usual way. Comparing DCMs in a way that is independent of their parameters is useful when trying to identify the most plausible architectures subtending observed responses (Penny et al., 2004;Stephan et al., 2007) and is used extensively in subsequent sections. The focus of this paper is on the approximate log-evidence 1n p(g c |m) and conditional densities p(ϑ|g c ,m) and, in particular, whether they can support robust inferences on neural-mass models and their parameters.

Identifiability and face validity
In this section, we try to establish the face validity of the DCM and inversion scheme described in the previous section. Here, we use synthetic datasets generated by models with known parameters. We then try to recover the best model and its parameters, after adding noise to the data. We will address both inference on models and their parameters. This involves searching over a space or set of models to find the model with the greatest evidence. One then usually proceeds by characterising the parameters of the best model in terms of their conditional density. In both inference on models and parameters, we used the same model employed to analyse the empirical data of the next section. This enabled us to relate the empirical results to the simulations presented below.

Inference on model-space
For inference on models, we generated data from three two-source networks using extrinsic connections from the first to the second source, from the second to the first and reciprocal connections. To assess inference on model-space, we first performed a model comparison using a small set of two source networks, delimited by their forward connections only. Specifically, each of the three models that were used to generate the model-specific data sets, were 4 In practice, we do not use the upper bound but use p = 8 for computational expediency; this seems to give robust and smooth spectral features. compared across each set of data. We hope to show that the inversion scheme identified the correct model in all three cases. In all three models exogenous neuronal inputs entered both sources and the connections were all of the forward type. These three models are also evaluated in the empirical analysis. The parameter values for all three models were set to their prior expectations 5 , with the exception of the extrinsic connections, for which we used the conditional estimates of the empirical analysis. Data were generated over frequencies from 4 to 48 Hz and observation noise was added (after the square root transform). The variance of this noise corresponded to the conditional estimate of the error variance from the empirical analysis.
The resulting three data sets were then inverted using each of the three models. For each data set, this provided three log-evidences (one for each model used to fit the spectral data). We normalised these to the log-evidence of the weakest model to produce log-likelihood ratios or log-Bayes factors. The results for the three models are shown in Table 2a. These indicate that, under this level of noise, DCM was able to identify the model that actually generated the data. In terms of inference on model-space, we computed the posterior probability of each model by assuming flat or uniform priors on models; under this assumption p(y|m i ) ∝ p(m i |y), which means we can normalise the evidence for each model, given one data set and interpret the result as the conditional probability on models. These are expressed as percentages in Table 3b and show that we can be almost certain that the correct model will be selected from the three-model set, with conditional probabilities close to one for correct models and close to zero for incorrect models Following the suggestions of our reviewers, we performed a second analysis where we compared all possible twosource DCM networks. This model space, which comprised 256 models in total, was derived by considering all possible permutations of inputs and connections. We would like to emphasize that this brute force method of testing all possible models (which can be very expensive in terms of computation time) is appropriate only when using small networks with a limited number of free variables. In the applied case of analysing empirical data, DCM is used to test a limited number of hypotheses regarding the type of neuronal architecture that subtends observed experimental responses (e.g. Grol et al., 2007;Stephan et al., 2006aStephan et al., , 2007. This is because (i) the precision of inference with DCM generally favours a strongly hypothesis-driven approach and (ii) the combinatorics of possible DCMs quickly explodes with the number of sources and connections.
The results of this second analysis show that DCM can correctly identify the generative model, even when all 256 possible models are considered. For each of the three data sets that were inverted, the logevidence was greatest for the correct generative model (Fig. 3). The relative log-evidence or log Bayes-factors for the best compared to the second best model offered strong support for the correct model, in all three cases (ln BF 1 = 14.6 ; ln BF 2 = 16.2 ; ln BF 3 = 16.4). Note that when we talk of the 'best' model, we mean a model for which there is strong evidence relative to any competing model. In other words, we can be 95% confident that the evidence for the best model is greater than any other (this corresponds to a relative log-evidence of about three). In summary, Bayesian model comparison with DCM seems to be able to identify these sorts of models with a high degree of confidence.

Inference on parameter-space
For inference on parameters, we looked at the effects of changing the maximum amplitudes of excitatory postsynaptic potentials (EPSP), which control the efficacy of intrinsic and extrinsic connections and the effects of changing the extrinsic connections themselves. These effects are encoded in the parameters H e ∈ ϑ and A F ∈ ϑ, respectively. We addressed identifiability by inverting a single model using synthetic data with different levels of noise. By comparing the true parameter values to the conditional confidence intervals, under different levels of noise, we tried to establish the accuracy of model inversion and how this depends upon the quality of the data. As above, we chose different levels of noise based upon the error variance estimated using real data. Specifically, we varied the noise levels from 0.001 to 2 times the empirical noise variance, allowing a broad exploration of relative signal-to-noise ratios (SNR) .
The model we used is the same model identified by the empirical analyses of the next section. This model comprised two sources and two LFP channels with no cross-talk between the channels. The parameter values were based on the estimates from the empirical analysis. Specifically, source 1 sent a strong extrinsic connection to source 2, whose excitatory cells had a relatively low postsynaptic response (Fig. 4). All parameter values were set to their prior expectation, except for the parameters of interest H e (2) and A 21 F .
In our DCM, parameters are optimised by multiplying their prior expectation with an unknown log-scale parameter that is exponentiated to ensure positivity. Hence, a log-scale parameter of zero corresponds to a scale-parameter of one, which renders the parameter value equal to its prior expectation. By imposing Gaussian priors on the log-scale parameters we place log-normal priors on the parameters per se. To model reduced postsynaptic amplitudes in source 2, H e 2 had a log-scale parameter of −0.4 representing a exp (− 0.4) = 67% decrease from its prior expectation. The log-scale parameter encoding the forward connection from source 1 to source 2, namely A 2,1 F , was set to 1.5, representing a exp(1.5) = 448% increase from its prior expectation. Both sources received identical neuronal innovations, comprising white and pink spectral components (as specified in Equation 2 above). Data were generated over frequencies from 4 to 48 Hz. Posterior density estimates for all parameters, p(ϑ | g c ,m) were obtained for 128 intermediate noise levels between one thousandth and twice the empirical noise variance. The conditional expectation or MAP (maximum a posteriori) estimates of H e (2) and A 2,1 F are shown in   conditional or posterior density. The lower panels show the conditional probabilities p(H e (2) b 8) and p(A 2,1 F N 32) that the parameters differed from their prior expectations. It can be seen that the conditional expectation remained close to the true values for both parameters, despite differences in their conditional precision, which decreased with increasing levels of observation noise. This can be seen in the shrinking Bayesian confidence intervals (grey area) and in the small increase in conditional probabilities with less noise. This effect is more marked for the estimates of H e ; where the confidence intervals splay at higher noise levels. This jagged variance in the confidence interval itself reflects the simulation protocol, in which each data set comprised a different noise realisation. In addition, the lowest conditional probability (that the parameter posterior estimate differed from the prior) for all simulations, occurred for this EPSP parameter where p(H e (2) b 8) = .74 at a high noise level of 1.83. In contrast, the connection strength parameter remained within tight confidence bounds for all noise levels and produced a minimum conditional probability, p(A 2,1 F b 32) = .99. This minimum occurred again as expected, at a high noise levels of 1.72 times the empirical noise level. One can also see, for both parameters a trend for conditional estimates to shrink towards the prior values at higher noise levels; this shrinkage is typical of Bayesian estimators; i.e. when data become noisy, the estimation relies more heavily upon priors and the prior expectation is given more weight . Importantly, while the 90% confidence bounds generally encompass the true values, the prior values remain outside. In summary, under the realistic levels of noise considered, it appears possible to recover veridical parameter estimates and be fairly confident that these estimates differ from their prior expectation.

Empirical demonstration
In this section, we present a similar analysis to that of the previous section but using real data. Furthermore, to pursue construct-validity, we invert the model using data acquired under different experimental conditions to see if the conditional estimates of various synaptic parameters change in a way that is consistent with previous analyses of functional connectivity using cross-correlograms. These analyses suggest an increase in coupling between the amygdala and hippocampus that is expressed predominantly in the theta range. This section considers the empirical data set-up, experimental design and inference on models and parameters. We interpret the conditional estimates of the parameters, in relation to the underlying physiology, in the Discussion.

Empirical LFP data
Local field potential data were acquired from mice (adult male C57B/6J mice, 10 to 12 weeks old) during retrieval of a fear-memory, learned in a Pavlovian conditioning paradigm using acoustic tones (CS+ and CS-) and foot-shock (US). A previous analysis of these data (Seidenbecher et al., 2003) points to the importance of theta rhythms (∼ 5 Hz) during fear-memory retrieval Buzsaki, 2002). Specifically, Seidenbecher et al., (2003) demonstrated an increase in theta-band coupling between area CA1 of the hippocampus and the lateral nucleus of the amygdala (LA) during presentation of the CS+. Moreover, theta synchrony onset was correlated with freezing, a behavioural index of fear-memory (Maren et al., 1997). For the purposes of demonstrating our DCM, we here revisit the data of a single animal and show that this 'on/off' theta synchrony can be explained with plausible neurobiological mechanisms at the synaptic level, using the methodology described in the previous sections. These data represent quasi-stationary signals as evidenced by small time variations in signal strength (Figs. 5a and b). The term "steady-state" refers to the frequency estimates that represent only the constant spectral amplitude and are the complete data feature captured by this DCM. Below, we examine induced steady-state responses, where spectral estimates are averaged over independent trials. However, there is no principled reason why the current model may not be inverted using spectra from a time-frequency analysis of evoked responses or event related responses, under the assumption of local stationarity over a few hundred milliseconds (e.g. Robinson et al., 2008;Kerr et al., 2008). LFP data were recorded from two electrodes in the LA and the CA1 of the dorsal hippocampus. The data comprised 6 min of recording, during which four consecutive CS-tones and four consecutive CS+ tones were presented, each lasting 10 s. Freezing behaviour was seen prominently during the CS+. Preliminary analysis, using timefrequency spectrograms, revealed that the hippocampal region exhibited strong background theta rhythms, during CS+ and CSepochs (Figs. 5a and b); whereas theta activity in lateral amygdala was prominent only during the CS+ stimulus. Fig. 6 displays the first CS+ and CS-epochs of fear recall. Cross-spectra were computed for threesecond epochs that followed the onset of freezing behaviour in the four CS+ epochs and order-time matched CS-epochs. Cross-spectral densities were computed from 4 to 48 Hz, using an eighth-order VAR model, for each epoch and averaged across conditions (Fig. 7). This revealed spectral features that corroborated the analysis of Seidenbecher et al., (2003); with pronounced fast theta activity in the hippocampus and a marked theta peak in the cross-spectral density. The amygdala showed a broader spectrum, with a preponderance of lower theta activity and a theta peak in, and only in, the CS+ trial.

Dynamic causal modelling
These cross-spectral densities were then inverted using a series of generative models. These models were used to test the direction of information flow during heightened theta synchrony following CS+.
Given key experimental differences between CS-and CS+ trials, we introduced log-scale parameters β ki to model trial-specific variations in specified parameters: β ki is the k-th experimental effect on the i-th parameter and ϑ i j is the value of the i-th parameter ϑ i in the j-th trial or condition. These effects are meditated by an experimental design matrix X, which encodes how experimental effects are expressed in each trial. Eq. (10) is a generic device that we use to specify fully parameterised experimental effects on specific parameters in multi-trial designs. In this example, β 1i is simply a log-scale parameter (Table 1) specifying the increase (or decrease) in CS+ relative to CS-trials. The parameters showing trial-specific effects were the extrinsic connections and excitatory post synaptic amplitudes; all other parameters we fixed over trials.

Inference on models
The extrinsic connection types in our DCM are based on connections between isocortical areas (Felleman and Van Essen 1991); however, in this analysis we are dealing with allocortical (CA1) and subcortical (LA) brain regions that have no clearly defined hierarchical relationship. Therefore, our first step was to establish which connection type best explained the measured LFP data. We approached this using model comparison using DCMs with reciprocal connections between CA1 and LA. The connections in these models were (model 1) forward; (model 2) backward; (model 3) lateral; (model 4) a combination of forward and backward and (model 5) a combination of all three. Bayesian model comparison based on the log-evidence indicated that the most likely type of inter-regional connections was of the 'forward' type (model 1); Fig. 5. Conditional densities of parameter estimates using the two-source simulations. The data were generated under known parameter values (red line) and mixed with noise (one thousandth to twice the empirical noise estimate). The EPSP parameter (Top left) was exp (−0.4) = 67% of its prior expectation. The MAP estimates for this log-scale parameter (plotted in hashed red) display a characteristic shrinkage toward the prior of zero at high levels of noise (90% confidence intervals are plotted in grey). The extrinsic connection parameter (Top right) A 21 F displays a similar behaviour, when simulated at exp(1.5) = 448% of its prior expectation. The grey lines show the prior value (of zero) used for the simulations. The bottom graphs show the conditiona l probabilities that the MAP estimates of the log-scale parameters differ from their prior expectation.
where connections originate from pyramidal cells and target excitatory interneurons. Fig. 8a shows the relative model evidences for the five models (i.e., the log-Bayes factor with respect to the worst model).
Next, employing the optimal connection type, three different input schemes were tested to find where driving inputs, i.e. from cortical regions, enter during CS+ and CS-epochs. These DCM's included; (model 1) comprising exogenous inputs to both CA1 and LA; (model 2) exogenous input to hippocampal region CA1 only and (model 3) the lateral amygdala only. Fig. 8b shows that the best model is model 1; where inputs enter both the lateral amygdala and hippocampal CA1.
Having established a causal architecture for the inputs, three further models were tested to examine whether connections were bidirectional or unidirectional. These results are displayed in Fig. 8c, where model 1 had bidirectional connections, model 2 had unidirectional hippocampal to amygdala connections and model 3 had connections from amygdala to hippocampus. We see that the most plausible model contains bidirectional connections between hippocampus and amygdala.
In principle, as in the analysis of synthetic data above, there are 256 possible DCMs that could explain the empirical data. However, to provide an exemplar strategy for when where exhaustive model searches are not possible, we finessed the search of model space by optimising various model attributes sequentially. This series of line searches can be regarded as a heuristic search over model space to identify the most likely model. One concern in using this sort of heuristic search is that conditional dependencies among the freeparameters do not guarantee the global maximum is found. To address this, we performed a further analysis of the 'complete' model, which comprised reciprocal connections of all types (forward and backward and lateral), and inputs to both regions. The resulting conditional covariance matrix was examined in order to investigate potential codependencies between the parameters. The posterior correlation matrix is shown in Fig. 9 and shows only relatively small interdependencies between the search parameters. Overall, the accuracy of the best performing model was impressive; the fits to the crossspectral data or shown in Fig. 10 and are almost indistinguishable from the observed spectra. Having identified this model we now turn to inference on its parameters.

Inference on parameters
We now look at the conditional probabilities of key parameters showing trial-specific or conditioning effects, under the most plausible model. These parameters were the extrinsic connection strengths and intrinsic postsynaptic efficacies. When comparing the CS-and CS+ trials, we observe decreased amygdala-hippocampal connectivity and increased hippocampal-amygdala connectivity. Fig.  11 shows the MAP estimates of ln β 1i , which scale the extrinsic connections relative to 100% connectivity in CS-. In addition, there where 'f' demarks freezing periods (the behavioural correlate of fear recall), 'e' exploration, 'r' risk assessment and 's' stereotypical behaviour. During the CS+ condition theta activity can be observed in both electrodes, in contrast, during the CS-condition, theta activity is evident in hippocam pal data but much less in the amygdala.
were small increases in postsynaptic efficacy in the amygdala for the CS+ relative to CS-Quantitatively, hippocampus-amygdala connectivity increased by 26%, with a conditional probability of 99.97% that this effect was greater than zero. In contrast, amygdala-hippocampus forward connections decreased by 72%, with a conditional probability of almost one. The relative change of intrinsic amygdala excitatory postsynaptic amplitude was 8% with a high conditional probability 99.85% that the increase was greater than zero. In contrast, changes in hippocampal excitatory postsynaptic amplitude were unremarkable, (0.002%) and with a conditional probability that was close to chance (69.70%).
In summary, these results suggest that the hippocampus and amygdala influence each other through bidirectional connections. Steady states responses induced by CS+, relative to CS-stimuli appear to increase the intrinsic sensitivity of postsynaptic responses in the amygdala and with an additional sensitization to extrinsic afferents from the hippocampus. At the same time the reciprocal influence of the amygdala on the hippocampus is suppressed. These conclusions are exactly consistent with early hypotheses based on correlations (see below).

Discussion
We have described a dynamic causal model (DCM) of steadystate responses that are summarised in terms of cross-spectral densities. These spectral data-features are generated by a biologically plausible, neural-mass model of coupled electromagnetic sources. Under linearity and stationarity assumptions, inversion of the DCM provides conditional probabilities on both the models and the synaptic parameters of any particular model. The model employed here has previously been shown to produce oscillatory activity at all standard EEG frequency bands, in its linear approximation (Moran et al., 2007). A nonlinear model analysis could uncover interesting dynamics in some of these bands and will be the subject of further research. This would call for a relaxation of the linearization assumption and present an interesting challenge for model inversion (c.f., Valdes et al., 1999).
Recently, a number of studies have established the utility neural mass models for interrogating EEG data. The motivations behind this approach are varied. In Riera et al., (2006) neural masses are used to investigate local electrovascular coupling and their multimodal time domain expression in EEG and fMRI data; while Valdes et al. (1999) employ neural masses to examine the emergent dynamic properties of alpha-band activity. Closer to the work presented here, Robinson et al., (2004) have developed a frequency domain description of EEG activity that highlights the importance of corticothalamic interactions, using neural field models. As in Robinson et al., (2004), the goal of DCM for steady-state responses is to make inferences about, regionally-specific neurotransmitter Fig. 7. Average cross-spectral densities across all CS+ (red) and CS-(blue) trials. Top left: hippocampal autospectrum, Top right: hippocampal-amygdala cross spectrum, Bottom right: amygdala autospectrum. These spectral data features were evaluated from three second epochs after the first freezing behaviour during CS+ and the time/order matched CS-trials. Peaks at theta frequency are evident in both CS+ and CS-conditions with reduced theta activity in the amygdala during CS-. and neuromodulatory action that unfolds in a connected but distributed network. The DCM presented in this paper assumes a network of point sources (c.f., equivalent current dipoles) that may be usefully extended to cover neural field models of the sort considered by Robinson et al., (2004). DCM enables inference about synaptic physiology and changes induced by pharmacological or behavioural manipulations both within and between neural ensembles; furthermore, the methodology can be applied to the cross-spectral density of invasive or non-invasive electrophysiological recordings.
Usually, in Dynamic Causal Modelling, data prediction involves the integration of a dynamical system to produce a time-series. In the current application, the prediction is over frequencies; however, the form of the inversion remains exactly the same. This is because in DCM for deterministic systems (i.e., models with no system or state noise) the time-series prediction is treated as a finite-length static observation, which is replaced here with a prediction over frequencies. The only difference between DCM for time-series and DCM for cross-spectral density is that the datafeatures are represented by a three dimensional array, covering c × c channels and b frequency-bins. In conventional time-series analysis the data-features correspond to a two-dimensional array covering c channels and b time-bins. The spectral summary used for data inversion comprises the magnitude of cross-spectra, which is a sufficient data-feature, under quasi-stationarity assumptions. Information regarding instantaneous phase or phase-coupling among sources are not considered in this treatment. In some settings, phase-coupling has been used in linear and nonlinear settings to model information exchange across discrete brain sources (e.g., Brovelli et al., 2004, Rosenblum et al., 1996. The DCM presented here represents a complement to this approach by offering a biophysically meaningful, mechanistic description of neuronal interactions. An alternative DCM approach for M/EEG analysis has been developed to describe (time-dependent) phenomenological coupling among frequencies at different brain sources that occur through both linear and nonlinear mechanisms . However, neither DCM model the instantaneous phase. Other recent developments in M/EEG data analysis have tackled this issue: Approaches involving ICA (Anemüller et al., 2003) have been used to describe the phases of induced responses on a trial by trial basis, and make use of complex lead-field distributions to retain the imaginary parts of the source signals at the scalp level. However this approach studies independent components of brain activity and as such, is not directly comparable to DCM. DCM for phase responses is an active area of research (Penny et al., 2008) and will receive a full treatment elsewhere.
Our simulation studies provide some face validity for DCM, in terms of internal consistency. DCM was able to identify the correct model and, under one model, parameter values were recovered reliably in settings of high observation noise. Changes in the postsynaptic responsiveness, encoded by the population maximum EPSP, were estimated veridically at levels below prior threshold, with a conditional confidence of more than 74%; even for the highest levels of noise. Similarly, inter-area connection strength estimates were reasonably accurate under high levels of noise. With noisy data, parameter estimates tend to shrink towards their prior expectation, reflecting the adaptive nature of the weights afforded to prior and data information in Bayesian schemes.
We have presented an analysis of empirical LFP data, obtained by invasive recordings in rat CA1 and LA during a fear conditioning paradigm. A previous analysis of these data (Seidenbecher et al., 2003) showed prominent theta band activity in CA1 during both CS+ and CS-conditions, whereas LA expresses significant theta activity during CS+ trials only. Using an analysis of functional  connectivity 6 , based on cross-correlograms of LA/CA1 activity in the theta range, Seidenbecher et al., (2003) demonstrated an increase in connectivity between these two brain regions during CS+ trials. This is consistent with a trial-specific enabling or gating of the CA1 → LA connection during retrieval of conditioned fear in the CS+ condition, leading to a transient coupling of LA responses to the conditionindependent theta activity in CA1. However, this analysis of functional connectivity was unable to provide direct evidence for directed or causal interactions. This sort of evidence requires a model of effective connectivity like DCM. The DCM analysis in the present study confirmed the hypothesis based on the crosscorrelogram results of Seidenbecher et al., (2003). The DCM analysis showed a selective increase in CA1 → LA connectivity during CS+ trials, accompanied by a decrease in LA → CA1 connection strength. An additional finding was the increase in the amplitude of postsynaptic responses in LA during CS+ trials. This result may represent the correlate of long term potentiation of LA neurons following fear conditioning (Rodrigues et al., 2004;LeDoux, 2000). In summary, one could consider these results as a demonstration of construct validity for DCM, in relation to the previous analyses of functional connectivity using cross-correlograms.
The analysis of parameter estimates was performed only after Bayesian model selection. In the search for an optimum model, we asked (i) which connection type was most plausible, (ii) whether neuronal inputs drive CA1, LA or both regions; and (iii) which extrinsic connectivity pattern was most likely to have generated the observed data (directed CA1 → LA or LA → CA1 or reciprocal connections). The results of sequential model comparisons showed that there was a very strong evidence for a model in which (i) extrinsic connections targeted excitatory neurons, (ii) neuronal inputs drove both CA1 and LA and (iii) the two regions were linked by reciprocal connections. While there is, to our knowledge, no decisive empirical data concerning the first two issues, the last conclusion from our model comparisons is supported strongly by neuroanatomic data from tract-tracing studies. These have demonstrated prominent and reciprocal connections between CA1 and LA (see Pitkänen et al., 2000 for a review). This correspondence between neuroanatomic findings and our model structure, which Fig. 11. Trial-specific effects encoding differences between the CS+, relative to CS-trials. Top left: Hippocampal EPSP displays b 1% change on CS+ trials. Top right: amygdala to hippocampus forward connection strength decreases by 72% on CS+ trials. Bottom left: Hippocampus to amygdala forward connection strength increases by 26% on CS+ trials. Bottom right: amygdala EPSP increases by 8% in CS+ relative to CS-trials. 6 Functional connectivity is defined as the statistical dependence between two biophysical time-series, whereas effective connectivity refers to the directed and casual influence one biophysical system exerts over another . was inferred from the LFP data, provides further construct validity, in relation to neuroanatomy.
In conclusion, this study has introduced a novel variant of DCM that provides mechanistic explanations, at the level of synaptic physiology, for the cross-spectral density of invasive (LFP) or noninvasive (EEG) electrophysiological recordings. We have demonstrated how this approach can be used to investigate hypotheses about directed interactions among brain regions that cannot be addressed by conventional analyses of functional connectivity. A previous (single-source) DCM study (Moran et al., 2008) of invasive LFP recordings in rats demonstrated the consistency of model parameter estimates with concurrent microdialysis measurements. The current study is another step towards establishing the validity of models, which we hope will be useful for deciphering the neurophysiological mechanisms that underlie pharmacological effects and pathophysiological processes (Stephan et al., 2006b).

Software note
Matlab routines and demonstrations of the inversion described in this paper are available as academic freeware from the SPM website (http://www.fil,ion.ucl.ac.uk/spm) and will be found under the 'api_erp', 'spectral' and 'Neural_Models' toolboxes in SPM8.