Canonical information flow decomposition among neural structure subsets

Partial directed coherence (PDC) and directed coherence (DC) which describe complementary aspects of the directed information flow between pairs of univariate components that belong to a vector of simultaneously observed time series have recently been generalized as bPDC/bDC, respectively, to portray the relationship between subsets of component vectors (Takahashi, 2009; Faes and Nollo, 2013). This generalization is specially important for neuroscience applications as one often wishes to address the link between the set of time series from an observed ROI (region of interest) with respect to series from some other physiologically relevant ROI. bPDC/bDC are limited, however, in that several time series within a given subset may be irrelevant or may even interact opposingly with respect to one another leading to interpretation difficulties. To address this, we propose an alternative measure, termed cPDC/cDC, employing canonical decomposition to reveal the main frequency domain modes of interaction between the vector subsets. We also show bPDC/bDC and cPDC/cDC are related and possess mutual information rate interpretations. Numerical examples and a real data set illustrate the concepts. The present contribution provides what is seemingly the first canonical decomposition of information flow in the frequency domain.


INTRODUCTION
Human behavior is primarily thought as a property that emerges from the interaction of several brain areas, body parts, and the environment. Understanding how these elements dynamically interact is one of major themes of systems neuroscience. Several multivariate time series methods-old and new-have been introduced to describe the interdependence between brain areas using signal modalities like EEG, BOLD signals, MEG and LFP-and are collectively called connectivity measures. Partial directed coherence (PDC) (Baccalá and Sameshima, 2001) and directed coherence/directed transfer function (DC/DTF) (Kamiński and Blinowska, 1991) are two examples of such connectivity measures. Both describe complementary aspects (see Baccalá and Sameshima, 2014 for an in depth discussion) of how information flows between pairs of univariate time series components that belong to a multivariate vector of simultaneously observed time series (Takahashi et al., 2010). Recently, PDC and DC have been generalized (as bPDC/bDC, respectively) to describe how subsets (blocks) of components within a time series vector interrelate (Takahashi, 2009;Faes and Nollo, 2013). This is specially important for neuroscience applications as one often wants to investigate the interaction between sets of time series that are circumscribed to an observed region of interest (ROI) with respect to another physiologically relevant ROI (Nedungadi et al., 2011). The potential relevance of this type of question alone justifies looking for their deeper meaning in terms of information theoretical quantities.
Despite their practical importance, bPDC/bDC suffer from the limitation that several time series within a given subset may be irrelevant or interact in opposition to one another thereby posing interpretation difficulties. Also, in several situations, a researcher may be interested in just the few "best" descriptions of interaction between two sets of time series but not in the total amount of information flowing between them. For a more concrete example, assume that two brain areas interact and that bPDC is large. In this situation, it does not straightforwardly follow that all brain region components are interacting in the same way, or even whether some such components may be ignored. One way to address this limitation is to decompose bPDC/bDC into different components weighed according to relevance.
The aim of this article is twofold: (a) to provide a proper information theoretic interpretation for bPDC/bDC and (b) to introduce a canonical decomposition of information flows, henceforth termed, respectively, canonical PDC/DC (cPDC/cDC). These new decompositions allow us to closely mimic classical canonical correlation analysis so that different dynamically relevant interaction modes between brain areas can be exposed. Due to PDC interpretability in terms of Granger causality , a consequence of the present formulation is that cPDC represents a long sought frequency domain counterpart to time domain canonical decompositions of Granger causality (Sato et al., 2010;Ashrafulla et al., 2013).
The article is organized as follows. We first introduce the background and notation necessary for the rest of the article (section 2). In the results section (section 3), we first show that both bPDC and bDC between two subsets of processes are block coherences between suitably defined underlying processes. Then, we demonstrate that such coherences are nothing but monotonic transformations of the mutual information rate between the respective processes (Gelfand and Yaglom, 1959;Takahashi et al., 2010;Nedungadi et al., 2011) leading immediately to their interpretability as mutual information rates. Next, we introduce cPDC and cDC and prove that they are the non-zero eigenvalues of the matrices whose determinants underlie the respective bPDC and bDC definitions (section 4). Using simulated examples and publicly available data we illustrate the usefulness of cPDC/cDC (section 5) followed by a brief discussion (section 6). Proof details are left to the Appendix.

BACKGROUND
Let X 1 , . . . , X K be K distinct multivariate time series vectors with dimension M 1 , . . . , M K . Using T to indicate matrix transposition, let X(t) = X 1 (t) T , . . . , X K (t) T T for each time t ∈ Z be a second order stationary time series with spectral density matrix S(ω) at each frequency ω ∈ [−π, π). To justify our formal computation, we assume that S(ω) is uniformly bounded from below and above and invertible at all frequencies (Hannan, 1970). This is called the boundedness condition which guarantees that the following autoregressive (AR) representation of X holds in the mean square sense where (t) = 1 (t) T . . . K (t) T T stands for a zero mean innovation process, i.e., E (t) (t) T = and E (t) (l) T = 0 for l = t. For l ≥ 1, A(l) are (M 1 + . . . + M K ) 2 -dimensional matrices. Let A pq (l) for p, q ∈ {1, . . . , K} and l ≥ 1 be M p × M qdimensional matrices so that A(l) has the following structure 1ωl . Under the boundedness condition, the following moving average (MA) mean square sense representation for the process X also holds where H(l) for l ≥ 0 are (M 1 + . . . + M K ) 2 -dimensional matrices. LetH(ω) = l ≥ 0 H(l)e − √ −1ωl . We have thatĀ * (ω) = H −1 (ω) for all ω ∈ [−π, π). The superscript * indicates the matrix complex conjugate. Let P(ω) = S −1 (ω). bPDC from the multivariate process X j to the process X i at frequency ω, denoted π (b) ij (ω), is defined (Takahashi, 2009;Faes and Nollo, 2013) by where det indicates the determinant and the subscript indices relate to the natural block structure associated with the matrices. Let = −1 . bDC from the multivariate process X j to the process X i at frequency ω, denoted γ (b) ij (ω), is defined (Takahashi, 2009;Faes and Nollo, 2013) by Note that the present bDC definition differs slightly from the one in Faes and Nollo (2013). We removed the unnecessary condition of strict causality, i.e., diagonality of , simply by substituting −1 jj by −1 jj in their definition of bDC as it is more suited for formulating information theoretic results as shown ahead.
Consider a second-order stationary multivariate process W(t) = Y(t) T Z(t) T T . The block coherence between Y and Z at frequency ω is defined as (Nedungadi et al., 2011) Observe that we used the process name in the subscript of the power spectrum S to indicate the corresponding spectral density matrices. In the rest of the article, we will use interchangeably the process name or the corresponding indices in the subscript whenever there is no ambiguity. Another important definition is that of mutual information rate (MIR) between two multivariate strictly stationary processes Y and Z is The classical relationship between block coherence (Equation 5) and mutual information rate (Equation 6) follows from Theorem. (Gelfand and Yaglom, 1959;Pinsker, 1964) If Y and Z are jointly stationary Gaussian processes satisfying the boundedness condition, we have that the MIR between Y and Z is given by Now, following Takahashi et al. (2010), we define, for i ∈ {1, . . . , K}, the partialized process η i by where E[ | ] henceforth denotes the best linear conditional predictor. Likewise the partialized innovation process Observe that both partialized process and partialized innovation process were defined in Takahashi et al. (2010) but for the special case of scalar η i and ζ i .

RELATION BETWEEN bPDC/bDC AND MUTUAL INFORMATION RATE
Our first result establishes the relationship between bPDC and block coherence and is analogous to Theorem 1 in Takahashi et al. (2010).
Theorem 1. Let X satisfy the boundedness condition. For all i, j ∈ {1, . . . , K} and all frequencies ω ∈ [−π, π) we have that A straightforward corollary is Corollary 1. Let X be a stationary Gaussian process and satisfy the boundedness condition. For all i, j ∈ {1, . . . , K} we have that Similar results also hold for bDC.

Theorem 2. Let X satisfy the boundedness condition. For all
and Corollary 2. Let X be a stationary Gaussian process and satisfy the boundedness condition. For all i, j ∈ {1, . . . , K}, we have that

CANONICAL PDC AND DC
Canonical correlation is a classical method developed initially by Hotelling (1936) to address the relationship between random vectors. Brillinger (1981) generalized the method for time series and gave an excellent account of the relationship between canonical correlation analysis and different ideas in multivariate statistics. Our formulation of canonical coherence is equivalent to the definition introduced by Brillinger (1981). Let Y and Z be respectively M Y -and M Z -dimensional jointly second order stationary processes. To better understand the relationship between Y and Z, we can ask the following question: Which components of Y and Z are most representative of the interaction between the processes? One way to formalize this is to consider filtering matrices B Y (l) ( 1 × M Y ) and B Z (l) ( 1 × M Z ), for all l ∈ Z and define the scalar processes b Y and b Z by and π). If furthermore Y and Z are jointly stationary Gaussian processes, then this is equivalent to maximizing MIR b Y b Z . Following the above idea, we define the first canonical coherence between Y and Z at frequency ω by Assume that the supremum (Equation 16) is achieved forb Y andb Z , which we call first canonical time series. Consider the residual processes Observe that Y 1 and Z 1 are uncorrelated to the processesb Y andb Z , respectively. The second canonical coherence C (c 2 ) YZ (ω) is defined recursively on the residues by C (c 2 ) one may define the m-th canonical coherence as In this way, it is possible to construct a hierarchy of coherences where each element captures the dependence structure that is not explained by the other elements. Finally, we introduce cPDC and cDC. For Similarly, the m-th canonical DC from j to i at frequency ω At first sight, it is unclear whether the canonical PDC and DC exist at all or even if they are uniquely defined. More importantly, nor is it obvious that it is possible to compute them. Despite these initial uncertainties, we show next that canonical coherences are consistently defined as the non-null eigenvalues of some specific matrices.

Frontiers in Neuroinformatics www.frontiersin.org
Let λ m (Q) denote its m-th eigenvalue from matrix Q ordered from its largest to its smallest value. The following theorem furnishes a practical way to calculate cPDC and cDC.

Theorem 3. Under the boundedness condition for X the following identities hold:
and Furthermore it is possible to relate bPDC/bDC and cPDC/cDC via Theorem 4. Under the same conditions of Theorem 3 the following identities hold: and A simple consequence of Equations (22), (23) is that for stationary Gaussian processes satisfying the boundedness condition, we now have a decomposition of the mutual information rates (24) and Note how the quantities being summed in Equations (24), (25) are formally themselves contributions to the mutual information written in terms of their canonical coherence contributions.

SIMULATED MODELS
Example 1. To provide insight into cPDC behavior, we begin with a very simple example that can be fully and explicitly solved. Let a vector of observed time series [Y 1 , Y 2 , Y 3 , Y 4 ] be a real valued autoregressive process of order p = 1 and = I. The autoregressive coefficients of the model are described by as in Figure 1. By adopting time series blocks as X , when e = f = g = h = 0, direct computation shows that the canonical PDC from block X 2 to X 1 is zero, i.e., π (c 1 ) 12 (ω) = π (c 2 ) 12 (ω) = 0 for all ω (reflecting the nullity of the 2 × 2 A(l) right side upper block), whereas the coupling in the opposite direction contributes two distinct components: and π (c2) 21 (ω) = a 2 + b 2 + c 2 + d 2 − (a 2 + b 2 + c 2 + d 2 ) 2 − 4(ad − bc) 2 2.5 − 2 cos (ω) . (28) For ad = bc-i.e., if the lower left 2 × 2 block determinant of A(l) is zero as well, the total number of non-zero cPDC components reduces to just 1.
Even if e, f , g, h are non-zero, i.e., regardless of intrablock dynamics, a = b = 0 suffices to produce the single non-zero π (c 1 ) 21 (ω) component (shown in Figure 2A) since block X 1 interacts with block X 2 exclusively through Y 4 , i.e., π (c 2 ) 21 (ω) ≡ 0. In this case, since only Y 4 is directly impacted by the interaction, only one combined source of variance exists even though two links exist between the blocks. Likewise if b = d = 0, even though two links leave X 1 , there is only one dynamical component that counts.
This contrasts with the situation when b = c = 0 where two non-zero π (c 2 ) 21 (ω) coexist ( Figure 2B) regardless of the values of e, f , g, h which, nonetheless, contribute to the relative size of the components. Example 2. In the next example, a 10-variate time series (Y 1 , . . . , Y 10 ) follows the connectivity diagram represented in Figure 3. The multivariate time series is divided into four blocks (X 1 , X 2 , X 3 , and X 4 ), where X 4 only sends information and X 3 , which is an integrative block, only receives information. Block X 1 has two functionally distinct internal parts, and only one is reached by outside influence. The scenario is fairly complicated and we next illustrate cPDC/cDC usefulness for understanding the underlying dynamic interaction between blocks.  To help interpret the results, we begin by describing the nonzero model coefficients and their dynamical effects. Observe that the model subscript indices in this example indicate the corresponding scalar process and not the block number.
The resulting cPDC components can be appreciated in Figure 4 for |a| = 1. Among their interesting features is the existence of the notch filtered link from X 1 to X 2 and to X 3 at midband. The effects of the low frequency dynamics due to Y 1 and the midband resonance due to [Y 4 and Y 5 ] manifests itself as the strongest component from X 1 to X 3 . Likewise the single link effect from X 2 to X 3 is readily apparent as the higher frequency resonances from X 4 toward both X 1 and X 2 . Both X 3 components are identically equal to 1 since nothing leaves the block. The corresponding cDCs are portrayed in Figure 5 for a = −1 with no signal reachability from X 4 to X 3 . This contrasts markedly with Figure 6 for a = 1 where X 4 's indirect effects on X 3 are not balanced out.
The effects of the notch connections are readily apparent in both cases. For example, the power associated with the notch frequencies are the local components to X 2 and X 3 and cannot be attributed to outside influence. For block X 1 only one of the five components is different from 1 reflecting the contribution coming from X 4 .

EMPIRICAL DATA
This example is based on EEG data borrowed from Sameshima et al. (2014) (Ex. 7.7), which describes a left mesial temporal ictal episode monitored using an extended 10-20 system. The midline electrodes were excluded and left (L) and right (R) side electrodes were grouped as to whether they were frontal (F), central (C), parietal (P), temporal (T) or occipital (O) leading to the canonical PDCs portrayed in Figure 7 where the most important connecting blocks share a dominant low pass frequency canonical component of fairly identical shape pointing to the existence FIGURE 4 | The cPDC for Example 2 reflects the existence of the notch connecting filters from X 1 to X 2 and to X 3 . The intrinsic dynamics of the oscillators from a subregion of X 1 into X 3 is apparent in the resonances of the largest cPDC component. The resonance within block X 2 manifest itself in the single non-zero component into X 3 while the effect of X 4 reaches symmetrically into X 1 and X 2 via its single dynamic component. In this and following two figures, each subfigure may contain up to five cPDC/cDC components, given by min(M i , M j ) as in Equations (22)/(23), represented in red, blue, yellow, green, or black lines in decreasing order of magnitude. of a shared dominant connectivity dynamics behind the observation, see Figure 8A. Their connectivity is further summarized in Figure 8B.

DISCUSSION
We showed that bPDC/bDC introduced in Takahashi (2009) and Faes and Nollo (2013) are block coherences between properly chosen vector time series. When the time series are Gaussian, this implies that bPDC/bDC represent mutual information rates between well defined underlying vector time series. This fully generalizes the results presented in Takahashi et al. (2010). To enhance the understanding of the possibly complex interaction between multiple time series and overcome some bPDC/bDC limitations, we showed that the latter can be decomposed in canonical terms that we call cPDC/cDC. These decompositions represent the various different modes of interaction whereby sets of time series interact. We introduced an explicit way to compute these new quantities and proved some of their properties. The usefulness of cPDC/cDC was illustrated by three examples. Takahashi et al. (2010) showed that PDC from the j-th scalar time series to the i-th scalar time series is the coherence between the i-th innovation process and the j-th partialized process with a similar result for DC. It is natural to ask whether an analogous result holds for bPDC and bDC. We showed that this is indeed the case where bPDC/bDC represent block coherences relating subsets of adequately defined innovations/partialization processes (Takahashi, 2009;Nedungadi et al., 2011). At first sight these identities may seem surprising as both bPDC and bDC are fully multivariate and directional measures of dependence, whereas block coherences are at once block-pairwise and symmetric measures of dependence. Yet careful reading of Theorems 1 and 2 highlights that bPDC/bDC from j to i and bPDC/bDC from i to j are, in general, block coherences between distinct pairs of vector processes which explains their asymmetric nature and lends them their directed connectivity character. Also, we note that for both bPDC and bDC, the coherences involve innovation process subsets which explains their fully multivariate characteristic as measures. Another interesting observation is that since the innovation processes are uncorrelated to the past of the partialized processes by construction, in the case of bPDC only innovations in the past of the partialized process contribute to the coherence which explains why bPDC is a directed measure of dependence. An analogous observation holds for bDC. In the Gaussian case, the bPDC/bDC representation as a block coherence allows relating them to the mutual information rate between suitably chosen time series. Formally this justifies the idea that these quantities are de facto measures of information flow. For an interesting comparison between bPDC/bDC and Geweke's measure of linear feedback see Faes FIGURE 5 | cDC for Example 2 for a = −1 leading to a cancelation of the effect of X 4 on X 3 as the signal travels indirectly through two exactly identical structures but with opposite phases before reaching X 3 . The notch filtering action is also apparent from the cDCs from X 1 to X 2 and X 3 . and Nollo (2013). As a small note for the reader, we observe that our definition of bPDC/bDC is slightly more general than the one proposed by Faes and Nollo (2013) because the covariance matrix of the innovations does not need to be diagonal as they assumed.

CANONICAL DECOMPOSITION OF DIRECTIONAL MEASURES
Given a pair of random vectors, it is natural to ask how to measure/represent dependence between them. In statistics, there are two main methods, both inspired by the basic Pearson correlation, to address this. The first one generalizes Pearson correlation directly using the determinants of the covariance matrix between and within each set of random variables. For time series, the equivalent measure in the frequency domain is the block coherence and the directed versions are bPDC and bDC. A second generalization rests on the idea of canonical correlation introduced by Hotelling (1936). There are several generalizations of canonical correlation for time series taylored specifically to infer Granger causality in the time domain (Sato et al., 2010;Wu et al., 2011), but, to the best of our knowledge, cPDC and cDC are the first proposals of canonical measures of directed dependence in the frequency domain.
One advantage of cPDC/cDC over bPDC/bDC is that canonical decomposition allows inferring the various different existing modes of interaction between sets of time series in close analogy to what is done for classical canonical correlation and principal component analyses. One should expect this to be useful when several signals are redundant, generated by similar mechanisms, or when there are several time series that do not significantly contribute to the interaction between sets of time series, e.g., when there are many brain areas that are not interacting with each other during some specific behavior. Besides, as we show in Theorem 4, we can recover the bPDC/bDC from the cPDC/cDC.

INTERPRETING cPDC/cDC
The main practical interest of cPDC/cDC is to allow the simplification of connectivity interpretations whilst giving new insights into the dynamical interaction between neural structures. We illustrated the achievable simplification using an EEG data set from an epileptic patient. We also showed how cPDC is related to the number of "modes" of interaction between sets of time series through the simple numerical Example 1 and via the slightly more complex Example 2. We expect that cPDC/cDC together Notice some divergences between gPDC and cPDC graphs possibly due the lack of proper rigorous statistics usage for cPDC significance level estimation, for instance, there is cPDC from RO to LO (B), but gPDC O2 to O1 is absent (A), while there is gPDC from C4 to T1 without corresponding cPDC from RC to LT.
with bPDC/bDC become useful tools for handling high dimensional data sets that are increasingly being recorded by several researchers. We propose that a reasonable way to understand the usefulness of cPDC/cDC is to make an analogy with classical principal component and canonical correlation analyses. Therefore, similar heuristics could be applied in practical situations, for example, to decide the number of different components to include in the interpretation. The canonical time seriesb Y andb Z from section 4 (see also Brillinger, 1981) are analogous to the canonical variables from the classical canonical correlation analysis and can play a similar role for result interpretation.
Finally we remark that the computational procedures used for the present paper will be made available the PDC homepage at http://www.lcs.poli.usp.br/∼baccala/pdc/canon together with the data used in section 5.2.