A controlled transfer entropy approach to detect asymmetric interactions in heterogeneous systems

Transfer entropy is emerging as the statistical approach of choice to support the inference of causal interactions in complex systems from time-series of their individual units. With reference to a simple dyadic system composed of two coupled units, the successful application of net transfer entropy-based inference relies on unidirectional coupling between the units and their homogeneous dynamics. What happens when the units are bidirectionally coupled and have different dynamics? Through analytical and numerical insights, we show that net transfer entropy may lead to erroneous inference of the dominant direction of influence that stems from its dependence on the units' individual dynamics. To control for these confounding effects, one should incorporate further knowledge about the units' time-histories through the recent framework offered by momentary information transfer. In this realm, we demonstrate the use of two measures: controlled and fully controlled transfer entropies, which consistently yield the correct direction of dominant coupling irrespective of the source's and target's individual dynamics. Through the study of two real-world examples, we identify critical limitations with respect to the use of net transfer entropy in the inference of causal mechanisms that warrant prudence by the community.


Introduction
Quantifying causal interactions among the units of a complex system is critical to understanding the inner workings of a variety of physical, biological, social, and engineering systems (Pearl et al 2000, Spirtes et al 2000, Bossomaier et al 2016, Pilkiewicz et al 2020). Researchers have developed, and are still exploring, different methods for this purpose. The notion of observational causality represented by localized information flow between time-series was first proposed by Wiener (1956) and later used by Granger (1969) in a linear regression framework, known as Granger causality. Despite its widespread use, Granger causality is unsuitable for capturing interactions that are inherently nonlinear, owing to its reliance on an underlying linear model of the time-series. This limitation can be overcome by using the concept of entropy (Shannon 1948), a statistical measure of uncertainty that is based on the complete probability density function. Shannon entropy lays the foundation of information theory, a framework for measuring the information content of random variables and information-based dependence between stochastic processes (Cover and Thomas 2005).
Through the lens of information theory, one can quantify interactions among the units of complex systems in wide-ranging areas of research, from animal collective behavior to climate science (Pilkiewicz et al 2020). By expanding on the classical concept of mutual information (Shannon 1948) that quantifies the shared information between two random variables, Schreiber (2000) introduced transfer entropy as a measure of the asymmetry in interaction between two coupled stochastic processes. Since its inception, transfer entropy has emerged as the prevalent choice for studying pairwise or dyadic interactions in a wide range of complex systems, for example in quantifying directional connectivity and inferring network topology in brain functioning (Staniek and Lehnertz 2008, Vicente et al 2011, Stetter et al 2012), identifying leadership behavior in groups and pairwise interactions between animals (Butail et al 2016, Lord et al 2016, Neri et al 2017, Shaffer and Abaid 2020, Valentini et al 2021), studying complex connections in climate science (Hlinka et al 2013, Campuzano et al 2018), inferring causality in stocks and finance (Sandoval Jr 2014, He and Shang 2017), and understanding causal influences in social media and human behavior (Borge-Holthoefer et al 2016, Porfiri et al 2019). To properly implement transfer entropy in causal analysis and infer qualities about the system, it is critical to rigorously define its methodological validity, a task that is yet to be fully undertaken by the community. In the past, Smirnov (2013) demonstrated that transfer entropy is unable to distinguish between direct and indirect influences, a weakness that was addressed by causation entropy (Sun and Bollt 2014). More recently, the work of James et al (2016) has indicated that transfer entropy results have often been interpreted incorrectly, misreading them as proxies of local information flow.
This issue is resolved by using a cryptographic flow ansatz (James et al 2018), which excludes synergistic information. These critiques highlight that transfer entropy must strictly be used only when studying the interaction between two units (not more) and that its conclusions should not be construed as information flow. Are these the only caveats in the use of transfer entropy? We argue that there are two unsolved, glaring methodological issues that are even more profound.
First, when applying transfer entropy to real datasets, there is no guarantee that the units are homogeneous in terms of their individual dynamics, such that one should generally assume to work with complex systems comprised of heterogeneous units. The units being homogeneous means that, in the absence of coupling, they would display indistinguishable dynamics, a case which is unlikely to occur in real-world applications. The validity of transfer entropy for quantifying interactions in heterogeneous systems is, however, not well understood. Second, in real data, there is no guarantee of unidirectional coupling between the units, such that one should in general be able to deal with bidirectional interactions. Not only did foundational studies on transfer entropy (Schreiber 2000, Kaiser and Schreiber 2002) focus on unidirectionally coupled units, but they also advised that 'only in the case of zero transfer entropy in one direction we can reliably infer an asymmetry of the information exchange' (Kaiser and Schreiber 2002). Thus far, the extent to which transfer entropy can be applied to study a system of bidirectionally coupled units, in which each exerts a different influence on the other, remains elusive.
More concretely, let us consider a pair of units, X and Y, for example, two interacting animals whose movements are being recorded. Should we compute a larger transfer entropy from X to Y than the other way around from our experimental data, we would be tempted to infer that the coupling from X to Y is stronger than the coupling from Y to X; in the context of social animals, we may even go a step further and propose X to be a leader and Y to be a follower. We argue that this proposition is, in fact, based on the assumption that the two units are identical in terms of their individual dynamics. Mathematically, we can demonstrate that the same ranking of transfer entropy values could be obtained in the very opposite scenario of stronger coupling from Y to X than from X to Y, provided that the two units are sufficiently different in their individual dynamics. With respect to social animals, we would incorrectly classify an individual as a leader because they move more erratically rather than because they exert a larger influence on the other individual. In this context, the objective of this study is twofold: 1) demonstrating where and why transfer entropy fails to correctly detect the dominant coupling direction between heterogeneous units; and 2) examining alternative measures that overcome the potential confounds of transfer entropy toward accurately assessing interactions between heterogeneous, bidirectionally coupled units.
Toward these objectives, we first study a discrete bivariate linear system for which we can analytically determine the closed forms of salient probability distributions and information-theoretic measures, without any data-based estimation. The system allows for direct manipulation of important features, such as coupling strengths and individual dynamics, in the form of linear coupling and autoregressive coefficients, respectively. We systematically explore the inference of transfer entropy against the true dominant coupling direction across a range of scenarios, spanning from nearly identical units to widely diverse units and from almost equal to dramatically different coupling strengths. We demonstrate that transfer entropy fails to accurately detect the asymmetry in coupling strength when the units are bidirectionally coupled and heterogeneous. Within the general framework of momentary information transfer (Pompe and Runge 2011), we examine two information-theoretic measures of interaction that control for this confounding effect in transfer entropy, which we refer to as 'controlled' and 'fully controlled' transfer entropies. We show that these measures record universally accurate inference of asymmetric interactions in heterogeneous systems, where transfer entropy fails. The robustness of these findings is further verified in multi-node and nonlinear systems. Two real-world examples are selected to study the performance of these measures and understand their advantages and limitations compared to transfer entropy. One is the physiological time variation of the heart rate (HR) and breath rate (BR) of a sleep apnea patient (Rigney et al 1993), which are interlinked through known cardio-respiratory processes (Riedl et al 2014, Krause et al 2017). The other is the pairwise interaction between naive and experienced pigeons flying a specific route over generations of a transmission chain (Sasaki and Biro 2017).
In these examples, we characterize the different types of errors incurred by transfer entropy in inferring the asymmetry of the interactions and demonstrate the advantage of employing controlled and fully controlled transfer entropies.

Information-theoretic tools
The uncertainty encoded in a random variable X is given by its Shannon entropy (Shannon 1948), defined as

H(X) = −∑_{x∈χ} p(x) log p(x),

where p(x) = p(X = x) is the probability of X attaining the value x in the set of all possible realizations χ.
Oftentimes, the negative logarithm of the probability of x is referred to as its surprisal or information content (Bossomaier et al 2016). The joint entropy of two random variables, X and Y, is given by

H(X, Y) = −∑_{x,y} p(x, y) log p(x, y).

The conditional entropy of X given Y, that is, the uncertainty encoded in X given the knowledge of Y, is

H(X|Y) = H(X, Y) − H(Y).

If the two random variables are not independent, there is some information shared between them. A measure of the shared information between X and Y is given by their mutual information, defined as

I(X; Y) = H(X) − H(X|Y) = ∑_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ],

which is a symmetric quantity, since I(X; Y) = I(Y; X). In the presence of a third random variable Z, which is, in principle, related to both X and Y, we can compute the conditional mutual information

I(X; Y | Z) = H(X|Z) − H(X|Y, Z).

Conditional mutual information removes the redundant information in Y and Z about X, while including the information that comes from knowing both Y and Z simultaneously (the so-called synergistic information; Bossomaier et al 2016, James et al 2016).
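These identities can be checked numerically. Below is a minimal plug-in sketch; the joint distribution of two binary variables is a made-up example, not data from this study:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability vector or array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                         # 0 log 0 = 0 by convention
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution of two binary variables (rows: x, columns: y).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)   # marginals

H_xy = entropy(p_xy)                     # joint entropy H(X, Y)
H_x, H_y = entropy(p_x), entropy(p_y)
H_x_given_y = H_xy - H_y                 # chain rule: H(X|Y) = H(X, Y) - H(Y)
I_xy = H_x - H_x_given_y                 # mutual information I(X; Y)

print(round(I_xy, 4))                    # 0.2781 bits
```

Swapping the roles of X and Y returns the identical value, consistent with the symmetry I(X; Y) = I(Y; X).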
Here and in what follows, we adopt the following operator precedence (high to low): 'comma,' 'semicolon,' and 'vertical bar.' These concepts can be extended to the study of coupled dynamical systems, in the form of discrete-time stationary random processes X = {X_n}_{n=1,2,...}, where n is the time index. In this vein, transfer entropy was introduced by Schreiber (2000) as a measure of asymmetric interactions between two coupled dynamical systems. Transfer entropy from source X to target Y is defined as the mutual information between the future state of Y (Y_{n+1}) and the past states of X up to l time-steps (X_n^{(l)} = (X_n, ..., X_{n−l+1})), conditioned on the past states of Y up to k time-steps (Y_n^{(k)} = (Y_n, ..., Y_{n−k+1})):

TE_{X→Y} = I(Y_{n+1}; X_n^{(l)} | Y_n^{(k)}).    (6)

Transfer entropy from X to Y represents the reduction in uncertainty in the prediction of the future of Y from its time-history due to additional knowledge about the time-history of X. This reduction of uncertainty can be considered as a cause-and-effect relationship between X and Y following Wiener's principle of causality (Wiener 1956). By construction, transfer entropy is a non-negative quantity, and it is bounded from above by the conditional entropy H(Y_{n+1} | Y_n^{(k)}). The upper bound can be utilized to normalize transfer entropy values,

TE′_{X→Y} = TE_{X→Y} / H(Y_{n+1} | Y_n^{(k)}),    (7)

thereby better gauging the strength of the interaction (Gourévitch and Eggermont 2007, Duan et al 2013, Shovon et al 2014). Computing transfer entropy with long time-histories (l and k large) requires the estimation of high-dimensional joint probability density functions, which in turn requires a large amount of data for accuracy. In many applications where data is limited, only one time-history of source and target (l = 1 and k = 1) is used for computing transfer entropy (Hlinka et al 2013, Borge-Holthoefer et al 2016, He and Shang 2017, Zhang et al 2018).
For quantifying the asymmetry of the interaction between X and Y, one can compute the net transfer entropy, or directionality index,

net TE_{X→Y} = TE_{X→Y} − TE_{Y→X},

whose sign is typically used to infer the dominant direction of interaction between the two processes. Such a quantity has been used to detect directional interactions in simulated models of coupled chaotic systems (Staniek and Lehnertz 2008), micro-blogging time-series of social collective phenomena (Borge-Holthoefer et al 2016), kinematic data of fluid-coupled airfoils (Zhang et al 2018), and movement data of birds (Valentini et al 2021).
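A minimal plug-in estimate of net transfer entropy for k = l = 1 can be sketched as follows; the binary processes, flip probability, and sample size are illustrative assumptions (Y noisily copies the past of X, so the true dominant direction is X → Y):

```python
import numpy as np
from collections import Counter

def plugin_te(src, tgt):
    """Plug-in estimate of TE_{src->tgt} (bits) with k = l = 1:
    TE = sum_{y+, y, x} p(y+, y, x) log2[ p(y+|y, x) / p(y+|y) ]."""
    trip = Counter(zip(tgt[1:], tgt[:-1], src[:-1]))
    pair = Counter(zip(tgt[1:], tgt[:-1]))
    cond = Counter(zip(tgt[:-1], src[:-1]))
    marg = Counter(tgt[:-1])
    n = len(tgt) - 1
    te = 0.0
    for (yp, y, x), c in trip.items():
        p_yp_yx = c / cond[(y, x)]            # p(y+ | y, x)
        p_yp_y = pair[(yp, y)] / marg[y]      # p(y+ | y)
        te += (c / n) * np.log2(p_yp_yx / p_yp_y)
    return te

rng = np.random.default_rng(1)
n = 200_000
x = rng.integers(0, 2, n)                     # X: fair binary noise
flip = rng.random(n) < 0.1                    # 10% chance to flip
y = np.empty(n, dtype=int)
y[0] = 0
y[1:] = np.where(flip[1:], 1 - x[:-1], x[:-1])  # Y noisily copies X's past

te_xy = plugin_te(x, y)
te_yx = plugin_te(y, x)
print(round(te_xy - te_yx, 2))                # net TE_{X->Y}; positive: X drives Y
```

For this construction the theoretical value is TE_{X→Y} = 1 − H_b(0.1) ≈ 0.53 bits and TE_{Y→X} = 0, so the net measure recovers the correct direction.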
When studying the interaction between two processes, it may be important to measure the dependence of each process on its own past, given the past of the other process. To quantify the individual dynamics of a process, we define an information-theoretic measure, the self-regulation entropy of Y given X, as

SE_{Y|X} = I(Y_{n+1}; Y_n^{(k)} | X_n^{(l)}),

which represents the amount of uncertainty in the future state of Y resolved by its own time-history, given knowledge of the time-history of X. Using Jensen's inequality (Royden 1988), one can show that, like transfer entropy, self-regulation entropy is non-negative.

Bivariate linear systems: inference using TE
For an analytical treatment of information-theoretic measures that avoids numerical confounds in the estimation of probability density functions, we study the following linear system with bidirectionally coupled units:

X_{n+1} = a X_n + b Y_n + W_n,
Y_{n+1} = (b + ϵ) X_n + (a + αϵ) Y_n + V_n,    (10)

where W and V are temporally uncorrelated Gaussian processes of zero mean and unit variance, that is, standard normal variables. This form can be assumed for any first-order, bivariate linear autoregressive process with uncorrelated Gaussian noise without loss of generality, upon normalizing all the salient variables by the variance of the noise. The autoregression coefficient a encapsulates the individual dynamics of X; similarly, a + αϵ captures the individual dynamics of Y. The parameters b and b + ϵ are the coupling strengths that capture the dependence of X on Y's past and that of Y on X's past, respectively. Therefore, ϵ quantifies the difference in the coupling strengths between the two processes, and, similarly, α represents the difference in their individual dynamics. For the bidirectional interaction, the dominant coupling direction is established from the difference in the two coupling strengths:

dominant direction: X → Y if |b + ϵ| > |b|, and Y → X if |b + ϵ| < |b|.    (11)

Our main objective is the accurate quantification of the asymmetric coupling between X and Y against confounding effects due to the heterogeneity of the system dynamics. The exact forms of the information-theoretic measures can be analytically derived for this system. Since a linear system driven by Gaussian noise remains Gaussian, the multivariate probability distributions of all the variables are Gaussian at all times. Therefore, knowing the mean and covariance, one can determine the joint probability density functions of the system variables.
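To make the setup concrete, the following sketch (with arbitrary sample parameter values, not values from the study) assembles the state matrix of equation (10), checks the stationarity condition ρ(A) < 1, and verifies that the series Σ = ∑_i A^i (A^i)ᵀ agrees with the solution of the discrete Lyapunov equation Σ = AΣAᵀ + I:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Arbitrary sample parameters (illustrative only).
a, b, alpha, eps = 0.4, 0.2, 2.0, 0.1
A = np.array([[a,       b              ],
              [b + eps, a + alpha * eps]])        # state matrix of the system
assert max(abs(np.linalg.eigvals(A))) < 1         # stationarity: rho(A) < 1

# Stationary covariance as a truncated series Sigma = sum_i A^i (A^i)^T ...
Sigma, term = np.eye(2), np.eye(2)
for _ in range(500):
    term = A @ term @ A.T
    Sigma += term

# ... cross-checked against the discrete Lyapunov equation Sigma = A Sigma A^T + I.
Sigma_lyap = solve_discrete_lyapunov(A, np.eye(2))
print(np.allclose(Sigma, Sigma_lyap))             # True
```

The Lyapunov route is preferable in practice, since the series converges slowly when ρ(A) approaches one.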
In this vein, we can derive expressions for the entropy and joint entropy of the variables in terms of the corresponding covariance matrices (Hahs and Pethel 2013, Novelli et al 2020): for a jointly Gaussian random vector Z of dimension d with covariance matrix Σ(Z),

H(Z) = (1/2) log[(2πe)^d det Σ(Z)].

Transfer entropy can then be expressed as a ratio of conditional variances,

TE_{X→Y} = (1/2) log[ Var(Y_{n+1} | Y_n) / Var(Y_{n+1} | Y_n, X_n) ],    (13)

where the conditional variances follow, via Schur complements, from the joint covariance matrix of (Y_{n+1}, Y_n, X_n). Note that only one time-history (k = 1, l = 1) is required for measuring coupling in the first-order system under consideration in equation (10). The covariance matrix for a linear autoregressive process (see Methods) can be expressed as an infinite series of powers of the state matrix A, Σ = ∑_{i⩾0} A^i (A^i)ᵀ, whose convergence is ensured by asymptotic stability. Thus, we only consider parameter values that satisfy the stationarity condition ρ(A) < 1, where ρ( · ) is the spectral radius of a matrix. Under the assumptions of Gaussianity of the noise and stability of the system, all the information-theoretic measures can be expressed as explicit functions of the four parameters a, b, α, and ϵ. Net transfer entropy is used to measure coupling asymmetry. To test the validity of this measure, we first examine the leading order terms in a Maclaurin series expansion of net transfer entropy from X to Y for a small difference in coupling strength (ϵ → 0). Note that the leading order expansion is only used to interpret the main contributing terms and not for the remaining analysis in this work, for which the exact expression of net transfer entropy (equation (13)) is used. The zeroth order term is zero, which implies that if ϵ = 0 (equally coupled units), net TE_{X→Y} = 0, in agreement with the true coupling direction given in equation (11). However, the first-order term depends on α, which reflects the dependence of transfer entropy on the individual dynamics of the processes in question. If α is large enough (for a given choice of a and b), net TE_{X→Y} will change sign, leading to an incorrect inference of the dominant coupling direction. This behavior is illustrated in figure 1(a).
When X and Y have identical individual dynamics (α = 0), net TE_{X→Y} correctly identifies the dominant coupling direction; for a large enough α, net TE_{X→Y} is likely to yield incorrect inferences. The above-mentioned behavior is not limited to small differences in coupling strengths. For a higher ϵ, the variation of transfer entropy with the difference in individual dynamics, α, is shown in figure 1(b). While TE_{X→Y} is nearly constant with α, TE_{Y→X} is strongly dependent on α and becomes greater than TE_{X→Y} above a critical α. As a result, the sign of net TE_{X→Y} (figure 1(c)) switches to negative, failing to detect the dominant direction of coupling (X → Y for the given case with |b + ϵ| > |b|). If the difference in coupling strength is even larger (ϵ > a, b), as shown in figures 1(e) and (f), net transfer entropy can detect the asymmetry in coupling incorrectly at high enough α. The same behavior is observed across wide ranges of a, b, and ϵ values. For example, net transfer entropy infers the direction of coupling incorrectly even when the source has stronger individual dynamics than the target (α < 0) and/or when the difference in coupling strength is very small (ϵ ≪ 1), as shown in supplementary information: S1. Even normalized transfer entropy, given by equation (7), fails to provide accurate inference above a certain α (see supplementary information: S2).
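The sign flip described above can be reproduced numerically. The sketch below (the parameter values a = 0.4, b = 0.2, ϵ = 0.1 are illustrative, not the figure's) evaluates the exact Gaussian transfer entropies from conditional variances; with homogeneous units (α = 0) net TE_{X→Y} is positive, as it should be, while at α = 4 its sign flips even though the coupling X → Y remains the stronger one:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def gaussian_te(A, src, tgt):
    """Exact TE_{src->tgt} (nats) for Z_{n+1} = A Z_n + N(0, I)."""
    Sigma = solve_discrete_lyapunov(A, np.eye(2))   # stationary covariance
    C1 = A @ Sigma                                  # C1[i, j] = Cov(Z_{n+1,i}, Z_{n,j})
    def cond_var(cond):                             # Var(tgt_{n+1} | Z_n[cond])
        cross = C1[tgt, cond]
        S = Sigma[np.ix_(cond, cond)]
        return Sigma[tgt, tgt] - cross @ np.linalg.solve(S, cross)
    return 0.5 * np.log(cond_var([tgt]) / cond_var([tgt, src]))

def net_te(a, b, eps, alpha):
    A = np.array([[a, b], [b + eps, a + alpha * eps]])
    return gaussian_te(A, src=0, tgt=1) - gaussian_te(A, src=1, tgt=0)

print(net_te(0.4, 0.2, 0.1, alpha=0.0) > 0)   # True: homogeneous units, correct sign
print(net_te(0.4, 0.2, 0.1, alpha=4.0) > 0)   # False: heterogeneous units, sign flips
```

Here the true dominant direction is X → Y in both calls (|b + ϵ| = 0.3 > |b| = 0.2); only the heterogeneity α changes between them.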
To further investigate the variation of individual dynamics, self-regulation entropy is plotted as a function of α in figures 1(d) and (g) for two sample cases. The self-regulation entropy of X given Y does not vary with α, which should be expected since the X equation does not depend on α, while the self-regulation entropy of Y given X depends on α. There is a striking resemblance between the behaviors of TE_{Y→X} and SE_{Y|X}, and between those of TE_{X→Y} and SE_{X|Y}, suggesting that transfer entropy is related to the individual dynamics of its source. In the context of neuroscience, a previous study has shown that estimates of Granger causality (equivalent to transfer entropy for a linear Gaussian process) are independent of the receiver (target) dynamics but depend on the transmitter (source) dynamics. This dependence of transfer entropy on the individual dynamics of its source is the reason why net transfer entropy infers an incorrect direction of dominant coupling when the individual dynamics of the two variables are sufficiently different, as made clear in what follows.

Controlled and fully controlled transfer entropies
In order to avoid the spurious quantification of coupling direction in heterogeneous first-order systems, like in equation (10), we employ the concept of momentary information transfer (Pompe and Runge 2011). Through this concept, we derive accurate measures of asymmetric interactions that faithfully represent the coupling strengths (b and b + ϵ in equation (10)) and do not depend on the difference in individual dynamics (α in equation (10)). Momentary information transfer, as proposed by Pompe and Runge (2011), is a measure of causal association that is based on the concept of source entropy, which captures the information shared between two units at a given moment only, not present in their joint history. Momentary information transfer between X and Y can be written as

MIT_{X→Y} = I(Y_{n+1}; X_n | Y_n^{(k)}, X_{n−1}^{(l)}).

In the framework of graphical models, the authors showed that the conditioning in momentary information transfer offers better detection of coupling strengths and delays. This measure has been used in the past to study problems in the fields of climatology and neuroscience (Pompe and Runge 2011, Runge et al 2012, Wibral et al 2013). Our goal is to quantify the interaction between the future state of Y and the past state of X; Y_{n+1} depends on (X_n, Y_n), while X_n depends on (X_{n−1}, Y_{n−1}). TE_{X→Y} (equation (6)) only accounts for the former two dependencies, without considering the effect of the latter two. As illustrated in figure 2(a), TE_{X→Y} is the mutual information between the past state of the source (X_n) and the future state of the target (Y_{n+1}), conditioned on the past state of the target (Y_n). While this controls for the individual dynamics of the target Y, it does not account for the individual dynamics of the source X. Momentary information transfer accounts for this effect, as illustrated below.
To control for the effect of the individual dynamics of the source, we condition transfer entropy on the additional past state of the source, X_{n−1}, defining controlled transfer entropy (figure 2(b)) as

TE^{C1}_{X→Y} = I(Y_{n+1}; X_n | Y_n, X_{n−1}),

which is the same as momentary information transfer for k = 1 and l = 1. In this manner, controlled transfer entropy from X to Y represents the reduction in the uncertainty of the future of Y given its time-history, due to additional knowledge about the time-history of X given its own time-history further in the past.
To completely control for all the dependencies in a first-order Markov chain, as illustrated in figure 2(c), we can condition on the additional past state of the target, Y_{n−1}, defining fully controlled transfer entropy as

TE^{C2}_{X→Y} = I(Y_{n+1}; X_n | Y_n, Y_{n−1}, X_{n−1}),

which is equivalent to momentary information transfer for k = 2 and l = 1. Thus, fully controlled transfer entropy from X to Y represents the reduction in the uncertainty of the future of Y due to additional knowledge about the time-history of X, given the joint time-history of X and Y going back until the same time instant. The idea is that controlled and fully controlled transfer entropies quantify the coupling strength, or magnitude of causal interaction, between the past state of the source and the future state of the target, independent of their individual dynamics. This is demonstrated analytically for linear systems and numerically for nonlinear systems in subsequent sections of this study.
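Both measures are conditional mutual informations and, for the Gaussian system of equation (10), can be evaluated from log-determinants of sub-blocks of the stacked covariance of (Z_{n+1}, Z_n, Z_{n−1}). In the sketch below (the values a = 0.4, b = 0.2, ϵ = 0.1, α = 4 are assumptions for illustration, not values from the study), the heterogeneity is large enough that plain net transfer entropy turns negative, while net controlled and net fully controlled transfer entropies remain positive, matching the true dominant direction X → Y:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

a, b, eps, alpha = 0.4, 0.2, 0.1, 4.0            # illustrative parameters
A = np.array([[a, b], [b + eps, a + alpha * eps]])

# Covariance of the stacked vector [X_{n+1}, Y_{n+1}, X_n, Y_n, X_{n-1}, Y_{n-1}];
# Cov(Z_{n+k}, Z_n) = A^k Sigma for k >= 0.
Sigma = solve_discrete_lyapunov(A, np.eye(2))
cov = np.zeros((6, 6))
for i in range(3):                                # block row i holds time n+1-i
    for j in range(3):
        P = np.linalg.matrix_power(A, abs(i - j)) @ Sigma
        cov[2*i:2*i+2, 2*j:2*j+2] = P if i < j else P.T if i > j else Sigma

def cmi(a_idx, b_idx, c_idx):
    """Gaussian CMI I(a; b | c) in nats from the joint covariance `cov`."""
    def ld(idx):
        return np.linalg.slogdet(cov[np.ix_(idx, idx)])[1] if idx else 0.0
    return 0.5 * (ld(a_idx + c_idx) + ld(b_idx + c_idx)
                  - ld(c_idx) - ld(a_idx + b_idx + c_idx))

# Indices: 0 X_{n+1}, 1 Y_{n+1}, 2 X_n, 3 Y_n, 4 X_{n-1}, 5 Y_{n-1}.
net_te = cmi([1], [2], [3])       - cmi([0], [3], [2])        # plain TE
net_c1 = cmi([1], [2], [3, 4])    - cmi([0], [3], [2, 5])     # controlled TE
net_c2 = cmi([1], [2], [3, 4, 5]) - cmi([0], [3], [2, 4, 5])  # fully controlled TE
print(net_te < 0, net_c1 > 0, net_c2 > 0)        # TE misleads; TE_C1, TE_C2 do not
```

For this unit-noise system one can also check that TE^{C2}_{X→Y} reduces exactly to ½ log(1 + (b + ϵ)²), so its net value depends only on the coupling parameters.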

Bivariate linear systems: inference by controlled and fully controlled transfer entropies
Similar to transfer entropy (equation (13)), for a bivariate autoregressive process of the form given in equation (10), controlled and fully controlled transfer entropies can be expressed as functions of the covariance matrix, which is itself a function of the coefficients a, b, α, and ϵ (see Methods). First, we test the performance of net controlled transfer entropy and net fully controlled transfer entropy for a small difference in coupling strength (ϵ → 0) by examining their first order Taylor series expansions. The zeroth order term of net TE^{C1}_{X→Y} is zero, similar to that of net TE_{X→Y}, implying that net TE^{C1}_{X→Y} = 0 if the coupling strengths between X and Y are identical in both directions (ϵ = 0). However, unlike the case of net TE_{X→Y}, the first order term of net controlled transfer entropy does not depend on α: it is simply a function of a, b, and ϵ. Similarly, the first order Taylor series expansion of net fully controlled transfer entropy is also zero when the coupling strengths are identical in both directions (ϵ = 0), and its first order term is independent of both α and a, being purely a function of the coupling parameters b and ϵ, suggesting that TE^{C2}_{X→Y} is likely the most suitable measure of coupling asymmetry. For a higher difference in coupling strength, ϵ, the variation of controlled and fully controlled transfer entropies with α is illustrated in figure 3 for two sample cases. It is evident that controlled and fully controlled transfer entropies are both independent of α. This indicates that the additional conditioning successfully controls for the confounding effect of the individual dynamics of the source observed in transfer entropy. As a result, both new measures correctly detect the dominant direction of coupling, even when the source and target have very different individual dynamics, as shown in figures 3(c) and (f).
Similar behavior is observed at wide-ranging values of a, b, and ϵ (supplementary information: S3).
Next, we test the inference of these measures for different values of a and b, for a given ϵ and α, as shown in figures 4(a)-(c). Only the (a, b) domain that satisfies the stability condition ρ(A) < 1 is considered. In the (a, b) plane, the true direction of dominant interaction for ϵ > 0 (equation (11)) is X → Y above the b = −ϵ/2 line and Y → X below this line. Therefore, net TE_{X→Y} incorrectly reflects the dominant direction of coupling in the top-right and bottom-left regions of the plane. On the contrary, both net TE^{C1}_{X→Y} and net TE^{C2}_{X→Y} capture the correct asymmetry of interaction in the entire (a, b) domain considered. Note that there is only a small difference in the contours of the two new measures.
To illustrate the behavior for different ϵ and α, in figure 4(d) we plot the percentage of the (a, b) domain for which the measures infer the asymmetry of interaction incorrectly, as a function of α. It is evident that net TE_{X→Y} is accurate at any a, b, and ϵ value if the difference in the individual dynamics is less than the difference in coupling strength (α ⩽ 1). If α > 1, the percentage of incorrect inferences shows a logarithmic increase with α, nearly independently of ϵ. This suggests that if the difference in individual dynamics of source and target is greater than that of the coupling strengths, net TE_{X→Y} may incorrectly detect an asymmetric interaction, and is more likely to do so as the difference in individual dynamics grows. On the other hand, net TE^{C1}_{X→Y} and net TE^{C2}_{X→Y} quantify the asymmetric interaction correctly at all values of a, b, α, and ϵ. The three measures in question have specific behaviors in the limiting cases of the system (see supplementary information: S4). Transfer entropy, along with controlled and fully controlled transfer entropies, correctly provides the dominant direction of coupling if the processes are unidirectionally coupled (b = 0, ϵ ≠ 0) and/or if they are homogeneous in terms of individual dynamics (α = 0). If the processes are bidirectionally coupled with equal coupling strengths in both directions (ϵ → 0) but have different individual dynamics (αϵ ≠ 0), net TE_{X→Y} falsely shows a dominant direction of coupling, while net TE^{C1}_{X→Y} and net TE^{C2}_{X→Y} are exactly equal to zero, correctly representing the interaction. Scaling up the system to a fully connected network with a single X node and multiple Y nodes, we observed that both net TE^{C1}_{X→Y} and net TE^{C2}_{X→Y} accurately reflect the asymmetry of the X-Y interaction with a zero percentage of error (see supplementary information: S5). On the other hand, net TE_{X→Y} renders a higher percentage of incorrect inferences as the system is extended to three nodes or more.
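The critical heterogeneity at which plain net transfer entropy changes sign can be located numerically. In the sketch below (illustrative parameters again, not the figure's), net TE_{X→Y} is swept over α; for the unit-variance-noise system one can also check that the net fully controlled transfer entropy reduces to ½ log[(1 + (b + ϵ)²)/(1 + b²)], which is independent of α and always carries the sign of the true coupling asymmetry:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def gaussian_te(A, src, tgt):
    """Exact TE_{src->tgt} (nats) for Z_{n+1} = A Z_n + N(0, I)."""
    Sigma = solve_discrete_lyapunov(A, np.eye(2))
    C1 = A @ Sigma                               # lag-1 cross-covariance
    def cond_var(cond):
        cross = C1[tgt, cond]
        S = Sigma[np.ix_(cond, cond)]
        return Sigma[tgt, tgt] - cross @ np.linalg.solve(S, cross)
    return 0.5 * np.log(cond_var([tgt]) / cond_var([tgt, src]))

a, b, eps = 0.4, 0.2, 0.1                        # illustrative parameters
alphas = np.linspace(0.0, 4.5, 46)
net = []
for al in alphas:
    A = np.array([[a, b], [b + eps, a + al * eps]])
    assert max(abs(np.linalg.eigvals(A))) < 1    # stay in the stationary regime
    net.append(gaussian_te(A, 0, 1) - gaussian_te(A, 1, 0))
net = np.array(net)

alpha_crit = alphas[np.argmax(net < 0)]          # first alpha where TE misleads
net_c2 = 0.5 * np.log((1 + (b + eps)**2) / (1 + b**2))  # alpha-independent
print(alpha_crit, net_c2 > 0)
```

Whatever the critical α turns out to be for a given (a, b, ϵ), the fully controlled measure is unaffected by the sweep.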
This indicates that controlled and fully controlled transfer entropies accurately measure the pairwise asymmetry of interaction even within a multi-node network, where transfer entropy may fail to quantify such bidirectional coupling.

Real-world examples
The ability of the three measures (transfer entropy, controlled transfer entropy, and fully controlled transfer entropy) to accurately assess asymmetric interactions is now tested on two real-world examples.

Physiology of a sleep apnea patient
The first example is a bivariate time-series of the HR and BR of a patient suffering from sleep apnea. The data is obtained from the Santa Fe Institute time-series contest held in 1991 (Rigney et al 1993), and it has been previously used to investigate the interdependence of cardiac and respiratory oscillations in the human body using a variety of measures, including information-theoretic ones (Schreiber 2000, Kaiser and Schreiber 2002, Hidaka 2012). The information-theoretic measures are calculated using kernel density estimation with a uniform kernel function and a bandwidth, or smoothing parameter, r (Silverman 1986). Note that the entire dataset of 34 000 points is used in the present analysis, unlike Schreiber (2000), in which only a particular set of 1200 data points was selected for use. Transfer entropy, controlled transfer entropy, and fully controlled transfer entropy from HR to BR and vice versa are plotted for different r in figures 5(c)-(e); low values of r imply a higher resolution in the estimation of the probability density functions. As r decreases, the value of these measures increases until it reaches a peak, followed by a decrease due to the finite sample size of the data and increasing sampling errors (the peak of transfer entropy occurs at a bandwidth lower than the plotted range). On the other hand, as r increases and the density estimation becomes coarser, the system dynamics are not fully captured, resulting in a collapse of the information-theoretic measures in the two directions (HR → BR and BR → HR). Therefore, to compare the inference of all three measures for the given sample size, the reliable range of r lies approximately between 0.05 and 0.2.
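The bandwidth dependence described above is generic to density-based estimators. As a simple stand-in (using histogram bin width in place of the kernel bandwidth r, and hypothetical correlated Gaussian data rather than the HR/BR recordings), the sketch below shows how too coarse a resolution collapses an information-theoretic measure, here plug-in mutual information, whose true value is −½ log(1 − ρ²):

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 50_000, 0.8
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)   # Corr(x, y) = rho
true_mi = -0.5 * np.log(1 - rho**2)                      # about 0.51 nats

def plugin_mi(x, y, bins):
    """Plug-in mutual information (nats) from a 2D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log(pxy[mask] / np.outer(px, py)[mask]))

mi_coarse = plugin_mi(x, y, bins=2)    # resolution too low: dependence collapses
mi_fine = plugin_mi(x, y, bins=16)     # adequate resolution: closer to the truth
print(mi_coarse < mi_fine < true_mi + 0.05)
```

Pushing the resolution far beyond what the sample size supports would instead inflate the estimate through sampling bias, mirroring the small-r behavior discussed above.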
In the appropriate bandwidth range, transfer entropy detects a stronger coupling from HR to BR than the other way around, in agreement with Schreiber (2000). This is in contrast with controlled transfer entropy, which points to a bidirectional interaction between HR and BR with equivalent coupling in both directions, and with fully controlled transfer entropy, which suggests a dominant direction of interaction from BR to HR.

Interaction between naive and experienced pigeons
As a second example, we test the performance of the different information-theoretic measures on the interaction between naive and experienced homing pigeon pairs during their flight between a specific release site and their home loft. The data is obtained from an experimental study (Sasaki and Biro 2017) conducted by pairing experienced birds from the previous generation with naive ones of the present generation, to investigate the accumulation of collective intelligence leading to the improvement of flying routes over generations (figure 6(a)). To study the interaction between the birds, the information-theoretic measures are applied to a symbolized time-series in which the clockwise and counter-clockwise turning of each bird at each instant during flight is encoded as 0 and 1, respectively.
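A minimal sketch of this symbolization step (on a hypothetical smooth trajectory, not the actual GPS tracks) turns successive headings into a binary sequence of turn directions:

```python
import numpy as np

# Hypothetical 2D trajectory built from a slowly wandering heading angle.
rng = np.random.default_rng(0)
angle = np.cumsum(rng.normal(0, 0.3, 500))
xy = np.cumsum(np.column_stack([np.cos(angle), np.sin(angle)]), axis=0)

# Heading of each displacement, then the wrapped turning angle between headings.
heading = np.arctan2(np.diff(xy[:, 1]), np.diff(xy[:, 0]))
turn = np.angle(np.exp(1j * np.diff(heading)))   # wrapped to (-pi, pi]
symbols = (turn > 0).astype(int)                 # 1: counter-clockwise, 0: clockwise
```

The complex-exponential trick wraps heading differences into (−π, π], so a crossing of the ±π boundary is not misread as a large turn.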
A recent implementation of transfer entropy-based inference by Valentini et al (2021) has shown that in initial generations, naive birds (N) influenced the experienced birds (E) more strongly, while in later generations the experienced bird became the stronger source of influence. This behavioral variation across generations, inferred on the basis of transfer entropy, is depicted in the schematic diagram of figure 6(a) and illustrated in the boxplot of net normalized transfer entropy in figure 6(b). The Theil-Sen (TS) slope (Theil 1950, Sen 1968) is used to estimate a linear trend in the median of the information-theoretic measures with changing generations, and its significance is tested by the closely related Kendall's τ test (Helsel et al 2020). The TS line for net normalized transfer entropy (TS slope = 0.146, p = 0.038) reflects a dominating influence of the naive bird on the experienced bird in the initial generations and a clearly increasing influence of the experienced bird on the naive bird with increasing generations, as previously demonstrated by Valentini et al (2021). These associations are not confirmed by normalized controlled transfer entropy (TS slope = 0.0668, p = 0.15) and normalized fully controlled transfer entropy (TS slope = 0.0681, p = 0.16), as shown in figures 6(c) and (d).
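Both the Theil-Sen estimator and Kendall's τ test are available in scipy.stats; the sketch below applies them to a hypothetical, made-up sequence of per-generation medians (not the study's values):

```python
import numpy as np
from scipy.stats import theilslopes, kendalltau

# Hypothetical per-generation medians of a net information measure.
generations = np.arange(2, 7)                    # generations 2..6 (paired flights)
median_net = np.array([-0.10, -0.02, 0.03, 0.08, 0.15])

# Theil-Sen: median of all pairwise slopes, robust to outlying generations.
slope, intercept, lo, hi = theilslopes(median_net, generations)
# Kendall's tau tests the monotone association underlying the trend.
tau, p_value = kendalltau(generations, median_net)
print(round(slope, 3), round(tau, 2))            # monotone increase
```

Because the Theil-Sen slope is a median of pairwise slopes, a single anomalous generation shifts it far less than it would shift an ordinary least-squares fit.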
This experimental study constitutes a rich collection of experimental data, with 12 releases of each pair of birds at every generation and 10 independent repetitions of the complete transmission chain, thereby offering the possibility to explore the reliability, in addition to the validity, of any of the considered measures.

(Figure 6 caption. (a) In generation 1, only a single pigeon was released 12 times; this pigeon was then paired with a naive pigeon in generation 2 and released 12 times, and so on. The red arrows indicate the direction and magnitude of TE_E→N and TE_N→E, while the blue arrows represent those of TE^C1_E→N or TE^C2_E→N and TE^C1_N→E or TE^C2_N→E, as inferred from the plots in (b)-(d). The interaction between E and N is illustrated by boxplots of (b) net TE′_E→N, (c) net TE^C1′_E→N, and (d) net TE^C2′_E→N, for each generation of birds. The Theil-Sen (TS) estimator is used to calculate the slope of a linear increase with generation, and Kendall's τ test is used to assess the statistical significance of this slope. TE shows a statistically significant increase in the interaction E → N with generation (slope = 0.146, p = 0.038); controlled TE (slope = 0.067, p = 0.15) and fully controlled TE (slope = 0.068, p = 0.16) do not confirm such an association.)

Figures 6(b)-(d) also show that net normalized transfer entropy is characterized by a higher variability (dispersion from the mean) within each generation as compared to net normalized controlled and fully controlled transfer entropies. To statistically appraise this difference in variability, we perform a Levene's test for homogeneity of variances across measures and generations (table 1). The test constitutes a split-plot design, or mixed-design ANOVA, of the absolute deviation from the mean, and evaluates whether the dispersion from the mean differs significantly between the information-theoretic measures and between the generations.
The Levene's test demonstrates that there is a significant difference in variability of net transfer entropy, net controlled transfer entropy, and net fully controlled transfer entropy. Net controlled and fully controlled transfer entropies are statistically indistinguishable in their variability (p = 0.167 in post-hoc comparison via pairwise t-test), while net transfer entropy is significantly higher in variability than the other two (p < 0.05 in post-hoc comparisons via pairwise t-tests). These results suggest that, although controlled and fully controlled transfer entropies require estimation of higher dimensional probability density functions than transfer entropy, they are more precise measures of asymmetric interactions.
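The core of the variance comparison can be sketched with `scipy.stats.levene`, which implements the one-way version of the test (the full split-plot design across generations would require a mixed-model package). The samples below are synthetic stand-ins, not the pigeon data:

```python
import numpy as np
from scipy.stats import levene

# Synthetic stand-ins for one generation of net measures; the wider spread
# mimics net TE, the tighter ones mimic the controlled variants.
rng = np.random.default_rng(1)
net_te = rng.normal(0.0, 0.30, 50)
net_c1 = rng.normal(0.0, 0.10, 50)
net_c2 = rng.normal(0.0, 0.10, 50)

# Levene's test: a one-way ANOVA on absolute deviations from each group mean.
stat, p = levene(net_te, net_c1, net_c2, center='mean')
print(f"W = {stat:.2f}, p = {p:.2e}")
```

With `center='mean'` this is the classical Levene statistic; `center='median'` would give the more outlier-robust Brown-Forsythe variant.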

Discussion
This study investigates the limitations of transfer entropy in the study of bidirectional interactions in heterogeneous systems. Specifically, we seek to test whether net transfer entropy between two units X and Y, defined as the difference between the transfer entropy from X to Y and the transfer entropy from Y to X, is indicative of the dominant coupling direction. We release two key assumptions that are typically made when pursuing transfer entropy-based causal inference: we do not assume that the units have equivalent individual dynamics, nor do we hypothesize unidirectional coupling between them. For this general case of bidirectional interactions within heterogeneous systems, we offer a critique of the use of transfer entropy. Our critique is, however, constructive, in that we examine two viable extensions of transfer entropy that minimize the confounding effects associated with bidirectional coupling and heterogeneity. Through the analytical treatment of a bivariate linear system, we demonstrate the following. While transfer entropy successfully infers the dominant coupling direction in the special case when the units are unidirectionally coupled and have the same individual dynamics, it fails to provide such an inference when the units are bidirectionally coupled and have different individual dynamics. In fact, we demonstrate that, if the units are bidirectionally coupled with equal strength in both directions (that is, when no dominant coupling direction exists), transfer entropy may falsely detect the existence of a dominant coupling direction. Such an erroneous inference stems from the dependence of transfer entropy on the individual dynamics of the source: if the individual dynamics of the source and the target are sufficiently different, transfer entropy will be prone to false inferences.
To control for this confounding effect, we apply the concept of momentary information transfer by Runge et al (2012) to condition on the past of the source, thereby yielding what we refer to as 'controlled transfer entropy', or on both the past of the source and the further past of the target, thereby leading to what we call 'fully controlled transfer entropy'. For the considered example, both these measures accurately quantify the asymmetry of coupling between the two units universally across different parameter regimes, irrespective of their individual dynamics.
Borrowing terminology from the discussion by Barrett and Barnett (2013) about the work of Hu et al (2011), our study contributes to classifying relationships between 'mechanism' and 'effect' in coupled dynamical systems. We show that, through the 'effect', we are able to detect qualitative aspects of the system's 'mechanism'. The argument of Barrett and Barnett (2013) that transfer entropy-like measures are invariant under scaling transformations, while regression coefficients (representing the 'mechanism') vary, does not undermine the critique made in this work. In fact, in our first example, we define the system's 'mechanism' for a unit variance of the noise, thereby offering a unique scaling rule for systems with arbitrary variance. This ensures that the 'mechanism' is defined by coupling strengths and individual dynamics that are fixed with respect to any scaling transformation of the system variables. Our results indicate that transfer entropy may fail to accurately identify the qualitative relationships between system variables, while controlled and fully controlled transfer entropy yield accurate predictions.
To investigate the promise of these alternative measures in the inference of dominant coupling directions in nonlinear systems, we explore two numerically simulated examples of bivariate nonlinear systems: 1) a system of coupled chaotic tent maps, and 2) a variant of the original bivariate linear system featuring a cubic nonlinearity (see supplementary information: S6 and S7). Our numerical results demonstrate that the analytical insights from the study of the linear systems carry over to the more complex instance of nonlinear dynamics. Specifically, we confirm that transfer entropy is inadequate for inferring the dominant coupling direction in heterogeneous systems. While controlled transfer entropy shows improved inference compared to transfer entropy, it may still fail under certain conditions. Fully controlled transfer entropy accurately quantifies the asymmetry in the coupling strength in all the cases considered.
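A minimal sketch of bidirectionally coupled tent maps follows. The coupling scheme (diffusive mixing before applying the map) and the slope slightly below 2 (to avoid finite-precision collapse of the bare tent map) are assumptions; the exact form used in supplementary S6 may differ:

```python
import numpy as np

MU = 1.9999  # slope slightly below 2 avoids finite-precision collapse to 0

def tent(x):
    # Chaotic tent map on [0, 1].
    return np.where(x < 0.5, MU * x, MU * (1.0 - x))

def simulate(eps_xy, eps_yx, n_steps=10_000, seed=2):
    # Bidirectionally coupled tent maps with diffusive mixing before the map;
    # eps_xy is the coupling strength X -> Y, eps_yx the strength Y -> X.
    rng = np.random.default_rng(seed)
    x, y = rng.random(2)
    xs = np.empty(n_steps)
    ys = np.empty(n_steps)
    for n in range(n_steps):
        x, y = (tent((1.0 - eps_yx) * x + eps_yx * y),
                tent((1.0 - eps_xy) * y + eps_xy * x))
        xs[n], ys[n] = x, y
    return xs, ys

# Asymmetric coupling: X drives Y more strongly than Y drives X.
xs, ys = simulate(eps_xy=0.3, eps_yx=0.1)
```

Since the pre-map mixing is a convex combination, both trajectories remain in [0, 1], so the same symbolization or binning can be applied to source and target.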
Alongside analytical and numerical examples, we examine real-world examples characterized by bidirectional interactions between heterogeneous units. The variation of the HR and BR of sleep apnea patients during sleep is oscillatory in nature and constitutes an ideal example of dynamically coupled oscillating processes. For a high bandwidth in the probability estimation, analogous to coarse binning, all of the considered information-theoretic measures indicate that the coupling strengths in the two directions are indistinguishable. Upon reducing the bandwidth (analogous to increasing the number of bins in the estimation), transfer entropy detects a dominant coupling direction from the HR to the BR, in disagreement with the other two measures. While controlled transfer entropy does not detect asymmetries in the system, fully controlled transfer entropy reveals a dominant coupling direction from the BR to the HR. The difference in the predictions of controlled versus fully controlled transfer entropy should be ascribed to the critical role of conditioning on additional time-history of the target, and not to numerical artifacts. The latter could arise from the challenges of estimating a higher-dimensional probability density function, but the presence of a clear peak in the measures at a lower resolution does not support this possibility. The accuracy of the inference made through fully controlled transfer entropy is supported by the literature on the coupling between HR and BR (Bartsch et al 2014), which identifies three distinct phenotypes: cardiorespiratory coordination (CRC) (Riedl et al 2014), cardiorespiratory phase synchronization (CRS) (Krause et al 2017), and respiratory sinus arrhythmia (RSA) (Krause et al 2017). CRC and CRS both represent a mutual influence between the HR and the BR, while RSA indicates a likely influence of the BR on the HR.
We further test the use of information-theoretic measures to quantify the interaction between naive-experienced pairs of homing pigeons during their flight in a multi-generational chain. In such multi-generational experiments, there is evidence of the transmission of non-genetic social information about routes that helps improve performance over generations in different organisms (Helfman and Schultz 1984, Sasaki and Biro 2017, Jesmer et al 2018). In each flight, the experienced bird has a route preference from its previous knowledge and will try to influence the naive bird, while the naive bird's exploratory behavior will affect the trajectory of the experienced bird, leading to route improvement (Sasaki and Biro 2017). Employing transfer entropy on the trajectories of the birds points to an increase in the coupling strength from the experienced bird to the naive bird as generations progress (Valentini et al 2021). However, neither controlled transfer entropy nor fully controlled transfer entropy offers statistical support in favor of such an inference. While this may suggest that the association revealed by transfer entropy could be a false positive finding, or a type I error, we urge prudence in the interpretation of our findings. In fact, one must be cautious in attributing the lack of a significant trend to a true independence of these two measures from the generation. The statistical analysis of this dataset further indicates that transfer entropy has a much higher variability (dispersion from the mean) than the other two measures, which could be a source of unreliability in statistical inference.
This work focuses on the quantification of coupling asymmetries between the units of bivariate systems by using net transfer entropy; it is not an overall critique of the use of entropy or of inference methods involving entropy. In the future, we will explore the applicability and performance of controlled and fully controlled transfer entropies in network systems. Preliminary results (see supplementary information: S5) for a fully connected network with different self-loops (individual dynamics) support our findings on dyadic interactions. Specifically, our results show that transfer entropy also fails to accurately detect coupling asymmetries in network systems, in contrast with controlled and fully controlled transfer entropies, which are found to be accurate. The rigorous extension of these measures to network systems will be tackled within the general causal discovery algorithm PCMCI of Runge et al (2019), which is based on nonlinear conditional independence tests. When attempting the extension to network systems, we will also seek to contemplate higher-order descriptions of the interaction beyond dyadic ones, such as simplicial complexes (Gambuzza et al 2021). Similar to our current analysis of dyadic interactions in two-dimensional systems, we will examine the relationship between the high-order 'mechanism' and the high-order 'effect/behavior' (Rosas et al 2022) of these high-dimensional network systems.
Overall, the contributions of this study are multifold: 1) we point to critical limitations of transfer entropy through the analytical and numerical treatment of bivariate linear and nonlinear systems; 2) we demonstrate the validity of two alternative information-theoretic measures based on the concept of momentary information transfer through analytical and numerical insights; and 3) we show that transfer entropy can yield false causal associations when applied to real data sets. The improvement bestowed by controlled and fully controlled transfer entropies with respect to traditional transfer entropy comes at the expense of estimating a higher-dimensional probability density function. When sufficient data are available to undertake this task, we recommend using these improved measures to mitigate confounding effects related to bidirectional interactions and heterogeneous dynamics. We urge prudence in the interpretation of transfer entropy-based inferences, which may lead to false positive and false negative associations.

Normalization of information-theoretic measures
Transfer entropy quantifies the reduction in the uncertainty of the target's future $Y_{n+1}$ provided by the past of the source $X_n^{(l)}$ beyond the past of the target $Y_n^{(k)}$, as given in equation (6). It can be shown that conditioning cannot increase entropy (Bossomaier et al 2016), that is,
$$H(Y_{n+1} \mid Y_n^{(k)}, X_n^{(l)}) \leq H(Y_{n+1} \mid Y_n^{(k)}) \leq H(Y_{n+1}),$$
which implies
$$0 \leq \mathrm{TE}_{X \to Y} \leq H(Y_{n+1}).$$
Therefore, the absolute upper bound of TE is given by $H(Y_{n+1})$. Hence, transfer entropy is generally normalized in the literature (Gourévitch and Eggermont 2007, Duan et al 2013, Shovon et al 2014) as
$$\mathrm{TE}'_{X \to Y} = \frac{\mathrm{TE}_{X \to Y}}{H(Y_{n+1})},$$
with the aim of better quantifying the strength of total direct causality. Following the same argument, we normalize controlled and fully controlled transfer entropies as follows:
$$\mathrm{TE}^{C1\,\prime}_{X \to Y} = \frac{\mathrm{TE}^{C1}_{X \to Y}}{H(Y_{n+1})}, \qquad \mathrm{TE}^{C2\,\prime}_{X \to Y} = \frac{\mathrm{TE}^{C2}_{X \to Y}}{H(Y_{n+1})}.$$
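For binary symbolized series such as the pigeon turning data, the normalization can be sketched with a plug-in estimator; `normalized_te` and `entropy` are hypothetical helper names, and the toy coupling below is illustrative only:

```python
import numpy as np

def entropy(counts):
    # Shannon entropy (in bits) of the distribution given by a table of counts.
    p = counts.flatten() / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def normalized_te(x, y):
    # Plug-in transfer entropy X -> Y for binary series with k = l = 1,
    # normalized by its upper bound H(Y_{n+1}).
    yn1, yn, xn = y[1:], y[:-1], x[:-1]
    joint = np.zeros((2, 2, 2))
    np.add.at(joint, (yn1, yn, xn), 1)
    te = (entropy(joint.sum(axis=2))         # H(Y_{n+1}, Y_n)
          - entropy(joint.sum(axis=(0, 2)))  # H(Y_n)
          + entropy(joint.sum(axis=0))       # H(Y_n, X_n)
          - entropy(joint))                  # H(Y_{n+1}, Y_n, X_n)
    return te / entropy(joint.sum(axis=(1, 2)))  # divide by H(Y_{n+1})

# Toy unidirectional coupling: Y imperfectly copies the previous value of X.
rng = np.random.default_rng(3)
x = rng.integers(0, 2, 5000)
y = np.empty_like(x)
y[0] = 0
y[1:] = np.where(rng.random(x.size - 1) < 0.8, x[:-1],
                 rng.integers(0, 2, x.size - 1))
```

The entropy decomposition used here, TE = H(Y_{n+1}, Y_n) - H(Y_n) + H(Y_n, X_n) - H(Y_{n+1}, Y_n, X_n), follows directly from writing the two conditional entropies in equation (6) as differences of joint entropies.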

Bivariate linear systems
The considered bivariate linear system (equation (10)) can be represented in vector form, with state $\mathbf{Z}_n = (X_n, Y_n)^{\mathrm{T}}$ evolving under a drift matrix $A$ and additive noise $\mathbf{W}_n$. The exact expressions for entropy and all the information-theoretic measures, including transfer entropy, controlled transfer entropy, and fully controlled transfer entropy for a single-time history, that is, k = 1, l = 1 (equations (12), (13) and (18)), are functions of the covariance matrix of the system for two time-stamps,
$$C^{(2)}(X, Y) = \begin{pmatrix} C_{n,n} & C_{n,n+1} \\ C_{n+1,n} & C_{n+1,n+1} \end{pmatrix},$$
where, by stationarity,
$$C_{n,n} = \begin{pmatrix} C(X_n, X_n) & C(X_n, Y_n) \\ C(X_n, Y_n) & C(Y_n, Y_n) \end{pmatrix} = C_{n+1,n+1}.$$
One can show from equation (26) that the covariance matrix $C_{n,n}$ satisfies the condition
$$C_{n,n} = Q + A\, C_{n,n}\, A^{\mathrm{T}}$$
(Novelli et al 2020), where the covariance matrix of the noise is $Q = C(\mathbf{W}_n) = I$ for uncorrelated unit-variance Gaussian noise, and $I$ is the suitably-sized identity matrix. This leads to an infinite series solution for the covariance matrix that, if $\rho(A) < 1$, converges to
$$\mathrm{vec}(C_{n,n}) = (I - A \otimes A)^{-1}\, \mathrm{vec}(Q),$$
where $\mathrm{vec}(\cdot)$ represents the vectorization operator and $\otimes$ the Kronecker product of two matrices. The covariance matrix $C_{n,n+1}$ for this system is given by
$$C_{n,n+1} = C_{n,n}\, A^{\mathrm{T}}.$$
Once the complete two-time-stamp covariance matrix $C^{(2)}(X, Y)$ is known, all the information-theoretic measures are exactly known as functions of the system parameters a, b, ϵ, and α. This analytical calculation of transfer entropy and the other information-theoretic measures holds only for a linear system with parameters a, b, ϵ, and α that satisfy the requirement ρ(A) < 1. Figure 8 shows the region in the a-b plane satisfying this constraint for different ϵ values. It is evident that the constraint is satisfied, and our analysis holds, for coupling strength magnitudes as high as 1 or even higher.
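The fixed-point equation for $C_{n,n}$ can be solved numerically via the vectorization identity. The drift matrix below is a hypothetical example, not the parameterization of equation (10):

```python
import numpy as np

def stationary_covariance(A, Q=None):
    # Solve C = Q + A C A^T via vectorization:
    # vec(C) = (I - A (x) A)^{-1} vec(Q), valid whenever rho(A) < 1.
    d = A.shape[0]
    Q = np.eye(d) if Q is None else Q
    assert np.max(np.abs(np.linalg.eigvals(A))) < 1, "requires rho(A) < 1"
    vec_c = np.linalg.solve(np.eye(d * d) - np.kron(A, A), Q.flatten())
    return vec_c.reshape(d, d)

# Hypothetical drift matrix: diagonal terms play the role of the individual
# dynamics, off-diagonal terms the couplings.
A = np.array([[0.5, 0.2],
              [0.4, 0.3]])
C_nn = stationary_covariance(A)  # same-time covariance C_{n,n}
C_lag = C_nn @ A.T               # lagged covariance C_{n,n+1}
```

From `C_nn` and `C_lag`, the Gaussian entropies entering transfer entropy and its controlled variants follow from log-determinants of the appropriate sub-blocks of the two-time-stamp covariance matrix.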

Physiology of a sleep apnea patient

Data:
The physiological data set of a sleep apnea patient consists of the time-series of the HR and chest volume (respiration rate) of a sleeping patient, obtained from experiments conducted in the sleep laboratory of Beth Israel Hospital in Boston, Massachusetts, and made accessible through the Santa Fe Institute time-series contest held in 1991 (Rigney et al 1993). The variables are measured at a sampling frequency of 2 Hz (every 0.5 s). The entire dataset of 34 000 recorded measurements is used in this work. Each time-series is normalized to zero mean and unit variance prior to the analysis.

Analysis:

To directly compare with the seminal work of Schreiber, which introduced the concept of transfer entropy and employed it on this data, we use kernel density estimation, a non-parametric density estimator (Rosenblatt 1956, Parzen 1962), for computing the joint probability density functions required for the calculation of transfer entropy, controlled transfer entropy, and fully controlled transfer entropy. As an example, let us consider the calculation of TE (equation (6)) with k = 1, l = 1,
$$\mathrm{TE}_{X \to Y} = \sum_{y_{n+1},\, y_n,\, x_n} p(y_{n+1}, y_n, x_n) \log \frac{p(y_{n+1} \mid y_n, x_n)}{p(y_{n+1} \mid y_n)},$$
where the probability of occurrence of a realization $(y_{n+1}, y_n, x_n)$ is estimated from the $N$ data points as
$$\hat{p}(y_{n+1}, y_n, x_n) = \frac{1}{N} \sum_{n'} K\!\left(\left| (y_{n+1}, y_n, x_n) - (y_{n'+1}, y_{n'}, x_{n'}) \right|, r\right),$$
where the kernel $K(\cdot)$ is a positive function controlled by the bandwidth or smoothing parameter $r$, and $|\cdot|$ represents the Euclidean norm of a vector. We employ a step or tophat kernel for this estimation: $K(x, r) = 1$ if $x < r$, and $K(x, r) = 0$ otherwise. We use the KernelDensity class of the Neighbors functionality from the Scikit-learn library (Pedregosa et al 2011) to compute the required joint probability density functions. These estimated probability density functions are utilized to compute the logarithmic part of the expression in equation (31).
Summing over probability p(y n+1 , y n , x n ) multiplied by this logarithmic part is the same as computing the expected value of the logarithmic part. Thus, we simply take the mean of the logarithmic part over all the data points to compute transfer entropy. Note that the same bandwidth r is used in this case for both x and y since the data is normalized before the analysis. The choice of the bandwidth is very important and the optimum value of r depends on the number of samples and the distribution of the samples. All the information-theoretic measures from HR to BR and vice versa are computed considering single time-history of source and target (k = 1, l = 1) for a range of r ∈ (0.01, 1), similar to Schreiber (2000).
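The estimation pipeline can be sketched with scikit-learn's `KernelDensity` and a tophat kernel. The coupled pair below is a hypothetical stand-in for the HR/BR recordings; note that the tophat normalization constants in the different dimensions contribute a fixed offset that is identical in both directions and therefore cancels in net TE:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def log_density(sample, r):
    # Tophat KDE evaluated at the sample points themselves (log scale).
    kde = KernelDensity(kernel='tophat', bandwidth=r).fit(sample)
    return kde.score_samples(sample)

def transfer_entropy(x, y, r):
    # Plug-in TE with k = l = 1: the expectation of the log-ratio of
    # conditionals, each written in terms of joint and marginal densities.
    y1, yn, xn = y[1:], y[:-1], x[:-1]
    s = np.column_stack
    num = log_density(s([y1, yn, xn]), r) + log_density(s([yn]), r)
    den = log_density(s([yn, xn]), r) + log_density(s([y1, yn]), r)
    return np.mean(num - den)

# Hypothetical unidirectionally coupled pair (X -> Y), standardized as in
# the HR/BR analysis; not the actual physiological recordings.
rng = np.random.default_rng(4)
x = rng.normal(size=3000)
y = np.empty_like(x)
y[0] = rng.normal()
y[1:] = 0.6 * x[:-1] + 0.4 * rng.normal(size=x.size - 1)
x = (x - x.mean()) / x.std()
y = (y - y.mean()) / y.std()
te_xy = transfer_entropy(x, y, r=0.3)
te_yx = transfer_entropy(y, x, r=0.3)
```

Taking the sample mean of the log-ratio replaces the explicit sum weighted by $p(y_{n+1}, y_n, x_n)$, since the data points are themselves draws from that joint distribution.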

Interaction between naive and experienced pigeons

Data:
The experimental data of the flight of homing pigeons (Columba livia), from experiments conducted by Sasaki and Biro (2017) at the Oxford University Field Station, Wytham, UK, was obtained from the shared data repository of Valentini et al (2021). The experiment consisted of birds flying in pairs between the same release site and home loft in 10 independent transmission chains of generations. In each chain, the first generation consisted of a single (naive) bird flying 12 times. This bird (now experienced) was then paired with another bird (naive) for 12 releases, thus comprising the second generation. Similarly, the naive bird in the second generation (now experienced) was paired with another bird (naive) in the third generation. In this manner, ten chains of five successive generations were created through experiments to show that collective intelligence accrues and homing performance improves over generations. The flight trajectory data was recorded at a sampling frequency of 5 Hz.

Analysis:

We use the two-dimensional trajectory data of the birds to determine the motion vector of each bird at each time step, and then obtain the turning of the bird at time step i by computing the angle between its direction of motion at time step i and that at time step i + 1. We compute the symbolized time-series representing the turning direction of each bird at each point in time: 0 for clockwise and 1 for counter-clockwise. Since each time-series is symbolized to only two possible realizations, probability density functions are estimated directly by counting the number of samples for each realization, without the need for any kernel density estimation. In computing transfer entropy and the other measures, a single time-history (k = 1, l = 1) is used to ensure that there is sufficient sampling per bin in estimating the probability density functions.
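The symbolization step can be sketched via the sign of the 2D cross product between successive motion vectors, which encodes the same clockwise/counter-clockwise distinction as the turning angle; `symbolize_turns` is a hypothetical helper name:

```python
import numpy as np

def symbolize_turns(traj):
    # traj: (T, 2) array of planar positions sampled at a fixed rate.
    # Motion vectors are differences of consecutive positions; the sign of
    # the 2D cross product between consecutive motion vectors gives the
    # turning direction: positive for counter-clockwise (1), else clockwise (0).
    v = np.diff(traj, axis=0)
    cross = v[:-1, 0] * v[1:, 1] - v[:-1, 1] * v[1:, 0]
    return (cross > 0).astype(int)

# Sanity check on a counter-clockwise circular path.
t = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
circle = np.column_stack([np.cos(t), np.sin(t)])
symbols = symbolize_turns(circle)
```

Because the series is binary, the joint probability tables needed for the information-theoretic measures can then be filled by direct counting, as described above.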
Transfer entropy, controlled transfer entropy, and fully controlled transfer entropy are normalized in this analysis so that comparisons can be drawn between different generations. All statistical analyses are performed at a significance level of 0.05.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https:// github.com/dynamicalsystemslaboratory/TE_Coupling_Asymmetry.git.