Inferring directional interactions in collective dynamics: a critique of intrinsic mutual information

Pairwise interactions are critical to the collective dynamics of natural and technological systems. Information theory is the gold standard for studying these interactions, but recent work has identified pitfalls in the way information flow is appraised through classical metrics (time-delayed mutual information and transfer entropy). These pitfalls have prompted the introduction of intrinsic mutual information to precisely measure information flow. However, little is known about the potential use of intrinsic mutual information in the inference of directional influences to diagnose interactions from time-series of individual units. We explore this possibility within a minimalistic, mathematically tractable leader-follower model, for which we document an excess of false inferences by intrinsic mutual information compared to transfer entropy. This unexpected finding is linked to a fundamental limitation of intrinsic mutual information, which suffers from the same shortcoming as time-delayed mutual information: a thin tail of the null distribution that favors the rejection of the null-hypothesis of independence.


Introduction
Information theory [1] has emerged as a powerful framework to study causal relationships underpinning the collective dynamics of complex systems. Without the need for a mathematical model to be identified or experimental manipulations to be conducted, information theory allows for deciphering the strength and direction of interactions between coupled units from mere experimental observations of their dynamics. For example, through the lens of information theory, researchers have clarified the differences between anatomical and functional networks in the brain [2,3], quantified the role of media and policy on human decision-making [4,5], identified physical pathways underlying climate change across the globe [6,7], and detected leaders in groups of animals [8][9][10].
Most of these efforts rely on the notion of transfer entropy, formulated by Schreiber two decades ago to study pairwise, asymmetric interactions between coupled dynamical systems [11]. In its classical incarnation, transfer entropy measures the extent to which knowledge of the present state of one dynamical system (the source) reduces the uncertainty in predicting the future state of another dynamical system (the target), beyond the knowledge afforded by the target's own present. Transfer entropy can be readily calculated from raw time-series [12], as its computation only requires the determination of the joint probability mass function for the present and future of the target and the present of the source. Likewise, hypothesis-testing with transfer entropy is easy to perform [13,14]; for example, permutation tests can be implemented to assess whether transfer entropy is different from zero at a given confidence level, so that the null-hypothesis of independence of the target from the source can be rejected.
Over the years, the seminal work of Schreiber has been extended along several threads that have made transfer entropy ubiquitous among theorists and practitioners. For example, Sun and Bollt have successfully addressed multivariate interactions in network systems, building on the notion of conditional transfer entropy [15]. Runge et al introduced the notion of momentary information transfer, which excludes the misleading influence of autodependencies to better characterize the coupling strength between the units [16]. Likewise, Staniek and Lehnertz have tapped into symbolic dynamics to improve the robustness of transfer entropy-based inference, especially when dealing with short time-series [17]. Despite this growing body of sound methodological efforts and successful applications to real datasets, there are still open questions about the theoretical interpretation and practical use of transfer entropy.
The work of James et al has brought an important critique to transfer entropy, by offering concrete examples of systems with exclusive OR interactions that defeat one's intuition [18]. Specifically, the authors point to potential 'interpretational errors, some quite subtle . . . including overestimating flow, underestimating influence, and more generally misidentifying structure when modeling complex systems as networks with edges given by transfer entropies.' At the core of the critique is the impossibility of mechanistically associating transfer entropy between two dynamical systems with the information flow or transfer between them.
Building on this key limitation, James et al [19] have recently detailed information flow within a pair of dynamical systems, distinguishing multiple, co-existing modalities of information flow that have been erroneously compounded in the literature. Among them, 'intrinsic information flow' pertains to the predictive power that the present of the source alone has on the target's future, independent of the target's present: this quantity is what is routinely referred to as information flow (but seldom precisely measured). To quantify intrinsic information flow, the authors propose a cryptographic flow ansatz, which hypothesizes intrinsic flow to be equivalent to the secret key agreement between the two systems [20]. Such an ansatz is practically determined using an easy-to-compute upper bound, called intrinsic mutual information. The use of intrinsic mutual information in the study of leader-follower interactions has been explored by Sattari et al [21]. Through computer simulations of pairs and groups of self-propelled Vicsek-like particles [22], the authors have offered an important insight into information flow in collective dynamics, without the confounding effects that are brought about by classical information-theoretic metrics, such as transfer entropy.
While intrinsic mutual information constitutes a breakthrough in the quantification of information flow, its use as a tool for the inference of directional interactions has never been explored. Will the accurate quantification of information flow offered by intrinsic mutual information translate into an improved ability to detect directional influence? In this paper, we seek to provide an answer to this question through an integrated numerical and theoretical effort on the relationship between intrinsic mutual information and classical information-theoretic metrics (time-delayed mutual information and transfer entropy) to support hypothesis-testing in the inference of directional interactions.
We use a Boolean model of leader-follower interactions to compute exact, asymptotic expressions for information-theoretic metrics, which mitigate numerical artifacts related to the estimation of probability density functions from time-series. Similar to prior work on minimalistic models of collective dynamics [23][24][25], the model comprises a pair of directionally coupled Boolean units (a leader and a follower), subject to different intrinsic noises. The leader changes its state due to added noise, irrespective of the follower, while the follower responds to both the added noise and the leader. Upon gaining insight into the Boolean model, we examine simulation results from the modified Vicsek model by Sattari et al [21] to probe the generality of our claims and understand how the performance of the information-theoretic metrics varies with the size of the probability space where the inference is performed.

Background on information-theoretic metrics for causal analysis
The most basic information-theoretic tool to study causal relationships between two dynamical systems is based on mutual information [1] (see section 4 for further details). Specifically, given two stationary, discrete stochastic processes {Y_t}_{t∈Z_⩾0} and {Z_t}_{t∈Z_⩾0}, their (one-step) time-delayed mutual information is

MI_{Z→Y} = I(Z_t; Y_{t+1}).   (1)

Here, 'Pr' indicates the probability of an event; capital, lower case, and calligraphic letters are used for random variables, realizations, and sample spaces, respectively; commas are used for conjugation (logical AND); vertical bars are for conditioning of random variables; and semicolons are utilized to separate random variables when computing their mutual information I. To simplify the notation and avoid the excessive use of parentheses, we adopt the following operator precedence (high to low): 'comma,' 'semicolon,' and 'vertical bar.' The mutual information of the pair (Z_t, Y_{t+1}) amounts to the reduction of uncertainty in the future state of Y (that is, Y_{t+1}) given the knowledge of the present state of Z (that is, Z_t). Being symmetric by construction, MI_{Z→Y} also corresponds to the reduction of uncertainty in Z_t given the knowledge of Y_{t+1}. Importantly, a nonzero value of time-delayed mutual information can be registered even if the future state of Y is not directly influenced by the present state of Z, but their dynamics contain memory of their past states [21]. This drawback is resolved by transfer entropy, defined as the mutual information between Z_t and Y_{t+1}, conditional on Y_t, namely,

TE_{Z→Y} = I(Z_t; Y_{t+1} | Y_t).   (2)

Rephrasing James et al [19], transfer entropy is sensitive both to intrinsic dependencies between Z_t and Y_{t+1} and to dependencies induced by Y_t.
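For concreteness, the plug-in estimation of these two metrics from a pair of discrete time-series can be sketched as follows (a minimal illustration of definitions (1) and (2), not the estimator used in the paper; function names are ours):

```python
import numpy as np

def mutual_info(a, b):
    """Plug-in estimate of I(A; B) in bits for two discrete sequences."""
    a, b = np.asarray(a), np.asarray(b)
    mi = 0.0
    for x in np.unique(a):
        for y in np.unique(b):
            p_xy = np.mean((a == x) & (b == y))
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (np.mean(a == x) * np.mean(b == y)))
    return mi

def time_delayed_mi(z, y):
    """MI_{Z->Y} = I(Z_t; Y_{t+1}): one-step time-delayed mutual information."""
    return mutual_info(z[:-1], y[1:])

def transfer_entropy(z, y):
    """TE_{Z->Y} = I(Z_t; Y_{t+1} | Y_t), expanded as a probability-weighted
    sum of mutual informations, each conditioned on a value of Y_t."""
    z, y = np.asarray(z), np.asarray(y)
    zt, yt, yf = z[:-1], y[:-1], y[1:]
    te = 0.0
    for c in np.unique(yt):
        mask = yt == c
        te += np.mean(mask) * mutual_info(zt[mask], yf[mask])
    return te
```

For instance, if Y is a one-step delayed copy of an independent, fair Boolean process Z, both estimators approach one bit, while the reversed-direction transfer entropy vanishes.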
To filter the latter dependencies and precisely measure information flow, Sattari et al [21] proposed the use of intrinsic mutual information from Z to Y, defined as

IMI_{Z→Y} = inf_{Pr(Ȳ_t | Y_t)} I(Z_t; Y_{t+1} | Ȳ_t).   (3)

Here, Ȳ_t is an auxiliary variable related to Y_t by means of the conditional probability Pr(Ȳ_t | Y_t), taking the form of an unknown (finite or infinite) stochastic matrix. By computing the infimum over all possible conditional probabilities Pr(Ȳ_t | Y_t), intrinsic mutual information avoids including influence coming from the present state of both Z and Y when predicting the future state of Y.
Intrinsic mutual information has its theoretical roots in cryptography, whereby it can be viewed as an upper bound for the information shared by Z_t and Y_{t+1} that cannot be reconstructed or derived from Y_t. The definition of intrinsic mutual information begets the following, intuitive, inequalities:

IMI_{Z→Y} ⩽ MI_{Z→Y} and IMI_{Z→Y} ⩽ TE_{Z→Y},   (4)

thereby implying that

IMI_{Z→Y} ⩽ min{MI_{Z→Y}, TE_{Z→Y}}.   (5)

There is no obvious relationship between time-delayed mutual information and transfer entropy: either can be larger than the other, since conditioning is not a subtractive operation. Intrinsic mutual information would reduce to time-delayed mutual information if the minimization process yielded a constant Ȳ_t, whereas it would be equivalent to transfer entropy in the case Ȳ_t = Y_t [21].
Although intrinsic mutual information has been shown to be more accurate in measuring information flow than transfer entropy [19,21], this does not necessarily imply that it is a better instrument for inferring causal relationships. Indeed, a key step in the application of information-theoretic constructs to causal inference is hypothesis-testing, which requires contrasting observed values against data obtained under the null-hypothesis of independence. In what follows, we clarify the relationship between intrinsic mutual information and the classical metrics of information flow on a minimalistic model of coupled Boolean units. For this model, all the information-theoretic quantities can be exactly computed, thereby enabling a comparison between time-delayed mutual information, transfer entropy, and intrinsic mutual information in terms of their ability to detect leader-follower interactions.

Boolean leader-follower model
Let us consider two Boolean random processes X^L_t and X^F_t, describing the state of the leader and follower, respectively. Their dynamics is given by

X^L_{t+1} = X^L_t ⊕ N^L_t,
X^F_{t+1} = C_t X^L_t + (1 − C_t)(X^F_t ⊕ N^F_t),   (6)

where ⊕ denotes the exclusive OR; N^L_t, N^F_t, and C_t are independent Bernoulli random variables with Pr(N^L_t = 1) = η_L, Pr(N^F_t = 1) = η_F, and Pr(C_t = 1) = w; 0 < η_L < 1, 0 < η_F < 1, and 0 ⩽ w ⩽ 1. Similar to the coupling of Vicsek-like models [21,22], the gain w identifies the tendency of the follower to replicate the behavior of the leader at the previous time-step, with w = 1 corresponding to the deterministic dynamics X^F_{t+1} = X^L_t and w = 0 to X^F_{t+1} being independent of X^L_t. Likewise, the parameters η_L and η_F capture the strength of the added noise in Vicsek-like models. Parameter η_L is the probability that the leader changes state in one time-step, whereas η_F is the probability that the follower changes state in the absence of coupling, that is, when w = 0.
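A direct simulation of this leader-follower dynamics takes only a few lines; the update below is a sketch consistent with the stated limiting behaviors at w = 0 and w = 1 (helper name is ours):

```python
import numpy as np

def simulate_boolean_pair(eta_L, eta_F, w, T, rng):
    """Simulate the Boolean leader-follower model for T time-steps.
    The leader flips its state with probability eta_L; the follower copies
    the leader's previous state with probability w and otherwise flips its
    own state with probability eta_F."""
    xL = np.empty(T, dtype=int)
    xF = np.empty(T, dtype=int)
    xL[0], xF[0] = rng.integers(0, 2, size=2)
    for t in range(T - 1):
        xL[t + 1] = xL[t] ^ (rng.random() < eta_L)   # leader: noisy flip
        if rng.random() < w:                         # follower: copy the leader...
            xF[t + 1] = xL[t]
        else:                                        # ...or evolve on its own
            xF[t + 1] = xF[t] ^ (rng.random() < eta_F)
    return xL, xF
```

Setting w = 1 reproduces the deterministic copy X^F_{t+1} = X^L_t, while w = 0 decouples the two units entirely.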
The schematic in figure 1(A) shows how model (6) can be adapted to mimic the four interaction types considered in the paper by Sattari et al [21], which employed a modified Vicsek model, by means of a suitable selection of the noise parameters η_L and η_F. Indeed, the leader (follower) has a natural tendency to change state or remain in the same state depending on whether η_L (η_F) is greater or smaller than 1/2, respectively. This memory can be visualized as two self-loops of weight |1/2 − η_L| and |1/2 − η_F| for the leader and follower, respectively. Different from the leader, the follower dynamics is not controlled by the intrinsic noise parameter alone: the follower also changes its state in response to the leader, in the form of a tendency to copy the previous state of the leader that is modulated by w. Particular instances of the model, where the agents have no memory of their past state (η_L and/or η_F equal to 1/2), can then be represented by the absence of one or both self-loops.
Next, we formulate the system dynamics in terms of an ergodic, four-state Markov chain, for which we compute the stationary distribution in closed-form.

Transition matrix
The states of the leader and follower at time t + 1 only depend on their states at time t; therefore, we can describe the time evolution of system (6) as a first-order homogeneous Markov chain with four states, defined as 1 ≡ (X^L_t = 0, X^F_t = 0), 2 ≡ (X^L_t = 0, X^F_t = 1), 3 ≡ (X^L_t = 1, X^F_t = 0), and 4 ≡ (X^L_t = 1, X^F_t = 1). We denote with P ∈ R^{4×4} the transition probability matrix of the Markov chain, whose element ij is the probability that the chain takes the jth value at the next time-step given that the current value is the ith one.
For brevity, we detail how entry 11 of P is computed; other entries are analogously obtained. By definition,

P_11 = Pr(X^L_{t+1} = 0, X^F_{t+1} = 0 | X^L_t = 0, X^F_t = 0)
     = Pr(X^L_{t+1} = 0 | X^L_t = 0) Pr(X^F_{t+1} = 0 | X^L_t = 0, X^F_t = 0)
     = (1 − η_L)(1 − f_{η_F}),   (7)

where we have used the property that the next state of the leader is independent of the current state of the follower, and the property that the next states of the leader and follower are independent upon conditioning on their current states. Ultimately, we establish

P = [ (1 − η_L)(1 − f_{η_F})   (1 − η_L) f_{η_F}       η_L (1 − f_{η_F})       η_L f_{η_F}         ;
      (1 − η_L) g_{η_F}       (1 − η_L)(1 − g_{η_F})   η_L g_{η_F}             η_L (1 − g_{η_F})   ;
      η_L (1 − g_{η_F})       η_L g_{η_F}             (1 − η_L)(1 − g_{η_F})   (1 − η_L) g_{η_F}   ;
      η_L f_{η_F}             η_L (1 − f_{η_F})       (1 − η_L) f_{η_F}       (1 − η_L)(1 − f_{η_F}) ],   (8)

where we have introduced the notations f_η = η(1 − w) and g_η = f_η + w. As expected, all the rows of P sum to one.
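The full matrix can be assembled programmatically and checked against the entry-11 derivation (a verification sketch, with the state ordering assumed above; the function name is ours):

```python
import numpy as np
from itertools import product

def transition_matrix(eta_L, eta_F, w):
    """4x4 transition matrix over the states (X_L, X_F), ordered
    (0,0), (0,1), (1,0), (1,1), for the Boolean leader-follower model."""
    states = list(product((0, 1), repeat=2))
    P = np.zeros((4, 4))
    for i, (l, f) in enumerate(states):
        # Pr(X_F(t+1) = 1 | X_L(t) = l, X_F(t) = f): copy the leader with
        # probability w, otherwise flip the own state with probability eta_F
        p_f1 = w * l + (1 - w) * (eta_F if f == 0 else 1 - eta_F)
        for j, (l2, f2) in enumerate(states):
            p_l = 1 - eta_L if l2 == l else eta_L
            P[i, j] = p_l * (p_f1 if f2 == 1 else 1 - p_f1)
    return P
```

Every row sums to one, and entry 11 reproduces (1 − η_L)(1 − f_{η_F}) with f_{η_F} = η_F (1 − w).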

Stationary probability distribution
Since none of the elements of P is zero, all the states are aperiodic and positive recurrent, that is, the Markov chain is ergodic [26]. The unique stationary distribution π can be computed as the left eigenvector of P with unitary eigenvalue [26], normalized such that its elements sum to 1, that is, the unique solution of

π P = π, with π_1 + π_2 + π_3 + π_4 = 1.   (9)

Therefore, we determine

π_1 = π_4 = (η_L + g_{η_F} − 2 η_L g_{η_F}) / (2 (2 η_L + f_{η_F} + g_{η_F} − 2 η_L f_{η_F} − 2 η_L g_{η_F})),
π_2 = π_3 = 1/2 − π_1.   (10)

From equation set (10), it follows that the stationary probabilities for the leader and follower are all equal to 1/2, whereby Pr(X^L_t = i) = Pr(X^F_t = j) = 1/2 in the stationary limit for all i, j ∈ {0, 1}, similar to a Vicsek model for which none of the agents has a preferential heading direction.
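Numerically, the stationary distribution follows from the left eigenvector with unit eigenvalue, and the 1/2 marginals can be verified directly (a sketch with our own helper names; the transition matrix is rebuilt here so the snippet is self-contained):

```python
import numpy as np
from itertools import product

def stationary_distribution(P):
    """Left eigenvector of P with eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return v / v.sum()

def boolean_pair_P(eta_L, eta_F, w):
    """Transition matrix for states (X_L, X_F) ordered (0,0), (0,1), (1,0), (1,1)."""
    states = list(product((0, 1), repeat=2))
    P = np.zeros((4, 4))
    for i, (l, f) in enumerate(states):
        p_f1 = w * l + (1 - w) * (eta_F if f == 0 else 1 - eta_F)
        for j, (l2, f2) in enumerate(states):
            P[i, j] = (1 - eta_L if l2 == l else eta_L) * (p_f1 if f2 == 1 else 1 - p_f1)
    return P
```

For instance, with η_L = 0.95, η_F = 0.05, and w = 0.5, both single-unit marginals evaluate to 1/2, consistent with the symmetry of the chain under a simultaneous flip of both states.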

Information-theoretic metrics
From the transition matrix (8) and its stationary distribution (10), we compute the stationary joint probability distributions lim_{t→+∞} Pr(X^F_{t+1}, X^L_t, X^F_t) and lim_{t→+∞} Pr(X^L_{t+1}, X^L_t, X^F_t); for example,

lim_{t→+∞} Pr(X^F_{t+1} = k, X^L_t = i, X^F_t = j) = Pr(X^F_{t+1} = k | X^L_t = i, X^F_t = j) π_{ij},

where π_{ij} is the stationary probability of the state (X^L_t = i, X^F_t = j) from equation set (10). Herein, the conditional probability on the right-hand-side of the equation is obtained from matrix (8), upon marginalizing with respect to the state of the leader at t + 1, that is,

Pr(X^F_{t+1} = k | X^L_t = i, X^F_t = j) = Σ_{m∈{0,1}} Pr(X^L_{t+1} = m, X^F_{t+1} = k | X^L_t = i, X^F_t = j).

Complete expressions are reported in table 1 and utilized to compute closed-form, asymptotic expressions of classical information-theoretic quantities (time-delayed mutual information and transfer entropy) and of intrinsic mutual information as functions of the coupling gain between the units and the strengths of the added noises.
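The joint table can be tabulated mechanically from P and π; the sketch below does so and checks the deterministic limit w = 1, in which X^F_{t+1} must equal X^L_t (helper names are ours):

```python
import numpy as np
from itertools import product

def stationary_joint(eta_L, eta_F, w):
    """Stationary joint pmf J[k, i, j] = Pr(XF_{t+1} = k, XL_t = i, XF_t = j)
    for the Boolean leader-follower model."""
    states = list(product((0, 1), repeat=2))
    P = np.zeros((4, 4))
    pf1 = {}
    for a, (l, f) in enumerate(states):
        # Pr(XF_{t+1} = 1 | XL_t = l, XF_t = f)
        pf1[l, f] = w * l + (1 - w) * (eta_F if f == 0 else 1 - eta_F)
        for b, (l2, f2) in enumerate(states):
            P[a, b] = (1 - eta_L if l2 == l else eta_L) * (pf1[l, f] if f2 == 1 else 1 - pf1[l, f])
    # stationary distribution over (XL_t, XF_t)
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi /= pi.sum()
    J = np.zeros((2, 2, 2))
    for a, (l, f) in enumerate(states):
        J[1, l, f] = pi[a] * pf1[l, f]        # conditional times stationary weight
        J[0, l, f] = pi[a] * (1 - pf1[l, f])
    return J
```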

Classical metrics
The computation of time-delayed mutual information from leader to follower (MI_{L→F}) and vice versa (MI_{F→L}) can be undertaken from (1), using the expressions in table 1. Therein, the conditional probabilities should be written using the definition of conditional probability as

Pr(X^F_{t+1} | X^L_t) = Pr(X^F_{t+1}, X^L_t) / Pr(X^L_t),

and similarly for the follower-to-leader interaction. Then, any of the probabilities appearing in the expression of time-delayed mutual information can be retrieved from table 1 through marginalization; for example,

Pr(X^F_{t+1}, X^L_t) = Σ_{j∈{0,1}} Pr(X^F_{t+1}, X^L_t, X^F_t = j).

Likewise, transfer entropy from leader to follower (TE_{L→F}) and vice versa (TE_{F→L}) can be calculated via (2), by substituting the joint distributions in table 1.
In figures 1(B)-(E), we display time-delayed mutual information and transfer entropy from leader to follower as functions of the coupling gain w for four pairs of noise parameters η L and η F that exemplify interaction types from figure 1(A). In agreement with numerical results on the modified Vicsek model by Sattari et al [21], we observe the following. First, in the presence of a self-loop for the follower (figures 1(B) and (D)), time-delayed mutual information can be less than transfer entropy. This surprising finding is related to the onset of a synergistic information flow, whereby simultaneous knowledge about the present of the leader and follower improves the predictive power about the future of the follower, compared to mere access to the present of the follower. Second, in the absence of self-loops in both the leader and the follower (figure 1(C)), time-delayed mutual information and transfer entropy are equivalent, which is due to the lack of memory in the dynamics. Third, in the presence of a self-loop only for the leader, transfer entropy is less than time-delayed mutual information (figure 1(E)), in agreement with one's intuition about the role of transfer entropy in mitigating redundant information from the follower's own dynamics.
From the follower to the leader, transfer entropy is always zero, since the follower does not provide any predictive power about the future state of the leader once the present state of the leader is known. On the other hand, time-delayed mutual information can be different from zero due to the shared history of the follower and the leader. In figure 1(F), we report time-delayed mutual information from the follower to the leader for the same cases considered in figures 1(B)-(E). Predictably, without a self-loop in the leader, time-delayed mutual information is zero: the leader does not have a memory and, as such, no information is shared in a common history with the follower.
Finally, we comment that the role of the coupling gain is non-trivial. While time-delayed mutual information seems to increase with the coupling gain for different choices of the noise parameters, transfer entropy could decrease for sufficiently large values of w, as in figure 1(B). In such a case, the follower will tend to systematically replicate the behavior of the leader, whose dynamics is, however, evolving in response to its own history. As a result, the information flow from the leader to the follower could be hindered by larger values of w. Compact expressions for time-delayed mutual information and transfer entropy are in general not feasible; second-order Taylor expansions in terms of the noise parameters are presented in supplementary note 1.

Table 1. Stationary joint probability distribution of X^F_{t+1}, X^L_t, and X^F_t and of X^L_{t+1}, X^L_t, and X^F_t for the computation of closed-form, asymptotic expressions of information-theoretic metrics for model (6).

Intrinsic mutual information
Obviously, intrinsic mutual information from follower to leader is zero (IMI_{F→L} = 0), since transfer entropy is zero and intrinsic mutual information cannot be larger than transfer entropy. The computation of intrinsic mutual information from leader to follower (IMI_{L→F}) requires an auxiliary stochastic process X̄^F_t, related to X^F_t through the conditional probabilities

Pr(X̄^F_t = 1 | X^F_t = 0) = α, Pr(X̄^F_t = 1 | X^F_t = 1) = β,

where α, β ∈ [0, 1] are the parameters upon which conditional mutual information is optimized. Next, we can easily compute the joint distribution Pr(X^F_{t+1}, X^L_t, X̄^F_t) in terms of α, β, and the values listed in table 1. For completeness, we report the resulting joint distribution in table 2. Following analogous steps to the transfer entropy computation, but using table 2, we obtain I(X^F_{t+1}; X^L_t | X̄^F_t) as a function of α and β. By taking the minimum over α and β in the compact unit square, we calculate intrinsic mutual information.
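The minimization over the unit square can be carried out by a simple grid search over (α, β); the sketch below estimates the joint distribution from one simulated run of the model and verifies that the result does not exceed either transfer entropy or time-delayed mutual information (illustrative code with our own helper names; a finer grid or a proper optimizer would sharpen the minimum):

```python
import numpy as np

def cond_mi(J):
    """I(A; B | C) in bits for a joint pmf J[a, b, c]."""
    total = 0.0
    for c in range(J.shape[2]):
        pc = J[:, :, c].sum()
        if pc == 0:
            continue
        M = J[:, :, c] / pc                  # conditional joint of (A, B) given C = c
        for a in range(M.shape[0]):
            for b in range(M.shape[1]):
                if M[a, b] > 0:
                    total += pc * M[a, b] * np.log2(M[a, b] / (M[a, :].sum() * M[:, b].sum()))
    return total

def intrinsic_mi(J, grid=41):
    """Grid-search minimization of I(A; B | Cbar) over binary channels
    Pr(Cbar=1 | C=0) = alpha, Pr(Cbar=1 | C=1) = beta."""
    best = np.inf
    for alpha in np.linspace(0, 1, grid):
        for beta in np.linspace(0, 1, grid):
            Jb = np.zeros_like(J)
            for c, p1 in ((0, alpha), (1, beta)):
                Jb[:, :, 1] += J[:, :, c] * p1
                Jb[:, :, 0] += J[:, :, c] * (1 - p1)
            best = min(best, cond_mi(Jb))
    return best

def empirical_joint(eta_L, eta_F, w, T, rng):
    """Empirical pmf of (XF_{t+1}, XL_t, XF_t) from one simulated run."""
    xL, xF = np.empty(T, int), np.empty(T, int)
    xL[0], xF[0] = rng.integers(0, 2, size=2)
    for t in range(T - 1):
        xL[t + 1] = xL[t] ^ (rng.random() < eta_L)
        xF[t + 1] = xL[t] if rng.random() < w else xF[t] ^ (rng.random() < eta_F)
    J = np.zeros((2, 2, 2))
    for k, i, j in zip(xF[1:], xL[:-1], xF[:-1]):
        J[k, i, j] += 1
    return J / (T - 1)
```

Since the grid contains (α, β) = (0, 1), which reproduces X̄ = X (transfer entropy), and α = β, which makes X̄ constant (time-delayed mutual information), the minimum cannot exceed either metric.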
Results in figures 1(B)-(E) indicate that intrinsic mutual information is very well approximated by the minimum between time-delayed mutual information and transfer entropy for any selection of the noise parameters. Only in figure 1(B), where time-delayed mutual information and transfer entropy cross at a coupling of w = 0.776, do we observe a narrow window of the coupling gain in which intrinsic mutual information is lower than both classical metrics (for w ∈ [0.770, 0.780], intrinsic mutual information is within 0.008, or 2.4%, of the minimum of the two classical metrics). As such, intrinsic mutual information equals time-delayed mutual information in the presence of a self-loop for the follower and absence of a self-loop for the leader, or in the presence of both self-loops provided that the coupling gain is sufficiently weak. It is equal to transfer entropy in the absence of a self-loop for the follower, or in the presence of both self-loops provided that the coupling gain is sufficiently strong. When both self-loops are absent, intrinsic mutual information is equal to both time-delayed mutual information and transfer entropy. We stress that these conclusions are not affected by numerical artifacts in the estimation of the probability mass functions, since they rely on closed-form, asymptotic expressions of all the information-theoretic metrics.

Statistical inference
Here, we explore the feasibility of employing intrinsic mutual information for the inference of the directional coupling between the units and contrast its performance with time-delayed mutual information and transfer entropy. We utilize the time-series of the two units (leader and follower) to estimate all the joint probability distributions in the information-theoretic metrics (1)-(3). Without a priori knowledge of which unit is the leader and which is the follower, we calculate the information-theoretic metrics between the two units. These numerical values are contrasted with their corresponding null distributions in the absence of any interaction between the units (that is, w = 0), to decide whether a directional coupling exists or not, at a given confidence level. The null distributions are estimated by simulating model (6) for N repetitions, each of length T. Should one not have access to the ground-truth mathematical model of the time-series, as in most practical applications, one could generate the null distributions through shuffling; we illustrate this possibility in supplementary note 2.

Table 2. Stationary joint probability distribution of X^F_{t+1}, X^L_t, and X̄^F_t for the computation of closed-form, asymptotic expressions of intrinsic mutual information for model (6).
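In a model-free setting, the shuffling approach mentioned above can be sketched as follows (a simple permutation surrogate of the putative source, with our own helper names; block or circular-shift surrogates may be preferable for strongly autocorrelated data):

```python
import numpy as np

def tdmi(z, y):
    """Plug-in one-step time-delayed mutual information (bits)."""
    a, b = np.asarray(z[:-1]), np.asarray(y[1:])
    mi = 0.0
    for x in np.unique(a):
        for v in np.unique(b):
            p = np.mean((a == x) & (b == v))
            if p > 0:
                mi += p * np.log2(p / (np.mean(a == x) * np.mean(b == v)))
    return mi

def rejects_independence(metric, z, y, n_perm=200, level=0.05, seed=0):
    """Compare metric(z, y) against a shuffle-based null distribution and
    reject independence if the observed value exceeds the (1 - level)
    quantile of the null."""
    rng = np.random.default_rng(seed)
    null = np.array([metric(rng.permutation(z), y) for _ in range(n_perm)])
    return metric(z, y) > np.quantile(null, 1 - level)
```

Shuffling the source destroys the temporal coupling while preserving the marginal distribution, so the surrogate values approximate the metric under the null-hypothesis of independence.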
Based on the theoretical predictions for time-delayed mutual information, transfer entropy, and intrinsic mutual information in figures 1(B)-(F), we focus the inference effort on the case considered in figures 1(B) and (F) (η_L = 0.95 and η_F = 0.05), which displays the richest dependence of intrinsic mutual information on the coupling gain. We consider three different values of w (0.1, 0.5, and 1); for each value, we run model (6) for T = 2000 time-steps, compute the information-theoretic metrics, and contrast their values with the null distributions; we denote by F_IMI, F_TE, and F_MI the cumulative null distributions of intrinsic mutual information, transfer entropy, and time-delayed mutual information, respectively. We reject the hypothesis of w = 0 with a significance level of 0.05, which corresponds to cut-off values MI_95, TE_95, and IMI_95 for time-delayed mutual information, transfer entropy, and intrinsic mutual information, respectively, given by the 95th percentiles of the corresponding null distributions. Simulation results indicate perfect sensitivity (defined as the true positive rate) of all the information-theoretic metrics with respect to the inference of the directional interaction from the leader to the follower (false negative rate of zero for all values of w). Specificity (defined as the true negative rate) is more problematic and differs widely among the information-theoretic metrics, as illustrated in figure 2(B). Independent of the value of w, transfer entropy yields the best inferences, with a false positive rate of about 5%, a much better performance compared to intrinsic mutual information, which begets a rate of about 48%. For all values of w, time-delayed mutual information offers unacceptable results, erroneously misclassifying the entirety of the observations (similar results are found for different values of w, see supplementary note 2).

Explaining the excess of false positives
The inadequacy of time-delayed mutual information in identifying the directionality of the interaction between the leader and the follower should have been anticipated, given that the dynamics of both units contain information about their past for the selected leader-follower configuration in figure 1(A). The asymptotic time-delayed mutual information is different from zero in both directions, thereby challenging the statistical inference of a directional interaction. The higher false positive rate of intrinsic mutual information compared to transfer entropy is somewhat surprising, given that intrinsic mutual information was originally intended to quantify information flow better than the classical information-theoretic metrics.
The explanation for this counter-intuitive result largely has its roots in the fact that, for most parameter combinations, intrinsic mutual information corresponds to the minimum between transfer entropy and time-delayed mutual information, as shown in figures 1(B)-(F). Put simply, intrinsic mutual information suffers from the same shortcomings as time-delayed mutual information. Under the assumption that intrinsic mutual information is the minimum of time-delayed mutual information and transfer entropy, we have

F_IMI(x) = Pr(min{MI, TE} ⩽ x) and Pr(min{MI, TE} ⩽ x) = Pr({MI ⩽ x} ∪ {TE ⩽ x}).

These two equalities together imply that F_IMI(x) ⩾ max{F_TE(x), F_MI(x)}. Hence, the cut-off value for intrinsic mutual information cannot be larger than those for time-delayed mutual information and transfer entropy, that is, IMI_95 ⩽ min{MI_95, TE_95}, as illustrated in figure 2(A). Noting that F_MI(x) > F_TE(x) for all values of x (so that IMI_95 < TE_95), we identify the following three modalities by which intrinsic mutual information would yield different inferences than those of transfer entropy: in Case 1, the observed transfer entropy falls below TE_95, but intrinsic mutual information, attained by transfer entropy, exceeds IMI_95; in Case 2, the observed transfer entropy falls below TE_95, but intrinsic mutual information, attained by time-delayed mutual information, exceeds IMI_95; and in Case 3, the observed transfer entropy exceeds TE_95, but intrinsic mutual information, attained by time-delayed mutual information, falls below IMI_95. Case 3 is then the only case in which intrinsic mutual information could outperform transfer entropy in filtering a spurious interaction. The possibility of this case to occur is related to time-delayed mutual information being able to filter the spurious link, which is never registered for any parameter combination. Cases 1 and 2 are prevalent in our study, due to the much fatter tail of the null distribution of transfer entropy compared to intrinsic mutual information, thereby explaining the excess of false positives when using intrinsic mutual information rather than transfer entropy (48% against 5%).

Extension to the modified Vicsek model
The proposed minimalistic Boolean model offers insight into the root causes of the superior performance of transfer entropy compared to intrinsic mutual information as a tool to infer leader-follower directional interactions. To support the generality of our findings, we now consider a leader-follower pair in the modified Vicsek model [27], as in Sattari et al [21]. Here, a leader particle L and a follower particle F move in a square domain of size l × l with periodic boundary conditions, and their planar positions at time t ∈ Z_⩾0 are described by the complex numbers r^L_t and r^F_t, respectively. The two particles move at a constant speed s, and the heading of the leader θ^L_t at time t influences the heading of the follower θ^F_{t+1} at the next time-step when their distance is within a unitary interaction distance, according to the following equations:

θ^L_{t+1} = arg(w_LL e^{ı θ^L_t}) + η ψ^L_t,
θ^F_{t+1} = arg(w_FF e^{ı θ^F_t} + I_t w_LF e^{ı θ^L_t}) + η ψ^F_t.

Here, I_t is an indicator function that is 1 if |r^L_t − r^F_t| ⩽ 1, and 0 otherwise; ı is the imaginary unit; and ψ^L_t and ψ^F_t are independent random variables, uniformly distributed in [−π, π], that model the added noise of strength η. We report simulation results for l = 2, s = 0.3, η = 0.1, w_LL = −1, and w_FF = 1. In this case, the leader would tend to flip its heading at every time-step, while the follower would tend to maintain its heading, thereby mimicking the case of the Boolean model in figure 1(B). By varying the coupling w_LF, we modulate the influence of the leader's heading on the follower. For w_LF = 0.1, the follower is only marginally affected by the heading of the leader in its update process. For w_LF = 1, the follower equally weights its own heading and the heading of the leader in its update process.
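A single update of the pair can be sketched as follows (our reading of the assumed dynamics; the exact noise convention and boundary handling in [21, 27] may differ):

```python
import numpy as np

def vicsek_step(thL, thF, rL, rF, wLL, wFF, wLF, s, eta, l, rng):
    """One time-step of the leader-follower modified Vicsek pair.
    Headings are combined as weighted complex unit vectors; positions are
    complex numbers wrapped onto the l-by-l periodic domain."""
    It = 1.0 if abs(rL - rF) <= 1.0 else 0.0      # unitary interaction range
    thL1 = np.angle(wLL * np.exp(1j * thL)) + eta * rng.uniform(-np.pi, np.pi)
    thF1 = np.angle(wFF * np.exp(1j * thF) + It * wLF * np.exp(1j * thL)) \
        + eta * rng.uniform(-np.pi, np.pi)
    wrap = lambda r: (r.real % l) + 1j * (r.imag % l)
    return thL1, thF1, wrap(rL + s * np.exp(1j * thL1)), wrap(rF + s * np.exp(1j * thF1))
```

With w_LL = −1 and no noise, the leader's unit vector is exactly reversed at each step, reproducing the heading-flipping tendency discussed above.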
For each value of w_LF, we perform N = 100 repetitions with initial headings randomly selected in [−π, π] and initial positions within the unit radius at the center of the square domain. For each repetition, we estimate MI, TE, and IMI from the leader to the follower and vice versa. The probabilities in equations (1)-(3) are computed by discretizing the time-series of the headings so as to obtain b equally spaced bins, with b equal to 2, 3, or 4. To detect an interaction between the particles, we then compare the observed values of MI, TE, and IMI with the corresponding null distributions, obtained by simulating the model for 1000 repetitions, each of length T = 2000, with w_LF = 0, that is, in the absence of coupling between the leader and the follower.
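The discretization into b equally spaced bins can be implemented as follows (a sketch; the bin-edge convention may differ from the one used in the paper):

```python
import numpy as np

def discretize_headings(theta, b):
    """Map headings to integer bins 0..b-1 by partitioning [-pi, pi) into
    b equally spaced intervals."""
    shifted = np.mod(np.asarray(theta) + np.pi, 2 * np.pi)   # now in [0, 2*pi)
    # integer truncation floors nonnegative values; clip guards the upper edge
    return np.minimum((shifted / (2 * np.pi / b)).astype(int), b - 1)
```

The binned series can then be fed directly to discrete estimators of MI, TE, and IMI.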
Our results on the modified Vicsek model confirm the inadequacy of time-delayed mutual information and the superiority of transfer entropy over intrinsic mutual information in identifying directional interactions, see figures 4(A) and (B). For all values of the coupling gain from the leader to the follower, and for all numbers of bins, time-delayed mutual information yields a rate of false positives and/or of false negatives higher than 50%. Although transfer entropy and intrinsic mutual information exhibit comparable levels of sensitivity for different values of the coupling gain from the leader to the follower and different numbers of bins, their specificity can be dramatically different. With respect to sensitivity, for the lowest coupling value (w_LF = 0.1), intrinsic mutual information yields a larger fraction of false negatives (30% against 26% for b = 2, 52% against 26% for b = 3, and 44% against 19% for b = 4). As the coupling increases, the accuracy of the inference obtained through intrinsic mutual information improves, reaching the same levels as transfer entropy, with no false negatives for w_LF = 1. With respect to specificity, performance is highly related to the number of bins used in the discretization of the time-series. For coarse binning, the specificity of the inference considerably deteriorates when choosing intrinsic mutual information over transfer entropy (when w_LF = 1, the false positive rate is 56% against 1% for b = 2, and 14% against 2% for b = 3). Predictably, reducing the coupling from the leader to the follower mitigates the difference in the specificity of the two inferences (when w_LF = 0.1, the false positive rate is 28% against 4% for b = 2, and 3% against 8% for b = 3), due to the weaker interaction between the units. A finer binning reduces the gap between the two metrics, whereby we register perfect specificity of both transfer entropy and intrinsic mutual information for b = 4.
Similar to the Boolean model, the difference in specificity should be sought in the relationship between intrinsic mutual information, transfer entropy, and time-delayed mutual information. We confirm that intrinsic mutual information is again well approximated by the minimum between transfer entropy and time-delayed mutual information: numerical values of intrinsic mutual information and of the minimum of the other two classical metrics are statistically indistinguishable at a confidence level of 0.05 across 99.7% of the 900 cases (3 parameter values × 3 numbers of bins × 100 repetitions) reported in figure 4, see section 4 and supplementary note 4. Likewise, the cumulative null distribution of time-delayed mutual information is always above that of transfer entropy, see figures 4(C) and (D). As a result, the same three cases identified for the Boolean model in figure 3 are possible, and the extent to which intrinsic mutual information under-performs transfer entropy relates to instances of Case 3 being outnumbered by instances of Cases 1 and 2. The lowest specificity of intrinsic mutual information is registered when the false positive rate of time-delayed mutual information is larger than 50%; this corresponds to the occurrence of only 0.25% instances of Case 3.

Discussion
Intrinsic mutual information has been recently proposed as a precise measure of information flow in complex systems [19], bearing important insight into collective behavior [21]. Rephrasing the words of James et al [19], given two stochastic processes X and Y, there is an intrinsic information flow from X to Y when the past of X is individually predictive of the future of Y. Such an intrinsic information flow is not exactly quantified by transfer entropy, which also incorporates synergistic information flow, that is, the reduction of uncertainty about the future of Y afforded by the simultaneous knowledge of the present states of X and Y. Likewise, it is not captured by time-delayed mutual information, which also incorporates shared information flow, that is, the case in which the past of X is predictive of the present of Y in the same manner as the past of Y. Intrinsic mutual information is an easy-to-compute upper bound for intrinsic information flow, which, different from transfer entropy, is free from contributions related to synergistic information. Whether intrinsic mutual information can be used for hypothesis-testing and the inference of directional interactions between units from their time-series has never been investigated. In this study, we provide an answer to this question through closed-form results on a minimalistic Boolean model that captures salient features of leader-follower dynamics and through simulation results on the modified Vicsek model by Sattari et al [21].
Our theoretical and computational results do not point to a practical advantage of intrinsic mutual information over transfer entropy in the inference of pairwise interactions. Surprisingly, we observe that the precise quantification of information flow through intrinsic mutual information does not bestow any advantage over transfer entropy in either sensitivity or specificity. None of the considered scenarios, be they simulations of the Boolean model or of the modified Vicsek model, offers evidence in favor of a performance improvement attained through the use of intrinsic mutual information. As such, care should be taken when employing intrinsic mutual information in the discovery of causal relationships. The simultaneous consideration of synergistic and intrinsic information flows by transfer entropy seems to offer a more reliable basis to minimize false positives and negatives than intrinsic mutual information.
While the Vicsek model is the most widespread choice for the study of collective dynamics from biology to swarm robotics [28], its general mathematical treatment is difficult, if not impossible. In its basic incarnation, the model leads to state-dependent, switched, nonlinear, stochastic dynamics that preclude the exact quantification of any information-theoretic quantity. Working with a Boolean model helps clarify two main aspects that would remain opaque from a merely computational endeavor. First, we determine under which conditions (intrinsic noise and coupling gain) intrinsic mutual information reduces to either of the classical information-theoretic metrics (time-delayed mutual information and transfer entropy), offering further backing to the critique of transfer entropy by James et al [18] and reinforcing numerical predictions by Sattari et al [21]. We demonstrate that intrinsic mutual information depends on the added noise and on the strength of the coupling gain in a complex, nonlinear fashion. As a first approximation, intrinsic mutual information equals the minimum between time-delayed mutual information (compounding intrinsic and shared information flows) and transfer entropy (compounding intrinsic and synergistic information flows). This result suggests that shared and synergistic information flows do not coexist in the considered Boolean model, except for a narrow window of coupling gains.
Second, we pinpoint the modalities by which intrinsic mutual information offers reduced performance in the inference of directional interactions compared to transfer entropy. While intrinsic mutual information and transfer entropy display similarly high sensitivity, intrinsic mutual information has considerably lower specificity. The low specificity of intrinsic mutual information can be traced back to the same sins of time-delayed mutual information, whose null distribution has a slimmer tail than that of transfer entropy, thus favoring the rejection of the null-hypothesis. Since intrinsic mutual information can be approximated as the minimum between time-delayed mutual information and transfer entropy, the tail of its null distribution will be at least as slim as that of time-delayed mutual information. When intrinsic mutual information coincides with transfer entropy (null synergistic information flow), it may happen that intrinsic mutual information scores a false positive, despite transfer entropy being capable of filtering out a spurious interaction from the leader to the follower. When intrinsic mutual information is equal to time-delayed mutual information (null shared information flow), one cannot exclude the possibility that intrinsic mutual information would outperform transfer entropy. However, this would rely on time-delayed mutual information exhibiting adequate specificity, a rare occurrence throughout our statistical analysis. As a result, we urge prudence in the use of intrinsic mutual information as a tool for the discovery of directional interactions.
Several prior studies have pointed to the merit of exact results on information-theoretic metrics [25,29-34]. The use of exact theoretical values rather than their statistical estimates alleviates the dependence of any claim on the statistical methods adopted for estimation and brings to light the specific role of model parameters in any information-theoretic metric. For example, Smirnov [30] computed closed-form results for transfer entropy over a class of benchmark systems (autoregressive processes and Markov chains), demonstrating typical factors that may lead to spurious couplings in real-world applications. Hahs and Pethel [31] established closed-form results for transfer entropy for autoregressive processes with multiple time lags. Novelli et al [34] and Goodman and Porfiri [33] independently demonstrated the dependence of transfer entropy on topological properties of network nodes within theoretical studies of a linearly coupled Gaussian model and a Boolean system, respectively. Boolean models have been further investigated in a sequence of studies by some of these authors and others [25,29,32].
The study is not free of limitations. First, we presently lack a general form for the cumulative distribution of conditionally independent variables for hypothesis-testing. As such, claims regarding the superiority of transfer entropy over intrinsic mutual information in terms of specificity are based on numerical estimations of the null distributions, conducted for specific parameter choices. Some work has been conducted in this direction [35], but available approximations are based on low-order Taylor expansions that do not consider the temporal structure of the time-series, thereby hindering their application to the problem of leader-follower interactions between systems with memory; see section 4 and supplementary note 5. Such a drawback is also at the core of the second, main limitation of this study: the lack of a comparison between the inferences of transfer entropy and intrinsic mutual information beyond coarse-grained dynamics for the modified Vicsek model. In fact, the present comparison is limited to discretizing the heading of the particles with at most four bins. Such a computation required about one hundred hours on a state-of-the-art machine, and the computational time would scale exponentially with the number of bins. Access to a closed-form approximation for the null distributions of all the salient information-theoretic quantities for coarse- and fine-grained dynamics would address this issue.
Despite these two main limitations, our work brings forward important insight into the use of the novel concept of intrinsic mutual information as a tool for the inference of pairwise interactions underpinning collective dynamics. Transfer entropy has been, rightfully, criticized for its inability to detail information flow between coupled units [18,19,21], a task that is seamlessly accomplished through the use of intrinsic mutual information. Yet, accomplishing this task may not translate into an improved statistical inference, especially with respect to specificity. Perhaps, this is one of the few cases in which Voltaire's famous aphorism applies: 'perfect is the enemy of good.'

Intrinsic mutual information
For a discrete random variable Y, the uncertainty associated with Y is quantified by its (Shannon) entropy [1]

$H(Y) = -\sum_{y} \Pr(Y = y) \log \Pr(Y = y).$

Given another discrete random variable Z, the joint entropy of the pair (Y, Z) is

$H(Y, Z) = -\sum_{y, z} \Pr(Y = y, Z = z) \log \Pr(Y = y, Z = z),$

whereas the entropy of Y conditional to Z is

$H(Y | Z) = -\sum_{y, z} \Pr(Y = y, Z = z) \log \Pr(Y = y | Z = z).$

Note that the above definitions imply that

$H(Y | Z) = H(Y, Z) - H(Z).$

Mutual information between Y and Z is defined as

$I(Y; Z) = H(Y) - H(Y | Z). \qquad (21)$

By definition, mutual information is symmetric: from the definition of conditional probability, Pr(Y = y | Z = z) = Pr(Y = y, Z = z)/Pr(Z = z), one obtains that the right-hand-side of (21) is equal to

$\sum_{y, z} \Pr(Y = y, Z = z) \log \frac{\Pr(Y = y, Z = z)}{\Pr(Y = y) \Pr(Z = z)},$

which is symmetric in Y and Z, whereby I(Y; Z) = I(Z; Y). Furthermore, both entropy and mutual information are non-negative from Jensen's inequality [1].
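As a concrete illustration of these definitions (not part of the original analysis), the following sketch computes entropy and mutual information from a joint probability table; base-2 logarithms (bits) are an assumption of the sketch, not a choice stated in the text.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability table of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                       # convention: 0 log 0 = 0
    return float(-np.sum(p * np.log2(p)))

def mutual_information(p_yz):
    """I(Y;Z) in bits from a joint table p_yz[y, z] = Pr(Y = y, Z = z),
    via the identity I(Y;Z) = H(Y) + H(Z) - H(Y,Z)."""
    p_yz = np.asarray(p_yz, dtype=float)
    p_y = p_yz.sum(axis=1)             # marginal of Y
    p_z = p_yz.sum(axis=0)             # marginal of Z
    return entropy(p_y) + entropy(p_z) - entropy(p_yz)

# Two independent fair bits: I(Y;Z) = 0.
indep = np.full((2, 2), 0.25)
# Perfectly correlated fair bits: I(Y;Z) = H(Y) = 1 bit.
copied = np.array([[0.5, 0.0], [0.0, 0.5]])
```

The identity I(Y;Z) = H(Y) + H(Z) - H(Y,Z) follows from combining (21) with H(Y|Z) = H(Y,Z) - H(Z).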
Next, given a third random variable W, we introduce the conditional mutual information I(Y; Z | W) as the mutual information between Y and Z conditional to W. This quantity is expressed as

$I(Y; Z | W) = H(Y | W) - H(Y | Z, W) = \sum_{y, z, w} \Pr(Y = y, Z = z, W = w) \log \frac{\Pr(Y = y, Z = z | W = w)}{\Pr(Y = y | W = w) \Pr(Z = z | W = w)}.$

Surprisingly, conditioning is not a subtractive operation, so that conditioning on a third variable can increase the information shared: it is possible that I(Y; Z | W) > I(Y; Z). This phenomenon is known as conditional dependence [36], and a specific example based on exclusive OR logic has been proposed by Sattari et al [21] (therein, Y and Z are independent, but given W they become related in a deterministic manner).
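The exclusive-OR example can be verified numerically. In the sketch below, an illustration of the construction cited from Sattari et al rather than their code, Y and Z are independent fair bits and W = Y XOR Z; conditioning on W raises the shared information from 0 to 1 bit (base-2 logarithms assumed).

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability table of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(p_yz):
    """I(Y;Z) = H(Y) + H(Z) - H(Y,Z), in bits."""
    p = np.asarray(p_yz, dtype=float)
    return entropy(p.sum(axis=1)) + entropy(p.sum(axis=0)) - entropy(p)

def cond_mutual_information(p_yzw):
    """I(Y;Z|W) = H(Y,W) + H(Z,W) - H(Y,Z,W) - H(W), in bits, from a joint
    table p_yzw[y, z, w] = Pr(Y = y, Z = z, W = w)."""
    p = np.asarray(p_yzw, dtype=float)
    return (entropy(p.sum(axis=1)) + entropy(p.sum(axis=0))
            - entropy(p) - entropy(p.sum(axis=(0, 1))))

# XOR construction: Y and Z are independent fair bits, W = Y XOR Z.
p = np.zeros((2, 2, 2))
for y in (0, 1):
    for z in (0, 1):
        p[y, z, y ^ z] = 0.25

print(mutual_information(p.sum(axis=2)))   # I(Y;Z)   -> 0.0 bits
print(cond_mutual_information(p))          # I(Y;Z|W) -> 1.0 bit
```

Given W, knowing Y determines Z exactly, even though Y and Z are independent on their own; this is precisely the conditional dependence described above.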
In other words, conditional mutual information 'is sensitive to both intrinsic dependencies between Y and Z, as well as dependencies induced by W' [19]. A way to filter the dependencies induced by W is to utilize the notion of intrinsic (conditional) mutual information between Y and Z given W, introduced by Maurer and Wolf [20],

$I(Y; Z \downarrow W) = \inf_{\Pr(\overline{W} | W)} I(Y; Z | \overline{W}).$

Here, $\overline{W}$ is an auxiliary variable taking values in a set $\overline{\mathcal{W}}$ and related to W by means of the conditional probability $\Pr(\overline{W} | W)$, taking the form of an unknown (finite or infinite) $|\mathcal{W}| \times |\overline{\mathcal{W}}|$ matrix. Intrinsic mutual information is the infimum of $I(Y; Z | \overline{W})$ over all possible random variables $\overline{W}$ that can be generated from W through $\Pr(\overline{W} | W)$. When $\overline{W}$ is a constant ($\Pr(\overline{W} | W)$ corresponding to a matrix with all zeros but a column of ones) and when $\overline{W}$ is identical to W ($\Pr(\overline{W} | W)$ corresponding to the identity matrix), intrinsic mutual information reduces to mutual information and conditional mutual information, respectively [21].
Intrinsic conditional mutual information has been used in cryptography as an upper bound for the secret key rate of transmission between a pair of sender/receiver having access to Y and Z against an adversary having access to W. In other words, the secret key rate is the maximum rate at which the sender/receiver can agree on a secret S so that the information that can be obtained on S from W is arbitrarily small. The definition of intrinsic mutual information begets the following, intuitive, inequalities:

$0 \leq I(Y; Z \downarrow W) \leq \min\{ I(Y; Z),\, I(Y; Z | W) \},$

where $I(Y; Z \downarrow W)$ denotes the intrinsic mutual information.
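A brute-force numerical sketch of this infimum can be obtained by searching a grid of channels Pr(W̄|W). This is a simplification for illustration, restricted to binary W and binary W̄ (in general, larger auxiliary alphabets may be needed); the function names are hypothetical. On the exclusive-OR distribution discussed above, the search recovers an intrinsic mutual information of zero, attained by a constant W̄, even though the conditional mutual information is 1 bit.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability table of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def cmi(p_yzw):
    """I(Y;Z|W) = H(Y,W) + H(Z,W) - H(Y,Z,W) - H(W), in bits."""
    p = np.asarray(p_yzw, dtype=float)
    return (entropy(p.sum(axis=1)) + entropy(p.sum(axis=0))
            - entropy(p) - entropy(p.sum(axis=(0, 1))))

def intrinsic_mi(p_yzw, grid=21):
    """Grid-search approximation of I(Y;Z -> W) = inf over channels
    Pr(Wbar|W) of I(Y;Z|Wbar), restricted to binary W and binary Wbar."""
    best = np.inf
    for a in np.linspace(0, 1, grid):      # Pr(Wbar = 1 | W = 0)
        for b in np.linspace(0, 1, grid):  # Pr(Wbar = 1 | W = 1)
            chan = np.array([[1 - a, a],   # rows: values of W
                             [1 - b, b]])  # cols: values of Wbar
            p_bar = np.einsum('yzw,wv->yzv', p_yzw, chan)
            best = min(best, cmi(p_bar))
    return best

# XOR distribution: I(Y;Z) = 0 and I(Y;Z|W) = 1 bit, yet the intrinsic
# mutual information is 0 (a constant Wbar already attains the infimum).
p = np.zeros((2, 2, 2))
for y in (0, 1):
    for z in (0, 1):
        p[y, z, y ^ z] = 0.25
```

The grid includes channels with equal rows (a = b), for which W̄ is independent of W and I(Y;Z|W̄) collapses to I(Y;Z) = 0, consistent with the inequalities above.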

Relationship between information-theoretic metrics in the modified Vicsek model
The closed-form, asymptotic expressions of time-delayed mutual information, transfer entropy, and intrinsic mutual information for the Boolean model (6) indicate that for a wide range of parameters, intrinsic mutual information coincides with the minimum of time-delayed mutual information and transfer entropy. Such a claim is at the core of our explanation for reduced specificity of intrinsic mutual information when compared to transfer entropy. We numerically verified whether this claim would also hold true for the modified Vicsek model (17). For low (w LF = 0.1), medium (w LF = 0.5), and high (w LF = 1) values of the coupling from the leader to the follower, we numerically estimated time-delayed mutual information, transfer entropy, and intrinsic mutual information, which, in turn, required the estimation of the probability density functions in equations (1)-(3), respectively. For each parameter value, these estimations were performed on N = 1000 time-series of leader and follower, each T = 2000 time-steps long. To account for the finiteness of the time-series, we associated with each point-estimate an interval at a confidence level of 0.05. Specifically, for each of the three information-theoretic measures, the width of the interval was selected as the 95th percentile of the cumulative null distribution obtained from simulating the case w LF = 0.
Overall, we found that the confidence interval for intrinsic mutual information overlaps with (at least one of) those of transfer entropy and time-delayed mutual information in 99.6% of the cases, whereby intrinsic mutual information is statistically indistinguishable from the minimum between transfer entropy and time-delayed mutual information. This result is robust to different choices of the coupling gain w_LF and of the number of bins b, see supplementary material.
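The overlap criterion can be sketched as follows, with hypothetical numbers; the half-width stands in for the 95th percentile of the w_LF = 0 null distribution described in the preceding section.

```python
def indistinguishable(est_a, est_b, half_width):
    """Deem two point-estimates statistically indistinguishable when their
    confidence intervals, each extending half_width on either side of the
    estimate, overlap."""
    return abs(est_a - est_b) <= 2 * half_width

# Hypothetical point-estimates (in bits) for one parameter setting:
tdmi, te, imi = 0.031, 0.012, 0.013
half_width = 0.004   # stand-in for the 95th percentile of the null

# Is intrinsic mutual information indistinguishable from min(TE, TDMI)?
print(indistinguishable(imi, min(tdmi, te), half_width))  # -> True
```

Tallying this check across all estimated triples yields the kind of agreement percentage reported above.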

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.