Mutual information against correlations in binary communication channels

Background: Explaining how the brain processes information so quickly remains an open problem (van Hemmen JL, Sejnowski T., 2004). Thus, the analysis of neural transmission processes (Shannon CE, Weaver W., 1963) focuses mainly on the search for effective encoding and decoding schemes. According to the fundamental Shannon theorem, mutual information plays a crucial role in characterizing the efficiency of communication channels. It is well known that this efficiency is determined by the channel capacity, which is the maximal mutual information between input and output signals. On the other hand, intuitively speaking, when input and output signals are more correlated, the transmission should be more efficient. A natural question arises about the relation between mutual information and correlation. We analyze the relation between these quantities using the binary representation of signals, which is the most common approach taken in studying neuronal processes of the brain.
Results: We present binary communication channels for which mutual information and correlation coefficients behave differently both quantitatively and qualitatively. Despite this difference in behavior, we show that the noncorrelation of binary signals implies their independence, in contrast to the case for general types of signals.
Conclusions: Our research shows that mutual information cannot be replaced by sheer correlations. Our results indicate that neuronal encoding has a more complicated nature, which cannot be captured by straightforward correlations between input and output signals, because mutual information takes into account the structure and patterns of the signals.


Background
A huge effort has been undertaken to analyze neuronal coding, its high efficiency and the mechanisms governing it [1]. Claude Shannon published his famous paper on communication theory in 1948 [2,3]. In that paper, he formulated in a rigorous mathematical way intuitive concepts concerning the transmission of information in communication channels. The occurrences of input symbols transmitted via a channel and of output symbols are described by random variables X (input) and Y (output). An important task is the determination of an efficient decoding scheme; i.e., a procedure that allows a decision to be made about the sequence (message) input to the channel from the output sequence of symbols. This is the essence of the fundamental Shannon theorem, in which a crucial role is played by the capacity of the channel, which is given by the maximum of mutual information over all possible probability distributions of input random variables. The theorem states that the efficiency of a channel is better when the mutual information is higher [4,5]. When analyzing relations between data, in particular between the input and the response of a system, experimentalists apply the most natural tools; i.e., different types of correlations [6-14]. Correlation analysis has been used to infer the connectivity between signals. The standard correlation measure is the Pearson correlation coefficient, which is commonly exploited in data analysis [15,16]. However, there are a number of correlation-like coefficients dedicated to specific biological and experimental phenomena [6]. Therefore, besides the Pearson correlation coefficient, in this paper we also consider a correlation coefficient based on the spike train, which is strongly related to the firing activity of neurons transmitting information.
*Correspondence: jszczepa@ippt.pan.pl. Institute of Fundamental Technological Research, Polish Academy of Sciences, Pawinskiego 5B, Warsaw, PL
A natural question arises about the role of correlation coefficients in the description of communication channels, especially in effective decoding schemes [17,18]. Recently, an interesting result concerning the effects of correlations between neurons in an encoding population has been shown analytically and numerically [19]. It turned out that decorrelation does not imply an increase in information. In [20] it was observed that the spike trains of retinal ganglion cells were indeed decorrelated in comparison with the visual input. The authors conjecture that this decorrelation would enhance coding efficiency in optic nerve fibers of limited capacity. We open a discussion of whether mutual information can be replaced in some sense by a correlation coefficient. In this paper we consider binary communication channels. At first glance, the straightforward idea seems to hold true: when there is a high correlation between output and input, then, in the language of neuroscience, by observing a spike in the output we guess with high probability that there is also a spike in the input. This intuition suggests that the mutual information and correlation coefficients behave in a similar way. In fact, we show that this is not always true and that it often happens that the mutual information and correlation coefficients behave in completely different ways.

Methods
The communication channel is a device that acts on the input to produce the output [3,17,21]. In mathematical language, the communication channel is defined as a matrix of conditional probabilities that link input and output symbols and may depend on the internal structure of the channel. In neuronal communication systems of the brain, information is transmitted by means of a small electric current, and the timing of the action potentials (voltage pulses measured in mV), whose sequence is known in the literature as a spike train [1], plays a crucial role. Spike trains can be encoded in many ways. The most common encoding proposed in the literature is binary encoding, which is the most effective and natural method [11,22-26]. It is physically justified because spike trains are detected with some limited time resolution $\tau$, so that in each time slice (bin) a spike is either present or absent. If we think of a spike as representing a "1" and no spike as representing a "0", then, over a time interval of length $T$, each possible spike train is equivalent to a $T/\tau$-digit binary number. In [26] it was shown that transient responses in auditory cortex can be described as a binary process, rather than as a highly variable Poisson process. Thus, in this paper, we analyze binary information sources and binary channels [25]. Such channels are described by a 2 × 2 matrix:
$$C = \begin{pmatrix} p_{0|0} & p_{0|1} \\ p_{1|0} & p_{1|1} \end{pmatrix}, \tag{1}$$
where $p_{0|0} + p_{1|0} = 1$ and $p_{0|1} + p_{1|1} = 1$. The symbol $p_{j|i}$ denotes the conditional probability of transition from state $i$ to state $j$, where $i = 0, 1$ and $j = 0, 1$. Observe that $i$ and $j$ are states of "different" neurons. Input symbols 0 and 1 (coming from the information source governed, in fact, by a random variable X) arrive with probabilities $p^X_0$ and $p^X_1$, respectively. Having the matrix C, one can find a relation between these random variables; i.e., by applying the standard formula $p(X = i \wedge Y = j) = p_{j|i}\, p^X_i$ one can find the joint probability matrix
$$M = \begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix}, \tag{2}$$
with entries $p_{ji} = p(X = i \wedge Y = j)$, where $p_{00} + p_{01} + p_{10} + p_{11} = 1$ and $p_{00}, p_{01}, p_{10}, p_{11} \ge 0$.
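The passage from the channel matrix to the joint probability matrix can be sketched in a few lines of Python. This is only an illustration: the function name `joint_from_channel` and all numerical values are hypothetical and do not come from the paper. The sketch assumes the convention that `C[j][i]` stores $p_{j|i}$ and `M[j][i]` stores $p(X = i \wedge Y = j)$.

```python
from fractions import Fraction as F

def joint_from_channel(C, p_x):
    """Return M with M[j][i] = p(X=i and Y=j) = p_{j|i} * p^X_i.

    C[j][i] is the conditional probability p_{j|i} of output j given input i,
    so each column of C has to sum to 1.
    """
    for i in range(2):
        if C[0][i] + C[1][i] != 1:
            raise ValueError("each column of C must sum to 1")
    if sum(p_x) != 1:
        raise ValueError("input probabilities must sum to 1")
    return [[C[j][i] * p_x[i] for i in range(2)] for j in range(2)]

# Hypothetical channel: an input spike is lost with probability 1/4 and
# a spurious output spike appears with probability 1/10.
C = [[F(9, 10), F(1, 4)],
     [F(1, 10), F(3, 4)]]
p_x = [F(7, 10), F(3, 10)]   # hypothetical input distribution (p^X_0, p^X_1)

M = joint_from_channel(C, p_x)
p_y = [M[j][0] + M[j][1] for j in range(2)]   # output distribution (p^Y_0, p^Y_1)
```

Exact fractions are used here so that the normalization conditions can be checked without rounding error; floating-point numbers would work equally well if the checks used a tolerance.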
Using this notation, the probability distributions $p^X_i$ and $p^Y_j$ of the random variables X and Y are given by
$$p^X_i := p(X = i) = p_{0i} + p_{1i}, \qquad p^Y_j := p(Y = j) = p_{j0} + p_{j1}, \qquad i, j = 0, 1. \tag{3}$$
The quantities $p^X_1$ and $p^Y_1$ can be interpreted as the firing rates of the input and output spike trains. We will use these probability distributions to calculate the mutual information (between input and output signals), which is expressed in terms of the entropies of the input, of the output and of the joint distribution of input and output (4). In the following, we consider two random variables X (input signal to the channel) and Y (output from the channel), both assuming only the two values 0 and 1 and formally both defined on the same probability space. It is well known that the correlation coefficient of any independent random variables X and Y is zero [14], but in general it is not true that $\rho(X, Y) = 0$ implies independence of the random variables. However, for our specific random variables X and Y, which are of binary type, the most common type in communication systems, we show the equivalence of independence and noncorrelation (see Appendix). The basic idea behind the concept of mutual information is to determine the reduction of uncertainty (measured by entropy) of random variable X provided that we know the values of the discrete random variable Y. The mutual information (MI) is defined as
$$\mathrm{MI}(X, Y) := H(X) - H(X|Y) = H(X) + H(Y) - H(X, Y), \tag{4}$$
where $H(X)$ is the entropy of X, $H(Y)$ is the entropy of Y, $H(X, Y)$ is the joint entropy of X and Y, and $H(X|Y)$ is the conditional entropy [4,17,21,27-29]. These entropies are defined as
$$H(X) := -\sum_{i \in I_s} p(X = i) \log p(X = i), \qquad H(Y) := -\sum_{j \in O_s} p(Y = j) \log p(Y = j),$$
$$H(X, Y) := -\sum_{i \in I_s} \sum_{j \in O_s} p(X = i \wedge Y = j) \log p(X = i \wedge Y = j), \qquad H(X|Y) := H(X, Y) - H(Y),$$
where $I_s$ and $O_s$ are, in general, sets of input and output symbols, $p(X = i)$ and $p(Y = j)$ are the probability distributions of random variables X and Y, and $p(X = i \wedge Y = j)$ is the joint probability distribution of X and Y. Estimation of mutual information requires knowledge of the probability distributions, which may be easily estimated for two-dimensional binary distributions, but in real applications it poses multiple problems [30].
Since, in practice, the knowledge about probability distributions is often restricted, more advanced tools must be applied, such as effective entropy estimators [24,30-33].
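When the 2 × 2 joint distribution is fully known, the definitions above can be evaluated directly. The following sketch does this for two hypothetical joint matrices (the values are illustrative, not taken from the paper). It assumes base-2 logarithms, i.e. entropies in bits, which is a choice made here for the example, and the convention `M[j][i]` $= p(X = i \wedge Y = j)$. It is not one of the entropy estimators cited above, which address the harder case of unknown distributions.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(M):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y) for a 2x2 joint matrix M[j][i]."""
    p_x = [M[0][i] + M[1][i] for i in range(2)]   # input marginals
    p_y = [M[j][0] + M[j][1] for j in range(2)]   # output marginals
    p_xy = [M[j][i] for j in range(2) for i in range(2)]
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)

# Hypothetical examples.
M_independent = [[0.42, 0.18], [0.28, 0.12]]   # product of (0.7, 0.3) and (0.6, 0.4)
M_noiseless = [[0.5, 0.0], [0.0, 0.5]]         # output always equals input

mi_independent = mutual_information(M_independent)
mi_noiseless = mutual_information(M_noiseless)
```

For the independent pair the result vanishes up to rounding error, and for the noiseless channel with equiprobable inputs it equals one bit.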
The relative mutual information RMI(X, Y) [34] between random variables X and Y is defined as the ratio of MI(X, Y) to the average of the information transmitted by variables X and Y:
$$\mathrm{RMI}(X, Y) := \frac{\mathrm{MI}(X, Y)}{[H(X) + H(Y)]/2} = \frac{H(X) + H(Y) - H(X, Y)}{[H(X) + H(Y)]/2}.$$
RMI(X, Y) measures the reduction in uncertainty of X, provided we have knowledge about the realization of Y, relative to the average uncertainty of X and Y.
It holds true that [34]
1. 0 ≤ RMI(X, Y) ≤ 1;
2. RMI(X, Y) = 0 if and only if X and Y are independent;
3. RMI(X, Y) = 1 if and only if there exists a deterministic relation between X and Y.
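These properties can be checked numerically on small examples. The sketch below is an illustration only: the three joint matrices are hypothetical, base-2 logarithms are assumed (the ratio does not depend on the base), and the convention `M[j][i]` $= p(X = i \wedge Y = j)$ is used.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def relative_mutual_information(M):
    """RMI = MI / ((H(X) + H(Y)) / 2) for a 2x2 joint matrix M[j][i]."""
    h_x = entropy([M[0][i] + M[1][i] for i in range(2)])
    h_y = entropy([M[j][0] + M[j][1] for j in range(2)])
    h_xy = entropy([p for row in M for p in row])
    mean_h = (h_x + h_y) / 2
    if mean_h == 0:
        raise ValueError("RMI is undefined when both signals are constant")
    return (h_x + h_y - h_xy) / mean_h

# Hypothetical joint matrices.
independent = [[0.42, 0.18], [0.28, 0.12]]   # X and Y independent
inverting = [[0.0, 0.3], [0.7, 0.0]]         # deterministic relation Y = 1 - X
noisy = [[0.45, 0.05], [0.05, 0.45]]         # symmetric channel, 10% bit flips

rmi_independent = relative_mutual_information(independent)
rmi_inverting = relative_mutual_information(inverting)
rmi_noisy = relative_mutual_information(noisy)
```

The inverting channel is included to stress that a deterministic relation need not be the identity: it also yields RMI = 1 up to rounding error.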
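Adopting the notation (2, 3), the relative mutual information RMI can be expressed as
$$\mathrm{RMI}(X, Y) = \frac{-\sum_{i} (p_{0i} + p_{1i}) \log (p_{0i} + p_{1i}) - \sum_{j} (p_{j0} + p_{j1}) \log (p_{j0} + p_{j1}) + \sum_{i,j} p_{ji} \log p_{ji}}{\frac{1}{2}\left[-\sum_{i} (p_{0i} + p_{1i}) \log (p_{0i} + p_{1i}) - \sum_{j} (p_{j0} + p_{j1}) \log (p_{j0} + p_{j1})\right]},$$
where all sums run over $i, j = 0, 1$. The standard definition of the Pearson correlation coefficient $\rho(X, Y)$ of random variables X and Y is
$$\rho(X, Y) := \frac{E[(X - EX)(Y - EY)]}{\sqrt{V(X)\, V(Y)}}, \tag{10}$$
where E is the average over the ensemble of elementary events, and $V(X)$ and $V(Y)$ are the variances of X and Y. Adopting the communication channel notation, we get
$$\rho(X, Y) = \frac{p_{11} - (p_{01} + p_{11})(p_{10} + p_{11})}{\sqrt{(p_{01} + p_{11})(p_{00} + p_{10})(p_{10} + p_{11})(p_{00} + p_{01})}}.$$
Note that the Pearson correlation coefficient $\rho(X, Y)$ is by no means a general measure of dependence between two random variables X and Y; it is connected with the linear dependence of X and Y. That is, a well-known theorem [15] states that the value of this coefficient is always between -1 and 1 and that it assumes -1 or 1 if and only if there exists a linear relation between X and Y.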
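As a consistency check, the Pearson coefficient of a binary pair can be computed both from the general definition and from the closed form in terms of the joint probabilities. The sketch below does this for one hypothetical joint matrix (the values and function names are illustrative, not from the paper), again with `M[j][i]` $= p(X = i \wedge Y = j)$.

```python
import math

def pearson_from_definition(M):
    """rho = E[(X - EX)(Y - EY)] / sqrt(V(X) V(Y)), summed over the four outcomes."""
    outcomes = [(i, j, M[j][i]) for j in (0, 1) for i in (0, 1)]
    ex = sum(i * p for i, j, p in outcomes)
    ey = sum(j * p for i, j, p in outcomes)
    cov = sum((i - ex) * (j - ey) * p for i, j, p in outcomes)
    vx = sum((i - ex) ** 2 * p for i, j, p in outcomes)
    vy = sum((j - ey) ** 2 * p for i, j, p in outcomes)
    return cov / math.sqrt(vx * vy)

def pearson_closed_form(M):
    """Same coefficient written directly in terms of p00, p01, p10, p11."""
    (p00, p01), (p10, p11) = M
    px1, py1 = p01 + p11, p10 + p11      # firing rates of input and output
    px0, py0 = p00 + p10, p00 + p01
    return (p11 - px1 * py1) / math.sqrt(px1 * px0 * py1 * py0)

# Hypothetical joint matrix: both firing rates equal 0.4.
M = [[0.5, 0.1],
     [0.1, 0.3]]

rho_def = pearson_from_definition(M)
rho_closed = pearson_closed_form(M)
```

For this example the covariance is 0.3 − 0.4 · 0.4 = 0.14 and both variances are 0.4 · 0.6 = 0.24, so both functions give 0.14/0.24 = 7/12. Both functions divide by zero when one of the signals is constant, in which case the coefficient is undefined.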
The essence of correlation, when we describe simultaneously the input to and the output from neurons, may be expressed as the difference between the probabilities of coincident and independent spiking, taken relative to the probability of independent spiking. To realize this idea, we use a quantitative neuroscience spike-train correlation (NSTC) coefficient:
$$\mathrm{NSTC}(X, Y) := \frac{p_{11} - p^X_1\, p^Y_1}{p^X_1\, p^Y_1} = \frac{p_{11} - (p_{01} + p_{11})(p_{10} + p_{11})}{(p_{01} + p_{11})(p_{10} + p_{11})}. \tag{12}$$
Such a correlation coefficient with this normalization seems to be more natural in neuroscience than the Pearson coefficient. A similar idea was developed in [35], where the raw cross-correlation of simultaneous spike trains was normalized by the square root of the product of the firing rates. Moreover, it turns out that the NSTC coefficient has an important property: once we know the firing rates $p^X_1$ and $p^Y_1$ of the individual neurons and the coefficient, we can determine the joint probabilities of firing:
$$p_{11} = (1 + \mathrm{NSTC})\, p^X_1\, p^Y_1,$$
and the remaining joint probabilities then follow from $p_{01} = p^X_1 - p_{11}$, $p_{10} = p^Y_1 - p_{11}$ and $p_{00} = 1 - p^X_1 - p^Y_1 + p_{11}$. Since $p_{11} \ge 0$, by formula (12) we have the lower bound NSTC ≥ −1. The upper bound is unlimited for the general class (2) of joint probabilities. In the important special case when the communication channel is effective enough, i.e., $p_{11}$ is large enough that the input spikes pass through the channel with high probability, one has the practical upper bound $\mathrm{NSTC} < \frac{1}{p_{11}} - 1$.
We present realizations of a few communication channels that show that the relative mutual information, the Pearson correlation coefficient and the neuroscience spike-train correlation coefficient may behave in different ways, both qualitatively and quantitatively. Each of these realizations constitutes a family of communication channels parameterized in a continuous way by a parameter α from some interval. For each α, we propose, assuming some relation between the activities of the neurons, the joint probability matrix of input and output signals and the information source distribution. These communication channels are determined by 2 × 2 matrices of conditional probabilities (1). Next, the joint probability is used to evaluate both the relative mutual information and the correlation coefficients. Finally, we plot the values of the relative mutual information and of both correlation coefficients against α to illustrate their different behaviors.
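The NSTC coefficient and the recovery of the joint probabilities from the firing rates can be sketched as follows. The function names and the numerical values are hypothetical and serve only as an illustration; `M[j][i]` is assumed to store $p(X = i \wedge Y = j)$, and exact fractions are used so that the identities can be checked without rounding.

```python
from fractions import Fraction as F

def nstc(M):
    """(p11 - pX1 * pY1) / (pX1 * pY1): coincident vs. independent spiking."""
    (p00, p01), (p10, p11) = M
    px1, py1 = p01 + p11, p10 + p11      # firing rates of input and output
    return (p11 - px1 * py1) / (px1 * py1)

def joint_from_rates(px1, py1, coeff):
    """Rebuild the joint matrix from the two firing rates and the NSTC value."""
    p11 = (1 + coeff) * px1 * py1
    p01 = px1 - p11                      # input spike, no output spike
    p10 = py1 - p11                      # output spike, no input spike
    p00 = 1 - px1 - py1 + p11
    return [[p00, p01], [p10, p11]]

# Hypothetical joint matrix with firing rates 2/5 and 2/5.
M = [[F(1, 2), F(1, 10)],
     [F(1, 10), F(3, 10)]]
coeff = nstc(M)                                   # 7/8 for these values
M_rebuilt = joint_from_rates(F(2, 5), F(2, 5), coeff)

# With no coincident spikes at all, the lower bound -1 is attained.
no_coincidence = [[F(1, 2), F(1, 4)],
                  [F(1, 4), F(0)]]
coeff_min = nstc(no_coincidence)
```

In this example NSTC = 7/8, which lies below 1/p11 − 1 = 7/3, in line with the bound quoted above.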

Results and discussion
We start with a communication channel in which the relative mutual information monotonically increases with α while the NSTC and Pearson correlation coefficients are practically constant. Moreover, RMI has large values which, according to the fundamental Shannon theorem, result in high transmission efficiency, while the Pearson correlation coefficient ρ is small. To realize these effects, we consider the situation described by the joint probability matrix (14), where the first neuron becomes more active (i.e., the probability of firing increases) with an increase in the parameter α while simultaneously the activity of the second neuron is unaffected by α. Thus, the joint probability matrix M(α) reads [...] In this case, the family of the communication channels for each parameter 0 < α < 2/15 is given by the conditional probability matrix C(α): [...] We assume that the input symbols coming from an information source arrive according to the random variable X with probability distribution $p^X_0 = \frac{3}{5} - 2\alpha$ and $p^X_1 = \frac{2}{5} + 2\alpha$. The behaviors of RMI, ρ and the NSTC coefficient are presented in Figure 1.
Now consider the case for which the probability of firing of the first neuron decreases with parameter α while [...] For this family of communication channels, the NSTC coefficient strongly decreases from positive to negative values, while ρ and RMI vary non-monotonically around zero. Moreover, ρ exhibits one extreme and RMI two extremes. Additionally, for α = 0.35, the RMI is close to zero while the NSTC coefficient is approximately -0.32 (Figure 2). We point out these values to stress that, according to the fundamental Shannon theorem, the transmission is not efficient (RMI is small), although at the same time, the activity of neurons described by the NSTC coefficient is relatively well correlated. Figure 2 shows the behaviors of RMI, ρ and the NSTC coefficient.
Finally, we present the situation (18) in which one neuron does not change its activity with α and the activity of the other neuron increases with α. Additionally, in contrast to the first case, the second neuron changes its activity only when the first neuron is active. In this case, the communication channel C(α) is given by [...] and the information source probabilities are $p^X_0 = \frac{9}{10}$ and $p^X_1 = \frac{1}{10}$ for 0 < α < 1/20. It turns out that the NSTC coefficient increases linearly from large negative values below -0.4 to a positive value of 0.1. Simultaneously, ρ is practically zero and RMI is small (below 0.1) but varies in a nonmonotonic way, having a noticeable minimum (Figure 3). Moreover, observe that for small α the RMI (equal to 0.1) is visibly larger than zero, which suggests that the communication efficiency is relatively good, while at the same time the Pearson correlation coefficient ρ (equal to -0.03) is very close to zero, indicating that the input and output signals are almost uncorrelated (independent for binary channels). This suggests that these measures describe different qualitative properties. Figure 3 shows the behaviors of RMI, ρ and the NSTC coefficient.
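The recipe used for each family (build the joint matrix for a given α, evaluate RMI, ρ and NSTC, then compare their trends) can be sketched on a made-up example. The family `hypothetical_family` below is an assumption introduced purely for illustration; it is not one of the channels studied in this paper and its numbers say nothing about Figures 1-3. Base-2 logarithms and the convention `M[j][i]` $= p(X = i \wedge Y = j)$ are also assumed.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def measures(M):
    """Return (RMI, rho, NSTC) for a 2x2 joint matrix M[j][i]."""
    (p00, p01), (p10, p11) = M
    px0, px1 = p00 + p10, p01 + p11
    py0, py1 = p00 + p01, p10 + p11
    h_x, h_y = entropy([px0, px1]), entropy([py0, py1])
    h_xy = entropy([p00, p01, p10, p11])
    cov = p11 - px1 * py1
    rmi = (h_x + h_y - h_xy) / ((h_x + h_y) / 2)
    rho = cov / math.sqrt(px0 * px1 * py0 * py1)
    nstc = cov / (px1 * py1)
    return rmi, rho, nstc

def hypothetical_family(alpha):
    """Illustrative joint matrix: input firing rate 0.3 + alpha, output rate 0.3."""
    return [[0.6 - alpha, 0.1 + alpha],
            [0.1, 0.2]]

alphas = [k / 20 for k in range(10)]          # 0.00, 0.05, ..., 0.45
table = [(a, *measures(hypothetical_family(a))) for a in alphas]
for a, rmi, rho, nstc in table:
    print(f"alpha={a:.2f}  RMI={rmi:.4f}  rho={rho:+.4f}  NSTC={nstc:+.4f}")
```

Because ρ and NSTC share the numerator $p_{11} - p^X_1 p^Y_1$, they always have the same sign; they differ only in normalization, whereas RMI is nonnegative by construction and carries no sign information.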

Conclusions
To summarize, we show that the straightforward intuitive approach of estimating the quality of communication channels according to only correlations between input and output signals is often ineffective. In other words, we refute the intuitive hypothesis that the more the input and output signals are correlated, the more efficient the transmission is (i.e., the more effective a decoding scheme can be found). This intuition could be supported by two facts: 1. for uncorrelated binary variables (ρ(X, Y) = 0), which are shown in the Appendix to be independent, one has RMI = 0; 2. for fully correlated random variables (|ρ(X, Y)| = 1), which are linearly dependent, one has RMI = 1. We introduce a few communication channels for which the correlation coefficients behave completely differently from the mutual information, which shows that this intuition is erroneous.
In particular, we present the realizations of channels characterized by high mutual information for input and output signals but at the same time featuring very low correlation between these signals. On the other hand, we find channels featuring quite the opposite behavior; i.e., having very high correlation between input and output signals while the mutual information turns out to be very low. This is because the mutual information, which in fact is a crucial parameter characterizing neuronal encoding, takes into account structures (patterns) of the signals and not only their statistical properties, described by firing rates. Our research shows that neuronal encoding has a much more complicated nature that cannot be captured by straightforward correlations between input and output signals.
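
Appendix
Theorem 1. Let X and Y be random variables that each take only two values, with joint probability matrix M. The probability distributions of random variables X and Y are given by the marginal sums of M, as in (3). Adopting this notation, the condition ρ(X, Y) = 0 implies that random variables X and Y are independent.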
To prove Theorem 1, we first show the following particular case for binary random variables.
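Lemma 1. Let $X_1$ and $Y_1$ be binary random variables taking the values 0 and 1, with joint probability matrix $M_1$ of the form (2). The probability distributions $p^{X_1}_i$ and $p^{Y_1}_j$ of these binary random variables are given by
$$p^{X_1}_i = p_{0i} + p_{1i}, \qquad p^{Y_1}_j = p_{j0} + p_{j1}, \qquad i, j = 0, 1.$$
Adopting this notation, $\rho(X_1, Y_1) = 0$ implies that $X_1$ and $Y_1$ are independent.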
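Proof. Since $E(X_1 Y_1) = p_{11}$, $E(X_1) = p_{01} + p_{11}$ and $E(Y_1) = p_{10} + p_{11}$, the condition $\rho(X_1, Y_1) = 0$ means that the covariance of $X_1$ and $Y_1$ vanishes. Thus, we have $p_{11} - (p_{01} + p_{11})(p_{10} + p_{11}) = 0$; i.e., $p_{11}$ is factorized: $p_{11} = p^{X_1}_1 \cdot p^{Y_1}_1$. To prove the independence of $X_1$ and $Y_1$, we have to show that $p_{00} = p^{X_1}_0 \cdot p^{Y_1}_0$, $p_{01} = p^{X_1}_1 \cdot p^{Y_1}_0$ and $p_{10} = p^{X_1}_0 \cdot p^{Y_1}_1$. We prove the first and second equality, and the third equality can be proven analogously. For the second equality, $p_{01} = p^{X_1}_1 - p_{11} = p^{X_1}_1 (1 - p^{Y_1}_1) = p^{X_1}_1 \cdot p^{Y_1}_0$. For the first equality, $p_{00} = p^{Y_1}_0 - p_{01} = p^{Y_1}_0 (1 - p^{X_1}_1) = p^{X_1}_0 \cdot p^{Y_1}_0$.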
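The statement of Lemma 1 can also be checked exhaustively on a finite grid of joint distributions. The sketch below is a numerical sanity check, not a substitute for the proof: it enumerates all binary joint matrices whose entries are multiples of 1/n (the grid size n = 20 and the function name are arbitrary choices made here), keeps those with vanishing covariance, and verifies that every cell factorizes. Exact fractions avoid rounding issues.

```python
from fractions import Fraction
from itertools import product

def check_lemma_on_grid(n=20):
    """Verify factorization for every zero-covariance matrix with entries k/n.

    Returns the number of zero-covariance matrices that were examined.
    """
    checked = 0
    for a, b, c in product(range(n + 1), repeat=3):
        d = n - a - b - c
        if d < 0:
            continue
        p00, p01, p10, p11 = (Fraction(k, n) for k in (a, b, c, d))
        p_x = (p00 + p10, p01 + p11)     # distribution of X_1
        p_y = (p00 + p01, p10 + p11)     # distribution of Y_1
        if p11 != p_x[1] * p_y[1]:       # covariance is not zero: skip
            continue
        checked += 1
        assert p00 == p_x[0] * p_y[0]
        assert p01 == p_x[1] * p_y[0]
        assert p10 == p_x[0] * p_y[1]
    return checked

n_checked = check_lemma_on_grid()
```

The check is phrased in terms of the covariance rather than ρ so that degenerate matrices with a constant signal, for which ρ is undefined, are covered as well.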
To generalize Lemma 1, we consider the following.
Lemma 2. Assuming the notation of Lemma 1, let us define the random variables $X := (b_x - a_x) X_1 + a_x$ and $Y := (b_y - a_y) Y_1 + a_y$.
Under these assumptions, ρ(X, Y) = 0 implies that X and Y are independent. In other words, divalent, uncorrelated random variables have to be independent.
Proof. The proof is straightforward and follows directly (by the linearity of the average value) from the definition of the correlation coefficient (10) and from the fact that the joint probability matrices $M_1$ for $X_1$ and $Y_1$ and $M$ for X and Y are formally the same. Indeed, by linearity the covariance of X and Y equals $(b_x - a_x)(b_y - a_y)$ times the covariance of $X_1$ and $Y_1$, so for two distinct values of each variable ρ(X, Y) = 0 forces $\rho(X_1, Y_1) = 0$. Since by Lemma 1 the random variables $X_1$ and $Y_1$ are then independent, the random variables X and Y must also be independent.
Finally, observe that X takes only the values $a_x$, $b_x$ and Y takes only the values $a_y$, $b_y$. Therefore, Theorem 1 follows immediately from Lemma 2.
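Two points of the Appendix can be illustrated numerically: the scaling of the covariance under the affine change of values used in Lemma 2, and the fact that the implication fails once a variable takes three values. Everything in the sketch below is hypothetical: the helper names, the values $a_x = -2$, $b_x = 5$, $a_y = 1$, $b_y = 4$ and the three-valued example (X uniform on {−1, 0, 1}, Y = X²) are choices made here for illustration. Distributions are stored as dictionaries mapping a pair (x, y) to its probability.

```python
from fractions import Fraction as F

def covariance(dist):
    """E[XY] - E[X]E[Y] for a distribution given as {(x, y): probability}."""
    ex = sum(x * p for (x, y), p in dist.items())
    ey = sum(y * p for (x, y), p in dist.items())
    exy = sum(x * y * p for (x, y), p in dist.items())
    return exy - ex * ey

def is_independent(dist):
    """True if every joint probability equals the product of its marginals."""
    xs = {x for x, _ in dist}
    ys = {y for _, y in dist}
    p_x = {x: sum(p for (u, _), p in dist.items() if u == x) for x in xs}
    p_y = {y: sum(p for (_, v), p in dist.items() if v == y) for y in ys}
    return all(dist.get((x, y), F(0)) == p_x[x] * p_y[y] for x in xs for y in ys)

def rescale(dist, ax, bx, ay, by):
    """Map binary values 0/1 to ax/bx and ay/by, keeping the probabilities."""
    return {((bx - ax) * x + ax, (by - ay) * y + ay): p for (x, y), p in dist.items()}

binary = {(0, 0): F(1, 2), (1, 0): F(1, 10), (0, 1): F(1, 10), (1, 1): F(3, 10)}
two_valued = rescale(binary, ax=F(-2), bx=F(5), ay=F(1), by=F(4))

# Three-valued counterexample: X uniform on {-1, 0, 1} and Y = X**2.
three_valued = {(-1, 1): F(1, 3), (0, 0): F(1, 3), (1, 1): F(1, 3)}

scale = (F(5) - F(-2)) * (F(4) - F(1))   # (b_x - a_x)(b_y - a_y) = 21
```

Here `covariance(two_valued)` equals `scale * covariance(binary)`, so the two covariances vanish together, while the three-valued pair has zero covariance although Y is a function of X.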