Maximizing local information transfer in Boolean networks

We study a Boolean network model in which the rules governing the time evolution of states are not given a priori but emerge from a process of local information transfer maximization and are stabilized if possible. We mathematically derive the class of rules that can be stabilized. In the presence of small noise, the stabilized rules are those whose output depends on a unique input. We confirm the prediction of the theory by numerical simulation. We argue that the stabilized rules have generic properties of real-world gene regulatory networks, being both critical and highly canalized.


Introduction
Quantifying information flow between agents in a complex system is an important step in understanding its structure and behavior. For this purpose, information-theoretic quantities such as mutual information [1] and transfer entropy [2] have been applied to analyzing living, cognitive, economic, and social systems [3]. The relation between the optimal information processing capability of a system and its functional advantages has also been studied in model living and cognitive systems [4,5].
The relation between the dynamics of a system and information-theoretic quantities averaged over the whole system has been investigated in the literature. In particular, many authors have asked whether information-theoretic quantities are maximized at criticality. For example, Luque and Ferrera [6] showed, using mean field theory, that the averaged mutual information between the past and future states of each agent in a random Boolean network is maximized at criticality. Ribeiro et al [7] proved that the pairwise mutual information averaged over all pairs of agents in a random Boolean network is maximized at criticality in the thermodynamic limit. However, Barnett et al [8] showed that the global transfer entropy, measuring the average information transfer from all agents in the system to a single agent, peaks in the disordered phase in a two-dimensional Ising model. Lizier et al [9] found by numerical simulation that the balance between information transfer and information storage is optimized close to criticality. Lizier et al [10][11][12] further proposed a framework to localize information-theoretic measures at each spatio-temporal point in the dynamics of complex systems. In particular, using local transfer entropy, they revealed that traveling coherent structures in the spatio-temporal dynamics of cellular automata carry most of the transmitted information [10].
In this paper, we ask the converse question: What kind of dynamics will emerge when local information transfer to an agent in a system is maximized at each time step? We formulate this question in Boolean networks, which are simple models of gene regulatory networks [13,14]. Such models have also been used as a test bed for discussing the relation between information and dynamics, as we have reviewed above. One of the authors recently proposed an adaptive Boolean network model in which rewiring of links based on local information transfer leads to near critical dynamics [15]. The basic idea is to use the fact that local information-theoretic quantities can change their sign. Each agent attempts to balance the stability of information processing (1) by eliminating links with negative local mutual information, which carry misleading information transfer [10], and (2) by acquiring new links randomly when there are no such links. However, in that model, the ensemble of rules is fixed and the dynamical property of the whole system is varied via changes of links on a longer time scale than that of the state dynamics. Here, we consider a Boolean network model in which the rules themselves can change on the same time scale as the state dynamics.
Some authors have used transfer entropy as a fitness function for the evolution of systems' dynamics. Lizier et al [16] found that coherent traveling waves of information transfer are observed when motion of snake-like modular robots is evolved to maximize the average transfer entropy along components from the tail to the head. Obst et al [17] showed that the learning performance of recurrent neural networks is improved when the parameters of their dynamics are adjusted so that the transfer entropy is optimized at the individual unit level. In contrast with these previous works, local information transfer directly acts on the state dynamics in our approach.
It may appear that the question we pose here is similar to the one addressed by Infomax in artificial neural networks [18]. Infomax seeks the optimal function between input and output by maximizing the average mutual information subject to given external constraints. In contrast, the local information transfer maximization (LITM) proposed in this paper has no external constraints. It turns out that this construction admits a Bayesian interpretation, with a likelihood function depending on the state dynamics that can be seen as an internal constraint.
This paper is organized as follows. In section 2, we present the Boolean network model in which the rule of each agent at each time step is determined by LITM. In section 3, we mathematically derive the stabilized rules and confirm them by numerical simulation. In section 4, we discuss properties of the stabilized rules and links to real-world gene regulatory networks. Finally, in section 5, we summarize the result of this paper and indicate its relation to existing approaches.

Model
Let us consider N agents that can each take values 0 and 1 for their state. Each agent i receives K inputs from other agents j_1, j_2, \ldots, j_K, which are chosen randomly. The state of the agent evolves according to these inputs. Unlike in usual Boolean networks [19], here the time evolution rule of state for each agent is not given a priori. Rather, we hypothesize that the change of state is based on the ability of an agent to maximize local information transfer toward it from its inputs. Thus, local information transfer influences the change of state in an optimal way in our model. Let x_{i,t} be the state of agent i at time step t and let x_{j_1,t}, x_{j_2,t}, \ldots, x_{j_K,t} be the states of its inputs. The state x_{i,t+1} is determined as the state y which maximizes the sum of the local mutual information between the states of i and j_k. Namely,

x_{i,t+1} = \arg\max_{y \in \{0,1\}} \sum_{k=1}^{K} i(y : x_{j_k,t}), \qquad (1)

i(y : x_{j_k,t}) = \log_2 \frac{p_i(y \mid x_{j_k,t})}{p_i(y)}. \qquad (2)

Here, p_i(y) is the probability that the state of i at time step t+1 is y, and p_i(y \mid x_{j_k,t}) is the conditional probability that the state of i at time step t+1 is y, given that the state of j_k at time step t is x_{j_k,t}. In our numerical simulation below, these probabilities at a specific time step are calculated from the frequencies of states and pairs of states over the past W steps. The states in the first W time steps (t = -W+1, \ldots, -1, 0) are prepared as follows: the initial states and the time evolution rules of all agents are given at random, and the states are updated by these rules until t = 0. For t > 0, the state of i is updated according to equation (1).
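As a concrete illustration, the update of equation (1) can be sketched in a few lines of Python. The function below estimates p_i(y) and p_i(y | x_{j_k,t}) from frequency counts over a window of W past (input, next-state) pairs, as described above, and returns the state y maximizing the summed local mutual information. All names (`litm_update`, `window`) are ours, not the paper's; this is a minimal sketch, not the authors' code.

```python
import math

def litm_update(window, current_inputs):
    """One LITM update step (equation (1)): choose the next state y of agent i
    that maximizes the summed local mutual information with its current inputs.
    `window` is a list of (input_tuple, next_state) pairs from the past W steps;
    `current_inputs` is the tuple of the K current input states."""
    W = len(window)
    K = len(current_inputs)
    best_y, best_score = 0, -math.inf
    for y in (0, 1):
        p_y = sum(1 for _, s in window if s == y) / W
        score = 0.0
        for k in range(K):
            x = current_inputs[k]
            # frequency estimates of p(x_k) and p(y, x_k) over the window
            n_x = sum(1 for inp, _ in window if inp[k] == x)
            n_xy = sum(1 for inp, s in window if inp[k] == x and s == y)
            if n_x == 0 or n_xy == 0:
                score = -math.inf  # the paper's convention: log2(0) = -infinity
                break
            score += math.log2((n_xy / n_x) / p_y)  # local mutual information
        if score > best_score:
            best_y, best_score = y, score
    return best_y
```

For instance, if agent i has simply been copying its first input over the window, the update keeps doing so.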
We could use transfer entropy instead of mutual information. Indeed, transfer entropy has been accepted as a more appropriate measure for evaluating the magnitude of the impact of one dynamical variable on another [3]. However, we here use mutual information as an approximation for measuring information transfer, to make the mathematical argument feasible.
We also note that the underlying network becomes locally tree-like when N is large, since the links between the agents are formed at random. This implies that the inputs to an agent are almost uncorrelated. The agreement between the theory developed below and numerical simulation depends largely on this assumption.
Interestingly, equation (1) has a Bayesian interpretation. Specifically, the summed local mutual information in (1) can be expressed, via (2), as

\sum_{k=1}^{K} i(y : x_{j_k,t}) = \log_2 \frac{L(x_{j_1,t}, \ldots, x_{j_K,t} \mid y)}{\prod_{k=1}^{K} p(x_{j_k,t})}, \qquad L(x_{j_1,t}, \ldots, x_{j_K,t} \mid y) = \prod_{k=1}^{K} p_i(x_{j_k,t} \mid y). \qquad (3)

Since the denominator does not depend on y, each agent in our model performs a maximum likelihood estimation with its inputs as data, L as a likelihood function, and an indeterminate future state as a hypothesis. The time evolution of states can be interpreted as a transformation of states-as-hypothesis into states-as-data. We also note that the likelihood function depends on past states and hence is subject to change over time, which is reminiscent of inverse Bayesian inference [20,21]. If we fix p_i and p_{i|j_k} and vary the inputs x_{j_1,t}, \ldots, x_{j_K,t} in (1) and (2), we can regard (1) as a rule outputting x_{i,t+1} at each time step t. In general, this rule changes from one time step to the next; however, some rules could persist. Here, we examine which rules can be stabilized, and introduce the following condition for that purpose.
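The equivalence between maximizing summed local mutual information and maximum likelihood estimation can be checked numerically. The sketch below draws random tables p(y) and p(x_k | y), obtains p(y | x_k) by Bayes' theorem, and verifies that the two argmax criteria agree for every input configuration. The function name and the random tables are illustrative assumptions; inputs are taken conditionally independent given the output, matching the locally tree-like setting.

```python
import math
import random

rng = random.Random(1)

def check_equivalence(K):
    """Verify that the LITM choice (argmax of summed local mutual information)
    coincides with the maximum likelihood choice, for random probability tables."""
    p1 = rng.uniform(0.1, 0.9)
    prior = {1: p1, 0: 1 - p1}
    # p(x_k = 1 | y): conditional probability tables, drawn at random
    cond = {(k, y): rng.uniform(0.1, 0.9) for k in range(K) for y in (0, 1)}
    def p_x_given_y(k, x, y):
        return cond[k, y] if x == 1 else 1 - cond[k, y]
    def p_x(k, x):  # marginal probability that input k takes value x
        return sum(prior[y] * p_x_given_y(k, x, y) for y in (0, 1))
    for bits in range(2 ** K):
        x = [(bits >> k) & 1 for k in range(K)]
        # p(y | x_k) obtained from Bayes' theorem
        def p_y_given_x(y, k):
            return p_x_given_y(k, x[k], y) * prior[y] / p_x(k, x[k])
        lmi = {y: sum(math.log2(p_y_given_x(y, k) / prior[y]) for k in range(K))
               for y in (0, 1)}
        lik = {y: math.prod(p_x_given_y(k, x[k], y) for k in range(K))
               for y in (0, 1)}
        if max(lmi, key=lmi.get) != max(lik, key=lik.get):
            return False
    return True
```

The check passes because the two objectives differ only by a term independent of y.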
We say that a rule f : \{0,1\}^K \to \{0,1\} satisfies the LITM condition when, for every input x_{1:K} = (x_1, \ldots, x_K),

f(x_{1:K}) = \arg\max_{y \in \{0,1\}} \sum_{k=1}^{K} \log_2 \frac{p_{i|j_k}(y \mid x_k)}{p_i(y)}.

Here, q_{j_k} is the probability that the state of input j_k is 1. The values of p_i and p_{i|j_l} are given by

p_i(y) = \sum_{x_{1:K} : f(x_{1:K}) = y} \prod_{k=1}^{K} q_{j_k}^{x_k} (1 - q_{j_k})^{1 - x_k},

p_{i|j_l}(y \mid x) = \frac{1}{q_{j_l}^{x} (1 - q_{j_l})^{1 - x}} \sum_{x_{1:K} : f(x_{1:K}) = y, \, x_l = x} \prod_{k=1}^{K} q_{j_k}^{x_k} (1 - q_{j_k})^{1 - x_k},

respectively. We have used the notation q_{j_k} to make it explicit that the LITM condition is a condition on an isolated single rule and does not depend on the outputs of the other agents.
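A minimal sketch of the LITM condition, assuming independent inputs with probabilities q_{j_k} as in the definition above (all function names are ours):

```python
import math
from itertools import product

def satisfies_litm(f, q):
    """Check the LITM condition for a single rule f: {0,1}^K -> {0,1} at
    given input probabilities q = (q_1, ..., q_K), treating the inputs as
    independent. f is a dict from input tuples to 0/1."""
    K = len(q)
    configs = list(product((0, 1), repeat=K))
    def prob(x):  # probability of input configuration x
        p = 1.0
        for k in range(K):
            p *= q[k] if x[k] == 1 else 1 - q[k]
        return p
    # p_i(y): distribution of the output under the input distribution
    p_out = {y: sum(prob(x) for x in configs if f[x] == y) for y in (0, 1)}
    def p_cond(y, l, v):  # p_{i|j_l}(y | v)
        num = sum(prob(x) for x in configs if f[x] == y and x[l] == v)
        return num / (q[l] if v == 1 else 1 - q[l])
    def term(y, l, v):  # one local mutual information summand
        pc = p_cond(y, l, v)
        return -math.inf if pc == 0.0 else math.log2(pc / p_out[y])
    for x in configs:
        score = {y: sum(term(y, l, x[l]) for l in range(K)) for y in (0, 1)}
        if score[f[x]] <= score[1 - f[x]]:
            return False
    return True

AND = {x: int(x == (1, 1)) for x in product((0, 1), repeat=2)}  # a class 2 rule
XOR = {x: x[0] ^ x[1] for x in product((0, 1), repeat=2)}       # in no class
```

At q = (0.3, 0.6), for example, the AND rule satisfies the condition, while XOR, whose 1-preimage is not a face of the hypercube, violates it.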

Result
We find that a necessary and sufficient condition for a rule f to satisfy the LITM condition is as follows (theorem 1 in the appendix): there exist an integer l with 0 \le l \le K, inputs j_{k_1}, \ldots, j_{k_l}, values z_1, \ldots, z_l \in \{0,1\}, and an output y \in \{0,1\} such that f(x_{1:K}) = y if x_{k_1} = z_1, \ldots, x_{k_l} = z_l, and f(x_{1:K}) = 1 - y otherwise. A rule f satisfying this condition is said to belong to class l. For a class l rule f, there exist l specific inputs j_{k_1}, \ldots, j_{k_l} that control the output: if their states are equal to the specific values z_1, \ldots, z_l, then the output of f is the specific value y, and is otherwise 1 - y, regardless of the values of the other inputs. If we express a rule as a K-dimensional Ising hypercube [22], then a rule belonging to class l corresponds to one in which all the vertices in a single (K - l)-dimensional face are colored with one color and the remaining vertices are colored with the other color (figure 1). The number C_{K,l} of class l rules follows from an elementary combinatorial argument; table 1 shows the values of C_{K,l} for K \le 4. Since the total number of rules with K inputs is 2^{2^K}, the proportion of class l rules approaches 0 as K becomes large.

We examined by numerical simulation whether class l rules can indeed be stabilized. We applied noise by flipping each output with probability \epsilon > 0. The motivation for introducing noise is to force agents to explore the space of input probabilities (q_{j_1}, \ldots, q_{j_K}) and thus to make the LITM condition effective. Another reason is that it is natural to consider noise when treating Boolean networks as a model of gene regulatory networks [13,14], since gene expression is known to be stochastic [23]. For class l rules with l \ge 2, stabilization relies on an exact equality of local information transfer terms, i(y : x_{j_k,t}) = 0 for some inputs; however, this equality is numerically susceptible to noise. Thus, class 1 rules are expected to work as absorbing rules for sufficiently small noise: the rule of every agent eventually settles to one of the class 1 rules.
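The class structure can be checked by brute force for small K. The sketch below (our code, not the authors') classifies a rule by testing whether the preimage of one output value is a face of the hypercube, and enumerates all 256 rules for K = 3; the resulting counts (2 class 0 rules, 6 class 1, 24 class 2, 16 class 3, and 208 rules in no class) illustrate how small the fraction of class l rules already is.

```python
from itertools import product

def rule_class(f, K):
    """Return l if the rule f belongs to class l, else None. A class l rule
    has the preimage of one output value y equal to a (K - l)-dimensional
    face of the Ising hypercube, with 1 - y on all remaining vertices."""
    inputs = list(product((0, 1), repeat=K))
    for y in (0, 1):
        S = [x for x in inputs if f[x] == y]
        if len(S) == len(inputs):
            return 0  # constant rule: class 0
        if not S:
            continue
        # coordinates frozen to a single value over S
        fixed = [k for k in range(K) if len({x[k] for x in S}) == 1]
        # S is exactly the face defined by the frozen coordinates
        if fixed and len(S) == 2 ** (K - len(fixed)):
            return len(fixed)
    return None

# enumerate all 2^(2^3) = 256 rules for K = 3 and tally the classes
K = 3
counts = {}
for bits in product((0, 1), repeat=2 ** K):
    f = dict(zip(product((0, 1), repeat=K), bits))
    c = rule_class(f, K)
    counts[c] = counts.get(c, 0) + 1
```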

Discussion
Class l rules have a high degree of canalization. Canalization is the property that specific values of specific inputs determine the output of a rule regardless of the values of the other inputs. In the context of ontogeny, it enables organisms to explore beneficial combinations of variations while maintaining a robust phenotype by buffering variations in the developmental process [24]. It has been suggested that real-world eukaryotic gene regulatory networks use rules with a high degree of canalization [25,26]. The degree of canalization of a rule with K inputs can be quantified by the quantities p_k (k = 0, 1, \ldots, K - 1), where p_k is the ratio of the number of k-tuples of input values that determine the output regardless of the other inputs to the number of all k-tuples [27]. Table 2 shows p_k of class l rules for K = 3. For K = 2, 3, the p_k of rules that do not belong to any class is smaller than or equal to the p_k of class l rules. For K = 4, there are 96 rules that do not belong to any class whose p_k for some k is greater than that of class 1 and class 2 rules. Thus, it is not necessarily true that class l rules always have a higher degree of canalization than rules that do not belong to any class. Class 1 rules are the least canalizing rules among class l rules. However, they still have a high degree of canalization among all rules. For example, when K = 3, there are 18 rules among all 256 rules whose p_k is greater than that of class 1 rules for some k. These 18 rules belong to either class 0 or class 3 (class 2 rules have the same p_k as class 1 rules; this is also the case for K = 4). When K = 4, there are 194 rules among all 65 536 rules whose p_k is greater than that of class 1 rules for some k. These 194 rules consist of rules belonging to classes 0, 3, and 4, along with the above-mentioned 96 rules that do not belong to any class. In recent years, it has been suggested that the dynamics of real-world gene regulatory networks are near critical [28].
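The measure p_k can be computed directly from its definition. The sketch below (our naming, and our reading of the definition in [27]) counts, for each k, how many choices of k inputs together with their values force the output; for K = 3 the class 1 rule copying one input gives (p_0, p_1, p_2) = (0, 1/3, 2/3), while a class 3 rule such as the three-input AND gives (0, 1/2, 3/4).

```python
from itertools import combinations, product

def canalization_profile(f, K):
    """Compute p_k (k = 0, ..., K-1) for a rule f: {0,1}^K -> {0,1}: the
    fraction of choices of k inputs and their values that force the output
    of f regardless of the remaining inputs."""
    profile = []
    for k in range(K):
        forcing = total = 0
        for idxs in combinations(range(K), k):
            for vals in product((0, 1), repeat=k):
                total += 1
                # outputs of f over all completions of the clamped inputs
                outs = {f[x] for x in product((0, 1), repeat=K)
                        if all(x[i] == v for i, v in zip(idxs, vals))}
                forcing += (len(outs) == 1)
        profile.append(forcing / total)
    return profile
```

This reproduces the ordering discussed above: the class 3 profile dominates the class 1 profile, consistent with the 18 rules for K = 3 that beat class 1 belonging to classes 0 and 3.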
It has been argued that criticality of dynamics is related to a balance between robustness and evolvability [29] and to an optimal information processing ability [30]. The dynamics of a Boolean network in which each agent uses a class 1 rule is expected to be critical. Indeed, a random Boolean network with class 1 rules is equivalent to one with K = 1 and no constant functions, and it is known that the dynamics of the latter is critical [31]. More quantitatively, the stability of Boolean networks can be determined by measuring the rate of perturbation propagation \lambda [32]. Let d_t be the size of a perturbation at time step t (the fraction of agents whose states are flipped). By plotting d_{t+1} against d_t (the Derrida plot) and computing the slope of the graph at the origin, we can obtain \lambda. The cases \lambda < 1, \lambda > 1, and \lambda = 1 correspond to stable, unstable, and critical dynamics of Boolean networks, respectively. Let f be a class 1 rule. There exists a unique input such that the output of f changes when that input is flipped; the other inputs are irrelevant. Thus, if each input of f is flipped with probability d_t, then its output will also flip with probability d_t. If every agent uses class 1 rules, then d_t will be conserved on average and thus \lambda is expected to be close to 1. We numerically checked the stability of the dynamics of the Boolean networks obtained in figure 2 by constructing Derrida plots.
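The argument above translates directly into a small Monte Carlo estimate of λ. The sketch below builds an annealed ensemble in which every agent's output copies or negates one randomly chosen input (a class 1 rule), flips a fraction d_t of agents, performs one synchronous update, and measures d_{t+1}/d_t; all parameter values are illustrative, not those of the paper's figures.

```python
import random

def derrida_slope(N, d, trials, seed=0):
    """Estimate the Derrida-plot slope d_{t+1}/d_t at perturbation size d for
    a random Boolean network in which every agent uses a class 1 rule:
    its output copies (or negates) one randomly chosen input."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        src = [rng.randrange(N) for _ in range(N)]    # the unique relevant input
        neg = [rng.random() < 0.5 for _ in range(N)]  # copy vs negate it
        state = [rng.randint(0, 1) for _ in range(N)]
        pert = state[:]
        for i in rng.sample(range(N), int(d * N)):    # flip a fraction d of agents
            pert[i] ^= 1
        def step(s):  # one synchronous update of all agents
            return [s[src[i]] ^ neg[i] for i in range(N)]
        d_next = sum(a != b for a, b in zip(step(state), step(pert))) / N
        ratios.append(d_next / d)
    return sum(ratios) / len(ratios)
```

Because each output tracks exactly one input, the perturbation neither grows nor shrinks on average, giving a slope near 1.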

Conclusion
In summary, we showed that maximization of local information transfer in Boolean networks in the presence of noise leads to the emergence of class 1 rules, which are both highly canalized and critical. Bassler et al [27] reported that critical rules with a high degree of canalization emerge from a minority game on Boolean networks. Although class l rules appear with high frequency in their model, other rules also appear with positive probability; thus, the resulting dynamics is different from ours. Their model is an example of extremal dynamics, since the most losing agent changes its rule randomly [33]; hence, there is a global selection of rules performed from outside. Here, on the other hand, rules are determined locally from within [34,35]. Our approach suggests an alternative picture in which the determination process of rules based on LITM yields dynamics with criticality and a high degree of canalization, which are generic properties of real-world gene regulatory networks.
Future work can proceed in two directions. One is to investigate what kind of spatio-temporal structures emerge when LITM is applied to each agent in spatially extended models such as cellular automata. This direction is ongoing and will be reported elsewhere [36]. The other is a generalization of the framework based on the Bayesian interpretation pointed out in section 2. We could consider likelihood functions other than equation (3) and study what kind of dynamics emerges when each agent performs a maximum likelihood estimation similar to the one discussed in this paper for a particular likelihood function. These extensions would clarify the limits and applicability of our approach.
Appendix

Let f : \{0,1\}^K \to \{0,1\} be a Boolean function assigned to an agent i. For convenience, we renumber the N agents so that the inputs to i are 1, 2, \ldots, K and K < i. f is said to satisfy the LITM condition when, for every x_{1:K} \in \{0,1\}^K,

f(x_{1:K}) = \arg\max_{y \in \{0,1\}} \sum_{k=1}^{K} \log_2 \frac{p_{i|j_k}(y \mid x_k)}{p_i(y)}. \qquad (A.1)

The values of p_i and p_{i|j} are given as in section 2.

Theorem 1. f satisfies the LITM condition if and only if there exist an integer l with 0 \le l \le K, inputs k_1 < \cdots < k_l in \{1, \ldots, K\}, values z_1, \ldots, z_l \in \{0,1\}, and y \in \{0,1\} such that f(x_{1:K}) = y if x_{k_1} = z_1, \ldots, x_{k_l} = z_l, and otherwise f(x_{1:K}) = 1 - y. A rule satisfying this condition is said to belong to class l.
Proof. It is straightforward to check that class 0 rules satisfy the condition stated in the theorem by using the convention mentioned in the main text: \log_2 a = -\infty for a \le 0. In the following, we assume that f does not belong to class 0, namely, that f is not a constant function.

Let f be a class l rule for some 1 \le l \le K. Without loss of generality, we can assume that f(x_{1:K}) = 1 if x_1 = \cdots = x_l = 1 and f(x_{1:K}) = 0 otherwise. To prove that the LITM condition holds for f, we have to show that (A.1) selects the output 1 when f(x_{1:K}) = 1 (A.8) and selects the output 0 when f(x_{1:K}) = 0 (A.9); one checks that (A.8) and (A.9) hold for all 0 < q_1, \ldots, q_K < 1.

Conversely, let f be a rule that does not belong to any class l for 0 \le l \le K. We have to show that there exist (q_1, \ldots, q_K) and x_{1:K} such that (A.1) does not hold. Indeed, we will show that (A.1) fails on a positive volume of the space of probabilities (q_1, \ldots, q_K) for some x_{1:K}. Let F be the minimum face of the K-dimensional Ising hypercube corresponding to f such that F includes all vertices x_{1:K} satisfying f(x_{1:K}) = 1. Since f does not belong to any class, F must include at least one vertex z_{1:K} such that f(z_{1:K}) = 0; we also note that there is at least one x_{1:K} \in F such that f(x_{1:K}) = 1. Comparing the two sides of (A.1) at z_{1:K}, the inequality is violated strictly for suitable (q_1, \ldots, q_K); since the inequality is strict and both sides are continuous functions of (q_1, \ldots, q_K), f fails to satisfy (A.1) on a positive volume of the space of probabilities. \Box

Proof (stability under noise). Let f be a class l rule. Without loss of generality, we can assume that f(x_{1:K}) = 1 if x_1 = \cdots = x_l = 1 and f(x_{1:K}) = 0 otherwise. Let p_{i,\epsilon}(y) be the probability that the output of f is y, and p_{i,\epsilon}(y \mid z) the conditional probability that the output of f is y given that the jth input is z, under the presence of noise of strength \epsilon. First, let us consider the case l = 1: the difference between the two sides of (A.1) remains strictly positive if \epsilon > 0 is sufficiently close to 0, and since the inequality is strict and both sides are continuous functions of (q_1, \ldots, q_K), class 1 rules satisfy (A.1) in the presence of small noise. For l \ge 2, on the other hand, the same comparison at z_{1:K} shows that f does not satisfy (A.1) for a positive volume of the space of probabilities (q_1, \ldots, q_K). \Box

ORCID iDs

K Nakajima https://orcid.org/0000-0001-5589-4054