Spatiotemporal dynamics driven by the maximization of local information transfer

In this paper, a generic type of spatially extended system, driven by the maximization of information transfer at each spatiotemporal point, is proposed. As an expression of information transfer, transfer entropy is adopted, and a one-dimensional cellular system, whose state transitions are governed so as to maximize the local transfer entropy (LTE) from interacting cells, is introduced. We first show that this system's state transition mechanism is equivalent to a certain class of cellular automata rules with memory. The spatiotemporal dynamics of the system is then shown to generate a wide variety of patterns, including spatiotemporal intermittency, depending on the length of memory. Furthermore, the spatiotemporal patterns of states and the resulting information dynamics are statistically characterized in detail, expressing the system's diverse nature. In particular, we find that, for a certain range of limited memory, even though each cell is driven to maximize the LTE, the system as a whole cannot approach the theoretical maximum value at all: through its intrinsic properties, the system dynamically bounds its own limit.


Introduction
Information theory [1] has been a useful tool for revealing aspects that are fundamental to understanding natural systems. Starting from recent advancements in information thermodynamics [2], information-theoretic metrics are not only useful for characterizing physical systems but are also effective for understanding biochemical systems [3, 4] and complex systems [5, 6] in terms of the quantification, storage, and transmission of information. Furthermore, these measures based on information theory not only characterize the complexity and interaction modality of a system but are also frequently used to constitute a principle for optimizing the system in an unsupervised manner (e.g. [7-9]). In this respect, it is important to understand how a system alters if certain information content is modulated.
In this study, we focus on the information transfer between elements using the measure called transfer entropy [10]. Given two processes, transfer entropy measures how the uncertainty in predicting one process, given its own history, is further reduced by knowing the history of the other process (as detailed in the next section) [11]. Transfer entropy is intrinsically a measure of directed relationships, expressed as a conditional mutual information [12]. Due to its directed nature, it is often used to quantify the interaction modality and couplings of complex systems [13]. For example, Barnett et al investigated a two-dimensional Ising model and showed that the global transfer entropy, which measures the average information transfer from all elements in the system to a single element, peaks in the disordered phase [14]. Borriello et al analyzed the intrinsic complexity of elementary cellular automata rules in terms of transfer entropy-based classifications [15]. Lizier et al proposed a framework to localize information-theoretic measures at each spatiotemporal point in the dynamics of complex systems [16]. In particular, based on this framework and using the local transfer entropy (LTE), they showed that traveling agents (called gliders) predominantly carry and transmit information in the spatiotemporal dynamics of cellular automata [17]. Such quantifications have included not only simulated systems but, in recent years, also embodied physical robots (e.g. [18, 19]).
In this context, information maximization (such as transfer entropy maximization) represents another line of research that has gained popularity as a method for optimizing information processing networks or controllers in an unsupervised manner (e.g. [16, 20-23]). This approach is frequently called guided self-organization [22]. In spite of the increasing popularity and success of this approach in the literature, very little is known about the behaviors that transfer entropy maximization intrinsically demonstrates. The basic motivation for conducting transfer entropy maximization in these studies is to generate coordinated behaviors within the system (e.g. [16, 20]) or to adapt the system to the external environment (e.g. [21-23]), both of which originate from the interpretation that its maximization will reduce the ambiguity, or increase the predictability, of the interacting systems. On this point, it has recently been reported that transfer entropy, as a measure of information transfer, involves a specific constraint that overestimates flow or underestimates influence in certain situations and fails to quantify the effects of polyadic dependencies [24]. This example clearly points to the need for a study of how the process of transfer entropy maximization itself, which is highly nonlinear and produces non-trivial responses, affects the resulting complex dynamics of a system. Based on this consideration, this study aims to systematically investigate the types of behavior that emerge when transfer entropy maximization at each spatiotemporal point is implemented to drive a system. We propose a simple spatiotemporal system driven by LTE maximization among interacting elements, aiming to understand the effect of LTE maximization on spatiotemporal dynamics in general. We tried to keep the model as generic as possible and not to include specific external constraints (e.g. the setting of a network to be trained). We assumed only multiple data series with spatial configurations as a test bed; this was done to investigate what happens if the information transfer at each spatiotemporal point is maximized in spatially extended systems in general.
This paper is organized as follows. In section 2, we introduce the preliminaries of information-theoretic measures. In section 3, we explain our model setting and investigate its transition mechanism in detail. In section 4, the behavior of the spatiotemporal dynamics of the system is systematically analyzed, and its information transfer properties are statistically characterized. Finally, in section 5, we discuss the implications of the results and future extension scenarios for applications.

Information-theoretic measures
In this section, we introduce the information-theoretic measures used in this paper. The Shannon entropy [1] is one of the basic quantities in information theory, defining the uncertainty associated with the state x of a random variable X as

H_X = -\sum_x p(x) \log p(x),    (1)

where p(x) is the probability distribution of X. The base of the logarithm is taken as 2 throughout this paper, which defines the unit as bits. The mutual information between two processes X and Y measures the mutual dependence between the two, expressed as follows:

M_{XY} = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x) p(y)},    (2)

where p(x, y) is the joint probability distribution of the processes X and Y [1]. For statistically independent distributions satisfying p(x, y) = p(x)p(y), we have M_{XY} = 0; if statistical dependencies exist, M_{XY} > 0. Mutual information is a fundamental measure in information theory and is used to evaluate an association between two or more processes, naturally encompassing both linear and nonlinear dependencies. However, as can be seen from equation (2), it is intrinsically symmetric under the exchange of the two processes X and Y, which means that it does not contain any directional information. Transfer entropy, proposed by Schreiber [10], is a measure that addresses both directional and dynamical relations between two processes. It measures the reduction in uncertainty in inferring the future state of the responding system (X) obtained by knowing the past state of the driving system (Y):

TE_{Y \to X} = \sum p(x_{t+1}, x_t^{(m)}, y_t^{(n)}) \log \frac{p(x_{t+1} \mid x_t^{(m)}, y_t^{(n)})}{p(x_{t+1} \mid x_t^{(m)})},    (3)

where x_t^{(m)} = (x_t, x_{t-\tau}, \ldots, x_{t-(m-1)\tau}) and y_t^{(n)} denote the length-m and length-n embedding vectors of the two processes.
Here, the index of TE_{Y \to X} indicates the influence of Y on X. In brief, transfer entropy measures the degree to which the history of Y disambiguates the future of X beyond the degree to which X is already disambiguated by its own history [11]. It is non-negative, and any information transfer between the two variables results in TE_{Y \to X} > 0. If the state y_t^{(n)} has no influence on the transition probabilities from x_t^{(m)} to x_{t+1}, or if the two time series are completely synchronized, then TE_{Y \to X} = 0. We would like to note that, from the perspective of statistical hypothesis testing, the log-likelihood ratio test statistic for the null hypothesis of zero transfer entropy has been shown to be a consistent estimator of the transfer entropy itself [11].
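As a concrete illustration, the entropy and mutual information above can be estimated directly from observed samples by replacing each probability with an observation frequency. The following sketch (the function names are ours, for illustration only) computes both quantities for discrete series:

```python
import math
from collections import Counter

def shannon_entropy(xs):
    """H(X) = -sum_x p(x) log2 p(x), with p(x) estimated as a frequency."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """M_XY = sum_{x,y} p(x,y) log2 [p(x,y) / (p(x) p(y))]."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# A fair coin carries 1 bit of entropy; a copied series shares 1 bit of MI,
# while a constant series shares none.
xs = [0, 1] * 50
assert abs(shannon_entropy(xs) - 1.0) < 1e-9
assert abs(mutual_information(xs, xs) - 1.0) < 1e-9
assert mutual_information(xs, [0] * 100) == 0.0
```

Note that the MI of two independent series estimated this way is only approximately zero for finite samples; exact zeros arise here because one series is constant.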
Lizier et al introduced the concept of local information-theoretic quantities [16], including the measure LTE [17]. This measure reveals a transfer entropy profile at each spatiotemporal point by directly corresponding the measure to the observed time series data, which is especially useful for monitoring the interaction modality of the local dynamics from the information-theoretic point of view. The concept is motivated by the fact that, in calculating an information-theoretic measure from experimental data, the associated probability p(x) is operationally equivalent to the ratio of the count N_x of observations of a state (or joint state) to the total number of observations O made. Then, the transfer entropy can be expressed as follows:

TE_{Y \to X} = \sum_{(x_{t+1}, x_t^{(m)}, y_t^{(n)})} \frac{N_{(x_{t+1}, x_t^{(m)}, y_t^{(n)})}}{O} \log \frac{p(x_{t+1} \mid x_t^{(m)}, y_t^{(n)})}{p(x_{t+1} \mid x_t^{(m)})}.    (4)

By considering that a double sum, running over each observation a of each possible tuple (x_{t+1}, x_t^{(m)}, y_t^{(n)}), can be expressed as a single sum over all O observations, we obtain the following:

TE_{Y \to X} = \frac{1}{O} \sum_{a=1}^{O} \log \frac{p(x_{t+1} \mid x_t^{(m)}, y_t^{(n)})}{p(x_{t+1} \mid x_t^{(m)})}.    (5)

Thus, we can write the transfer entropy as the global average over the LTE, te_{Y \to X, t}, defined as

TE_{Y \to X} = \langle te_{Y \to X, t} \rangle,    (6)

te_{Y \to X, t} = \log \frac{p(x_{t+1} \mid x_t^{(m)}, y_t^{(n)})}{p(x_{t+1} \mid x_t^{(m)})},    (7)

where ⟨X⟩ denotes the temporal average of X. Note that the LTE can take a negative value, which means that the sender is misleading about the prediction of the receiver's next state [17].
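The localization above amounts to evaluating, at every timestep, the log-ratio of the two conditional frequencies estimated over the whole series, and averaging these local values to recover the transfer entropy. A minimal sketch (function and variable names are ours, not from the paper):

```python
import math
from collections import Counter

def local_transfer_entropy(src, dst, m=2, n=2):
    """Local TE values te(t) = log2 [p(x_{t+1}|x_t^m, y_t^n) / p(x_{t+1}|x_t^m)],
    with probabilities estimated as observation frequencies over the series."""
    T = len(dst)
    k = max(m, n)
    # collect joint observations (x_{t+1}, x_t^m, y_t^n)
    triples = []
    for t in range(k - 1, T - 1):
        xm = tuple(dst[t - m + 1:t + 1])
        yn = tuple(src[t - n + 1:t + 1])
        triples.append((dst[t + 1], xm, yn))
    c_xyz = Counter(triples)
    c_xz = Counter((x1, xm) for x1, xm, _ in triples)
    c_z = Counter(xm for _, xm, _ in triples)
    c_yz = Counter((xm, yn) for _, xm, yn in triples)
    tes = []
    for x1, xm, yn in triples:
        p_cond_joint = c_xyz[(x1, xm, yn)] / c_yz[(xm, yn)]
        p_cond_self = c_xz[(x1, xm)] / c_z[xm]
        tes.append(math.log2(p_cond_joint / p_cond_self))
    return tes

# If dst copies src with one step of delay, the source history fully resolves
# dst's next state, so the average of the local values (the TE) is positive.
src = [0, 1, 1, 0, 1, 0, 0, 1] * 40
dst = [0] + src[:-1]
tes = local_transfer_entropy(src, dst)
assert sum(tes) / len(tes) > 0
```

Averaging the returned list reproduces equation-style TE as the temporal mean of the local values; individual entries may be negative in general.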

Cellular information transfer (CIT) system: formalizations and interpretations
We considered a spatially extended system consisting of N cells, each taking a binary state of either 0 or 1. Each cell i interacts with the 2K neighboring cells {i−K, ..., i−1, i+1, ..., i+K}, and the state of each cell evolves depending on these interactions. We implemented the maximization of local information transfer as the driving force of the state transitions of each cell, which determines the next state so as to maximize the local information transferred from the interacting cells. This setting is motivated by the aim of making each cell maximally driven by the inputs from its interacting cells or, in other words, of making the behavior of each cell maximally predictable from the behaviors of its interacting cells. Let x_{i,t} be the state of cell i at timestep t; the state x_{i,t+1} is determined to maximize the sum of the LTEs [17] from the interacting cells {i−K, ..., i−1, i+1, ..., i+K} to cell i, which is expressed as follows:

x_{i,t+1} = \arg\max_{y \in \{0, 1\}} te_{i,t}(y)  (ties resolved as y = 0),    (8)

where

te_{i,t}(y) = \sum_{k=-K, k \neq 0}^{K} \log \frac{p(y \mid x_{i,t}^{(m)}, x_{i+k,t}^{(n)})}{p(y \mid x_{i,t}^{(m)})}.    (9)

Here, m and n are the lengths of the embedding vectors, expressed such as x_{i,t}^{(m)} = (x_{i,t}, x_{i,t−τ}, ..., x_{i,t−(m−1)τ}), where τ is the embedding delay. Moreover, p(y | x_{i,t}^{(m)}) and p(y | x_{i,t}^{(m)}, x_{i+k,t}^{(n)}) are the conditional probabilities that the state of i at timestep t+1 is y, given the state of i at timestep t, and given the joint state of i and its interacting cell i+k at timestep t, respectively. Note that these probabilities are defined for each cell i, or between cells i and i+k, at timestep t, and they are calculated from the frequencies of joint states over the past W timesteps. We also note that the conditional probabilities in equation (9) are always positive, since the joint state (y, x_{i,t}^{(m)}, x_{i+k,t}^{(n)}) at timestep t is itself counted when calculating them. This way of preparing the probability distribution, on the one hand, avoids the numerical dilemma of encountering a previously unobserved joint state at timestep t; on the other hand, it simultaneously precludes obtaining one consistent joint probability over the system's history, instead generating two different probability distributions according to y=0 and y=1. It is the experimenter's choice to weaken this condition and prepare one consistent joint probability using, for example, the system's history up to timestep t−1. In this case, one directly faces the situation of encountering a previously unobserved joint state at timestep t, making it impossible to naturally decide the next state based on equations (8) and (9). Accordingly, some additional setting becomes inevitable, such as randomly selecting x_{i,t+1} from {0, 1} or assuming beforehand a certain model of the probability distribution that complements the missing piece. Furthermore, in the case of te_{i,t}(0) = te_{i,t}(1), our model always selects y=0 according to equation (8), and this is obviously not the only option; for example, one can randomly select x_{i,t+1} from {0, 1} in those situations. In fact, we briefly tested this alternative, of randomly selecting x_{i,t+1} from {0, 1} when te_{i,t}(0) = te_{i,t}(1), and found that it does not qualitatively alter the behavior of the system within the parameter settings numerically experimented with in this paper. However, this option would be worth remembering when applying the model to parameter settings different from ours.
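The update scheme described above can be sketched as follows. This is an illustrative reimplementation, not the authors' code: conditional probabilities are estimated as frequencies over the past W timesteps, the candidate joint state at timestep t is included in the counts so that every probability stays positive, and ties are resolved as y = 0:

```python
import math

def next_state(history, i, t, W, K=1, m=2, n=2, tau=1):
    """Choose x_{i,t+1} maximizing the summed LTE from the 2K neighbors.
    `history` is indexed as history[time][cell]; counts are taken over the
    past W timesteps, plus the (as-yet-undecided) candidate at time t."""
    N = len(history[0])

    def emb(cell, time, length):
        return tuple(history[time - j * tau][cell] for j in range(length))

    scores = {}
    xm = emb(i, t, m)
    for y in (0, 1):
        total = 0.0
        for k in list(range(-K, 0)) + list(range(1, K + 1)):
            j = (i + k) % N  # periodic boundary
            yn = emb(j, t, n)
            c_joint = c_joint_y = c_self = c_self_y = 0
            for s in range(t - W, t):
                sm, sn = emb(i, s, m), emb(j, s, n)
                nxt = history[s + 1][i]
                if sm == xm:
                    c_self += 1
                    c_self_y += (nxt == y)
                    if sn == yn:
                        c_joint += 1
                        c_joint_y += (nxt == y)
            # count the candidate transition at time t, keeping probabilities positive
            c_self += 1; c_self_y += 1; c_joint += 1; c_joint_y += 1
            total += math.log2((c_joint_y / c_joint) / (c_self_y / c_self))
        scores[y] = total
    return 0 if scores[0] >= scores[1] else 1

# A homogeneous history gives te(0) = te(1) = 0, so the tie rule selects 0.
hist = [[0] * 5 for _ in range(20)]
assert next_state(hist, 2, 15, W=5) == 0
```

For a fully synchronized history (e.g. all cells alternating together), both scores again vanish, consistent with the remark that complete synchronization yields zero transfer entropy.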
After the state x_{i,t+1} is determined based on equations (8) and (9), the value of the maximized LTE for cell i at timestep t can be obtained as te_{i,t}(x_{i,t+1}). We call this value the LTE of cell i at timestep t and denote it simply as te_{i,t}. This overall spatiotemporal system is called the CIT system, and it inevitably includes dual spatiotemporal dynamics: one is the dynamics of states, forming cellular automata, and the other is the dynamics of the LTE. We previously proposed a similar model system based on a Boolean network that maximizes the local mutual information between interacting agents, and we showed that its time evolution rule always degenerates into a critical and highly canalized rule [25]. The major differences between the current model and the previous one are the introduction of space (interaction with the neighborhood) and the use of the LTE.
We would like to note that, in the framework of local information dynamics [16], which was originally introduced to characterize the dynamics of spatiotemporal systems, the LTE as introduced in our model corresponds specifically to the apparent transfer entropy. This measure is based on pairwise interactions with neighboring cells, and its accumulated value corresponds to the summed local information transfer profiles in [17]; this measure is now exploited to drive the system in our approach. Although we chose this form of interaction as a first demonstration in our model, the design of the interactions need not be the same, and it has the flexibility to be modified according to the experimenter's purpose. In fact, several other forms of the LTE have been proposed, such as the complete transfer entropy [17], which conditions on other causal contributors to bring out the effect of the specific interactions the experimenter is interested in. It is expected that this way of designing interactions, that is, by conditioning on elements in specific ways, will further extend our framework.
It is worth noting that, when we take a sufficiently large value of W and can assume a consistent joint probability distribution over the system's history at timestep t, the time evolution scheme of the CIT system can also be understood from a different angle. The condition te_{i,t}(1) > te_{i,t}(0) in equation (8) can be rewritten in a different form as follows:

\prod_{k=-K, k \neq 0}^{K} \frac{p(1 \mid x_{i,t}^{(m)}, x_{i+k,t}^{(n)})}{p(1 \mid x_{i,t}^{(m)})} > \prod_{k=-K, k \neq 0}^{K} \frac{p(0 \mid x_{i,t}^{(m)}, x_{i+k,t}^{(n)})}{p(0 \mid x_{i,t}^{(m)})}.    (10)

By introducing a partial likelihood function L (or a likelihood function parameterized with x_{i,t}^{(m)}) [11], the above equivalence can be expressed as

x_{i,t+1} = \arg\max_{y \in \{0,1\}} L(y \mid x_{i,t}^{(m)}; x_{i-K,t}^{(n)}, \ldots, x_{i+K,t}^{(n)}).    (11)

This illustrates that each cell in our system performs a maximum likelihood estimation, with the inputs as data, L as a partial likelihood function given the current state x_{i,t}^{(m)}, and the indeterminate future state as a hypothesis. Thus, the driving force of the CIT system, which realizes a time evolution of states, can be interpreted as a transformation of states as hypotheses into states as data.
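Under the assumptions above, one plausible way to write out the equivalence in full (the exact notation of the original equations may differ) is:

```latex
% te_{i,t}(1) > te_{i,t}(0) expands, term by term from equation (9), to
\sum_{k \neq 0,\,|k| \le K}
  \log \frac{p\bigl(1 \mid x_{i,t}^{(m)}, x_{i+k,t}^{(n)}\bigr)}
            {p\bigl(1 \mid x_{i,t}^{(m)}\bigr)}
\;>\;
\sum_{k \neq 0,\,|k| \le K}
  \log \frac{p\bigl(0 \mid x_{i,t}^{(m)}, x_{i+k,t}^{(n)}\bigr)}
            {p\bigl(0 \mid x_{i,t}^{(m)}\bigr)} ,

% and exponentiating both sides turns the sums of logs into products:
L(y) \;=\; \prod_{k \neq 0,\,|k| \le K}
  \frac{p\bigl(y \mid x_{i,t}^{(m)}, x_{i+k,t}^{(n)}\bigr)}
       {p\bigl(y \mid x_{i,t}^{(m)}\bigr)} ,
\qquad
x_{i,t+1} = \arg\max_{y \in \{0,1\}} L(y) .
```

Since log is monotonic, the argmax over the summed log-ratios and the argmax over the product L(y) coincide, which is the step behind reading the update as a (partial) maximum likelihood estimation.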
The maximization of the LTE can also be expressed as a transition rule of the cellular state dynamics of the CIT system. Introducing the notation N(y, x_{i,t}^{(m)}, x_{i+k,t}^{(n)}) for the number of times the joint state (y, x_{i,t}^{(m)}, x_{i+k,t}^{(n)}) was observed, the transition condition can be written purely in terms of ratios of these counts. This equivalence suggests that the transition of the cellular state dynamics of the CIT system forms a class of totalistic rules with memory, which does not depend on the specific configuration of the cell states but depends only on the ratio of the total counts between N(y, x^{(m)}, x^{(n)}) for y=0 and y=1 within the memory span W.

Spatiotemporal dynamics of the CIT system
We started by observing the typical spatiotemporal patterns of the CIT system through numerical experiments.
Throughout the experiments in this study, unless otherwise mentioned, the system size was fixed to 100, and a single run (trial) consisted of 15 000 timesteps from a random initial condition with a periodic boundary condition. A random initial condition means that we prepared random binary states for the first W timesteps and then subsequently drove the system from these initial random states. The parameter K, the embedding dimensions (m, n), and the embedding delay τ were fixed to K=1, m=n=2, and τ=1, respectively, as a representative case throughout the analysis in this study. Note that in the actual numerical simulation of the system, for the state transition condition in equation (8), we used the equivalent expression derived in equation (10) instead of the original one. This avoids miscalculations due to the loss of significant digits when evaluating the logarithm terms in the original condition.
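The log-free form of the transition condition can be sketched as an integer comparison: writing each conditional probability as a ratio of counts, the shared totals cancel, and te(1) > te(0) reduces to comparing two products of integers, avoiding any loss of significant digits (the count layout below is our own illustrative choice):

```python
from math import prod

def prefer_one(counts):
    """Decide te(1) > te(0) without logarithms, by cross-multiplying the
    integer counts behind each conditional probability.
    `counts` holds one tuple per neighbor:
    (n_joint_1, n_joint_0, n_self_1, n_self_0), where e.g. n_joint_1 is the
    number of past transitions to 1 from the observed joint state."""
    # te(1) > te(0)  <=>  prod_k p(1|x,x_k)/p(1|x) > prod_k p(0|x,x_k)/p(0|x).
    # With p(y|.) = n_y / n_total, the totals appear on both sides and cancel,
    # leaving a comparison of integer products.
    lhs = prod(nj1 * ns0 for nj1, nj0, ns1, ns0 in counts)
    rhs = prod(nj0 * ns1 for nj1, nj0, ns1, ns0 in counts)
    return lhs > rhs

# One neighbor: p(1|joint)=2/3 vs p(1|self)=3/7 favors y=1.
assert prefer_one([(2, 1, 3, 4)]) is True
assert prefer_one([(1, 2, 4, 3)]) is False
```

Because only integer multiplications and a single comparison are involved, the decision is exact, whereas summing logarithms of nearly equal ratios can suffer cancellation.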

Observations
Figure 1 shows examples of typical spatiotemporal patterns of states and LTEs when the parameter W was set to 5, 10, 40, and 50. By varying the parameter W, we observed a diversity of complex patterns. When W=5, in the LTE dynamics, each cell took a specific value of LTE (0, 1, or 0.5) and alternated it with 0 in a specific periodic pattern (we observed a period of 7 timesteps in figure 1 (the uppermost row)). Neighboring cells did not always form clusters but took different periodic patterns with the same cycle length. This structure generated strong spatial randomness with a specific temporal periodicity. We observed that the state dynamics corresponded to this behavior (figure 1 (the uppermost row)).
When the value of W became larger, such as W>8, we started to observe many avalanche-like traveling waves moving across the cells and colliding in LTE dynamics, where background patterns remained spatially random but temporally periodic (figure 1 (the second highest line)).Its patterns appeared to be similar to spatiotemporal intermittency observed in coupled map lattices [26].In this case, state dynamics also seemed to correspond to LTE behaviors.
When the value of W became much larger (W > 11), some cells started to form spatial clusters with neighboring cells, taking the same constant value as each other or becoming semi-synchronized, showing relatively long periodic patterns in the LTE dynamics (e.g. when W=40, figure 1 (the third row)). Especially for the long periodic patterns in the LTE dynamics, the corresponding state dynamics sometimes showed the same value for a specific length of time, which we call a block. As the value of W became larger, these clusters began to gather, and the cells started to become partially synchronized in both the LTE and state dynamics (e.g. when W=50, figure 1 (the lowest row)). These synchronized behaviors seem to be directly interpretable from the LTE rule, since synchronization or coordination can reduce the uncertainty of, or increase the predictability of, the interacting systems. In the following sections, we investigate the behavior of the system in more detail.

Analysis of spatiotemporal structures

Spatial clustering and temporal periodicity
We also quantitatively investigated the spatial clustering and temporal periodicity of the patterns in both the LTE and state dynamics. Figure 2(A) shows the outcome of clustering in the LTE dynamics. A cluster is defined as a group of neighboring cells that take the same LTE value, and a cluster's size is defined as the number of cells it includes. Using the final 10 000 timesteps of each trial of the CIT run, we investigated the size and number of clusters at each timestep, and we collected the maximum values of the cluster size and number. We iterated this process for 100 trials, and the mean maximum number and size of clusters were calculated (figure 2(A)). As a comparison, we prepared randomly assigned spatiotemporal states of the same system size (we call this a random system), which are not driven by local information transfer maximization, and calculated the LTE for each cell with the given parameter W. The mean maximum number and size of clusters were obtained in the same manner (figure 2(A)). As a result, we could clearly see that several peaks appeared in the mean maximum cluster size in the region around 10<W<60, starting from W=7, 8 and recurring at increments of 8 or 9 in W. These peaks reached their maximum at around W=16, 17 and then gradually started to decrease as W grew larger. The behavior of the mean maximum number of clusters basically varied inversely to that of the mean maximum cluster size. We also confirmed that these peaks are an outcome of the specific characteristics of the CIT system by comparison with the random system, which showed no peaks at all.
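The cluster statistic described above can be sketched as a maximum run length on a ring, respecting the periodic boundary condition (an illustrative helper, not the authors' code):

```python
def max_cluster_size(row):
    """Largest group of neighboring cells sharing the same value on a ring
    (periodic boundary), i.e. the maximum cluster size at one timestep."""
    N = len(row)
    if all(v == row[0] for v in row):
        return N
    best = run = 1
    # scan twice around the ring so runs crossing the boundary are counted
    for idx in range(1, 2 * N):
        if row[idx % N] == row[(idx - 1) % N]:
            run += 1
            best = max(best, min(run, N))
        else:
            run = 1
    return best

# The run of three 0s is the largest cluster; the 1s wrap to a cluster of 3 too.
assert max_cluster_size([1, 1, 0, 0, 0, 1]) == 3
assert max_cluster_size([0, 1, 0, 1]) == 1
```

Collecting this value over the final 10 000 timesteps and taking the maximum per trial reproduces the statistic plotted in figure 2(A).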
Figure 2(B) investigates the temporal periodicity of the patterns in both the LTE and state dynamics. Similarly to the above case, using the final 10 000 timesteps of each trial, we evaluated whether each cell settled down to a periodic cycle of patterns in both the LTE and state dynamics by checking the patterns' appearance ratio. We counted how many times we could observe a periodic cycle of patterns within the system (among 100 cells); to be counted, the system had to contain at least one cell showing a definite periodic cycle of patterns. We also collected the maximum cycle length if more than one cycle appeared in each trial, and the mean maximum cycle length was obtained. In the plot (figure 2(B)), we can see that, as the value of W increased beyond around 30, the appearance ratio suddenly started to drop (although we can still observe high values of the appearance ratio for specific values of W). It is noticeable that, in the settings where the system shows avalanche-like traveling waves, such as W=9, 10, 11, the appearance ratio is low, which is consistent with our observations. The mean maximum cycle length of the patterns basically follows the behavior of the appearance ratio. Until W exceeded 30, the mean maximum cycle length increased almost monotonically up to 60-80 (figure 2(B)); for even larger W, periodic patterns could not be stably observed. We also investigated the maximum size of the block appearing in the state dynamics (figure 2(C)). The block was analyzed using the final 10 000 timesteps of the state dynamics of each trial, and the maximum block size was averaged over 100 trials. Interestingly, although the size remained small when W was less than around 30, for larger W the maximum block size immediately increased and almost always reached a value near W for each trial (figure 2(C)).
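The block statistic of figure 2(C) can likewise be sketched as the longest run of identical consecutive states in a single cell's time series (again an illustrative helper):

```python
def max_block_size(series):
    """Longest run of identical consecutive states in one cell's time
    series, i.e. the 'block' size analyzed in figure 2(C)."""
    best = run = 1
    for a, b in zip(series, series[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best

assert max_block_size([0, 1, 1, 1, 0, 0]) == 3
assert max_block_size([0] * 5) == 5
```

Taking the maximum of this value over all cells, then averaging over trials, yields the mean maximum block size reported in the text.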
To examine the temporal structure of the system further, we performed a power spectrum analysis. The Fourier transform of the time series x_{i,t} of cell i for t = 0, 1, ..., T−1 is given by

S_i(f) = \sum_{t=0}^{T-1} x_{i,t} \exp(-2\pi \mathrm{i} f t / T),

and the Fourier power spectrum is defined as

S(f) = \frac{1}{N} \sum_{i=1}^{N} |S_i(f)|^2,

where N=100 in our analysis. We used the final 8192 timesteps of each trial for the analysis, investigating both the state and LTE dynamics.
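A direct (unoptimized) sketch of this spectrum computation, taking S_i(f) as the DFT of cell i's series and averaging the power over cells (the cell-averaging convention is our reading of the text), could look as follows:

```python
import cmath
import math

def mean_power_spectrum(cells):
    """Power spectrum |S_i(f)|^2 averaged over cells, with
    S_i(f) = sum_t x_{i,t} exp(-2*pi*1j*f*t/T) computed term by term.
    Returns values for f = 1, ..., T//2 - 1 (DC component omitted)."""
    N, T = len(cells), len(cells[0])
    spectrum = []
    for f in range(1, T // 2):
        p = 0.0
        for series in cells:
            s = sum(x * cmath.exp(-2j * math.pi * f * t / T)
                    for t, x in enumerate(series))
            p += abs(s) ** 2
        spectrum.append(p / N)
    return spectrum

# A pure cosine at frequency f=2 puts all its power in the f=2 bin (index 1).
T = 32
cells = [[math.cos(2 * math.pi * 2 * t / T) for t in range(T)]]
spec = mean_power_spectrum(cells)
assert max(range(len(spec)), key=lambda i: spec[i]) == 1
```

In practice one would use an FFT for T=8192; the 1/f^α slope is then fitted to the low-frequency tail of log S(f) versus log f.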
Figure 3(A) shows the results of the analysis for each W. When W was small (such as W=5), several peaks could be observed in the high-frequency domain, while the power remained flat in the low-frequency domain in both the state and LTE dynamics. On increasing the value of W, several peaks appeared in the lower-frequency domain as well, and the tail of the spectrum started to show a slope, which is a signature of 1/f^α noise, suggesting self-similarity and complexity of the temporal patterns in the range 1 ≤ α ≤ 2. We analyzed the behavior of α for each W (figure 3(B)). Starting from α=0 for small W (W < 9), the slope immediately became steep (α>1) in the range 9 ≤ W ≤ 32, and it then became gentle again (α<1) as the value of W became much larger.

Statistics of LTE
We also investigated the amount of information that the CIT system transfers, examining both the mean and the mean maximum value of the LTE for each W (figure 4). Let us first look at the behavior of the mean LTE value, which is the spatiotemporal average of the amount of LTE that flowed within the system. We obtained the temporally averaged LTE value for each cell using the final 10 000 timesteps; we then calculated the spatial average of these values over the cells in each trial. This value was further averaged over 100 trials, and the mean LTE value was obtained for the analysis (figure 4(A)). As a result, we found several local maxima when the value of W was relatively small, observing a peak of the LTE value when W was a multiple of 8. This tendency continued until the value of W exceeded about 40, beyond which the LTE value started to settle down to around 0.75. It is worth noting that the values of W showing a peak of the LTE value correspond to those showing a peak in the maximum cluster size. The mean LTE value of the system was consistently larger than that of the random system over the entire range of W. (It is important to note that the result of the random system represents a finite-data effect caused by the constraints of our system settings, such as the parameter settings and the state transition scheme using the LTE.) Next, we analyzed the mean maximum value of the LTE. Using the final 10 000 timesteps, we collected the maximum LTE value for each trial, and the mean maximum LTE value was calculated over 100 trials for each W.
As a reference, a theoretical upper bound can be calculated: the LTE in equation (9) becomes maximal when p(x_{i,t+1} | x_{i,t}^{(m)}, x_{i+k,t}^{(n)}) = 1 while p(x_{i,t+1} | x_{i,t}^{(m)}) takes its smallest possible nonzero value of 1/W, which gives log₂ W. We could see that the mean maximum LTE value did not increase and remained relatively small, actually smaller than that of the random system, until the value of W exceeded around 25 (figure 4(B)). This is counterintuitive, because it suggests that, even though the system is driven to maximize the LTE at each spatiotemporal point, the system as a whole cannot produce a larger LTE than the random system (even at a single site), expressing a unique feature of the CIT system. When W exceeded around 25, the mean maximum LTE value started to increase and almost always reached near the theoretical upper bound of the LTE value. It is also important to note that this outcome corresponds to the appearance of blocks in the state dynamics whose size is W (figure 2(C)).
Let us see in detail how the block and the LTE value are related to each other. As mentioned above, the theoretical upper bound of the LTE value, log₂ W, is obtained when p(x_{i,t+1} | x_{i,t}^{(m)}, x_{i+k,t}^{(n)}) = 1 while p(x_{i,t+1} | x_{i,t}^{(m)}) = 1/W; a block of an appropriate size realizes this situation (the exact size is also dependent on how the parameters m and n are set). In our numerical experiments, a block size of W+1 was hardly ever observed as the maximum size that appeared, while block sizes of W−1 or W were frequently observed (figure 2(C)). This implies that the set of initial conditions that can generate a block of size W+1 is very specific and occupies a narrow region of the basin, if it exists at all. As an example, we show a typical case with a high LTE value and a block size of W when W=50 (figure 5). We can observe that, just after the block of size 50 is completed, a high value of LTE is obtained, which implies a mechanism similar to the one illustrated above for obtaining the theoretical upper bound.

Figure 3. For each diagram, the mean spectrum and slopes were obtained using 100 trials. To calculate the slopes, we used the data from the lowest five points of log(fT).

Effect of small W
In figures 2(C), 3(B), and 4(B), we observed that qualitative changes occur at W ≈ 30 in the mean maximum block size, in the slope of the power spectrum, and in the maximum value of the LTE, respectively. These could originate from the finite sample size W used for calculating the conditional probabilities, since the total number of possible joint states is 2×2^m×2^n = 32. As a first step in investigating the effect of small W, we calculated the average number of joint states that do not occur, ⟨N_non-occur⟩, in the calculation of the joint probabilities (figure 6).

Figure 4. For both plots, as a comparison, we prepared 100 spatiotemporal systems with random binary sequences of the same system size of 100 (i.e. random systems). LTEs (the spatiotemporal average of the LTEs and the maximum LTEs) were calculated over these systems, and the corresponding mean and mean maximum LTE values were obtained. In (B), we also overlaid the theoretical upper bound of the LTE for each W. The shaded envelope represents the standard deviation.

For comparison, we also plotted ⟨N_non-occur⟩ for the random system, in which future states are determined with equal probabilities. The possible minimum number of joint states that do not occur is given by max{0, 32−W}. Since, in the random systems, each joint state fails to occur with nonzero probability, ⟨N_non-occur⟩ for the random system is greater than max{0, 32−W}. For W ≤ 11, ⟨N_non-occur⟩ for the CIT system is almost equal to that for the random system, suggesting that in this regime the behavior of the CIT system is dominated by the finite-size effect of W. However, the former takes significantly higher values than the latter for larger W; thus, in that regime, the rule of the CIT system is at least partially responsible for the bias towards higher values of ⟨N_non-occur⟩. ⟨N_non-occur⟩ for the CIT system seems to converge to a nonzero value (≈18) as W increases. From figure 6, W=32 can be characterized as the point where ⟨N_non-occur⟩ almost reaches this convergence value. However, the effect of small W on the dynamical behaviors generated by the CIT system is still unclear; a complete analysis of this issue is beyond the scope of this paper and is left for future work.
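The counting of non-occurring joint states can be sketched for a single cell-neighbor pair as follows (an illustrative helper; in the paper the average is additionally taken over cells, neighbors, and future states):

```python
def n_non_occurring(dst, src, W, t, m=2, n=2):
    """Number of joint states (x_{t+1}, x_t^m, y_t^n) never seen in the past
    W timesteps. With binary states, 2 * 2**m * 2**n = 32 joint states are
    possible, so at least max(0, 32 - W) of them must be absent."""
    total = 2 * 2 ** m * 2 ** n
    seen = set()
    for s in range(t - W, t):
        xm = tuple(dst[s - m + 1:s + 1])
        yn = tuple(src[s - n + 1:s + 1])
        seen.add((dst[s + 1], xm, yn))
    return total - len(seen)

# A constant pair of series visits exactly one joint state: 31 are absent.
zeros = [0] * 20
assert n_non_occurring(zeros, zeros, W=10, t=15) == 31
assert n_non_occurring(zeros, zeros, W=10, t=15) >= max(0, 32 - 10)
```

The lower bound max{0, 32−W} follows simply because a window of W transitions can visit at most W distinct joint states.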

Discussion
In this paper, we proposed a simple spatiotemporal system that evolves through the maximization of local information transfer. It was shown that its state transition function can be explained as a specific class of cellular automata rules with memory.

Figure 6. The average number of joint states that do not occur in the calculation of the joint probabilities. For each W, we ran the system once, and the average was taken over all timesteps (T=1500, with the initial 1000 steps disregarded), cells (N=50), neighbors (K=2), and future states (y=0 and y=1). For comparison, the result for the random system and the possible minimum number of non-occurring joint states, max{0, 32−W}, are also shown.

We also investigated the behavior of the system by varying the length of memory W, revealing its diverse nature. For example, we observed avalanche-like traveling waves propagating through the system under some conditions, while spatial clusters and temporally periodic patterns were generated under others. Surprisingly, we found that, for a specific range of limited memory, even though the system was intrinsically driven by maximizing the LTE, the maximum value of the LTE did not grow higher than that of the random system, whose states were assigned randomly without any LTE maximization procedure. Considering that our model is generic enough to represent a typical feature of an unsupervised system driven by the LTE, this result has critical implications for the setting of memory resources in general. In the transfer entropy maximization literature, for example, it is often the case that the experimenter fixes beforehand the memory span for the transfer entropy calculation, which corresponds to W in our model (e.g. [16]). However, our results suggest that the resulting target behavior is largely constrained by the choice of W. This essentially requires an understanding of how the target behavior is determined by predetermined system parameters; thus, our framework would be beneficial for further systematic investigation of these intrinsic constraints. Furthermore, since the LTE concerns the information flowing from the neighboring cells, the results in figure 4 may also imply that the length of memory W can be taken as a parameter that modulates each cell's sensitivity to, or acceptance of, the information transfer. Thus, it may be that only if the parameter W is tuned and balanced in this sense does a self-similar structure emerge in the system, resulting in the appearance of 1/f^α noise, as observed in figure 3.
This platform can be analyzed further and extended in multiple directions. For example, it would be worth investigating how the behavior of the system alters when varying the embedding-dimension parameters m and n and the number of interacting neighboring cells K. These variations will be included in our future work. Furthermore, although we adopted transfer entropy as an expression of information transfer, it is not always the appropriate choice, depending on what is being investigated [27]. It is also possible to adopt conventional mutual information or other directed information measures as candidates (e.g. [28][29][30]). The forms of the interactions can likewise be varied in many ways (such as by including delayed interactions between multiple cells) by regulating how the measure is conditioned on elements within the system [29][30][31]. We believe that we have proposed a generic framework for investigating how a system behaves when certain information content is modulated at each spatiotemporal point. We expect that, by adjusting the model setting to the specific constraints of the situation at hand, our framework will provide a useful test bed for such explorations.
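As a rough illustration of the state-transition mechanism discussed above, the sketch below picks the candidate next state of a cell that maximizes the local transfer entropy estimated over the last W transitions. It simplifies the paper's setting to a single neighbor and embedding lengths m = n = 1, and its tie-breaking and treatment of unseen transitions are our assumptions, not the paper's actual rule:

```python
import math
from collections import Counter

def next_state_max_lte(x_hist, y_hist, W):
    """Choose the next state of cell X (0 or 1) that maximizes the local
    transfer entropy from neighbor Y, with probabilities estimated over
    the last W transitions (m = n = 1).  Falls back to 0 when no candidate
    transition has been observed yet."""
    trip = Counter()                       # counts of (x_{t+1}, x_t, y_t)
    for t in range(max(0, len(x_hist) - 1 - W), len(x_hist) - 1):
        trip[(x_hist[t + 1], x_hist[t], y_hist[t])] += 1
    pair, src, cur = Counter(), Counter(), Counter()
    for (x1, x0, y0), c in trip.items():
        pair[(x1, x0)] += c                # counts of (x_{t+1}, x_t)
        src[(x0, y0)] += c                 # counts of (x_t, y_t)
        cur[x0] += c                       # counts of x_t
    x0, y0 = x_hist[-1], y_hist[-1]
    best, best_lte = 0, -math.inf
    for cand in (0, 1):
        c = trip[(cand, x0, y0)]
        if c == 0:
            continue                       # unseen transition: LTE undefined
        # local TE of this candidate: log2[ p(cand | x0, y0) / p(cand | x0) ]
        lte = math.log2((c / src[(x0, y0)]) / (pair[(cand, x0)] / cur[x0]))
        if lte > best_lte:
            best, best_lte = cand, lte
    return best

# y drives x with one step of delay, so copying y's last value maximizes the LTE.
x_hist = [0, 1, 0, 1, 1, 0]
y_hist = [1, 0, 1, 1, 0, 1]
print(next_state_max_lte(x_hist, y_hist, 10))   # prints 1
```

Extending this to K = 2 neighbors and longer embeddings recovers the flavor of the CIT system, at the cost of many more joint states to estimate from the same W samples.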


Figure 1. Spatiotemporal patterns of the CIT system according to the parameter W. Spatiotemporal patterns of the LTE (left diagram) and states (right diagram) are shown for W=5, 10, 40, and 50.

Figure 2. Characterizing the spatiotemporal dynamics of the CIT system in terms of spatial clustering and temporal periodicity of the patterns. (A) The mean maximum number and size of clusters of the CIT system according to each W. Results from randomly assigned spatiotemporal states, which are not driven by local information maximization, are overlaid for comparison. Error bars show standard deviations. (B) The appearance ratio (upper diagram) and the mean maximum cycle length of patterns (lower diagram) in both the state and LTE dynamics according to each W. (C) The mean maximum block size in the state dynamics according to each W. The shaded region expresses the range within the standard deviations.
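The cluster statistics in panel (A) can be illustrated with a small sketch. Here we assume a cluster is a maximal run of cells sharing the same state along one spatial row, with no periodic wraparound; the paper's exact definition may differ:

```python
def clusters(row):
    """Sizes of maximal runs of equal states in one spatial row (no wraparound)."""
    sizes = []
    for i, s in enumerate(row):
        if i == 0 or s != row[i - 1]:
            sizes.append(1)        # a new cluster starts here
        else:
            sizes[-1] += 1         # extend the current cluster
    return sizes

row = [1, 1, 0, 0, 0, 1, 0, 0]
sizes = clusters(row)
print(len(sizes), max(sizes))      # 4 clusters, largest of size 3
```

The "maximum number and size of clusters" per timestep would then be obtained by taking `len(sizes)` and `max(sizes)` for each row and averaging over time; cycle lengths in (B) could be detected analogously by comparing rows at different temporal lags.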

Figure 3. The Fourier power spectrum analysis. (A) The mean spectrum for the state dynamics (left diagram) and the LTE dynamics (right diagram) according to each W. (B) The mean slopes of the spectrum for the state dynamics and the LTE dynamics according to each W. For each diagram, the mean spectrum and slopes were obtained over 100 trials. To calculate the slopes, we used the data from the lowest five points of log(fT).

Figure 4. LTE statistics according to W. (A) Analysis of the mean LTE value. (B) Analysis of the mean maximum LTE value. For both plots, as a comparison, we prepared 100 spatiotemporal systems that have random binary sequences with the same system size of 100 (i.e. random systems). The LTEs (the spatiotemporal average of the LTEs and the maximum LTEs) were calculated over these systems, and the corresponding mean and mean maximum LTE values were obtained. In (B), we also overlaid the theoretical upper bound of the LTE for each W. The shaded envelope represents the standard deviation.

Figure 5. A typical example of the LTE dynamics and a block when W=50. The upper diagram shows the state dynamics, and the lower diagram shows the LTE dynamics.

Let us write $x_t$ and $y_t$ for the values of two temporal processes $X_t$ and $Y_t$, respectively. Transfer entropy quantifies the deviation from the generalized Markov property

$p(x_{t+1} \mid x_t^{(m)}, y_t^{(n)}) = p(x_{t+1} \mid x_t^{(m)})$,

where $x_t^{(m)} = (x_t, \ldots, x_{t-m+1})$ and $y_t^{(n)} = (y_t, \ldots, y_{t-n+1})$ are the embedding vectors of lengths m and n leading to $x_{t+1}$. If the deviation from a generalized Markov process is small, then the state $y_t^{(n)}$ can be assumed to have little relevance to the transition probabilities from $x_t^{(m)}$ to $x_{t+1}$. If the deviation is large, however, then the assumption of a generalized Markov process is not valid. The incorrectness of the assumption can be expressed by the transfer entropy, formulated as a Kullback–Leibler divergence [1, 10]:

$T_{Y \to X} = \sum_{x_{t+1},\, x_t^{(m)},\, y_t^{(n)}} p(x_{t+1}, x_t^{(m)}, y_t^{(n)}) \log \frac{p(x_{t+1} \mid x_t^{(m)}, y_t^{(n)})}{p(x_{t+1} \mid x_t^{(m)})}.$

Actually, this configuration of the cell states is nothing but a block of size W+1 (note that the length of the block is W+1, including the states within the memory span W).
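The transfer entropy above can be estimated from data with plug-in (frequency-count) probabilities. The following sketch computes T_{Y→X} in bits for binary sequences with m = n = 1; it is a generic estimator, not the paper's exact implementation. With y copied into x at one step of delay, the estimate approaches 1 bit:

```python
import math
import random
from collections import Counter

def transfer_entropy(x, y, m=1, n=1):
    """Plug-in estimate of the transfer entropy T_{Y->X} in bits,
    with embedding lengths m (for X) and n (for Y)."""
    start = max(m, n) - 1                      # first t with full embeddings
    triples = [(x[t + 1], tuple(x[t - m + 1:t + 1]), tuple(y[t - n + 1:t + 1]))
               for t in range(start, len(x) - 1)]
    N = len(triples)
    p_xyz = Counter(triples)                               # (x_{t+1}, x^m, y^n)
    p_x1xm = Counter((x1, xm) for x1, xm, _ in triples)    # (x_{t+1}, x^m)
    p_xm = Counter(xm for _, xm, _ in triples)             # x^m
    p_xmyn = Counter((xm, yn) for _, xm, yn in triples)    # (x^m, y^n)
    te = 0.0
    for (x1, xm, yn), c in p_xyz.items():
        # Local TE of this joint state: log2[ p(x1|xm,yn) / p(x1|xm) ]
        lte = math.log2((c / p_xmyn[(xm, yn)])
                        / (p_x1xm[(x1, xm)] / p_xm[xm]))
        te += (c / N) * lte                    # weight by the joint probability
    return te

random.seed(1)
y = [random.randint(0, 1) for _ in range(2000)]
x = [0] + y[:-1]                               # x copies y with a one-step delay
print(transfer_entropy(x, y))                  # close to 1 bit
```

The inner term `lte` is exactly the local transfer entropy used throughout the paper: averaging it over the joint distribution recovers the (global) transfer entropy, while the CIT system instead maximizes it pointwise at each cell and timestep.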