Chaos may enhance expressivity in cerebellar granular layer

Recent evidence suggests that Golgi cells in the cerebellar granular layer are densely connected to each other with massive gap junctions. Here, we propose that the massive gap junctions between the Golgi cells contribute to the representational complexity of the granular layer of the cerebellum by inducing chaotic dynamics. We construct a model of cerebellar granular layer with diffusion coupling through gap junctions between the Golgi cells, and evaluate the representational capability of the network with the reservoir computing framework. First, we show that the chaotic dynamics induced by diffusion coupling results in complex output patterns containing a wide range of frequency components. Second, the long non-recursive time series of the reservoir represents the passage of time from an external input. These properties of the reservoir enable mapping different spatial inputs into different temporal patterns.


Introduction
Recent experimental studies have revealed that neighboring Golgi cells in the granular layer of the cerebellar cortex are densely interconnected with gap junctions that allow direct diffusion of ions between neuronal intracellular spaces (Dugué et al. (2009) ;Vervaeke et al. (2010)). Vervaeke et al. (2010) reported that more than 80 % of neighboring neuron pairs are interconnected with gap junctions, and that each Golgi cell is connected to approximately 10 other Golgi cells via gap junctions. They also showed that the diffusion current between neighboring Golgi cells has the effect of transiently desynchronizing the spike activities after external excitation (Vervaeke et al. (2010)). This is contradictory to the classical view of the role of gap junctions, which is to synchronize nearby neurons (Watanabe (1958)). In spite of the complex effect of diffusion coupling between Golgi cells on the ongoing dynamics, the causal relationship between this dynamics and cerebellar computation has yet to be elucidated.
Several theoretical studies have pointed out that diffusion coupling between nonlinear oscillators not necessarily realizes synchronization, but also induces instability (Turing (1952)) or even chaotic activity (Yamada & Kuramoto (1976); ; Tsuda et al. (2004); Schweighofer et al. (2004) ;Tokuda et al. (2010); Katori et al. (2010); Tadokoro et al. (2011); Tokuda et al. (2019)).  and Tsuda et al. (2004) reported that introducing diffusion coupling through gap junctions between class 1 neurons induces chaotic dynamics. Schweighofer et al. (2004) proposed a theory that the abundant gap junctions in the inferior olive produce chaotic neural activity that enables efficient transmission of information in the high-frequency components of inputs. It has also been proposed that the adaptive strength of the gap junction in the inferior olive regulates the degrees of freedom of the system, and the brain modifies the gap junction strength during learning to ensure that the system operates at an optimal level of degrees of freedom (Kawato et al. (2011); Tokuda et al. (2013Tokuda et al. ( , 2017; Hoang et al. (2019)). The possible computational role of diffusion coupling through gap junction in the granular layer should be elucidated as well.
The majority of cerebellar computational theories assume that the cerebellum is a supervised machine that learns a desirable input-output relationship (Marr (1969); Albus (1971); Ito (1970); Kawato et al. (1987); Buonomano & Mauk (1994); Wolpert et al. (1998); Schweighofer et al. (2004); Yamazaki & Tanaka (2007); Raymond & Medina (2018)). It is well known that two major input pathways converge on the Purkinje cells; the mossy fiber-granular layer-Purkinje cell pathway originating from a precerebellar nucleus such as the pontine nucleus, and the climbing fiber-Purkinje cell pathway originating from the inferior olive (Ruigrok et al. (2015)). These theories assume that the former pathway is the input layer of the supervised machine and the latter pathway conveys the supervising signals. The computational role of the cerebellar granular layer is assumed to be the preprocessing -feature engineering -of incoming signals from the mossy fibers.
It transforms an input to a dynamical representation in a high-dimensional space realized by the enormous number (∼ 10 11 in human) of the granular neurons (Marr (1969); Albus (1971); Badura & De Zeeuw (2017); Raymond & Medina (2018)). The granule cells and Golgi cells are the two major components of the cerebellar granular layer. Even though the major outputs of the granular layer are conveyed by the parallel fibers of the granule cells, the Golgi cells are also thought to play the central role of forming the representation because of the lack of direct recurrent connection within granule cells (Marr (1969); Albus (1971); Raymond & Medina (2018)).
It has long been known that the cerebellum plays a crucial role in motor learning, which requires execution of sequential movements with temporally precise timing (Ito (1984)). For example, a vast amount of experimental studies have characterized the essential information flow in the cerebellum that supports motor learning called classical eyeblink conditioning (Thompson (2005)). In a typical eyeblink conditioning, the animal is exposed to paired presentation of tone stimulus and a periorbital air puff stimulus intervened with a fixed interval (typically 250 ms) repetitively. After learning occurs, the animal acquires a temporally precise motor response (eyeblink) to the tone. The eyeblink response is precisely timed at the air puff onset with millisecond precision. It is also known that the interval discrimination task can be learned in eyeblink conditioning: animals can learn to elicit eyeblink responses at different latencies to different tone stimuli (Kehoe et al. (1993); Green & Steinmetz (2005)). Vast amount of evidence supports the fact that the cerebellum acquires the desired map to return a specific spatiotemporal pattern to a specific input. To explain this computation of the cerebellum, Buonomano & Mauk (1994) proposed a model of the granular layer consisting of sparse reciprocally connected granule cells and Golgi cells that are capable of representing the passage of time from the onset of an external sensory stimulus. In their model, mossy fiber excitation conveying the information of external tone stimulus elicits activity of the granule cells and the Golgi cells, with different sub-populations activated at different times. As a result, a specific sub-population of the granule cells is activated at a specific time from the onset of the stimulus, thereby representing the passage of time. This model successfully explains an important aspect of the behavioral and physiological traits in eyeblink conditioning. Yamazaki & Tanaka (2007) extended Buonomano's model, and proposed the view that the cerebellum is a liquid state machine -a type of reservoir machine (Jaeger (2001); Maass et al. (2002)). In the reservoir computing framework, the input signals to the system project to a recurrent network called reservoir that has a highly nonlinear dynamics in a high dimensional space, and only the readout connections from this reservoir are modified to give the desired output signals. In Yamazaki's model, the general computational role of the cerebellum is to acquire a map between spatiotemporal input patterns and desired spatiotemporal output patterns. This is a natural elaboration of the classical Marr-Albus-Ito model regarding the cerebellum as a supervised machine, in that the reservoir machine can process spatiotemporal patterns. The granular layer works as the reservoir, and long-term depression (LTD) of the parallel fiber-Purkinje cell connection works as the learning rule. The Purkinje cell corresponds to the output neuron of the reservoir. They showed that the network of random recurrent connections between granule and Golgi cells realized temporally specific activation of different sub-populations of granule cells in response to external inputs. Their model successfully acquires a function that maps specific different inputs to specific different temporal patterns. To date, several studies of cerebellar function with the reservoir computing framework have been conducted (Yamazaki & Nagao (2012); Rössert et al. (2015)). However, even though these theories assume the inevitable functional role of the Golgi cells in realizing the reservoir, to our knowledge, no study has focused on the functional role of massive gap junctions between the Golgi cells. Considering the facts that chaotic activity is related to the performance of the reservoir (Jaeger (2001) Laje & Buonomano (2013)) and that gap junction often induces chaotic activity (Yamada & Kuramoto (1976) (2019)), it is necessary to elucidate how the gap junctions affect the computational performance of the granular layer as the reservoir, especially in terms of the effect of chaotic dynamics it may produce.   Figure 1 shows the schematic diagrams of the two network architectures of the reservoir machines studied in the current study. We take the view that the cerebellum is a reservoir machine (Yamazaki & Tanaka (2007)). In this view, the granular layer of the cerebellum works as the reservoir, the mossy fiber projecting to the granular layer is the input, and the Purkinje cell is the output neuron. Learning via synaptic modification is realized by changing the connection strength of the parallel fibers, which are the readout connections. The reservoir maps an incoming input into a high-dimensional time series by nonlinear dynamics. Figure 1(a)

The network architecture
shows a simple reservoir machine composed of only Golgi cells that are mutually connected with diffusion coupling through gap junctions. In this study, we mainly focused on this model to evaluate the computational performance of diffusion-induced chaos as a reservoir machine. To confirm whether the results obtained with this model can also be observed in a more realistic situation, we perform a simulation incorporating the granule cells as well ( Fig. 1(b)). This model incorporates other major components of the granular layer: the granule cells, the excitatory projections from the granule cells to the Golgi cells, and the inhibitory projections from the Golgi cells to the granule cells. The readout projection to the Purkinje cell originates from the granule cells as the actual cerebellar anatomical structure. Note that we do not incorporate a feedback loop from the reservoir output to the input in this study.

The model dynamics
A cerebellar Golgi cell is known to have the following properties: (1) it shows a periodic activity in vitro (Forti et al. (2006); Solinas et al. (2007); Vervaeke et al. (2010)), (2) its spike frequency increases as external input increases (Forti et al. (2006); Solinas et al. (2007)), and (3) its diffusion coupling induces desynchronization of neighboring cells (Vervaeke et al. (2010)). We model the Golgi cells with the µ-model, which is a simple model described by a two-dimensional ordinary differential equation ).
The µ-model is a class 1 neuron model that shows spiking activity after a saddle-node bifurcation occurs as tonic external input increases. This model shows periodic activity under isolated conditions with a tonic input, increases spike frequency to increasing tonic input, and shows aperiodic activity when coupled with diffusion ( )). The simple model shown in Fig. 1(a) is described as follows: where V go i is the membrane potential of the ith Golgi cell, R go i is the recovery variable representing the ion channel activity of the ith Golgi cell, µ is the parameter of the µ-model, I go,tonic is the common tonic input to all cells, J go i is the diffusion current into the ith Golgi cell from the neighboring cells conducted through the gap junctions, g GJ is the conductance of a gap junction, I go,input i is the input signal to the reservoir described below, and N go is the number of Golgi cells. The model only differs from that of the former studies ; Tsuda et al. (2004); Tadokoro et al. (2011)) in that the input signal I go,input i is incorporated in Eq. (1). In the µ-model, both the units of time and the variables are arbitrary. We use milliseconds as the unit of time for convenience. We use parameters µ = 1.7, g GJ = 0.08 (typical parameter set showing chaotic dynamics) except in Figs. 4, 5 where the dependency of the dynamics on these parameters is studied, and in Figs. 3, 9 where g GJ = 0 is used. For the parameter I go,tonic , we use I go,tonic = 0.004 other than in a simulation shown in Fig. 2

(b) and (c).
We restrict the form of the input signal I go,input ∈ R Ngo to an instantaneous pulse as follows: where δ (t) is the Dirac delta function, x in,i ∈ R Ngo is the N go -dimensional vector representing the amplitude of the input, t 1 is the time when the input is given to the reservoir. Practically, giving an input pulse is done by setting V go → V go + x in at time t 1 , where V go = (V go i ) ∈ R Ngo is the vector representation of the membrane potentials. In Sec. 3.4, a series of two different pulses are given to the system. In this case, the input signal I go,input is described as follows: where t 1 and t 2 are the times when the input pulses are given to the reservoir.
The model incorporating the granule cells shown in Fig. 1(b) is described as follows: where V gr i is the membrane potential of the ith granule cell, R gr i is the recovery variable representing the ion channel activity of the ith granule cell, I gr,tonic is the common tonic input to the granule cells, and I gr,input i is the input signal to the reservoir projecting to the ith granule cell. The diffusion currents are described as follows: The currents I ei i , I ie i are the currents caused by chemical synapses, each representing the inhibition of the ith granule cell by the Golgi cells and the excitation of the ith Golgi cell by the granule cells, respectively, described with the following equations: where w ei ij is the strength of the synaptic connection from the jth Golgi cell to the ith granule cell, w ie ij is the strength of the synaptic connection from the jth granule cell to the ith Golgi cell, θ is a parameter defining the threshold above which each neuron can be regarded as emitting a spike, and f (V ) is an activation function. In this study, we use θ = 0.7, f (V ) = 1 1+exp(−50V ) . The synaptic strengths are determined using the following procedure. For each granule cell i, n ei presynaptic Golgi cells are randomly chosen. Then, the strength is set at w ei ij = c ei /n ei , if the jth Golgi cell is in the chosen group. The strength is set at w ei ij = 0, if the jth Golgi cell is not in the chosen group. Similarly, for each Golgi cell i, n ie presynaptic granule cells are randomly chosen. Then, the strength is set at w ie ij = c ie /n ie , if the jth granule cell is in the chosen group. The strength is set at w ie ij = 0, if the jth granule cell is not in the chosen group. The parameter values used are µ = 1.7, g GJ ∈ {0, 0.08} , N gr = 10 4 , N go = 10 2 , n ei = 4, n ie = 10 2 , c ie = −c ei = 0.2. These parameters are determined such that the ratio of the numbers of granule cells and Golgi cells, N gr /N go , and the number of synapses each neuron receive, n ei , n ie , are compatible with those described in the former study (Sudhakar et al. (2017)).
Neurons in a specific subset of the reservoir neurons are connected to the outputs, which we refer to as the projecting neurons hereafter. Let V out i be the membrane potential of the ith projecting neuron. In the simple model shown in Fig. 1(a), all the Golgi cells are the projecting neurons. Thus, V out i = V go i . In the model shown in Fig. 1(b), all (and only) the granule cells are the projecting neurons. Thus, V out i = V gr i . The ith output of the reservoir, y i is defined as follows: where N out is the number of projecting neurons in the reservoir, and w out ij is the synaptic weight of the connection from the jth projecting neuron in the reservoir to the ith output y i . The vector y = (y i ) ∈ R Ny defines the instantaneous output of the reservoir machine, where N y is the number of outputs (the number of Purkinje cells considered). The output synaptic weight w out ij is time independent, and its value is modified only in the batch learning procedure described below.

Learning of the readout connection
Let y target ∈ R Ny be the target pattern consisting of N y -dimensional time series defined over a time interval [t 0 , t 0 + T train ], where T train is the length of the time series. We determine the readout weight matrix W out = w out ij ∈ R Ny×Nout to minimize the following residual value: where |x| is the Euclidean norm of a vector x. In practice, this is conducted by sampling both the output vectors and the target vectors with a small sampling interval ∆. Let Ω ∈ R Ny×K be the discretized time series of the numerically integrated membrane potentials of the reservoir dynamics over the time interval [t 0 , t 0 + T train ] as follows: where K is the natural number satisfying K∆ ≤ T train < (K + 1)∆, and V (t) = (V out i (t)) ∈ R Nout is the instantaneous membrane potentials of the projecting neurons. The target pattern matrix Y is also defined by the same discretization as follows: Then, the optimal readout weight matrix W out is obtained by solving the following linear least square regression: where |W | fro is the Frobenius norm of a matrix W .

Evaluation of model performance
In order to evaluate the performance of the model, we use normalized root mean square error normalized by that of the target pattern (nRMSE), as follows: With the discretized time series, nRMSE is calculated as follows:

Lyapunov dimension
For the simple model ( Fig. 1(a)) under no dynamical input, we characterized the strength of chaotic activity of the reservoir with the Lyapunov dimension (Kaplan & Yorke (1979)). First, we calculate the Lyapunov spectrum by the standard method with continuous Gram-Schmidt orthonormalization of the fundamental solutions to the linearized differential equation along the trajectory (Shimada & Nagashima (1979)). Then, let λ 1 ≥ λ 2 ≥ λ 3 · · · ≥ λ 2Ngo be the Lyapunov exponents of the reservoir dynamics, and k be the maximal value of j such that j i=1 λ i ≥ 0, the Lyapunov dimension of the system is defined as follows: In this study, the Lyapunov dimension of the system is calculated under no external input to the system, where the system can be regarded as an autonomous dynamical system. Because we restrict the form of the input to the system as an instantaneous pulse (Eq. (4), the property of the reservoir without any external input characterizes the system's response to the external input.

Similarity index
It is commonly assumed that the cerebellar granular layer is able to exhibit activity specific to the passage of time in response to an input (Buonomano & Mauk (1994); Yamazaki & Tanaka (2007)). In order to evaluate the specificity of the instantaneous activity of the reservoir with respect to the passage of time and external input, we use the similarity index between different states of the reservoir. Let V out,(1) (t) and V out,(2) (t) be two different time series of the projecting neurons of the reservoir generated with two different external inputs. We use the following correlation function between the instantaneous values of V out,(1) (t) and V out,(2) (t) at two time points t 1 and t 2 , which we call the similarity index: where V out,(1) (t) is the mean membrane potential at time t as follows:

Numerical calculation
The numerical simulations were conducted using the ode45 function in Matlab R2019a (MathWorks Inc., Natick, MA, USA). The obtained time series of the reservoir states were further discretized with time step ∆ = 0.1 using the interp1 function (Eq. (15)). The learning procedure to obtain the optimal readout weight matrix W out was conducted by solving a linear least squares regression in Eq. (17) using the Matlab function mldivide.

Broad distribution of the interspike interval caused by gap junctions
First, we show that introducing diffusion coupling causes chaotic dynamics, which in turn results in a broad distribution of interspike interval (ISI) of a neuron in the model. Figure 2 shows examples of the evolution of the simple reservoir ( Fig. 1(a)) described by Eqs. (1-3). Namely, the model network is a one-dimensional chain of neurons coupled to nearest neighbors with gap junctions and does not consist of any connections via chemical synapses. Figure 2(a) illustrates an evolution of the membrane potentials (N go = 100) under a positive tonic input, I go,tonic = 0.004. An external perturbation at t = 0 is given with x in in Eq. (4) as x in = 0.2e 50 , to the all-synchronized state, where e 50 is the 50th standard basis. Namely, the membrane potential of the neuron in the center (50th neuron) is set as V go 50 → V go 50 + 0.2. The network shows chaotic activity induced by diffusion coupling, as reported previously Tadokoro et al. (2011)). Periodic synchronous activity is observed before the propagation of chaotic dynamics is elicited by an external perturbation. The behavior of the cell in the center of the network is shown in Fig. 2(d).
The activity is quite different from an isolated µ-model neuron with the same parameter, as shown in Fig. 2(e) (I go,tonic = 0.004, µ = 1.7). An isolated µ-model shows a saddle-node bifurcation at I go,tonic = 0, and it has a periodic spiking activity with I go,tonic > 0 and stable fixed point at resting potential with I go,tonic < 0 ). The ISI of the neuron in the center of the network (cell #50) in Fig. 2(a) is calculated for the subsequent 1 × 10 3 seconds ( Fig. 2(f)) by regarding the neuron emitting a spike when crossing the threshold θ = 0.7 from negative to positive. As visually evident in Figs. 2(a), (d), the ISI of this neuron shows a broad distribution over a wide range of periods, which is quite different from the periodic spiking activity without diffusion coupling shown in Fig. 2(e). The distribution has a small peak at around 50 ms, which is close to the period of the isolated single neuron (without the effect of the gap junction) shown in Fig. 2(e). Interestingly, the spatiotemporal pattern of the membrane potentials shows the fractal known as the Sierpinski gasket (Mandelbrot (1983)) under a small negative tonic input, I go,tonic = −0.00095 ( Fig. 2(b)). The spatiotemporal pattern at larger scale (N go = 1000) shown in Fig. 2(c) clearly depicts self-similarity of the spatiotemporal pattern. Namely, the spatiotemporal pattern has no characteristic scale.
The spatiotemporal pattern with positive tonic input ( Fig. 2(a)) could be interpreted as a ruined pattern of the Sierpinski gasket ( Fig. 2(b)), thus inheriting the fractals property of scale invariance over multiple time scales (i.e., broad distribution of ISI).
3.2. Gap junctions in the reservoir realize producing a target pattern with a broad range of frequencies.
Next, we evaluate the effect of introducing gap junctions on the expressivity of the reservoir. More specifically, we examine how closely the model can output a sinusoidal temporal pattern with various temporal frequencies. Namely, we use the following sinusoidal wave (a scalar function of time t) as the target pattern y target described in Eq. (14): where T wave is the period of the sinusoidal wave. We quantify the dependency of the following nRMSE value on the period of the target sinusoidal wave, T wave , and on the model parameters: where RSS(T wave ) is the value of the objective function described in Eq. (14), which depends on T wave .
The spectrum of the value nRMSE(T wave ) over various T wave gives a way to evaluate the model's ability to output a more general complex temporal pattern that contains various frequency components. Suppose there is a target pattern, y target that is composed of a linear superposition of various sinusoidal waves as follows: where N wave is the number of different sinusoidal waves that compose the target pattern, T wave k is the period of the kth sinusoidal pattern. Let nRMSE target be the optimal nRMSE values for y target that minimizes the objective function described in Eq. (14). Then, from the triangle inequality, the following inequality holds: Thus, the right hand side of Eq. 26 gives the upper limit of the nRMSE value for the target pattern. To evaluate nRMSE(T wave ) (Eq. (24)), the following analysis is conducted. Firstly, a time evolution over [t 0 , t 0 + T train ] of the reservoir with a random initial input at t = 0 is numerically generated. Then, we use various sinusoidal waves (scalar function of time) over the same time interval [t 0 , t 0 + T train ] as the target patterns. The readout connection is fitted to match each sinusoidal wave separately by the linear least squares. The dashed lines in Fig. 3(a) indicate the target sinusoidal patterns and the solid lines indicate the fitted outputs of the model. The difference between the target patterns and the model outputs are visually not detectable when T wave ≥ 3 ms. Note that, as shown in Figs. 2(d) and (e), the width of a spike of the model neuron is ∼ 5-8 ms. The results of the same analysis without gap junctions are shown in Fig. 3(b).
All settings of the analysis are the same as in Fig. 3(a) except for two conditions. First, the conductance of the gap junction is changed from g GJ = 0.08 to g GJ = 0. Second, the initial state of the reservoir is generated by randomly shuffling the phases of neurons. This is because, with g GJ = 0, all neurons behave as parallel isolated neurons with a shared identical period. Thus, the distribution of the phase of the neurons is time invariant, and biased distribution of the phase of the neurons should be disadvantageous in producing the sinusoidal target patterns. The precision of the model output drastically decreases compared to the condition with diffusion coupling through gap junctions (Fig. 3(b)). The nRMSE values for both conditions are shown in Fig. 3(c). The nRMSE takes very small values over a wide range of the period of the target pattern when the gap junctions are incorporated in the model, whereas the nRMSE takes small values only at some specific range of the period if the model lacks the gap junctions. This result suggests that incorporating diffusion coupling in a reservoir enhances the expressivity of the output that the network can generate.  Fig. 4(a) depicts the dependency of the nRMSE on the strength of gap junction coupling, g GJ /N 2 go , and the target wave period T wave . Figure 4(b) illustrates the nRMSE dependence on g GJ /N 2 go when the target pattern period is T wave = 100 ms. We employ g GJ /N 2 go rather than g GJ because different network models with different sizes, but a same value of g GJ /N 2 go , behave qualitatively the same (Tadokoro et al. (2011)). This is because the model described with Eqs. (1-3) can be regarded as a discretization of a partial differential equation of a continuous one-dimensional excitable media. Models with different network sizes, N go , and the same value of g GJ /N 2 go correspond to discretizations of the same partial differential equation with different spatial resolutions. Equivalently, different models with a same network size N 2 go and different g GJ values show spatiotemporal patterns with different spatial scales proportional to 1/ √ g GJ . As illustrated in Figs. 4(a) and (b), the nRMSE takes small value at a wide but specific range of g GJ (10 −7.5 ≤ g GJ ≤ 10 −5.5 ).

Inverse correlation of the Lyapunov dimension and nRMSE
At the same time, the Lyapunov dimension, D L , takes a large value at the same range of g GJ (Fig. 4(c)).
The Lyapunov dimension characterizes the strength of chaos, and represents the degrees of freedom of the dynamics (Kaplan & Yorke (1979)). Figure 5 illustrates the nRMSE dependency on the parameter µ. Chaotic activity induced by diffusion coupling appears at a specific range of µ, as shown in the positive Lyapunov dimension in Fig. 5(c). As in the case of the dependency of the nRMSE value on g GJ (Fig. 4), the nRMSE takes very small values when the dynamics is chaotic and the Lyapunov dimension is large. These results showing the inverse correlation between the Lyapunov dimension and nRMSE suggest that diffusion-induced chaotic dynamics enhances the complexity of the representation in the reservoir.

The reservoir state represents the passage of time from a specific input
We evaluate the reservoir's ability to represent the passage of time with the similarity index (Eq. (21)).
The color map in Fig. 6(a) shows similarity indices C (1),(1) (t 1 , t 2 ) defined by Eq. (21), calculated within a time series, V go,1 (t), shown in the upper and left panels in Fig. 6(a). An external perturbation is given at t = 0 to an all-synchronized state as V go 50 → V go 50 + 0.5. The length of the time series V go,1 (t) is 1000 ms, and it is discretized with a time step of 1 ms to calculate the similarity indices, yielding a 1000 × 1000 matrix.
Each element of the matrix corresponds to the correlation coefficient between the membrane potentials of the reservoir neurons at two different time points. Because C (1),(1) (t 1 , t 2 ) is calculated within one time series, the diagonal elements are all 1. Figure 6(b) shows the histogram of the values of upper triangular elements of the C (1),(1) (t 1 , t 2 ) shown in panel (a). Most of the elements have smaller values than ∼ 0.4. This suggests that the state of the system does not come back close to the same point in the phase space -close enough that the similarity index takes high value close to 1 -within 1000 ms. In other words, the state of the reservoir activity has specificity to the passage of time. The distribution of the similarity index between different time points shifts towards even smaller values with larger network size, as shown in the case of N go = 500 ( Fig. 6(d)).  The discriminative ability of the reservoir to different inputs is also important. Figure 6(c) shows the similarity indices calculated between two time series, V go,(1) (t) and V go,(2) (t), that have slightly different inputs. The time series shown in the left panel, V go,(2) (t), evolves with the same initial condition as V go,(1) (t) shown in Fig. 6(a), but an additional input pulse is given to the cell at the edge of the network at t = 200 as V go 1 → V go 1 + 0.5 (shown with an open magenta circle). Because of the chaotic property of the dynamics, the orbit diverges from the original unperturbed orbit of V go,(1) (t), and the correlation between the two time series vanishes at around t = 350 ms (Fig. 6(c), lower panel). This can be explained by the sensitivity to the initial condition of chaotic dynamics. Thus, the reservoir's response to input has high specificity to the input.

Generation of different activities for different inputs
Next, we examine whether a model with a fixed readout weight matrix is actually able to generate different temporal patterns as outputs for different inputs. Firstly, we show a simulation of eyeblink conditioning: an extensively studied model of cerebellar dependent learning, which has been reproduced in several computational studies as well (Buonomano & Mauk (1994); Bullock et al. (1994); Medina et al. (2000);Yamazaki & Tanaka (2007); Li et al. (2013); Yamazaki & Igarashi (2013)). We simulated a situation where the model outputs a specific time series with respect to a specific external input. In eyeblink conditioning, animals are able to acquire motor response with different timings to different types of tone stimuli (Kehoe et al. (1993); Green & Steinmetz (2005)). For example, an animal can learn to elicit a motor reflex to a tone stimulus 200 ms after the stimulus onset if the pitch of the tone is 600-Hz, and 600 ms after the onset if its pitch is 1-kHz (Kehoe et al. (1993)). To reproduce this phenomenon, we consider the case where the model output, y, is a scalar function of time representing the eyeblink response, and train the model with multiple target patterns with its peaks at different latencies from the external input. Figure 7(a) shows similarity indices calculated between four time series of the reservoir state with a same initial condition and four different inputs x in,0 , x in,1 , x in,2 , x in,3 given at time t = 50. Four inputs, x in,0 , x in,1 , x in,2 , x in,3 are generated from a multivariate Gaussian distribution N (0, (0.2) 2 E), where E is the identity matrix. As shown in Fig. 7(a), the four spatiotemporal patterns with the inputs x in,0 , x in,1 , x in,2 , x in,3 rapidly lose similarity among them. When the time series with the inputs x in,0 , x in,1 , x in,2 , x in,3 are fitted to different target patterns (dashed lines in Fig. 7(b)) simultaneously, the model acquires different outputs assigned for each time series of the reservoir (solid lines). We also demonstrate the model's ability to generate two different human motions responding to two different inputs. Figure 8 shows the two outputs of a learned model with a fixed output matrix: walking ( Fig. 8(a)) and boxing ( Fig. 8(b)). The model is trained on two time series simultaneously, each for a specific

Incorporation of excitatory granular neurons
Lastly, we briefly confirm that the observed property of the simple model ( Fig. 1(a)) can also be reproduced in the model incorporating the excitatory granule cells and the chemical synapses, shown in Fig. 1

Discussion
In the current study, we investigated the computational role of the gap junction between Golgi cells in the cerebellar granular layer. Specifically, we evaluated the computational performance of the model of the cerebellar cortex using a reservoir computing framework. First, we showed that introducing gap junctions in the model induces chaotic dynamics that enables the reservoir to output complex patterns containing a wide range of frequency components . Second, we showed that the chaotic dynamics has a long non-recursive time series that is capable of representing the passage of time (Fig. 6). These properties of the chaotic dynamics realize the reservoir's ability to output the desired temporal patterns (Figs. 7, 8).
Yamazaki and their colleagues have proposed a model with these abilities based on a different mechanism, i.e., the random connection by chemical synapses between granule cells and Golgi cells (Yamazaki & Tanaka (2007)). In the current study, we pointed out another possible mechanism (diffusion through gap junctions) that would be capable of reproducing the aforementioned abilities of the model. Because the gap junctions connect neighboring neurons, the connections realized by the gap junctions must inevitably be local rather than distant. Thus, the average number of neighboring cells a neuron can contact cannot be as large as the case of chemical synapses. The average degree of the network realized by gap junctions may therefore be small. In the literature, it has been argued that the desirable feature of a reservoir is that the connection should be sparse (Jaeger (2001)). This is in line with our hypothesis that diffusion coupling by the gap junction constitutes the reservoir in the cerebellum. On the other hand, one report showed that the smallworldness of the reservoir contributes to better performance (Kawai et al. (2019)). In the granular layer of the real brain, the chemical synapses and the gap junctions may work in concert to realize preferable properties as a reservoir.
The ISI of a neuron in the chaotic dynamics induced by diffusion coupling through gap junctions shows a broad distribution over a wide range of periods, unlike that of an isolated neuron model (Fig. 2). Our result may explain the fact that a Golgi cell exhibits periodic activity in some experimental settings such as in vitro recordings (Forti et al. (2006); Solinas et al. (2007); Vervaeke et al. (2010)), but also shows an irregular activity with broad ISI distribution in vivo (Holtzman et al. (2006)).
Reservoir computing has drawn considerable attention in recent years, as it is expected to be a suitable and powerful framework for processing temporal sequences. However, the desirable dynamical properties that the reservoir must have, and its proper implementation, have not been well characterized, and still remain an open question (Boyd & Chua (1985); Jaeger (2001) 2019)). The current study investigated a reservoir consisting of a reaction-diffusion system (i.e., neurons coupled with gap junctions) and obtained results suggesting chaos in a reaction-diffusion system may contribute to the performance of the model. Further investigation of a reservoir machine using chaotic dynamics in a reactiondiffusion system may be an interesting direction for future study.
We found that the spatiotemporal pattern of the chaotic dynamics of µ-models coupled with diffusion in a one-dimensional chain shows the Sierpinski gasket (Fig. 2). It is reported that some other nonlinear reaction-diffusion systems also shows the Sierpinski gasket (Hayase & Ohta (1998). The simple model in our current study ( Fig. 1(a)) also belongs to the class of reaction-diffusion system, as it consists of a onedimensional chain of the neurons with nearest neighbor connections. There maybe a common mathematical structure behind our model and these former studies. It is well known that the Sierpinski gasket appears in the spatiotemporal patterns of a cellular automaton (CA) such as Wolfram's Rule 90 (Wolfram (1994)).
Morán et al. used CA as the reservoir and constructed a classifier for handwritten characters of the MNIST dataset (Lecun et al. (1998)), and compared the performance across the rules in CA. They reported that Wolfram's Rule 90 gives the best performance in the test data of the cross-validation (Morán et al. (2018)).
In the current study, we did not construct a classifier with our model. Thus, it is difficult to directly compare the current results with Morán and their colleagues' work. However, it should be an interesting direction to evaluate the performance of a classifier model with a reaction-diffusion system as the reservoir. Additionally, studies with CA may contribute to the elucidation of the cerebellar computation.
An important issue we did not consider in depth in the current study is the generalization ability of the model. Namely, the ability of the model to generate the same output from similar input. We showed that the chaotic dynamics realizes the specificity of the response to the input (Fig. 6). This can be interpreted by the chaotic dynamics' high sensitivity to the initial condition. However, the high sensitivity to the initial condition may cause poor generalization ability, because a small noise or a deviance in the input signal grows rapidly over time. Thus, systematic evaluation of the generalization ability of the current model should be an important issue to be elucidated in the future study. The aforementioned study by Morán et al. showed that Wolfram's Rule 90, which shows the same Sierpinski gasket pattern as the current model we use, shows the best performance in the test data of cross-validation (i.e., it shows the highest generalization ability).
In their study, the MNIST data is used as the initial state of the reservoir, and the spatiotemporal pattern of the evolution of CA over specific steps is used as the feature vector used for the subsequent classification task. Similarly, one could modify our current model so that the activity caused by the input decays within a specific time scale, before the small deviance in the input grows to the system size. This situation is actually similar in the real cerebellum because the cerebellum is believed to be able to maintain the input information for a fixed time, approximately ∼ 500 ms (Thompson (2005); Kotani et al. (2003)). It should be an interesting issue to elucidate the relationship to previously proposed properties of the reservoir such as the echo state property (Jaeger (2001) )). Additionally, it should also be noted that cerebellar dependent motor learning requires repetitive training (Thompson (2005)), which would help generalization. This is very different from hippocampal dependent learning, where an episode is learned one-shot.
In the last part of the Results section, we confirmed that the non-recursiveness of the system is inherited in the model incorporating the excitatory granule cells (Fig. 9). Actually, the similarity index between different time points within a time series (Fig. 9(d)) shows higher specificity to the passage of time than in the simple model consisting of only the Golgi cells (Fig. 6(c)). This suggests that the large number of granular cells contributes to the specificity. In this study, we incorporated granular neurons with a population size only 10 2 times larger than the Golgi cells (N gr = 10 4 , N go = 10 2 ) because of the computational cost. However, the ratio of the number of the granule cells to the Golgi cells in the real brain is reported to be even larger, as much as 430 times (Korbo et al. (1993)). The numerous granule cells may serve a role in multiplying the representational ability of the Golgi neurons.
In conclusion, we proposed the hypothesis that the massive gap junctions between the Golgi cells in the cerebellar granular layer contributes to expressivity by inducing chaotic dynamics.
This study was also partially supported by the JST Strategic Basic Research Programs (Symbiotic Interaction: Creation and Development of Core Technologies Interfacing Human and Information Environments, CREST Grant Number JPMJCR17A4). This paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).