
Bayesian inference is facilitated by modular neural networks with different time scales

Abstract

Various animals, including humans, have been suggested to perform Bayesian inference to handle noisy, time-varying external information. For the brain to perform Bayesian inference, the prior distribution must be acquired and represented by sampling noisy external inputs. However, the mechanism by which neural activities represent such distributions has not yet been elucidated. Our findings reveal that networks with modular structures, composed of fast and slow modules, are adept at representing this prior distribution, enabling more accurate Bayesian inferences. Specifically, a modular network consisting of a main module connected to the input and output layers and a sub-module with slower neural activity connected only to the main module outperformed networks with uniform time scales. Prior information was represented specifically by the slow sub-module, which could integrate observed signals over an appropriate period and represent input means and variances. Accordingly, the neural network could effectively predict the time-varying inputs. Furthermore, by training the time scales of neurons starting from networks with uniform time scales and without modular structure, the above slow-fast modular network structure and the division of roles in which prior knowledge is selectively represented in the slow sub-module spontaneously emerged. These results explain how the prior distribution for Bayesian inference is represented in the brain, provide insight into the relevance of modular structure with time scale hierarchy to information processing, and elucidate the significance of brain areas with slower time scales.

Author summary

Bayesian inference is essential for predicting noisy inputs in the environment and is suggested to be common across various animals, including humans. For the brain to perform Bayesian inference, the prior distribution of the signal must be acquired and represented in the neural networks by sampling noisy inputs, so that the posterior distribution of signals can be estimated. By training recurrent neural networks to predict time-varying inputs, we demonstrated that those with modular structures, characterized by a main module exhibiting faster neural activity and a sub-module exhibiting slower neural activity, achieved highly accurate Bayesian inference on the required task. In this network, the prior distribution was specifically represented by the slower sub-module, which effectively integrated the earlier inputs. Furthermore, this modular structure with different time scales and the division of representational roles emerged spontaneously through the learning process. Our results demonstrate a general mechanism for encoding prior distributions and highlight the importance of the brain’s modular structure with time scale differentiation for Bayesian information processing.

Introduction

In humans and various other animals, information processing in the brain involves inference based on inputs received from the external world through the sensory systems, which carry information under uncertainty [1] due to noise. To predict time-varying noisy inputs, previous studies have suggested that animals such as humans and monkeys process inputs according to a Bayesian inference framework to deal with such uncertainty [2–13].

Bayesian inference computes the posterior by combining the prior with the likelihood. The prior is gained from the history of inputs and provides information to predict the incoming signal in advance, whereas the likelihood is estimated by observing the current input signal. It is believed that prior knowledge must first be represented in the brain and then adjusted over time from the input history. However, how prior information is shaped in the brain remains elusive. In machine learning, several models such as variational recurrent neural networks (RNNs) [14, 15] have been proposed that can perform Bayesian inference by computing the prior from external signals. However, these models are designed specifically for Bayesian inference in advance, for instance, by introducing neurons that explicitly express the prior. They therefore cannot answer how the prior is shaped in the brain and which brain structures are relevant to performing Bayesian inference. Here, we explore how neural networks acquire and represent prior knowledge in order to predict time-varying noisy inputs by performing Bayesian inference.

In this study, recalling that a deterministic neural network with a simple learning algorithm can perform probabilistic inference [16], we investigate which type of RNN can better predict stochastic time-varying inputs by acquiring prior knowledge to perform Bayesian inference. Note that the acquisition mechanism of prior knowledge itself was studied earlier for fixed inputs [17]. In that case, however, a fixed prior is sufficient, and how a time-varying prior is shaped in neural networks to predict time-varying inputs was not considered.

To discuss which RNN structures are relevant to shaping Bayesian inference, we recall the hierarchical structure of the brain, with its functionally differentiated areas. In fact, some experiments suggest that the prior and the likelihood for Bayesian inference are encoded in different brain areas [18–20], even though the validity of the possible mechanisms underlying these results remains controversial. On the other hand, the relevance of area differentiation to Bayesian inference can be theoretically expected as follows: in general, to obtain the prior, it is necessary to estimate the prior distribution based on previous observations. To do so, the population of neurons representing the prior must integrate observed inputs over some time span. One possible mechanism for achieving such integration is to adopt two neural modules functioning at distinct time scales: a downstream neuron population with slower activity changes, separated from an upstream neuron population that processes input information. The existence of a slow module that does not directly receive inputs would thus be relevant for integrating inputs over some time span.

Some experimental reports have indeed suggested that the time scale of neural activities in higher brain areas that do not directly receive external input is slow [21–23], which may serve to integrate the activities of lower areas. Note that the putamen, amygdala, insula, and orbitofrontal cortex have been found to represent the prior in an experiment [18].

Inspired by these experiments and theoretical considerations, we studied RNN models with two modules: a main module with a direct connection to the input-output layers, and a sub-module directly connected to the main module but without connections to the input-output layers (i.e., a hierarchical structure) (Fig 1). To the RNN we applied a time-varying stochastic input whose mean value changes with time, with Gaussian noise added around it. We trained the RNN to minimize the prediction error and confirmed that it could predict the input when an appropriate modular structure shaped the prior for Bayesian inference. We examined the possible role of the modular structure and the importance of the time scale difference between the main and sub-modules in forming the prior representation for Bayesian inference. We found that RNNs with a modular structure shape the prior more accurately than regular RNNs when a noisy signal is input.

Fig 1. Schematic of RNN.

(a) Standard RNN without modular structure. (b) RNN with modular structure.

https://doi.org/10.1371/journal.pcbi.1011897.g001

Further, Bayesian inference was shown to be more accurate when the time scale of the sub-module was appropriately slower. When the time scale was uniform, prior information was maintained in both the main module and the sub-module; in this case, the performance in predicting the time-varying input was rather low. In contrast, when the time scales were different, prior information was represented by the slow sub-module, and the prediction performance was quite high. In the latter case, the time-varying prior was embedded in the neural manifold of the slower sub-module. With this embedding of the variance by the sub-module, changes in the input mean are clearly distinguished from noise, leading to better Bayesian inference.

In addition, we trained the RNN so that both the connectivity structure and the time scales of neurons could change, and examined whether a modular structure with distinct time scales would emerge from a homogeneous neural network. As training progressed, we observed that the time scales of neurons differentiated into slower and faster groups. A modular structure arose in which the slower neurons were separated from the input/output layers, which were predominantly connected to the faster neurons. This sub-module of slow neurons distinctly represented the prior information.

These results are important for understanding how the prior for Bayesian inference is represented in neural networks, and provide insight into the relationships between neural dynamics and the structure [24–28] underlying information processing in the brain.

Materials and methods

Recurrent neural networks with/without modular structure

To investigate the effect of structure and time scale on Bayesian inference, we considered the following RNNs [29]: a standard RNN without modular structure and an RNN with a modular structure.

First, we adopted a standard RNN consisting of an input layer, a recurrent (hidden) layer, and an output layer, as shown in Fig 1(a). The number of neurons N is set at N = 200 unless otherwise mentioned. The dynamics of the recurrent layer are given by

x(t + 1) = (I − α) ⊙ x(t) + α ⊙ ReLU(W x(t) + Win u(t) + ξ(t)), (1)

where I is the all-ones vector and α is a vector introducing the rate of change, i.e., the inverse of the time scale of the neurons (0 ≤ αi ≤ 1; the present model works within this range), and ⊙ denotes the Hadamard (element-wise) product, as in (I − α) ⊙ x(t). If the time scale is identical for all neurons, αi is set to a constant. Below we consider the case with two time scales,

αi = αm (1 ≤ i ≤ Nm), αi = αs (Nm < i ≤ N), (2)

where Nm and N − Nm neurons have distinct time scales. Nm is set at Nm = 150 unless otherwise mentioned. The standard homogeneous network is given by αs = αm; below, the case with αs < αm was mostly studied to investigate the effect of the time scale difference. Although we mainly show results with a specific ratio of 150 fast neurons to 50 slow neurons, the key results remain unchanged unless the number ratio is excessively biased (e.g., 10:240 or 240:10); as long as each group contains enough neurons to capture the underlying dynamics, good Bayesian performance is achieved. (The condition may further depend on details of the network complexity or the specific forms of interaction between fast and slow neurons, which will need further study.) Here, u(t) is the input signal, and x is the state of the neurons in the recurrent layer. Win and W represent the synaptic connection weights. ξ accounts for independent noise in the dynamics, given by a random variable that follows a normal distribution with mean 0 and standard deviation 0.05. We adopted the activation function ReLU (ReLU(z) = 0 for z ≤ 0 and ReLU(z) = z for z > 0) [30]. The output of the RNN was then determined by a linear combination of the internal states:

y(t) = Wout x(t). (3)
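As a concrete illustration, the following is a minimal numerical sketch of this update. The weight initialization scale, the input-layer size, and the placement of the noise term inside the activation follow the reconstruction of Eq (1) above and are assumptions, not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N, N_m = 200, 150                # total neurons; number with the fast time scale
alpha = np.empty(N)
alpha[:N_m] = 1.0                # alpha_m: fast neurons
alpha[N_m:] = 0.1                # alpha_s: slow neurons

n_in = 20                                            # input-layer size (assumed)
W = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))        # recurrent weights (assumed init)
W_in = rng.normal(0.0, 1.0 / np.sqrt(N), (N, n_in))  # input weights
W_out = rng.normal(0.0, 1.0 / np.sqrt(N), (1, N))    # readout weights

def relu(z):
    return np.maximum(z, 0.0)

def step(x, u):
    """One update of Eq (1); '*' acts element-wise, i.e., as a Hadamard product."""
    xi = rng.normal(0.0, 0.05, N)                    # dynamical noise, sd 0.05
    return (1 - alpha) * x + alpha * relu(W @ x + W_in @ u + xi)

x = step(np.zeros(N), np.ones(n_in))                 # one step from the resting state
y = W_out @ x                                        # Eq (3): linear readout
```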

Next, we introduced a modular structure into the above RNN to distinguish the main and sub-modules (Fig 1(b)). Only the main module is connected to the input/output layers. The dynamics of the recurrent layer are thus given by

xm(t + 1) = (I − αm) ⊙ xm(t) + αm ⊙ ReLU(Wmm xm(t) + Wms xs(t) + Win u(t) + ξm(t)), (4)
xs(t + 1) = (I − αs) ⊙ xs(t) + αs ⊙ ReLU(Wsm xm(t) + Wss xs(t) + ξs(t)), (5)

where xm and xs represent the firing rates of neurons in the main and sub-modules, respectively, and Wmm, Wms, Wsm, Wss are the corresponding blocks of connection weights. Hence, αm and αs represent the inverse time scales of the main module and the sub-module, respectively. Here, αm is fixed at 1 (without loss of generality), while we varied αs from 1 to 0.01 to examine the effect of the time scale difference. The RNN output was determined by a linear combination of the internal states of the main module:

y(t) = Wout xm(t). (6)
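Continuing the sketch above (reusing `rng` and `relu`), the modular dynamics of Eqs (4)-(6) can be written as follows; the block-matrix names and their scaling are assumptions, and only the main module touches the input and the readout.

```python
# Modular dynamics of Eqs (4)-(6); reuses rng and relu from the previous sketch.
N_main, N_sub = 150, 50
alpha_m, alpha_s = 1.0, 0.1

W_mm = rng.normal(0.0, 1.0 / np.sqrt(N_main), (N_main, N_main))
W_ms = rng.normal(0.0, 1.0 / np.sqrt(N_sub), (N_main, N_sub))
W_sm = rng.normal(0.0, 1.0 / np.sqrt(N_main), (N_sub, N_main))
W_ss = rng.normal(0.0, 1.0 / np.sqrt(N_sub), (N_sub, N_sub))
W_in_m = rng.normal(0.0, 1.0 / np.sqrt(N_main), (N_main, 20))
W_out_m = rng.normal(0.0, 1.0 / np.sqrt(N_main), (1, N_main))

def modular_step(x_m, x_s, u):
    xi_m = rng.normal(0.0, 0.05, N_main)
    xi_s = rng.normal(0.0, 0.05, N_sub)
    new_m = (1 - alpha_m) * x_m + alpha_m * relu(W_mm @ x_m + W_ms @ x_s + W_in_m @ u + xi_m)  # Eq (4)
    new_s = (1 - alpha_s) * x_s + alpha_s * relu(W_sm @ x_m + W_ss @ x_s + xi_s)               # Eq (5)
    return new_m, new_s

# The output y(t) = W_out_m @ x_m uses the main module only (Eq 6).
```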

Task

This study focused on a task optimally solved via Bayesian inference, in which an RNN was assigned to estimate the actual value from a noise-perturbed signal. We constructed the external input signal s as follows:

  1. Initially, we randomly sampled the true value, ytrue, from a generator (or cause) distribution defined by a normal distribution with mean μg and variance σg².
  2. Subsequently, the signal s was generated from ytrue by adding noise sampled from a normal distribution with mean 0 and variance σl².

Here it is important to note that the generator is not static but changes over time with probability pt, where pt is the parameter indicating the likelihood of a change in the generator. Upon a change, the parameters μg and σg are updated to values uniformly sampled from given ranges: μg ∈ [−0.5, 0.5] and σg ∈ [0, 0.8]. When the generator changes, the distribution to which ytrue conforms changes, and as a result the external input signal s fed into the RNN also changes. However, even when the generator changes, the signals before and after the change are concatenated and fed to the RNN as a single series (a sketch of this generative process is given below).
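The following sketch implements this generative process under stated assumptions: the observation-noise standard deviation σl is given a placeholder value here, and ytrue is resampled from the generator at every time step.

```python
import numpy as np

rng = np.random.default_rng(0)
p_t, sigma_l = 0.03, 0.3   # switching probability; observation-noise sd (assumed value)

def generate_signal(T):
    """Noisy time series from a generator that switches with probability p_t."""
    mu_g, sigma_g = rng.uniform(-0.5, 0.5), rng.uniform(0.0, 0.8)
    s, y_true = np.empty(T), np.empty(T)
    for t in range(T):
        if rng.random() < p_t:                        # generator changes
            mu_g, sigma_g = rng.uniform(-0.5, 0.5), rng.uniform(0.0, 0.8)
        y_true[t] = rng.normal(mu_g, sigma_g)         # step 1: true value
        s[t] = y_true[t] + rng.normal(0.0, sigma_l)   # step 2: noisy observation
    return s, y_true
```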

As mentioned in the Introduction, we set this task because the RNN can make accurate predictions only if it performs Bayesian inference. To do so, the prior distribution needed for Bayesian inference must be estimated from the observed signal so that it approximates the generator distribution. We then need to assign the input term u(t) in Eq 1 (or Eqs 4 and 5) from the signal s. For this, we adopted the Probabilistic Population Code (PPC) [31]. PPC assumes that the information in a signal is encoded by a population of neurons with position-dependent preferred stimuli that fire probabilistically according to a Poisson distribution. It has been shown that neural networks receiving input encoded by a PPC can learn probabilistic inference effectively [16]. Therefore, in this study, we also assumed that the activity u of the input-layer neurons encoding the observed signal followed the PPC model. Accordingly, u was sampled from the following Poisson distribution [32] at every time step:

P(ui | s) = (fi(s)^ui / ui!) e^−fi(s). (7)

Here, s is the observed input signal (i.e., generated from ytrue by adding noise), and fi is the tuning curve of neuron i, which represents how responsive each neuron is to s. The signal variance is inversely encoded in fi as the amplitude of the tuning curve, which takes the Gaussian form

fi(s) = (g/σl²) exp(−(s − ϕi)²/(2σf²)), (8)

where g is a gain constant and ϕi represents the preferred stimulus of neuron i in the input layer. It was assumed that ϕi follows an arithmetic sequence in i (ϕi = −1/2 + i/m when the number of neurons in the input layer is m) [33]. Also, σf is a constant representing the width of the tuning curve, fixed throughout this study. With this tuning-curve transformation, the information in the external signal s is encoded in the spatial position of the input-layer neurons that are most likely to fire.
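A minimal sketch of this PPC encoding follows. The number of input neurons, the tuning width σf, and the gain g are assumed values consistent with the description above; the amplitude of the tuning curve scales inversely with the signal variance σl².

```python
import numpy as np

rng = np.random.default_rng(0)
m = 20                                  # number of input-layer neurons (assumed)
phi = -0.5 + np.arange(1, m + 1) / m    # preferred stimuli: phi_i = -1/2 + i/m
sigma_f, g, sigma_l = 0.1, 1.0, 0.3     # tuning width, gain, noise sd (assumed values)

def ppc_encode(s):
    """Poisson spike counts with Gaussian tuning curves (Eqs 7-8)."""
    f = (g / sigma_l**2) * np.exp(-(s - phi) ** 2 / (2 * sigma_f**2))  # Eq (8)
    return rng.poisson(f).astype(float)                                # Eq (7)
```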

In this task, the true value ytrue was to be estimated from the input signal u. Therefore, training was performed to minimize the mean squared error (MSE) between the neural network output y(t) and the true value ytrue(t):

L = ⟨(y(t) − ytrue(t))²⟩. (9)

Note that the information on ytrue is used only to calculate the loss function during learning. (We acknowledge that providing the true value ytrue(t) for training might be an artificial setting. However, similar settings in which the true value is provided have been utilized in previous studies, such as the work by [18]. For the purposes of this study, namely to investigate the role of a modular structure and time-scale differences, it is useful to adopt this simple and previously established setting; one can then compare the present system with the standard homogeneous network case, even though it cannot fully reflect real-world situations.)

The training was performed using the backpropagation method [34, 35] to decrease L by optimizing the synaptic connection weights W*. This optimization is performed by stochastic gradient descent; for this, the well-established and widely used Adam method [36] was adopted. The batch size of training samples was set to 50 (in machine learning, the batch size refers to the number of training samples processed together in one iteration). The weight decay rate was set to 0.0001; weight decay is a regularization technique used in neural networks to prevent overfitting, introduced by adding a penalty proportional to the size of the weight coefficients to the loss function being minimized. By setting a weight decay rate, we ensure that the model does not focus too heavily on particular features and can generalize better to unseen data. Here, one iteration denotes a single pass through the entire training dataset, and we set 6000 iterations for training; that is, the training process was performed over the complete set of training data 6000 times. This is a typical number in machine learning, and we confirmed that it is sufficient to complete the training. See Table 1 for the hyperparameters used in the experiment.
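A minimal sketch of this training setup follows, assuming a PyTorch formulation. The Adam optimizer, weight decay 1e-4, batch size 50, 6000 iterations, and the MSE loss (Eq 9) follow the text; the learning rate is an assumed value, a stock GRU stands in for the modular RNN, and `generate_batch` is a hypothetical data function.

```python
import torch

class ReadoutRNN(torch.nn.Module):
    """Stand-in recurrent network with a linear readout (not the authors' model)."""
    def __init__(self, n_in=20, n_hidden=200):
        super().__init__()
        self.rnn = torch.nn.GRU(n_in, n_hidden, batch_first=True)
        self.out = torch.nn.Linear(n_hidden, 1)

    def forward(self, u):
        h, _ = self.rnn(u)            # (batch, time, hidden)
        return self.out(h).squeeze(-1)

model = ReadoutRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

for it in range(6000):                               # 6000 training iterations
    u, y_true = generate_batch(batch_size=50)        # hypothetical data function
    loss = torch.mean((model(u) - y_true) ** 2)      # Eq (9): MSE against y_true
    opt.zero_grad()
    loss.backward()                                  # backpropagation through time
    opt.step()
```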

Results

Fixed structure and time scales

Bayesian optimality.

Because the generated signal s was observed under noise, the neural network was required to estimate the true value ytrue sampled from the generator. If the information about the generator were known, the true value could be estimated optimally via maximum a posteriori (MAP) estimation [37]:

yopt = (σg² s + σl² μg) / (σg² + σl²). (10)
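For concreteness, the following one-liner implements Eq (10), assuming the standard Gaussian conjugate form reconstructed above: the posterior mean is a precision-weighted blend of the observation and the prior mean.

```python
def y_opt(s, mu_g, sigma_g, sigma_l):
    """Eq (10): MAP estimate for a Gaussian prior and Gaussian observation noise."""
    return (sigma_g**2 * s + sigma_l**2 * mu_g) / (sigma_g**2 + sigma_l**2)

# A broad prior (large sigma_g) keeps the estimate near s, while a sharp prior
# pulls it toward mu_g.
print(y_opt(s=0.4, mu_g=0.0, sigma_g=0.5, sigma_l=0.3))  # ~0.29, closer to s
print(y_opt(s=0.4, mu_g=0.0, sigma_g=0.1, sigma_l=0.3))  # ~0.04, closer to mu_g
```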

However, as described in the “Task” section, the information from the generator was not explicitly given to the neural network, so it must be estimated from observed signals as a prior distribution. First, we examined whether the neural network could achieve this prior-based estimation.

The output y of the RNN with modular structure trained with αs = 0.1, when given an observed signal s, is shown in Fig 2. Here, s was sampled from the prior with μg = 0.5 and σg = 0.5, and observation noise (standard deviation σl) was added. The green points represent the maximum likelihood estimate yML, which is the most accurate estimate when no prior information is available; this estimate simply matches the observed signal, yML = s. The blue points represent yopt estimated according to MAP estimation, and the orange points represent the actual neural network output y. Fig 2 shows that the output of the RNN is closer to the blue points yopt than to the green points, indicating that approximate (nearly optimal) Bayesian inference with a well-estimated prior is achieved (the mean squared error between y and yML is 0.15, whereas that between y and yopt is 0.019).

Fig 2.

(a) The output y of the RNN against the observed signal value s. Before s is input, a time-series signal is fed to the network, sampled from a normal distribution with mean μg = −0.3 and standard deviation σg = 0.3, with observation noise of standard deviation σl added. The accuracy can be increased by estimating the prior based on the signal input before s and performing Bayesian inference. Blue points represent the optimal value yopt, orange points the output of the RNN y, and green points the maximum likelihood estimate yML = s. The result is for a model with αs = 0.1. (b) The same plots as (a), with the settings (μg, σg) = (0.3, 0.3). Compared to (a), the output y of the RNN shows a larger value, as expected from the Bayesian optimum. (c) The plots for the settings (μg, σg) = (−0.3, 0.6). Compared to (a), the output y of the RNN shows a lower slope, as expected from the Bayesian optimum.

https://doi.org/10.1371/journal.pcbi.1011897.g002

Next, we examined the optimality of the Bayesian estimation for networks with and without modular structure and time scale differences. Fig 3(a) shows the MSE between y and yopt for the RNN trained under each condition. This result shows that the modular structure improved the accuracy of Bayesian estimation, which increased further when αs was decreased to an appropriate degree. In fact, we found an optimal time scale range, αs = 0.06 ∼ 0.2, in which the maximum accuracy was achieved; as shown in S4 Fig, the MSE remains low around 0.06 ≲ αs ≲ 0.2. Even without modular structure, the time scale difference contributed to inference accuracy, but the accuracy increased significantly when both the modular structure and the time scale difference were present.

Fig 3.

(a) MSE between the optimal value yopt(t) and the output of the RNN y(t), plotted against the time scale αs (dots: with modular structure; crosses: without modular structure). RNNs with a modular structure are more accurate; in addition, those with αs ∼ 0.1–0.2 have the smallest error. (b) MSE between the true value ytrue(t) and the output of the RNN y(t) for the network with αs = 1 (dots) and αs = 0.1 (crosses). The value increases as pt increases, but the model with αs = 0.1 is always more accurate.

https://doi.org/10.1371/journal.pcbi.1011897.g003

Adjustability to rapid generator switching.

So far, we studied the performance of the models under a fixed generator to compare the accuracy of Bayesian inference itself. Next, we examined their performance when the generator changes in time. To perform Bayesian inference on a rapidly changing input, the model must quickly approach the new optimal value yopt to yield a good estimate. To verify the accuracy of the RNN in this case, we compared the MSE between ytrue(t) generated by the generator and the output y(t) of the RNN under various pt (Fig 3(b)). The model with αs = 0.1 was found to be more accurate for all values of pt.

As a special case, we considered a setting in which the input moves back and forth between two generators, A and B. We then examined whether the prior distribution estimated by the RNN was closer to the distribution of either generator. Specifically, we adopted generators A and B with distinct parameters (μA, σA) and (μB, σB), and computed the following quantity a(t) from the Bayesian optimal estimates under each generator, yAopt(t) and yBopt(t). (11) When a(t) is close to 1, the model’s prior is closer to generator A, and when a(t) is close to 0, it is closer to generator B.
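The exact form of Eq (11) is not recoverable here, so the following is only one plausible definition consistent with its description: a normalized proximity of the output to the two Bayesian-optimal estimates.

```python
def a_of_t(y, y_opt_A, y_opt_B):
    """Assumed proximity index: 1 when y matches A's optimum, 0 when it matches B's."""
    d_A, d_B = abs(y - y_opt_A), abs(y - y_opt_B)
    return d_B / (d_A + d_B)
```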

Comparing the change in a(t) between the models with αs = 0.1 and αs = 1, we found that the model with αs = 0.1 adjusted more quickly to the generator change, as shown in Fig 4(a). This result shows that the model with αs = 0.1 was more responsive to changes of the generators and recognized them more quickly in all runs. The difference between the two models was especially pronounced in the extreme case in which the two generators switched at every step (Fig 4(b)). Intuitively, having a population of slow neurons would seem disadvantageous for responding to rapid environmental changes, but the results showed the opposite: the network with αs = 1 could not follow rapid input changes, whereas that with αs = 0.1 could estimate the input prior effectively. We discuss the importance of slow neurons in responding to rapid changes below. Furthermore, we also checked that when μg and σg are constant, the modularity of the network is not necessary: no difference in performance with and without modularity was detected.

Fig 4. Adjustability to rapid generator change.

(a) a(t) for the case where generator A and generator B switch alternately with probability pt = 0.2. The model with αs = 0.1 adjusted more quickly to the generator change. The thin line represents a(t) when the output is fully adjusted to the generator switching. (b) a(t) for the case where generator A and generator B switch at every step (periodic switching).

https://doi.org/10.1371/journal.pcbi.1011897.g004

Representation of the prior.

We investigated how the slow sub-module facilitated an improved prior representation for Bayesian inference. Starting from the hypothesis that a group of downstream slow neurons represents the prior by integrating the observed signal over time, we investigated which of the main and sub-modules was responsible for the prior information in the modular RNN.

By using the prior information, the estimated value is shifted from the observed signal s toward an appropriate value yopt (Eq 10). In other words, even given the same signal input s, the output varies depending on which time-series signal was input before s (because the prior estimate changes). Even if one module returned to its original state, the output would still shift from s as long as the prior information remained in the other module. The magnitude of this shift can be taken to represent the degree to which a module utilizes the estimated prior information. Therefore, it is possible to estimate the extent to which each module contributes to prior-based information processing by examining the change in the output y(t) when the internal state of the main or sub-module is replaced by the value corresponding to a different prior.

First, let xm(t; μg, σg) and xs(t; μg, σg) be the internal states of the main and sub-modules, respectively, after the input signal s from a generator (μg, σg) has been applied for a certain period. Because the output y(t) is determined by the internal states of the two modules and the input signal at t − 1, it can be written as y(xm(t − 1; μg, σg), xs(t − 1; μg, σg), s(t − 1), σl) (from now on, the time notation is omitted). From this, the change in the output y is computed by fixing one of the two modules and switching the other to a different internal state xm,s(μ′g, σ′g). This switch is made by saving the internal state xm,s obtained by applying an input s created from the generator (μ′g, σ′g), and substituting that value during RNN inference. (This procedure is adopted only for the sake of analyzing which module is more responsible for the prior information.) The degree of change in y represents the impact of each module’s prior information on the output. Hence, by comparing the variances of y induced by varying xs (or xm) while fixing xm (or xs), it is possible to estimate how much each module is responsible for the prior representation. Specifically, we fixed one of the modules at the state obtained with μg = 0 and σg = 0.4 (the medians of the ranges −0.5 ≤ μg ≤ 0.5 and 0 ≤ σg ≤ 0.8), i.e., xm,s(0, 0.4), while for the other module μg and σg were varied as xm,s(μg, σg). We then calculated the variance of y as

Vs = ⟨Var(μg,σg)[y(xm(0, 0.4), xs(μg, σg), s, σl)]⟩(s,σl), (12)
Vm = ⟨Var(μg,σg)[y(xm(μg, σg), xs(0, 0.4), s, σl)]⟩(s,σl), (13)

where Var(μg,σg) denotes the variance over the changes of (μg, σg), and ⟨⋅⟩(s,σl) denotes the average over the changes of (s, σl). The magnitudes of Vs and Vm indicate the extent to which the sub-module and main module, respectively, influence the variation of the output in response to changes in the signal’s prior distribution.
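A sketch of this substitution analysis follows. The helpers `run_to_state(module, mu_g, sigma_g)` and `readout(x_m, x_s, s)` are hypothetical stand-ins for driving the trained RNN to an internal state under a given generator and computing its output y.

```python
import numpy as np

def V_module(varied, priors, signals):
    """Mean over (s, sigma_l) of the variance over (mu_g, sigma_g) of y (Eqs 12-13)."""
    ref_main = run_to_state("main", 0.0, 0.4)   # hypothetical helper; reference prior
    ref_sub = run_to_state("sub", 0.0, 0.4)
    vals = []
    for s in signals:                            # average over (s, sigma_l)
        if varied == "sub":                      # Eq (12): V_s, vary x_s only
            ys = [readout(ref_main, run_to_state("sub", mu, sg), s) for mu, sg in priors]
        else:                                    # Eq (13): V_m, vary x_m only
            ys = [readout(run_to_state("main", mu, sg), ref_sub, s) for mu, sg in priors]
        vals.append(np.var(ys))                  # variance over (mu_g, sigma_g)
    return float(np.mean(vals))
```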

The dependence of Vs and Vm on αs is shown in Fig 5. This result shows that when αs = 1 (i.e., the time scale is uniform), the main and sub-modules contribute to the representation of the prior distribution to the same degree. Conversely, when αs = 0.05 ∼ 0.5, Vs is much larger than Vm, meaning that the sub-module selectively contributes to the representation of the prior. In particular, when αs = 0.1 and 0.2, the differentiation of representation between the main and sub-modules is more pronounced. Note that the contribution of the main module is large when αs = 0.01, probably because the time scale of the sub-module is then too slow to encode the prior information. Comparing Figs 3 and 5 shows that highly accurate Bayesian inference is achieved when the prior distribution information is localized in the sub-module.

Fig 5. Division of roles for representing prior distribution.

Vs and Vm, defined in Eqs (12) and (13), plotted for different values of αs, computed over 1000 samples of data. Vs and Vm represent the degrees to which the sub-module and the main module, respectively, are responsible for prior-based information processing. When αs = 0.05 ∼ 0.5, and in particular for αs = 0.1 and 0.2, the sub-module selectively contributes to the representation of the prior.

https://doi.org/10.1371/journal.pcbi.1011897.g005

Next, we investigated how the prior is represented by the main and sub-modules by visualizing the neural activity with principal component analysis (PCA) [38, 39]. First, xm(μg, σg) and xs(μg, σg) were computed for various (μg, σg) in a model with αs = 0.1, and PCA was applied. The results were projected onto a plane spanned by the first and second principal components and color-coded according to μg and σg (Fig 6(a) and 6(b)). The neural activity in the main module was loosely distributed on a one-dimensional manifold represented by the first principal component (PC1). This PC1 approximately corresponded to the value of μg, although the distinction was not clear. In contrast, the activity in the sub-module was clearly represented on a two-dimensional manifold, as in Fig 6(b2), where PC1 corresponds to μg and PC2 corresponds to σg rather well.
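A sketch of this visualization follows, again assuming the hypothetical helper `run_to_state` from the previous sketch returns a module's internal state after driving the trained RNN with a given generator.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
priors = [(rng.uniform(-0.5, 0.5), rng.uniform(0.0, 0.8)) for _ in range(300)]
states = np.array([run_to_state("sub", mu, sg) for mu, sg in priors])  # hypothetical helper

Z = PCA(n_components=2).fit_transform(states)   # project onto PC1/PC2
# Scatter Z[:, 0] against Z[:, 1], color-coded by mu_g or sigma_g, to
# reproduce panels like Fig 6(b1) and 6(b2).
```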

Fig 6. The neural activities of the main module xm(μg, σg) and sub-module xs(μg, σg), plotted in the space of the first and second principal components.

(a,b) show the results for αs = 0.1, and (c,d) for αs = 1. (a1,c1) Main module, color-coded by μg. (a2,c2) Main module, color-coded by σg. (b1,d1) Sub-module, color-coded by μg. (b2,d2) Sub-module, color-coded by σg. 300 data points are plotted.

https://doi.org/10.1371/journal.pcbi.1011897.g006

We then performed the same analysis on the model with αs = 1 (Fig 6(c) and 6(d)). In this case, the manifolds of neural activities for the main and sub-modules did not differ significantly: both were represented on a one-dimensional manifold corresponding to μg, and there was no axis corresponding to σg. The decodability of σg achieved in the internal states of the sub-module with αs = 0.1 was not observed for αs = 1. In fact, the coefficient of determination when σg was estimated by Ridge regression from the internal state of the sub-module was 0.68 for αs = 0.1, but −0.03 for αs = 1. This suggests that the model with αs ∼ 0.1 can better distinguish the input’s variance from noise and thus perform Bayesian inference accurately.

When the generator changed rapidly, the variance of the estimated prior was larger than the variance of the generator, as shown in S1 Fig for the case with αs = 0.1. When σg is large, as seen from Eq 10, the influence of the observed signal s is larger than that of μg, allowing the model to “keep up” with large changes in the observed signal. This explains the higher adjustability to rapid generator changes seen in Fig 4.

To investigate whether the division of roles and accurate Bayesian inference depend on the numbers of neurons in the sub- and main modules, we trained models with varying ratios of Ns to Nm while fixing Ns + Nm = 250. As shown in S3 Fig, the MSE remained low as long as the number fraction was not too biased (e.g., 10:240 or 240:10). Except for these extreme cases, efficient Bayesian inference was achieved, and the division of roles was observed as computed by Vs and Vm. Furthermore, to investigate whether the time scale difference or the existence of a slower time scale itself was more influential, we trained an RNN with αs = αm = 0.1 and examined its accuracy. As shown in S2 Fig, when (αm, αs) = (0.1, 0.1), the MSE was larger and the accuracy worse than in the cases (αm, αs) = (1, 1) and (αm, αs) = (1, 0.1). Therefore, it is not simply the slower time scale of the neurons, but the time scale difference between the main and sub-modules, that facilitates accurate Bayesian inference.

Effects of different time scales.

To examine in detail the impact of differences in αs on Bayesian inference accuracy, we considered how models with αs = 1 and αs = 0.1 represent the prior as a function of the input signal s(t).

The RNN needs to estimate the current generator from past input signals in order to predict y(t) accurately. In this paper, this estimate is treated as the prior. Thus, we assume that the mean μp of the current generator estimated by the RNN is represented as a superposition of past input signals:

μp(t) ≃ ∑k ak s(t − k). (14)

Note that μp corresponds to the “current state of the generator” estimated by the RNN and is treated as a variable distinct from μg (for example, right after the generator switches from μg = −0.5 to μg = 0.5, if a positive s(t) is input into the RNN, the RNN would still estimate the mean of the generator to be −0.5). If the values of ak in the above equation are known, it is possible to discuss how the RNN estimates the current state of the generator and how it performs this estimation using the past input signals s(t).

Below, we estimate ak. To do so, it is first necessary to estimate μp. Here we assume a one-to-one relationship between the internal state x of the RNN and μp, and define μp = Ax, where A is a transformation matrix calculated as follows. We fixed the generator and computed x. Then, we created a data vector Mg that arranges the time series of generator means μg(1), μg(2), …, and a data matrix X that arranges the time series of the internal states x1, x2, …. We then sought the matrix A such that Mg ≃ AX. Using the Moore-Penrose pseudo-inverse X⁺, we obtain the best-fit matrix as A = Mg X⁺ [40]. Let μp be the result of applying this transformation A. As Fig 7(a) shows, μg ≃ μp holds.

Fig 7.

(a) Comparison between the estimated mean of the prior μp and the mean of the generator μg. (b) Comparison between the linear weighted sum of past signals s(t − k) and the estimated mean of the prior μp.

https://doi.org/10.1371/journal.pcbi.1011897.g007

Based on this calculation of μp, we estimated ak by the following steps. First, we obtained x(t) for a time-varying signal with switching probability pt = 0.03. By applying the transformation matrix A to the obtained x(t), the prior μp was estimated. The state of the prior was thus obtained for the time series of the observed signal s(t).

Finally, ak in Eq 14 was obtained by minimizing the difference between the two sides of Eq 14. Specifically, we created a data vector Mp that arranges μp and a data matrix S that arranges the past signals s(t − k). Using the Moore-Penrose pseudo-inverse S⁺, we obtained a = Mp S⁺. As Fig 7(b) shows, μp ≃ ∑k ak s(t − k) holds well.
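The two pseudo-inverse fits can be sketched as follows. The arrays `X` (N × T internal states), `M_g` (T generator means, from fixed-generator runs), `X_test`, and `s_test` (from a switching run) are hypothetical recordings standing in for the quantities described above.

```python
import numpy as np

# Fit 1: transformation with mu_p ~ A x, from the regression M_g ~ A X.
A = M_g @ np.linalg.pinv(X)        # hypothetical recordings: M_g (T,), X (N, T)
mu_p = A @ X_test                  # estimated prior mean along a test run

# Fit 2 (Eq 14): coefficients a_k with mu_p ~ sum_k a_k s(t - k).
K = 50                                                # past steps considered (assumed)
S = np.stack([np.roll(s_test, k) for k in range(K)])  # S[k, t] = s(t - k)
a = mu_p @ np.linalg.pinv(S)                          # a_k, plotted against k in Fig 8
```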

Because the obtained coefficients correspond to the contribution of the signal before k time steps, we could estimate the extent to which the neural network uses past information in estimating the prior.

The estimated coefficients ak of Eq 14 were plotted against k (Fig 8), revealing that the model with αs = 0.1 used more past information in estimating the prior than the model with αs = 1. This difference in time windows leads to the difference in accuracy of the prior encoding.

Fig 8. ak defined by Eq 14 is plotted against k, for the model with αs = 1 and αs = 0.1 using 3000 data points.

Note that ak is the coefficient obtained when the mean of the current generator estimated by the RNN is expressed as a superposition of past signals s(t), i.e., μp ≃ ∑k ak s(t − k). It can be seen that the model with αs = 0.1 used more past information in estimating the prior than the model with αs = 1.

https://doi.org/10.1371/journal.pcbi.1011897.g008

Organization of modular structure and time-scale separation

So far, we have investigated neural networks with fixed modular structures and fixed time scales, and demonstrated that those with fast and slow modules effectively represent the prior distribution. We next investigated whether such a modular structure would emerge by training a neural network with a homogeneous structure to predict ytrue. Note that our findings on the effectiveness of modular structures with slow/fast time scales for Bayesian inference do not necessarily imply that such a structure emerges through learning. In this section, we examine whether the slow/fast separation, with the corresponding modular structure, is reachable by learning alone.

We again used the same neural network model as in Eq 1. In this section, the α values, as well as the elements of W, change through training, starting from initial values set randomly according to a Gaussian distribution. In other words, we examine whether the modular structure, together with the time scale difference, emerges from a random Gaussian initialization without such structure. During training, the matrix W and the vector α are optimized by the gradient descent method [41] at each step. The number of neurons in the recurrent layer was set to 80.
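A sketch of jointly training the weights and the time scales follows, assuming a PyTorch formulation of Eq 1. Clamping α into [0, 1] is one assumed way to enforce the constraint 0 ≤ αi ≤ 1, and the initialization values are illustrative.

```python
import torch

N = 80
alpha_raw = torch.nn.Parameter(0.5 + 0.1 * torch.randn(N))  # Gaussian init (assumed scale)
W = torch.nn.Parameter(torch.randn(N, N) / N**0.5)

def step(x, inp):
    alpha = torch.clamp(alpha_raw, 0.0, 1.0)    # keep inverse time scales in [0, 1]
    return (1 - alpha) * x + alpha * torch.relu(W @ x + inp)

# Both W and alpha_raw receive gradients, so gradient descent can split the
# alpha distribution into fast and slow groups, as observed in Fig 9(a).
```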

The change in α distribution during the learning task is shown in Fig 9(a). As shown, α split into two groups over the learning period: one with large values close to 1 and the other with small values near 0.1.

Fig 9. Features of the RNN obtained when α is also trainable.

(a) Frequency distribution of α over all neurons at 200, 1000, and 10000 learning epochs. At 10000 epochs, the learning process was complete. (b) Division of roles, Vslow and Vfast. See the “Representation of the prior” section for definitions; Vslow (Vfast) was computed for neurons with α < 0.2 (α > 0.8), respectively. (c) The average degree of RNN connections at 10000 epochs: connections between the input layer, recurrent neurons with α < 0.2, 0.2 ≤ α ≤ 0.8, and α > 0.8, and the output layer. Each was normalized so that the maximum value was 1.

https://doi.org/10.1371/journal.pcbi.1011897.g009

Next, we measured the contribution to the prior representation, as examined in the “Representation of the prior” section, for the group of neurons with large α (αi > 0.8) and the group with small α (αi < 0.2) at three epochs in the learning process (Fig 9(b)). We found that after 10000 epochs, the slow neurons were responsible for the representation of the prior distribution, as in the model with αs = 0.1 in the fixed time scale setting. To explore the optimal time scale α in response to variations in pt, we allowed α to be trainable, taking an approach similar to that above. Models were trained under three distinct settings: pt = 0.03, 0.1, and 0.3. The results are illustrated in S5 Fig. As shown, the peak on the smaller-α side shifted to a larger value as pt increased. Conversely, the peak on the larger-α side remained constant at 1 across all settings, with no discernible difference in its proportion.

Finally, we investigated the network structure shaped by training. Based on Fig 9(a), the recurrent layer neurons of the network at epoch 10000 were split into three groups by the magnitude of αi: slow neurons with αi < 0.2, fast neurons with αi > 0.8, and the remaining neurons with 0.2 ≤ αi ≤ 0.8. The average connectivity between the input layer, each group, and the output layer is shown in Fig 9(c) [42]. The connections from the input layer to the group of fast neurons and from the fast neurons to the output layer were distinctly larger than those to or from the slow neurons. Among connections within the recurrent layer, those between the fast and slow neurons were larger than the others. In summary, a modular structure like that shown in Fig 1(b) emerged through learning alone.

Discussion

In this study, we demonstrated that neural networks with slow and fast activity modules play an essential role in representing the prior for Bayesian inference. We set up a task of predicting a time-varying signal under noise, which could be solved by Bayesian inference, and trained RNNs with or without modular structure and with or without time scale differences.

The RNN could learn to approximate Bayesian inference using the prior (approximating the generator distribution) in all conditions we tested. However, the accuracy was higher in the modular RNN, and significantly higher when the time scale of the sub-module was moderately slower than that of the main module. In addition, the increase in accuracy was pronounced for rapidly varying inputs, for which it was necessary to generate a quickly changing prior. To achieve such accuracy with a slow sub-module, the sub-module was found to specifically represent the prior, indicating a differentiation of roles between the representation of the prior and the representation of the observed signal (likelihood). Of note, such functional differentiation is caused by differences in time scales. This result is consistent with experimental observations that the brain areas coding the prior and the likelihood in Bayesian inference are different [18–20] (although caution is required, as there is also an experimental report showing that the prior and likelihood are encoded in the same brain area [43]). Finally, it was shown that a modular structure with distinct time scales was spontaneously organized in the RNN by learning.

It is important to note that a relatively slow time scale is required for the neuron population encoding the prior, but the difference between fast and slow neurons should not be excessive. If the time scale is too slow, the accuracy decreases (Fig 3), in which case the sub-module is no longer responsible for representing the prior (Fig 5). This is because, for a neural network with such a slow time scale, prior construction requires too large a time span to address changes in the external input.

It has been suggested that the time scale of neurons slows hierarchically from the areas where signals are directly applied to the areas where information is processed [21–23]. This hierarchical structure, combined with modularity [44], is believed to be relevant to information processing [44–46]. Our findings indicate that modular structures with two levels of time scales can handle slowly changing inputs. Handling more complex environmental shifts might necessitate a more multi-layered modular structure with diverse time scales, with which Bayesian inference against complex temporal changes could be achieved by extrapolating the results of this study. Further research verifying this will elucidate the significance of hierarchical structuring in the brain. Notably, our simulations revealed that the distinction in time scales not only improves Bayesian inference accuracy but also arises spontaneously from the learning process. Considering these findings, a similar process may be expected in evolution [47].

The modular network with slow/fast time scales could integrate out noise and distinguish the average change in the inputs from fast noise. In fact, the network could effectively predict temporal changes in the input, even under rapidly changing conditions. The brain must adapt to time-varying, noisy inputs; hence, the performance of Bayesian inference by the network design reported herein is considered relevant to brain information processing.

We adopted a simple RNN and trained it using backpropagation. Backpropagation is often argued to differ from the learning algorithm implemented in the actual brain [48, 49], so care should be taken when generalizing our results. However, previous studies have suggested that neural networks trained by backpropagation can show behavior similar to that of the actual brain [38, 50–55]. For instance, by training neural networks with backpropagation, it is possible to produce neural activity that displays the same behavior as place cells, which represent one’s own spatial position [56]. It is generally considered that the learning scheme in the brain does not adopt backpropagation. Still, one may expect that neural networks and dynamics that achieve the requested task and Bayesian inference share a common structure, as long as the learning scheme is based on synaptic changes depending on on/off neural activity dynamics. The present finding that neurons with slower time scales play a role in representing the prior is then relevant as a plausible explanation of how the brain actually behaves.

Unravelling the relationship between the structure of neural networks, neural dynamics, and the information processing performed by the brain is a primary goal of computational neuroscience [25–27, 57]. In this study, the relevance of modular structure and time scale differences in neural dynamics to the representation of the prior in Bayesian inference was demonstrated, as well as their formation by learning [58, 59], which will support ongoing research in the field.

Supporting information

S1 Fig. Trajectory of the internal state xsub(t) of the sub-module when generator A and generator B switch alternately.

Here, the trajectories of the internal state xsub(t) are plotted on the first and second principal components in S1 Fig for the cases in which generators A and B switch every 2 time steps and every 30 time steps. Generators A and B both have σg = 0.04. In the case of switching every 30 time steps, the trajectories were located in the region occupied by the internal state when σg was small; in the case of switching every 2 time steps, they were located in the region occupied when σg was large. This occurred because the generators switched so rapidly that the RNN recognized the signal as created by a generator with a large variance. This made it possible to switch y(t) quickly, because the information in the observed signal s was prioritized over the prior information when calculating the output y.

https://doi.org/10.1371/journal.pcbi.1011897.s001

(EPS)

S2 Fig.

Results for the RNN with αm = αs = 0.1: (a) Mean squared error between the optimal value yopt(t) and the output of the RNN y(t) for the settings (αm, αs) = (1, 1), (1, 0.1), (0.1, 0.1). (b) Division of roles for representing the prior distribution. Vs and Vm, defined in Eqs (12) and (13), are plotted for (αm, αs) = (1, 1), (1, 0.1), (0.1, 0.1), computed over 1000 samples of data. When (αm, αs) = (0.1, 0.1), Vs < Vm resulted, indicating that the sub-module was unable to carry out prior-based information processing.

https://doi.org/10.1371/journal.pcbi.1011897.s002

(EPS)

S3 Fig.

Results for different Ns, Nm: (a) Mean squared error between the optimal value yopt(t) and the output of the RNN y(t) for the settings (Ns, Nm) = (10, 240), (50, 200), (100, 150), (150, 100), (200, 50), (240, 10), with αs = 0.1. (b) Division of roles for representing the prior distribution. Fixing the sum Ns + Nm at 250, we examined six configurations, (Ns, Nm) = (10, 240), (50, 200), (100, 150), (150, 100), (200, 50), (240, 10), while keeping αs = 0.1, αm = 1, and pt = 0.03 fixed. Our results reveal that, except for the cases (Ns, Nm) = (10, 240) and (240, 10), efficient Bayesian inference, indicated by lower MSE values, was observed in all configurations (S3(a) Fig). The differences in MSE between the configurations (Ns, Nm) = (50, 200), (100, 150), (150, 100), (200, 50) were within the margin of error. Furthermore, the division of roles, measured by the variances of the sub- and main modules (Eqs (12) and (13)), was evident in all configurations except (Ns, Nm) = (10, 240) (S3(b) Fig). As long as the number fraction is not too biased, efficient Bayesian inference was achieved, with the division of roles. If the fraction of Ns is too low, the variance for the slow module is larger, but the number of slow neurons is not sufficient for appropriate Bayesian inference, whereas if it is too high, the separation of the variances does not follow. From these findings, it can be inferred that the results presented in this paper hold broadly, as long as neither module is extremely undersized.

https://doi.org/10.1371/journal.pcbi.1011897.s003

(EPS)

S4 Fig. Extended examination of αs in the range 0.06–0.16: MSE between the optimal value yopt(t) and the output of the RNN y(t), plotted against the time scale αs.

The model was trained and tested with (a) pt = 0.03 and (b) pt = 0.1. Our analysis showed that accurate Bayesian inference is achievable when αs is slow. In Fig 3, the MSE turned out to be larger for αs ≲ 0.01 or ≳ 0.2, and smaller around αs ∼ 0.05–0.2. Motivated by this, we investigated whether finer differences in αs might pinpoint an optimal value, and whether the outcome would be influenced by variations in pt. Here we varied αs from 0.06 to 0.16; as shown in S4 Fig, there were no significant differences within this range. Additionally, this insensitivity was observed irrespective of differences in pt.

https://doi.org/10.1371/journal.pcbi.1011897.s004

(EPS)

S5 Fig. Time scale α for different pt: frequency distribution of α for models trained with (a) pt = 0.03, (b) pt = 0.1, and (c) pt = 0.3.

The peak on the smaller-α side shifted to a larger value as pt increased, while the peak on the larger-α side remained constant at 1.

https://doi.org/10.1371/journal.pcbi.1011897.s005

(EPS)

Acknowledgments

We thank Koji Hukushima and Yasushi Nagano for stimulating discussions.

References

  1. Sokoloski S. Implementing a Bayes Filter in a Neural Circuit: The Case of Unknown Stimulus Dynamics. Neural Computation. 2017;29(9):2450–2490. pmid:28599113
  2. Knill DC, Pouget A. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences. 2004;27(12):712–719. pmid:15541511
  3. Moreno-Bote R, Knill DC, Pouget A. Bayesian sampling in visual perception. Proceedings of the National Academy of Sciences. 2011;108(30):12491–12496. pmid:21742982
  4. Angelaki DE, Gu Y, DeAngelis GC. Multisensory integration: psychophysics, neurophysiology, and computation. Current Opinion in Neurobiology. 2009;19(4):452–458. pmid:19616425
  5. Haefner RM, Berkes P, Fiser J. Perceptual Decision-Making as Probabilistic Inference by Neural Sampling. Neuron. 2016;90(3):649–660. pmid:27146267
  6. Ernst MO, Banks MS. Humans integrate visual and haptic information in a statistically optimal fashion. Nature. 2002;415(6870):429–433. pmid:11807554
  7. Merfeld DM, Zupan L, Peterka RJ. Humans use internal models to estimate gravity and linear acceleration. Nature. 1999;398(6728):615–618. pmid:10217143
  8. Doya K, Ishii S, Pouget A, Rao RPN. Bayesian Brain: Probabilistic Approaches to Neural Coding. MIT Press; 2007.
  9. Friston K. The history of the future of the Bayesian brain. NeuroImage. 2012;62(2):1230–1233. pmid:22023743
  10. Pouget A, Beck JM, Ma WJ, Latham PE. Probabilistic brains: knowns and unknowns. Nature Neuroscience. 2013;16(9):1170–1178. pmid:23955561
  11. Beck JM, Latham PE, Pouget A. Marginalization in Neural Circuits with Divisive Normalization. Journal of Neuroscience. 2011;31(43):15310–15319. pmid:22031877
  12. Geisler WS, Kersten D. Illusions, perception and Bayes. Nature Neuroscience. 2002;5(6):508–510. pmid:12037517
  13. Honig M, Ma WJ, Fougnie D. Humans incorporate trial-to-trial working memory uncertainty into rewarded decisions. Proceedings of the National Academy of Sciences. 2020;117(15):8391–8397. pmid:32229572
  14. Chung J, Kastner K, Dinh L, Goel K, Courville AC, Bengio Y. A Recurrent Latent Variable Model for Sequential Data. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, editors. Advances in Neural Information Processing Systems. vol. 28. Curran Associates, Inc.; 2015.
  15. Ahmadi A, Tani J. A Novel Predictive-Coding-Inspired Variational RNN Model for Online Prediction and Recognition. Neural Computation. 2019;31(11):2025–2074. pmid:31525309
  16. Orhan AE, Ma WJ. Efficient probabilistic inference in generic neural networks trained with non-probabilistic feedback. Nature Communications. 2017;8(1):138. pmid:28743932
  17. Quax SC, Bosch SE, Peelen MV, van Gerven MAJ. Population codes of prior knowledge learned through environmental regularities. Scientific Reports. 2021;11(1):640. pmid:33436692
  18. Vilares I, Howard JD, Fernandes HL, Gottfried JA, Kording KP. Differential Representations of Prior and Likelihood Uncertainty in the Human Brain. Current Biology. 2012;22(18):1641–1648. pmid:22840519
  19. Chan SCY, Niv Y, Norman KA. A Probability Distribution over Latent Causes, in the Orbitofrontal Cortex. Journal of Neuroscience. 2016;36(30):7817–7828. pmid:27466328
  20. d’Acremont M, Schultz W, Bossaerts P. The Human Brain Encodes Event Frequencies While Forming Subjective Beliefs. Journal of Neuroscience. 2013;33(26):10887–10897. pmid:23804108
  21. Murray JD, Bernacchia A, Freedman DJ, Romo R, Wallis JD, Cai X, et al. A hierarchy of intrinsic timescales across primate cortex. Nature Neuroscience. 2014;17(12):1661–1663. pmid:25383900
  22. Cavanagh SE, Hunt LT, Kennerley SW. A Diversity of Intrinsic Timescales Underlie Neural Computations. Frontiers in Neural Circuits. 2020;14. pmid:33408616
  23. Golesorkhi M, Gomez-Pilar J, Zilio F, Berberian N, Wolff A, Yagoub MCE, et al. The brain and its time: intrinsic neural timescales are key for input processing. Communications Biology. 2021;4(1):970. pmid:34400800
  24. Amunts K, DeFelipe J, Pennartz C, Destexhe A, Migliore M, Ryvlin P, et al. Linking Brain Structure, Activity, and Cognitive Function through Computation. eNeuro. 2022;9(2). pmid:35217544
  25. Mastrogiuseppe F, Ostojic S. Linking Connectivity, Dynamics, and Computations in Low-Rank Recurrent Neural Networks. Neuron. 2018;99(3):609–623.e29. pmid:30057201
  26. Vyas S, Golub MD, Sussillo D, Shenoy KV. Computation Through Neural Population Dynamics. Annual Review of Neuroscience. 2020;43(1):249–275. pmid:32640928
  27. Beiran M, Dubreuil A, Valente A, Mastrogiuseppe F, Ostojic S. Shaping Dynamics With Multiple Populations in Low-Rank Recurrent Networks. Neural Computation. 2021;33(6):1572–1615. pmid:34496384
  28. Papo D. Time scales in cognitive neuroscience. Frontiers in Physiology. 2013;4. pmid:23626578
  29. Barak O. Recurrent neural networks as versatile tools of neuroscience research. Current Opinion in Neurobiology. 2017;46:1–6. pmid:28668365
  30. Nair V, Hinton GE. Rectified Linear Units Improve Restricted Boltzmann Machines. In: Fürnkranz J, Joachims T, editors. ICML. Omnipress; 2010. p. 807–814. Available from: http://dblp.uni-trier.de/db/conf/icml/icml2010.html#NairH10.
  31. Ma WJ, Beck JM, Latham PE, Pouget A. Bayesian inference with probabilistic population codes. Nature Neuroscience. 2006;9(11):1432–1438. pmid:17057707
  32. Ichikawa K, Kataoka A. Dynamical Mechanism of Sampling-Based Probabilistic Inference Under Probabilistic Population Codes. Neural Computation. 2022;34(3):804–827. pmid:35026031
  33. Swindale NV. Orientation tuning curves: empirical description and estimation of parameters. Biological Cybernetics. 1998;78(1):45–56. pmid:9518026
  34. Rumelhart DE, Hinton GE, Williams RJ. Learning Internal Representations by Error Propagation. Cambridge, MA, USA: MIT Press; 1986. p. 318–362.
  35. Werbos PJ. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE. 1990;78(10):1550–1560.
  36. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization; 2014. Available from: http://arxiv.org/abs/1412.6980.
  37. Bishop CM. Pattern Recognition and Machine Learning. Springer; 2006.
  38. Mante V, et al. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature. 2013;503:78–84. pmid:24201281
  39. Ichikawa K, Kaneko K. Short-term memory by transient oscillatory dynamics in recurrent neural networks. Phys Rev Research. 2021;3:033193.
  40. Penrose R. A generalized inverse for matrices. Mathematical Proceedings of the Cambridge Philosophical Society. 1955;51(3):406–413.
  41. Perez-Nieves N, Leung VCH, Dragotti PL, Goodman DFM. Neural heterogeneity promotes robust learning. Nature Communications. 2021;12(1):5791. pmid:34608134
  42. Yang GR, Joglekar MR, Song HF, Newsome WT, Wang XJ. Task representations in neural networks trained to perform many cognitive tasks. Nature Neuroscience. 2019;22(2):297–306. pmid:30643294
  43. Mochol G, Kiani R, Moreno-Bote R. Prefrontal cortex represents heuristics that shape choice bias and its integration into future behavior. Current Biology. 2021;31(6):1234–1244.e6. pmid:33639107
  44. Yamashita Y, Tani J. Emergence of Functional Hierarchy in a Multiple Timescale Neural Network Model: A Humanoid Robot Experiment. PLOS Computational Biology. 2008;4(11):1–18.
  45. Kurikawa T, Kaneko K. Multiple-Timescale Neural Networks: Generation of History-Dependent Sequences and Inference Through Autonomous Bifurcations. Frontiers in Computational Neuroscience. 2021;15. pmid:34955798
  46. Tanaka G, Matsumori T, Yoshida H, Aihara K. Reservoir computing with diverse timescales for prediction of multiscale dynamics. Phys Rev Research. 2022;4:L032014.
  47. Yamaguti Y, Tsuda I. Functional differentiations in evolutionary reservoir computing networks. Chaos: An Interdisciplinary Journal of Nonlinear Science. 2021;31(1):013137. pmid:33754767
  48. Bengio Y, Lee D, Bornschein J, Lin Z. Towards Biologically Plausible Deep Learning. ArXiv. 2015;abs/1502.04156.
  49. Lillicrap TP, Cownden D, Tweed DB, Akerman CJ. Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications. 2016;7(1):13276. pmid:27824044
  50. Richards BA, Lillicrap TP, Beaudoin P, Bengio Y, Bogacz R, Christensen A, et al. A deep learning framework for neuroscience. Nature Neuroscience. 2019;22(11):1761–1770. pmid:31659335
  51. Yang GR, Wang XJ. Artificial Neural Networks for Neuroscientists: A Primer. Neuron. 2020;107(6):1048–1070. pmid:32970997
  52. Barak O, Sussillo D, Romo R, Tsodyks M, Abbott LF. From fixed points to chaos: Three models of delayed discrimination. Progress in Neurobiology. 2013;103:214–222. pmid:23438479
  53. Cueva CJ, Wei XX. Emergence of grid-like representations by training recurrent neural networks to perform spatial localization. In: International Conference on Learning Representations; 2018. Available from: https://openreview.net/forum?id=B17JTOe0-.
  54. Yamins DLK, DiCarlo JJ. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience. 2016;19(3):356–365. pmid:26906502
  55. Haesemeyer M, Schier AF, Engert F. Convergent Temperature Representations in Artificial and Biological Neural Networks. Neuron. 2019;103(6):1123–1134.e6. pmid:31376984
  56. Banino A, Barry C, Uria B, Blundell C, Lillicrap T, Mirowski P, et al. Vector-based navigation using grid-like representations in artificial agents. Nature. 2018;557(7705):429–433. pmid:29743670
  57. Dubreuil A, Valente A, Beiran M, Mastrogiuseppe F, Ostojic S. The role of population structure in computations through neural dynamics. Nature Neuroscience. 2022;25(6):783–794. pmid:35668174
  58. Lorenz DM, Jeng A, Deem MW. The emergence of modularity in biological systems. Physics of Life Reviews. 2011;8(2):129–160. pmid:21353651
  59. Kashtan N, Alon U. Spontaneous evolution of modularity and network motifs. Proceedings of the National Academy of Sciences. 2005;102(39):13773–13778. pmid:16174729