Signatures of criticality in efficient coding networks


The distribution of avalanche sizes systematically changes with the strength of added noise (Figure 1A). In networks with a small amount of noise (e.g., noise strength 0.5, thick blue line in Figure 1A, or Figure 1B left), large avalanches dominate the distribution of avalanche sizes (a bump in the tail of the distribution signifies a transient synchronization in the network). On the other hand, for a large amount of noise (e.g., noise strength 5.5, thick red line in Figure 1A, or Figure 1B right), the distribution is concentrated on small avalanches (an exponential distribution). However, for intermediate levels of noise (e.g., noise strength 1.3, thick green line in Figure 1A, or Figure 1B middle), the avalanche-size distribution resembles a power-law (it appears as a linear function in log-log coordinates), which is a key signature of criticality in neural systems (see, e.g., 11).

To determine the most scale-free avalanche distribution (the one closest to a power-law), we use a measure κ that quantifies the deviation from an ideal power-law distribution. Our κ measure closely follows the non-parametric measure introduced by Shew and colleagues (12) but does not assume a particular scaling exponent (see supplementary methods). κ is defined as the normalized area between the empirical distribution and the ideal distribution (a power-law fitted to the portion of data between two cut-offs) (Figure 1B). κ takes small (close to zero) values for a scale-free distribution (Figure 1B middle) and deviates from zero otherwise (Figure 1B left and right panels). We then measure how the deviation from a power-law, κ, and the network's reconstruction error depend on the noise strength.
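For illustration, κ can be computed along the following lines (a minimal Python sketch, not the exact procedure of the supplementary methods: the cut-offs, the number of evaluation points, and the maximum-likelihood exponent fit are our illustrative choices):

```python
import numpy as np

def kappa(avalanche_sizes, s_min=1.0, s_max=100.0, n_points=10):
    """Deviation from an ideal power-law: the normalized area between the
    empirical CDF of avalanche sizes and the CDF of a power-law fitted
    between the two cut-offs s_min and s_max."""
    s = np.asarray(avalanche_sizes, dtype=float)
    s = s[(s >= s_min) & (s <= s_max)]
    # Maximum-likelihood exponent of a continuous power-law (Clauset et al.,
    # 2009); fitted rather than fixed, so no scaling exponent is assumed.
    alpha = 1.0 + len(s) / np.sum(np.log(s / s_min))
    # Compare the CDFs at logarithmically spaced sizes between the cut-offs.
    grid = np.logspace(np.log10(s_min), np.log10(s_max), n_points)
    ecdf = np.searchsorted(np.sort(s), grid, side="right") / len(s)
    # CDF of a power-law truncated to [s_min, s_max].
    pl_cdf = (grid ** (1 - alpha) - s_min ** (1 - alpha)) / (
        s_max ** (1 - alpha) - s_min ** (1 - alpha))
    return np.mean(np.abs(ecdf - pl_cdf))  # normalized area between the CDFs
```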
We confirm the previous observation that the performance of this network depends non-monotonically on the noise amplitude (gray dotted curve in Figure 1C), with the optimal performance achieved for an intermediate noise level (6, 15). Interestingly, the change in κ with the noise level demonstrates a similar non-monotonic behavior (colored circles in Figure 1C). Remarkably, both are minimized at the same noise level, resulting in a coincidence of the optimal point for coding and the most scale-free distribution (vertical purple and cyan lines in Figure 1C). This observation offers additional support to the criticality hypothesis for the brain, namely that various information processing measures are optimized close to the critical point (3, 13).

We next verify the stability of this result to changes in the network's size by considering networks of various sizes in a range between N = 50 and N = 400 neurons. We find that all networks demonstrate a similar non-monotonic behavior for the dependence of the reconstruction error (Figure 2A) and the scale-freeness deviation measure κ (Figure 2B) on the strength of noise. This non-monotonic behavior of the reconstruction error is less pronounced for larger networks. This is expected, because the recurrent network used in our study is particularly suited to coding a single input dimension with a small number of neurons (6, 7), i.e., hundreds, rather than thousands, of neurons per input dimension (see 10 for how this problem could be alleviated for large networks by encoding higher-dimensional inputs).

We observe the co-occurrence of efficient coding optimality with criticality optimality across all network sizes. The noise levels where the coding error is minimal (x-coordinates in Figure 2C) and where κ is minimal (y-coordinates in Figure 2C) are highly correlated across different network sizes. Furthermore, this observation is robust to variations in the choice of the right cut-off needed for calculating κ (whiskers in Figure 2C indicate the standard deviation with respect to changing the way κ is computed). Lastly, the location of the cut-off of the scale-free distribution shifts right (to larger values) with the size of the network (Figure 2D), hinting at the correct finite-size scaling behavior (see, e.g., 11, 14). We believe our study opens up promising avenues for future investigations to establish the connection between other aspects of criticality (e.g., 16, 17) and theories of neural computation (e.g., 7).

Materials and Methods
Further details are provided in the Supporting Information.
The network is optimized to minimize the time-averaged loss

$$E = \Big\langle \big(x(t) - \hat{x}(t)\big)^2 + \alpha \sum_i r_i(t) + \beta \sum_i r_i(t)^2 \Big\rangle_t, \qquad (1)$$

where x(t) is a given one-dimensional sensory input (similar to 2, 3), x̂(t) is the reconstructed sensory input, r_i(t) is the firing rate of neuron i, and α and β are the weights of the L1 and L2 penalties on the firing rate.
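For concreteness, Equation 1 can be evaluated on discretized signals as follows (a minimal sketch; the array shapes are our assumptions):

```python
import numpy as np

def coding_loss(x, x_hat, r, alpha, beta):
    """Efficient-coding loss of Equation 1, averaged over time.
    x, x_hat    : input and reconstruction, arrays of shape (T,)
    r           : firing rates, array of shape (T, N)
    alpha, beta : weights of the L1 and L2 penalties on the rates."""
    error = np.mean((x - x_hat) ** 2)                # reconstruction error
    l1 = alpha * np.mean(np.sum(np.abs(r), axis=1))  # L1 (sparsity) penalty
    l2 = beta * np.mean(np.sum(r ** 2, axis=1))      # L2 (energy) penalty
    return error + l1 + l2
```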
It is assumed that the input can be reconstructed by performing a linear readout of the spike trains, more precisely, by a weighted leaky integration of the output spike trains,

$$\tau \frac{d\hat{x}(t)}{dt} = -\hat{x}(t) + \sum_i w_i o_i(t), \qquad (2)$$

where o_i indicates the output spike train of neuron i, τ is the read-out time constant*, and w_i is a constant read-out weight associated with neuron i.
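In discrete time, with spikes binned at resolution dt, the read-out of Equation 2 amounts to the following (a sketch in which each spike is treated as a delta function that increments x̂ by w_i/τ):

```python
import numpy as np

def linear_readout(o, w, tau, dt):
    """Weighted leaky integration of spike trains (Equation 2):
    tau * dx_hat/dt = -x_hat + sum_i w_i o_i(t).
    o : binary spike trains, shape (T, N); w : read-out weights, shape (N,)."""
    x_hat = np.zeros(o.shape[0])
    for t in range(1, len(x_hat)):
        # Exponential leak plus delta-function spike contributions w_i / tau.
        x_hat[t] = x_hat[t - 1] * (1.0 - dt / tau) + (o[t - 1] @ w) / tau
    return x_hat
```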
For an idealized network with instantaneous synapses, the optimal network can be derived from first principles. Boerlin et al. (1) demonstrated that the dynamics of each leaky integrate-and-fire (LIF) neuron can be expressed by a conventional differential equation governing the dynamics of the membrane potential,

$$\frac{dV_i(t)}{dt} = -\frac{V_i(t)}{\tau} + w_i c(t) - w_i \sum_k w_k o_k(t) - \beta o_i(t) + \nu(t), \qquad (3)$$

where V_i is the membrane potential of neuron i, w_i is the constant read-out weight introduced in Equation 2, c(t) is the input to the network, o_i(t) is the spike train of neuron i, β is the regularizer introduced in Equation 1, and ν(t) is white noise with unit variance that was added in the original derivation of (1) for biological realism. Notably, in this network there are two types of input: a feed-forward input, w_i c(t), and a recurrent input, −w_i Σ_k w_k o_k(t). The recurrent input is the result of the network being fully connected. In this network, neurons that receive a common input decorrelate their activity to avoid communicating redundant information via instantaneous recurrent inhibition.
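A minimal simulation sketch of these dynamics is given below. The threshold and reset values only schematically follow the derivation in (1), and restricting the network to at most one spike per time step is our simplifying assumption:

```python
import numpy as np

def simulate_network(c, w, beta, noise_std, tau=0.02, dt=1e-4, seed=0):
    """Sketch of the instantaneous-synapse network (Equation 3):
    dV_i/dt = -V_i/tau + w_i c(t) - w_i sum_k w_k o_k(t) - beta o_i(t) + noise.
    c : input signal, shape (T,); w : read-out weights, shape (N,)."""
    rng = np.random.default_rng(seed)
    N = len(w)
    V = np.zeros(N)
    thresh = (w ** 2 + beta) / 2.0            # schematic threshold choice
    spikes = np.zeros((len(c), N), dtype=int)
    for t in range(len(c)):
        # Euler step of the membrane dynamics with white membrane noise.
        V += dt * (-V / tau + w * c[t]) \
             + noise_std * np.sqrt(dt) * rng.standard_normal(N)
        i = int(np.argmax(V - thresh))        # at most one spike per step
        if V[i] > thresh[i]:
            spikes[t, i] = 1
            V -= w[i] * w                     # instantaneous recurrent inhibition
            V[i] -= beta                      # self-reset from the L2 cost
    return spikes
```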

Chalk et al. (2) introduced a more biologically plausible variant of (1)'s network by incorporating synaptic delays and introducing a balanced network of inhibitory and excitatory populations of neurons. They incorporated realistic synaptic delays by assuming that each spike generates a continuous current input to other neurons, with dynamics described by the conventional alpha-function (double-exponential) kernel,

$$h(t) \propto e^{-t/\tau_d} - e^{-t/\tau_r}, \quad t \ge 0,$$

where τ_r and τ_d are the synaptic rise and decay times, respectively. Adding realistic synaptic delays led to network synchronization, which impairs coding efficiency. Chalk et al. (2) demonstrated that, in the presence of synaptic delays, this network of LIF neurons can nonetheless be optimized for efficient coding by adding noise to the network. In this study, we implement the additional noise as white noise added to the membrane potentials. However, (2) also demonstrated a similar dependency of the network's performance on noise using other ways of incorporating noise, for instance, by inducing unreliability in spike elicitation (also see 4-6 for other approaches).
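Such a delayed synaptic current can be sketched as the convolution of a spike train with a unit-area double-exponential kernel (the exact kernel normalization used in (2) may differ):

```python
import numpy as np

def synaptic_current(o_i, tau_r, tau_d, dt):
    """Filter a binned spike train with a double-exponential kernel with
    rise time tau_r and decay time tau_d (normalized to unit area):
    h(t) = (exp(-t/tau_d) - exp(-t/tau_r)) / (tau_d - tau_r),  t >= 0."""
    t = np.arange(0.0, 10.0 * tau_d, dt)
    h = (np.exp(-t / tau_d) - np.exp(-t / tau_r)) / (tau_d - tau_r)
    # Causal convolution: each spike launches a delayed, continuous current.
    return np.convolve(o_i, h, mode="full")[: len(o_i)]
```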
The original network introduced by (1) was a purely inhibitory network. (2) introduced a variant of this network that respects Dale's law. In their network, they introduce a population of inhibitory neurons that tracks the estimate encoded by the excitatory neurons and provides recurrent feedback to the excitatory population (for further detail see 1, 2).

Avalanche detection. To investigate the scale-free characteristics of the spiking activity (as a potential signature of networks operating close to criticality), similar to previous studies (7), we probe the distribution of neural avalanches. A neuronal avalanche is defined as an uninterrupted cascade of spikes in the network (7). In a system operating close to criticality, the distributions of avalanche sizes (number of spikes in a cascade) and of avalanche lifetimes follow a power-law (in this study we have only investigated the distribution of avalanche sizes).

* In the efficient coding network used in this study (as in 2), for simplicity, the read-out time constant of the input (i.e., the time-scale of x(t)) is the same as the time constant of the membrane potential of the neurons. Nevertheless, in (1) they are not necessarily the same, for more general computations.
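With spikes binned in time, avalanche sizes can be extracted as follows (a sketch; the bin width, often set to the mean inter-spike interval, is a free parameter):

```python
import numpy as np

def avalanche_sizes(spikes):
    """Sizes of neuronal avalanches: contiguous runs of time bins with at
    least one spike anywhere in the network, separated by silent bins.
    spikes : array of shape (T, N); returns one size per cascade."""
    per_bin = spikes.sum(axis=1)      # population spike count per time bin
    sizes, current = [], 0
    for n in per_bin:
        if n > 0:
            current += n              # cascade continues
        elif current > 0:
            sizes.append(current)     # a silent bin ends the cascade
            current = 0
    if current > 0:
        sizes.append(current)         # close a cascade at the recording end
    return np.array(sizes)
```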

Fig. 1. Co-occurrence of criticality and optimal settings for efficient coding. (A) Avalanche-size distributions of efficient coding networks with different noise levels (indicated in the legend and with a consistent color code across all panels). (B) Deviation-from-criticality measure κ for three noise levels. Left: small noise (0.5, network appears supercritical, with many large avalanches); middle: medium noise (1.3, network close to criticality); right: strong noise (5.5, network exhibits subcritical behavior with predominantly small avalanches). The area of the gray-shaded regions between the actual avalanche-size distribution and the fitted power-law distribution defines the deviation measure κ (in the middle panel, the filled region is not visible, as the avalanche-size distribution is very close to the ideal power-law). Vertical lines indicate the choices of left and right cut-offs (see main text for more details). Distributions for the chosen noise levels are highlighted in bold in panel A (matching colors, thicker lines). (C) Deviation from power-law κ as a function of noise level (left y-axis), colors matching panel A. The gray dotted line indicates the mean-square error (MSE) (right y-axis) as a function of noise. The vertical continuous line (purple) indicates the noise level corresponding to minimal κ (the most scale-free avalanche-size distribution); the vertical broken line (cyan) indicates the noise level corresponding to minimal MSE (the best efficient coding performance). These two vertical lines overlap exactly, demonstrating the coincidence of noise levels for scale-free behavior and efficient coding.

Fig. 2. Co-occurrence of criticality and efficient coding optimality across networks of different sizes. (A) Mean-square error (MSE) of stimulus reconstruction for different injected noise amplitudes (similar to Figure 1C, gray line). Curves with different colors correspond to different network sizes (specified in the legend of panel B). (B) Deviation measure κ as a function of noise (similar to Figure 1C, colored dots). As in (A), different curves represent the κ-noise relationship for networks of different sizes. (C) The y-coordinate of each point specifies the optimal noise value chosen based on the scale-freeness of the avalanche-size distributions (minimum κ), and the x-coordinate specifies the optimal noise value chosen based on the efficient coding criterion (minimum MSE). Error bars (mean ± standard deviation) indicate the variability across a wide range of choices of the free parameters used for computing the deviation measure κ (see the supplementary text for more detail). (D) The cut-off of the power-law distribution for the most scale-free avalanche-size distribution (resulting in the smallest κ in panel B) shifts with the network size, as expected from the finite-size scaling ansatz for critical systems (colors are specified in the legend of panel B).