Stationary log-normal distribution of weights stems from spontaneous ordering in adaptive node networks

Experimental evidence recently indicated that neural networks can learn in a different manner than was previously assumed, using adaptive nodes instead of adaptive links. Consequently, links to a node undergo the same adaptation, resulting in cooperative nonlinear dynamics with oscillating effective link weights. Here we show that the biological reality of stationary log-normal distribution of effective link weights in neural networks is a result of such adaptive nodes, although each effective link weight varies significantly in time. The underlying mechanism is a stochastic restoring force emerging from a spontaneous temporal ordering of spike pairs, generated by strong effective link preceding by a weak one. In addition, for feedforward adaptive node networks the number of dynamical attractors can scale exponentially with the number of links. These results are expected to advance deep learning capabilities and to open horizons to an interplay between adaptive node rules and the distribution of network link weights.

where V i (t) is the scaled voltage of the i th terminal, T = 20 ms is the membrane time constant, V st = 0 stands for the scaled stable (resting) membrane potential (Methods) and J i stands for the i th terminal weight. W m and τ m stand for the m th link weight and delay, respectively, and the summation over n sums all input timings arriving at the m th link, t m (n). A spike occurs when the voltage of one of the terminals crosses the threshold, V i ≥ 1. After a spike is Figure 1. Adaptive nodes in a feedforward network. (a) A schema of N input units connected with fixed weights (synapses), W m , to K terminals (dendrites) which are connected with adaptive weights, J i , to an output unit. Delays between input units and the output unit are denoted by τ m and the dynamics of the output unit is governed by leaky integrated-and-fire neuron, eq. (1). After an adaptation step, + J i is a function of its relative change, δ, and an additive noise, η, eq. generated the terminal's voltage is set to V st , and a refractory period of 2 ms occurs, where no evoked spikes are possible by any one of the terminals (Methods). Note that in order to achieve a threshold crossing, typically many inputs have to arrive to a neuron in temporal proximity via one of its terminals 29 .
For every pair of a sub-threshold stimulation via terminal i and an evoked spike from a different terminal, an adaptation step occurs for J i : where δ i and η i stand for the relative change and an additive random white noise, respectively. The relative change, δ, is the same as the adaptation rule used for link weights and follows the modified Hebbian learning rule 12,19,20 (blue-line in Fig. 1b). This relative change is a function of the time-lag between a sub-threshold stimulation and an evoked spike, t sub − t spike , originated from a different terminal. Specifically, the relative change decays exponentially to zero for large time-lags and follows its sign (Fig. 1b). The qualitative reported results were also found to be robust to a simplified two-level adaptation rule (dashed blue-line in Figs 1b and S1). Following recent experimental evidence, an additional ingredient is introduced, where a threshold crossing by one of the terminals now generates an evoked spike with probability 30 where Δt is the time-lag from the last threshold crossing, and f c reflects the maximal stationary firing frequency of the neuronal terminal, e.g. 15 Hz. Note that for high stimulation frequencies (>f c ) the nodal firing rate is saturated at K · f c , and for low stimulation frequencies (<f c ) response failures practically vanish ( Fig. 1c-e, Methods).
Feedforward networks. When the input units are simultaneously stimulated with a common stimulation frequency, three possible types of dynamics for J i are observed, according to the initial weights and delays, W m and τ m . This is exemplified for a network with an output node with three terminals and 15 input units, without noise (K = 3, N = 15, η i = 0) and 5 Hz stimulation frequency (Fig. 1c). In the first type of dynamics, all J i converge to fixed values ( Fig. 1c 1 ). The second type is characterized by fast oscillations with relatively small fluctuations of each J i around an average value ( Fig. 1c 2 ), and their periods are below few seconds, typically sub-seconds. The third type is characterized by slow oscillations with periods which can exceed hundreds of seconds (Fig. 1c 3 ) and exists for K > 2 only 18 . They are accompanied by large variations in the amplitudes of J i and consist of long plateaus at extreme values. The fraction of initial time-independent weights and delays, W m and τ m , leading to oscillations was estimated using random sampling (Methods) for K = 3 and varies N (Fig. 1d). It increases from ~0.4 for N = 9 to ~0.8 for N = 27, indicating that the phenomenon of oscillations is a common scenario in adaptive node networks. Note that in the traditional adaptive link scenario all W m converge either to zero or to above-threshold (similar to Fig. 1c 1 ) and oscillations are excluded 18 . The robustness of the fast and slow oscillations to small stochastic noise, η in eq. (2), was examined using the Fourier analysis of the adaptive weights (Fig. 2). For fast oscillations, the noise does not affect the periods of oscillations, but only slightly affects their Fourier amplitudes (Fig. 2a,c). In contrast, for slow oscillations the noise, η, affects the periodicity which is typically shortened (Fig. 2d). This trend is a result of the noise which prevents the plateau at small values of terminal weights, J, for long periods (Fig. 2b).
The number of different stationary firing patterns, attractors, in the large N limit, can be bounded from below, for given K and delays τ m (Fig. 1a). Assuming that for each terminal there are N 0 < N/K non-zero inputs, the number of different attractors, A(N 0 ), is estimated using an exhaustive random sampling for W m (Methods). A lower bound for the number of dynamical attractors for the entire network with N non-zero inputs, scales as since for each one of the K terminals one can select a subset of N 0 inputs among N/K, with repeated abovethreshold stimulated inputs. Each one of these choices results in A(N 0 ) different attractors as a result of different delays. For K = 3 and N 0 = 3, for instance, the number of different attractors was estimated as A(3) ~ 1500 ( Fig. 1e and Methods), indicating that eq. (4) scales as N 9 . For N 0 = O(N) even with small K, e.g. K = 2, the number of different attractors is expected to scale exponentially with N. This type of input scenarios is expected in biological realizations where a neuron has only a few terminals (dendritic trees) 13 and many thousands of links (synapses) 29 , however at each firing event only a small fraction of the input links is effectively involved 29 . Results indicate powerful computational capabilities under biological realizations with a huge number of attractors even for such a simple feedforward network with only finite number of adaptive terminals.

Recurrent networks and log-normal distributions.
To this point we assumed simultaneous inputs, which is far from biological reality. In order to expand the study to non-simultaneous inputs, asynchronous stimulations are first discussed for the case of population dynamics between two pools 31 consisting of 500 nodes each (Fig. 3a 1 and a 2 ). In the adaptive links scenario, each node receives 60 inputs from randomly selected nodes from the other pool ( Fig. 3a 1 ), or via three adaptive terminals for the adaptive nodes scenario ( Fig. 3a 2 ), using random delays from a normal distribution of 100 ms mean and a standard deviation of 2 ms. Networks are simulated using eq. (1) (Methods). For the adaptive links scenario, weights converge to biologically unrealistic limits, and are frozen at either above-threshold or practically vanish ( Fig. 3b 1 and c 1 ). For the adaptive nodes scenario, the distribution of the effective weights, W m · J i , converges to a stationary log-normal distribution ( Fig. 3b  inputs from 60 randomly selected nodes ( Fig. 3a 3 and b 3 ). The log-normal distribution is stationary (Fig. S3), however each weight is not frozen and significantly varies along the dynamics (Fig. 3b 3 and c 3 ) such that its distribution can also be well approximated by log-normal (Fig. S2). The log-normal distribution is not attributed to the emergence of some spontaneous clustering among the adaptive nodes, as the raster plot indicates a random firing activity of each node and the entire network, without any significant structure in the Fourier spectrum (Fig. S4). For pools consisting of adaptive nodes ( Fig. 3a 2 ), where one of the pools was initially triggered, the raster plot initially indicates alternating firing between two synchronized pools ( Fig. 3d 1 ). The variation between delays results in the broadening of the firing stripes until merging occurs with random raster activity ( Fig. 3d 2 ), similar to the random network (Fig. 3a 2 ). This broadening and merging might be a self-control mechanism to terminate an induced reverberating mode. Interestingly, the log-normal distribution for the effective weights emerges already in the transient consisting of stripes activity (Fig. 3d 3 ), and only its average (and variance) are later adjusted. The firing rate of each terminal in the network (Fig. 3a 2 and a 3 ) is saturated, ~15 Hz, hence, the firing frequency of each node is ~45 Hz.
Restoring force via spontaneous spike ordering. The understanding of the underlying mechanism for the emergence of a stationary log-normal distribution requires the examination of a much simpler system imitating the network activity. We examine the dynamics of an adaptive node consisting of two terminals (K = 2) and 60 inputs, where each one of the inputs is stimulated at random and on the average at 30 Hz ( Fig. 4a and Methods). The distribution of the effective weights is indeed log-normal distribution (Fig. 4b) and is practically identical to the distribution obtained in the network dynamics (Fig. 3b 3 ).
The emergence of a log-normal distribution is natural, since multiple adaptation steps of a weight, eq. (2), result in a multiplicative process, however, its stationary shape requires an explanation. The relative change in links with a given weight J, averaged over such instances during the stationary dynamics, revealed a stochastic restoring force towards the most probable J (Figs. 4c and S5). The origin of this restoring force is the emergence of spontaneous temporal ordering of pairs of spikes for a given adaptive node during the dynamical process. For simplicity we assume K = 2 and we concentrate on momentary events of the dynamics where the adaptive node has simultaneously one weak, J W , and one strong, J S , terminals, relative to the most probable values of the log-normal distribution (Fig. 4b). Next we estimate in simulations the probability of occurrence of the following two types of pairs of spikes in a bounded time window, e.g. 5 ms (Fig. 4d 1 ). The first type, P SW , stands for a spike generated by J S prior to a spike generated by J W , and vice versa for the second type of pairs, P WS . Simulation results indicate  where typically P SW is several times greater than P WS , and P SW constitutes a few percent of all pairs of events (Fig. 5a). This preference, eq. (5), is exemplified using the following self-consistent argument assuming that initially the weak and the strong spikes occur almost simultaneously (Fig. 4d 2 ). Since the input units of both terminals are stimulated at the same rate, the threshold crossing of J S occurs before J W which are both accompanied by response failures, eq. (3). Consequently, the spike generated by J S occurs prior to the spike generated by J W (Fig. 4d 2 ). Note that the adaptation steps, eq. (2), change the two terminals from remaining strong and weak, however on the average there is a stochastic tendency for the strong spike to evoke prior the weak one.

Discussion
The mechanism of the restoring force is a direct consequence of the spontaneous temporal ordering (Fig. 4d). A terminal that evoked a spike resets its membrane potential which rapidly increases by many sub-threshold stimulations. The threshold crossing is achieved again in several ms and is followed by many response failures. Hence, the strong terminal generates most of its sub-threshold stimulations prior to the following weak spike, whereas all the sub-threshold stimulations of the weak terminal appear after the strong spike (Fig. 4e). Following the adaptation rule, eq. (2), the strong terminal is decreased, ΔJ S < 0, whereas the weak terminal is enhanced, ΔJ W > 0 (Fig. 4e) and the restoring force is created. A necessary ingredient in the formation of the mechanism to achieve a stationary log-normal distribution is that the majority of the sub-threshold stimulations of the strong spike occur prior to the weak one (Fig. 4e). For short refractory periods the time-lag between a pair of strong-weak spikes decreases, since the minimal time-lag between consecutive spikes decreases. Indeed, for short enough refractory periods and certainly for a vanishing one, the log-normal distribution was found in simulations to be unstable, where all effective weights are asymptotically above threshold, since both ΔJ s and ΔJ w are now positive (Fig. 5). The log-normal distribution of link weights is an emerging spontaneous feature of adaptive node networks where the essential role of the refractory period is evident. Results open the horizon to explore the possible interplay between the adaptive node rules and stationary distribution classes of the network link weights 26,27 .

Methods
Simulation dynamics. Each node is described by several independent terminals, and a node generates a spike when a terminal crosses a threshold (eqs. (1) and (3)). The voltage of each terminal is determined according to the leaky integrate and fire model as described in eq. (1), where T = 20 ms. For simplicity, we scale the equation such that V th = 1, V st = 0, consequently, V ≥ 1 is above threshold and V < 1 is below threshold. Nevertheless, results remain the same for both the scaled and unscaled equations, e.g. V st = −70 mV and V th = −54 mV. The initial voltage for each terminal is V (t=0) = 0 and J i = 1. The adaptation is done according to eq. (2), where , and Δt stands for the time between a sub-threshold stimulation and a spike, up to a cutoff at 50 ms. The parameter η is chosen randomly in the range [−0.5, 0.5] · 10 −3 , and A is the adaptation step taken as 0.05, unless otherwise is stated.
Refractory period. After a spike is generated, the terminal that evoked a spike cannot respond to other stimulations arriving in the following 2 ms. During this refractory period, all other terminals cannot evoke a spike or cross the threshold as well, but can increase their membrane potential as a result of stimulations.
Response failure. When crossing the threshold, the terminal creates a spike with probability of Δt · f c , where Δt is the time-lag from the last threshold crossing by this terminal, and f c reflects the maximal stationary firing frequency of the terminal. In case the terminal failed to respond its voltage is set to its previous value.
The parameters for feedforward networks. Number of terminals = 3, number of inputs per terminal = 5, refractory period = 2 ms, link weights are randomly chosen from a uniform distribution in the range [0.1, 1.1], delays (τ) are randomly chosen from a uniform distribution in the range [1,150] ms (Fig. 1c). Links are ordered with increasing delays, except the maximal delay which is linked to the first terminal (closing a loop). The dynamics is given by eq. (1) and is numerically solved with a time resolution of 1 ms. Initial terminal weights, J i , are set to 1. We assume large f c , hence response failures are excluded. In addition, in Fig. 1 η = 0. The robustness of the results to noise, η > 0, is demonstrated in Fig. 2. The upper bound for the terminal weights is J i = 10 and the lower bound is J i = 10 −6 .
The fraction of oscillations. The fraction of each type of dynamics was estimated using 20,000 random initial conditions for the delays, τ m , and the weights, W m , (defined above) for each number of inputs per terminal (Fig. 1d).
The number of attractors. Number of terminals = 3, number of inputs per terminal = 3. The average and the standard deviation of each point was obtained from 10-18 samples, each sample is with a fixed set of N delays (τ) and the initial conditions for the N weights are randomly sampled. In order to determine if two initial occurs before the weak one (right), explaining the emergence of the spontaneous spike ordering in (d 1 ). (e) The most probable scenario, a weak spike consecutive a strong one ((d 1 ) left), is demonstrated with accompanied sub-threshold stimulations and response failures (each background covers sub-threshold stimulations and the other color spike). Adaptation step of the strong spike is negative, ΔJ S < 0, since most of its sub-threshold stimulations are prior to the weak spike, whereas ΔJ w > 0 since all the weak sub-threshold stimulation occur after the strong spike. These stochastic opposite trends explain the mechanism for the emergence of the restoring force in (c). conditions lead to the same attractor, we compared the firing rate from each input link. We calculated the number of firing events for each link, and compared it with the same link from a simulation with different initial weights. If for all of the input links the difference is less than 2%, we determine that these different initial weights lead to the same attractor. For links that have low firing rates, the comparison was made between non-firing events. We obtained very similar results when the comparison was done between the firing timings for each link, instead of number of firing events (Fig. 1e).

Recurrent network parameters.
Number of terminals = 3, number of inputs per terminal = 60, f c = 15 Hz, refractory period = 2 ms, adaptation step A = 0.05. The dynamics of each node is given by eq. (1) and is solved with a time resolution of 0.1 ms. Link weights are randomly chosen from a uniform distribution in the range [0.1, 0.2], delays are randomly chosen from a normal distribution with a mean of 100 ms and STD of 2 ms, initial terminal weights, J i , are set to 1. In order to initiate the network simulation, 0.4 of the nodes in the network are stimulated above-threshold. Spontaneous noise, external above-threshold stimulations, is randomly added with an average frequency of 0.01 Hz per node (Fig. 3).
The ratio max/min of each weight (Fig. 3c) was calculated for the last 2 seconds of the simulation, out of 50 seconds for adaptive nodes and 350 seconds for adaptive links (same running time as for Fig. 3b). For networks of adaptive links (Fig. 3c 1 ) a fraction of the weights vanishes, hence the upper bound of the histogram is set to 60. The histograms (Fig. 3c 2 and c 3 ) constitute of 100 bins each. For visibility, points in the raster plots (Fig. 3d 1 and d 2 ) were 50% diluted.
The parameters for the feedforward network with random inputs. Number of terminals = 2, number of inputs per terminal = 60, f c = 15 Hz, refractory period = 2 ms, adaptation step A = 0.1, link weights are randomly chosen from a uniform distribution in the range [0.1, 0.2], initial terminal weights are set to 1 (Fig. 4). The dynamics is given by eq. (1) and is solved with a time resolution of 0.1 ms. Running time = 2500 seconds, where a transient of 200 seconds is excluded in the measurements. Strong and weak weights (Fig. 4b-e) were chosen such that 50% of the weights were between the maximum of the weak and the minimum of the strong effective weights, and in addition for each limit (maximum and minimum) 1% of the extreme weights were excluded. The force was calculated in bin size of 0.05 and defined as J J J ( ) − + , where 〈J〉 stands for the average bin value. The error bar (Fig. 4c) stands for the standard deviation of the adaptation steps belonging to each bin.  Fig. 4b, where P SW stands for the fraction of spikes generated by J S prior to a spike generated by J W in a time windows of 5 ms, and P WS represents the opposite scenario. Results were calculated starting from the minimal possible refractory period, 0.4 ms (black dashed line) where the log-normal distribution was obtained. (b) The most probable scenario, where a weak spike follows a strong one, is demonstrated with accompanied sub-threshold stimulations (solid thin lines) and response failures (dotted lines below the x-axis) for two cases. (b 1 ) Above the critical refractory period (RP C ), the time-lag between the spikes is long enough to ensure that most of the strong sub-threshold stimulations (thin purple lines) occur prior to the weak spike, resulting in a total negative adaptive step (ΔJ S < 0). (b 2 ) Below RP C , the time-lag between the spikes is too short, thus most of the strong sub-threshold stimulations (thin purple lines) occur after the weak spike, resulting in a total positive adaptive step (ΔJ S > 0). Since both ΔJ S and ΔJ W are positive, all weights are driven to be above-threshold.