Optimizing microcircuits through reward modulated STDP

a The eigenvalue (EV) spread is defined as the ratio $|\lambda_{\max}/\lambda_{\min}|$, where $\lambda_{\max}$ and $\lambda_{\min}$ are the maximum and minimum eigenvalues of the cross-correlation matrix of the liquid state.
b Each component of the liquid state $x(t)$ models the impact that a particular neuron v may have on the membrane potential of a generic readout neuron. Thus each spike of neuron v is replaced by a pulse whose amplitude decays exponentially with a time constant of 30 ms. In other words, $x(t)$ is obtained by applying a low-pass filter to the spike trains emitted by the neurons in the generic neural microcircuit model.
c Entropy is defined as $-\sum_{i=1}^{n} x_i(t) \cdot \ln(x_i(t))$, where $x_i(t)$ is the i-th component of the liquid state $x(t)$ and n is the number of neurons in the generic neural microcircuit.
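The three definitions above can be illustrated with a minimal NumPy sketch. The function names, the use of an uncentered matrix XᵀX/T as the "cross-correlation matrix", and the convention 0 · ln 0 = 0 are assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def liquid_state(spike_times, t, tau=0.030):
    """Low-pass filter spike trains: each past spike at time s adds exp(-(t - s)/tau)."""
    x = np.zeros(len(spike_times))
    for i, times in enumerate(spike_times):
        past = np.asarray([s for s in times if s <= t], dtype=float)
        x[i] = np.exp(-(t - past) / tau).sum()
    return x

def entropy(x, eps=1e-12):
    """-sum_i x_i ln x_i, skipping near-zero components (assumed 0 * ln 0 = 0)."""
    x = np.asarray(x, dtype=float)
    nz = x[x > eps]
    return float(-(nz * np.log(nz)).sum())

def eigenvalue_spread(states):
    """|lambda_max / lambda_min| of X^T X / T for a (T x n) matrix of liquid states."""
    X = np.asarray(states, dtype=float)
    R = X.T @ X / X.shape[0]          # assumed form of the cross-correlation matrix
    lam = np.linalg.eigvalsh(R)       # eigenvalues in ascending order
    return float(abs(lam[-1] / lam[0]))
```

For example, `liquid_state([[0.0], []], t=0.030)` filters one spike that is exactly one time constant old and one silent neuron. Note that the spread diverges when the smallest eigenvalue approaches zero, which happens whenever the sampled states do not span all n dimensions.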


Overview
What is reservoir computing?
A learning paradigm for recurrent neural circuits characterized by:
• A recurrent, randomly connected "reservoir" (e.g. a generic neural microcircuit) that delivers a high-dimensional projection of the low-dimensional input space.
• Simple linear readouts that can be trained (e.g. with linear regression) to compute the desired output from the current reservoir state.
• Such generic neural circuits can be used for several open-loop sensory processing tasks that can be carried out with rapidly fading memory.
• Providing tuned feedback from trained readouts enhances the computational power of such circuits, enabling tasks that require persistent or working memory.
• Traditionally, learning is constrained to the synapses projecting from the generic neural microcircuit to the linear readout, leaving the recurrent circuitry intact.
Figure 1: The initial "liquid computing" model of [1] and its subsequent expansion by allowing feedback [2] from trained linear readouts (dashed line). The circuit itself is a generic recurrent circuit, based on biological data (not constructed for any particular task).

How can we characterize if a given neural microcircuit is optimal for the class of computational operations that the readout has to perform on a certain input distribution?
Currently the only (rather tautological) answer is: a neural microcircuit is optimal if it yields accurate models. The question is difficult because the relation between the dynamical properties of the input and output signals and the properties of the induced circuit dynamics is not well understood [3].
It has been argued in [3] that such an optimal circuit should have certain desirable features. The parameters considered here are the eigenvalue (EV) spread, the entropy of the liquid state, the pairwise decorrelation, and the number of principal components needed to represent the liquid state.
We show that modifying the recurrent synapses of a neural circuit via STDP can optimize the circuit in an unsupervised fashion if the STDP is modulated by a global reward signal:
• Results indicate that the approach presented here leads to the convergence of the parameters mentioned above.
• The approach is quite robust and converges irrespective of whether the initial circuit dynamics are in a high or sparse firing regime.
• The network increases the recurrent inhibitory drive when the circuit is initialized with a high excitatory drive and decreases it when the circuit is drawn with a low excitatory drive; the convergence point exhibits slightly dominant inhibition.

Methods
• Generic Neural Microcircuit: 135 leaky integrate-and-fire neurons arranged on the grid points of a 3 × 3 × 15 cuboid in 3D, with 20% of the neurons randomly chosen to be inhibitory.
• Neuron Parameters: membrane time constant 30 ms, absolute refractory period 3 ms (excitatory neurons) or 2 ms (inhibitory neurons), threshold 15 mV (for a resting membrane potential assumed to be 0), reset potential drawn uniformly from the interval [13.8 mV, 14.5 mV].
• Connection probability: the probability of a synaptic connection from neuron a to neuron b (as well as that of a connection from neuron b to neuron a) was defined as $C \cdot \exp(-D^2(a, b)/\lambda^2)$, where D(a, b) is the Euclidean distance between neurons a and b and λ is a parameter that controls both the average number of connections and the average distance between neurons that are synaptically connected (we set λ = 2). Depending on whether the pre- and postsynaptic neurons were excitatory (E) or inhibitory (I), the value of C was set to 0.3 (EE), 0.2 (EI), 0.4 (IE), or 0.1 (II).
• Inputs to the circuit were 16 spike trains drawn at 40 Hz, each projected to approximately 63% of the neurons in the generic circuit.
• Initially the circuit was simulated once to measure the baseline values of the EV spread (ε), entropy (H), pair-wise decorrelation (ρ), and the number of principal components (N) needed to represent 95% of the information contained in the liquid state.
• Each trial lasted 200 ms and consisted of simulating the circuit with inputs drawn from the distribution and measuring the global reward signal at the end of the trial. The reward for a trial was computed from ∆(·), the percentage difference in the value of a parameter between the (n + 1)th and nth trial, with a scaling factor α set to 2.5.
• The recurrent synapses were modified using STDP, which was modulated by the reward signal. More precisely, the weight change ∆w depends on the time t_pre of a presynaptic spike, the time t_post of a postsynaptic spike, and the reward signal.

Other possible reward functions: statistical analysis
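As a rough illustration of the Methods, the sketch below wires a circuit with the stated grid size, inhibitory fraction, λ, and C values, and applies one reward-modulated weight update. Several parts are assumptions rather than the procedure used here: the connection rule C · exp(−D²(a, b)/λ²), reading the pair labels as (presynaptic, postsynaptic), the absence of self-connections, and the whole STDP window (exponential shape, `a_plus`, `a_minus`, `tau`, and multiplicative scaling by the reward).

```python
import numpy as np

def build_circuit(shape=(3, 3, 15), lam=2.0, frac_inhibitory=0.2, seed=0):
    """Place neurons on a 3D grid and sample distance-dependent connections."""
    rng = np.random.default_rng(seed)
    pos = np.array(list(np.ndindex(*shape)), dtype=float)  # one row per neuron
    n = len(pos)
    # Randomly mark a fraction of the neurons as inhibitory.
    inhibitory = np.zeros(n, dtype=bool)
    n_inh = int(round(frac_inhibitory * n))
    inhibitory[rng.choice(n, size=n_inh, replace=False)] = True
    # C[pre_type, post_type], 0 = excitatory, 1 = inhibitory (assumed ordering).
    C = np.array([[0.3, 0.2],   # EE, EI
                  [0.4, 0.1]])  # IE, II
    kind = inhibitory.astype(int)
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    p = C[kind[:, None], kind[None, :]] * np.exp(-d2 / lam**2)
    np.fill_diagonal(p, 0.0)            # assumption: no self-connections
    connected = rng.random((n, n)) < p  # connected[a, b]: synapse from a to b
    return pos, inhibitory, connected

def reward_modulated_stdp(w, t_pre, t_post, reward,
                          a_plus=0.01, a_minus=0.012, tau=0.020):
    """Weight after one pre/post pairing; hypothetical window scaled by the reward."""
    dt = t_post - t_pre
    if dt >= 0:   # pre before post: potentiation
        dw = a_plus * np.exp(-dt / tau)
    else:         # post before pre: depression
        dw = -a_minus * np.exp(dt / tau)
    return w + reward * dw
```

With these defaults, `build_circuit()` returns 135 neuron positions, a boolean mask with 27 inhibitory neurons, and a 135 × 135 boolean connectivity matrix. A reward of zero leaves the weight unchanged, and a negative reward reverses the sign of the update.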

Conclusions
• We have demonstrated that using STDP in conjunction with a global reward signal can adapt a circuit in an unsupervised fashion to tune important network parameters.
• This approach tunes the circuit such that the cross-correlation matrix of the liquid state has a low eigenvalue spread, and the circuit dynamics show higher entropy, high decorrelation, and a much higher number of principal components needed to represent the information contained in the liquid state.
• Moreover, the approach leads to convergence irrespective of the initial circuit dynamics being in a high or sparse firing regime with the convergence point exhibiting slightly dominant inhibition.
• Since each of the parameters tries to make individual reservoir units as "mutually different" as possible, several reward functions are plausible.

Open Questions and Future Work
• The method is heuristic. Further investigation is needed to determine why this approach leads to convergence.
• Determining the stopping point for the optimization process from the perspective of a linear readout.

Figure 2: (A) Change in weights through reward-modulated STDP after 10 trials. (B) Histograms showing the distribution of weights before (blue) and after (red) optimization.

Figure 4: Statistical analysis with several plausible reward functions. For each reward function, 20 runs were performed, each consisting of 10 trials. Since all parameters aim at making individual reservoir units as different as possible, the reward can be set to be proportional to any of these parameters. Shown are results for the evolution of (A) EV spread, (B) entropy, (C) pair-wise correlation, and (D) number of principal components needed to represent 95% of the information in the liquid state.
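Two of the quantities tracked in Figure 4, the pair-wise correlation and the number of principal components, can be computed from a matrix of sampled liquid states as in the sketch below. It assumes that "95% of the information" means 95% of the variance, and that pair-wise correlation is summarized as the mean absolute Pearson correlation over all neuron pairs. Both choices, and the function names, are illustrative.

```python
import numpy as np

def n_components_95(states, fraction=0.95):
    """Smallest number of principal components whose variance share reaches `fraction`."""
    X = np.asarray(states, dtype=float)
    X = X - X.mean(axis=0)                    # center each neuron's trace
    s = np.linalg.svd(X, compute_uv=False)    # singular values, descending
    var = s ** 2                              # proportional to component variances
    cum = np.cumsum(var) / var.sum()
    return int(np.searchsorted(cum, fraction) + 1)

def mean_pairwise_correlation(states):
    """Mean |Pearson r| over all distinct pairs of columns (neurons)."""
    X = np.asarray(states, dtype=float)
    r = np.corrcoef(X, rowvar=False)
    iu = np.triu_indices_from(r, k=1)         # upper triangle, excluding the diagonal
    return float(np.abs(r[iu]).mean())
```

Here `states` is a (T × n) array with one row per sampled time point. A neuron whose trace is constant has zero variance and yields NaN correlations, so such columns would need to be removed first.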