STDP allows close-to-optimal spatiotemporal spike pattern detection by single coincidence detector neurons

By recording multiple cells simultaneously, electrophysiologists have found evidence for repeating spatiotemporal spike patterns, which can carry information. How this information is extracted by downstream neurons is unclear. In this theoretical paper, we investigate to what extent a single cell could detect a given spike pattern and what the optimal parameters to do so are, in particular the membrane time constant $\tau$. Using a leaky integrate-and-fire (LIF) neuron with instantaneous synapses and homogeneous Poisson input, we were able to compute this optimum analytically. Our results indicate that a relatively small $\tau$ (at most a few tens of ms) is usually optimal, even when the pattern is much longer. This is somewhat counterintuitive, as the resulting detector ignores most of the pattern, due to its fast memory decay. Next, we wondered if spike-timing-dependent plasticity (STDP) could enable a neuron to reach the theoretical optimum. We simulated a LIF neuron equipped with additive spike-timing-dependent potentiation and homeostatic rate-based depression, and repeatedly exposed it to a given input spike pattern. As in previous studies, the LIF progressively became selective to the repeating pattern with no supervision, even when the pattern was embedded in Poisson activity. Here we show that, using certain STDP parameters, the resulting pattern detector can be optimal. Taken together, these results may explain how humans can learn repeating visual or auditory sequences. Long sequences could be recognized thanks to coincidence detectors working at a much shorter timescale. This is consistent with the fact that recognition is still possible if a sound sequence is compressed, played backward, or scrambled using 10ms bins. Coincidence detection is a simple yet powerful mechanism, which could be the main function of neurons in the brain.


Introduction
Electrophysiologists report the existence of repeating spike sequences involving multiple cells, also called "spatiotemporal spike patterns", with precision in the millisecond range, both in vitro and in vivo, lasting from a few tens of ms to several seconds [Tiesinga et al., 2008]. In sensory systems, different stimuli evoke different spike patterns (also called "packets") [Luczak et al., 2015]. In other words, the spike patterns contain information about the stimulus. How this information is extracted by downstream neurons is unclear. Can it be done by neurons only one synapse away from the recorded neurons? Or are multiple integration steps needed? Can it be done by simple coincidence detector neurons, or should other temporal features, such as spike ranks, be taken into account? Here we wondered how far we can go with the simplest scenario: the readout is done by simple coincidence detector neurons only one synapse away from the neurons involved in the repeating pattern. We demonstrate that this approach can lead to very robust pattern detectors, provided that the membrane time constants are relatively short, possibly much shorter than the pattern duration.
In addition, it is known that mere repeated exposure to meaningless sensory sequences facilitates their recognition afterwards, in the visual [Gold et al., 2014] and auditory modalities [Agus et al., 2010, Andrillon et al., 2015, Viswanathan et al., 2016], even when the subjects were unaware of these repetitions. Thus, an unsupervised learning mechanism must be at work. It could be the so-called spike-timing-dependent plasticity (STDP). Indeed, some theoretical studies by us and others have shown that neurons equipped with STDP can become selective to arbitrary repeating spike patterns, even without supervision [Gilson et al., 2011, Humble et al., 2012, Hunzinger et al., 2012, Kasabov et al., 2013, Klampfl and Maass, 2013, Krunglevicius, 2015, Masquelier et al., 2008, 2009, Nessler et al., 2013, Sun et al., 2016, Yger et al., 2015]. Using numerical simulations, we show here that the resulting detectors can be close to the theoretical optimum.

Formal description of the problem
We address the problem of detecting a spatiotemporal spike pattern with a single LIF neuron. Intuitively, one should connect the LIF to the neurons that are particularly active during the pattern, or during a subsection of it. That way, the LIF will tend to be more activated by the pattern than by some other input. More formally, we denote by L the pattern duration, and by N the number of neurons it involves. We call Strategy #n the strategy which consists in connecting the LIF to the M neurons that emit at least n spike(s) during a certain time window ∆t ≤ L of the pattern. Strategy #1 is illustrated on Figure 1.
We hypothesize that all afferent neurons fire according to a homogeneous Poisson process with rate f, both inside and outside the pattern. That is, the pattern corresponds to one realization of the Poisson process, which can be repeated (this is sometimes referred to as "frozen noise"). To model jitter, at each repetition a random time lag is added to each spike, drawn from a uniform distribution over [−T, T] (a normal distribution is more often used, but it would not allow analytical treatment, see next section).
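This input model is easy to sketch in code. Below is a minimal NumPy illustration (with our own helper names, not the paper's Matlab code): the frozen-noise pattern is drawn once, and fresh uniform jitters are applied at each repetition.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pattern(N, L, f):
    """One realization of a homogeneous Poisson process: each of the N
    afferents emits Poisson-many spikes at rate f, uniform over [0, L).
    This "frozen noise" is what gets replayed at each repetition."""
    counts = rng.poisson(f * L, size=N)            # spike count per afferent
    neurons = np.repeat(np.arange(N), counts)
    times = rng.uniform(0.0, L, size=counts.sum())
    return neurons, times

def jitter(times, T):
    """Independent uniform time lags in [-T, T], redrawn at each repetition."""
    return times + rng.uniform(-T, T, size=times.size)

# e.g. N = 10^4 afferents, L = 20 ms pattern, f = 5 Hz, T = 5 ms jitter
neurons, times = make_pattern(10_000, 0.020, 5.0)
times_rep = jitter(times, 0.005)
```

Jittering shifts spikes but neither adds nor removes them, which is the property exploited later when computing the trapezoidal input current.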
We also assume that synapses are instantaneous (i.e. excitatory postsynaptic currents are made of Diracs), which facilitates the analytic calculations.
For now we ignore the LIF threshold, and we want to optimize its signal-to-noise ratio (SNR), defined as:

$$\mathrm{SNR} = \frac{V_{\max} - V_{\text{noise}}}{\sigma_{\text{noise}}} \qquad (1)$$

where $V_{\max}$ is the maximal potential reached during the pattern presentation, $V_{\text{noise}}$ is the mean value of the potential with Poisson input (noise period), and $\sigma_{\text{noise}}$ its standard deviation (see Figure 1).

Fig. 1: Detecting a spike pattern with a LIF neuron. (Top) Raster plot of N = 10^4 neurons firing according to a homogeneous Poisson process. A pattern of duration L can be repeated (frozen noise). Here we illustrate Strategy #1, which consists in connecting the LIF to all the neurons that fire at least once during a certain time window of the pattern, with duration ∆t ≤ L. These neurons emit the red spikes. Of course they also fire outside of the ∆t window. (Bottom) Typically the LIF's potential will be particularly high when integrating the spikes of the ∆t window, much higher than with random Poisson inputs, and we want to optimize this difference, or more precisely the signal-to-noise ratio (SNR, see text).

A theoretical optimum

Deriving the SNR analytically
We now want to calculate the SNR analytically. In this section, we assume unitary synaptic weights.
Since the LIF has instantaneous synapses, and the input spikes are generated with a Poisson process, we have $V_{\text{noise}} = \tau f M$ and $\sigma_{\text{noise}} = \sqrt{\tau f M / 2}$, where $\tau$ is the membrane time constant [Burkitt, 2006]. The number of selected afferents M depends on the strategy number n. The probability that an afferent fires k times in the ∆t window is given by the Poisson probability mass function: $P(k \text{ spikes}) = \lambda^k e^{-\lambda}/k!$, with $\lambda = f \Delta t$. The probability that an afferent fires at least n times is thus $1 - e^{-\lambda} \sum_{k=0}^{n-1} \lambda^k/k!$, and finally, on average:

$$M = N \left( 1 - e^{-\lambda} \sum_{k=0}^{n-1} \frac{\lambda^k}{k!} \right) \qquad (2)$$

We now need to estimate $V_{\max}$. Intuitively, during the ∆t window, the effective input spike rate, which we call r, is typically higher than fM, because we deliberately chose the most active afferents. For example, using Strategy #1 with ∆t = 10 ms ensures that this rate is at least 100 Hz per afferent, even if f is only a few Hz. More formally, Strategy #n discards the afferents that emit fewer than n spikes. On average these discarded afferents account for $N e^{-\lambda} \sum_{k=1}^{n-1} k \lambda^k / k!$ of the $N f \Delta t$ spikes of the window. Thus, on average:

$$r = N f \left( 1 - e^{-\lambda} \sum_{k=0}^{n-2} \frac{\lambda^k}{k!} \right) \qquad (3)$$

We note $V_\infty = \tau r$ the mean potential of the steady regime that would be reached if ∆t was infinite. We now want to compute the transient response. The LIF with instantaneous synapses and unitary synaptic weights obeys the following differential equation:

$$\tau \frac{dV}{dt} = -V + \tau \sum_i \delta(t - t_i) \qquad (4)$$

where the $t_i$ are the presynaptic spike times. We first make the approximation of continuity, and replace the sum of Diracs by an equivalent firing rate R(t):

$$\tau \frac{dV}{dt} = -V + \tau R(t) \qquad (5)$$

R(t) should be computed on a time bin which is much smaller than τ, but yet contains many spikes, to avoid discretization effects. In other words, this approximation of continuity is only valid for a large number of spikes in the integration window, that is if rτ ≫ 1, which for Strategy #1 leads to Nfτ ≫ 1.
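Both quantities are straightforward to evaluate numerically. The following sketch (our own helper names) computes the expected number of selected afferents M and the effective rate r for a given strategy number n; for Strategy #1 no spike of the window is discarded, so r reduces to Nf.

```python
import math

def expected_M(N, f, dt, n):
    """Expected number of afferents firing at least n times in the
    Delta-t window; spike counts are Poisson with lambda = f*dt."""
    lam = f * dt
    p_lt_n = math.exp(-lam) * sum(lam**k / math.factorial(k) for k in range(n))
    return N * (1.0 - p_lt_n)

def effective_rate(N, f, dt, n):
    """Total expected spike rate r of the selected afferents during the
    window: the N*f*dt pattern spikes minus those of discarded afferents."""
    lam = f * dt
    kept = lam - math.exp(-lam) * sum(k * lam**k / math.factorial(k)
                                      for k in range(1, n))
    return N * kept / dt

# Example: N = 10^4, f = 5 Hz, dt = 20 ms
M1 = expected_M(10_000, 5.0, 0.020, 1)      # ~952 afferents
r1 = effective_rate(10_000, 5.0, 0.020, 1)  # exactly N*f = 50 kHz
```

As expected, higher strategy numbers select fewer afferents but with a higher per-afferent rate during the window (at least n/∆t each).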
Note that R(t) = f M during the noise period, and R(t) = r during the ∆t window (in the absence of jitter).

At this point it is convenient to introduce the reduced potential

$$v = \frac{V - V_{\text{noise}}}{V_\infty - V_{\text{noise}}}$$

which obeys the following differential equation:

$$\tau \frac{dv}{dt} = -v + i(t) \qquad (6)$$

where

$$i(t) = \frac{R(t) - fM}{r - fM}$$

is the dimensionless input current, such that i = 0 during the noise period (when the input spike rate is fM), and i = 1 when the input spike rate is r.

Fig. 2: The x-axis is time, and the y-axis is spike number (arbitrary, so we order them by increasing added jitter, which is a random variable uniformly distributed over [−T, T]). Dashed (resp. solid) lines correspond to the boundaries of the raster plot before (resp. after) adding jitter. The left (resp. right) panel illustrates the ∆t > 2T case (resp. ∆t < 2T). (Bottom) We plotted the corresponding spike time histograms, or, equivalently, under the approximation of continuity, i(t). One can easily compute t1 = t3 = min(∆t, 2T), t2 = |∆t − 2T|, and h = min(1, ∆t/2T). One can check that the trapezoidal area is ∆t whatever T (jittering neither adds nor removes spikes).
Without jitter, i(t) would be a simple step function, equal to 1 during the ∆t window, and 0 elsewhere. Adding jitter, however, turns i(t) into a trapezoidal function, which can be calculated (see Fig. 2). Now that i(t) is known, one can compute v(t) by integrating Equation 6.
The response of the LIF to an arbitrary current i(t) is [Tuckwell, 1988]:

$$v(t) = v(0) e^{-t/\tau} + \frac{1}{\tau} \int_0^t e^{-(t-s)/\tau} i(s) \, ds \qquad (7)$$

With i = as + b, and given that a primitive of $t e^t$ is $t e^t - e^t$, the integral can be computed exactly:

$$v(t) = v(0) e^{-t/\tau} + a(t - \tau) + b + (a\tau - b) e^{-t/\tau} \qquad (8)$$

Note that a jitter distribution other than uniform (e.g. normal) would not lead to a piecewise linear function for i(t), and thus would typically not permit exact integration like here. Since i(t) is linear on each of the intervals [0, t1], [t1, t1 + t2] and [t1 + t2, t1 + t2 + t3], v(t), which low-pass filters i(t), can be computed exactly on each piece. Starting from v(0) = 0, one can thus compute successively:

$$v(t_1) = \frac{1}{2T} \left( t_1 - \tau + \tau e^{-t_1/\tau} \right) \qquad (9)$$

$$v(t_1 + t_2) = v(t_1) e^{-t_2/\tau} + h \left( 1 - e^{-t_2/\tau} \right) \qquad (10)$$

As illustrated on Figure 3, the maximum of v is reached during the descending edge, at time $t_1 + t_2 + t_{\max}$. At the maximum dv/dt = 0, so Equation 6 imposes v = i there, and on the descending edge i decreases with slope −1/(2T), hence:

$$v_{\max} = h - \frac{t_{\max}}{2T} \qquad (11)$$

which means that the maximum is on the trapezoid edge, which is logical: before the crossing i > v, so v increases; after the crossing i < v, so v decreases. Using Equation 8 on the descending edge and differentiating it, one finds:

$$t_{\max} = \tau \ln \left( 1 + \frac{2T}{\tau} \left( h - v(t_1 + t_2) \right) \right) \qquad (12)$$

Plugging this $t_{\max}$ value into Equation 11, and expliciting all variables, we have:

$$v_{\max} = h - \frac{\tau}{2T} \ln \left( 1 + \frac{2T}{\tau} \left( h - v(t_1 + t_2) \right) \right) \qquad (13)$$

One can check that if T ≪ τ and T ≪ ∆t, then $v_{\max} \sim 1 - e^{-\Delta t/\tau}$, which is the classical response of a LIF to a step current.
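The closed-form maximum can be cross-checked by brute-force integration of Equation 6. The sketch below is our own illustration (not the paper's code): it evaluates v_max both from the piecewise-exact solution and by forward-Euler integration of the trapezoidal input current.

```python
import math
import numpy as np

def vmax_closed(dt_win, T, tau):
    """v_max from the piecewise-exact solution of tau dv/dt = -v + i(t),
    for the trapezoidal i(t) with uniform jitter in [-T, T]."""
    t1 = min(dt_win, 2 * T)          # rising edge, slope 1/(2T)
    t2 = abs(dt_win - 2 * T)         # plateau at height h
    h = min(1.0, dt_win / (2 * T))
    v1 = (t1 - tau + tau * math.exp(-t1 / tau)) / (2 * T)
    v2 = v1 * math.exp(-t2 / tau) + h * (1 - math.exp(-t2 / tau))
    t_max = tau * math.log(1 + 2 * T * (h - v2) / tau)
    return h - t_max / (2 * T)

def vmax_euler(dt_win, T, tau, step=1e-6):
    """Brute force: forward-Euler integration of tau dv/dt = -v + i(t)."""
    t1 = min(dt_win, 2 * T)
    t2 = abs(dt_win - 2 * T)
    h = min(1.0, dt_win / (2 * T))
    ts = np.arange(0.0, 2 * t1 + t2, step)
    i = np.interp(ts, [0.0, t1, t1 + t2, 2 * t1 + t2], [0.0, h, h, 0.0])
    v, vmax = 0.0, 0.0
    for cur in i:
        v += step * (cur - v) / tau
        vmax = max(vmax, v)
    return vmax
```

For instance, with ∆t = 20 ms, T = 5 ms and τ = 10 ms, both methods give v_max ≈ 0.79, and when T is much smaller than τ and ∆t the closed form approaches 1 − e^(−∆t/τ), as stated above.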
From the definition of v:

$$V_{\max} = V_{\text{noise}} + v_{\max} \, \tau (r - fM) \qquad (14)$$

We now have everything we need to compute the signal-to-noise ratio:

$$\mathrm{SNR} = \frac{V_{\max} - V_{\text{noise}}}{\sigma_{\text{noise}}} = v_{\max} (r - fM) \sqrt{\frac{2\tau}{fM}} \qquad (15)$$

Numerical validation
We verified the exact Equation 15 through numerical simulations. We used a clock-based approach, and integrated Equation 4 using the forward Euler method with a 0.1ms time bin. We generated 100 random Poisson patterns of duration L = 20ms, involving N = 10^4 neurons with rate f = 5Hz. We chose ∆t = L = 20ms, i.e. the LIF was connected to all the afferents that emitted at least n spikes during the whole pattern, n being the strategy number. In order to estimate V_max, each pattern was presented 1000 times, every 400ms. Between pattern presentations, the afferents fired according to a Poisson process, still with rate f = 5Hz, which allowed us to estimate V_noise and σ_noise. We could thus compute the SNR from Equation 15 (and its standard deviation across the 100 patterns), which, as can be seen on Figure 4, matches the theoretical values very well, for strategies 1 and 2.
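In the same clock-based spirit, the noise statistics alone can be checked in a few lines. This is our own simplified sketch (it pools the M afferents into a single Poisson input): the mean and standard deviation of the free potential should approach τfM and √(τfM/2) [Burkitt, 2006].

```python
import numpy as np

rng = np.random.default_rng(1)

tau, f, M, step = 0.010, 5.0, 200, 1e-4    # 10 ms, 5 Hz, 200 afferents, 0.1 ms
n_steps = 1_000_000                         # 100 s of simulated noise
spikes = rng.poisson(f * M * step, size=n_steps)  # pooled input per time bin
V = tau * f * M                             # start at the mean, skip the transient
trace = np.empty(n_steps)
for k in range(n_steps):
    V += -V * step / tau + spikes[k]        # forward Euler + instantaneous synapses
    trace[k] = V

print(trace.mean())  # close to tau*f*M = 10
print(trace.std())   # close to sqrt(tau*f*M/2) ~ 2.24
```

The small residual bias comes from the forward-Euler discretization; it shrinks with the time bin.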

Optimizing the SNR
We now want to optimize the SNR given by Equation 15. We consider that f and T are external variables, and that we have the freedom to choose the strategy number n, τ and ∆t. We also add the constraint τfM ≥ 10, so that the approximation of continuity is reasonable, even in the noise periods. We assume that L is sufficiently large so that an upper bound for ∆t is not needed. We used the Matlab R2015b Optimization Toolbox (MathWorks Inc., Natick, MA, USA) to compute the optimum numerically. Figure 5 illustrates the results. One can make the following observations:
• Strategy #1 is usually the best for f and T in the biological ranges (see below), while higher numbers are optimal for very large f and T (see panel A). This means that emitting a single spike is already a significant event, that should not be ignored. We will come back to this point in the discussion.
• Unsurprisingly, optimal τ and ∆t typically have the same order of magnitude (∆t being slightly larger, see panel C). Unless T is high (>10ms) or f is low (<1Hz), these timescales should be relatively small (at most a few tens of ms). This means that even a long pattern (hundreds of ms or above) is optimally detected by a coincidence detector working at a shorter timescale. This could explain the apparent paradox between typical ecological stimulus durations (hundreds of ms or above) and the neuronal integration timescales (at most a few tens of ms).
• The constraint τ f M ≥ 10 imposes larger τ when both f and T are small (panel B, lower left). In the other cases, it is naturally satisfied.
• Unsurprisingly, the optimal SNR decreases with T. What is more surprising is that it also decreases with f. In other words, sparse activity is preferable. We will come back to this point in the discussion.
What is the biological range for f and T? It is worth mentioning that f is probably largely overestimated in the electrophysiological literature, because the recording techniques totally ignore the cells that do not fire. Furthermore, experimentalists tend to select the most responsive cells, and to search for stimuli that elicit strong responses. Mean firing rates, averaged across time and cells, could be smaller than 1 Hz [Shoham et al., 2006].
T corresponds to the spike time precision. Millisecond precision in cortex has been reported [Havenith et al., 2011, Kayser et al., 2010, Panzeri and Diamond, 2010]. We are aware that other studies found poorer precision, but this could be due to uncontrolled variables or the use of inappropriate reference times [Masquelier, 2013].
We now focus, as an example, on the point in the middle of the T × f plane, whose parameters are gathered in Table 1. The resulting SNR is very high (about 80). In other words, it is possible to choose a threshold for the LIF which will be reached when the pattern is presented, but hardly ever during the noise periods.
In the next section, we investigated, through numerical simulations, if STDP can find this optimum. More specifically, since STDP does not adjust τ, we set it to the optimal value in Table 1 and investigated whether STDP could lead to the optimal n and ∆t.

Simulations show that STDP can be close-to-optimal

The setup we used was similar to the one of our previous studies [Gilson et al., 2011, Masquelier et al., 2008]. We simulated a LIF neuron connected to all of the N = 10^4 afferents with plastic synaptic weights $w_i \in [0, 1]$, obeying the following differential equation:

$$\tau \frac{dV}{dt} = -V + \tau \sum_{i=1}^{N} w_i \sum_j \delta(t - t_i^j) \qquad (16)$$

where the $t_i^j$ are the spike times of afferent i. Initial synaptic weights were all equal. Then these synaptic weights evolved in [0, 1] with additive, all-to-all spike STDP as in Song et al. [2000]. Yet we only modeled the long-term potentiation (LTP) part of STDP, ignoring its long-term depression (LTD) term. Here LTD was modeled by a simple homeostatic term w_out < 0, which is added to each synaptic weight at each postsynaptic spike [Kempter et al., 1999]. Note that using a spike-timing-dependent LTD could also lead to the detection of a repeating pattern, as demonstrated in our earlier studies [Masquelier et al., 2008, 2009], but less robustly, because it is more difficult to depress the synapses corresponding to afferents that do not spike in the repeating pattern.
As in Song et al. [2000], at each synapse i, we introduce the trace of presynaptic spikes $A^i_{\text{pre}}$, which obeys the following differential equation:

$$\tau_{\text{pre}} \frac{dA^i_{\text{pre}}}{dt} = -A^i_{\text{pre}} \qquad (17)$$

Furthermore:
• At each presynaptic spike at synapse i: $A^i_{\text{pre}} \rightarrow A^i_{\text{pre}} + \delta A_{\text{pre}}$.
• At each postsynaptic spike: $w_i \rightarrow w_i + A^i_{\text{pre}} + w_{\text{out}}$ for i = 1..N, then the weights are clipped in [0, 1].
We used δA pre = 0.01 and τ pre = 20ms, while w out and the LIF threshold θ were systematically varied (see below). The refractory period was ignored for simplicity.
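This learning rule can be written in event-driven form, decaying each presynaptic trace lazily between events. The following sketch is a minimal re-implementation with our own variable names (the paper's simulations were clock-based Matlab code); the w_out value is the one reported below for the first optimal mode.

```python
import numpy as np

N = 1000
tau_pre, dA_pre, w_out = 0.020, 0.01, -3.5e-3
w = np.full(N, 0.5)            # plastic weights, kept in [0, 1]
A_pre = np.zeros(N)            # presynaptic traces A^i_pre
t_last = np.zeros(N)           # time of each trace's last update

def on_pre_spike(i, t):
    """Presynaptic spike at synapse i: decay the trace, then increment it."""
    A_pre[i] = A_pre[i] * np.exp(-(t - t_last[i]) / tau_pre) + dA_pre
    t_last[i] = t

def on_post_spike(t):
    """Postsynaptic spike: additive LTP from the decayed traces, plus the
    homeostatic term w_out, then clipping to [0, 1]."""
    A_pre[:] = A_pre * np.exp(-(t - t_last) / tau_pre)
    t_last[:] = t
    w[:] = np.clip(w + A_pre + w_out, 0.0, 1.0)

# a pre spike on synapse 0, then a post spike 10 ms later:
on_pre_spike(0, 0.000)
on_post_spike(0.010)
# synapse 0 is potentiated (dA_pre*exp(-0.5) + w_out > 0), while all
# synapses that stayed silent are slightly depressed by w_out
```

With these parameters a synapse is potentiated only if its afferent fired within a few tens of ms before the postsynaptic spike, which is what progressively concentrates the weights on a subsection of the pattern.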
We used a clock-based approach, and integrated Equations 16 and 17 using the forward Euler method with a 0.1ms time bin. The Matlab code for these simulations will be made available in ModelDB [Hines et al., 2004] once this paper is accepted in a peer-reviewed journal.
We now describe the way the input spikes were generated. Between pattern presentations, the input spikes were generated randomly with a homogeneous Poisson process with rate f (see Table 1). The spike pattern with duration L = 100ms was generated only once, using the same Poisson process (frozen noise). The pattern presentations occurred every 400ms (in previous studies, we demonstrated that irregular intervals did not matter [Gilson et al., 2011, Masquelier et al., 2008], so here regular intervals were used for simplicity). At each pattern presentation, all the spike times were shifted independently by some random jitter uniformly distributed over [−T, T] (see Table 1).

Results: two optimal modes
The theory developed in the previous sections ignored the LIF threshold (a difference of unconstrained potential was maximized). But in the simulations, one needs a threshold to have postsynaptic spikes, necessary for STDP. Since we did not know which threshold values θ could lead to the optimal ∆t, we performed an exhaustive search over threshold values, using a geometric progression with a 1.1 ratio. Note that (from Equation 16) the threshold θ can be interpreted as the number of synchronous presynaptic spikes needed to reach the threshold from the resting potential if these spikes arrive through maximally reinforced synapses (w = 1).
We also used a geometric progression with a 1.1 ratio to search for w_out. This parameter tunes the strength of the LTD relative to the LTP, and thus influences the number of reinforced synapses after convergence. For each θ × w_out point, 100 simulations were performed with different random patterns, and we computed the proportion p of "optimal" ones (see below for the definition).
The initial weights were computed such that V_noise = θ + 2σ_noise (leading to an initial firing rate of about 20Hz, see Fig. 6 top). After 500 pattern presentations, the synaptic weights converged by saturation. That is, synapses were either completely depressed (w = 0) or maximally reinforced (w = 1), as usual with additive STDP [Gütig et al., 2003, Song et al., 2000, van Rossum et al., 2000]. A simulation was considered optimal if the reinforced synapses corresponded to a set of afferents which fired at least once (Strategy #1) in a subsection of the pattern, whose duration had to match the optimal ∆t window given in Table 1 (with a 10% margin). In practice this subsection typically corresponded to the beginning of the pattern, because STDP tracks back through the pattern [Gilson et al., 2011, Masquelier et al., 2008], but this is irrelevant here.
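For concreteness, such an optimality criterion can be sketched as a simple check. The helper below is our own simplified version (the window bounds are estimated heuristically from the reinforced afferents' pattern spikes; the actual criterion may have been implemented differently):

```python
import numpy as np

def is_optimal(w, neurons, times, dt_opt, margin=0.10):
    """True if the reinforced synapses (w close to 1) correspond to the
    Strategy-#1 afferents of some pattern window of duration ~dt_opt.
    neurons/times give the (afferent, spike time) pairs of the pattern."""
    reinforced = np.flatnonzero(w > 0.5)
    spike_times = times[np.isin(neurons, reinforced)]
    lo, hi = spike_times.min(), spike_times.max()  # heuristic window bounds
    if not (1 - margin) * dt_opt <= hi - lo <= (1 + margin) * dt_opt:
        return False
    in_window = np.unique(neurons[(times >= lo) & (times <= hi)])
    return np.array_equal(np.sort(reinforced), in_window)
```

The check passes only when the window duration matches ∆t within the margin and the reinforced set is exactly the set of afferents spiking inside that window.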
We found two optimal modes (see Fig. 6). The first one, with a high threshold (θ = 370) and strong LTD (w_out = −3.5 × 10^{−3}), led to 1 postsynaptic spike at each pattern presentation (as in our previous studies [Gilson et al., 2011, Masquelier et al., 2008, 2009]). For this mode, p = 51%. The second mode, with a lower threshold (θ = 250) and weaker LTD (w_out = −1.6 × 10^{−3}), led to 2 postsynaptic spikes at each pattern presentation, and p = 87% (the lower threshold increases the probability of false alarms during the noise period, but this problem could be solved by requiring two consecutive spikes for pattern detection). Figure 6 illustrates an optimal simulation for both modes. We conclude that for most patterns, STDP can turn the LIF neuron into an optimal, or close-to-optimal, pattern detector.
Detection is optimal only after convergence (i.e. weight binarization), which takes time (about 500 pattern presentations). This is because the learning rate we used is weak (δA_pre = 0.01; in other words, the maximal weight increase caused by one pair of pre- and postsynaptic spikes is only 1% of the maximal weight), as in other theoretical studies and in accordance with experimental measurements [Masquelier et al., 2008, 2009, Song et al., 2000, Yger et al., 2015]. By using a higher rate, it is possible to converge faster, at the expense of robustness. For example with δA_pre = 0.02, convergence occurs in ∼ 250 pattern presentations, but p decreases to 44% and 80% for modes #1 and #2 respectively. In any case, it is worth mentioning that (suboptimal) selectivity emerges way before convergence (e.g. around t ∼ 5s, or ∼ 12 pattern presentations in Figure 6).
Critically, for successful learning the pattern presentation rate must be high in the early phase of learning, before selectivity emerges. For example, presenting the pattern every 800ms instead of 400ms leads to p = 33% and 43% for modes #1 and #2 respectively. Once selectivity has emerged, this rate has much less impact, since the neuron tends to fire (and thus changes its weights) only at pattern presentations, whatever the intervals between them.

Discussion
One of the main results of this study is that even a long pattern (hundreds of ms or above) is optimally detected by a coincidence detector working at a shorter timescale (tens of ms), which thus ignores most of the pattern. One could have thought that using τ ∼ L, in order to integrate all the spikes of the pattern, would be the best strategy. Instead, it is better to use a subpattern as the signature for the whole pattern (see Fig. 5).
We also demonstrated that STDP can find the optimal signature in an unsupervised manner, by mere pattern repetitions. Note that the problem that STDP solves here is similar to the one addressed by the Tempotron [Gütig and Sompolinsky, 2006], which finds the best spike coincidence to separate two (classes of) patterns, by emitting or not a postsynaptic spike. Recently, the framework has been extended to fire more than one spike per pattern [Gütig, 2016], like here (e.g. Neuron #2 in Fig. 6). Yet these mechanisms require supervision.
In this work we only considered single cell readout. But of course in the brain, it is likely that a population of cells is involved, and these cells could learn different subpatterns (lateral inhibition could encourage them to do so [Masquelier et al., 2009]). If each cell is selective to a subpart of the repeating pattern, how can one make a full pattern detector? One solution is to use one downstream neuron with appropriate delay lines [Carr and Konishi, 1988]. Specifically, the conduction delays should compensate for the differences of latencies, so that the downstream neuron receives the input spikes simultaneously if and only if the subpatterns are presented in the correct order. Another solution would be to convert the spatiotemporal firing pattern into a spatial one, using neuronal chains with delays as suggested by Tank and Hopfield [Tank and Hopfield, 1987]. Such a spatial pattern -a set of simultaneously active neurons -can then be learned by one downstream neuron equipped with STDP, and fully connected to the neuronal chains, as demonstrated in Larson et al. [2010].
It is also conceivable that the whole pattern is detected based on the mere number of subpattern detectors' spikes, ignoring their times. Two studies in the human auditory system are consistent with this idea: after learning meaningless white noise sounds, recognition is still possible if the sounds are compressed or played backward [Agus et al., 2010], or chopped into 10ms bins that are then played in random order [Viswanathan et al., 2016].
Our theoretical study suggests that synchrony is an important part of the neural code [Stanley et al., 2012], that it is computationally efficient [Brette, 2012, Gütig and Sompolinsky, 2006], and that coincidence detection is the main function of neurons [Abeles, 1982, König et al., 1996]. In line with this proposal, neurons in vivo appear to be mainly fluctuation-driven, not mean-driven [Brette, 2012, 2015]. It remains unclear if other spike time aspects such as ranks [Thorpe and Gautrais, 1998] also matter.
Our results show that, somewhat surprisingly, lower firing rates lead to better signal-to-noise ratios. This could explain why average firing rates are so low in the brain, possibly smaller than 1 Hz [Shoham et al., 2006]. It seems like neurons only fire when they need to signal an important event, and that every spike matters [Wolfe et al., 2010].

Fig. 6: Cyan rectangles indicate pattern presentations. Next, we represented the weights corresponding to the rightmost time point in two different ways. First, we plotted the spike pattern, coloring the spikes as a function of the corresponding synaptic weight for each neuron: blue for low weight, purple for intermediate weight, and red for high weight. Initial weights were uniform (we used 0.68 for Neuron #1 and 0.47 for Neuron #2, in order to have V_noise = θ + 2σ_noise). We also plotted the weight histogram for each neuron. (Middle) During learning. Selectivity emerges at t ∼ 5s, after ∼ 12 pattern presentations. Yet the weights still have intermediate values, leading to a suboptimal SNR. (Bottom) After convergence. For both neurons, STDP has concentrated the weights on the afferents which fire at least once in a ∼ 23 ms long window, located at the beginning of the pattern. This results in 1 and 2 postsynaptic spikes for Neuron #1 and #2 respectively each time the pattern is presented. Elsewhere both V_noise and σ_noise are low, resulting in an optimal SNR.