Time-warp invariant pattern detection with bursting neurons

Sound patterns are defined by the temporal relations of their constituents, individual acoustic cues. Auditory systems need to extract these temporal relations to detect or classify sounds. In various cases, ranging from human speech to communication signals of grasshoppers, this pattern detection has been found to display invariance to temporal stretching or compression of the sound signal (‘linear time-warp invariance’). In this work, a four-neuron network model is introduced, designed to solve such a detection task for the example of grasshopper courtship songs. As an essential ingredient, the network contains neurons with intrinsic bursting dynamics, which allow them to encode durations between acoustic events in short, rapid sequences of spikes. As shown by analytical calculations and computer simulations, these neuronal dynamics result in a powerful mechanism for temporal integration. Finally, the network reads out the encoded temporal information by detecting equal activity of two such bursting neurons. This leads to the recognition of rhythmic patterns independent of temporal stretching or compression.


Introduction
Acoustic signals have an intrinsically temporal structure. Animals and humans extract meaning from sound patterns based on precise temporal relations between acoustic events. A spoken word, for example, is identified by its sequence of phonemes. Interestingly, this identification appears to be independent of the absolute timescale and instead relies primarily on relative comparisons of event durations; understanding a spoken word is largely independent of the speed at which it is articulated [1]- [3].
This invariance appears to add substantial complexity to the auditory detection task. Nonetheless, similar phenomena can even be found for the small nervous systems of insects. Certain grasshopper species, for example, detect their species-specific acoustic communication signals largely independent of stretch or compression in time [4]. In the following, we will investigate a small, biologically plausible model of a neuronal network that achieves similar pattern detection performance.
To detect acoustic signals, a neural system has to cope with some fundamental challenges: (1) Temporal integration: relevant sensory information is contained in structures that are distributed over time, and the neural system thus needs to integrate this information, for example, to assess the duration between acoustic events [5]-[7]. (2) Information buffering: to combine and compare information from different periods of the stimulus, earlier information has to be buffered until the later information is available.
For the task of detecting spoken words, an elegant neural network model has been proposed by Hopfield and Brody [8]. The neurons in this model buffer information by virtue of sustained firing activity, and they integrate temporal information by displaying stereotypic activity profiles that represent the time since an acoustic event. For each detected pattern, it will then be a different set of neurons that have equal activity levels at some point in time, which causes them to transiently synchronize their spike times.
The authors refer to this mechanism as a 'many are equal' computation [9]: first, a stimulus is encoded so that it causes equal activity in a certain set of neurons; then, this equality is read out via, in this case, detection of synchronous activity. This strategy directly leads to invariance to the absolute temporal scale, which they term 'linear time-warp invariance'; for a stretched or compressed acoustic pattern, equality in activity will still occur, albeit at a different level of activity. For the model proposed in this work, we will see that despite its composition of few neurons only, similar computational principles are at work.

Time-warp invariance in grasshopper song detection
Many grasshopper species recognize potential mating partners via the precise temporal structure of their acoustic communication signals, which are produced by rasping the hind-legs across the wings. This detection task shares with the word-recognition problem that temporal stretching and compression of the acoustic signal affect the detection only little. In particular, this time-warp invariant signal detection has been studied in the grasshopper species Chorthippus biguttulus [4,10,11]. The male courtship songs consist of a rhythmic structure of alternating loud and quiet parts, commonly called syllables and pauses [10]. In accordance with time-warp invariant pattern detection, it is the ratio of the durations of these two acoustic features that appears to be the crucial parameter for eliciting a response by a female grasshopper, which then produces a song of her own. In behavioral experiments, this has been assessed by the response probabilities of female grasshoppers to artificial courtship-song templates, which simply consist of loud and quiet parts of broadband noise. Varying the duration of the syllables and pauses of these acoustic signals revealed that the most effective combinations were those that kept the ratio of these durations near 5 : 1, see figure 1. This time-warp invariant detection presumably allows the female to detect a male independent of his current body temperature, which affects the speed of his singing [4].
Despite this remarkable performance, the auditory system of the grasshopper is relatively small, probably containing on the order of tens of neurons at individual stages of processing [12]. Studies in C. biguttulus, for example, indicated that around 50-60 auditory receptor neurons at each ear transduce sound information [13] and project to the metathoracic ganglion, where preprocessing and filtering is thought to take place. From here, about 20-30 neurons on each side send the auditory information on to the brain [12,14]. These ascending neurons appear to be subdivided into neurons that carry information about sound location and about sound structure [15]. Interestingly, in contrast to auditory receptor neurons, ascending neurons generally do not track the acoustic patterns well [16]. They are therefore thought to represent the results of specific filtering operations that help solve the behavioral task of song detection. These neurons typically display transient activity [14]. One ascending neuron in particular, named AN12, has been implicated in song detection [16]; it marks the onsets of syllables with short and temporally precise spike bursts. Moreover, the neuron's response structure shows intriguing temporal-integration characteristics; in response to a song template, the number of spikes in the burst is proportional to the duration of the preceding pause [16].
Based on similar response characteristics (short and precise spike bursts that display temporal integration), we will here investigate a biologically plausible model that aims at solving the time-warp invariant grasshopper song detection with few neurons only. Key features of the model lie in the dynamics of individual neurons. These turn out to perform essential parts of the temporal processing based on generic aspects of neuronal membrane conductances, which can provide powerful dynamics for auditory signal processing [17].
Figure 1. (a) Artificial song template. It consists of broad-band noise that is modulated into periods of high sound intensity ('syllables') and low sound intensity ('pauses') and thus mimics the natural rhythmic structure of the courtship song. Female grasshoppers respond to natural songs as well as to these artificial templates by producing a song of their own. (b) Responsiveness of female grasshoppers to different combinations of syllable and pause durations. The circles show the approximate pause durations that gave the highest number of responses for different fixed syllable durations. The solid line is a regression line. The dashed line shows the region for which the response was at least about half of the maximum response. Panel (b) adapted and modified from [4] with permission (copyright Springer-Verlag).

Model for an intrinsically bursting neuron
In the following, we will introduce a four-neuron network model for detecting grasshopper song templates. Before discussing the full model, we will investigate some single-cell dynamics that will be essential for performing the necessary temporal stimulus integration and information buffering to solve this task. The single-neuron models will be based on the widely used leaky integrate-and-fire model, which describes the neuronal membrane as an RC-circuit with leak resistance R, resting potential V rest , and membrane capacitance C [18]. For an external input current I ext (t) (for example, synaptic input), the membrane potential V (t) is governed by the circuit's current equation
$$C \, \frac{dV}{dt} = -\frac{V(t) - V_{\text{rest}}}{R} + I_{\text{ext}}(t). \tag{1}$$
Multiplying by R and introducing the membrane time constant τ m = R · C yields the dynamics of the membrane potential in a commonly used format:
$$\tau_m \, \frac{dV}{dt} = -\left(V(t) - V_{\text{rest}}\right) + R \cdot I_{\text{ext}}(t). \tag{2}$$
In the absence of input, V (t) relaxes towards the resting potential V rest . When V (t) reaches the threshold potential V thresh > V rest , a spike is elicited, which is marked as a brief depolarization of the membrane potential and a subsequent reset to V reset < V thresh . To model the neuron's refractory period, we then fix the neuron's membrane potential at V reset for the time T ref .
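The leaky integrate-and-fire dynamics described above can be sketched numerically. The following is a minimal illustration, not the simulation code used for the figures: it assumes forward-Euler integration with a step size of 0.01 ms (our choice), holds the membrane potential at V reset during the refractory period, and borrows the parameter values quoted in the caption of figure 2. The function name `simulate_lif` and its arguments are hypothetical.

```python
def simulate_lif(i_ext_nA, t_max_ms=100.0, dt_ms=0.01,
                 tau_m=5.0, r_mohm=100.0, v_rest=-60.0,
                 v_thresh=-40.0, v_reset=-60.0, t_ref=0.5):
    """Spike times (ms) of a leaky integrate-and-fire neuron under constant input.

    Forward-Euler sketch of equation (2); the step size is an assumption.
    """
    v = v_rest
    refractory_left = 0.0
    spikes = []
    for step in range(int(round(t_max_ms / dt_ms))):
        if refractory_left > 0.0:
            # Refractory period: hold the potential at the reset value.
            refractory_left -= dt_ms
            v = v_reset
            continue
        # R [MOhm] * I [nA] gives mV, so all terms are in mV.
        dv = (-(v - v_rest) + r_mohm * i_ext_nA) / tau_m
        v += dt_ms * dv
        if v >= v_thresh:
            spikes.append(step * dt_ms)
            v = v_reset
            refractory_left = t_ref
    return spikes


# Threshold current is (V_thresh - V_rest) / R = 0.2 nA for these parameters.
print(len(simulate_lif(0.1)), "spikes below threshold current")
print(len(simulate_lif(0.3)), "spikes above threshold current")
```

With these values, a constant input below 0.2 nA leaves the neuron silent, whereas a larger input produces regular firing.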
As a key element, we will enrich the integrate-and-fire model with dynamics that render the neuron intrinsically bursting so that, once triggered (for example, by brief external input), it can elicit a series of spikes at high rate. Intrinsic bursting is a commonly observed feature in neuronal systems [19,20], and a wide variety of biophysical mechanisms have been suggested to underlie burst generation [21]-[25]. Most of these rely on the interplay between fast positive feedback responsible for generating subsequent spikes after the first and a slower negative feedback that eventually terminates the burst [26]-[28]. We will here investigate such a model where the fast feedback is based on a depolarizing conductance that is triggered by the neuron's own spiking activity, whereas the slow feedback results from accumulation of a shunting adaptation conductance. To start, we first study the neuron with the positive feedback conductance alone before adding the adaptation dynamics.

Fast positive feedback
In analogy to persistent sodium currents [20,29], we include a positive feedback conductance g p (t) in the leaky integrate-and-fire model by adding a term R · g p (t) · (V p − V (t)) to equation (2). Between spikes, the conductance g p (t) is assumed to relax exponentially back to zero with a time constant τ p . When a spike occurs, g p (t) is set to the fixed value g (0) p . In general, the resulting current could be mediated by sodium or calcium influx into the neuron [30]- [32] so that the reversal potential V p is substantially higher than the threshold potential V thresh . Strong activation of this conductance will therefore drive the neuron towards spiking.
Let us briefly analyze the characteristics of this neuron model. Like many systems with positive feedback, this extended integrate-and-fire model displays bistability; for a constant external input I ext (t) = I 0 , it can either remain quiescent or show continuous spiking activity. The range of input levels I 0 for which a quiescent state exists can be easily found. In the absence of any spikes, g p (t) will be zero, and for constant input I 0 , the voltage will assume the constant level V rest + R · I 0 , which, for consistency, must be smaller than the threshold V thresh . For I 0 < (V thresh − V rest )/R, the neuron can thus stay in a quiescent state; for stronger inputs, firing will be initiated.
If, again under constant input I 0 , the neuron is spiking, each spike leads to a reset of V (t) and g p (t) so that firing must be periodic. The neuron is refractory for T ref and then takes some time T to again reach V thresh and elicit the next spike. The firing rate ν is thus given by ν = 1/(T + T ref ). To further analyze this active state, we can obtain an implicit equation for T by using an approximation for the positive feedback. We replace the voltage-dependent driving force V p − V (t) of the feedback conductance by a constant, here written ΔV p . Such an approximation is often used in neural network theory and essentially means that we are substituting the usual conductance change induced by a synapse by a fixed current input ('current-synapse approximation'). This allows us to gain some intuition about the neuronal dynamics by analytical manipulations of the membrane-potential equation. For the subsequent computer simulations, however, we will return to the biologically more accurate description with synaptic conductances.
With the current-synapse approximation and the fact that g p (t) has decayed from g (0) p to g (0) p · exp(−T ref /τ p ) at the end of the refractory period, we obtain the dynamics of the membrane potential as
$$\tau_m \, \frac{dV}{dt} = -\left(V(t) - V_{\text{rest}}\right) + R \cdot I_0 + R \cdot I_p^{(0)} \, e^{-t/\tau_p}, \tag{3}$$
where t = 0 marks the end of the refractory period and
$$I_p^{(0)} = g_p^{(0)} \, e^{-T_{\text{ref}}/\tau_p} \cdot \Delta V_p \tag{4}$$
is the feedback current at the end of the refractory period. With initial condition V (0) = V reset , equation (3) can be solved analytically to yield
$$V(t) = V_{\text{rest}} + R \cdot I_0 + \left(V_{\text{reset}} - V_{\text{rest}} - R \cdot I_0\right) e^{-t/\tau_m} + R \cdot I_p^{(0)} \, \frac{\tau_p}{\tau_p - \tau_m} \left(e^{-t/\tau_p} - e^{-t/\tau_m}\right). \tag{5}$$
For the often convenient case τ m = τ p = τ , the last term simply becomes R · I (0) p · t/τ · e −t/τ . For the case without feedback, I (0) p = 0, equation (5) reverts to the standard single-exponential relaxation that crosses threshold if I 0 > (V thresh − V rest )/R. The feedback term, however, may induce a threshold crossing of V (t) even for smaller values of I 0 .
The time T for reaching threshold, and thus the occurrence of the next spike, must satisfy the consistency equation obtained by setting equation (5) equal to V thresh , which can be written as
$$V_{\text{thresh}} - V_{\text{rest}} - R \cdot I_0 - \left(V_{\text{reset}} - V_{\text{rest}} - R \cdot I_0\right) e^{-T/\tau_m} = R \cdot I_p^{(0)} \, \frac{\tau_p}{\tau_p - \tau_m} \left(e^{-T/\tau_p} - e^{-T/\tau_m}\right). \tag{6}$$
We can graphically observe the different scenarios for obtaining solutions by plotting the left- and right-hand sides of equation (6) as a function of T for different values of I 0 , see figure 2(b). One finds that by increasing g (0) p and making the feedback large enough, one can obtain solutions, and thus sustained spiking activity of the neuron, even for negative input currents I 0 . The range of I 0 for which two solutions are obtained corresponds to the range of bistability; the neuron can be active or quiescent. Note, though, that there is only one active state; the second solution of equation (6) is non-physical in the sense that it corresponds to a second threshold crossing by V (t) (now crossing threshold from above), which does not occur because the first crossing already elicits a spike and terminates the dynamics by resetting the membrane potential. Figure 2(c) shows the numerical solutions of equation (6) together with steady-state firing rates obtained from simulations of the full neuron dynamics (without the current-synapse approximation). In either case, a large range of bistability is visible where the neuron can remain quiescent or regularly and strongly active, depending on initial conditions. Note that while the current-synapse approximation captures the general structure of the full model's activity states very well, details apparently differ, such as the minimum current that can sustain the active state.
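The bistability discussed above can be illustrated with a small simulation of the full conductance-based feedback, that is, without the current-synapse approximation. This is a hedged sketch rather than the original simulation code: it assumes forward-Euler integration with a 0.01 ms step, uses the parameters from the caption of figure 2, and starts the neuron either at rest or in a state that mimics the moment just after a spike (the `kick` flag). The function name `count_spikes` and the tested input values are illustrative choices.

```python
import math


def count_spikes(i0_nA, kick, t_max=50.0, dt=0.01):
    """Count spikes of the positive-feedback neuron under constant input I_0.

    kick=True starts the neuron as if a spike had just occurred
    (g_p = g_p^(0), refractory); kick=False starts it at rest.
    """
    tau_m = tau_p = 5.0                 # ms
    v_rest = v_reset = -60.0            # mV
    v_thresh, v_p = -40.0, 0.0          # mV
    r, t_ref, g_p0 = 100.0, 0.5, 60.0   # MOhm, ms, nS
    decay_p = math.exp(-dt / tau_p)

    v, g_p, ref_left, n_spikes = v_rest, 0.0, 0.0, 0
    if kick:
        g_p, ref_left = g_p0, t_ref
    for _ in range(int(round(t_max / dt))):
        if ref_left > 0.0:
            ref_left -= dt
            v = v_reset
        else:
            # R [MOhm] * g [nS] * 1e-3 is dimensionless; R [MOhm] * I [nA] is mV.
            dv = (-(v - v_rest) + r * 1e-3 * g_p * (v_p - v) + r * i0_nA) / tau_m
            v += dt * dv
            if v >= v_thresh:
                n_spikes += 1
                v, g_p, ref_left = v_reset, g_p0, t_ref
        g_p *= decay_p                  # exponential relaxation between spikes
    return n_spikes


for i0 in (-3.0, -1.0, 0.0, 0.3):
    print(i0, "nA:", count_spikes(i0, kick=False), "spikes from rest,",
          count_spikes(i0, kick=True), "spikes after a kick")
```

For inputs inside the bistable range, the outcome depends only on the initial condition: the neuron stays silent when started at rest and fires continuously once kicked. For sufficiently negative input, only the quiescent state remains.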

Slow negative feedback
Many neurons display fatigue and adaptation during prolonged activity [33,34]. When adaptation results from the neuron's own spiking, this acts as negative feedback [35]. Biophysically, this may result, for example, from recruitment of slow, hyperpolarizing potassium currents or inactivation of sodium currents. In analogy to the potassium currents, we here model slow feedback adaptation by adding an adaptation conductance g a (t) to the bistable integrate-and-fire model. Temporal integration based on slow conductances may indeed be a general feature of auditory systems [36].
Figure 2. [...] equation (6), marked by a circle. The solid line also crosses the zero line (thin dashed line), which corresponds to threshold crossing in the case without feedback, I (0) p = 0. This means that the current input is supra-threshold so that the neuron cannot be quiescent at this input level. For intermediate input I 0 , the two intersections correspond to a true solution for smaller T (marked by a circle) and a non-physical solution for larger T (marked by a cross). The solid line does not cross the zero line; the input is thus sub-threshold so that the neuron can be either firing or quiescent. For large negative I 0 , no solution of equation (6) exists. The input is sub-threshold, and the neuron must be quiescent. (c) Activity states for different input levels I 0 . Numerical solutions of equation (6) yield the active state where the neuron is regularly firing at high rate and the non-physical solutions, which correspond to a second threshold crossing. The quiescent state exists as long as the input is sub-threshold. A similar layout of states is found in simulations that take the full dynamics of the feedback conductance into account, without the current-synapse approximation (circles). The following parameters were used: τ m = τ p = 5 ms, V rest = V reset = −60 mV, V thresh = −40 mV, R = 100 MΩ, T ref = 0.5 ms, g (0) p = 60 nS and V p = 0 mV.
The reversal potential V a of this conductance is typically near the Nernst potential for potassium and thus substantially smaller than the threshold V thresh . Activation of the conductance g a (t) therefore inhibits spiking and drives the neuron towards quiescence. Between spikes, we assume that g a (t) exponentially relaxes to zero with a time constant τ a that is considerably larger than τ m and τ p . For every spike, g a (t) is incremented by a fixed value g (0) a so that the effect of adaptation can accumulate over many spikes. In addition, we assume that the only other input into the neuron consists of short trigger currents I trigger (t), suited to induce a transition from the quiescent to the active state within the bistable regime, and that a white noise source η(t) acts on the membrane potential. The trigger signal is used to mark the occurrence of temporal events that are characteristic for an acoustic signal, such as sound onsets. From the model's perspective, its purpose is simply to push the neuron into its active state when such an event occurs. Trigger signals could be mediated by neurons that encode the basic stimulus structure upstream of the read-out network, and their use is motivated by the abundance of onset (and to a lesser degree offset) detectors in auditory systems [37]. Mechanisms that are thought to mediate onset detection include synaptic depression [37,38] and post-onset inhibition [39]. In the grasshopper auditory system, the aforementioned precise and brief activity of some neurons at sound onset gives evidence of such trigger signals, whereas sign-inverting neurons that are inhibited by sound could support marking offset events [12]. The full dynamics of the model neuron between spikes and with trigger inputs at times t n are thus given by:
$$\tau_m \, \frac{dV}{dt} = -\left(V(t) - V_{\text{rest}}\right) + R \cdot g_p(t) \left(V_p - V(t)\right) + R \cdot g_a(t) \left(V_a - V(t)\right) + R \sum_n I_{\text{trigger}}(t - t_n) + \eta(t),$$
$$\tau_p \, \frac{dg_p}{dt} = -g_p(t), \qquad \tau_a \, \frac{dg_a}{dt} = -g_a(t),$$
$$I_{\text{trigger}}(t) = I_{\text{trigger}}^{(0)} \, e^{-t/\tau_{\text{trigger}}}, \quad \text{for } t > 0.$$
When V (t) reaches V thresh , a spike is elicited and membrane potential and conductances are adjusted according to
$$V \to V_{\text{reset}}, \qquad g_p \to g_p^{(0)}, \qquad g_a \to g_a + g_a^{(0)}.$$
After a spike, V (t) is kept fixed at V reset for the time T ref . When this neuron model receives a sequence of trigger signals at varying intervals Δt, it responds with a short burst of several spikes at high rate for each trigger signal, see figure 3. The dynamics of this model can be visualized by treating the current induced by the adaptation conductance as an input I ext (t) to the bistable dynamics that are described by the diagram of figure 2(c). This is shown in figure 3(d). When the neuron is triggered into the active state (phase 1 in figure 3(d)), it fires a sequence of spikes at high rate, and adaptation builds up with each spike (phase 2). This will lead to a stronger negative adaptation current, which eventually reaches the point where the bistable region ends, beyond which only the quiescent state exists. The dynamics of the neuron will thus transition into the quiescent state at a nearly fixed level of g a = g crit a (phase 3), and firing will cease. As long as no other inputs arrive, the neuron will remain in this quiescent state while the adaptation current slowly decays (phase 4). The next trigger signal can then induce a new transition into the active state, for which the starting level of g a depends on the recovery time since the last trigger signal.
Figure 3. Membrane potential and adaptation conductance in response to a sequence of trigger signals at varying intervals. The neuron fires bursts of spikes that are accompanied by rapid increases of the adaptation conductance g a . Note that the peaks of g a always reach about the same level. (c) Enlarged section of the neuron's membrane potential, which exemplifies that shorter intervals are followed by fewer spikes, longer intervals by more spikes. (d) Comparison of the activity states of the bistable neuron (same as in figure 2(c)) with the evolution of g a and of the activity level in the simulation. The activity level at each point in time is estimated by the inverse of the surrounding inter-spike interval (ISI). The dynamics can be understood by separation into four phases: (1) transition into the active state by a trigger input, (2) firing at high rate while g a accumulates, (3) transition at approximately a fixed level of g a into the quiescent state, (4) recovery from adaptation while in the quiescent state. Parameter values were the same as in figure 2, and in addition: τ a = 150 ms, g (0) a = 7 nS, V a = −60 mV, I (0) trigger = 4 nA and τ trigger = 1 ms. The intervals between trigger events were here drawn randomly from a uniform distribution between 10 and 300 ms.
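The four phases described above can be reproduced qualitatively with a short simulation of the bursting neuron. The sketch below is an illustration under stated assumptions, not the authors' code: it uses forward-Euler integration with a 0.01 ms step, omits the noise term η(t) so that the result is deterministic, and takes the parameter values quoted in the captions of figures 2 and 3. The function name `burst_counts` and the chosen trigger times are hypothetical.

```python
import math

# Parameter values quoted in the captions of figures 2 and 3.
TAU_M, TAU_P, TAU_TRIG = 5.0, 5.0, 1.0     # ms
V_REST = V_RESET = V_A = -60.0             # mV
V_THRESH, V_P = -40.0, 0.0                 # mV
R = 100.0                                  # MOhm
T_REF = 0.5                                # ms
G_P0, G_A0 = 60.0, 7.0                     # nS
I_TRIG0 = 4.0                              # nA


def burst_counts(trigger_times, tau_a=150.0, dt=0.01):
    """Number of spikes following each trigger (noise-free Euler sketch)."""
    t_max = trigger_times[-1] + 100.0
    trigger_steps = {int(round(t / dt)) for t in trigger_times}
    decay_p = math.exp(-dt / TAU_P)
    decay_a = math.exp(-dt / tau_a)
    decay_trig = math.exp(-dt / TAU_TRIG)

    v, g_p, g_a, i_trig, ref_left = V_REST, 0.0, 0.0, 0.0, 0.0
    spikes = []
    for step in range(int(round(t_max / dt))):
        if step in trigger_steps:
            i_trig += I_TRIG0              # onset of an exponential trigger current
        if ref_left > 0.0:
            ref_left -= dt
            v = V_RESET
        else:
            # R [MOhm] * g [nS] * 1e-3 is dimensionless; R [MOhm] * I [nA] is mV.
            drive = (-(v - V_REST)
                     + R * 1e-3 * g_p * (V_P - v)
                     + R * 1e-3 * g_a * (V_A - v)
                     + R * i_trig)
            v += dt * drive / TAU_M
            if v >= V_THRESH:
                spikes.append(step * dt)
                # Spike: reset V, set g_p, increment the adaptation conductance.
                v, g_p, g_a, ref_left = V_RESET, G_P0, g_a + G_A0, T_REF
        g_p *= decay_p
        g_a *= decay_a
        i_trig *= decay_trig

    edges = list(trigger_times) + [t_max]
    return [sum(lo <= s < hi for s in spikes)
            for lo, hi in zip(edges[:-1], edges[1:])]


# Triggers separated by 50, 150 and 300 ms (the first burst starts from g_a = 0).
print(burst_counts([50.0, 100.0, 250.0, 550.0]))
```

In this sketch, each burst ends once the accumulated adaptation conductance prevents further threshold crossings, so the spike count of a burst grows with the recovery time that preceded it.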
The decay of adaptation thus performs the temporal integration; it marks the time that has elapsed since the last trigger signal. This information is then read out by how many spikes can be elicited before g crit a is again reached. An important aspect of this encoding mechanism is that the transition into the quiescent state essentially functions as a reset, setting g a to g crit a , so that the state of the neuron is, to a good approximation, independent of any history prior to the last trigger signal. (The fact that each spike generates a finite addition g (0) a to the adaptation conductance, however, leads to shot noise-like fluctuations in the level of g a .) This reset thus starts an unbiased clock that times the duration between trigger signals. The duration of the burst itself is here negligible compared to the quiescent period between the trigger signals, given that the build-up of adaptation during the active state is rapid compared to its decay.
The burst of spikes that follows the new trigger signal will then contain as many spikes as required to bring g a back up to g crit a . The number of spikes therefore depends on how much time g a has had to recover and thus encodes the time Δt that has passed between the two trigger signals, see figure 4. Such a relationship between spike number in a burst and preceding interburst interval has been observed in the developing chick spinal cord [40] where a model based on the same principal dynamics, fast and slow feedback, has been successfully applied [41], although based on network feedback, not cell-intrinsic mechanisms.
An important aspect of this encoding scheme is that the timescale of the temporal integration is set by the adaptation time constant τ a . This can also be seen by the following argument: the number of spikes N that a burst contains is determined by how many incremental steps of g (0) a it takes to bring g a back up to the critical value g crit a . N is thus a function of the level of g a at the time of a new trigger signal, which itself is a function of the preceding interval Δt between trigger signals, g a = g crit a · exp(−Δt/τ a ). It therefore follows that N is a function of Δt/τ a , N = f (Δt/τ a ). In figure 4, this is demonstrated by simulations with different values of τ a . The fact that τ a is essentially a scaling parameter of the temporal integration will be important below when we will use bursting neurons with different τ a .
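The scaling argument can be made concrete with an idealized counting rule. The sketch below assumes that every burst ends exactly at g crit a , that the decay of g a during the burst is negligible, and that the trigger current itself contributes no extra spikes; under these assumptions, the spike count is the number of increments g (0) a needed to climb from g crit a · exp(−Δt/τ a ) back to g crit a . The value of 90 nS for g crit a is a hypothetical placeholder, not a number given in the text, and the function name `ideal_spike_count` is ours.

```python
import math


def ideal_spike_count(interval_ms, tau_a_ms, g_crit_nS=90.0, g_step_nS=7.0):
    """Idealized burst size N = f(interval / tau_a).

    g_crit_nS is an assumed, illustrative value; g_step_nS is g_a^(0) = 7 nS.
    """
    g_start = g_crit_nS * math.exp(-interval_ms / tau_a_ms)
    return math.ceil((g_crit_nS - g_start) / g_step_nS)


# The count depends only on the ratio interval / tau_a.
for tau_a in (75.0, 150.0, 300.0):
    print(tau_a, [ideal_spike_count(x * tau_a, tau_a) for x in (0.25, 0.5, 1.0, 2.0)])
```

Each row of the printed output is identical because only the ratio of interval and adaptation time constant enters. This idealization ignores the minimum response to a trigger and the noise present in the full model.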
There are, of course, limits to the range over which temporal intervals are well encoded by the spike count in the burst. This is apparent in the fact that the spike count levels off at high Δt/τ a in figure 4, which occurs because g a (t) has practically already decayed to zero after some time and further waiting for the next trigger signal hardly changes its level. Also, for very short intervals Δt, resolution is limited by the fact that each trigger signal leads to a minimum response, here three spikes. Thus, naturally, temporal integration works best for timescales similar to the adaptation constant.
In this intermediate regime, the resolution of this encoding scheme is limited by the fact that the spike numbers come as discrete integer values. Each spike number therefore corresponds to a range of Δt/τ a , as apparent in figure 4(b). These ranges overlap because of the noise in the model and some residual dependence on previous spiking history (for example, the shot noise induced by the previous burst). Barring noise sources, the resolution depends on the maximum number of spikes that can appear in a burst (here around 15). The more potential spikes, the more discrete levels that can be used for encoding Δt/τ a . Larger spike counts can, in principle, be achieved by reducing the negative feedback g (0) a . On the downside, however, bursts with more spikes last longer, which itself reduces the minimal temporal interval between discernible bursts. The strength of the negative feedback must thus represent a compromise between maximizing resolution and minimizing burst duration.
Figure 4. [...] shown depending on the preceding interval Δt between trigger signals. For each time constant, the spike count encodes Δt well, but with a different functional relation. On the right, the abscissa is rescaled by τ a for each dataset, revealing that a common function f (Δt/τ a ) underlies the relation between spike count and interval length. All parameters were the same as in the simulation for figure 3, except for the variations in τ a .

Network for grasshopper song detection
With the temporal integration properties of the bursting neurons in place, we can now construct a network that applies these features to the problem of the grasshopper song detection where communication song templates such as those shown in figure 1(a) with syllable duration s and pause duration p form the inputs. We do so by assuming that both the onset and offset of a high-intensity sound, such as a syllable, lead to trigger signals, which may be provided by two neurons that briefly burst at sound onset and offset, respectively. The trigger signals are received by two bursting neurons (neurons 1 and 2 in figure 5); one of them receives the trigger only at the onset of the syllable, the other at both onset and offset. The first neuron is thus sensitive to the period s + p of the repetitive song signal; the second neuron encodes the pause duration p at the syllable onset and the syllable duration s at its offset. At syllable onset, both neurons are triggered and fire bursts with spike numbers of N 1 and N 2 , respectively. Note that neuron 1 functions as an information buffer for the syllable duration s. Analogous to the 'many are equal' computation applied in the Hopfield-Brody network, we here use the approach that detection should take place if the activities of the two bursting neurons are equal, N 1 = N 2 . The task of the other neurons in the network is to detect this equality, which will be discussed below.
Figure 5. Schematic drawing of the four-neuron network for grasshopper song detection. Neurons 1 and 2 are bursting neurons with adaptation time constants τ a,1 and τ a,2 , respectively. Neurons 3 and 4 are standard leaky integrate-and-fire models without the feedback conductances g p and g a . The output of the network is given by the activity of neuron 4, which should fire for the correct ratio of syllable and pause duration. Neuron 1 is excitatory, neurons 2 and 3 are inhibitory. Inputs into the network are given by short trigger signals at the onset and offset of the song syllables. For neurons 1 and 2, bursts are triggered; neuron 4 receives input at syllable onset with a short delay τ delay , which drives it across the threshold if neurons 1 and 2 provide balanced excitation and inhibition. If excitation from neuron 1 is stronger than inhibition from neuron 2, neuron 3 becomes active and provides strong inhibition, which shuts off neuron 4.
According to the earlier scaling considerations, the adaptation time constant τ a sets the scale of temporal integration. If the adaptation time constants of the two bursting neurons are τ a,1 and τ a,2 , respectively, we find that N 1 and N 2 are approximately given by N 1 = f ((s + p)/τ a,1 ) and N 2 = f ( p/τ a,2 ) with the same function f for both neurons. N 1 = N 2 thus follows if τ a,1 /τ a,2 = 1 + s/ p. Therefore, equal activity of the two neurons occurs for a given ratio of syllable and pause duration, and this ratio depends on the ratio of the adaptation time constants. If, for example, the network should detect a ratio of s : p = 4 : 1, the appropriate ratio of the adaptation time constants is τ a,1 : τ a,2 = 5 : 1. As the ratio of syllable and pause duration is independent of the absolute timescale, the foundation is laid out for time-warp invariant detection.
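The condition τ a,1 /τ a,2 = 1 + s/p can be turned around to give the pause duration that a pair of adaptation time constants prefers for a given syllable duration, p = s/(τ a,1 /τ a,2 − 1). The helper below is a minimal worked example of this arithmetic; the function name `preferred_pause` is hypothetical.

```python
def preferred_pause(syllable_ms, tau_a1_ms, tau_a2_ms):
    """Pause duration p that satisfies tau_a1 / tau_a2 = 1 + s / p."""
    ratio = tau_a1_ms / tau_a2_ms
    if ratio <= 1.0:
        raise ValueError("tau_a1 must exceed tau_a2 for a positive pause duration")
    return syllable_ms / (ratio - 1.0)


# A 5 : 1 ratio of time constants corresponds to s : p = 4 : 1 at every timescale.
for s in (60.0, 120.0, 240.0):
    print(s, preferred_pause(s, 300.0, 60.0))
```

Doubling the syllable duration doubles the preferred pause, which is the time-warp invariance expressed in terms of the two time constants.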
The task now is to read out the activity of the two bursting neurons so that the network detects when their activity is equal. We here model this read-out of equal spike count by a balanced push-pull mechanism between excitation and inhibition, but other schemes are also feasible. To do so, we connect two simple (non-bursting) leaky integrate-and-fire neurons (neurons 3 and 4 in figure 5) to our two bursting neurons. Neuron 3 mediates inhibition; neuron 4 functions as the network output and signals by its activity detection of the preferred song structure. All synaptic connections are modeled by adding terms of the form R · g syn (t) · (V syn − V (t)) to the differential equation for the membrane potential. The synaptic reversal potential V syn for excitatory synapses is V syn,exc > V thresh , and for inhibitory synapses it is V syn,inh < V thresh . The synaptic conductances are modeled by an exponential rise and decay after each presynaptic spike at time t sp :
$$g_{\text{syn}}(t) = g_{\text{syn}}^{(0)} \, \frac{\tau_{\text{dec}}}{\tau_{\text{dec}} - \tau_{\text{rise}}} \left(e^{-(t - t_{\text{sp}})/\tau_{\text{dec}}} - e^{-(t - t_{\text{sp}})/\tau_{\text{rise}}}\right), \quad \text{for } t > t_{\text{sp}}.$$
Again, for τ rise = τ dec = τ , this expression simplifies to g (0) syn · (t − t sp )/τ · e −(t−t sp )/τ . The synaptic strengths g (0) syn are set independently for excitatory and inhibitory synapses at each neuron so that a single spike yields about equally effective inhibition or excitation.
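The time course of a single synaptic conductance can be sketched as follows. The code assumes a difference-of-exponentials kernel with prefactor τ dec /(τ dec − τ rise ), chosen because its limit for equal time constants reproduces the expression g (0) syn · (t − t sp )/τ · e −(t−t sp )/τ given in the text; this normalization is our assumption. The function name `syn_conductance` is hypothetical.

```python
import math


def syn_conductance(t_since_spike_ms, g0_nS, tau_rise_ms, tau_dec_ms):
    """Synaptic conductance (nS) at a given time after one presynaptic spike.

    Assumed kernel: difference of exponentials, normalized so that the
    equal-time-constant limit is g0 * (t / tau) * exp(-t / tau).
    """
    t = t_since_spike_ms
    if t <= 0.0:
        return 0.0
    if math.isclose(tau_rise_ms, tau_dec_ms):
        return g0_nS * (t / tau_dec_ms) * math.exp(-t / tau_dec_ms)
    norm = tau_dec_ms / (tau_dec_ms - tau_rise_ms)
    return g0_nS * norm * (math.exp(-t / tau_dec_ms) - math.exp(-t / tau_rise_ms))


# Example with tau_rise = 0.5 ms and tau_dec = 3 ms.
for t in (0.5, 1.0, 3.0, 10.0):
    print(t, round(syn_conductance(t, 50.0, 0.5, 3.0), 2))
```

The separate branch for equal time constants avoids the division by zero in the general expression and keeps the kernel continuous as the two time constants approach each other.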
Excess activity of either neuron 1 or 2 results in inhibition of the output neuron; only balanced activity between those two neurons lets the output neuron become active. Figure 6 shows sample voltage traces of all four neurons in the network for a syllable duration s = 120 ms and three different pause durations p = 20, 35 and 50 ms. Only for the intermediate pause duration do the two bursting neurons have the same spike count at syllable onset, leading to spiking of the output neuron; for the short pause, neuron 2 has fewer spikes than neuron 1, which leads to activity of neuron 3 and therefore inhibition to neuron 4; for the long pause, neuron 2 has more spikes than neuron 1, so that inhibition for neuron 4 is stronger than excitation.
We can test the performance of the network model by running simulations for many different combinations of syllable and pause duration and counting the spikes that are elicited by the output neuron 4. This is shown in figure 7 for three different combinations of adaptation time constants τ a,1 and τ a,2 with ratios of 3 : 1, 5 : 1 and 9 : 1, respectively. In each case, the successful time-warp invariant detection is visible by the diagonal region of syllable-pause combinations that yield strong activity of the output neuron. As expected, the preferred ratios of syllable and pause duration are different in the three cases and indeed match the prediction that follows from the ratios of the adaptation time constants. The resolution of the detector manifests itself in the width of these diagonal response regions. This width results from a combination of the internal noise and the limited resolution of the bursting neurons 1 and 2 (cf figure 4). As discussed above, their integer spike counts are unaffected by slight changes in the duration of the integrated temporal window. Consequently, the read-out is insensitive to variations in syllable and pause durations over some range. More precisely, the diagonal response regions are composed of smaller response fields that line up along the diagonal and that each correspond to a different number of spikes that is produced by both neurons 1 and 2. These response fields merge into each other, however, because of noise.
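The origin of the diagonal response regions can be illustrated with a strongly simplified stand-in for the network. The sketch below replaces the bursting neurons by an idealized spike-count function f and replaces the push-pull read-out of neurons 3 and 4 by a direct test for equal counts; it contains no noise. The critical conductance of 90 nS is a hypothetical placeholder, and the names `ideal_count` and `detects` are ours, so this illustrates the 'equal activity' principle rather than reproducing the simulations of figure 7.

```python
import math


def ideal_count(x, g_crit_nS=90.0, g_step_nS=7.0):
    """Idealized spike count f(x) for x = interval / tau_a (g_crit is assumed)."""
    return math.ceil(g_crit_nS * (1.0 - math.exp(-x)) / g_step_nS)


def detects(s_ms, p_ms, tau_a1_ms=300.0, tau_a2_ms=60.0):
    """True if the two idealized bursting neurons produce equal spike counts."""
    n1 = ideal_count((s_ms + p_ms) / tau_a1_ms)   # neuron 1 integrates s + p
    n2 = ideal_count(p_ms / tau_a2_ms)            # neuron 2 integrates p
    return n1 == n2


# Scan pause durations for several syllable durations.
for s in (80.0, 120.0, 160.0, 240.0):
    accepted = [p for p in range(5, 101, 5) if detects(s, float(p))]
    print("s =", s, "ms -> accepted pauses (ms):", accepted)
```

Because the spike counts are integers, each syllable duration accepts a small range of pauses around p = s/4, which mirrors the finite width of the response regions described above.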

Conclusions
Even the small auditory systems of insects are capable of astonishing pattern detection performances [14]. The limited size of these systems makes them an appealing model system for an attempt to study principles of temporal integration in detail [42]. The network model presented here can be viewed as a proof of principle that small networks with few neurons can perform such a highly complex task as the time-warp invariant detection of rhythmic sound patterns. The network applies only generic neuronal characteristics, such as integrate-and-fire dynamics, conductances that provide positive and negative feedback, and a balance of excitatory and inhibitory inputs. Nonetheless, complex and computationally useful dynamics arise, including bistability, intrinsic bursting, and reliable temporal integration over tens and hundreds of milliseconds. Despite this complexity, the network is easy to tune; by adjusting only the adaptation time constants as key parameters, the target ratio of acoustic feature durations can easily be chosen. For a biological system, this has the advantage that simple learning rules may suffice to bring the network towards correct detection of behaviorally relevant signals.
Figure 6. Panels: s = 120 ms, p = 20 ms; s = 120 ms, p = 35 ms; s = 120 ms, p = 50 ms. The parameters for all neurons were the same as used for figure 3 except for τ a,1 = 300 ms and τ a,2 = 60 ms, the omission of the feedback conductances for neurons 3 and 4, and an increase of τ m to 30 ms for neuron 4. The parameters for modeling the synapses were: V syn,exc = 0 mV, V syn,inh = −60 mV, g (0) 1 to 3 = 50 nS, g (0) 1 to 4 = 200 nS, g (0) 2 to 3 = 110 nS, g (0) 1 to 4 = 440 nS, g (0) 3 to 4 = 500 nS, τ delay = 7 ms, τ rise = 0.5 ms and τ dec = 3 ms, except for synapses onto neuron 4, for which τ rise = τ dec = 5 ms.
Figure 7. Performance of the four-neuron network for grasshopper song detection for three different ratios of τ a,1 and τ a,2. In each case, τ a,2 was set to 60 ms, while τ a,1 was set to 180 ms (left), to 300 ms (center), or to 540 ms (right). The ratios τ a,1 : τ a,2 of 3 : 1, 5 : 1 and 9 : 1 correspond to optimal s : p ratios of 2 : 1, 4 : 1 and 8 : 1, respectively, which are displayed by the dashed lines. The performance of the network is measured by the number of spikes of neuron 4 over ten periods of the sound signal for many different syllable and pause durations. In each case, the diagonal region of strong activity shows that the network performs time-warp invariant detection of the syllable-pause structure, and the detected syllable-pause ratio is close to the expectation obtained from the ratio of the adaptation time constants. All parameters, except for the adaptation time constants and the syllable and pause durations, were the same as in the simulations for figure 6.
The signal detection strategy of the network is based on comparing the duration of the pause to the preceding syllable. The network could thus also detect signals that contain a wide variety of syllable and pause durations as long as locally neighboring syllables and pauses have the correct ratio. In fact, this results in a more general type of time-warp invariance than the linear scaling applied to the full signal. Moreover, this finding can be used as a prediction for behavioral experiments with grasshoppers in which the detection of such non-globally warped patterns is tested.
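The difference between a globally and a locally warped signal can be illustrated with a small check on syllable-pause pairs. The Python sketch below is an abstraction of the comparison strategy only and not a simulation of the network: the function name `locally_matching`, the relative tolerance of 15% and the example durations are all hypothetical choices made for illustration.

```python
def locally_matching(pairs, target_ratio, tolerance=0.15):
    """Check each (syllable_ms, pause_ms) pair against a target s/p ratio.

    pairs: sequence of (syllable duration, following pause duration) in ms.
    tolerance: hypothetical relative tolerance standing in for the finite
    resolution of the detector.
    Returns one boolean per pair.
    """
    return [
        abs(s / p - target_ratio) <= tolerance * target_ratio
        for s, p in pairs
    ]


# A signal whose timescale changes from syllable to syllable; the last
# pair deliberately violates the 4 : 1 ratio.
song = [(80, 20), (120, 30), (200, 50), (120, 60)]
print(locally_matching(song, target_ratio=4.0))
```

The first three pairs pass the check although their absolute durations differ by more than a factor of two, which is the sense in which a local comparison is more general than a single linear scaling of the full signal.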