Recurrent inhibition in striatum enables transfer of time-flexible skills to basal ganglia

The basal ganglia play a major role in directing learned action sequences such as reaching and pressing, particularly in cases where precise timing plays an important role, as shown in experiments on rodents performing delayed lever press tasks. A comprehensive understanding of the relative roles of cortex and basal ganglia in learning and performing such behaviors, however, is still lacking. Inspired by recent experimental results showing that motor cortex is necessary for learning certain types of motor sequences but not for performing them once learned, we develop a model of the striatum, the major input structure of the basal ganglia, in which recurrently connected inhibitory neurons receive cortical input. An anti-Hebbian plasticity rule at the recurrent synapses in the striatum allows our model to learn a sparse, sequential pattern of neural activity similar to the patterns observed in experimental population recordings. After learning, the network can reproduce the same dynamics autonomously without patterned cortical input, and can further speed up or slow down the activity pattern simply by adjusting the level of tonic external input. The general mechanisms used in this model can also be applied to circuits with both excitatory and inhibitory populations, and hence may underlie the sequential neural activity patterns that have been observed in many brain areas beyond the basal ganglia.


Introduction
Anyone who has ever practiced a sport or played a musical instrument is familiar with the experience of learning to perform an action. This is difficult at first and requires concentration, but through repetition the action becomes increasingly automatic, until it requires hardly any conscious control. What are the neural mechanisms that might underlie such changes in our ability to perform actions? Cortical lesion studies have revealed the capacity of subcortical structures to direct a large repertoire of movements, particularly "innate" movements that do not require dexterous limb motion [1,2,3,4]. On the other hand, motor cortex is also known to play a major role in both the acquisition and control of motor behaviors such as task-specific limb movements in both primates [5,6,7,8,9] and rodents [10,11,12,13]. However, the precise roles and interactions of cortical and subcortical brain areas in controlling movement during skill learning have not been fully determined. As a step toward addressing this, one set of recent studies showed that rats are unable to learn precisely timed lever-press sequences when motor cortex is lesioned, but are able to successfully perform the sequence if the lesion occurs after the learning has already taken place [14,15]. It was therefore suggested that motor cortex may act as a "tutor" to subcortical brain circuits during learning, and that these lower brain circuits eventually internalize the pattern of neural activity necessary to control movement, allowing them to drive behavior without receiving further instruction from the tutor once the behavior has been learned [14]. In the work that follows, we develop a model of such a tutor-to-student circuit involving cortex and striatum, with a focus on reproducing the sequence-like neural activity patterns that are characteristic of population activity in striatum.
Given that the striatum collects inputs from many areas of cortex and thalamus, plays an important part in controlling movement via its projections to the output structures of the basal ganglia, and has a central role in reinforcement learning [16], it is natural to focus on this brain structure as a likely student to be tutored by motor cortex. Medium spiny neurons (MSNs), which constitute over 90% of the neurons in striatum [17], exhibit stereotyped sequential firing patterns during learned motor sequences and learned behaviors in which timing plays an important role, with sparse firing sequences providing a seemingly ideal representation for encoding time and providing a rich temporal basis that can be read out by downstream circuits to determine behavior [18,19,20,21]. Such neural activity has been shown in rodents to strongly correlate with time judgement in a fixed-interval lever-press task [21], and with kinematic parameters such as the animal's position and speed in a task in which the animal was trained to remain on a treadmill for a particular length of time [20]. Such sequences have even been shown to flexibly dilate or contract by up to a factor of five in proportion to the time-delay interval for obtaining a reward [19]. In addition, pharmacological attenuation of neural activity in dorsal striatum has been shown to impair such motor sequence behavior [19,20]. Together, these results suggest that such firing patterns in striatum are likely to play a causal role in an animal's ability to perform learned motor sequences.
The above considerations lead to two basic questions. First, how can sparse sequential activity patterns such as those observed experimentally be obtained given the circuitry of the striatum, which is characterized by recurrent inhibition? And second, how can control of the dynamics be passed from the cortex to striatum over the course of learning? In this paper, we propose a model for how striatum could be structured in order to obtain the required dynamics in a robust and flexible way. A key element is the presence of synaptic depression at the inhibitory synapses between MSNs, which has been shown to exist experimentally [22] and which competes with the effect of feedforward excitatory input to determine the rate of switching of activity from one neuron cluster to the next. By adjusting the relative levels of these parameters, it is possible to dilate or contract the time dependence of neural activity sequences by an order of magnitude or more. Furthermore, we show that our striatal model can encode multiple sequences that can be expressed individually or pieced together into longer sequences as selected by the external input. Finally, we address learning by introducing an anti-Hebbian plasticity rule at the synapses between MSNs, and show how this enables the circuit to obtain the desired structure and internalize the dynamical activity pattern, so that temporally patterned input from cortex eventually becomes unnecessary as the behavior is learned. We also note that, as shown in detail in the Supplemental Materials, the same mechanisms can be applied to circuits with both excitatory and inhibitory units, and hence may provide an explanation for the sequential firing patterns that have been observed in other brain areas including hippocampus [23,24,25] and cortex [26,27].

Synaptic depression enables temporally controllable sparse activity sequences in striatum
Experimentally observed population activity patterns in striatum during learned behaviors are sparse and sequential, and these are the main features that we want our model network to exhibit in a robust manner. In order to achieve sparse activity, we make use of the well-known fact that recurrent inhibition can lead to a winner-take-all steady state in which a single unit or group of units (where a unit consists of a cluster of MSNs) becomes active and inhibits the other units in the network from becoming active. Indeed, recurrent inhibition is a hallmark feature of MSNs in striatum, and such a picture has previously been suggested to apply to striatum [28,29,30]. Although individual inhibitory synapses between MSNs are relatively sparse and weak on the scale of the currents needed to drive spiking in these neurons [31,32], active populations of many MSNs firing together may more effectively mediate suppression between populations, in particular if these populations are also receiving sufficient background excitation from cortex and/or thalamus to keep them near the firing threshold [33,34,35], possibly in a metastable depolarized "up state" [36].
In addition to sparse activity, our model also requires a mechanism by which the activity can be made to switch from one unit to another, otherwise the network would lock into a single winner-take-all state. While other mathematically similar approaches are possible (see Supplemental Materials for discussion), in this paper we propose that this mechanism is short-term plasticity in the form of depressive adaptation at synapses between MSNs. Such synaptic depression has in fact been observed experimentally [22]. The effect of synaptic depression is to weaken the amount of inhibition from an active unit onto inactive units over time. If all units also receive constant external excitatory input, then eventually the inhibition may weaken sufficiently that the net input to one of the inactive units becomes positive, at which point the activity switches to this unit, and it begins to inhibit the other units. This competition between synaptic depression and the level of external input is the basic mechanism that determines the dynamics of activity switching. In particular, adjusting the level of external input can change the duration of time that it takes for activity to switch from one unit to the next, thus providing a mechanism for controlling the speed of an activity sequence in a robust manner without requiring any change in intrinsic properties of the neurons or temporally precise input to the network.
The dynamics of $x_i(t)$, which we think of as describing the activity level of a cluster of MSNs, and of the associated synaptic depression factors $y_j(t)$ in our network model are described by the following equations:

$$\tau \frac{dx_i}{dt} = -x_i + \phi\left(\sum_j W_{ij}\, y_j\, x_j + x^{\mathrm{in}}_i\right),$$
$$\tau_y \frac{dy_j}{dt} = (1 - y_j)(1 - x_j) + (\beta - y_j)\, x_j. \qquad (1)$$

The first equation describes the activity of unit $i$, which relaxes with time constant $\tau$ toward a nonlinear function of its recurrent and external inputs. The recurrent synapses are inhibitory, with weights $W_{ij} \le 0$, and the external input is excitatory, with $x^{\mathrm{in}}_i \ge 0$. For concreteness, we take the nonlinear function to be the sigmoid $\phi(x) = 1/(1 + e^{-\lambda x})$, where $\lambda$ is a gain parameter. The second equation in (1) describes the dynamics of synaptic depression, where the dynamic variable $y_j(t)$ represents the depression of all outgoing synapses from unit $j$, with characteristic timescale $\tau_y$. The first term on the right-hand side drives $y_j$ to its resting value of 1 when the presynaptic unit $j$ is inactive, so that the synapse is fully potentiated. When the presynaptic unit becomes active, with $x_j \approx 1$, the second term instead drives $y_j$ toward $\beta$, where $0 \le \beta \le 1$, so that the synaptic weight depotentiates to a finite minimum value while the presynaptic unit is active.
As described above and discussed in detail in the Supplemental Materials, the model defined by (1) exhibits activity switching between units due to competition between the two terms in the argument of the nonlinear function $\phi$. The second term is a positive external input, which tends to make $x_i$ active. The first term is a negative input from other units in the network, and it weakens the longer another unit $j$ remains active, because the effective synaptic weight $W_{ij} y_j(t)$ shrinks in magnitude as $y_j$ decays toward $\beta$. When the magnitude of the first term eventually becomes smaller than the second term, the net input becomes positive, causing $x_i$ to become active and begin to inhibit other units.
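The switching mechanism can be seen directly by integrating Eq. (1). The following Python/NumPy sketch uses forward Euler on a small ring in which every unit inhibits every other with weight −1, except that unit $j$ inhibits unit $j+1$ (mod $n$) with the weaker weight $-1+\eta$; the diagonal of $W$ is set to zero. The parameter values ($\tau = 0.02\,\tau_y$, $\lambda = 100$, $\beta = 0.2$, $\eta = 0.3$, $x^{\mathrm{in}} = 0.3$, step size $10^{-3}\tau_y$) and the helper names are illustrative assumptions, not values or code from the paper.

```python
import numpy as np


def simulate(W, x_in, beta, tau=0.02, tau_y=1.0, lam=100.0, dt=1e-3, t_max=10.0, x0=None):
    """Forward-Euler integration of Eq. (1); returns the most active unit at every step."""
    n = W.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    y = np.ones(n)                                   # synapses start fully recovered
    winners = np.empty(int(round(t_max / dt)), dtype=int)
    for k in range(winners.size):
        drive = W @ (y * x) + x_in                   # depressed recurrent inhibition + external input
        dx = (-x + 1.0 / (1.0 + np.exp(-lam * drive))) / tau
        dy = ((1.0 - y) * (1.0 - x) + (beta - y) * x) / tau_y
        x, y = x + dt * dx, y + dt * dy
        winners[k] = np.argmax(x)
    return winners


def ring_weights(n, eta):
    """Inhibition of -1 between all pairs (zero diagonal), weakened to -1 + eta from unit j to j + 1."""
    W = -np.ones((n, n))
    np.fill_diagonal(W, 0.0)
    for j in range(n):
        W[(j + 1) % n, j] = -1.0 + eta
    return W


def switch_order(winners):
    """Collapse the per-step winner trace into the sequence of distinct active units."""
    return winners[np.r_[True, np.diff(winners) != 0]]


order = switch_order(simulate(ring_weights(4, eta=0.3), x_in=0.3, beta=0.2, x0=[1, 0, 0, 0]))
print(order)
```

Starting from unit 0 active with fully recovered synapses, the winner should advance one step around the ring each time the depressing inhibition from the current winner onto its successor falls below the external drive.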
In Figures 1(a)-(b), we show a striatal model that is fully connected by inhibitory synapses, in which all off-diagonal elements have the same inhibitory weight (−1) except for those connecting unit $j$ to unit $j + 1$, which are depotentiated by an amount $\eta$. This means that if unit $j$ is currently active, then unit $j + 1$ will become active next, since it experiences the least inhibition. Figures 1(c)-(d) show that the expected sequence of activity (which repeats because we also depotentiate the weight from the last unit to the first) is indeed obtained in such a network, and that the magnitude of the constant external input can be used to control the rate of switching. The activity sequence slows down dramatically as $x^{\mathrm{in}}$ approaches the synaptic depression parameter $\beta$. This slowing down allows the temporal dynamics to be smoothly and reliably controlled, providing a potential mechanism consistent with recent experiments showing dramatic dilation of the time dependence in population recordings of striatal neurons [19], without requiring new learning of the synaptic weights from one trial to the next. While an unbounded range of dynamical scaling can be obtained in the idealized limit $\tau/\tau_y \to 0$ and $\lambda \to \infty$, Figure 1(d) shows that both very long and very short switch times $T$ can be attained even away from this limit. Finally, Figure 1(e) shows that a substantial dynamical range of sequence speeds can be obtained even if control over the precise value of the input $x^{\mathrm{in}}$ is limited, as may be the case due to noise in the system, although such limited control precludes extremely slow and extremely fast sequences.
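The dependence of the switch time on $x^{\mathrm{in}}$ can be probed by integrating Eq. (1) for several input levels. This is a minimal sketch under assumed parameters ($\tau = 0.02\,\tau_y$, $\lambda = 100$, $\beta = 0.2$, $\eta = 0.3$, four units). Under the weight convention assumed here (successor weight $-1+\eta$, zero diagonal), a switch requires $(1-\eta)\,y_j < x^{\mathrm{in}}$ while $y_j \ge \beta$, so the idealized period diverges as $x^{\mathrm{in}}$ approaches $(1-\eta)\beta$, which tends to $\beta$ as $\eta \to 0$.

```python
import numpy as np


def simulate(W, x_in, beta, tau=0.02, tau_y=1.0, lam=100.0, dt=1e-3, t_max=10.0, x0=None):
    """Forward-Euler integration of Eq. (1); returns the most active unit at every step."""
    n = W.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    y = np.ones(n)
    winners = np.empty(int(round(t_max / dt)), dtype=int)
    for k in range(winners.size):
        drive = W @ (y * x) + x_in
        dx = (-x + 1.0 / (1.0 + np.exp(-lam * drive))) / tau
        dy = ((1.0 - y) * (1.0 - x) + (beta - y) * x) / tau_y
        x, y = x + dt * dx, y + dt * dy
        winners[k] = np.argmax(x)
    return winners


def ring_weights(n, eta):
    """Inhibition of -1 between all pairs (zero diagonal), weakened to -1 + eta from unit j to j + 1."""
    W = -np.ones((n, n))
    np.fill_diagonal(W, 0.0)
    for j in range(n):
        W[(j + 1) % n, j] = -1.0 + eta
    return W


def mean_switch_interval(x_in, beta=0.2, eta=0.3, n=4, dt=1e-3, t_max=24.0):
    """Average time between switches, skipping the earliest interval (still set by the initial y = 1)."""
    x0 = np.zeros(n)
    x0[0] = 1.0
    winners = simulate(ring_weights(n, eta), x_in, beta, dt=dt, t_max=t_max, x0=x0)
    switch_steps = np.flatnonzero(np.diff(winners)) + 1
    return np.diff(switch_steps)[1:].mean() * dt


inputs = [0.16, 0.2, 0.3, 0.4]
periods = [mean_switch_interval(v) for v in inputs]
for v, T in zip(inputs, periods):
    print(f"x_in = {v:.2f}: mean switch interval = {T:.2f} tau_y")
```

The intervals should grow steadily as the input is lowered toward that threshold; with finite $\lambda$ each switch happens slightly before the idealized condition is met, so the numbers will not match the idealized limit exactly.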

Targeted external input selects which of several sequences striatum expresses
We can extend the model described so far to multiple (and even overlapping) behaviors by positing that the external input from cortex and/or thalamus targets the particular subset of MSNs needed to express a particular behavior. If multiple sequences are encoded in the weights between different populations of MSNs, then the external input can be thought of as a "spotlight" that activates the behavior that is most appropriate in a particular context, with the details of that behavior encoded within the striatum itself, as shown in Figure 2. These subpopulations may even be partially overlapping, with the overlapping portions encoding redundant parts of the corresponding behaviors. In this way, a wide variety of motor behaviors could be encoded without requiring a completely distinct sequence for every possible behavior. This model dissociates the computations of the selection and expression of motor sequence behaviors. The inputs to striatum select a sequence (possibly composed of several subsequences) by targeting a certain subpopulation of MSNs, and then the striatum converts this selection into a dynamical pattern of neural activity that expresses the behavior in time.
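The spotlight idea can be illustrated with Eq. (1). In this minimal sketch (our own construction, using disjoint rather than overlapping subpopulations), units 0-3 and 4-7 each encode a ring sequence with successor weight $-1+\eta$, all other pairs inhibit each other with weight −1, and constant external input is delivered only to the targeted subpopulation. All parameter values and helper names are illustrative assumptions.

```python
import numpy as np


def simulate(W, x_in, beta, tau=0.02, tau_y=1.0, lam=100.0, dt=1e-3, t_max=10.0, x0=None):
    """Forward-Euler integration of Eq. (1); returns the most active unit at every step."""
    n = W.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    y = np.ones(n)
    winners = np.empty(int(round(t_max / dt)), dtype=int)
    for k in range(winners.size):
        drive = W @ (y * x) + x_in                   # x_in may differ from unit to unit
        dx = (-x + 1.0 / (1.0 + np.exp(-lam * drive))) / tau
        dy = ((1.0 - y) * (1.0 - x) + (beta - y) * x) / tau_y
        x, y = x + dt * dx, y + dt * dy
        winners[k] = np.argmax(x)
    return winners


def two_sequence_weights(eta=0.3):
    """Units 0-3 and 4-7 each form a ring (successor weight -1 + eta); all other pairs inhibit at -1."""
    W = -np.ones((8, 8))
    np.fill_diagonal(W, 0.0)
    for members in ([0, 1, 2, 3], [4, 5, 6, 7]):
        for k, j in enumerate(members):
            W[members[(k + 1) % 4], j] = -1.0 + eta
    return W


def spotlight(target, level=0.3, t_max=10.0):
    """Deliver constant input only to the targeted units and return the order in which units win."""
    x_in = np.zeros(8)
    x_in[target] = level
    x0 = np.zeros(8)
    x0[target[0]] = 1.0
    winners = simulate(two_sequence_weights(), x_in, beta=0.2, t_max=t_max, x0=x0)
    return winners[np.r_[True, np.diff(winners) != 0]]


seq_a = spotlight([0, 1, 2, 3])
seq_b = spotlight([4, 5, 6, 7])
print(seq_a, seq_b)
```

Moving the spotlight from one subpopulation to the other changes which sequence is expressed without any change to $W$; the untargeted units stay silent because they receive inhibition but no excitatory drive.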
It is known that synapses onto striatal neurons from cortex are potentiated during reward-based learning of motor behaviors in rodents, making these synapses a likely site for reinforcement learning [37,38,39]. If a behavior leads to a greater-than-expected reward, then a (possibly dopamine-mediated) feedback signal can strengthen the recently active corticostriatal synapses. This makes that behavior more likely to be performed in that particular context in the future, lowering the threshold for activation and possibly speeding up the activity sequence underlying the behavior; in this way, the basal ganglia circuit could be important for controlling the "vigor" associated with movements [40,41]. The scenario just outlined can be viewed as a generalization of recent models of reinforcement learning in mammals [42] to behaviors with temporally rich structure. Again using the spotlight analogy, it is also easy to see how multiple behaviors can be concatenated if the cortical and/or thalamic inputs activating the appropriate neuron assemblies are active together or in sequence. This provides a natural mechanism by which "chunking" of simple behaviors into more complex behaviors might take place in the striatum [43,44].

Anti-Hebbian plasticity enables sequence learning
A striatal network with initially random connectivity can learn to produce sparse sequential activity patterns when driven by time-dependent cortical input. We again consider a network described by (1), but now with distinct time-dependent external inputs $x^{\mathrm{in}}_i(t)$ to each unit $i$, and with dynamic synaptic weights described by

$$\frac{dW_{ij}}{dt} = -\alpha_1\, x_i\, \bar{x}_j\, W_{ij} - \alpha_2\, (1 - x_i)\, x_j\, (1 + W_{ij}), \qquad (2)$$

where $\bar{x}_j$ is the activity of unit $j$ low-pass filtered over a time scale $\tau_w$, and $\alpha_1$ and $\alpha_2$ control the rates of learning. Roughly, $\bar{x}_j(t)$ will be nonzero if unit $j$ has been active recently, within the window from $t - \tau_w$ to $t$. Because $W_{ij} \le 0$, the first term in (2) causes $W_{ij} \to 0$ if postsynaptic unit $i$ is active together with or immediately following presynaptic unit $j$. Otherwise, if $j$ is active but $i$ is not, the second term causes $W_{ij} \to -1$.
Equation (2) thus describes an anti-Hebbian learning rule, according to which synapses connecting units that fire together or in sequence are depotentiated, while others are potentiated. Figure 3 shows that sequences can develop in the network when it is subjected to several cycles of time-varying external input. Figure 3(a) shows that a network whose connectivity matrix initially has no special features can acquire such structure through anti-Hebbian plasticity when it is subjected to a repeated sequence of pulse-like inputs, which induce a regular pattern of sequential activity. In Figure 3(b), the regular input to each unit is replaced by a superposition of sinusoids with random amplitudes and phase shifts. (We presume that, in the brain, cortical input to the striatum is structured in a meaningful way rather than random, as determined by a reinforcement learning process that we are not modeling explicitly. Our use of random input here, however, illustrates that robust sequences emerge naturally in striatum even when the input is not highly structured.) This input leads to a particular activity sequence in the network, with only one unit active at a given time due to the inhibitory competition between units. Meanwhile, synaptic weights between sequentially active units are depotentiated by anti-Hebbian learning, eventually leading to a weight matrix (labeled "$t = 500\tau_y$" in Figure 3(b)) in which each unit that is active at some time during the sequence is described by a column with a single depotentiated entry, which corresponds to the next unit to become active in the sequence. Figure 3(c) illustrates that it is also possible to train a network that has previously learned one sequence to produce a new, unrelated sequence. The evolution of the synaptic weights during the learning and relearning phases is illustrated in Figure 3(d).
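To see how Eq. (2) carves sequence structure into the weights, the sketch below bypasses the network dynamics and imposes binary, pulse-like activity directly (as if the tutoring input fully dictated which unit is active), in the spirit of Figure 3(a). It integrates Eq. (2) with forward Euler, taking $\bar{x}_j$ to be a first-order low-pass filter of $x_j$ (an assumption). The values $\alpha_1 = 1$, $\alpha_2 = 0.03$, $\tau_w = 0.25$, the dwell time, the number of cycles, and the tutored order are hypothetical choices made so that the effect is easy to see.

```python
import numpy as np


def pulse_sequence(order, dwell, n_cycles, dt):
    """Binary activity in which the units in `order` take turns being active for `dwell` time units."""
    steps = int(round(dwell / dt))
    n = len(order)
    pattern = np.zeros((n_cycles * n * steps, n))
    for c in range(n_cycles):
        for k, unit in enumerate(order):
            start = (c * n + k) * steps
            pattern[start:start + steps, unit] = 1.0
    return pattern


def train_anti_hebbian(pattern, dt, tau_w=0.25, alpha1=1.0, alpha2=0.03, seed=0):
    """Forward-Euler integration of Eq. (2) driven by a prescribed activity pattern (time x units)."""
    rng = np.random.default_rng(seed)
    n = pattern.shape[1]
    W = -rng.uniform(0.0, 1.0, size=(n, n))       # unstructured initial inhibition in (-1, 0]
    x_bar = np.zeros(n)
    for x in pattern:
        x_bar += dt / tau_w * (x - x_bar)          # assumed first-order low-pass filter of x_j
        dW = (-alpha1 * np.outer(x, x_bar) * W               # x_i * xbar_j pulls W_ij toward 0
              - alpha2 * np.outer(1.0 - x, x) * (1.0 + W))   # (1 - x_i) * x_j pulls W_ij toward -1
        W += dt * dW
    return W


order = [0, 2, 1, 3]                               # hypothetical tutored order
W = train_anti_hebbian(pulse_sequence(order, dwell=1.0, n_cycles=150, dt=0.01), dt=0.01)
successor = {order[k]: order[(k + 1) % 4] for k in range(4)}
print(np.round(W, 2))
```

After training, the diagonal should be close to zero and each column should keep one weak off-diagonal entry, at the row of the unit that followed it during tutoring (here $W_{2,0}$, $W_{1,2}$, $W_{3,1}$, $W_{0,3}$), with strong inhibition elsewhere.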
After the network has been trained in this way, it is able to reproduce the same pattern of activity even after the time-dependent input is replaced by a constant excitatory input to all units. This is similar to the network model studied above, although now the active units appear in random order. Figure 3(e) shows that, as in the earlier network model, the level of external input can be used to control the speed of the activity sequence, with a dynamical range spanning more than an order of magnitude. Finally, Figure 3(f) shows that, again replacing the time-varying input with constant external input to all units, the activity pattern and sequence speed in this trained network are robust to random perturbations of the weights $W_{ij}$. For comparison, we show in Figure S2 that performance is severely degraded by comparable perturbations in a more generic trained recurrent neural network.
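The replay and robustness claims can be checked qualitatively with a forward-Euler integration of Eq. (1). Rather than training, this sketch builds a hypothetical stand-in for a learned weight matrix: all-to-all inhibition of −1 with zero diagonal, plus a single weakened entry (−0.7) from each unit to its successor in a random order. Constant input of 0.3 to every unit should replay that order, and perturbing every weight by up to ±0.05 should leave the order unchanged and the speed roughly the same. All values, the random seed, and helper names are illustrative assumptions.

```python
import numpy as np


def simulate(W, x_in, beta, tau=0.02, tau_y=1.0, lam=100.0, dt=1e-3, t_max=10.0, x0=None):
    """Forward-Euler integration of Eq. (1); returns the most active unit at every step."""
    n = W.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    y = np.ones(n)
    winners = np.empty(int(round(t_max / dt)), dtype=int)
    for k in range(winners.size):
        drive = W @ (y * x) + x_in
        dx = (-x + 1.0 / (1.0 + np.exp(-lam * drive))) / tau
        dy = ((1.0 - y) * (1.0 - x) + (beta - y) * x) / tau_y
        x, y = x + dt * dx, y + dt * dy
        winners[k] = np.argmax(x)
    return winners


rng = np.random.default_rng(1)
n = 6
perm = rng.permutation(n)                            # hypothetical learned order of activation
W = -np.ones((n, n))
np.fill_diagonal(W, 0.0)
for k in range(n):
    W[perm[(k + 1) % n], perm[k]] = -0.7             # weakened inhibition onto the successor
W_noisy = np.minimum(W + rng.uniform(-0.05, 0.05, size=(n, n)), 0.0)   # keep all weights inhibitory
x0 = np.zeros(n)
x0[perm[0]] = 1.0


def order_and_interval(weights, dt=1e-3, t_max=14.0):
    """Replay under constant input 0.3 to all units; return activation order and mean switch interval."""
    winners = simulate(weights, 0.3, beta=0.2, dt=dt, t_max=t_max, x0=x0)
    steps = np.flatnonzero(np.diff(winners)) + 1
    return np.r_[winners[0], winners[steps]], np.diff(steps)[1:].mean() * dt


order_clean, T_clean = order_and_interval(W)
order_noisy, T_noisy = order_and_interval(W_noisy)
successor = {int(perm[k]): int(perm[(k + 1) % n]) for k in range(n)}
print(perm, order_clean, order_noisy, T_clean, T_noisy)
```

The margin between the weakened successor entries (magnitude about 0.7) and all other entries (about 1) is what absorbs perturbations of this size in the sketch.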
Previous models have shown that neural activity sequences can emerge from initially unstructured networks of excitatory neurons via spike-timing-dependent plasticity (STDP) [45,46,47]. Compared with these earlier works, our model has the advantage of being able to dynamically adjust the speed of the activity sequence, as shown in Figure 3(e) (cf. however Refs. [46], [48], and [49], where some temporal rescaling in activity patterns has been obtained using distinct mechanisms). In addition, our model does not require the assumption of heterosynaptic competition limiting the summed synaptic weights into and out of each unit, as in Ref. [45].
Taken together, the above results show, within the context of a highly simplified network model, that time-varying input can lead to robust activity sequences, but that this input is no longer necessary once the circuit has internalized the sequence. Further, the speed of the dynamics can be adjusted using the overall level of external input to the network. Taken as a model of striatum, the network therefore provides a possible explanation of the motor cortex lesion studies of Ref. [14], as well as the variable-delay lever press experiments of Ref. [19].

A sparsely connected spiking model supports learning and execution of time-flexible sequences
Figure 4 shows that, as in the continuous version of the model, temporally patterned input and recurrent plasticity can be used to entrain a recurrent inhibitory network having no initial structure in its recurrent weights to perform a particular firing sequence, with only one cluster of neurons active at any one time.
Recent experimental work has indeed identified clusters of neurons in striatum that appear to function as transiently active cell assemblies [50]. Because we interpret the units studied in the continuous case above as clusters of neurons rather than individual neurons, full connectivity between units can easily be obtained even if connectivity between neurons is sparse: if the clusters are sufficiently large, some neurons in one cluster will almost certainly have synapses onto some neurons in any other cluster. (With connection probability $p$ and clusters of $n$ neurons, the chance that no synapse runs from one cluster to another is $(1-p)^{n^2}$, which for $p = 0.2$ and $n = 10$ is already about $2 \times 10^{-10}$.) In Figure 4, the connection probability between all pairs of neurons is $p = 0.2$, showing that sparse connectivity between neurons is sufficient to enable one population to effectively inhibit another, as in the continuous model. Although the highly structured input used to train the network may at first appear artificial, similar sparse sequential activity patterns have been observed in motor cortex, which is a main input to striatum, after learning a lever press task [51]. The STDP rule according to which recurrent inhibitory synapses are modified is shown in Figure 4(b). The rule is anti-Hebbian: postsynaptic spikes occurring at approximately the same time as or slightly after a presynaptic spike weaken the synapse, while presynaptic spikes occurring in isolation lead to a slight potentiation of the synapse. Figure 4(c) shows that, after several repetitions of the input sequence, this rule leads to a connectivity structure similar to that obtained in the continuous model, with decreased inhibition of a population onto itself and onto the next population in the sequence. Finally, as shown in Figure 4(d) and (e), once the weights have been learned, constant input is sufficient to induce the desired firing pattern in the network, with the magnitude of this input controlling the rate at which the pattern progresses.
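The qualitative shape of the rule described for Figure 4(b) can be written as a pair-based window. The sketch below is a hypothetical instance only: the exponential lobes, the amplitudes `a_pre` and `a_pair`, and the time constants are our assumptions, not the rule used for Figure 4. It tracks the magnitude $g \ge 0$ of an inhibitory synapse, so a negative return value means the synapse weakens.

```python
import math


def stdp_change(pre_times, post_times, a_pre=0.002, a_pair=0.01, tau_after=20.0, tau_before=5.0):
    """Change in the magnitude g >= 0 of an inhibitory synapse (times in ms); negative = weakening.

    Hypothetical anti-Hebbian window: every presynaptic spike adds a small potentiation a_pre,
    and every pre/post pair subtracts a_pair * exp(-|lag| / tau), with a long lobe for postsynaptic
    spikes after the presynaptic one and a short lobe for near-coincident spikes just before it.
    """
    dg = a_pre * len(pre_times)
    for t_pre in pre_times:
        for t_post in post_times:
            lag = t_post - t_pre
            tau = tau_after if lag >= 0 else tau_before
            dg -= a_pair * math.exp(-abs(lag) / tau)
    return dg


print(stdp_change([0.0], [5.0]))     # postsynaptic spike shortly after the presynaptic one
print(stdp_change([0.0], []))        # isolated presynaptic spike
print(stdp_change([0.0], [-50.0]))   # postsynaptic spike long before the presynaptic one
```

Because each presynaptic spike contributes a small fixed potentiation, a synapse whose presynaptic neuron fires without a nearby postsynaptic response is strengthened, while a near-coincident or closely following postsynaptic spike outweighs that potentiation and weakens it.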
Thus, as for the continuous network studied above, the spiking network is able to learn a firing pattern from an external source, and later autonomously generate the same pattern over a wide range of speeds. Further details of the spiking model are presented in the Supplemental Materials.

Discussion
We have presented a model in which a network of recurrently connected inhibitory units internalizes stereotyped sequential activity patterns based on temporally patterned input. Moreover, after learning, the same activity pattern can be reproduced even when the temporally patterned input is removed, and the speed of the activity pattern can be adjusted simply by varying the overall level of excitatory input, without requiring additional learning. Taken as a model of striatum, the network may provide an explanation for recent experiments showing that (i) sparse sequences of neural activity in striatum dilate and contract in proportion to the delay interval in timekeeping tasks [19], and (ii) motor cortex is necessary to learn new behaviors but not to perform already-learned behaviors (which are presumably directed at least in part by subcortical brain circuits such as the striatum) [14]. In unlesioned animals, the ability to progress between two modes of control (one in which the dynamical neural activity in the basal ganglia is enslaved to top-down input from cortex, and another in which subcortical brain circuits generate dynamics autonomously) may form the basis of a cognitive strategy enabling behaviors to be performed more reliably and with less cognitive effort as performance at a particular task becomes increasingly expert. More speculatively, it may also be possible for trained animals to switch between these two modes of control during and after learning of a task, similar to the switching between the "practice" and "performance" modes of male songbirds in the absence or presence of a female, respectively [52]. It is also well known that important differences exist in rodents, monkeys, and humans between goal-directed and habitual behaviors (for reviews, see Refs. [53,54,55,56]), and exploring the relation between these behavioral modes and the cortically and non-cortically driven modes described above is an important direction for future study.
Regarding the flexible timing of neural firing patterns, several previous theoretical frameworks have been proposed for interval timing, including pacemaker-accumulators [57], in which a constant pacemaker signal is integrated until a threshold is reached; superposed neural oscillators [58], in which oscillations at different frequencies lead to constructive interference at regular intervals; and sequence-based models [59,60,61], in which a network passes through a sequence of states over time. The last of these is most similar to the model that we present, though with the important difference that these sequence-based models involve stochastic rather than deterministic switching of activity from one unit to the next and hence have much greater trial-to-trial variability. In addition to these models, some previous theoretical works have attempted to use external input to control the speed of a "moving bump" of neural activity within the framework of continuous attractors [62,63]. However, in both of these previous studies, obtaining a moving activity bump requires external input that couples to different types of neurons in different ways. This is not required by the model presented here, in which the activity bump still propagates even if the input to all units is identical.
It is also useful to contrast our model with other possible approaches within the framework of reservoir computing, starting with a random recurrent neural network (RNN) and training it to produce a sparse sequential pattern of activity either in the recurrent units themselves [64] or in a group of readout units. Such training can be accomplished for example using recursive least squares learning [65,66,67,68] or various backpropagation-based algorithms [69]. However, as we show in Figure S2, such a trained RNN generically tends to be much more sensitive to perturbations in the recurrent weights. In addition, the successful training of such an RNN requires many examples spanning the entire range of time scaling that one wishes to produce, whereas the network that we present can learn a sequence at one particular speed and then generalize to faster or slower speeds simply by changing one global parameter, making this network more flexible as well as more robust. This is reminiscent of the ability of human subjects learning a motor skill to successfully generalize to faster and slower speeds after training at a single fixed speed [70].
Although, motivated by experimental results involving the basal ganglia, we have developed a model of recurrently connected inhibitory units, the same basic mechanisms for sequence learning can be applied to obtain sparse sequential firing patterns with flexible time encoding in a network of excitatory units with shared inhibition. As we show in Figure S3, such a network can be made to produce variable-speed sequential patterns just as in the inhibitory network. Shared inhibition among excitatory neurons is a common motif used to obtain sparse coding of both static and dynamic neural activity patterns in models of cortical circuits. We show that, by including synaptic depression, these circuits can also be endowed with the ability to dilate and contract dynamic activity patterns in a straightforward way. Because the timescale of the shared inhibitory response limits the maximum possible sequence speed in such networks, a purely inhibitory network may provide an advantage over an excitatory network with shared inhibition in tasks requiring a large dynamical range. This ability may be relevant for various cortical areas involved in time-dependent decision making and motor control, for which sparse firing sequences similar to those in striatum have been observed [27,51], and this mechanism for producing sparse sequences may enable cortical networks to provide a pulse-like "tutoring" input to striatum, as in Figure 4. Previous models of sequences in circuits with both excitatory and inhibitory populations have not exhibited the ability to flexibly control the sequence speed [45] or have not been constrained by biologically plausible learning rules and connectivity constraints [64].
We conclude by summarizing the experimental predictions suggested by our model. Central to the model is the anti-Hebbian plasticity rule that enables the inhibitory network to learn sequential patterns. Experimental results on medium spiny neurons in vitro have shown that recurrent synapses do in fact potentiate when presynaptic spiking is induced without postsynaptic spiking [71], as one would expect from the second term in (2). To our knowledge, however, the question of whether paired pre- and postsynaptic spiking would lead to depotentiation, as described by the first term in the equation, has not yet been addressed experimentally. Both Hebbian and anti-Hebbian forms of STDP at inhibitory synapses have been found in other brain areas, as reviewed in Ref. [72].
Our model also predicts that the overall level of external excitatory input to the network should affect the speed of the animal's time judgement and/or behavior. By optogenetically providing differing levels of input to a population of striatal MSNs, one could test whether the speed of the neural activity sequence among these cells is affected. An alternative, and perhaps less technically challenging, approach would be to measure the overall activity level in the network, which should increase as the speed of the sequence increases. This effect should persist as long as saturation effects in activity levels do not become prominent (which does occur in the continuous model we present, but not in our spiking model). Changing the strength of recurrent inhibition should have a similar effect to changing the input level, although this would have to be done selectively to synapses between MSNs without disrupting feedforward inhibition from interneurons within striatum. Alternatively, as we remarked above, dopamine may be able to change the sequence speed by modifying the synaptic depression parameter ($\beta$ in our model). Thus changes in tonic dopamine levels should also be able to effect temporal rescaling, and indeed there is already some evidence that this occurs [73]. However, it is as yet unknown whether direct- and indirect-pathway MSNs, which project to different targets within the basal ganglia [17], play distinct roles with regard to interval timing. Including both types of MSN in the model will be a natural extension for future work and will allow for more direct comparison with existing models of basal ganglia function [74,75].
Our theory also predicts that the neural activity pattern in striatum should be the same in trained animals before and after cortical lesions and that this neural activity should play a role in driving the animal's behavior. Investigating the neural activity in striatum and its role in generating behavior in lesioned animals would thus provide an important test of the theory. Observing the activity in cortex itself may also be useful. The theory suggests that time-dependent variability in cortical input is likely to decrease as an animal becomes more expert at performing a task, or as it switches between behavioral modes. This could be studied via population recordings from striatum-projecting neurons in motor cortex.
Finally, while the lesion experiments of Ref. [14] suggest that the instructive tutoring input to striatum likely originates in motor cortex, the source of the non-instructive input driving behavior and controlling speed after learning is unknown. It would be interesting for future experiments to explore whether the non-instructive input originates primarily from other cortical areas, or alternatively from thalamus, thereby endowing this structure with a functionally distinct role from cortex in driving behavior.

Supplemental Materials
Appendix A: Dynamics of model with synaptic depression
Intuition for the behavior of the model defined by (1) can be obtained by studying the case in which there are only two units, with activities $x_1(t)$ and $x_2(t)$, identical constant inputs $x^{\rm in}_1 = x^{\rm in}_2 = x^{\rm in}$, and symmetric inhibitory connectivity $W_{ij} = \delta_{ij} - 1$. The behavior of this model is illustrated in Figure S1. It can be understood by considering the fixed-point solutions for given fixed values of the synaptic depression variables $y_j$. In this case one can plot the curves along which the time derivatives $\dot{x}_1$ and $\dot{x}_2$ vanish, as shown in Figure S1. The intersection of these curves describes a stable fixed point, which occurs at either $(x_1, x_2) \approx (1, 0)$ or $(x_1, x_2) \approx (0, 1)$, depending on which of $y_1$ or $y_2$ is larger. With this picture in mind, we can now consider the effects of dynamical $y_j(t)$. Suppose that at a given moment $y_2 < y_1$, and hence $(x_1, x_2) \approx (1, 0)$. According to the second equation in (1), $y_2$ will begin increasing toward 1 because $x_2$ is inactive, while $y_1$ will begin decreasing toward $\beta$ because $x_1$ is active. As this happens, the net input to the second unit becomes positive, and the stable fixed point switches to $(x_1, x_2) \approx (0, 1)$ when $y_1 = x^{\rm in}$ (assuming $\beta < x^{\rm in} < 1$); the synaptic depression variables then begin adjusting to this new activity. The result is repeated switching of activity between the two units, with a switching period determined by $\tau_y$, $\beta$, and (importantly) $x^{\rm in}$. Versions of this two-unit switching model, often termed a "half-center oscillator," have previously been studied in the context of binocular rivalry [76] and have long been used as "central pattern generators" in models of rhythmic behaviors [77,78,79,80].
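The switching mechanism can be explored numerically with the minimal sketch below. It is not the simulation code used for Figure S1. It assumes that model (1) takes the form $\tau\dot{x}_i = -x_i + \phi(\sum_j W_{ij} y_j x_j + x^{\rm in})$ and $\tau_y \dot{y}_i = 1 - y_i - (1-\beta)x_i$, with $\phi(z) = \max(0, \tanh(\lambda z))$. This form reproduces the behavior described above: active units depress toward $\beta$ and inactive units recover toward 1. All parameter values are illustrative. Under these assumptions, requiring the two-unit cycle to be periodic gives $y_0 = 1 + \beta - x^{\rm in}$ (valid for $x^{\rm in} < (1+\beta)/2$), so the idealized switch time is $T = \tau_y \ln[(1 - x^{\rm in})/(x^{\rm in} - \beta)]$. The script prints this value for comparison.

```python
import numpy as np


def simulate_half_center(x_in, beta=0.2, tau=1.0, tau_y=20.0, lam=20.0,
                         dt=0.01, t_max=400.0):
    """Euler integration of two mutually inhibiting units with depressing synapses.

    Assumed (hypothetical) form of model (1):
        tau   dx_i/dt = -x_i + phi(sum_j W_ij y_j x_j + x_in)
        tau_y dy_i/dt = 1 - y_i - (1 - beta) x_i
    with phi(z) = max(0, tanh(lam * z)) and W_ij = delta_ij - 1.
    Returns the times at which the more active unit changes.
    """
    W = np.eye(2) - 1.0          # W_ij = delta_ij - 1: no self-coupling, unit mutual inhibition
    x = np.array([1.0, 0.0])     # unit 1 starts active
    y = np.ones(2)               # both synapses start fully recovered
    leader, switch_times = 0, []
    for step in range(int(t_max / dt)):
        drive = W @ (y * x) + x_in
        x = x + dt / tau * (-x + np.maximum(0.0, np.tanh(lam * drive)))
        y = y + dt / tau_y * (1.0 - y - (1.0 - beta) * x)
        new_leader = int(np.argmax(x))
        if new_leader != leader:
            switch_times.append(step * dt)
            leader = new_leader
    return np.array(switch_times)


def switching_period(x_in, **kwargs):
    """Mean interval between switches, discarding the first two switches as transient."""
    times = simulate_half_center(x_in, **kwargs)
    return float(np.mean(np.diff(times[2:])))


beta, tau_y = 0.2, 20.0
periods = {}
for x_in in (0.35, 0.5):
    periods[x_in] = switching_period(x_in, beta=beta, tau_y=tau_y)
    # Idealized two-unit value (tau/tau_y -> 0, lam -> infinity, y_0 = 1 + beta - x_in)
    t_ideal = tau_y * np.log((1.0 - x_in) / (x_in - beta))
    print(f"x_in = {x_in}: simulated T = {periods[x_in]:.1f}, idealized T = {t_ideal:.1f}")
```

Because $\lambda$ is finite and $\tau/\tau_y$ is nonzero here, the simulated periods only approximately match the idealized values. The qualitative point is that raising the tonic input $x^{\rm in}$ shortens the switching period.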
The above analysis holds exactly in the limit $\tau/\tau_y \to 0$ and $\lambda \to \infty$. In this limit it is straightforward to solve for the time $T$ that it takes for the activity to switch from one unit to the next:
$$T = \tau_y \ln\left(\frac{y_0 - \beta}{x^{\rm in} - \beta}\right), \tag{3}$$
where $y_0$ is the largest value that $y_i(t)$ attains in each cycle and satisfies $x^{\rm in} < y_0 < 1$. Equation (3) shows that the switching period $T$ diverges logarithmically as $x^{\rm in} \to \beta$ from above, and can be made arbitrarily small as $x^{\rm in} \to 1$ from below. Thus, in addition to allowing neural activity to switch between populations, the competition between external input and synaptic depression also provides a mechanism for continuously controlling the speed of the network dynamics.
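The switch time in (3) follows from a one-line calculation. Here it is sketched under two assumptions, both made explicitly: in the limit above, the depression variable of the active unit relaxes exponentially toward $\beta$ with time constant $\tau_y$, starting from $y_0$; and activity switches exactly when that variable reaches $x^{\rm in}$:

$$
\begin{aligned}
y_1(t) &= \beta + (y_0 - \beta)\,e^{-t/\tau_y}, \\
y_1(T) = x^{\rm in} \;\;&\Longrightarrow\;\; T = \tau_y \ln\!\left(\frac{y_0 - \beta}{x^{\rm in} - \beta}\right).
\end{aligned}
$$

Since $x^{\rm in} < y_0 < 1$, taking $x^{\rm in} \to 1$ forces the numerator and denominator toward the same value, so $T \to 0$. Taking $x^{\rm in} \to \beta^{+}$ instead makes $T$ grow like $-\tau_y \ln(x^{\rm in} - \beta)$.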
Although we control temporal scaling throughout this paper by adjusting the external input level $x^{\rm in}$, essentially equivalent effects can be obtained within our model by adjusting the synaptic depression parameter $\beta$ instead. This might seem to be an intrinsic neuronal property that would be difficult to control externally. However, in vitro experiments indicate that the degree of synaptic depression in striatal MSNs depends on the level of dopamine input to the neuron [22]. Moreover, changing dopamine levels in this circuit has been shown to reliably speed up or slow down an animal's time judgement [73], as our model would predict if dopamine does in fact affect synaptic depression.
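To make the equivalence between the two control parameters concrete, the sketch below evaluates the switch-time expression $T = \tau_y \ln[(y_0 - \beta)/(x^{\rm in} - \beta)]$ (Equation (3)) while varying either $x^{\rm in}$ or $\beta$. It assumes $y_0 \approx 1$, i.e., that each synapse recovers almost fully between activations, as in a long sequence. The function name and parameter values are illustrative.

```python
import math


def switching_time(x_in, beta, y0=1.0, tau_y=1.0):
    """Switch time from Eq. (3): T = tau_y * ln((y0 - beta) / (x_in - beta)).

    Requires beta < x_in < y0. Setting y0 = 1 approximates a long sequence in
    which each synapse recovers almost fully between activations (assumption).
    """
    if not beta < x_in < y0:
        raise ValueError("Eq. (3) requires beta < x_in < y0")
    return tau_y * math.log((y0 - beta) / (x_in - beta))


baseline = switching_time(x_in=0.5, beta=0.2)
more_input = switching_time(x_in=0.6, beta=0.2)   # speed-up via tonic input
lower_beta = switching_time(x_in=0.5, beta=0.1)   # speed-up via the depression parameter
higher_beta = switching_time(x_in=0.5, beta=0.3)  # slow-down via the depression parameter
print(f"baseline {baseline:.3f}, more input {more_input:.3f}, "
      f"beta=0.1 {lower_beta:.3f}, beta=0.3 {higher_beta:.3f}")
```

Because $T$ depends on $\beta$ through both $y_0 - \beta$ and $x^{\rm in} - \beta$, lowering $\beta$ at fixed input shortens the switch time. This is the same effect that raising $x^{\rm in}$ has at fixed $\beta$.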
A possible objection to the above analysis is that $x^{\rm in}$ cannot be tuned arbitrarily close to $\beta$ in the presence of noise, which limits the dynamical range of scaling parameters that can be obtained. To take this into account, we suppose that $\tilde{x}^{\rm in} \equiv x^{\rm in}/(1 - \eta)$ can only be tuned reliably to within precision $\Delta$. In this case, the maximum switching period that can be reliably obtained no longer grows without bound as $\tilde{x}^{\rm in} \to \beta$. Instead, it attains a finite value as $\tilde{x}^{\rm in} \to \beta + \Delta$. Similarly, the minimum attainable switching period cannot be arbitrarily small, but instead reaches a minimum value as $\tilde{x}^{\rm in} \to 1 - \Delta$. Using (3), the dynamical range of temporal scaling, i.e., the ratio of these maximum and minimum switching periods, is therefore given by (4). Figure 1(c) shows that a large dynamical range can be obtained as a function of the noise parameter $\Delta$, even for biologically plausible noise values of $\Delta$ around 0.1. Thus, an inhibitory network with synaptic depression and appropriately chosen synaptic weights can perform an activity sequence over a wide dynamical range without requiring a biologically unrealistic degree of precision in its input.
Finally, we note that results similar to those shown in this section and in the main text can be obtained in a model that features a depressive adaptation current instead of depressing synapses. In that model, the adaptation current $a_i(t)$, a low-pass filtered version of the activity $x_i(t)$, increases monotonically after unit $i$ becomes active, and $\gamma \geq 0$ is a constant describing the magnitude of the adaptation current. An active unit therefore tends to lower its own activity level over time. If this adaptation is sufficiently strong, the unit may become inactive after some time, at which point another unit in the network becomes active.
As in the synaptic depression model studied in the main text, the switch time for successively active units can be adjusted dynamically by varying the level of external input $x^{\rm in}$. Although this adaptation-current model exhibits dynamics extremely similar to those of the synaptic depression model, we focus on the latter because depressing synapses have been observed in the striatum [22].
… quickly or slowly than it has learned in training. We measure the performance of the model in two ways. First, for each trial, we take as templates all time-scalings of the pulse sequence within the range used to train the model (i.e., activity patterns such as those in the bottom panel of Figure S2(b)). We then find the template that best matches the striatal activity produced on that trial. The quality of the match is measured by the normalized root mean squared error,
$$\mathrm{nRMSE} = \frac{\lVert r_s(t) - \hat{r}_s(t) \rVert_2^2}{\lVert r_s(t) \rVert_2^2},$$
where $r_s(t)$ and $\hat{r}_s(t)$ are the produced striatal activity and the template pulse sequence, respectively. The time scale of the best-matching template (the best-match time) is taken to be the response time of the model. Second, the value of the nRMSE indicates whether the response on that trial resembles a 'correct' striatal pulse sequence at all. Based on visual inspection, we set an nRMSE of 0.3 as the threshold above which a trial is not considered to be a meaningful pulse sequence.
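The two performance measures described above can be sketched as follows. This is an illustrative reconstruction, not the analysis code behind Figure S2. Several elements are our own assumptions: the idealized template (unit $k$ active during the $k$-th of $n$ equal bins spanning the target duration, all units silent afterwards), the helper names, and the time grid. The nRMSE definition and the 0.3 threshold follow the text.

```python
import numpy as np


def pulse_template(n_units, duration, t):
    """Idealized pulse sequence (assumption): unit k is active (=1) during the k-th of
    n_units equal bins spanning [0, duration), and all units are silent afterwards."""
    r_hat = np.zeros((n_units, t.size))
    cols = np.nonzero(t < duration)[0]
    rows = np.minimum((t[cols] / duration * n_units).astype(int), n_units - 1)
    r_hat[rows, cols] = 1.0
    return r_hat


def nrmse(r, r_hat):
    """nRMSE as defined in the text: ||r - r_hat||^2 / ||r||^2."""
    return float(np.sum((r - r_hat) ** 2) / np.sum(r ** 2))


def best_match(r, durations, t, threshold=0.3):
    """Return (best-match time, its nRMSE, whether the trial counts as meaningful)."""
    errors = [nrmse(r, pulse_template(r.shape[0], d, t)) for d in durations]
    i = int(np.argmin(errors))
    return float(durations[i]), errors[i], errors[i] <= threshold


dt = 0.01
durations = np.linspace(0.4, 1.2, 9)   # hypothetical trained range of target times (s)
t = np.arange(0.0, durations[-1], dt)
n_units = 10

ideal = pulse_template(n_units, durations[4], t)   # a perfectly timed sequence
flat = np.full((n_units, t.size), 0.5)             # activity with no sequential structure
print(best_match(ideal, durations, t))
print(best_match(flat, durations, t))
```

Under this definition, uniform activity across units gives an nRMSE of exactly 1 against every template. This is one way an nRMSE near 1 can arise for a response with no sequential structure.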
In Figure S2(c) we show the mean and standard deviation of the best-match times for several target times after the addition of corticocortical synaptic weight noise. Notably, at 5% noise, the mean best-match times deviate substantially from the target times (compare to Figure S3(e)), and more than 25% of the trials at every target time have nRMSE values exceeding 0.3.
We show the extrapolation performance of the model in Figure S2(d). For target times shorter than the minimum target time used during training, the striatal responses deviate toward longer times, and their quality (as measured by the nRMSE) degrades. For target times longer than the maximum used during training, the responses quickly become meaningless, with nRMSE values of about 1.
Appendix C: Sparse sequential firing in an excitatory network with shared inhibition
Although the model described in the main text consists only of recurrently connected inhibitory units, the same mechanisms can be applied to a network of excitatory units connected through shared inhibition. In particular, switching from one excitatory unit to the next is again controlled by competition between the level of background input and synaptic depression at excitatory synapses. The relative values of these quantities determine the rate at which activity jumps from one unit to the next. Some previous theoretical works have used excitatory networks with shared inhibition to obtain random or nonrandom activity sequences [83,84].
To begin, we consider the network illustrated in Figure S3(a) and described by Equation (6), where $x_i(t)$ is the activity of an excitatory unit, $x_I(t)$ is the activity of a shared inhibitory unit, and we assume $J_{ij}, J_{EI}, J_{IE} \geq 0$. Suppose the timescale characterizing inhibition is much faster than that characterizing excitation ($\tau_I \to 0$), and the nonlinearity of the transfer function for the inhibitory units can be ignored ($\phi_I(x) \approx x$). Then Equation (6) reduces to Equation (7), in which we have defined $J_I \equiv J_{EI} J_{IE}$. If $J_I \geq J_{ij}$, and if excitatory synapses are made depressing by letting $x_j(t) \to x_j(t)\,y_j(t)$ on the right-hand sides of these equations, then this is precisely the model introduced in (1).
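The reduction from (6) to (7) can be sketched as follows. This rests on an assumption about the form of Equation (6): each excitatory unit receives recurrent excitation, inhibition from the shared unit, and the background input, while the inhibitory unit sums the excitatory activity. With $\tau_I \to 0$ and $\phi_I(x) \approx x$, the inhibitory unit tracks its input instantaneously:

$$
\begin{aligned}
\tau\,\dot{x}_i &= -x_i + \phi\Big(\sum_j J_{ij}\,x_j - J_{EI}\,x_I + x^{\rm in}\Big), \qquad
\tau_I\,\dot{x}_I = -x_I + \phi_I\Big(J_{IE} \sum_j x_j\Big), \\
x_I &\approx J_{IE} \sum_j x_j
\;\;\Longrightarrow\;\;
\tau\,\dot{x}_i = -x_i + \phi\Big(\sum_j \big(J_{ij} - J_I\big)\,x_j + x^{\rm in}\Big), \qquad J_I \equiv J_{EI} J_{IE}.
\end{aligned}
$$

The network therefore behaves like the purely inhibitory model with effective weights $W_{ij} = J_{ij} - J_I$, which are non-positive when $J_I \geq J_{ij}$. Replacing $x_j \to x_j y_j$ in both sums then yields the depressing term $\sum_j W_{ij}\, y_j x_j$.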
Figure 8: (a) Decoded time from readouts of a trained recurrent network with random perturbations to the recurrent weights, versus decoded time from the trained network without perturbations. The network has been trained to produce a sequence of pulses in a group of readout units, with the sequence speed determined by the level of tonic input to the network. Open symbols denote intervals in which the root mean squared error (RMSE) between the time-dependent readout activities and the best-fit pulse sequences is > 0.3 on more than 25% of trials. (b) Decoded time for a trained recurrent network as in (a), but in this case the network is trained only on time intervals ranging from 0.4 s to 1.2 s and then tested on intervals outside this range. The colors show the RMSE between the time-dependent readout activities and the best-fit pulse sequences.
Figure S3 shows that such a circuit with shared inhibition exhibits sparse sequential firing patterns virtually identical to those in the recurrent inhibitory network. This holds even when the above assumptions requiring inhibition to be fast and linear are relaxed by letting $\tau_I = \tau$ and $\phi_I(x) = \Theta(x)\tanh(x)$, where $\Theta(x)$ is the Heaviside step function. However, the dynamic range of temporal scaling factors is somewhat more limited in the model with shared inhibition: Figure S3 shows an approximately four-fold speed increase, compared with over an order of magnitude in Figure 1.
Although we do not explore the effects of synaptic plasticity here in detail, the mapping of the network with shared inhibition onto the previously studied model, as shown in Equation (7), means that sequence learning can also take place in this model. This requires the recurrent excitatory synapses to follow a Hebbian plasticity rule, according to which synapse $J_{ij}$ is potentiated if unit $i$ fires immediately after unit $j$.
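The sketch below shows one simple rule with this property. It assumes that potentiation is proportional to the postsynaptic activity $x_i$ times a low-pass filtered trace of the presynaptic activity $x_j$. The trace time constant, learning rate, and idealized one-unit-at-a-time training pattern are illustrative choices, not values from the paper.

```python
import numpy as np


def train_sequence(n_units=6, bin_steps=50, n_reps=5, lr=1e-3, tau_pre=20.0, dt=1.0):
    """Temporally asymmetric Hebbian learning of excitatory weights J (illustrative sketch).

    Units are activated one at a time in the order 0, 1, ..., n_units - 1, repeated
    n_reps times. J_ij grows with postsynaptic activity x_i times a decaying trace s_j
    of presynaptic activity, so it is potentiated when unit i fires shortly after unit j.
    """
    J = np.zeros((n_units, n_units))
    s = np.zeros(n_units)                       # low-pass filtered presynaptic activity
    for _ in range(n_reps):
        for k in range(n_units):
            x = np.zeros(n_units)
            x[k] = 1.0                          # only unit k is active in this time bin
            for _ in range(bin_steps):
                s += dt / tau_pre * (-s + x)
                J += lr * dt * np.outer(x, s)   # row i: postsynaptic, column j: presynaptic
    np.fill_diagonal(J, 0.0)                    # ignore self-connections
    return J


J = train_sequence()
print(np.round(J, 3))
```

After training, the largest weight out of each unit $j$ targets the unit that followed it, $J_{j+1,j}$. This is the asymmetric connectivity needed to chain the sequence. The weight bounds or normalization that a practical rule would need are omitted.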

Appendix D: Sequences in a network of spiking neurons
We consider a network of exponential integrate-and-fire neurons with synaptic depression. The membrane potential $V_i(t)$ is defined for each neuron $i$. The dynamical synaptic depression variable $x_{ij}(t)$, which can be interpreted as the fraction of available neurotransmitter at a synapse, is defined for each synapse, with $x_{ij}(t - 0^{+})$ denoting the value of $x_{ij}$ just before the presynaptic spike. When the membrane potential of neuron $i$ diverges, i.e. $V_i(t) \to \infty$, neuron $i$ emits a spike and its potential is reset to the resting potential $E_L$. Each time a presynaptic neuron $j$ fires a spike at time $t_j$, the depression variable is updated as $x_{ij} \to (1 - u)\,x_{ij}$, where $u$ is the fraction of neurotransmitter used up during each spike ($0 \leq u \leq 1$). The electric charge that enters the postsynaptic cell during a presynaptic spike from neuron $j$ is $u\,x_{ij}(t)\,Q\,W_{ij}$, where $Q$ has units of charge, and $u$, $x_{ij}$, and $W_{ij}$ are dimensionless. In terms of the continuous model, each cluster of neurons corresponds to one of its units. As before, competition between the external input current and synaptic depression is used to control the temporal dynamics. The parameters used in Figure 4 are $C = 300$ pF, $g_L = 30$ nS, $E_L = -70$ mV, $V_T = -50$ mV, $\Delta_T = 2$ mV, $\tau_x = 200$ ms, $u = 0.5$, $\tau_{\rm pre} = 20$ ms, $\tau_{\rm post} = 5$ ms, $Q = 1.5$ pC, $A = 0.05$, and $a = 0.002$.
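The single-synapse bookkeeping described above can be sketched as follows, using the Figure 4 parameter values. This is not the network simulation behind Figure 4. The sketch makes several assumptions:

- Each neuron follows the standard exponential integrate-and-fire form $C\dot{V} = -g_L(V - E_L) + g_L \Delta_T e^{(V - V_T)/\Delta_T} + I$.
- The divergence $V \to \infty$ is replaced by a hypothetical numerical cutoff `V_PEAK`.
- Each spike's charge is modeled as an instantaneous voltage jump $u\,x\,Q\,W/C$.
- Between spikes, $x$ recovers toward 1 with time constant $\tau_x$.

In the setup, one presynaptic neuron driven by an illustrative 1 nA constant current contacts one postsynaptic neuron through a single depressing inhibitory synapse. The plasticity parameters $\tau_{\rm pre}$, $\tau_{\rm post}$, $A$, and $a$ are not used.

```python
import math

# Parameters quoted in the text for Figure 4 (SI units).
C, G_L, E_L = 300e-12, 30e-9, -70e-3
V_T, DELTA_T = -50e-3, 2e-3
TAU_X, U, Q = 200e-3, 0.5, 1.5e-12
V_PEAK = 0.0  # hypothetical numerical cutoff standing in for V -> infinity


def eif_derivative(v, i_ext):
    """dV/dt for an exponential integrate-and-fire neuron (standard form, assumed)."""
    spike_current = G_L * DELTA_T * math.exp((v - V_T) / DELTA_T)
    return (-G_L * (v - E_L) + spike_current + i_ext) / C


def simulate_depressing_synapse(i_ext=1e-9, w=-1.0, t_max=0.5, dt=1e-5):
    """Driven presynaptic EIF neuron -> depressing synapse -> passive postsynaptic EIF neuron."""
    v_pre, v_post, x = E_L, E_L, 1.0
    spike_times, kicks = [], []
    for n in range(int(t_max / dt)):
        x += dt / TAU_X * (1.0 - x)         # transmitter recovers toward 1
        v_pre += dt * eif_derivative(v_pre, i_ext)
        v_post += dt * eif_derivative(v_post, 0.0)  # readout only; not driven to spike
        if v_pre >= V_PEAK:                 # spike: reset and transmit
            v_pre = E_L
            kick = U * x * Q * w / C        # charge u*x*Q*W uses x just before the spike
            v_post += kick
            x *= 1.0 - U                    # depletion x -> (1 - u) x
            spike_times.append(n * dt)
            kicks.append(kick)
    return spike_times, kicks, x


spike_times, kicks, x_final = simulate_depressing_synapse()
print(f"{len(spike_times)} presynaptic spikes; first kick {kicks[0] * 1e3:.2f} mV, "
      f"last kick {kicks[-1] * 1e3:.2f} mV; available transmitter x = {x_final:.2f}")
```

With $u = 0.5$, each spike halves the available transmitter. The inhibitory voltage jumps therefore shrink over the spike train until depletion and recovery balance.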