A Hippocampal Model for Behavioral Time Acquisition and Fast Bidirectional Replay of Spatio-Temporal Memory Sequences

The hippocampus is known to play a crucial role in the formation of long-term memory. For this, fast replays of previously experienced activities during sleep or after reward experiences are believed to be crucial. But how such replays are generated is still completely unclear. In this paper we propose a possible mechanism for this: we present a model that can store experienced trajectories on a behavioral timescale after a single run, and can subsequently bidirectionally replay such trajectories, thereby omitting any specifics of the previous behavior like speed, etc, but allowing repetitions of events, even with different subsequent events. Our solution builds on well-known concepts, one-shot learning and synfire chains, enhancing them by additional mechanisms using global inhibition and disinhibition. For replays our approach relies on dendritic spikes and cholinergic modulation, as supported by experimental data. We also hypothesize a functional role of disinhibition as a pacemaker during behavioral time.

While there are evidences (Girardeau et al., 2009;Ego-Stengel and Wilson, 2010;Middleton et al., 2018;Schapiro et al., 2018) suggesting that forward and reverse replays of memories during sharp-wave ripples play an essential role in memory consolidation, it is much less clear how such replays can actually be formed. Memory sequences that happen on the timescale of behavior (order of minutes) have to be stored in an online and one-shot fashion in such a way that subsequent replay is possible on a much faster timescale (order of a couple of hundred milliseconds). In addition, it is not enough to link each item in the memory sequence with its successor, as some items may occur several times in the sequence, but with different successors. Even further, different memory sequences which share many items would introduce severe corruptions in memories stored by linking items with their successors. As the nature of replays is best understood in the context of navigation, we next illustrate these challenges in more detail in this context.
It is known that in the hippocampus there are place cells which fire whenever the animal is at a specific spatial location in a given environment (O'Keefe and Dostrovsky, 1971;O'Keefe, 1976;Harvey et al., 2009). An animal which navigates in an environment, will do so varying specifics of its motion, such as its speed, the stopping locations and the duration of these stops. Spontaneous replays of such events in sharp-wave ripples, however, are independent of such speeds, stopping times or locations of the animal (Wilson and McNaughton, 1994;Nádasdy et al., 1999;Geisler et al., 2007;Davidson et al., 2009;Yamamoto and Tonegawa, 2017), as well during sleep (Wilson and McNaughton, 1994;Lee and Wilson, 2002) as in quiet wakefulness (Foster and Wilson, 2006;Karlsson and Frank, 2009;Carr et al., 2011;Foster, 2017;Pfeiffer, 2017;Yamamoto and Tonegawa, 2017). In addition, if the animal goes through the same location twice, turning left the first time and right the second time, the hippocampus is still able to activate the correct order of the place cells during replay (e.g., left first then right) (Wu and Foster, 2014).
In this work, we propose a possible mechanism that addresses and resolves all challenges mentioned above. We present a model that can store experienced trajectories on a behavioral timescale after a single run, and can subsequently replay (forward and backwards) such trajectories, thereby omitting any specifics as speed, stopping times, etc. Our solution builds on two wellknown concepts: (i) one-shot learning (Amit and Fusi, 1994) and (ii) synfire chains (Abeles, 1982). More specifically, we make use of the fact that random connectivity is enough to learn associations between ensembles of cells in a one-shot fashion using standard forms of Hebbian plasticity. Synfire chains, on the other hand, are a well known mechanism that activates neurons in a pre-specified order. Our main contribution is an enhancement (inspired by biological observations) of the standard synfire chain approach that allows the network to alternate between an asynchronous mode, used during encoding of memories, and a synchronous mode, used during replays. We term this enhancement a sequence encoding module. Cells in the sequence encoding modules are called sequence cells. Unlike place cells, which fire at a particular location, sequence cells fire in a predefined position of the sequence, independent of the environment. A sequence ensemble consists of all sequence cells that fire at, say, the fourth position of the sequence; i.e., that correspond to the fourth layer of a synfire chain. We explain in more detail how the sequence encoding modules function in what follows. See also the video which we provide in the Supplementary Material.
Before continuing the discussion on sequence cells, we would like to argue why the recurrent connectivity among place cells alone can hardly reproduce the observed replay phenomena. First, replays of trajectories containing loops become inherently corrupted, as there is no way of locally deciding whether it is the first or second visit to a particular location in the absence of a context signal. In particular, Wu and Foster (2014) indicate that the hippocampal replays are able to correctly decide the direction in a forked environment, where the animal visits the same location multiple times. Second, during multiple runs of the animal, many possible trajectories will be stored for replay. Even if each stored trajectory contains no loops, together each location will be visited many times. If replays used solely the recurrent connectivity among place cells, over time, the produced replays would resemble much more a random walk over the space than a previous experience. While it has been reported that some replays do not reflect previously traversed trajectories (Gupta et al., 2010), the same work reports that a significant fraction of replays does.
For clarity we present our approach in the context of place cell replays, even though it can be used to store any form of spatio-temporal memory sequence. Note that we require no assumptions on the geometry of the environment and on the specific trajectory taken (such as whether it contains loops). The sequence encoding modules operate in two distinct modes: an encoding mode, corresponding to the asynchronous firing hippocampal (theta) state observed during navigation and a replay mode, corresponding to sharp-wave ripples observed during sleep and quiet wakefulness. In the encoding mode, the synapses from sequence cells to the corresponding place cells are learned by standard Hebbian plasticity rules. For this to work, we do not need any specific (learned) connectivity between sequence cells and place cells: random connectivity with low synaptic strength between all sequence cells and place cells is enough. Associations can be learned in an online and one-shot manner between coactive cells by standard Hebbian plasticity. We use a global inhibitory population to ensure that the same sequence ensemble remains active, regardless of how much time the animal spends at a particular location. Once the animal moves to a new location, we postulate that a locomotion signal activates a second group of inhibitory neurons that temporarily inhibits the global inhibitory population, thereby allowing the sequence module to move its active state to the next ensemble. We note that we do not require any specific connectivity for the inhibitory neurons (random connectivity to the sequence cells is enough). The use of disinhibitory input to shape activity patterns so they match behavioral navigation times is motivated by the experimental observation that glutamatergic interneurons in the medial septum effectively control speed-correlated firing of hippocampal neurons (Fuhrmann et al., 2015;Justus et al., 2017). The replay mode, unlike the encoding mode, exhibits extremely synchronous activity as observed in sharp-wave ripples (Buzsáki, 1986(Buzsáki, , 2015. This synchrony of activity induces dendritic spikes, which produce a fast forward and/or reverse replay of the sequence cells. In replay mode, our sequence ensembles trigger, due to the learned connectivity between sequence ensembles and place cells in the encoding phase, a forward and/or reverse replay of the place cell ensembles in the correct order. Consistent with previous theoretical work (Jahnke et al., 2015) and experimental observations (Hasselmo, 1999;Buzsáki, 2015), our model assumes that replays, dendritic spikes, sharp-wave ripples and cholinergic modulation are tightly interconnected. See the section 3 for a more complete discussion on the assumptions of the model and how they relate to what has been observed in experiments.

Encoding of Place Cell Sequential Activity
We demonstrate that the sequence encoding modules can account for the robust forward and reverse replays observed in the hippocampus. This is done through the simulation of a virtual mouse exploring a predefined environment. A spiking neural network of exponential integrate-and-fire neurons with physiological parameters, emulating the behavior of place and sequence cells, is associated to the mouse.
As the mouse walks through the environment, a sequence of place cells in the spiking neural network will fire, each place cell corresponding to a particular place in the environment (in the model, we do not speculate how place cells could correspond to a particular place in an environment. Instead, when the mouse is at a certain location, we enforce that the corresponding place cell ensemble fires at the correct rate, as is experimentally observed, O'Keefe, 1976). In parallel, the sequence cells of the spiking neural network also fire, but they do so along a predefined sequential activity pattern, which is independent of the environment and the route taken by the mouse. Crucially, after exploration of the environment by the mouse, the sequence cell activity is capable of triggering replays of place cell activity, due to the plasticity of synapses between sequence and place cells. These sequence cells can be used to replay many different sequences of place cell activity. We defer the detailed explanation of the mechanism that allows sequence cells to produce the desired behavior to the next subsection. For ease of understanding we first provide some general intuition relating sequence cell dynamics to the place cell activity.
Consider as an example, a virtual mouse walking on a track, as depicted in Figure 1A. Each place cell corresponds to a specific location on the track. The place cells fire with ≈50 Hz when the mouse is at the corresponding location, and are otherwise silent. As the mouse moves along the track, a sequence of place cell ensembles is activated. The place cells are also targeted by sequence cells and plastic synaptic connections will encode the sequence of place cell activity produced by the motion of the mouse, as shown in Figure 1B. The sequence cells fire following a predefined activation pattern that possesses two modes of operation: the encoding mode and the replay mode. These correspond to the experimentally observed different modes of operation at the hippocampus during exploration and slow-wave sleep (or quiet wakefulness), respectively. In the encoding mode, the next ensemble in the sequence is only activated when the mouse moves to another location. The synapses from sequence cells to the corresponding place cells are learned and, thereafter, sequence cells can activate the corresponding place cell ensembles without requiring external input from the environment, as illustrated in Figure 1C. The applied learning rule is a simple Hebbian plasticity mechanism (a detailed description of the applied learning rule is given in the section 4.2.5).
Thus, in the encoding mode, the place cell ensembles are activated in sequence as the animal moves along the track, as shown in Figure 1D. In the replay mode, the sequence cell ensembles are triggered one after the other, in a small time window. Through the synaptic connections that were learned in the encoding mode, the sequence cells trigger a forward replay of the place cells. A raster plot of place cell activity during the replay mode is shown in Figure 1E. Furthermore, the sequence cell ensembles can also be activated in the opposite order, and this produces reverse replay of place cell activities. A raster plot of a reverse replay of place cell activity is shown in Figure 1F. It may be noted that both forward and reverse replay are much faster than the original encoding time, that they are independent of the animal's speed, stopping times and locations, and that the majority of place cells fire in the right order, driven by the activity of the sequence cells. Moreover, these replays can be produced after a single traversal of the path by the mouse.
We emphasize that our approach does not require any particular assumption on the geometry of the environment, the connectivity among the place cells and particularities of the path (such as loops). Sequence cells enable our model to produce replays in such a general setting, and this is discussed in the next subsection.

Sequence Encoding Modules of Sequence Cells
Sequence cells are organized in a specific structure that resembles a synfire chain. This structure allows the sequence cells to produce the replays observed in place cells in Figure 1. Sequence cells might coexist in the CA3 with the place cells (or might be localized somewhere else in the hippocampus). Figure 2A shows the general structure of sequence cell connectivity, showing an example of what we term a sequence encoding module. The sequence cells can, in principle, compose many such modules that could store many parallel sequences. Each sequence encoding module consists of ℓ ensembles of neurons B 1 , . . . , B ℓ . Forward connections from B i to B i+1 are present. Backward connections, which allow reverse replays, are also present and are generally weaker than the forward connections. These ensembles share two different sources of inhibition: the gating inhibition I G and the competitive inhibition I C . The rate dynamics of individual ensembles B i have two stable fixed points, one at low firing rate (≈0 Hz) and one at high firing rate (≈50 Hz). Henceforth, we term the ensembles B i bistable units, and refer to B i as active at FIGURE 1 | Robust replay of place cell activity. (A) An animal walks along a path, containing a loop in a 2-dimensional environment. (B) Locations in this path are associated to specific populations of CA3 place cells which fire at high rates if the animal is at said location. In this particular scenario, each position is associated with an ensemble of 100 place cells. Concomitantly, the sequence cells fire following a predefined sequential activity pattern that is explained in detail in section 2.2. Hebbian-like synaptic plasticity connects the sequence cells to the corresponding place cell ensembles. (C) The sequence cell dynamics allow for place cell activity replay on a much shorter timescale and independent of the animal's stopping times and locations. (D) 15 s simulation of place cell activity as the animal walks through the track in (A); a rate plot of the place cell ensembles corresponding to each position the animal walks through is shown; the x axis shows time in seconds, the left y axis shows the track positions corresponding to the place cell populations and the right y axis shows the firing rate of each place cell ensemble corresponding to a particular track position. Note that in addition to the loop in the path, the animal occasionally stays arbitrarily longer at particular locations. (E) Fast replay of the path taken by the animal in less than 100ms; a raster plot of place cell activity is shown; the x axis shows time in milliseconds, the left y axis shows the track positions corresponding to the place cell populations. (F) Reverse replay of the path taken by the animal; a raster plot of place cell is shown; the axis are the same as in (E). Note that each weight in the spiking network is multiplied by a gaussian multiplicative noise. a particular time if its instantaneous rate is at the 50 Hz stable point, and inactive if it is at the silent stable point. Furthermore, the gating inhibition I G is also either active (if its inhibitory neurons fire at a high rate) or inactive (if its inhibitory neurons are silent).
As noted in the previous subsection, these sequence encoding modules possess two modes of operation: the encoding mode and the replay mode. In the encoding mode, only one bistable unit is active at a particular point in time, and this bistable unit remains active for as long as necessary, only triggering activity in the next bistable unit and turning itself inactive when the mouse moves to another location. In the replay mode, the bistable units are triggered one after the other in a very brief period of time.
The discussion of the dynamics of sequence cells is split between this subsection and the section 2.3. This is done to distinguish the essential building blocks of our model, presented FIGURE 2 | Sequence encoding modules. (A) Illustration of a sequence encoding module architecture consisting of 3 bistable units B i , B i+1 , B i+2 , the competitive inhibitory population I C and the gating inhibitory input I G . The inhibitory unit I C prevents two bistable units B j from having high rate at the same time, whereas the gating inhibition I G is the "break" that prevents activity from propagating from B j to B j+1 . This break is externally controlled, by some mechanism that recognizes when the animal changes its location. (B) Evolution of the rates of 2 abstract rate units B i , I C and I G , disposed according to the sequence encoding architecture in (A). (C) General illustration of the states the network goes through over time. (i,vi) correspond to stable states in the network while I G is activated. Inactivating I G induces a dynamical process that brings the network from state (ii) into state (v). This dynamical process is depicted in further detail below with (iii,iv). In (iii), due to inactivation of I G , the activity in B i+1 picks up, driven from the activity B i . As shown in (iv), once both B i and B i+1 are active, they together activate I C , which gives strong inhibitory input to B i and B i+1 and (in v) forces B i to become inactive, while the recurrent activity permits B i+1 to recover and stay active. (D) Spiking network of sequence cells. Above, the x-axis shows 1.2 s of simulated activity. The left y-axis shows the indices of the cells in each sequence ensemble, and a raster plot is given. The right y-axis shows the rate of each sequence ensemble. Below, the period where I G is inactivated is emphasized. During this period, the stability of the system is broken and the dynamics described above take place, leading ultimately to B i+1 overtaking B i as the active sequence ensemble. (E) Analogous to (D), but we now purposedly enforce a longer time out of the global inhibitory mechanism. Observe that the system still functions much like in (D), so the period where I G is turned off is flexible. in this subsection, from specific implementation decisions that are not essential.

Encoding Mode
The encoding mode of the sequence encoding modules relies on a specific, complex rate dynamics while the animal is exploring an environment. To describe them, and to extract the fundamental ingredients that allow a spiking neural network to produce the desired behavior, we first adopt a mean-field approach and collapse the bistable units B i , the gating inhibition I G and the competitive inhibition I C into abstract rate units. This approach is inspired by Litwin-Kumar and Doiron (2014); Zenke et al. (2015). A detailed description of this rate-based system is given in the section 4.1.
At all times, the competitive inhibition I C guarantees that only one bistable unit B i can be active at any particular point in time, by realizing essentially a winner-takes-all mechanism. The gating inhibition I G is responsible for controlling the amount of time B i remains active. While I G is active, the currently active bistable unit remains active. If the mouse location changes, I G turns inactive, which implements a disinhibitory mechanism that allows progression of activity in the bistable units as follows (see the section 3 for experimental evidence relating disinhibition and behavior). As soon as I G turns inactive, the active bistable unit B i drives the rate of the inactive bistable unit B i+1 . Eventually B i+1 turns active and triggers the competitive inhibition I C . As B i+1 receives more input than any other bistable unit (if we set the connections from B i to B i+1 stronger than the ones from B i+1 to B i ), it will be the only one active at the end due to the winnertake-all dynamics imposed by I C . Once B i+1 has taken over, the disinhibitory signal is removed and I G turns active again and prevents B i+1 from activating B i+2 . Figure 2B shows this evolution of activity in a rate-based system that reproduces the desired rate dynamics. Figure 2C illustrates the most notable states of activity the network goes through. In (i) while I G is active, B i remains active; (ii) once I G is inactive; (iii) the rate of B i+1 is driven up by B i ; (iv) this, in turn, triggers an increase in the rate of I C , which (v) forces a competition among B i and B i+1 ; eventually, B i+1 emerges as the winner and (vi) I G is again active, preventing B i+1 from activating B i+2 , while B i+1 stays active. Figures 2D,E show the spiking network of sequence cells transitioning between two consecutive patterns and exhibiting the aforementioned rate dynamics. Note that the length of the period in which the gating inhibition I G is inactive is different in both. Thus, the length of the period in which the disinhibitory signal is given to I G does not need to be strictly timed. The most important requirement is that it gives enough time for the dynamics happening between (ii) and (v) to reach its stable point.
Furthermore, as will be seen in the next subsection, when the spiking network obeying these rate dynamics is presented in detail, the encoding mode has two defining characteristics other than the combined gating and competitive inhibitory processing mechanisms. They are: (1) neurons in bistable units fire asynchronously; and (2) transition time between one bistable unit and the next is in the order of 50 ms to 100 ms. The first property emerges from a balance of excitation and inhibition in a spiking network of neurons (Van Vreeswijk and Sompolinsky, 1996;Renart et al., 2010) and is crucial in guaranteeing the stable rate dynamics we expect from the bistable units. The latter property is a consequence of having a spiking neural network produce the complex rate dynamics of the sequence encoding modules. Roughly speaking, for the gating inhibition I G to prevent activation of the next bistable unit, the weights between bistable units must be generally weaker than those inside each bistable unit. Since too high stable rates for a bistable unit B i are not reasonable, these weights between bistable units units will be weak. Finally, weak weights imply that some time is necessary to increase the membrane potential of the neurons beyond the spiking threshold.
In essence, in the encoding mode, the sequence encoding module activates bistable units in a sequence, with the crucial property that the sequence can be stopped, for an arbitrarily long period, at any particular point in time. This allows it to capture the essential elements of the path explored by the animal, as the sequence transitions to the next element only when a new location is reached. It may be noted that such an ability to produce compressed representations is potentially useful in other contexts. We discuss further applicability in the section 3 of the paper.

Replay Mode
The main distinguishing feature of the encoding and replay modes is the synchrony of activity. In the replay mode, activity progresses through the bistable units in synchronous waves, reminiscent of a synfire chain (Abeles, 1982), in a manner depicted in Figure 3A. This synchrony of activity in a bistable unit yields fast dendritic sodium spikes in the subsequent bistable unit. Dendritic sodium spikes are observed whenever, in a period of a couple milliseconds, the membrane potential at a particular branch of the dendrite of a neuron receives excitatory input above some threshold (Ariav et al., 2003;Gasparini et al., 2004;Polsky et al., 2004;Gasparini and Magee, 2006). Therefore, we assume that the neurons in bistable unit B i have the axons projecting on close branches of the dendrites of neurons in bistable unit B i+1 . We assume a similar structure for the backward connections, as reverse replays will also be driven by dendritic spikes (in the model, we follow the same approach as Memmesheimer (2010), Jahnke et al. (2012), and Jahnke et al. (2015) and assume that given enough synchronous input dendritic spikes will occur). Whenever B i is synchronously activated, some branch of the dendritic tree of a neuron in B i+1 receives strong excitatory input in a few milliseconds, producing a dendritic spike that has a large depolarizing effect on the soma. Figure 3B shows the effect on the soma of one such dendritic spike, in comparison to the effect that would be produced if dendritic nonlinearities were not present.
There is experimental support for the connection between synchronous events and replays, as replays are observed typically during sharp-wave ripple events, the most synchronous pattern of activity in the brain (Buzsaki, 2006;Buzsáki, 2015;Diba and Buzsáki, 2007). In our model, as shown in red in Figure 3C, the sequence cells can also be played asynchronously forward, without the appearance of dendritic spikes due to the lack of synchrony. In fact, this asynchronous slow replay of sequence FIGURE 3 | Replay mode of the sequence encoding module. (A) Synfire chain like synchronous activation of three consecutive bistable units B 1 , B 2 , B 3 . Synchrony triggers dendritic spikes in the next bistable unit, which in turn synchronously activate it. Activity moves in a single direction due to neurons entering a refractory period. (B) Nonlinear effect of a dendritic spike on the membrane potential at the soma of a neuron. There is a latency between the moment the dendrite triggers the dendritic spike and its effect on the soma. All dendritic spikes produce the same stereotypical current pulse effect in the soma. (C) Raster plot of the sequence encoding module in the replay mode, without any dendritic spikes, and showing different levels of cholinergic effect (for ACh factor of 5, only the first five spikes are shown). Without any cholinergic modulation (red), or with the cholinergic factor used in the rest of the paper (black), the sequence encoding module does produce a replay of the path, but on a much slower timescale. Each transition, in the same way as during encoding, takes between 50 and 100 ms. If one increases the cholinergic effect beyond what is reported in Hasselmo and Schnell (1994) and Hasselmo et al. (1995), then as shown in the plot (blue), the transitions between consecutive ensembles is around 15 ms in the upper range of what is often reported in the literature (Lee and Wilson, 2002;Diba and Buzsáki, 2007). We emphasize that only the first five spikes are shown in the blue plot. Due to the fivefold increase in excitatory weights and no counterbalance by inhibition, the network becomes too excitable and the neurons spike at very high rates. Still, the blue plot indicates that it would be possible to produce fast replays if the cholinergic effect would be two times or more of what has been reported in the literature, even without using dendritic spikes. On the other hand, reverse replays rely crucially on dendritic spikes and cholinergic modulatory effect and do not happen without either. (D) Raster plot of the sequence encoding module in the replay mode, now with dendritic spikes. Dendritic spikes increase the speed of replays by two orders of magnitude, making it in line with what is observed during sharp-wave ripples.
cells produces replays that resemble what is experimentally observed during REM sleep (Louie and Wilson, 2001). In the model, the forward replays without synchrony are not significantly faster than the original encoding time, with transition between consecutive bistable units is between 50 ms to 100 ms. Such slow transition times are not in line with the speed of forward replays observed during sharp-wave ripples, but they are comparable to the speed of replays observed during REM sleep where sharp-wave ripples are rare and theta activity is predominant (Buzsáki, 2002). Furthermore, without using synchrony and cholinergic modulation, it is not possible to significantly reduce such transition times (see Figure 3C in red). If asynchronous firing of a bistable unit B i , in a couple of milliseconds, sends too large input to B i+1 , then a simple form of inhibitory gating of activity may not prevent B i+1 from turning active. Even more, the non-linear effect of dendritic spikes is also crucial if the neurons in B i are synchronously activated. This is because, without the non-linear effect of dendritic spikes, it is difficult to distinguish high asynchronous firing rates, from synchronous events. Only a couple milliseconds are required for a bistable unit firing at 50 Hz to send forward the same input that it would send if it fired synchronously (lowering the stable rate at which a bistable unit is active would also have undesirable consequences as one needs to be able to clearly distinguish between the low and high firing rate stable points of the bistable units. This will be further discussed in the next subsection). Figure 3D shows a fast replay of sequence cells produced by the spiking network. Note that the transition time drops to around 5 ms, that the activity in each bistable unit is very synchronous and that most neurons spike once and in the right order.
The second distinguishing feature between the encoding and replay modes is a different cholinergic environment. In the replay mode, we assume lower levels of Acetylcholine (ACh) in the network. This is motivated by the experimental observation that replays typically occur in sharp-wave ripples during slowwave sleep and quiet wakefulness, where lower ACh levels are documented (Hasselmo, 1999(Hasselmo, , 2006. It is known that a lower ACh level makes synapses overall stronger and the network more excitable (Hasselmo et al., 1995). In our model, higher excitability is not necessary for forward replays, but it is strictly necessary for reverse replays. Furthermore, higher excitability facilitates faster transition times between bistable units, as well as the occurrence of dendritic spikes. In principle, if there would not be any dendritic spikes, this higher excitability of the network could decrease the transition times between bistable units, even when the bistable units fire in an asynchronous manner. Hence, the higher network excitability would allow for the bistable units to be replayed asynchronously faster than shown in red in Figure 3C, which would in turn allow for replays even without a synchronous mode containing dendritic spikes. However, in order for the replays to be as fast as what is observed in the literature (Lee and Wilson, 2002;Diba and Buzsáki, 2007), the magnitude of the cholinergic effect on the network would have to be much higher than what is reported in Hasselmo et al. (1995) (as shown in Figure 3C in black and blue). Dendritic spikes are therefore necessary for fast replays of activity, in line with what is reported in the literature.
In short, due to slow transition times, the encoding mode of the sequence encoding module cannot account for the fast robust replays observed in the hippocampus. The module needs another mode of activity that has faster transition times. We remark that this different mode of activity is consistent with biology, as these replays are observed during quiet wakefulness and sleep, where the hippocampus is known to be operating in a different regime. Two essential observed properties of the hippocampus in quiet wakefulness and sleep motivate the replay mode: sharp wave ripples and a different neuromodulatory environment, especially, lower levels of Acetylcholine (ACh). Evidence concerning sharp wave ripples indicate that the sequence encoding module in the replay mode might have its bistable units activated in an almost synchronous manner (Buzsáki, 2015). Lower ACh levels, as is observed during sleep and quiet wakefulness (Hasselmo, 1999(Hasselmo, , 2006, induce higher excitatory synaptic weights which facilitate faster transitions between bistable units.

Reverse Replays
In our model, as mentioned before, the reverse replays rely crucially on the higher network excitability produced by lower levels of ACh, whereas forward replays do not. This is because the rate dynamics of the encoding mode lead to a weight asymmetry condition. The backward connections among bistable units that enable reverse replays must necessarily be weaker than the forward connections among bistable units. In particular, this asymmetry gives a potential function for the (experimentally observed) increased excitatory weights induced by lower ACh levels (Hasselmo, 1999(Hasselmo, , 2006. For the backward connections to produce synchronous activation of the bistable units in reverse order, they need to be strong enough to produce dendritic spikes. However, as they are weaker than the forward connections, in the encoding mode, asynchronous activation of a bistable unit should not produce dendritic spikes in the next one. If the excitatory synaptic weights are not generally increased in the replay mode it is difficult to have, at the same time, dendritic spikes at the backward connections in the replay mode and a fully functioning encoding mode. One either gets instability in the encoding mode, or not enough dendritic spikes in the replay mode. Therefore, reverse replays are made possible by the difference in the neuromodulatory environment, in addition to sharp-wave like synchronous activity triggering dendritic spikes. Observe, additionally, that dendritic spikes have a refractory period of about 5 ms (Memmesheimer, 2010;Jahnke et al., 2012Jahnke et al., , 2015 and that within this refractory period the respective dendrites do not transmit any spikes. It is this refractory period that forces the synchronous wave to travel in a single direction. This is also shown in Figure 3A. In this case, if bistable unit B i is synchronously activated by dendritic spikes from B i−1 (from B i+1 ), it will produce dendritic spikes only in B i+1 (in B i−1 ), as the neurons of B i−1 (of B i+1 ) are in the refractory period, respectively. Moreover, Figure 3D shows a reverse replay of activity produced by the spiking network.
In essence, the asynchronous encoding mode has a preferred direction for the activity to travel, as the forward connections between bistable units B i and B i+1 are stronger than the backward connections between B i and B i−1 . Dendritic spikes, cholinergic modulation and the refractory period of neurons allow the sequence cells to be activated in both directions, depending on which bistable unit the synchronous volley starts.
While the previous discussion is constrained to the setting of our model, in general, there are other possible ways of enabling the sequences to be traversed only forward during encoding but in both directions during replays. For example, replays and reverse replays can be generated by symmetric connection weights between sequence modules as done in Chenkov et al. (2017) and unidirectional sequence propagation during encoding could be enforced by short-term synaptic depression (Stevens and Wang, 1995;Markram and Tsodyks, 1996;Abbott et al., 1997).

Sequence Encoding Modules-Spiking Network
In the previous subsection, the essential building blocks of sequence encoding modules were discussed on a conceptual level. In this subsection, we describe more detailed conditions that are necessary so the dynamical system comprising our spiking network model behaves as intended.

Bistable Units
In the spiking network, each bistable unit consists of a recurrently connected population of excitatory and inhibitory cells as depicted in Figure 4A. To produce bistable units where the rate dynamics maintain two stable fixed points, one around 0 Hz and one around 50 Hz, we analyse the gain function (Gerstner et al., 2014) of excitatory and inhibitory neurons as shown in the left of Figure 4B. There, the rate of excitation (red) and inhibition (teal) is shown for varying recurrent excitatory (inhibitory) and compared with the identity (black). Formally, to obtain the stable points of activity one has to solve the system of equations: r E = F(r E , r I ) and r I = G(r E , r I ), where F and G represent the evolution of the rate of excitatory and inhibitory neurons when receiving excitatory rate r E and inhibitory rate r I . For the plot, the implicit solution r I (x) = G(x, r I (x)) is approximated for all x. For the excitatory gain function, the function F(x, r I (x)) (in red) is plotted and compared to the identity (in black). An analogous procedure is done to plot the inhibitory gain function G(r E (x), x), where we now define r E (x) implicitly via the equation r E (x) = F(r E (x), x).
Note that the excitatory gain function is below the identity for low rates in Figure 4B but not in Figure 4C. This implies that there is a stable point at 0 Hz while the gating inhibition I G is active, even when receiving input from the previous bistable unit, but this stable point disappears when I G is turned inactive. This allows the activity to move forward. On the other hand, in both Figures 4B,C, the excitatory gain function is above and then below the identity around the corresponding stable points. This characterizes the stable point as well as the stable region around it.

Encoding Mode
The spiking network operates in the encoding mode if the initial excitatory input that activates the first bistable unit does so in an asynchronous manner. If the bistable units exhibit the correct rate dynamics, then the gating and competitive inhibitory mechanisms guarantee that it behaves as intended, again, by a through analysis of the gain functions of excitatory and inhibitory neurons. From the conditions given by the rate dynamics we have: (1) if the gating inhibition I G is inactive, then the input sent by B i to B i+1 should be enough to activate B i+1 ; (2) if I G is active, then B i remains active; and (3) while I G is active, B i+1 remains silent. Figures 4B,C explain the functioning of the gating inhibitory mechanism. In Figure 4B observe that if I G is activated then the bistable unit B i maintains both high and low rate stable points, even when receiving input from the previous bistable unit. Now, we explain in a little more detail how to choose I G based on the conditions in the previous paragraph. Essentially, condition (2) imposes an upper bound on the weight between I G and the ensembles B i (which depends only on the weight parameters inside an ensemble B i ); condition (3) imposes a lower bound on the weight between I G and the ensembles B i (which depends only on the weight of synapses between two consecutive ensembles). The upper and lower bounds for the weight of I G have to be consistent, meaning the parameters have to be chosen in such a way that the lower bound is smaller than the upper bound. Condition (1) itself only influences I G indirectly by imposing a lower bound on the weight of the connection between two consecutive ensembles (which depends only on the weight parameters inside the ensembles). Typically, for our network to behave as intended, one chooses the weight between consecutive ensembles close to the lower bound imposed by condition (1). Then, one checks the upper bound on the weight from I G to B i imposed by condition (2) and the lower bound on the same quantity imposed by condition (3).
The behavior of the network changes once I G is inactivated. If bistable unit B i receives input from the previous bistable unit then the stable point at 0 Hz disappears and only the 50 Hz stable point remains. This is shown in Figure 4C. Observe that if the gain functions behave as shown in Figures 4B,C then the spiking network will satisfy conditions (1), (2), and (3) from above.
We observe a few additional points. Increasing the rate of I G decreases the gain functions of excitatory and inhibitory neurons. This enlarges the stable region around 0 Hz, but shrinks the stable region around 50 Hz, and also shifting to a slightly smaller rate value. Thus, satisfying the three conditions, corresponds to not shrinking the stable region around 50 Hz by too much in (2) and enlarging the stable region around 0 Hz enough to satisfy (3). This observation also implies that it is not feasible to have the high firing rate stable point at too small firing rates, as satisfying all three conditions is difficult.
Again, from the rate dynamics we see that: once I G is inactive, and B i+1 gets active, the competitive inhibition I C will spike and guarantee that only B i+1 remains active. The competitive inhibition I C functions here as a coincidence detector of two active bistable units. It remains inactive if only a single bistable unit is active, but once two or more are active, its neurons receive enough excitatory input to spike. Such behavior is a simple consequence of the thresholding of the membrane potential of neurons to produce an action potential. With such a one spike volley, I C gives strong enough inhibitory input to quickly inactivate B i , but B i+1 recovers and keeps active as it gets excitatory input from B i and from itself. As soon as the activity B i+1 stabilizes, the gating inhibition I G turns active and prevents activity from going onwards to activate B i+2 . In particular, these observations indicate how the weight parameters between I C and the ensembles B i should be chosen. Essentially, the weight from B i to I C should be such that if a single B i is active, I C is not active. On the left, the gain function of excitatory (red) and inhibitory(teal) neurons inside a bistable unit B i is depicted. As shown in the center, if I G is turned on, then the shape of the gain function produces two stable states, one at 0Hz and another at approximately 50Hz. The stable states are shown on the right. (C) Analogous to (B), but now with I G turned off. As seen on the left, the gain function shape changes when receiving input from the previous pattern and the stable state at 0Hz ceases to exist. The network always settles for the only stable point around 50 Hz, a fact which is shown on the right. However, if two or more B i are active then I C neurons should start spiking. This condition is pretty easy to guarantee due to the threshold of the membrane potential. In addition, the weight from I C to B i is chosen large enough to guarantee that if I C is active then its activity is strong enough to inactivate the patterns which are active.
The last observation we make about the spiking network in the encoding mode is that one should decide when to turn I G active again. To simplify things, we turn I G active again as soon as the rate in the next ensemble in the sequence stabilizes. In our simulations, we did not observe that it was necessary to be too strict about the moment that I G is turned active again (as illustrated by Figures 2D,E), as long as I G was turned active after the activity in the next ensemble in the sequence had stabilized.

Replay Mode
The spiking network is in the replay mode if the first bistable unit is activated in a synchronous manner. This synchronous activation of the neurons in a bistable unit triggers dendritic spikes in neurons in the next bistable unit, which in turn, trigger synchronous spikes at the corresponding neurons and the activity can continue in a synchronous wave, much like in a synfire chain (Abeles, 1982). Such activity travels only forward because the dendrites have a refractory period of 5 ms after transmitting a dendritic spike to the soma.

Reverse Replay
The replay mode of the spiking network can produce reverse replays if the last bistable unit is activated synchronously. As in the forward replays, the activity proceeds synchronously through dendritic spikes. Again, it has a single direction of motion, due to the refractory period of the dendritic compartments that produce dendritic spikes. We emphasize that the increased synaptic weights in the replay mode, due to lower ACh levels, are essential if one wants the encoding and reverse replay modes to reliably behave as intended.

Robustness to Noise and Parameter Variations
We want to point out that the chosen parameters are biologically plausible and do not need to be fine-tuned. To this end, we produce Figure 5, showing that the choice network ensemble sizes and the choice of network weights are not crucial for the spiking network to behave as intended. In Figures 5A,B we double the network ensemble sizes, while halving all the weight parameters (alternatively, one could halve the connection probability). The encoding and replay modes function the same way as before. In Figures 5C,D we change all the used parameters following the guide of the gain functions and show another set of parameters where the encoding and replay modes behave as intended. We observe that while the conditions of the dynamical system imply one cannot take a single parameter and change it significantly, one can change the parameters if a certain set of reasonable conditions is ensured (in other words, one has to respect the coupling among different parameters).
As a final note, we emphasize that the plots in Figure 1 were produced using a gaussian multiplicative noise on each individual weight. It is not crucial for the spiking network that each individual weight is as defined, only that the total average weight inside a particular ensemble or between ensembles is close to the desired values.

Summary
In this work, we show a model which can store in behavioral time place cell activity traces corresponding to paths run by an animal and produce fast bidirectional replays of these place cell activity traces during sleep and quiet wakefulness, even after a single run of the path. Furthermore, our approach works independently of the geometry of the environment and of particularities of the path taken (such as loops present in it). As observed in the literature, the replays produced by our model have constant speed and are not influenced by the potentially varying speed of the mouse during the run. As the primary conclusion of our model, we predict the existence of a population of sequence cells, dedicated to the task of encoding of sequential activity, which enables the model to properly deal with loops in the path of the mouse. This is an experimentally verifiable hypothesis, as sequence cell firing should be distinguishable from place cell firing, as well as from head direction cells, border cells and so on [time cells, (Eichenbaum, 2014) could pose a challenge as sequence cells also measure time in a certain sense]. Sequence cells enable behavioral time storage of paths traversed and subsequent spontaneous fast replay of place cell activities as observed in rodents during slow-wave sleep and quiet wakefulness. The sequence cells are organized in sequence encoding modules where, crucially, the activity progresses through stereotypical sequences of ensembles. The sequence encoding modules possess two modes of operation one used during encoding and another during replay. The encoding mode relies on a disinhibitory mechanism to control the speed of asynchronous activity progression in the ensembles. The replay mode relies on synchronous activity and resulting dendritic spikes to trigger fast replay of the sequence encoding modules. Lower levels of ACh, leading to higher excitability, allow the sequence encoding modules to be played in reversewhich would also account for the observed reverse replays of place cell activity in rodents. In addition, reverse replays happen mostly after rewards, which in our model, permit the sequence to be played in reverse to find the causes that produced the reward. In particular, a very long sequence encoding module can be used to store multiple paths, by just using the reverse replays to mark the relevant starting and ending positions in the sequence. Effectively, this allows the sequence to be split up in many subsequences, each corresponding to a relevant memory.
In the upcoming subsections, we discuss the bioplausibility of the properties we used, such as how the sequence encoding modules can be formed, the form of inhibitory processing, the dendritic spikes and the different cholinergic environment in movement and quiescence.

Formation of Sequence Encoding Modules
The sequence encoding modules proposed here rely on complex dynamics and a specific connectivity structure. Their structure can either be hardwired, as a product of evolution, or perhaps more likely, emerge as a consequence of the interaction of the many plasticity rules present in the hippocampus. For example, sequences of ensembles forming long synfire chains can spontaneously emerge as a product of spike-timing dependent plasticity (STDP) and axon remodeling (Jun and Jin, 2007), or a three factor variant of STDP (Frémaux and Gerstner, 2015) which is modulated by global population activity (Weissenberger et al., 2017). Such long sequences of ensembles form the basis of the sequence encoding modules in here.
Furthermore, while the sequence encoding modules presented in this paper require a defined connectivity structure, in the brain they would likely emerge as a consequence of plasticity (or due to evolution). Forming the sequence encoding modules through learning should produce sequence cells that are more heterogenous and whose ensembles overlap (as for example in Weissenberger et al., 2017). In particular, in Weissenberger et al. (2017), it has been shown that such heterogeneity and ensemble overlap poses no problem, as through proper learning rules the ensemble sequences can be learned to exhibit the desired behavior. Our sequence encoding modules can be described with very few parameters, so using learning rules to induce the correct behavior is (at least theoretically) relatively simple.  Table 6). This shows that the choice of parameters we used is not unique. Instead, as long as the conditions on the gain functions are satisfied, there is a large range of weight parameters where the model performs as expected. (D) Raster plot of the replay mode using the alternative parameter set (see Table 6).
In addition, in Steemers et al. (2016), experimental evidence is presented for attractor dynamics in the hippocampus. Their data shows that linear changes in the environment context induce dynamic, non-linear changes in hippocampal activity patterns. Moreover, Pfeiffer and Foster (2015) present evidence for autoassociative and heteroassociative dynamics in the hippocampus. Densely connected populations representing attractor states (autoassociative dynamics) are sparsely connected among themselves (heteroassociative dynamics), a characteristic which permits sequential processing of information. Their data reveals a fundamental discretization in the memory retrieval processes of the hippocampus, supporting the idea that sequences of attractor states are sparsely connected, as is the case in the sequence encoding modules. Finally, Grosmark and Buzsáki (2016) report two different firing dynamics in place cells, one of rigid and fast-firing pyramidal neurons, and another of plastic and slow-firing neurons. Our work provides an interesting interpretation of the reasons for such difference in firing dynamics, with the rigid, fast-firing neurons essentially functioning as our sequence cells and driving replays of the plastic neurons.

Inhibitory Neural Processing
Crucial to the proper functioning of the encoding mode of the sequence encoding modules are the two forms of inhibitory neural processing, the gating inhibition and the competitive inhibition. Speed-correlated firing of hippocampal neurons has been reported and is controlled by glutamatergic interneurons in the medial septum (Fuhrmann et al., 2015;Justus et al., 2017). The firing rates of these glutamatergic interneurons are proportional to the speed of locomotion and control the theta oscillations during locomotion, being crucial to the initiation of theta oscillations. These glutamatergic septo-hippocampal projections perform disinhibitory control of the activity in areas CA3 and CA1, which is in line with our proposal of a gating inhibitory mechanism that is switched off during movement, possibly via some disinhibitory mechanism.
Furthermore, a specific type of inhibitory interneurons, called VIP interneurons, are known to have a disinhibitory function and seem to be ubiquitous in the brain (Letzkus et al., 2011;Jiang et al., 2013;Pi et al., 2013). These VIP interneurons perform disinhibitory control in the brain which is activated under specific behavioral conditions (Pi et al., 2013), a fact which is consistent with our hypothesis that the gating inhibition is switched off when the animal changes its location.
The winner-take-all dynamics implemented using the competitive inhibition are commonplace in the theoretical neuroscience literature (Maass, 2000;Binas et al., 2014). Furthermore, anatomical studies have shown that cortical networks contain essential features which can be implemented via winner-take-all circuits (Douglas and Martin, 2004;Carandini and Heeger, 2012).
As the above examples show, there exists a large diversity of inhibitory interneuron subtypes each potentially serving a different purpose (Petilla Interneuron Nomenclature Group et al., 2008). Such diversity of interneuron types and functions is a strong indicator that the simple forms of inhibitory control proposed here can be realized in the brain.

Sharp-Wave Ripples, Dendritic Spikes and Replays
Previous works (Jahnke et al., 2015) also suggest the relation between dendritic spikes, sharp-wave ripples and replays. Our work goes along similar lines, suggesting that replays are a product of synfire chain like behavior, driven by dendritic spikes, of special populations of neurons, that fire in a stereotypical manner, independent of instantial specifics of an animal's movement. Sharp-wave ripples are the most synchronous events reported in the hippocampus (Ylinen et al., 1995). The replay of place cell activities is known to be concurrent with ripple events (Nádasdy et al., 1999;Lee and Wilson, 2002). Such synchronous events favor the occurrence of dendritic spikes, which, in turn, favor synchronous excitatory activity along a feedforward structure (Jahnke et al., 2015;Haga and Fukai, 2018).

ACh Levels and Neuromodulators
It is known that Acetylcholine levels are significantly reduced in slow-wave sleep and quiet wakefulness, in comparison to an awake state (Kametani and Kawamura, 1990;Marrosu et al., 1995;Hasselmo, 1999). It is further believed that such change in the cholinergic modulation is crucial for the process of acquiring and consolidating new memories (Hasselmo and Bower, 1993;Hasselmo, 1999Hasselmo, , 2006. In accordance with this work, Hasselmo (2006) proposes that different levels of Acetylcholine during exploration, slow-wave sleep and quiet wakefulness set the appropriate dynamics of the network for encoding and consolidating memories (replays). In fact, septal cholinergic activation to CA3 pyramidal neurons has been experimentally shown to enhance theta oscillations and effectively suppress sharp-wave ripples in the CA3 network (Hasselmo, 1999(Hasselmo, , 2006Vandecasteele et al., 2014). The nicotinic receptors either on the soma, pre-synaptic boutons, or post-synaptic spines of CA3 pyramidal cells can effectively inhibit firing of CA3 pyramidal neurons. Another potential inhibitory mechanism could be through the GABA-B receptors on CA3 pyramidal neurons, which can serve a similar role, although the origin of inputs to GABA-B receptors are currently unknown.
Lastly, a recent study (Singer and Frank, 2009) indicates that replays, in both directions, are more likely to occur after rewards are obtained. In particular, their results suggest that reverse replays could be a mechanism to reinforce experiences that produced rewards. This connects nicely to our work, as replays (and especially reverse replays) in our model rely on higher excitability of the network which does not need to happen exclusively due to lower levels of ACh. Other neuromodulators, such as dopamine, have been reported to possess increased levels in the presence of rewards (Arias-Carrión et al., 2010), and these increased levels typically induce higher excitability of neurons (Henze et al., 2000;Kobayashi and Suzuki, 2007). However, there are reports indicating inhibitory effects of dopamine (Stanzione et al., 1984;Smiałowski and Bijak, 1987), and, to the best of our knowledge, it is currently unknown whether the total neuromodulatory effect in the presence of rewards is excitatory or inhibitory in the hippocampus.

Other Models of Sequence Replay
There are alternative models that achieve partially similar results to the ones reported here. These previous works fail to produce replays which cope with the irregularity of behavior and need specific assumptions about the geometry of the environment or about the trajectory taken (e.g., that it does not contain loops). In Romani and Tsodyks (2015), a short-term synaptic depression mechanism is shown to produce two different regimes of activity of place cell rate dynamics, that can account for replays observed in the literature (Wilson and McNaughton, 1994;Lee and Wilson, 2002) and preplays (Dragoi and Tonegawa, 2011;Ólafsdóttir et al., 2015;Grosmark and Buzsáki, 2016), at least under the assumption that a map of the environment is stored in the synaptic connections among the place cell rate units. In Theodoni et al. (2018), it is studied how standard plasticity rules (Markram et al., 1997;Bi and Poo, 1998) shape the patterns of activity of place cell rate units and it is found that the theta rhythm speeds up encoding of newly explored environments. The mechanism for generating the replays is the same as the one used in Romani and Tsodyks (2015). In Chenkov et al. (2017), a spiking network consisting of Hebbian assemblies that are weak and sparsely connected uses pre-existing recurrent connections to quickly amplify weak signals. As in our work, the assemblies are activated like a synfire chain during replays and network states are altered by globally changing neuronal excitability (say, by cholinergic modulation). The authors of Jahnke et al. (2015), in line with our work, propose that sharp-wave ripples and replays are interconnected and that nonlinear dendritic computation, in the form of dendritic sodium spikes, accounts for the observed replays in the rodent hippocampus. As in this work, the dendritic spikes distinguish between an asynchronous state present while the animal is awake and a synchronous activity state, related to sharp-wave ripples, occurring during slow-wave sleep and quiet wakefulness. In Haga and Fukai (2018), dendritic processing is used in the generation of replays and preplays of activity. In an hypothesis similar to ours, CA3 possesses innate sequential structure that gives rise to spontaneous firing sequences and can be used to one-shot learn place fields. Still, we note that our sequence encoding modules are the first to account for a compressed representation of place cell activity to produce replays, both forward and reverse, which are independent of the instantial specifics of the animal's movement and environment.

Compressing Sequence Representations in Other Contexts
In this work, we restrict sequence processing to a navigation task because the hippocampus is extensively studied in this regard and there are many reported phenomena (sharp-wave ripples, cholinergic modulation, forward and reverse replays, dendritic spikes) which fit well into our framework of sequence encoding modules.
Still, the ability to divide complex tasks into simpler, compounding, sequential subtasks is essential to understand and perform the original more complex task. Identifying these subtasks and their interconnectedness, compressing useful sequences into the most relevant parts and factoring out the non-relevant ones, is an important characteristic if one wants to learn ever more complex behavioral patterns. It is a given that animals perform this form of sequence processing operation on a high-level. In this work, it is further argued that this sequence processing is such an essential operation, that the brain might develop special modules of cells capable of extracting the relevant subparts contained in a more complex task.

Limitations and Outlook
While our model produces near perfect replica of replays and reverse replays of previous trajectories, doing so after a single traversal of the path, there are many avenues which are still left to be explored and which go beyond the scope of this work. First, our model does not include the observed theta phase precession (O'Keefe and Recce, 1993) and theta sequences (Foster and Wilson, 2007). Theta phase precession corresponds to the fact that place cells fire progressively earlier in the theta phase as the animal traverses the place field. As a result, theta sequences, that is sequences of place cells firing corresponding to the movement of the animal, naturally emerge. This is currently not considered in our model. In principle, it is possible to incorporate theta waves and phase precession by increasing the level of detail in place cell modeling. However, this goes beyond the scope of our work as the replay phenomena is already reproduced even with a simple model of place cell activity. This thus preclude our model from offering insight into how theta waves interplay with the replay/ripple phenomena. However, it also shows that theoretically theta waves are not strictly necessary to obtain robust bidirectional replays in a one shot fashion, which indicates that the presence of theta waves in conjunction with ripples cannot be fully understood exclusively from the point of view of replays.
Second, the firing rates occurring in our simulations are typically higher and more uniform than what has been reported in the literature (Mizuseki and Buzsáki, 2013). However, the high firing rates of our model could be significantly decreased by combining it with some mechanism which induces excitability decay during immobility (e.g., short-term synaptic depression Stevens and Wang, 1995;Markram and Tsodyks, 1996;Abbott et al., 1997). The increase in excitability during movement would greatly facilitate the task of transitioning from one sequence ensemble to the next (while staying at the current ensemble during immobility), and, as a consequence, the firing rates of active sequence ensembles would not need to be as high.
In particular, there is indirect experimental support for such a mechanism (Kay et al., 2016), as hippocampal neurons have been reported to possess significantly reduced firing rates during immobility. Furthermore, in our model, place cells may have lower firing rates than sequence cells, as the only condition is that the combination of place cell firing and plasticity rule used must be adequate to associate the place cell ensembles to the currently active sequence ensemble. In particular, lower firing rates of place cells as opposed to sequence cells is consistent with the results reported in Grosmark and Buzsáki (2016). Finally, the sub-populations we simulate are precisely those which represent the current behavior of the mouse and, thus, must have naturally higher firing rates than the rest of the network. Therefore, the entire network naturally has more realistic firing rate profile.
Third, we restrict our discussion and analysis to the initial learning of a sequence in a single environment. There are some possibilities about how the sequence cells would behave during the second run of an already traversed path. For example, each new run, a new sequence encoding module may be selected and the place cells are associated to it (independent of whether this path had been traversed before). Alternatively, the same sequence of place cells may be associated to the corresponding sequence encoding module used in the first run of this path (for example, by associating the place cells to the corresponding sequence cells). The latter case is less likely though, as over multiple runs, the place cells would tend to be associated to many sequences which leads again to the problem that the context cannot be decided during replays (as was the case when replays are produced exclusively by the place cell connectivity). However, there is a whole range of intermediate cases where the sequence encoding module used would not be neither completely independent of the path it is associated to, nor completely determined by this path. This is an interesting matter because if sequence cells are associated to a specific path, their firing pattern would match closely that of place cells, whereas if they would be independent their firing would not appear as that of place cells. In particular, as place cells are known to remap across different environments (Alme et al., 2014), this indicates that sequence encoding modules are rarely reused. Consequently, sequence cells might remap depending on the environment, just like place cells. Observe that, as shown in Weissenberger et al. (2017), it is possible to pack many such sequence encoding modules in a population of neurons.
Lastly, we recognize that our model fails to shed light on how certain replays seem to correspond to novel virtual experiences, seemingly combining prior experiences to create a novel one (Gupta et al., 2010;Pfeiffer and Foster, 2013). However, this phenomena should rely on the ability to produce coherent internal models of the environment, which can thus be used to produce coherent virtual experiences. Therefore, comprehending such phenomena is clearly out of the scope of our model.

Reduced Rate Encoding Model
To understand the dynamics of the spiking network, we consider a reduced model where we collapse ensembles into abstract rate units. As before, we denote by B 1 , . . . , B ℓ the ℓ bistable units that form the sequence, by I C the competitive inhibitory population to all ℓ ensembles and by I G the gating inhibitory input. The activity of unit X at time t is given by λ X (t) which corresponds to the rate of the ensemble. The units are connected into a network which dictates their interaction. We denote the interaction from unit X to itself by w X , and the interaction strength from unit Y to X by w XY . The connectivity is as in Figure 2A. We assume that interactions with an inhibitory (excitatory) source negatively (positively) affect the target's rate. Moreover, we assume that the activity λ I G (t) alternates between high and low rate states as depicted in Figures 2B,C. The rate of other units evolves according to the following set of differential equations: for the rate of unit B i and for the rate of unit I C , where I B i (t) is: and I I C (t) is: the functions f and g are generally chosen as saturating functions such as the sigmoid function e x 1+e x and the factor s multiplying the right-hand side of Equation (2) controls the speed of inhibitory dynamics relative to excitatory dynamics in the reduced rate model. These equations represent how the rate of the populations in the spiking network should evolve, with f and g being gain functions of neurons and I B i (t) and I I C (t) being the total input received by the respective neurons. The goal is to set the parameters of the system such that the activity evolves as in Figure 2C.
To produce the dynamics of Figures 2B,C, it is enough to study this rate model with two bistable units B i and B i+1 , with the additional constraint that weights from and to B i are equal to those from and to B i+1 . This allows for a simple extension to ℓ bistable units, as the symmetry of the weights guarantees that the system will have the same rate dynamics independent of which bistable unit is currently active. In particular, with only two bistable units, it is possible, under the assumption that f and g are piecewise linear, to write a simple dynamical system of equations that can be fully solved and understood.
Moreover, it is crucial that f is chosen to guarantee bistable rate dynamics of each B i , that is, specifically that close to zero or negative input produces silent (zero rate) responses and that large enough inputs produce a maximal rate response. In addition, from g it is required that it guarantees the behavior we expect from I C , namely, that it serves as a detector for two bistable units being active at the same time, producing maximal response if it receives input from two bistable units close to the maximal response, and being silent if it receives not much more than what a single maximally active bistable unit can produce.
Given this, one can state simple requirements that are essential: (1) that the system is at a stable point while B 1 is at its maximal rate, B 2 silent and I G is at the high rate state. (2) that B 1 can activate B 2 to its maximal rate once I G is at the low rate state; (3) that the rate of I C increases when both bistable units are jointly active; (4) I C being active is strong enough to turn silent both B 1 and B 2 ; (5) the speed of the inhibitory I C dynamics is faster than that of B 1 and B 2 , which can force B 1 below the point where it can reactivate itself, while B 2 remains above said point. In particular, the speed of inhibitory I C dynamics (controlled by s) is crucial to guarantee that the rate system can produce the desired rate dynamics with symmetric weights.
These requirements of the rate-based system, therefore, imply that one of three possible outcomes can happen, where only the third is desired: (1) B i and B i+1 turn both inactive; (2) the rates of B i , B i+1 and I C oscillate; (3) B i turns inactive, but B i+1 stays active, driven by activity in B i and by its own dynamics.

Overview of Parameters-Rate Model
Table 1, we summarize the notation and parameters used in the rate reduced model.

Neuron Models
In the spiking version of the model each ensemble unit is composed of an excitatory (E x ) population with N E units and an inhibitory (I x ) population with N I units where the lower-case subscript x corresponds to the index of the pattern. We follow the convention of denoting the populations by superscript and the neuron index by subscripts. The membrane potential dynamics of neuron i in population X follow: The parameter values are summarized in Table 2 and the dynamics are consistent with (Clopath et al., 2010;Litwin-Kumar and Doiron, 2014). The main difference between the excitatory and inhibitory units is that the excitatory neurons have an adaptation current and an adaptive threshold (Brette and Gerstner, 2005;Badel et al., 2008) whereas the inhibitory neurons are non-adapting. The adaptive threshold follows: We record a spike when the voltage exceeds 20mV and we reset it to and hold it at V re for the refractory period duration τ abs .
For an adaptive threshold we set it to V T + A T after a spike. The adaptation current for a neuron in population E x follows: and it is increased by b w when neuron i spikes.

Synapse Dynamics
Connection probability between population X and Y is given by p XY and the strength of a connection between unit i ∈ X and unit j ∈ Y is given by w XY ij . The total conductance of neuron i in population X from a source of type Y (excitatory or inhibitory) follows: is the synaptic kernel for input from neurons of type Y, Y k corresponds to subpopulations of type Y, * is the convolution operator and s Y k j (t) = n ℓ=1 δ(t − t Y k i,ℓ ) denotes the spike train of neuron j in population Y k . Additionally, each neuron receives external input given by the spike train s XY i,ext (t) which follows a homogeneous Poisson process with rate r XY ext and weight w XY ext for neurons in population X where Y can either be an excitatory (E) or an inhibitory (I) source. The resolution of the simulation is 0.1 ms and the coupling parameters can be found in Table 3.

Dendritic Spikes
Dendritic spikes are triggered by excitatory spikes arriving in a short time span at a particular position in the dendrite of a neuron. We incorporate them by setting the synaptic conductance of the afferent input to produce a stereotypical current pulse, using parameters observed in single-neuron experiments (Ariav et al., 2003;Gasparini et al., 2004;Polsky et al., 2004;Gasparini and Magee, 2006) and similar to the one used in recent spiking network models (Memmesheimer, 2010;Jahnke et al., 2012Jahnke et al., , 2015. Whenever the excitatory input to a (dendritic branch of) a neuron changes the conductance of the neuron by a threshold θ dspike in a time window of 2 ms, a dendritic spike is initiated. This produces a stereotypical current pulse at the soma whose effect is triggered with a time lag of τ DS after the dendritic threshold is crossed. This time lag is on the same order as the one observed in single neuron experiments. The stereotypical current pulse, like that of Jahnke et al. (2015), is described by: where the factors A, B, C and the decay time constants τ ds,1 , τ ds,2 , τ ds,3 are chosen so that the depolarizing effect on the soma fits experimental data (Ariav et al., 2003;Gasparini et al., 2004;Polsky et al., 2004;Gasparini and Magee, 2006), and whose values are shown in Table 4. Moreover, we add a refractory period t ref ,ds = 5 ms, as in Memmesheimer (2010), Jahnke et al. (2012), and Jahnke et al. (2015), to these dendritic branches after the stereotypical current pulse has influenced the soma, in accordance to the saturating effect of somatic depolarization of dendritic spikes reported in the literature (Ariav et al., 2003).

Cholinergic Modulation
In accordance with (Hasselmo and Bower, 1993;Hasselmo, 1999Hasselmo, , 2006, we use the fact that, during exploration, high levels of cholinergic modulation are present and the excitatory feedback is thus in part suppressed; during sleep and quiet wakefulness, the ACh levels drop and this suppression is reduced, thus leading to generally stronger synaptic weights. We use a simple 2.5 factor to increase the strength of excitatory synapses during the replay mode in comparison with the encoding mode. Though not too much data is available on the cholinergic effect in vivo, this factor is in line with the one obtained from hippocampal slices of areas CA3 and CA1 in rats (Hasselmo and Schnell, 1994;Hasselmo et al., 1995).

Hebbian Plasticity From Sequence to Place Cells
To connect a bistable unit in the sequence encoding module with the corresponding population of place cells, we use a Hebbian like plasticity mechanism. Initially, all synapses from sequence to place cells are silent, and they are potentiated and turned active if the corresponding pre and post cells fire at high rates in a short period. The exact implementation was done by a voltage-based STDP plasticity rule (Clopath et al., 2010) which tends to increase synaptic weight if both pre and post neurons have a high firing rate. As the process of activating a silent synapse is equivalent to LTP (Isaac et al., 1995;Kerchner and Nicoll, 2008), we speed up the dynamics to make it consistent with the timescale of LTP. In accordance with (Clopath et al., 2010;Litwin-Kumar and Doiron, 2014), the dynamics of the synaptic strength w ij from sequence cell j to place cell i are given as:   where R(x) is a linear-rectifying function, that is, R(x) = 0 if x ≤ 0 and R(x) = x otherwise; u i and v i represent the membrane voltage V i low-pass filtered with time constants τ u and τ v , respectively; and x j represents the spike train s j low-pass filtered with time constant τ x . The constants A LTD , A LTP , θ LTD , θ LTP , τ u , τ v and τ x are given in Table 5 and are the same as Litwin-Kumar and Doiron (2014), with the exception of A LTP , which is increased to speed up the dynamics and bring the timescale of activating a synapse into the hundred milliseconds range. Furthermore, the maximum value of w ij is 5pF.

Overview of Parameters
Here we summarize the parameters of the spiking neural network simulation of exponential integrate-and-fire neurons.
Tables 2, 3 contain the parameters regarding the membrane potential dynamics of a neuron, and the parameters regarding the neuronal coupling, respectively. Table 4 contains the dendritic spikes parameters that were used, and Table 5 the Hebbian plasticity ones.
For Figure 1, we use a multiplicative gaussian noise N(1, 0.2) for each individual synaptic weight (shown in Table 3) in the spiking network. For Figures 5C,D, we use a different set of synaptic weights which we show in Table 6.
In addition, code implementing the model can be found at GitHub.

AUTHOR'S NOTE
A preprint version of this paper can be found at BioRxiv (Gauy et al., 2018).