Synaptic learning rules for sequence learning

Remembering the temporal order of a sequence of events is a task easily performed by humans in everyday life, but the underlying neuronal mechanisms are unclear. This problem is particularly intriguing as human behavior often proceeds on a time scale of seconds, which is in stark contrast to the much faster millisecond time-scale of neuronal processing in our brains. One long-held hypothesis in sequence learning suggests that a particular temporal fine-structure of neuronal activity — termed ‘phase precession’ — enables the compression of slow behavioral sequences down to the fast time scale of the induction of synaptic plasticity. Using mathematical analysis and computer simulations, we find that — for short enough synaptic learning windows — phase precession can improve temporal-order learning tremendously and that the asymmetric part of the synaptic learning window is essential for temporal-order learning. To test these predictions, we suggest experiments that selectively alter phase precession or the learning window and evaluate memory of temporal order.


Introduction
It is a pivotal quality for animals to be able to store and recall the order of events ('temporal-order learning', Kahana, 1996;Fortin et al., 2002;Lehn et al., 2009;Bellmund et al., 2020) but there is only little work on the neural mechanisms generating asymmetric memory associations across behavioral time intervals (Drew and Abbott, 2006). Putative mechanisms need to bridge the gap between the faster time scale of the induction of synaptic plasticity (typically milliseconds) and the slower time scale of behavioral events (seconds or slower). The slower time scale of behavioral events is mirrored, for example, in the time course of firing rates of hippocampal place cells (O'Keefe and Dostrovsky, 1971), which signal when an animal visits certain locations ('place fields') in the environment. The faster time scale is given by the temporal properties of the induction of synaptic plasticity (Markram et al., 1997;Bi and Poo, 1998) -and spike-timing-dependent plasticity (STDP) is a common form of synaptic plasticity that depends on the millisecond timing and temporal order of presynaptic and postsynaptic spiking. For STDP, the so-called 'learning window' describes the temporal intervals at which presynaptic and postsynaptic activity induce synaptic plasticity. Such precisely timed neural activity can be generated by phase precession, which is the successive acrosscycle shift of spike phases from late to early with respect to a background oscillation ( Figure 1). As an animal explores an environment, phase precession can be observed in the activity of hippocampal place cells with respect to the theta oscillation (O'Keefe and Recce, 1993;Buzsáki, 2002;Qasim et al., 2021). Phase precession is highly significant in single trials (Schmidt et al., 2009;Reifenstein et al., 2012) and occurs even in first traversals of a place field in a novel environment (Cheng and Frank, 2008). Interestingly, phase precession allows for a temporal compression of a sequence of behavioral events from the time scale of seconds down to milliseconds ( Figure 1; Skaggs et al., 1996;Tsodyks et al., 1996;Cheng and Frank, 2008), which matches the widths of generic STDP learning windows (Abbott and Nelson, 2000;Bi and Poo, 2001;Froemke et al., 2005;Wittenberg and Wang, 2006). This putative advantage of phase precession for temporal-order learning, however, has not yet been quantified. To assess the benefit of phase precession for temporal-order learning, we determine the synaptic weight change between pairs of cells whose activity represents two events of a sequence. Using both analytical methods and numerical simulations, we find that phase precession can dramatically facilitate temporal-order learning by increasing the synaptic weight change and the signal-to-noise ratio by up to an order of magnitude. We thus provide a mechanistic description of associative chaining models (Lewandowsky and Murdock, 1989) and extend these models to explain how to store serial order.

Results
To address the question of how behavioral sequences could be encoded in the brain, we study the change of synapses between neurons that represent events in a sequence. We assume that the temporal order of two events is encoded in the asymmetry of the efficacies of synapses that connect We assume that each cell shows phase precession with respect to the LFP's theta oscillation (every second cycle is marked by a greay box). When the activities of multiple cells overlap, the sequence of behavioral events is compressed in time to within one theta cycle (two examples highlighted in the dashed, shaded boxes). Bottom: This faster time scale can be picked up by STDP and strengthen the connections between the cells of the sequence. Figure adapted from Korte and Schmitz, 2016. neurons representing the two events ( Figure 1). After the successful encoding of a sequence, a neuron that was activated earlier in the sequence has a strengthened connection to a neuron that was activated later in the sequence, whereas the connection in the reverse direction may be unchanged or is even weakened. As a result, when the first event is encountered and/or the first neuron is activated, the neuron representing the second event is activated. Consequently, the behavioral sequence could be replayed (as illustrated by simulations for example in Tsodyks et al., 1996;Sato and Yamaguchi, 2003;Leibold and Kempter, 2006;Shen et al., 2007;Cheng, 2013;Chenkov et al., 2017;Malerba and Bazhenov, 2019;Gillett et al., 2020) and the memory of the temporal order of events is recalled (Diba and Buzsáki, 2007;Schuck and Niv, 2019). We note, however, that in what follows we do not simulate such a replay of sequences, which would depend also on a vast number of parameters that define the network; instead, we rather focus on the underlying change in connectivity, which is the very basis of replay, and draw connections to 'replay' in the Discussion.
Let us now illustrate key features of the encoding of the temporal order of sequences. To do so, we consider the weight change induced by the activity of two sequentially activated cells i and j that represent two behavioral events (dashed lines in Figure 2A). Classical Hebbian learning (Hebb, 1949), where weight changes Dw ij depend on the product of the firing rates f i and f j , is not suited for temporal-order learning because the weight change is independent of the order of cells: Therefore, a classical Hebbian weight change is symmetric, that is, Dw ij À Dw ji ¼ 0. This result can be generalized to learning rules that are based on the product of two arbitrary functions of the firing rates. We note that, although not suited for temporal-order learning, Hebbian rules are able to achieve more general 'sequence learning', where an association between sequence elements is created -independent of the order of events. To become sensitive to temporal order, we use spiketiming dependent plasticity (STDP; Markram et al., 1997;Bi and Poo, 1998). For STDP, average weight changes depend on the cross-correlation function of the firing rates (example in Figure 2C, D), which is anti-symmetric: C ij ðtÞ ¼ C ji ðÀtÞ. Assuming additive STDP, that is, weight changes resulting from pairs of pre-and postsynaptic action potentials are added, the average synaptic weight change Dw ij between the two cells in a sequence can then be calculated explicitly (Kempter et al., 1999): where W is the STDP learning window (example in Figure 2E). We aim solve Equation 1 for given firing rates f i and f j . To do so, we assume that the synaptic weight w ij is generally small and thus only has a weak impact on the cross-correlation of the cells during encoding, that is, for the 'encoding' of a sequence the cross-correlation function is dominated by feedforward input, whereas the recurrent inputs are neglected. Next, let us show that the symmetry of W is essential for temporal-order learning. Any learning window W can be split up into an even part W even , with W even ðtÞ ¼ W even ðÀtÞ, and an odd part W odd , with W odd ðtÞ ¼ ÀW odd ðÀtÞ, such that W ¼ W even þ W odd . For even learning windows, one can derive from Equation 1 and the anti-symmetry of C ij that weight changes are symmetric, that is, Dw ij ¼ Dw ji ; therefore, only the odd part W odd of W is useful for learning temporal order.
To further explore requirements for encoding the temporal order of a sequence of events, we restrict our analysis to odd learning windows. We then can relate the weight change Dw ij to the essential features of C ij ðtÞ. To do so, we integrate Equation 1 by parts (with W replaced by W odd ), with the primitive W odd ðtÞ :¼ R t À¥ dt 0 W odd ðt 0 Þ and the derivative C 0 ij ðtÞ :¼ d dt C ij ðtÞ. Because W odd ðtÞ can be assumed to have finite support (note that R þ¥ À¥ dt W odd ðtÞ ¼ 0), the first term in Equation 2 vanishes. Also the learning window has finite support, and therefore we can restrict the integral in the second term in Equation 2 to a finite region of width K around zero: Figure 2. Model of two sequentially activated phase-precessing cells. (A) Oscillatory firing-rate profiles for two cells (solid blue and cyan lines). The black curve depicts the population theta oscillation. For easier comparison of the two different frequencies, the population activity's troughs are continued by thin gray lines, and the peaks of the cell-intrinsic theta oscillation are marked by dots. Dashed lines depict the underlying Gaussian firing fields without theta modulation. (B) Phase precession of the two cells (same colors as in A). The compression factor c describes the phase shift per theta cycle for an individual cell (2pc). For the temporal separation T ij of the firing fields and the theta frequency !, the phase difference between the cells is !cT ij . The dots depict the times of the maxima in (A). (C) Resulting cross-correlation for the two firing rates from (A). The solid red curve shows the full cross-correlation. The dashed line depicts the cross-correlation without theta-modulation. The gray region indicates small (t<20 ms) time lags. (D) Same as in (C), but zoomed in. Note that the first peak of the theta modulation is at a positive non-zero time lag, reflecting phase precession. The dashed black curve shows the approximation of the cross-correlation for the analytical treatment (Materials and methods, Equation 17). (E) Synaptic learning window. The gray region indicates the region in which the learning window is large, and this region is also indicated in (C) and (D). Positive time lags correspond to postsynaptic activity following presynaptic activity. Parameters for all plots: where K describes the width of the learning window W (gray region in Figure 2E). The integral in Equation 3 can be interpreted as the cross-correlation's slope around zero, weighted by the symmetric function ÀW odd ðtÞ; interestingly, features of C ij for jtj ) K, for example whether side lobes of the correlation function are decreasing or not, are irrelevant. As a generic example of sequence learning, let us consider the activities of two cells i and j that encode two behavioral events, for example the traversal of two place fields of two hippocampal place cells. In general, the cells' responses to these events are called 'firing fields'. We model these firing fields as two Gaussian functions G 0;s and G Tij;s that have the same width s but different mean values 0 and T ij (we note that T ij and s are measured in units of time, that is, seconds; Figure 2A, dashed curves). In this case of identical Gaussian shapes of the two firing fields, the cross-correlation C ij ðtÞ is also a Gaussian function, denoted by G Tij; ffiffi 2 p s , but with mean T ij and width ffiffi ffi 2 p s (dashed curve in Figure 2C). The value s ¼ 0:3 s, which we use in the example of Figure 2, matches experimental findings on place cells (O'Keefe and Recce, 1993;Geisler et al., 2010).
It is widely assumed that phase precession facilitates temporal-order learning Dragoi and Buzsáki, 2006;Schmidt et al., 2009), but it has never been quantitatively shown. To test this hypothesis and to calculate how much phase precession contributes to temporal-order learning, we consider Gaussian firing fields that exhibit oscillatory modulations with theta frequency ! ( Figure 2A, solid curves). The time-dependent firing rate of cell i is described by f i ðtÞ / G i;s ðtÞ 1 þ cos½!ðt À c i Þ f g, that is, a Gaussian that is multiplied by a sinusoidal oscillation; see also Equation 11 in Materials and methods. Phase precession occurs with respect to the population theta, which oscillates at a frequency of ð1 À cÞ! that is slightly smaller than !, with a 'compression factor' c that is usually small: 0 c ( 1 (Dragoi and Buzsáki, 2006;Geisler et al., 2010). This compression factor c describes the average advance of the firing phase -from theta cycle to theta cycle -in units of the fraction of a theta cycle; c thus determines the slope !c of phase precession ( Figure 2B). A typical value is c » p=ð4s!Þ, which accounts for 'slope-size matching' of phase precession (Geisler et al., 2010); that is, c is inversely proportional to the field size L :¼ 4s of the firing field, and the total range of phase precession within the firing field is constant and equals p 180 . If there are multiple theta oscillation cycles within a firing field (!s ) 1), which is typical for place cells, the cross-correlation C ij ðtÞ is a theta modulated Gaussian (solid curve in Figure 2C; see also Equation 15 in Materials and methods).
The generic shape of the cross-correlation C ij in Figure 2C allows for an advanced interpretation of Equation 3, which critically depends on the width K of the learning window W. We distinguish here two limiting cases: narrow learning windows (K ( 1=! ( s), that is, the width K of the learning window is much smaller than a theta cycle and the width of a firing field, and wide learning windows (K ) s), that is, the width K of the learning window exceeds the width of a firing field. Let us first consider narrow learning windows. Only later in this manuscript, we will turn to the case of wide learning windows.
Dependence of temporal-order learning on the overlap of firing fields for narrow learning windows (K ( 1=! ( s) We first show formally that sequence learning with narrow learning windows requires that the two firing fields do overlap, that is, their separation T ij should be less than or at least similar to the width s of the firing fields. In Equation 3, which was derived for odd learning windows, the weight change Dw ij is determined by C 0 ij ðtÞ around t ¼ 0 in a region of width K. For narrow learning windows (K ( 1=!), this region is small compared to a theta oscillation cycle and much smaller than the width s of a firing field. Because the envelope of the cross-correlation C ij ðtÞ is a Gaussian with mean T ij and width ffiffi ffi 2 p s, the slope C 0 ij ðt ¼ 0Þ scales with the Gaussian factor G Tij; ffiffi 2 p s ð0Þ / exp ½ÀT 2 ij =ð4s 2 Þ. The weight change Dw ij therefore strongly depends on the separation T ij of the firing fields. When the two firing fields do not overlap (T ij ) s), the factor exp ½ÀT 2 ij =ð4s 2 Þ quickly tends to zero, and sequence learning is not possible. On the other hand, when the two firing fields do have considerable overlap (T ij < s) we have exp ½ÀT 2 ij =ð4s 2 Þ < 1. In this case, sequence learning may be feasible with narrow learning windows. In this section, we will proceed with the mathematical analysis for overlapping fields, which allows us to assume exp ½ÀT 2 ij =ð4s 2 Þ » 1.
For overlapping firing fields (T ij < s), let us now consider the fine structure of the cross-correlation C ij ðtÞ for jtj<K, as illustrated in Figure 2D. Importantly, phase precession causes the first positive peak (i.e. for t>0) of C ij to occur at time c T ij with c ( 1 (Dragoi and Buzsáki, 2006;Geisler et al., 2010); phase precession also increases the slope C 0 ij ðtÞ around t ¼ 0, which could be beneficial for temporal-order learning according to Equation 3. To quantify this effect, we calculated the crosscorrelation's slope at t ¼ 0 (see also Equation 18 in Materials and methods): How does C 0 ij ð0Þ depend on the temporal separation T ij of the firing fields? If the two fields overlap entirely (T ij ¼ 0) the sequence has no defined temporal order, and thus C 0 ij ð0Þ is zero. For at least partly overlapping firing fields (T ij < s) and typical phase precession where c ¼ p=ð4!sÞ ( 1, we will show in the next paragraph (and explain in Materials and methods in the text below Equation 18) that the second addend in Equation 4 dominates the other two. In this case, C 0 ij ð0Þ is much higher as compared to the cross-correlation slope in the absence of phase precession (c ¼ 0), leading to a clearly larger synaptic weight change for phase precession. The maximum of C 0 ij ð0Þ is mainly determined by this second addend (multiplied by G Tij; ffiffi 2 p s ð0Þ) and it can be shown (see Materials and methods) that this maximum is located near T ij » ffiffi ffi 2 p s . The increase of C 0 ij ð0Þ induced by phase precession can be exploited by learning windows W that are narrower than a theta cycle (e.g. gray regions in Figure 2C,D,E). To quantify this effect, let us consider a simple but generic shape of a learning window, for example, the odd STDP window WðtÞ ¼ signðtÞ expðÀjtj=t Þ with time constant t and learning rate >0 ( Figure 2E); this STDP window is narrow for t ( 1=!. Equations 3 and 4 then lead to (see Materials and methods, Equation 19) the average weight change where A depicts the number of spikes per field traversal. Note that, according to Equation 3, the weight change Dw ij in Equation 5 can be interpreted as a time-averaged version of C 0 ij ðtÞ near t ¼ 0 from Equation 4. Thus, Equations 4 and 5 have a similar structure, but Equation 5 includes multiple incidences of the term ! 2 t 2 that account for this averaging. This term is small for narrow learning windows (t ( 1=!) and can thus be neglected (! 2 t 2 ( 1) in this limiting case; however, for typical biological values of t ! 10 ms and ! ¼ 2p Á 10 Hz, the peculiar structure of the ! 2 t 2 -containing factor in the third addend in the square brackets is the reason why this addend can be neglected compared to the first one; as a result, the cases of 'phase locking' (c ¼ 0) and 'no theta' (only the first addend remains) are basically indistinguishable. Moreover, for narrow odd learning windows, Dw ij in Equation 5 inherits a number of properties from C 0 ij ð0Þ in Equation 4: the second addend still remains the dominant one for T ij < s; inherited are also the absence of a weight change for fully Figure 3A). Furthermore, the prefactor A 2 t 2 in Equation 5 suggests that the average weight change increases with increasing width t of the learning window, but we emphasize that this increase is restricted to t ( 1=! (as we assumed for the derivation), which prohibits a generalization of the quadratic scaling to large t ; the exact dependence on t will be explained later.
To quantify how much better a sequence can be learned with phase precession as compared to phase locking, we use the ratio of the weight change Dw ij with phase precession (c>0) and the weight change Dw ij ðc ¼ 0Þ without phase precession ( Figure 3A), and define the benefit B of phase precession as By inserting Equation 5 in Equation 6, we can explicitly calculate the benefit B of phase precession (see Equation 20 in Materials and methods and solid line in Figure 3B). For T ij < s and ! 4 t 4 ( 1 Figure 3. Temporal-order learning for narrow learning windows (t ( 1 ! ). (A) The average synaptic weight change Dw ij depends on the temporal separation T ij between the firing fields. Phase precession (blue) yields higher weight changes than phase locking (red). Simulation results (circles, averaged across 10 4 repetitions) and analytical results (lines, Equation 5) match well. The vertical lines mark time lags of s and 4s, respectively, where 4s approximates the total field width. (B) The benefit B of phase precession is determined by the ratio of the average weight changes of two scenarios from (A). The solid and dashed lines depict the analytical expression for the benefit (Equation 20) and its approximation for small T ij (Equation 7), respectively. (C) Signal-to-noise ratio (SNR) of the weight change as a function of the firing-field separation T ij . The SNR is defined as the mean weight change divided by the standard deviation across trials in the simulation. Colors as in (A). Parameters for all plots: ! ¼ 2 p Á 10 Hz, s ¼ 0:3 s, t ¼ 10 ms, (see Materials and methods) the benefit B is well approximated by a Taylor expansion up to third order in T ij (dashed line in Figure 3B), The maximum of B as a function of T ij is obtained for T ij ¼ 0 (fully overlapping fields), but the average weight change Dw ij is zero at this point. We note, however, that B decays slowly with increasing can be used to approximate the benefit for small field separations T ij (i.e. largely overlapping fields). For narrow (!t ( 1) odd STDP windows and slope-size matching (!sc ¼ p=4), we find the maximum B max » !s=2, which has an interesting interpretation: If we relate s to the field size L of a Gaussian firing field through L ¼ 4s and if we relate the frequency ! to the period T of a theta oscillation cycle through T ¼ 2p=!, we obtain B max » 0:82 L=T , that is, the maximum benefit of phase precession is about the number of theta oscillation cycles in a firing field. The example in Figure 3B (with firing fields in Figure 2A) has the maximum benefit B max » 10 and the benefit remains in this range for partly overlapping firing fields (0<T ij < s). We thus conclude that phase precession can boost temporal-order learning by about an order of magnitude for typical cases in which learning windows are narrower than a theta oscillation cycle and overlapping firing fields are an order of magnitude wider than a theta oscillation cycle.
So far, we have considered 'average' weight changes that resulted from neural activity that was described by a deterministic firing rate. However, neural activity often shows large variability, that is, different traversals of the same firing field typically lead to very different spike trains. To account for such variability, we have simulated neural activity as inhomogeneous Poisson processes (see Materials and methods for details). As a result, the change of the weight of a synapse, which depends on the correlation between spikes of the presynaptic and the postsynaptic cells, is a stochastic variable. It is important to consider the variability of the weight change ('noise') in order to assess the significance of the average weight change. For this reason, we utilize the signal-to-noise ratio (SNR), that is, the mean weight change divided by its standard deviation (see Materials and methods for details). To do so, we perform stochastic simulations of spiking neurons and calculate the average weight change and its variability across trials. This is done for phase-precessing as well as phaselocked activity. To connect this approach to our previous results, we confirm that the average weight changes estimated from many fields traversals matches well the analytical predictions ( Figure 3A and B, see Materials and methods for details).
The SNR shown in Figure 3C summarizes how reliable is the learning signal in a single traversal of the two firing fields -for the assumed odd learning window. The SNR further depends on T ij and follows a similar shape as the weight changes in Figure 3A. For phase precession, there is a maximum SNR that is slightly shifted to larger T ij ; for phase locking, SNR is always much lower. For the synapse connecting two cells with firing fields as in Figure 2A where T ij ¼ s, we find an SNR of 0.27, which is insufficient for a reliable representation of a sequence.
To allow reliable temporal-order learning, one possible solution is to increase the number of spikes per field traversal A (SNR / ffiffiffi A p , as shown in Appendix 1). Another possibility is to increase the number of synapses. In Materials and methods we show that SNR / ffiffiffiffi ffi M p where M is the number of identical and uncorrelated synapses. Therefore, to achieve SNR > 1 for A ¼ 10, one needs M > 14 synapses.
In summary, for narrow, odd learning windows (t ( 1=! ( s), temporal-order learning could benefit tremendously from phase precession as long as firing fields have some overlap. Average weight changes and the SNR are highest, however, for clearly distinct but still overlapping firing fields. It should be noted that any even component of the learning window would increase the noise and thus further decrease the SNR.
Dependence of temporal-order learning on the width of the learning window for overlapping firing fields To investigate how temporal-order learning for an odd learning window depends on its width, we vary the parameter t and quantify the average synaptic weight change Dw ij and the SNR both analytically and numerically. We first study overlapping firing fields ( Figure 4) and later consider nonoverlapping firing fields ( Figure 5).
For partly overlapping firing fields (e.g. T ij ¼ s), we find numerically that the average synaptic weight change Dw ij (the 'learning signal') increases monotonically for increasing t and saturates (colored curves in Figure 4A). This is because for increasing t the overlap between the learning window The signal-to-noise ratio (SNR; phase precession: blue, phase locking: red, no theta: cyan) takes into account that only the asymmetric part of the learning window is helpful for temporal-order learning. For large t , all three coding scenarios induce the same SNR. The horizontal dashed black line depicts the analytical limit of the SNR for large t and overlapping firing fields (SNR » 1:6, Equation A1-17 of Appendix 1). The dotted black line depicts the analytical expression for the 'no theta' case (Equation A2-48 in Appendix 2, the curve could not be plotted for t < 0:1 s due to numerical instabilities). Dots represent the SNR for experimentally observed learning windows. The learning windows were taken from 'B&P', Bi and Poo, 2001: their Table 4, 'All to All', 'minimal model', and 'B', Bittner et al., 2017: their Figure 3D. For 'B&P', 'F', and 'B', the position of the dots on the horizontal axis was estimated as the average time constants for positive and negative lobes of the learning windows. Wittenberg and Wang modeled their learning rule by a difference of Gaussians -we approximated the corresponding time constant as 30 ms. For the triplet rule by Pfister and Gerstner, we used the average of three time constants: the two pairwise-interaction time constants (as in Bi and Poo) and the triplet-potentiation time constant. Parameters for all plots: T ij ¼ 0:3 s, ! ¼ 2 p Á 10 Hz, s ¼ 0:3 s, c ¼ 0:042, A ¼ 10, ¼ 1. Colored/gray curves and dots are obtained from stochastic simulations; see Materials and methods for details. and the cross-correlation function grows, and this overlap begins to saturate as soon as the learning window is wider than T ij , that is, the value at which the cross-correlation assumes its maximum (cmp. dashed curve in Figure 2C). To analytically calculate the saturation value of Dw ij for large learningwindow widths (t ) s), we can approximate the learning window as a step function (see Materials and methods for details) and find the maximum that provides an upper bound to the weight change for overlapping firing fields (solid line in Figure 4A). For t < 1=! (and actually well beyond this region), the analytical small-tau approximation of Dw ij (Equation 5, dashed curves in Figure 4A) matches the numerical results well. The results in Figure 4A confirm that Dw ij is increased by phase precession for narrow learning windows but is independent of phase precession for t ) 1=!. Thus, the benefit B becomes small for large t ( Figure 4B) because, for large enough t , the theta oscillation completes multiple cycles within the width of the learning window. To better understand this behavior, let us return to Equation 1: if the product of a wide learning window and the cross-correlation C ij is integrated to obtain the weight change, the oscillatory modulation of the cross-correlation (e.g. as in Figure 2C) becomes irrelevant; similarly, according to Equation 3, the particular value of the derivative C 0 ij ðtÞ near t ¼ 0 can be neglected. Consequently, for t ) 1=! phase precession and phase locking as well as the scenario of firing fields that are not theta modulated yield the same weight change ( Figure 4A), and the benefit approaches 0 ( Figure 4B). Wide learning windows thus ignore the temporal (theta) finestructure of the cross-correlation.
How noisy is this learning signal Dw ij across trials? Figure 4C shows that for odd learning windows the SNR increases with increasing t and, for t ) 1 ! , approaches a constant value. This constant value is the same for phase precession, phase locking, or no theta oscillations at all. Taken together, for large enough t , the advantage of phase precession vanishes. For small enough t , phase precession increases the SNR, which confirms and generalizes the results in Figure 3C. Remarkably, the SNR for 'phase locking' is lower than the one for 'no theta', which means that theta oscillations without phase precession degrade temporal-order learning, even though theta oscillations as such were emphasized to improve the modification of synaptic strength in many other cases (e.g. Buzsáki, 2002;D'Albis et al., 2015). Figure 4C predicts that a large t yields the biggest SNR, and thus wide learning windows are the best choice for temporal-order learning; however, we note that this conclusion is restricted to odd (i.e. asymmetric) learning windows. An additional even (i.e. symmetric) component of a learning window would increase the noise without affecting the signal, and thus would decrease the SNR (dots in Figure 4C). It is remarkable that the only experimentally observed instance of a wide window (with t » 1 s in Bittner et al., 2017) has a strong symmetric component, which leads to a low SNR (dot marked 'B' in Figure 4C).
Taken together, we predict that temporal-order learning would strongly benefit from wide, asymmetric windows. However, to date, all experimentally observed (predominantly) asymmetric windows are narrow (e.g. Bi and Poo, 2001;Froemke et al., 2005;Wittenberg and Wang, 2006; see Abbott and Nelson, 2000;Bi and Poo, 2001 for reviews).

Temporal-order learning for wide learning windows (K ) s)
We finally restrict our analysis to wide learning windows, which allows us then to also consider nonoverlapping firing fields ( Figure 5A, we again use two Gaussians with widths s and separation T ij ). To allow for temporal-order learning in this case, the spikes of two non-overlapping fields can only be 'paired' by a wide enough learning window. As already indicated in Figure 4, phase precession does not affect the weight change for such wide learning windows where the width t of the learning window obeys t ) 1=! (note that we always assumed many theta oscillation cycles within a firing field, that is, 1=! ) s). Furthermore, Figure 4 indicated that only the asymmetric part of the learning window contributes to temporal-order learning. For the analysis of temporal-order learning with non-overlapping firing fields and wide learning windows, we thus ignore any theta modulation and phase precession and evaluate, again, only the odd STDP window WðtÞ ¼ signðtÞ expðÀjtj=t Þ. In this case, the weight change (Equation 1) is still determined by the cross-correlation function and the learning window (examples in Figure 5B,C). The resulting weight change Dw ij as a function of the temporal separation T ij of firing fields is shown in Figure 5D: with increasing T ij , the weight Dw ij quickly increases, reaches a maximum, and slowly decreases. The initial increase is due to the increasing overlap of the Gaussian bump in C ij with the positive lobe of the learning window. The decrease, on the other hand, is dictated by the time course of the learning window. For t ) s, these two effects can be approximated by in which the error function describes the overlap of the cross-correlation with the learning window and the exponential term describes the decay of the learning window (dashed black curve in Figure 5D, see also Equation 25 in Materials and methods for details).
How does the SNR of the weight change depend on the separation T ij of firing fields? For T ij ¼ 0, the signal is zero and thus also the SNR. As T ij increases, both signal and noise increase, but quickly settle on a constant ratio. The value of the SNR height of this plateau can be approximated by (dashed line in Figure 5E), where A is the number of spikes within a firing field (Equation 11). For A ¼ 10, we find SNR » 2:2, allowing for temporal-order learning with a single synapse. We note that this conclusion is limited to asymmetric STDP windows. A symmetric component (like in Bittner et al., 2017) decreases the SNR and makes temporal-order learning less efficient.
Taken together, temporal-order learning can be performed with wide STDP windows, and phase precession does not provide any benefit; but temporal-order learning requires a purely asymmetric plasticity window. For non-overlapping firing fields, wide learning windows are essential to bridge a temporal gap between the fields.

Discussion
In this report, we show that phase precession facilitates the learning of the temporal order of behavioral sequences for asymmetric learning windows that are shorter than a theta cycle. To quantify this improvement, we use additive, pairwise STDP and calculate the expected weight change for synapses between two activated cells in a sequence. We confirm the long-held hypothesis  that phase precession bridges the vastly different time scales of the slow sequence of behavioral events and the fast STDP rule. Synaptic weight changes can be an order of magnitude higher when phase precession organizes the spiking of multiple cells at the theta time scale as compared to phase-locking cells.

Other mechanisms and models for sequence learning
As an alternative mechanism to bridge the time scales of behavioral events and the induction of synaptic plasticity, Drew and Abbott, 2006 suggested STDP and persistent activity of neurons that code for such events. The authors assume regularly firing neurons that slowly decrease their firing rate after the event and show that this leads to a temporal compression of the sequence of behavioral events. For stochastically firing neurons, this approach is similar to ours with two overlapping, unmodulated Gaussian firing fields. In this case, sequence learning is possible, but the efficiency can be improved considerably by phase precession.
Sato and Yamaguchi, 2003 as well as Shen et al., 2007 investigated the memory storage of behavioral sequences using phase precession and STDP in a network model. In computer simulations, they find that phase precession facilitates sequence learning, which is in line with our results. In contrast to these approaches, our study focuses on a minimal network (two cells), but this simplification allows us to (i) consider a biologically plausible implementation of STDP, firing fields, and phase precession and (ii) derive analytical results. These mathematical results predict parameter dependencies, which is difficult to achieve with only computer simulations.
Related to our work is also the approach by Masquelier and colleagues Masquelier et al., 2009 who showed that pattern detection can be performed by single neurons using STDP and phase coding, yet they did not include phase precession. They consider patterns in the input whereas, in our framework, it might be argued that patterns between input and output are detected instead.

Noisy activity of neurons and prediction of the minimum number of synapses for temporal-order learning
To account for stochastic spiking, we use Poisson neurons. We find that a single synapse is not sufficient to reliably encode a minimal two-neuron sequence in a single trial because the fluctuations of the weight change are too large. Fortunately, the SNR scales with ffiffiffiffiffiffiffi MA p , that is, the square root of the number M of identical, but independent synapses and the number A of spikes per field traversal of the neurons. For generic hippocampal place fields and typical STDP, we predict that about 14 synapses are sufficient to reliably encode temporal order in a single traversal. Interestingly, peak firing rates of place fields are remarkably high (up to 50 spikes/s; e.g. O' Keefe andRecce, 1993, Huxter et al., 2003). Taken together, in hippocampal networks, reliable encoding of the temporal order of a sequence is possible with a low number of synapses, which matches simulation results on memory replay (Chenkov et al., 2017).
Width, shape, and symmetry of the STDP window are critical for temporal-order learning Various widths have been observed for STDP learning windows (Abbott and Nelson, 2000;Bi and Poo, 2001). We show that for all experimentally found STDP time constants phase precession can improve temporal-order learning. However, for learning windows much wider than a theta oscillation cycle, the benefit of phase precession for temporal-order learning is small. Wide learning windows, where the width can be even on a behavioral time scale of » 1 s (Bittner et al., 2017) or larger, could, on the other hand, enable the association of non-overlapping firing fields. Alternatively, nonoverlapping firing fields might also be associated by narrow learning windows if additional cells (with firing fields that fill the temporal gap) help to bridge a large temporal difference, much like 'time cells' in the hippocampal formation (reviewed in Eichenbaum, 2014).
STDP windows typically have symmetric and asymmetric components (Abbott and Nelson, 2000;Mishra et al., 2016). We find that only the asymmetric component supports the learning of temporal order. In contrast, the symmetric component strengthens both forward and backward synapses by the same amount and thus contributes to the association of behavioral events independent of their temporal order. For example, the learning window reported by Bittner et al., 2017 shows only a mild asymmetry and is thus unfavorable to store the temporal order of behavioral events. Only long, predominantly asymmetric STDP windows would allow for effective temporal-order learning ( Figure 4).
Generally, the shape of STDP windows is subject to neuromodulation; for example, cholinergic and adrenergic modulation can alter its polarity and symmetry (Hasselmo, 1999). Also dopamine can change the symmetry of the learning window (Zhang et al., 2009). Therefore, sequence learning could be modulated by the behavioral state (attention, reward, etc.) of the animal.
Key features of phase precession for temporal order-learning: generalization to non-periodic modulation of activity For STDP windows narrower ( < 10 ms) than a theta cycle ( > 100 ms), we argue that the slope of the cross-correlation function at zero offset controls the change of the weight of the synapse connecting two neurons; and we show that phase precession can substantially increase this slope. This result predicts that features of the cross-correlation at temporal offsets that are larger than the width of the learning window are irrelevant for temporal-order learning. It is thus conceivable to boost temporal-order learning even without phase precession, which is weak if theta oscillations are weak, as for example in bats (Ulanovsky and Moss, 2007) and humans (Herweg and Kahana, 2018;Qasim et al., 2021). In this case, temporal-order learning may instead benefit from two other phenomena that could create an appropriate shape of the cross-correlation: (i) Spiking of cells is locked to common (aperiodic) fluctuations of excitability. (ii) Each cell responds the faster to an increase in its excitability the longer ago its firing field has been entered, which may be mediated by a progressive facilitation mechanism. Together, these phenomena can make the cross-correlation exhibit a steeper slope around zero and could even give rise to a local maximum at a positive offset. This temporal fine structure is superimposed on a slower modulation, which is related to the widths of the firing fields. In summary, a progressively decreasing delay of spiking with respect to nonrhythmic fluctuations in excitation generalizes the notion of phase precession. Interestingly, synaptic short-term facilitation, which could generate the described fine structure of the cross-correlation, has also been proposed as mechanism underlying phase precession .

Model assumptions
In our model, we assumed that recurrent synapses (e.g. between neurons representing a sequence) are plastic but weak during encoding, such that they have a negligible influence on the postsynaptic firing rate; and that the feedforward input dominates neuronal activity. These assumptions seem justified as Hasselmo, 1999 indicated that excitatory feedback connections may be suppressed during encoding to avoid interference from previously stored information (see also Haam et al., 2018). Furthermore, neuromodulators facilitate long-term plasticity (reviewed, e.g. by Rebola et al., 2017), which also supports our assumptions.
The assumption of weak recurrent connections implies that these connections do not affect the dynamics. Consequently (and in contrast to Tsodyks et al., 1996), we thus hypothesize that phase precession is not generated by the local, recurrent network (see also, e.g. Chadwick et al., 2016); instead, we assume that phase precession is inherited from upstream feedforward inputs (Chance, 2012;Jaramillo et al., 2014) or generated locally by a cellular/synaptic mechanism (Magee, 2001;Harris et al., 2002;Mehta et al., 2002;Thurley et al., 2008). After temporal-order learning was successful, the resulting asymmetric connections could indeed also generate phase precession (as demonstrated by the simulations in Tsodyks et al., 1996), and this phase precession could then even be similar to the one that has initially helped to shape synaptic connections. Finally, inherited or local cellularly/synaptically generated phase precession and locally network-generated phase precession could interact (as reviewed, for example in Jaramillo and Kempter, 2017).
We assumed in our model that the widths of the two firing fields that represent two events in a sequence are identical (see, e.g. Figure 2A). But firing fields may have different widths, and in this case a slope-size matched phase precession would fail to reproduce the timing of spikes required for the learning of the correct temporal order of the two events. For example, the learned temporal order of events (timed according to field entry) would even be reversed if two fields with different sizes are aligned at their ends. How could the correct temporal order nevertheless be learned in our framework? In the hippocampus, theta oscillations are a traveling wave (Lubenov and Siapas, 2009;Patel et al., 2012) such that there is a positive phase offset of theta oscillations for the wider firing fields in the more ventral parts of the hippocampus. This traveling-wave phenomenon could preserve the temporal order in the phase-precession-induced compressed spike timing, as also pointed out earlier (Leibold and Monsalve-Mercado, 2017;Muller et al., 2018).
Our results on learning rules for sequence learning rely on pairwise STDP in which pairs of presynaptic and postsynaptic spikes are considered. Conversely, triplet STDP considers also motifs of three spikes (either 2 presynaptic -1 postsynaptic or 2 postsynaptic -1 presynaptic) (Pfister and Gerstner, 2006). Triplets STDP models can reproduce a number of experimental findings that pairwise STDP could not, for example the dependence on the repetition frequency of spike pairs (Sjö strö m et al., 2001). To investigate the influence of triplet interactions on sequence learning, we implemented the generic triplet rule by Pfister and Gerstner, 2006. We used their 'minimal' model, which was regarded as the best model in terms of number of free parameters and fitting error; for the parameters they obtained from fitting the triplet STDP model to hippocampal data, we found only mild differences to our results (see, e.g. Figure 4C). Differences are small because the fitted time constant of the triplet term (40 ms) is smaller than typical inter-spike intervals ( > 50 ms, minimum in field centers) in our simulations.

Replay of sequences and storage of multiple and overlapping sequences
A sequence imprinted in recurrent synaptic weights can be replayed during rest or sleep (Wilson and McNaughton, 1994;Nádasdy et al., 1999;Diba and Buzsáki, 2007;Peyrache et al., 2009;Davidson et al., 2009), which was also observed in network-simulation studies (Matheus Gauy et al., 2020;Malerba and Bazhenov, 2019;Gillett et al., 2020). Replay could thus be a possible readout of the temporal-order learning mechanism. However, replay depends on the many parameters of the network, and a thorough investigation of is beyond the scope of this manuscript. Therefore, we focus on synaptic weight changes that represent the formation of sequences in the network, which underlies replay, and we do not simulate replay.
We have considered the minimal example of a sequence of two neurons. Sequences can contain many more neurons, and the question arises how two different sequences can be told apart if they both contain a certain neuron, but proceed in different directions -as they might do for sequences of spatial or non-spatial events (Wood et al., 2000). In this case, it may be beneficial to not only strengthen synapses that connect direct successors in the sequence but also synapses that connect the second-to-next neuron. In this way, the two crossing sequences could be disambiguated, and the wider context in which an event is embedded becomes associated, which is in line with retrieved-context theories of serial-order memory (Long and Kahana, 2019). More generally, it is an interesting question of how many sequences can be stored in a network of a given size. Gillett et al., 2020 were able to analytically calculate the storage capacity for the storage of sequences in a Hebbian network.
In conclusion, our model predicts that phase precession enables efficient and robust temporalorder learning. To test this hypothesis, we suggest experiments that modulate the shape of the STDP window or selectively manipulate phase precession and evaluate memory of temporal order.

Experimental design: model description
We model the time-dependent firing rate of a phase precessing cell i (two examples in Figure 2A) as where the scaling factor A determines the number of spikes per field traversal and G i;s ðtÞ ¼ 1=ð ffiffiffiffiffiffi 2p p sÞ Á exp½Àðt À i Þ 2 =ð2s 2 Þ is a Gaussian function that describes a firing field with center at i and width s. The firing field is sinusoidally modulated with theta frequency ! (but the sinusoidal modulation is not a critical assumption, see Discussion), with typically many oscillation cycles in a firing field (!s ) 1). The compression factor c can be used to vary between phase precession (c>0), phase locking (c ¼ 0), and phase recession (c<0) because the average population activity of many such cells oscillates at frequency of ð1 À cÞ! (Geisler et al., 2010;D'Albis et al., 2015), which provides a reference frame to assign theta phases ( Figure 2A). Usually, jcj ( 1 with typical values c < 1=ðs!Þ (Geisler et al., 2010); for a pair of cells with overlapping firing fields (centers separated by T ij :¼ j À i ) the phase delay is !cT ij ( Figure 2B).
To quantify temporal-order learning, we consider the average weight change Dw ij of the synapse from cell i to cell j, which is (Kempter et al., 1999) where C ij ðtÞ is the cross-correlation between the firing rates f i and f j of cells i and j, respectively ( Figure 2C,D): WðtÞ denotes the synaptic learning window, for example the asymmetric window where t is the time constant and >0 is the learning rate ( Figure 2E). For the following calculations, we make two assumptions that are reasonable in the hippocampal formation (O'Keefe and Recce, 1993;Bi and Poo, 2001;Geisler et al., 2010) : 1. The theta oscillation has multiple cycles within the Gaussian envelope of the firing field in Equation 11 (1=! ( s).
2. The window W is short compared to the theta period (t ( 1=!).

Analytical approximation of the cross-correlation function
To explicitly calculate the cross-correlation C ij ðtÞ as defined in Equation 13, we plug in the firingrate functions (Equation 11) for the two neurons: The first term (out of four) describes the cross-correlation of two Gaussians, which results in a Gaussian function centered at T ij and with width s ffiffi ffi 2 p . For the second term, we note that the product of two Gaussians yields a function proportional to a Gaussian with width s= ffiffi ffi 2 p , and then use assumption (i). When integrated, the second term's contribution to C ij ðtÞ is negligible because the cosine function oscillates multiple times within the Gaussian bump, that is, positive and negative contributions to the integral approximately cancel. The same argument applies to the third term. For the fourth term, we use the trigonometric property cosðaÞ Á cosðbÞ ¼ 1 2 cosða þ bÞ þ cosða À bÞ ð Þ.
Again, we use assumption (i) and neglect the first addend on the right-hand side. Notably, the cosine function in the second addend is independent of the integration variable t 0 . Taken together, we find Thus, the cross-correlation can be approximated by a Gaussian function (center at T ij , width s ffiffi ffi 2 p ) that is theta modulated with an amplitude scaled by the factor 1 2 . To further simplify Equation 15, we note that the time constant t of the STDP window is usually small compared to the theta period (assumption (ii), Figure 2C,D,E). Structures in C ij ðtÞ for jtj ) t thus have a negligible effect on the synaptic weight change. Therefore, we can focus on the crosscorrelation for small temporal lags. In this range, we approximate the (slow) Gaussian modulation of C ij ðtÞ ( Figure 2C,D, dashed red line) by a linear function, that is, T ij 2s 2 Á t þ 1 : Inserting this result in Equation 15, we approximate the cross-correlation function C ij ðtÞ for jtj < t as ( Figure 2D, dashed black line) In the Results, we show that the slope of the cross-correlation function at t ¼ 0 is important for temporal-order learning. From Equation 17 we find which has three addends within the square brackets. Let us estimate the relative size of the second and third terms with respect to the first one. The third term is at most of the order of 0.5 because j cosð!cT ij Þj 1. For the second addend, we note that sinð!cT ij Þ=ð!cT ij Þ approaches 1 for T ij ! 0 and remains in this range for j!cT ij j < p=4. This condition is fulfilled for jT ij j < s if we assume slope-size matching of phase precession (Geisler et al., 2010), that is, !cs » p 4 » 0:79. Then, the size of the second addend is dictated by the factor !s, which is large according to assumption (i). In other words, for typical phase precession and jT ij j < s, the second addend is much larger than the other two.
To further understand the structure of C 0 ij ð0Þ, which is also shaped by the prefactors in front of the square brackets, we first note that C 0 ij ð0Þ is zero for fully overlapping firing fields (T ij À ! 0). On the other hand, for very large field separations (T ij ) s), the Gaussian term G causes C 0 ij ð0Þ to become zero. The prefactors have a maximum at jT ij j ¼ ffiffi ffi 2 p s. The maximum's exact location is slightly shifted by the second addend but remains near ffiffi ffi 2 p s. This peak will be important because it is inherited by the average weight change (Equation 3).

Average weight change
Having approximated the cross-correlation function and its slope at zero (Equations 17,18), we are now ready to calculate the average synaptic weight change (Equation 3) for the assumed STDP window (Equation 14). Standard integration methods yield Because Dw ij is a temporal average of C 0 ij ðtÞ for small t (see interpretation of Equation 3), the weight change's structure resembles the previously discussed structure of C 0 ij ð0Þ. The averaging introduces additional factors proportional to 1 AE ! 2 t 2 , but for !t ( 1 [assumption (ii)] those have only minor effects on the relative size of the three addends. The second term still dominates. Importantly, Dw ij ¼ 0 for T ij ¼ 0 and the position of the peak at T ij < ffiffi ffi 2 p s is inherited from C 0 ij ð0Þ ( Figure 3A).

The benefit of phase precession
To quantify the benefit B of phase precession, we consider the expression Dw ij =Dw ij ðc ¼ 0Þ À 1, because Dw ij describes the overall weight change (including phase precession), and Dw ij ðc ¼ 0Þ serves as the baseline weight change due to the temporal separation of the firing fields (without phase precession). We subtract 1 to obtain B ¼ 0 when the weight changes are the same with and without phase precession. From Equation 19 we find To better understand the structure of B, we Taylor-expand it in T ij up to the third order and assume ! 4 Thus, B assumes a maximum for T ij ¼ 0 and slowly decays for small T ij ( Figure 3B). Using slopesize matching (!sc ¼ p=4), the maximal benefit is where L ¼ 4s depicts the total field size and T ¼ 2p ! is the period of the theta oscillation. Thus, the number of theta cycles per firing field determines the benefit for small separations of the firing fields.

Average weight change for wide learning windows
In this paragraph we relax assumption (ii), that is, we consider wide asymmetric learning windows W (Equation 14 with t ) s). Furthermore, we neglect any theta-oscillatory modulation of the firing fields in Equation 11 and, thus, C ij in Equation 15. First, for non-overlapping fields (T ij ) s), the learning window can be approximated to be constant near the peak of the Gaussian bump of C ij . We can thus rewrite Equation 1 as Second, for overlapping fields (0<T ij < s), the Gaussian bump of C ij partly lies on the negative lobe of W. We can approximate WðtÞ ¼ signðtÞ, and the average weight change in Equation 1 then reads Combining the two limiting cases in Equations 23 and 24 yields

Signal-to-noise ratio
To correctly encode the temporal order of behavioral events, the average weight change Dw ij of a forward synapse needs to be larger than the average weight change Dw ji of the corresponding backward synapse. We thus define the signal-to-noise ratio as where std() denotes the standard deviation and Dw k ij , Dw k ji are the weight changes for trial k 2 ½1; N, the averages across trials being Dw ij ¼ hDw k ij i k and Dw ji ¼ hDw k ji i k . This expression for the SNR 'punishes' the non-sequence-specific strengthening of backward synapses. Specifically, SNR ¼ 0 for a symmetric (even) learning window, because the numerator (which represents the 'signal') is zero. On the other hand, a perfectly asymmetric learning window, like the one used throughout this study (Equation 14), yields SNR ¼ Dwij stdðDw k ij Þ , because Dw k ij ¼ ÀDw k ji . Asymmetric learning windows thus recover the classical definition of the SNR as the ratio between the average weight change and the standard deviation of the weight change.
We note that the generalized definition above can be used to calculate the SNR for arbitrary windows, such as the learning window from Bittner et al., 2017, Figure 4C.
Assuming an asymmetric window and M uncorrelated synapses with the same mean and variance of the weight change, we can write the signal-to-noise ratio as because the variance of the sum can be decomposed into the sum of variances and covariances. All covariances are zero because synapses are uncorrelated. This leaves a sum of M variances, which are identical. Therefore, the standard deviation, and consequently also the SNR, scale with ffiffiffiffi ffi M p .

Numerical simulations
To numerically simulate the synaptic weight change, spikes were generated by inhomogeneous Poisson processes with rate functions according to Equation 11. For every spike pair, the contribution to the weight change was calculated according to Equation 14. We repeated the simulations for N ¼ 10 4 trials, and the mean weight change as well as the standard deviation across trials and the SNR were estimated. All simulations were implemented in Python 3.8 using the packages NumPy (RRID:SCR_008633) and SciPy (RRID:SCR_008058 The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. The signal-to-noise ratio of Dw ij A synapse with weight w ij is assumed to connect neuron i to neuron j. Here, we aim to derive the signal-to-noise ratio (SNR) of the weight changes Dw ij , which is defined as (Materials and methods)

Author contributions
where hDw ij i is the average signal. The noise is described by the standard deviation of the weight change, Signal and noise are generated by additive STDP and spiking activity that is modeled by two inhomogeneous Poisson processes with rates f i ðtÞ and f j ðtÞ that have finite support. The average weight change is calculated by hDw ij i ¼ R dt WðtÞC ij ðtÞ where WðtÞ is the synaptic learning window and C ij ðtÞ depicts the cross-correlation function C ij ðtÞ ¼ R dt 0 f i ðt 0 Þf j ðt 0 þ tÞ. From Kempter et al., 1999, we use where S i ðtÞ ¼ P n dðt À t ðnÞ i Þ and S j ðtÞ ¼ P n dðt À t ðnÞ j Þ are the presynaptic and postsynaptic spike trains, respectively. To simplify, we set t 0 ¼ À¥ and Dw ij ðt 0 Þ ¼ 0. Furthermore, we are interested in paired STDP and thus set w in ¼ w out ¼ 0. Because both spike trains are drawn from different Poisson processes, S i and S j are statistically independent, and therefore we can simplify hS i ðt 0 þ sÞS i ðu þ vÞS j ðt 0 ÞS j ðuÞi ¼ hS i ðt 0 þ sÞS i ðu þ vÞihS j ðt 0 ÞS j ðuÞi: Moreover, in a spike train the spikes at different times are uncorrelated, hS i ðt 0 þ sÞS i ðu þ vÞi ¼ hS i ðt 0 þ sÞihS i ðu þ vÞi þ hS i ðt 0 þ sÞidðt 0 þ s À u À vÞ and hS j ðt 0 ÞS j ðuÞi ¼ hS j ðt 0 ÞihS j ðuÞi þ hS j ðt 0 Þidðt 0 À uÞ: As S i and S j are realizations of inhomogeneous Poisson processes with rates f i ðtÞ and f j ðtÞ, respectively, we find hS i ðt 0 þ sÞS i ðu þ vÞi ¼ f i ðt 0 þ sÞ f i ðu þ vÞ þ f i ðt 0 þ sÞdðt 0 þ s À u À vÞ and hS j ðt 0 ÞS j ðuÞi ¼ f j ðt 0 Þ f j ðuÞ þ f j ðt 0 Þdðt 0 À uÞ: We insert these expressions into Equation A1-3: dv WðsÞWðvÞFðt 0 ; s; u; vÞ (A1-4) where Fðt 0 ; s; u; vÞ ¼ f i ðt 0 þ sÞ f i ðu þ vÞ þ f i ðt 0 þ sÞdðt 0 þ s À u À vÞ ½ Á f j ðt 0 Þ f j ðuÞ þ f j ðt 0 Þdðt 0 À uÞ Â Ã : To explicitly calculate the SNR, we parameterize the firing rates as In what follows we consider a limiting case of wide learning windows, for which we can explicitly calculate the SNR. The results obtained in this case match well to the numerical simulations for wide learning windows (Figures 4 and 5 in the main text).

Wide learning windows
For wide windows (formally: t ! ¥, k ! ¥), we can approximate W even ¼ l and W odd ðtÞ ¼ sgnðtÞ and neglect the sinusoidal modulations of f i and f j in Equation A1-5 and A1-6; phase precession does not affect the SNR in this case.
The following calculations are similar for odd and even windows. We elaborate the calculations in detail for odd windows and use '±' and 'Ç' to include the similar calculations for even windows. The top symbol ('+' and '-', respectively) corresponds to odd windows; the bottom symbol corresponds to even windows.
To start, we split the third and fourth integral in Equation A1-4 into positive and negative time lags s and v, respectively: Summing contributions (1.ii) to (4.ii) for the odd window yields: We continue with contribution (1.iii): Contribution (1.iii) is non-zero if the argument t 0 þ s À u À v of the delta function in the last integral (across v) is zero for some v, which varies from 0 to ¥. The argument of the delta function is thus zero for some v if 0 t 0 þ s À u<¥, which we can rewrite as u t 0 þ s and then use it in the integral across u, which leads to variance of the weight change. To do so, we collect all terms of hDw 2 ij i for even windows (indicated by the bottom symbol of all occurences of 'AE' and 'Ç' in the previous section). Again, we assume wide windows (k ! ¥).