Offline orbitofrontal cortex reactivation depends on recency of place-reward changes and coheres with hippocampal replay

Summary The orbitofrontal cortex, one of the key neocortical areas in valuation and emotion, is critical for cognitive flexibility but its role in the consolidation of recently acquired information remains unclear. Here, we demonstrate orbitofrontal offline replay in the context of a place-reward association task on a maze with varying goal locations. When switches in place-reward coupling were applied, replay was enhanced relative to sessions with stable contingencies. Moreover, replay strength was positively correlated with the subsequent overnight change in behavioral performance. Interrogating relationships between orbitofrontal and hippocampal activity, we found that orbitofrontal and hippocampal replay could occur independently but became coordinated during a type of cortical state with strong spiking activity. These findings reveal a structured form of offline orbitofrontal ensemble activity that is correlated with cognitive flexibility required to adapt to changing task contingencies, and becomes associated with hippocampal replay only during a specific state of high cortical excitability.


INTRODUCTION
The orbitofrontal cortex (OFC) has been implicated in a range of cognitive functions.2][3][4][5] Alternatively, OFC functioning has been cast as signaling the economic value of choice options, [6][7][8] signaling reward expectancy, [9][10][11] or supporting credit assignment during multi-option tasks by encoding action-outcome associations. 12,13everal electrophysiological studies have previously attempted to identify the neuronal correlates of these functions.For instance, firing activity and rhythmic phase-locking of single OFC units was shown to correlate with the outcome value of specific sensory cues 9,11,14,15 and to be sensitive to reversal learning. 11,16,17owever, previous observations pointing to causal roles of the OFC in outcome devaluation [18][19][20] and sensory preconditioning 21 have been difficult to explain using only the hypotheses described previously. 22To account for its functional versatility, the OFC has been proposed to support the formation of a cognitive map 13,23 and construct an internal model of the subject's ''task space.'' 24A task space is an abstract mapping of non-observable, causal relationships between the elements involved in a task (e.g., sensory cues, actions, outcomes, internal drives), which are relevant for achieving the subject's goals. 22,23Task space representations may be acquired by model-based reinforcement learning, which enables the embedding of specific stimulus-outcome and action-outcome relationships in a task-relevant state space 25,26 in addition to less specific, cached values assigned to cues and actions.Task space representations may subserve goal-directed action and prospective cognition, and may be coded jointly in other structures such as medial prefrontal cortex (mPFC).Whereas the mPFC may be specialized in coding action-outcome relationships and corresponding task rules, [27][28][29] the OFC may engage more broadly in coding relationships between cues, spatial context, and outcomes. 4,21,22,30,31A major outstanding question is whether the OFC plays an active role in storing and consolidating task space information, either within its internal synaptic matrices or by driving changes in synaptic networks of connected areas (e.g., basolateral amygdala 32 ).Alternatively, the OFC may import information acquired and stored in other areas to process it ''on the fly,'' thereby controlling how downstream target areas deploy associatively learned information to steer behavior. 13,21

Task acquisition
We measured the development of reward site preferences over the course of flexible place-reward association task (fPRAT, Figures 1A and  1B) sessions using a relative preference index (RPI).The RPI was calculated as the relative difference between the number of valid trials at rewarded and unrewarded ports in each lap around the maze, with valid trials represented by reward port visits where rats maintained a poke for a pre-set time interval (see STAR methods).We performed a block-based analysis where each training block spanned three consecutive sessions (1 session/day) in which reward positions were kept constant relative to distant geometrical cues.To measure acquisition performance within blocks, RPIs were averaged for each session across all blocks (Figure 1D).Across the three sessions of a block, rats displayed exponential place-preference curves with roughly similar asymptotic levels, but slightly different preferences at the start of the session (Figures 1D and S1).After the first two laps, we observed significant increases in session-to-session performance.These are reflected in differences between preference indices calculated for laps 1-11, with the highest and lowest preference values in session 3 (S3) and 1 (S1), respectively (difference S3-S1: 0.195 G 0.017, S2-S1: 0.066 G 0.021, S3-S2: 0.129 G 0.012, S3-S1 vs. S2-S1 p < 0.01, S3-S1 vs. S3-S2 and S2-S1 vs. S3-S2 p < 0.05, Bonferroni-corrected Wilcoxon rank-sum test, WRST, Figure 1E).Thus, animals displayed progressive acquisition of place-reward associations across the three sessions of a block.Furthermore, differences between S3 and S1 calculated across the subset of relevant blocks did not increase as testing progressed, indicating that the task structure was reliably acquired during the pre-training phase.

Orbitofrontal reactivation during rest episodes
To examine electrophysiological mechanisms supporting consolidation of place-reward associations in the OFC, we asked whether correlation patterns newly established between single units during the task are reinstated during subsequent offline states.Spike correlations for all pairs of OFC putative pyramidal neurons, as determined by their slower repolarization and lower firing rate relative to putative fast-spiking interneurons (Figures 2A and 2B), were calculated across pre-task rest (Rest1), task, and post-task rest (Rest2) episodes (Figure 2C; for pooled OFC-CA1 data, see Figure S2).Reactivation was estimated using explained variance (EV) analysis, 33 taking the temporally mirrored measure, the reversed explained variance (REV), as a control. 51As demonstrated by a larger median EV (11.1) relative to the REV (5.3, Wilcoxon matched-pairs signed-rank test -WMPSRT: p = 0.016, 36 sessions from 3 rats, Figure 2D), functional relationships expressed during Task, but not during Rest1, were reactivated during Rest2.This difference was maintained when spikes recorded during hippocampal sharpwave ripples (SWRs) were eliminated from the dataset (EV = 11.5 vs. REV = 5.5, WMPSRT: p = 0.021).These results indicate that reactivation of task-related firing patterns is present in OFC, even in the absence of ripples.

Spatiotemporal correlates of memory trace reactivation in orbitofrontal cortex
Previous studies have shown that task variables (e.g., rewards, cues, locations) and their spatiotemporal succession constrain neural activity patterns during post-task consolidation and replay.For instance, hippocampal, ventral striatal, and visual cortex ensembles fire in specific sequences which are largely conserved during rest periods in between sessions. 40,43,52,53Similarly, fPRAT has a sequential structure where clockwise visits can be reinforced at six spatially distinct ports with variable reward contingencies.The clockwise running direction enabled us to investigate whether temporal associations between neuronal pairs, expressed during the task, are reactivated during post-task rest.Thus, we computed a temporal bias (T B ) score (Figure 3A), based on cross-correlograms for all cell pairs recorded in each session. 54T B scores, calculated as the relative differences between cross-correlogram integrals in a 400 ms time window on either side of t = 0 s, were pooled across all sessions.Consistent with hippocampal and striatal recordings, 40,[54][55][56][57] a Pearson's correlation analysis of T B scores computed for Rest2 vs. Task yielded higher mean correlations than T B scores for Rest1 vs. Task (Rest1 vs. Task, r Rest1 (d.f.: 2581) = 0.088, p < 0.001; Rest2 vs. Task: r Rest2 (d.f.: 2581) = 0.143, p < 0.001).A bootstrap analysis that resampled T B scores across task episodes revealed that these T B correlation differences were statistically significant (p = 0.02).Thus, the temporal structure of pairwise relationships in OFC during task performance is significantly reinstated during post-task as compared to pretask rest.
A reinstatement of OFC temporal structure during post-task rest may be identifiable only in a subset of the task-engaged ensembles 58 or in ''engram neurons.'' 59,60To investigate whether the sequential encoding of task events shapes neuronal activity during post-task rest generally or  by way of subsets, we restricted analyses to sessions with a positive reactivation strength (i.e., (EV-REV) > 0) and constructed a distribution of cellpair contributions to reactivation using jackknifing. 56These contributions were quantified by estimating EV and REV with and without the pair under scrutiny and taking the difference in (EV-REV) as the pair's contribution to reactivation over an entire session.Next, we compared Not only the temporal order of fPRAT events but also representations of place-reward configurations on the steering wheel maze may be actively re-expressed during offline replay.To investigate this, we computed firing-rate maps for all putative OFC pyramidal neurons (see examples in Figure 3F).Given results from previous studies, 30,61 it was surprising to find that the spatial selectivity and spatial information content of OFC cells (n = 424) were low (3.70 G 0.13 and 0.29 G 0.01 bits/spike, respectively), when compared to CA1 cells recorded in the same experiment (6.04 G 1.13 and 1.76 G 0.09 bits/spike, p < 0.001, n = 74).Nonetheless, when we plotted distributions of rate-map correlations for contributors vs. non-contributor pairs, we observed a rightward shift in the distribution of contributors relative to non-contributors (Figure 3G).Thus, OFC cell pairs that strongly contributed to reactivation showed higher rate-map overlap than non-contributors (mean rate map correlation -contributors: 0.71 vs. non-contributors: 0.62, WRST: p < 0.01, Figure 3H).(B) Significant difference in linear correlations between temporal bias scores for Rest2-Task (blue) and Rest1-Task (red) in pairs strongly contributing to reactivation (estimated using jackknifing, bootstrap statistics: p < 0.01).(C) This difference is absent in non-contributors.(D and E) Histograms of bootstrapped correlation differences (between Rest2-Task versus Rest1-Task) and significance levels for 2.5% and 97.5% (green lines), corresponding to contributors (D) and non-contributors (E).Purple lines indicate differences between correlation coefficients measured for the nonbootstrapped data in B and C. In D but not E, the purple line indicates a significantly stronger difference between correlation coefficients than expected from bootstrapping (p < 0.05).These findings were not paralleled by a difference in overall spiking activity between contributor and non-contributors, as shown by similar firing rates measured during offline and task periods (contributors vs. non-contributors: Rest1: 1.65 G 0.16 Hz vs. 1.47 G 0.17 Hz, p = 0.41 (d.f.175); Task: 1.98 G 0.20 vs. 1.64 G 1.18, p = 0.24, d.f.175; Rest2: 1.63 G 1.16 vs. 1.36 G 0.15, p = 0.21, d.f.175; Bonferroni-corrected WMPSRT).In addition, the quality of spike clusters representing contributor and non-contributor units did not show significant differences when isolation distance and L-ratio 62 were compared between the two categories (contributors vs. non-contributors: L-ratio: 0.061 G 0.011 vs. 0.076 G 0.010, p = 0.30; isolation distance: 35.8 G 2.4 vs. 30.9G 1.7, p = 0.14, WRST).Moreover, the two groups of units did not show a difference in burstiness (Figure S4).
Taken together, these differences in spatiotemporal relationships between replay contributor and non-contributor OFC cell pairs indicate that during post-task rest, a subset of OFC neurons replays spatiotemporal patterns characteristic of the sequential nature of the task.

Strength of orbitofrontal reactivation depends on the recency of changes in place-reward configuration
We next asked whether OFC reactivation strength depends on the repeated exposure of the rat to a particular place-reward configuration on the maze, and may thus be related to the updating of place-reward associations.Therefore, we examined reactivation separately for the three sessions making up a training block.To compare sessions sharing the same reward configuration, this analysis considered only blocks containing all three consecutive sessions (7 blocks containing 21 sessions).In this data subset, we again found reactivation of OFC task patterns during post-task rest: the EV exceeded the REV (median of all 21 sessions: EV = 11.2 vs. REV = 1.8, WMPSRT: p < 0.001, Figure 4A).Across the 3-session blocks, the largest EV and smallest REV, and thus the maximum reactivation strength, were found in the first session (Figure 4B left).A decrease was observed within blocks (mean differences: EV-REV S1 = 14.7 G 5.0, EV-REV S2 = 9.7 G 5.1, EV-REV S3 = 3.0 G 2.1, Friedman's test: p = 0.028, Figure 4B right) with a significant difference in reactivation strength between the first and third session (p = 0.047, Bonferronicorrected WMPSRT).Thus, the relative novelty of place-reward coupling in a familiar environment favors strong reactivation in the OFC network.

Orbitofrontal reactivation strength coheres with overnight change in behavioral performance
So far, these results indicate that task-related information, recently acquired during maze experience, is reactivated by OFC cell ensembles.The extent to which replay coheres with performance may be detectable in the correlations between the overnight changes in behavioral performance across consecutive maze runs and neural activity during interleaved rest episodes.To test this, we calculated the overnight performance difference (i.e., the difference in place preference index between pairs of consecutive sessions, for different temporal segments of the task period), and correlated this difference with the reactivation strength observed after the first fPRAT session of this session pair (see STAR methods, Figure 4C1).Thus, we obtained a 10 x 10 reactivation-behavioral correlation (RBC) matrix in which each element is the Pearson's correlation coefficient (r) for the reactivation strength (i.e., EV-REV) on Day d versus the RPI difference between Day d+1 and Day d, for a given subset of laps (Figure 4C).Overall, differences between RPI during the initial laps on Day d+1 and the later series of laps on Day d showed significant linear correlations with replay strength (see example in Figure 4C2, and 4C4).A bootstrap test with randomized lap assignment showed that a subset of correlation matrix elements, corresponding to laps 1-2 on Day d+1 and laps 4-11 on Day d (Figures 4C3,  and 4C4; purple rectangle), fell in the upper 5 th percentile of the bootstrapped distribution (Figure 4C3, r 95% = 0.63).In addition, we tested other parameter combinations, such as EV vs. the difference in performance described in Figure 4 and, to compare equal numbers of laps on consecutive days, we built similar RBC matrices by correlating EV or the EV-REV difference with the difference in performance between moving intervals of three laps on day D+1 -day D (data not shown).All analyses showed a correlation profile similar to Figure 4C4.This grouping indicates that the strength of OFC reactivation correlates with the performance difference between consecutive sessions, supporting the notion that OFC replay in between subsequent learning episodes contributes to changes in task performance.

State-dependent coordination of orbitofrontal replay with hippocampus
During spatial tasks, neural activity in the OFC is synchronized to CA1 activity, particularly in the theta band following learning and at decision points. 48Together with recent evidence showing that optogenetic inhibition of the subiculum alters single-cell coding of response direction and OFC representations of outcome-relevant blocks of trials, 63 these findings suggest that hippocampal-prefrontal interactions may be critically involved in adaptation to changing reward associations.Despite the presence of OFC replay in the absence of hippocampal SWRs, we thus asked whether dorsal area CA1 and OFC jointly reactivate after fPRAT performance.
As expected, ensemble activity in area CA1 was highly synchronized during SWR complexes (Figure 5B left).Reactivation in dorsal area CA1 prominently occurs during SWRs recorded in quiet wakefulness and non-REM sleep (QW-NREM) episodes, 33 and this was also the case following fPRAT (CA1: EV = 13.6 vs. REV = 0.9, WMPSRT: p < 0.01; n = 10 sessions, Figure 5B right, C).Even though the OFC is anatomically remote from CA1, we found that OFC firing activity was entrained to hippocampal SWRs on a timescale of seconds, both during pre-and post-task rest (Figures 5D and 5E; cf;). 57Peri-ripple time histograms (PRTHs) showed a significant decrease in mean OFC firing rate for several seconds preceding SWRs, followed by a gradual rise, peaking in a 1 s time-window following ripple onset during pre-task rest (bootstrap: p < 0.01 Figure 5D; for example OFC units, see Figure 5F; WMPSRT: p < 0.05).During post-task rest, OFC firing generally displayed a pattern similar to pre-task rest, but now OFC unit activity was downregulated more sharply before ripple onset (Figure 5E).A comparison between Rest1 and Rest2 PRTHs of OFC neurons showed a more pronounced trough and peak around SWRs in Rest1 relative to Rest2 (Figure 5G, cluster permutation test: p < 0.05).However, when correlations between CA1 and OFC cells were compared for all SWRs during Rest1 and Rest2, no overall reactivation across the OFC-CA1 network could be detected (EV = 1.3 G 0.5 vs. REV = 1.8 G 0.8, WMPSRT: p = 0.105).
We next fractionated OFC activity in relation to reactivation.This fractionation was inspired by the distinction between Up and Down states, 64 but was not based on the presence of slow-wave activity preceding or following a period of enhanced firing activity, as would be the case for Up states.Thus, we distinguished Active states by a more inclusive measure based on the graded distribution of spike counts as compared to the definition of (slow-wave activity coupled) Up states.After identifying Active state segments from OFC multiunit activity (based on spike counts during intervals of sustained activity; see STAR methods), these states were subdivided into quartiles, where we focused on the first and last quartiles.The first quartile of Active states is labeled ''Sparsely Active state'' here, with the number of spikes thus being in the lowest 25 th percentile of the Active state distribution computed for the number of spikes per state.The last quartile of Active states is called ''Highly Active state'', representing the highest 25 th percentile of this distribution (Figure S3A, left).Figure 6A shows examples of synchronized unit firing in both state types across OFC and CA1.Next, we estimated intra-and inter-area reactivation across these two different types of Active state for OFC and OFC-CA1 pairs of neurons (see Figure 6B for two examples of Highly Active state).Although Active states showed a substantial overlap (61%) with Up states as defined by Vyazovskiy et al., 64 we note that, in our data, Up states formed a subset of Active states (see STAR methods), which are the states referred to in the current work.
Remarkably, intra-OFC replay preferentially occurred during Sparsely Active states during post-task rest (EV = 16.7 G 2.3% vs. REV = 9.0 G 2.1%, Bonferroni-corrected WMPSRT: p < 0.05, Figure 6C left).OFC reactivation was still observed when spikes fired during CA1 SWRs were eliminated from the dataset (EV = 16.7%G 2.2% vs. REV = 9.1% G 2.0%, WMPSRT: p < 0.01).In contrast, the (EV-REV) difference for OFC was not significant for Highly Active states (EV = 13.7%G 2.2% vs. REV = 10.7%G 1.6%, WMPSRT: Figure 6C right).An opposite relationship, however, was observed for inter-area replay.While the EV-REV difference for OFC-CA1 pairs during Sparsely Active states was not significant (EV = 1.6 G 0.6 vs. REV = 1.5 G 0.5, WMPSRT, Figure 6D left), task-related patterns were reinstated in OFC-CA1 pairs during Highly Active states (EV = 1.8 G 0.6 vs. REV = 0.5 G 0.3, WMPSRT: p < 0.05, Figure 6D right, B).Similar patterns in reactivation strength were observed when EV and REV were calculated using OFC and OFC-CA1 unit pairs for different distribution percentiles of Active states (Figure S3).A trend toward joint OFC-CA1 reactivation was observed when SWR spikes were eliminated from both OFC and CA1 spike vectors, but this was not significant (EV = 1.8 G 0.5 vs. REV = 0.6 G 0.3, WMPSRT: n.s., Figure 6D).In general, inter-area EV and REV values were low.The sparse and predominantly polysynaptic connections between CA1 and OFC may translate functionally in low neuronal activity correlations across these regions. 65,66Indeed, the mean Pearson's correlation coefficient of OFC-CA1 pairs in the Highly Active state was in the range of one order of magnitude smaller than that for OFC pairs during QW-NREM, or CA1 pairs during SWRs (OFC-CA1: 0.0048 G 0.0003 vs. OFC: 0.0168 G 0.0005, WRST: p < 0.001; or vs. CA1: 0.0377 G 0.0045, WRST: p < 0.001).These data suggest that OFC ensembles engage in different replay relationships during distinct brain states: during Sparsely Active states they engage primarily in local, intra-areal replay, whereas during Highly Active states they reactivate together with CA1 ensembles.

DISCUSSION
OFC ensembles were shown to reactivate task-related information patterns during offline states following search behavior on a steering wheel maze involving variable place-reward contingencies.Neuronal pairs strongly contributing to replay displayed similar temporal and spatial relationships during post-task rest as during active behavior.Notably, reactivation was stronger after task sessions in which place-reward couplings were changed relative to the preceding session block, and reactivation strength correlated with a measure of the overnight change in behavioral performance.OFC firing patterns were differentially modulated by SWR activity during pre-and post-task rest.Whereas ensembles within OFC showed significant reactivation during Sparsely Active states, joint OFC-hippocampal reactivation was selectively found during Highly Active states.

Orbitofrontal reactivation and the updating of place-reward associations
We found that OFC reactivation strength depended on the recency of a shift in place-reward coupling on the maze.When recording rats that had been exposed to a shift in place-reward configuration, OFC post-task reactivation was stronger in the first session than in the two subsequent sessions of the same block having the same configuration (Figure 4B).On average, a decrease in performance was observed when comparing the final laps on Day D to the initial laps on Day D+1, which may be attributed to adopting a more exploratory strategy at the start of a new session and a concomitant delay in the reinstatement of task set. 3,67,68In addition, the levels of arousal, motivation, and attention may change over the course of a session.Nevertheless, when correlating the RPI change from Day d to Day d+1 of the same block to the reactivation strength on Day d, we found a strong positive correlation (i.e., when the RPI on Day d+1 was restricted to the first few laps).Thus, a strong reactivation on Day d was especially associated with a lack of performance drop at the start of the session on Day d+1, and can be interpreted as a successful reinstatement of learned fPRAT associations as recalled from the previous session (Figure 4C).That the correlations declined when more laps of Day d+1 were included can be explained by a multitude of factors that may affect behavioral performance on this day, and may well be causally and historically unrelated to the preceding day (e.g., different levels of arousal, motivation, and attention as compared to the previous day, new learning, or intrusion of remote memories).
What we consider especially striking in the boosted OFC reactivation after a shift in place-reward coupling is that the level of task context novelty and required behavioral adaptation in this situation was relatively limited.Hippocampal reactivation was previously reported to be enhanced in a novel environment. 55In contrast, the OFC reactivation we report was enhanced within a familiar environment.No novel reward sites were introduced in our task; each of the six goal sites on the maze had been rewarded at some point in the animal's training history.We also did not introduce a novel task rule, but required animals to update previously learned place-reward associations as these changed from block to block.The importance of this distinction is underwritten by previous findings showing that mPFC lesions in marmosets impair extradimensional shifting but not reversal learning, whereas OFC lesions produce the opposite pattern. 3The placereward updating studied here largely conforms to the principle of reversal learning, because upon a shift, a subset of previously rewarded places was no longer rewarded, whereas a subset of non-rewarded places became coupled to reward.The relationship between placereward updating (Figures 1D and 4) and OFC reactivation (Figure 4) is thus consistent with its causal function in reversal learning 5 ; but see below, 22 .

Properties of orbitofrontal reactivation in relation to memory consolidation
Cell pairs that strongly contributed to reactivation showed a preservation of temporal bias relationships from Task to Rest2 relative to Rest1.This result underscores that reactivation is not only manifested by way of binned cross-correlations (Figure 2) but is also characterized as a (temporally specific) replay.Moreover, spatial correlations between cell pairs were significantly higher for contributor pairs than non-contributors (Figures 3F-3H).Intrinsic firing properties of contributor cells are unlikely to underlie these correlations.Contributor cells did not show an increase in firing rate during Rest2, and if they would have shown a more spatially or temporally confined firing irrespective of task high firing-rate correlations would be observed during Rest1 as well, which would be reflected in a decreased reactivation contribution score.Both the temporal and spatial correlation patterns confirm that replay depends on the specific relationships between OFC cells as they ''tessellate'' the unidirectional sequence of running-and-foraging during each session. 69,70Superficially, these OFC results resemble reactivation characteristics reported for the hippocampus, 54 ventral striatum, 40 neocortex, 41,42 and other structures.
In comparison to area CA1 and the hippocampal-ventral striatal network, however, most OFC cell pairs showed synchronous and temporally symmetric activation patterns, resulting in a distribution of relatively low temporal biases centered around zero (Figures 3A-3E).Furthermore, cross-correlograms during Rest2 did not reveal a clear temporal compression of pairwise firing relationships, unlike the compressed, ripple-associated replay reported for hippocampus, 52,53,71 mPFC, 29,41 and ventral striatum. 40Rather, OFC patterns resemble the uncompressed offline reactivation as reported for putative dopamine neurons in the ventral tegmental area (VTA). 72,73Importantly, OFC reactivation persisted when removing OFC spikes recorded during hippocampal ripples (Figure 2D), whereas reactivation in ventral striatum and mPFC coheres with SWR activity. 29,40These combined findings suggest that the OFC and VTA may be part of a motivational network sustaining offline reactivation independently of the hippocampus, at least in particular brain states.
5][76] In the OFC, local NMDA receptors have been causally implicated in synaptic plasticity underlying the formation of ensembles discriminating sensory cues paired with different outcome values. 77We hypothesize that this mechanism may extend to offline consolidation phases such as post-task sleep and rest.Indeed, studies on hippocampus and mPFC have implicated NMDA receptors in post-task memory reactivation and retention. 78,79In hippocampus-dependent consolidation, SWRs may function to accelerate replay and boost NMDA receptor-dependent plasticity. 51,78Because in our study OFC reactivation occurred independently of SWRs, a boosting effect based on temporal compression may be absent in OFC.Conceivably, the hippocampus plays a special role in memory consolidation as a ''fast absorber'' of information vis-a `-vis neocortical structures, which would incorporate new information more slowly to be protected against catastrophic interference. 80,81Mechanistically, the OFC offers the opportunity to study how SWR-independent replay processes cohere with the speed and efficacy of memory consolidation, and whether the involved transmitter and receptor systems are different from those implied in SWR-dependent consolidation.[84] Co-reactivation of orbitofrontal and hippocampal ensembles Despite the finding that OFC reactivation occurred in the absence of hippocampal SWR activity, two observations suggest that OFC firing activity is conditionally coordinated with CA1 activity: (i) OFC firing activity was modulated in association with SWRs on a timescale of seconds, and this modulation differed between pre-and post-task rest (Figure 5G), and (ii) joint OFC-hippocampal replay did occur but was selectively found during Highly Active states (Figure 6D).This juxtaposition of findings may seem paradoxical given that SWRs and Down-to-Up state transitions in neocortex were previously reported to be correlated. 85However, this paradox may be explained by OFC-hippocampal coactivity being coordinated by other mechanisms than those directly depending on SWRs, while it is also relevant to recall that the Active states as defined here are not identical to Up states. 64Instead of being SWR dependent, OFC-hippocampal replay may be enabled by the very occurrence of Highly Active states, providing a temporal window for replay to emerge and unfold, whereas only their initial temporal segment may tightly cohere with SWR activity. 85As yet it is difficult to underpin why OFC-hippocampal replay occurred selectively during Highly Active states whereas intra-OFC reactivation was restricted to Sparsely Active states.These Sparsely Active states described here for OFC replay may be functionally similar to short duration, fast decorrelating Up-states reactivating recent memory traces in mPFC. 86ighly Active states may allow more time for extended network interactions to be recruited during non-REM sleep as compared to Sparsely Active states, which may only permit short-range and local network activations.This idea is compatible with findings indicating a general decline in long-range functional connectivity during non-REM sleep and anesthesia, [87][88][89] as opposed to preservation of short-range connectivity. 90However, our results lead to the amendment that QW-NREM states may permit long-range interactions selectively during periods of high cortical excitability (such as Highly Active states), which may propagate in a fronto-temporal direction and entrain parahippocampal areas, 22 while local reactivation predominates during shorter, Sparsely Active states. 86he conditional occurrence of joint OFC-hippocampal replay also raises the question through which anatomical pathways cross-structural coordination may occur.Area CA1 is not known to project directly to the OFC subregions we recorded from (VO, VLO, and agranular insular cortex), but its main output structure, the subiculum, does so. 46,63,66,91The hippocampus also projects to other structures which in turn target OFC, such as basolateral amygdala, 66,92-95 peri-and entorhinal cortex, 96 and the thalamic nucleus reuniens. 97Moreover, the hippocampal formation emits excitatory projections to the ventral striatum 98,99 which in turn controls dopaminergic VTA signaling and ventral pallidothalamic feedback to prefrontal cortex.Thus, hippocampal output may influence OFC activity through multiple routes.Conversely, the OFC may indirectly affect hippocampal activity via the anterior cingulate, medial agranular, and temporal association cortex. 45Given this wide range of indirect connections, the OFC and CA1 likely represent only a subset of a larger network of structures showing coordinated replay.

Role of orbitofrontal cortex in storing and consolidating task information
When asking whether task-related information is stored in OFC, or whether OFC merely imports information from other areas to compute its outputs on-line, 32 the present evidence clearly argues in favor of an active role in offline processing (while not excluding information-importing functions in online computations).Because OFC replay is expressed at least by pyramidal cells, it may function to strengthen its internal synaptic matrices or it may affect long-lasting synaptic modifications in downstream areas e.g., striatum, VTA, and basolateral amygdala. 32,100,101How OFC replay may contribute to the formation of an internal model of task space 22,23,24 is difficult to tell at present.However, one key aspect of model-based learning is the embedding of singular task elements in a spatiotemporal context: subjects learn how associations between cues, actions, and outcomes are contingent on spatial locations and moments within a behavioral sequence. 22Thus, the co-reactivation of OFC and CA1 (Figure 6D) may allow the integration of spatial-episodic information coded by hippocampus in a task space model coded by OFC and mPFC to guide flexible and prospective decision-making.Future experiments testing this theoretical framework may flesh out the causal roles of OFC, mPFC, hippocampus, and related structures in these processes.

Limitations of the study
(1) The current study was based on recordings from three rats.Although this quantity is not unusual in replay studies, this number imposes limitations on the statistical power.(2) Due to the lack of strong spatial selectivity of orbitofrontal neurons, it was not feasible to reconstruct replayed spatial trajectories from orbitofrontal ensemble activity during rest or task phases.
Animals were initially trained to run clockwise and visit reward ports in a free choice design, with rewards available at all ports except for the recently visited one.On each visit sucrose was delivered if a nose poke was maintained for at least a preset time interval (the minimally required nose poke duration or 'required poke time', taken randomly between 0.25 and 2 s) during cue on (green lights on) periods.
Next, to habituate rats to the varying absence of reward across port visits, they were subjected to an Equal Probability Schedule, 50 where reward probability (p r ) on all ports was decreased from 100% to 75%.Training on this schedule ended when animals reached a learning criterion of 72 valid trials performed in less than 1 h.A trial was considered valid when nose pokes were maintained at a reward port for at least the duration of the required poke time.This resulted in either sucrose delivery or, for unrewarded trials, activation of an empty syringe pump serving as control for auditory and vibrational cues that would normally occur upon reward delivery.
After successful training on the Equal Probability Schedule, rats transitioned to the flexible Place-Reward Association Task (fPRAT, Figures 1A and 1B).Now, rewards were removed from three adjacent ports and p r was set to 100% on the remaining three ports (Figure 1).Animals received one training session per day and the location of the rewarded ports, relative to extra-maze landmarks, was kept constant for each rat across three training sessions (three consecutive sessions with identical reward distribution made up one block, Figure 1B).To minimize the salience of local maze cues (e.g., odor, texture), the Steering Wheel Maze was rotated by at least 60 at the start of each training day while maintaining the position of the rewarded ports relative to the experimental room. 50,102,103After each block, the position of the rewarded ports was shifted pseudo-randomly by at least two, and at most four, positions (i.e., a rotation by 120 -300 ).Rats were trained until they were able to learn the new reward locations and reached a preference of at least 75% for the rewarded ports on the third session of a block.This pretraining phase lasted approximately two weeks, after which four rats were implanted with a tetrode microdrive.Following implantation, animals were subjected to the same fPRAT during electrophysiological recordings (Figure 1B).Rats were subjected to either saline or corticosterone (3 mg/kg) intraperitoneal injections using a within-subject design.However, no differences between these two treatment conditions were observed throughout data analysis, therefore these data were pooled.

Surgical procedure
Rats were implanted with a custom-built tetrode microdrive featuring two bundles, each containing six individually movable tetrodes and a reference electrode. 104Two craniotomies were performed to accommodate the hyperdrive bundles, with the frontal bundle aimed at the lateral orbitofrontal cortex (area LO) and agranular insular cortex (dorsal and ventral part; AID and AIV, Figure 1C; target in left hemisphere: AP: +3.0 mm, ML: À2.6 mm from bregma; 105 and the caudal one aimed at dorsal hippocampal area CA1 (Figure 1C, right: AP: À4.6 mm, ML: À2.5 mm from bregma).Using dental cement, the microdrive was secured to six surgical screws inserted into the cranium.One caudal screw, situated above the contralateral hemisphere, was used as ground.Over the course of the next $9 days the implanted animals were retrained on the fPRAT and the tetrodes were gradually lowered until the regions of interest were reached.

Data acquisition
FPRAT sessions, flanked by rest episodes lasting between 45 and 60 min, were recorded during the first half of the day (9:00-14:00 h) and rats were subjected to blocks of three sessions until experiments were terminated.OFC tetrode positions were adjusted daily after the recording sessions to optimize the number of recorded units.To ensure recording stability, tetrodes were allowed to stabilize overnight and were never turned in the morning, prior to a recording session.
Maze sensor and electrophysiology data were acquired using a 64-channel Cheetah setup (Neuralynx Inc., USA).Spike channels, bandpass filtered between 600 and 6000 Hz, were sampled at 32 kHz, and data were acquired in a 1 ms window when signal amplitude exceeded a pre-set voltage value.Continuously sampled LFP channels were filtered between 1 and 475 Hz.
Video data, captured via a color CCD camera situated centrally above the maze, were sampled at 30 frames per second with a spatial resolution of 352 x 240 pixels (0.4 cm per pixel) and recorded using Windows Movie Maker software (Microsoft, Redmond, WA, USA).Individual frames were automatically labeled online with numeric codes representing behavioral sensor events (e.g., arm entry) and video data was synchronized offline with sensor and electrophysiological data.

Histology
At the conclusion of experiments rats were anesthetized and the end positions of the tetrodes were marked by passing a 12 mA current for 10 s through the tips of the tetrodes.Scar tissue was allowed to form for 24-48 h before the animals were sacrificed with an overdose of Euthasol (80 mg/kg; AST Farma BV, Oudewater, The Netherlands) and perfused with a 4% paraformaldehyde solution in phosphate buffered saline (0.1 M, pH 7.4).After removal from the skull, the brain was stored in 4% paraformaldehyde at 4 C. Serial, 40 mm thick, coronal sections were cut using a vibratome (Leica Biosystems, Nussloch, Germany), and subsequently stained with cresyl violet solution.
Recording locations were estimated by comparing records of daily turning steps and final tetrode positions (Figure 1C, bottom), as determined in stained sections, to a stereotaxic atlas. 105Electrophysiological data were obtained from area LO and the ventral (AIV) and dorsal agranular insular (AID) cortices (Figure 1C, left) for the prefrontal bundle, and the CA1 subregion of the dorsal hippocampus for the caudal bundle (Figure 1C, right).

QUANTIFICATION AND STATISTICAL ANALYSIS Behavioral analysis
Reward-based spatial learning was measured using a relative place preference index (RPI 50 ): RPI = ðN À nÞ = ðN + nÞ (Equation 1) representing the relative difference between the number of valid trials performed at rewarded (N) and unrewarded (n) ports.
To measure within-session development of reward site preference, maze run episodes were divided in laps.A lap was defined by two consecutive crossings of the same runway sensor such that, in between, all other runway sensors were activated in clockwise direction at least once.Behavioral analysis was limited to the first eleven laps and the cumulative RPI, based on the number of valid trials performed in the current and all previous laps, was calculated for laps 1 to 11 in each fPRAT session.Rats performed between 5 and 20 laps per session and sessions with less than 11 laps were discarded.Only the first 11 laps from each session were taken for further analyses.To render performance scores comparable between animals, learning curves from all sessions of each rat were normalized to the maximum attained over the entire session.Unless mentioned otherwise, curves for the first, second and third sessions of each block were averaged across blocks, and mean differences between sessions were calculated across laps.Statistical significance was determined using a Bonferroni-corrected Wilcoxon Matched-Pairs Signed-Rank Test (WMPSRT).

Ripples, slow wave activity and Active states
7][108] All ripples with a squared amplitude that exceeded 4 standard deviations from the mean for at least 6 ms were selected and events shorter than 25 ms, or with an amplitude larger than 2000 mV were excluded from analysis.QW-NREM episodes were detected as time intervals during which rats remained immobile in continuously sliding time-windows of 4 s and the theta band spectral density to overall LPF power ratio did not exceed a value of 0.3. 40eriods of enhanced OFC firing activity during QW-NREM, i.e., Active states, were determined from multi-unit data and slow-wave LFP recordings.For a given recording session, spike data from all OFC tetrodes recorded during pre-and post-task rest episodes (Rest1 and Rest2, respectively) were analyzed in 100 ms bins from QW-NREM intervals.Active states were defined as intervals of sustained activity in which the number of spikes exceeded a pre-set threshold defined as 50% of the number of units in that session. 64In addition, we determined Up states as a subset of Active states which were preceded or followed by a slow wave within 150 ms. 64Slow waves were identified by filtering LFPs between 0.3 and 1.5 Hz, and events above or below 0.5 SD were marked as slow waves. 109Slow wave analysis was performed using the YASA Python package. 110Additional analysis yielded qualitatively similar results when comparing Up state and Active state properties (similar duration, firing rate and number of spikes per state), with a majority of Active states satisfying the slow-wave criterion used to define Up States (61% of Active states were confirmed as Up states).In the results, however, we mostly refer to Active states and it is important to note that these were not identical to Up states.Note that the number of spikes per Active state showed a graded distribution (Figure S3), indicating that the lowest and highest quartile do not represent discrete classes.

Analysis of spike data
A customized version of MClust-3.5 (A.D. Redish) was used for spike sorting.Briefly, using Klustakwik 1.7 (K.Harris), spike data was clustered semi-automatically in multidimensional parameter space, where each dimension was associated with waveform peak amplitudes, energy and the first derivatives of the energy from the four tetrode channels.The resulting spike clusters were refined manually based on spike amplitude stability over time, cross-correlogram properties, and separation quality assessed using the L-ratio and isolation distance. 622][113] Subsequently, spike time data and waveforms were imported in MATLAB for further analysis.
Single units from OFC and CA1 were separated into putative pyramidal neurons and interneurons based on the properties of their extracellularly recorded waveforms (Figure S2).For each cluster, spike waveforms were averaged and normalized to the peak amplitude.Subsequently, an agglomerative hierarchical clustering algorithm (clusterdata, MATLAB) was used to discriminate between putative interneurons and pyramidal neurons based on average firing rate and waveform repolarization dynamics, estimated using the waveform amplitude at 0.55 ms after the spike peak. 114,115A large majority of neurons belonged to two cell clusters, one of which exhibited a high firing rate and fast repolarization, and the other showed a low firing rate and slow repolarization.These were classified as fast-spiking, putative interneurons, and putative pyramidal neurons, respectively (Figures 2A, 2B; and S2).Unless otherwise noted, interneurons were excluded from the analyses.

Firing rate modulation during non-REM sleep and quiet wakefulness
To assess possible interactions between OFC neuronal activity and hippocampal ripple activity, sharp wave ripple (SWR)-triggered peri-event time histograms (peri-ripple time histograms, PRTHs 57 ) were computed from OFC spike data recorded during Rest1 and Rest2.To allow for correct baseline estimation, only peak times from ripples recorded at least 10 s apart were taken.For individual units, ripple modulation was assessed by comparing a time interval from 1 s before to 1 s after ripple peak (0.5 s bin size) to four control bins centered around À5 s using WMPSRT.A bin was considered significantly different from baseline when all four tests reached statistical significance (p < 0.05).

Figure 1 .
Figure 1.Task setup and behavioral training (A) Rats were pretrained on a hexagonal maze to obtain a sugar-water reward from six reward sites.Top: following a pre-electrophysiology training phase, where rewards were removed from three adjacent ports, animals were implanted with a tetrode microdrive.Middle: session structure.Bottom: during maze testing rats were subjected to a flexible place-reward association task.(B) Top: task structure using a block design, with each block spanning 3 sessions.Bottom: relative to distant geometrical cues (black rectangle, arrow, and loudspeaker) reward configurations (3 rewarded and 3 adjacent unrewarded ports) were kept unchanged for 3 consecutive sessions.After each block of 3 sessions, reward positions were changed pseudorandomly by 2-4 ports.(C) A split drive targeting the lateral OFC (LO, top) and the CA1 region of the hippocampus (right) was implanted after the pre-training phase.Bottom-left, zoomin of the rectangular area outlined above.Arrowheads and arrows indicate tetrode tracks and tetrode tip locations, respectively; AID and AIV, dorsal and ventral parts of the agranular insular cortex.(D) Behavioral performance shown as a normalized, cumulative relative place-preference index (RPI) for each lap and preceding ones, calculated across the first 11 laps of each session.Data were averaged across all animals and session blocks.(E) Mean differences (GSEM) between normalized cumulative RPI across consecutive sessions in each block, calculated for all laps, **p < 0.01, *p < 0.05 -Bonferroni-corrected WMPSRT.

Figure 2 .Figure 3 .
Figure 2. Reactivation in orbitofrontal cortex during rest (A) Normalized spike waveforms of putative pyramidal cells (PCs, green, left) and interneurons (INs, blue, middle) recorded from the OFC, and their corresponding averages (Avg, right).Arrowheads indicate waveform amplitude at 0.55 ms from peak.(B) Classification of OFC single units into putative pyramidal cells (green) and interneurons (blue), based on firing rate and repolarization amplitude.(C) Patterns of correlations between cell pairs measured during Rest1, Task, and Rest2 for two example sessions (2 nd session of the 2 nd block and the 1 st session of the 6 th block); the 2 nd session was part of a block with 12 units (top), and the 1 st session was in a block with 8 units (bottom).The color map shows correlation strength and sign, and thick lines indicate cell pairs which were correlated during Task and also during Rest1 or Rest2.Upper panel: note how negative correlations during Rest2 contribute to pattern similarity relative to Task.(D) Variance of correlation patterns observed during post-task rest, explained by correlations found during Task and controlled for pre-existing correlations as found during Rest1 (explained variance: EV), versus the reverse explained variance (REV).Gray circles and boxplots show values for individual sessions and Q1, median and Q3, respectively; connected black circles are example sessions shown in C. Inset shows EV and REV values calculated using the same dataset, except that spikes recorded during SWRs were eliminated.WMPSRT: *p < 0.05.

Figure 3 .
Figure 3. Continued (F) Rate maps for a pair of putative pyramidal neurons in OFC contributing to reactivation (rate map correlation: 0.74).Numbers indicate maximum firing rates (lower left).(G) Distributions of correlations between rate maps of neuronal pairs classified as contributors (black, top) and non-contributors (gray, bottom).Note the rightshifted distribution of contributor correlations.(H) Mean (GSEM) calculated for Pearson's correlation coefficients shown in G, WRST: **p < 0.01.

Figure 4 .Figure 5 .
Figure 4. Orbitofrontal reactivation depends on recency of changes in place-reward configuration and task performance (A) Boxplots (Q1, median, Q3) show explained and reversed explained variance (EV and REV) calculated for 21 sessions corresponding to 7 complete blocks using OFC units.Gray markers and lines show data from individual sessions, WMPSRT: ***p < 0.001.(B) Left: Mean EV and REV calculated by session rank across blocks; Right: corresponding mean (black squares) and single-session differences (gray circles) between EV and REV (reactivation strength).Reactivation strength decreased significantly as a function of change recency (Friedman's test: p < 0.05, Bonferroni-corrected WMPSRT: p < 0.05).(C1) Schematic illustrating the calculation of the Pearson's correlation coefficient shown in C2 and C4 (blue square).Correlations between the reactivation strength on a given session (Day d) and differences in behavioral performance between the subsequent session (Day d+1) and the current one for which reactivation strength was computed, calculated for lap ranges 1-2 and 6-11, on Day d+1 (all sessions 2 and 3) and Day d (all sessions 1 and 2), respectively.All 14 Day d and d+1 pairs were used.RPI: relative place preference index.(C2) Correlation between performance (RPI) differences and reactivation strengths (r = 0.67, p = 0.008).The performance difference between the initial laps on Day d+1 and laps 6-11 on Day d was negative in a majority of cases, presumably because the initial laps on Day d+1 may involve more exploratory behavior and delay in reinstating task set as compared to performance in later trials.The higher the reactivation strength on Day d, the easier it was for rats to quickly achieve similar performance on Day d+1 relative to Day d. (C3) Distribution of correlations between reactivation strength and performance change computed by bootstrapping lap identity (for all Day d and d+1 pairs and all lap ranges).The purple line indicates Pearson's correlation coefficient at the 95 th percentile (r = 0.63).(C4) Correlations between reactivation strength and performance difference, calculated for different lap ranges on Day d (abscissa) and Day d+1 (ordinate).Blue and purple contours indicate the example in C2, and significant (p < 0.05) Pearson's r values estimated using the bootstrap test in C3, respectively.This graph raises empirical evidence for a significant relationship between reactivation after the task performed on Day d and the initial behavioral performance on the next day d+1, corrected for the RPI on Day d.

Figure 5 .
Figure 5. Continued (F) Top and bottom, examples of peri-ripple time histograms (PRTHs) for two OFC cells during Rest1 and Rest2, respectively; red asterisks indicate significant bins, *p < 0.05, Wilcoxon's matched-pairs signed-rank tests.(G) Z scored average PRTHs shown in D and E. Thick horizontal red lines indicate statistically significant differences between PRTHs computed across Rest1 (cyan) and Rest2 (black) episodes, determined using a cluster-based permutation test (p < 0.05).

Figure 6 .
Figure 6.Orbitofrontal reactivation and joint orbitofrontal-hippocampal reactivation occur during different brain states (A) OFC-CA1 cell sequences recorded during three sessions (columns).For each example sequence, Highly (top) and Sparsely Active states (bottom) are shown for the same spiking cells (red -CA1 units, black -OFC units).Each panel shows spikes from all active cells in each example Active state, sorted by peak firing rate time within each respective state.(B) Patterns of correlations between OFC (black circles) and CA1 (gray circles) cell pairs computed for Highly Active states across all task episodes from two example sessions.The color bar shows correlation strength and thick lines indicate cell pairs correlated during Task and also during Rest1 or (C) EV and REV calculated during Sparsely and Highly states (left and right, respectively) identified in Rest1 and intervals, for OFC neuronal ensembles (36 sessions).(D)Same as C, except that the variance was computed for ensembles sessions).Inset shows EV and computed for Highly Active states spikes fired during hippocampal SWRs; *p < 0.05, Bonferroni-corrected WMPSRT, n.s.not significant.