Unexpected changes in learned task contingencies trigger sharp wave ripples in the primate hippocampus during virtual navigation

The hippocampi and mesial temporal lobes play a central role in episodic memory and associative learning. It is unclear how unexpected experience influences learning. Hippocampal sharp wave ripples (SWR) are an electrical biomarker of memory consolidation. We tracked when and where SWR occur during 2 tasks. Local field potentials were recorded in the hippocampi, entorhinal cortices and amygdalae of non-human primates (NHP; n=3) performing reversal and associative learning tasks in a 3D virtual environment. Our results show that hippocampal SWR occurred when learned task contingencies were unexpectedly altered. Surprise rewards and reward denial were associated with SWR rates 9.8x and 8.0x greater than expected rewards. The highest density of SWR occurred in zones where errors were made. SWR were preceded by event-related potentials in the amygdala but not entorhinal cortex. Our results suggest that SWR generation in primates may prioritize behaviourally relevant experience for commitment to memory to allow flexible learning. and expectancy estimated awake NHP incorporated into internal models of the world when there is a difference between an expected and a perceived outcome.


Introduction
The hippocampi, entorhinal cortices and amygdalae play a significant role in learning and memory. The involvement of the hippocampus in memory was first described in Patient H.M. after a bilateral resection of his mesial temporal lobes left him unable to form new memories (Scoville & Milner, 1957). Memory deficits due to hippocampal damage can be observed across commonly studied non-human mammals (Gaskin et al., 2003; Jarrard, 1978; Morris et al., 1982; Squire, 1992; Zola et al., 2000). Coordination of the hippocampi and the entorhinal cortices supports associative learning (Igarashi et al., 2014; Schomburg et al., 2014) and episodic memory formation (memory of autobiographical events), notably by integrating self-motion into place fields to map a physical space (Hafting et al., 2005; Hasselmo et al., 2009; Sargolini et al., 2006; Jeewajee et al., 2008). Amygdalohippocampal communication modulates memory salience (Zheng et al., 2017) and correlates with memory performance (Seidenbecher et al., 2003; Sutherland and McDonald, 1990; Dunsmoor et al., 2019). In fact, concurrent injury of the hippocampus and amygdala was required to reproduce the severe memory impairment experienced by H.M. (Mishkin, 1978; Murray and Mishkin, 1983). It is generally uncontroversial that experiences which are unexpected, and therefore more salient, are more memorable than those that are commonplace. The sensory processing which underlies the salience of an event is likely influenced by the complex interaction of top-down and bottom-up attention (Macaluso et al., 2016; Tanner & Itti, 2019; Fecteau & Munoz, 2006). However, few studies have documented neural correlates of these processes in structures such as the hippocampus and amygdalae. Here we tested the hypothesis that the frequency of electrical neural signatures of memory consolidation in the mesial temporal lobe depends on the saliency and behavioral relevance of a task event.
The hippocampal sharp wave ripple (SWR), first described in rodents by Dostrovsky and O'Keefe (O'Keefe, 1976), is a strongly synchronous event generated in the hippocampus that has been observed in all studied mammals, including humans (Bragin et al., 1999; G. Buzsáki, 1986). The causal relationship between SWR, learning and memory was demonstrated in animal models when their selective disruption by electrical stimulation was shown to impair memory performance (Ego-Stengel & Wilson, 2010; Girardeau et al., 2009). SWR interruption did not affect place field activity but caused a specific learning deficit that persisted throughout training (Jadhav et al., 2012). Importantly, SWR are observed during non-REM sleep (slow wave sleep) (G. Buzsáki et al., 1983) and, during SWR, the hippocampus replays temporally compressed sequences of the firing patterns that were observed when an animal was awake (Jadhav et al., 2012; Wilson & McNaughton, 1994). Maingret et al. and Yang et al. provide evidence that this synchronous replay enables the structural cortical remodeling required for episodic memory consolidation (Maingret et al., 2016; Yang et al., 2014).
However, most of this research has been conducted in rodents. Data from primates about the role of awake SWR are scarce and it is unclear how SWR occurrence correlates with behavioral events during learning. Leonard & Hoffman showed that the rate of awake ripples is increased when gaze approaches familiar targets (Leonard & Hoffman, 2017). This suggests a link between ripples and memory of recently encoded experiences in NHP. To measure how SWR occurrence is related to behavioral events during learning, we recorded SWR from the hippocampi of 3 rhesus macaques as they learned to navigate and perform reversal learning and associative memory tasks in a 3D virtual environment. We determined the timing and location of each SWR detected during task performance and analyzed the relationship of SWR occurrence to task events. We found that SWR are maximally generated with unexpected task events such as errors (incorrect choices and missed targets) and after infrequent alteration of otherwise regular task conditions. We also found that during certain behaviors SWR are accompanied by a prominent event-related potential in the amygdala and entorhinal cortex, with the amygdalar activity leading the SWR.

Methods
All animal care and experimental procedures were approved by either the Queen's University Animal Care Committee or the McGill University Animal Care Committee and were conducted in accordance with the Canadian Council on Animal Care guidelines on the care and use of laboratory animals.

Subjects
Experiments were performed on three healthy male rhesus monkeys (Macaca mulatta), here referred to as NHP L (9 years old; 14.3 kg), NHP W (7 years old; 7 kg) and NHP R (14 years old; 12 kg). NHP L and NHP R were single housed; NHP W was group housed. Prior to electrophysiological recording, NHP R and NHP W were trained to transfer into the laboratory at McGill University using a standard NHP chair. NHP L was trained at Queen's University using methodology and equipment that has been described previously (Mcintosh et al., 2019). Prior to electrophysiological recording, NHP R and NHP W were trained to perform the associative learning task. NHP L was recorded from the first presentation of the reversal learning task.

Behavioural system and tasks
We used two different tasks to determine whether there is a relationship between SWR and task events: a reversal learning task and an associative learning task, both of which are performed with spatial navigation. We used three versions of the reversal learning task to clarify the relationship. Correct behaviour was reinforced with liquid rewards (water or juice) instead of solid food rewards to reduce the signal distortion from chewing. We created unexpected task conditions by: 1) altering target size such that the NHP could miss it and be denied a reward, even if he chose the correct arm; 2) denying the reward when the NHP accurately navigated to a visible target; and 3) delivering surprise rewards outside of a target.

System setup
In all tasks, the NHP used a two-axis joystick (M212, PQ Controls, Bristol, CT) to navigate a 3D world displayed on 3 video monitors. Eye tracking was employed (EyeLink II, SR Research, Ottawa, Canada). This 3D world was built with an open-source library running a freely available videogame engine (Unreal Engine 4, Epic Games, Inc., Cary, NC). A control computer ran an experimental suite programmed in MATLAB (MathWorks Inc., Natick, MA), called NIMH MonkeyLogic (Hwang et al., 2019). MonkeyLogic controlled the computer running the videogame engine and the transfer of behavioural data (NHP position in the 3D world, eye data, task conditions and performance) to the neural signal processor for accurate synchronization. The tasks took place in an X maze (Figure 1A). An X maze is a double-ended version of the classic Y-choice maze (similar to the T-choice maze) from the rodent literature (Biggan et al., 1991; Botwinick et al., 1963; Ingles et al., 1993; Redish, 2016). It has been shown that spatial tasks in the X maze are associated with hippocampal activation. To examine SWR distribution in this task, we divided the maze into zones (Figure 1B).

Reversal learning task
The three versions of the reversal learning task are summarized in Table 1. The first version had no visual cue for the target at any time.
The NHP had to use trial and error to find the reward. Once rewarded, the target was left in the same location and remained invisible. The NHP was teleported to the center of the Navigation zone at the start of each trial and had to use the context of the maze (trees and mountain) to remember where the target was and navigate to it. The target was purposefully smaller than the maze arm, so it was possible for the NHP to go down the correct arm and miss the target. This Missed zone, where the NHP was located after passing the invisible target, is shown more clearly in Figure 1B. Separately analyzing this zone allows for the study of self-cued error, which occurs when the arm choice is correct but the target is missed and the NHP is not rewarded.
In the second version of the task, the target became visible once the NHP had entered the Decision zone. Once the target appeared, the NHP navigated towards the visual cue to get the reward. Once performance was stable, we denied the NHP a reward for correct choices (note that incorrectly chosen arms were never rewarded). This denial was applied on only 1% of trials in version 2 to prevent unlearning of reward contingencies. The relative proportions of the number of trials the NHP performed in each task version are listed in the Proportion column of Table 1.
In the third version of the task, the NHP was surprised with a reward outside of the maze arms. The invisible, surprise-reward target location was pseudo-randomly chosen for each version 3 trial. It was pseudo-random, and not random, because it could not occur in a previously rewarded location. This manipulation was used to test whether increased SWR generation was specific to negative outcomes or more generally related to expectancy. Version 3 had the same requirements for correct trials as version 1.
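The exclusion rule that makes the surprise-reward location pseudo-random can be illustrated with a minimal sketch. This is not the task code: the function name, the zone labels and the representation of locations as strings are all hypothetical and chosen only for illustration.

```python
import random


def pick_surprise_location(candidates, previously_rewarded, rng=random):
    """Choose a surprise-reward location, skipping previously rewarded ones.

    ASSUMPTION: locations are hashable labels; the real task presumably
    works with coordinates in the virtual maze.
    """
    allowed = [loc for loc in candidates if loc not in previously_rewarded]
    if not allowed:
        raise ValueError("no candidate location left after exclusion")
    return rng.choice(allowed)


# Hypothetical labels, for illustration only.
candidates = ["zone_a", "zone_b", "zone_c", "zone_d"]
previously_rewarded = {"zone_b"}
choice = pick_surprise_location(candidates, previously_rewarded, random.Random(0))
```

Drawing uniformly from the filtered list keeps the choice unpredictable while guaranteeing that a previously rewarded location is never reused.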

Associative Learning Task
The structure of the associative learning task has been described elsewhere. Briefly, the NHP needed to associate the context on the maze walls with a hierarchy of visible target colours and choose the target associated with the highest reward. For our analysis, a trial was "correct" if the target associated with higher reward was chosen. Task conditions are shown in Table 2. The "Rewarded" target (in contrast to the targets associated with reduced rewards or no rewards) was the correct choice when it was presented.
Here, as in the reversal learning task, we assumed that once task conditions were learned, an NHP that is actively navigating toward a target is expecting a reward.

Quantification of behavioural performance
We used a state-space model paradigm to characterize learning as the probability of a correct response as a function of trial number. This state-space model comprises two equations: a state equation to describe the unobservable learning process (Kakade & Dayan, 2002; Kitagawa & Gersch, 1996) and an observation equation to relate our behavioral data (correct and incorrect responses) to the unobservable learning state process. The "learning curve" can be defined as a function of the learning state process such that an increase in the learning process increases the probability of a correct response. The goal of this analysis was to estimate the learning curve for each NHP. For a given trial k, the observation equation expresses the probability of the observed response in terms of P_k, the probability of a correct response on that trial. P_k is in turn defined in terms of μ, which is determined by the probability of a correct response by chance in the absence of learning or experience; n_k = 1 is a correct response and n_k = 0 is an incorrect response. We assume that P_k is governed by the unobservable learning state process X_k. We define the learning trial as the first trial for which there is reasonable certainty (>95%) that the NHP will perform better than chance for the remainder of the session (Smith et al., 2004).
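For reference, the state-space learning model of Smith et al. (2004), which this section cites, is commonly written in the form below. Whether the present analysis used exactly these expressions is an assumption; the noise term ε_k and its variance σ_ε² are symbols introduced here and do not appear in the text.

```latex
\begin{align}
\Pr(n_k \mid P_k) &= P_k^{\,n_k}\,(1 - P_k)^{1 - n_k}, \\
P_k &= \frac{\exp(\mu + X_k)}{1 + \exp(\mu + X_k)}, \\
X_k &= X_{k-1} + \varepsilon_k, \qquad \varepsilon_k \sim \mathcal{N}(0, \sigma_\varepsilon^2).
\end{align}
```

Under this form, setting X_0 = 0 makes P_0 equal to the chance probability, so μ is the log-odds of a correct response by chance.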

Surgical Procedures
NHP L underwent 2 surgeries under general anesthesia, separated by a recovery period of at least 4 weeks. The first surgery was performed to implant the halo for head fixation (Azimi et al., 2016). The second surgery was to implant the cranial hardware, including the custom recording implant and linear microelectrode arrays (LMAs) (MicroProbes, MD, USA) for recording. Chronic LMAs were implanted in NHP L using a custom NHP stereotactic arc-radius frame adapted from human neurosurgery (Chen et al., 2015). Chronic electrodes introduce less tissue damage than the daily penetrations and adjustments required for acute recordings (Lansink et al., 2007; McNaughton, 1999) and facilitate the longitudinal study of electrophysiological changes during learning. SWR can be detected in the local field potential (LFP) of the LMA, which is a summation of neuronal activity within approximately 200-400 μm of the recording electrode (Kajikawa & Schroeder, 2011; Katzner et al., 2009; Xing et al., 2009). Standard coordinates from Paxinos' atlas (Paxinos, 2009) and preoperative MRI (1 sequence, 0.6 mm isotropic pixels, 3.0 T Siemens TimTrio) were used to plan the surgical trajectories. Implants were positioned over the right prefrontal cortex and trajectories were obtained using the stereotaxic frame and a custom neuronavigation system. Electrode trajectories were chosen to minimize damage to brain tissue and avoid blood vessels. All chronic LMAs were implanted unilaterally on the right side. Each LMA was fabricated using 37.5 μm platinum/iridium microwires (MicroProbes) threaded through polyimide guide tubes for MRI compatibility. The 16 contacts on the LMA were spaced 250 μm apart and spanned 3.75 mm.
In NHP L, LMAs were implanted in the hippocampus, amygdala, and entorhinal cortex. Both humans and macaques have anatomical connections between these structures. Specifically, the amygdala (lateral, medial and basal nuclei) projects to the entorhinal cortex (layers II and III) and then to the hippocampus (CA1, CA3, dentate gyrus and the subiculum) (Amaral & Cowan, 1980; Höistad & Barbas, 2008; Pitkänen et al., 2002). We recorded from the basolateral amygdala because of its proposed involvement in memory salience and its particularly dense connectivity with the hippocampal body and relevant cortices in NHP (Andersen et al., 2006; Ranganath & Ritchey, 2012; Royer et al., 2010). We also recorded from the medial entorhinal cortex because it is associated with spatial and non-spatial episodic memory formation (Aronov et al., 2017; Basu et al., 2016; Lipton & Eichenbaum, 2008). Post-operative imaging allowed us to verify the LMA placement within the three structures (see Figure 1C for ventral temporal hippocampus).
The surgical procedures and cranial hardware for NHP W and NHP R are described elsewhere (Blonde et al., 2018;Doucet et al., 2020).
Briefly, before each behavioural task session, a tungsten microelectrode was lowered into the CA3 region of the right hippocampus for acute recording. Cranial implants and the head fixation hardware were implanted after NHP W and NHP R learned the behavioural task. In contrast, NHP L underwent all surgeries before seeing the behavioural task.

Ripple detection methodology
SWR were detected using an algorithm adapted from Skaggs et al. (Skaggs et al., 2007). Raw voltage signals, acquired at 10,000 samples/second, were band-pass filtered (100-300 Hz), rectified and low-pass filtered (cut-off 20 Hz). Unlike Skaggs et al. (who employed a user-defined threshold to capture high amplitude peaks), we used a custom peak detection algorithm that detects peaks with SWR-specific features. Our algorithm captures peaks with a prominence of at least 4 standard deviations from the event-free baseline and for which the peak duration falls within the physiological range of SWR (György Buzsáki, 2015). Figure 1D shows an example of a raw SWR and the accompanying sharp wave.
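The detection steps described above (band-pass, rectify, low-pass, prominence and duration criteria) can be sketched with SciPy as follows. This is an illustrative sketch and not the authors' algorithm: the filter order, the median/MAD estimate that stands in for the event-free baseline, and the 15-150 ms duration window are all assumptions introduced here.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, find_peaks


def detect_ripples(lfp, fs=10_000.0, band=(100.0, 300.0), env_cutoff=20.0,
                   n_sd=4.0, dur_ms=(15.0, 150.0)):
    """Return candidate ripple peak times (s) from a single LFP channel."""
    # 1) Band-pass to the ripple band (zero-phase, so peaks are not shifted).
    sos_bp = butter(4, band, btype="bandpass", fs=fs, output="sos")
    ripple_band = sosfiltfilt(sos_bp, lfp)

    # 2) Rectify, then low-pass filter to obtain a smooth envelope.
    sos_lp = butter(4, env_cutoff, btype="lowpass", fs=fs, output="sos")
    envelope = sosfiltfilt(sos_lp, np.abs(ripple_band))

    # 3) Baseline spread. ASSUMPTION: a median/MAD estimate stands in for the
    #    "event-free baseline" of the text, since rare events barely move it.
    mad = np.median(np.abs(envelope - np.median(envelope)))
    baseline_sd = 1.4826 * mad

    # 4) Keep peaks that are prominent enough and of plausible duration.
    #    ASSUMPTION: dur_ms is a placeholder range, not the authors' criterion.
    width_samples = (dur_ms[0] * fs / 1000.0, dur_ms[1] * fs / 1000.0)
    peaks, _ = find_peaks(envelope, prominence=n_sd * baseline_sd,
                          width=width_samples)
    return peaks / fs


# Synthetic check: white noise plus one 200 Hz burst centred at t = 1.0 s.
fs = 10_000.0
t = np.arange(0, 2.0, 1.0 / fs)
rng = np.random.default_rng(0)
burst = 5.0 * np.exp(-0.5 * ((t - 1.0) / 0.015) ** 2) * np.sin(2 * np.pi * 200 * t)
ripple_times = detect_ripples(rng.normal(0.0, 1.0, t.size) + burst, fs=fs)
```

Using prominence rather than absolute height means a peak is judged against its local surroundings, which is one plausible reading of the difference from a fixed user-defined threshold. A prominence criterion alone can still admit occasional noise peaks, so a real pipeline would likely add further checks.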
To assess the significance of the SWR distribution, we performed a one-way analysis of variance followed by a multiple comparisons test.
The null hypotheses were that the mean ripple densities, mean ripple rates or mean ripple factors are equal across zones. The analysis of variance rejected a null hypothesis when the means were different enough that p < 10^-10, where p is the probability of obtaining the observed results assuming that the null hypothesis is correct. We used multiple comparisons testing (Tukey-Kramer procedure) to determine which zones had means that were significantly different from the others.
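A one-way ANOVA followed by a Tukey-type post-hoc comparison can be sketched with SciPy as below. The per-session values are randomly generated placeholders and are not data from this study; `scipy.stats.tukey_hsd` requires SciPy 1.8 or later, and the original analysis may have used different software.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

rng = np.random.default_rng(0)

# MADE-UP per-session ripple densities (%) for three zones, 16 sessions each.
# These numbers are synthetic and only demonstrate the procedure.
zones = {
    "error": rng.normal(80.0, 5.0, 16),
    "decision": rng.normal(7.0, 2.0, 16),
    "navigation": rng.normal(6.0, 2.0, 16),
}
samples = list(zones.values())

# Omnibus test: are all zone means equal?
f_stat, p_value = f_oneway(*samples)

# Post-hoc pairwise comparisons; posthoc.pvalue[i, j] compares zone i with zone j.
posthoc = tukey_hsd(*samples)
error_vs_decision_p = posthoc.pvalue[0, 1]
```

The omnibus test only says that at least one zone differs; the pairwise p-values identify which zones drive that difference.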

Results
NHP L performed 16 reversal learning task sessions (version 1), with 1834 trials (minimum 50 trials per session) that met our learning criteria. NHP R and NHP W performed 6 and 11 associative learning task sessions, respectively, with a total of 4246 trials that met our learning criteria. Figure 2 shows the average learning curve for each NHP. All learning curves show an increasing probability of a correct response with increasing trial number. The steep initial slope of the NHP R and NHP W learning curves was likely due to their extensive experience with the task. On average, we recorded 13-15 SWR per minute, per session. We examined the spatial and temporal distribution of ripples and found that their distribution was associated with errors. SWR distribution could not be explained by movement artifacts or by disproportionate time spent in each zone. We used task variations to show that SWR were associated with unexpected events and not just the absence of reward.

SWR detection and localization
SWR were detected in all 3 NHP (Figure 3A; left). In conjunction with post-operative MRI, we used current source density analysis to determine which electrode contacts coincided with each hippocampal layer for more accurate SWR detection (Figure 3A; right). NHP drinking resulted in noticeable signal distortion, so we used a continuous wavelet transform to analyze recordings of the NHP drinking, in the absence of a task, to determine whether the frequency of the "licking artifact" overlapped with our frequency band of interest (150-250 Hz). We found that licking did not introduce spurious 150-250 Hz power to our signal (Figure 3B). Other non-electrophysiological artifacts from body shifting were minimized with head fixation and were below 100 Hz, so they should not affect our frequency band of interest (Tandle, 2015). Aliasing from high frequency noise did not distort our results because our data were low-pass filtered with a cut-off frequency of 300 Hz before down-sampling.

SWR density related to unexpected events
We analyzed the spatial distribution of SWR across maze locations, and SWR timing relative to behavioral events for both tasks. A representative heat map of the average SWR density for version 1 of the reversal learning task is shown in Figure 4A. For the reversal learning task, ripple density increased in error zones across almost all sessions (Figure 4B, right panel). The highest average ripple density (82.52%) was observed in the error zones, near or within expected target locations in an incorrect arm (the NHP chose the incorrect arm and did not get a reward) or near, but not within, targets in the correct arm (the NHP correctly predicted the arm but missed the target and did not get the reward). The next highest (7.10%) was observed in the decision zone, then during navigation (6.40%), in the pre-decision zone (3.50%), in the reward target zone (0.42%) and when the black screen was presented (0.06%) (Figure 4B).
A similar pattern was observed for the associative learning task (Figure 4C). The highest percentage of SWR occurred in error zones (81.39%), either in an incorrect arm or near a missed target in a correct arm. The next highest occurred during navigation (10.69%), in the decision zone (4.56%), in the pre-decision zone (3.29%), in the reward target zone (0.06%) and when the black screen was presented (0.01%). In the associative learning task, there were not as many SWR in missed locations as in the reversal learning task. This may be due to the higher accuracy of NHP R and NHP W, who had over 100 prior training sessions before data were collected, as well as the fact that the targets were visible in the associative learning task.
To determine if the time spent in each zone could explain SWR distribution, we normalized the SWR density in each zone by the time spent in that zone (at the time of that ripple). We call this ratio the SWR factor, or ripple factor (Figure 4D, right). The ripple factor showed that the increased density of SWR in error zones (incorrect arms or correct arms but missed targets) could not be explained by the time spent in those zones.
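One way to read this normalization is as the share of SWR in a zone divided by the share of time spent there, so that a value above 1 means more SWR than occupancy alone would predict. The sketch below follows that reading; the exact definition used in the analysis is not given in the text, so both the formula and the example numbers are assumptions made for illustration.

```python
def swr_factor(ripple_counts, seconds_in_zone):
    """Share of ripples per zone divided by share of time per zone.

    ASSUMPTION: this ratio-of-fractions form is one plausible reading of
    "SWR density normalized by time spent", not the authors' stated formula.
    """
    total_ripples = sum(ripple_counts.values())
    total_seconds = sum(seconds_in_zone.values())
    return {
        zone: (ripple_counts[zone] / total_ripples)
        / (seconds_in_zone[zone] / total_seconds)
        for zone in ripple_counts
    }


# Made-up numbers: 80% of ripples occur in a zone holding 25% of the time.
factors = swr_factor(
    {"error": 80, "navigation": 20},
    {"error": 50.0, "navigation": 150.0},
)
```

In this toy example the error zone has a factor of 0.8 / 0.25 = 3.2, so its ripple share cannot be attributed to occupancy.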

Reward Manipulation
In the reversal learning task, rewarded correct outcomes with invisible and visible targets were associated with mean ripple rates of 0.083 and 0.071, respectively. When rewards were denied in version 2 after the NHP correctly navigated to a visible target, the mean SWR rate was 0.688, similar to the mean SWR rate of 0.667 that we observed for error outcomes in version 1. Both version 1 and version 2 resulted in a mean SWR rate that was approximately 8.0 times that of correct outcomes. The greatest ripple rate occurred when the NHP was surprised with unexpected rewards (V3; Figure 5A). Denying visible rewards (V2) or giving surprise rewards significantly increased the SWR rate (Figure 5A). Surprise rewards had the highest mean ripple rate (μ), 0.812 SWR per second with a standard deviation (σ) of 0.09, which is approximately 9.8 times that of correct outcomes (p < 10^-15). There was no difference between the mean SWR rates in error, cued denied (i.e. visible target) and surprise reward outcomes (p = 10^-15). This indicates that SWR generation is more likely associated with the violation of expectations than with the denial of reward.
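The fold changes quoted above can be checked directly from the reported mean rates. The sketch below assumes that the baseline is the rewarded correct outcome with an invisible target (0.083); the text does not say which correct-outcome rate was used, and the dictionary keys are labels invented here.

```python
# Mean ripple rates as reported in the text (keys are illustrative labels).
mean_rates = {
    "correct_invisible": 0.083,
    "correct_visible": 0.071,
    "error_v1": 0.667,
    "denied_v2": 0.688,
    "surprise_v3": 0.812,
}

# ASSUMPTION: the invisible-target correct outcome is the reference rate.
baseline = mean_rates["correct_invisible"]
fold_change = {name: rate / baseline for name, rate in mean_rates.items()}

error_fold = round(fold_change["error_v1"], 1)        # 0.667 / 0.083
surprise_fold = round(fold_change["surprise_v3"], 1)  # 0.812 / 0.083
```

With this baseline the error and surprise-reward ratios round to 8.0 and 9.8, matching the reported values; the denied-reward ratio (0.688 / 0.083) comes out slightly higher than the error ratio.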
Ripple rates were lower in the associative learning task. There tended to be a higher ripple rate for error trials (i.e. when the NHP chose none or low when the alternative was low or high, respectively) than for correct trials, but this difference was not statistically significant (Figure 5B, top panel). The ripple rate also tended to increase in both NHPs with low reward levels regardless of whether or not the outcome was correct on that trial (Figure 5B).

Amygdala and entorhinal cortex activity during SWR
We examined event-related potentials (ERP) in both the amygdala and the entorhinal cortex that accompany hippocampal SWR in NHP L.
ERP accompanied SWR in 75% of the analyzed sessions. Figure 6A shows the mean ERP for the amygdala and for the entorhinal cortex for SWR that occurred during navigation and when a reward was denied because the NHP missed the invisible target. There was a significant ERP related to SWR produced when NHP L missed the invisible target, which was observed in both the amygdala and the entorhinal cortex (Figure 6B, left). The ERP in the amygdala preceded or 'led' SWR by 15.3 ms ± 10 ms (μ ± σ) when reward was denied. ERP in the entorhinal cortex also accompanied denied rewards, but they occurred after the SWR rather than preceding it, and lagged behind the amygdalar ERP by 18.4 ms ± 8 ms (μ ± σ) (Figure 6A, right).

SWR and reward contingency learning
We calculated the SWR rate over the duration of each session and found that, consistent with the rodent literature, the ripple rate decreased as the session conditions were learned (Papale et al., 2016). Figure 2 shows the mean learning curves for all three NHP overlaid with the mean number of SWR per trial averaged across all sessions. The correlation coefficient between the mean learning curve and the mean number of SWR per trial was calculated for each NHP and is shown in the top right-hand corner for each NHP. The correlation coefficient was -0.84 for NHP L, -0.75 for NHP W and -0.36 for NHP R. All learning curves show an increasing probability of a correct response with increasing trial number. The initial slopes of the NHP R and NHP W learning curves are likely steep because of their extensive experience with the task. The negative correlation of SWR per trial with learning was greatest in NHP L, who had the least experience of the 3 NHP.
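A correlation of this kind can be computed as a Pearson coefficient between two per-trial series. The sketch below assumes Pearson's r was the coefficient used, which the text does not state, and the two short series are invented placeholders rather than values from Figure 2.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Placeholder series: probability correct rises while SWR per trial falls.
p_correct = [0.50, 0.60, 0.70, 0.80, 0.90]
swr_per_trial = [5.0, 4.0, 4.0, 2.0, 1.0]
r = pearson_r(p_correct, swr_per_trial)
```

A value of r near -1 indicates that SWR per trial falls almost linearly as the probability of a correct response rises, which is the pattern reported most strongly for NHP L.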

Discussion
Memory is not a passive record of sensory experiences; it is continuously updated through the integration of new information within the context of goals and previous experience. This is the first study to correlate hippocampal SWR with the occurrence of unexpected events, highlighting a potential mechanism for their prioritized encoding into memory during learning. We demonstrated that SWR were maximally generated with unexpected positive outcomes. Unexpected negative outcomes were associated with fewer SWR than unexpected positive outcomes, but they were still associated with more SWR than expected positive outcomes. In other words, as a task is learned with high fidelity and behaviours result in more frequent positive expected outcomes, SWR are produced less often. Our observations are summarized in Figure 6.
We are not the first to propose that SWR generation may not occur spontaneously across all consummatory behaviors as previously described (György Buzsáki, 2015). Recently, Abadchi et al. also proposed that SWR may follow extrahippocampal modulation. Abadchi et al. described that, in anesthetised rats, older and more consolidated memories may initiate the hippocampal-cortical dialog (Abadchi et al., 2020). Our limited ERP data correlate extrahippocampal modulation with behavior. We demonstrate that SWR generation may be associated with ERP in mesial temporal lobe structures such as the amygdala and the entorhinal cortex, two structures known to support episodic memory formation. Amygdalar activity preceded task-related events associated with hippocampal SWR (e.g. denial of reward), which is consistent with the proposed role of the amygdala in salience and expectancy (Zheng et al., 2017).

Comparison to rodent SWR
Unlike in NHP, awake SWR in rodents seem to occur more frequently in reward locations than error locations (Ekstrom, 2015; Karlsson & Frank, 2009; Redish, 1999; Singer & Frank, 2009). This may highlight a difference between NHP and rodent memory systems. Alternatively, perhaps rodent SWR occur in effective error zones (such as when a target is missed) that are near reward locations. Ekstrom speculated that rodents may have more finely tuned spatial maps than primates (Ekstrom, 2015). If that were true, rodent SWR may refine target locations with a higher resolution in their internal representations of space. With higher resolution analysis of where SWR occur, we may find that even rodent SWR occur very near, but not in, task targets and are modulated by rodent expectations.
As rodents learn a behaviour, their hippocampal neurons appear to fire in patterns that are predictive, rather than representative, of the behaviour (Buckner, 2010; Buckner & Carroll, 2007; Lisman & Redish, 2009; Schacter & Addis, 2007; Stachenfeld et al., 2017). Sequences of recently active hippocampal place fields, which represent a pathway in a maze for instance, will replay in a time-compressed format before the animal runs down that pathway. Hippocampal replay of recent experience, including these predictive patterns, is condensed during SWR in rodents (Joo & Frank, 2018). These predictive patterns usually occur at the 'choice points' of a maze and encode future behaviour. Often called 'vicarious trial and error', these predictive patterns have been proposed as the neural correlate for deliberating or imagining which route to take. As a learned behaviour becomes automated, the vicarious trial and error firing patterns vanish (Johnson & Redish, 2007; Papale et al., 2016). Our results may be consistent with this theory. We show that the probability of a correct response and the number of ripples per trial are negatively correlated and that SWR appear with unexpected events. We therefore conclude that SWR generation is associated with a mismatch between expectation and perception. However, this does not rule out the possibility that mismatches incite deliberation in NHP.
The seminal works of Hollup et al. and Packard & McGaugh in rodents suggest that the hippocampus is required for context learning (when no visual target is provided, so the subjects must recall the location of a goal from context) and not for cue learning (when the subject just navigates towards a visual goal) (Hollup et al., 2001). They did not factor the expectations of the recorded subject into this conclusion. Our results suggest that they would see reactivation of hippocampal neurons if the expected reward was randomly withheld (or randomly provided) after a cue-reward association is learned. In support of this idea, Fyhn et al. described a subset of rat hippocampal neurons that become selectively activated the first time a change is made to a task the animal has learned (Fyhn et al., 2002). Our findings are consistent with this increased hippocampal activation following changes to a learned task. In addition, our findings support the proposed link between behavioural salience and hippocampal activation for goal-directed or landmark-based navigation (Gothard et al., 1996).
While self-position may not be robustly encoded by the NHP's hippocampus, as it is in rodents (Courellis et al., 2019), cells that are distinctly tuned to other aspects of experience have been reported in NHP (i.e. cells tuned to reward locations and objects). Single units should be recorded in those robustly tuned neuron populations in NHP to determine how predictive patterns emerge and change with prediction error. Rodent data show clear differences in the hippocampal activation of novice animals and those previously trained in a similar task (Gauthier & Tank, 2018). Extensive training of an NHP, typical in this field of research, likely also impacts NHP hippocampal electrophysiology, including a possible overall reduction in SWR generation if reward contingencies are stable, or perhaps increased SWR generation if the task involves the occasional surprise.
Finally, the differences between findings in NHP and rodents may be related to differences in how they explore their environments. Rodents, with poor vision relative to primates but enhanced olfaction and somatosensation via specializations such as whiskers, generally visit locations of potential rewards as a means of exploration. Moreover, place fields seem robust in rodents (O'Keefe & Conway, 1978) while highly dependent on task contingencies in primates. Primates are visual animals with high-resolution foveas and color vision that explore the environment through gaze fixation and then visit potentially rewarded locations. Gaze fields have been described in primate hippocampus (Rolls & Wirth, 2018) but not in rodents. One issue that needs to be determined is whether hippocampal maps in primates are more contingent upon gaze direction than spatial position. Either way, the differences in sensory systems and exploration strategies between species and the lack of sufficient data in primates make it difficult to position our findings within the framework of the vast hippocampus literature in rodents. Over the coming years, the use of virtual environments and freely moving primates in hippocampus studies may help to clarify these issues.

Limitations
All recordings were obtained from the right hippocampus. There may be differences in hemispheric specialization in the primate hippocampus. Human studies indicate that exploration and recognition show laterality, with the right hippocampus preferentially activating during exploration of new scenes and the left hippocampus preferentially activating during repeated scenes (Lee et al., 2016). The right hippocampus also seems to be preferentially recruited for large-scale, allocentric navigation whereas the left hippocampus is recruited for temporal sequence memory, egocentric navigation, autobiographical experience memory and match-mismatch associative memory (Iglói et al., 2010; Kumaran & Maguire, 2007; Lehn et al., 2009; Maguire & Frith, 2003). In humans, slow-wave sleep, during which SWR are concentrated, promotes dialog between the two hippocampi and has been correlated with successful route memory retrieval (Peigneux et al., 2004). That SWR travel from the CA3 region to the CA1 region, as well as along the septotemporal axis of the hippocampus, leads us to consider the possibility of SWR propagation between hippocampi as a travelling wave (Both et al., 2008; Patel et al., 2013). Future experiments should consider bilateral hippocampal recordings to determine whether there is hemispheric specificity for prediction error.
Another limitation is that our hippocampal recordings were performed only in the temporal areas of the hippocampus rather than along its septotemporal axis. Differences in the function of hippocampal subregions along the septotemporal axis have been described. In primates, the septal hippocampus is recruited during the encoding of spatial memory (E. Moser et al., 1993;M. B. Moser & Moser, 1998). In humans, episodic memory encoding preferentially recruits the temporal hippocampus while retrieval seems to recruit septal regions (E. Moser et al., 1993). Our data support the view that the temporal hippocampus, together with its input from the amygdala, is involved in processing unexpected events.
Another possible limitation of our study is that we only studied male NHP. Although there is not a large body of evidence that hippocampal processing differs between males and females, future studies should aim for a balance of male and female NHP. This will help determine whether there are behavioural or electrophysiological sex differences in learning or consolidation. We found one study suggesting that estradiol-related variability in female NHP may influence spatial cognitive task performance (Lacreuse et al., 2001).

The hippocampus and novelty
It is well documented that the mesial temporal lobes, including the hippocampus, respond to novelty (Gabrieli et al., 1997;Howard et al., 2011;Jessen et al., 2002;Köhler et al., 2005;Ranganath & Rainer, 2003;Yassa & Stark, 2009). However, the hippocampus is not simply a novelty detector, as it also habituates to accumulating novelty (Strange and Dolan 2001;Yamaguchi et al. 2004). Murty et al. (2016) noticed increased hippocampal activation when surprise was associated with reinforcement; this increased blood oxygen level-dependent (BOLD) activation may reflect the increased SWR generation that we saw after surprise reward delivery (Gabrieli et al., 1997;Howard et al., 2011;Jessen et al., 2002;Köhler et al., 2005;Ranganath & Rainer, 2003;Yassa & Stark, 2009). The novelty response, measured with BOLD in human hippocampi, diminishes as more novel stimuli are presented (Murty et al., 2013). Novelty detection appears to be modulated by a dynamic network which suppresses the novelty response in the absence of behaviorally relevant outcomes.
Habituation to novelty is seen in multiple areas in the brain and could be an evolutionary adaptation for energy conservation (Sokolov 1963;Rankin et al. 2009). As behaviour is automated, the decrease in SWR with learning could reflect a shift to circuits that require fewer cognitive resources.

Dynamic prediction model learning
Based on our findings, we propose a salience-focused, schematic concept of the classical two-stage model of learning and memory formation (Buzsáki, 1989) (Figure 7A). The classical two-stage model summarizes the multifarious processes which underlie memory formation. Stage 1 (learning) is dominated by afferent theta oscillations (3-7 Hz) from cortices that modify synaptic strengths in the hippocampus to transiently store learned information. Stage 2 (consolidation) is dominated by SWR activity extending from the hippocampus to the cortex and supports the consolidation of long-term memories. In this model, awake SWR tend to occur spontaneously during "consummatory behaviours" that pursue the "consummation" or completion of a goal-directed behaviour. The occurrence of SWR after the completion of a behaviour is in concordance with our findings. However, our goal is to focus on why unexpected events are remembered while most experiences are forgotten.
Additionally, there are dissimilarities between rodent and primate hippocampal oscillatory activity. For example, in stage 1 the strong, protracted theta-band activity observed in rodents is unlike that observed in primates; NHP theta power increases are relatively weak and transient (approximately 0.4 seconds). To remember a sequence of experiences, the order of their occurrence in the external world may be represented in the primate brain by the timing of neuron population firing in the gamma frequency (>30 Hz) with respect to the phase of an underlying theta rhythm (Friese et al., 2013). Jensen et al. proposed that these rhythms allow for a top-down, time-compressed encoding of extended experiences (Jensen et al., 1996). Hippocampal replay of these time-compressed representations of experiences is concentrated during SWR bursts which punctuate the coordinated theta-gamma activity (Axmacher et al., 2010;Heusser et al., 2016;Lega et al., 2016;Peyrache et al., 2009).
Our results demonstrate a relationship between hippocampal SWR and expectancy, which can be dynamically estimated by the degree of prediction error. We propose a theoretical schematic model of the dominant role of hippocampal SWR in awake NHP (Figure 7B). New information is incorporated into internal models of the world when there is a difference between an expected and a perceived outcome.
SWR drive the update of internal models with new or unexpected information, which is transiently encoded by the hippocampus and then alters neocortical and subcortical targets, resulting in learning and memory encoding.
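The notion of prediction error used above can be made concrete with a minimal sketch. It uses a standard delta-rule update, which is an assumption chosen for illustration and not the model fitted or proposed in this study; the function name `update_expectation`, the learning rate and the trial sequence are all hypothetical.

```python
def update_expectation(expected, outcome, learning_rate=0.2):
    """Return (prediction_error, new_expectation) for one trial.

    Delta rule (illustrative assumption): the expectation moves toward
    the observed outcome by a fraction of the prediction error.
    """
    prediction_error = outcome - expected
    return prediction_error, expected + learning_rate * prediction_error


# Hypothetical trial sequence: reward delivered (1.0) on five trials,
# then unexpectedly withheld (0.0) on the last trial.
outcomes = [1.0, 1.0, 1.0, 1.0, 1.0, 0.0]

expected = 0.0
errors = []
for outcome in outcomes:
    error, expected = update_expectation(expected, outcome)
    errors.append(error)

# The absolute error shrinks while the reward is predictable and
# grows again (with negative sign) when the reward is denied.
print([round(abs(e), 3) for e in errors])
```

In this toy setting, a large absolute error corresponds to the conditions (surprise reward, reward denial) under which we observed elevated SWR rates; the sketch makes no claim about how SWR rate scales with error magnitude.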

Conclusions
Prioritizing the integration of relevant information into internal models of the world enables adjustments to behaviors, which draw on relevant experience, thereby increasing the frequency of desired outcomes. Importantly, salient unexpected events are incorporated into memory in order to update our internal model of the world. Our study proposes a mechanism in primates by which behaviorally relevant experiences result in learning. If task performance is based on an internal construct of the task contingencies, e.g. a problem space (Newell & Simon, 1972), our data suggest that awake hippocampal SWR coincide with unexpected events in the subject's problem space. Awake hippocampal SWR give an NHP the flexibility to unlearn, reconstruct or refine these problem spaces, thereby updating its internal model of the world with novel information.

Figure 2
Probability of a correct response as a function of trial number (i.e. the "learning curve") and its relationship to SWR per trial. The black line shows the mean probability of a correct response as a function of trial number (coral, purple and blue shading = 95% confidence interval); the horizontal, dashed line shows chance performance. The central grey line shows the mean number of ripples per trial (light grey shading = standard error). r = correlation coefficient for the specified NHP.

Figure 3
Sharp-wave ripple (SWR) detection and artifact frequency identification. A. Three awake SWR corresponding with an increase in power around 200 Hz in the continuous wavelet transform of the raw LFP data (top left) and the output of our peak-detection software (bottom left). Current source density is used to localize hippocampal layers: o, stratum oriens; Pyr3, hilus CA3 pyramidal layer; Rad, stratum radiatum; m, molecular layer; g, granular layer; Pyr1, CA1 pyramidal layer. Red indicates a current source while blue indicates a current sink. B. Mean, unfiltered band power for task-free drinking and quiet rest. Mean power was multiplied by frequency to correct for 1/f power scaling. The frequency band of interest is highlighted in red.
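
The 1/f correction described for panel B can be sketched as follows. This is an illustrative example under stated assumptions rather than our analysis code: the function name `correct_one_over_f` and the spectrum values are hypothetical, chosen so that a pure 1/f background flattens and a bump near 200 Hz stands out.

```python
def correct_one_over_f(frequencies_hz, mean_power):
    """Multiply mean power by frequency at each frequency bin.

    A background that falls off as 1/f becomes flat after this step,
    so band-limited increases are easier to see.
    """
    return [f * p for f, p in zip(frequencies_hz, mean_power)]


# Hypothetical spectrum: 1/f background plus extra power at 200 Hz.
freqs = [50.0, 100.0, 200.0, 400.0]
power = [1 / 50, 1 / 100, 1 / 200 + 0.01, 1 / 400]

corrected = correct_one_over_f(freqs, power)
peak_frequency = freqs[corrected.index(max(corrected))]
print(corrected, peak_frequency)
```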

Figure 4
Spatial distribution of SWR occurrence. A. Average density scatter plot of SWR occurrence. Red indicates high ripple density. The invisible target is denoted with a white dashed circle. This distribution is for NHP L, who showed the highest power in the correct arm but was otherwise representative of all 3 NHP. B. SWR density for the reversal learning task. Mean and 95% confidence interval of ripples per zone for NHP L. C. SWR density for the associative learning task. Mean and 95% confidence interval of ripples per zone for NHP R and NHP W. D. The SWR factor. Mean ripples per zone divided by time spent in each zone. The percentage of ripples in the error zone was significant independent of the time spent in each zone. An asterisk indicates that the mean was significantly different from all other means.
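
The SWR factor in panel D is a simple ratio, sketched below for clarity. The function name `swr_factor`, the zone labels and all numbers are hypothetical and are not data from this study; skipping unvisited zones is an assumption made to avoid division by zero.

```python
def swr_factor(ripples_per_zone, seconds_per_zone):
    """Ripples per zone divided by the time spent in that zone."""
    factor = {}
    for zone, ripples in ripples_per_zone.items():
        seconds = seconds_per_zone[zone]
        # Zones that were never visited have no defined rate; skip them.
        if seconds > 0:
            factor[zone] = ripples / seconds
    return factor


# Hypothetical values for illustration only.
ripples = {"start": 4, "correct arm": 6, "error zone": 9}
seconds = {"start": 40.0, "correct arm": 30.0, "error zone": 15.0}

factors = swr_factor(ripples, seconds)
highest = max(factors, key=factors.get)
print(factors, highest)
```

Normalizing by occupancy in this way separates zones where ripples are frequent from zones where the animal simply spends more time.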

Figure 5
Ripple rate, outcome and reward manipulation. A. Ripple rates associated with different outcomes in version 1 (V1, target invisible), version 2 (V2, target visible) and version 3 (V3, surprise off target) of the reversal learning task described in Table 1. B. Average ripple rates in the associative learning task for correct outcomes (NHP chose low or high when the alternative was none and low, respectively) and incorrect (i.e. error) outcomes, where the NHP chose none or low when the alternative was low or high, respectively (Top). Mean ripple rates after high and low reward in the associative learning task; error bars show 1 standard deviation (Bottom).

Figure 6
The relationship between hippocampal SWR and event-related potentials (ERP) in the amygdala and entorhinal cortex of NHP L. A. ERP in the amygdala (left) and entorhinal cortex (right). The average LFP and standard error time-locked to hippocampal SWR which occur after missing a target (Denied reward) or during exploration of the virtual world (Navigating). B. Relative lead and lag times of ERP onset to SWR peak in the amygdala (AMY) and entorhinal cortex (EC).

Figure 7
Our dynamic prediction model for awake learning developed from NHP data. Prediction errors are reflected in SWR which support memory consolidation and modify the internal model of reality, on which goal-directed behaviour is predicated.

Figure 8
SWR occurred after unexpected changes in learned task contingencies. The top left schematic shows the NHP brain and the electrophysiological signatures (average ERP and SWR) associated with each recorded region. The vertical bar indicates how these signatures align. The right side summarizes our results: the top panel shows that when a reward is expected and delivered, SWR do not occur, whereas in the middle panel (a reward is expected and not delivered) and the bottom panel (a reward is not expected and delivered), more SWR occur. This is the same for trials with a visible target (not included in the schematic). In the bottom left, a learning curve from Figure 2 is revisited, overlaid with the mean SWR per trial, and their correlation is shown. The central grey line shows the mean number of ripples per trial (light grey shading = standard error); r is the correlation coefficient for NHP L.