Reward disrupts reactivated human skill memory

Accumulating evidence across species and memory domains shows that when an existing memory is reactivated, it becomes susceptible to modifications. However, the potential role of reward signals in these mechanisms underlying human memory dynamics is unknown. Leaning on a wealth of findings on the role of reward in reinforcing memory, we tested the impact of reinforcing a skill memory trace with monetary reward following memory reactivation, on strengthening of the memory trace. Reinforcing reactivated memories did not strengthen the memory, but rather led to disruption of the memory trace, breaking down the link between memory reactivation and subsequent memory strength. Statistical modeling further revealed a strong mediating role for memory reactivation in linking between memory encoding and subsequent memory strength only when the memory was replayed without reinforcement. We suggest that, rather than reinforcing the existing memory trace, reward creates a competing memory trace, impairing expression of the original reward-free memory. This mechanism sheds light on the processes underlying skill acquisition, having wide translational implications.

Applying reward following reactivation did not strengthen memory retention (Δ = Day 3 test − Day 1 end) compared to unrewarded reactivation (t(22) = −1.834, ns; Fig. 2a); if anything, the rewarded group showed a tendency towards inferior retention. To better understand the role of reward in skill memory reactivation, we next examined whether the strength of reactivation correlated with subjects' subsequent memory strength on Day 3. Whereas a statistically significant correlation was found between reactivation strength and memory strength on Day 3 in the unrewarded group (r = 0.616, p = 0.033; Fig. 2b), this correlation was abolished in the rewarded group (r = −0.243, ns; Fig. 2c). Both correlations were robust against outliers (percentage-bend correlation r = 0.606 and r = −0.08 in the unrewarded and rewarded groups, respectively), and they differed significantly from one another (Fisher Z = 2.05, p = 0.0404).
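The comparison between the two correlations can be reproduced from the reported values alone. The following is a minimal sketch of a Fisher r-to-z test for two independent correlations, assuming 12 subjects per group (24 subjects divided equally, as described in the experimental procedure). The function name `fisher_z_test` is illustrative and not taken from the original analysis code.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """Compare two independent Pearson correlations (two-tailed)."""
    # Fisher transformation: atanh(r) is approximately normal
    # with variance 1 / (n - 3).
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-tailed p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Reported correlations: unrewarded r = 0.616, rewarded r = -0.243.
    # Group size of 12 each is inferred from n = 24 split equally.
    z, p = fisher_z_test(0.616, 12, -0.243, 12)
    print(f"Z = {z:.2f}, p = {p:.4f}")
```

With the reported correlations, this sketch gives Z ≈ 2.05 and p ≈ 0.040, in line with the values above.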
To further uncover the effect of reward on memory reactivation and subsequent memory strength, we considered two competing models. According to the first, direct model, the magnitude of memory on Day 3 depends on how well the memory was encoded, regardless of reactivation (Fig. 3a). In the second, indirect model, the effects of encoding on Day 3 memory strength are mediated by the strength of reactivation (Fig. 3a). Modeling the behavior displayed by the rewarded and unrewarded groups revealed clear differences between the two. Regression analysis indicated an overall similar strength, across the two groups, in each of the direct paths in the model (i.e., encoding to Day 3 memory strength, encoding to reactivation, and reactivation to Day 3 memory strength; Fig. 3b,c, Table 1). However, testing for mediation effects via the indirect path from encoding to Day 3 memory strength, using bias-corrected confidence estimates and bootstrap resampling 21,22 , showed that the effect of encoding on Day 3 memory strength was significantly mediated by reactivation in the unrewarded group (B = 1.284, confidence interval = 0.241 to 2.849, Table 2), but not in the rewarded group (B = 0.469, confidence interval = −0.23 to 1.01, Table 2).
To separate the effects of reactivation from the effects of reward per se, and to control for possible intervening effects stemming from the timing of reinforcement, we ran a second experiment in which subjects (n = 12) underwent the same experimental procedures, except that performance on the reactivation trial was reinforced after a fixed delay of 6 h, a period which may exceed the time window during which memory is susceptible to further modification following reactivation 10 . In this delayed reward condition, reactivation strength significantly correlated with Day 3 memory strength (r = 0.678, p = 0.015; Fig. 4), and this correlation did not differ from that of the unrewarded group (Z = 0.23, ns). Additionally, retention did not differ from that of the unrewarded group (t(22) = 1.527, ns). Overall, these results suggest that reinforced memory reactivation disrupts subsequent memory strength.

Discussion
The goal of this study was to identify the role of reinforcement in reactivated skill memory. As reward has been shown to mediate the encoding of information in both the procedural and declarative memory systems 11-17 , we reasoned that reinforcing an already encoded procedural memory following reactivation would impact strengthening of the memory trace and facilitate retention. We report however that reactivation with reward did not strengthen memory, relative to reactivation with no reward. Moreover, the relation between the strength of memory reactivation and subsequent memory as found in reward-free reactivation, was reduced for reward-based reactivation. In addition, statistical modeling revealed that unrewarded, as opposed to rewarded reactivation, indirectly mediated the link between encoding and subsequent memory strength.
[Figure caption, panels c and d] (c) The experiment included three sessions. In the first session (Day 1) all subjects performed twelve trials of the sequential finger-tapping task. On the next session (Day 2), participants were divided into two groups, the first performing one reactivation trial with reinforcement and the second with performance feedback instead of reinforcement, allowing to tease apart monetary reward and simple performance feedback per se. In the third session (Day 3) subsequent memory strength was measured by having both groups perform three test trials of the task, with no performance or reward feedback. (d) Both the rewarded and unrewarded groups showed comparable encoding of the skill memory. Shaded lines denote standard errors of the mean.
A viable framework for interpreting the current results is that of competition between memory traces and memory systems. It is by now well accepted that various tasks invoke competition between memory systems that rely on dissociable brain networks and distinct computational processes 17,23-25 . Competitive memory dynamics can also be formed between memory traces, and may specifically originate from reinforcement mechanisms. For instance, encoding of reward-associated items interferes with the encoding of non-reinforced mnemonic representations 25 . Moreover, in episodic memory, competitive dynamics between memory encoding and learning from reward have been documented 17 , consistent with differential engagement of the medial temporal lobes and the striatum during learning 25 . Thus, a feasible framework for interpreting the disruptive effect of reinforcement following reactivation on subsequent memory strength is that the introduction of reward following reactivation may have resulted in a competition with the original encoded trace, which was averted when the memory trace was replayed with no reward. This framework is consistent with findings in Pavlovian learning demonstrating retroactive interference of new memories on reinstated memories 26-28 , or modification of reactivated memories through counterconditioning with new reinforcers 29-31 .
An alternative but related explanation of the unfavorable effects of reinforcement following reactivation on subsequent memory strength is based upon a prediction error mechanism. Prediction errors are believed to be a prerequisite for memory destabilization and reconsolidation 32-34 . However, it was recently suggested that memories can be weakened when they are mispredicted by the context that was originally associated with these memories 35 . The current results may reflect a related prediction error mechanism, whereby the addition of reinforcement during the replay of procedural memory generates a prediction error which subsequently weakens memory. This prediction error is absent when the memory trace is replayed with no reward, leading to superior subsequent memory. Thus, in this respect reward may generate a previously unencountered context which ultimately weakens the memory trace. Future research should take into account that a combination of memory competition and prediction error mechanisms may underlie the disruptive effect reward has on the reactivated skill memory, both generating an impairment in subsequent memory strength. Our results further indicate that delaying the receipt of reward after the reactivation trial results in an intact significant relation between reactivation strength and subsequent memory strength.
Taken together, these results open interesting avenues for future research. First, it remains to be tested whether extrinsic modulation of reward systems, by means of pharmacological interventions 36,37 or non-invasive brain stimulation 38 , exerts a similar influence on memory reactivation and subsequent memory strength. This would help to further uncover the role of dopaminergic neuromodulation during memory reactivation.
Second, the putative competitive dynamics between memory traces and their underlying neural underpinning could be further tested using brain imaging techniques, suitable for probing complex information representations, such as multivariate pattern classification analysis 39,40 .
The notion that existing memories can be modified with external interventions has far-reaching clinical implications, as such interventions can be employed, for instance, to disrupt maladaptive memories after post-traumatic stress or to reduce drug craving in addiction 41 . The current results demonstrate the contextual specificity required for these interventions and point to the need for additional studies to further delineate the role of memory reactivation in shaping subsequent memory strength.

Subjects.
A total of thirty-six right-handed healthy subjects (13 men, 23 women; mean age ± standard deviation: 24.8 ± 2.2 years) participated in the study. All subjects gave their written informed consent, and the study was approved by Tel Aviv University's Ethics committee. All procedures were in accordance with approved guidelines. Musicians (past or present) were excluded from participating in the study. We additionally required at least 6 h of sleep prior to each experimental session.
Task. During the experiment subjects were asked to perform a sequential finger-tapping task 18-20,42 . Each trial in the task lasted 30 s, during which subjects had to repeatedly tap, with their left non-dominant hand, a 5-element sequence of finger movements as quickly and accurately as possible (the sequence was 4-1-3-2-4, whereby '1', '2', '3' and '4' correspond to tapping of the index, middle, ring and little fingers, respectively). Tapping was performed on a 4-key response box (Cedrus, Lumina, Model LU440), placed in front of subjects during the experiment. Performance in the task was quantified as the number of correct sequences tapped during each trial 18,19,42 . The same sequence was used in all experiments and sessions. Throughout each trial, each key press produced a dot displayed at the top portion of the screen, with the dots accumulating from left to right as the trial progressed. Trials were separated by 30 s breaks.
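As an illustration of the performance measure, the sketch below counts correct sequences in a recorded stream of key presses. The text does not specify how errors or partially completed sequences were scored, so the scoring rule used here (scan left to right and count complete, non-overlapping occurrences of 4-1-3-2-4) is an assumption, as is the function name `count_correct_sequences`.

```python
TARGET_SEQUENCE = (4, 1, 3, 2, 4)  # sequence given in the task description


def count_correct_sequences(key_presses, target=TARGET_SEQUENCE):
    """Count complete, non-overlapping occurrences of `target`.

    Assumption: after a correct sequence the scan resumes at the next
    key press; after a mismatch it advances by a single key press.
    """
    presses = tuple(key_presses)
    length = len(target)
    count = 0
    i = 0
    while i + length <= len(presses):
        if presses[i:i + length] == tuple(target):
            count += 1
            i += length  # skip past the completed sequence
        else:
            i += 1  # slide forward to look for the next sequence start
    return count


if __name__ == "__main__":
    # Two clean repetitions of the sequence.
    print(count_correct_sequences([4, 1, 3, 2, 4, 4, 1, 3, 2, 4]))
    # One error (3 pressed instead of 2), followed by a correct sequence.
    print(count_correct_sequences([4, 1, 3, 3, 4, 1, 3, 2, 4]))
```

Because the sequence begins and ends with the same key, counting non-overlapping matches avoids treating the final '4' of one repetition as the start of the next.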
Experimental procedure. The first experiment comprised three sessions, administered on three consecutive days. In the first session (Day 1) all subjects (n = 24) performed twelve trials of the sequential finger-tapping task. In the next session (Day 2), which was administered 24 hours later, participants in the main experiment were divided equally into two groups. The first, rewarded group performed one reactivation trial in which each successful sequence within the trial was reinforced with monetary reward. Subjects were explicitly told that they would be reinforced at the end of the trial. The total reward earned in the trial was displayed on the screen right after the completion of the trial (indicating: "you won X Shekels!", with X being the amount of Israeli Shekels, equal to the total number of correct sequences performed in the trial). Instructions given to this group of subjects prior to the task explicitly indicated that they would be monetarily rewarded with 1 Israeli Shekel for each correct sequence they performed during the task. A second, unrewarded group performed one reactivation trial with no monetary reward; however, this group of subjects received performance feedback at the end of the trial indicating how many successful sequences they were able to tap during the trial ("you tapped X correct sequences!"). Subjects were explicitly told that they would receive performance feedback at the end of the trial. This design made it possible to tease apart monetary reward and simple performance feedback per se. Thus, both groups were administered a reactivation trial on Day 2, eliminating differences resulting from retrieval-induced forgetting 43,44 .
In the third session (Day 3) all participants performed three regular trials of the task, with no performance or reward feedback. In a second experiment, subjects (n = 12) were reinforced in accordance with their performance in the reactivation trial, similar to the rewarded group. However, in this experiment reward feedback was provided 6 hours after the completion of the reactivation trial (in the same way as in the rewarded group). Instructions in this experiment indicated to the subjects that they would "receive feedback" about their performance 6 hours after the reactivation trial (i.e., the subjects were not aware that their performance would be reinforced with monetary reward).

Data analysis.
To test for group differences in encoding, a repeated measures analysis of variance (ANOVA) was performed with group serving as the between-subjects factor, and the first and last three trials of Day 1 as repeated measures. The ANOVA was preceded by Mauchly's test of sphericity, to confirm that the assumption of sphericity was not violated. Retention was defined as the difference in performance between Day 3 and the last three trials of Day 1. As in previous studies using the same task, to better characterize memory strength by minimizing motor fatigue-related decrements in performance, the best two trials were considered for Day 1 post-training and for Day 3 memory strength 45,46 . Two-tailed tests were used in all analyses. The relationship between reactivation strength (defined as the difference in performance between Day 2 reactivation and Day 1 post-training) and Day 3 memory strength was tested with a Pearson's correlation. We additionally assessed the robustness of these correlations against outliers using the percentage-bend correlation technique 47 , as implemented in the "Robust Correlation Toolbox" in MATLAB 48 . Group differences in the strength of correlation were tested with a Fisher's r-to-z transformation. Mediation effects were tested using regression analysis and bias-corrected confidence estimates. First, simple linear regression analyses between each component in the mediation model were performed, including the effect of Day 1 encoding on Day 2 reactivation (A path), Day 2 reactivation on Day 3 subsequent memory strength (B path), Day 1 encoding on Day 3 memory strength (C path), and encoding on Day 3 memory strength when the putative mediator (Day 2 reactivation) is also in the model (the C' path). In this model, the indirect effect estimates the degree to which encoding exerts an indirect effect on Day 3 memory strength through reactivation (the mediator).
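The behavioural measures defined above can be sketched for a single subject as follows. The text states that the best two trials were considered for Day 1 post-training and for Day 3; averaging those two trials is an assumption made here for illustration, and the function names and example trial counts are hypothetical.

```python
def best_two_mean(trials):
    """Mean of the two highest scores (averaging is an assumption)."""
    top_two = sorted(trials, reverse=True)[:2]
    return sum(top_two) / len(top_two)


def behavioural_measures(day1_trials, day2_reactivation, day3_trials):
    """Compute per-subject measures from correct-sequence counts.

    day1_trials: twelve Day 1 training trials
    day2_reactivation: the single Day 2 reactivation trial
    day3_trials: three Day 3 test trials
    """
    # Day 1 post-training: best two of the last three training trials.
    post_training = best_two_mean(day1_trials[-3:])
    # Day 3 memory strength: best two of the three test trials.
    day3_strength = best_two_mean(day3_trials)
    return {
        "retention": day3_strength - post_training,
        "reactivation_strength": day2_reactivation - post_training,
        "day3_strength": day3_strength,
    }


if __name__ == "__main__":
    # Hypothetical subject; these numbers are not data from the study.
    day1 = [8, 9, 10, 11, 12, 12, 13, 14, 14, 15, 16, 14]
    print(behavioural_measures(day1, 17, [16, 18, 17]))
```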
Mediation effects (the indirect path) were tested using bootstrapping with bias-corrected confidence estimates 21,49 , based on 5000 bootstrap resamples 22 and a 99% confidence interval (chosen to account for multiple comparisons). A confidence interval that included 0 indicated that the null hypothesis (no mediation effect) could not be rejected, whereas an interval that excluded 0 indicated that the null hypothesis should be rejected.
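The mediation test can be illustrated with a minimal sketch of a bias-corrected bootstrap for the indirect effect (the product of the A path and the B path, with B estimated while encoding is also in the model). This is not the software used in the study; the function names are illustrative, NumPy is assumed to be available, and the demonstration data are simulated rather than taken from the experiment.

```python
import numpy as np
from statistics import NormalDist


def indirect_effect(x, m, y):
    """Indirect effect a*b: x -> m (a), and m -> y controlling for x (b)."""
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    a = np.polyfit(x, m, 1)[0]  # slope of mediator on predictor
    design = np.column_stack([np.ones_like(x), x, m])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    b = coefs[2]  # mediator coefficient with x in the model
    return a * b


def bc_bootstrap_ci(x, m, y, n_boot=5000, level=0.99, seed=0):
    """Bias-corrected bootstrap confidence interval for the indirect effect."""
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    rng = np.random.default_rng(seed)
    n = len(x)
    estimate = indirect_effect(x, m, y)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        boots[i] = indirect_effect(x[idx], m[idx], y[idx])
    nd = NormalDist()
    # Bias-correction constant: how far the bootstrap median sits from
    # the sample estimate, clipped so the inverse CDF stays finite.
    prop = float(np.clip(np.mean(boots < estimate), 1 / n_boot, 1 - 1 / n_boot))
    z0 = nd.inv_cdf(prop)
    alpha = 1.0 - level
    lower_p = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    upper_p = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    lower, upper = np.quantile(boots, [lower_p, upper_p])
    return estimate, lower, upper


if __name__ == "__main__":
    # Simulated data with a built-in mediation effect (not study data).
    rng = np.random.default_rng(1)
    encoding = rng.normal(size=100)
    reactivation = 0.8 * encoding + rng.normal(scale=0.5, size=100)
    day3 = 0.7 * reactivation + rng.normal(scale=0.5, size=100)
    est, lo, hi = bc_bootstrap_ci(encoding, reactivation, day3)
    print(f"indirect effect = {est:.3f}, 99% CI = [{lo:.3f}, {hi:.3f}]")
```

An interval returned by `bc_bootstrap_ci` that excludes 0 corresponds to rejecting the null hypothesis of no mediation, following the decision rule described above.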