Reinstated episodic context guides sampling-based decisions for reward

Abstract

How does experience inform decisions? In episodic sampling, decisions are guided by a few episodic memories of past choices. This process can yield choice patterns similar to model-free reinforcement learning; however, samples can vary from trial to trial, causing decisions to vary. Here we show that context retrieved during episodic sampling can cause choice behavior to deviate sharply from the predictions of reinforcement learning. Specifically, we show that, when a given memory is sampled, choices (in the present) are influenced by the properties of other decisions made in the same context as the sampled event. This effect is mediated by fMRI measures of context retrieval on each trial, suggesting a mechanism whereby cues trigger retrieval of context, which then triggers retrieval of other decisions from that context. This result establishes a new avenue by which experience can guide choice and, as such, has broad implications for the study of decisions.

Figure 1: Task design.
Figure 2: Context reward influences choices following a probe.
Figure 3: Context reward effect is mediated by scene reinstatement.

References

  1. Lau, B. & Glimcher, P.W. Dynamic response-by-response models of matching behavior in rhesus monkeys. J. Exp. Anal. Behav. 84, 555–579 (2005).

  2. Bornstein, A.M., Khaw, M.W., Shohamy, D. & Daw, N.D. Reminders of past choices bias decisions for reward in humans. Nat. Commun. 8, 15958 (2017).

  3. Daw, N.D., O'Doherty, J.P., Dayan, P., Seymour, B. & Dolan, R.J. Cortical substrates for exploratory decisions in humans. Nature 441, 876–879 (2006).

  4. Polyn, S.M., Norman, K.A. & Kahana, M.J. A context maintenance and retrieval model of organizational processes in free recall. Psychol. Rev. 116, 129–156 (2009).

  5. Manning, J.R., Kahana, M.J. & Norman, K.A. The role of context in episodic memory. in The Cognitive Neurosciences (ed. M. Gazzaniga) 557–566 (MIT Press, 2014).

  6. Sederberg, P.B., Gershman, S.J., Polyn, S.M. & Norman, K.A. Human memory reconsolidation can be explained using the temporal context model. Psychon. Bull. Rev. 18, 455–468 (2011).

  7. Gershman, S.J., Schapiro, A.C., Hupbach, A. & Norman, K.A. Neural context reinstatement predicts memory misattribution. J. Neurosci. 33, 8590–8595 (2013).

  8. Shadlen, M.N. & Shohamy, D. Decision making and sequential sampling from memory. Neuron 90, 927–939 (2016).

  9. Howard, M.W. & Kahana, M.J. A distributed representation of temporal context. J. Math. Psychol. 46, 269–299 (2002).

  10. Bornstein, A.M. & Daw, N.D. Cortical and hippocampal correlates of deliberation during model-based decisions for rewards in humans. PLoS Comput. Biol. 9, e1003387 (2013).

  11. Wimmer, G.E. & Shohamy, D. Preference by association: how memory mechanisms in the hippocampus bias decisions. Science 338, 270–273 (2012).

  12. Shohamy, D. & Daw, N.D. Integrating memories to guide decisions. Curr. Opin. Behav. Sci. 5, 85–90 (2015).

  13. Gershman, S.J. & Niv, Y. Learning latent structure: carving nature at its joints. Curr. Opin. Neurobiol. 20, 251–256 (2010).

  14. Bernheim, B.D. On the potential of neuroeconomics: a critical (but hopeful) appraisal. Am. Econ. J. Microecon. 1, 1–41 (2009).

  15. Weber, E.U. & Johnson, E.J. Constructing preferences from memory. in The Construction of Preference (eds. Lichtenstein, S. & Slovic, P.) 397–410 (Cambridge Univ. Press, 2006).

  16. Erev, I., Ert, E. & Yechiam, E. Loss aversion, diminishing sensitivity, and the effect of experience on repeated decisions. J. Behav. Decis. Mak. 21, 575–597 (2008).

  17. Brainard, D.H. The Psychophysics Toolbox. Spat. Vis. 10, 433–436 (1997).

  18. Smith, S.M. et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23 (Suppl. 1), 208–219 (2004).

  19. Behrens, T.E.J., Woolrich, M.W., Walton, M.E. & Rushworth, M.F.S. Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007).

  20. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978).

  21. Norman, K.A., Polyn, S.M., Detre, G.J. & Haxby, J.V. Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends Cogn. Sci. 10, 424–430 (2006).

  22. Epstein, R. & Kanwisher, N. A cortical representation of the local visual environment. Nature 392, 598–601 (1998).

  23. Schroeder, L.D., Sjoquist, D.L. & Stephan, P.E. Understanding Regression Analysis: An Introductory Guide (Sage, Beverly Hills, California, USA, 1986).

Acknowledgements

The authors wish to thank J. Manning of Dartmouth University for providing localizer code and stimuli, A. Schapiro, A. Rangel, J. Poppenk, M. deBettencourt, S. Chan and Y. Niv for fruitful discussions, and M. Aly, C. Honey and A. Shenhav for comments on an earlier version of the manuscript. This publication was made possible through the support of a grant from the John Templeton Foundation (grant ID #57876; K.A.N.). The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the John Templeton Foundation.

Author information

Contributions

A.M.B. and K.A.N. designed the experiment; A.M.B. ran the experiment; A.M.B. analyzed the data; A.M.B. and K.A.N. wrote the paper.

Corresponding author

Correspondence to Aaron M Bornstein.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Context guides which episodes are sampled next.

In the proposed mechanism, object images presented as memory probes cause participants to reinstate the trial episode on which the probed image was first encountered (first item in thought bubble). The reinstated episode carries with it the option chosen and reward received on that trial ($10 bill or phase-scrambled bill reflecting $0 reward). The episode also carries with it information about the context, or “casino room”, in which that trial took place. Reinstated context can lead to the subsequent reinstatement of other trials from that same context (ensuing items in thought bubble). Therefore, we predicted that memory probes would cause subsequent choices to be biased both by the rewards received on the reminded trial and by those received on other trials in the same context.

Supplementary Figure 2 Memory probes can provide multiple sources of value information.

As shown in Figure S1, probes can trigger the recall of both the trial on which the probed image was received and other trials in the same context. The payoff probabilities are designed such that, during the initial 10 trials in each room, one deck is more likely to pay out a reward than the others, but that deck is not likely to pay out well in the last 20 trials of that room (Figure 2). Thus, the participant is likely to choose a different deck for the final 20 trials of each room than she did for the first 10 trials. This design feature allows us to distinguish effects of the reminded context from those of the reminded trial. a. First, each object image uniquely identifies one choice trial. b. This object image is episodically associated with the choice made, and reward received, on the given trial. c. However, the value information carried by subsequent retrievals of contextually related trials is more likely to reflect trials on which a different option was chosen and rewarded. We hypothesized that choices after a reminder would be influenced both by the rewards received on the reminded trial and by the rewards received across the reminded context room.

Supplementary Figure 3 Simulations with a context-aware sampling model.

We ran simulations using a context-aware sampling model. The model reinstates a first episode with a probability related to its temporal recency, then some number of other episodes in turn. Each new reinstated episode is, with some probability, drawn from the same context as the previous reinstated episode. (See Methods for details of the model’s implementation.) In this task, when more than a few samples are drawn, the context-aware sampling model predicts that the effect of reminded context should be greater than the effect of reminded trials. a. Shown here is the regression model from Figure 2, fit to a simulated population of 32 subjects that sampled 12 episodes between each probe and the ensuing choice. Error bars are +/- 1 SEM. b. Simulations show that the influence of context reward should increase with greater numbers of episodic samples. We simulated the context-aware sampling model to generate populations of 1,600 subjects (50 groups of 32), holding fixed all parameters except the number of samples drawn in support of each decision. We then performed the regression analysis shown in Figure 2 of the main text on this simulated data, and plot here the regression weights for single-trial (item) rewards and context rewards. As the number of samples increased, so did the ratio of the effect of context reward to that of the reminded trial (correlation between context and single-trial effects: R(13)=−0.9436, P=1.3125e−07). Error bars are +/- 1 SEM.
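
The sampling procedure described in this caption can be illustrated with a minimal sketch. The Methods are not reproduced here, so the details below are assumptions made for illustration rather than the authors' implementation: the exponential recency weighting, the parameter names (recency_decay, p_same_context), the uniform draw within a context, and the use of the mean sampled reward as the value estimate are all hypothetical.

```python
import math
import random


def sample_episodes(episodes, n_samples, p_same_context, recency_decay, rng):
    """Draw n_samples episode indices in support of one decision.

    episodes is a list of (context_id, reward) tuples, oldest first.
    """
    n = len(episodes)
    # Assumption: recency weighting is exponential in temporal distance.
    weights = [math.exp(-recency_decay * (n - 1 - i)) for i in range(n)]
    current = rng.choices(range(n), weights=weights, k=1)[0]
    drawn = [current]
    while len(drawn) < n_samples:
        context = episodes[current][0]
        if rng.random() < p_same_context:
            # Stay in the context of the previously reinstated episode
            # (assumption: uniform draw, with replacement).
            same = [i for i in range(n) if episodes[i][0] == context]
            current = rng.choice(same)
        else:
            # Otherwise fall back to a recency-weighted draw.
            current = rng.choices(range(n), weights=weights, k=1)[0]
        drawn.append(current)
    return drawn


def sampled_value(episodes, drawn):
    """Assumption: the value estimate is the mean reward of the samples."""
    return sum(episodes[i][1] for i in drawn) / len(drawn)


rng = random.Random(0)
# Hypothetical history: 6 rooms of 30 trials, rewards of $0 or $10.
episodes = [(trial // 30, rng.choice([0, 10])) for trial in range(180)]
drawn = sample_episodes(episodes, n_samples=12, p_same_context=0.8,
                        recency_decay=0.1, rng=rng)
print(sampled_value(episodes, drawn))
```

In this sketch, raising n_samples gives the reinstated context more opportunities to contribute episodes, which is the intuition behind the increasing context effect shown in panel b.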

Supplementary Figure 4 Simulations with imperfect memory reinstatement.

We used the simulation to illustrate how our mechanism can eliminate the effect of reminded trials while sparing an effect of recent rewards and context. We followed the simulation procedure of Figure S3, again generating populations of 1,600 subjects, this time with each population using a different value of αevoked, the probability that a reminder probe would result in reward information being reinstated from the reminded trial and context. All other model parameters were fixed. a. Shown here is the full regression model fit to a simulated population of 32 subjects that sampled 12 episodes before each choice, with αevoked set to 0.3. Error bars are +/- 1 SEM. b. As the probability of reinstating reward information decreases, so does the contribution of both item and context memory to decisions. We ran the experiment for 9 populations of 1,600 simulated subjects each. Each population used a different value of the αevoked parameter. We then measured the regression weights for the reminded trial and the reminded context, plotted here by their ratio to the regression weight for reward experienced one trial ago. As αevoked decreased, so did the ratio of each type of memory-based effect to the effect of recent rewards. While the effect of reminded context is preserved at even very small values for αevoked, the effect of the reminded trial drops to near zero. This pattern matches the reduced effect of both reminded trial and context in Experiment 2. Error bars are +/- 1 SEM.
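
The role of αevoked can be sketched as a simple gate on what a probe contributes to the next choice. This is a hedged illustration only: the function name evoked_rewards and the idea of returning a list of reward values are hypothetical, and the actual model is specified in the Methods, which are not part of this excerpt.

```python
import random


def evoked_rewards(probe_reward, context_rewards, alpha_evoked, rng):
    """Return the reward information a probe adds to the next decision.

    With probability alpha_evoked the probe reinstates the reminded
    trial's reward together with rewards from other trials in the same
    context; otherwise it contributes nothing (assumed behaviour).
    """
    if rng.random() < alpha_evoked:
        return [probe_reward] + list(context_rewards)
    return []


rng = random.Random(0)
# Count how often a probe evokes any reward information at alpha_evoked = 0.3.
hits = sum(bool(evoked_rewards(10, [0, 10, 10], 0.3, rng)) for _ in range(10000))
print(hits / 10000)
```

Under this assumption, lowering alpha_evoked reduces how often any memory-based reward information reaches the decision, while rewards from recent trials are unaffected.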

Supplementary Figure 5 Each regressor of interest plotted as a function of scene reinstatement evidence.

As described in the main text, we also tested whether regressors of interest other than context (recent reward, cued single-trial reward) were modulated by scene evidence, by repeating the quartiles analysis of Figure 3c for each regressor. We found that scene evidence did not reliably modulate any of the regressors of interest other than context (R-1: t(31)=0.3188, P=0.7520; R-2: t(31)=0.9716, P=0.3388; R-3: t(31)=0.8135, P=0.4221; Reminded trial: t(31)=1.5052, P=0.1424).

Supplementary Figure 6 Other activity does not predict behavioral effects.

To confirm that context reward was specifically modulated by scene reinstatement in the PPA, we also repeated the quartiles analysis to look for an effect of activity or classifier evidence in several other regions of interest. a. Univariate activity in PPA. Across quartiles and subjects, neither reminded trial (mean slope=−0.0422, SEM 0.0397, t(31)=1.0205, P=0.3154), nor reminded context (mean slope=−0.0162, SEM 0.0184, t(31)= 0.8765, P=0.3875) showed a relationship with PPA activity, nor did we observe an interaction between these relationships (t(31)=0.3136, P=0.7559). b. The same analysis was repeated for a region of interest that was differentially responsive to the “scrambled” scenes that were used as a control in our localizer task. Across quartiles and subjects, neither reminded trial (mean slope=0.0194, SEM 0.0479, t(31)=0.3949, P=0.6956), nor reminded context (mean slope=−0.0214, SEM 0.0212, t(31)=1.0085, P=0.3210) showed a relationship with univariate activity levels in this ROI, nor did we observe an interaction between these relationships (t(31)=0.6750, P=0.5047). c. We trained classifiers to decode “scrambled scene” evidence and scene evidence from this ROI. Neither reminded trial (mean slope=−0.0033, SEM 0.0160, t(31)=0.2048, P=0.8391), nor reminded context (mean slope=0.0314, SEM 0.0200, t(31)=1.5707, P=0.1264) showed a relationship with “scrambled scene” evidence, nor did we observe an interaction between these effects (t(31)=1.4886, P=0.1467). d. Neither reminded trial (mean slope=0.0041, SEM 0.0132, t(31)=0.3079, P=0.7602), nor reminded context (mean slope=0.0185, SEM 0.0158, t(31)=1.1678, P=0.2518) showed a relationship with scene evidence in this ROI, nor did we observe an interaction between these effects (t(31)=0.6751, P=0.5046). Error bars are +/- 1 SEM.
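
The statistics reported for the quartiles analysis (a mean slope, its SEM, and a t value with 31 degrees of freedom) are consistent with fitting one slope per subject across the four quartiles and then testing the slopes against zero across 32 subjects. The sketch below shows that general procedure under stated assumptions: the function names, the quartile coding 1 to 4, and the ordinary least-squares fit are illustrative choices, not details confirmed by this excerpt.

```python
import math
import statistics


def quartile_slope(weights_by_quartile):
    """Least-squares slope of an effect across quartiles coded 1..4."""
    xs = [1, 2, 3, 4]
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(weights_by_quartile)
    num = sum((x - x_mean) * (y - y_mean)
              for x, y in zip(xs, weights_by_quartile))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def one_sample_t(values):
    """Return (mean, SEM, t, df) for a test of the mean against zero."""
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem, mean / sem, n - 1


# Hypothetical regression weights for three subjects, one per quartile.
subjects = [
    [0.10, 0.12, 0.20, 0.25],
    [0.05, 0.04, 0.09, 0.07],
    [0.30, 0.28, 0.35, 0.41],
]
slopes = [quartile_slope(w) for w in subjects]
print(one_sample_t(slopes))
```

With 32 subjects, the degrees of freedom returned by one_sample_t would be 31, matching the t(31) values reported above.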

Supplementary Figure 7 Neither hippocampal activity nor classifier evidence predicts trial or context effect.

The hippocampus has long been understood to be critical to episodic memory encoding and retrieval. We repeated the analysis of Figure 3c using an anatomically defined mask of bilateral hippocampus (two subjects’ data excluded due to badly warped anatomical masks). a. Univariate hippocampal activity did not scale with either the single-trial reward effect (mean slope=−0.0231, SEM 0.0288, t(29)=−0.7754, P=0.4444) or the context reward effect (mean slope=0.0277, SEM 0.0285, t(29)=0.9657, P=0.3422), nor was there an interaction between the effects (t(29)=1.0715, P=0.2928). b. We trained a classifier to discriminate scenes in this region and did not observe a reliable relationship between hippocampal scene evidence and either reminded trial (mean slope=−0.0069, SEM 0.0140, t(29)=−0.4898, P=0.6280) or reminded context reward effects (mean slope=0.0045, SEM 0.0148, t(29)=0.3010, P=0.7656). Error bars are +/- 1 SEM.

Supplementary Figure 8 Activity in bilateral hippocampus scales with entropy over scene reinstatement evidence.

We investigated the hypothesis that hippocampal activity could reflect retrieval of memories in support of decisions [1,2]. We used the scene-specific reinstatement weights to investigate activity related to memory retrieval. In previous studies, we observed that hippocampal activity increases along with uncertainty about an action’s outcome, both for simple sequential responses and goal-directed planning decisions [3,4]. We interpreted those findings as consistent with hippocampus’ known role in memory retrieval. In the context of action evaluation, memory retrievals could constitute evidence about the outcome of the actions under consideration (similar to how forward trajectories are “replayed” as rodents make navigation decisions [5]). In this task, greater uncertainty about the associated context should lead to a wider range of next-step outcomes to evaluate. We therefore reasoned that activity in hippocampus at choice might scale with uncertainty about the probed item’s context. We computed the evidence that participants reinstated each context image on each trial, by taking the correlation between per-scene template patterns and PPA activity on that trial. The resulting six numbers were then normalized to create a probability distribution for that trial, reflecting the relative likelihood that each scene was being remembered. The entropy over this distribution can thus be considered the uncertainty over the reinstated context. Consistent with a role for hippocampus in retrieving memories that are used to evaluate outcomes, this entropy value is reliably correlated with hippocampal activity at the time of choice (mean R=0.0670, SEM 0.0261, t(29)=2.5733, P=0.0155). Error bars are +/- 1 SEM.
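
The entropy measure described in this caption can be sketched as follows. The correlation with per-scene templates and the entropy over the normalized evidence follow the text; the normalization itself is not specified in this excerpt, so the softmax used here is an assumption, as are the function names and the toy patterns.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length activity patterns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def context_entropy(trial_pattern, scene_templates):
    """Entropy (in bits) over per-scene reinstatement evidence."""
    evidence = [pearson(trial_pattern, t) for t in scene_templates]
    # Assumption: a softmax turns the correlations, which may be
    # negative, into a probability distribution over scenes.
    exps = [math.exp(e) for e in evidence]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Toy example: a trial pattern that matches one of six scene templates.
trial = [1.0, 2.0, 3.0, 4.0]
templates = [[1.0, 2.0, 3.0, 4.0]] + [[4.0, 3.0, 2.0, 1.0]] * 5
print(context_entropy(trial, templates))
```

Entropy is highest when the evidence is equal across all six scenes and falls as one scene dominates, so higher values correspond to greater uncertainty over the reinstated context.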

Cite this article

Bornstein, A., Norman, K. Reinstated episodic context guides sampling-based decisions for reward. Nat Neurosci 20, 997–1003 (2017). https://doi.org/10.1038/nn.4573
