Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation

Fig 7

The value-decay generates value-contrasts between ‘Go’ and ‘Stay’.

(A) Schematic diagram of the selection of A11 (‘Stay’) and A12 (‘Go’) at S6. We considered a reduced continuous-time dynamical system model that describes the time evolution of q(A11) and q(A12), which are continuous-time variables approximately representing the action values of A11 (‘Stay’) and A12 (‘Go’), respectively. (B) Bifurcation diagram of the reduced model, showing the equilibrium values of q(A11 (‘Stay’)) (red line) and q(A12 (‘Go’)) (blue line) (vertical axis) as a function of the degree of the value-decay (horizontal axis; ψ = 0 corresponds to the case without the value-decay). Temporal discounting was not assumed. The thick parts of the lines indicate stable equilibria, whereas the thin parts indicate unstable equilibria; the unstable equilibrium of q(A12 (‘Go’)) is overlapped by the stable equilibrium and is thus invisible. (C) Probability of selecting A11 (‘Stay’) (red) or A12 (‘Go’) (blue) at the equilibria (vertical axis) as a function of the degree of the value-decay (horizontal axis). The thick and thin parts correspond to the stable and unstable equilibria, respectively. (D) A simulation result of the original model with the decay rate φ = 0.0045, showing a phenomenon indicative of bistability: the value of A11 (‘Stay’) fluctuates between two levels over long time scales. (E) Phase diagrams for five different degrees of the value-decay. The red and blue lines indicate the nullclines on which the time derivative of q(A11 (‘Stay’)) or q(A12 (‘Go’)), respectively, is zero. The gray arrows indicate the direction of the time evolution of q(A11 (‘Stay’)) and q(A12 (‘Go’)) (they represent the vectors (dq(A11)/dt, dq(A12)/dt)/2). Note that the analysis of the reduced model was conducted under the assumption of q(A11 (‘Stay’)) ≤ q(A12 (‘Go’)), which corresponds to the region to the upper left of the black dashed line.
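The qualitative point of the caption, that value-decay creates a contrast between the equilibrium values of ‘Stay’ and ‘Go’, can be illustrated with a toy two-variable system. The sketch below is not the authors' reduced model, whose equations are not reproduced in this caption. It assumes a softmax choice rule, a Q-learning-style update in which ‘Stay’ is updated toward q(A12) (valid while q(A11) ≤ q(A12)), a unit reward after ‘Go’, no temporal discounting, and a linear decay term with rate psi. The parameters alpha, beta and reward, and all function names, are hypothetical.

```python
import math

def softmax_go_prob(q_stay, q_go, beta):
    """Probability of choosing 'Go' under an assumed two-action softmax rule."""
    return 1.0 / (1.0 + math.exp(-beta * (q_go - q_stay)))

def derivatives(q_stay, q_go, psi, alpha=0.5, beta=5.0, reward=1.0):
    """Time derivatives of the toy system (an illustrative assumption).

    'Stay' is updated toward q_go (the best value at the same state, given
    q_stay <= q_go); 'Go' is updated toward the reward. Both values decay
    at rate psi. No temporal discounting is applied.
    """
    p_go = softmax_go_prob(q_stay, q_go, beta)
    p_stay = 1.0 - p_go
    dq_stay = alpha * p_stay * (q_go - q_stay) - psi * q_stay
    dq_go = alpha * p_go * (reward - q_go) - psi * q_go
    return dq_stay, dq_go

def integrate(psi, q_stay=0.0, q_go=0.0, dt=0.01, steps=200_000):
    """Forward-Euler integration from the given initial values."""
    for _ in range(steps):
        d_stay, d_go = derivatives(q_stay, q_go, psi)
        q_stay += dt * d_stay
        q_go += dt * d_go
    return q_stay, q_go

if __name__ == "__main__":
    for psi in (0.0, 0.05):
        q_stay, q_go = integrate(psi)
        p_go = softmax_go_prob(q_stay, q_go, beta=5.0)
        print(f"psi={psi}: q_stay={q_stay:.3f}, q_go={q_go:.3f}, P(Go)={p_go:.3f}")
```

In this toy system, the two values converge to the same level when psi = 0, so the choice probabilities approach 0.5 each, whereas a positive psi keeps q(A11) below q(A12) and biases selection toward ‘Go’. The toy is only meant to show that contrast; it makes no claim about the bistability or the nullcline structure of the authors' model.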

doi: https://doi.org/10.1371/journal.pcbi.1005145.g007