Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit
Highlights
► We review computational issues and possible algorithms for decision making.
► We review recent findings on the neural correlates of the variables in those algorithms.
► We propose a hypothesis about parallel and hierarchical modules in the striatum.
Introduction
The loop network composed of the cerebral cortex and the basal ganglia is now recognized as the major site for decision making and reinforcement learning [1, 2]. The theory of reinforcement learning [3] prescribes a number of steps required for decision making: 1) recognize the present state of the environment by disambiguating sensory inputs; 2) evaluate candidate actions in terms of expected future rewards (action values); 3) select the most advantageous action; and 4) update the action values based on the discrepancy between the predicted and the actual rewards. Simplistic models of reinforcement learning in the basal ganglia (e.g. [4]) proposed that the cerebral cortex represents the present state and the striatal neurons compute action values [5]. An action is selected downstream, in the globus pallidus, and the dopamine neurons signal the reward prediction error [6], which enables learning by dopamine-dependent synaptic plasticity in the striatum [7].

Recent studies, however, have shown that the reality may be more complex. Discriminating the environmental state behind noisy observations is in itself a hard problem, known as perceptual decision making [8, 9]. Activity related to action values is found not only in the striatum, but also in the pallidum [10, 11•] and the cortex [12••]. Different parts of the striatum, especially along its ventromedial to dorsolateral axis, have different roles in goal-directed and habitual behaviors [13]. Action selection may be performed not in a single locus of the brain but by competition and agreement among distributed decision networks [14]. Finally, a subset of midbrain dopamine neurons located in the dorsolateral part signals not only rewarding but also aversive events [15••].
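Steps 2-4 above can be caricatured in a few lines of tabular reinforcement learning (step 1, state recognition, is assumed solved here). This is an illustrative sketch for immediate rewards only, not a circuit model: the learning rate alpha and the softmax inverse temperature beta are free parameters, and the states and actions are hypothetical placeholders.

```python
import math
import random

# Illustrative sketch only: a tabular action-value store stands in for
# whatever the cortex and striatum actually represent.

def softmax_choice(q_values, beta=2.0):
    """Step 3: stochastically select an action, favoring higher action values."""
    weights = [math.exp(beta * q) for q in q_values]
    threshold = random.random() * sum(weights)
    cumulative = 0.0
    for action, w in enumerate(weights):
        cumulative += w
        if threshold < cumulative:
            return action
    return len(weights) - 1

def q_update(q, state, action, reward, alpha=0.1):
    """Step 4: a reward prediction error (the dopamine-like signal) drives learning."""
    rpe = reward - q[state][action]    # predicted vs. actual reward
    q[state][action] += alpha * rpe    # analogue of dopamine-dependent plasticity
    return rpe
```

In this caricature, the prediction error returned by q_update plays the role attributed to dopamine neurons, and the increment it drives plays the role of striatal plasticity.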
Based primarily on primate studies, Samejima and Doya [16] proposed that different cortico-basal ganglia subloops realize decisions in motivational, context-based, spatial, and motor domains. In this article, we consider how different algorithms of decision making, such as model-based and hierarchical reinforcement learning algorithms, can be implemented in the cortico-basal ganglia circuit with a focus on the ventromedial to dorsolateral axis in the rodent striatum.
Computational axes in action learning
In examining the computational mechanisms of decision making and reinforcement learning, several axes are useful for sorting out the process.
Model-free reinforcement learning algorithms
In the basic theory of reinforcement learning, the learning agent does not initially know how its actions affect the environmental state or how much reward is given in which state. Action value-based algorithms, including Q-learning and SARSA, use actual experience of states, actions, and rewards to estimate the action value function Q(state, action), which evaluates how much future reward is expected from taking a particular action in a given state. An action can be selected greedily or stochastically, for example by softmax selection based on the action values.
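The two algorithms named above differ only in the bootstrap target of their update: Q-learning is off-policy (it bootstraps from the best available next action), while SARSA is on-policy (it bootstraps from the action actually taken next). A minimal sketch, assuming a discrete state-action table; alpha (learning rate) and gamma (temporal discount factor) are the standard free parameters:

```python
# Hedged sketch of the two action-value updates; Q is a table mapping
# state -> list of action values.

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap from the best action in the next state.
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy: bootstrap from the action actually taken in the next state.
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
```

The difference matters when the behavioral policy is exploratory: SARSA's estimates reflect the exploration actually performed, while Q-learning's reflect the greedy policy.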
Model-based analysis of learner's variables
In order to describe how an animal's choices change dynamically depending on reward experience, a straightforward way is to use a Markov model in which the conditional probability of the action choice given the previous state, action, and reward is computed. Such a non-parametric, hypothesis-neutral description is helpful for measuring the goodness of more elaborate model-based explanations [11•]. Recent use of normative models, especially those based on reinforcement learning algorithms, has turned out to be effective in identifying the neural correlates of decision variables.
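A common form of such model-based analysis is to fit a reinforcement learning model to trial-by-trial choices by maximum likelihood and then compare candidate models by information criteria or cross-validation. A minimal sketch for a hypothetical two-choice task; the grid search over parameters is illustrative only, and real analyses would use a proper optimizer:

```python
import math

# Hedged sketch: maximum-likelihood fit of a simple Q-learning model to a
# sequence of (choice, reward) pairs from a two-choice task.

def neg_log_likelihood(choices, rewards, alpha, beta):
    q = [0.0, 0.0]
    nll = 0.0
    for c, r in zip(choices, rewards):
        # Softmax probability of the observed choice under current Q values.
        z = math.exp(beta * q[0]) + math.exp(beta * q[1])
        p = math.exp(beta * q[c]) / z
        nll -= math.log(p)
        q[c] += alpha * (r - q[c])   # simple action-value update
    return nll

def fit_by_grid(choices, rewards):
    # Coarse grid search over alpha (learning rate) and beta (inverse temperature).
    best = None
    for alpha in [i / 10 for i in range(1, 10)]:
        for beta in [0.5, 1.0, 2.0, 4.0, 8.0]:
            nll = neg_log_likelihood(choices, rewards, alpha, beta)
            if best is None or nll < best[0]:
                best = (nll, alpha, beta)
    return best
```

Once fitted, the model's internal variables (action values, prediction errors) on each trial can be regressed against neural activity, which is the essence of the model-based analyses reviewed below.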
Possible implementation in the cortico-basal ganglia network
Based on the computational requirements and possible reinforcement learning algorithms, we now review neural recording and brain imaging results, many of which were obtained through the model-based analysis described above, that shed light on how these algorithms could be implemented in the cortico-basal ganglia network.
Hierarchical reinforcement learning in the cortico-basal ganglia loops
Anatomically and neurophysiologically, the dorsal striatum (DS) and the ventral striatum (VS) have the same basic structure and there is no clear boundary between them [69], suggesting the possibility that DS and VS work by the same mechanism. However, cortical input follows a dorsolateral-ventromedial gradient in modality: the more dorsolateral striatum receives sensorimotor-related information, while the more ventromedial part receives associative and motivational information [69]. These striatal subdivisions send their output through distinct pallidal and nigral targets, forming parallel cortico-basal ganglia loops.
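One way to caricature the idea that modules along this gradient operate at different temporal scales is to let several learners share the same experience but discount future reward at different rates. The module labels and the averaging rule used to combine their values below are illustrative assumptions only, not claims about the circuit:

```python
# Hedged caricature of parallel striatal modules: three Q-learners share one
# experience stream but differ in temporal discount factor gamma.

class Module:
    def __init__(self, n_states, n_actions, gamma):
        self.gamma = gamma
        self.Q = [[0.0] * n_actions for _ in range(n_states)]

    def update(self, s, a, r, s_next, alpha=0.1):
        target = r + self.gamma * max(self.Q[s_next])
        self.Q[s][a] += alpha * (target - self.Q[s][a])

# Hypothetical assignment of temporal scales to striatal subdivisions.
modules = {
    "dorsolateral (fast, motor)": Module(2, 2, gamma=0.5),
    "dorsomedial (intermediate)": Module(2, 2, gamma=0.9),
    "ventral (slow, motivational)": Module(2, 2, gamma=0.99),
}

def combined_value(s, a):
    # One naive combination rule: average the modules' action values.
    return sum(m.Q[s][a] for m in modules.values()) / len(modules)
```

After identical experience with a delayed reward, the slow (ventral) module assigns higher value to the early action than the fast (dorsolateral) one, illustrating how a shared experience stream can still yield module-specific valuations.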
Conclusion
We reviewed computational issues and possible algorithms for decision making and reinforcement learning, and recent findings on the neural correlates of the variables in those algorithms. We then proposed a working hypothesis: the dorsolateral, the dorsomedial, and the ventral striatum comprise parallel and hierarchical reinforcement learning modules that are in charge of actions at different physical and temporal scales. The parallelism of the decision modules has also been suggested based primarily on primate studies [16].
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest
References (73)
Complementary roles of basal ganglia and cerebellum in learning and motor control. Curr Opin Neurobiol (2000)
Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. Rob Auton Syst (2001)
Model-based fMRI and its application to reward learning and decision making. Ann N Y Acad Sci (2007)
Model-based analysis of decision variables
Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J Neurosci (2002)
Activation of dorsal raphe serotonin neurons underlies waiting for delayed rewards. J Neurosci (2011)
Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behav Brain Res (2006)
Putting a spin on the dorsal-ventral divide of the striatum. Trends Neurosci (2004)
Triple dissociation of information processing in dorsal striatum, ventral striatum, and hippocampus on a learned spatial decision task. Neuron (2010)
Reinforcement learning: computational theory and biological mechanisms. HFSP J (2007)
Modulators of decision making. Nat Neurosci
Reinforcement Learning
Representation of action-specific reward values in the striatum. Science
A neural substrate of prediction and reward. Science
A cellular mechanism of reward-related learning. Nature
Representation of confidence associated with a decision by neurons in the parietal cortex. Science
Decision making under uncertainty: a neural model based on partially observable Markov decision processes. Front Comput Neurosci
Shaping of motor responses by incentive values through the basal ganglia. J Neurosci
Validation of decision-making models and analysis of decision variables in the rat basal ganglia. J Neurosci
Neural computations underlying action-based decision making in the human brain. Proc Natl Acad Sci USA
Corticostriatal interactions during learning, memory processing, and decision making. J Neurosci
Cortical mechanisms of action selection: the affordance competition hypothesis. Philos Trans R Soc Lond B Biol Sci
Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature
Multiple representations of belief states and action values in corticobasal ganglia loops. Ann N Y Acad Sci
Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif Intell
Hierarchical reinforcement learning with the MAXQ value function decomposition. J Artif Intell Res
Feudal reinforcement learning
The computational neurobiology of learning and reward. Curr Opin Neurobiol
Understanding neural coding through the model-based analysis of decision making. J Neurosci
Prefrontal cortex and decision making in a mixed-strategy game. Nat Neurosci
Matching behavior and the representation of value in the parietal cortex. Science
Dynamic response-by-response models of matching behavior in rhesus monkeys. J Exp Anal Behav
Cortical substrates for exploratory decisions in humans. Nature
A new look at the statistical model identification. IEEE Trans Autom Control
Role of striatum in updating values of chosen actions. J Neurosci