Research report
Theory meets pigeons: The influence of reward-magnitude on discrimination-learning
Introduction
Successful behavior depends on establishing reliable predictions about future events. To select appropriate actions, humans and other animals need to learn which sensory events predict danger or benefit and which actions improve or worsen their situation. This learning often relies on positive (reward) or negative (punishment) feedback. The neural basis of feedback-based learning is highly conserved across species, and the basic neural organization is similar across vertebrates [38], [12]. A great deal of research has been devoted to understanding the computational principles mediating feedback-based learning, and numerous models have been devised to describe these principles mathematically [36], [8]. Modern theoretical accounts of feedback-based learning mostly center on reinforcement learning algorithms; the most prominent of these is the temporal-difference (TD) algorithm [36], [37], which has been used successfully to model behavioral and neural responses during reward-based learning [21], [31]. TD learning extends the Rescorla–Wagner (or, equivalently, the Widrow–Hoff) learning rule with a more detailed representation of time [36], [37]. We used the TD model in this study because it is widely used in computational neuroscience and because it is well integrated into machine-learning theory, including action selection in decision making.
In TD-algorithms, time is divided into discrete steps, and for each time step the amount of predicted future reward is determined on the basis of sensory stimuli. A comparison of predicted and obtained reward yields a prediction error signal with three basic characteristics: (1) an unexpected reward generates a positive prediction error, indicating that more reward was obtained than predicted; (2) omission of a predicted reward generates a negative prediction error, indicating that less reward was obtained than predicted; and (3) a fully predicted reward generates no prediction error. This prediction error signal is in turn used to update the reward prediction of sensory stimuli that preceded the reward: a positive prediction error leads to an increase in reward prediction, a negative prediction error to a decrease [31], [33]. Through these mechanisms, TD learning can associate a stimulus with a reward (as in classical conditioning) [25], associate an action with a reward (as in operant conditioning) [22], [1], or cause extinction of a previously formed association [26].
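The three prediction-error cases described above can be sketched with a one-step update for a single stimulus (ignoring the within-trial time steps of the full TD model). The function name, learning rate, and reward values are illustrative assumptions, not the implementation used in this study:

```python
# Minimal one-step sketch of the prediction-error update: delta = r - V.
# alpha and the reward values are arbitrary illustrative choices.

def td_update(V, reward, alpha=0.1):
    """Return the updated value estimate and the prediction error."""
    delta = reward - V            # prediction error
    return V + alpha * delta, delta

# (1) Unexpected reward: nothing predicted (V = 0), reward delivered.
V, delta = td_update(0.0, 1.0)
print(delta)                      # positive prediction error

# (3) Train until the reward is fully predicted: error shrinks to ~0.
V = 0.0
for _ in range(200):
    V, delta = td_update(V, 1.0)
print(round(delta, 3))            # ~0: fully predicted reward

# (2) Omission of the now-predicted reward: negative prediction error.
_, delta = td_update(V, 0.0)
print(delta < 0)                  # True
```

Each call moves the reward prediction V a fraction alpha toward the obtained reward, so repeated pairings drive the error toward zero, while omission after training produces a negative error that would extinguish the association.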
The TD-algorithm gained popularity because the activity of dopaminergic neurons located in the ventral tegmentum and substantia nigra pars compacta of mammals resembles the TD prediction error signal. The dopaminergic system is frequently termed the brain's ‘reward-system’, and numerous theories have been devised about its exact role in reward; the most prominent include reinforcement [35], incentive salience [2] and habit formation [10]. Despite this ongoing debate about the behavioral role of dopamine, there is clear evidence that the activity of dopaminergic neurons bears striking resemblance to the TD error signal: the responses of dopaminergic neurons show positive and negative prediction errors [21], [31], [25] and comply with several assumptions of learning theory [40]. One important prediction of the TD-algorithm is that the error signal depends on the size of the reward; a large unexpected reward will generate a bigger error signal than a small unexpected reward. Hence, bigger rewards should lead to faster learning than smaller rewards.
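This magnitude prediction can be illustrated with the same kind of one-step update: because the prediction error scales with reward size, the value of a stimulus paired with a large reward crosses any fixed decision threshold in fewer trials. The threshold, learning rate, and reward sizes below are arbitrary illustrative choices, not the parameters of this study:

```python
# Sketch: larger rewards produce larger early prediction errors, so the
# value estimate reaches a fixed decision threshold in fewer trials.
# threshold, alpha, and the reward magnitudes are illustrative assumptions.

def trials_to_threshold(reward, threshold=0.5, alpha=0.1, max_trials=1000):
    """Count rewarded trials until the value estimate exceeds threshold."""
    V, n = 0.0, 0
    while V < threshold and n < max_trials:
        V += alpha * (reward - V)   # error, and hence the step, scales with reward
        n += 1
    return n

print(trials_to_threshold(reward=4.0))  # big reward: 2 trials
print(trials_to_threshold(reward=1.0))  # small reward: 7 trials
```

With identical learning rates, the big-reward stimulus needs far fewer pairings to exceed the threshold, which is the sense in which the TD-model predicts faster acquisition for larger rewards.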
The influence of reward-magnitude on animal behavior has previously been investigated with regard to several questions, for example reward-discriminability [7], [14], [15], [17], [24], motivation [4], [5], [6], [9], [19], [43] and choice behavior [18]. In addition, it has been evaluated in light of response-rates during acquisition [4], [7], [13], [20], [43] and reversal [19]. However, whether the influence of reward-magnitude on learning-rate complies with the predictions of the TD-model has not yet been tested directly. Such a test requires the use of error-rates rather than measures of response-strength, in order to avoid measuring overall differences in performance caused by motivational differences [5], [6]. Here we test whether the acquisition of a color-discrimination is modulated by the magnitude of the contingent reward and relate our findings to an implementation of the TD-model.
Subjects
Twelve naive homing pigeons (Columba livia) with body weights ranging from 330 g to 490 g served as subjects. The animals were housed individually in wire-mesh cages inside a colony room, had free access to water and grit and during experiments they were maintained on 80% of their free-feeding body weight. The colony room provided a 12 h dark–light cycle with lights on at 8:00 and lights off at 20:00. The experiment and all experimental procedures were in accordance with the National Institute of
Behavior
Of the 12 animals in training, ten reached criterion on the reward-discrimination (three consecutive days with over 80% choice of the big reward) and went on to be tested on the color-discrimination. For these ten animals, the high level of reward-discrimination was maintained throughout all subsequent sessions (Fig. 2). Training of the remaining two animals was discontinued and they were omitted from the analysis.
All animals learned the color-discrimination task within 10 days of training, the
Discussion
The aim of the present study was to test a prediction of reinforcement learning models. These models imply that learning-rates depend on the magnitude of reward delivered after correct responses. To assess this prediction, pigeons were trained on a color-discrimination task with different reward-magnitudes. In line with reinforcement learning models, a large-reward led to fast acquisition of the task, whereas a small-reward led to slow acquisition of the task. As an additional measure, the
Acknowledgement
This work was supported by the BMBF grant ‘reward-based learning’.
References (43)
- et al. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron (2005).
- The basal ganglia: learning new tricks and loving it. Curr Opin Neurobiol (2005).
- Avian and mammalian “prefrontal cortices”: limited degrees of freedom in the evolution of the neural mechanisms of goal-state maintenance. Brain Res Bull (2005).
- et al. Single units in the pigeon brain integrate reward amount and time-to-reward in an impulsive choice task. Curr Biol (2005).
- et al. Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron (2004).
- et al. The biopsychology-toolbox: a free, open source Matlab-toolbox for the control of behavioral experiments. J Neurosci Methods (2008).
- Getting formal with dopamine and reward. Neuron (2002).
- Behavioral dopamine signals. Trends Neurosci (2007).
- The debate over dopamine's role in reward: the case for incentive salience. Psychopharmacology (2007).
- Shifts in magnitude of reward and contrast effects in instrumental and selective learning: a reinterpretation. Psychol Rev (1968).
- Changes in performance as a function of shifts in the magnitude of reinforcement. J Exp Psychol.
- Quantitative variation in incentive and performance in the white rat. Am J Psychol.
- Amount of reinforcement and level of performance. Psychol Rev.
- Differential response learning on the basis of differential size of reward. J Genet Psychol.
- Modulators of decision making. Nat Neurosci.
- Changes in response strength with changes in the amount of reinforcement. J Exp Psychol.
- Neural systems of reinforcement for drug addiction: from actions to habits to compulsion. Nat Neurosci.
- Operant conditioning, extinction, and periodic reinforcement in relation to concentration of sucrose used as reinforcing agent. J Exp Psychol.
- Equal-reinforcement values for sucrose and glucose solutions compared with equal-sweetness values. J Comp Physiol Psychol.
- Rate of bar pressing as a function of quality and quantity of food reward. J Comp Physiol Psychol.
- Dopamine release in the dorsal striatum during cocaine-seeking behavior under the control of a drug-associated cue. J Neurosci.