Pramipexole Enhances Reward Learning by Preserving Value Estimates

Major depressive disorder is a debilitating condition and a pressing public health concern (1). The majority of individuals with depression respond only partially to treatment, and a sizable proportion do not respond at all (2), motivating the search for novel interventions. The success of this search depends on characterizing promising treatment targets associated with the illness and identifying agents able to engage these targets. A target of particular recent interest is the impairment of reward learning seen in patients with depression (3,4). The consistent finding in this patient group is that they respond less reliably than control participants to stimuli that are associated with rewards (5-16).
Computational characterization using reinforcement learning models has identified 3 distinct alterations of learning and decision-making processes that may produce the behavior observed in patients with depression (Figure 1B, C): they may make decisions less deterministically (9), they may treat rewards as if they were of reduced value (9), or their learned value estimates may decay to a greater degree over time (17).
While these processes describe qualitatively distinct causal mechanisms, they produce very similar effects on behavior in the commonly used reward learning tasks and thus cannot be distinguished using behavioral outcomes from these tasks alone (9). Rather, measures of the internal model variables that do differ between the processes, such as the neural response to expectations or reward prediction errors, are required (Figure 1D, E). Neuroimaging measures of striatal and reward-related cortical regions provide an index of these processes (18,19) and thus can help to distinguish between competing mechanistic hypotheses (18,20). The most common neuroimaging finding in depressed and anhedonic populations is a reduced neural response to rewarding outcomes and/or reward prediction errors (7,21-24), suggesting that patients respond less consistently to rewards because they treat outcomes as if they were less rewarding (7), rather than due to decreased decision consistency (which should not affect the blood oxygen level-dependent [BOLD] response to reward) or increased value decay (which should increase BOLD response to reward, reflecting greater disparity between reward expectation and reward outcome resulting from increased decay of the former).
The centrality of the mesocorticolimbic dopaminergic system in reward learning (25) suggests that dopaminergic agents might act to reverse impaired reward learning in patients with major depressive disorder (26). Early evidence suggests that one such agent, the D2-like (D2, D3, and D4) receptor agonist pramipexole, is efficacious in the treatment of major depressive disorder (27-29). However, contrary to its clinical effects, previous experimental studies (30-36) of pramipexole generally indicate that it blunts rather than enhances participants' behavioral responses to reward (30-33). Similarly, pramipexole has been found to blunt the neural response to positive outcomes in reward-sensitive brain regions such as the medial prefrontal cortex (mPFC), orbitofrontal cortex (OFC) (37), ventral striatum (VS), and midbrain (38). One explanation for the seeming contradiction between the clinical and experimental evidence is that experimental studies have predominantly examined the effect of a single dose of pramipexole while sustained treatment is required to improve symptoms (27,39). From a pharmacological perspective, acute treatment with D2/3/4 agonists is believed to primarily influence inhibitory presynaptic autoreceptors, leading to reduced dopaminergic transmission, whereas sustained administration leads to autoreceptor downregulation and enhanced transmission via agonism at postsynaptic D2-like receptors (40-43). This suggests that the clinically relevant effects of pramipexole on reward learning are likely to become apparent only after sustained administration of the drug.
In the current study, we examined the effect of a sustained (2 weeks) course of pramipexole on both behavioral and neural measures of reward learning (Figure 1A) in nonclinical participants. As registered in ClinicalTrials.gov (https://clinicaltrials.gov/ct2/show/NCT03681509), we hypothesized that pramipexole would induce the opposite pattern of reward learning behavior to that characteristic of depression and anhedonia, increasing asymptotic choice of stimuli associated with higher levels of reward (Figure 1C) by increasing subjective valuation of rewards (leading to increased BOLD response to rewarding outcomes in the brain's reward prediction error network) (44).

Participants, Design, and Intervention
We conducted a randomized, placebo-controlled experimental medicine study with a between-group design. Forty-two nonclinical participants between the ages of 18 and 45 were randomized 1:1 to receive pramipexole or placebo. Potential participants were excluded if they had ever been diagnosed with a psychiatric illness (determined using the Structured Clinical Interview for DSM-V-Clinician Version) or had a first-degree relative with a psychotic illness, were taking psychoactive medication, had any history of impulse control difficulties, had any contraindication to pramipexole, had taken any recreational drugs in the last 3 months, regularly drank more than 4 units of alcohol per day, smoked more than 5 cigarettes per day, or drank more than 6 caffeinated drinks per day. Female participants who were pregnant, lactating, or not using a highly effective method of contraception were also excluded. From a starting dose of 0.25 mg of pramipexole salt, the dose was increased in 0.25-mg increments every 3 days, reaching a dose of 1 mg/day by day 10. Participants continued to take 1 mg/day for 3-5 days (until testing was completed). Following this, the dose was downtitrated over 3 days to avoid withdrawal effects. The apparent dose of the placebo was increased in the same manner. Participants performed a probabilistic instrumental learning task (PILT) (see below for details) before the intervention and then twice between days 12 and 15 of the intervention (one with functional magnetic resonance imaging data collection, one behavioral only). At the screening session, participants completed the Eysenck Personality Questionnaire, Beck Depression Inventory, and Spot-the-Word test (an estimate of IQ). At both behavioral testing sessions, participants completed the Befindlichkeitsskala, Positive and Negative Affect Schedule, State-Trait Anxiety Inventory, Snaith-Hamilton Pleasure Scale, Temporal Experience of Pleasure Scale, Oxford Happiness Questionnaire, and Questionnaire for Impulsive-Compulsive Disorders in Parkinson's Disease-Rating Scale. At the postintervention behavioral testing session, participants additionally completed a side-effects questionnaire (45). Two participants (both in the placebo group) dropped out of the study due to nausea and related side effects.
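The up-titration schedule described above can be written as a small function. This is a minimal sketch, assuming that "every 3 days" means a step on days 4, 7, and 10 (which is consistent with reaching 1 mg/day by day 10); the function name `daily_dose_mg` is hypothetical, and the down-titration phase is not modeled because its step sizes are not given.

```python
def daily_dose_mg(day: int) -> float:
    """Planned up-titration dose (mg/day of pramipexole salt) on a study day.

    Assumptions (not stated explicitly in the text): day 1 is the first
    dosing day, and each 0.25-mg step follows 3 full days at the previous
    dose, i.e., steps occur on days 4, 7, and 10.
    """
    if day < 1:
        raise ValueError("day must be >= 1")
    steps_completed = (day - 1) // 3          # 0 on days 1-3, 1 on days 4-6, ...
    return min(1.0, 0.25 * (1 + steps_completed))  # capped at the 1 mg/day target


# Doses for the first 12 days of the course under the assumptions above.
schedule = {day: daily_dose_mg(day) for day in range(1, 13)}
```

Under these assumptions the testing window (days 12 to 15) falls entirely within the 1 mg/day plateau.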

Task
Participants completed a modified version of a PILT described by Pessiglione et al. (18). The PILT is a 2-arm bandit task (Figure 1A) with interleaved win and loss trials. In each trial, the participant is presented with 2 stimuli that have reciprocal probabilities (0.7 vs. 0.3) of a win outcome (+£0.20) versus a no-win outcome (£0.00) in reward-condition trials or a loss outcome (−£0.20) versus a no-loss outcome (£0.00) in loss-condition trials. Participants chose one of the two stimuli, following which they received visual feedback on the trial outcome and their current total earnings. Each block of the PILT consists of 30 reward trials and 30 loss trials. Participants performed 3 blocks of the PILT in each behavioral testing session and 2 blocks in the imaging session. Different task stimuli were used in each block. Participants started each session with £1.50 of funds. Participants received a portion of their winnings from these tasks (up to a maximum of £30).
The behavioral measure of interest was choice accuracy, defined as the proportion of advantageous choices made, i.e., the stimulus with 0.7 probability of win in the reward condition or the stimulus with 0.7 probability of no loss in the loss condition. We measured accuracy in the second half of each block as this provides a close estimate of asymptotic choice (46,47) found to be associated with depression (Figure 1C). Notably, the same pattern of results was found in the current study if accuracy was calculated across all trials rather than those in the second half (see the Supplement).
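The accuracy measure can be illustrated with a few lines of code. This is a sketch only: the function name `second_half_accuracy` and the example data are hypothetical, and it assumes that the trials of one condition within a block are supplied in presentation order, coded 1 for an advantageous choice and 0 otherwise.

```python
def second_half_accuracy(advantageous_choices):
    """Proportion of advantageous choices in the second half of a block.

    `advantageous_choices` holds one entry per trial of a single condition
    (win or loss), in order: 1/True if the 0.7-probability stimulus was
    chosen, 0/False otherwise.
    """
    n_trials = len(advantageous_choices)
    if n_trials < 2:
        raise ValueError("need at least 2 trials to split a block in half")
    second_half = advantageous_choices[n_trials // 2:]
    return sum(second_half) / len(second_half)


# Hypothetical 30-trial block: near-chance early on, mostly correct later.
example_block = [0, 1] * 7 + [0] + [1] * 12 + [0] * 3
example_accuracy = second_half_accuracy(example_block)  # uses trials 16-30 only
```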

Reinforcement Learning Models
We used a simple reinforcement learning model, which combined parameters from different, previously described models, to formalize the mechanistic question being addressed in this study.
First, a learning rule was used to update expectations about the association of the stimuli with the outcomes. In this rule, Q_t(s) is the expectation about the value of shape s on trial t, R_t is the observed outcome (1 for positive outcome, 0 for negative outcome), α_j is the learning rate used for trial valence j (i.e., win or loss trial), and ρ_j is a reward sensitivity parameter for trial valence j.
Expectations were initialized at Q_0(s) = 0.5, and the unchosen option, Q_t(s′), was updated with the reciprocal outcome (see the Supplement for evidence supporting these decisions). Following this, the model's expectations decayed back toward their initial value (i.e., Q_0), with the rate of decay controlled by a decay factor φ (17). Finally, the Q values were fed into a softmax action selector to produce a choice. In the softmax, the inverse temperature parameter, β, controls the degree to which the probability of the participant choosing shape s, P_t(s), is determined by the difference in Q values. This model is overparameterized: the 3 parameters (reward sensitivity, ρ; decay, φ; and inverse temperature, β) produce very similar effects on asymptotic choice (Figure 1C) and therefore cannot be jointly estimated from participant behavior. To account for changes in behavior, 2 of the 3 parameters have to be fixed while the other (as well as the learning rate) remains free. Doing this is equivalent to making a statement about the presumed cause of the change in behavior. However, as illustrated in Figure 1 and Table 1, while the 3 different parameters have the same effect on choice, they act on distinct components of the learning and decision-making process. As a result, it is possible to discriminate between their effects, but doing so requires access to internal model variables such as the expectation and prediction error.
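To make the overparameterization concrete, the sketch below iterates the expected (noise-free) dynamics of a model of this kind and evaluates the softmax at the resulting values. The exact equations are not reproduced in this text, so the functional forms are assumptions: a delta rule with the outcome scaled by reward sensitivity, a linear decay toward the initial value after each update, and a two-option logistic softmax. The names `expected_step` and `asymptotic_choice_probability` are hypothetical, the altered parameter values are arbitrary, and the code is not expected to reproduce Figure 1 quantitatively.

```python
import math

Q0 = 0.5  # initial expectation, as stated in the text


def expected_step(q, p_positive, alpha, rho, phi):
    """One trial of the expected dynamics for a single stimulus.

    Assumed forms (not taken from the paper's equations):
      learning: q <- q + alpha * (rho * R - q), with R replaced by its mean
      decay:    q <- q + phi * (Q0 - q)
    Because the unchosen option is updated with the reciprocal outcome,
    both stimuli are updated on every trial regardless of the choice.
    """
    q = q + alpha * (rho * p_positive - q)
    return q + phi * (Q0 - q)


def asymptotic_choice_probability(alpha, rho, phi, beta, n_trials=500):
    """Softmax probability of the 0.7 stimulus, evaluated at the expected Q values."""
    q_good, q_bad = Q0, Q0
    for _ in range(n_trials):
        q_good = expected_step(q_good, 0.7, alpha, rho, phi)
        q_bad = expected_step(q_bad, 0.3, alpha, rho, phi)
    return 1.0 / (1.0 + math.exp(-beta * (q_good - q_bad)))


# Baseline values quoted in the Figure 1 caption; the changes are illustrative.
baseline = dict(alpha=0.1, rho=0.6, phi=0.12, beta=10.0)
p_base = asymptotic_choice_probability(**baseline)
p_rho = asymptotic_choice_probability(**{**baseline, "rho": 0.9})    # more sensitivity
p_phi = asymptotic_choice_probability(**{**baseline, "phi": 0.06})   # less decay
p_beta = asymptotic_choice_probability(**{**baseline, "beta": 15.0})  # more deterministic
```

Under these assumed forms all three changes raise the asymptotic choice probability, and scaling reward sensitivity or inverse temperature by the same factor gives the same value, which illustrates why choice data alone cannot separate them.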
To fit the 3 model variants described in Table 1 to participant choice, the joint posterior probability of the free parameters for each variant was calculated for each participant separately, given their choices. Each participant's parameter values were estimated as the expected value of the marginalized parameter distribution (19,48). The ρ and β parameters were sampled in log space, while the α and φ parameters were sampled in logit space. All statistical analyses were performed on transformed parameters.
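The idea of estimating a parameter as the expected value of its posterior, in a transformed space, can be sketched on a grid. This is a deliberately simplified illustration rather than the fitting procedure used in the study: it treats a single free parameter (log inverse temperature) with a flat prior over the grid, and it holds the difference in Q values fixed instead of refitting the learning rate. All function names, the grid bounds, and the example counts are hypothetical.

```python
import math


def logit(p):
    """Map a value in (0, 1) to the real line (used for rates such as decay)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of `logit`; also the two-option softmax choice probability."""
    return 1.0 / (1.0 + math.exp(-x))


def posterior_expected_value(grid, log_likelihood):
    """Posterior mean of a parameter over grid points, assuming a flat prior."""
    log_post = [log_likelihood(value) for value in grid]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]  # shift for numerical stability
    return sum(v * w for v, w in zip(grid, weights)) / sum(weights)


def make_log_likelihood(n_good, n_total, q_difference):
    """Log-likelihood of log(beta) given n_good advantageous choices out of n_total."""
    def log_likelihood(log_beta):
        p_good = inv_logit(math.exp(log_beta) * q_difference)
        return n_good * math.log(p_good) + (n_total - n_good) * math.log(1.0 - p_good)
    return log_likelihood


# Grid over log(beta) from log(0.1) to log(100); bounds chosen for illustration.
lo, hi = math.log(0.1), math.log(100.0)
log_beta_grid = [lo + i * (hi - lo) / 199 for i in range(200)]

estimate_accurate = posterior_expected_value(log_beta_grid, make_log_likelihood(25, 30, 0.2))
estimate_noisy = posterior_expected_value(log_beta_grid, make_log_likelihood(16, 30, 0.2))
```

A participant who chooses the advantageous stimulus more consistently receives a larger estimate of log inverse temperature; statistics would then be run on these transformed values, as described above.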

Magnetic Resonance Imaging Analysis
Details of the image acquisition parameters and preprocessing pipeline are included in the Supplement. Task events were represented using separate explanatory variables for the presentation of stimuli in win and loss trials (2-second period during which stimuli were first presented), and separate variables representing the 4 possible outcomes (2-second period during presentation of win, no-win, loss, or no-loss outcomes). Additional explanatory variables were included to account for respiratory and cardiac noise. Activity associated with expectation during learning was captured as the relative difference between signals during the stimulus presentation period for win- versus loss-condition trials. Post hoc analyses then compared expectation-associated activity between the first versus second halves of trials in a block (i.e., when expectation should be low relative to when it should be high, Figure 1D). The contrasts between win and no-win outcomes and between no-loss and loss outcomes were used as simple non-model-based measures of prediction errors. Note that while this analysis makes no assumption about how participants are learning during the task, it does assume that the observed activity during outcome periods reflects participants' prediction error rather than just experienced outcome (i.e., the response to an outcome is reduced if that outcome is expected). We therefore supplemented this analysis with a model-based analysis in which the estimated prediction errors from the belief decay model were used as parametric regressors in place of the binary outcome regressors. Model-based results utilizing prediction errors generated using the inverse temperature and reward sensitivity models are reported in the Supplement. First-level analyses were run for each participant and both blocks of the task. The outputs of these analyses were then averaged, within subject, across the blocks and entered into a higher-level random-effects analysis, which assessed the difference between the 2 groups. The higher-level analysis was restricted to anatomical regions of interest (ROIs) of reward-sensitive regions: the mPFC and OFC [defined using the Harvard-Oxford Structural Atlas library (49)] and the VS [defined using the Oxford-GSK-Imanova Structural-Anatomical Striatal Atlas (50)]. Group-level inference was performed using the FSL nonparametric permutation tool (Randomise) with 5000 permutations, the threshold-free cluster enhancement method, and familywise-error correction (p < .05).
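The logic of permutation-based group inference can be shown with a much simpler case than the voxelwise analysis. The sketch below is not FSL Randomise and does not implement threshold-free cluster enhancement or familywise-error correction; it assumes a single summary value per participant (for example, a mean parameter estimate in one ROI) and tests the group difference by shuffling group labels. The function name and the example values are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of group labels
        difference = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if difference >= observed - 1e-12:  # tolerance for floating-point ties
            at_least_as_extreme += 1
    # Count the observed labeling itself so the p value is never zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical values: clearly separated groups versus identical groups.
p_separated = permutation_p_value([10, 11, 12, 13, 14, 15, 16, 17],
                                  [0, 1, 2, 3, 4, 5, 6, 7])
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3])
```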

RESULTS
Participants were young, highly educated, and evenly split between females and males (Table 2).

Pramipexole Specifically Increases Asymptotic Choice of Rewarded Stimuli
There was a significant group × valence × time interaction for choice accuracy across behavioral sessions (Figure 2) (F(1,38) = 10.517, p = .002). Win trial accuracy increased after treatment in the pramipexole group (t(20) = 2.347, p = .029), with no significant change in loss trial accuracy across sessions (t(20) = 1.158, p = .26) and no change in either win (p = .86) or loss (p = .172) trial accuracy in the placebo group. The pramipexole and placebo groups did not differ significantly on win or loss trial accuracy at baseline (ps = .435 and .395, respectively) or postintervention (ps = .179 and .375, respectively).
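One way to read the three-way interaction is as a between-group comparison of a per-participant difference of differences. The sketch below assumes a standard mixed design with two groups and two-level within-subject factors, in which case the interaction F statistic equals the squared pooled-variance t statistic on this contrast, with n1 + n2 − 2 degrees of freedom (38 for 19 + 21 participants). The analysis software used in the study is not stated here, and the function names and example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def interaction_contrast(win_pre, win_post, loss_pre, loss_post):
    """Change in win accuracy minus change in loss accuracy for one participant."""
    return (win_post - win_pre) - (loss_post - loss_pre)


def interaction_f(contrasts_a, contrasts_b):
    """F statistic and error degrees of freedom for the group difference in contrasts."""
    n_a, n_b = len(contrasts_a), len(contrasts_b)
    df_error = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * variance(contrasts_a)
                       + (n_b - 1) * variance(contrasts_b)) / df_error
    standard_error = math.sqrt(pooled_variance * (1.0 / n_a + 1.0 / n_b))
    t_statistic = (mean(contrasts_a) - mean(contrasts_b)) / standard_error
    return t_statistic ** 2, df_error


# Hypothetical contrasts for two tiny groups, for illustration only.
f_value, df_error = interaction_f([1.0, 2.0, 3.0], [0.0, 1.0, 2.0])
one_contrast = interaction_contrast(0.6, 0.8, 0.7, 0.7)
```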

The Behavioral Effects of Pramipexole May Be Attributed to Increased Reward Sensitivity, Reduced Choice Stochasticity, or Reduced Reward Value Decay
We fitted the versions of the reinforcement learning model described in Methods and Materials to participants' behavior. All 3 models were able to account for participant behavior (see Figures S1-S3 for model comparison and diagnostics). Specifically, the observed effect of pramipexole may result from increased reward sensitivity (Figures 1D and 2E) (group × valence × time F(1,38) = 5.81, p = .021), decreased reward value decay (Figures 1E and 2F) (group × valence × time F(1,38) = 7.96, p = .008), or increased inverse temperature (Figures 1F and 2D) (group × valence × time F(1,38) = 5.81, p = .021).

Pramipexole Increases BOLD Signal During Anticipation of Rewards Versus Losses
Anticipation of reward stimuli, as measured using the activity during presentation of stimuli in reward relative to loss trials, was increased in participants receiving pramipexole relative to placebo in the OFC ROI (peak voxel x = 34, y = 77, z = 31; voxel size: 8; p = .0376) (Figure S4). There were no significant clusters for this contrast in the mPFC or VS ROIs. We next examined the development of win and loss expectations during the task blocks. As illustrated in Figure 1D, expectations develop as learning proceeds. We captured this process by subtracting the response to stimuli in the first half of trials (trials 1-15) from the latter half (trials 16-30) separately for reward and loss trials. Within the reward condition, participants receiving pramipexole had a greater increase in activity across the block than those receiving placebo in the OFC ROI (peak voxel x = 30, y = 76, z = 36; voxel size: 57; p = .019) (Figure 3A, B). There were no clusters for this contrast within mPFC or VS ROIs or for the OFC/mPFC/VS ROIs during loss stimulus presentation. These results are consistent with pramipexole causing an increase in reward sensitivity or reduction in reward expectation decay as both processes lead to increased reward expectation (Figure 1D). They are not consistent with an increase in inverse temperature, which does not require a change in expectations.

Pramipexole Decreases the BOLD Signal Associated With Rewarded Prediction Errors
The response to rewarded outcomes, as measured using activity associated with win relative to no-win outcomes, was reduced in the pramipexole group relative to the placebo group in the mPFC ROI (peak voxel x = 44, y = 84, z = 27; voxel size: 406; p = .0072) (Figure 3C, D). Pramipexole did not influence activity in loss relative to no-loss trials or in the OFC or VS ROIs. This same effect was apparent using a regressor coding reward prediction errors derived from the decay model (Figure 3E, F) (see the Supplement for a summary of analyses using other model variants).
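A model-derived prediction error regressor of the kind referred to above can be sketched as follows. The update and decay forms are assumptions (a delta rule with the outcome scaled by reward sensitivity, the unchosen option updated with the reciprocal outcome, and linear decay toward the initial value), and mean-centering the regressor is a common convention that is assumed rather than taken from the text. The function names and the example trial sequence are hypothetical.

```python
def trial_prediction_errors(chosen, outcomes, alpha, rho, phi, q0=0.5):
    """Trial-wise prediction errors for the chosen stimulus (0 or 1).

    `outcomes` are coded 1 for a positive outcome and 0 otherwise.
    """
    q = [q0, q0]
    errors = []
    for choice, outcome in zip(chosen, outcomes):
        other = 1 - choice
        error = rho * outcome - q[choice]       # prediction error on this trial
        errors.append(error)
        q[choice] += alpha * error              # update the chosen option
        q[other] += alpha * (rho * (1 - outcome) - q[other])  # reciprocal outcome
        q = [value + phi * (q0 - value) for value in q]       # decay toward q0
    return errors


def mean_center(values):
    """Subtract the mean so the regressor is orthogonal to a constant term."""
    if not values:
        raise ValueError("need at least one value")
    average = sum(values) / len(values)
    return [value - average for value in values]


# Hypothetical three-trial sequence with no decay, for a hand-checkable example.
raw_errors = trial_prediction_errors([0, 0, 1], [1, 0, 1], alpha=0.1, rho=1.0, phi=0.0)
regressor = mean_center(raw_errors)
```

In this toy sequence the errors are 0.5, −0.55, and 0.495, which can be verified by hand from the update rule.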

Questionnaire Scores
No effect of drug treatment was found for any of the questionnaire measures other than the anticipatory subscale of the Temporal Experience of Pleasure Scale, which was driven by a higher baseline score in the pramipexole group (Table 3) (45).
Behavioral and neuroimaging analyses controlling for baseline Temporal Experience of Pleasure Scale scores are reported in the Supplement.

DISCUSSION
Sustained treatment with pramipexole increases the asymptotic choice of rewarded stimuli while simultaneously enhancing neural activity during the anticipation of rewarded trials and suppressing the response to win outcomes. This indicates that it enhances reward learning by reducing the decay of value estimates and suggests a cognitive mechanism of action by which it may ameliorate the reduced reward learning characteristic of depression and anhedonia. Pramipexole specifically increased asymptotic choice of highly rewarded stimuli, with no effect on the choice of stimuli associated with loss. This is in contrast with the majority of previous experimental studies of pramipexole, which have found that it impairs reward learning (30-33). However, these studies have generally administered a single, low dose of pramipexole, which is thought to act primarily on inhibitory dopamine presynaptic autoreceptors (40,41). The 2-week administration used in the current study was selected to avoid this effect and to assess the more clinically relevant action of the drug on postsynaptic receptors. The current results are consistent with the few studies of patient groups in which longer treatment durations were used and which also found an increase in reward choice following pramipexole treatment (36,51). Together, these studies indicate that putatively postsynaptically acting, sustained dosing of pramipexole acts to enhance reward learning. Depression is associated with reduced reward learning (5-16). Overall, therefore, the impact of pramipexole on learning is opposite to that associated with depression and is consistent with the antidepressant effects of the drug (27-29).
The increase in asymptotic reward choice following treatment with pramipexole may be produced by a number of distinct cognitive mechanisms (Figure 1). It is not possible to arbitrate between these using choice behavior from the PILT alone; rather, estimates of internal model processes are required. We used functional neuroimaging of reward-sensitive neural areas during the presentation of task stimuli and the receipt of outcomes to provide estimates of these internal processes. Pramipexole was found to increase anticipatory activity during rewarded trials in the OFC, a region in which activity commonly tracks expected value (52-54) and in which activity is found to be altered in depression (55,56). Pramipexole also reduced the response to win outcomes and reward prediction errors in the mPFC, a node in a previously described, positive valence-specific reward prediction error network (44). Contrary to our hypothesis, this pattern of effect suggests that pramipexole enhances value expectations, and therefore reward learning, by reducing the decay of value estimates between trials rather than by enhancing the effective value of the outcomes. Depression itself is associated with a reduced BOLD response to rewarding outcomes (7,21-24), which suggests that the reduced learning in patients (5-16) is the result of a lower effective value of rewards rather than a difference in decay of value estimates. The current findings indicate that pramipexole does not act directly to reverse the cognitive profile of patients with depression but rather improves reward learning via a separate mechanism. This result may go some way to explaining why the clinical response to pramipexole in depression seems to be higher in patients with intact, rather than impaired, baseline reward learning (34). Specifically, as pramipexole does not increase reward sensitivity, the impact of the drug on reward learning, and presumably on symptoms of depression, will depend on an intact response to rewarding outcomes and will be reduced in those patients with an impaired response. In other words, there is little point in decreasing the decay of reward value estimates if these estimates have been systematically lowered by reduced reward sensitivity. This interpretation raises the question of whether alternative approaches to enhancing reward learning, such as kappa opioid receptor antagonism (57) or cognitive interventions (58), might act to enhance reward sensitivity and whether the effects of these treatments may therefore be complementary to those of pramipexole.
Pramipexole and RL
Biological Psychiatry February 1, 2024; 95:286-296 www.sobp.org/journal
A reduction in reward expectation decay may be relevant to the potential side effects, as well as beneficial effects, of pramipexole. In particular, dopamine agonists including pramipexole can induce impulse control difficulties, with reports by patients experiencing this describing a persistent preoccupation with rewarding activities, even in the absence of obvious cues (59). While the treatment regimen in the current study was not long enough to induce impulse control difficulties, the symptomatic experience of patients does superficially seem to be consistent with persistent reward expectation.
An outstanding question raised by the current results is how pramipexole might act to reduce the decay of estimated values. One possibility is that this effect is related to the role of dopamine in working memory (60). Previous modeling work has demonstrated that simple learning tasks are often solved using a mixture of working memory and reinforcement learning-based processes, with working memory acting to reduce prediction error responses by maintaining distinct representations of the current value (20). The observed effect of pramipexole in this study may therefore reflect an increase in the degree to which participants rely on working memory when completing the PILT. However, a general enhancement of working memory should also influence loss learning rather than produce a reward-specific effect, as found here. It is therefore necessary to invoke some form of valence-specific working memory effect to explain the current findings. Ultimately, the potential role of working memory in the effect of pramipexole would best be tested by manipulating memory load during learning (17). An alternative explanation for the reduced decay in estimated values, which may more naturally incorporate valence specificity, is that value estimates may be rationally combined with prior beliefs during learning, and pramipexole may change these underlying implicit beliefs. By this view, individuals maintain a global estimate of the likelihood of experiencing positive events, and they use this estimate to moderate their local estimate of reward value during the task.
If, for example, an individual's global estimate is pessimistic, with positive events judged to only occur rarely, it will act to particularly reduce the estimated value of highly rewarded stimuli. The effect of pramipexole may therefore be understood as inducing a more positive global estimate of the likelihood of rewarding events, which reduces the degree to which local estimates are downgraded. If correct, this would suggest that pramipexole should also act to increase other measures of optimism bias (61).
The current study has a number of limitations. Most obviously, the population recruited were nonclinical healthy participants. A nonclinical population was selected to reduce phenotypic variation among participants and thus enhance the sensitivity of this experimental medicine study to detect the pharmacological effects of pramipexole. However, the possibility remains that patients with depression, perhaps by means of an altered dopaminergic tone, may respond differently to pramipexole treatment than nonclinical participants. Ultimately, this possibility requires replication of the current study in a clinical population. A related concern is that the current design is not able to assess the degree to which change in reward learning mediates clinical response in patients. Answering this question requires a clinical trial of pramipexole in which patients complete the PILT before and after initiating treatment with pramipexole or placebo. We are currently undertaking such a trial (62). A second limitation is that the OFC cluster identified in the expectation analysis (Figures S4 and S5) is small (8 voxels) and peripherally placed, making it a less convincing finding than the reward prediction error result (Figure S3). Related to this limitation, the current sample size (n = 19/21 per group) is modest. As explained above, the reward prediction error result is more useful when discriminating between the competing models, but it would clearly be important to replicate the expectation-linked effect in future studies. Finally, although the current study used blinded treatment allocation, a degree of functional unblinding occurred by the end of the study (i.e., participants were able to guess better than chance which treatment group they were in) (see the Supplement), which raises the possibility that some of the observed results may have been influenced by expectation effects.
A 2-week course of pramipexole enhanced asymptotic choice of highly rewarded stimuli while reducing the neural response to rewarding outcomes. These results indicate that pramipexole enhances reward learning by reducing the decay of learned value estimates and suggest a potential cognitive mechanism by which it may act to ameliorate symptoms of depression.

Figure 1. (A) Study design. Following a screening session, participants underwent the preintervention behavioral testing session in which they performed the probabilistic instrumental learning task [described in (B)]. Participants received the first dose of pramipexole/placebo at the end of this behavioral testing session. Between days 12 and 15 of the pramipexole/placebo course, participants attended a functional magnetic resonance imaging (fMRI) session (in which they performed the probabilistic instrumental learning task while undergoing fMRI) and, on a separate day, a behavioral testing session that was identical to the preintervention behavioral testing session. (B) Probabilistic instrumental learning task. In each trial, participants were presented with one of two possible pairs of shapes. For one of the shape pairs (top line), one shape was associated with winning money on 70% of trials and not winning on the remaining 30% (the other shape had reciprocal contingencies). For the other shape pair (bottom line), one shape was associated with losing money on 70% of trials and not losing on the remaining 30% (again, the other shape had reciprocal contingencies). Participants had to learn to select the shapes that were associated with the high probability of win/no-loss. Depression is associated with reduced asymptotic choice of rewarding outcomes in this and similar tasks and so we hypothesized that pramipexole would have the opposite effect (i.e., increase asymptotic choice). Performance on the task can be described by a simple learning rule, in which rewards, R, are combined with expectations, Q, before being fed into a decision rule. Distinct parameters modify each component of this process: a reward sensitivity parameter, ρ, influences the effective size of experienced rewards; a decay parameter, φ, influences the degree to which expectations are maintained between trials; a learning rate parameter, α, influences the rate at which rewards alter expectations; and an inverse temperature parameter, β, influences the degree to which expectations are used to determine choices. (C) Learning curves generated by the learning and decision-making rules described in (B). Choices of the baseline model (black line) were produced using a ρ of 0.6, a φ of 0.12, a β of 10, and an α of 0.1. Increases in either ρ (blue line) or β (green line) and decreases in φ (red line) produce equivalent changes in asymptotic choice. In other words, 3 qualitatively distinct processes lead to the same behavioral effect. As a result, choice data on its own cannot be used to distinguish between these processes. However, the internal model variables do differ, and thus can discriminate, between these processes. (D) Illustration of model expectations, Q. As can be seen, either increasing ρ or decreasing φ causes an increase in expectations, whereas β, which influences decision-making rather than learning, does not change expectations (i.e., the green and black lines are identical). (E) Illustration of the prediction errors of the models, which are able to fully discriminate the 3 parameters. Again, changes in β have no effect, whereas increases in ρ lead to increased prediction errors and reductions in φ lead to decreased prediction errors. To discriminate between the 3 possible causes of changed asymptotic choice behavior, estimates of the internal model variables, such as those produced by neuroimaging measures, are required.

Figure 2. (A, B) Learning curves depicting reward choice accuracy in the (A) preintervention and (B) postintervention session. Green (pink) curves represent the placebo (pramipexole) group. Error bars represent SEM. The curves represent the proportion of runs in which a participant chose the advantageous shape (i.e., the shape associated with a 70% probability of receiving a win outcome) in a given trial. See Figure S3 for loss learning curves. (C) Mean (SEM) pre- vs. postintervention change in reward/loss condition choice accuracy. (D-F) Three variants of the reinforcement learning model were fitted to participant choice data; in each variant, two parameters were fixed and the third (along with the learning rate) was allowed to vary. The results show the pre- vs. postintervention change when the (D) inverse temperature, (E) reward sensitivity, and (F) decay parameter were allowed to vary. As can be seen, any one of these three parameters can capture the effect of pramipexole (see Figure S3 for an illustration of the degree to which the models are able to recapitulate participant learning curves). Green (pink) bars represent the placebo (pramipexole) group. Error bars represent SEM. Scatter plots overlaying bar graphs depict corresponding individual values. * represents significant (ps < .05) pre- vs. postintervention change.
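A rough sketch can show why a change in any one of the three parameters can account for the same change in asymptotic choice. It assumes the update Q ← (1 − decay)·Q + α·(ρ·R − (1 − decay)·Q) and, as a simplification, treats both shapes as if they were updated on every trial, so that each expectation settles at the fixed point of that update. The function name, this approximation, and the default values are assumptions for illustration and are not the authors' fitting procedure.

```python
import math


def asymptotic_choice_prob(alpha=0.3, beta=5.0, rho=1.0, decay=0.0,
                           p_high=0.7, p_low=0.3):
    """Approximate asymptotic probability of choosing the advantageous shape.

    Uses the expected fixed point of the assumed update
        Q <- (1 - decay) * Q + alpha * (rho * R - (1 - decay) * Q),
    which is Q* = alpha * rho * p / (1 - (1 - decay) * (1 - alpha)).
    Requires alpha > 0 or decay > 0 so the denominator is non-zero.
    """
    denom = 1.0 - (1.0 - decay) * (1.0 - alpha)
    q_high = alpha * rho * p_high / denom
    q_low = alpha * rho * p_low / denom
    # Two-option softmax with inverse temperature beta.
    return 1.0 / (1.0 + math.exp(-beta * (q_high - q_low)))


if __name__ == "__main__":
    print("baseline:               ", asymptotic_choice_prob())
    print("lower inverse temp.:    ", asymptotic_choice_prob(beta=2.5))
    print("lower reward sensitivity:", asymptotic_choice_prob(rho=0.5))
    print("higher decay:           ", asymptotic_choice_prob(decay=0.2))
```

Under these assumptions, halving the inverse temperature and halving the reward sensitivity give the same choice probability, because only their product enters the softmax, and a non-zero decay lowers it as well. This matches the point that choice data alone cannot separate the three accounts.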

Figure 3. Results of functional magnetic resonance imaging analyses. (A) The (red-yellow) colored area represents the cluster of significantly increased activity compared with the placebo group in the orbitofrontal cortex region of interest during win anticipation in the second half > first half of win-condition trials (peak voxel x = 30, y = 76, z = 36; voxel size: 57; p = .019) and (B) parameter estimates (i.e., mean beta weights from image analysis) extracted from the area of significantly increased activity in (A) associated with win anticipation in the first half, second half, and second half > first half of win-condition trials. (C) The (blue) colored area represents the cluster of significantly decreased activity compared with the placebo group in the medial prefrontal cortex region of interest associated with win outcomes > no-win outcomes (peak voxel x = 44, y = 84, z = 27; voxel size: 406; p = .0072) and (D) parameter estimates (i.e., mean beta weights from image analysis) extracted from the area of significantly decreased activity in (C) associated with win outcomes, no-win outcomes, and win outcomes > no-win outcomes. (E) The (blue) colored area represents the cluster of significantly decreased activity compared with the placebo group for win-condition RPEs in the medial prefrontal cortex region of interest (peak voxel x = 44, y = 84, z = 27; voxel size: 415; p = .0074) and (F) parameter estimates (i.e., mean beta weights from image analysis) extracted from the area of significantly decreased activity in (E) associated with win-condition RPEs. (A, C, E) Areas of significantly increased/decreased activity are threshold-free cluster enhancement corrected with a familywise-error cluster significance level of p ≤ .05. (B, D, F) Green (pink) bars represent the placebo (pramipexole) group. Error bars represent SEM. Scatter plots overlaying bar graphs depict corresponding individual parameter estimates. Max, maximum; RPE, reward prediction error.

Table 1. The 3 Causal Proposals of Changes in Asymptotic Choice Associated With the Reward Sensitivity, Decay, and Inverse Temperature Parameters of the Reinforcement Learning Model

Biological Psychiatry February 1, 2024; 95:286-296 www.sobp.org/journal

Table 3. Questionnaire Scores Before and After the Intervention
Beck Depression Inventory; BFS, Befindlichkeitsskala; OXH, Oxford Happiness Questionnaire; PANAS, Positive and Negative Affect Schedule; QUIP, Questionnaire for Impulsive-Compulsive Disorders in Parkinson's Disease-Rating Scale; SHAPS, Snaith-Hamilton Pleasure Scale; STAI, State-Trait Anxiety Inventory; TEPS, Temporal Experience of Pleasure Scale.
a Significant group × time interaction driven by difference between groups at baseline.