Cortical dopamine reduces the impact of motivational biases governing automated behaviour

Motivations shape our behaviour: the promise of reward invigorates, while in the face of punishment we hold back. Abnormalities of motivational processing are implicated in clinical disorders characterised by excessive habits and loss of top-down control, notably substance and behavioural addictions. Striatal and frontal dopamine have been hypothesised to play complementary roles in the respective generation and control of these motivational biases. However, while dopaminergic interventions have indeed been found to modulate motivational biases, previous pharmacological studies used regionally non-selective agents. Here, we tested the hypothesis that frontal dopamine controls the balance between Pavlovian, bias-driven automated responding and instrumentally learned action values. Specifically, we examined whether selective enhancement of cortical dopamine either (i) enables adaptive suppression of Pavlovian control when biases are maladaptive, or (ii) non-specifically modulates the degree of bias-driven automated responding. Healthy individuals (n = 35) received the catechol-O-methyltransferase (COMT) inhibitor tolcapone in a randomized, double-blind, placebo-controlled cross-over design, and completed a motivational Go/NoGo task known to elicit motivational biases. In support of hypothesis (ii), tolcapone globally decreased motivational bias: it improved performance on trials where the bias was unhelpful, but impaired performance in bias-congruent conditions. These results indicate a non-selective role for cortical dopamine in the regulation of motivational processes underpinning top-down control over automated behaviour. The findings have direct relevance to understanding the neurobiological mechanisms underpinning addiction and obsessive-compulsive disorders, and highlight a potential novel trans-diagnostic mechanism for addressing such symptoms.


Sample and site description
Individuals were considered eligible if they i) were between 18 and 49 years of age, ii) had no prior history of neurological disorders (e.g. Parkinson's disease), iii) had no medical conditions or medication considered incompatible with drug administration, iv) had no current, clinically relevant depression (Montgomery-Åsberg Depression Rating Scale score ≤ 20) [1], v) had no history of bipolar disorder, psychotic disorder, or a diagnosis of borderline personality disorder, vi) reported no recent use of illicit substances (by self-report, confirmed by urine screen), and vii) fulfilled MRI inclusion criteria.

Clinical assessment, neuropsychological testing and clinical questionnaires
Clinical assessment was carried out by a trained rater and comprised a structured assessment of medical and psychiatric history, including medications, the National Adult Reading Test (NART) to quantify IQ [2], and the Montgomery-Åsberg Depression Rating Scale (MADRS) to quantify depressive symptoms [1]. The clinical interview included the Mini-International Neuropsychiatric Interview (MINI) [3] to identify mental disorders, which were exclusionary. Participants then completed the Barratt Impulsiveness Scale (BIS-11) [4] and the Padua Inventory (PI-WSUR) [5] to characterize their self-reported levels of impulsivity and compulsivity. We evaluated illicit substance use during the clinical interview and additionally assessed current drug use with a urine drug screen to ensure abstinence from illicit drugs. Participants were included on the basis of a negative urine drug screen and a negative clinical assessment relating to substance use. One participant had a faint trace positive for opioids, but this was deemed a false positive following detailed evaluation by a psychiatrist. Weekly caffeine intake was assessed using a bespoke Caffeine History Questionnaire (quantifying typical caffeine amounts and beverages/snacks consumed) (unpublished questionnaire from K. Ioannidis and S.R. Chamberlain, available on OSF: https://osf.io/9tjzp/), while nicotine use was evaluated using the Fagerström self-report questionnaire [6] and by asking participants how many cigarettes they smoked per day. Of note, only one participant smoked. A characterization of the sample, including demographic data and questionnaire scores, can be found in Table S1.

Supplemental data analyses

Data quality control
Data were screened for quality, and subjects were excluded according to the following criteria: i) completely deterministic responding, i.e., always giving the same response to a given cue (e.g. always Go for Win cues and NoGo for Avoid cues), indicating failure to understand the task (no subject met this criterion); ii) pressing an uninstructed button on more than 25% of trials (i.e., >40 trials), given that on those trials participants did not receive feedback and could not learn. This led to the exclusion of one participant with 50 false responses, resulting in a final sample size of N = 35. After exclusion of this participant, the mean rate of false responses was < 1% across participants. The remaining uninstructed-button trials (82 trials, 0.68% of 12,000 in total) were excluded from the mixed model analysis. For the logistic regression, we additionally excluded trials on which participants showed reaction times (RTs) faster than 200 ms (24 trials in total), as such fast responses cannot reflect adequate processing of the stimulus.
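The exclusion logic above can be sketched as follows. This is a minimal illustration with made-up data; the column names and data frame layout are hypothetical, not taken from the actual analysis code:

```python
import pandas as pd

# Hypothetical trial-level data; NoGo trials have no RT (NaN).
trials = pd.DataFrame({
    "subject":  [1, 1, 1, 2, 2, 2],
    "response": ["go", "uninstructed", "nogo", "go", "go", "nogo"],
    "rt_ms":    [450.0, 310.0, None, 150.0, 520.0, None],
})

# Criterion ii): exclude subjects pressing an uninstructed button on >25% of trials
false_rate = (trials["response"] == "uninstructed").groupby(trials["subject"]).mean()
keep = false_rate[false_rate <= 0.25].index
clean = trials[trials["subject"].isin(keep)]

# Remaining uninstructed-button trials are dropped (no feedback, so no learning)
clean = clean[clean["response"] != "uninstructed"]

# For the logistic regression only: additionally drop anticipatory responses (RT < 200 ms)
clean_logreg = clean[~(clean["rt_ms"] < 200)]
```

With this toy data, subject 1 exceeds the 25% threshold and is dropped entirely, while subject 2's anticipatory 150 ms trial is removed only for the logistic regression.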
As noted in the main manuscript, we detected differences in performance across test sites, which are further visualized in Figure S1. For completeness, we also report the model equation employed:

Choice ~ Req Action * Valence * Drug * Site + (Req Action * Valence * Drug + 1 | Subject)

Figure S1. Site differences in performance. A and B) Mean probability of making a Go response (P(Go)) for each cue condition, and mean accuracy per cue sorted according to bias-congruent and bias-incongruent response requirements, for the Cambridge recruitment site. C and D) The same measures for the Chicago recruitment site.

Task block effects
In the main manuscript, we assessed whether the impact of tolcapone on motivational biases was constant over time. For this, we included task block (2 blocks of 80 trials) as a 2-level, within-subject factor in the mixed model. The Valence x Drug x Block interaction was significant (χ²(1) = 4.6, p = .03). For completeness, we report the model equation and the full statistics of this model below. As reported in the main manuscript, we additionally analyzed the Drug x Valence interaction for block 1 and block 2 separately.

Session order effects
As a control analysis, we assessed whether session order (i.e. whether tolcapone was administered on testing day 1 or 2) interacted with the Drug x Valence interaction of interest, given that such effects were observed in a previous pharmacological study using the same task [7]. However, we did not observe a Drug x Valence x Order interaction here (χ²(1) = 1.05, p = .31).

Mixed regression analysis of reaction times
As a secondary analysis, we analysed reaction times using the same factors (Required Action, Valence, and Drug as within-subject variables; Site as a between-subject variable), using linear mixed-effects models as implemented in lme4, which are appropriate for continuous RT data.
Note that this analysis has much lower power than the analysis of the choice data, as by definition we can only analyse Go responses.
RT ~ Req Action * Valence * Drug * Site + (Req Action * Valence * Drug + 1 | Subject)

Results from this analysis conceptually replicate the main findings from the choice data regarding the non-drug effects. Participants were faster to respond to a Win than to an Avoid cue (Valence: χ²(1) = 42.0, p < .001), indicating the presence of a motivational bias invigorating responding. Participants were also faster to make a Go response when this was correct (i.e. for a Go cue) relative to when it was incorrect (i.e. for a NoGo cue) (Required Action: χ²(1) = 37.8, p < .001), indicating (implicit or explicit) awareness of the correctness of making a Go response, i.e. learning. However, there were no interactions with tolcapone (Drug x Valence: χ²(1) < 0.05, p > .8). Full statistics are presented in Table S3.

Model equations
Across all models, action weights (w) for Go and NoGo responses were determined per trial (t) and cue (s), and used together with a softmax function to compute the associated choice probabilities. Model M1 was a standard Rescorla-Wagner model that included only two parameters, namely a learning rate (ε) and a feedback sensitivity parameter (ρ). Action weights for both motor responses were derived from unbiased action values using a standard delta-learning rule including these two parameters.
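The original equations for the choice rule and model M1 did not survive extraction; the following is a reconstruction based on the parameter definitions above and the standard form of this model family, not a verbatim copy:

```latex
% Softmax over action weights (choice rule, all models)
p(a_t \mid s_t) = \frac{\exp\big(w_t(a_t, s_t)\big)}{\sum_{a'} \exp\big(w_t(a', s_t)\big)}

% Delta-learning rule with learning rate \varepsilon and feedback sensitivity \rho
Q_t(a_t, s_t) = Q_{t-1}(a_t, s_t) + \varepsilon \big(\rho\, r_t - Q_{t-1}(a_t, s_t)\big)

% M1: unbiased action weights
w_t(a, s) = Q_t(a, s)
```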
For the second model (M2), an additional parameter capturing the 'go' bias (b) was introduced. The go bias parameter b is added exclusively to the action weight of the Go response and describes an individual's tendency to make a 'go' response, independent of cue valence. To assess the latent mechanisms of the decision process under tolcapone and placebo administration, the third model (M3) further extended M2 with a bias parameter π, introduced to capture the impact of motivational biases on decision making. Modelling this impact was achieved by allowing the anticipated cue value (V) to influence the action weights, increasing or decreasing the action weight of the Go response. Given that cue valence was explicitly signalled in this task, we fixed the anticipated cue value to be positive for Win cues (V = +0.5) and negative for Avoid cues (V = −0.5). Thereby, the strength of the motivational bias influencing action selection can be derived from the bias parameter π. Except for the learning rate ε, which was constrained between 0 and 1 (via a probit transform), all other parameters were left unconstrained.
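The corresponding action-weight equations for M2 and M3 also did not survive extraction; a reconstruction consistent with the description above (standard for this task family) would be:

```latex
% M2: go bias b added to the Go weight only
w_t(\text{go}, s)   = Q_t(\text{go}, s) + b, \qquad
w_t(\text{nogo}, s) = Q_t(\text{nogo}, s)

% M3: motivational (Pavlovian) bias \pi weighting the fixed cue value V(s)
w_t(\text{go}, s)   = Q_t(\text{go}, s) + b + \pi\, V(s), \qquad
w_t(\text{nogo}, s) = Q_t(\text{nogo}, s),
\quad V(s) = \begin{cases} +0.5 & \text{Win cues} \\ -0.5 & \text{Avoid cues} \end{cases}
```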

Model Fitting and Comparison
Model fitting was performed with the cbm toolbox implemented in MATLAB [8], employing Hierarchical Bayesian Inference (HBI). HBI treats model identity as a random rather than a fixed effect [9,10]. Thus, individual data can impact the updating of group-level parameters differently, with participants whose observed data are fitted better by the respective model impacting the update to a greater extent. To determine the best-fitting model, we employed random-effects model comparison [9,10] based on the Laplace approximation of model evidence at the individual level [11,12], from which group-level evidence is subsequently computed. We first compared model evidence across the base models M1-M3 using model frequency and protected exceedance probability (PXP) [8]. The PXP quantifies the probability that a given model is the most frequently expressed in the population [9], while controlling for the possibility that observed differences in model evidence arose by chance. Simulations confirmed that the winning model could qualitatively reproduce key patterns of the data (see details below).
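For intuition, random-effects model comparison of this kind can be sketched as follows. This is a minimal variational scheme in the spirit of Stephan et al. (2009), not the cbm implementation, and it computes the plain exceedance probability rather than the protected variant (which additionally accounts for the null possibility that all models are equally frequent):

```python
import numpy as np
from scipy.special import digamma

def bms_random_effects(lme, n_samples=100_000, seed=0):
    """Random-effects Bayesian model selection from per-subject log evidences.

    lme : (n_subjects, n_models) array of log model evidences
          (e.g. Laplace approximations). Returns expected model
          frequencies and exceedance probabilities.
    """
    n_sub, n_mod = lme.shape
    alpha0 = np.ones(n_mod)          # uniform Dirichlet prior over frequencies
    alpha = alpha0.copy()
    for _ in range(500):
        # posterior probability that each subject's data came from each model
        log_u = lme + digamma(alpha) - digamma(alpha.sum())
        log_u -= log_u.max(axis=1, keepdims=True)      # numerical stability
        g = np.exp(log_u)
        g /= g.sum(axis=1, keepdims=True)
        alpha_new = alpha0 + g.sum(axis=0)             # Dirichlet update
        if np.max(np.abs(alpha_new - alpha)) < 1e-10:
            alpha = alpha_new
            break
        alpha = alpha_new
    freq = alpha / alpha.sum()                         # expected model frequencies
    # exceedance probability: P(model k is the most frequent), via sampling
    rng = np.random.default_rng(seed)
    r = rng.dirichlet(alpha, size=n_samples)
    xp = np.bincount(r.argmax(axis=1), minlength=n_mod) / n_samples
    return freq, xp
```

Feeding in per-subject log evidences that consistently favour one model drives both its expected frequency and its exceedance probability towards 1.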

Model comparison and parameter estimates
The modelling results are reported in the main manuscript, with the results from the winning tolcapone model (M4) illustrated in Figure 2 there. For completeness, we here report all parameter estimates from the three base models (M1-M3), as well as the numerical results from the base and full model comparisons, in Table S4.

Control analysis of site effect for estimated model parameters
In line with the analysis of the raw behavioural data, we also assessed whether site interacted with the tolcapone effect on the Pavlovian bias parameter, as well as on the other three model parameters derived from the winning computational model, using the following general model equation:

Parameter ~ Drug * Site

Table S5 reports the full regression analysis results with factors Drug and Site, which are already referred to in the main manuscript. Apart from the already-reported effect of tolcapone on the Pavlovian bias parameter (χ²(1) = 5.4, p = .02), we found significant effects of testing site on both the learning rate (χ²(1) = 8.1, p = .005) and feedback sensitivity (χ²(1) = 11.3, p < .001), which may capture the site difference in instrumental learning performance but are unrelated to the effects of tolcapone. For further full results, see Table S6. Effects are visualized in Figure S3.

Figure S3: A) Tolcapone effect on parameter estimates for the motivational bias parameter for the placebo and tolcapone sessions, as well as the parameters for feedback sensitivity, the learning rate, and the Go bias from the winning model M4.

Model validation and parameter recovery
While model comparison can indicate which model performs best within a selected model space, it does not establish whether a model is able to reproduce key qualitative patterns in the data [13]. Thus, we also tested whether our winning model could reproduce the observed behaviour and whether its parameters could be recovered. For this, we simulated 150 synthetic datasets per individual, based on their individual optimal parameter estimates from the winning model M4. We first validated that the data simulated from the winning model captured the key features of the task. As can be observed in Figure S4A, the model captured both instrumental learning and the presence of a motivational bias (green above red lines), and matched the original data relatively well.
We then re-fitted our winning computational model to each of the 150 simulated datasets, averaged the resulting model parameter estimates across simulation runs, and examined the difference in parameter estimates between the tolcapone and placebo conditions. This recovered the significant difference in the Pavlovian bias parameter π (χ²(1) = 6.3, p = .01; Figure S4B). The feedback sensitivity and go bias parameters did not differ between placebo and tolcapone (for observed and simulated data, all p > .1). However, there was a significant difference between the recovered learning rates under tolcapone versus placebo (χ²(1) = 5.0, p = .03), which was not significant when fitting the real data (χ²(1) = 1.8, p = .2). For both real and simulated data, however, the estimated learning rate decreased under tolcapone. The non-significant parameter difference in the real data plausibly surfaced as a significant effect in the simulated data because model fitting 'reduces' the real data to a few parameters, effectively removing any variance not explained by the model. Simulations therefore yield 'cleaner', i.e. more consistent, data, which can allow parameter differences to reach significance. Speculatively, this change in the instrumental learning rate (in addition to the modulation of the motivational bias parameter) is of interest, in that it is consistent with a shift in the balance between the instrumental and Pavlovian controllers.

Figure S4: A) Simulated trial-by-trial behaviour based on n = 150 simulated datasets. B) Recovered parameter estimates for the motivational bias parameter for the placebo and tolcapone sessions, based on n = 150 simulation runs.
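As an illustration of the simulation step, a single synthetic dataset can be generated from one set of fitted parameters along these lines. This is a minimal sketch, not the actual task or simulation code; the cue set, trial count, and the 80% feedback validity are illustrative assumptions:

```python
import numpy as np

def simulate_gonogo(eps, rho, b, pi, n_trials=160, seed=0):
    """Simulate one synthetic dataset from the biased Go/NoGo model:
    w(go) = Q(go) + b + pi * V(cue), softmax choice over {Go, NoGo}.
    Parameter names follow the supplement (eps: learning rate, rho: feedback
    sensitivity, b: go bias, pi: motivational bias); task details are assumed.
    """
    rng = np.random.default_rng(seed)
    # four cues: (required action, fixed cue value V) -- Go/NoGo x Win/Avoid
    cues = [("go", +0.5), ("go", -0.5), ("nogo", +0.5), ("nogo", -0.5)]
    Q = np.zeros((4, 2))                  # Q[cue, action]; action 0 = Go, 1 = NoGo
    cue_seq, choices = [], []
    for _ in range(n_trials):
        c = rng.integers(4)
        req, V = cues[c]
        w_go = Q[c, 0] + b + pi * V       # bias terms act on the Go weight only
        w_nogo = Q[c, 1]
        p_go = 1.0 / (1.0 + np.exp(-(w_go - w_nogo)))
        a = 0 if rng.random() < p_go else 1
        correct = (a == 0) == (req == "go")
        # assumed 80% feedback validity: correct responses usually lead to the
        # good outcome (reward for Win cues, punishment avoidance for Avoid cues)
        good = rng.random() < (0.8 if correct else 0.2)
        r = (1 if good else 0) if V > 0 else (0 if good else -1)
        Q[c, a] += eps * (rho * r - Q[c, a])   # delta-learning update
        cue_seq.append(c)
        choices.append(a)
    return np.array(cue_seq), np.array(choices)
```

Repeating this per participant with their fitted parameters, then re-fitting the model to the resulting datasets, yields the parameter-recovery check described above.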