Misperceiving Momentum: Computational Mechanisms of Biased Striatal Reward Prediction Errors in Bipolar Disorder

Background Dysregulated reward processing and mood instability are core features of bipolar disorder that have largely been considered separately, with contradictory findings. We sought to test a mechanistic account that emphasizes an excessive tendency in bipolar disorder to enter recursive cycles in which reward perception is biased by signals that the environment may be changing for the better or worse. Methods Participants completed a probabilistic reward task with functional magnetic resonance imaging. Using an influential computational model, we ascertained whether participants with bipolar disorder (n = 21) showed greater striatal tracking of momentum-biased reward prediction errors (RPEs) than matched control participants (n = 21). We conducted psychophysiological interaction analyses to quantify the degree to which each group modulated functional connectivity between the ventral striatum and left anterior insula in response to fluctuations in momentum. Results In participants with bipolar disorder, but not control participants, the momentum-biased RPE model accounted for significant additional variance in striatal activity beyond a standard model of veridical RPEs. Compared with control participants, participants with bipolar disorder exhibited lower insular-striatal functional connectivity modulated by momentum-biased RPEs, an effect that was more pronounced as a function of current manic symptoms. Conclusions Consistent with existing theory, we found evidence that bipolar disorder is associated with a tendency for momentum to excessively bias striatal tracking of RPEs. We identified impaired insular-striatal connectivity as a possible locus for this propensity. We argue that computational psychiatric approaches that examine momentary shifts in reward and mood dynamics have strong potential for yielding new mechanistic insights and intervention targets.

kernel. Intrinsic autocorrelations were accounted for with an AR(1) model, and low-frequency drifts were removed with a 128 s high-pass filter.

ROI definition
To define the ventral striatum ROI, we followed Eldar and Niv's approach and included all grey-matter voxels within the bilateral ventral striatum that responded to the 'reward' versus 'loss' outcome contrast at a family-wise-error (FWE) corrected threshold of pFWE < .05 (1). We extracted activations from an insula ROI, defined as 8 mm spheres around the peak coordinates for the bilateral anterior insula reported by Vinckier and colleagues (4) (left: x = -30, y = 22, z = -6; right: x = 32, y = 20, z = -6).
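As an illustration of the spherical insula ROI, the following is a minimal sketch that builds a boolean mask around each peak coordinate. It is not the pipeline used in the study: the 2 mm isotropic MNI152 grid (`AFFINE`, `SHAPE`), the reading of "8 mm" as a radius rather than a diameter, and the helper name `sphere_mask` are all assumptions made for the example.

```python
import numpy as np

def sphere_mask(center_mm, radius_mm, shape, affine):
    """Boolean mask of voxels whose centres lie within radius_mm of center_mm."""
    # Enumerate every voxel index (i, j, k) and map it to world (mm) space.
    ijk = np.indices(shape).reshape(3, -1)
    ijk1 = np.vstack([ijk, np.ones((1, ijk.shape[1]))])
    xyz = (affine @ ijk1)[:3]
    # Euclidean distance of each voxel centre from the sphere centre.
    dist = np.linalg.norm(xyz - np.asarray(center_mm, dtype=float)[:, None], axis=0)
    return (dist <= radius_mm).reshape(shape)

# Assumed image grid: 2 mm isotropic MNI152 template (hypothetical for this sketch).
AFFINE = np.array([[-2.0, 0.0, 0.0, 90.0],
                   [0.0, 2.0, 0.0, -126.0],
                   [0.0, 0.0, 2.0, -72.0],
                   [0.0, 0.0, 0.0, 1.0]])
SHAPE = (91, 109, 91)

# Peak coordinates from the text; the 8 mm value is treated as a radius (assumption).
left = sphere_mask((-30, 22, -6), 8.0, SHAPE, AFFINE)
right = sphere_mask((32, 20, -6), 8.0, SHAPE, AFFINE)
insula = left | right  # bilateral anterior insula ROI
```

The mean signal inside `insula` (or inside `left` alone, for the left anterior insula) could then be extracted from each volume of a functional image resampled to the same grid.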

Unbiased reward perception model
In the unbiased model, reward expectations were formalised as the net expected value ($EV_t$) of the possible outcomes; that is, the probability and value of winning the sum of money at stake combined with the probability and value of losing this stake. Reward prediction error ($RPE_t$) was operationalised as the difference between the actual outcome value obtained ($O_t$) and the expected value, i.e. $RPE_t = O_t - EV_t$. Multiple studies have shown that RPEs are still tracked in tasks without learning components (5,6).
EV estimates on the next trial are not updated because EV estimates are explicit on every trial and RPE does not drive the update to EV.
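The unbiased model can be sketched in a few lines. This is an illustrative example only: it assumes a gamble in which the participant either wins or loses the full stake, and the function names and example values are hypothetical rather than taken from the task.

```python
def expected_value(p_win, stake):
    """Net expected value: win +stake with probability p_win, lose -stake otherwise."""
    return p_win * stake + (1.0 - p_win) * (-stake)

def unbiased_rpe(outcome, ev):
    """Veridical reward prediction error: actual outcome minus expected value."""
    return outcome - ev

# Hypothetical trial: 75% chance of winning a stake of 1 unit.
ev = expected_value(p_win=0.75, stake=1.0)
rpe_win = unbiased_rpe(outcome=1.0, ev=ev)    # outcome was a win
rpe_loss = unbiased_rpe(outcome=-1.0, ev=ev)  # outcome was a loss
```

Because the probabilities and magnitudes are explicit on every trial, `ev` is recomputed from the trial's stated values each time rather than learned from previous RPEs.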

Momentum-biased reward perception model
To account for the effect of momentum on each trial ($m_t$) on valuation, the unbiased model is modified to compute momentum-biased RPEs using the perceived, momentum-biased outcome value ($O_t^{\text{perceived}}$) instead of the objective outcome ($O_t$) (1):

$$O_t^{\text{perceived}} = f^{m_t}\, O_t$$

Here, $f$ is the momentum bias parameter that indicates the direction and degree of momentum bias. If $f = 1$, momentum does not bias the perception of reward or expected value. With $f > 1$, momentum exerts positive feedback, i.e. reward is perceived as larger in a good mood and smaller in a bad mood, whereas $0 < f < 1$ would correspond to negative feedback on reward value. In the present study, we set $f$ to 1.2, based on the average $f$ derived from Eldar and Niv's (1) sample of control participants who scored relatively high on the Hypomanic Personality Scale (2), a measure of trait mood instability, as determined by a median split.
As per Eldar and Niv's model (1), we updated momentum on each trial ($m_{t+1}$) based on the momentum at the beginning of the trial ($m_t$) and the difference between RPE on the current trial, $RPE_t$, and $m_t$ (i.e. the degree to which the environment is improving or worsening), scaled by the momentum-update rate $\eta$:

$$m_{t+1} = m_t + \eta\,(RPE_t - m_t) \qquad (3)$$
Before applying momentum to the outcome value, we constrained it using a sigmoid function, allowing it to take values between -1 and 1. Unlike Eldar and Niv's study (1), we did not use subjective ratings of mood as a confirmatory check of how well the model captures participants' self-reported mood during the task. However, Eldar and Niv's (1) findings lend support to this model; it outperformed the unbiased and other reinforcement-learning models and explained participants' trial-by-trial choices and subjective mood ratings well. We also conducted additional analyses (see Supplementary Results) to test two models with fewer assumptions. Specifically, we tested a GLM with no assumptions about computational model parameters, in which RPE on the previous trial (i.e. $RPE_{t-1}$) predicts outcome-locked striatal and insular activation on the next trial, over and above $RPE_t$. We also tested a computational model in which we removed momentum bias and instead examined the influence of the interaction of RPE history with RPE, analogous to biased $RPE_t$ in our main analyses. We found that in both these cases only participants with bipolar disorder tracked 1) momentum (i.e. $RPE_{t-1}$) and 2) biased RPE derived from the simplified model without a momentum bias parameter.
Given that in our task the probability and magnitude of outcomes were fixed and made explicit on each trial, the RPEs do not have utility for updating expectations. Instead, the model updates trial-by-trial estimates of momentum ($m_t$) using averaged free parameter values from previous datasets using this model (1). Given this, we did not directly quantify momentum and the momentum-update rate, which quantifies how quickly momentum is updated, in our own participants; the parameter values were instead based on participants from a previous study with high hypomanic traits (1), which may not fully capture inter-individual variability. We reasoned that the average parameter values in our bipolar group would be similar to those in the high hypomanic participant group, given that high hypomanic symptoms confer psychometric vulnerability for bipolar disorder (2). In addition, Wilson and Niv (7) demonstrated that deriving RPEs under group-averaged learning rate parameters achieved fits to the fMRI data comparable to those obtained with individually estimated learning rate parameters.
Moreover, they found that, for learning rate, the quality of fit of the resulting RPEs to ventral striatal activations was highly robust even to large deviations from the actual group average. However, these specific results pertain to the effect of learning rate; given our present findings, future research could examine this in relation to the momentum bias parameter.
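The momentum-biased model described in this section can be sketched as a simple trial loop. This is a hedged illustration, not the authors' code: the bias parameter `f = 1.2` comes from the text, but the update rate `eta = 0.1` is a placeholder value, `tanh` is assumed as the sigmoid that bounds momentum between -1 and 1, feeding the biased (rather than veridical) RPE into the momentum update is an assumption, and the function name is hypothetical.

```python
import math

def momentum_biased_rpes(outcomes, evs, f=1.2, eta=0.1, m0=0.0):
    """Return per-trial momentum-biased RPEs and the bounded momentum used on each trial."""
    m = m0  # unbounded momentum carried across trials
    rpes, momenta = [], []
    for outcome, ev in zip(outcomes, evs):
        # Bound momentum to (-1, 1) before it biases perception (tanh is an assumed sigmoid).
        m_bounded = math.tanh(m)
        # Perceived outcome is scaled by f ** momentum; f = 1 leaves it unchanged.
        perceived = (f ** m_bounded) * outcome
        rpe = perceived - ev
        rpes.append(rpe)
        momenta.append(m_bounded)
        # Move momentum towards the latest RPE at rate eta (placeholder value).
        m = m + eta * (rpe - m)
    return rpes, momenta

# Hypothetical sequence: two wins of 1 unit, each with an expected value of 0.5.
rpes, momenta = momentum_biased_rpes([1.0, 1.0], [0.5, 0.5])
```

In this example the first RPE equals the veridical value because momentum starts at zero; the second win is perceived as slightly larger because the first positive RPE has raised momentum.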

Confirmatory checks
We conducted confirmatory checks that relied on fewer modelling assumptions to determine whether there is a carry-over effect of "momentum" in participants with bipolar disorder. Specifically, we tested 1) whether the RPE value on the previous trial (i.e. $RPE_{t-1}$) predicts outcome-locked striatal activation on the current trial, t (after confirming that these values were uncorrelated: median r = -.064), and 2) whether this effect is stronger in participants with bipolar disorder. This is analogous to the momentum bias of RPE in the computational modelling analyses. This analysis confirmed that only participants with bipolar disorder tracked $RPE_{t-1}$. We also tested a simplified computational model in which we removed the momentum bias parameter, to see whether striatal activation was additionally modulated by the interaction between RPE history and current RPE. This analysis also converged in finding evidence that this interaction term was only tracked in the bipolar group [t(20) = 2.51, p = .008, d = .52] and not in matched controls [t(20) = .57, p = .29, d = .18].
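The first confirmatory check relies on a lagged regressor and on verifying that it is not collinear with the current-trial RPE. The sketch below shows one way this could be set up; it assumes a single continuous sequence of trials (run boundaries are ignored), and the function names are hypothetical.

```python
import numpy as np

def lagged_rpe(rpe):
    """Align each trial's RPE with the previous trial's RPE, dropping the first trial."""
    rpe = np.asarray(rpe, dtype=float)
    current = rpe[1:]    # RPE on trial t
    previous = rpe[:-1]  # RPE on trial t-1
    return current, previous

def lag1_correlation(rpe):
    """Pearson correlation between RPE on trial t and RPE on trial t-1."""
    current, previous = lagged_rpe(rpe)
    return float(np.corrcoef(current, previous)[0, 1])

# Hypothetical RPE sequence for one participant.
example_rpe = [0.5, -1.5, 0.5, 0.5, -0.5, 1.5, -1.5, 0.5]
r = lag1_correlation(example_rpe)
```

A correlation near zero, as reported above for the median participant, indicates that the two parametric regressors can be entered into the same GLM without one absorbing variance that belongs to the other.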

2014)
For the anticipation stage, a three-way group by probability by hemisphere interaction approached significance (p = 0.058, d = .63) and showed that left ventral striatal activity was greater in participants with bipolar disorder than in matched controls. For the outcome stage, an overall group effect was found (p = 0.04, d = .69); follow-up analyses showed greater activity for gains (vs. losses) (p < 0.001, d = .75) and for large (vs. small) outcomes (p = 0.024, d = .76).

Striatal tracking of momentum-biased RPE (within matched controls)
Our matched control group is comparable to the entire non-clinical sample in Eldar and Niv (2015), i.e. their low- and high-bias subgroups combined. The effect size within our bipolar group is indeed larger (d = 0.54) than that in the high-bias group reported by Eldar and Niv (2015; t-value of only 0.69±0.15, Cohen's d = 0.29, assuming df = n-1 = 11.5 based on a median split of total n = 25). This is to be expected given that ours is a clinical sample. However, the effect of biased reward perception in our matched control group is many times larger (d = .18) than the effect in their low-bias group (d = 0.02, based on "t = −0.04±0.14", df = 11.5). Indeed, when we split our matched controls by score on the Hypomanic Personality Scale (see Figure S2), to derive something comparable to the low and high subgroups in Eldar and Niv (2015), the effect size in our lower HPS subgroup is much weaker (d = 0.06) than in our higher HPS subgroup (d = 0.30). Consequently, our matched controls have an effect size comparable to the net of the two subgroups in Eldar and Niv (2015), thereby potentially hampering the detection of a between-group difference between our bipolar participants and our control group as a whole.

Supplementary Discussion
The supplementary results reported here extend the main analyses by exploring the influence of model-estimated momentum values on left anterior insular-ventral striatal functional connectivity. These findings provide insight into how these neural processes are affected in periods of higher and lower momentum. Only matched controls, and not participants with bipolar disorder, increased striatal-insular coupling in response to higher momentum of recent outcomes. This corroborates our main findings of stronger striatal-insular functional connectivity in matched controls compared to participants with bipolar disorder as momentum-biased RPEs become more positive, which occurs primarily during upward momentum.
As we did not find that ventral striatal activity tracked momentum-biased RPE in matched controls in our main findings, we reason that the significant positive modulation of striatal-insular connectivity by momentum-biased RPE could arise from changes in the momentum signal: i.e. instances where momentum-biased RPE deviates from zero correspond to instances when momentum similarly becomes strongly positive or negative. We speculate that greater ventral striatal coupling with the anterior insula helps to contextualise reward perception and reduces the chances of misperceiving the likelihood of getting future rewards from the environment. The lack of contextualisation via striatal-insular coupling in participants with bipolar disorder could therefore result in a greater propensity to misperceive cues in the environment when mood is strongly positive or elevated, which could result in recursive cycles where expectations of reward, moods and behaviours escalate to extremes.

Figure S2. Mean striatal activity modulated by momentum-biased RPE across two subgroups of matched controls that had low (n = 10) and high (n = 11) scores on the Hypomanic Personality Scale (as determined by a median split) and participants with bipolar disorder (n = 21). CG = Control Group; BD = participants with bipolar disorder.