Reward Sensitivity and Noise Contribute to Negative Affective Bias: A Learning Signal Detection Theory Approach in Decision-Making

In patients with mood disorders, negative affective biases – systematically prioritising and interpreting information negatively – are common. A translational cognitive task testing this bias has shown that depressed patients have a reduced preference for a high reward under ambiguous decision-making conditions. The precise mechanisms underlying this bias are, however, not yet understood. We therefore developed a set of measures to probe the underlying source of the behavioural bias by testing its relationship to a participant’s reward sensitivity, value sensitivity and reward learning rate. One hundred and forty-eight participants completed three online behavioural tasks: the original ambiguous-cue decision-making task probing negative affective bias, a probabilistic reward learning task probing reward sensitivity and reward learning rate, and a gambling task probing value sensitivity. We modelled the learning task with a dynamic signal detection theory model and the gambling task with an expectation-maximisation prospect theory model. Reward sensitivity from the probabilistic reward task (β = 0.131, p = 0.024) and setting noise from the probabilistic reward task (β = –0.187, p = 0.028) both predicted the affective bias score in a logistic regression. Increased negative affective bias, at least on this specific task, may therefore be driven in part by a combination of reduced sensitivity to rewards and more variable responses.

In the standard SDT model, beliefs about the environment are mapped to a criterion. The model has two components: sensitivity (d') and criterion (c). We did not have to model d', as it was the same for all participants (d' = 1.5). This is because the noise in the task was exclusively the external noise in the sampling from the category distributions, and not internal perceptual noise: the participant was never asked to view the high-contrast orange or purple line stimulus prior to making their decision. The external noise was proportional to the orientation uncertainty of the sampling distributions for the purple and orange arrows.
In a probabilistic reward task with equal category probabilities and equal reward amounts, where the categories differ only in reward probability, the optimal criterion placement (c_opt) is computed as a function of the category reward ratio and the sensitivity. In our task, the reward ratio corresponded to beliefs about the reward probabilities of the orange and purple categories. As the two categories were yoked, this can be expressed as:

$c_{opt} = \frac{1}{d'}\,\ln\!\left(\frac{1 - \hat{p}_O}{\hat{p}_O}\right)$

where the ratio compares the reward-probability estimate of the purple category ($1 - \hat{p}_O$) with that of the orange category ($\hat{p}_O$).
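As a concrete illustration, the optimal criterion can be computed directly from the belief about the orange category's reward probability. The sketch below assumes the standard SDT form above; the function name and example belief values are illustrative, not taken from the task.

```python
import math

def optimal_criterion(p_orange: float, d_prime: float = 1.5) -> float:
    """Optimal SDT criterion under equal priors and equal reward amounts:
    the log reward-probability ratio of the purple category (1 - p_orange)
    to the orange category (p_orange), scaled by 1/d'."""
    return math.log((1.0 - p_orange) / p_orange) / d_prime

# Symmetric beliefs place the criterion midway between the categories:
print(optimal_criterion(0.5))  # 0.0
# Believing orange is rewarded more often shifts the criterion so that
# more stimuli are categorised as orange (c_opt becomes negative):
print(optimal_criterion(0.7))
```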
To model the dynamic learning process for the beliefs about the category reward contingencies, we chose to implement a reformulation of the best-fitting model of Norton et al. (2019), adapted to a variable reward-contingency context. On each trial, an exponential-averaging learning function updates the reward-probability estimate for the orange category ($\hat{p}_{O(t)}$) as the following weighted average:

$\hat{p}_{O(t)} = C_{(t-1)}\left[a\,R_{(t-1)} + (1 - a)\,\hat{p}_{O(t-1)}\right] + \left(1 - C_{(t-1)}\right)\hat{p}_{O(t-1)}$

where $R_{(t-1)}$ is the information gained on the previous trial through reward, $C_{(t-1)}$ is the correctness of the previous response (i.e., either 1 if the stimulus was correctly categorised, or 0 otherwise), $a$ is the fitted learning-rate parameter, and $\hat{p}_{O(t-1)}$ is the estimate of the orange category's reward probability on the previous trial. Note that this formulation only updates the reward-probability beliefs following correct trials, as no information about reward probability can follow an incorrect trial.
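A minimal sketch of this update rule (the function and argument names are our own, and the reward signal is assumed to be binary):

```python
def update_orange_estimate(p_prev: float, rewarded: bool, correct: bool, a: float) -> float:
    """Exponential-averaging update of the orange reward-probability estimate.
    Only correct trials carry reward-probability information, so incorrect
    trials leave the estimate unchanged."""
    if not correct:
        return p_prev
    r = 1.0 if rewarded else 0.0  # binary reward information from the previous trial
    return a * r + (1.0 - a) * p_prev

# A rewarded correct trial nudges the estimate toward 1 by the learning rate:
p = update_orange_estimate(0.5, rewarded=True, correct=True, a=0.2)   # ~0.6
# An incorrect trial carries no reward information, so the estimate stays put:
p = update_orange_estimate(p, rewarded=False, correct=False, a=0.2)
```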
The SDT model of criterion placements is a normative behavioural model; however, participants rarely behave optimally. We therefore considered how far the placement of the slider, the criterion placement of the observer (c), deviated from the optimal, by fitting a magnitude-scaling parameter (G) and a shifting parameter (b) to the criterion placement of the optimal observer with identical reward-probability beliefs:

$c_{(t)} = G \cdot c_{opt(t)} + b$

The gain parameter represented amplified ($G \in [1, \infty]$) or conservative ($G \in [0, 1]$) scaling of the optimal criterion placement, and was therein a measure of reward sensitivity. In cases in which G was negative, the behavioural mapping was incorrect, but the response ranges were still informative about reward sensitivity. The parameter b captured response biases, such as a general preference for either the purple or the orange category. To model the actual response, we used a likelihood function given the model predictions and the setting noise (s):

$c_{obs(t)} \sim \mathcal{N}\!\left(c_{(t)},\; s\right)$

The dynamic SDT model was fit using custom-coded RStan scripts (version 2.21.0; Stan Development Team, 2022) in RStudio (version 1.3.1093). We used four Markov chain Monte Carlo chains per participant, each containing 10,000 parameter-space samples. The first 5,000 samples from each chain were discarded as warm-up samples, and the mean across chains was calculated for each parameter. We constrained our sampling to the following parameter space: a = …
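Putting the scaling, shift, and setting noise together, the per-trial likelihood can be sketched as a Gaussian density around the predicted criterion. This is a simplified Python stand-in for the RStan model, under the assumption that the setting noise s acts as the standard deviation of a normal response distribution; all names are illustrative.

```python
import math

def criterion_loglik(slider: float, c_opt: float, G: float, b: float, s: float) -> float:
    """Log-likelihood of an observed slider setting: a Gaussian density
    centred on the scaled (G) and shifted (b) optimal criterion, with
    the setting noise s as its standard deviation."""
    mu = G * c_opt + b
    return -0.5 * math.log(2.0 * math.pi * s ** 2) - (slider - mu) ** 2 / (2.0 * s ** 2)

# The likelihood peaks when the slider matches the predicted criterion:
at_prediction = criterion_loglik(0.3, c_opt=0.3, G=1.0, b=0.0, s=0.1)
off_prediction = criterion_loglik(0.5, c_opt=0.3, G=1.0, b=0.0, s=0.1)
```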

Prospect Theory Model and Fitting Procedure:
We fit a prospect theory model with three parameters (r, d, t) to the data of our gambling task. The subjective value-sensitivity parameter r could take values between 0 and ∞, with r ∈ [0, 1) corresponding to risk aversion (preferring certainty over uncertainty, even if the objective payoff of the certain choice is lower), r = 1 to risk neutrality, and r > 1 to risk-seeking behaviour (valuing uncertainty over certainty). The loss-aversion parameter d ∈ [0, ∞] corresponded to weighting potential losses more strongly than potential gains, with larger d corresponding to greater aversion to losses. The inverse-temperature parameter t ∈ [0, ∞] indicated how deterministic participants were in their choice strategy, with larger t corresponding to more consistent choice behaviour.
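A sketch of how these three parameters could enter a choice model. The specific gamble structure (a 50/50 gain/loss gamble versus a certain payoff) and all function names are assumptions for illustration, not the task's actual trial structure.

```python
import math

def subjective_value(x: float, r: float, d: float) -> float:
    """Prospect-theory subjective value: power utility (exponent r) for
    gains, loss-averse (d-weighted) power utility for losses."""
    return x ** r if x >= 0 else -d * (-x) ** r

def p_choose_gamble(gain: float, loss: float, certain: float,
                    r: float, d: float, t: float) -> float:
    """Softmax probability of choosing an assumed 50/50 gain/loss gamble
    over a certain payoff; t is the inverse temperature."""
    v_gamble = 0.5 * subjective_value(gain, r, d) + 0.5 * subjective_value(loss, r, d)
    v_certain = subjective_value(certain, r, d)
    return 1.0 / (1.0 + math.exp(-t * (v_gamble - v_certain)))

# With r = 1 and d = 1 the gamble and a certain 0 are equally attractive:
print(p_choose_gamble(10.0, -10.0, 0.0, r=1.0, d=1.0, t=1.0))  # 0.5
# Loss aversion (d > 1) makes the same mixed gamble less attractive:
print(p_choose_gamble(10.0, -10.0, 0.0, r=1.0, d=2.0, t=1.0))  # < 0.5
```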