Association Between a Directly Translated Cognitive Measure of Negative Bias and Self-reported Psychiatric Symptoms

Background Negative interpretation biases are thought to be core symptoms of mood and anxiety disorders. However, prior work using cognitive tasks to measure such biases is largely restricted to case-control group studies, which cannot be used for inference about individuals without considerable additional validation. Moreover, very few measures are fully translational (i.e., can be used across animals and humans in treatment-development pipelines). This investigation aimed to produce the first measure of negative cognitive biases that is both translational and sensitive to individual differences, and then to determine which specific self-reported psychiatric symptoms are related to bias.
Methods A total of 1060 (n = 990 complete) participants performed a cognitive task of negative bias along with psychiatric symptom questionnaires. We tested the hypothesis that individual levels of mood and anxiety disorder symptomatology would covary positively with negative bias on the cognitive task using a combination of computational modeling of behavior, confirmatory factor analysis, exploratory factor analysis, and structural equation modeling.
Results Participants with higher depression symptoms (β = −0.16, p = .017) who were older (β = −0.11, p = .001) and had lower IQ (β = 0.14, p < .001) showed greater negative bias. Confirmatory factor analysis and structural equation modeling suggested that no other psychiatric symptom (or transdiagnostic latent factor) covaried with task performance over and above the effect of depression, while exploratory factor analysis suggested combining depression/anxiety symptoms in a single latent factor. Generating groups using symptom cutoffs or latent mixture modeling recapitulated our prior case-control findings.
Conclusions This measure, which uniquely spans both the clinical group-to-individual and preclinical animal-to-human generalizability gaps, can be used to measure individual differences in depression vulnerability for translational treatment-development pipelines.


Supplementary Information
Previous Findings
Figure S1: This study builds on prior work developing a measure of negative affective bias, indexed by the proportion of mid tones interpreted as high reward ('p(mid)as high'), in A) a rat pharmacological model of mood and anxiety disorders and B) humans with mood and anxiety disorders relative to healthy controls. Specifically, symptomatic ('Symptom') rats and humans both demonstrate significantly increased negative affective bias (i.e. reduced prediction that ambiguous outcomes will lead to higher rewards) relative to non-symptomatic controls ('HC'). The effect size of the human group difference is d = 0.72.

Pilot
Prior to the main study, extensive piloting was carried out to minimise the amount of bias introduced by the stimuli used to translate the task into the visual domain (figure S2). Pilot 1 (circle size) had 4 counterbalancing versions based on the stimuli, response and outcomes (see table S1 for full details).
Following discovery of clear between-subject bias, pilot 2 used line orientation and had 8 counterbalancing versions (see table S2). Further between-subject bias meant the main study was restricted to two counterbalancing versions from pilot 2.

Results
Both pilots demonstrate clear sources of between-subject bias (figure S2; main effect of counterbalancing in pilot 1 (F(3,260)=35, p<0.001) and pilot 2 (F(7,143)=3, p=0.005)). Individuals demonstrated 'higher' bias when large (or vertical) stimuli were paired with large rewards on the right-hand side. These likely reflect pre-potent biases (e.g. bigger sizes are associated with numerically higher amounts, and in Latinate languages we read from left to right). For the main task, testing was therefore restricted to counterbalancing versions 1 and 7 from pilot 2, as this pair constituted the smallest difference between two counterbalancing conditions across the two pilots (mean difference=0.003, p(Tukey)=1). The pilot 2 design was also preferred over pilot 1 because there is only one interpretation of line orientation, whereas a circle has both area and diameter.
Figure caption (fragment): Following training, participants were presented with intermediate stimuli (mid size circle in P1 or angled lines in P2) and had to choose which of the same two buttons to press. This was randomly followed by the high or low reward (i.e. 50% contingency).

Supplemental Questionnaire Details
We included only the Trait component of the STAI in all subsequent analyses as it is designed to capture stable trait symptoms (but is also highly correlated with the State measure). The Beck Depression Inventory was presented without the 9th item concerning suicidal thoughts, as we did not wish to solicit information about suicidal thoughts without the ability to intervene. The OCI-R and SZ measures were chosen as negative controls for the mood and anxiety disorder questionnaires (i.e. to demonstrate effects were specific rather than generic), as well as to be consistent with previous research examining individual differences in pathology (1). A 12-item form of Raven's Progressive Matrices (2) was completed as a measure of IQ, as it is strongly predictive (0.90) of the full 36-item Advanced Progressive Matrices (2) but is short enough to minimize participant drop-out. Participants were also required to fill out demographic questionnaires indicating their age and gender, as well as various mental-health-related questions.

Supplemental Task Details
The training phase used 'extreme' cues only, and participants were instructed to maximize the amount earned in a 2-alternative-forced-choice task. Participants pressed a button (left or right) when they saw the upper and lower extreme stimuli (e.g., high and low frequency tones in the original task; vertical/horizontal lines in this task) to receive a reward (£1 or £4). The stimulus-response-outcome contingencies were 100% (but counterbalanced across individuals) and were acquired by participants on a trial-and-error basis. In the main task, participants were also presented with an intermediate stimulus (a diagonal line), which was randomly reinforced with the two reward outcomes (i.e., a contingency of 0.5).
The test version of the task had 40 trials for each intermediate/extreme stimulus (total=120).
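To make the trial structure concrete, the following is a minimal sketch of how a test-phase schedule of this shape could be generated and how the proportion of intermediate trials answered as 'high' could be computed. It is not the authors' task code. The function names, the stimulus labels and the `high_stimulus` counterbalancing argument are hypothetical, and the sketch assumes that the reward on each intermediate trial is an independent 50/50 draw (the text states only that the contingency is 0.5).

```python
import random

TRIALS_PER_STIMULUS = 40          # from the text: 40 trials per stimulus
LOW_REWARD, HIGH_REWARD = 1, 4    # pounds, from the text


def build_test_schedule(high_stimulus="vertical", seed=None):
    """Return a shuffled list of (stimulus, reward) trials.

    Extreme stimuli always map to their assigned reward (100% contingency);
    the intermediate stimulus gets the high or low reward with p = 0.5.
    `high_stimulus` is a hypothetical counterbalancing switch.
    """
    if high_stimulus not in ("vertical", "horizontal"):
        raise ValueError("high_stimulus must be 'vertical' or 'horizontal'")
    low_stimulus = "horizontal" if high_stimulus == "vertical" else "vertical"
    rng = random.Random(seed)

    trials = []
    for _ in range(TRIALS_PER_STIMULUS):
        trials.append((high_stimulus, HIGH_REWARD))
        trials.append((low_stimulus, LOW_REWARD))
        # Assumption: each ambiguous trial is an independent coin flip.
        trials.append(("diagonal", rng.choice((LOW_REWARD, HIGH_REWARD))))
    rng.shuffle(trials)
    return trials


def proportion_mid_as_high(responses):
    """Proportion of intermediate trials answered with the 'high' response.

    `responses` is a list of (stimulus, chose_high) pairs.
    """
    mid = [chose_high for stimulus, chose_high in responses
           if stimulus == "diagonal"]
    if not mid:
        raise ValueError("no intermediate trials in responses")
    return sum(mid) / len(mid)
```

A value of 0.5 from `proportion_mid_as_high` would correspond to no bias, and lower values to a more negative interpretation of the ambiguous stimulus.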

Supplemental Modelling Details
The DDM models decision making as a process of evidence accumulation towards a decision threshold, utilising speed and accuracy of responses to model the biases that shape our responses. The parameter of interest here (drift rate) indicates the rate of information accumulation towards a response (3), and has been shown to be more negative in individuals with mood and anxiety disorders (4). No bias on this measure would correspond to a drift rate of 0. As previous studies showed no evidence of prior bias (i.e. 'starting point') when using this task (4), we used the EZ diffusion model, which assumes no starting-point bias (and which is computationally tractable with this number of participants).
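For readers unfamiliar with the EZ diffusion model, the sketch below implements its published closed-form equations (Wagenmakers, van der Maas and Grasman, 2007), which map a response proportion, the RT variance and the mean RT onto drift rate, boundary separation and non-decision time. It is an illustration only and not the authors' analysis code (which is available in their repository). Feeding in the proportion of 'high' responses in place of accuracy, so that the drift rate can take either sign, is an assumption made here, as are the function and argument names; the scaling parameter s = 0.1 is the conventional default. A proportion of exactly 0.5 corresponds to a drift rate of 0, but the remaining equations are undefined there, so the sketch rejects that input along with proportions of 0 and 1.

```python
import math


def ez_diffusion(p_high, rt_variance, rt_mean, s=0.1):
    """Closed-form EZ-diffusion estimates.

    p_high      : proportion of 'high' responses (assumed stand-in for accuracy)
    rt_variance : variance of response times, in seconds squared
    rt_mean     : mean response time, in seconds
    s           : scaling parameter (0.1 by convention)

    Returns (drift_rate, boundary_separation, non_decision_time).
    """
    if not 0.0 < p_high < 1.0 or p_high == 0.5:
        raise ValueError("p_high must be in (0, 1) and differ from 0.5; "
                         "apply an edge correction first")
    if rt_variance <= 0:
        raise ValueError("rt_variance must be positive")

    logit = math.log(p_high / (1.0 - p_high))
    x = logit * (logit * p_high ** 2 - logit * p_high + p_high - 0.5) / rt_variance
    # The drift rate takes the sign of (p_high - 0.5): negative means that
    # evidence accumulates towards the 'low' response.
    drift = math.copysign(s * x ** 0.25, p_high - 0.5)
    boundary = s ** 2 * logit / drift
    y = -drift * boundary / s ** 2
    mean_decision_time = (boundary / (2.0 * drift)) * (1.0 - math.exp(y)) / (1.0 + math.exp(y))
    non_decision_time = rt_mean - mean_decision_time
    return drift, boundary, non_decision_time
```

Because the equations are closed-form, each participant's parameters cost only a handful of arithmetic operations, which is what makes the approach tractable for a sample of this size.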

Missing data
The reported parameters are obtained using list-wise deletion of the individuals with incomplete data.
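As a brief illustration of list-wise deletion, the sketch below drops any participant record that is missing a value on any required variable. The function name and the dictionary-based record layout are hypothetical; the example field names are borrowed from the regression formula reported in this supplement.

```python
def listwise_delete(records, required_fields):
    """Keep only records with a non-missing value for every required field.

    `records` is a list of dicts; a field counts as missing when the key is
    absent or its value is None.
    """
    return [
        record for record in records
        if all(record.get(field) is not None for field in required_fields)
    ]


# Hypothetical example: only the first participant has complete data.
participants = [
    {"BDI": 10, "STAI2": 40, "propmedhigh": 0.45},
    {"BDI": None, "STAI2": 35, "propmedhigh": 0.50},
    {"STAI2": 50, "propmedhigh": 0.30},
]
complete = listwise_delete(participants, ["BDI", "STAI2", "propmedhigh"])
```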

Confounds
To confirm robustness to confounds, we also re-ran the initial regression analysis but added answers to questions about past, current or family history of diagnoses or symptoms as well as current treatment use/seeking. This model resulted in identical inference to the model of interest and none of these additional factors were significantly related to task performance, indicating that these factors do not confound the observed symptom effects (see supplementary table S3 and https://github.com/ojr23/InterpretationBias).
To exclude potential bots, we also re-ran analyses excluding individuals with mean task RTs more than 2 standard deviations below the mean for the low stimulus (N=36 excluded). When we do this, the statistical inference remains identical (see https://github.com/ojr23/InterpretationBias). Moreover, there are no individuals who have an accuracy difference of 1 for the high vs low trials (i.e. individuals who are bashing a single key for every trial).
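The two screening rules described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
import statistics


def flag_fast_responders(mean_rts, n_sd=2.0):
    """Return indices of participants whose mean RT lies more than `n_sd`
    standard deviations below the sample mean.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    mu = statistics.mean(mean_rts)
    sd = statistics.stdev(mean_rts)
    cutoff = mu - n_sd * sd
    return [i for i, rt in enumerate(mean_rts) if rt < cutoff]


def is_single_key_responder(accuracy_high, accuracy_low):
    """True when accuracy is 1 on one extreme stimulus and 0 on the other,
    the pattern produced by pressing the same key on every trial."""
    return abs(accuracy_high - accuracy_low) == 1.0
```

For example, `flag_fast_responders` applied to twenty participants with a mean RT of 0.5 s and one with 0.05 s flags only the last participant.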

Bimodality in OCIR data
One perplexing issue is the presence of bimodality in the OCIR scale (Figure S3a), which leads to substantially higher mean symptoms than in other studies (for example (1)). Indeed, in our sample 499 participants out of the 1060 met a clinical cut-off of 21 for OCD (5), which seems unusually high. In a post-hoc exploratory analysis, however, we identified substantially higher distributions for the OCIR sum scores for those tested in the UTC+05:00 (Pakistan and Central Asian -stans) and UTC+05:30 (India and Sri Lanka only; Figure S3b) timezones. As such, it appears that these timezones are driving the bimodality. In prior studies (e.g. (1)) recruitment has often been restricted to the USA. For direct comparison, in our samples from within USA time zones, the mean OCIR scores fall well below 21 (Table S4), with the exception of the west coast of the Americas, which displays some bimodality (Figure S3b; interestingly, California saw the highest proportion of Indian migrants in the US between 2014-2018 (6)). Notably, prior work has demonstrated considerably higher scores on the OCI-R in individuals of Asian relative to White or Black heritage, which has been speculated to reflect cultural differences in how OCD symptoms are interpreted and reported (7). Thus, we argue that this bimodality may be due to the uniquely global nature of our sample and reflect the impact of cultural/environmental factors on questionnaire responses. It is also worth noting that time zone had an effect on both depression and anxiety scores, but it had a much larger effect on OCIR scores than any other variable (Figure S4). Critically, including time zone in the primary regression analysis had no impact on the inference (and time zone was not itself a significant predictor of task performance; Table S5). Thus, while time zone accounts for considerable variance in symptom scores, it does not influence the relationship between symptoms and cognitive task performance.
Overall, this exploratory analysis highlights the importance of considering cultural/environmental predictors in psychiatry research and the importance of trying to generalize outside of WEIRD (Western, Educated, Industrialized, Rich, and Democratic) samples (8).
Figure S4: Regression weights from a linear regression model predicting time zone (Timezone ~ GenderMF + Age + Ravens + spreadsheet + BDI + STAI2 + SZ + OCIR + propmedhigh), demonstrating the biggest impact of OCIR scores (and no impact of task performance (propmedhigh)).

Supplemental Discussion
It is also worth noting that the task developed here bears some resemblance to the probabilistic reward task developed by Pizzagalli and colleagues (9). In this prior task, two stimuli (a short or long line) are reinforced (with the same 5c reward) either sparingly or frequently. In other words, subjects learn over time that one stimulus is better because it is more frequently rewarded. Depressed individuals fail to show a bias towards the more frequently rewarded stimulus. This is similar to our effect in that the more depressed individuals show less of a bias towards the more highly rewarded stimulus. However, the Pizzagalli task requires individuals to integrate rewards over time (i.e. learn), whereas in our current task the feedback is asymmetric on each trial (£1 vs £4) and the main version of the task occurs after learning of the high and low exemplars has already taken place. Of course, the ambiguous stimuli are still reinforced at 50% in our task, which means that participants should eventually learn to respond at 50%. Our overall bias suggests that they don't and, moreover, the high split-half reliability we see means that performance is consistent over time and that learning differences on these ambiguous stimuli are unlikely to be driving the effect on our task. Of note, work using reinforcement learning models and the Pizzagalli task (10) also indicates that effects may be driven more by differential sensitivity to reward in depression than by learning rates (i.e. depressed individuals are less motivated by rewards, rather than being slower to learn about them). As such, performance across both tasks may be driven by the same underlying mechanisms. Future work should seek to collect data from both tasks in the same individuals to determine how related performance across these tasks is. Also, while the task design being directly translated from animal models is a key strength of our study, the limitations of this method should also be noted.
Our stimuli were notably abstract, whereas negative biases are often seen in self-relevant stimuli (11). As a result, this task could be more a test of cognitive performance than of affective bias (hence the correlation with IQ). Future work could control for this by including a cognitive task with no affective component (e.g. the same task with no reinforcement) as a control condition.