Paranoia as a deficit in non-social belief updating

Paranoia is the belief that harm is intended by others. It may arise from selective pressures to infer and avoid social threats, particularly in ambiguous or changing circumstances. We propose that uncertainty may be sufficient to elicit learning differences in paranoid individuals, without social threat. We used reversal learning behavior and computational modeling to estimate belief updating across individuals with and without mental illness, online participants, and rats chronically exposed to methamphetamine, an elicitor of paranoia in humans. Paranoia is associated with a stronger prior on volatility, accompanied by elevated sensitivity to perceived changes in the task environment. Methamphetamine exposure in rats recapitulates this impaired uncertainty-driven belief updating and rigid anticipation of a volatile environment. Our work provides evidence of fundamental, domain-general learning differences in paranoid individuals. This paradigm enables further assessment of the interplay between uncertainty and belief-updating across individuals and species.


Introduction
Paranoia is excessive concern that harm will occur due to deliberate actions of others (Freeman and Garety, 2000). It manifests along a continuum of increasing severity (Freeman et al., 2005; Freeman et al., 2010; Freeman et al., 2011; Bebbington et al., 2013). Fleeting paranoid thoughts prevail in the general population (Freeman, 2006). A survey of over 7000 individuals found that nearly 20% believed people were against them at times in the past year; approximately 8% felt people had intentionally acted to harm them (Freeman et al., 2011). At a national level, paranoia may fuel divisive ideological intolerance. Historian Richard Hofstadter famously described catastrophizing, context-insensitive political discourse as the 'paranoid style': "The paranoid spokesman sees the fate of conspiracy in apocalyptic terms - he traffics in the birth and death of whole worlds, whole political orders, whole systems of human values. He is always manning the barricades of civilization. He constantly lives at a turning point [emphasis added]." (Hofstadter, 1964).
At its most severe, paranoia manifests as rigid beliefs known as delusions of persecution. These delusions occur in nearly 90% of first episode psychosis patients (Freeman, 2007). Psychostimulants also elicit severe paranoid states. Methamphetamine evokes new paranoid ideation particularly after repeated exposure or escalating doses (86% and 68%, respectively, in a survey of methamphetamine users) (Leamon et al., 2010).
Paranoia has thus far defied explanation in mechanistic terms. Sophisticated game-theory-driven approaches (such as the Dictator Game [Raihani and Bell, 2018; Raihani and Bell, 2017]) have largely re-described the phenomenon: people who are paranoid have difficulties in laboratory tasks that require trust (Raihani and Bell, 2019). However, this is not driven by personal threat per se, but by negative representations of others (Raihani and Bell, 2018; Raihani and Bell, 2017). We posit that such representations are learned (Fineberg et al., 2014; Behrens et al., 2008), via the same fundamental learning mechanisms (Cramer et al., 2002) that underwrite non-social learning in non-human species (Heyes and Pearce, 2015). We hypothesize that aberrations to these domain-general learning mechanisms underlie paranoia. One such mechanism involves the judicious use of uncertainty to update beliefs: expectations about the noisiness of the environment constrain whether we update beliefs or dismiss surprises as probabilistic anomalies. The higher the expected uncertainty (i.e., 'I expect variable outcomes'), the less surprising an atypical outcome may be, and the less it drives belief updates ('this variation is normal'). Unexpected uncertainty, in contrast, describes perceived change in the underlying statistics of the environment (Yu and Dayan, 2005; Payzan-LeNestour and Bossaerts, 2011; Payzan-LeNestour et al., 2013) (i.e., 'the world is changing'), which may call for belief revision.
Since excessive unexpected uncertainty is a signal of change, it might drive the recategorization of allies as enemies, which is a tenet of evolutionary theories of paranoia (Raihani and Bell, 2019). We tested the hypothesis that this drive to flexibly recategorize associations extends to non-social, domain-general inferences. We dissected learning mechanisms under expected and unexpected uncertainty: probabilistic variation and changes in underlying task structure (volatility). Here, volatility is a property of the task. Unexpected uncertainty is the perception of that volatility. Participants completed a non-social, three-option learning task which challenged them to form and revise associations between stimuli (colored card decks) and outcomes (points rewarded and lost), in addition to their beliefs about the volatility of the task environment. They encountered expected uncertainty as probabilistic win or loss feedback ('each option yields positive and negative outcomes, but in different amounts'), and unexpected uncertainty as reassignment of reward probabilities between options ('sometimes the best option may change,' reversal events). Although reversal events elicit unexpected uncertainty by driving re-evaluation of the options, participants increasingly anticipate reversals and develop expectations about the stability of the task environment. We implemented an additional task manipulation: a shift in the underlying probabilities themselves (contingency transition, unsignaled to the participants), that effectively changes task volatility. Armed with the task structure and participants' choices, we applied a Hierarchical Gaussian Filter (HGF) model (Mathys et al., 2011; Mathys et al., 2014), which allowed us to infer participants' initial beliefs (i.e., priors) about task volatility, their readiness to learn about changes in the task volatility itself (meta-volatility learning rate), and learning rates that captured their expected and unexpected uncertainty regarding the task.
We examined the behavioral and computational correlates of paranoia both in-person and in a large online sample, spanning patients and healthy controls with varying degrees of paranoia. We also undertook a pre-clinical replication in rodents exposed chronically to saline or methamphetamine to determine whether a drug known to elicit paranoia in humans might induce similar perceptions of unexpected uncertainty, without contingency transition (Groman et al., 2018). We predicted that people with paranoia and rats administered methamphetamine would exhibit stronger priors on volatility, facilitating aberrant learning through unexpected uncertainty. We further hypothesized that this learning style would manifest as frequent and unnecessary choice switching (increased choice stochasticity and 'win-switch' behavior) rather than increased sensitivity to negative feedback (increased 'lose-switch' behavior/decreased 'lose-stay' behavior).

eLife digest
Everyone has had fleeting concerns that others might be against them at some point in their lives. Sometimes these concerns can escalate into paranoia and become debilitating. Paranoia is a common symptom in serious mental illnesses like schizophrenia. It can cause extreme distress and is linked with an increased risk of violence towards oneself or others. Understanding what happens in the brains of people experiencing paranoia might lead to better ways to treat or manage it.
Some experts argue that paranoia is caused by errors in the way people assess social situations. An alternative idea is that paranoia stems from the way the brain forms and updates beliefs about the world. Now, Reed et al. show that both people with paranoia and rats exposed to a paranoia-inducing substance expect the world will change frequently, change their minds often, and have a harder time learning in response to changing circumstances.
In the experiments, human volunteers with and without psychiatric disorders played a game where the best choices change. Then, the participants completed a survey to assess their level of paranoia. People with higher levels of paranoia predicted more changes would occur and made less predictable choices. In a second set of experiments, rats were put in a cage with three holes where they sometimes received sugar rewards. Some of the rats received methamphetamine, a drug that causes paranoia in humans. Rats given the drug also expected the location of the sugar reward would change often. The drugged animals had a harder time learning and adapting to changing circumstances.
The experiments suggest that brain processes found in both rats, which are less social than humans, and humans contribute to paranoia. This suggests paranoia may make it harder to update beliefs. This may help scientists understand what causes paranoia and develop therapies or drugs that can reduce paranoia. This information may also help scientists understand why during societal crises like wars or natural disasters humans are prone to believing conspiracies. This is particularly important now as the world grapples with climate change and a global pandemic. Reed et al. note paranoia may impede the coordination of collaborative solutions to these challenging situations.

Results
We analyzed belief updating across three reversal-learning experiments (Figure 1): an in-laboratory pilot of patients and healthy controls, stratified by stable, paranoid personality trait (Experiment 1); four online task variants administered to participants via the Amazon Mechanical Turk (MTurk) marketplace (Experiment 2); and a re-analysis of data from rats on chronic, escalating doses of methamphetamine, a translational model of paranoia (Experiment 3) (Groman et al., 2018).

Experiment 1
First, we explored trans-diagnostic associations between paranoia and reversal-learning in-person. Participants with and without psychiatric diagnoses (mood disorders: anxiety, depression, bipolar disorder, n = 8; schizophrenia spectrum: schizophrenia or schizoaffective disorder, n = 8; and healthy controls, n = 16) completed questionnaire versions of the Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II) screening assessment (Ryder et al., 2007), Beck's Anxiety Inventory (BAI) (Beck et al., 1988), Beck's Depression Inventory (BDI) (Beck et al., 1961), and demographic assessments (Table 1). Approximately two-thirds of participants endorsed three or fewer items on the SCID-II paranoid personality subscale (median = 1 item). Participants who endorsed four or more items were classified as high paranoia (n = 11), consistent with the diagnostic threshold for paranoid personality disorder. Low paranoia (n = 21) and high paranoia groups did not differ significantly by age, nor were there significant group associations with gender, educational attainment, ethnicity, or race, although a larger percentage of paranoid participants identified as racial minorities or 'not specified' (Table 1). Diagnostic category (i.e., healthy control, mood disorder, or schizophrenia spectrum) was significantly associated with paranoia group membership, χ²(2, n = 32) = 12.329, p = 0.002, Cramer's V = 0.621, as was psychiatric medication usage, χ²(1, n = 32) = 9.871, p = 0.003, Cramer's V = 0.555. These differences were due to the higher proportion of healthy controls in the low paranoia group. As expected, paranoia, BAI, and BDI scores were significantly elevated in the high paranoia group relative to low paranoia controls.
Participants completed a three-option reversal-learning task in which they chose between three decks of cards with hidden reward probabilities (Figure 1a and b).
They selected a deck on each turn and received positive or negative feedback (+100 or -50 points, respectively). They were instructed to find the best deck with the caveat that the best deck may change. Undisclosed to participants, reward probabilities switched among decks after selection of the highest probability option in nine out of ten consecutive trials ('reversal events'). Thus, the task was designed to elicit expected uncertainty (probabilistic reward associations) and unexpected uncertainty (reversal events), requiring participants to distinguish probabilistic losses from change in the underlying deck values. In addition, reward contingencies changed from 90%, 50%, and 10% chance of reward to 80%, 40%, and 20% between the first and second halves of the task ('contingency transition'; block 1 = 80 trials, 90-50-10%; block 2 = 80 trials, 80-40-20%, unsignaled to the participants). This transition altered the volatility of the task environment, thereby making it more difficult to achieve reversals and often delaying their occurrence. Successful achievement of reversals was contingent upon adapting stay-vs-switch strategies, thereby testing subjects' abilities to update beliefs about the overall task volatility ('meta-volatility learning').

Figure 1. Probabilistic reversal learning task. (a) Human paradigm: participants choose between three decks of cards with different colored backs (Blue, Red, and Green) with different, unknown probabilities of reward and loss. (b) Reward contingency schedule for in-laboratory experiment (reward probabilities associated with the different colored decks, Blue, Red, Green, across trials and blocks). On trial 81, the probability context shifted from 90%, 50%, and 10% (dark grey) to 80%, 40%, and 20% without warning (light grey). (c) Reward contingency schedules for online experiment. (d) Rat [...]
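The performance-dependent reversal rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' task code: the class name `ReversalTask`, the choice to swap the best and worst decks at a reversal, and the clearing of the trial window after each reversal are all assumptions, since the text only states that probabilities "switched among decks" once the best deck was chosen on nine of ten consecutive trials.

```python
import random
from collections import deque


class ReversalTask:
    """Three-option probabilistic reversal task (illustrative sketch)."""

    def __init__(self, probs=(0.9, 0.5, 0.1), window=10, criterion=9, seed=0):
        self.probs = list(probs)             # reward probability per deck
        self.recent = deque(maxlen=window)   # was the best deck chosen on recent trials?
        self.criterion = criterion
        self.rng = random.Random(seed)
        self.reversals = 0

    def step(self, choice):
        """Play one trial; return +100 for a win and -50 for a loss."""
        best = max(range(len(self.probs)), key=self.probs.__getitem__)
        worst = min(range(len(self.probs)), key=self.probs.__getitem__)
        win = self.rng.random() < self.probs[choice]
        self.recent.append(choice == best)
        if sum(self.recent) >= self.criterion:
            # Assumption: a reversal swaps the best and worst decks.
            self.probs[best], self.probs[worst] = self.probs[worst], self.probs[best]
            self.recent.clear()              # assumption: the window restarts
            self.reversals += 1
        return 100 if win else -50

    def set_contingency(self, new_probs):
        """Unsignaled contingency transition: keep deck ranks, change values."""
        ranked_decks = sorted(range(len(self.probs)),
                              key=self.probs.__getitem__, reverse=True)
        for deck, p in zip(ranked_decks, sorted(new_probs, reverse=True)):
            self.probs[deck] = p
```

Calling `set_contingency((0.8, 0.4, 0.2))` after trial 80 would mimic the 90-50-10% to 80-40-20% transition: the best deck now loses more often, so reaching the nine-of-ten criterion takes longer on average.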
High paranoia subjects achieved fewer reversals (MD = -2.31, CI = [-4.504, ], t(30) = -2.145, p = 0.04, Hedges' g = 0.798), but total points earned did not significantly differ, suggesting that there was no penalty for the different behaviors expressed by the more paranoid subjects (Table 1). We predicted that paranoia would be associated with unexpected uncertainty-driven belief updating.

Experiment 2
We aimed to replicate and extend our investigation of paranoia and reversal-learning in a larger online sample. We administered three alternative task versions to control for the contingency transition (Figure 1c). Version 1 (n = 45 low paranoia, 20 high paranoia) provided a constant contingency of 90-50-10% reward probabilities (Easy-Easy); version 2 (n = 69 low paranoia, 18 high paranoia) provided a constant contingency of 80-40-20% (Hard-Hard); version 3 (n = 56 low paranoia, 16 high paranoia) served to replicate Experiment 1 with a contingency transition from 90-50-10% to 80-40-20% (Easy-Hard); version 4 (n = 64 low paranoia, 19 high paranoia) provided the reverse contingency transition, 80-40-20% to 90-50-10% (Hard-Easy). The stable contingencies (versions 1 and 2) lacked contingency transitions. Versions 3 and 4 manipulated task volatility mid-way, although the contingency transition was not signalled to participants. We predicted that high paranoia participants would find versions 3 and 4 particularly challenging. Given that version 3 is easier to learn initially, we expected participants to develop stronger priors and thus be more confounded by the contingency transition, compared to version 4 participants.
Participants' demographic and mental health questionnaire responses did not differ significantly across task version experiments (Table 2). Total points and reversals achieved suggest variations in task difficulty [...]

Demographics table (fragment). Column order, inferred from the group sizes: in laboratory (low paranoia | high paranoia | statistic (df) | p), then online version 3 (low paranoia | high paranoia | statistic (df) | p).
[row label lost] | [...] | [...] | n/a | n/a | 50.0% | 62.5% | n/a | n/a
% Male | 28.6% | 27.3% | n/a | n/a | 50.0% | 37.5% | n/a | n/a
% Other or not specified | 0% | 0% | n/a | n/a | 0% | 0% | n/a | n/a
Education | | | 4.972 (6) ‡ | 0.638 § | | | 5.351 (6) ‡ | 0.549 §
% High school degree or equivalent | 19.0% | 45.5% | n/a | n/a | 16.1% | 6.3% | n/a | n/a
% Some college or university, no degree | 14.3% | 0% | n/a | n/a | 17.9% | 25.0% | n/a | n/a
% Associate degree | 9.5% | 9.1% | n/a | n/a | 12.5% | 12.5% | n/a | n/a
% Bachelor's degree | 23.8% | 27.3% | n/a | n/a | 35.7% | 56.3% | n/a | n/a
% Master's degree | 9.5% | 0% | n/a | n/a | 14.3% | 0% | n/a | n/a
% Doctorate or professional degree | 4.8% | 0% | n/a | n/a | 1.8% | 0% | n/a | n/a
% Completed some postgraduate | 0% | 0% | n/a | n/a | 1.8% | 0% | n/a | n/a
% Other / not specified | 19.0% | 18.2% | n/a | n/a | 0% | 0% | n/a | n/a
[row label lost] | [...] | [...] | n/a | n/a | 0% | 12.5% | n/a | n/a
% Asian | 14.3% | 9.1% | n/a | n/a | 3.6% | 6.3% | n/a | n/a
% American Indian or Alaska Native | 4.8% | 0% | n/a | n/a | 1.8% | 6.3% | n/a | n/a
% Multiracial | 0% | 0% | n/a | n/a | 3.6% | 0% | n/a | n/a
% Other / not specified | 0% | 18.2% | n/a | n/a | 5.4% | 0% | n/a | n/a

Experiment 3
To translate across species, we performed a new analysis of published data from rats exposed to chronic methamphetamine (Groman et al., 2018). Rats chose between three operant chamber noseports with differing probabilities of sucrose reward (70%, 30%, and 10%; Figure 1d and e). Contingencies switched between the 70% and 10% noseports after selection of the highest reinforced option in 21 out of 30 consecutive trials (Figure 1e). This task was most similar in structure to the first blocks of online versions 2 and 4. There was no increase in unexpected volatility mid-way through the task.
Rats were tested for 26 within-session reversal blocks (Pre-Rx, n = 10 per group), administered saline or methamphetamine according to a 23-day schedule mimicking the escalating doses and frequencies of chronic human methamphetamine users (Groman et al., 2018), and tested once per week for four weeks following completion of the drug regimen (Post-Rx; n = 10 saline, n = 7 methamphetamine) (Groman et al., 2018). Relative to rats exposed to saline, rats exposed to methamphetamine exhibited increased win-switch behavior, similar to what we observed in the high paranoia human participants. Additionally, unlike humans, they perseverated after negative feedback (Groman et al., 2018).

Computational modeling
We employed hierarchical Gaussian filter (HGF) modeling to compare belief updating across individuals with low and high paranoia, as well as across human participants and rats exposed to methamphetamine (Table 3). We paired a three-level perceptual model with a softmax decision model dependent upon third-level volatility (Figure 2a). We inverted the model from subject data (trial-by-trial choices and feedback) to estimate parameters for each individual (Figure 2b). Level 1 (x₁) characterizes trial-by-trial perception of task feedback (win or loss in humans, reward or no reward in rats), Level 2 (x₂) distinguishes stimulus-outcome associations (deck or noseport values), and Level 3 (x₃) renders perception of the overall task volatility (i.e., frequency of reversal events, changes in the stimulus-outcome associations). Belief trajectories were unique to each subject due to the probabilistic, performance-dependent nature of the task, so we estimated initial beliefs (priors) for x₂ and x₃ (μ₂⁰ and μ₃⁰, respectively). We also estimated ω₂, the tonic volatility of stimulus-outcome associations. Lower ω₂ indicates that subjects are slower to adjust beliefs about the value of each option; they maintain rigid beliefs about the underlying probabilities. The κ parameter captures the impact of phasic volatility on updating stimulus-outcome associations. In the setting of our experiments, κ approximates the influence of unexpected uncertainty. Higher κ implies faster updating of stimulus-outcome associations; that is, participants are more likely to perceive volatility as reversal events. Our final parameter of interest, ω₃, characterizes perception of 'meta-volatility,' such as changes in the frequency of reversal events (Lawson et al., 2017). The lower ω₃, the slower a subject is to adjust their volatility belief; they adhere more rigidly to their volatility prior (μ₃⁰).
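To make the roles of the coupling and volatility parameters concrete, the following Python sketch simulates the generative model that a three-level binary HGF assumes, in the form commonly given for the HGF (Mathys et al., 2011; Mathys et al., 2014): level 3 is a Gaussian random walk whose step variance is exp(ω₃), level 2 is a Gaussian random walk whose step variance is exp(κ·x₃ + ω₂), and level 1 is a binary outcome drawn with probability sigmoid(x₂). It is an illustration under stated assumptions, not the authors' fitting code: it covers a single binary contingency rather than the three-option task, the function names are hypothetical, and the parameter values in the usage note are arbitrary.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def level2_step_sd(x3, kappa, omega2):
    """Standard deviation of one level-2 random-walk step.

    The step variance is exp(kappa * x3 + omega2): omega2 sets the tonic
    part, and kappa scales how strongly the volatility state x3 adds to it.
    """
    return math.sqrt(math.exp(kappa * x3 + omega2))


def simulate_hgf(n_trials, kappa, omega2, omega3, x2_0=0.0, x3_0=0.0, seed=0):
    """Sample (x1, x2, x3) per trial from the assumed generative model."""
    rng = random.Random(seed)
    x2, x3 = x2_0, x3_0
    trajectory = []
    for _ in range(n_trials):
        # Level 3: volatility drifts with step variance exp(omega3).
        x3 = rng.gauss(x3, math.sqrt(math.exp(omega3)))
        # Level 2: the stimulus-outcome tendency drifts faster when x3 is high.
        x2 = rng.gauss(x2, level2_step_sd(x3, kappa, omega2))
        # Level 1: binary outcome with probability sigmoid(x2).
        x1 = 1 if rng.random() < sigmoid(x2) else 0
        trajectory.append((x1, x2, x3))
    return trajectory
```

For example, `simulate_hgf(160, kappa=1.0, omega2=-2.0, omega3=-6.0)` produces 160 trials. For a positive volatility state, raising κ enlarges the level-2 steps, which is the sense in which higher κ lets perceived volatility drive faster changes in stimulus-outcome associations.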
Priors did not differ between groups at x₂ (Table 3), but paranoid individuals and rats exposed to methamphetamine exhibited elevated μ₃⁰: they expected greater task volatility (Figure 2b, blue). In [...] .037). Both paranoid humans and rats administered chronic methamphetamine had strong beliefs that the task contingencies would change rapidly and unpredictably; in other words, they expected frequent reversal events. Methamphetamine exposure made rats behave like humans with high paranoia (Figure 2b, Post-Rx condition, orange). This is particularly striking when compared to human data from the first task block (before contingency transition), when task designs are most similar across experiments.

Online demographics table (fragment). Column order, inferred from the group sizes: version 1 low paranoia | version 1 high paranoia | version 2 low | version 2 high | version 3 low | version 3 high | version 4 low | version 4 high. Statistic columns, where present, were all n/a.
% Other or not specified | 0% | 0% | 1.4% | 0% | 0% | 0% | 0% | 0%
% Doctorate or professional degree | 4.4% | 0% | 0% | 5.6% | 1.8% | 0% | 1.6% | 0%
% Completed some postgraduate | 0% | 0% | 1.4% | 5.6% | 1.8% | 0% | 3.1% | 0%
% Other / not specified | 0% | 0% | 0% | 5.6% | 0% | 0% | 0% | 0%
Over $100,000 | 0% | 0% | 5.8% | 5.6% | 3.6% | 6.3% | 1.6% | 0%
% Asian | 8.9% | 10.0% | 7.2% | 0% | 3.6% | 6.3% | 7.8% | 0%
% American Indian or Alaska Native | 0% | 0% | 0% | 0% | 1.8% | 6.3% | 0% | 0%
% Multiracial | [...] | 15.8%
% Other / not specified | 0% | 0% | 1.4% | 0% | 5.4% | 0% | 0% | 0%
% Schizophrenia spectrum | 2.2% | 0% | 0% | 0% | 0% | 6.3% | 0% | 0%
Table footnote (fragment): Rats; split-plot ANOVA (i.e., repeated measures with between-subjects factor).
Paranoid participants and methamphetamine-exposed rats updated stimulus-outcome associations more strongly in response to perceived volatility (e.g., correctly or incorrectly inferred reversals; Figure 2b). κ showed significant paranoia group and block effects across the in-laboratory experiment and online version 3 [...]. Thus, learning was more strongly driven by unexpected uncertainty in high paranoia participants and rats chronically administered methamphetamine; they were faster to interpret volatility as reversal events than their low paranoia and saline-exposed counterparts. Expected uncertainty (ω₂) was decreased in paranoid participants and rats exposed to methamphetamine (Figure 2b). In laboratory and online (version 3), paranoid individuals were slower to update stimulus-outcome associations in response to expected uncertainty [...] maintained rigid beliefs about the underlying option probabilities relative to low paranoia and saline controls. This was associated with perseverative behavior in the rats but not in humans. Meta-volatility learning (ω₃) was similarly decreased across paranoia and methamphetamine-exposed groups (in laboratory, online version 3, and rats: MD META = -1.155, CI = [-2.139, -0.171], z META = -2.3, p = 0.021), suggesting more reliance on expected task volatility (i.e., anticipated frequency of reversal events) than on actual task feedback. In laboratory, we observed a block by paranoia group interaction [...]. These data indicate that paranoia and methamphetamine are associated with slower learning about changes in task volatility, suggesting greater reliance on volatility priors than task feedback.

In summary, our modeling analyses suggest the following about paranoia in humans and methamphetamine-exposed animals: they expect the task to be volatile (high μ₃⁰), their expectations about task volatility are more rigid (low ω₃), and they confuse probabilistic errors and task volatility as a signal that the task has fundamentally changed (high κ, low ω₂). We applied False Discovery Rate (FDR) correction for multiple comparisons of each model parameter (Hochberg and Benjamini, 1990). κ group effects survived corrections within each experiment (Table 4). In addition to κ, μ₃⁰ survived for Experiment 1; μ₃⁰ and ω₂ survived in online version 3; and μ₃⁰, ω₂, and ω₃ survived in Experiment 3 as group effects. Such correction is not yet standard practice with this modeling approach (Lawson et al., 2017; Powers et al., 2017; Sevgi et al., 2016), but we believe it should be, and effects that survive correction warrant greater confidence.
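As an illustration of what an FDR correction over a family of parameter tests involves, here is a minimal Python sketch of the standard Benjamini-Hochberg step-up procedure. The text does not specify which FDR variant was applied, so treating it as Benjamini-Hochberg is an assumption, and the p-values in the usage note are hypothetical rather than taken from Table 4.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return a list of booleans: True where the test survives FDR control at level q.

    Step-up rule: sort the m p-values, find the largest rank r with
    p_(r) <= (r / m) * q, and reject every hypothesis ranked at or below r.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            largest_rank = rank
    survives = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            survives[idx] = True
    return survives
```

With the hypothetical inputs `benjamini_hochberg([0.02, 0.03, 0.035, 0.5])`, the first three tests survive even though the two smallest p-values miss their own rank thresholds, because the third-ranked value (0.035) falls below 3/4 × 0.05 = 0.0375. This step-up behavior is what distinguishes the procedure from a simple per-test threshold.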

Paranoia effects across task versions
To examine the relationship between beliefs about contingency transition and paranoia within our HGF parameters, we performed split-plot, repeated measures ANOVAs across all four task versions. Paranoia group effects were specific to versions of the task in which we explicitly manipulated uncertainty via contingency transition, which increased volatility (Figure 3, [...] Figure 3a). μ₃⁰ also exhibited a paranoia by version trend ([...] Figure 3a). There were no significant paranoia effects or interactions for ω₃ (Table 5). In sum, our contingency shift manipulation, from easily discerned options to underlying probabilities that are closer together, increased unexpected uncertainty the most, particularly in highly paranoid participants, compared to the other task versions.

Covariate analyses
We completed three ANCOVAs for each HGF parameter derived from Experiment 2: demographics (age, gender, ethnicity, and race); mental health factors (medication usage, diagnostic category, BAI score, and BDI score); and metrics and correlates of global cognitive ability (educational attainment, income, and cognitive reflection; Tables 6 and 7). κ (perception of unexpected uncertainty) was the only parameter whose main effect of paranoia (higher κ in high paranoia participants) and paranoia-by-version interaction (higher κ in high paranoia participants as a function of increasing unexpected volatility in version 3) survived adjustment for demographic, mental health, and cognitive covariates. We are most confident that high paranoia participants have higher unexpected uncertainty, which drives their excessive updating of stimulus-outcome associations.

Relationships between parameters and paranoia
We found a significant correlation between κ and paranoia scores (Figure 4). However, depression and anxiety were also related to κ, and indeed, paranoia and depression correlate with one another, in our data and in other studies (Na et al., 2019). In order to explore commonalities among the rating scales in the present data, we performed a principal components analysis (Figure 5), identifying three principal components. The first principal component (PC1) explained 82.3% of the variance in the scales and loaded similarly on anxiety, depression, and paranoia. It correlated significantly with κ (r = 0.272, p = 0.021). Depression, anxiety, and paranoia all contribute to PC1. We suggest that this finding is consistent with the idea that depression and anxiety represent contexts in which paranoia can flourish and, likewise, harboring a paranoid stance toward the world can induce depression and anxiety.

Cognitive reflection items
Item | Prompt
1 | A folder and a paper clip cost $1.10 in total. The folder costs $1.00 more than the paper clip. How much does the paper clip cost?
2 | If it takes 5 clerks 5 min to review five applications, how long would it take 100 clerks to review 100 applications?
3 | In a garden, there is a cluster of weeds. Every day, the cluster doubles in size. If it takes 48 days for the cluster to cover the entire garden, how long would it take for the cluster to cover half of the garden?

Multiple regression
In order to make the case that our observations were most relevant to paranoia, we examined the effects of paranoia, anxiety, and depression on κ within the online version 3 dataset with multiple regression. [...] (Figure 4) revealed a much stronger relationship when analyses were restricted to individuals with paranoia scores greater than 0 (i.e., endorsement of at least one item); among participants who denied all questionnaire items, a minority (seven out of 32) exhibited elevated κ. To account for the possibility that some individuals with severe paranoia may avoid disclosing sensitive information, we performed additional analyses of participants who endorsed one or more paranoia items. The correlation between paranoia and κ in the first block of the task increases from r = 0. [...]

Behavior and simulations
Win-switching was the prominent behavioral feature of both paranoid participants and rats exposed to methamphetamine (Table 1, Table 2; Groman et al., 2018). Collapsed across blocks and task versions, our Experiment 2 data demonstrated a main effect of paranoia group (Figure 3b; F(1 [...] (Kong et al., 2017), a metric of behavioral variability employed by behavioral ecologists (increasingly an inspiration for human behavioral analysis [Fung et al., 2019]), particularly with regard to predator-prey relationships (Humphries and Driver, 1970). When a predator is approaching a prey animal, the prey's best course of action is to behave randomly, or in a protean fashion, in order to evade capture (Humphries and Driver, 1970). The more protean or stochastic the behavior, the closer the U-value is to 1. Across task blocks, paranoid participants exhibited elevated choice stochasticity (paranoia by version interaction, F(3, 298) = 3.438, p = 0.017, ηp² = 0.033; Table 2). Post-hoc tests indicate that this stochasticity was specific to versions with contingency transition, suggesting a relationship to unexpected uncertainty (Figure 3b; version 3, F(1, 298) = 17.585, [...]
To test the propriety of our model, we simulated data for each subject in online version 3 and determined whether or not key behavioral effects were reproduced (Figure 7a, Table 8). To demonstrate the effects of parameters on task performance, we performed additional simulations in which we doubled or halved a single parameter at a time from the baseline average of low paranoia participants. These results confirmed the impact of κ, ω₂, and ω₃ on win-shift behavior (Figure 4). Parameter recovery revealed significant correlations for κ and ω₂ between original subject parameters and those estimated from simulations (Figure 6; ω: r = 0. [...] were less consistently recovered, as noted in previous publications (Bröker et al., 2018). Thus, the model we chose, with meta-volatility and three coupled layers of belief, successfully simulates the key features of paranoid behavior: higher win-switching and stochastic choice.

Figure 6. Parameter effects on simulated task performance. We simulated behavior from low paranoia participants (online version 3, n = 54) to evaluate the effects of κ, μ₃⁰, ω₂, and ω₃ on win-shift and lose-stay rates. Estimated perceptual parameters were averaged across subjects to create a single set of baseline parameters. Additional parameter sets were created by doubling or halving one parameter at a time (e.g., 2κ or 0.5κ), while the others were held constant (n.b., 2ω₂ violated model assumptions and was excluded from analysis). We also included the average parameter values of rats exposed to methamphetamine (Meth). Ten simulations were run per subject for each condition (i.e., parameter set). Win-shift and lose-stay rates were calculated, then averaged across simulations and subjects. Rates from each condition were divided by the baseline condition rate to generate relative win-shift and lose-stay rates. We compared relative rates for each condition to the baseline (relative rate of 1, depicted as the dotted line; paired t-tests, Bonferroni-corrected p-values). Of note, baseline parameters were positive for κ and ω₂, and negative for μ₃⁰ and ω₃. Consequently, the doubled (2x) condition makes μ₃⁰ and ω₃ more negative (lower). Box-plots: center lines show the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; outliers are represented by dots; crosses represent sample means; data points are plotted as open circles; *p ≤ 0.05, **p ≤ 0.01, ***p ≤ 0.001.
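The behavioral metrics used in this section can be computed directly from a sequence of choices and outcomes. The Python sketch below is a minimal illustration with hypothetical function names. It assumes that a switch is scored on the trial following each outcome, and it assumes the common entropy-based definition of the U-value, U = -Σ pᵢ log₂(pᵢ) / log₂(n), where pᵢ is the relative frequency of option i and n is the number of options; the text itself does not give the formula.

```python
import math
from collections import Counter


def switch_rates(choices, wins):
    """Return (win-switch rate, lose-stay rate) for one subject.

    choices: sequence of chosen options, one per trial
    wins:    sequence of booleans, True when the trial was rewarded
    A rate is NaN when its denominator is zero.
    """
    n_win = n_win_switch = n_loss = n_lose_stay = 0
    for t in range(len(choices) - 1):
        switched = choices[t + 1] != choices[t]
        if wins[t]:
            n_win += 1
            n_win_switch += switched
        else:
            n_loss += 1
            n_lose_stay += not switched
    win_switch = n_win_switch / n_win if n_win else float("nan")
    lose_stay = n_lose_stay / n_loss if n_loss else float("nan")
    return win_switch, lose_stay


def u_value(choices, n_options=3):
    """Normalized choice entropy: 0 for one repeated option, 1 for uniform choice."""
    counts = Counter(choices)
    total = len(choices)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(n_options)
```

A subject who always repeats a winning deck and always leaves a losing deck has a win-switch rate of 0 and a lose-stay rate of 0. Win-switching, by contrast, means abandoning an option that was just rewarded, which is the signature reported here for high paranoia participants and methamphetamine-exposed rats.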

Alternate models
Our model is complex, and other, simpler reinforcement learning models might explain behavior on this task. Given the win-switching behavior we sought to understand, we fit a model from Lefebvre and colleagues that instantiated biased belief updating via differential weighting of positive and negative prediction errors (Lefebvre et al., 2018). Fitting this model to online version 3, we saw no significant paranoia group differences in learning rates for positive or negative prediction errors in parameters derived from all 180 trials (independent samples t-test: α⁺, t(70) = -0.532, p = 0.597; α⁻, t [...] Table 9). We can also simplify within our hierarchical Gaussian filter framework. The model we chose had three layers of beliefs, and the highest level seemed to capture most of the task and paranoia effects of interest (Figure 8). To confirm this suspicion, we removed the third layer, fitting an HGF model that had beliefs about outcomes and deck values but no beliefs about volatility, no unexpected volatility learning rate, and no meta-volatility. This model failed to capture the task effects or group differences in its parameters (see Table 9).
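For comparison with the HGF, the general idea behind a model with differential weighting of positive and negative prediction errors can be sketched as follows. This is a generic asymmetric-learning-rate value update with a softmax choice rule, written as an illustration only: the function names, the outcome coding (1 for a win, 0 for a loss), and the inverse temperature `beta` are assumptions and are not taken from the fitted model.

```python
import math


def update_value(q, reward, alpha_pos, alpha_neg):
    """Update one option's value with separate learning rates.

    The prediction error is delta = reward - q. Positive errors are weighted
    by alpha_pos and non-positive errors by alpha_neg.
    """
    delta = reward - q
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def softmax(values, beta):
    """Choice probabilities; a larger beta makes choices more deterministic."""
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (v - top)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]
```

With `alpha_pos=0.5` and `alpha_neg=0.1`, a win moves a value of 0 up to 0.5, whereas a subsequent loss only brings it down to 0.45. A group difference in such a model would appear as a difference in α⁺ or α⁻, which is the comparison reported above as non-significant.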
Therefore, a more complicated model, one that captures higher-level beliefs about contingency transitions or learning when to learn, seems most appropriate, and indeed, that type of model was able to simulate the key features of our data (Palminteri et al., 2017). Future work will compare and contrast different potential computational models including, but not limited to, Bayesian Hidden State Markov Models (Hampton et al., 2006), as well as switching (Gershman et al., 2014) and volatile Kalman Filters (Piray and Daw, 2020).

Clustering analysis
Given the apparent similarity in effects of paranoia and methamphetamine in humans and rats, respectively (Figure 2b), we searched for latent structure in our data using two-step cluster analysis (Tkaczynski, 2017). This approach sorts subjects into groups (clusters) on the basis of some experimenter-selected variables such as estimated model parameters. The goal is to find distinct subsets in the data such that each cluster exhibits a cohesive pattern of relationships between the variables. Whereas some clustering approaches require the experimenter to predefine the expected number of clusters, two-step clustering determines both the optimal number of clusters and the composition of each cluster. The greater the similarity (or homogeneity) within a group and the greater the difference between groups, the better the clustering.

(a) Behavioral switching patterns replicate across in-laboratory and online version 3 experiments. Perseveration after negative feedback (lose-stay behavior) did not significantly differ between paranoia groups or task block. (b) Simulated data generated from HGF perceptual parameters (version 3). Win-switch rate, U-value and lose-stay rate of the simulated data are depicted. The model-simulated data replicate the win-switch and U-value behavioral differences between high and low paranoia participants presented in panel a. Like the real participants, there was no difference in lose-stay rates in the simulated data. Center lines show the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; outliers are represented by dots; crosses represent sample means; data points are plotted as open circles. *p ≤ 0.05, **p ≤ 0.01, ***p ≤ 0.001. Plots of participant behavioral metrics (a) are presented side by side with simulated data (b).

Figure 9. Cluster analysis of HGF parameters.
Two-step cluster analysis of model parameters (ω₃, μ₃⁰, κ, ω₂) across rat and human data sets (rat, post-Rx; in laboratory and online version 3, block 1). Automated clustering yielded an optimal two clusters with good cohesion and separation (average silhouette coefficient = 0.7; cluster size ratio = 2.46). (a) Density plots for μ₃⁰, κ, ω₂, and ω₃ (light pink) depict cluster-specific distributions for each parameter (red). Unlike frequency histograms (which depict the number of data points in bins), density plots employ smoothing to prioritize distribution shape and are not restricted by bin size. Beneath each density plot, box-plots of the overall median and the 25th and 75th percentiles for each parameter are aligned (pink), with cluster medians and quartiles superimposed (red). Relative to the overall distribution, Cluster 1 (n = 35) medians are elevated for μ₃⁰ and κ, and decreased for ω₂ and ω₃. Cluster 2 (n = 86) falls within each overall distribution. (b) Predictor importance of included parameters. Consistent with the color scheme in Figure 2a, uncertainty weighting parameters (κ, ω₂, ω₃) are depicted in purple and the prior μ₃⁰ is in blue. (c) Distribution of cluster identities within groups. Black bars signify the proportion of group members assigned to Cluster 1 and gray bars represent the proportion of group members assigned to Cluster 2. Cluster 1 membership is significantly associated with paranoia and methamphetamine groups (χ²(1, n = 121) = 29.447, p = 5.75E-8).

Considering that paranoia and methamphetamine exposure share a pattern of elevated μ₃⁰ and κ accompanied by decreased ω₂ and ω₃ (Table 10), we hypothesized that these four variables would yield a distinct cluster: a 'paranoid style' across species.
We analyzed μ₃⁰, κ, ω₂, and ω₃ estimates derived from the first block of Experiment 1 and online version 3 (pre-context change data, because rats do not experience a context shift) with post-chronic exposure rat data (methamphetamine and saline). We identified two clusters with good cohesion and separation, meaning that subjects sorted into two groups (each containing rodents and humans) whose parameters travelled in such a way that their values were close to the centroid or mean of the cluster they were in and as far as possible from the centroid of the other cluster (average silhouette coefficient = 0.7; cluster size ratio = 2.46; Figure 9a). All parameters contributed to clustering; κ contributed most strongly (Figure 9b). Importantly, the cluster solution did not separate rats from humans (despite the differences in task structure, incentives, manipulanda, and phylogeny). Relative to the overall distribution, Cluster 1 was characterized by high κ and μ₃⁰, and decreased ω₂ and ω₃. Cluster 1 membership was significantly associated with high paranoia and methamphetamine exposure, χ²(1, n = 121) = 29.447, p = 5.75E-8, Cramér's V = 0.493 (Figure 9c). Notably, no participants in the low paranoia group with paranoia scores above zero were ascribed Cluster 1 membership. The cluster solution was robust to validation by split-half analysis (removing half of the participants and repeating the clustering), removal of the rat subjects, and removal of human participants. In each case, we identified two clusters with good cohesion and separation (Split-half 1, n = 19 cluster 1, 42 cluster 2: silhouette coefficient = 0.6; Split-half 2, n = 17 cluster 1, 43 cluster 2: silhouette coefficient = 0.7; No Rat, n = 26 cluster 1, 78 cluster 2: silhouette coefficient = 0.7; Rat Only, n = 6 cluster 1, 11 cluster 2: silhouette coefficient = 0.7).
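The reported effect size can be checked from the chi-square statistic alone. The sketch below assumes the association was tested on a 2 x 2 table (cluster membership by group), which is consistent with the single degree of freedom reported; the function name is illustrative and not taken from the original analysis, which was run in SPSS.

```python
import math

def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    k = min(n_rows, n_cols) - 1
    return math.sqrt(chi2 / (n * k))

# Values reported in the text; the 2 x 2 table shape is an assumption (df = 1).
v = cramers_v(chi2=29.447, n=121, n_rows=2, n_cols=2)
print(round(v, 3))
```

For a 2 x 2 table the denominator reduces to n, so V is simply the square root of χ²/n.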
In summary, paranoid participants and methamphetamine-exposed rats cluster together (high μ₃⁰, high κ, low ω₂, and low ω₃), suggesting that these parameters share an underlying generative process and that paranoia and methamphetamine have similar effects on reversal-learning.

Table 10. Summary of paranoia/methamphetamine effects on belief-updating.
Columns: In lab, Online, Rats.
⇡ ⇣ Non-significant increase/decrease in high paranoia or meth, relative to low paranoia or saline
↑ ↓ Trend-level increase/decrease in high paranoia or meth, relative to low paranoia or saline
▲ ▼ Significantly higher/lower in high paranoia or meth, relative to low paranoia or saline
-- No significant findings or trends
† Baseline trend; parameter decreases in second block for low but not high paranoia
‡ Version 3 only
§ Trend-level significance disappears with inclusion of demographic covariates
¶ Significance reduced to trend with inclusion of demographic covariates

Discussion
During non-social probabilistic reversal-learning, paranoid individuals and rats chronically exposed to methamphetamine have higher initial expectations of task volatility (μ₃⁰). In other words, they start the task anticipating more changes in stimulus-outcome associations, and they switch choices readily and excessively in anticipation of reversal events. By relying more on their expectations of volatility than on actual experience (exemplified by switching even after positive feedback), they are slower to learn about changes in task volatility. This manifests as decreased meta-volatility learning (ω₃) and failure to significantly adjust μ₃⁰ after contingency transitions. More paranoid individuals are similarly slower to adjust expected deck values (lower ω₂) but faster to attribute volatility to reversal events (elevated κ), perceiving change (unexpected uncertainty) instead of normal statistical variation (expected uncertainty). They sit at Hofstadter's 'turning point', constantly expecting change but never learning appropriately from it. In the reversal learning literature, choice switching after positive feedback has garnered less attention than perseverative behavior and sensitivity to negative feedback (Izquierdo et al., 2017;Waltz, 2017). Individuals with depression and schizophrenia seemingly perseverate less than healthy controls, but this has formerly been attributed to increased sensitivity to negative feedback (Waltz, 2017;Robinson et al., 2012). However, elevated win-switch tendencies have been reported in youths with bipolar disorder, major depressive disorder, and anxiety disorder (Dickstein et al., 2010). A prior study in people with schizophrenia described excessive win-switch behavior that correlated with the severity of delusional beliefs and hallucinations (Waltz, 2017).
Likewise, an elevated prior on environmental volatility (μ₃⁰) and higher sensitivity to this volatility (κ) have been observed in HGF analyses of 2-choice probabilistic reversal-learning in medicated and unmedicated patients with schizophrenia (Deserno, 2018). These authors did not explore paranoia specifically.
We assessed paranoia across the continuum of health and mental illness, provided three choice options, and explicitly manipulated unexpected volatility across task versions. The version that shifted from an easier to a more difficult contingency context (version 3) was associated with paranoia group effects on μ₃⁰, κ, and ω₂, and a meta-analytic effect on ω₃. Furthermore, this contingency transition, an exposure to truly unexpected volatility, rendered low paranoia controls more similar to their paranoid counterparts by decreasing their meta-volatility learning (ω₃). Paranoid participants responded to contingency transitions in version 3 and version 4 by switching stochastically. These findings suggest a continuum of behavioral responses to volatility, moving from optimal learning to diminished feedback sensitivity (i.e., decreased ω₃ in low paranoia participants) and from diminished feedback sensitivity (lower ω₃ and increased win-switching in high paranoia participants) toward complete dissociation from experienced feedback (stochastic switching). Unexpected uncertainty, the perception of change in the probabilities of the environment (particularly 'unsignaled context switches' [Yu and Dayan, 2005], which increase unexpected volatility), is thought to promote abandonment of old associations and new learning. However, our results suggest that this response might vary according to a hierarchy of belief. Paranoid participants were quick to abandon 'best deck' associations and explore alternative options (i.e., x₂ beliefs), but in turn they relied more on their higher-level beliefs about the task volatility (x₃ beliefs) and less on sensory feedback (lower meta-volatility learning). Our analysis of covariates warrants specific focus on κ, the sensitivity to unexpected volatility.
Other parameter-paranoia associations did not endure after controlling for demographic factors (age, gender, ethnicity, and race), although we see their derangement in our rodent study as well as in the significant meta-analytic effects across our experiments. Furthermore, these demographic factors are themselves strong predictors of paranoia (Holt and Albert, 2006;Iacovino et al., 2014;Mahoney et al., 2010). It is notable too that κ was the most powerful discriminator of the two clusters of human and animal participants. We conclude that elevated κ, that is, belief updating tethered to unexpected volatility, is the parameter change most robustly associated with paranoia. Doubling κ in our simulations induced significantly more win-switching.
Multiple neurobiological manipulations may induce such win-switching behavior. Lesions of the mediodorsal thalamus in non-human primates (Chakraborty et al., 2016) or neurons projecting from the amygdala to orbitofrontal cortex in rats (Groman et al., 2019) engender win-switching. Unexpected uncertainty, and the κ parameter of the HGF in particular (Marshall et al., 2016), are thought to be signaled via the locus coeruleus and noradrenaline (Yu and Dayan, 2005;Payzan-LeNestour and Bossaerts, 2011;Payzan-LeNestour et al., 2013;Tervo et al., 2014). This mechanism is thought to modulate switching versus staying behaviors (Kane et al., 2017;Aston-Jones and Cohen, 2005;Aston-Jones et al., 1999;Eldar et al., 2013), as well as responses to stress (Borodovitsyna et al., 2018;McCall et al., 2015;Atzori et al., 2016) and subliminal fear cues (Liddell et al., 2005) to coordinate fight-or-flight responses (Atzori et al., 2016). The dual role of the locus coeruleus in recognizing and responding to threats as well as unexpected uncertainty suggests that dysfunction could produce both paranoia and the inferential abnormalities we observed. Methamphetamine may induce similar dysfunction (Ferrucci et al., 2019;Ferrucci et al., 2013;Ferrucci et al., 2008). Acute moderate doses increase pre-synaptic catecholamine release, particularly noradrenaline (Rothman et al., 2001), and induce exploratory locomotive effects modulated through adrenoceptors on dopamine neurons (Ferrucci et al., 2013).
Excessive release of noradrenaline from the locus coeruleus into the anterior cingulate cortex drives feedback insensitivity and stochastic switching behavior in rats completing a three-option counter prediction task (Tervo et al., 2014). Evolutionarily, departure from predictable, rational actions might offer an adaptive mechanism for escape from intractable threat. As a protean defense mechanism, behavioral stochasticity impedes predators' abilities to create accurate, actionable countermeasures (Humphries and Driver, 1970;Richardson et al., 2018;Humphries and Driver, 1967). If driven by excessive unexpected uncertainty, underwritten by noradrenaline, protean defense may represent a heavily conserved, continuous common mechanism underlying vigilance and false alarms (Rajkowski et al., 1994;Usher et al., 1999), arousal-linked attentional biases (Eldar et al., 2013) and selective processing of social threats. However, protean behaviors are not necessarily adaptive. Pathological insensitivity to feedback and reliance on internal beliefs over evidence constitute a 'break from reality', in other words, psychosis.
When confronted with intractable unexpected uncertainty, our participants rely on higher-level beliefs about the task environment. When humans experience non-social volatility (for example, through threats to their sense of control [Whitson and Galinsky, 2008] or exposure to surprising non-social stimuli [Proulx et al., 2012;Heine et al., 2006]), they appeal to the influence of powerful enemies, even when those enemies' influence is not obviously linked to the volatility (Sullivan et al., 2010). Our account places the locus of paranoia at the level of the individual. Here, our account departs from evolutionary accounts of paranoia grounded in coalitional threat (Raihani and Bell, 2019); persecutors are not scapegoats that increase group cohesion. Rather, when paranoid, we have a ready explanation for hazards. With a well-defined persecutor in mind, a volatile world may be perceived to have less randomly distributed risk (Sullivan et al., 2010). However, paranoia might become a self-fulfilling prophecy, engendering more volatility and negative social interactions. This aspect may be captured in our task through win-switch behavior. By failing to incorporate positive feedback from the best option, paranoid individuals sample sub-optimal options, which deliver misleading positive feedback.
There are some important limitations to our conclusions. Compared with humans, rats are relatively asocial. But they are not completely asocial. In our experiment they were housed in pairs, and, more broadly, they evince social affiliative interactions with other rats (Donaldson et al., 2018;Kondrakiewicz et al., 2019;Urbach et al., 2010). A further limitation centers on the comparability of our experimental designs. In humans our comparisons were both within (contingency transition) and between groups (low versus high paranoia). In rats, the model was also mixed with some between (saline versus methamphetamine) and some within-subject (pre versus post chronic treatment) comparisons. We should be clear that there was no contingency context transition in the rat study. However, just as that transition made low paranoia humans behave like high paranoia, chronic methamphetamine exposure made rats behave on a stable contingency much like high paranoia humans -even in the absence of contingency transition. The comparable results across species, despite these differences, warrant the inference that our basic, relatively asocial, approach provides a robust tool for computational dissection of learning mechanisms.
Social interactions play a rich and undeniable role in paranoia, but translational, domain-general approaches may ultimately facilitate biological insights into paranoia, psychosis and delusions (Corlett and Fletcher, 2014;Feeney et al., 2017). Whilst we contend that our task is relatively free of social features (certainly compared to others [Raihani and Bell, 2017]), the possibility remains that the elevated U-values in our participants are reflective of attempts (and perhaps failures) to predict our intentions as experimenters. Indeed, this is a possibility raised previously with regards to simple conditioned behaviors in experimental animals. Even during Pavlovian conditioning, animals may attempt to infer a generative model of the task environment, which might, ultimately, include the experimenter arranging the contingencies (Gershman and Niv, 2012;Gershman and Niv, 2010). It is possible that all instances of human cognitive testing involve an element of inference by the participant with regards to the intentions of the experimenter, whether or not the task at hand is explicitly social, and indeed, all cognitive functions may be aimed at or modulated by such inferences (Turner et al., 1994).
In summary, a strong belief in the volatility of the world necessitates hypervigilance and a facility with change. However, in paranoia, that belief in the volatility of the world is itself resistant to change, making it difficult to reassure, teach, or change the minds of people who are paranoid. They remain 'on guard,' adhering to expectations over evidence. By using a non-social task, we have shown that this paranoid style is not restricted to the social domain, and that it can be modeled in relatively asocial animals. Additionally, our domain-general approach reaffirms the merit of establishing expectations of a stable, predictable environment to promote recovery from paranoia-associated illness (Powers et al., 2018). We note with interest the apparent relationship between conspiratorial ideation and societal crisis situations (terrorist attacks, plane crashes, natural disasters or war) throughout history, with peaks around the great fire of Rome (AD 64), the industrial revolution, the beginning of the cold war, 9/11, and contemporary financial crises (van Prooijen and Douglas, 2017). In today's world of escalating uncertainty and volatility, particularly environmental climate change and viral pandemics, our findings suggest that the paranoid style of inference may prove particularly maladaptive for coordinating collaborative solutions.

Materials and methods
Experiments were conducted at Yale University and the Connecticut Mental Health Center (New Haven, CT) in strict accordance with Yale University's Human Investigation Committee and Institutional Animal Care and Use Committee. Informed consent was provided by all research participants.

Experiment 1
English-speaking participants aged 18 to 65 (n = 34) were recruited from the greater New Haven area through public fliers and mental health provider referrals. Exclusion criteria included history of cognitive or neurologic disorder (e.g., dementia), intellectual impairment, or epilepsy; current substance dependence or intoxication; cognition-impairing medications or doses (e.g. opiates, high dose benzodiazepines); history of special education; and color blindness. Participants were classified as healthy controls (n = 18), schizophrenia spectrum patients (schizophrenia or schizoaffective disorder; n = 8), and mood disorder patients (depression, bipolar disorder, generalized anxiety disorder, post-traumatic stress disorder; n = 8) on the basis of clinician referrals and/or self-report. Participants were compensated $10 for enrolment with an additional $10 upon completion. Two healthy controls were excluded from analyses due to failure to complete the questionnaires and suspected substance use, respectively.

Experiment 2
332 participants were recruited online via Amazon Mechanical Turk (MTurk). The study advertisement was accessible to MTurk workers with a 90% or higher HIT approval rate located within the United States. To discourage bot submissions and verify human participation, we required participants to answer open-ended free response questions; submit unique, separate completion codes for the behavioral task and questionnaires; and enter MTurk IDs into specific boxes within the questionnaires. All submissions were reviewed for completion code accuracy, completeness of responses (i. e., declining no more than 30% of questionnaire items), quality of free response items (e.g., length, appropriate grammar and content), and use of virtual private servers (VPS) to submit multiple responses and/or conceal non-US locations (Dennis VPS paper, 2018). Upon approval, workers were compensated $6. Those who scored in the top 25% on the card game (reversal-learning task) earned a $2 bonus. We rejected or excluded 19 submissions that geolocation services (https://www.iplocation.net/) identified as originating outside of the United States or from suspected server farms, four submissions for failure to manually enter MTurk ID codes, and two submissions for insufficient questionnaire completion. Submissions with grossly incorrect completion codes were rejected without further review.

Experiment 3
Subject information, behavioral data acquisition, and behavioral analyses were described previously (Groman et al., 2018). Long Evans rats (Charles River; n = 20) ranged from 7 to 9 weeks of age. Rats were exposed to escalating doses and frequency of saline (n = 10) or methamphetamine (n = 10, three withdrawn during dosing), imitating patterns of human methamphetamine users (Segal et al., 2003;Han et al., 2011). Prior to dosing (Pre-Rx), rats completed 26 within-session reversal sessions, including up to eight reversals per session. Post-dosing (Post-Rx), rats completed one test session per week for four weeks. Computational model parameters were estimated from each session and averaged across treatment conditions to yield one Pre-Rx and Post-Rx set of parameters per rat.

Behavioral task
Participants completed a 3-option probabilistic reversal-learning paradigm. Three decks of cards were displayed on a computer monitor for 160 trials. Participants selected a deck on each trial by pressing the predesignated key. We advised participants that each deck contained winning and losing cards (+100 and −50 points), but in different amounts. We also stated that the best deck may change. Participants were instructed to find the best deck and earn as many points as possible. Probabilities switched between decks when the highest probability deck was selected in 9 out of 10 consecutive trials (performance-dependent reversal). Every 40 trials the participant was provided a break, following which probabilities were automatically reassigned (performance-independent reversal).
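The performance-dependent reversal criterion can be sketched as a sliding window over recent choices. This is a minimal illustration, not the task code: it assumes the window must contain a full 10 trials before the criterion is checked and that the window is cleared after each reversal, neither of which is specified in the text. The function name and input format are hypothetical.

```python
from collections import deque

WINDOW = 10     # consecutive trials inspected
CRITERION = 9   # best-deck selections required within the window

def reversal_trials(chose_best):
    """Return 0-based trial indices at which a performance-dependent reversal fires.

    chose_best: sequence of booleans, True when the participant picked the
    deck that currently has the highest reward probability.
    Assumptions (not from the source): the window must be full before the
    check, and it is cleared after every reversal.
    """
    recent = deque(maxlen=WINDOW)
    fired = []
    for t, hit in enumerate(chose_best):
        recent.append(bool(hit))
        if len(recent) == WINDOW and sum(recent) >= CRITERION:
            fired.append(t)
            recent.clear()
    return fired

# Nine best-deck picks, one miss, then ten best-deck picks in a row.
example = [True] * 9 + [False] + [True] * 10
print(reversal_trials(example))
```

The performance-independent reversals after each 40-trial break would be handled separately, on a fixed schedule.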
In Experiment 2, the task was administered via web browser link from the MTurk marketplace. We changed the task timing to self-paced and eliminated null trials and inter-trial jittering. A progress tracker was provided every 40 trials. Workers were randomly assigned to one of four task versions, using restricted block randomization to ensure comparable numbers of high paranoia participants across task versions. Version 1 had a constant contingency of 90-50-10%. Version 2 maintained a constant contingency of 80-40-20%. Version 3 replicated the 90-50-10% (block 1) to 80-40-20% (block 2) context transition of Experiment 1. Version 4 presented the reversed contingency transition, 80-40-20% (block 1) to 90-50-10% (block 2). We analyzed attrition rates across the four versions.

Questionnaires
Following task completion, questionnaires were administered via the Qualtrics survey platform (Qualtrics Labs, Inc, Provo, UT). Items included demographic information (age, gender, educational attainment, ethnicity, and race) and mental health questions (past or present diagnosis, medication use, Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II) (Ryder et al., 2007), Beck's Anxiety Inventory (BAI) (Beck et al., 1988), Beck's Depression Inventory (BDI) (Beck et al., 1961)). We removed the single suicidality question from the BDI for Experiment 2. Experiment 2 included additional items: income, three cognitive reflection questions (Table 7), and three free response items ('What do you think the card game was testing?', 'Did you use any particular strategy or strategies? If yes, please describe', and 'Did you find yourself switching strategies over the course of the game?'). We quantified trait-level paranoia using the paranoid personality subscale of the SCID-II, and we included an ideas of reference item from the schizotypy subscale ('When you are out in public and see people talking, do you often feel that they are talking about you?'). This item, along with other SCID-II items, has previously been included as a metric of paranoia in the general population (Bebbington et al., 2013;Bell and O'Driscoll, 2018). Participants who endorsed four or more paranoid personality items (i.e., the cut-off for the top third identified in Experiment 1) were classified as 'high paranoia.' Each participant's SCID-II, BAI, and BDI scores were normalized by total scale items answered. Response rates were higher than 90% for all questionnaire items and scales (Table 11).

Behavioral analysis
We analyzed tendencies to choose alternative decks after positive feedback (win-switch) and select the same deck after negative feedback (lose-stay). Win-switch rates were calculated as the number of trials in which the participant switched after positive feedback divided by the number of trials in which they received positive feedback. Lose-stay rates were calculated as the number of trials in which a participant persisted after negative feedback divided by total negative feedback trials. In Experiment 1, we excluded post-null trials from these analyses. To further characterize switching behavior, we calculated U-values, a measure of choice stochasticity:

\[ U = -\sum_{i=1}^{b} \frac{a_i \log_2(a_i)}{\log_2(b)} \]

where b is the number of possible choice options (i.e., card decks or noseports) and a_i equals the relative frequency of choice option i (Kong et al., 2017). To avoid any choice counterbalancing effects across reversals, choice frequencies were determined by the underlying probabilities of the decks rather than their physical attributes (e.g., deck position or color). Additional behavioral analyses included trials to first reversal, trials to post-reversal recovery, and trials to post-reversal switch. The latter two were restricted to the first reversal in the first block. Trials post-reversal were counted from the first negative feedback trial following the true reversal event. Recovery was defined as switching to the best deck and staying for at least one additional trial.
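The three behavioral metrics can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' analysis code: outcomes are coded 1 (win) and 0 (loss), the final trial is skipped for win-switch and lose-stay because it has no following choice, null trials are ignored, and for the U-value the choices are assumed to be already coded by the decks' underlying probability rank rather than by position or color. Function names are hypothetical.

```python
import math

def win_switch_rate(choices, outcomes):
    """Switches after a win divided by wins that have a following trial."""
    wins = switches = 0
    for t in range(len(choices) - 1):
        if outcomes[t] == 1:
            wins += 1
            if choices[t + 1] != choices[t]:
                switches += 1
    return switches / wins if wins else float("nan")

def lose_stay_rate(choices, outcomes):
    """Repeats after a loss divided by losses that have a following trial."""
    losses = stays = 0
    for t in range(len(choices) - 1):
        if outcomes[t] == 0:
            losses += 1
            if choices[t + 1] == choices[t]:
                stays += 1
    return stays / losses if losses else float("nan")

def u_value(choices, n_options):
    """U = -sum_i a_i * log2(a_i) / log2(b); options never chosen contribute 0."""
    n = len(choices)
    total = 0.0
    for option in set(choices):
        a = choices.count(option) / n   # relative frequency of this option
        total += a * math.log2(a)
    return -total / math.log2(n_options)

# Hypothetical six-trial sequence for illustration only.
choices = [1, 1, 2, 1, 3, 3]
outcomes = [1, 0, 1, 1, 0, 1]
print(win_switch_rate(choices, outcomes), lose_stay_rate(choices, outcomes))
print(u_value(choices, n_options=3))
```

A U-value near 1 indicates choices spread evenly across the options, and a value near 0 indicates that a single option dominates.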

Perceptual parameter estimation
In the human reversal-learning experiments, we estimated perceptual parameters individually for the first and second halves of the task (i.e., blocks 1 and 2). Each participant's choices (i.e., deck 1, 2, or 3) and outcomes (win or loss) were entered as separate column vectors with rows corresponding to trials. Wins were encoded as '1', losses as '0', and choices as '1', '2', or '3'. We selected the autoregressive 3-level HGF multi-arm bandit configuration for our perceptual model and paired it with the softmax-mu03 decision model.
Rat reversal-learning data were entered similarly, with choices designated as '1', '2', or '3' and reward presence or absence noted as '1' and '0', respectively. Perceptual parameters were estimated as a single block per session and averaged across Pre-Rx or Post-Rx sessions for each subject. Since the contingency remained 70-30-10%, we used the default start point values of m 2 and m 3, as in block 1 estimations for the human reversal-learning experiments.

Simulations
We performed ten simulations per participant (online version 3) to determine whether our parameter estimates and model successfully captured behavioral differences between groups (e.g., win-switch rates). Each simulation required the participant's actual data (i.e., the column vectors 'outcomes' and 'choices') and the corresponding set of derived perceptual parameters. On each trial, a new choice was simulated conditional on the actual inputs in previous trials.
To illustrate the effects of each parameter on task behavior, we doubled or halved one parameter at a time, starting from a baseline set of perceptual parameters containing the average values from the low paranoia participants (online version 3). We then ran 10 simulations per subject for each of the following conditions: baseline, 2κ, 0.5κ, 2μ₃⁰, 0.5μ₃⁰, 2ω₃, 0.5ω₃, 2ω₂, 0.5ω₂, and the average perceptual parameters (κ, μ₃⁰, ω₃, and ω₂) from Post-Rx methamphetamine rats. The 2ω₂ condition yielded parameters in a region where model assumptions were violated (negative posterior precision error message) and was excluded from further analysis. Win-switch and lose-stay rates were calculated from each simulation as follows, and then averaged for each condition:

\[ \text{Win-switch rate} = \frac{\text{Number of trials in which choice switched after positive feedback}}{\text{Total positive feedback trials}} \]

\[ \text{Lose-stay rate} = \frac{\text{Number of trials in which choice repeated after negative feedback}}{\text{Total negative feedback trials}} \]

For each participant, we divided rates derived from each condition by the baseline rates to determine relative win-switch and lose-stay rates. We compared each relative rate to the baseline condition (i.e., 1.0) with paired-samples t-tests using Bonferroni-corrected p-values.
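The relative-rate comparison can be illustrated as follows. The sketch assumes that comparing each participant's relative rate against a baseline fixed at 1.0 with a paired test is equivalent to a one-sample t-test on the differences from 1.0; it returns only the t statistic (a p-value would need a t-distribution routine from a statistics package), and the input numbers are invented for illustration.

```python
import math
import statistics

def relative_rates(condition_rates, baseline_rates):
    """Per-participant condition rate divided by that participant's baseline rate."""
    return [c / b for c, b in zip(condition_rates, baseline_rates)]

def t_statistic_vs_one(rel_rates):
    """t statistic for the mean difference between relative rates and 1.0."""
    diffs = [r - 1.0 for r in rel_rates]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)

# Hypothetical win-switch rates for three simulated participants.
baseline = [0.20, 0.25, 0.40]
doubled = [0.30, 0.25, 0.60]
rel = relative_rates(doubled, baseline)
print(rel, t_statistic_vs_one(rel))
```

A participant whose baseline rate is zero would need special handling, since the ratio is undefined.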

Parameter recovery
We performed perceptual parameter estimation (see above) on 10 simulations per subject using first block data from online version 3. These simulations were generated from each subject's corresponding perceptual parameters. We averaged recovered parameters across simulations and low versus high paranoia (Figure 7).

Alternative models
We employed a Q-learning model with separate parameter weights for positive and negative prediction errors to determine whether differential weighting might contribute to paranoia group effects. This model has been described previously (Lefebvre et al., 2018). We also evaluated whether a simpler two-level HGF model might suffice to capture paranoia group differences. To sever the third level from the model, we fixed the logk parameter at negative infinity (i.e., by additionally setting the variance to zero), and similarly fixed the values of m 3 , w 3 , w 2 , F 3 at the values previously assigned in the configuration file. Parameter estimation was performed as described above, with a softmax decision model.
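The differential weighting of prediction errors can be sketched as a Q-learning update with two learning rates. This is a generic illustration of that model class, not the fitted model: the reward coding (1 for win, 0 for loss), the initial values, and the softmax inverse temperature `beta` are assumptions, and all names are hypothetical.

```python
import math

def update_q(q, choice, reward, alpha_pos, alpha_neg):
    """Return updated action values after one trial.

    The prediction error is weighted by alpha_pos when positive and by
    alpha_neg otherwise.
    """
    delta = reward - q[choice]
    alpha = alpha_pos if delta > 0 else alpha_neg
    new_q = list(q)
    new_q[choice] += alpha * delta
    return new_q

def softmax(q, beta):
    """Choice probabilities from action values (beta = inverse temperature)."""
    m = max(q)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (v - m)) for v in q]
    total = sum(exps)
    return [e / total for e in exps]

# Three decks with assumed initial values of 0.5; a win then a loss on deck 0.
q = [0.5, 0.5, 0.5]
q = update_q(q, choice=0, reward=1, alpha_pos=0.4, alpha_neg=0.1)
q = update_q(q, choice=0, reward=0, alpha_pos=0.4, alpha_neg=0.1)
print(q, softmax(q, beta=3.0))
```

With a larger learning rate for positive than for negative prediction errors, the value of the chosen deck rises quickly after wins and falls slowly after losses.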

Statistics
Unless otherwise specified, statistical analyses and effect size calculations were performed in IBM SPSS Statistics, Version 25 (IBM Corp., Armonk, NY), with an alpha of 0.05. Box-plots were created with the web tool BoxPlotR (Spitzer et al., 2014). Model parameters were corrected for multiple comparisons using the Benjamini-Hochberg (False Discovery Rate) method. Bonferroni-corrected results were largely consistent (Table 4).
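For readers unfamiliar with the two corrections, the following sketch computes Benjamini-Hochberg adjusted p-values alongside Bonferroni adjustments. It is a textbook implementation for illustration, not the SPSS procedure used in the study, and the p-values are invented.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def bonferroni_all(pvals):
    """Bonferroni adjusted p-values, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

# Hypothetical p-values for four model parameters.
pvals = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(pvals))
print(bonferroni_all(pvals))
```

The false discovery rate adjustment is never larger than the Bonferroni adjustment, which is why results under the two corrections can differ for borderline effects.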
To compare questionnaire item means between two groups (Table 1, low versus high paranoia), we conducted independent samples t-tests. To compare questionnaire item means across paranoia groups and task versions (Table 2), we employed univariate analyses. Associations between characteristic frequencies and subject group or task version were evaluated by Chi-Square Exact tests (two groups) or Monte Carlo tests (more than two groups). Pearson correlations established the associations between paranoia and BDI scores, BAI scores, win-switch rates, and κ. We selected two-tailed p-values where applicable and assumed normality. Multiple regressions were conducted with κ estimates from the first task block as the dependent variable and paranoia, BAI, and BDI scores from online version 3 as predictors.
To compare HGF parameter estimates and behavioral patterns (win-switch, U-value, lose-stay) across block, paranoia group (Experiment 1, Experiment 2 version 3), and/or task version (Experiment 2), we employed repeated measures and split-plot ANOVAs (i.e., block designated as the within-subject factor, paranoia group and task version as between-subject factors). We similarly evaluated Experiment 3 parameter estimates for treatment by time interactions. For Experiment 2, we performed ANCOVAs for μ₃⁰, κ, ω₂, and ω₃ to evaluate three sets of covariates: (1) demographics (age, gender, ethnicity, and race); (2) mental health factors (medication usage, diagnostic category, BAI score, and BDI score); and (3) metrics and correlates of global cognitive function (educational attainment, income, and cognitive reflection). Unless otherwise stated, post-hoc tests were conducted as least significant difference (LSD)-corrected estimated marginal means. Meta-analyses were conducted using random effects models with the R metafor package (Viechtbauer, 2010). Mean differences were assessed for low versus high paranoia groups in the in-laboratory experiment and online version 3. Standardized mean differences (methamphetamine or high paranoia versus saline or low paranoia) were employed to account for the differences in task design between animal and human studies.
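To make the meta-analytic step concrete, the sketch below computes a standardized mean difference from group summaries and pools several such effects under a random effects model. It deliberately swaps in simpler techniques than the reported analysis: an uncorrected Cohen's d with a common large-sample variance approximation, and the DerSimonian-Laird estimator of between-study variance, whereas the metafor package offers other estimators and a small-sample bias correction. Function names are hypothetical.

```python
import math


def standardized_mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Return (d, var_d): Cohen's d with pooled SD and its approximate variance."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return d, var_d


def random_effects_pool(effects, variances):
    """Return (pooled effect, standard error, tau^2) via DerSimonian-Laird."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    k = len(effects)
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    # Random effects weights add the between-study variance to each study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2
```

Expressing each study's group difference in standard deviation units is what allows effects from the rat and human tasks to be pooled despite their different designs.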
The two-step clustering approach was selected to automatically determine the optimal cluster count and cluster assignments. Clustering variables included paranoia-relevant parameter estimates (μ₃⁰, κ, ω₂, and ω₃) from Experiment 1 (block 1), online version 3 (block 1), and rats (Post-Rx), entered as continuous variables with a log-likelihood distance measure, a maximum cluster count of 15, and Schwarz's Bayesian Criterion (BIC) as the clustering criterion. We validated our clustering solution by sorting the data into two halves and running separate cluster analyses. We also compared cluster solutions derived exclusively from rat data versus human data. A Chi-Square test determined the significance of the association between cluster membership and group (methamphetamine/high paranoia versus saline/low paranoia).