A Computational Analysis of Abnormal Belief Updating Processes and Their Association With Psychotic Experiences and Childhood Trauma in a UK Birth Cohort

Background
Psychotic experiences emerge from abnormalities in perception and belief formation, and occur more commonly in those experiencing childhood trauma. However, which precise aspects of belief formation are atypical in psychosis is not well understood. We used a computational modeling approach to characterize belief updating processes in young adults in the general population, examine their relationship with psychotic outcomes and trauma, and determine the extent to which they mediate the trauma-psychosis relationship.

Methods
We used data from 3360 individuals from the Avon Longitudinal Study of Parents and Children birth cohort who completed assessments for psychotic outcomes, depression, anxiety, and two belief updating tasks at age 24, and had data available on traumatic events assessed from birth to late adolescence. Unadjusted and adjusted regression and counterfactual mediation methods were used for the analyses.

Results
Basic behavioral measures of belief updating (draws-to-decision and disconfirmatory updating) were not associated with psychotic experiences. However, computational modeling revealed an association of increased decision noise with both psychotic experiences and trauma exposure, although <3% of the trauma-psychotic experience association was mediated by decision noise. Belief updating measures were also associated with intelligence and sociodemographic characteristics, confounding most of the associations with psychotic experiences. There was little evidence that belief updating parameters were differentially associated with delusions compared with hallucinations, or that they were differentially associated with psychotic outcomes compared with depression or anxiety.

Conclusions
These findings challenge the hypothesis that atypical belief updating mechanisms (as indexed by the computational models and behavioral measures we used) underlie the development of psychotic phenomena.


Supplementary Methodology Information
Further information on assessment of trauma

A binary measure of trauma was derived from 121 questions selected from 48 assessments completed contemporaneously by the child or their parents from the ages of 0-17 years, and from a questionnaire completed by the young person at age 22 to supplement information on sexual abuse, which was almost entirely parent-reported in earlier questionnaires. The questions covered sexual, physical and emotional abuse, as well as emotional neglect, bullying and exposure to domestic violence, and were selected on the basis that they would be deemed highly upsetting by almost everyone who experienced them. Participants were coded as having been exposed to trauma if they endorsed any of the questions relating to these traumas between ages 0 and 17 years.
Participants were coded as non-exposed if they had not endorsed any of the questions and had participated in at least 50% of the assessments. For further details on how this measure was derived, see Croft et al 2019 1 .

Further information on Assessment of Psychotic Outcomes
The interviewers were psychology graduates trained in using the PLIKSi, and blind to previous PLIKS assessments. Interviewers had to score >0.9 agreement with 'gold-standard' ratings on 2 audio-recorded interviews before they were able to start collecting data for the study. At regular intervals, a psychiatrist rated samples of recorded interviews to ensure that the interviewers were rating experiences correctly.

At-risk mental state for psychosis
Individuals with a current at-risk mental state for psychosis were identified by relating the PLIKS interview data at age 24 to the Structured Interview for Prodromal Symptoms (SIPS) 2 4 definitions of prodromal symptoms and the Comprehensive Assessment of At-Risk Mental State (CAARMS) 3 criteria (see Sullivan et al 2020 4 for more detail).

Psychotic disorder
We classified individuals as having a psychotic disorder if i) they were rated as having a definite psychotic experience not attributable to the effects of sleep or fever, ii) this had recurred regularly (at least once per month) averaged over the previous 6 months, and iii) they reported this as either very distressing, or having a very negative impact on their social or occupational functioning, or having led them to seek help from a professional source 4 .
Behavioural Tasks: Procedure and Design

Each participant took part in two separate tasks: a 'Draws-To-Decision' (DTD) task and a 'Probability Estimation Task'. Both tasks have been used to assess belief formation in clinical populations in a substantial number of previous studies (see main text for details). For both tasks, each participant was instructed in person by a trained and experienced experimenter, who answered the participants' questions, carefully checked task comprehension, and conducted a practice with the participant to ensure they understood the instructions. To further support the experimenter's judgment, future studies might benefit from a more formal assessment of task comprehension at the end of the experiment. Verbal instructions were supported by on-screen text and illustrations (detailed in the next two sections). Participants received no reimbursement other than travel expenses to attend the clinic.

Draws-to-Decisions Task
At the start of the experiment, the experimenter presented participants with an illustration of two jars with two different colours of beads. Participants were told that the jars contain 100 beads with inverse proportions of coloured beads at a ratio of 80:20. They were then told that the computer would randomly choose a bead from one of the jars, show it to them, and then put it back in the jar. After each presentation of a bead, the participant could either state which jar the bead was drawn from or request to see another bead, which was drawn from the same jar. There was no time limit to making a decision. Participants could request up to ten beads before deciding from which jar the beads were being drawn. The number of beads that were requested is referred to as 'draws'.
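The inference this task requires of participants can be sketched as a simple Bayesian update over the two jars. The following Python snippet is illustrative only (function and variable names are ours, not from the study's analysis code): it computes the posterior probability that the beads come from the mostly-red jar (80 red / 20 blue) after a sequence of draws with replacement.

```python
def posterior_red_jar(beads, p_majority=0.8, prior=0.5):
    """beads: sequence of 'r'/'b' draws (with replacement).

    Returns the posterior probability that the source is the mostly-red jar.
    """
    p_red = prior
    for bead in beads:
        # likelihood of this bead under each jar
        like_red = p_majority if bead == 'r' else 1 - p_majority
        like_blue = (1 - p_majority) if bead == 'r' else p_majority
        # Bayes' rule update
        p_red = like_red * p_red / (like_red * p_red + like_blue * (1 - p_red))
    return p_red
```

Because draws are with replacement, only the counts of each colour matter, not their order — the Markov property exploited by the computational model described below.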

The on-screen text and illustrations that supported the verbal instructions were as follows:

You have to decide which jar the beads are coming from
The computer will randomly choose a bead from one of the jars, show it to you, and then put it back in the jar
You can ask the computer to show you more beads to help you decide which jar the beads are coming from, up to 10 beads in total
You should decide as soon as you are sure about which jar the beads are coming from
You will see the previous beads you have seen on the bottom of the screen
You will do this task 5 times - each time, the computer will choose a jar at random

(The accompanying illustrations labelled the current bead and the previously seen beads on screen.)

Probability Estimation Task
The basic setup of the Probability Estimation Task was identical to that of the previous task: participants were presented with an illustration of two jars, again filled with two different colours of beads. They were told that the jars contain 100 beads with inverse proportions of coloured beads at a ratio of 80:20. Again, they were informed that the computer would randomly choose a bead from one of the jars, show it to them, and then put it back in the jar. Participants were told that they would be shown a sequence of 30 beads. Every time a bead was presented, the participant had to rate how certain they were about which jar the beads were being drawn from. Every participant was shown the entire sequence of 30 beads. They were also told that the jar from which the beads were being drawn may or may not change during the task at any point in the sequence and may change multiple times. Participants were not able to see the sequence of previously presented beads in this task. Similar to the previous task, there was no time limit to making a decision.
The bead sequence for the Probability Estimation Task was identical for each participant (0 = Red; 1 = Blue):

1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0

The on-screen text and illustrations that supported the experimenter's verbal task instructions were as follows:

The computer will randomly choose a bead from one of the jars, show it to you, and then put it back in the jar
This time you will be shown more than 10 beads
Every time you are shown a bead you have to rate how sure you are of which jar the beads are coming from
Use the mouse to click along the scale

Draws to Decision Task Parameters

Two behavioural measures of the task were derived. First, Draws to Decision (DTD), as widely used in previous literature 6 , indexes the average number of beads drawn by participants across the five trials of the task. Second, the 'Jumping to Conclusions' (JTC) bias, also widely used in previous literature 7,8 , is a binary measure that indexes whether a decision was reached based on an average of 2 or fewer beads across the trials.
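As a minimal illustration of these two behavioural measures (function and variable names are ours), the DTD score and JTC bias could be computed as follows:

```python
def dtd_measures(draws_per_trial):
    """draws_per_trial: number of beads requested on each of the five trials.

    Returns (mean draws-to-decision, jumping-to-conclusions flag).
    """
    mean_draws = sum(draws_per_trial) / len(draws_per_trial)  # Draws to Decision
    jtc_bias = mean_draws <= 2   # JTC: decided on an average of 2 or fewer beads
    return mean_draws, jtc_bias
```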
Two computational parameters were derived using a fully Bayesian model, as described in previous studies 9,10 . The task can be formalized as a partially-observable Markov decision process. It is a decision process in the sense that participants only have to decide when they have enough evidence to declare a decision in favour of one jar or the other. It is Markovian because, at every point in the task, after every draw, the evidence can be summarized simply as the number of beads of each colour drawn so far: given this 'state description', the past history of draws does not matter. Finally, it is partially observable because the coloured draws induce uncertain beliefs about the underlying jars, which are not directly observed. The task is simple enough that a full dynamic programming solution can be computed: all the possible draw scenarios between the current draw and the end of each trial can be considered, and on their basis the value of declaring for one colour or the other, or of drawing again, can be calculated. Although this is easy for the computer model, it is hard for people to do. Fortunately, we can parameterize the 'effective depth' of people's cognition, which then becomes a measure of goal-directed or future-thinking ability, by a decision or cognitive noise parameter. This 'blurs' the model's ability to consider deep scenarios, effectively discounting cognitively distant potential observations. This is the decision-noise parameter.
Cost of sampling

As the value (i.e. penalty or reward) of drawing additional beads is not stated, the cost of sampling estimates the subjective value that participants attribute to requesting additional beads. A high cost of sampling could account for the JTC bias by providing a consistent strategy in which a greater subjective cost leads to requesting less information before deciding. A higher perceived cost of sampling would indicate a greater desire to complete the task quickly, which may be due to motivational factors including subjective opportunity cost (i.e., one has better things to do with one's time than this task!), or possibly intolerance of uncertainty or a subjective cost to self-esteem when requesting further information 9,10 . The cost of sampling index was not normally distributed and difficult to transform; therefore, when examining this as an outcome, we derived a dichotomous variable which grouped the top 10% of participants versus the bottom 90% (Figure S3). As sensitivity analyses we also examined cut-offs at the 85th and 95th percentiles. When modelled as an exposure, we used the continuous measure, with/without a quadratic term, and also examined variables dichotomised at different cut-offs in sensitivity analyses.
The model is described in detail elsewhere 9,10 but in brief, these parameters affect decision making in the following way. The cost of sampling C_S is defined relative to the (fixed) cost C_W of making the wrong decision; the possible decisions being to declare the jar blue (d_B), to declare the jar red (d_R), or to sample again (d_S). The value of sampling again and getting a red or blue bead, given n beads drawn of which n_r were red, is computed as:

Q(d_S | n, n_r) = -C_S + P(r | n, n_r) V(n+1, n_r+1) + P(b | n, n_r) V(n+1, n_r)

where the probability of the next bead being red depends on whether the underlying jar is red (J_R) rather than blue (J_B):

P(r | n, n_r) = P(r | J_R) P(J_R | n, n_r) + P(r | J_B) P(J_B | n, n_r)

Here the probabilities are computed using Bayes' theorem, and the values of states V are the summed products of the values of the actions that can be taken from those states and the probabilities of taking those actions. The probability of taking action a (rather than the alternatives) in state s is computed according to a standard softmax function, incorporating decision noise τ:

P(a | s) = exp(Q(s, a)/τ) / Σ_a' exp(Q(s, a')/τ)

Values of states far in the future therefore depend on multiple actions at successive timepoints; thus as τ becomes higher, the effective planning horizon becomes shorter, as it becomes less likely that sequences of optimal actions will be selected, and so the values of distant actions have little effect on the values of current states.
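This valuation scheme can be sketched compactly in Python (an illustration under the notation above, not the study's model code; parameter values and names are ours). It values declaring red, declaring blue, or sampling again by backward induction, and 'blurs' action selection with a softmax of temperature `noise`:

```python
import math
from functools import lru_cache

def dtd_model(cost_sample=0.05, noise=0.1, p=0.8, max_draws=10, prior_red=0.5):
    """Noisy backward-induction value of the state after the first (red) bead."""

    def p_red(n, n_r):
        # posterior that the jar is mostly red after n draws, n_r of them red
        lr = p ** n_r * (1 - p) ** (n - n_r)
        lb = (1 - p) ** n_r * p ** (n - n_r)
        return lr * prior_red / (lr * prior_red + lb * (1 - prior_red))

    @lru_cache(maxsize=None)
    def value(n, n_r):
        pr = p_red(n, n_r)
        # expected cost of declaring red / blue (cost of a wrong decision = 1)
        qs = [-(1 - pr), -pr]
        if n < max_draws:
            # marginal probability the next bead is red, then recurse
            p_next_red = pr * p + (1 - pr) * (1 - p)
            qs.append(-cost_sample
                      + p_next_red * value(n + 1, n_r + 1)
                      + (1 - p_next_red) * value(n + 1, n_r))
        # softmax blurring: state value under noisy action selection
        z = [math.exp(q / noise) for q in qs]
        s = sum(z)
        return sum((zi / s) * q for zi, q in zip(z, qs))

    return value(1, 1)  # value after the first, obligatory bead (assumed red)
```

Lower noise lets the model exploit deep planning, so the state value is closer to the deterministic optimum; higher noise shortens the effective planning horizon.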
We did not fit multiple models to the data and perform model selection for several reasons. The first is that this model and its variants have been validated for this task in multiple studies using both clinical and community samples 9,11,12 . Second, it is already very compact, estimating just two parameters, which is an advantage when so little data (five trials) is available per participant. Third, and related to this, the compactness of the model allowed us to easily use model-fitting techniques that are likely to optimise detection of individual variability, i.e. correlation with unmodelled participant characteristics as we shall see in the next section. Fourth, this model has the ability to capture a number of heuristic strategies that participants use (such as collecting all the data available and then deciding, as described above) which is important in large samples where it is unlikely that all participants will use the same cognitive strategy. Finally, our aim here was to test hypotheses closely related to the constructs captured by this model.

Model-fitting
The model-fitting approach used here relied on a full mapping of the likelihood function over a fine parameter grid covering the entire range of psychological interest of the two parameters in question. The nature of data collected in large, multi-measure studies like the present one poses particular challenges. The often-used maximum-likelihood parameter estimates are often very noisy owing to the small number of datapoints per participant, necessitating the use of prior distributions over parameters. We have shown that using empirical prior distributions which do not take into account variables external to the model (such as genetic, demographic and psychometric scores) is prone to suppressing variability due to these external variables of interest, making their effects harder to detect 13 . This theoretical work suggested that when no exploratory analyses are to be performed, the most sensitive approach is to incorporate hypothesis testing in the construction of the empirical priors; however, a compromise which is almost as good, and allows greater flexibility for exploratory analyses, is to use weak, regularizing priors. This is the approach we used here, diverging from the hierarchical (mixed-effects) fits of previous work, which dealt with a priori defined participant groups 9,10 . Here, we constructed 200 x 200 parameter grids in log-temperature x log-subjective-cost space. These spanned the range from almost deterministic to completely random behaviour (log decision noise from -5 to 5), and from a subjective cost of sampling that was negligible compared with the reference cost of getting the answer wrong to one high enough to result in immediately declaring a decision at the very first, obligatory item of information seen (log(-cost) from -5 to 5). Along each parameter dimension, we imposed a weak Gaussian prior of mean 0.83 and SD 2.86.
The mean was based on preliminary analysis of a random 100 participants by simple maximum likelihood, and the SD was chosen such that +/-3 SD was approximately twice as broad as the grid range. We then calculated the posterior probability of the data at each point of the grid, approximating the maximum with a parabolic fit.
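A one-dimensional Python sketch of this grid-plus-parabolic-fit procedure follows (illustrative only; the study used a 200 x 200 two-dimensional grid, and the function names are ours). It maps the log-posterior over a grid, adds the weak Gaussian prior, and refines the maximum with a quadratic fit through the best grid point and its two neighbours:

```python
import numpy as np

def grid_map_estimate(log_likelihood, grid, prior_mean=0.83, prior_sd=2.86):
    """MAP estimate of one parameter from a grid of log-likelihoods."""
    log_prior = -0.5 * ((grid - prior_mean) / prior_sd) ** 2   # weak Gaussian prior
    log_post = np.array([log_likelihood(g) for g in grid]) + log_prior
    i = int(np.argmax(log_post))
    if 0 < i < len(grid) - 1:
        # parabolic (quadratic) interpolation around the best grid point
        y0, y1, y2 = log_post[i - 1], log_post[i], log_post[i + 1]
        denom = y0 - 2 * y1 + y2
        shift = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
        return grid[i] + shift * (grid[1] - grid[0])
    return grid[i]
```

For a log-posterior that is locally quadratic, the parabolic fit recovers the continuous maximum to well below the grid spacing.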
Decision Noise

Decision noise in this task is usually not normally distributed and difficult to transform. As described in 'Methods', participants are simply told to gather information until they want to declare their decision, but no external cost was imposed per item of information in our simple version of the task. Therefore, sophisticated participants who are patient can follow a simple strategy: gather all the information available and then decide (rather than consider step-wise decisions). The model can capture this well, assigning such participants very low decision noise, which results in a bimodal distribution. Similarly, the model can capture very erratic participants as having very high decision noise. This flexibility results in a psychologically meaningful, but highly non-normal, distribution. For analysis of this parameter as an outcome, we thus grouped the top 10% of participants versus the bottom 90%, though as sensitivity analyses we also examined cut-offs at the 85th and 95th percentiles. When this was an exposure, we modelled the continuous measure, with/without a quadratic term, and also examined variables dichotomised at different cut-offs in sensitivity analyses.

Probability Estimation Task Parameters
A behavioural measure of 'disconfirmatory updating' was derived, based on previous studies of the task 14,15 . This measure is the absolute value of the change in estimation after seeing a bead of a different colour to at least two identically-coloured preceding beads (e.g. the change in estimation when seeing a blue bead after seeing two or more red beads). This type of update happened several times during the sequence, so each participant's disconfirmatory update score was the mean of all these updates.
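This score can be illustrated in a few lines of Python (function and variable names are ours), using the 0/1 bead coding given above: an update is 'disconfirmatory' when a bead differs in colour from at least two identically-coloured preceding beads.

```python
def disconfirmatory_update_score(beads, estimates):
    """beads: 0/1 sequence; estimates[k]: probability rating after bead k.

    Returns the mean absolute estimate change on disconfirmatory beads.
    """
    updates = []
    for k in range(2, len(beads)):
        # bead k breaks a run of >= 2 beads of the other colour
        if beads[k] != beads[k - 1] and beads[k - 1] == beads[k - 2]:
            updates.append(abs(estimates[k] - estimates[k - 1]))
    return sum(updates) / len(updates)
```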
For computational modelling of the probability estimation task, we tested five models and selected the winning model according to the Bayesian Information Criterion (BIC). The models were:

1) Pearce-Hall model
The Pearce-Hall model 16 is similar to a Rescorla-Wagner model - i.e. it learns contingencies by incremental updating by the product of a prediction error and a learning rate - but its learning rate is not fixed: it varies from trial to trial, taking the value of the prediction error on the previous trial, unless this was 0, in which case the learning rate remains the same as it was on the previous trial. We used the Pearce-Hall model implemented in the Tapas Hierarchical Gaussian Filter toolbox (version 5), available from http://www.translationalneuromodeling.org/tapas/: this analysis used the perceptual model (which incrementally updates beliefs about the jars) 'tapas_ph_binary' and the response model (which maps from beliefs to the participant's response on the sliding scale) 'tapas_beta_obs', and the standard (weakly informative) prior settings over parameters. These were: initial belief ~ N(0.5, 0), learning rate ~ N(0.5, 1), stimulus intensity ~ N(0.1, 8), and response stochasticity determined by the precision ν of a beta distribution, ν ~ N(128, 4) (where ν = α + β, the sum of the conventional parameters of the beta distribution). The variances given here refer not to the parameters' native space, which in many cases is bounded, but to the unbounded space to which they were transformed for estimation purposes.
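The update rule just described can be sketched as follows (a simplified Python illustration, not the toolbox's 'tapas_ph_binary' implementation; here the learning rate is set to the absolute prediction error of the previous trial):

```python
def pearce_hall(outcomes, v0=0.5, alpha0=0.5):
    """outcomes: 0/1 bead sequence. Returns the trial-by-trial value estimates."""
    v, alpha = v0, alpha0
    values = [v]
    for u in outcomes:
        delta = u - v                # prediction error
        v = v + alpha * delta        # incremental update
        if delta != 0:
            alpha = abs(delta)       # learning rate tracks last trial's |PE|
        values.append(v)
    return values
```

Because the learning rate shrinks as predictions improve, the model updates strongly after surprising beads and weakly after expected ones.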

2) Hierarchical Gaussian Filter: 2 levels
The HGF is a hierarchical Bayesian inference scheme which gives a principled account of how beliefs are updated on acquiring new data, using individual priors over parameters, and has been used many times (e.g. 14,17 ) to model this or similar tasks. Please see these references for full descriptions of the model: the following is a summary. This analysis used the perceptual model 'tapas_hgf_binary_scaled' and response model 'tapas_beta_obs'.
Bayes' theorem allows us to calculate the posterior belief that either jar is currently the source of the beads, by combining the likelihoods that characterise each jar with a prior belief about the probability of either jar being correct. The likelihoods characterising each jar were simply the proportions of bead colours within each jar. The difference between prior and posterior belief constitutes a prediction error that can be used to learn the higher-order dynamics of changes between jars. In other words, the prediction errors that occur in the face of Bayes-optimal predictions are used to infer the current state of the environment.
At the bottom of the model is the bead drawn u(k) on trial k and the probability x1(k) that draws are coming from the red jar. At the level above this is x2, the tendency towards the red jar. For x2 = 0, both jars are equally probable. This quantity is hidden from the participant and must be inferred: the participant's posterior estimate of x2 is μ2, and the participant's posterior estimate of the probability of the jar being red is a sigmoid function of this quantity, s(μ2); this is equivalent to the prediction on the next trial, μ̂1 (predictions are denoted by ^).
Before seeing any new input on trial k, the model's expected jar probability μ̂1(k) and the precision (inverse variance) of the expectation at the second level are given by:

μ̂1(k) = s(μ2(k-1))
π̂2(k) = 1/(σ2(k-1) + exp(ω2))

The new input u(k) generates a prediction error and the model updates and generates a new prediction as follows:

δ1(k) = u(k) - μ̂1(k)
π2(k) = π̂2(k) + μ̂1(k)(1 - μ̂1(k))
μ2(k) = μ2(k-1) + δ1(k)/π2(k)

The subject's response (i.e. where on the continuous or Likert scale they responded) is determined by μ̂1 and the precision ν of the response model's beta distribution.
Updates to x2 are driven by the product of the prediction errors from the Bayesian updating explained above and a learning rate determined by the parameter ω2: changes in x2 from trial to trial occur according to a Gaussian random walk whose variance depends upon the static parameter ω2:

x2(k) ~ N(x2(k-1), exp(ω2))
The parameters ω2 and ν were estimated individually for each participant. The (weakly informative) prior probability distributions for their values were: ω2 ~ N(-3, 16) and ν ~ N(128, 4). The model's prior beliefs at the start of the sequence were fixed at μ2(0) = 0 (i.e. believing each jar to be equally likely).
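One trial of this two-level update can be sketched in Python as follows (the standard HGF equations in simplified form; this is an illustration, not the toolbox code, and the function name is ours):

```python
import math

def hgf2_trial(u, mu2, sigma2, omega2):
    """One trial of the 2-level binary HGF.

    u: observed bead (0/1); mu2, sigma2: current level-2 belief and variance;
    omega2: log step-size of the Gaussian random walk on x2.
    Returns the updated (mu2, sigma2).
    """
    s = lambda x: 1.0 / (1.0 + math.exp(-x))
    mu1_hat = s(mu2)                          # predicted P(red jar)
    pi2_hat = 1.0 / (sigma2 + math.exp(omega2))   # predicted level-2 precision
    delta1 = u - mu1_hat                      # prediction error
    pi2 = pi2_hat + mu1_hat * (1 - mu1_hat)   # posterior precision
    mu2_new = mu2 + delta1 / pi2              # belief update
    return mu2_new, 1.0 / pi2
```

With μ2 = 0 (each jar equally likely), σ2 = 1 and ω2 = 0, a blue bead (u = 1) yields a prediction error of 0.5 and an update of 0.5/0.75 = 2/3.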

3) Hierarchical Gaussian Filter: 2 levels, with non-linear updating
This HGF model is exactly the same as Model 2, but with one additional parameter κ1 which causes non-linear belief updates (see Adams et al, 2018 14 ). In this model, changes in μ2 from trial to trial occur according to two parameters: ω2, the variance of the Gaussian random walk, and κ1, a scaling factor that changes the size of updates when μ̂1 = 0.5 (maximum uncertainty) relative to when μ̂1 is closer to 0 or 1, i.e. when the participant is more confident about either jar. Formally, the scaling occurs in the prediction:

μ̂1(k) ≡ s(κ1 · μ2(k-1))
When κ1 > 1, updating towards 1 on observing a blue bead (u = 1) is greatest (i.e. switching between jars becomes more likely) when μ̂1 < 0.3; when κ1 < 1, updating is comparatively far lower when μ̂1 < 0.3. This means that when κ1 > 1, the agent readily switches between jars on receiving unexpected evidence, but finds it difficult to become more confident about the current jar on receiving consistent evidence. The reverse is the case when κ1 < 1.
κ1 also scales the updates for the precision and mean at the second level of the model (it is fixed to 1 in Model 2), thus:

π2(k) = π̂2(k) + κ1² μ̂1(k)(1 - μ̂1(k))
μ2(k) = μ2(k-1) + κ1 δ1(k)/π2(k)

The parameters κ1, ω2 and ν were estimated individually for each participant. The prior probability distributions for their values were: κ1 ~ N(1, 1), ω2 ~ N(-3, 16) and ν ~ N(128, 4). The model's prior beliefs at the start of the sequence were fixed at μ2(0) = 0 (i.e. believing each jar to be equally likely).

4) Hierarchical Gaussian Filter: 3 levels, Autoregressive (AR1) volatility model
This HGF model used the perceptual model 'tapas_hgf_ar1_binary' and the response model 'tapas_beta_obs'. Its first two levels are exactly as described in Model 2. It also has a third level which models volatility in the contingencies, which can affect the learning rate at the level below. A relatively short sequence of 30 beads (with only one change) is too short to explore participants' estimation of volatility in detail, other than to assess whether their learning rate changes during the course of the sequence (see below).
At the top level of the model, x3 (and its posterior estimate μ3) encodes the phasic volatility (more properly, the log-volatility) of x2 which determines the probability of the jar changing at any point. Parameters which affect the degree to which x2 and x3 can change during the experiment include m, φ, ω3 and ω2.
Changes in x2 from trial to trial occur according to a Gaussian random walk whose variance depends upon both static and dynamic factors. The volatility x3 has a dynamic influence on this learning rate (alongside the static influence of ω2), so that in a more volatile environment one learns more quickly:

x2(k) ~ N(x2(k-1), exp(x3(k) + ω2))
x3 evolves according to an autoregressive (AR(1)) process controlled by three parameters: m, a level of volatility to which x3 is attracted; φ, the rate of change of x3 towards m; and ω3, the variance of the random process:

x3(k) ~ N(x3(k-1) + φ(m - x3(k-1)), exp(ω3))

In effect, m describes the level of volatility in high-level beliefs that one wishes to entertain. The AR(1) process at this third level can therefore account for individual differences in updating as the sequence evolves: for example, participants who take (proportionally) longer to infer that there has been a change of jar than they did to infer the correct jar at the start of the sequence, known in psychology as a 'reversal learning' impairment. The model would account for this by having a higher initial volatility estimate that subsequently declines to a new level m during the sequence.
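The AR(1) volatility process can be illustrated with a short simulation (a Python sketch with names of our choosing; it implements only the x3 equation above, not the full model):

```python
import math
import random

def simulate_x3(n_trials, m, phi, omega3, x3_0=0.0, seed=0):
    """Simulate the AR(1) log-volatility: drift towards m at rate phi,
    with Gaussian innovations of variance exp(omega3)."""
    rng = random.Random(seed)
    x3 = x3_0
    path = [x3]
    for _ in range(n_trials):
        mean = x3 + phi * (m - x3)                    # attraction towards m
        x3 = rng.gauss(mean, math.sqrt(math.exp(omega3)))
        path.append(x3)
    return path
```

With φ = 1 and a very negative ω3 (near-zero innovation noise), x3 jumps to m almost immediately; smaller φ gives a gradual decline or rise towards m over the sequence.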
The parameters m, φ, ω3 and ω2 were estimated individually for each participant.

5) Finite Retrospective Inference Hidden Markov model
This model 18,19 is described in detail elsewhere: the following is a summary of its important features. It assumes that participants are using Bayesian inference to determine which of two jars is currently favoured (similar to Models 2-4), but in addition, it allows participants to update a key parameter as the sequence unfolds, rather than assuming that all parameters are fixed for the duration of the sequence (as Models 1-4 do). This parameter is r, the reversal probability -i.e. the probability the jars will switch on any given trial (see below).
Otherwise, the model is a standard Hidden Markov model, containing two states (corresponding to the majority-red and majority-blue jars). On each trial it updates the probability of these states according to adjustment rate a and reversal probability r. Adjustment rate a is equivalent to the cue validity, and determines the likelihood (matrix A) of observations given the state:

A = [ a     1-a ]
    [ 1-a   a   ]

Here, the columns represent the two states (red and blue majority jars, respectively), and the rows the probabilities of observing the outcomes (red and blue beads respectively). Reversal probability r determines the transition (i.e. jar change) probabilities (matrix T) from trial to trial:

T = [ 1-r   r   ]
    [ r     1-r ]

Here, the columns represent the two states (red and blue majority jars, respectively) on trial n, and the rows the same two states on trial n+1. The agent's placement of the cursor on the slider - given the probability of the states it has inferred - was parameterised by a beta distribution response model (as in Models 1-4), with precision ν. To make the response noise measure of the same form as the decision noise in the Draws-To-Decision task - i.e. lower values reflecting less 'noise' - we report the inverse of the precision (i.e. the variance) of the response distribution, 1/ν.
Most models of cognition only consider inference about current states given past states: this is known as Bayesian 'filtering'. The optimal use of all available information, however, would also involve updating beliefs about the past given new information, known as Bayesian 'smoothing'. This is impossible for agents performing online inference, as it means they would have to store and update all their beliefs about the past. A computationally parsimonious approximation is to store beliefs about the past only up to a fixed window length (of L trials), and to update only these beliefs on receiving new information; this is known as 'fixed-lag smoothing'.

This retrospective belief updating does not affect current and future beliefs (i.e. a filtering agent would make the same responses as a filtering-and-smoothing agent in the Probability Estimation task) unless the retrospective information is used to update parameter estimates online 19 . In the Probability Estimation beads task, for example, the participant is uncertain both about the jar from which beads are being drawn, and about the reversal probability parameter, i.e. the probability this jar will change on any given trial. To perform better, participants may therefore update their beliefs not just about states (i.e. inference) but also about parameters (i.e. learning) during the task. For example, if a participant begins the task thinking that the reversal probability r is around 0.2, but after 25 trials they have seen only one probable reversal, they would be wise to revise this estimate of r downwards, as otherwise they will overestimate the probability that the jar has changed if they see an unexpected bead colour on the next trial. This effect would manifest as a difference in updating in the second versus the first half of the sequence (one example being an impairment in so-called 'reversal learning', i.e. becoming confident about the initial jar fairly quickly, but taking much longer to decide that it has changed).
This also illustrates the mutual dependency between beliefs about states and beliefs about parameters: it follows that performing fixed-lag inference about past states improves one's current parameter estimates, and thus future performance 19 .
In the Finite Retrospective Inference model, participants can update their beliefs about r as the sequence continues, with or without using beliefs about past states. Two parameters determine the degree to which this happens: the window length L over which fixed-lag smoothing is performed, and the confidence in r (a Dirichlet parameter over the transition matrix T).
This confidence parameter can be seen as the number of observations about transitions that the participant has made prior to the start of the sequence, and so is inversely related to how much they are willing to update this belief henceforth. Therefore a low number (e.g. 1) means they are very uncertain about r, and might revise their beliefs about it if given evidence to do so; a high number (e.g. 300) means they are very certain about r and will not update it whatever happens during the sequence.
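As an illustration of how this pseudo-count confidence tempers learning about r, a Beta/Dirichlet posterior-mean update could look like the following (a Python sketch with our own names, simplified from the full variational scheme):

```python
def update_r(r_prior, confidence, n_switches, n_stays):
    """Posterior-mean estimate of the reversal probability r.

    confidence acts as that many imagined prior observations, split between
    switches and stays according to r_prior; observed counts are then added.
    """
    alpha = r_prior * confidence + n_switches        # pseudo- + real switches
    beta = (1 - r_prior) * confidence + n_stays      # pseudo- + real stays
    return alpha / (alpha + beta)
```

With a prior of r = 0.2 and one observed reversal in 25 trials, a participant with confidence 1 revises r sharply downwards, while one with confidence 300 barely moves.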
We included this model because it could potentially capture numerous possible abnormalities of inference that might relate to psychotic experiences: not just altered reversal probability or adjustment rate, but also a reluctance to update parameters in the face of evidence, or a reduced time window (i.e. short term memory) from which evidence for parameter updating can be drawn.
The log joint probability distribution for the whole model depends on a vector of observations u1:N, a vector of hidden states x1:N, an initial distribution over the states, and the parameters a, r and ν. Model inversion can then be performed by iterating variational update equations for the states and parameters 18,19 .

Model fitting and comparison
Model estimation was performed using Matlab R2015a (Natick, Massachusetts: The MathWorks Inc). Models 1-4 were fitted using the HGF toolbox with the priors over parameters documented above. Model fitting for Model 5 was performed according to maximum likelihood, after a grid search over the model parameters.

Model 1 (Pearce-Hall) performed poorly and fitting failed in a majority of subjects: this is likely because it was unable to accommodate the pattern of belief updates that occurs when subjects quickly switch between two underlying states but then make relatively small belief updates following this large change.
For the remaining models, we computed the BIC for each model given its k parameters, n = 30 data points, and maximum likelihood L̂, according to:

BIC = k ln(n) − 2 ln(L̂)
We converted the BICs to an approximation of the log model evidence (log evidence ≈ −BIC/2). We then performed Bayesian model comparison 20,21 using the approximate log model evidences and spm_BMS (SPM12, Wellcome Centre for Human Neuroimaging). The results are shown in Figure S2: the probability of each model performing best for any given subject is shown on the left, and the probability of each model being the best overall, over and above chance (the protected exceedance probability), on the right. Model 5 (Finite Retrospective Inference HMM) was the winner. The second-placed model (Model 3), like Model 5 but unlike the remaining models, had parameters that permitted differential updating to disconfirmatory versus confirmatory evidence. Model 5 does so because the reversal probability has opposite effects on the size of these updates; Model 3 does so thanks to its belief instability parameter κ1. Model 3 was the winning model in a previous study of belief updating in schizophrenia 14 .
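The BIC-to-evidence conversion can be sketched as follows; the maximum log-likelihoods and parameter counts below are illustrative stand-ins, not the study's fitted values.

```python
import numpy as np

def bic(max_log_likelihood, k, n):
    """Bayesian Information Criterion: BIC = k * ln(n) - 2 * ln(L_hat)."""
    return k * np.log(n) - 2.0 * max_log_likelihood

n = 30  # data points per subject in the Probability Estimates task

# Hypothetical per-subject maximum log-likelihoods for two candidate models
bic_m3 = bic(max_log_likelihood=-12.0, k=4, n=n)
bic_m5 = bic(max_log_likelihood=-10.0, k=5, n=n)

# Approximate log model evidence (up to a constant): -BIC / 2.
# These per-subject values would then be passed to spm_BMS for
# random-effects Bayesian model comparison.
log_ev_m3 = -0.5 * bic_m3
log_ev_m5 = -0.5 * bic_m5
```

Note the penalty term k ln(n): with only 30 data points, each extra parameter must buy a likelihood improvement of ln(30)/2 ≈ 1.7 nats to break even.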

Model 5 parameters
The distributions of Model 5 parameters are shown in Figure S3. The correlation between parameters is shown in Supplementary Table 1. Because most parameters were not normally distributed and could not straightforwardly be transformed into normal distributions, thresholds were used in these cases to group subjects into categories when examining them as outcomes.
Reversal probability, r: As the parameter was not normally distributed and difficult to transform, the top 10% of participants (with r > 0.5) were grouped separately from the bottom 90% when examining this parameter as an outcome. As sensitivity analyses we also examined i) a cut-off at the 85th percentile and ii) a cut-off at the 95th percentile.
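This percentile-based dichotomisation can be sketched as below; the simulated values stand in for the fitted r estimates, and the distribution used is purely illustrative.

```python
import numpy as np

# Simulated stand-in for the fitted reversal probabilities (one per subject);
# the real analysis used the per-subject estimates from Model 5
rng = np.random.default_rng(0)
r_hat = rng.beta(1, 9, size=3360)

# Primary analysis: top 10% vs bottom 90%;
# sensitivity analyses: cut-offs at the 85th and 95th percentiles
groups = {}
for pct in (85, 90, 95):
    cutoff = np.percentile(r_hat, pct)
    groups[pct] = (r_hat > cutoff).astype(int)  # 1 = high reversal probability
```

Each binary vector would then serve as the outcome variable in the corresponding regression model.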

Adjustment rate, a:
This measure was log-transformed when examining it as an outcome.
Window length, L: This measure was log-transformed (base 2) when examining it as an outcome.
Confidence in r (Dirichlet parameter), α: This distribution contains a large minority of subjects who are very uncertain about r (i.e. α is close to 1; around 33%), a small number of subjects who do not update their belief about r at all (α > 300; around 6%), with the rest intermediate. Given the published literature on belief updating and psychosis, one could hypothesise that psychotic experiences could be associated with either end of this spectrum: i.e., either maximal or minimal uncertainty about the reversal probability. Therefore, a three-category variable was used reflecting high, middle and low parameter uncertainty (with the middle category used as the baseline).
Response noise, 1/ν: The response precision parameter (ν) was inverted for subsequent analysis to make it compatible with the decision noise parameter from the DTD task (i.e. for both parameters, higher values mean greater 'noise'). It was then log-transformed when examining this parameter as an outcome.

Confounding Variables
Maternal education was collected at 32 weeks of pregnancy based on achievement of 'O' levels, subject-specific qualifications that were generally obtained at age 16 years (the minimum school-leaving age in England from 1974). Household income was based on equivalised income reported between 33 and 47 months of age, separated into quintiles.
Polygenic risk scores (PRS) for schizophrenia were generated in previous work by Jones and colleagues 22 as the weighted mean number of disorder risk alleles in approximate linkage equilibrium. Scores were standardized using Z-score transformation. Risk alleles were defined as those associated with case status in recent large consortium analyses of schizophrenia (40,675 cases and 64,643 controls) at p < 0.05, as this threshold has previously been shown to maximally capture phenotypic variance for schizophrenia 23 . IQ was assessed at age 8 years using the Wechsler Intelligence Scale for Children (3rd edition). Executive functioning was assessed at age 8 years using the opposite worlds task from the Test of Everyday Attention for Children (TEA-Ch). Working memory was assessed at age 8 years using the WISC-III Digit Span task.

Supplementary Results
We simulated data using the winning model to show how higher r (reversal probability) relates to the behavioural measures of increased updating to disconfirmatory evidence and reduced updating to consistent evidence (Figure S4). It can be seen that even a small change in r, from 0.05 (red line) to 0.075 (blue line), leads to an increase in updating to 'disconfirmatory' beads 4 and 27, for example, yet after consistent evidence for one jar (e.g. from beads 7-13 or 21-26), the blue line is further from certainty about the jar. Other parameter values in these simulations were a = 0.7, L = 1, α = 320 (i.e. no updating of r occurs during the task).
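The mechanism behind this pattern can be sketched with a simple forward-filtering observer. This is a deliberate simplification of the full model (no retrospective smoothing, no adjustment rate, no response noise), and the bead sequence and jar composition q = 0.8 are illustrative assumptions.

```python
def filter_beliefs(beads, r, q=0.8, p0=0.5):
    """Filtering posterior that the mostly-blue jar is the current source.
    beads: 1 = blue bead, 0 = red bead; r: reversal probability;
    q: proportion of the jar's majority colour (assumed 0.8 here)."""
    p = p0
    trace = []
    for bead in beads:
        prior = (1 - r) * p + r * (1 - p)  # the jar may have switched
        like_blue = q if bead == 1 else 1 - q
        like_red = (1 - q) if bead == 1 else q
        p = like_blue * prior / (like_blue * prior + like_red * (1 - prior))
        trace.append(p)
    return trace

beads = [1] * 6 + [0] + [1] * 6  # one disconfirmatory bead in a blue run
low = filter_beliefs(beads, r=0.05)
high = filter_beliefs(beads, r=0.075)

# Higher r: a larger downward update at the disconfirmatory bead (index 6),
# but less certainty after the preceding run of consistent evidence
drop_low = low[5] - low[6]
drop_high = high[5] - high[6]
```

Because a higher r keeps pulling the prior back toward 0.5, the same likelihood ratio produces a bigger swing at a surprising bead while capping the certainty reached during a consistent run.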
We performed a parameter recovery analysis for the winning Probability Estimates task model because it has a relatively large number of parameters (5) given the number of data points (30). We simulated 100 datasets using parameter values drawn randomly from the ranges observed in the sample (Figure S3). We then estimated parameters from these simulated data and computed the Spearman correlations between the ground truth and recovered parameter values (Figure S5). Three parameters were estimated with high accuracy (all ρ > 0.9): r (reversal probability), a (adjustment rate) and ν (response precision); note that r and ν were the only parameters associated with psychotic experiences and trauma. Two parameters could not be recovered accurately (both ρ < 0.2): α (confidence in r) and L (window length); these likely require longer data sequences to be estimated reliably.
We performed an additional sensitivity analysis for the Probability Estimates task modelling results by checking whether subjects who appeared to make some irrational or 'outlier' responses might distort the associations between model parameters and other variables. Outlier subjects were defined as those whose responses showed minimal variability (i.e., all 30 responses were between 0.45 and 0.55) and those who made at least two 'impossible' responses, i.e. being certain that the jar was the opposite colour to the last bead they had seen. 287/3611 subjects (8%) fulfilled at least one of these criteria.
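These two criteria can be expressed as a simple filter. This is a sketch: in particular, 'certainty' is taken here to mean a response of exactly 0 or 1, which is an assumption about how the criterion was operationalised.

```python
def is_outlier(responses, beads, lo=0.45, hi=0.55):
    """Flag a subject as an 'outlier' responder.
    responses: 30 probability estimates that the jar is blue, in [0, 1];
    beads: 30 bead colours, 1 = blue, 0 = red."""
    # Criterion 1: minimal variability (every response between 0.45 and 0.55)
    minimal_variability = all(lo <= p <= hi for p in responses)
    # Criterion 2: at least two 'impossible' responses, i.e. certainty that
    # the jar is the opposite colour to the bead just seen (assumed here to
    # mean a response of exactly 0 after a blue bead, or 1 after a red bead)
    impossible = sum(
        (b == 1 and p == 0.0) or (b == 0 and p == 1.0)
        for p, b in zip(responses, beads)
    )
    return minimal_variability or impossible >= 2
```

Applying such a flag across subjects yields the outlier/non-outlier split used in the sensitivity analysis.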
Median parameter values in the 'outlier' and 'non-outlier' groups (respectively) were: r = 0.10 … Overall, it is clear that removing subjects with outlier responses makes little difference to the median parameter values, even using a liberal threshold. Crucially, given that reversal probability in the outlier subjects was consistently estimated to be lower than in the remaining subjects, outliers are unlikely to explain any association between psychotic experiences and a higher reversal probability. We also performed model comparison in the outlier and non-outlier groups separately: in both cases, Model 5 won convincingly.
We also assessed associations between the parameters of the runner-up model, Model 3, and psychotic experiences, as a previous modelling study of patients with schizophrenia 14 found that the belief instability parameter κ1 was higher, and response precision ν (similar to inverse decision noise) lower, in these individuals in two separate datasets. However, we did not find any significant associations between these parameters (or the two other parameters) and psychotic experiences in this population cohort.

Missing data
In those with observed data on the DTD task and psychotic experiences, the proportion of missing data was as follows:

Figure S4 caption: This is a small change but the effects on inference are clear: there is increased updating to disconfirmatory evidence, but reduced updating to consistent evidence.

Figure S5: Parameter recovery analysis. Spearman correlations between parameter values used to simulate data (x axes) and parameter values estimated from those data (y axes).