Probabilistic reasoning in schizophrenia is volatile but not biased

We update our beliefs based on evidence. Aberrant belief updating has been linked to schizophrenia and autism. It is not clear whether the faulty updating is due to reduced general cognitive abilities, overweighting of recent information, or lower thresholds for switching from one belief to another. A common task to assess belief updating is the beads task, in which patients with schizophrenia show hasty decision-making. Here we present a model describing the deviations from an ideal Bayesian observer and apply it to three independent datasets, totalling n=176 healthy controls and n=128 patients with schizophrenia. The parameters describe a) the number of beads considered (memory), b) systematic deviations and c) unsystematic deviations (volatility) from probability estimates. We find that, on average, patients use fewer beads and show more volatile responding. However, patients have, on average, probability estimates that are closer to the true probabilities. Closer investigation revealed relevant differences among the datasets and sequences used. More challenging sequences improve the performance of patients. Our model captures well the cognitive mechanisms proposed to contribute to the performance differences in the beads task.


Background
You see dark clouds; how probable is it that it will rain? Such probability estimates require integrating previous evidence and appropriately updating the odds that it will rain. A commonly used paradigm to assess probabilistic reasoning in the lab is the beads task (Phillips & Edwards, 1966). Here, participants are presented with jars that contain coloured beads. In the standard version two jars are presented, each containing opposite ratios of beads (e.g. 80% black and 20% white, and vice versa). One bead at a time is drawn with replacement. Participants make probability estimates, also referred to as graded estimates. A variant of the task, draws to decision, is often used in clinical settings. A prominent finding is that patients with schizophrenia, particularly those with delusions, decide after only a few beads from which jar the beads are coming (e.g. Huq, Garety, & Hemsley, 1988). This has been named the jumping to conclusions bias. Performance in the beads task has been attributed to a) reduced general cognitive abilities, i.e. a lower working memory capacity, b) overweighting of the most recent information and underweighting of previous information, or c) lower thresholds for switching from one belief to another (Moritz & Woodward, 2005). Without a mathematical model one cannot easily distinguish the contributions of these accounts to the jumping to conclusions bias. Here we describe a model that uses the graded estimate version and ask whether patients show more aberrant probability estimates than healthy controls, and whether this is due to basing the inference on less evidence. This is an important question, as answering it allows us to identify a cognitive mechanism for the performance difference found in the beads task.

Methods
We analysed three datasets that assessed probability estimates in the beads task. Dataset 1 was kindly provided by S. Moritz. They tested 62 patients with a diagnosis of schizophrenia and 30 healthy controls (S. Moritz et al., 2016). There were two test sessions, but we use only the first. From this session we use only the first two sequences, a symmetric 80:20 sequence and an asymmetric 50:50 vs 80:20 sequence. Participants made a draws to decision judgement and also provided probability estimates for each of the two jars. Dataset 2 is from Peters and Garety (2006), referred to here as P&G, and was kindly provided by Rick Adams. We used the first session, which had 23 patients with schizophrenia and 35 healthy controls. We did not include the non-psychotic patient group here. P&G used a symmetric 85:15 sequence, and participants estimated the probabilities on a 0 to 100 scale. Dataset 3 is from Adams, Napier, Roiser, Mathys, and Gilleen (2018), referred to here as Gilleen. They tested 56 patients with schizophrenia and 111 healthy controls, all tested only once. There are four sequences made of two identical pairs (ratio 80:20), and the order of presenting the four sequences was randomised. Probability estimates were given on a 7-point scale ranging from sure it is jar A, through indifferent whether it is jar A or jar B, to sure it is jar B; an estimate was made for one jar only. Common to all three datasets is that a total of 10 beads were shown sequentially and that beads were drawn with replacement, i.e. the proportion of beads in the jars did not change.

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0

The model
The probability that bead $k$ comes from jar A in sequence $j$ can be calculated using Bayes' formula. We model here the deviations from an ideal Bayesian observer (IBO). We assume that a participant $i$ may or may not use all available information, and model this with a parameter $m_i$ describing the number of beads included in one's probability estimates, i.e. the memory length. $m_i$ ranges from 0 to 10: if $m_i = 0$ the participant totally ignores the sampled beads, if $m_i = 1$ only the last bead sampled is considered, and so on; if $k \le m_i$ all $k$ beads sampled so far are considered.
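The memory-limited observer described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `posterior_jar_a`, the 0/1 bead coding, and the default 80:20 ratios with a flat prior are assumptions made for the example.

```python
def posterior_jar_a(beads, m, p_a=0.8, p_b=0.2, prior_a=0.5):
    """Posterior probability of jar A using only the last `m` beads.

    beads   : list of 1 (focal colour) or 0 (other colour), in drawing order
    m       : memory length; m = 0 ignores all beads (hypothetical coding)
    p_a/p_b : proportion of the focal colour in jar A and in jar B
    """
    # Keep the last m beads; if fewer than m were drawn, keep them all.
    window = beads[max(0, len(beads) - m):]
    like_a = like_b = 1.0
    for bead in window:
        like_a *= p_a if bead else 1.0 - p_a
        like_b *= p_b if bead else 1.0 - p_b
    # Bayes' formula for two hypotheses.
    num = prior_a * like_a
    return num / (num + (1.0 - prior_a) * like_b)


# Estimates after each draw for an observer who remembers two beads.
sequence = [1, 1, 0, 1]
estimates = [posterior_jar_a(sequence[:k], m=2) for k in range(1, len(sequence) + 1)]
```

Setting `p_a=0.5, p_b=0.8` would give an asymmetric pair of jars of the kind used in one of the Moritz sequences.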
We further include two parameters describing systematic and random deviations between the subjective probability estimates and those of the IBO. We denote the probability specified for the left jar as $\rho^L_{ijk}$; correspondingly, the probability for the other jar is $1 - \rho^L_{ijk}$. Note that in the Moritz dataset participants indicated both probabilities, but to compare the three datasets we used this specification for all of them. Since the probability estimates range from 0 to 1, we use a truncated normal distribution, which we associate with a normally distributed variable $\pi^L_{ijk}$. The mean of $\pi^L_{ijk}$ depends on $p^L_{ijk}(m_i)$ and a parameter $a_i$, and its standard deviation is $\sigma_i$.
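For readers unfamiliar with truncated normals, the following sketch shows the density of a normal distribution restricted to the interval [0, 1]. The text does not spell out how the truncated distribution is tied to the underlying normal variable, so this is only the textbook construction; the function names are hypothetical.

```python
import math


def _norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncnorm_pdf(x, mu, sigma, lower=0.0, upper=1.0):
    """Density of N(mu, sigma^2) truncated to [lower, upper]."""
    if x < lower or x > upper:
        return 0.0
    # Probability mass of the untruncated normal inside the interval.
    mass = _norm_cdf((upper - mu) / sigma) - _norm_cdf((lower - mu) / sigma)
    return _norm_pdf((x - mu) / sigma) / (sigma * mass)
```

A larger `sigma` spreads the density more evenly over [0, 1], which corresponds to more random responding around the same mean.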
For $a_i = 1$ we have $E[\pi^L_{ijk} \mid \theta_i] = p^L_{ijk}(m_i)$. Parameter values $a_i < 1$ and $a_i > 1$ model tendencies for individual $i$ to specify values for $\rho^L_{ijk}$ that are too close to a half and too close to zero or one, respectively, relative to the probability $p^L_{ijk}(m_i)$ used to decide from which jar the beads are coming. The standard deviation $\sigma_i$ models to what degree the specified probability estimates $\rho^L_{ijk}$ of individual $i$ deviate from $E[\pi^L_{ijk} \mid \theta_i]$. Note that $a_i$ can take on any positive value and describes how well one can discriminate probabilities. We adopt a hierarchical Bayesian setup and assign a prior distribution to the parameter vector of each individual, $\theta_i = (m_i, a_i, \sigma_i)$. Given a vector of hyper-parameters, $\phi$, we assume the parameter vectors $\theta_i$, $i = 1, \ldots, n$, to be a priori independent. Moreover, given $\phi$ we assume the components of $\theta_i$ to be independent. We assume all $a_i$ and $\sigma_i$ to be gamma distributed and parameterise these gamma distributions by their mean values and standard deviations. For $m_i$ we simply assume some discrete distribution over the possible values. We assign a vague prior to the vector of hyper-parameters $\phi$. Further, we estimate the parameters separately for each group (patients and controls) and each dataset.
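The sketch below illustrates two ingredients of this setup. First, `distort` is one possible distortion function with the stated properties (identity at a = 1, pulled towards a half for a < 1, pushed towards zero or one for a > 1); the power form used here is an assumption for illustration, not necessarily the authors' definition. Second, `gamma_shape_scale` converts a mean and standard deviation into the usual shape and scale parameters of a gamma distribution, which is a standard identity. All function names are hypothetical.

```python
import random


def distort(p, a):
    """Assumed distortion: p^a / (p^a + (1 - p)^a), for 0 < p < 1 and a > 0."""
    num = p ** a
    return num / (num + (1.0 - p) ** a)


def gamma_shape_scale(mean, sd):
    """Shape and scale of a gamma distribution with the given mean and sd."""
    shape = (mean / sd) ** 2   # mean^2 / variance
    scale = sd ** 2 / mean     # variance / mean
    return shape, scale


def sample_gamma(mean, sd, rng=random):
    """Draw one value from a gamma distribution given its mean and sd."""
    shape, scale = gamma_shape_scale(mean, sd)
    return rng.gammavariate(shape, scale)


# Under the assumed form, an IBO probability of 0.9 is reported as
# 0.75 for a = 0.5, as 0.9 for a = 1, and as roughly 0.99 for a = 2.
reported = [distort(0.9, a) for a in (0.5, 1.0, 2.0)]
```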

Results
Participants with a diagnosis of schizophrenia use on average fewer beads, i.e. their m is lower than that of healthy control participants. The systematic deviation a is smaller in patients than in healthy controls (Fig 1). In the P&G and Gilleen datasets both groups have a < 1, but the patient group has a smaller systematic deviation than the control group (see table 1). In the Moritz dataset, on average, the patient group shows nearly no systematic deviation, compared to the more sigmoid behaviour of the control group. However, closer inspection shows that this is due to the asymmetric sequence (see below). The unsystematic deviation was larger in patients, across sequences and consistently in all three datasets (Fig 2). In all three datasets σ was negatively correlated with m, i.e. the fewer beads a participant considered, the more random deviations the participant showed; however, the overall correlation was only r = -.14, 95% CI [-.245, -.032]. Similarly, there was an overall positive correlation between systematic and random deviation, r = .167, 95% CI [.059, .271]; however, the correlation was negative in the P&G dataset and close to zero in the Moritz dataset, i.e. this relationship was driven by the large number of controls in the Gilleen dataset. Figure 3 illustrates the interaction of the memory parameter and stochastic behaviour for two cases with nearly perfectly linear probability estimates. In the left-hand panel of Fig 3, changes in the observed beads correlate highly with the specified probabilities; a low m is reasonable, as only the last bead is considered. When a low m explains most of the changes in the specified probabilities, σ becomes low. In contrast, in the right-hand panel of Fig 3 the participant seems more or less to ignore the observed beads, so σ must become large.
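The text does not state how the confidence intervals for the correlations were obtained. One common approach is the Fisher z-transformation, sketched below; the function name `fisher_ci` and the sample size passed to it are assumptions for illustration, so the result need not match the reported intervals exactly.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via Fisher's z.

    r      : sample correlation
    n      : number of paired observations (must exceed 3)
    z_crit : standard normal quantile for the desired coverage
    """
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical call: a weak negative correlation in a sample of about 300.
lower, upper = fisher_ci(-0.14, 304)
```

With a sample of this size, even a weak correlation yields an interval that excludes zero, which is why the authors stress the small magnitude of r and not only its significance.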

Which sequence is most discriminative
Next, we looked at how the type of sequence affects performance. In the Gilleen dataset, the systematic deviation was smaller for sequences A and D than for sequences B and C. The unsystematic deviation σ was lowest for the first sequence participants saw. Healthy controls used the same number of beads in all four sequences, whereas patients used around 2 more beads in sequence B, indicating that they are aware that judging from which jar the beads come is harder in a (de facto) 60:40 sequence. Moritz et al. used a symmetric and an asymmetric sequence. The asymmetric sequence yielded a more sigmoid estimate of the probabilities in both groups, more extreme in healthy controls (mean a = 1.91, SD = .89) than in patients (mean a = 1.28, SD = .64). This contrasts with the symmetric sequence, where controls showed little systematic deviation (mean a = .81, SD = .29), whereas patients showed a systematic deviation similar to that seen in the P&G and Gilleen datasets (mean a = .55, SD = .26). The random deviation σ was similar in the symmetric sequence but increased for both groups in the asymmetric sequence. Notably, controls used on average one more bead to make their probability estimates in the asymmetric sequence.
Combining all three datasets, we found that the difficulty of a sequence mainly increases the unsystematic deviation in patients, whereas the increase in σ was less pronounced in healthy controls.

Discussion
Our model captures two important aspects of the beads task: understanding probabilities and sequential updating of information. We find clear differences between healthy controls and patients on all three parameters and, importantly, also between the datasets. The number of beads considered is lower in patients than in healthy controls, but both groups evaluate more beads in the asymmetric sequence and in sequences with more even ratios, indicating that both groups are sensitive to the cognitive effort required to estimate the probabilities. Reduced cognitive abilities have been linked to the jumping to conclusions bias in the beads task (Garety et al., 2013; Speechley, Whitman, & Woodward, 2010; Woodward, Munz, LeClerc, & Lecomte, 2009). Our model captures this explicitly by modelling the number of beads used for making an inference. Importantly, considering fewer beads does not lead to a more biased probability estimate, nor always to more unsystematic responding (see Fig 3). We find a smaller systematic deviation in patients than in controls, indicating good discriminability of probabilities; in fact, patients treat probabilities more linearly than healthy controls do. The larger systematic deviation in healthy controls might reflect the conservatism bias (Phillips & Edwards, 1966), i.e. updating the probability estimate less than prescribed by Bayes' theorem. Unsystematic or random deviation was larger in patients than in controls, but patients were also more heterogeneous than controls. The larger unsystematic deviation found here agrees with the results of Moutoussis, Bentall, El-Deredy, and Dayan (2011), who modelled the draws to decision version of the beads task. The more stochastic responding may reflect a propensity to see a change in jar, particularly in those cases where the number of beads considered is large (Pfuhl, Sandvik, Biegler, & Tjelmeland, 2015). The model by Adams et al. (2018) and other paradigms support the notion that patients perceive the world as more volatile, i.e. they attribute new evidence to a change in the environment (Deserno et al., 2017).
In the Moritz dataset patients show on average a very good understanding of probabilities, but this was driven by the asymmetric sequence, where controls show, contrary to the symmetric sequence, a sigmoid response, i.e. they prefer to dichotomize the probabilities into very sure or absolutely sure that the beads come from jar A or from jar B. In contrast, the P&G and Gilleen datasets show that both patients and controls have a large range over which they respond with around 50%, or that it could be either of the two jars. This might be driven by how the task was presented. Indeed, the Moritz dataset showed very good memory, i.e. m around 10 for most participants, including patients. P&G and Gilleen, on the other hand, indicate that patients use fewer beads when judging probabilities. In Moritz we find a difference in subjective probability, not memory, whereas in P&G and Gilleen the main difference is one of memory, not of subjective probability estimates. Still, in all three datasets, patients on average respond more randomly (Fig 2). We also found that the first presentation discriminates best between patients and controls. Patients perform better on the second sequence, although that could be confounded by the difficulty of the sequences. In addition, using asymmetric sequences reduces group differences.
In sum, our model may resolve why some studies find hypersalience in patients whereas others find a lower decision threshold. Participants use different strategies: some respond bead by bead, seen in a low m, whereas others are nearly perfectly Bayesian, which can appear as a lower decision threshold in comparison to the healthy control sample. Probabilities are not treated linearly but, importantly, patients have a lower systematic bias in estimating probabilities than healthy controls.