Cost Evaluation During Decision-Making in Patients at Early Stages of Psychosis

Jumping to conclusions during probabilistic reasoning is a cognitive bias reliably observed in psychosis and linked to delusion formation. Although the reasons for this cognitive bias are unknown, one suggestion is that psychosis patients may view sampling information as more costly. However, previous computational modeling has provided evidence that patients with chronic schizophrenia jump to conclusions because of noisy decision-making. We developed a novel version of the classical beads task, systematically manipulating the cost of information gathering in four blocks. For 31 individuals with early symptoms of psychosis and 31 healthy volunteers, we examined the numbers of “draws to decision” when information sampling had no, a fixed, or an escalating cost. Computational modeling involved estimating a cost of information sampling parameter and a cognitive noise parameter. Overall, patients sampled less information than controls. However, group differences in numbers of draws became less prominent at higher cost trials, where less information was sampled. The attenuation of group difference was not due to floor effects, as in the most costly block, participants sampled more information than an ideal Bayesian agent. Computational modeling showed that, in the condition with no objective cost to information sampling, patients attributed higher costs to information sampling than controls did, Mann–Whitney U = 289, p = 0.007, with marginal evidence of differences in noise parameter estimates, t(60) = 1.86, p = 0.07. In patients, individual differences in severity of psychotic symptoms were statistically significantly associated with higher cost of information sampling, ρ = 0.6, p = 0.001, but not with more cognitive noise, ρ = 0.27, p = 0.14; in controls, cognitive noise predicted aspects of schizotypy (preoccupation and distress associated with delusion-like ideation on the Peters Delusion Inventory). Using a psychological manipulation and computational modeling, we provide evidence that early-psychosis patients jump to conclusions because of attributing higher costs to sampling information, not because of being primarily noisy decision makers.


INTRODUCTION
A consistent psychological finding in schizophrenia research is that patients, especially those with delusions, gather less information before reaching a decision. This tendency to draw a conclusion on the basis of little evidence has been called a jumping-to-conclusions (JTC) bias (Garety & Freeman, 2013). According to cognitive theories of psychosis, a JTC bias is a trait representing liability to delusions. People who jump to conclusions easily accept implausible ideas, discounting alternative explanations and thus ensuring the persistence of delusions (Garety & Freeman, 1999;Garety, Bebbington, Fowler, Freeman, & Kuipers, 2007;Garety et al., 2013). Reviews and meta-analyses have confirmed the specificity, strength, and reliability of the association of JTC bias and psychotic symptoms (Dudley, Taylor, Wickham, & Hutton, 2016;Fine, Gardner, Craigie, & Gold, 2007;Garety & Freeman, 2013;Ross, McKay, Coltheart, & Langdon, 2015;So, Garety, Peters, & Kapur, 2010;So, Siu, Wong, Chan, & Garety, 2015).
The JTC bias is commonly measured in psychosis research using variants of the beads in the jar task (Huq, Garety, & Hemsley, 1988). A person is presented with beads drawn one at a time from one of two jars containing beads of two colors mixed in opposite ratios (Garety, Hemsley, & Wessely, 1991;Huq et al., 1988). JTC has been operationally defined in the beads tasks as making decisions after just one or two beads (Garety et al., 2005;Warman, Lysaker, Martin, Davis, & Haudenschield, 2007). The commonly used outcome in this task is the number of beads seen before choosing the jar, known as draws to decision (DTD). The task requires the participant to decide how much information to sample before making a final decision. This behavior can be compared to the behavior of an ideal Bayesian reasoning agent (Huq et al., 1988). However, when people experience evidence seeking as costly, it is thought that gathering less information could be seen as optimal, leading to monetary gains at the expense of accuracy (Furl & Averbeck, 2011).
Although the JTC bias has been well replicated, the neurocognitive mechanisms underlying it are unknown; many possible psychological explanations have been put forward, not all of which are mutually exclusive (Evans, Averbeck, & Furl, 2015). Motivational factors, such as intolerance of uncertainty (Bentall & Swarbrick, 2003;Broome et al., 2007), a "need for closure" (Colbert & Peters, 2002), a cost to self-esteem of seeming to need more information (Bentall & Swarbrick, 2003), or an abnormal "hypersalience of evidence" (Esslinger et al., 2013;Menon, Mizrahi, & Kapur, 2008;Speechley, Whitman, & Woodward, 2010), have been posited as potentially underlying the JTC bias. A common theme emerging from the intolerance of uncertainty, the need for closure, and the cost to self-esteem hypotheses is that patients experience an excessive cost of sampling information.
Computational models allow researchers to consider important latent factors influencing decisions. The costed Bayesian model (Moutoussis, Bentall, El-Deredy, & Dayan, 2011) incorporates a Bayesian consideration of future outcomes with the subjective benefits or penalty (cost) for gathering additional information on each trial and the noise during the decisionmaking; Moutoussis and colleagues (2011) applied this computational model to the information sampling behavior of a sample of chronic schizophrenia patients undertaking the beads task. They found, contrary to their expectations, that a higher perceived cost of the information gathering did not underpin the JTC bias. Therefore they concluded that differences in the noise of decision-making were more useful in explaining the differences between patients and controls than the perceived cost of the information sampling. However, here we reason that a rejection of the increased cost of information sampling account of the JTC bias based on this finding is premature, as patients with chronic schizophrenia may not be representative of all psychosis patients, especially not those at early stages of psychosis-a stage particularly relevant for understanding the formation of delusions. A variety of different cognitive factors may contribute to the JTC bias; the balance of contributory factors may differ in different patient populations, with noisy decision-making relating to executive cognitive impairments predominating in chronic but perhaps not in early stages of psychosis. Furthermore, Moutoussis and colleagues applied their model to an existing dataset (Corcoran et al., 2008) that used the classic beads task. In this task, no explicit value was assigned to getting an answer correct or incorrect, and no explicit cost was assigned to gathering information. The authors themselves concluded that their work required replication, including incorporation of experimental manipulation of rewards and penalties.
The current study investigated the hypothesis that patients with early psychosis attribute higher costs to information sampling using a novel version of the traditional beads task and computational modeling. Focusing on patients at early stages of psychosis allowed us to investigate the JTC bias before the onset of a potential neuropsychological decline seen in some patients with chronic schizophrenia and to study a largely unmedicated sample of psychosis patients. Specifically, we were interested in testing whether patients adapt their decision strategies when there is an explicit cost of information sampling. We therefore developed a variation of the beads task in which there were blocks with and without an explicit cost of information sampling, and we gave feedback for correct and incorrect answers. This manipulation allowed the comparison between groups on different cost schedules. However, it also allowed us to test the competing hypothesis, which is that psychosis patients jump to conclusions because of primarily noisy decision-making behavior. Under this account, patients should be insensitive to a cost manipulation in the novel setup of the paradigm and apply random decision-making.
This study presents a novel investigation of the processes that lead to reduced information sampling in psychosis. We hypothesized that (a) psychosis patients would gather less information than controls when gathering information is cheap and that (b) psychosis patients and controls would adjust their information sampling according to experimental cost manipulations and that the adjustments would mitigate, but not abolish, the difference between the groups. Finally, we hypothesized that (c) the costed Bayesian model applied to this paradigm and a largely unmedicated early-psychosis group would provide explanatory evidence for JTC bias in favor of less information sampling because of higher perceived costs rather than purely noisy decision-making.

Study Participants
The study was approved by the Cambridgeshire 3 National Health Service research ethics committee. An early-psychosis group (N = 31) was recruited, consisting of individuals with first-episode psychotic illness (N = 14) or with at-risk mental states (ARMS; N = 17) from the Cambridge early intervention service in psychosis (CAMEO). Inclusion criteria were as follows: age 16-35 years and current psychotic symptoms meeting either ARMS or first episode of psychosis (FEP) criteria according to the Comprehensive Assessment of At-Risk Mental States (CAARMS; Morrison et al., 2012;Yung et al., 2005). Patients with FEP were required to meet ICD-10 criteria for a schizophrenia spectrum disorder (F20, F22, F23, F25, F28, F29) or affective psychosis (F30.2, F31.2, F32.3). Healthy volunteers (N = 31) without a history of Computational Psychiatry psychiatric illness or brain injury were recruited as control subjects. None of the participants had drug or alcohol dependence. Healthy volunteers had not reported any personal or family history of neurological or psychotic illness and were matched with regard to age, gender, handedness, level of education, and maternal level of education. None of the patients with ARMS were taking antipsychotic medication, and four patients with FEP were on antipsychotic medication at the time of testing. All of the experiments were completed with the participants' written informed consent.

Behavioral JTC Task
This was a novel task (Figure 1), based on previously published tasks (Garety et al., 1991;Huq et al., 1988) of reasoning bias in psychosis, but amended in the light of decision-making theory, according to which the amount of evidence sought is inversely proportional to the costs of information sampling. These costs include the high subjective cost of uncertainty and the cost to self-esteem or other factors (Moutoussis et al., 2011). Participants were told that there are two lakes, each containing black and gold fish in two different ratios (60:40). The ratios were explicitly stated and displayed on the introductory slide. A series of fish was drawn from one of the lakes; all the previously "caught" fish were visible to reduce the working memory load. The participants were informed that fish were being "caught" randomly from either of the two lakes and then allowed to "swim away." We used a pseudo-randomized order for each trial, which was the same for all participants. The lake from which the fish were drawn was also pseudo-randomized.
Participants could ask for a maximum of 20 fish to be shown. After each fish shown, they indicated whether the fish came from Lake G (mainly gold) or Lake B (mainly black) or asked to see another fish. The trial terminated when the subject chose the lake. There were four Figure 1. Experimental design of a single trial. In 50% of the trials, fish were coming from the mainly black lake, and in 50%, they were coming from the mainly gold lake. The order was pseudorandomized so that the same sequences were used for all participants. Feedback, depending on the block, was either of the words "Correct" and "Incorrect" in Block 1 or the number of points won (or lost) during the trial in all subsequent blocks.

Computational Psychiatry
blocks, each with the 10 trials of the predetermined sequences to increase reliability. Block 1 was similar to the classical beads task and was included to provide a reference point. The only difference was that feedback ("correct" or "incorrect") was provided after each trial. In Block 2, a win was assigned to a correct decision (100 points) and a loss (−100 points) to an incorrect decision. In Block 3, the cost of each extra fish after the first one was introduced (−5 points) was subtracted from the possible win or loss of 100 points for making a correct or incorrect decision, respectively. Block 4 was similar to Block 3, but the information sampling cost was incrementally increased: The first fish would cost 0 points, the second −5 points, the third −10 points, and so on. Thus a higher number of fish sampled led to more lost points. Subjects performed the task at their own pace. Whether Lake G or Lake B was correct was randomized. The task consisted of four blocks. Within each block were 10 trials of predetermined sequences of fish to increase reliability. All of the fish that were "caught" during one trial were visible on the screen to minimize the working memory load. Block order was not randomized because the task increased in complexity.
The main outcome variable was the number of fish sampled (DTD). Secondary outcomes were the accuracy of the decision, calculated according to Bayes's theorem, based on the probability of the chosen lake given the color and number of fish seen (Everitt & Skrondal, 2010), and the dichotomous JTC variable, which is defined as making a decision after two or fewer pieces of information. . Bottom: true state. For example, let the cost of sampling be very high. Then b 0 may be "equiprobable lakes," Action 1 "sample," s 1 "B," b 1 "60% B," Action 2 "declare B," and s 1 "Wrong." B) In this example, sampling cost is very low. A person has drawn 15 fishes, 7 of them g, hence the position of 15 on the x-axis and +1 on the y-axis as there is a +1 excess of black fish so far. The visible states corresponding to all possible future draws are shown. Looking ahead (example: gray arrow), the agent finds the "sampling" action more valuable in that the current preference for the B lake is likely to be strengthened at very low cost.

Partially Observable Markov Process Decision-Making
We consider a belief-based model of decision-making, formally a partially observable Markovian decision process, to model behavior in this task. The process is Markovian because we can concisely formulate the state in which the people find themselves upon observing n d draws, so that the state contains all information that can be extracted from observations thus far. As beads are drawn from one jar only at each trial, this can be simply defined as the number of g fishes seen so far and the total number seen: s = n d , n g . The agent is interested to infer upon the true state of the world, which is a B or G lake. This is not directly observable but partially observable. The agent maintains a belief component of their state, P(G | s). The Markov property still holds: future beliefs are independent of past beliefs given the current state ( Figure 2A). As we will see, belief-state transitions can be calculated just by considering the evidence so far. Given the current belief state, the probability of the possible unfolding of the task into the future can be estimated and hence the expected returns for each possible future decision ( Figure 2B). The value of the available choices can thus be estimated: choosing the B lake D B , the G lake D G , or sampling another piece of information D S . The agent chooses accordingly and either terminates the trial or gathers a new datum and repeats the process ( Figure 2).
We now formally specify the model. Let P G|n d , n g be the probability of the lake being gold after drawing n d fish and seeing n g gold fish. Using Bayes's theorem, and assuming that a gold lake and a black lake are equally probable, P G|n d , n g = P n d , n g |G P(G) P n d , n g |G P(G) + P n d , n g |B P(B) = P n d , n g |G P n d , n g |G + P n d , n g |B , P n d , n g |G = n g n d P(g|G) n g [1 − P(g|G)] n d −n g , P n d , n g |B = n g n d P(g|B) n g [1 − P(g|B)] n d −n g , P Gn d , n g = P(gG) n g [1 − P(gG)] n d −n g P(gG) n g [1 − P(gG)] n d −n g + P(gB) n g [1 − P(gB)] n d −n g . (1) We then need to calculate the value of each action. For the "declare" choices, the action value is the expected reward for a correct answer minus the expected cost for a wrong answer. For example, if Q D G ; n d , n g is the action value of declaring the lake gold, after drawing n d fish and seeing n g gold fish, where R C is the reward for declaring the color of the correct lake and C W is the cost of declaring the color of the wrong lake. In our task, R C = C W , so Q D G ; n d , n g = R C P G|n d , n g − P B|n d , n g = R C 2P G|n d , n g − 1 . Similarly, The action value of sampling again is the expectation over the value of the next state minus the cost of sampling C S (n d ). If the value of the (possible) next state is V(s ), the action Computational Psychiatry value for sampling again is the sum over the new possible states, weighted by their probabilities. The latter depends, in turn, on the identity of the true underlying lake, L: The possible outcomes for sampling again, and getting a black fish (going from (n d , n g ) to (n d + 1, n g )) and getting a gold fish (going from (n d , n g ) to (n d + 1, n g + 1)), are Agents will tend to prefer actions with the greatest value. An ideal, reward-maximizing agent will always choose the action with the maximum value and will thus endow the corresponding state with this value. Denoting q as the vector of action values, Real agents will choose probabilistically as a function of action values, so Agents cannot fill in the action values for sampling starting from their current state, as the next state value is not known. However, they can fill in all values by backward inference. At the very end, n d = 20, sampling is not an option, and the action values can be calculated directly from Equation 2 and the state value from Equation 4. Once all possible state values for n d = 20 have been calculated, Equations 2, 3a, 3b, 4a, and 4b are used to calculate action and state values for n d = 19 and downward to n d = 1.
We now turn to the ideal, deterministically maximizing agent against which we can compare human performance. When given the same sequence of fish as human participants, on average, the ideal Bayesian agent samples 20, 20, 3.5, and 1 fish in each of the four blocks, respectively, achieving total winnings of 1,070 points (Table 3). For no cost (Blocks 1 and 2), an ideal Bayesian agent samples all the fish and has p = 0.835 of being correct (100% if we had infinite fish). For constant cost (5 points, Block 3), it samples until the difference between black and gold fish (Nb − Ng) is 2, with p = 0.692 probability of being correct. For increasing cost (Block 4), it guesses after the first fish, so there is 0.6 probability of being correct. To model real agents, the probabilistic action choice in Equation 4b took the softmax form: with z a normalizing constant that is the same for all the actions at a specific state and T a decision temperature parameter, described later (Moutoussis et al., 2011).
Thus two parameters shape a participant's behavior: • CS, the cost of sampling, that is, the subjective cost of each additional piece of information compared to the final reward. It may be greater or smaller than the costs imposed by the experimenter but is here taken to be constant (in a given block). High values of CS mean that decisions are made early.

Computational Psychiatry
• T, the noise parameter. As this value increases, the probability of the participant following the ideal behavior (for their given value of CS) decreases and their actions become more uniformly random.

Model Parameter Estimation
Here we were interested in the most accurate possible estimates of the means and variances characterizing the psychosis and control groups. We therefore used what is known as a hierarchical model, a variant of the random effects approach. Here a participant's model parameters are drawn from a group, or population, distribution. This procedure estimates the mean and variance for CS and for T for the whole group. For a group of, for example, 30 participants with 10 data points each, we use 300 data points to estimate 4 values (i.e., mean and variance for each of the two parameter distributions) rather than 2 different parameters for each of the 30 participants (i.e., 60 parameters in total). We assumed both parameters to be positive and a priori uncorrelated and therefore appropriately modeled by independent gamma distributions. The standard ways that mathematicians parameterize gamma distributions are in terms of a shape and scale or rate. However, these do not map intuitively to the quantities we are usually interested in clinical research, that is, a measure of center and a measure of spread. Thus we follow Moutoussis et al. (2011) in describing our parameters by mean and variance. We used the expectation-maximization (EM) technique (see , Appendix; see also Moutoussis et al., 2011). In brief, EM proceeds by first assuming uninformative distributions at the group level; using these as uninformative priors, it derives probability distributions for the parameters of each participant based on each one's data. This is expectation. Then, it reestimates the group-level distributions to maximize the likelihood of the (temporarily fixed) lower level. This is maximization. The group-level distributions then form empirical priors that are used in place of the uninformative ones. The process is repeated until all estimates are stable; we ran between 25 and 30 iterations of the EM algorithm (see Ermakova, Gileadi et al., 2018, Appendix). Runs took about 30 s per iteration per participant on a single core of an Intel Core2 Duo CPU and 4 GB of RAM.
Once the maximum likelihood parameters for a given group are estimated, they can be interpreted using the integrated Bayesian information criterion (iBIC; Huys et al., 2012Huys et al., , 2015, which uses the likelihood values of best-fit parameters for two different models to decide which model best represents the data. In the case of the ordinary BIC, to assess model fit, we calculate the maximum likelihood of the data of each participant given a particular model, and we penalize this in proportion to the number of parameters in the model. The underlying assumption is that a greater number of parameters represents a proportionate reduction of prior belief that the participant belongs to a given region of the parameter space and that all parameters are on the same footing for each participant. The volume of this parameter space scales to the power of its dimensionality, that is, the number of parameters, so the log of this, used in the BIC, is proportional to this number of parameters. Redundant parameters, which would result in overfitting, are thus penalized. However, this approximation can be refined. The study sample itself gives information about the prior probability that a particular parameter obtains at the micro-level of the individual. This is the empirical prior, which, in our case, is calculated by EM. Now the complexity penalty at the level of the individual is calculated by forming a mean, or integrated, likelihood weighed by this prior. This allows for the data to speak to some parameters being "more equal than others" in penalizing complexity. We can now account well for the penalty due to the prior at the level of the individual, but we have not considered the level of the group. Should we assume that, say, patient participants should be fitted with different empirical priors than healthy controls, or the same? To compare separate-fit and common-fit statistical models, we turn to the BIC approximation of complexity proportional to the number of parameters, but now at the level of the groups. Thus the integrated BIC, which is integrated Computational Psychiatry in the sense that it contains weighted-mean likelihoods rather than maximum likelihoods, also contains a penalty term for models in proportion to the number of empirical priors, or groups, used. Therefore it can be used for hypothesis testing: If one model has all participant parameters coming from one distribution (four parameters) and a second model separates the control and unhealthy groups (eight parameters), a comparison of iBIC values can be used to decide if splitting the sample is justified. The interested reader is referred to Huys et al. (2012Huys et al. ( , 2015 for mathematical details. In addition, we conducted analyses based on individual participants' estimated model parameters. A reliable test of the hypothesis that the groups differ in the cost (or noise) parameters can be created by forcing the model to treat all participants as coming from one group, with a single group mean and variance, then using the model's estimates of the single subject parameters to conduct a test of whether there are differences according to diagnostic group. This approach is overconservative but serves a purpose in subjecting the test of group differences to a stern challenge.

Statistical Analysis of the Behavioral Data
The effect of manipulations of wins, losses, and costs on the DTD and accuracy was assessed by repeated-measures analysis of variance (ANOVA) using SPSS 23. Although DTD was not normally distributed, repeated-measures ANOVA is robust to violations of normality and was therefore an appropriate test to run. IQ and depressive symptoms scores were initially included as covariates and dropped from the subsequent analysis where nonsignificant. We report twotailed p-values, which were significant at p < 0.05. When the assumption of sphericity was violated, we applied Greenhouse-Geisser corrections. We also examined whether DTD was correlated with the severity of psychotic symptoms in the patient group (CAARMS positive symptoms) and with schizotypy scores on the PDI in controls using Spearman's correlation coefficients. The intraclass correlation coefficients (ICCs) were used to estimate the consistency of decision-making. We calculated the ICCs of the mean number of choices in each block separately for patients and controls.
For completeness and to help relate our study to prior literature (or for future metaanalyses), we report data on the dichotomous variable JTC, defined as making a decision after receiving one or two pieces of information, and we compare the FEP and ARMS patient groups separately to controls on Block 1 DTD and estimated model parameters.

Demographical Characteristics of the Participants
In Tables 1 and 2, the sociodemographic and cognitive parameters as well as clinical measures of the participants are presented. Although healthy volunteers had higher IQ compared to the  Table 2). *Significant differences.
psychosis group, the difference was not significant, and the groups were matched with regard to level of maternal education and number of years of education. The control and patient groups did not differ with regard to gender or age. There were significantly more smokers in the patient group compared to the control group, and some subjects of the patient group used more recreational drugs. For all of the other measures (e.g., alcohol or cannabis), there were no significant group differences. As expected, there are significant differences between the healthy volunteers and the patients in all subscales of CAARMS (Table 2), on the self-reported depression questionnaire (BDI), and in self-reported schizotypy scores (PDI). In the patient group only, we performed additional clinical assessments. The mean (±SD) score for PANSS positive symptoms was 13.68 (±3.99); for PANSS negative symptoms 9.87 (±4.88); for SANS 0.35 (±0.75), and for GAF 55.00 (±18.54).

Group Differences in the Number of DTD and Points
Inspection of Figure 3A reveals that in all of the blocks, the controls took more DTDs than the patients. Mauchly's test indicated that the assumption of sphericity was not violated, W  differences were statistically significant in the first two blocks (Block 1, p = 0.007; Block 2, p = 0.028), whereas in Blocks 3 and 4, the group differences were increasingly attenuated, as the cost of decision-making became increasingly high (Block 3, p = 0.059; Block 4, p = 0.419).
Control behavior was more similar to the ideal Bayesian agent than patients on the first three blocks but not Block 4 (Table 3). Group differences in the probability of being correct were very similar to the results in the number of DTD ( Figure 3B).
Analyzing the points won/lost in Blocks 2-4 ( Figure 4), we identified four outliers in each group that significantly exceeded the ±2 standard deviation threshold. After excluding these subjects, the mixed-model ANOVA revealed a significant effect for block, F(2) = 93.73, p < 0.001, a marginally significant interaction between Block · Group, F(2) = 2.52, p = 0.086, and a significant groups effect, F(1) = 4.14, p = 0.047. Patients won significantly fewer points in Block 2, F(2) = 4.65, p = 0.035, but did not differ from controls in Blocks 3 and 4, both p > 0.3. Percentage and count of people displaying a JTC reasoning style are presented in Table 4.

Intraclass Correlations of DTD and SD of the Mean DTD
ICCs of the mean number of DTD within each block were calculated separately for patients and controls. Within all blocks in both groups, the correlations were very high, indicating that behavior was consistent within each block (  Here there was an effect for block, F(2) = 93.73, p < 0.001, a marginally significant interaction between Block · Group, F(2) = 2.52, p = 0.086 and a significant groups effect, F(1) = 4.14, p = 0.047.

Correlation of Symptoms and IQ With DTD
We calculated two-tailed Spearman's correlations with positive psychotic symptoms in patients and with PDI scores in controls. In the patient group, we used a summary score of the three CAARMS subscales that quantify positive psychotic symptoms, namely, Unusual Thought Content, Non-bizarre Ideas (mainly persecutory ideas), and Perceptual Aberrations. An additional advantage of the summary measure is that it provides one measure to reduce the number of correlations that need to be performed. We also ran correlation BDI scores because the groups differed on this measure. In the group with psychotic symptoms, the correlations with the overall CAARMS score were significant in the first three blocks (Block 1, ρ = −0.515, p = 0.003; Note. DTD = draws to decision (i.e., number of fish seen by the participant before reaching a decision); Nb = number of black; Ng = number of gold; P corr = probability of being correct in Blocks 1-4. In controls, we found significant correlations of DTD in Block 1 with the Distress, ρ = −394, p = 0.035, and Preoccupation, ρ = −0.462, p = 0.012, subscales of the PDI. DTD in Block 2 correlated with the Preoccupation subscale of PDI, ρ = −0.387, p = 0.038. There were no significant correlations with BDI scores (e.g., for Block 1, ρ = −0.023, p = 0.905).

Group Differences in the Probability of Being Correct (Accuracy)
Mauchly's test indicated that the assumption of sphericity was not violated, W(5) = 0.667, p < 0.001. There was a significant main effect of block on the probability of being correct, Note. JTC (jumping to conclusions) is defined as reaching a decision after being given just one or two pieces of information.
Computational Psychiatry F(3) = 53.502, p < 0.001, and a significant effect of group, F(1) = 5.514, p = 0.022. There was a significant interaction between the group and the block, F(1) = 2.791, p = 0.042, indicating that as the cost of decision-making increased, the group differences became attenuated.

Additional Analyses Having Excluded Participants on Medication or to Make Groups More Closely Matched on IQ
To demonstrate that the findings were unaffected by antipsychotic medication, we conducted repeat analyses having excluded the four patients taking antipsychotic medication. The results were similar to in the full sample. As before, on mixed-model ANOVA, there were significant main effects of block, F(3) = 88.59, p < 0.001, and of group, F(1) = 5.21, p = 0.026, on the number of DTD. The interaction between the group and the block was also significant, F(3) = 4.54, p = 0.004. Group differences were statistically significant in the first two blocks (Block 1, p = 0.013; Block 2, p = 0.023) but reduced in Blocks 3 and 4 (Block 3, p = 0.098; Block 4, p = 0.55).
When we excluded the three highest IQ controls and two lowest IQ patients to make groups more similar in IQ (resulting in control mean IQ 107 and patient mean 105), the results were similar: On mixed-model ANOVA, there were significant main effects of block, F(3) = 90.2, p < 0.001, and of group, F(1) = 5.25, p = 0.026, on the number of DTD. The interaction between the group and the block was also significant, F(3) = 3.50, p = 0.017. Group differences were statistically significant in the first two blocks (Block 1, p = 0.013; Block 2, p = 0.039) but reduced in Blocks 3 and 4 (Block 3, p = 0.062; Block 4, p = 0.45).

Computational Modeling Results: Analysis of Group Estimates of Model Parameters
In our computational analysis of Blocks 1-3, we found that patients with early psychosis assigned a higher cost to sampling more data than did healthy controls (Table 5). For example, in Block 1, where there was no explicit cost, the modeled mean cost of sampling in the controls was very low (1.9 · 10 −3 ) compared to that of patients (1.7). The modeled variance was also higher for the patient group (13) compared to the controls (2.0 · 10 −6 ). The estimated noise parameters were similar in both groups. For example, in Blocks 1 and 2, the respective group estimated mean noise parameters were 3.4 and 3.6 in controls and 4.2 and 2.9 in patients, respectively.
In Block 3, where an explicit cost of 5 points per fish sampled was assigned, the model shows that patients very slightly overestimated the cost of sampling compared to the assigned cost (estimated mean cost of sample for patients 5.2), whereas the controls underestimated it (estimated mean cost of sample 1.6).
In Block 4, we explicitly told participants that the cost for additional information is not fixed but increases with amount of information requested. We did not fit the computational model to Block 4 because the model does not take into account the increasing cost structure set up.
To test the null hypothesis that both groups are drawn from the same distribution of behavior parameters, we used iBIC. Table 6 shows the results of that calculation, where the null hypothesis can be rejected with strong evidence. For example, in Block 1, despite the fact that the iBIC penalizes the use of extra parameters substantially, when fitting CS m , CS v , T m , and T v separately for each group, iBIC was 3,450.2, compared to an iBIC of 3,489.7 for fitting them as one combined group. Taken together, the iBIC can be used to compare the hypotheses about the two models of interest. In our case, those are the following two: (a) Unhealthy and Note. Best-fit parameters were calculated for each participant group separately and for the group formed by combining both participant groups. CS M = mean cost per sample for a particular group; CS V = variance of cost per sample within a particular group; T M is the mean noise parameter for a particular group; Tv = variance of the noise parameter within a particular group; LL = loglikelihood of the data given the parameters. Note. iBIC values and differences are presented for analyses with all participants. iBIC differences are also shown for repeat analyses after exclusions for antipsychotic medication or closer IQ matching. Positive iBIC differences indicate the preference for separate groups, negative for a single combined group. iBIC = integrated Bayesian information criterion.
healthy groups have independent CS and T parameter distributions, characterized by eight numbers (mean and variance for CS and for T, for each group), and (b) all participants come from the same pool of noise and cost parameters, characterized by four numbers. The iBIC suggests that Hypothesis 1 is better explained by the separate model in Blocks 1 and 2 and by the combined model in Block 3. Model fits were not changed substantially after exclusion for medication (Table 6).

Computational Modeling Results: Analysis of Individual Participant-Level Cost and Noise Parameters
A very strong test of the hypothesis that the groups differ in the cost (or noise) parameters can be created by forcing the model to treat all participants as coming from one group, with a single group mean and variance, then using the model's estimates of the single subject parameters to conduct a test of whether there are differences according to diagnostic group. Although we found (see earlier) that in Blocks 1 and 2, this assumption did not fit the data as well as modeling the groups separately, so the approach is overconservative, this procedure serves a purpose in subjecting the test of group differences to a stern challenge. The groups differed significantly on estimated cost parameters in this procedure ( The results were similar after exclusions for medication (Block 1, cost group difference Mann-Whitney U = 248, p = 0.008; noise group difference t[60] = 1.79, p = 0.08) or to equalized IQ (Block 1, cost group difference U = 246, p = 0.01; noise group difference t[60] = 1.5, p = 0.13).

DISCUSSION
Our study shows that early-psychosis patients generally gather less information before coming to a conclusion compared to healthy controls. Both groups slightly increased their DTD when rewarded for a correct answer (Block 2) and significantly decreased their DTD when there was an explicit cost for the sampling of information (Blocks 3 and 4). The decrease was strongest in Block 4, including the incremental cost increase for each "extra fish." These effects were especially strong in controls, as they gathered significantly more information during Blocks 1 and 2 compared to patients. Thus patients had a lower "baseline" against which to exhibit a change in DTD with increasing cost. However, we emphasize that an ideal Bayesian decision agent would still sample fewer information than the average control or patient in Blocks 3 and 4 (in Block 4, the ideal Bayesian agent decides after the first fish, whereas our human participants sampled more). This indicates that potential floor effects may not be responsible for the decreased reduction of DTD in patients compared to controls. Together with our modeling results, this finding supports the hypothesis that, independent of the objective cost value of information, patients with early psychosis experience information sampling as more costly than controls do.
In this novel version of the beads task using blocks with explicit costs, JTC can be an advantageous strategy, because sampling a large amount of very costly information would cancel out the potential gain due to correctly identifying the lake. Consistent with this, group differences were especially strong in the first two blocks, where information sampling was free. In Block 3, when there was only a small cost of information, both groups responded to that change by lowering the number of fish sampled. The difference between the groups was marginally significant on Block 3, where patients still applied fewer DTD, p = 0.06. However, further increasing the information sampling cost completely abolished the group differences. Our data furthermore show that patients were significantly worse in overall accuracy (i.e., probability of being correct at the time of making the decision) and total points won, indicating the application of an unsuccessful strategy. In general, these results show that healthy controls were more flexible in adapting their information sampling to the changed task blocks. Controls increased the number of fish sampled when there was a reward for the correct decision and decreased it when information sampling had a cost. Patients also decreased the number of DTD when information sampling became more costly, but not as much as the controls, suggesting that the patients view information sampling generally as costly, somewhat independent of the actual value and the feedback. The results on Blocks 1 and 2, furthermore, indicate that psychosis patients have difficulties integrating feedback appropriately to update their future decisions. This is similar to results we reported in a recent study on the win-stay/lose-shift behavior in a partially overlapping early-psychosis group .
The variance in DTD across the two groups was similar, and the ICCs were all greater than 0.94, indicating consistent decision-making behavior across the 10 trials in each block and within each group. If patients were acting more randomly, they would apply noisier and more variable decision-making behavior, but we did not observe this, which is in contrast to Moutoussis et al. (2011). Concluding from their Bayesian modeling data, they proposed that patients had more noise in their responses, leading to the reduced number of DTD. When modeling our data in a similar way, we found a difference between the groups in the perceived cost of information sampling, but less evidence in differences in the noise of decision-making behavior. We suggest that the contrast between our findings and those of Moutoussis et al. (2011) may be due to the differences in the patient groups used in the two studies. Whereas Moutoussis et al. used chronic, mainly schizophrenic, patients with a potential neuropsychological decline and a lower IQ (of 92), our study used patients at early stages of psychosis with preserved cognitive functioning (IQ 102). Severity of psychotic symptoms was related to DTD and cost parameters in our sample; however, as all our participants were in the early stages of illness, we were not able to explore relationships with chronicity statistically. We did, however, conduct secondary analyses to examine subgroup differences within our patient sample. On Block 1 DTD and cost parameters, the patients with confirmed psychotic illness (FEP) had the most pronounced differences from controls, whereas regarding the noise parameter, there was little difference between the patient groups, and indeed, the ARMS group, with milder symptoms, had slightly higher noise parameters. Differences in task design may also contribute to differences in the results from Moutoussis et al. Block 1 was most similar to that previous study: It differed in the type of test (computerized fishing emulation test vs. the classic test using actual beads and actual jars) and in providing feedback (i.e., "correct" or "incorrect") after each decision, as well as in showing the sequence of fish drawn in each trial. These changes in our task potentially assisted patients (e.g., if some patients had memory deficits), which might also contribute to why we did not observe robust differences in the noise parameter in Block 1. When analyzing the objective cost of information sampling in Block 3, we found that patients slightly overestimated the cost, while the controls underestimated it. Furthermore, key contributors to the best-fit parameters may be those who jump to conclusions most: participants who consistently made a decision after viewing only one fish. The proportion of those individuals was significantly higher among the patients.
The percentage of people with psychosis who demonstrated JTC reasoning style in our study was relatively low, at 26%, compared to previous studies that report 40% of individuals (Dudley et al., 2011) or half to two-thirds of the individuals with delusions (So et al., 2010). Likewise, our objective DTD values were slightly higher than those in most other studies. This might be due to the use of early-stage psychosis patients and the presence of feedback in our task compared to previous studies. Feedback has been shown to increase information sampling and accuracy both in patients with delusions and in controls (Lincoln, Ziegler, Mehl, & Rief, 2010).
Additionally, we found a correlation between IQ and DTD in the first three blocks in the patient group, so we cannot completely exclude the contribution of intelligence to the information-gathering bias. Some argue that impaired executive functions or working memory deficits contribute to the JTC bias (Falcone et al., 2015;Garety et al., 2013). A low IQ could lead to a low tolerance for uncertainty and an equivalent high cost of the information sampling as well as the inability to integrate feedback to update future decisions. If more information is unpleasant, because it exceeds one's capacity to utilize it, it could be viewed as costly. However, the fact that the correlation between IQ and DTD does not appear in controls argues against this. The groups were well matched on maternal education level (a proxy for premorbid or potential IQ), but the patient group had lower current IQ than controls (as expected, given that schizophrenia spectrum disorders are robustly associated with reduced current IQ compared to the general population). When we excluded the three controls with the highest IQ and the two patients with the lowest IQ, the DTD results were broadly unchanged. Computational modeling in Block 1 was not changed by these exclusions, although in Block 2, there were some differences. After the exclusions, in Block 2, the BIC values did not suggest evidence that the participants from different diagnostic groups are drawn from different populations. However, when we examined the individual-level modeled parameters, even after the exclusions, there was evidence that patients had higher estimated sampling costs compared to controls. Taken together, the findings indicate that lower IQ associated with psychosis is likely to contribute to the JTC bias but is unlikely solely to explain its existence in psychosis. The results of the study were unchanged when we excluded four patients taking antipsychotic (dopamine receptor antagonist) medication, which is consistent with two prior studies in healthy volunteers, suggesting that dopaminergic manipulations do not have a large effect on information sampling (Andreou, Moritz, Veith, Veckenstedt, & Naber, 2013;Ermakova, Ramachandra, Corlett, Fletcher, & Murray, 2014).
When comparing our results to those of Moutoussis et al. (2011), the use of the same computational model, implemented in the same way, is advantageous. However, we note that there are limitations in the approach. For example, the use of a gamma distribution may not be optimal in the case of values near zero (Moutoussis et al., 2011). It could be hypothesized that the degree of cognitive noise should be a constant per individual and that thus it would be more parsimonious to apply the same noise parameters for a given subject across blocks. Our data suggest that this is not the case. Noise is highly correlated across blocks, just as cost is. However, the experimental manipulation of block had a highly significant effect on both cost (as intended by our paradigm design) and noise (an incidental effect). Noise parameters were reduced in Block 2, where a correct decision is explicitly rewarded, compared to Block 1, where it is not (indicating that information sampling is not immutable but can be adaptively altered by psychological manipulation). Noise parameters were greater in Block 3 than in Block 2, presumably because the decision is more difficult in Block 3 (where participants need to balance the stated benefits and costs of sampling). When decisions reach a certain level of difficulty, participants may appear more random in their decision-making because they can no longer effectively utilize the information available.
Rather than focusing on ICD-10 or DSM-V schizophrenia patients, we studied a group of patients early in their course of psychosis. All patients in our study suffered from current psychotic symptoms. Psychosis is often viewed as an upper part of the continuum, ranging from rare occurrences of delusions or hallucinations at one end through individuals with regular "schizotypal" traits (van Os, Linscott, Myin-Germeys, Delespaul, & Krabbendam, 2009). Consistent with this approach, and with the theory that a JTC-style cognitive bias contributes to psychotic symptom formation (Huq et al., 1988), we found that patients with more severe positive symptoms sampled less information and had higher estimated sampling cost parameters. To investigate the idea of the psychosis continuum further, we looked at the correlations between information sampling and schizotypy characteristics in healthy volunteers. In this group, we found a negative correlation between the number of DTD and the scores on the distress and preoccupation subscales of the PDI, indicating that less information sampling is associated with higher scores. This is consistent with studies by Colbert and Peters (2002) and Lee, Barrowclough, and Lobban (2011), as well as the recent meta-analysis by Ross et al. (2015), in keeping with a continuum model of psychosis. Regarding modeled parameters in controls, estimated noise was associated with total PDI score, PDI distress, and PDI preoccupation, and estimated sampling costs were associated with PDI preoccupation and, marginally, with distress. This hints at the intriguing possibility that hasty decision-making due to cognitive noise may be a more important contributory factor to delusion-like thinking in the healthy population than it is to psychotic symptoms in psychotic illness, where hasty decision-making due to higher information sampling costs appears to be more important.

Summary
In summary, we found that early-psychosis patients demonstrate a hasty decision-making style compared with healthy volunteers, sampling significantly less information. This decisionmaking style was correlated with delusion severity, consistent with the possibility that it may be a cognitive mechanism contributing to delusion formation. Our data are not consistent with the account that patients sample less information because they are in general more noisy decision makers. Rather, our data suggest that patients with psychosis sample less information before making a decision because they attribute a higher cost to information sampling. Although psychosis patients were less able to adapt to the changing demands of the task, they did alter their decision-making style in response to the changing explicit costs of information, indicating that an impulsive decision-making style is not completely fixed in psychosis. This finding is consistent with the possibility that information sampling may be a treatment target, for example, for psychotherapy (Moritz et al., 2014), and that patients with psychosis may benefit in this neuropsychological domain, as they have in other domains, from cognitive scaffolding approaches exemplified in cognitive remediation therapy (Cella & Wykes, 2017).

AUTHOR CONTRIBUTIONS
Author roles: AE conceptualization, methodology, formal analysis, writing (original draft preparation, review, and editing); NG methodology, formal analysis, writing (review and editing); FK (methodology, formal analysis, writing (review and editing); AJ methodology, investigation, project administration, writing (review and editing); RA methodology, formal analysis, writing, review, and editing; PCF conceptualization, methodology, supervision writing (review and editing); MM methodology, analysis supervision, writing (original draft preparation, review, and editing); GKM conceptualization, project administration, methodology, formal analysis, supervision, funding acquisition, writing (original draft preparation, review, and editing)

FUNDING INFORMATION
Supported by a MRC Clinician Scientist (G0701911) and an Isaac Newton Trust award to GKM; by the University of Cambridge Behavioural and Clinical Neuroscience Institute, funded by a joint award from the Medical Research Council (G1000183) and Wellcome Trust (093875/Z/ 10/Z); by awards from the Wellcome Trust (095692) and the Bernard Wolfe Health Neuroscience Fund to PCF; by the Cambridgeshire and Peterborough NHS Foundation Trust and Cambridge NIHR Biomedical Research Centre, and by the Max Planck-UCL Centre for Computational Psychiatry and Ageing, a joint initiative of the Max Planck Society and University College London. MM also receives support from the UCLH Biomedical Research Centre and is funded staff in the "Neuroscience in Psychiatry Network," Wellcome Strategic Award (095844/7/11/Z).