Distinct alterations in probabilistic reversal learning across at-risk mental state, first episode psychosis and persistent schizophrenia

We used a probabilistic reversal learning task to examine prediction error-driven belief updating in three clinical groups with psychosis or psychosis-like symptoms. Study 1 compared people with at-risk mental state and first episode psychosis (FEP) to matched controls. Study 2 compared people diagnosed with treatment-resistant schizophrenia (TRS) to matched controls. The design replicated our previous work showing ketamine-related perturbations in how meta-level confidence maintained behavioural policy. We applied the same computational modelling analysis here, in order to compare the pharmacological model to three groups at different stages of psychosis. Accuracy was reduced in FEP, reflecting increased tendencies to shift strategy following probabilistic errors. The TRS group also showed a greater tendency to shift choice strategies though accuracy levels were not significantly reduced. Applying the previously-used computational modelling approach, we observed that only the TRS group showed altered confidence-based modulation of responding, previously observed under ketamine administration. Overall, our behavioural findings demonstrated resemblance between clinical groups (FEP and TRS) and ketamine in terms of a reduction in stabilisation of responding in a noisy environment. The computational analysis suggested that TRS, but not FEP, replicates ketamine effects but we consider the computational findings preliminary given limitations in performance of the model.


Behavioural results (study 1): ARMS, FEP and matched controls
Accuracy and risk tendencies
For every trial (see "Methods" section below), participants responded to a cue by deciding between a "risky" option (betting a larger sum, £1) and a "safer" option (betting a smaller amount, 10p). Visual feedback reminded them of their choice, and then informed them of the outcome ('win' or 'lose' whatever they had chosen to wager). The expected value of choosing either of these options on a particular trial depended on the cue shown. At any given time during the experimental session, one cue was statistically associated with a high probability of a 'win' outcome (P(Win) = 0.8) and thus indicated that the riskier choice was optimal. The other cue was, meanwhile, statistically associated with high-probability loss outcomes (P(Win) = 0.2) and therefore signalled that it would be optimal to choose the "less risky" (10p) bet. Choices were considered 'accurate' if they conformed to what would, conditional upon the prevailing cue-outcome contingencies, yield greater expected return on the present trial. That is, the optimal choice conditional upon the current statistical relationship between the presented cue and the probable outcome (regardless of whether the trial happened to be one of the minority (20%) of trials that violated these contingencies) was considered 'accurate', since it conformed to the general strategy that would maximise gains and minimise losses throughout the current block of the task.
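The accuracy criterion can be made concrete with a short worked example. The Python sketch below assumes that a win gains the stake and a loss forfeits it, which matches the outcome descriptions in this paper. The names `P_WIN`, `STAKES`, `expected_value`, `optimal_bet` and `is_accurate`, and the cue labels, are illustrative and do not come from the study's materials.

```python
# Win probability for each cue and stake size in pounds, as described in the text.
# Cue labels are hypothetical names chosen for this illustration.
P_WIN = {"high_cue": 0.8, "low_cue": 0.2}
STAKES = {"risky": 1.00, "safe": 0.10}


def expected_value(cue, bet):
    """Expected monetary return of a bet, assuming a win gains and a loss forfeits the stake."""
    p = P_WIN[cue]
    stake = STAKES[bet]
    return p * stake - (1 - p) * stake


def optimal_bet(cue):
    """The bet with the greater expected return under the prevailing contingency."""
    return max(STAKES, key=lambda bet: expected_value(cue, bet))


def is_accurate(cue, bet):
    """A choice is 'accurate' if it matches the optimal bet, whatever the trial's actual outcome."""
    return bet == optimal_bet(cue)
```

Under these assumptions the risky bet has an expected value of +£0.60 on the high-probability cue and -£0.60 on the low-probability cue, against +£0.06 and -£0.06 for the safer bet, so an 'accurate' choice is risky on the former cue and safe on the latter.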

Choice switch tendencies
We operationally define outcomes according to whether they were more or less desirable than the counterfactual possibility that would have occurred, given the cue presented, had the alternative wagering option been chosen. More desirable outcomes entailed receiving the maximal gain or minimum loss on a given trial (i.e. receiving £1 rather than 10p, or losing 10p rather than £1). As discussed above, trials could also be categorised according to the optimality or correctness of the choice made. Crucially, all permutations of choice accuracy and outcome desirability were possible. Thus, a correct selection of the risky option could be followed by winning the (more desirable) £1 outcome or, on 20% of trials, by the less desirable outcome of losing a whole pound. If the participant erroneously responded to the relevant cue by choosing the non-risky response, they would usually receive a less desired outcome than the correct response would have garnered (i.e. would win only 10p rather than the £1 that would have been the outcome had they plumped for the alternative option in making their wager). On 20% of such trials, however, they would receive a more desirable outcome for their erroneous choice than the correct one would have yielded (i.e. if the low-probability 'loss' event happened to occur, then the counterfactual case in which they had made the correct choice relative to the prevailing cue-outcome contingencies would, in this particular instance, have yielded an even larger monetary loss of £1). The same possibilities were afforded by the cue signifying optimality of a non-risky choice (in this case the more desired outcome was losing 10p while the less desired outcome was losing £1).
A subject's propensity to interpret less desirable monetary outcomes (losses of £1 or wins of 10p) as indicative of a genuine need to alter their current response strategy could be assayed, behaviourally, by their propensity to 'switch' to an alternative strategy on the trial immediately following such events. Similarly, the tendency for recent indications of performance success to promote policy stability (i.e. the tendency to 'stick' with a seemingly successful strategy) was assayed behaviourally by examining the tendency to stay with the same response strategy (rather than switching) on the next trial after receiving a relatively desirable outcome (losses of only 10p or wins of £1). We analysed switch behaviour (i.e. tendency to change one's choice on the subsequent presentation of the same cue or to repeat the choice when the alternative cue was next presented) for each of the four permutations: (i) correct choice, more desirable outcome; (ii) correct choice, less desirable outcome; (iii) incorrect choice, more desirable outcome; (iv) incorrect choice, less desirable outcome. We describe group comparisons for each of these four permutations below:
(i) Correct choice, more desirable outcome: the optimal behaviour here is to stick with the same choice strategy on ensuing trials. As seen in Fig. 1 (top-right), this tendency differed significantly across groups (one-way ANOVA; df = 2,80; F = 8.793; p < 0.0001): pairwise comparisons (TK-corrected) found it was lower in FEP than both HC1 (p = 0.0002) and ARMS (p = 0.0369), but did not differ between ARMS and controls (p = 0.1968). In short, FEP participants tended to shift from an optimal choice strategy even though it had received the more desirable outcome.
(ii) Correct choice, less desirable outcome: here, the optimal behaviour is to stick with the same choice despite the experience of a less desirable outcome. There was a significant group difference in probability of inappropriately shifting strategy (one-way ANOVA: df = 2,80; F = 4.96; p = 0.0093). Post-hoc pairwise comparisons (TK-corrected) found FEP participants were more susceptible to such "surprise-shift errors" than HC1 participants (p = 0.0066), while the ARMS group differed from neither FEP (p = 0.137) nor HC1 (p = 0.423) in this regard (see Fig. 1, bottom-right). That is, FEP participants appeared less likely than controls to maintain an optimal strategy in the face of less desirable feedback.
(iii) Incorrect choice, more desirable outcome: there was no difference in tendency to shift strategy following choices that were incorrect, but (having been made on one of the minority (20%) of trials on which the prevailing probabilistic relationship between a cue and its typical outcome happened to be violated) had nevertheless received the more desirable outcome (one-way ANOVA: df = 2,79; F = 2.44; p = 0.0939).
(iv) Incorrect choice, less desirable outcome: the groups did not differ in their probability of appropriately shifting away from (as opposed to persevering with) an incorrect strategy on trials immediately following an informative undesirable outcome (one-way ANOVA: F(2,80) = 0.4866, p = 0.617).
In brief, these analyses demonstrate a reduced tendency in FEP participants, compared to controls, to maintain the optimal choice strategy. While ARMS participants were numerically intermediate between HC1 and FEP, they did not significantly differ from either group. The findings are noteworthy in relation to the overall conclusions from the behavioural analysis of healthy participants under ketamine administration, which were that ketamine reduced the ability to stabilise behaviour in the face of probabilistic (misleading) unexpected outcomes.
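The four-way classification of trials and the definition of a strategy switch can be sketched in a few lines of Python. This is an illustration of the definitions given above, not the analysis code used in the study. The trial representation (dictionaries with `cue`, `bet`, `optimal_bet` and `won` fields) and the function names are hypothetical, and the sketch ignores any special handling that trials around a contingency reversal might need.

```python
from collections import defaultdict


def more_desirable(bet, won):
    """True for a £1 win or a 10p loss, i.e. the outcome beat the counterfactual bet."""
    return (bet == "risky") == won


def is_switch(prev, nxt):
    """Strategy switch: a different bet on the same cue, or the same bet on the other cue."""
    same_cue = prev["cue"] == nxt["cue"]
    same_bet = prev["bet"] == nxt["bet"]
    return same_cue != same_bet


def switch_rates(trials):
    """Proportion of switches after each (choice correct?, outcome more desirable?) trial type."""
    counts = defaultdict(lambda: [0, 0])  # key -> [number of switches, number of trials]
    for prev, nxt in zip(trials, trials[1:]):
        key = (prev["bet"] == prev["optimal_bet"],
               more_desirable(prev["bet"], prev["won"]))
        counts[key][0] += int(is_switch(prev, nxt))
        counts[key][1] += 1
    return {key: switches / total for key, (switches, total) in counts.items()}
```

In this representation the key `(True, False)` corresponds to permutation (ii), a correct choice followed by a less desirable outcome, so a high value for that key would indicate frequent switching away from an optimal strategy after misleading feedback.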

Computational results (study 1): ARMS, FEP and matched controls
Variational Bayes analysis 32 strongly supported the hypothesis that groups did not differ in model frequencies (P(y|H=) ≈ 1). We therefore pooled HC1, ARMS and FEP data for submission to random-effects Bayesian Model Comparison. The best-fitting model was a hierarchical model that implemented optimality-based confidence updating and used confidence to modulate learning-rate and choice temperature with separate weights. This model's "exceedance probability" (the probability it was more frequent than any other in the model space) was ep = 0.9998. Its "protected exceedance probability" (an extension of this notion, controlling for the possibility that one model may occur more frequently than all others simply by chance 32) was pxp = 0.9995. Family-wise analysis found strong support (ep ≈ 1) for those models whose lower-level reinforcement learning algorithm was that of this winning model (i.e., symmetrical updating of cue-values according to outcome valence, reflecting an appreciation of the task structure) over less sophisticated alternatives (i.e. over families of models that updated only the seen cue's value on each trial, and/or updated cue-values more from |£1| outcomes than equivalently informative |10p| outcomes). When families were defined instead by how confidence was updated, family-wise analysis found strong evidence in favour of optimality-based "surface-monitoring" (ep ≈ 1). Finally, comparing families defined by how confidence was used to modulate learning rate and/or choice temperature found strong support for the family of models in which confidence modulated both these lower-level parameters with separate weighting-factors (ep ≈ 1). These family-wise analyses support the validity of model 26, which uniquely occupies the intersection between the winning reinforcement learning, confidence-monitoring, and confidence-modulation families.
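For readers unfamiliar with exceedance probabilities, the following sketch shows one common way such a quantity can be estimated: by sampling from a Dirichlet posterior over model frequencies and counting how often each model is the most frequent. It is not the analysis code used here. The function name is hypothetical and the Dirichlet parameters in the usage note are made up for illustration.

```python
import numpy as np


def exceedance_probabilities(alpha, n_samples=100_000, seed=0):
    """Monte Carlo estimate of each model's exceedance probability.

    alpha: Dirichlet parameters of the posterior over model frequencies
           (hypothetical values; one entry per model).
    """
    rng = np.random.default_rng(seed)
    # Each row is one plausible vector of model frequencies in the population.
    samples = rng.dirichlet(alpha, size=n_samples)
    # For each sample, record which model is the most frequent.
    winners = samples.argmax(axis=1)
    return np.bincount(winners, minlength=len(alpha)) / n_samples
```

For example, `exceedance_probabilities([30, 2, 2])` assigns nearly all of the probability to the first model, because its estimated frequency exceeds that of the other two in almost every sample.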
Contrary to what we hypothesised based on previous ketamine findings 26, neither of the two free parameters controlling confidence-modulation of lower-level parameters differed between groups. Of the winning model's five free parameters, groups differed significantly only in baseline choice temperature β0 (Kruskal-Wallis ANOVA; df = 81,2; χ² = 12.7; p = 0.0017). See Fig. 2. This remained significant following correction for multiple comparisons. Pairwise comparisons (TK-corrected) found median β0 was significantly higher in FEP than HC1 (p = 0.001), was higher in ARMS than HC1 at trend level (p = 0.067), and did not differ between ARMS and FEP (p = 0.339).
Median β0 did not differ between patients who were taking antipsychotics and patients who were not (p = 0.951; H = 0; z = 0.06; rank sum = 377). Further, among the former subgroup of patients, antipsychotic dose (in chlorpromazine equivalent units 33) did not correlate with β0 (n = 16; ρ = 0.127; p = 0.640). Thus, the significant group difference does not seem to reflect a medication effect.

Figure 1. Boxplots showing median, interquartile range, and range of key behavioural outcomes from Study 1, plotted separately for Healthy Control (HC1), At-Risk Mental State (ARMS) and First Episode Psychosis (FEP) groups. Any outlying data points within a group are plotted individually. Top left: accuracy was significantly lower in FEP than HC1, and was lower at trend-level in ARMS than HC1. Top right: behavioural sensitivity to informative (high-probability) desirable feedback. The tendency to adaptively stay with an optimal strategy following desirable feedback was significantly lower in FEP than in both HC1 and ARMS groups (who did not significantly differ from one another in this regard). Bottom left: behavioural sensitivity to misleading (low-probability) desirable feedback did not differ between groups. Bottom right: behavioural sensitivity to misleading (low-probability) undesirable feedback. FEP participants were significantly more likely than HC1 participants to inappropriately 'switch' following low-probability undesirable feedback.

www.nature.com/scientificreports/

Summary
The computational analyses suggested that, despite the behavioural resemblance between FEP and ketamine (in terms of a reduced tendency to stick to an optimal strategy when faced with probabilistic undesirable outcomes), there are computational differences. Notably, in the current study, groups differed significantly in only one free parameter: baseline choice temperature β0 was higher in the FEP group (who also showed worse performance) than in the HC1 group. In contrast to the previous study, no differences were observed across groups in terms of the confidence weighting parameters. The winning model here was largely identical to the one previously observed 26, in that all three groups approached the task using a cognitive strategy involving: (1) maintaining a meta-level estimate of confidence in lower-level beliefs about values of cues; (2) updating this confidence estimate dynamically using outcome optimality as a teaching signal; and (3) deploying confidence to flexibly influence learning rate and choice temperature, using distinct weighting factors to separably control these (this latter finding represents the single point of departure between the winning model here and that of the previous study, which did not differentiate separate parameters for confidence-modulation of learning versus decision-making). An important caveat is that parameter recovery for four of the parameters was poor and only choice temperature, β0, was well-recovered from simulation data.
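The three components of the strategy summarised above can be illustrated with a toy implementation. The sketch below is not the paper's model: the model equations are not given in this section, so the exponential form of the confidence modulation, the coding of outcomes as +1/-1, the confidence learning rate `eta`, and all function and parameter names are assumptions made purely for illustration. Only the overall structure (symmetrical value updating, a confidence estimate trained on outcome desirability, and separate weights κA and κB on learning rate and choice temperature) follows the description in the text.

```python
import math


def update(state, cue, other_cue, won, desirable, params):
    """One trial of learning under the assumed (illustrative) functional forms."""
    c = state["confidence"]
    # Assumption: higher confidence lowers the effective learning rate via weight kappa_a.
    alpha = params["alpha0"] * math.exp(-params["kappa_a"] * c)
    target = 1.0 if won else -1.0
    values = dict(state["values"])
    # Symmetrical updating: the seen cue moves towards the outcome,
    # the unseen cue moves towards the opposite outcome.
    values[cue] += alpha * (target - values[cue])
    values[other_cue] += alpha * (-target - values[other_cue])
    # Confidence tracks the recent rate of more desirable outcomes (assumed delta rule).
    c_new = c + params["eta"] * ((1.0 if desirable else 0.0) - c)
    return {"values": values, "confidence": c_new}


def p_risky(state, cue, params):
    """Probability of the risky bet; a higher temperature gives more random choice."""
    # Assumption: higher confidence lowers choice temperature via weight kappa_b.
    tau = params["beta0"] * math.exp(-params["kappa_b"] * state["confidence"])
    return 1.0 / (1.0 + math.exp(-state["values"][cue] / tau))
```

In this toy version, setting `kappa_b` to zero decouples choice temperature from confidence, so responding is no more deterministic after a run of desirable outcomes than after a run of undesirable ones, while raising `beta0` makes choices more random at every level of confidence.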

Behavioural results (study 2): TRS and matched controls
Accuracy and risk tendencies
Accuracy (see Fig.

Choice switch tendencies
As with Study 1, we examined tendencies to switch choices according to the four possible permutations of trial.
(i) Correct choice, more desirable outcome: as described above, the optimal behaviour here is to stick with the same strategy on the ensuing trial. However, TRS participants were more likely to inappropriately shift from a choice that had, on the previous trial, received informative desirable feedback, a tendency which differentiated them significantly from HC2 participants (one-way ANOVA: df = 1,58; F = 14.907; p < 0.0003). See Fig. 3 middle left.
(ii) Correct choice, less desirable outcome: here, the optimal behaviour is to stick with the same choice despite the experience of a less desirable outcome. TRS was associated with significantly greater tendency to incorrectly switch away from the optimal strategy, i.e. to switch response when the preceding trial was associated with a correct response but an undesirable outcome (one-way ANOVA: df = 1,58; F = 5.6128; p = 0.0212). See Fig. 3 bottom right.
(iii) Incorrect choice, more desirable outcome: the TRS group were more likely than the HC2 group to shift from a choice that, on the preceding trial, had received a 'misleadingly' desirable outcome (one-way ANOVA: df = 1,58; F = 7.99; p = 0.0064). See Fig. 3 middle right. Note that, in this respect, they showed a more optimal tendency than the controls (although, in the context of their overall elevated tendency to shift choices, this may not signify a true improvement in performance, which was indeed comparable in terms of overall accuracy between TRS and HC2 groups).
(iv) Incorrect choice, less desirable outcome: the TRS group were more likely than controls to correctly switch after undesirable feedback following a suboptimal response (one-way ANOVA: df = 1,58; F = 7.76; p = 0.0072). See Fig. 3 bottom left. Again, though optimal, this tendency should be considered in the context of their overall increased tendency to switch.
In brief, for all permutations of choice and outcome, TRS participants showed an increased tendency to change their strategy across successive trials. While in some cases this was optimal, and may indeed have contributed to their preserved accuracy relative to matched controls, the tendency proved disadvantageous in that it was to the detriment of maintaining optimal responding across the study session. As with the previous ketamine study and the FEP participants in Study 1, it suggests that the clinical group showed a reduced ability to stabilise responding in a probabilistic setting (Fig. 4).

Computational results (study 2): TRS and matched controls
Variational Bayes analysis 32 strongly supported the hypothesis that model frequencies did not differ between TRS and HC2 groups (P(y|H=) = 0.9998). Pooling both groups' data and submitting it to random-effects Bayesian Model Comparison to estimate the most likely distribution of models in this sample, we replicated the findings of Study 1. Once again, the data were best fit by the hierarchical model using outcome desirability as a teaching signal for confidence, which modulated both learning rate and choice temperature with different weights. The 'exceedance probability' of this winning model within the present sample was ep = 0.9103. Protected exceedance probability for the winning model was pxp = 0.6797. Thus, the probability that in this sample, this best-fitting model was more frequent than any other "not by chance" 32 perhaps seems dubiously low. However, the previous pharmacological work 26 and Study 1 both found strong evidence for a significant difference in these models' frequencies. Further, the most frequent model in Study 1 was the same model as best fit Study 2's independent dataset here, and furthermore was identical, in all respects save for not differentiating between parameters κA and κB, to the winning model in the ketamine study 26 (possibly due to power limitations stemming from that previous study's smaller sample size). Thus, the more a priori probable assumption that (consistent with previous findings) "these models differ from one another in frequency" renders ep a more appropriate metric (and pxp overly conservative) in this case.
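The gap between ep and pxp can be understood from the usual definition of the protected exceedance probability, which shrinks ep towards chance (1/K for K models) in proportion to the Bayesian omnibus risk (BOR), the posterior probability that all models are equally frequent. The sketch below illustrates this relationship. The function name, the BOR value and the two-model example are hypothetical; neither the BOR nor the size of the model space for this study is stated in this section.

```python
def protected_exceedance(ep, bor):
    """Shrink exceedance probabilities towards chance in proportion to the omnibus risk.

    ep:  exceedance probabilities, one per model (should sum to 1).
    bor: Bayesian omnibus risk, the probability that model frequencies are all equal.
    """
    n_models = len(ep)
    return [p * (1.0 - bor) + bor / n_models for p in ep]
```

With two models, ep = (0.9, 0.1) and a hypothetical BOR of 0.5, this gives pxp = (0.7, 0.3). A winning model's pxp therefore falls well below its ep whenever the data leave appreciable probability that the models do not differ in frequency at all.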
Family-wise analyses replicated the findings from the identical family-wise analyses reported for Study 1's separate dataset, concurring with the random-effects model comparison in supporting the model's validity within the HC2/TRS sample. There was strong support for the family whose reinforcement learning algorithm used outcome valence to update both cues symmetrically on each trial (ep = 0.9735); for the family updating confidence according to outcome desirability as per Eq. (4) (ep ≈ 1); and for the family using confidence to modulate learning-rate and choice-temperature with separate weights (ep = 0.9852).
Of the winning model's five free parameters, four showed no significant group difference. The exception was κB, the weight with which confidence modulates choice temperature: average κB was higher in HC2 than TRS (one-way ANOVA: df = 1,58; F = 4.235; p = 0.0437).
Patients' clozapine dose did not correlate with κB (ρ = 0.0160; p = 0.929), nor did their current clozapine level (ρ = 0.123; p = 0.496). Whether effects of other antipsychotic medication, in those patients who were additionally taking them, might confound the interpretation of our finding was investigated using an unpaired t-test (unequal variances). The subgroup of patients taking other (typical) antipsychotics as well as clozapine did not differ, in average κB, from those patients taking only clozapine (t = -0.302; df = 31.8; p = 0.765). Thus, the difference between HC2 and TRS in κB does not seem attributable to antipsychotic medication.

Summary
In contrast to ARMS and FEP, people with TRS showed a reduction in average κB: the free parameter governing the weight with which meta-level confidence influences choice temperature. Thus, TRS resembled the pattern of reduced confidence-modulation observed under ketamine 26. These interesting observations must be tempered by two caveats: first, simulation-based parameter recovery for κB was poor and, second, the groupwise difference did not survive a correction for the five parameter comparisons. For these reasons, though we find this apparent overlap between computational alterations across TRS and ketamine interesting, we treat it cautiously.
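The second caveat can be illustrated with simple arithmetic. The text does not name the correction method that was applied, so the sketch below assumes a Bonferroni correction purely for illustration; the function name is hypothetical.

```python
def bonferroni(p_value, n_comparisons, alpha=0.05):
    """Bonferroni-adjusted p-value and whether it remains below alpha."""
    adjusted = min(1.0, p_value * n_comparisons)
    return adjusted, adjusted < alpha
```

Applied to the reported κB comparison (p = 0.0437) across the five free parameters, `bonferroni(0.0437, 5)` gives an adjusted p-value of about 0.22, which is above the conventional 0.05 threshold. Under this assumed method the result is consistent with the statement that the difference did not survive correction.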

Exploratory analyses of relationship between task performance and delusional symptoms
These exploratory analyses focused on the question of whether the computational parameters showing significant group differences (i.e. β0 in FEP and κB in TRS) showed any relationship to clinical features.

Higher κB in TRS associated with presence of delusions
Due to the bimodal distribution of delusional symptoms within the TRS sample, patients were divided into two sub-groups based on delusional symptom severity. Average κB was significantly lower in patients with non-delusional TRS (PANSS P1 = 1) than in those with delusional TRS (PANSS P1 = 3-4) (one-way ANOVA: df = 1,18; F = 5.8; p = 0.027).

Discussion
Across two studies we examined probabilistic reversal learning in three clinical groups. Study 1 characterised learning in at-risk mental state (ARMS) and first-episode psychosis (FEP). Study 2 examined people with what has been termed treatment-resistant schizophrenia (TRS), i.e. participants diagnosed with schizophrenia and treated with clozapine due to a lack of efficacy of standard antipsychotic treatments. We used behavioural and computational analyses to characterise alterations in learning and choice. We are cautious about the computational analyses for reasons discussed below, and will begin by discussing the behavioural findings across the two studies.
The behavioural results suggest a distinction between ARMS, FEP and TRS phases of psychotic experience, in terms of how patients within these groups approach learning and decision-making under uncertainty. While FEP was associated with a significantly reduced overall accuracy level, there was no significant group difference in overall propensity to make risky choices, despite a numerical tendency towards riskier behaviour in the ARMS group, whose accuracy level was intermediate between FEP and controls and differed at trend level from the control group. Conversely, TRS participants did not differ on overall accuracy from their matched control group, but did show a significantly greater tendency to make riskier choices.
A more detailed analysis of trial-to-trial behaviour considered tendencies to shift from one behavioural strategy for cue-guided choice to another, in response to probabilistic feedback. Here, we observed that FEP participants were more prone than controls to switch away from optimal responding, after experiencing both more desirable and less desirable outcomes. They did not show an increased tendency to switch away from a sub-optimal pattern of choice, irrespective of the desirability of the outcome on the previous trial. This may be at the root of their reduced accuracy, and accords with the behavioural impact observed in healthy participants undergoing acute ketamine challenge. Conversely, patients with TRS showed increased behavioural "switch" tendencies irrespective of the optimality of their previous choice or the desirability of its outcome, suggesting reduced stability of responding which is also a characteristic of chronic schizophrenia. Interestingly, TRS participants showed no concomitant reduction in overall accuracy compared to controls.
Overall, the behavioural findings in the clinical groups echo the previous observation that ketamine was associated with a reduced ability to stabilise responding within a probabilistic environment in which even optimal responses will occasionally be followed by less desirable outcomes. The apparently paradoxical observation that the TRS participants, whose choice behaviour was markedly more changeable (or less stable) than matched controls', nevertheless showed preserved accuracy may be accounted for by these patients' general tendency to switch towards as well as away from optimal responses, which contrasts with the FEP group's more specifically elevated sensitivity to probabilistic undesirable feedback following optimal choices. It is possible too that the generally increased flexibility (or reduced stability) of the TRS group's responses had a mitigating effect on the disruptive effect of probabilistic reversals which occurred on three occasions over the study session. That is, relatively unstable responding can confer a brief advantage when the environment is volatile.
The ensuing computational analyses sought to determine the deeper processes underpinning more superficial behavioural observations, and focused on whether the TRS and FEP groups' behavioural resemblance to healthy participants under acute ketamine administration on this task was accompanied by comparable alterations in the computational parameters associated with ketamine. In brief, the prior work 26 suggested that ketamine infusion reduced the capacity to stay with an optimal behavioural policy when confronted with probabilistic undesirable outcomes, and computational modelling indicated a reduction in the degree to which meta-level confidence modulated lower-level reinforcement learning parameters so as to promote stricter adherence to policies that were, based on recent performance, more likely to be optimal.
As in the ketamine study, the computational model most successfully capturing choice data for Studies 1 and 2 suggested that all groups approached the task by maintaining an estimate of meta-level confidence in their lower-level beliefs, and updated confidence dynamically using outcome (un)desirability as a teaching signal, with confidence growing in proportion to the number of recent desirable response-feedbacks. However, it should be noted that this winning model included multiple free parameters (in order to fully recapitulate the previous study's analysis), which meant that parameter recovery was overall poor. Our discussion below is therefore tempered with caution.
Our prediction was that the ARMS and FEP groups would show computational alterations resembling healthy participants treated with acute ketamine. However, this was not the case: while ketamine was primarily associated with reduced influence of current confidence over learning and behavioural policy, Study 1 showed that confidence-weighting parameters were not altered in ARMS or FEP compared to controls. Instead, the free parameter which in this hierarchical winning model corresponds to 'baseline' choice temperature, β0, was significantly elevated in FEP (and numerically, but not significantly, in ARMS) compared to controls. Choice temperature may signify relative randomness of responding 36, and the increase in this parameter among FEP participants accords with our behavioural analysis demonstrating a relative failure to consistently maintain an optimal response pattern, an effect that was even more pronounced in TRS, the latter according with the conclusions of a recent systematic analysis 25. By contrast, despite their more advanced illness stage, the TRS group (unlike ARMS and FEP) showed a computational perturbation resembling that produced by ketamine: namely, a reduction in the degree to which current confidence level (as estimated by recent performance success) modulated choice temperature. (It should be noted that the ketamine model comparison findings differed in one respect from those of the present two studies, in that they did not differentiate two separate weighting factors for confidence modulation of learning rate and choice temperature 26.)
However, we emphasise here that, as mentioned above, while simulation-based analysis showed that recovery of the β0 parameter (altered in FEP) was good, it was poor for the parameter capturing the altered confidence-modulation in TRS (κB), a failure that appears common to many papers in the field as parameter recovery is often not reported on 25. This is an important area for improvement in future studies as successful parameter recovery is essential for the development of valid and interpretable models 37. Furthermore, the difference in this confidence-modulation parameter between TRS and matched controls did not survive correction for multiple comparisons. Thus, while it is striking that the pattern in FEP and ARMS groups differed from that under ketamine, the apparent computational resemblance between ketamine and TRS should be treated cautiously.
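The logic of a simulation-based parameter recovery check can be sketched as follows: choices are simulated from known parameter values, the model is refitted to the simulated choices, and the true and fitted values are compared. The example below uses a deliberately simplified one-parameter choice rule (a logistic function of a value difference divided by a temperature), not the five-parameter hierarchical model used in this paper. The function names, parameter ranges, trial counts and grid-search fitting are all assumptions chosen to keep the illustration short.

```python
import numpy as np


def simulate_choices(tau, value_diffs, rng):
    """Simulate binary choices from a logistic rule with temperature tau."""
    p = 1.0 / (1.0 + np.exp(-value_diffs / tau))
    return rng.random(value_diffs.shape) < p


def fit_tau(choices, value_diffs, grid):
    """Maximum-likelihood temperature found by grid search."""
    best_tau, best_ll = grid[0], -np.inf
    for tau in grid:
        p = 1.0 / (1.0 + np.exp(-value_diffs / tau))
        p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
        ll = np.sum(np.where(choices, np.log(p), np.log(1 - p)))
        if ll > best_ll:
            best_tau, best_ll = tau, ll
    return best_tau


def recovery_correlation(n_agents=50, n_trials=400, seed=0):
    """Correlate generating and refitted temperatures across simulated agents."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.1, 3.0, 59)
    true_tau = rng.uniform(0.2, 2.0, n_agents)
    value_diffs = rng.uniform(-1.0, 1.0, n_trials)
    fitted = np.array([
        fit_tau(simulate_choices(t, value_diffs, rng), value_diffs, grid)
        for t in true_tau
    ])
    return true_tau, fitted, np.corrcoef(true_tau, fitted)[0, 1]
```

A high correlation between generating and fitted values indicates that the parameter is identifiable from data of this size, whereas a weak correlation, as reported here for κB, means that group differences in the fitted parameter are hard to interpret.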
Our experimental approach across these studies follows a growing interest in seeking to understand delusional beliefs in relation to observations from associative learning research. It is challenging to relate complex symptoms like delusions to underlying cognitive processes and, in this regard, insights from associative learning have proven attractive and useful by offering simple models of how an agent samples evidence from its environment and uses this to derive inferences about the associative regularities structuring that environment. This provides a powerful framework for developing theories of delusion formation. Early work by Miller, inspired by associative learning, considered psychosis in terms of a lowering of the threshold of evidence for updating beliefs based on new observations, leading to a "hyperactivity of associations" 38. Learning theory was also central to powerful cognitive models of schizophrenia developed by Hemsley, Gray and others 39, and subsequent neuroimaging work built on this to establish the presence of underlying perturbations in prediction error as a possible explanation for aberrations in belief updating in the context of associative learning tasks 11,12,40. The link to prediction and prediction error, and to their underlying neurobiology, has been demonstrated across a range of tasks and techniques 40-44. This body of work shares the perspective that it is instructive to conceptualise delusions in terms of the integration between predictions and sensory evidence 4,5,14,45,46, and this has latterly included a consideration of how precision-based weighting of signals may provide a more complete framework for thinking about when and how new evidence (or prediction error) is used to update existing beliefs.
While a more comprehensive discussion of learning tasks in psychosis is beyond the scope of this paper, a recent systematic review of this literature suggested that, in reversal tasks such as ours, psychosis is associated with difficulties in reacting to changing contingencies: a phenomenon perhaps underpinned by relative insensitivity to environmental volatility, and by enhanced responses to irrelevant information 25. Our findings are consistent with this, though they indicate that this characteristic pattern of disruption, in the context of different stages or forms of psychosis, may emerge from quite different underlying perturbations. Behaviourally, FEP and TRS are both associated with difficulties in sticking to an optimal strategy when it is challenged by unexpected and undesirable outcomes. Importantly, in TRS this general lability was reflected in elevated adaptive, as well as maladaptive, policy-shifting behaviour compared to controls. The disparities between the patterns of computational parameters, in the pharmacological model of early psychosis 26,47 compared to Study 1's ARMS and FEP patients, may reflect experimental differences: the task was administered twice in the previous study (once under ketamine and once under placebo), and although order was counter-balanced, findings might have been affected by between-session differences in participants' overall familiarity with the task. Another explanation for the discrepancy between the clinical and pharmacological model computational findings is that ketamine was administered as a planned, acute, transient experience whereas the experiences of participants with ARMS and FEP have developed gradually, with corresponding adaptations in how these individuals update their beliefs over the course of weeks and months. While this differing temporal profile of experiences might account for the differences between ketamine and FEP, such an explanation raises the question of why people with TRS (whose experience of psychotic illness has been even more persistent and prolonged) show computational alterations that do resemble those observed under acute ketamine: that is, TRS was likewise associated with reduced confidence modulation of choice temperature.
One speculative explanation is that progression from an acute (FEP) to a more chronic (TRS) state involves gradual adaptation to "persistent doubt" 26. An initial period (in FEP) of more random choice and behaviour 48,49 could lead to increased doubt about one's ability to accurately predict the world. This would render confidence a poorer assay of how reliably true one's current beliefs are likely to be. That is, as psychosis becomes more established, perhaps a shift could occur from doubting (and therefore updating) one's current model of the world 26,50, to doubting whether one can actually model the world successfully 51-53. Confidence in one's beliefs may then no longer predict that choices based on these beliefs will reliably yield expected outcomes 54. This, in turn, could manifest as partial uncoupling of confidence from choice temperature, as we observed in TRS. To put things simply, if FEP is characterised by a search for new priors to better predict the world, perhaps TRS is characterised by a sense that updating priors adds little to the world's predictability.
From another perspective, our findings are also consistent with the idea that TRS represents a distinct subtype of schizophrenia 55. Schizophrenia is defined as 'treatment-resistant' in cases where symptoms are not responsive to treatment with two or more antipsychotic (dopamine D2-receptor antagonist) medications. Thus, in contrast to treatment-responsive forms of psychosis, it may be secondary to a non-dopaminergic pathology, mediated more by glutamatergic dysfunction. In TRS, NMDA-R hypofunction is believed to play a greater ongoing role in symptom maintenance, via its consequent hyperglutamatergic cortical state 56,57.
Overall, we replicated a previous study design and analysis to explore whether distinct stages of clinical psychosis might resemble the effects of an acute ketamine challenge on learning and decision-making under uncertainty. From a phenomenological perspective, ketamine's effects appear more redolent of prodromal and early psychosis, and we did indeed observe that FEP was associated with a relative failure to stabilise choice behaviour in the face of probabilistic challenges. This instability of choice was more widespread in TRS, and it was this more chronically unwell group who showed a greater resemblance to ketamine effects in the computational analysis. While, as discussed, we are cautious in interpreting the computational findings, our study shows the importance and value of thinking about psychosis in terms of its evolution over time, since there are clearly both behavioural and computational distinctions between early and later stages of the condition.

Ethics declaration
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. Full informed consent from all participants was obtained in writing.
All experiments were designed and conducted in full accordance with local guidelines and regulations governing psychiatric research studies including human participants.

Study 2 sample (TRS, HC2)
Data were obtained from n = 60 participants, of whom n = 31 were patients with chronic treatment-resistant schizophrenia (TRS) recruited from the NHS Clozapine Clinic in Cambridge. Inclusion criteria for the TRS group were (1) age 18-65 years, (2) no major physical or substance use comorbidities, (3) no changes to clozapine treatment within at least three months. Matched healthy control (HC2 group) participants were recruited from online bulletins, and through fliers posted in public spaces (e.g. community noticeboards and local businesses). HC2 group inclusion criteria were: (1) age and demographic match to existing TRS patient sample, (2) no history of psychiatric illness, (3) no major physical conditions and/or psychotropic medication use.
Of the 40 patients recruited for Study 2, useable datasets were obtained from 31 (six patient datasets were lost due to technical issues, and three were excluded during quality control due to evident failure to understand or implement the task instructions). One healthy participant was unable to complete the task due to a technical issue during testing, and a replacement HC2 participant was subsequently recruited as a suitably matched control for the relevant patient dataset.
All participants (across both studies) were able to speak and understand written English, gave full written informed consent, and were reimbursed for their time and effort (Tables 1, 2, 3).

Task description
Each trial began with a fixation cross (~500 ms), after which one of two cues ('A' or 'B') was presented for ~1750 ms. A question mark then appeared, prompting the participant to respond (within a window of opportunity of ~1500 ms), after which a printed reminder of their choice ("Risky" or "Less Risky") was displayed for ~750 ms. Finally, the outcome (monetary amount won or lost) was displayed for ~1750 ms. Cues A and B, for each participant, were two randomly selected elements from a set of 24 different Agamothodeian font characters.
On each of 240 trials, one of these cues was randomly selected for presentation, and participants reported their choice between "risky" and "less risky" options by making one of two alternative motor responses corresponding to "risky" (£1) and "less risky" (10p) gambles on the trial's outcome.
In Study 1, the relationship between "button press" (left/right) and "choice" (risky/less risky) was randomised across participants within each group. In Study 2, the task was delivered on a Dell 15.6″ laptop (rather than in the scanner) and participants reported choices using key-presses ('b'/'z') whose corresponding choice options were likewise randomised within HC2 and TRS groups.
Participants' choices controlled only the variance of the outcome (i.e. whether it would involve a magnitude of 10p or £1), and had no influence over its valence (i.e. whether it was likely to be a 'win' or a 'loss'). Cues signified the probability of outcomes: one cue predicted an 80% chance of winning, while the other cue predicted an 80% chance of losing. Thus, one cue indicated that the optimal choice was the "Risky" option (betting £1) while the other cue indicated the "Less Risky" option (betting only 10p) was optimal.
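The optimality claim above can be checked with a short expected-value calculation. The following is a minimal sketch, not the authors' analysis code: the function and variable names are hypothetical, and it assumes a symmetric payoff in which a win pays the stake and a loss costs the stake, which the task description implies but does not state explicitly.

```python
# Illustrative sketch only: names are hypothetical, and the symmetric payoff
# (win = +stake, loss = -stake) is an assumption, not stated in the source.
STAKES = {"risky": 1.00, "less_risky": 0.10}  # stake in pounds
P_WIN = {"good": 0.8, "bad": 0.2}             # P(win | cue) from the task description


def expected_value(cue, choice):
    """Expected monetary return of a choice given the current cue type."""
    p_win = P_WIN[cue]
    stake = STAKES[choice]
    return p_win * stake - (1.0 - p_win) * stake


def optimal_choice(cue):
    """Choice with the higher expected return for this cue type."""
    return max(STAKES, key=lambda choice: expected_value(cue, choice))


if __name__ == "__main__":
    for cue in P_WIN:
        for choice in STAKES:
            print(cue, choice, round(expected_value(cue, choice), 2))
        print("optimal for", cue, "cue:", optimal_choice(cue))
```

Under these assumptions the larger stake scales up whichever expectation the cue implies, so the risky bet is better when a win is likely and the smaller bet limits losses when a loss is likely.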
A given cue-outcome contingency (e.g. P(loss|A) = P(win|B) = 0.8) was consistent for 60 trials before reversing. In short, this design examined participants' ability to learn what each cue indicated was the optimal response, to persist with this strategy despite occasional "misleading" undesirable outcomes, and to flexibly alter that policy whenever contingencies reversed.
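The reversal structure can be illustrated with a small schedule generator. This is a hedged sketch under stated assumptions rather than the task code used in the studies: the names are hypothetical, the starting assignment (cue B as the 'good' cue) simply follows the example contingency given above, and outcomes are drawn independently on each trial, whereas the actual task may have fixed the proportion of misleading trials within each block.

```python
import random

N_TRIALS = 240     # total trials, from the task description
BLOCK_LENGTH = 60  # trials between contingency reversals
P_LIKELY = 0.8     # probability of the cue-consistent outcome


def make_schedule(seed=0):
    """Generate one hypothetical session of cues, outcomes and optimal choices."""
    rng = random.Random(seed)
    trials = []
    for t in range(N_TRIALS):
        block = t // BLOCK_LENGTH
        # Assumption: cue B starts as the 'good' cue; roles swap every block.
        good_cue = "B" if block % 2 == 0 else "A"
        cue = rng.choice(["A", "B"])  # each cue equally likely on every trial
        p_win = P_LIKELY if cue == good_cue else 1.0 - P_LIKELY
        # Assumption: independent draws, so misleading trials average 20%.
        outcome = "win" if rng.random() < p_win else "loss"
        optimal = "risky" if cue == good_cue else "less_risky"
        trials.append({"trial": t, "block": block, "good_cue": good_cue,
                       "cue": cue, "outcome": outcome, "optimal": optimal})
    return trials


if __name__ == "__main__":
    schedule = make_schedule(seed=1)
    reversals = [tr["trial"] for prev, tr in zip(schedule, schedule[1:])
                 if prev["good_cue"] != tr["good_cue"]]
    print("reversal trials:", reversals)
```

With 240 trials and a reversal every 60 trials, a session contains four blocks and three reversals, so a participant has to relearn the cue meanings three times.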
In behavioural analyses, the undesirability of an outcome (relative to the counterfactual outcome that would have occurred had the participant chosen the alternative wagering option) was used as a behavioural proxy of "surprise" associated with it. Assuming that choosing the risky (less risky) gamble corresponds to the subject's

Figure 5. Upper Panel: Schematic illustration of a single trial ("good" cue). At cue onset, one of two abstract symbols is presented (P(cue A) = 0.5; P(cue B) = 0.5). Supposing the cue illustrated is currently the 'good' cue, then the 'risky' choice (to bet £1) is optimal, and the illustrated outcome is the most likely result of that choice (P('win'|'good' cue) = 0.8). Lower Panel: Schematic illustration of a single trial ("bad" cue). At cue onset, one of two abstract symbols is presented (P(cue A) = 0.5; P(cue B) = 0.5). Supposing the cue illustrated is currently the 'bad' cue, then the 'less risky' choice (to bet only ten in-game pence) is optimal, and the illustrated outcome is the most likely result of that choice (P('loss'|'bad' cue) = 0.8).

Table 4. Among those 24 models which performed either kind of confidence-monitoring (i.e. either "surface" or "surprisal" based confidence updating), the effect(s) of the confidence representation on learning and/or decision-making systematically varied to define a third axis of model classification.

Figure 2. Boxplots showing the median, interquartile range, and range of Study 1 subjects' baseline choice temperature free parameter (β0) under the winning computational model, plotted separately for Healthy Control (HC1), At-Risk Mental State (ARMS), and First Episode Psychosis (FEP) participants. Any outlying values of β0 within a group are plotted individually. Compared to the control group (HC1), there was a significant elevation of β0 in patients with FEP, and a trend-level tendency towards elevated β0 in patients with ARMS.

Figure 3. Boxplots showing median, interquartile range, and range of key behavioural outcomes from Study 2, plotted separately for Healthy Control (HC2), and Treatment-Resistant Chronic Schizophrenia (TRS) groups. Any outlying data points within a group are plotted individually. Top left: Average accuracy did not differ significantly between HC2 and TRS. Top right: Average proportion of all responses that were 'Risky' was significantly higher in TRS than HC2. Middle left: Compared to controls, TRS is associated with significantly reduced tendency to adaptively stay with an optimal strategy following high-probability (Informative, 'Inf') desirable feedback to a correct response. Middle right: Compared to controls, TRS is associated with significantly reduced propensity to inappropriately stay with a suboptimal strategy following low-probability (Misleading, 'ML') desirable feedback to an incorrect response. Bottom left: Compared to controls, TRS is associated with significantly elevated tendency to adaptively switch away from a suboptimal strategy following informative ('Inf') undesirable feedback (left) to an incorrect response. Bottom right: Compared to controls, TRS is associated with significantly elevated propensity to inappropriately switch away from an optimal strategy following Misleading ('ML') undesirable feedback to a correct response.

Figure 4. Boxplots showing the median, interquartile range, and range of Study 2 subjects' values of κB (the free parameter corresponding to the weighting factor for trialwise confidence-modulation of choice temperature, under the winning computational model) plotted separately for Treatment-Resistant Chronic Schizophrenia (TRS) and Healthy Control (HC2) groups. Any outlying values of κB within a group are plotted individually. Average κB was significantly lower in TRS than HC2.
Study 1 was part of the Neuroscience in Psychiatry Network (NSPN) Neuroscience Clinical Adolescent and Adult Psychiatry Study (NCAAPS), under the approval of West of Scotland Research Ethics Committee 3 (NSPN) and Cambridgeshire 3 National Health Service Research Ethics Committee. Joint institutional sponsorship was provided by Cambridgeshire and Peterborough NHS Foundation Trust (CPFT) and University of Cambridge. Study 2 was conducted under the approval of Cambridgeshire 3 National Health Service Research Ethics Committee, and jointly sponsored by CPFT and University of Cambridge.

https://doi.org/10.1038/s41598-024-68004-7

Table 1. Mean (and standard deviation, s.d.) ages for participants in TRS and HC2 groups, and for all Study 2 participants (male and female) within each of those two groups, are indicated.

Table 2. Mean and standard deviation (s.d.) clozapine dose in TRS group. Indicated are numbers of patients with TRS taking additional typical antipsychotics (AP), the typical AP types taken, and typical AP daily dose equivalent to oral aripiprazole (mg) (the most frequent typical AP in this TRS sample).

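Table 3. Mean and standard deviation (s.d.) for illness duration (years from onset of first psychotic episode), cognitive function assayed by the Brief Assessment of Cognition in Schizophrenia (BACS), and overall severity of symptom dimensions assayed by the PANSS structured clinical interview 34,59. *Lower BACS z-score corresponds to more severe cognitive impairment in TRS.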