Spared performance but increased uncertainty in schizophrenia: Evidence from a probabilistic decision-making task

Aberrant attribution of salience to events that are in fact hardly informative might explain the emergence of positive symptoms in schizophrenia and has been linked to belief uncertainty. Uncertainty is thought to be encoded by neuromodulators, including norepinephrine. However, norepinephrinergic encoding of uncertainty, measured as task-related pupil dilation, has rarely been explored in schizophrenia. Here, we addressed this question by comparing individuals with a disorder from the schizophrenia spectrum to a non-psychiatric control group on behavioral and pupillometric measures in a probabilistic prediction task, where different levels of uncertainty were introduced. Behaviorally, patients performed similarly to controls, but their belief uncertainty was higher, particularly when instability of the task environment was high, suggesting an increased sensitivity to this instability. Furthermore, while pupil dilation scaled positively with uncertainty, this was less the case for patients, suggesting aberrant neuromodulatory regulation of neural gain, which may hinder the reduction of uncertainty in the long run. Together, the findings point to abnormal uncertainty processing and norepinephrinergic signaling in schizophrenia, potentially informing future development of both psychopharmacological therapies and psychotherapeutic approaches that deal with the processing of uncertain information.


Introduction
Aberrant salience attribution to insignificant events has been suggested to explain various symptoms in schizophrenia, including positive symptoms such as delusions (Kapur, 2003) and cognitive biases such as 'jumping-to-conclusions', where patients typically make or alter decisions based on little evidence (Speechley et al., 2010). Recent theories propose that salience is affected by uncertainty (Adams et al., 2013; Broyd et al., 2017; Fletcher and Frith, 2009). Here, increased attribution of salience ('hypersalience') to external information may result from increased uncertainty surrounding cognitive representations in the mind's belief hierarchy. Consequently, perception and belief updating are biased towards external information and sensory events as opposed to prior beliefs, explaining the experience of 'strange percepts' in a state of delusional mood (Adams et al., 2013). Delusions may then manifest as an attempt to give meaning to these 'strange percepts' (Fletcher and Frith, 2009). Increased belief uncertainty might further explain why patients with schizophrenia often exhibit maladaptive switching behavior in probabilistic reversal learning tasks (Culbreth et al., 2016a; Kaplan et al., 2016; Li et al., 2014; Murray et al., 2008; Schlagenhauf et al., 2014; Waltz et al., 2013). In these tasks, participants have to learn which choice option is more likely to result in a positive outcome and have to adapt their choices once the choice-outcome probability reverses. A positive outcome should encourage staying with the previous choice, whereas a negative outcome might either reflect the current choice-outcome probability (i.e. the system's intrinsic noise), in which case it should be disregarded, or indicate a change in probabilities, hence encouraging a choice switch.
Increased choice switching observed in schizophrenia often occurs in response to both positive and negative outcomes (Culbreth et al., 2016a; Deserno et al., 2020; Waltz et al., 2013), though some have reported a decreased sensitivity particularly to positive feedback (Li et al., 2014; Schlagenhauf et al., 2014). Patients' impaired performance in these tasks may indicate either deficient learning about choice-outcome probabilities (henceforth noise; Murray et al., 2008; Reddy et al., 2016; Weickert et al., 2010), or an overestimation of the likelihood for those probabilities to change (volatility) (Cole et al., 2020; Deserno et al., 2020; Schlagenhauf et al., 2014), and possibly both (Waltz et al., 2013). A misrepresentation of these different types of uncertainties (noise and volatility) may hence cause patients with schizophrenia to attribute too much salience to a given outcome, resulting in increased switching between the different choice options even when it is not beneficial. Clearly, increased salience attribution to new events may alternatively or additionally manifest in higher uncertainty and instability of beliefs.
Mechanistically, hypersalience in schizophrenia has been linked to dysfunctional dopaminergic signaling (Heinz and Schlagenhauf, 2010), but the role of norepinephrine is less explored, despite its suggested association with uncertainty processing (Yu and Dayan, 2005). Norepinephrinergic activity in the locus coeruleus is reflected in pupil size (Joshi et al., 2016; Rajkowski et al., 1994; Samuels and Szabadi, 2008) and indeed, task-related pupil dilation responds to both outcome surprise and environmental volatility (Browning et al., 2015; Lawson et al., 2017; Nassar et al., 2012; Preuschoff et al., 2011), scales with the extent to which an outcome should evoke belief updating (Hämmerer et al., 2019), and signals fluctuations in neural gain and learning (Eldar et al., 2013). Early studies showed that pupil size scales less with the probabilities of presented stimuli in individuals with schizophrenia (Steinhauer and Zubin, 1982; Steinhauer et al., 1979), indicating a reduced adaptation of neural gain to uncertainty. However, it is unclear how this diminished pupil response would be affected by volatility and uncertainty as experienced by the individual. Furthermore, group differences in choice switching, and the extent to which they are affected by volatility, may depend on the particular noise conditions of the task. While the most commonly chosen choice-outcome probabilities are 0.20 and 0.80 (Culbreth et al., 2016a; Deserno et al., 2020; Waltz and Gold, 2007; Waltz et al., 2013), the differential effects of other noise conditions and their interaction with volatility remain to be explored.
To address the above questions, we compared individuals with a disorder from the schizophrenia spectrum to a non-psychiatric control group in a probabilistic prediction task where noise and volatility were manipulated independently. Using cognitive-computational models, we estimated uncertainty-related parameters and latent variables behind the observed behavior, and investigated their relationship with clinical symptoms and pupil dilation. Here, we expected to observe increased choice switching and an overestimation of volatility in patients, whereas pupil responses in this group were presumed to scale less with uncertainty and the extent to which a new outcome signals belief updating, indicative of a maladaptive adjustment of neural gain to the degree to which new events are salient and should lead to internal model updates.

Methods and materials
Participants had to meet the following inclusion criteria: (1) 18 to 65 years old, (2) capacity for informed consent, (3) very good command of German, (4) IQ above 80, (5) normal or corrected-to-normal eyesight, (6) no history of neurological disorders, (7) no substance dependence, (8) no recreational drug consumption within one week prior to the assessment (excluding alcohol, nicotine, and caffeine), (9) a primary diagnosis of schizophrenia or schizoaffective disorder (SZ group; DSM-5; American Psychiatric Association, 2013) or no psychiatric diagnosis at all (HC group), verified with the Mini-International Neuropsychiatric Interview (MINI; Sheehan et al., 1998). The SZ group included in- and outpatients from the Department of Psychiatry and Psychotherapy of the University Medical Center Hamburg-Eppendorf (UKE), Germany, who were contacted directly or replied to announcements made on site. Control participants were recruited via student job websites and advertising leaflets. In total, 62 participants (SZ: n = 32, HC: n = 30) were recruited, of whom one was excluded from all analyses because they failed to meet the inclusion criteria. The study was approved by the local ethics committee of psychologists at the UKE. All participants gave written informed consent prior to the study.

Probabilistic prediction task
To measure decision-making and belief updating under different noise and volatility conditions, a newly developed probabilistic prediction task was administered (Kreis et al., 2020b, preprint). On each trial, participants had to predict whether an upcoming Gabor patch would be tilted to the left or the right from the center (left-alt key for 'left-tilted', right-ctrl key for 'right-tilted'; orientation ±45°; see Fig. 1A). The probability for the left- or the right-tilted patch was unknown to the participants and alternated between 85:15 (indicating the outcome schedule, namely 85% left-tilted and 15% right-tilted) and 60:40 and the reverse (15:85, 40:60) after 20 (±4) trials, constituting conditions of high (60:40/40:60) and low noise (85:15/15:85; Fig. 1B). Participants were instructed to track the probabilities and the changes as well as possible and to minimize the number of prediction errors. In a first, volatile block of the task, probability changes were hidden, and in a second, cued block, changes were announced, constituting conditions of high (volatile) and low (cued) volatility, each spanning 160 trials (+12 and 18 practice trials, respectively). For the cued block, participants were advised to 'reset' their beliefs about the distribution of stimuli at every announced change point, and relearn the new underlying distribution through choice-outcome observations. While the order of the noise conditions was the same for both blocks and across participants to ensure the same reward structure across blocks, the identity of the majority Gabor patch was inverted (Fig. 1B). Since time points of changes were identical in both blocks but explicitly announced in the cued block, block order was not counterbalanced to prevent facilitation of the detection of hidden changes in the volatile block.
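The structure of such an outcome schedule can be illustrated with a minimal sketch. The segment order, the segment lengths and the independent sampling of each outcome are assumptions made purely for illustration; the study's actual schedule is the one shown in Fig. 1B, and the function name `make_schedule` is hypothetical.

```python
import random

def make_schedule(segments, seed=0):
    """Return one outcome ('left' or 'right') per trial.

    segments: list of (p_left, n_trials) pairs, one per stable period.
    Outcomes are drawn independently here, which is an assumption.
    """
    rng = random.Random(seed)
    outcomes = []
    for p_left, n_trials in segments:
        for _ in range(n_trials):
            outcomes.append("left" if rng.random() < p_left else "right")
    return outcomes

# Hypothetical block: eight periods of 20 +/- 4 trials, 160 trials in total.
# Order and lengths are illustrative, not the study's actual schedule.
segments = [(0.85, 20), (0.40, 16), (0.15, 24), (0.60, 20),
            (0.15, 18), (0.60, 22), (0.85, 20), (0.40, 20)]
outcomes = make_schedule(segments)
print(len(outcomes))
```

In this sketch, a cued block would reuse the same change points and announce each of them, whereas a volatile block would leave them hidden.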
Importantly, and in line with previous studies on volatility overestimation in psychotic disorders (Cole et al., 2020;Deserno et al., 2020), the term 'volatility' as used here describes the subjectively perceived volatility (instability) of the environment. It should be noted, however, that some authors differentiate between the volatility of the environment and the subjective unexpected uncertainty that this volatility induces (e.g. Soltani and Izquierdo, 2019).
Behavioral task performance was measured as accuracy (proportion of times where the current majority stimulus was predicted, calculated separately for the two noise conditions per block) and proportion of choice switches (proportion of times where prediction on trial t + 1 was different from prediction on trial t, calculated separately for the two noise conditions per block; see Fig. 1C).
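As a minimal sketch of the two behavioral measures, the following functions compute accuracy and the proportion of choice switches for a single sequence of predictions. The function names and the example data are hypothetical, and the split by noise condition and block described above is left out for brevity.

```python
def accuracy(predictions, majority):
    """Proportion of trials on which the current majority stimulus was predicted."""
    hits = sum(p == m for p, m in zip(predictions, majority))
    return hits / len(predictions)

def switch_proportion(predictions):
    """Proportion of trials t on which the prediction at t + 1 differs from t."""
    pairs = list(zip(predictions, predictions[1:]))
    return sum(a != b for a, b in pairs) / len(pairs)

# Hypothetical example: six predictions while 'left' is the majority stimulus.
predictions = ["left", "left", "right", "left", "left", "left"]
majority = ["left"] * 6
print(accuracy(predictions, majority))   # 5 of 6 predictions match the majority
print(switch_proportion(predictions))    # 2 of 5 consecutive pairs differ
```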

Working memory task: visual digit span task
To control for inter-individual differences in working memory capacity, a visual, computerized version of the digit span subtest of the Wechsler adult intelligence scale (WAIS-IV; Wechsler, 2008) was administered (for details see Kreis et al., 2020a). Working memory capacity was measured as the maximum number of digits recalled in the correct order.

Clinical assessments and demographics
Demographic and clinical variables (see Table 1) were recorded during an interview. The MINI (Sheehan et al., 1998) was applied to confirm the self-reported information about the presence (SZ group) or absence (HC group) of clinical diagnoses. Within the SZ group, positive and negative symptoms were assessed with the Positive and Negative Symptoms Scale (PANSS; Kay et al., 1987). Negative symptom scores were calculated as suggested by van der Gaag et al. (2006; subsequently PANSS-N_vdGaag). To estimate premorbid intelligence, the German multiple choice vocabulary test (WST; Lehrl et al., 1995) was administered.

Pupil size
Pupil diameter was recorded from the left (in seven cases from the right) eye at a sampling rate of 500 Hz with an infrared video-based eye tracker (Eyelink 1000, SR Research) during the prediction task.

Procedure
First, demographic and clinical variables were recorded. Next, the volatile block of the prediction task was administered, followed by the working memory task, a brief decision-making task (not reported here) and the WST. Then, the cued block of the prediction task was completed. Administration of the measures between the two blocks of the prediction task was supposed to reduce any potential strain associated with the eye tracking set up and to help participants start the cued block with a new mindset, reducing potential priming effects of the experiences within the volatile block. At the end of the session, the clinical assessment was conducted with the MINI and the PANSS.

Analysis
To test for the relevance of potential covariates, the SZ and HC groups were compared regarding age, education, premorbid verbal intelligence and working memory capacity, using non-parametric methods when variables were not normally distributed. To investigate the relationships between task conditions, group membership, behavioral performance, pupil dilation, and latent variables as extracted from cognitive-computational models, linear mixed-effects models were implemented. Their residuals were tested for normality and dependent variables were cube root, square root or square transformed if normality was violated. Group-level parameters from the winning cognitive-computational model (estimated using the hierarchical Bayesian approach) were compared between groups and task conditions by contrasting their posterior sampling distributions. Associations between symptoms and cognitive-computational parameters were tested with Spearman correlations (ρ) under conditions of non-normality. Testing was conducted with a significance level of 0.05 using R (R version 3.5.1; R Core Team, 2018).

Cognitive-computational modelling of behavior
To quantify latent cognitive processes, various cognitive-computational models were fitted to participants' predictions (i.e. 'left' or 'right') and observed outcomes (i.e. correct or incorrect) for the volatile and the cued block, respectively, and separately for the SZ and the HC group. This approach has the benefit that both group level and individual level are simultaneously accounted for. Importantly, the same priors were used in the SZ and the HC models, respectively, so that any group differences that may emerge in model parameters after fitting must be due to underlying group differences. The models included a win-stay-lose-shift model (Worthy and Todd Maddox, 2014), four different Reinforcement Learning models (den Ouden et al., 2013; Gläscher et al., 2008; Pearce and Hall, 1980; Rescorla and Wagner, 1972), and two variants of a Hidden Markov Model (HMM; Schlagenhauf et al., 2014), all chosen to allow for the fact that participants might employ different strategies when solving the task (see Supplementary material for details). For the cued block, additional variants of all models were specified that incorporated belief resets whenever a change in probabilities was announced.

Fig. 1. Probabilistic prediction task. A) Trial structure: Example trials 1, 2 and 21 are displayed. Each trial started with the presentation of a vertically striped Gabor patch. Participants then had to predict via a button press whether the upcoming patch was going to be either left- or right-tilted from the center. After a fixed two-second delay, the outcome was presented and remained on screen for another 2 s. Then the vertical patch reappeared, prompting the next trial/prediction. Within the cued task block, changes in noise conditions were announced at the beginning of the respective trial (see B) through a 'change' message that appeared on screen. No further information was provided about the nature of the upcoming noise condition. Participants had to press 'enter' in response to that change message before they could continue with the task in order to guarantee that they perceived it. B) Task structure: the probabilities for the left- (p(left)) and the right-tilted (1 − p(left)) Gabor patch changed at fixed time points after 20 ± 4 trials. In the volatile block (solid, red line), these changes were hidden, and in the cued block (dashed, black line) they were announced (see A). Whereas the timing of change points and the order of the different noise conditions were identical across blocks (lines are only jittered for display), the identity of the respective majority stimulus within a block was inverted. C) Proportion of accurate predictions (prediction of current majority stimulus; left panels) and proportion of choice switches (prediction on trial t + 1 is different from prediction on trial t; right panels) for each group on trials of high and low noise within the volatile and the cued block of the task. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Models were estimated using Markov chain Monte Carlo (MCMC) sampling within the hierarchical Bayesian framework (Ahn et al., 2017; Gelman et al., 2013). We further conducted a model recovery analysis and all candidate models could be properly identified and recovered (Wilson and Collins, 2019; Crawley et al., 2020; see Supplementary material). For both groups and both blocks, respectively, a variant of the HMM provided the best fit (see Supplementary material for model comparison). The HMM, a Bayesian inference model, assumes a higher-order representation of the task structure that accounts for the instability of the task environment. Here, participants are expected to choose 'left' or 'right' depending on whether they believe themselves to be in a left-tilted ('majority stimulus is left') or right-tilted hidden state ('majority stimulus is right'). State beliefs are inferred and updated on each trial, depending on the history of choice-outcome pairs as well as the estimated transition probability γ, which quantifies how the two hidden states are expected to change. Thus, γ indicates a participant's perceived volatility of the task environment. In the winning model (HMM_RP), positive (correct prediction) and negative (incorrect prediction) feedback sensitivity were allowed to differ since positive and negative feedback may affect participants' belief updating differently. For the cued block, the winning model included belief resets. Here, γ was expected to be lower than in the volatile block due to the absence of sudden, hidden changes. However, γ would still capture the randomness of the outcomes as driven by noise and might further reflect participants' uncertainty about the noise conditions overall.
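The trial-wise belief update of a two-state HMM of this kind can be sketched as follows. This is an illustration under stated assumptions, not the fitted model: the function name `hmm_update` is hypothetical, γ is treated as a symmetric switch probability, and c and d are assumed to act as the probability of a correct prediction when the chosen side matches the hidden state and of an incorrect prediction when it does not. The exact parameterization is given in the Supplementary material.

```python
def hmm_update(belief, choice, correct, gamma, c, d):
    """One trial of belief updating in a two-state HMM.

    belief  : [P(left state), P(right state)] after the previous trial
    choice  : 0 for a 'left' prediction, 1 for a 'right' prediction
    correct : True if the prediction was correct
    gamma   : assumed probability that the hidden state switches
    c, d    : assumed P(correct | state == choice) and
              P(incorrect | state != choice)
    Returns (pre, post): beliefs before and after seeing the outcome.
    """
    # Transition step: the state may have switched since the last trial.
    pre = [
        (1 - gamma) * belief[0] + gamma * belief[1],
        gamma * belief[0] + (1 - gamma) * belief[1],
    ]
    # Likelihood of the observed feedback under each hidden state.
    like = []
    for state in (0, 1):
        if state == choice:
            like.append(c if correct else 1 - c)
        else:
            like.append(1 - d if correct else d)
    # Bayes' rule, then normalise.
    unnorm = [l * p for l, p in zip(like, pre)]
    total = sum(unnorm)
    post = [u / total for u in unnorm]
    return pre, post

# Hypothetical parameter values and feedback sequence, for illustration only.
belief = [0.5, 0.5]
for choice, correct in [(0, True), (0, True), (0, False)]:
    pre, belief = hmm_update(belief, choice, correct, gamma=0.1, c=0.8, d=0.8)
    print([round(b, 3) for b in belief])
```

In this sketch, a larger γ pulls the pre-outcome belief back towards 0.5 on every trial, which is one way to see why γ can be read as perceived volatility.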
As a measure of trial-wise uncertainty regarding the hidden states, belief entropy, H(S_t), was estimated based on the posterior for the different probabilities of the prediction to be correct. On a given trial, entropy reflects a participant's uncertainty about the current task state and is highest when the probabilities associated with the different task states are assumed to be uniform (Hämmerer et al., 2019). Since uncertainty slows down reaction times in decision-making paradigms (e.g. Volz et al., 2005), higher entropy values may lead to prolonged reaction times on subsequent trials.
To obtain a measure that indicates to which extent a state belief should be updated on a given trial, Bayesian surprise was estimated as the Kullback-Leibler divergence of the trial-wise state beliefs before, P(S_t,pre), and after an outcome observation, P(S_t,post), extracted from the HMM_RP.
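Both latent variables can be illustrated with a short sketch operating on a two-state belief. The standard Shannon entropy and Kullback-Leibler definitions are used; the logarithm base (2, giving bits) and the direction of the divergence (post-outcome relative to pre-outcome belief) are assumptions, and the function names and example beliefs are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete belief distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_divergence(post, pre):
    """KL(post || pre) in bits; assumes pre > 0 wherever post > 0."""
    return sum(q * math.log2(q / p) for q, p in zip(post, pre) if q > 0)

pre = [0.5, 0.5]    # hypothetical belief before the outcome
post = [0.8, 0.2]   # hypothetical belief after the outcome
print(entropy(pre))               # uniform belief over two states: maximal entropy
print(kl_divergence(post, pre))   # larger values indicate a larger belief update
```

With these definitions, an outcome that leaves the belief unchanged yields a surprise of zero, whereas an outcome that moves a uniform belief towards one state yields a positive value.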

Pupil signal preprocessing
The pupil signal was corrected for eye blinks and other artefacts based on the signal's velocity and subsequent cubic-spline interpolation (Mathôt et al., 2018). Missing data of more than 1000 consecutive milliseconds were not interpolated but treated as missing in later analyses. The corrected signal was smoothed with a 3 Hz low pass Butterworth filter and z-scored per task block and participant. The z-scored signal was baseline-corrected per trial through subtraction of the average signal of the 500 ms preceding outcome onset. Trials where more than 50% of the signal was missing or interpolated were treated as missing in subsequent analyses.
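The last two preprocessing steps, z-scoring and per-trial baseline correction, can be sketched as below. Blink correction, interpolation and the Butterworth filter are deliberately omitted. The function names, the use of the population standard deviation and the toy trace are assumptions for illustration; the 500 Hz sampling rate and the 500 ms baseline window are taken from the text.

```python
from statistics import mean, pstdev

def zscore(signal):
    """z-score a pupil trace (one participant, one task block)."""
    mu, sd = mean(signal), pstdev(signal)
    return [(x - mu) / sd for x in signal]

def baseline_correct(trace, onset, rate_hz=500, window_ms=500):
    """Subtract the mean of the window preceding outcome onset.

    trace : z-scored samples of one trial
    onset : sample index of outcome onset within the trace
            (must leave a full baseline window before it)
    """
    n = int(rate_hz * window_ms / 1000)   # 250 samples at 500 Hz
    baseline = mean(trace[onset - n:onset])
    return [x - baseline for x in trace]

# Toy trial: flat baseline of 1.0 followed by a dilation to 1.5.
trace = [1.0] * 250 + [1.5] * 100
corrected = baseline_correct(trace, onset=250)
print(max(corrected[250:]))   # maximum dilation relative to baseline
```

The trial-wise maximum of the corrected trace after outcome onset corresponds to the kind of measure used as the dependent variable in the pupil analyses.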

Data exclusion
For one participant, all data of the cued block were treated as missing because they aborted the task before completion. Another participant was excluded from the computational model of the volatile block, as prior modelling attempts resulted in an inappropriate fit. All pupil data of a participant within a task block were treated as missing if more than 50% of trials were missing within that block (no. of exclusions in volatile block: n_HC = 1, n_SZ = 5; cued block: n_HC = 1, n_SZ = 5, three consistent with volatile block).

Results
No significant group differences emerged in any of the demographic variables or working memory capacity (Table 1).
Proportion of choice switches was lower in the low volatility condition (b = −0.06, t = −3.14, p = .003) and higher on high-noise trials (b = 0.11, t = 8.80, p < .001). The interaction between volatility and noise was not significant (b = 0.00, t = 0.06, p = .950). Including group as a predictor revealed no significant effect for group (b = 0.04, t = 1.25, p = .217), or a noise by group interaction (b = 0.00, t = 0.17, p = .862). The interaction between volatility and group indicated that patients decreased the number of choice switches more when moving from the volatile to the cued block (see Fig. 1C). However, this effect was not significant (b = −0.06, t = −1.86, p = .068), and neither was the three-way interaction of volatility, noise and group (b = −0.03, t = −0.90, p = .372). Exploratory analyses regarding the effect of feedback on choice switches revealed a negative effect for positive as opposed to negative feedback (correct vs. incorrect prediction: b = −0.36, t = −9.33, p < .001), which was not moderated by group (see Supplementary material).

Cognitive-computational parameters
The HMM_RP (see Section 2.3.1) entailed three parameters: sensitivity to positive feedback (c), sensitivity to negative feedback (d), and participants' beliefs about the transition probability (γ). The corresponding group parameters per block are presented in Table 2.
To test for effects of volatility condition, group and their interaction on these group parameters, posterior distribution comparisons were conducted and the 89% highest density intervals (HDI; see McElreath, 2020) of the differences between blocks, between groups, and of their interaction were investigated. For γ, the comparison revealed credibly higher values (Table 2) in the volatile than in the cued task block, without indication of a main group effect or an interaction (Fig. 2). For c, all HDIs included zero and for d, there was again only a credible effect of task block (Fig. 2).
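The logic of such a posterior comparison can be sketched as follows: the posterior samples of a group-level parameter are subtracted between two conditions, and the 89% HDI of the difference is checked against zero. The function names are hypothetical and the samples are simulated stand-ins, not the study's posteriors.

```python
import random

def hdi(samples, mass=0.89):
    """Narrowest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, int(round(mass * n)))   # number of samples inside the interval
    # Slide a window of k samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

def credible_difference(samples_a, samples_b, mass=0.89):
    """HDI of a - b; the difference counts as credible if the HDI excludes zero."""
    diff = [a - b for a, b in zip(samples_a, samples_b)]
    lo, hi = hdi(diff, mass)
    return lo, hi, not (lo <= 0 <= hi)

# Simulated stand-ins for posterior samples of one parameter in two blocks.
rng = random.Random(1)
volatile = [rng.gauss(0.30, 0.05) for _ in range(4000)]
cued = [rng.gauss(0.10, 0.05) for _ in range(4000)]
lo, hi, credible = credible_difference(volatile, cued)
print(round(lo, 3), round(hi, 3), credible)
```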
We further explored how individual model parameters were related to positive or negative symptoms within the SZ group (Table 3). Here, severity of positive symptoms was associated with a decreased sensitivity to positive feedback c under low volatility (ρ = −0.40, p = .030), but this effect did not survive Bonferroni correction for multiple comparisons (α_adj = 0.004). Exploratory analyses regarding the relationship between symptoms and task behavior pointed towards a positive association between negative symptoms and accuracy in the cued block, but again the effect did not survive Bonferroni correction (see Supplementary material).

Entropy and Bayesian surprise by task conditions and group
The effects of volatility, noise, group, and their interactions on trial-wise belief entropy (uncertainty) and Bayesian surprise, both cube root transformed, were assessed with linear mixed-effects models.
Uncertainty was higher on high-noise trials (Table 4) and within the SZ group (b = 0.11, t = 2.97, p = .004), though the group difference seemed to be most pronounced during high volatility (group by block interaction: b = −0.11, t = −3.09, p = .003). To elucidate the relationship between uncertainty and behavior, an additional analysis on reaction times was conducted. This revealed prolonged reaction times following trials of higher uncertainty (b = 0.02, t = 4.15, p < .001; see Supplementary material).
Bayesian surprise was significantly higher on high-noise trials but did not differ by volatility or between groups (Table 5).

Pupil response to entropy and Bayesian surprise
To assess the extent to which pupil dilation scaled with entropy (uncertainty) and Bayesian surprise, two separate linear mixed-effects models were constructed, where trial-wise maximum pupil dilation during outcome presentation (square root transformed to normalize model residuals) functioned as the dependent variable, respectively. To control for any prior differences in pupil size variation, average variation in baseline pupil size (standard deviations) was compared between groups, yielding no significant difference within the volatile (U = 451, p = .211; Md_HC = 0.14, Md_SZ = 0.12; n = 55) or the cued task block (U = 392, p = .618; Md_HC = 0.15, Md_SZ = 0.14; n = 54). To control for potential effects of the anticholinergic load induced by daily dosage of the prescribed antipsychotics in the SZ group (Minzenberg et al., 2004; Naicker et al., 2016), benztropine mesylate equivalents, where available (n = 27), were calculated and correlated with averaged baseline variation as well as averaged maximum pupil dilation. This revealed no significant relationships in the volatile (baseline variation: ρ = 0.23, p = .304; pupil dilation: ρ = 0.20, p = .368) or the cued block (baseline variation: ρ = 0.27, p = .236; pupil dilation: ρ = 0.17, p = .456).
Overall, pupil dilation was larger on trials of increased uncertainty and Bayesian surprise, respectively (both z-scored per task block and participant; see Table 6). However, the positive relationship between pupil dilation and entropy was smaller in the SZ group (b = −0.01, t = −2.53, p = .011), indicating that patients adapted their pupil size less in response to uncertainty (Fig. 3). Including block and associated interactions in the model did not reveal any significant block-related effects (see Supplementary material).

Discussion
Here, we investigated decision-making under uncertainty in a probabilistic prediction task where noise and volatility were independently manipulated to assess their effect on behavior in individuals with a diagnosis from the schizophrenia spectrum (SZ group) and non-psychiatric controls (HC group).
While task manipulation had the expected effects, with lower accuracy and more switches when noise or volatility was high, groups did not differ. This contrasts with previous findings of impaired probabilistic learning and increased switching behavior in patients with schizophrenia and first-episode psychosis (Culbreth et al., 2016a; Deserno et al., 2020; Murray et al., 2008; Waltz et al., 2013) and may in part reflect task paradigm differences. In studies where a monetary reward is implemented, group differences may emerge due to differences in valuation processes (Chang et al., 2019; Culbreth et al., 2016b). Importantly, average accuracy was above chance level for all task conditions, indicating successful learning and effort investment even in the absence of an external reward. Another difference concerns the selected noise conditions: in most reversal learning tasks, only one noise condition is employed (Culbreth et al., 2016a; Deserno et al., 2020; Waltz et al., 2013). Here, noise conditions varied to test whether this moderates group differences. The low-noise condition (85:15) may have been easier to track, even for patients, whereas the high-noise condition (60:40) may have been so demanding that even the HC group experienced difficulties, both contributing to smaller group differences. This study, however, is not the first to report intact probabilistic learning in schizophrenia. Reddy et al. (2016) found preserved initial and reversal learning in a substantial subgroup of patients. Meanwhile, deficits in an impaired subgroup were linked to decreased feedback sensitivity and diminished neurocognitive performance, e.g. lower working memory capacity. Similar to their sample, our sample contained a large proportion of outpatients. Furthermore, working memory capacity did not differ significantly between the SZ and HC groups, and groups were matched on relevant demographic variables and premorbid verbal intelligence.
The general neurocognitive 'fitness', the rather stable psychopathology, and the comparable demographics of our sample may thus explain the absence of behavioral differences. This highlights the importance of considering the heterogeneity of schizophrenia populations when drawing conclusions from and comparing results across single studies in this field (see also Moritz et al., 2020).
Similar to the behavioral results, the lack of group differences on the main parameters of the cognitive-computational model was at odds with previous findings of increased subjective volatility in patients with schizophrenia (Schlagenhauf et al., 2014) or at high risk for psychosis (Cole et al., 2020). However, a negative correlation between positive symptoms and positive feedback sensitivity within the SZ group was observed, which seemed to be in line with previous reports of decreased sensitivity to positive feedback in schizophrenia, particularly under high positive symptom load (Reddy et al., 2016; Schlagenhauf et al., 2014). Interestingly, this was only true when volatility was minimal, suggesting that despite announced environmental changes, participants with a higher current severity of delusions and hallucinations seemed not to perceive positive feedback (i.e. a correct prediction) as a reliable indicator for their choice to be correct. In the volatile condition, this correlation might have been overshadowed as hidden changes increased feedback unreliability overall. Nevertheless, while the effect size was moderate to large, the significance test did not survive the Bonferroni correction for multiple comparisons. In contrast, uncertainty was significantly higher in the SZ group across the task, and even more so during high volatility. This suggests some increased sensitivity to the environment's volatility in patients, even though this did not translate into a significantly increased model-based volatility estimate. Moreover, patients showed a decreased adaptation of pupil size to uncertainty. When uncertainty is high, especially in volatile environments, a given outcome should be highly salient as it serves as a teaching signal that could help to decrease prior uncertainty. Accordingly, pupil dilation should be larger if interpreted as an index of neural gain (Eldar et al., 2013).
Therefore, the results point to a reduced ability to differentiate between high and low salient, or informative, outcomes in the SZ group, in line with the aberrant salience account.
In light of these results it may seem surprising that despite higher uncertainty, patients showed comparable behavioral performance. Notably, entropy was estimated in a fine-grained manner, based on continuous updates of state beliefs across the task, influenced by overall parameters such as transition probability γ. In contrast, accuracy and choice switches were summarized more coarsely and may not be directly related to uncertainty on a given trial. However, higher entropy was followed by prolonged reaction times, a classic effect of uncertainty (see e.g. Volz et al., 2005).
One question that may arise with respect to the modelling is how well participants actually managed to reset their beliefs at every announced change point during the cued block of the task. Here, a post-assessment questionnaire may have provided further insights into participants' subjective experiences. Nevertheless, given that the reset model provided the best fit to the data of the block and that the transition probability in this block was lower than in the volatile condition, it is very likely that participants indeed engaged in belief resets as instructed. Another question is to what extent the blending of probability changes (noise condition changes but majority stimulus stays the same) and true reversals (majority stimulus changes) in this paradigm may have affected modelling. Notably, the fitted models were blind to the true probabilities. Together with the superior fit of the reset model and the difference in transition probabilities, this renders a large effect of this potential confounder on the derived results unlikely.
Taken together, our study demonstrates that under certain conditions, individuals with a diagnosis from the schizophrenia spectrum exhibit probabilistic decision-making similar to that of non-psychiatric controls, even though they are more uncertain, particularly when the task environment is volatile. The failure to reliably adapt pupil responses to the degree of uncertainty indicates a failure to differentiate between more and less informative outcomes. This might explain why uncertainty remains generally higher in the patient group and is not reduced through learning. The findings thus corroborate hypotheses of aberrant norepinephrinergic signaling in schizophrenia (Fitzgerald, 2014;Mäki-Marttunen et al., 2020) and call for further investigation of the different implicated neuromodulatory systems and their interactions. Accumulated evidence from this field could inspire the development of psychopharmacological treatments where adding agents that modulate norepinephrine transmission might show beneficial effects in subgroups of patients (Fitzgerald, 2014). Furthermore, the study highlights the role of uncertainty processing in schizophrenia, a concept that is already addressed in metacognitive training interventions (Moritz and Woodward, 2007). The future development of therapeutic interventions of this kind may profit from further insights into the distinct effects of different kinds of uncertainty, such as noise and volatility, on belief formation and updating in schizophrenia. In this context, it will also be important to investigate to what extent the findings reported here may be moderated by neurocognitive functioning or linked to subsyndromal psychotic symptoms.
Notes: Entropy = cube root transformed choice uncertainty (HMM RP ); IV = independent variable; block = contrast of the second, cued task block to the first, volatile task block; noise = contrast of the high- to the low-noise condition; group = contrast of the SZ (schizophrenia) to the HC (controls) group; R²m = marginal R², i.e. proportion of variance explained by the fixed effects alone; R²c = conditional R², i.e. proportion of variance explained by both the fixed and random effects (R²m and R²c based on Nakagawa and Schielzeth, 2013).
Notes: Bayesian surprise = cube root transformed belief updating (HMM RP ); IV = independent variable; block = contrast of the second, cued task block to the first, volatile task block; noise = contrast of the high- to the low-noise condition; group = contrast of the SZ (schizophrenia) to the HC (controls) group; R²m = marginal R², i.e. proportion of variance explained by the fixed effects alone; R²c = conditional R², i.e. proportion of variance explained by both the fixed and random effects (R²m and R²c based on Nakagawa and Schielzeth, 2013).
Notes: Models were fitted separately for entropy and Bayesian surprise as the predictive latent HMM RP variable; both were z-scored per task block and participant; pupil dilation = square root transformed maximum baseline-corrected pupil dilation during outcome presentation (based on the z-scored pupil trace per participant and block); IV = independent variable; Group = contrast of the SZ (schizophrenia) to the HC (controls) group; random intercepts were specified for each participant; R²m = marginal R², i.e. proportion of variance explained by the fixed effects alone; R²c = conditional R², i.e. proportion of variance explained by both the fixed and random effects (R²m and R²c based on Nakagawa and Schielzeth, 2013).

Data statement
Raw and processed anonymized data, as well as scripts of the computational models, are available in an Open Science Framework repository: DOI 10.17605/OSF.IO/AD65K.

Role of funding sources
The funding sources played no role in study design; collection, analysis and interpretation of data; the writing of the manuscript; or in the decision to submit the article for publication.

Declaration of competing interest
The authors report no conflict of interest.

Fig. 3. Pupil responses to Bayesian surprise (left panels) and belief entropy (uncertainty; right panels) for each task block (top row: volatile, bottom row: cued). Trials of high and low values of the latent variables were categorized for each participant separately, with high values above and low values below or equal to the participant-specific median within a block. For both types of trials (high values: brighter shades, low values: darker shades) and both groups of participants (HC = individuals without psychiatric disorder: blue colors, SZ = individuals with disorder from the schizophrenia spectrum: purple colors), z-scored and baseline-corrected pupil size was averaged for each sample during outcome presentation (solid lines; shaded areas indicate standard error of the mean). Note that these median-based groupings were not used in any of the statistical models but were merely chosen for a more intuitive display.

their facilities and providing indispensable guidance during clinical assessments.
This work was supported by a Norwegian Research Council grant (FRIMEDBIO 262338), awarded to GP. LZ was partially supported by the Vienna Science and Technology Fund (WWTF VRG13-007).