State anxiety biases estimates of uncertainty and impairs reward learning in volatile environments

Clinical and subclinical (trait) anxiety impairs decision making and interferes with learning. Less understood are the effects of temporary anxious states on learning and decision making in healthy populations, and whether these can serve as a model for clinical anxiety. Here we test whether anxious states in healthy individuals elicit a pattern of aberrant behavioural, neural, and physiological responses comparable to those found in anxiety disorders, particularly when processing uncertainty in unstable environments. In our study, both a state anxious group and a control group learned probabilistic stimulus-outcome mappings in a volatile task environment while we recorded their electroencephalography (EEG) signals. Using a hierarchical Bayesian model of inference and learning, we assessed the effect of state anxiety on Bayesian belief updating with a focus on uncertainty estimates. State anxiety was associated with an underestimation of environmental uncertainty and of informational uncertainty about the reward tendency. Anxious individuals' beliefs about reward contingencies were more precise (had smaller uncertainty) and thus more resistant to updating, ultimately leading to impaired reward-based learning. State anxiety was also associated with greater uncertainty about volatility. We interpret this pattern as evidence that state anxious individuals are less tolerant of informational uncertainty about the contingencies governing their environment and more willing to be uncertain about the stability of the world itself. Further, we tracked the neural representation of belief update signals in trial-by-trial EEG amplitudes. In control participants, lower-level precision-weighted prediction errors (pwPEs) about reward tendencies were represented in the ERP signals across central and parietal electrodes, peaking at 496 ms and overlapping with the late P300 in classical ERP analysis.
The state anxiety group did not exhibit a significant representation of low-level pwPEs, and there were no significant differences between the groups. Smaller variance in low-level pwPEs about reward tendencies in state anxiety could partially account for the null results. Expanding previous computational work on trait anxiety, our findings establish that temporary anxious states in healthy individuals impair reward-based learning in volatile environments, primarily through changes in uncertainty estimates, which play a central role in current Bayesian accounts of perceptual inference and learning.


Introduction
Anxiety is characterised by excessive worry about negative possibilities (Grupe and Nitschke, 2013). It can lead to distinct difficulties when making decisions and learning about the world, as anxious individuals experience negative reactions to uncertainty, known as intolerance of uncertainty (IU; Bishop, 2007; Carleton, 2016). Recent work has established that individuals high in trait anxiety have difficulties adapting their learning rate to changes in probabilistic task environments (Browning et al., 2015; Huang et al., 2017). Less understood is how temporary states of anxiety in healthy subjects interfere with optimal learning and belief updating in the brain. Identifying the computations that subserve learning under state anxiety is important due information about those response-outcome relationships and decreases with learning. Lastly, changes in the environment (volatility) induce unexpected environmental uncertainty. Subjective estimates of volatility should affect learning, as an individual should be more willing to update their estimates in a changing world (Mathys et al., 2011). To reduce uncertainty, the brain is thought to appraise the inherent statistical structure of the world using probability distributions, continuously updating and inverting a hierarchical model of the causes of the sensory inputs it receives (de Lange et al., 2018; Doya et al., 2007; Friston, 2005; Friston, 2010; Rao and Ballard, 1999). In this context, each type of uncertainty is expressed by the width (variance, or its inverse, precision) of the probability distribution of the corresponding belief (Feldman and Friston, 2010; Mathys et al., 2011).
Examinations of belief, uncertainty, and precision estimates using Bayesian formulations in perceptual and learning tasks are increasingly used to provide mechanistic explanations for an array of neuropsychiatric conditions. Specifically, difficulties in estimating precision have been suggested to explain various clinical expressions, from movement difficulties in Parkinson's disease to features of schizophrenia and autism (Friston et al., 2016; Lawson et al., 2017, 2014; Parr et al., 2018). In the case of anxiety, altered beliefs are also theorised to play a vital role (Paulus and Stein, 2010; Paulus and Yu, 2012). As anxiety relates to worry over uncertainty, volatile task environments have been used to understand how trait anxiety affects learning, providing a mechanistic account of anxiety-related disorders (Browning et al., 2015; Huang et al., 2017). Healthy individuals are known to adapt their learning rate to volatility, with changing environments promoting a higher learning rate as new information needs to be integrated to better predict the future (Behrens et al., 2007). By contrast, high-trait anxious individuals show reduced adaptability of their learning rate to changes in volatility, in both aversive (Browning et al., 2015) and reward settings (Huang et al., 2017). Moreover, they show poorer performance in decision-making tasks (de Visser et al., 2010; Hartley and Phelps, 2012; Miu et al., 2008). Expanding on those findings, here we evaluated whether temporary anxious states in healthy individuals influence reward learning in a volatile environment through changes in informational and environmental uncertainty. Evidence for a link between anxiety and inaccurate estimation of uncertainty would lend support to recent theoretical accounts suggesting that difficulties learning from incomplete information and misestimations of uncertainty are crucial to understanding affective disorders (Pulcu and Browning, 2019).
Probabilistic inference is thought to be achieved through the sequential use of Bayes' rule, dynamically combining predictions (prior beliefs) with new evidence (sensory data) and weighting each resulting prediction error (PE) according to its precision (Feldman and Friston, 2010; Friston and Kiebel, 2009; Kok and de Lange, 2015). This predictive coding scheme relies on a specific message-passing policy carried out among regions of the cortical hierarchy (Bastos et al., 2012; Iglesias et al., 2013; Rao and Ballard, 1999). Predictions are transmitted down the cortical hierarchy (backwards) to meet incoming ascending (forward) sensory PEs, which are thought to arise in superficial pyramidal cells of the supragranular layers (Friston and Kiebel, 2009). Beliefs are then updated by reducing PE signals across each level of the cortical hierarchy, with both priors and PEs weighted according to their estimated precision (Kok and de Lange, 2015). Importantly, developments in Bayesian computational modelling allow us to estimate inter-individual differences in the trial-wise computations and expression of these precision-weighted PEs (Mathys et al., 2014, 2011). Monkey single-cell recording and human functional magnetic resonance imaging (fMRI) studies have shown that PEs elicited by reward are encoded by phasic responses in midbrain dopamine neurons, and that these signals are conveyed to the medial frontal cortex (MFC; Chew et al., 2019; Matsumoto et al., 2007; Morris et al., 2006; Zarr and Brown, 2016). With electroencephalography (EEG), these reward learning signals can be detected in the error-related negativity (ERN), an event-related potential (ERP) triggered by overt errors at around 100 ms, and in the feedback ERN (fERN), which follows negative feedback at around 250 ms (Holroyd et al., 2003; Montague et al., 2004; Nieuwenhuis et al., 2004; Yeung et al., 2005). Both components have been shown to originate in the posterior medial frontal cortex (pMFC, including the anterior cingulate cortex, ACC; Holroyd et al., 2003; Montague et al., 2004; Yeung et al., 2005). Relevant to our study, the fERN has been proposed to index the magnitude of prediction violation (surprise), thus reflecting a reward PE signal that can be estimated, for instance, using reinforcement learning models (Gehring and Willoughby, 2004; Holroyd and Coles, 2008; Holroyd and Krigolson, 2007). Another ERP component that may be sensitive to reward PEs, valence, and surprise is the P300 (peaking between 250 and 500 ms with a parietal topography; Hajcak et al., 2007, 2005; Polich, 2007; Wu and Zhou, 2009).
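The precision weighting of prediction errors described above can be illustrated with the simplest conjugate case: a Gaussian prior combined with one Gaussian observation. The minimal Python sketch below is a textbook Bayesian update, not the HGF itself, and the function name `gaussian_update` is illustrative. Because the weight on the prediction error is the observation precision divided by the posterior precision, a more precise prior moves less for the same error, which is the sense in which precise beliefs resist updating.

```python
def gaussian_update(prior_mean, prior_precision, observation, obs_precision):
    """Bayes' rule for a Gaussian prior and a single Gaussian observation.

    The posterior precision is the sum of both precisions; the mean moves
    towards the observation by the prediction error times a precision weight.
    """
    prediction_error = observation - prior_mean
    posterior_precision = prior_precision + obs_precision
    weight = obs_precision / posterior_precision  # acts as a learning rate
    posterior_mean = prior_mean + weight * prediction_error
    return posterior_mean, posterior_precision


# Same prediction error (1.0), different prior precision
print(gaussian_update(0.0, 4.0, 1.0, 1.0))   # (0.2, 5.0): precise prior, small update
print(gaussian_update(0.0, 0.25, 1.0, 1.0))  # (0.8, 1.25): imprecise prior, large update
```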
Evidence linking PEs, Bayesian surprise, and belief updating to trial-wise fluctuations in ERP responses comes from studies combining computational modelling and analysis of trial-wise EEG responses (Diaconescu et al., 2017a; Jepma et al., 2016; Kolossa et al., 2015; Mars et al., 2008; Stefanics et al., 2018; Weber et al., 2020). For instance, recent EEG studies on the mismatch negativity (MMN) were able to spatiotemporally dissociate lower-level precision-weighted PE (pwPE) signals, which drive updates in belief estimates (Stefanics et al., 2018), from higher-level pwPEs, which drive volatility updates (Weber et al., 2020). In addition, model-based single-trial analyses of the P300 identified the earlier P3a waveform, with an anterior distribution, as an index of belief updating, whereas Bayesian surprise was represented in the later posterior P3b component (Kolossa et al., 2015). Here we were interested in assessing the neural representation of pwPEs across different levels, including lower-level pwPEs used to update reward tendency estimates and higher-level pwPEs used to update volatility estimates, as belief updates on both levels depend on informational and environmental uncertainty. We therefore aimed to examine the effect of these two hierarchically related pwPEs on brain activity by analysing trial-wise ERP responses across frontal, central, and parietal brain regions, within a broad temporal range from 200 to 850 ms that encompasses the fERN and extended P300 components.
To address our questions, we examined cortical dynamics in a state anxious group and a control group using EEG recordings during a reward-based learning task. To link anxiety-induced neural changes to potential alterations in uncertainty estimation, we used a Bayesian model of perception and learning, the Hierarchical Gaussian Filter (HGF; Mathys et al., 2014, 2011). Based on participants' behavioural responses, the HGF estimates individual trajectories of trial-wise belief updates governed by hierarchically related PEs. To reveal the effect of hierarchical PEs and precision weights on evoked brain responses, we used the relevant computational quantities (pwPEs) as regressors in a general linear model (GLM) of trial-wise EEG amplitudes, as done in previous studies (Diaconescu et al., 2014, 2017a; Weber et al., 2020).
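A mass-univariate GLM of this kind can be sketched with ordinary least squares in NumPy: one regression per channel and time point, with trial-wise pwPE trajectories as regressors. This is a generic sketch under stated assumptions (z-scored regressors, an intercept, plain OLS, no nuisance regressors, no group-level statistics or multiple-comparison correction); the function name `fit_trialwise_glm` is hypothetical and does not reproduce the study's actual pipeline.

```python
import numpy as np


def fit_trialwise_glm(eeg, regressors):
    """Fit an OLS GLM independently at every channel and time point.

    eeg:        array (n_trials, n_channels, n_times) of single-trial amplitudes
    regressors: array (n_trials, n_regressors), e.g. columns eps2 and eps3
    Returns betas of shape (n_regressors + 1, n_channels, n_times);
    row 0 holds the intercept.
    """
    n_trials = eeg.shape[0]
    # z-score each regressor so beta magnitudes are comparable (an assumption)
    z = (regressors - regressors.mean(axis=0)) / regressors.std(axis=0)
    design = np.column_stack([np.ones(n_trials), z])
    # Flatten channels x times so a single lstsq call solves every GLM at once
    y = eeg.reshape(n_trials, -1)
    betas, *_ = np.linalg.lstsq(design, y, rcond=None)
    return betas.reshape(design.shape[1], *eeg.shape[1:])


rng = np.random.default_rng(0)
eps = rng.normal(size=(300, 2))        # stand-ins for trial-wise eps2, eps3
eeg = rng.normal(size=(300, 64, 50))   # trials x channels x time points
betas = fit_trialwise_glm(eeg, eps)
print(betas.shape)  # (3, 64, 50)
```

The resulting beta maps per participant would then be taken to a second-level analysis; that step is not sketched here.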

Participants
Forty-two healthy individuals (age 18-35, 28 females, mean age 27, standard error of the mean [SEM] 0.9) participated in this reward-based learning study following written informed consent. This experiment was approved by the ethics review committee at Goldsmiths University of London. Our sample size was informed by previous computational work on anxiety ( Browning et al., 2015 ). All participants were healthy volunteers, with no past neurological or psychiatric disorders.
All participants were screened using the trait scale of Spielberger's State-Trait Anxiety Inventory (STAI; Spielberger, 1983), which has demonstrated reliable internal consistency and convergent and discriminant validity (Barnes et al., 2002; Spielberger, 1983; Spielberger et al., 1970). Scores on this trait inventory range from low (20) to high anxiety (80). Participants' trait anxiety levels were measured (mean in the total sample 46, SEM 1.5), and participants were then split into two groups using the median value (43). Scores in the sample ranged from 34 to 68 (low trait = 34-42, high trait = 43-68). This created high and low trait anxiety subsamples from which we pseudo-randomly drew participants to form the experimental (state anxiety, StA) and control (Cont) groups. The mean trait anxiety score was 47 (SEM = 2.1) in the StA group and 46 (SEM = 2.2) in the Cont group. Importantly, individual trait anxiety levels did not exceed the clinical level (> 70, a cutoff score provided by the authors of the STAI, corresponding to the mean plus 2 SD of the average score for adults; see Spielberger et al., 1983).
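The median split followed by allocation from each trait stratum can be sketched as below. The text does not specify the exact pseudo-random procedure, so the shuffle-and-alternate rule, the function name `assign_groups`, and the use of Python's `random` module are assumptions for illustration only. Alternating with a different starting group in each stratum keeps the two groups balanced in size and in trait anxiety.

```python
import random
import statistics


def assign_groups(trait_scores, seed=None):
    """Median-split participants on trait anxiety, then allocate each stratum
    alternately to the state anxiety (StA) and control (Cont) groups.

    trait_scores: dict mapping participant id -> STAI trait score
    Returns a dict mapping participant id -> "StA" or "Cont".
    """
    rng = random.Random(seed)
    median = statistics.median(trait_scores.values())
    # Scores at or above the median form the high-trait stratum
    high = [pid for pid, score in trait_scores.items() if score >= median]
    low = [pid for pid, score in trait_scores.items() if score < median]
    labels = ("StA", "Cont")
    groups = {}
    for offset, stratum in enumerate((high, low)):
        rng.shuffle(stratum)
        # Start each stratum with a different group so totals stay balanced
        for i, pid in enumerate(stratum):
            groups[pid] = labels[(i + offset) % 2]
    return groups
```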
The ages (mean 27.7, SEM = 1.2) and sex (13 female, 8 male) of the Cont group were commensurate with those of the StA group (mean 27.5, SEM = 1.3; 14 female, 7 male), indicating that age and sex were unlikely to confound subsequent analyses. This matters because age and sex affect heart-rate variability (HRV; see Voss et al., 2015), which we used to assess physiological changes due to state anxiety.

Experimental Design
We used a between-subjects design with state anxiety as the between-subject factor (StA and Cont groups). Both groups completed our experimental task, which consisted of four blocks: resting state 1 (R1: baseline), reward learning task block 1 (TB1), reward learning task block 2 (TB2), and resting state 2 (R2; see Supplementary Fig. 1). Both resting state blocks were 5 min recordings of EEG and electrocardiography (ECG) with eyes open. After R1, participants performed a binary choice decision-making task with contingencies that changed over the course of learning, as in previous work (Behrens et al., 2007; de Berker et al., 2016; Iglesias et al., 2013). In our task, participants completed two blocks of 200 trials each (TB1, TB2), and their goal was to find out which of two visual icons (always either blue or orange; see Figure 1) would lead to a monetary reward (positive reinforcement, 5 pence). Thus, they had to learn the probability of reward assigned to each stimulus (reciprocal: p, 1-p). Both experimental blocks were divided into 5 segments with different stimulus-outcome contingency mappings that were randomly ordered for each participant and varied in length between 26 and 38 trials. These contingencies ranged from strongly biased (90/10), through moderately biased (70/30), to unbiased (50/50), and were repeated in reverse relationships (10/90; 30/70), so that over the two blocks there were 10 contingency blocks in total.
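The contingency structure of one task block can be simulated as follows. This sketch rests on assumptions not stated in the text: segment lengths are drawn uniformly from 26-38 trials and each outcome is sampled independently with the segment's reward probability, so the block length varies and the sketch does not attempt to match the exact number of trials per block. The names `CONTINGENCIES` and `make_block` are hypothetical.

```python
import random

# Reward probabilities for the blue icon, p(win|blue); orange wins with 1 - p
CONTINGENCIES = (0.9, 0.7, 0.5, 0.3, 0.1)


def make_block(rng, min_len=26, max_len=38):
    """Simulate one task block: the five mappings in random order, each lasting
    a random number of trials. Returns (p_blue per trial, binary inputs u),
    with u = 1 when the blue icon is rewarded."""
    probs = list(CONTINGENCIES)
    rng.shuffle(probs)
    p_blue, outcomes = [], []
    for p in probs:
        n_trials = rng.randint(min_len, max_len)  # inclusive bounds
        for _ in range(n_trials):
            p_blue.append(p)
            outcomes.append(1 if rng.random() < p else 0)
    return p_blue, outcomes


rng = random.Random(2021)
p_blue, u = make_block(rng)
```

The binary sequence `u` has the same form as the HGF input described in the modelling section (1 = blue rewarded, 0 = orange rewarded).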
On each trial, participants were asked to predict which of the two visual icons would reward them with money. Successful predictions were rewarded with 5p, while unsuccessful predictions and no-responses were treated as losses with 0p reward (Fig. 1). The stimuli were presented randomly to the left or right of the centre of the screen. They remained on the screen until a response was given or the prediction time (2200 ms ± 200 ms) expired. When either the left or the right arrow key was pressed, participants immediately saw their chosen image highlighted in bright green, which remained on screen for 1200 ms (± 200 ms) before the outcome was revealed. The outcome, either win or lose, was shown in the middle of the screen for 1200 ms (± 200 ms) in green and red, respectively. Each trial ended with a fixation cross and an inter-trial interval of 1250 ms (± 250 ms).
The participants were given full computerised instructions for each element of the experiment, including questionnaires. Each questionnaire came with written instructions and was answered using the numerical keyboard buttons. Just before completing 10 practice trials of the same probabilistic reward learning task used in the main experiment, participants were explicitly informed that the reward structure would change throughout the task and that they needed to adjust their predictions in response to inferred changes. Importantly, directly after the practice trials but before TB1, both the state anxiety and the control groups were informed that the experiment was, in fact, an examination of performance in two subsequent tasks: reward learning and an additional presentation task (see next section). However, their instructions regarding the additional task differed, as we aimed to induce state anxiety during the reward-based learning blocks in the state anxiety group but not in the control group.

State Anxiety Manipulation
Participants in the StA group were informed that they had been randomly selected to complete a public speaking task after finishing the reward learning task (Feldman et al., 2004; Lang et al., 2015; Lorberbaum et al., 2004). They were told they would be required to present a piece of abstract art, with 3 min to prepare for a 5 min public presentation of this artwork to a panel of academic experts. Those in the control (Cont) group were instead informed that they would be given a piece of abstract art to describe mentally and privately to themselves (instead of to a panel of experts) for the same duration. After completing the reward-based learning blocks, StA participants were told that the assessment panel was unexpectedly unavailable and were instructed to describe the artwork privately, in line with the Cont group.

EEG and ECG Recording and Pre-Processing
EEG and ECG signals were recorded throughout all task blocks (R1, TB1, TB2, and R2) using the BioSemi ActiveTwo system (64 electrodes, extended international 10-20 system) with a sampling rate of 512 Hz. The EEG signals were referenced to the average of two electrodes affixed to the left and right earlobes. Four additional external electrodes in a bipolar configuration were used: two electrodes positioned to capture vertical and horizontal eye movements (EOG), one on the zygomatic bone of the right eye and one on the glabella (between both eyes), and two electrodes to record the ECG. ECG electrodes were placed in a two-lead configuration (Moody and Mark, 1982) calibrated to fit the Einthoven triangle (Wilson et al., 1931). All electrodes used highly conductive bacteriostatic Signa gel (Parker Laboratories, Inc., 4 Sperry Road, Fairfield, NJ 07004, USA). All events, including presentation of stimuli, participant responses, and trial outcomes, were recorded in the EEG file using event markers.
Analysis of the ECG data was conducted in MATLAB (The MathWorks, Inc., MA, USA) using the FieldTrip toolbox and its recommended procedure to detect cardiac events (http://www.fieldtriptoolbox.org/example/use_independent_component_analysis_ica_to_remove_ecg_artifacts). Following this approach, the ECG signal was used to detect the QRS complex and its main peak, the R-wave peak. Next, we extracted the latency of each R-peak, which was used to compute the coefficient of variation (CV = standard deviation/mean) of the intervals between consecutive R-peaks (inter-beat interval: IBI). The CV of inter-beat intervals was used as a metric of heart rate variability for statistical testing and is termed HRV hereafter. This measure was recently shown to capture block-wise state anxiety changes using a similar manipulation, as validated by corresponding changes in state anxiety scores (Sporn et al., 2020). See further details below in the section Measures of State Anxiety.
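Once R-peak latencies are available, the CV-based HRV index reduces to a few lines. A minimal NumPy sketch, assuming the R-peaks have already been detected (for example with the FieldTrip procedure above) and are given in seconds for one block; `hrv_cv` is a hypothetical name, and the N - 1 normalisation of the standard deviation (MATLAB's default for `std`) is an assumption.

```python
import numpy as np


def hrv_cv(r_peak_times_s):
    """Coefficient of variation of inter-beat intervals (the HRV index in the text).

    r_peak_times_s: increasing R-peak latencies in seconds within one block.
    """
    r = np.asarray(r_peak_times_s, dtype=float)
    ibi = np.diff(r)  # intervals between consecutive R-peaks
    # ddof=1 mirrors MATLAB's default std normalisation (N - 1); an assumption
    return ibi.std(ddof=1) / ibi.mean()
```

Because the CV is scale-free, it can be compared across blocks and participants with different mean heart rates.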
Figure 1. Behavioural task structure and physiological measures. A) On individual trials, participants were presented with two visual icons. They were instructed to predict the rewarding stimulus (win = 5p). The stimuli (blue or orange image) were randomly presented to either the left or right of the screen. They remained on the screen until a response was given or the allowed time (2200 ms ± 200 ms) expired, which was recorded as a no-response. When either the left or the right arrow key was pressed, participants immediately saw their chosen image highlighted in bright green, which remained on screen for 1200 ms (± 200 ms) before the outcome was revealed. The outcome, either win or lose, was shown in the middle of the screen for 1200 ms (± 200 ms) in green and red, respectively. Each trial ended with a fixation cross and an inter-trial interval of 1250 ms (± 250 ms). B) The probability of the blue stimulus being rewarded, p(win|blue), with reciprocal probability values for the orange stimulus: p(win|orange) = 1 - p(win|blue). Probability mappings varied in length (26-38 trials), ranging from heavily biased, p(win|blue) = 0.9, through moderately biased, 0.7, to unbiased, 0.5, and were repeated in reverse relationships (0.1, 0.3). Here we display one example of contingency changes for p(win|blue) over the course of the experimental blocks (TB1, TB2, 200 trials each). These blocks were divided into the 5 randomly ordered stimulus-outcome mappings, which were randomly generated for each participant. While participants performed the experimental task, their physiological responses, C) EEG and D) ECG, were recorded continuously, with R-peaks from the ECG signals used to calculate heart-rate variability (HRV) and spectral estimates of high-frequency (0.15-0.4 Hz) power in HRV.

EEG data were preprocessed in the EEGLAB toolbox (Delorme and Makeig, 2004) by first high-pass filtering at 0.5 Hz (Hamming-windowed sinc finite impulse response [FIR] filter, 3381 points) and then notch filtering between 48 and 52 Hz (847 points) to remove power line noise. Afterwards, artefacts (eye blinks, eye movements, cardiac artefacts) were identified using independent component analysis (ICA, runICA algorithm) and removed (on average 2.3 components, SEM 0.16). Noisy channels were corrected using spherical interpolation. All signals were then epoched around outcome onsets (win, lose) from -200 to 1000 ms. Noisy epochs with amplitudes exceeding ± 100 μV relative to the pre-stimulus baseline were identified and removed using a thresholding technique. The number of rejected trials for each participant did not exceed 10% of the total number. Additional processing steps related to the use of a General Linear Model in combination with the regressors extracted from the computational model are presented in the section on EEG analysis and the general linear model below.
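The epoching and amplitude-threshold rejection steps can be sketched in NumPy as below. This is not EEGLAB code; it assumes continuous data in microvolts, outcome onsets given as sample indices far enough from the recording edges, and baseline correction with the mean of the -200 to 0 ms window before applying the ±100 μV criterion. The function name `epoch_and_reject` is hypothetical.

```python
import numpy as np

FS = 512  # sampling rate in Hz, as stated in the text


def epoch_and_reject(data_uv, onsets, fs=FS, tmin=-0.2, tmax=1.0, threshold_uv=100.0):
    """Epoch continuous EEG around outcome onsets and drop high-amplitude epochs.

    data_uv: continuous EEG, shape (n_channels, n_samples), in microvolts
    onsets:  outcome-onset sample indices; every epoch must lie inside the data
    Each epoch is baseline-corrected with its pre-stimulus mean and rejected if
    any channel then exceeds +/- threshold_uv.
    Returns (kept_epochs, keep_mask); kept_epochs is (n_kept, n_channels, n_times).
    """
    start, stop = int(round(tmin * fs)), int(round(tmax * fs))  # -102, 512 at 512 Hz
    epochs = np.stack([data_uv[:, o + start:o + stop] for o in onsets])
    # The first -start samples of each epoch are the pre-stimulus baseline
    baseline = epochs[:, :, :-start].mean(axis=2, keepdims=True)
    epochs = epochs - baseline
    keep = np.abs(epochs).max(axis=(1, 2)) <= threshold_uv
    return epochs[keep], keep
```

The proportion of rejected trials for a participant is then `1 - keep.mean()`, the quantity the text reports as staying below 10%.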
Cleaned EEG and preprocessed behavioural data files are available in the Open Science Framework Data Repository: https://osf.io/b4qkp/. The results shown in Figs. 3, 4, and 5 are based on these data.

Measures of State Anxiety
One marker of state anxiety used during the experiment was the CV of the inter-beat intervals, as this HRV measure, like other HRV metrics, has been reported to decrease during anxious states (Chalmers et al., 2014; Friedman and Thayer, 1998; Gorman and Sloan, 2000; Kawachi et al., 1995). Lower HRV is associated with reduced complexity of physiological responses to stress and anxiety (Friedman, 2007; Gorman and Sloan, 2000) and is used as a transdiagnostic marker to identify anxiety in psychiatry (Quintana et al., 2016). In our recent work, we validated the use of the CV-based HRV as a proxy for state anxiety by showing that a similar experimental manipulation reduced this HRV index and increased state anxiety scores (Sporn et al., 2020).
Complementing the HRV analysis, we acquired subjective self-reported measures of state anxiety (STAI state scale X1, 20 items; Spielberger, 1983) four times throughout the experiment using an online version embedded within the experiment code. However, due to an error in the code, the STAI was presented at the wrong time points, rendering it invalid for assessing changes in state anxiety following the experimental manipulation. To address this limitation, we performed an additional analysis of the spectral characteristics of the inter-beat-interval data to link our HRV proxy of state anxiety to autonomic modulation and parasympathetic (vagal) withdrawal (Friedman, 2007; Gorman and Sloan, 2000). Reduced high-frequency HRV (0.15-0.40 Hz) and reduced variation between R-R intervals are consistently found across trait anxiety, worry, and anxiety disorders (Aikins and Craske, 2010; Friedman, 2007; Fuller, 1992; Klein et al., 1995; Miu et al., 2009; Mujica-Parodi et al., 2009; Pittig et al., 2013; Thayer et al., 1996). After obtaining the IBI time series as described in the previous section, we interpolated it at 1 Hz with a spline function (order 3) and estimated spectral power using Welch's periodogram method (Hanning window; following Rebollo et al., 2018). These power estimates were then normalised to the average power in the baseline (R1) and converted to decibels (dB) for statistical analysis.
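The spectral pipeline above (cubic-spline resampling of the IBI series at 1 Hz, Welch's method with a Hann window, power in the 0.15-0.40 Hz band, normalisation to R1 in dB) can be sketched with SciPy. Assumptions not stated in the text: each IBI is time-stamped at the R-peak that ends it, the Welch segment length is 128 samples (or the series length if shorter), and band power is the PSD summed over in-band bins times the frequency resolution. `hf_power` and `to_db_relative` are hypothetical names.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import welch


def hf_power(r_peak_times_s, fs_interp=1.0, band=(0.15, 0.40), max_nperseg=128):
    """High-frequency (HF) power of the inter-beat-interval (IBI) series.

    r_peak_times_s: increasing R-peak latencies in seconds.
    The irregular IBI series is resampled on a regular grid with a cubic
    (order-3) spline, and its PSD is estimated with Welch's method (Hann window).
    """
    r = np.asarray(r_peak_times_s, dtype=float)
    ibi = np.diff(r)
    t_beats = r[1:]  # time-stamp each interval at the beat that ends it
    grid = np.arange(t_beats[0], t_beats[-1], 1.0 / fs_interp)
    ibi_regular = CubicSpline(t_beats, ibi)(grid)
    nperseg = min(len(grid), max_nperseg)
    # welch removes each segment's mean by default (detrend="constant")
    freqs, psd = welch(ibi_regular, fs=fs_interp, window="hann", nperseg=nperseg)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Approximate the band integral: summed PSD bins times frequency resolution
    return psd[in_band].sum() * (freqs[1] - freqs[0])


def to_db_relative(power, baseline_power):
    """Express power relative to the resting baseline (R1) in decibels."""
    return 10.0 * np.log10(power / baseline_power)
```

Under this convention, a block with less HF power than R1 yields a negative dB value, the direction expected under vagal withdrawal.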

The Hierarchical Gaussian Filter (HGF)
We used the Hierarchical Gaussian Filter (HGF; Mathys et al., 2014, 2011) to estimate each participant's individual learning characteristics and belief trajectories during our binary reward learning task. The HGF is freely available as open-source software in the TAPAS toolbox (http://www.translationalneuromodeling.org/tapas) and has been used to model and understand learning across diverse settings (e.g., de Berker et al., 2016; Diaconescu et al., 2017b, 2014; Iglesias et al., 2013; Marshall et al., 2016; Stefanics et al., 2018; Weber et al., 2020).
Alternative models to the HGF have been proposed based on a generative model of sudden changes in the environment (e.g., change-point models: Nassar et al., 2010; Moens and Zénon, 2019). In our task, changes to the contingencies governing the outcomes were abrupt (see Fig. 1B), in contrast to the generative model of the environment assumed by the HGF, in which states evolve as Gaussian random walks and thus change slowly and diffusively over time. While the HGF has been successful in explaining and predicting human behaviour in such tasks (e.g., Iglesias et al., 2013; de Berker et al., 2016), alternative models (change-point models: Nassar et al., 2010; Hierarchical Adaptive Forgetting Variational Filter: Moens and Zénon, 2019) were formulated to expect sudden changes and could outperform the HGF in environments with diffuse or sudden changes. In practice, however, both approaches (HGF and change-point models) can successfully deal with both kinds of environments (sudden versus diffuse changes), as a recent comparative analysis found (Marković and Kiebel, 2016).
The HGF is a generative model representing an approximately Bayesian observer who estimates hidden states in the environment. As such, the HGF is a model of perception in which beliefs about states are updated hierarchically. This perceptual model can then be coupled to a response model that maps belief estimates onto decisions. More specifically, in the generative model, a sequence of hidden states $x_1^{(k)}, x_2^{(k)}, \ldots, x_n^{(k)}$ gives rise to the sensory inputs that each participant encounters across trials $k$. Notably, while the perceptual model specifies how inference from observations to beliefs operates hierarchically across those environmental states, the response model probabilistically generates responses (in our case, the agents' choices) based on those beliefs (see Fig. 2). We used a 3-level HGF model for binary outcomes, with the observed contingencies as input (Mathys et al., 2014, 2011). Hence, the trial-wise input $u^{(k)} = 1$ if the blue stimulus was rewarding (orange lost) and $u^{(k)} = 0$ if the blue stimulus was not rewarding (orange won). All equations of the relevant HGF quantities presented below are taken from Mathys et al. (2014, 2011); we refer the interested reader to these papers for the derivation of the perceptual model. At the lowest level of the model, the hidden state $x_1$ corresponds to the binary categorical variable of the experimental stimuli: whether the blue symbol is rewarded on trial $k$ ($x_1^{(k)} = 1$; hence, orange is non-rewarding) or not ($x_1^{(k)} = 0$; orange is rewarded). The second- and third-level states, $x_2$ and $x_3$, are continuous variables evolving as Gaussian random walks coupled through their variance (inverse precision). Thus, their value on trial $k$ is normally distributed around their previous value on trial $k-1$.
The posterior distribution of beliefs about these true hidden states $x_i$ ($i = 2, 3$) is fully determined by the sufficient statistics $\mu_i$ (mean, corresponding to a participant's expectation) and $\sigma_i$ (variance or uncertainty). State $x_2$ describes the true value of the tendency of the stimulus-outcome contingency. It is mapped to the probability of the binary state $x_1$ through a Bernoulli distribution, $p(x_1 \mid x_2) = \mathrm{Bernoulli}(x_1; s(x_2))$, where $s(x) = 1/(1 + \exp(-x))$ is the sigmoid function. We can then measure the change in expectation at the lowest level and interpret it as an implied learning rate ($\alpha$). This is defined as the sigmoid-transformed difference between $\mu_2$ before and after seeing the input, relative to the difference between the observed input $u$ and its prediction $s(\mu_2)$, that is, $\alpha^{(k)} = \left(s(\mu_2^{(k)}) - s(\mu_2^{(k-1)})\right) / \left(u^{(k)} - s(\mu_2^{(k-1)})\right)$ (Fig. 2, lower panel; TAPAS toolbox: tapas_hgf_binary.m). A larger belief update in response to the same observed mismatch between the input $u$ and the prediction amounts to a higher learning rate $\alpha$. At the top level, $x_3$ represents the phasic log-volatility of the task environment, that is, the rate of change on the second level.
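The sigmoid mapping from $x_2$ to reward probability and the implied learning rate defined above can be written directly in Python. This is a minimal sketch with hypothetical function names, not the TAPAS code; returning 0 when the prediction error is exactly zero is a convention chosen here to avoid division by zero.

```python
import math


def sigmoid(x):
    """s(x) = 1 / (1 + exp(-x)): maps the tendency x2 (or mu2) to p(x1 = 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def implied_learning_rate(mu2_prev, mu2_post, u):
    """Change in predicted reward probability relative to the prediction error:
    (s(mu2_post) - s(mu2_prev)) / (u - s(mu2_prev)).
    Returns 0.0 when the prediction error is exactly zero (a convention here)."""
    prediction_error = u - sigmoid(mu2_prev)
    if prediction_error == 0:
        return 0.0
    return (sigmoid(mu2_post) - sigmoid(mu2_prev)) / prediction_error
```

For example, moving from $\mu_2 = 0$ (predicted probability 0.5) to $\mu_2 = 0.5$ after a rewarded blue trial gives a learning rate of about 0.245.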
The coupling between levels 2 and 3 is through a positive (exponential) function of $x_3$, which represents the variance or step size of the Gaussian random walk that determines how $x_2$ evolves in time:

$$p\left(x_2^{(k)} \mid x_2^{(k-1)}, x_3^{(k)}\right) = \mathcal{N}\left(x_2^{(k)};\; x_2^{(k-1)},\; \exp\left(\kappa x_3^{(k)} + \omega_2\right)\right) \tag{1}$$

The parameters $\kappa$ and $\omega_2$ represent the coupling strength and the tonic volatility, respectively. In the associated belief updates, momentarily high volatility estimates ($\mu_3$) increase the speed with which beliefs at level 2 change. Larger values of the tonic (time-invariant) part of the variance ($\omega_2$) generally increase the step size of $x_2$. This leads to faster belief updates on level 2 irrespective of current levels of (estimated) volatility.
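To see what kind of environment the HGF assumes, its generative model can be sampled forward: $x_3$ as a Gaussian random walk with variance $\exp(\omega_3)$, $x_2$ as a random walk with variance $\exp(\kappa x_3 + \omega_2)$, and binary inputs drawn from $\mathrm{Bernoulli}(s(x_2))$. The parameter values below are arbitrary illustrations, not estimates from this study, and the function name is hypothetical.

```python
import numpy as np


def simulate_hgf_generative(n_trials, kappa=1.0, omega2=-3.0, omega3=-6.0, seed=0):
    """Sample hidden states and binary inputs from the 3-level binary HGF
    generative model. Returns arrays x2, x3 (continuous) and u (0/1)."""
    rng = np.random.default_rng(seed)
    x2 = np.zeros(n_trials)
    x3 = np.zeros(n_trials)
    u = np.zeros(n_trials, dtype=int)
    x2_prev, x3_prev = 0.0, 0.0
    for k in range(n_trials):
        # Level 3: log-volatility random walk with constant variance exp(omega3)
        x3[k] = rng.normal(x3_prev, np.sqrt(np.exp(omega3)))
        # Level 2: tendency random walk whose variance depends on x3
        x2[k] = rng.normal(x2_prev, np.sqrt(np.exp(kappa * x3[k] + omega2)))
        # Level 1: binary outcome with p = s(x2)
        u[k] = int(rng.random() < 1.0 / (1.0 + np.exp(-x2[k])))
        x2_prev, x3_prev = x2[k], x3[k]
    return x2, x3, u
```

Trajectories sampled this way drift gradually rather than switching abruptly, which is the contrast with the task's sudden contingency changes discussed above.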
The step size of the volatility state $x_3$ in our 3-level HGF is governed by a positive constant, the exponential of a constant parameter $\omega_3$:

$$p\left(x_3^{(k)} \mid x_3^{(k-1)}\right) = \mathcal{N}\left(x_3^{(k)};\; x_3^{(k-1)},\; \exp(\omega_3)\right) \tag{2}$$

Our analyses of uncertainty estimates focused on informational uncertainty, captured by the variance on level 2 ($\sigma_2$, belief uncertainty about outcome tendencies) and level 3 ($\sigma_3$, belief uncertainty about volatility, representing imperfect knowledge about how the reward outcome contingencies change across time; Mathys et al., 2014, Eqs. 9-10). Uncertainty about $x_2$ can be split into two distinct forms: informational uncertainty, $\sigma_2$, and environmental uncertainty, $\exp(\kappa\mu_3^{(k-1)} + \omega_2)$, whereas uncertainty about $x_3$ consists of $\sigma_3$. (Note that $\sigma_3$ corresponds to "informational" uncertainty about state $x_3$.) Environmental uncertainty (Mathys et al., 2014, Eqs. 13-14) determines the step size of the random walk for $x_2$ through a combination of two quantities, phasic volatility ($\mu_3^{(k-1)}$) and tonic volatility ($\omega_2$):

$$\text{environmental uncertainty}^{(k)} = \exp\left(\kappa\mu_3^{(k-1)} + \omega_2\right) \tag{3}$$

Formally, the update equations of the posterior estimates for level $i$ ($i = 2$ and $3$) take the following form:

$$\Delta\mu_i^{(k)} = \mu_i^{(k)} - \mu_i^{(k-1)} \propto \frac{\hat{\pi}_{i-1}^{(k)}}{\pi_i^{(k)}}\,\delta_{i-1}^{(k)} \tag{4}$$

where the posterior mean update term $\Delta\mu_i^{(k)}$ is the difference between the posterior expectation on the current trial, $\mu_i^{(k)}$, and the prediction from the previous trial, $\mu_i^{(k-1)}$, before seeing the input on the current trial. The update step is proportional to the prediction error term $\delta_{i-1}^{(k)}$, which denotes the discrepancy between the lower-level expectation $\mu_{i-1}^{(k)}$ and its prediction $\hat{\mu}_{i-1}^{(k)}$. The prediction error is then weighted by a ratio of precisions: the precision of the prediction of the level below, $\hat{\pi}_{i-1}^{(k)}$, before seeing the input, divided by the precision of the current belief, $\pi_i^{(k)}$. Precision is defined as the inverse variance of the posterior expectation:

$$\pi_i^{(k)} = \frac{1}{\sigma_i^{(k)}} \tag{5}$$

The precision-weight ratio in Eq. (4) can be interpreted as a learning rate, whereas its product with the prediction error constitutes the precision-weighted prediction error that governs the update steps (pwPE; see also Eqs. 19 and 20 below). Correspondingly, Eq. (4) articulates the idea that more uncertain (less precise) belief estimates at the current level should motivate a larger influence of unpredicted outcomes on subsequent belief updating.
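As mentioned above, the updates on the first level of our model are equivalent to the input $u^{(k)}$:

$$\mu_1^{(k)} = u^{(k)} \tag{6}$$

The posterior belief updates on level 2 of our 3-level HGF take the form:

$$\mu_2^{(k)} = \mu_2^{(k-1)} + \sigma_2^{(k)}\,\delta_1^{(k)} \tag{7}$$

with the variance update as follows:

$$\sigma_2^{(k)} = \frac{1}{1/\hat{\sigma}_2^{(k)} + \hat{\mu}_1^{(k)}\left(1 - \hat{\mu}_1^{(k)}\right)} \tag{8}$$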

Fig. 2.
Three-level Hierarchical Gaussian Filter for binary outcomes. Bottom panel. Representation of the three levels of the HGF for binary outcomes and the associated belief trajectories across all 400 trials in a representative participant. At the lowest level, the inputs $u$ correspond to the rewarded outcome of each trial (1 = blue, 0 = orange; shown as black dots). The participant's responses $y$ are shown as light blue dots tracking those trial outcomes. The learning rate about stimulus outcomes at the lowest level is also given in black. The belief on the second level, $\mu_2$ ($\sigma_2$), represents the participant's estimate of the stimulus tendency $x_2$; the step size or variance of the Gaussian random walk for $x_2$ depends on parameters $\kappa$ and $\omega_2$, in addition to the estimate $\mu_3$ of the level above, $x_3$. The belief on the third level, $\mu_3$ ($\sigma_3$), represents estimates of volatility $x_3$, whose step size is governed by parameter $\omega_3$. Top panel. Schematic representation of the 3-level HGF model with the relevant parameters modulating each level. In our study, $\omega_2$, $\omega_3$ and the response parameter $\zeta$ were free parameters and were estimated by fitting the HGF to the individual responses and observed inputs. Generally, parameters $\omega_2$ and $\omega_3$ describe an individual's learning motif (see the section below for further details).
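where the following definitions apply:

$$\hat{\mu}_1^{(k)} = s\left(\mu_2^{(k-1)}\right) = \frac{1}{1 + \exp\left(-\mu_2^{(k-1)}\right)} \tag{9}$$

$$\delta_1^{(k)} = \mu_1^{(k)} - \hat{\mu}_1^{(k)} \tag{10}$$

$$\hat{\sigma}_2^{(k)} = \sigma_2^{(k-1)} + \exp\left(\kappa\mu_3^{(k-1)} + \omega_2\right) \tag{11}$$

Formulated as precision, with $\hat{\pi}_2^{(k)} = 1/\hat{\sigma}_2^{(k)}$ and $\hat{\pi}_1^{(k)} = 1/\left(\hat{\mu}_1^{(k)}\left(1 - \hat{\mu}_1^{(k)}\right)\right)$, the variance step from Eq. 8 above is:

$$\pi_2^{(k)} = \hat{\pi}_2^{(k)} + \frac{1}{\hat{\pi}_1^{(k)}} \tag{12}$$

A similar form is found for the belief update on level 3:

$$\mu_3^{(k)} = \mu_3^{(k-1)} + \frac{\kappa}{2}\,\sigma_3^{(k)}\,w_2^{(k)}\,\delta_2^{(k)} \tag{13}$$

with

$$\pi_3^{(k)} = \hat{\pi}_3^{(k)} + \frac{\kappa^2}{2}\,w_2^{(k)}\left(w_2^{(k)} + \left(2w_2^{(k)} - 1\right)\delta_2^{(k)}\right) \tag{14}$$

$$\hat{\pi}_3^{(k)} = \frac{1}{\sigma_3^{(k-1)} + \exp(\omega_3)} \tag{15}$$

$$v_2^{(k)} = \exp\left(\kappa\mu_3^{(k-1)} + \omega_2\right) \tag{16}$$

$$w_2^{(k)} = v_2^{(k)}\,\hat{\pi}_2^{(k)} \tag{17}$$

$$\delta_2^{(k)} = \frac{\sigma_2^{(k)} + \left(\mu_2^{(k)} - \mu_2^{(k-1)}\right)^2}{\hat{\sigma}_2^{(k)}} - 1 \tag{18}$$

Following the posterior updates from Eq. 7 and Eq. 13, the equations for the pwPE on level 2 ($\varepsilon_2$) and level 3 ($\varepsilon_3$) can be written as:

$$\varepsilon_2^{(k)} = \sigma_2^{(k)}\,\delta_1^{(k)} \tag{19}$$

$$\varepsilon_3^{(k)} = \frac{\kappa}{2}\,\sigma_3^{(k)}\,w_2^{(k)}\,\delta_2^{(k)} \tag{20}$$

As response model we used the unit-square sigmoid observation model for binary responses (Iglesias et al., 2013; Mathys et al., 2014). This transforms the predicted probability $m^{(k)} = \hat{\mu}_1^{(k)}$ that the stimulus (e.g. blue) is rewarding on trial $k$ (outcome = 1), which is a function of the current beliefs, into the probabilities $p(y^{(k)} = 1)$ and $p(y^{(k)} = 0)$ that the participant will choose that stimulus (blue, 1) or the alternative (orange, 0):

$$p\left(y^{(k)} = 1\right) = \frac{\left(m^{(k)}\right)^{\zeta}}{\left(m^{(k)}\right)^{\zeta} + \left(1 - m^{(k)}\right)^{\zeta}}, \qquad p\left(y^{(k)} = 0\right) = 1 - p\left(y^{(k)} = 1\right) \tag{21}$$

Higher values of the response parameter $\zeta$ make participants more likely to choose the response that corresponds with their current belief about the rewarded stimulus.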
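The update equations of this section can be traced in a compact loop. The following NumPy sketch implements the 3-level binary HGF updates and the unit-square sigmoid response model under stated assumptions: no drift terms, $\kappa$ held fixed, and the starting values given in the next paragraph ($\mu_2^{(0)} = 0$, $\mu_3^{(0)} = 1$, $\sigma_2^{(0)} = 0.1$, $\sigma_3^{(0)} = 1$). It is an illustrative re-implementation with hypothetical function names, not the TAPAS toolbox code, and it performs no parameter estimation.

```python
import numpy as np


def hgf_binary_3level(u, omega2, omega3, kappa=1.0, mu2_0=0.0, mu3_0=1.0,
                      sa2_0=0.1, sa3_0=1.0):
    """Filter a binary input sequence u with the 3-level HGF update equations.

    Hypothetical helper: returns trial-wise predictions, posterior means and
    variances, environmental uncertainty and the pwPEs on levels 2 and 3.
    """
    keys = ("muhat1", "mu2", "sa2", "mu3", "sa3", "env_unc", "eps2", "eps3")
    traj = {key: np.empty(len(u)) for key in keys}
    mu2, sa2, mu3, sa3 = mu2_0, sa2_0, mu3_0, sa3_0
    for k, u_k in enumerate(u):
        # Level 1: prediction (sigmoid of mu2) and prediction error
        muhat1 = 1.0 / (1.0 + np.exp(-mu2))
        delta1 = u_k - muhat1
        # Level 2: predicted variance = previous informational variance
        # plus environmental uncertainty exp(kappa * mu3 + omega2)
        env_unc = np.exp(kappa * mu3 + omega2)
        sahat2 = sa2 + env_unc
        sa2_new = 1.0 / (1.0 / sahat2 + muhat1 * (1.0 - muhat1))
        eps2 = sa2_new * delta1                      # pwPE on level 2
        mu2_new = mu2 + eps2
        # Level 3: volatility prediction error and precision update
        pihat3 = 1.0 / (sa3 + np.exp(omega3))
        w2 = env_unc / sahat2
        delta2 = (sa2_new + (mu2_new - mu2) ** 2) / sahat2 - 1.0
        pi3 = pihat3 + 0.5 * kappa**2 * w2 * (w2 + (2.0 * w2 - 1.0) * delta2)
        if pi3 <= 0:
            raise ValueError(f"non-positive level-3 precision on trial {k}")
        sa3_new = 1.0 / pi3
        eps3 = 0.5 * kappa * sa3_new * w2 * delta2   # pwPE on level 3
        mu3_new = mu3 + eps3
        for key, value in zip(keys, (muhat1, mu2_new, sa2_new, mu3_new,
                                     sa3_new, env_unc, eps2, eps3)):
            traj[key][k] = value
        mu2, sa2, mu3, sa3 = mu2_new, sa2_new, mu3_new, sa3_new
    return traj


def unit_square_sigmoid(m, zeta):
    """Probability of choosing the stimulus predicted to be rewarding (y = 1)."""
    m = np.asarray(m, dtype=float)
    return m**zeta / (m**zeta + (1.0 - m) ** zeta)


rng = np.random.default_rng(0)
u = rng.binomial(1, 0.8, size=200)            # toy input sequence
traj = hgf_binary_3level(u, omega2=-2.0, omega3=-7.0)
p_choose = unit_square_sigmoid(traj["muhat1"], zeta=2.0)
```

Taking `np.abs(traj["eps2"])` would give a trial-wise regressor of the kind used in the single-trial EEG analysis, and `traj["env_unc"]` corresponds to the environmental uncertainty of Eq. 3.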
Fitting the combination of perceptual and response model to an individual participant's responses allows for a subject-specific characterisation of learning (and response) by the set of perceptual (and response) parameters. Here, we estimated the parameters $\omega_2$, $\omega_3$ and $\zeta$ (see below for the free parameters in an alternative HGF model). The priors on these parameters were set to be relatively uninformative by choosing a broad variance (16 for $\omega_2$ and $\omega_3$, and 1 for $\zeta$, as we expected less variation in this parameter). We fixed both the coupling parameter $\kappa$ and the starting value of the belief on the third level, $\mu_3^{(0)}$, to 1 following de Berker et al. (2016), but note that the scale of $x_3$ is arbitrary in our setting (for details, see Mathys et al., 2014). We chose a neutral starting value for the belief on the second level, $\mu_2^{(0)} = 0$, assuming participants would not have any initial preference for the outcome to be either rewarding (positive $\mu_2$) or not rewarding (negative $\mu_2$). The initial uncertainties of these beliefs ($\sigma_2^{(0)} = 0.1$ and $\sigma_3^{(0)} = 1$) corresponded to the default settings of the toolbox, and we verified that these values had a negligible impact on the estimated belief trajectories. All prior settings are summarised in Table 1 (see also Model Space below for alternative models). Maximum-a-posteriori (MAP) estimates of the model parameters were determined from these priors and the series of inputs, optimised with a quasi-Newton algorithm in the HGF toolbox version 3.1.
To assess the reliability of our estimates for the free model parameters in our implementation of the HGF (winning model; see model comparison details below), we simulated behavioural responses of 70 agents for each of nine different values of $\omega_2$ (630 simulations in total), using the input sequence observed by Cont participant #1. Similar simulations were run to recover parameters $\omega_3$ and $\zeta$ (Supplementary Fig. 2). This analysis demonstrated high accuracy for estimating $\omega_2$ and $\zeta$, whereas $\omega_3$ was poorly recovered. Poor estimation of $\omega_3$ has also been reported in a recent study using a different approach (Reed et al., 2020). A complementary analysis using simulated responses to the inputs observed by StA participant #1 provided very similar results (Supplementary Fig. 3).
Based on these results, we excluded $\omega_3$ from subsequent between-group statistical analyses. Our stimulus sequence contained a certain level of volatility (changes in the contingencies every 26-38 trials) but no marked changes in volatility, so it is unsurprising that we could not infer participants' beliefs about the meta-volatility $\omega_3$. However, even if the true value of environmental volatility is constant, participants still need to estimate the adequate level of volatility for performing the task, suggesting that learning about volatility remains relevant (consistent with our model comparison results, see below; see also de Berker et al., 2016, which used a very similar task structure with constant true volatility). In sum, in the current study, the computational quantities of interest for the statistical comparison between the groups were the model parameter $\omega_2$ (tonic volatility estimate) and the decision noise parameter of the response model, $\zeta$. In addition, we assessed the trial-wise trajectories of the posterior mean of beliefs about volatility ($\mu_3$), environmental uncertainty, and the variances on levels 2 and 3 ($\sigma_2$, $\sigma_3$) as a measure of (informational) uncertainty about the hidden states on these levels. Note that due to the poor estimation of $\omega_3$ ('meta-volatility'), which directly modulates precision on level 3 and thus the update steps on the expectation of volatility, $\mu_3$, between-group results for $\mu_3$ and $\sigma_3$ should be interpreted with care.

Table 1. Means and variances of the priors on perceptual parameters and starting values of the beliefs of the HGF models. Values are shown for the 3-level HGF, 2-level HGF and HGF$_{\mu_3}$ models. Free parameters are estimated in their unbounded space; accordingly, parameters that are restricted to a confined interval are log-transformed to allow for estimation in an unbounded space. In our study, $\zeta$ was estimated in log space (3-level HGF and 2-level HGF models). Model HGF$_{\mu_3}$ had as free parameters $\omega_2$, $\omega_3$, $\mu_3^{(0)}$ and $\sigma_3^{(0)}$, with $\mu_3$ governing the decision noise through a negative exponential (Diaconescu et al., 2014). Here, $\sigma_3^{(0)}$ was estimated in log space.
Precision-weighted prediction errors play an important role in current Bayesian theories of perceptual inference and learning (Doya et al., 2007; Feldman and Friston, 2010; Friston et al., 2013; Friston and Kiebel, 2009; Moran et al., 2013; Rao and Ballard, 1999), and these are the quantities that are considered to predominantly modulate EEG signals (Friston and Kiebel, 2009). We initially selected the pwPE trajectories from levels 2 and 3 (labelled $\varepsilon_2$ and $\varepsilon_3$; Eqs. 19 and 20) to examine how these are represented in the brain as a function of state anxiety. However, as we identified a very high correlation between the regressors derived from $\varepsilon_2$ and $\varepsilon_3$, our final GLM analysis excluded the pwPE trajectories from level 3 (see GLM analysis section below).

Model Space
We tested five computational models of learning. The first three were a 3-level HGF (with volatility on the third level: HGF$_3$), a reduced 2-level HGF excluding volatility (HGF$_2$) and a modified 3-level HGF in which the decision noise parameter that maps beliefs to choices depends on trial-by-trial estimates of volatility, $\mu_3$ (in line with Diaconescu et al., 2014; here termed HGF$_{\mu_3}$). In the modified model HGF$_{\mu_3}$, trial-wise increases in volatility correspond with an individual exhibiting more exploratory or noisier behaviour (smaller values of the decision noise parameter). In this model, in addition to the free model parameters $\omega_2$ and $\omega_3$, we estimated $\mu_3^{(0)}$ and $\sigma_3^{(0)}$. These were all hierarchical Bayesian models implemented using the HGF toolbox of TAPAS (Mathys et al., 2011). The priors on the hierarchical Bayesian model parameters are shown in Table 1. The fourth and fifth models were widely used reinforcement learning models: a Rescorla-Wagner (RW) model, in which PEs drive belief updating with a fixed learning rate (Rescorla and Wagner, 1972), and a Sutton K1 model (SK1), which permits the learning rate to change with recent prediction errors (Sutton, 1992).
Models were then compared at the group level using random effects Bayesian model selection (BMS; Stephan et al., 2009; code from the freely available MACS toolbox; Soch and Allefeld, 2018). BMS provides expected model frequencies and exceedance probabilities, which quantify the relative support for each model or family of models (Soch et al., 2016). First, the log-model evidences (LMEs) of all Bayesian models were combined into a log-family evidence (LFE), which was compared with the LFE of the family of reinforcement learning models (RW and SK1) to assess which family provided more evidence. Within the winning family, an additional BMS step determined the final optimal model.
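The family-level comparison can be sketched as follows. This NumPy sketch assumes a uniform prior over models within each family when forming the log-family evidence, and implements the variational random-effects BMS scheme of Stephan et al. (2009) with a flat Dirichlet prior, estimating exceedance probabilities by sampling from the Dirichlet posterior. It is a hypothetical re-implementation, not the MACS toolbox code; the digamma function is approximated by a recurrence plus an asymptotic series to keep the example dependency-free.

```python
import numpy as np


def digamma(x):
    """Digamma via upward recurrence and an asymptotic series (accurate to ~1e-9)."""
    x = np.atleast_1d(np.array(x, dtype=float))
    result = np.zeros_like(x)
    while np.any(x < 6.0):
        small = x < 6.0
        result[small] -= 1.0 / x[small]      # psi(x) = psi(x + 1) - 1/x
        x[small] += 1.0
    inv2 = 1.0 / (x * x)
    result += (np.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def log_family_evidence(lme, families):
    """Per-subject LFE, assuming a uniform prior over models within each family.

    lme: (n_subjects, n_models); families: list of lists of model indices.
    """
    lme = np.asarray(lme, dtype=float)
    columns = []
    for members in families:
        sub = lme[:, members]
        peak = sub.max(axis=1)
        # log((1/M) * sum_m exp(LME_m)), computed stably via log-sum-exp
        columns.append(peak + np.log(np.exp(sub - peak[:, None]).sum(axis=1))
                       - np.log(len(members)))
    return np.column_stack(columns)


def rfx_bms(lme, n_samples=100_000, tol=1e-8, max_iter=1000, seed=0):
    """Variational random-effects BMS: Dirichlet posterior over model frequencies."""
    lme = np.asarray(lme, dtype=float)
    alpha0 = np.ones(lme.shape[1])           # flat Dirichlet prior (assumption)
    alpha = alpha0.copy()
    for _ in range(max_iter):
        log_u = lme + digamma(alpha) - digamma(alpha.sum())
        log_u -= log_u.max(axis=1, keepdims=True)
        g = np.exp(log_u)
        g /= g.sum(axis=1, keepdims=True)    # posterior model assignment per subject
        new_alpha = alpha0 + g.sum(axis=0)
        converged = np.max(np.abs(new_alpha - alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    expected_freq = alpha / alpha.sum()
    draws = np.random.default_rng(seed).dirichlet(alpha, size=n_samples)
    exceedance = np.bincount(draws.argmax(axis=1), minlength=len(alpha)) / n_samples
    return alpha, expected_freq, exceedance
```

For example, if the columns of `lme` were ordered HGF$_3$, HGF$_2$, HGF$_{\mu_3}$, RW, SK1 (a hypothetical ordering), `rfx_bms(log_family_evidence(lme, [[0, 1, 2], [3, 4]]))` would compare the HGF and reinforcement learning families, and `rfx_bms(lme[:, :3])` the three HGF models.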

EEG analysis and the General Linear Model
Prior to single-trial ERP analysis using the general linear model (GLM), a statistical analysis of the main effect of outcome on the ERP was conducted in the total population (N = 42). The aim of this ERP analysis was to assess whether the windows associated with the effect of the outcome (lose versus win) on the EEG signals in our task converge with the windows of the fERN and P300 effects reported in previous studies (see for instance Nieuwenhuis et al., 2004 ;Hajcak et al., 2005 ). Accordingly, we assessed the difference between lose and win ERPs in a broad window between 100 and 1000 ms, which includes the latency of those previously reported ERP components. This analysis was carried out using permutation tests with a cluster-based threshold correction to control the family-wise error (FWE) at level 0.05 (dependent samples t-test, 5000 iterations: Maris and Oostenveld, 2007 ;FieldTrip toolbox, Oostenveld et al., 2011 ).
To allow for the detection of significant clusters corresponding with both positive and negative ERP differences, clusters whose test statistic fell below the 2.5th or above the 97.5th percentile of the permutation distribution were considered significant (two-sided test). For this statistical analysis, the ERP data epochs were baseline-corrected by subtracting the mean activity during the baseline period from -200 ms to 0 ms.
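A time-only version of this cluster-based permutation approach can be sketched as follows. The NumPy sketch assumes that, for dependent samples, the permutation distribution is built by randomly flipping the sign of each participant's lose-minus-win difference, that clusters are contiguous time points whose t value exceeds a cluster-forming threshold (an arbitrary t = 2.0 here), and that cluster mass is the sum of t values. Unlike the FieldTrip analysis it ignores channel neighbourhoods, and all function names are hypothetical.

```python
import numpy as np


def paired_t(diff):
    """One-sample t statistic across participants (axis 0) at each time point."""
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))


def runs(mask):
    """(start, stop) indices of contiguous True runs in a 1-D boolean mask."""
    padded = np.concatenate(([0], mask.astype(int), [0]))
    edges = np.flatnonzero(np.diff(padded))
    return list(zip(edges[::2], edges[1::2]))


def cluster_masses(t, threshold):
    """Summed t values of positive and negative supra-threshold clusters."""
    pos = [t[a:b].sum() for a, b in runs(t > threshold)]
    neg = [t[a:b].sum() for a, b in runs(t < -threshold)]
    return pos, neg


def cluster_permutation_test(cond_a, cond_b, threshold=2.0, n_perm=5000, seed=0):
    """Sign-flip cluster permutation test for (n_subjects, n_times) arrays."""
    diff = np.asarray(cond_a, dtype=float) - np.asarray(cond_b, dtype=float)
    t_obs = paired_t(diff)
    pos_obs, neg_obs = cluster_masses(t_obs, threshold)
    rng = np.random.default_rng(seed)
    max_pos, min_neg = np.zeros(n_perm), np.zeros(n_perm)
    for i in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        pos, neg = cluster_masses(paired_t(diff * signs), threshold)
        max_pos[i] = max(pos, default=0.0)   # largest positive cluster mass
        min_neg[i] = min(neg, default=0.0)   # most negative cluster mass
    # One p-value per observed cluster; two-sided: significant if p < 0.025
    p_pos = [(np.sum(max_pos >= m) + 1) / (n_perm + 1) for m in pos_obs]
    p_neg = [(np.sum(min_neg <= m) + 1) / (n_perm + 1) for m in neg_obs]
    return t_obs, pos_obs, p_pos, neg_obs, p_neg
```

Taking the maximum (or minimum) cluster mass per permutation is what controls the family-wise error across time points; a positive or negative cluster counts as significant when its p-value falls below 0.025, matching the two-sided criterion above.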
For the GLM single-trial analysis, we selected a smaller 200-850 ms interval, primarily based on the observed latency of the fERN and P300 components in our study. This interval also covered the latency of HGF regressors in previous GLM studies (see, e.g. Diaconescu et al. [2017a] ; Weber et al. [2020] ; although these studies used quite different tasks). Additionally, it should be noted that the modulation by pwPE regressors of the trial-wise ERP responses can peak at different latencies than the model-free ERP effects ( Diaconescu et al. [2017a] ; Weber et al. [2020] ; Stefanics et al. [2018] ).
In this analysis, EEG data were downsampled to 256 Hz, low-pass filtered at 30 Hz and converted to SPM 12 format (http://www.fil.ion.ucl.ac.uk/spm/, version 7487). In SPM 12 we converted the EEG data into 3-dimensional volumes (two spatial dimensions: anterior to posterior, and left to right across the scalp; one temporal dimension: peri-stimulus time; Litvak et al., 2011). All participants' data consisted of 64 channels and 168 time points, with a voxel size of 4.2 mm × 5.4 mm × 4 ms, and were spatially smoothed to adjust for between-subject spatial variability in channel space. The scalp × time 3D images were then tested statistically using statistical parametric mapping and the GLM (see next section; Friston, 2004a, 2004b; Kilner and Friston, 2010). This procedure is well established for EEG in SPM (Litvak et al., 2011).
Initially, our GLM was composed of trial-wise estimates of two computational quantities: the absolute values of pwPEs on level 2, abs($\varepsilon_2$), and pwPEs on level 3, $\varepsilon_3$. The absolute value of $\varepsilon_2$ was selected because its sign is arbitrary: the quantity $x_2$ relates to the tendency of one choice (e.g. the blue stimulus) to be rewarding ($x_1 = 1$), yet the choice of this reference stimulus, and therefore the sign of the pwPE at this level, is arbitrary (see for instance Stefanics et al., 2018). In addition, we aimed to use the trial-wise win/lose outcome values as a third regressor, as we expected this variable to account for much of the signal variance in the EEG epochs.
However, we observed a prominent correlation between the two regressors abs($\varepsilon_2$) and $\varepsilon_3$. The Pearson correlation coefficient ranged from 0.67 to 0.96 across all 42 participants (mean 0.86, median 0.88), and the correlation was significant in all participants (p < 0.05). The effect of collinearity on GLMs in neuroimaging has been assessed and discussed in detail before (see, e.g., Mumford et al., 2015). Collinear regressors can reduce power and lead to unreliable parameter estimates, but researchers need only be concerned with this issue in the case of near-collinearity (very high correlation between regressors; Mumford et al., 2015). A common practice is to orthogonalise collinear regressors to address reduced power and unreliable parameter estimates in the GLM (Mumford et al., 2015). However, other authors argue that, despite its potential appeal, orthogonalising regressors to remove collinearity is not necessarily beneficial: it does not improve the overall fit of the model, and in most cases it can lead to a misleading interpretation of the resulting inferences (Vanhove, 2020). Here we followed this second line of argument: instead of orthogonalising our pwPE regressors, we updated our GLM to include only the trial-wise absolute values of pwPEs on level 2, abs($\varepsilon_2$), and the outcome of each trial (1 = win, 0 = lose). Regressor abs($\varepsilon_2$) was chosen over $\varepsilon_3$ for theoretical reasons, but also because abs($\varepsilon_2$) was associated with a much higher efficiency of its GLM coefficient than $\varepsilon_3$ (see Supplementary Materials, following Mumford et al., 2015).
Having reduced the regressor space, we then assessed the efficiency of the coefficients associated with each regressor in the final GLM. The efficiency for $\beta_1$, modulating the effect of pwPEs about reward outcome on the EEG, was of the same order of magnitude as the efficiency for $\beta_2$, associated with the outcome regressor (Supplementary Materials). In addition, while the efficiency for $\beta_2$ was very similar in the Cont and StA groups, the efficiency for $\beta_1$ associated with abs($\varepsilon_2$) was considerably lower in the StA group than in control participants. These efficiency values indicated that our final GLM was a priori well specified for our chosen explanatory variable abs($\varepsilon_2$), although the efficiency of the coefficient on this variable was greater in the control group.
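The efficiency referred to here can be computed directly from the design matrix. Under the standard definition, the efficiency of a contrast $c$ is $1 / (c (X^\top X)^{-1} c^\top)$, defined up to the unknown noise variance; it depends on regressor scaling, so regressors are usually standardised before efficiencies are compared. The NumPy sketch below uses hypothetical helper names and simulated stand-in regressors to show how a highly correlated second pwPE regressor lowers the efficiency of the first coefficient.

```python
import numpy as np


def design_matrix(*regressors):
    """Intercept column followed by the given trial-wise regressors."""
    columns = [np.ones(len(regressors[0]))]
    columns += [np.asarray(r, dtype=float) for r in regressors]
    return np.column_stack(columns)


def contrast_efficiency(X, c):
    """Efficiency 1 / (c (X'X)^-1 c') of contrast c, up to the noise variance."""
    c = np.asarray(c, dtype=float)
    return 1.0 / (c @ np.linalg.inv(X.T @ X) @ c)


def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


rng = np.random.default_rng(0)
pwpe2 = np.abs(rng.normal(size=200))            # simulated stand-in for abs(eps2)
pwpe3 = pwpe2 + 0.1 * rng.normal(size=200)       # simulated, highly collinear eps3
outcome = rng.integers(0, 2, size=200)           # win = 1, lose = 0
r = np.corrcoef(pwpe2, pwpe3)[0, 1]              # regressor correlation
X_full = design_matrix(zscore(pwpe2), zscore(pwpe3), zscore(outcome))
X_reduced = design_matrix(zscore(pwpe2), zscore(outcome))
eff_full = contrast_efficiency(X_full, [0, 1, 0, 0])
eff_reduced = contrast_efficiency(X_reduced, [0, 1, 0])
```

Dropping the collinear regressor, as done in the final GLM, raises `eff_reduced` well above `eff_full` in this toy setting.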
Using these choices for regressors and time interval, we then carried out a whole-volume (spatiotemporal) analysis that searched for representations of our computational quantities in the single-trial EEG responses of each individual participant, before assessing within-group statistical effects at the second level. We corrected for multiple comparisons across the whole time-sensor matrix using Gaussian random field theory (Worsley et al., 1996) with a family-wise error (FWE) correction at the cluster level (p < 0.05), using a cluster-defining threshold (CDT) of p < 0.001 (Flandin and Friston, 2019). Importantly, all reported results also survived whole-volume correction at the peak level (p < 0.05). We assessed separately within each group whether the trajectories of our computational quantities were associated with increases or decreases in EEG amplitudes using an F-test. Following the within-group analysis, we used a two-sample t-test to assess between-group (StA minus Cont) differences in the representations of those same regressors. A standard summary statistics approach was used to perform random effects group analyses within each group (StA, Cont; 21 participants each) and between groups (Penny and Holmes, 2007).
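At its core, the first-level analysis fits the same trial-wise GLM at every channel-time point. The NumPy sketch below assumes the data are arranged as (trials, channels, time points) and uses ordinary least squares; it omits the SPM-specific steps (conversion to images, smoothing, random field theory correction), and the function names are hypothetical. The second function illustrates the summary-statistics step as a one-sample t map over participants' beta images.

```python
import numpy as np


def single_trial_glm(eeg, regressors):
    """OLS fit at every channel-time point.

    eeg: (n_trials, n_channels, n_times); regressors: (n_trials, n_regressors).
    Returns betas with shape (n_regressors + 1, n_channels, n_times), intercept first.
    """
    n_trials = eeg.shape[0]
    X = np.column_stack([np.ones(n_trials), regressors])
    Y = eeg.reshape(n_trials, -1)                 # one column per channel-time point
    betas, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return betas.reshape((X.shape[1],) + eeg.shape[1:])


def one_sample_t(beta_maps):
    """Second-level t map from per-participant beta maps stacked on axis 0."""
    n = beta_maps.shape[0]
    return beta_maps.mean(axis=0) / (beta_maps.std(axis=0, ddof=1) / np.sqrt(n))
```

With regressors `[abs(eps2), outcome]`, `betas[1]` would hold one participant's map of the pwPE effect; stacking these maps across the 21 participants of a group and passing them to `one_sample_t` gives an uncorrected within-group statistic map.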

Statistics
To assess Group (StA, Cont) and Block (TB1, TB2) main effects and interactions in state anxiety measures, behavioural, and computational model variables, we applied non-parametric factorial synchronised permutation tests (Basso et al., 2007) with 5000 permutations. As shown below (see Results 3.1), we found a significant main effect of factor Block on the HRV index, indicating that the anxiety manipulation led to different physiological changes as a function of block. Accordingly, we assessed all our dependent variables using the 2 × 2 non-parametric factorial design with factors Block and Group. Factorial analyses were complemented with planned pair-wise permutation tests (5000 permutations) to assess our specific hypothesis of between-group differences. This applies to the following dependent variables: (a) model-free behavioural measures (error rate, reaction time: RT); (b) the coefficient of variation (CV) as a measure of HRV (CV values in blocks TB1 and TB2 were corrected by subtracting the R1 baseline value), and spectral power expressed in dB; (c) the HGF perceptual model parameter $\omega_2$ (tonic volatility modulating the variance of the Gaussian random walk at level 2); (d) the decision noise parameter of the response model, $\zeta$; (e) relevant HGF quantities: (i) informational uncertainty about the reward tendency $x_2$ ($\sigma_2$); (ii) estimates of the belief on volatility (mean, $\mu_3$, and variance, $\sigma_3$); and (iii) environmental uncertainty, related to volatility in the environment: $\exp(\kappa\mu_3 + \omega_2)$.
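For the planned pair-wise between-group comparisons, a two-sample permutation test can be sketched as below. The test statistic (difference in group means), the two-sided criterion and the +1 correction in the p-value are assumptions made for illustration; the synchronised permutation scheme used for the factorial analyses (Basso et al., 2007) is not shown.

```python
import numpy as np


def permutation_test_two_groups(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled = np.concatenate([a, b])
    observed = a.mean() - b.mean()
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)      # random reassignment of group labels
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        exceed += int(abs(diff) >= abs(observed))
    # +1 counts the observed labelling, so p is never exactly zero
    return observed, (exceed + 1) / (n_perm + 1)
```

With 21 participants per group, `permutation_test_two_groups(sta_values, cont_values)` would return the StA minus Cont mean difference and its permutation p-value; `sta_values` and `cont_values` are hypothetical arrays of per-participant scores.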
Note that the HGF trajectories selected above do not directly reflect the subject-specific sequence of contingency blocks, which was randomly generated for each participant. By contrast, the expectation about the reward tendency, $\mu_2$, was strongly associated with the structure of contingency blocks and therefore necessarily differed across participants by nature of the task design. Accordingly, $\mu_2$ was not selected as a dependent variable.
Pair-wise permutation tests were also used to test within-group differences in RT across blocks. In the case of multiple comparisons (for instance, two between-group permutation tests run separately for each block), we controlled the false discovery rate (FDR) using an adaptive linear step-up procedure at level q = 0.05 (Benjamini et al., 2006). This procedure furnished an adapted threshold p-value (P FDR). Prior to these statistical analyses and following BMS, the trial-wise trajectory of each computational quantity of interest ($\sigma_2$, $\mu_3$, $\sigma_3$, and environmental uncertainty, Eq. 3) was extracted from the winning model and averaged across trials within each task block (TB1, TB2). By collapsing the trial information, we aimed to assess general block-related or group-related monotonic changes in the HGF quantities using the 2 × 2 factorial analysis with the factors Group and Block described above.
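The adaptive linear step-up procedure can be sketched as the two-stage scheme of Benjamini, Krieger and Yekutieli (2006): a Benjamini-Hochberg pass at $q/(1+q)$ estimates the number of true null hypotheses, and a second pass at a correspondingly relaxed level yields the adapted threshold. This NumPy sketch uses hypothetical function names and returns 0.0 as the threshold when nothing is rejected.

```python
import numpy as np


def bh_count(p_sorted, q):
    """Number of rejections of the Benjamini-Hochberg step-up at level q."""
    m = len(p_sorted)
    below = p_sorted <= q * np.arange(1, m + 1) / m
    return int(np.flatnonzero(below)[-1] + 1) if below.any() else 0


def adaptive_step_up(p, q=0.05):
    """Two-stage adaptive linear step-up procedure.

    Returns the boolean rejection mask and the adapted p-value threshold.
    """
    p = np.asarray(p, dtype=float)
    p_sorted = np.sort(p)
    m = len(p)
    q1 = q / (1.0 + q)
    r1 = bh_count(p_sorted, q1)                 # stage 1: BH at q / (1 + q)
    if r1 == 0:
        return np.zeros(m, dtype=bool), 0.0
    if r1 == m:
        return np.ones(m, dtype=bool), float(p_sorted[-1])
    m0_hat = m - r1                             # estimated number of true nulls
    r2 = bh_count(p_sorted, q1 * m / m0_hat)    # stage 2: BH at q1 * m / m0_hat
    threshold = float(p_sorted[r2 - 1])
    return p <= threshold, threshold
```

For the two block-wise between-group tests, `adaptive_step_up([p_tb1, p_tb2])` would give the rejection decisions and the adapted threshold (P FDR); `p_tb1` and `p_tb2` are hypothetical names.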
Fig. 3. State anxiety modulates heart rate variability and behavioural responses. A) Modification in heart-rate variability (HRV) during the anxiety manipulation. The average HRV (measured with the coefficient of variation of the inter-beat interval of the ECG signal) is provided for the state anxiety (StA) and control (Cont) groups across task block 1 (TB1), task block 2 (TB2) and the final resting state (R2). The average of the initial resting state (R1: baseline) has been subtracted from each subsequent block to normalise HRV values. Significant between-group differences assessed in learning blocks TB1 and TB2 are identified by black bars on the x-axis (pair-wise permutation test, P FDR < 0.05 after control of the FDR at level q = 0.05). B) The effect of anxiety on reward-based learning performance: error rates. The average error rate of each group (StA, Cont) is presented as a central point flanked by SEM bars. To the right of each mean and SEM are the individual data points in each group to show population dispersion. Anxiety significantly increased the error rate in the StA group compared to controls (p = 0.001). C) The main effect of outcome (win, grey; lose, blue) on mean reaction times (RT: p = 0). On the left, the average RT for each outcome is presented as a central point with SEM bars. To the right of each mean and SEM are the individual data points to show population dispersion.

Below, in the Results section, we present the mean and standard error of the mean (SEM) for our dependent variables (either in the text or in a figure), alongside non-parametric effect sizes for pair-wise comparisons and corresponding bootstrapped confidence intervals (Grissom and Kim, 2012; Ruscio and Mullen, 2012). For within-group comparisons, the non-parametric effect size was estimated using the probability of superiority for dependent samples (Δ dep), whereas for between-group effects we used the probability of superiority (Δ); both are calculated in line with Grissom and Kim (2012) and express the proportion of comparisons in which a value from sample A is greater than a value from sample B (Δ = P[A > B]). In the case of dependent samples, the comparison is done for matched pairs. Although ties were not taken into account in the original formulation by Grissom and Kim (2012), here, in line with Ruscio and Mullen (2012), we corrected Δ using the number of ties (difference scores = 0) and estimated bootstrapped confidence intervals (CI) for Δ.
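The effect sizes described above can be sketched in NumPy as follows. The exact tie correction is not specified, so this sketch assumes the common convention of counting each tie as one half (as in Ruscio's A); the percentile bootstrap, the number of resamples and all function names are likewise illustrative assumptions.

```python
import numpy as np


def prob_superiority(a, b):
    """Delta = P[A > B] over all between-sample pairs, ties counted as 1/2 (assumption)."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return float(np.mean(a > b) + 0.5 * np.mean(a == b))


def prob_superiority_dep(a, b):
    """Delta_dep for matched pairs: share of positive differences, zeros counted as 1/2."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.mean(d > 0) + 0.5 * np.mean(d == 0))


def bootstrap_ci(a, b, statistic, paired=False, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap CI; resamples pairs if paired, else each sample separately."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        if paired:
            idx = rng.integers(0, len(a), size=len(a))
            stats[i] = statistic(a[idx], b[idx])
        else:
            stats[i] = statistic(rng.choice(a, size=len(a)), rng.choice(b, size=len(b)))
    tail = 100.0 * (1.0 - level) / 2.0
    low, high = np.percentile(stats, [tail, 100.0 - tail])
    return float(low), float(high)
```

For a within-group block comparison one would pass `prob_superiority_dep` with `paired=True`, so that each bootstrap sample keeps participants' TB1 and TB2 values together.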

Heart-rate variability
Using a non-parametric 2 × 2 factorial test with synchronised rearrangements, we found significant main effects of Block (p = 0.009) and Group (p = 0.04) on the normalised HRV index during the reward-based learning blocks. No interaction effect was found. Planned between-group comparisons using permutation testing revealed significantly lower HRV in StA during TB1 (mean -0.02, SEM 0.004) when compared to Cont (mean -0.005, SEM 0.005, P FDR < 0.05, Δ = 0.70, CI = [0.54, 0.86]; see Fig. 3 A). These results indicate that the experimental manipulation induced physiological changes in cardiovascular activity corresponding to an anxious state (Chalmers et al., 2014; Feldman et al., 2004). An additional analysis of the spectral characteristics of the IBI time series linked our HRV result to autonomic inflexibility in state anxiety, with significantly reduced high-frequency HRV (0.15-0.4 Hz, termed HF-HRV hereafter) in StA (mean -6.3, SEM 0.6) compared to Cont (mean -4.7, SEM 0.5; non-parametric 2 × 2 factorial test with synchronised rearrangements: significant main effect of Group, p = 0.02, and trend-level interaction effect, p = 0.06; see Supplementary Fig. 4). There was no effect of Block (p = 0.8).
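The two HRV indices can be sketched as follows, assuming inter-beat intervals in seconds. The coefficient of variation is the standard deviation of the IBIs divided by their mean. For the HF band, this sketch resamples the IBI series onto a uniform 4 Hz grid by linear interpolation and sums a Hann-windowed periodogram between 0.15 and 0.40 Hz. The resampling rate, window and unnormalised power scale are assumptions, so only differences in dB (e.g. a task block minus the R1 baseline) are meaningful with this scaling.

```python
import numpy as np


def hrv_cv(ibi):
    """Coefficient of variation of inter-beat intervals."""
    ibi = np.asarray(ibi, dtype=float)
    return float(np.std(ibi, ddof=1) / np.mean(ibi))


def hf_power_db(ibi, fs=4.0, band=(0.15, 0.40)):
    """Unnormalised HF power (dB) of an IBI series (IBIs in seconds)."""
    ibi = np.asarray(ibi, dtype=float)
    beat_times = np.cumsum(ibi)
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    x = np.interp(grid, beat_times, ibi)        # evenly sampled IBI series
    x = (x - x.mean()) * np.hanning(len(x))     # remove mean, taper edges
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(10.0 * np.log10(power[in_band].sum()))
```

A baseline-corrected value would then be, for instance, `hf_power_db(ibi_tb1) - hf_power_db(ibi_r1)`, where both arrays are hypothetical per-block IBI series of equal duration.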
Our finding of reduced HRV in state anxiety corresponds both to prior research showing lower HRV in anxiety (Chalmers et al., 2014; Friedman and Thayer, 1998; Gorman and Sloan, 2000) and to our previously published work using a similar state anxiety manipulation, where we found lower HRV along with higher state anxiety scores on the STAI state scale, a measure omitted from the present study (Sporn et al., 2020). Our additional analysis of the frequency content of the IBI time series further links our lower HRV result (as a proxy of state anxiety) to research showing reduced HF-HRV (0.15-0.40 Hz) in anxiety conditions, a physiological expression of inflexible autonomic activity (Aikins and Craske, 2010; Friedman, 2007; Fuller, 1992; Klein et al., 1995; Miu et al., 2009; Mujica-Parodi et al., 2009; Pittig et al., 2013; Thayer et al., 1996).

Model-free Analysis
The percentage of errors made by each participant across 400 trials was used as a measure to assess whether anxiety impairs reward learning task performance. A non-parametric factorial test (synchronised rearrangements) revealed a significant main effect of factor Group on error rates (p = 0.01), whereas factor Block showed only a trend (p = 0.056). There was no significant interaction effect (p = 0.66). A planned between-group comparison on the Group factor alone using pair-wise permutation tests revealed a significantly higher total average error rate in the StA group (mean 38.0, SEM 0.97) than in the Cont group (mean 35.6, SEM 0.66, p = 0.001, Δ = 0.70, CI = [0.58, 0.82], see Fig. 3 B).
Turning to the mean reaction times (RT, in milliseconds; averaged across all trials), we observed a significant main effect of task Block on RTs (p = 0.02). No significant main effect of Group or interaction effect was found (p = 0.64 and p = 0.26, respectively), in line with previous work on anxiety (Bishop, 2009). Post-hoc analyses on the Block effect in each group revealed a significant decrease in the mean RT from TB1 (mean 658.3, SEM 32.53) to TB2 (mean 552.8, SEM 20.68) in the StA group (P FDR < 0.05, Δ = 0.73, CI = [0.65, 0.81]). This effect was also significant for Cont, with mean RT dropping from TB1 (mean 657.7, SEM 35.65) to TB2 (mean 591.4, SEM 31.78, P FDR < 0.05, Δ = 0.65, CI = [0.56, 0.72]). As a separate analysis, and given the lack of between-group differences in RTs, we contrasted the total-population (StA + Cont) mean RT of lose and win trials. The lose minus win RT difference was highly significant, as expected, reflecting slower responses on trials where participants responded incorrectly (p = 0, Δ = 1, CI = [1, 1]; mean RT for lose trials 639.8 ms [SEM 0.36 ms], and for win trials 600.8 ms [SEM 0.33 ms], Fig. 3 C).
In a final post-hoc analysis, we assessed RT separately in blocks of unpredictable cues (50-50 contingency phase) and highly predictable cues (90-10 contingency phase). The rationale for this analysis was that previous work using classical attention paradigms revealed that lack of attention leads to longer RTs both on trials with predictive cues (as in our 90-10 contingency phase) and on trials with uninformative cues (as in our 50-50 contingency phase; see for instance Prinzmetal et al., 2009). We thus aimed to assess whether state anxiety influenced attention using classical behavioural measurements, examining whether the strength of the contingency bias shapes RT differently in each group. The results demonstrate no significant difference in RT between the two groups for either unpredictable or predictable contingency phases (p > 0.05; see Supplementary Materials).

Bayesian Model Selection
After fitting the five models in each of the 42 participants and obtaining log-model evidence (LME) values for each, we compared the five models using Bayesian model selection (BMS). Results from BMS revealed that the family of Bayesian models (HGF$_3$, HGF$_2$ and HGF$_{\mu_3}$) had much stronger evidence than the reinforcement learning models (RW, SK1), with an exceedance probability of 0.99 and an expected frequency of 0.74 (leftmost columns: Fig. 4 A). Next, within the Bayesian models, an additional BMS step using the LME for each subject and model demonstrated much stronger evidence for the HGF$_3$ model relative to the HGF$_2$ and HGF$_{\mu_3}$ model versions, with an exceedance probability of 0.98 and an expected frequency of 0.61 (rightmost columns: Fig. 4 A). The HGF$_3$ model was also the winning model when BMS was performed separately in the StA and Control groups.
Although a previous study found the HGF$_{\mu_3}$ model to outperform the 3-level HGF (Diaconescu et al., 2014), here we found the latter to provide more model evidence. One possible explanation is that the HGF$_{\mu_3}$ might be particularly useful in paradigms where (at least some) participants are exposed to alternating periods of low and high volatility (Diaconescu et al., 2014). For a fixed value of true volatility, as in our study (constant rate of change of contingency blocks), the standard 3-level HGF, with a decision noise parameter that is not modulated by the dynamics of $\mu_3$, performed best.
To further determine the quality of the fit of the winning HGF$_3$ model, we simulated responses using the estimated model parameters for each individual ($\omega_2$, $\omega_3$, $\zeta$). Similarly to Aylward et al. (2019), we computed the probability of a response switch (from orange to blue or vice versa) across trials in each individual, separately for simulated and empirical responses. We found a high and significant non-parametric Spearman rank correlation between both variables across participants (N = 42): $\rho = 0.8679$, $p = 4 \times 10^{-14}$ (Supplementary Fig. 5). A similar outcome was obtained when assessing correlations within each group, suggesting that the winning model captured the dynamics in the data well.
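This posterior predictive check can be sketched as follows: compute each participant's switch probability for empirical and simulated choices, then correlate the two across participants with Spearman's rank correlation, computed here as the Pearson correlation of average ranks. The function names are hypothetical, and the p-value computation is omitted.

```python
import numpy as np


def switch_probability(choices):
    """Proportion of trials on which the choice differs from the previous trial."""
    c = np.asarray(choices)
    return float(np.mean(c[1:] != c[:-1]))


def average_ranks(x):
    """Ranks starting at 1; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

Given hypothetical per-participant choice sequences `empirical` and `simulated`, the check would be `spearman_rho([switch_probability(c) for c in empirical], [switch_probability(c) for c in simulated])`.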

State anxiety is associated with a lower learning rate about stimulus outcomes
We observed significant differences between the groups in the perceptual model parameter $\omega_2$, with smaller values in StA (mean -3.1, SEM 0.23) than in the Cont group (mean -2.0, SEM 0.15, p = 0.002, effect size: Δ = 0.75, CI = [0.55, 0.90]). The decision noise parameter, $\zeta$, did not differ between groups (p = 0.62) and was moderately low in both groups: Cont (mean 1.98 [0.26]) and StA (2.20 [0.41]).
The values of $\omega_2$ influence, among other HGF trajectories, the learning rate at the lowest level (through modulation of $\sigma_2$), which drives the step size of the update about stimulus outcomes. More negative $\omega_2$ values, as found in StA, lead to smaller updates and thus to smaller learning rates (see Fig. 4 B). To illustrate the impact that this has on the evolution of beliefs about reward contingencies and environmental volatility in our task, we additionally provide simulations of the belief trajectories for both $\mu_2$ ($\sigma_2$) and the log-volatility $\mu_3$ ($\sigma_3$) (Supplementary Figures 6-7). The results demonstrate that decreasing $\omega_2$ reduces the estimation uncertainty about the reward tendency, $\sigma_2$, with smaller update steps on $\mu_2$ (Supplementary Figure 6). On higher-level beliefs, decreasing $\omega_2$ reduces the update steps for the expectation of log-volatility, $\mu_3$, and increases the uncertainty $\sigma_3$ (Supplementary Figure 7). In the following, we explore whether the two experimental groups did indeed differ in the uncertainty of their beliefs as a consequence of the observed change in $\omega_2$.
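The link between $\omega_2$ and the size of level-2 updates can be illustrated with a reduced version of the level-2 recursion. This NumPy sketch holds $\mu_3$ fixed at its starting value of 1 (a simplification: in the full model $\mu_3$ is updated as well), uses $\kappa = 1$ and the starting values given in the Methods, and compares $\omega_2 = -2.0$ and $-3.1$, the Cont and StA group means reported above. The returned quantity $\sigma_2$ is the precision weight multiplying the outcome prediction error in the $\mu_2$ update, not the lowest-level learning rate plotted in Fig. 4 B; the function name is hypothetical.

```python
import numpy as np


def level2_weights(u, omega2, mu3=1.0, kappa=1.0, mu2_0=0.0, sa2_0=0.1):
    """Trial-wise sigma_2 (weight on delta_1 in the mu_2 update) with mu_3 held fixed."""
    mu2, sa2 = mu2_0, sa2_0
    weights = np.empty(len(u))
    for k, u_k in enumerate(u):
        muhat1 = 1.0 / (1.0 + np.exp(-mu2))              # predicted P(outcome = 1)
        sahat2 = sa2 + np.exp(kappa * mu3 + omega2)      # prior variance on x2
        sa2 = 1.0 / (1.0 / sahat2 + muhat1 * (1.0 - muhat1))
        mu2 = mu2 + sa2 * (u_k - muhat1)                 # precision-weighted update
        weights[k] = sa2
    return weights


rng = np.random.default_rng(0)
u = rng.binomial(1, 0.9, size=100)     # toy sequence with a 90-10 contingency
mean_weight = {omega2: level2_weights(u, omega2).mean() for omega2 in (-2.0, -3.1)}
```

In this reduced setting the mean weight for $\omega_2 = -3.1$ comes out below that for $-2.0$, mirroring the smaller updates and lower informational uncertainty reported for the StA group.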

Informational uncertainty about the outcome tendency is lower in state anxious individuals
As indicated in the previous section and illustrated in Supplementary Figure 6, the informational (belief) uncertainty about the outcome tendency, σ2, is reduced for smaller ω2 values, while it also depends on the volatility estimate μ3 from the previous trial and other quantities (see Eqs. 11 and 13 in Mathys et al., 2014). Here we found a significant main effect of factor Group on σ2 (p = 0.008). There were no significant effects of block and no interaction effect (p = 0.58, p = 0.78). In addition, planned comparisons showed that anxiety significantly lowered the total average σ2 for StA in comparison to Cont, as expected from the lower ω2 values in StA (Fig. 4 C).
Fig. 4. A) The two leftmost panels represent the model frequency and exceedance probability for the family of models 'HGF Fam' (HGF2, HGF3 and HGF3: dark blue) and the family of reinforcement learning models 'RL Fam' (RW, SK1: blue). The family of HGF models provided the best model evidence. The two right panels show the comparison between the three HGF models (HGF2: blue, HGF3: dark blue and HGF3: light blue). The HGF3 provided stronger model evidence. B) Illustration of the trial-by-trial learning rate about stimulus outcomes in two ideal observers with different values of ω2. Trajectories were simulated using the same input sequence and parameters (except ω2): μ2(0) = 0, μ3(0) = 1, log σ2(0) = log(0.1), log σ3(0) = log(1), κ = 1, ω3 = −7. The two values of ω2 used in the simulated trajectories are −2 (orange) and −4 (black). This parameter represents the tonic part of the variance in the Gaussian random walk for x2 and modulates the learning rate about stimulus outcomes at the lowest level. Lower ω2 values lead to smaller trial-by-trial learning increments. When comparing ω2 values between groups (StA, Cont), we found more negative values in StA than in the Cont group (p = 0.002). C) Lower ω2 in state anxiety leads to decreased informational uncertainty about x2 (σ2). There was a significant main effect for factor Group (StA, green; Cont, black; synchronised permutation test: p = 0.008) but not for factor Block (p > 0.05). Planned between-group comparisons indicated that state anxiety significantly decreased the average uncertainty about beliefs on tendency x2 (p = 0.003, after averaging across both blocks; significant effect indicated by black bars at the bottom). D) Lower ω2 in state anxiety leads to decreased environmental uncertainty (p = 0.02; no effect of factor Block, p > 0.05). Thus, StA participants had a lower estimate of environmental uncertainty. E) State anxiety increased uncertainty about volatility in the task environment (σ3). We found a significant main effect for factor Block (p = 0.0006) and Group (StA, green; Cont, black; p = 0.0002), modulating uncertainty about volatility. Planned between-group comparisons further indicated that state anxious participants exhibited significantly higher σ3 than control participants, separately in each task block (TB1, TB2, P_FDR < 0.05, as given by black bars).
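A short simulation illustrates how σ2 depends on ω2. The following is a minimal Python sketch of the level-2 updates of a binary HGF, based on our reading of Mathys et al. (2011, 2014). It is not the authors' analysis code. It makes several simplifying assumptions:

- The volatility estimate is held fixed at μ3 = 1 with κ = 1. In the full model, μ3 is also updated.
- Starting values are μ2(0) = 0 and σ2(0) = 0.1.
- The outcome sequence is hypothetical.

The function `level2_uncertainty` and the reward schedule are illustrative only.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping the level-2 tendency to a reward probability."""
    return 1.0 / (1.0 + np.exp(-x))


def level2_uncertainty(u, omega2, mu3=1.0, kappa=1.0, mu2_0=0.0, sigma2_0=0.1):
    """Level-2 belief updates of a binary HGF with the volatility estimate held fixed.

    Simplifying assumption: mu3 is constant, so two runs on the same inputs
    differ only in omega2. Returns the trial-wise posterior variance sigma2
    (informational uncertainty, which also acts as the level-2 learning rate)
    and the level-2 precision-weighted prediction error eps2.
    """
    mu2, sigma2 = mu2_0, sigma2_0
    sigma2_traj, eps2_traj = [], []
    for uk in u:
        sigma2_hat = sigma2 + np.exp(kappa * mu3 + omega2)  # prior variance
        mu1_hat = sigmoid(mu2)                              # predicted P(reward)
        delta1 = uk - mu1_hat                               # outcome prediction error
        sigma2 = 1.0 / (1.0 / sigma2_hat + mu1_hat * (1.0 - mu1_hat))
        eps2 = sigma2 * delta1                              # pwPE on level 2
        mu2 = mu2 + eps2
        sigma2_traj.append(sigma2)
        eps2_traj.append(eps2)
    return np.array(sigma2_traj), np.array(eps2_traj)


# Hypothetical input: 400 binary outcomes, reward probability switching
# between 0.8 and 0.2 every 40 trials (not the study's actual schedule).
rng = np.random.default_rng(0)
p_reward = np.repeat(np.tile([0.8, 0.2], 5), 40)
u = (rng.random(p_reward.size) < p_reward).astype(float)

sigma2_high, _ = level2_uncertainty(u, omega2=-2.0)
sigma2_low, _ = level2_uncertainty(u, omega2=-4.0)
print(f"mean sigma2 (omega2 = -2): {sigma2_high.mean():.3f}")
print(f"mean sigma2 (omega2 = -4): {sigma2_low.mean():.3f}")
```

The two runs differ only in ω2. For ω2 = −4, the prior variance added on each trial, exp(κμ3 + ω2), is smaller by a factor of exp(−2). As a result, σ2 stays lower across the sequence, and in this sketch σ2 is also the step size applied to each outcome prediction error.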

Environmental uncertainty is underestimated in state anxiety
Environmental uncertainty, induced by changes in the environment, depends on the tonic volatility estimate, ω2, and on the trial-wise volatility estimate from the previous trial, μ3(k−1) (see Eq. 3 above; the coupling constant κ was fixed to one). More volatile environments lead to greater environmental uncertainty. We found that environmental uncertainty was significantly modulated by factor Group (p = 0.02), while there was no significant main effect for factor Block or interaction effect (p = 0.58, p = 0.75). Further pair-wise analyses demonstrated that the StA group underestimated the environmental uncertainty, relative to control participants, when averaging across both experimental blocks (Fig. 4 D; p = 0, Δ = 0.74, CI = [0.54, 0.89]), consistent with their reduction in ω2.
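Assume Eq. 3 takes the standard binary-HGF form, in which environmental uncertainty on trial k is exp(κμ3(k−1) + ω2). Under that assumption, the effect of a lower tonic volatility can be checked numerically. This is a hedged sketch with hypothetical input values. `environmental_uncertainty` is an illustrative helper, not part of any toolbox.

```python
import numpy as np


def environmental_uncertainty(omega2, mu3_prev, kappa=1.0):
    """Environmental uncertainty of the binary HGF, exp(kappa * mu3_prev + omega2).

    omega2: tonic volatility; mu3_prev: volatility estimate from trial k-1;
    kappa: coupling constant (fixed to one in the study).
    """
    return np.exp(kappa * np.asarray(mu3_prev, dtype=float) + omega2)


# Hypothetical values: identical volatility estimates, two tonic volatilities.
mu3_prev = np.array([0.5, 1.0, 1.5])
eu_omega_m2 = environmental_uncertainty(-2.0, mu3_prev)
eu_omega_m4 = environmental_uncertainty(-4.0, mu3_prev)
print(np.round(eu_omega_m2, 4))
print(np.round(eu_omega_m4, 4))
print(np.round(eu_omega_m4 / eu_omega_m2, 4))
```

ω2 enters the exponent additively. Lowering it by two units therefore scales environmental uncertainty by exp(−2) ≈ 0.135, whatever the current volatility estimate. A higher μ3 still raises environmental uncertainty for a given ω2.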

Uncertainty about volatility is higher in state anxious individuals
In contrast to the effect on σ2 reported above, state anxiety increased belief uncertainty on level 3 (σ3; uncertainty about volatility). We found a significant main effect of both Block (p = 0.0006) and Group (p = 0.0002) on this quantity, yet no interaction effect (p = 0.99). Across blocks, the uncertainty about volatility generally decreased. Planned comparisons demonstrated that, in both the first and the second task block, anxiety significantly increased σ3 in the StA group relative to the Cont group (Fig. 7).

Standard Lose versus Win ERP results
Cluster-based random permutation tests demonstrated a significant difference between the effects of the two outcomes (lose, win) on the ERP (N = 42: two significant clusters at level p < 0.025). Losing led to a more negative ERP amplitude than winning between 200 and 350 ms post outcome (negative cluster, p = 0.003). This effect initially had a centro-parietal distribution, which later propagated to broader central, frontal, temporal, and parietal electrode regions, approximately in line with the fERN (Supplementary Fig. 8). In a later time window, between 350 and 860 ms, losing evoked a more positive amplitude than winning (positive cluster, p = 0.0002). During this later latency, the difference originated over fronto-central electrodes and later spread to centro-parietal electrodes, resembling the P300 component (Supplementary Fig. 8). The latency of the significant clusters confirmed that lose relative to win trials elicited a biphasic ERP modulation. This consisted of an earlier negative wave resembling the fERN and a later, very pronounced positive deflection corresponding to the P300.
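The cluster-based permutation logic behind this lose-versus-win comparison can be sketched for a single channel. The code below is a simplified, hypothetical illustration rather than the analysis pipeline used here:

- It clusters only over time, whereas the study clustered over electrodes and time.
- It uses sign flips of participant-wise difference waves as permutations.
- It takes the summed t values within a cluster as the cluster mass.
- It assumes a cluster-forming threshold of |t| > 2.093 (df = 19).

The data are synthetic.

```python
import numpy as np

T_THRESHOLD = 2.093  # assumed cluster-forming |t| for df = 19, alpha = 0.05 two-sided


def paired_t(diff):
    """One-sample t statistic across participants (axis 0) at each time point."""
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))


def find_clusters(t, thr):
    """Contiguous runs of same-signed supra-threshold t values as (start, stop, mass)."""
    clusters = []
    for sign in (1.0, -1.0):
        above = sign * t > thr
        k = 0
        while k < t.size:
            if above[k]:
                start = k
                while k < t.size and above[k]:
                    k += 1
                clusters.append((start, k, float(t[start:k].sum())))
            else:
                k += 1
    return clusters


def cluster_permutation_test(diff, thr=T_THRESHOLD, n_perm=1000, seed=0):
    """Sign-flip permutation test against the maximum absolute cluster mass."""
    rng = np.random.default_rng(seed)
    observed = find_clusters(paired_t(diff), thr)
    null_max = np.zeros(n_perm)
    for i in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        perm_clusters = find_clusters(paired_t(diff * flips), thr)
        null_max[i] = max((abs(m) for _, _, m in perm_clusters), default=0.0)
    return [(start, stop, mass, (np.sum(null_max >= abs(mass)) + 1) / (n_perm + 1))
            for start, stop, mass in observed]


# Hypothetical lose-minus-win difference waves: 20 participants x 100 samples,
# with a negative deflection injected at samples 40-59.
rng = np.random.default_rng(1)
diff = rng.normal(0.0, 1.0, size=(20, 100))
diff[:, 40:60] -= 1.5
results = cluster_permutation_test(diff)
for start, stop, mass, p in results:
    print(f"samples {start}-{stop - 1}: mass = {mass:.1f}, p = {p:.4f}")
```

Each observed cluster is compared with the permutation distribution of the largest absolute cluster mass, which controls the family-wise error across time points. Mirroring the criterion reported above, a cluster could then be declared significant if its p value fell below 0.025.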

Single-trial ERP modulations by precision-weighted PEs
The HGF results confirmed that state anxiety alters the informational uncertainty of beliefs about reward contingencies (level 2) and about volatility (level 3) (Fig. 4), in an opposing pattern of changes (decrease in σ2 and increase in σ3 relative to control participants). We then analysed, in each group separately, the electrophysiological representations of trial-wise pwPEs on level 2. These are a function of the uncertainty estimate, as shown in Eq. 12 (for an illustration of abs(ε2), see Fig. 5 A). The GLM results for the additional outcome regressor are shown in Supplementary Figure 9.
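The single-trial analysis amounts to a mass-univariate GLM. For each participant, the EEG amplitude at every time point is regressed on trial-wise abs(ε2) together with an outcome regressor. The sketch below shows only such a first-level least-squares fit for one channel, under these assumptions:

- Regressors are z-scored.
- The data are synthetic.
- Function names are illustrative.

The second-level statistics with FWE correction across sensors and time are not reproduced.

```python
import numpy as np


def zscore(x):
    """Standardise a regressor to zero mean and unit variance."""
    return (x - x.mean()) / x.std()


def single_trial_glm(eeg, abs_eps2, outcome):
    """Least-squares GLM at every time point: EEG ~ 1 + abs(eps2) + outcome.

    eeg: (trials x time) amplitudes for one channel and one participant.
    Returns betas of shape (3, time): intercept, abs(eps2), outcome.
    """
    X = np.column_stack([np.ones(abs_eps2.size), zscore(abs_eps2), zscore(outcome)])
    betas, *_ = np.linalg.lstsq(X, eeg, rcond=None)
    return betas


# Hypothetical participant: 400 trials x 150 samples, with an abs(eps2)
# effect injected into samples 90-99 only.
rng = np.random.default_rng(2)
abs_eps2 = np.abs(rng.normal(0.0, 0.5, 400))
outcome = rng.integers(0, 2, 400).astype(float)
eeg = rng.normal(0.0, 1.0, size=(400, 150))
eeg[:, 90:100] += 3.0 * abs_eps2[:, None]
betas = single_trial_glm(eeg, abs_eps2, outcome)
print(f"abs(eps2) beta inside window:  {betas[1, 90:100].mean():.2f}")
print(f"abs(eps2) beta outside window: {betas[1, :80].mean():.2f}")
```

Including the outcome regressor lets the abs(ε2) beta capture variance that is not simply explained by whether the trial was won or lost.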

Low-level precision-weighted prediction errors
In the Cont group, abs(ε2) significantly modulated trial-wise EEG responses from 475 ms to 503 ms post stimulus over central and parietal electrodes, with a maximum effect at 496 ms across a left parietal region (P_FWE < 0.0001). An additional, smaller significant cluster was found earlier, between 425 and 464 ms, with a peak at 457 ms (P_FWE = 0.001) across central and frontal electrodes (Fig. 5 B). Details on the cluster effects can be found in Table 2. Precision-weighted PEs about the stimulus tendencies, abs(ε2), did not significantly modulate the ERP responses in the StA group. When directly contrasting the groups, there were no significant differences in the representation of abs(ε2) in EEG activity.

Discussion
We combined computational modelling of behaviour and analysis of electrophysiological responses to examine how state anxiety modulates reward-based learning in a volatile environment. Our key finding is that state anxiety was associated with a reduced estimate of tonic volatility. This resulted in an overall lower learning rate and corresponded to a significant underestimation of environmental and informational uncertainty. At the same time, a reduction of tonic volatility in our paradigm led to a decrease in learning about phasic volatility, a higher-level belief about the current rate of change in the environment. Our modelling results offer a mechanistic explanation for the increase in error rate that we observed in the anxiety group.
Trial-wise estimates of uncertainty (or its inverse, precision) serve to scale the impact of prediction errors (PEs) on belief updates. We found that precision-weighted PEs (pwPEs) about the stimulus-reward contingency explained trial-wise modulations of observed ERP responses in control participants only. We observed these effects mainly around 425-503 ms across left parietal and central electrodes. In state anxiety, there was no significant effect of lower-level pwPEs about reward contingencies on EEG amplitudes. Taken together, the data suggest that temporary anxious states in healthy individuals impair reward-based learning in volatile environments, primarily through changes in uncertainty estimates. These changes might be mediated by a degraded neuronal representation of lower-level pwPEs about reward contingencies. This remains speculative, however, given the lack of significant differences in pwPE representation between the groups (see below for further discussion).

States of anxiety bias computations of uncertainty during reward-based learning
The threat of a public speaking task used in our experiment reduced two physiological measures. First, it reduced heart rate variability (HRV), which is consistent with previous findings on state anxiety (Chalmers et al., 2014; Feldman et al., 2004; Gorman and Sloan, 2000). Second, it reduced high-frequency HRV (HF-HRV: 0.15-0.4 Hz), an index of autonomic inflexibility found across anxious conditions (Friedman, 2007; Miu et al., 2009; Mujica-Parodi et al., 2009; Pittig et al., 2013). The effect of state anxiety on HRV was strongest during the initial induction, but the modulation of computational estimates of uncertainty persisted into the second task block. This suggests that states of anxiety have a diffuse and transient effect on computational estimates of uncertainty during learning in volatile environments. It is important to note, however, that we were not able to validate the HRV block-related effects with similar changes in the self-reported STAI inventory, as the data from this inventory were not acquired at the appropriate times. Our interpretation of the results therefore rests on the assumption that the HRV changes observed here can serve as a proxy for the successful induction of state anxiety in our paradigm. This assumption is further supported by a previous study from our lab (Sporn et al., 2020). In that study, a similar approach in a motor learning task successfully induced changes in state anxiety STAI scores (higher in the anxiety group), paralleled by a lowering of the same HRV proxy of state anxiety as used here.
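HF-HRV is usually obtained from the power spectrum of the RR-interval series. The following is a minimal sketch of one common approach, not the pipeline used in this study (which this section does not describe). It resamples the tachogram at an assumed 4 Hz, removes the mean, and integrates a one-sided periodogram over 0.15-0.4 Hz. The RR series are synthetic, and `hf_hrv_power` and `synthetic_rr` are hypothetical helpers.

```python
import numpy as np


def hf_hrv_power(rr_s, fs=4.0, band=(0.15, 0.4)):
    """Approximate HF-HRV power (s^2) from a series of RR intervals in seconds.

    The tachogram is linearly resampled to an even grid at fs Hz, its mean is
    removed, and a one-sided periodogram is integrated over the band.
    """
    beat_times = np.cumsum(rr_s)
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    rr_even = np.interp(grid, beat_times, rr_s)
    rr_even = rr_even - rr_even.mean()
    n = rr_even.size
    psd = 2.0 * np.abs(np.fft.rfft(rr_even)) ** 2 / (fs * n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(psd[in_band].sum() * (freqs[1] - freqs[0]))


def synthetic_rr(mod_freq, n_beats=400, base=0.8, amp=0.05):
    """Hypothetical RR series whose interval length oscillates at mod_freq Hz."""
    rr, t = [], 0.0
    for _ in range(n_beats):
        interval = base + amp * np.sin(2.0 * np.pi * mod_freq * t)
        rr.append(interval)
        t += interval
    return np.array(rr)


hf_modulated = hf_hrv_power(synthetic_rr(0.25))  # oscillation inside the HF band
lf_modulated = hf_hrv_power(synthetic_rr(0.05))  # slow oscillation outside the band
print(f"HF power, 0.25 Hz modulation: {hf_modulated:.2e} s^2")
print(f"HF power, 0.05 Hz modulation: {lf_modulated:.2e} s^2")
```

Values are in s² (multiply by 10^6 for ms²). Practical analyses typically add artefact correction, detrending, and Welch or autoregressive spectral estimates.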
Our experimental manipulation had an adverse effect on reward-based learning. Because trait anxiety levels were matched across the state anxious and the control group, our results indicate that the changes observed in reward-based learning (lower learning rates due to changes in belief uncertainty) can be linked to temporary anxious states independently of trait levels. Prior work has associated high levels of trait anxiety with difficulties in decision-making tasks (de Visser et al., 2010; Miu et al., 2008) and with difficulties in learning in volatile task environments (Browning et al., 2015; Huang et al., 2017). Our outcomes extend these findings to the realm of state anxiety. We tested only a volatile environment, in which probabilistic contingencies changed regularly (as in Iglesias et al., 2013; de Berker et al., 2016). A still unresolved question is therefore whether the anxiety-related modulations of uncertainty estimates are exclusive to a volatile environment or would also emerge in stable environments. Previous research in trait anxiety showed that learning is affected exclusively during volatile (not stable) experimental phases (Browning et al., 2015; Huang et al., 2017). We therefore predict that during stable blocks state anxiety would not alter belief uncertainty. Moreover, our task did not allow for robust inferences on phasic volatility estimation (as reflected by parameters like the meta-volatility ω3). Follow-up work should extend the current paradigm to an environment with dynamic (as opposed to fixed) volatility. This would allow a systematic assessment of whether state anxiety affects the estimation of phasic volatility on top of the altered tonic volatility estimates observed here.
Fig. 5. Signatures of precision-weighted prediction errors on trial-wise ERPs. A) Trajectories of model-based estimates for both lower-level and higher-level pwPEs for one representative control group participant across 400 trials. In green are higher-level pwPEs concerning volatility; in black are the absolute values of the lower-level pwPEs concerning beliefs about the rewarding stimulus. This second trajectory was chosen as a regressor in our GLM analysis, whereas ε3 was excluded due to near-collinearity of ε3 and abs(ε2): individual correlation values were around 0.9 (see main text). B) Effect of pwPEs on level 2 (abs(ε2)) in controls. In the Cont group, pwPEs about reward outcomes correlated with activation changes across a left parietal and central region between 475 ms and 503 ms. This is shown in the top topographical representation at the time of the maximum peak of the cluster (496 ms post stimulus, P_FWE-corrected = 0.00001, significant after controlling the FWE at 0.05, with a cluster-defining threshold of p < 0.001). An earlier cluster was also found, as shown in the bottom topographical representation, with activation between 425-464 ms (P_FWE-corrected = 0.001) at fronto-central channels. C) The bottom panels show the average EEG response to the 40 highest ("High") and 40 lowest ("Low") pwPE values from each participant at electrodes P5 and FCz, representing the significant GLM clusters obtained in Cont participants (shown in B). The averaged EEG responses are displayed separately for StA High, StA Low, Cont High, and Cont Low. Both participant groups show an increased EEG response during "High" relative to "Low" abs(ε2) trials at both electrode locations between 475 and 503 ms. Shaded bars show 1.96 × SEM.
By using the threat of public speaking instead of a specified aversive outcome, our approach allowed us to investigate behavioural, physiological, and neural responses in anticipation of a future unpredictable threat. Alterations in anticipatory responses to upcoming uncertain threats have been proposed as a common explanation for anxious states in healthy individuals and anxiety disorders alike (Grupe and Nitschke, 2013). Accordingly, our finding that anxiety leads to changes in informational and environmental uncertainty could prove relevant for understanding the alterations in decision-making and learning observed in anxiety disorders (Bishop and Gagne, 2018; Browning et al., 2015; de Visser et al., 2010; Huang et al., 2017; Miu et al., 2008). Our approach is not the first to propose a role for uncertainty estimates in cognitive biases in anxiety. A recent account of affective disorders suggested that difficulties with uncertainty estimation underlie some of the psychiatric symptoms in these populations (Pulcu and Browning, 2019). This work distinguished between different types of uncertainty, corresponding to irreducible, informational, and environmental uncertainty as described here. It assigned particular relevance to environmental ("unexpected") uncertainty in explaining anxiety. In fact, evidence from computational studies converges in linking trait anxiety with difficulties in learning in unstable or volatile environments (Browning et al., 2015; Huang et al., 2017). As shown by Browning et al. (2015), an inability to adapt to changes in a task structure can be measured by comparing a single volatile block to a single stable block. Alternatively, suboptimal learning in anxiety can be captured by focusing on volatile environments alone, in which the probability of reward (or punishment) changes regularly across different blocks (Huang et al., 2017).
Table 2. Test statistics for lower-level precision-weighted prediction errors and trial outcomes. Each significant activation is ordered according to size (leftmost column). We provide both the cluster and peak p values with the family-wise error correction applied. Also given are the relevant statistics (F and peak equivalent Z) for each activation cluster and within each activation.
Here we followed the second approach to investigate reward-based learning in a volatile environment. We investigated the adaptive scaling of learning rates to estimates of environmental uncertainty on a trial-by-trial basis. To do so, we applied a computational model that explicitly incorporates learning about volatility in a hierarchical Bayesian framework. The computational model that best explained our behavioural data was the 3-level HGF, where the third level is a mathematical description of volatility estimates and their variance. Our inferences about phasic volatility estimation, as represented on this third level, are limited because our paradigm did not include marked changes in the level of volatility over time. Accordingly, we were not able to recover perceptual parameters related to phasic volatility estimation. Nevertheless, the model including phasic volatility estimation still explained the observed responses better. This suggests that trial-wise updating of beliefs about the level of volatility may play a role, because participants still need to infer the adequate level of volatility as they perform the task (Iglesias et al., 2013; Weber et al., 2020). Similarly, the three-level HGF outperformed the two-level HGF in a task with comparable structure (and identical priors). This further suggests the validity of the three-level HGF in identifying learning alterations in threatening or stressful environments.
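Model comparisons based on expected model frequencies and exceedance probabilities, such as the one used to select the 3-level HGF, are commonly computed with the random-effects Bayesian model selection scheme of Stephan et al. (2009). The sketch below implements that variational scheme, on the assumption that an equivalent procedure was used here. Its design is as follows:

- It takes a participants × models matrix of log model evidences (synthetic in the example).
- It approximates the digamma function locally to avoid a SciPy dependency.
- It does not implement the family-level comparison.

```python
import numpy as np


def digamma(x):
    """Digamma function via upward recurrence and an asymptotic series (x > 0)."""
    x = np.atleast_1d(np.asarray(x, dtype=float)).copy()
    result = np.zeros_like(x)
    small = x < 6.0
    while np.any(small):
        result[small] -= 1.0 / x[small]   # psi(x) = psi(x + 1) - 1/x
        x[small] += 1.0
        small = x < 6.0
    inv2 = 1.0 / (x * x)
    result += np.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def rfx_bms(log_evidence, n_iter=100, n_samples=20000, seed=0):
    """Random-effects Bayesian model selection (variational scheme, Stephan et al., 2009).

    log_evidence: (participants x models) array of approximate log model evidences.
    Returns expected model frequencies and exceedance probabilities.
    """
    n_models = log_evidence.shape[1]
    alpha0 = np.ones(n_models)            # flat Dirichlet prior over model frequencies
    alpha = alpha0.copy()
    for _ in range(n_iter):
        log_u = log_evidence + digamma(alpha) - digamma(alpha.sum())
        log_u -= log_u.max(axis=1, keepdims=True)  # numerical stability
        g = np.exp(log_u)
        g /= g.sum(axis=1, keepdims=True)          # P(model | participant)
        alpha = alpha0 + g.sum(axis=0)
    freq = alpha / alpha.sum()
    samples = np.random.default_rng(seed).dirichlet(alpha, size=n_samples)
    xp = np.bincount(samples.argmax(axis=1), minlength=n_models) / n_samples
    return freq, xp


# Hypothetical log evidences for 30 participants and 5 models, where the
# model in column 2 is better by about 3 log units on average.
rng = np.random.default_rng(3)
log_ev = rng.normal(-200.0, 1.0, size=(30, 5))
log_ev[:, 2] += 3.0
freq, xp = rfx_bms(log_ev)
print("expected frequencies:", np.round(freq, 3))
print("exceedance probabilities:", np.round(xp, 3))
```

The exceedance probability is estimated by sampling from the posterior Dirichlet distribution. It is the probability that a given model is more frequent in the population than every other model.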
We found that the state anxious participants' estimates of tonic volatility, as captured by the parameter ω2, were significantly lower than in controls. This led to significantly reduced learning rates and lower estimates of informational and environmental uncertainty. Beliefs about the outcome tendency were thus estimated to be more precise during anxiety. As a result, new and potentially revealing information about the true nature of hidden states had a smaller influence on the belief updates on that level. Critically, an overly precise belief about the outcome tendency might be inappropriate given the fluctuations in the true underlying hidden state. Thus, a drop in informational uncertainty during state anxiety might lead to biased learning, which here was further characterised by a lower learning rate about stimulus outcomes. This finding was confirmed in a separate model-free behavioural analysis: state anxious individuals exhibited a higher error rate during task performance relative to control participants. Our study thus provides novel and compelling evidence for abnormal precision (uncertainty) estimates underlying impoverished learning in healthy individuals experiencing temporary states of anxiety. More broadly, the improper weighting of precision could be a general mechanism underlying a range of cognitive biases observed in healthy and psychiatric conditions, such as "hysteria" or autism (Edwards et al., 2012; Lawson et al., 2017).
Theories of aberrant precision estimates are typically formulated within a Bayesian or predictive coding framework (Parr, Rees and Friston, 2018). Precision is formalised as an attentional mechanism that calibrates neural gain to regulate the influence of prior beliefs and sensory outcomes on future expectations (Friston and Kiebel, 2009; Feldman and Friston, 2010; Moran et al., 2013). Our results provide evidence for this computational account of attention through altered uncertainty estimates. However, more "classical" accounts of attention detailing a limited resource capacity do not wholly explain our behavioural data (Lavie, 1995; Lavie et al., 2004). Our results showed that reaction times (RT) were not affected by the anxiety manipulation (in line with Bishop, 2009). This suggests that deficient attentional resources or increased distraction are not the primary driving factor behind the impaired learning performance under state anxiety reported here.
We also found that state anxiety led to a decrease in the precision of beliefs about environmental volatility, and reduced learning about this quantity. Learning about higher-level quantities depends upon the transmission of learning signals (precision-weighted PEs) from lower to higher levels. As our simulations show, a reduction in tonic volatility estimates not only reduces learning about the contingencies governing observed stimuli and outcomes (Supplementary Fig. 6) but also impairs learning about volatility. In particular, it prevents a trial-by-trial modulation of volatility estimates (that is, learning), which would otherwise reduce the uncertainty about this quantity (Supplementary Fig. 7). Therefore, the model indicates that state anxious individuals remained uncertain about the current rate of change in the environment in our task. State anxiety might also change phasic volatility estimation above and beyond this consequence of aberrant tonic volatility estimates. Testing this will require future studies to confront participants with environments in which the rate of change is dynamic across the experiment.
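A full three-level binary HGF qualitatively reproduces the claim that a lower tonic volatility also impairs learning about volatility. The following Python sketch is our own reading of the update equations in Mathys et al. (2011, 2014), not the simulation code behind Supplementary Fig. 7. It assumes the following:

- κ = 1 and ω3 = −7, with ϑ = exp(ω3).
- Starting values μ2(0) = 0, μ3(0) = 1, σ2(0) = 0.1 and σ3(0) = 1.
- A hypothetical reward schedule.

`hgf3_binary` is an illustrative name.

```python
import numpy as np


def hgf3_binary(u, omega2, omega3=-7.0, kappa=1.0,
                mu2=0.0, mu3=1.0, sigma2=0.1, sigma3=1.0):
    """Three-level binary HGF perceptual updates; returns the sigma3 trajectory.

    theta = exp(omega3) is the meta-volatility. Initial values are assumptions.
    """
    theta = np.exp(omega3)
    sigma3_traj = []
    for uk in u:
        # Level 2: prediction and update
        env_unc = np.exp(kappa * mu3 + omega2)   # environmental uncertainty
        sigma2_hat = sigma2 + env_unc
        mu1_hat = 1.0 / (1.0 + np.exp(-mu2))
        sigma2_post = 1.0 / (1.0 / sigma2_hat + mu1_hat * (1.0 - mu1_hat))
        mu2_post = mu2 + sigma2_post * (uk - mu1_hat)
        # Level 3: the volatility prediction error drives the volatility update
        delta2 = (sigma2_post + (mu2_post - mu2) ** 2) / sigma2_hat - 1.0
        w2 = env_unc / sigma2_hat
        pi3 = 1.0 / (sigma3 + theta) + 0.5 * kappa**2 * w2 * (w2 + (2.0 * w2 - 1.0) * delta2)
        if pi3 <= 0.0:
            raise FloatingPointError("non-positive level-3 precision")
        sigma3 = 1.0 / pi3
        mu3 = mu3 + 0.5 * kappa * w2 * delta2 * sigma3
        mu2, sigma2 = mu2_post, sigma2_post
        sigma3_traj.append(sigma3)
    return np.array(sigma3_traj)


rng = np.random.default_rng(0)
p_reward = np.repeat(np.tile([0.8, 0.2], 5), 40)  # hypothetical schedule
u = (rng.random(p_reward.size) < p_reward).astype(float)

sigma3_high = hgf3_binary(u, omega2=-2.0)
sigma3_low = hgf3_binary(u, omega2=-4.0)
print(f"final sigma3 (omega2 = -2): {sigma3_high[-1]:.3f}")
print(f"final sigma3 (omega2 = -4): {sigma3_low[-1]:.3f}")
```

The precision gained on level 3 on each trial is scaled by w2, the share of the level-2 prior variance that stems from environmental uncertainty. A lower ω2 shrinks w2, so level-2 prediction errors carry less information to level 3 and σ3 declines more slowly.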
Changes to the contingencies governing the outcomes in our task were abrupt (see Fig. 1 B). This contrasts with the generative model of the environment assumed by the HGF, in which states evolve as Gaussian random walks and thus change slowly and diffusively over time. The HGF has been successful in explaining and predicting human behaviour in such tasks (e.g., Iglesias et al., 2013; de Berker et al., 2016). However, alternative models have been proposed based on a generative model that expects sudden changes (Moens and Zénon, 2019; Nassar et al., 2010). In practice, both approaches (HGF and change-point models) can successfully deal with both kinds of environments (sudden versus diffuse changes), as a recent comparative analysis found (Marković and Kiebel, 2016). However, this analysis also indicated that Bayesian inference and model comparison methods can accurately disambiguate between data generated by the HGF and data generated by a (reformulation of a) change-detection model. To understand whether participants use one or the other to infer the dynamics of the environment, future work would thus profit from directly comparing recent reformulations of change-point models (Marković and Kiebel, 2016; Moens and Zénon, 2019) with the HGF.
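The contrast between the two classes of generative models can be made concrete by simulating the hidden tendency x2 under each. This hypothetical sketch draws x2 in one of two ways:

- As an HGF-style Gaussian random walk, with x3 held fixed (a simplification).
- As a piecewise-constant process that jumps between logit(0.8) and logit(0.2) every 40 trials.

The block length and probabilities are assumptions for illustration, not the task's actual schedule.

```python
import numpy as np


def gaussian_random_walk(n_trials, omega2, mu3=1.0, kappa=1.0, seed=0):
    """HGF-style generative process: x2 diffuses with step variance
    exp(kappa * x3 + omega2), with x3 held fixed at mu3 (simplifying assumption)."""
    rng = np.random.default_rng(seed)
    step_sd = np.sqrt(np.exp(kappa * mu3 + omega2))
    return np.cumsum(rng.normal(0.0, step_sd, n_trials))


def abrupt_switches(n_trials, block_len=40, p_levels=(0.8, 0.2)):
    """Change-point process: x2 is constant within blocks and jumps between them."""
    p = np.asarray(p_levels, dtype=float)
    levels = np.log(p / (1.0 - p))  # logit of the reward probabilities
    return levels[(np.arange(n_trials) // block_len) % len(levels)]


x2_diffuse = gaussian_random_walk(400, omega2=-4.0)
x2_abrupt = abrupt_switches(400)
# Number of trials on which the hidden tendency changes at all
print(np.count_nonzero(np.diff(x2_diffuse)), np.count_nonzero(np.diff(x2_abrupt)))
```

The random walk changes a little on every trial, whereas the change-point process changes only at block boundaries. A direct model comparison would address which of these assumptions better describes participants' inference.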
Overall, the computational results confirm our hypothesis that state anxious individuals choose their responses based on a biased representation of uncertainty over the current belief states, at least when dealing with volatile environments as assessed here. Overly precise beliefs may represent a strategy to regain a sense of control, because uncertainty is experienced as aversive (Carleton, 2016). Such a strategy has been observed in obsessive-compulsive disorder (Carleton, 2016) and ritualistic behaviour (Lang et al., 2015). In turn, these biased estimates could increase the symptoms of anxiety over time through inaccurate recursive assessments of threat from uncertainty. This would fit a profile of anxious responses similar to those of anxiety-related disorders (Grupe and Nitschke, 2013; Pulcu and Browning, 2019).

Precision-weighted prediction errors modulate trial-by-trial ERP responses
The modulation of trial-by-trial ERP responses by lower-level pwPEs in the control group aligns with previous studies combining EEG analyses with the HGF. These studies revealed that low-level pwPEs are reflected in trial-wise ERP responses during learning and perception in unstable environments (Stefanics et al., 2018; Weber et al., 2020). Some studies also found higher-level pwPEs modulating brain responses. They supported the notion that different hierarchically related pwPEs (or related HGF quantities) are represented across different brain regions specific to the task demands (Diaconescu et al., 2017a; Iglesias et al., 2013; Weber et al., 2020).
Here, however, we excluded higher-level pwPEs from the GLM analysis due to near-collinearity of the ε3 and abs(ε2) regressors. The absence of a significant modulation of EEG responses by lower-level pwPEs in the StA group is consistent with our finding of reduced learning rates in this group. However, EEG responses to pwPEs were not significantly different when directly contrasting the groups, which prevents us from drawing strong conclusions about differential pwPE representations during state anxiety. We also visualised ERP modulations to high and low pwPEs at the peak electrodes of the within-group abs(ε2) effects found in the control group. This complementary visualisation suggested a similar profile of ERP amplitude changes in both groups. Thus, the specific neural mechanism explaining the biased uncertainty estimates about reward contingencies (which are related to lower-level pwPEs) observed in state anxious participants remains elusive.
More generally, the evidence for neural representations of pwPEs in the control group aligns with current predictive coding proposals. These view the brain as a Bayesian observer that estimates beliefs about hidden states in the environment by implementing a hierarchical generative model of the incoming sensory data (de Lange et al., 2018; Doya et al., 2007; Friston, 2010; Rao and Ballard, 1999). In this framework, superficial pyramidal cells encode PEs weighted by precision, and these are also the signals thought to dominate the EEG (Friston and Kiebel, 2009). This motivated us to assess the representation of pwPEs in brain responses, an approach followed by several previous fMRI and EEG studies (Diaconescu et al., 2017a; Iglesias et al., 2013; Stefanics et al., 2018; Weber et al., 2020).
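The complementary High/Low visualisation (Fig. 5 C) can be sketched in two steps. First, for each participant, average the single-trial EEG over the 40 trials with the largest and the 40 with the smallest abs(ε2). Second, compute a group mean with a 1.96 × SEM band. The data, array shapes, and helper names are hypothetical, and the sketch covers one electrode only.

```python
import numpy as np


def high_low_average(eeg, abs_eps2, n=40):
    """Mean single-trial EEG over the n largest and n smallest abs(eps2) trials."""
    order = np.argsort(abs_eps2)
    return eeg[order[-n:]].mean(axis=0), eeg[order[:n]].mean(axis=0)


def group_mean_and_band(per_participant):
    """Group mean and half-width of a 1.96 * SEM band across participants (axis 0)."""
    mean = per_participant.mean(axis=0)
    sem = per_participant.std(axis=0, ddof=1) / np.sqrt(per_participant.shape[0])
    return mean, 1.96 * sem


# Hypothetical group: 20 participants, 400 trials x 150 samples at one electrode,
# with an abs(eps2)-proportional deflection injected into samples 90-99.
rng = np.random.default_rng(4)
highs, lows = [], []
for _ in range(20):
    abs_eps2 = np.abs(rng.normal(0.0, 0.5, 400))
    eeg = rng.normal(0.0, 1.0, size=(400, 150))
    eeg[:, 90:100] += 3.0 * abs_eps2[:, None]
    high, low = high_low_average(eeg, abs_eps2)
    highs.append(high)
    lows.append(low)

high_mean, high_band = group_mean_and_band(np.array(highs))
low_mean, low_band = group_mean_and_band(np.array(lows))
print(f"High - Low in window: {(high_mean - low_mean)[90:100].mean():.2f}")
```

This descriptive split does not test for group differences. It only shows whether the amplitude ordering of "High" over "Low" trials is present within each group.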
Other model-based studies of trial-wise ERP responses like the P300 assessed alternative Bayesian inference quantities, such as precision or Bayesian surprise (Kolossa et al., 2015; Mars et al., 2008; Ostwald et al., 2012). The centrally distributed P3a component around 340 ms was identified as an index of belief updating, whereas the later P3b waveform with a posterior topography was found to represent Bayesian surprise (Kolossa et al., 2015). These computational approaches to the P300 are not directly comparable to our pwPE results, but they share a similar timeline and topography: the centro-parietal cluster in the Cont group overlaps with the location of the P3a and P3b waves shown in Kolossa et al. (2015). The ERP modulation by low-level pwPEs in our study might thus partially explain the P300 amplitude changes obtained in the standard lose-minus-win ERP analysis conducted here. That analysis itself showed the expected central-to-posterior topographic gradient of the P300 component reported in classical model-free ERP studies (Hajcak et al., 2005, 2007; Polich, 2007; Wu and Zhou, 2009). Collectively, these results suggest a target for future studies assessing the effect of subclinical (trait, state) anxiety on the neural representation of computational quantities related to prediction updates. Such studies could specifically target the topography and latency of the trial-wise P300. A state anxiety manipulation using the widely used threat-of-shock method (Grillon et al., 2019) could potentially induce more consistent neural responses in StA participants. This could make it possible to discriminate the neural bases of pwPEs in this group from those in control participants.
It is important to note that interpretations concerning neuroanatomical regions are limited in our EEG study, as it provided exclusively sensor-level results. The anterior cingulate cortex (ACC) has been shown to contribute to encoding lower-level pwPEs in a task with a similar structure (Iglesias et al., 2013). Intriguingly, state anxiety has been shown to deactivate the ventrolateral prefrontal cortex (PFC) and rostral ACC during cognitive control tasks that crucially depend on these areas (Bishop et al., 2004). Attention bias for threat in anxiety is also associated with alterations in ACC/PFC, specifically in the connectivity between the dorsal ACC/dorsomedial PFC and the amygdala (Grillon et al., 2019). Future combined fMRI-EEG studies could therefore test whether state anxiety disengages the ACC and PFC during reward-based learning. Such disengagement would undermine their contribution to tracking pwPEs about stimulus outcome tendencies, and also about volatility.
Of particular interest, decreased dorsolateral PFC activity also characterises elevated trait anxiety levels, with detrimental consequences for performance and attentional control (Bishop, 2009). Moreover, portions of the cingulate cortex and PFC are part of the central network underlying anxiety disturbances (Grupe and Nitschke, 2013). Thus, an additional question for future studies would be to assess the role that these brain regions play in the modulation of hierarchically related pwPEs that may lead to the computational biases described in trait anxiety (Browning et al., 2015; Huang et al., 2017).