State anxiety biases estimates of uncertainty during reward learning in volatile environments

Previous research established that clinical anxiety impairs decision making and that high trait anxiety interferes with learning rates. Less understood are the effects of temporary anxious states on learning and decision making in healthy populations. Here we follow proposals that anxious states in healthy individuals elicit a pattern of aberrant behavioural, neural, and physiological responses comparable with those found in anxiety disorders, particularly when processing uncertainty in unstable environments. In our study, both a state anxious and a control group learned probabilistic stimulus-outcome mappings in a volatile task environment while we recorded their electrophysiological (EEG) signals. By using a hierarchical Bayesian model, we assessed the effect of state anxiety on Bayesian belief updating with a focus on uncertainty estimates. State anxiety was associated with an underestimation of environmental and informational uncertainty, and an increase in uncertainty about volatility estimates. Anxious individuals deemed their beliefs about reward contingencies to be more precise and to require less updating, ultimately leading to impaired reward-based learning. We interpret this pattern as evidence that state anxious individuals are less tolerant to informational uncertainty about the contingencies governing their environment and more uncertain about the level of stability of the world itself. Further, we tracked the neural representation of belief update signals in the trial-by-trial EEG amplitudes. In control participants, both lower-level precision-weighted prediction errors (pwPEs) about the reward outcomes and higher-level volatility-pwPEs were represented in the ERP signals with an anterior distribution. A different pattern emerged under state anxiety, where a neural representation of pwPEs was only found for updates about volatility. 
Expanding previous computational work on trait anxiety, our findings establish that temporary anxious states in healthy individuals impair reward-based learning in volatile environments, primarily through changes in uncertainty estimates and potentially a degradation of the neuronal representation of hierarchically-related pwPEs, considered to play a central role in current Bayesian accounts of perceptual inference and learning.


Introduction
Anxiety is characterised by excessive worry about negative possibilities (Grupe and Nitschke, 2013). It can lead to distinct difficulties when making decisions and learning about the world, as anxious individuals experience negative reactions to uncertainty, known as intolerance of uncertainty (IU; Bishop, 2007;Carleton, 2016). Recent work has established that individuals high in trait anxiety have difficulties adapting their learning rate to changes in probabilistic task environments (Browning et al., 2015;Huang et al., 2017). Less understood is how temporary states of anxiety in healthy subjects interfere with optimal learning and belief updating in the brain. Identifying the computations that subserve learning under state anxiety is important due to the prevalence of highly anxious states in most real-world environments that are filled with uncertainty (Bach et al., 2011;Bishop and Gagne, 2018). In addition, these insights could expand our understanding of the mechanisms by which anxiety biases beliefs about the world, linking to anxiety-related disorders.
Previous computational work identified three types of uncertainty during decision-making and learning: irreducible uncertainty, informational (estimation) uncertainty, and environmental uncertainty (Bland and Schaefer, 2012; de Berker et al., 2016; Yu and Dayan, 2005). Irreducible uncertainty emerges from the probabilistic relationships between responses and their outcomes, which is an inherent property of most real-world interactions. Estimation uncertainty arises from imperfect information about those response-outcome relationships. Lastly, environmental uncertainty induced by volatility represents the possibility of change in that probabilistic environment. The latter affects learning because the individual becomes uncertain about her estimates when the world is changing. To reduce uncertainty, the brain is thought to appraise the inherent statistical structure of the world using probability distributions, continuously updating and inverting a hierarchical model of the sensory inputs (de Lange et al., 2018; Doya et al., 2007; Friston, 2010; Rao and Ballard, 1999). In this context, each type of uncertainty is expressed by the width (variance, or its inverse, precision) of the probability distribution of the corresponding belief (Feldman and Friston, 2010; Mathys et al., 2011).
Examinations of belief, uncertainty, and precision estimates using Bayesian formulations in perceptual and learning tasks are increasingly used to provide mechanistic explanations for an array of neuropsychiatric conditions. Specifically, difficulties estimating precision have been suggested to explain various clinical expressions, from movement difficulties in Parkinson's disease to features of schizophrenia and autism (Friston et al., 2016; Lawson et al., 2017, 2014; Parr et al., 2018). In the case of anxiety, altered beliefs are also theorised to play a vital role (Paulus and Stein, 2010; Paulus and Yu, 2012). As anxiety relates to worry over uncertainty, volatile task environments have been used to understand how trait anxiety affects learning, providing a mechanistic account of anxiety-related disorders (Browning et al., 2015; Huang et al., 2017). Healthy individuals are known to adapt their learning rate to volatility, with changing environments promoting a higher learning rate as new information needs to be integrated to better predict the future (Behrens et al., 2007). By contrast, high-trait anxious individuals show reduced adaptability of their learning rate to volatile environments, both in aversive (Browning et al., 2015) and reward settings (Huang et al., 2017). Moreover, they show poorer performance in decision-making tasks (de Visser et al., 2010; Miu et al., 2008).
Expanding on those findings, here we evaluated whether temporary anxious states in healthy individuals influence reward learning in a volatile environment through changes in informational and environmental uncertainty. Evidence for a link between anxiety and inaccurate estimation of uncertainty would lend support to recent theoretical accounts suggesting that difficulties learning from incomplete information and misestimations of uncertainty are crucial to understanding affective disorders (Pulcu and Browning, 2019).
Probabilistic inference has been proposed to be achieved through the sequential use of Bayes' rule: by dynamically combining our predictions (prior beliefs) with new evidence (sensory data), and weighting each resultant prediction error (PE) according to its precision (Feldman and Friston, 2010; Friston and Kiebel, 2009; Kok and de Lange, 2015). This predictive coding scheme relies on the hierarchical flow of information between cortical regions (Bastos et al., 2012; Iglesias et al., 2013; Rao and Ballard, 1999). Predictions are transmitted down the cortical hierarchy (backward) to meet incoming ascending (forward) sensory PEs thought to arise in superficial pyramidal cells of supragranular layers (Friston and Kiebel, 2009). Beliefs are then updated by reducing PE signals across each level of the cortical hierarchy, weighted according to their estimated precision (Kok and de Lange, 2015). Importantly, developments in Bayesian computational modelling allow us to estimate inter-individual differences in the trial-wise computations and expression of these precision-weighted PEs (Mathys et al., 2014, 2011). Monkey single-cell recording and human functional magnetic resonance imaging (fMRI) studies have shown that PEs elicited by reward are encoded by phasic responses in midbrain dopamine neurons, and these signals are conveyed to the medial frontal cortex (MFC; Chew et al., 2019; Matsumoto et al., 2007; Morris et al., 2006; Zarr and Brown, 2016). Using electroencephalography (EEG), these reward learning signals can be detected in the error-related negativity (ERN), an event-related potential (ERP) triggered by overt errors around 100 ms, and in the feedback ERN (fERN) that follows negative feedback around 250 ms (Holroyd et al., 2003; Montague et al., 2004; Nieuwenhuis et al., 2004; Yeung et al., 2005).
Both components have been shown to originate in the posterior medial frontal cortex (pMFC, including the anterior cingulate cortex, ACC; Holroyd et al., 2003; Montague et al., 2004; Yeung et al., 2005). Relevant to our study, the fERN has been proposed to index the magnitude of prediction violation (surprise), thus reflecting a reward PE signal that can be estimated, for instance, by using reinforcement learning models (Gehring and Willoughby, 2004; Holroyd and Coles, 2008; Holroyd and Krigolson, 2007). Also, the P300 (peaking between 250 and 500 ms) of parietal topography may be sensitive to reward PEs, valence, and surprise (Hajcak et al., 2007, 2005; Polich, 2007; Wu and Zhou, 2009).
More compelling evidence linking PEs, Bayesian surprise, and belief updating to changes in ERP responses comes from studies combining computational modelling and analysis of trialwise EEG responses (Diaconescu et al., 2017a;Jepma et al., 2016;Kolossa et al., 2015;Mars et al., 2008;Stefanics et al., 2018;Weber et al., 2019). For instance, recent EEG studies on the MMN were able to spatiotemporally dissociate lower-level precision-weighted PE (pwPE) signals, which drive updates in belief estimates (Stefanics et al., 2018), and higher-level pwPEs, driving volatility updates (Weber et al., 2019). In addition, model-based single-trial analyses of the P300 identified the earlier P3a waveform of anterior distribution as an index of belief updating, whereas Bayesian surprise was represented in the later posterior P3b component (Kolossa et al., 2015). Here we were interested in assessing the neural representation of pwPEs across different levels, including lower-level pwPEs used to update reward tendency estimates, and higher-level pwPEs used to update volatility estimates, as belief updates on these two levels differentially depend on informational and environmental uncertainty. Accordingly, we evaluated the effect of these two hierarchically-related pwPEs on brain activity by analysing trial-wise ERP responses across frontal, central, and parietal brain regions, and within a broad temporal range from 200 to 500 ms, encompassing the fERN and P300 components.
To address our questions, we examined cortical dynamics in a control and a state anxious group using EEG recordings during a reward-based learning task. Further, to link the anxiety-induced neural changes to potential alterations in uncertainty estimation, we used a generative Bayesian inference model of perception and learning, the Hierarchical Gaussian Filter (HGF; Mathys et al., 2014, 2011). The HGF estimates individual trajectories of trial-wise belief updates governed by hierarchically related PEs based on the responses of participants. To reveal the effect of hierarchical PEs and precision weights on evoked brain responses, we used the relevant hierarchical computational quantities (pwPEs) as regressors in a general linear model (GLM) of trial-wise EEG amplitudes, as done in previous studies (Diaconescu et al., 2017a, 2014; Weber et al., 2019).

Participants
Forty-two healthy individuals (age 18-35, 28 females, mean age 27, standard error of the mean [SEM] 0.9) participated in this reward-based learning study following written informed consent. This experiment was approved by Goldsmiths University of London's ethics review committee.
Our sample size was informed by previous computational work on anxiety (Browning et al., 2015). All participants were healthy volunteers, with no past neurological or psychiatric disorders.
All participants were screened using Spielberger's Trait Anxiety Inventory (STAI; Spielberger, 1983) which has reliably demonstrated internal consistency and convergent and discriminant validity (Barnes et al., 2002;Spielberger, 1983;Spielberger et al., 1970). Scores on this trait inventory range from low (20) to high anxiety (80). Participants were measured for their trait anxiety level (mean 46, SEM 1.5) and then split into two groups using the median value (43).
This created a high and a low trait anxiety group, from which we then randomly drew participants to form the experimental and control groups. Importantly, trait anxiety levels did not exceed the clinical level (> 70; this cutoff score represents the mean plus 2 SD for adults; Spielberger, 1983; Taylor et al., 2005).

Experimental Design
Using a between-subject experimental design with state anxiety as the between-subject factor, we allocated participants pseudo-randomly (after screening trait anxiety scores) to the experimental (state anxiety, StA) or control (Cont) group. They completed our experimental task, which consisted of four blocks: resting state 1 (R1: baseline), reward-learning task block 1 (TB1), reward-learning task block 2 (TB2), and resting state 2 (R2; see Supplementary Figure 1). Both resting state blocks were 5-minute recordings of EEG and electrocardiography (ECG) with eyes open. After R1, participants performed a binary choice decision-making task with contingencies that changed over the course of learning, as in previous work (Behrens et al., 2007; de Berker et al., 2016; Iglesias et al., 2013). In our task, participants completed two blocks of 200 trials each (TB1, TB2), and their goal was to find out which one of two visual icons (always either blue or orange: see Figure 1) would lead to a monetary reward (positive reinforcement, 5 pence). Thus, they had to learn the probability of reward assigned to each stimulus (reciprocal: p, 1-p). Both experimental blocks were divided into 5 segments with different stimulus-outcome contingency mappings that were randomly ordered for each participant and varied in length between 26 and 38 trials. These contingencies ranged from strongly biased (90/10) and moderately biased (70/30) to unbiased (50/50), and were repeated in reverse relationships (10/90; 30/70), so that over the two blocks there were 10 contingency blocks in total.

Figure 1. A) Participants were presented with two visual icons. They were instructed to predict the rewarding stimulus (win = 5p). The stimuli (blue or orange image) were randomly presented to either the left or right of the screen. They remained on the screen until a response was given or the allowed time (2200 ms ±200 ms) expired, which was recorded as a no-response. When either the left or the right arrow key was pressed, participants immediately saw their chosen image highlighted in bright green, which remained on screen for 1200 ms (±200 ms) before the outcome was revealed. The outcome, either win or lose, was shown in the middle of the screen for 1200 ms (±200 ms) in green and red respectively. Each trial ended with a fixation cross and an inter-trial interval of 1250 ms (±250 ms). B) The probability governing the likelihood of the blue stimulus being rewarded (p(win|blue), with reciprocal probability values for the orange stimulus: p(win|orange) = 1 - p(win|blue)). Probability mappings varied in length (26-38 trials), ranging from heavily biased (90/10) and moderately biased (70/30) to unbiased (50/50), and were repeated in reverse relationships (10/90; 30/70). Here we follow one example of contingency changes for p(win|blue) over the course of the experimental blocks (TB1, TB2, 200 trials each). These blocks were divided into the 5 randomly ordered stimulus-outcome mappings and were randomly generated for each participant. While conducting the experimental task, participants' physiological responses, C) EEG and D) ECG, were recorded continuously, with R-peaks from ECG signals being used to calculate heart-rate variability.
On individual trials, participants were asked to predict which of the two visual icons was going to reward them with money. Successful predictions were rewarded 5p, while unsuccessful predictions and no-responses were regarded as losses with 0p reward (Figure 1). The stimuli were either presented to the left or right of centre screen randomly. They remained on the screen until a response was given or the prediction time (2200 ms ±200 ms) expired. When a response of either the left arrow key or right arrow key was pressed, participants immediately saw their chosen image highlighted in bright green, which remained on screen for 1200 ms (±200 ms) before the outcome was revealed. The outcome, either win or lose, was shown in the middle of the screen for 1200 ms (±200 ms) in green and red respectively. Each trial ended with a fixation cross and an inter-trial interval of 1250 ms (± 250 ms).
The participants were given full computerised instructions for each element of the experiment, including questionnaires. Each questionnaire came with written instructions and was responded to using the numerical keyboard buttons. Just before 10 practice trials of the same probabilistic reward-learning task used in the main experiment, participants were explicitly informed that the reward structure would change throughout the task and that they needed to adjust their predictions in response to inferred changes. Importantly, directly after this but before TB1, all participants were informed that this experiment was, in fact, an examination of performance using two tasks, reward-learning and public speaking; participants were instructed according to their group allocation to StA or Cont.

State Anxiety Manipulation
Participants in the StA group were informed that they had been randomly selected to complete a public speaking task after finalising the reward-learning task (Feldman et al., 2004; Lang et al., 2015). They were told they would be required to present a piece of abstract art and would be allowed 3 minutes to prepare a 5-minute presentation of this artwork to a panel of academic experts. Those in the control group (Cont) were informed that they would be given a piece of abstract art and would describe it to themselves (instead of a panel of experts) for the same period of time. After completion of the reward-based learning blocks, participants in the StA group were informed about the sudden unavailability of the panel, and thus were instructed to present the artwork to themselves (similarly to the Cont group).

EEG and ECG Recording and Pre-Processing
EEG and ECG signals were recorded throughout all task blocks (R1, TB1, TB2, and R2) using the BioSemi ActiveTwo system (64 electrodes, extended international 10-20) with a sampling rate of 512 Hz. The EEG signals were referenced to two electrodes affixed to the left and right earlobes. Four additional external electrodes in a bipolar configuration were used, which included two electrodes positioned to capture vertical and horizontal eye-movements (EOG), one to the zygomatic bone of the right eye, and one to the glabella (between both eyes); and two electrodes to record the ECG. ECG electrodes were placed in a two-lead configuration (Moody and Mark, 1982) calibrated to fit the Einthoven triangle (Wilson et al., 1931). All electrodes used highly conductive bacteriostatic Signa gel (by Parker). All events, including presentation of stimuli, participant responses, and trial outcomes were recorded in the EEG file using event markers.
Analysis of the ECG data was conducted in MATLAB (The MathWorks, Inc., MA, USA) using the FieldTrip toolbox and its recommended procedure to detect cardiac artefacts (http://www.fieldtriptoolbox.org/example/use_independent_component_analysis_ica_to_remove_ecg_artifacts). Following this approach, the ECG signal was used to detect the QRS-complex and its main peak, the R wave peak. Next, we extracted the latency of each R-peak, which was used to compute the coefficient of variation (CV = standard deviation / mean) of the intervals between consecutive R-peaks (inter-beat intervals). The CV of inter-beat intervals was calculated within each task block (R1: baseline, TB1, TB2, R2), and was used as a metric of heart rate variability (HRV) for statistical testing.
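As an illustration of the HRV metric described above, the following minimal Python sketch computes the CV of inter-beat intervals from a list of R-peak latencies. It is not the MATLAB/FieldTrip code used in the study: the function name, the example latencies, and the use of the sample standard deviation (ddof=1) are assumptions made for illustration.

```python
import numpy as np

def ibi_coefficient_of_variation(r_peak_times_s):
    """CV (standard deviation / mean) of the inter-beat intervals.

    r_peak_times_s: R-peak latencies in seconds, in ascending order.
    """
    r_peaks = np.asarray(r_peak_times_s, dtype=float)
    ibi = np.diff(r_peaks)  # intervals between consecutive R-peaks
    if ibi.size < 2:
        raise ValueError("need at least three R-peaks to estimate variability")
    # Assumption: sample standard deviation (ddof=1); the text does not specify.
    return float(np.std(ibi, ddof=1) / np.mean(ibi))

# Hypothetical R-peak latencies (seconds) within one task block
example_r_peaks = [0.00, 0.80, 1.62, 2.40, 3.25, 4.03]
cv_example = ibi_coefficient_of_variation(example_r_peaks)
```

In practice the function would be applied separately to the R-peaks detected in each block (R1, TB1, TB2, R2), yielding one HRV value per block and participant.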
EEG data were preprocessed in the EEGLAB toolbox (Delorme and Makeig, 2004) by first high-pass filtering at 0.5 Hz (Hamming-windowed sinc finite impulse response [FIR] filter, 3381 points) and then notch-filtering between 48 and 52 Hz (847 points) to remove power line noise. Afterwards, artefacts (eye blinks, eye movements, cardiac artefacts) were classified using independent components analysis (ICA, runICA algorithm) and removed (on average 2.3, SEM 0.16, components). Noisy channels were corrected utilising spherical interpolation. All signals were then epoched around outcome onsets (win, lose) from -100 to 500 ms. Noisy epochs exceeding ±100 μV were identified and removed using a thresholding technique relative to the prestimulus baseline. The number of rejected trials for each participant did not exceed 10% of the total number.
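The amplitude-based epoch rejection can be sketched as follows. This is a simplified Python illustration rather than the EEGLAB routine used in the study; the array layout, the function name, and the choice to subtract the mean pre-stimulus amplitude per channel before thresholding are assumptions.

```python
import numpy as np

def reject_noisy_epochs(epochs_uv, times_ms, threshold_uv=100.0):
    """Drop epochs whose baseline-corrected amplitude exceeds +/- threshold.

    epochs_uv: array (n_epochs, n_channels, n_samples), in microvolts.
    times_ms:  array (n_samples,), sample times relative to outcome onset.
    Returns the retained baseline-corrected epochs and a boolean keep mask.
    """
    epochs_uv = np.asarray(epochs_uv, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    baseline = times_ms < 0  # pre-stimulus samples
    # Subtract each channel's mean pre-stimulus amplitude (assumed procedure)
    corrected = epochs_uv - epochs_uv[:, :, baseline].mean(axis=2, keepdims=True)
    # Keep an epoch only if no channel/sample exceeds the threshold
    keep = np.abs(corrected).max(axis=(1, 2)) <= threshold_uv
    return corrected[keep], keep
```

The proportion of rejected trials per participant is then simply `1 - keep.mean()`.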
Cleaned EEG and preprocessed behavioural data files are available in the Open Science Framework Data Repository: https://osf.io/b4qkp/. The results shown in Figures 3, 4, and 5 are based on these data.

Measures of State Anxiety
Two markers of state anxiety were used during the experiment. First, we used the CV of the inter-beat intervals to assess HRV, as this measure, similarly to other metrics of HRV, has been reported to show reductions during anxious states (Chalmers et al., 2014;Friedman and Thayer, 1998;Gorman and Sloan, 2000;Kawachi et al., 1995). A lower HRV is associated with complexity reduction in physiological responses to stress and anxiety (Friedman, 2007;Gorman and Sloan, 2000), and is used as a transdiagnostic marker to identify anxiety in psychiatry (Quintana et al., 2016). In addition, we acquired subjective self-reported measures of state anxiety (STAI state scale X1, 20 items: Spielberger, 1983). This score was acquired twice, once before R1 (prior to the anxiety manipulation for the StA group), and once after completing the reward-learning task (just before the scheduled public speaking in the StA group). The latter score was expected to be higher in the StA group relative to the baseline pre-R1 score.

The Hierarchical Gaussian Filter (HGF)
We used the Hierarchical Gaussian Filter (HGF) from Mathys et al. (2014, 2011) to estimate each participant's individual learning characteristics and belief trajectories during our binary reward-learning task. The HGF has been applied to understand learning across diverse settings (Diaconescu et al., 2017b, 2014; Iglesias et al., 2013; Marshall et al., 2016; Stefanics et al., 2018; Weber et al., 2019). It is implemented in the freely available open source software TAPAS (http://www.translationalneuromodeling.org/tapas).
The HGF is a generative model representing an approximately Bayesian observer estimating hidden states in the environment. As such, the HGF is a model of perceptual inference and learning, which can be coupled to a response model. In the generative model, a sequence of hidden states $x_1^{(k)}, x_2^{(k)}, \ldots, x_n^{(k)}$ gives rise to sensory inputs that each participant encounters across k trials. Inference from observations to beliefs is implemented as a hierarchical belief updating process. Notably, while the perceptual model specifies how the sensory inputs are used to estimate the hidden states, the response model generates the most probable response according to those estimates (see Figure 2).

Figure 2. Three level binary Hierarchical Gaussian Filter for binary outcomes. Bottom panel.
Representation of the three levels of the HGF for binary outcomes and the associated belief trajectories across the total 400 trials in a representative participant. At the lowest level, the inputs u correspond to the rewarded outcome of each trial (1 = blue, 0 = orange; shown as black dots). The participant's responses y are shown in light blue dots tracking those trial outcomes. The learning rate (α) about stimulus outcomes at the lowest level is also given in black. The belief on the second level, μ2, represents the participant's estimate of the stimulus tendency x2 and the step size or variance of the Gaussian random walk for x2 depends on parameters κ and ω2, in addition to the estimates of the level above, x3. The belief on the third level, μ3, represents estimates of volatility x3, whose step size is governed by parameter ω3. Top panel. Schematic representation of the 3-level HGF model with relevant parameters modulating each level. All parameters are fitted to individual responses of the participants and describe an individual's learning fingerprint.
Here, we used a 3-level HGF model for binary outcomes (Mathys et al., 2014, 2011). At the lowest level, the hidden state x1 corresponds to the binary categorical variable of the experimental stimuli, which represents whether the blue symbol is rewarding ($x_1^{(k)} = 1$; hence, orange would be non-rewarding) or not rewarding ($x_1^{(k)} = 0$; with orange rewarded) in trial k. The second and third level states, x2 and x3, are continuous variables evolving as coupled Gaussian random walks. Thus, their value at trial k will be normally distributed around their previous value at trial k-1. State x2 describes the true value of the tendency of the stimulus-outcome contingency, whereas μ2 denotes each participant's estimation (mean; σ2 being the variance) of the tendency for the probabilistic outcomes. State x2 can be mapped to the probability of the binary state x1 through a Bernoulli distribution, p(x1 | x2) = Bernoulli(x1; s(x2)), where s(x) is the sigmoid function s(x) = 1/(1 + exp(-x)). The implied learning rate at the lowest level, α, can be defined as the change in expectation, that is, the difference between the sigmoid-transformed μ2 after and before seeing the input, relative to the difference between the observed input u and its prediction s(μ2) (Figure 2, lower panel; TAPAS toolbox: tapas_hgf_binary.m). A larger belief update in response to the same observed mismatch between the input u and the prediction amounts to a higher learning rate α. At the top level, x3 represents the phasic log-volatility within the task environment (change in the probabilistic relationships across the experiment) and μ3 (σ3) the individual's estimate of it. The coupling between levels 2 and 3 is through a positive (exponential) function of x3, which represents the variance or step size of the Gaussian random walk that determines how x2 evolves in time:
$$x_2^{(k)} \sim \mathcal{N}\left(x_2^{(k-1)}, \exp\left(\kappa x_3^{(k)} + \omega_2\right)\right) \tag{1}$$
The parameters κ and ω2 represent the coupling strength and the tonic volatility, respectively.
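To make the generative structure concrete, the sketch below simulates the three coupled levels in Python. It assumes the standard HGF form in which the variance of the x2 random walk is exp(κ·x3 + ω2) and the variance of the x3 random walk is exp(ω3); the function names, starting values, and parameter values are hypothetical, and this is not the TAPAS implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_hgf_binary(n_trials, kappa, omega2, omega3,
                        x2_init=0.0, x3_init=1.0, seed=0):
    """Sample hidden states (x1, x2, x3) from a 3-level binary HGF.

    All starting values and parameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x1 = np.zeros(n_trials, dtype=int)
    x2 = np.zeros(n_trials)
    x3 = np.zeros(n_trials)
    x2_prev, x3_prev = x2_init, x3_init
    for k in range(n_trials):
        # Level 3: log-volatility, random walk with constant variance exp(omega3)
        x3[k] = rng.normal(x3_prev, np.sqrt(np.exp(omega3)))
        # Level 2: tendency, random walk whose variance depends on x3
        x2[k] = rng.normal(x2_prev, np.sqrt(np.exp(kappa * x3[k] + omega2)))
        # Level 1: binary outcome, Bernoulli with probability s(x2)
        x1[k] = rng.binomial(1, sigmoid(x2[k]))
        x2_prev, x3_prev = x2[k], x3[k]
    return x1, x2, x3
```

For example, `simulate_hgf_binary(400, kappa=1.0, omega2=-4.0, omega3=-7.0)` produces one synthetic sequence of 400 trials; larger ω2 or κ values yield faster drifts in the tendency x2.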
In the associated belief updates, momentarily high volatility estimates (μ3) increase the speed with which beliefs at level 2 change. Larger values of the tonic (time-invariant) part of the variance, ω2, generally increase the step size of x2 and lead to faster belief updates on level 2 irrespective of current levels of (estimated) volatility. The step size of the volatility state, x3, is fixed to a constant parameter ω3, with ω3 also being estimated in each individual participant, similarly to κ and ω2:
$$x_3^{(k)} \sim \mathcal{N}\left(x_3^{(k-1)}, \exp(\omega_3)\right) \tag{2}$$
As response model we used the unit-square sigmoid observation model for binary responses (Iglesias et al., 2013; Mathys et al., 2014). This transforms the predicted probability $m^{(k)}$ that the stimulus (e.g. blue) is rewarding on trial k (outcome = 1), which is a function of the current beliefs, into the probabilities $p(y^{(k)} = 1)$ and $p(y^{(k)} = 0)$ that the participant will choose that stimulus (blue, 1) or the alternative (orange, 0):
$$p\left(y^{(k)} \mid m^{(k)}, \zeta\right) = \left(\frac{(m^{(k)})^{\zeta}}{(m^{(k)})^{\zeta} + (1 - m^{(k)})^{\zeta}}\right)^{y^{(k)}} \left(\frac{(1 - m^{(k)})^{\zeta}}{(m^{(k)})^{\zeta} + (1 - m^{(k)})^{\zeta}}\right)^{1 - y^{(k)}} \tag{3}$$
Higher values of the response parameter ζ lead to the participants being more likely to choose the response that corresponds with their current belief about the rewarded stimulus.
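The role of ζ in the unit-square sigmoid can be illustrated with a short Python sketch. The function names are hypothetical and the code is a stand-alone illustration of the mapping from predicted probability to choice probability, not the TAPAS response model itself.

```python
import numpy as np

def unit_square_sigmoid(m, zeta):
    """Probability of choosing the stimulus coded as 1, given prediction m."""
    m = np.asarray(m, dtype=float)
    num = m ** zeta
    return num / (num + (1.0 - m) ** zeta)

def response_log_likelihood(y, m, zeta):
    """Log-likelihood of binary choices y given trial-wise predictions m."""
    y = np.asarray(y, dtype=float)
    p1 = unit_square_sigmoid(m, zeta)
    return float(np.sum(y * np.log(p1) + (1.0 - y) * np.log(1.0 - p1)))
```

With ζ = 1 the choice probability equals the predicted probability (probability matching), whereas larger ζ pushes choices towards the option currently believed to be more rewarding; a prediction of 0.5 always gives a choice probability of 0.5.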
Fitting the combination of perceptual and response model to an individual participant's responses allows for a subject-specific characterisation of learning (and response) style by the set of perceptual (and response) parameters. Here, we only estimated ω2, ω3, and ζ, with κ and the starting values of the beliefs fixed according to Table 1.
Importantly, the update equations of the posterior estimates for level i (i = 2 and 3) depend on the prediction error of the level below, δi-1, scaled proportionally to the ratio of the precision of the prediction of the level below (hat denotes prediction before seeing the input) and the precision of the current level. This is captured in the expression:
$$\Delta\mu_i^{(k)} = \mu_i^{(k)} - \mu_i^{(k-1)} \propto \frac{\hat{\pi}_{i-1}^{(k)}}{\pi_i^{(k)}} \, \delta_{i-1}^{(k)} \tag{4}$$
Precision is defined as the inverse variance of the expectation:
$$\pi_i^{(k)} = \frac{1}{\sigma_i^{(k)}} \tag{5}$$
The variance of the posterior expectation, σi, corresponds to the estimation or informational uncertainty about the hidden state xi. Accordingly, equation 4 above articulates the idea that more uncertain (less precise) belief estimates for the current level should motivate larger changes to beliefs. More detailed update equations for our 3-level HGF model are supplied in Mathys et al. (2011, 2014). The additional measure of uncertainty that we used was environmental uncertainty, which is related to volatility in the environment, according to this expression:
$$\exp\left(\kappa \mu_3^{(k-1)} + \omega_2\right) \tag{6}$$
In sum, in the current study, the computational quantities of interest were the model parameters ω2 (tonic volatility estimate) and ω3 ('meta-volatility'); the trial-wise posterior beliefs about volatility (μ3), which were used to estimate trial-wise environmental uncertainty; and the trial-wise variances on levels 2 and 3 (σ2, σ3) as a measure of (informational) uncertainty about the hidden states on these levels.
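A minimal numerical sketch of these quantities is given below. It treats the proportionality in the belief update as an equality (constant of 1) and takes environmental uncertainty to be exp(κ·μ3 + ω2), evaluated with the volatility estimate from the previous trial; both choices, as well as the function names, are assumptions for illustration and do not reproduce the full level-specific HGF update equations.

```python
import math

def precision_weighted_update(delta_below, pihat_below, sigma_current):
    """Belief change: PE from the level below, weighted by a precision ratio.

    pihat_below:   precision of the prediction at the level below
    sigma_current: posterior variance (informational uncertainty) at this level
    """
    pi_current = 1.0 / sigma_current  # precision is the inverse variance
    return (pihat_below / pi_current) * delta_below

def environmental_uncertainty(kappa, mu3_prev, omega2):
    """Assumed form: exp(kappa * mu3 + omega2), using the previous-trial mu3."""
    return math.exp(kappa * mu3_prev + omega2)

# Same prediction error, different informational uncertainty at the current level
small_update = precision_weighted_update(0.5, pihat_below=1.0, sigma_current=1.0)
large_update = precision_weighted_update(0.5, pihat_below=1.0, sigma_current=2.0)
```

Here `large_update` is twice `small_update`: doubling the variance of the current belief doubles the weight given to the same prediction error, which is the sense in which underestimated informational uncertainty leads to smaller belief updates.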
Because precision-weighted prediction errors play an important role in current Bayesian theories of perceptual inference and learning (Doya et al., 2007;Feldman and Friston, 2010;Friston et al., 2013;Friston and Kiebel, 2009;Moran et al., 2013;Rao and Ballard, 1999), and these are the quantities that are considered to predominantly modulate EEG signals (Friston and Kiebel, 2009), we selected the pwPE trajectories from levels 2 and 3 (labelled ε2, ε3) to examine how these are represented in the brain as a function of state anxiety (see GLM analysis section below).

Model Space
We used four computational models of learning. The first two were a 2-level (excluding volatility) and a 3-level hierarchical Bayesian model (HGF; Mathys et al., 2011). The third model was a Rescorla-Wagner (RW) model, in which PEs drive belief updating but with a fixed learning rate (Rescorla and Wagner, 1972). The final model was a Sutton K1 model (SK1), which permits the learning rate to change with recent prediction errors (Sutton, 1992). These models are also implemented in the TAPAS toolbox. Models were then compared at the group level using random effects Bayesian model selection (BMS; Stephan et al., 2009; code from the freely available MACS toolbox; Soch and Allefeld, 2018). BMS provided model frequencies and exceedance probabilities reflecting how well each model or family of models performed (Soch et al., 2016). First, the log-model evidences (LME) of both Bayesian models were combined to obtain the log-family evidence (LFE), which was compared to the LFE of the family of reinforcement learning models (RW and SK1) to assess which family provided more evidence. Within the winning family, an additional BMS determined the final optimal model.
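For contrast with the HGF, the fixed-learning-rate RW model can be sketched in a few lines of Python. The function name, the starting value of 0.5, and the example inputs are assumptions; this is an illustration of the delta rule rather than the TAPAS implementation.

```python
import numpy as np

def rescorla_wagner(outcomes, alpha, v_init=0.5):
    """Delta-rule learning with a constant learning rate alpha.

    outcomes: binary inputs (1 = blue rewarded, 0 = orange rewarded).
    Returns trial-wise predictions (before each outcome) and prediction errors.
    """
    outcomes = np.asarray(outcomes, dtype=float)
    values = np.empty(outcomes.size)
    pes = np.empty(outcomes.size)
    v = v_init
    for k, u in enumerate(outcomes):
        values[k] = v          # prediction before seeing the outcome
        pes[k] = u - v         # prediction error
        v = v + alpha * pes[k] # update with a fixed learning rate
    return values, pes

# Hypothetical outcome sequence
values, pes = rescorla_wagner([1, 1, 0], alpha=0.5)
```

Unlike the HGF, the weight on each prediction error here does not depend on any uncertainty estimate, which is why this model cannot adapt its learning rate to volatility.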

EEG analysis and the General Linear Model
Prior to single-trial ERP analysis using the general linear model (GLM), a statistical analysis of the differences between ERPs following win versus loss outcomes was conducted independently in both groups (StA, N = 21; Cont, N = 21) using permutation tests with a cluster-based threshold correction to control the family-wise error (FWE) at level 0.05 (dependent samples t-test, 1000 iterations; Maris and Oostenveld, 2007; FieldTrip toolbox). Cluster-based test statistics falling below the 2.5th or above the 97.5th percentile of the permutation distribution were considered significant (two-tailed test, P < 0.025). For this statistical analysis, the ERP data epochs were baseline-corrected by subtracting the mean activation during the baseline period from -200 ms to 0 ms. The aim of this within-group ERP analysis was to assess whether the windows associated with the effect of the outcome (win, lose) on the EEG signals in our task, in each group separately, converge with the windows of the fERN and P300 effects reported in previous studies (see for instance Nieuwenhuis et al., 2004; Hajcak et al., 2005). Note that the windows selected for the GLM analysis were broadly [...] between-subject spatial variability in the channel space. As in fMRI analyses, the scalp × time 3D images were then tested statistically using statistical parametric mapping and the GLM (see next section; Friston, 2004a, 2004b; Kilner and Friston, 2010). This procedure is firmly established in EEG using SPM (Litvak et al., 2011).
Our GLM was composed of trial-wise estimates of two computational quantities: absolute values of pwPEs in level 2 (ε2), and pwPEs in level 3 (ε3). The absolute value of ε2 was selected because its sign is arbitrary: the quantity x2 is related to the tendency of one choice (e.g. blue stimulus) to be rewarding (x1 = 1), yet this choice was arbitrary and thus so is the sign of the pwPE at this level (see for instance Stefanics et al., 2018). In addition, we used the trial-wise outcome values (0 for lose, 1 for win) as a third regressor, as we expected this variable to account for much of the signal variance in the EEG epochs. These three regressors were not orthogonalised. The window for this analysis was selected from 200 to 500 ms, based on previous literature on the fERN (also ERN) and P300 components (Hajcak et al., 2005; Nieuwenhuis et al., 2004).
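The structure of this design can be sketched in a few lines of Python for a single sensor and time point. All data below are simulated and hypothetical. The constant term and the use of ordinary least squares are assumptions of the sketch; the analysis in the study was carried out in SPM over the full scalp x time volume.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 400

# Hypothetical trial-wise regressors (in the study these come from the HGF)
abs_eps2 = np.abs(rng.normal(0.0, 0.5, n_trials))      # |pwPE| on level 2
eps3 = rng.normal(0.0, 0.3, n_trials)                  # pwPE on level 3
outcome = rng.integers(0, 2, n_trials).astype(float)   # 0 = lose, 1 = win

# Design matrix: three regressors plus a constant, not orthogonalised
X = np.column_stack([abs_eps2, eps3, outcome, np.ones(n_trials)])

# Simulated single-trial EEG amplitude at one sensor and time point
true_beta = np.array([1.5, -2.0, 3.0, 0.5])
y = X @ true_beta + rng.normal(0.0, 1.0, n_trials)

# Ordinary least-squares estimate of the regression coefficients
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the regressors are not orthogonalised, each coefficient reflects only the variance uniquely explained by its regressor; shared variance is not assigned to any of them.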
Using these choices for regressors and time interval, we then carried out a whole-volume (spatiotemporal) analysis that searched for representations of our computational quantities in the single-trial EEG responses for each individual participant, before assessing within-group statistical effects at the second level. We corrected for multiple comparisons across the whole time-sensor matrix using Gaussian random field theory with a family-wise error (FWE) correction at the cluster level (p < 0.05). This was performed with a cluster-defining threshold (CDT) of p < 0.001 (Flandin and Friston, 2019). Importantly, all results reported survived whole-volume correction at the peak level (p < 0.05). We assessed separately within each group whether the trajectories of our computational quantities were associated with increases or decreases in EEG amplitudes using an F-test. A standard summary statistics approach was used to perform random effects group analysis within each group (StA, Cont) of 21 participants independently (Penny and Holmes, 2007). We hypothesised between-group differences in the uncertainty estimates, which would differently affect the precision weights on the PEs and thus the ε2 and ε3 regressors. We therefore did not implement a between-group statistical analysis on the GLM-driven EEG representations, as this would result in invalid statistical inference (Kriegeskorte et al., 2009).

Statistics
To assess Group (StA, Cont) and Block (1, 2) main effects and interactions in state anxiety measures, behavioural and computational model variables, we applied non-parametric factorial synchronised permutation tests (Basso et al., 2007). These permutation-based factorial analyses were followed up by planned pair-wise permutation tests to assess our specific hypothesis of between-group differences. This applies to the following dependent variables: ( Pair-wise permutation tests were also used to test within-group differences in RT across blocks. In the case of multiple comparisons (for instance, two between-group permutation tests run separately for each block), we controlled the false discovery rate (FDR) using an adaptive linear step-up procedure set to a level of q = 0.05 (Benjamini et al., 2006). This procedure furnished us with an adapted threshold p-value (PFDR). Prior to these statistical analyses and following BMS, the trial-wise trajectory for each computational quantity of interest (σ2, σ3 or environmental uncertainty, eq. 6) was extracted from the winning model and then averaged across trials within task blocks (TB1, TB2). Despite trial-by-trial changes in these belief trajectories relating to the subject-specific trial structure and contingency blocks, the average values across trials revealed the general monotonic changes in the trajectories within each block. These are the changes we aimed to evaluate as a function of the factors Group and Block using the 2 x 2 factorial analysis described above.
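As an illustration of the block averaging and the planned pair-wise comparisons, the following Python sketch averages a trial-wise trajectory within blocks and runs a two-sided between-group permutation test on the difference in means. The function names, the test statistic, and the number of permutations are assumptions of this sketch. It covers neither the synchronised permutations used for the factorial design nor the adaptive FDR procedure.

```python
import numpy as np

def block_average(trajectory, block_edges):
    """Average a trial-wise trajectory within each task block."""
    trajectory = np.asarray(trajectory, dtype=float)
    return np.array([trajectory[a:b].mean() for a, b in block_edges])

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided between-group permutation test on the difference in means."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        # Randomly reassign group labels and recompute the statistic
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the estimated p-value above zero
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical trajectory of 400 trials split into two blocks of 200
block_means = block_average(np.arange(400.0), [(0, 200), (200, 400)])

# Hypothetical block-averaged values for two groups of 21 participants
group_a = np.arange(21.0)
group_b = np.arange(21.0) + 30.0
diff_ab, p_ab = permutation_test(group_a, group_b)
```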
Below in the Results section, we present the mean and standard error of the mean (SEM) for our dependent variables (either in text or in a figure), alongside non-parametric effect sizes for pair-wise comparisons and corresponding bootstrapped confidence intervals (Grissom and Kim, 2012; Ruscio and Mullen, 2012). In the case of within-group comparisons, the non-parametric effect size was estimated using the probability of superiority for dependent samples (Δdep), whereas for between-group effects we used the probability of superiority (Δ). Both are calculated in line with Grissom and Kim (2012) and expressed as the proportion of values in sample A greater than those in sample B (Δ = P[A>B]). In the case of dependent samples, the comparison is done for matched pairs. Although ties were not taken into account in the original formulation by Grissom and Kim (2012), here, in line with Ruscio and Mullen (2012), we corrected Δ using the number of ties (difference scores = 0) and estimated bootstrapped confidence intervals (CI) for Δ.
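A minimal Python sketch of these effect sizes is given below. It assumes that the tie correction counts each tie as one half, and that the confidence interval is a percentile bootstrap; both choices, as well as the function names, are assumptions of the sketch and may differ in detail from the procedure applied in the study.

```python
import numpy as np

def prob_superiority(a, b):
    """P[A > B] for independent samples, counting ties as one half."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diff = a[:, None] - b[None, :]   # all pairwise differences
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def prob_superiority_dep(a, b):
    """P[A > B] for matched pairs, counting zero differences as one half."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return ((d > 0).sum() + 0.5 * (d == 0).sum()) / d.size

def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the independent-samples estimate."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # Resample each group with replacement and recompute the effect size
    stats = [prob_superiority(rng.choice(a, a.size), rng.choice(b, b.size))
             for _ in range(n_boot)]
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])
```

A value of 0.5 indicates no systematic difference between samples, whereas values approaching 0 or 1 indicate that one sample dominates the other.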

Heart-rate variability
Using a non-parametric 2 x 3 factorial test with synchronised rearrangements, significant main  Figure 3A). These results indicate that the experimental manipulation achieved physiological changes from the heart corresponding to an anxious state (Chalmers et al., 2014;Feldman et al., 2004).

State-trait Inventory
Self-reported state anxiety measures led to a significant main effect of Block (P = 0.03). There anxiety scores between groups was found (P > 0.1).

Model-free Analysis
The percentage of errors made by each participant across 400 trials was used as a measure to assess whether anxiety impairs reward-learning task performance. Using a non-parametric factorial test (synchronised rearrangements), the main effect of factor Group on error rates was significant (P = 0.01), but not the main effect of task Block or interaction effect (P = 0.056, P = Turning to the mean reaction times (RT, in milliseconds), a significant main effect of task Block was observed (P = 0.008), but no significant main effect of Group or interaction effect was found (P = 0.64, P = 0.26), in line with previous work on anxiety (Bishop, 2009; Figure 3C).

Figure 3. State anxiety modulates heart rate variability and behavioural responses. A) Modification in heart-rate variability (HRV) by the anxiety manipulation. The average HRV (measured with the coefficient of variation of the inter-beat-interval of the ECG signal) is provided for the state anxiety (StA) and Control (Cont) groups across task block 1 (TB1), task block 2 (TB2) and final resting state (R2). The average of the resting state (R1: baseline) has been subtracted from each subsequent task block to normalise HRV values. Significant between-group differences are identified by black bars on the x-axis (paired permutation test, PFDR < 0.05 after control of the FDR at level q = 0.05). B) The effect of anxiety on reward-based learning performance: error rates. Here, the average error rates of each group, the state anxiety (StA) and the control group (Cont), are presented using a central point flanked by SEM bars. To the right of each mean and SEM are the individual data points in each group to show group population dispersion. Anxiety significantly increased the error rate in the StA group when compared to Controls (P = 0.001). C) The main effect of outcome (win, black; lose, green) on mean reaction times (RT: P = 0). On the left the average RT of each outcome is presented using a central point with SEM bars. To the right of each mean and SEM are the individual data points of each group to show group population dispersion.
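The HRV measure named in the caption, the coefficient of variation of the inter-beat interval with the resting baseline subtracted, can be sketched in Python as follows. The inter-beat intervals are hypothetical, and the use of the sample standard deviation (ddof = 1) is an assumption of the sketch.

```python
import numpy as np

def hrv_cv(ibi):
    """Coefficient of variation of inter-beat intervals (SD / mean)."""
    ibi = np.asarray(ibi, dtype=float)
    return ibi.std(ddof=1) / ibi.mean()

# Hypothetical inter-beat intervals in seconds for one participant
ibi_rest = np.array([0.82, 0.90, 0.78, 0.88, 0.84, 0.91, 0.80])
ibi_task = np.array([0.80, 0.82, 0.79, 0.81, 0.80, 0.83, 0.81])

# Normalised HRV: task-block value minus the resting baseline (R1)
hrv_task_norm = hrv_cv(ibi_task) - hrv_cv(ibi_rest)
```

A negative normalised value indicates lower HRV during the task block than at rest.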

Bayesian Model Selection
After fitting each model (HGF: 3-Levels, 2-Levels, the Rescorla Wagner [RW], and Sutton K1 [SK1]) individually in each of the 42 participants and obtaining log-model evidence (LME) values for each, we compared the four models using Bayesian model selection (BMS). Results from BMS revealed that the family of Bayesian models (3-levels and 2-levels HGF) had much stronger evidence than the reinforcement-learning models (RW, SK1), with an exceedance probability of 0.99, and an expected frequency of 0.73 (leftmost columns: Figure 4A). Next, within the Bayesian models, an additional BMS step using the LME for each subject and model demonstrated much stronger evidence for the 3-levels HGF model relative to the 2-levels version, with an exceedance probability of 0.98 and an expected frequency of 0.68 (rightmost columns: Figure 4A). The 3-levels HGF model was the winner model also when performing BMS separately in the StA and Control groups.

State anxiety is associated with a lower learning rate about stimulus outcomes
We observed significant differences between the groups in parameter ω2, which is the tonic part  . More negative ω2 values, as found in StA, lead to smaller updates, and thus to smaller learning rates (see illustration in Figure 4B).

Informational Uncertainty about the outcome tendency is lower in state anxious individuals.
We then evaluated the model estimates of informational uncertainty about the outcome tendency, σ2. This variance measure reflects the lack of knowledge about x2 and depends on ω2 but also on the volatility estimate μ3 and other quantities (eqs. 11 and 13 in Mathys et al., 2014). Thus, lower ω2 values lead to smaller σ2; however, the impact of μ3 on σ2 could alter this effect.
Here we found a significant main effect of Group (P = 0.003). Yet, the Block factor and interaction effect were not significant (P = 0.4028, P = 0.7352). In addition, planned comparisons showed that anxiety significantly lowered the total average σ2 for StA in comparison to Cont (Figure 4C; P = 0.003, Δ = 0.75, CI = [0.55, 0.89]). Because precision is the inverse variance (informational uncertainty) of the distribution, these outcomes demonstrate that StA individuals estimated their beliefs about the outcome tendency to be more precise, and therefore new information had a smaller impact on the update equations for x2.
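The dependence of σ2 on ω2 can be illustrated with a simplified Python sketch of the level-2 updates of a binary HGF. The equations are written in the standard form given by Mathys et al. (2011, 2014). The volatility estimate μ3 is held constant here, which is a simplification made for this sketch, and the input sequence, the initial values, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_level2(u, omega2, mu3=1.0, kappa=1.0, mu2_0=0.0, sigma2_0=0.1):
    """Level-2 updates of a binary HGF with the volatility estimate mu3
    held constant (a simplification made for this sketch)."""
    mu2, sigma2 = mu2_0, sigma2_0
    sigma2_traj = np.empty(len(u))
    for k, outcome in enumerate(u):
        mu1_hat = sigmoid(mu2)                      # predicted outcome probability
        delta1 = outcome - mu1_hat                  # outcome prediction error
        # Predicted precision: prior variance inflated by expected volatility
        pi2_hat = 1.0 / (sigma2 + np.exp(kappa * mu3 + omega2))
        pi2 = pi2_hat + mu1_hat * (1.0 - mu1_hat)   # posterior precision
        sigma2 = 1.0 / pi2                          # informational uncertainty
        mu2 = mu2 + sigma2 * delta1                 # precision-weighted update
        sigma2_traj[k] = sigma2
    return sigma2_traj

# Hypothetical input: 200 binary outcomes with reward probability 0.8
rng = np.random.default_rng(0)
u = rng.binomial(1, 0.8, 200)

sigma2_high = simulate_level2(u, omega2=-2.0)
sigma2_low = simulate_level2(u, omega2=-4.0)
```

With identical inputs, the trajectory for the more negative ω2 settles at lower σ2 values, so each prediction error produces a smaller update of the belief about x2.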

Environmental uncertainty is underestimated in state anxiety.
Environmental uncertainty, which relates to the posterior beliefs over volatility, depends only on the tonic volatility, ω2, and the trial-wise volatility estimate, µ3 (see equation 6 above; the coupling constant k was fixed to zero). We found that environmental uncertainty was significantly modulated by factor Group (P = 0.02), while there was no significant main effect for factor Block or interaction effect (P = 0.58, P = 0.7547). Further pair-wise analyses demonstrated that the StA group underestimated the environmental uncertainty, relative to control participants, when averaging across both experimental blocks (Figure 4D; P = 0, Δ = 0.74, CI = [0.54, 0.88]).

Uncertainty about volatility is higher in state anxious individuals.
In contrast to the effect on σ2 reported above, state anxiety increased uncertainty on level 3 (σ3).
We  Trajectories were simulated using the same input sequence and parameters (except ω2): μ2(0) = 0, μ3(0) = 1, σ2(0) = log(0.1), σ3(0) = log(1), κ = 1, ω3 = 7. The two priors on ω2 used in the simulated trajectories are -2 (orange) and -4 (black). This parameter represents the tonic part of the variance in the Gaussian random walk for x2 and modulates the learning rate about stimulus outcomes at the lowest level. Lower ω2 values lead to smaller trial-by-trial learning increments. When comparing ω2 values between groups (StA, Cont), we found more negative values in StA than in the Cont group (P = 0.002). C) Lower ω2 in state anxiety leads to decreased informational uncertainty about x2. There was a significant main effect for factor Group (StA, green; Cont, black; synchronised permutation test: P = 0.003) but not for factor Block (P > 0.05). Planned between-group comparisons indicated that state anxiety significantly decreased the average uncertainty about beliefs on tendency x2 after averaging across both blocks (P = 0.003; significant effect indicated by black bars at the bottom). D) Lower ω2 in state anxiety leads to decreased environmental uncertainty (P = 0.02), with no effect of factor Block (P > 0.05). Thus, StA participants had a lower estimate of environmental uncertainty or volatility. E) State anxiety increased uncertainty about volatility in the task environment (σ3). We found a significant main effect for factor Block (P = 0.004) and Group (StA, green; Cont, black; P = 0.0002), modulating uncertainty about volatility. Planned between-group comparisons further indicated that the state anxiety group exhibited significantly higher σ3, as compared to control participants, separately in each task block (TB1, TB2; PFDR < 0.05, as given by black bars).

Standard Lose versus Win ERP results
Cluster-based random permutation tests demonstrated in both groups (StA, N = 21; Cont, N = 21) a significant difference between the effect of the two outcomes (lose, win) on the ERP (two significant clusters in each group at level P < 0.025).
In the control group, losing led to a more negative ERP amplitude than winning during a time window between 230 and 360 ms post outcome (negative cluster, P = 0.008). This effect at first had a centro-parietal distribution, which later propagated to broader central, frontal, temporal, and parietal electrode regions, occurring approximately in line with the fERN ERP (Supplementary Figure 2). In a later time window, between 350 and 500 ms, losing evoked a more positive amplitude when compared to winning (positive cluster, P = 0.0002). During this later latency, the difference originated over fronto-central electrodes, and later spread to centroparietal electrodes resembling the P300 component wave (Supplementary Figure 2). The latency of the significant clusters confirmed that lose relative to win trials elicited a biphasic ERP modulation consisting of an earlier negative wave resembling the fERN and a later positive and very pronounced deflection corresponding to the P300.
In the state anxiety group, a similar spatio-temporal ERP profile to the Cont group emerged. Losing was associated with a more negative ERP amplitude when compared to winning between 240 and 350 ms post outcome (significant negative cluster, P = 0.004; Supplementary Figure 3). This effect originated in centro-parietal regions and spread to frontal and central sites later in the time window. Following this effect, we found a significant positive deflection between 350 and 500 ms (positive cluster, P = 0.0002; Supplementary Figure 3). In this later time window, the spatio-temporal pattern began with activation across fronto-central electrodes and developed into centro-parietal electrode regions, like in the control group, but in more anterior central electrodes.

Single-trial ERP modulations by precision-weighted PEs
The HGF results had confirmed that state anxiety alters informational uncertainty estimates about beliefs on level 2 and also about volatility on level 3 (Figure 4), in an opposing pattern of changes (decrease in σ2 and increase in σ3 relative to control participants). We then proceeded to analyse in each group separately the electrophysiological representations of trial-wise pwPEs for level 2 and 3, which are a function of those uncertainty estimates as shown in equation 4 (for an illustration of ε2, ε3, see Figure 5A). The GLM results of the additional outcome regressor are shown in Supplementary Figure 4.

Low-level precision-weighted prediction errors
A significant effect was found in the Cont group between 440 ms and 461 ms post stimulus, peaking at 453 ms over central channels (Figure 5B; whole-volume cluster-level FWE corrected, termed PFWE hereafter, PFWE = 0.007). Two additional clusters were found slightly later around 480-490 ms at right frontal (PFWE = 0.039) and central (PFWE = 0.020) electrodes.
Details on the cluster effects can be found in Table 2. By contrast, testing the GLM in the StA group, we found no significant modulation by ε2 of single-trial ERPs.
In an attempt to understand the lack of significant effects of ε2 in the GLM analysis in the anxiety group, we evaluated the variance of this regressor in every participant and then calculated the mean in each group separately. In StA, the group mean variance of ε2 was smaller (0.14, SEM 0.048) than in the control group (0.31, SEM 0.080), suggesting that less variance in this regressor is available in the StA group to explain the EEG variance.
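The variance check described above amounts to the following computation, shown here as a minimal Python sketch. The function name is hypothetical, and the use of the sample variance and sample standard deviation (ddof = 1) is an assumption.

```python
import numpy as np

def group_mean_variance(regressors):
    """Per-participant variance of a trial-wise regressor, followed by
    the group mean and standard error of the mean (SEM)."""
    variances = np.array([np.var(r, ddof=1) for r in regressors])
    sem = variances.std(ddof=1) / np.sqrt(variances.size)
    return variances.mean(), sem

# Hypothetical regressors for two participants
mean_var, sem_var = group_mean_variance([[0.0, 2.0], [0.0, 4.0]])
```

A regressor with little trial-to-trial variance has limited leverage in the GLM, which makes its effect on the EEG harder to detect.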

High-level precision-weighted prediction errors
In the Cont group, ε3 significantly elicited trial-wise EEG responses from 304 ms to 450 ms post stimulus, with a maximum effect at 387 ms across a left frontocentral region (PFWE < 0.0001). Additional significant effects of a smaller cluster size were found earlier, between 205-225 ms (PFWE = 0.022) at right parietal channels (Figure 5C).

Figure 5. Signatures of precision-weighted prediction errors on trial-wise ERPs. A) Trajectories of model-based estimates for both lower-level and higher-level pwPE for one representative control group participant across 400 trials. In green are higher-level pwPEs concerning volatility; in black are the absolute values of the lower-level pwPE concerning beliefs about the rewarding stimulus. B) Effect of pwPEs on level 2 (ε2) in controls. In the control group (Cont), responses in single-trial ERPs were significantly modulated by pwPEs about reward tendency in central electrodes. This significant cluster occurred between 440 ms and 461 ms, and is shown on a 2D scalp map at the time of the maximum peak of the cluster (453 ms post stimulus, PFWE = 0.007, with a cluster-defining threshold of P < 0.001). C) Effect of pwPEs on level 3 (ε3) in controls. In the Cont group, pwPEs about volatility estimates correlated with activation changes across a left frontocentral region between 304 ms and 450 ms, as shown in this topographical representation at the time of the maximum peak of the cluster (387 ms post stimulus, PFWE < 0.0001, with a cluster-defining threshold of P < 0.001). D) Effect of pwPEs on level 3 (ε3) during state anxiety. In the state anxiety group (StA), ε3 was associated with trial-wise ERP changes in midline parietal electrodes. This effect, ranging from 354 ms to 365 ms, is shown in a topographic scalp map at the time of the maximum peak of the cluster (359 ms post stimulus, PFWE = 0.028, with a cluster-defining threshold of P < 0.001). A further significant effect of a smaller cluster size occurred between 423-431 ms (PFWE = 0.035) at the left temporal region.

Discussion
We combined computational modelling of behaviour and analysis of electrophysiological responses to examine how state anxiety modulates reward-based learning in a volatile environment. Our key finding is that state anxiety was associated with a lower learning rate, driven by an underestimation of environmental and informational uncertainty. At the same time, we observed a decrease in the precision of estimates of environmental volatility, a higher-level belief, during anxiety.
Trial-wise estimates of uncertainty (or its inverse, precision) serve to scale the impact of prediction errors (PEs) on the belief updates. Consistent with previous reports (Stefanics et al., 2018; Weber et al., 2019), we found that precision-weighted PEs (pwPEs) on two hierarchical levels can explain trial-wise modulation of observed ERP responses in control participants. Specifically, lower-level pwPEs about reward outcomes explained variation in EEG amplitudes in a different time window after stimulus presentation than did higher-level pwPEs informing volatility estimates. A different pattern emerged in the state anxiety group, where only higher-level pwPEs modulated the trial-wise ERP changes. Taken together, the data suggest that temporary anxious states in healthy individuals impair reward-based learning in volatile environments, primarily through changes in uncertainty estimates and potentially a degradation of the neuronal representation of hierarchically-related pwPEs, which are considered to play a central role in current Bayesian accounts of perceptual inference and learning.

States of anxiety bias computations of different types of uncertainty during reward-based learning
The threat of a public speaking task used in our experiment reduced heart rate variability, which is consistent with previous findings on state anxiety (Chalmers et al., 2014; Feldman et al., 2004; Gorman and Sloan, 2000), despite the lack of corresponding significant effects in the STAI state anxiety scale. At the same time, our experimental manipulation had an adverse effect on reward-based learning. Having matched trait anxiety levels across the state anxious and the control group, our results indicate that the changes observed in reward-based learning (lower learning rates and changes in uncertainty) can be linked to temporary anxious states independent of trait levels. These outcomes thus expand prior findings of an association between high levels of trait anxiety and difficulties in decision-making tasks (de Visser et al., 2010; Miu et al., 2008) and learning in volatile task environments (Browning et al., 2015; Huang et al., 2017) to the realm of state anxiety. Moreover, using the threat of public speech, our approach allowed us to investigate behavioural, physiological, and neural responses in anticipation of a future unpredictable threat. Aberrant anticipatory responding to upcoming uncertain threats has been proposed to be a common explanation of anxious states in healthy individuals and anxiety disorders alike (Grupe and Nitschke, 2013). Accordingly, our findings that anxiety leads to changes in informational and environmental uncertainty could prove relevant for understanding the alterations in decision-making and learning observed in anxiety disorders (Bishop and Gagne, 2018; Browning et al., 2015; de Visser et al., 2010; Huang et al., 2017; Miu et al., 2008).
Our approach is not the first to propose a role of uncertainty estimates in cognitive biases in anxiety. A recent account of affective disorders suggested that difficulties with uncertainty estimation underlie some of the psychiatric symptoms in these populations (Pulcu and Browning, 2019). This work distinguished between different types of uncertainty, corresponding to irreducible, informational, and environmental uncertainty as described here, and assigned a particular relevance to environmental ("unexpected") uncertainty in explaining anxiety. In fact, evidence from computational studies converges in linking trait anxiety with difficulties in learning in unstable or volatile environments (Browning et al., 2015; Huang et al., 2017). As shown by Browning et al. (2015), an inability to adapt to changes in a task structure can be measured by comparing a single volatile block to a single stable block. Alternatively, suboptimal learning in anxiety can be captured by focusing on volatile environments alone, in which the probability of reward (or punishment) changes regularly across different blocks (Huang et al., 2017). Here we followed the second approach to investigate reward-based learning in a volatile environment.
Critically, we investigated adaptive scaling of learning rates to estimates of environmental uncertainty on a trial-by-trial basis by applying a computational model that explicitly incorporates learning about volatility in a hierarchical Bayesian framework. The winning computational model that best explained our behavioural data was the 3-level HGF, where the third level is a mathematical description of volatility estimates and their variance. This model allowed us to assess the effect of state anxiety on an array of relevant computational quantities during task performance. Above and beyond revealing a misestimation of volatility (environmental uncertainty), as proposed by Pulcu and Browning (2019), our approach identified biases in uncertainty on various levels that drive suboptimal learning in state anxiety.
First of all, the participants' estimates of tonic volatility, as captured by the parameter ω2, were significantly reduced in the state anxiety group, which led to significantly reduced learning rates and estimates of informational and environmental uncertainty. Beliefs about the outcome tendency were thus estimated to be more precise during anxiety, such that new and potentially revealing information about the true nature of hidden states had a smaller influence on the belief updates on that level. Critically, an overly precise belief about the outcome tendency might be inappropriate given the fluctuations in the true underlying hidden state. Thus, an aberrant drop in informational uncertainty might lead to biased learning, which here was further characterised by a lower learning rate about stimulus outcomes. This finding was confirmed in a behavioural measure independent of the modelling approach: state anxious individuals exhibited a higher error rate during task performance relative to control participants. Our study thus provides novel and compelling evidence for abnormal precision estimates underlying impoverished learning in healthy individuals going through temporary states of anxiety. Thus, improper precision weighting could be a general mechanism underlying a range of cognitive biases observed in healthy and psychiatric conditions, such as "hysteria" or autism (Edwards et al., 2012; Lawson et al., 2017).
Secondly, we found that state anxiety led to a decrease in the precision of beliefs about environmental volatility. In the HGF update equations, greater uncertainty on the higher level leads to new information having a stronger influence on the update of beliefs concerning volatility (equation 4; see also Mathys, 2011), thus rendering this belief more changeable. A greater uncertainty about the world's current level of stability may underlie an increased tendency to anticipate potential danger. In the context of anxious individuals having negative reactions to uncertainty (Carleton, 2016), our results may reflect how state anxiety increases the likelihood of misinterpreting unstable signals from the environment as threatening (signaling a more volatile world overall), leading to inappropriate value estimates.
In sum, state anxious individuals in our study showed both a decrease in uncertainty and learning rate on the lower level, driven by changes in tonic volatility estimates, ω2, and an increase in uncertainty (and thus learning rate) on the higher, volatility estimating level. A possible interpretation of these findings is that rather than learning about (changes of) states themselves, individuals in an anxious state attribute any detected changes in the environment to a potential change in the stability of the world. Lower estimates of tonic volatility (ω2) indicate that they expect (or tolerate) fewer changes to the current contingencies governing their environment; instead, when conditions change, they might rapidly infer that the world has gone from a stable to a volatile period due to aberrant uncertainty about the world's level of stability.
Overall, the computational results confirm our hypothesis that state anxious individuals choose their responses founded on a biased representation of uncertainty over the current belief states -at least when dealing with volatile environments as assessed here. Entertaining overly precise beliefs may represent a strategy to regain a sense of control because uncertainty is experienced as aversive, such as observed in obsessive compulsive disorder (Carleton, 2016) and ritualistic behaviour (Lang et al., 2015). In turn, this emergence of biased estimates could increase the symptoms of anxiety over time through recursive inaccurate assessments of threat from uncertainty, thereby fitting a profile of anxious responses similar to those of anxiety-related disorders (Grupe and Nitschke, 2013;Pulcu and Browning, 2019).

Hierarchically-related prediction errors modulate trial-by-trial ERP responses
In the control group, both the lower and higher-level pwPE trajectories modulated the trial-by-trial ERP responses. Lower-level pwPEs about the reward outcomes, updating beliefs about reward tendency, and higher-level pwPEs about reward tendencies, updating volatility estimates, were primarily represented across frontocentral regions, with an earlier latency for ε3 (387 ms) than ε2 (453 ms). These results align with previous studies combining EEG analyses with the HGF, which revealed that multiple, hierarchically-related pwPEs are computed while learning in volatile environments, and that these are represented across different brain regions specific to the task demands (Diaconescu et al., 2017a; Stefanics et al., 2018; Weber et al., 2019).
Importantly, however, this prior work emphasised a specific temporal hierarchy governing the neural representation of the pwPEs across different levels, with lower pwPEs emerging earlier in time relative to higher-level pwPEs. The reasoning behind this emphasis was that in the one-step update equations of the HGF, the lower-level PE needs to be computed first, because the higher-level PE depends on the belief update on the lower level. Here, however, we find that the peak of activation by the higher-level PE precedes that of the lower-level PE. Further work is needed to clarify the implications of this result; however, it may be helpful to consider the nature of the model (HGF) versus the signal (EEG) in this context. The HGF update equations quantify the (total) change in beliefs, both in the mean and the uncertainty or precision, on different levels, in response to an observation. However, while these equations calculate the new posterior in one step, the brain operates in continuous time, where the constant message passing among hierarchically organised regions results in oscillatory signals (Bogacz, 2017; Friston, 2005) which we measure as evoked responses in the EEG. It thus seems likely that while the final value of the posterior belief has to be reached by the end of the ERP, when the evoked oscillation stabilises at a new level, the temporal dynamics leading up to this might be more complex than a linear sequence of updates. Future work on this might profit from formulating an explicit response model which directly links belief states in the HGF to observable EEG responses.
More generally, however, the evidence for distinct neural representations of different types of pwPEs in the control group lends support to current predictive coding proposals. These view the brain as a Bayesian observer, estimating beliefs about hidden states in the environment through implementing a hierarchical generative model of the incoming sensory data (de Lange et al., 2018; Doya et al., 2007; Friston, 2010; Rao and Ballard, 1999). In this framework, superficial pyramidal cells encode PEs weighted by precision, and these are also the signals that are thought to dominate the EEG (Friston and Kiebel, 2009). This motivated us to assess the representation of pwPEs in brain responses, an approach followed by some of the previous fMRI and EEG studies (Iglesias et al., 2013; Stefanics et al., 2018; Weber et al., 2019). Other model-based studies of trial-wise ERP responses like the P300 assessed alternative Bayesian inference parameters, such as precision or Bayesian surprise (Kolossa et al., 2015; Mars et al., 2008; Ostwald et al., 2012). The more anterior P3a component around 340 ms was identified as an index of belief updating, whereas the later P3b waveform of posterior topography was found to represent Bayesian surprise (Kolossa et al., 2015). Although these computational approaches to the P300 are not directly comparable to our pwPE results, they do resemble the timeline and topography of the results from the standard lose minus win ERP analysis conducted here in the Cont and StA groups separately, showing the expected anterior to posterior topographic shift in the P300 component from classical model-free ERP studies (Hajcak et al., 2005, 2007; Polich, 2007; Wu and Zhou, 2009).
Under state anxiety, a neural representation of pwPEs emerged exclusively for volatility updates. Due to the smaller variance of ε2 in the anxiety group, it is likely that this regressor was less potent to explain the variance in the EEG signals, making it potentially harder to detect in the brain responses in this group. Possibly, our study was underpowered to detect these weaker effects on the anxiety-related ERP waveforms. The sample size was based on previous research in trait anxiety combining behavioural and computational analysis (Browning et al., 2015), and future work should carry out a sample size estimation specifically targeting the GLM EEG analysis to validate these results. In summary, in state anxiety, belief updating during reward-based learning was mainly driven by updates in volatility, with a corresponding neural representation in parietal electrode regions.
The latency of the trial-wise ERP changes to ε3 in anxiety, 359 ms, resembled that of the main ε3 effect in control participants, yet the distribution was shifted towards parietal regions. An important limitation of our study is that we cannot directly compare the effects of the pwPEs on EEG responses between the two groups statistically, because the subject-specific trial-wise regressors used in the GLMs already differ between the groups, owing to the effect of state anxiety on belief updating discussed above. Accordingly, we can only speculate about explanations for the different pattern of activations observed under state anxiety.
Activity in the right dorsal ACC has been shown to express changes in ε3 in a combined fMRI-EEG study (Diaconescu et al., 2017a), whereas earlier fMRI work linked the ACC, together with the DLPFC, insula and dopaminergic ventral tegmental area / substantia nigra (VTA/SN), to lower-level pwPEs instead, and linked ε3 to the cholinergic basal forebrain (Iglesias et al., 2013).
Interpretations with regard to neuroanatomical regions are limited in our EEG study as it provided exclusively sensor-level results. Despite this limitation, potential contributions by the ACC and DLPFC regions to ε2 and ε3 responses would be compatible with the anterior distribution of pwPEs observed in control participants. Intriguingly, state anxiety has been shown to deactivate the DLPFC (and ventrolateral PFC) and ACC during cognitive control tasks that crucially depend on these areas (Bishop, 2007, 2009; Bishop et al., 2004). Thus, one hypothesis that could be tested in future combined fMRI-EEG studies is whether state anxiety disengages these brain regions during reward-based learning, undermining their proper contribution to tracking pwPEs about the reward tendency and volatility. Parietal regions could be playing a compensatory role in the StA group, at least in tracking pwPEs about volatility, whereas any available resources in prefrontal and ACC regions may have been allocated to an earlier processing of win and lose outcomes, as shown in the anterior topography of the ERP modulations at 293 ms associated with the outcome regressor. Of particular interest, decreased DLPFC activity also characterises elevated trait anxiety levels, with detrimental consequences for performance and attentional control (Bishop, 2009). Moreover, portions of the cingulate cortex and prefrontal cortex are part of the central network underlying anxiety disturbances (Grupe and Nitschke, 2013). Thus, an additional interesting question for future studies would be to assess the role that these brain regions play in the modulation of hierarchically-related pwPEs that may lead to the computational biases described in trait anxiety (Browning et al., 2015; Huang et al., 2017).

Conclusion and outlook
This study is the first to provide a mechanistic understanding of how temporary anxious states impair reward-based learning in volatile environments. The results thus have implications for understanding cognitive biases and impaired learning in healthy individuals exposed to upcoming uncertain threats, but could also generalise to clinical settings. One important direction for future research will be to determine whether the distributed network of brain regions and neurotransmitter systems linked to anxiety disorders and trait anxiety interacts with the neural representation of hierarchical pwPEs, thereby accounting for the misestimation of both precision and uncertainty that impairs learning.
Supplementary Figure 3. Results of the ERP response comparison between outcomes (wins and losses) in the state anxiety group. Left panels. Cluster-based random permutation analysis of ERP responses in the state anxiety group (StA, N = 21) to assess the effect of the outcome (win, lose). Maps given for each cluster show the scalp topography of the significant cluster ERP differences between outcomes (win, lose) across an earlier (leftmost) time window and a later (rightmost) time window. We present this to show the spread of the clusters. Black dots on the topographical maps indicate electrodes belonging to a significant cluster (P < 0.025, two-tailed test). Right panel. Grand-mean ERP waveforms of the two outcomes (lose, red; win, blue) and the difference (lose minus win, black) are presented from all electrodes between -0.05 and 0.5 seconds, with SEM given as grey shaded areas. Significant clusters are denoted by black bars on the x-axis.
A) The effect of trial outcome on EEG brain activity in the control group. Using trial outcomes (win, loss) as a regressor, a significant representation in ERP responses was found between 226 ms and 259 ms over parieto-occipital regions, with the peak effect occurring at 242 ms (PFWE < 0.0001, with a cluster-defining threshold of P < 0.001). An additional effect included a later cluster in frontal locations between 404 and 424 ms (PFWE = 0.004). B) The effect of trial outcome on EEG brain activity in the state anxiety group. In this group, the regressor corresponding to the trial outcomes led to a significant modulation of the ERP waveform between 254 ms and 322 ms at frontal and fronto-central channels, with the peak effect occurring at 293 ms (PFWE = 0, with a cluster-defining threshold of P < 0.001). In addition, a later significant cluster in the same fronto-central channels was found between 388 and 416 ms, with a maximum peak at 402 ms (PFWE = 0.003).