State anxiety alters the neural oscillatory correlates of predictions and prediction errors during reward-based learning

Anxiety influences how the brain estimates and responds to uncertainty. The consequences of these processes on behaviour have been described in theoretical and empirical studies, yet the associated neural correlates remain unclear. Rhythm-based accounts of Bayesian predictive coding propose that predictions in generative models of perception are represented in alpha (8–12 Hz) and beta oscillations (13–30 Hz). Updates to predictions are driven by prediction errors weighted by precision (inverse variance) encoded in gamma oscillations (> 30 Hz) and associated with the suppression of beta activity. We tested whether state anxiety alters the neural oscillatory activity associated with predictions and precision-weighted prediction errors (pwPE) during learning. Healthy human participants performed a probabilistic reward-based learning task in a volatile environment. In our previous work, we described learning behaviour in this task using a hierarchical Bayesian model, revealing more precise (biased) beliefs about the tendency of the reward contingency in state anxiety, consistent with reduced learning in this group. The model provided trajectories of predictions and pwPEs for the current study, allowing us to assess their parametric effects on the time-frequency representations of EEG data. Using convolution modelling for oscillatory responses, we found that, relative to a control group, state anxiety increased beta activity in frontal and sensorimotor regions during processing of pwPE, and in fronto-parietal regions during encoding of predictions. No effects of state anxiety on gamma modulation were found. Our findings expand prior evidence on the oscillatory representations of predictions and pwPEs into the reward-based learning domain. The results suggest that state anxiety modulates beta-band oscillatory correlates of pwPE and predictions in generative models, providing insights into the neural processes associated with biased belief updating and poorer learning.


Introduction
Affective states closely interact with decision making (Lerner et al., 2015). For example, altered computations during decision making, such as learning rates and estimates of belief uncertainty, are considered central to explaining clinical conditions including anxiety, depression and stress from a Bayesian predictive coding (Bayesian PC) perspective (Browning et al., 2015; de Berker et al., 2016; Paulus and Yu, 2012; Pulcu and Browning, 2019; Williams, 2016). The Bayesian PC framework proposes that the brain continuously updates a hierarchical generative model using predictions optimised through their discrepancy with sensory data, termed prediction errors (PE), and weighted by precision (inverse variance; Friston, 2010; Rao and Ballard, 1999; Srinivasan et al., 1982). This hierarchical message passing was hypothesised (in the context of sensory processing) to be mediated by neural oscillations at specific frequencies, in distinct cortical layers and regions (Bastos et al., 2012). Empirical evidence supports this, identifying predictions in alpha and beta frequencies and PEs in gamma frequencies (Arnal and Giraud, 2012; Auksztulewicz et al., 2017; Bastos et al., 2020; Sedley et al., 2016). Yet how affective states modulate the oscillatory activity associated with predictions and PE signals has been largely overlooked.
Uncertainty makes refining predictions particularly challenging. Estimates of uncertainty (or its inverse, precision) regulate how influential PEs are on updating our generative model of the environment (Friston, 2008; Yu and Dayan, 2005), scaling precision-weighted PEs (pwPEs). Uncertain and changing environments may render prior beliefs obsolete, down-weighting predictions in favour of increasing learning about sensory input. Recent studies have highlighted that precision estimates are important in explaining atypical learning and perception in neuropsychiatric conditions (Fletcher and Frith, 2009; Friston et al., 2013; Lawson et al., 2014; Montague et al., 2012). Anxiety, in particular, has been shown to lead to insufficient adaptation in the face of environmental change (Browning et al., 2015; Huang et al., 2017), disruption in learning, and maladaptive biases, in both aversive and reward-based learning contexts (Hein et al., 2021; Huang et al., 2017; Kim et al., 2020; Lamba et al., 2020; Piray et al., 2019; Pulcu and Browning, 2019). Whether the learning alterations in anxiety are mediated by oscillatory changes representing predictions and pwPEs remains unknown.
https://doi.org/10.1016/j.neuroimage.2022.118895. Available online 10 January 2022. 1053-8119/© 2022 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Predictions have been consistently associated with the modulation of alpha-beta rhythms across multiple modalities, such as visual (Gould et al., 2011), motor (Schoffelen et al., 2005), somatosensory (van Ede et al., 2011), and auditory (Todorovic et al., 2015); yet frequency-domain evidence for predictions about reward contingencies in volatile environments is currently lacking. This is important to understand as learning biases manifest in anxiety conditions during environmental instability (Browning et al., 2015; Pulcu and Browning, 2019). Crucially, predictions in deep layers are thought to functionally inhibit the processing of sensory input and PEs in superficial layers (Bastos et al., 2015; Bauer et al., 2014; Mayer et al., 2016; Van Kerkoerle et al., 2014). This suggests that aberrant oscillatory states modulating predictions would be an additional route through which encoding of pwPEs is altered, contributing to impaired learning.
Here, we used convolution modelling of oscillatory responses (Litvak et al., 2013) in previously acquired EEG data to estimate the neural oscillatory representations of predictions and pwPEs during reward-based learning in healthy controls and a state anxious group. Our previous computational modelling study (Hein et al., 2021) revealed that state anxiety biases uncertainty estimates, increasing the precision of posterior beliefs about the stimulus-reward contingency. We now ask whether this bias is associated with altered spectral characteristics of hierarchical message passing, which could represent a candidate marker of biased belief updating and poorer reward-based learning in anxiety. We hypothesised that, in state anxiety, increased precision in the predictions about a certain stimulus-reward contingency should be associated with increased alpha and beta activity. This, in turn, would inhibit the processing of expected inputs in line with PC accounts, resulting in a hypothesised lower gamma activity and concomitantly higher alpha-beta activity for attenuating encoding of pwPEs.

Participant sample
The data used in the preparation of this work were obtained from our previous study (Hein et al., 2021), which was approved by the ethical review committee at Goldsmiths, University of London. Participants were pseudo-randomly allocated into an experimental state anxiety (StA) and a control (Cont) group, following a screening phase in which we measured trait anxiety levels in each participant using Spielberger's Trait Anxiety Inventory (STAI; Spielberger, 1983). Trait anxiety levels were matched in the StA and Cont groups (average score and standard error of the mean, SEM: 47 [2.1] in StA, 46 [2.2] in Cont). Importantly, individual trait anxiety scores above 46 have been shown to be typical in anxiety disorder patients (Fisher and Durham, 1999), suggesting that a proportion of our participants had relatively high trait anxiety levels. Further, the age of the control group (mean 27.7, SEM = 1.2) and their sex (13 female, 8 male) were consistent with those of the state anxiety group (mean 27.5, SEM = 1.3; 14 female, 7 male). This is important to consider as there are known age- and sex-related confounds to measures of state anxiety (see Voss et al., 2015).

Experimental design
Both groups (StA, Cont) performed a probabilistic binary reward-based learning task where the probability of reward between two images changes across time (Behrens et al., 2007; de Berker et al., 2016; Iglesias et al., 2013). The experiment was divided into four blocks: an initial resting state block (R1: baseline), two reward-based learning task blocks (TB1, TB2), and a final resting state block (R2). Each resting state block was 5 min. Participants were instructed to relax and keep their eyes open and fixated on a cross in the middle of the presentation screen while we recorded EEG responses from the scalp and EKG responses from the heart.
The experimental task consisted of 200 trials in each task block (TB1, TB2). The aim was for participants to maximise reward across all trials by predicting which of the two images (blue, orange) would reward them (win, positive reinforcement, 5 pence reward) or not (lose, 0 pence reward). The probability governing reward for each stimulus (reciprocal: p, 1 − p) changed across the experiment, every 26 to 38 trials. There were 10 contingency mappings for both task blocks: 2 × strongly biased (90/10; i.e. probability of reward for blue p = 0.9), 2 × moderately biased (70/30), and 2 × unbiased (50/50; as in de Berker et al., 2016). The biased mappings repeated in reverse relationships (2 × 10/90; 2 × 30/70) to ensure that over the two blocks (TB1, TB2) there were 10 stimulus-outcome contingency phases in total.
In each trial the stimuli were presented randomly to the left or right of the centre of the screen, where they remained until either a response was given (left, right) or the trial expired (maximum waiting time, 2200 ms ± 200 ms). Next, the chosen image was highlighted in bright green for 1200 ms (± 200 ms) before the outcome (win, green; lose or no response, red) was shown in the middle of the screen (1200 ms ± 200 ms). At the end of each trial, the outcome was replaced by a fixation cross at an inter-trial interval of 1250 ms (± 250 ms).
Specific task instructions to participants were to select which image they predicted would reward them on each trial and adjust their predictions according to inferred changes in the probability of reward (as in de Berker et al., 2016). All participants filled out computerised questionnaires (state anxiety STAI state scale X1, 20 items: Spielberger, 1983) and conducted practice trials as detailed in Hein et al. (2021). Critically, the state anxiety manipulation was delivered just before the first reward-based learning block (TB1) to the StA group (see the following section).

Manipulation and assessment of state anxiety
Our StA group was instructed to complete a public speaking task in line with previous work (Feldman et al., 2004; Lang et al., 2015). This meant, as detailed in Hein et al. (2021), that StA participants were told just before TB1 that they would need to present a piece of abstract art for 5 min to a panel of academic experts after completing the reward-based learning task, with 3 min preparation time. By contrast, the Cont group were informed that they would need to give a mental description of the piece of abstract artwork for the same time privately (rather than to a panel of experts, see Hein et al., 2021). Importantly, the state anxiety manipulation was then revoked in the StA group directly after completing the second reward-based learning block (TB2) and before the second resting state block (R2). They were informed that the panel of experts was suddenly unavailable. Both groups, therefore, presented the artwork to themselves after completing the reward-based learning task.
To assess state anxiety, as in our previous work, we used the coefficient of variation (CV = standard deviation/mean) of the inter-beat intervals (IBI) as a metric of heart rate variability (HRV), as this index has been shown to drop during anxious states (Chalmers et al., 2014; Feldman et al., 2004; Gorman and Sloan, 2000; Kawachi et al., 1995; Quintana et al., 2016). In addition, the spectral characteristics of the IBI data were analysed to obtain an HRV proxy of state anxiety associated with autonomic modulation and parasympathetic (vagal) withdrawal (Friedman, 2007; Gorman and Sloan, 2000). HRV and high-frequency HRV (HF-HRV, 0.15-0.40 Hz) measures were derived from the R-peaks extracted from the EKG signal recorded throughout the experimental sessions (see details in Hein et al., 2021, and section EEG and EKG acquisition and analysis below). The HRV and HF-HRV measures during performance blocks were normalised with the average baseline levels during R1, after we established that the StA and Cont groups did not differ in these indexes in the initial resting state phase (P = 0.76 and 0.66 for HRV and HF-HRV, respectively). This outcome suggested that control and anxious participants were not significantly dissociated in these physiological measures at the beginning of the experiment. Hereafter we refer to R1-normalised measures when summarising the results from Hein et al. (2021) on the HRV/HF-HRV measures during task blocks.
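The HRV index described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in Hein et al. (2021): the function names are hypothetical, the R-peak times are assumed to be already extracted from the EKG, and the baseline normalisation is assumed here to be a simple ratio to the R1 value, which the text does not specify.

```python
import numpy as np

def ibi_from_rpeaks(rpeak_times_s):
    """Inter-beat intervals (s) from a sorted array of R-peak times (s)."""
    return np.diff(np.asarray(rpeak_times_s, dtype=float))

def hrv_cv(ibi):
    """Coefficient of variation of the IBIs: standard deviation / mean."""
    ibi = np.asarray(ibi, dtype=float)
    return ibi.std(ddof=1) / ibi.mean()

def normalise_to_baseline(block_value, baseline_value):
    """Express a task-block value relative to the resting baseline (R1).

    ASSUMPTION: normalisation is done as a ratio; the source only states
    that task-block measures were normalised with the R1 average.
    """
    return block_value / baseline_value
```

A lower value of `hrv_cv` in a task block than at baseline would correspond to the drop in HRV that the text associates with anxious states.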
In line with prior research, our previous study showed reduced HF-HRV and reduced HRV in state anxious participants relative to controls (Fig. 1C). Reduction in these measures has been reliably shown across trait anxiety, worry, and anxiety disorders (Aikins and Craske, 2010; Friedman, 2007; Fuller, 1992; Klein et al., 1995; Miu et al., 2009; Mujica-Parodi et al., 2009; Pittig et al., 2013; Thayer et al., 1996), and thus, significant changes to these metrics suggested physiological responses consistent with state anxiety. Subjective self-reported measures of state anxiety (STAI state scale X1, 20 items: Spielberger, 1983) were taken at four points during the original Hein et al. (2021) study, but the data could not be used due to an error in STAI data collection. We showed in a separate study, however, that HRV can effectively track changes in state anxiety, as validated by concurrent changes in STAI scores (state scale; Sporn et al., 2020).

Behavioural analysis and modelling
The behavioural data in our paradigm were analysed in Hein et al. (2021) using the Hierarchical Gaussian Filter (HGF; Mathys et al., 2011, 2014). This model describes hierarchically structured learning across various levels, corresponding to hidden states of the environment $x_1^{(k)}, x_2^{(k)}, \ldots, x_n^{(k)}$ and defined as coupled Gaussian random walks. Belief updating on each level is driven by PEs modulated by precision ratios, weighting the influence of precision or uncertainty in the current level and the level below. The HGF was implemented with the open-source software in TAPAS (http://www.translationalneuromodeling.org/tapas, version 3.1.0).
To model learning about the tendency towards reward for blue/orange stimuli and the rate of change in that tendency (volatility), we used three alternative HGF models and two reinforcement learning models (Hein et al., 2021). The input to the models was the series of 400 outcomes and the participant's responses. Outcomes in trial $k$ were either $u^{(k)} = 1$ if the blue image was rewarded or $u^{(k)} = 0$ if the orange image was rewarded. Trial responses were defined as $y^{(k)} = 1$ if participants chose the blue image, while $y^{(k)} = 0$ corresponded to the choice of the orange image. We tested a 3-level HGF (HGF$_3$, with volatility estimated on the third level), a 2-level reduced HGF (HGF$_2$, which fixes volatility to a constant level), and an HGF where decisions are informed by trial-wise estimates of volatility (see Diaconescu et al., 2014). We additionally tested two widely used reinforcement learning models, a Rescorla-Wagner (RW; Rescorla and Wagner, 1972) and a Sutton K1 model (SK1; Sutton, 1992). Following random effects Bayesian model comparison, the model that best explained the behavioural data amongst participants was the 3-level HGF for binary outcomes (see Fig. 1A). In this winning model, the first level represents the binary outcome in a trial (either blue or orange wins) and beliefs on this level feature expected or irreducible uncertainty due to the probabilistic nature of the rewarded outcome (Soltani and Izquierdo, 2019). The second level $x_2^{(k)}$ represents the true tendency for either image (blue, orange) to be rewarding on trial $k$. The third level represents the log-volatility or rate of change of reward tendencies (Bland and Schaefer, 2012; Yu and Dayan, 2005). In the HGF update equations, the second and third level states, $x_2^{(k)}$ and $x_3^{(k)}$, are modelled as continuous variables evolving as Gaussian random walks coupled through their variance (inverse precision). Hereafter we drop the trial index $k$ in most expressions for simplicity.
Variational inversion of the model provides the trial-wise trajectories of the sufficient statistics of the posterior distribution of beliefs about $x_i$ ($i = 2, 3$): $\mu_i$ (mean, denoting the participant's expectation) and $\sigma_i$ (variance, termed informational or estimation uncertainty for level 2; uncertainty about volatility for level 3). The coupling function between levels 2 and 3 is as follows:

$$f_2(x_3) = \exp(\kappa x_3 + \omega_2) \qquad (1)$$

In Eq. (1), $\omega_2$ represents the invariant (tonic) portion of the log-volatility of $x_2$ and captures the size of each individual's stimulus-outcome belief update independent of $x_3$. The parameter $\kappa$ establishes the strength of the coupling between $x_2$ and $x_3$, and thus the degree to which estimated environmental volatility impacts the learning rate about the stimulus-outcome probabilities; in Hein et al. (2021), $\kappa$ was fixed to one.
Another relevant parameter in the HGF equations is $\omega_3$, which captures 'metavolatility': how estimates of environmental volatility evolve, with larger values expressing a belief that the changeability of the task is itself changing. Note that in our experimental task, however, the rate of change (true volatility) was constant, as the stimulus-outcome contingencies changed every 26-38 trials (similarly to de Berker et al., 2016, and Iglesias et al., 2013). Environmental uncertainty is defined as $\exp(\mu_3^{(k-1)} + \omega_2)$, which depends on the phasic log-volatility estimates on the previous trial ($\mu_3^{(k-1)}$) and the tonic volatility ($\omega_2$). Thus, the higher $\mu_3^{(k-1)}$ or $\omega_2$ are, the greater the environmental uncertainty (see Mathys et al., 2014, page 15, Eq. (11)).
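As a brief worked illustration of how environmental uncertainty scales with its two terms, consider the expression with the coupling fixed to one. The numerical values below are hypothetical and chosen only for arithmetic convenience; they are not estimates from the study.

```latex
\begin{align*}
\exp\!\left(\mu_3^{(k-1)} + \omega_2\right) &= \exp(0 - 4) \approx 0.018
  && \text{for } \mu_3^{(k-1)} = 0,\ \omega_2 = -4,\\
\exp\!\left(\mu_3^{(k-1)} + \omega_2\right) &= \exp(1 - 4) \approx 0.050
  && \text{for } \mu_3^{(k-1)} = 1,\ \omega_2 = -4,\\
\exp\!\left(\mu_3^{(k-1)} + \omega_2\right) &= \exp(1 - 3) \approx 0.135
  && \text{for } \mu_3^{(k-1)} = 1,\ \omega_2 = -3.
\end{align*}
```

Raising either the phasic log-volatility estimate or the tonic volatility by one unit multiplies environmental uncertainty by $e \approx 2.72$, so a lower tonic volatility yields lower environmental uncertainty at any given volatility estimate.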
In our implementation of the winning model, the 3-level HGF, we estimated the perceptual model parameters $\omega_2$ and $\omega_3$, while we fixed $\kappa$ and the initial values of the mean and variance of the belief trajectories. This choice was based on the previous work that we used as reference for our study (de Berker et al., 2016). The prior values on the model parameters can be found in Supplementary Table 1 and Hein et al. (2021). Hein et al. (2021) also includes the results of simulations carried out to assess how well the HGF$_3$ estimated each free model parameter. In brief, $\omega_2$ could be estimated well, whereas $\omega_3$ was not recovered, in line with recent findings (Reed et al., 2020).
Fig. 1. (A) … Hein et al. (2021). The free perceptual model parameters $\omega_2$, $\omega_3$ and the response parameter $\zeta$ were estimated by fitting the HGF to observed inputs ($u$) and individual responses ($y$). (B) HGF trajectories of the computational quantities used to form our GLM convolution regressors, from one participant. The lowest level shows the sequence of outcomes (green dots: 1 = blue win, 0 = orange win) and the participant's responses (dark blue dots) on each trial. The black line indicates the series of prediction errors (PE) about the stimulus outcome, and the pink line the precision weight on level 2. The middle layer of (B) shows the trial-wise HGF estimate of pwPE about stimulus outcomes (pwPE updating level 2, termed pwPE 2 in the graphic, $\epsilon_2$ in the main text; blue). For our GLM convolution analysis, we used unsigned values of $\epsilon_2$ as the first parametric regressor. The precision ratio included in the pwPE 2 term, in turn, weights the influence of prediction errors about stimulus outcomes on the expectation of beliefs on level 2. Predictions about the tendency towards a stimulus-reward contingency on level 2 are displayed on the top level (maroon). We took the absolute values of this quantity as our second parametric regressor (labelled Predictions 2 in the graphic). (C) In Hein et al. (2021), a significant drop in heart rate variability (HRV, a metric of anxiety using the coefficient of variation of the inter-beat interval of the recorded heart beats) was observed in the StA group (pink) relative to Cont (black). Panel (C) shows the mean HRV (with vertical SEM bars) over the experimental task blocks 1 and 2 (TB1, TB2) and the final resting state block (R2). These blocks (TB1, TB2, R2) were normalised to the average HRV value of the first resting state block (R1: baseline). A significant effect of group and block was found using non-parametric 2 × 2 factorial tests with synchronised rearrangements. After control of the FDR at level q = 0.05, planned comparisons showed a significant between-groups result (black bar) in TB1. (D) State anxiety impeded the overall reward-based learning performance as given by the percentage of errors. The mean of each group (StA, pink; Cont, black) is provided with SEM bars extending vertically. On the right of the group mean are the individual values depicting the sample population dispersion. State anxiety significantly increased the error rate relative to controls. (E-G) HGF modelling results. Hein et al. (2021) reported significantly lower $\omega_2$ in StA relative to Cont. Simulations in that study showed that a lower $\omega_2$ is associated with reduced estimation (informational) uncertainty on level 2, $\sigma_2$. (E) In our StA group, the block average of estimation uncertainty about the stimulus-reward contingency ($\sigma_2$) was significantly smaller than in Cont (main effect of group; StA, pink; Cont, black). (F) We observed significantly lower environmental uncertainty in StA relative to Cont (main effect of group). (G) State anxiety increased uncertainty about volatility ($\sigma_3$, main effect of block and group). Planned between-group comparisons additionally revealed a significantly higher $\sigma_3$ in StA relative to Cont in each task block separately (TB1, TB2, black bars).

Paired with this perceptual model of hierarchically-related beliefs is a response model that obtains the most likely response for each trial using the belief estimates. The winning HGF model from Hein et al. (2021) used the unit-square sigmoid observation model for binary responses (Iglesias et al., 2013; Mathys et al., 2011, 2014), and the response model parameter $\zeta$, which represents decision noise, was additionally estimated for each participant (see Supplementary Table 1). Simulations carried out in Hein et al. (2021) revealed that the decision noise parameter was also estimated well.
We refer the reader to the original HGF methods papers for more detail on the mathematical derivations (Mathys et al., 2011, 2014) and to Hein et al. (2021) for equations included in the original results.
In the current study, we used two types of subject-specific trajectories of HGF variables as parametric regressors for convolution GLM analysis: (a) unsigned predictions about the tendency towards a certain stimulus-reward contingency ($|\hat{\mu}_2|$); (b) precision-weighted prediction errors on level 2 ($|\epsilon_2|$) updating the beliefs on the tendency towards a reward contingency. The arguments supporting our choice of unsigned (absolute) values for these computational quantities are given in section Spectral analysis below. The update steps for the posterior mean on level 2 on trial $k$, $\mu_2^{(k)}$, depend on the prediction error on the level below, $\delta_1^{(k)}$, weighted by a precision term according to the following expression:

$$\mu_2^{(k)} = \mu_2^{(k-1)} + \sigma_2^{(k)} \delta_1^{(k)} \qquad (2)$$

The prediction about the tendency towards a stimulus-reward contingency before observing the outcome, $\hat{\mu}_2^{(k)}$, is, in our winning model, the expectation in the previous trial, $\mu_2^{(k-1)}$. The pwPE term on level 2, on the other hand, is the product of the estimation uncertainty (the inverse of the precision $\pi_2$) and the PE about the stimulus outcome: $\sigma_2^{(k)} \delta_1^{(k)}$. Thus, the influence of PEs on updating $\mu_2$ decreases with greater precision on that level, $\pi_2$, or smaller estimation uncertainty, $\sigma_2$. The intuition from this expression is that the less certain we are about level 2, the more we should update that level using new information (prediction errors) from the level below. See Mathys et al. (2011, 2014) for detailed mathematical expressions for the HGF. Details on the free parameters of the HGF model that were estimated, including prior values, can be found in Hein et al. (2021) and in Supplementary Table 1.
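The level-2 update described above can be sketched as a single-trial function. This is a simplified illustration rather than the TAPAS implementation used in the study: it follows the standard binary HGF update equations of Mathys et al. (2011), holds the level-3 volatility estimate fixed instead of updating it, and uses hypothetical function names, default parameter values and starting values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def update_level2(mu2_prev, sigma2_prev, u, mu3_prev=0.0, omega2=-2.0, kappa=1.0):
    """One trial of the level-2 belief update in a binary HGF.

    ASSUMPTIONS: mu3_prev is held fixed (no level-3 update), and the default
    omega2 and kappa are illustrative values, not estimates from the study.
    Returns (mu2, sigma2, pwpe2), where pwpe2 = sigma2 * delta1.
    """
    mu2_hat = mu2_prev                         # prediction = previous posterior mean
    mu1_hat = sigmoid(mu2_hat)                 # predicted probability that u = 1
    delta1 = u - mu1_hat                       # prediction error about the outcome
    # Predicted precision: previous variance plus environmental uncertainty
    pi2_hat = 1.0 / (sigma2_prev + math.exp(kappa * mu3_prev + omega2))
    pi2 = pi2_hat + mu1_hat * (1.0 - mu1_hat)  # posterior precision on level 2
    sigma2 = 1.0 / pi2                         # posterior estimation uncertainty
    pwpe2 = sigma2 * delta1                    # precision-weighted prediction error
    mu2 = mu2_hat + pwpe2                      # posterior mean on level 2
    return mu2, sigma2, pwpe2

# Run a few trials and print the unsigned quantities used as regressors
mu2, sigma2 = 0.0, 1.0                         # hypothetical starting values
for u in [1, 1, 0, 1]:
    prediction = abs(mu2)                      # |prediction| held before the outcome
    mu2, sigma2, pwpe2 = update_level2(mu2, sigma2, u)
    print(f"u={u}  |prediction|={prediction:.3f}  |pwPE2|={abs(pwpe2):.3f}")
```

The sketch makes the weighting explicit: for the same prediction error `delta1`, a smaller `sigma2` (higher precision) produces a smaller belief update.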
Trial-by-trial trajectories of the unsigned predictions about the stimulus-reward tendency $|\hat{\mu}_2|$ and pwPEs about the stimulus outcome $|\epsilon_2|$ for an exemplar participant are provided in Fig. 1B.

EEG and EKG acquisition and analysis
EEG, EKG and EOG signals were recorded continuously throughout the study using the BioSemi ActiveTwo system (64 electrodes, extended international 10-20 system, sampling rate 512 Hz). External electrodes were placed on the left and right earlobes to use as references upon importing the EEG data in the analysis software. EKG and EOG signals were recorded using bipolar configurations. For EOG, we used two external electrodes to acquire vertical and horizontal eye movements, one on top of the zygomatic bone by the right eye, and one between both eyes, on the glabella. For EKG we used two external electrodes in a two-lead configuration (Moody and Mark, 1982). Please refer to Hein et al. (2021) for further details on the electrophysiology acquisition.
EEG data were preprocessed in the EEGLAB toolbox (Delorme and Makeig, 2004). The continuous EEG data were first filtered using a high-pass filter at 0.5 Hz (with a Hamming windowed sinc finite impulse response filter with order 3380) and notch-filtered at 48-52 Hz (filter order 846). Next, independent component analysis (ICA, runICA method) was implemented to remove artefacts related to eye blinks, saccades and heartbeats (2.3 components were removed on average [SEM 0.16]), as detailed in Hein et al. (2021). Continuous EEG data were then segmented into epochs centred around the outcome event (win, lose, no response) from −200 to 1000 ms. Noisy data epochs, defined as exceeding a threshold set to ± 100 μV, were marked as artefactual (and were excluded during convolution modelling, see next section). Further to this, a stricter requirement was placed on the artefact rejection process to achieve higher quality time-frequency decomposition, as proposed for the gamma band (see Hassler et al., 2011; Keren et al., 2010). Data epochs exceeding an additional threshold set to the 75th percentile + 1.5·IQR (the interquartile range, summed over all channels) were marked to be rejected (Carling, 2000; Schwertman et al., 2004; Tukey, 1977). The two rejection criteria resulted in an average of 22.37 (SEM 2.4) rejected events, with a participant minimum of 80% of the total 400 events available for convolution modelling. Following preprocessing, the continuous EEG data were converted to SPM 12 (http://www.fil.ion.ucl.ac.uk/spm/, version 7487) and downsampled to 256 Hz, and time-frequency analysis was performed (Litvak et al., 2011).
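The two epoch-rejection criteria can be illustrated with a short NumPy sketch. It is not the EEGLAB code used in the study. In particular, the amplitude measure entering the percentile rule is an assumption made here (peak absolute amplitude per channel, summed over channels), since the text does not spell it out, and the function name is hypothetical.

```python
import numpy as np

def mark_bad_epochs(epochs_uv, abs_threshold_uv=100.0):
    """Flag artefactual epochs.

    epochs_uv: array of shape (n_epochs, n_channels, n_samples), in microvolts.
    Returns a boolean mask that is True for epochs marked as artefactual.
    """
    epochs_uv = np.asarray(epochs_uv, dtype=float)
    peak = np.abs(epochs_uv).max(axis=2)          # (n_epochs, n_channels)
    # Criterion 1: any channel exceeds the fixed +/- 100 microvolt threshold
    bad_fixed = (peak > abs_threshold_uv).any(axis=1)
    # Criterion 2 (ASSUMED measure): upper fence, 75th percentile + 1.5 * IQR,
    # applied to the peak amplitude summed over all channels
    summed = peak.sum(axis=1)                     # (n_epochs,)
    q1, q3 = np.percentile(summed, [25, 75])
    bad_fence = summed > q3 + 1.5 * (q3 - q1)
    return bad_fixed | bad_fence
```

Because the second criterion is defined relative to each recording's own distribution, it adapts to overall signal amplitude, whereas the fixed threshold does not.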
Preprocessed EEG and behavioural data files are available in the Open Science Framework Data Repository: https://osf.io/b4qkp/ . All subsequent results shown here are based on these data.

Spectral analysis
Prior to assessing the effect of HGF predictors on "phasic" changes in the time-frequency representations, we determined whether the average spectral power differed between state anxiety and control participants during task performance. To achieve this, we extracted the standard power spectral density (in mV$^2$/Hz) of the raw data within 1-90 Hz and during task blocks TB1 and TB2 (fast Fourier transform, Welch method, Hanning window of 1 s, 75% overlap) and converted it into decibels (dB: $10 \log_{10}$).
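A minimal NumPy sketch of this spectral estimate is given below, assuming a single-channel signal. It is not the code used in the study: the function name is hypothetical, the per-segment mean removal is an added assumption, and library implementations of the Welch method may differ in details such as the exact window definition.

```python
import numpy as np

def welch_psd_db(x, fs, win_s=1.0, overlap=0.75):
    """One-sided Welch power spectral density of a 1-D signal, in dB.

    Uses Hanning windows of win_s seconds with the given fractional overlap.
    Returns (freqs_hz, psd_db).
    """
    x = np.asarray(x, dtype=float)
    nperseg = int(round(win_s * fs))
    step = int(round(nperseg * (1.0 - overlap)))
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)                 # density normalisation
    starts = range(0, len(x) - nperseg + 1, step)
    segs = np.array([x[s:s + nperseg] for s in starts])
    segs = segs - segs.mean(axis=1, keepdims=True)   # ASSUMPTION: remove segment mean
    spec = np.abs(np.fft.rfft(segs * window, axis=1)) ** 2 / scale
    spec[:, 1:-1] *= 2.0                             # fold negative frequencies (even nperseg)
    psd = np.maximum(spec.mean(axis=0), np.finfo(float).tiny)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, 10.0 * np.log10(psd)
```

With a 1 s window the frequency resolution is 1 Hz, so the 1-90 Hz range of interest can be selected with a simple mask such as `(freqs >= 1) & (freqs <= 90)`.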
Standard time-frequency (TF) representations of the continuous EEG data were estimated by convolving the time series with Morlet wavelets. TF spectral power was estimated in the range 4 to 80 Hz, using a higher number of wavelet cycles for higher frequencies. For alpha (8-12 Hz) and beta (13-30 Hz) frequency ranges, we sampled the range 8-30 Hz in bins of 2 Hz, using 5-cycle wavelets shifted every sampled point (Kilner et al., 2005), achieving a good compromise between high temporal and spectral resolution (Litvak et al., 2011; Ruiz et al., 2009). Gamma band activity (31-80 Hz) was also sampled in steps of 2 Hz, using 7-cycle wavelets.
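The wavelet transform described above can be sketched as follows. This is an illustrative NumPy implementation, not the SPM 12 routine used in the study: the function name, the wavelet normalisation and the truncation of each wavelet at four standard deviations are assumptions made for the example.

```python
import numpy as np

def morlet_power(x, fs, freqs, n_cycles):
    """Time-frequency power via convolution with complex Morlet wavelets.

    x must be longer than the longest wavelet (the one at the lowest frequency).
    Returns an array of shape (len(freqs), len(x)).
    """
    x = np.asarray(x, dtype=float)
    power = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2.0 * np.pi * f)       # temporal s.d. of the Gaussian
        half = int(np.ceil(4.0 * sigma_t * fs))      # ASSUMPTION: truncate at 4 s.d.
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / (2.0 * sigma_t ** 2))
        wavelet /= np.abs(wavelet).sum()             # unit gain at the centre frequency
        power[i] = np.abs(np.convolve(x, wavelet, mode="same")) ** 2
    return power

# Alpha-beta range sampled in 2 Hz bins with 5-cycle wavelets
alpha_beta_freqs = np.arange(8, 31, 2)               # 8, 10, ..., 30 Hz
```

Calling `morlet_power(x, fs, alpha_beta_freqs, n_cycles=5)` covers the alpha-beta range; the gamma range would use the same function with `n_cycles=7` and its own frequency grid.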
Following the time-frequency transformation, we modelled the time series using a linear convolution model for oscillatory responses (Litvak et al., 2013). This convolution model was introduced to adapt the classical general linear model (GLM) approach of fMRI analysis to time-frequency data. The main advantage of this approach is that it allows assessing the modulation of neural oscillatory responses on a trial-by-trial basis by one specific explanatory regressor while controlling for the effect of the other regressors included in the model. This control is particularly relevant in the case of stimuli or response events with variable timing on each trial. Convolution modelling of oscillatory responses has been successfully used in EEG (Litvak et al., 2013; Spitzer et al., 2016) and MEG research (Auksztulewicz et al., 2017).
In brief, the convolution GLM approach is an adaptation of the classical GLM, which aims to explain measured signals (BOLD for fMRI or time-domain EEG signals) across time as a linear combination of explanatory variables (regressors) and residual noise (Litvak et al., 2013). In convolution modelling for oscillatory responses, the measured signals are the time-frequency transformation (power or amplitude) of the continuous time series, denoted by matrix $Y$ in the following expression:

$$Y = X\beta + \varepsilon \qquad (3)$$

Here $Y \in \mathbb{R}^{t \times f}$ is defined over $t$ time bins and $f$ frequencies. These signals are explained by a linear combination of $n$ explanatory variables or regressors in matrix $X \in \mathbb{R}^{t \times n}$, modulated by the regression coefficients $\beta \in \mathbb{R}^{n \times f}$. The coefficients $\beta$ must be estimated for each regressor and frequency, using ordinary or weighted least squares.
The convolution modelling approach developed by Litvak et al. (2013) redefines this problem into the problem of finding time-frequency images $R_i$ for a specific type of event $i$ (e.g. outcome or response event type):

$$R_i = B\beta_i \qquad (4)$$

Here, $B$ denotes a family of $m$ basis functions (e.g. sines, cosines) used to create the regressor variables $X$ by convolving the basis functions $B$ with $k$ input functions $U$ representing the events of interest at their onset latencies, and thus $X = UB$. The time-frequency response images $R_i \in \mathbb{R}^{p \times f}$ have dimensions $p$ (peri-event interval of interest) and $f$, and are therefore interpreted as deconvolved time-frequency responses to the event types and associated parametric regressors. It is the images $R_i$ that are used for subsequent standard group-level statistical analysis. For a visual depiction of the convolution modelling of time-frequency responses, see Fig. 2.

Fig. 2. In GLM, signals $Y$ are explained by a linear combination of explanatory variables or regressors in matrix $X$, modulated by the regression coefficients $\beta$, and with an added noise term ($\varepsilon$). Our design matrix $X$ in this example included the following regressors (columns left to right): Outcome Win, Outcome Lose, Outcome No Response, and absolute pwPE on level 2, which were defined over time. Matrix $X$ was specified as the convolution of an impulse response function, encoding the presence and value of discrete or parametric events for each regressor and time bin, and a Fourier basis function (left inset at the bottom). Solving a convolution GLM provides response images (TF estimate in the figure) that are the combination of the basis functions and the regression coefficients $\beta_i$ for a particular regressor type $i$. Thus, convolution GLM effectively estimates deconvolved time-frequency responses (TF estimate, rightmost image at the bottom) to the event types and associated parametric regressors.
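The logic of the convolution GLM can be illustrated with a compact NumPy sketch. It is a toy version under stated assumptions, not the SPM or Spitzer et al. (2016) code: the function names are hypothetical, the peri-event window is assumed to start at event onset, the basis is a small Fourier set, and estimation uses ordinary least squares only.

```python
import numpy as np

def fourier_basis(p, n_harmonics):
    """Basis B (p x m): a constant plus sine/cosine pairs over the peri-event window."""
    t = np.arange(p) / p
    cols = [np.ones(p)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)

def design_matrix(inputs, B):
    """Convolve each input function (length n_t) with each basis function.

    inputs: list of 1-D arrays holding impulses at event onsets; for a
    parametric regressor the impulse height is the regressor value.
    Returns X with shape (n_t, len(inputs) * m).
    """
    n_t = inputs[0].shape[0]
    cols = [np.convolve(u, B[:, j])[:n_t] for u in inputs for j in range(B.shape[1])]
    return np.column_stack(cols)

def convolution_glm(Y, inputs, B):
    """Fit Y = X beta + noise by OLS and return one image R_i = B beta_i per input."""
    X = design_matrix(inputs, B)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)     # shape (len(inputs) * m, f)
    m = B.shape[1]
    return [B @ beta[i * m:(i + 1) * m] for i in range(len(inputs))]
```

Each returned image has shape (p, f): a deconvolved peri-event time course for every frequency, estimated while accounting for the other regressors in the design matrix.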
In our study, we were particularly interested in assessing parametric effects of computational quantities, such as pwPEs and predictions, on the time-frequency representations of the EEG data in each electrode. We implemented convolution modelling by adapting code developed by Spitzer et al. (2016), freely available at https://github.com/bernspitz/convolution-models-MEEG. The total spectral power was first converted to amplitude using a square-root transformation to conform with the GLM error assumptions (Litvak et al., 2013). Our trial-wise explanatory variables included discrete regressors coding for stimuli (blue image, orange image), responses (right, left, no response), outcome (win, lose, no response) and relevant parametric HGF regressors: unsigned HGF model estimates of predictions about the tendency towards a stimulus-reward contingency on level 2 ($|\mu_2|$, hereinafter termed 'predictions') and precision-weighted prediction errors (pwPEs) on that level encoding the magnitude of the update in the beliefs about the reward contingency ($|\varepsilon_2|$, hereinafter termed 'pwPEs'; see Fig. 1B). We selected the absolute value of predictions and pwPEs on level 2 because the sign in these HGF variables is arbitrary: a positive or negative value in pwPEs or predictions does not denote a win or a lose trial (see other HGF work using unsigned HGF variables as regressors, for instance, Auksztulewicz et al., 2017; Stefanics et al., 2018). The absolute values of predictions do, however, represent a prediction about the tendency towards a particular stimulus-reward contingency, and thus the greater the value of $|\mu_2|$, the stronger the expectation that a reward will be received given the correct stimulus choice.
As in our previous work, pwPEs on level 3 ($\varepsilon_3$) updating the log-volatility estimates were excluded from this analysis due to multicollinearity: high linear correlation between $\varepsilon_2$ and $\varepsilon_3$ (for further detail, see Hein et al., 2021). Likewise, trial-wise HGF estimates of absolute predictions about stimulus outcomes were highly linearly correlated with predictions on the third level about volatility, $\mu_3$ (Pearson correlation coefficients ranging from −0.97 to −0.03 across all 42 participants, mean −0.7). As such, we also excluded $\mu_3$ from the analysis (for details on the impact of multicollinearity of regressors on GLMs, see Mumford et al., 2015, and Vanhove, 2020). Another factor informing our decision to choose level 2 over level 3 regressors was that, as shown in simulations in Hein et al. (2021), in the winning model $\omega_2$ can be estimated well, whereas $\omega_3$ cannot (see also Reed et al., 2020). The chosen HGF pwPE and prediction regressors were consistently only weakly correlated (correlation coefficients below 0.25), in line with previous work using HGF quantities as regressors (Auksztulewicz et al., 2017; Iglesias et al., 2013; Vossel et al., 2015).
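A collinearity screen of this kind can be sketched as a pairwise Pearson correlation check across candidate regressors. The example below is only illustrative: the helper `flag_collinear`, the regressor labels and the synthetic trial series are hypothetical, and using 0.25 as a cut-off is an assumption borrowed from the bound reported above rather than a rule stated in the study.

```python
import numpy as np

def flag_collinear(regressors, threshold):
    """Return (name_a, name_b, r) for every pair with |Pearson r| above threshold."""
    names = list(regressors)
    r = np.corrcoef([regressors[name] for name in names])
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                flagged.append((names[i], names[j], float(r[i, j])))
    return flagged

# Synthetic trial-wise series (hypothetical, 400 trials).
trial = np.arange(400)
level2 = np.sin(2 * np.pi * trial / 40)
candidates = {
    "pwPE_level2": level2,
    "pwPE_level3": 2.0 * level2 + 1.0,               # linearly dependent on level 2
    "prediction_level2": np.cos(2 * np.pi * trial / 40),
}
flagged = flag_collinear(candidates, threshold=0.25)  # assumed cut-off
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

In this toy case only the level-2 and level-3 series are flagged, which mirrors the reasoning for dropping one regressor of a highly correlated pair before fitting the GLM.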
Our primary convolution GLM analysis introduced regressor values for pwPEs at the latency of the outcome regressor. This allowed us to assess the parametric effect of pwPEs about stimulus outcomes on the time-frequency responses in a relevant peri-event time interval. Although previous work analysed the effect of pwPEs on neural responses up to 1000 ms, we showed in Sporn et al. (2020) that pwPEs during reward-based learning can modulate neural oscillatory responses in the beta band up to 1600 ms, and these responses are dissociated between anxiety and control groups. The recent studies by Bauer et al. (2014) and Palmer et al. (2019) also showed that the latency of PE and pwPE effects on neural activity can extend up to 2 s. Accordingly, the pwPE convolution model was estimated using a window from −200 to 2000 ms relative to the outcome event, and the statistical analysis focused on the 100-1600 ms interval (see next section).
Concerning the prediction regressor, we considered different time intervals in which we could capture neural oscillatory responses to predictions. This is a challenging task acknowledged before (Diaconescu et al., 2017), as the neural representation of predictions likely evolves gradually from the outcome on the previous trial to the outcome on the current trial. It is thus not expected to be locked to a specific event. This explains why most of the previous work using the HGF framework excluded predictions as a regressor for GLM analysis. Here we followed Auksztulewicz et al. (2017), who analysed predictions locked to the cue, and Palmer et al. (2019), who assessed a wide interval surrounding the movement (response); note that in the Palmer et al. (2019) study, the motor response was the last event in each trial (i.e. there was no additional response feedback). We thus hypothesised that the neural representation of predictions on the reward outcome contingencies could be captured by focusing on two complementary windows of analysis: (i) an interval following the stimulus presentation (stimulus-locked); (ii) an interval preceding the outcome on the current trial (outcome-locked). Unlike the targeted pwPE analysis described above, the analysis of the prediction regressor was exploratory, as we did not have a strong hypothesis regarding which of the two time windows would preferentially reflect prediction-related neural modulations.
To assess the stimulus-locked parametric effect of predictions on the time-frequency responses, we ran a convolution GLM in a time interval from −200 to 2000 ms. For the outcome-locked parametric effect of predictions, the convolution GLM was run from −2500 to 0 ms. This latter interval extended to −2500 ms to allow for the presence of a baseline interval in every trial prior to the preceding stimulus, which we used exclusively for within-subject analyses (see below). Thus, two separate convolution GLMs were run with the prediction regressor modulating neural activity locked to either the stimulus or outcome events. These broad windows were further refined in our statistical analysis (see next section).
In all alpha-beta convolution GLM analyses, discrete and parametric regressors were convolved with a 12th-order Fourier basis set (24 basis functions, 12 sines and 12 cosines), as in Litvak et al. (2013). For convolution models run from −200 to 2000 ms locked to an event type, using a 12th-order basis function set allowed the GLM to resolve modulations in the TF responses up to ∼5.5 Hz (12 cycles / 2.2 s, i.e. a period of ∼183 ms). For the outcome-locked GLM run from −2500 to 0 ms, the 12th-order Fourier basis set resolves frequencies up to ∼5 Hz (12 cycles / 2.5 s). Our choice of a 12th-order set was compatible with the temporal extent of the pwPE and prediction effects on alpha-beta oscillatory activity reported in previous work (from 200-400 ms-long effects in Auksztulewicz et al., 2017, up to 2000 ms-long effects in Palmer et al., 2019). In the case of gamma-band modulation by pwPEs, we considered a higher-order basis function set to allow for potentially faster gamma effects to be resolved. Using a 20th-order Fourier basis set on the gamma-band convolution GLM within −200 to 2000 ms enabled resolving modulations in the TF responses up to ∼9 Hz (20 cycles / 2.2 s, i.e. a period of ∼110 ms).
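The resolution figures quoted above follow from dividing the basis order by the window length. A small sketch of that arithmetic is given below; the helper name `fourier_resolution` is hypothetical, and the window lengths are those stated in the text.

```python
def fourier_resolution(order, window_s):
    """Highest resolvable modulation frequency (Hz) and its period (ms)."""
    max_freq_hz = order / window_s        # `order` full cycles fit in the window
    period_ms = 1000.0 / max_freq_hz
    return max_freq_hz, period_ms

settings = [
    ("alpha-beta, -200 to 2000 ms", 12, 2.2),
    ("alpha-beta, -2500 to 0 ms", 12, 2.5),
    ("gamma, -200 to 2000 ms", 20, 2.2),
]
for label, order, window_s in settings:
    freq, period = fourier_resolution(order, window_s)
    print(f"{label}: up to {freq:.2f} Hz (period {period:.0f} ms)")
```

The design trade-off is that a higher order resolves faster modulations but adds two regressor columns per event type for each extra order, which increases the number of coefficients to estimate.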

Statistical analysis
The time-frequency images (in arbitrary units, a.u.) from the convolution model were subsequently converted to data structures compatible with the FieldTrip Toolbox for statistical analysis. We used permutation tests with a cluster-based threshold correction to control the family-wise error rate (FWER) at level 0.05 (5000 iterations; Maris and Oostenveld, 2007; Oostenveld et al., 2011). These analyses were conducted with spatio-spectral-temporal data, after averaging the time-frequency responses within each frequency band (alpha, beta and gamma ranges). We thus ran the cluster-based permutation tests along the spatial (64 channels), frequency-band (3) and temporal dimensions (FWER-controlled). Importantly, in convolution modelling for oscillatory responses the TF images are usually not baseline corrected as in standard TF analyses (no subtraction or division by the average baseline level). Instead, the baseline activity is estimated, similarly to the post-event activity, taking into account the latency variation of different events in the continuous recording (Litvak et al., 2013). Thus, TF images are not centred at 0 amplitude during the baseline period.
The statistical approach consisted of investigating within-group and between-group effects separately. The within-group analysis used dependent-samples two-sided tests and aimed to assess whether the neural oscillatory responses to the HGF regressors were larger or smaller during a window of interest as compared to a reference (baseline) interval. Next, we separately evaluated between-group effects of HGF regressors on oscillatory responses using one-sided tests (N = 21 Cont, 21 StA). This allowed us to test our hypothesis of increased alpha and beta activity and reduced gamma activity in StA compared to Cont. In the case of two-sided tests, the cluster-based test statistic used the 2.5th and the 97.5th quantiles of the t-distribution as thresholds, whereas we used the 95th quantile of the permutation distribution as the critical value in one-sided tests.
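To make the logic of a one-sided, between-group cluster-based permutation test concrete, the sketch below implements a heavily simplified version over a single time dimension. It is not the FieldTrip procedure used in the study: clustering here is over time only (no channels or frequency bands), 1000 permutations are used instead of 5000, the data are synthetic, and the cluster-forming threshold of 1.684 is the 95th quantile of a t-distribution with 40 degrees of freedom (21 + 21 − 2), chosen here as an assumption.

```python
import numpy as np

def t_values(a, b):
    """Independent-samples t-statistic per time bin (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * a.var(0, ddof=1) + (nb - 1) * b.var(0, ddof=1)) / (na + nb - 2)
    return (a.mean(0) - b.mean(0)) / np.sqrt(pooled * (1 / na + 1 / nb))

def max_cluster_mass(t, threshold):
    """Largest summed t-value over runs of contiguous supra-threshold bins."""
    best = run = 0.0
    for value in t:
        run = run + value if value > threshold else 0.0
        best = max(best, run)
    return best

def cluster_permutation_test(a, b, threshold=1.684, n_perm=1000, seed=0):
    """One-sided test (a > b): compare observed cluster mass with a permutation null."""
    rng = np.random.default_rng(seed)
    observed = max_cluster_mass(t_values(a, b), threshold)
    data = np.vstack([a, b])
    null = np.empty(n_perm)
    for i in range(n_perm):
        order = rng.permutation(len(data))            # shuffle group labels
        perm_a, perm_b = data[order[:len(a)]], data[order[len(a):]]
        null[i] = max_cluster_mass(t_values(perm_a, perm_b), threshold)
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Synthetic demo: 21 participants per group, 60 time bins, effect in bins 40-49.
rng = np.random.default_rng(1)
cont = rng.normal(size=(21, 60))
sta = rng.normal(size=(21, 60))
sta[:, 40:50] += 1.5
mass, p_value = cluster_permutation_test(sta, cont)
print(f"max cluster mass = {mass:.1f}, p = {p_value:.4f}")
```

Because only the maximum cluster mass per permutation enters the null distribution, the resulting p-value accounts for the multiple comparisons across time bins, which is the property exploited for FWER control in the text.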

Analysis of the pwPE regressor
At the within-group level, we assessed the changes in time-frequency activity during the window of interest relative to a baseline period (given independently below) separately in StA and Cont groups ( N = 21 each). For the within-group analysis of the pwPE regressor, we contrasted the time-frequency images between an interval from 100 to 1600 ms post-outcome and a baseline level averaged from − 200 to 0 ms, separately in each group. The 100-1600 ms time window of analysis encompasses the effects from our previous single-trial ERP study ( Hein et al., 2021 ) and our work on the modulation of beta oscillatory responses by pwPEs during motor learning in state anxiety, which revealed effects between 400 and 1600 ms ( Sporn et al., 2020 ). Between-group differences in TF representations of pwPE were separately assessed. This analysis was also conducted within 100-1600 ms. Overall, we controlled the FWER at level 0.05 to deal with the issue of multiple comparisons emerging from the spatial (64) × spectral (3) × temporal dimensions.
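The window-versus-baseline contrast described above can be sketched as follows. This is an illustrative simplification with hypothetical names and synthetic data: it subtracts the mean amplitude in the −200 to 0 ms baseline from each time bin in the 100-1600 ms window, yielding a difference image that a dependent-samples test could then evaluate.

```python
import numpy as np

def window_minus_baseline(tf, times_ms, window, baseline):
    """Subtract the mean baseline amplitude from every time bin in the window."""
    in_window = (times_ms >= window[0]) & (times_ms <= window[1])
    in_baseline = (times_ms >= baseline[0]) & (times_ms <= baseline[1])
    baseline_level = tf[..., in_baseline].mean(axis=-1, keepdims=True)
    return tf[..., in_window] - baseline_level

# Synthetic image (hypothetical): 2 channels, -200 to 2000 ms in 10 ms steps.
times_ms = np.arange(-200, 2001, 10)
tf = np.ones((2, times_ms.size))
tf[:, times_ms >= 100] = 3.0                          # post-outcome amplitude
diff = window_minus_baseline(tf, times_ms, window=(100, 1600), baseline=(-200, 0))
print(diff.shape)                                     # (2, 151)
```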

Analysis of the predictions regressor
Within-group level statistical analysis of the stimulus-locked time-frequency images of the prediction regressor focused on the range 100-1000 ms, relative to an average pre-stimulus baseline level from −200 to 0 ms. This target window for statistical analysis balanced the evidence from previous work (Auksztulewicz et al., 2017; Palmer et al., 2019) and aimed to exclude an overlap with the outcome events, which appeared 1000 ms after the response. In the current study, participants' reaction time was 598 ms on average (SEM 130 ms; minimum RT was ∼300 ms). However, the effects of the response were factored out from the prediction-related oscillatory activity by including the response regressor in the convolution GLM. This was validated in a control analysis that assessed the effect of the response regressor in the same time window, between 100 and 1000 ms stimulus-locked, to confirm independent changes in sensorimotor electrode regions.
Within-group statistical analysis of the outcome-locked effects of predictions was conducted in a similar window 100-1000 ms preceding the outcome event (that is, from − 1000 to − 100 ms before the outcome). Activation in this interval was contrasted to a baseline level of 200 ms, from − 2300 to − 2100 ms. This baseline period was calculated to safely precede stimuli presentation across all trials, during which participants were fixating on a central point on the monitor. As mentioned above, to confirm independent changes in sensorimotor regions in response to the response regressor, we used an identical window in an additional control analysis.
The between-group level stimulus-locked analysis of predictions was conducted within 100-1000 ms. The outcome-locked analysis targeted the interval from − 1000 to − 100 ms, as mentioned above. In all GLM analyses, the FWER was controlled at level 0.05.

Previous results: biases of state anxiety on processing uncertainty
In Hein et al. (2021) we showed that state anxiety (StA) significantly reduced HRV and HF-HRV (0.15-0.40 Hz) relative to the control group (Cont, Fig. 1C). This outcome suggested that our state anxiety manipulation had successfully modulated physiological responses in a manner consistent with changes in state anxiety (Friedman, 2007; Fuller, 1992; Klein et al., 1995; Miu et al., 2009; Pittig et al., 2013). We further showed that state anxiety significantly increased the percentage of errors made during reward-based learning when compared to the control group (Fig. 1D). In parallel to the cardiovascular and behavioural changes induced by the anxiety manipulation, by modelling decisions with the HGF, we found that state anxiety impaired learning. First, we found significantly reduced estimation uncertainty ($\sigma_2$) in StA relative to Cont (Fig. 1E). This bias in StA indicates that new information has a smaller impact on the update of beliefs about the tendency towards a stimulus-reward contingency (level 2). State anxious individuals also exhibited an underestimation of environmental uncertainty when compared with controls (Fig. 1F). However, uncertainty on volatility ($\sigma_3$) increased in StA relative to Cont (Fig. 1G). StA participants also had a lower $\omega_2$ parameter than control participants, which in Hein et al. (2021) was associated in a simulation analysis with the reduced estimation uncertainty in this group. Other model parameters ($\omega_3$, $\zeta$) did not differ between groups. These HGF model-based results were aligned with the results of our separate standard behavioural analysis as mentioned above, demonstrating a significantly higher error rate in StA during reward-based learning performance (Fig. 1D).

General modulation of spectral power
The average raw spectral power during task performance did not differ between state anxiety and control participants (P > 0.05, cluster-based permutation test; Supplementary Fig. 1). Thus, the state anxiety manipulation did not significantly modulate the general spectral profile of oscillatory activity during task performance, as we showed in a recent study (Sporn et al., 2020).

Precision-weighted prediction errors about stimulus outcomes
The overall time course of the parametric modulation of alpha (8-12 Hz) and beta (13-30 Hz) oscillatory activity by pwPEs about stimulus outcomes is displayed in Figs. 3 A and 4 A , respectively. On the within-subject level, there was a significant decrease relative to baseline in alpha and beta activity in the control group (one negative cluster, P = 0.0002, two-sided test, FWER-controlled). The effect was within 600-1400 ms for alpha, and 400-1120 ms for beta activity. No significant clusters were found in the gamma band. The alpha-band effect originated in centro-parietal electrodes and later spread across the whole scalp ( Fig. 3 B ). The beta-band modulation, on the other hand, had a widespread topography and started earlier than the alpha-band effect (at 400 ms; Fig. 4 B ). In the StA group, a negative cluster was also found, corresponding to a decrease from baseline in alpha and beta activity ( P = 0.0054, two-sided test; 600-1000 ms for alpha, 440-1000 ms for beta). The StA alpha-band effect also emerged in centro-parietal electrodes but later shifted to frontocentral electrodes ( Fig. 3 C ). In the beta range, the negative modulation of oscillatory activity in StA had a right frontocentral and left centro-parietal distribution ( Fig. 4 C ).
Complementing the within-subject results, between-group statistical analysis across the alpha, beta and gamma ranges revealed one significant positive cluster in the beta range (between 1200 and 1570 ms, P = 0.027, one-sided test; FWER-controlled). This effect was associated with higher beta activity at left sensorimotor and frontocentral electrodes in StA relative to Cont (Fig. 5A, B). The individual average of beta-band activity in the significant cluster is shown in Fig. 5C. Of note, in StA, a qualitative comparison of the sensorimotor and frontocentral beta activity associated with the significant cluster of the between-group statistical analysis revealed a greater activity increase in the sensorimotor than in the frontocentral electrode region (Fig. 5D). In the control group, the beta response to pwPE decreased in both electrode regions, but the reduction was more pronounced in frontocentral electrodes (Fig. 5E). There were no additional significant clusters associated with between-group differences in the alpha or gamma ranges (see illustration of gamma responses to pwPE in Supplementary Fig. 2).
Because the within-subject and between-group modulation of TF images by the pwPE regressor were limited to the alpha and beta frequency ranges, we performed an additional control analysis to determine the separate effect of the precision weight ($\sigma_2$) and PE ($|\delta_1|$) regressors. Of note, the absolute value of PEs ($|\delta_1|$) is often termed surprise (see e.g. de Berker et al., 2016). As for pwPEs about stimulus outcomes, the sign of $\delta_1$ is not informative and thus a sensible choice is to use the unsigned values (de Berker et al., 2016; Auksztulewicz et al., 2017; Stefanics et al., 2018). This control analysis could determine whether the alpha and beta pwPE effects primarily stem from precision weights modulating lower frequency activity, or rather from a modulation by the surprise experienced by the participants. Moreover, similarly to PEs, surprise about inputs has been shown to correlate with gamma oscillations (Bauer et al., 2014). Thus, the analysis of the $|\delta_1|$ regressor could identify gamma modulation effects that may not be observable in the pwPE analysis. This convolution GLM included both continuous regressors, $\sigma_2$ and $|\delta_1|$, as well as the discrete regressors coding for outcomes. At the between-subject level, we observed a significant increase in beta-band oscillatory responses to surprise about stimulus outcomes in the StA group relative to Cont (one positive cluster within 1380-1600 ms, P = 0.01, one-sided test, FWER-controlled). This effect was distributed across frontocentral and left sensorimotor electrodes (Fig. 6), similarly to the pwPE effects. There was no significant difference between groups in alpha or gamma-band modulation by surprise. Within-subject effects also demonstrated that the absolute PE regressor alone modulated alpha and beta oscillatory activity in each group separately (see details in Supplementary Results and Supplementary Figs. 3 and 4).
Using the precision weights term ($\sigma_2$) as a regressor, the comparison between groups demonstrated a significant positive cluster exclusively in the alpha frequency range (within 1200-1600 ms, P = 0.01, FWER-controlled). The positive cluster was associated with higher alpha activity primarily at central electrodes but also at frontocentral and temporal electrodes in StA relative to Cont (Fig. 7). At the within-subject level, we only observed a negative change in alpha activity to the precision weight regressor in Cont participants (one negative cluster within 1270-1530 ms; P = 0.024, FWER-controlled; see Supplementary Fig. 5).
Lastly, to exclude the possibility that our high-pass filter settings (0.5 Hz) explained the lack of significant modulation effects in the gamma band, we reanalysed the data in four representative participants after applying a 0.1 Hz high-pass filter during pre-processing. This analysis was motivated by studies showing that higher cutoff frequencies for high-pass filters can impact the signal-to-noise ratio (SNR) in general and gamma activity in particular (Bénar et al., 2010; Jas et al., 2018). In brief, using a 0.1 Hz cutoff as opposed to our choice of 0.5 Hz for high-pass filtering did not reveal any prominent gamma modulation by pwPE or surprise/PE regressors (Supplementary Figs. 6, 7), and did not substantially affect the general SNR level in the power spectral density (Supplementary Fig. 8).
Fig. 4 (caption, panels B and C). (B) In Cont participants, beta-band oscillations were significantly modulated relative to a baseline level in one negative cluster spanning 400-1120 ms (P = 0.0002, FWER-controlled due to multiple comparisons arising from testing across space × frequency-band × time dimensions). Left: The topographic distribution of the beta-band effect is widespread across the entire scalp. Right: Time-frequency image for pwPE on level 2, averaged across the significant cluster electrodes. The black dashed line marks the onset of the outcome, and black squares indicate the time-frequency range of the significant cluster. (C) Same as (B) but in the StA group. We found a significant negative cluster across the beta band, with a latency of 440-1000 ms (P = 0.0054, FWER-controlled). The beta modulation started in posterior central electrodes and later spread to frontocentral electrodes. Dashed and continuous black lines denote outcome onset and the extension of the significant cluster in the time-frequency range, as in (B).

Predictions about the stimulus-reward contingency
When assessing within-group level modulations in stimulus-locked oscillatory activity by the prediction regressor, there were no significant effects in either the Cont or the StA group (P > 0.05, FWER-controlled). Between-group statistical analysis revealed that predictions about the tendency towards a certain stimulus-reward contingency are associated with significantly higher levels of beta activity in StA than in Cont across frontocentral and parietal electrodes (one positive cluster in the beta band only, from 200 to 640 ms, P = 0.04, one-sided test, FWER-controlled; Fig. 8A, B). There were no additional significant clusters extending to the alpha range.
The between-group effect of predictions on beta activity was not confounded by any concomitant effect of motor responses on the neural oscillatory responses, as we had included a response regressor in this analysis. A control analysis of the between-group effect of the response regressor on beta activity showed no significant difference between the two groups (see Supplementary Fig. 9A).
[...] parametric effects of predictions on outcome-locked beta activity. State anxious participants exhibited a significant increase from baseline in beta oscillatory activity (one significant positive cluster from −1000 to −468 ms, P = 0.0106, two-sided test, FWER-controlled). This effect peaked at central parietal and left frontocentral electrodes (see Fig. 8D). There were no significant changes from baseline in alpha or beta oscillatory activity for the control group participants. Neither did we find significant between-group differences in outcome-locked alpha or beta activity. As in our stimulus-locked results, the significant outcome-locked increase from baseline in beta oscillatory activity in the StA group was not confounded by motor modulation, as this was included as a separate regressor in the convolution model. A control analysis of the effect of the response regressor on beta activity yielded non-significant changes from baseline in StA (Supplementary Fig. 9B).

Discussion
This study investigated how anxiety states modulate the oscillatory correlates of predictions and prediction errors during the learning of stimulus-reward associations in a volatile environment. The analysis focused on low-level predictions about the tendency of stimulus-outcome contingencies and prediction errors about stimulus outcomes. Because in generative models of the external world precision weights regulate the influence that PEs have on updating predictions (Feldman and Friston, 2010; Friston, 2010), we assessed the neural oscillatory responses to precision-weighted PEs (pwPEs), similarly to Auksztulewicz et al. (2017). We tested this by re-analysing data from our previous study, which investigated Bayesian predictive coding (PC) in state anxiety (Hein et al., 2021). That study showed that anxious individuals overestimate how precise their belief about the stimulus-reward contingency is, attenuating pwPEs on that level and decreasing learning. In the current study, trial-wise model estimates of predictions and pwPEs were used as parametric regressors in a convolution model to explain modulations in the amplitude of oscillatory EEG activity (Litvak et al., 2013).
Fig. 6. Between-group effects of surprise (absolute PEs) on beta oscillatory activity. The time-frequency images representing modulation by the absolute value of PE about stimulus outcomes ($|\delta_1|$) were estimated in a control convolution GLM using two continuous regressors ($|\delta_1|$, $\sigma_2$) and additional discrete regressors coding for outcome events (win, lose, no response). (A, B) Between-group differences in beta oscillatory activity (13- [...]
Consistent with our hypotheses, we found that state anxiety alters the spectral correlates of pwPE and prediction signalling. While pwPEs did not significantly modulate gamma activity as a function of anxiety, they enhanced the amplitude of beta oscillations in state anxiety relative to control participants. This outcome is aligned with our recent findings in temporary anxiety during reward-based motor learning (Sporn et al., 2020). Below we discuss whether this result can be reconciled with hypotheses from generalised PC (Brown and Friston, 2013; Feldman and Friston, 2010) in which attention modulates precision weights on PEs through changes in synaptic gains and lower frequency oscillations (Bauer et al., 2014; Sedley et al., 2016). Our exploratory analysis of the neural representation of predictions suggested that anxiety states enhance beta oscillations during the generation of predictions about the stimulus-reward contingency. This finding should be interpreted with caution, as a between-group difference was observed exclusively in the stimulus-locked analysis, not in the outcome-locked analysis. If validated in future work, this outcome could be an indication that state anxious individuals exhibit a stronger reliance on prior beliefs (Bauer et al., 2014; Sedley et al., 2016), down-weighting the role of PEs in updating predictions and suppressing gamma responses (Bauer et al., 2014). Overall, our results extend computational work on maladaptive learning in anxiety, suggesting that altered beta frequency oscillations may explain impeded reward-based learning in anxiety, particularly in volatile environments (Browning et al., 2015; Piray et al., 2019; Pulcu and Browning, 2019).

Oscillatory correlates of precision-weighted prediction errors in state anxiety
In Hein et al. (2021), a 3-level HGF model best explained learning behaviour. Key findings were that state anxiety decreased the overall learning rate and led to an underestimation of environmental uncertainty and estimation uncertainty about the tendency towards a stimulus-reward contingency. As lower estimation uncertainty (greater precision) drove smaller pwPEs on that level, decreasing learning rates, here we predicted lower gamma activity during processing pwPEs in the state anxiety group. Given that both enhanced gamma and suppressed beta (and alpha) activity have been associated with pwPE during perceptual learning (Auksztulewicz et al., 2017) and with processing unexpected stimuli, we also hypothesised concurrent higher alpha and beta modulation in state anxiety during pwPE signalling. More generally, gamma oscillations are anticorrelated with beta (and alpha) oscillations across the cortex, as shown for sensorimotor processing and working memory (Hoogenboom et al., 2006; Lundqvist et al., 2016, 2018, 2020; Miller et al., 2018; Potes et al., 2014).
Our results provide novel insights into how rhythm-based formulations of (Bayesian) PC, initially proposed for sensory processing, can be extended to learning about changing stimulus-reward associations. Our findings show that unsigned pwPEs about stimulus outcomes decreased alpha and beta activity 400-1000 ms post-outcome, separately in each group, suggesting that attenuation of lower frequency responses is associated with processing pwPEs independently of anxiety. Similar findings were observed when analysing separately the unsigned PEs about stimulus outcomes, representing the surprise experienced by the participants, and after controlling for the concomitant effect of precision weights on the update of beliefs. Subsequently, during 1200-1570 ms, state anxiety relative to controls increased beta responses to pwPEs (and similarly for surprise) in sensorimotor and frontocentral electrode regions. This effect is closely aligned with the effects of state anxiety on beta activity (power and burst events) during processing pwPEs in reward-based motor learning (Sporn et al., 2020). Reduced alpha-beta activity was linked to pwPEs in Auksztulewicz et al. (2017). In addition, beta oscillations have been shown to be involved in updating the content of sensory predictions in auditory processing and visuomotor learning paradigms (Sedley et al., 2016; Tan et al., 2016). This is also in line with our results, as the update steps of beliefs about the tendency of the stimulus-outcome contingency in the HGF are a function of the pwPEs on level 2. Accordingly, the increased beta activity in anxiety during the encoding of pwPEs could reflect smaller updates to predictions, explaining poorer learning in this group. The frontal and sensorimotor distribution of the beta effects, however, should be validated in future work combining EEG/MEG with individual MRI scans to conduct convolution modelling in the individual source space.
While recent studies observed an attenuation of low frequency activity during encoding PEs/pwPEs in perceptual tasks, this effect was paralleled by increased gamma oscillatory activity (Auksztulewicz et al., 2017; Bastos et al., 2020), in line with PC hypotheses. We failed to find any effects of pwPEs or unsigned PEs (surprise) on gamma activity, limiting the interpretation of the results. We outline below different accounts that could partially explain the lack of gamma-band effects in this study.
Hierarchical models of sensory information message-passing propose that suppression of PEs conveyed by gamma oscillations can occur through two main mechanisms: (1) the inhibitory effects of top-down predictions, and (2) postsynaptic gain regulation (Bauer et al., 2014; Brown and Friston, 2013; Larkum et al., 2004). Both mechanisms could partly account for our findings, yet not exclusively. On the one hand, the greater beta activity associated with predictions in state anxiety would convey inhibitory input to superficial pyramidal neurons encoding PEs, decreasing gamma (Bastos et al., 2012; Sedley et al., 2016). On the other hand, the estimation uncertainty $\sigma_2$ is the term modulating PEs (Eq. (2)). Accordingly, the lower $\sigma_2$ in state anxiety would attenuate pwPEs, and the putative associated gamma activity would decline.
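For orientation, the role of $\sigma_2$ as a weight on PEs can be written out as follows. This is a sketch that assumes the standard binary HGF update form on trial $k$; Eq. (2) is not reproduced in this section, so its exact notation may differ.

```latex
\mu_2^{(k)} = \hat{\mu}_2^{(k)} + \underbrace{\sigma_2^{(k)}\,\delta_1^{(k)}}_{\varepsilon_2^{(k)}}
```

Under this form, for a fixed prediction error $\delta_1$, a smaller $\sigma_2$ yields a proportionally smaller pwPE $\varepsilon_2$ and hence a smaller belief update.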
Mechanistically, precision is thought to be encoded via postsynaptic gain, modulated by neurotransmitters and attentional processes (Bauer et al., 2014; Feldman and Friston, 2010; Friston and Kiebel, 2009; Moran et al., 2013). Empirical investigations of sensory PEs implicate alpha and beta oscillations in the encoding of the precision of predictions about upcoming sensory input (Bauer et al., 2014; Palmer et al., 2019; Sedley et al., 2016). Because we investigated biases in learning about stimulus-reward contingencies in anxiety, the relevant precision term in our computational model was $\pi_2$ ($1/\sigma_2$): the precision of the posterior belief about the tendency towards a stimulus-reward contingency. Increased precision $\pi_2$, or reduced estimation uncertainty $\sigma_2$, as we observed in state anxiety, was associated in our control GLM analysis with increases in alpha activity. One possible interpretation of our results is that the enhanced alpha modulation by $\sigma_2$ in StA could decrease synaptic gain, as proposed for attentional alpha (Bauer et al., 2014), thereby dampening the transmission of prediction errors about stimulus outcomes and the associated gamma oscillations.
Importantly, however, our results do not show that state anxiety attenuates gamma oscillatory activity during the encoding of pwPEs or surprise (absolute PEs). Rather, our analysis suggests that, in our paradigm, even in a normative population such as our control group, encoding pwPEs and surprise about stimulus outcomes is not associated with gamma modulation. This outcome was unexpected, as growing evidence indicates that cortical gamma activity is modulated by reward information in different domains beyond perception. Earlier work demonstrated a prominent gamma-band coupling between the frontal cortex and striatum in rats during reward processing and under pharmacological manipulation of dopamine (Berke, 2009). More recently, optogenetic stimulation of dopamine neurons in the rodent ventral tegmental area was shown to increase gamma activity in the medial prefrontal cortex (mPFC; Lohani et al., 2019). The effects were larger on sustained relative to phasic gamma, and it therefore remains unclear whether dopamine in the PFC can provide transient teaching signals about stimulus-outcome contingencies (Ellwood et al., 2017). Yet a recent study in humans demonstrated a role of dmPFC gamma oscillations in the encoding of unsigned reward prediction errors during an exploration-exploitation dilemma (Domenech et al., 2020). Using invasive local field potential (LFP) recordings across the dmPFC and ventromedial PFC, this latter study provided compelling evidence that the rhythm-based PC mechanism proposed for sensory processing can account for decision making during exploration-exploitation behaviour. Because the pwPE and surprise regressors in our model do not directly code reward PEs, it is possible that the lack of gamma effects in our study is due to our choice of experimental task and modelling approach.
On the other hand, the reduced sensitivity of EEG (unlike invasive LFPs) to gamma oscillations may also account for the lack of gamma activity correlates of pwPEs during reward-based learning in our study. Using invasive LFP recordings in humans, when available, could be particularly relevant in future work to inform an extension of rhythm-based proposals of Bayesian PC to more general learning contexts.
More generally, EEG/MEG studies consistently show that frontocentral beta oscillations are modulated by positive reward feedback or predicting cues (Bunzeck et al., 2011; Cunillera et al., 2012; Marco-Pallares et al., 2008). The effects seem to stem from cortical structures linked to the reward-related fronto-subcortical network, such as the PFC (HajiHosseini et al., 2012; Mas-Herrero et al., 2015; O'Doherty, 2004). These studies, however, did not directly model the update of predictions about the stimulus-reward contingency via PEs. Beyond the Bayesian PC interpretations, a common view is that reduced beta activity in the prefrontal, somatosensory, and sensorimotor territories facilitates the encoding of relevant information to shape ongoing task performance (Engel and Fries, 2010; Schmidt et al., 2019; Shin et al., 2017). Accordingly, state anxiety could be more broadly associated with disrupted processing of relevant information through changes in beta oscillations, in line with some of the evidence on EEG markers of social anxiety disorders (Al-Ezzi et al., 2020) and subclinical state anxiety (Sporn et al., 2020). This can also account for the lack of anxiety-related effects on the modulation of EEG signals in the time domain in our previous work (Hein et al., 2021). In that study we observed that pwPEs about the stimulus tendencies modulated the event-related potentials (ERP) exclusively in the control group during ∼400–600 ms. This effect had a similar latency and topography to the P300 ERP components that had been associated with Bayesian surprise or precision in previous computational studies using EEG (Kolossa et al., 2015; Mars et al., 2008; Ostwald et al., 2012).
Although not directly comparable, given that the amplitude of the P300 decreases with increased beta power (Enriquez-Geppert and Barceló, 2018; Polich, 2007), it is possible that the abnormally enhanced amplitude of beta oscillations in state anxiety during the encoding of pwPEs may be paralleled by a reduced pwPE-ERP amplitude, explaining the null results in Hein et al. (2021). Overall, the current results suggest beta oscillations as a candidate marker of biased learning and attenuated belief updating in state anxiety, which could therefore serve as an intervention target in non-invasive brain stimulation, neurofeedback or pharmacological studies.

Biased predictions in state anxiety are associated with enhanced beta oscillations
Capturing neural modulations by predictions is challenging (Diaconescu et al., 2017). The neural representation of predictions could develop anywhere between the previous and current trial's outcome. To address this, we separately analysed oscillatory correlates of predictions about the tendency towards a certain stimulus-reward contingency, both post-stimulus and pre-outcome. Between-group effects were obtained exclusively in the stimulus-locked analysis, corresponding with an increase in beta activity between 200 and 640 ms in the state anxiety group relative to controls, with a widespread topography. This effect was paralleled by a significant beta activity increase in the state anxiety group, yet exclusively in the outcome-locked representation, from −1000 to −500 ms prior to the outcome. The topography of this effect extended across central, parietal, and frontal electrode regions. Our analysis focusing on two different yet dependent windows was exploratory; we did not have a strong hypothesis concerning which time interval would be best suited to assess the effect of anxiety on neural oscillatory correlates of predictions, given the gradual modulation of predictions argued before (Diaconescu et al., 2017). The results are, accordingly, interesting yet preliminary and require validation in future work. Previous studies associated alpha and beta oscillatory power with the encoding of predictions, potentially down-modulating precision weights (Auksztulewicz et al., 2017; Bauer et al., 2014; Sedley et al., 2016). This work, however, focused on sensory predictions and healthy control participants, which leaves open the question of how aberrant affective states may interact with oscillatory correlates of prediction signals. In our study, interpretation of results in healthy controls is limited given the lack of a significant modulation by prediction in this group.
Further investigation is needed to identify the oscillatory responses to prediction and PE signalling in healthy controls, opening up rhythm-based accounts of Bayesian PC to learning stimulus-reward contingencies in volatile environments. Above all, our findings extend recent computational work on learning difficulties in anxiety (Browning et al., 2015; de Visser et al., 2010; Huang et al., 2017; Lamba et al., 2020; Miu et al., 2008; Piray et al., 2019). We propose amplified beta oscillations as one neurophysiological marker associated with impaired reward-based learning and attenuated belief updating in state anxiety.

Declaration of Competing Interest
The authors declare no competing financial interests.