Stress effects on learning and feedback‐related neural activity depend on feedback delay

Received: 21 November 2018 | Revised: 31 July 2019 | Accepted: 2 August 2019 | DOI: 10.1111/psyp.13471

The adaptation of behavior relies on the processing of feedback we receive. We frequently make decisions under stressful conditions, and consequences can occur immediately, but actions can also have consequences that are delayed by seconds, minutes, or even months. Both factors, stress and the timing of feedback, can influence learning from feedback and its neural correlates. While learning from immediate feedback is associated with medial frontal, particularly dorsal anterior cingulate cortex (dACC; Cohen, Elger, & Ranganath, 2007; Gehring & Willoughby, 2002; Haber & Knutson, 2010; Kessler, Hewig, Weichold, Silbereisen, & Miltner, 2016; Peterburs, Kobza, & …), and striatal feedback processing (Foerde & Shohamy, 2011a, but see Dobryakova & Tricomi, 2013), delayed feedback fosters hippocampal involvement in learning (Foerde, Race, Verfaellie, & Shohamy, 2013; Foerde & Shohamy, 2011b), as revealed by functional neuroimaging findings from healthy participants and work in patients suffering from amnesia or Parkinson's disease. These structures are known to be susceptible to the influence of stress (Hermans, Henckens, Joëls, & Fernández, 2014; Joëls, Karst, & Sarabdjitsingh, 2018), which raises the question of how stress affects learning and feedback-related neural processes depending on the timing of feedback.
Differences in the processing of immediate and delayed feedback emerged in the ERP component feedback-related negativity (FRN). The FRN is a negative deflection in the ERP between 220 and 380 ms after feedback presentation that is larger for negative than for positive feedback (Miltner, Braun, & Coles, 1997). The source of the FRN has been located in the dACC (Gehring & Willoughby, 2002; Hauser et al., 2014), while other studies have also found a striatal contribution (Becker, Nitsch, Miltner, & Straube, 2014; Foti, Weinberg, Dien, & Hajcak, 2011). According to a prominent theory, neurons in the dACC are inhibited by bursts of dopaminergic activity following reward or positive feedback but disinhibited by negative feedback, with the dACC integrating these dopaminergic reinforcement signals from the midbrain with information about preceding actions to achieve behavioral adaptation (Holroyd & Coles, 2002). Accordingly, the FRN correlates with trial-by-trial adaptations of behavior and the updating of outcome expectations (Van Der Helden, Boksem, & Blom, 2010). After delayed feedback, the FRN difference between positive and negative feedback is reduced, which is in line with the assumption of a shift away from medial frontal and striatal processing of feedback toward medial temporal engagement in learning (Peterburs et al., 2016; Weinberg, Luhmann, Bress, & Hajcak, 2012; Weismüller & Bellebaum, 2016).
Another event-related potential (ERP) component related to feedback processing is the P300, which is a positive deflection in the ERP between 300 and 500 ms after feedback presentation. It is often larger for positive than for negative feedback and is thought to reflect an integration process of positive outcomes over many trials to maximize future rewards (Bellebaum & Daum, 2008; Bellebaum, Polezzi, & Daum, 2010; Kessler et al., 2016). Other studies, however, have found that the P300 is insensitive to feedback valence but sensitive to the magnitude of an outcome (Foti et al., 2011; Goyer, Woldorff, & Huettel, 2008; Yeung & Sanfey, 2004). One prominent interpretation is the context updating account, which states that the P300 reflects the revision of mental models of the current task (Donchin, 1981; Donchin & Coles, 1988). To interpret the sensitivity of the P300 to outcome magnitude, many authors refer to the motivational salience of a stimulus (Duncan-Johnson & Donchin, 1977; Nieuwenhuis, Aston-Jones, & Cohen, 2005; for reviews see Polich, 2007; San Martín, 2012).
More recently, the analysis of time-frequency dynamics of the EEG has revealed that conflicts and negative behavioral outcomes elicit theta band oscillations over medial frontal electrodes (Cavanagh & Frank, 2014; Cohen, 2014; Cohen, Wilmes, & van de Vijver, 2011). As for the FRN amplitude, frontal theta power increases have been linked to the trial-by-trial adaptation of behavior, predominantly after negative outcomes (Cavanagh, Frank, Klein, & Allen, 2010; van de Vijver, Ridderinkhof, & Cohen, 2011). Functional, temporal, and topographical commonalities of theta power with the FRN following negative outcomes have led to the conclusion that theta oscillations play a central role in the generation of the FRN (Cavanagh, Zambrano-Vazquez, & Allen, 2012; Glazer, Kelley, Pornpattananangkul, Mittal, & Nusslock, 2018). Despite these commonalities, frontal theta oscillations have been shown to make unique contributions to feedback processing. The dACC has been proposed to use oscillations in the theta range for communication with the dorsolateral prefrontal cortex (PFC) to realize behavioral adaptation after negative feedback and for the resolution of conflicts (Cohen, 2014; van de Vijver et al., 2011). In line with this, it has been shown with human intracranial EEG that the dACC generates theta oscillations to recruit the lateral PFC in the implementation of behavioral adaptation (Smith et al., 2015). Overall, theta oscillations appear to reflect a neural process that is not specifically linked to the evaluation of outcome stimuli but more generally to conflict and cognitive control (Cavanagh & Frank, 2014; Cohen, 2014).
The processing of feedback and the adaptation of behavior following feedback are sensitive to modulations by acute stress. Studies using fMRI have reported a reduction in the reward-related activity in the medial PFC (Ossewaarde et al., 2011) and the reward system after stress (Kruse, Tapia León, Stalder, Stark, & Klucken, 2018). A pharmacological study demonstrated that the stress hormone cortisol, which is released from the adrenal cortex as a result of an activation of the hypothalamus-pituitary-adrenal (HPA) axis by a stressor, decreased the neural activity in the reward system and the ACC (Kinner, Wolf, & Merz, 2016). A neuroimaging study has yielded further evidence for the notion that cortisol is a central mediator of stress effects on reward-related neural activity (Oei, Both, van Heemst, & van der Grond, 2014). On the behavioral level, stress has been shown to increase learning from positive feedback (Lighthall, Gorlick, Schoeke, Frank, & Mather, 2013) or decrease learning from negative feedback (Petzold, Plessow, Goschke, & Kirschbaum, 2010), while the overall learning performance was not affected.
Only a few studies have investigated the effects of stress on the electrophysiological correlates of feedback processing so far, and the available evidence is restricted to a modulation of the FRN and theta power. Concerning the FRN, stress was found to increase the amplitude difference between negative and positive outcomes in feedback-based learning tasks (Glienke, Wolf, & Bellebaum, 2015; Wirz, Wacker, Felten, Reuter, & Schwabe, 2017). This finding is consistent with the behavioral effects, as it indicates stronger processing differences between negative and positive feedback under stress. At the same time, the changes in amplitude for positive and negative feedback processing cannot directly be linked to changes in learning from positive and negative feedback, as other processes irrespective of feedback valence also contribute to the result pattern (e.g., Ferdinand, Mecklinger, Kray, & Gehring, 2012), and enhanced differences between negative and positive feedback processing can result from changes for only one type of feedback or both. Other studies using gambling tasks with random feedback and a noise stressor (which occurred in parallel to the task execution) demonstrated decreases of the difference between negative and positive feedback for the FRN. For frontal theta power, previous findings are also inconsistent, with both power enhancements for negative feedback and reduced power differences between negative and positive feedback in the stress condition (Banis, Geerligs, & Lorist, 2014; Banis & Lorist, 2012), possibly mediated by differences in the type and timing of the stressor as well as the type of task. As for the FRN, there is no 1:1 relationship between theta power and behavioral accuracy during learning. For example, enhanced theta after negative feedback may indicate an enhanced tendency for behavioral adaptation, which can, if it is too strong, even be a disadvantage in probabilistic learning tasks.
In the current study, we investigated the effects of stress on learning from and the neural processing of immediate and delayed feedback. To test these effects, participants were subjected to either an acute laboratory stressor (stress group) or a control situation (control group) before they conducted a probabilistic reward learning task with immediate (500 ms) and delayed feedback (6,500 ms). Cortisol concentrations were determined from saliva to capture the stress-induced cortisol reactivity. We examined the effects of stress on feedback-locked ERPs (FRN, P300) and frontal theta power. Moreover, the relationship of the ERPs and theta power to the trial-by-trial subsequent behavioral accuracy was assessed using cross-trial regression analyses.
Based on our previous work with the same stress protocol and related learning paradigms (Glienke et al., 2015; Paul et al., 2018), we hypothesized that stress would increase FRN amplitudes and theta power for negative immediate feedback. Increased FRN amplitudes and theta power were expected to be accompanied by increases in the association of both components with trial-by-trial behavioral accuracy. Due to inconsistencies in previous findings, however, the opposite result pattern is also conceivable (Banis et al., 2014; Banis & Lorist, 2012). For delayed feedback, we expected that stress would enhance the FRN and theta power for negative feedback relative to the control group even more strongly than for immediate feedback. Given that delayed feedback processing recruits the hippocampus (Foerde et al., 2013), which is compromised under stress, and that stress fosters striatal learning based on dopaminergic input (Schwabe & Wolf, 2012), the striatum and dACC may take over in an attempt to compensate for hippocampal dysfunction.
Whether the P300 is modulated by stress and feedback delay is currently unknown. Since the P300 has been linked to reward integration over time, a strong association of P300 amplitudes with accuracy on the single-trial level was not expected.
Finally, to investigate the role of the cortisol reactivity as one important mediator of stress effects on EEG correlates of feedback processing more directly, we performed additional linear mixed effects (LME) analyses. As we had assessed the cortisol level before, during, and after stress induction, we could use the cortisol increase to investigate linear relationships between this response measure and the FRN and P300 amplitudes and frontal theta power for immediate and delayed feedback in participants of the stress group. The LME analyses served to explore to what extent the observed stress effects were related to the effects of stress-induced cortisol increases on feedback processing.

| Participants
Fifty healthy male volunteers between 18 and 35 years of age (mean = 25.3 years, SD = 3.8 years) participated in this study. Prior to testing, participants were screened for the exclusion criteria smoking, previous or current psychiatric or neurological disorders, intake of medication, substance abuse, and a body mass index below 18 or above 29 kg/m². Additionally, they had to be naïve to the stressor (socially evaluated cold pressor test, SECPT, see below). All participants had normal or corrected-to-normal vision.
Participants were randomly assigned to the stress (n = 25) or the control condition (n = 25). Based on their cortisol reactivity (Δ cortisol), participants were classified as stress responders or nonresponders. Participants showing an increase in cortisol concentrations from baseline to peak of 1.5 nmol/l or higher were classified as responders (Miller, Plessow, Kirschbaum, & Stalder, 2013). Since cortisol has been identified as one important mediator of stress effects on reward processing (Kinner et al., 2016; Montoya, Bos, Terburg, Rosenberger, & van Honk, 2014), we excluded eight nonresponders from the stress group and four responders from the control group for inferential statistical analyses that compared the groups directly. For the investigation of the relationship between Δ cortisol and electrophysiological correlates in the stress group by means of LME analyses, we excluded one participant as an outlier for Δ cortisol (35.7 nmol/l) and three participants as outliers for either the FRN, P300, or frontal theta power values (these participants were also excluded from the group comparisons as they were nonresponders).
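The responder criterion described above can be sketched in a few lines of Python. This is only an illustration: the function names and the cortisol values are hypothetical, and the only element taken from the text is the 1.5 nmol/l threshold on the baseline-to-peak increase.

```python
# Minimal sketch of the responder classification (hypothetical names and data).
RESPONDER_THRESHOLD_NMOL_L = 1.5  # criterion stated in the text (Miller et al., 2013)


def cortisol_reactivity(baseline_nmol_l, peak_nmol_l):
    """Delta cortisol: increase from the baseline sample to the peak sample."""
    return peak_nmol_l - baseline_nmol_l


def is_responder(baseline_nmol_l, peak_nmol_l, threshold=RESPONDER_THRESHOLD_NMOL_L):
    """True if the cortisol increase is at least the threshold."""
    return cortisol_reactivity(baseline_nmol_l, peak_nmol_l) >= threshold


# Hypothetical (baseline, peak) pairs in nmol/l, for illustration only.
samples = {"p01": (8.0, 14.2), "p02": (9.1, 9.8), "p03": (7.5, 9.0)}
labels = {pid: is_responder(base, peak) for pid, (base, peak) in samples.items()}
```

In this sketch, a participant whose increase equals the threshold exactly counts as a responder, in line with the "1.5 nmol/l or higher" wording.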
The final sample size for group comparisons was in accordance with the ad hoc power analysis (G*Power, version 3.1.9.4; Faul, Erdfelder, Lang, & Buchner, 2007) that revealed a required total sample size of 38 participants to achieve a power of 1 − β = .95 to detect a 2 (between-subjects factor) by 4 (within-subject factors) interaction effect with an effect size of f = .329 (with α = .05 and an average correlation among repeated measures of r = .1). The effect size was expected based on previously reported stress effects on the FRN (Glienke et al., 2015).
The study was approved by the ethics board of the Faculty of Psychology at Ruhr University Bochum. All participants gave informed written consent before participation and were reimbursed with 12 €.

| Experimental procedure
To control for the diurnal cycle of the endogenous cortisol concentrations (Kalsbeek et al., 2012), testing was conducted in the afternoon between 1 and 7 p.m. Participants were instructed to abstain from alcohol and excessive exercise the day before the testing and to refrain from consuming anything but water for 2 hr before testing.
After their arrival in the lab, participants gave written informed consent and EEG electrodes were prepared (Figure 1a). The first saliva sample (−1 min) and baseline cardiovascular measures were obtained before participants underwent the stress treatment. While the stress group was subjected to the stressful SECPT, the control group was subjected to a control situation. One minute after the stress manipulation, the second saliva sample was collected (+1 min), and post-treatment cardiovascular measures and subjective stress ratings were obtained. Twenty minutes after the stress manipulation, the third saliva sample was collected (+20 min), and participants conducted the reward learning task. After the reward learning task, the last saliva sample was taken (+56 min), and participants were debriefed and reimbursed.

FIGURE 1 (a) Time line of the experiment. After electrode preparation, participants were either subjected to the stressful SECPT or a control procedure. Twenty minutes after the treatment, participants were subjected to the reward learning task. Participants conducted one block with immediate feedback and one block with delayed feedback. Order of the blocks and timing of the onset of the immediate feedback condition were counterbalanced between the participants. Saliva samples were taken at four time points (−1, +1, +20, +56 min). The time is reported relative to the onset of the treatment (SECPT or control procedure). (b) Course of an example trial. Participants had to make a choice between two stimuli within 1,000 ms. The choice was highlighted for 200 ms, followed by a delay period that was 500 ms for the immediate feedback condition and 6,500 ms in the delayed feedback condition. After the delay, feedback was presented for 500 ms. Participants could either receive 20 cents or lose 10 cents. If no response was made within 1,000 ms after stimulus presentation, participants were reminded to respond faster.

| Stress induction and assessment
During the SECPT, participants had to immerse their right hand in ice water (0-2°C) for a maximum of 3 min while they were videotaped. Additionally, an unfamiliar female experimenter instructed and observed the participants. During the control situation, participants immersed their right hand in warm water (35-37°C) and were neither videotaped nor observed by an experimenter.
To assess the effectiveness of the stress induction, subjective and physiological stress measures were obtained. Participants rated the stressfulness, pain, discomfort, and difficulty of keeping the hand immersed on scales increasing in steps of 10 from 0 (not at all) to 100 (very much). Systolic and diastolic blood pressure and the heart rate were obtained before, during, and after the treatment using the Dinamap system (Critikon, Tampa, FL). We obtained three measures of blood pressure and heart rate at each time point and averaged the measures at each time point. Salivettes (Sarstedt, Nümbrecht, Germany) were used to collect saliva at four time points (−1 min, +1 min, +20 min, and +56 min). After testing, saliva samples were stored at −18°C. To determine the cortisol concentrations, saliva was analyzed using a cortisol enzyme-linked immunosorbent assay (Demeditec, Kiel, Germany) with intra-assay coefficients of variation (CV) below 5% and interassay CVs below 15%.

| Reward learning task
During the probabilistic reward learning task that was adapted from a previous study (Figure 1b; Weismüller & Bellebaum, 2016), participants learned to make beneficial choices between stimuli based on immediate and delayed monetary feedback. In each trial, participants made a choice between two Japanese characters that appeared on the left and right side of a computer screen and received 20 cents or lost 10 cents for their choice.
Participants conducted two blocks of 100 trials. In one block, participants received immediate feedback (500 ms), while in the other block feedback was delayed by 6,500 ms. The order of the blocks was counterbalanced between participants. Ten Japanese characters were used as stimuli, five of which were used in the block with immediate feedback and five in the block with delayed feedback (counterbalanced between participants). In both blocks, each of the five Japanese characters was associated with a fixed reward probability of 0%, 20%, 40%, 60%, or 80%. Each of the ten possible stimulus combinations was presented ten times within one block, with the assignment of stimulus to the side on the computer screen counterbalanced between trials.
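The block structure described above (five stimuli, ten pairings, ten presentations each, 100 trials) can be illustrated with a short Python sketch. The stimulus labels, the alternating left/right assignment, the shuffling, and the function names are assumptions made for illustration; the text only states the reward probabilities, the number of repetitions, the payoffs, and that screen side was counterbalanced.

```python
import itertools
import random

REWARD_PROBABILITIES = [0.0, 0.2, 0.4, 0.6, 0.8]  # one per stimulus, from the text
REPETITIONS_PER_PAIR = 10                         # from the text


def build_block(stimuli, rng):
    """Return 100 shuffled trials covering all 10 stimulus pairs 10 times each."""
    trials = []
    for a, b in itertools.combinations(range(len(stimuli)), 2):
        for rep in range(REPETITIONS_PER_PAIR):
            # Assumption: alternate the screen side so each stimulus of a pair
            # appears equally often on the left and on the right.
            left, right = (a, b) if rep % 2 == 0 else (b, a)
            trials.append({
                "left": stimuli[left],
                "right": stimuli[right],
                "p_left": REWARD_PROBABILITIES[left],
                "p_right": REWARD_PROBABILITIES[right],
            })
    rng.shuffle(trials)  # assumption: trial order is randomized
    return trials


def draw_feedback(p_reward, rng):
    """Return +20 cents with probability p_reward, otherwise -10 cents."""
    return 20 if rng.random() < p_reward else -10


# Hypothetical stimulus labels standing in for the Japanese characters.
block = build_block(["S1", "S2", "S3", "S4", "S5"], random.Random(0))
```

With five stimuli there are exactly ten unordered pairs, so ten repetitions per pair yield the 100 trials per block reported in the text.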
Participants had to press the left or right Ctrl key on a standard computer keyboard to choose the left or right stimulus within 1,000 ms after stimulus presentation. The choice was highlighted for 200 ms. After that, a fixation cross was presented for either 500 ms (immediate feedback condition) or 6,500 ms (delayed feedback condition) before the feedback was presented for 500 ms. The length of the intertrial interval was jittered between 500 ms and 1,000 ms.
While the delayed feedback block lasted 18 min, the immediate feedback block took 8 min to complete. Since the timing of a task relative to the stressor is a crucial factor for the influence of stress on learning and PFC functioning (Arnsten, 2009; Pabst, Brand, & Wolf, 2013; Schwabe & Wolf, 2013; for a review, see Joëls, Fernandez, & Roozendaal, 2011), we minimized the differences in the timing of the task relative to the stressor between the blocks by varying the onset of the immediate feedback block between participants (Figure 1a). While the delayed feedback block started 20 min (delayed feedback first) or 38 min (immediate feedback first) after the stressor, the onset of the immediate feedback block was randomized between 20, 25, and 30 min after the stressor (when the immediate feedback was first) or between 38, 43, and 48 min after the stressor (when delayed feedback was first). During breaks between the blocks, participants remained seated in the EEG chamber and rested.
The response accuracy was determined separately for the immediate and delayed feedback condition. In line with previous studies (Foerde et al., 2013; Weismüller & Bellebaum, 2016), responses were considered correct when participants chose the stimulus with the higher reward probability. To analyze differences between groups and conditions, the percentage of correct responses was determined.

| EEG recording and data processing
EEG was recorded from 30 passive Ag/AgCl electrodes, which were mounted on the head in an elastic cap (EasyCap, Herrsching, Germany). Electrodes were distributed according to the 10-20 system. Data were digitized at 500 Hz with a time constant of 10 s by a 32-channel BrainAmp Standard AC amplifier (Brain Products, Gilching, Germany). Participants were grounded by an electrode at the FPz position, and linked mastoid electrodes served as the reference. Impedances were kept below 10 kΩ. EEG data were analyzed using the FieldTrip toolbox (Oostenveld, Fries, Maris, & Schoffelen, 2011) and MATLAB R2016a (The MathWorks Inc., Natick, MA). Continuous data were segmented from 1,500 ms before to 3,000 ms after feedback presentation and filtered with a 0.5 Hz high-pass, zero-shift Butterworth IIR filter and a 48-52 Hz band-stop filter for the elimination of line noise. Eyeblinks were removed using an independent component analysis. One component with a symmetrically frontal, positive topography was identified and removed from the data for each participant before the data were back-transformed. After the eyeblink correction, segments with residual artifacts, such as muscle artifacts and sharp edges, were removed by careful visual inspection.
For the ERP analyses, an additional 20 Hz low-pass, zero-shift Butterworth IIR filter was applied to the data. Averages were calculated for the four task conditions (immediate positive feedback, immediate negative feedback, delayed positive feedback, delayed negative feedback). Averages contained a mean of 43.5 (SEM = 1.0) immediate positive feedback trials, 50.7 (0.9) immediate negative feedback trials, 43.8 (1.2) delayed positive feedback trials, and 47.9 (1.7) delayed negative feedback trials. Afterward, the averages were baseline corrected using a −200 to 0 ms prefeedback baseline.
Time windows for the quantification of the ERP components were determined from the average ERP of all trials and all participants (across all experimental factors). The FRN was defined as the mean amplitude between 215 and 315 ms relative to the feedback onset at electrode FCz. The time window was centered at the latency of the FRN peak in the average ERP. The P300 was defined as the local maximum between 300 and 500 ms after feedback onset at electrode FCz.
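The two component definitions above (mean amplitude in a window for the FRN, local maximum in a window for the P300) can be sketched as follows. The function names and the synthetic waveform are hypothetical and serve only to show the two measures; the windows are those given in the text, and the original analysis was carried out in FieldTrip/MATLAB, not with this code.

```python
import math


def mean_amplitude(times_ms, erp_uv, t_min_ms, t_max_ms):
    """Mean of all samples whose latency lies within [t_min_ms, t_max_ms]."""
    values = [v for t, v in zip(times_ms, erp_uv) if t_min_ms <= t <= t_max_ms]
    return sum(values) / len(values)


def local_maximum(times_ms, erp_uv, t_min_ms, t_max_ms):
    """Largest sample in the window that exceeds both neighbours.

    Returns (latency_ms, amplitude) or None if the window has no local peak.
    """
    best = None
    for i in range(1, len(erp_uv) - 1):
        in_window = t_min_ms <= times_ms[i] <= t_max_ms
        is_peak = erp_uv[i] > erp_uv[i - 1] and erp_uv[i] > erp_uv[i + 1]
        if in_window and is_peak and (best is None or erp_uv[i] > best[1]):
            best = (times_ms[i], erp_uv[i])
    return best


# Synthetic ERP at FCz for illustration: 500 Hz sampling (2 ms steps) from
# -200 to 800 ms, a negative deflection near 265 ms and a positive one at 400 ms.
times_ms = [-200 + 2 * i for i in range(501)]
erp_uv = [
    -4.0 * math.exp(-(((t - 265) / 40) ** 2)) + 8.0 * math.exp(-(((t - 400) / 60) ** 2))
    for t in times_ms
]

frn = mean_amplitude(times_ms, erp_uv, 215, 315)   # FRN window from the text
p300 = local_maximum(times_ms, erp_uv, 300, 500)   # P300 window from the text
```

Requiring a true local peak, rather than simply taking the largest value in the window, avoids picking a window edge when the waveform is still rising or falling there.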
To obtain the time-frequency spectra, data were convolved with a series of 59 linearly spaced complex Morlet wavelets ranging from 1 to 30 Hz. The wavelets each had a width of 5 cycles, resulting in a σ of 132.6 ms at 6 Hz, which was at the center of the frequency band of interest (4-8 Hz). Power spectra were averaged over segments of the immediate positive feedback, immediate negative feedback, delayed positive feedback, and delayed negative feedback conditions, respectively. Afterward, the relative signal change was calculated with respect to the −400 to −100 ms prefeedback baseline. To compare the theta power between groups and conditions, we averaged the power between 200 and 600 ms after feedback onset and between 4 and 8 Hz.
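The wavelet parameters stated above can be checked with a small calculation. The sketch assumes the common convention that the temporal standard deviation of a Morlet wavelet is σ_t = n / (2πf) for a width of n cycles at frequency f; the function names are hypothetical, and the relative-change formula (power minus baseline, divided by baseline) is the usual definition of a relative signal change, stated here as an assumption.

```python
import math

N_CYCLES = 5  # wavelet width from the text

# 59 linearly spaced frequencies from 1 to 30 Hz, i.e., steps of 0.5 Hz.
frequencies_hz = [1.0 + 0.5 * i for i in range(59)]


def temporal_sigma_s(freq_hz, n_cycles=N_CYCLES):
    """Temporal standard deviation of the wavelet in seconds (assumed convention)."""
    return n_cycles / (2 * math.pi * freq_hz)


def relative_change(power, baseline_power):
    """Relative signal change with respect to the mean baseline power (assumption)."""
    return (power - baseline_power) / baseline_power


sigma_ms_at_6_hz = temporal_sigma_s(6.0) * 1000  # about 132.6 ms, as in the text
```

Under this convention the value at 6 Hz reproduces the 132.6 ms reported in the text, and the wavelets become shorter in time as frequency increases.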
To assess the relationship between the ERP and the time-frequency power on the one hand and trial-by-trial behavioral adaptation on the other hand, we performed cross-trial regression analyses. The aim of the analysis was to investigate whether the FRN and the frontal theta power of the current trial were related to the accuracy of the subsequent trial in which the chosen stimulus was presented again. We focused on the ERPs and time-frequency power after negative feedback in this analysis, since a link between FRN and theta power and the adaptation of behavior has been shown for negative feedback trials (Cavanagh et al., 2010; van de Vijver et al., 2011; Van Der Helden et al., 2010). Similar to previous studies (Cohen, 2016), the EEG data of each trial and at each time or time-frequency point at electrode FCz were projected onto a design matrix that comprised one column for the intercept and one column containing the accuracy of the subsequent trial in which the chosen stimulus was presented again. The time series or time-frequency power (y) and the design matrix (X) were subjected to the least squares equation $\beta = (X^{T}X)^{-1}X^{T}y$, where $X$ is the design matrix, $y$ is the data matrix, $^{T}$ denotes the transpose, and $^{-1}$ denotes the inverse of a matrix. The least squares equation was solved using the mldivide function in MATLAB, which returns the least squares solution to linear systems of the form Ax = B. As a result of this analysis, we obtained a time series or time-frequency map of β coefficients per condition for each participant. The β coefficients, which describe the relationship of the data (time series and time-frequency power) and the design matrix (accuracy), were z transformed afterward. Subsequently, time-frequency z values were averaged over the theta band (4-8 Hz) and between 200 and 600 ms after feedback onset.
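For a design matrix with only an intercept column and one accuracy column, the least squares equation above has a simple closed form, which the following Python sketch uses. It is not the authors' MATLAB code: the function names and the toy data are hypothetical, and the z transformation described in the text is not included.

```python
def least_squares_two_column(accuracy, y):
    """Solve beta = (X'X)^-1 X'y for X = [1, accuracy]; returns (intercept, slope)."""
    n = len(y)
    sum_x = sum(accuracy)
    sum_y = sum(y)
    sum_xx = sum(a * a for a in accuracy)
    sum_xy = sum(a * v for a, v in zip(accuracy, y))
    det = n * sum_xx - sum_x * sum_x  # determinant of X'X
    if det == 0:
        raise ValueError("accuracy has no variance; the slope is not identifiable")
    slope = (n * sum_xy - sum_x * sum_y) / det
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


def regress_over_time(accuracy, trials):
    """Return the accuracy slope at every time (or time-frequency) point.

    trials is a list of single-trial data vectors with one value per point.
    """
    n_points = len(trials[0])
    return [
        least_squares_two_column(accuracy, [trial[t] for trial in trials])[1]
        for t in range(n_points)
    ]


# Toy data: accuracy (0/1) on the next presentation of the chosen stimulus and
# single-trial amplitudes at two time points (hypothetical values).
next_accuracy = [0, 1, 1, 0, 1]
single_trial_eeg = [[2.0, 1.0], [5.0, 1.0], [5.0, 1.0], [2.0, 1.0], [5.0, 1.0]]
betas = regress_over_time(next_accuracy, single_trial_eeg)
```

In the toy data, the first time point differs by 3 units between subsequently correct and incorrect trials, while the second is constant, so the slopes are 3 and 0, respectively.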

| Statistical analyses
Subjective stress ratings and cardiovascular measures were analyzed using multivariate analyses of variance (MANOVA) with the between-subjects factor group (stress, control). The analysis of the cardiovascular measures additionally included the within-subject factor time (pretreatment, during, post-treatment).
Differences in salivary cortisol concentrations were tested using repeated measures ANOVAs with the within-subject factor time (−1 min, +1 min, +20 min, +56 min) and the between-subjects factor group (stress, control).
Accuracy and the amplitudes of the FRN and P300 were analyzed with repeated measures ANOVAs with the between-subjects factor group (stress, control) and the within-subject factor feedback delay (immediate feedback, delayed feedback). The analysis of the FRN and P300 additionally included the within-subject factor feedback valence (positive feedback, negative feedback).
| 7 of 18 PAUL et AL.

Significant interactions were resolved using post hoc t tests and repeated measures ANOVAs. Post hoc tests were corrected for multiple comparisons using the false discovery rate (FDR) correction (Benjamini & Hochberg, 1995). In all cases of violations to the sphericity assumption, the Greenhouse-Geisser correction was applied and ε values are reported. If unequal variances were detected, degrees of freedom of the t tests were corrected accordingly. The α level of .05 was applied to all parametric tests. Partial eta-squared values are reported as estimates of effect sizes of the MANOVAs and ANOVAs. Effect sizes of pairwise comparisons are reported as Cohen's d.
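The FDR correction cited above (Benjamini & Hochberg, 1995) follows a step-up rule that can be sketched briefly. The function name and the p values below are hypothetical and only illustrate the procedure; they are not results from this study.

```python
def fdr_bh(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns one boolean per p value (in the original order) indicating
    whether the corresponding null hypothesis is rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            max_rank = rank
    # Reject all hypotheses with a rank up to and including that k.
    reject = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            reject[index] = True
    return reject


# Hypothetical p values from four post hoc tests.
decisions = fdr_bh([0.030, 0.001, 0.200, 0.012])
```

Because the rule searches for the largest qualifying rank, a p value can be declared significant even if it would fail a Bonferroni threshold, as long as enough smaller p values accompany it.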
Frontal theta power was analyzed with the factors group, feedback delay, and valence. Since theta power increases are often observed over medial and lateral frontal electrodes for negative feedback (van de Vijver et al., 2011), theta power was analyzed for all electrodes to determine the topographical specificity of the observed effects. The statistical analysis of the time-frequency data relied on nonparametric cluster-based permutation statistics to correct for the accumulation of alpha errors in multiple comparisons (Maris & Oostenveld, 2007). First, coherent spatial clusters of electrodes exceeding the statistical threshold of α < .05 were detected, and summed t values of each cluster were returned as the test statistic. Subsequently, at each of 1,000 iterations during the permutation test, the group affiliations (stress, control) of a random subset of participants were swapped and the first step was repeated to create a null distribution. The test statistic was then compared to the null distribution. For each cluster reaching the cluster-based threshold, summed t values (t_sum) and cluster p values are reported. For the statistical analysis of the regression analysis between the time-frequency spectrum and accuracy, a null distribution was created as described before, which was used to z transform the beta coefficients. P values were determined from z values averaged over the theta band (4-8 Hz) and over the time window of 200-600 ms after feedback presentation.
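The permutation logic described above can be illustrated with a deliberately simplified Python sketch. It permutes group labels for a single value per participant (for example, theta power at one electrode) and omits the spatial clustering and summed t values of the actual cluster-based procedure; the function names, the pooled-variance t statistic, and the data are assumptions made for illustration.

```python
import math
import random


def t_statistic(a, b):
    """Independent-samples t value with pooled variance."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))


def permutation_p_value(a, b, n_permutations=1000, seed=0):
    """Two-sided p value from a null distribution built by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    n_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign participants to groups at random
        permuted = abs(t_statistic(pooled[:len(a)], pooled[len(a):]))
        if permuted >= observed:
            n_extreme += 1
    # Count the observed statistic as part of the null distribution.
    return (n_extreme + 1) / (n_permutations + 1)


# Hypothetical per-participant values for two groups.
control = [1.0, 1.2, 0.9, 1.1, 1.3, 1.05]
stress = [0.2, 0.1, 0.3, 0.0, 0.25, 0.15]
p_value = permutation_p_value(control, stress)
```

In the full cluster-based approach, the statistic entered into the null distribution at each iteration would be the summed t value of the largest cluster rather than a single t value.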
Finally, we directly examined the effect of cortisol reactivity (Δ cortisol), which was defined as the increase in cortisol concentrations from baseline (−1 min) to peak (+20 min), and its interactions with the effects of feedback delay and valence on neural feedback processing in participants assigned to the stress group by means of LME analyses. LME analyses were performed by using the lme4 statistical package (version 1.1-18) in the R environment (version 3.5.1). The LME analyses were conducted for 21 participants, including five nonresponders with very low cortisol increases or even cortisol decreases. Separately for the FRN, the P300, and the frontal midline theta power, we specified a model that included the categorical factors feedback delay (recoded as +1 = immediate, −1 = delayed feedback) and feedback valence (recoded as +1 = positive feedback, −1 = negative feedback), and the continuous factor Δ cortisol (mean-centered) as fixed effects predictors. We also modeled all the interactions between these factors. Participants were entered into the model as a random effects factor. Following the approach suggested by Luke (2017), we used the restricted maximum likelihood approach to estimate the model and the R package lmerTest (version 3.0-1; Kuznetsova, Brockhoff, & Christensen, 2017) to evaluate significance in the model by using the Satterthwaite approximation for the degrees of freedom. Significant interactions were examined by applying follow-up simple slope analyses using the R package jtools (version 0.7.3).
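Written out, the model described above may take the following form. This is a sketch under the assumption that participants entered the model as a random intercept only; the symbols are introduced here for illustration, with $y_{ij}$ the dependent measure (FRN, P300, or theta power) of participant $j$ in condition $i$, $D_{ij}$ and $V_{ij}$ the effect-coded feedback delay and valence ($\pm 1$), and $C_j$ the mean-centered Δ cortisol.

```latex
y_{ij} = \beta_0 + \beta_1 D_{ij} + \beta_2 V_{ij} + \beta_3 C_j
       + \beta_4 D_{ij} V_{ij} + \beta_5 D_{ij} C_j + \beta_6 V_{ij} C_j
       + \beta_7 D_{ij} V_{ij} C_j + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

With effect coding and a mean-centered covariate, each lower-order coefficient can be read as an effect averaged over the levels of the other factors and evaluated at the mean cortisol increase.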

| Subjective stress response
The SECPT successfully elicited a subjective stress response (see Table 1

| Behavior
To assess the influence of stress on the performance in the reward learning task, we determined the accuracy (Figure 3) during the learning from immediate and delayed feedback.
The analysis revealed a Feedback Delay × Group interaction: F(1, 36) = 5.21, p = .024, ηp² = .13. Exploring the interaction with FDR-corrected within-group comparisons, we found more correct responses in the delayed feedback condition compared to the immediate feedback condition in the stress group, t(16) = 2.77, p = .014, d = 0.97, while no difference in accuracy between conditions was detected for the control group, t(20) = 0.51, p = .614, d = 0.13.

| Stress and feedback-delay modulations of neural feedback processing
To test whether stress and feedback delay affected neural feedback processing, we analyzed the FRN, the P300, and frontal midline theta power. Additionally, cross-trial regression analyses were conducted to assess the relationship of the ERPs and frontal theta power with the subsequent behavioral accuracy on the single-trial level.

FIGURE 2 Salivary cortisol concentrations are depicted relative to the time of the onset of the treatment (SECPT or control treatment). While there were no differences in salivary cortisol between the groups at baseline (−1) and 1 min after the treatment (+1), cortisol was elevated in the stress group 20 min after the SECPT (+20) and after the reward learning task (+56). Error bars represent the SEM. *p < .05

FIGURE 3 Accuracy in percent correct responses, averaged over the immediate and delayed feedback condition and for the control and the stress group. Accuracy in the stress group was relatively reduced in the immediate feedback condition, while it was larger than in the control group in the delayed feedback condition. The asterisk represents the significant Group × Feedback Delay interaction (p = .024). Error bars represent the SEM

| Frontal midline theta power
Cluster-based permutation tests were applied to investigate the effects of stress and feedback delay on the frontal midline theta power (4-8 Hz, 200-600 ms postfeedback). A cluster of electrodes was detected that demonstrated a Group × Feedback Delay × Valence interaction effect, t_sum(36) = 45.99, p = .004. This cluster also included the midfrontal electrode FCz, t(36) = 2.38, p = .004. Further permutation tests revealed that both groups showed stronger theta power for negative compared to positive feedback after immediate feedback (control: t_sum(20) = 50.12, p = .006; FCz: t(20) = 3.99, p = .006; stress: t_sum(16) = 16.38, p = .048; FCz: t(16) = 2.60, p = .048). This theta power difference between negative and positive feedback was larger in the control group compared to the stress group for immediate feedback (Figure 5a,b; Group × Valence interaction: t_sum(36) = 27.32, p = .014). The theta power increase in the control group was detected at a cluster of electrodes with a lateral frontal distribution that did not include the FCz, t(36) = 1.68, p > .9. In the delayed feedback condition (Figure 5c,d), no difference between the groups was detected for the contrast between negative and positive feedback (Group × Feedback Valence interaction: at all electrodes t ≤ 1.42, and all ps > .9).

FIGURE 4 Results from the ERP analysis. (a) Grand averages are presented for the immediate and delayed feedback conditions and for the control and the stress group. Shaded areas represent time intervals used for the mean amplitude of the FRN and peak detection of the P300. (b) Average amplitudes of the FRN (upper) and the P300 (lower) are depicted. The FRN was reduced in the stress group compared to the control group and in the delayed feedback compared to immediate feedback condition. In the delayed feedback condition, the P300 was larger for positive compared to negative feedback in the control group. The reduction of the FRN in the stress group tended to be larger in the delayed feedback condition. In the stress group, the P300 for positive feedback was reduced and did not differ between positive and negative feedback. There was no effect of group or valence on the P300 in the immediate feedback condition. Error bars represent the SEM
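The cluster-level sum statistics reported for the theta analysis can be understood with a small sketch of how a cluster sum is formed: electrode-wise t-values above a threshold are grouped with their supra-threshold neighbours, and the t-values within each group are added. The sketch below is a simplified illustration under stated assumptions, not the pipeline used in the study. The electrode t-values, the adjacency map, the threshold, and the function name `cluster_sums` are all hypothetical, only positive-going effects are considered, and the permutation step (comparing the largest observed sum against sums obtained after shuffling condition or group labels) is omitted.

```python
def cluster_sums(t_values, neighbors, threshold):
    """Sum t-values within each connected set of supra-threshold electrodes."""
    above = {ch for ch, t in t_values.items() if t > threshold}
    seen = set()
    sums = []
    for start in sorted(above):
        if start in seen:
            continue
        # Depth-first search over neighbouring supra-threshold electrodes.
        stack, members = [start], []
        seen.add(start)
        while stack:
            ch = stack.pop()
            members.append(ch)
            for nb in neighbors.get(ch, ()):
                if nb in above and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sums.append((sorted(members), sum(t_values[m] for m in members)))
    return sums


# Hypothetical electrode-wise t-values and a toy midline adjacency map.
t_values = {"Fz": 2.5, "FCz": 2.4, "Cz": 1.0, "Pz": 2.2, "Oz": 0.3}
neighbors = {
    "Fz": ["FCz"],
    "FCz": ["Fz", "Cz"],
    "Cz": ["FCz", "Pz"],
    "Pz": ["Cz", "Oz"],
    "Oz": ["Pz"],
}
clusters = cluster_sums(t_values, neighbors, threshold=2.0)
print(clusters)
```

In this toy example Fz and FCz form one cluster because they are adjacent and both exceed the threshold, while Pz forms a cluster of its own because its only neighbours fall below the threshold.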

| Cross-trial regression analyses of midfrontal EEG components and accuracy
To investigate whether the observed differences in midfrontal EEG components (FRN, theta power) were related to the behavioral accuracy on the single-trial level, we performed regression analyses between each time point or time-frequency point at midfrontal electrode FCz after negative feedback, and the accuracy in the subsequent trial in which the chosen stimulus of the current trial could be chosen again.
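The logic of such a cross-trial regression can be sketched as follows: for each time point, the single-trial amplitude after negative feedback is used to predict accuracy on the next trial with the same stimulus, which yields one coefficient per time point. The sketch is a minimal illustration under stated assumptions. Ordinary least squares is used only for brevity, since the regression model is not specified in this section, and the data, the array layout, and the function names `slope` and `cross_trial_betas` are hypothetical.

```python
from statistics import mean


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def cross_trial_betas(amplitudes, next_accuracy):
    """One slope per time point; amplitudes is indexed as [trial][time]."""
    n_times = len(amplitudes[0])
    return [
        slope([trial[t] for trial in amplitudes], next_accuracy)
        for t in range(n_times)
    ]


# Hypothetical single-trial amplitudes at FCz (4 trials x 3 time points,
# arbitrary units) and accuracy in the subsequent trial (1 = correct).
amplitudes = [
    [1.0, -1.0, 0.5],
    [2.0, -3.0, 1.5],
    [3.0, -2.0, 0.5],
    [4.0, -4.0, 1.5],
]
next_accuracy = [0, 1, 0, 1]
betas = cross_trial_betas(amplitudes, next_accuracy)
print(betas)
```

For a negative-going component such as the FRN, a negative coefficient at a given time point means that more negative amplitudes go along with higher subsequent accuracy.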
We found a significantly stronger relationship between the ERP and subsequent accuracy within the time range of the FRN (286-308 ms) in the control group compared to the stress group for immediate feedback (Figure 6a, z = −2.16, p = .031). A stronger relationship indicated that larger FRN amplitudes were related to better subsequent performance. The groups did not differ in the delayed feedback condition (Figure 6b, 215-315 ms, z = −0.16, p = .87). The group difference for immediate feedback tended to be larger compared to delayed feedback (Group × Feedback Delay interaction at time interval 234-248 ms: z = −1.79, p = .073).
The frontal theta power (4-8 Hz, 200-600 ms postfeedback) had a positive relation with subsequent accuracy (Figure 7) depending on feedback delay and group (Group × Feedback Delay interaction: z = 2.47, p = .014). Theta power and subsequent accuracy had a positive relationship in controls for immediate feedback that was reduced in stressed participants (z = 2.27, p = .023). For delayed feedback, the relationship between theta and behavior did not differ between groups (z = 0.83, p = .408).

FIGURE 5 Time-frequency plots and topographical maps of the difference between negative and positive feedback (negative-positive) are shown for the control group, the stress group, and the difference between groups (control-stress). Time-frequency maps show the relative power changes averaged over all electrodes included in a significant cluster. If no significant effect was observed, time-frequency power of electrode FCz is depicted. Topographical maps show the relative power change averaged over the theta band (4-8 Hz) and over the 200-600 ms postfeedback interval. Significant electrode clusters are highlighted with filled circles. Bar graphs show relative theta power changes (4-8 Hz, 200-600 ms postfeedback) at electrode FCz separately for positive and negative feedback. (a, b) Results of the immediate feedback condition. (c, d) Results of the delayed feedback condition. Results show a decrease in frontal theta power (4-8 Hz and 200-600 ms postfeedback) after stress in immediate feedback. In delayed feedback, frontal theta power is diminished and does not differ between groups

… (Figure 8c), and Δ cortisol depending on feedback delay and valence, which we examined with LME analyses. Online supporting information, Tables S1A, S1B, and S1C, provides a summary of the estimated mixed-effect models, with parameter-specific t tests for all effects for the FRN, the P300, and theta power, respectively.
The description of the results below will be restricted to effects involving the factor Δ cortisol.

| Feedback-related negativity
The analysis for the FRN revealed that the Feedback Delay × Δ Cortisol interaction was significant, F(1, 57) = 14.573, p < .001. Follow-up simple slope analysis of this interaction revealed that the amplitude of the FRN was significantly modulated by Δ cortisol only for immediate feedback (p = .02) but not for delayed feedback (p = .92). In the immediate feedback condition, FRN amplitudes became less positive and thus larger for larger values of Δ cortisol. The Δ cortisol main effect and all remaining interactions including Δ cortisol as a factor were not significant (all ps > .07).
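A simple slope analysis of a Feedback Delay × Δ Cortisol interaction asks how strongly the outcome changes with Δ cortisol at each level of feedback delay. Under dummy coding of delay, the slope at one level is the Δ cortisol coefficient alone, and the slope at the other level is that coefficient plus the interaction coefficient. The sketch below illustrates this arithmetic only. The coefficient values are hypothetical and are not estimates from the study, the 0/1 coding is an assumption, and the function name `simple_slopes` is made up for illustration.

```python
def simple_slopes(b_cortisol, b_interaction, delay_codes):
    """Slope of the outcome on delta cortisol at each coded level of delay."""
    return {
        label: b_cortisol + b_interaction * code
        for label, code in delay_codes.items()
    }


# Hypothetical fixed-effect estimates (not taken from the study).
b_cortisol = -1.0      # slope of amplitude on delta cortisol when delay code = 0
b_interaction = 0.75   # change in that slope when delay code = 1
delay_codes = {"immediate": 0, "delayed": 1}  # assumed dummy coding

slopes = simple_slopes(b_cortisol, b_interaction, delay_codes)
print(slopes)
```

In this toy example the slope is clearly negative for immediate feedback and close to zero for delayed feedback, which is the kind of pattern that a significant interaction followed by condition-specific slope tests would describe.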

| P300
The P300 analysis revealed a significant main effect of Δ cortisol, F(1, 19) = 6.197, p = .022, which was further qualified by a significant Feedback Delay × Δ Cortisol interaction, F(1, 57) = 7.550, p = .008. Follow-up simple slope analysis of this interaction revealed that the amplitude of the P300 was significantly modulated by Δ cortisol only for immediate feedback (p < .001) but not for delayed feedback (p = .18). Resembling the pattern of the FRN, amplitudes were reduced (i.e., less positive) for increasing values of Δ cortisol. The Valence × Δ Cortisol interaction and the Feedback Delay × Valence × Δ Cortisol three-way interaction were not significant (both ps > .80).

| Frontal midline theta power
The analysis of theta power revealed a significant main effect of Δ cortisol, with reduced theta power for larger cortisol increases, F(1, 19) = 15.225, p < .001. We did not find any significant interaction effect (all ps > .25).

| DISCUSSION
The current study investigated the effects of stress on learning from immediate and delayed feedback and the underlying neural mechanisms of feedback processing. Participants who underwent the stress induction reported increased subjective stress and showed increased cardiovascular and cortisol responses relative to controls. In the stress group, the performance was increased for delayed feedback relative to immediate feedback, but this was not the case in the control group. The neural correlates of feedback processing were also influenced by stress. Stress overall decreased the difference between FRN amplitudes for negative and positive feedback. The P300 was decreased in the stress group relative to the control group for delayed feedback, while it did not differ between groups for immediate feedback. Stress reduced the P300 specifically for positive delayed feedback. As a consequence, the valence sensitivity of the P300 was diminished in the stress group. Frontal theta power was reduced by stress for immediate feedback but not for delayed feedback. Beyond stress-induced modulations of ERP amplitudes and theta power, we observed that stress changes the association of the FRN and frontal theta with future behavior. Cross-trial regression analyses revealed that stress decreased the associations of the FRN and frontal theta power with subsequent performance for immediate feedback trials. Learning from delayed feedback was unrelated to the FRN and frontal theta power in both groups. LME analyses showed that stress-induced cortisol increases were associated with increases in FRN amplitudes for immediate feedback, and the difference between negative and positive feedback tended to decrease with increases in cortisol. For delayed feedback, cortisol increases were not related to FRN amplitudes. Cortisol was related to decreases in P300 amplitudes for immediate, but not delayed, feedback. Theta power overall decreased with increasing cortisol responses.

FIGURE 6 Time series of beta coefficients of cross-trial regression analysis between the ERPs time-locked to the presentation of negative feedback and the accuracy in the subsequent trial in which the chosen stimulus was presented again. (a) Cross-trial regression beta coefficients for the immediate feedback condition. (b) Beta coefficients for the delayed feedback condition. Significant differences between control and stress group are highlighted by shaded areas
The FRN and frontal theta oscillations both have been related to the processing of feedback and the subsequent behavioral adaptation (Cavanagh et al., 2010). While imaging studies demonstrated stress-induced reductions in the activity of brain regions responsible for feedback processing (Kruse et al., 2018; Ossewaarde et al., 2011), investigations of stress effects on EEG correlates of feedback processing have yielded inconsistent findings.
With respect to the FRN, some studies reported an increasing effect of stress on the amplitude difference between negative and positive feedback (Glienke et al., 2015; Wirz et al., 2017) and on the functionally related error-related negativity (Dierolf et al., 2018). Other studies, however, demonstrated that FRN amplitude differences between negative and positive feedback are reduced by stress (Banis et al., 2014; Banis & Lorist, 2012).
The present FRN results are in line with the latter findings as the difference between negative and positive feedback was overall reduced by stress in the current study, and stress also decreased the association between the FRN and subsequent behavior for immediate negative feedback, suggesting that, under stress, feedback could not be used for behavioral adaptation. LME analyses further revealed that stress-related cortisol reactivity was associated with larger FRN amplitudes for immediate feedback. Moreover, the difference between negative and positive feedback decreased with increasing cortisol responses, indicating that the FRN becomes less sensitive to feedback valence in participants characterized by a strong stress response. The stress effect on the FRN described above thus seems to be mainly driven by cortisol effects on feedback processing. This finding appears to contradict not only our previous finding (Glienke et al., 2015) but also our hypothesis that stress would enhance the FRN amplitude difference between negative and positive feedback, especially for delayed feedback. Our reasoning was that stress should promote incremental learning based on prediction error coding by dopamine neurons in the midbrain and their projections to the striatum and ACC, which should be reflected by the FRN amplitude difference between negative and positive feedback. It must be pointed out, however, that the learning paradigm and the analysis strategy in our previous study (Glienke et al., 2015) differed from the present study. There, we focused on the later period of the experiment and found an increased FRN amplitude difference between negative and positive feedback processing under stress only for a condition in which feedback was not contingent on the previous response so that learning was not possible.

FIGURE 7 Time-frequency maps depict the beta coefficients that reveal the strength of the relationship between each time-frequency point and the accuracy in the subsequent trial in which the chosen stimulus was presented again. (a) Regression results for the immediate feedback condition. (b) Results for the delayed feedback condition

FIGURE 8 Correlation between the amplitude of (a) FRN, (b) P300, and (c) theta power and Δ cortisol depending on valence (negative in gray, positive in black) in the immediate feedback (left) and delayed feedback (right) condition. FRN for immediate feedback was larger with increasing cortisol responses, but the difference between negative and positive feedback tended to decrease with increasing cortisol responses. This association was not observed for delayed feedback. P300 amplitudes were overall negatively correlated with cortisol responses for immediate but not for delayed feedback. Theta power overall decreased with increasing cortisol responses for immediate and delayed feedback
According to a more recent view, the FRN also reflects a salience prediction error, possibly in addition to a reward prediction error, which would be in line with the theory that the ACC is primarily an action-outcome predictor and not specifically related to the processing of feedback valence (Alexander & Brown, 2011). Indeed, some studies found that the FRN is sensitive to both positive and negative unexpected outcomes (Ferdinand et al., 2012; Sambrook & Goslin, 2016; Talmi, Atkinson, & El-Deredy, 2013). In light of these findings, stress-induced increases in cortisol reactivity might have caused an increased saliency of feedback stimuli, irrespective of the feedback's valence, which generally increased the FRN and caused a decreased sensitivity to feedback valence.
For feedback-locked theta modulations, previous results concerning effects of stress are also inconsistent. In a recent study, we found increased frontal theta power for negative feedback following stress. The current stress-related decrease of the frontal theta power during learning from immediate feedback contradicts this but is in line with a previous EEG study that found a stress-induced decline in frontal theta (Banis et al., 2014). The present result is also in line with previous imaging studies showing that stress reduces the BOLD signal in prefrontal brain regions during feedback processing (Kruse et al., 2018; Ossewaarde et al., 2011), as mediofrontal theta has been linked to prefrontal processes of cognitive control (Cavanagh & Frank, 2014; Cohen, 2014). Beyond the power changes, we found that stress attenuated the association of frontal theta power with subsequent behavioral accuracy. The stress effects on theta power were likely caused by cortisol, as the LME analysis revealed that cortisol increases in the current study were associated with overall decreases in frontal theta power. This is in line with previous findings showing that the administration of cortisol is associated with reduced activation of the dACC (Kinner et al., 2016). Accordingly, the current finding might reflect a reduced control of the medial PFC/dACC over behavior with increasing cortisol reactivity.
Inconsistencies of the theta findings with the results of previous studies may again, at least partially, be related to differences in the tasks that were used. In our previous study, for instance, we applied a category learning task and found an increasing effect of stress on frontal theta power only in a difficult task condition but not in an easy task condition. This result suggests that task difficulty and thus the amount of cognitive control needed for the task at hand might be critical for the stress effect on frontal theta power (see Cavanagh & Frank, 2014; Cohen, 2014). Increased theta power was interpreted in terms of compensatory cognitive processes to maintain performance under stress in the face of high task demands. In the current study, task difficulty can probably not account for the differences in theta power between immediate and delayed feedback. While learning from delayed feedback was in some studies found to be more difficult than learning from immediate feedback (Maddox, Ashby, & Bohil, 2003; Maddox & Ing, 2005), it was associated with overall decreased rather than increased theta power in the present study. Moreover, theta power was unrelated to subsequent behavioral accuracy for delayed feedback, suggesting that the role of theta oscillations for performance was reduced overall. Instead, theta time-locked to negative feedback may reflect a cognitive control process that indicates the need for behavioral adaptation especially for immediately preceding events, which was also suggested by a very recent related study (Weismüller, Kullmann, Hoenen, & Bellebaum, 2019). This process seems to be affected by stress, and by cortisol in particular. The reduced association between theta power and accuracy for delayed feedback on the single-trial level, however, may have been due to overall reduced theta power.
Together, these results indicate that stress reduces medial frontal neural oscillations especially for immediate feedback that is associated with behavioral adaptation.
Similar to previous reports (Bellebaum & Daum, 2008; Bellebaum et al., 2010; Hajcak, Moser, Holroyd, & Simons, 2007), the P300 in the current study was larger for positive compared to negative outcomes. Although other studies found the P300 to be sensitive to outcome magnitude but not valence (Sato et al., 2005; Yeung & Sanfey, 2004) and yet another study reported that the P300 is sensitive to both valence and magnitude of an outcome (Wu & Zhou, 2009), there is consensus on the role of the P300 in categorizing and integrating feedback information to optimize behavioral strategies and obtain maximal gains (San Martín, 2012). The current finding that P300 amplitudes were unrelated to subsequent behavioral accuracy fits with the idea that the P300 reflects the integration of feedback information over time and not trial-by-trial behavioral adaptation (Glazer et al., 2018; Polich, 2007). While stress did not affect the P300 for immediate feedback, it reduced the P300 for delayed feedback specifically after rewards. This suggests that the P300 for delayed feedback reflects a stress-related attenuation of the sensitivity to feedback valence, which is in accordance with previous studies reporting that stress reduces reward sensitivity (Berghorst, Bogdan, Frank, & Pizzagalli, 2013; Bogdan & Pizzagalli, 2006). The association of reduced P300 amplitudes with increasing cortisol levels suggests that these stress effects on the P300 were mainly mediated by cortisol reactivity following stress.
Previous studies reported sex differences in stress effects on emotional learning (Andreano & Cahill, 2006;Merz & Wolf, 2017;Zoladz et al., 2015) and in the effects of cortisol on the reward system (Kinner et al., 2016). We controlled for potential sex differences by testing only men. Future studies need to explore potential sex differences in the current stress effects on learning from immediate and delayed feedback and the neural correlates of feedback processing.
Finally, we found evidence that learning from delayed feedback is enhanced relative to immediate feedback under stress, while in the control group no differences between feedback delays were seen. This is puzzling, as all measures of feedback processing appear to suggest a stress-induced impairment of feedback processing. It thus seems that this behavioral effect was driven by neural mechanisms that were not reflected in the EEG measures that we analyzed as dependent variables. While studies suggest a stronger hippocampal involvement in learning from delayed feedback (Foerde et al., 2013), an enhanced hippocampal involvement under stress seems unlikely as hippocampal processing has been suggested to be impaired by stress (Schwabe & Wolf, 2012). Furthermore, a stronger contribution of the dorsal striatum or the ACC to learning from delayed feedback under stress can be excluded, as this should be reflected by enhanced FRN amplitudes. At the same time, it is important to note that the dopaminergic system and the striatum are also involved in the processing of delayed feedback. For example, Dobryakova and Tricomi (2013) found striatal activations for feedback stimuli that followed a response after a delay of 25 min, and Weismüller et al. (2018) described a similar effect of reduced dopamine levels in Parkinson's disease on learning from immediate and delayed feedback. What seems to differ between learning from immediate and delayed feedback is the integration of feedback with the preceding response, which is based more on the dorsal striatum for immediate and more on the hippocampus for delayed feedback processing (Foerde & Shohamy, 2011a). On the other hand, the ventral striatum, which has been linked more to learning stimulus-outcome rather than action-outcome associations (O'Doherty et al., 2004), has been described to be similarly involved for both types of feedback (Foerde & Shohamy, 2011a).
It is thus conceivable that delayed feedback was processed more by the ventral striatum under stress and that, given the role of the ventral striatum in learning stimulus-outcome associations, the task was solved mainly by focusing on the relation between the stimuli and the outcomes. This would also explain why we did not see this enhanced feedback processing in the FRN, as the FRN reflects more strongly processes of action-outcome association (Oliveira, McDonald, & Goodman, 2007; Yeung, Holroyd, & Cohen, 2005). Nevertheless, this explanation is speculative, and it remains open as to how the feedback was integrated with the preceding event, stimulus, and/or response over a delay under stress.
Concerning feedback processing, there was only one aspect in the current results in which the pattern for stressed participants and delayed feedback processing differed from all other conditions. While the P300 was generally reduced by stress for delayed feedback, it did not distinguish between negative and positive feedback processing. The current finding may reflect a "more realistic" feedback processing in this condition, as it is consistent with the actual frequencies of the occurrence of negative and positive feedback. Based on the idea that the P300 reflects the integration of reward information over time (San Martín, 2012), this altered feedback processing indicated by the P300 may underlie enhanced task performance for delayed feedback by stressed participants.
In summary, the current study revealed that stress influences feedback learning and neural feedback processing, partially depending on the timing of feedback. The disruption of associations between frontal theta oscillations and the FRN with subsequent behavioral accuracy is a potential mechanism of stress-induced learning impairments for immediate feedback that was, however, compensated for by the stressed participants of the present study, so that overall learning was not impaired under stress. Instead, learning from delayed feedback was even enhanced after stress, although unrelated to neural feedback processes as reflected by the EEG measures. Our findings illustrate complex interactions between stress, feedback delay, and feedback valence. The observed behavioral effects cannot fully be explained by the EEG-derived measures of neural feedback processing. Future studies with different methodological approaches are needed in order to integrate the current findings into a formal model of feedback-based learning under stress.

ACKNOWLEDGMENTS
This research was supported by the Deutsche Forschungsgemeinschaft (DFG) project B4 of the Collaborative Research Centre 874 (Integration and Representation of Sensory Processes), project number 122679504. We would like to thank Osman Akan, Julia Pietzko, and Svenja Quassowsky for help with data collection.