Learning something new versus changing your ways: Distinct effects on midfrontal oscillations and cardiac activity for learning and flexible adjustments

We need to be able to learn new behaviour, but also to change existing routines when they start conflicting with our long-term goals. Little is known about the extent to which blank-slate learning of new behavioural routines and adjustment of existing ones rely on different neural and bodily mechanisms. In the current study, participants first acquired novel stimulus-response contingencies, which were subsequently randomly changed to create the need for flexible adjustments. We measured midfrontal theta oscillations via EEG as an indicator of neural conflict processing, as well as heart rate as a proxy of autonomic activity. Participants' trial-wise learning progress was estimated via computational modelling. Theta power and heart rate significantly differed between correct and incorrect trials. Differences between correct and incorrect trials in both neural and cardiac feedback processing were more pronounced for adjustments compared to blank-slate learning. This indicates that both midfrontal and cardiac processing are sensitive to changes in stimulus-response contingencies. Increases in individual learning rates predicted lower impact of performance feedback on midfrontal theta power, but higher impact on heart rate. This suggests that cardiac and midfrontal reactivity are partially reflective of different mechanisms related to feedback learning. Our results shed new light on the role of neural and autonomic mechanisms for learning and behavioural adjustments.


Introduction
Learning new behavioural patterns is necessary to efficiently deal with reoccurring challenges. However, an inflexible reliance on existing response patterns can be detrimental. When changing circumstances render a previously learned behaviour as no longer appropriate, we need to be able to change it. Thus, both the acquisition of new, as well as the adjustment of existing stimulus-response associations are crucial skills in our lives. However, it is still unclear if the initial acquisition and subsequent alteration of response patterns should be seen as the same or as separate cognitive processes. The current study investigates the differences between blank-slate learning of new and adjustment learning of existing stimulus-response associations with respect to their underlying mechanisms in the central and autonomic nervous system.
The acquisition of novel and the adjustment of existing stimulus-response associations share important functional similarities. Both are related to reinforcement learning, meaning contexts in which we receive positive or negative feedback for our actions (Niv, 2009; Schiffer et al., 2015). It has been shown that reinforcement learning can lead to both the initial formation and subsequent alteration of stimulus-response [...] During learning of novel stimulus-response (S-R) associations, midfrontal theta power has been found to be highest during initial stimulus presentations, and to gradually decrease after the correct S-R associations have been established (Clarke et al., 2018). This suggests that midfrontal theta power is related to the formation of new S-R patterns.
Midfrontal theta oscillations have also been associated with cognitive control (Cavanagh and Frank, 2014). Stimuli which signal the need to deviate from prepotent action impulses elicit increases in midfrontal theta power (Nigbur et al., 2011). Moreover, incorrect actions and negative feedback can also lead to increases in theta power (Cohen, 2011; Trujillo and Allen, 2007; Valadez and Simons, 2018). It has been suggested that theta oscillations following errors facilitate necessary behavioural adjustments for subsequent trials by increasing intercommunication between task-relevant brain areas (van de Vijver et al., 2018; van Driel et al., 2012). In support of this assumption, some studies have found that higher theta reactivity after errors partially predicts improved performance in subsequent trials (Luft et al., 2013; van de Vijver et al., 2011). Since adjustment of existing response patterns involves both a) the suppression of previously learned, but no longer appropriate action tendencies, and b) the incorporation of feedback to alter previously learned stimulus-response associations, midfrontal theta oscillations are likely to be an important part of engaging in flexible behavioural change (Cavanagh et al., 2013).
To summarise, midfrontal theta oscillations have been identified as an important correlate of both blank-slate learning (where no prior S-R associations exist) and behavioural adaptation (where prior S-R associations are in conflict with novel task affordances). This might suggest that midfrontal theta oscillations are part of general mechanisms for both blank-slate and adjustment learning. However, to better understand the functional role of neural oscillations in these cognitive processes, it is crucial to directly compare blank-slate and adjustment learning with respect to the potential modulation of midfrontal theta power. Therefore, the current study employed a reinforcement learning task consisting of two phases: In an initial blank-slate learning phase, participants acquired novel S-R-O associations. In a subsequent adjustment phase, a random alteration of previous S-R contingencies led to the need to change existing response patterns. We measured neural activation during both stimulus and feedback presentation, to see to what extent the need for S-R adjustments affected either the initial processing of the target stimuli or the relevant performance feedback. We used computational modelling to estimate participants' learning progress in each phase and correlated individual learning curves with neural midfrontal activation. This allowed us to test directly whether behavioural success in blank-slate and adjustment learning is related to the same or different patterns in oscillatory activity.
Learning and adjustment processes are not only reflected in neural oscillations, but also influence processes related to the autonomic nervous system. Recent studies suggest that a coupling of neural and autonomic processes plays a role in flexible behavioural adjustments (Spruit et al., 2018; Ullsperger et al., 2014). Cardiovascular activity represents a central part of autonomic functioning. Importantly, it has been shown that cardiac activity is sensitive to many cognitive parameters which also influence midfrontal theta oscillations. First, heart rate, like midfrontal theta power, changes with increased need for cognitive control, with higher control affordances leading to heart rate deceleration (Fiehler et al., 2004). Secondly, like midfrontal theta power, cardiac activity is influenced by performance feedback, with negative compared to positive feedback leading to a slower heart rate (Crone et al., 2003; Kube et al., 2016; Wessel et al., 2011). Some studies found that populations with lower performance in reinforcement learning paradigms, such as people diagnosed with ADHD, showed lower cardiac reactivity during the presentation of task feedback (Luman et al., 2007, 2008). Overall, these results appear to suggest that changes in midfrontal theta power and changes in heart rate are indicative of the same underlying cognitive processes. This potential link is supported by the finding that the anterior cingulate cortex, which is believed to play a role in generating midfrontal theta power, shows strong connectivity with the anterior insula, which is known to be involved in the neural processing of cardiac signals (Limongi et al., 2013; Medford and Critchley, 2010). It has been suggested that the integration of cardiac and neural processing plays an important role in processing affective information such as task-relevant feedback (Gentsch et al., 2019; Seth and Friston, 2016).
However, since neural oscillations and cardiac activity are commonly measured in separate studies, the potential overlap and differentiation between these processes is still largely speculative. Therefore, we also measured heart rate during performance feedback. This allowed us to evaluate to what extent midfrontal oscillations and cardiac reactivity differ in their relationship with participants' performance during blank-slate and adjustment learning.
To summarise, the current study employed a reinforcement learning paradigm in which new stimulus-response associations first had to be learned (blank-slate learning). Subsequently, changes in previously learned stimulus-response contingencies created the need to overwrite existing associations (adjustment learning). We estimated trial-wise learning progress via computational modelling and recorded both EEG and heart rate during the task. This allowed us to investigate to what extent task success in blank-slate learning and behavioural adjustments depends on the same or different changes in neural midfrontal oscillations, as well as cardiac activity.

Participants
Twenty-four participants (8 males) with a mean age of 25.29 years (SD = 4.46) took part in the study for either course credit or a financial reimbursement of 9 Euros per hour.

Measurement setup
EEG was recorded with 65 active electrodes (BrainProducts ActiSnap) and one additional ground electrode, positioned according to the international 10-20 system. The FCz electrode was used as an online reference. ECG was recorded with two bipolar electrodes below the left pectoral muscle and the left clavicle, as well as one ground electrode below the right clavicle. Both EEG and ECG were recorded with a BrainVision QuickAmp amplifier, employing a 500 Hz sampling rate and a 0.016-250 Hz bandpass filter.

Stimuli
For each experimental block, one set of four squares with different colours was randomly generated. For each colour, three numbers between 0 and 255 were randomly chosen as values for red, green, and blue (RGB) colour intensities. Since every colour is defined by these three values, each unique colour can be conceptualised as a position in three-dimensional RGB space. Accordingly, the RGB-values were randomly generated with the following constraints: Relative to all other colours from the same set, the colour of each stimulus had to have a minimum distance of 200 steps in RGB space. Additionally, the colour of each stimulus needed to have a minimum distance of 120 steps relative to all stimulus colours from the directly preceding set. These constraints ensured that no item was too similar in colour compared to the stimuli from the same or the preceding learning set.
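The colour constraints described above can be illustrated with a small rejection-sampling sketch. It is not the authors' stimulus code: the function name `sample_colour_set`, the use of Euclidean distance as the meaning of "steps in RGB space", and the restart strategy are all assumptions made for illustration.

```python
import math
import random


def sample_colour_set(rng, previous=(), n_colours=4,
                      min_within=200.0, min_between=120.0,
                      tries_per_colour=200, max_restarts=10_000):
    """Draw n_colours RGB triplets that satisfy both distance constraints.

    Assumption: distance is Euclidean distance between (R, G, B) triplets.
    Colours are drawn one at a time; if no valid candidate is found for a
    position, the whole set is discarded and sampling restarts.
    """
    for _ in range(max_restarts):
        colours = []
        for _ in range(n_colours):
            for _ in range(tries_per_colour):
                candidate = tuple(rng.randint(0, 255) for _ in range(3))
                # Constraint 1: far enough from colours already in this set.
                far_within = all(
                    math.dist(candidate, c) >= min_within for c in colours)
                # Constraint 2: far enough from every colour of the previous set.
                far_between = all(
                    math.dist(candidate, p) >= min_between for p in previous)
                if far_within and far_between:
                    colours.append(candidate)
                    break
            else:
                break  # no valid candidate found: restart the whole set
        if len(colours) == n_colours:
            return colours
    raise RuntimeError("no valid colour set found")


# Hypothetical usage: one colour set per block, each checked against the
# directly preceding set.
rng = random.Random(1)
previous = []
for block in range(14):
    current = sample_colour_set(rng, previous)
    previous = current
```

Drawing colours sequentially is only one way to satisfy the constraints; it does not necessarily produce the same distribution of colour sets as redrawing all four colours jointly until every constraint holds.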
A white check mark and a white X were used as symbols for positive and negative feedback, respectively. Every stimulus in this experiment had the same size (visual angle: 1.2° x 1.2°) and was always presented in the middle of the screen. All stimuli were shown on a grey background on a 24-inch display at a distance of 90 cm from the participants.
Fig. 1. Note. Overview of experimental design: a) Trial structure with presentation durations in brackets; b) example assignment of stimulus-response contingencies in one block for the two phases. Note that actual stimuli colours, initial assignments for blank-slate learning, and changes in assignments for the adjustment learning phase were randomly generated for each experimental block.

Procedure
Fig. 1 shows an overview of the experimental design. The procedure consisted of 14 blocks, each employing a new set of four distinct target stimuli. For every block, a 1:1 assignment between the four target stimuli and the four response keys (F1, F2, F11, and F12 on a reversed keyboard) was randomly generated. No prior information about the correct stimulus-response assignments was provided to the participants. Participants were instructed to press one of the four keys whenever a stimulus appeared and to use the subsequent feedback to infer the correct stimulus-response associations in every block. Additionally, participants were informed that the stimulus-response assignments might change over the course of a block.
Every block consisted of a blank-slate phase and an adjustment phase. The blank-slate phase in each block consisted of 48 trials, made up of 12 repetitions of all four stimuli in randomised order. Every trial consisted of an action period, a delay period, and a feedback period. The action period started with the presentation of one target stimulus for 200 ms, followed by a fixation cross for 800 ms. Beginning with stimulus onset, participants had 1000 ms to press one of four keys on the keyboard. This was followed by a delay period of 1200 ms, during which only the fixation cross was on the screen. The delay period helped to ensure that action- and feedback-related neural activity did not overlap. After the delay period, participants received either positive or negative feedback for 200 ms, depending on whether their key press matched the stimulus-response assignment for this block. If participants had not responded within the initial 1000 ms action period, a clock symbol was shown to encourage faster responses. The feedback symbol was followed by a fixation cross for 800 ms. The intertrial interval varied randomly between 1500 and 1700 ms.
During the adjustment phase, the same task procedure with the same stimuli as in the blank-slate phase was employed. Importantly, for two out of the four stimuli the stimulus-response assignments were randomly switched (e.g., Stimulus 1 = Button 1; Stimulus 2 = Button 2 → Stimulus 1 = Button 2; Stimulus 2 = Button 1). Participants received no explicit information about which assignment would change in any block. Additionally, the transition between blank-slate and adjustment phase was not marked by any kind of explicit notification. The adjustment phase consisted of 10 repetitions of the four target stimuli in randomised order, resulting in 40 trials per block. The adjustment phase was slightly shorter than the blank-slate phase, since behavioural data of a pilot study indicated that the majority of the adjustment process took place within this time frame. After each block, participants received feedback about the percentage of correct responses and their mean response time.
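The block structure described above (a random 1:1 stimulus-key assignment, followed by a swap of two assignments for the adjustment phase) can be sketched as follows. This is an illustration only: the function names are hypothetical, and shuffling the four stimuli within each repetition is an assumption about how "randomised order" was implemented.

```python
import random

KEYS = ["F1", "F2", "F11", "F12"]


def make_block_mappings(rng, n_stimuli=4):
    """Return (blank_slate, adjustment) stimulus -> key dictionaries."""
    stimuli = list(range(n_stimuli))
    keys = KEYS[:n_stimuli]
    rng.shuffle(keys)
    blank_slate = dict(zip(stimuli, keys))
    # Adjustment phase: two randomly chosen stimuli exchange their keys,
    # the other two keep their original assignment.
    a, b = rng.sample(stimuli, 2)
    adjustment = dict(blank_slate)
    adjustment[a], adjustment[b] = blank_slate[b], blank_slate[a]
    return blank_slate, adjustment


def make_trial_order(rng, n_repetitions, n_stimuli=4):
    """Concatenate n_repetitions shuffled passes through all stimuli.

    Assumption: each stimulus occurs exactly once per pass of n_stimuli trials.
    """
    order = []
    for _ in range(n_repetitions):
        one_pass = list(range(n_stimuli))
        rng.shuffle(one_pass)
        order.extend(one_pass)
    return order


# Hypothetical usage for one block: 48 blank-slate and 40 adjustment trials.
rng = random.Random(0)
blank_slate, adjustment = make_block_mappings(rng)
blank_slate_trials = make_trial_order(rng, 12)
adjustment_trials = make_trial_order(rng, 10)
```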

Behavioural analysis
In order to inspect the temporal changes of error rates over each block, we summarised the binary performance of each trial (correct/incorrect) into time bins of eight trials each, meaning two repetitions of the same stimulus per bin. Additionally, based on individual trial data, we estimated individual learning progress via a computational model for reinforcement learning (Smith et al., 2004; see also Clarke et al., 2018). In a nutshell, this model estimates for each trial a continuous value between 0 and 1 which expresses the likelihood of answering correctly on that trial, based on previous performance of the participant in the same block. Thus, higher values indicate increased success in the overall learning progress. For estimation of the learning curves, the model uses state-space smoothing and an expectation maximisation algorithm to estimate the unobservable learning state and its variance based on observable response data. The probability of a correct response on a given trial is then the mode of the Gaussian probability density function fitted for that trial with the estimated learning state and variance as parameters. Models were calculated with a sigma parameter of 0.05. We calculated learning curves separately for the blank-slate and adjustment phase of each block, since reinforcement learning models assume relatively stable stimulus-response contingencies throughout the learning process. For the background probability of the blank-slate phase, we chose 0.25, since without prior knowledge there is a 25% chance of choosing the correct response amongst the four possible buttons. For the adjustment phase, the background probability was set to half of the final learning progress estimate from the preceding blank-slate phase, taking into account that half of the stimulus-response associations were randomly changed at the beginning of the adjustment phase. To account for cases where the state-space smoothing procedure could not estimate a sufficient fit with the data, we excluded all blocks for which the algorithm did not converge after 1000 iterations. This led to the average exclusion of 2.62% (SD = 4.66) of all trials.
Fig. 2. Topographical plots of midfrontal theta power for each condition. Note. Topographical plots show baseline-corrected oscillatory power in the theta range (4-7 Hz) during feedback presentation (2.4-2.8 s). Stars indicate electrodes employed in the main analysis.
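As a small illustration of the behavioural summary measures, the sketch below bins trial outcomes into eight-trial time bins and derives the background probabilities for the two phases. It does not implement the state-space model itself; the function names and the example value 0.9 are hypothetical.

```python
def bin_error_rates(correct, bin_size=8):
    """Error rate per consecutive bin of bin_size trials.

    correct: sequence of 1 (correct) / 0 (incorrect) trial outcomes.
    """
    if len(correct) % bin_size != 0:
        raise ValueError("number of trials must be a multiple of bin_size")
    return [
        1 - sum(correct[i:i + bin_size]) / bin_size
        for i in range(0, len(correct), bin_size)
    ]


def background_probabilities(final_blank_slate_estimate, n_buttons=4):
    """Background probabilities for the blank-slate and adjustment phase.

    Blank-slate: chance level with n_buttons response options.
    Adjustment: half of the final blank-slate learning estimate.
    """
    return 1 / n_buttons, final_blank_slate_estimate / 2


# Hypothetical usage: a 48-trial blank-slate phase yields six bins.
outcomes = [0] * 8 + [1] * 40
print(bin_error_rates(outcomes))
print(background_probabilities(0.9))
```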

EEG analysis
EEG data were filtered (high-pass: 1 Hz, low-pass: 40 Hz) and re-referenced to the average of all electrodes. For one participant, two exceedingly noisy electrodes were removed and subsequently replaced with spherical spline interpolations using the Fieldtrip function ft_channelrepair (Oostenveld et al., 2011). Data were cut into trial epochs ranging from −1500 ms to +4500 ms relative to stimulus onset on each trial. We used independent component analysis (ICA) as implemented in Fieldtrip to identify and remove components caused by eye blinking or other artefacts clearly unrelated to neural activity. This led to the removal of 1-4 (mean = 1.79) components per participant. Subsequently, we removed all trials in which activity exceeded ±100 μV to account for noise artefacts, resulting in an average exclusion rate of 3.75% (SD = 3.99).
To improve spatial specificity of the data, we employed a Laplacian filter via the Fieldtrip function ft_scalpcurrentdensity with the spline method and a polynomial degree of 10. Subsequently, oscillatory activity was calculated between 2 and 40 Hz in 1-Hz steps using complex Morlet wavelets, with the number of cycles linearly increasing from 3 to 8. For analysis of condition-wise activity, we averaged trials separately for each phase (blank-slate/adjustment) and according to participants' performance (correct/incorrect). Condition averages were baseline-corrected via decibel conversion relative to the pre-trial mean of all trials between −400 and −100 ms.
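The wavelet settings and the decibel baseline correction can be made concrete with a short sketch. It only computes the wavelet parameters and the conversion formula, not the time-frequency decomposition itself. The function names are hypothetical, and expressing the wavelet width as a Gaussian standard deviation of cycles / (2πf) is a common convention that is assumed here rather than taken from the text.

```python
import math


def wavelet_parameters(f_min=2, f_max=40, step=1, cyc_min=3.0, cyc_max=8.0):
    """Return (frequency, n_cycles, sigma_t) for each wavelet.

    The number of cycles increases linearly from cyc_min at f_min to
    cyc_max at f_max. sigma_t is the temporal standard deviation (in
    seconds) of the Gaussian taper, assuming sigma_t = cycles / (2*pi*f).
    """
    freqs = list(range(f_min, f_max + 1, step))
    n = len(freqs)
    params = []
    for i, f in enumerate(freqs):
        cycles = cyc_min + (cyc_max - cyc_min) * i / (n - 1)
        sigma_t = cycles / (2 * math.pi * f)
        params.append((f, cycles, sigma_t))
    return params


def to_decibel(power, baseline_power):
    """Decibel change of each power value relative to the mean baseline power."""
    baseline = sum(baseline_power) / len(baseline_power)
    return [10 * math.log10(p / baseline) for p in power]


# Hypothetical usage: print the settings of the lowest and highest frequency.
params = wavelet_parameters()
print(params[0], params[-1])
```

Under this conversion, a value of 0 dB means that power equals the baseline mean, and each increase of 10 dB corresponds to a tenfold increase in power.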
Most previous studies concerning midfrontal theta oscillations reported theta peaks around the FCz electrode (e.g., Cavanagh et al., 2012; Cohen and Donner, 2013; Vissers et al., 2018). Based on these findings, we averaged the electrodes FCz/FC1/FC2 for the analysis of midfrontal oscillations. As can be seen from topographical plots, this area shows a peak in theta activity (Fig. 2). To aid comparison between oscillatory effects and event-related potentials, we also present an analogous analysis of event-related potentials for the same electrodes in the Supplemental Material.
For identifying significant differences between conditions, we employed cluster-based permutation as implemented in the Fieldtrip function ft_freqstatistics (Maris and Oostenveld, 2007). First, this function calculated for each time-frequency point a t-value for the difference between target conditions. T-values with the same sign which surpassed a predefined threshold of p < .025 were grouped together into positive or negative clusters. The absolute sum of each cluster's individual t-values was defined as the cluster's weight. The weight of each cluster was employed as the sole criterion for determining that cluster's significance. For this purpose, cluster-based permutation estimates the likelihood of each cluster's weight in the actual data in comparison to random permutations of the data set. More specifically, the assignment of each participant's time-frequency data was randomly shuffled between the conditions for 2000 iterations, and the cluster weights for the shuffled data were calculated analogously to the actual data. The maximum cluster weight of each iteration was retained. If a cluster in the actual data reflects systematic condition differences that are not due to chance, its (or a higher) weight should rarely occur during randomised iterations. Accordingly, the p-value for each cluster is defined as the proportion of random iterations that resulted in a higher cluster weight. For each significant cluster, we report the cluster weight, p-value, and its start and end time.
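The logic of the cluster-based permutation test can be illustrated with a simplified one-dimensional sketch (time points only, instead of time-frequency maps). This is not the Fieldtrip implementation: the function names are hypothetical, the t-threshold is passed in directly rather than derived from a p-value, and shuffling condition labels within participants is implemented as random sign flips of the paired differences.

```python
import math
import random


def t_values(diffs):
    """One-sample t-value per time point; diffs[p][t] = condition A minus B."""
    n = len(diffs)
    out = []
    for t in range(len(diffs[0])):
        col = [d[t] for d in diffs]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out


def cluster_weights(tvals, threshold):
    """Absolute summed t of each run of adjacent same-sign supra-threshold points."""
    weights, current, sign = [], 0.0, 0
    for t in tvals:
        s = (t > threshold) - (t < -threshold)  # +1, -1, or 0
        if s != 0 and s == sign:
            current += abs(t)  # extend the running cluster
        else:
            if sign != 0:
                weights.append(current)  # close the previous cluster
            current, sign = (abs(t), s) if s != 0 else (0.0, 0)
    if sign != 0:
        weights.append(current)
    return weights


def cluster_permutation(diffs, threshold, n_perm=2000, seed=0):
    """Return (weight, p) for every cluster in the observed data.

    p is the proportion of permutations whose maximum cluster weight is at
    least as high as the observed weight (ties counted, which is slightly
    conservative).
    """
    rng = random.Random(seed)
    observed = cluster_weights(t_values(diffs), threshold)
    null_max = []
    for _ in range(n_perm):
        flipped = []
        for d in diffs:
            sgn = rng.choice((-1, 1))  # swap condition labels for this participant
            flipped.append([sgn * x for x in d])
        w = cluster_weights(t_values(flipped), threshold)
        null_max.append(max(w, default=0.0))
    return [(w, sum(m >= w for m in null_max) / n_perm) for w in observed]
```

For 24 participants, a threshold of roughly t = 2.07 corresponds to a one-sided p of .025 at 23 degrees of freedom. Retaining only the maximum cluster weight per permutation is what controls for multiple comparisons across all time points.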
In order to test for the relationship between trial-specific learning progress and neural activation, we computed the correlations between the oscillatory activity and the estimated learning progress score on the same trial for each time-frequency point in every condition, separately for each participant. Both oscillatory activity and behavioural data were rank-transformed within each participant prior to the calculation of correlations. Rank transformation lowers the influence of potential outliers on correlation estimates (Cohen, 2014b). This procedure generated for each participant and condition one separate electrode-time-frequency map of correlation coefficients. Using the previously described cluster-based permutation approach, we tested the power-learning correlation maps of all participants in each condition for clusters which on average significantly differed from 0, and therefore represented statistically significant correlations between oscillatory power and computationally estimated learning progress. For visualisation of participant-wise correlation maps, we averaged for each condition the correlation coefficients over all participants, resulting in one heatmap of average correlation strength for each condition.
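The rank-based correlation at a single time-frequency point can be sketched as follows. The function names are hypothetical, and assigning average ranks to tied values is an assumption, since the text does not state how ties were handled.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean rank of the tied run i..j
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_correlation(power, learning_progress):
    """Correlation of rank-transformed power and learning progress over trials."""
    return pearson(average_ranks(power), average_ranks(learning_progress))


# Hypothetical usage: the extreme power value on the last trial contributes
# only its rank, not its magnitude.
power = [1.2, 1.5, 1.9, 40.0]
progress = [0.30, 0.45, 0.60, 0.80]
print(rank_correlation(power, progress))
```

Repeating this computation for every time-frequency point of one participant and condition would yield the correlation map that enters the group-level permutation test.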

Heart beat analysis
ECG data were processed with the same filter settings as EEG data. Additionally, all trials marked as noisy during EEG exclusion were also removed from the ECG. For analysis of the heart rate after feedback, we extracted the 2-second interval starting from feedback onset of each trial. We also report the results for heart rate during the action and delay periods prior to feedback as Supplemental Material. We used an implementation of the Pan-Tompkins algorithm in the BioSigKit Toolbox for detecting R-peaks for each trial and calculated the latency between R-peaks within the time window of interest (Sedghamiz, 2018). Mean latencies between peaks for each trial were converted to beats per minute. To account for outliers in the R-peak identification, we excluded from further analysis all trials in which the heart rate deviated more than 3 standard deviations from the participant's mean. Trial-wise heart rate estimates were averaged for each condition. Additionally, heart rate values of individual trials were retained for assessing the trial-specific relation between cardiac activity and learning progress.
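The conversion from R-peak latencies to heart rate and the outlier criterion can be sketched as follows. R-peak detection itself (the Pan-Tompkins step) is not shown; the function names are hypothetical, and returning `None` for windows with fewer than two R-peaks is an assumption about how such trials would be handled.

```python
import statistics


def heart_rate_bpm(r_peak_times_s):
    """Heart rate in beats per minute from R-peak times (seconds) in one window."""
    if len(r_peak_times_s) < 2:
        return None  # no inter-beat interval available in this window
    intervals = [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]
    return 60.0 / statistics.mean(intervals)


def exclude_outliers(rates, n_sd=3.0):
    """Keep trials within n_sd sample standard deviations of the mean."""
    mean = statistics.mean(rates)
    sd = statistics.stdev(rates)
    return [r for r in rates if abs(r - mean) <= n_sd * sd]


# Hypothetical usage: R-peaks 0.8 s apart correspond to about 75 beats per minute.
print(heart_rate_bpm([0.1, 0.9, 1.7]))
```

In this sketch, `exclude_outliers` would be applied to all trial-wise heart rates of one participant, so that the mean and standard deviation are participant-specific.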
Average heart rate was compared between conditions with a repeated measures ANOVA with the factors PHASE (blank-slate/adjustment) and PERFORMANCE (correct/incorrect). Subsequent t-tests were Bonferroni-corrected. We report η² and Cohen's d as effect sizes. We also report Bayes Factors (BF), which estimate the ratio of evidence for the alternative hypothesis (meaning here a difference between conditions) relative to evidence for the null hypothesis. While Bayes factors are not bound to a specific cutoff value, it has been suggested that BF > 3 could indicate positive evidence for the alternative hypothesis, while BF < 0.33 could be seen as positive evidence for the null hypothesis (Jarosz and Wiley, 2014; Van de Schoot et al., 2014). For the calculation of all tests we used the R packages ez, rstatix, and BayesFactor.
In order to evaluate the relation between learning progress and heart rate on individual trials, we employed hierarchical linear regression. Compared to traditional non-hierarchical regression, this method has the advantage that it allows modelling the trial-wise relation between relevant variables, while still taking into account differences between participants (Hayes, 2006; Richter, 2006). For this analysis, heart rate data were mean-centred by subtracting each participant's average heart rate from the value of each trial (Hofmann and Gavin, 1998). Accordingly, the resulting values express the change on each trial due to the current trial's feedback. Subsequently, for each condition separately, we estimated a regression model with each trial's heart rate as a dependent variable, and the current learning rate on this trial as a fixed-effect predictor. Each model included the intercept and learning rate as additional random error terms nested in participants as a cluster variable. This allowed for differences between participants in intercept and slope of the estimated regression line and therefore for differences in participant-specific relationships between learning progress and heart rate (Gelman and Hill, 2006). We estimated the regression models in R via the package lme4, using the R formula heart beat ~ learning progress + (learning progress | participant). Since inspection of model residuals via scatter plots indicated deviations from the normal distribution, we calculated bootstrapped confidence intervals for the fixed factor learning rate using the bootMer function with 1000 simulations. Since this procedure relies on resampling values from the original data set, bootstrapped confidence intervals are less reliant on the assumption of normality than non-bootstrapped estimates (Field and Wilcox, 2017).
Confidence intervals for the factor learning rate which do not contain 0 indicate a significant relationship between learning progress and heart rate. Moreover, we calculated p-values for the regression coefficient of learning progress by performing bootstrapped model comparison as implemented in the function PBmodcomp of the R package pbkrtest (Halekoh and Højsgaard, 2014). We compared the full model as stated above with a baseline model using exactly the same parameters, but without learning progress as a fixed factor (in R notation: heart beat ~ 1 + (learning progress | participant)). A significant result of this test means that the inclusion of learning progress as a predictor significantly improves the model, and therefore indicates a general relation between learning rate and heart rate.
Fig. 3 shows changes in error rate averaged over all blocks. To evaluate participants' task performance, we compared error rates between the beginning and end of the blank-slate and adjustment phase, defined as the first and last time bin in each phase (cf. Table 1). There was a significant decrease in errors from the beginning to the end of the blank-slate phase, t(23) = 26.43, p < .001, d = 5.40, BF > 10^6. This shows that participants on average successfully learned the initial stimulus-response associations. Errors increased significantly from the end of the blank-slate phase to the beginning of the adjustment phase, t(23) = −18.05, p < .001, d = 3.68, BF > 10^6. From the beginning to the end of the adjustment phase, errors again significantly decreased, t(23) = 12.66, p < .001, d = 2.58, BF > 10^6. This indicates that the changes in stimulus-response associations interfered with performance, but that participants were able to change their response patterns during the adjustment phase. Notably, error rates at the end of the adjustment phase were still higher than at the end of the blank-slate phase, t(23) = 4.48, p < .001, d = 0.92, BF = 169.16. This also remained true when we controlled for the slightly longer duration of the blank-slate phase by comparing the average error rates between the fifth time bin of the adjustment phase (trials 81-88) and the fifth time bin of the blank-slate phase (trials 33-40), t(23) = 4.22, p < .001, d = 0.86, BF = 94.43. Thus, the alterations in stimulus-response contingencies during the adjustment phase had a prolonged negative effect on performance.
Fig. 4 shows averages of the trial-wise learning progress as estimated via computational approximation from the error rates (also cf. Table 1). Learning progress significantly increased from the beginning to the end of the blank-slate phase, t(23) = 53.77, p < .001, d = 10.98, BF > 10^6, and decreased at the beginning of the adjustment phase compared to the end of the blank-slate phase, t(23) = −64.00, p < .001, d = 13.06, BF > 10^6. Learning progress again increased towards the end compared to the beginning of the adjustment phase, t(23) = 22.43, p < .001, d = 4.58, BF > 10^6. Overall, the estimates of learning progress mirror the changes in error rates during the blank-slate and adjustment phase.
Fig. 5 shows average response times. Response times significantly increased from the beginning to the end of the blank-slate phase, t(23) = −3.75, p < .001, d = 0.77, BF = 33.96. This is likely to reflect the fact that at the beginning of this phase participants could just select responses at random, since they had no prior information about the correct stimulus-response mappings. Conversely, later in the phase, response selection was more likely to involve memory retrieval concerning the appropriate response. From the end of the blank-slate phase to the beginning of the adjustment phase, response times significantly increased, t(23) = −3.31, p < .00, d = 0.68, BF = 13.27. This could indicate increased response caution due to the changes in previously learned stimulus-response mappings.
There was no difference in response times between the start and the end of the adjustment phase, t(23) = −0.02, p = .99, d = 0.003, BF = 0.21.

Midfrontal oscillations
Fig. 6 shows average midfrontal oscillatory activity, separately for correct and incorrect trials in both phases, with clusters of significant differences highlighted. For blank-slate learning, incorrect compared to correct trials showed a cluster of significantly lower activity with a negative peak from approximately 7 up to 20 Hz during the action period and delay period, maximum duration: 0 s-2.44 s, t weight = −33,719.73, p < .001, as well as during feedback presentation, 2.52 s-3.20 s, t weight = −8222, p = .008. This indicates a decrease in alpha power (commonly defined as activity between 8 and 14 Hz), as well as beta power (commonly defined between 15 and 30 Hz). Additionally, incorrect compared to correct trials during blank-slate learning showed significantly higher power in the delta-theta range during the feedback period, 2.35 s-3.2 s, t weight = 3918.77, p = .026.
For adjustment learning, a similar pattern emerged, with significantly lower alpha activity for incorrect compared to correct trials during both action and delay period, 0 s-2.42 s, t weight = −29,572.90, p < .001, as well as during the feedback period, 2.65 s-3.2 s, t weight = −6489.81, p = .007. Incorrect compared to correct trials during adjustment learning were marked by significantly higher delta-theta activity both during the action period, 0.52 s-1.24 s, t weight = 2416.62, p = .04, and during the feedback period, 2.27 s-3.2 s, t weight = 6027.85, p = .006.
Permutation analysis also revealed differences in oscillatory activity between blank-slate learning and adjustment learning. For correct trials, the adjustment phase compared to the blank-slate phase showed a cluster of significantly higher oscillatory activity, which was mostly confined to the alpha/beta range in the delay period and the feedback presentation, 1.01 s-3.04 s, t weight = 14,401.69, p = .001. For incorrect trials, adjustment learning also showed a cluster of significantly higher oscillatory power, 0.52 s-3.2 s, t weight = 27,904.46, p < .001. As for correct trials, this cluster included higher alpha/beta power for adjustment learning compared to blank-slate learning. Additionally, this positive cluster entailed periods of higher activity in the delta/theta range (2-8 Hz) during both the action period (approximately from 0.5 to 1.5 s) and the feedback period (from 2.5 to 3 s). This indicates stronger midfrontal oscillatory activity for incorrect trials during the adjustment phase compared to blank-slate learning, which occurred in the delta/theta as well as in the alpha range.
To conclude, both during blank-slate and adjustment learning, incorrect compared to correct trials were marked by stronger midfrontal theta responses during feedback. Additionally, during adjustment learning, incorrect compared to correct trials also showed significantly higher theta responses during the action phase. Incorrect compared to correct trials in both phases were also marked by a prolonged pattern of lower alpha/beta activity throughout most of the trial duration. Adjustment learning compared to blank-slate learning led to increased midfrontal oscillatory power over a broad frequency range. Particularly for incorrect trials, the increase for adjustment compared to blank-slate learning also included higher activity in the theta range.

Brain-behaviour correlations
Fig. 7 shows the averages of trial-wise correlations between midfrontal oscillatory power and learning progress, as estimated via state-space modelling. Permutation analysis revealed several clusters of significant correlations. For correct trials during blank-slate learning, we found a positive correlation between learning progress and oscillatory power throughout the whole trial, which was mostly confined to the alpha/beta range, 0 s-3.2 s, t weight = 48,829.44, p < .001. For incorrect trials during blank-slate learning, there was a positive correlation between learning progress and oscillatory power during action and delay period, 0 s-2.34 s, t weight = 29,305.26, p < .001, as well as during the feedback period, 2.42 s-3.2 s, t weight = 5318.19, p = .011. Notably, the positive cluster during the action period for incorrect trials spanned a wide frequency range which included the alpha range, as well as the delta/theta range. This indicates a positive correlation between learning progress and midfrontal power in both the alpha and delta/theta frequencies during action execution for incorrect trials. Thus, for blank-slate learning, increases in learning progress, indicating higher familiarity with the current learning set, were associated with higher oscillatory power over a relatively broad frequency range. While for correct trials this mostly entailed increases in the alpha/beta range, incorrect trials showed a wider increase reaching into the theta range during the action period.
For correct trials during adjustment learning, there were three closely adjacent positive clusters in the alpha/beta range, indicating significant positive correlations between learning progress and alpha/beta power in the action period, 0 s-0.35 s, t weight = 2080.62, p = .037, the delay period, 0.35 s-2.5 s, t weight = 32,570.32, p < .001, as well as the feedback phase, 2.51 s-3.2 s, t weight = 8793.44, p = .007. Additionally, we found a negative cluster in the delta/theta range during the feedback period, 2.32 s-2.92 s, t weight = −2066.58, p = .038. This indicates that with increased learning progress, positive feedback had less impact on midfrontal theta power during adjustment learning.
For incorrect trials during adjustment learning, we found three adjacent positive clusters in the alpha/beta range, indicating positive correlations between learning progress and alpha/beta power during the action period, 0 s-1.21 s, t weight = 8824.38, p < .001, the delay period, 1.3 s-2.26 s, t weight = 6269.17, p < .001, and the feedback phase, 2.54 s-3.2 s, t weight = 5924.08, p < .001. Notably, the positive cluster during the action period extended into the delta/theta range, suggesting a positive correlation between learning progress and both the midfrontal alpha as well as delta/theta response during response selection. As for correct trials, there was a significant negative cluster in the theta range during the feedback period, 2.37 s-3.16 s, t weight = −1529.33, p = .04, indicating that increased learning progress during adjustment learning was associated with decreasing midfrontal delta/theta reactivity towards negative feedback.
To conclude, we found that midfrontal oscillatory reactivity correlated with participants' trial-specific learning progress. During both blank-slate learning and adjustment learning, higher learning progress was related to higher alpha/beta power. Additionally, during both phases higher learning progress was related to an increased midfrontal theta response during the initial presentation of incorrect items. Only during adjustment learning did the impact of both positive and negative feedback on midfrontal theta power significantly diminish with increasing learning progress.
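The cluster statistics reported above can be illustrated with a minimal sketch of a cluster-based permutation test. It assumes that "t weight" denotes the sum of t values within a cluster of adjacent supra-threshold points, and it uses a one-dimensional time axis with random sign flips of paired differences. The function names, the threshold, and the simulated data are hypothetical and are not taken from the analysis pipeline of this study.

```python
import math
import random


def paired_t(diffs):
    """One-sample t value for a list of paired condition differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        return 0.0
    return mean / math.sqrt(var / n)


def cluster_masses(t_values, threshold):
    """Sum t values over runs of adjacent same-sign points with |t| > threshold."""
    masses, current = [], 0.0
    for t in t_values:
        if abs(t) > threshold and (current == 0.0 or (t > 0) == (current > 0)):
            current += t
        else:
            if current != 0.0:
                masses.append(current)
            current = t if abs(t) > threshold else 0.0
    if current != 0.0:
        masses.append(current)
    return masses


def cluster_permutation_test(diff_matrix, threshold, n_perm=500, seed=0):
    """diff_matrix[s][k] is the condition difference of subject s at time point k.

    Returns (cluster mass, p value) pairs. The null distribution is the largest
    absolute cluster mass obtained after randomly flipping the sign of each
    subject's difference curve.
    """
    rng = random.Random(seed)
    n_points = len(diff_matrix[0])

    def masses_for(data):
        t_vals = [paired_t([row[k] for row in data]) for k in range(n_points)]
        return cluster_masses(t_vals, threshold)

    observed = masses_for(diff_matrix)
    null_max = []
    for _ in range(n_perm):
        flipped = []
        for row in diff_matrix:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * v for v in row])
        null_max.append(max((abs(m) for m in masses_for(flipped)), default=0.0))
    return [
        (m, (1 + sum(nm >= abs(m) for nm in null_max)) / (n_perm + 1))
        for m in observed
    ]


# Simulated example (hypothetical data): 24 subjects, 40 time points,
# with a true positive difference at points 10 to 19.
data_rng = random.Random(1)
data = [
    [data_rng.gauss(1.5 if 10 <= k < 20 else 0.0, 1.0) for k in range(40)]
    for _ in range(24)
]
results = cluster_permutation_test(data, threshold=2.07)  # approx. critical t, 23 df
best = max(results, key=lambda r: abs(r[0]))
print(f"largest cluster: t weight = {best[0]:.2f}, p = {best[1]:.3f}")
```

Because only the largest cluster of each permutation enters the null distribution, the p value of every observed cluster is corrected for multiple comparisons across all time points.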

Heart rate
Cardiac activity during feedback processing was analysed by comparing the condition-wise averages in heart rate of each participant (cf. Fig. 8). A PHASE (blank-slate/adjustment) x PERFORMANCE (correct/incorrect) ANOVA showed no significant main effect of PHASE, F < 1, p = .31, BF = 0.25, a main effect of PERFORMANCE, F(1,23) = 97.03, p < .001, ηp² = 0.81, BF > 10^6, and a significant PHASE x PERFORMANCE interaction, F(1,23) = 21.64, p < .001, ηp² = 0.48, BF = 4.67. For blank-slate learning, incorrect trials (mean = 71.27, SD = 7.67) compared to correct trials (mean = 71.61, SD = 7.81) showed a significantly lower heart rate, t(23) = −6.95, p < .001, d = 1.42, BF = 38,493.95. During adjustment learning, heart rate was also significantly lower for incorrect trials (mean = 70.99, SD = 7.78) than for correct trials (mean = 72.02, SD = 7.75), t(23) = −10.87, p < .001, d = 2.22, BF > 10^6. The heart rate for incorrect trials did not differ between blank-slate and adjustment learning, t(23) = 1.15, p = .26, d = 0.23, BF = 0.39. However, the heart rate after feedback for correct trials was significantly higher during adjustment learning compared to blank-slate learning, t(23) = 3.65, p = .01, d = 0.75, BF = 27.46. Overall, performance feedback for both blank-slate and adjustment learning had an impact on cardiac activity, with feedback for correct compared to incorrect trials leading to a higher heart rate. The impact on heart rate was stronger for correct trials in the adjustment learning phase than for correct trials in the blank-slate learning phase.

Fig. 8. Heart rate after feedback for correct and incorrect trials during blank-slate and adjustment learning. Note. Grey lines show changes in heart rates of individual participants.

Table 2
Parameter estimates of the multilevel model for the relationship between learning progress and heart rate during the feedback period. Note. Table shows parameter estimates of hierarchical regression with trial-wise learning progress as predictor and heart rate as outcome variable for each experimental condition.
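As a plausibility check, the effect sizes reported for the heart rate tests above can be recovered from the test statistics. The sketch below assumes a sample of 24 participants (inferred from the 23 degrees of freedom), that d was computed as |t| / sqrt(n) for paired comparisons, and that the effect size reported alongside each F value is a partial eta squared. These conventions are assumptions that happen to reproduce the reported values; they are not stated in this section.

```python
import math


def cohens_d_from_paired_t(t_value, n):
    """Standardised paired difference, d_z = |t| / sqrt(n)."""
    return abs(t_value) / math.sqrt(n)


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


n = 24  # assumed sample size, inferred from df = 23

d_blank = cohens_d_from_paired_t(-6.95, n)     # incorrect vs. correct, blank-slate
d_adjust = cohens_d_from_paired_t(-10.87, n)   # incorrect vs. correct, adjustment
eta_performance = partial_eta_squared(97.03, 1, 23)
eta_interaction = partial_eta_squared(21.64, 1, 23)

print(round(d_blank, 2), round(d_adjust, 2))                  # 1.42 2.22
print(round(eta_performance, 2), round(eta_interaction, 2))   # 0.81 0.48
```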

Heart rate-behaviour correlations
We estimated the relationships between learning progress and heart rate by using hierarchical regression based on the trial-wise data of each participant. Thus, the results indicate how changes in learning progress are related to changes in heart rate on individual trials (cf. Fig. 9 and Table 2). For blank-slate learning, we found that for incorrect trials higher learning progress was a significant negative predictor of heart rate, b = −2.37, CI = [−3.69, −1.06], p = .002. Conversely, for correct trials during blank-slate learning, learning progress was a marginally significant positive predictor of heart rate, b = 1.84, CI = [0.04, 3.54], p = .05. For adjustment learning, we found no significant relation between learning progress and heart rate for incorrect trials, b = −0.44, CI = [−2.96, 2.05], p = .75. For correct trials during adjustment learning, learning progress was a significant positive predictor of heart rate, b = 2.65, CI = [1.03, 4.23], p = .004. To summarise, increased learning progress tended to predict a higher heart rate in response to positive feedback, but a lower heart rate in response to negative feedback. This relationship was not present during incorrect trials in the adjustment phase. Overall, this indicates that with increasing learning progress the difference in cardiac reactivity between positive and negative feedback increased, leading to stronger heart rate acceleration during correct trials and stronger heart rate deceleration during incorrect trials.
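The logic of relating trial-wise learning progress to heart rate within participants can be sketched with a simplified two-stage approach: an ordinary least-squares slope is fitted per participant, and the slopes are then averaged across participants. This is a deliberately simpler stand-in for the hierarchical regression reported above, not the model that was actually fitted. The simulated data, the scaling of learning progress to the range 0 to 1, and all function names are hypothetical.

```python
import math
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def group_slope(per_subject_trials, t_crit):
    """Mean of the per-subject slopes with a t-based confidence interval."""
    slopes = [ols_slope(x, y) for x, y in per_subject_trials]
    n = len(slopes)
    mean = sum(slopes) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in slopes) / (n - 1))
    half_width = t_crit * sd / math.sqrt(n)
    return mean, (mean - half_width, mean + half_width)


# Simulated data (hypothetical): 24 subjects with 60 trials each and a true
# average increase of 2 beats per minute across the range of learning progress.
rng = random.Random(42)
subjects = []
for _ in range(24):
    baseline = rng.gauss(71.0, 7.0)    # subject-specific baseline heart rate
    slope = rng.gauss(2.0, 1.0)        # subject-specific effect of learning progress
    progress = [rng.random() for _ in range(60)]
    heart_rate = [baseline + slope * p + rng.gauss(0.0, 1.5) for p in progress]
    subjects.append((progress, heart_rate))

# 2.069 is the two-sided 5% critical t value for 23 degrees of freedom
b, ci = group_slope(subjects, t_crit=2.069)
print(f"b = {b:.2f}, CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Fitting the slope within each participant removes between-subject differences in baseline heart rate, which is the main reason for analysing trial-wise data hierarchically rather than pooling all trials.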

Discussion
The current study compared blank-slate learning of novel stimulus-response associations with behavioural adjustments where previously learned associations had to be overwritten. For both the learning of novel associations and behavioural adjustments, incorrect compared to correct trials led to increased midfrontal theta power during performance feedback, as well as decreased alpha/beta power during both stimulus and feedback presentation. Additionally, in both types of learning, feedback for incorrect compared to correct trials led to significant heart rate deceleration. Importantly, particularly during feedback presentations, adjustment learning compared to blank-slate learning showed higher midfrontal theta reactivity for incorrect trials, as well as a higher heart rate for correct trials. Thus, during the adjustment of previously learned stimulus-response associations, the differences in neural and cardiac reactivity between positive and negative feedback were more pronounced. Additionally, we found that higher learning progress predicted lower midfrontal theta reactivity during adjustment learning. In contrast, higher learning progress was related to increased differences in heart rate reactivity between positive and negative feedback. This suggests a partial differentiation in the functional relevance of cardiac and midfrontal oscillatory activity during novel association learning and behavioural adjustments.
It is noteworthy that significant differences in oscillatory activity between incorrect and correct trials emerged early during the course of a trial and before participants received feedback about their performance. For adjustment learning, the mere presentation of incorrect items led to a broadband increase in oscillatory power, which included increases in both the alpha/beta range, as well as midfrontal theta power. Incorrect responses during adjustment learning are most likely to occur for stimuli for which participants have not yet successfully overwritten their old response patterns. Thus, the initial spike in theta power for incorrect items indicates an early identification or 'tagging' of more challenging stimuli. The significant increase in theta power during stimulus presentations in the adjustment phase is consistent with the theory that midfrontal theta oscillations partly reflect a neural conflict detection signal (Cavanagh et al., 2012; Nigbur et al., 2011).
In many traditional cognitive control paradigms, such as flanker or Stroop tasks, conflicts are induced by the simultaneous presentation of two stimulus features which are associated with incompatible responses (Cohen, 2014a; Ridderinkhof et al., 2011). In contrast, the stimuli in our task were not inherently associated with any specific response pattern. Instead, the conflict during adjustment learning was induced via participants' own learning history. In accordance with this interpretation, we found that increased learning progress, as estimated via computational modelling, predicted stronger midfrontal oscillatory power during stimulus presentations for items on incorrect trials. This suggests that with increased learning progress participants became better at quickly identifying the items for which correct stimulus-response associations were not yet formed. This finding supports the theory that the regulation of neural resources for cognitive control is modulated via associative learning mechanisms, such that participants can learn over time to associate specific stimulus or context features with an enhanced need for top-down control (Braem et al., 2020; Egner, 2014; Verguts and Notebaert, 2009).
In accordance with previous studies of error processing, we found increased midfrontal theta power during feedback presentations for incorrect compared to correct trials (Valadez and Simons, 2018; van de Vijver et al., 2018; van Driel et al., 2012). Importantly, our findings show that midfrontal theta reactivity, as well as midfrontal alpha activation, for negative feedback was stronger during adjustment learning than during blank-slate learning. Since the overall number, as well as the relative complexity of to-be-remembered stimulus-response associations did not differ between the two types of learning in our task, this could indicate an increase in the demand for processing resources during learning adjustments. Previous studies have shown that midfrontal oscillations are sensitive to expectation violations (Cavanagh et al., 2012; Hajihosseini and Holroyd, 2013; Harper et al., 2017). During the learning of novel stimulus-response associations, participants do not yet have strong expectations about the feedback they might receive for their responses. Conversely, adjustment learning is triggered by a mismatch between expected outcomes and actual feedback. Thus, higher oscillatory reactivity towards feedback for adjustment compared to blank-slate learning could reflect increased prediction error. In line with this interpretation, we found that during adjustment learning theta reactivity towards feedback significantly decreased with increasing learning progress. This might reflect the updating of expectations about the feedback during behavioural adjustments. While the changes in stimulus-response contingencies first led to heightened prediction error, participants changed their internal stimulus-response associations over the course of the adjustment process, which resulted in lower midfrontal impact of received feedback.
Overall, our findings indicate that differences in neural processing of blank-slate learning and behavioural adjustments are driven by the increased expectation violations during the adjustment process. Successful behavioural adjustments are characterised by the updating of internal S-R contingency models to minimise ensuing prediction error.
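The prediction-error argument can be made concrete with a minimal delta-rule sketch. It is a generic illustration and not the state-space model used to estimate learning progress in this study; the learning rate, the number of trials, and the neutral starting expectation of 0.5 are hypothetical choices. The sketch tracks the expected outcome of one response to one stimulus, first while that response is rewarded (blank-slate learning) and then after the contingency has changed (adjustment learning).

```python
def delta_rule(outcomes, start_value, learning_rate):
    """Update an outcome expectation trial by trial.

    Returns the final expectation and the absolute prediction error on
    every trial.
    """
    value, errors = start_value, []
    for outcome in outcomes:
        error = outcome - value            # prediction error on this trial
        errors.append(abs(error))
        value += learning_rate * error     # move expectation towards the outcome
    return value, errors


learning_rate = 0.3  # hypothetical value

# Blank-slate learning: no prior expectation (0.5), the response is correct.
learned_value, blank_errors = delta_rule([1.0] * 20, 0.5, learning_rate)

# Adjustment learning: the same response is now incorrect, but the
# expectation acquired during blank-slate learning is still in place.
_, adjust_errors = delta_rule([0.0] * 20, learned_value, learning_rate)

print(f"first blank-slate error: {blank_errors[0]:.2f}")
print(f"first adjustment error:  {adjust_errors[0]:.2f}")
print(f"last adjustment error:   {adjust_errors[-1]:.4f}")
```

Under these assumptions the first feedback after remapping produces a larger prediction error than the first feedback during blank-slate learning, and the error shrinks as the expectation is updated, which mirrors the decrease in feedback-related theta reactivity with increasing learning progress.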
Similarly to midfrontal neural activity, and in line with research suggesting a crucial role of autonomic responses in error processing, cardiac responses were influenced by the type of feedback (Ullsperger et al., 2010). Feedback for incorrect compared to correct trials led to a lower heart rate. This finding is consistent with previous studies suggesting stronger heartbeat deceleration after the presentation of negative information (Crone et al., 2003; Kube et al., 2016). As for midfrontal oscillations, differences in heart rate between negative and positive feedback were stronger during adjustment learning than during blank-slate learning. However, midfrontal power and cardiac activity markedly differed in their relation to participants' individual learning rates. Increases in learning progress were associated with a decrease of midfrontal reactivity, meaning that the differences between neural reactivity for positive and negative feedback became less pronounced. Conversely, increased learning progress led to higher heart rate after positive feedback and lower heart rate after negative feedback, meaning that the differences in cardiac reactivity between positive and negative feedback became more pronounced. Thus, neural and cardiac measures showed opposite patterns of reactivity over the course of the learning process.
The increases in cardiac reactivity over the learning period could indicate increases in affective relevance of performance feedback. Heart rate reactivity towards affective stimuli has been shown to depend on subjective emotional intensity (Bradley et al., 2001; Lang et al., 1993; Poli et al., 2007). At the beginning of both blank-slate and adjustment learning, participants have low control over the feedback they receive, since they cannot know in advance the currently correct stimulus-response contingencies. Instead, during the initial trials of each phase participants have to rely more strongly on trial-and-error learning, and hence positive and negative feedback is mostly due to chance. Accordingly, initial feedback is less reflective of participants' personal ability to perform successfully at the task. In contrast, feedback during the advanced period of each learning phase more strongly depends on participants' individual ability to learn from previous trials. Thus, feedback which occurs later in the process is more likely to be interpreted by participants as reflective of their own performance, and therefore as more self-relevant. Overall, the increase in the impact of performance feedback on cardiac reactivity with higher learning progress could indicate that feedback towards the end of the learning process has more affective value for participants.
Our finding that learning progress partly predicts the impact of performance feedback on cardiac activity contributes to a better understanding of the role of autonomic reactivity in self-regulation. It has been suggested that the integration of cortical activity with information from the autonomic nervous system plays an important role in feedback processing, for example for the regulation of awareness for our own actions and conscious perception of errors (Tallon-Baudry et al., 2018; Ullsperger et al., 2014; Wessel et al., 2011). However, to fully understand to what extent autonomic activity is mainly reflective of neural learning processes, or rather might play a role in shaping self-regulation via bottom-up feedback, further research is needed concerning the interplay between autonomic and central processing during action adjustments (Marshall et al., 2018; Seth et al., 2012).
While the main focus of our study was midfrontal theta oscillations, our results showed modulations in a broad range of frequencies beyond the theta range. Most prominent were effects of blank-slate learning and behavioural adjustments on oscillations in the alpha/beta range. In line with previous studies, negative compared to positive feedback led to lower alpha/beta power during stimulus presentation, delay period and feedback (Clarke et al., 2018; Khader et al., 2010). It has been suggested that heightened alpha power shields cortical regions against external interference, which might be particularly beneficial during the retention of task-relevant information (Bonnefond and Jensen, 2012; Clayton et al., 2017; Klimesch, 2012). In contrast, lower alpha power is indicative of increased attention towards external stimuli (Van Diepen et al., 2019). Error trials, which are more likely to consist of items for which no sufficiently reliable stimulus-response association has been formed yet, increase the need to attend towards external information in order to learn the correct responses (van Driel et al., 2012; Wessel, 2017). Thus, the differences in alpha activity between correct and incorrect trials in the current study could indicate differences between internal and external information processing (Hanslmayr et al., 2011).
While we investigated the modulation of oscillatory power during feedback learning, we also observed task-related differences in event-related potentials (ERPs) with a similar spatiotemporal distribution as midfrontal theta oscillations (see Supplemental Material). This is in line with previous studies which related the processing of errors and behavioural conflicts to ERPs, such as the P300 and feedback-related negativity (Huster et al., 2013; Wessel et al., 2011). The relation between conflict-related ERPs and oscillatory effects is as yet unclear. While some authors suggest that changes in midfrontal theta power can be seen as a by-product of spikes in event-related activity (Wang and Ding, 2011), others suggest that ERPs might arise due to sudden changes in neural oscillations (Barry, 2009; Harper et al., 2014; Trujillo and Allen, 2007). Event-related and oscillatory effects during conflict and feedback processing are likely to at least partially stem from overlapping neural generators such as the anterior cingulate cortex (Cavanagh and Frank, 2014; Huster et al., 2013). Further research is needed to determine to what extent ERP and oscillatory correlates of behavioural conflicts can be dissociated with respect to their underlying functions and related behavioural outcomes.
When considering the results of the current study, it is important to note some limitations in our design. First, by changing stimulus-response contingencies in the adjustment phase, we aimed at identifying neural and cardiac processes related to flexible behavioural changes. However, it would be wrong to assume that errors during adjustment learning were solely driven by the changes in stimulus-response associations. The adjustment phase most likely also included errors which are not specific to adjustment learning, such as the forgetting of previously learned, but still valid stimulus-response mappings, as well as attentional slips in the latter part of the learning phase. Thus, while the remapping of stimulus-response contingencies was clearly related to modulations of neural and cardiac reactivity, these effects should not be seen as a pure measure of internal readjustment processes. Therefore, it remains an important challenge for future studies to more clearly distinguish between neural processes related to flexible adjustments and other cognitive mechanisms involved in feedback learning.
As is common in feedback learning paradigms, learning progress was highly time-dependent, with a higher number of learning trials leading to an increased success rate. As a result, our data cannot unambiguously distinguish between the effects of learning progress and time-dependent effects not directly related to learning, such as habituation due to repeated stimulus presentations (but see Supplemental Material for a comparison of early and late learning trials). Future studies could probe this distinction by including a control condition in which participants repeatedly respond to stimuli without obtaining feedback that would allow them to learn stable stimulus-response associations.
Another limitation of our study is the focus on within-subject differences between correct and incorrect responses. This leaves open to what extent cardiac activity and midfrontal neural oscillations can be linked to interindividual variability in learning success. While we detected mostly strong effects of our experimental manipulation within subjects, the sample size in this study would most likely not allow for reliable between-subject comparisons between more and less successful participants. To substantiate the link between individual learning progress and either cardiac or midfrontal reactivity, it would be important in future studies to compare participants who differ in their learning performance with larger sample sizes.
To summarise, our study shows that both the blank-slate learning of new associations, as well as the readjustment of existing information are characterised by significant changes in midfrontal oscillations and cardiac activity. Neural and cardiac reactivity differed between correct and incorrect trials. Particularly during learning feedback, differences between correct and incorrect trials in midfrontal theta power and heart rate were more pronounced for adjustment compared to blank-slate learning. This could highlight the role of predictive processing for learning processes, and the need to overcome cognitive conflicts based on a mismatch between previously learned and currently relevant information during behavioural adjustments. Moreover, we found that increased learning progress leads to lower impact of performance feedback on midfrontal theta power, but higher impact on cardiac reactivity. Theories of reinforcement learning often assume that feedback can be either classified by its information value (i.e., its predictability), or its affective value (i.e., its emotional impact; Hajihosseini and Holroyd, 2013; Cavanagh and Frank, 2014). Overall, our results suggest that while midfrontal theta oscillations might be more sensitive towards expectation violations, cardiac reactivity is more likely to be influenced by the affective value and self-relevance of performance feedback.

Funding
This work was supported by a grant to AG and SSB by the German Research Foundation (DFG), project number 402781060.

Data accessibility
Data and materials of this study are archived online at https://osf.io/cndwk/?view_only=938ea4652dee4f3b885e8e55d47c8f69

Declaration of Competing Interest
The authors declare no conflict of interest.