Risk prediction error signaling: A two-component response?

Organisms use rewards to navigate and adapt to (uncertain) environments. Error-based learning about rewards is supported by the dopaminergic system, which is thought to signal reward prediction errors to make adjustments to past predictions. More recently, the phasic dopamine response was suggested to have two components: the first rapid component is thought to signal the detection of a potentially rewarding stimulus; the second, slightly later component characterizes the stimulus by its reward prediction error. Error-based learning signals have also been found for risk. However, whether the neural generators of these signals employ a two-component coding scheme like the dopaminergic system is unknown. Here, using human high-density EEG, we ask whether risk learning, or more generally speaking surprise-based learning under uncertainty, similarly comprises two temporally dissociable components. Using a simple card game, we show that the risk prediction error is reflected in the amplitude of the P3b component. This P3b modulation is preceded by an earlier component that is modulated by stimulus salience. Source analyses are compatible with the idea that both the early salience signal and the later risk prediction error signal are generated in insular, frontal, and temporal cortex. The identified sources are parts of the risk processing network that receives input from noradrenergic cells in the locus coeruleus. Finally, the P3b amplitude modulation is mirrored by an analogous modulation of pupil size, which is consistent with the idea that both the P3b and pupil size indirectly reflect locus coeruleus activity.


Introduction
Reward is crucial to adapt to constantly changing environments. Reward-based learning is well captured by reinforcement learning models (Sutton and Barto, 1998) that use the reward prediction error as a learning signal. The reward prediction error is the difference between the actual and expected reward. The activity of midbrain dopaminergic neurons shows a remarkable correlation with the reward prediction error (see Schultz, 2015, for a review). It was recently discovered that the phasic dopamine response is comprised of two components (see Schultz, 2016, for a review). First, the sensory component signals the salience of a stimulus, independent of its reward value (Day et al., 2007; Kobayashi and Schultz, 2014). The sensory component varies with the physical salience of the stimulus (e.g., its contrast, luminance, or loudness), its novelty salience, and its surprise salience (Horvitz et al., 1997). Second, the value component is sensitive to the motivational salience of the stimulus, e.g., the reward prediction error (Waelti et al., 2001).
We previously argued that prediction error-based learning is not restricted to the prediction of reward, but also applies to risk, such as economic risk (Preuschoff et al., 2006, 2008, 2011). Just like reward prediction errors, risk prediction errors, or more generally surprise, measure violations of expectations and are thus viable learning signals. Risk prediction errors occur in response to unexpected uncertainty, that is, when the variability of a reward is misjudged. For example, returns on stock investments are always variable, but the market dynamics could suddenly be amplified by an unexpected event, like a natural disaster or the sudden announcement of a new policy. It was proposed that the noradrenergic system signals risk prediction errors (Dayan and Yu, 2006; Preuschoff et al., 2008, 2011; Nassar et al., 2012). Noradrenergic neurons in the locus coeruleus respond to unexpected changes such as a novel stimulus or the reversal of reward contingencies and have widespread projections throughout the brain, including insular cortex (Nieuwenhuis et al., 2011). As pupil dilation is modulated by locus coeruleus activity (Costa and Rudebeck, 2016; Gilzenrat et al., 2010; Nieuwenhuis et al., 2005, 2011; Murphy et al., 2011, 2014), the size of the pupil is frequently employed as a proxy of noradrenergic activity. Several studies found correlations between pupil dilation and risk prediction errors (Preuschoff et al., 2011; Nassar et al., 2012; Kloosterman et al., 2015).
Hence, the signaling of risk prediction errors by locus coeruleus noradrenergic neurons parallels the signaling of reward prediction errors by midbrain dopaminergic neurons. However, whether the neural response to risk prediction errors consists of two subcomponents is unknown. Research using fMRI and pupillometry is unable to answer this question, because the pupil and BOLD response are too sluggish to resolve such fine temporal differences. Here, we capitalized on the excellent temporal resolution of EEG. We recorded high-density EEG to test if risk processing also consists of two components, where the first one is coding for saliency and the second one for the prediction error.
We hypothesized that the P3b event-related potential (ERP) component of the EEG is a likely candidate for risk prediction error modulation. The P3b component is a late (>300 ms) sustained positivity, with a topographical focus at central posterior electrode sites. It is typically observed in response to infrequent or deviant stimuli whose appearance is in some way expected or task-relevant, like the presentation of a target in a stream of distractor stimuli (Polich, 2007, 2012; Luck, 2014). P3b partially overlaps with the slightly earlier P3a component, which has a more frontal focus and is also observed in response to stimuli that are task-irrelevant, like a loud noise in a quiet environment, or the appearance of a novel visual stimulus (Polich, 2007, 2012). Because all stimuli in our paradigm are in some form expected, we focused on the P3b component, which we will refer to as P3 hereafter. Donchin (1981) proposed that the P3 is linked to "context updating" and the updating of expectations about the environment, which suggests an important role in learning (however, see Verleger, 1988; response by Donchin and Coles, 1988). Research on the P3 has primarily used oddball paradigms (cf. Polich, 2012; Luck, 2014) where rare events (the "oddballs") elicit larger P3 amplitudes than more frequent events (Duncan-Johnson & Donchin, 1977; Polich, 2012). Importantly, P3 amplitudes correlate more strongly with the subjective than with the objective frequency (Duncan-Johnson & Donchin, 1977). In other words, the P3 is driven less by how infrequently an event occurs than by how unexpected or surprising the event is. In line with this proposition, Nieuwenhuis and colleagues have proposed that the P3 component, like pupil size, reflects phasic locus coeruleus activity (see Nieuwenhuis et al., 2005 for a review of evidence). Hence, the P3 may reflect the processing of risk prediction errors in the locus coeruleus.
Note that in oddball paradigms, changing the frequency (probability) of the oddball simultaneously changes the risk associated with the odd stimulus (as for example in Duncan-Johnson & Donchin, 1977). A similar argument can often be made for studies that link the P3 to reward processing, as in Yeung and Sanfey (2004), where the change in valence results in a simultaneous change in risk in a gambling paradigm. As such, reward prediction error, expectation, and risk prediction error are correlated and it is, hence, not possible to disambiguate their contributions to the P3 response. Here, we use a design that addresses this problem by orthogonalizing reward prediction errors and risk prediction errors (surprise), allowing us to disentangle risk from reward processing (Preuschoff et al., 2006, 2008; see the supplementary material, which shows how the errors are orthogonalized).
Sensory ERP components that precede the P3 are modulated by attention and the bottom-up attributes that determine the physical salience of a stimulus. For example, the first reliably detectable ERP response to a visual stimulus is the P1 component, which peaks around 100 ms after stimulus onset at contralateral occipital electrode sites (Luck, 2014). Stimuli with high luminance, contrast, or duration evoke P1 components with higher amplitude and lower latency than faint or brief stimuli (Luck, 2014). Likewise, attended stimuli lead to larger P1 amplitudes than unattended stimuli (Hillyard et al., 1998; Mangun, 1995; Taylor, 2002). However, sensory components like P1 reflect stimulus processing in extrastriate cortex (Di Russo et al., 2003; Clark et al., 1995; Gomez Gonzalez et al., 1994), rather than noradrenergic structures. Whether the neural sources that give rise to the EEG signature of risk prediction errors produce their own response to stimulus salience that precedes the prediction error signal is an open question.
In summary, dopamine neurons support learning by a) signaling the salience of stimuli in a first, rapid sensory component, and b) signaling reward prediction errors in a second, subsequent value component. Using human fMRI and pupillometry, we showed that noradrenaline-mediated risk prediction error processing resembles the dopamine-mediated processing of reward prediction errors (Preuschoff et al., 2008, 2011). But we could not resolve whether the risk prediction error signal is preceded by a salience component, as in the case of dopamine. Here, we used EEG and show that, indeed, there are two processes: early components reflecting saliency and a subsequent component (P3) reflecting the risk prediction error. Inverse solutions of our EEG data suggest that both arise from the same neural sources in the noradrenergic system.

Participants
Twenty participants took part in the experiment (age 18-27 years, M = 23.5, SD = 2.6; 10 females). All observers were right-handed (Oldfield, 1971) and had normal or corrected-to-normal visual acuity, as indicated by a value of 1.0 in at least one eye in the Freiburg Visual Acuity test (Bach, 1996). All participants gave written informed consent prior to the experiment, and were paid based on their performance. The experiment was performed in accordance with the Declaration of Helsinki (World Medical Association, 2013).

Task
We adapted the card game paradigm that we used in earlier fMRI and pupillometry studies for use with EEG (Fig. 1 and Preuschoff et al., 2006, 2008, 2011). In each trial, two cards were drawn randomly without replacement from ten cards with the values 1-10. Before the first card was shown, participants bet on whether the second card would be higher or lower than the first card. The participants were aware that each Card 1 value occurred equally often per block, so that the probability of winning at the moment of the bet was 50%. The participant received a brief visual confirmation of the chosen bet (0.4 s), followed by a blank screen (1.6 s), during which only a central fixation cross was presented. Card 1 was then presented (0.8 s), which changed the probability of winning. For example, if Card 1 was a 4, the probability of winning changed from 5/10 to 3/9 for a "Second card lower" bet and to 6/9 for a "Second card higher" bet. After a prolonged blank-screen interval (3.0 s), during which the participant anticipated the outcome, Card 2 was presented (0.8 s), resolving the gamble. Card 2 was randomly selected from the remaining nine values. After another blank screen (1.6 s) the participants evaluated whether they had won or lost their bet and 1 CHF. Trials were separated by a blank-screen inter-trial interval of variable duration (0.5-1.5 s). The participants started each block with an endowment of 45 CHF (40 EUR) and won or lost 1 CHF on every trial. A penalty of 0.25 CHF was incurred for incorrect bet evaluations and non-responses. Feedback was only provided at the end of the block. Six blocks of 50 trials were run per participant (5 trials per Card 1 value and block). At the end of the experiment, the participants were paid the end-balance of one randomly chosen block.
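The way Card 1 changes the probability of winning can be made concrete with a short sketch. It is illustrative only: the function name `p_win` and the bet labels are assumptions introduced here, not part of the original task code.

```python
from fractions import Fraction


def p_win(card1: int, bet: str, n_cards: int = 10) -> Fraction:
    """Probability of winning after Card 1 is shown (hypothetical helper).

    Card 2 is drawn without replacement, so n_cards - 1 values remain.
    """
    if not 1 <= card1 <= n_cards:
        raise ValueError("card1 must lie between 1 and n_cards")
    if bet == "higher":
        favourable = n_cards - card1   # remaining values above Card 1
    elif bet == "lower":
        favourable = card1 - 1         # remaining values below Card 1
    else:
        raise ValueError("bet must be 'higher' or 'lower'")
    return Fraction(favourable, n_cards - 1)


# Card 1 = 4: three of the nine remaining cards are lower, six are higher.
print(p_win(4, "lower"))   # 1/3
print(p_win(4, "higher"))  # 2/3
```

Averaged over the ten equally likely Card 1 values, the probability of winning is 1/2 for either bet, which matches the 50% chance at the moment of the bet.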

Model
Our models for reward prediction errors and risk prediction errors are as follows (see Preuschoff et al., 2008, 2011 for mathematical details). In brief, expected reward and risk (or uncertainty) are the first and second order moments of the reward distribution. The reward prediction error measures the deviation from the mean (expected reward) of this distribution; the risk prediction error measures the deviation of the squared reward prediction error from the variance (risk) of this distribution.
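A minimal numerical sketch of these quantities is given below. It rests on assumptions that are not spelled out in this section: payoffs of +1 for a win and -1 for a loss, and a risk prediction error computed as the squared reward prediction error minus its expectation. The function names are hypothetical, and the exact formulae should be taken from Preuschoff et al. (2008, 2011).

```python
from typing import List, Tuple


def expected_reward(p: float) -> float:
    """Mean payoff when a win pays +1 and a loss pays -1 (assumed payoffs)."""
    return 2.0 * p - 1.0


def risk(p: float) -> float:
    """Variance of the +1/-1 payoff, used here as the predicted risk."""
    return 4.0 * p * (1.0 - p)


def card2_errors(p: float, won: bool) -> Tuple[float, float]:
    """Reward and risk prediction error when Card 2 reveals the outcome."""
    reward = 1.0 if won else -1.0
    reward_pe = reward - expected_reward(p)
    return reward_pe, reward_pe ** 2 - risk(p)


def card1_errors(p: float, p_values: List[float]) -> Tuple[float, float]:
    """Prediction errors when Card 1 moves the win probability from 0.5 to p.

    p_values lists the equally likely win probabilities that Card 1 can produce.
    """
    reward_pe = expected_reward(p) - expected_reward(0.5)
    # Risk predicted at the time of the bet: expected squared reward
    # prediction error over all possible Card 1 outcomes.
    predicted = sum((expected_reward(q) - expected_reward(0.5)) ** 2
                    for q in p_values) / len(p_values)
    return reward_pe, reward_pe ** 2 - predicted


p_values = [k / 9 for k in range(10)]  # win probabilities for Card 1 = 1..10
for p in (0.0, 4 / 9, 1.0):
    print(round(p, 2), card1_errors(p, p_values))
```

Under these assumptions the Card 1 risk prediction error is largest at win probabilities of 0 and 1 and smallest (negative) near 0.5, while the reward prediction error is linear in the win probability, so the two signals are uncorrelated across Card 1 values.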

Stimuli
Stimuli were presented using Matlab with Psychtoolbox (Brainard, 1997; Pelli, 1997). Participants viewed the stimuli from a distance of 1 m on a gamma-calibrated 24-inch Asus VG248QE LCD monitor (1920 × 1280 px, 100 Hz; http://display-corner.epfl.ch/index.php/ASUS_VG248QE). Responses were collected from two push-buttons, one held in the left and the other in the right hand. The cards measured 3.3 × 4.8 arcdeg and were white, with a number of black spades corresponding to the card value. The background was midlevel grey with a red fixation cross (7.6 × 7.6 arcmin).

EEG recording and analysis
The EEG was recorded in a dimly lit, electrically shielded recording chamber using a BioSemi Active Two system (BioSemi, Amsterdam, The Netherlands) with 192 active sintered Ag/AgCl electrodes. The set of electrodes uniformly covered the entire scalp. Electrode Cz was positioned halfway between the inion and nasion and halfway between the two ears. Four additional electrodes were positioned 1 cm above/below the right eye and 1 cm lateral to the outer canthi, to facilitate the detection of eye-blinks (electrooculogram, EOG). Data were recorded with a CMS (Common Mode Sensor) reference and re-referenced off-line to the average of all electrodes (mean reference) in the last step of the analysis. Data were recorded at a sampling rate of 2048 Hz and downsampled offline to 512 Hz before analysis.
The EEG data were pre-processed in Matlab (The MathWorks) with EEGLAB (Delorme and Makeig, 2004) and an in-house, automated pre-processing pipeline (Da Cruz et al., 2018). The pipeline included bandpass filtering to 0.01-40 Hz (3rd order Butterworth filter); removal of line-noise (Mullen, 2012); re-referencing to the biweight estimate of the mean of all electrode-channels (Hoaglin et al., 1983, 1985); removal and 3D spline interpolation of bad electrode channels; removal of bad epochs; and removal of epoch artifacts. All participants were retained for analysis. During pre-processing, an average of less than 6 out of 192 channels (about 3%) was interpolated per block and participant (M = 3.0%; SD = 1.4%; range: 1.7-6.7%; median = 2.5%). The average proportion of rejected epochs was 18% per block and participant for Card 1 (M = 18.2%; SD = 9.1%; range: 8-47%; median = 15.3%), and 17% for Card 2 (M = 16.6%; SD = 7.3%; range: 9-34%; median = 14.2%). As a sanity check, we also ICA-corrected the data for eye movements; this did not change the main results (see supplementary material, section S1). One disadvantage of our experimental design was that the card number was not presented in the center of the card, potentially motivating participants to make an eye movement to the number in the corner of the card and leading to avoidable ocular artifacts in the data. Our motivation to use an original playing card layout with off-center numbers was to stay close to real-life gambling cards, and to use the same stimuli as in earlier studies with this paradigm that used fMRI and pupillometry (Preuschoff et al., 2008, 2011).
The data were analyzed from a mid-central electrode cluster that was formed by averaging electrode Cz and the five surrounding channels. Large parts of the published literature on the P3 component used oddball-type tasks and central-posterior electrode sites (typically Pz or CPz). Our choice of electrode site Cz was motivated by Johnson (1986), who observed that the optimal electrode for observing P3 is task dependent: P3 was strongest at Pz for oddball-type tasks, but significantly stronger at Cz than Pz in a non-oddball task. Epochs of −100 to 800 ms were formed relative to each card onset and baseline corrected to the pre-stimulus period. We split the data into trials with high, medium, and low risk prediction error and performed RM-ANOVAs in STEN (Knebel, 2012) to find time windows with significant differences (Figs. 2 and 3). To control for multiple comparisons, we adjusted p-values for a false-discovery rate of 5% (Benjamini and Hochberg, 1995) and considered only time windows of 30 ms or more. The mean amplitude was then computed during these time windows and compared statistically in JASP (The JASP Team, 2016).
The inverse solutions were computed from the grand average data in CARTOOL (Brunet et al., 2011) using the Distributed Electrical Source Imaging method (Grave De Peralta Menendez et al., 2001) with the 152-Montreal Neurological Institute template and a space of 4022 solution points. The current source densities (CSD) were estimated with the Local Auto-Regressive Average (LAURA) algorithm (Brodbeck et al., 2011), and the 5% (201/4022) of solution points with the maximal CSD activity were clustered in SPM8 (http://www.fil.ion.ucl.ac.uk/spm/software/). For all clusters with at least five adjacent solution points, MRIcron's (http://people.cas.sc.edu/rorden/mricron/index.html) automated anatomical labeling atlas was used to locate their center of mass in MNI coordinates (Table 1), which was then converted to Talairach coordinates using an online version of the Yale BioImage Suite (http://sprout022.sprout.yale.edu/mni2tal/mni2tal.html).
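The sample-wise false-discovery rate adjustment combined with the 30 ms minimum window length can be sketched as follows. This is an illustration under stated assumptions, not the STEN implementation: the function names are hypothetical, the adjustment is applied across all samples of one epoch, and a window counts as significant when its adjusted p-values stay below alpha for at least 30 ms (16 samples at 512 Hz).

```python
import math
from typing import List, Tuple


def bh_adjust(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_windows(p_values: List[float], srate_hz: float = 512.0,
                        alpha: float = 0.05,
                        min_ms: float = 30.0) -> List[Tuple[int, int]]:
    """Runs of adjusted p < alpha lasting at least min_ms.

    Returns (start, stop) sample indices, with stop exclusive.
    """
    adjusted = bh_adjust(p_values)
    min_samples = math.ceil(min_ms * srate_hz / 1000.0)
    windows, start = [], None
    for i, p in enumerate(adjusted + [1.0]):  # sentinel closes a trailing run
        if p < alpha and start is None:
            start = i
        elif p >= alpha and start is not None:
            if i - start >= min_samples:
                windows.append((start, i))
            start = None
    return windows


# Toy input: one 20-sample run survives, one 5-sample run is too short.
toy = [0.5] * 10 + [0.001] * 20 + [0.5] * 10 + [0.001] * 5 + [0.5] * 5
print(significant_windows(toy))
```

The duration criterion discards brief runs of significant samples that survive the adjustment but are too short to be read as a component modulation.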

Pupillometry
Alongside the EEG, eyetracking data were recorded using an infrared video eyetracker (TheEyeTribe; v0.9.49, Firmware 293) and Dalmaijer (2014) Matlab control functions. The data were recorded binocularly at 30 Hz, then averaged over both eyes and smoothed with a 100 ms sliding-window average, to improve the signal-to-noise ratio. Trials in which the subject failed to respond were excluded from analysis. Samples with signal-loss in one or both eyes were identified using the manufacturer's 'state'-indicator and ignored during averaging. Epochs of −200 to 2400 ms relative to each card onset were formed. The single-participant averages were converted from absolute to relative pupil size, expressed in percent change compared to the average pupil size during the baseline period of the same subject and trial. As for the EEG data, the data were tested for differences between high, medium, and low risk prediction error trials using sample-wise RM-ANOVAs (Fig. 4B and C). Consistent with our earlier pupillometry study (Preuschoff et al., 2011), the sample with the lowest p-value was selected for further analyses (Fig. 4D and E). Eyetracking data were recorded from only a subset of 11 participants, because the device was not yet at our disposal in the beginning of the data collection phase. For the comparison of pupil size with EEG mean amplitudes, only the EEG data of the 11 participants for whom we had pupillometry data were used.
Fig. 1. Two cards are drawn randomly from the values 1-10. Before having seen the first card, the participant bets whether the second card will be higher or lower. In the period between first and second card, expected reward depends linearly and risk depends quadratically on the value of Card 1 and the chosen bet. Card 2 resolves the gamble.
Fig. 2. Grand average ERP waveforms, Cz cluster. A. ERP activity from −200 to +5400 ms relative to the onset of Card 1. Card presentations are represented by the shaded areas. Grey boxes outline the intervals analyzed in B and C. B. Card 1 (−100 to 800 ms): The amplitudes of an early negative and a late positive component varied as a function of risk prediction error magnitude. Amplitudes were more positive on trials with high (green, pWin = 0 or 1) than both medium (blue, 0<pWin<0.3 or 0.7<pWin<1) and low (black, 0.3<pWin<0.7) risk prediction error. Black bar: Significant main effect of risk prediction error. C. Card 2 (−100 to 800 ms): The amplitudes of an early positive and a late positive component were lower for trials with certain outcome (black, pWin = 0 or 1) than for trials with moderately (blue, 0.5<pWin<1 & win, or 0<pWin<0.5 & loss) and highly surprising outcome (green, 0<pWin<0.5 & win, or 0.5<pWin<1 & loss). The effect of the bet outcome (win/loss) is comparably minor, with amplitudes being only slightly more positive on win (solid lines) than loss trials (broken lines). Grey bar: Significant main effect of outcome. Black bar: Significant main effect of surprise. D. Mean EEG amplitude during the late significant interval after Card 1 (460-800 ms), plotted against the probability of winning after Card 1. The U-shaped model of risk prediction error, computed using the formulae provided in Preuschoff et al. (2011), is overlaid as solid lines and fits the data well. Shown is the best model fit that includes a linear and a quadratic component (model R² = 0.75; F = 19.75; p = .0005). E. Mean EEG amplitude during the late significant interval after Card 2 (243-800 ms), plotted separately for win (solid markers) and loss trials (open markers) against the probability of winning after Card 1. The model of risk prediction error (Preuschoff et al., 2011) is overlaid as solid line for win and as broken line for loss trials and again fits the data well. Shown is the best model fit that includes a linear and a quadratic component (model R² = 0.886; F = 27.24; p = .0001). As the certain cases are processed differently (as seen in the P1* component), we only include the uncertain trials in these regressions. Error bars are S.E.M.
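The pupil preprocessing described in the Pupillometry section (smoothing, then conversion to percent change from baseline) can be sketched as below. The details are assumptions made for illustration: a centred 3-sample window stands in for the 100 ms sliding average at 30 Hz, the first 6 samples of an epoch stand in for the 200 ms baseline, and both function names are hypothetical.

```python
from typing import List


def sliding_average(samples: List[float], window: int = 3) -> List[float]:
    """Centred moving average; 3 samples span 100 ms at 30 Hz.

    Near the edges, only the samples that exist are averaged.
    """
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def percent_change(epoch: List[float], n_baseline: int) -> List[float]:
    """Pupil size relative to the mean of the first n_baseline samples, in %."""
    baseline = sum(epoch[:n_baseline]) / n_baseline
    return [100.0 * (x - baseline) / baseline for x in epoch]


# Toy epoch: flat baseline of 6 samples (200 ms at 30 Hz), then dilation.
epoch = [10.0] * 6 + [11.0, 12.0]
print(percent_change(epoch, n_baseline=6))
print(sliding_average([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Expressing pupil size relative to each trial's own baseline removes slow drifts in absolute pupil size and makes traces comparable across participants.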

P3 amplitude is modulated by surprise/risk prediction error
Both card presentations were followed by pronounced event-related potentials (Fig. 2A). The amplitude decrease between Card 1 and Card 2 seen in Fig. 2A reflects participants' attentive anticipation of the second card (a phenomenon known as contingent negative variation; Brunia et al., 2011, 2012). We analyzed the data in time windows from −100 to 800 ms relative to each card onset and sorted the trials into three levels of risk prediction error (high/medium/low surprise; see Fig. 2 caption). Low risk prediction error (surprise) trials are those whose probability of winning lies between 0.3 and 0.7 after Card 1; high risk prediction error (surprise) trials are trials with probability 0 or 1 of winning after Card 1 (see also Fig. S1 in the supplementary material). Both after Card 1 and after Card 2, we found that larger surprise leads to higher amplitudes in the P3 range (Fig. 2B and C). In sample-wise RM-ANOVAs with false-discovery rate (FDR) adjustment, we found significant differences from 460 to 800 ms post Card 1 onset and 243-800 ms post Card 2 onset. The mean amplitudes during these intervals differed significantly between the three levels of risk prediction error (Card 1: F(1.48, 28.04) = 14.32, p < .001, η² = 0.43, Greenhouse-Geisser corrected; Card 2: F(2, 38) = 34.99, p < .001, η² = 0.65). For Card 1, mean amplitudes were higher for the high than both the medium and low risk prediction error trials. Mean amplitudes were higher for medium than low risk prediction error trials (t(19) = 6.00, p < .001, d = 1.34, diff. = 1.631 μV). Note that Card 1 merely changes the participant's reward prediction and certainty about the outcome. Card 2 reveals the outcome (reward). Because the surprise levels are larger after Card 2 than after Card 1, the three levels separate more strongly after Card 2 than after Card 1.
To quantify the relationship between the EEG activity and our model of risk prediction error, we plotted the mean amplitudes of the late significant intervals as a function of the probability of winning after Card 1 and 2 (Fig. 2D and E). For both cards, the mean amplitudes closely followed our model of risk prediction error (see Model and Fig. S1 of the supplementary material and Preuschoff et al., 2008, 2011 for details). After Card 1, the expected reward increases monotonically with the probability of winning, while the risk prediction error magnitude is U-shaped with a minimum at a probability of winning of 0.5 and maxima at 0 and 1. Mean amplitudes followed the U-shape of the risk prediction error model (Fig. 2D) and were highest when Card 1 predicted the outcome of the bet with certainty (i.e., Card 1 values 1 or 10 and probability of winning 0 or 1) and lowest when Card 1 was little predictive of the outcome (i.e., Card 1 values of 4 or 5 and probability of winning therefore still close to 1/2). After Card 2, the risk prediction error reflected how surprising the outcome was. Hence, it increased with the probability of winning on loss trials, but decreased on win trials. The mean amplitudes reflected this pattern (Fig. 2E; see supplementary materials for further analysis).
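The sorting of trials into surprise levels used in this section can be written down compactly. The sketch below follows the thresholds given in the Fig. 2 caption; the function names and level labels are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def surprise_level_card1(p_win: Fraction) -> str:
    """Risk prediction error level after Card 1, from the win probability."""
    if p_win in (0, 1):
        return "high"                      # outcome already certain
    if Fraction(3, 10) < p_win < Fraction(7, 10):
        return "low"
    return "medium"


def surprise_level_card2(p_win: Fraction, won: bool) -> str:
    """Surprise level after Card 2, given the win probability after Card 1."""
    if p_win in (0, 1):
        return "certain"
    # p_win is a multiple of 1/9, so it is never exactly 1/2.
    outcome_was_likely = (p_win > Fraction(1, 2)) == won
    return "moderate" if outcome_was_likely else "high"


# Win probabilities for a "higher" bet and Card 1 = 1..10.
probabilities = [Fraction(10 - card, 9) for card in range(1, 11)]
print(Counter(surprise_level_card1(p) for p in probabilities))
print(surprise_level_card2(Fraction(2, 9), won=True))
```

With ten equally frequent Card 1 values, this split assigns two values to the high, four to the medium, and four to the low level, so the high level rests on fewer trials than the other two.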

P1* and N2 are modulated by saliency
Next, we tested our hypothesis that a modulation by the risk prediction error is preceded by an earlier, risk-independent salience signal. We made use of the fact that in some trials the outcome is already certain after Card 1 is presented (i.e., pWin = 0 or pWin = 1). In these trials, Card 2 is no longer relevant (i.e., less salient) for the overall outcome of the gamble. This difference in saliency is reflected in the amplitude of the first positive component occurring around 180 ms after Card 2 onset (Fig. 2C). We call this positivity P1* to avoid confusion with the visual P1 component, which is usually recorded from posterior electrodes (e.g., Luck, 2014). Using sample-wise, FDR-adjusted RM-ANOVAs, we found an interval from 144 to 181 ms after Card 2 onset with significant amplitude differences between trials with certain (no surprise) and uncertain (low or high surprise) outcomes. The mean amplitude over this interval was significantly lower on trials with certain outcome than on trials with low and high surprise (t(19) = 2.86, p = .010, d = 0.64, diff. = 0.754 μV; and t(19) = 3.47, p = .003, d = 0.78, diff. = 0.808 μV, respectively). Mean amplitudes on trials with low and high surprise were nearly identical (t(19) = 0.32, p = .751, d = 0.07, diff. = 0.053 μV). Compatible with our interpretation of a saliency signal that is independent of reward, the mean amplitudes did not differ between won and lost bets (F(1, 19) = 0.019, p = .893, η² = 0). There was also no interaction between the bet outcome (i.e., reward) and surprise (F(1.34, 25.49) = 0.189, p = .739, η² = 0.01, Greenhouse-Geisser corrected).
Following Card 1, a difference between certain and uncertain trials first emerged on the N2 component (second negativity, 199-288 ms). Mean amplitudes over this interval were modulated by surprise (F(2, 38) = 11.70, p < .001, η² = 0.38) and were significantly less negative when Card 1 predicted an outcome with certainty than for both medium and low certainty predictions. Mean amplitudes in trials with low and medium certainty did not differ. The P1* and earlier components evoked by the presentation of Card 1 were modulated neither by the expected reward nor by the risk prediction error. This absence of a modulation reflects the fact that, in the beginning of the trial, all cards are equally salient. Their motivational salience changes once the card has undergone sufficient visual processing to be recognized, which appears to have required about 200 ms in our experiment. Hence, following both card presentations, the P3 was preceded by an early component (Card 1: N2 199-288 ms; Card 2: P1* 144-181 ms) whose amplitude reflected the saliency of the stimulus. The salience signal occurs later for Card 1 than for Card 2, because Card 1 first has to be processed to determine its value and salience. This is not the case for Card 2, when Card 1 already predicts the outcome with certainty (values 1 and 10). Further analyses such as model fitting and Fz and Pz clusters can be found in the supplementary material.

Common neural sources
We computed inverse solutions to test if the observed potentials are generated in structures related to risk processing. In the P3 time windows we found activations in temporal and frontal cortex (Fig. 3C and D). We localized the main foci of this activity to the left superior temporal lobe and the right insula for Card 1, and the left and right insula and the frontal lobe for Card 2 (Table 1). These sources are part of the risk processing network that we identified with fMRI (Preuschoff et al., 2006, 2008). Next, we tested whether the P1* and N2 components also originate from the same network. It is well established that early sensory components (including P1 and N2), which originate from extrastriate cortex, reflect the saliency of visual stimuli and whether they are attended. In line with this literature, we found strong activations of the occipital visual areas in the time windows of the central N2 of Card 1 and P1* of Card 2 (Fig. 3A and B). Importantly, we found that the sources of both components were not restricted to occipital areas, but occurred also in the middle temporal and superior temporal lobes, as well as the insular cortex (Table 1), the same areas that were active in the P3 intervals. This suggests that the same neural sources underlie the early salience signal (P1*/N2) and the later risk prediction error signal (P3) we measured at central electrodes.
Supplementary Movie 1 shows the evolution of the neural sources over the course of the trial: Shortly after onset of Card 1, bilateral sources in occipital and temporal cortex activate with about the same latency. The activity in occipital areas quickly dissipates, while in temporal areas the activity intensifies and spreads to frontal cortex. This activity peaks at around 1100 ms and then slowly dissipates. When Card 2 is presented, the same sources activate in the same order, but the temporal and frontal sources activate more intensely and more rapidly than after Card 1, peaking around 500-600 ms.
Fig. 4. A. Pupil size from −200 to 6200 ms relative to Card 1 onset, baseline corrected to the pre-stimulus interval. Upon the presentation of both cards (shaded areas), the pupil initially constricted but quickly started to dilate again. For the largest part, this response reflects the pupillary adaptation reflex to the changing luminance. However, the change in pupil size was modulated by the risk prediction error. Black boxes outline the intervals analyzed in B-E. B. Card 1 (−200 to 2400 ms). Top: Change of pupil size over time. The pupil dilated more on high (green, pWin = 0 or 1) than medium (blue, 0<pWin<0.3 or 0.7<pWin<1) and low (black, 0.3<pWin<0.7) risk prediction error trials. Bottom: Uncorrected p-values for the main effect of risk prediction error (sample-wise RM-ANOVAs). The dotted line indicates p = .05. The sample with the lowest p-value was selected for further analysis. C. Card 2 (−200 to 2400 ms). Top: On win trials (solid lines) the pupil dilated more in trials with highly surprising outcome (green, 0<pWin<0.5 & win, or 0.5<pWin<1 & loss) than in trials where surprise was low (blue, 0.5<pWin<1 & win, or 0<pWin<0.5 & loss) and trials with certain outcome (black, pWin = 1 or 0). In loss trials, the pupil size varied only little (dotted lines). Bottom: Conventions as in B. D-E. For the most significant sample, the pupil size correlates strongly with the mean EEG amplitude of the late significant interval for the same participants. Lines show quadratic (D) and linear (E) least-squares fits to the data.
Supplementary video related to this article can be found at https://doi.org/10.1016/j.neuroimage.2020.116766

The role of noradrenaline
Using an auditory version of this task, we have previously shown that pupil size signals surprise, and we hypothesized that this modulation is driven by the noradrenergic system (cf. Preuschoff et al., 2011). It has been suggested that the P3 component of the ERP likewise reflects phasic locus coeruleus activity. Consequently, if both pupil size and the P3 component are driven by noradrenergic activity, we should observe a strong correlation between the two signals. We thus recorded pupil size data alongside the EEG for a subset of 11 of the 20 participants. The pupil data mirror the EEG data in several aspects. First, we see a modulation of pupil size by surprise, which matches our previous report (Preuschoff et al., 2011). Pupil size was larger for high than for both medium and low risk prediction error trials after both cards (Fig. 4B and C). Second and more importantly, we observed a strong correlation between pupil size and EEG amplitude. Analogous to the EEG analyses, we performed sample-wise RM-ANOVAs on time windows locked to the Card 1 and Card 2 onsets (−100 to 2400 ms) to test for differences between high, medium, and low risk prediction error trials. In line with our previous study (Preuschoff et al., 2011), we extracted the pupil size at the time point where this difference was most significant for further analysis. We then plotted the pupil size as a function of the probability of winning after Card 1 and overlaid the mean EEG amplitudes of the P3 interval. For Card 1, the pupil size correlated strongly with the mean EEG amplitudes (Fig. 4D; r = 0.66, p = .027). For Card 2, we found a strong correlation for trials in which the bet was won (Fig. 4E; r = 0.61, p = .078), although it did not reach significance at the .05 level. On loss trials, pupil size varied much less overall (cf. dotted lines in Fig. 4C) and the correlation with the mean EEG amplitude was weak (Fig. 4E; r = 0.24, p = .536).
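The pupil analysis described above (sample-wise RM-ANOVAs, selection of the most significant sample, then correlation with the EEG amplitude) can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis code used in the study: the function names, the array layout (participants × conditions × samples), the simulated effect, and the choice of 11 observations for the correlation are all assumptions made for the example.

```python
import numpy as np
from scipy import stats


def rm_anova_one_way(data):
    """One-way repeated-measures ANOVA on an (n_subjects, n_conditions) array.

    Returns the F statistic and its p-value (no sphericity correction).
    """
    data = np.asarray(data, dtype=float)
    n_subj, n_cond = data.shape
    grand_mean = data.mean()
    # Partition the total sum of squares into condition, subject, and error parts
    ss_cond = n_subj * np.sum((data.mean(axis=0) - grand_mean) ** 2)
    ss_subj = n_cond * np.sum((data.mean(axis=1) - grand_mean) ** 2)
    ss_error = np.sum((data - grand_mean) ** 2) - ss_cond - ss_subj
    df_cond = n_cond - 1
    df_error = (n_subj - 1) * (n_cond - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, stats.f.sf(f_value, df_cond, df_error)


def most_significant_sample(pupil):
    """Run the RM-ANOVA at every time sample.

    pupil: array of shape (n_subjects, n_conditions, n_samples).
    Returns the index of the sample with the lowest p-value and all p-values.
    """
    p_values = np.array([rm_anova_one_way(pupil[:, :, t])[1]
                         for t in range(pupil.shape[2])])
    return int(np.argmin(p_values)), p_values


# Synthetic pupil data: 11 participants, 3 risk prediction error levels, 50 samples
rng = np.random.default_rng(0)
pupil = rng.standard_normal((11, 3, 50))
pupil[:, :, 30] += np.array([0.0, 2.5, 5.0])  # hypothetical effect at sample 30

best_sample, p_values = most_significant_sample(pupil)

# Synthetic pupil size and EEG amplitude values for the correlation step
pupil_size = rng.standard_normal(11)
eeg_amplitude = 0.7 * pupil_size + 0.5 * rng.standard_normal(11)
r, p = stats.pearsonr(pupil_size, eeg_amplitude)
print(best_sample, p_values[best_sample], r, p)
```

Because the p-values are uncorrected and the sample is selected for maximal significance, the selected time point serves only to pick a representative pupil measure, as in the text, and not as an independent test.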

Discussion
Goal-directed behavior requires the brain to maintain an up-to-date representation of the motivational value of choice options. This representation needs to take into account expected rewards as well as other aspects of the reward distribution such as risk (i.e., the variability of reward) and volatility (i.e., the rate at which the environment changes). It was proposed that the reward prediction error is encoded by activity of midbrain dopaminergic neurons. Recently, it was discovered that dopamine neurons respond to stimuli in two steps: First, a rapid response signals the saliency of the stimulus. Second, a later response signals the reward prediction error.
Using human fMRI, we have previously shown that the concept of reward prediction errors can be generalized to risk prediction error signals, which are reflected by activity in the anterior insula (Preuschoff et al., 2008) and the dilation of the pupil (Preuschoff et al., 2011). In the same way reward learning is driven by dopamine, risk learning is driven by noradrenaline (Dayan and Yu, 2006; Preuschoff et al., 2011; Faraji et al., 2018). However, we could not resolve whether the noradrenergic system uses a two-component coding scheme similar to that of the dopaminergic system (i.e., first salience, then prediction error).
Here, we used EEG and the same card game as in earlier experiments (Preuschoff et al., 2006, 2008, 2011). We found that the risk prediction error magnitude was reflected in the amplitude of the P3 component. Reconstructions of the neural sources of this component showed strong activations in the insula, temporal, and frontal cortex. In addition, both card presentations evoked an earlier component (P1*, N2), whose amplitude reflected the stimulus saliency. This is unsurprising, because early sensory components originating from visual cortex are modulated by stimulus salience and attention. However, the inverse solutions we computed for the early components pointed not only to sources in visual areas, but also showed strong activations in temporal cortex and the insula, sources similar to those we found for the later P3 component. This is compatible with our hypothesis that the same neural structures that generate the P3 risk prediction error signal also generate an earlier salience signal.
Source reconstructions of EEG components should be interpreted with caution, even when using high-density electrode setups like in our study. Any given pattern of activity observed at the scalp can theoretically be caused by an infinite number of underlying source configurations. In the present case, the inverse solutions point to sources in the risk processing network that we identified in fMRI experiments with the same paradigm (Preuschoff et al., 2006, 2008). In particular, the insula has frequently been related to uncertainty signals (e.g., Critchley et al., 2004; Xue et al., 2010; Jones et al., 2011; Weller et al., 2009), and more specifically to prediction errors (Preuschoff et al., 2008). Together with other regions, such as the anterior cingulate cortex, the insula is also part of the salience network, which is thought to recruit other brain areas for information processing. The reconstructed sources are hence highly compatible with our hypothesis that both the early and late component are generated in structures linked to noradrenaline signaling.
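The non-uniqueness of the inverse problem mentioned above can be made concrete with a toy linear forward model. The sketch below is an illustration under stated assumptions, not the source reconstruction method used in this study: the random lead field matrix, the toy numbers of electrodes and sources, and the minimum-norm estimate are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n_electrodes, n_sources = 8, 20  # toy sizes; real setups have far more of both

# Hypothetical forward model: maps source activity to scalp potentials
lead_field = rng.standard_normal((n_electrodes, n_sources))

sources_a = rng.standard_normal(n_sources)
scalp = lead_field @ sources_a

# With more sources than electrodes the lead field has a null space.
# The right singular vectors beyond the rank span that null space
# (a random Gaussian matrix has full row rank with probability 1).
_, _, vt = np.linalg.svd(lead_field)
null_basis = vt[n_electrodes:]

# A different source configuration that produces the same scalp pattern
sources_b = sources_a + 3.0 * null_basis[0]

# One common way to pick a single solution: the minimum-norm estimate
sources_min_norm = np.linalg.pinv(lead_field) @ scalp

print(np.allclose(lead_field @ sources_b, scalp))
print(np.linalg.norm(sources_a), np.linalg.norm(sources_min_norm))
```

Any multiple of any null-space vector can be added without changing the scalp pattern, which is why inverse solutions depend on additional constraints and are best read alongside converging evidence such as the fMRI results cited above.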
In an earlier study, Yu et al. (2011) used a similar paradigm to investigate reward prediction error and uncertainty signals in the feedback-related negativity (FRN; review: Walsh and Anderson, 2012). The FRN is a sustained negative component, which is typically computed as the amplitude difference between negative and positive feedback trials. It peaks around 300 ms after feedback and is strongest at fronto-central electrode sites. It has been linked to the dopaminergic system (Holroyd and Coles, 2002) and is thought to reflect activity in the anterior cingulate cortex (ACC), a structure that receives projections from midbrain dopaminergic regions. Yu and colleagues found that the FRN was sensitive to expected reward, but found an influence of uncertainty only in win trials. The authors used band-pass filtering to minimize the influence of P3 activity on the FRN amplitude. Here, we took the inverse approach and focused on the P3. Unlike Yu and colleagues, we find a modulation of P3 amplitudes by risk prediction errors in both win and loss trials, but no effect of expected reward. In line with the FRN literature, the P3 responses in our experiment were marginally stronger on win than on loss trials. However, this difference was too small and inconsistent for further analysis (cf. Fig. 2C and Supplementary Material Fig. S5). One possible explanation for the null result for the FRN is the high cognitive load of the task: the FRN is reduced when feedback processing involves more cognitive load (Krigolson et al., 2012). In the current task, processing the valence of the feedback requires comparing two numbers relative to a choice that was made at the beginning of the trial. Hence, the amplitude of the P3 component was modulated by the risk prediction error, but was largely independent of the reward prediction error.
Our findings are the first evidence that the response to uncertain and surprising stimuli consists of two components. The first component signals the salience of the stimulus and may serve to redirect attention, by acting as a filter to identify potentially relevant stimuli among the vast number of stimuli that are constantly available to an organism. The second component then evaluates how accurate the risk prediction was given the uncertainty of the environment. Note that in a highly stochastic environment a prediction error is not necessarily evidence for poor prediction. If you made a prediction but knew that many different outcomes were equally likely to occur (e.g., a roll of a die, or a number at a game of roulette), then a prediction error will not be surprising to you and will not require that you update your prediction. However, a similar prediction error will be very surprising if you were very sure or confident about the outcome. As such, surprise is an important quantity to evaluate the relevance of prediction errors, and may trigger and modulate learning (Faraji et al., 2018).
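The distinction between a prediction error and its surprise value can be illustrated numerically. The sketch below assumes a binary gamble with payoffs of +1 (win) and −1 (loss), and assumes the common formalization in which the risk prediction error is the squared reward prediction error minus the expected risk (the outcome variance). The payoffs and function names are chosen for illustration and are not taken from the task description.

```python
def expected_reward(p_win, win=1.0, loss=-1.0):
    """Mean payoff of a binary gamble with win probability p_win."""
    return p_win * win + (1.0 - p_win) * loss


def expected_risk(p_win, win=1.0, loss=-1.0):
    """Variance of the payoff, used here as the risk prediction."""
    mean = expected_reward(p_win, win, loss)
    return p_win * (win - mean) ** 2 + (1.0 - p_win) * (loss - mean) ** 2


def reward_prediction_error(p_win, won, win=1.0, loss=-1.0):
    """Actual payoff minus expected payoff."""
    outcome = win if won else loss
    return outcome - expected_reward(p_win, win, loss)


def risk_prediction_error(p_win, won, win=1.0, loss=-1.0):
    """Squared reward prediction error minus expected risk (assumed definition)."""
    rpe = reward_prediction_error(p_win, won, win, loss)
    return rpe ** 2 - expected_risk(p_win, win, loss)


for p_win, won in [(0.5, True), (0.5, False), (0.9, True), (0.9, False)]:
    print(p_win, won,
          round(reward_prediction_error(p_win, won), 2),
          round(risk_prediction_error(p_win, won), 2))
```

Under these assumptions, a gamble with pWin = 0.5 yields a reward prediction error of magnitude 1 on every trial but a risk prediction error of 0, because an error of that size was fully expected. With pWin = 0.9, an unexpected loss yields a reward prediction error of −1.8 and a large positive risk prediction error of about 2.88, which corresponds to the case of a confident prediction being violated.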
In sum, our findings suggest that noradrenergic cells show a two-component response to uncertain or surprising stimuli that parallels the dopaminergic response to reward-related stimuli: an early component that signals the salience of the stimulus, and a subsequent component that carries information not about value, but about how surprising the stimulus is.

Funding
This project was supported by NCCR Synapsy Grant No. "51NF40-158776" of the Swiss National Science Foundation.

Author contribution
ML, OF, MH, and KP designed the experiment. ML and OF collected the data. ML, OF, and SG analyzed the data. ML, SG, OF, MH, and KP interpreted the data and wrote the manuscript.

Declaration of competing interest
The authors have no competing interests to declare.