Disentangling the effects of reward value and probability on anticipatory event-related potentials

Optimal decision-making requires humans to predict the value and probability of prospective (rewarding) outcomes. The aim of the present study was to evaluate and dissociate the cortical mechanisms activated by information about an upcoming, potentially rewarded target stimulus that appears with varying probabilities. Electro-cortical activity was recorded during a cued Go/NoGo experiment, in which cue letters signaled upcoming target letters to which participants had to respond. The probability of target letter appearance after the cue letter and the amount of money that could be won for correct and fast responses were orthogonally manipulated across four task blocks. As expected, reward availability affected a prefrontally distributed reward-related positivity and a centrally distributed P300-like event-related potential (ERP). Moreover, a late prefrontally distributed ERP was affected by probability information. These results show that information on value and information on probability activate separate mechanisms in the cortex, and they contribute to a further understanding of the neural underpinnings of normal and abnormal reward processing.


Introduction
Optimal decision-making requires humans to predict the value and probability of prospective (rewarding) outcomes (Glimcher and Rustichini, 2004). These predictions have direct implications for subsequent behavior, which in turn is based upon cortical activity. In the present study we evaluate the cortical mechanisms activated when reward is anticipated with varying probabilities. We furthermore investigate the extent to which these cortical activations interact or are independent.
Neuroimaging studies have investigated the representation of anticipated subjective value (or reward) in the human brain. Parts of the ventro-medial prefrontal cortex, including the medial orbito-frontal cortex (mOFC) and the rostral anterior cingulate cortex (rACC) (Breiter et al., 2001; Howard et al., 2015; Kable and Glimcher, 2007; Padmala and Pessoa, 2011; Smith et al., 2009), as well as more posterior regions of the cingulate cortex (Kable and Glimcher, 2007; Kirsch et al., 2003; Padmala and Pessoa, 2011; Smith et al., 2009), show activity during the anticipation of reward. These cortical regions act in concert with subcortical structures such as the ventral striatum and midbrain (e.g. Breiter et al., 2001; Knutson et al., 2005; Smith et al., 2009; Yacubian et al., 2006; for reviews see: Haber and Knutson, 2010; O'Doherty, 2004; Rushworth and Behrens, 2008). Some of these studies also investigated the effect of increasing the anticipated probability of a reward and reported probability-sensitive activity in the posterior cingulate cortex (PCC) (Knutson et al., 2005) and medial prefrontal cortex (Knutson et al., 2005; Yacubian et al., 2006), as well as in the dorsal ACC (dACC) (Smith et al., 2009).
Results of the neuroimaging study by Knutson and colleagues (2005) indicate that the dACC may integrate reward and probability information. However, in that study, as well as in the other neuroimaging studies discussed above, main effects of the reward value and probability manipulations were also abundant. This raises the crucial question of how activity related to either the value or the probability manipulation, on the one hand, relates in time to activity related to their interaction, on the other. The event-related potential (ERP) technique has a much higher temporal resolution than neuroimaging. It is therefore much better suited to temporally separating sequential neural activities within the brief time windows that characterize real-life decision-making.
Recently, a number of studies have investigated the effects of reward value (but much less so probability) on ERPs during outcome anticipation. It was found that cues (highly) predictive of an upcoming monetary reward (monetary incentive delay task: Donamayor et al., 2012; Flores et al., 2015; passive gambling task: Holroyd et al., 2011) elicited a larger positivity (or a smaller negativity; Yu and Zhou, 2006) than cues (highly) predictive of no reward or a loss, with a latency between 200 and 300 ms after the reward-announcing cue and a fronto-central scalp distribution. A reward-sensitive ERP with a similar latency was observed during choice presentation (i.e., before the feedback stage) in a gambling task after subjects had learned which of the choice options yielded reward (Krigolson, Hassall and Handy, 2014). This reward-related positivity has been labelled "reward positivity" (Holroyd et al., 2011). It is observed not only in response to reward-predicting cues but also in response to reward delivery (Holroyd et al., 2011). Especially in the latter context, it has also been described as the mirror image of the feedback-related negativity (FRN; Krigolson, 2018; Proudfit, 2015). Similarly, in the context of negative feedback and errors, cues predicting errors or non-reward have been observed to elicit a larger FRN/error-related negativity (ERN) than cues predicting correct responses or reward (Baker and Holroyd, 2009; Krigolson and Holroyd, 2007).
With respect to probability, a prior study by our group (Bekker et al., 2004) showed that cues highly predictive of an upcoming target, and therefore containing highly relevant information, elicit a larger posterior P300 than cues less predictive of an upcoming target. This finding is in line with a large body of research showing sensitivity of the P300 to internal updating of the subjective probability of relevant events or outcomes (Duncan-Johnson and Donchin, 1982). In another study, probability was manipulated within a reward-anticipation context (Yu et al., 2011). Here, cues signaling a 100% certain future reward elicited a smaller FRN/larger reward-related positivity (RRP) than cues signaling less than 100% certainty (ranging from 0 to 87.5%).
ERP studies investigating the effects of both reward value and probability manipulations during reward anticipation are scarce. Furthermore, it remains to be determined whether ERPs elicited by reward value and probability manipulations are dissociable. In the current study, reward value and probability were orthogonally manipulated across four task blocks of a cued Go/NoGo (CGN) task (adapted from Bekker et al., 2004), in which cues signaled upcoming targets to which participants had to respond. Unlike paradigms in which performance was based on choices between different reward-probability conditions (e.g. Krigolson et al., 2014; Smith et al., 2009; Yacubian et al., 2006), the present study decoupled cued reward value and probability information from task requirements such as choosing a response. This enabled us to isolate activations specifically related to reward value and probability from those related to task requirements. For example, when a response choice selects an option in which a given reward is obtained with a given probability, the probability level directly affects the expected reward value of that choice. In our design, probability only concerns whether the action will have to be performed at all. This contributes to strong orthogonality, while reward obtainment still depends on producing the adequate response. The latter feature is important, at least with respect to the FRN/ERN (Yeung et al., 2005).
The main aims of the current study were: (1) to gain understanding of the temporal profile of cortical mechanisms activated in a reward anticipation context; and (2) to investigate the extent to which activations related to reward value and probability manipulations interact when both are completely orthogonally manipulated. Our main focus was on anticipatory activity within a 180-500 ms post-cue window (see Methods section). Specifically, we tested the hypothesis that reward value affects frontal ERP activity early in time. This ERP activity could be associated with the processing of reward itself, or rather with indirect effects of reward on attention networks (Corbetta and Shulman, 2002). Probability manipulations were expected to specifically affect parietal ERP activity later in time (P300) (Bekker et al., 2004; Holroyd et al., 2011; Donamayor et al., 2012; Flores et al., 2015). We also anticipated the possibility of an interaction between reward value and probability. Reward value could have an additive effect on the probability-sensitive P300, given that numerous studies have shown that the P300 is also sensitive to reward outcome (e.g. Yeung and Sanfey, 2004) and reward anticipation (Broyd et al., 2012; Flores et al., 2015; Pfabigan et al., 2014). Alternatively, reward value and probability could interact as in the Knutson et al. (2005) study. In this scenario, reward-announcing cues would elicit more ERP activity than no-reward cues, but only when they also predict a high probability of an upcoming target.
Although our design ensures orthogonal manipulation of reward value and probability at the block level, this does not hold at the level of single trials: cues indicating high-probability rewarded targets also signal low single-target rewards. An 'adaptive scaling' perspective (see Walsh and Anderson, 2012) predicts that the response to low single-target reward cues (cueing reward with 98% probability), when the low value is the only available option besides no reward throughout a block of trials, is identical to the response to high single-target reward cues (cueing reward with 50% probability) when the high value is the only available option. From this perspective, reward effects obtained under different probability conditions can be validly compared. To assess the extent to which this perspective is tenable, we performed additional analyses using specific contrasts to isolate 'pure' reward and probability effects (see Methods, Statistical analyses).

Subjects
Forty-nine healthy subjects participated in the experiment. Participants were recruited via advertisements on the campus of Utrecht University. None of the subjects had a history of psychiatric or neurologic disorders, and none used psycho-active medication. Participants were requested to abstain from caffeine and smoking for at least 12 h prior to participation and to refrain from drugs for at least 2 weeks prior to participation. All participants reported normal or corrected-to-normal vision. The study was approved by the medical ethical committee of the University Medical Centre Utrecht, and subjects gave written informed consent prior to participation. Participants received 6 Euros per hour or study credits, and additionally received a performance-dependent monetary bonus of at most 10 Euros (see Cued Go/NoGo task). ERP data of 1 participant were not stored due to a technical issue. Furthermore, 3 participants were excluded during analysis because too few segments were left in one or more conditions for multiple neighboring electrodes (see ERP data). Therefore, the final sample consisted of 45 participants (mean age (SD) = 23.9 (4.2) years, 34 females, 43 right-handed).

Procedure
This experiment was part of a larger study (3 sessions on separate days) on the effect of reward and target probability (anticipation) on various aspects of behavior and of psycho- and neurophysiology. Participants were informed about the experimental procedure and signed the informed consent form during the first session. Half of the subjects completed the cued Go/NoGo (CGN) task during the second session and the other half during the third session.
The CGN task session started with placement of the cap and electrodes. Participants were seated 1 m in front of a computer screen in a dimly lit room adjacent to the control room, with the chin placed on a chin-rest. The chair was adjusted to a comfortable height at which participants fixated the center of the screen. After task instructions were given, the CGN task started; it lasted about 1 h, during which EEG was recorded. Five subjects completed a spatial cuing task before (3 subjects) or after (2 subjects) the CGN task; the other participants performed no other tasks during the CGN session. At the end of the session, the cap and electrodes were removed, and participants were paid and dismissed.

Cued Go/NoGo task
The CGN task was controlled by Presentation® software (version 16.0, www.neurobs.com). During the CGN task (adapted from Bekker et al., 2004), the letters A, C, D, E, F, G, H, J, L, X and Y were presented in black on a grey background in the center of a 16-inch Dell CRT screen (resolution: 1280 × 1024), between two vertical bars (height: 1.03°, width: 0.05°). Letters were presented in Arial font, size 79. Each letter was presented for 150 ms, and letters were separated by inter-stimulus intervals with a random duration between 1400 and 1600 ms.
The task is illustrated in Fig. 1. Participants were instructed to press the left button with the left index finger when letter X followed letter A and the right button with the right index finger when letter Y followed letter A, as fast and accurately as possible. This mapping was reversed for half of the participants. Responses were made on a QWERTY keyboard on which all keys were covered by a plastic sheet, except for the "z" key, the "/" key and the spacebar. The "z" and "/" keys were the left and right target buttons, respectively, and the symbols on these keys were covered by a white sticker.
In each block, target probability was either 50% (40 cued targets in total; see below) or 98% (78 cued targets), and either no money or a maximum of 5 Euros in total could be won. Participants were fully informed about the probability and reward levels of a block before the start of that block. Note that although we manipulated target probability (i.e., the probability of X or Y following letter A) and target reward value, we were specifically interested in the electrocortical aspects of reward and probability anticipation as they occur after the cue (i.e., letter A) but before the target (letter X or Y).
The amount of money won in a reward block was calculated by multiplying the number of correct and timely responses by 12.5 eurocents (5 Euros divided by 40 targets) in the 50% target probability block and by 6.4 eurocents (5 Euros divided by 78 targets) in the 98% target probability block.
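The bonus arithmetic can be made explicit. The sketch below assumes that the bonus for a reward block equals the number of correct and timely responses to cued targets times the per-target amount, and uses the 40 or 78 cued targets out of 80 cues per block given below. The function names are hypothetical. The calculation also makes one property of the design visible: because the 5-Euro maximum is divided over the cued targets, the expected value signaled by a single cue (probability of a target times the per-target reward) is identical in both reward blocks.

```python
from fractions import Fraction

MAX_BONUS_CENTS = 500             # 5 Euros per reward block
CUES_PER_BLOCK = 80               # letter A appears 80 times per block
CUED_TARGETS = {"50%": 40, "98%": 78}


def per_target_reward(block: str) -> Fraction:
    """Reward (eurocents) per correct and timely response to a cued target."""
    return Fraction(MAX_BONUS_CENTS, CUED_TARGETS[block])


def block_bonus(block: str, n_correct_timely: int) -> Fraction:
    """Bonus (eurocents) for one reward block, assuming it scales linearly
    with the number of correct and timely responses (hypothetical helper)."""
    if not 0 <= n_correct_timely <= CUED_TARGETS[block]:
        raise ValueError("response count outside the possible range")
    return n_correct_timely * per_target_reward(block)


def expected_value_per_cue(block: str) -> Fraction:
    """P(target | cue) times the per-target reward, in eurocents."""
    p_target = Fraction(CUED_TARGETS[block], CUES_PER_BLOCK)
    return p_target * per_target_reward(block)


if __name__ == "__main__":
    for block in CUED_TARGETS:
        print(block,
              round(float(per_target_reward(block)), 1),
              float(expected_value_per_cue(block)))
```

Running it prints 12.5 and 6.4 eurocents per target, and 6.25 eurocents per cue for both blocks (500/80), because the probability term cancels the number of targets. Exact fractions are used so that this equality is not obscured by floating-point rounding.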
The task started with 100 practice trials (letters). The practice block always consisted of the reward-98% target probability condition. The main task consisted of four blocks of 400 trials (letters), comprising the four conditions. Participants were informed about the target probability and the total amount of money that could be won during the block at the beginning of each block. Order of the four blocks was counterbalanced across participants. One-minute rest breaks were provided halfway through each block and participants were reminded of the target probability and reward availability of the current block after the rest breaks. One-minute rest breaks were also provided between blocks.
The cue (A) appeared 80 times during each block. In the 50% target probability blocks, 20 cues were followed by target X and 20 cues by target Y; 40 cues were not followed by a target, and 20 X's and 20 Y's were not preceded by a cue. In the 98% target probability blocks, 39 cues were followed by target X and 39 cues by target Y; 2 cues were not followed by a target, and 1 X and 1 Y were not preceded by a cue. Each of the other letters appeared 20 times in each block, except for the letters H and C, which appeared more often (80 and 40 times, respectively) in order to control for frequency differences between the letter stimuli and cues/targets (Bekker et al., 2004). Letter stimuli were presented in a pseudo-random order within each block, with the following restrictions: (1) stimuli were never directly followed by identical stimuli; (2) cues followed by targets (A-X or A-Y) and cues not followed by targets ("NoGo": A followed by neither X nor Y) were always followed by at least one "nocue" (i.e., C, D, E, F, G, H, J or L not preceded by a cue). Only cue- and nocue-related ERP activity was analyzed. Note that nocues were not associated with reward or probability information. Nocue-related activity was used as a baseline for non-specific effects, as the context of reward or high probability within the current block may sensitize processing in general. Subtracting nocue- from cue-related activity therefore yields ERP activity specifically related to the temporally specific information on reward and probability (i.e., the cue A signaling that, with 50% or 98% probability, a target would follow for which a reward could be earned, adding up to 5 Euros per block in the reward condition and 0 Euros in the no-reward condition).
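As a consistency check on these counts, the sketch below reconstructs the letter frequencies of one 400-letter block from the numbers given above, assuming that cued and uncued targets are split evenly between X and Y. It shows that X and Y each occur 40 times in every block, so only the cue-target contingency (40/80 versus 78/80, i.e. 97.5%, reported as 98%) differs between the probability conditions. The function name is illustrative.

```python
CUES = 80
# Non-target letters; H and C are over-represented to match cue/target frequencies.
FILLERS = {"C": 40, "D": 20, "E": 20, "F": 20, "G": 20, "H": 80, "J": 20, "L": 20}


def letter_counts(cued_targets: int) -> dict:
    """Letter frequencies in one block.

    cued_targets: number of cues followed by X or Y (40 in the 50% blocks,
    78 in the 98% blocks). The remaining 80 - cued_targets targets appear
    without a preceding cue; both sets are split evenly over X and Y.
    """
    uncued_targets = CUES - cued_targets
    per_target_letter = (cued_targets + uncued_targets) // 2
    counts = {"A": CUES, "X": per_target_letter, "Y": per_target_letter}
    counts.update(FILLERS)
    return counts


for label, cued in [("50%", 40), ("98%", 78)]:
    counts = letter_counts(cued)
    nogo_cues = CUES - cued            # cues not followed by a target
    print(label, sum(counts.values()), cued / CUES, nogo_cues)
# 50% 400 0.5 40
# 98% 400 0.975 2
```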
To briefly recapitulate our hypotheses: we expected cues to elicit a reward-related positivity (RRP) and a P300 relative to nocues (C, D, E, F, G, H, J or L). Furthermore, we expected reward (vs. no reward) to enhance the RRP and P300, and high (vs. low) probability to enhance the P300. Cues and nocues were presented pseudorandomly within a block of trials, and the average nocue ERP served as a baseline that was subtracted from the average cue ERP. This was done to control for non-specific effects of reward and probability that would affect the response to any stimulus in a given block, including irrelevant probes. In total, four of these blocks were presented, corresponding to the four conditions of a 2 × 2 design based on orthogonal manipulation of the probability of target appearance (letter X or Y) after the cue (letter A) and the amount of money that could be won for correct and fast responses.

EEG data acquisition
EEG signals were recorded with the Active-Two system (Biosemi, Amsterdam, The Netherlands) using 64 Ag-AgCl electrodes placed according to the 10/10 system. EOG electrodes were placed above and below the left eye and at the outer canthi of both eyes. EEG signals were referenced online to the Common Mode Sense/Driven Right Leg electrode. EEG data were sampled at 2048 Hz with an online bandwidth of DC to 400 Hz.

Behavioral data
Mean reaction times (RTs) for valid responses to the target (i.e., single responses within the time window 150-1500 ms after target onset) were calculated for each condition and each subject. Furthermore, the percentage of correct responses and the percentage of omissions were calculated for each condition and each subject. The percentage of commission errors to the NoGo stimulus (i.e., a non-target preceded by a cue) was calculated for each subject only for the 50% target probability blocks, because there were too few NoGo trials during the 98% target probability blocks.

Fig. 1. Overview of the cued Go/NoGo task. Letters were presented on the screen, and participants were instructed to press one pre-specified button when letter X (target) followed letter A (cue) and another when letter Y (target) followed letter A (cue). Four blocks of letter trials were presented, which differed in the amount of money that could be won for correct and fast responses (either 0 or 5 Euros maximally in total) and in the probability of target appearance after the cue (either 50% or 98%).
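A minimal sketch of the per-condition behavioral scoring described above, assuming that each cued-target trial is stored as a list of button presses, each with a reaction time in ms after target onset and a flag for whether the correct button was pressed. The data layout and function name are hypothetical, and so is the handling of multiple or out-of-window responses, which are counted here as neither valid responses nor omissions.

```python
from statistics import mean
from typing import List, Tuple

Press = Tuple[float, bool]  # (RT in ms after target onset, correct button?)


def score_condition(trials: List[List[Press]]) -> dict:
    """Mean RT of valid responses, % correct and % omissions for one condition."""
    valid_rts = []
    n_correct = n_omissions = 0
    for presses in trials:
        if not presses:                      # no response at all: omission
            n_omissions += 1
            continue
        # Valid response: exactly one press within 150-1500 ms after target onset.
        if len(presses) == 1 and 150 <= presses[0][0] <= 1500:
            rt, correct_button = presses[0]
            valid_rts.append(rt)
            n_correct += int(correct_button)
    n = len(trials)
    return {
        "mean_rt": mean(valid_rts) if valid_rts else float("nan"),
        "pct_correct": 100 * n_correct / n,
        "pct_omissions": 100 * n_omissions / n,
    }


example = [[(450.0, True)], [(520.0, False)], [], [(100.0, True)],
           [(480.0, True), (600.0, True)]]
print(score_condition(example))
```

In the example, only the first two trials yield valid responses (mean RT 485 ms), one of five trials is correct and one is an omission, so both percentages are 20%.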

ERP data
ERP data collected during the CGN task were analyzed using BrainVision Analyzer 2.0 (Brain Products GmbH). Data were re-referenced to the average reference, filtered with a 30 Hz low-pass filter (24 dB/oct) and an additional 50 Hz notch filter, and resampled to 256 Hz. Data were segmented into windows from 100 ms before until 1000 ms after (no)cue onset. Cue-locked segments with premature (< 150 ms) or late (> 1500 ms) responses to the target, or with omissions, choice errors, or commission errors, were removed from further analyses. Ocular artifacts were corrected with the Gratton & Coles method (Gratton et al., 1983), after which a baseline correction was applied using the 100 ms time window before cue onset. Segments with artifacts were identified separately for each channel using an automatic artifact rejection procedure (maximal allowed absolute difference between two values: 100 μV; lowest allowed activity within a 100 ms interval: 0.5 μV). On average, 1.8% (± 2.1%) of the data segments were lost due to the artifact rejection procedure.
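The preprocessing chain can be approximated with NumPy and SciPy. This is a sketch, not the BrainVision Analyzer implementation, and it rests on several assumptions: the 24 dB/oct low-pass is approximated by a second-order Butterworth filter applied forward and backward, the notch quality factor (Q = 30) is an assumed value, the Gratton & Coles ocular correction is omitted, and the artifact criteria are interpreted as the maximum-minus-minimum range over the whole segment (100 μV) and within every sliding 100 ms window (0.5 μV). Function names are hypothetical.

```python
import numpy as np
from scipy import signal

FS_RAW, FS_OUT = 2048, 256  # recording and analysis sampling rates (Hz)


def preprocess(eeg: np.ndarray) -> np.ndarray:
    """eeg: (n_channels, n_samples) continuous data in µV at 2048 Hz."""
    eeg = eeg - eeg.mean(axis=0, keepdims=True)             # average reference
    # 2nd-order Butterworth run forward and backward (zero phase):
    # roughly 24 dB/octave roll-off above 30 Hz (approximation).
    sos = signal.butter(2, 30, btype="low", fs=FS_RAW, output="sos")
    eeg = signal.sosfiltfilt(sos, eeg, axis=1)
    b, a = signal.iirnotch(50, Q=30, fs=FS_RAW)              # Q = 30 is assumed
    eeg = signal.filtfilt(b, a, eeg, axis=1)
    return signal.resample_poly(eeg, 1, FS_RAW // FS_OUT, axis=1)  # to 256 Hz


def epoch(eeg, onsets, fs=FS_OUT, tmin=-0.1, tmax=1.0):
    """Cut segments around (no)cue onsets (sample indices) and subtract the
    mean of the pre-stimulus interval (baseline correction)."""
    pre, post = int(round(-tmin * fs)), int(round(tmax * fs))
    segs = np.stack([eeg[:, o - pre:o + post] for o in onsets])
    return segs - segs[:, :, :pre].mean(axis=2, keepdims=True)


def is_artifact(x, fs=FS_OUT, max_range=100.0, min_activity=0.5, win_ms=100):
    """Artifact check for one channel of one segment (µV)."""
    if x.max() - x.min() > max_range:                       # too large a swing
        return True
    win = int(round(win_ms * fs / 1000))                     # 26 samples at 256 Hz
    windows = np.lib.stride_tricks.sliding_window_view(x, win)
    # Flag the segment if any 100 ms window is (nearly) flat.
    return bool((windows.max(axis=1) - windows.min(axis=1) < min_activity).any())
```

At 256 Hz the 100 ms baseline does not fall on a whole number of samples (25.6), so the sketch rounds it to 26 samples; the original software may handle this differently.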
Channels with fewer than half of the segments left within a particular subject/condition were interpolated from the neighboring electrodes with a spherical splines method (BrainVision Analyzer 2.0, Brain Products GmbH). Data of three subjects were removed from further analyses because fewer than half of the cue-locked (< 40) or nocue-locked (< 100) segments were left in one or more conditions for multiple neighboring electrodes. For 12 subjects, data of one or more electrodes within one or more conditions were interpolated (see the table in section 2 of the Supplementary materials). For each subject and condition, the average cue-minus-nocue waveforms were computed from −100 to 700 ms around the cue. The segment was truncated at 700 ms in order to limit the number of factors obtained with the PCA.
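The averaging and channel-screening steps described above can be sketched as follows, assuming baseline-corrected segments stored as (trials × channels × time) arrays at 256 Hz that start 26 samples (about 100 ms) before onset. The spherical-spline interpolation itself is not reproduced; the sketch only flags channels that fall below the half-of-segments criterion. Function and constant names are hypothetical.

```python
import numpy as np

FS = 256
PRE_SAMPLES = 26  # assumed number of samples before (no)cue onset per segment


def cue_minus_nocue(cue_segs: np.ndarray, nocue_segs: np.ndarray) -> np.ndarray:
    """Average cue waveform minus average nocue waveform (channels x time)."""
    return cue_segs.mean(axis=0) - nocue_segs.mean(axis=0)


def crop(wave: np.ndarray, tmin: float = -0.1, tmax: float = 0.7) -> np.ndarray:
    """Keep the samples from about -100 to 700 ms around the cue
    (half-sample tolerance so the rounded baseline sample is retained)."""
    times = (np.arange(wave.shape[-1]) - PRE_SAMPLES) / FS
    keep = (times >= tmin - 0.5 / FS) & (times <= tmax + 0.5 / FS)
    return wave[..., keep]


def channels_to_interpolate(n_clean: np.ndarray, n_total: int) -> np.ndarray:
    """Indices of channels with fewer than half of their segments left."""
    return np.flatnonzero(n_clean < n_total / 2)
```

With 80 cue-locked segments, a channel with 39 clean segments would be flagged for interpolation, whereas one with 40 would be kept.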
A principal component analysis (PCA) was conducted because this technique can separate possibly overlapping ERP components sensitive to reward, probability, or both, in terms of spatial distribution and timing. A temporo-spatial PCA was conducted following the guidelines by Dien (2012), using the ERP PCA Toolkit version 2.63 (Dien, 2010). Promax rotation with Kaiser loading weighting was used for the initial temporal PCA, and nine factors were retained based on a scree plot (Cattell, 1966). Subsequently, a separate spatial PCA with Infomax rotation was conducted for each of the temporal factors. Five spatial factors were retained for each temporal factor, based on the scree plot averaged over all temporal factors. Both PCA steps were based on the covariance matrix. These steps yielded 45 (9 × 5) temporo-spatial factor combinations (TFSF). Based on prior studies (see Introduction), we expected the reward and probability effects to be strongest around the midline of the scalp and to emerge approximately 180-500 ms after cue onset. Following recommendations by Dien (2012), PCA factors were selected for statistical analysis if: (1) they explained more than 0.5% of the total variance, (2) the temporal loading peaked between 180 and 500 ms, and (3) the positive voltage was maximal around the midline electrodes. Eight of the 45 TFSF combinations met these criteria. Exploratory analyses were conducted for factors with a temporal loading peak outside the 180-500 ms post-cue window (i.e., within 0-180 ms or 500-700 ms post-cue) and for factors with a more lateral spatial distribution; this pertained to an additional 10 TFSF combinations.
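To make the first PCA step and the factor selection rule concrete, the sketch below runs an unrotated temporal PCA on the covariance matrix of an observations × time-points matrix (observations being subject × condition × electrode waveforms) and applies the three selection criteria. It is a deliberate simplification and not the ERP PCA Toolkit procedure: the Promax rotation, Kaiser weighting and the subsequent Infomax spatial step are omitted, and "maximal around the midline" is approximated as a maximum at a 10/10 electrode whose label ends in "z". All names are hypothetical.

```python
import numpy as np


def temporal_pca(data: np.ndarray, n_factors: int):
    """Unrotated covariance-based PCA over time points (rotation omitted).

    data: (n_observations, n_times). Returns loadings (n_times, n_factors),
    scaled to the data units, and the proportion of variance per factor.
    """
    cov = np.cov(data, rowvar=False)               # covariance across time points
    eigvals, eigvecs = np.linalg.eigh(cov)         # returned in ascending order
    order = np.argsort(eigvals)[::-1]              # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return loadings, eigvals[:n_factors] / eigvals.sum()


def meets_criteria(var_fraction, temporal_loading, times_ms,
                   spatial_loading, electrodes) -> bool:
    """Variance > 0.5%, temporal peak at 180-500 ms, positive maximum at a
    midline electrode (labels ending in 'z', a simplifying assumption)."""
    peak_ms = times_ms[np.argmax(np.abs(temporal_loading))]
    max_site = electrodes[int(np.argmax(spatial_loading))]
    return bool(var_fraction > 0.005 and 180 <= peak_ms <= 500
                and max_site.endswith("z"))
```

The absolute value is used to locate the temporal peak because the sign of an unrotated PCA loading is arbitrary.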

Standard ERP analysis
For comparability with earlier and future studies we supplement the PCA results with results of a standard ERP analysis. Grand-average ERP waveforms for selected midline electrodes are depicted in the results section. The methods and statistical results of the standard analysis are provided in the supplementary materials section.
2.6. Statistical analyses

2.6.1. Behavioral data
Repeated-measures ANOVAs (GLM, SPSS version 22) were run for RT, the percentage of correct responses, the percentage of commission errors to the NoGo stimulus, and the percentage of omissions, with reward availability (no reward, reward) and target probability (50%, 98%) as within-subject variables. For each contrast (i.e., reward minus no reward, high minus low probability, and the reward effect at high probability minus the reward effect at low probability), deviation from normality was tested using Shapiro-Wilk tests. Non-parametric Wilcoxon signed-rank tests were conducted for those contrasts that deviated significantly from normality.

ERP data: PCA
Reward × probability ANOVAs (GLM, SPSS version 22) were run on the TFSF combinations, using the average of a 20-ms window around the peak of the factor tested. Alpha was set at 0.05. For each contrast (i.e., reward minus no reward, high minus low probability, and the reward effect at high probability minus the reward effect at low probability), deviation from normality was tested using Shapiro-Wilk tests. Non-parametric Wilcoxon signed-rank tests were conducted for those contrasts that deviated significantly from normality. The ten exploratory analyses were corrected for multiple comparisons using a Bonferroni correction. In addition, we analyzed the specific contrasts mentioned in the Introduction (discussion of 'adaptive scaling') in order to isolate 'pure' reward and probability effects. The 'pure reward effect' was estimated from the contrast between the 50%-target probability reward and 50%-target probability no-reward conditions. The 'pure probability effect' was estimated from the contrast between the no-reward 98% and no-reward 50%-target probability conditions. Furthermore, in order to identify cortical activations sensitive to gradual increases in single-target reward value (which would NOT be predicted from the adaptive-scaling perspective), rather than just block-level reward versus no reward, we constructed a 3-level factor consisting of 50%-probability reward (high reward per trial) versus 98%-probability reward (low reward per trial) versus the average of the two no-reward conditions. This 'gradual-reward effect' was tested with MANOVA (GLM, SPSS version 22).

Behavioral results
None of the contrasts, except the probability contrast for RT, was normally distributed. Reaction times to the targets were significantly shorter during the reward blocks (median (mdn) RT = 470 ms) than during the no-reward blocks (mdn = 501 ms), Z = −3.20, p = .001, rank-biserial r (rrb) = 0.55. Reaction times were also significantly shorter during the 98% target probability blocks (mdn = 469 ms) than during the 50% target probability blocks (mdn = 498 ms), F(1,44) = 32.41, p < .001, ηp² = .42. Furthermore, participants were more accurate during the reward blocks (mdn = 99.4%) than during the no-reward blocks (mdn = 98.8%), Z = −3.05, p = .002, rrb = 0.62. There was no significant main effect of target probability on the percentage of correct responses. The percentage of omissions was greater during the no-reward blocks (mdn = 1.25%) than during the reward blocks (mdn = 0.64%), Z = −2.21, p = .027, rrb = 0.43. There was no such difference between the high and low probability blocks. Participants rarely made commission errors to the NoGo stimulus. There was no significant difference in the percentage of commission errors between the reward and no-reward blocks (both mdn = 0%), Z = −1, p = .317, rrb = 0.5.

I. Schutte, et al. Neuropsychologia 132 (2019) 107138

ERPs: PCA analysis of the effects of reward and target probability
Fig. 2 shows superimposed ERPs from the four reward-probability conditions at selected midline electrode sites. The PCA yielded 8 temporo-spatial factors that met the pre-specified latency and medial-distribution criteria (see Methods paragraph 2.5.2). Table 1 provides an overview of the temporal and spatial distributions of the factors that met the pre-specified criteria and of one additional factor that survived the Bonferroni correction. Fig. 3 displays the temporal loadings for the components with a significant effect of reward or probability, Fig. 4 displays the spatial loadings of these components, and Fig. 5 summarizes the effects of reward and probability on ERP activity.
TF1SF1 peaked at 439 ms after the cue, and its positive maximum was located at Pz, where the factor was significantly more positive for the low than for the high probability condition. In absolute terms, the factor peak was slightly larger at electrode Fpz, where it was negative and less negative for the high than for the low probability condition, F(1,44) = 4.12, p = .048, ηp² = .09. This effect was replicated for the pure probability contrast (no-reward 98% vs 50%), Z = −2.07, p = .038, rrb = 0.35 (because the distribution of the pure probability contrast deviated significantly from normality, the outcome of the Wilcoxon signed-rank test is reported).
TF1SF2 had the same peak latency (439 ms), but its positive maximum was located at Cz. It was significantly more positive for the reward than for the no-reward blocks, Z = −2.61, p = .009, rrb = 0.45. This effect was replicated for the pure reward contrast (50% reward vs no reward), t(44) = −2.74, p = .009, d = 0.41. The gradual-reward effect (50%-probability reward (high reward per trial) versus 98%-probability reward (low reward per trial) versus the average of the two no-reward conditions) was also significant for this component, F(2,43) = 3.41, p = .042, ηp² = .14 (note, however, that the 50% reward versus averaged no-reward contrast was not normally distributed; the Wilcoxon signed-rank test was therefore used as a follow-up test for this contrast). The gradual-reward effect reflected significant differences of both low and high per-trial reward versus no reward (t(44) = 2.61, p = .012, d = 0.39 and Z = −2.29, p = .022, rrb = 0.39, respectively), in the absence of a low versus high difference (p = .357). This is consistent with the adaptive-scaling perspective.
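The reported effect sizes can be checked against the test statistics with the standard conversions ηp² = F·df_effect / (F·df_effect + df_error) and, for a paired t-test, d = |t| / √n with n = 45. The sketch below assumes these conversions were used and reproduces the values reported above for the RT probability effect (F(1,44) = 32.41), TF1SF1 (F(1,44) = 4.12), the gradual-reward effect (F(2,43) = 3.41) and the TF1SF2 t-tests.

```python
import math


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


def cohens_dz(t: float, n: int) -> float:
    """Effect size for a paired/one-sample t-test: mean difference / SD."""
    return abs(t) / math.sqrt(n)


for f, df1, df2 in [(32.41, 1, 44), (4.12, 1, 44), (3.41, 2, 43)]:
    print(f"F({df1},{df2}) = {f}: partial eta squared = {partial_eta_squared(f, df1, df2):.2f}")
for t in (-2.74, 2.61):
    print(f"t(44) = {t}: d = {cohens_dz(t, 45):.2f}")
```

The printed values (0.42, 0.09, 0.14 and 0.41, 0.39) match the rounded statistics in the text.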
Additional exploratory analyses were conducted for 10 temporo-spatial factors with lateral spatial distributions and/or temporal loadings outside the 180-500 ms interval. Only TF2SF1 survived the correction for multiple comparisons. This factor was significantly less negative for the reward than for the no-reward condition at electrode Fp1, F(1,44) = 12.84, p = .001, ηp² = .23. The special-contrast analysis again revealed a significant pure reward effect, t(44) = −2.46,

Note (Table 1). a The TF1SF1 component was most positive at electrode Pz. The high > low probability effect, however, was observed at electrode Fpz, where the amplitude of TF1SF1 was less negative for the high than for the low probability condition. b TF2SF1 did not meet the pre-specified latency criterion and was therefore tested exploratively. The other factors that did not meet the pre-specified criteria did not survive Bonferroni correction.

Discussion
The current study aimed to gain insight into the temporal profile of the cortical processes activated during anticipation of reward with varying probabilities. Another main aim was to investigate whether ERPs elicited by reward and probability manipulations are dissociable. This study provides evidence for separate processing of reward value and probability in the cortex.
Consistent with prior studies (e.g., Bekker et al., 2004; Donamayor et al., 2012; Flores et al., 2015; Pfabigan et al., 2014), reaction times were significantly shorter during the reward blocks than during the no-reward blocks, and during the 98% than during the 50% probability blocks. Participants were also more accurate during the reward blocks. These performance results indicate that cue-elicited processing differed as a function of reward and probability level, even though, in the current paradigm, behavioral choices did not concern reward or probability options at all.
To answer our research question, two main effects of reward were found, as well as one main effect of probability. One reward-related ERP emerged relatively early and was strongest over the prefrontal electrode locations. A second reward-related ERP emerged later. This P300-like ERP peaked at about 440 ms and was most prominent at the central electrode. The probability-related ERP had a similar latency, but the high-low probability difference was largest over the medial prefrontal cortex. Exploratory analyses revealed an additional reward-related ERP late in the cue-target interval (around 680 ms post-cue). This reward-no reward difference was largest over the left prefrontal cortex.

Fig. 3. Temporal loadings as obtained before spatial decomposition. The figure displays the temporal loadings associated with the factors with a significant effect of reward or probability. TF1SF1 (probability-related positivity) and TF1SF2 (reward P300) originated from spatial decomposition of temporal factor (TF) 1 (left panel). TF2SF1 (late reward ERP) originated from spatial decomposition of TF 2 (middle panel). TF5SF1 (reward-related positivity) originated from spatial decomposition of TF 5 (right panel). Temporal loadings are converted to microvolt scaling.
As noted, in the present design, cues indicating high-probability rewarded targets also signal a low reward per individual target. This implies a potential confound between reward and probability effects. Such a confound would not be expected from an 'adaptive scaling' perspective (Walsh and Anderson, 2012). This perspective predicts that the response to a relatively low reward value (versus no reward), when the low value is the only available option besides no reward throughout a block of trials, is identical to the response to a relatively high reward value (versus no reward) when the high value is the only available option. To evaluate the tenability of the adaptive-scaling perspective, dedicated contrasts between conditions were constructed to assess differences between low and high per-trial reward effects (as in the high-probability reward and low-probability reward blocked conditions, respectively). These contrasts revealed that reward (versus no reward) effects did not differ as a function of per-trial reward magnitude, consistent with the adaptive-scaling perspective. Similarly, there was no indication that probability effects depended on the blocked-reward condition.
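One way to write down the adaptive-scaling prediction, offered as an illustrative formalization rather than as the exact contrasts computed in the analyses, uses condition means. Let $\bar{A}_{r,p}$ (notation introduced here for illustration only) denote the mean amplitude of a PCA component for reward condition $r \in \{\mathrm{R}, \mathrm{NR}\}$ (reward, no reward) in probability block $p \in \{98, 50\}$. Because the per-trial reward is low in the 98% reward block and high in the 50% reward block, the two reward effects and the adaptive-scaling null hypothesis can be written as:

```latex
\begin{aligned}
\Delta_{\mathrm{low}}  &= \bar{A}_{\mathrm{R},98} - \bar{A}_{\mathrm{NR},98} && \text{(low per-trial reward)} \\
\Delta_{\mathrm{high}} &= \bar{A}_{\mathrm{R},50} - \bar{A}_{\mathrm{NR},50} && \text{(high per-trial reward)} \\
H_0:\quad \Delta_{\mathrm{low}} - \Delta_{\mathrm{high}}
  &= \bar{A}_{\mathrm{R},98} - \bar{A}_{\mathrm{NR},98} - \bar{A}_{\mathrm{R},50} + \bar{A}_{\mathrm{NR},50} = 0
\end{aligned}
```

Rearranged, the same quantity equals $(\bar{A}_{\mathrm{R},98} - \bar{A}_{\mathrm{R},50}) - (\bar{A}_{\mathrm{NR},98} - \bar{A}_{\mathrm{NR},50})$, the difference between the probability effects in the reward and no-reward blocks. Under this formalization, the absence of a per-trial reward-magnitude effect and the absence of a reward-dependent probability effect therefore correspond to one and the same reward × probability interaction contrast, with weights $(+1, -1, -1, +1)$ on $(\mathrm{R}98, \mathrm{NR}98, \mathrm{R}50, \mathrm{NR}50)$.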
The PCA revealed an early frontal component that was significantly more positive when reward was at stake than when no reward was at stake. Its latency of 244 ms is comparable to reported latencies of reward-related positivities (Donamayor et al., 2012; Flores et al., 2015; Holroyd et al., 2011; Krigolson et al., 2014). These studies found an increased positivity around 200-250 ms following cues signaling reward compared to cues signaling no reward. Similarly, Yu and Zhou (2006) observed less negative ERP activity for cues predicting that money could be won during an upcoming gamble trial than for cues predicting that money could be lost, albeit somewhat later in time (around 270 ms post-cue).
The early reward-related positivity observed in the present study and in the studies mentioned above may be an instance of "the reward positivity". This ERP is mostly observed after positive feedback about performance or a rewarding outcome (Foti et al., 2011; Holroyd et al., 2008; Holroyd et al., 2011; Proudfit, 2015). It usually peaks at midfrontal electrode sites and is proposed to reflect phasic dopaminergic input from the ventral tegmental area into the dorsal ACC when outcomes are better than expected, serving to guide reinforcement learning (Holroyd and Coles, 2002) or to reduce conflict (Holroyd et al., 2008). A similar reward-related positivity has also been observed following reward-announcing cues, once the reward-predicting value of the cue has been learned (Krigolson et al., 2014). As such, the reward-related positivity associated with cues may reflect an initial estimate of the likelihood of a prospective reward that is adjusted after feedback when necessary (Holroyd et al., 2011).
In the current study, the early reward-related positivity (we refer here to any positive deflection specific to the reward manipulation and peaking before 300 ms) was most prominent at the medial prefrontal electrode site AFz and resembled the topography of the reward-related positivity observed by Flores et al. (2015). Nieuwenhuis et al. (2005b) observed multiple generators of this component, including areas within the rostral ACC. In the study by Donamayor and colleagues (2012), however, the reward-related positivity peaked somewhat more posteriorly (i.e., at the mid-frontal electrodes) and was source-localized to the dorsal posterior cingulate cortex (dPCC).
Alternatively, the early reward-related positivity in the current study may reflect enhanced attentional capture by the reward cues (Padmala and Pessoa, 2011), or the modulatory effect of reward on sensory processes (Pessoa and Engelmann, 2010). A related phenomenon has been described as the 'frontal selection positivity' (FSP; Kenemans et al., 2002; Bekker et al., 2004). This is a frontally distributed ERP deflection that is stronger for relevant stimuli (e.g., cues signaling potential reward) than for less relevant stimuli (e.g., cues signaling no potential reward). Identifying the present reward-related positivity as an FSP would imply a pronounced contribution of posterior-cortex generators (Kenemans et al., 2002), consistent with an interpretation in terms of the modulatory effect of reward on sensory (i.e., visual) processing.
The PCA yielded another component that was significantly more positive for reward than for no-reward cues. This component had a central distribution and a temporal loading peaking at 439 ms. The latency and central distribution of this ERP suggest an interpretation in terms of the P300/P3b. This finding is in line with previous research showing sensitivity of the P300 to the magnitude of wins and losses (Yeung and Sanfey, 2004) as well as to the anticipation of reward (Broyd et al., 2012; Flores et al., 2015; Pfabigan et al., 2014). Nieuwenhuis and colleagues (Nieuwenhuis, Aston-Jones & Cohen, 2005a) argued that the P300 reflects the modulatory influence of the locus coeruleus-norepinephrine system on information processing in response to motivationally relevant events. Pfabigan et al. (2014) additionally suggested that the P300 elicited during the anticipation of reward may depend particularly on dopamine transmission.
The third main effect concerned target probability. The PCA component sensitive to the probability manipulation had a temporal loading peaking at 439 ms and consisted of a parietal positivity and a prefrontal negativity. At the prefrontal electrode site, the component was significantly more positive (less negative) for highly reliable cues (indicating a 98% probability of an upcoming target) than for unreliable cues (indicating a 50% probability of an upcoming target). Based on a prior study from our lab (Bekker et al., 2004) and classical findings on the P300 (Duncan-Johnson and Donchin, 1982), we expected the parietal P300 to be sensitive to target probability. However, the high > low probability ERP effect in the current study was frontally distributed. This frontal distribution is consistent with an fMRI study by Knutson et al. (2005) indicating that probability is also represented in the medial prefrontal cortex. It is possible that the precise nature of probability effects in the context of explicit reward differs from that in more implicit-reward contexts (e.g., when participants simply follow the instructions of the experimenter).
It could be argued that any effect of probability on cue-elicited activation reflects enhanced preparation of the response to highly probable subsequent targets, compared to low-probability targets. However, this view predicts probability effects on late slow cue-induced potentials, such as the contingent negative variation and the stimulus-preceding negativity, rather than on earlier activations such as the probability ERP described here. In addition, Bekker et al. (2004) reported such early probability effects in the absence of probability effects on late slow potentials. Furthermore, in the present study the extent of specific-response preparation was probably very limited, as the target stimuli required a two-choice reaction-time response in which both response alternatives were equally probable.
In conclusion, the current study aimed to dissociate the effects of reward value and target probability manipulations on anticipatory ERPs, and provides evidence for separate processing of reward value and probability cues in the cortex. An early reward-related positivity and a late (P300-like) ERP component were specifically affected by reward availability, whereas target probability affected a late frontally distributed ERP. Both reward effects were consistent with the principle of adaptive scaling. The early reward-related positivity may reflect reward-modulated sensory processing of the reward cues. The probability effect is qualitatively different from analogous effects reported before, perhaps due to the explicit-reward context maintained in the present paradigm.

Funding
This work was supported by the Netherlands Organisation for Scientific Research [grant number 404-10-318]. The funder had no role in the study design, in the collection, analysis, and interpretation of data, in the writing of the report, or in the decision to publish the report.

Declarations of interest
None.
CRediT authorship contribution statement