Neural precursors of deliberate and arbitrary decisions in the study of voluntary action

The readiness potential (RP), a key ERP correlate of upcoming action, is known to precede subjects' reports of their decision to move. Some view this as evidence against a causal role for consciousness in human decision-making and thus against free will. Yet those studies focused on arbitrary decisions: purposeless, unreasoned, and without consequences. It remains unknown to what degree the RP generalizes to deliberate, more ecological decisions. We directly compared deliberate and arbitrary decision-making during a $1000-donation task to non-profit organizations. While we found the expected RPs for arbitrary decisions, they were strikingly absent for deliberate ones. Our results and drift-diffusion model are congruent with the RP representing accumulation of noisy, random fluctuations that drive arbitrary (but not deliberate) decisions. They further point to different neural mechanisms underlying deliberate and arbitrary decisions, challenging the generalizability of studies that argue for no causal role for consciousness in decision-making to real-life decisions.


Introduction
Humans typically experience freely selecting between alternative courses of action, say, when ordering a particular item off a restaurant menu. Yet a series of human studies using electroencephalography (EEG) (Haggard & Eimer, 1999 (Perez et al., 2015), and single-cell recordings (Fried, Mukamel, & Kreiman, 2011) challenged the validity of this common experience. These studies found neural correlates of decision processes hundreds of milliseconds and even seconds prior to the moment that subjects reported having consciously decided. The seminal research that launched this series of studies was conducted by Benjamin Libet and colleagues (Libet, Gleason, Wright, & Pearl, 1983). There, the authors showed that the readiness potential (RP), a ramp-up in EEG negativity before movement onset that is thought to originate from the presupplementary motor area (pre-SMA), begins before subjects report a conscious decision to act. Some have claimed, following these and other findings, that the subjective human experience of freely deciding is but an illusion, because human actions are unconsciously initiated before the conscious decision to act (Harris, 2012; Libet et al., 1983; Wegner, 2002). This debate has been captivating scholars from many disciplines in and outside of academia (C. Frith Mele, 2006; Wegner, 2002).

Critically, in the above studies, subjects were told to arbitrarily move their right hand or flex their right wrist; or they were instructed to arbitrarily move either the right or left hand (Haggard, 2008; Hallett, 2016; Roskies, 2010). Thus, their decisions were always unreasoned, purposeless, and bereft of any real consequence.
This stands in sharp contrast to many real-life decisions that are deliberate, i.e., reasoned, purposeful, and bearing consequences (Ullmann-Margalit & Morgenbesser, 1977): which clothes to wear, what route to take to work, as well as more formative decisions about life partners, career choices, and so on.

Deliberate decisions have been widely studied in the field of neuroeconomics (Kable & Glimcher, 2009; Sanfey, Loewenstein, McClure, & Cohen, 2006) and in perceptual tasks (Gold & Shadlen, 2007). Yet, interestingly, little has been done in that field to assess the relation between decision-related activity, subjects' conscious experience of deciding, and the neural activity instantaneously contributing to this experience. Though some studies compared, for example, internally driven and externally cued decisions (Thut et al., 2000; Wisniewski, Goschke, & Haynes, 2016), or stimulus-based and intention-based actions (Waszak et al., 2005), these were typically arbitrary decisions and actions with no real implications. Therefore, the results of these studies provide no direct evidence about potential differences between arbitrary and deliberate decisions.

Such direct comparisons are critical for the free will debate, because it is deliberate, rather than arbitrary, decisions that are at the center of philosophical arguments about free will and moral responsibility (Breitmeyer, 1985; Roskies, 2010). Deliberate decisions typically involve more conscious and lengthy deliberation and might thus be more tightly bound to conscious processes than arbitrary ones. Thus, one could speculate that different findings might be obtained when inspecting the RP in arbitrary compared to deliberate decisions.

A further reason to expect such differences stems from a recent computational model, which challenged the claim that the RP represents a genuine marker of unconscious decisions.
Rather, the model suggested that the RP might reflect the artificial accumulation, up to a threshold, of stochastic fluctuations in neural activity. In the model, crossing the threshold directly leads to action (Schurger, Sitt, & Dehaene, 2012

Here, we tested this prediction and directly compared the neural precursors of deliberate and arbitrary decisions, and in particular the RP, on the same subjects, in an EEG experiment. Our experiment utilized a donation-preference paradigm, in which a pair of non-profit organizations (NPOs) was presented in each trial. In deliberate-decision trials, subjects chose to which NPO they would like to donate $1000. In arbitrary-decision trials, both NPOs received an equal donation of $500, irrespective of subjects' key presses (Fig. 1). In both conditions, subjects were instructed to report their decisions as soon as they made them, and their hands were placed on the response keys, to make sure they could do so as quickly as possible. Notably, while the visual inputs and motor outputs were identical between deliberate and arbitrary decisions, the decisions' meaning for the subjects was radically different: in deliberate blocks, the decisions were meaningful and consequential, reminiscent of important, real-life decisions, while in arbitrary blocks the decisions were meaningless and bereft of consequences, mimicking previous studies of volition.

Results
Behavioral Results
Subjects' reaction times (RTs) were analyzed using a 2-way ANOVA along decision type (arbitrary/deliberate) and difficulty (easy/hard). This was carried out on log-transformed data (raw RTs violated the normality assumption; W=0.94, p=0.001). As expected, subjects were substantially slower for deliberate (M=2. 33 that our model predicted for deliberate trials (see below) rather than reflecting a typical RP. As the RTs for deliberate trials were longer than for arbitrary ones, this trend might have become more pronounced for those trials. To test this, we switched the baseline period to -1000 ms to -500 ms relative to movement onset (i.e., a baseline that immediately preceded our time-of-interest window). Under this analysis, we found evidence that deliberate decisions (pooled across decision difficulty) are not different from 0 (BF=0.332), supporting the claim that the RP during the last 500 ms before response onset was completely absent (BF for similarly pooled arbitrary decisions was 5.07·10^4).

In an effort to further test for continuous time regions where the RP is different from 0 for deliberate and arbitrary trials, we ran a cluster-based nonparametric permutation analysis (Maris & Oostenveld, 2007) for all four conditions against 0. Using the default parameters (see Methods), we found a prolonged cluster (~1.2 s) of activation that reliably differed from 0 in both arbitrary conditions (designated by horizontal blue-shaded lines above the x axis in Fig. 3A). The same analysis revealed no clusters of activity differing from zero in either of the deliberate conditions.

In a similar manner, regressing voltage against time for the last 1000 ms before response onset, the downward trend was significant for arbitrary decisions (Fig. 3B; p<0.0001, BF>10^25 for both easy and hard conditions) but not for deliberate decisions, with the Bayes factor indicating conclusive evidence for no effect (hard: p>0.5, BF=0.09; easy: p=0.35, BF=0.31; all Bonferroni corrected for multiple comparisons). Notably, this pattern of results was also manifested for single-subject analysis (Fig. 4

Control analyses
We further tested whether differences in reaction time between the conditions, eye movements, filtering, and subjects' consistency scores might explain our effect. We also tested whether the RPs might reflect some stimulus-locked potentials or be due to baseline considerations.

Differences in reaction times (RT) between conditions, including stimulus-locked potentials and baselines, do not drive the effect
RTs in deliberate decisions were typically more than twice as long as RTs in arbitrary decisions. We therefore wanted to rule out the possibility that the absence of RP in deliberate decisions stemmed from the difference in RTs between the conditions. We carried out six analyses for this purpose. First, we ran a median split analysis, dividing the subjects into two groups based on their RTs: lower (faster) and higher (slower) than the median, for deliberate and arbitrary trials, respectively. We then ran the same analysis using only the faster subjects in

While the results of the above analyses suggested that our effects do not stem from differences between the RTs in deliberate and arbitrary decisions, the average RTs for fast deliberate subjects were still 660 ms slower than for slow arbitrary subjects. In addition, we had only half of the subjects in each condition due to the median split, raising the concern that some of our null results might have been underpowered. We also wanted to look at the effect of cross-trial variations within subjects and not just cross-subject ones. We therefore ran a third, within-subjects analysis. We combined the two decision difficulties (easy and hard) for each decision type (arbitrary and deliberate) for greater statistical power. We then took the faster (below-median RT) deliberate trials and slower (above-median RT) arbitrary trials for each subject separately. So, this time we had 17 subjects (again, one was removed) and better powered results. Here, fast deliberate trials (M=1.63 s, SD=0.25) were just 230 ms slower than slow arbitrary decisions (M=1.40 s, SD=0.45), on average. This cut the difference between fast deliberate and slow arbitrary by about 2/3 from the between-subjects analysis.
We then computed the RPs for just these fast deliberate and slow arbitrary trials within each subject (Fig. 5C). Visually, the pattern there is the same as the main analysis (Fig. 3A). What is more, deliberate and arbitrary decisions remained reliably different (t(16)=3.36, p=0.004). Arbitrary trials were again different from 0 (t(16)=-4.40, p=0.0005), while deliberate trials were not (t(16)=-1.54, p=0.14).
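
The within-subject trial selection described above can be sketched as follows. This is only a minimal illustration and not the authors' analysis code: the function name `split_by_median_rt` and the toy RTs are hypothetical, and trials exactly at the median are simply dropped here, a choice the text does not specify.

```python
from statistics import median

def split_by_median_rt(rts):
    """Return (fast, slow) trial indices: strictly below / strictly above the median RT."""
    m = median(rts)
    fast = [i for i, rt in enumerate(rts) if rt < m]
    slow = [i for i, rt in enumerate(rts) if rt > m]
    return fast, slow

# Hypothetical RTs (in seconds) for one subject, pooled over easy and hard trials.
deliberate_rts = [1.4, 2.9, 1.7, 2.2, 3.1, 1.9]
arbitrary_rts = [0.8, 1.3, 0.9, 1.5, 1.0, 1.2]

# Keep the faster deliberate trials and the slower arbitrary trials,
# which brings the RTs of the two conditions closer together.
fast_deliberate, _ = split_by_median_rt(deliberate_rts)
_, slow_arbitrary = split_by_median_rt(arbitrary_rts)
```

The RPs would then be averaged only over the selected trial indices in each condition, separately for each subject.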

(confidence intervals: dashed red lines). The R^2 is 0.05. One subject, #7, had an RT difference between deliberate and arbitrary decisions that was more than 6 interquartile ranges (IQRs) away from the median difference across all subjects. That same subject's RT difference was also more than 5 IQRs higher than the 75th percentile across all subjects. That subject was therefore designated an outlier and removed only from this regression analysis.

We further regressed the within-subject differences between RPs in fast deliberate and slow arbitrary decisions (defined as above) against the differences between the corresponding RTs for each subject to ascertain that such a correlation would not exist for trials that are closer together. We again found no reliable relation between the two differences (Fig. 5D

Yet another concern that could relate to the RT differences among the conditions is that the RP in arbitrary blocks might actually be some potential evoked by the stimuli (i.e., the presentations of the two causes), specifically in arbitrary blocks, where the RTs are shorter (and thus stimuli-evoked effects could still affect the decision). In particular, a stimulus-evoked potential might just happen to bear some similarity to the RP when plotted locked to response onset. To test this explanation, we ran a fifth analysis, plotting the potentials in all conditions, locked to the onset of the stimulus (Fig. 6A). We also plotted the response-locked potentials across an expanded timecourse for comparison (Fig. 6B). If the RP-like shape we see in Figs. 3A and 6B is due to a stimulus-locked potential, we would expect to see the following before the 4 mean response onset times (indicated by vertical lines at 0.98, 1.00, 2.13, and 2.52 s for arbitrary easy, arbitrary hard, deliberate easy, and deliberate hard, respectively) in the stimulus-locked plot (Fig. 6A): consistent potentials, which precede the mean response times, that would further be of a similar shape and magnitude to the RPs found in the decision-locked analysis in the arbitrary condition (though potentially more smeared for stimulus locking). We thus calculated a stimulus-locked version of our ERPs, using the same baseline (Fig. 6A). As the comparison between Fig. 6A and 6B clearly shows, no such consistent potentials were found before the 4 response times, nor were these potentials similar to the RP in either shape or magnitude (their magnitudes are at the most around 1 µV, while the RP magnitudes we found are around 2.5 µV; Figs. 3A, 6B). This analysis thus suggests that it is unlikely that a stimulus-locked potential drives the RP we found.

Notably, the stimulus-locked alignment did imply that the arbitrary easy condition evoked stronger activity in roughly the last 0.5 s before stimulus onset. However, this prestimulus activity cannot explain the response-locked RP, as it was found only in arbitrary easy trials and not in arbitrary hard trials. At the same time, the response-locked RP did not differ between these conditions. What is more, easy and hard trials were randomly interspersed within deliberate and arbitrary blocks, and the subject discovered the trial difficulty only at stimulus onset. Thus, there could not have been differential preparatory activity that varies with decision difficulty. This divergence in one condition only is accordingly not likely to reflect any preparatory RP activity.

One more concern is that the differences in RTs may affect the results in the following manner: Because the main baseline period we used thus far was 1 to 0.5 s before stimulus onset, the duration from the baseline to the decision varied widely between the conditions.
To make sure this difference in temporal distance between the baseline period and the response to which the ERPs were locked did not drive our results, we recalculated the potentials for all conditions with a response-locked baseline of -1 to -0.5 s (Fig. 6C; the same baseline we used for the Bayesian analysis above). The rationale behind this choice of baseline was to have the time that elapsed from baseline to response onset be the same across all conditions. As is evident in Fig. 6C t(17)=1.13, p=0.27). This supports the notion that the choice of baseline does not strongly affect our main results. Taken together, the results of the six analyses above provide strong evidence against the claim that the differences in RPs stem from or are affected by the differences in RTs between the conditions.
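
As an illustration of the response-locked baseline described above, the following minimal sketch subtracts the mean voltage in the -1 to -0.5 s window from a single epoch. The function name, the inclusive window edges, and the toy sample values are assumptions made for this example and are not taken from the study's pipeline.

```python
def baseline_correct(times, voltages, window=(-1.0, -0.5)):
    """Subtract the mean voltage inside `window` (seconds, inclusive) from every sample."""
    start, end = window
    in_window = [v for t, v in zip(times, voltages) if start <= t <= end]
    if not in_window:
        raise ValueError("no samples fall inside the baseline window")
    baseline = sum(in_window) / len(in_window)
    return [v - baseline for v in voltages]

# Toy response-locked epoch: time 0 is response onset (values are made up).
times = [-1.0, -0.75, -0.5, -0.25, 0.0]
voltages = [1.0, 2.0, 3.0, 0.0, -2.0]
corrected = baseline_correct(times, voltages)
```

Because the window is defined relative to response onset, the time elapsed between baseline and response is identical for every trial, regardless of its RT.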

Eye movements do not affect the results
Though ICA was used to remove blink artifacts and saccades (see Methods), we wanted to make sure our results do not stem from differential eye movement patterns between the conditions. We therefore computed a saccade-count metric (SC; see Methods) for each trial for all subjects. Focusing again on the last 500 ms before response onset, we computed mean

We further investigated potential effects of saccades by running a median-split analysis, dividing the trials for each subject into two groups based on their SC score: lower and higher than the median, for deliberate and arbitrary trials, respectively. We then ran the same analysis using only the trials with more saccades in the deliberate condition (SC was 2.02±0.07 and 2.04±0.07 for easy and hard, respectively) and those with fewer saccades for the arbitrary condition (SC was 1.33±0.07 and 1.31±0.08 for easy and hard, respectively). If the number of saccades affects RP amplitudes, we would expect that the differences in RPs between arbitrary and deliberate trials will diminish, or even reverse (as now we had more saccades in the deliberate condition original results, a difference was found between RP amplitude in the two conditions (t(13)=2.29, p=0.0394), with RP in the arbitrary condition differing from zero (t(13)=-5.71, p<0.0001), as opposed to the deliberate condition, where it did not (t(13)=-0.76, p=0.462). This provides evidence against the claim that our results are due to our choice of high-pass filter.

EEG Results: Lateralized Readiness Potential (LRP)
The LRP, which reflects activation processes within the motor cortex for action preparation after action selection (Eimer, 1998 for the fact that, in our experiment, subjects not only decided when to move, but also what to move (either to indicate which NPO they prefer in the deliberate condition, or to generate a meaningless right/left movement in the arbitrary condition). We modeled this by defining two types of movement. One was moving the hand corresponding to the location of the NPO that was rated higher in the first, rating part of the experiment (the congruent option; see Methods). The other was moving the hand corresponding to the location of the lower-rated NPO (the incongruent option). We used the race-to-threshold framework to model the decision process between a pair of leaky, stochastic accumulators, or DDMs (see again Fig. 8A). One DDM simulated the process that leads to selecting the congruent option, and the other simulated the process that leads to selecting the incongruent option.
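
A race between two leaky stochastic accumulators of this kind can be sketched as follows. This is a minimal illustration under stated assumptions rather than the model fitted in the paper: it uses the generic leaky-accumulator update, x += (drift - leak·x)·dt + noise·√dt·ξ with ξ drawn from a standard normal distribution, and all parameter values (drifts, leak, noise, threshold, time step) are hypothetical and are not those of Table 1. It simulates a single congruent/incongruent pair, not the full four-DDM architecture.

```python
import random

def race_trial(drifts, leak, noise, threshold, rng, dt=0.001, max_t=10.0):
    """Race leaky stochastic accumulators; return (index of winner, decision time in s).

    Returns (None, max_t) if no accumulator reaches the threshold in time.
    """
    x = [0.0] * len(drifts)
    steps = int(max_t / dt)
    for step in range(1, steps + 1):
        for i, drift in enumerate(drifts):
            # Euler step: drift minus leak, plus Gaussian noise scaled by sqrt(dt).
            x[i] += (drift - leak * x[i]) * dt + noise * dt ** 0.5 * rng.gauss(0.0, 1.0)
        winner = max(range(len(x)), key=lambda i: x[i])
        if x[winner] >= threshold:
            return winner, step * dt
    return None, max_t

rng = random.Random(0)
# Hypothetical parameters: index 0 = congruent option, index 1 = incongruent option.
# Unequal drifts mimic a value difference; equal drifts mimic an arbitrary choice.
deliberate = [race_trial((1.0, 0.2), 0.5, 0.1, 0.5, rng) for _ in range(200)]
arbitrary = [race_trial((1.0, 1.0), 0.5, 0.1, 0.5, rng) for _ in range(200)]
p_congruent_deliberate = sum(w == 0 for w, _ in deliberate) / len(deliberate)
p_congruent_arbitrary = sum(w == 0 for w, _ in arbitrary) / len(arbitrary)
```

With unequal drift rates the congruent accumulator wins almost every race, whereas with equal drift rates the outcome is decided by the noise and the proportion of congruent choices stays near one half.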
Hence, in each model run, the two DDMs ran in parallel; the first one to cross the threshold determined the decision outcome. And so, if the DDM corresponding to the congruent (incongruent) option reached the threshold first, the trial ended with selecting the congruent (incongruent) option. Thus, for deliberate decisions, the congruent cause had a higher value than the incongruent cause; the DDM associated with the congruent option accordingly had a higher drift rate than that of the DDM associated with the incongruent option. For arbitrary decisions, the values of the decision alternatives mattered very little, and this was reflected in the small differences, if any, among the drift rates (Table 1).

Therefore, taken together, these two changes to the original model by Schurger and colleagues resulted in a model that included four DDMs, divided into two pairs, each pair racing to a threshold (Fig. 8A). The first pair reflected the value assessment process (taking place in Region X, and determining the result of deliberate decisions). The second reflected a mechanism of threshold crossing by random fluctuations (taking place in the SMA and determining the results of arbitrary decisions). Each such pair included one DDM for the congruent option and one DDM for the incongruent option. And so, in each trial, the four DDMs were run, and the decision outcome was determined by the first DDM to reach the threshold in the noise component for arbitrary decisions and in the value component for deliberate decisions.

suggest that noise generation might be a key function of the SMA and other brain regions underneath the Cz electrode, at least during this specific task. When subjects make arbitrary decisions, these might be based on some symmetry-breaking mechanism, which is driven by random fluctuations that are here simulated as noise.
Thus, we neither claim nor think that noise generation is the main purpose or function of these brain regions in general.)

The had no consistent effect on the EEG data, we focus the discussion below on easy decisions (though the same holds for hard decisions). According to our model, the race-to-threshold pair of DDMs that would determine deliberate decisions and trigger the ensuing action is the value-assessment one in Region X. Hence, when the first DDM of the Region X pair would reach the threshold, the decision would be made and movement would ensue. The SMA pair, in contrast, would not integrate toward a decision (Fig. 8B). We modeled this by not including any decision threshold in the SMA in deliberate decisions (i.e., the threshold was set to infinity, letting the DDM accumulate forever). (The corresponding magnitudes of the drift rates are detailed in the Methods.) So, what happens in the SMA (and supposedly recorded using electrode Cz) when Region X activity reaches the threshold? SMA activity will have accumulated to some random level (Fig. 8B). This entails that, when we align such SMA activity to movement onset, we will find just a simple, weak linear trend in the SMA. This trend is the one depicted in red in Fig. 9C for the deliberate easy and hard conditions (here model activity was flipped vertically, from increasing above the x axis to decreasing below it, as in Schurger et al., 2012). In arbitrary decisions, on the other hand, the SMA pair determines the outcome, and motion ensues whenever one of the DDMs crosses the threshold. Thus, when its activity is inspected with respect to movement onset, it forms the RP-like shape of Fig. 9C (Fig. 9A, B). The model was simultaneously fit to the empirical consistency ratios (the proportions of congruent decisions), which were 0.99, 0.83, 0.54 and 0.49.
The model's corresponding consistency ratios were 1.00, 0.84, 0.53 and 0.53. The model then predicted the shape of the ERP in its noise component, over the SMA (assumed to be reflected by Cz-electrode activity) for each decision type: a continuing, RP-like increase in activity (with a negative sign) for arbitrary decisions, but only a very slight increase in activity for deliberate decisions (Fig. 9C, here a decrease due to the negative sign). This was in line with our empirical results (compare Fig. 3A). Note that the Schurger model aims to account for neural activity leading up to the decision to move, but no further (Schurger et al., 2012). Similarly, we expect our DDM to fit Cz neural data only up to around -0.1 s (100 ms before response onset). We also make no claims that ours is the only, or even optimal, model that explains our results. Rather, by extending the Schurger model, our goal was to show how that interpretation of the RP could also be applied to our more-complex paradigm. ( value, and the values of alternatives typically guide one's decision). But this notion of freedom faces several obstacles. First, most discussions of free will focus on deliberate decisions, asking when and whether these are free (Frankfurt, 1971; Hobbes, 1994; Wolf, 1990). This might be because everyday decisions with which we associate freedom of will (like choosing a more expensive but more environmentally friendly car, helping a friend instead of studying more for a test, donating to charity, and so on) are generally deliberate, in the sense of being reasoned, purposeful, and bearing consequences (although see Deutschländer, Pauen, and Haynes (2017)).
In particular, the free will debate is often considered in the context of moral responsibility (e.g., was the decision to harm another person free or not) (Fischer, 1999 1994), and free will is even sometimes defined as the capacity that allows one to be morally responsible (Mele, 2006, 2009). In contrast, it seems meaningless to assign blame or praise to arbitrary decisions. Thus, though the scientific operationalization of free will has typically focused on arbitrary decisions, the common interpretations of these studies, in neuroscience and across the free will debate, have often alluded to deliberate ones.

Here, we show that inference from arbitrary to deliberate decisions may not be justified, as the neural precursors of arbitrary decisions, and in particular the RP, do not generalize to meaningful ones (Breitmeyer, 1985; Roskies, 2010 align with the different brain regions associated with the two decision types above, as also evidenced by the differences we found between the scalp distributions of arbitrary and deliberate decisions (Fig. 3A). Further studies are needed to explore this potential divergence in the neural regions between the two decision types.

To be clear, and following the above, we do not claim that the RP captures all unconscious processes that precede conscious awareness. However, some have suggested that the RP represents unconscious motor-preparatory activity before any kind of decision (e.g., Libet, 1985). But our results provide evidence against that claim, as we do not find an RP before deliberate decisions, which also entail motor preparation. What is more, in deliberate decisions in particular, it is likely that there are neural precursors of upcoming actions, possibly involving the above neural circuits as well as circuits that represent values, which are unrelated to the RP.
Note also that we did not attempt to separately measure the timing of subjects' conscious decision to move. Rather, we instructed them to hold their hands above the relevant keyboard keys and press their selected key as soon as they made up their mind. This was both to keep the decisions in this task more ecological and because we think that the key method of measuring decision onset (using some type of clock to measure Libet's W-time) is highly problematic (see Methods interpreted as "press right". It would therefore follow that subjects are actually not making decisions in deliberate trials, which in turn is reflected by the absence of the RP in those trials. However, the reaction time and consistency results that we obtained provide evidence against this interpretation. We found longer reaction times for hard-deliberate decisions than for easy-deliberate ones (2.52 versus 2.13 s, on average, respectively; Fig. 2 left) and higher consistencies with the initial ratings for easy-deliberate decisions than for hard-deliberate decisions (0.99 versus 0.83, on average, respectively; Fig. 2 right). If the NPO names acted as mere cues, we would have expected no differences between reaction times or consistencies for easy- and hard-deliberate decisions. In addition, there were 50 different causes in the first part of the experiment. So, it is highly unlikely that subjects could memorize all 1225 pairwise preferences among these causes and simply transform any decision between a pair of causes into a stimulus instructing to press left or right.

Another alternative interpretation of our results is that subjects engage in (unconscious) deliberation also during arbitrary decisions (Tusche, Bode, & Haynes, 2010), as they are trying to find a way to break the symmetry between the two possible actions.
If so, the RP in the arbitrary decisions might actually reflect the extra effort in those types of decisions, which is not found in deliberate decisions. However, this interpretation entails a longer reaction time for arbitrary than for deliberate decisions, because of the heavier cognitive load, which is the opposite of what we found (Fig. 2A). Under this interpretation, we would also expect the simpler deliberation in arbitrary-easy trials to result in a shorter reaction time than that of arbitrary-hard trials. But this is not what we find (Fig. 2A).

In conclusion, our study suggests that RPs do not precede deliberate decisions or are at least strongly diminished before such decisions. In addition, it suggests that RPs represent an artificial accumulation of random fluctuations rather than serving as a genuine marker of an unconscious decision to initiate voluntary movement. This further motivates future investigations into other precursors of action besides the RP using EEG, fMRI, or other techniques. It also highlights that it would be of particular interest to find the neural activity that precedes deliberate decisions. And it would also be of interest to find neural activity, which is not motor activity, that is common to both deliberate and arbitrary decisions.

Materials and Methods
Subjects
Twenty healthy subjects participated in the study. They were California Institute of Technology (Caltech) students as well as members of the Pasadena community. All subjects had reported normal or corrected-to-normal sight and no psychiatric or neurological history. They volunteered to participate in the study for payment ($20 per hour). Subjects were prescreened to include only participants who were socially involved and active in the community (based on the strength of their support of social causes, past volunteer work, past donations to social causes, and tendency to vote). The data from 18 subjects were analyzed; two subjects were excluded from our analysis (see Sample size and exclusion criteria below). The experiment was approved by Caltech's Institutional Review Board (14-0432; Neural markers of deliberate and random decisions), and informed consent was obtained from all participants after the experimental procedures were explained to them.

Sample size and exclusion criteria
We ran a power analysis based on the findings of Haggard and Eimer (1999). Their RP in a free left/right-choice task had a mean of 5.293 µV and standard deviation of 2.267 µV. Data from a pilot study we ran before this experiment suggested that we might obtain smaller RP values in our task (they referenced to the tip of the nose and we to the average of all channels, which typically results in a smaller RP). Therefore, we conservatively estimated the magnitude of our RP as half of that of Haggard & Eimer, 2.647 µV, while keeping the standard deviation the same at 2.267 µV. Our power analysis therefore suggested that we would need at least 16 subjects to reliably find a difference between an RP and a null RP (0 µV) at a p-value of 0.05 and power of 0.99. This number agreed with our pilot study, where we found that a sample size of at least 16 subjects resulted in a clear, averaged RP. Following the above reasoning, we decided beforehand to collect data from 20 subjects for this study, taking into account that some could be excluded as they would not meet the following predefined inclusion criteria: at least 30 trials per experimental condition remaining after artifact rejection; and averaged RTs (across conditions) that deviated by less than 3 standard deviations from the group mean.

Subjects were informed about the overall number of subjects that would participate in the experiment when the NPO lottery was explained to them (see below). So, we had to finalize the overall number of subjects who would participate in the study, but not necessarily the overall number of subjects whose data would be part of the analysis, before the experiment began. After completing data collection, we ran only the EEG preprocessing and behavioral-data analysis to test each subject against the exclusion criteria. This was done before we looked at the data with respect to our hypothesis or research question.
Two subjects did not meet the 786 inclusion criteria: the data of one subject (#18) suffered from poor signal quality, resulting in 787 less than 30 trials remaining after artifact rejection; another subject (#12) had RTs longer than 788 3 standard deviations from the mean. All analyses were thus run on the 18 remaining subjects. 789
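As a rough check on the sample-size reasoning above, the following Python sketch approximates the required number of subjects. It is written in Python rather than the Matlab used for the study, and it is not the authors' actual procedure, which the text does not specify. It assumes a two-sided one-sample t-test against 0 µV and uses a normal approximation with a z²/2 small-sample correction. The function name `required_n` is hypothetical.

```python
import math
from statistics import NormalDist


def required_n(mean_h1, sd, alpha=0.05, power=0.99):
    """Approximate n for a two-sided one-sample t-test against 0.

    Assumption: normal approximation plus a z^2/2 term that compensates
    for using the t rather than the normal distribution in small samples.
    """
    d = mean_h1 / sd                                # standardized effect size
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    n_normal = ((z_alpha + z_power) / d) ** 2       # large-sample estimate
    n_corrected = n_normal + z_alpha ** 2 / 2       # small-sample correction
    return math.ceil(n_corrected)


# Values quoted in the text: expected RP of 2.647 µV, SD of 2.267 µV.
n_required = required_n(2.647, 2.267)
print(n_required)
```

Under these assumptions the approximation yields 16 subjects, in line with the number reported above; an exact noncentral-t calculation could differ slightly.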

Stimuli and apparatus
Subjects sat in a dimly lit room. The stimuli were presented on a 21" Viewsonic G225f (20" viewable) CRT monitor with a 60-Hz refresh rate and a 1024×768 resolution using Psychtoolbox version 3 and Mathworks Matlab 2014b (Brainard, 1997; Pelli, 1997

In the first part of the experiment, subjects were presented separately with each of the 50 NPOs and the causes with which the NPOs were associated (see Supplementary Table 1). They were instructed to rate how much they would like to support that NPO with a $1000 donation on a scale of 1 ("I would not like to support this NPO at all") to 7 ("I would very much like to support this NPO"). No time pressure was put on the subjects, and they were given access to the website of each NPO to give them the opportunity to learn more about the NPO and the cause it supports.

After the subjects finished rating all NPOs, the main experiment began. In each block of the experiment, subjects made either deliberate or arbitrary decisions. Two succinct cause descriptions, representing two actual NPOs, were presented in each trial (Fig. 1). In deliberate blocks, subjects were instructed to choose the NPO to which they would like to donate $1000 by pressing the <Q> or <P> key on the keyboard, using their left and right index finger, for the NPO on the left or right, respectively, as soon as they decided. Subjects were informed that at the end of each block one of the NPOs they chose would be randomly selected to advance to a lottery. Then, at the end of the experiment, the lottery would take place and the winning NPO would receive a $20 donation. In addition, that NPO would advance to the final, inter-subject lottery, where one subject's NPO would be picked randomly for a $1000 donation. It was stressed that the donations were real and that no deception was used in the experiment.
To persuade the subjects that the donations were real, we presented a signed commitment to donate the money, and promised to send them the donation receipts after the experiment. Thus, subjects knew that in deliberate trials, every choice they made was not hypothetical, and could potentially lead to an actual $1020 donation to their chosen NPO.

Arbitrary trials were identical to deliberate trials except for the following crucial differences. Subjects were told that, at the end of each block, the pair of NPOs in one randomly selected trial would advance to the lottery together. If that pair won the lottery, both NPOs would receive $10 each. Further, each NPO in the pair that won the inter-subject lottery would receive a $500 donation. Hence it was stressed to the subjects that there was no reason for them to prefer one NPO over the other in arbitrary blocks, as both NPOs would receive the same donation regardless of their button press. Subjects were therefore told to simply press either <Q> or <P> as soon as they decided to do so.

Thus, while subjects' decisions in the deliberate blocks were meaningful and consequential, their decisions in the arbitrary blocks had no impact on the final donations that were made. In these trials, subjects were further urged not to let their preferred NPO dictate their response. Importantly, despite the difference in decision type between deliberate and arbitrary blocks, the instructions for carrying out the decisions were identical: subjects were instructed to report their decisions as soon as they made them in both conditions. They were further asked to place their right and left index fingers on the response keys, so they could respond as quickly as possible.
Note that we did not ask subjects to report their "W-time" (time of consciously reaching a decision), because this measure was shown to rely on neural processes occurring after response onset (Lau, Rogers, & Passingham, 2007) and to potentially be backward inferred from movement time (Banks & Isham, 2009

both causes may each get a $500 donation regardless of the choice") on a gray background that was used throughout the experiment. Short-hand instructions appeared at the top of the screen throughout the block in the same colors as that block's initial instructions; Deliberate: "Choose for $1000" or Arbitrary: "Press for $500 each" (Fig. 1).

Each trial started with the gray screen that was blank except for a centered, black fixation cross. The fixation screen was on for a duration drawn from a uniform distribution between 1 and 1.5 s. Then, the two cause descriptions appeared on the left and right side of the fixation cross (left/right assignments were randomly counterbalanced) and remained on the screen until the subjects reported their decisions with a key press: <Q> or <P> on the keyboard for the cause on the left or right, respectively. The cause corresponding to the pressed button then turned white for 1 s, and a new trial started immediately. If subjects did not respond within 20 s, they received an error message and were informed that, if this trial were selected for the lottery, no NPO would receive a donation. However, this did not happen for any subject on any trial.

To assess the consistency of subjects' decisions during the main experiment with their ratings in the first part of the experiment, subjects' choices were coded in the following way: each binary choice in the main experiment was given a consistency grade of 1 if subjects chose the NPO that was rated higher in the rating session, and 0 if not.
Then an averaged consistency grade for each subject was calculated as the mean consistency grade over all the choices. Thus, a consistency grade of 1 indicates perfect consistency with one's ratings across all trials, 0 indicates perfect inconsistency, and 0.5 indicates chance performance.

We wanted to make sure subjects were carefully reading and remembering the causes during the arbitrary trials as well, to better equate memory load, attention, and other cognitive aspects between deliberate and arbitrary decisions (except those aspects directly associated with the decision type, which was the focus of our investigation). We therefore randomly interspersed 36 memory catch-trials throughout the experiment (thus more than one catch trial could occur per block

The EEG was recorded using an Active 2 system (BioSemi, the Netherlands) from 64 electrodes distributed based on the extended 10-20 system and connected to a cap, and from seven external electrodes. Four of the external electrodes recorded the EOG: two located at the outer canthi of the right and left eyes and two above and below the center of the right eye. Two external electrodes were located on the mastoids, and one electrode was placed on the tip of the nose. All electrodes were referenced during recording to a common-mode signal (CMS) electrode between POz and PO3. The EEG was continuously sampled at 512 Hz and stored for offline analysis.

ERP analysis
ERP analysis was conducted using the "Brain Vision Analyzer" software (Brain Products, Germany) and in-house Mathworks Matlab scripts. Data from all channels were referenced offline to the average of all channels, which is known to result in a reduced-amplitude RP (because the RP is such a spatially diffuse signal). The data were then digitally high-pass filtered at 0.
baseline period was defined as the time window from -1000 ms to -500 ms relative to stimulus onset, that is, the onset of the causes screen, rather than relative to movement onset. In addition to the main baseline, we tested another baseline, from -1000 ms to -500 ms relative to movement onset, to investigate whether the baseline period influenced our main results (see Results). Furthermore, we segmented the EEG based on stimulus onset, using the same baseline, for stimulus-locked analysis (again, see Results).

To assess potential effects of eye movements during the experiment, we defined the radial eye signal as the average over all 4 EOG channels, band-pass filtered between 30 and 100 Hz. We then defined a saccade as any signal that was more than 2.5 standardized IQRs away from the median of the radial signal for more than 2 ms. Two consecutive saccades had to be at least 50 ms apart.

button-press onset were used for the ANOVAs. Greenhouse-Geisser correction was never required as sphericity was never violated (Picton et al., 2000).

Trend analysis on all subjects' data was carried out by regressing the voltage for every subject against time for the last 1000 ms before response onset using first-order polynomial linear regression (see Results). We used every 10th time sample for the regression (i.e., the 1st, 11th, 21st, 31st samples, and so on) to conform with the individual-subject analysis (see below). For the individual-subject analysis, the voltage on all trials was regressed against time in the same manner (i.e., for the last 1000 ms before response onset and using first-order polynomial linear regression). As individual-trial data are much noisier than the mean over all trials in each subject, we opted for standard robust regression using iteratively reweighted least squares (implemented using the robustfit() function in Mathworks Matlab).
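To make the robust trend regression concrete, here is a minimal Python sketch of iteratively reweighted least squares for a straight line. It is an illustration, not the authors' code or a reimplementation of Matlab's robustfit(): it assumes bisquare weights with a tuning constant of 4.685 and a MAD-based scale estimate, it omits robustfit's leverage adjustment, and it runs on synthetic voltages. The function names and the simulated data are hypothetical.

```python
import random
from statistics import median


def weighted_line(t, v, w):
    """Weighted least-squares fit of v = intercept + slope * t."""
    sw = sum(w)
    t_mean = sum(wi * ti for wi, ti in zip(w, t)) / sw
    v_mean = sum(wi * vi for wi, vi in zip(w, v)) / sw
    sxx = sum(wi * (ti - t_mean) ** 2 for wi, ti in zip(w, t))
    sxy = sum(wi * (ti - t_mean) * (vi - v_mean) for wi, ti, vi in zip(w, t, v))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def robust_line(t, v, tune=4.685, max_iter=50, tol=1e-12):
    """IRLS with bisquare weights, starting from the ordinary fit."""
    slope, intercept = weighted_line(t, v, [1.0] * len(t))
    for _ in range(max_iter):
        resid = [vi - (intercept + slope * ti) for ti, vi in zip(t, v)]
        scale = median(abs(r) for r in resid) / 0.6745  # robust sigma estimate
        if scale == 0.0:
            break
        cutoff = tune * scale
        # Bisquare weights: residuals beyond the cutoff are ignored entirely.
        w = [(1 - (r / cutoff) ** 2) ** 2 if abs(r) < cutoff else 0.0
             for r in resid]
        new_slope, new_intercept = weighted_line(t, v, w)
        done = (abs(new_slope - slope) < tol
                and abs(new_intercept - intercept) < tol)
        slope, intercept = new_slope, new_intercept
        if done:
            break
    return slope, intercept


# Synthetic example: last 1000 ms before the response at 512 Hz,
# keeping every 10th sample as described in the text.
rng = random.Random(0)
times = [-1000.0 + i * 1000.0 / 512.0 for i in range(0, 512, 10)]
true_slope = -0.002  # µV per ms, hypothetical
volts = [true_slope * (ti + 1000.0) + rng.gauss(0.0, 0.2) for ti in times]
for i in (-3, -2, -1):          # three large artifacts near the response
    volts[i] += 50.0

ols_slope, _ = weighted_line(times, volts, [1.0] * len(times))
robust_slope, _ = robust_line(times, volts)
print(ols_slope, robust_slope)
```

In this toy case the ordinary fit is pulled toward the artifacts, whereas the reweighted fit stays close to the simulated negative trend, which is the motivation for using a robust procedure on noisy single-trial data.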
The iterative robust-regression procedure is time consuming. So, we used every 10th time sample instead of every sample to make the procedure's run time manageable. Also, as EEG signals have a 1/f power spectrum, taking every 10th sample conforms better with the assumption of i.i.d. noise in linear regression.

We further conducted Bayesian analyses of our main results. This allowed us to assess the strength of the evidence for or against the existence of an effect, and specifically to test whether null results stem from genuine absence of an effect or from insufficient or underpowered data. Specifically, the Bayes factor allowed us to compare the probability of observing the data given H0 (i.e., no RP in deliberate decisions) against the probability of observing the data given H1 (i.e., RP exists in deliberate decisions). We followed the convention that a BF < 0.33 implies substantial evidence for lack of an effect (that is, the data are at least three times more likely to be observed given H0 than given H1), 0.33 < BF < 3 suggests insensitivity of the data, and BF > 3 denotes substantial evidence for the presence of an effect (H1) (Jeffreys, 1998). Bayesian analysis was carried out using JASP (ver. 0.8; default settings).

In addition to the above, we used the cluster-based nonparametric method developed by Maris and Oostenveld to find continuous temporal windows where EEG activity was reliably different from 0 (Maris & Oostenveld, 2007). We used an in-house implementation of the method in Mathworks Matlab with a threshold of 2 on the t statistic and with a significance level of p = 0.05.
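The cluster-based test described above can be sketched as follows. This is a minimal Python illustration, not the authors' in-house Matlab implementation. It assumes a one-sample design (subjects × time samples tested against 0), random sign flips of each subject's whole time course as the permutation scheme, and the summed t value as the cluster statistic. All names and the synthetic data are hypothetical.

```python
import math
import random


def t_values(data):
    """One-sample t statistic against 0 at every time sample."""
    n = len(data)
    ts = []
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        m = sum(col) / n
        var = sum((x - m) ** 2 for x in col) / (n - 1)
        ts.append(m / math.sqrt(var / n))
    return ts


def clusters(ts, thresh=2.0):
    """Return (start, end_exclusive, mass) for same-sign runs with |t| > thresh."""
    out, start, cur_sign = [], None, 0
    for j, t in enumerate(ts + [0.0]):          # sentinel closes the last run
        sign = (t > thresh) - (t < -thresh)
        if start is not None and sign != cur_sign:
            out.append((start, j, sum(ts[start:j])))
            start = None
        if start is None and sign != 0:
            start, cur_sign = j, sign
    return out


def cluster_test(data, thresh=2.0, n_perm=500, seed=0):
    """Attach a permutation p-value to each observed cluster."""
    rng = random.Random(seed)
    observed = clusters(t_values(data), thresh)
    null_max = []
    for _ in range(n_perm):
        # Flip the sign of each subject's entire time course at random.
        flipped = []
        for row in data:
            s = rng.choice((-1.0, 1.0))
            flipped.append([s * x for x in row])
        masses = [abs(m) for _, _, m in clusters(t_values(flipped), thresh)]
        null_max.append(max(masses, default=0.0))
    results = []
    for start, end, mass in observed:
        exceed = sum(1 for m in null_max if m >= abs(mass))
        results.append((start, end, mass, (exceed + 1) / (n_perm + 1)))
    return results


# Synthetic data: 18 subjects, 60 samples, a negative shift in the last 20.
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) + (-2.0 if j >= 40 else 0.0) for j in range(60)]
        for _ in range(18)]
results = cluster_test(data)
for start, end, mass, p in results:
    print(start, end, round(mass, 1), round(p, 3))
```

Comparing each observed cluster with the distribution of the largest cluster found under sign flipping is what controls the error rate across time samples, so no separate per-sample correction is applied.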

Model and Simulations
All simulations were performed using Mathworks Matlab 2018b. The model was based on the one proposed by Schurger et al. (2012). Like them, we built a drift-diffusion model (Ratcliff, 1978; Usher & McClelland, 2001), which included a leaky stochastic accumulator (with a threshold on its output) and a time-locking/epoching procedure. The original model amounted to iterative numerical integration of the differential equation

$$\delta x_i = (I - k x_i)\,\Delta t + c\,\xi_i \sqrt{\Delta t} \qquad (1)$$

where I is the drift rate, k is the leak (exponential decay in x), ξ is Gaussian noise, and c is a noise-scaling factor (we used c = 0.05). Δt is the discrete time step used in the simulation (we used Δt = 0.001, similar to our EEG sampling rate). The model integrates x_i until it crosses a threshold, which represents a decision having been made.

In such drift-diffusion models, for a given k and c, the values of I and the threshold together determine how quickly a decision will be reached, on average. If we further fix the threshold, a higher drift rate, I, represents a faster decision, on average. The drift rate alone can thus be viewed as a constant "urgency to respond" (using the original Schurger term) that is inherent in the demand characteristics of the task, evidenced by the fact that no subject took more than 20 s to make a decision on any trial. The leak term, k, ensures that the model would not be too linear; i.e., it prevents the drift rate from setting up a linear trajectory for the accumulator toward the threshold. Also, k has a negative sign and is multiplied by x_i. So, the term -k x_i acts against the drift induced by I and gets stronger as x_i grows. Hence, due to the leak term, doubling the height of the threshold could make the accumulator rarely reach the threshold instead of reaching it in roughly twice the amount of time (up to the noise term).
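The behavior of the leaky accumulator can be illustrated with a short Python sketch. It assumes the update rule of Schurger et al. (2012), δx = (I − k·x)Δt + c·ξ·√Δt, integrated by simple Euler steps. The parameter values are illustrative only (I, k, and the threshold follow the values Schurger et al. report, and c = 0.05 as stated above); they are not the fitted values of Table 1. The function name `simulate_trial` is hypothetical.

```python
import math
import random


def simulate_trial(I, k, c, threshold, dt=0.001, max_steps=20000, rng=None):
    """Integrate the leaky accumulator until the threshold is crossed.

    Returns (crossing_time_in_seconds_or_None, trajectory).
    """
    rng = rng or random.Random()
    x = 0.0
    path = [x]
    for step in range(1, max_steps + 1):
        xi = rng.gauss(0.0, 1.0)                       # Gaussian noise sample
        x += (I - k * x) * dt + c * xi * math.sqrt(dt)
        path.append(x)
        if x >= threshold:
            return step * dt, path
    return None, path                                  # no decision in 20 s


# Illustrative (not fitted) parameters.
I, k, c, threshold = 0.11, 0.5, 0.05, 0.298

# Noisy trial: a crossing can occur only because of the fluctuations.
crossing_time, noisy_path = simulate_trial(I, k, c, threshold,
                                           rng=random.Random(0))
print("crossing time (s):", crossing_time)

# Noise-free trial: the leak caps the accumulator at I / k = 0.22,
# which lies below the threshold, so the threshold is never reached.
no_noise_time, no_noise_path = simulate_trial(I, k, 0.0, threshold)
print("noise-free asymptote:", no_noise_path[-1])
```

Without noise the accumulator saturates at I/k instead of rising linearly, so a threshold above that level is crossed only through accumulated fluctuations; this is the sense in which raising the threshold can make crossings rare rather than merely later.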
When comparing the model's activity in the SMA and in Region X, we needed to know how to set the drift rate for the DDM in Region X for deliberate decisions. We made the assumption that the ratio between the drift rate in Region X and in the SMA during deliberate decisions would be the same as the ratio between the average actual activity in the SMA and in the rest of the brain during arbitrary decisions. Our EEG data suggested that this ratio (calculated as activity in Cz divided by the mean activity in the rest of the electrodes) is 1.45. Hence, we set the drift rate in Region X to be 1.45 times smaller than that of the SMA (see Table 1 for the drift values in the SMA).

Our model differed from Schurger's in two main ways. First, it accounted for both arbitrary and deliberate decisions and was thus built to fit our empirical results. We devised a model that was composed of two distinct components (Fig. 8A), each described by a race to threshold between 2 DDMs based on Eq. (1) (see below), but with different parameter values for each DDM (Table 1). The first component accumulated activity that drove arbitrary decisions (i.e.,

hand (used to perform the task), but the center of the spatial distribution varied from subject to subject. Therefore, for each subject we selected an electrode from Cz, C1, or FC1 (Cz, C2, or FC2 if left handed) on the basis of data from the classic task, showing the highest-amplitude RP. This same electrode was then used for analysis of the data from the interruptus task (so the choice of electrode used in Fig. 3 was independent of the data presented in Fig. 3). Limiting the choice to C1 (C2) or FC1 (FC2) did not change the outcome.
Model and Simulations. All simulations were performed using Matlab (MathWorks). The model includes two components: a leaky stochastic accumulator (with a threshold on its output) and a time-locking/epoching procedure. We used a well-known accumulator model (DDM) (27), which is an extension of an earlier model developed by Ratcliff (23). Simulation of the model amounts to iterative numerical integration of the differential equation

$$\delta x_i = (I - k x_i)\,\Delta t + c\,\xi_i \sqrt{\Delta t}$$

where I is drift rate, k is leak (exponential decay in x), ξ is Gaussian noise, and c is a noise-scaling factor (we used c = 0.1). Δt is the discrete time step used in the simulation (we used Δt = 0.001). In the context of our model, I corresponds to a general (and we assume constant) urgency to respond that is inherent in the demand characteristics of the task. A small amount of urgency is necessary in the model to account for the fact that subjects rarely if ever wait longer than ∼20 s to produce a movement in any given trial. Because of the leak term, the urgency does not set up a linear trajectory toward the threshold (i.e., if we were to increase the threshold that we used by a factor of 2, the output of the accumulator would essentially never reach it), but simply moves the baseline level of activity closer to the threshold so that a crossing is very likely to happen soon (Fig. 1, Inset). Thus, the model has three free parameters, urgency (I), leak (k), and threshold (β). The threshold was expressed as a percentile of the output amplitude over a set of 1,000 simulated trials (50,000 time steps each). These three parameters were chosen on the basis of the best fit of the first crossing-time distribution to the empirical waiting-time distribution from the classic Libet task (we use the term "waiting time" instead of "reaction time"). The parameters were then fixed at these values for all other simulations and analyses, including the fitting of the RP.
The three parameter values assigned were k = 0.5, I = 0.11, and β = 0.298 (corresponding to the 80th percentile). We modeled the classic task by simply identifying the time point of the first threshold crossing in each simulated trial and then extracting the time series (the output of the accumulator) from 5,000 time steps before the threshold