Using a Veto paradigm to investigate the decision models in explaining Libet-style experiments

Classical Libet task, resulting in longer wait times and earlier self-reported intentions to act (i


Introduction
Whether free will exists has been a topic of interest for philosophers for centuries (O'Connor and Christopher, 2022). Only a few decades ago, the empirical work of Benjamin Libet and colleagues (Libet et al., 1983) opened a new avenue for cognitive neuroscientists to investigate this question empirically. In a seminal experiment, Libet et al. (1983) demonstrated that brain activation (the so-called readiness potential; RP) preceded the reported conscious decision to act by a few hundred milliseconds, casting doubt on the intuition of conscious free will. More specifically, participants were instructed to look at a clock with a rotating dot and freely decide when to press a key while observing the dot rotating. After pressing the key, they reported when it seemed to them that they had decided to press the key (i.e., the W; Fig. 1) using the position of the dot. Libet et al. then compared the W with the onset of the RP, which is an event-related potential (ERP) component traditionally interpreted as a movement preparation signal (Kornhuber & Deecke, 1965; Libet et al., 1983), and found that the RP preceded the W by 150 to 350 milliseconds (ms). Libet (1999) interpreted the RP as a marker of the unconscious initiation of an agent's action, occurring before the agent's own conscious intention. While Libet himself was inclined to accept the existence of free will in light of his own finding (Libet, 1999), his finding and interpretation have been used as an argument against the existence of free will by free will skeptics (e.g., Ebert & Wegner, 2011). Although Libet's findings were successfully replicated (Braun et al., 2021; Dominik et al., 2018), the validity of the experiment has been debated ever since its first publication. While some researchers challenged how accurately the self-reported time of intention could reflect the actual onset of an intention (e.g., Dominik et al., 2017, 2018, 2023; Ivanof et al., 2022; Lau et al., 2007; Triggiani et al., 2023), others provided a re-interpretation of the RP which preceded the reported conscious intention to act (Brass et al., 2019; Schmidt et al., 2016; Schurger, 2018; Schurger et al., 2012; 2021). One such re-interpretation, the integration to bound model of intentional action (ITB; Bode et al., 2014; Brass et al., 2019; Schurger, 2018; Schurger et al., 2012; 2021), assumes that the RP is not the outcome of a decision process but rather reflects the decision process itself, which is only completed when the RP reaches a specific bound. The 'Conditional Intention and Integration to Bound model' (COINTOB; Brass et al., 2019) assumes that, based on the experimental instructions, parameters of the decision process are specified (i.e., the decision threshold and the rate of evidence accumulation). Participants accumulate evidence until the intention threshold is reached and a conscious proximal intention is generated. However, the decision-making process does not end there. The accumulation of information continues either until the change-of-mind bound is reached and the agent vetoes his/her own conscious proximal intention and stops the action, or until the point of no return (i.e., a time point that is too late for the participants to veto their action; Matsuhashi & Hallett, 2008; Schultze-Kraft et al., 2016) is reached.
It has been argued (Bode et al., 2014; Brass et al., 2019; Roskies, 2010) that intentional action during Libet-style experiments is similar to other types of decision-making, such as perceptual decision-making, and only differs in the type of information that is integrated (Brass et al., 2019). In perceptual decision-making, participants set up intention thresholds for different options based on instructions and accumulate perceptual information until one of the intention thresholds is reached. Then, the decision is made according to the corresponding threshold (Gold and Shadlen, 2007). Since in Libet-style experiments there is no perceptual evidence that guides the decision, it was suggested that internal signals, such as interoceptive signals or stochastic noise, would feed the decision process (Brass et al., 2019; Park et al., 2020; Schurger et al., 2012; 2016; 2021). Regardless of the specific model specification, ITB models converge on the idea that the RP simply reflects the continuous integration of information, and that the onset of the RP does not actually signify an agent's "unconscious neural decision". Hence, ITB models of Libet-style experiments are "late decision models" where the decision is only reached briefly before action execution (Schurger et al., 2021) rather than hundreds of milliseconds earlier as assumed by "early decision models" such as the one by Libet et al. (1983).
Support for ITB models of Libet-style experiments stems from investigations of the Waiting time (i.e., the duration between the start of the trial and the execution of the keypress; Fig. 1) and the shape of the RP. First, both the distribution of the Waiting time and the shape of the RP could be modeled well with the ITB (Murakami et al., 2014; Schurger et al., 2012; Schurger, 2018). Furthermore, Schurger (2018) compared the RP amplitude of trials with longer Waiting time to that of trials with shorter Waiting time. Since a longer Waiting time implied a longer signal accumulation for reaching the intention threshold, the signal accumulation slope in trials with the longer Waiting time should be less steep in general. Therefore, the ITB predicted a lower RP amplitude in the trials with the longer Waiting time, which was indeed supported by the empirical data.
While there are many variants of ITB models (Bode et al., 2014; Brass et al., 2019; Schurger, 2018; Schurger et al., 2012; 2021), two particular elements deserve special attention. The first element is the number of thresholds. Brass et al. (2019) and Schurger (2018) proposed that there are two decision thresholds in the model. While the naming of the thresholds differs between papers, they unanimously proposed that one threshold is related to the generation of intention (the intention threshold) while another one is related to the generation of the motor command (Schurger: the action threshold; Brass et al.: the change-of-mind bound). In both models, participants generate the intention after the intention threshold is crossed and then execute the action if the action threshold is crossed (Schurger, 2018) or if the change-of-mind bound is not reached to veto the action (Brass et al., 2019). Schurger tested his model with empirical data. Specifically, his model configuration predicted that if the signal accumulation was weaker, it would take longer for the agent to cross the action threshold for executing the action after crossing the intention threshold. Hence, a negative correlation was expected between Waiting time and W, which was indeed the case (Schurger, 2018).
Fig. 1. Timeline of dependent measures in all the experiments. Traditionally, the W is calculated relative to the keypress onset. Therefore, the W is a negative value if the intention onset is reported to be earlier than the keypress onset.
The second element that one should pay attention to is the leak in the accumulated signal. Schurger's model (Schurger, 2018; Schurger et al., 2012) contains a leak term, such that the accumulated signal is attenuated at a constant rate as time passes (Bogler et al., 2023). The inclusion of the leak term was proposed by Usher & McClelland (2001) to account for perceptual decision-making accuracy in situations where the difference between stimuli is minimal while participants are given unlimited time to respond. Usher & McClelland (2001) argued that classical accumulation models without a leak term predict that the discriminability of the stimuli increases without bound over time in this situation, which obviously does not match the empirical data. To tackle this issue, they introduced the leak term to model the loss of evidence across time, so that accuracy does not increase indefinitely. However, in the Libet-style experiment there is no genuine "evidence" for making the decision, nor is there any task accuracy that would require a leak term to explain. Therefore, we do not see a reason why the leak term is necessary in a decision-making model explaining arbitrary intentional action as in the Libet-style experiment, and we would prefer models that are neutral with respect to the leak term (e.g., the COINTOB model by Brass et al., 2019).
One central argument of the ITB model of Libet-style experiments is that these tasks are in principle similar to other decision-making tasks which are sensitive to strategic processes induced by task instructions (Brass et al., 2019). Such strategic adjustments based on task instructions should affect different components of the integration to bound process. A previous study (Trovò et al., 2021) instructed participants to perform self-paced movements under varying time pressure (i.e., the strength of the imperative). The authors observed an increase in the RP amplitude when the time pressure was stronger. One post-hoc explanation of this result is that when the imperative is stronger, the accumulation slope of the decision signal becomes steeper, and the measured amplitude of the decision signal (reflected by the RP) is higher. If this explanation is correct, one should expect an earlier reported onset of intention relative to the action (i.e., the W), and an earlier execution of behavior (i.e., the Waiting time). However, Trovò et al. did not measure the intention onset in their study, and the Waiting time in their study was highly restricted by their manipulation. Therefore, these behavioral predictions remained untested.
To address this gap, we applied a different instruction-related manipulation to test the ITB model of Libet-style experiments in the current study. In particular, we added an intentional inhibition instruction to the classical Libet task (Brass & Haggard, 2007, 2008). In a seminal study, Brass and Haggard (2007) asked participants to carry out a classical Libet task. However, participants were also instructed to veto their decision to act in approximately 50 % of the trials. We argued that participants can only veto their own intention to act before the decision signal reaches the action threshold (i.e., the point of no return; Matsuhashi & Hallett, 2008). This argument is supported by the literature. For instance, Schultze-Kraft et al. (2016) observed that if a stop signal was very close to the onset of the EMG (e.g., 100 ms before), participants sometimes could not abort the keypress. Therefore, participants need enough time between experiencing the intention to act and reaching the action threshold in order to be able to veto.
Based on this assumption, we formulated the Slope Steepness Hypothesis from the ITB (Fig. 2a). Accordingly, we hypothesized that the signal accumulation in the Veto Libet task would be less steep than that in the Classical Libet task. This was because if the signal accumulation slope was too steep, the signal would cross the action threshold immediately after crossing the intention threshold, which would give participants insufficient time to veto their action in the Veto Libet task. On the other hand, in the Classical Libet task, insufficient time to veto should not be a concern since participants do not have to veto their decision. Based on this, we formulated two main predictions: first, the Waiting time in the Veto Libet task would be longer than in the Classical Libet task. Second, the W in the Veto Libet task would be earlier relative to the keypress than that in the Classical Libet task. In addition, we also developed a novel measure, the Start-to-W time (i.e., the time between the onset of the trial and the onset of the self-reported intention; see Fig. 1 for a timeline of all the behavioral measures), and performed exploratory analyses comparing it between the two tasks. It was expected that the Start-to-W time would be longer in the Veto Libet task. We also considered another ITB model in which only the initial bias, but not the slope of the decision signal, was affected by the instruction (Fig. 2b). In this case, the temporal gap between crossing the two thresholds should remain constant, so the W should not differ across the two conditions.
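The direction of the two main predictions can be illustrated with a toy simulation. The sketch below is not the model tested in this study: it assumes a leak-free accumulator with a constant drift (the slope), Gaussian noise, an intention threshold, and a higher action threshold. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_trial(slope, rng, start=0.0, intention_bound=0.8,
                   action_bound=1.0, noise_sd=0.02, dt_ms=10, max_ms=60000):
    """Simulate one trial; return (waiting_time_ms, w_ms), or None if no keypress."""
    x, t, t_intention = start, 0, None
    while t < max_ms:
        t += dt_ms
        x += slope * dt_ms + rng.gauss(0.0, noise_sd)  # drift plus noise
        if t_intention is None and x >= intention_bound:
            t_intention = t  # first crossing of the intention threshold
        if x >= action_bound:
            # W is expressed relative to the keypress, so it is <= 0 here
            return t, t_intention - t
    return None


def mean_measures(slope, n_trials=2000, seed=1):
    """Mean Waiting time and mean W (both in ms) for a given slope."""
    rng = random.Random(seed)
    trials = [simulate_trial(slope, rng) for _ in range(n_trials)]
    trials = [tr for tr in trials if tr is not None]
    waiting = statistics.mean(tr[0] for tr in trials)
    w = statistics.mean(tr[1] for tr in trials)
    return waiting, w


# Hypothetical slopes: a steeper one standing in for the Classical Libet task,
# a shallower one standing in for the Veto Libet task.
steep_waiting, steep_w = mean_measures(slope=0.0005)
shallow_waiting, shallow_w = mean_measures(slope=0.00025)
```

Under these assumptions the shallower slope yields a longer mean Waiting time and a more negative mean W. Changing only the `start` argument (the initial bias) shifts the Waiting time while leaving the expected gap between the two threshold crossings unchanged, which corresponds to the alternative model in Fig. 2b.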
The aim of the current study was to test whether asking participants to sometimes veto their own intention at the last moment would affect the Start-to-W time, the W, and the Waiting time in the aforementioned way. This was first addressed by Experiment 1. Since the Veto Libet task added another decision component on top of the Classical Libet task, the levels of cognitive demand in the two conditions might not be the same. This could potentially explain the differences we observed in the dependent measures when we compared the two tasks. Furthermore, participants might pre-plan whether to move at the beginning of the trial instead of deciding after formulating the When decision. To address these concerns, in Experiment 2 we designed the Pre-plan Libet task, which asked participants to pre-plan whether to move before the trial started. Then, we compared the Start-to-W time, the W, and the Waiting time between the Pre-plan Libet task and the Veto Libet task to see whether we could obtain the same pattern of results.
In addition to testing the Slope Steepness Hypothesis, there are two additional novelties in the current study. First, this is the first study that applies the Libet task in an online fashion. Online testing has the advantage of allowing us to test large samples with minimal effort. However, there is a risk that participants in online testing are not motivated enough to carry out tasks like the Libet task. To see whether online testing of the Libet task provides reasonable results, we can compare our online results with a meta-analysis based on face-to-face studies (Braun et al., 2021). Second, previous studies using the Veto Libet task (Brass & Haggard, 2007; Kühn et al., 2009; Walsh et al., 2010) faced a difficulty in studying the vetoing phenomenon: namely, that there was no behavioral correlate of vetoing one's action. If a reliable behavioral difference between the Classical Libet task and the Veto Libet task (i.e., a difference in the Waiting time or the W) can be observed, this would establish a behavioral correlate that could be used in future studies of the vetoing phenomenon.
Y.H. Shum et al.

Participants
The current experiment was pre-registered on AsPredicted (https://aspredicted.org/bt8ah.pdf). All data, analysis scripts, and jsPsych programs are available on the Open Science Framework (OSF; https://osf.io/sk7v4/?view_only=c76c0b6d1fbc4c26a5bd1b6587c21411). A Bayesian sequential sampling plan was used for data collection. The initial sample size was 110 for each condition, which was determined based on a small to medium effect size (Cohen's d = 0.34) with a power of 0.8. The estimated effect size was chosen based on the only study comparing the W between the Classical Libet and the Veto Libet task (Walsh et al., 2010), which found an effect with d = 0.39. After analyzing the pilot data (which were not included in the actual data analysis), we slightly decreased the estimated effect size.
The pre-registered analyses focused on two Bayesian independent-samples t-tests (two-sided) comparing the Waiting time and W between the Classical Libet and Veto Libet conditions. If the evidence was inconclusive (1/3 < BF10 < 3) in both t-tests, the sample size was increased by 10 per condition until either the evidence for at least one of the t-tests was conclusive, or the stopping cut-off of 150 participants per group was reached.
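The stopping rule described above can be sketched as a small decision function. This is only an illustration of the logic, not the procedure used for the actual data collection; the function name and its arguments are hypothetical, and the Bayes factors are assumed to have been computed elsewhere.

```python
def next_sample_size(bf10_values, n_per_group, step=10, max_n=150):
    """Return the per-group sample size to aim for next.

    bf10_values: BF10 of each pre-registered t-test (Waiting time, W).
    Sampling stops (the current n is returned) once at least one test is
    conclusive (BF10 >= 3 or BF10 <= 1/3) or the cut-off has been reached.
    """
    conclusive = any(bf >= 3 or bf <= 1 / 3 for bf in bf10_values)
    if conclusive or n_per_group >= max_n:
        return n_per_group
    return min(n_per_group + step, max_n)
```

For example, with hypothetical Bayes factors of 0.5 and 2.0 at 110 participants per group, the function returns 120; if either value were 3 or larger, it would return 110.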
All participants were recruited and reimbursed online on the Prolific platform (https://www.prolific.co/). Participants were filtered to have English as their first language. Participants' data were excluded from analysis if they were identified as not following the instructions or as being an outlier (as outlined in our pre-registration on AsPredicted; https://aspredicted.org/). The data were excluded when at least one of the following exclusion criteria was met: (1) participants who admitted that they had always or often pre-planned their movement, (2) participants with less than 20 % action/veto trials in the Whether task, (3) participants with an SD of keypress time lower than 100 ms, (4) participants always performing the keypress at the same or almost the same clock hand position (as measured by the variance of the clock hand position during the button press), and (5) outliers of the W and Waiting time, with values lower than the lower hinge − 1.5 × interquartile range (IQR) or higher than the upper hinge + 1.5 × IQR. An equal number of replacements were collected if participants were excluded.
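Criterion (5) can be illustrated as follows, assuming it refers to the usual Tukey fences (values more than 1.5 × IQR below the lower quartile or above the upper quartile). The function name is hypothetical, and the quartile definition used by Python's `statistics.quantiles` may differ slightly from the hinges computed by other software.

```python
import statistics


def tukey_outlier_mask(values, k=1.5):
    """Return a list of booleans marking values outside the Tukey fences."""
    # Quartiles with the "inclusive" method; other definitions of the hinges
    # can give slightly different fences for small samples.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Hypothetical participant-level mean Waiting times in ms
example = [100, 110, 120, 130, 140, 1000]
mask = tukey_outlier_mask(example)
```

In this made-up example only the last value falls outside the fences and would be flagged.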

Design
The experiment consisted of two tasks, the Classical Libet task (Fig. 3; Libet et al., 1983) and the Veto Libet task (Fig. 3; Brass & Haggard, 2007). Participants were randomly assigned to one of these two tasks.
In each trial of the Classical Libet task, a clock face appeared at the beginning of the trial. A clock hand then appeared and started rotating after a delay of 400 ms. The starting position of the rotating clock hand was random. The clock hand took 2560 ms to finish one rotation around the clock face. Participants were instructed to fixate on the rotating clock hand and wait for one complete rotation of the clock hand. Then, they had to decide when to press the "ENTER" key and execute the movement immediately after they decided to do so. After the keypress, the clock hand continued to rotate for a random interval between 500 and 800 ms (in steps of 20 ms) and disappeared afterward. A controllable clock hand then appeared, and the participants were asked to report when they experienced their intention to press the key by indicating the position of the rotating clock hand at that moment. The trial ended after the report. Participants were also instructed not to pre-plan the time of their keypress. There were 60 trials in the Classical Libet task.
Fig. 3. Paradigms of Experiment 1 procedures. In the Classical Libet task (top), participants pressed a key whenever they decided and reported the onset of their When Decision. In the Veto Libet task (bottom), participants decided when to press the key and whether to veto their action plan. Then, they reported the onset of their When Decision, regardless of whether they executed it or not.
The procedure of the Veto Libet trial was the same as that in the Classical Libet task, except that in the Veto Libet task participants were instructed to withhold their keypress intention after generating it in approximately 50 % of the trials. If they decided to withhold, they waited for three clock hand rotations without pressing any keys to complete the trial. If they decided not to withhold, they pressed the key. Regardless of the decision, participants were asked to report when they experienced their intention to act based on the procedure mentioned above. Participants chose freely which trials to withhold, but they were instructed to withhold their keypress in approximately 50 % of the trials and to pre-plan neither their keypress withholding decisions nor the time of their keypress. There were 120 trials in the Veto Libet task. The Veto Libet task comprised twice the number of trials in the Classical Libet task, since participants were instructed to act in approximately 50 % of the trials in the Veto Libet task, whereas in the Classical Libet task participants consistently pressed the key. Therefore, the number of trials was adjusted to ensure comparable numbers of action trials in the analysis between the two tasks.

Pre-registered analysis plan
Before the analyses, the W was obtained by comparing the clock hand position reported by the participant and the clock hand position during the keypress. The W was calculated based on the closest distance between the reported intention onset and the time of the keypress. For example, if the participant pressed at 1 o'clock and reported that they decided at 11 o'clock, the W would be −426.67 ms (since it takes 213.33 ms for the clock hand to travel from one "hour" to the next). On the other hand, if the participant pressed at 1 o'clock and reported that they decided at 3 o'clock, the W would be 426.67 ms. In the original analysis we removed outliers (defined as values more than 3 standard deviations from the mean) for each participant. However, all the results reported in this study remained unchanged regardless of this outlier removal procedure. The results reported below are those with outliers removed.
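The "closest distance" rule can be written as a signed circular difference. The following is a minimal sketch rather than the analysis script itself; the function name is hypothetical, positions are expressed in clock "hours" as in the example above, and a report exactly half a rotation away is arbitrarily treated as positive.

```python
ROTATION_MS = 2560.0  # duration of one full clock hand rotation


def w_from_clock(press_pos, report_pos, positions_per_rotation=12):
    """Signed W in ms; negative means the intention was reported before the keypress."""
    ms_per_position = ROTATION_MS / positions_per_rotation
    # Forward distance from keypress to report, in [0, positions_per_rotation)
    diff = (report_pos - press_pos) % positions_per_rotation
    # Take the shorter way around the clock face
    if diff > positions_per_rotation / 2:
        diff -= positions_per_rotation
    return diff * ms_per_position


w_before = w_from_clock(press_pos=1, report_pos=11)  # reported 2 "hours" earlier
w_after = w_from_clock(press_pos=1, report_pos=3)    # reported 2 "hours" later
```

With these inputs the function reproduces the two worked values in the text, about −426.67 ms and 426.67 ms.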
After the calculation of the W, two analyses were conducted. First, the total number of trials in the Classical Libet task (60 trials) was different from that in the Veto Libet task (120 trials). The level of fatigue of participants might therefore differ between the two groups. We addressed this issue by comparing the action trials in the first half of the Veto Libet task with those in the second half. The Waiting time and W of the two halves were compared with Bayesian paired-samples t-tests. If there was no substantial evidence supporting a difference between the action trials from the two halves, the trials would be pooled in the following analyses. If there was substantial evidence suggesting that they differed from each other, then only the action trials in the first half would be analyzed.
Second, we addressed the two main hypotheses in the current study. Based on the Slope Steepness Hypothesis under the ITB model, the slope of the signal accumulation in the Veto Libet task was predicted to be less steep than that in the Classical Libet task. Therefore, the Waiting time in the Veto Libet task was expected to be longer compared to the Classical Libet task. The less steep slope in the Veto Libet task would also widen the temporal gap between crossing the intention threshold and the action threshold, hence an earlier W was expected in the Veto Libet task compared to that in the Classical Libet task. To test these two hypotheses, Bayesian independent-samples t-tests were performed to compare the Waiting time and W between the two groups. The medium Cauchy prior (i.e., 0.707), which was also the default prior in Jamovi software (2022, version 2.3), was used to calculate the BF10. Although the default prior did not match the estimated effect size (i.e., d = 0.34), changing the prior to 0.34 did not affect any of the reported results in the current study. Therefore, only results obtained with the default prior were reported. A BF10 larger than 3 was considered substantial evidence supporting the alternative hypothesis, while a BF10 smaller than 0.3 was considered substantial evidence supporting the null hypothesis. For completeness, independent-samples t-tests with frequentist statistics were also performed. Since both analytic approaches led to the same conclusions, only the Bayesian results were reported.
Another hypothesis concerned the relationship between Waiting time and W. A correlation between Waiting time and W in the Classical Libet task was reported in Schurger (2018). To test whether this finding could be replicated in an online setting as well as in the Veto Libet task, two analyses were conducted. First, we pooled the trials of all the participants together and calculated the correlation between Waiting time and W at the trial level, which was the method used by Schurger (2018). However, this analytical approach was suboptimal as it aggregated trials from different individuals, thereby neglecting individual variance. Consequently, we also computed correlations at the individual level. Specifically, we first calculated the correlation between Waiting time and W for each participant, then computed the mean of those individual correlations using the Fisher transformation. The mean of the transformed correlations was then inverse Fisher transformed, and the BF10 was estimated based on the sample size and the mean individual correlation. This approach took the individual variance into account and was therefore more appropriate. The default stretched beta prior width (i.e., 1) in Jamovi software (2022, version 2.3) was used to calculate the BF10. The Bayes factor for the second analysis was calculated with the R code by Verhagen & Wagenmakers (2014) with the default prior in Jamovi software.
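The averaging step of the individual-level analysis can be sketched as below. This is an illustration only: the function name is hypothetical, the per-participant correlations are assumed to have been computed beforehand, and the Bayes factor computation itself is not reproduced here.

```python
import math


def mean_correlation(individual_rs):
    """Average correlations via the Fisher z-transformation.

    Each r is mapped to z = atanh(r), the z values are averaged, and the
    mean is mapped back with tanh. Correlations of exactly +1 or -1 are
    not supported, because atanh is undefined there.
    """
    zs = [math.atanh(r) for r in individual_rs]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical per-participant correlations between Waiting time and W
mean_r = mean_correlation([-0.20, -0.05, 0.10, -0.15])
```

Averaging in z space rather than averaging the raw coefficients reduces the bias that arises because the sampling distribution of r is skewed when the correlation is far from zero.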

Exploratory analysis plan
Only action trials were analyzed, since the dependent measures (i.e., the W, Start-to-W time, and Waiting time) were only available in action trials. Two exploratory analyses were conducted. First, we wanted to index the time it takes participants to form an intention (i.e., the Start-to-W time). Since the clock hand position reported by the participant could be in any rotation, the reported clock hand position by itself could not index the Start-to-W time. To obtain the Start-to-W time, we summed the Waiting time and W. A sum was used because the W reflected how much earlier the intention occurred before the keypress (i.e., the Start-to-W time minus the Waiting time). This approach also allows the Start-to-W time to exceed the Waiting time if participants reported their intention to be later than their keypress. Hence, this method was better than subtracting the absolute value of W from the Waiting time. The Start-to-W time was then compared between the Veto Libet task and the Classical Libet task using Bayesian independent-samples t-tests. According to the Slope Steepness Hypothesis, participants in the Veto Libet task were predicted to take more time to cross the intention threshold and generate their intention than those in the Classical Libet task.
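As a worked illustration of this sum, with made-up trial values and a hypothetical function name:

```python
def start_to_w(waiting_time_ms, w_ms):
    """Start-to-W time as the sum of Waiting time and the signed W (both in ms)."""
    return waiting_time_ms + w_ms


# Intention reported 150 ms before a keypress made 4000 ms into the trial
early_report = start_to_w(4000, -150)
# Intention reported 100 ms after the same keypress
late_report = start_to_w(4000, 100)
```

Here the first trial gives a Start-to-W time of 3850 ms and the second gives 4100 ms. Subtracting the absolute value of W would instead give 3900 ms for the second trial, placing the reported intention before the keypress.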
The second exploratory analysis concerned the effect of trial history on the Waiting time and W in the Veto Libet task. In the ITB, an initial bias is set to reflect the agent's preference for a choice (i.e., a preference of whether to veto or not). We hypothesized that this initial bias might also be affected by the choice in the previous trial. In the literature on random choice generation, it has been widely noted that participants have a tendency to avoid repetition (e.g., Jahanshahi et al., 1998; Rabinowitz, 1970). Given the spontaneous nature of the Veto Libet task, we hypothesized that a similar random choice generation pattern would occur if participants were instructed to veto their intention sometimes. That is, participants would be inclined to veto if they had acted in the previous trial. In this case, extra time was needed for the decision signal to go against the initial bias towards vetoing their action and commit to executing the action instead. Thus, a longer Waiting time in the current action trial was predicted if the previous trial was also an action trial. To verify this prediction, trials were separated based on whether participants acted or not in the preceding trial. If the participant acted in the preceding trial, the trial was categorized as an Action-Action trial. If the participant did not act in the preceding trial, the trial was categorized as a Veto-Action trial. Bayesian paired-samples t-tests were conducted to compare the Waiting time and W between the Action-Action trials and the Veto-Action trials.
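The trial categorization can be sketched as follows. The function name and the input format (one boolean per trial, in order) are hypothetical, and the sketch assumes that the first trial of a session, which has no predecessor, is left uncategorized.

```python
def label_action_trials(acted):
    """Label each action trial by the choice made on the preceding trial.

    acted: list of booleans in trial order (True = keypress, False = veto).
    Returns a dict mapping trial index to "Action-Action" or "Veto-Action".
    Veto trials and the first trial are not labeled.
    """
    labels = {}
    for i in range(1, len(acted)):
        if acted[i]:
            labels[i] = "Action-Action" if acted[i - 1] else "Veto-Action"
    return labels


# Hypothetical sequence: act, act, veto, act, veto
labels = label_action_trials([True, True, False, True, False])
```

For this sequence, trial 1 is labeled Action-Action and trial 3 Veto-Action; the Waiting time and W would then be averaged within each label for every participant before the paired comparison.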
In order to explore the robustness of the findings in our pre-registered analyses, we also performed Bayesian sequential analyses to examine how the BF10 developed with increasing sample size. We conducted a sequential analysis starting from half of the whole sample (i.e., 55 participants per group; 110 in total), and plotted how the BF10 changed as the sample size increased up to our full sample size (i.e., 110 participants per group; 220 in total). The sequential analyses were done with Jamovi software (2022, version 2.3).

Pre-registered analysis
In the Veto Libet task, the mean proportion of veto was 44 % (range: 24.4 % to 79.2 %). This vetoing proportion was similar to that of Brass & Haggard (45.5 %; 2007) but different from that of Walsh et al. (49.7 %; 2010). The descriptive statistics of Waiting time and W are summarized in Table 1. First, we compared the Waiting time and W in the first vs. second half of the Veto Libet task. A Bayesian paired-samples t-test moderately supported the null hypothesis in comparing the Waiting time between the two halves (first half: 4191 ms vs. second half: 4165 ms, BF10 = 0.13). Similarly, a Bayesian paired-samples t-test moderately supported the null hypothesis in comparing the W between the two halves (first half: −148 ms vs. second half: −162 ms, BF10 = 0.22). Therefore, the trials in the whole Veto Libet task were pooled in the following analyses.
Second, Bayesian independent-samples t-tests were performed to compare the Waiting time and W between the Classical Libet and the Veto Libet tasks to test the Slope Steepness Hypothesis. There was decisive evidence (BF10 > 100; d = 0.74) that the Waiting time in the Veto Libet task was longer than that in the Classical Libet task. In contrast to the Waiting time results, the W results were less clear. Although the pattern of data in the W measure was consistent with the predicted direction (i.e., earlier W in the Veto Libet task), the evidence was inconclusive, supporting neither the alternative hypothesis nor the null hypothesis (BF10 = 0.44; d = 0.21). Therefore, the Slope Steepness Hypothesis was only supported by the Waiting time difference, but not the W difference.
Correlations between Waiting time and W were computed separately for each task. The trial-level Pearson's correlation between the Waiting time and W in the Classical Libet task was −0.06. Despite the small coefficient, the Bayesian analysis suggested decisive evidence supporting the existence of a correlation (BF10 > 100). The trial-level correlation between the Waiting time and W in the Veto Libet task was −0.004. The Bayesian analysis suggested very strong evidence supporting the null hypothesis (BF10 = 0.02). Taken together, the correlation between Waiting time and W in the Classical Libet task was replicated in the online setting, while no such correlation was observed in the Veto Libet task.
When averaging the individual correlations, moderate evidence supporting the null hypothesis was observed in both the Classical Libet task (mean r = −0.08; BF10 = 0.16) and the Veto Libet task (mean r = −0.03; BF10 = 0.13). In the Classical Libet task, both analyses showed similar correlation coefficients (−0.06 at the trial level vs. −0.08 at the individual level).

Exploratory analysis
First, we compared the Start-to-W time between the Classical Libet task and the Veto Libet task. There was extreme evidence (BF10 > 100; d = 0.68) that the Start-to-W time in the Veto Libet task was longer compared to the Classical Libet task. This result further supported the Slope Steepness Hypothesis by showing that participants indeed needed more time to generate their When Decision in the Veto Libet task.
Second, we explored whether the choice in the preceding trial would affect the Waiting time and the W of the following trial in the Veto Libet task. When comparing the Action-Action trials with the Veto-Action trials, moderate evidence supporting the null hypothesis was observed in the comparison of the Waiting time (BF10 = 0.22), as well as the W (BF10 = 0.15). Hence, the preceding Whether Decision did not affect the following When Decision.
Finally, we conducted Bayesian sequential analyses starting from half of the whole sample. Regarding the Waiting time (Fig. 4a), the BF10 fluctuated from around 10 to 100 as the sample size increased from 110 to 130. Once the sample size increased to 150 (i.e., 75 per group), the BF10 was stably higher than 100, ranging from 100 to more than 100,000. This suggested that the extreme evidence in the t-test for the Waiting time was indeed highly stable.
Regarding the W time, sequential analysis (Fig. 4b.) showed that the BF 10 fluctuated between around 0.2 and 1 with the increase of the samples.During the sample size between 120 and 180, the BF 10 stayed below 1/3.This moderate evidence supporting the null hypothesis disappeared once the sample size exceeded 180.This suggested that the data in Experiment 1 demonstrated moderate evidence supporting the null hypothesis, even though the evidence was not very stable.
Regarding the Start-to-W time (Fig. 4c.), the BF 10 fluctuated from around 10 to 100 as the sample size increased from 110 to 130. Once the sample size increased to 160 (i.e., 80 per group), the BF 10 were stably higher than 100, ranged from 100 to higher than 100000.Again, this suggested that the extreme evidence in the t-test for the Start-to-W time was indeed extremely stable.
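The logic of such a sequential analysis can be sketched in a few lines. Note the assumptions: the Bayes factor below is the BIC approximation for a two-group mean difference, which is swapped in here because it needs no numerical integration; it is not the default-prior Bayesian t-test reported in this paper, so its values will differ from those in Fig. 4. The function names and the simulated data are hypothetical.

```python
import math
import random
from statistics import mean, variance

def t_independent(a, b):
    """Student's t (pooled variance) and degrees of freedom for two groups."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df

def bf10_bic(t, df, n_total):
    """BIC approximation to BF10 (ASSUMPTION: not the default-prior test).

    log BF10 = (N / 2) * ln(1 + t^2 / df) - 0.5 * ln(N)
    """
    log_bf10 = (n_total / 2) * math.log1p(t * t / df) - 0.5 * math.log(n_total)
    return math.exp(min(log_bf10, 700.0))  # cap to avoid float overflow

def sequential_bf(group_a, group_b, start_per_group):
    """Recompute BF10 each time one more participant per group is added."""
    results = []
    for k in range(start_per_group, min(len(group_a), len(group_b)) + 1):
        t, df = t_independent(group_a[:k], group_b[:k])
        results.append((2 * k, bf10_bic(t, df, 2 * k)))
    return results

# Hypothetical Waiting times (ms) for two groups of 100 participants.
random.seed(2)
classical = [random.gauss(3600, 900) for _ in range(100)]
veto = [random.gauss(4100, 900) for _ in range(100)]
for n_total, bf in sequential_bf(veto, classical, start_per_group=50)[::10]:
    print(n_total, round(bf, 2))
```

Plotting the Bayes factor against the total sample size gives the kind of trajectory that is inspected for stability: evidence is considered stable when the curve stays on one side of the chosen threshold as further participants are added.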

Interim discussion
Both the pre-registered and exploratory analyses partly supported the Slope Steepness Hypothesis by showing longer Waiting time and Start-to-W time in the Veto Libet task compared to the Classical Libet task. However, while W was affected in the predicted direction, this effect was not reliable. Furthermore, the trial-level correlation observed between the Waiting time and the W in Schurger (2018) was also replicated in an online sample. The effect size, nevertheless, was smaller than in the previous study (Pearson's r = −0.116 in Schurger's paper and r = −0.06 in the current study).
Although this study provided preliminary support for the Slope Steepness Hypothesis, there are potential confounds in the current design. Specifically, participants were instructed to explicitly balance their choice of whether to veto their decision in the Veto Libet task, while in the Classical Libet task participants did not have to keep track of their choice to execute the keypress (as they had to press the key in every trial). Therefore, the cognitive demands of the two tasks were different. This difference in cognitive demand might also affect the accuracy of the temporal judgment in the report of the W. One possible consequence would be a substantial increase in the standard error (S.E.) of the W, which was not observed in the data. Alternatively, the cognitive demand might systematically shift the report of the W to be shorter than the actual experience. This systematic shortening of the W might cancel out the delay of W induced by the veto instruction, so that the expected effect in W was not observed. Furthermore, although the participants were instructed not to pre-plan their decision of whether to execute the action, it was still possible that they pre-planned their decision in the Veto Libet task instead of deciding whether to veto their decision at the last moment before movement execution. To address these issues, in Experiment 2 we implemented a control task, the Pre-plan Libet task (Fig. 5), in which we specifically asked the participants to pre-plan their choice of whether to execute the keypress at the beginning of the trial, before generating the When Decision. We also slightly modified the Veto Libet task to match the task structure (Fig. 5). By comparing the Pre-plan Libet task and the Modified Veto Libet task, the difference in cognitive demand was addressed, as participants had the same demand to monitor their choice, so that any behavioral effect observed in this comparison could not be due to a difference in cognitive demand. Furthermore, if the behavioral effects observed in Experiment 1 were due to pre-planning in the Veto Libet task, Bayesian analyses on the behavioral measures (i.e., Waiting time, W, and the Start-to-W time) between the Modified Veto Libet task and the Pre-plan Libet task would support the null hypothesis. On the other hand, if the behavioral effects observed in Experiment 1 could still be observed in Experiment 2, then they could not be due to pre-planning in the Veto Libet task.
Experiment 2 therefore addressed the alternative explanations based on pre-planning and cognitive demand.

Participants
The current experiment was modified based on a previous version of the control experiment (see supplementary materials for details) and pre-registered in AsPredicted (https://aspredicted.org/79fn8.pdf). All data, analysis scripts, and jsPsych programs are available on the Open Science Framework (OSF; https://osf.io/sk7v4/?view_only=c76c0b6d1fbc4c26a5bd1b6587c21411). The Bayesian sequential sampling plan was the same as in Experiment 1, except that the pre-registered analyses focused on the two Bayesian independent-samples t-tests (two-sided) comparing the Waiting time and W between the Pre-plan Libet and Modified Veto Libet conditions. The exclusion criteria were also the same.

Design
Experiment 2 consisted of two tasks, the Pre-plan Libet task (Fig. 5) and the Modified Veto Libet task (Fig. 5). Participants were randomly assigned to one of these two tasks.
There were 120 trials in both tasks. In the Pre-plan Libet task, participants decided whether to act in the next Libet trial before the trial started. Specifically, they read an instruction screen asking them to decide whether they wanted to act in the next Libet trial. If they wanted to act, they pressed the Y key on the keyboard to proceed to the next trial. If they did not want to act, they pressed the N key to proceed to the next trial.
The Modified Veto Libet task was the same as the Veto Libet task, except that participants read an instruction screen asking them to press either the Y or the N key to proceed to the next Libet trial. Participants were told not to base their subsequent decision about whether to withhold the action on their choice in this instruction screen. The rest of the trial was the same as that in the Veto Libet task in the experiments above.

Procedures
The procedures of Experiment 2 were the same as in Experiment 1, except for the practice blocks. There were four practice blocks in both the Pre-plan Libet task and the Modified Veto Libet task. The first two practice blocks were the general practice blocks from Experiment 1, while in the third and the fourth practice blocks, participants followed the procedure of either the Pre-plan Libet trial or the Modified Veto Libet trial.

Pre-registered analysis plan
Analyses in Experiment 2 were the same as Experiment 1.

Exploratory analysis plan
The exploratory analyses of the Start-to-W time in Experiment 1 were also performed in Experiment 2. Apart from that, one could raise the concern that in the Modified Veto Libet task participants might also pre-plan their choice of whether to act based on the key they pressed at the beginning. If they did so, the Modified Veto Libet task would be the same as the Pre-plan Libet task. To address this, we fitted a logistic regression for each participant, using the participant's Y/N answers before the trial began to predict their subsequent choices. If a participant pre-planned, the logistic regression should be significant. We then excluded participants whose Y/N answers before the trial began significantly predicted their subsequent choice and compared the resulting Waiting time and W with those in the full sample. Although 17 participants (out of 110) were excluded based on the regression, the results of the subsample were highly similar to those of the full sample (Waiting time in the full sample: 3967 vs Waiting time in the subsample: 3939; W in the full sample: −191 vs W in the subsample: −199). Since the results of the analyses did not change, data from the full sample were reported in the Results section.
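The per-participant check can be illustrated with a short sketch. With a single binary predictor, the maximum-likelihood slope of a logistic regression equals the log odds ratio of the 2 × 2 table of pre-trial answer by subsequent choice, so no iterative fitting is needed here. The function name, the input format, and the 0.5 added to each cell (which avoids infinite estimates when a cell is empty and makes the slope deviate slightly from the exact maximum-likelihood value) are assumptions of this example, not details taken from the analysis scripts.

```python
import math
from statistics import NormalDist

def preplan_test(pre_choices, final_choices):
    """Wald test of the logistic slope for one participant.

    pre_choices / final_choices: equal-length lists of booleans
    (True = pressed Y before the trial / acted in the trial).
    ASSUMPTION: 0.5 is added to every cell (Haldane-Anscombe correction)
    so that empty cells do not produce infinite estimates.
    """
    cells = {(p, f): 0.5 for p in (True, False) for f in (True, False)}
    for p, f in zip(pre_choices, final_choices):
        cells[(bool(p), bool(f))] += 1
    a, b = cells[(True, True)], cells[(True, False)]
    c, d = cells[(False, True)], cells[(False, False)]
    slope = math.log((a * d) / (b * c))          # log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Wald standard error
    z = slope / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return slope, p_value

# Hypothetical participant whose choices always follow the Y/N answer.
pre = [True, False] * 60
slope, p_value = preplan_test(pre, pre)
print(round(slope, 2), p_value < 0.05)
```

A participant flagged by such a test would be a candidate for exclusion in the subsample analysis described above.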

Pre-registered analysis
In the Pre-plan Libet task, the mean proportion of not acting was 48.7 % (range: 22.9 % to 60 %), while in the Modified Veto Libet task the mean proportion of veto was 46.6 % (range: 25.6 % to 61.2 %). Overall, the proportions of not acting/veto in both tasks were similar to those reported by previous studies (Brass & Haggard, 2007) and the previous two experiments. As in Experiment 1, only action trials were analyzed. The descriptive statistics of Waiting time and W are summarized in Table 1. Again, Bayesian independent-samples t-tests were performed to compare the Waiting time and W between the Pre-plan Libet task and the Modified Veto Libet task to test the Slope Steepness Hypothesis. There was strong evidence (BF 10 = 11.6; d = 0.41) that the Waiting time in the Modified Veto Libet task was longer than that in the Pre-plan Libet task. Furthermore, extreme evidence (BF 10 > 100; d = 0.66) was observed that the W occurred earlier in the Modified Veto Libet task than in the Pre-plan Libet task. These results provided clear evidence to support the Slope Steepness Hypothesis.
Correlations between Waiting time and W were also conducted. The trial-level Pearson's correlations between the Waiting time and W in the two tasks were in opposite directions. Specifically, there was extreme evidence supporting the existence of a positive correlation in the Pre-plan Libet task (r = 0.06; BF 10 > 100), while the evidence moderately supported the existence of a negative correlation in the Modified Veto Libet task (r = −0.04; BF 10 = 5.26). When the individual correlations were averaged, moderate evidence supporting the null hypothesis was observed in both the Pre-plan Libet task (mean r = −0.02; BF 10 = 0.12) and the Modified Veto Libet task (mean r = 0.007; BF 10 = 0.13). The difference between the two correlation analysis approaches, again, was due to the difference in the number of observations (>7000 vs 110).

Exploratory analysis
We compared the Start-to-W time between the Pre-plan Libet task and the Modified Veto Libet task. The evidence in this comparison supported neither the alternative hypothesis nor the null hypothesis (BF 10 = 1.12). We also compared the proportion of not acting/vetoing between the Pre-plan Libet task and the Modified Veto Libet task in the current experiment. There was strong evidence supporting the alternative hypothesis (BF 10 = 13.5; d = 0.42). One reasonable explanation for this difference is that vetoing one's own decision after formulating an intention is more cognitively demanding than pre-planning not to move. Hence, participants in the Modified Veto Libet task were more inclined to act than participants who decided at the beginning of the trial whether to inhibit their action (i.e., in the Pre-plan Libet task). This further validated the difference in psychological process between pre-planning not to act and vetoing.
We also conducted Bayesian sequential analyses starting from half of the whole sample. Regarding the Waiting time (Fig. 6a), the BF 10 fluctuated from around 0.5 to 10 as the sample size increased from 110 to around 180. Once the sample size increased to 180 (i.e., 90 per group), the BF 10 was stably higher than 10, ranging from 10 to above 30. This suggested that the strong evidence in the t-test for the Waiting time was stable once the sample size per group exceeded 90 (i.e., 180 in total).
Regarding the W, the sequential analysis (Fig. 6b) showed that the BF 10 stably exceeded 100 once the total sample increased to 150 (i.e., 75 per group). This suggested that the extreme evidence in the t-test for the W was indeed stable.
Regarding the Start-to-W time, the sequential analysis (Fig. 6c) showed that the BF 10 fluctuated between 1/3 and 3 across the whole sample. This again indicated that the evidence supported neither the alternative hypothesis nor the null hypothesis in the experiment.

Interim discussion
Experiment 2 examined the Slope Steepness Hypothesis while addressing the difference in cognitive demand between the Classical Libet task and the Veto Libet task. Both the pre-registered and exploratory analyses provided strong support for the Slope Steepness Hypothesis by showing a longer Waiting time and an earlier W in the Modified Veto Libet task than in the Pre-plan Libet task. However, the trial-level correlation observed between the Waiting time and the W in Schurger (2018) could not be replicated in the Pre-plan Libet task. Specifically, Schurger (2018) observed a negative correlation between the Waiting time and the W, while a positive correlation was observed in the current experiment.

Exploratory combined study
The effects on the Start-to-W time and the W were found only in Experiment 1 and Experiment 2, respectively. To increase the statistical power, a combined analysis was conducted with the data from Experiment 1 and Experiment 2 to test whether the differences in the W and the Start-to-W time could be observed across the two studies. There was extreme evidence supporting the differences in the Waiting time, the W, and the Start-to-W time (all BF 10 > 100).
We also conducted Bayesian sequential analyses starting from half of the whole sample. The BF 10 for both the Waiting time and the W exceeded 100 once the total number of participants increased to 300 (i.e., 150 participants per group; see Fig. 7a and 7b for details). Regarding the Start-to-W time, the BF 10 became stably higher than 100 once the sample size exceeded 360 (i.e., 180 participants per group; see Fig. 7c for details). Overall, these results suggested that the evidence in the t-tests for all the behavioral measures was stable in the combined study.

General discussion
Libet et al. (1983) observed that the onset of the RP preceded the subjective experience of will by a few hundred milliseconds. Based on this, Libet et al. (1983) interpreted the RP as the neural marker reflecting the decision and preparation of an agent's action preceding the agent's conscious decision to act. Libet's finding and interpretation have been considered as evidence against the existence of conscious free will by free will skeptics (e.g., Ebert & Wegner, 2011). This interpretation of the RP in Libet-style tasks has been called into question by researchers arguing that the readiness potential is not the consequence of an unconscious decision but rather reflects the decision process itself (Brass et al., 2019; Schurger et al., 2012; Schurger, 2018). According to ITB models of intentional action, the RP reflects the accumulation of a decision signal that only leads to a decision when a first intention threshold is hit. In this case, the moment that the accumulated neural signal reaches the intention threshold coincides with the moment when the conscious intention is formed. No "neural decision" has been made before that, and therefore the precedence of the RP over the agent's conscious intention is no longer a threat to free will under this interpretation.
While some correlational support for the ITB model has been found in a previous study (Schurger, 2018), and supporting evidence has been reported based on neural data (Trovò et al., 2021), experimental examination of the ITB model of Libet-style experiments remained scarce. The aim of the current study was to provide further experimental evidence for the ITB model of intentional action by manipulating the decision-making process through instructions. We developed the Slope Steepness Hypothesis (Fig. 2a), which hypothesized that in the Veto Libet task participants would decrease the steepness of the signal accumulation, such that they would have enough time to decide whether to veto their decision to act or not. Based on this, we predicted that the Waiting time would be longer and the W would be earlier in the Veto Libet task than in the Classical Libet task. Furthermore, we hypothesized that the difference in the Waiting time would be larger than the difference in the W, such that the Start-to-W time (calculated by adding the W, which is negative relative to the keypress, to the Waiting time) would be longer in the Veto Libet task than in the Classical Libet task. Overall, the data of Experiment 1 supported the Slope Steepness Hypothesis, since the Waiting time and the Start-to-W time were longer in the Veto Libet task than in the Classical Libet task. However, the evidence did not support a difference in the W between the two conditions in Experiment 1.
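The predictions above can be made concrete with a deliberately simplified, noise-free version of the model. The sketch assumes a signal that rises linearly from a starting level and crosses first an intention threshold and then an action threshold; the class name, threshold values, slopes, and starting levels are hypothetical and chosen only for illustration, and the stochastic component of the accumulator is ignored.

```python
from dataclasses import dataclass

@dataclass
class LinearITB:
    """Noise-free linear accumulator with two thresholds (values hypothetical)."""
    slope: float                       # signal units per ms
    start: float = 0.0                 # initial bias at trial onset
    intention_threshold: float = 0.96  # crossing marks the felt intention
    action_threshold: float = 1.0      # crossing triggers the keypress

    def start_to_w(self):
        """Time (ms) from trial onset to crossing the intention threshold."""
        return (self.intention_threshold - self.start) / self.slope

    def waiting_time(self):
        """Time (ms) from trial onset to crossing the action threshold."""
        return (self.action_threshold - self.start) / self.slope

    def w(self):
        """W relative to the keypress (negative = before the action)."""
        return self.start_to_w() - self.waiting_time()

classical = LinearITB(slope=1 / 3600)
veto_slope = LinearITB(slope=1 / 4200)             # shallower slope only
veto_bias = LinearITB(slope=1 / 3600, start=-0.1)  # lower starting point only

for name, m in [("classical", classical), ("slope", veto_slope), ("bias", veto_bias)]:
    print(name, round(m.waiting_time()), round(m.w()), round(m.start_to_w()))
```

In this sketch a shallower slope lengthens the Waiting time, moves the W earlier, and lengthens the Start-to-W time, whereas a change in the starting level alone lengthens the Waiting time but leaves the W unchanged, which is the contrast between Fig. 2a and Fig. 2b.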
Although Experiment 1 provided support for the Slope Steepness Hypothesis, the levels of cognitive demand between conditions were not well controlled, which might also have contributed to the differences observed in the Waiting time and the Start-to-W time. Moreover, in the Veto Libet task participants might pre-plan their decision about movement execution at the beginning of the trial rather than making it after formulating the When Decision. The concern about pre-planning was also mentioned but not addressed in previous studies using the Veto Libet task (e.g., Brass & Haggard, 2007). To address these issues, we compared participants' Waiting time, W, and Start-to-W time between the Pre-plan Libet task and the Modified Veto Libet task (Experiment 2). In the Pre-plan Libet task participants were asked to pre-plan before the trial began so that the start of the signal accumulation would not be delayed. We also slightly modified the Veto Libet task to make the trial structure more similar, such that participants had to press either the Y key or the N key to initiate the trial. Some readers might consider this a cue for participants to pre-plan in the Modified Veto Libet task. If this were the case, the behavioral findings in the Modified Veto Libet task should be the same as those in the Pre-plan Libet task (since in both cases the participants would have pre-planned their choices). However, the results suggested that this was not the case. In other words, the Y/N response appears to have been meaningless to the participants in the Modified Veto Libet task.
Experiment 2 successfully replicated the Waiting time difference in Experiment 1. Furthermore, an earlier W was observed in the Modified Veto Libet task than in the Pre-plan Libet task. The difference in the W provided strong evidence to support the Slope Steepness Hypothesis (Fig. 2a) instead of a mere difference in the starting signal (Fig. 2b). Although the exploratory comparison of the Start-to-W time was inconclusive, the difference in the Waiting time (i.e., 299 ms) was substantially larger than the difference in the W (i.e., 95.2 ms). This implies that the difference in the Waiting time cannot be totally explained by the difference in the W and further supports the Slope Steepness Hypothesis.
Apart from the Start-to-W time and the Waiting time, one interesting exploratory analysis revealed that the proportion of not-acting trials in the Pre-plan Libet task was higher than the proportion of veto trials in the Modified Veto Libet task. To our knowledge, there is limited literature comparing pre-planned inhibition and the last-moment veto of action. One possible speculation regarding this observation is that participants may find it more difficult to veto their intention than to decide not to act from the beginning. Regardless of the underlying reason, it again highlights that the decision-making processes behind the Veto Libet task and the Pre-plan Libet task are different. This also indicates that participants were unlikely to pre-plan their decisions in the Veto Libet task and strengthens the construct validity of the task.
Some may consider the plausibility of the Veto Libet task questionable. Kavka (1983) raised this question with his toxin-puzzle example. Briefly, in the toxin-puzzle example a billionaire makes an offer to a rational agent: if s/he intends to drink a toxin (which makes him/her ill for a day) at midnight, s/he will get a million dollars. The billionaire emphasizes that the agent does not have to really drink the toxin; s/he can change his/her mind after intending it and will still get the money. Based on this thought experiment Kavka asks a question: is it possible to form an intention that one knows one will cancel? While this thought experiment may be interesting on its own, it is substantially different from the Veto Libet task. In Kavka's toxin puzzle the agents pre-plan to cancel the intention before they generate it, and this is clearly a case of pre-planning. Since in Experiment 2 we have demonstrated that the behavioral results of the Veto Libet task were substantially different from those of the Pre-plan Libet task, we argue that participants did not engage in this kind of pre-planning process in the Veto Libet task. Instead, in the Veto Libet task participants were instructed to decide whether to veto or not only after generating the intention. Therefore, even though it may be difficult or impossible for an agent to generate an intention after deciding to cancel it, this does not pose a challenge to the Veto Libet task, which asks participants to generate an intention and then decide whether to cancel it.
The current data clearly support the Slope Steepness Hypothesis and the ITB model interpretation of the Libet-style experiment. Some readers may be interested in the relationship between the current results and the early decision model. However, this is not an issue we can settle based on the current results, as the early decision model is, in principle, compatible with the current findings. For instance, Bogler et al. (2023) argued that even if the participants have to accumulate decision signals and only experience their intention when the intention threshold is crossed, the nature of the signal can still be linear ballistic, such that the decision of when/whether to act is pre-decided much earlier than the experience of intention. The major aim of the current study is to test the specific shape of the decision signal (i.e., the Slope Steepness Hypothesis), and the design of the current study cannot determine whether the signal is linear ballistic, as suggested by Bogler et al. (2023), or stochastic, as suggested by Schurger et al. (2012). We acknowledge this as a limitation of the current study and invite future research to determine the nature of the signal. Since the debate about whether free will exists is related to the nature of the signal, we also remain neutral in the free will debate.
In the current study we compared the predictions produced by an ITB model proposing only a change in the signal accumulation slope (i.e., the Slope Steepness Hypothesis; Fig. 2a) with those of another ITB model proposing only a change in the initial bias (Fig. 2b). However, it is also possible that both the initial bias and the signal accumulation slope change simultaneously between the two conditions, which would call for a combined model. We acknowledge this possibility, and the current study offers no way to differentiate the combined model from the Slope Steepness Hypothesis model. We consider this a limitation of our study and invite future studies to address it.
While the Whether Decision has been studied using neuroimaging techniques such as EEG and fMRI (Brass & Haggard, 2007; Walsh et al., 2010), behavioral experiments on the Whether Decision are scarce, as it was thought to be difficult to find a behavioral correlate of the Whether Decision. The current study suggests that the Waiting time and the Start-to-W time can fill this gap, as they differed between the Classical Libet task and the Veto Libet task. A previous study (Walsh et al., 2010) also found a difference in W between the Classical Libet task and the Veto Libet task, which also seemed to be eligible as a behavioral correlate. The current study extended the scope of the behavioral correlates by suggesting the Waiting time and the Start-to-W time as other candidates. It has been shown that the W could be changed by a manipulation after the execution of the behavior (Lau et al., 2007), presumably because the manipulation affected the retrospective processes of the report. In contrast, the Waiting time is determined by the trial's onset and the behavior's initiation, both of which are objectively measured, eliminating the need for retrospective processes. This prevents post-behavior events from influencing the measurement. This objectivity is advantageous if researchers want to integrate the Veto Libet task with other manipulations, or if they do not want to use the Libet clock (e.g., the Kornhuber task; Kornhuber & Deecke, 1965). The Start-to-W time, on the other hand, is conceptually novel and interesting, as it can be used as a temporal marker to study the brain signals across different time windows before the conscious intention is formulated completely. Future neuroimaging research may use the Start-to-W time and the Waiting time to disentangle the time windows used in investigating the formation of intention and the execution of behavior.
The behavioral data in the current online study showed high consistency with the data collected in offline settings. The W estimate of the Classical Libet task in the current study (Experiment 1) was −126 ms, which was highly similar to the W estimate (−122 ms) reported in a recent meta-analysis (Braun et al., 2021). The Waiting time estimate, nevertheless, was not provided in the meta-analysis, so no such comparison could be conducted. The mean Waiting time in Schurger (2018) was much higher than that in the current study (7100 ms vs 3639 ms). However, this was mainly related to the difference in instruction, as we instructed the participants to act before the third rotation ended, while Schurger (2018) did not employ such a constraint. The W estimate in the Veto Libet task was −153 ms, which was also similar to that reported in the offline setting (−141 ms; Brass & Haggard, 2007). The trial-level correlation between W and Waiting time observed in Schurger (2018) was replicated in the online version of the Classical Libet task based on the method in the original paper, although the Pearson's correlation coefficient found in the current study (−0.06) was smaller than that in the original study (−0.12). All in all, the behavioral data collected in the current online experiments were highly consistent with the behavioral data collected offline by other studies.
The consistency of the findings suggests that the intention of voluntary action can be studied with an online setup. A recent meta-analysis (Braun et al., 2021) suggested that the average sample size of Libet-style experiments was approximately 19. This size is sufficient only for detecting effects greater than 0.68 in terms of Cohen's d with 0.8 statistical power, meaning that many of the studies were underpowered. Conducting the Libet-style experiment online addresses this issue by making large-scale experiments possible, since the cost of data collection can be reduced substantially. This should help improve the replicability of findings in the field.
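The order of magnitude of this minimum detectable effect can be checked with a rough calculation. The sketch below assumes a within-participant (one-sample or paired) design, a two-sided test at α = 0.05, and the normal approximation to the t-test; these are assumptions of the example, and an exact calculation based on the noncentral t distribution would yield a somewhat larger value for such a small sample.

```python
import math
from statistics import NormalDist

def min_detectable_d(n, alpha=0.05, power=0.8):
    """Approximate smallest Cohen's d detectable with n participants.

    ASSUMPTIONS: one-sample/paired design, two-sided test, and the
    normal approximation d = (z_{1 - alpha/2} + z_{power}) / sqrt(n).
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / math.sqrt(n)

print(round(min_detectable_d(19), 2))   # sample size typical of past studies
print(round(min_detectable_d(100), 2))  # a larger online sample
```

Because the detectable effect shrinks with the square root of the sample size, moving from about 19 to a few hundred participants makes considerably smaller effects detectable.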
In summary, the current experiments examined the decision-making process behind voluntary behavior in the context of Libet-style experiments. Our data overall supported the ITB model interpretation of the decision-making process behind the Libet-style experiment. Furthermore, the current study suggested that two underused measures, the Waiting time and the Start-to-W time, could be reliable behavioral markers in studying human intention. By using experimental manipulations (such as the instruction to veto) and deriving specific behavioral predictions based on them, behavioral experiments can already help us test different hypotheses about the decision-making process. Future research can apply these two measures to study various aspects of intention, such as the less studied Whether Decision. Finally, small sample sizes have been suggested to be a shared problem across many Libet-style experiments. The present experiments successfully reproduced the behavioral patterns of the Classical Libet task and the Veto Libet task within an online environment. By adapting the Libet-style experiment to an online platform, it becomes possible to conduct large-scale experiments, thereby enhancing statistical power and facilitating replication. This approach has the potential to significantly advance our understanding of human intention.
Fig. 2. a. (Top) The Slope Steepness Hypothesis. The signal in the Classical Libet task (red line) has a steeper accumulation slope, while the signal in the Veto Libet task (blue line) has a less steep slope. Based on this, it is predicted that (1) the Waiting time (which is determined by the time of crossing the Action Threshold) in the Veto Libet task would be longer than that in the Classical Libet task, and (2) the W in the Veto Libet task would be earlier than that in the Classical Libet task. b. (Bottom) Prediction based on an ITB model with only a difference in the initial bias and without a change in the signal accumulation slope. The W should not differ between the Classical Libet task and the Veto Libet task. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. Procedures of Experiment 2. (Top) In the Pre-plan Libet task, participants decided whether to act in the current trial before the trial started. If they decided to act, they pressed a key and reported the onset of their When Decision. If they decided not to act, they waited until the trial ended and reported the moment when the clock hand stopped. (Bottom) In the Modified Veto Libet task, the trial was structured to be similar to the Pre-plan Libet task.

Table 1
Descriptive statistics of Experiment 1 (top panel) and Experiment 2 (bottom panel). The mean (M) and standard error (S.E.) are shown for each dependent measure. All results are presented in milliseconds (ms).