Post-decisional sense of confidence shapes speed-accuracy tradeoff for subsequent choices

In the absence of external feedback about decision outcomes, agents need to adapt their decision policies based on their internal evaluation of their own performance (i.e., decision confidence). We hypothesized that agents use decision confidence to continuously update the tradeoff between the speed and the accuracy of their decisions: When confidence is low on one trial, the decision-maker commits to a choice only after having accumulated more evidence on the next trial, leading to slower but more accurate decisions. Bounded accumulation models of decision-making provide a formal framework for testing this idea. Such models conceptualize the decision process as the accumulation of noisy sensory evidence towards one of two decision bounds, the height of which sets the speed-accuracy tradeoff. Low confidence on one trial should then lead to an increase of the decision bound on the next. We tested this prediction by fitting a bounded accumulation model to behavioral data from three different perceptual tasks, which entailed a binary choice with subsequent confidence rating. Indeed, decision bounds depended on the reported confidence about the correctness of the previous choice. Decision bounds were particularly strongly increased after participants were relatively certain to have made an error. The increase in decision bound was predicted by one post-decisional EEG signal sensitive to confidence and error perception (the so-called error positivity) peaking over centro-parietal cortex, but not by another (earlier, mid-frontal) post-decisional EEG signal (the error-related negativity). We conclude that the brain uses a subset of its post-decisional confidence signals for the ongoing adjustment of decision policies.


Introduction
Every day humans have to make numerous choices. These range from small and trivial (which shirt to wear) to complex and important (which house to buy). Decision-making is particularly challenging in the face of uncertainty (i.e., when based on ambiguous or noisy information about the state of the world). Human decision-makers are remarkably good at estimating their own accuracy, commonly reporting higher confidence for correct than for incorrect choices. Decision confidence can be defined as the probability of a choice being correct, given the available evidence (Pouget, Drugowitsch, & Kepecs, 2016; Sanders, Hangya, & Kepecs, 2016; Urai, Braun, & Donner, 2017). In recent years, an increasing number of studies has investigated neural correlates of decision confidence (Fleming, Weil, Nagy, Dolan, & Rees, 2010; Kepecs, Uchida, Zariwala, & Mainen, 2008), including attempts to dissociate subjective reports of decision confidence from objective decision accuracy (Desender, Boldt, & Yeung, 2018; Odegaard et al., 2018; Zylberberg, Barttfeld, & Sigman, 2012). An important open question is whether and how the sense of decision confidence is used to regulate subsequent behavior (Meyniel, Sigman, & Mainen, 2015; Yeung & Summerfield, 2012). One key prediction from theoretical treatments of confidence is that, when information is sampled sequentially, confidence (the complement of uncertainty) can be used to regulate how much information should be sampled before committing to a choice (Meyniel et al., 2015).
Here, we tested this prediction within the context of bounded accumulation models of perceptual choice (see Figure 1 for illustration). Such models posit that, in order to make a choice, agents continuously integrate noisy sensory evidence over time (Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006; Gold & Shadlen, 2007; Ratcliff & McKoon, 2008; Usher & McClelland, 2001). The so-called drift diffusion model (DDM; Ratcliff & McKoon, 2008) is a widely used variant of these models. Here, the mean drift rate quantifies the efficiency of evidence accumulation. A large signal-to-noise ratio (SNR) of the sensory evidence yields a large drift rate and, consequently, high accuracy and rapid decisions (and conversely for low SNR). Once the accumulated evidence reaches a predefined decision bound, observers stop accumulating and commit to a choice. When evidence SNR is constant over trials, this sequential sampling process achieves an intended level of decision accuracy with the shortest decision time or, conversely, an intended decision time at the highest accuracy level (Gold & Shadlen, 2007; Moran, 2015). The separation of the two decision bounds controls response caution, because it sets the tradeoff between decision time and accuracy: the larger the bound separation, the more evidence is required before committing to a choice, increasing accuracy at the cost of longer decision times.
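To make the role of the bound concrete, the following minimal Python sketch simulates the DDM of Figure 1 with bounds at +a and -a, a starting point of 0, and unit diffusion noise. The noise scale, time step, and parameter values are illustrative assumptions, not the fitted values reported below. For this symmetric case, the probability of reaching the correct bound is 1/(1 + exp(-2va/σ²)) and the mean decision time is (a/v)·tanh(av/σ²), so raising a increases both accuracy and decision time.

```python
import numpy as np


def simulate_ddm(v, a, ter, n_trials=2000, dt=0.001, sigma=1.0, max_t=5.0, seed=0):
    """Euler-Maruyama simulation of a DDM with bounds at +a and -a, starting at 0.

    Returns choices (+1 = upper bound, correct when v > 0; -1 = lower bound;
    0 = no bound reached within max_t) and RTs in seconds (decision time + ter).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)
    choices = np.zeros(n_trials, dtype=int)
    rts = np.full(n_trials, np.nan)
    active = np.ones(n_trials, dtype=bool)
    for step in range(1, int(max_t / dt) + 1):
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break
        # Drift plus Gaussian noise, scaled so the variance grows as sigma**2 * t.
        x[idx] += v * dt + sigma * np.sqrt(dt) * rng.standard_normal(idx.size)
        up, lo = x[idx] >= a, x[idx] <= -a
        choices[idx[up]] = 1
        choices[idx[lo]] = -1
        done = idx[up | lo]
        rts[done] = step * dt + ter
        active[done] = False
    return choices, rts


def summarize(choices, rts):
    """Accuracy and mean RT over trials that reached a bound."""
    responded = choices != 0
    return np.mean(choices[responded] == 1), np.mean(rts[responded])


for a in (0.7, 1.2):
    acc, mean_rt = summarize(*simulate_ddm(v=1.0, a=a, ter=0.3))
    analytic_acc = 1.0 / (1.0 + np.exp(-2.0 * 1.0 * a))
    print(f"a={a}: accuracy={acc:.3f} (analytic {analytic_acc:.3f}), mean RT={mean_rt:.3f} s")
```

With the drift held fixed, the wider bound yields both higher accuracy and slower responses, which is the speed-accuracy tradeoff that the bound separation is assumed to control.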

Figure 1. Schematic of the drift diffusion model (DDM) with varying decision bounds (upper panels) and varying drift rates (lower panels). Noisy sensory evidence is accumulated over time until the decision variable reaches one of two bounds (a or -a), corresponding to left and right choices. The efficiency of information accumulation is given by v (mean drift rate). The time for sensory encoding and response execution is given by Ter. Increasing the separation between decision bounds increases the probability of being correct at the expense of prolonged reaction times. RT distributions (full lines) and error distributions (dotted lines) are depicted for different levels of decision bound and drift rate.
Several studies have shown that decision-makers can change their decision bounds as a function of external manipulations. For example, instructions to adhere to a liberal or conservative response strategy (Forstmann et al., 2008; Hanks, Kiani, & Shadlen, 2014; Palmer, Huk, & Shadlen, 2005), or environments that reward fast or accurate responses (Bogacz, Hu, Holmes, & Cohen, 2010), all change the bound separation. Such manipulations often rely on providing external feedback to the observer. In real-life decisions, however, the choice outcome is often unknown to the decision-maker because explicit feedback is delayed or withheld. We reasoned that, in the absence of performance feedback, decision-makers set their decision bounds depending on their internal decision confidence. This prediction is in line with accounts that conceptualize confidence as an internal evaluation signal in the absence of feedback (Meyniel et al., 2015; Yeung & Summerfield, 2012).
To preview our findings, decision confidence indeed influenced the decision bound separation on the subsequent trial. The bound separation was increased after participants were relatively certain to have made an error, compared to trials on which they had high confidence that they were correct. Such certainty about an error most likely arose from the accumulation of additional evidence from a sensory buffer after the choice in our task (Resulaj, Kiani, Wolpert, & Shadlen, 2009; Yeung & Summerfield, 2012). We therefore quantified the impact of two established post-decisional EEG signals on the subsequent decision bound: a centro-parietal positivity that is sensitive to confidence (Boldt & Yeung, 2015) and error perception (often called the error positivity, Pe; Nieuwenhuis, Ridderinkhof, Blom, Band, & Kok, 2001), and the fronto-central error-related negativity (ERN; Dehaene, Posner, & Tucker, 1994). We found that the Pe was linearly related to the subsequent-trial decision bound, whereas the ERN was not.

Post-decisional contribution to confidence judgments in motion discrimination
Twenty-eight human participants performed a task that has been widely used in computational and neurophysiological analyses of perceptual decision-making: discrimination of the net motion direction in dynamic random dot displays (Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006; Gold & Shadlen, 2007; Siegel, Engel, & Donner, 2011). We asked participants to decide, as quickly and accurately as possible, whether a subset of dots was moving coherently towards the left or right side of the screen. After their choice, either a 1s blank screen or 1s of continued motion (same coherence and same direction as the initial stimulus) was shown, so as to allow for post-decisional accumulation of extra evidence, either from the sensory buffer (blank condition; Resulaj, Kiani, Wolpert, & Shadlen, 2009) or from the external stimulus (continued-motion condition; Fleming, van der Putten, & Daw, 2018). After this additional second, participants indicated how confident they felt about having made the correct choice (see Figure 2A). Decision difficulty was manipulated by varying, from trial to trial, the proportion of coherently moving dots around psychophysical threshold (Materials and Methods). As expected, RTs on correct trials and choice accuracy scaled with coherence level (Figure 2B; RTs: F(4, 45.33) = 30.61, p < .001; error rates: χ²(4) = 1285.6, p < .001).
Correspondingly, drift rates estimated from fits of the drift diffusion model (see Materials and Methods) also increased monotonically with coherence level (Figure 2B; Friedman χ²(5) = 140, p < .001). In these model fits, decision bound separation was not allowed to vary as a function of coherence; its average estimate across participants was 2.09 (SD = .33).
Similarly, non-decision time was held constant across levels of coherence; its average was 0.38 s (SD = .09 s). The model fits closely captured the patterns seen in behavior (green crosses in Figure 2B), suggesting that the DDM describes the behavioral data from our task well.
Participants' confidence ratings exhibited a key signature of statistical decision confidence (Sanders et al., 2016; Urai et al., 2017): an opposite-sign relation between evidence strength and confidence for correct and incorrect choices (Figure 2F). The scaling of confidence judgments with coherence level (F(4,52.6) = 4.56, p = .003) depended on choice accuracy (F(4,6824.6) = 154.52, p < .001), with confidence increasing with coherence levels for correct trials (linear contrast: p < .001) and decreasing for error trials (linear contrast: p < .001). This pattern was highly similar in blocks with continued evidence following the choice and blocks in which choices were followed by a blank screen (see Figure S1). Correspondingly, confidence ratings were closely linked to choice accuracy (Figure 2C).
Confidence ratings were monotonically related to choice accuracy even on a trial-by-trial basis, and even after factoring out stimulus coherence (logistic regression of confidence on accuracy with coherence as a covariate: positive slopes for all observers, 23 ps < .025, five non-significant). Notably, confidence ratings were monotonically related to accuracy over a range extending from below maximum uncertainty (i.e., 50%) to about 100%: rating-predicted accuracy ranged from 23% (certain error) up to 94% (certain correct), both of which differed significantly from chance level performance, both ps < .001. RTs for the initial choice were also monotonically related to accuracy (b = -.03, t(27) = -9.41, p < .001); however, in sharp contrast to the confidence ratings, RT-predicted accuracy only spanned the range from about 60 to 90% correct (Figure 2E).
This pattern of results generalizes the signatures of decision confidence (as defined above) reported by previous analyses of reaction times or confidence reports (Sanders et al., 2016; Urai et al., 2017), by showing that confidence ratings can lawfully account for certainty about errors (i.e., accuracy levels below 50%) when such certainty is enabled by the experimental protocol through post-decisional evidence accumulation (see also Fleming et al., 2018). In line with this idea, when we eliminated the possibility of post-decisional evidence accumulation in a control condition, by having participants report their direction choice and level of confidence simultaneously, confidence linearly predicted accuracy only over the range from clearly above 50% (maximum uncertainty) up to close to 100% (Figure 2D). In sum, participants' confidence ratings exhibited hallmark signatures of statistical decision confidence, which in our task was computed from both pre- and post-decisional evidence.
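The trial-by-trial relation between confidence and accuracy can be sketched as a per-participant logistic regression in which confidence and coherence jointly predict choice accuracy, from which a rating-predicted accuracy follows for each confidence level. The minimal sketch below fits such a model by Newton-Raphson (iteratively reweighted least squares) in NumPy on synthetic data; the data-generating values, the centering of ratings at 3.5, and the fitting routine are assumptions for illustration, not the original analysis code.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)                      # IRLS weights
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * w[:, None])          # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Synthetic single-participant data (illustrative only, not from our experiments).
rng = np.random.default_rng(0)
n = 5000
confidence = rng.integers(1, 7, size=n)        # assumed coding: 1 = certainly wrong ... 6 = certainly correct
coherence = rng.choice([0.0, 0.05, 0.10, 0.20, 0.40], size=n)
true_beta = np.array([0.2, 0.8, 3.0])          # intercept, confidence slope, coherence slope
X = np.column_stack([np.ones(n), confidence - 3.5, coherence])
p_true = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
correct = (rng.random(n) < p_true).astype(float)

beta_hat = fit_logistic(X, correct)

# Rating-predicted accuracy at the mean coherence, one value per confidence level.
levels = np.arange(1, 7)
X_pred = np.column_stack([np.ones(6), levels - 3.5, np.full(6, coherence.mean())])
predicted_accuracy = 1.0 / (1.0 + np.exp(-(X_pred @ beta_hat)))
```

Because coherence enters as a covariate, a positive confidence slope indicates that ratings carry information about accuracy beyond stimulus strength; with these synthetic values, the predicted accuracy for the lowest rating falls below 50%.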

Decision confidence influences subsequent decision bound
We then used the DDM to quantify how participants' post-decisional sense of confidence, as reflected in their ratings, shaped the subsequent decision process, by fitting the model with trial-by-trial confidence ratings as a covariate (see Materials and Methods). To obtain reliable and robust estimates, we combined trials labeled as 'certainly correct' and 'probably correct' into a 'high confidence' bin, trials labeled as 'guess correct' and 'guess wrong' into a 'low confidence' bin, and trials labeled as 'probably wrong' and 'certainly wrong' into a 'perceived error' bin […] (Figure S3A). Finally, decision confidence had no effect on subsequent drift rate (Figure 3A).

Figure 3. The influence of decision confidence on subsequent decision bounds and drift rates (A). Modeling results remain qualitatively similar when modeling only correct (B) or only error trials (C). Same conventions as in Figure 2. Distributions show the group posteriors. Trials with high confidence were treated as reference (i.e., fixed to zero). Statistical significance is reflected in overlap between posterior distributions over parameter estimates (Materials and Methods). Note that, due to a lack of trials in one of the cells, results shown in C and D are based on 25 and 24 participants, respectively.
The analyses presented in Figure […] Further, the fits underlying Figure 3A were performed regardless of choice accuracy. Thus, for example, trials labeled as 'perceived errors' contained a mixture of correct and error trials. Part of the results could therefore reflect previously established effects of post-error slowing in decision-making (Purcell & Kiani, 2016). Going beyond these findings, we observed that the confidence rating-dependence of the decision bound holds even for variations in confidence ratings within correct trials and within error trials tested in isolation (Figure 3B-C). This indicates that the confidence-dependent modulation of decision bounds is specifically due to trial-by-trial variations in internal confidence signals, rather than to the objective accuracy of the choice.
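The confidence binning and reference coding described above can be expressed as lagged dummy covariates on the next-trial bound. The minimal Python sketch below assumes a numeric coding of ratings from 1 ('certainly wrong') to 6 ('certainly correct'), treats the first trial of each block as having no predecessor, and uses a linear link for the bound. The function names and coefficient values are hypothetical, and the sketch does not reproduce the hierarchical fitting procedure behind the reported results.

```python
import numpy as np

# Assumed numeric coding: 1 = 'certainly wrong' ... 6 = 'certainly correct'.
BIN_OF_RATING = {1: "error", 2: "error", 3: "low", 4: "low", 5: "high", 6: "high"}


def lagged_confidence_design(ratings, block_ids):
    """Previous-trial dummy covariates with 'high confidence' as reference.

    Returns two float arrays (prev_low, prev_err). Entry t is 1.0 if trial t-1
    fell in that bin, 0.0 otherwise, and NaN for the first trial of a block,
    which has no preceding trial.
    """
    ratings = np.asarray(ratings)
    block_ids = np.asarray(block_ids)
    bins = np.array([BIN_OF_RATING[int(r)] for r in ratings])
    prev_low = np.full(len(ratings), np.nan)
    prev_err = np.full(len(ratings), np.nan)
    same_block = block_ids[1:] == block_ids[:-1]
    prev_low[1:] = np.where(same_block, bins[:-1] == "low", np.nan)
    prev_err[1:] = np.where(same_block, bins[:-1] == "error", np.nan)
    return prev_low, prev_err


def next_trial_bound(a0, b_low, b_err, prev_low, prev_err):
    """Hypothetical linear model: bound on trial t given the bin of trial t-1."""
    return a0 + b_low * prev_low + b_err * prev_err


prev_low, prev_err = lagged_confidence_design([6, 3, 1, 5], [0, 0, 0, 1])
bounds = next_trial_bound(2.0, 0.1, 0.3, prev_low, prev_err)
```

Under this coding, the intercept a0 is the bound after a high-confidence trial, and b_low and b_err are the bound changes after low-confidence and perceived-error trials, matching the convention of treating high confidence as the reference.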

Confidence-dependent modulation of decision bound generalizes to other experimental paradigms
To establish the robustness of the above findings, we repeated the analyses for two further datasets using different perceptual choice tasks and experimental protocols. In Experiment 2, we reanalyzed the data of Boldt and Yeung (2015), in which sixteen participants performed a speeded decision task, deciding as quickly as possible which of two boxes contained more dots (see Figure 4A). Importantly, unlike in Experiment 1, only a single level of difficulty was used in this dataset, which allowed us to test whether the findings of Experiment 1 generalize to internal variations of confidence occurring at a fixed SNR of the evidence. Similar to Experiment 1, both RTs and confidence judgments predicted choice accuracy (see Figure S4). In Experiment 3, twenty-three participants performed a visual color categorization task, which required deciding as fast as possible whether the mean color of eight elements was red or blue (Figure 4A; de Gardelle & Summerfield, 2011). Task difficulty was manipulated by independently varying the distance of the color mean from the category bound and the standard deviation across the elements' colors. Both variables together determined the SNR of the sensory evidence (i.e., mean distance from the category boundary divided by the variance). As reported in the Supplementary Materials, all signatures seen in the RTs, accuracy, and confidence of Experiment 1 were also observed in this dataset (see Figure S6).
In both datasets, we found that subsequent decision bounds were modulated by decision confidence (see Figure 4B). In particular, when participants perceived to have

A post-decisional neural confidence marker predicts subsequent decision bound
The results from the previous sections indicated that confidence modulates the separation of the bounds for subsequent decisions. In all analyzed datasets, robust contributions to this bound modulation came from the post-decisional accumulation of evidence (from the sensory buffer or additional external information), specifically from trials in which this post-decisional accumulation rendered participants relatively certain that they had made an error. Assuming that participants did not intentionally pick the wrong option, certainty about an error could have arisen only after choice commitment. Indeed, post-decisional evidence accumulation has been proposed as a general mechanism for confidence ratings (Pleskac & Busemeyer, 2010). Two post-decisional components of the evoked potential are established EEG markers of internal confidence or error signals in humans: (i) the ERN, a fronto-central signal peaking around the time of the response; and (ii) the Pe, a centro-parietal signal that follows the ERN in time. The ERN is implicated in error processing (Yeung, Botvinick, & Cohen, 2004) and originates from mid-frontal cortex (Dehaene et al., 1994; Van Veen & Carter, 2002). The Pe was initially linked to error perception (hence its name; Nieuwenhuis et al., 2001) and more recently to post-decisional evidence accumulation (Murphy, Robertson, Harty, & O'Connell, 2015) as well as to fine-grained variations in decision confidence (Boldt & Yeung, 2015).
We here used the EEG data collected in Experiment 2 to test whether the confidence-dependent modulation of subsequent decision bounds was linked to one, or both, of these post-decisional neural signals. We reasoned that these neural data may provide a more veridical measure of the internal confidence signals governing behavior than the overt ratings provided by participants, which require additional transformations (and thus additional noise sources) and are likely biased by inter-individual differences in scale use and calibration. Furthermore, quantifying the unique contribution of both post-decisional signals to bound adjustment allowed us to test the specificity of their functional roles, an important issue given their distinct latencies and neuroanatomical sources.
First, in line with the original report of these data (Boldt & Yeung, 2015), both the Pe and the ERN were modulated by decision confidence (Figure 5A […]). We then fitted the DDM with both EEG markers as trial-by-trial covariates (i.e., ignoring decision confidence) to test whether either or both post-decisional EEG signals predicted the subsequent decision bound (see Figure 6A). […] Figure S8.

Figure 6. Subsequent decision bounds and drift rates (A) as a function of Pe and ERN. B. Pe-dependent variations in subsequent decision computation. C-D. Modeling results remain qualitatively similar when modeling only correct (C) or only error trials (D).
The bin with the lowest Pe amplitude was always treated as reference category (i.e., fixed to zero). Data were fit using the regression approach, so values reflect coefficients.

The error positivity (Pe) linearly scales with subsequent decision bound
The previous section showed that the Pe was related to subsequent decision bound and drift rate, but that analysis could not reveal potential nonlinearities in this relationship.
We therefore divided the Pe into five equal-sized bins based on its amplitude, separately for each participant. In the previous section, both components were entered into a single model, so the regression coefficients of that fit captured the unique variance of each signal. Here, we specifically followed up on the Pe; to again capture its unique variance (the Pe shared variance with the ERN; mean r = .180, t(15) = 6.92, p < .001), bins were created after the ERN was regressed out of the Pe, separately for each participant. As can be seen in the inset of Figure 5A, this did not affect the scaling of the Pe by confidence. Note that the results described below remain largely unchanged when bins are created based on raw Pe amplitude (Figure S9). Figure […] Thus, the Pe qualifies as a neural marker of decision confidence that predicts flexible, trial-to-trial adaptation of the decision bounds.
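The residualize-then-bin step can be sketched per participant as follows. This is a minimal sketch, assuming ordinary least squares with an intercept for regressing the ERN out of the Pe and rank-based quintiles for the equal-sized bins; the function names are hypothetical and the original preprocessing may differ in detail.

```python
import numpy as np


def residualize(y, x):
    """Residuals of y after OLS regression on x (with intercept)."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def equal_count_bins(values, n_bins=5):
    """Assign bin indices 0..n_bins-1 with (near-)equal trial counts by rank."""
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return (ranks * n_bins) // len(values)


def pe_bins_per_participant(pe, ern, subject, n_bins=5):
    """Bin the ERN-residualized Pe separately within each participant."""
    bins = np.empty(len(pe), dtype=int)
    for s in np.unique(subject):
        m = subject == s
        bins[m] = equal_count_bins(residualize(pe[m], ern[m]), n_bins)
    return bins
```

Because OLS residuals are orthogonal to the regressors, the binned Pe is uncorrelated with the ERN within each participant, which is what allows the bins to index the unique variance of the Pe.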

Discussion
Accumulation-to-bound models of decision-making assume that choices are formed once the integration of noisy evidence reaches a bound. This decision bound is commonly assumed to be fixed within a block of constant external task conditions (Ratcliff & McKoon, 2008). Here, we show that this decision bound in fact changes dynamically from trial to trial, depending on the confidence about the previous decision: in three independent datasets, the separation between decision bounds increased after participants sensed they had made an error. A post-decisional brain signal, the so-called Pe component of the ERP, scaled with decision confidence and linearly predicted the decision bound on the subsequent trial. These findings indicate that, in the absence of external feedback about choice outcome, decision-makers use internal, post-decisional confidence signals to continuously update their decision policies.

Decision confidence modulates subsequent decision bound
Choice behavior exhibits substantial intrinsic variability (for review, see Wyart & Koechlin, 2016). Current models of decision-making account for this behavioral variability in terms of parameters quantifying random "noise" in the decision process (e.g., within the DDM: drift rate variability; Ratcliff & McKoon, 2008). Recent evidence shows that some of this variability is not actually noise, but rather due to dynamic variations in systematic decision biases due to choice history (Urai et al., 2017) […] coming from a sensory buffer (Resulaj et al., 2009) or from additional sensory input (Fleming et al., 2018). After the integrated evidence has hit a decision bound and a choice is made, the evidence continues to accumulate, so the decision variable can eventually favor the unchosen option. Such post-decisional evidence accumulation can naturally account for dissociations between confidence ratings and choice accuracy (Moran, Teodorescu, & Usher, 2015; Navajas, Bahrami, & Latham, 2016; Pleskac & Busemeyer, 2010). Indeed, recent work using a protocol similar to our Experiment 1 likewise showed low confidence judgments predicting close to 0% accuracy, which was attributed to the integration of post-decisional evidence into confidence judgments (Fleming et al., 2018). That previous study also showed near-perfect integration of pre-decisional and post-decisional stimulus information into confidence judgments. By contrast, in our Experiment 1, we found that post-decisional sensory stimuli did not have a larger impact on confidence than a post-decisional delay with just a blank screen. Both conditions with a delayed confidence judgment had an equally strong impact on confidence judgments compared to a third condition with immediate confidence judgments (Figure 2D). This indicates that post-decisional evidence was accumulated from a buffer but that, unlike in Fleming et al. (2018), the extra sensory information was not used for the confidence judgment.
This difference might be explained by a number of differences between the experimental protocols, most importantly the fact that Fleming et al., unlike us, rewarded their participants based on the accuracy of their confidence judgments, which might have motivated their participants to actively process the post-decisional stimulus information.
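The post-decisional accumulation account can be illustrated with a minimal simulation. It assumes that, after the bound is hit, the same drift and noise continue without bounds for a fixed period (here 1 s, matching the post-decision interval of Experiment 1), and that a trial is "perceived as an error" when the final decision variable favors the unchosen option. These assumptions and the parameter values are illustrative, not fitted to our data. Under them, accuracy on perceived-error trials falls below 50%, mirroring the below-chance rating-predicted accuracy for certain errors.

```python
import numpy as np


def simulate_with_post_decision(v, a, t_post=1.0, n_trials=4000, dt=0.001,
                                sigma=1.0, max_t=5.0, seed=1):
    """DDM to +a/-a, then unbounded accumulation for t_post seconds.

    Returns boolean arrays (correct, perceived_error) for trials that reached
    a bound; 'perceived_error' means the post-decisional decision variable
    ended on the side of the unchosen option.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)
    choice = np.zeros(n_trials, dtype=int)
    active = np.ones(n_trials, dtype=bool)
    # Pre-decisional phase: integrate until +a or -a is reached.
    for _ in range(int(max_t / dt)):
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break
        x[idx] += v * dt + sigma * np.sqrt(dt) * rng.standard_normal(idx.size)
        up, lo = x[idx] >= a, x[idx] <= -a
        choice[idx[up]] = 1
        choice[idx[lo]] = -1
        active[idx[up | lo]] = False
    # Post-decisional phase: without bounds, the summed increments over t_post
    # are Gaussian with mean v * t_post and variance sigma**2 * t_post.
    x_post = x + v * t_post + sigma * np.sqrt(t_post) * rng.standard_normal(n_trials)
    responded = choice != 0
    correct = choice[responded] == 1              # v > 0, so the upper bound is correct
    perceived_error = (x_post[responded] * choice[responded]) < 0
    return correct, perceived_error


correct, perceived_error = simulate_with_post_decision(v=0.8, a=0.6)
print("accuracy | perceived error:", correct[perceived_error].mean())
print("accuracy | perceived correct:", correct[~perceived_error].mean())
```

In this toy model, errors start the post-decisional phase at the wrong bound while the drift still points towards the correct option, so the decision variable often crosses back; this is what makes "perceived error" reports predictive of actual errors.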
Recent work has identified the error positivity (Pe) as tracking post-decisional evidence accumulation (Murphy et al., 2015) and reflecting fine-grained levels of decision confidence (Boldt & Yeung, 2015). Building on these insights, we demonstrated that the Pe predicted increases in subsequent decision bound. Interestingly, this relation was specific to the Pe and not evident for another robust post-decisional signal, the ERN. Other work has linked frontal theta oscillations, which have been proposed to drive the ERN (Cavanagh & Frank, 2014; Yeung, Bogacz, Holroyd, Nieuwenhuis, & Cohen, 2007; but see Cohen & Donner, 2013), to slowed reaction times following an error (Cavanagh, Cohen, & Allen, 2009).
However, this is typically observed in flanker tasks, where there is no ambiguity concerning choice accuracy, obviating the need for confidence to adapt behavior.

The relation between decision confidence and drift rate
The main focus of the current work was to unravel influences of decision confidence on subsequent decision bound; we had no predefined hypothesis about whether confidence also affects subsequent drift rate. In Experiments 2 and 3, we observed a small reduction in drift rate following low-confidence trials. This non-monotonic reduction in drift rate driven by low confidence seems hard to reconcile with the clear monotonic relation between current-trial Pe amplitude and subsequent drift rate seen in the EEG data of Experiment 2. One explanation for this discrepancy might be that neural recordings provide a more veridical measure of the internal evaluation of accuracy than explicit confidence reports, which are subject to differences in scale use and calibration. Indeed, when we fitted a model in which subsequent drift rate was allowed to vary as a function of both decision confidence and binned Pe amplitude, both the non-monotonic relation with decision confidence and the monotonic relation with Pe amplitude were replicated. Previous work has observed similar reductions of subsequent drift rate after errors (Notebaert et al., 2009; Purcell & Kiani, 2016), possibly reflecting a distraction of attention from the main task due to error processing. Thus, in addition to affecting subsequent decision bounds, internal confidence (in particular, error) signals might also affect attentional focus on subsequent trials.

Decision confidence and adaptive behavior
Human observers slow down following incorrect choices, a phenomenon referred to as post-error slowing (Rabbitt, 1966). The underlying mechanism has been a matter of debate. Post-error slowing has been interpreted as a strategic increase in decision bound in order to avoid future errors (Dutilh, Vandekerckhove, et al., 2012; Goldfarb, Wong-Lin, Schwemmer, Leonard, & Holmes, 2012; Holroyd, Yeung, Coles, & Cohen, 2005) or as an involuntary decrease in attentional focus (e.g., reduced drift rate) following an unexpected event (Notebaert et al., 2009; Purcell & Kiani, 2016). A key observation of the current work is that similar adjustments can also be observed based on internally computed confidence signals. Our results also go beyond established effects of post-error slowing in that we establish them for trial-to-trial variations in confidence within both the "correct" and the "error" condition.
Modulating the speed-accuracy tradeoff by decision confidence can be thought of as an adaptive way to achieve a certain level of accuracy. Indeed, normative models prescribe that uncertainty (i.e., the inverse of confidence) should determine how much information needs to be sampled (Bach & Dolan, 2012; Meyniel, Sigman, & Mainen, 2015). The current findings help bridge studies of top-down control and perceptual decision-making (Shea et al., 2014; Shimamura, 2008; Yeung & Summerfield, 2012). Decision confidence has been shown to influence information-sampling choices (Desender et al., 2018), guide study choices (Metcalfe & Finn, 2008) and act as an internal teaching signal that supports learning (Guggenmos, Wilbertz, Hebart, & Sterzer, 2016). Of direct relevance for the current work is a recent study by van den Berg and colleagues (2016), who showed that confidence acts as a bridge in multi-step decision-making. In their work, reward was obtained only when two successive choices were both correct. The results showed a linear increase in decision bound with increasing confidence in the first decision of a sequence. The sign of this relation was opposite to what we observed in the current work. Given the multi-step nature of the task, observers likely sacrificed performance on the second choice (by decreasing the decision bound) when they had low confidence in the first choice, because both choices needed to be correct in order to obtain a reward. By contrast, in our current work observers were motivated to perform well on each trial, and thus adaptively varied the height of the decision bound in order to achieve optimal performance.
In sum, we have shown that decision confidence affects subsequent decision bounds on a trial-by-trial level. A post-decisional brain signal sensitive to decision confidence predicted this adaptive modulation of the decision bound at a single-trial level.

Participants
Thirty participants (two men; age: M = 18.5, SD = .78, range 18-21) took part in Experiment 1 (two were excluded due to a lack of trials in one of the confidence categories).
Twelve participants (three men; mean age: 20.6 years, range 18-42) took part in Experiment 3a (one was excluded due to a lack of variation in confidence judgments) and twelve participants (all female; mean age: 19.1 years, range 18-22) in Experiment 3b, all in return for course credit. All participants provided written informed consent before participation. All reported normal or corrected-to-normal vision and were naive with respect to the hypothesis.
All procedures were approved by the local ethics committees.

Stimuli and apparatus
In all experiments, stimuli were presented on a gray background on a 20-inch CRT monitor with a 75 Hz refresh rate, using the MATLAB toolbox Psychtoolbox3 (Brainard, 1997). Responses were made using a standard QWERTY keyboard.
In Experiment 1, randomly moving white dots were drawn in a circular aperture centered on the fixation point. The experiment was based on code provided by Kiani and colleagues (2013), and parameter details can be found there.
In Experiment 2, two fields were presented, one containing 45 dots in a 10-by-10 matrix and the other containing 55 dots. Within this constraint, the displays were randomly generated for each new trial. In

Procedure
Experiment 1. After a fixation cross shown for 1000ms, randomly moving dots were shown on the screen until a response was made or 3s passed. On each trial, the proportion of dots moving coherently towards the left or right side of the screen was either 0%, 5%, 10%, 20% or 40%. In each block, there was an equal number of leftward and rightward movement. Participants were instructed to respond as quickly as possible, deciding whether the majority of dots were moving left or right, by pressing 'c' or 'n' with the thumbs of their left and right hand, respectively (counterbalanced between participants). When participants failed to respond within 3s, the trial terminated with the message 'too slow, press any key to continue'. When participants responded in time, either a blank screen was shown for 1s or continued random motion continued for 1s (sampled from the same parameters as the predecisional motion). Whether a blank screen or continued motion was shown depended on the block that participants were in. Subsequently, a 6-point confidence scale appeared with labels "certainly wrong", "probably wrong", "maybe wrong", "maybe correct", "probably correct", and "certainly correct" (reversed order for half of the participants). Participants had unlimited time to indicate their confidence by pressing one of six numerical keys at the top of their keyboard (1, 2, 3, 8, 9 or 0), which mapped onto the six confidence levels. On half of the trials, the coherence value on each timeframe was sampled from a normal distribution (SD = 25.6%) around the generative coherence (cf. Zylberberg, Fetsch, & Shadlen, 2016). This manipulation was irrelevant for the current purpose, however, and was ignored in the analysis. Apart from the blocks with a 1s blank screen and 1s continued evidence following the response, there was a third block type in which participants jointly indicated their choice (left or right) and level of confidence (low, medium, or high) in a single response. 
Because perceived errors cannot be indicated using this procedure, data from these blocks were omitted. The block order of these three conditions was counterbalanced using a Latin square. The main part of Experiment 1 comprised 9 blocks of 60 trials. The experiment started with three practice blocks of 60 trials each: one without confidence judgments using only 20% and 40% coherence, one without confidence judgments using all coherence values, and one with confidence judgments.
Experiment 2. On each trial, participants judged which of two simultaneously flashed fields (160ms) contained more dots, using the same response keys as in Experiment 1 (counterbalanced across participants). After the response, a blank screen was presented for 600ms, after which confidence in the decision was queried using the same labels and response lay-out as in Experiment 1. The inter-trial interval lasted 1s. Each participant performed 18 blocks of 48 trials.
Experiment 3. After a fixation point shown for 200ms, the stimulus was flashed for 200ms, followed again by the fixation point. Participants were instructed to respond as quickly as possible, deciding whether the average of the eight elements was blue or red, using the same response lay-out as in Experiment 1 (counterbalanced between participants).
When participants failed to respond within 1500ms, the trial terminated with the message 'too slow, press any key to continue'. When participants responded in time, a fixation point was shown for 200ms. Then, participants were queried for a confidence judgment using the same scale and response lay-out as in Experiment 1. The inter-trial interval lasted 1000ms.
The main part of Experiment 3a comprised 8 blocks of 60 trials. To maintain a stable color criterion over the course of the experiment, each block started with 12 additional practice trials in which the confidence judgment was omitted. The experiment started with one practice block (60 trials) without confidence judgments and one practice block (60 trials) with confidence judgments. The main part of Experiment 3b comprised 8 blocks of 64 trials. Each block started with 16 additional practice trials in which the confidence judgment was omitted.
The experiment started with one practice block (64 trials) without confidence judgments and one practice block (64 trials) with confidence judgments. In even-numbered blocks of Experiment 3b, participants did not provide confidence judgments; these data are excluded here.

Behavioral analyses
Behavioral data were analyzed using mixed regression models, which allow analyzing data at the single-trial level. We fitted random intercepts for each participant; between-subject differences in the effects of the predictors were accounted for by adding random slopes to the model, but only when this improved the model fit, as assessed by model comparison. RTs and confidence were analyzed using linear mixed models, for which F statistics are reported and the degrees of freedom were estimated by Satterthwaite's approximation (Kuznetsova, Brockhoff, & Christensen, 2014). Accuracy was analyzed using logistic mixed models, for which χ² statistics are reported. Model fitting was done in R (R Development Core Team, 2008) using the lme4 package (Bates, Maechler, Bolker, & Walker, 2015).

EEG data preparation
Details of the EEG data collection have been described in Boldt & Yeung (2015) and are not repeated here. From the data presented in that work, we extracted raw single-trial amplitudes using the time windows and electrodes specified below. The raw data were first low-pass filtered at 10Hz. Single-trial ERN amplitudes were then extracted at electrode FCz in a window from 10ms before until 90ms after the response, and single-trial Pe amplitudes at electrode Pz in a window from 250ms to 350ms after the response.
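The window-based extraction can be sketched as follows. This is a minimal illustration, not the original pipeline: it assumes that "amplitude" means the mean voltage over all samples inside the window (the text does not say whether mean or peak amplitude was used), that the 10Hz low-pass filter has already been applied, and that each trial is stored as a dictionary of channel name to sample list. The names `COMPONENT_WINDOWS`, `window_mean` and `single_trial_amplitudes` are hypothetical.

```python
from statistics import fmean

# Channel and response-locked window (ms) for each component, as given in the text.
COMPONENT_WINDOWS = {
    "ERN": ("FCz", -10, 90),
    "Pe": ("Pz", 250, 350),
}

def window_mean(times_ms, samples, start_ms, end_ms):
    """Mean amplitude over the samples whose time stamp lies in [start_ms, end_ms]."""
    selected = [v for t, v in zip(times_ms, samples) if start_ms <= t <= end_ms]
    if not selected:
        raise ValueError("no samples fall inside the requested window")
    return fmean(selected)

def single_trial_amplitudes(trial, times_ms):
    """trial: dict mapping channel name -> samples (assumed already low-pass filtered)."""
    return {
        name: window_mean(times_ms, trial[channel], start, end)
        for name, (channel, start, end) in COMPONENT_WINDOWS.items()
    }

# Toy epoch sampled every 10 ms from -100 to 390 ms (hypothetical values).
times = list(range(-100, 400, 10))
trial = {"FCz": [1.0] * len(times), "Pz": [float(t) for t in times]}
print(single_trial_amplitudes(trial, times))  # → {'ERN': 1.0, 'Pe': 300.0}
```

Keeping the channel/window pairs in one table makes it easy to check them against the values reported above.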

Controlling for autocorrelation in performance
The relation between decision confidence and the decision bound on the subsequent trial might be confounded by autocorrelations in performance. Over the course of an experiment, autocorrelation is typically observed in RTs, accuracy (Dutilh, Ravenzwaaij, et al., 2012), and confidence (Rahnev, Koizumi, Mccurdy, Esposito, & Lau, 2015). This could be due to slow fluctuations in attention or motivation (Macdonald, Mathan, & Yeung, 2011). If observers report high confidence during fast periods and low confidence during slow periods of the experiment (cf. the link between response speed and confidence; Kiani, Corthell, & Shadlen, 2014), this can artificially induce a negative relation between decision confidence on trial n and reaction time on trial n+1. A straightforward way to control for this is to use confidence on trial n+2. An effect of confidence on trial n+2 on the decision bound on trial n+1 cannot reflect an influence of that confidence judgment (because it follows trial n+1 in time) and therefore captures only these slow variations in performance.
By subtracting the influence of confidence on trial n+2 on our dependent variables from the influence of confidence on trial n, we obtain a measure of confidence-dependent modulation of the decision bound that is corrected for these slow fluctuations in performance.
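The logic of the n versus n+2 control can be illustrated with a simple sketch. For each trial n+1, it pairs the dependent variable with confidence on the preceding trial (n) and on the following trial (n+2), computes the effect of each confidence level relative to a reference level for both lags, and subtracts the n+2 effect from the n effect. As a simplifying assumption, mean RT stands in for the DDM parameters that were actually estimated, and the function is meant to be applied to one contiguous block of one participant so that lags never cross block boundaries. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def lagged_rows(confidence, rt):
    """Pair the RT on each trial (n+1) with confidence on trial n and on trial n+2.
    Only trials with both neighbours inside the sequence are kept."""
    return [
        {"rt_next": rt[i], "conf_n": confidence[i - 1], "conf_n2": confidence[i + 1]}
        for i in range(1, len(rt) - 1)
    ]

def level_effects(rows, key, reference):
    """Mean RT on trial n+1 per confidence level, relative to the reference level."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["rt_next"])
    ref = fmean(groups[reference])
    return {level: fmean(vals) - ref for level, vals in groups.items()}

def corrected_effects(confidence, rt, reference):
    """Effect of confidence on trial n minus effect of confidence on trial n+2."""
    rows = lagged_rows(confidence, rt)
    eff_n = level_effects(rows, "conf_n", reference)
    eff_n2 = level_effects(rows, "conf_n2", reference)
    # Only levels observed at both lags can be compared.
    return {lvl: eff_n[lvl] - eff_n2[lvl] for lvl in eff_n if lvl in eff_n2}

# Hypothetical block of six trials (RTs in ms).
conf = ["low", "high", "low", "high", "high", "high"]
rts = [500, 900, 500, 900, 500, 500]
print(corrected_effects(conf, rts, reference="high"))
```

Slow drifts in performance inflate the n and n+2 effects alike, so only the part of the n effect that exceeds the n+2 effect is attributed to confidence.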

Drift diffusion modeling
We fitted the drift diffusion model (DDM) to behavioral data (choices and reaction times). The DDM is a popular variant of sequential sampling models of perceptual decision making (Ratcliff & McKoon, 2008). It provides good fits to RT and choice patterns from a vast array of two-choice tasks, and it quantifies latent computational processes that naturally map onto neural mechanisms situated at different stages of the sensory-motor pathways of the brain (Gold & Shadlen, 2007). We used hierarchical drift diffusion modeling as implemented in the HDDM toolbox (Wiecki, Sofer, & Frank, 2013). To compute statistics, we subtracted the group-level posterior distributions for confidence on trial n+2 from those for confidence on trial n, and computed p-values from these difference distributions. To compare these models against simpler ones, we additionally fitted models in which bound, drift or both were fixed rather than free. We used the Deviance Information Criterion (DIC) to compare models to each other. Lower DIC values indicate that a model explains the data better, while taking model complexity into account. A difference in DIC of 10 is generally taken as meaningful.
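To make concrete why the bound height sets the speed-accuracy tradeoff, the following sketch simulates a drift diffusion process with a simple Euler scheme. It is an illustration only and not how HDDM works (HDDM fits the model via its likelihood rather than by simulation). Assumptions: unit diffusion noise, an unbiased starting point midway between the bounds at 0 and `bound`, and the upper bound counting as a correct response (accuracy coding); the function name and default values are hypothetical. Under these assumptions the analytic accuracy is 1/(1 + exp(-drift × bound)), so raising the bound from 1 to 2 at drift 1 should raise accuracy from about 0.73 to about 0.88 while lengthening RTs.

```python
import math
import random

def simulate_ddm(drift, bound, n_trials=1000, non_decision=0.3,
                 dt=0.001, noise=1.0, seed=0):
    """Euler simulation of a drift diffusion process between 0 and `bound`.

    The accumulator starts midway (unbiased); reaching the upper bound counts
    as a correct choice. Returns (accuracy, mean RT in seconds).
    """
    rng = random.Random(seed)
    step_sd = noise * math.sqrt(dt)  # SD of the Gaussian increment per time step
    n_correct = 0
    total_rt = 0.0
    for _ in range(n_trials):
        x, t = bound / 2.0, 0.0
        while 0.0 < x < bound:
            x += drift * dt + rng.gauss(0.0, step_sd)
            t += dt
        if x >= bound:
            n_correct += 1
        total_rt += t + non_decision
    return n_correct / n_trials, total_rt / n_trials

# Same drift, two bound heights: the higher bound is slower but more accurate.
for a in (1.0, 2.0):
    acc, mean_rt = simulate_ddm(drift=1.0, bound=a)
    print(f"bound={a}: accuracy={acc:.3f}, mean RT={mean_rt:.3f}s")
```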

DDM modeling of behavior
In Experiment 1, we first used the default accuracy coding scheme to fit a model in which drift rate depended on the coherence level; all other parameters were held constant. This fit produced lower DIC values than a fit in which the drift rate was fixed (ΔDIC = -3882). Next, we used the regression coding scheme and allowed both the decision bound and the drift rate to vary as a function of confidence on trial n and confidence on trial n+2 (both treated as factors). Trials with high confidence were always treated as the reference category (i.e., their effect was fixed to zero). In addition, the drift rate was allowed to vary as a function of coherence, which was treated as a covariate (because we were not interested in this parameter estimate but only wanted to capture the variance in the data accounted for by signal-to-noise ratio). To quantify the influence of confidence on the subsequent decision bound and drift rate, we subtracted the estimated effects of confidence on trial n+2 on subsequent bound and drift from those of confidence on trial n. Statistics of the simple effects of confidence on trial n and confidence on trial n+2 are reported in the Supplementary Materials. Relative to the null model without confidence, the full model (presented in Figure 3) provides the best fit (ΔDIC = -288), explaining the data better than simpler models in which only the bound (ΔDIC = -234) or the drift (ΔDIC = -94) were allowed to vary.
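The subtraction of posterior estimates and the resulting p-value can be sketched as below. This assumes that paired posterior samples (drawn from the same MCMC iterations) are available for a given coefficient at lag n and lag n+2, and that the p-value is the proportion of the difference distribution lying on the opposite side of zero from its bulk; the text does not state whether a one- or two-sided convention was used. The traces generated here are synthetic random numbers for illustration, and the function names are hypothetical.

```python
import random

def posterior_difference(trace_n, trace_n2):
    """Sample-wise difference between paired posterior traces of the same
    coefficient (e.g. bound after 'probably wrong') at lag n and lag n+2."""
    if len(trace_n) != len(trace_n2):
        raise ValueError("posterior traces must have equal length")
    return [a - b for a, b in zip(trace_n, trace_n2)]

def posterior_p(diff):
    """Smaller of the two tail proportions of the difference distribution
    relative to zero; small values indicate a reliable difference."""
    above = sum(d > 0 for d in diff) / len(diff)
    return min(above, 1.0 - above)

# Synthetic traces, for illustration only.
rng = random.Random(1)
trace_n = [rng.gauss(0.30, 0.10) for _ in range(5000)]
trace_n2 = [rng.gauss(0.05, 0.10) for _ in range(5000)]
diff = posterior_difference(trace_n, trace_n2)
print(f"mean difference = {sum(diff) / len(diff):.3f}, p = {posterior_p(diff):.4f}")
```

Subtracting sample by sample, rather than comparing two summary statistics, preserves any correlation between the two coefficients in the joint posterior.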
The data of Experiment 2 were analyzed in the same way, except that difficulty was fixed, so trial difficulty (i.e., coherence or signal-to-noise ratio) did not need to be accounted for within the model. Relative to the null model, allowing both drift and bound to vary as a function of confidence provides the best fit (ΔDIC = -677), explaining the data better than simpler models in which only the bound (ΔDIC = -302) or the drift (ΔDIC = -170) were allowed to vary.
The data of Experiment 3 were analyzed in the same way as those of Experiment 1, except that coherence was replaced by signal-to-noise ratio. For both Experiments 3a and 3b, a model in which only the drift was allowed to vary as a function of signal-to-noise ratio produced lower DIC values than a fit in which the drift rate was fixed (Experiment 3a: ΔDIC = -311; Experiment 3b: ΔDIC = -88). For the confidence-dependent fitting, a single model was fit to the data of Experiments 3a and 3b simultaneously. Relative to the null model without confidence, the full model (presented in Figure 4) provides the best fit (ΔDIC = -634), explaining the data better than simpler models in which only the bound (ΔDIC = -58) or the drift (ΔDIC = -260) were allowed to vary.

DDM modeling of EEG data
Because single-trial EEG contains substantial noise, a robust measure was computed by rank ordering all trials per participant, and then using rank as a predictor rather than the raw EEG signal. A hierarchical DDM regression model was then fit in which subsequent bound and drift were allowed to vary as a function of the Pe and the ERN, both on trial n and trial n+2 .
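The within-participant rank transform described above can be sketched as follows. Ties are given the average of the ranks they span, which is an assumption (the text does not say how ties were handled); whether the ranks were further rescaled before entering the regression is also not stated. The function name is hypothetical.

```python
from collections import defaultdict

def within_participant_ranks(participants, values):
    """Replace each single-trial value by its rank (1 = smallest) among that
    participant's own trials. Tied values receive the mean of their ranks."""
    trials_by_pp = defaultdict(list)
    for idx, pp in enumerate(participants):
        trials_by_pp[pp].append(idx)
    ranks = [0.0] * len(values)
    for indices in trials_by_pp.values():
        ordered = sorted(indices, key=lambda i: values[i])
        pos = 0
        while pos < len(ordered):
            # Extend `end` over a run of tied values.
            end = pos
            while end + 1 < len(ordered) and values[ordered[end + 1]] == values[ordered[pos]]:
                end += 1
            avg_rank = (pos + end) / 2 + 1
            for k in range(pos, end + 1):
                ranks[ordered[k]] = avg_rank
            pos = end + 1
    return ranks

# Two hypothetical participants with raw Pe amplitudes (in µV).
print(within_participant_ranks(["a", "a", "a", "b", "b"], [3.0, -1.0, 3.0, 10.0, 5.0]))
# → [2.5, 1.0, 2.5, 2.0, 1.0]
```

Because ranks depend only on the ordering of trials, a few extreme single-trial amplitudes cannot dominate the regression.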
To examine potential nonlinear effects, the Pe was divided into five bins, separately for each participant, after regressing out the effect of the ERN (again separately for each participant). Then, a hierarchical DDM regression model was fit in which subsequent bound and drift were allowed to vary as a function of binned Pe on trial n and trial n+2. The bin with the lowest amplitudes was always treated as the reference category. Model comparison revealed that, relative to a model without the Pe, the full model provides the best fit (ΔDIC = -6524), explaining the data much better than simpler models in which only the bound (ΔDIC = -1816) or the drift (ΔDIC = -1787) were allowed to vary. When the same binned analysis was applied to the ERN, model comparison revealed that the full model (ΔDIC = 24) as well as models in which only the drift (ΔDIC = 171) or only the bound (ΔDIC = 182) were allowed to vary all provided a worse fit than the null model. Thus, the ERN had no explanatory power for either drift rate or decision bound.
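The residualize-then-bin step for one participant might look like the sketch below. It assumes that "regressing out the effect of the ERN" means taking residuals from an ordinary least-squares fit of Pe on ERN, and that the five bins are equal-count quantile bins; neither detail is specified in the text, and the function names are hypothetical. Bin 0 holds the lowest residual Pe amplitudes and would serve as the reference category.

```python
from statistics import fmean

def residualize(y, x):
    """Residuals of an ordinary least-squares fit y ~ a + b*x (one participant)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def quantile_bins(values, n_bins=5):
    """Label each trial 0..n_bins-1 by rank so that bins hold (nearly) equal
    numbers of trials; bin 0 contains the lowest values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins

def binned_pe(pe, ern, n_bins=5):
    """Bin the Pe after removing its linear dependence on the ERN."""
    return quantile_bins(residualize(pe, ern), n_bins)

# Hypothetical single-participant data (µV).
print(binned_pe(pe=[5.0, 1.0, 4.0, 2.0, 3.0], ern=[0.0, 0.0, 0.0, 0.0, 0.0]))
# → [4, 0, 3, 1, 2]
```

Residualizing before binning ensures that differences between Pe bins are not simply carried over from the correlated ERN.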