Cognitive modelling reveals distinct electrophysiological markers of decision confidence and error monitoring

Is confidence in perceptual decisions generated by the same brain processes as the decision itself, or does confidence require metacognitive processes that follow up on the decision? In a masked orientation task with varying stimulus-onset-asynchrony, we used EEG and cognitive modelling to trace the timing of the neural correlates of confidence. Confidence as reported by observers increased with stimulus-onset-asynchrony in correct trials and, to a lesser degree, in incorrect trials, a pattern incompatible with established models of confidence. Electrophysiological activity in two different time periods was associated with confidence, namely 350–500 ms after stimulus onset and 250–350 ms after the response. Cognitive modelling revealed that only the activity following the stimulus exhibited the same statistical regularities as confidence, while the statistical pattern of the activity following the response was not compatible with confidence. We argue that electrophysiological markers of decision confidence and error awareness are at least partly distinct.


Introduction
Decision confidence is a ubiquitous feature of human decision making: Whenever we make a choice, the decision is accompanied by a greater or smaller degree of confidence that the choice is correct. Confidence can be defined as an evaluation of one's decision making, resulting in a degree of certainty that the decision is correct (Pouget, Drugowitsch, & Kepecs, 2016). How does the brain give rise to confidence? Two conflicting views have been proposed: According to one view, confidence may be generated directly by the very same brain processes that are involved in decision formation (Kepecs, Uchida, Zariwala, & Mainen, 2008; Rolls, Grabenhorst, & Deco, 2010). This view is closely related to Bayesian brain theory, which argues that representations of uncertainty are necessary for optimal choices (Knill & Pouget, 2004). According to the second view, confidence is generated by a separate, metacognitive process that gives rise to both confidence and error awareness (Boldt & Yeung, 2015; Charles & Yeung, 2018). A common mechanism underlying error monitoring and decision confidence may be the ongoing accumulation of sensory evidence after the decision, which allows observers to reverse their belief about the stimulus (Pleskac & Busemeyer, 2010; Resulaj, Kiani, Wolpert, & Shadlen, 2009; Steinhauser, Maier, & Hübner, 2008; van den Berg et al., 2016).
The aim of the present study was to test whether the neural correlates of confidence in a perceptual decision emerge already before the behavioural response, consistent with a common origin of confidence and choice formation, or whether these correlates do not emerge until the time of the neural markers of error awareness following the response. For this purpose, the present study used cognitive modelling and electroencephalography to trace the timing of the neural correlates of confidence in perceptual decisions.
However, a marker of accumulated evidence is far from the only interpretation of the P3: According to a classical theory, the P3 reflects the updating of working memory in response to task-relevant events (Donchin & Coles, 1988). Other theories include the global broadcast of visual contents within a neural global workspace (Sergent et al., 2005), the mobilization for action following motivationally significant stimuli (Nieuwenhuis, de Geus, & Aston-Jones, 2011), and a monitoring process that checks whether the decision is correctly transformed into an action (Verleger, Jaśkowski, & Wascher, 2005).
ERN and Pe are established markers of error processing: If one shared neurocognitive mechanism gives rise to both confidence and error monitoring, confidence should be associated with ERN and Pe. The ERN is an ERP component with a frontocentral topography that occurs at the time of or shortly after incorrect responses (Falkenstein, Hohnsbein, Hoormann, & Blanke, 1991; Gehring, Goss, Coles, Meyer, & Donchin, 1993). An equivalent but smaller negativity, referred to as the CRN, has been observed after correct responses (Vidal, Burle, Bonnet, Grapperon, & Hasbroucq, 2003). Previous studies suggested that the ERN was associated with participants' confidence judgments in a flanker task (Scheffers & Coles, 2000).
However, the ERN failed to predict graded confidence judgments on a trial-by-trial basis in visual discrimination tasks with briefly flashed stimuli (Boldt & Yeung, 2015). Finally, the ERN can be dissociated from decision confidence by its relation to subjective visibility: In a masked number discrimination task, the ERN varied in an all-or-nothing way and was present only if there was a conscious percept of the stimulus, while confidence varied continuously and did not depend on a conscious percept of the stimulus (Charles, King, & Dehaene, 2014; Charles, Van Opstal, Marti, & Dehaene, 2013).
The Pe is a parietally focused positive deflection 200–500 ms after incorrect responses. The Pe resembles the parietal P3 in topography and latency, although the Pe is locked to the response and the P3 to the stimulus (Overbeek, Nieuwenhuis, & Ridderinkhof, 2005). The Pe is a marker of conscious awareness of having committed an error (Nieuwenhuis, Ridderinkhof, Blom, Band, & Kok, 2001) and can be dissociated from the ERN: In a study where participants responded to a masked target stimulus surrounded by visible flanker stimuli, erroneous responses to the flankers elicited only a Pe, but not an ERN (Di Gregorio, Maier, & Steinhauser, 2018). The Pe can be explained by the strength of accumulated evidence of having made an error (Steinhauser & Yeung, 2010; Ullsperger, Harsay, Wessel, & Ridderinkhof, 2010; Wessel, Danielmeier, & Ullsperger, 2011). Moreover, in a visual discrimination task, the Pe was associated in a graded way with both confidence in correct responses and the subjective belief of having made an error (Boldt & Yeung, 2015). However, the timing of ERN and Pe is not immediately plausible for correlates of decision confidence. As confidence seems to be experienced already at a point in time when no response has yet been made, correlates of confidence may naïvely be expected before the response, at the same time as the decision or shortly afterwards. And yet, ERN and Pe do not occur until after the response.
How can neural correlates of confidence be identified? If specific neural activity is a correlate of confidence, it must be associated with the same statistical regularities as confidence judgments (Kepecs et al., 2008; …). By implication, if the statistical regularities of a specific ERP component are incompatible with those of confidence, that component is not a correlate of confidence. In the present study, we tracked the statistical regularities of confidence by fitting a series of cognitive models to the behavioural data. The model that fitted the behaviour best was then used to predict the neural data. Previous studies used the so-called folded X-pattern as a statistical marker of confidence (Braun, Urai, & Donner, 2018; Fetsch, Kiani, Newsome, & Shadlen, 2014; Herding et al., 2019; Lak, Nomoto, Keramati, Sakagami, & Kepecs, 2017; Urai, Braun, & Donner, 2017). The folded X-pattern is characterised by an increase of confidence with stimulus strength in correct trials and a decrease of confidence with stimulus strength in incorrect trials. It was derived from Bayesian decision theory (…; Sanders et al., 2016), but also follows from signal detection theory (Kepecs et al., 2008) and from postdecisional accumulation models (Moran, Teodorescu, & Usher, 2015).
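To make the folded X-pattern concrete, the following sketch computes mean confidence under the simplest signal detection account. It assumes equal-variance Gaussian evidence centred at ±d/2, a criterion at zero, and confidence equal to the distance of the evidence from the criterion; these are illustrative assumptions in the spirit of the SDT account cited above, not the models fitted in this study. Because the evidence is Gaussian, mean confidence is the mean of a truncated normal distribution, so no simulation is needed.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for pdf and cdf


def truncated_mean_above_zero(mu: float) -> float:
    """Mean of x ~ N(mu, 1) conditional on x > 0."""
    return mu + STD_NORMAL.pdf(mu) / STD_NORMAL.cdf(mu)


def folded_x(d: float) -> tuple[float, float]:
    """Mean confidence (distance from criterion) in correct and error trials.

    Assumes evidence ~ N(+d/2, 1) for the stimulus shown and a criterion at 0.
    By symmetry, the result is the same for both stimulus identities.
    """
    correct = truncated_mean_above_zero(d / 2)   # evidence on the correct side
    error = truncated_mean_above_zero(-d / 2)    # evidence on the wrong side, reflected
    return correct, error


for d in (0.5, 1.5, 3.0):
    c, e = folded_x(d)
    print(f"d = {d}: correct = {c:.3f}, error = {e:.3f}")
```

In this model, mean confidence rises with d in correct trials and falls with d in error trials, which is the folded X-pattern; since the truncated normal mean increases monotonically with its location, the double increase pattern cannot arise here.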
However, the folded X-pattern can be misleading about confidence because Bayesian decision theory is compatible with other statistical patterns, too (Adler & Ma, 2018; Rausch & Zehetleitner, 2019b). In addition, in some tasks, confidence empirically increased with stimulus strength in correct trials and, to a lesser degree, in incorrect trials (Kiani, Corthell, & Shadlen, 2014; Rausch, Hellmann, & Zehetleitner, 2018; Stolyarova et al., 2019; van den Berg et al., 2016), a pattern we refer to as the double increase pattern. The double increase pattern can be reproduced by a smaller number of mathematical models, including the weighted evidence and visibility (WEV) model (Rausch et al., 2018), the heuristic detection model (Maniscalco, Peters, & Lau, 2016; Peters et al., 2017), and some Bayesian models (Adler & Ma, 2018; Rausch & Zehetleitner, 2019b). For these reasons, a specific statistical pattern cannot legitimately be assumed a priori. However, irrespective of whether confidence follows the folded X-pattern or the double increase pattern in a specific task, a neural correlate of confidence should always show the same pattern as the one observed with confidence judgments. In addition, a cognitive model fitted to confidence judgments should also be able to accurately predict the neural correlate of confidence.
To assess the timing of the neural correlates of confidence in perceptual decisions, human observers performed a masked orientation discrimination task (see Fig. 1) while EEG was recorded. After each orientation response, observers reported their confidence on a scale with the categories "not at all", "a little", "nearly sure", and "completely sure". If observers were aware of having made an incorrect response, they were instructed to respond "not at all". We used a task in which confidence had followed the double increase pattern in previous studies (Rausch et al., 2018), because the double increase pattern can be explained by a smaller number of cognitive models. The strength of stimulation was manipulated by varying the stimulus-onset-asynchrony (SOA), i.e. the time between the onset of the stimulus and the onset of the mask.
Bayes factors were used for statistical inference, allowing us to quantify both the evidence for an effect and the evidence against an effect (Rouder, Speckman, Son, & Morey, 2009).
Fig. 1. Sequence of events during the experiment. The target stimulus was a sinusoidal grating, oriented horizontally or vertically. After 16.7, 33.3, 66.7, or 133.3 ms, the target was replaced by a chequered mask presented for 500 ms. Afterwards, observers reported first the orientation of the target and then their degree of confidence in having made the correct orientation response. Observers were instructed that accuracy, but not speed, was critical for both responses.
Because the objective of the present study was specifically confidence, we selected models from the literature that are fitted directly to confidence judgments, and not to reaction times.
From the model parameters that fitted the behavioural data best, we generated a prediction about the ERP amplitudes under the assumption that the ERP amplitude is proportional to confidence.
With respect to confidence judgments, we expected that confidence increases as a function of the SOA both in correct as well as in incorrect trials, i.e. confidence is characterised by the double increase pattern. Regarding proposed ERP correlates of confidence, as correct responses are commonly associated with more positive activity at the time of the P3 (Koivisto & Revonsuo, 2010), we hypothesized that confidence is positively associated with EEG activity at the time of the P3. As errors are known to cause negative shifts at the time of the ERN, again a positive association was expected between confidence and activity at the time of the ERN (Scheffers & Coles, 2000). In contrast, as errors are known to cause positive shifts at the time of the Pe, we predicted a negative association between confidence and activity at the time of the Pe in line with previous research (Boldt & Yeung, 2015). Moreover, if P3, ERN, and Pe were indeed correlates of confidence, the statistical pattern as a function of SOA and choice accuracy should correspond to the statistical pattern observed in confidence judgments: This means that P3, ERN, and Pe should be characterised by the double increase pattern as well. Regarding cognitive models, we expected that the best fit to the behavioural data should be achieved by one of the models that is in principle able to accommodate the double increase pattern, i.e. the WEV-model, the heuristic detection model, or the noisy decay model. Finally, the models that provide an adequate fit to the behavioural data should also accurately predict the ERP correlates of confidence.

Methods
Participants
Twenty-five human participants (21 female, 4 male) took part in the experiment. Their age ranged from 18 to 36 years (Md = 22). All participants reported normal or corrected-to-normal vision, no history of neuropsychological or psychiatric disorders, and no use of psychoactive medication. All participants gave written informed consent and received either course credit or €8 per hour for participation. The experimental protocol was approved by the ethics committee of the Catholic University of Eichstätt-Ingolstadt.

Apparatus and stimuli
The experiment was performed in a sound-attenuated and electrically shielded cabin.
The stimuli were presented on an Iiyama MS103DT monitor with a screen diagonal of 51 cm, set to a resolution of 1280 × 1024 px and a refresh rate of 60 Hz. The viewing distance, which was not enforced by constraints, was approximately 60 cm. The experiment was conducted using PsychoPy v.1.83.04 (Peirce, 2007, 2009) on a Fujitsu Celsius W530 desktop computer running Windows 8.1. The target stimulus was a square (3° × 3°) textured with a sinusoidal grating of one cycle per degree of visual angle (maximal luminance: 44 cd/m²; minimal luminance: 14 cd/m²). The mask consisted of a square (4° × 4°) with a black (0 cd/m²) and white (60 cd/m²) chequered pattern of 5 columns and 5 rows. All stimuli were presented at fixation on a grey (29 cd/m²) background. The orientation of the grating varied randomly between horizontal and vertical. Participants reported the orientation of the grating with their right hand by pressing the down key when the grating was vertical and the right key when the grating was horizontal. Likewise, participants reported their confidence in being correct with their left hand by pressing 1, 2, 3, or 4 on the number keys in the top row of the keyboard.
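Stimulus sizes are specified in degrees of visual angle. As a rough check of what they correspond to on this display, the sketch below converts degrees to pixels. It assumes square pixels, that the 51 cm diagonal is the visible image area, and the nominal 60 cm viewing distance (which was not enforced); the helper name is hypothetical. PsychoPy can perform this conversion itself when a monitor calibration and 'deg' units are used.

```python
import math


def degrees_to_pixels(size_deg, distance_cm, diagonal_cm, res_x, res_y):
    """Convert a visual angle to on-screen pixels for a centred stimulus (square pixels assumed)."""
    pixels_per_cm = math.hypot(res_x, res_y) / diagonal_cm
    size_cm = 2 * distance_cm * math.tan(math.radians(size_deg / 2))
    return size_cm * pixels_per_cm


target_px = degrees_to_pixels(3, 60, 51, 1280, 1024)  # 3° target
mask_px = degrees_to_pixels(4, 60, 51, 1280, 1024)    # 4° mask
print(f"target ≈ {target_px:.0f} px, mask ≈ {mask_px:.0f} px")
```

Under these assumptions the target spans about 101 px and the 1 cycle/deg grating has a period of roughly 34 px; a different effective viewing distance scales these values accordingly.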

Experimental trial
Each trial began with the presentation of a fixation cross whose duration was randomly chosen from 950, 1000, and 1050 ms. Then the target stimulus was shown briefly until it was replaced by the masking stimulus. There were four possible SOAs (i.e., time intervals between target onset and mask onset): 16.7, 33.3, 66.7, and 133.3 ms. The mask was presented for 500 ms. After the mask had disappeared, an empty screen was shown, and participants indicated whether the target had been horizontal or vertical.
The question "How confident are you about your response?" was displayed on screen 500 ms after the response, together with the four response options "not at all", "a little", "nearly sure", and "completely sure". Participants then pressed a key to indicate their degree of confidence that their orientation response was correct. After incorrect orientation responses, the trial ended with the presentation of the word "error" for 1,000 ms.

Design and procedure
Participants were instructed to report the orientation of the grating as accurately as possible without time pressure, and to guess the orientation of the target if they had no idea about it at all. In addition, they were instructed to report their degree of confidence that their orientation response had been correct as accurately as possible, and to rate their confidence as "not at all" if they were aware of having made an error.
The experiment consisted of one training block and 24 experimental blocks of 40 trials each.
Each SOA occurred 10 times in each block, in random order. The orientation of the target stimulus varied randomly across trials. After each block, the percentage of errors was displayed to provide participants with feedback about their accuracy.
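The block structure can be sketched as follows. At a refresh rate of 60 Hz, the four SOAs correspond to 1, 2, 4, and 8 frames (one frame ≈ 16.7 ms), which is why they are not round numbers. The sketch builds blocks of 40 trials in which each SOA occurs 10 times in random order, with a random orientation and one of the three fixation durations per trial; the function name and dictionary keys are hypothetical and not taken from the original PsychoPy script.

```python
import random
from collections import Counter

REFRESH_HZ = 60
SOA_FRAMES = (1, 2, 4, 8)        # 16.7, 33.3, 66.7, 133.3 ms at 60 Hz
REPS_PER_SOA = 10                # 4 SOAs x 10 repetitions = 40 trials per block
FIXATION_MS = (950, 1000, 1050)


def make_block(rng: random.Random) -> list[dict]:
    """Return one shuffled block of trials."""
    trials = [
        {
            "soa_frames": frames,
            "soa_ms": 1000 * frames / REFRESH_HZ,
            "orientation": rng.choice(("horizontal", "vertical")),
            "fixation_ms": rng.choice(FIXATION_MS),
        }
        for frames in SOA_FRAMES
        for _ in range(REPS_PER_SOA)
    ]
    rng.shuffle(trials)  # random order of SOAs within the block
    return trials


rng = random.Random(2024)
experiment = [make_block(rng) for _ in range(24)]  # 24 experimental blocks
print(Counter(t["soa_frames"] for t in experiment[0]))
```

Specifying SOAs in frames rather than milliseconds keeps the timing tied to the display's refresh cycle, which is how presentation software can realise such short durations.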

EEG acquisition
The electroencephalogram (EEG) was recorded from 64 electrodes using a BIOSEMI Active-Two system (BioSemi, Amsterdam, The Netherlands; Ag/AgCl electrodes, channels P8, P10, PO8, PO4, O2 as well as the left and right mastoid, relative to the common mode sense (CMS) active electrode and the driven right leg (DRL) passive electrode). Vertical and horizontal electrooculograms (EOG) were recorded from electrodes above and below the right eye and at the outer canthi of both eyes. All electrodes were re-referenced off-line to linked mastoids.
EEG and EOG data were continuously recorded at a sampling rate of 512 Hz.

EEG analysis
The EEG data were analysed using MNE-Python (Gramfort et al., 2013, 2014). First, the data were re-referenced to the linked mastoids. Next, the signal was band-pass filtered between 0.5 and 40 Hz using a windowed finite impulse response filter. The raw data were inspected visually to remove extreme noise events and artefact-contaminated electrodes. Then, we conducted an independent component analysis based on the FastICA algorithm (Hyvärinen, 1999), identified components representing blinks and/or horizontal eye movements, and removed these artefacts before back-projecting the residual components.
For analysis of the P3 time window, the continuous EEG was segmented into epochs starting 200 ms before and ending 600 ms after stimulus onset. The 200 ms pre-stimulus interval was used for baseline correction. For analyses of the ERN and Pe time windows, the EEG data were segmented into epochs starting 200 ms before and ending 600 ms after the orientation response. The interval from 150 to 50 ms before the response was used for baseline correction. Epochs with amplitude changes greater than 100 μV were excluded from analysis, the same exclusion criterion as in a previous study of EEG correlates of confidence (Boldt & Yeung, 2015). Finally, ERP waveforms were obtained by averaging across epochs (except for the validation of the predictions of the cognitive models; see below). EEG activity in specific time windows was quantified by mean amplitudes because mean amplitudes are robust to different numbers of trials across conditions (Luck, 2014). The time windows were 350–500 ms after stimulus onset at electrode Pz for the P3, −40 to 60 ms relative to the orientation discrimination response at electrode FCz for the ERN, and 250–350 ms after the orientation discrimination response at electrode Pz for the Pe; these are the same time windows as in a previous study of EEG correlates of confidence (Boldt & Yeung, 2015). For the topographical maps, artefact-contaminated electrodes that had been excluded at the beginning were interpolated using spherical splines (Perrin, Pernier, Bertrand, & Echallier, 1989).
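The epoch-level steps described above (baseline correction, rejection of epochs with large amplitude changes, and mean amplitude within a time window) can be illustrated for a single channel with plain Python. This is a sketch, not the MNE-Python code used in the study: it assumes that "amplitude changes greater than 100 μV" refers to the peak-to-peak range within an epoch, and the synthetic epoch is made up for illustration.

```python
from statistics import fmean

SFREQ = 512.0  # sampling rate in Hz


def baseline_correct(epoch, times, start, stop):
    """Subtract the mean of the samples with start <= t < stop (times in seconds)."""
    baseline = fmean(v for v, t in zip(epoch, times) if start <= t < stop)
    return [v - baseline for v in epoch]


def reject(epoch, threshold_uv=100.0):
    """True if the peak-to-peak range exceeds the threshold (assumed criterion)."""
    return max(epoch) - min(epoch) > threshold_uv


def mean_amplitude(epoch, times, start, stop):
    """Mean voltage of the samples with start <= t <= stop."""
    return fmean(v for v, t in zip(epoch, times) if start <= t <= stop)


# Synthetic stimulus-locked epoch at Pz (-200 to 600 ms): a 5 µV offset
# plus a 3 µV deflection in the P3 window (350-500 ms).
times = [i / SFREQ - 0.2 for i in range(int(0.8 * SFREQ) + 1)]
epoch = [5.0 + (3.0 if 0.35 <= t <= 0.5 else 0.0) for t in times]

corrected = baseline_correct(epoch, times, -0.2, 0.0)
if not reject(corrected):
    print(mean_amplitude(corrected, times, 0.35, 0.5))
```

For response-locked epochs, the same helpers would be called with the response-relative baseline (−0.15 to −0.05 s) and the ERN or Pe window instead.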

Model specification
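Nine models were fitted to the combined distributions of orientation discrimination and confidence judgments, separately for each participant. The models were based on different decision architectures: models (i)–(vi) were derived from signal detection theory (SDT), whereas models (vii)–(ix) were not.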

SDT derived models
Models (i)–(vi) assumed that a decision about the identity of the stimulus was made by comparing a continuous decision variable for the discrimination judgment, $\delta_{id}$, with the free criterion parameter $\theta_{id}$. Participants responded $R_{id} = 0$ when $\delta_{id} < \theta_{id}$, and $R_{id} = 1$ when $\delta_{id} > \theta_{id}$. The decision variable for the discrimination judgment was modelled as a random sample from a Gaussian distribution:
$$\delta_{id} \sim \mathcal{N}\left(\left(S_{id} - \tfrac{1}{2}\right) d_k,\ \sigma\right)$$
The stimulus strength $d_k$ was a free parameter specific to each SOA $k$. When $S_{id} = 0$, the distribution of $\delta_{id}$ was shifted to the left by a distance of $d_k/2$; when $S_{id} = 1$, it was shifted by the same distance to the right. Thus, $d_k$ denotes the distance between the distributions generated by the two possible identities of the stimulus and is in this respect equivalent to the sensitivity parameter $d'$ in standard SDT. Concerning the standard deviation $\sigma$, model fitting was performed under two different assumptions about the variability of $\delta_{id}$: In the first set of analyses, $\sigma$ was fixed at 1 for both identities of the stimulus. In the second set of analyses, the variability of $\delta_{id}$ could vary depending on $S_{id}$: An additional free parameter specified the ratio of the standard deviations of $\delta_{id}$ generated by the two possible identities of the stimulus.
A specific degree of confidence $C \in \{1, 2, 3, 4\}$ was determined by comparing the decision variable for confidence, $\delta_{conf}$, against a set of three criteria $\theta_{conf}$. Each criterion delineated two adjacent categories of confidence; e.g., participants selected category 2 if $\delta_{conf}$ fell between $\theta_{conf,1}$ (which separated categories 1 and 2) and $\theta_{conf,2}$ (which separated categories 2 and 3). To be consistent with standard SDT, we fitted three different criteria for each of the two response options. The models differed in how $\delta_{conf}$ was determined.
SDT rating model. According to model (i), the decision variables for identification and confidence were identical:
$$\delta_{conf} = \delta_{id}$$
Noisy SDT model. According to model (ii), $\delta_{conf}$ was sampled from a Gaussian distribution with a mean equal to the decision variable $\delta_{id}$ and the standard deviation $\sigma_{conf}$, which was an additional free parameter:
$$\delta_{conf} \sim \mathcal{N}\left(\delta_{id},\ \sigma_{conf}\right)$$
Noisy decay model. According to model (iii), $\delta_{conf}$ was also sampled from a Gaussian distribution with the standard deviation $\sigma_{conf}$. The mean of $\delta_{conf}$ depended on $\delta_{id}$, but was reduced by multiplication with a signal reduction parameter $\rho_k$:
$$\delta_{conf} \sim \mathcal{N}\left(\rho_k\,\delta_{id},\ \sigma_{conf}\right)$$
The signal reduction parameter $\rho_k$ was a separate free parameter for each SOA and was bounded between 0 and 1.
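Because the decision variables are Gaussian, the response and confidence probabilities of the SDT rating model (i) follow directly from differences of normal cumulative distribution functions, which is what makes an analytic likelihood possible. The sketch below assumes σ = 1 by default, codes the stimulus identity as 0 or 1, and orders the three confidence criteria outward from the identification criterion (θ < c1 < c2 < c3 for R = 1, and θ > c1 > c2 > c3 for R = 0); the function names and example parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def sdt_rating_probs(stim, d, theta, crit_r0, crit_r1, sigma=1.0):
    """P(R = r, C = j | S = stim) under the SDT rating model, r in {0, 1}, j in 1..4.

    delta ~ N((stim - 0.5) * d, sigma); R = 1 if delta > theta.
    crit_r1: three criteria above theta (ascending) for R = 1.
    crit_r0: three criteria below theta (descending) for R = 0.
    Confidence grows with the distance of delta from theta.
    """
    dist = NormalDist(mu=(stim - 0.5) * d, sigma=sigma)
    edges_r1 = [theta, *crit_r1, math.inf]
    edges_r0 = [theta, *crit_r0, -math.inf]
    probs = {}
    for j in range(4):
        probs[(1, j + 1)] = dist.cdf(edges_r1[j + 1]) - dist.cdf(edges_r1[j])
        probs[(0, j + 1)] = dist.cdf(edges_r0[j]) - dist.cdf(edges_r0[j + 1])
    return probs


def neg_log_likelihood(counts, probs):
    """Multinomial negative log-likelihood (up to a constant) of observed counts."""
    return -sum(n * math.log(probs[key]) for key, n in counts.items() if n > 0)


p = sdt_rating_probs(stim=1, d=2.0, theta=0.0,
                     crit_r0=(-0.5, -1.0, -1.5), crit_r1=(0.5, 1.0, 1.5))
print({k: round(v, 3) for k, v in sorted(p.items())})
```

In a full fit, these probabilities would be computed for each SOA (with its own d_k) and both stimulus identities, and the summed negative log-likelihood of the observed response-by-confidence counts would be minimized over the parameters.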
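WEV model. According to model (iv), $\delta_{conf}$ was again sampled from a Gaussian distribution with the standard deviation $\sigma_{conf}$:
$$\delta_{conf} \sim \mathcal{N}\left((1 - w)\,\delta_{id} + w\,(2R_{id} - 1)\,(d_k - \bar{d}),\ \sigma_{conf}\right)$$
The free parameter $w$ determined the relative weight of the evidence about the identity of the stimulus and of the strength of the stimulus. The term $(2R_{id} - 1)$ ensured that strong stimuli tended to shift the location of the distribution in a way that made high confidence more likely, and, likewise, that weak stimuli tended to shift the location of the distribution in a way that increased the probability of low confidence. $\bar{d}$ denotes the mean of $d_k$ across the four SOAs and was added to the formula to increase stability during parameter fitting.
Two-channel model. According to model (v), $\delta_{conf}$ was again sampled from a Gaussian distribution, but now independently of $\delta_{id}$:
$$\delta_{conf} \sim \mathcal{N}\left(a\left(S_{id} - \tfrac{1}{2}\right) d_k,\ \sigma\right)$$
The free parameter $a$ expressed the fraction of signal available to the second channel relative to the signal available to the first channel.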
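SDT model with postdecisional evidence. According to model (vi), $\delta_{conf}$ was again sampled from a Gaussian distribution:
$$\delta_{conf} \sim \mathcal{N}\left(\delta_{id} + b\,(2S_{id} - 1)\,d_k,\ \sigma_{conf}\right)$$
The free parameter $b$ indicated the amount of postdecisional accumulation, and the term $(2S_{id} - 1)$ ensured that postdecisional accumulation tended to decrease $\delta_{conf}$ when $S_{id} = 0$ and to increase it when $S_{id} = 1$.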
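To see why the WEV model (iv) can produce the double increase pattern, the following Monte Carlo sketch simulates it with made-up parameter values (w = 0.7, σ_conf = 0.5, three stimulus strengths) and a simplification: the identification criterion is fixed at 0, and the same confidence criteria are applied to the evidence in favour of the chosen response for both responses. Because the second term of the mean pushes δ_conf towards higher confidence for strong stimuli regardless of accuracy, confidence rises with stimulus strength in correct and in incorrect trials.

```python
import random


def simulate_wev(d_values, w=0.7, sigma_conf=0.5,
                 conf_criteria=(-0.5, 0.5, 1.5), n=20000, seed=0):
    """Mean confidence (1-4) in correct and error trials for each stimulus strength d."""
    rng = random.Random(seed)
    d_bar = sum(d_values) / len(d_values)  # mean stimulus strength
    results = {}
    for d in d_values:
        total = {True: 0.0, False: 0.0}
        count = {True: 0, False: 0}
        for _ in range(n):
            stim = rng.randint(0, 1)
            delta_id = rng.gauss((stim - 0.5) * d, 1.0)
            resp = 1 if delta_id > 0 else 0          # criterion fixed at 0
            sign = 2 * resp - 1
            mean_conf = (1 - w) * delta_id + w * sign * (d - d_bar)
            delta_conf = rng.gauss(mean_conf, sigma_conf)
            evidence_for_choice = sign * delta_conf
            conf = 1 + sum(evidence_for_choice > c for c in conf_criteria)
            correct = resp == stim
            total[correct] += conf
            count[correct] += 1
        results[d] = (total[True] / count[True], total[False] / count[False])
    return results


for d, (c, e) in simulate_wev((0.5, 1.5, 3.0)).items():
    print(f"d = {d}: correct = {c:.2f}, error = {e:.2f}")
```

With a small weight w, the first term dominates and the simulated pattern moves towards the folded X-pattern instead, so the weight controls which of the two statistical patterns the model produces.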

Non-SDT models
Models (vii)–(ix) assumed a different decision architecture for the identification judgment than models (i)–(vi).
Detection heuristic model. According to model (vii), there were two separate decision variables for the identification judgment, $\delta_{id0}$ and $\delta_{id1}$, each belonging to one possible identity of the stimulus; a bias parameter reflected the a priori bias in favour of $R_{id} = 1$. Participants were assumed to respond $R_{id} = 0$ when $\delta_{id0} > \delta_{id1}$, and $R_{id} = 1$ when $\delta_{id0} < \delta_{id1}$. Confidence judgments were based only on the decision variable pertaining to the selected response: When $R_{id} = 0$, $\delta_{id0}$ was compared against a series of confidence criteria $\theta_{conf0}$ to select a specific degree of confidence; and when $R_{id} = 1$, the comparison was based on $\delta_{id1}$ and a second set of criteria $\theta_{conf1}$.

2-D Bayesian model. According to model (viii), the posterior probability that the identity of the stimulus was 1 was computed from the two decision variables $\delta_{id0}$ and $\delta_{id1}$, summing over the SOAs $\Delta t$:
$$P(S_{id} = 1 \mid \delta_{id0}, \delta_{id1}) = \frac{\sum_{t} P(\delta_{id0} \mid \Delta t = t, s, S_{id} = 1)\, P(\delta_{id1} \mid \Delta t = t, s, S_{id} = 1)}{\sum_{t,i} P(\delta_{id0} \mid \Delta t = t, s, S_{id} = i)\, P(\delta_{id1} \mid \Delta t = t, s, S_{id} = i)}$$
A specific identity and degree of confidence were chosen by comparing the posterior probability $P(S_{id} = 1 \mid \delta_{id0}, \delta_{id1})$ against a set of criteria. It was assumed that the possible identities and degrees of confidence formed an ordered set of decision options. Each criterion delineated two adjacent decision options; e.g., participants chose to respond that the identity was 1 and confidence was 1 if $P(S_{id} = 1 \mid \delta_{id0}, \delta_{id1})$ was smaller than the criterion associated with identity 1 and confidence 2, and at the same time greater than the criterion for identity 0 and confidence 1. Finally, it was assumed that observers did not always give the response they intended to. When a lapse occurred, identification and confidence responses were assumed to be random with equal probabilities. The lapse rate $\lambda$ was an additional free parameter.
Two high thresholds model. Model (ix), the two high thresholds model, assumed that the decision variable for the identification judgment was not continuous but categorical, $\delta_{id} \in \{0, 0.5, 1\}$: Observers could either detect the identity of the stimulus and choose the response accordingly ($R_{id} = 0$ if $\delta_{id} = 0$, and $R_{id} = 1$ if $\delta_{id} = 1$), or they could be in a state of uncertainty, $\delta_{id} = 0.5$, in which no information about the identity was available and they responded by random guessing. The probability of detecting the identity of the stimulus depended on the four SOAs as well as on the identity of the stimulus, resulting in a total of eight detection parameters $P(\delta_{id} = S_{id} \mid S_{id}, \Delta t)$. A guessing parameter $g$ determined the probability with which observers responded $R_{id} = 1$ when they were in the state of uncertainty. A specific degree of confidence was sampled randomly depending on the three possible states of $\delta_{id}$ and the response $R_{id}$. As the response was fixed when observers detected the identity, there were four different sets of probabilities to determine confidence judgments: $P(C = j \mid \delta_{id} = 0)$, $P(C = j \mid \delta_{id} = 1)$, $P(C = j \mid \delta_{id} = 0.5, R_{id} = 0)$, and $P(C = j \mid \delta_{id} = 0.5, R_{id} = 1)$.
All $P(\delta_{id} = S_{id} \mid S_{id}, \Delta t)$, $P(C = j \mid \delta_{id}, R_{id})$ and $g$ were free parameters.
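For the two high thresholds model (ix), the predicted joint probabilities of response and confidence follow from a mixture of the detection state and the guessing state. The sketch below computes them for one stimulus identity and one SOA; the function name, argument names, and example values are hypothetical.

```python
def two_ht_probs(stim, p_detect, g, conf_detect, conf_guess):
    """P(R = r, C = j | S = stim) under a two high thresholds model.

    stim: stimulus identity (0 or 1)
    p_detect: probability of detecting the identity for this stimulus and SOA
    g: probability of responding R = 1 in the uncertain state
    conf_detect: [P(C = j | detected identity stim) for j = 1..4]
    conf_guess: {r: [P(C = j | uncertain, R = r) for j = 1..4]}
    """
    probs = {}
    for r in (0, 1):
        p_guess_r = (1 - p_detect) * (g if r == 1 else 1 - g)
        for j in range(4):
            p = p_guess_r * conf_guess[r][j]
            if r == stim:                 # detection always yields the correct response
                p += p_detect * conf_detect[j]
            probs[(r, j + 1)] = p
    return probs


probs = two_ht_probs(stim=1, p_detect=0.6, g=0.5,
                     conf_detect=[0.05, 0.10, 0.25, 0.60],
                     conf_guess={0: [0.50, 0.30, 0.15, 0.05],
                                 1: [0.40, 0.30, 0.20, 0.10]})
accuracy = sum(probs[(1, j)] for j in range(1, 5))
print(round(accuracy, 3))
```

Accuracy for S = 1 is p_detect + (1 − p_detect)·g, here 0.6 + 0.4 × 0.5 = 0.8, and confidence in error trials comes only from the guessing state.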

Model fitting
The nine models were fitted to the combined distributions of orientation discrimination and confidence judgments separately for each participant. First, the frequency of each confidence category was counted for each orientation of the stimulus and each orientation response. Then, for each model, the set of parameters that minimized the negative log-likelihood was determined. For models (i)–(vii) and (ix), the likelihood was calculated analytically; only for the 2-D Bayesian model was the likelihood approximated by simulation. Minimization was performed using a general SIMPLEX minimization routine (Nelder & Mead, 1965). To quantify the goodness of fit of the nine models, we calculated the BIC (Schwarz, 1978) and the AICc (Burnham & Anderson, 2002), a variant of the Akaike information criterion (Akaike, 1974), from the negative log-likelihood of each model fit for each participant and the number of trials.
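For reference, the information criteria can be computed from the minimized negative log-likelihood (NLL), the number of free parameters k, and the number of trials n, using the standard definitions AIC = 2k + 2·NLL, AICc = AIC + 2k(k + 1)/(n − k − 1), and BIC = k·ln(n) + 2·NLL. The sketch below assumes these standard definitions; the NLL values and parameter counts in the example are invented, and n = 960 is the nominal number of experimental trials per participant (24 blocks × 40 trials) before any exclusions.

```python
import math


def aicc(nll: float, k: int, n: int) -> float:
    """Small-sample corrected Akaike information criterion."""
    aic = 2 * k + 2 * nll
    return aic + 2 * k * (k + 1) / (n - k - 1)


def bic(nll: float, k: int, n: int) -> float:
    """Bayesian information criterion (Schwarz, 1978)."""
    return k * math.log(n) + 2 * nll


n_trials = 960
# Hypothetical fits: (negative log-likelihood, number of free parameters)
fits = {"model A": (1450.0, 17), "model B": (1440.0, 25)}
for name, (nll, k) in fits.items():
    print(name, round(aicc(nll, k, n_trials), 1), round(bic(nll, k, n_trials), 1))
```

Lower values indicate a better trade-off between fit and complexity. With these made-up numbers, AICc favours model B whereas BIC favours model A, because BIC penalizes each additional parameter by ln(n) rather than roughly 2, which is one reason to report both criteria.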

Predictions of ERP amplitudes
To determine expected ERP amplitudes from model fits, we first used the parameter sets obtained during model fitting of the behavioural data to calculate the probabilities of all four confidence categories as a function of SOA and choice accuracy. Then, the expected ERP amplitude was determined as the sum of the transformed confidence categories, weighted by the probability of each confidence category. The transformations of confidence were determined by SIMPLEX minimization of the sum of squared deviations between the expected and the observed ERP amplitudes across trials, separately for each participant. Two types of transformations of confidence were used: a linear transformation and a monotonic transformation. For the monotonic transformation, there was one free parameter for each of the four confidence categories, specifying the expected ERP amplitude. The fitting algorithm was constrained to ensure that the expected ERP amplitude either increased or decreased monotonically with confidence. Finally, the correlations between predicted and observed ERP amplitudes were assessed across trials separately for each participant.
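The prediction step for the linear transformation can be sketched as follows. Since the expected value of a linear transformation a + b·C equals a + b·E[C], the predicted amplitude of a trial depends only on the model-predicted mean confidence of its condition (SOA × accuracy). In this sketch, intercept and slope are obtained by ordinary least squares in closed form instead of SIMPLEX minimization (for a linear transformation, both minimize the same sum of squares), and the condition labels, probabilities, and amplitudes are hypothetical.

```python
from math import sqrt


def expected_confidence(conf_probs):
    """E[C] for confidence categories 1..4 given their probabilities."""
    return sum(j * p for j, p in enumerate(conf_probs, start=1))


def fit_linear(x, y):
    """Least-squares intercept and slope of y = a + b * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical model-predicted confidence distributions per condition
predicted = {
    ("short", "correct"): [0.40, 0.30, 0.20, 0.10],
    ("long", "correct"): [0.05, 0.10, 0.25, 0.60],
    ("long", "error"): [0.30, 0.30, 0.20, 0.20],
}
trial_conditions = [("short", "correct"), ("long", "correct"), ("long", "error")] * 4
observed = [3.1, 7.0, 4.2, 2.8, 6.4, 3.9, 3.3, 6.9, 4.6, 2.9, 7.2, 4.1]  # µV, made up

x = [expected_confidence(predicted[c]) for c in trial_conditions]
a, b = fit_linear(x, observed)
predicted_amp = [a + b * xi for xi in x]
print(f"a = {a:.2f}, b = {b:.2f}, r = {pearson_r(predicted_amp, observed):.2f}")
```

The monotonic variant would replace a + b·j with four free amplitude values, one per confidence category, constrained to be ordered; that constrained fit is not shown here.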

Bayesian statistics
All statistical tests were based on Bayes factors (Rouder et al., 2009), as implemented in the R package BayesFactor (Morey & Rouder, 2015). To test if an ERP component was related to confidence or to the SOA, we used a Bayesian linear mixed regression model with a fixed effect of SOA or confidence and a random effect of participant on the intercept, using default mixture-of-variance priors and a scale parameter of r = 1/2 (Rouder & Morey, 2012).
Conceptually, the prior represents the a priori belief that smaller regression slopes are more plausible than large slopes, although even very large slopes are not impossible. Each Bayes factor represents a comparison between the full regression model and a regression model with only the random effect of participant. To compare fits between models of confidence, the Bayesian equivalent of a paired t-test was used, assuming a Cauchy distribution with a scale parameter of 1 as prior for the standardized effect size δ, a choice recommended as default (Rouder et al., 2009). The strength of statistical evidence was interpreted according to an established guideline (Lee & Wagenmakers, 2013). In addition, we constructed 95% highest density intervals (HDIs) of the regression slopes or mean differences from 10^6 samples from the posterior distribution, using the same models and priors as for the Bayes factors.
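For the paired comparisons of model fits, the default Bayes factor of Rouder et al. (2009) reduces to a one-dimensional integral. The sketch below evaluates that integral numerically for a one-sample (paired) t statistic with a Cauchy prior of scale r on the standardized effect size, using r = 1 as stated above. It is a standard-library illustration, not the BayesFactor implementation (in R, ttestBF computes this quantity), it does not cover the mixed regression models, and the integration range and number of steps are ad hoc choices.

```python
import math


def jzs_bf10(t: float, n: int, r: float = 1.0,
             lo: float = -20.0, hi: float = 20.0, steps: int = 4000) -> float:
    """JZS Bayes factor BF10 for a one-sample or paired t test (Rouder et al., 2009).

    The integral over g (inverse-gamma(1/2, 1/2) prior) is evaluated on a
    log scale, g = exp(u), with the trapezoidal rule.
    """
    nu = n - 1
    likelihood_h0 = (1 + t * t / nu) ** (-(nu + 1) / 2)

    def integrand(u: float) -> float:
        g = math.exp(u)
        scale = 1 + n * g * r * r
        likelihood_h1 = scale ** -0.5 * (1 + t * t / (scale * nu)) ** (-(nu + 1) / 2)
        prior_g = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g))
        return likelihood_h1 * prior_g * g  # the extra g is the Jacobian dg/du

    h = (hi - lo) / steps
    area = 0.5 * (integrand(lo) + integrand(hi))
    area += sum(integrand(lo + i * h) for i in range(1, steps))
    return area * h / likelihood_h0


for t in (0.0, 2.0, 4.0):
    print(t, round(jzs_bf10(t, n=25), 3))
```

A t value of 0 yields BF10 below 1, i.e. evidence for the null hypothesis, which illustrates the property of Bayes factors mentioned in the Introduction: they can quantify evidence against an effect, not only for it.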

Behavioural results
Orientation discrimination accuracy ranged from chance level at the shortest SOA (M = 50.8%, SD = 2.6) to close to ceiling at the longest SOA (M = 94.8%, SD = 8.5; see Fig. 2A). Mean confidence on the four-point scale ranged from M = 1.6 (SD = 0.6) at the shortest SOA to M = 3.7 (SD = 0.4) at the longest SOA. As Fig. 2B shows, confidence increased with SOA in correct as well as in incorrect trials.

ERP results
The effects of confidence were examined in correct trials during the time windows of the three candidate correlates of confidence: P3, ERN, and Pe. Consistent with our prediction, there was extremely strong evidence that EEG activity in the P3 time window (350–500 ms after stimulus onset) was positively associated with confidence, 95% HDI [1.8, 2.9] µV/scale step, BF10 = 3.6 × 10^10 (see Fig. 3A). Fig. 4A shows that the association between ERPs and confidence in correct trials during the P3 time window had a centroparietal distribution over the scalp, consistent with known topographies of the P3 in difficult perceptual discrimination tasks (Koivisto & Revonsuo, 2010).

Cognitive modelling
Modelling confidence judgments. Fig. 6 shows confidence judgments as a function of SOA and discrimination accuracy compared to the model predictions based on the parameter sets identified during fitting. The WEV model, the noisy decay model, the detection heuristic model and the 2-D Bayesian model correctly predicted that confidence in incorrect trials increases with SOA (Fig. 6A, E, H, I). The other models could not reproduce the double increase pattern (Fig. 6B, C, D, F, G).
Fig. 6. Mean confidence judgments depending on stimulus-onset-asynchrony (x-axis) and accuracy of the orientation response. Different panels show the predictions of the different models based on the sets of parameters identified during model fitting, assuming constant variances of the decision variable. Solid lines indicate the predictions for correct trials, dashed lines for incorrect trials. Blue circles indicate observed confidence judgments in correct trials, and red triangles in incorrect trials. Error bars = 1 SEM.
Quantifying model fit using the corrected Akaike information criterion (AICc) and the Bayesian information criterion (BIC) showed that the best fit to the data was obtained by the WEV model, followed by the noisy decay model (see Table 1). Regarding AICc, the evidence that the WEV model performed better than the noisy decay model was not conclusive, but there was very strong evidence that the WEV model performed better than the two high thresholds model and extreme evidence that the WEV model performed better than each of the other models. Regarding BIC, there was moderate evidence that the WEV model performed better than the detection heuristic model, strong evidence that it performed better than the noisy decay model, and extreme evidence that it performed better than each of the other five models. These results were essentially the same when the variances of the decision variable were allowed to differ between horizontal and vertical stimuli. To assess the accuracy of model classification, models were fitted to simulated data based on the parameter sets obtained during model fitting of the empirical data; model classification was found to be sufficiently accurate for the purpose of the present study.
Fig. 7A shows that the linear transformation of predicted confidence resulted in a reasonably accurate prediction of ERP amplitude in the P3 time window. In contrast, as can be seen from Fig. 7B, the predicted EEG in the ERN time window was more negative in correct than in incorrect trials, even though the ERN is well established as a negativity related to errors. The prediction did not match the established polarity of the ERN because the transformation was determined without any assumptions about the polarity of the EEG effects; when the transformation was constrained to a positive relationship between confidence and EEG amplitude, the prediction deviated even more strongly from the observed EEG at the time of the ERN.
Likewise, Fig. 7C shows that longer SOAs were associated with a positive shift in incorrect trials during the Pe time window, which was just opposite to the pattern observed with confidence judgments (cf. Fig. 2B) and therefore was not reproduced by the prediction. As a consequence, there was a medium-sized correction
The relationship between confidence and ERP amplitudes need not, of course, be linear. For this reason, we fitted non-linear transformations to the data from each subject by assigning to each level of confidence the voltage that minimized the prediction error with respect to ERP amplitude. The only restriction on the transformation was that the relationship between confidence and ERP amplitudes was assumed to be monotonic. Nevertheless, the predictions based on these individually fitted transformations were consistent only with amplitudes at the time of the P3, not with those at the time of the ERN or Pe.
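The linear and the monotonic transformations described above can be sketched as follows. Assuming that, for one participant, each trial provides a confidence level and an ERP amplitude, the linear mapping is an ordinary least-squares fit (optionally with the slope constrained to be non-negative), and the monotonic mapping is an isotonic regression solved with the pool-adjacent-violators algorithm, fitted in both directions with the better-fitting direction retained. This is an illustrative reconstruction under those assumptions, not the authors' code; all function names are hypothetical.

```python
from collections import defaultdict


def linear_map(confidence, amplitude, nonnegative_slope=False):
    """Least-squares intercept and slope for amplitude ~ a + b * confidence."""
    n = len(confidence)
    mean_c = sum(confidence) / n
    mean_a = sum(amplitude) / n
    sxx = sum((c - mean_c) ** 2 for c in confidence)
    sxy = sum((c - mean_c) * (a - mean_a) for c, a in zip(confidence, amplitude))
    slope = sxy / sxx
    if nonnegative_slope and slope < 0:
        # The constrained optimum then lies on the boundary b = 0.
        slope = 0.0
    return mean_a - slope * mean_c, slope


def pava(values, weights):
    """Weighted least-squares non-decreasing fit (pool adjacent violators)."""
    blocks = []  # each block: [pooled mean, total weight, number of levels]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the sequence of block means decreases.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, k2 = blocks.pop()
            m1, w1, k1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, k1 + k2])
    fitted = []
    for m, _, k in blocks:
        fitted.extend([m] * k)
    return fitted


def monotone_map(levels, amplitudes):
    """Assign one voltage per confidence level under a monotonic constraint."""
    sums, counts = defaultdict(float), defaultdict(int)
    for level, amp in zip(levels, amplitudes):
        sums[level] += amp
        counts[level] += 1
    keys = sorted(counts)
    means = [sums[k] / counts[k] for k in keys]
    weights = [counts[k] for k in keys]

    increasing = pava(means, weights)
    decreasing = [-v for v in pava([-m for m in means], weights)]

    def sse(fit):
        # Weighted error on level means; differs from trial-level SSE only by a constant.
        return sum(w * (m - f) ** 2 for m, w, f in zip(means, weights, fit))

    best = increasing if sse(increasing) <= sse(decreasing) else decreasing
    return dict(zip(keys, best))
```

Fitting both directions reflects the assumption that only monotonicity, not its direction, was constrained.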

Discussion
The present study revealed a close correlation between decision confidence and EEG

Bayesian accounts of the relation between P3 and confidence
How can the association between P3 and decision confidence be reconciled with the extensive literature on the various roles of the P3? An explanation may be given in terms of Bayesian brain theory, which states that the brain must use representations of certainty to make optimal perceptual computations (Knill & Pouget, 2004). One possible interpretation in terms of Bayesian brain theory is that the P3 directly reflects certainty within the decision process (Herding et al., 2019). In line with this interpretation, the P3 showed a statistical pattern theoretically expected to signify confidence in a vibrotactile task (Herding et al., 2019). Moreover, the P3 is related to the accumulation of sensory evidence within the decision process (O'Connell et al., 2012; Twomey et al., 2015). Finally, the P3 is suppressed for highly visible stimuli, even when observers are not required to make a perceptual decision (Pitts, Padwal, Fennelly, Martínez, & Hillyard, 2014). These findings converge with a line of research suggesting that decision confidence may emerge directly from the decision process. For example, neurons in the parietal cortex of rhesus monkeys represented both the formation of the direction decision and the degree of certainty ). Likewise, human EEG correlates of decision formation and confidence coincided in time and in reconstructed sources in a face vs. car discrimination task (Gherman & Philiastides, 2015).
A second interpretation in terms of Bayesian brain theory is that the P3 reflects sensory representations that include the reliability of the percept (Kopp et al., 2016). This second view is consistent with classical interpretations of the P3 as an update of working memory in response to task-relevant events (Donchin & Coles, 1988) or as a global broadcast of information within a global neuronal workspace (Sergent et al., 2005). These updated or broadcast representations may encompass the reliability of the percept (Shea & Frith, 2019), which is why the P3 should be correlated with confidence judgments. In line with this interpretation, the WEV-model assumes that confidence is determined by the perceived strength or reliability of the percept, based on evidence about both choice-relevant and choice-irrelevant features. This means that the inferred computational principles underlying decision confidence also include a representation of the reliability of the percept.
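To illustrate why weighting choice-irrelevant evidence can make confidence rise with SOA even in incorrect trials, here is a toy Monte Carlo sketch. It assumes a simplified WEV-style parameterization rather than the model fitted in this study: the decision variable x ~ N(S·d/2, 1) for stimulus S ∈ {−1, 1}, the response R is the sign of x, and the confidence variable has mean (1 − w)·x + w·d·R and SD σ, where the sensitivity d stands in for the SOA-dependent stimulus strength. Mapping the continuous confidence variable onto rating categories is omitted. With w = 0 the sketch reduces to plain signal detection confidence, which produces the folded X-pattern; with w > 0, confidence in errors can increase with d.

```python
import random


def simulate_confidence(d, w, sigma=0.5, n_trials=100_000, seed=1):
    """Mean confidence in correct and incorrect trials for one stimulus strength d.

    Toy WEV-style model (assumed parameterization): the confidence variable mixes
    the decision variable x with the stimulus strength d using weight w.
    """
    rng = random.Random(seed)
    totals = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for _ in range(n_trials):
        s = rng.choice((-1, 1))                         # true orientation (e.g. horizontal vs vertical)
        x = rng.gauss(s * d / 2, 1.0)                   # choice-relevant decision variable
        r = 1 if x >= 0 else -1                         # orientation response
        y = rng.gauss((1 - w) * x + w * d * r, sigma)   # confidence variable
        correct = r == s
        totals[correct] += r * y                        # evidence in favour of the chosen response
        counts[correct] += 1
    return {k: totals[k] / counts[k] for k in totals}
```

Comparing `simulate_confidence(0.5, w)` with `simulate_confidence(3.0, w)` for w = 0 and w = 0.5 contrasts the folded X-pattern with the double increase pattern.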

Role of the ERN and Pe in the present task?
In the present study, EEG activity in the ERN time window can be interpreted specifically as error detection, but not as decision confidence. EEG activity at the time of the ERN does not reflect confidence because the effects of SOA were opposite to what was expected from the observed confidence judgments. At least in the present study, the ERN may not be related to postdecisional sensory evidence either, because sensory evidence in correct trials is expected to increase with SOA, whereas at the time of the ERN the only reliable effect was a large negative shift specifically in incorrect trials at the longest SOA. The absence of an ERN at shorter SOAs is in line with a previous study showing that the elicitation of an ERN requires participants to know which response is the correct one (Di Gregorio et al., 2018). Likewise, in the present study, observers did not know for sure which response had been correct at shorter SOAs because the mask impeded perception of the target. These findings are also consistent with previous studies showing that the ERN occurs only when observers make erroneous responses to stimuli rated as "visible" (Charles et al., 2013, 2014). Although we did not measure conscious awareness in the present study, we can extrapolate from other studies using the same task that observers' conscious percepts of the stimuli were degraded at shorter SOAs (Rausch & Zehetleitner, 2019a; Zehetleitner & Rausch, 2013); possibly, weakly conscious stimuli are not sufficient to trigger an ERN.
A possible interpretation of the role of the Pe in the present study is that it reflects the accumulation of postdecisional sensory evidence. At least in the present study, the Pe does not reflect decision confidence because the statistical patterns of the Pe and of confidence as functions of SOA and choice accuracy are not compatible. In addition, the Pe does not exclusively reflect error awareness, because EEG activity at the time of the Pe was correlated with confidence in correct trials.
However, the pattern of the Pe as a function of SOA and choice accuracy matches the diverging pattern between correct and incorrect responses expected from postdecisional accumulation of sensory evidence (Moran et al., 2015). The contribution of postdecisional sensory evidence to confidence may be relatively small in the present paradigm, because the mask prevents ongoing accumulation of evidence from sensory memory. Accordingly, cognitive modelling showed that the WEV model fitted confidence much better than the SDT model with postdecisional evidence. If the Pe reflects postdecisional accumulation of evidence, this would explain why effects at the time of the Pe seemed to be limited to high-confidence trials in the present study: the efficiency of the mask varies across trials, and presumably the mask had been relatively ineffective in trials in which observers reported high degrees of confidence. Moreover, if the Pe represents postdecisional sensory evidence, this would also explain why a previous study detected an association between the Pe and all degrees of confidence (Boldt & Yeung, 2015): as stimuli in that study were not masked, postdecisional accumulation of sensory evidence may have been more effective than in the present study.
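The diverging pattern expected from postdecisional accumulation can likewise be illustrated with a toy simulation. The sketch assumes that the response is based on pre-decisional evidence x1 ~ N(S·d/2, 1), that evidence keeps accumulating after the response for a hypothetical duration g (mean g·S·d/2, variance g), and that confidence is the total evidence in favour of the chosen response; a g close to 0 is meant to mimic a mask that cuts postdecisional accumulation short. This is a simplified stand-in, not the SDT model with postdecisional evidence that was fitted in this study, and the function and parameter names are hypothetical.

```python
import math
import random


def simulate_postdecisional(d, g, n_trials=100_000, seed=2):
    """Mean confidence in correct/incorrect trials with postdecisional evidence of duration g."""
    rng = random.Random(seed)
    totals = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for _ in range(n_trials):
        s = rng.choice((-1, 1))
        x1 = rng.gauss(s * d / 2, 1.0)      # evidence available at the time of the response
        r = 1 if x1 >= 0 else -1
        # Evidence accumulated after the response (drift s*d/2, unit variance per unit time).
        x2 = rng.gauss(g * s * d / 2, math.sqrt(g)) if g > 0 else 0.0
        correct = r == s
        totals[correct] += r * (x1 + x2)    # total evidence for the chosen response
        counts[correct] += 1
    return {k: totals[k] / counts[k] for k in totals}
```

With g = 1, confidence in errors falls much more steeply with d than with g = 0, while confidence in correct trials rises more steeply, i.e., correct and incorrect trials diverge.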
Finally, the Pe may not only be sensitive to postdecisional sensory evidence, but may reflect also other sources of information, including response conflict, efference copy, proprioception, perception of action effects, and interoception (Ullsperger et al., 2010;Wessel et al., 2011).

Statistical signatures of confidence?
The present study demonstrates that statistical patterns of confidence can provide a strong test for identifying correlates of confidence, although it is crucial to validate statistical signatures of confidence empirically against behavioural measures of confidence. It has been argued that if confidence is determined objectively as the posterior probability of being correct, the pattern referred to as the folded X-pattern is the statistical signature of confidence (Sanders et al., 2016). Therefore, a substantial number of recent studies have searched for the folded X-pattern to empirically identify correlates of decision confidence (Braun et al., 2018; Fetsch et al., 2014; Herding et al., 2019; Lak et al., 2017; Sanders et al., 2016; Urai et al., 2017). However, it has been shown mathematically that the folded X-pattern is neither a necessary nor a sufficient condition for Bayesian confidence (Adler & Ma, 2018; Rausch & Zehetleitner, 2019b). The present study showed empirically that a second statistical pattern of confidence exists and can be used to identify correlates of confidence. Had we not measured decision confidence directly and instead relied on the purported folded X-pattern signature, the Pe, not the P3, would have been falsely considered a correlate of confidence.
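The contrast between the folded X-pattern and the double increase pattern can be made explicit with a small helper that compares the signs of least-squares slopes of a measure (mean confidence or a candidate EEG amplitude) against stimulus strength, separately for correct and incorrect trials. The function names, the tolerance parameter and the purely sign-based criterion are illustrative assumptions; in practice, slopes would be tested statistically across participants. The numbers in the usage lines are made up.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def classify_pattern(strength, mean_correct, mean_error, tol=0.0):
    """Label the joint trend of a measure in correct and incorrect trials."""
    b_correct = slope(strength, mean_correct)
    b_error = slope(strength, mean_error)
    if b_correct > tol and b_error < -tol:
        return "folded X"          # diverging: rises in correct, falls in incorrect trials
    if b_correct > tol and b_error > tol:
        return "double increase"   # rises in both correct and incorrect trials
    return "other"


# Hypothetical numbers, for illustration only
levels = [1, 2, 3, 4]
print(classify_pattern(levels, [2.0, 2.5, 3.0, 3.5], [2.0, 1.8, 1.6, 1.4]))  # folded X
print(classify_pattern(levels, [2.0, 2.5, 3.0, 3.5], [1.0, 1.2, 1.4, 1.6]))  # double increase
```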

Conclusion
The present results suggest that there is no single EEG correlate of decision confidence and error awareness. EEG activity over parietal electrodes 350–500 ms after stimulus onset is closely correlated with decision confidence. However, EEG components after the response, which have been established as markers of error detection or error awareness, are dissociated from decision confidence.