Confidence response times: Challenging postdecisional models of confidence

Even though the nature of confidence computations has been the topic of intense interest, little attention has been paid to what confidence response times (cRTs) reveal about the underlying confidence computations. Several previous studies found cRTs to be negatively correlated with confidence in the group as a whole and consequently hypothesized the existence of an intrinsic relationship of cRT with confidence for all subjects. This hypothesis was further used to support postdecisional models of confidence that predict that cRT and confidence should always be negatively correlated. Here we test the alternative hypothesis that cRT is driven by the frequency of confidence responses such that the most frequent confidence ratings are inherently made faster regardless of whether they are high or low. We examined cRTs in three large data sets from the Confidence Database and found that the lowest cRTs occurred for the most frequent confidence rating. In other words, subjects who gave high confidence ratings most frequently had negative confidence–cRT relationships, whereas subjects who gave low confidence ratings most frequently had positive confidence–cRT relationships. In addition, we found a strong across-subject correlation between response time and cRT, suggesting that response speed for both the decision and the confidence rating is influenced by a common factor. Our results show that cRT is not intrinsically linked to confidence and strongly challenge several postdecisional models of confidence.
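The two per-subject quantities behind this comparison can be illustrated with a minimal sketch. It assumes that each subject's trials are available as (confidence, cRT in ms) pairs; the function names, the tie-breaking rule, and the toy data are hypothetical and do not reproduce the analysis code used for the paper. For each subject, the sketch finds the most frequent confidence rating and the ordinary least-squares slope of cRT on confidence.

```python
from collections import Counter
from statistics import mean


def most_frequent_rating(confidences):
    """Return the rating used most often (assumption: ties go to the lowest rating)."""
    counts = Counter(confidences)
    top = max(counts.values())
    return min(r for r, c in counts.items() if c == top)


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def summarize_subject(trials):
    """trials: list of (confidence, cRT_ms) pairs for one subject."""
    conf = [c for c, _ in trials]
    crt = [t for _, t in trials]
    return most_frequent_rating(conf), ols_slope(conf, crt)


# Toy data (made up): one subject who mostly answers 4 and is fastest there,
# and one who mostly answers 1 and is fastest there.
high_user = [(4, 400)] * 6 + [(3, 500)] * 2 + [(2, 600), (1, 700)]
low_user = [(1, 400)] * 6 + [(2, 500)] * 2 + [(3, 600), (4, 700)]

for name, trials in [("high_user", high_user), ("low_user", low_user)]:
    rating, slope = summarize_subject(trials)
    print(name, rating, round(slope, 2))
```

In this toy example the subject who favors high ratings has a negative slope and the subject who favors low ratings has a positive slope, which is the pattern predicted by the frequency hypothesis.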


Supplementary Materials for "Confidence response times: Challenging post-decisional models of confidence"
Sixing Chen¹, Dobromir Rahnev²
¹ School of Psychological and Cognitive Sciences, Peking University, Beijing, China
² School of Psychology, Georgia Institute of Technology, Atlanta, GA, USA

Supplementary Methods
To examine whether the results from the main paper replicate in a larger number of datasets, we relaxed criterion 3 used for the selection of datasets.
Specifically, we changed that criterion so that we would select datasets with at least 30 subjects (instead of 75 subjects) who each completed at least 150 trials per task (instead of 200). This resulted in selecting 5 additional datasets.
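The relaxed criterion can be written as a simple filter. The sketch below assumes a hypothetical summary of each dataset as a mapping from subject ID to trial count, ignores the per-task distinction, and reads the criterion as requiring at least 30 subjects who each completed at least 150 trials. The data structure, names, and toy numbers are illustrative and are not taken from the Confidence Database.

```python
def meets_criterion(trials_per_subject, min_subjects=30, min_trials=150):
    """True if at least min_subjects subjects each completed at least min_trials trials."""
    qualifying = sum(1 for n in trials_per_subject.values() if n >= min_trials)
    return qualifying >= min_subjects


# Hypothetical datasets: subject ID -> number of completed trials.
datasets = {
    "toy_A": {f"s{i}": 1000 for i in range(30)},  # 30 subjects, 1000 trials each
    "toy_B": {f"s{i}": 100 for i in range(80)},   # many subjects, too few trials
    "toy_C": {f"s{i}": 200 for i in range(20)},   # enough trials, too few subjects
}

selected = [name for name, d in datasets.items() if meets_criterion(d)]
print(selected)
```

Calling `meets_criterion(d, min_subjects=75, min_trials=200)` would apply the stricter thresholds mentioned above for comparison.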
However, two of the datasets could not be used and had to be excluded.
Specifically, the Siedlecka_2021 dataset has an experimental design in which subjects could choose not to respond on some trials, and in the Xu_2019_Exp1 dataset most subjects (27/30) used the maximum confidence level of 4 most frequently. These features made it impossible to run our analyses on these two datasets. Thus, we report results from the other 3 additional datasets: Maniscalco_2017_expt1, Maniscalco_2017_expt2, and Yeon_unpub_Exp2 (which we named Maniscalco1, Maniscalco2, and Yeon). The results of these datasets (Figures S1-3) are in line with the results of the datasets that we analyzed in the main paper.

Figure S1. Analyses of Maniscalco1 dataset. The name of the Maniscalco1 dataset in the Confidence Database is "Maniscalco_2017_expt1" (Maniscalco et al., 2017). In the task, circular patches of white noise were presented to the left and right of fixation. A sinusoidal grating was embedded in either the left or right patch of noise. Subjects indicated which patch contained the grating, left or right, with a keypress. After that, subjects rated confidence on a 4-point scale. 30 subjects each completed 1000 trials. More details about the dataset can be found in the original publication. In the dataset, both cRT-confidence and cRT-accuracy relationships are driven by the response frequency, and are not significant at the population level. (A) cRT for each confidence rating for each group formed based on the most frequent confidence rating. In line with Hypothesis 2, cRT was lowest for the most frequent confidence rating for groups 1 and 4, though the effect was less clear in groups 2 and 3, presumably due to the lower statistical power of this dataset. (B) Analyzing all groups together showed that the slope of the cRT-confidence relationship (i.e., βcRT~Confidence) decreases for the groups where the most frequent confidence rating is higher (slope = -54.36, t(...

The cRT-confidence relationship is not significant at the population level.
For the cRT-accuracy relationship, cRT is significantly lower in correct trials than in error trials. (A) cRT for each confidence rating for each group formed based on the most frequent confidence rating. In line with Hypothesis 2, cRT was lowest for the most frequent confidence rating for groups 1 and 4, though the effect was not present in groups 2 and 3, presumably due to the limited number of subjects in this experiment. (B) Analyzing all groups together showed that the slope of the cRT-confidence relationship (i.e., βcRT~Confidence) decreases for the groups where the most frequent confidence rating is higher (slope = -27.51, t(39) = -7.00, p = 2.2×10⁻⁸, Cohen's d = -1.11), indicating that the cRT-confidence relationship is driven by the response frequency. (C) There is no significant cRT-confidence relationship at the population level (slope = -11.50, 95% CI = [-24.24, 1.28], t(40.14) = -1.79, p = .08, Cohen's d = -.28, BF10 = .72), consistent with the lack of clear bias in the group towards either high or low confidence responses. (D) The cRT difference between correct and error trials decreases for the groups where the most frequent confidence rating is higher (slope = -19.70, t(39) = -5.69, p = 1.4×10⁻⁶, Cohen's d = -.90), indicating that the cRT-accuracy relationship is driven by the response frequency. (E) cRT was slightly lower in correct than in error trials (t(40) = -2.10, p = .04, Cohen's d = -.33, BF10 = 1.21). Error bars show SEM. ***, p < .001; *, p < .05; n.s., not significant.

Figure S3. Analyses of Yeon dataset. The name of the Yeon dataset in the Confidence Database is "Yeon_unpub_Exp2". In the task, a large number of white dots moved in random directions while a certain proportion of the dots moved coherently either to the left or to the right. Subjects indicated the direction of the coherently moving dots. After that, subjects rated confidence on a 4-point scale. 37 subjects each completed 800 trials.
More details about the dataset can be found at https://osf.io/g7ary. In the dataset, both cRT-confidence and cRT-accuracy relationships are driven by the response frequency, and are not significant at the population level. (A) cRT for each confidence rating for each group formed based on the most frequent confidence rating. In line with Hypothesis 2, cRT was lowest for the most frequent confidence rating for groups 1 and 4, though the effect was less clear in groups 2 and 3, presumably due to the lower statistical power of this dataset. (B) Analyzing all groups together showed that the slope of the cRT-confidence relationship (i.e., βcRT~Confidence) decreases for the groups where the most frequent confidence rating is higher (...

We computed the cRT-confidence relationship separately based on the odd and even trials for each subject, and then correlated the strengths of the relationship across subjects. We found very high correlations between odd and even trials for the strength of the cRT-confidence relationship, quantified as βcRT~Confidence (Bang: r = .78, p = 3.2×10⁻⁴²; Haddara1: r = .89, p = 3.8×10⁻¹⁵¹, bootstrapped r = .84; Haddara2: r = .99, p = 4.8×10⁻⁶¹, bootstrapped r = .98). Importantly, there were high proportions of subjects with consistently positive or consistently negative cRT-confidence relationships, suggesting the existence of substantial and reliable individual differences in the cRT-confidence relationship. In all computations, cRTs were in ms. Each dot corresponds to one subject. Solid lines indicate best-fitting regressions.

Figure S6. cRT-accuracy relationship at the individual level. We computed the cRT-accuracy relationship separately based on the odd and even trials for each subject, and then correlated the strengths of the relationship across subjects.
We found very high correlations between odd and even trials for the cRT-accuracy relationship, quantified as cRTcorrect - cRTerror (Bang: r = .46, p = 6.8×10⁻¹², bootstrapped r = .40; Haddara1: r = .52, p = 3.3×10⁻³², bootstrapped r = .54; Haddara2: r = .92, p = 9.8×10⁻³¹, bootstrapped r = .88). Crucially, there were high proportions of subjects with consistently positive or consistently negative cRT-accuracy relationships. In all computations, cRTs were in ms. Each dot corresponds to one subject. Solid lines indicate best-fitting regressions.
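
The odd-even procedure described in these captions can be sketched as follows for the cRT-confidence slope. The sketch assumes that each subject's trials are stored in presentation order as (confidence, cRT in ms) pairs; the function names and the simulated data are hypothetical, and the bootstrapped correlations reported above are not reproduced here.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


def split_half_slopes(trials):
    """Slopes of cRT on confidence for odd- and even-numbered trials.

    trials: list of (confidence, cRT_ms) pairs in presentation order.
    """
    halves = (trials[0::2], trials[1::2])  # 1st, 3rd, ... and 2nd, 4th, ...
    return tuple(ols_slope([c for c, _ in h], [t for _, t in h]) for h in halves)


def split_half_correlation(subjects):
    """Across-subject correlation of odd-trial and even-trial slopes."""
    odd, even = zip(*(split_half_slopes(t) for t in subjects))
    return pearson_r(odd, even)


# Simulated subjects (made-up numbers): each has a true slope plus Gaussian noise.
rng = random.Random(0)
subjects = []
for true_slope in (-60, -30, -10, 10, 30, 60):
    trials = []
    for _ in range(200):
        conf = rng.randint(1, 4)
        trials.append((conf, 500 + true_slope * conf + rng.gauss(0, 20)))
    subjects.append(trials)

r = split_half_correlation(subjects)
print(round(r, 3))
```

The same structure applies to the cRT-accuracy relationship if `ols_slope` is replaced by the difference between mean cRT on correct and on error trials within each half.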