Re-evaluating frontopolar and temporoparietal contributions to detection and discrimination confidence

Previously, we identified a subset of regions where the relation between decision confidence and univariate functional magnetic resonance imaging (fMRI) activity was quadratic, with stronger activation for both high and low compared with intermediate levels of confidence. We further showed that, in a subset of these regions, this quadratic modulation appeared only for confidence in detection decisions about the presence or absence of a stimulus, and not for confidence in discrimination decisions about stimulus identity (Mazor et al. 2021). Here, in a pre-registered follow-up experiment, we sought to replicate our original findings and identify the origins of putative detection-specific confidence signals by introducing a novel asymmetric-discrimination condition. The new condition required discriminating two alternatives but was engineered such that the distribution of perceptual evidence was asymmetric, just as in yes/no detection. We successfully replicated the quadratic modulation of subjective confidence in prefrontal, parietal and temporal cortices. However, in contrast with our original report, this quadratic effect was similar in detection and discrimination responses, but stronger in the novel asymmetric-discrimination condition. We interpret our findings as weighing against the detection-specificity of confidence signatures and speculate about possible alternative origins of a quadratic modulation of decision confidence.


Introduction
Adult humans are able not only to evaluate what they see and do not see, but also how confident they are in these percepts [1]. Investigations into the neural basis of metacognition reveal a network of brain regions where activation scales with perceptual confidence (for a coordinate-based meta-analysis, see [2]). However, a majority of previous computational modelling and neuroimaging studies of perceptual confidence have focused on understanding confidence in discrimination decisions (e.g. was it a bird or a plane?). By contrast, the computations and neural substrates supporting perceptual confidence for detection decisions (e.g. was there anything there at all?) remain largely uncharted territory. Mapping that territory is of considerable interest, both due to the conceptual overlap between (detection) confidence and perceptual awareness, and also because detection may invoke distinct computational demands that are not required in discrimination [3][4][5].
In a previous study [6], we compared the parametric effect of subjective decision confidence on brain activation in two perceptual decision-making tasks: a discrimination task (was the grating tilted clockwise or anticlockwise?) and a detection task (was there any grating present at all?). Replicating previous findings [2,7,8], we observed a linear effect of confidence in a set of predefined regions of interest, with high confidence levels associated with stronger (ventromedial prefrontal cortex, vmPFC; precuneus; ventral striatum) or weaker (posterior medial frontal cortex, pMFC) signals, across both tasks and responses. Exploratory analysis additionally revealed a widespread positive quadratic effect of confidence, with stronger signals associated with using the extreme ends of the confidence scale. In the right frontopolar cortex, right superior temporal sulcus (STS) and right pre-supplementary motor area (pre-SMA), this quadratic effect was stronger for the detection task, where participants decided whether a grating was present or absent. Additionally, in the right temporoparietal junction (rTPJ), a linear effect of confidence was stronger following judgements about target absence compared with judgements about target presence.
Signal detection-based computational simulations suggested that a quadratic activation profile may reflect the unequal variance nature of detection tasks (see figure 1). In detection, the variance associated with perceiving a signal is higher than the variance associated with perceiving the absence of a signal [9,10]. This unequal variance evidence structure can then produce a quadratic activation pattern in brain regions that are involved in dynamically updating a decision criterion or in representing the likelihood ratio between the two stimulus classes [6]. An alternative interpretation of our previous results is that distinct metacognitive processes are selectively invoked for decisions and confidence formation about presence and absence, but not in confidence about stimulus identity. For example, brain regions that selectively encode stimulus visibility [11], ones that correspond to higher-order nodes in a hierarchical model of perceptual states [5], or ones that are implicated in counterfactual thinking and attention monitoring [12] may show differential modulation of confidence in detection and discrimination decisions.
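The link between variance structure and a quadratic profile can be illustrated with a minimal sketch, assuming Gaussian evidence distributions. The means and standard deviations below are hypothetical values chosen only for illustration, and the helper names are not taken from the original simulations. With equal variances the log-likelihood ratio (LLR) is a linear function of the perceptual sample; with unequal variances it acquires a quadratic term.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def llr(x, mean_a, sd_a, mean_b, sd_b):
    """Log-likelihood ratio log p(x | A) - log p(x | B) for Gaussian classes."""
    return gaussian_logpdf(x, mean_a, sd_a) - gaussian_logpdf(x, mean_b, sd_b)


def second_difference(f, x, h=1.0):
    """Discrete curvature: zero for a linear function, 2*a*h**2 for a*x**2 + b*x + c."""
    return f(x + h) - 2 * f(x) + f(x - h)


# Hypothetical parameters, chosen only for illustration.
def llr_equal(x):
    # Discrimination-like setting: two classes with equal variance.
    return llr(x, 1.0, 1.0, -1.0, 1.0)


def llr_unequal(x):
    # Detection-like setting: the "signal" class has the larger variance.
    return llr(x, 1.0, 1.5, 0.0, 1.0)


for x in (-2.0, 0.0, 2.0):
    print(x, round(llr_equal(x), 3), round(llr_unequal(x), 3))

# Curvature is (numerically) zero in the equal variance case and positive otherwise.
print(second_difference(llr_equal, 0.3), second_difference(llr_unequal, 0.3))
```

In the unequal variance case the LLR is U-shaped in the sample, so a region tracking the LLR or its magnitude would respond to both extremes of the evidence axis.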
The design of our previous study did not allow us to decide between these alternative accounts. Here, we introduce a third hybrid condition to our experimental design: a discrimination task with the distributional properties of a detection task (tilt recognition; following [13]). This task requires subjects to report whether a grating is tilted or vertical: a discrimination judgement between two stimulus classes. However, because tilted gratings can appear in various orientations while vertical gratings are fixed, the distribution of perceptual evidence is of higher variance in the former, mimicking the variance asymmetry of a (yes/no) detection task. The two possible explanations of our previous findings (sensitivity of confidence encoding to variance structure versus a specific representation of presence and absence) thus make different predictions for this third condition. An unequal variance account predicts qualitatively similar neural confidence effects to those observed in detection, as the two tasks share a similar distributional structure. Conversely, a presence-absence asymmetry account predicts confidence effects that are qualitatively similar to those observed in discrimination, as the tilt-recognition task no longer requires inference about stimulus presence versus absence.
To anticipate our results, behavioural analysis confirmed that the tilt-recognition task induced detection-like unequal variance effects on subjects' confidence ratings, confirming that it created an asymmetric-discrimination task, as intended. A mass-univariate analysis of functional magnetic resonance imaging (fMRI) data replicated linear and quadratic effects of confidence in pre-specified regions of interest. However, unlike in our previous study, here the quadratic effects of confidence were similar in the detection and discrimination tasks and were instead stronger in the novel asymmetric-discrimination condition, which was also the condition with the most pronounced behavioural signatures of unequal variance. Furthermore, and in contrast with what we observed in Mazor et al. [6], in the rTPJ, a negative linear modulation of decision confidence was similar in detection 'yes' and 'no' responses. Representational similarity analysis (RSA) indicated that differences in multivariate activity patterns between high- and low-confidence trials were mostly task-invariant. We conclude with a discussion of how our previous conclusions should be revised in light of these new results.

Results
In a pre-registered design (pre-registered protocol folder: github.com/matanmazor/unequalVarianceDiscrimination/tree/main/experiment/protocolFolder/protocolFolder), a total of 46 participants performed three perceptual decision-making tasks while being scanned in a 3T MRI scanner: an orientation discrimination task (was the grating tilted clockwise or anticlockwise?), a detection task (was any grating presented at all?) and a tilt-recognition task (was the grating tilted or vertical?; see figure 1). Tasks were performed in separate blocks each comprising 26 trials. At the end of each trial, participants rated their confidence in the accuracy of their decision on a six-point scale. We adjusted the difficulty of the three tasks in a preceding behavioural session to achieve similar performance of around 70% accuracy. Fifteen to eighteen blocks were presented in five to six scanner runs.

Behavioural results
Thirty-five participants met our pre-registered inclusion criteria (see Methods). Task performance was similar for discrimination (76% accuracy), detection (78% accuracy) and tilt recognition (77% accuracy). Repeated measures analysis of variance revealed no difference in response accuracy between the three tasks (F 2,68 = 1.17, p = 0.32, BF 01 = 3.43; figure 2a). The probability of responding 'clockwise' in the discrimination task was 51% and not significantly different from 0.5 (t 34 = 0.58, p = 0.57, d = 0.10). By contrast, participants were more likely to respond 'no' than 'yes' in the detection task (54% of all responses, t 34 = 3.70, p < 0.001, d = 0.63) and 'vertical' than 'tilted' in the tilt-recognition task (57% of all responses, t 34 = 6.67, p < 0.001, d = 1.13). This is consistent with an optimal setting of a decision criterion in an unequal variance setting [14]. Similar to what we had observed in Mazor et al. [6], participants were equally confident in reporting both clockwise (mean confidence on a 1-6 scale: 3.51) and anticlockwise tilt (mean confidence: 3.44) in the discrimination task (t 34 = 0.80, p = 0.43, d = 0.14). However, unlike in our previous study, here a numerical difference in mean confidence between detection 'yes' (3.95) and 'no' (3.80) responses was
Figure 1. Experimental design. Stimuli consisted of dynamic random patterns of greyscale values. In all trials except for detection 'target absent' trials, a grating emerged from and disappeared back into the noise. Participants used the index and middle fingers of their right hand to indicate whether a grating was tilted clockwise or anticlockwise (discrimination), whether it was present or absent (detection), or whether it was vertical or tilted (tilt recognition). They then reported their level of confidence on a six-point scale by controlling the size of a coloured circle with their left thumb. In blue: inter-trial variability was similar for the two stimulus categories in discrimination, whereas in detection and tilt recognition, it was higher for one category over the other. LLR is a linear function of the perceptual sample in discrimination, but this relation is quadratic in detection and tilt recognition.
Confidence ratings are typically aligned with objective accuracy, such that participants are more confident on average when they are correct, compared with when they are wrong. Previous studies found that this alignment, commonly referred to as metacognitive sensitivity, is reduced for decisions about target absence compared with decisions about target presence [6,10,15,16]. Here also, metacognitive sensitivity (quantified as the area under the response-conditional type-II receiver operating characteristic (ROC) curve) was significantly higher for detection 'yes' compared with 'no' responses (t 34 = 6.41, p < 0.001, d = 1.1; figure 2c). Similarly, in the tilt-recognition task, metacognitive sensitivity was higher for 'tilted' compared with 'vertical' responses (t 34 = 9.55, p < 0.001, d = 1.61). No difference in metacognitive sensitivity was observed between discrimination clockwise and anticlockwise responses (t 34 = 0.70, p = 0.49, d = 0.12).
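As a rough illustration of the measure used above, the area under a type-II ROC curve can be sketched as follows. This is a minimal sketch under stated assumptions, not the analysis code used in the study: the function name `type2_auc` is hypothetical, confidence is assumed to be an integer rating from 1 to `n_levels`, and the caller is assumed to have already restricted the trials to a single response (for example, only detection 'yes' trials) to make the curve response-conditional.

```python
def type2_auc(confidence, correct, n_levels=6):
    """Area under the type-II ROC curve for one response category.

    confidence: integer ratings (1..n_levels), one per trial.
    correct: booleans of the same length, True when the decision was accurate.
    """
    conf_correct = [c for c, ok in zip(confidence, correct) if ok]
    conf_error = [c for c, ok in zip(confidence, correct) if not ok]
    if not conf_correct or not conf_error:
        raise ValueError("need at least one correct and one incorrect trial")

    # Sweep the confidence criterion from strictest to most lenient.
    points = [(0.0, 0.0)]
    for criterion in range(n_levels, 0, -1):
        hit = sum(c >= criterion for c in conf_correct) / len(conf_correct)
        false_alarm = sum(c >= criterion for c in conf_error) / len(conf_error)
        points.append((false_alarm, hit))

    # Trapezoidal integration over the ROC points.
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2
    return auc


# Toy data: confidence separates correct from incorrect trials perfectly.
print(type2_auc([6, 5, 4, 3, 2, 1], [True, True, True, False, False, False]))
```

A value of 0.5 indicates that confidence carries no information about accuracy, and values approaching 1 indicate that high confidence reliably accompanies correct responses.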
In an unequal variance signal detection setting, metacognitive sensitivity is expected to be higher for classifying a stimulus as belonging to the high-variance compared with the low-variance stimulus class. Our tilt-recognition task is an example of such a setting: the 'vertical' class had low variance (all stimuli were vertical), and the 'tilted' class had high variance (some stimuli were more tilted than others). As expected, the ratio between the standard deviations of the two stimulus categories (measured as the geometric mean of type-1 zROC slopes) was 0.55 and significantly lower than 1, indicating higher variability in the representation of tilted stimuli (a t-test performed on log-slopes against 0: t 34 = −12.50, p < 0.001, d = 2.11; figure 2b). Similarly, this ratio was 0.74 for the detection task, indicating higher variability in the encoding of target presence (t 34 = −7.27, p < 0.001, d = 1.23). By contrast, the ratio was 0.99 for the discrimination task and statistically indistinguishable from 1, indicating similar variability in the encoding of clockwise and anticlockwise stimuli (t 34 = −0.26, p = 0.79, d = 0.04).
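The relation between the zROC slope and the ratio of standard deviations can be checked with a short sketch. It assumes Gaussian evidence distributions and a plain least-squares fit of z(hit rate) on z(false-alarm rate); the study reports a geometric mean of slopes, so this is a simplified stand-in rather than the estimator actually used. The function name `zroc_slope` and all parameter values are hypothetical.

```python
from statistics import NormalDist


def zroc_slope(hit_rates, fa_rates):
    """Least-squares slope of z(hit rate) regressed on z(false-alarm rate).

    All rates must lie strictly between 0 and 1, otherwise inv_cdf raises an error.
    """
    z = NormalDist().inv_cdf
    z_hit = [z(h) for h in hit_rates]
    z_fa = [z(f) for f in fa_rates]
    mean_fa = sum(z_fa) / len(z_fa)
    mean_hit = sum(z_hit) / len(z_hit)
    sxy = sum((f - mean_fa) * (h - mean_hit) for f, h in zip(z_fa, z_hit))
    sxx = sum((f - mean_fa) ** 2 for f in z_fa)
    return sxy / sxx


# Hypothetical unequal variance model: the high-variance class has sd = 2,
# the low-variance class has sd = 1, so the expected slope is 1 / 2.
low_variance = NormalDist(0.0, 1.0)
high_variance = NormalDist(1.0, 2.0)
criteria = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
fa_rates = [1 - low_variance.cdf(c) for c in criteria]
hit_rates = [1 - high_variance.cdf(c) for c in criteria]

slope = zroc_slope(hit_rates, fa_rates)
print(slope)
```

Under these assumptions the slope recovers the ratio of the low-variance to the high-variance standard deviation, which is why slopes below 1 are read as evidence for higher variability in one stimulus class.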
We pre-registered a plan to evaluate the parametric modulation of confidence both directly from brain activations, as well as indirectly from beta coefficients of a design matrix where confidence is specified as a categorical variable. This two-step solution controls for metacognitive biases in that all confidence levels equally contribute to parametric modulation estimates, regardless of their frequency in the data. Results from the categorical design matrices mostly agreed with those from the parametric modulation analysis. We therefore report the parametric modulation results and mention when the categorical design matrix provided conflicting results. Full results from both approaches are presented in the electronic supplementary material.
Figure 2 (continued). The log zROC slope was not different from 0 in discrimination, indicating similar variability in the representation of clockwise and anticlockwise stimuli. In detection, this quantity was significantly negative, indicating higher variability in the representation of signal. In tilt recognition, this quantity was even more negative, indicating higher variability in the representation of tilted stimuli. (c) Metacognitive sensitivity, quantified as the area under the response-conditional type-II ROC curve, was significantly higher for both 'yes' and 'tilted' responses compared with 'no' and 'vertical' responses, respectively. We observed no significant difference in metacognitive sensitivity between discrimination 'clockwise' and 'anticlockwise' responses. (d) Distributions of confidence ratings (on a 1-6 scale) for the three tasks and six responses. *** p < 0.001.
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 10: 221091

Linear and quadratic effects of confidence
Our primary (quadratic confidence) design matrix included parametric modulators for linear and quadratic effects of confidence. Among our pre-specified regions of interest, the medial frontopolar (FPm) and vmPFC regions of interest (ROIs) showed a positive linear effect of confidence (FPm: t 34 = 3.45, p < 0.001, d = 0.58, vmPFC: t 34 = 4.53, p < 0.001, d = 0.77). A whole-brain contrast revealed a positive modulation of confidence in the bilateral precuneus, claustrum and ventral striatum. Conversely, the rTPJ showed a negative linear modulation of confidence, similar to what we observed in our previous study (t 34 = −3.35, p < 0.001, d = 0.57). Quadratic polynomials fitted to beta values from the categorical design matrices revealed a negative linear modulation of confidence also in the right STS (t 34 = 2.47, p < 0.05, d = 0.42), and a whole-brain analysis revealed a negative linear effect of confidence in the pMFC. Consistent with what we had observed in our previous study, a positive quadratic effect of confidence was robustly observed in a number of regions. Among our pre-specified ROIs, this effect was significant in lateral frontopolar cortex (FPl; t 34 = 2.93, p < 0.01, d = 0.50), Brodmann area 46 (BA46; t 34 = 4.43, p < 0.001, d = 0.75), rTPJ (t 34 = 4.89, p < 0.001, d = 0.83), right STS (rSTS) (t 34 = 3.62, p < 0.001, d = 0.61) and pre-supplementary motor area (pre-SMA; t 34 = 5.00, p < 0.001, d = 0.85). Whole-brain analysis revealed a quadratic effect of confidence also in dorsolateral prefrontal and orbitofrontal cortex, anterior insula, precuneus, posterior cingulate and in the cerebellum. Similar effects were observed when directly controlling for motor aspects of the confidence-rating phase (see electronic supplementary material, S18).
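To make the notion of linear and quadratic parametric modulators concrete, the sketch below builds both regressors from trial-wise confidence ratings. It is an illustration under assumptions, not the pipeline used here: the function name `confidence_modulators` is hypothetical, and mean-centring followed by orthogonalizing the quadratic term against the linear term is one common convention that may differ from the exact settings of the reported analysis.

```python
def confidence_modulators(confidence):
    """Return (linear, quadratic) modulators for a list of confidence ratings.

    The linear term is mean-centred confidence. The quadratic term is the
    squared centred confidence, mean-centred and then orthogonalized with
    respect to the linear term.
    """
    n = len(confidence)
    mean_conf = sum(confidence) / n
    linear = [c - mean_conf for c in confidence]

    squared = [v ** 2 for v in linear]
    mean_sq = sum(squared) / n
    quadratic = [s - mean_sq for s in squared]

    # Remove any component of the quadratic term explained by the linear term.
    denom = sum(v * v for v in linear)
    if denom > 0:
        beta = sum(q * v for q, v in zip(quadratic, linear)) / denom
        quadratic = [q - beta * v for q, v in zip(quadratic, linear)]
    return linear, quadratic


linear, quadratic = confidence_modulators([1, 2, 3, 4, 5, 6])
print([round(v, 2) for v in linear])
print([round(v, 2) for v in quadratic])
```

The quadratic regressor is highest at both ends of the scale and lowest in the middle, so a positive weight on it corresponds to stronger activation for both high and low compared with intermediate confidence.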

Task-specific activations
We next asked whether brain activation differed between the three tasks, collapsed across responses and confidence levels. Repeated measures analyses of variance failed to find a main effect of task in any of our seven pre-registered ROIs (all ps > 0.33; see electronic supplementary material, S3). Outside these regions, whole-brain analysis (p < 0.05, corrected for family-wise error at the cluster level) revealed that activations in bilateral premotor cortex were sensitive to task identity. This is consistent with the successful encoding of semantic meaning of motor actions from associative motor cortex [17].

Task- and response-specific confidence modulations
We next asked whether confidence-related brain activation differed between the three tasks. A linear modulation of confidence was similar for the three tasks: repeated measures ANOVAs revealed no effect of task in any of our ROIs (all ps > 0.30; see electronic supplementary material, S4 and figure 3, first row), and whole-brain analysis revealed no differences outside these pre-specified regions. Our main hypothesis, however, was that a quadratic modulation of confidence should be more pronounced in some tasks than in others, indicating presence-absence or unequal variance-related asymmetries. Within our ROIs, this was the case in BA46 (F 2,34 = 4.47, p < 0.05), rSTS (F 2,34 = 3.80, p < 0.05) and marginally in rTPJ (F 2,34 = 2.1, p = 0.08). The same regions showed a marginal effect when subjecting betas from the categorical design matrix to a group-level ANOVA (p = 0.06, p = 0.08 and p = 0.06 for BA46, rSTS and rTPJ, respectively). Whole-brain analysis further revealed differences in a quadratic modulation of confidence across tasks in the insula (p < 0.05, cluster-corrected).
Our next set of pre-registered tests was designed to pinpoint the origins of this interaction of the quadratic expansion of confidence with task. First, we attempted to replicate our finding from study 1 of a stronger quadratic modulation of confidence in detection compared with discrimination. Contrary to our prediction, we found no significant differences in modulation in any of our ROIs (all ps > 0.17, see electronic supplementary material, S6). To determine whether this absence of a significant result should also be taken as positive evidence against a difference between detection and discrimination, we subjected our data to a Bayesian t-test [18]. In the FPm ROI, we obtained moderate evidence against a difference between detection and discrimination (BF 01 = 3.71). Similarly, we obtained moderate evidence against a difference between detection and discrimination in the pre-SMA (BF 01 = 5.43). Analysing beta values from the categorical design matrix, we obtained moderate evidence for the null hypothesis of no difference also in FPl (BF 01 = 3.04) and rTPJ (BF 01 = 5.48). Bayes factors for all other ROIs in which we observed an effect in study 1 were within the interval [⅓,3], indicating no clear evidence for or against an effect (see electronic supplementary material, S6).
In our original study [6], a cluster in the rTPJ showed a negative linear effect of confidence, which was significantly more negative in detection 'no' compared with 'yes' responses. In the current study, activation in this region again showed a negative linear, as well as a positive quadratic effect of confidence. However, an interaction between confidence and detection response was not significant (t 34

Multivariate analysis
To further investigate the relationships between spatial activation patterns across tasks, responses and confidence levels, we next turned to representational similarity analysis [19]. Specifically, we asked which regions represented task, irrespective of confidence; which represented confidence, irrespective of task; and which represented confidence in a task-dependent manner. We pre-registered eight representational dissimilarity matrices (RDMs), each specifying a theory-based prediction regarding which trials should be encoded similarly or dissimilarly based on task, response and reported confidence (figure 4) and compared them against empirical dissimilarity matrices extracted from our pre-registered ROIs.
First, we used the median confidence rating for each response to separate high- and low-confidence responses (for a similar analysis using a response-invariant confidence cut-off with similar results, see electronic supplementary material, S16 and S17). We then computed the empirical similarity in spatial activation patterns between the 12 trial categories (3 tasks × 2 responses × high or low confidence). In order to verify that empirical RDMs hold reliable condition-specific multivariate activation patterns, we compared on- and off-diagonal entries in the ranked RDMs. If condition-specific information is encoded in RDMs, off-diagonal distances should be higher than on-diagonal ones [20]. This was the case in FPm (t 34 = 3.29, p < 0.01), BA46 (t 34 = 2.87, p < 0.01), vmPFC (t 34 = 2.06, p < 0.05) and rSTS (t 34 = 2.17, p < 0.05), but not in FPl, rTPJ and pre-SMA (all ps > 0.17). In the FPm ROI, subject-specific RDMs were not predicted by the averaged RDM of other subjects (p = 0.38), reflecting poor multivariate signal in this region [21]. We therefore restricted the multivariate analysis to the three regions with reliable multivariate activation patterns: BA46, vmPFC and rSTS. In all following analyses, on-diagonal entries are ignored, in order not to artificially inflate correlation measures [20].
In the rSTS ROI, multivariate activation patterns were most consistent with encoding of detection responses (target present or absent; RDM C, figure 4c; p < 0.05). A negative correlation with RDM G (detection confidence) is driven by the perfect negative correlation between RDMs C and G, when excluding the diagonal entries.
By contrast, in the prefrontal BA46 and vmPFC ROIs, multivariate brain activation patterns were most consistent with task-invariant confidence encoding (RDM E, figure 4e; ps < 0.05), and with confidence encoding that is specific to an unequal variance setting (RDM H, figure 4h, ps < 0.05). Spearman correlations with RDMs E and H were not significantly different from each other (ps > 0.54). This is in line with results from our previous study [6], where cross-classification analysis revealed no evidence for task-specificity in multivariate confidence representations (see full pre-registered analyses https://osf.io/y3ftk/, section 'Task-specific and task-invariant confidence representation').
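The comparison between a candidate RDM and an empirical RDM can be sketched as a Spearman correlation over off-diagonal entries. This is a minimal illustration with hypothetical helper names and toy 3 × 3 matrices, not the analysis code used here. It assumes that all entries off the main diagonal are used and that ties (which are frequent in binary model RDMs) receive average ranks.

```python
import math


def off_diagonal(rdm):
    """Flatten a square matrix, skipping entries on the main diagonal."""
    n = len(rdm)
    return [rdm[i][j] for i in range(n) for j in range(n) if i != j]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rdm_spearman(model_rdm, empirical_rdm):
    """Spearman correlation between the off-diagonal entries of two RDMs."""
    return pearson(average_ranks(off_diagonal(model_rdm)),
                   average_ranks(off_diagonal(empirical_rdm)))


# Toy example: the model predicts that the third condition differs from the first two.
model = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
empirical = [[0.1, 0.2, 0.9], [0.3, 0.0, 0.8], [0.7, 0.95, 0.2]]
print(round(rdm_spearman(model, empirical), 3))
```

Because only ranks enter the correlation, the result is insensitive to the scale of the empirical distances, which is convenient when the model RDM only specifies which pairs should be more or less similar.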
To further explore the sensitivity of confidence encoding to variance structure, we subjected the empirical RDMs to a multiple regression analysis in which candidate RDMs competed to explain the variance in the empirical RDM. First, we decomposed RDM E into 18 constituent sub-RDMs. Each such RDM represented the similarity between confidence encoding in two tasks, or within a single task (figure 5a). We focused on specific sub-RDMs for which a variance structure account made unique predictions. In particular, a variance structure account predicts a response-specific similarity between confidence in detection and in tilt recognition (figure 5b), but a presence-absence account predicts a similar response-invariant encoding of confidence in tilt recognition and in discrimination (figure 5c). To test this prediction, we compared the weighted coefficient combinations for these two predictions.
In the two prefrontal regions, vmPFC and BA46, multivariate activation patterns in the tilt-recognition task were similar both to multivariate activation patterns in the discrimination task (vmPFC: t 34 = 3.41, p < 0.01; BA46: t 34 = 3.48, p = 0.001), and to multivariate activation patterns in the detection task (vmPFC: t 34 = 2.10, p < 0.05; BA46: t 34 = 2.81, p < 0.01). Consistent with our RSA results, this finding is in line with a task-invariant multivariate representation of confidence in these regions.
By contrast, rSTS showed similar confidence encoding in the tilt-recognition and discrimination tasks (t 34 = 3.79, p < 0.001), but distinct confidence encodings in tilt-recognition and detection tasks (t 34 = 1.25, p = 0.22). Still, we observed no significant difference between these correlations (t 34 = 0.84, p = 0.41; figure 5d), providing only indirect support for the hypothesis that multivariate confidence encoding in rSTS was different for detection and discrimination.

Discussion
In a previous study we identified distinct neural contributions to confidence in perceptual detection that were not observed in a performance-matched discrimination task. In this pre-registered follow-up study, we set out to replicate this finding and to further characterize the computational origins of a quadratic modulation of confidence. In what follows, we summarize our findings, discuss how our previous conclusions should be revised in light of these new data and unpack what this may mean for our understanding of a quadratic effect of decision confidence in association cortex.
Figure 5. Multiple regression analysis. RDM E from figure 4 was broken down into 18 constituent RDMs, which were then used to predict empirical RDMs in the seven ROIs (a). We then used beta coefficients from this multiple regression analysis to produce two beta combinations encoding our two hypotheses about differences between tasks (b,c): the first corresponds to similarity in confidence encoding between the tilt-recognition and the detection tasks, and the second to similarity in confidence encoding between the tilt-recognition and discrimination tasks. We found no significant differences between these two weighted contrasts (d), supporting a task-invariant account of confidence encoding in these regions. ** p < 0.01; *** p < 0.001; n.s. p > 0.05.

A quadratic modulation of confidence is not distinct to perceptual detection
As in Mazor et al. [6], here too we observed a widespread quadratic modulation of subjective confidence in prefrontal and parietal association cortex. In our previous study, this effect was stronger in a perceptual detection task, and in some frontopolar regions was not observed at all in a performance-matched discrimination task. By contrast, in the present study this effect was no longer specific to detection, and was instead observed in both detection and discrimination tasks.
When considered together, the results of both studies provide no clear evidence for or against the detection-specificity of quadratic confidence responses in FPl and FPm (BF 01 = 2.60 and BF 01 = 1.47, respectively, calculated as the product of the two Bayes factors from Exp. 1 and 2, using a scaling factor of 0.707 over effect sizes).
There are a number of potential reasons for this difference between our results here and those reported in our previous study. For example, in our previous study stimuli were presented briefly, whereas here we opted for a dynamic display mode (see Methods). Alternatively, the inclusion of a third task where difficulty is manipulated differently may have had an indirect effect on participants' disposition towards their confidence ratings in the detection and discrimination tasks. A further possibility is that in our previous study a difference between detection and discrimination was a false positive driven by noise, which happened to facilitate the quadratic trend in one task and reduce it in another. Indeed, this difference in a quadratic modulation was not a priori expected based on theory in our previous study and was the result of exploratory analysis that went beyond our pre-registration. The current pre-registered, hypothesis-driven replication provides an unbiased test of this effect, which resulted in a failure to replicate it.
Surprisingly, however, the third 'hybrid' tilt-recognition task that we introduced (a discrimination task with the signal detection properties of a detection task) produced the strongest quadratic modulation of confidence in a number of regions of interest. Specifically, in BA46, rTPJ and rSTS, a quadratic modulation of confidence was significantly more pronounced in tilt recognition than in the discrimination task. Although this result is difficult to interpret without it being accompanied by a significant difference in quadratic modulation for detection (and thereby being consistent with a signature of unequal variance), it nevertheless provides some support that a quadratic modulation of confidence-related brain activity may be sensitive to the variance structure of perceptual evidence. Indeed, the hybrid tilt-recognition task showed exaggerated behavioural markers of unequal variance compared with both the detection and discrimination tasks (in both zROC curves and rcROCs; figure 2). Our results are therefore consistent with a graded sensitivity of this neural marker to variance structure. By contrast, the fact that the strongest quadratic modulation was observed in decisions about stimulus category (tilt versus vertical), rather than stimulus presence or absence, strongly suggests that this effect is not driven by a qualitative difference between detection and discrimination decisions.

A similar linear modulation of confidence for detection 'yes' and 'no' responses in right temporoparietal junction
In our previous study, whole-brain analysis revealed a cluster in the right posterior TPJ in which a linear modulation of decision confidence was more negative for detection 'no' compared with 'yes' responses. We interpreted this finding in light of a potential role for attention monitoring in inference about absence, where participants are required to differentiate failures to perceive the target due to target absence and due to lapses of attention, and in light of an involvement of the temporoparietal junction in modelling attention states [22]. In this second experiment, we pre-registered our plan to directly test this hypothesis, defining the rTPJ based on the relevant contrast in our first study. Contrary to our expectation, a negative modulation of decision confidence in the rTPJ ROI was similar for the two detection responses (see electronic supplementary material, S13). A Bayesian t-test provided moderate evidence for the null hypothesis that confidence encoding in this region is invariant to detection response.

Multivariate analysis provides weak evidence for distinct confidence encoding in an unequal variance setting
Using RSA, we compared multivariate activation patterns in our pre-specified ROIs against theory-driven representational similarity matrices (figure 4). This analysis revealed robust encoding of decision confidence in BA46 and vmPFC, consistent with a representation of decision confidence that is either task-invariant or sensitive to the variance structure of a task. By contrast, multivariate activation patterns in rSTS were most consistent with a representation of detection decisions about stimulus presence or absence. Follow-up multiple regression analysis revealed that rSTS similarly encoded decision confidence in the two discrimination tasks, but that these confidence representations were not shared with the detection task. This exploratory analysis provides some indirect support for different neuronal mechanisms underlying metacognitive evaluations of decisions about presence and absence, versus stimulus category.

Power considerations in neuroimaging research
We note that our target sample size (N = 35 included subjects) was based on an a priori power calculation to obtain 65-85% power to replicate our previous findings, assuming no inflation of effect sizes in the original report, and no effect of a reduction in the number of trials per task for group-level sensitivity. Our power and sample size aspirations were balanced with resource constraints (grant funding and time, given that N = 46 subjects were tested to obtain N = 35 included subjects), but remain substantially above recent estimates in fMRI research, where power can be routinely lower than 10% ([23]; though we note that power calculations for whole-brain analyses are complicated by the need to deal with mass-univariate multiple comparisons correction). Low statistical power hinders the field's ability to create a progressive research programme in which one set of findings builds on the other. Even with medium-range power of approximately 75%, and assuming a true effect, we should expect mixed results in a series of studies (roughly 1 in 4 null results with an alpha level of 0.05 for individual tests). As mentioned above, our original finding of a difference between detection and discrimination was the result of exploratory analysis, probably heightening the probability of a type 1 error. Our suspicion is that our current study would, in a previous era, have languished unpublished, a casualty of the file drawer effect. We think it is critically important that such studies are now published to allow a full picture of the strength of different effects. More generally, our findings highlight the importance of allocating funding to replication studies in cognitive neuroscience and the importance of pre-registration of hypothesis tests. With these considerations in mind, we therefore turn in the remainder of the discussion to offer some interpretations of the current findings and highlight questions for future research.
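As a rough illustration of the arithmetic behind the point about mixed results, the sketch below computes the chance of a null result for a true effect at the power levels quoted above, and the chance that every study in a short series reaches significance. Independence between studies and the series length of three are assumptions made purely for illustration; the function names are hypothetical.

```python
def miss_rate(power):
    """Probability of a null result when the effect is real (type 2 error rate)."""
    return 1.0 - power


def prob_all_significant(power, n_studies):
    """Probability that n independent studies of a true effect are all significant."""
    return power ** n_studies


# Power levels spanning the 65-85% range; a series of three studies is assumed.
for power in (0.65, 0.75, 0.85):
    print(power, round(miss_rate(power), 2), round(prob_all_significant(power, 3), 2))
```

At 75% power a true effect is missed in 25% of studies, and at 65%, the lower end of the range, in 35%. Under the same assumptions, fewer than half of three-study series at 75% power would be uniformly significant.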

A quadratic modulation of confidence: ideas and speculations
The previous sections summarize the current picture of our findings, in light of our pre-registered hypotheses, and describe both the similarities and differences between the current results and those of our previous paper. A particularly strong and consistent finding across both studies was that univariate fMRI activation in prefrontal and parietal cortex is quadratically modulated by decision confidence. However, as described above, we find no clear support for or against our hypothesized variance structure account of a quadratic modulation of decision confidence. This therefore leaves underdetermined the computational basis of such an effect. In the following, we discuss two additional candidate interpretations of a quadratic modulation of confidence.
In Mazor et al. [6], we referred to two previous reports of a quadratic relation between subjective ratings and brain activation: one in subjective visibility ratings [24] and the other in product desirability ratings [25]. We then explained that our findings were qualitatively different: while a quadratic effect for visibility or product desirability can reflect a linear modulation of subjective confidence when both ends of the rating scale are associated with higher levels of confidence (for example, being highly confident that a product is or is not desirable, or that a stimulus is or is not visible), our results highlight a quadratic effect in confidence itself. In other words, the low end of the scale should reflect low confidence in the perceptual decision, rather than high confidence in a negative rating.
However, although the low end of the confidence scale reflects low confidence in a decision, it may still reflect a high level of confidence in the confidence rating itself. With our incentive structure, reward is dependent not only on task performance but also on the adequacy of confidence reports (see Methods). It is therefore not unlikely that a participant would reason something along the lines of 'I'm not sure what I just saw, so I'm highly confident that I should rate my subjective confidence as low to maximize my bonus'. Brain regions where activation scales with subjective confidence may then reflect this meta-level subjective confidence (effectively, confidence in confidence) with a higher level of activity not only for the upper end, but also for the lower end of the confidence scale. Note this scheme does not imply the existence of 'meta-meta-cognition': all that is required is that confidence ratings are represented as part of the primary (type-1) task, such that metacognitive resources are available to evaluate the quality of confidence reports themselves [26].
It then remains to be explained why a quadratic effect of confidence is significantly stronger in the tilt-recognition compared with both discrimination and detection tasks (figure 3; electronic supplementary material, figure S7). One possibility is that confidence in the presence or absence of a tilt more naturally lends itself to rule-based heuristics (such as a mapping between perceived angle and confidence level), leaving metacognitive resources free to monitor the quality of subjective confidence ratings.
Alternatively, a quadratic modulation of decision confidence may reflect object-level (that is, not meta-level) inter-trial fluctuations in visual attention. As an example of how this might arise, a recent EEG study [27] revealed a negative linear relation between reported attention and pre-stimulus alpha power in the 8-12 Hz frequency band. By contrast, high levels of decision confidence were associated with intermediate pre-stimulus alpha power, giving rise to a negative quadratic relation between alpha power and confidence. One implication of this result is that brain regions where activation is typically negatively correlated with alpha power may show a positive quadratic modulation of confidence in virtue of their relation to pre-stimulus alpha. This potentially explains why a quadratic modulation of confidence is prominent in a frontoparietal network where activity has been negatively linked to alpha power [28,29].
It is unclear, however, why a putative relationship between neural correlates of attention and subjective confidence should be sensitive to the variance structure of the task. One useful approach is to ask how, under Bayesian decision theory, decisions and confidence estimates should adjust to reflect differences in the effects of attention on internal distributions. For instance, in a behavioural study [13], participants rated their confidence in whether the orientation of a grating was sampled from a wide or narrow distribution, both centred at 0 degrees. A comparison of trials with valid, invalid and neutral cues revealed that participants rationally adapted their decisions and confidence estimates to their current attention state. Note that this effect of attention on perceptual decisions is specific to unequal variance settings; in an equal variance setting, accuracy is highest when the decision criterion is set to midway between the two stimulus distributions, regardless of sensory precision. By contrast, an unequal variance setting introduces a link between optimal placement of the decision criterion and sensory precision. As we show in Mazor et al. [6], a model where subjects dynamically adjust their decision criterion based on previous samples displays different associations between criterion adjustment and confidence, depending on the variance structure of the task. Together, a possible interpretation of our findings is that a quadratic modulation of confidence in regions such as BA46, rTPJ and rSTS is effectively mediated by decisions about policy changes, such as adjustments of a decision criterion based on observed samples.
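The contrast between equal and unequal variance settings can be made concrete with a short derivation. This is a sketch under assumptions not stated in the text: Gaussian stimulus distributions, additive Gaussian sensory noise with variance $\sigma^2$, equal prior probabilities, and the symbols $\mu$, $\sigma_1$, $\sigma_2$, $s_k$ and $x^{*}$, which are introduced here for illustration only.

```latex
% Equal variance: x ~ N(+mu, sigma^2) versus x ~ N(-mu, sigma^2)
\log\frac{p(x \mid +\mu)}{p(x \mid -\mu)} = \frac{2\mu x}{\sigma^{2}} = 0
\;\Longrightarrow\; x^{*} = 0 \quad \text{for every } \sigma .

% Unequal variance: x ~ N(0, s_k^2), with s_k^2 = sigma_k^2 + sigma^2 and sigma_1 < sigma_2
\log\frac{p(x \mid 2)}{p(x \mid 1)}
  = \log\frac{s_1}{s_2} + \frac{x^{2}}{2}\left(\frac{1}{s_1^{2}} - \frac{1}{s_2^{2}}\right) = 0
\;\Longrightarrow\;
|x^{*}| = \sqrt{\frac{s_1^{2}\, s_2^{2}}{s_2^{2} - s_1^{2}}\,\log\frac{s_2^{2}}{s_1^{2}}} .
```

In the equal variance case the accuracy-maximizing criterion sits at the midpoint whatever the noise level. In the unequal variance case both $s_1$ and $s_2$ contain $\sigma$, so the criterion moves whenever sensory precision changes, for instance with attention.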

Conclusion
In conclusion, in a pre-registered experiment we find that a quadratic effect of decision confidence on brain activity is not specific to decisions about presence and absence, but may be sensitive to the variance structure of the task. We discuss three candidate accounts of this effect, one postulating a role for subjective confidence in the accuracy of confidence ratings themselves, one identifying this quadratic effect with the neural correlates of fluctuations in attention, and one linking brain activations to the online adjustment of a decision criterion.

Methods
We report how we determined our sample size, all data exclusions (if any), all manipulations and all measures in the study. All design and analysis details were pre-registered before data acquisition and time-locked using pre-RNG randomization: we used the SHA256 hash function to translate our preregistered protocol folder (https://github.com/matanmazor/unequalVarianceDiscrimination/blob/main/experiment/protocolFolder.zip) to a series of bits (7c2c27da12b6768b1789907ba5d2ec46b45d302199d5368795879fcff844d043). These bits were then used to initialize Matlab's pseudorandom number generator for determining the order and timing of experimental events (relevant lines in the experimental code). Doing so ensures that pre-registration could not have taken place after data collection [30].
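The time-locking idea can be sketched in a few lines. The example below is an illustration in Python rather than the Matlab code used in the study, and the way the digest is mapped to a seed here (its full integer value) is an assumption, since the exact mapping is not described in the text. The function names and the file name in the usage comment are hypothetical.

```python
import hashlib
import random


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def seeded_rng(hex_digest):
    """Build a random generator seeded with the integer value of the digest (assumed mapping)."""
    return random.Random(int(hex_digest, 16))


# Hypothetical usage: 'protocolFolder.zip' stands in for the pre-registered archive.
# rng = seeded_rng(sha256_of_file("protocolFolder.zip"))
# trial_order = rng.sample(range(26), 26)
```

Because any change to the archive changes the digest, and therefore the sequence of experimental events, the recorded event order can only be reproduced from the protocol that existed before data collection.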

Participants
Forty-six participants took part in the study (ages 20-39, mean = 24.2 ± 4.5; 29 females). Participants gave their informed consent to take part in the experiment. The experiment was approved by the UCL ethics committee (approval numbers 8231/001 and 1260/003). Thirty-five participants met our prespecified inclusion criteria (ages 20-38, mean = 25.2 ± 4.5; 24 females). We pre-specified a sample size of 35, balancing statistical power and resource considerations. We calculated that with 35 participants, we would have statistical power of 65-85% to replicate our previous findings, assuming no inflation of effect sizes in the original report, and no effect of a reduction in the number of trials per task for group-level sensitivity.

Design and procedure
After a temporally jittered rest period of 500-4000 ms, each trial started with a fixation cross (500 ms), followed by a presentation of a target for 500 ms. In all three conditions, stimuli were dynamic noisy patterns of greyscale values, out of which a grating sometimes emerged and quickly disappeared (figure 1, upper panel). We chose this mode of stimulus presentation in light of indications that it produces stronger metacognitive asymmetries between the perceptions of presence and absence relative to standard static presentation modes [31]. Stimuli consisted of 10 greyscale frames presented at 20 frames per second within a circle of diameter 3°. Stimuli were generated in the following way:
(1) Generate 10 greyscale frames ($F_1, \ldots, F_{10}$), each an array of 142 by 142 random luminance values.
(2) Create a 142 by 142 sinusoidal grating ($G$; 24 pixels per period, random phase). The orientation of the grating is determined according to the trial type.
(3) Determine grating visibility for frame $i$ as $p_i = v \times \exp(-|i - 5|/2)$, with $v$ being the visibility level in this trial (0 for target-absent trials).
(4) For each pixel $F_{i,j,k}$ in frame $F_i$, replace its luminance value with the luminance value of the corresponding grating pixel ($G_{j,k}$) with a probability of $p_i$.
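The generation steps above can be sketched as follows. This is a minimal standard-library illustration, not the experimental code. It assumes luminance values drawn uniformly from [0, 1], a grating scaled to the same range, and an arbitrary sign convention for tilt; the circular aperture is not modelled, and all function names are hypothetical.

```python
import math
import random

N_FRAMES = 10   # frames per stimulus
SIZE = 142      # pixels per side
PERIOD = 24     # grating period in pixels


def make_grating(orientation_deg, rng):
    """Sinusoidal grating with luminance in [0, 1] and a random phase.

    Assumption: 0 degrees gives vertical stripes; the tilt sign is arbitrary.
    """
    theta = math.radians(orientation_deg)
    phase = rng.uniform(0.0, 2.0 * math.pi)
    grating = []
    for j in range(SIZE):
        row = []
        for k in range(SIZE):
            # Position along the axis perpendicular to the stripes.
            u = k * math.cos(theta) + j * math.sin(theta)
            row.append(0.5 + 0.5 * math.sin(2.0 * math.pi * u / PERIOD + phase))
        grating.append(row)
    return grating


def frame_visibility(v):
    """p_i = v * exp(-|i - 5| / 2) for frames i = 1..10."""
    return [v * math.exp(-abs(i - 5) / 2.0) for i in range(1, N_FRAMES + 1)]


def make_stimulus(v, orientation_deg, rng):
    """Return 10 frames; each pixel shows the grating with probability p_i, else noise.

    Drawing noise only for non-replaced pixels is equivalent in distribution to
    generating full noise frames first and then overwriting pixels.
    """
    grating = make_grating(orientation_deg, rng)
    frames = []
    for p in frame_visibility(v):
        frame = [
            [grating[j][k] if rng.random() < p else rng.random() for k in range(SIZE)]
            for j in range(SIZE)
        ]
        frames.append(frame)
    return frames
```

Visibility peaks at the fifth frame, where $p_5 = v$, and decays symmetrically on either side, so the grating appears to emerge from the noise and fade again. With $v = 0$ every pixel is noise, which corresponds to a target-absent trial.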
Participants performed the following three tasks:
(1) Discrimination. Decide whether the grating was tilted clockwise (50% of trials; 45° relative to a vertical baseline) or anticlockwise (−45° relative to a vertical baseline).
(2) Tilt recognition. Decide whether the grating was vertical (50% of trials; 0°) or tilted (sampled from a normal distribution with mean 0° and standard deviation $\sigma_{\text{orientation}}$). Stimuli were presented with a fixed $v$ value of 0.2, at which stimuli are clearly visible.
(3) Detection. Decide whether the grating was present (50% of trials) or absent. Gratings in the 'present' trials were sampled from a normal distribution with mean 0° and s.d. $\sigma_{\text{orientation}}$ (yoked to the tilt-recognition task).
For all three tasks, responses were made with the right-hand index and middle fingers, and response mappings between fingers and stimulus classes were counterbalanced between blocks. Immediately after making a decision, participants rated their confidence on a six-point scale by using two keys to increase and decrease their reported confidence level with their left-hand thumb, using the same procedure and incentive structure as in Mazor et al. [6]. The perceptual decision and the confidence rating phases were restricted to 1000 and 2500 ms, respectively. No feedback was delivered to subjects about their performance.
Participants were acquainted with the task in a preceding behavioural session. During this session, task difficulty was adjusted independently for detection, discrimination and tilt recognition, targeting around 70% accuracy on all three tasks. In detection and discrimination, we achieved this by adaptively controlling the visibility $v$ value once in every 10 trials: increasing it when accuracy fell below 60%, and decreasing it when accuracy exceeded 80%. In tilt recognition, $v$ was set to 0.20 such that stimuli were highly visible, and calibration was performed on the standard deviation of orientations $\sigma_{\text{orientation}}$ in a similar manner. Performance on all three tasks was further calibrated to the scanner environment at the beginning of the scanning session, during the acquisition of anatomical (MP-RAGE and fieldmap) images. After completing the calibration phase, participants underwent five to six 10 min functional scanner runs, each comprising one block of 26 trials from each experimental condition, presented in a random order.
To avoid stimulus-driven fluctuations in confidence, $v$ and $\sigma_{\text{orientation}}$ were kept fixed within each experimental block. Nevertheless, following experimental blocks with markedly bad (less than or equal to 52.5%) or good (greater than or equal to 85%) accuracy, $v$ or $\sigma_{\text{orientation}}$ was adjusted for the next block of the same task (divided or multiplied by a factor of 0.95 for bad and good performance, respectively).
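The between-block adjustment rule can be written as a small function. This is a sketch of the rule as stated, with hypothetical names; it applies equally to the visibility level and to the orientation standard deviation.

```python
BAD_ACCURACY = 0.525   # at or below this, the next block is made easier
GOOD_ACCURACY = 0.85   # at or above this, the next block is made harder
FACTOR = 0.95


def adjust_difficulty(value, block_accuracy):
    """Return the difficulty parameter to use in the next block of the same task.

    Dividing by 0.95 raises the parameter; multiplying by 0.95 lowers it.
    """
    if block_accuracy <= BAD_ACCURACY:
        return value / FACTOR
    if block_accuracy >= GOOD_ACCURACY:
        return value * FACTOR
    return value
```

For example, a block at 50% accuracy with a visibility of 0.2 yields 0.2 / 0.95 for the next block, whereas a block at 70% accuracy leaves the parameter unchanged.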

Scanning parameters
Scanning took place at the Wellcome Centre for Human Neuroimaging, London, using a 3 Tesla Siemens Prisma MRI scanner with a 64-channel head coil. We used the same sequences as in Mazor et al. [6].

Analysis
The pre-registered objectives of this study were to:
1. replicate our finding of an interaction between task (discrimination/detection) and a quadratic effect of confidence on BOLD signal in medial and lateral frontopolar cortex, as well as in the STS and pre-SMA,
2. replicate our finding of an interaction between detection response (present/absent) and the linear effect of confidence on activation in the right TPJ,
3. compare quadratic effects of confidence on activations in the frontopolar cortex, the STS and the pre-SMA in a tilt-recognition task with those in detection and discrimination tasks, and
4. compare response-specific linear effects of confidence on activation in the right TPJ in a tilt-recognition task with those in detection and discrimination tasks.

Exclusion criteria
Individual experimental blocks were excluded in the following cases:
(1) More than 20% of the trials in the block were missed.
(3) The participant used the same response in more than 80% of the trials.
(4) For a particular response, the same confidence level was reported for more than 90% of the trials.
The first trial of each block was excluded from all analyses, leaving 25 usable trials per block. Subjects were included only if, after applying the block-wise exclusions specified above, their data had at least three blocks for each task.
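The block-level and subject-level rules can be sketched as below. Only the criteria that appear in the list above, (1), (3) and (4), are implemented; criterion (2) is not reproduced in the text and is therefore left out. The trial layout (dicts with 'response' and 'confidence' keys), the choice of denominators and the function names are assumptions made for illustration, and blocks are assumed to be non-empty.

```python
from collections import Counter


def exclude_block(trials):
    """Return True if a block meets any of the listed exclusion criteria.

    `trials` is a list of dicts with hypothetical keys 'response' (None if the
    trial was missed) and 'confidence'.
    """
    n = len(trials)
    answered = [t for t in trials if t["response"] is not None]
    # Criterion (1): more than 20% of trials missed.
    if (n - len(answered)) / n > 0.20:
        return True
    # Criterion (3): the same response in more than 80% of trials.
    response_counts = Counter(t["response"] for t in answered)
    if response_counts and max(response_counts.values()) / n > 0.80:
        return True
    # Criterion (4): for some response, one confidence level in more than 90% of its trials.
    for response, count in response_counts.items():
        ratings = Counter(t["confidence"] for t in answered if t["response"] == response)
        if max(ratings.values()) / count > 0.90:
            return True
    return False


def include_subject(blocks_by_task, min_blocks=3):
    """Keep a subject only if every task retains at least `min_blocks` blocks."""
    return all(
        sum(not exclude_block(block) for block in blocks) >= min_blocks
        for blocks in blocks_by_task.values()
    )
```

Whether the first trial of a block is dropped before or after these checks is not specified in the text; the sketch simply operates on whatever trial list it is given.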

fMRI data preprocessing
As in Mazor et al. [6], fMRI data preprocessing followed the procedure described in Morales et al. [8].

Regions of interest
In addition to an exploratory whole-brain analysis (corrected for multiple comparisons at the cluster level), our analysis focused on the following a priori regions of interest:
(1) Medial frontopolar cortex (FPm). Obtained from a previous connectivity-based parcellation [32].
(3) Brodmann area 46 (BA46). Obtained from [32].
… measures (pulse and breathing). Button presses were modelled as stick functions, convolved with the canonical HRF and separated into three regressors: two regressors for each of the two right-hand buttons, and one regressor for both up and down left-hand presses (table 1).

Categorical-confidence design matrices
We also fitted a set of three design matrices, one for each task, in which confidence was modelled as a categorical variable. These design matrices consisted of only one regressor of interest for all included trials, modelled by a boxcar with non-zero entries at the 4000 ms interval starting at the onset of the stimulus and ending immediately after the confidence rating phase, convolved with a canonical HRF. This regressor was in turn modulated by a series of 12 dummy (0/1) parametric modulators, one for every combination of response ('yes' and 'no' for detection, 'vertical' and 'tilted' for tilt recognition and 'clockwise' and 'anticlockwise' for discrimination) and confidence rating (1-6). Using three design matrices instead of one allowed us to set trials from the remaining two tasks to serve as a baseline for the task of interest. These design matrices included the same set of nuisance regressors as the main design matrix. For each participant, beta-estimates from the categorical-confidence design matrices were used as input to six response-specific multiple linear regression models, with linear-confidence and quadratic-confidence terms as predictors, in addition to an intercept term. Subject-specific coefficients were then subjected to ordinary least-squares (OLS) group-level inference, to estimate linear and quadratic effects of confidence on univariate brain activation and compare these effects between responses. The rationale for employing this two-step approach is its indifference to differences in the confidence distributions for the six responses, which may bias the estimation of quadratic and linear terms. Furthermore, linear and quadratic regressors were not orthogonalized, and instead competed to explain variance in the data, minimizing interpretational ambiguity that may arise when using default orthogonalization settings [34].
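The second step of this procedure, fitting linear and quadratic confidence terms to the rating-wise beta estimates of one response and then testing the coefficients across subjects, can be sketched as follows. Coding confidence as its deviation from the scale midpoint (3.5) is an assumption, as are the function names and the use of a hand-written solver in place of a statistics package.

```python
import math

MIDPOINT = 3.5  # centre of the six-point scale (assumed coding)


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_confidence_terms(betas_by_rating):
    """OLS fit of beta = b0 + b1 * c + b2 * c**2, with c = rating - MIDPOINT.

    `betas_by_rating` maps confidence ratings (1-6) to the beta estimates of one
    response; at least three distinct ratings are needed.
    Returns (intercept, linear, quadratic).
    """
    ratings = sorted(betas_by_rating)
    design = [[1.0, r - MIDPOINT, (r - MIDPOINT) ** 2] for r in ratings]
    y = [betas_by_rating[r] for r in ratings]
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


def one_sample_t(values):
    """Group-level t statistic for subject-wise coefficients (OLS with an intercept only)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var / n)
```

Because each confidence level contributes exactly one beta estimate to the fit, a response whose ratings cluster at one end of the scale does not weight the quadratic term differently from a response with evenly spread ratings.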

Representational similarity analysis
RSA [19] was used to detect consistent spatio-temporal structures in the representation of choice and confidence across tasks and responses, within our seven pre-specified ROIs. High- and low-confidence trials were defined using a median split within each response category. The empirical RDM was then compared against the following set of a priori theoretical RDMs:
(1) Task (figure 4a). Trials of the same task are similar; trials of different tasks are different.
(2) Variance structure (figure 4b). Discrimination trials are similar to each other. Detection 'present' trials and tilt-recognition 'tilted' trials are similar (high variance), and detection 'absent' trials are similar to tilt-recognition 'vertical' trials (low variance).
… responses, and 'tilted' responses are different from 'vertical' responses in the tilt-recognition task, with no consistent differences between the two discrimination responses.
(5) Confidence (figure 4e). High- and low-confidence trials are represented differently, without an effect of task or response.
(6) Confidence and variance structure interaction (figure 4f). High- and low-confidence trials are represented differently. This effect is modulated by the variance structure of the trial category.
Since tasks were presented in different blocks, a high degree of similarity between conditions within a task could emerge due to temporal autocorrelations in physiological and physical noise, irrespective of distances in neural representations. To control for this, neural RDMs were constructed from distances between pairs of conditions from distinct experimental runs, and never within a single run. We chose Euclidean distance as our dissimilarity measure in order to be sensitive to differences in overall activity between tasks and responses, in addition to the relative activation patterns of voxels within an ROI.
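A cross-run RDM of this kind can be sketched as below. The nested-dict layout of the input, the averaging over ordered run pairs and the function names are assumptions; every run is assumed to contain every condition.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two voxel patterns of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cross_run_rdm(patterns):
    """Mean Euclidean distance between conditions, using only pairs of different runs.

    `patterns[run][condition]` is a list of voxel values (hypothetical layout).
    Returns a dict keyed by (condition_a, condition_b).
    """
    runs = sorted(patterns)
    conditions = sorted(patterns[runs[0]])
    rdm = {}
    for a in conditions:
        for b in conditions:
            distances = [
                euclidean(patterns[r1][a], patterns[r2][b])
                for r1 in runs
                for r2 in runs
                if r1 != r2
            ]
            rdm[(a, b)] = sum(distances) / len(distances)
    return rdm
```

One consequence of never comparing patterns within a run is that the diagonal of the matrix is generally non-zero: the distance of a condition to itself is estimated from two different runs and therefore reflects measurement noise.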
For each ROI, the lower bound of the noise ceiling was defined as the average Spearman correlation between a given participant's empirical RDM and the average ranked RDM of all other participants [21]. This number reflects the shared variance between the RDMs of different participants that can be captured by any theoretical RDM.
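The leave-one-participant-out computation can be sketched as follows, with each participant's RDM given as a flat list of dissimilarities in a fixed order. The function names and input layout are hypothetical, and Spearman correlation is implemented directly as a Pearson correlation on average ranks.

```python
import math


def rank(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks."""
    return pearson(rank(x), rank(y))


def noise_ceiling_lower_bound(rdms):
    """Mean correlation of each participant's RDM with the mean ranked RDM of the others."""
    scores = []
    for i, own in enumerate(rdms):
        others = [rank(r) for k, r in enumerate(rdms) if k != i]
        mean_ranked = [sum(col) / len(col) for col in zip(*others)]
        scores.append(spearman(own, mean_ranked))
    return sum(scores) / len(scores)
```

Leaving the held-out participant out of the group average is what makes this a lower bound: the average cannot be fitted to that participant's own noise.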

Group-level inference
For exploratory whole-brain analysis, group-level inference followed an OLS procedure on the subject-specific contrast maps. Correction for multiple comparisons was performed at the cluster level, using a significance threshold of p = 0.05 and a cluster-defining threshold of p = 0.001. No correction for multiple comparisons was applied to our pre-specified ROIs.
Bayes factors were extracted by following the method described in Rouder et al. [18]. Whenever an expected effect size could be estimated from Exp. 1, we used it as a scaling factor for the prior distribution over effect sizes, reflecting a belief that if an effect exists, there is a probability of 0.5 that it is weaker than what we had observed in Exp. 1. Whenever an effect size could not be reliably estimated, we used the default scaling factor of $\sqrt{2}/2$.
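The link between the scaling factor and the stated prior belief can be made explicit. The lines below assume that the prior over standardized effect size is a zero-centred Cauchy distribution with scale $r$, as in the default Bayesian t-test; the symbols $\delta$ and $r$ are introduced here for illustration.

```latex
\delta \sim \mathrm{Cauchy}(0, r), \qquad
p(\delta) = \frac{1}{\pi r \left[ 1 + (\delta / r)^{2} \right]}

P\left( |\delta| \le r \right) = \int_{-r}^{r} p(\delta)\, d\delta
  = \frac{2}{\pi} \arctan(1) = \frac{1}{2}
```

Under this assumption the scale $r$ is the median of $|\delta|$, so setting $r$ to the effect size observed in Exp. 1 places half of the prior mass on effects weaker than that estimate.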
Ethics. Participants gave their informed consent to take part in the experiment. The experiment was approved by the UCL ethics committee (approval nos. 8231/001 and 1260/003).