Metacognition across sensory modalities: Vision, warmth, and nociceptive pain

The distinctive experience of pain, beyond mere processing of nociceptive inputs, is much debated in psychology and neuroscience. One aspect of perceptual experience is captured by metacognition—the ability to monitor and evaluate one’s own mental processes. We investigated confidence in judgements about nociceptive pain (i.e. pain that arises from the activation of nociceptors by a noxious stimulus) to determine whether metacognitive processes contribute to the distinctiveness of the pain experience. Our participants made intensity judgements about noxious heat, innocuous warmth, and visual contrast (first-order, perceptual decisions) and rated their confidence in those judgements (second-order, metacognitive decisions). First-order task performance between modalities was balanced using adaptive staircase procedures. For each modality, we quantified metacognitive efficiency (meta-d’/d’)—the degree to which participants’ confidence reports were informed by the same evidence that contributed to their perceptual judgements—and metacognitive bias (mean confidence)—the participant’s tendency to report higher or lower confidence overall. We found no overall differences in metacognitive efficiency or mean confidence between modalities. Mean confidence ratings were highly correlated between all three tasks, reflecting stable inter-individual variability in metacognitive bias. However, metacognitive efficiency for pain varied independently of metacognitive efficiency for warmth and visual perception. That is, those participants who had higher metacognitive efficiency in the visual task also tended to have higher metacognitive efficiency in the warmth task, but not necessarily in the pain task. We thus suggest that some distinctive and idiosyncratic aspects of the pain experience may stem from additional variability at a metacognitive level. We further speculate that this additional variability may arise from the affective or arousal aspects of pain.


Introduction
Subjectivity is considered a fundamental aspect of the pain experience (e.g. Beecher, 1957, 1965; Coghill, McHaffie, & Yen, 2003; Guerit, 2012; Hyyppä, 1987; Koyama, McHaffie, Laurienti, & Coghill, 2005; Raij, Numminen, Narvanen, Hiltunen, & Hari, 2005). One facet of subjective experience is metacognition, the ability to monitor and evaluate one's own mental processes (Metcalfe & Shimamura, 1994). Metacognition can be measured by how closely confidence reports track the fidelity of the mental process in question. In perceptual decision-making tasks, people with high metacognitive sensitivity are more confident when they have made a correct judgement (i.e. when their perceptual decision accurately reflects the physical properties of a sensory stimulus) than when they have made an incorrect judgement. Independently of metacognitive sensitivity, a person might show a metacognitive bias, that is, a tendency to be over- or under-confident regardless of whether the judgement was correct. These measures jointly characterise how people evaluate their perceptual decisions. Applied to judgements about nociceptive pain (i.e. pain that arises from the activation of nociceptors by a noxious stimulus; IASP Task Force on Taxonomy, 2011), metacognitive measures may shed light on some distinctive features of pain perception, such as its vividness and its variability, even when the physical properties of the evoking stimulus are held constant (Coghill et al., 2003; Nickel et al., 2017; Schulz et al., 2015; Woo et al., 2017).
There are several reasons to suspect that metacognition for nociceptive pain may differ from metacognition for other sensory modalities. First, nociception, like interoceptive senses, serves a primary role in body regulation and defence (Craig, 2002, 2003), rather than fine discrimination of stimulus attributes. Indeed, the first response to nociceptor activation is usually a reflexive defensive reaction (Ellrich, Bromm, & Hopf, 1997; Skljarevski & Ramadan, 2002; Willer, 1977). Metacognitive oversight would benefit a sensory system tuned for discriminative precision because it allows for error correction and strategic behavioural adjustments in response to uncertainty (Redford, 2010; Yeung & Summerfield, 2012). In contrast, sensory systems that maintain homeostasis and facilitate quick defensive reactions must be able to function effectively without conscious cognitive control. Thus, metacognition may have less access to pain and to interoceptive senses than to sensory systems with fine discriminative capacities such as vision. Indeed, studies of interoceptive heartbeat perception have generally found poor metacognitive sensitivity to such signals (Azevedo, Aglioti, & Lenggenhager, 2016; Garfinkel, Seth, Barrett, Suzuki, & Critchley, 2015; Khalsa et al., 2008) and dissociations in metacognitive sensitivity between interoceptive and exteroceptive sensory modalities (Garfinkel et al., 2016).² Nociceptive metacognition might be similarly dissociated from exteroceptive metacognition because a basic function of nociception is to defend the integrity of the body by allowing quick motor reactions.
https://doi.org/10.1016/j.cognition.2019.01.018 Received 23 May 2018; Received in revised form 25 January 2019; Accepted 28 January 2019
Second, nociceptive pain elicits physiological arousal and affective responses in addition to sensory processes (Hilgard & Morgan, 1975;Lenox, 1970;Melzack & Casey, 1968;Rainville, Carrier, Hofbauer, Bushnell, & Duncan, 1999;Storm, 2008). Studies that induced changes in arousal through subliminal affective priming (Allen et al., 2016) and pharmacological manipulation (Hauser et al., 2017) suggested that arousal responses may reduce the tendency to adjust metacognitive judgements according to internal or external noise, although they disagreed on which aspect of metacognition (sensitivity or bias) was most affected. Additionally, some studies have reported that negatively-valenced material increased measures of confidence in perception (Koizumi, Mobbs, & Lau, 2016) and in subsequent recall (Schwartz, 2010;Zimmerman & Kelley, 2010), while others found no effect of negative valence on metacognition (D'Angelo & Humphreys, 2012;Jersakova, Souchay, & Allen, 2015). Though these studies offer mixed evidence on the relations between arousal, affect, and metacognition, they suggest that the negatively valenced and arousing qualities of nociceptive pain could alter the calibration of metacognitive judgements, perhaps yielding over-confidence in perceptual decisions.
We investigated how metacognitive access to nociception compares to thermoception, a sensory modality that also serves a regulatory role for the body, and to vision, a sensory modality with fine discriminative capacities that is widely studied in metacognition research. Participants made intensity discrimination judgements about three different kinds of stimuli: noxious heat (pain), innocuous warmth, and visual gratings (contrast). They also rated their confidence in those judgements. We quantified metacognitive access using the ratio meta-d'/d'. This represents the efficiency with which confidence ratings discriminate between 'correct' and 'incorrect' trials, while controlling for differences in perceptual sensitivity (Fleming, 2017; Maniscalco & Lau, 2012). To examine metacognitive bias, we also compared mean confidence ratings across these three modalities. We controlled task difficulty across participants and sensory modalities using an adaptive staircase procedure. Because both nociception and thermoception serve chiefly defensive and regulatory functions (Craig, 2002, 2003), we expected to find lower metacognitive efficiency scores for nociceptive pain and innocuous warmth discrimination tasks than for a visual contrast discrimination task. Further, we expected that individual differences in metacognitive efficiency would correlate across pain and warmth discrimination tasks, but that neither would correlate with metacognitive efficiency for visual contrast discrimination. Finally, we predicted higher confidence in judgements about pain, relative to judgements about warmth and visual contrast, because of the characteristic vividness and aversiveness of pain experiences.

Participants
To determine sample size, we used sequential hypothesis testing with Bayes factors (Schönbrodt, Wagenmakers, Zehetleitner, & Perugini, 2017). We selected a minimum sample size of 24, and defined our stopping rule as the point at which the Bayes factors (BF10) for analyses of variance (ANOVAs) across our three conditions were higher than 3.00 (implying moderate support for the alternative hypothesis) or lower than 0.33 (implying moderate support for the null hypothesis; Jeffreys, 1961; Lee & Wagenmakers, 2013). We calculated Bayes factors after running 24 participants, and again after each additional 4 participants. Our stopping rule was reached at 36 participants (18 female, mean age = 24.50, range = 19-38). Sequential hypothesis testing with Bayes factors does not require corrections for multiple tests because the critical inference is based not on the probability of making a Type I error, but on a ratio (BF10) indicating how much more (or less) likely the data would be under the alternative hypothesis compared to the null hypothesis (Schönbrodt et al., 2017).
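The stopping rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `max_n` safety cap, and the reading that every ANOVA Bayes factor must be conclusive before stopping are not taken from the paper, and the Bayes factors themselves would have to come from the actual analysis software.

```python
from typing import Callable, Optional, Sequence


def is_conclusive(bf10: float, upper: float = 3.0, lower: float = 0.33) -> bool:
    """True when BF10 gives at least moderate support to either hypothesis."""
    return bf10 > upper or bf10 < lower


def stopping_sample_size(
    bayes_factors_at: Callable[[int], Sequence[float]],
    min_n: int = 24,   # minimum sample size reported in the text
    step: int = 4,     # Bayes factors re-checked after every 4 additional participants
    max_n: int = 200,  # hypothetical safety cap, not part of the reported design
) -> Optional[int]:
    """Return the first checkpoint at which all Bayes factors are conclusive.

    `bayes_factors_at(n)` is a hypothetical callback that returns the ANOVA
    BF10 values computed on the first n participants.
    Assumption: the rule requires *all* ANOVAs to be conclusive.
    """
    n = min_n
    while n <= max_n:
        if all(is_conclusive(bf) for bf in bayes_factors_at(n)):
            return n
        n += step
    return None
```

Because the checkpoints are fixed in advance (24, 28, 32, ...), the only free choice in this sketch is how several Bayes factors are combined into one stopping decision.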
All participants had normal or corrected-to-normal vision, normal cutaneous sensation, and no history of neurological or psychiatric disorders by self-report. They gave written informed consent prior to the experiment, and were compensated for their time with a per-hour payment of £7.50 or 1 course credit. One participant chose not to complete the experiment, and another participant's data were lost due to equipment failure. These incomplete datasets were not analysed. A third participant finished the experiment but performed at chance level on the innocuous warmth discrimination task, so that participant's entire dataset was also excluded from all analyses. These participants were replaced with others in the final sample. The study was approved by the University College London Research Ethics Committee, and carried out in accordance with the provisions of the World Medical Association Declaration of Helsinki.
Noxious and innocuous thermal stimuli were delivered using a computer-controlled Peltier thermode with a 13-mm diameter pen-shaped probe (Physitemp NTE-2A, Clifton, NJ, USA). The probe was affixed to a computer-controlled haptic device (PHANToM Premium 1.5, Geomagic, Morrisville, NC, USA) that was used to jitter stimulus position and to bring the probe into contact with the hand dorsum with a light force of 0.2 N. Skin temperature on the hand dorsum was monitored with a spot infrared thermometer (Precision Gold N85FR; Maplin Electronics, Rotherham, UK).

Procedure
All participants completed a perceptual intensity discrimination task in three different modalities: visual contrast, innocuous warmth, and nociceptive pain. Participants also completed a manipulation check in which they rated the painfulness of stimuli used in the nociceptive pain and innocuous warmth tasks, to confirm that the temperature ranges were perceived differently. These four tasks were completed in two experimental sessions on separate days. The second session was done within three days of the first session, and at the same time of day. Each session lasted about 1.5 h. The nociceptive pain and innocuous warmth discrimination tasks were always done in different sessions to minimise effects of habituation, sensitisation, or receptor fatigue from repeated thermal stimulation. The order of these tasks was counterbalanced across participants. The manipulation check was always done in the second session, after both the nociceptive pain and innocuous warmth discrimination tasks had been completed. The visual contrast discrimination task was done in the first session with either the nociceptive pain or the innocuous warmth discrimination task. Task order in the first session was counterbalanced across participants.
Each task consisted of 180 trials of a two-interval alternative forced choice (2IFC) judgement. Participants were given a short break after every 20 trials. The first 20 trials were considered a practice block, and were not included in any statistical analyses. Each trial consisted of a reference stimulus, which was presented at the same stimulus intensity (i.e. the same contrast or temperature) on every trial, and a test stimulus, whose intensity was adapted throughout the task using a continuous 2-down/1-up staircase procedure, in order to keep discrimination accuracy at approximately 70.7% (Levitt, 1971). The order and locations of the reference and target stimuli were counterbalanced across trials.
² Note that none of those findings were based exclusively on the heartbeat counting task, which was shown to be a flawed measure of interoceptive accuracy (Zamariola, Maurage, Luminet, & Corneille, 2018).
B. Beck et al. Cognition 186 (2019) 32-41

Visual contrast discrimination
Participants sat with their head in a chin rest approximately 57 cm from the screen. Each trial began with a central fixation cross (1000 ms), followed by two Gabor patches presented sequentially (200 ms each) with a 300-ms inter-stimulus interval (ISI). The first Gabor patch was presented either 7.5° to the left or 7.5° to the right of the fixation cross (pseudorandomly with equal probability across trials), and the second Gabor patch was presented in the other location, in order to mirror the spatial jittering procedure used for the innocuous warmth and noxious heat tasks (see Sections 2.3.2 and 2.3.3). After the offset of the second stimulus, a prompt appeared on the screen asking participants to report which stimulus was higher in contrast. Following their response, another prompt appeared asking them to report how confident they were in their response on a scale of 1 (not confident) to 4 (confident). Participants were encouraged to use the entire confidence scale over the course of the task. They used a numerical keypad to respond to both prompts (Fig. 1a).
The reference stimulus was always presented with 50% contrast. The test stimulus started at 70% and was adapted throughout the task based on performance. It was increased by 3% following an incorrect response and decreased by 3% following two consecutive correct responses.
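The 2-down/1-up rule can be made concrete with a short sketch. At equilibrium a step down (two correct responses in a row, probability p²) is as likely as a step up (1 − p²), so p² = 0.5 and p ≈ 0.707, which is the 70.7% target cited from Levitt (1971). The class below is a minimal illustration, not the authors' experiment code: the class name, the `floor` bound, and the 100% contrast ceiling are assumptions, while the start value, step size, and reference contrast come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class TwoDownOneUpStaircase:
    """2-down/1-up staircase on the intensity of the test stimulus."""

    level: float    # current test intensity (e.g. % contrast or degrees C)
    step: float     # fixed step size
    floor: float    # lowest allowed level (assumption: the reference intensity)
    ceiling: float  # highest allowed level
    _streak: int = field(default=0, init=False, repr=False)

    def update(self, correct: bool) -> float:
        """Record one response and return the intensity for the next trial."""
        if correct:
            self._streak += 1
            if self._streak == 2:  # two consecutive correct: make the task harder
                self.level = max(self.floor, self.level - self.step)
                self._streak = 0
        else:                      # any error: make the task easier
            self.level = min(self.ceiling, self.level + self.step)
            self._streak = 0
        return self.level


# Visual task values from the text: 50% reference, 70% start, 3% steps.
# The ceiling of 100% is an assumption (contrast cannot exceed it).
visual = TwoDownOneUpStaircase(level=70.0, step=3.0, floor=50.0, ceiling=100.0)
history = [visual.update(c) for c in [True, True, True, False, True, True]]
```

The same class would cover the thermal tasks by setting `ceiling` to the stated safety limits, which is where a clamped staircase can stop tracking the 70.7% point for some participants.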

Innocuous warmth discrimination
Participants sat with their left hand placed palm down on the table in front of them. Prior to the task, the baseline skin temperature on their left hand dorsum was recorded (M = 31.04°C, SD = 2.19°C). Each trial began with a central fixation cross which remained on the screen until response prompts were displayed. The haptic device sequentially delivered two contact thermal stimuli (2000 ms each) to distinct locations on the left hand dorsum with a 3000-ms ISI. Stimulus location was jittered between four different locations on the hand dorsum to avoid peripheral effects such as receptor fatigue or persistent changes in skin temperature. The distance between these locations was adjusted for each participant based on hand size and shape, but was always at least 15 mm. After the offset of the second stimulus, a prompt appeared on the screen asking participants to report which stimulus was warmer. Then participants rated their confidence in their perceptual decision, as described in Section 2.3.1 above. Skin temperature on the left hand dorsum was monitored between blocks to ensure it had returned to the baseline skin temperature before starting the next block (mean change = 0.10°C, SD = 0.27°C).
The reference stimulus was always 38.0°C. The target stimulus started at 40.0°C and was adapted throughout the task based on performance. It was increased by 0.5°C following an incorrect response and decreased by 0.5°C following two consecutive correct responses. The test stimulus was never increased higher than 43.0°C (even if a participant made an incorrect response when comparing a 43.0°C test stimulus with the 38.0°C reference stimulus) to avoid delivering stimuli in the noxious heat range.

Nociceptive pain discrimination
The procedure of the nociceptive pain discrimination task was the same as the procedure for innocuous warmth discrimination (see Section 2.3.2), except that we used a higher temperature range of noxious heat for thermal stimulation, and participants reported which stimulus was more painful. The reference stimulus was always 45.0°C (i.e. the normative heat pain threshold; Dyck et al., 1993;Yarnitsky, Sprecher, Zaslansky, & Hemli, 1995). The target stimulus started at 47.0°C and was adapted throughout the task based on performance. It was increased by 0.5°C following an 'incorrect' response (i.e. an unexpected response based on noxious stimulus intensity) and decreased by 0.5°C following two consecutive 'correct' responses (i.e. the expected response based on noxious stimulus intensity). The test stimulus was never increased higher than 50.0°C as a precaution against skin damage. The baseline skin temperature on the left hand dorsum was recorded prior to the task (M = 31.24°C, SD = 2.83°C), and monitored between blocks to ensure it had returned to baseline before starting the next block (mean change = 0.17°C, SD = 0.37°C).

Manipulation check for thermal stimuli
In each trial, a single thermal stimulus (2000 ms) was delivered to the left hand dorsum. The temperature of the stimulus was set to either the lowest temperature delivered in the nociceptive pain discrimination task (i.e. 45.0°C) or the highest temperature delivered on any trial to each individual participant in the innocuous warmth discrimination task (M = 42.68°C, SD = 0.54°C). These temperatures were chosen to ensure that even the most similar stimuli delivered in the nociceptive pain and innocuous warmth discrimination tasks were perceived differently. After stimulus offset, a prompt appeared on the screen asking participants to report how painful the stimulus was on a scale of 1 (not painful) to 4 (painful). The brief task consisted of 20 trials (10 of each stimulus temperature) in a randomised order.

Statistical analysis
First, we compared the percentage of correct responses between tasks using a Bayesian repeated measures ANOVA and Bayesian paired samples t-tests with default Cauchy priors (t-tests: r = 0.707; ANOVA: r fixed = 1, r random = 0.5) to check whether our staircase procedures were successful. Then we used participants' 2IFC intensity judgements and confidence ratings to calculate signal detection theoretic measures of first-order perceptual sensitivity (d'), second-order metacognitive sensitivity (meta-d'), and metacognitive efficiency (meta-d'/d') for each participant in each sensory modality. To do this, we used a single-subject Bayesian estimation approach, which tends to perform better than the maximum likelihood estimation and sum-of-squared error approaches when there are relatively few trials per subject and condition (Fleming, 2017). We calculated metacognitive bias as the participant's mean confidence rating in each task, irrespective of accuracy. Then we used Bayesian repeated measures ANOVAs and Bayesian paired samples t-tests to look for differences in perceptual sensitivity, metacognitive sensitivity, metacognitive efficiency, and mean confidence between sensory modalities.
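To make the quantities above concrete, the sketch below computes a standard point estimate of d' from response counts, the meta-d'/d' ratio, and mean confidence. It is not the Bayesian estimation procedure the authors used: meta-d' requires model fitting (Fleming, 2017) and is simply taken as an input here. The function names, the treatment of the two interval orders as the two stimulus classes, and the log-linear correction for extreme rates are assumptions made for illustration.

```python
from statistics import NormalDist, mean
from typing import Sequence

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Point estimate of first-order sensitivity, d' = z(H) - z(F).

    Assumption: a log-linear correction (add 0.5 to each count) keeps the
    hit and false-alarm rates away from 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return _z(hit_rate) - _z(fa_rate)


def metacognitive_efficiency(meta_d: float, d: float) -> float:
    """meta-d'/d'; a value of 1 means confidence uses all first-order evidence."""
    return meta_d / d


def metacognitive_bias(confidence_ratings: Sequence[int]) -> float:
    """Mean confidence on the 1-4 scale, irrespective of accuracy."""
    return mean(confidence_ratings)
```

Dividing by d' is what makes the ratio unstable when first-order sensitivity is low, since small errors in the denominator are amplified.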
We used Bayesian Pearson correlations with a default stretched beta prior over positive coefficient values (width = 1) to investigate whether individual differences in these four dependent variables were positively correlated across all possible pairs of sensory modalities in our design. For each condition and dependent measure, we report the mean and the 95% credible interval (CI). We used frequentist Steiger's Z tests implemented by the R package cocor (Diedenhofen & Musch, 2015) to compare correlation coefficients for overlapping pairs of dependent measures. Additionally, we used a hierarchical Bayesian model to estimate group-level correlation coefficients for individual differences in metacognitive efficiency (Fleming, 2017).
All Bayesian hypothesis tests were performed in JASP (version 0.8.1.1; http://www.jasp-stats.org). BF10 values indicate how much more likely the alternative hypothesis is than the null hypothesis, given the prior and the evidence (Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010). A BF10 greater than 3.00 or less than 0.33 is considered to show moderate support for the alternative or the null hypothesis, respectively. Similarly, a BF10 greater than 10.00 (or less than 0.10) is considered to show strong support for the alternative (or the null) hypothesis (Jeffreys, 1961; Lee & Wagenmakers, 2013). One of the main advantages of Bayesian hypothesis testing is that, unlike the p-value in standard frequentist hypothesis testing, the Bayes factor distinguishes between results that support the null hypothesis (BF10 < 0.33) and tests that lack the statistical power to infer support for either the alternative or the null hypothesis (0.33 < BF10 < 3.00). Thus, when reporting the results of these tests below, we distinguish between tests showing evidence for a difference (or correlation) between conditions (BF10 > 3.00), tests showing evidence for no difference (or correlation) between conditions (BF10 < 0.33), and tests that were inconclusive (0.33 < BF10 < 3.00).
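The evidence categories just described amount to a simple lookup, sketched here in Python. The function name and the label strings are illustrative choices; the thresholds (3.00, 0.33, 10.00, 0.10) are the ones stated in the text.

```python
def interpret_bf10(bf10: float) -> str:
    """Map a Bayes factor BF10 onto the evidence categories used in the text."""
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf10 > 10.0:
        return "strong support for the alternative"
    if bf10 > 3.0:
        return "moderate support for the alternative"
    if bf10 < 0.10:
        return "strong support for the null"
    if bf10 < 0.33:
        return "moderate support for the null"
    return "inconclusive"
```

The point of the three-way split is that a value between 0.33 and 3.00 is read as too little evidence either way, not as support for the null.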

Percentage of correct responses
A Bayesian repeated measures ANOVA showed strong evidence for differences in the percentage of correct responses between sensory modalities, BF10 = 1.04 × 10^7. Follow-up Bayesian paired samples t-tests showed that participants made fewer correct responses in the innocuous warmth discrimination task (M = 68.9%, 95% CI = [67.6%, 70.1%]) than in the visual contrast discrimination task (M = 71.7%, 95% CI = [71.3%, 72.2%]), BF10 = 328, and the nociceptive pain discrimination task (M = 72.2%, 95% CI = [71.7%, 72.7%]), BF10 = 5.09 × 10^4. The comparison between percentages of correct responses in the visual contrast discrimination task and the nociceptive pain discrimination task was inconclusive, BF10 = 0.47. These results indicate that our attempt to hold task difficulty constant across the three sensory modalities was not entirely successful. We placed a strict upper limit of 43.0°C on the test stimulus in the innocuous warmth intensity staircase so that it would not increase into the noxious heat range. However, some participants gave incorrect answers even at the maximum temperature of the warm test stimulus, so overall performance in this modality was slightly worse than in the other two modalities. Such small but reliable differences in performance reinforce the need to appropriately control for perceptual sensitivity when quantifying metacognition.
Bayesian Pearson correlations showed that individual differences in perceptual sensitivity were not positively correlated between the visual discrimination task and the warmth discrimination task, r = 0.05, BF+0 = 0.26. The correlations between the pain and visual discrimination tasks, r = 0.15, BF+0 = 0.48, and the pain and warmth discrimination tasks, r = 0.27, BF+0 = 1.35, were inconclusive.
Fig. 1. Examples of trials in (a) the visual contrast discrimination task, (b) the innocuous warmth discrimination task, and (c) the nociceptive pain discrimination task. For all three tasks, two stimuli of different intensities were presented sequentially in each trial. Participants made a forced choice intensity discrimination judgement, and then rated their confidence in that judgement on a 4-point scale.

Metacognitive sensitivity (meta-d')
A Bayesian repeated measures ANOVA indicated that there were no differences in metacognitive sensitivity (meta-d') between sensory modalities, BF10 = 0.12 (Fig. 2b). Bayesian Pearson correlations showed that individual differences in metacognitive sensitivity were not positively correlated between the visual discrimination task and the pain discrimination task, r = −0.01, BF+0 = 0.20. The correlations between the visual and warmth discrimination tasks, r = 0.13, BF+0 = 0.42, and the pain and warmth discrimination tasks, r = 0.28, BF+0 = 1.44, were inconclusive (Fig. 3b).

Metacognitive efficiency (meta-d'/d')
We considered that our measure of metacognitive sensitivity (meta-d') might be confounded by differences in perceptual sensitivity between conditions, because the innocuous warmth discrimination task was more difficult than the nociceptive pain and visual contrast discrimination tasks (Fig. 2a). In contrast, metacognitive efficiency scores are not confounded by small differences in perceptual sensitivity between conditions, because they represent the ratio of metacognitive sensitivity to perceptual sensitivity (i.e. meta-d'/d'). Thus, metacognitive efficiency provides a more appropriate measure than metacognitive sensitivity for how well confidence tracked performance in each modality.
A Bayesian repeated measures ANOVA indicated that there were no differences in metacognitive efficiency (meta-d'/d') between sensory modalities, BF10 = 0.32 (Fig. 2c). As a group, participants were close to metacognitive optimality, with metacognitive efficiency scores near 1 (vision: M = 0.90, 95% CI = [0.78, 1.02]; warmth: M = 1.00, 95% CI = [0.88, 1.12]; pain: M = 0.88, 95% CI = [0.77, 1.00]). That is, the d' that provided the best fit to confidence ratings was similar to observed perceptual sensitivity. This implies that there was no loss of (or gain in) perceptual information between the first-order perceptual decision and the second-order confidence judgement.
Bayesian Pearson correlations showed strong evidence that individual differences in metacognitive efficiency were positively correlated between visual discrimination and warmth discrimination tasks, r = 0.42, BF+0 = 10.20. (Note that we found evidence supporting the absence of a positive correlation between first-order visual and warmth discrimination performance, i.e. d', so confounds with perceptual sensitivity cannot explain this finding.) Further correlation tests indicated no positive correlation between metacognitive efficiency scores in the visual discrimination task and the pain discrimination task, r = −0.04, BF+0 = 0.17. The correlation between the warmth and pain discrimination tasks was low, but inconclusive, r = 0.12, BF+0 = 0.40 (Fig. 3c).
Our Bayesian correlation tests showed strong evidence for a positive correlation between metacognitive efficiency scores in the visual and warmth discrimination tasks, and moderate evidence against a positive correlation between metacognitive efficiency scores in the visual and pain discrimination tasks. However, those tests did not directly compare the correlation coefficients to each other. To test for differences between correlation coefficients, we used two-tailed Steiger's Z tests for overlapping correlations (employing a standard frequentist hypothesis-testing approach). We found a significant difference between the vision-warmth and vision-pain correlations, Z = 2.13, p = 0.033. This further supports the finding of greater shared variance in metacognitive efficiency between the visual and warmth discrimination tasks than between the visual and pain discrimination tasks. Comparisons between vision-warmth and pain-warmth correlations, Z = 1.29, p = 0.198, and between vision-pain and pain-warmth correlations, Z = −0.89, p = 0.372, were not significant. (Note that frequentist hypothesis tests do not distinguish between evidence for the absence of a difference and insufficient statistical power to detect a difference.)
All preceding correlation tests were based on point estimates of metacognitive efficiency from a relatively small number of participants (N = 36). Single-subject estimates of metacognitive efficiency can be noisy, so our estimates of the correlation coefficients may have also been imprecise. To overcome this potential issue, we used a hierarchical Bayesian model to estimate the covariance in metacognitive efficiency between visual, warmth, and pain discrimination tasks. A hierarchical Bayesian model ensures that uncertainty in subject-level parameter estimates appropriately propagates through to uncertainty around estimates of cross-task covariance (Fleming, 2017). In this case, the hierarchical model fits revealed the same pattern of results as the single-subject estimates. There was a significant positive correlation in individual differences in metacognitive efficiency between the visual and warmth discrimination tasks, ρ = 0.69, 95% CI = [0.06, 0.98]. (Note that statistical significance is obtained when the 95% CI does not overlap with zero.) Individual differences in metacognitive efficiency were not correlated between the visual and pain discrimination tasks,
In all three tasks, several participants had metacognitive efficiency values greater than 1 (Fig. 3c), indicating higher metacognitive sensitivity (meta-d') than perceptual sensitivity (d'). This might occur if confidence depended on some processes independent of performance, for example processes that occur after decision, or in parallel to decision-making (Fleming & Daw, 2017). However, both d' and meta-d' estimates are inevitably subject to error. Metacognitive efficiency, as the ratio of the latter to the former, will be influenced by these errors, particularly when d' is low. We therefore also examined an alternative measure of metacognitive efficiency, meta-d'−d', which is less prone to such error amplification. This alternative measure yielded similar results (see Supplementary Results and Fig. S1).
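The comparison of overlapping correlations reported earlier in this section can be illustrated with a short sketch. It assumes that the test in question is Steiger's (1980) modification of Dunn and Clark's z, which pools the two correlations being compared; the text names only "Steiger's Z" as implemented in cocor, so the exact variant is an assumption, as are the function and argument names.

```python
import math
from statistics import NormalDist
from typing import Tuple


def steiger_z_overlapping(r_jk: float, r_jh: float, r_kh: float, n: int) -> Tuple[float, float]:
    """Compare two dependent correlations that share variable j.

    r_jk, r_jh: the two correlations being compared (e.g. vision-warmth, vision-pain)
    r_kh:       correlation between the two non-shared variables (e.g. warmth-pain)
    n:          number of participants
    Returns (Z, two-tailed p).
    """
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)  # Fisher r-to-z transform
    r_bar_sq = ((r_jk + r_jh) / 2) ** 2              # squared pooled correlation
    # Estimated covariance between the two z-transformed correlations.
    cov = (
        r_kh * (1 - 2 * r_bar_sq)
        - 0.5 * r_bar_sq * (1 - 2 * r_bar_sq - r_kh ** 2)
    ) / (1 - r_bar_sq) ** 2
    z = (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * cov)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Rounded coefficients from the text: vision-warmth 0.42, vision-pain -0.04,
# warmth-pain 0.12, N = 36.
z, p = steiger_z_overlapping(0.42, -0.04, 0.12, 36)
```

Fed the two-decimal coefficients, this sketch gives a Z near 2.1, in the neighbourhood of the reported 2.13. Exact agreement should not be expected, because the published coefficients are rounded.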

Manipulation check for thermal stimuli
A Bayesian paired samples t-test showed strong evidence that participants felt a difference between the lowest level of noxious heat stimulation and the highest level of innocuous warmth stimulation delivered on any trial, BF10 = 1.24 × 10^7, thus validating that the lowest temperature stimulus in the noxious heat range was rated as more painful (M = 2.47, 95% CI = [2.29, 2.65]) than the highest temperature stimulus in the innocuous warmth range (M = 1.88, 95% CI = [1.71, 2.04]). There was, however, some variability in how the stimuli were perceived, both between and within individuals (Fig. 4). This was expected, yet we were not able to further separate the temperature ranges we used for the innocuous warmth and nociceptive pain discrimination tasks, due to the maximum safe contact heat temperature of 50.0°C, and the need to control first-order performance by varying the temperature difference between stimuli in a staircase procedure. We consider the implications of this design limitation in the Discussion. Importantly, our results do not change if we exclude the four participants who did not rate the lowest level of noxious heat as more painful than the highest level of innocuous warmth (see Fig. 4a and Supplementary Results).

Discussion
Our results do not support the hypothesis of reduced metacognitive access to nociceptive pain and innocuous thermal perception, compared to vision. We found no overall differences in metacognitive efficiency (meta-d'/d') between intensity judgements of visual contrast, innocuous warmth, and nociceptive pain (Fig. 2c). Some authors have proposed that interoceptive modalities lack the metacognitive sensitivity that accompanies exteroception (Azevedo et al., 2016; Garfinkel et al., 2015; Khalsa et al., 2008). Like interoceptive senses, the primary functions of both thermoceptive and nociceptive sensory systems are to maintain the optimal condition of the body and to defend it from harm (Craig, 2002, 2003). The visual system, on the other hand, allows us to make fine discriminative judgements about objects and events in our surroundings. The processes of cognitive control and flexible behaviour enabled by metacognition (Redford, 2010; Yeung & Summerfield, 2012) might better serve discriminative functions than regulatory or defensive functions, the latter of which must operate effectively without conscious oversight. Nevertheless, our study indicates comparable metacognitive access to both discriminative and regulatory sensory modalities.
Moreover, we found that individual differences in metacognitive efficiency were positively correlated between the visual contrast and innocuous warmth discrimination tasks (Fig. 3c). Importantly, that correlation must have arisen from individual differences in metacognition rather than first-order perception, because there was no correlation in first-order perceptual sensitivity (d') between the same tasks (Fig. 3a). This finding suggests there is a common metacognitive system for vision and innocuous thermal perception, despite their disparate roles in fine discrimination of stimulus attributes and regulation of the body's condition, respectively. A previous study found no correlation in metacognitive sensitivity between a discriminative sense (touch) and regulatory, interoceptive senses (cardiac and respiratory signals), suggesting distinct metacognitive processes for those sensory categories (Garfinkel et al., 2016). However, those authors used a measure of metacognitive sensitivity (the type II ROC curve) that is potentially confounded by perceptual task performance. Our measure of metacognitive efficiency is not subject to such confounds (Fleming & Lau, 2014).
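To make the type II ROC measure mentioned above concrete, the following minimal Python sketch computes the area under a type II ROC curve from trial-wise accuracy and confidence ratings. It is an illustration only, not the analysis code of any cited study: the function name `type2_auroc` and the toy data are assumptions introduced here. The area is computed through its equivalence with the probability that a randomly chosen correct trial carries higher confidence than a randomly chosen incorrect trial, with ties counted as one half.

```python
from typing import Sequence


def type2_auroc(correct: Sequence[bool], confidence: Sequence[float]) -> float:
    """Area under the type II ROC curve.

    Equals the probability that a randomly drawn correct trial has a higher
    confidence rating than a randomly drawn incorrect trial (ties count 0.5).
    """
    conf_correct = [c for ok, c in zip(correct, confidence) if ok]
    conf_error = [c for ok, c in zip(correct, confidence) if not ok]
    if not conf_correct or not conf_error:
        raise ValueError("need at least one correct and one incorrect trial")

    wins = 0.0
    for cc in conf_correct:
        for ce in conf_error:
            if cc > ce:
                wins += 1.0
            elif cc == ce:
                wins += 0.5
    return wins / (len(conf_correct) * len(conf_error))


# Toy data (hypothetical): three correct trials, two errors, ratings on a 1-4 scale.
toy_correct = [True, True, True, False, False]
toy_confidence = [4, 3, 2, 1, 2]
print(round(type2_auroc(toy_correct, toy_confidence), 3))
```

A value of 0.5 means confidence does not distinguish correct from incorrect trials. Because the proportion and nature of errors depend on first-order sensitivity, this area shifts with d' even when the quality of metacognitive monitoring is unchanged, which is the confound that the meta-d'/d' ratio is intended to avoid.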
Fig. 4. Variability in participants' ratings of the highest level of stimulation used in the innocuous warmth discrimination task (max. 43.0°C) and the lowest level of stimulation used in the nociceptive pain discrimination task (always 45.0°C). Overall, the lowest level of noxious heat was perceived as more painful than the highest level of innocuous warmth. However, perception of these stimuli varied both (a) between participants and (b) between trials.

B. Beck et al. Cognition 186 (2019) 32-41

Conversely, we found evidence against the existence of a correlation between metacognitive efficiency for vision and nociception (Fig. 3c). Further, we found little evidence of a correlation in metacognitive efficiency between nociception and innocuous thermoception, even though the two are similar in terms of their functional roles and physiological pathways (Craig, 2002, 2003). This is particularly striking because we used the same equipment and procedure to administer the stimuli for the innocuous warmth and nociceptive pain discrimination tasks, except that the thermal probe temperature was increased into the noxious heat range in the latter task. The unshared variance in nociceptive metacognition was not predicted, and awaits further support from replication studies. Nevertheless, we consider that it could either reflect a distinct metacognitive process, or an additional source of variation due to individual differences in some component that accompanies pain, such as affect or arousal responses. Pain has a strong affective component in addition to its sensory component (Melzack & Casey, 1968). Ratings of pain intensity and unpleasantness can even be dissociated (e.g. Gracely, Dubner, & McGrath, 1979; Rainville et al., 1999; Smith, Gracely, & Safer, 1998), suggesting that affect is a distinctive component of pain, rather than a mere by-product. In our nociceptive pain discrimination task, participants reported which of two noxious heat stimuli was more painful without being asked to focus on either sensory or affective aspects, so their judgements presumably reflected both these components of pain. Moreover, pain can produce physiological arousal responses (Hilgard & Morgan, 1975; Lenox, 1970; Rainville et al., 1999; Storm, 2008), another factor known to influence metacognition (Allen et al., 2016; Hauser et al., 2017). Since noxious heat stimuli are both more arousing and more negatively valenced than innocuous thermal or visual contrast stimuli, these potential sources of variability would have been stronger in the nociceptive pain discrimination task than in the other tasks.
Either the affective or arousal components of pain may thus have contributed to the unshared variance in nociceptive metacognition that we found here.

In all three discrimination tasks, there were several participants with metacognitive efficiency (meta-d'/d') values greater than 1 (Fig. 3c). Such a finding could potentially result from imprecise estimates of low values of d'. Although there were a few outliers with low d' values in the warmth discrimination task (Fig. 3a), for the most part, our staircase procedure yielded sufficiently high levels of d' to avoid this problem. Moreover, we analysed our data using an alternative, non-ratio measure of metacognitive efficiency (meta-d'−d'), and found the same results (see Supplementary Results and Fig. S1). Thus, our finding suggests that some participants experienced a gain in confidence-related information between their first-order perceptual decision and their subsequent, second-order confidence rating. Some previous studies that measured metacognitive efficiency have also found this trend (Charles, Van Opstal, Marti, & Dehaene, 2013; Faivre, Filevich, Solovey, Kühn, & Blanke, 2018). One possible explanation is that parallel accumulation of evidence or post-decisional processing allowed the recognition of errors in first-order decisions (Charles et al., 2013; Fleming & Daw, 2017). Our use of unspeeded perceptual judgements should have mitigated this influence by reducing errors related to quick responses. Nonetheless, given the difficulty of the discriminations they were asked to make, some participants may have changed their minds after their first decision and assigned lower confidence ratings to trials where they made an error, resulting in higher metacognitive sensitivity (meta-d') than perceptual sensitivity (d').
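The concern about imprecise estimates of low d' can be illustrated with simple arithmetic. The sketch below is a hypothetical example, not part of the reported analysis: it assumes an observer whose true meta-d' equals their true d' (ideal efficiency) and an arbitrary estimation error of 0.2 in d', then compares how the ratio measure (meta-d'/d') and the difference measure (meta-d'−d') respond. The function names and numerical values are illustrative assumptions.

```python
def m_ratio(meta_d: float, d_prime: float) -> float:
    """Ratio measure of metacognitive efficiency, meta-d'/d'."""
    if d_prime <= 0:
        raise ValueError("the ratio is undefined for d' <= 0")
    return meta_d / d_prime


def m_diff(meta_d: float, d_prime: float) -> float:
    """Difference measure of metacognitive efficiency, meta-d' - d'."""
    return meta_d - d_prime


ESTIMATION_ERROR = 0.2  # hypothetical underestimate of d'

for true_d in (0.4, 1.6):
    meta_d = true_d                        # assumed ideal observer: meta-d' = d'
    estimated_d = true_d - ESTIMATION_ERROR
    print(
        f"true d' = {true_d:.1f}: "
        f"ratio = {m_ratio(meta_d, estimated_d):.2f}, "
        f"difference = {m_diff(meta_d, estimated_d):.2f}"
    )
```

Under these assumptions, the same error doubles the ratio when d' is 0.4 but raises it only to about 1.14 when d' is 1.6, whereas the difference measure shifts by 0.2 in both cases. This is one reason a non-ratio measure is a useful robustness check when some participants have low d'.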
In addition, we examined metacognitive bias across vision, innocuous warmth, and nociceptive pain perception. There were no overall differences in confidence between modalities (Fig. 2d), and individual differences in mean confidence ratings were highly correlated across all three tasks (Fig. 3d). This is consistent with previous studies that found correlations in mean confidence levels across different tasks, both within and between sensory modalities (Ais, Zylberberg, Barttfeld, & Sigman, 2016; Song et al., 2011) and between perceptual and memory domains (Baird, Cieslak, Smallwood, Grafton, & Schooler, 2015; Baird, Smallwood, Gorgolewski, & Margulies, 2013; McCurdy et al., 2013). Some studies also found a task-dependent component of metacognitive bias which was attributed to differences in difficulty between tasks (Baird et al., 2015; Baird et al., 2013; Song et al., 2011). We did not find a task-dependent component of metacognitive bias, even though the innocuous warmth discrimination task was more difficult than the nociceptive pain discrimination task and the visual contrast discrimination task. Thus, our participants did not adjust their average confidence reports according to task difficulty. In this study, at least, consistent individual differences in confidence were the strongest contributing factor to metacognitive bias.
Altogether, the results of our correlation tests suggest that metacognition consists of both a modality-independent component (i.e. metacognitive bias) and a modality-dependent component (i.e. metacognitive efficiency). The former was a consistent trait of individuals, while the latter differentiated judgements about nociceptive pain. Further, our findings suggest that metacognitive ability does not dissociate between senses serving primarily regulatory or discriminative functions, as has been previously suggested for interoceptive and exteroceptive somatosensory modalities (Garfinkel et al., 2016). However, our results also refute pure modality-specificity in metacognitive ability, whereby individual differences in metacognitive efficiency would not correlate across any sensory modalities.
Confidence is often modelled as the strength or quality of the evidence that contributes to a first-order decision (Kepecs, Uchida, Zariwala, & Mainen, 2008; Kiani & Shadlen, 2009; Merkle & Van Zandt, 2006). However, it is unclear how first-order models could account for differences in covariance of metacognitive ability across modalities, as we observed here. In contrast, hierarchical models conceptualise metacognition as a distinct second-order network that represents and evaluates the state of the first-order network computing the decision (Cleeremans, Timmermans, & Pasquali, 2007; Fleming & Daw, 2017; Pasquali, Timmermans, & Cleeremans, 2010). Such models might explain our results in two ways. Under one account, metacognitive ability might be correlated when sensory evidence for two different modalities converges on a single metacognitive monitoring process. This account might predict a distinct metacognitive monitoring process for nociception (Fig. 5a), although why this separate circuit should have evolved remains unclear. Alternatively, as we mentioned above, there might be a single metacognitive mechanism for all sensory modalities, but this mechanism might be differentially affected by non-sensory inputs such as arousal or affect. Modalities that differ sharply in their recruitment of these additional factors would also exhibit low correlations in metacognitive ability (Fig. 5b).
Definitions of pain routinely insist on its subjective nature, and some hold the view that pain can never have any 'ground truth' in the physical properties of the world. Chronic pain conditions, which sometimes lack any apparent neurophysiological aetiology, might encourage this view. In our study, however, participants made judgements about pain that directly resulted from noxious thermal stimulation of nociceptive sensory pathways. Moreover, the 2IFC intensity discrimination task we used was specifically designed to test a discriminative aspect of nociceptive pain, similarly to our tests of innocuous warmth and visual contrast discrimination. By applying signal detection theory, we could determine how much participants' pain reports were informed by the properties of the evoking stimulus (i.e. the first-order judgement), as well as how people experienced the processes that contributed to the formation of their pain reports (i.e. the second-order judgement, captured here using the established method of confidence ratings). This method allowed us to investigate the relation between judgements about experimentally evoked pain and underlying nociceptive processes, without insisting that pain is reducible to nociception. An alternative approach could have been to ask participants to report which noxious stimulus was hotter, rather than which was more painful. Such an instruction may have induced them to focus on the thermal quality of the noxious stimulation instead of its painfulness. The potential impact of this manipulation on our findings is an open question, and would depend upon whether the unshared variance in metacognitive efficiency for nociceptive pain came from the noxious nature of the stimulus, or from the task requirement to judge pain levels.
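As an illustration of the first-order signal detection analysis referred to above, the following sketch estimates sensitivity (d') and response criterion from response counts in a two-interval task. It rests on several assumptions that are not taken from the study: "second interval stronger" trials are arbitrarily treated as signal trials, a log-linear correction (adding 0.5 to each count) is applied to avoid infinite z-scores, and d' is computed as z(hit rate) − z(false alarm rate) without the √2 scaling that some conventions apply to two-interval designs. The function name `sdt_measures` and the counts are hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def sdt_measures(
    hits: int, misses: int, false_alarms: int, correct_rejections: int
) -> Tuple[float, float]:
    """Return (d', criterion) from response counts.

    hits:               "second" responses when the second interval was stronger
    misses:             "first" responses when the second interval was stronger
    false_alarms:       "second" responses when the first interval was stronger
    correct_rejections: "first" responses when the first interval was stronger
    """
    # Log-linear correction (an assumption here) keeps rates away from 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts: 80% correct in each stimulus order, no interval bias.
d_prime, criterion = sdt_measures(hits=40, misses=10, false_alarms=10, correct_rejections=40)
print(f"d' = {d_prime:.2f}, criterion = {criterion:.2f}")
```

The second-order quantity, meta-d', is obtained by fitting a signal detection model to the confidence ratings conditional on accuracy, which requires an optimisation routine and is not reproduced in this sketch.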
One limitation of our study was an inability to adjust the temperature ranges of innocuous warmth and noxious heat stimulation so that, for every participant, the latter always felt painful and the former never felt painful at all. We were constrained by safety considerations, which placed an upper limit of 50.0°C on contact thermal stimulation. Additionally, we were constrained by the need to adapt the intensity of the test stimulus throughout the task, so that we could control first-order task performance and specifically test differences between modalities at the metacognitive level. For the innocuous warmth discrimination task, in particular, this often required a large difference between stimulus temperatures. Thus, we could not further separate the innocuous and noxious temperature ranges without compromising these important considerations, even though it meant that participants would sometimes perceive the upper end of the innocuous warmth range as somewhat painful, or the lower end of the noxious heat range as not at all painful (Fig. 4). If, as we speculate above, the unshared variance in metacognitive efficiency for nociceptive pain judgements arose from affective or arousal responses to noxious stimulation, then we might have found a clearer dissociation between metacognitive efficiency for innocuous warmth and nociceptive pain discrimination if we had adjusted the temperature ranges for each individual participant based on how painful that participant found the stimuli. It is also possible that confidence in judgements about nociceptive pain intensity could be substantively different when discriminating a painful stimulus and a non-painful stimulus, compared to two painful stimuli. We cannot exclude the possibility that some trials in our nociceptive pain discrimination task involved comparing stimuli of different quality (painful vs non-painful) rather than comparing stimuli of different intensity (more vs less painful). 
This may have introduced some variance in metacognitive efficiency that was not shared with the other tasks. Future studies could explore these issues by using innocuous and noxious thermal stimulation parameters that separate more clearly along the dimension of painfulness (e.g. innocuous cool temperatures vs noxious heat stimuli).
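The interaction between the adaptive procedure and the safety ceiling described in this limitation can be sketched in code. The example below is hypothetical: the text does not specify the staircase rule, so a generic two-down, one-up rule is assumed, and the step size, starting difference, and minimum difference are arbitrary illustrative values. Only the 45.0°C reference and the 50.0°C ceiling are taken from the text.

```python
from dataclasses import dataclass

MAX_SAFE_TEMP_C = 50.0  # upper limit on contact heat stated in the text


@dataclass
class Staircase:
    """Two-down, one-up staircase on the temperature difference (assumed rule)."""

    reference_c: float        # fixed reference temperature
    delta_c: float            # current test-minus-reference difference
    step_c: float = 0.5       # hypothetical step size
    min_delta_c: float = 0.5  # hypothetical smallest difference
    correct_streak: int = 0

    def test_temperature(self) -> float:
        # Never exceed the safety ceiling, whatever the staircase requests.
        return min(self.reference_c + self.delta_c, MAX_SAFE_TEMP_C)

    def update(self, correct: bool) -> None:
        if correct:
            self.correct_streak += 1
            if self.correct_streak == 2:  # two correct in a row: make it harder
                self.delta_c = max(self.delta_c - self.step_c, self.min_delta_c)
                self.correct_streak = 0
        else:                             # any error: make it easier
            self.correct_streak = 0
            max_delta = MAX_SAFE_TEMP_C - self.reference_c
            self.delta_c = min(self.delta_c + self.step_c, max_delta)


# Noxious heat task: reference fixed at 45.0 °C, so the difference cannot exceed 5.0 °C.
pain = Staircase(reference_c=45.0, delta_c=2.0)
for _ in range(10):
    pain.update(correct=False)
print(pain.delta_c, pain.test_temperature())
```

In this sketch, a participant who keeps making errors reaches the ceiling and the task cannot be made any easier, which shows why a fixed safety limit restricts how far the two temperature ranges can be separated while the staircase is still free to adapt.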
To conclude, we demonstrated that confidence tracks perceptual intensity judgements as precisely for nociceptive pain as for other modalities. However, we found no correlation between metacognitive efficiency for nociception and for vision, and minimal correlation between metacognitive efficiency for nociception and for thermoception. Thus, second-order judgements about nociceptive pain level appear to involve an additional factor, which may be the arousal and/or affective responses typical of noxious stimulation. Metacognitive appraisal is closely linked to higher-order accounts of conscious experience (Lau & Rosenthal, 2011). Our findings are thus consistent with the interesting possibility that distinctive and idiosyncratic features of the nociceptive pain experience, namely high vividness and inter-individual variability, may lie in the affective or motivational components of pain rather than the sensory component.