Introduction

Many students are generally overconfident about their own performance (De Bruin et al. 2017; Dunning et al. 2003; Ehrlinger and Dunning 2003; Kruger and Dunning 1999; Pennycook et al. 2017; Sanchez and Dunning 2018). Not being able to accurately estimate the level of one’s performance can have far-reaching implications, as performance estimates influence control decisions made during the learning process (Nelson and Narens 1990). In the case of overconfidence, students may prematurely end studying because of an incorrect conviction they have mastered the materials (Bol et al. 2005; Dunlosky and Rawson 2012). Especially low-performing students seem to suffer from inaccurate self-monitoring, mostly from overconfidence. High performing students are often much better calibrated, although they may show some underconfidence. This difference in over- and underconfidence among low and high performers is dubbed the unskilled-and-unaware effect (Dunning et al. 2003; Kruger and Dunning 1999).

The unskilled-and-unaware effect sparked discussion in the literature. A first explanation for the effect was that low performers lack metacognitive awareness because of a ‘double curse’ (Kruger and Dunning 1999). Low performers would lack the skills to conduct a task, and because of this incompetence, they were supposed to also be unaware of the level of their performance. However, other researchers suggested that the effect was simply an artefact of scaling. For example, Krueger and Mueller (2002) and Feld et al. (2017) showed that a large proportion of the unskilled-and-unaware effect could be attributed to regression to the mean. Yet, there remained a proportion of miscalibration and overconfidence of low performers that could not fully be explained by scaling error (Ehrlinger et al. 2008; Feld et al. 2017). It seemed that, albeit less than initially thought, low performers showed less awareness of their own performance than high performers, at least when judging their performance relative to their peers. In a study by Hartwig and Dunlosky (2014), a classic unskilled-and-unaware effect was shown when students compared their performance to a peer group. However, the effect disappeared when students made absolute judgements of their own performance. Hence, the miscalibration of low performing students on percentile judgements seemed to arise from the inability to judge how their reference group would perform, not from a lack of self-awareness (Hartwig and Dunlosky 2014).

Recently, the unskilled-and-unaware effect has also been challenged by studies using a different type of metacognitive measurement. Miller and Geraci (2011) argued that focusing solely on performance estimates gives an incomplete picture of students’ metacognitive awareness. Instead, they argued that calibration research should focus on both functional and subjective confidence. Miller and Geraci defined functional confidence as students’ performance estimate, and subjective confidence as the confidence judgement assigned to a given performance estimate. Such subjective confidence judgements are studied by asking students to provide a second-order judgement (SOJ), i.e., rating the confidence they had in their estimate (Dunlosky et al. 2005). Using confidence judgements in calibration research is not new. Many studies on metacognitive awareness use confidence judgements (e.g., Lichtenstein and Fischhoff 1977; Siedlecka et al. 2016, 2019). For example, participants were asked to select a certain answer to a question and to rate how confident they were that their selected answer was indeed accurate (e.g., Dinsmore and Parkinson 2013; Siedlecka et al. 2016). This is, however, different from SOJs. Instead of judging how confident they are in their answer, students judge how confident they are in their estimate when providing a SOJ. Although Dunlosky et al. (2005) pointed to the value of SOJs for gaining more insight into metacognitive awareness, only relatively recently have SOJs become a focus of attention (Miller and Geraci 2011; Händel and Fritzsche 2013, 2016). The reason for this is that SOJs may shed more light on the question of whether low performers are indeed unaware of their poor performance, or whether they do show some metacognitive awareness by assigning low confidence scores to their performance estimate.

In their study, Miller and Geraci asked university students not only to estimate their grade before taking an exam, but also to provide a ‘second-order judgement’ (SOJ; see Dunlosky et al. 2005): rating the confidence they had in their estimate. In line with previous research, Miller and Geraci found that low performers showed a larger mismatch between their estimated grade and actual grade than high performers. At the same time, however, low performers reported less confidence in the accuracy of their estimates than high performers. According to Miller and Geraci, low performers thus seemed to have a certain level of metacognitive awareness. Following up on the research of Miller and Geraci (2011), Händel and Fritzsche (2016) investigated the alignment between SOJs and calibration accuracy on item-by-item judgements and global judgements (i.e., judgements regarding performance on an entire task). Händel and Fritzsche also found a positive relation between performance level and SOJs. Low performers, who were found to be poorly calibrated, were again less confident in their performance estimates than high performers.

Hence, the studies of Miller and Geraci (2011) and Händel and Fritzsche (2016) showed that low performers seem to be aware of their poor calibration accuracy. This finding has implications for the explanation of the unskilled-and-unaware effect, because, as also stated by Hartwig and Dunlosky (2014), qualifying low performers as fully unaware of their performance seems to be incorrect. However, in a follow-up study, Fritzsche et al. (2018) reanalyzed the data from Händel and Fritzsche (2016), and found that low performing students seemed to have a general tendency to provide low confidence judgements. This finding raises the issue whether it is performance level that determines whether students provide high or low subjective confidence scores, or whether it is actual metacognitive awareness.

If SOJs could be used to measure metacognitive awareness, we would expect subjective confidence to be directly related to calibration accuracy. This means that students high on miscalibration have little confidence in their judgements, and students low on miscalibration show high confidence, irrespective of students’ performance level. In that case, students would be metacognitively aware, with confidence judgements aligned to their calibration accuracy.

However, although these prior studies all aim to examine whether people are able to match their confidence judgements to their calibration accuracy, none of the previous studies actually examined the direct relation between calibration accuracy on the one hand, and SOJs on the other, independent from performance level. Hence, the question remains whether lower or higher SOJs are caused by differences in performance, or in students’ ability to calibrate accurately.

Another question that remains unanswered is whether findings on SOJs generalize to different age and school groups. While the studies discussed above have been exclusively conducted using university students (e.g., Fritzsche et al. 2018; Händel and Fritzsche 2016; Miller and Geraci 2011), unskilled-and-unaware effects have been found for younger students as well (e.g., Finn and Metcalfe 2014; Labuhn et al. 2010). At the same time, however, students and younger children are not completely unaware of their performance. Although research shows a clear age trend in metacognitive awareness (Roebers and Schneider 2005; Schneider 2008), from the age of 8, children show some metacognitive awareness on simple tasks (De Bruin et al. 2011; Lyons and Ghetti 2011; Roebers 2017, see also Schneider 2008). However, when tasks are more complicated and require more advanced metacognitive skills, for example because students have to show transfer skills or have to recognize whether they guessed or not, age trends reappear (Hacker 1997; Lockl and Schneider 2002; Roebers 2014; Van der Stel and Veenman 2010; Weil et al. 2013). In such situations, the basic metacognitive skills need to be fine-tuned, a process that continues in adolescence (Paulus et al. 2014) and plateaus in adulthood (Roebers 2014; Weil et al. 2013). Providing SOJs that align to the accuracy of one’s estimate (i.e., low confidence for an inaccurate estimate, high confidence for an accurate estimate) is a more complex form of metacognitive awareness, especially when the tasks on which such judgements have to be made consist of problem-solving. It is therefore an open question whether findings on university students’ subjective confidence can be generalized to younger secondary school students. Alternatively, secondary school students may be less metacognitively aware, and show less alignment between their subjective confidence and actual calibration accuracy.

Present study

The current study thus aimed to answer two research questions: (1) are SOJs related to calibration accuracy, independent from performance level?; and (2) does the negative relation between SOJs and miscalibration hold for both university students and secondary school students?

To investigate the relation between subjective confidence and calibration accuracy, we conducted two studies in which students were asked to estimate their exam grade and to provide a confidence score of their estimate (subjective confidence as measured by a SOJ) after taking their exam. In the first study, we examined the alignment between SOJs and calibration accuracy among university students who took an exam in educational psychology (cf. Miller and Geraci 2011). The second study was conducted among secondary school students who took exams in three different subjects: French, German, and Mathematics. For each exam, we investigated the alignment between calibration accuracy and SOJs.

Based on Miller and Geraci, we expected a significant negative relation between SOJs and calibration accuracy. That is, higher miscalibration would be associated with lower confidence. Furthermore, because developmental differences might affect metacognitive awareness, we expected that secondary school students would be less metacognitively aware and thus that the relation between calibration accuracy and SOJs would be weaker in this sample.

Finally, we checked whether we could replicate the association between performance level and SOJ that Miller and Geraci (2011) found. Similar to Miller and Geraci, we divided our students into different quartiles based on performance level, to examine the differences in SOJs among high and low performers. Based on the assumption that low performers indeed calibrate worse than high performers (cf. the unskilled-and-unaware effect), we expected low performers to also provide lower confidence scores.

Method

Participants

Two-hundred-and-ninety-four second-year psychology students and 388 secondary school students from the same urban area in the Netherlands were recruited for this study. All students were assured that their performance estimates would remain confidential, and they all provided informed consent that their estimates and grades could be used for this study. Ethical approval for this study was obtained from the Ethical Committee of the Department of Psychology, Education, and Child Studies at our university. Two-hundred-and-fifty-one university students provided informed consent (85.37% response rate). The secondary school students provided informed consent together with their parents, leading to a final sample of 302 students (77.84% response rate).

Materials

Exams

We used students’ regular exams to measure calibration accuracy. The university students had an end-of-term exam in educational psychology, comparable to the exams used in most research on calibration accuracy with university students (Bol and Hacker 2001; Hacker et al. 2000; Miller and Geraci 2011; Nietfeld et al. 2006). During their academic year, in which they took 8 exams, they could resit a maximum of 2 exams. The secondary school students had regular end-of-term exams in French, German, and Math. None of the secondary school students were allowed to resit unless they were sick when the exam was administered.

Procedure

The procedure was the same for all students. After taking their exam, students received a form on which they estimated the grade they thought they would obtain on a 10-point scale with decimal points (i.e., most common scoring scale in the Netherlands with 10 representing the highest possible score and 1 the lowest). Students also gave a SOJ by indicating the confidence they had in their performance estimate on a five-point scale, ranging from not confident to highly confident (cf. Händel and Fritzsche 2016; Miller and Geraci 2011).

Analyses

Calculating measurements

Calibration accuracy was calculated by taking the absolute difference between estimated exam grade and actual exam grade (Dunlosky and Thiede 2013; Schraw 2009). This means that the higher the difference between the estimated and actual grade, the higher the miscalibration. We also calculated bias scores to investigate the direction of the miscalibration (i.e., over- or underconfidence). Bias scores consisted of the difference between estimated grade and actual grade (Dunlosky and Thiede 2013; Schraw 2009).
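As an illustration of the two measures defined above, the following minimal Python sketch computes the bias score (estimated minus actual grade) and the miscalibration score (its absolute value) for a single student. The class name, function names and the example grades are hypothetical and are not taken from the study's data or analysis scripts.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """One student's postdiction for one exam (hypothetical container)."""
    estimated_grade: float  # grade the student expected, on the 1-10 scale
    actual_grade: float     # grade actually obtained, on the 1-10 scale


def bias(j: Judgement) -> float:
    # Signed difference: positive = overconfidence, negative = underconfidence.
    return j.estimated_grade - j.actual_grade


def miscalibration(j: Judgement) -> float:
    # Absolute difference: larger values mean poorer calibration accuracy.
    return abs(bias(j))


# Hypothetical student who expected a 6.5 but obtained a 7.5.
example = Judgement(estimated_grade=6.5, actual_grade=7.5)
print(bias(example))            # -1.0 (underconfident)
print(miscalibration(example))  # 1.0
```

Keeping the two scores separate matters because students who over- and underestimate by the same amount have opposite bias scores but identical miscalibration scores.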

Statistical analyses

Our research question was whether subjective confidence was aligned with calibration accuracy scores. To test this question, we investigated Pearson correlations between SOJs and calibration accuracy for each exam. As a check, we also investigated whether we could replicate the findings of Miller and Geraci (2011). To do so, we divided the university students into four quartiles based on their exam performance. We tested whether SOJs differed between low performers and high performers (cf. Miller and Geraci 2011) using an ANOVA with SOJ as dependent variable and performance level as independent variable.
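The analysis steps described above can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the quartile split is done by rank (the text does not specify how ties or uneven group sizes were handled), and the small data set at the end is invented for demonstration and is not the study's data.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def quartile_groups(grades, values):
    """Split `values` into four groups by the rank of the matching grade.

    Assumption: a rank-based split; ties keep their original order.
    """
    order = sorted(range(len(grades)), key=lambda i: grades[i])
    n = len(order)
    groups = [[], [], [], []]
    for rank, i in enumerate(order):
        groups[min(3, rank * 4 // n)].append(values[i])
    return groups


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Invented data for eight students: exam grade, SOJ (1-5), miscalibration.
grades = [4.5, 5.0, 5.8, 6.2, 6.9, 7.4, 8.1, 8.8]
sojs = [2, 3, 3, 4, 2, 4, 3, 5]
miscal = [1.5, 1.0, 0.8, 0.3, 1.2, 0.4, 0.9, 0.2]

r = pearson_r(sojs, miscal)                         # SOJ vs. miscalibration
f, df1, df2 = one_way_anova_f(quartile_groups(grades, sojs))
print(f"r = {r:.3f}; F({df1}, {df2}) = {f:.2f}")
```

A negative `r` would correspond to the hypothesized pattern (lower confidence accompanying higher miscalibration); significance testing is omitted here and would normally be done with a statistics package.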

Results

Table 1 shows the descriptives of the variables under study: Bias scores, calibration accuracy, SOJs, and final exam grades. In the following sections, our hypotheses on calibration accuracy are tested. In all our analyses, a significance level of .05 was used.

As a first check, we examined whether we could replicate the findings of Miller and Geraci (2011), by investigating whether possible differences in subjective confidence within our samples could be explained by performance level differences. Table 2 presents the means and standard deviations of the four different performance level groups on calibration accuracy, bias, SOJs, and performance, both for university students on their Educational Psychology exam and for secondary school students on their French, Math, and German exams. We performed an ANOVA with SOJ as dependent variable and performance level (i.e., the performance quartiles) as independent variable (cf. Miller and Geraci 2011). Results were not significant: subjective confidence ratings did not differ between the performance quartiles for the university students, F(3, 251) = 0.07, MSE = 0.05, p = .978, ηp2 = .001, nor for the secondary school students on the courses French, F(3, 291) = 1.11, MSE = 0.77, p = .345, ηp2 = .011, Math, F(3, 287) = 0.42, MSE = 0.82, p = .741, ηp2 = .004, and German, F(3, 298) = 0.44, MSE = 0.56, p = .725, ηp2 = .004. This indicates that, contrary to Miller and Geraci’s findings, subjective confidence scores in our samples did not differ between high and low performing students.
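For readers who wish to check the reported effect sizes, partial eta squared in a one-way design can be recovered from the F statistic and its degrees of freedom as ηp2 = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies this standard identity to the values reported above; the function name is hypothetical, and the computation is a reader-side consistency check, not the authors' analysis code.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    # Follows from F = (SS_effect / df_effect) / (SS_error / df_error)
    # and eta_p^2 = SS_effect / (SS_effect + SS_error).
    return (f * df_effect) / (f * df_effect + df_error)


# (F, df_effect, df_error) as reported in the text for each exam.
reported = {
    "Educational Psychology": (0.07, 3, 251),
    "French": (1.11, 3, 291),
    "Math": (0.42, 3, 287),
    "German": (0.44, 3, 298),
}

for exam, (f, df1, df2) in reported.items():
    print(f"{exam}: {partial_eta_squared(f, df1, df2):.3f}")
# Educational Psychology: 0.001
# French: 0.011
# Math: 0.004
# German: 0.004
```

The recomputed values agree with the reported ηp2 values to three decimals.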

Relation between calibration accuracy and second-order judgements among university students

More importantly, however, we examined the direct relation between calibration accuracy and SOJs, irrespective of the level of performance. Results among university students showed a significant negative correlation between subjective confidence (SOJ) and calibration accuracy, r = −.197, p = .002, confirming our hypothesis. Figure 1 shows that less confident students had higher miscalibration, and vice versa: confident students were shown to be better calibrated.

Fig. 1. The negative relation between second-order judgements and miscalibration among university students. Low confidence judgements are related to high miscalibration, whereas high confidence judgements are related to lower miscalibration

Relation between calibration accuracy and second-order judgements among secondary school students

Thirdly, we tested whether the direct relation between calibration accuracy and SOJs also holds among secondary school students. In contrast to university students, results showed no significant correlation between SOJs and calibration accuracy among secondary school students for French, r = −.029, p = .654; Math, r = .036, p = .539; or German, r = .059, p = .304. Hence, confirming our expectation, secondary school students’ calibration accuracy and SOJs did not seem to be directly related: more miscalibration was not accompanied by lower confidence scores.

Discussion

The current study focused on the relation between calibration accuracy and subjective confidence (as measured by SOJs) among university and secondary school students. We tested the hypothesis that students, regardless of their performance level, are metacognitively aware of their calibration accuracy. To do so, we examined the correlations between subjective confidence and miscalibration. Furthermore, we tested whether a relation between subjective confidence and miscalibration could be found both among university students and secondary school students.

Calibration accuracy and SOJs among university students

In support of our expectations, university students’ subjective confidence is directly related to calibration accuracy, meaning that their confidence judgements are somewhat aligned to the actual quality of their performance estimates. Our results extend previous research findings on subjective confidence (Händel and Fritzsche 2016; Miller and Geraci 2011). While the studies of Miller and Geraci (2011) and Händel and Fritzsche (2016) were initially aimed at investigating subjective confidence among different performance level groups only, our study shows that their findings generalize to students with poor calibration accuracy in general. That is, the current study shows that students assign less confidence to their judgements when they are less well calibrated. Hence, regardless of performance level, students show some awareness of their calibration accuracy by providing high or low SOJs. This indicates that metacognitive awareness as measured by SOJs does not (only) seem to depend on performance level, as can be concluded from the findings from Miller and Geraci and Händel and Fritzsche, but rather seems to hinge on calibration accuracy.

Supporting the reasoning above that SOJs are more dependent on calibration accuracy than on performance level per se, we do not find any differences in SOJs between different performance quartiles in our sample. This is in contrast to prior findings by Miller and Geraci (2011) and Händel and Fritzsche (2013, 2016). A possible explanation may be that in our study, the relation between performance level and calibration accuracy is different from previous studies. Although we find that on average, high performers are underconfident while low performers are (slightly) overconfident, which conforms to a typical unskilled-and-unaware effect, our results show that overall, underconfidence is a larger ‘problem’ than overconfidence: 3 out of the 4 performance quartile groups show underconfidence. Furthermore, unusual for calibration research, pairwise comparisons even show that the 25% best performing students calibrate worse (M = 1.70, SD = 0.80) than the 25% lowest performing students (M = 0.79, SD = 0.54), p < .001. The current study could therefore not support an unskilled-and-unaware effect. In itself, this supports the notion that having a poor performance does not necessarily make you unaware of this performance: those who showed the poorest performance did not show the poorest calibration accuracy.

It is as yet unclear why our results differ from previous studies that show clear unskilled-and-unaware effects. A possible reason for our findings could be that in many prior studies, predictions were examined (e.g., Bol et al. 2005; Foster et al. 2017; Miller and Geraci 2011) whereas in our study, postdictions were examined. When postdicting performance, students can make use of both theory-based cues (prior performance on exams) and experience-based cues (how easy the student found the exam, see also Händel and Bukowski 2019). When predicting their performance, students do not have any experience-based cues and can only rely on theory-based cues. Consequently, findings show that the relation between confidence judgements and actual performance outcomes is lower for predictions than for postdictions (Siedlecka et al. 2016, 2019). It is possible that when postdicting their performance, lower performing students have less difficulty judging their performance because they have access to experience-based cues (i.e., they found the exam difficult). As a consequence, the difference between groups of students in how well they can judge their performance may disappear. Yet, there are a few studies that examined the unskilled-and-unaware effect with postdictions as well, and they still found an unskilled-and-unaware effect (Hacker et al. 2000; Shake and Shulley 2014). Furthermore, even when cues are available, low performing students usually have more difficulty using valid cues when judging their performance (Gutierrez de Blume et al. 2017; Thiede et al. 2010). So, although the differences between postdictions and predictions may explain some of the current findings, the difference does not appear to give a definitive answer to the question why students did not show an unskilled-and-unaware effect. Future research that more directly measures students’ cue use during pre- and postdictions can shed light on causes for the unskilled-and-unaware effect.

The purpose of the current study was to examine whether there was a relation between confidence and calibration accuracy. Our data confirm our hypothesis that confidence level is directly related to calibration accuracy. This further strengthens the generalizability of the idea that university students base their confidence judgements on the accuracy of their performance estimate. Again, the reason for doing so likely originates from their cue use (Händel and Bukowski 2019; Koriat 1997). For example, when judging their exam performance and noting that they had found the exam very difficult, students can take this experience into account when providing a SOJ. Furthermore, when students have the feeling that they miss certain cues to make an accurate judgement, they can also assign little confidence to their estimate. In other words, students can be aware that their estimate may be incorrect and hence assign lower confidence to it. This can also work vice versa. When students recognize some important cues (e.g., recognize certain tasks from their practice) that they used when judging their performance, their confidence in this judgement is likely to be higher.

Calibration accuracy and SOJs among secondary school students

Whereas university students’ subjective confidence is related to calibration accuracy, the secondary school students’ confidence judgements do not relate to their actual calibration accuracy. This finding seems to be rather robust, as we do not find a relationship in any of the three exams (French, German, and Math). So it seems that our findings with university students do not straightforwardly generalize to secondary school students. Furthermore, this finding supports the notion that metacognitive awareness in secondary school is still under development, especially for more complex tasks and metacognitive judgements (Hacker 1997; Lockl and Schneider 2002; Roebers 2014; Van der Stel and Veenman 2010; Weil et al. 2013). Whereas university students seem to have some metacognitive awareness as they assign little confidence to incorrect estimates, secondary school students seem less able to recognize poor calibration accuracy. This also has an interesting practical implication: when the aim is to improve students’ calibration accuracy, secondary school students may first need to be stimulated to reflect on the quality of their performance estimates. Helping secondary school students to become more aware of the poor alignment between their confidence scores and their calibration accuracy can help them to improve their overall performance monitoring. Interventions that aim to enhance calibration accuracy in secondary school may therefore focus not only on functional confidence, but on subjective confidence as well.

Limitations and future directions

Our study provides new insight into the relation between calibration accuracy and SOJs, both among university and secondary school students. However, there are several limitations to our study as well. Although we find differences in the relation between confidence judgements and calibration accuracy, direct comparisons between the two samples are confounded by many variables. For example, in university, students’ exam grades decide whether they will pass or fail the course, so the exam can be considered high-stakes. In secondary school, the grades add up to one another to form a grand mean at the end of the year. At the same time, however, university students can do a resit whereas secondary school students cannot. It is possible that such differences influence the judgements students make, and hence, their subjective confidence scores. In the current study, we cannot rule out this possibility because we used the exams naturally present in the classrooms. To get more insight into this topic and to enable better comparison between secondary school and university students, future research should further examine the role of the task and its weighting in the relation between SOJs and calibration accuracy.

That we do not find significant differences in subjective confidence scores among students of different performance quartiles also contradicts the recent finding that low performers have a general tendency to provide lower confidence judgements than high performers (Fritzsche et al. 2018). It is important to note that our study differs from the study of Fritzsche et al. (2018) in several aspects. Perhaps most importantly, we asked students to estimate their exam grade, whereas Fritzsche et al. asked students to estimate their performance while solving multiple-choice questions. Their calibration accuracy was therefore based on a dichotomous scale (correct-incorrect), whereas in our study, a continuous scale was used. To facilitate comparison between studies, future research should focus on the question whether differences in scaling impact confidence judgements and calibration accuracy.

Conclusion

To conclude, this study shows that university students’ calibration scores are related to their second-order judgements: independent from performance level, students with poor calibration accuracy show less confidence in their estimates and vice versa. Secondary school students, however, provide confidence judgements that are not related to their actual calibration accuracy. Hence, to further understand metacognitive awareness, we encourage future research to take into account measures of both functional and subjective confidence (SOJs). Furthermore, as younger students show less metacognitive awareness, they may benefit less from an intervention that only targets their functional overconfidence, but might instead flourish under an intervention that also aims to align their subjective confidence.