A meta-analysis and critical review of metacognitive accuracy in autism

Metacognition refers to cognitions about our own cognitions. In recent years, there has been a concerted effort to examine metacognition among autistic people. The results from these studies have produced a mixed picture, with some concluding that autistic people are just as accurate as typically developing people in judging their own cognitions and others providing evidence of reduced accuracy. The aim of this meta-analysis is to amalgamate this research to obtain a clearer picture of the evidence to date. A total of 17 studies comparing 412 individuals diagnosed with autism and 453 typically developing individuals were included in the meta-analysis. The data revealed a moderate, but heterogeneous, reduction in metacognitive accuracy among autistic individuals in comparison with non-autistic individuals. A critical review of the results suggested that, despite the overall reduction in metacognitive accuracy, performance was not universally diminished among autistic participants across studies. Accuracy may be undiminished on certain types of metacognitive task. Moreover, across all tasks, there was a moderate difference in metacognitive accuracy between autistic and non-autistic children, but only a small difference in metacognitive accuracy between autistic and non-autistic adults.

Lay Abstract
The ability to make accurate judgements about our own and others’ mental states has been widely researched; however, it is unclear how these two abilities relate to each other. This is important given that there is evidence that autistic individuals can have difficulty with accurately judging others’ mental states. Recent evidence suggests that some autistic individuals may also have difficulty accurately judging their own mental states. This may have an impact on various aspects of everyday life but particularly academic success, and therefore it is important that this skill is not overlooked when exploring areas of individual support.
The aim of this article is to bring together the research examining autistic individuals’ ability to make accurate judgements about their own mental states and to establish whether this is an area that warrants further investigation. The results from this article show that autistic individuals may have difficulty making accurate judgements about their own mental states, although this depends on the type of judgement being made. It also highlights that while autistic children may have difficulties in some areas, these may improve by adulthood. Overall, this article shows that more research is needed to fully understand where specific difficulties lie and how they may be overcome.


Introduction
Metacognition refers to cognitions about our own mental states (Flavell, 1979). It is crucial for how we live our lives and underpins how we make sense of, predict and control our actions. It plays a key role in learning and decision-making and predicts academic performance independently of general intelligence (Dunlosky & Metcalfe, 2009). Thus, difficulties with metacognition are likely to have significant implications for everyday functioning. As such, understanding metacognition and its processes is key to supporting individuals with diminished metacognitive abilities.
Metacognition can be divided into three key components: metacognitive knowledge, metacognitive monitoring and metacognitive skills (Flavell, 1979). All these aspects are important for everyday life, but the majority of research into metacognition has focussed on meta-monitoring, in part because metacognitive skill and control depend on monitoring. Metacognitive monitoring is how we represent the occurrence of ongoing cognitive activity, such as judging how confident we are that we have learned all we need to pass an exam.
Metacognitive monitoring can be measured using a variety of methods, with the most commonly used methods being judgements-of-confidence, judgements-of-learning, feeling-of-knowing and judgements-of-performance. These are all explicit verbal measures of current mental states. Each of these methods requires participants to complete an object-level task and to make a metacognitive judgement about their performance on that task. Crucially, the association between the participant's meta-level judgement and their actual performance on the object-level task is used as an indicator of metacognitive accuracy. This is often done by calculating a gamma correlation (Kruskal & Goodman, 1954), with scores ranging from −1 to +1, where scores of 0 indicate chance-level accuracy and large positive scores indicate good metacognitive accuracy.
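As an illustration of the gamma statistic described above, the following minimal sketch computes gamma as (concordant pairs minus discordant pairs) divided by (concordant pairs plus discordant pairs), ignoring tied pairs. The function name and the ratings are hypothetical and are not taken from any study in this review.

```python
from itertools import combinations


def goodman_kruskal_gamma(judgements, outcomes):
    """Gamma = (C - D) / (C + D) across all item pairs; tied pairs are ignored."""
    concordant = 0
    discordant = 0
    for (j1, o1), (j2, o2) in combinations(zip(judgements, outcomes), 2):
        product = (j1 - j2) * (o1 - o2)
        if product > 0:
            concordant += 1  # higher judgement went with better performance
        elif product < 0:
            discordant += 1  # higher judgement went with worse performance
    if concordant + discordant == 0:
        return None  # undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical participant: confidence ratings (1-5) and item accuracy (1 = correct).
confidence = [5, 4, 2, 1, 3, 5]
correct = [1, 1, 0, 0, 1, 0]
gamma = goodman_kruskal_gamma(confidence, correct)
print(gamma)
```

A participant who gives every item the same rating, or who gets every item correct, has no untied pairs, so gamma is undefined for that participant; the sketch returns `None` in that case.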
Judgement-of-learning tasks require participants to learn something and then rate how likely it is that they have learnt the target information (meta-level judgement). Following this, participants complete a test of what they have learnt (object-level task). For instance, participants may be asked to learn a list of word-pairs, a cue word and a target word. Participants then rate how likely it is that they will be able to recall each target word during a follow-up test. Subsequently, participants are presented with the cue word from the previously studied word-pairs and asked to recall the target word. The association between the participant's judgement-of-learning for each item and their actual recall for each item is then used to indicate their metacognitive accuracy.
Feeling-of-knowing tasks require participants to complete a cognitive task, make a feeling-of-knowing judgement on any items they get incorrect and then complete another test for those items. For example, participants may be asked to learn a list of word-pairs, a cue word and a target word. They are then asked to recall the target word when presented with the cue word. For any items they get incorrect, they make a feeling-of-knowing judgement. For example, they may be asked how likely it is that they would recognise the target word if it were presented alongside a number of lure words. After making the feeling-of-knowing judgement, participants may be given a recognition test (object-level task). In this case, the association between the feeling-of-knowing judgement for each item and the recognition performance for each item indicates the level of metacognitive accuracy.
Judgements-of-confidence also require participants to complete an object-level task, such as recalling a list of previously learnt words or completing a perceptual discrimination task such as deciding which of the two images has the most dots. Immediately after each decision, participants are asked to rate how confident they are that their answer was correct. As with all previously mentioned tasks, the association between the meta-level judgement, in this case the judgement-of-confidence, for each item and the actual performance, such as a correct or incorrect discrimination/recall, signifies the accuracy of the participant's metacognition.
Judgements-of-performance tasks require participants to state how many questions they think they will get correct/incorrect after having seen an example question or rate how many they thought they got correct/incorrect after having completed the questions. The difference between the actual performance (number correct or incorrect) and the predicted/estimated performance indicates the level of metacognitive accuracy. The smaller the difference between the actual performance and the predicted/estimated performance, the more accurate one's metacognition is considered to be.
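A minimal sketch of the discrepancy score described above, assuming the absolute difference between predicted and actual number correct is used. The function name and the numbers are hypothetical, and individual studies may score this differently.

```python
def performance_judgement_error(predicted_correct, actual_correct):
    """Absolute discrepancy between judged and actual number correct.

    Lower values indicate more accurate metacognition, the reverse of
    gamma, where higher values indicate better accuracy.
    """
    return abs(predicted_correct - actual_correct)


# Hypothetical participant who predicts 18 correct answers but achieves 12.
error = performance_judgement_error(predicted_correct=18, actual_correct=12)
print(error)
```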
The various ways of measuring metacognitive accuracy have highlighted that different meta-level tasks may rely on distinct processes, with evidence for neural dissociations across different meta-level tasks (Chua et al., 2009; Cosentino, 2014; Fleming & Dolan, 2012; Fleming et al., 2016; Schnyer et al., 2004). Schnyer et al. (2004), for example, found that people with damage to the right ventromedial prefrontal cortex were able to make accurate confidence judgements but had impaired feeling-of-knowing accuracy. Evidence also indicates a dissociation between types of metacognitive awareness among people with Alzheimer's disease (Cosentino, 2014; Rosen et al., 2014). Rosen et al. (2014), for example, found that feeling-of-knowing and judgements-of-confidence were impaired among people with Alzheimer's disease but judgements-of-learning were intact.
In addition to the potential dissociation between types of metacognitive awareness, there is also a rich history of theorising about a potential link between metacognitive monitoring and mindreading. Mindreading (also known as mentalizing) is the ability to represent the mental states of others. Some theories (one-system theory; Carruthers, 2009; Gopnik, 1990; Perner, 1991) predict that if mindreading is impaired, then metacognition will also be impaired, whereas others (simulation and two-system theories; Goldman, 2006; Nichols & Stich, 2003) predict that it is possible to have intact metacognition despite having impaired mindreading.
These clinical and theoretical issues make it important to study metacognition in neurodevelopmental conditions, especially in conditions that involve difficulties with mindreading. Arguably, the condition most clearly associated with mindreading difficulties is autism. Autism is a developmental condition diagnosed on the basis of restricted and repetitive behaviours, and social-communication difficulties (American Psychiatric Association, 2013; World Health Organization, 2018). It has been widely established for over 30 years that autistic individuals have difficulty representing mental states in others (e.g. Happé & Frith, 1995). Only relatively recently, however, have researchers started to use the gold-standard measures of metacognition described above to explore representation of own mental states among autistic individuals. This is despite many theorists hypothesising that metacognitive monitoring would be impaired in autistic people (see Williams, 2010, for a discussion of these hypotheses and the evidence on which they were based). Results from these studies have been mixed, however (see Table 1), with 11 suggesting a diminution of (at least some aspects of) metacognitive monitoring among autistic participants and 6 suggesting metacognitive monitoring is unimpaired in this population.
There are many potential reasons for this mixed pattern of findings. First, many of the studies (as is typical of case-control studies of autism) have relatively low statistical power to detect even moderate-sized between-group differences in meta-monitoring accuracy. This makes individual studies prone to making type II errors even when all other aspects of the study are well-designed and well-controlled. Meta-analyses overcome this potential concern by pooling effect sizes to gain a more reliable indicator of the true size of any between-group differences in a dependent variable. Second, it may be that only some aspects of metacognition are impaired in autism and that between-group differences are apparent only on specific tasks. Thus, in addition to examining metacognitive accuracy as a whole, it is also important to examine different types of meta-level tasks among autistic people to establish whether any impairment of metacognition is global or task-specific. Again, the meta-analysis can help address this question by investigating the extent to which the type of metacognitive measure moderates the overall pattern of findings. Third, there may be developmental effects on the extent to which metacognition is impaired in autistic people. It could be that impairments are apparent in autistic children but resolve (or are compensated for) by the time these children reach adulthood. In that case, previous studies of autistic children would tend to observe between-group differences, whereas studies of autistic adults would not. Alternatively, it may be that metacognitive difficulties emerge over the course of development among autistic people. In that case, previous studies of autistic adults would tend to observe between-group differences, whereas studies of autistic children would not. The current meta-analysis can help address this by exploring whether the developmental status of samples (child vs adult) is a moderator of study results.
Due to the complexity and variation across the studies included in the meta-analysis, this article also presents a critical review of the research, taking account of key issues when examining such research, including matching groups on key variables (age, gender, IQ and object-level performance), the age group being examined (children or adults), and the meta-level task being employed. This will enable a detailed exploration of the existing literature beyond the insight gained through the meta-analysis alone.

Eligibility criteria
The following eligibility criteria were set out prior to conducting the literature search. To be eligible, the studies should have examined individuals of any age (children and/or adults) diagnosed with autism in comparison with a typically developing group. The tasks within the studies had to involve explicit verbal measures of current mental states as described above (e.g. Judgements-of-Learning, Feeling-of-Knowing, Judgements-of-Confidence, and Judgements-of-Performance). It was also crucial that the tasks did not involve any aspect that could result in improved metacognitive performance, such as training. Articles were excluded if they did not fit these criteria, were written in a language other than English, did not provide novel data or did not provide sufficient quantitative data to calculate effect sizes in the form of Hedges' g (e.g. means, standard deviations, p-values, t-tests).

Database search
A literature search (see Figure 1) was conducted using Web of Science, PubMed and PsycInfo using the search terms 'autism' AND 'metacognition' for all articles published prior to April 2021, resulting in a total of 675 articles (Web of Science = 83; PubMed = 84; PsycInfo = 508).
Of these, 106 were duplicates, 31 were in a language other than English and 501 were excluded because they either examined something other than metacognition in autism in comparison with a typically developing sample or they were reviews that did not provide any novel data of their own. Of the remaining 37 articles, 15 used a questionnaire to measure metacognition, one examined metacognitive knowledge rather than accuracy, one examined metacognitive control and one examined metacognitive accuracy using non-verbal measures (Carpenter et al., 2019). Five articles did not provide the data required to calculate an effect size. The authors of the current meta-analysis attempted to contact the corresponding authors for each of these studies, two of whom provided the data required (Doenyas et al., 2019; Maras et al., 2019). It was not possible to make contact with the corresponding authors of the remaining three studies, and therefore they could not be included in the current meta-analysis (Brosnan et al., 2016; Wilkinson et al., 2010; Zalla et al., 2015). This left 16 articles that used explicit verbal measures of metacognitive accuracy among individuals diagnosed with autism. There was an additional study that did not come up in the literature search, but that the authors of the current meta-analysis were aware of (Wojcik et al., 2011). This resulted in a final sample of 17 independent studies (see Table 2).

Figure 1. Flow diagram of the literature search.
Articles identified through database search (n = 675); additional articles the authors were aware of (n = 1).
Duplicates removed (n = 106).
Articles screened (n = 570).
Articles excluded (n = 553). Reasons for exclusion: articles in a language other than English; articles not providing novel data or not examining metacognition in ASD in comparison with a typically developing sample; used a questionnaire to measure metacognition; examined metacognitive knowledge; examined metacognitive control; used a non-verbal measure of metacognition; unable to calculate effect sizes with the data provided.
Effect sizes for these studies were reversed due to lower scores indicating higher metacognitive accuracy.

Data extraction and management
Consistent with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Moher et al., 2009), key data were extracted from each study and entered into Excel. These data included: study characteristics (e.g. sample size, gender, full-scale IQ (FSIQ)), object-level and meta-level tasks, and effect size data (e.g. means and standard deviations for metacognitive accuracy). Specific data on socioeconomic status and educational attainment levels were not recorded. To check that the data were extracted accurately, another researcher extracted data from a random 35% (six) of the studies included in the meta-analysis. No errors or differences between the researchers were found. Cohen's d effect sizes were calculated using sample size, means and standard deviations of metacognitive accuracy for each group using the Practical Meta-Analysis Effect Size Calculator (Wilson, 2020). In cases where sample sizes, means and/or standard deviations were unavailable, t-values were used to calculate Cohen's d effect sizes. Cohen's d was then converted into Hedges' g. This was to correct for any bias as a result of small sample sizes (N < 20). Like Cohen's d, Hedges' g is based on the standardised mean difference, and a value ⩾0.8 can be interpreted as a large effect, ⩾0.5 as a medium effect and ⩾0.2 as a small effect (Cohen, 1969; Hedges, 1981).
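The effect size calculations described above can be sketched with the standard textbook formulas: Cohen's d from means and a pooled standard deviation, or from an independent-samples t-value, followed by the usual small-sample correction factor J = 1 − 3/(4(n1 + n2) − 9) to obtain Hedges' g. This is an illustrative sketch only. It is an assumption that these formulas correspond to what the online calculator implements, and the group statistics below are hypothetical.

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_variance = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_variance)


def cohens_d_from_t(t_value, n_1, n_2):
    """Cohen's d recovered from an independent-samples t statistic."""
    return t_value * math.sqrt(1 / n_1 + 1 / n_2)


def hedges_g(d, n_1, n_2):
    """Apply the approximate small-sample bias correction to Cohen's d."""
    correction = 1 - 3 / (4 * (n_1 + n_2) - 9)
    return correction * d


# Hypothetical gamma scores: autism group listed first, so a negative value
# means the typically developing group was more accurate.
d = cohens_d(mean_1=0.40, sd_1=0.30, n_1=20, mean_2=0.55, sd_2=0.30, n_2=20)
g = hedges_g(d, n_1=20, n_2=20)
print(round(d, 3), round(g, 3))
```

The correction shrinks d slightly towards zero, and the shrinkage is larger for smaller samples, which is why it matters for the small groups typical of case-control studies.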
Some studies included more than one experiment/condition, and therefore they also provided more than one effect size. In some cases, these effect sizes were derived from the same participant group. When this was the case, an average effect size for between-group differences in metacognitive accuracy was calculated, and this average effect size was then included in the meta-analysis. This is a standard procedure to manage multiple dependent effect sizes and takes into account the issues relating to using multiple dependent effect sizes within one meta-analysis (see Borenstein et al., 2009). In cases where the effect sizes were derived from different participant groups, these were kept separate, and therefore some studies have multiple independent effect sizes included in the current meta-analysis. Overall, this approach produced 20 effect sizes that derived from experiments that fitted the inclusion criteria outlined above. Table 2 shows the key data and effect sizes for each of these experiments.
Following this initial data collection, effect sizes (Hedges' g) and sample sizes were entered into the software package Meta-Essentials (Suurmond et al., 2017), the results of which are presented below.

Community involvement
There was no community involvement in this meta-analysis or critical review.

Results
A total of 412 autistic and 453 typically developing participants were included in the meta-analysis, and a random-effects model was used. The weighted effect size for the between-group difference in meta-monitoring ability was −0.47 (SE = 0.13, 95% confidence interval (CI) = −0.75 to −0.20) and statistically significant (z = −3.57, p < 0.01). This suggests a moderate impairment of metacognitive accuracy among the autism groups in comparison with the typically developing groups. However, the homogeneity test was significant (Q = 61.06, p < 0.001), indicating that the variance across the effect sizes was greater than expected by sampling error. This suggests that there was a large range of effect sizes, and so it is possible that breaking the studies down into subgroups may be more appropriate than examining them as a whole. I² was also large (68.88%), supporting the need for further subgroup/moderation analysis. The effect sizes and accompanying CIs are presented in Figure 2. Values below zero indicate that the typically developing group performed more accurately than the autism group.
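To make the heterogeneity statistics concrete, the sketch below implements a generic random-effects pooling using the DerSimonian-Laird estimate of between-study variance, together with Cochran's Q and I² = (Q − df)/Q. The choice of estimator is an assumption made for illustration; the exact procedure used by the meta-analysis software may differ (for example, in how the CI is constructed). The only values taken from the text are Q = 61.06 and the 20 effect sizes, from which the reported I² can be recovered.

```python
def i_squared_from_q(q, k):
    """I² (as a percentage) from Cochran's Q and the number of effect sizes k."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def random_effects_summary(effects, variances):
    """Pool effect sizes with DerSimonian-Laird random-effects weights."""
    k = len(effects)
    weights = [1 / v for v in variances]
    fixed_mean = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (g - fixed_mean) ** 2 for w, g in zip(weights, effects))
    scaling = sum(weights) - sum(w**2 for w in weights) / sum(weights)
    tau_squared = max(0.0, (q - (k - 1)) / scaling)
    # Random-effects weights add the between-study variance to each study.
    re_weights = [1 / (v + tau_squared) for v in variances]
    pooled = sum(w * g for w, g in zip(re_weights, effects)) / sum(re_weights)
    standard_error = (1 / sum(re_weights)) ** 0.5
    return {
        "pooled": pooled,
        "se": standard_error,
        "q": q,
        "tau_squared": tau_squared,
        "i_squared": i_squared_from_q(q, k),
    }


# Reported values from the overall analysis: Q = 61.06 across 20 effect sizes.
reported_i_squared = i_squared_from_q(61.06, 20)
print(round(reported_i_squared, 2))

# Hypothetical two-study example with equal sampling variances.
example = random_effects_summary(effects=[0.0, 1.0], variances=[0.1, 0.1])
print(example)
```

With Q = 61.06 and 19 degrees of freedom, (61.06 − 19)/61.06 is approximately 0.689, which agrees with the I² of 68.88% reported above.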
One explanation for the heterogeneity is that deficits in metacognitive accuracy are domain-specific, rather than domain-general. Evidence indicates that different meta-level tasks rely on distinct processes and neural dissociations have been found across different meta-level tasks (Chua et al., 2009; Cosentino, 2014; Fleming & Dolan, 2012; Fleming et al., 2016; Schnyer et al., 2004). Given the variety of meta-level tasks used across the studies, we performed a subgroup analysis based on meta-level task. The results showed that the weighted effect size for the between-group difference in metacognitive accuracy on judgements-of-confidence tasks was −0.45 (95% CI = −0.71 to −0.20), whereas it was 0.01 (95% CI = −0.84 to 0.85) for judgements-of-learning. The homogeneity tests for both these subgroups were significant (Q = 20.47, p = 0.04 and Q = 20.57, p < 0.001, respectively). To make sense of the heterogeneity, these studies will be examined in more detail in Part 2 of this article. We will also examine the feeling-of-knowing and judgements-of-performance studies, of which there were too few to interpret meaningfully from the subgroup analysis, although for context the weighted effect sizes were −0.80 (95% CI = −1.09 to −0.50) for feeling-of-knowing and −1.10 (95% CI = −1.57 to −0.62) for judgements-of-performance, both of which were homogeneous (Q = 0.39, p = 0.53 and Q = 1.27, p = 0.26, respectively).
Another reason for the heterogeneity in the initial meta-analysis could be that it combined studies of children and adults. Metacognition has been shown to have a developmental link (Weil et al., 2013); therefore, it makes sense to examine the adult and children studies independently. Out of the 20 effect sizes, 13 derived from experiments examining metacognitive accuracy in children, with the remaining seven using adult participants. Subgroup analysis showed that the weighted effect size for the between-group differences in adults was −0.27 (95% CI = −0.62 to 0.08) and for children it was −0.59 (95% CI = −0.93 to −0.24), indicating that the deficit in metacognitive accuracy is roughly twice as large among children as among adults. Possible reasons for this apparent developmental difference will be examined in Part 2. The homogeneity test for both adults and children was significant (Q = 13.04, p = 0.04 and Q = 48.89, p < 0.001, respectively). I² was also large for both these groups, suggesting further subgroup analysis would be useful, perhaps in terms of examining it across different domains. However, due to the relatively small number of studies, it would not be valid to break the analysis down any further.
Overall, these results show that there appears to be a difference in metacognitive accuracy, with most effect sizes indicating that the autism groups have poorer performance than the typically developing groups, albeit with a wide range of effect sizes. This meta-analysis makes a valuable contribution to the literature and has relevance for both theory development and clinical practice. Nevertheless, the result of a meta-analysis is only as valid and reliable as the results from the studies that comprise the analysis. Certainly, there are several issues that require consideration when interpreting case-control studies of metacognition and so Part 2 presents a critical review of the studies included in the meta-analysis.

Methodological and conceptual issues in the study of metacognitive accuracy in autism
Group matching. To draw firm conclusions regarding differences in group performances, it is important that groups are matched for key abilities/characteristics that are likely to relate to the dependent variable (i.e. metacognitive accuracy). Without matching, it is not possible to say with certainty if any between-group differences in metacognitive accuracy are the result of true differences due to diagnostic status or just down to differences related to the extraneous variables. Failing to match groups on key characteristics makes type I errors more likely and should be avoided. To consider groups as equated, it has been [...] Age (Palmer et al., 2014), gender (Weil et al., 2013) and IQ (Ohtani & Hisasaka, 2018) all relate to metacognitive ability. Therefore, it is important that groups are matched on these aspects when examining metacognition among individuals diagnosed with autism in comparison with typically developing individuals. Examination of the studies included in the meta-analysis reveals that 13 out of the 20 effect sizes were derived from experiments that matched groups on chronological age, gender and IQ (see Tables 2 and 3). All except 5 of these 13 effect sizes indicated that the autism group had diminished metacognitive accuracy, with a moderate to large effect (−1.43 to −0.65). Out of the remaining five, four indicated little difference in accuracy with effect sizes ranging from −0.25 to 0.38 (Cooper et al., 2016; Grainger et al., 2016a, Experiments 1 and 2; Maras et al., 2020, online condition), and one indicated that the autism group performed better than the typically developing group with a large effect (0.96; Wojcik et al., 2014, Experiment 2). Of the four that indicated little between-group difference, two used judgements-of-learning as a measure of metacognition (Grainger et al., 2016a, Experiments 1 and 2). It is therefore possible that this type of meta-level judgement is undiminished among autistic individuals.
The other two experiments examined judgements-of-confidence in adults (Cooper et al., 2016;Maras et al., 2020, online condition), which may indicate that this type of meta-level judgement is undiminished among autistic adults. Both these issues will be explored in more detail below.
Object-level task performance. In addition to ensuring that participants are matched on background characteristics, it is also important that groups are matched on their object-level performance (Schwartz & Metcalfe, 1994). This is because the object-level performance is involved in the computation of metacognitive accuracy. Therefore, when differences in object-level performance are taken into account, it can eliminate group differences in meta-level performance (Connor et al., 1997). Out of the 20 effect sizes, 15 derived from participants matched on object-level performance (Cohen's d < 0.50). Of these 15 effect sizes, 10 showed a difference in metacognitive accuracy with an effect size of −0.41 or stronger; the majority of these also matched for the key characteristics discussed above (see Tables 2 and 3). Of the remaining five, four showed very little difference in metacognitive accuracy and one indicated that the autism group performed better than the typically developing group. Overall, this suggests that even when we exclude studies that fail to match on object-level performance, there continues to be a deficit in metacognitive accuracy among autistic participants in comparison with their typically developing counterparts.
Type of meta-level task. Another factor that requires consideration is the type of meta-level task used as a measure of metacognitive accuracy. Examining the outcomes from different meta-level tasks allows us to get a better understanding of the metacognitive profile of autistic individuals, and it allows us to see whether any deficit in metacognitive accuracy is domain-general or domain-specific. Given the evidence that indicates some distinct processes in different meta-level tasks (Fleming et al., 2016), it is possible that autistic individuals may be impaired in some meta-level tasks but not others.
The majority of the studies included in the main meta-analysis examined judgements-of-confidence. Thus, we conducted a subgroup analysis splitting the effect sizes up based on meta-level task. This analysis revealed that even when excluding other meta-level tasks, there remained a moderate difference in accuracy as measured by judgements-of-confidence, with a wide range of effect sizes. Looking at the judgements-of-confidence studies individually, five examined judgements-of-confidence in adults, with all except one showing little difference in between-group metacognitive accuracy. The remaining seven effect sizes came from children/adolescent studies; of these, six showed an effect size of −0.40 or stronger for the between-group difference in judgements-of-confidence accuracy. This suggests that while judgements-of-confidence accuracy appears to be diminished among autistic children, this difficulty may have resolved by adulthood. To be sure that this is not an artefact of the tasks, further research is required to establish whether these meta-level tasks can be generalised between adults and children.
Turning our attention to the remaining studies, we can see that autistic children also appear to struggle with making global judgements about their cognitive-level performance. The two studies that examined judgements-of-performance found large effect sizes among children (−0.90 and −1.39). To date, no study has examined judgements-of-performance in autistic adults. Feeling-of-knowing, however, appears to be diminished among both autistic adults and autistic children. Thus far, the two studies examining feeling-of-knowing (one in adults and one in children) have found metacognitive accuracy to be impaired with a moderate-to-large effect (−0.95 and −0.65, respectively). Judgements-of-learning accuracy, however, appears to be undiminished in autism. Out of the four experiments that examined judgements-of-learning, none reported a significant difference between groups.
Overall, this suggests that autistic individuals do appear to have diminished metacognitive accuracy across a variety of meta-level judgements, including confidence judgements, feeling-of-knowing judgements, and judgements-of-performance. This contrasts with judgements-of-learning accuracy, for which there is no reason to suspect any diminution in accuracy. There does, however, appear to be some distinction between autistic adults' and autistic children's metacognitive accuracy, and therefore this will be explored in more detail in the next section.
Child versus adult metacognitive performance. Further inspection of the studies included in the meta-analysis revealed that 13 out of the 20 effect sizes were derived from experiments involving children and/or adolescents, with the remaining seven using adult participants. As can be seen from the subgroup analysis, when these are broken down it seems there is a small (−0.27) difference in metacognitive accuracy between autistic and typically developing adults, but a moderate (−0.59) difference between autistic and typically developing children.
Focusing on the effect sizes from the adult studies, five out of seven showed a small-to-moderate between-group difference (−0.44 to 0.38), four of which examined judgements-of-confidence and one of which examined judgements-of-learning. The remaining two effect sizes were moderate to large (−0.66 and −0.95), and in both cases the typically developing group were more accurate than the autism group; one of these examined feeling-of-knowing (Grainger et al., 2014) and one examined judgements-of-confidence. Overall, this suggests that autistic adults make accurate judgements-of-confidence, with four out of the five studies employing judgements-of-confidence finding it to be intact. Judgements-of-learning also appear to be intact among autistic adults, but feeling-of-knowing appears to be impaired. Given that only one adult study looked at feeling-of-knowing and only one looked at judgements-of-learning, additional research would be beneficial in our understanding of these meta-level judgements among autistic adults.
Turning our attention to the effect sizes that derived from the child studies, we can see that while they were variable in size, they mostly favoured the comparison groups over autism groups. Ten out of the 13 effect sizes indicated the autism groups were less accurate, with effect sizes ranging from −1.43 to −0.41. Of the remaining three, two showed little difference in metacognitive accuracy. One of these (Grainger et al., 2016a, Experiment 2) examined judgements-of-learning, which, as previously discussed, appears to be intact among autistic adults and children. The other examined judgements-of-confidence but did not match for key characteristics or object-level performance. As previously discussed, this is a key consideration when examining such research. The remaining study showed that the autism group performed better than the typically developing group on judgement-of-learning accuracy, with a large effect size (0.96; Wojcik et al., 2014, Experiment 2).
The subgroup analysis and closer examination of these studies suggest it may be sensible to examine adults and children separately, as well as each meta-level task, when drawing any conclusions regarding between-group differences in metacognitive ability. Further studies of metacognition in autistic adults would be useful to confirm the reduction in between-group difference. It is possible that the relatively clear diminution of (most aspects of) metacognition among autistic children may not persist into adulthood. One possibility is that early metacognitive impairments resolve over development in autism. An alternative possibility is that autistic adults perform relatively well on metacognitive tasks through the use of compensatory strategies and/or learning/development, despite atypical underlying metacognitive competence. Compensation is widely believed to occur among autistic people and so it is plausible that differences between autistic and neurotypical people diminish over time because of compensation. Future studies could investigate these possibilities by exploring fine-grained patterns of performance on metacognitive tasks (as well as associations with other aspects of cognition/real-world functioning), rather than focusing only on level of metacognitive accuracy per se. If compensation underpins the relatively undiminished metacognitive accuracy observed among autistic adults, then patterns of performance (e.g. at a trial-by-trial level) should still be less stable than, or otherwise differ from, those seen among neurotypical individuals. Similarly, established links between metacognition and aspects of cognition and/or behaviour (e.g. between metacognition and general intelligence or educational achievement) should be significantly weaker among autistic than neurotypical adults.

Non-verbal meta-level tasks
While this meta-analysis has focused on explicit verbal tasks, it is important to acknowledge that metacognition has been assessed using non-verbal tasks, such as post-decision wagering (PDW), uncertainty monitoring and/or gambling paradigms. These tasks are non-verbal (or involve non-verbal response modes, at least) and so tend to be used as an alternative to judgements-of-confidence tasks to assess metacognition in young human children or non-human animals (e.g. Beran et al., 2009; Martin & Santos, 2014; Persaud et al., 2007; Ruffman et al., 2001; Smith et al., 2008, 2014; Son & Kornell, 2005). PDW follows the same structure as judgements-of-confidence tasks, except that it requires the participant to place a wager on their object-level performance instead of making a confidence judgement. Similarly, the classic gambling paradigm involves making a perceptual discrimination, such as choosing the longer of two lines (the object-level task). Participants are then presented with two symbols, which are (or come to be, through trial-and-error learning) associated with high and low risk, respectively. Selection of the high-risk option following a correct perceptual discrimination results in a large reward, but a large penalty following an incorrect perceptual discrimination. Selection of the low-risk option following a correct perceptual discrimination results in only a small reward, but a small penalty following an incorrect perceptual discrimination. As such, the logic of the task is that the high-risk option corresponds to a verbal judgement of high confidence, whereas the low-risk option corresponds to a verbal judgement of low confidence.
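The payoff structure of the gambling paradigm can be illustrated with a minimal Python sketch. The point values (3 and 1) and the function names are hypothetical and chosen only for illustration; the studies reviewed here describe rewards and penalties only as "large" and "small".

```python
# Hypothetical stake sizes: the source specifies only "large" and "small".
LARGE_STAKE = 3
SMALL_STAKE = 1


def payoff(correct: bool, high_risk: bool) -> int:
    """Return the reward (positive) or penalty (negative) for one trial.

    correct   -- whether the perceptual discrimination was correct
    high_risk -- whether the participant selected the high-risk symbol
    """
    stake = LARGE_STAKE if high_risk else SMALL_STAKE
    return stake if correct else -stake


def total_payoff(trials) -> int:
    """Sum payoffs over an iterable of (correct, high_risk) pairs."""
    return sum(payoff(correct, high_risk) for correct, high_risk in trials)


# A participant whose wagers track accuracy (high risk when correct,
# low risk when incorrect) versus one whose wagers run the other way.
calibrated = [(True, True), (False, False), (True, True), (False, False)]
miscalibrated = [(True, False), (False, True), (True, False), (False, True)]

print(total_payoff(calibrated), total_payoff(miscalibrated))
```

Under these assumed values, both participants have identical object-level accuracy, yet the participant whose risk selections track their accuracy earns more. This is the sense in which risk selection is treated as analogous to a confidence judgement.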
To our knowledge, only three studies have employed these non-verbal measures in studies of metacognition in autism (Carpenter et al., 2019; Nicholson et al., 2019, 2021). None reported significant between-group differences in metacognitive accuracy, with only small effect sizes reported in each case (d = 0.25, 0.31, and 0.45, respectively). Importantly, in the studies by Nicholson et al., autistic participants showed significantly diminished metacognitive accuracy on traditional verbal judgements-of-confidence tasks, providing evidence of a dissociation between performance on verbal measures and performance on structurally equivalent non-verbal measures. While it could be argued that such findings reveal hidden underlying metacognitive competence among autistic adults and children, a high degree of caution should be taken when drawing such a conclusion, given an ongoing debate about the basis of performance on such non-verbal measures. While some argue that non-verbal tasks require meta-representation of one's own states of uncertainty, others argue (and provide evidence) that such tasks require only first-order cognitive processing for adaptive responding to occur and thus are not necessarily metacognitive in nature (see, for example, Carruthers & Williams, 2020). Although it is beyond the scope of this article to elucidate this debate fully, it was important to acknowledge it and to describe briefly the findings from use of non-verbal paradigms among autistic people.

General discussion
This meta-analysis showed that there is a moderate, albeit heterogeneous, diminution of metacognitive accuracy among autistic individuals. This was further supported by the critical review that revealed that even when key characteristics and object-level task performance were taken into account, the majority of studies showed diminished metacognitive accuracy among autistic participants. Nevertheless, the subsequent subgroup analysis and critical review showed that the level of metacognitive accuracy may vary as a result of the meta-level task being employed. For example, there is no reason to suspect that judgements-of-learning are diminished among autistic adults or children, but there is evidence for difficulties in feeling-of-knowing judgements among both autistic adults and children.
The variation in accuracy across meta-level tasks highlights the need to explore the landscape of strengths and weaknesses in metacognitive accuracy among autistic individuals. To date, while some studies have varied the object-level task within the same participant group, few studies have examined different meta-level tasks within the same participant group using the same object-level task. Therefore, research that examines the various types of meta-level tasks within the same participant group would help expand our understanding of metacognition within autistic individuals. This may then inform the development of any future targeted intervention or training programmes.
The subgroup analysis and critical review of the individual meta-level tasks also highlighted the distinction between studies that involve adults and children. For example, when examining the studies that employed judgements-of-confidence, it appears that while autistic children may be impaired, autistic adults may in fact have intact metacognitive accuracy. Further subgroup analysis of all the meta-level tasks combined revealed that when the adult and child studies were examined separately, the diminution in metacognitive accuracy among children was moderate, but the difference between autistic and non-autistic adults was small. This was further supported by the critical review, which showed that most of the child studies indicate diminished metacognitive accuracy among autistic participants in comparison with typically developing children. This was in contrast to the adult studies, the majority of which showed little difference in metacognitive performance. Overall, this suggests that while autistic children may have metacognitive difficulties in some meta-level tasks, such as judgements-of-confidence, these difficulties may resolve by adulthood.
Establishing whether the reduction in disparity is due to developmental delay or compensation is important because it may inform what strategies can successfully be employed to improve metacognitive accuracy among individuals who have difficulties with such tasks. Shedding light on successful strategies can help inform the development of effective and targeted intervention or training programmes that aim to improve metacognitive accuracy. This is important given that metacognition pervades daily life from the basic decisions we have to make every day to the level of our academic success and subsequent impact this has on life chances (Dunlosky & Metcalfe, 2009; Hartwig & Dunlosky, 2012; Veenman et al., 2006).
Overall, this meta-analysis and review highlights the complexities of examining metacognitive accuracy among autistic individuals. It shows that researchers and clinicians need to pay close attention to the specific areas of metacognition being examined as well as the characteristics of individuals they are examining. It also opens up avenues for future research in respect to the developmental trajectory of metacognitive accuracy, the profile of strengths and weaknesses, and the effective strategies used to make accurate metacognitive judgements, particularly among autistic individuals. All of this can inform how we understand metacognition from both a theoretical and clinical perspective, which is highly important given the impact that metacognitive accuracy can have on daily life.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by the Economic and Social Research Council Research PhD Studentship awarded to Katie L Carpenter [Grant number ES/P00072X/1].

ORCID iD
Katie L Carpenter https://orcid.org/0000-0001-9575-6836

Note
1. When each individual effect size was entered without averaging, the weighted effect size for the between-group difference in meta-monitoring ability was −0.52 (SE = 0.11, 95% confidence interval (CI) = −0.75 to −0.29) and statistically significant, z = −4.53, p < 0.001. As with the averaged effect sizes, this suggests a moderate, although increased, impairment of metacognitive accuracy among the autism groups in comparison with the typically developing groups. However, as with the averaged effect sizes, the homogeneity test was significant (Q = 141.17, p < 0.001). The effect sizes and accompanying CIs are presented in Figure 3.
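
For readers unfamiliar with how the quantities in Note 1 relate to one another, the following minimal Python sketch shows the standard normal-approximation (Wald) relationship between a weighted effect size, its standard error, the z statistic and the 95% CI. The function name is hypothetical, and this is not a reconstruction of the analysis software used in the meta-analysis. Because the sketch starts from the rounded values in Note 1 (−0.52 and 0.11), its output should be expected to match the reported z and CI only approximately.

```python
def wald_summary(estimate: float, se: float, z_crit: float = 1.96):
    """Return (z, (ci_lower, ci_upper)) under a normal approximation.

    estimate -- weighted mean effect size
    se       -- standard error of the weighted mean effect size
    z_crit   -- critical value for the CI (1.96 gives a 95% interval)
    """
    z = estimate / se
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    return z, ci


# Rounded inputs taken from Note 1; unrounded values are not reported.
z, (ci_lower, ci_upper) = wald_summary(-0.52, 0.11)
print(round(z, 2), round(ci_lower, 2), round(ci_upper, 2))
```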