Longitudinal associations between executive functions and metacognitive monitoring in 5- to 8-year-olds

The transition to elementary school is an important step in a child’s life and is associated with developmental changes in cognition and behavior. Research revealed different interacting factors regarding the child, parents, school, and sociocultural context that all seem to contribute to successful academic achievement (Morrison et al., 2010). One crucial factor is self-regulation, which has been linked to emerging academic abilities in preschool (Lonigan et al., 2017) and to academic achievement throughout elementary school (Robson et al., 2020). Additionally, self-regulation enables a child to successfully adapt his or her behavior to the classroom by paying attention, completing tasks, and following the teacher’s demands (Von Suchodoletz et al., 2009). Two important theoretical constructs that are closely related to self-regulation are executive functions (EF) and metacognition (MC), both undergoing developmental improvements across the transition to school (Hughes & Ensor, 2011; O’Leary & Sloutsky, 2019). However, little is known about EF and MC’s interplay during this transition, more precisely, whether EF might serve as an antecedent to metacognitive monitoring skills. Thus, the present study aimed to address this gap in the literature.

As mentioned above, the present study’s focus lies on two aspects of self-regulation, namely EF and MC. Both constructs are higher-order cognitive processes and can be linked to similar prefrontal brain regions (Fleming & Dolan, 2012; Moriguchi & Hiraki, 2013). On the one hand, EF are heterogeneous cognitive abilities which can be differentiated but are also correlated to some extent. They are important for planning, problem-solving, and goal-directed behavior and are most commonly measured in terms of the three core components inhibition, working memory, and shifting. Inhibition involves controlling attention, behavior, thoughts, and emotions to override dominant or prepotent responses. Working memory involves holding information in mind and mentally manipulating it. Shifting describes the ability to flexibly shift the focus of attention or a cognitive set (Diamond, 2013; Miyake et al., 2000).

On the other hand, MC is commonly divided into a declarative and procedural component of MC (Flavell & Wellman, 1977). While declarative MC refers to declarative knowledge about cognition, learning, and memory, procedural MC connects the object-level (task) and the meta-level and thereby evokes two distinct but intertwined components: monitoring and control. The present study’s focus lies on metacognitive monitoring, which describes the ability to reflect on mental processes by introspection (e.g., how sure am I that I remembered this picture correctly?). It entails the ability to experience different degrees of certainty and uncertainty and can be assessed by giving confidence judgments (CJ; Nelson & Narens 1990). Comparing these CJ with the actual performance allows getting information about a potential over- or underconfidence, quantified using a bias index (Schraw, 2009). Crucially, monitoring processes are known to need time (Fleming & Dolan, 2012) as information is looped through the anterior cingulate cortex (ACC) to be evaluated. Therefore, it seems important for children to take their time when making monitoring judgments, and having better EF skills might help them hesitate and reflect more thoroughly on mental processes.

Looking at EF and MC’s developmental trajectories, similar patterns can be observed with continuous improvements from early childhood through adolescence (for a review, see Roebers, 2017). However, the development during the transition to formal schooling has only been investigated separately in both research fields. Regarding EF, preschool years were found to constitute an essential period of rapid development. Early signs of inhibition and working memory can be observed in children’s first year of life (Diamond, 2006), whereas shifting, as the most complex EF component, builds upon inhibition and working memory and emerges later in development (Davidson et al., 2006; Garon et al., 2008). Inhibition seems to hold a central role among the three EF subcomponents since it has repeatedly been found that a common EF factor includes all inhibition-specific variance, whereas working memory and shifting additionally map specific abilities (Miyake & Friedman, 2012). We can also find development regarding the structure of EF over the course of the lifespan. Particularly interesting for the current study are previous findings supporting a differentiation of EF from preschool into primary school with most often finding one or two-factor models in preschool children and three-factor models among school-aged children. The one- and two-factor solutions in preschool most commonly include inhibition and working memory whereas shifting does not emerge as a separate factor until school age (Karr et al., 2018).

Regarding MC, early signs of metacognitive monitoring skills could be observed in children as young as two- to three years (e.g., Geurten & Bastin 2019) and further improvements are seen throughout preschool, kindergarten (e.g., Lyons & Ghetti 2011) and elementary school years (e.g., Bayard et al., 2021). Nevertheless, monitoring judgments are typically over-optimistic (Lipko et al., 2009; Roebers & Spiess, 2017). Even when provided with feedback and instructions, young children tend to overestimate their performance (O’Leary & Sloutsky, 2017). So, if neither feedback nor instructions help young children to improve their monitoring, it could be that instead, the ability to hesitate and reflect about one’s confidence might be a crucial factor in developing realistic performance estimation. When giving a confidence rating, time is needed to accumulate evidence to judge one’s confidence (Pleskac & Busemeyer, 2010). This assumption is supported by the finding that longer latencies of CJ are related to lower confidence ratings (Kälin & Roebers, 2020; Pleskac & Busemeyer, 2010). Together, this might suggest that hesitating and taking time leads to a more thorough monitoring. It is important to note that the period of interest in the present study is the time children need to give a CJ. This needs to be distinguished from the time children take to give a response in the memory task which is typically used within the cue utilization framework (Koriat, 1997).

Building on the assumption that hesitating and taking time leads to a more thorough monitoring, we suspected that inhibition skills might be important for children to take their time when monitoring, possibly leading to more accurate monitoring judgments. As outlined before, children need time to accumulate enough evidence to judge their confidence (Pleskac & Busemeyer, 2010) and we assume that to be able to take this time, children need inhibition. Without inhibition, children would impulsively give a CJ without properly collecting and evaluating evidence. This idea is also based on neuroscientific findings revealing that brain areas mainly in the prefrontal cortex (PFC) and ACC alert the system if ongoing information processing is needed to slow down or be adjusted. Engaging in reflection and therefore responding more slowly was linked to better EF performance, and was associated with a down-regulation of ACC activation (Espinet et al., 2013). Thus, better inhibition skills might lead to a down-regulation of ACC activation, enabling children to take their time to monitor more thoroughly, subsequently resulting in more accurate monitoring skills (i.e. less overconfidence).

Besides inhibition, the two other EF components working memory and shifting could also be related to monitoring. Regarding working memory, it is assumed that MC tasks require maintaining information in working memory while giving judgments (Efklides, 2008), and, an increase in working memory load was found to influence monitoring (Touroutoglou & Efklides, 2010). Shifting is important to flexibly shift attention and it could therefore be assumed that the flexible shifting of attention between the task at hand and the meta level could constitute a link between the two (Roebers, 2017).

To date, only a few studies have examined the association between the different EF subcomponents and monitoring in children. Bryce et al. (2015) investigated the relationship between two EF subcomponents (inhibition and working memory) and metacognitive skills in 5- and 7-year-old children. Verbal and non-verbal metacognitive skills were rated during a problem-solving task. Results showed that inhibitory control but not working memory was significantly correlated with the rate of metacognitive monitoring behavior. Interestingly, after controlling for age, the relationship between inhibition and monitoring only remained significant in the younger age group. Kälin & Roebers (2020) investigated the association between all three EF subcomponents and metacognitive monitoring in 4- to 6-year-old children. They included explicitly reported CJ as well as latencies of CJ. Inhibition was found to be significantly correlated with the latencies of monitoring judgments for correct and incorrect answers, even after controlling for age. This is consistent with the hypothesis that better inhibitory skills enable children to take their time when monitoring.

There is also a longitudinal study investigating the association between EF and metacognitive skills in 7- to 8-year-old children (Roebers et al., 2012). Metacognitive skills were measured using a spelling task and monitoring was operationalized as explicitly reported CJ for correctly spelled words. Results showed that only verbal fluency at the end of 1st grade (T1) was significantly linked to monitoring at the end of 2nd grade (T2). In contrast, all EF subcomponents at T1 were significantly correlated with metacognitive control at T2. Similarly, structural equation modeling showed that EF (as a latent variable) assessed at the first measurement point was not a significant predictor of later CJ values. Thus, so far, research seems to suggest a link between inhibition and monitoring only in younger children (4- to 6-year-olds). Additionally, the findings do not support a substantial association of working memory and shifting with monitoring.

With the current study, we specifically aimed to follow up on the findings suggesting that inhibition may be a necessary prerequisite for monitoring skills in preschool children. Thus, in contrast to the study of Roebers et al. (2012), our first measurement point was in kindergarten, before entering formal schooling, with children being 5 to 6 years old. From previous research we know that preschool years constitute an important period regarding EF development, with children showing a rapid improvement in their EF skills (Garon et al., 2008). EF skills predict school readiness and prepare children to successfully adapt to the school setting by enabling them, for example, to stay focused, resist distractions and hold information in mind (Müller et al., 2008). Thus, knowing about the importance of EF in preschool children, we were specifically interested in whether and how EF skills, in particular inhibition, in kindergarten years were connected to later monitoring skills in elementary school.

So far, we have interpreted the previous correlational findings as suggesting that EF, more precisely the subcomponent inhibition, might serve as a prerequisite of monitoring. Nevertheless, we cannot rule out the possibility that the relationship between these two constructs is bidirectional. Support for this possibility can be found in the iterative reprocessing model of EF skills, in which children’s reflection abilities are assumed to be essential for EF development (Zelazo, 2015). However, previous studies investigating whether monitoring might be a predictor of EF found no significant link. In the study of Destan & Roebers (2015), neither global monitoring nor the over- and underestimation of children’s performance was significantly linked to EF in 6-year-old children. Similarly, Marulis & Nelson (2021) found no significant association between procedural MC and EF in children aged 3–5 years. Since an integrated EF measure was used in both studies, no statement can be made regarding the specific link between inhibition and monitoring. Thus, although previous research points to the assumption of inhibition being an antecedent of monitoring and not vice versa, we will explore the possibility of a bidirectional relationship in the present study.

In summary, against the background of the above-reviewed findings, it becomes evident that a more thorough and longitudinal investigation of the association between EF and MC in children is needed and would be especially interesting during the transition from kindergarten into elementary school. Thus, the present study aimed to examine whether and how the three common EF subcomponents (inhibition, working memory, and shifting), assessed in kindergarten, were linked to later metacognitive monitoring skills assessed in 2nd grade. Furthermore, we aimed to explore a possible bidirectional association between inhibition and monitoring during this period. Findings of Kälin & Roebers (2020) indicated that better inhibition skills might help children hesitate when providing monitoring judgments and possibly enable children to engage in more thorough monitoring. Based on this assumption, we were especially interested in the EF subcomponent inhibition and hypothesized that better inhibition skills in kindergarten lead to less overconfident monitoring judgments in elementary school. In contrast, based on a lack of previous evidence, we did not expect the subcomponents working memory and shifting to be significantly related to later monitoring.

Methods

Participants

Children were recruited from nine different public kindergartens and schools in urban and rural regions of the German part of Switzerland. The final sample consisted of N = 84 children (49% female). They were predominantly Caucasian and, based on parental education, they came from lower- and upper-middle-class families. Children attended kindergarten (i.e. final year of preschool education) at the first assessment (T1) and were between 5 and 6 years of age (M = 73.3 months, SD = 4.72 months). At the second assessment (T2), children attended 2nd grade and were between 7 and 8 years of age (M = 94.6 months, SD = 4.30 months). The delay between T1 and T2 ranged from 17 to 24 months (M = 21.2 months, SD = 2.56 months). The original sample at T1 consisted of N = 248 children and has been published in Kälin & Roebers (2020). In order to keep the delay between T1 and T2 as similar as possible, only children who attended the second year of kindergarten at T1 were asked to participate again at T2 (N = 98). Only children who completed assessments at both measurement points were included in the current study. Parents of participating children gave written informed consent at T1 and T2, and children provided oral assent before testing. Of all children asked to participate at T2 (N = 98), 14% dropped out because they either did not get their parents’ consent at T2 (N = 11) or moved out of the study’s reach (N = 3). Ethical approval for the study was obtained from the Faculty’s Ethics Committee (Faculty of Human Sciences, University of Bern; Approval No. 2017-04-00006).

Procedure

All computer-based tasks were administered on a tablet (Lenovo, Yoga Tab 3 Pro) using OpenSesame (Mathôt et al., 2012). Trained experimenters individually assessed children in a quiet room at the children’s kindergarten (T1) and school (T2). At both measurement points, children were tested in three sessions, each consisting of a fixed set of tasks and lasting about 20 min. The sessions were counterbalanced. After the last session, the children were thanked for their cooperation and received a small gift. The present study is part of a larger research project and not all administered tasks were included in the following analyses.

Measures

Metacognition

At both measurement points, metacognition was measured in the context of a paired associates learning task. To avoid memory effects, the 10 to-be-learned picture pairs differed from T1 to T2. Otherwise, the task was identical. At the beginning of the task, there was a familiarization phase in which all pictures were shown on seven sequential screens (two example pictures, 20 pictures forming the 10 to-be-learned picture pairs, 20 distractor pictures). The pictures showed different objects in color (e.g. frog, car, carrot). Children were instructed to ask the experimenter if they did not know what a picture depicted and were asked to name three randomly chosen pictures on each screen. The following phases were explained to the children using the same example pictures. Learning phase: 10 non-associative picture pairs (a cue and a target) appeared on the screen for three seconds, and children were instructed to memorize them. They were told that their memory would be tested later on. Delay: a 3-minute filler task (maze) on paper. Recognition phase: Each cue picture was presented on the screen’s left side with four response alternatives, and children were asked to choose the matching picture. There was no time limit for responding. Children were instructed to guess if they did not know the answer. Monitoring phase: The cue picture and four response alternatives were presented again, with the previously chosen answer circled. Children were instructed to give CJ on a 7-point Likert scale about “how sure are you that your selected picture was the correct one.” The scale showed a thermometer (similar to Koriat & Ackerman, 2010) with different colors, ranging from blue (“very unsure”) to red (“very sure”). The thermometer was explained using an analogy to the well-known hot and cold search game, with blue representing cold and uncertainty and red representing hot and certainty. To make sure children understood the different confidence ratings, they were asked example questions. Administering the task in different phases led to children giving delayed CJ. This procedure was chosen to avoid a cognitive overload which could affect the monitoring performance (Roebers et al., 2007).

A bias index as a measure of absolute monitoring accuracy (Schraw, 2009) was calculated as a dependent variable, showing the degree of over- or underconfidence in judgments. To calculate the discrepancy between confidence and performance, the confidence ratings (1–7) were quantified as probabilities that each response was correct (0, 0.17, 0.33, 0.5, 0.67, 0.83, 1). Performance was included as 0 (incorrect) and 1 (correct).
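The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis script: the function names are hypothetical, and the sketch assumes that the listed probabilities (0, 0.17, ..., 1) are the two-decimal roundings of (rating − 1) / 6 and that the bias index is the mean signed difference between confidence and correctness across items, which is how the index is commonly defined.

```python
from statistics import mean


def rating_to_probability(rating: int, scale_max: int = 7) -> float:
    """Map a 1..scale_max confidence rating onto the 0..1 range.

    Assumption: equal spacing, i.e. (rating - 1) / (scale_max - 1).
    """
    if not 1 <= rating <= scale_max:
        raise ValueError(f"rating must be between 1 and {scale_max}")
    return (rating - 1) / (scale_max - 1)


def bias_index(ratings: list[int], correct: list[int]) -> float:
    """Mean signed difference between confidence and performance.

    Positive values indicate overconfidence, negative values
    underconfidence, and 0 a realistic performance estimation.
    """
    if not ratings or len(ratings) != len(correct):
        raise ValueError("ratings and correct must be non-empty and equally long")
    return mean(rating_to_probability(r) - c for r, c in zip(ratings, correct))


# Hypothetical child: five items, two recognized correctly.
example_bias = bias_index([7, 6, 4, 7, 2], [1, 0, 1, 0, 0])
print(round(example_bias, 2))
```

In this made-up example the child's mean confidence (0.70) exceeds the proportion of correct answers (0.40), so the index is positive and signals overconfidence.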

Executive functions

At T1, the three common subcomponents of executive functions were assessed separately, with each task mainly tapping one of the subcomponents. Since we were especially interested in the subcomponent inhibition, it was assessed at T1 and T2.

At both measurement points, the EF subcomponent inhibition was measured with the same adapted and computerized version of the Fruit Stroop task (Reliability r = .93; Archibald & Kerns, 1999). The task included three blocks, with each block consisting of 24 trials. In each trial, a target stimulus appeared for one second, followed by a response screen with four different colors (red, green, blue, yellow). In the first block, target stimuli were colored squares (baseline condition), and in the second block, four different fruits or vegetables in their original color (congruent condition). Children were instructed to choose the same color as quickly as possible by touching the response screen. In the third block, the same fruits and vegetables were displayed but in incongruent colors. Children had to choose the original color (i.e. the color the fruit or vegetable has in real life) on the response screen (incongruent condition). As the dependent variable, an inverse efficiency (IE) score of the incongruent block (reaction time [RT] divided by accuracy; Townsend & Ashby, 1978) was calculated to combine RT and accuracy. To simplify interpretation of results, the IE score was reverse-scored, with a higher score representing better performance.
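As an illustration of the inverse efficiency score, the following Python sketch divides a child's mean reaction time by the proportion of correct responses. The function names and example values are hypothetical. The text does not state how the score was reverse-scored, so multiplying by −1 is an assumption made here purely for illustration.

```python
def inverse_efficiency(mean_rt_ms: float, accuracy: float) -> float:
    """Inverse efficiency: mean reaction time divided by proportion correct.

    Higher values mean slower and/or less accurate responding.
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be a proportion in (0, 1]")
    return mean_rt_ms / accuracy


def reverse_score(score: float) -> float:
    """Flip the sign so that higher values represent better performance.

    Assumption: the source does not specify the reversal method.
    """
    return -score


# Hypothetical child: 1500 ms mean RT and 75% correct in the incongruent block.
ie = inverse_efficiency(1500.0, 0.75)
print(ie, reverse_score(ie))
```

Dividing by accuracy penalizes fast but error-prone responding: the hypothetical child above is treated as if each correct response had cost 2000 ms rather than 1500 ms.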

Working memory was assessed using the backward color span task (Zoelch et al., 2005). Children were told a cover story about a dwarf who loses colored discs. Then, sequences of colored discs were presented on a tablet screen with each color appearing for one second, and children were instructed to recall the colors in reverse order. After three practice trials, each child started with two-item-sequences. The number of items increased by one item if the child correctly recalled three of the six trials on a particular level. The dependent variable was the total number of correctly recalled trials.

Shifting was measured with a modified dimensional change card sorting task (Reliability ICC = 0.90; Beck et al., 2011; Carlson, 2005; Zelazo, 2006). Three boxes were placed in front of the child, each displaying a target card with a specific colored shape. The task consisted of three conditions with a practice trial preceding each. After making sure that the children knew all colors and shapes, they were asked to sort six cards according to color (condition 1) and shape (condition 2) as quickly as possible. In the third condition, children were given 18 cards (5 with a star, 13 without a star) and instructed to sort cards with a star according to shape and cards without a star according to color, as fast as possible. The dependent variable was a combined score including the accuracy and the time to task completion of the third condition ((errors + 1) * time). The variable was reverse-scored, with a higher value representing better performance.
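The combined shifting score can be written out directly from the formula given above. The sketch below is illustrative only: the function name and the example values are hypothetical, and the unit of the completion time (seconds) is an assumption.

```python
def shifting_cost(errors: int, completion_time_s: float) -> float:
    """Combined score for the third condition: (errors + 1) * time.

    Adding 1 keeps the score equal to the completion time for an
    error-free child instead of collapsing it to zero.
    """
    if errors < 0 or completion_time_s < 0:
        raise ValueError("errors and time must be non-negative")
    return (errors + 1) * completion_time_s


# Two hypothetical children with the same completion time of 40 s.
print(shifting_cost(0, 40.0))  # no sorting errors
print(shifting_cost(2, 40.0))  # two sorting errors
```

Because each error multiplies the time-based cost, two children who are equally fast can still differ markedly on the combined score.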

Statistical analyses

Data were analyzed using the software jamovi 1.6 (The jamovi project, 2021), which runs on R (R Core Team, 2018). The path analysis was conducted using IBM SPSS Amos 25 software. All variables were standardized before entering the analysis. Assumption checks were performed before analyzing the data. No violation of assumptions was found (Shapiro-Wilk W = 0.982, p = .30; Multicollinearity VIF < 2; Durbin-Watson = 2.34). In the Stroop task, trials were excluded if the reaction time was lower than 150 ms or deviated more than 3 standard deviations from the subject’s mean reaction time. This applied to 2.3% of all reaction times at T1 and 3.1% at T2. Regarding the dependent variables, scores more than 3 standard deviations above or below the sample’s mean were set to 3 standard deviations. This concerned 0.6% of all values.
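The two data-cleaning rules described above (trial-level exclusion and sample-level capping of extreme scores) can be sketched as follows. This is not the authors' code. The function names are hypothetical, and two details are assumptions because the text does not specify them: the subject's mean and standard deviation are computed over all of that subject's trials before any exclusion, and the sample standard deviation (n − 1 denominator) is used.

```python
from statistics import mean, stdev


def trim_reaction_times(rts_ms: list[float], floor_ms: float = 150.0,
                        n_sd: float = 3.0) -> list[float]:
    """Drop trials faster than floor_ms or more than n_sd SDs from the
    subject's own mean reaction time."""
    m, s = mean(rts_ms), stdev(rts_ms)
    return [rt for rt in rts_ms
            if rt >= floor_ms and abs(rt - m) <= n_sd * s]


def cap_extreme_scores(scores: list[float], n_sd: float = 3.0) -> list[float]:
    """Set scores beyond mean +/- n_sd SDs of the sample to that boundary."""
    m, s = mean(scores), stdev(scores)
    lower, upper = m - n_sd * s, m + n_sd * s
    return [min(max(x, lower), upper) for x in scores]


# Hypothetical subject: one anticipatory response (100 ms) is removed.
print(trim_reaction_times([100, 900, 1000, 1100]))

# Hypothetical sample: one extreme score is pulled back to mean + 3 SD.
capped = cap_extreme_scores([0] * 20 + [100])
print(capped[-1] < 100)
```

Capping rather than deleting extreme scores keeps every child in the analysis while limiting the leverage of single values.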

Results

Descriptive statistics

The descriptive statistics of all dependent variables are displayed in Table 1. Mean performance accuracy in the paired-associate task was 0.39 (SD = 0.22) at T1 and 0.59 (SD = 0.21) at T2. A paired t-test (two-sided) revealed a significant improvement over time, t(83) = −7.27, p < .001, Cohen’s d = −0.79.

Table 1 Descriptive statistics for all variables of executive functions and metacognition

Development from T1 to T2

Regarding the EF measure inhibition, mean accuracy was 0.75 (SD = 0.19) at T1 and 0.92 (SD = 0.08) at T2. The mean reaction time was 1558 ms (SD = 562) at T1 and 1024 ms (SD = 309) at T2. A repeated measures ANOVA revealed a significant improvement in accuracy, F(1,83) = 73.5, p < .001, ηp² = 0.470, as well as a significant reduction in reaction times, F(1,83) = 96.8, p < .001, ηp² = 0.538, from T1 to T2. For all following analyses, accuracy and reaction times of the Stroop task were combined in an IE score. A repeated measures ANOVA revealed a significant improvement of the IE score from T1 to T2, F(1,83) = 107, p < .001, ηp² = 0.295. Absolute monitoring accuracy was assessed by calculating a bias index (Schraw, 2009) indicative of the degree of over- or underconfidence. t-tests comparing the bias index against 0 (realistic performance estimation) revealed a significant overconfidence at both measurement points, T1: t(83) = 7.91, p < .001, Cohen’s d = 0.86; T2: t(83) = 4.52, p < .001, Cohen’s d = 0.49. Investigating the improvement of monitoring between T1 and T2, a repeated measures ANOVA revealed that children were significantly less overconfident at T2 compared to T1, F(1,83) = 14.4, p < .001, ηp² = 0.148.
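For readers who want to relate the test statistics to the effect sizes, two standard identities are sketched below: for a one-sample or paired t-test, Cohen's d can be obtained as t / √N, and partial eta squared can be obtained from F and its degrees of freedom. The function names are hypothetical, and it is an assumption that the reported effect sizes were derived this way. The inputs are the values reported above for the bias index at T1 and for Stroop accuracy.

```python
from math import sqrt


def cohens_d_from_t(t: float, n: int) -> float:
    """Cohen's d for a one-sample or paired t-test: d = t / sqrt(N)."""
    return t / sqrt(n)


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Bias index against 0 at T1: t(83) = 7.91 with N = 84.
print(round(cohens_d_from_t(7.91, 84), 2))

# Change in Stroop accuracy from T1 to T2: F(1, 83) = 73.5.
print(round(partial_eta_squared(73.5, 1, 83), 3))
```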

Correlations between executive functions and monitoring

Intercorrelations between all variables are presented in Table 2. There were significant correlations among all EF components, cross-sectionally as well as longitudinally. Regarding the correlations between the EF components and monitoring, we found significant cross-sectional relations between the two components inhibition and shifting, and monitoring at T1. Longitudinally, only inhibition and monitoring were significantly correlated.

Table 2 Correlations among all executive functions and metacognition variables

Earlier executive functions predicting monitoring

To further explore the longitudinal links between EF and monitoring skills, a hierarchical regression analysis was conducted with metacognitive monitoring at T2 as a dependent variable and inhibition, working memory, and shifting at T1 as predictors (Table 3). Since we hypothesized that inhibition was the most important EF component with regard to monitoring, we entered inhibition separately in a first step into the regression analysis. In a second step, we entered shifting and working memory. Inhibition proved to be a significant predictor of monitoring, explaining 11% of variance. This indicates that children with better inhibition skills at T1 showed significantly less overconfidence at T2. In contrast, working memory and shifting were no significant predictors, only explaining 1% additional variance after accounting for inhibition. Exploratively, we conducted another regression analysis, entering the EF components after accounting for monitoring T1. The pattern of results remained the same with inhibition being the only significant predictor of monitoring T2.
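The logic of the two-step (hierarchical) regression can be illustrated with a small, self-contained Python sketch: fit the model with inhibition only, fit the model with all three EF predictors, and compare the explained variance. Everything here is an assumption for illustration. The data are invented, the helper names are hypothetical, and ordinary least squares is solved via the normal equations rather than with the statistics software used in the study.

```python
from statistics import mean


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(predictors: list[list[float]], y: list[float]) -> float:
    """R^2 of an OLS regression of y on the predictors plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(map(float, p)) for p in predictors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, cols)) for i in range(n)]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Invented scores for six children (not the study's data).
inhibition = [1, 2, 3, 4, 5, 6]
working_memory = [2, 1, 4, 3, 6, 5]
shifting = [1, 3, 2, 5, 4, 7]
bias_t2 = [5.1, 4.2, 3.9, 2.8, 2.2, 0.9]

r2_step1 = r_squared([inhibition], bias_t2)
r2_step2 = r_squared([inhibition, working_memory, shifting], bias_t2)
delta_r2 = r2_step2 - r2_step1  # variance added by the second step
print(r2_step1, r2_step2, delta_r2)
```

The quantity `delta_r2` corresponds to the additional variance attributed to working memory and shifting after inhibition has been accounted for.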

Table 3 Linear regression analysis: executive functions at T1 predicting metacognitive monitoring at T2

Bidirectional links between inhibition and monitoring

A cross-lagged panel model was calculated to investigate the bidirectional links between inhibition and monitoring more closely. It included inhibition and monitoring at T1 and T2 (see Fig. 1), allowing the simultaneous estimation of concurrent and longitudinal links between all variables of interest. The estimated model consisted of observed variables and was recursive, meaning that no endogenous variable was represented as both a cause and effect of another, directly or indirectly (Hoyle, 2012). The final estimated path model was saturated, meaning that there were 0 degrees of freedom, and its fit to the data cannot be tested. Error variances of the two endogenous variables inhibition and monitoring at T2 were allowed to correlate.
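Why the model is saturated can be checked by counting. With four observed variables there are ten unique variances and covariances, and a two-wave cross-lagged model with correlated T2 residuals uses all ten as free parameters. The itemization below is one reading of the model described above, assuming that means and intercepts are not part of the covariance structure. The labels and the helper name are hypothetical.

```python
def unique_moments(n_observed: int) -> int:
    """Number of unique variances and covariances among observed variables."""
    return n_observed * (n_observed + 1) // 2


# Free parameters of a two-wave cross-lagged panel model (assumed layout).
free_parameters = {
    "variances of inhibition T1 and monitoring T1": 2,
    "covariance between inhibition T1 and monitoring T1": 1,
    "stability paths (T1 -> T2, same construct)": 2,
    "cross-lagged paths (T1 -> T2, other construct)": 2,
    "residual variances of inhibition T2 and monitoring T2": 2,
    "covariance between the two T2 residuals": 1,
}

degrees_of_freedom = unique_moments(4) - sum(free_parameters.values())
print(degrees_of_freedom)
```

With no degrees of freedom left, the model reproduces the observed covariances exactly, which is why its fit cannot be tested and interpretation rests on the individual path coefficients.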

Fig. 1 Path analysis including inhibition and monitoring variables at time points 1 and 2. Standardized coefficients are shown. * p < .05, ** p < .01, *** p < .001

The cross-sectional link between inhibition and monitoring at T1 was significant (β = −0.25, p < .05), indicating that kindergarten children with better inhibition skills showed less overconfidence when monitoring their performance. In contrast, the cross-sectional link between inhibition and monitoring at T2 was not significant (β = 0.06, n.s.). Regarding the longitudinal links, we found that inhibition at T1 was significantly related to monitoring at T2 (β = −0.35, p < .01), indicating that early inhibitory skills predicted later monitoring accuracy. More precisely, children with better inhibitory skills in kindergarten seem to exhibit less overconfidence in elementary school. Interestingly, earlier monitoring skills had no significant effect on later inhibition skills. Concerning stability, there was a significant association between earlier and later inhibition skills (β = 0.52, p < .001), indicating high stability. In contrast, the longitudinal link between monitoring at T1 and T2 showed low monitoring accuracy stability (β = −0.04, n.s.).

Discussion

The present study addressed the longitudinal relationship between EF and metacognitive monitoring across the transition into elementary school. Our primary goal was to investigate the assumption of inhibition being a prerequisite of monitoring skills and explore the potential bidirectional link between inhibition and monitoring. Overall, our results supported an essential role of inhibition regarding children’s later monitoring accuracy and replicated and extended previous findings. To our knowledge, this is the first study examining the longitudinal association between EF skills and monitoring accuracy during children’s transition to school.

Firstly, our results confirmed developmental progressions regarding inhibition and metacognition during the period of transition into school. Regarding inhibition, children became more accurate and faster in their responses from kindergarten to 2nd grade, confirming previous research (e.g., Davidson et al., 2006). Regarding monitoring, although children remained significantly overconfident in elementary school, they showed substantial improvements in their monitoring accuracy compared to kindergarten. This is in line with many cross-sectional studies for this age range (e.g., O’Leary & Sloutsky, 2019; for a review see Roebers, 2017). Investigating the development of monitoring longitudinally, Roebers & Spiess (2017) also found significant reductions in overconfidence in children during 2nd grade. Experiences with formal schooling are generally considered driving forces for this aspect of metacognitive development.

Investigating the longitudinal relationship between EF and monitoring revealed a significant link between the EF subcomponent inhibition and monitoring. The two other EF subcomponents working memory and shifting, assessed in kindergarten, were not significantly associated with later monitoring skills. This is in line with Bryce et al.’s (2015) cross-sectional study, which found no significant link between working memory and monitoring. One possible reason could be that research with young children so far suggests a link between working memory and metacognitive control (Bryce et al., 2015; Spiess et al., 2016), but not monitoring. Regarding the relation between shifting and MC, little is known since previous studies either did not include a shifting measure (Bryce et al., 2015) or used a composite score for EF (Destan & Roebers, 2015; Marulis & Nelson, 2021). While Roebers et al. (2012) included a combined measure of shifting and updating and found no significant association with monitoring, Kälin & Roebers (2020) found a link of shifting with the implicit, time-based measure of monitoring but not with the explicit measure of monitoring. Since research showed that shifting builds on inhibition and working memory and therefore comes later in development, it is possible that in young children, the link can only be found with implicit metacognitive measures and only later in development with explicit measures. Furthermore, the differentiation of the EF structure from kindergarten (T1) to primary school (T2), with shifting only emerging at school age (Karr et al., 2018), might also have played a role regarding the non-significant link of shifting with monitoring. It would have been interesting to additionally evaluate the association of working memory and shifting with monitoring at school age. Future research should include all EF subcomponents separately and investigate elementary school children to acquire more knowledge about the relationship between EF and MC.

As mentioned above, our results supported a vital role of early inhibition for later monitoring skills. We assessed inhibition and monitoring in kindergarten and nearly two years later in elementary school because our primary focus lay on the investigation of this longitudinal relationship. Previous research provided correlational evidence that inhibition might serve as a prerequisite of monitoring skills in young children (Bryce et al., 2015; Kälin & Roebers, 2020). In contrast, in their longitudinal study with elementary school children, Roebers et al. (2012) found no significant link between inhibition and monitoring, and EF as a latent variable was not substantially associated with later monitoring judgments. Together, these findings could be interpreted as suggesting that inhibition serves as an important factor for monitoring skills in early years and as an essential antecedent for later monitoring during the transition to formal schooling, but is less critical once children are settled in elementary school. From previous research we know that kindergarten (as part of preschool education) constitutes a crucial period of EF and MC development (Hembacher & Ghetti, 2014; McCabe et al., 2004) and it is possible that especially in this age range, inhibition is important for children’s developing monitoring skills. In other words, early inhibition might be necessary for the development of emerging monitoring skills, but no longer for more mature monitoring skills, which has also been suggested by Bryce et al. (2015). A dynamic relationship between EF and MC over the course of development has also been proposed by Roebers (2017). Furthermore, developmentally distinct patterns of interactions between EF and MC have also been found regarding the link between working memory and metacognitive skills (Whitebread, 1999). It is also possible that young children with better inhibition gain different kinds of metacognitive experiences (e.g. different estimation of time or effort) than children with less developed inhibition, leading to more accurate monitoring (Efklides, 2001, 2006). Later in development, across elementary school years, other cognitive processes might become critical driving forces for monitoring accuracy.

Based on the significant association between inhibition and time-based monitoring measures, Kälin & Roebers (2020) suggested that better inhibition skills may help children hesitate when providing monitoring judgments and thus enable children to engage in more thorough monitoring. It was assumed that monitoring needs time, and better inhibition skills allowed children to take their time when monitoring, especially when uncertain. The neuroscientific background of this assumption is based on findings that prefrontal cortical networks are involved in resolving conflicts and down-regulate ACC activation (Botvinick et al., 2001). Crucially, there seems to be a link between EF skills and the recruitment of the lateral PFC, with better EF skills being associated with reflection, slower responding and a down-regulation of ACC activation (Espinet et al., 2013). Thus, in the present study, better inhibition skills in kindergarten might have led to a down-regulation of ACC activation and enabled children to take their time to monitor more thoroughly, which subsequently led to less overconfidence in their monitoring not only cross-sectionally but also over almost two years, indicating that it might have affected the development of more realistic monitoring judgments.

It can be assumed that better inhibition skills might prevent children from giving rushed and ill-considered monitoring judgments, which subsequently leads to more realistic performance estimations. Examining children’s monitoring by looking at preparation times during an EF task, Chevalier & Blaye (2016) found that 6-year-olds often triggered the target before they were fully prepared to process it efficiently, resulting in poorer task performance compared to 10-year-olds. The authors interpreted their findings as indicating that EF development is partly driven by increased monitoring skills. In light of our findings, it would also be possible to suggest that younger children’s weaker inhibition skills led to shorter preparation times and, consequently, less accurate responses. It would be interesting for future research to additionally include time-based measures when investigating the link between inhibition and monitoring longitudinally.

Based on previous correlational results, it could also be suspected that the relationship between EF and MC works in the opposite direction. More precisely, the improved calibration of confidence to accuracy might lead to better EF skills. Therefore, we additionally explored the possibility of a bidirectional relationship between inhibition and monitoring. Previous research has examined whether metacognitive skills might be predictors of children’s EF, but so far only cross-sectionally, by applying regression analysis. Destan & Roebers (2015) reported that global monitoring was not a significant predictor of EF in 6-year-old children, and the over- and underestimation of children’s performance was also not significantly linked to EF. Similarly, Marulis & Nelson (2021) investigated the relationship between metacognitive knowledge, procedural MC, and EF in 3- to 5-year-old children. They found that only metacognitive knowledge, but not procedural MC, was a significant predictor of EF. Because these studies used a combined score of EF rather than separate components, comparing their findings with our own poses some difficulties. Nevertheless, overall, they are in line with the present study’s results, which revealed no significant link between monitoring skills in kindergarten and later inhibition in elementary school. This emphasizes EF’s importance for young children’s emerging and developing monitoring skills, but not vice versa. Interestingly, inhibition in kindergarten seems to be a better predictor of later monitoring skills than monitoring in kindergarten. While our results revealed stability in inhibition, substantial stability could not be confirmed for monitoring. This is in line with previous studies showing that early inhibition skills were predictive of later inhibition skills in preschool (Carlson et al., 2004) and elementary school children (Harms et al., 2014), and with findings of low stability of monitoring accuracy (Steiner et al., 2020). This recurring lack of stability in monitoring measures warrants further research.

Despite our finding of inhibition being important for later monitoring skills, the relationship between EF and MC should not solely be seen as unidirectional. For example, the iterative reprocessing model of EF skills assumes that EF development depends, at least in part, on cognitive and neural processes associated with the reflection abilities of children (Zelazo, 2015). This assumption was supported by an experimental intervention study using reflection training in which young children were instructed to reflect on the task before responding. Compared to the control groups, children receiving the reflection training substantially improved their EF skills (Espinet et al., 2013). Since reflection could be considered a source of metacognitive knowledge (Zelazo, 2015) rather than procedural metacognition, it might be possible that the influence of MC on EF is limited to the declarative component of MC. This interpretation would also be in line with Marulis & Nelson’s (2021) finding that only metacognitive knowledge, but not procedural MC, was predictive of EF.

There are limitations of the present study that need consideration. Firstly, the two EF components shifting and working memory were only assessed at T1. In order to get a more holistic picture of the associations between EF and MC, and to be able to make more precise statements on their longitudinal relationship, the investigation of all EF components in relation to monitoring at school age would have been necessary. Secondly, in our study we assumed that inhibition enables children to take their time when monitoring. However, we cannot rule out that there is also a direct effect of inhibition on monitoring, leading to the inhibition of a child’s default of indicating high confidence. In other words, inhibition could enable children to control their impulse to always indicate high confidence independent of the time needed to give a judgment.

In conclusion, the present study presents novel evidence for the conceptually assumed link between EF and MC, substantiating the critical role of early inhibition for later monitoring accuracy. Most crucially, our findings indicate that better inhibition skills in kindergarten help children to more realistically estimate their performance in kindergarten and after entering elementary school. Therefore, our findings support the critical role of inhibition in kindergarten years and the assumption of a developmental sequence in which inhibition serves as a prerequisite of monitoring. Interestingly, the relationship between inhibition and monitoring seems not to be bidirectional, at least during this period. A practical implication that could be drawn is that children with inhibition difficulties in kindergarten may be at risk of developing less accurate monitoring skills later on. This leads to important implications concerning intervention programs in early childhood targeting EF skills. Against the background of our findings, it could be assumed that intervention or training programs aiming to improve early inhibition skills might also foster more accurate monitoring skills. Thus, identifying children with less developed inhibition skills in kindergarten and supporting them with intervention or training programs might also be crucial for their later monitoring skills and consequently their academic achievement. It would be interesting to address these assumptions in future research and conduct further investigations to shed light on the interplay between EF and MC not only during the transition to formal schooling but also later in development.