The effects of strategy training and an extrinsic incentive on fourth- and fifth-grade students' performance, confidence, and calibration accuracy

This study investigated the influence of strategy training instruction and an extrinsic incentive on American fourth- and fifth-grade students' (N = 35) performance, confidence in performance, and calibration accuracy. Using an experimental design, children were randomized to either an experimental group (strategy training and an extrinsic incentive) or a comparison group in an after-school Brain Train Club. Data on performance and confidence ratings were collected at pretest and again at posttest. Accuracy was calculated by comparing confidence in performance to actual performance. Results revealed that students in the experimental group demonstrated improved performance, confidence, and calibration accuracy compared to those in the comparison group. Implications for learning and instruction and avenues for future research are discussed.
Subjects: Social Sciences; Behavioral Sciences; Education


ABOUT THE AUTHOR
Antonio P. Gutierrez de Blume, PhD, is currently an Assistant Professor of Research at Georgia Southern University. He is interested in researching metacognition under the theory of self-regulated learning. More specifically, he is interested in how learners monitor their comprehension during learning episodes. His program of research includes examining the effects of dispositional characteristics (e.g. various aspects of motivation) and learning strategy training on learners' calibration (accuracy and bias), confidence, and performance as well as investigating the latent dimensions of calibration to improve its measurement.

PUBLIC INTEREST STATEMENT
Understanding how children learn is an important endeavor. In this research article, I explore the mechanisms by which American elementary school students (fourth and fifth graders) convey what they know and do not know about a topic of interest, a key element of metacognitive monitoring known as calibration. Research suggests that children begin to develop comprehension monitoring skills and control of their cognitive processes by the age of 9-10 years. In addition, research indicates that learners are not very accurate in their calibration. Therefore, in this study, I outline an educational intervention that teaches crucial strategies to help students more effectively learn and monitor their comprehension. I also provide them with an extrinsic incentive to motivate them to apply strategies effectively given task demands. Results show that the intervention was successful at aiding these young learners to better monitor and control their learning, and hence calibrate their performance more effectively.

Introduction
Metacognitive awareness has been defined as having two main dimensions, knowledge and regulation (Brown & Palincsar, 1989; Palincsar & Brown, 1984; Schraw & Dennison, 1994). Knowledge encapsulates declarative (knowledge of cognitive skills and strategies), procedural (knowledge of how to apply strategies), and conditional (knowledge of when, where, and why to apply strategies given task demands) knowledge. Regulation, on the other hand, subsumes five elements: planning (anticipating cognitive resources and how to effectively allocate them to accomplish goals), information management (skills of how to manage and relate large volumes of information which may be discrepant), debugging (how to correct errors in judgment while learning), comprehension monitoring (monitoring one's understanding of information), and evaluation of learning (the act of reflecting after a learning episode and making appropriate adjustments for more effective future learning) (Schraw & Dennison, 1994; Sperling, Howard, Staley, & DuBois, 2004). All of these components are interrelated yet distinct. Previously, general metacognitive awareness has been researched in tandem with metacognitive monitoring ability, and the two were found to be related (e.g. Schraw, 1998, 2001; Schraw & Graham, 1997).
Metacognitive monitoring is the process of judging what one knows and does not know about a topic (Huff & Nietfeld, 2009; Thiede, Griffin, Wiley, & Redford, 2009). In the metacognitive monitoring literature, judgments regarding one's metacognitive monitoring ability have been variously referred to as feelings of knowing (FOKs), judgments of learning (JOLs), or ease of learning judgments (EOLs; Schraw, 2009). An FOK is defined as a feeling that information that is necessary during a learning episode is accurately known and can be successfully recalled in the future when necessary, say during a criterion task such as a test; thus, it involves both a cognitive assessment of one's memory and a prediction (before the task; Hart, 1965; Metcalfe, 1986; Nelson, 1992; Schraw, 2009) or postdiction (after the task). Research suggests that learners are more accurate in their postdiction judgments than in their predictions (e.g. Bol & Hacker, 2001; Bol, Hacker, O'Shea, & Allen, 2005; Gutierrez & Price, in press). On the other hand, JOLs are commonly referred to as judgments of learners' ability to convey how effectively something has been learned (Nelson & Dunlosky, 1991; Schraw, 2009). Finally, EOLs are referred to as judgments of the relative ease or difficulty of to-be-learned information (i.e. they are made in advance of learning novel information; Nelson & Narens, 1990).
Calibration is an outcome of FOK judgments, and it refers to the relation between criterion task performance and a judgment about that performance (Boekaerts & Rozendaal, 2010; Efklides, 2008; Winne & Nesbit, 2009). The literature on calibration distinguishes between relative and absolute judgments (Schraw, Kuch, Gutierrez, & Richmond, 2014). Absolute accuracy judgments refer to comparisons between confidence in performance and actual performance (e.g. G Index), whereas relative judgments refer to the extent to which said judgments discriminate performance (e.g. sensitivity, specificity, gamma, d'; Schraw et al., 2014; Serra & Metcalfe, 2009). Because these various judgments are related yet distinct and presumably tap into different aspects of memory (Leonesio & Nelson, 1990), the choice of which types of judgments to measure has important implications for the inferences and conclusions researchers can draw from their data. Generally, these metacognitive judgments are collected as confidence ratings on a referent task such as tests and exams. Thus, calibration is a comparison of confidence in performance and actual performance, outcomes of which are calibration accuracy and bias (Boekaerts & Rozendaal, 2010; Efklides, 2008; Schraw, 2009; Winne & Nesbit, 2009). Calibration accuracy refers to how accurate one's confidence in performance is when compared to actual performance, whereas bias refers to the direction of the miscalibration (i.e. the signed difference, indicating underconfidence or overconfidence) (Schraw, 2009). In the present investigation, learners were asked to make absolute accuracy judgments, as this was most appropriate for the aspect of metacognitive monitoring examined here.
Thus, the purpose of the present study was to evaluate the effect of a compact, yet comprehensive, strategy training regimen that focused on learners' test-taking and study skills (all essential for effective self-regulated learning) combined with an extrinsic incentive on fourth- and fifth-grade students' performance, confidence in performance, and absolute calibration accuracy in an experimental pretest/posttest design.

Contributions to the extant literature on calibration
The present investigation contributes to the extant literature on metacognitive monitoring in that it is the first investigation of this type incorporating this population. Hitherto, much of the metacognitive monitoring literature investigating the effects of strategy training regimens on calibration accuracy has focused on adult learners, mainly college undergraduates. This poses a problem to the extent that generalizations from these studies apply mainly to other samples of adult learners and not to children or adolescents. Hence, this research endeavor extends previous research by providing much needed understanding of how these phenomena operate in learners who are still developing cognitively and metacognitively, namely fourth- and fifth-grade students. Findings have shown that children of this age (9 to 10 years) are experiencing tremendous metacognitive development (e.g. Fatzer & Roebers, 2012; Roebers, 2002) and thus are honing important self-regulated learning skills such as their ability to monitor their comprehension during learning episodes. This study will therefore allow scholars and educators to better understand the relation among cognitive (performance), metacognitive (calibration accuracy), and motivational (use of extrinsic incentives) variables at this critical time in the development of self-regulated learning.
Previous researchers have employed a variety of strategy training interventions. For instance, Bol and Hacker (2001) compared practice tests with traditional review, and Bol et al. (2005) incorporated overt practice in their study. Similarly, Hacker, Bol, and Bahbahani (2008) used reflection and incentives, whereas Bol, Hacker, Walck, and Nunnery (2012) incorporated a comparison between individual and group guidelines. Other researchers have employed some form of strategy instruction to influence metacognitive monitoring accuracy (e.g. Gutierrez & Schraw, 2015; Dunlosky, Rawson, & Middleton, 2005; Huff & Nietfeld, 2009; Nietfeld, Cao, & Osborne, 2005; Nietfeld & Schraw, 2002). Other educational interventions have examined the effects of general self-regulated learning training (Azevedo & Cromley, 2004), self-explanations of reading comprehension (McNamara, 2004), main idea comprehension (Jitendra, Kay Hoppes, & Xin, 2000), and a comparison of the infusion method and the instrumental enrichment program (Lizarraga, Baquedano, Mangado, & Cardelle-Elewar, 2009). As is evident, researchers in the literature on educational interventions have employed a variety of methods; however, it is important to note that these researchers did not necessarily all target metacognitive monitoring per se. The present investigation employs a strategy intervention similar to those used previously (e.g. Huff & Nietfeld, 2009; Nietfeld & Schraw, 2002) but also incorporates an extrinsic incentive similar to the one used by Hacker et al. (2008) and Gutierrez and Schraw (2015).
With respect to population, several researchers have conducted related research on various aspects of self-regulated learning with populations ranging from children to adolescents. Even though these studies did not specifically address strategy training or instruction, they warrant brief mention here. Barnett and Hixon (1997), for instance, examined the effect of grade and domain on second-, fourth-, and sixth-grade learners' test score predictions. Chen and her colleagues (Chen, Cleary, & Lui, 2015; Digiacomo & Chen, 2016) investigated self-regulated learning skills in mathematics among a cohort of middle school students in a series of studies. In a similar vein, Bol, Riggs, Hacker, and Nunnery (2010) explored calibration accuracy in mathematics among middle school learners. Hence, while the present study is informed by these studies, it represents an important extension of this work by employing strategy instruction with an incentive among children in an experimental, pretest/posttest design.

Theoretical Framework
Self-regulated learning (SRL) theory encompasses cognition, metacognition, and motivation. Several models of SRL have been proposed in the literature. For instance, Zimmerman (2000) described SRL as a cyclical process involving three parts: (1) forethought (e.g. goal setting, strategic planning, self-efficacy beliefs, and intrinsic motivation); (2) performance and volitional control (e.g. attention focusing, self-instruction, and self-monitoring); and (3) self-reflection (e.g. self-evaluation, attributions, and self-reactions). Boekaerts (1999) proposed a three-layer model of SRL, including: (1) regulation of the self (i.e. the choice of goals and resources); (2) monitoring of processing methods (i.e. the use of metacognitive knowledge and skills to direct one's learning); and (3) regulation of processing modes (i.e. the choice of cognitive strategies). The theory of SRL posits that metacognitive monitoring is a crucial component of self-regulated learning. Previous findings indicate that proficient self-regulated learners are more able to leverage cognitive resources, select appropriate strategies to match task demands, effectively monitor comprehension during learning episodes, and exhibit greater metacognitive awareness when compared to less proficient learners (Boekaerts & Rozendaal, 2010; de Bruin & van Gog, 2012; Efklides, 2008; Winne & Nesbit, 2009). Greene and Azevedo (2007) describe Winne and Hadwin's model of SRL as more complex because it incorporates monitoring and control processes at each phase of learning and distinguishes between task understanding and goal setting. They also describe similarities among various models of SRL. Therefore, it is plausible that enhancing individuals' comprehension monitoring through explicit instruction will improve higher order thinking skills such as calibration accuracy and metacognitive monitoring skills.
Thus, the present investigation incorporates all three components of SRL theory: cognition, metacognition, and motivation (i.e. use of an extrinsic incentive).

Developmental trajectory of metacognition
Given the emphasis on children rather than adults in the present study, it is important to understand the developmental trend of metacognition across the lifespan. Although research endeavors on this topic are cross-sectional and largely descriptive in nature, they provide an understanding of how relevant metacognitive skills function at different points along the developmental continuum. Studies on the development of metacognition reveal that metacognitive skills are present as early as 5-7 years of age (Ghetti & Angelini, 2008; Krebs & Roebers, 2010; Lyons & Ghetti, 2011; Roderer & Roebers, 2010). However, these studies indicate that self-awareness and monitoring processes only begin to emerge at this early stage of metacognitive development. More advanced metacognitive skills such as control and inhibitory skills begin to surface in children approximately between 8 and 10 years of age (Roebers, Schmid, & Roderer, 2009; Schneider, Knopf, & Sodian, 2009). Research results have shown that the metacognitive skills requisite for effective comprehension monitoring arise at about 9 or 10 years of age, when children begin to display adequate control and monitoring of their own cognitive resources and abilities (Schneider, Visé, Lockl, & Nelson, 2000; Schneider et al., 2009). Interestingly, children begin to develop the ability to calibrate their performance at about age 8 (Roebers, von der Linden, Schneider, & Howie, 2007; Son, 2005; Souchay & Isingrini, 2004), although their calibration accuracy tends to be rather poor at this age. Additional research found that individuals hone their metacognitive skills such as metamemory, reflection, comprehension monitoring, and self-regulation throughout adolescence (Kitchner, 1983; Veenman, Kok, & Blöte, 2005).
Nevertheless, it is during adulthood that individuals' metacognitive skills are most proficient, suggesting that as individuals age, their metacognitive monitoring improves (Koriat, Ackerman, Adiv, Lockl, & Schneider, 2013; Vukman, 2005). It is perhaps for this reason that the majority of research on metacognitive monitoring has involved adult learners rather than children. Therefore, this study elucidates how metacognitive monitoring functions in children between 9 and 10 years old, an age at which metacognitive monitoring and control processes are present.

Metacognitive monitoring training and performance, confidence, and calibration accuracy
McCormick (2003) and Pressley and Harris (2006) argued that metacognitive monitoring training is an optimal way to improve comprehension monitoring. A series of studies suggested that individuals' performance and calibration accuracy benefit from explicit metacognitive monitoring training. For example, research on reading comprehension indicated a positive effect of specific metacognitive monitoring training on rereading (Griffin, Wiley, & Thiede, 2008). It is important to note, however, that research on the influence of metacognitive strategy training on performance, confidence in performance, and calibration accuracy has been inconclusive. Some studies have demonstrated a positive effect of metacognitive strategy training on performance, confidence judgments, and calibration accuracy (Gutierrez & Schraw, 2015; Huff & Nietfeld, 2009; Magliano, Little, & Graesser, 1993; Nietfeld et al., 2005; Nietfeld, Cao, & Osborne, 2006; Nietfeld & Schraw, 2002). In contrast, other research points to the lack of any effects of training on these outcomes. For instance, Bol and Hacker (2001), Bol et al. (2005), and Hacker et al. (2008) found that metacognitive monitoring training had no effect on calibration accuracy and confidence, although it marginally improved performance. Differences in findings among these studies may be due to methodological choices (cf. Gutierrez & Schraw, 2015; Bol et al., 2005).
Although the developmental research on metacognition surveyed previously has demonstrated that children begin to realize metacognitive monitoring and control skills by 9 to 10 years of age, Beal (1996) argues that children in the elementary school years are highly miscalibrated in their reading comprehension skills such that they are overconfident in their performance, underscoring the need to implement metacognitive monitoring training among this population of learners. Although there is a great wealth of research on the effects of metacognitive monitoring training on performance, confidence in performance, and calibration accuracy among adult learners, the research on these constructs among children and adolescents is scarce. Among a cohort of middle school students, Cataldo and Cornoldi (1998) showed that explicitly instructing students to monitor their reading comprehension did not significantly influence comprehension monitoring. In contrast, de Bruin, Thiede, Camp, and Redford (2011) demonstrated that instructing children on generating keywords improved metacomprehension monitoring accuracy in reading among fourth- and sixth-grade students. In sum, the findings of the scant research on the role metacognitive monitoring training plays in improving performance, confidence, and calibration accuracy among children and adolescents vary widely. Yet, the general consensus among these studies is that monitoring skills are malleable and trainable and that, for those studies revealing an effect, training has positive, albeit small, effects on these outcomes.

The role of extrinsic incentives
The extant literature on the role of incentives converges on the finding that incentives differentially affect performance and calibration accuracy as a function of the type of incentive (Hogarth, Gibbs, McKenzie, & Marquis, 1991; Schraw, Potenza, & Nebelsick-Gullet, 1993; Yates, 1990). Extrinsic incentives are driven by tangible rewards, such as money and extra credit, which draw on individuals' performance on a criterion task (e.g. to outperform others or to avoid underperforming). Although incentives have been shown to influence task performance, the effect is not always positive (Kleinsorge & Rinkenauer, 2012), especially when a task is easy or unappealing (Bailey & Fessler, 2011). For example, Sankaran and Bui (2001) found that when individuals no longer received incentives to motivate performance on an intrinsically enjoyable task of their choosing, their performance and interest on the task dampened. Sinkavich (1994) provided an extrinsic incentive (in this case, extra credit points) to enhance monitoring accuracy. In addition, providing individuals with an incentive to sustain high achievement on a referent task significantly decreased their incidental learning (i.e. learning that occurs unintentionally or that which is not initially planned; Hogarth et al., 1991), presumably because attention is focused on performance rather than actual learning.
The literature on the effects of incentives on performance and calibration accuracy has reported mixed findings. For instance, Hacker et al. (2008) found that students in the incentives condition significantly improved calibration accuracy, but this was the case only for lower performing students. On the other hand, Schraw et al. (1993) found that incentives provided explicitly to improve calibration accuracy improved individuals' monitoring accuracy, whereas incentivizing them for results on specific tasks had no effect on performance, suggesting that individuals invoke subjective feelings of knowing when calibrating their performance rather than more objective information such as judging item difficulty on performance assessments. In another vein, findings have indicated nil effects of incentives on either calibration accuracy or performance (see Hogarth et al., 1991, for a review), although one study revealed that individuals who received incentives performed significantly better than those who were exposed to strategy training (Tuckman, 1996). Conversely, Gutierrez and Schraw (2015) found that extrinsic incentives improve performance, confidence, and calibration accuracy among adult learners. In sum, the extant literature on these topics points to a dynamic interplay between incentives, calibration accuracy, and performance (e.g. Hacker et al., 2008; Schraw et al., 1993; Tuckman, 1996), thus underscoring the need for additional research to clarify this complex dynamic.

The present study
The purpose of the present investigation was to examine the influence of a compact strategy training regimen targeting learners' self-regulated learning skills (e.g. test-taking and study strategies) combined with an extrinsic incentive on fourth- and fifth-grade students' performance on a declarative knowledge assessment, confidence in performance, and calibration accuracy. This was evaluated using an experimental design with a repeated measures component. The compact strategy training was developed using previous research (Gutierrez & Schraw, 2015; Nietfeld & Schraw, 2002; Schraw, 1998; Volet, 1991). This line of inquiry posits that strategy instruction targeted to increase calibration accuracy should in tandem improve subsequent self-regulation of learning because more proficient monitoring presumably increases self-regulation and control of learning processes (Greene & Azevedo, 2010; Nelson & Narens, 1990; Winne & Nesbit, 2009). Gutierrez and Schraw (2015) used the same strategy instruction implemented here among adults using a four-group experimental design (strategy training and incentive, incentive only, strategy training only, and control).
More specifically, the integrated one-hour strategy instruction intervention used seven domain-general strategies that have been demonstrated to improve learning and self-regulation (Greene & Azevedo, 2010). Individuals were expected to read, review, relate, and monitor information during learning as a result of the strategy instruction. The full strategy module is known as R³M and was developed and tested previously with adult learners (Gutierrez & Schraw, 2015). The module is based on domain-general strategy instruction principles (Pressley & Harris, 2006) as well as specific strategies used in previous calibration research (Gutierrez & Schraw, 2015; Nietfeld & Schraw, 2002; Schraw, 1998; Volet, 1991). The present study presented the seven strategies shown in Table 1 in an integrated one-hour educational training session designed to increase proficiency with respect to self-regulation processes (i.e. strategic study, monitoring, and control processes) during learning. This strategy training represents a more comprehensive training regimen than previous research on this topic (e.g. Bol et al., 2005; Hacker et al., 2008; McNamara & Magliano, 2009; Nietfeld & Schraw, 2002). With the majority of research on the effect of strategy training on calibration focusing on adult learners rather than children who are at a critical juncture in their metacognitive development, the present investigation represents an important extension to the calibration literature.

Research question and hypothesis
The present study sought to answer the following research question.
1. What is the effect of strategy training instruction combined with an extrinsic incentive on fourth- and fifth-grade students' performance on a 20-item declarative knowledge test, confidence in performance, and calibration accuracy?
H1: The experimental condition (strategy training and incentive) was expected to positively affect students' performance, confidence judgments, and calibration accuracy. More specifically, students in the experimental group were predicted to improve their performance, confidence, and calibration accuracy at posttest when compared to students in the comparison group. Thus, an experimental condition × occasion (pretest, posttest) interaction was hypothesized, along with a significant main effect for experimental group. Given that the comparison group was not exposed to the treatment, a main effect for occasion was not predicted.
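As a descriptive illustration of the hypothesized condition × occasion interaction, the sketch below compares the mean pretest-to-posttest change of the two groups. In a two-group, two-occasion design, the interaction corresponds to the difference between these two changes. The function name and all scores are hypothetical and serve only to show the arithmetic; this is not the inferential analysis used in the study, and no significance test is performed.

```python
from statistics import mean


def interaction_contrast(exp_pre, exp_post, cmp_pre, cmp_post):
    """Return (experimental gain, comparison gain, difference in gains).

    Each argument is a list of scores for one group at one occasion.
    The third value is the descriptive condition x occasion contrast.
    """
    exp_gain = mean(exp_post) - mean(exp_pre)
    cmp_gain = mean(cmp_post) - mean(cmp_pre)
    return exp_gain, cmp_gain, exp_gain - cmp_gain


# Hypothetical percent-correct scores (NOT data from the study).
exp_gain, cmp_gain, contrast = interaction_contrast(
    exp_pre=[50, 60, 70],
    exp_post=[70, 80, 90],
    cmp_pre=[55, 60, 65],
    cmp_post=[55, 65, 60],
)
print(exp_gain, cmp_gain, contrast)
```

For performance and confidence, a positive contrast would be in the hypothesized direction. For absolute calibration accuracy, where lower scores indicate better calibration, the hypothesized contrast would be negative.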

Participants and sample
Participants were 35 fourth- (n = 23) and fifth-grade (n = 12) students from a Title I school in the southeast United States. This sample of students consisted of 19 females and 16 males, with a mean age of 9 years and 6 months (SD = 0.57). The majority of the students were white (n = 26), with five identifying as black, two as Hispanic, and two as Mixed Race. School administration reported that the racial distribution of the sample approximates that of the school at large, and hence it is representative of this particular school. Furthermore, the educator who assisted in coordinating the study reported that, with respect to ability and achievement, the sample was typical of the school population.

Design and materials
This study employed an experimental pretest/posttest design. Students were randomized into one of two groups: a strategy training and extrinsic incentive group (n = 18) or a comparison group (n = 15) that completed tasks unrelated to the experiment (i.e. played educational games provided by the school on a tablet). Therefore, the monitoring and incentive condition was manipulated between-subjects, whereas occasion (pretest, posttest) was repeated within-subjects. Table 2 presents descriptive statistics and internal consistency reliability coefficients (Cronbach's α) for the performance variable at pretest and posttest for the sample, and Table 3 includes the zero-order correlations for pretest and posttest measures.
Performance and confidence judgments: Performance was assessed using a researcher-developed 20-item declarative knowledge test. Items covered a variety of domains that incorporated information on what students were learning at school at the time of the experiment. Sample items included, "What is the answer to 203-159?"; "The sixth planet from the sun is:"; and "What is the answer? 17 2 × 5?" All items had only one correct response and four available choices. Correct responses were dummy coded as 1 and incorrect responses as 0. This permitted a continuous raw score for performance, which was subsequently transformed to a proportion of correct responses (i.e. percent correct) to facilitate interpretation and to more logically compare confidence judgments to actual performance. Confidence judgments were collected locally (i.e. item-by-item) by asking participants the following question after each item, "How confident are you in your response to this item?" Participants provided confidence ratings on a continuous scale from 0 to 100. They were instructed that any value from 0 to 100 was valid and that values closer to 0 indicated less confidence, whereas values closer to 100 indicated greater confidence. Confidence ratings were then averaged across all items to produce a mean confidence rating.
Absolute calibration accuracy: Absolute accuracy scores were calculated by comparing participants' confidence in performance against their actual percent correct score on the assessment (i.e. the residual score approach). Raw scores were converted to a proportion and subtracted from the composite confidence in performance ratings. The absolute value of this discrepancy between students' self-reported level of confidence and actual performance was used as the measure of accuracy, which avoids negative values and facilitates interpretation. Comparing confidence in performance against actual performance yielded continuous, absolute calibration accuracy scores, as described by Schraw (2009). A score of "0" indicates perfect calibration; the higher the value, and thus the farther away from "0", the greater the miscalibration exhibited by the participant. In addition, the upper bound and lower bound signed scores were reported to gather information regarding participants' bias (i.e. underconfidence or overconfidence).
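The scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the author's scoring script: the function name, the dictionary keys, and the four-item example data are hypothetical, and the sketch assumes that percent correct is expressed on the same 0 to 100 scale as the confidence ratings before the two are compared.

```python
from statistics import mean


def score_participant(responses, answer_key, confidence_ratings):
    """Score one participant's test and confidence ratings.

    Returns percent correct, mean confidence, signed bias
    (confidence minus performance), and absolute accuracy.
    """
    if not (len(responses) == len(answer_key) == len(confidence_ratings)):
        raise ValueError("All three inputs must have the same number of items.")

    # Dummy-code each item: 1 = correct, 0 = incorrect.
    item_scores = [1 if r == k else 0 for r, k in zip(responses, answer_key)]

    # Assumption: express performance on a 0-100 scale to match confidence.
    percent_correct = 100 * sum(item_scores) / len(item_scores)

    # Item-by-item confidence ratings averaged into one composite rating.
    mean_confidence = mean(confidence_ratings)

    # Signed discrepancy: positive = overconfident, negative = underconfident.
    bias = mean_confidence - percent_correct

    # Absolute accuracy: 0 is perfect calibration, larger is worse.
    absolute_accuracy = abs(bias)

    return {
        "percent_correct": percent_correct,
        "mean_confidence": mean_confidence,
        "bias": bias,
        "absolute_accuracy": absolute_accuracy,
    }


# Hypothetical four-item example (the test in the study had 20 items).
result = score_participant(
    responses=["a", "c", "b", "d"],
    answer_key=["a", "b", "b", "d"],
    confidence_ratings=[90, 80, 100, 70],
)
print(result)
```

In this hypothetical case the participant answers three of four items correctly (75 percent) with a mean confidence of 85, giving a signed bias of +10 (overconfidence) and an absolute accuracy score of 10.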

Procedure
University IRB approval was secured before the commencement of data collection activities. The research was conducted across four one-hour sessions that occurred every Tuesday after school between 2:15 and 3:15 pm. Two separate data collection sessions occurred from September to December 2015 to recruit enough participants for an adequate sample size. Participants first completed child assent forms after receiving parent permission forms for voluntary participation at the beginning of Session 1. Next, students completed pretest measures, including the performance assessment with item-by-item confidence in performance ratings (i.e. confidence ratings were collected immediately after participants responded to each item). All data collection activities occurred online in a computer lab at the participating school via Qualtrics. Participants randomized to the experimental R³M condition were taken to a separate room and received an advance organizer activity in which they were provided with the strategies in Table 1 and the instructions for receiving the incentive for improved performance at posttest. These instructions were read aloud to the experimental group to obviate confusion, and students were permitted to ask questions if they did not understand any part of the instructions. The choice of candy as the incentive was based on the professional advice of three educators in the participating school. Finally, students received a summary of the strategy training to take place in Session 2 and were instructed to reflect on the seven strategies and how they apply to their learning. Students in the comparison condition played educational games on tablets (e.g. Math Genius, Chess, and/or History) unrelated to the present study in a separate room for the remainder of Session 1.
During Session 2, participants in the experimental condition received the strategy training. The training session involved direct instruction and individual practice in using strategies with scaffolded feedback in a face-to-face lecture format. First, participants were provided with a brief introduction to the goal of the session and an overview of the types of strategies that would be covered. Next, the author covered each of the strategies separately. For each strategy, students were provided direct instruction that included explaining the strategy, identifying when it is applicable, and modeling as well as scaffolding the strategy so that students perceived its value with respect to improved calibration accuracy. Subsequently, students were provided opportunities to apply and practice each strategy covered during the session using an expository text on emotions as well as an 18-item practice test. The researcher walked around during this application and practice portion to provide additional individual guidance where necessary. Students were afforded opportunities to ask questions and discuss strategies after they were introduced and modeled to clarify any misunderstandings. Students not receiving the monitoring instruction participated in an activity unrelated to the experiment in a separate room, namely playing educational games on a tablet provided by the school.
Session 3 involved providing participants in the experimental condition with a brief 10-min summary of the activities of Session 2 while the comparison condition participants played educational games on a tablet in a separate room. Next, all participants returned to the computer lab to complete the posttest measure, that is, the performance assessment in which confidence ratings were again completed immediately after responding to each item. After completion of the posttest, students were permitted to play educational games on a tablet. Finally, Session 4 involved providing the comparison group with the strategy training. Participants in the experimental condition who met the criteria for the incentive were given a candy of their choice by the educator who assisted in coordinating the study. Additional candy was given to school administration to distribute to all remaining participants after the conclusion of Session 4.

Strategy instruction intervention
The one-hour strategy training intervention was implemented by introducing each of the seven strategies in Table 1 sequentially. During the modeling and scaffolding portions of the training, specific examples were used to increase the utility and application of the strategy in authentic learning situations. For instance, when explaining the strategy on diagramming, participants were shown how this strategy could be applied when attempting to learn and understand the notions of division and multiplication by drawing squares or circles to understand the concept of breaking larger numbers into equal groups and remainders or compounding in multiplication. Another example used was how students could use highlighting and underlining effectively to facilitate reading comprehension in math word problems and when attempting to understand main ideas in texts. Students were permitted to ask questions for clarification and to practice the strategy with other concepts of their choice. This aided students in understanding the "when, where, and why" (i.e. conditional metacognitive knowledge) of strategy use in their everyday learning environment. At the end of the training, participants were instructed to reflect on how they could apply any of the strategies while learning at school, although these process data were not captured for subsequent data analysis.

Data analysis plan
Prior to data analysis, data were first screened for univariate outliers and evaluated against requisite statistical assumptions according to the procedures outlined by Tabachnick and Fidell (2013) via IBM SPSS version 22. No extreme outliers that would otherwise undermine the trustworthiness of the data were detected. A missing values analysis demonstrated that two cases (one from each group; 0.03%) had missing data. Systematic bias in the pattern of missing data could pose a problem to the trustworthiness and accuracy of the data, and hence the validity of the inferences and conclusions drawn from such data. Thus, Little's MCAR χ² was requested from the missing values analysis to ascertain whether the data were missing completely at random (MCAR; Little & Rubin, 1989; Schafer & Graham, 2002). A significant χ² (i.e. p < .05) would suggest that the data are not MCAR and may be missing not at random (MNAR), which poses a problem for interpretation of results because they may be biased due to systematic differences in non-responses. However, the result of this test was non-significant, p = .53, suggesting that the missingness pattern in the data was consistent with MCAR. Thus, all final analyses were conducted with 33 complete cases.
Data were also tested for univariate normality using histograms with the normal curve overlay and skewness and kurtosis statistics. Data approximated a normal distribution. Furthermore, data were evaluated for assumptions including multicollinearity (all correlations r < .85), homogeneity of variance (all Levene's test p-values > .05), and sphericity. All of the aforementioned assumptions were met, and thus data analysis proceeded without making any adjustments to the data.
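As a rough illustration of the skewness and kurtosis screening described above, the following Python sketch computes moment-based statistics for a list of scores. It is not the SPSS procedure used in the study: SPSS reports bias-adjusted estimators, whereas this sketch assumes the simpler population formulas, and the function names and example scores are hypothetical.

```python
from statistics import fmean


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = fmean(values)
    return fmean([(v - mean) ** order for v in values])


def skewness(values):
    """Moment-based skewness (assumption: no small-sample adjustment)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal curve scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical pretest scores, for illustration only.
scores = [6, 7, 8, 8, 9, 9, 9, 10, 10, 11, 12]
print(round(skewness(scores), 3), round(excess_kurtosis(scores), 3))
```

Values near zero on both statistics are consistent with an approximately normal distribution; the histogram check described in the text remains a useful complement in small samples.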
A series of 2 (experimental condition: monitoring training and incentive, comparison) × 2 (occasion: pretest, posttest) factorial mixed-model (between-subjects and within-subjects) analyses of variance (ANOVAs) were conducted to answer the research questions. In each of these analyses, performance, confidence judgments, and calibration accuracy served as the dependent variables separately. The Bonferroni adjustment to statistical significance was used to control for the family-wise Type I error rate inflation. All effect sizes for the factorial ANOVA results were reported as partial η² (ηp²). Cohen (1988) specified the following interpretive guidelines for ηp²: .010-.059 as small; .060-.139 as medium; and ≥ .140 as large.
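The interpretive guidelines above can be expressed as a small lookup. The Python sketch below is only an illustration: the function names are hypothetical, the label for values under .010 is an assumption (the guidelines as quoted do not name that range), and the number of comparisons passed to the Bonferroni helper is a placeholder because the text does not state how many comparisons were in each family.

```python
def classify_partial_eta_squared(value):
    """Label a partial eta squared value using the benchmarks quoted above."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("partial eta squared must lie between 0 and 1")
    if value >= 0.140:
        return "large"
    if value >= 0.060:
        return "medium"
    if value >= 0.010:
        return "small"
    return "below small"  # assumption: this range is not labeled in the text


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under the Bonferroni adjustment."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# Interaction effect sizes reported in the Results section.
for effect in (0.104, 0.115, 0.090):
    print(effect, classify_partial_eta_squared(effect))

# Hypothetical family of two comparisons at a nominal alpha of .05.
print(bonferroni_alpha(0.05, 2))
```

Under these benchmarks, the three interaction effects fall in the medium range.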

Results
Pearson's product moment correlation coefficients in Table 3 were all within range and in the theoretically expected direction; all correlations ranged from weak to strong and were statistically significant. Of special significance, absolute accuracy was significantly and inversely associated with performance at pretest, and descriptive statistics in Tables 2 and 4 show that participants' signed absolute bias scores tended to be positive, suggesting that students were generally overconfident in their confidence judgments. Moreover, this correlation was attenuated at posttest, indicating that students generally adjusted confidence ratings from pretest to posttest performance. Descriptive statistics by group are presented for all relevant variables in Table 4.
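To make the accuracy and bias indices concrete, the Python sketch below shows one common way to compute them from item-level data. The exact scoring formulas are not given in this section, so everything here is an assumption: confidence is taken to be rated on a 0-100 scale, each item is scored 1 (correct) or 0 (incorrect), absolute accuracy is the mean absolute discrepancy (lower is better calibrated), and bias is the mean signed discrepancy (positive indicates overconfidence). The function name and example ratings are hypothetical.

```python
def calibration_indices(confidence, correct):
    """Return (absolute_accuracy, bias) for one student's item-level data.

    confidence: per-item ratings on an assumed 0-100 scale.
    correct: per-item scores, 1 for correct and 0 for incorrect.
    """
    if len(confidence) != len(correct) or not confidence:
        raise ValueError("inputs must be non-empty and of equal length")
    # Rescale confidence to 0-1 so it is comparable with item scores.
    diffs = [c / 100.0 - p for c, p in zip(confidence, correct)]
    absolute_accuracy = sum(abs(d) for d in diffs) / len(diffs)
    bias = sum(diffs) / len(diffs)
    return absolute_accuracy, bias


# Hypothetical student: confident on every item, but only half are correct.
accuracy, bias = calibration_indices([90, 80, 100, 60], [1, 0, 1, 0])
print(round(accuracy, 3), round(bias, 3))
```

In this hypothetical case the positive bias reflects overconfidence, the pattern described for the sample as a whole.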

Establishing group equivalence at baseline
Prior to proceeding with data analyses, group equivalence was evaluated for all variables at baseline. Significant differences among groups on pretest measures would point to the need to control for those variables. However, a series of independent samples t-tests revealed no statistically significant differences between the experimental and comparison groups on any of the pretest variables (all p-values ≥ .31). Moreover, analyses between grades demonstrated that grade level had no significant effect on any of the outcome measures (all p-values ≥ .26). Therefore, all analyses proceeded as planned.
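A baseline equivalence check of this kind can be sketched as a pooled-variance independent-samples t statistic. The Python example below is a minimal illustration under stated assumptions: it uses the classic equal-variance form, the function name and scores are hypothetical, and it returns only t and its degrees of freedom, leaving the p-value to a t-distribution routine from a statistics package.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t(group_a, group_b):
    """Independent-samples t statistic assuming equal group variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (fmean(group_a) - fmean(group_b)) / standard_error
    return t, df


# Hypothetical pretest scores for the two groups, for illustration only.
experimental = [7, 8, 9, 10, 11]
comparison = [8, 8, 9, 10, 10]
t_value, df = pooled_t(experimental, comparison)
print(round(t_value, 3), df)
```

A t value close to zero, as in this hypothetical case, is what one would expect when randomization has produced comparable groups at pretest.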

Main analyses
Performance: Performance results revealed a significant experimental condition × occasion interaction, F(1, 31) = 3.98, p = .03, ηp² = .104 (see Figure 1). Neither main effect reached statistical significance (occasion, p = .10; experimental condition, p = .29). Simple contrasts with the Bonferroni adjustment for multiple comparisons revealed no significant differences between groups within either occasion, all p-values ≥ .90. Conversely, simple main effects with the Bonferroni adjustment for multiple comparisons indicated that the experimental group exhibited improved performance at posttest when compared to pretest performance, ηp² = .090, p = .04. No significant difference was found for the comparison group.
Confidence: Confidence in performance results indicated a statistically significant experimental condition × occasion interaction, F(1, 31) = 4.97, p = .03, ηp² = .115 (see Figure 2), as well as significant main effects for experimental condition, F(1, 31) = 4.71, p = .03, ηp² = .176, and occasion, F(1, 31) = 5.67, p = .03, ηp² = .184. Simple contrasts with the Bonferroni adjustment for multiple comparisons revealed that, at posttest, the comparison group reported significantly higher confidence when compared to the experimental group, ηp² = .353, p = .001. Simple main effects with the Bonferroni adjustment showed that confidence ratings for the comparison group were significantly higher at posttest than at pretest, ηp² = .23, p = .01. The main effect for occasion indicated that students' confidence increased from pretest to posttest, and the main effect for experimental condition indicated that confidence ratings were higher for the comparison group.
Absolute calibration accuracy: There was a statistically significant experimental condition × occasion interaction, F(1, 31) = 3.96, p = .03, ηp² = .090 (see Figure 3). There was also a significant main effect for experimental condition, F(1, 31) = 4.01, p = .03, ηp² = .089. The main effect for occasion did not reach statistical significance (p = .42). Simple contrasts with the Bonferroni adjustment to the p-value revealed that within occasion, the experimental group exhibited significantly greater absolute accuracy at posttest when compared to the comparison group, ηp² = .181, p = .03. Within experimental condition, simple main effects with the Bonferroni adjustment for multiple comparisons indicated that the experimental group improved absolute accuracy from pretest to posttest, ηp² = .067, p = .04, and that the comparison group showed no statistically significant change in accuracy (p = .99). The main effect of experimental condition showed that the experimental group exhibited more proficient absolute accuracy when compared to the comparison group.
Summary: Taken together, these results indicate that strategy training, combined with an extrinsic incentive, yields positive effects for a variety of critical learning outcomes. The compact one-hour strategy training regimen (R³M) presented here had practically meaningful effects on fourth- and fifth-grade students' performance, confidence, and absolute calibration accuracy. Of special significance, students randomized to the comparison group reported significantly higher confidence at posttest when compared to the experimental group, suggesting that they did not adjust confidence ratings to more accurately match performance (i.e. they remained miscalibrated). More importantly, the experimental group exhibited greater absolute accuracy at posttest than the comparison group due to improved performance and lower confidence at posttest when compared to the comparison group. This is in spite of the test-retest effect, which apparently had deleterious effects for the comparison group participants, who reported greater confidence at posttest. Although tentative because of the small sample and the two-group design, these findings highlight the utility of the experimental condition for improving metacognitive monitoring skills.

Discussion
The purpose of the present investigation was to evaluate the effects of compact strategy training and an extrinsic incentive on fourth- and fifth-grade students' performance, confidence judgments, and absolute calibration accuracy. It was predicted that students in the experimental condition would improve their performance, confidence judgments, and calibration accuracy as a function of the strategy training and incentive. The significant interaction for performance indicated that the experimental group showed significant gains in performance while the comparison group did not. This is in line with research conducted by Gutierrez and Schraw (2015) and Nietfeld and Schraw (2002), who found that strategy training improved students' performance (cf. Bol et al., 2005 regarding overt calibration practice). The present investigation found that calibration accuracy improved significantly for the experimental group but not the comparison group. This is congruent with previous research, which found a positive effect for strategy training on students' calibration (e.g. Gutierrez & Schraw, 2015; Azevedo & Cromley, 2004; Bol et al., 2012; Greene & Azevedo, 2010; Nietfeld & Schraw, 2002). Conversely, research by Bol et al. (2005) and Hacker et al. (2008) found that while training improved performance, it had no significant effect on calibration accuracy.

Absolute Accuracy Results
Findings on performance suggest that students benefited from the strategy training, which improved their performance and, by extension, enhanced their self-regulation of learning skills. Besides improving regulatory capacity, the incentive motivated students in the experimental group to improve performance from pretest to posttest. This positive effect of incentives on performance has been found in previous studies (e.g. Gutierrez & Schraw, 2015; Schraw et al., 1993; Tuckman, 1996; Yates, 1990). The fact that the experimental group adjusted confidence to more closely align with actual performance from pretest to posttest supports the hypothesis and is sensible, given what we know about SRL theory and metacognitive monitoring. The theory of SRL specifies that a proficient learner will be better able to monitor their comprehension (Boekaerts & Rozendaal, 2010; de Bruin & van Gog, 2012; Efklides, 2008). Likewise, those who are more proficient at monitoring their comprehension as they learn exhibit greater calibration accuracy. Therefore, students in the experimental group exhibited appropriate self-regulation via increased comprehension monitoring, and hence were able to adjust their confidence ratings to match more closely what they did and did not know, as reflected in actual performance. Presumably, this enhanced monitoring led students in the experimental group to align confidence at posttest more closely with actual performance than comparison group students did. In essence, the increased confidence judgments of comparison group students may be evidence of greater miscalibration when compared to those of the experimental group (see Gutierrez & Schraw, 2015).
Although strategy training has been found to enhance calibration accuracy, the influence of incentives on calibration has not been thoroughly explored. Gutierrez and Schraw (2015), however, found that an extrinsic incentive combined with strategy training targeting SRL skills improved calibration accuracy among adult learners. In the present sample of elementary school students, it may be that the study and test-taking strategies improved self-regulatory ability and comprehension monitoring skills while the extrinsic incentive provided students with the motivational edge necessary to apply the strategies appropriately. In essence, the strategy training may have provided students with the conditional knowledge needed to know when, where, and why to apply a strategy, whereas the incentive may have motivated students to actually apply the strategies, although this needs to be empirically examined in future research.

Implications for learning and instruction
The present investigation provided tentative support that essential learning outcomes such as self-regulated learning skills (e.g. performance, confidence judgments, and calibration accuracy) are malleable and trainable. Although the sample size was small, the present study showed the potential for educators of fourth- and fifth-grade students to adopt a strategy training regimen such as R³M that employs study and test-taking strategies to successfully enhance key monitoring skills, strategy use, and performance. R³M is appealing because it is a compact (one-hour training session) approach to improving students' performance, and by extension, metacognitive monitoring accuracy, making them more proficient at understanding what they know and do not know about a topic and constantly adjusting confidence judgments to more closely approximate these feelings of knowing and avoid illusions of knowing and illusions of not knowing (Serra & Metcalfe, 2009). Practitioners in particular can deliver R³M either as a compact one-hour training such as the one utilized in this study, or in segments, covering one strategy per day to provide a more in-depth modeling and scaffolding approach. Either approach can be easily incorporated in any lesson of the educator's choosing. Ultimately, educators should incorporate metacognitive monitoring instruction into their typical curricula to have longer, more sustained effects. This is important given that research such as that conducted by Nietfeld and Schraw (2002) showed that students' monitoring and performance reverted to baseline levels after a two-week delay. Use of an extrinsic incentive suggested that dispositional characteristics can affect strategy use, performance, confidence, and calibration accuracy. Therefore, educators should consider not only cognitive and metacognitive factors but motivational ones as well, and thus more closely align instructional practice to SRL theory. Finding ways to incentivize students to actually apply effective strategies and be flexible in strategy use as task demands fluctuate may be an effective way to improve learning outcomes.

Methodological reflections and limitations
No research endeavor involving human participants is ever without limitations. Although this study found significant and robust practical effects (i.e. moderate-to-large effect sizes given the sample size) of strategy training and an extrinsic incentive on the outcomes of interest, especially calibration accuracy, the sample size was small. Moreover, because of time and other logistical constraints, it was not possible to implement a four-group design, as described in the Avenues for Future Research section, and hence it was not possible to isolate the individual effects of monitoring training and the extrinsic incentive. A four-group design would have provided a clearer, more nuanced explanation of the relations among these variables. In spite of these limitations, however, this research study was a randomized experiment in a repeated measures context, which increases the validity of the inferences and conclusions drawn from the findings. In addition, this study involved children, namely fourth- and fifth-grade students, and hence extends the extant calibration literature in this understudied population.

Avenues for future research
The present investigation indicated that a compact strategy training regimen that underscores enhancement of test-taking and study skills combined with an extrinsic incentive enhances self-regulatory skills such as those examined here (e.g. increased calibration accuracy). While the R³M module contains domain-general strategies focusing on learners' test-taking and study strategies, it is not presently clear whether these strategies perform equally effectively across academic domains or among other spans of development. Thus, future research should explore the invariance of these strategies across multiple domains and among adolescents. Also, additional research with children is warranted to ascertain the stability of the results of the present investigation across multiple samples of this population, especially studies using more robust sample sizes. In addition, process data regarding actual application of strategies in everyday learning environments would help researchers better tailor interventions and help researchers and educators understand under what situations students actually apply (or do not apply) strategies for more effective learning. Moreover, future research should replicate these findings in more authentic, ecologically valid settings such as actual classrooms. Further, the role incentives play in influencing these learning outcomes should be examined. For instance, future research should investigate whether intrinsically incentivizing students is more, equally, or less effective than using extrinsic incentives. Such a study would help researchers and educators compare and contrast intrinsic and extrinsic motivators. Due to the nature of conducting research in the schools, it was not logistically feasible to implement an experiment with four groups (monitoring training and incentive; monitoring training only; incentive only; and comparison) to disentangle the effects of the combined condition and the individual manipulations. Thus, future research should implement this four-group design to better understand how these manipulations work in tandem and alone in this population of children.

Conclusion
Our main task as educators of learners with mixed ability, achievement, and dispositions is to find the most effective means to teach critical content while maintaining fidelity to the individual differences inherent in our learners. The utility of R³M is that it uses domain-general strategies targeting essential self-regulated learning skills such as study and test-taking strategies, which may be beneficial for diverse learners. The present inquiry showed that this regimen of strategies is effective in improving learners' performance, confidence in performance, and calibration accuracy. Hill, Bloom, Black, and Lipsey (2008) discuss the importance of using empirical benchmarks for a more realistic approach to interpreting effect sizes, which allows for cross-study comparisons. The effect sizes (ηp²) observed in this study, which ranged from 0.090 to 0.115, were in the medium range, and were comparable to those in other strategy training interventions for the improvement of metacognitive monitoring (e.g. Bol et al., 2005; Gutierrez & Schraw, 2015; Hacker et al., 2008; Nietfeld & Schraw, 2002), albeit these studies were with college students. Moreover, findings pointed to the need to consider cognitive, metacognitive, and motivational characteristics, as use of an extrinsic incentive was found to enhance the aforementioned outcomes as well. Because it is a relatively short strategy instructional intervention, it may be appealing to educators in classroom settings to assist students in becoming more self-regulated learners.