Predicting GPAs with Executive Functioning Assessed by Teachers and by Adolescents Themselves

Executive functions (EFs) show promise as important mediators of adolescent academic performance. However, the expense of measuring EFs accurately has restricted most field-based research on them to smaller, non-longitudinal studies of homogeneous populations with specific diagnoses. We therefore monitored the development of 259 diverse, at-risk students' EFs as they progressed from 6th through 12th grade. Teachers completed the Behavior Rating Inventory of Executive Function (BRIEF) for a random subset of their students. At the same time, those same students completed the Behavior Rating Inventory of Executive Function-Self-Report (BRIEF-SR) about themselves; teachers generally reported stronger EFs in students than students reported in themselves. Results further indicated that both BRIEF and BRIEF-SR Global Executive Composite (GEC) scores, measures of overall executive functioning, significantly predicted overall GPAs beyond what was already predicted by students' gender, IEP status, and eligibility for free/reduced-price school lunch. BRIEF (teacher) scores were better predictors and contributed more to predictive accuracy than BRIEF-SR (student) scores; BRIEF scores even added predictiveness to a model already containing BRIEF-SR scores, while the reverse did not hold. This study provides evidence for the valid use of BRIEF and BRIEF-SR GEC scores to predict middle and high school GPAs, thereby supporting practitioners' use of them for this purpose within similar, diverse, at-risk populations. The study also illuminates some of the development of EFs in this population during adolescence.

The BRIEF and BRIEF-SR were constructed to measure two general areas of EF: metacognition and behavioral regulation (Gioia et al., 2000; Guy et al., 2004), each of which comprises further subscales. Exploratory factor analyses of the eight subscale divisions of the parent and teacher forms of the BRIEF showed the same two-factor solution in both normal controls and specific clinical subjects (Gioia et al., 2000). The metacognition and behavioral regulation areas can be combined to create an overall Global Executive Composite (GEC) score. As operationalized by the BRIEF and BRIEF-SR, metacognition includes the "ability to initiate, plan, organize, and sustain future-oriented problem solving in working memory" (Gioia et al., 2000, p. 20).
Behavioral regulation involves the "ability to shift cognitive set and modulate emotions and behavior via appropriate inhibitory control" while allowing "metacognitive processes to successfully guide active, systematic problem solving (and supports) appropriate self-regulation" (Gioia et al., 2000, p. 20). Clearly the functions subsumed by these general areas inter-relate, justifying the creation of the BRIEF and BRIEF-SR as instruments that encompass both domains of executive functioning.
Some evidence for the valid use of these instruments in academic settings is proffered by Langberg, Dvorsky, and Evans (2013), who investigated academic outcomes among ~100 adolescents diagnosed with ADHD. They used the parent and teacher versions of the BRIEF and found that teacher-rated scores on the Plan/Organize subscale of the BRIEF significantly contributed to the prediction of these students' overall grade point averages (GPAs) beyond that made by the number of parent-reported ADHD symptoms. Although limited to students diagnosed with ADHD, Langberg, Dvorsky, and Evans' study is among the few to use these instruments to study EFs and academics among adolescents, in contrast to the larger amount of research conducted among children (e.g., Clark, Pritchard, & Woodward, 2010; Locascio, Mahone, Eason, & Cutting, 2010; Waber et al., 2006). Best et al. (2011) investigated the relationships between EFs and academic achievement among a sample of over 2,000 children and adolescents using the Planning scale of the Cognitive Assessment System (Naglieri & Das, 1997); they found that EFs were moderately correlated with success in both math and reading achievement. Boschloo, Krabbendam, Aben, de Groot, and Jolles (2014), however, did not find a significant relationship between some subscores on a Dutch version of the BRIEF-SR and grades in Dutch, English, and mathematics; nor did they find that grades were predicted by behavioral measures of EFs from the Delis-Kaplan Executive Function System.
The BRIEF-SR has been used less frequently than the BRIEF in research. It may be that studies like that reported by Boschloo et al. (2014) represent similar null findings that others find and do not publish, or that adolescents' insights into their own EFs remain an understudied area. Adolescents have been found to be able to rate their own behaviors accurately (Wichstrøm, 1995); nonetheless, individuals of many ages who are still developing an ability are often not good at rating themselves on that ability (Dunning, Johnson, Ehrlinger, & Kruger, 2003), and the ability to monitor aspects of one's own performance is itself an EF, so adolescents who are still developing the ability to monitor their own behaviors may not be able to rate themselves accurately. One of the goals of the present study is to investigate the relationship of the BRIEF and BRIEF-SR in predicting academic performance, comparing them against each other and conducting an initial foray into the role of self-monitoring in the predictive validity of the BRIEF-SR.

Assessment of Academic Success through GPA
In addition to strongly predicting future grades, middle school grades are among the best predictors of high school graduation (Lohmeier & Raad, 2012) and performance on standardized exams such as the Stanford Test of Basic Skills (Wentzel, 1993). High school grades predict college grades better than SAT scores (Geiser & Santelices, 2007).
Despite possible concerns with bias and generalizability, GPA remains a common and well-predictive variable that provides complementary, non-redundant information for predicting students' future academic success. We also believe that GPA is under-valued while standardized scores are sometimes over-valued.

Goals and Hypotheses
The primary goals of the current study are to (1) investigate the predictive validity of the BRIEF for experimental uses in schools by analyzing the contribution of BRIEF GEC scores to predictions of academic performance among at-risk adolescents from 6th to 12th grade, (2) investigate the predictive validity of the BRIEF-SR for experimental uses in schools by analyzing the contribution of BRIEF-SR GEC scores to predictions of these same outcomes, and (3) directly compare the contributions of the BRIEF with those of the BRIEF-SR as experimental tools. The secondary goal of the study is to investigate changes in EFs over these years.
We hypothesized that BRIEF GEC scores would show valid uses in middle and high school by predicting academic performance well. We also hypothesized that the valid predictive use of BRIEF-SR GEC scores here may not be as well supported (i.e., will not predict academic performance as well as BRIEF GEC scores) given the equivocal findings on the use of the BRIEF-SR in academic settings outlined earlier. We further hypothesized that EFs would improve, although the extent of their improvements may be affected by students' demographics.

Method

Participants
All of the 259 participating students attended the same charter school located in New York City. The school is designed to serve mainly at-risk students by providing them with an enriched environment that prepares them for future academic success, including preparation for college. The ages of the participating students ranged from 9 to 18 years (mean = 13.45, SD = 2.65). About 85% of the school's students are eligible for either free (68%) or reduced-price (17%) school lunches. Many of the students come from minority ethnicities: 32% self-identify as Hispanic; among the non-Hispanic students, 42% identify as African-American, 8% identify as Asian-American, and 17% identify as European-American. Finally, 40% of the students have diagnosed disabilities.
Students who participated in the study were enrolled in grades 6 through 12. Students contributed data for each year they were enrolled at this school; students who left the school or graduated ceased being measured; 38 (14.7%) of the students left the school before the end of the study. Those who left early did not significantly differ from those who stayed in terms of overall mean GPA (t(36.0) = 1.42, p = .16), BRIEF GEC scores (t(37.6) = -1.07, p = .29), or BRIEF-SR GEC scores (t(50.31) = 0.40, p = .69).
The data analyzed included EF scores and GPAs from the current academic year as well as available EF scores and GPAs from all previous years. School lunch status, Individualized Education Program (IEP) status, and gender were all considered to be fixed terms here.
Forty-six teachers participated in this study by completing the BRIEF for students in their classes.
The students whom the teachers were asked to rate were selected at random, within constraints to balance the effect of teachers' course-content expertise. The constraints ensured that teachers' content areas were equally sampled (thus reducing and equating any effect of a given teacher on both GPA and BRIEF scores) and that each teacher reported on an equal number of students (thus equating any effect of within-rater variance). In addition, none of the teachers rated the students more than once: Each year, a different set of teachers rated the students, further limiting the effect of any one teacher on both BRIEF scores and GPA.
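The constrained random selection described above can be sketched as follows; the subject areas, group sizes, and identifiers are illustrative assumptions, not details of the actual procedure.

```python
# Sketch of the constrained random selection described above: students
# are randomly assigned to raters so that each subject area contributes
# equally and each teacher rates the same number of students.
# Names and group sizes are illustrative assumptions.
import random

random.seed(0)
students = [f"s{i:03d}" for i in range(60)]
teachers_by_subject = {
    "ELA": ["t1", "t2"],
    "math": ["t3", "t4"],
    "science": ["t5", "t6"],
}

# Equal number of teachers per subject keeps content areas balanced.
teachers = [t for ts in teachers_by_subject.values() for t in ts]
per_teacher = len(students) // len(teachers)  # equal load per rater

random.shuffle(students)
assignment = {
    t: students[i * per_teacher:(i + 1) * per_teacher]
    for i, t in enumerate(teachers)
}
```

Because the shuffle happens before the equal-sized slices are taken, every teacher rates the same number of randomly chosen students, and no student is rated twice in a year.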
Neither the identities of the students nor those of the rating teachers were disclosed to the researchers. Students' ages, gender, eligibility for free/reduced-price school lunches, and IEP status were provided by the school.

BRIEF

The Behavior Rating Inventory of Executive Function (BRIEF) yields a Metacognition Index and a Behavioral Regulation Index, which are added together to create a Global Executive Composite (GEC) score, which offers an overall measure of EFs. We will focus on the GEC since individual executive functions may develop at different rates during adolescence (Best & Miller, 2010), to facilitate comparisons with the BRIEF-SR, and to create a manageable set of analyses.

BRIEF-SR
The Behavior Rating Inventory of Executive Function-Self-Report (BRIEF-SR) was created to allow adolescents to report on their own EF-related behaviors; like the BRIEF, its scales combine to create an overall Global Executive Composite (GEC) score.

Academic Performance
Academic performance is operationalized here as annual cumulative GPA in core courses, viz., English/language arts (ELA), mathematics, science, social studies, and Spanish. GPAs were computed for grades 6-12, the grade levels investigated in this study. Note that the teacher who rated a given student through the BRIEF also taught one of that student's courses and therefore contributed one of the five grades that comprise that student's GPA for that year (but for no other year). The teachers taught various subjects, and the subjects taught were balanced across teachers, so this potential bias was distributed across all five courses.

Procedure
The BRIEF was distributed to the participating teachers by the school administration within two weeks of the end of every academic year for five consecutive years. The teachers used the BRIEF to rate a predetermined, randomly selected subset of their students within one week of receiving the instrument, as described in the Participants section, above.
The students were all administered the BRIEF-SR, and completed it, on the same day that the teachers were initially given the BRIEF; absent students completed it on the day they returned to school. With institutional and school IRB permission, all of these data were linked, anonymized, and given to the authors for analysis.

Analyses
Hypotheses were primarily tested through the series of nested and partially nested multilevel models of change reported here. We assessed whether BRIEF and/or BRIEF-SR GEC scores made significant contributions to predictions of total GPA by comparing differences in how well the models fit the data with and without BRIEF/BRIEF-SR scores added to them. We also added EF-Score x Time interaction terms to the models; these interaction terms test whether the influence of EF on GPAs changes over time.
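As a rough sketch of this model-comparison strategy (using synthetic data and Python's statsmodels rather than the authors' R code; the variable names, effect sizes, and the single random intercept per student are illustrative assumptions):

```python
# Sketch of the nested model comparison described above. The data are
# synthetic; variable names (gpa, ef, time, student) and the single
# random intercept per student are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_students, n_waves = 120, 4
student = np.repeat(np.arange(n_students), n_waves)
time = np.tile(np.arange(n_waves, dtype=float), n_students)
ef = rng.normal(size=student.size)  # standardized EF (GEC) scores
ranef = np.repeat(rng.normal(scale=0.5, size=n_students), n_waves)
gpa = 0.3 * time - 0.5 * ef + ranef + rng.normal(scale=0.5, size=student.size)
df = pd.DataFrame({"student": student, "time": time, "ef": ef, "gpa": gpa})

# Fit by maximum likelihood (reml=False) so that models differing in
# their fixed effects can be compared on deviance (-2 log-likelihood).
base = smf.mixedlm("gpa ~ time", df, groups="student").fit(reml=False)
full = smf.mixedlm("gpa ~ time + ef + ef:time", df, groups="student").fit(reml=False)

# Drop in deviance when the EF term and EF x time interaction are added.
delta_dev = -2 * base.llf + 2 * full.llf
print(delta_dev > 0)
```

In R, this would correspond to fitting nested models with maximum likelihood (e.g., lme4's lmer with REML = FALSE) and comparing them with anova().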
It is worth noting at this point that we use the term "prediction" in the statistical sense of using known information (viz., EF scores and demographic information) to infer unknown information (viz., overall GPA). That said, the EF-score term establishes the y-intercept, using initial EF scores to infer information about future GPAs, and therefore also addresses in part the more traditional use of the term "prediction" in that we are using prior scores to test for later scores. Nonetheless, we did not manipulate either EFs or GPAs: Although we can test predictive relationships, we cannot test for causal relationships between EFs and GPAs.
We compared the fits of models to the data using deviance statistics: -2 log-likelihoods (-2LLs) for comparisons between models using the same data and Bayesian Information Criteria (BICs) for comparisons of models that did not use the exact same data (i.e., the fit of the model containing BRIEF scores vs. that of the model containing BRIEF-SR scores).
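A minimal numeric illustration of these two criteria (the numbers here are invented for the example, not taken from the study):

```python
# Minimal illustration of the two fit criteria described above.
# All numbers are invented for the example, not taken from the study.
from math import exp, log

# Nested models fitted to the same data: test the drop in deviance
# (-2LL) against a chi-square distribution with df = number of added
# parameters. For df = 2, the chi-square survival function is exp(-x/2).
dev_reduced, dev_full = 512.4, 498.0
p = exp(-(dev_reduced - dev_full) / 2)

# Models fitted to (slightly) different data: compare BICs instead.
# BIC = -2LL + k * ln(n), where k counts parameters and n observations;
# lower BIC indicates a better fit after penalizing complexity.
def bic(minus2ll, k, n):
    return minus2ll + k * log(n)

print(round(p, 4), round(bic(dev_full, k=8, n=900), 1))  # → 0.0007 552.4
```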
The multilevel models of change used here can easily accommodate instances where some time-varying data are missing for some participants, e.g., if a student does not have an EF score for a given year (Singer & Willett, 2003). However, differences in deviance statistics can only be validly analyzed when those statistics are computed from the exact same data set.
For completeness, then, we also assessed whether using the subset of the whole data set that contained only complete data for each participant appreciably affected the results. It did not: The term values of the models were not meaningfully different between analyses conducted with the entire set of data and analyses conducted with only data with no missing values. We therefore proceeded with the comparisons reported herein.
BRIEF and BRIEF-SR scores and GPAs were standardized. Time was measured as the number of days between a student's tenth birthday and the completion of the BRIEF; these ages were then also standardized. Data were analyzed using R, version
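In code, this variable coding might look like the following sketch (column names and dates are illustrative assumptions; the study itself used R):

```python
# Sketch of the variable coding described above: time as days since each
# student's tenth birthday, then z-standardization of scores and ages.
# Column names and dates are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({
    "tenth_birthday": pd.to_datetime(["2010-03-01", "2009-07-15"]),
    "brief_completed": pd.to_datetime(["2021-06-10", "2021-06-10"]),
    "brief_gec": [61.0, 48.0],
})

# Days between the tenth birthday and completion of the BRIEF.
df["age_days"] = (df["brief_completed"] - df["tenth_birthday"]).dt.days

# z-standardize scores and ages (mean 0, SD 1).
for col in ["brief_gec", "age_days"]:
    df[col + "_z"] = (df[col] - df[col].mean()) / df[col].std()
```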

Descriptive Statistics
The teachers reported knowing the student they were rating for an average of 12.47 (SD = 6.74) months. In addition, only 5.52% of the teachers indicated that they knew the given student they were rating "not well," while 49.11% indicated they knew that student "moderately well" and 45.37% indicated they knew that student "very well." Table 1 presents the number (and percent) of female and male students with and without IEPs. The mean GPAs, BRIEF GEC scores, and BRIEF-SR GEC scores for females and males are presented in Table 2.
In general, teachers tended to rate a student's EFs as stronger (via lower BRIEF scores) than students rated their own EFs (via higher BRIEF-SR scores).

Summary of Main Findings
Both boys and students with an IEP tended to have lower GPAs; being eligible for free/reduced-price school lunches was not a strong predictor among this nearly uniformly poor sample. When we then considered the role of EF-related behaviors, we found that they made a very strong contribution to our predictions of GPAs beyond that made by both gender and IEP status, regardless of whether the frequency of EF-related behaviors was reported by a student's teacher or by the student her/himself. Nonetheless, gender and IEP status remained significant predictors of GPAs even when the strongly predictive terms for EF-related behaviors were added.
Although EF-related behaviors strongly added to our predictions of GPAs, the information we gained from asking teachers about these behaviors was not entirely redundant with the information gained when we asked the students themselves: teacher-generated information was a stronger predictor than student-generated information, but non-EF-related terms remained stronger when we considered only student-generated information. When we considered both teacher- and student-generated information about EF-related behaviors, that generated by teachers tended to overshadow that generated by students. The relationship between teacher- and student-generated scores is considered further next.

GEC Correlations
Supporting the analyses comparing model fits, the correlation between the mean BRIEF and BRIEF-SR GEC scores for each student across all waves was .38 (p < .001). This is somewhat higher than the correlation between these two scores found by Guy et al. (2004), who found the correlation in a stratified sample of 148 adolescents to be .25. In their meta-analysis of a wide range of psychological studies, Achenbach, McConaughy, and Howell (1987) found that the average correlation between a teacher's ratings of students on a given scale and a student's self-ratings on a similar scale was .20.
Although we therefore found a relatively good correlation between students' and teachers' ratings, there is certainly room for BRIEF and BRIEF-SR GEC scores to make unique contributions. The extent to which BRIEF and BRIEF-SR GEC scores can both add to predictions of GPAs is tested in Model 4.
Adding both BRIEF- and BRIEF-SR-related terms (Model 4) indeed makes for a significantly better-fitting model than using only BRIEF-SR-related GEC terms (Δ-2LL = 81.2). However, using both BRIEF and BRIEF-SR GEC scores does not make for a significantly better-fitting model than using only BRIEF-related GEC terms (Δ-2LL = 3.8).
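Assuming, purely for illustration, that each comparison adds two parameters (a GEC term and its interaction with time), these two deviance differences can be checked against the chi-square distribution, whose df = 2 survival function is simply exp(-x/2):

```python
# Chi-square check of the two deviance differences reported above,
# assuming (as an illustration) two added parameters per comparison,
# e.g., a GEC term plus its interaction with time.
from math import exp

p_add_brief = exp(-81.2 / 2)    # adding BRIEF terms to the BRIEF-SR model
p_add_briefsr = exp(-3.8 / 2)   # adding BRIEF-SR terms to the BRIEF model

print(p_add_brief < .001, round(p_add_briefsr, 2))  # → True 0.15
```

Under this assumption the first difference is significant well beyond p < .001 while the second falls far short of p = .05, matching the pattern reported above.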

BRIEF and BRIEF-SR Subscore Correlations
Tables 5 and 6 present the correlations between the BRIEF and BRIEF-SR subscores, respectively. These tables show that the correlations between the subscores within an instrument are all rather high for field-based, social-science research (lowest r = .37).
Nonetheless, the correlations between the subscores on the BRIEF are all higher than the correlations between the subscores on the BRIEF-SR: The lowest correlation between BRIEF subscores is .84 whereas the highest correlation between BRIEF-SR subscores is .78. The predictiveness of a score is limited by the correlations between its components (Nunnally & Bernstein, 1994), so the relatively lower correlations between the BRIEF-SR subscores likely contribute to the lower predictiveness of BRIEF-SR GEC scores.
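This point can be made precise with two classical-test-theory formulas (our illustration of Nunnally and Bernstein's point, not equations from the source): for a composite of $k$ subscores with average inter-correlation $\bar{r}$, the composite's reliability (coefficient alpha) and the classical bound reliability places on any validity coefficient are

```latex
% Reliability of a k-part composite (coefficient alpha in its
% average-inter-correlation form) and the classical upper bound
% that the reliabilities r_xx and r_yy place on a validity r_xy.
\[
  \alpha = \frac{k\,\bar{r}}{1 + (k - 1)\,\bar{r}},
  \qquad
  r_{xy} \le \sqrt{r_{xx}\, r_{yy}}
\]
```

With $k = 8$ subscores, for example, an average inter-correlation of .84 gives $\alpha \approx .98$, whereas .60 gives $\alpha \approx .92$; since a score's correlation with GPA cannot exceed the square root of its reliability, the lower inter-correlations among the BRIEF-SR subscores cap its predictiveness.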
This study therefore provides evidence for the valid use of these instruments to predict overall GPA, supporting their use within similar at-risk populations.
However, although we found that knowing the BRIEF or BRIEF-SR score assigned to a given student can reliably predict that student's current or future GPA, we cannot infer from our findings whether EFs indeed cause students to have a given GPA. The study design does not allow us to rule out whether GPA causes EFs to attain a given level or whether both are in fact determined by one or more unmeasured moderating variables. Those authors included a smaller sample than we used here, so this difference in results is likely due to the fewer degrees of freedom they had for these other, demographic terms. Given the inter-relatedness of these EF scores with demographic factors, we also strongly advocate including demographic factors, both for the analytic clarity and for the theoretical importance of this inter-relatedness. Although we were not able to measure additional factors such as IQ and parental education, we echo Jacob and Parkinson's (2015) recommendation that they be included whenever possible as well.
It is worth noting that since the teachers both rated a subset of the students once (over the course of this study) on the BRIEF and contributed a portion of those students' GPAs, there is likely a small but non-negligible relationship between teacher-generated BRIEF scores and student GPA. This, of course, is much less likely to affect the relationship between student-generated BRIEF-SR scores and GPA. These results then also provide insights into the relationship between executive functions and GPAs with possible controls on any bias borne from the respondent: The BRIEF-SR-GPA relationship provides controls on any covariance from the teacher, while the BRIEF-GPA relationship provides controls for a student's still-developing self-awareness. Together, then, they help provide the beginning of a more rounded and nuanced perspective that supports the relationship between executive functions and academic performance as measured through overall GPA.

Limitations
As discussed by Jacob and Parkinson (2015), IQ and EF are known to be highly correlated, although not synonymous. However, we were not able to measure IQ. The measure of academic success employed here is GPA, which is assigned by the students' teachers.
Although GPA strongly predicts future grades (Lohmeier & Raad, 2012) and performance on standardized exams (Wentzel, 1993), we should bear in mind that such a measure generally assesses both academic achievement and behavior, and the two cannot be disentangled (Jacob & Parkinson, 2015).
Nonetheless, we chose to use GPAs to test the validity of the instruments, since GPAs are ubiquitously and heavily relied upon to monitor students' academic development.
In addition, the teachers who rated the students' EFs also taught one of the five courses that comprised the GPA. Therefore, 20% of a student's GPA was computed from a class taught by the same teacher who rated that student, although which course that was varied and was balanced across all of the students.
Practitioners who do not wish to tolerate any bias in EF ratings introduced by teachers, but who nonetheless wish to benefit from the efficiency of this way of measuring EFs, could rely instead on students' self-ratings.