Using Artificial Societies to Understand the Impact of Teacher Student Match on Academic Performance

This paper presents an agent-based model of the standard U.S. K-12 classroom using NetLogo. By creating an artificial society, we identify the causal implications of the same-race effect (a moderate-sized academic boost to students whose teachers are of the same race) for national educational achievement trends. The model predicts sizeable achievement gaps at the national level, consistent in size with those documented by the US National Report Card (NAEP), stemming from moderate-sized same-race effects. In addition, matching effects are found to be a source of increased heterogeneity in academic performance for the minority group. These results hold for all teacher-student matching phenomena and have implications for educational policy at the aggregate level. Using artificial societies to disentangle the aggregate effects of hypothesized causes of the achievement gap is a promising strategy that merits further research.

One of the most salient US educational policy problems is the amelioration of the Black-White gap in student achievement (Braun, Chapman, & Vezzu 2010). As documented by the National Assessment of Educational Progress (NAEP) long-term study, the average Black-White educational gap from 1990-2008 ranges from 0.72 to 0.83 (Cohen's d) for the various grades assessed (Planty, Hussar, & Snyder 2009). The Black-White gap starts in early childhood and cannot be explained by differences in family or test characteristics (Brooks-Gunn, Klebanov, Smith, Duncan, & Kyunghee 2003).

1.2
Attempts to explain the Black-White academic performance gap include attributing it to differences in the academic engagement of students, differences in their social capital, differences in their cultural capital, differences in the quality of schools, and race-contingent treatment of students by teachers. In a nationally representative study, Oates (2009) tested these five hypotheses simultaneously on the NELS data and found that race-contingent treatment and quality of education were more powerful predictors of Black-White disparities than academic engagement or differences in students' social and cultural capital. Research on race-contingent treatment of Black students by White teachers has shown that teachers are more likely to view and evaluate children of a different race differently (Alexander, Entwisle, & Thompson 1987; Downey & Pribesh 2004; Ehrenberg, Goldhaber, & Brewer 1995; Muller 1998; Oates 2003).

2.1
There are two interrelated but distinct agent-based modeling literatures in the field of education: (a) pedagogical articles describing how students learn about complex systems and how agent-based models, often built in NetLogo (Wilensky 1999), facilitate such learning (Goldstone 2006; Wilensky & Resnick 1999), and (b) research articles that use agent-based modeling as a tool to understand or evaluate educational policy. This brief review focuses on the second literature, which is still in an early stage of development.

2.2
Henrickson (2003) conducted a feasibility study to determine whether agent-based modeling (programmed in C++) could produce results substantially similar to those reported in McDonough's (1994) empirical study of college choice. She developed her agent-based model using social capital theory and cultural reproduction theory (Bourdieu 1984). The model used rich input data from the Cooperative Institutional Research Program 1994 Freshman Survey, administered by the Higher Education Research Institute. Her study used the input data and the model to generate an output dataset similar to the one used in empirical research. Regression on the agent-based model output data yielded coefficient estimates with the correct signs and of the same order of magnitude as those obtained by logistic regression analysis in the original empirical work.

2.3
Maroulis (2009) used a complex systems perspective and agent-based models to analyze two areas of educational reform: why program reforms do not take hold, and public school choice. His first model explained how institutional inertia emerges in a school where all individual teachers seek to improve. In his second model, he studied school choice and showed that, paradoxically, if parents pick schools solely on achievement, the highest-achieving school will not emerge as the sole winner of the school choice mechanism, even in the absence of capacity or mobility constraints. This result occurs because the migration of low-performing students into the highest-performing school brings its aggregate performance down and generates unexpected dynamics.

2.4
Maroulis has continued his work on agent-based models of school choice to better understand the impact of school choice in Chicago public schools (Maroulis, Bakshy, Gomez, & Wilensky 2010; Maroulis et al. 2010). These models have become increasingly realistic by adding geographic features of the Chicago area public schools as well as capacity constraints.

2.5
In sum, the use of agent-based models to research educational policy is still in its infancy, but the models show promise for explaining paradoxes and well-established educational phenomena that are not well understood.

Purpose of the research

3.1
In educational policy research, we face a fundamental macro-micro divide. In our case, we wanted to know whether we could provide evidence that micro-level same-race teacher effects can cause achievement gaps at the national macro level. Because there are myriad confounding factors in the real world and large-scale randomized trials are unlikely, it is difficult to establish this cause-and-effect claim with any certainty. Even the best empirical study (Oates 2009) can only establish correlation. The correlational approach presents difficulties because correlational studies multiply rather than reduce the number of hypotheses for any observed phenomenon. When a correlational study discovers a relationship between a variable and an outcome, the new variable lengthens the list of potential causes for the outcome (for example, recent studies have correlated the Black-White gap with parenting (Mandara, Varner, Greene, & Richman 2009) and family wealth (Yeung & Conley 2008)). In this paper, we propose an alternative approach that uses the NetLogo environment to simulate a small but relevant aspect of the classroom environment, create an artificial society where only one factor is manipulated, and estimate the results for the whole population in a manner similar to the US national report card (the NAEP, National Assessment of Educational Progress).

3.2
By creating an artificial world where all Black and White students have identical ability and identical personal and familial resources, and where education is delivered by teachers of identical quality, we can investigate whether the existence of a same-race teacher-student effect is enough to generate the widely reported Black-White educational gap. In sum, the purposes of this study were (1) to determine whether, in an artificial society that simulates an idealized randomized trial, the same-race teacher effect is capable of generating a Black-White achievement gap similar in size (Cohen's d within 1/10 standard deviation) to those reported by the US National Report Card long-term study, and (2) to understand the relationship between the same-race teacher effect and the size and timing of the gap.

Methods
Description of the model

4.1
The model was constructed in NetLogo, an agent-based simulation software package. We created two sets of agents: students (represented by the human figure) and teachers (represented by the house figure). All students are in the same grade and form a cohort. Students and teachers were randomly assigned to minority (shown in blue) and majority (shown in yellow) status using the calibration parameters described in the next section. Two concentric circular displays were chosen to facilitate the visualization of classrooms, with teachers in the inner circle and students in the outer circle. Students remain in place in an intact class and are randomly assigned to a new teacher each year. Student assignment is represented graphically by a line segment (link) between teacher and student. If there is a race match the segment is green; if not, it is red. Thus, a wedge of green lines represents a classroom where teacher and students are of the same race (typically White). A mostly red wedge identifies a Black teacher teaching mostly White students, while the occasional red line indicates either a Black pupil with a White teacher or a White pupil with a Black teacher. Figure 1 shows the display. After teachers are randomly assigned to students, the program determines whether there has been a same-race match, colors the link appropriately, and calculates the academic production function for each student.

4.3
In the economics of education literature (Hanushek & Welch 2006), educational production is modeled with a value-added linear function in which current achievement is a function of past achievement, a constant teacher effect, the same-race effect, and a normally distributed random disturbance: Achievement(t) = Achievement(t-1) + teacher effect + same-race effect + disturbance. Initially, achievement is measured in some suitable absolute metric (e.g., number of correct responses) at the end of the academic year. The starting achievement point is zero for all students. The constant teacher effect can be thought of as the overall quality of the teacher, as is commonly done in educational value-added formulations. In this paper, the teacher effect is assumed to be constant so that all teachers are of identical quality and do not confound the investigation of same-race effects. The normally distributed, unbiased random error models other factors that affect achievement in this artificial environment. Thus, the educational production function specifies that in this artificial world students' achievement accumulates by adding the homogeneous teacher effect and the same-race effect. As is typically done in education and psychology, both the teacher effect and the same-race effect are measured in the Cohen's d effect size metric (standard deviations from the mean). Because there are no other systematic causes of achievement in the artificial environment and all teachers are of identical quality, any resulting achievement gaps are necessarily caused by the same-race effect.
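As a minimal sketch of the value-added production function described in this paragraph, the yearly update for a single student could be written as follows. The sketch is in Python rather than NetLogo, and the function and parameter names are illustrative assumptions, not taken from the authors' code.

```python
import random


def next_achievement(previous, teacher_effect, same_race_effect, matched,
                     error_sd=1.0, rng=random):
    """One-year update of absolute achievement for one student.

    Achievement(t) = Achievement(t-1) + teacher effect
                     + same-race effect (only when matched) + disturbance
    All names here are hypothetical; effects are in standard-deviation units.
    """
    disturbance = rng.gauss(0.0, error_sd)            # unbiased normal error
    bonus = same_race_effect if matched else 0.0      # boost only on a race match
    return previous + teacher_effect + bonus + disturbance


# Every student starts at zero; accumulate 13 years (K-12) for one student
# who happens to be matched with a same-race teacher in every year.
achievement = 0.0
for _ in range(13):
    achievement = next_achievement(achievement, teacher_effect=0.0,
                                   same_race_effect=0.25, matched=True)
```

Setting `error_sd=0.0` removes the disturbance and makes the accumulation deterministic, which is convenient for checking the arithmetic by hand.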

4.4
Next, student achievement is normalized by expressing relative achievement in z-scores (subtracting the mean and dividing by the standard deviation). Thus, the average relative achievement is always zero for each year and the associated standard deviation is 1. The relative achievement score indicates how many standard deviations a particular student's achievement lies from the mean for the current grade, and it simulates performance on the relative metric of standardized testing. This process is then repeated for 13 years (K-12).

(Planty, Provasnik, Hussar, Snyder, & Kena 2007). Thus, on average over the 13 years (K-12), 7.15% of teachers self-identified as Black.

4.9
In our model, we simulate a world where only Black and White students and teachers exist. Based on the percentages above, Black students were set at 15.3% and Black teachers at 7.15%. Because our program uses the normal distribution to pick the race of students and teachers at random, we set the normal distribution cutoffs at -1.02365 and -1.46471, respectively.

4.10
The student/teacher ratio was 15.8 for regular public schools in 2006 (Planty et al. 2009). Thus, we set the student/teacher ratio at 16 in our simulations.

4.11
In summary, our simulations have 800 students and 50 teachers. In a typical run, 4 teachers will be Black and 122 students will be Black.
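The calibration values above can be checked with a short Python sketch (the model itself is written in NetLogo). It recovers the normal-distribution cutoffs from the stated shares with the inverse normal CDF and computes the expected head counts. The helper name `race_cutoff` is an illustrative assumption.

```python
from statistics import NormalDist


def race_cutoff(minority_share):
    """Standard-normal cutoff below which a random draw is assigned minority status."""
    return NormalDist().inv_cdf(minority_share)


BLACK_STUDENT_SHARE = 0.153    # 15.3% of students
BLACK_TEACHER_SHARE = 0.0715   # 7.15% of teachers

student_cutoff = race_cutoff(BLACK_STUDENT_SHARE)   # close to -1.02365
teacher_cutoff = race_cutoff(BLACK_TEACHER_SHARE)   # close to -1.46471

n_students, n_teachers = 800, 50
expected_black_students = round(n_students * BLACK_STUDENT_SHARE)   # 122
expected_black_teachers = round(n_teachers * BLACK_TEACHER_SHARE)   # 4
```

Because race is drawn at random, the realized counts in any single run will vary around these expected values.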

http://jasss.soc.surrey.ac.uk/15/4/8.html

Benchmarks for the outcomes

4.12
We used the NAEP Data Explorer to obtain estimates of the average reading scores and standard deviations for White and Black students from 1990 to 2008. We calculated the standardized gap by subtracting the average score of Black students from the average score of White students and dividing by the standard deviation of White students. The average standardized Black-White gap in reading scores at age 13 was 0.73 standard deviations. The corresponding estimates were 0.81 at age 9 and 0.73 at age 17. The standard deviations of reading scores were quite comparable for all age groups, with Black students showing slightly more variable scores at ages 9 and 13 and similar variability at age 17.
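The benchmark calculation described in this paragraph amounts to the following small Python function. The input numbers in the usage line are hypothetical placeholders chosen only to illustrate the arithmetic; they are not NAEP values.

```python
def standardized_gap(white_mean, black_mean, white_sd):
    """White-minus-Black difference in means, in units of the White standard deviation."""
    if white_sd <= 0:
        raise ValueError("white_sd must be positive")
    return (white_mean - black_mean) / white_sd


# Hypothetical scale scores, for illustration only (not taken from NAEP):
example_gap = standardized_gap(white_mean=260.0, black_mean=232.0, white_sd=35.0)
```

Dividing by the White standard deviation, rather than a pooled one, follows the procedure stated above for the benchmarks.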

Simulation

4.13
We ran 100 simulations for each value of the same-race effect, ranging from a minimal effect size of 0.01 to a sizeable (for educational interventions) effect size of 0.40 at intervals of 0.01, for a complete set of 4000 runs (40 effect sizes times 100 runs). The general teacher effect, T, was set at zero to simplify the presentation of results. We ran other simulations with various positive T values and obtained substantially the same results.
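The experimental design can be illustrated with a compact Python re-sketch of the model (the authors' implementation is in NetLogo and is not reproduced here). Several details are assumptions made for illustration: each intact class of 16 receives exactly one teacher per year through a random reshuffle, z-scores are computed only at the end of the final year (the yearly normalization does not feed back into accumulation in this sketch), and all function and parameter names are hypothetical.

```python
import random
from statistics import NormalDist, mean, pstdev


def run_once(same_race_effect, n_teachers=50, class_size=16, years=13,
             teacher_effect=0.0, error_sd=1.0,
             black_student_share=0.153, black_teacher_share=0.0715, seed=None):
    """Return the final-year White-minus-Black gap in z-score units for one run."""
    rng = random.Random(seed)
    student_cut = NormalDist().inv_cdf(black_student_share)
    teacher_cut = NormalDist().inv_cdf(black_teacher_share)
    # True marks minority (Black) status, drawn with the normal-cutoff rule.
    teacher_black = [rng.gauss(0, 1) < teacher_cut for _ in range(n_teachers)]
    student_black = [[rng.gauss(0, 1) < student_cut for _ in range(class_size)]
                     for _ in range(n_teachers)]
    scores = [[0.0] * class_size for _ in range(n_teachers)]
    teachers = list(range(n_teachers))
    for _ in range(years):
        rng.shuffle(teachers)  # assumption: one teacher per intact class, reshuffled yearly
        for room, teacher in enumerate(teachers):
            for seat, black in enumerate(student_black[room]):
                bonus = same_race_effect if black == teacher_black[teacher] else 0.0
                scores[room][seat] += teacher_effect + bonus + rng.gauss(0, error_sd)
    flat_scores = [s for row in scores for s in row]
    flat_black = [b for row in student_black for b in row]
    mu, sd = mean(flat_scores), pstdev(flat_scores)
    z = [(s - mu) / sd for s in flat_scores]          # relative achievement
    black_z = [v for v, b in zip(z, flat_black) if b]
    white_z = [v for v, b in zip(z, flat_black) if not b]
    if not black_z or not white_z:
        raise ValueError("run produced a single-race cohort; gap is undefined")
    return mean(white_z) - mean(black_z)


def sweep(effects, runs, base_seed=0):
    """Average relative gap for each effect size over `runs` independent runs."""
    return {e: mean([run_once(e, seed=base_seed + 1000 * i + r) for r in range(runs)])
            for i, e in enumerate(effects)}
```

Calling `sweep([i / 100 for i in range(1, 41)], runs=100)` mirrors the 40-by-100 grid described above; a smaller grid such as `sweep([0.1, 0.25, 0.4], runs=10)` is enough to see the pattern quickly.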

Results
5.1
Figure 2 shows White and Black absolute achievement by size of the same-race effect at the end of the final year (grade 12). The data show that the same-race effect size is linearly and positively related to White absolute achievement. The relationship between Black absolute achievement and the same-race teacher effect is also linear but very weak. Note also that Black children have a much higher dispersion. This effect is produced by the greater variability in the probability of matching with a same-race teacher, compared to that of White children.

5.2
Figure 3 shows the 12th grade relative achievement of Black and White students by same-race teacher effect size. Because White children are the majority of the sample, they are de facto the normative group; thus their relative achievement is close to the overall mean of the score, arbitrarily set at zero. In relative performance measures, Black children at the end of 12th grade therefore appear substantially below average, with their relative achievement decreasing linearly as the same-race teacher effect size increases. In this relative metric, for a small same-race effect size of 0.10, we expect Black children who were initially comparable to White children to score about ¼ standard deviation below the mean by 12th grade. For a larger same-race effect size of 0.40, we expect Black children to score a full standard deviation below the mean, even though Black and White children have identical initial achievement or ability and equally effective teachers.

5.3
Figure 4 shows the 12th grade relative achievement gap by same-race teacher effect. This chart is a summary representation of the gap between the White and Black trends shown in Figure 2. There is a well-fitting linear relationship between same-race effect size and the size of the relative achievement gap (Gap = 0.02 + 2.84 × same-race teacher effect; R² = 0.93; the slope coefficient was statistically significant, p < .05). The relationship can be described by the concept of a multiplier between the same-race teacher effect and the cumulative impact on the relative achievement gap by 12th grade. Our estimate is that the multiplier is 2.8, indicating that same-race effects result in roughly triple-sized cumulative relative gaps by the end of 12th grade. For example, for a small effect size of 0.1 we expect a 12th grade relative gap of around 1/3 of a standard deviation; for an initial effect size of 0.2 we expect a gap of 3/5 (60%) of a standard deviation. In summary, same-race teacher effects result in effects on the relative performance gap of roughly 3 times their size.

Figure 4. Relative Black-White 12th grade achievement gap by size of same race effect

5.4
Figure 5 plots the relative achievement gap by year for a same-race teacher effect of 0.35. As we can see, the first year (year 0, Kindergarten) has the largest increase in the relative gap, with a size comparable to the teacher effect size. Each subsequent year adds to the gap but at a decreasing rate (a logarithmic curve) until, by the end of 12th grade, the average gap is the initial effect size times the multiplier. Thus, although the effect of same-race teacher matches accumulates over time, on a population basis most of it occurs in elementary school.

5.5
We plotted similar figures for other effect sizes and found that they all belong to the same family of logarithmic curves, starting at zero, jumping to the same-race teacher effect at the end of Kindergarten, and ending at the multiplier times the initial effect size.
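The decelerating shape of these curves can be illustrated with a back-of-envelope approximation in Python. This is not the authors' NetLogo model and not a result from the paper: it assumes that a student's yearly match is an independent draw (a Black student matches with probability equal to the Black teacher share, a White student with the complementary probability), that the cohort is large, and that the relative gap is the expected absolute gap divided by the overall standard deviation. All names are hypothetical.

```python
from math import sqrt


def approx_relative_gap(effect, years, error_sd=1.0,
                        black_student_share=0.153, black_teacher_share=0.0715):
    """Approximate White-minus-Black z-score gap after `years` completed years."""
    if years <= 0:
        return 0.0
    q, w = black_teacher_share, black_student_share
    # Expected difference in accumulated same-race bonuses between the groups.
    absolute_gap = years * effect * (1.0 - 2.0 * q)
    # Variance within each group: accumulated noise plus match-count variability.
    within_var = years * (error_sd ** 2 + effect ** 2 * q * (1.0 - q))
    # Variance contributed by the difference between the two group means.
    between_var = w * (1.0 - w) * absolute_gap ** 2
    return absolute_gap / sqrt(within_var + between_var)


# Gap path for a same-race effect of 0.35 over 13 completed years.
gaps = [approx_relative_gap(0.35, year) for year in range(0, 14)]
```

Under these assumptions the gap after one year is about 0.30 and after 13 years about 1.0, with each year adding less than the one before. The accumulated bonus grows in proportion to the number of years while the accumulated noise grows with its square root, which produces a concave curve of the kind described above.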

5.6
In summary, the conclusions of this simulation are as follows. In a world where Black and White students have identical ability and identical personal and familial resources, and where education is delivered by teachers of identical quality but there exists a same-race teacher effect, (1) that effect alone is sufficient to create substantial Black-White relative achievement gaps of about three times the size of the same-race effect; (2) it results in Black students being below average on a population basis and White students being slightly above average; (3) it results in greater variability of achievement among Black students than among White students; and (4) large and increasing racial gaps start in the early grades, particularly in the earliest grade with full participation (e.g., Kindergarten or 1st grade, depending on the state).

5.7
The NAEP long-term average Black-White achievement gap from 1990-2008 ranges from 0.72 to 0.83 for various age groups. A same-race teacher effect of 0.25 generates these benchmarks within 1/10 of a standard deviation. Thus, a relatively small same-race effect could cause a Black-White performance gap of similar size to those reported by the long-term NAEP study.
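The comparison with the benchmarks can be reproduced from the fitted line reported in the Results (Gap = 0.02 + 2.84 × same-race teacher effect). The short Python sketch below only evaluates that reported regression; the function name is an illustrative assumption.

```python
def predicted_gap(effect, intercept=0.02, slope=2.84):
    """12th grade relative gap predicted by the reported linear fit."""
    return intercept + slope * effect


NAEP_LOW, NAEP_HIGH = 0.72, 0.83    # long-term benchmark range quoted above

gap_at_025 = predicted_gap(0.25)    # 0.02 + 2.84 * 0.25 = 0.73
within_tenth = (abs(gap_at_025 - NAEP_LOW) <= 0.1 + 1e-9 and
                abs(gap_at_025 - NAEP_HIGH) <= 0.1 + 1e-9)
```

The same function gives about 0.30 for an effect of 0.1 and about 0.59 for an effect of 0.2, matching the examples given in the Results.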

5.8
Our simulation showed that the same-race effect should also result in greater variability of performance scores for Black students at any grade. This pattern was confirmed by the data only for ages 9 and 13. At age 17, scores of Black and White students were equally variable in the long-term NAEP average for 1990-2008. Yet in our artificial society no students dropped out, while the age 17 NAEP is calculated on survivors only. Disproportionate dropout by Black students would substantially reduce their standard deviation, compared to that of White students. Thus our simulation results are generally consistent with the evidence.

Discussion

6.1
We have created an artificial world where same-race teacher effects are the only possible cause of increased achievement. In such a world, the presence of relatively small same-race teacher effects yields substantial relative Black-White achievement gaps of the size observed in the NAEP data. Because we have complete control of the artificial environment, we can attribute cause and effect, and thus we have proven the following proposition: relatively small same-race teacher effects can cause measurable racial gaps in achievement as measured by standardized testing.

6.2
Although constructing an artificial world has the great advantage of complete control, it can never establish that the phenomenon occurs in the real world. Thus, we have not proven that same-race effects exist, although there is correlational support for their existence (Alexander et al. 1987; Muller 1998; Oates 2009). What we have shown is that under ideal conditions, the same-race effect reported in the literature can generate Black-White achievement gaps comparable in size to those reported by the NAEP. Thus, we conclude that same-race effects may very well be a cause, although not likely the sole cause, of the Black-White achievement gap.

6.3
Importantly, we have demonstrated that relatively small same-race effects lead to much larger Black-White aggregate gaps in the absence of any institutional disparity other than the same-race effect. Scholars have suggested that same-race effects may occur through the practice of tracking (i.e., giving different placements to children of different races) (Oates 2003), but in this paper we show that same-race effects accumulate into large Black-White aggregate gaps even if students are never tracked at any point in their academic careers. In our artificial society, each student is given a different, randomly assigned teacher each year. There is neither tracking nor any communication between past and current teachers, nor any impact of student race on school policy (Hawley & Nieto 2010), and yet the simple fact that a Black student is less likely than a typical White peer to have a teacher of his own race results in the aggregate Black-White gap. Certainly, if in addition to a same-race effect we add further mechanisms, such as communication between teachers (e.g., sharing of expectations) or tracking of students into groups of homogeneous ability as perceived by teachers, we would expect the gap to widen. This more advanced model, however, would benefit from the simple model presented in this paper as a benchmark case.

6.4
The model presented in this paper has proven a larger proposition, namely, that any matching process that confers an advantage on children matched to particular teachers will accumulate and result in sizeable gaps in relative achievement measures by the matching factor. This process occurs whenever groups have a different likelihood of matching, and it results in a multiplier effect of the matching effect. The multiplier effect depends on the probability of matching, determined by the percentage of the student and teacher population who have the minority condition, as well as the error of the process. This would apply to any matching effect, such as matching by socio-economic status, gender, or any other characteristic empirically shown to confer benefits on the matched recipient.
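How the multiplier might depend on the matching probabilities and on the error of the process can be illustrated with a first-order approximation in Python. This is a simplification added for illustration, not a formula from the paper: it assumes independent yearly matches and a small matching effect, so that the spread of scores is dominated by the accumulated random error. The function and parameter names are hypothetical.

```python
from math import sqrt


def first_order_multiplier(years, p_match_majority, p_match_minority, error_sd=1.0):
    """Approximate ratio of the final relative gap to the yearly matching effect.

    The expected difference in accumulated bonuses grows with `years`, while
    the accumulated error standard deviation grows with sqrt(years).
    """
    return sqrt(years) * (p_match_majority - p_match_minority) / error_sd


# Calibration used in this paper: 7.15% of teachers are Black, so a White
# student matches with probability 0.9285 and a Black student with 0.0715.
race_multiplier = first_order_multiplier(13, 0.9285, 0.0715)

# Equal matching probabilities produce no gap; halving the error doubles it.
no_gap = first_order_multiplier(13, 0.5, 0.5)
sharper = first_order_multiplier(13, 0.9285, 0.0715, error_sd=0.5)
```

With the calibration above the approximation gives a multiplier of about 3.1, in the neighborhood of the simulated estimate of 2.8. It overstates the simulated value somewhat because it ignores the extra score variance that the matching effect itself introduces.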

Limitations

6.5
The model has a number of limitations. The only free, non-calibrated parameter is the standard deviation of the error term in the educational production function. We set it at 1 standard deviation (the standard convention for z-scores), indicating substantial variability among students. Good measures of the real variability could be obtained from randomized experiments, but were unavailable to us. Variations in the standard deviation of the error term do not qualitatively change any of our results and are inversely related to the multiplier effect. Smaller standard deviations facilitate the visualization of model results.

Possible expansions of the model

6.6
The model was created to be expandable and to adopt many other features, yet in this first paper we use the model in a very simple way: assuming random assignment of teachers and no peer effects, as a way to model the results one would obtain through an ideal randomized trial. We are quite cognizant that the current model does not exploit the capabilities of NetLogo to model educational outcomes in society. We are working on expansions of the model that include the effect of de facto racial segregation in US public schools and the presence of reinforcing peer effects, to determine how these more realistic conditions affect the results presented in this paper, which simulates an idealized randomized trial. As mentioned above, the addition of tracking practices to the model may also change the results and deserves careful exploration. Additionally, the same-race effect may be moderated by teacher variables such as self-efficacy or teacher preparation, and such moderation could be added to the existing model (Hines & Kritsonis 2010; Kukla-Acevedo 2009). The incorporation of these variables into the model will necessitate augmenting the value-added theoretical framework with richer theoretical orientations that explain how teachers and students relate to each other. Other possible extensions, suggested by the adoption of value-added models as key components of teacher evaluation in Race to the Top legislation, include exploring how teachers' value-added estimates will be affected by same-race effects. In a reanalysis of the Project STAR experimental data, Konstantopoulos (2009) found that teacher effects interacted with the race of the student in some grades but not others, and it would be interesting to model such effects in our artificial society.

Implications

6.7
Methodologically, we have shown that the creation of artificial worlds can aid our understanding of the educational process by providing an environment in which variables can be changed one at a time in a systematic manner. In doing so, the model yields corollary, empirically testable findings that were not immediately obvious before simulation and are not reported in the empirical literature (e.g., greater variability in minority achievement, increased gap by age). In addition, it allows us to conceptually disentangle hypotheses often confused in the literature (e.g., same-race effect and tracking).

6.8
Substantively, we have shown that same-race effects are capable of causing sizeable racial educational disparities at the national level and thus deserve careful empirical measurement and additional research. Generally, our model shows that any student-teacher matching effect will result in (a) an achievement gap by the matching characteristic, and (b) increased achievement heterogeneity for the students with the characteristic that has the lower probability of matching. Because a multiplier effect will amplify initially small matching effects into larger achievement gaps, a better understanding of matching effects is important for designing sound educational policy, including Race to the Top teacher accountability systems.

6.9
A procedure to determine the impact of any possible matching factor is to (a) conduct small-scale randomized trials on the matching effect to obtain unconfounded effect sizes for both the matching effect and the standard deviations in academic tests, and (b) once this information is known, construct an artificial society to determine the likely impact of the matching factor on aggregate educational data trends. As the societies are perfected, educational policy can first be modeled in the artificial society to understand its likely impact on key outcomes.

;; Used code from the standard NetLogo library to create charts
to setup-graph
  set-current-plot "Relative Achievement"
  set-current-plot-pen "axis"
  ;; we don't want the "auto-plot" feature to cause the
  ;; plot's x range to grow when we draw the axis. so
  ;; first we turn auto-plot off temporarily
  auto-plot-off
  ;; now we draw an axis by drawing a line from the origin...
  plotxy 0 0
  ;; ...to a point that's way, way, way off to the right.
  plotxy 1000000000 0
  ;; now that we're done drawing the axis, we can turn
  ;; auto-plot back on again
  auto-plot-on
  set-current-plot "Absolute Achievement"
  set-current-plot-pen "axis"
  ;; we don't want the "auto-plot" feature to cause the
  ;; plot's x range to grow when we draw the axis. so
  ;; first we turn auto-plot off temporarily
  auto-plot-off
  ;; now we draw an axis by drawing a line from the origin...
  plotxy 0 0
  ;; ...to a point that's way, way, way off to the right.
  plotxy 1000000000 0
  ;; now that we're done drawing the axis, we can turn
  ;; auto-plot back on again
  auto-plot-on
  set-current-plot "Relative Achievement by Number of Matches"
end

to graph
  set-current-plot "Relative Achievement"
  set-current-plot-pen "Minorities"