A Cluster Randomized Controlled Trial of Establish-Maintain-Restore Among High School Teachers and Students

A large body of research shows that when students have close, trusting relationships with their teachers, they are more likely to exhibit a host of other positive social, emotional, and academic outcomes (Quin, 2017; Roorda et al., 2011). Meta-analyses conducted with students ranging from preschool through high school found student–teacher relationships (STR) to be associated with a number of student outcomes, including engagement, sense of belonging, self-esteem, and social skills, in addition to improved behavior, higher academic achievement, lower suspension rates, and reduced risk of dropout (Quin, 2017; Roorda et al., 2011). These associations have been documented cross-sectionally and longitudinally, with some studies showing effects up to four years later (e.g., Valiente et al., 2019). Further, a recent meta-analysis of the intervention literature found that universal programs aimed at improving STR have an average effect size of d = 0.26 on relationship quality, with some programs having substantially larger effects (Kincade et al., 2020). Notably, 20 of the 21 included studies were conducted with elementary school students, and only one was conducted in middle school. None were conducted with high school students, leaving a significant gap in the STR literature. In addition to the lack of intervention research at the high school level, there are notable disparities in STR for racially minoritized students (Hughes et al., 2005; Thijs et al., 2012), warranting programs that are equity-explicit (Gregory et al., 2016). Given this, the purpose of this study was to conduct a randomized pilot trial of an existing STR program, adapted to be equity-explicit, with ninth-grade students transitioning into high school.

STR in High School

At the transition to high school, academic engagement and performance drop, and risk of dropout increases, for many students (Kennelly & Monrad, 2007). Not surprisingly, this transition poses the greatest risk for students with existing risk factors and those from minoritized racial/ethnic groups (EPE Research Center, 2006; Wheelock & Miao, 2005). The difficulties students exhibit at this transition reflect the multitude of challenges facing entering 9th graders. The typical high school is larger and more bureaucratic than the typical middle school, which can lead to a sense of depersonalization and a lack of community (Lee & Smith, 2001). Classrooms are more socially comparative and competitive (Roeser et al., 2002), and there are increased demands to prepare for college or career (Williamson et al., 2015). Students at the transition, especially those who eventually drop out, report liking school less, feeling less connected to school and teachers, and generally feeling unmotivated and disengaged (Gruman et al., 2008). At a time when teacher support may be especially helpful, students report spending less time with teachers and feeling poorly supported and monitored by adults in their school (Barber & Olsen, 2004). Yet how students adjust to the high school transition has significant implications for their long-term academic success (Archambault et al., 2009). Because of these challenges, interventions may be particularly powerful at this critical juncture. As Hertzog and Morgan (1999) noted, “students will decide during the first few weeks of their freshman year if they intend to be engaged in high school” (p. 27). Well-timed interventions during this sensitive period can trigger a series of reciprocally reinforcing positive interactions between youth and their social system (Yeager & Walton, 2011).

Despite these risks, there is a dearth of STR interventions appropriate for high school students. This gap mirrors the broader social, emotional, and behavioral research literature in education, in which the vast majority of studies focus on the elementary and middle school levels (e.g., Durlak et al., 2011). In addition, the dominant paradigm in STR research is grounded in attachment theory and conceptualizes STR quality along three dimensions: closeness, conflict, and dependency (Pianta et al., 2002). These aspects of STR are theoretically most important to development in early and middle childhood (O’Connor et al., 2011; Pianta & Nimetz, 1991). Compared to younger students, high school students have better-developed self-regulatory (Raffaelli et al., 2005) and social perspective-taking skills (Diazgranados et al., 2016), and they rely less on dyadic interactions with teachers to co-regulate and moderate conflict in the classroom (Kurki et al., 2016). Thus, dependency in the STR may be less relevant for adolescents (Koomen & Jellesma, 2015). In contrast, constructs such as control, democratic socialization, maturity, autonomy support, competence, and relatedness may be more relevant for high school students (Brinkworth et al., 2018).

STR for Minoritized Racial/Ethnic Groups

Although the associations between STR and student outcomes have been established across demographic groups and contexts, positive STR may be especially protective for minoritized youth (McHugh et al., 2013). Yet teachers are less likely to have positive relationships with students whose racial background differs from their own (Thijs et al., 2012). Many scholars have attributed these racial disparities to a cultural mismatch between students and teachers, compounded by a lack of effective professional development (Milner, 2005). More than 80% of US teachers are White and female, a mismatch with the increasingly diverse student population (Howard, 2016). Additionally, many teacher candidates have limited exposure to cultural diversity, are skeptical that racism exists, and are concerned that discussing race may be harmful (Gay, 2010; Kyles & Olafson, 2008). These dispositions contribute to “color-blind” approaches and low efficacy for integrating race and culture into teaching (Gay, 2010; Kyles & Olafson, 2008).

The racial disparities in STR have consequences for students’ school outcomes. The quality of the relationship predicts teachers’ perceptions of student academic ability, which may contribute to racial/ethnic differences in academic outcomes (Kellow & Jones, 2008; van den Bergh et al., 2010). STR represent one point in a recursive cycle that exaggerates student differences over time. Students who receive more positive attention demonstrate better behavior and improved academics over time, which in turn enhances teachers’ perception of their behavior and capability (Wang & Eccles, 2012).

Such findings suggest that any STR intervention needs to be equity-explicit. Fortunately, a number of equity-explicit interventions are emerging, primarily targeting classroom management practices with the aim of reducing racial/ethnic disparities in discipline referrals. For instance, Greet-Stop-Prompt (GSP) trains teachers to recognize how implicit biases may impact their responses to student misbehavior and to regulate their behavior to prevent exclusionary discipline. GSP was found to be effective at reducing relative risk ratios of office referrals for Black boys (Cook, Duong, et al., 2018a). Similarly, Double Check aims to improve teachers’ culturally responsive classroom practices by having teachers examine how their own identity and cultural background interact with their students’ identities and impact their responses to student behavior. Double Check has led to improved teacher classroom management practices and student behavior (Bradshaw et al., 2018). In sum, training teachers to recognize and mitigate implicit bias is a promising avenue for improving the equity of teacher-delivered interventions.

Previous Research with EMR

In previous research on Establish-Maintain-Restore (EMR), we tested the intervention’s effects among elementary and middle school teachers and students in two randomized controlled trials. In both trials, we found that EMR improved STR, observer-rated disruptive behavior, and academically engaged time, with large effect sizes (ds ranging from 0.61 to 1.32 relative to a control group) (references removed for blinding). However, EMR’s developmental appropriateness and effectiveness for high school students have not been determined. Further, our previous study showed that, although student race/ethnicity did not moderate EMR effects, there were significant differences in STR quality between White and non-White students at baseline that persisted after the intervention (reference removed for blinding) (Duong et al., 2019; Gaias et al., 2020). This suggests that an explicit equity component may be needed to enhance the intervention’s ability to disrupt existing disparities in STR. We previously developed an equity-explicit version of EMR (E-EMR). A pre-post pilot study of E-EMR in one high school demonstrated promising results for reducing disparities for students of color and for improving equity in teacher relationship practices (reference removed for blinding). However, the effects of E-EMR on student outcomes, and on racial/ethnic disparities in these outcomes, need to be examined more rigorously.

The Current Study

This study reports the results of a randomized controlled pilot trial testing the impact of E-EMR on student social, emotional, and behavioral outcomes, including STR, school belonging, academic motivation, and engagement. Pilot trials play an essential role in intervention science: they assess feasibility and allow investigators to detect signals of possible effects that can be tested in more robust, larger-scale efficacy trials. In addition to examining the main effects of E-EMR, we examine for whom E-EMR is most effective by testing whether students’ race/ethnicity and baseline scores moderate its effects.

Method

Setting and Participants

Participants included 94 teachers and 412 students recruited from 6 public high schools in the Pacific Northwest region of the USA. A nearest neighbor approach was used to block schools into pairs based on similarity in school size, percent of the student body who identified as Asian, Latinx, Black, Multiracial, and Other (including American Indian/Alaskan Native), and percent of the student body eligible for free or reduced-price lunch (FRL). We randomly assigned schools within pairs to the intervention or waitlist control condition. The high schools served a relatively diverse student population, both racially (M = 52.3% non-White) and socioeconomically (M = 27.5% receiving FRL).
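The blocking and within-pair randomization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ procedure: the text says only that a nearest neighbor approach on school size, racial/ethnic composition, and FRL was used, so the greedy pairing rule, the z-score standardization, Euclidean distance, and all school covariate values below are hypothetical.

```python
import itertools
import math
import random
import statistics

# Hypothetical covariates per school (illustrative values, NOT study data):
# [enrollment, % Asian, % Latinx, % Black, % Multiracial, % Other, % FRL]
SCHOOLS = {
    "School A": [1500, 20, 15, 5, 10, 3, 25],
    "School B": [1550, 21, 14, 5, 11, 3, 26],
    "School C": [900, 8, 25, 10, 7, 4, 45],
    "School D": [950, 9, 24, 11, 8, 4, 44],
    "School E": [2000, 12, 8, 3, 9, 2, 12],
    "School F": [2050, 13, 9, 3, 9, 2, 13],
}


def standardize(rows):
    """Convert each covariate column to z-scores so enrollment does not dominate."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def greedy_pairs(schools):
    """Repeatedly pair the two closest unpaired schools (Euclidean distance on z-scores)."""
    names = sorted(schools)
    z = dict(zip(names, standardize([schools[n] for n in names])))
    unpaired = set(names)
    pairs = []
    while len(unpaired) > 1:
        a, b = min(
            itertools.combinations(sorted(unpaired), 2),
            key=lambda p: math.dist(z[p[0]], z[p[1]]),
        )
        pairs.append((a, b))
        unpaired -= {a, b}
    return pairs


def assign_within_pairs(pairs, seed=None):
    """Randomly assign one school per pair to E-EMR and the other to waitlist control."""
    rng = random.Random(seed)
    assignment = {}
    for pair in pairs:
        treated = rng.choice(pair)
        for school in pair:
            assignment[school] = "E-EMR" if school == treated else "waitlist control"
    return assignment


pairs = greedy_pairs(SCHOOLS)
print(pairs)
print(assign_within_pairs(pairs, seed=2019))
```

A greedy rule is the simplest nearest-neighbor variant; with only six schools, an exhaustive search over all 15 possible pairings would also be cheap and guarantees the smallest total distance.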

Teacher Participants

Within both intervention and control schools, all 9th grade teachers were recruited to participate. To be eligible, teachers had to (a) hold a general education certification and (b) have class rosters consisting of at least 50% 9th grade students. Of the 104 eligible teachers, 94 (90.4%) agreed to participate. Table 1 summarizes teacher-level demographics. Teachers were primarily White (80.9%), with a mean age of 36.6 years (SD = 10.14) and an average of 9.3 years (SD = 7.98) of teaching experience.

Table 1 Participant demographic characteristics

Student Participants

A random subset of students was selected for recruitment from class rosters provided by schools. For each participating teacher, we aimed to recruit five students from their classroom roster. Because we were interested in the impact of E-EMR on equity, we oversampled non-White students. To prevent cross-classification (i.e., a student being linked to more than one participating teacher), students already selected for one teacher were removed from the next teacher’s roster. Of 470 students recruited, 412 (87.7%) agreed to participate. Table 1 summarizes student demographics. The largest groups of students self-identified as White (29.6%), Multiracial (23.1%), Asian (14.5%), or Latinx (13.7%); fewer identified as Black (6.3%) or Other (5.1%). In comparison, general enrollment in the participating schools, based on public records, was 53.4% White, 15.8% Latinx, 15.8% Asian, 8.0% Multiracial, and 5.4% Black. Chi-square tests and t-tests examined baseline equivalence of the recruited sample across conditions. No significant differences were found on any demographic characteristic except eligibility for FRL, which was entered as a covariate in all models.
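The student selection rules (five students per teacher, oversampling of non-White students, and removal of already-selected students from later rosters) can be sketched as below. The weighting scheme is a hypothetical assumption: the text does not say how oversampling was implemented, so `nonwhite_weight`, the rosters, and the race labels are invented for illustration.

```python
import random


def select_students(rosters, race, per_teacher=5, nonwhite_weight=2.0, seed=None):
    """Draw up to `per_teacher` students per teacher without replacement.

    Non-White students get a larger selection weight (a hypothetical
    oversampling rule), and students already chosen for an earlier teacher
    are dropped from later rosters so no student is linked to two teachers.
    """
    rng = random.Random(seed)
    already_selected = set()
    selection = {}
    for teacher, roster in rosters.items():
        pool = [s for s in roster if s not in already_selected]
        chosen = []
        while pool and len(chosen) < per_teacher:
            weights = [1.0 if race[s] == "White" else nonwhite_weight for s in pool]
            pick = rng.choices(pool, weights=weights, k=1)[0]
            chosen.append(pick)
            pool.remove(pick)  # sample without replacement
        selection[teacher] = chosen
        already_selected.update(chosen)
    return selection


# Hypothetical rosters: Teacher 2's roster overlaps Teacher 1's (s05 to s08).
rosters = {
    "Teacher 1": [f"s{i:02d}" for i in range(1, 9)],
    "Teacher 2": [f"s{i:02d}" for i in range(5, 15)],
}
race = {f"s{i:02d}": ("White" if i % 2 else "Latinx") for i in range(1, 15)}
print(select_students(rosters, race, seed=42))
```

Processing teachers in a fixed order means earlier teachers get first claim on shared students; randomizing teacher order would be one way to avoid that systematic priority.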

Procedure

All study procedures were approved by the university Institutional Review Board and school district research committees. Recruitment began in spring 2019 through meetings with school and district leaders. Schools that agreed to participate provided a list of teachers whose teaching load was at least 50% ninth grade students. Research staff oriented teachers to the study procedures in person, and those who agreed to participate provided written consent. In fall 2019, classroom rosters for participating teachers were obtained from school records. School administrators sent parents of ninth grade students an information sheet describing the study, with instructions to contact the researchers if they wished to opt out. From the students who remained eligible, a subset of five students per participating teacher was selected from class rosters. Student assent was obtained in person at each participating school.

E-EMR training occurred at the beginning of the 2019–2020 academic year, during protected professional development time before students returned in August and early September 2019. Separate trainings were conducted for each school. There were three waves of data collection for both students and teachers. Time 1 (T1) took place over a 2-week period from the end of September to the beginning of October 2019. T2 occurred in late January 2020, before the end of the first semester in all schools. For these first two waves, electronic surveys were sent to all teachers and students. Regular reminders were sent to teachers, and an in-person “make-up” data collection was conducted at each school for students. Between T2 and T3, the COVID-19 pandemic forced schools to move to distance learning. Leadership and teachers from all high schools expressed interest in continuing the study. However, E-EMR was developed and previously tested as an in-person intervention, making it difficult to measure implementation fidelity in distance settings. T3 data collection in May 2020 was completed virtually only and, without the aid of an in-person make-up session, yielded significantly lower student response rates. Because the disruptions caused by COVID-19 compromised the validity of T3 data, the present study uses only the first two waves of data.

Intervention Condition

E-EMR Condition

EMR is a framework that provides shared language and concrete practices to promote STR. It also includes implementation supports such as training, a “playbook” with implementation scripts, structured professional learning community (PLC) protocols, and weekly email prompts. Teachers are trained on the three distinct, inter-related phases of a relationship: Establish, Maintain, and Restore. These phases are arranged sequentially as a heuristic that guides teachers’ practices, but they can fluctuate over time depending on changes in the relationship (e.g., teacher discipline of the student may present a need to restore the relationship).

Establish

The initial phase in any relationship, and the first intervention component of EMR, involves intentional efforts to establish relationships with each student. The goal is to ensure that all students feel a sense of connection characterized by trust, respect, and mutual understanding. There are eight establish practices: banking time, getting to know the details of students’ lives, expressing high expectations and high support, proactively and reactively offering help, providing opportunities for voice, interspersing opportunities to exercise choice, positive greetings and farewells, and relaying positive information through other adults.

Maintain

Once established, active effort is required to maintain a relationship. Without proactive maintenance, relationship quality can deteriorate over time as the ratio of positive to negative interactions naturally diminishes (Steinberg, 2001). The primary practice associated with the Maintain Phase is the 5-to-1 ratio of positive to negative interactions. To achieve the 5-to-1 ratio, teachers learn strategies for non-contingent and contingent positive interactions. In addition, EMR provides guidance for preventing negative interactions by responding to unwanted behavior progressively and with empathy.
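The 5-to-1 ratio is simple arithmetic, but writing it out shows how a shortfall translates into a concrete number of additional positive interactions. The sketch assumes a hypothetical log of one teacher’s interactions with one student, coded as "positive" or "negative"; EMR does not prescribe this coding format, and the function name is invented for illustration.

```python
def ratio_summary(interactions, target=5):
    """Summarize a hypothetical tally of coded interactions against a target:1 goal."""
    positives = interactions.count("positive")
    negatives = interactions.count("negative")
    ratio = positives / negatives if negatives else float("inf")
    # Additional positive interactions needed to reach target:1 given the current negatives.
    shortfall = max(0, target * negatives - positives)
    return ratio, shortfall


log = ["positive"] * 8 + ["negative"] * 2
print(ratio_summary(log))  # (4.0, 2)
```

In this example, two negative interactions call for ten positive ones, so a teacher with eight positive interactions is two short of the 5-to-1 goal.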

Restore

Conflict within relationships is common. When left unaddressed, however, negative interactions can weaken the relationship, leaving the student less engaged in class, less responsive to efforts to correct problem behaviors, and harder to motivate to take on academic work they perceive as difficult or boring. The aim is for teachers to follow negative interactions with students by reconnecting, repairing relational rifts, and returning the relationship to its previous (Maintain) phase. Teachers are trained on four Restore strategies: letting go, taking ownership, mutual problem-solving, and expressing care.

Equity Levers

Four equity levers were added to make EMR equity-explicit. They were designed to give teachers concrete strategies for interrupting implicit biases while building relationships with students and to increase teachers’ capacity to understand student behavior beyond their own cultural lens.

Seek Commonalities

The goal of the Seek Commonalities equity lever is to minimize the perceived differences between teachers and their students, which can contribute to negative out-group biases and stereotyping. Teachers are encouraged to better understand students’ preferences, values, and experiences and to identify which of these they may share with each student. Teachers are then encouraged to actively center the commonalities they share with students when establishing or restoring relationships.

Gain Perspective

The goal of the Gain Perspective equity lever is to understand how a student perceives their experience in the classroom overall and how they may perceive negative interactions (e.g., disciplinary incidents) in particular. Teachers reflect on how their background shapes their expectations for student behavior and how students’ cultural or personal backgrounds shape the ways in which they engage in the classroom and school. It is critical, however, for teachers not to assume students’ perspectives based on their racial/ethnic or cultural identity, but to find ways to elicit this knowledge through discussions with students.

Gather Facts to Disprove Assumptions

The goal of this equity lever is for teachers to seek objective and counter-stereotypical information about individual students. This requires teachers to (1) reflect on their assumptions of certain students and what behaviors they expect them to show, (2) determine what information needs to be gathered to disprove those assumptions, and (3) conduct objective observations to test their assumptions. Teachers can then engage in relationship building strategies with students having checked their assumptions regarding students and their behavior.

Know Your Vulnerabilities

The goal of the Know Your Vulnerabilities equity lever is for teachers to recognize when they are most likely to respond in a biased manner and to pay particular attention to interrupting those reactions in those moments. Teachers reflect on the situations and behaviors most likely to trigger an unskilled and/or biased interaction with a student and on how their implicit or explicit biases may be most in play during those moments.

Training

The initial training was a six-hour, interactive, group-based session that used an elicit-provide-elicit structure to promote teacher engagement. Two trainers who were involved in the development of E-EMR conducted the trainings. The trainings covered the rationale for the importance of STR, particularly for ninth grade students; the science behind implicit bias; and an introduction to principles of equitable implementation of universal programs. Teachers had the opportunity to see and practice skills for each of the three relationship phases. Finally, teachers were oriented to the rationale and structure of the ongoing implementation supports, including the PLC and embedded reflection form, the E-EMR playbook, and the reminder emails. Trainings were videotaped and coded for fidelity. In terms of adherence, trainers delivered 97–100% of the active components as planned across all trainings. Teacher behavioral engagement was coded from 10 randomly selected 5-min video clips of the training. Two raters coded the clips using 15-s partial-interval recording and operational definitions of behavioral engagement. Raters overlapped on 30% of the observations, with 96% inter-rater agreement. Overall, teachers were behaviorally engaged during 96% of the observed intervals.
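The engagement coding above can be made concrete. Under partial-interval recording, an interval is scored as engaged if the behavior occurs at any point during it, so each 5-min clip yields 20 intervals of 15 s (200 intervals across 10 clips). The sketch below assumes interval-by-interval agreement (matching codes divided by total intervals); the text does not state which agreement formula was used, and the rater codes are hypothetical.

```python
INTERVAL_SECONDS = 15
CLIP_SECONDS = 5 * 60
INTERVALS_PER_CLIP = CLIP_SECONDS // INTERVAL_SECONDS  # 20 intervals per 5-min clip


def percent_agreement(rater_a, rater_b):
    """Interval-by-interval agreement: share of intervals both raters coded identically."""
    if len(rater_a) != len(rater_b):
        raise ValueError("raters must code the same intervals")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


def percent_engaged(codes):
    """Percentage of intervals scored as engaged (1 = engaged, 0 = not engaged)."""
    return 100 * sum(codes) / len(codes)


# Hypothetical codes for four overlapping intervals.
print(percent_agreement([1, 1, 0, 1], [1, 0, 0, 1]))  # 75.0
print(percent_engaged([1, 1, 0, 1]))  # 75.0
```

Simple percent agreement does not correct for chance, which can be high when nearly all intervals are coded as engaged; a chance-corrected index such as Cohen’s kappa is a common complement.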

PLCs

A primary E-EMR implementation support is a structured PLC protocol, used monthly in small groups. The protocol guides teachers through six steps: (1) Review the previous month’s plans and progress, (2) Using a roster, identify students who are in the Establish or Restore phases, (3) Reflect on equity by examining how student demographics are associated with relationship phase, (4) Set goals to establish or restore relationships with students, (5) Develop student-specific action plans to achieve the goals within the next month, and (6) Reflect on the PLC process, including what worked well and what could be improved.
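Step 3 of the protocol amounts to a simple cross-tabulation of relationship phase by student group. As a hypothetical illustration (the protocol does not specify a data format), if each rostered student is tagged with a demographic group and a current phase, the share of each group in the Establish or Restore phase can be compared directly:

```python
from collections import Counter, defaultdict


def phase_share_by_group(roster):
    """Proportion of students in each relationship phase, within each group."""
    counts = defaultdict(Counter)
    for student in roster:
        counts[student["group"]][student["phase"]] += 1
    return {
        group: {phase: n / sum(c.values()) for phase, n in c.items()}
        for group, c in counts.items()
    }


roster = [  # hypothetical PLC roster entries
    {"group": "White", "phase": "Maintain"},
    {"group": "White", "phase": "Maintain"},
    {"group": "White", "phase": "Establish"},
    {"group": "Black", "phase": "Restore"},
    {"group": "Black", "phase": "Maintain"},
]
print(phase_share_by_group(roster))
```

With the small numbers typical of a single teacher’s roster, such proportions are prompts for reflection rather than evidence of a disparity on their own.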

Weekly emails

To encourage and remind teachers to use E-EMR practices, weekly prompts were emailed to participating teachers. Emails aimed to increase motivation or featured specific practices that were particularly relevant at that point in the school year. An “equity tip” was included in every email; each tip explicitly tied the practice to the experiences of students of color, provided implementation tips to enhance cultural responsiveness for students of color, or reminded teachers of relevant equity levers and how they may be applied to the practice.

Control Condition

Ninth grade teachers in the control condition continued their usual practices, with no formal training or technical assistance.

Measures

Psychological Sense of School Membership (PSSM)

The PSSM is a validated student self-report measure (Goodenow, 1993) that assesses a young person’s global sense of belonging and school connectedness. The PSSM consists of 18 items (e.g., “I feel proud of belonging to my school”) rated on a 5-point Likert scale (0 = not at all true to 4 = completely true). Internal reliability estimates and 4-week test–retest reliability are adequate (Goodenow, 1993; Hagborg, 1994). The PSSM had good internal reliability in this sample (α = 0.89).

Academic Motivation Scale (AMS)

The AMS (Vallerand et al., 1992) is a 28-item questionnaire with seven subscales, each designed to assess a different motivational state. This study included only the five subscales deemed most relevant to STR: amotivation (i.e., lacking motivation to engage in academic pursuits), external regulation (i.e., academic behaviors motivated by obtaining a positive outcome or avoiding a negative one), identified regulation (i.e., attributing personal value to academics), introjected regulation (i.e., behavior motivated by achieving favorable or avoiding unfavorable internal states), and intrinsic regulation (i.e., being intrinsically motivated by academics). Items were rated on a 7-point scale (1 = does not correspond to 7 = corresponds exactly). The AMS has shown good reliability and criterion and concurrent validity (e.g., Cokley et al., 2001; Fairchild et al., 2005; Vallerand et al., 1993). Internal reliability in this sample ranged from α = 0.81 to 0.94 across subscales.

Student Engagement Instrument (SEI)

The SEI includes 33 items measuring cognitive and affective engagement (Appleton et al., 2006). Although the SEI has five subscales, only three were used in this study: (a) teacher–student relationships (9 items), (b) control and relevance of school work (9 items), and (c) future aspirations and goals (5 items). Items are rated on a 4-point scale (1 = strongly agree to 4 = strongly disagree). These three subscales have shown evidence of construct (Betts et al., 2010) and predictive validity (Appleton et al., 2006). Internal reliability in this sample ranged from α = 0.79 to 0.88 across subscales.

Strengths and Difficulties Questionnaire (SDQ)

The SDQ is a standardized behavior rating scale for 3- to 16-year-olds that assesses psychosocial functioning and consists of four problem behavior subscales and one prosocial behavior subscale. Problem behavior subscales can be aggregated to yield a “total problems” score. Respondents complete items on a 3-point scale (0 = not true to 2 = certainly true). The SDQ has demonstrated acceptable internal consistency, reliability, and validity (Goodman, 2001) and has been shown to be at least as good as the Child Behavior Checklist (CBCL; Achenbach & Rescorla, 2001) at detecting conduct and emotional problems (Goodman & Scott, 1999). Internal reliability in this sample varied by subscale: α = 0.73–0.74 for emotional problems, α = 0.64–0.69 for conduct problems, α = 0.71–0.75 for hyperactivity/inattention problems, α = 0.57 for peer relationship problems, α = 0.62–0.65 for prosocial behavior, and α = 0.82–0.83 for total problems.
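The internal reliability values reported for each measure are Cronbach’s alpha coefficients. For reference, the sketch below computes the standard formula, alpha = k / (k − 1) × (1 − Σ item variances / variance of total scores), from raw item scores. It is a generic illustration with hypothetical data, not the authors’ analysis code; the text does not say which software produced the reported values.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score lists (one list per item, same respondents).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var = sum(statistics.variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))


# Two hypothetical items answered by four respondents.
print(cronbach_alpha([[2, 4, 6, 8], [1, 2, 3, 4]]))  # approximately 0.889 (= 8/9)
```

All else equal, alpha rises with the number of items, which is one reason short subscales such as the five-item SDQ scales often show lower values than longer measures.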

Analytic Plan

We conducted tests of baseline equivalence, examining differences in student and teacher demographics and baseline levels of outcome variables, across the treatment and control groups. The two groups were not significantly different on any demographic or outcome variables, except that the control group had a significantly higher proportion of students eligible for FRL. Thus, FRL status was entered as a covariate in all inferential models.

We calculated intraclass correlation coefficients (ICCs) and design effects (Huang, 2018; Musca et al., 2011; Raudenbush & Bryk, 2002). ICCs ranged from 0 to 48% (average = 8%) at Time 1 and from 0 to 15% (average = 3%) at Time 2; design effects ranged from 1.02 to 2.86 (average = 1.32) at Time 1 and from 1.03 to 2.25 (average = 1.13) at Time 2. Although the design effects, which were small on average, suggest that clustering may be ignorable, some ICCs exceeded 0.05, suggesting a need to account for clustering. We therefore took a more conservative approach and accounted for nesting in all inferential analyses.
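The link between ICCs and design effects is the standard Kish formula, DEFF = 1 + (m − 1) × ICC, where m is the average cluster size; a commonly cited rule of thumb treats design effects below about 2 as ignorable. The sketch below assumes about five students per teacher (the recruitment target); the actual cluster sizes, and whether the authors used exactly this formula, are assumptions here.

```python
def design_effect(icc, avg_cluster_size):
    """Kish design effect for clustered samples: 1 + (m - 1) * ICC."""
    return 1 + (avg_cluster_size - 1) * icc


def effective_n(n, icc, avg_cluster_size):
    """Sample size discounted for clustering: n / DEFF."""
    return n / design_effect(icc, avg_cluster_size)


# Hypothetical inputs: ICC = 0.08 with about five students per teacher.
print(design_effect(0.08, 5))  # ~1.32
print(round(effective_n(412, 0.08, 5)))
```

Because DEFF grows with cluster size, a sizable ICC can still yield a small design effect when clusters hold only a handful of students, which is why the two indices can point in different directions.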

Three sets of models were run to test our research questions. First, we examined the main effects of E-EMR condition (1 = E-EMR, 0 = Control) on each outcome at T2, controlling for baseline (T1) levels of the outcome and FRL status. Baseline scores were centered; thus, the effect of E-EMR can be interpreted as the effect for a student with an average T1 score. Effect sizes were calculated using a delta score that accounts for relative change over time in both the intervention and control groups, \(d_{\text{change}} = \left( M_{\text{change-t}} / SD_{\text{pre-t}} \right) - \left( M_{\text{change-c}} / SD_{\text{pre-c}} \right)\), where t and c denote the intervention and control groups, as recommended by Feingold (2009). Second, we examined whether the effect of E-EMR on T2 outcomes was moderated by T1 scores by including a product term between the centered T1 variable and the E-EMR condition variable. Significant (p < 0.05) interaction terms were probed using the Johnson-Neyman technique, in which the effect of E-EMR on the T2 outcome was plotted across continuous levels of the adjusted T1 score. This allows for the identification of “regions of significance”: specific values of T1 at which the effect of E-EMR was significantly different from zero. Third, we examined whether students’ race/ethnicity moderated the effects of E-EMR on T2 outcomes. Dummy variables representing each racial/ethnic category were entered, with White as the reference group. Interaction terms between the five dummy-coded categories (i.e., Asian, Latinx, Black, Multiracial, Other) and E-EMR condition were added to the model. Although the Other group was included as a dummy code, we did not interpret its interactions or simple slopes because of very small cell sizes (intervention n = 5, control n = 16). All other significant interactions were probed using the MODEL CONSTRAINT command in Mplus to generate simple slopes. For any probed interaction, an additional model was run in which the second-largest racial category (i.e., Multiracial) was removed to obtain a simple slope for White students.
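The Feingold (2009) change score can be computed directly from paired pre- and post-test scores: each group’s mean change is divided by that group’s own pretest SD, and the control value is subtracted from the intervention value. The sketch below uses observed means and hypothetical scores; it ignores the covariate adjustment and missing-data handling used in the inferential models, and the function name is invented.

```python
import statistics


def d_change(pre_t, post_t, pre_c, post_c):
    """Standardized mean change in the intervention group minus that in the
    control group, each scaled by its own pretest SD (Feingold-style delta)."""

    def standardized_change(pre, post):
        mean_change = statistics.mean(b - a for a, b in zip(pre, post))
        return mean_change / statistics.stdev(pre)

    return standardized_change(pre_t, post_t) - standardized_change(pre_c, post_c)


# Hypothetical scores for three students per condition (paired by position).
print(d_change([1, 2, 3], [2, 3, 4], [1, 2, 3], [1.5, 2.5, 3.5]))  # 0.5
```

Scaling each group by its own pretest SD keeps the metric in baseline-SD units, so groups that differ in spread at baseline are each expressed relative to their starting variability.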

All models were run in Mplus 8.1 (Muthén & Muthén, 2017). Missing data ranged from 6.7 to 10.1% at T1 and from 9.8 to 13.0% at T2. Missingness analyses revealed that several student characteristics, including race, gender, failing a class in the previous year, and GPA, predicted missingness; thus, all missing data were imputed using demographic characteristics as auxiliary variables. Analyses were run using TYPE = COMPLEX at the classroom level to account for the clustering of students within teachers.

Results

Main Effects of E-EMR

Table 2 summarizes the main effects of E-EMR and average scores at pre- and post-intervention for students in E-EMR and control conditions separately. As shown, none of the main effects reached statistical significance, although changes were in the expected direction.

Table 2 E-EMR intervention effects on T2 outcomes and pre- and post-test means for each condition

Moderation by Baseline Scores

There was a two-way interaction between condition and baseline scores for three variables: STR (B = −0.16(0.08), p = 0.05), conduct problems (B = −0.24(0.10), p = 0.02), and total problems (B = −0.15(0.07), p = 0.047). Johnson-Neyman plots probing these interactions are shown in Fig. 1. As illustrated in Panel A, the effect of E-EMR was significant and in the expected direction for students whose baseline STR scores were 0.53 SD or more below the mean, but not for students who scored above this cutoff at T1. Panel B shows a similar plot for conduct problems: a significant effect of E-EMR in the expected direction was found for students who scored 0.1 SD or more above the mean at baseline. However, a significant effect of E-EMR in the unexpected direction was found for students who scored 1.6 SD or more below the mean at baseline (i.e., students who began with fewer conduct problems than their peers). Finally, for total problems, the plot in Panel C indicates a significant effect of E-EMR in the expected direction for students 0.95 SD or more above the mean at baseline.
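The cutoffs above are Johnson-Neyman boundaries: moderator values at which the conditional effect of condition, b1 + b3·z, has a test statistic exactly equal to the critical value. Setting (b1 + b3·z)² = crit² × Var(b1 + b3·z) gives a quadratic in z. The sketch below assumes large-sample normal tests (crit = 1.96) and uses invented estimates: only the interaction coefficients are reported here, so the main effect, its variance, and the covariance are hypothetical.

```python
import math


def simple_effect(z, b1, b3, var_b1, var_b3, cov_b13):
    """Effect of condition at moderator value z, its SE, and the z-ratio."""
    est = b1 + b3 * z
    se = math.sqrt(var_b1 + z * z * var_b3 + 2 * z * cov_b13)
    return est, se, est / se


def jn_boundaries(b1, b3, var_b1, var_b3, cov_b13, crit=1.96):
    """Moderator values where |effect / SE| equals crit (Johnson-Neyman points).

    Solves (b1 + b3 z)^2 = crit^2 (var_b1 + z^2 var_b3 + 2 z cov_b13),
    i.e. a z^2 + b z + c = 0.
    """
    a = b3 ** 2 - crit ** 2 * var_b3
    b = 2 * (b1 * b3 - crit ** 2 * cov_b13)
    c = b1 ** 2 - crit ** 2 * var_b1
    if a == 0:
        return [] if b == 0 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # significance never switches: significant everywhere or nowhere
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


# Hypothetical estimates, with z = standardized, centered baseline score.
params = dict(b1=0.05, b3=-0.16, var_b1=0.06 ** 2, var_b3=0.08 ** 2, cov_b13=0.0)
print(jn_boundaries(**params))
```

With these invented values the lower boundary falls near −0.68, so the positive effect is significant at or below that point; the upper boundary lies far outside any realistic baseline range. In general, when the leading coefficient a is positive the region of significance lies outside the two boundaries, and when it is negative it lies between them.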

Fig. 1 CONSORT diagram detailing study procedures

Moderation by Student Race/Ethnicity

Significant interactions by race were identified for nine of the E-EMR outcomes. Interaction coefficients are summarized in Table 3 and can be interpreted as the difference between the effect of E-EMR for White students and its effect for Asian, Latinx, Black, and Multiracial students, respectively. Simple slopes can be interpreted as the difference between intervention and control students within each racial/ethnic group.
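With White as the reference group, the simple slope for another group is the condition coefficient plus that group’s interaction coefficient, and its standard error follows from Var(b1 + b3) = Var(b1) + Var(b3) + 2 Cov(b1, b3). Mplus’s MODEL CONSTRAINT command can define such a new parameter from labeled estimates and report a delta-method standard error, which for a linear combination reduces to this formula. The values below are illustrative only (the covariance in particular is invented) and will not reproduce the reported slopes.

```python
import math


def simple_slope(b_condition, b_interaction, var_condition, var_interaction, cov):
    """Effect of condition for a non-reference group: b_condition + b_interaction.

    SE uses Var(b1 + b3) = Var(b1) + Var(b3) + 2 Cov(b1, b3); the p-value is a
    two-sided large-sample normal test, erfc(|z| / sqrt(2)).
    """
    est = b_condition + b_interaction
    se = math.sqrt(var_condition + var_interaction + 2 * cov)
    p = math.erfc(abs(est / se) / math.sqrt(2))
    return est, se, p


# Hypothetical estimates; the covariance term is invented for illustration.
est, se, p = simple_slope(-0.10, 0.37, 0.09 ** 2, 0.13 ** 2, -0.004)
print(round(est, 2), round(se, 3), round(p, 3))
```

Omitting the covariance term, which is often negative for a reference-cell main effect and its interaction, would misstate the standard error of the simple slope.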

Table 3 Interactions between baseline scores and condition and interactions between race/ethnicity and condition

Asian Students

Asian race significantly interacted with E-EMR condition for school belonging (B = 0.37(0.13), p = 0.01) and hyperactivity/inattention (B = 0.23(0.10), p = 0.02). Simple slopes revealed that E-EMR had a significant positive effect on T2 school belonging for Asian (B = 0.28(0.11), p < 0.01) but not White students (B = −0.10(0.09), p = 0.28). Simple slopes for hyperactivity/inattention revealed a significant effect in the unexpected direction for Asian students (B = 0.16(0.07), p = 0.03) and a non-significant effect for White students (B = −0.07(0.06), p = 0.26).

Latinx Students

A significant interaction between Latinx race/ethnicity and E-EMR condition was found for identified regulation (B = 0.82(0.38), p = 0.03) and future aspirations and goals (B = 0.34(0.13), p = 0.01). For identified regulation, E-EMR showed effects in opposing directions for Latinx students (B = 0.52(0.28), p = 0.06) and White students (B = −0.28(0.24), p = 0.23), although neither effect reached statistical significance. Similarly, simple slopes revealed that the effect of E-EMR on future aspirations and goals was significant for Latinx (B = 0.24(0.12), p = 0.04), but not White students (B = −0.12(0.08), p = 0.13).

Black Students

A significant interaction of E-EMR was found for Black students relative to White students on only one outcome, control and relevance of schoolwork (B = 0.55(0.19), p = 0.003). Simple slopes revealed a positive effect of E-EMR for Black students (B = 0.51(0.16), p = 0.002) that was not significant for White students (B = −0.07(0.08), p = 0.18).

Multiracial Students

Significant interactions between Multiracial identity and E-EMR condition emerged for school belonging (B = 0.27(0.11), p = 0.02) and future aspirations and goals (B = 0.23(0.11), p = 0.04). E-EMR had a significant positive effect on school belonging for Multiracial (B = 0.18(0.09), p = 0.04), but not White students (B = −0.10(0.09), p = 0.28). Similarly, simple slopes for future aspirations and goals showed changes in opposing directions for Multiracial (B = 0.14(0.08), p = 0.09) and White students (B = −0.12(0.08), p = 0.13), although neither reached statistical significance.

Discussion

This study built on previous research on Establish-Maintain-Restore with elementary and middle school students by evaluating its effectiveness among high school students and teachers in a cluster randomized controlled trial. The version of EMR tested in this study had been iteratively adapted to optimize its developmental and contextual appropriateness for high school and to make it equity-explicit (reference removed for blinding). Overall, we found no significant main effects, which is common in evaluations of universal prevention programs (Greenberg & Abenavoli, 2017). However, there were targeted benefits for students who started with less favorable scores at baseline and for Asian, Latinx, Multiracial, and (to a lesser extent) Black students. We also found some unexpected effects, in which high-performing and/or advantaged groups in the E-EMR condition had less favorable outcomes at posttest than their counterparts in the control group, which may be a result of the equity-explicit focus of E-EMR (Fig. 2).

Fig. 2

Johnson-Neyman plots showing how the effect of E-EMR on T2 outcomes (y-axes) varies across standardized baseline scores (x-axes) of each outcome. Solid lines represent the effect of E-EMR on T2 outcomes, and dashed lines represent the confidence interval around that effect. The effect of E-EMR is significant where the confidence interval does not include zero on the y-axis. Blue shading indicates regions of significance

Two principles guided our interpretation of the findings. First, this study was a pilot trial. Pilot trials play a critical role in the translational research process because they are designed to assess feasibility and to detect potential signals that could inform the planning of a larger, more rigorous, and adequately powered study (Eldridge et al., 2016). Second, E-EMR is a universal intervention. In a review of 74 meta-analyses of universal intervention programs for school-aged youth, Tanner-Smith et al. (2018) found that the median average effect sizes of such programs tended to fall between Cohen’s d = 0.07 and Cohen’s d = 0.16. Those authors concluded that the statistical benchmarks outlined by Cohen (1988) for small (0.20), medium (0.50), and large (0.80) effects are inappropriate for interpreting findings from universal interventions.

Main Effects of E-EMR

We did not find significant main effects of E-EMR, although changes were in the hypothesized direction. One explanation is that the practices in E-EMR are mostly dyadic. For instance, one of the primary Establish practices is to “bank time” with individual students by engaging in student-centered conversation. This may reflect E-EMR’s origins in elementary schools and the field’s focus on STR among younger students (Kincade et al., 2020). The dyadic nature of such practices makes them very difficult for high school teachers to implement with all their students, given the limited time and resources that result from teaching many more students across multiple class periods (Deschenes et al., 2010). In fact, we acknowledged these constraints on time and resources in E-EMR training. A fundamental part of E-EMR implementation is a “triage” process that helps teachers focus their attention on the students who need it most. Therefore, E-EMR practices, particularly at the high school level, are neither intended to be, nor can feasibly be, distributed equally across all students. It is also possible that we did not detect more effects because the five students randomly selected to take part in study procedures were not the students on whom teachers focused. Future research should examine the fidelity of E-EMR implementation at the dyadic level and how dyadic fidelity relates to student outcomes. Further, dyadic interactions between teachers and students in high schools have a strong academic focus. It may be worthwhile to examine the effects of E-EMR in concert with a whole-class intervention such as My Teaching Partner-Secondary (Liu et al., 2018).

Moderation by Baseline Scores

We found significant moderation by baseline scores for STR, conduct problems, and total problems. For all three variables, E-EMR had the strongest effect among students who scored worse at baseline. These findings are consistent with other universal interventions, which often show greater benefits for students who are most in need (Greenberg & Abenavoli, 2017). The findings are particularly encouraging given that STR was reported by students, who were not aware of intervention condition. Further, conduct problems and total problems are more distal outcomes of E-EMR that the intervention does not directly target. The impact of E-EMR on these outcomes is consistent with previous research suggesting that positive STR is protective against a variety of independently observed problem behaviors (Cook, Coco, et al., 2018b).

Moderation by Student Race/Ethnicity

With White students as the reference group, we found some targeted benefits for Asian, Latinx, Black, and Multiracial students. Among Asian students, for example, those whose teachers participated in E-EMR reported significantly larger increases in school belonging from September to January, compared to the control group. Although Asian students are not disadvantaged relative to their White peers on many educational outcomes, including achievement (Hsin & Xie, 2014), they are often subject to experiences of racial discrimination at school that undermine their sense of belonging (Byrd & Andrews, 2016; Montoro et al., 2021). These psychological outcomes are often overlooked for Asian students (Wing, 2007). Seen as the “model minority,” they are often perceived as high-performing and needing minimal support (Whaley & Noel, 2013; Yoo et al., 2015). However, a strong sense of school membership is protective across a range of psychosocial outcomes (Wagle et al., 2018).

Among Latinx students, E-EMR led to significant benefits for students’ future aspirations and goals, and to trends toward improved STR and perceived control and relevance of schoolwork. These patterns suggest E-EMR may have promise for mitigating long-standing barriers for Latinx students. Prior work has found that teachers have diminished expectations for Latinx students, are less likely to praise them, even for correct answers, and have fewer positive interactions with them (Tenenbaum & Ruck, 2007). As a consequence of these systemic barriers, Latinx students are less likely to feel connected to school, score lower than their White peers on standardized achievement tests, and drop out of high school at greater rates (Vera et al., 2018). It is important for future research to examine whether the positive trends noted in this study also manifest in indicators of achievement.

We found relatively fewer positive effects of E-EMR among Black high school students. Indeed, the only significant finding that emerged was that Black students in the intervention condition perceived greater control over and relevance of their schoolwork. It may be that anti-Black racism needs to be more explicitly addressed in interventions such as E-EMR (Wun, 2016). Additionally, our small subgroup sample size likely undermined our ability to detect effects. Future research with a larger sample size will be better positioned to tease out competing hypotheses.

Among Multiracial students, we found a relative benefit of E-EMR on sense of school belonging. These findings should be interpreted cautiously, given the heterogeneity of the “multiracial” category. Despite the growing multiracial population in the USA (Mordechay & Orfield, 2017), the field lacks consensus around how to measure and categorize multiracial individuals (Charmaraman et al., 2014). Consequently, inconsistent findings have been documented in the literature (Shih & Sanchez, 2005; Udry et al., 2003). Previous studies have adopted a range of approaches, including not allowing participants to report more than one race, not reporting data from multiracial participants, and combining all mixed subgroups into a single “multiracial” category (Charmaraman et al., 2014). In the current study, we adopted the latter approach. The 96 students within our multiracial category represented 30 unique racial combinations, with the largest groups identifying as Latinx and White (n = 26), Black and White (n = 14), Asian and White (n = 11), Native American and White (n = 5), and Black and Latinx (n = 4). Some researchers have hypothesized that mixed-race children may occupy positions of power and privilege that are somewhere between those of monoracial White and monoracial ethnic minority peers (Binning et al., 2009). Others have noted that mixed-race youth face unique challenges and opportunities, such as conflictual cultural messages at home and access to multiple communities (Shih & Sanchez, 2005). For interventions like E-EMR, which focus on teacher behavior, a student’s phenotype (which influences racial identification by others) may be important to consider in addition to self-identification (AhnAllen et al., 2006).

Unexpected Findings

We found some unexpected changes attributable to E-EMR. For instance, among students who scored low on conduct problems relative to their peers at baseline, those in E-EMR showed an increase in conduct problems compared to those in the control group. It should be noted, however, that even with this apparent increase, these students’ scores at posttest were 0.12 SD above the mean, well within the normative range on this measure. Additionally, E-EMR may have equalized outcomes by shifting some subgroup scores in unintended directions. For example, Asian students increased in hyperactivity/inattention, and White students decreased in prosocial behavior. In both instances, students’ scores at posttest still fell well within the normal range (z = 0.96 for hyperactivity/inattention and z = 1.48 for prosocial behavior).

These unintended effects are small and limited in scope, and we are cautious about over-interpreting them, particularly in the context of a pilot trial. These changes were also observed over a short intervention period of four months, and we do not know whether, or for how long, they will persist. Nonetheless, the findings raise important questions about the desired and acceptable outcomes of equity work. Some might expect equity to entail improving outcomes for “disadvantaged” subgroups while maintaining or improving outcomes for advantaged groups. However, this may not be feasible, nor may it allow disparities to be dismantled: if outcomes rise for all groups alike in response to universal supports, gaps between advantaged and disadvantaged groups will persist. As a field and society, is it acceptable if equity work equalizes outcomes not only by bringing up disadvantaged groups but also by bringing down advantaged groups? How much of a loss to advantaged subgroups are we willing to tolerate in the context of gains for disadvantaged groups? We must grapple with such questions.

Limitations

This study has several limitations not already discussed above. First, we had only a small number of schools. This, combined with randomization at the school level, resulted in some baseline imbalance in our sample that had to be adjusted for statistically. It is also important to note that although randomization occurred at the school level, our analyses were conducted at the student level, which affects the degrees of freedom available for testing treatment effects (Raudenbush & Bryk, 2002). When randomization occurs at the school level, a true test of treatment effects would require measurement of all teachers and all students within each school (Raudenbush & Bryk, 2002). Because this was a pilot trial, and because of the pandemic, we had only a short follow-up period of four months and were not adequately powered to test mechanisms (e.g., implementation fidelity as a mediator of outcomes). Relatedly, we did not assess the extent to which teachers in the control condition implemented practices similar to E-EMR. Finally, future research needs to examine E-EMR’s impact on behavioral and academic outcomes such as attendance, discipline, and grades.
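Why student-level analyses of a school-randomized trial carry less information than the raw student count suggests can be illustrated with the standard (Kish) design effect for equal-sized clusters, 1 + (m − 1)·ICC, where m is the number of students per school and ICC is the intraclass correlation. The sketch below is illustrative only: the function names, the number of schools and students, and the ICC are hypothetical and are not values from this study.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    if cluster_size < 1 or not 0.0 <= icc <= 1.0:
        raise ValueError("cluster_size must be >= 1 and icc must lie in [0, 1]")
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total: int, cluster_size: float, icc: float) -> float:
    """Approximate number of independent observations a clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical scenario (not this study's data): 6 schools x 100 students, ICC = 0.05.
n_schools, per_school, icc = 6, 100, 0.05
n_total = n_schools * per_school
print(f"Design effect: {design_effect(per_school, icc):.2f}")
print(f"Effective n: {effective_sample_size(n_total, per_school, icc):.1f} of {n_total}")
```

In this hypothetical scenario, 600 students carry roughly the information of 100 independent observations, and the test of a school-level treatment still rests on only six randomized units.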

Implications and Conclusions

In this study, we extended the research on EMR by adapting it to be equity-explicit and developmentally appropriate for high school students, and by testing it to detect signals of evidence to inform future research. Together with findings from our previous studies, these results demonstrate the promise of E-EMR for improving teachers’ skills for building relationships and for enhancing equity in student outcomes. E-EMR shares a common component with other promising equity-focused programs: the opportunity for teachers to learn about, reflect on, and create action plans to disrupt their biases (see Bradshaw et al., 2018). Importantly, the equity-explicit components were not siloed from other intervention components but were interwoven throughout each EMR practice. That is, we made clear that teachers can establish, maintain, and restore relationships either in a color-blind manner or with equity front and center, and we gave specific instructions for doing the latter. We believe these concrete recommendations were pivotal to the effectiveness of E-EMR. Relatedly, the equity lens was interwoven throughout the monthly professional learning communities. This monthly cycle of goal setting, action, and reflection was designed to promote behavioral habits (taking equity-explicit actions to cultivate relationships) as well as dispositions (reflecting on one’s teaching through an equity lens). At this stage, E-EMR is poised for a large-scale efficacy trial that establishes more robust evidence of its effects on student social, emotional, and behavioral outcomes, including the prevention of high school dropout.