System-, teacher-, and student-level interventions for improving participation in online learning at scale in high schools

Significance: The COVID-19 pandemic saw many school systems turn to online education. In contrast to the voluntary learners in massive open online courses, school students' engagement in online learning also depends on their teachers and schools. In large-scale experiments with over 45,000 Ecuadorian high school students, we implemented rapid behavioral science interventions at the student, teacher, and system levels. The largest impacts on study time and knowledge came from centralized management of schools rather than decentralized self-management. Teacher-level interventions had limited impact. Small financial incentives induced more student studying, but this did not translate into more knowledge. Our findings highlight the need to move beyond the student level in online education interventions.


A: Detailed description of the three experimental waves and associated interventions
Context: The public schooling system in Ecuador consists of six grades of primary education, three grades of lower secondary (grades 7 through 9), and three grades of upper secondary or high school (grades 10 through 12). Schooling is free, and 75 percent of students attend public institutions. High schools are divided into two specializations: technical and science. In both types of high schools, students follow a common core curriculum and sit a national standardized exam called Ser Bachiller to graduate, but they are offered different elective subjects. We work with the technical specialization schools, which teach skills designed to let students enter the labor market directly, including entrepreneurship as an elective subject. In our first wave we also worked with science specialization schools, which focus more on preparing students to enter higher education. Our broader project, Showing Life Opportunities, is intended to provide personal initiative and negotiation skills to students considering entrepreneurship, as well as to offer insights into alternative career pathways through online training in statistics and the scientific method, with English and Spanish online content offered as comparison courses.
For climatic reasons, the education system splits the country into two separate regimes that operate on different school calendars. The Highlands-Amazon regime holds classes from September to June, whereas the Coastal regime holds classes from April through February.
Wave 1: In wave 1 we offered the courses to students in Educational Zone 2, part of the Highlands-Amazon regime. Schools had to have a steady internet connection and enough computers for at least one class to use at a time. We worked with a total of 108 schools, containing 560 classes and 15,443 students. The program started on September 23, 2019. It was interrupted for a few weeks by national protests soon after starting, then resumed until the pandemic closed schools on March 13, 2020. This caused a break of a month and a half, after which schools encouraged students to access the materials from their homes and finish the program. 12,232 students reached the endline survey in July 2020 (more than 80% of those registered on the platform).
Wave 1 was used for our teacher benchmarking intervention. Our goal was to have teachers actively support the program by being present in class to oversee their students using the online materials and to encourage usage. We formed matched quadruplets of schools based on the number of classes in the school, number of lessons completed on the platform to date, whether the students were actively using the platform, and the subjects being provided to their students. Within each quadruplet, we randomly allocated two schools to the benchmarking treatment and two to control (see Figure S2). Table S5 shows balance on baseline school and student characteristics for the two groups.
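For concreteness, the quadruplet matching and within-quadruplet assignment can be sketched as follows. This is a minimal sketch: the column names and the simple sort-based matching are hypothetical stand-ins for the matching variables listed above, and the study's actual matching metric may have differed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Hypothetical school-level data; these columns are illustrative stand-ins
# for the matching variables described in the text.
schools = pd.DataFrame({
    "school_id": np.arange(108),
    "n_classes": rng.integers(1, 10, size=108),
    "lessons_done": rng.integers(0, 50, size=108),
    "active": rng.integers(0, 2, size=108),
    "subject_mix": rng.integers(0, 3, size=108),
})

# Order schools so that similar schools sit next to each other, then cut
# the ordered list into quadruplets of four schools each.
ordered = schools.sort_values(
    ["subject_mix", "active", "n_classes", "lessons_done"]
).reset_index(drop=True)
ordered["quad"] = ordered.index // 4

# Within each quadruplet, randomly allocate two schools to the
# benchmarking treatment (1) and two to control (0).
parts = []
for _, g in ordered.groupby("quad"):
    arms = rng.permutation([1, 1, 0, 0][: len(g)])
    parts.append(pd.Series(arms, index=g.index))
ordered["benchmarking"] = pd.concat(parts)
```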
The benchmarking intervention then took place over the six months between December 2019 and May 2020. Each treated teacher received 20 personalized benchmarking emails at the verified email address they used to access the online educational platform. These emails showed the number of lessons completed, the number of unfinished lessons, and the last active lesson for the teacher's class(es), and compared these to other classes taking the same type of course. This information was presented graphically, with the help of bar plots. Figure S3 provides an example.
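A minimal sketch of the kind of comparison chart these emails contained; all class names and numbers below are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical progress numbers for one teacher's class versus other
# classes taking the same type of course.
labels = ["Your class", "Class B", "Class C", "Class D"]
lessons_completed = [12, 18, 9, 15]
colors = ["tab:orange" if name == "Your class" else "tab:blue" for name in labels]

plt.bar(labels, lessons_completed, color=colors)
plt.ylabel("Lessons completed")
plt.title("Your class compared with other classes on the same course")
plt.tight_layout()
plt.savefig("benchmark_chart.png", dpi=150)
```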

Wave 2:
The COVID-19 pandemic closed schools on March 13, 2020. The Ministry of Education of Ecuador then quickly looked for content it could provide to help students in the Highlands-Amazon regime complete their school year, which was to finish at the end of June. We worked with the Ministry to offer the courses from wave 1 to the remaining seven educational zones in the Highlands-Amazon regime. The program was offered to students in the final year (grade 12) and was made a graduation requirement for them. The courses were offered in 416 schools; 14,398 students started the program, and 13,714 students (more than 95%) reached the endline survey by the end of July 2020.
This program ran on a compressed schedule: students started on May 18, 2020, and needed to finish by the end of June 2020. We implemented student- and teacher-level interventions to help ensure students completed the course on time.
The student-level interventions started June 1, 2020, and focused on students who were not already rapidly advancing through the material. Students who had completed more than 6 lessons in the first two weeks were screened out, and the remaining 11,834 students who were enrolled on the online platform but progressing more slowly were randomized at the individual level into one of four groups (Figure S2): (1) A control group of 1,961 students, who would continue to use the platform at their own pace. (2) A lottery intervention group of 3,936 students, who were told they would win a lottery ticket each time they finished a lesson. Figure S4 provides an example. Given the emergency context and the short period to complete the course, these lotteries incentivized completion only, not performance on other outcomes such as knowledge tests. The winners of the lottery received $20. (3) An encouragement messages group of 3,975 students, who were shown encouragement messages on the platform, selected at random from a set of messages designed to convince students that they could still finish the course despite the hurdles of the pandemic. These were intended to change beliefs about the feasibility of learning and to help students keep the pandemic's negative effects in perspective. Figure S4 provides examples of messages such as #DearClassOf2020, "Graduating is a great achievement. Thank you for your drive to shape the world into something better. Source: UNICEF. You can finish all the lessons" or "Pandemic is temporary, what you do will always last." (4) A plan and within-household team-up group of 1,962 students, who were asked to make plans for how they would study and to share these plans with others in their household. Figure S4 gives an example of the instructions. Students were asked to share notes on the number of hours already spent, reflect on what they had learned, and plan their next study sessions, as a way of increasing the chance that students used self-regulated learning strategies. Table S6 shows baseline characteristics and balance for the student-level interventions.
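For concreteness, a minimal sketch of individual-level assignment with the reported group sizes. This shows simple complete randomization; the study's actual assignment mechanics (for example, any stratification) may have differed, and the student IDs are placeholders.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# The 11,834 students remaining after screening out fast progressors.
student_ids = np.arange(11_834)

# Reported group sizes (approximately a 1:2:2:1 allocation).
sizes = {"control": 1961, "lottery": 3936,
         "encouragement": 3975, "plan_teamup": 1962}

shuffled = rng.permutation(student_ids)
assignment, start = {}, 0
for arm, n in sizes.items():
    assignment[arm] = shuffled[start:start + n]
    start += n
```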
The teacher-level interventions were introduced on June 17, 2020, one month after students had started the program, and were intended to spur teachers to stimulate their students to complete the program and graduate on time. Randomization occurred at the teacher level, with 532 teachers (classes) in 416 schools formed into strata based on level of activity, educational zone, number of students on the platform, type of course students were assigned to, and number of lessons completed by students to date. Figure S2 shows how teachers (and their respective students) were randomly assigned to three groups: (1) A control group of 178 teachers and 5,031 students, where no messages were delivered to teachers. (2) An administrative SMS group of 177 teachers and 4,693 students. These administrative messages acted as a combination of reminders and increased salience of monitoring. The message sent read "Remember, your Zone [Name] delegate and MINEDUC monitor your activity in the SLO project. Make sure your students finish until the end of the school year. SLO". (SLO is an acronym for our online program "Showing Life Opportunities".) Monitoring was common knowledge, as ministry personnel were shown on the class dashboard of the online platform next to each teacher in each class. We hoped that the administrative SMS would increase the salience of this monitoring, as well as remind the teacher of the need to finish on time.
(3) An encouragement email group of 177 teachers and 4,674 students. Teachers in this group were sent emails encouraging them to watch a video of the successful completion of the program by students in Zone 2, and telling them that we hoped their students would also finish the program successfully in time for the end of the school year. Figure S5 provides an example. Table S5 shows balance and baseline characteristics for these teacher-level interventions.
Wave 3: The final wave of the experiment took place in schools in the 9 educational zones of the Coastal regime. The new school year for these students started later than usual, in June 2020, and the government had more time to prepare for launching online schooling without the end-of-school-year pressure that the Highlands-Amazon students had faced. Our study thus provides evidence on incentives to learn under a "new normal" of remote studying and learning from home, after the initial shock of the pandemic had lessened. We offered the program to 16,547 students in 598 schools, starting September 7, 2020 and ending November 16, 2020. 14,562 students (more than 88% of those registered) completed the endline survey on the platform. Building on our previous experiments and lessons, we tested more interventions at the student level, as well as one at the system level.
We formed strata of schools based on the type of course offered, educational zone, cluster size (number of students enrolled), average student performance on the state exam at the end of school (Ser Bachiller), and the number of students in 12th grade. We then randomized schools into the following student-level and system-level interventions: (1) Centralized monitoring through an online learning management system. 149 schools containing 3,882 students were placed in this treatment group. Here, Ministry of Education administrative personnel (heads of zones, the central office) had access to the online management system, could see real-time information about take-up and usage of the educational platform in each school, and received weekly reports. They could then use these to monitor teachers in poorly performing schools and encourage them to improve. Figure S6 provides an example of the system and the weekly report. (2) Self-management. 150 schools containing 4,125 students were instead assigned to self-manage their participation in the program and their students' progress. Ministry of Education officials were not provided the centralized information and weekly reports for these schools; it was up to each individual school and its teachers to ensure that their students progressed. Teachers received the same information in this condition as under centralized management; the difference is whether the central ministry also received it. (3) Student lotteries for lesson completion and score. Early results from wave 2 had suggested that students responded to the lottery incentives, but there was a concern that they might rush to finish lessons without caring as much about what they learned. For wave 3, 149 schools containing 4,025 students were assigned to a new lottery program, in which students earned a lottery ticket for each completed module as well as tickets for their performance on platform tasks. Students were informed of this by email (Figure S7), and a weekly draw was held, with one winner per school receiving $10. Note that these schools were also subject to centralized monitoring as in group (1), so we measure the additional effect of student lotteries on top of centralized monitoring. (4) Encouragement to team up with peers remotely. In wave 2 there was insufficient time to have students pair up with classmates, which is why they were asked to share plans with household members. In wave 3, we wanted to see whether working with classmates remotely would create peer incentives to complete the program. 4,515 students in 150 schools were assigned to receive emails (Figure S8) encouraging them to work with peers to complete as many lessons as possible and get a better score. Again, these schools were also subject to centralized monitoring, so we measure the additional impact of peer team-ups. Table S8 shows baseline characteristics of these four groups in the Coastal regime and balance across treatments.
Our three waves of the experiment cover schools in different parts of the country. Table S9 compares the baseline characteristics of these schools, and of their teachers and students, across the three waves. Although there are some differences in means, there is considerable overlap in the distributions, and lessons all take place within the same school system in the same country. As a check that differences across waves are due to differences in the treatments, and not to differences in the characteristics of the schools, teachers, and students, we use the characteristics in Table S9 to reweight the data with inverse propensity scores so that all waves resemble wave 3, the Coastal regime. Table S10 shows that this reweighting results in only minor changes in the magnitude of the estimated coefficients and does not change any of our substantive conclusions.
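A minimal sketch of this reweighting, assuming a generic covariate matrix and a logistic propensity model (the paper's exact specification is not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def wave3_weights(X, wave):
    """X: array of baseline characteristics; wave: array of wave labels (1, 2, 3)."""
    y = (wave == 3).astype(int)

    # Propensity of belonging to wave 3 given baseline characteristics.
    p = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

    # Keep only observations within common support (0.05 < p < 0.95).
    keep = (p > 0.05) & (p < 0.95)

    # Wave-3 observations keep weight 1; waves 1 and 2 get odds weights
    # p/(1-p), pulling their covariate distribution towards wave 3's.
    w = np.where(y == 1, 1.0, p / (1 - p))
    return keep, w
```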

B: Definitions and Measurement of Main Outcomes
We construct the study time index as a z-score based on the following variables: study time on the platform (in minutes), active days on the platform, and number of lessons completed. These variables are captured automatically and in real time on the online platform (based on any input-response by the user). We construct the knowledge index as a z-score based on performance on five subject-specific knowledge tests administered via the online platform: Statistics and Scientific Thinking, Negotiations, Personal Initiative, English, and Spanish. Students' correct answers are coded as one for each item of each knowledge test. Each index is the average of the standardized z-scores of the variables in the respective family of outcomes (z-scores are standardized with respect to the control group mean). These indexes are constructed for standardization purposes and to account for multiple hypothesis testing within a family of outcomes.
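A minimal sketch of this index construction, with placeholder column names:

```python
import pandas as pd

def zscore_index(df, cols, control_mask):
    """Average of per-variable z-scores, standardized to the control group."""
    z = pd.DataFrame(index=df.index)
    for c in cols:
        mu = df.loc[control_mask, c].mean()
        sd = df.loc[control_mask, c].std()
        z[c] = (df[c] - mu) / sd
    return z.mean(axis=1)

# Usage sketch (column names are hypothetical):
# control = df["treated"] == 0
# df["study_time_index"] = zscore_index(
#     df, ["minutes_on_platform", "active_days", "lessons_completed"], control)
```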
In addition to these main outcomes, we use a pre-specified set of baseline (pre-intervention) variables as potential controls, selecting among them with the post-double selection lasso method of (1). This list consists of:
• Two class-level variables: the total number of active students on the platform who have started at least one lesson; and whether the class has at least three students who have started at least one lesson (active class). These variables are not used as controls in the Coastal regime regressions, since random assignment took place before students had started on the platform.
• A wide range of individual-level variables: approximately 70 individual-level characteristics of students (with small differences across the three waves depending on measurement). These include demographic variables (gender, age); socioeconomic status and family background (e.g., main language spoken, parental durable assets owned, parental education and occupation status); attitudes, knowledge, and intentions towards careers in entrepreneurship (e.g., interest in entrepreneurship, perception of it as a career, study plans in this field, knowledge of others in this field); attitudes, knowledge, and intentions towards careers in STEM (e.g., interest in STEM, perception of STEM careers, study plans in this field, knowledge of others in this field); preferences and psychological profile (e.g., personal initiative, risk preferences, Big-5 personality traits, trust, creativity, cognitive reflection, grit); and baseline subject knowledge.

C: Empirical Approach
We estimate the impact of being assigned to treatment T on outcome Y for student i in school or class j through the following pre-specified ANCOVA regression:

Y_{i,j} = \alpha + \beta T_j + \gamma Y_{i,j,0} + \delta' controls_{i,j} + \theta' strata_j + \varepsilon_{i,j}    (1)

where Y_{i,j,0} is the baseline (pre-intervention) measure of the outcome variable (e.g., the baseline knowledge index, or the amount of time spent studying on the platform before the interventions were launched); controls_{i,j} is a vector of control variables selected using the post-double selection lasso method of (1); strata_j is a vector of dummies for the different randomization strata; and the error term \varepsilon_{i,j} is clustered at the level of random assignment. When multiple treatments are randomized within the same level (student, teacher, or system) and experimental wave, we include dummies for assignment to each treatment in equation (1). For example, wave 2 has three student-level interventions, so the equation includes dummies for assignment to the lottery treatment, the plan and team-up within household treatment, and the encouragement messages treatment.
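As an illustration, equation (1) can be estimated with clustered standard errors roughly as follows; the data below are synthetic and all column names are hypothetical (lasso-selected controls would be appended to the formula):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(seed=2)
n = 500
df = pd.DataFrame({
    "cluster_id": rng.integers(0, 50, size=n),   # level of random assignment
    "strata": rng.integers(0, 5, size=n),
    "treat": rng.integers(0, 2, size=n),
    "y_baseline": rng.normal(size=n),
})
df["y"] = 0.2 * df["treat"] + 0.5 * df["y_baseline"] + rng.normal(size=n)

# Equation (1): outcome on treatment dummy, baseline outcome, and strata
# dummies, with standard errors clustered at the level of random assignment.
fit = smf.ols("y ~ treat + y_baseline + C(strata)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["cluster_id"]}
)
print(fit.summary().tables[1])
```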
The post-double selection lasso method of choosing controls serves two purposes. First, it offers the potential to boost statistical power by choosing covariates that are correlated with the end outcomes of interest. Second, to the extent that there are chance imbalances from the randomized assignment, or imbalances resulting from selective attrition, it can correct for bias. In practice, we find that attrition is generally uncorrelated with treatment status (Table S8), our sample sizes are large and balanced on observables (Tables S5-S7), and once we have controlled for the baseline outcome and randomization strata, the remaining variables have little additional predictive power. As a result, our estimates with these controls are similar in magnitude and precision to those that would be obtained without this method, and few covariates are typically selected for inclusion.
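A sketch of the double selection step. Note that the method of (1) uses a theory-driven plug-in penalty level; cross-validated lasso is substituted here as a convenient stand-in, and the forced-in baseline outcome and strata dummies are handled outside this function:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def post_double_selection(y, t, X, names):
    """Union of covariates that predict the outcome and covariates that
    predict treatment; the union then enters equation (1) as controls."""
    keep_y = np.abs(LassoCV(cv=5).fit(X, y).coef_) > 1e-8
    keep_t = np.abs(LassoCV(cv=5).fit(X, t).coef_) > 1e-8
    return [name for name, keep in zip(names, keep_y | keep_t) if keep]
```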
Our main approach to multiple hypothesis testing is to restrict the number of outcomes of interest by focusing on the two index measures defined above: study time and knowledge. Then, since our interventions occur at different levels (system, teacher, and student), and in different settings and experimental waves, we view each intervention as a separate policy choice of interest. We therefore follow the recommendation of (2) in not adjusting for multiple treatments across experiments. This allows our planning treatment to be compared with other planning treatments in the literature, our lottery treatments with other experiments offering students financial incentives, and so on. In addition, we calculate sharpened q-values to control the false discovery rate (FDR) across our two main outcomes (the study time and knowledge test indexes) and the multiple interventions within each combination of wave and level of intervention (see Figure S2).
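Sharpened q-values are commonly computed with a two-stage (Benjamini-Krieger-Yekutieli) FDR procedure; a minimal sketch under that assumption follows, where the grid resolution is an implementation choice rather than a detail taken from the paper:

```python
import numpy as np

def sharpened_qvalues(pvals, grid=None):
    """Two-stage (Benjamini-Krieger-Yekutieli style) sharpened FDR q-values:
    for each test, the smallest q at which it is rejected."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    p_sorted = p[order]
    ranks = np.arange(1, m + 1)
    qvals = np.ones(m)
    if grid is None:
        grid = np.arange(0.001, 1.0, 0.001)
    for q in grid[::-1]:                      # descending: smaller q overwrites
        q1 = q / (1 + q)                      # stage 1: BH at q/(1+q)
        hits = np.nonzero(p_sorted <= ranks * q1 / m)[0]
        r1 = hits.max() + 1 if hits.size else 0
        m0 = m - r1                           # estimated number of true nulls
        if m0 == 0:
            k = m
        else:
            q2 = q1 * m / m0                  # stage 2: BH at the sharpened level
            hits = np.nonzero(p_sorted <= ranks * q2 / m)[0]
            k = hits.max() + 1 if hits.size else 0
        qvals[order[:k]] = q                  # record rejection at this q
    return qvals
```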
We estimate heterogeneity of treatment effects using two approaches. First, following the suggestion of previous literature (e.g., (3)), we hypothesized and pre-registered that the impacts of benchmarking may vary depending on whether teachers were initially above or below average. We therefore add to equation (1) a variable for whether the teacher had above-average performance, together with its interaction with treatment, and report the results in Table S3. We use the same approach of including a covariate and its interaction with treatment to examine whether impacts vary with student gender and socioeconomic status, although theory offers less clear predictions about what type of heterogeneity to expect in these cases (Table S2).
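Written out, this augments equation (1) with the covariate and its treatment interaction; for the above-average-teacher case, with Above_j an indicator and \mu capturing the differential treatment effect:

```latex
% Equation (1) augmented with a covariate and its treatment interaction,
% shown for the pre-registered above-average-teacher indicator Above_j:
Y_{i,j} = \alpha + \beta T_j + \lambda \, Above_j + \mu \, (T_j \times Above_j)
        + \gamma Y_{i,j,0} + \delta' controls_{i,j} + \theta' strata_j + \varepsilon_{i,j}
```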
The second approach we use to examine heterogeneity in impacts is applied to our system-level intervention. The goal is to help understand for which groups of students and schools policymakers should use this policy, and we use the policy tree machine-learning algorithm of (4). We allow the algorithm to consider eight variables from our administrative data (e.g., type of school funding, type of teacher contract, school performance level) that the Ministry of Education collects regularly and can thus use for decision-making. We do this for the outcome of knowledge gains in week 11.
This results in a simple decision tree, shown in Figure S9. The algorithm selects a decision rule based on the school's average score on the national end-of-school examinations (Nota Ser Bachiller). High-performing schools (those with a grade above 7.68) can be self-managed, whereas low-performing ones should be subject to centralized management. By way of comparison, the average grade on this examination in our sample of schools is 7.39. As a check, we then re-run the same algorithm for the week 8 outcomes of the study time index and performance index. The algorithm again selects a decision rule based on the grade on this exam, albeit with slightly different cutoffs.
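The algorithm of (4) performs an exact search over shallow trees to maximize estimated policy value. The brute-force depth-1 sketch below conveys the idea only; gamma stands in for per-school score estimates (e.g., doubly robust scores), which is an assumption about the inputs rather than the paper's exact pipeline, and the published algorithm also handles deeper trees and honest estimation.

```python
import numpy as np

def depth1_policy_tree(X, gamma):
    """Brute-force depth-1 policy tree.

    X:     (n, d) array of school-level administrative covariates.
    gamma: (n, 2) array of estimated per-school scores for action 0
           (self-management) and action 1 (centralized management)."""
    best_value, best_rule = -np.inf, None
    for j in range(X.shape[1]):                    # candidate split variables
        for thr in np.unique(X[:, j]):             # candidate thresholds
            left = X[:, j] <= thr
            for a_left in (0, 1):                  # action on each side
                for a_right in (0, 1):
                    value = gamma[left, a_left].sum() + gamma[~left, a_right].sum()
                    if value > best_value:
                        best_value = value
                        best_rule = (j, thr, a_left, a_right)
    return best_value, best_rule  # (split variable, threshold, left/right actions)
```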

D. Measuring cost-effectiveness
To investigate cost-effectiveness we focus on impacts on the knowledge index. We first compute the effect size as the mean difference between treatment groups divided by the standard deviation (SD) of the control group. We then compute the cost of each intervention comprehensively, including the direct cost of delivering the intervention as well as the indirect costs associated with designing and running it. This is thus a broad notion of average cost per final user. Table S9 below presents the results. For example, the centralized learning management system, compared to self-management, improves knowledge by 0.126 SD at a cost per targeted participant of $0.509. This large improvement in learning outcomes, combined with low costs, yields a cost-effectiveness ratio for the centralized learning management system of $4.04 per SD ($0.509/0.126). Another highly cost-effective intervention is the teacher-level encouragement SMS, with a cost-effectiveness ratio of $6.37 per SD, although we note that this impact is not statistically significant. None of the student-level interventions had a statistically significant impact on knowledge; based on point estimates, the on-screen encouragement message has the lowest (best) cost-effectiveness ratio ($5.47 per SD) among them.

Note: We omit control variables from the tables for clarity of presentation, but all regressions include controls selected with the double-lasso procedure. In panels B and C, standard errors are clustered at the school level (the level of randomization in the wave 3 experiment). *p<0.1; **p<0.05; ***p<0.01

Note: Attrition in the treatment groups in experiment 1 (Zone 2), experiment 2 (Highlands-Amazon regime), and experiment 3 (Coastal regime). The estimations in experiments 1 and 3 use standard errors clustered at the school level, and the estimations in experiment 2 (teacher-level interventions) use standard errors clustered at the teacher level. We observe a slight attrition issue in the plan and team-up treatment, which is interesting given the result of (5), who finds that peer mentoring increased registrations. However, given that we do not see any changes in outcomes due to the plan and team-up treatment (see results section), this is unlikely to be of concern.

Note: Reweighted estimates use inverse propensity scores (within common support) to weight the data so that the characteristics of schools, teachers, and students in waves 1 and 2 are similar to those in wave 3. We exclude participants with propensity scores below 0.05 or above 0.95, as recommended by (29).

Figure S1: Impact of Centralized Management on Subject Test Scores

Figure S2: Experimental Design and Sample Sizes
Note: *If the level of randomization was higher than the student level, we used balanced randomization to boost statistical power. **In the Zone 2 experiment (wave 1), 2 of the 108 randomized schools did not take up the program (no information on the study time or knowledge index), and 4 more schools did not reach the endline knowledge test (no information on the knowledge index). ***In the Coastal experiment (wave 3), we randomized at the school level based on administrative information before students registered on the platform (as we had to test decentralized monitoring). If no students in a school enrolled on the platform, the students from that school did not receive an email (student-level intervention), as they did not complete the baseline. Anticipating that not all assigned schools would start to study, we pre-specified a Class-Level Take-Up Index in the pre-analysis plan to test whether coverage (number of classes, number of students) differs substantially between treatments. We do not see that coverage differs significantly between treatments in either week 8 or week 11, which further supports the fidelity of the randomization.

Figure S4: Examples of Student-Level Interventions

Treatment 1 -Control
No message is shown.

Treatment 2 -Encouragement
Each time, one of the following screens is shown at random: We know that you miss your school, your teachers and your classmates, but do not give up because in your strength we will sow a better reality than this.
Stay at home and continue training with the DOV Program. To have courage is not the absence of fear, but the wisdom that there is something more important than fear and to know above all that you are not alone! The DOV team is with you.
Hug your family, live in the moment, take advantage of the education you have access to. You can do it, complete all your lessons!

Treatment 3 -Lottery Ticket
Congratulations!!!!! "Remember, every class you complete will earn you an additional ticket. Make sure you win more tickets this week." You will need to find a partner at home to share your progress. You can do it, complete all your lessons!

Figure S5: Examples of Teacher Encouragement Email
Email with Video
Dear teacher, We would like to thank you for your work. Please see the video on the experience of students and teachers with the DOV Program in Zone 2. We hope that you and your students are also having a great experience with DOV. We want your students to successfully complete the program until the end of the school year.
-Link to the video-
Thank you for your work.

DOV Team
If you have any questions or doubts about the project, please email us at ecuadorslo2020@gmail.com.
For more information

Figure S9: Decision Tree Selected by the Policy Tree Algorithm
Is the average grade at the school more than 7.68?

Yes → Self-Management
No → Centralized Online Learning Management