Introduction

Social and emotional skills (e.g. empathy, self-regulation) are pivotal for positive youth development, with research consistently demonstrating their influence on a wide variety of adaptive outcomes across the lifespan. These range from larger friendship networks and better quality relationships with friends in childhood (Brackett et al. 2004; Lopes et al. 2004) to improved health and labour market outcomes in adulthood (Goodman et al. 2015). Such skills also exert a protective influence in relation to deviant and risk-taking behaviours, bullying, violence, mental health, tobacco use and drug problems (Petrides et al. 2004; Trinidad and Johnson 2002). Moreover, there are increasing educational claims associated with improved social and emotional skills, such as better attendance, learning and motivation (Zins et al. 2004); reductions in exclusions (Petrides et al. 2004; Qualter et al. 2007); higher retention (Qualter et al. 2009); eased transition from primary to secondary school (Qualter et al. 2007); and overall exam performance (Marquez et al. 2006; Qualter et al. 2012; Qualter et al. 2009). Consequently, social and emotional learning (SEL) curricula—in which these skills are explicitly taught—are increasingly popular in the UK and internationally (Early Intervention Foundation 2015).

What is SEL?

SEL interventions seek to develop children’s social and emotional skills, typically through the implementation of a taught curriculum, modifications to school ethos/climate and/or work with families and communities (Humphrey 2013). Several recent meta-analyses have demonstrated that high-quality SEL interventions can impact positively on a range of outcomes, including social and emotional skills, mental health, school attitudes and academic performance (Corcoran et al. 2018; Durlak et al. 2011; Goldberg et al. 2018; Sklad et al. 2012; Wigelsworth et al. 2016). A further meta-analysis has demonstrated that these effects are maintained over time, albeit with some attenuation (Taylor et al. 2017).

Schonfeld et al. (2015) theorise three mechanisms through which SEL can aid academic achievement. First, SEL interventions teach approaches to solving social problems (e.g. identifying problems, setting realistic goals, generating suitable solutions and monitoring and evaluating outcomes) that are transferable to academic domains. Second, they provide a structured teaching approach that focuses on developing student-teacher relationships and creating a positive classroom environment that is safe, caring, well-managed and participatory. Such settings are more conducive to effective learning because they are likely to increase children’s engagement and academic readiness. Third, school staff with SEL training are likely to be better equipped to manage a classroom and deal with disruptive behaviour that may erode learning. More generally, learning is fundamentally a social process, and therefore, the extent to which children are equipped to successfully navigate the social environment will likely influence their academic progress. Thus, improvements in academic attainment are viewed as a distal outcome of SEL (Humphrey 2013).

The meta-analyses noted above support these assertions, indicating that SEL interventions have a small but nonetheless practically significant impact on academic performance (effect size ranges from d = 0.19 to d = 0.46) (Corcoran et al. 2018; Durlak et al. 2011; Goldberg et al. 2018; Sklad et al. 2012; Wigelsworth et al. 2016). However, the evidence base is much less developed than for other domains (e.g. mental health), with fewer than 20% of SEL intervention studies assessing academic outcomes and a lack of use of standardised assessment measures (Goldberg et al. 2018). Given the primacy of SEL in education systems worldwide, there is a clear and distinct need for further research in this area.

The Promoting Alternative Thinking Strategies curriculum

Promoting Alternative Thinking Strategies (PATHS) is a universal, curriculum-based SEL intervention for primary school–aged children. It aims to help them manage their behaviour, understand their emotions and work well with others. PATHS is designed to be delivered by class teachers in a series of lessons that include such topics as emotional awareness and understanding, self-control, interpersonal problem-solving skills, peer relationships, self-esteem and study skills. In addition, supplementary parent materials are provided that aim to extend learning to the home environment. Beyond the taught curriculum, teachers are encouraged to implement generalisation activities and techniques to support the application of new skills throughout the school day.

Multiple randomised controlled trials (RCTs) have demonstrated the positive impact of PATHS on a variety of outcomes, including children’s social and emotional competence (e.g. Domitrovich et al. 2007; Humphrey et al. 2016) and mental health (e.g. Crean and Johnson 2013). However, there has been limited research to establish the impact of PATHS on academic outcomes. To date, only two RCTs of PATHS have done so. The first of these, conducted with a sample of deaf students in special education, failed to find any significant effect of PATHS on reading or math scores (Greenberg et al. 2004). More recently, Schonfeld et al.’s (2015) trial appeared to demonstrate that PATHS improved children’s academic performance in mainstream elementary schools. However, the trial in question experienced an attrition rate of nearly 50%, modelled students’ academic attainment as a binary outcome (i.e. whether or not they met basic proficiency standards) rather than as a continuous one, and produced inconsistent findings (only four of nine outcome analyses found treatment effects). Given the above, there is genuine uncertainty regarding the extent to which PATHS can be claimed to influence children’s academic progress.

PATHS in the UK

PATHS was recommended for widespread implementation in the UK in an influential review of early intervention (Allen 2011). However, the two RCTs of PATHS in the UK to date yielded mixed findings, and assessment of outcomes was focused on children’s social and emotional skills, behaviour and/or mental health rather than their academic attainment. Thus, Ross et al.’s (2011) study of a Northern Irish cultural adaptation of the programme found that PATHS (rebranded as ‘Together 4 All’) produced effects that were ‘weak and inconsistent, but generally in a positive direction’ (p. 61), while a major trial of PATHS in Birmingham, England, yielded null results (Berry et al. 2016). The study reported herein is the third UK RCT of PATHS, and the first to assess its impact on children’s academic attainment. It is also one of only a handful of PATHS trials internationally to focus on children at the upper end of the primary phase of education (most having focused on children in pre-school and early primary settings). Our findings relating to social-emotional skills and mental health outcomes are reported elsewhere (Humphrey et al. 2016; Humphrey et al. 2018).

The importance of implementation

One explanation for the weaker and null effects found in the UK PATHS trials noted above is variability in delivery, which is increasingly recognised as a crucial moderator of intervention outcomes. At a general level, programmes which encounter implementation problems are much more likely to yield null or reduced effects (Durlak et al. 2011); such problems may be amplified when they are adopted in different countries and cultures (Lendrum and Wigelsworth 2013). It is therefore critical to incorporate an assessment of implementation into school-based trials. However, despite a general trend towards increased frequency of reporting on implementation (e.g. from 5% of studies in an early review to 69% at the time of writing) (Durlak 1997; Wigelsworth et al. 2016), most studies still offer only descriptive analyses (e.g. mean dosage and fidelity ratings), which are used to provide evidence that a given intervention was actually delivered and thus strengthen the internal validity of trial outcome analyses. Analyses in which researchers model implementation variability as a moderator of intervention effects remain relatively infrequent in spite of their obvious significance in terms of both internal and external experimental validity (Authors 2016a). Furthermore, existing studies have been characterised by a narrow focus on fidelity (the extent to which the implementer adheres to the intended treatment model) and dosage (how much of the intervention has been delivered), at the expense of other potentially important dimensions. These include quality (how well different programme components are delivered), participant responsiveness (the degree to which children engage with the intervention) and programme reach (the rate and scope of participation) (Durlak and DuPre 2008).

Studies of the PATHS curriculum provide a useful illustration of these issues. Many trials of PATHS simply provide descriptive implementation data (e.g. Domitrovich et al. 2007). Of those that have conducted analyses to examine the association between implementation variability and intervention outcomes, findings have been inconclusive. For example, while Berry et al. (2016) found no relationship between PATHS implementation fidelity and intervention outcomes, Schonfeld et al. (2015) demonstrated a natural variation dose-response relationship in intervention schools, with more children achieving basic proficiency in reading and maths in classes where the programme was delivered with greater frequency. Mirroring trends in the field more broadly, such studies maintained a relatively narrow focus, measuring only one or two dimensions of implementation. Humphrey, Barlow, and Lendrum (2018) conducted a more comprehensive analysis, with mixed findings. Thus, while positive associations between higher levels of quality and responsiveness and improvements in externalising problems were found, higher levels of dosage predicted reductions in pro-social behaviour and social-emotional skills. Clearly, there is much still to be learned about how implementation variability may moderate intervention outcomes in PATHS (and school-based SEL interventions more generally).

The current study

In 2012, the authors were jointly funded by the National Institute for Health Research (NIHR) and the Education Endowment Foundation (EEF) to conduct a 2-year trial of PATHS in English primary schools (ISRCTN85087674). Among the objectives of the trial were to (a) assess the impact of PATHS on a range of outcomes, including social and emotional skills, mental health and academic attainment; and (b) examine the extent to which implementation variability moderated these outcomes. An article reporting on the (mixed) effects of PATHS on social and emotional skills and mental health has already been published (Humphrey et al. 2016); here, we focus on academic progress. In addition to evaluating the impact of PATHS on academic outcomes in an intention-to-treat analysis (Gupta 2011), a subgroup analysis was also included to further our understanding of differential programme benefits for at-risk children (Durlak et al. 2011). In this case, those eligible for free school meals (FSM), a widely used proxy for lower socio-economic status, were selected as our at-risk group in view of the fact that they score significantly lower on academic outcomes than their more affluent counterparts (Department for Education 2015). In theory, it is plausible that school-based interventions such as PATHS may compensate for some of the factors that constrain the academic achievement of these students (Dietrichson et al. 2017). As such, we were interested to learn whether PATHS could ‘narrow the attainment gap’, a key policy priority (Andrews et al. 2017). Finally, in view of the evidence presented above, we sought to assess the moderating influence of implementation variability on academic outcomes.

The specific study objectives were as follows:

  1. To assess the impact of PATHS on children’s academic progress in English and Mathematics.

  2. To determine if PATHS produces amplified effects on academic progress among children eligible for FSM.

  3. To document variability in the implementation of PATHS and assess the extent to which it moderates children’s academic outcomes.

Method

Design

A 2-year cluster RCT with two arms (intervention - PATHS; and control - usual practice) was utilised. Schools were the unit of allocation. Random allocation was conducted independently of the research team by the Clinical Trials Unit at the Manchester Academic Health Science Centre (MAHSC-CTU), and was balanced by proportions of children eligible for FSM and speaking English as an additional language (EAL) via adaptive stratification (minimisation). Outcomes were assessed at baseline (summer term 2012) and at the end of the 2-year trial at post-test follow-up (summer term 2014), with structured observations of implementation taking place in between (specifically, autumn term 2013 and spring term 2014).

Participants

Schools

Fifty-eight schools were recruited from seven local authorities (LAs; akin to school districts), of which 45 met the eligibility criteria for randomisation, which included completion of baseline measures and signing a memorandum of agreement to adhere to the trial protocol. School sizes ranged from single- to three-form entry (i.e. number of classes per year group, with an average of 27 pupils per class), and were similarly split across trial arms (PATHS: single form = 11 schools, double form = 10 schools, triple form = 2 schools; usual practice (UP): single form = 12 schools, double form = 8 schools, triple form = 2 schools). Participating schools were representative of norms in England in respect of size, attendance, ethnic composition, attainment and the proportion of children identified as having special educational needs, but had higher proportions of children eligible for FSM and speaking EAL than national averages (Department for Education 2010, 2012, 2013, 2014; see Table 1). There were minimal differences in school demographics (i.e. proportions of children eligible for FSM, with special educational needs or disabilities (SEND), and speaking EAL) between the 23 schools allocated to the PATHS arm of the trial and the 22 schools continuing usual practice (F(6, 36) = 1.12, p = 0.236); however, non-significant small-to-medium differences were observed for absence (percentage of half days missed, d = 0.48) and attainment (the proportion of students achieving government standards in English and maths, d = 0.43).

Table 1 School sample characteristics and national averages

Students

Year 5

There were 1705 year 5 children in the 45 participating schools (889 male, 816 female), with an average age of 10 years, 2 months (range 9 years, 8 months to 10 years, 11 months). After accounting for school-level covariates (proportions of children eligible for FSM and speaking EAL, i.e. the minimisation variables used to ensure balance at randomisation), as is recommended practice when determining power and the minimum detectable effect size (Hedges and Hedberg 2007), the sample intra-cluster correlation coefficient (ICC) was 0.08. With an average of 38 children per cluster, an average pre-post correlation of 0.71 for our outcome data and power and alpha set to 0.8 and 0.05, respectively, the minimum detectable effect size (MDES) was determined to be 0.19.

Year 6

There were 1631 year 6 children in the 45 participating schools (834 male, 797 female), with an average age of 11 years, 3 months (range 8 years, 11 months to 11 years, 8 months). After accounting for the covariates specified above, the sample ICC was 0.04. With an average of 36 children per cluster, a pre-post correlation of 0.73 for our outcome data and power and alpha set to 0.8 and 0.05, respectively, the MDES was determined to be 0.17. Given the reported effect sizes relating to the impact of SEL intervention on academic outcomes noted earlier, our analyses were deemed to be more than adequately powered for both year groups.

The demographic characteristics of each year group cohort were consistent with national norms, albeit with the same exceptions noted above regarding school characteristics (e.g. higher proportions of EAL and FSM; Department for Education 2010, 2015). Differences between trial arms were also minimal, with effect sizes below 0.11 (see Table 2).

Table 2 Pupil sample characteristics and national averages

Teachers/classrooms

In the 23 PATHS schools, implementation data from 27/33 (81%) year 5 classes and 32/34 (94%) year 6 classes were collected in the second year of the trial. Teachers of these classes had on average 8 years of teaching experience, were predominantly female (74%) and educated to postgraduate level (53% had a postgraduate certificate, Master’s degree or doctorate), and the largest proportion (37%) reported having over 5 years’ experience implementing other SEL programmes prior to becoming involved in the current study; these experienced SEL teachers were evenly distributed across all PATHS schools.

Measures

Academic outcomes

Schools in England follow a National Curriculum divided into age-related Key Stages. At the end of each Key Stage, students are assessed in the core subjects of English, Mathematics and Science. The study design allowed us to collate this information at the project baseline, Key Stage 1 (year 2, age 7 years), and at the follow-up, Key Stage 2 (year 6, age 11 years). Key Stage 1 national curriculum assessments were used to generate pre-test outcome data relating to attainment in English and Mathematics for both year groups. These assessments were used as they offer optimal external validity (providing the metric by which the academic progress of children is judged as they transition into Key Stage 2 in England) and no additional data burden for participating schools (since the assessments are statutory). These data were extracted from the National Pupil Database (NPD).

For children in year 5, independent standardised testing of academic attainment (reading, maths) at post-test was carried out via the Interactive Computerised Assessment System (InCAS) (Merrell and Tymms 2007). The Rasch Person Reliability of the different components of InCAS exceeds 0.92 in all cases, and scores are predictive of future, external assessment scores (e.g. the correlation with Performance Indicators in Primary Schools reading test scores administered 12 months later is 0.72) (Centre for Evaluation and Monitoring 2013). InCAS assessments were administered by members of the trial research team. Scoring was undertaken by CEM, who were blind to the allocation status of individual schools and children.

For children in year 6, end of Key Stage 2 Standardised Assessment Test (SAT) English and Mathematics post-test data were available from the NPD. These assessments are administered by school staff following the government’s recommended protocol (Standards and Testing Agency 2014). Scoring was undertaken by the Standards and Testing Agency, who were blind to the allocation status of individual schools and children. As with the pre-test data, the Key Stage 2 SAT data were used as they offer optimal external validity and minimised data burden. Independent psychometric assessment of Key Stage 2 SAT data has provided evidence of strong internal consistency (α > 0.91) and classification accuracy (> 85%) for each subject (He et al. 2013), in addition to predicting future academic outcomes at ages 14 and 16, respectively (Strand 2006).

Implementation

We sought to assess five aspects of implementation (fidelity, quality, dosage, reach and participant responsiveness) outlined by Durlak and DuPre (2008) via structured observations conducted by three trained research assistants. The observation schedule drew upon implementation theory (Berkel et al. 2011), existing rubrics used in previous studies of PATHS (e.g. Kam et al. 2003), advice from the programme developer and the extant literature on the assessment of implementation (e.g. Hansen 2014). The PATHS lesson observation rubric assessed (i) fidelity: adherence to the core elements, lesson structure and objectives (three items); (ii) quality: teacher preparedness, enthusiasm, clarity and responsiveness (four items); and (iii) participant responsiveness: children’s engagement with and responsiveness to the lesson (two items). All items were scored 0–10 and a mean score per domain could be calculated (with higher scores indicative of higher levels of implementation). Participant reach was measured as the proportion of the class present for the PATHS lesson. A projected dosage indicator (% lessons delivered by the end of the school year) was recorded based on progress against the delivery schedule included in the implementation guidance manual provided to teachers (see Intervention). For example, if at the point of observation, the teacher should have been delivering lesson number 10 according to the delivery schedule, and they were in fact delivering lesson 5, this would be coded as 50%. Similarly, if at the point of observation, the teacher should have been delivering lesson 18 according to the delivery schedule, and they were in fact delivering lesson 12, this would be coded as 67%. The rubric was piloted and refined using video footage of PATHS implementation recorded in a previous trial (Berry et al. 2016). Additional footage was used to assess inter-rater reliability (ICC = 0.91). During live trial observations, each teacher was observed implementing a single PATHS lesson. A randomly selected 10% of observations were moderated in order to guard against drift over time.
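The projected dosage coding described above can be illustrated with a minimal sketch. The function name and its arguments are hypothetical and are not taken from the trial materials; rounding to the nearest whole percentage is assumed from the two worked examples in the text, and the source does not say how teachers ahead of schedule were coded, so no cap is applied.

```python
def projected_dosage(lesson_observed: int, lesson_scheduled: int) -> int:
    """Progress against the delivery schedule as a whole-number percentage.

    Assumption: the code is the lesson actually being delivered divided by
    the lesson the schedule called for at the point of observation.
    """
    if lesson_scheduled <= 0:
        raise ValueError("lesson_scheduled must be a positive lesson number")
    if lesson_observed < 0:
        raise ValueError("lesson_observed cannot be negative")
    return round(100 * lesson_observed / lesson_scheduled)


# The two worked examples from the text.
print(projected_dosage(5, 10))   # 50
print(projected_dosage(12, 18))  # 67
```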

To streamline analyses, avoid collinearity and model overfitting and establish clear differentiation between implementation constructs, exploratory factor analysis (EFA) was conducted on the data generated. This process identified two distinct factors, accounting for 69% of the explained common variance in the data, corresponding to procedural fidelity (α = 0.93), and quality and responsiveness (α = 0.93), respectively, which were clearly distinct from one another, and from the dosage and reach indicators (Authors 2017).

For the n = 59 classes observed in the current study, drawn from the second year of the trial, mean fidelity (8.09), quality/responsiveness (8.07) and reach (9.08) were uniformly high (possible range 0–10). Thus, our implementation data were demonstrative of a general trend with almost all children in a given class present, teachers adhering to most procedural elements outlined in the intervention manual, delivering lessons well and with children responding appropriately. Mean dosage was 39% (range 7–70%), signifying that at the point of the lesson observation, teachers were behind the recommended delivery schedule of two lessons a week. They were, on average, delivering slightly less than one lesson a week, and were only just over a third of the way through the curriculum that should have been delivered at the point of observation.

Intervention

PATHS is underpinned by the Affective-Behavioural-Cognitive-Developmental model of development, which emphasises the developmental integration of affect, emotion language, behaviour and cognitive understanding to promote social-emotional competence (Greenberg and Kusche 1993). Each class receives curriculum packs containing lessons and send-home activities on topics such as identifying and labeling feelings, controlling impulses, reducing stress and understanding other people’s perspectives. Associated physical resources and artifacts (e.g. posters, feelings dictionaries) are also provided. In the current study, class teachers were also given an implementation guidance manual developed by the research team that emphasised the PATHS programme theory and the importance of effective implementation (available on request from the research team).

PATHS lessons follow a standard format that includes an introduction from the teacher, a main activity and a brief plenary/closure. Prompts to elicit pupil responses and clarify learning are included throughout. The programme utilises a ‘spiral’ curriculum model (e.g. topics and concepts are revisited, units and lessons are developmentally sequenced) and is designed to be delivered by class teachers in general education classrooms. PATHS lessons last approximately 30–40 min and are designed to be delivered twice weekly throughout the school year. Curriculum packs contain 42 lessons in year 5 and 32 lessons in year 6, but average 40 lessons across all year group curriculums.

Teachers in PATHS schools received one full day of initial training with a half-day follow-up 4 months later, led by certified trainers from Pennsylvania State University (PSU). Training included a range of activities designed to familiarise teachers with PATHS theory, concepts and materials. In addition to this training, on-going technical support and assistance was provided by three coaches trained by PSU staff, who received on-going supervision throughout the trial. As per the US model, these coaching visits were bespoke to schools’ needs, but typically took place once a month/half-term and included modelling and observing PATHS lessons and providing feedback, alongside phone and email support to address concerns and queries. The coaches also played a role in helping embed PATHS across the school by way of training additional staff members (e.g. lunchtime organisers), briefing school senior leadership and governors and talking with parents.

Schools assigned to the control arm of the trial continued their usual practice. In line with calls for reports of school-based trials to offer more detail on study counterfactuals (Humphrey et al. 2016), and in recognition of the fact that the notion of an ‘untreated’ control group is a ‘fantasy’ (Durlak 2015, p. 1125), we provide details regarding what this usual practice looked like. Primary schools in England typically implement lessons on Personal, Social and Health Education (PSHE) as part of the standard school curriculum as well as other SEL and SEL-related activities. Schools in the usual practice arm of the trial reported using universal initiatives such as Social and Emotional Aspects of Learning (SEAL) whole school resources (81%) and individual lessons (62%), National Healthy Schools programme (81%) and Circle Time (57%); and targeted initiatives such as SEAL small group work (24%) and Family SEAL (10%), Targeted Mental Health in Schools (19%), Circle of Friends (19%), Nurture Groups (29%) and restorative justice (24%).

Procedure

Key Stage 1 assessments were undertaken prior to the trial, in the summer terms of 2010 (year 6 cohort) and 2011 (year 5 cohort), respectively. Randomisation of schools took place in the summer term of 2012. Key Stage 2 (year 6) and InCAS (year 5) assessments were administered at the end of the trial in the summer term of 2014. Assessment of implementation of year 5 and 6 classes took place between November 2013 and April 2014.

Figure 1 illustrates the flow of participating schools and children through the trial. In the interests of clarity, we provide separate flow information for the two cohorts because of their differential attrition rates at school (e.g. due to a school dropping out of the trial) and child (e.g. due to absence) levels.

Fig. 1 Flow of schools and pupils through main trial

Analytical strategy

Outcome data were standardised (i.e. converted to z-scores) prior to analysis to facilitate comparison of effect size (Cohen’s d) within and across models. In view of the hierarchical and clustered nature of the dataset, we used two-level (school, child) hierarchical linear modelling (HLM) with fixed effects and random intercepts in MLwiN 2.36, using IGLS (iterative generalised least squares) equivalent to maximum likelihood estimation. Intention-to-treat (ITT, objective 1) and subgroup (objective 2) analyses were performed. Group allocation (PATHS versus usual practice) and minimisation variables (proportions of pupils eligible for FSM or speaking EAL) were fitted at the school level. Sex, FSM eligibility (risk status) and pre-test academic outcome data were fitted at the child level. Interactions between group allocation and FSM were fitted to assess subgroup effects.
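As an illustrative sketch of the model described above, the two-level random-intercept specification for child i in school j might be written as follows. The notation is an assumption introduced here for clarity and is not taken from the original analysis; the interaction term applies to the subgroup model only.

```latex
\begin{aligned}
y_{ij} &= \beta_0
        + \beta_1\,\mathrm{PATHS}_j
        + \beta_2\,\overline{\mathrm{FSM}}_j
        + \beta_3\,\overline{\mathrm{EAL}}_j \\
       &\quad + \beta_4\,\mathrm{Sex}_{ij}
        + \beta_5\,\mathrm{FSM}_{ij}
        + \beta_6\,\mathrm{Pretest}_{ij}
        + \beta_7\,(\mathrm{PATHS}_j \times \mathrm{FSM}_{ij})
        + u_j + e_{ij}, \\
u_j &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2)
\end{aligned}
```

Here y is the standardised post-test outcome, the barred terms are the school-level proportions used as minimisation variables, u is the school-level random intercept and e is the child-level residual. Under this reading, the ITT effect is carried by the coefficient on PATHS and the differential effect for FSM-eligible children by the coefficient on the interaction.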

For the year 6 cohort (n = 1631), there was zero attrition at the school level and 3% (n = 49) at the child level. Accordingly, complete case analysis was undertaken as missing data were < 5% (Schulz and Grimes 2002). For the year 5 cohort (n = 1705) there were incomplete data for 35% (n = 588), made up of 16% school-level attrition (8 schools, n = 268) and 19% child-level attrition (n = 320). Analyses of missingness were therefore conducted. Comparison of schools lost to follow-up with those retained did not reveal significant differences on the characteristics noted above (F(6, 36) = 1.55, p = 0.189). Following Pampaka et al. (2016), missingness at the child level was investigated using logistic regression. Children with complete versus incomplete data did not differ in terms of their gender, language group or prior academic attainment. However, those with incomplete data were significantly more likely to be eligible for FSM (β = 0.62, p < 0.001). Accordingly, multiple imputation procedures were carried out in REALCOM-Impute, using the missing at random assumption (that is, missingness is conditional on other observed variables) (Carpenter et al. 2011). This enabled us to include both partially and completely observed cases from all 45 schools and 1705 students in the analysis, thereby reducing the bias associated with attrition. Demographic variables (e.g. gender, FSM eligibility, ethnicity, EAL, SEND provision, attendance), explanatory outcome variables (e.g. Key Stage 1 English and maths, InCAS reading and maths at follow-up) and the constant were entered as auxiliary variables and used to impute missing values. REALCOM-Impute default settings of 1000 iterations and a burn-in of 100, refresh of 10, were used, following guidance for multi-level imputation with mixed response types (Carpenter et al. 2011).
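The decision rule applied to the two cohorts above can be expressed as a short sketch. The function name and return labels are hypothetical; only the 5% threshold and the cohort counts come from the text, and the sketch does not reproduce the missingness analyses or the imputation itself.

```python
def missing_data_strategy(n_total: int, n_incomplete: int,
                          threshold: float = 0.05) -> str:
    """Pick an analysis strategy from the proportion of incomplete cases.

    Assumption: below the threshold, complete case analysis is used;
    otherwise missingness is investigated and multiple imputation applied.
    """
    if n_total <= 0 or not 0 <= n_incomplete <= n_total:
        raise ValueError("counts must satisfy 0 <= n_incomplete <= n_total")
    proportion_missing = n_incomplete / n_total
    if proportion_missing < threshold:
        return "complete_case"
    return "multiple_imputation"


# Cohort counts as reported in the text.
print(missing_data_strategy(1631, 49))         # year 6 -> complete_case
print(missing_data_strategy(1705, 268 + 320))  # year 5 -> multiple_imputation
```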

Implementation data from PATHS schools were analysed using a person-centred approach (namely, cluster analysis). This was in recognition of the fact that the variable-driven techniques used in many studies (where each dimension of implementation is assumed to be independent) are likely adopted out of convention or convenience and do not reflect the inter-relations between dimensions proposed by most theoretical models of implementation (Low et al. 2016). Cluster analysis was selected as it has been shown to produce higher within-group homogeneity and between-group heterogeneity than other person-centred approaches (Eshghi et al. 2011). In order to maximise sample size, the analysis was conducted using data from all classes observed during the course of the full trial (n = 127) as opposed to just the year 5 and 6 classes reported here (n = 59).

Hierarchical cluster analysis (HCA) using Ward’s method and applying squared Euclidean distance was undertaken in SPSS version 22. The four implementation indicators (procedural fidelity, quality/responsiveness, dosage and reach) identified in the aforementioned factor analysis were used as classifying variables to determine the optimum number of clusters. A suitable cluster solution was accepted based on consideration of both the dendrogram plots and the agglomeration schedule coefficients. A second run of the HCA was then used in order to assign cluster membership to each class teacher for use in later analyses. Finally, multivariate analysis of variance (MANOVA) was conducted to determine whether the classifying variables were significantly different between the clusters.
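To illustrate the clustering technique named above, the following is a minimal pure-Python sketch of agglomerative clustering with Ward’s criterion on squared Euclidean distance. It is not the SPSS procedure used in the trial: the function name and the four toy class profiles (fidelity, quality/responsiveness, dosage %, reach) are invented for illustration, and no rescaling of the variables is applied because the text does not describe any.

```python
from itertools import combinations


def ward_clusters(points, n_clusters):
    """Merge clusters by Ward's criterion until n_clusters remain.

    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            (ma, ca), (mb, cb) = clusters[a], clusters[b]
            sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
            # Ward's criterion: increase in within-cluster sum of squares.
            cost = len(ma) * len(mb) / (len(ma) + len(mb)) * sq_dist
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append((ma + mb, centroid))
    return sorted(sorted(members) for members, _ in clusters)


# Hypothetical class profiles: (fidelity, quality/responsiveness, dosage %, reach).
toy_classes = [
    (8.0, 8.5, 65.0, 9.0),
    (8.2, 7.9, 60.0, 9.5),
    (7.5, 8.0, 20.0, 9.0),
    (8.1, 8.3, 15.0, 8.8),
]
print(ward_clusters(toy_classes, 2))  # [[0, 1], [2, 3]]
```

In this toy example the two groups separate on dosage because it is the only variable with large raw differences between classes.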

Having established distinct implementation profiles, these were fitted at the class level in two-level (class, child) HLMs, using a dummy variable approach in which the profile representing the lowest levels of implementation was used as the reference group. As above, sex, FSM eligibility and pre-test academic outcome data were fitted at the child level. Multiple imputation procedures were carried out for the year 5 analysis as missing data were > 5% (as outlined above), and complete case analysis was undertaken for the year 6 data.
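The dummy variable approach described above can be sketched as follows. The function name, the indicator naming scheme and the profile labels are hypothetical; the sketch only shows how a categorical profile becomes a set of 0/1 indicators with the reference group coded as all zeros.

```python
def dummy_code(profiles, reference):
    """Return one dict of 0/1 indicators per class, omitting the reference.

    Assumption: each non-reference profile gets its own indicator, so model
    coefficients are contrasts against the reference profile.
    """
    if reference not in profiles:
        raise ValueError("reference profile does not occur in the data")
    levels = sorted(set(profiles) - {reference})
    return [
        {f"is_{level}": int(profile == level) for level in levels}
        for profile in profiles
    ]


# Hypothetical profile labels for three classes; "low" is the reference.
coded = dummy_code(["low", "high", "moderate"], reference="low")
print(coded[0])  # {'is_high': 0, 'is_moderate': 0}
print(coded[1])  # {'is_high': 1, 'is_moderate': 0}
```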

Results

Impact of PATHS

Unconditional models (i.e. empty models without the inclusion of explanatory variables) demonstrated that the school level of the model accounted for a significant amount of variance, ranging from 10% (year 5 reading) to 16% (year 5 maths). The conditional ITT models outlined above demonstrated that PATHS had no discernible impact upon children’s attainment. This was consistent across year groups (year 5 and year 6), outcome measures (InCAS and Key Stage 2 assessment), subject areas (Maths and English/Reading) and analytical frames (ITT and subgroup) (see Table 3; all p > 0.05).
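The proportion of variance attributed to a given level in an unconditional model is that level's variance component divided by the sum of all variance components. A minimal Python sketch is given below; the function name and the variance components are hypothetical and chosen only so that the result falls inside the range reported above.

```python
def variance_share(components, level):
    """Proportion of total variance attributable to one level of the model."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return components[level] / total


# Hypothetical variance components from an empty model (not trial estimates).
components = {"school": 13.0, "lower_levels": 87.0}
print(variance_share(components, "school"))  # 0.13, i.e. 13% at school level
```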

Table 3 Hierarchical linear models of the impact of PATHS curriculum on academic outcomes

PATHS implementation profiles

The analysis of implementation data indicated that a five-cluster solution was optimal, as at this point the agglomeration coefficients stabilised. In addition, inspection of the dendrogram showed five clear clusters: two major and three minor. Of note, however, none of the year 5 or 6 classes whose data are utilised in the current study belonged to the first cluster (C1, the very high dosage group; see below). Thus, for the analyses reported herein, only four clusters (C2 to C5) are used. Mean levels of implementation fidelity, quality/responsiveness, dosage and reach in each cluster are outlined in Table 4. The MANOVA indicated that the clusters were significantly different across the classifying variables (F(16, 488) = 10.11, p < 0.001); however, follow-up univariate ANOVAs revealed that this was driven primarily by differences in dosage (F(4, 122) = 336.04, p < 0.001). Consequently, the five clusters were labelled on this basis as very high dosage (C1), high dosage (C2), moderate dosage (C3), low dosage (C4) and very low dosage (C5), respectively. It is noteworthy that despite dosage being the principal differentiating variable among the implementation clusters, the high dosage observed in C2 appears to be somewhat at the expense of quality/responsiveness, with lower levels of the latter compared with all other clusters except for C5 (see Fig. 2).
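For readers less familiar with the follow-up univariate tests, the sketch below computes a one-way ANOVA F statistic and its degrees of freedom in plain Python. The function name and the scores are hypothetical and are not trial data. The degrees of freedom follow from the number of groups k and the total number of observations N as (k - 1, N - k), so five clusters across 127 classes give (4, 122), matching the values reported for dosage.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical dosage scores for three small clusters (not trial data).
dosage_by_cluster = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(one_way_anova(dosage_by_cluster))  # (27.0, 2, 6)
```

A large F, as in this toy example, indicates that differences between cluster means are large relative to the variation within clusters.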

Table 4 Descriptive statistics (n, means and SDs) for implementation clusters
Fig. 2 Implementation factors plotted by cluster group

Unconditional models (i.e. empty models without the inclusion of explanatory variables) demonstrated that the classroom level of the model accounted for a significant amount of variance, ranging from 8% (year 6 maths) to 18% (year 6 English). Analyses of the moderating influence of PATHS implementation variability (see Table 5) did not find any significant associations between cluster membership and academic outcomes. This was consistent across year groups (year 5 and year 6), outcome measures (InCAS and Key Stage 2 assessment) and subject areas (Maths and English/Reading) (all p > 0.05).

Table 5 Hierarchical linear models of the association between implementation variability and academic outcomes in the PATHS curriculum

Discussion

The current study is among the first to rigorously assess the impact of PATHS on children’s academic outcomes. We found no discernible effect of the programme compared with usual practice. This was consistent across year groups, outcome measures (both national and independent standardised assessments), subject areas and analytical frames (ITT and at-risk subgroup). Our second objective was to document profiles of PATHS implementation and examine their moderating role in relation to intervention outcomes, using a more comprehensive (e.g. extending beyond dosage and fidelity) and theoretically informed (e.g. person-centred) approach than has been evident in previous research. Our analyses identified distinct implementation profiles, differentiated primarily by varying levels of dosage. However, profile membership was not significantly associated with any academic outcomes.

Previous research on the impact of SEL interventions on academic outcomes has evidenced small but nonetheless practically meaningful effects (Corcoran et al. 2018; Durlak et al. 2011; Goldberg et al. 2018; Sklad et al. 2012; Wigelsworth et al. 2016). The null ITT and subgroup findings of the current study may therefore appear incongruous. However, there are multiple potential explanations for these findings. First, the PATHS intervention may not be sufficiently distinct from usual practice in English schools (that is, there was limited programme differentiation) to produce measurable effects. As noted, many of the control schools reported using numerous alternative approaches to SEL. In particular, PATHS shares a number of features with the primary SEAL programme, which was the default universal SEL intervention for most schools in England during the trial period. Thus, we might conclude from the current study not that PATHS is completely ineffective in improving children’s academic outcomes, but that it is no more effective than the SEL provision already in place in English primary schools. Future research may wish to include measures of ‘general’ SEL implementation/quality in both treatment and control schools to explore this analytically. Furthermore, the current study was conducted in the UK, in which the effects of PATHS documented through other trials have been weaker and more mixed (Ross et al. 2011) and in some cases null (Berry et al. 2016) than in the US-based studies that preceded them. Therefore, the null results observed here may reflect a wider issue pertaining to cultural transferability (Lendrum and Wigelsworth 2013).

Secondly, as an SEL intervention, the primary aim of PATHS is to promote social and emotional skills; effects on academic outcomes are viewed as an indirect, distal effect of these proximal changes. The PATHS logic model views learning as a social process; therefore, the extent to which children are able to manage their behaviour and understand their emotions (skills promoted in the PATHS curriculum) will ultimately shape how well pupils work together, which in turn will likely influence how well they do academically (Humphrey 2013; Schonfeld et al. 2015). Given that improvements in academic attainment are viewed as a distal outcome of PATHS, a case could be made that our trial design was not of sufficient length to allow measurable improvements in attainment to be triggered. Therefore, it would be useful to assess follow-up effects of PATHS on academic outcomes in the future, as a means to identify so-called ‘sleeper’ effects. A related possibility is that the (small) intervention effects we identified in relation to more proximal outcomes of PATHS (e.g. social and emotional skills and psychological well-being) (Humphrey et al. 2016; Humphrey et al. 2018) were of insufficient magnitude to trigger downstream changes in academic outcomes.

However, this does not adequately explain the difference in findings between the current study and that of Schonfeld et al. (2015), who reported significant improvements in both reading and maths (albeit attainment of basic level of proficiency as opposed to the degree of change in test scores reported in this study) as a result of exposure to PATHS. One possibility is that the duration of exposure is crucial, as the Schonfeld trial (ibid) was conducted over 4 years (as opposed to 2 years, as in the current study). However, the recent meta-analyses of universal SEL interventions (Durlak et al. 2011; Goldberg et al. 2018; Sklad et al. 2012; Wigelsworth et al. 2016) each reported meaningful improvements in academic attainment for interventions over much shorter periods of time (e.g. 77% of the interventions reported by Durlak et al. (2011) lasted less than 1 year).

Analyses of implementation identified five clusters, which primarily differed on dosage, while other aspects of implementation (i.e. fidelity, quality/responsiveness and reach) were relatively high and comparable across clusters. This highlights that when delivering PATHS, the schools were engaged and that the key challenge faced was finding time to fit the PATHS lessons into the curriculum; indeed, this was noted as the greatest barrier to implementation (Humphrey et al. 2018). Despite the fact that the current study was an efficacy trial with externally trained coaches and support mechanisms in place to facilitate the quality and delivery of the programme, variability in delivery, specifically the frequency of lesson implementation, was noted, and could thus be monitored and assessed to allow us to determine the extent to which it moderates children’s academic outcomes. However, this implementation variability did not appear to moderate these outcomes. As with our impact analyses, this was consistent across year groups, outcome measures and subject areas. This finding is in contrast to much previous research, both in relation to PATHS and the field of SEL more generally (Durlak 2016). However, as noted earlier, most analyses of the relationship between implementation and outcomes in school-based interventions published to date have adopted a variable-centred approach. Findings from person-centred analyses, which remain under-utilised despite arguably better reflecting implementation theory (e.g. Berkel et al. 2011), are not directly comparable with those derived from such studies. However, the few examples of person-centred implementation analyses of school-based SEL and related interventions (e.g. Low et al. 2016) have established significant associations between overall implementation quality and intervention outcomes, including academic scores (e.g. Dix et al. 2012). Hence, it is unlikely that the use of person-centred modelling accounts for our findings. Rather, it may simply be that in PATHS, variability in implementation does not influence academic outcomes. As Durlak has noted, ‘we should not assume that each component [of implementation] is equally important for all possible outcomes’ (2015, p.1126).

Strengths and limitations

The current study has numerous strengths, increasing the confidence that can be placed in our principal (impact) findings. We utilised a cluster-randomised design with appropriate analysis that took account of the hierarchical and clustered nature of the dataset. Randomisation was conducted independently of the evaluation team. The trial was large and well powered, with MDESs that were below the average effects previously identified in the literature (Corcoran et al. 2018; Durlak et al. 2011; Goldberg et al. 2018; Sklad et al. 2012; Wigelsworth et al. 2016). Balance between trial arms was good. There was virtually no attrition in the year 6 cohort, and though there was moderate loss to follow-up in the year 5 cohort, we addressed this via multiple imputation in order to minimise bias. Our findings were cross-validated across different year groups (i.e. years 5 and 6), subjects (i.e. English/reading and maths) and methods of assessment (i.e. the independent standardised InCAS assessments and the national standardised Key Stage 2 assessment tests). The outcome measures used can be considered reliable, externally valid and not intervention-specific (i.e. not ‘inherent to treatment’). The research was conducted across seven LAs, and trial school composition mirrored that of primary schools in England in respect of size and the proportion of students speaking EAL (albeit with larger proportions of children with SEND and eligible for FSM, in addition to lower rates of absence and attainment).

Nonetheless, there are two key limitations to note. First, our assessment of implementation revealed that dosage appeared to be suboptimal (as a reminder, teachers were implementing PATHS at slightly less than half the recommended frequency in the second year of the trial). Other studies of PATHS have reported similar dosage rates (e.g. Faria et al. 2013). Although this is not a limitation of our research design per se, it could be argued that the apparent failure of PATHS to improve children’s academic attainment in this trial was attributable to implementation failure, with children needing a certain amount of consistent exposure in order to produce the kind of meaningful changes in their social and emotional skills and improvements in their classroom climate that could then feasibly influence their learning and attainment. However, even if this is indeed the case, questions are still raised in relation to the feasibility of PATHS as a tool for improving the attainment of children in England. That is, if schools in a major trial in which training, materials and external support and assistance were made available at no cost were not able to deliver PATHS at the frequency and consistency required to trigger academic change, what is the likelihood that schools will be able to do this in typical circumstances/conditions? It should also be recognised that bringing in a major evidence-based programme does not necessarily engender commitment on the part of teachers; in some instances, this change can be met with reticence and reluctance, which could manifest in reduced dosage or in difficulties in creating the climate, space and context to ensure implementation would take place. Nonetheless, implementation did not moderate the relationship with academic outcomes: schools that delivered more PATHS lessons did not observe increased English and maths scores compared with those that delivered fewer.

Second, although generally considered to be a more valid means through which to assess implementation than the more frequently used teacher self-report surveys (Humphrey 2013), independent observations only provide a snapshot of activity that may or may not be representative of implementation activity across the school year. Furthermore, it is impossible to rule out so-called ‘observer effects’ (that is, a change in behaviour as a result of knowing one is being observed).

Conclusion

The findings of this trial highlight the importance of reporting null results (Fiennes 2018). Programme developers need to know whether the previously reported effects of their intervention can be replicated, and where this is not the case, it gives them the opportunity to adapt it to ensure its utility in different countries. Furthermore, funding bodies and schools need to be aware of the efficacy of a given programme, particularly in their specific cultural context, before they make a decision to adopt and implement it. More broadly, dissemination of findings such as those reported in the current study helps to address the widely publicised publication bias towards statistically significant, ‘positive’ results (Fanelli 2010). This is important for the advancement of knowledge in the field and in reducing the disconnect between scientific worth and scientific culture (Matosin et al. 2014).

Thus, in light of the security of our findings and their likely generalisability, it is not possible to recommend PATHS as an effective intervention for improving the academic attainment of children in English primary schools. However, this does not necessarily mean that it is not beneficial in other ways, and continued research to explore this possibility seems warranted (Humphrey et al. 2018).