Management in education systems

There is increasing interest in measuring management in schools. This paper discusses a popular measurement tool: the World Management Survey (WMS) for schools. Drawing on WMS data, secondary sources, and the recent literature on school management, we take stock of the WMS and make recommendations for its use in future research and policy. We conclude that the WMS remains a highly useful tool for its stated purpose—the standardized measurement of (a subset of) management practices within schools—and make two sets of recommendations. First, we encourage those seeking to benchmark management practices in schools to take a systems perspective by extending the WMS approach upwards into the education bureaucracy. Second, when measuring practices within schools, we recommend that researchers consider: how best to assess alignment across practices in the operations domain; the challenge of measuring student learning for monitoring and target-setting; and the context specificity of people management.


I. Introduction
There is increasing interest in measuring management in schools, as researchers and policy-makers try to understand which management practices can improve classroom teaching and student learning outcomes. Among economists, the World Management Survey (WMS) for schools is increasingly seen as the gold standard for standardized comparisons of school management, with the instrument (and associated data) being used dozens of times since its inception in 2009.
In a spirit similar to the article by Scur et al. (2021) in this issue, our goal is to take stock of the WMS for schools and to offer recommendations for its future use in research and policy. We begin in section II by setting out the objective of the WMS for schools and the management practices that were selected for inclusion using its monotonic 'more is better' scale. Drawing on data from the original survey waves, we also describe which types of schools score highly on which management practices. Next, in section III, we offer our reflections on the WMS for schools, based on this analysis of the original data, secondary sources, and a review of the recent literature on school management. 1 Finally, in section IV we present our main conclusions. We argue that the WMS for schools remains a highly useful tool for the purpose for which it was designed: benchmarking of specific school-level management practices. However, we encourage researchers to view management in schools from a systems perspective (and hence to expand the WMS approach 'upwards' into the education bureaucracy), and to think carefully about the importance of alignment, the challenge of measuring student learning, and the context-specific nature of people management when applying the WMS instrument within schools.

II. The World Management Survey for schools
(i) What is the objective of the WMS for schools, and which practices are included?
As Scur et al. (2021) note in their article in this issue, the goal of the WMS project is to systematically collect data on the usage of management practices at scale, while ensuring comparability across settings and over time. For this reason, the original WMS was based on a selected set of management practices: those for which the WMS team found a high degree of consensus ex ante (among consultants and industry experts) that the practice was likely to be 'good', in the sense of being a causal determinant of better firm performance. For these selected practices, it was felt that adoption (of the practice) could be scored on a monotonic 'more is better' scale, with limited adoption being given a score of 1 and thorough adoption a score of 5. As with prior surveys, whether this scoring is appropriate is ultimately an empirical question, and it was tested with performance data after the pilot survey wave.
The WMS for schools was developed in 2009, 7 years after the original survey. 2 Again, the aim was not to be exhaustive but rather to select school management practices that were seen (by education practitioners and academics whom the team consulted) at that time to be important, and where adoption could plausibly be scored on a monotonic 'more is better' scale.
The management practices that were selected for inclusion in the WMS for schools fall under the same four domains used in the original survey: operations, monitoring, target setting, and people management. The 20 individual practices are listed in Appendix Table A1, together with a descriptor for thorough adoption of each practice (the top of the monotonic 'more is better' scale). It is clear from the table that the people management practices relate to individual teachers, whereas the operations, monitoring, and target-setting practices relate to the school as an organization. For this reason, we follow the WMS convention and refer to the grouping of operations, monitoring, and target-setting practices as 'non-people' management.
1 We focus on the substance of the WMS for schools, i.e. the selection of management practices, rather than the survey methodology (in which trained enumerators pose open-ended interview questions to school principals, and then numerically score their responses based on a detailed rubric; see Bloom et al. (2015)).
2 The full survey instrument for the WMS for schools can be found online at www.worldmanagementsurvey.org. The Development WMS for schools (Lemos and Scur, 2016) is an expanded instrument for use in developing countries and is available at www.developingmanagement.org.
(ii) Which types of schools score highly on which management practices?
Results from the initial waves of the WMS for schools are reported in Bloom et al. (2015) and are based on 1,849 schools, across eight countries. The authors classify these schools into three groups depending on their source of funding and their degree of operational autonomy. As Table 1 shows, there are: 1,237 'regular government schools' that are publicly funded and operate within the public regulatory framework; 483 'private schools' that are privately funded and operate using a school-specific charter; and 129 'autonomous government schools' that receive some public funding and have operational autonomy in at least one of curriculum content, teacher selection, and student admissions. These totals do, of course, mask differences across countries. Notably, the WMS for schools sample for Sweden does not include any private schools, and the sample for Italy does not include any autonomous government schools.
Bloom et al. (2015) explore whether there are differences in the overall management score (obtained by aggregating the individual scores, measured on the 1-5 scale, for each of the 20 practices in Table A1) by this tripartite classification of school type. They report that, among the OECD WMS countries (Canada, Germany, Italy, Sweden, the UK, and the US) and also for Brazil, autonomous government schools have higher overall management scores than regular government schools (see their Figure 3). This difference in mean scores persists even after including a variety of school and survey controls (see their Table 5).

Table 1
Country    Total    Regular government    Autonomous government    Private
Brazil     513      372                   3                        138
Canada     146      111                   17                       18
Germany    140      124                   12                       4
India      318      109                   22                       187
Italy      284      222                   0                        62
Sweden     88       65                    23                       0
UK         92       43                    38                       11
US         268      191                   14                       63
Notes: Data for 1,849 schools from the sample constructed by Bloom et al. (2015) but less two observations for the US which were found to be duplicates. Bloom et al. (2015) define autonomous government schools as schools receiving at least partial funding from the government and with at least limited autonomy in one of three areas: establishing the curriculum content; selecting teachers; and admitting pupils. Details are provided in Table 1 of their paper.
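The school counts in Table 1 can be cross-checked against the totals quoted in the text. The following is a minimal sketch in Python, not part of the original analysis. The assignment of each number to a school type is an assumption inferred from the text (for example, Sweden has no private schools and Italy has no autonomous government schools), and the names `TABLE_1`, `column_totals`, and `country_totals` are purely illustrative.

```python
# School counts by country, transcribed from Table 1.
# Tuple order (an assumption inferred from the text):
# (regular government, autonomous government, private)
TABLE_1 = {
    "Brazil": (372, 3, 138),
    "Canada": (111, 17, 18),
    "Germany": (124, 12, 4),
    "India": (109, 22, 187),
    "Italy": (222, 0, 62),
    "Sweden": (65, 23, 0),
    "UK": (43, 38, 11),
    "US": (191, 14, 63),
}


def column_totals(table):
    """Sum each school type across all countries."""
    regular = sum(row[0] for row in table.values())
    autonomous = sum(row[1] for row in table.values())
    private = sum(row[2] for row in table.values())
    return regular, autonomous, private


def country_totals(table):
    """Sum the three school types within each country."""
    return {country: sum(row) for country, row in table.items()}


regular, autonomous, private = column_totals(TABLE_1)
print(regular, autonomous, private, regular + autonomous + private)
# prints: 1237 129 483 1849
```

Under this column assignment, the per-type sums reproduce the 1,237 regular government, 129 autonomous government, and 483 private schools cited in the text, and the grand total matches the 1,849 schools in the sample.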
We extend the analysis in Bloom et al. (2015) in two ways, first by disaggregating the overall management score into a non-people management score vs a people management score, and second by exploring whether there are differences in the variation of management scores across school types and not simply the mean. 3 Since there are so few autonomous government schools, we pool this group with private schools, referring to this category simply as 'other schools'. Table 2 reports results for the non-people management score (the average of operations, monitoring, and target setting) and the people management score. 4 This disaggregation turns out to be revealing.
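To make the disaggregation concrete, the sketch below computes a non-people and a people management score for a single school. It is illustrative only. It assumes that each domain score is the mean of that domain's practice scores and that the non-people score is the mean of the three domain scores, which is one reading of 'the average of operations, monitoring, and target setting'. The practice scores, the number of practices per domain, and all names are hypothetical.

```python
from statistics import mean

# Hypothetical 1-5 practice scores for one school, grouped by WMS domain.
# The number of practices per domain here is illustrative, not the WMS layout.
school = {
    "operations": [3, 2, 3, 2],
    "monitoring": [2, 3, 3],
    "target_setting": [2, 2, 3],
    "people": [2, 1, 2, 2],
}

NON_PEOPLE_DOMAINS = ("operations", "monitoring", "target_setting")


def domain_scores(practice_scores):
    """Average the practice scores within each domain."""
    return {domain: mean(scores) for domain, scores in practice_scores.items()}


def non_people_score(practice_scores):
    """Average the three non-people domain scores (assumed aggregation rule)."""
    domains = domain_scores(practice_scores)
    return mean(domains[d] for d in NON_PEOPLE_DOMAINS)


def people_score(practice_scores):
    """Average the people management practice scores."""
    return mean(practice_scores["people"])


print(round(non_people_score(school), 2), round(people_score(school), 2))
```

Averaging domain scores gives each domain equal weight regardless of how many practices it contains; averaging directly over all non-people practices would weight domains by their number of practices instead.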

Non-people management
Pooling across countries, the average non-people management score is higher among regular government schools (2.43) than among other schools (2.31). The difference is small at 0.12 points (on a scale of 1-5) but is statistically significant at 1 per cent. This pooled result runs counter to the finding in Bloom et al. (2015) for the overall management score and is clearly driven by a subset of countries, namely Canada, Italy, and the US. In the two non-OECD countries, Brazil and India, as well as Sweden, the average non-people management score is lower among regular government schools than in other schools. The difference in means is small in India, but larger in Sweden (0.34 points) and Brazil (0.22 points), and is statistically significant at 1 per cent in all three countries.
Brazil and Sweden are also notable in terms of the variance of the non-people management score. In these two countries, non-people management scores are less dispersed among regular government schools than among other schools, as shown by the smaller standard deviations in Table 2. We can reject the null of equal variances at 5 per cent (or less) for both countries. For the other six countries, the difference in standard deviation across school types is small and not statistically significant at conventional levels.
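The comparisons of means and dispersion described above can be sketched as follows. The text does not state which tests were used, so the choice of Welch's t statistic for the difference in means and a simple variance ratio for the difference in dispersion is an assumption, and the score lists are invented for illustration rather than taken from the WMS data.

```python
from math import sqrt
from statistics import mean, stdev, variance


def compare_groups(regular, other):
    """Compare mean and dispersion of scores between two school types.

    Returns the difference in means (regular minus other), Welch's t
    statistic for that difference, each group's sample standard deviation,
    and the ratio of sample variances (other over regular).
    """
    diff = mean(regular) - mean(other)
    # Welch standard error: does not assume equal variances across groups.
    se = sqrt(variance(regular) / len(regular) + variance(other) / len(other))
    return {
        "diff_in_means": diff,
        "welch_t": diff / se,
        "sd_regular": stdev(regular),
        "sd_other": stdev(other),
        "variance_ratio": variance(other) / variance(regular),
    }


# Hypothetical non-people management scores (1-5 scale) for two school types.
regular_schools = [2.2, 2.4, 2.3, 2.5, 2.4, 2.3]
other_schools = [1.6, 2.9, 2.1, 3.2, 1.8, 2.6]

result = compare_groups(regular_schools, other_schools)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

A variance ratio well above 1 corresponds to the pattern reported for Brazil and Sweden, where scores are less dispersed among regular government schools. Turning these statistics into p-values requires a reference distribution; in practice one might use `scipy.stats.ttest_ind` with `equal_var=False` and `scipy.stats.levene`, though again this is a suggestion rather than the authors' stated method.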
In Figure 1, we present these results graphically by plotting the discrete density of the non-people management score in regular government schools (pale bars) in each of the eight WMS countries, overlaying the smoothed kernel density of the non-people management score in other schools (plotted line) for comparison. For Brazil and Sweden, the discrete density is clearly less dispersed than the smoothed kernel density, but this is not the case for the other six countries.

People management
Pooling across countries, the average people management score is lower among regular government schools (1.92) than among other schools (2.18). The difference is large at 0.26 points and is again statistically significant at 1 per cent. In contrast to the non-people management score, this finding is not driven by a subset of countries; the sign of the difference in means is the same (lower in regular government schools) across all eight countries. The difference in means is largest in Brazil (0.72 points), Germany (0.47 points), Sweden (0.39 points), and Italy (0.28 points), and is statistically significant at 5 per cent or less in every country except Canada and the UK. In Brazil, Germany, Sweden, and Italy, people management scores are also less dispersed among regular government schools than among other schools, as shown by the smaller standard deviations in Table 2. We can reject the null of equal variances at 5 per cent or less for all four countries. In Canada, India, the UK, and the US, the standard deviation of the people management score is lower in regular government schools than in other schools, but the differences are small and (with the exception of the UK) not statistically significant at conventional levels.
Figure 2 reproduces Figure 1 for the people management score. The plots for Brazil, Sweden, and Germany are particularly striking: people management scores are substantially less dispersed in regular government schools than in other schools.
To sum up, there are two takeaways from this re-examination of the Bloom et al. (2015) data. First, when studying adoption patterns across types of school it is important to distinguish between non-people and people management practices. Although the mean overall management score is lower in regular government schools than other schools, in several OECD countries the mean non-people management score is actually higher in regular government schools than other schools. It is not true that management practices are universally worse (on the WMS coding) in schools that are publicly funded and that operate within the public regulatory framework, as the regular government schools in the WMS for schools dataset for Canada, Italy, and the US illustrate.
Second, when studying adoption patterns across types of school it is instructive to look at the variation in management practices, and not simply the mean. In Brazil and Sweden, both non-people and people management scores are lower and less dispersed in regular government schools than in other schools. In Germany and Italy, this is true for people management scores, although not for non-people management scores. The fact that management scores vary less in schools that have been granted less operational autonomy is, of course, intuitive and points to the importance of thinking about school management in the context of the wider education system. We return to this issue in section III(i) below.

III. Reflections on the WMS for schools at 11
In their article in this issue, Scur et al. (2021) reflect on the WMS at 18. The WMS for schools is much younger but, over the decade since its creation, much research has been undertaken on the topic of management in education and so reflection is also timely. In this section, we focus on two issues. The first is scope: whether the WMS approach should be broadened to bring in the wider education system. The second is measurement: whether the management practices that were selected for inclusion in the WMS for schools in 2009 (on the monotonic 'more is better' scale) remain appropriate today, and whether there are new areas worthy of focus.
Bloom et al. (2015) report that the overall management score is, on average, higher in autonomous government schools than in regular government schools. Responding to this finding, they attempt to answer the question 'what explains the advantage of autonomous government schools?' by regressing the overall management score on a set of school type dummies (with regular government school as the omitted category) and a wide range of school and principal controls. The positive coefficient on autonomous government school remains more or less stable until controls are added for 'principal accountability' (the degree to which the principal is accountable to institutional stakeholders such as school external boards) and 'principal strategy' (the degree to which the principal communicates a well-articulated strategy for the school over the next 5 years). In these specifications, the coefficient roughly halves in size (0.129 compared to 0.233 in the specification with no additional controls) but remains both economically and statistically significant. Differences in school and principal characteristics therefore account for some of the overall management score advantage of autonomous government schools, but certainly not all of it. Another way to think about the difference in management scores across school types is from a systems perspective.
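As a rough illustration of the attenuation described above, the share of the raw coefficient that disappears once the principal controls are added can be computed directly from the two reported values. This is a back-of-the-envelope heuristic rather than a formal decomposition, and the function name is hypothetical.

```python
def share_accounted_for(baseline, with_controls):
    """Fraction of the baseline coefficient that disappears once controls are added."""
    return (baseline - with_controls) / baseline


# Coefficients on 'autonomous government school' as reported in the text.
share = share_accounted_for(baseline=0.233, with_controls=0.129)
print(f"{share:.1%}")  # prints: 44.6%
```

On this simple reading, somewhat under half of the raw advantage is associated with the added controls, which is consistent with the statement that they account for some, but not all, of the difference.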

(i) Bringing in the education system
What is a systems perspective? Figure 3 provides a visual conceptualization. The solid line depicts the management relationship within schools between principals and their teachers and students. This is the focus of the WMS for schools: the questions prompt principals to describe the (operations, monitoring, target-setting, and people management) practices that they use to manage teachers and students in their schools. The dotted line depicts the management relationship in the wider education system between education authorities (the Ministry of Education and its bureaucracy) 5 and schools (principals, teachers, and students). The WMS for schools does not focus on this management relationship, but there are frameworks that do. The RISE Accountability Framework, for instance, proposes five distinct 'design elements' for this relationship: 6
- Delegation. Education authorities specify what they want done: the objectives or goals that schools and teachers should achieve.
- Finance. Education authorities set a budget for schools. This may include line items for teacher pay and bonuses, or teacher pay may be determined centrally.
- Support. Education authorities design and deliver curricula, learning materials, and training to schools and teachers.
- Information. Education authorities determine the information that will be collected from schools (e.g. national assessment systems, school inspections, EMIS reporting).
- Motivation. Education authorities specify what will happen to schools and teachers if the outcomes are good (relative to the delegation specified and based on the information available) versus if the outcomes are bad. These can be positive or negative and intrinsic or extrinsic (pecuniary) motivators. 7
5 Education authorities may also spread beyond the education ministry, e.g. when teacher salaries are administered by the finance ministry and hiring rules are controlled by the civil service authority (e.g. Huang et al., 2020). Also, not all education authorities are government entities. Certain non-state actors may have significant discretion over school management, such as the head offices of large private school chains like Bridge International Academies, or commercial examination boards like Cambridge Assessment.

Figure 3: Management in education systems
6 The original RISE Accountability Framework built on the World Development Report (World Bank, 2004) and proposed four design elements (see Pritchett, 2015); 'support' was added to the list subsequently.
7 Additionally, teachers and school principals may face affective, social, or reputational motivators that are beyond the formal control of education authorities, but are nevertheless contingent on their fulfilment of the delegation specified by education authorities (e.g. being shunned by colleagues for excessive absenteeism, or for 'excessive' conscientiousness).
Evidently, the choices made under each design element will impact the ability of principals in government-funded and regulated schools to adopt management practices that score highly on the WMS coding. If, under the finance element, education authorities decide to retain control of teacher pay and allocate school budgets for non-people line items only, principals will have little scope to adopt practices that score highly on the question about 'attracting talent'. Similarly, if, under the motivation element, education authorities decide not to allow performance-based rewards (bonuses and/or promotion) and sanctions (dismissals or job reallocation/reposting), principals will have little scope to adopt practices that score highly on questions about 'rewarding high performers' and 'removing poor performers'. In short, principals in regular government schools do not operate in a vacuum, but rather are nested within a hierarchy of management relationships within the wider public education system, and this may impose constraints that are not present (or present to a lesser degree) in autonomous government schools and private schools. 8 If principals in regular government schools do face such systemic constraints, then we would expect to see management scores that are uniformly lower than in other schools (because the explanation lies at the system level rather than school level). This is indeed what we find in four of the eight WMS countries. Recall that, in Brazil and Sweden, both non-people and people management scores are significantly lower and significantly less dispersed among regular government schools than among the group of other schools (autonomous government schools and/or private schools depending on the country). And in Germany and Italy, this is true for people management scores, although not scores in the non-people domains.

How do system features vary across WMS countries?
To illustrate these arguments about the influence of the wider education system on school management, we compare the distribution of WMS scores from section II(ii) with data on education systems from other cross-country sources. Specifically, we outline possible correspondences between people management scores and one aspect of the management relationship between education authorities and schools-decentralization of decision-making discretion-as well as one aspect of the political economy of education systems-teacher unions. We do not discuss non-people management scores. Neither do we attempt to elucidate the full range of contextual features that can facilitate or constrain effective school management. As such, our analysis should be seen as exploratory and non-exhaustive.
As noted in section II(ii), there is substantial between-country variation in average people management scores in regular government schools. Among the six OECD countries surveyed, these scores range from 1.83 in Italy to 2.69 in the UK. This variation may reflect, in part, differences in the degree to which decision-making authority is decentralized across levels of government. The more decentralized an area of management is, the more flexibility school principals have to improve management practices within their school, and the higher we would expect management scores to be. This pattern is borne out when comparing the WMS for schools data with country-level data from the OECD's Education at a Glance (2012) on aspects of people management that span four of the five 'design elements' from the RISE framework: determining teachers' duties (delegation), fixing salary levels (finance), allocating resources for teacher professional development (support), and teacher dismissal (motivation). As shown in Table 3, decision-making discretion across these four areas is primarily centralized in low-scoring Italy, and fully decentralized to the school level in high-scoring England. Among the four countries with people management scores in the middle of the range, decision-making discretion over these four areas is held at the state and regional levels in Germany, and is distributed between the state, local, and school levels in Canada, Sweden, and the US.
The distribution of decision-making discretion may also explain some of the within-country differences in the variance of people management scores across school types. If some areas of management are decided by higher administrative levels for regular government schools, but are decentralized for private and autonomous government schools, then we would expect to see more dispersion in the management scores of the latter. Again, there is suggestive evidence that this may be the case in some of the WMS countries. In Sweden, which has the largest difference in the variance of people management scores between regular and autonomous government schools (difference = 0.490 SD), procedures for teacher appraisal and teacher reward schemes are determined by sub-regional, municipal, or local authorities in regular government schools, but by school-level committees in autonomous government schools (OECD, 2015). In Brazil and Italy, which have the next largest differences in the variances of people management scores between school types, teachers in regular government schools are civil servants (OECD (2015) on Brazil; Eurydice (2020) on Italy). Accordingly, people management in regular government schools in these countries would be subject to civil service regulations that would not apply to other schools. At the other end of the spectrum, there is neither a statistically significant nor a substantive difference in the dispersion of people management scores across school types in the US (difference = 0.013 SD). This may be due in part to the fact that regulations for teacher certification and appraisal for both public and private schools vary across the 50 states (OECD, 2015), such that any differences in variance between school types are cross-cut by differences between states.
Notes to Table 3: Countries are sorted in order of ascending mean scores for Education WMS people management in regular government schools. 'Central' refers to national-level government authorities. In federal countries, 'state' refers to the first territorial unit below the nation. 'Regional' refers to the level of government below the central level for Italy, and below the state level for Germany. Levels of government below the central government may hold primary discretion for an aspect of people management but may be obligated to make decisions within a framework set by a higher administrative level (e.g. in Italy, teacher dismissal decisions are made at the regional level within a framework set by the central level).
Besides the decentralization of discretion between education authorities and schools, other relationships, actors, and dynamics within the wider education system may also constrain school principals' management practices (Pritchett, 2015). One such dynamic is the political economy of teacher unions. For example, teacher unions with wide membership bases may mobilize in support of labour protections that restrict governments' and school principals' leeway to implement the performance-based incentives at the centre of the WMS people management indicators. While there are no publicly accessible cross-country datasets on teacher union membership, International Labour Organization data on the proportion of employees across all industries who are covered by collective bargaining agreements are available for all of the WMS for schools countries besides India. Among these seven countries, collective bargaining agreements covered more than half of all employees in Brazil, Italy, Germany, and Sweden, where the people management scores were significantly lower and less dispersed in regular government schools than in other schools. In Canada, the UK, and the US, where people management scores differed less markedly across school types, collective bargaining agreements covered less than half of all employees. This is true for all years for which data are available, i.e. 2000-16 (International Labour Organization, 2020).
Country-specific data on teacher unionization paint a similar picture. In Sweden, where people management scores were far less dispersed in regular government schools than in autonomous schools, collective agreements cover 100 per cent of teachers in regular government schools and 85 per cent of teachers in autonomous schools. Notwithstanding these comparably high coverage levels, salary negotiations for regular government school teachers take place jointly at the national level, whereas there is fragmentation in the autonomous sector, with an estimated 3,000 collective agreements across autonomous schools (Education International, 2013b).
Another variable in the political economy of teacher unions is how much power a union has to disrupt business-as-usual. This can vary considerably. In Brazil, where we see the largest difference in magnitude of people management scores between regular government and other schools, the public-sector education union has substantial veto power. In 2011, there were six different states or municipalities which saw more than 50 days of strike action, and a further four such states or municipalities in 2012, including the state of Bahia, which saw 115 days of strike action that year (Education International, 2013a). In contrast, teacher unions have far less power in the US. A recent difference-in-difference analysis across 33 US states that had introduced mandatory collective bargaining regulations found that these laws did not lead to higher teacher salaries or larger education budgets, because these same laws typically restricted union power by instituting costly individual and collective penalties for strikes (Paglayan, 2019).

Moving forwards
The discussion above, while exploratory and non-exhaustive, does support the argument that the wider education system can influence the management practices in place in schools. We therefore see the extension of the WMS approach 'upwards' into the education bureaucracy as an important agenda for research. There has been some recent progress in this area. For example, RISE researchers recently adapted the Development WMS for schools to study the adoption of management practices within district education offices in Tanzania (Cilliers et al., 2021), building on earlier work on district education offices in Zambia (Walter, 2018) and on the civil service in Ghana (Rasul et al., 2020). We encourage researchers seeking to understand school management and its relationship to student learning to take a systems perspective. Our specific suggestion is, alongside the WMS for schools, to administer surveys within the education ministry and its bureaucracy that aim to capture the nature of the management relationship between education authorities and their schools (e.g. via constructs built around the five design elements mentioned above), as well as political economy factors such as the influence of trade unions. A further practical suggestion is to build on existing data sources from other organizations and disciplines, to reanalyse prior surveys using a WMS-style framework (e.g. Leaver et al., 2019), or to complement WMS surveys with data on other system features (e.g. the illustrative discussion in this section).

(ii) Measuring management practices in schools
As noted in section II, the aim of the WMS for schools was to select school management practices for which adoption could plausibly (in 2009) be scored on a monotonic 'more is better' scale. In this section, we draw on the literature that has emerged over the last decade to assess whether this selection remains appropriate today. We focus on the domains of operations, monitoring and target-setting (for brevity, grouped together), and people management in turn and, in each case, highlight what we see as the key theme to have emerged from recent research. 9

Operations: the importance of alignment
The unifying principle behind all four practices in the operations domain of the WMS for schools is alignment. Specifically, standardization of instructional processes entails alignment across classrooms and between curriculum, instructional materials, and classroom practice. Personalization of instruction and data-driven transitions entail alignment of, respectively, day-to-day classroom instruction and critical transition points with students' learning needs. Finally, adopting best practices entails alignment of instruction with effective pedagogical strategies.
If anything, there is even greater consensus today that alignment is an important determinant of school performance than there was back in 2009 when the WMS for schools was created (see, for example, World Bank (2018)). This is particularly true for alignment of instruction with students' learning needs, a long-established principle in educational research (e.g. Vygotsky, 1978; National Research Council, 2000; Tomlinson et al., 2003). Education interventions that are premised on such alignment to learning needs have shown significant learning gains in recent experimental evaluations, whether in the 'Teaching at the Right Level' approach for foundational literacy and numeracy that was pioneered in India and is currently implemented in several African countries (Banerjee et al., 2016, 2017) or in the more recent Mindspark programme for computer-adaptive instruction (Muralidharan et al., 2019). At the classroom level, alignment between curriculum, instructional materials, and assessments can greatly aid teachers in navigating and fulfilling the many priorities of classroom learning. Some education systems have badly misaligned curricula, exams, and instructional practices (Atuhurra and Kaffenberger, 2020), but interventions that introduce well-aligned instructional components can yield learning gains, such as Kenya's nationwide Tusome literacy programme (Piper et al., 2018).
It is also increasingly recognized that it is not just alignment within each management practice that can influence instructional effectiveness, but also alignment across these practices and with the wider education system as a whole (Pritchett, 2015). To illustrate, adoption of new 'best practice' teaching techniques into classrooms may do little to cultivate student learning unless they are compatible with individual student learning needs (e.g. Glewwe et al., 2009). 10 Achieving alignment across management practices may be challenging, however, especially in low-capacity settings. For example, there may be a tension between personalization and standardization of instruction. Concurrently achieving high scores for both practices would require extensive educational resources, such as highly trained support staff who can offer tailored out-of-lesson remedial instruction. This may be a reality in some high-income settings, but not in countries where large shares of primary school teachers have yet to master the content that they are supposed to teach (Bold et al., 2017). In such low-capacity settings, school principals and education authorities face a twin challenge: aligning across management practices and prioritizing between these practices.

Monitoring and target setting: the challenge of measuring student learning
Most of the management practices in the monitoring and targeting domains focus on process, i.e. 'how', rather than 'what', to monitor and target. The descriptor for target balance does, however, indicate a judgement that it is good management practice to include targets based on (absolute and value-added) measures of student learning. 11 Such emphasis on student learning, while fundamental to the purpose of schooling, cannot be taken for granted in educational management. For example, the State Report Cards 2016-2017 in India's District Information System for Education reported on 977 distinct figures, none of which was a direct measure of student learning (Pritchett, 2018). Nonetheless, frequent monitoring of student, school, and classroom progress has consistently been identified as an important management process in educational effectiveness research (see Reynolds et al. (2015) for a review). More recently, an analysis of PISA data for 59 countries from 2000 to 2015 found that expanding the use of standardized tests to compare learning outcomes across students or across schools was positively and significantly associated with student achievement (Bergbauer et al., 2018).

10 It is also worth noting that there has been pushback against the idea of universal best practices that can improve education across all contexts (e.g. Coffield (2012), critiquing McKinsey's oft-cited education reports; Sellar and Lingard (2013), on the OECD's influence on educational governance). These critiques are supported by arguments in public policy and development studies about the importance of identifying context-specific mechanisms for achieving desired changes, especially when these changes require shifts in people's behaviour or decision-making (Pawson and Tilley, 1997; Cartwright and Hardie, 2012; Bates and Glennerster, 2017; Monaghan and King, 2018; Williams, 2020).
11 In the newer Development WMS instrument, student learning outcomes are also explicitly included in performance tracking, under monitoring (Lemos and Scur, 2016).

In addition to providing valuable information to school principals and teachers, some approaches for monitoring student outcomes can themselves reinforce students' mastery of new learning. Cognitive science research has found that retrieval practice (trying to retrieve information from one's memory, ideally with feedback about the accuracy of retrieval) can reinforce long-term retention of information (Roediger and Butler, 2011; see also Dunlosky et al. (2013) on practice testing; and Ericsson et al. (1993) on deliberate practice for the development of expertise). Similarly, meta-analyses of interventions that promote formative assessment in K-12 classrooms found a weighted mean effect size of 0.20 on student achievement (Kingston and Nash, 2011). That said, a recent randomized controlled trial of a formative assessment programme in primary schools in Haryana, India, found that the programme did not improve test scores (Berry et al., 2020), partly because the assessment was treated as an administrative task and was not used to provide feedback to students, nor to inform teaching practices.
In short, there is growing evidence that monitoring/targeting student learning is associated with student achievement, alongside increasing recognition that this area of management practice brings complex challenges.
One such challenge is administration. Bloom and Van Reenen (2007) note in their article introducing the WMS that firms may choose not to adopt certain productivity-boosting management practices if the productivity gains do not offset the costs of adoption. This trade-off certainly applies to the costs that would be incurred in principals' and teachers' time in order to monitor school performance with the frequency and formality recommended in the WMS for schools. In the 2018 Teaching and Learning International Survey (TALIS), nationally representative samples of lower secondary school teachers across 48 countries reported spending an average of 8.2 per cent of classroom lesson time on 'general administrative tasks', such as recording attendance or distributing forms (OECD, 2019). One study of teacher accountability in the US found that, under the No Child Left Behind Act, teachers in some schools had to submit 'up to sixty pages of documentation each week' (p. 366) for administrative monitoring of compliance with curricular standards (Holloway and Brass, 2018).
A further challenge is measurement. Conducting student assessments that are valid, reliable, and appropriately calibrated is a technically demanding and potentially resource-intensive task (Koretz, 2008). Yet high-quality student assessment is both a precondition of meaningful monitoring and a casualty when the incentives embedded in monitoring systems go awry (whether those incentives are pecuniary, reputational, or otherwise). In the WMS for schools, the quality of student assessment is mentioned in the operations domain under data-driven transitions, which notes that student transitions should be 'supported by formative assessment tightly linked to learning expectations'. However, the monitoring and target-setting domains appear to take assessment quality for granted.
For a learning assessment to be valid, it has to measure the knowledge and skills that it purports to test. To give an egregious but actual example, if a test item is meant to measure students' knowledge of place values, then it would be inadvisable to ask them to identify the number in the units position of the cube root of 531,441, since many students who have mastered place values would not have mastered the far more difficult area of cube roots (Burdett, 2016). In light of validity concerns, some education experts recommend replacing standardized tests with performance assessments that demonstrate real-world skills, such as portfolios or capstone projects that are assessed using rubrics rather than percentage-correct or item-response theory scoring (Guha et al., 2018). Using test scores as a measure of schools' or teachers' performance introduces another dimension of validity, i.e. whether the measures accurately reflect teachers' and schools' contributions to learning, rather than reflecting factors beyond their control. In the WMS for schools, attaining a top score on target balance entails setting targets for 'both absolute and value-added measures of student outcomes'. Value-added measures that take into account students' prior performances (and sometimes home backgrounds) can be more appropriate indicators of teacher and school performance than absolute scores, although these measures have their own share of issues (Amrein-Beardsley, 2014; Bitler et al., 2019; see also recommendations from the AERA (2015) and the ASA (2014)).
Besides validity, high-quality student assessments also need to be reliable. In addition to reliability at the level of item and test design, a potential issue when assessment data is used in management processes is corruption of test score data when agents respond to self-serving incentives, as observed in Campbell's (1979) and Goodhart's (1984) laws. Test score manipulation can range from focusing on children whose test scores hover near accountability thresholds (e.g. Booher-Jennings (2005) on Texas), to inflationary leniency in grading (e.g. Hinnerich and Vlachos (2017) on Sweden), to outright cheating (e.g. Buckner and Hodges (2016) on Jordan and Morocco; Patrick et al. (2018) on Atlanta). Statistical analyses of assessment data have found answering patterns indicating test score manipulation in Sweden (Diamond and Persson, 2016), the US (Dee et al., 2019; Jacob and Levitt, 2003), southern Italy (Angrist et al., 2017), Mexico (Martinelli et al., 2018), India (Johnson and Parrado, 2020; Singh, 2020), Indonesia (Berkhout et al., 2020), and on a regional assessment of southern and eastern African countries (Gustafsson and Nuga Deliwe, 2017). Test score manipulation can compromise the achievement of learning targets not only because it redirects student, teacher, and administrator effort towards manipulation rather than learning, but also because managers may make counterproductive decisions when they treat manipulated data as if they were accurate.
Finally, high-quality student assessments must also be appropriately calibrated. The learning crisis in many low- and middle-income countries is such that actual student learning falls far below the levels that many policy-makers and school leaders can comfortably acknowledge. A recent evaluation of a computer-adaptive instruction programme in India found that grade 6 students in the treatment group were, at baseline, an average of 2.5 years behind the mathematics curriculum (Muralidharan et al., 2019). This is not an isolated result. For example, across 51 developing countries in the Demographic and Health Surveys, only half of all women aged 25-34 who had completed grade 6 (but had not attended secondary school) could read a simple sentence such as 'Parents love their children' in a language of their choosing (Pritchett and Sandefur, 2017). When the official curriculum is far above actual learning levels, emphasizing curricular completion is likely to make students fall further behind learning targets over time, rather than boosting achievement (Pritchett and Beatty, 2012).

People management: context matters
People management is the one domain where there is perhaps less consensus on the management practices that should be included in the WMS for schools with its monotonic 'more is better' scale. One factor, discussed in section III(i) above, is that principals in regular public schools may have limited discretion to adopt the high-scoring practices (flexible compensation, promotion, hiring, firing, and retention) due to system-level constraints. To the extent that this is true, the people management domain may not be capturing management by a school principal but rather management of a school by education authorities. There is also accumulating evidence that for some practices, notably performance pay under rewarding high performers, 'more' may enhance student learning in some contexts but not in others.
To focus in on teacher compensation, two recent reviews (Glewwe and Muralidharan, 2015; Breeding et al., 2020) and one recent meta-analysis (Pham et al., 2020) have found that, averaging over studies, performance pay schemes have positive treatment effects on student learning outcomes, although the individual impacts vary considerably in size and significance.
One possible explanation for this variation is that the interplay between extrinsic incentives and intrinsic motivation depends on context. Frey (1997) argues that extrinsic incentives are only likely to crowd out intrinsic motivation when the agent has high levels of intrinsic motivation to begin with. Experimental studies support this hypothesis. Teacher performance pay schemes have raised student achievement and been viewed positively by teachers in a number of less well-functioning education systems (where a sizeable fraction of teachers fail to perform even 'the basics', such as turning up for work and being present in class) including: India (Muralidharan and Sundararaman, 2011a,b), Tanzania (Mbiti et al., 2019; Mbiti and Schipper, 2019), and Rwanda (Leaver et al., 2021). Relatedly, Deci and Ryan (1980) suggest that extrinsic incentives may inhibit intrinsic motivation to the extent that they are felt to constrain autonomy, but may reinforce intrinsic motivation when they are felt to affirm competence. The Tanzania performance pay scheme also supports this hypothesis. McAlpine et al. (2018) suggest that one reason why teachers viewed the Tanzanian scheme favourably was that they saw it as a status-raising social affirmation, because the scheme involved visits from the external implementers.
In other settings, however, more thorough adoption of the practice of rewarding high performers has not been found to be unequivocally 'better'. Three recent studies illustrate this. A performance pay scheme for mathematics teachers in Uganda had no effect on attendance, achievement, or attainment in schools without mathematics books, although in schools with the appropriate books, it raised teacher attendance slightly and improved student performance on test items covered in the books (Gilligan et al., 2019). A test-score-based performance pay scheme in Pakistan raised student test scores, but skewed lesson time towards test preparation and negatively affected student socioemotional development (Andrabi and Brown, forthcoming). And the introduction of high-stakes teacher evaluation in US states raised the quality of new teachers (as measured by the selectivity of their undergraduate institutions), but also raised the likelihood that schools would have unfilled teacher vacancies (Kraft et al., 2020).
Recent research has also shown that there are other management practices, beyond the formal incentives emphasized in the WMS for schools, that may be important in driving better performance: namely, responding to, and shaping, social and professional norms.
In some cases, norms can make school management harder. For example, social status hierarchies in India and Indonesia have influenced teachers' perceptions of who can legitimately evaluate their work (Broekman, 2015; Narwana, 2015; Gaduh et al., 2020). In an insider study, Mizel (2009) finds that teachers in Bedouin schools in Israel tended to prioritize accountability to the tribal sheikh over accountability to the education ministry, even though the schools were part of the state education system. Divergence between norms and official expectations can vary considerably across contexts: Sabarwal and Abu-Jawdeh (2018) find that over 75 per cent of teachers surveyed in Argentina, Senegal, and Tajikistan believe that it is acceptable to be absent from class when the teacher has completed the curriculum, providing they leave work for the students to do, and/or if they are engaged in tasks that serve the community; whereas fewer than 25 per cent of teachers in Myanmar and Pakistan expressed similar beliefs.
On the other hand, norms can also constructively reinforce school management practices. In Vietnam, which is currently an outlier with high student learning levels despite relatively low income levels, officially mandated processes for monitoring teacher performance are complemented by strong professional ethics among teachers, as well as high levels of societal attention to education (McAleavy et al., 2018). Crucially, different norm orientations may support different management configurations. Singapore's sociocultural context of top-down management and meritocracy supports the use of highly structured teacher career ladders and performance bonuses; whereas Finland's sociocultural context of egalitarianism and individual civic responsibility supports the deployment of carefully selected, expertly trained, and highly autonomous teachers, such that some Finnish teachers say that introducing people management practices that score highly on the WMS coding would have a negative impact on teaching and learning (Hwa, 2019). This variability in teacher norms and, accordingly, in appropriate management practices implies a need for caution in assuming a positive relationship between WMS people management practices and school performance in a given context.

Moving forwards
This brief review shows that the WMS for schools remains a highly relevant tool today, more than a decade after its inception. We fully support its use in schools (ideally alongside similar surveys in the education bureaucracy) but have emphasized three issues for researchers to bear in mind when doing so.
The first issue is the importance of alignment. The principle that unites the management practices selected for inclusion in the operations domain is alignment and, if anything, this is seen as more important today, particularly the alignment of classroom instruction with students' learning needs (personalization of instruction). Researchers may want to think more about how to capture alignment across management practices, e.g. by probing thoroughly whether practices that standardize instructional strategies and ensure consistency across classrooms also allow for personalization to student learning needs (recognizing that this may not always be feasible in low-capacity settings).
The second issue is the challenge of measuring student learning. Although the practices included under the domains of monitoring and target setting are well supported by recent evidence, the WMS for schools appears to take the quality of student assessments, on which 'balanced targets' should be based, for granted. Researchers may want to probe whether there are management practices in place to ensure that the assessments used to measure student learning for target setting are valid, reliable, and appropriately calibrated.
The final issue is the context-specificity of people management. There are two points here. The first is the systems-perspective point: in some school and country contexts, the principal being surveyed may not have decision-making discretion to adopt high-scoring (on WMS coding) people management practices. The second is that for some practices under this domain, notably rewarding high-performing teachers with performance pay, 'more' may be 'better' in some school and country contexts but not in others. Researchers may want to reflect on whether in their particular context it makes sense to administer the questions in the people management domain in their current form: do they adequately capture management by school principals and, further, is it appropriate to code the answers on a monotonic 'more is better' scale?

IV. Concluding remarks
We began this article by summarizing the objective of the WMS for schools. Then, drawing on analysis of the original WMS data, secondary sources, and a review of the recent literature on school management, we took stock of its usefulness as a tool for future research and policy.
Our view is that the WMS for schools remains a highly useful tool today for its stated purpose: the standardized measurement of (a subset of) management practices within schools. The practices selected for inclusion are well-supported by the evidence that has emerged over the past 11 years, especially for the operations, monitoring, and target-setting domains. We did, however, make two sets of recommendations for its use going forwards.
First, we encourage researchers and policy-makers seeking to benchmark management practices in schools to take a systems perspective. We showed that, in four of the eight WMS countries, management scores were lower and less dispersed in regular government schools than in other schools with more autonomy, particularly in the people management domain. In these four countries, decision-making on key issues (e.g. teacher duties, salaries, professional development, and dismissals) is more centralized, and teacher unions negotiate more extensive collective bargaining agreements on pay and conditions, relative to the other WMS countries. Hence, the lower and less dispersed management scores in regular government schools may have been a reflection of the system, rather than choices made by individual school principals. Extending the WMS approach 'upwards' into the education bureaucracy (to capture the nature of the management relationship between education authorities and their schools, as well as the influence of other stakeholders) is an important agenda for future research. Such work seems particularly timely in view of the fact that recent high-profile, cross-cutting management interventions in schools show no impact on student learning (Muralidharan and Singh, 2020; Bedoya et al., 2021). These interventions followed global 'best practice' but may have failed to account for pressures and constraints arising in their specific system context.
Second, turning to the measurement of management practices within schools, we encourage researchers to think about how best to assess alignment across practices in the operations domain and the challenge of measuring student learning for monitoring and target-setting. For the domain of people management, where there is less consensus, researchers may want to reflect on whether, in their particular country context, the questions in the people management domain adequately capture management by school principals, and whether it is in fact appropriate to code the answers to all questions on a monotonic 'more is better' scale.
The WMS for schools was not conceived to provide causal explanations or to make practical policy prescriptions, but rather as a device to benchmark management practices in schools over time and across countries. We feel it remains useful for this purpose, particularly with the recommendations suggested in this article. Such measurement can help as part of the process of understanding which management practices improve classroom teaching and student learning outcomes, alongside (or embedded in) careful quantitative and qualitative studies of school management in specific settings.

Management practice: Descriptor for thorough adoption (top score)

Operations
Standardization of instructional processes: Use of a planning process designed to align instructional strategies and materials with learning expectations and incorporate flexibility with student needs. Use of comprehensive monitoring to ensure consistency across classrooms.
Personalization of instruction and learning: Use of processes: to identify individual student needs and to accommodate these needs in the classroom; to encourage student participation in the classroom; and to connect students and parents with sufficient resources to support student learning.
Data-driven planning and student transitions: The process used to move students through grades/levels is supported by data from formative assessments that are tightly linked to learning expectations. These data are widely available and easy to use.
Adopting educational best practices: Staff are given opportunities to collaborate and share best practice teaching techniques. Use of processes to support the implementation of these practices in the classroom and to monitor their continued usage.

Monitoring
Continuous improvement: Use of structured, regular processes to expose and resolve problems for the school, individual students, teachers, and staff. All appropriate individuals and groups are involved in problem resolution.
Performance tracking: Performance indicators that relate to overall school objectives are tracked formally and at high frequency. Progress is communicated to all staff using a range of visual tools.
Performance dialogue: Regular meetings are held to discuss school performance. At these meetings: the agenda is clear; appropriate data are available; conversations focus on problem solving, constructive feedback, and coaching; and follow-up steps are clear to all.
Consequence management: Failure to meet a target or carry through a follow-up plan carries consequences, e.g. retraining in identified areas of weakness and/or moving individuals to where their skills are more appropriate.

Target setting
Target balance: Targets (on performance metrics) are defined for the school and for individual staff. The list includes internal targets that are not set by the government or regulators, and targets that are based on absolute and value-added measures of student learning.
Target inter-connection: Targets are aligned and linked at system level and increase in specificity as they cascade, ultimately defining individual expectations for all staff groups.
Time horizon of targets: There are short- and long-term goals for all levels of the school system. Long-term goals are translated into specific short-term targets so that short-term targets become a 'staircase' to reach long-term goals.
Target stretch: Goals are genuinely demanding for all parts of the organization and developed in consultation with senior staff (e.g. to adjust external benchmarks appropriately).

People management
Rewarding high performers: There is an evaluation system which rewards individuals based on performance; the system includes both personal financial and non-financial awards; rewards are awarded as a consequence of well-defined and monitored individual achievements.
Removing poor performers: Repeated poor performance is addressed through a range of methods, beginning with targeted interventions. The process of terminating an employee is not too long to deter dismissals; poor performers are moved out of the school when weaknesses cannot be overcome.
Promoting high performers: Use of processes to actively identify, develop, and promote the school's top-performing staff members; promotions are based on performance rather than tenure.
Managing talent: The school proactively controls the number and types of teachers, staff, and leadership needed to meet goals; hiring criteria and processes are defined based on understanding of what drives student achievement.
Retaining talent: The school prioritizes the retention of top-performing teachers.
Attracting talent: The school provides a value proposition to encourage talented people to join that goes beyond its competitors.