Measuring ethnic school segregation within local educational markets in England

ABSTRACT Despite the increasing ethnic diversity of England’s school-age population, academic literature on ethnic school segregation remains small, dated and hindered by methodological challenges. This study seeks to address these issues by measuring ethnic school segregation between 2006 and 2019 using two methodological innovations. Firstly, it is the first national study in England to adopt a multi-group segregation index to measure segregation between five major ethnic groups in a single concise metric. Secondly, it uses a clustering algorithm to group local schools into ‘pseudo-neighbourhoods’, which allows segregation to be measured within local educational markets. The article shows that between 2006 and 2019 the median level of ethnic school segregation within English pseudo-neighbourhoods fell by 25%, which suggests students from different ethnic groups have become more evenly spread across local schools. Additionally, areas in the North of England were typically found to have higher levels of segregation than those in the South.


Introduction
England is in the middle of a profound shift in the ethnic composition of its population. In 1991 the nation was overwhelmingly 'white', with only 5.9% of residents identifying as part of any other ethnic group. By 2011, this figure had more than doubled, increasing to 14% of the population (Office for National Statistics, 2012). England's school-age population replicates these patterns in the extreme (Hamnett, 2012). As of 2018, 33% of primary school pupils and 30% of secondary school pupils came from an ethnic minority background (Department for Education, 2018), more than twice the proportion in the wider UK population. Urban schools have become especially diverse, with ethnic minority pupils constituting an absolute majority in Slough (84%), Luton (80%), London (76%), Leicester (72%), Birmingham (69%), Manchester (63%) and Bradford (57%). 1 These shifts have been accompanied by ongoing concerns that students from different ethnic groups are being educated in different schools (BBC, 2017; Cantle, 2001; Casey, 2016; May, 2016; The Challenge, School Dash and The iCoCo Foundation, 2016). This runs counter to the government's aim that schools become places where students 'mix and form lasting relationships with others from different ethnic, religious or socioeconomic groups' (HM Government, 2018, p. 26).
However, due to a lack of rigorous analysis in this area, it is hard to know whether these fears are justified. Only four studies in the last two decades have measured ethnic school segregation at a national level in relation to evenness segregation (Burgess, Wilson, & Lupton, 2005; Gorard, 2016; Gorard, Hordosy, & See, 2013; Johnston, Wilson, & Burgess, 2004), and most use data from the turn of the century. Following stark changes in the ethnic profile of England's school-age population, patterns of segregation are likely to have changed substantially since then. These articles also use methodologies that offer a limited approach to measuring ethnic school segregation in England. Most calculate segregation within Local Authority (LA) boundaries, which are too large to be considered local school markets and vary substantially in size, which introduces considerable bias into their results. Finally, the tools the researchers used to measure segregation are either poorly conceived or no longer provide a measurement of segregation which aligns with how ethnic identities are now understood.
This article aims to address these gaps. By analysing school census data from 2006-2019, it provides the most up-to-date measurement of ethnic school segregation in England. It also pioneers two new methodological techniques that substantially improve existing research practices. Firstly, it is the first national study in England to use a multigroup segregation index (the H index) to measure ethnic school segregation between five major ethnic groups in a single concise metric. Secondly, it pioneers the use of a clustering algorithm to group local schools into 'pseudo-neighbourhoods', so that patterns are measured at an appropriate geographic scale. The article therefore provides substantive findings that will be useful to a wide audience of academics in the UK and abroad, along with methodological developments that can inform policymakers and academics working within the field of school segregation.

Literature review
Ethnic school segregation is a complex, hotly debated term which, as a host of academics have highlighted, 'despite its long history in sociology, remains theoretically underdeveloped' (Fiel, 2013). Broadly speaking, it centres on measuring 'the degree to which two or more groups are separated from each other' within a school system (Allen & Vignoles, 2007, p. 645). However, how this 'separateness' is defined varies from study to study, as do the geographic areas, ethnic groupings and statistical tools that academics use to measure it. As each of these decisions has a substantial impact on research findings, the article begins by discussing each of these four choices, to justify the new methodological approaches of this study. Based on a review of past academic literature, this is the first article to evaluate all four dimensions in unison, which makes the discussion an important contribution to the field. The article then evaluates past academic literature on ethnic school segregation in Britain, before presenting its methodology, results and discussion sections.

Defining segregation as (un)evenness
While there are five theoretical dimensions of 'separateness' (Massey & Denton, 1988), evenness segregation is the common conceptualisation applied to the school system. This assesses the extent to which students from different ethnic groups are evenly spread across the schools in a predefined geographic area. This can be understood in relation to how closely the ethnic profile of individual schools matches the average ethnic profile of all schools within a predefined geographic area. Alternatively, it can be interpreted as how closely the ethnic profiles of schools within a geographic area match one another.
Crucially, evenness segregation is a relational measurement between schools located within the same area and therefore segregation is ascribed to the area, not an individual school. A single school cannot therefore be 'segregated' in and of itself. Secondly, 'it is critical that segregation is not conflated with diversity' (Catney, 2016). Geographic areas can be ethnically diverse and still have low levels of ethnic school segregation if the ethnic profiles of schools match one another, and vice versa.

Defining the area(s) in which segregation is being measured
As evenness segregation is a comparison between the ethnic profiles of schools within a predefined geographic area, defining the geographic area(s) of study has a substantial impact on a study's results, an issue known as the Modifiable Areal Unit Problem (Flowerdew, 2011; Taylor, Gorard, & Fitz, 2003). A study that calculates ethnic school segregation within England as a single geographic area will produce drastically different results to one that splits the nation into 3000 geographic areas and measures ethnic school segregation within each of these smaller areas.
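The sensitivity of results to how areas are drawn can be illustrated with a toy calculation. The sketch below is not taken from any of the studies cited: it assumes four hypothetical schools and uses the binary Index of Dissimilarity simply because it is easy to compute by hand.

```python
def dissimilarity(schools):
    """Binary Index of Dissimilarity for a list of (group_a, group_b) pupil counts."""
    total_a = sum(a for a, _ in schools)
    total_b = sum(b for _, b in schools)
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in schools)


# Hypothetical counts per school: (white pupils, ethnic minority pupils).
north = [(90, 10), (90, 10)]
south = [(10, 90), (10, 90)]

# Same four schools, two different ways of drawing the boundary.
one_area = dissimilarity(north + south)
two_areas = [dissimilarity(north), dissimilarity(south)]

print(f"single area: {one_area:.2f}")
print("two areas:", [f"{d:.2f}" for d in two_areas])
```

Measured as one area the index is 0.80, whereas each of the two smaller areas scores 0.00, because within each smaller area the schools have identical profiles. The schools have not changed, only the boundary.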
When choosing how to define their geographic area(s) of interest, researchers must make two decisions (Reardon & O'Sullivan, 2004). Firstly, they must decide how large their geographic area(s) should be and, secondly, they must choose where the boundary of this/these area(s) should be drawn. Drawing on the limited number of previous articles that discuss this topic (Burgess, Greaves, Vignoles, & Wilson, 2015; Gibson & Asthana, 2000a; Harris & Johnston, 2008; Taylor et al., 2003), this study suggests two properties that should guide this choice. These are that:
(1) Geographic areas should reflect local educational markets, so that schools that could feasibly educate the same pool of students are in the same geographic area.
This is the most common scale at which researchers in England aim to operate (Burgess et al., 2015; Gibson & Asthana, 2000a; Harris & Johnston, 2008; Taylor et al., 2003). It is conceptually clear and helps to avoid comparing schools which are spread across such a large geographic area that differences in their ethnic profiles primarily reflect ethnic residential patterns. Similarly, it avoids grouping schools into such small geographic areas that measures of segregation fail to detect dynamics occurring within the local educational market.
Unfortunately, it is impossible to define geographic areas that perfectly reflect a local educational market in England. While home-school distance is an important part of most schools' admissions policies (Pennell, West, & Hind, 2006), students can apply to any school in the country, which means there are no discrete educational markets in England. Secondly, while most schools tend to compete with those in their immediate vicinity, religious and grammar schools often compete with equivalent schools spread across a wider geographic area (Johnston & Harris, 2016; Taylor, 2001). Meeting this principle should therefore act as a goal for researchers, rather than a straightforward rule to implement.
(2) When measuring ethnic school segregation within multiple geographic areas, each area should contain roughly the same number of schools.
If researchers split a large geographic area (e.g. a nation) into a series of smaller ones (e.g. neighbourhoods), there would ideally be minimal variation in the number of schools contained within each of these smaller geographic areas. As the number of schools within a geographic area influences measures of segregation, having a similar number of schools within each area ensures a fairer comparison (Harris, 2011, p. 484). It should also help readers have a clear understanding of the scale at which they should interpret findings. As the size of educational markets varies across different parts of England, meeting this principle could involve a trade-off with meeting the first principle.
Previous studies in England have used government boundaries to define the geographic area(s) when measuring ethnic school segregation, primarily due to the convenience they offer. Unfortunately, no government boundaries provide a desirable set of geographic areas for measuring ethnic school segregation. As shown in Table 1, all are inappropriately sized, either containing too many schools to be considered a local educational market (LAs and Parliamentary constituencies) or too few (Middle Super Output Areas, Lower Super Output Areas and Electoral Wards). Additionally, as none of the boundaries are drawn in reference to schools, none are likely to reflect the choices available to local parents (Sandford, 2018). Finally, all boundaries exhibit considerable variation in the number of schools within each area, which introduces bias into measures of ethnic school segregation and makes it hard for readers to have a clear understanding of the scale at which they should interpret results.
Worryingly, Local Authorities (LAs), the most commonly used set of boundaries, are one of the worst options (Gibson & Asthana, 2000b, p. 139). While the historical role that LAs played in local school systems may once have meant that using LA boundaries helped research reflect local school admission policies (Taylor et al., 2003, p. 53) or translate findings into policy outputs by aligning neighbourhoods to governing bodies (Burgess et al., 2005, p. 1033), the weakening of LA control over local schooling and the fact that students can apply for any school in England, regardless of which LA the school is located in, means neither argument carries much weight any longer. Instead, LAs now represent a series of large, arbitrary geographic areas where findings tend to reflect differences in residential patterns rather than any dynamics within local educational markets (Gibson & Asthana, 2000b, p. 139).

Who is being studied? Defining the ethnic groups
A third decision researchers face is how to define the ethnic groups they wish to use. As most studies use government data, researchers normally begin with the 18 groups listed in the UK census, before aggregating these into a limited number of larger groups. 2 This is an important decision that researchers must clearly justify, something few articles currently do. This article proposes two principles to inform this choice, which will hopefully provide a starting point for future discussion.
(1) Ethnic groups should reflect how identities are commonly conceptualised within the area(s) of study.
This ensures that indexes measure segregation between people who are perceived to be from different ethnic groups. Of course, meeting this principle in practice will be challenging as ethnic identities are 'nebulous' (Harris, Johnston, Jones, & Owen, 2013, p. 2283) and are likely to be viewed differently by different members of society (Kaszycka & Strzałko, 2019). This is particularly hard when measuring segregation across a large geographic area, as conceptions of ethnicity are more likely to vary.
(2) Researchers should try to avoid using an excessively large number of ethnic groups.
As the number of ethnic groups increases, segregation indexes can become upwardly biased, as it becomes increasingly unlikely that students from different ethnic groups will be spread evenly across schools, even if they were randomly assigned to schools in the area. 3 By creating ethnic groupings that include a reasonable number of students in each group, researchers ensure that this 'random segregation' remains minimal (Cortese, Falk, & Cohen, 1976; Rathelot, 2012). 4 Of course, this may require a trade-off with meeting the first principle.
Secondly, using fewer ethnic groups should help provide a more accessible narrative for non-specialist audiences. Given how dense some of the mathematical concepts underpinning segregation are, making findings more accessible should be a priority for any academic who wants their work to play a role in supporting real world policy making.
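The upward bias described under the second principle can be checked with a small simulation. The sketch below is illustrative only. It assumes six hypothetical schools of 200 pupils, equally likely group membership, and a multigroup dissimilarity index in the form given by Reardon and Firebaugh (2002); none of these settings are taken from the study itself.

```python
import random


def multigroup_d(schools):
    """Multigroup dissimilarity index; `schools` holds per-school lists of group counts."""
    n_groups = len(schools[0])
    total = sum(sum(s) for s in schools)
    area_share = [sum(s[m] for s in schools) / total for m in range(n_groups)]
    interaction = sum(p * (1 - p) for p in area_share)
    deviation = 0.0
    for s in schools:
        size = sum(s)
        for m in range(n_groups):
            deviation += size * abs(s[m] / size - area_share[m])
    return deviation / (2 * total * interaction)


def random_allocation(n_groups, rng, n_schools=6, pupils_per_school=200):
    """Give every pupil a group at random, so there is no real sorting between schools."""
    schools = []
    for _ in range(n_schools):
        counts = [0] * n_groups
        for _ in range(pupils_per_school):
            counts[rng.randrange(n_groups)] += 1
        schools.append(counts)
    return schools


def mean_random_d(n_groups, trials=200, seed=0):
    """Average index value across many random allocations (the 'random segregation')."""
    rng = random.Random(seed)
    values = [multigroup_d(random_allocation(n_groups, rng)) for _ in range(trials)]
    return sum(values) / trials


# Assumed group counts: a binary split, five groups, and the 18 census groups.
bias = {m: mean_random_d(m) for m in (2, 5, 18)}
for m, value in bias.items():
    print(f"{m:>2} groups: mean index under random allocation = {value:.3f}")
```

Although pupils are allocated without any sorting, the index is not zero, and its average value is larger with 18 groups than with two. This is the 'random segregation' that aggregating groups is intended to keep small.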

How will you measure segregation? Selecting the right measurement tool
Finally, researchers must select a tool to measure ethnic school segregation. Luckily, all methods ultimately do the same thing: they calculate the extent to which the ethnic profiles of schools located in the same pre-defined geographic area diverge from one another. This involves comparing the proportion of each ethnic group in each school against the proportion of that ethnic group within the geographic area. When studying a single neighbourhood, maps and other graphic approaches normally offer the clearest way of describing this data (James & Taeuber, 1985, p. 26). However, when working at a macro scale the number of comparisons becomes too large for this approach to be feasible.
Consequently, researchers normally aggregate these datapoints into a smaller set of numbers using a segregation index. This reduces 'huge data arrays into [a] simpler, more readily understandable number' (Grannis, 2002). Figure 1 describes this process in detail, unpacking it into five steps. After a researcher defines the geographic area of analysis and calculates the ethnic profile of each school, the proportion of each ethnic group in each school is compared against the proportion of that ethnic group in the geographic area. While the exact formula for measuring this divergence varies between indexes, these formulas are collectively known as Disproportionality Functions. These divergences are then aggregated first to the school and then to the geographic area, which leaves researchers with a single statistic to describe how evenly distributed ethnic groups are, on average, across schools within the geographic area.
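The aggregation described above can be sketched in a few lines of Python. This is a simplified illustration rather than any published index: the function name `area_index`, the school names and the pupil counts are hypothetical, the disproportionality function is passed in as an argument, and the index-specific normalising constants are deliberately omitted.

```python
def area_index(schools, disproportionality):
    """Aggregate school-level divergences into one area-level statistic.

    `schools` maps a school name to {group: pupil count};
    `disproportionality` is any function f(school_share, area_share) -> float.
    """
    # The area is whichever schools are passed in; build its ethnic profile.
    groups = sorted({g for counts in schools.values() for g in counts})
    total = sum(sum(counts.values()) for counts in schools.values())
    area_share = {
        g: sum(counts.get(g, 0) for counts in schools.values()) / total
        for g in groups
    }

    index = 0.0
    for counts in schools.values():
        size = sum(counts.values())
        # Compare each group's share in the school with its share in the area.
        divergences = [
            disproportionality(counts.get(g, 0) / size, area_share[g])
            for g in groups
        ]
        # Aggregate to the school, then weight by school size and add to the area.
        index += (size / total) * sum(divergences)
    return index


def absolute_gap(school_share, area_share):
    """One possible disproportionality function: the absolute difference."""
    return abs(school_share - area_share)


uneven = {
    "School A": {"White British": 80, "Asian": 20},
    "School B": {"White British": 20, "Asian": 80},
}
even = {
    "School A": {"White British": 50, "Asian": 50},
    "School B": {"White British": 50, "Asian": 50},
}
print(area_index(uneven, absolute_gap), area_index(even, absolute_gap))
```

Swapping `absolute_gap` for another function changes the index while leaving the aggregation untouched, which is the sense in which different indexes share the same underlying procedure.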

Past research that has measured ethnic school segregation in England
While seven previous studies have measured some form of ethnic 'separateness' in English schools, only four have explored evenness segregation across the entire English school system. 5 The two most recent articles used school census data to map patterns between 1997 and 2012 in relation to four different student characteristics, one of which was a binary categorisation of 'White' and 'Non-White' students (Gorard, 2016; Gorard et al., 2013). When measuring trends within England as a single geographic area using the Gorard Index, the articles found G fell by a third from 0.6 to 0.4 for secondary schools, while primary schools experienced a smaller reduction with G falling from 0.3 in 1997 to 0.25 by 2012. 6 Substantively, these figures suggest that approximately 60% of ethnic minority students in secondary schools and 30% of ethnic minority pupils in primary schools would have had to change schools in 1997 for all secondary and primary schools in England to have matching ethnic profiles. By 2012, the same figures had dropped to approximately 40% for secondary schools and 25% for primary schools, suggesting that ethnic minority and white British students had become more evenly spread across the nation's schools. 7
Unfortunately, these results do not offer a compelling measurement of ethnic school segregation within the English school system today. Both articles calculate segregation within England as a single geographic area, which means results are likely to reflect ethnic residential patterns rather than dynamics within the school system. Additionally, the results were calculated using only two ethnic groups ('White' and 'Non-White'), which provides a crude measure of segregation that does not capture trends in segregation between distinct minority groups. As ethnic minority groups in England now constitute nearly a third of all students, it is both mathematically feasible and conceptually important to use measures of segregation that assess the level of separation between different minority groups.
A third article, by Johnston et al. (2004), measured segregation for secondary students within three types of geographic areas in 2001, using a novel graphic approach called a 'concentration profile'. This creates a plot that measures the cumulative percentage of an ethnic group (x%) who attend a school in which their ethnic group is equal to, or larger than, a given percentage (y%). When making these plots for 'White' and 'Non-White' students with England as a single area, they found that 'white[s] [students] much more segregated in all-white schools than non-white[s] [students] are into all-non-white schools' (2004, p. 244). After delineating between different ethnic minority groups, they found students from Asian backgrounds, particularly those of Pakistani and Bangladeshi heritage, were more segregated than students from other ethnic minority groups. However, as with the previous two studies, these national level findings primarily reflected ethnic residential patterns across England rather than any dynamics within the school system.
The authors also made plots within four geographic areas that were created by grouping LAs together based on the percentage of non-white students in each LA, to create what this article refers to as a 'meta-LA'. Unfortunately, this approach offers readers a measure of segregation within four large, disjointed geographic areas that were created using cut-off points which, by the authors' own admission, were arbitrary. There was also substantial variation in the size of each meta-LA, which introduces considerable bias into the data. The '30% non-white' meta-LA, for example, contained 118 LAs while the '75% non-white' meta-LA contained only 16.
The fourth article, by Burgess, Wilson and Lupton (2005), provides the most robust study into patterns of ethnic school segregation in England to date. Using the 2001 school census dataset on secondary schools, the authors measured segregation within 144 Local Authorities, using a series of binary indexes, each of which measured segregation for a given ethnic group in relation to students from all other ethnic backgrounds. Their major finding was that Bangladeshi and Pakistani students, on average, tended to be most segregated from other ethnic groups in their Local Authority, while Chinese and Indian students were the least segregated.
While this offers the best insight into ethnic school segregation currently available, there remains a clear gap in the literature for a new study. As the article's findings are based on data from 2001 and use LA boundaries, they suffer from the same issues as previously discussed. Additionally, by using a series of binary segregation indexes, the article offers eight overlapping narratives of ethnic school segregation, which leaves readers having to weave together various narratives to gain a holistic picture of ethnic school segregation in England. The article also provides no insight into the level of ethnic school segregation occurring within the primary school system which, judging by Gorard's results (Gorard, 2016; Gorard et al., 2013), could well be distinct.
The authors also present findings in relation to ethnic groups, rather than their geographic area of interest (LAs), which prevents readers from having a clear idea of which parts of England experience the highest levels of segregation, obscures the relational nature of segregation and places undue focus on a given ethnic minority group. It also implies that the best way to engage with segregation is on a group-by-group basis, rather than working across local areas to encourage local families from various ethnic backgrounds to attend the same schools. This is an important change in the academic literature that this article aims to encourage, as part of a wider push by other academics to ensure that results are presented in a way that reduces the potential risk of inflaming tensions between ethnic groups (Phillips, 2007, p. 1154).
In summary, the existing literature on ethnic school segregation in England is small, dated and tends to calculate trends at undesirable geographic scales. Additionally, articles often use crude binary groups of 'White' and 'Non-White' students rather than a broader list of ethnic categories, as well as employing unappealing approaches to measurement. As discussed in the following methodology section, by mapping ethnic school segregation using the most current school census data, within appropriately defined geographic areas, using robust multi-group measures of school segregation, this article seeks to address these issues, thereby making an important contribution to academic literature.

Data sources
The study used the 2006-2019 school census datasets, which contain information on the ethnic profile and location of all schools in England. This population was reduced to create the final sample based on three criteria. Firstly, private schools were removed from the dataset because they do not release information on the ethnic make-up of their student bodies. 8 Secondly, special schools and sixth-form colleges were removed to focus on mainstream primary and secondary schools, thereby making the sample more parsimonious. Finally, any schools that were in a pseudo-neighbourhood where the largest ethnic group constituted more than 90% of the student population were removed from the sample. 9 This helped to ensure that ethnic school segregation was only measured in pseudo-neighbourhoods that were sufficiently diverse for segregation indexes to be reliable. While most researchers only exclude areas where 95% of the population belongs to the same ethnic group (James & Taeuber, 1985; Reardon, Yun, & Eitle, 2000; Zoloth, 1976), this still leaves researchers comparing neighbourhoods with radically different ethnic profiles. While this may have been permissible when using binary measures of segregation in the past, an increasingly diverse population warrants a more conservative approach. On average, this resulted in 1,105 pseudo-neighbourhoods being removed from the sample of 3,292 (34%), with an average final sample of 6,581,580 students attending 14,687 schools nested in 2,245 ethnically diverse pseudo-neighbourhoods. 10 Figure 2 shows the locations of pseudo-neighbourhoods and whether they were included in the sample. The ethnic profile of the students in the final sample was 59.8% White British, 15.8% Asian, 9.6% Black, 8.5% White Other and 6.3% Other. This compares to 65.9% White British, 13.2% Asian, 8.1% Black, 7.3% White Other and 5.5% Other in the wider school population.
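The third criterion can be expressed as a simple check on each pseudo-neighbourhood. The sketch below is a minimal illustration under assumed inputs: the function name `is_diverse`, the data layout and the pupil counts are hypothetical, and only the 90% cut-off comes from the text.

```python
def is_diverse(schools, threshold=0.90):
    """True when no single ethnic group exceeds `threshold` of the area's pupils.

    `schools` is a list of {group: pupil count} dictionaries, one per school
    in the pseudo-neighbourhood.
    """
    totals = {}
    for counts in schools:
        for group, n in counts.items():
            totals[group] = totals.get(group, 0) + n
    largest_share = max(totals.values()) / sum(totals.values())
    return largest_share <= threshold


# Hypothetical pseudo-neighbourhoods, each with two schools.
homogeneous = [
    {"White British": 190, "Asian": 10},
    {"White British": 180, "Asian": 20},
]
mixed = [
    {"White British": 120, "Asian": 80},
    {"White British": 100, "Black": 100},
]

print(is_diverse(homogeneous))  # largest group is 370 of 400 pupils (92.5%)
print(is_diverse(mixed))        # largest group is 220 of 400 pupils (55%)
```

Lowering `threshold` makes the filter more conservative, while setting it to 0.95 would correspond to the cut-off that the text attributes to most earlier studies.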

Defining the geographic areas of study using k-means clustering
The study used the k-means clustering algorithm (Hartigan & Wong, 1979) to group together schools that were geographically close to one another, forming what this article calls 'pseudo-neighbourhoods'. 11 The collective ethnic profile of each group of schools was then used as the ethnic profile of the geographic area. This procedure is explained below. 12
Researchers initially decide how many clusters, or 'pseudo-neighbourhoods', they would like the algorithm to make (k). The algorithm then randomly scatters this number of centroids (k) across a map of the geographic area of study that contains the coordinates of all schools in the sample (step one). In step two, schools are assigned to their nearest centroid, based on the Euclidean distance between the school and the centroid. In step three, each centroid is replotted as the midpoint of all schools assigned to it in the previous step. Steps two and three are then repeated until all schools are consistently assigned to the same centroid. These stable centroids are used to define the 'pseudo-neighbourhoods' in the fourth and final step of this procedure.
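The four steps can be sketched in plain Python. This is an illustrative sketch and not the study's implementation. It uses simple Lloyd-style iterations rather than the Hartigan and Wong (1979) routine cited above, it starts from k randomly chosen school locations instead of points scattered across the map, and it assumes school coordinates are already in a projected system where Euclidean distance is meaningful. The coordinates shown are hypothetical.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Group (x, y) school coordinates into k clusters."""
    rng = random.Random(seed)
    # Step one (assumed variant): start from k randomly chosen school locations.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step two: assign every school to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are final
        labels = new_labels
        # Step three: move each centroid to the midpoint of its schools.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    # Step four: the stable labels define the pseudo-neighbourhoods.
    return labels, centroids


# Hypothetical coordinates: two small groups of three schools each.
school_xy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(school_xy, k=2)
print(labels)
```

Changing `seed` changes the starting centroids, which is the source of the run-to-run variation in boundaries discussed in this section. Dividing the number of schools by k gives the average pseudo-neighbourhood size.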
This approach has two main advantages over using government boundaries. Firstly, by selecting the number of centroids (k) in step one, researchers control the average number of schools in each pseudo-neighbourhood, allowing them to make groups of schools which are of a similar and appropriate size. The greater the number of centroids (k), the fewer schools each pseudo-neighbourhood will contain and vice versa. This not only gives researchers control over the size of pseudo-neighbourhoods, but also forces them to justify their choice of k, which alerts readers to the scale of the geographic area in which segregation is being measured. Hopefully, this will encourage a debate over the ideal scale at which to measure school segregation in England.
The second advantage of k-means clustering is that by creating multiple sets of pseudo-neighbourhoods, running analyses on each set of boundaries, and then averaging results across each iteration, k-means can produce findings that are insensitive to the way any single set of boundaries is drawn. As the initial distribution of centroids is random (step one), a different combination of pseudo-neighbourhoods is created each time the algorithm is run, but as k is consistent across iterations, the average size of pseudo-neighbourhoods remains constant. This means that different sets of schools are grouped together to form different pseudo-neighbourhoods, but of a similar size. Crucially, when results are averaged across several iterations the findings are stable, which allows researchers to validate previous studies as well as establish trends across time. 13
This study set k = 2800 for the analysis of primary schools and k = 550 for the analysis of secondary schools, delivering pseudo-neighbourhoods that, on average, contain six schools, as shown in Table 2. This reflects the maximum number of schools parents can list on their child's school application form (HM Government, 2018, p. 4), which offers a reasonable estimation for the size of local educational markets. 14 As Table 2 shows, pseudo-neighbourhoods contain a similar number of schools, which suggests that these values of k also meet the second desirable property for defining a geographic area for measuring school segregation.
When conducting longitudinal analysis, pseudo-neighbourhoods were kept constant by storing the midpoint of schools within each pseudo-neighbourhood in 2019, with schools in previous years assigned to whichever midpoint they were closest to. This approach ensured that schools were consistently assigned to the same pseudo-neighbourhood, while schools which had closed before 2019 were still included in the analysis of previous years. This ensured that, within each iteration, the schools grouped together to form a pseudo-neighbourhood were largely 'fixed' between different years, save for the opening or closing of individual schools that meant certain pseudo-neighbourhoods gained a school as their local educational markets grew, or lost one as their local educational market shrank.
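Holding pseudo-neighbourhoods constant across years amounts to a nearest-midpoint lookup. The sketch below illustrates the idea under assumed inputs: the function name, school identifiers and coordinates are all hypothetical, and the stored midpoints stand in for those saved from a 2019 clustering run.

```python
import math


def assign_to_fixed_midpoints(schools, midpoints):
    """Map each school id to the index of its nearest stored midpoint.

    `schools` is {school_id: (x, y)}; `midpoints` is a list of (x, y)
    centroids saved from the 2019 clustering.
    """
    return {
        school_id: min(
            range(len(midpoints)),
            key=lambda i: math.dist(xy, midpoints[i]),
        )
        for school_id, xy in schools.items()
    }


# Hypothetical stored midpoints and an earlier year's schools, one of which
# had closed before 2019 and so played no part in the clustering itself.
midpoints_2019 = [(0.0, 0.0), (10.0, 10.0)]
schools_2006 = {"open_all_years": (1.0, 0.5), "closed_2012": (9.0, 9.5)}

assignment_2006 = assign_to_fixed_midpoints(schools_2006, midpoints_2019)
print(assignment_2006)
```

Because the midpoints do not move, a school with an unchanged location receives the same pseudo-neighbourhood in every year, and a school that later closed is still placed in the area it would have belonged to.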
This study also assigned pseudo-neighbourhoods to one of the nine regions of England, based on whichever region most of its schools were located in. As regions are geographically large, 94% of pseudo-neighbourhoods only had schools located within a single region, which made the aggregation process relatively straightforward. Any pseudo-neighbourhoods where schools were evenly split between two or more regions were removed from the sample, which resulted in a small loss of data (<0.5%). On a few occasions, the study assigned pseudo-neighbourhoods to a city or urban dwelling using the same method to help describe more localised trends.
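The regional assignment rule can be written as a short function. This is a sketch under one assumption: 'most of its schools' is read as the single region with the highest count, and a tie for first place is treated as an even split. The function name and example regions are illustrative.

```python
from collections import Counter


def assign_region(school_regions):
    """Return the region holding the most schools, or None when first place is tied.

    `school_regions` is a non-empty list with one region name per school
    in the pseudo-neighbourhood.
    """
    ranked = Counter(school_regions).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # evenly split, so the pseudo-neighbourhood is dropped
    return ranked[0][0]


print(assign_region(["London"] * 4 + ["South East"] * 2))
print(assign_region(["London", "London", "South East", "South East"]))
```

The first call returns the region containing four of the six schools, while the second returns `None` because the schools are split evenly between two regions.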

Defining five ethnic groups
This study condensed the 18 official ethnic groups into five broader ethnic categories: White British, White Other, Black, Asian and Other. These groups are similar to how ethnic identities are often aggregated for national statistics (Office for National Statistics, 2021), make results easier to describe and reduce the risk of small unit bias. They also align with how two previous UK research articles conceptualised ethnic groupings (Jones, Johnston, Manley, Owen, & Charlton, 2015; Leckie & Goldstein, 2015). While these five groups may not reflect important local dynamics within certain parts of England, where, for example, distinctions between Indian and Pakistani communities may be important, they are nonetheless useful in analysing trends at a national scale. A sensitivity analysis conducted by the author found that H indexes based on five and on 18 ethnic groups had a correlation of 0.90, which suggests this aggregation had only a minimal impact on the results.

A multi-group measure of ethnic school segregation
Starting from the premise that a multigroup measure is required, the following section explains the desirable properties for a segregation index to exhibit before demonstrating that H is the index that best meets these criteria. However, readers should note that a sensitivity analysis conducted by the author found that the multigroup H index had an average correlation of 0.93 with a multigroup D index when using English school census data. This suggests that different multigroup segregation indexes produce similar results when using English school data.

Five desirable properties for a segregation index
Despite over 70 years of debate, there remains disagreement over which index offers the best measure of segregation (Peach, 1975, p. 3). Over 25 segregation indexes exist (Grannis, 2002) and no indexes are considered 'ideal' (Gorard, 2007, p. 674). However, academics normally agree on five properties that should be used to evaluate indexes (Allen & Vignoles, 2007; Grannis, 2002; James & Taeuber, 1985). These are: (1) Organisational Equivalence: measures of segregation should not be influenced by the number of schools in an area.
Of the five properties, compositional invariance remains the biggest hurdle for segregation indexes. All indexes are influenced by the proportions of ethnic groups within an area, with segregation normally higher in places with greater ethnic diversity. This makes it challenging to compare areas with radically different ethnic profiles or to plot changes in segregation through time when an area's ethnic profile has changed (Harris et al., 2013, p. 2282). For this reason, the study adopts a more conservative approach than most other studies by restricting the measurement of ethnic school segregation to pseudo-neighbourhoods where no single ethnic group constitutes more than 90% of the student population, as described previously.

The H index
The H index was used as the study's measure of ethnic segregation, as it meets more desirable properties than any other multigroup index (Reardon & Firebaugh, 2002). The formula for H is given in Equation 1 (see note 15).
Equation 1: The H Index (Theil & Finizza, 1971)
$$H = \frac{\sum_{j} \sum_{m} t_j \, P_{jm} \ln\left(\frac{P_{jm}}{P_m}\right)}{T \sum_{m} P_m \ln\left(\frac{1}{P_m}\right)}$$
Where:
$j$ = a given school
$m$ = a given ethnic group
$P_{jm}$ = proportion of students in school $j$ who belong to ethnic group $m$
$P_m$ = proportion of students in the pseudo-neighbourhood who belong to ethnic group $m$
$t_j$ = number of students in the pseudo-neighbourhood who attend school $j$
$T$ = total number of students in the pseudo-neighbourhood
Initially formulated in the 1970s (Theil & Finizza, 1971), H is based upon a branch of mathematics called Information Theory, which models the flow of information (Alencar, 2015). Bounded between 0 and 1, or 0 and 100 in this article, its output can be interpreted as 'the proportional increase in expected information about [a student's ethnicity] that occurs when learning about the school that the student attends' (Mora & Castillo, 2011, p. 172). While a number of articles in the 1980s commended its mathematical properties (James & Taeuber, 1985; White, 1986), its complicated formula and interpretation meant that H only began to be used frequently following the racial diversification of the United States in the late 1990s (Reardon et al., 2000, p. 352).
Historically, a binary Index of Dissimilarity (D), which measured segregation between White and Non-White students, had been preferred (Duncan & Duncan, 1955). D has a straightforward formula and a comparatively easy interpretation. Bounded between 0 and 100, its output measures the percentage of minority students who would need to move schools to achieve an even spread of students across a neighbourhood's schools. It also meets most of the criteria for a good measure of segregation (Allen & Vignoles, 2007; Hutchens, 2004), with its strong compositional invariance marking it out against other indexes.
However, when adapted to a multigroup index, D becomes far less appealing. Its interpretation changes and its compositional invariance, the main advantage it holds over other two-group indexes, does not hold in a multigroup setting. It also fails to meet the principles of exchanges and transfers, as well as additive organisational and group decomposability (Reardon & Firebaugh, 2002). While H is not perfect, it meets more of the desired properties and therefore was used as this study's main measure of ethnic school segregation.
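To make the two indexes discussed in this section more concrete, the following Python sketch computes a multi-group H and a binary D for a single pseudo-neighbourhood. It is a minimal illustration under stated assumptions rather than the study's own implementation: the function names, the list-of-counts input format and the example data are hypothetical, H follows the standard entropy-based form described by Reardon and Firebaugh (2002), and returning 0 when only one group is present is a convention chosen here.

```python
import math


def theil_h(counts):
    """Multi-group H index (0-100) for one pseudo-neighbourhood.

    counts: list of schools, each a list of student counts per ethnic
    group, with groups in the same order for every school (assumed format).
    """
    n_groups = len(counts[0])
    school_totals = [sum(school) for school in counts]       # t_j
    total = sum(school_totals)                                # T
    group_props = [sum(school[m] for school in counts) / total
                   for m in range(n_groups)]                  # P_m

    # Entropy of the pseudo-neighbourhood as a whole
    entropy = sum(p * math.log(1 / p) for p in group_props if p > 0)
    if entropy == 0:
        return 0.0  # assumption: a single-group area is treated as H = 0

    weighted = 0.0
    for school, t_j in zip(counts, school_totals):
        if t_j == 0:
            continue  # skip empty schools
        for m in range(n_groups):
            p_jm = school[m] / t_j                            # P_jm
            if p_jm > 0:  # a zero proportion contributes nothing
                weighted += t_j * p_jm * math.log(p_jm / group_props[m])
    return 100 * weighted / (total * entropy)


def dissimilarity_d(group_a, group_b):
    """Binary Index of Dissimilarity (0-100) between two groups.

    group_a, group_b: per-school student counts for each of the two groups.
    """
    total_a, total_b = sum(group_a), sum(group_b)
    return 50 * sum(abs(a / total_a - b / total_b)
                    for a, b in zip(group_a, group_b))


# Made-up counts for three schools and five groups
# (White British, White Other, Black, Asian, Other)
example = [
    [80, 10, 5, 3, 2],
    [40, 20, 20, 15, 5],
    [10, 5, 30, 50, 5],
]
print(round(theil_h(example), 1))

# Binary D between the first group and all other groups combined
first_group = [school[0] for school in example]
all_others = [sum(school[1:]) for school in example]
print(round(dissimilarity_d(first_group, all_others), 1))
```

When every school has the same ethnic composition both functions return 0, and when each school contains only one group both return 100, which matches the bounds described above.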

Ethical considerations
This research was granted ethical approval by Oxford University's Central University Research Ethics Committee in September 2019 (ED-C1A-19-244). As part of this process, the study took several steps to reduce the risk that any harm would arise from the publication of these results. All findings were aggregated to a unit of analysis that prevented the identification of any individual, or any small group of individuals, and results were described so as to be sensitive to debates over equity and ethnicity in the UK. For example, any language that attributed blame to a particular ethnic group (or groups) was stringently avoided, and results that related to White British students were written in the same manner as those relating to smaller ethnic groups. Additionally, the article actively highlighted different viewpoints to stress how contested topics on school segregation remain and to encourage nuanced views of these complex dynamics.

Results
As shown in Figure 3, the analysis determined that the median H value for all pseudo-neighbourhoods in England for 2019 was 5.1, with a lower quartile of 3.2 and an upper quartile of 8.2. Primary school pseudo-neighbourhoods tended to exhibit a slightly lower level of ethnic school segregation than pseudo-neighbourhoods made up of secondary schools. Primary school pseudo-neighbourhoods had a median H value of 5.0, which was 14% lower than the median H value for pseudo-neighbourhoods made up of secondary schools (H = 5.7). H values for both types of pseudo-neighbourhoods had similarly shaped distributions, with a strong positive skew. This meant that many pseudo-neighbourhoods experienced far higher levels of ethnic school segregation than the summary statistics would suggest (see note 16).
As shown in Figure 4, pseudo-neighbourhoods in the north of England tended to experience higher levels of ethnic school segregation than those in the south. The median H values for primary school pseudo-neighbourhoods in the North East region (8.0), Yorkshire and the Humber (7.6) and the North West (6.7) were all higher than for pseudo-neighbourhoods in London (4.6), the East of England (4.0) and the South East (3.9). Similar trends were found for secondary school pseudo-neighbourhoods, with median H values for pseudo-neighbourhoods in Yorkshire (10.1), the North West (8.8) and the North East (7.6) higher than equivalent figures for pseudo-neighbourhoods in the East of England (4.8), the South West (4.5) and the South East (2.9).
This north-south divide was particularly acute with regard to the number of pseudo-neighbourhoods with the highest levels of segregation, defined as pseudo-neighbourhoods within the top 5% of H values (H > 20). For example, more than 10% of primary school pseudo-neighbourhoods and 20% of secondary school pseudo-neighbourhoods in the North West and Yorkshire and the Humber regions had H values above 20, compared to less than 3% in London, the East of England and the South East. Remarkably, these two regions alone contained more pseudo-neighbourhoods where H was greater than 20 than the rest of the country combined.

Changes in the level of ethnic school segregation between 2006-2019
Between 2006 and 2019 the level of ethnic school segregation fell consistently from a median H value of 6.8 in 2006 to a median H value of 5.1 by 2019, representing a 25% reduction within the 13-year period. These trends were consistent for both primary and secondary school pseudo-neighbourhoods, which fell from a median H value of 6.8 to 5.0 (−26%) and 7.6 to 5.8 (−24%) respectively. Additionally, as shown in Figure 5, the lower quartile, the mean, the upper quartile and the maximum H values all decreased, suggesting a reduction across the whole distribution of pseudo-neighbourhoods. These various statistics point to a substantial and consistent reduction in the average level of ethnic school segregation within local educational markets in England during this period.
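For readers who wish to check the reported reductions, the percentage changes can be worked through from the median H values quoted above. This is a simple illustration that assumes the reductions are relative changes calculated from the rounded medians.

```latex
\begin{align*}
\text{All pseudo-neighbourhoods:} \quad & \frac{6.8 - 5.1}{6.8} \times 100 = 25.0\% \\
\text{Primary:} \quad & \frac{6.8 - 5.0}{6.8} \times 100 \approx 26.5\% \\
\text{Secondary:} \quad & \frac{7.6 - 5.8}{7.6} \times 100 \approx 23.7\%
\end{align*}
```

These values are consistent with the reported reductions of 25%, 26% and 24%; small differences in the final digit would be expected if the published percentages were calculated from unrounded medians.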
These national trends were also consistent for pseudo-neighbourhoods located in different parts of the country. The median H value for primary and secondary school pseudo-neighbourhoods fell across all nine regions between 2006 and 2019. As shown in Figure 6, pseudo-neighbourhoods in the northern regions, which had the highest average levels of segregation in 2006, tended to see the largest reduction in average segregation, which facilitated a degree of equalisation between different parts of the country.

Discussion
England is becoming an increasingly multi-ethnic society and its school-aged population is at the vanguard of this change. Set within this shifting demographic landscape, the study has shown that the average level of ethnic school segregation for England's pseudo-neighbourhoods fell significantly between 2006-2019, reducing by around 25% from a median H value of 6.8 to 5.1. Substantively, this means that between 2006-2019, the ethnic profile of schools within the same local educational market tended to become increasingly similar, with students from different ethnic groups more evenly spread across local schools. While pseudo-neighbourhoods in the north of England tended to experience larger reductions in ethnic school segregation than those in the south, pseudo-neighbourhoods in all nine regions of the country experienced a reduction in the median level of ethnic school segregation. When read in conjunction with results from Gorard, Hordosy and See's article (2013), this provides growing evidence that the association between a student's ethnicity and the school they attend is weakening within England across various geographic scales of measurement.
The results also cast doubt on a popular narrative that 'ethnic divisions' within English schools are deep and worsening (BBC, 2017; Casey, 2016; The Challenge, School Dash and The iCoCo Foundation, 2016). Instead, these findings suggest that increased diversity in English schools has generally been accompanied by consistent year-on-year reductions in the typical levels of ethnic school segregation since 2006. This will be welcome news for the UK government, as it suggests that the opportunity for students to 'mix and form lasting relationships with others from different ethnic . . . groups' (HM Government, 2018, p. 26) has grown within the English school system. At the same time, the reduction in the average level of ethnic school segregation should not obscure the fact that ethnic school segregation remains high in several areas in England. Unfortunately, many such pseudo-neighbourhoods were located in the northern regions, which is worrying, given the history of inter-ethnic violence and tension in this part of the country (Cantle, 2001). It is disturbing to note, for example, that the North West region and Yorkshire and the Humber contained more pseudo-neighbourhoods with the highest levels of segregation in 2019 (H > 20) than the rest of the country combined. This provides a stark reminder of the continued challenge of creating ethnically integrated schools in these neighbourhoods.
Additionally, while ethnic school segregation has fallen over the last decade, it should not be assumed that it will inherently continue to do so in the coming years, or that these changes will happen seamlessly. Schools across England will be adapting to new levels of ethnic diversity and school leaders will have to manage these transitions carefully if students from different ethnic backgrounds are to share the same local schools in the medium to long term. Stakeholders should be aware that once the proportion of an ethnic group in a school passes a certain threshold, or 'tipping point', schools often become ethnically homogeneous quickly, as students from other ethnic groups leave or apply to schools elsewhere. Sustained vigilance is therefore required if reductions in ethnic school segregation are to be maintained, or indeed built upon. While this challenge may be particularly acute for neighbourhoods that are diversifying for the first time, even places with a long history of educating students from ethnic minority backgrounds face new hurdles. Many schools in such neighbourhoods are now educating students with an increasingly broad range of ethnic identities, each of which makes up an increasingly small proportion of the overall school population. This newfound 'superdiversity' brings with it its own challenges, which may diverge substantially from the schools' previous experiences.
In addition to these substantive findings, the two methodological techniques this article has pioneered substantially improve research practices. Firstly, the use of pseudo-neighbourhoods offers an appealing tool for measuring ethnic school segregation within local educational markets so that results are calculated at an appropriate scale and are not based on a single set of subjective area boundaries. While many researchers around the world could benefit from using this approach, those researching segregation in areas where boundaries are either contested or drawn in a way that groups people from the same ethnic background into the same district will find it particularly helpful, as will researchers studying places which lack a uniformly sized set of school districts, such as the border region on the island of Ireland or the school system in the United States.
Secondly, the adoption of a multi-group index to calculate ethnic school segregation within a national study is another important development for English research, given the growing ethnic diversity of the school-aged population. It not only helps to avoid crude groups of White and Non-White students, which is out of step with modern notions of ethnic identity, but also encourages results to be reported in relation to the geographic areas in which they were measured rather than in relation to a given ethnic group(s). This better reflects the relational nature of segregation and encourages readers to conceptualise segregation in relation to local neighbourhoods, rather than individual ethnic groups, which in turn encourages a shared sense of responsibility for building ethnically integrated schools for families from all ethnic backgrounds living within a local area.

Limitations and future work
While the main conclusions of this article remain robust, readers should be reminded that segregation indexes are influenced by the ethnic profile of an area. This means that some of the differences in pseudo-neighbourhood H values are likely to reflect differences in the ethnic composition of school-age populations. While this does not invalidate the findings of this research, particularly given that the study removed the least diverse pseudo-neighbourhoods from the analysis, readers should nonetheless avoid making crude comparisons of segregation between pseudo-neighbourhoods with radically different ethnic profiles.
There also remains considerable room for new research. The author has already conducted an analysis into whether pseudo-neighbourhoods with certain characteristics experience higher average levels of ethnic school segregation and will soon publish an analysis that uses simulations to explore how school admissions systems relate to patterns of ethnic school segregation. However, academics will also need to go beyond producing empirical evidence and wade into questions about what should be happening. While the government is now committed to creating an integrated school system, there remains a need for a more detailed debate on what England's new school system should look and feel like in practice.

Notes
… it becomes impossible for there to be an even distribution of ethnic groups between schools.
4. While researchers can technically adjust for random segregation by running simulation analyses or amending the formulae of segregation indexes (Allen, Burgess, & Windmeijer, 2009), this introduces another layer of complexity to an already intricate methodological process.
5. Of the three not discussed in this article, two measure other forms of ethnic 'separateness' (Harris & Johnston, 2008, 2020), which do not have a uniform relationship with evenness segregation, while a third calculates evenness segregation for students in London, primarily to discuss the relative merits of measuring segregation using multi-level models (Leckie & Goldstein, 2015).
6. These figures were gained by reading points off a figure in their article and therefore may not be exact.
7. Poorly phrased results in this field have had serious negative consequences in the past, even when used within academic contexts without any intended malice. For example, various UK newspapers ran headlines about worsening ethnic divisions in Britain after an academic gave a presentation on 'Muslim ghettos' in Britain as part of an Australian academic conference. The story gained widespread public and political attention in the UK, despite the substantial body of academic research that showed residential segregation was consistently falling across England (Johnston, Jones, Manley, & Owen, 2016; Peach, 2009), which added to the growing hostility faced by British Muslims in their day-to-day lives (Phillips, 2006, p. 26; Miah, 2016).
8. Private schools consistently educated around 7% of the English school-aged population between 2006 and 2019 (Parks, Chan, & Chan, 2021, p. 7), ranging from a high of 7.1% in 2009 to 6.6% in 2019. These figures were calculated by the author using School Census Data. According to a report released in 2019 by the Independent Schools Council, the proportion of students from ethnic minority backgrounds in the independent school sector was broadly similar to the percentage found within the state school system. Within the 1364 independent schools they surveyed, 66.1% of private school students were White British, with 33.9% from an ethnic minority background, compared to 67.8% and 32.2% in the state school system, respectively (Stevens, Parkes, & Shun-Kai, 2019, p. 17). Unfortunately, without further longitudinal data, it is impossible to know how consistent this was between 2006 and 2019.
9. See section 3.1.1 for details on how this study created 'pseudo-neighbourhoods'.
10. As this paper does not use a fixed set of neighbourhoods (see Section 4.3), the characteristics of each sample varied slightly between each iteration. All figures reported therefore relate to the average statistics across 100 different sets of neighbourhood boundaries and are given for the 2019 school year. These figures had minimal variation between iterations. A total of 99.9% of the filtered pseudo-neighbourhoods were removed because more than 90% of the student population were White British, while one pseudo-neighbourhood, on average, was removed due to the size of its Asian population.
11. While alternative clustering algorithms exist, k-means clustering is currently 'the most widely used' clustering algorithm (Celebi, Kingravi, & Vela, 2013) and is comparatively simple (Jain, 2010), which should increase the method's accessibility and encourage its uptake.
12. The formula for k-means clustering is detailed in the Appendix.
13. The author is happy to provide readers with a dataset of the 100 sets of pseudo-neighbourhood boundaries used in this study. Please use the contact details listed at the start of this article.
14. This differs from conventional cluster analysis, which normally determines the number of clusters a posteriori, based on their predictive power in relation to a given outcome variable. Instead, this study determines the number of clusters a priori, based on a subjective judgment about the size of educational markets. Consequently, the study did not run a conventional goodness of fit analysis on these clusters.
15. For a breakdown of H see section A4 in the appendix.
16. There was minimal variation across 100 iterations in the summary statistics. The lower quartile ranged from 3.1-3.3, the median ranged from 5.0-5.3, the mean ranged from 6.7-7.0 and the upper quartile ranged from 8.0-8.5. This suggests that, when investigating the level of segregation for pseudo-neighbourhoods across England, the location (or zoning) of pseudo-neighbourhood boundaries does not have a significant impact on results.
17. This is known as the 'argument minimum', which is represented as $J(c_k)$.
18. This diagram focuses on the conceptual steps taken by the vast majority of multi-group segregation indexes in the context of measuring ethnic school segregation. For more information on the H index, and the formulas used in steps 4 and 5, please refer to the methodology.