Re-defining Transport for London's strategic neighbourhoods from spatial and social perspectives



Introduction
The neighbourhood is an important but ambiguous topic in urban studies. Neighbourhoods are where daily human activities and interactions take place, and their use as boundaries and containers for localised urban conversations tends to assume a degree of social homogeneity within them, so they are often used as the smallest spatial unit in sociological studies (Galster, 2001). However, current neighbourhood studies focus more on defining neighbourhood characteristics or analysing neighbourhood effects than on measuring the neighbourhood boundary or clearly defining the entity itself (Conway & Conway, 2021; Galster, 2008, 2012; Li & Ashuri, 2018). In fact, most studies make simplifying assumptions about neighbourhood boundaries and often automatically apply political or administrative boundaries such as census tracts as proxies. These units might be quite different from actual (socially or culturally coherent) neighbourhoods as defined by the residents, causing a misspecification that amplifies the diversity or homogeneity within the neighbourhood (Foster & Hipp, 2011). This can hinder meaningful analysis or policy making, as described in the Modifiable Areal Unit Problem (MAUP) (Openshaw, 1984).
Fundamentally, there is no universally agreed definition of what constitutes a neighbourhood. Earlier theories generally refer to a neighbourhood as the space within which the inhabitants form a certain relationship, such as sharing common social characteristics (Goodman, 1977; Webster, 1979). Later definitions extended this relationship to more complex social processes, such as "neighbourhoods are geographically bounded groupings of households and institutions connected through structures and processes" (Coulton et al., 1999), whereas Galster (2001) defined neighbourhood as "the bundle of spatially based attributes associated with clusters of residences, sometimes in conjunction with other land uses". Chaskin (1997) identified three essential dimensions of neighbourhoods: social, physical (geographical), and experiential. He suggested that conceptual and operational work on neighbourhoods should be based on local contexts. In these theories, neighbourhood is often conceived in terms of both space and society.
Despite the various theoretical definitions, the specification of neighbourhood boundaries is still considered a difficult task in empirical applications. One reason is that neighbourhoods do not necessarily have explicit physical boundaries (Deng, 2016; Foster & Hipp, 2011). As the formation of neighbourhoods is a complex process composed of spatial and social interactions, neighbourhoods are generally separated by vague or uncertain boundaries which are constantly evolving. Another reason is that different social phenomena occur in neighbourhoods with different scales and perspectives (Galster, 2008). Different phenomena may influence each other, and the neighbourhood identified may differ accordingly. A further major reason is that, as Chaskin (1997) explained, the formation of neighbourhoods is the result of multiple micro procedures occurring over space and time. The perception of the neighbourhood boundary and size is heavily influenced by the local social and cultural contexts and varies as residents form personal definitions of neighbourhoods according to their own social background and relations (Pebley & Sastry, 2009; Coulton et al., 2013).
In recent years, researchers have proposed several alternative approaches to specifying neighbourhood boundaries, which can be roughly categorized into individual, spatial, and social perspectives. The individual-level methods gather human-centred data through media such as surveys, interviews, on-site observations, or mobile phone and GPS track data, and aim to portray neighbourhoods by quantifying individuals' perceptions or mapping the social network of residents into space (Coulton et al., 2013; Engstrom et al., 2013; Weiss et al., 2007; Colburn et al., 2020; Hipp et al., 2012; Chambers et al., 2017; Gale et al., 2011). The cultural and social contexts and the impacts of the residents are reflected in the results, but the collection and processing of a large quantity of private data is time-consuming and can carry ethical risks where, for example, consent for mobile phone traces may not have been explicitly given (Coulton et al., 2001).
The advancement of GIS technology makes it possible to divide space using geographic open data. On the spatial level, GIS can be used to effectively analyse large areas through batch processing algorithms. Examples include the application of photogrammetry and remote sensing to acquire physical feature data (Cutchin et al., 2011), or the use of machine learning to identify physical features from images or extract information from texts to measure neighbourhoods. Among these, an essential empirical approach is the tertiary-communities (T-Communities) method proposed by Grannis (1998). Grannis proposes that residents connected by pedestrian streets (which he refers to as tertiary streets) are more accessible to each other by walking, and are therefore more likely to interact and form close social relations, from which neighbourhoods are formed (Grannis, 1998, 2005). Based on this principle, neighbourhood boundaries can be defined by following the continuity in pedestrian streets. Multiple studies have adopted the T-Communities method and achieved effective results (Davies et al., 2018, 2019; Foster & Hipp, 2011; Whalen et al., 2012). Although this method has been criticized for its neglect of social effects, it is also credited for its minimal data requirements.
On the social level, several methods have divided space by analysing the spatial distribution of social characteristics using demographic data. The general idea is that people with similar social backgrounds tend to live close to each other and different social groups are gradually segregated (Reibel, 2011). These theories value the roles of social characteristics in neighbourhood formation and resemble the general idea of geodemographics, which suggests that as groups of similar people are clustered into similar places, people can be analysed socially by where they live (Harris et al., 2005; Sleight, 2004). But the limitation of geodemographics is that multiple neighbourhoods can be clustered into the same group, and therefore it does not produce discrete neighbourhood boundaries. Examples of social methods include adopting the classification and regression tree (CART) model to identify different social groups (Clapp & Wang, 2006), or analysing the geographic gradient of several social factors (Kramer, 2017). Another popular method, often used to identify housing submarkets based on spatial clustering of social factors, is recognized for its efficiency and flexibility (Bourassa et al., 1999; Wu & Sharma, 2012). In this method, multiple social factors across the study area are collected and analysed through principal component analysis (PCA) to extract a set of orthogonal components, which are then used in spatial clustering analysis to identify neighbourhoods.
In the UK, the Census Output Area (and its aggregation into the slightly larger Lower Layer Super Output Areas, LSOAs) is often considered a close approximation of neighbourhood units and is widely used in urban studies where quantitative 'neighbourhood' analysis is conducted. The production of Output Areas (OAs) followed some of the principles embodied in the sub-field of geodemographics (ONS, 2011): it took small geographic areas defined as part of the postcode system (built to facilitate postal delivery), and then rearranged and merged these zones to have a degree of intra-area social homogeneity using an algorithm called the automated zoning procedure (Martin et al., 2001; Martin, 2002). However, the need for a more up-to-date classification of neighbourhood units that better reflects actual neighbourhood distributions has been rising in recent years, noticeably in London. In 2020, Transport for London (TfL) launched the Strategic Neighbourhood Analysis (SNA) project in response to the need for smaller "neighbourhood"-sized areas of London to which TfL's Low Traffic Neighbourhood schemes could be applied. In the SNA, roads with high traffic volumes are used as barriers to divide up the urban space in the city. This produced a new representation of neighbourhoods in London (see Fig. 4), but TfL themselves were not fully satisfied with the results, noting (in private correspondence with this project team) limitations such as abnormally large or small areas and a focus on the spatial dimensions of the city at the expense of social dimensions.
In summary, various studies have been conducted on defining neighbourhood boundaries from either spatial or social perspectives using geographical data. But no empirical study has yet been carried out to ascertain whether social or spatial methods are preferable in any given context. In this paper, we address this gap, using London as a vehicle to evaluate whether any particular methodology (social or spatial) offers clear advantages, such as accuracy (for example, homogeneity of the people dwelling within) or generalisability (methods transferable to other contexts). In achieving these theoretical objectives, this study also has a more practical objective in attempting to redefine TfL's Strategic Neighbourhoods. As such, the objectives of this piece of research are two-fold. Firstly, we aim to create a better set of neighbourhood boundaries for TfL's Strategic Neighbourhood Analysis project, with 'better' defined in social terms as having greater internal social homogeneity and increased heterogeneity when compared with surrounding zones, and in spatial terms as being more homogeneous in size and shape. Our second objective is to compare social and spatial approaches to defining neighbourhoods in our study area and evaluate both the theoretical and practical implications of using one approach over another, using our analysis to make recommendations for future neighbourhood design.
In the next section of this paper, a brief description of the study area and the data used for analysis is presented. In Section 3, we outline two methodologies, the T-Communities method and the PCA and MST clustering method, with their workflows presented in detail, followed by the methods for evaluation. The results are described in Section 4. In Section 5, we evaluate the relative performance and outcomes of the different methods in specifying neighbourhoods, using a case study of Islington to illustrate the discussion and explore the social-spatial underpinnings of these neighbourhoods, with conclusions drawn in Section 6.

Study area
Greater London (referred to as London below) has approximately 9 million people and a population density of 5725 per sq km, and comprises 33 local government authorities (or Boroughs) (ONS, 2020). London is one of the most urbanized cities in the world and has one of the most prosperous economies and diverse populations. According to the Greater London Authority's (GLA) estimation, as of 2019, 43% of residents are from a Black, Asian or minority ethnic (BAME) background (ONS, 2021). However, just like many other metropolises, due to the impacts of over-crowding and high living costs, London is also undergoing uneven sub-regional development. Nearly two-thirds of London's Lower Layer Super Output Areas (LSOAs) exhibit deprivation, scoring above the national average on the Index of Multiple Deprivation (IMD). Some 22.5% of LSOAs fall within the most deprived 20% of England, while fewer than 4% are in the least deprived decile (GLA, 2019). Spatially, adjacent areas can have distinct differences in deprivation. The high urbanization rate, diverse population and unbalanced sub-regional development are accompanied by inequalities in access to transportation, housing, education, health care, and welfare, bringing challenges for sustainable urban development. For example, studies have discovered that the more deprived areas with higher proportions of BAME residents are more likely to be exposed to higher levels of air pollution and lower availability of green space, leading to greater health risks and lower life expectancy (GLA, 2017a; GLA, 2018a; GLA, 2021). To address this situation and promote healthy urban development, the authorities have come up with policies including the Mayor's Transport Strategy (GLA, 2018b), Economic Development Strategy (GLA, 2017b), Environment Strategy (GLA, 2018c) and the Health Inequalities Strategy (GLA, 2018a). Where there has been a need to target policies in a more spatially explicit way, initiatives such as TfL's strategic neighbourhoods have emerged.

Data overview
The data used in this study come mainly from public data released by the London authorities, with a small amount of non-public data licensed by TfL. Multi-level statistical GIS boundary files for London are acquired in the Shapefile format from the London Datastore (GLA, 2014), and polygon Shapefile data for each of the postcodes in London are acquired from the Ordnance Survey (OS) Code-Point with Polygons product (OS, 2022a). In the T-Communities method, two levels of road data are used. The pedestrian road data are extracted from the OS Open Roads product (OS, 2022b), and the non-pedestrian road data are derived from TfL's Streets Framework Analysis (SFA). In the MST clustering method, several social factors are chosen and collected from the 2011 London Census datasets (GLA, 2012) and the OS Point of Interest (POI) product (OS, 2022c). In the evaluation part, the TfL-licensed SNA dataset, as well as community centre locations derived from OS POI data, are used to validate the results.

Methodology
Two methodologies are applied separately to define neighbourhoods in London: the T-Communities method, based on physical road structure, and the clustering method, based on PCA and the MST graph. The two methods and the result evaluation methods are introduced in this section.

Defining neighbourhood boundaries using tertiary-communities
According to the T-Communities theory (Grannis, 1998), individuals tend to live in patterns that promote social interaction among populations with similar backgrounds, where the ease of reaching one's neighbours, rather than Euclidean distance, is paramount in promoting such interaction. The T-Communities theory starts from a top-down spatial perspective of neighbourhood and makes assumptions about the reasons behind the emergence of neighbourhood phenomena, i.e., the influence of pedestrian streets, and thus deduces the distribution of communities formed. Such neighbourhoods are defined as tertiary-communities (T-Communities), where every household is reachable from every other household inside by using only pedestrian streets. Specifically, T-Communities are encircled by non-pedestrian streets and delimited by discontinuities in pedestrian street networks. Fig. 1 gives a graphic explanation of this definition, where the bold lines representing non-pedestrian streets serve as the outer boundaries, and each set of connected pedestrian street networks, represented by light lines, forms one T-Community. Hence, there are four T-Communities identified in the figure, one in area A and three in area B.
Before analysing the T-Communities, the street levels need to be clarified. Pedestrian streets are defined as a combination of minor roads and local roads as classified by the OS Open Roads system. For non-pedestrian streets, we refer to TfL's Streets Framework Analysis (SFA), where roads are divided into several categories by traffic movement score, and select the roads with a high movement score, which consist mainly of arterial roads. The distributions of the road networks can be found in the appendix.
Previous studies have not explained specifically how to divide internal space by road networks after setting the external boundaries. We design the workflow as below (shown in Fig. 2). Postcodes for individual housing units are used as the minimum units for analysis. First, the study area is split into sub-areas by the non-pedestrian roads. Within each sub-area, the road network is formed by pedestrian roads, on which network analysis is conducted to extract the connected components. For each postcode unit in the sub-area, its Euclidean distance to each of the connected components is calculated, and the postcode unit is then assigned to the component to which it has the shortest distance. All the postcodes assigned to one connected component of pedestrian roads form one single T-Community, i.e., neighbourhood. Components consisting of one single pedestrian road are removed from the networks, considering that neighbourhoods are unlikely to emerge from one single pedestrian road, and also to avoid producing small neighbourhoods. Finally, neighbourhoods with an area smaller than 10,000 sqm are merged into their nearest neighbourhood, a threshold chosen based on experience. We used Python 3.9 to produce the T-Communities and ArcGIS 10.5 software for data processing, and the link to the code can be found in the appendix.
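The core of this workflow, for a single sub-area, can be sketched in a few lines of Python. This is an illustrative sketch only and not the code linked in the appendix: the function names, the representation of each pedestrian road as a straight segment between two endpoints, and the use of postcode centroids as points are all assumptions made for the example. The final merging of neighbourhoods below 10,000 sqm is omitted.

```python
import math
from collections import defaultdict


def connected_components(segments):
    """Group road segments that share an endpoint, using union-find."""
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Index segments by endpoint so that touching segments can be merged.
    by_node = defaultdict(list)
    for idx, (a, b) in enumerate(segments):
        by_node[a].append(idx)
        by_node[b].append(idx)
    for idxs in by_node.values():
        for other in idxs[1:]:
            parent[find(other)] = find(idxs[0])

    groups = defaultdict(list)
    for idx in range(len(segments)):
        groups[find(idx)].append(idx)
    # Drop components made of one single road, as in the workflow above.
    return [g for g in groups.values() if len(g) > 1]


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def assign_postcodes(postcodes, segments):
    """Map each postcode to the index of its nearest road component."""
    components = connected_components(segments)
    labels = {}
    for code, point in postcodes.items():
        labels[code] = min(
            range(len(components)),
            key=lambda c: min(
                point_segment_distance(point, *segments[i]) for i in components[c]
            ),
        )
    return labels
```

Each distinct label returned by `assign_postcodes` corresponds to one candidate T-Community within the sub-area. A production version would work on real road geometries rather than straight segments.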

Factor selection and pre-processing
The approach that defines neighbourhood by social characteristics is based on the idea that neighbourhoods emerge from groups sharing similar social backgrounds. According to Galster's (2001) neighbourhood definition, the characteristics of neighbourhoods can be summarized into ten categories: structural, infrastructural, demographic, social class, public service, environmental, proximity, political, social-interactive and sentimental. Combined with experience from previous research, 22 factors related to neighbourhood are selected and summarized here into three groups: housing-related, demographic, and distance-based factors (shown in Table 1).
Postcodes are used as the minimum spatial unit for classification in this method as well. As postcode-level information is withheld in the census for privacy reasons, only data at the LSOA level are available. The LSOA-level census data are first pre-processed using the Kriging interpolation tool in ArcGIS 10.5 to produce postcode-level variables (Oliver & Webster, 1990; Stein, 1999), with steps shown in Fig. 3a. For the POI-based variables, the Euclidean distance from each postcode to its nearest POI object is calculated. The distributions of the final variables can be found in the appendix. Before applying PCA, the variables are standardized into Z-scores using Python, with details described in the appendix.
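The Z-score standardization step can be illustrated as follows. This is a minimal sketch rather than the appendix code; the function name is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a variable to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Standardizing each of the variables in this way puts them on a common scale, so that variables measured in large units (such as distances in metres) do not dominate the principal components.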

Extracting factors using PCA
Principal component analysis (PCA) is a procedure to select features and reduce dimensions in the presence of multiple factors, by deriving a reduced number of linear combinations of the original variables while retaining a substantial amount of the information contained in them (Abdi & Williams, 2010; Bourassa et al., 1999; Jolliffe, 2005). Components that each explain more than 5% of the variance in the data are selected in this study. The Python code for applying PCA can be found in the appendix.
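The 5% selection rule can be illustrated with a short sketch. It covers only the selection step and assumes that the eigenvalues of the correlation matrix (the variance captured by each component) have already been computed by a PCA routine; the function name and the eigenvalues in the usage comment are hypothetical and are not the values from this study.

```python
def select_components(eigenvalues, threshold=0.05):
    """Return indices of components whose explained-variance ratio exceeds threshold."""
    total = sum(eigenvalues)
    ratios = [ev / total for ev in eigenvalues]
    kept = [i for i, ratio in enumerate(ratios) if ratio > threshold]
    return kept, ratios


# Hypothetical eigenvalues, for illustration only.
kept, ratios = select_components([8.0, 5.0, 3.0, 2.0, 0.9, 0.6, 0.5])
```

With these illustrative values the first four components are kept, since the fifth explains only 4.5% of the variance.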

Spatial constraint clustering analysis using minimum spanning tree
Spatial clustering analysis refers to numerical methods for grouping objects of similar kinds within geographical space based on their attributes. To maintain the contiguity of neighbourhoods, the produced clusters need to be geographically cohesive, which is achieved by adding spatial constraints using the minimum spanning tree (MST) clustering algorithm. The relationship between objects is represented by a weighted graph where nodes represent the objects and the weight of the edge connecting two nodes is proportional to their similarity (Grygorash et al., 2006). A spanning tree is a connected, acyclic subgraph containing all the nodes of the graph, and the minimum spanning tree is the minimum-weight spanning tree of that graph (Gower & Ross, 1969; Yu et al., 2015). In the clustering algorithm, after establishing the minimum spanning tree, the edge whose removal minimizes the differences within the resulting groups is removed from the tree to produce two new minimum spanning trees. The removal is repeated in each iteration until reaching a defined number of trees, i.e., clusters.
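The general idea of MST-based, spatially constrained clustering can be sketched as below. This is a simplified illustration, not the ArcGIS toolbox used in this study. It makes three assumptions: edges exist only between spatially adjacent units, the edge weight is the Euclidean distance between the units' attribute vectors, and the edge removed at each step is the one that minimizes the total within-group sum of squares. All function names are hypothetical.

```python
import math


def kruskal_mst(nodes, edges, weight):
    """Build a minimum spanning tree (or forest) with Kruskal's algorithm."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for u, v in sorted(edges, key=lambda e: weight(*e)):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


def components(nodes, tree_edges):
    """Return the connected groups of nodes induced by the given edges."""
    adjacency = {n: [] for n in nodes}
    for u, v in tree_edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


def within_ss(groups, features):
    """Total within-group sum of squared deviations from group centroids."""
    total = 0.0
    for group in groups:
        dims = range(len(features[group[0]]))
        centroid = [sum(features[n][d] for n in group) / len(group) for d in dims]
        total += sum((features[n][d] - centroid[d]) ** 2 for n in group for d in dims)
    return total


def mst_clusters(features, edges, k):
    """Cut MST edges one at a time until k spatially contiguous clusters remain."""
    nodes = sorted(features)
    tree = kruskal_mst(nodes, edges, lambda u, v: math.dist(features[u], features[v]))
    while tree and len(components(nodes, tree)) < k:
        # Remove the edge whose removal leaves the most homogeneous groups.
        best = min(
            tree,
            key=lambda e: within_ss(
                components(nodes, [t for t in tree if t != e]), features
            ),
        )
        tree.remove(best)
    return components(nodes, tree)
```

Because clusters are only ever formed by cutting edges between adjacent units, every resulting cluster is contiguous by construction. The exhaustive search over edges is quadratic in the number of units and is intended for illustration rather than city-scale data.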
The MST clustering is applied to each local authority in London separately to avoid null results, using a self-built ArcGIS toolbox (see appendix). In this study, we manually define the number of groups within each local authority as:

$$n_i = n \times \frac{p_i}{p}$$

where $n_i$ is the number of clusters in local authority $i$; $n$ is the expected total number of clusters, which is set to 2000 in accordance with the original SNA; $p_i$ is the number of postcodes in $i$; and $p$ is the total number of postcodes. This ensures that all the produced neighbourhoods have a similar number of postcodes, so that they are consistent in size. This approach is acceptable in our context, as postcode areas in London are relatively consistent in size. For other cities where postcodes vary significantly in size, the parameters $p_i$ and $p$ should be replaced by other indices such as land area or population, depending on the context. The grouping results are then merged to produce the final neighbourhood boundaries.
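As a worked illustration of this allocation, the sketch below assumes that each local authority receives a share of the n clusters proportional to its share of postcodes, that is n_i = n × p_i / p. Rounding to the nearest integer and enforcing at least one cluster per authority are additional assumptions of the sketch, and the authority names and counts in the usage comment are hypothetical.

```python
def clusters_per_authority(postcode_counts, total_clusters=2000):
    """Allocate clusters to authorities in proportion to their postcode counts."""
    total_postcodes = sum(postcode_counts.values())
    return {
        name: max(1, round(total_clusters * count / total_postcodes))
        for name, count in postcode_counts.items()
    }


# Hypothetical postcode counts for three authorities.
allocation = clusters_per_authority({"A": 6000, "B": 3000, "C": 1000}, total_clusters=20)
```

Because of rounding, the allocated counts need not sum exactly to the expected total.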

Evaluation of neighbourhood boundaries
We conclude from the discussions on neighbourhood concepts that an ideal way to specify neighbourhood boundaries should maximize between-group variance and minimize within-group variance, and be suitable for the visualization and interpretation of subsequent analyses (Foster & Hipp, 2011). We follow the criteria proposed by Goodman (1980) as the standards for choosing the neighbourhood division method, including:
A) Homogeneity: postcodes within the neighbourhood should share similarities in social characteristics;
B) Simplicity: there should be fewer neighbourhoods in a given geographic location rather than more;
C) Contiguity in space: postcodes within the neighbourhood should be adjacent to one another;
D) Consistency in size: the neighbourhoods should be consistent in size and scale with each other.
Following these criteria, the neighbourhood boundaries from the two methods are compared with the original SNA results and evaluated using the following indices.

Neighbourhood size
The sizes of the neighbourhoods are summarized and compared to evaluate the results. An ideal method should produce neighbourhoods with more homogeneous sizes and fewer outliers.

Intra-class correlation coefficient
To measure the ability to maximize between-group variance and minimize within-group variance, the intra-class correlation coefficient (ICC) is calculated for each social variable to compare the results. ICC is an index that assesses the significance of clusters, expressed as:

$$ICC = \frac{SSB}{SSB + SSW}$$

where SSB denotes the total sum of squares between groups, and SSW denotes the total sum of squares within groups. Therefore, ICC is equal to the ratio of between-group variance to the total variance, and a higher ICC indicates a better ability to generate areas that represent the intra-area homogeneity and inter-area variation of the variable (Cutchin et al., 2011).
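A minimal sketch of this index is given below, taking ICC as the between-group sum of squares divided by the total sum of squares, as described above. The function name and the toy data in the usage comment are hypothetical.

```python
from collections import defaultdict


def icc(values, labels):
    """Ratio of between-group to total sum of squares for one variable."""
    grand_mean = sum(values) / len(values)
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    ssb = 0.0  # sum of squares between groups
    ssw = 0.0  # sum of squares within groups
    for members in groups.values():
        group_mean = sum(members) / len(members)
        ssb += len(members) * (group_mean - grand_mean) ** 2
        ssw += sum((v - group_mean) ** 2 for v in members)

    if ssb + ssw == 0:
        raise ValueError("ICC is undefined when the variable is constant")
    return ssb / (ssb + ssw)


# Toy example: six postcodes in two neighbourhoods.
score = icc([1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1])
```

In the toy example SSB is 54 and SSW is 4, so the ICC is 54/58, or about 0.93, reflecting two internally similar and mutually distinct groups.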

Proportion of internally disconnected neighbourhoods
The ratio of the number of internally disconnected neighbourhoods to the total number of neighbourhoods is measured to assess the contiguity criterion. An internally disconnected neighbourhood is defined as a neighbourhood consisting of multiple polygons, where some postcodes within the neighbourhood are inaccessible from one another. The better method should produce a proportion closer to 0.
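One way to approximate this index without polygon geometry is to test connectivity on a postcode adjacency graph. The sketch below is illustrative only. It assumes that a table of which postcodes touch each other is available, and it treats a neighbourhood as disconnected when its postcodes do not form a single connected group under that adjacency. The function names and example identifiers are hypothetical.

```python
def is_connected(members, adjacency):
    """Check whether all members are mutually reachable via adjacent members."""
    members = set(members)
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency.get(node, ()):
            if neighbour in members and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == members


def disconnected_proportion(labels, adjacency):
    """Share of neighbourhoods whose postcodes are not internally connected."""
    groups = {}
    for postcode, neighbourhood in labels.items():
        groups.setdefault(neighbourhood, []).append(postcode)
    broken = sum(1 for m in groups.values() if not is_connected(m, adjacency))
    return broken / len(groups)


# Hypothetical example: N1 is contiguous, N2 is split into two parts.
labels = {"p1": "N1", "p2": "N1", "p3": "N2", "p4": "N2"}
adjacency = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2"], "p4": []}
proportion = disconnected_proportion(labels, adjacency)
```

In this example one of the two neighbourhoods is disconnected, giving a proportion of 0.5.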

Number of community centres
While statistical tests are useful, any final set of data-driven boundaries should resemble actual London neighbourhoods. In the absence of an extensive city-wide ground-truthing exercise, it is reasonable to examine proxy features. To do this, we turn to the presence of community centres. Community centres in the UK are public facilities where local residents can come together to socialize or use the space for a variety of community events, and are normally located within a town/city/village centre within easy walking distance for their members. The locations of the community centres can to some degree reflect the spatial clustering of residents and therefore be suggestive of the presence of neighbourhoods. We acknowledge that proxies like this are imperfect; alternatives could have been pubs (although in central areas these would be skewed by daytime populations) or religious buildings like churches, synagogues and mosques (but declining populations identifying with a religion also make these imperfect). As there is no such thing as a perfect neighbourhood proxy in any form, we use the presence of community centres cautiously. Of course, not all communities have community centre buildings and some large residential concentrations might have more than one, but our instinct is that something close to an average of one per neighbourhood, or at least a relatively even distribution amongst zones, might be reasonable to expect.

T-communities results
Neighbourhoods derived by the T-Communities method are shown in Fig. 4a. There are 3492 neighbourhoods in total, with an average area of 449,884 m². The study area is divided into 1244 sub-areas, each containing an average of 2.8 neighbourhoods, with a minimum of 1 and a maximum of 94. The link to the output neighbourhood data is shown in the appendix.

Clustering results
The results of PCA are shown in Table 2. Four principal components are chosen for clustering analysis, with their distributions mapped in Fig. 5, together explaining over 70% of the total variance. The contributions of all 22 factors to each component are summarized, with weights over 0.20 highlighted in bold text. By comparing the combinations of higher-weighted factors, each component can be interpreted as focusing on one or two categories of social characteristics. Component 1 has higher weights in demographic and housing type variables, and is roughly related to the suburban areas of London, as shown in Fig. 5; component 2 has higher weights in economic factors relating to housing markets and job markets, and is higher in the city centre; component 3 is mainly concerned with geographic proximity to services; and component 4 relates to both housing and demography. Among these components, component 1 alone explains 36.73% of the total variance, followed by component 2 with 22.12%, indicating that housing and demographic variables are more significant than proximity-related variables in clustering.
Using the derived components, the results of MST clustering analysis are shown in Fig. 4b. There are 2070 neighbourhoods in total, with an average area of 758,376 m².

Comparing neighbourhood sizes
The neighbourhood sizes for the two results and the SNA output are summarized in Table 3, with their distributions visualized in a boxplot in Fig. 6. Among the three neighbourhood results, the T-Communities neighbourhoods have the smallest average size and the largest number of neighbourhoods. They also have the fewest outliers (0.91%). The clustering neighbourhoods have the largest average size and the smallest variation, indicating that most neighbourhoods have similar sizes to each other. They have a slightly higher proportion of outliers (1.88%) than the T-Communities neighbourhoods. The SNA neighbourhoods have the largest variation in sizes and the largest percentage of outliers (5.86%). To conclude, the T-Communities method creates the fewest neighbourhoods that are too large or too small, and the clustering method produces neighbourhoods with the most consistent sizes. Both methods outperform the SNA in terms of creating neighbourhoods with homogeneous sizes.

Comparing ICC
The ICC for each variable used in the clustering analysis is calculated for the three neighbourhood results. It can be seen from the results in Table 4 that the T-Communities method has the highest ICCs in 18 out of the 22 variables, while the clustering method has the highest ICCs in the other 4 variables, including House Price and three proximity-based variables. The T-Communities results have higher ICCs than the SNA in all variables, while the clustering results have higher ICCs than the SNA in 16 out of the 22 variables. The performance of the clustering method is somewhat surprising, considering that its neighbourhoods are grouped using combinations of the exact same variables but are outperformed by the spatial T-Communities method. We might also expect the smaller SNA areas to have less within-area variation relative to between-area variation, but this is not the case. Generally speaking, the T-Communities method can produce neighbourhoods that best minimize intra-group variation and maximize inter-group variation of social variables, which strongly suggests that neighbourhoods derived from physical structures can usefully explain the spatial patterns of certain social phenomena.

Comparing the proportion of internally disconnected neighbourhoods
The T-Communities results have the highest proportion of internally disconnected neighbourhoods, with 26.67% of the neighbourhoods containing multiple polygons of postcodes (see Table 5). An explanation of how a disconnected neighbourhood unit comes about is given in the appendix. The clustering neighbourhoods and the SNA neighbourhoods have relatively better internal connectivity; the clustering results have the proportion of disconnected neighbourhoods closest to 0 and therefore best meet the "contiguity in space" criterion.

Comparing the number of community centres
The number of community centres that fall inside each neighbourhood is calculated for the three sets of results, as shown in Table 6 and Fig. 7. The differences in the average number between the three sets of results are not significant, but there are still slight differences in the numerical distribution worth analysing. The T-Communities neighbourhoods have the smallest standard deviation and a relatively small average number of community centres (0.86), with over half having 0 centres and nearly 40% having 1 or 2 centres, which is consistent with their smaller average area, but they also include the most outliers (10.68%) that have too many community centres inside one neighbourhood. A possible explanation is that, as zones considered too small are removed in the T-Communities method (by merging them into adjacent larger zones), the neighbourhoods produced from these larger zones might actually contain multiple communities instead of one. The clustering neighbourhoods have the largest average number of community centres (1.45), and the mean value closest to 1. Most clustering neighbourhoods have 0 to 2 centres, with 5.99% outliers. The SNA neighbourhoods have an average of 1.35 centres inside, and over half of the neighbourhoods have 0 centres, with another 25% having 1 or 2 centres. The results suggest that the T-Communities method, the clustering method and the SNA all produce neighbourhoods most of which can be matched with a community centre, and which can resemble actual communities to a certain extent.

Differences between the three neighbourhood results
Compared to the original SNA boundaries, the T-Communities and clustering neighbourhoods have nearly full coverage of London with almost no areas left out, whereas areas such as certain industrial lands and water bodies are removed from the SNA, leaving large gaps between neighbourhoods. Comparing the neighbourhood size, the number of community centres and the proportion of internally disconnected neighbourhoods, the T-Communities method creates fewer neighbourhoods that are too large or too small, since neighbourhood zones considered too small are merged into the nearby zones. One downside of this feature is that, in pseudo-ground-truthing the results through observing the presence of community centres, it becomes clear that some T-Communities neighbourhoods may contain multiple community centres. Where these community centres are in close proximity, such as in the southern tip of Islington (Fig. 8), this may be more a function of abnormally clustered community centres; however, where they are more dispersed, such as in the north-west corner of the Borough (Fig. 8A), this could be indicative of less successful zonation. Comparatively, the neighbourhoods created by clustering are more uniform in size and better internally connected, which better meets two of our neighbourhood definition criteria in Section 3.3: contiguity in space and consistency in size. The fewer outliers in community centre number also indicate more accurate zoning of community centres in areas such as the ones mentioned above.
The ICC results indicate that the neighbourhoods generated by T-Communities have more homogeneous intra-neighbourhood social attributes than the clustering neighbourhoods, which might be contrary to expectations. One possible explanation is that clusters identified from combinations of multiple variables may not accurately represent clusters of any single variable, as the overall social characteristic is the main consideration. Another is that, because the social variables are obtained by interpolating higher-level datasets, the values might not be accurate for each postcode and can have a negative impact on the classification results. It should also be noted that the number of groups can have a slight effect on ICC values and might have contributed to the difference (Müller & Büttner, 1994), but the difference in the number of neighbourhoods between the T-Communities and clustering results is relatively small compared to the number of postcodes, and therefore ICC is still considered a reasonable index for comparing intra-class homogeneity. Compared with the SNA neighbourhood boundaries, which also employ non-pedestrian roads as divisions, the T-Communities neighbourhoods, which apply pedestrian roads on top of non-pedestrian roads, have higher ICCs, suggesting that pedestrian roads are crucial to neighbourhood definition.
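To make the ICC comparison concrete, the sketch below computes a one-way random-effects ICC from values grouped by neighbourhood. It is an assumption that this ANOVA-based form, with the usual adjustment for unequal group sizes, matches the estimator used in the paper; the function name `icc_oneway` and the toy data are hypothetical.

```python
def icc_oneway(groups):
    """One-way random-effects ICC for groups of possibly unequal size.

    `groups` is a list of lists, one inner list of values per
    neighbourhood. ASSUMPTION: the paper's ICC may use a different form.
    """
    a = len(groups)                      # number of groups
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (n_total - a)

    # Effective group size when groups are unbalanced.
    k0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (a - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


# Toy data: the same four values grouped well and grouped badly.
well_grouped = [[1.0, 1.2], [3.0, 3.1]]
badly_grouped = [[1.0, 3.0], [1.2, 3.1]]
icc_good = icc_oneway(well_grouped)
icc_bad = icc_oneway(badly_grouped)
```

A value near 1 means most of the variation lies between neighbourhoods, so each neighbourhood is internally homogeneous; values near or below 0 mean the grouping explains little.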
Generally speaking, the T-Communities and clustering methods both outperform the SNA: they create neighbourhoods that are more uniform in size, cover more of London, are closer to reality and more spatially contiguous, and have more homogeneous social characteristics. This confirms the feasibility of defining neighbourhoods from the spatial and social perspectives. In comparison with each other, the clustering method generally produces more uniformly sized and more spatially contiguous units, while the T-Communities method produces neighbourhood units with more internally homogeneous social characteristics. Each may be preferable for different applications, for example, zoning for school catchments, which might consider mixing socio-demographic groups (T-Communities).

Comparing the spatial and social methods in a London borough
To further test their similarities and differences in spatial structure and expressing social characteristics, the two sets of neighbourhoods that fall inside the Islington borough are extracted as an example, shown in  In the areas closer to the City of London, there are more neighbourhoods defined by T-Communities, and larger variation can be found between adjacent neighbourhoods, whereas the division by clustering is more detailed in the centre area. Although both methods have identified the neighbourhoods in this area with relatively satisfying results, there are still differences between the results and actual neighbourhoods. This is also in accordance with our argument in the beginning that the same characteristics can be portrayed differently using various neighbourhood boundaries, and that these differences in characteristics, however small they are, can have significant impact on neighbourhood analysis and policy making, especially as neighbourhood-level research becomes increasingly popular and important (GLA, 2018a; GLA, 2018b). This, of course, is the real-world essence of the modifiable areal unit problem (Openshaw, 1984).

Contributions to neighbourhood definition
The two methods can be applied flexibly in different scenarios. For example, in T-Communities, the neighbourhoods can be identified at different spatial scales by including successively more major roads in the analysis; in the clustering method, neighbourhoods representing different subjects can be produced by selecting different social variables. By using data collected at different time-points, a dynamic neighbourhood-changing process can also be obtained. Compared with alternative methods such as the classification method using photogrammetry and machine learning, the T-Communities method does not require prior knowledge or ground truth information for classification, which makes the processing more achievable (Cutchin et al., 2011) in different urban contexts. Spatial road network data can be obtained easily from sources like OpenStreetMap for cities across the world, so the method is repeatable in a variety of contexts, which is important given the patchier and less timely availability of good-quality social data. The advantage of clustering, compared with other social-level methods, is that the results have practical physical meanings, i.e., they can be mapped onto individual households represented by each postcode in the neighbourhood, unlike in other social-scale studies where space is divided using small grids (Clapp & Wang, 2006; Kramer, 2017). The integration of MST also ensures the spatial coherence of the neighbourhoods and achieves good spatial contiguity (Grygorash et al., 2006).
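As an illustration of how a minimum spanning tree can drive clustering, the sketch below builds an MST with Kruskal's algorithm and cuts its longest edges, a standard MST clustering scheme. It is a generic example under stated assumptions, not the pipeline used in this study: the function name `mst_clusters` is hypothetical, edge weights are plain Euclidean distances between 2D points, and the number of clusters `k` is supplied by hand, whereas the paper combines PCA-derived social variables with postcode locations.

```python
import math


def mst_clusters(points, k):
    """Split `points` into `k` groups by cutting the k-1 longest MST edges."""
    n = len(points)
    # All pairwise edges, shortest first (ASSUMPTION: Euclidean weights).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal: keep an edge only if it joins two separate components.
    mst = []
    for w, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            mst.append((w, i, j))

    # The MST has n-1 edges; keeping the n-k shortest leaves k components.
    kept = sorted(mst)[: n - k]
    parent = list(range(n))
    for _, i, j in kept:
        parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Two well-separated groups of hypothetical point locations.
demo_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
demo_labels = mst_clusters(demo_points, k=2)
```

Because every cluster is a connected piece of the tree, members of a cluster are always linked through a chain of near neighbours, which is the property that supports spatial coherence.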
The validity of the T-Communities and clustering results shows that the outcome of neighbourhood formation can be inferred from either its spatial structure or its social attributes, but the results do not necessarily point to a causal relationship between spatial structure, social attributes and neighbourhood formation. In other words, how neighbourhoods are formed cannot be explained from this perspective. According to Grannis' proposal of T-Communities, the physical structure of road networks directly shapes neighbourhoods, as people tend to socialize with those in close proximity or choose to live next to people sharing a similar social background (Grannis, 2009). However, the formation of neighbourhoods can also in turn change the structure of the roads, as residents may actively lobby to reduce speed limits, widen pavements, pedestrianize or block off through flow ("rat-runs") to improve safety, downgrading the status of the roads in the process.
Similarly, while social characteristics are the driving force that brings people together, the clustering of homogeneous social characteristics could also be the outcome as well as the cause of neighbourhood formation, as the clustered residents influence and change each other to develop new social characteristics. From this perspective, the underlying theory behind the two methods is actually homologous to Soja's sociospatial dialectic theory (Soja, 1980). Neighbourhood formation is considered a complex process where social and spatial factors continuously influence each other and shape the neighbourhood together. We conclude the relationship between spatial structure, social characteristics, and neighbourhood outcome in the form of a dynamic cycle shown in Fig. 11, where the spatial structure and social characteristics indirectly affect each other through the physical and social outcomes of neighbourhoods. This process is also consistent with some previous definitions of neighbourhoods (Chaskin, 1997; Galster, 2001; Lee, 1968). Based on these previous studies and the empirical analysis presented in this paper, we suggest that the evolution of the neighbourhood is a dynamic process formed through the interaction and clustering of residents and impacted by the urban physical structure and social factors, the outcome of which in turn influences the social and spatial factors, consequently forming its future state.

Fig. 9. IMD distribution for the two neighbourhood results in Islington.
On this basis, an ideal way to define neighbourhoods and their boundaries should combine both social and spatial attributes, with reference to local socio-spatial contexts and experience, similar to the three dimensions of neighbourhood summarized by Chaskin (1997): space, society and experience.
That said, it is worth reiterating that we have shown in this analysis that neighbourhoods defined from purely spatial attributes (T-Communities) display a large degree of social homogeneity. This finding, relevant to anyone seeking to define neighbourhoods in the absence of high-resolution social data, runs counter to the critique that T-Communities ignore the social dimension of neighbourhoods. Through the lens of the socio-spatial dialectic we have been able to show that, because neighbourhoods are both socially and spatially constructed over many years and iterations, a spatial conception of a 'neighbourhood' actually captures a large degree of social homogeneity within. We believe this finding applies not only to cities like London, where a highly developed road network exists: our theory does not depend on the scale or size of the city and should therefore be universal. Further work is required, however, to test this contention in other urban settings. For cities with less-developed formal road networks, other spatial structures or features such as footpaths (for example in informal settlements) or waterways can be used for T-Communities generation. This has useful benefits, not just for organisations like TfL in well-measured cities like London, but also for cities and civic entities without good access to fine-grained social data, where physical data from satellite imagery or OpenStreetMap are perhaps more widely available. We have shown that it is possible to create socially viable neighbourhood definitions using the T-Communities methodology, and this is a beneficial practical outcome of this piece of research.

Conclusions
In this research, we set out to evaluate two sets of neighbourhood delineation methodologies: the T-Communities method and the PCA & MST clustering method. Compared to a set of SNA neighbourhoods developed by TfL, we found that: 1) both new neighbourhood systems have better internal social homogeneity and increased external heterogeneity; 2) both neighbourhood systems are more homogeneous in size, more spatially contiguous, and better resemble actual communities spatially; 3) depending on the context and the desired outcomes, both methods produce useful zonation outcomes, although the T-Communities method does so with fewer inputs and wider application potential due to the availability of open road network data.
In evaluating the relative merits of the social and spatial approaches to neighbourhood delineation, we discuss the dialectic relationship between neighbourhood outcomes, spatial structures, and social characteristics through a neighbourhood formation cycle. It is clear that an optimal way to define neighbourhood boundaries accounts for both spatial and social attributes and ideally combines this with experiential knowledge in a particular urban context. We have also shown that in practical terms, because of the dialectical relationship between the social and spatial makeup of neighbourhoods and the fact that these two apparently separate features are closely intertwined, it is possible (and in many cases more practical) to define neighbourhoods from solely physical characteristics and still capture a large amount of social homogeneity in the process.
In utilising an applied contemporary urban setting and a real-world policy imperative to shed light on this area, we have addressed an empirical blind spot in the study of neighbourhoods by showing that the social variation of the neighbourhood can be deduced from its spatial structure, and vice versa. The results and methods can potentially support further neighbourhood analysis and decision-making elsewhere, although it will be informative to replicate and carefully evaluate this methodology in other urban settings to confirm whether the relationship observed in London between society and space exists more broadly.

Fig. 1. A graphic definition of T-Communities, from T-Communities: Pedestrian Street Networks and Residential Segregation in Chicago, Los Angeles, and New York (Grannis, 1998).

Fig. 10. BAME resident distribution for the two neighbourhood results in Islington.

Fig. A.1. The distribution of a disconnected neighbourhood unit.

Fig. A.3. The distributions of the social variables for the postcodes.

Table 1
Variables selected for neighbourhood definition.

Table 2
The results of PCA (with weights over 0.20 highlighted in bold text).

Table 3
Statistics of the neighbourhood sizes.

Table 4
The ICC results of T-Communities, clustering and SNA.

Table 5
The proportion of internally disconnected neighbourhoods.

Table 6
Statistics of community centres number.
characteristics, the IMD and BAME variables are used as the representing variables. The value of each neighbourhood is aggregated by calculating the average value of the postcodes weighted by population density. Figs. 9 and 10 show the distribution of IMD and BAME in the two sets of neighbourhoods in Islington. IMD and BAME have generally similar distributions in the two maps. The neighbourhoods in the north have higher IMDs, while neighbourhoods in the south closer to the City of London are less deprived. The central neighbourhoods have the lowest BAME residents, with two clusters of high BAME residents in the north and the southwest.
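The aggregation step described above, a postcode average weighted by population density, can be sketched as follows. The function name `density_weighted_mean` and the example values are hypothetical and serve only to show the arithmetic.

```python
def density_weighted_mean(values, densities):
    """Average postcode values within one neighbourhood, weighted by density.

    `values` are per-postcode scores (for example IMD) and `densities`
    are the matching population densities. Names are illustrative only.
    """
    if len(values) != len(densities):
        raise ValueError("values and densities must have the same length")
    total_weight = sum(densities)
    if total_weight == 0:
        raise ValueError("total population density must be positive")
    return sum(v * d for v, d in zip(values, densities)) / total_weight


# Two hypothetical postcodes: the denser one pulls the average towards it.
neighbourhood_score = density_weighted_mean([10.0, 30.0], [1.0, 3.0])
```

Here the result is (10 × 1 + 30 × 3) / 4 = 25, closer to the value of the denser postcode than the unweighted mean of 20 would be.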