A machine learning methodology to quantify the potential of urban densification in the Oxford-Cambridge Arc, United Kingdom

Regional-scale urban residential densification provides an opportunity to tackle multiple sustainability challenges in cities. However, a framework for detailed large-scale analysis of densification potential, and for its integration with natural capital to assess housing capacity, is lacking. Using a combination of the Random Forests machine learning algorithm and exploratory data analysis (EDA), we propose density scenarios and housing-capacity estimates for the potential residential land in the Oxford-Cambridge Arc region of the UK (whose current population of 3.7 million is expected to increase to as many as 4.7 million by 2035). A detailed analysis was carried out for Oxfordshire, assuming different densities in urban and rural areas and protecting land with high-value natural capital from development. For a 30,000 dwellings-per-year scenario, the land allocated in Local Plans could cover housing growth in the four districts but not in Oxford City itself (which accounts for 48% of the demand); only 19% of the need would be covered in the low-density scenario but 59% in the high-density scenario. Our study suggests a decision-support method for quantifying how the impact of housing growth on natural capital can be significantly reduced using more compact development patterns, protection of land with high-value natural capital, and use of low-biodiversity brownfield sites where available.


Introduction
Urbanisation is a global trend. For example, in the UK the urban population is projected to grow from 57 million in 2021 to 63 million in 2035 while the rural population shrinks from 11 million to 9 million (UN, 2018). Urban living can be more sustainable, due to high-density housing, mixed use and, in many countries, a well-functioning public transport system and promotion of cycling and walking (Burton, 2000). Urban areas can grow either through expansion or densification (Dembski et al., 2020). For expansion, buildings are added at the margin of the existing urban area, typically resulting in loss of 'greenfield' land such as farmland or natural land. Expansion, however, may compete with the demand for land for food production, nature recovery and other 'ecosystem services' such as carbon sequestration and flood protection, especially in densely populated countries such as England. Expansion can also contribute to 'urban sprawl', where low density development increases car dependency.
For densification, buildings are added to available land within the urban boundary. Densification may often be a more sustainable option than expansion (Dembski et al., 2020; Mohajeri & Gudmundsson, 2014; Mohajeri et al., 2015) and can take several forms. First, densification can be on 'brownfield' land, i.e., unused land which has previously been developed, e.g., for commercial or industrial purposes. Although brownfield land is sometimes contaminated or polluted, some sites may regenerate to a mosaic of scrubby habitats with high biodiversity values. Redevelopment of such brownfield sites can be associated with inner-city transformation as part of urban regeneration (Dembski et al., 2020). Second, densification can also be associated with change in the use of buildings, such as from offices to residential spaces, particularly in urban centres (Clifford et al., 2019), or the transformation of suburban areas, e.g., by building on gardens and green spaces (Charmes & Keil, 2015). Third, densification can be 'soft' and/or 'hard' as regards the associated physical change. Soft densification refers to incremental changes in the built environment, such as through 'garden grabbing' (Sayce et al., 2012; Bibby et al., 2020), whereas hard densification refers to large-scale projects for urban development (Touati-Morel, 2015).
Excessive densification, however, can lead to overcrowding and exacerbate local environmental impacts such as noise, traffic, and lack of green space (Brunner & Cozens, 2013; Howley et al., 2009; Melia et al., 2011; Neuman, 2005; Williams, 2000). Many studies that favour urban densification strategies do not address the commonly associated lack of access to urban green space (Byrne et al., 2010; Haaland & van den Bosch, 2015), which is particularly important for human health and well-being during global pandemics such as Covid-19 (Burnett et al., 2021; Hamidi & Zandiatashbar, 2021; Volence et al., 2021).
The impact of urban densification or expansion depends primarily on two factors: the previous land use (brownfield sites, urban green spaces, gardens, farmland, or natural land) and the housing density. In the UK, concern about growing urban sprawl in the 1990s led to efforts to increase land use efficiency by increasing housing density. Minimum housing density was set at 30 dwellings per hectare (dph) for new residential development in the 2004 National Planning Policy Framework (NPPF) Planning Policy Guidance 3 (PPG 3), with much higher densities (>50 dph) in and around town/city centres and key transport nodes. In 2012, however, the NPPF removed this requirement and allowed local planning authorities to set their own local guidelines. Minimum densities, however, had to be reintroduced because of the ongoing housing crisis in the UK. In 2018 major changes were introduced in a draft proposal to the NPPF (NPPF, 2018). These included giving local councils the power to reject planning applications for residential developments if the density was regarded as too low, together with renewed emphasis on brownfield development. Additional changes included further measures to promote the conversion of shops into housing and, crucially, the reintroduction of minimum density targets for housing developments around transport hubs and in urban centres. In NPPF (2021), greater densities are encouraged around city-centre transport hubs as well as in suburban areas.
Researchers are increasingly applying a range of methods to study the potential for urban densification, from a city or neighbourhood scale (e.g., Amer et al., 2017; Amer and Attia, 2019; Broitman and Koomen, 2015; Vuckovic et al., 2017) to a national scale (Bibby et al., 2020; Wang et al., 2019). Amer and Attia (2019) and Amer et al. (2017) developed a method to assess density potential through roof-stacking at the building level for Brussels. Bolton (2021) developed a new tool, the Space Ratio, defined as the ratio of the existing density to the permissible density (the developable density according to a specified policy or scenario), for measuring and mapping density potential in London. In a large-scale bottom-up approach, Eggimann et al. (2021) developed a geospatial simulation framework to assess densification potential at a neighbourhood scale and then applied this to simulate densification potential at a national scale for Switzerland using supervised archetype classifications. By contrast, Hargreaves (2015) provided a top-down regional modelling framework to forecast the number of dwellings and their average residential density based on a spatial interaction model, which was later calibrated and validated at a district scale using detailed GIS data (Hargreaves, 2021).
Some recent studies use image analysis and Machine Learning (ML) to study how individual buildings may evolve over time (Hecht et al., 2013; Hussain & Chen, 2018; Moosavi, 2017), while others use ML to predict how and where a city neighbourhood develops (Reedes et al., 2019). ML has also been used to analyse existing patterns and processes of neighbourhood development to understand complex urban processes such as gentrification, that is, the displacement of working-class residents of a neighbourhood by wealthier professionals (Reedes et al., 2019). Identifying the spatial differences between neighbourhoods at a regional scale (Schirmer and Axhausen, 2015) and classifying their characteristics (e.g., housing density) provides useful information that can be used to develop scenarios for future sustainable housing developments.
This study makes a novel contribution by developing a data-driven methodology to quantify the potential for housing densification in the Oxford-Cambridge Arc in the UK, with a view to reducing the environmental impact of development in the Arc. A further novel element is the use of exploratory data analysis to develop three density scenarios (high, medium, and low) that take into account the value of 'natural capital' assets such as productive farmland and woodlands. More specifically, the main aims of this study are as follows: (i) To develop a data-driven methodology to classify the existing residential neighbourhoods and their spatial characteristics as Centre, Urban, Suburban, and Rural, in the Oxford-Cambridge Arc, using the ML algorithm of Random Forests (RF). (ii) Assuming a uniform density, to use the results of the classification to estimate the total housing capacity for the entire Arc region by the year 2035 (by which time the Arc population is expected to have increased from its current population of 3.7 million to as many as 4.7 million) for four pre-defined housing growth scenarios (ITRC, 2020; Fig. 1). (iii) To use exploratory data analysis to develop three density scenarios (high, medium, and low) that associate land of high 'natural capital' value, such as productive farmland and woodlands, with lower densities. (iv) To apply the density scenarios to the county of Oxfordshire, the only one of the five counties in the Arc where we had access to spatial data on land allocations for housing. Considering the variation in housing density from the urban centre to rural areas (non-uniform density) and the natural capital value of the land, we quantify the potential for densification to reduce pressure on land for housing.

Oxford-Cambridge Arc housing growth scenarios
The Oxford-Cambridge Arc has been identified as a key area for high population growth (Fig. 1a). This fast-growing area comprises 26 Local Authority Districts (LADs) (Fig. 1b), is home to 3.7 million people (in 2017), provides over 2 million jobs, and contributes over £110 billion of Gross Value Added to the UK economy per year (ITRC, 2020). But the Arc also contains highly productive agricultural land and natural land which is vulnerable to loss from development. In 2016, the National Infrastructure Commission (NIC) was asked by the UK Government to explore the growth potential of the Arc as a single, knowledge-intensive cluster (ITRC, 2020). Population change, economic growth and the infrastructure services in the Arc have wider impacts at regional and national levels in the UK. New developments in the Arc provide opportunities to build to the highest standards of energy efficiency, develop the transport infrastructure, and design sustainable drainage. But they also present challenges to preserve green corridors, minimise impact on natural capital (including food production), and design liveable places (ITRC, 2020). Thus, the NIC identified four inter-related policy themes as important for facilitating future growth throughout the Arc: (i) productivity: ensuring that businesses and skills are supported so as to maximise the Arc's economic prosperity; (ii) place-making: delivering sufficient affordable, high-quality homes, workplaces and community places; (iii) connectivity: improving infrastructure for transport, digital connectivity, and utilities; and (iv) environment: protecting and enhancing the natural environment. In this study, we focus on two of these themes, place-making and environment, both of which present great challenges for the future of the Arc (ITRC, 2020).
While there are many possible future development patterns and choices for the Arc, this study uses a set of housing growth scenarios that was developed as part of the NIC consultation from 2016 to 2018 (5th Studio, 2017; ITRC, 2020): Baseline, Unplanned development, Expansion of existing settlements, and New settlements (Fig. 1c). For the Baseline scenario, housing growth is assumed to be 14,460 new dwellings per year, which equals the average number of new dwellings completed in recent years (Fig. 1d), and the population grows from 3.7 million people in 2017 to 4.1 million in 2035 and 4.4 million in 2050. In the Unplanned development scenario, new housing development takes place at a rate of 18,978 new dwellings per year, but developments can occur on an ad-hoc basis without an overall spatial vision. For Expansion of existing settlements, two different dwelling projections were considered: (i) a high-growth scenario of 23,000 new dwellings per year to meet the needs within the Arc and (ii) the same scenario with an additional 7,000 dwellings per year to relieve pressure from Greater London and South East England (that is, 30,000 new dwellings per year). The Expansion scenario will be partly in the form of densification, whereby additional dwellings are built within the existing urban areas, primarily on brownfield land and/or permitted greenfield land, and partly through expansion, that is, adding dwellings at the margins of the existing urban areas. In the Expansion scenario the population is expected to grow from 3.7 million to 4.7 million by 2035 and to 5.4 million by 2050. In this study we focus on the Baseline, Unplanned, and two Expansion scenarios and ignore the New settlements scenario.
All the scenarios extend from 2020 to 2035. Fig. 1c shows the total number of dwellings (in millions) for each scenario in the Arc. This total growth is apportioned across the 26 LADs based on the population growth and the average number of dwellings completed in the past up to the year 2017 (Table 2; ITRC, 2020). Fig. 1d shows the number of dwelling completions per year (in thousands) within the Arc since 2004, and the assumed future completions under each scenario.

Input data and pre-processing
Datasets on urban development in the Arc were collected and processed using GIS Spatial Analysis Tools, to train a Machine Learning model to recognise distinct patterns of spatial characteristics (such as the size and density of residential buildings, gardens, and roads) for different zones (Centre, Urban, Suburban and Rural). Several layers of input data were used (Fig. 2; Table 1). The following features were selected and computed at the neighbourhood scale, using a grid which covers the total area of the Arc with a pixel size of 500 m × 500 m: (a) Dwelling geometric features (e.g., building footprint area). (b) Street-network characteristics (e.g., street density). (c) Density of dwellings, that is, the number of housing units or dwellings per hectare (dph). (d) Building residential classes (detached, semi-detached, terraced, and flats). (e) Statistical characteristics of the surrounding pixels, such as the minimum, maximum, and mean values of the above attributes for the eight pixels surrounding the central pixel (Fig. 2). While the maximum number of surrounding pixels around the central pixel is 8, in some areas, such as at the boundaries of cities or towns, there are pixels that do not contain buildings and are, therefore, excluded. The attributes of the surrounding pixels were computed using geospatial tools in combination with the Python programming language. The features were selected to reflect the residential urban forms at the neighbourhood level (pixel size of 500 m × 500 m).
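The surrounding-pixel statistics in (e) can be illustrated with a minimal sketch. It assumes that one attribute (here dwelling density) is stored in a small two-dimensional grid, with `None` marking pixels that contain no buildings. The grid values and the function name `surrounding_stats` are hypothetical and are not taken from the study's own code.

```python
from statistics import mean


def surrounding_stats(grid, row, col):
    """Min, max and mean of the up-to-eight pixels around (row, col).

    Pixels outside the grid or without buildings (None) are excluded,
    mirroring the exclusion of empty pixels at settlement boundaries.
    """
    values = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the central pixel itself
            r, c = row + dr, col + dc
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if inside and grid[r][c] is not None:
                values.append(grid[r][c])
    if not values:
        return None  # isolated pixel: no built-up neighbours
    return {"n": len(values), "min": min(values),
            "max": max(values), "mean": mean(values)}


# Hypothetical dwelling densities (dph) on a 3 x 3 block of 500 m pixels.
density = [
    [12, 20, None],
    [18, 35, 22],
    [None, 25, 15],
]
stats = surrounding_stats(density, 1, 1)
print(stats)
```

In this toy grid the central pixel has six built-up neighbours rather than eight, so its statistics are computed over six values only.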
To determine how much land is currently allocated for housing development in the Arc, we collected the areas (in hectares) of sites assigned for housing development, both within the urban areas and at their margins, from the Housing and Economic Land Availability Assessment (HELAA) reports and Local Plan reports of the 26 Local Authority Districts (LADs). The areas of these sites (larger than 0.25 hectares) are estimated for the year 2035 (in most of the LADs). Sites smaller than 0.25 hectares are not included in this study because of the lack of available data for all the LADs. However, such small sites may contribute significantly to meeting the housing requirement of the LADs and are often built out relatively quickly. The land availability assessment and the associated report provided by each local council comprise several analyses of the existing sites and consider physical and environmental constraints. These include access to services (e.g., parks, shops, schools, and bus and train stations), landscape features, nature and heritage conservation, as well as assessments of the risks of flooding, pollution, and contamination. Of all the sites allocated for housing development in the 26 LADs, we have access to digital GIS data only for the 5 LADs covered by Oxfordshire County Council. In addition, the National Housing Federation (NHF, 2019) has identified and mapped more than 18,000 brownfield sites across England, covering over 26,000 hectares of land (CPRE, 2019). As this was carried out in 2017, after development of the Local Plans, there should not be overlap between the two datasets. A minimum estimate of net dwellings suggests that it is possible to develop more than one million net homes on all the brownfield sites in England (CPRE, 2019). Table 3 shows the potential areas of sites in hectares for housing development in Local Plans and on brownfield land for the 26 LADs in the Arc.

Machine Learning (ML) for residential density classification
Here we apply an ML algorithm to classify existing residential neighbourhoods within the Arc region into Centre, Urban, Suburban and Rural classes based on spatial characteristics including housing density (see results in Sections 3.1 and 3.2). Random Forests (RF), an ensemble-learning algorithm (Breiman, 2001), is used in this study for residential neighbourhood classification. The RF algorithm applies the technique of bagging (bootstrap aggregating) to decision-tree learners (Breiman, 1996). In bagging, multiple trees are trained on random subsets of the data drawn with replacement. For classification, the majority vote of the trees is then used to predict the label of a new observation. RF is a popular ML algorithm for various reasons, including: (a) RF is an ensemble-learning method, which generally limits overfitting. (b) The use of bootstrapping enables RF to work well on relatively small datasets. (c) Predictors can be trained in parallel. (d) Decision-tree learning enables automatic feature selection. (e) RF does not require much hyper-parameter tuning (Breiman, 1996, 2001). (f) RF provides a feature importance metric. For the RF implementation we use Python's scikit-learn package.
We classify the residential neighbourhoods according to their spatial locations (Centre, Urban, Suburban, Rural) using the features described in Section 2.2. Fig. 2 shows the workflow for the ML classification of residential neighbourhoods in the Arc. The importance of the features in training the RF classification of the residential neighbourhoods based on their spatial locations (Centre, Urban, Suburban, Rural) is shown in Fig. 3a and the model accuracy is shown in Fig. 3b. While all the features listed in Fig. 3a contribute to the training of the RF classification, the features are clearly of very unequal importance. The three variables of the greatest importance are (1) the total footprint area, (2) the dwelling density (number of dwellings per hectare, dph), and (3) the number of private gardens in the surroundings (Fig. 3a). In the present work one of the main aims is to assess the potential for accommodating a greatly increased population in the Arc in the coming decades through densification of the existing sites. Therefore, the focus here is on the features related to density, namely the dwelling density (number of dwellings per hectare, dph) for each neighbourhood and its surroundings.
To maximise the performance of the classifier, and to allow the classifier to generalise well outside the labelled (training) dataset, the following steps are used: (i) Separate the labelled dataset into a training and validation set (75% of the data) and a test set (25% of the data). (ii) Train a model using solely the training dataset. (iii) Use the trained model to predict output values for pixels in the test dataset. (iv) Compute a test error from the discrepancy between the predicted outputs and the known labels. The labelled dataset comprises 1655 samples, obtained from the existing literature (e.g., Bibby and Brindley, 2013) and expert knowledge, of which 414 samples (25% of the labelled data) are held out for validation. Most ML models include hyper-parameters which must be tuned to obtain the best model performance for a given dataset. The hyper-parameters are tuned during the training of the model, using a procedure called k-fold cross-validation.
To measure the performance of the classifier (by estimating the test error), we use Precision (the percentage of correct predictions among all predictions of a class), Recall (the percentage of pixels of a class that are correctly classified), and the F1-score (the harmonic mean of precision and recall, varying between 0 and 1), which together give a measure of incorrectly classified pixels. We also use accuracy, which is a classical performance measure for classification tasks. The accuracy for the individual classes is the same as Recall: it is the percentage of pixels of a class that are correctly classified in the test set, using the model built on the training set. Table 4 shows the results of the residential classification based on spatial location (Centre, Urban, Suburban, Rural) and the model performance.
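As a worked illustration of these measures, the sketch below computes per-class precision, recall and F1 (taken here as the harmonic mean of precision and recall) together with overall accuracy, using plain Python. The label lists are hypothetical and far smaller than the study's test set; `per_class_metrics` is an illustrative helper name.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical true and predicted classes for six test pixels.
y_true = ["Urban", "Urban", "Urban", "Rural", "Rural", "Suburban"]
y_pred = ["Urban", "Urban", "Suburban", "Urban", "Rural", "Suburban"]

for label in ("Urban", "Suburban", "Rural"):
    print(label, per_class_metrics(y_true, y_pred, label))

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy", accuracy)
```

In this toy example one of the two Rural pixels is predicted as Urban, so the recall for Rural is 0.5 while its precision is 1.0, which shows why the per-class measures are reported alongside overall accuracy.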
The overall test accuracy is 80%; the highest per-class accuracy is for the Urban class (91%) and the lowest is for the Rural class (38%). The overall training accuracy is 95%, while the out-of-bag (OOB) accuracy is 76%. OOB accuracy is a validation estimate used in RF algorithms that exploits the randomness introduced through bagging: each sample is predicted using only the trees whose bootstrap sample did not include it.
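The split, training and out-of-bag check described in this section can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the data are synthetic (generated with `make_classification` as a stand-in for the 1655 labelled pixels), and the number of trees, the number of features and the choice of five folds are assumptions, since the study's actual settings are not given here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the labelled pixels: 1655 samples, 4 classes.
X, y = make_classification(n_samples=1655, n_features=12, n_informative=6,
                           n_classes=4, random_state=0)

# 75% for training/validation, 25% held out as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# oob_score=True asks the forest to score each sample with the trees
# that did not see it during bootstrapping.
model = RandomForestClassifier(n_estimators=100, oob_score=True,
                               random_state=0)

# k-fold cross-validation on the training data (k = 5 is an assumption).
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
oob_acc = model.oob_score_
test_acc = model.score(X_test, y_test)
importances = model.feature_importances_  # one value per feature

print(f"CV mean={cv_scores.mean():.2f} train={train_acc:.2f} "
      f"OOB={oob_acc:.2f} test={test_acc:.2f}")
print(classification_report(y_test, model.predict(X_test)))
```

The training accuracy of a forest is usually much higher than its OOB or test accuracy, which is why the OOB and test figures are the more informative indicators of how well the classifier generalises.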

Exploratory data analysis (EDA) using box plots
To develop density scenarios for each residential class (Centre, Urban, Suburban, Rural) and estimate the housing capacity in Oxfordshire, we use exploratory box plot analysis (see results in Section 3.5; cf. Fig. 2 workflow diagram). Box plots, also called box-and-whisker plots, are commonly used graphs to visualise data (Everitt & Skrondal, 2010; Tukey, 1977). The plots are particularly useful for revealing the central tendency and variability or dispersion of the data, the shape of the distribution (symmetry or skewness), as well as the possible presence of outliers. Moreover, box plots are also a powerful graphical technique for comparing samples from two or more different populations. Each box plot is made of five key components which together provide information about the distribution of the data. These components are shown in Fig. 4: The second quartile is denoted by Q2 and corresponds to the median (50th percentile), that is, the middle value of the dataset. The first quartile and the third quartile are denoted by Q1 (corresponding to the 25th percentile) and Q3 (corresponding to the 75th percentile), respectively. Lower (inner and outer) and upper (inner and outer) fences separate the inliers from the mild and extreme outliers. The fences (envelope) are defined as follows:

$$\text{inner fences: } \left[\, Q_1 - 1.5 \times \mathrm{IQR},\; Q_3 + 1.5 \times \mathrm{IQR} \,\right] \tag{1}$$

$$\text{outer fences: } \left[\, Q_1 - 3 \times \mathrm{IQR},\; Q_3 + 3 \times \mathrm{IQR} \,\right] \tag{2}$$

where IQR denotes the interquartile range (IQR = Q3 − Q1), and is a measure of variability around the median. The upper whisker and lower whisker connect the hinges with the fences. All individual points beyond the lower and upper fences are represented as outliers. Points beyond the inner fences in either direction are mild outliers; points beyond the outer fences in either direction are extreme outliers. In this study, the box plots, including the median (50th percentile) and Q3 (corresponding to the 75th percentile) as well as the upper inner fence and upper outer fence (Eqs. (1) and (2)), are used to build up the density scenarios (low, medium, high) for each residential class and to estimate the housing capacity in Oxfordshire (cf. Section 2.5).
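A minimal sketch of the fence calculation and outlier labelling is given below. It assumes the standard Tukey multipliers of 1.5 and 3 on the IQR, which agree with the upper fences used for the density scenarios (75th percentile + 1.5 × IQR and 75th percentile + 3 × IQR). The quartile interpolation method (`method="inclusive"`) and the density values are assumptions made for illustration; other quartile conventions give slightly different fences.

```python
from statistics import quantiles


def tukey_fences(values):
    """Quartiles, IQR and the inner and outer fences of a sample."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "Q1": q1, "median": q2, "Q3": q3, "IQR": iqr,
        "lower_inner": q1 - 1.5 * iqr, "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3.0 * iqr, "upper_outer": q3 + 3.0 * iqr,
    }


def classify(value, fences):
    """Label a value as an inlier, a mild outlier or an extreme outlier."""
    if value < fences["lower_outer"] or value > fences["upper_outer"]:
        return "extreme outlier"
    if value < fences["lower_inner"] or value > fences["upper_inner"]:
        return "mild outlier"
    return "inlier"


# Hypothetical dwelling densities (dph) for one residential class.
dph = [8, 10, 12, 14, 15, 16, 18, 20, 22, 40, 75]
fences = tukey_fences(dph)
print(fences)
print({value: classify(value, fences) for value in (20, 40, 75)})
```

With these values the quartiles are 13, 16 and 21 dph, so the upper inner and outer fences fall at 33 and 45 dph: a pixel at 40 dph is a mild outlier and one at 75 dph an extreme outlier.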

Development of density scenarios and housing capacity estimation
The housing capacity is defined as the number of dwellings that can be built on the residential land (potential land allocated for housing in the Local Plans as well as any additional brownfield sites). The estimated housing capacity depends strongly on the housing density. We estimate the housing capacity using two approaches, depending on whether GIS data on the location, exact size and shape of the housing allocation sites are available. In the first approach (cf. Section 3.3), we propose a uniform density across each LAD for the entire Arc region (26 LADs). This is because detailed GIS data on the spatial location of the potential land for housing are unavailable except for Oxfordshire. In this approach, the assumed dwelling density is partly based on different density bands suggested by several Local Plans (e.g., Oxford City Council Local Plan 2036, 2018), and partly based on the current average dwelling density in all the LADs. We then estimate the housing capacity for different density bands for the entire Arc region (26 LADs).
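Under the uniform-density approach the capacity of a district is simply its allocated area multiplied by the assumed density. The sketch below illustrates this arithmetic; the district names, areas, density bands and demand figure are all hypothetical and are not values from Table 3 or from the Local Plans.

```python
def housing_capacity(area_ha, density_dph):
    """Dwellings that fit on area_ha hectares at density_dph dwellings/ha."""
    return area_ha * density_dph


# Hypothetical allocated land (hectares) for two illustrative districts.
allocated_ha = {"District A": 400.0, "District B": 150.0}

# Illustrative uniform density bands in dwellings per hectare (dph).
density_bands = [30, 40, 50]

# Capacity per band and district, then summed over districts.
capacity = {
    band: {lad: housing_capacity(ha, band) for lad, ha in allocated_ha.items()}
    for band in density_bands
}
total_by_band = {band: sum(per_lad.values())
                 for band, per_lad in capacity.items()}

# Share of a hypothetical demand of 20,000 dwellings met by each band.
demand = 20_000
coverage = {band: total / demand for band, total in total_by_band.items()}

print(total_by_band)
print(coverage)
```

Because capacity scales linearly with density, moving from 30 to 40 dph on the same 550 hectares raises the capacity from 16,500 to 22,000 dwellings, enough to turn a shortfall against this hypothetical demand into a surplus.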
In the second approach (cf. Section 3.5), we propose a non-uniform density to estimate the housing capacity in Oxfordshire (5 LADs). This is because GIS data for the spatial location of the allocated sites for housing in the Arc are available only for Oxfordshire, through Oxfordshire County Council. In this second approach, we use a data-driven method to develop different scenarios of dwelling density in dph (low, medium, and high density) for the land allocated for housing development, depending on its class (Centre, Urban, Suburban, Rural). In addition, we consider the natural capital scores (which range from 0 to 10, cf. Section 2.6) in combination with the residential classes, so that land which has a high value (a high natural capital score) is protected from housing development.
As future dwelling density is expected to be 'at least' as high as in existing residential areas, the scenarios consider the maximum existing density (dph) for a given residential class and for each natural capital score (excluding the outliers). To obtain the maximum density (dph), we compute the 50th percentile (median), the 75th percentile, and the interquartile range, IQR (the difference between the 75th and the 25th percentile of the existing dwelling density, dph). We also set the range of all 'inliers', that is, the envelope of the data, using Eqs. (1) and (2), and thus remove all the outliers. The low-density scenario for the different residential classes (Centre, Urban, Suburban and Rural) is based on the median (the 50th percentile) of the existing dwelling density for a given natural capital score. The medium-density scenario is based on the upper inner fence (75th percentile + 1.5 × IQR), and the high-density scenario is based on the upper outer fence (75th percentile + 3 × IQR). To convert the envelope to a limited number of non-uniform density scenarios and to smooth the envelope, we finally compute the average envelope value for each residential class across the natural capital scores (cf. Section 3.5).
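The scenario construction can be sketched as follows: for each natural capital score, the low, medium and high densities are taken from the median, the upper inner fence and the upper outer fence, and the three values are then averaged across scores for the class. The densities, the two natural capital scores and the quartile interpolation method are hypothetical choices made for illustration; the study's own figures are reported in its Section 3.5.

```python
from statistics import mean, quantiles


def scenario_densities(dph_values):
    """Low, medium and high density (dph) from existing densities."""
    q1, q2, q3 = quantiles(dph_values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "low": q2,                  # median
        "medium": q3 + 1.5 * iqr,   # upper inner fence
        "high": q3 + 3.0 * iqr,     # upper outer fence
    }


# Hypothetical existing densities (dph) for the Urban class,
# keyed by the natural capital score of the land.
urban_by_score = {
    2: [20, 24, 28, 32, 36],
    5: [16, 20, 24, 28, 32],
}

per_score = {score: scenario_densities(values)
             for score, values in urban_by_score.items()}

# Smooth the envelope by averaging each scenario across the scores.
averaged = {name: mean(per_score[score][name] for score in per_score)
            for name in ("low", "medium", "high")}

print(per_score)
print(averaged)
```

Averaging across scores yields one low, one medium and one high density per residential class, which can then be multiplied by the area of each allocated site to estimate its capacity.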

Natural capital
Natural capital refers to those factors or elements of nature that directly or indirectly create value for humans. These include ecosystems, species, soils, rocks, water, air, and the natural processes that link these elements and sustain life (Natural Capital Committee, 2013). Natural capital provides essential services for human wellbeing and health. We consider the natural capital value of the land, so that land which has a high value for food production, ecosystem services or biodiversity is protected from housing development. The natural capital of the land potentially available for housing development in the Arc is assessed using a matrix of indicative scores, where the scores range from 0 to 10 (Smith et al., 2017). The scores reflect the ability of different habitat types to deliver 18 different ecosystem services in three main categories. These are (i) provisioning services (e.g., food crops, wood, and fish), (ii) cultural services (e.g., recreation, sense of place, education, and knowledge), and (iii) regulating services (e.g., flood control, water quality, and air quality).
The method is applied to Oxfordshire (cf. the workflow diagram in Fig. 2) to produce natural capital maps using detailed habitat and land use data, agricultural land class, and habitat designations (Smith, 2019). Although 18 maps were developed for the 18 different ecosystem services, a single combined map may be more useful for decision makers. However, in theory, scores for different services cannot be added together or averaged because they are not in common units. Therefore, to create a combined map, we instead show the 'maximum' score out of all 18 services for each land parcel. Thus, the score (0 to 10) for each land parcel is the highest score that the parcel achieves for any one of the 18 services (Smith et al., 2017).
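The combined score can be illustrated with a short sketch that takes, for each land parcel, the maximum over its service scores. Only three of the 18 services are shown, and the parcel names and scores are hypothetical.

```python
# Hypothetical scores (0 to 10) for three of the 18 ecosystem services.
parcels = {
    "parcel_1": {"food": 8, "recreation": 2, "flood_control": 3},
    "parcel_2": {"food": 1, "recreation": 6, "flood_control": 9},
    "parcel_3": {"food": 0, "recreation": 1, "flood_control": 1},
}


def combined_score(scores):
    """Return (maximum score, service achieving it) for one parcel."""
    service = max(scores, key=scores.get)
    return scores[service], service


combined = {name: combined_score(scores) for name, scores in parcels.items()}
print(combined)
```

Taking the maximum avoids adding or averaging scores that are not in common units: a parcel scores highly if it is valuable for at least one service, as parcel_1 is for food production and parcel_2 for flood control.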

Results
The residential classification and analyses of the dwelling density (dph) for the whole Arc and for each LAD are presented in Sections 3.1 and 3.2. The development of the density scenarios, regardless of urban-rural differences, and the estimated housing capacity for the Arc region using the uniform density approach are presented in Section 3.3. The development of alternative density scenarios (low, medium, high) for each residential class (Centre, Urban, Suburban, Rural), considering also the natural capital scores, and the estimates of housing capacity for Oxfordshire using a non-uniform density approach are presented in Sections 3.4 and 3.5.

Residential classification prediction and its spatial characteristics
Using the ML Random Forests algorithm, the residential neighbourhoods in the Arc are classified into 4 classes based on their spatial locations, namely: Centre, Urban, Suburban, and Rural. In Fig. 5a we show the results of the residential neighbourhood classification within the Arc (cf. Fig. 2 workflow diagram). The geographical locations of several large cities in the Arc are shown in the map. To assess the potential for densification of the land from the Local Plans and the potential brownfield sites (Table 3), we first analyse the existing dwelling densities for the different classes. We compute and visualise the dwelling density at the scale of 100 m × 100 m to be consistent with the unit used in the Local Plans and the National Planning Policy Framework (NPPF), namely dwellings per hectare (dph). The spatial characteristics of residential neighbourhoods depend largely on the surrounding environment, as the importance of the 'surrounding' features shows (Fig. 3a). Applying the classification model at a scale smaller than 500 m × 500 m would probably have resulted in many misclassifications. Fig. 5b shows the dwelling density per hectare (100 m × 100 m) in the Arc. Fig. 6 shows the average dwelling density (dph) for the 26 LADs in the Arc region for each residential class. The average dwelling density for Centre, Urban, Suburban, and Rural is rarely above 30 dph, which is the minimum threshold set by policy. There are some fluctuations in the variation in dwelling densities, but most districts show a general decrease in dwelling density from Centre to Rural. Of all the districts, the dwelling density in the Centre is highest in Luton (51 dph), followed by Oxford city and Northampton (both 43 dph). The cities of Oxford and Cambridge have the highest dwelling density in Urban (both 30 dph), followed by Suburban (27 dph in Oxford and 28 dph in Cambridge), and Rural (24 dph in Oxford and 27 dph in Cambridge). Of all the districts in the Arc, Chiltern has the lowest dwelling density, with 12 dph in Urban, and 10 dph in Suburban and Rural. No residential pixels in Luton are classified as Rural.

Density analysis
Density analysis is the core of many studies focusing on characterising urban form (e.g., Jiao, 2015) and urban (regional) compactness and expansion (Mohajeri et al., 2016). We analyse the variations in dwelling density (dph) for the 26 LADs in the Arc (i) for different classes namely, Centre, Urban, Suburban, and Rural, and then (ii) as a function of distance from the city centres. These analyses provide a better understanding of the range of dwelling densities between and within the LADs as well as in the entire Arc. The analyses yield results as to how the dwelling density may change from the city centre outward irrespective of the administrative boundaries. Furthermore, the results provide useful information on the development of density scenarios. The variations in dwelling density for the entire Arc region, and for different residential classes, are analysed using box-plot assessments (Fig. 7). The range is greatest in the class Centre, and least in the class Rural. The average dwelling density decreases from 36 in the class Centre to 14 in the class Rural for the entire Arc region.
Further details of the variation in dwelling density amongst the 26 local authority areas are given in Fig. 8. While the trends in Fig. 7 are generally maintained at the scale of individual LADs, there are exceptions. For example, several of the LADs do not show the highest dwelling density in the class Centre, but rather in the class Urban. These include South and West Oxfordshire, as well as South Buckinghamshire and South and East Cambridgeshire. In fact, South Buckinghamshire does not show a gradual decrease in dwelling density from the class Centre to the class Rural at all, but rather an increase. This is because South Buckinghamshire has a dispersed population with no clear and distinct centre. There are several other LADs that show a very small difference in density between the class Centre and the class Rural. Yet, all except South Buckinghamshire have a somewhat lower dwelling density in the class Rural than in the class Centre (Fig. 8).
To explore how the dwelling density (dph) changes with distance from the city centres outward, irrespective of the administrative boundaries, and in relation to the residential classification used here (Centre, Urban, Suburban, and Rural), we selected three cities (3 LADs) as examples (Fig. 9). These cities are Northampton, Cambridge, and Oxford. They are among the cities with the highest average dwelling density (dph) in the entire Arc region. We use a standardised method, namely concentric ring partitioning (e.g., Jiao, 2015; Schneider & Woodcock, 2008), to create 500 m buffers that increase stepwise from the centre of the city, which is here taken as the central business district, CBD (Fig. 9a). We then calculated the average dwelling density in each annulus (the area between a pair of adjacent concentric rings). The results for these three cities show that the density is highest close to the city centre, where it reaches 15 to 26 dph on average, and then decreases with distance from the centre (Fig. 9b). Because the centre itself may contain many office buildings and other non-residential buildings (e.g., shopping centres), the dwelling density peaks at a certain distance from the centre rather than exactly in the centre itself. In Northampton, the peak dwelling density occurs at a distance of about 1000 m from the centre, in Cambridge at about 1500 m from the centre, and in Oxford at 1500-2000 m from the centre. These differences may also be partly due to the definition and exact geographic location of the 'city centre'. In general, the rate of decline of dwelling density from the centre of a city outward varies with the distance from the centre. We show (Fig. 9b) that the dwelling density decreases slowly in the centre, then decreases relatively quickly in the urban and suburban areas, and finally decreases slowly again in the rural areas.
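The concentric ring partitioning described above can be illustrated with a minimal sketch. It assumes each residential pixel is given as planar coordinates in metres plus a dwelling density; the function name, the tuple layout, and the toy values are hypothetical and are not taken from the study's data or code.

```python
import math

def ring_average_density(pixels, centre, ring_width=500.0):
    """Average dwelling density (dph) per annulus around a centre.

    pixels: iterable of (x, y, dph) tuples, coordinates in metres.
    centre: (x, y) of the assumed city centre (e.g. the CBD).
    Returns {ring_index: mean dph}; ring k spans the distances
    [k * ring_width, (k + 1) * ring_width) from the centre.
    """
    sums, counts = {}, {}
    cx, cy = centre
    for x, y, dph in pixels:
        k = int(math.hypot(x - cx, y - cy) // ring_width)
        sums[k] = sums.get(k, 0.0) + dph
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

# Hypothetical toy pixels for illustration only (not study data).
toy_pixels = [(100, 0, 20.0), (0, 300, 30.0), (700, 0, 40.0), (0, 1200, 10.0)]
profile = ring_average_density(toy_pixels, centre=(0, 0))
```

Plotting the resulting ring index against the mean density would give a profile of the kind shown in Fig. 9b, under the assumption that pixels without dwellings are handled separately.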
We also estimated the percentage of dwellings belonging to each of the four classes within each annulus. The results (Fig. 9c) show that in the city centre, almost all dwellings belong to the class Centre. With increasing radial distance from the city centre, however, the percentage of dwellings belonging to the class Centre diminishes rapidly, and, at the same time, the percentage of dwellings in the class Urban increases. The class Urban extends to, or close to, the edges of the cities, but the percentage of dwellings in this class diminishes rapidly beyond a radius of about 3500-4000 m. There the class Suburban takes over and is largely maintained out to the edges of the cities. The class Rural becomes significant when the radial distance from the city centre exceeds 4000-5000 m (although this class also occurs closer to the city centre), and the percentage of dwellings in this class gradually increases towards the outer boundary or edge of the city. The percentage of pixels with no dwellings (marked in grey) gradually increases with increasing radial distance from the city centre.

Estimated housing capacity for the Arc region (uniform density approach)
The potential land allocated for housing development, according to the Local Plans of the respective authorities, as well as the potential brownfield lands are shown in Fig. 10 and Table 3. The potential land collected from Local Plans of the LADs for the period up to 2035 varies greatly. Some LADs, such as East Cambridgeshire, Luton, and Wellingborough, have hardly any potential land (from Local Plans and brownfield lands) for housing development, while other LADs have much potential land, such as Aylesbury Vale, Central Bedfordshire, Cherwell, Huntingdonshire, Milton Keynes, and South Cambridgeshire. The great difference in the potential for densification in the areas of the local authorities must be considered when planning the future of the Arc. Available brownfield land in the Arc is limited (Fig. 10). Only the local authorities of Peterborough and South Cambridgeshire, and to a much lesser degree, Bedford and Central Bedfordshire, can provide reasonably large such lands.
To estimate the housing capacity for the entire Arc up to the year 2035, the following housing growth scenarios are considered (Fig. 1): Baseline scenario, Unplanned scenario, Expansion 23k scenario, and Expansion 30k scenario (see Section 2.2). We estimate the housing capacity for six dwelling density bands. The first is 23 dph, which is the average density for all LADs in the Arc region. The other density bands, which are partly obtained and modified from several of the Local Plans, are as follows: 30-50 dph, 50-70 dph, 70-90 dph, 90-110 dph, and > 110 dph. For the estimation of housing capacity, we use the lower bound of each density band. For the uniform density approach, we assume that the dwellings have the same density across the available lands for a given density band. Thus, the differences in density between the classes (Centre, Urban, Suburban, Rural) within which the new dwellings would be located are not considered, because GIS spatial location data for these lands are lacking for the LADs outside Oxfordshire. Considering the potential land for each LAD (Table 3), we first estimate the housing capacity for the Arc for a given density band by multiplying the potential land by the lower bound of the density band. Then we assess the percentage coverage of the estimated dwellings needed by 2035 in each of our four housing growth scenarios. The number of dwellings needed is the difference between the total number of dwellings in the year 2020 and the estimated total number of dwellings by 2035 for each of the four housing growth scenarios (Table 2). The percentage coverage is the housing capacity from the first step divided by this difference. It indicates how much of the dwelling need in the Arc up to the year 2035 could be covered by the potential brownfield lands or the potential lands from the Local Plans. The results for the whole Arc are shown in Fig. 11a.
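The two-step calculation described above (capacity from land and density, then percentage coverage of the dwellings needed) can be written as a short sketch. The function names and the land and dwelling totals below are hypothetical placeholders, not values from Table 2 or Table 3; only the lower bounds of the density bands come from the text.

```python
def housing_capacity(land_ha, density_dph):
    """Number of dwellings that fit on land_ha hectares at a uniform density."""
    return land_ha * density_dph

def coverage_percent(capacity, dwellings_2020, dwellings_2035):
    """Share (%) of the additional dwellings needed that the capacity covers."""
    needed = dwellings_2035 - dwellings_2020
    if needed <= 0:
        raise ValueError("no additional dwellings are needed in this scenario")
    return 100.0 * capacity / needed

# Lower bounds (dph) of the six density bands named in the text.
DENSITY_LOWER_BOUNDS = [23, 30, 50, 70, 90, 110]

# Hypothetical inputs for illustration only (not study data).
land_ha = 1000.0
d2020, d2035 = 500_000, 600_000

coverage_by_band = {
    d: coverage_percent(housing_capacity(land_ha, d), d2020, d2035)
    for d in DENSITY_LOWER_BOUNDS
}
```

Because the capacity is linear in the density, the coverage in this sketch scales in direct proportion to the lower bound of each band.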
For the highest housing growth scenario (Expansion 30k), we estimate that brownfield land can cover from 19% of the dwellings needed at 23 dph up to 48% at >110 dph (Fig. 11a, bold colour bars). Here we assume there are spatial constraints, meaning that each LAD must cover its own dwelling needs. However, while most LADs have a shortfall, there is a surplus of unused brownfield land in some LADs (e.g., South Cambridgeshire and Peterborough). If spatial constraints were relaxed, this surplus land could be used to cover demand from other LADs. The percentage coverage of additional dwellings on surplus land across the Arc is shown by the pale colour bars in Fig. 11a. The total height of the bold and pale colour bars in Fig. 11a therefore represents the coverage when the spatial constraints are relaxed. Considering the spatial constraints (bold colour bars in Fig. 11a), we show that at 23 dph brownfield lands could cover between 19% and 32% of the estimated dwellings needed for all four housing growth scenarios in the Arc. At a density of 30 dph, the minimum required density set by the current policy, the dwelling demand covered by brownfield lands is between 24% and 38% for all four housing growth scenarios. At 50 dph and 70 dph, the dwelling demand covered is 31-50% and 38-60%, respectively, and at the very high densities of 90 dph and 110 dph, the dwelling demand covered by brownfield lands is 43-66% and 48-71%, respectively, for all four housing growth scenarios. However, when the spatial constraints are relaxed, brownfield lands could cover a large portion of the estimated dwellings needed, but only for very high housing densities (Fig. 11a).
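One possible reading of the bold and pale bars can be sketched as follows, assuming that under spatial constraints each LAD's contribution is capped at its own need, and that any capacity above that need counts as surplus. This is an interpretation for illustration; the function name and the toy values are hypothetical.

```python
def arc_coverage(lads):
    """Coverage of dwelling need across several LADs.

    lads: list of (capacity, need) pairs, one per LAD, in dwellings.
    Returns (constrained_pct, surplus_pct), both relative to the total need:
      constrained_pct: each LAD covers only its own need (bold bars);
      surplus_pct: capacity left over in LADs that exceed their own need
                   (pale bars).
    """
    total_need = sum(need for _, need in lads)
    constrained = sum(min(cap, need) for cap, need in lads)
    surplus = sum(max(cap - need, 0) for cap, need in lads)
    return 100.0 * constrained / total_need, 100.0 * surplus / total_need

# Hypothetical LADs for illustration only: two with a shortfall, one with a surplus.
toy_lads = [(500, 1000), (3000, 1000), (200, 2000)]
constrained_pct, surplus_pct = arc_coverage(toy_lads)
```

In this toy case the constrained coverage is 42.5% and the surplus is a further 50% of the total need, so the need would still not be fully met even if the surplus were shared.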
From the results on brownfield lands, it follows that additional land would be needed to cover the increase in dwellings in the Arc up to 2035 in our four housing growth scenarios if spatial constraints are applied. Also, the dwelling needs could not be covered by LADs with surplus brownfield lands (Fig. 11b pale colour bars). However, the required additional land is significantly less at densities over 50 dph and can decrease to zero at high densities if demand can be covered from other LADs which have surplus brownfield lands (Fig. 11b bold colour bars).
While there is much more land available from Local Plan allocations than from brownfields in the Arc (Fig. 10), when spatial constraints are applied the potential land from Local Plans is insufficient in some LADs to cover the estimated dwelling demand in some of the housing growth scenarios (even though there is surplus allocated land in many LADs). With spatial constraints the estimated number of dwellings by 2035 for the 'Expansion 23k scenario' and the 'Expansion 30k scenario' cannot be covered at any of the proposed housing densities (Fig. 11c, bold colour bars). If the spatial constraints are relaxed, however, then even at the lowest dwelling density (23 dph) the dwelling demand by 2035 for all our scenarios can be fully covered by the potential land made available through Local Plans (Fig. 11c, the total height of the bold and pale colour bars).

Table 2
Estimated total number of dwellings in the Arc for the year 2020 and for four different housing growth scenarios by 2035, based on population growth and the average number of dwellings completed in the past up to the year 2017 (Source: ITRC, 2020). Also shown are the differences in the total number of dwellings for each scenario between the years 2020 and 2035 (additional dwellings needed).
Fig. 12a shows a natural capital map with scores from 0 to 10 for the whole of Oxfordshire. Fig. 12b shows the natural capital scores clipped to the land allocated for housing development in Oxfordshire up to the year 2035 through Local Plans, and Fig. 12c shows the location of Oxfordshire in the Arc. The residential class of the potential lands in Oxfordshire is assigned based on the nearby pixels (the class with the maximum number of pixels) when overlapping the predicted ML residential classification with the allocated land map clipped to Oxfordshire (Fig. 12c). The predicted residential class and the natural capital scores are then assigned to each pixel of allocated land (100 m × 100 m).
To show the variations of natural capital scores for the lands allocated for housing development in different classes, we present the scores and their relative frequency distributions for the four residential classes (Fig. 12d). The histograms indicate that the natural capital scores commonly peak at around 2. This comparatively low value is partly because the lands are located in dense urban areas, primarily as disconnected and commonly small-area grasslands. There is, however, a shift to the right in the frequency distribution when moving from the classes Centre and Urban to the classes Suburban and Rural. This is as expected since the factors that create high land value for humans in relation to natural capital, as defined in Section 2.6, become more abundant and interconnected when moving out from the urban centres. When plotted against dwelling density (dph) the natural capital scores also tend to peak at about 2, and higher scores are seen at lower dwelling densities, but the distributions vary widely (Fig. 12e). For the classes Suburban and Rural, the peaks are clearly defined at a score of 2, and then gradually diminish at higher scores. For the class Urban the shape of the distribution is similar, but the peak is shifted to lower scores, at about 1.5. The scores for the class Centre peak at about 1 and also have the most dispersed distribution. In the next section, we explain how we used these scores to develop dwelling density scenarios and estimate the housing capacity in Oxfordshire for the year 2035.

Estimated housing capacity for Oxfordshire (non-uniform density approach)
Because Oxfordshire (5 LADs) is the only county of the Arc for which the GIS spatial data for the locations of lands are available, the present non-uniform density approach to estimating the housing capacity is applied exclusively to Oxfordshire. We assume that the density of new dwellings, depending on the spatial location of the land, will be different for each urban class (i.e., Centre, Urban, Suburban, Rural). It is also assumed that no new dwellings will be built in the areas with the highest natural capital scores, that is, with scores of 8-10, and that the greatest number of new dwellings will be in the areas with the lowest natural capital scores, namely 0-2 (Table 5). The results for the low-density scenario (Fig. 13 and Table 5) are presented so that the dwelling density (dph) for each pixel of allocated land (size of 100 m × 100 m) is plotted against the natural capital scores. The broken line (Fig. 13) shows the median (50th percentile) of the plotted data. To convert the median (50th percentile) to a limited number of low-density scenarios, we compute the average median value of the density across natural capital scores with a range of 1 (0-1, 1-2, 2-3, 3-4, etc.). The column bars (Fig. 13) present the density scenarios for each natural capital score and for each residential class (Table 5). The maximum dwelling density of the class Centre is 42 (for a score of 0-1), that of the class Urban is 30 (for a score of 1-2), that of the class Suburban is 23 (for a score of 0-2), and that of the class Rural is 20 (for a score of 0-2).
The results for the medium-density scenario are shown in Fig. 14 and Table 5. The broken line shows the 75th percentile of the plotted data. We set the range of all 'inliers', that is, the envelope of the data (Eq. (1)), to the 75th percentile + 1.5 × IQR (dotted line), and hence remove all outliers (mild and extreme). To convert the envelope to a limited number of medium-density scenarios and to smooth the envelope, we compute the average envelope value across natural capital scores with a range of 1. The column bars in Fig. 14 present the density scenarios for each natural capital score and for each residential class (Table 5). The maximum dwelling density of the class Centre is 124 (for a score of 0-1), that of the class Urban is 74 (for a score of 0-1), that of the class Suburban is 50 (for a score of 0-2), and that of the class Rural is 44 (for a score of 0-2). Similarly, the results for the high-density scenario are presented in Fig. 15 and Table 5. We set the range of all 'inliers', that is, the envelope of the data (Eq. (2)), to the 75th percentile + 3 × IQR (dotted line), and hence remove all extreme outliers. We compute the average envelope value across natural capital scores with a range of 1 to convert the envelope to a limited number of high-density scenarios. The column bars in Fig. 15 present the density scenarios for each natural capital score and for each residential class (Table 5). Here the maximum dwelling density of the class Centre is 190 (for a score of 0-1), that of the class Urban is 110 (for a score of 0-1), that of the class Suburban is 69 (for a score of 0-2), and that of the class Rural is 62 (for a score of 0-2).
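The three scenario levels (median, 75th percentile + 1.5 × IQR, and 75th percentile + 3 × IQR) can be sketched for one residential class as below. This is a simplified illustration: it computes the statistics directly within each natural capital score interval of width 1, whereas the text averages a percentile curve within each interval. The linear-interpolation percentile, the function names, and the toy points are assumptions.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    s = sorted(values)
    if not s:
        raise ValueError("percentile of an empty list is undefined")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def upper_fence(values, k):
    """75th percentile + k * IQR (k = 1.5: inner fence, k = 3: outer fence)."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    return q3 + k * (q3 - q1)

def scenario_densities(values):
    """Low, medium, and high density levels for one set of pixel densities."""
    return {
        "low": percentile(values, 50),
        "medium": upper_fence(values, 1.5),
        "high": upper_fence(values, 3.0),
    }

def scenarios_by_score_bin(points):
    """Group (natural_capital_score, dph) pairs into score intervals of width 1."""
    bins = {}
    for score, dph in points:
        bins.setdefault(int(score), []).append(dph)
    return {b: scenario_densities(v) for b, v in sorted(bins.items())}

# Hypothetical pixels for illustration only (not study data).
toy_points = [(0.2, 10), (0.5, 20), (0.7, 30), (0.9, 40), (0.1, 50), (1.5, 12)]
toy_scenarios = scenarios_by_score_bin(toy_points)
```

For the toy interval 0-1 the quartiles are 20 and 40, so the inner fence is 70 and the outer fence is 100; an interval with a single pixel collapses to that pixel's density at all three levels, which illustrates why sparsely populated intervals are treated with caution.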
We have made several adjustments to the envelopes of each class. These are as follows: (i) For all classes, we set the value of data with a natural capital score larger than 8 to 0 (Table 5). We assume no densification is expected in lands with a natural capital score larger than 8. This is to protect the land with high-value natural capital from housing development. (ii) For all classes, we set the minimum value of density with a natural capital score less than 8 at 30 dph because this is the minimum required density set by the current national policy. (iii) Both envelopes are averaged over intervals of natural capital score with a range of 1, so that the envelopes are smoothed and decrease monotonically with natural capital score. (iv) For the class Centre, the envelope curve (dotted line) does not decrease monotonically, as the line shows many spikes. This is partly because the amount of data for the class Centre is relatively low and hence statistically less representative. We therefore use a moving average filter, which is a common statistical technique, to smooth the envelope curve and reduce noise in the dotted line. (v) For the classes Suburban and Rural, the average envelope value across the natural capital score range 0 to 1 is set to the same value as that for the score range 1 to 2. This is because the number of data points in the range 0 to 1 is low and hence not very representative.
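Adjustments (i), (ii), and (iv) can be sketched on a list of binned envelope values, one per natural capital score interval. The window length of the moving average, the handling of the ends of the series, and the order in which the steps are applied are assumptions made for illustration, since the text does not specify them; the function names and toy values are hypothetical.

```python
def moving_average(values, window=3):
    """Centred moving average; the window shrinks at both ends of the series."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def adjust_envelope(binned, min_dph=30.0, protected_from=8, smooth=False):
    """Apply the adjustments to binned envelope densities.

    binned[i] is the envelope density (dph) for the score interval i to i + 1.
    smooth: apply the moving average first (used for the class Centre).
    Intervals from `protected_from` upward are set to 0 (no development);
    all other intervals are raised to at least `min_dph`.
    """
    series = moving_average(binned) if smooth else list(binned)
    return [
        0.0 if i >= protected_from else max(float(d), min_dph)
        for i, d in enumerate(series)
    ]

# Hypothetical spiky envelope for a class such as Centre (not study data).
toy_envelope = [120, 90, 110, 60, 40, 20, 25, 10, 15, 5]
adjusted = adjust_envelope(toy_envelope, smooth=True)
```

In the toy series the spike at the interval 2-3 is damped by the smoothing, the low values in the intervals 5-8 are lifted to the 30 dph minimum, and the two highest intervals are set to zero.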

Table 4
Model performance results for testing data for each residential/location class. The diagonal numbers of the table (in bold) show the number of pixels (size 500 m × 500 m) belonging to each class that are correctly classified as being part of that class.
Using the above results, the housing capacity (total number of dwellings) by 2035 for the low-, medium-, and high-density scenarios is estimated for the 5 LADs in Oxfordshire (Table 6). In addition, for the three developed density scenarios we estimate the percentage coverage of the dwellings needed by 2035 for the four housing growth scenarios in each of the LADs in Oxfordshire. Using spatial constraints, all the LADs except Oxford City can cover the dwelling demand for every housing growth scenario (Fig. 16, bold colour bars). For the Baseline scenario, the coverage in Oxford City reaches 100%, but for all the other scenarios the coverage is at or somewhat below 100%. The pale colour bars in Fig. 16 show the percentage of the dwellings needed by 2035 that can be covered by 'unused' (surplus) land from the Local Plans, such as is available in all LADs in Oxfordshire except Oxford City.
If the spatial constraints are relaxed then all the Oxfordshire districts can cover the dwelling demands (and much more than needed) for every housing growth scenario, except Oxford City (Fig. 16, the total height of the bold and pale colour bars). The coverage in Oxford City for the Baseline scenario is well above 100% (for low to high density scenarios). For the Unplanned scenario, the coverage in Oxford City is above 100% for the medium and high-density scenarios and below 100% for the low density scenario. For the two Expansion scenarios (23k and 30k), the coverage is below 100% for all the density scenarios (Fig. 16).

Discussion
The Machine Learning (ML) Random Forests method, as a decision support tool, allows us to process large amounts of data and extract useful information for supporting decisions (Casali et al., 2022; Tekouabou et al., 2022). Here we use ML to classify residential neighbourhoods at the regional scale for the Arc region. Among the most important variables (features) in the classification model is the dwelling density of the surrounding pixels. The surrounding pixels play an important role in the classification, indicating that the residential class of a neighbourhood cannot be estimated individually but should be considered in the context of the surrounding neighbourhoods. The residential classification helps us to develop the density scenarios for each class and finally estimate the future housing capacity in the Arc.
According to the NPPF (2018), a minimum density of 30 dph is recommended for new residential development of urban areas in England and much higher densities (>50 dph) around town centres and key transport nodes. Analysing the variations in dwelling density for the entire Arc region and for the 26 LADs based on different residential classes namely, Centre, Urban, Suburban, and Rural is an important step in residential classification. We use the exploratory data analysis to develop high, medium, and low housing density scenarios across these classes. We integrate the natural capital value of the land in the density-scenario development, so that land that has a high value for food production, ecosystem services, or biodiversity is protected from development. The methods of analysis developed here provide a better understanding of spatial planning than traditional methods of urban densifications at regional scale (Casali et al., 2022;Eggimann et al., 2021;Tekouabou et al., 2022) and can be applied to other regions in the UK (and for other countries). Here these methods have been used to provide accurate estimates of the housing capacity for the potential land identified in Local Plans and as brownfield lands in the Arc under different density scenarios.
For the entire Arc region, the average dph decreases from 36 in the class Centre to 14 in the class Rural (Fig. 7). Although there are some fluctuations in dph amongst the 26 LADs, the average dph for the Centre class is rarely above the recommended density of 50 dph, and for the Urban, Suburban, and Rural classes it is rarely above the recommended density of 30 dph (Fig. 6).
To estimate the housing capacity for each housing growth scenario, we first developed density scenarios using two approaches, namely uniform density and non-uniform density, depending on the availability of GIS spatial land data. The results show that when using spatial constraints and uniform density the potential brownfield land is not sufficient to cover the estimated number of dwellings for the considered densities (dph) in the Arc by the year 2035. For the land available from Local Plans, if spatial constraints are applied, the estimated demand by 2035 for the 'Expansion 23k scenario' and the 'Expansion 30k scenario' cannot be fully covered at any of the assumed dwelling densities (dph). However, if the spatial constraints are relaxed, then no additional land would be needed in the Arc to cover the estimated new dwellings up to the year 2035 when using a uniform dwelling density of 23 dph or higher. For non-uniform dwelling density in Oxfordshire, the results show that, with or without spatial constraints, all the districts apart from Oxford City can cover the dwelling demand for the housing growth scenarios up to 2035. Oxford City, however, accounts for 48% of the dwelling demand in Oxfordshire and, if spatial constraints are applied, can only cover the demand from the Baseline housing growth scenario for low, medium, and high density (Fig. 16).
The present results indicate that the potential brownfield lands together with the land made available through Local Plans can apparently cover most of the estimated number of dwellings in the Arc up to the year 2035. More detailed studies on this topic in the future, however, could consider several additional points that explore further the limitations of current study and the scope for expanded research.
First, the housing capacity is estimated considering both spatial constraints and no spatial constraints. Relaxing the spatial constraints is related to the 'duty to co-operate', a legal test that requires cooperation between local planning authorities and other public bodies to maximise the effectiveness of policies for strategic matters in Local Plans. This was introduced in the 2011 Localism Act as a strategic planning mechanism, particularly in response to the housing crisis. The duty states that councils must engage continuously and constructively with their neighbours on strategic cross-boundary issues like housing and infrastructure provision. In practice, however, the duty to co-operate has been subject to criticism and its effectiveness is uncertain. While our housing capacity results based on relaxing the spatial constraints are valid, their implementation depends on a more effective approach to strategic planning. Despite the complexity of the 'duty to co-operate' in the UK, there exist successful examples of cooperative urban land development in Germany (Koetter et al., 2021) and in South Africa (Howe, 2022). These authors show that cooperative urbanism can provide a sufficient stock of affordable housing to residents and partly address the socio-economic inequality challenges of the region.

Fig. 13.
Low dwelling density scenario (dwellings per hectare) for each residential/location class, namely Centre, Urban, Suburban, and Rural, based on exploratory data analysis (the 50th percentile). The density ranges are shown in Table 5.
Second, while the estimated number of dwellings by 2035 may be partly accommodated by the available lands through densification, it does not necessarily follow that the existing infrastructure is of sufficient quality to deal with the increased population. Thus, in some of the LADs the land may be available for densification and population increase but the water supply, the bus services, and the road/transportation network, for example, may not be able to handle the additional demand.
Third, some of the Local Plans on which the present analysis is based date back to the year 2011, while others are from 2016. It follows that some of the land made available then may already have been used. Thus, part of the brownfield lands, and the land from the Local Plans, may already have been built on. Additionally, the GIS data for the spatial location of the lands are not accessible for all LADs in the Arc. This leads to a rougher estimate of housing capacity for the entire Arc region (uniform density) compared with the detailed analysis of the non-uniform density for Oxfordshire. A detailed analysis of housing capacity for the entire Arc region depends on access to these spatial data in future research.
Fourth, as regards implementation, it is not clear if the local authorities would accept a dwelling density even as low as 23 dph, not to speak of much higher values such as 50 dph or 70 dph. Parts of the Arc have densities lower than 23 dph, and an increase in dwelling density (urban densification) is a matter that may not be readily accepted by the people living in the neighbourhood or, more generally, by the local authorities. One reason is that urban densification may lead to 'overcrowding' and increase local environmental impacts, for example from traffic and noise (Williams, 2000; Newman, 2005; Brunner & Cozens, 2013; Howley et al., 2009; Melia et al., 2011). Another reason for a possible negative local attitude to densification is the potential loss of green space, socio-spatial inequalities in its distribution, and diminished access to green urban infrastructure following densification (Lin et al., 2015).
Urban residential densification at a regional scale, while providing an opportunity to tackle multiple sustainability challenges (e.g., land efficiency; affordable housing), may compete with the demand of land for food production, nature recovery and other 'ecosystem services'. The potential available land for housing in the Arc obtained from the developed Local Plans and brownfield lands can accommodate an increase in population from its present 3.7 million people to as many as 4.7 million, by 2035, and 5.4 million, by 2050 based on the Expansion scenario (ITRC, 2020; Fig. 1). Such a large-scale increase in housing, however, might impact negatively on, and lead to trade-offs with, the land needed for food production and nature recovery (Lin et al., 2015).

Fig. 14.
Medium dwelling density scenario (dwellings per hectare) for each residential/location class, that is, Centre, Urban, Suburban, and Rural based on exploratory data analysis (upper inner fence: 75th percentile + 1.5 * IQR). The density ranges are shown in Table 5.
The potential negative impact could, however, be reduced through urban densification that would (i) focus new housing on land that has been previously developed ('brownfield land'), as well as specified lands by the Local Plans, rather than undeveloped ('greenfield') land (while recognising that some brownfield sites have high biodiversity value); (ii) adopt more compact development patterns with higher housing density; and (iii) avoid land with the highest natural capital value, i.e. the most valuable land for ecosystem service delivery.

Conclusions
This study combines the use of the Machine Learning Random Forests algorithm and exploratory data analysis (EDA) to provide a decision support tool. This tool is used to classify the existing residential neighbourhoods and their spatial characteristics at regional scale in the Oxford-Cambridge Arc, propose density scenarios, and finally estimate the housing capacity for the potential residential lands in the Arc. Based on the UK case study, we conclude that the presented ML methodology makes it possible to assess the variations in housing density from the centre to rural locations across the Arc region. Furthermore, by considering the 'natural capital' value of the land when developing the density scenarios, the exploratory data analysis (EDA) method makes it possible to quantify the potential for densification to reduce pressure on land for housing development in the following two ways:
• An initial analysis that assumes uniform densification across all Arc districts, irrespective of urban-rural differences, shows that at the current average housing density of 23 dph brownfield land can cover only 19% of housing growth for a scenario of 30,000 dwellings per year from 2020 to 2035, increasing to 48% at a very high density of 110 dph.

Fig. 15.
High dwelling density scenario (dwellings per hectare) for each residential/location class, that is, Centre, Urban, Suburban, and Rural based on exploratory data analysis (upper outer fence: 75th percentile + 3 * IQR). The density ranges are shown in Table 5.

Table 6
Total number of dwellings resulting from different density scenarios by 2035 based on land availability from Local Plans (left column in Table 3) and

• A more detailed analysis, carried out for Oxfordshire, where spatial data on land allocated for housing development are available, assumes a non-uniform density, that is, different densities in urban and rural areas (based on land classified as Centre, Urban, Suburban, or Rural). This analysis also assumes that high-value natural capital areas are protected (excluded) from housing development. For the scenario of 30,000 new dwellings per year, the land allocated in Local Plans could cover housing growth in the four LADs but not in Oxford City itself (which accounts for 48% of the demand). In Oxford City only 19% of the need would be covered in the low-density scenario (which on average ranges from 10 dph in Rural to 20 dph in Centre) but 59% in the high-density scenario (which on average ranges from 26 dph in Rural to 86 dph in Centre).
While the method has some limitations, as described above, our study suggests a reliable method for quantifying how the negative impacts of housing growth on natural capital can be significantly reduced using more compact development patterns, protection of land with high value natural capital, and the use of low biodiversity brownfield sites where available. The study brings data-driven decision-making processes to the level of Local Authority and policy makers in order to support sustainable housing development at the regional scale.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.

Fig. 16.
Percentage coverage of estimated number of dwellings for each housing growth scenario by 2035, based on different density scenarios (from low, to medium, and then high density) and for different residential classes in each LAD in Oxfordshire. This figure is based on the information provided in Table 2 and Table 6. Bold colour bars indicate the % coverage of dwelling demand considering spatial constraints, whereas the pale colour bars indicate the additional coverage when spatial constraints are relaxed.