University of Birmingham

Representing the dwelling stock as 3D generic tiles estimated from average residential density

Forecasting the variability of dwellings and residential land is important for estimating the future potential of environmental technologies. This paper presents an innovative method of converting average residential density into a set of one-hectare 3D tiles to represent the dwelling stock. These generic tiles include residential land as well as the dwelling characteristics. The method was based on a detailed analysis of the English House Condition Survey data, and density was calculated as the inverse of the plot area per dwelling. The analysis found that, when disaggregated by age band, urban morphology and area type, the frequency distribution of plot density per dwelling type can be represented by the gamma distribution. The shape parameter revealed interesting characteristics about the dwelling stock and how this has changed over time. It showed a consistent trend that older dwellings have greater variability in plot density than newer dwellings, and also that apartments and detached dwellings have greater variability in plot density than terraced and semi-detached dwellings. Once calibrated, the shape parameter of the gamma distribution was used to convert the average density per housing type into a frequency distribution of plot density. These distributions were then approximated by systematically selecting a set of generic tiles. These tiles are particularly useful as a medium for multidisciplinary research on decentralized environmental technologies or climate adaptation, which requires this understanding of the variability of dwellings, occupancies and urban space. They thereby link the socioeconomic modeling of city regions with the physical modeling of dwellings and associated infrastructure across the spatial scales. The tiles method has been validated by comparing results against English regional housing survey data and dwelling footprint area data. The next step would be to explore the possibility of generating generic residential area types and to adapt the method to other countries that have similar housing survey data. © 2015 The Author. Published by Elsevier Ltd.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
There has been an increasing emphasis on understanding the building stock and how to reduce the consumption of energy and production of waste (Kohler & Hassler, 2002). Much of this research has focused on the buildings themselves and predominantly their energy consumption (Kavgic et al., 2010). This has normally used typologies that correspond with the classification of national housing stock data, such as dwelling types, age bands and fabric. Examples include building energy models (Firth, Lomas, & Wright, 2010; Cheng & Steemers, 2011) and studies of the building stock and energy efficiency such as McKenna, Merkel, Fehrenbach, Mehne, and Fichtner (2013); Ballarini, Corgnati, and Corrado (2014); Filogamo, Peri, Rizzo, and Giaccone (2014) and Mata, Sasic Kalagasidis, and Johnsson (2014).
However, there is increasing recognition that decentralized supply technologies are also important for helping to meet government environmental targets, but uncertainty about whether properties have the space required for installation has been identified as a barrier to implementation (DECC, 2012, Sept. 20th). For example, the dimensions of gardens, roof space or cluster size will affect the feasibility of some technologies such as ground source heat pumps, rainwater harvesting and recycling. Modeling tools have been developed to assess the potential of environmental technologies (Hofierka & Kanuk, 2009; Girardin, Marechal, Dubuis, Calame-Darbellay, & Favrat, 2010; Lukac, Zlaus, Seme, Zalik, & Stumberger, 2013; Makropoulos, Natsis, Liu, Mittas, & Butler, 2008; Robinson et al., 2007), but these detailed simulations of relatively small areas require inputs on future urban form.
Planning policies, building regulations and incentive schemes are applied at national or regional level and require a long time scale and considerable investment to take effect. The outcome will depend on urban density, occupancies and whether dwellings are existing or new build. Urban densities and occupancies will vary spatially within a city region as a result of the socio-economic pressures that drive the property market and shape urban form. Human factors are thought to account for a substantial amount of the variability of energy use in buildings. Yu, Fung, Haghighat, Yoshino, and Morofsky (2011) and Pereira and Assis (2013) showed how the increases over time in household energy consumption are spatially correlated with socioeconomic changes in income. A forecasting capability is therefore needed at the regional scale to test the longer-term impacts and cost-effectiveness.
Regional-scale land use-transport forecasting models can provide a detailed top-down simulation of the supply and demand for land and floor space at the building parcel scale (Abraham, Weidner, Gliebe, Willison, & Hunt, 2005). This included GIS-based micro-scale modeling of floor space types and rental values for land parcels, with floor space categorized according to general building types. This reproduces the spatial layout and overall floor space but does not represent the size and variability of buildings, and this reliance on mapping limits the capability to forecast the future urban form. In another example, a regional-scale macro-model was linked to a UrbanSim model (Waddell et al., 2003), which simulated neighborhoods as 2.25 ha grid cells chosen from a set of 25 development types, further defined by a range of residential units and non-residential floor space, to create typical contiguous urban areas. These models aim to represent the actual land parcels, which leads to difficulties in matching the data sources; this makes the models very resource intensive to create and operate over large areas within a macro-modeling framework.
Computer graphic simulation methods are available to study built form and its potential for sustainable technologies (Meinel, Hecht, & Herold, 2009; Vanegas et al., 2010; Wiginton, Nguyen, & Pearce, 2010; Jacubiec & Reinhart, 2013). They rely on location-specific inputs of road networks and user-specified attributes of land parcels and building shapes, using mapping and aerial imaging. The outputs can be similar in complexity to the actual physical built environment, thus limiting the practical size of the study area. An alternative is to use a theoretical simulation of built form. Some are based on metaphors for the urban development process, such as Crompton (2012), who compared the variability of buildings to the variability of Lego™ pieces. Others, such as Tuhus-Dubrow and Krarti (2010), have used optimization methods to estimate the most energy-efficient building form. These theoretical approaches are not subject to the livability and commercial constraints that shape actual dimensions and are not empirically validated. Neither the graphic nor the theoretical simulation methods have the capability to forecast urban densities and occupancies.
The most promising approach for linking across the spatial scales is to use a statistical method of representing the variability of land per dwelling. Zhou and Kockelman (2008) recognized the advantages of this approach but were unable to fit a statistical function to their single-family residential parcel size data. This is possibly because they were studying only one city and using GIS data instead of housing survey data.
The following sections of this paper present a unique method of estimating the variability of dwellings and residential land from the mean residential density. The parametric variability of dwellings and associated land is then represented by systematically selecting a set of discrete one-hectare 3D tiles. The final part of the paper validates this 'tiles' method and discusses how it enables the modeling of dwellings and environmental technologies to be integrated with a regional forecasting model. This achieves the important, and previously difficult, objective of linking models across spatial scales.
It is expected that this paper will be of interest for spatial modeling and urban simulation, particularly for forecasting the impacts of building scale interventions such as sustainable technologies and climate change mitigation.

Research context
This research was part of a project that tested spatial planning policies in combination with scenarios for decentralized sustainable technologies for London and its surrounding regions over a 30-year time horizon. The aim was to explore how the density and clustering of development would affect the potential of 'green' technologies for buildings, energy, transport, water and waste. This required a regional modeling framework with a forecasting capability.

Spatial interaction model
A regional Land Use Interaction and Social Accounting model (LUISA) was developed to forecast the spatial allocation of industry, employment, households and population. This was an aggregate static model based on input-output socioeconomic accounting tables linked to random utility discrete choice modeling of spatial allocation and travel behavior (Echenique, Grinevich, Hargreaves, & Zachariadis, 2013). Rents arise when there are constraints on the amount of production that can be assigned to a location. In order to balance demand and supply, the production prices and disutilities need to adjust by generating rents. This process is dealt with endogenously within the model, using an iterative procedure. The model has bimodal accounting of monetary and non-monetary disutility. The total monetary disutility includes the rents and building construction costs, and the total non-monetary disutility includes qualitative aspects such as what consumers of land are willing to tolerate in order to have a lower monetary rent, for example living in a high-rise building.
For practical purposes the case study area was divided into zones, within which locations were assumed to be homogeneous. Once constructed, the model was calibrated so that its outputs matched the base year data on spatial production, consumption and prices per zone. The regional economic and demographic projections, future land availability and transport improvements are then input to the model to test scenarios for the forecast year. The forecasts included households and population per zone by socio-economic classification. The challenge was to convert these aggregate outputs per zone into a realistic estimate of the future dwelling stock and occupancies so that scenarios could be tested for decentralized sustainable technologies. It was important that this included the variability of land per dwelling, because this affects the potential for sustainable technologies.

Analysis of the English House Condition Survey data
The English House Condition Survey (EHCS) is a detailed source of data on the English housing stock that includes both the building and plot dimensions. The 2007 EHCS contained 16,194 sample dwellings, and the sampling takes into account the location and tenure. It comprises a household interview, a physical inspection and a market valuation. The physical inspection provides detailed information about the building dimensions and plot size, building fabric and service systems of each sample dwelling. The plot is the private land that belongs to the dwelling (generally referred to in the USA as the lot) or, if the private land belongs to a small number of dwellings, the plot includes the proportion attributed to the surveyed dwelling.
The variables used to categorize the dwellings can be found in the survey guidance document (DCLG 2007) and further information can be found on the Department for Communities and Local Government website (DCLG 2013).The EHCS variables chosen to categorize the dwellings for this analysis were the dwelling type, urban morphology, area type, region, and age band, which are described in Appendix A.
The EHCS data were first prepared for analysis by calculating the plot area per dwelling from the EHCS plot dimensions (Fig. 1).
The plot area $a_i$ of each surveyed dwelling was estimated from these plot dimensions. The 'plot density' $\mu_i$ in dwellings per hectare was then calculated as the inverse of the plot area per dwelling:
$$\mu_i = \frac{u}{a_i}$$
where $a_i$ is the plot area in hectares and u is the number of dwellings on the plot: for a house u = 1, and for an apartment u = the number of apartments within the block. Plot density μ per dwelling was used as the metric instead of average dwelling density because the paper analyzes the variability of individual dwellings rather than neighborhoods. This plot density metric therefore gives a higher numerical value than the average residential area density normally used by planners, because it excludes publicly accessible areas and rights of way.
Each surveyed dwelling is allocated a grossing factor by the EHCS to convert the survey sample to an estimate of the English housing stock. The EHCS surveyed sample of dwellings was converted into an estimate of the English dwelling stock as follows:
$$H_i = w_i\,h_i$$
where $w_i$ = EHCS grossing factor to convert the sample dwelling into the dwelling stock, and $H_i$ = number of dwellings equivalent to surveyed dwelling $h_i$. Hence:
$$A_i = \frac{H_i}{\mu_i}$$
where $A_i$ = total plot area of the dwellings equivalent to surveyed dwelling $h_i$.
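The plot density and grossing steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' processing code: it assumes plot areas are already in hectares, that plot density is μ_i = u/a_i, and that each record is one surveyed dwelling (h_i = 1). The record layout and the names `plot_density`, `gross_up` and `mean_area_density` are hypothetical, and the records are made up.

```python
# Sketch of the plot-density and grossing-up steps (hypothetical helper names).
# Assumptions: mu_i = u / a_i with a_i in hectares, and each survey record is a
# single surveyed dwelling (h_i = 1) carrying its EHCS grossing factor w_i.


def plot_density(plot_area_ha, u=1):
    """Plot density mu_i in dwellings per hectare: u dwellings on a_i hectares."""
    return u / plot_area_ha


def gross_up(records):
    """records: iterable of (plot_area_ha, u, w_i).

    Returns a list of (mu_i, H_i, A_i) with H_i = w_i * h_i and A_i = H_i / mu_i.
    """
    grossed = []
    for plot_area_ha, u, w in records:
        mu = plot_density(plot_area_ha, u)
        H = w * 1.0   # h_i = 1 surveyed dwelling
        A = H / mu    # hectares occupied by the H_i equivalent dwellings
        grossed.append((mu, H, A))
    return grossed


def mean_area_density(grossed):
    """Conventional mean density x = sum(H_i) / sum(A_i)."""
    return sum(H for _, H, _ in grossed) / sum(A for _, _, A in grossed)


# Made-up records: two houses and one apartment in a 20-unit block.
stock = gross_up([(0.05, 1, 1000.0), (0.025, 1, 2000.0), (0.2, 20, 500.0)])
x = mean_area_density(stock)   # 3500 dwellings / 105 ha, about 33.3 dph
```

With these records the dwelling-weighted mean of the μ_i is 150,000/3,500 ≈ 42.9 dph, noticeably higher than x ≈ 33.3 dph, which illustrates why the mean plot density and the mean area density have to be kept as separate metrics.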
The frequency distribution of the plot densities $\mu_i$ of dwellings $H_i$ is positively skewed and similar to the gamma distribution (as shown in Fig. 2). However, there is some 'lumpiness' in the distribution. If the dwellings are disaggregated into houses and apartments, then the distributions have a broadly similar shape but with different mean and scale. It was found that further disaggregation into different dwelling types, area types and age bands achieves density distributions that become progressively smoother and more distinct. This thereby increases the likelihood of fitting a density distribution function to the empirical data.
The variables that affect density were identified by analyzing the EHCS data using a generalized linear model (GLM).This estimated the significance of the correlation between the dependent variable and each independent variable whilst taking into account the variability of the other independent variables.The chosen specification of the GLM for this application used gamma regression with plot density μ as the dependent variable.The EHCS variables used as predictors were dwelling type, age band, area type, morphology and region.Further information about this type of model can be found in Chapter 8 of McCullagh and Nelder (1989).
This GLM analysis of the independent variables found that dwelling type, age band, area type and urban morphology were all significantly related to the dependent variable of dwelling plot density. The region variable showed a 'north-south divide': the three northern regions (regions 1 to 3) had slightly higher densities than the rest of the country (around 2 dwellings per hectare higher). A possible reason is that a greater proportion of housing in the north was built for industrial workers, but this has not been investigated further. The three northern regions account for around one third of English housing and the difference in density is quite small, so it was decided not to disaggregate the data by region in order to avoid further reducing the sample sizes for analysis.
Importantly, the GLM analysis found no consistent trend in density per dwelling type over time.In fact, the analysis shows that densities have fluctuated with periods of lower than average density pre-1850 and from 1919 to 1980, and periods of higher than average density from 1850 to 1918, possibly due to industrialization, and from 1981 to 2007 possibly due to planning constraints.Hence, although plot densities per dwelling type have fluctuated over time, there is no evidence that a method derived from the analysis of existing housing stock cannot be used to forecast the density distributions of future housing stock.Overall residential densities have increased in England over this long timescale due to urbanization but this has mainly been achieved by having a greater proportion of apartments and terraced dwellings per area type, rather than by an increase in density per dwelling type.
Based on the above findings, the next step of the analysis aggregated the EHCS data into the possible combinations of age band (9 bands), morphology (4 types) and area type (6 types). Each combination was regarded for this analysis as equivalent to an 'aspatial development type' c and was analyzed per dwelling type d. Dwellings were regarded as outliers and excluded if their plot density $\mu_i$ exceeded the upper quartile plus 1.5 times the interquartile range, or was less than the lower quartile minus 1.5 times the interquartile range. The outliers were mainly dwellings with a plot size the same as or smaller than their footprint, which may be because they are part of a mixed-use building or of unusual construction. Many of these combinations of dwelling type and aspatial development type were either unusual or inconsistent and so had only small or zero samples per dwelling type (such as apartments in a rural area, or a rural area with urban morphology). Combinations with a remaining sample $h_{di}$ of fewer than 24 dwellings were excluded from the analysis. This left 108 combinations of dwelling type and aspatial development type for the next step of fitting a density distribution function. Table 1 shows that, after applying the $w_i$ grossing factors, 74% of the total English dwellings remained for analysis.
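The outlier rule and the minimum-sample filter can be sketched as follows. This assumes each combination's plot densities are held as a plain list of unweighted survey values; the dictionary keys, the example data and the function names are hypothetical. Quartile conventions differ between packages, so `statistics.quantiles` with its default 'exclusive' method may not reproduce the exact fences used in the paper.

```python
import statistics

MIN_SAMPLE = 24  # combinations with fewer remaining dwellings are excluded


def remove_outliers(mu_values):
    """Keep plot densities inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (Tukey fences)."""
    q1, _, q3 = statistics.quantiles(mu_values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [m for m in mu_values if lower <= m <= upper]


def filter_combinations(samples):
    """samples: dict mapping a (dwelling type, aspatial development type) key to plot densities.

    Returns only the combinations that still have at least MIN_SAMPLE
    dwellings after outlier removal.
    """
    kept = {}
    for key, mus in samples.items():
        if len(mus) < MIN_SAMPLE:
            continue  # too small even before outlier removal
        cleaned = remove_outliers(mus)
        if len(cleaned) >= MIN_SAMPLE:
            kept[key] = cleaned
    return kept


# Hypothetical data: one usable combination with an extreme value, one too small.
samples = {
    ("semi-detached", "c1"): [float(v) for v in range(1, 31)] + [500.0],
    ("apartment", "c2"): [10.0] * 10,
}
kept = filter_combinations(samples)
```

Here the 500 dph value falls above the upper fence and is dropped, while the second combination is excluded because it never reaches 24 dwellings.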

Fitting a probability distribution to the EHCS dwelling stock data
For each combination of dwelling type d and aspatial development type c, the frequency distribution of the plot density μ was found to have a similar shape to the gamma distribution for k > 1. The general probability density function (PDF) of the gamma distribution is:
$$f(\mu; k, \theta) = \frac{\mu^{k-1} e^{-\mu/\theta}}{\Gamma(k)\,\theta^{k}}, \qquad \mu > 0 \qquad (5)$$
where k > 0 is the shape parameter, θ > 0 is the scale parameter and Γ(k) is the gamma function. The cumulative distribution function (CDF) is:
$$F(\mu; k, \theta) = \frac{\gamma(k, \mu/\theta)}{\Gamma(k)} \qquad (6)$$
where γ is the lower incomplete gamma function:
$$\gamma(k, z) = \int_0^{z} t^{k-1} e^{-t}\,dt \qquad (7)$$
The gamma distribution has the convenient mathematical property that the mean equals the product of the two parameters that define the distribution. Hence, the expected value of the theoretical mean plot density $\bar{\mu}$ is:
$$\bar{\mu} = k\,\theta \qquad (8)$$
Subsequent investigation of the literature found that the gamma distribution has been widely studied for its usefulness in curve fitting to data on extreme events, in fields such as accidents, climatology and hydrology. For example, Ison, Feyerherm, and Dean (1971) and Husak, Michaelsen, and Funk (2006) studied the differences in the shapes of the gamma distributions fitted to empirical climate data to assess whether locations have irregular or extreme events, by comparing the parameters k and θ as discussed in Wilks (1995).
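The gamma PDF and CDF can be evaluated with the Python standard library alone. The sketch below is an illustration, not the authors' implementation: `gamma_pdf` and `gamma_cdf` are hypothetical helper names, and the CDF uses the standard power series for the regularized lower incomplete gamma function, which is adequate for the moderate values of μ/θ met with plot densities (`scipy.special.gammainc` is a more robust alternative). The last lines check numerically that the mean equals kθ for illustrative values k = 8 and θ = 4.3.

```python
import math


def gamma_pdf(mu, k, theta):
    """Gamma PDF f(mu; k, theta) with shape k and scale theta."""
    if mu <= 0:
        return 0.0
    return math.exp((k - 1) * math.log(mu) - mu / theta
                    - math.lgamma(k) - k * math.log(theta))


def gamma_cdf(mu, k, theta):
    """Gamma CDF F(mu; k, theta) = gamma(k, mu/theta) / Gamma(k).

    Power series of the regularized lower incomplete gamma function:
    P(k, z) = z^k e^-z / Gamma(k) * sum_n z^n / (k (k+1) ... (k+n)).
    """
    if mu <= 0:
        return 0.0
    if math.isinf(mu):
        return 1.0
    z = mu / theta
    term = total = 1.0 / k
    n = 0
    while term > total * 1e-16 and n < 10000:
        n += 1
        term *= z / (k + n)
        total += term
    return math.exp(k * math.log(z) - z - math.lgamma(k)) * total


# Midpoint-rule check that the mean of the distribution equals k * theta.
k, theta, step = 8.0, 4.3, 0.01
mean = sum((i + 0.5) * step * gamma_pdf((i + 0.5) * step, k, theta) * step
           for i in range(30000))
```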
An innovative feature of the 'tiles method' is that it uses the gamma distribution from the opposite perspective to the above studies. It first calibrates the shape parameter k using the EHCS data and then specifies the gamma distribution from the calibrated shape parameter and the mean density x.
The shape parameter k is estimated using a maximum likelihood estimator for the gamma distribution (Thom, 1958). This includes the sample statistic E, which is the difference between the natural log of the sample mean and the mean of the logs of the data:
$$E = \ln\left(\frac{1}{n}\sum_{i=1}^{n}\mu_i\right) - \frac{1}{n}\sum_{i=1}^{n}\ln\mu_i \qquad (9)$$
$$\hat{k} = \frac{1 + \sqrt{1 + 4E/3}}{4E} \qquad (10)$$
For the plot density frequency distribution, the shape parameter k can be conceptualized as the result of successive subdivisions of land into plots. The shape parameter k represents the degree of similarity between the plot sizes each time a plot subdivides, whereas the scale parameter θ represents the amount of subdivision that has taken place. The smaller the shape parameter, the more positively skewed the distribution; the larger the shape parameter, the more similar it is to the normal distribution.
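Thom's estimator can be sketched as follows: E is the sample statistic described above, and the shape is estimated as k̂ = (1 + √(1 + 4E/3))/(4E), the form of Thom's approximation given, for example, in Wilks (1995). The sketch assumes unweighted sample values (whether grossing factors enter the estimate is not stated here) and checks the estimator on synthetic data from Python's `random.gammavariate`, whose second argument is the scale θ. `thom_shape` is a hypothetical name.

```python
import math
import random


def thom_shape(mu_values):
    """Thom (1958) approximation to the maximum likelihood estimate of the gamma shape k."""
    n = len(mu_values)
    mean = sum(mu_values) / n
    # E = ln(sample mean) - mean of ln(data); E > 0 unless all values are equal
    E = math.log(mean) - sum(math.log(m) for m in mu_values) / n
    if E <= 0:
        raise ValueError("shape is undefined when all plot densities are equal")
    return (1 + math.sqrt(1 + 4 * E / 3)) / (4 * E)


# Usage: recover the shape from synthetic plot densities drawn with k = 8, theta = 4.3.
rng = random.Random(42)
sample = [rng.gammavariate(8.0, 4.3) for _ in range(20000)]
k_hat = thom_shape(sample)
theta_hat = (sum(sample) / len(sample)) / k_hat  # scale from mean = k * theta
```

Once k̂ is known, the scale follows from the sample mean because the mean of a gamma distribution is kθ.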

Kolmogorov-Smirnov (K-S) goodness of fit test
The estimated gamma distribution is compared against the empirical density distribution using the Kolmogorov-Smirnov (K-S) one-sample goodness-of-fit test (Siegel & Castellan, 1988). The statistical test needed to be more stringent than the standard K-S test because the distribution parameters were estimated from the EHCS data and the same data were used to test the goodness of fit. A more stringent K-S statistic was therefore used, based on a version of the Lilliefors test, with critical values for assessing the goodness of fit of gamma distributions that were originally published in Crutcher (1975) and reproduced in Wilks (1995).
Both the K-S and Lilliefors tests use the following test statistic:
$$g_n = \max_{i} \left| F_n(\mu_i) - F(\mu_i) \right| \qquad (11)$$
where $F_n(\mu)$ is the empirical cumulative probability, which is estimated as $F_n(\mu_i) = i/n$ for the i-th smallest data value in the sample of n dwellings $h_i$, and $F(\mu)$ is the theoretical cumulative gamma distribution function (CDF) evaluated at $\mu_i$. Thus the K-S test statistic $g_n$ looks for the largest difference, in absolute terms, between the empirical and the fitted CDFs for the sample of size n. The null hypothesis is that the observed data are drawn from the chosen theoretical distribution. If the discrepancy $g_n$ exceeds the critical value, this is cause for rejection of the null hypothesis and implies that the theoretical distribution is not doing an adequate job of modeling the empirical density distribution.
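The statistic g_n can be computed directly from the sorted sample and the fitted gamma CDF. This sketch follows the F_n(μ_i) = i/n definition above; note that the usual two-sided K-S statistic also checks the (i − 1)/n side of each empirical step. The Lilliefors-type critical values for gamma distributions (Crutcher, 1975; Wilks, 1995) are not reproduced and must be looked up separately. `ks_statistic` and `gamma_cdf` are hypothetical names, and the CDF uses the same stdlib power series as a plain gamma CDF.

```python
import math


def gamma_cdf(mu, k, theta):
    """Gamma CDF via the power series of the regularized lower incomplete gamma."""
    if mu <= 0:
        return 0.0
    if math.isinf(mu):
        return 1.0
    z = mu / theta
    term = total = 1.0 / k
    n = 0
    while term > total * 1e-16 and n < 10000:
        n += 1
        term *= z / (k + n)
        total += term
    return math.exp(k * math.log(z) - z - math.lgamma(k)) * total


def ks_statistic(mu_values, k, theta):
    """g_n = max_i |F_n(mu_i) - F(mu_i)| with F_n(mu_i) = i/n for the i-th smallest value."""
    data = sorted(mu_values)
    n = len(data)
    return max(abs((i + 1) / n - gamma_cdf(m, k, theta)) for i, m in enumerate(data))


# Usage: exponential (k = 1) data placed at the mid-quantiles (i - 0.5)/n, so every
# empirical step sits exactly 0.5/n above the fitted CDF.
n = 10
data = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
g = ks_statistic(data, 1.0, 1.0)   # 0.5 / n = 0.05, up to rounding
```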

Results of the K-S test
Appendix B summarizes, for each combination of dwelling type and independent variables, the sample size, number of outliers, estimated gamma distribution parameters and the significance of the K-S statistic. This shows that the gamma distribution was a good fit for almost all of the combinations and in most cases the null hypothesis could not be rejected. Of the total 108 fitted gamma distributions, the null hypothesis could be rejected at the 20% level in 47 cases (i.e., with only 80% confidence that the sample was not drawn from the fitted theoretical gamma distribution), at the 10% level in 18 cases, at the 5% level in 9 cases and at the 1% level in 17 cases. Hence there were only 17 remaining cases out of the 108 where there was 99% confidence that the dwellings were not drawn from a gamma distribution, and these were spread relatively evenly across the dwelling types. It is surprising that the gamma distribution fits the empirical data so well given that plot sizes vary greatly between dwellings. This high success rate in fitting the gamma distribution, together with its convenient mathematical properties, made it a suitable basis for representing the distribution of residential plot densities.

The calibrated shape parameters of the gamma distribution
The shape parameters of the fitted gamma distributions from Appendix B are plotted in Fig. 3 and show a clear relationship between the shape parameter k and the age band of the dwelling: The older the dwelling, the smaller the shape parameter.This indicates that newer dwellings have a more uniform division of plot size whereas older dwellings have a less equitable distribution of plot size.
The shape parameters for detached houses, bungalows and apartments are approximately half the size of those for semi-detached and terraced houses indicating that apartments and detached houses have a greater variability in plot size for a given mean density.Apartments are often built in areas where space is constrained and can vary greatly in density per plot, and similarly detached houses and bungalows can have large variability in plot size, whereas plot sizes for semi-detached and terraced houses tend to be more uniform.
Table 2 summarizes these calibrated shape parameters k for the different dwelling types.
The analysis found no relationship between the shape parameter k per dwelling type and either the area types or the morphologies.However, the aspatial neighborhood types with samples large enough for analysis had a relatively narrow range of area types per dwelling type, for example apartments were in urban areas, whereas detached dwellings were in suburban and rural areas.

Estimating the gamma distribution from the mean density
For a given dwelling type and location, the mean density x is:
$$x = \frac{\sum_i H_i}{\sum_i A_i} \qquad (12)$$
It can be shown that x is the mode of the plot densities μ (i.e., the value of μ at which the probability density function is at its maximum).
The mode of the gamma distribution has the following property if k > 1:
$$x = (k-1)\,\theta \qquad (13)$$
Alternatively, this can be shown by integrating the probability density function f(μ; k, θ). The number of dwellings cancels out, and the denominator is the integral of the dwelling frequency distribution divided by plot density μ, as shown below:
$$x = \frac{\int_0^\infty f(\mu; k, \theta)\,d\mu}{\int_0^\infty \frac{f(\mu; k, \theta)}{\mu}\,d\mu} = \frac{\Gamma(k)\,\theta^{k}}{\int_0^\infty \mu^{k-2} e^{-\mu/\theta}\,d\mu} = \frac{\Gamma(k)\,\theta^{k}}{\Gamma(k-1)\,\theta^{k-1}} = (k-1)\,\theta \qquad (14)$$
This results in Eq. (13) above. Hence:
$$\theta = \frac{x}{k-1} \qquad (15)$$
Substituting Eq. (8) into Eq. (15) gives the following new convenient relationship, which allows the mean plot density to be calculated from the mean area density:
$$\bar{\mu} = \frac{k\,x}{k-1} \qquad (16)$$
Hence, the scale parameter θ can first be estimated from Eq. (15), because x is forecast by an urban model such as LUISA and the shape parameter k has been empirically calibrated as $\hat{k}$ using the EHCS data. The mean plot density $\bar{\mu}$ can then be estimated from Eq. (16), and the theoretical gamma distribution is fully specified as a calibrated function of plot density. Note that $\bar{\mu}$ and x are different density metrics: x is the conventional measure of dwellings divided by the sum of the residential plot areas, whereas $\bar{\mu}$ is the mean value of the plot densities of the individual dwellings.
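The conversion from mean area density to a fully specified gamma distribution takes two lines: θ = x/(k − 1) and μ̄ = kθ = kx/(k − 1). The sketch below applies them to the worked example used later in the paper (x = 30 dph, k = 8) and then checks by simulation that the conventional density Σ H / Σ A of gamma-distributed plot densities, with plot area 1/μ per dwelling, does come back to (k − 1)θ. The function name is hypothetical and the simulation is only a sanity check.

```python
import random


def gamma_from_mean_density(x, k):
    """Return (theta, mu_bar) from mean area density x and calibrated shape k > 1."""
    if k <= 1:
        raise ValueError("the relationship requires k > 1")
    theta = x / (k - 1)        # scale parameter
    mu_bar = k * theta         # mean plot density, equal to k * x / (k - 1)
    return theta, mu_bar


theta, mu_bar = gamma_from_mean_density(30.0, 8.0)
# theta = 30/7, about 4.286, and mu_bar = 240/7, about 34.29

# Simulation check: each simulated dwelling occupies 1/mu hectares, so the
# conventional density is (number of dwellings) / (sum of plot areas).
rng = random.Random(1)
mus = [rng.gammavariate(8.0, theta) for _ in range(100000)]
x_check = len(mus) / sum(1.0 / m for m in mus)   # close to 30
```

The simulated value is a harmonic mean of the plot densities, which is why it equals the mode (k − 1)θ rather than the arithmetic mean kθ.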

Deriving the distribution of the plot area per dwelling type
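The plot area per dwelling is:
$$a = \frac{1}{\mu} \qquad (17)$$
Hence, from the PDF of the dwellings, Eq. (5), the plot area per dwelling associated with plot densities between μ and μ + dμ is:
$$a\,f(\mu; k, \theta)\,d\mu = \frac{f(\mu; k, \theta)}{\mu}\,d\mu \qquad (18)$$
The mean plot area $\bar{a}_d$ per dwelling, from Eq. (14), is:
$$\bar{a}_d = \int_0^\infty \frac{f(\mu; k, \theta)}{\mu}\,d\mu = \frac{1}{(k-1)\,\theta} = \frac{1}{x} \qquad (19)$$
Hence, the PDF of the plot area per dwelling over the plot density range μ is:
$$f_a(\mu; k, \theta) = \frac{f(\mu; k, \theta)}{\mu\,\bar{a}_d} = \frac{\mu^{k-2} e^{-\mu/\theta}}{\Gamma(k-1)\,\theta^{k-1}} \qquad (20)$$
The following integration produces the CDF of plot area per dwelling, which is itself a gamma CDF with shape k − 1 and the same scale θ:
$$F_a(\mu; k, \theta) = \int_0^{\mu} f_a(s; k, \theta)\,ds = \frac{\gamma(k-1, \mu/\theta)}{\Gamma(k-1)} \qquad (21)$$
It is interesting to compare this CDF of plot area (Eq. 21) with the CDF of dwelling frequency (Eq. 6). Fig. 4a shows this comparison for detached dwellings and Fig. 4b for semi-detached dwellings. Both are shown for the same mean density x, but these dwelling types differ in their shape parameter k (Table 2). The 20% lowest density detached dwellings would occupy 36% of the plot area for detached dwellings, whereas the 20% lowest density semi-detached dwellings would occupy only 30% of the plot area. This illustrates how k can indicate the equity in plot sizes per dwelling type: those with a smaller k, such as detached dwellings and apartments, have a less equitable division of plot sizes than terraced and semi-detached dwellings.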
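The comparison between the dwelling CDF and the plot area CDF can be sketched numerically. Dividing the dwelling PDF by μ and normalizing gives a gamma density with shape k − 1 and the same θ, so the plot area CDF is simply a gamma CDF with shape k − 1. The sketch finds the plot density below which a fraction p of dwellings lies (by bisection on the dwelling CDF) and reports the share of plot area those dwellings occupy. The shape values 4 and 8 are illustrative only, not the calibrated values in Table 2, and all function names are hypothetical.

```python
import math


def gamma_cdf(mu, k, theta):
    """Gamma CDF via the power series of the regularized lower incomplete gamma."""
    if mu <= 0:
        return 0.0
    if math.isinf(mu):
        return 1.0
    z = mu / theta
    term = total = 1.0 / k
    n = 0
    while term > total * 1e-16 and n < 10000:
        n += 1
        term *= z / (k + n)
        total += term
    return math.exp(k * math.log(z) - z - math.lgamma(k)) * total


def area_cdf(mu, k, theta):
    """CDF of plot area over plot density: a gamma CDF with shape k - 1 (needs k > 1)."""
    return gamma_cdf(mu, k - 1, theta)


def dwelling_quantile(p, k, theta):
    """Plot density below which a fraction p of dwellings lies (bisection on the dwelling CDF)."""
    lo, hi = 0.0, k * theta
    while gamma_cdf(hi, k, theta) < p:
        hi *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, k, theta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def area_share_of_lowest(p, k, theta=1.0):
    """Share of plot area held by the fraction p of dwellings with the lowest plot density."""
    return area_cdf(dwelling_quantile(p, k, theta), k, theta)


# Illustrative shapes only: the smaller k gives the lowest-density 20% a larger land share.
share_k4 = area_share_of_lowest(0.2, 4.0)
share_k8 = area_share_of_lowest(0.2, 8.0)
```

Because θ cancels when the dwelling quantile is fed back into the area CDF, the share depends only on k, which is consistent with the text's use of k as an indicator of equity in plot sizes.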

Using discrete tiles to represent the calibrated density distribution
The next stage was to develop a method of using the preceding calibrated functions to convert the forecasts of average density per zone of an urban model, such as LUISA, into discrete representations of the dwellings and plot sizes.

Disaggregating dwellings into dwelling types and densities
The EHCS dwelling types were aggregated into the minimum number of distinct types.These were detached, semi-detached, terraced houses, purpose-built and converted flats/apartments (with bungalows divided between detached and semi-detached houses).This was done to reduce the number of tile types and amount of work involved in designing and modeling the tiles.

Estimate the proportions of each dwelling type per zone
The first step estimated the percentage of each dwelling type for a given mean area density x, using a similar method to Mitchell, Hargreaves, Namdeo, and Echenique (2011). This combined the 2001 Census dwellings data with the residential land areas of the Generalized Land Use Database (GLUD), which is based on Ordnance Survey MasterMap™ (DCLG 2005). Fig. 5 shows how these percentages vary with plot density: the East and South East of England regions are similar to the average for England, whereas London has a greater proportion of apartments in each density band.
Note however that the GLUD data only classifies the dwelling footprints and domestic gardens as residential land whereas the EHCS data is based on a manual survey that measures the residential plot.There are therefore some disparities between GLUD and EHCS metrics especially for high density urban centers.For example, if a mixed-use building has a non-domestic unit on the ground floor then GLUD classifies the whole building as non-domestic.Nevertheless, the GLUD data is broadly consistent with the plot density metric for most of the case study area.
The relationships illustrated in Fig. 5 were then represented as empirical equations per region, so that the forecast number of dwellings could be split into the number of dwellings by type $H_{dj}$ based on the forecast mean residential density $x_j$ per zone j.

Estimate the mean density per dwelling type from mean dwelling density
The next step estimated the mean density of each dwelling type from the overall mean density x. This analysis aggregated the EHCS data by region, morphology and area type, giving 216 possible 'aspatial' location types l (9 × 4 × 6). However, there were some invalid combinations, such as rural morphologies in the London region, and so only 164 'aspatial' location types had data. The estimated mean density per dwelling type d in aspatial location l is:
$$x'_{dl} = \frac{\sum_i H_{dli}}{\sum_i A_{dli}}$$
where $H_{dli}$ = number of dwellings i of type d in aspatial location type l, and $A_{dli}$ = plot area of dwellings i of type d in aspatial location type l.
The overall mean density of the aspatial location type l is:
$$x_l = \frac{\sum_d \sum_i H_{dli}}{\sum_d \sum_i A_{dli}}$$
Fig. 6 shows that there are clear relationships between the overall mean residential density $x_l$ and the estimated mean residential density per dwelling type $x'_{dl}$. These were represented as empirical equations to convert average density into density per dwelling type. (The correlation for converted apartments is not as good as for the other dwelling types because they are of very variable construction, but they are a small proportion of total dwellings and so this makes little difference to the results.) These density estimates per zone of the urban model were then proportionally adjusted so that the resulting residential area matches the land input constraint per zone j:
$$A_j = \sum_d \frac{H_{dj}}{x_{dj}}, \qquad x_{dj} = \sigma_j\,x'_{dj}$$
where $A_j$ = input residential area to the urban model for zone j, $\sigma_j$ = estimated adjustment factor, and $x_{dj}$ = adjusted mean density for dwelling type d.
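A minimal sketch of the proportional adjustment, assuming a single multiplicative factor σ_j per zone applied to every dwelling type. Requiring Σ_d H_dj / (σ_j x′_dj) = A_j gives σ_j = Σ_d (H_dj / x′_dj) / A_j. The function name, dwelling-type labels and numbers below are hypothetical.

```python
def adjust_densities(dwellings_by_type, density_by_type, area_j):
    """Scale per-type densities x'_dj by one factor sigma_j so the implied area equals A_j.

    dwellings_by_type: H_dj per dwelling type; density_by_type: x'_dj (dph);
    area_j: input residential area A_j (ha). Returns (sigma_j, {type: x_dj}).
    """
    implied_area = sum(dwellings_by_type[d] / density_by_type[d] for d in dwellings_by_type)
    sigma = implied_area / area_j   # sigma < 1 spreads dwellings over more land
    adjusted = {d: sigma * density_by_type[d] for d in density_by_type}
    return sigma, adjusted


# Hypothetical zone: the unadjusted densities imply 20 ha, but 25 ha is available.
H_dj = {"detached": 100.0, "semi-detached": 300.0}
x_prime_dj = {"detached": 10.0, "semi-detached": 30.0}
sigma, x_dj = adjust_densities(H_dj, x_prime_dj, 25.0)   # sigma = 0.8
```

Using one factor for all dwelling types preserves the relative densities given by the empirical equations while forcing the land budget of the zone to be met exactly.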

The generic tiles
The tiles are a new method of transforming the parametric distribution of plot densities into a discrete 3D representation of built form. They have been created primarily as a medium for multidisciplinary research on urban planning, buildings and decentralized environmental technologies. Fig. 7 shows examples of the tiles, which are generic forms that range from low to high density for each dwelling type. Appendix C shows two of the tiles in more detail.
The tiles were designed using the EHCS data on the dwelling dimensions, building fabric, floor space, occupancies and plot sizes (DCLG 2009).

Designing the plot density of each tile type
The method of selecting a set of tiles is conceptually equivalent to fitting a histogram to a gamma distribution as illustrated in Fig. 10.The frequency distribution of plot densities is represented by the numbers of tiles selected from a pre-designed set, which can include fractions of tiles.This is similar in principle to approximate integration where in this case the subinterval is defined by the plot density boundaries of the tile types.The following procedure is carried out for each location j and dwelling type d.
The gamma distribution for the $H_d$ dwellings of type $d$ in zone $j$ was specified by inserting the mean density $\bar{x}$ and the appropriate value of $k$ (Table 2) into Eqs. (15) and (16) to calculate $\mu$ and $\theta$. The gamma distribution CDF $F(\mu \mid k, \theta)$ of the plot area frequency (Eq. 21) was then used to calculate the probability that the plot area is of tile type $t$ as $p_{at} = F(\mu_t \mid k, \theta) - F(\mu_{t-1} \mid k, \theta)$, where $p_{at}$ is the probability that the plot area is of tile type $t$ and $\mu_t$ is the upper boundary of the plot density subinterval of tile type $t$.
(Note that for $t_{max}$ the cumulative probability $P(\mu < \mu_t)$ is taken as 1.) Hence the total plot area of tiles of type $t$ is $p_{at}$ multiplied by the total plot area of dwelling type $d$ in zone $j$. The CDF $f(\mu \mid k, \theta)$ of dwelling frequency (Eq. 6) was used to calculate the probability that a dwelling is of tile type $t$ as $p_{ht} = f(\mu_t \mid k, \theta) - f(\mu_{t-1} \mid k, \theta)$, where $p_{ht}$ is the probability that a dwelling is of tile type $t$.
Hence, the number of dwellings of tile type $t$ is $p_{ht} H_d$. The mean density $\bar{x}_t$ of the subinterval for tile type $t$, which is its dwellings divided by its plot area, is therefore an output from the above empirically calibrated parametric functions. The accuracy of the tiles method depends on specifying a discrete tile density that is as close as possible to $\bar{x}_t$. The sum of the tiles will then closely match the target for available plot area.
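The procedure above can be sketched in Python for one zone and dwelling type. The sketch assumes forms for Eqs. (6), (15), (16) and (21), which are not reproduced here. It takes dwelling frequency as gamma with shape $k$ and scale $\theta = \bar{x}/(k-1)$. It takes plot area frequency as gamma with shape $k-1$ and the same scale, which is what weighting each dwelling by its plot area $1/\mu$ implies. The gamma CDF is evaluated with its standard power series so that only the standard library is needed. The boundary values in the usage example are hypothetical, not those of Table 3.

```python
import math


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized lower incomplete
    gamma function P(shape, x / scale)."""
    if x <= 0:
        return 0.0
    z = x / scale
    if z > 700:  # avoid overflow; the CDF is 1 to double precision here
        return 1.0
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape + 1))


def tile_probabilities(mean_density, k, upper_bounds):
    """Return p_at, p_ht and the subinterval mean densities xbar_t.

    upper_bounds: ascending upper plot-density boundary mu_t (dph) of each tile
    type; the cumulative probability of the last one is taken as 1.
    Requires k > 1 (assumed scale theta = mean_density / (k - 1)).
    """
    theta = mean_density / (k - 1)
    inner = upper_bounds[:-1]
    cdf_area = [gamma_cdf(b, k - 1, theta) for b in inner] + [1.0]
    cdf_dwel = [gamma_cdf(b, k, theta) for b in inner] + [1.0]
    p_at = [hi - lo for lo, hi in zip([0.0] + cdf_area[:-1], cdf_area)]
    p_ht = [hi - lo for lo, hi in zip([0.0] + cdf_dwel[:-1], cdf_dwel)]
    # Dwellings / plot area per subinterval = (p_ht * H) / (p_at * A), and H / A
    # is the mean density, so xbar_t = mean_density * p_ht / p_at.
    xbar_t = [mean_density * ph / pa for ph, pa in zip(p_ht, p_at)]
    return p_at, p_ht, xbar_t


if __name__ == "__main__":
    # Hypothetical boundaries (dph) for one dwelling type
    p_at, p_ht, xbar_t = tile_probabilities(30.0, 8, [20.0, 30.0, 40.0, 80.0])
    for t, (pa, ph, xb) in enumerate(zip(p_at, p_ht, xbar_t), start=1):
        print(f"tile {t}: p_at={pa:.3f} p_ht={ph:.3f} mean density={xb:.1f} dph")
```

Under these assumptions each $\bar{x}_t$ falls inside its own subinterval. The lowest and highest values depend most on the shape of the tails.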
Values of $\bar{x}_t$ were calculated using GLUD Output Area data for the Wider South East (WSE) of England case study area. Output Areas (OAs) are the smallest areas for which the UK Office for National Statistics provides geographical data (around 125 households per OA), so using them provided a rigorous test of the tiles method. OAs were selected if they were within an urban boundary and had less than 2% non-domestic land (19,770 of the OAs in the WSE met these criteria). This ensured that most of the land in each OA was in residential use, which reduced the inaccuracies that mixed-use buildings cause in the GLUD estimates of residential land.
The boundaries of the tile subintervals were adjusted so that each tile type represented approximately the same proportion of dwellings for the case study area, i.e. $p_{ht} \approx 1/n_t$, where $n_t$ is the number of tile types. The finalized boundaries are shown in Table 3.
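For a single calibrated distribution, boundaries that give each tile type the same share of dwellings are simply quantiles of the dwelling CDF. The paper adjusted the boundaries over the whole case study area. The Python sketch below therefore only illustrates the principle. It finds the quantiles of one assumed gamma distribution (shape $k$, scale $\bar{x}/(k-1)$) by bisection. The function names and inputs are hypothetical.

```python
import math


def gamma_cdf(x, shape, scale):
    """Gamma CDF from the power series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    if z > 700:  # avoid overflow; the CDF is 1 to double precision here
        return 1.0
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape + 1))


def equal_share_boundaries(mean_density, k, n_types, tol=1e-9):
    """Upper boundaries mu_t (dph) that give each of n_types tile types an equal
    share of dwellings; the last type is open-ended, so n_types - 1 values."""
    theta = mean_density / (k - 1)  # assumed scale, as in the text's example
    bounds = []
    for t in range(1, n_types):
        target = t / n_types
        lo, hi = 0.0, mean_density
        while gamma_cdf(hi, k, theta) < target:  # widen until bracketed
            hi *= 2
        while hi - lo > tol:  # bisection on the dwelling CDF
            mid = 0.5 * (lo + hi)
            if gamma_cdf(mid, k, theta) < target:
                lo = mid
            else:
                hi = mid
        bounds.append(0.5 * (lo + hi))
    return bounds


if __name__ == "__main__":
    print([round(b, 1) for b in equal_share_boundaries(30.0, 8, 4)])
```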
Fig. 8 shows the mean density per tile type $\bar{x}_t$ plotted against the mean density of the dwelling type $\bar{x}_d$ per OA. For the intermediate-density tiles, $\bar{x}_t$ was relatively constant and lay at approximately the mid-point of the density subinterval, so each of these could be approximated by a discrete tile density.
However, the mean density was not constant for the lowest and highest density subintervals, because there was a decreasing probability that the target density would be matched at these extremes by a mixture of tile types. The upper and lower subintervals represent the most extreme generic built form per dwelling type, and the only way to achieve a more extreme density is to vary either its plot size or its number of storeys. Table 4 shows the discrete tile densities selected to represent the mean density of each tile type.
Table 5 shows an example of using the tiles for semi-detached houses with mean density $\bar{x} = 30$ dph (based on an input plot area of 10 ha and a forecast of 300 dwellings). If their average age band is 1945 to 1974, then their calibrated shape parameter is $k = 8$ from Table 2. The scale parameter is $\theta = 4.3$ from Eq. (13). Hence, the mean plot density is $\mu = 34.3$ dph from Eq. (16), which fully specifies the calibrated gamma distribution. Fig. 9 shows the CDF of the plot area per dwelling (Eq. 21) with the tile subintervals from Table 3. The tiles can be used in two alternative ways to convert the parametric distribution into discrete tiles per dwelling type, depending on whether the aim is to match exactly the dwelling forecast or the input plot area. The method that matches the number of dwellings allocates them using the CDF of dwellings (Eq. 6), and the method that matches the plot area allocates it using the CDF of plot area (Eq. 21). In this example, the results are within 5% for plot area and −3% for dwellings, which is typical for the method.
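The values of $\theta$ and $\mu$ in this example can be reproduced if Eqs. (13) and (16) are taken to be $\theta = \bar{x}/(k-1)$ and $\mu = k\theta$. This is a reconstruction from the quoted numbers rather than the paper's own equations. It assumes that the plot densities $\mu_i$ of the $H_d$ dwellings follow a gamma distribution with shape $k$ and scale $\theta$, so that $\mu$ is the mean plot density per dwelling. The mean density $\bar{x}$ (total dwellings divided by total plot area) is then the reciprocal of the mean of $1/\mu_i$:

$$\bar{x} = \frac{H_d}{\sum_{i=1}^{H_d} 1/\mu_i} \approx \frac{1}{\mathrm{E}[1/\mu_i]} = (k-1)\,\theta \quad\Rightarrow\quad \theta = \frac{\bar{x}}{k-1} = \frac{30}{7} \approx 4.3 \text{ dph}$$

$$\mu = k\,\theta = 8 \times \frac{30}{7} = \frac{240}{7} \approx 34.3 \text{ dph}$$

The same assumption implies that plot area is distributed over density as a gamma distribution with shape $k-1$ and scale $\theta$, whose mean $(k-1)\theta$ equals $\bar{x}$.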
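When dwellings are allocated to tile types using $p_{ht}$, the plot area is estimated as $\sum_t p_{ht} H_d / x_t$, where $x_t$ is the discrete tile density. When plot area is allocated using $p_{at}$, the dwellings are estimated as the sum over tile types of $p_{at}$ multiplied by the input plot area and by $x_t$. Fig. 10 illustrates how this example compares with the gamma distribution PDF of the dwellings (Eq. 5), with the tile densities shown as broken lines.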
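The two ways of using the tiles can be compared in a short Python sketch. It takes the probabilities $p_{at}$ and $p_{ht}$ and the discrete tile densities $x_t$ as given. The 10 ha and 300 dwellings come from the example above. The probabilities and tile densities are hypothetical, not the values of Table 5, and the function name is illustrative.

```python
def tile_estimates(H, A, p_at, p_ht, x_t):
    """Compare the two ways of converting the distribution into tiles.

    H: dwellings of the type in the zone; A: their plot area (ha);
    x_t: discrete plot density of each tile type (dph).
    """
    # Allocating dwellings with p_ht reproduces H exactly; the plot area of
    # each tile type is then its dwellings divided by the tile density.
    area_est = sum(ph * H / x for ph, x in zip(p_ht, x_t))
    # Allocating plot area with p_at reproduces A exactly; the dwellings of
    # each tile type are then its plot area times the tile density.
    dwellings_est = sum(pa * A * x for pa, x in zip(p_at, x_t))
    return area_est, dwellings_est


if __name__ == "__main__":
    H, A = 300, 10.0                 # 30 dph, as in the worked example
    p_at = [0.3, 0.4, 0.3]           # hypothetical probabilities
    p_ht = [0.2, 0.4, 0.4]
    x_t = [22.0, 32.0, 45.0]         # hypothetical tile densities (dph)
    area, dwell = tile_estimates(H, A, p_at, p_ht, x_t)
    print(f"plot area error: {100 * (area / A - 1):+.1f}%")
    print(f"dwellings error: {100 * (dwell / H - 1):+.1f}%")
```

If each $x_t$ equals the subinterval mean $\bar{x}_t = \bar{x}\,p_{ht}/p_{at}$, both estimates reproduce the inputs exactly. This is why the accuracy of the method depends on choosing tile densities close to $\bar{x}_t$.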

Comparing the estimates of plot area using the tiles with the GLUD data
The total residential plot area was estimated using the tiles as:
where $A_{dt}$ = total residential plot area of dwelling type $d$.
Fig. 11 shows that the tiles give a very close estimate of the actual GLUD residential land per Output Area. The only exceptions are Output Areas larger than around 10 ha, which are lower-density areas that often have unusually large properties with outbuildings.

Comparing the regional distribution of generated tiles against empirical EHCS data
The next step of the validation process generated a set of tiles for the East of England and compared the results with the East of England EHCS housing stock data. The EHCS data were first disaggregated into 21 aspatial location types based on combinations of the 6 area types and 4 morphologies (3 of the 24 combinations had no dwellings). The only inputs to the tile generation process were the average plot density and the total number of dwellings of each aspatial location type. Everything else was calculated using the tiles method described above to generate a set of tiles for each of the 21 aspatial location types. Fig. 12 compares the empirical plot density distributions of the EHCS data for the East of England with the plot density distribution from the tiles, and shows that the tiles method generates a realistic distribution of plot density per dwelling type.
Fig. 11. Comparison of the total plot area of the tiles (from Eq. 28) vs. GLUD residential area.

Comparing the land areas of the generated tiles against Census Output Area data
The tiles were then compared with GLUD dwelling footprint data per Census Output Area. This provided an independent validation because GLUD dwelling footprint data were not used in either the tile design or the tile generation process. The dwelling footprint areas of the tiles were summed per Output Area and compared with the GLUD residential footprint area. GLUD was found to consistently overestimate the footprint area by around 4 m² per dwelling, which was due to the inclusion of outbuildings such as garages and garden sheds. The validation process therefore deducted 4 m² per dwelling from the GLUD residential footprint data before comparing it with the estimates from the tiles. The validation was carried out in 4 stages, shown in Fig. 13:
a) Using only the total residential plot area and the total number of dwellings per Output Area as the input to generate the tiles. The footprint areas were estimated from the average footprint per dwelling type in the EHCS data.
b) Same as (a) but using data on the dwelling type percentages instead of an estimate. This shows that using the actual percentages, if available, makes little difference to the correlation.
c) Same as (a) but using the actual footprint areas per tile type. This shows that distinguishing between sizes of dwellings of the same type greatly improves the correlation, even when only an estimate of the dwelling percentages is used.
d) Same as (c) but using the actual percentage per dwelling type. This gives a further improvement in the correlation, and R² = 0.82 is surprisingly good for such small Output Areas.
This shows that the tiles method produces a much more accurate estimate of dwelling footprint areas than using average footprint per dwelling type.This is reassuring because footprint areas are a proxy for floor space and roof space, which are important for investigating energy demands and the potential of sustainable technologies such as solar energy and water harvesting.
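The footprint comparison can be sketched in Python. The tile footprint of each Output Area is taken as the sum over tile types of dwellings times footprint per dwelling. GLUD footprints are reduced by 4 m² per dwelling, and the agreement is summarized by $R^2$. The sketch computes $R^2$ as the squared Pearson correlation, which is an assumption about how the reported value was obtained. The function names and all input values are hypothetical.

```python
def tile_footprint(dwellings_by_tile, footprint_by_tile):
    """Total dwelling footprint (m2) of one Output Area from its tiles:
    dwellings of each tile type times that type's footprint per dwelling."""
    return sum(h * f for h, f in zip(dwellings_by_tile, footprint_by_tile))


def corrected_glud(glud_footprint_m2, dwellings, outbuildings_m2=4.0):
    """Deduct the ~4 m2 per dwelling of outbuildings included in GLUD."""
    return glud_footprint_m2 - outbuildings_m2 * dwellings


def r_squared(x, y):
    """Squared Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical Output Areas: dwellings per tile type, GLUD footprint (m2)
    oas = [([40, 60, 25], 9300.0), ([10, 80, 35], 9700.0), ([0, 50, 70], 10400.0)]
    footprint_by_tile = [55.0, 70.0, 90.0]  # hypothetical m2 per dwelling
    est = [tile_footprint(h, footprint_by_tile) for h, _ in oas]
    obs = [corrected_glud(g, sum(h)) for h, g in oas]
    print(est, obs, round(r_squared(est, obs), 3))
```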

Using the tiles for integrating regional scale and building scale modeling
The residential land areas in the LUISA regional forecasting model were based on GLUD residential land, which, as explained earlier, has metrics broadly consistent with the EHCS, so the average densities per zone of the LUISA model could be converted directly into plot densities.
However, urban planners normally measure density based on the total residential area, which may include pathways, parking, communal space and roads. These additional areas were therefore added to the design of the tiles so that the number of tiles selected equals the total residential area in hectares. This resulted in two alternative measures of density. The plot density per tile $x_t$ was used in the preceding method to calculate the number of dwellings per tile type. The residential density $z_t$ shown in Table 6 was used to convert these dwellings into the number of one-hectare tiles. The residential density is $z_t = 10{,}000/(a_t + r_t)$, where $z_t$ is the tile residential density (dph), $a_t$ is the plot area per dwelling of tile type $t$ (m²) and $r_t$ is the additional residential land per dwelling of tile type $t$ (m²). The number of tiles of type $t$, $n_t$ (each one hectare), is then the number of dwellings of tile type $t$ divided by $z_t$.
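Both densities follow from the land areas per dwelling, taking one hectare as 10,000 m². The Python sketch below assumes $x_t = 10{,}000/a_t$ (density as the inverse of plot area per dwelling), $z_t = 10{,}000/(a_t + r_t)$, and a number of tiles equal to the dwellings allocated to the tile type divided by $z_t$. The function names and example areas are hypothetical.

```python
M2_PER_HECTARE = 10_000.0


def tile_densities(plot_area_m2, extra_area_m2):
    """Plot density x_t and residential density z_t (dwellings per hectare)
    from the plot area a_t and additional residential land r_t per dwelling."""
    x_t = M2_PER_HECTARE / plot_area_m2
    z_t = M2_PER_HECTARE / (plot_area_m2 + extra_area_m2)
    return x_t, z_t


def number_of_tiles(dwellings, z_t):
    """Number of one-hectare tiles needed to hold the given dwellings."""
    return dwellings / z_t


if __name__ == "__main__":
    x, z = tile_densities(plot_area_m2=200.0, extra_area_m2=50.0)  # hypothetical
    print(x, z, number_of_tiles(120, z))
```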
One hectare was chosen as the tile size partly for convenience of accounting, so that the number of tiles equals the residential area, but also because with any smaller tile it would be difficult to illustrate a typical housing layout. The additional areas, such as roads, paths, green space and residential parking, were estimated from Google Satellite maps and Ordnance Survey mapping. Fig. 14 shows how the percentage of these different residential land types varies with the tile plot density $x_t$. At low densities $x_t$ is similar to $z_t$ because the buildings and gardens account for most of the tile area. As the density increases, the remaining residential land, such as roads, paths, green space and other land, becomes a larger percentage of the tile area, so $x_t$ becomes much larger than $z_t$.
For each tile, a dataset of consumption, emissions and costs was produced using building-scale models for energy, water and waste, for combinations of scenario-specific variables. These variables were technology scenario and uptake; climate; development type (existing area, redevelopment or new land); area type (central, urban, suburban or rural); occupancy characteristics; and whether the dwellings were existing, retrofitted or new build. The tiles were generated per zone to match each scenario forecast of the regional-scale model, and the tile data were then aggregated.
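The aggregation step amounts to weighting each tile's modeled values by the number of tiles selected in the zone. The Python sketch below assumes per-tile results are stored by tile identifier for a given scenario. The quantities, the identifier "S2" and all numbers are hypothetical.

```python
from collections import defaultdict


def aggregate_zone(tile_counts, tile_results):
    """Sum per-tile results (e.g. annual demand per one-hectare tile) over the
    tiles selected for a zone; tile counts may be fractional."""
    totals = defaultdict(float)
    for tile_id, count in tile_counts.items():
        for quantity, value in tile_results[tile_id].items():
            totals[quantity] += count * value
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical per-tile results for one scenario
    results = {
        "D1": {"energy_MWh": 900.0, "water_m3": 12000.0},
        "S2": {"energy_MWh": 1500.0, "water_m3": 21000.0},
    }
    counts = {"D1": 2.5, "S2": 4.0}  # number of tiles selected in the zone
    print(aggregate_zone(counts, results))
```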
It would be very useful if the tiles method could be further developed to estimate the distribution of residential density directly, rather than plot density. To investigate this possibility, Fig. 15 compares the mean plot density $\bar{x}$ with the mean tile density $\bar{z}$ for the main dwelling types. This relationship is almost linear, especially for the detached and semi-detached dwellings. It is less linear for terraced dwellings and apartments, probably because of their greater range of dwelling types. Disaggregating them into end-terrace and mid-terrace, and into low-rise and high-rise apartments, may increase their linearity. This broadly linear relationship suggests that the plot densities could be transformed into tile densities and still fit a gamma distribution. If the shape parameter $k$ were then recalibrated, it might be possible to estimate the distribution of residential density $z$ directly from $\bar{z}$ and generate larger tiles as generic neighborhood types. This would be helpful for exploring the interactions between bottom-up urban design and top-down socio-economic modeling of city regions, but is beyond the scope of this paper.

Discussion and conclusions
This paper has presented an innovative method of analyzing housing survey data to explore the variability of dwelling plot sizes. Housing development in England is largely commercially driven but subject to planning constraints, and so land is relatively expensive. The provision and adaptation of the housing stock is therefore responsive to buyer demands. This may partly explain why the distributions of plot densities of each dwelling type have a relatively consistent shape that can be approximated by the gamma distribution.
Older dwellings were found to have greater variability in plot size than newer dwellings, and the following possible reasons need further investigation. Dwellings may have become more uniform over time owing to a tightening of planning regulations and housing standards, whereas older dwellings are more likely to be the result of property conversion and thereby to have become more diverse. It is also likely that the variability of plot size per dwelling type is correlated with variability in household income. Greater mobility from widespread car ownership has allowed more social segregation between area types. Previously, neighborhoods had a wider mix of income levels and consequently a wider variation in plot sizes per dwelling type.
In reality, the density distribution of dwellings does not necessarily follow the gamma distribution at a specific location because the built environment is so variable. However, the tile estimates become very similar to the data when aggregated over larger areas. Some alternative distributions were tried, but none outperformed the gamma distribution in providing a consistently good fit to the data. The method could in principle be used for the parametric simulation of dwellings and plot sizes, but for multidisciplinary research it is more practical to convert the density distributions into discrete pre-designed 3D tiles.
The tiles method is a useful extension to urban forecasting models because it allows the average density forecast per zone to be converted into a representation of the dwelling stock and residential land. This captures the variability in garden size, roof area and floor space that is needed to estimate the likely uptake and performance of decentralized sustainable technologies for energy, water and waste management. It also provides more accurate estimates than could be achieved using more conventional methods such as dwelling typologies, mean densities and floor area ratios. The tiles also reflect qualitative aspects, such as garden size, number of party walls and storeys, which may be useful for modeling housing choice. Their data on building heights and land cover could also be a useful input to the forecasting of urban climate and flood risk.
Adding more tile types would increase the accuracy of estimating the dwelling stock. However, this needs to be balanced against the extra work of designing each tile and modeling its building efficiency, demand and supply characteristics. Some extra tiles would need to be added to represent mixed-use buildings in urban centers, such as inner London, more accurately. The tiles method is being adapted to non-domestic buildings, so it may eventually be possible to combine the domestic, non-domestic and mixed-use tiles within a single estimation process.
Further, more detailed validation by land parcel size and, if possible, by dwelling age band would be useful. The next steps are to adapt the method to other countries that have similar housing survey data and to explore the possibility of generating larger generic tiles of residential areas.
Plot area of a house or an apartment block $i$ (m²); $W$ = plot width (m); $F$ = depth of land at the front (m); $B$ = depth of the building (m); $R$ = depth of land at the rear (m).

Fig. 1. Plot dimensions from EHCS data used for calculating the plot density.

Fig. 3. Shape parameter $k$ by age band for the main dwelling types.

Fig. 6. Estimate of mean density per dwelling type from the average residential density.

Fig. 8. Mean density per tile type plotted against the mean density of the dwelling type.

Fig. 12. Results of the validation for the East of England region.

Fig. 15. Comparison of mean plot density and mean tile density for each main dwelling type.

Fig. 14. The average proportion of residential land versus the average plot density $x$ of the tiles.

Table 1
Dwellings $H_i$ represented by the selected sample compared with total English dwellings.

Table 2
Summary of the calibrated shape parameter $k$ for different ages and types of dwelling. Converted apartments are not included because only 6 of the variable combinations had large enough sample sizes, which is too few to estimate the shape parameter reliably.

Table 3
Density of the tile boundaries.
a Each dwelling type was allocated an upper boundary to represent the highest feasible density of the dwelling type, to improve the reliability of the mean tile density calculation.

Table 4
Plot densities of the 23 domestic tiles used to validate the tiles method.
Dwelling type | Plot density $x_t$ per tile type (dph)

Table 5
Example of the tiles method for $\bar{x} = 30$ dph, $k = 8$.

Table 6
Tile densities of the 23 domestic tiles.
Results of the validation using the GLUD area of residential building footprints per Output Area.

Table B1.
Table B3. Results for bungalows and purpose-built and converted apartments.
Table C1. Tile D1, detached houses.
Appendix C. Examples of tile dimensions