Rent/price ratio for English housing sub-markets using matched sales and rental data

The ratio between the rental and sales values of residential properties is a much-studied statistic in the field of real estate economics. When these values do not keep pace with each other, and in particular when the ratio is low, some commentators take this as an indication that a housing bubble may be building. The ratios are also of interest to potential property investors. These ratios are commonly computed from aggregate housing market statistics and as such rarely provide any indication of sub-market bubbles, which can occur within particular property types or regions of the country. This study uses a data set from a property listings company that provides sales and, potentially, rental prices for the same properties within England. By matching these records it is possible to calculate the rent/price ratio for individual properties. A regression model is then estimated to explain how the characteristics of the properties, the nature of their neighbourhood, and their location influence this ratio. The model consistently validates the hypothesis that the more desirable a property or affluent an area, the lower the rent/price ratio. It also begins to illustrate the range of “normal” rent/price ratios that may exist in housing sub-markets. The regression model is then used to provide a map of the geographical distribution of the ratio for England for one property sub-market.

& Kadi, 2018), particularly as investments (Mellish & Rhoden, 2009; Sprigings, 2008). One particular demographic often cited as challenged by these changes in the housing market is "generation-rent," young people who are unable to afford to buy a home, are denied access to the limited social rented sector (Schmickler & Park, 2014), and are therefore reliant on private rented properties (Alakeson, 2011; Clapham et al., 2014; Lund, 2013). In such a diverse and dynamic market it is important to have an understanding of the mechanisms at work that affect price and affordability and how these differ by sub-markets, either in terms of the type of property or by geography.
One important measure of the state of the housing market is the rent/price ratio yield, which is calculated as the annual rental value of a property divided by its sale price. However, there are a number of nuances to this definition (Wyatt, 2013). The first is whether the rental yield is based on gross or net values. With regard to the rent, the net rent could reflect rental costs, e.g., agent management fees or property maintenance; with regard to sales, the net sales price could reflect the on-costs of the purchase, e.g., legal or mortgage loan fees. A second aspect concerns whether the rent is the current rent charged (initial yield) or the market rent (reversionary yield); over time, unless the rent is re-negotiated, the two will diverge. Notwithstanding these nuances, the rent/price ratio typically has a value around 0.06.
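The gross and net variants of the definition above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the cost parameters, and the example figures (an annual rent of 12,000 and a sale price of 200,000) are hypothetical and are not taken from the study data.

```python
def gross_rent_price_ratio(annual_rent: float, sale_price: float) -> float:
    """Gross yield: annual rental value divided by the sale price."""
    if sale_price <= 0:
        raise ValueError("sale_price must be positive")
    return annual_rent / sale_price


def net_rent_price_ratio(
    annual_rent: float,
    sale_price: float,
    annual_rental_costs: float = 0.0,  # hypothetical: agent fees, maintenance
    purchase_on_costs: float = 0.0,    # hypothetical: legal or mortgage loan fees
) -> float:
    """Net yield: rent after rental costs, over price plus purchase on-costs."""
    net_price = sale_price + purchase_on_costs
    if net_price <= 0:
        raise ValueError("net price must be positive")
    return (annual_rent - annual_rental_costs) / net_price


# Hypothetical property: the gross ratio is close to the typical value of 0.06.
gross = gross_rent_price_ratio(12_000.0, 200_000.0)
net = net_rent_price_ratio(12_000.0, 200_000.0, 1_500.0, 4_000.0)
```

With any positive costs the net ratio is lower than the gross ratio, which is why it matters which of the two a reported yield refers to.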
The utility of this ratio rests with both its ability to identify possible housing bubbles (Mayer, 2011; Smith & Smith, 2006) and also to indicate potential investment yields from buy-to-let purchases (Leyshon & French, 2009). Ideally the ratio should be calculated on the same stock of housing, but practical difficulties usually make this problematic: houses rarely appear on the market at the same time for rent and sale, and when they are either rented or sold the alternative value measure is unrealised. Recognising this, the ratio is commonly calculated based on an aggregate understanding of the trends in rents and house prices. In this paper the rent/price ratio will be calculated using a method that matches contemporaneous sales and rental data for the same property within England by making use of administrative and commercial property data sources. This technique will allow an almost complete picture and understanding of the pattern of this ratio by aspects of the property, its neighbourhood, and its geography. No previous studies have attempted this task on this scale; the closest are a study confined to the atypical West London housing market (Bracke, 2015) and the calculation of correction factors to apply to modelled rents and prices in Sydney, Australia (Hill & Syed, 2011).
Section 2 of this paper provides some background on the rent/price ratio and how it has been used in previous studies. Section 3 introduces the data and the methodology used to calculate the ratio. Section 4 presents the results of a regression model that attempts to show how various attributes influence the ratio, and maps national estimates of the ratio for English postcodes. Finally, Section 5 provides a discussion of these findings and the wider implications of the work.

2 | IMPORTANCE OF THE RENT/PRICE RATIO
The importance of the rent/price ratio to politicians, policy analysts, and economists is that it reflects the stability of a housing market or its sub-markets. In a stable housing market the ratio will remain relatively consistent. However, if the ratio begins to fall then there is evidence that the value of the underlying asset, the property, is beginning to increase (Campbell et al., 2009). When this departure occurs, there is usually a correction, which can be through a gradual convergence as rents also increase, or a sudden drop in the property price, i.e., a burst bubble (Ambrose et al., 2013; Jurgilas & Lansing, 2012). Various econometric models have been estimated to try to gain an understanding of how useful the rent/price ratio is in predicting a bubble. This has been done for the UK (Kim, 2015; Ngai & Tenreyro, 2014); the Euro area (Hiebert & Sydow, 2011); a range of OECD countries (André et al., 2014; Engsted & Pedersen, 2015); China (Zhai et al., 2017); the USA (Gallin, 2008; Kivedal, 2013); and a selection of Metropolitan areas within the USA (Beracha et al., 2012; Campbell et al., 2009; Kishor & Morley, 2015). Many of these studies report that the ratio is a valid indicator for the development of a property bubble. In particular, studies of the Metropolitan Areas of the USA identified markets that were subject to a bubble that burst and those that also experienced some form of bubble but in which the correction was less severe. This variation in outcome is also evident in European countries: Spain and Ireland experienced a bubble followed by a burst, but Germany did not (Hiebert & Sydow, 2011). The question then arises as to what is driving these departures from the fundamental relationship between rents and property prices. Some of these drivers include interest rates, levels of housing affordability, regulatory environment, taxation and tax relief, speculation, constraints on development, and demographics (Clark & Coggin, 2011; Mayer, 2011).
Another important use of the rent/price ratio is to establish a likely rental yield from owning and renting out a property. Kennett et al. (2013) and Whitehead and Williams (2011) chart UK trends in the private rental market from a period of slow decline until the 1990s, followed by a period of stability and then rapid growth, starting in the early years of the 21st century. The impact of the global financial crisis in 2007/2008 is seen to encourage the private rental market due to a "search for yield" in a global environment of low interest rates, which provides poor yields from monetary deposits and also makes borrowing to invest in property more attractive. However, it is argued that it is more the potential for capital gain than rental income that is of interest to investors (Kemp, 2015).
CLARK AND LOMAX | 137

3 | DATA AND METHODOLOGY
The rental and sales data for this study were primarily collected by Zoopla (2018), a large online property listing company, and processed by the data services company WhenFresh. The sales data were additionally supplemented with administrative data from the Land Registry (Her Majesty's Land Registry, 2018). The data cover the calendar years 2014 and 2015.
For all sales, the property type (flat, terrace, semi-detached, or detached), address, date of the sale transaction, and sale price are available. When the sale was also listed on the Zoopla web site, additional data concerning the property are available, including the number of bedrooms. For rental properties only information from Zoopla is available, which includes property type, date of listing, listed rental price, and number of bedrooms.
Prior to their use, some cleaning of these data takes place. Transactions before January 2014 and after December 2015 are removed, exact and temporally close (within seven days) duplicates are deleted, and outliers are removed. The removal of outliers in these types of commercial data is commonplace (Ambrose et al., 2015; Bracke, 2015; McCord et al., 2014) and here, rather than the top and bottom slicing of a fixed percentage of the distribution of asking rents or actual sale prices, a variation of the approach to identifying outliers in a box-plot is used. First the listings are segmented into sub-markets by property type, number of bedrooms, and the Acorn Category of the postcode for the property (CACI, 2017) (these categories will be introduced below). A lower limit for the rent or price is then set as 1.5 times the inter-quartile range below the lower quartile and the upper limit as 3.0 times the inter-quartile range above the upper quartile. The asymmetry in the multipliers is a recognition that these data are not symmetric, exhibiting a positive skew. Thus the identification of outliers is only made in the context of similar properties and no fixed percentage of outliers is forced to be removed. In practice this approach removes just 1.8% of rental listings and 1.3% of sales listings. In total, all these cleaning operations remove just over 10% of rental listings and only 2% of sales.
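The asymmetric box-plot rule described above can be sketched as follows. This is an illustrative Python sketch, not the processing code used in the study: the record field names (`property_type`, `bedrooms`, `acorn_category`, `price`) are hypothetical, and the quartile definition (the "inclusive" method of `statistics.quantiles`) is an assumption, since the text does not say which definition was used.

```python
from collections import defaultdict
from statistics import quantiles


def asymmetric_fences(values, lower_mult=1.5, upper_mult=3.0):
    """Return (lower, upper) limits: 1.5 IQR below Q1 and 3.0 IQR above Q3."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - lower_mult * iqr, q3 + upper_mult * iqr


def remove_outliers(listings, value_key="price"):
    """Drop outliers within each sub-market rather than across all listings."""
    groups = defaultdict(list)
    for row in listings:
        # Sub-market = property type x number of bedrooms x Acorn Category.
        key = (row["property_type"], row["bedrooms"], row["acorn_category"])
        groups[key].append(row)

    kept = []
    for rows in groups.values():
        if len(rows) < 2:
            # Too few listings to estimate quartiles; keep them unchanged.
            kept.extend(rows)
            continue
        low, high = asymmetric_fences([r[value_key] for r in rows])
        kept.extend(r for r in rows if low <= r[value_key] <= high)
    return kept
```

Because the limits are computed per sub-market, an expensive detached house is only ever compared with other detached houses of the same size in similar neighbourhoods, and no fixed share of listings is discarded.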
The methodology here adopts that of Bracke (2015), where the same property is identified in the sales data and the rental data and those properties where the rental occurs between one and eight months after the sale are retained. The same properties are identified using the full address and the postcode. For the sales data, no information on the size of property, e.g., number of bedrooms, is available where the data are sourced from the Land Registry, and this information is also sometimes missing from the Zoopla listing. Other studies have attempted similar exercises, including exact matching for sales and rentals (Bracke, 2015); exact repeat rentals (Ambrose et al., 2015); identifying similar sales and rental properties (Smith & Smith, 2006); and developing hedonic rental and sales price models to estimate, for the same property, a contemporary sales and rental value (Hill & Syed, 2011).
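A minimal Python sketch of the matching step is given below. It is an illustration under stated assumptions rather than the procedure actually used: the field names (`address`, `postcode`, `sold`, `listed`) and the address normalisation are hypothetical, and the one-to-eight-month window is approximated here as 30 to 240 days, a threshold that the text does not define in days.

```python
from datetime import date


def property_key(address: str, postcode: str):
    """Hypothetical normalisation: upper-case, collapse spaces in the address,
    and strip all spaces from the postcode."""
    return (" ".join(address.upper().split()), postcode.upper().replace(" ", ""))


def match_sales_to_rentals(sales, rentals, min_days=30, max_days=240):
    """Pair each rental listing with a sale of the same property when the
    listing follows the sale by min_days..max_days (assumed window)."""
    sales_by_key = {}
    for sale in sales:
        key = property_key(sale["address"], sale["postcode"])
        sales_by_key.setdefault(key, []).append(sale)

    matches = []
    for rental in rentals:
        key = property_key(rental["address"], rental["postcode"])
        for sale in sales_by_key.get(key, []):
            gap_days = (rental["listed"] - sale["sold"]).days
            if min_days <= gap_days <= max_days:
                matches.append({"sale": sale, "rental": rental, "gap_days": gap_days})
    return matches


# Hypothetical records: only the June listing falls inside the window.
sales = [{"address": "1 High  Street", "postcode": "LS1 1AA", "sold": date(2014, 3, 1)}]
rentals = [
    {"address": "1 high street", "postcode": "ls11aa", "listed": date(2014, 3, 10)},
    {"address": "1 High Street", "postcode": "LS1 1AA", "listed": date(2014, 6, 1)},
    {"address": "1 High Street", "postcode": "LS1 1AA", "listed": date(2015, 3, 1)},
]
matched = match_sales_to_rentals(sales, rentals)
```

Recording `gap_days` at match time is convenient because the gap between sale and rental listing is later used as a predictor in the regression.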
After the merging of the two datasets, the property attributes can be compared, with one piece of information from the sales data and one from the rental listing. Here there are some discrepancies. Table 1 gives the cross-tabulation of property type. The comparison is complicated by the absence of the bungalow category in the Land Registry sales data: bungalows can be detached or semi-detached (but not flats or terraced!). Also, sometimes an end terrace house may be listed as semi-detached. The illogical cross-classifications are shown in italics and shaded in dark grey in Table 1 and amount to just less than 5% of properties. Taking this information forward, the property type is taken from the more complete Land Registry data.
Note: Bold values are consistent from the two sources; light grey values are known from one source only; italic values in dark grey signify values that are inconsistent; and normal cells are values that could be consistent. These counts sum to 25,503; however, 600 of these properties do not have a valid postcode and therefore are not used in the summary statistics reported in Table 3 or the regression results reported in Table 4.
The situation is more complex for the number of bedrooms, as shown in Table 2. There is straightforward agreement along the main diagonal for 53% of properties plus a further 1.5% where neither source provides any information (bold and white cells). For 38% of properties, only one source has information (normal and light grey cells), leaving just 8% where there is active disagreement between the two sources (italic and dark grey). This disagreement could be due to renovation work carried out between the dates of the sale and the rental listing to either remove or add a bedroom to the property, but this reason is unlikely to explain all of these discrepancies. Taking this forward, where there is agreement, this information is used; where only one piece of information is known, this is used; and where they disagree, this is marked as a property with an active disagreement. Information on the number of days between the sale and the rental listing is recorded. Additionally, based on the property's postcode, the following information is attached to the property: its Acorn geodemographic profile; its score on three health indices (Daras et al., 2018); its distance north/south and east/west of Kensington Palace in West London; distance from the nearest railway/underground station (Department for Transport, 2017); and the Ofsted rating of the nearest primary and secondary schools (Baxter & Clarke, 2013).
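The rule for reconciling the number of bedrooms from the two sources can be written out explicitly. The sketch below is illustrative only; the function name and the status labels are hypothetical, with `None` standing for a missing value.

```python
from typing import Optional, Tuple


def reconcile_bedrooms(
    sale_beds: Optional[int], rental_beds: Optional[int]
) -> Tuple[Optional[int], str]:
    """Combine bedroom counts from the sales and rental records.

    Returns (bedrooms, status), where status is one of the hypothetical
    labels "agree", "sale_only", "rental_only", "unknown" or "disagree".
    """
    if sale_beds is None and rental_beds is None:
        return None, "unknown"          # neither source has information
    if sale_beds is None:
        return rental_beds, "rental_only"
    if rental_beds is None:
        return sale_beds, "sale_only"
    if sale_beds == rental_beds:
        return sale_beds, "agree"
    # Active disagreement: keep the property but flag it rather than
    # choosing one source over the other.
    return None, "disagree"
```

Keeping "unknown" and "disagree" as explicit categories, rather than dropping those properties, allows them to enter the regression as their own levels of the bedroom variable.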
From the 101,324 properties that match between sales and rental listings, those where the subsequent rental occurs between one and eight months after the sale are identified. When all this information is collected together, the summary statistics for the sales data, the rental data and the matched data are provided in Table 3. The rent/price ratio yield is calculated as the annual asking rent of the property divided by the sales price. Such a yield is best described as a reversionary gross yield, since the costs are those that reflect current market conditions and also do not reflect any of the costs associated with the transaction. Of those rental properties listed on the Zoopla rental site, 2.5% were sold within the previous one to eight months. The matched data have higher rents but lower sales prices, and thereby a higher rent/price ratio. There is more terraced housing in the matched data set and the properties are smaller, with a predominance of two bedrooms. Fewer of the matched properties are located in areas with the Affluent Achievers geodemographic classification and more are to be found in the Financially Stretched and Urban Adversity areas. Matched properties are located closer to London. The distributions of primary and secondary school Ofsted ratings are similar among all three data sets.

4 | REGRESSION RESULTS
Using the matched data set, a regression equation is used to try to understand the predictors of the rent/price ratio and to provide a model that predicts the rent/price ratio for each English postcode (see Table 4). Since the ratio is positively skewed and the mean and variance are not similar, a generalised linear model is fitted using the glm function in R (R Core Team, 2016) with a quasi-Poisson family (Fox, 2015). Detached or semi-detached houses have a lower ratio than terraced houses, while flats have a higher ratio. The more bedrooms that a property has, the higher the ratio. The longer the gap between the property being sold and listed on the rental market, the higher the ratio. As the affluence of the area decreases, the ratio increases. Living in an area with a healthy retail location and good access to health services decreases the rent/price ratio, while a healthy physical environment increases the ratio. The further from central London, in any direction, the higher the ratio. A greater distance to the nearest railway or underground station has a negative impact on the ratio but is not significant. In terms of primary schools, the ratio increases relative to the base of having a school rated as Outstanding by Ofsted close by, but only significantly so for Good and Requires Improvement schools. For secondary schools, the ratio [...]
TABLE 2 Cross-tabulation of number of bedrooms from sales and rental data
Note: Bold values are consistent from the two sources; light-grey values are known from one source only; and italic values in dark grey signify values that are inconsistent. These counts sum to 25,503; however, 600 of these properties do not have a valid postcode and therefore are not used in the summary statistics reported in Table 3 or the regression results reported in Table 4.
[...] Table 5. On the log scale, the average percentage errors are around 6% or 8%, but on the original scale they are larger at 16% or 26%.
Given the lack of comparative studies it is difficult to judge how well these statistics compare.
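For reference, the structure of a quasi-Poisson generalised linear model can be written out as below. This is a generic statement of the model class, assuming the log link that is the default for the quasi-Poisson family in R; the symbols ($y_i$ for the rent/price ratio of property $i$, $x_{ik}$ for its predictors, $\phi$ for the dispersion parameter) are introduced here for illustration, and the last line is the usual way of reading a log-link coefficient as a percentage impact, which is assumed rather than confirmed to be how the impacts in Table 4 are expressed.

```latex
\begin{align}
\log \mathbb{E}[y_i] &= \beta_0 + \sum_{k} \beta_k x_{ik}, \\
\operatorname{Var}(y_i) &= \phi \, \mathbb{E}[y_i], \\
\text{percentage impact of a unit change in } x_k &= 100 \left( e^{\beta_k} - 1 \right).
\end{align}
```

Under this form the effects of the predictors are multiplicative on the ratio, and the dispersion parameter $\phi$ relaxes the Poisson requirement that the variance equals the mean.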
To explore any potential issues with multi-collinearity between the variables, the correlation matrix from the model is visualised in Figure 1. Other than the natural correlation within categorical or interaction variables and correlations with the intercept, the highest correlation is 0.721, which is between the north/south and the east/west distances from London. Using the vif function in the R package car (Fox & Weisberg, 2011), the highest GVIF^(1/(2*df)) value is 4.87 for the distance from London interaction term, which does not indicate a severe problem with multi-collinearity (see O'Brien (2007) for a discussion on suitable rules of thumb). Using the glm model it is then possible to estimate the rent/price ratio for areas in England for a given property type. Looking at the terms in the regression, all are known at the geography of postcode, except the number of days between the sale and rental. To provide this information the median number of days between the sale and rental listing by property type and number of bedrooms is used (Table 6 provides a count of this distribution and Table 7 the median time gap, which for the majority of properties is just less than three months). The map of these ratios for a two-bedroom flat (the most commonly matched property type) is provided in Figure 2, with the ratio split into quintiles. The ratio is highest in a band across northern England, from Liverpool in the west, eastwards through Manchester and West Yorkshire and then running south through the East Midlands. It also appears to be lower in rural postcodes relative to the nearest major town or city, with rural properties commanding high sales prices, since they are often purchased as premium, retirement, or second properties, but being unable to command high rents, since local employment opportunities can be limited in rural postcodes.
The impact of distance from London is also seen to be mitigated somewhat, with some rural postcodes in the north having a ratio not dissimilar to that in the Home Counties around London.
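The quintile split used for the map can be illustrated with a short Python sketch. It is a hedged example only: the function name is hypothetical, the predicted ratios are invented values, and the cut-point definition (the "inclusive" method of `statistics.quantiles`) is an assumption, since the text does not state how the quintile boundaries were computed.

```python
from bisect import bisect_right
from statistics import quantiles


def quintile_labels(predicted_ratios):
    """Assign each predicted ratio to a quintile, 1 (lowest) to 5 (highest)."""
    # Four cut points divide the distribution into five equal-sized groups.
    cuts = quantiles(predicted_ratios, n=5, method="inclusive")
    # A value equal to a cut point is placed in the higher quintile.
    return [bisect_right(cuts, ratio) + 1 for ratio in predicted_ratios]


# Hypothetical predicted ratios for ten postcodes.
ratios = [0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12]
labels = quintile_labels(ratios)
```

Mapping quintiles rather than raw values places an equal number of postcodes in each colour class, so the map shows relative position within England rather than absolute differences in the ratio.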

5 | DISCUSSION
In this paper we determine the rent/price ratio for a heterogeneous mix of property types for every postcode in England. The information is derived from a mix of commercial (rental/sales) and administrative (sales) data sources. The work has been extended to a model that explains these ratios using a combination of information about the property, the affluence of the area, and the neighbourhood characteristics.
TABLE 6 Number of 1-8-month matched properties, by property type and number of bedrooms
A consideration of the results reported in Table 4 reveals some insight into the rent/price ratio. Relative to terraced houses, flats have a higher ratio, while detached and semi-detached properties have a lower ratio, a reflection of their sales price differentials relative to terraced properties. This finding shows that, all other things being equal, flats have the highest ratio, followed by terraced properties. The greater the number of bedrooms, the higher the ratio, with the premium for two bedrooms over one being just over 2%, but for five or more bedrooms it is much greater at more than 20%. This indicates that while larger properties sell for higher prices, their rarity on the rental market allows a premium to be incorporated in the expected rent and such a higher rent will increase the ratio. Where the number of bedrooms is either unknown or there is some disagreement, the estimate is somewhere between that for two and three bedrooms. A longer time between the sale and rental listing may reflect the fact that some renovation work is required to the property, which means that it was probably sold for a lower sales price, but this work would also lead to a higher expectation for the rent, and hence, as seen here, a higher ratio. Although the reported percentage impact per day is low, over a gap of 100 days (not untypical, see Table 7), this multiplies out to a 3% increase in the ratio. A general finding in other studies is that wealth or affluence tends to produce a lower rent/price ratio (Bracke, 2015) and this is reflected in the results reported for the Acorn geodemographic categories: as affluence decreases, the ratio increases. This is where the highest impact values are seen, with a near 50% increase in the ratio for challenged areas of Urban Adversity.
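The way a small per-day impact accumulates over a 100-day gap can be illustrated with a short calculation. The sketch below assumes that the daily effect compounds multiplicatively, as it would under a log-link model; the per-day figure it produces is back-calculated from the 3% quoted above and is not a value taken from Table 4.

```python
import math


def total_impact_from_daily(daily_impact: float, days: int) -> float:
    """Proportional change in the ratio after `days` of compounding."""
    return (1.0 + daily_impact) ** days - 1.0


def daily_impact_from_total(total_impact: float, days: int) -> float:
    """Per-day proportional change implied by a total change over `days`."""
    return math.expm1(math.log1p(total_impact) / days)


# Back-calculated, illustrative value: roughly 0.03% per day gives 3% over 100 days.
daily = daily_impact_from_total(0.03, 100)
total = total_impact_from_daily(daily, 100)
```

For impacts this small, compounding and simple multiplication (100 times the daily impact) give nearly the same answer, which is why the effect can be described as multiplying out.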
Living in a neighbourhood with a healthy retail environment (away from fast food restaurants, tobacconists and gambling) and access to health services reduces the ratio but conversely a neighbourhood with good physical environmental health (in terms of lower pollution levels and access to green space) increases the ratio. These scores are measured on a scale of 0-100, so the scope for large changes and hence large percentage impacts on the ratio is limited. The variable that explicitly captures the spatial aspects of the ratio is the distance from West London, differentiated by both a north/south axis and an east/west axis plus an interaction term. The further from London (in log terms), the higher the ratio, which is a reflection of high property prices in West London that may struggle to be matched with similarly high rents. The effect of a 10% increase in distance from London north or south is much more pronounced (over three times greater) than that of a similar increase east or west of London. For primary and secondary schools, the neighbourhoods in the catchment of schools that are not Outstanding have higher ratios, another reflection of the ratio being higher in less affluent neighbourhoods, with properties in such neighbourhoods being relatively cheap to buy but still able to command a reasonable rent. The impact of the secondary school's rating is much higher than that of the primary school.
Many of these findings, using a diverse range of attributes, confirm the hypothesis that deprivation tends to increase the rent/price ratio. These interpretations also show that aspects that increase the sales price of a property (e.g., close to amenities and good schools) do not necessarily increase the rental value of a property, thereby enabling a significant variation in the ratio to be attributed to these aspects.
This study is also revealing in terms of the utility of the rent/price ratio to indicate a potential housing bubble. We have identified that there exists considerable variation in this ratio and attributed it to a diverse range of influences. A low ratio indicates that properties are expensive relative to their potential rental yield. A low value of around 0.04 is not uncommon in central London, but if observed elsewhere, say where the typical value is around 0.05 or 0.06, then that would indicate that the property market in that location may be experiencing a bubble. This allows for regional bubbles in the housing market to be identified.
From an investor perspective, this model suggests that the type of property with the highest rent/price ratio is a renovated flat with a large number of bedrooms, in a less affluent area, at some distance from London. An investor with £10 million to invest and looking to maximise their gross rental yield would, rather than investing in a couple of properties in West London, be better off investing in hundreds of properties in the less affluent areas of the Midlands and North. The map in Figure 2 corroborates this, with the highest ratios and hence potential yields in areas of the Midlands and northern England. Also capital appreciation is not guaranteed to offset this lower yield for London properties, with Land Registry data showing London to be the only region of England to show a decline (from 119 to 117 points) in its house price index in the 10-month period to February 2018 (Land Registry, 2018).
TABLE 7 Number of days between sales and rental listing, by property type and number of bedrooms
The equivalent Bracke (2015) study is more geographically limited than the work reported here. It used data only for West London, which was primarily composed of high-value flats, making the results difficult to generalise to the whole of the UK. What Bracke (2015) was able to do was incorporate information on time between repeat rentals, rent appreciation, and rent volatility into the model, taking advantage of having a longer time series from 2006 to 2012. These latter terms improved the model fit considerably. In the data used for this study, the opportunities for repeat rents are limited: only 429 properties are listed for rent on two or more occasions after their sale. However, these repeat terms, which are very property-specific, would make any model difficult to generalise to the postcode geography.
The work reported here could be extended by incorporating either more historic information from before 2014 or, more readily, data after 2015. Since the processing of these data is largely automated (in regard to the use of other data sources and the modest cleaning of the data), this extension would be trivial in a data sense. However, it is the data acquisition that is a challenge, with legal and procedural negotiations with data providers being necessary, particularly for the rental data, which are not readily available from other sources. If a longer time span of data were available, it would then be possible to split the data into time segments and use that analysis to gain an understanding of the short-term trends in the rent/price ratio, which would be of value to policy analysts and econometricians. Another extension is to repeat the analysis for other housing markets, both in the UK and Europe, or anywhere that has access to the volume and variety of data used here (e.g., Zillow in the USA (Zillow, 2018); RealEstate.com in Australia (Realestate.com.au, 2018); and Funda in the Netherlands (Funda, 2018)).