Volatility in the Housing Market: Evidence on Risk and Return in the London Sub-market

The impact of volatility in housing market analysis is reconsidered via examination of the risk-return relationship in the London housing market. In addition to providing the first empirical results for the relationship between risk (as measured by volatility) and returns for this submarket, the analysis offers a more general message to empiricists via a detailed and explicit evaluation of the impact of empirical design decisions upon inferences. In particular, the negative risk-return relationship discussed frequently in the housing market literature is examined and shown to depend upon typically overlooked decisions concerning components of the empirical framework from which statistical inferences are drawn.


Introduction
The importance of the housing market to the wider economy has been well documented in empirical research, with a number of studies noting, inter alia, its substantive contribution to private sector wealth, dominance over the stock market in determining household consumption decisions, central role within the macroeconomy and close relationship with economic fundamentals (Brueckner, 1997; Holly and Jones, 1997; Gallin, 2006; Goetzmann, 1993; Goodhart and Hoffman, 2007; Bayer et al., 2010; Costello et al., 2011; Case et al., 2013; Han, 2013). As a consequence, the behaviour of housing markets and the properties of house prices have received much attention. In recent years, one element of this research has considered whether housing displays the risk-return characteristics predicted for other financial assets by standard finance theory. A feature of this literature is the repeated discussion of the existence of a counterintuitive negative risk-return relationship within housing markets (Dolde and Tirtiroglu, 1997; Morley and Thomas, 2011; Han, 2013; Lin and Fuerst, 2014). Clearly this runs contrary to the positive risk-return relationship depicted by theoretical finance, where higher risk is compensated by higher returns.
The contribution of the present paper to the existing literature is twofold. First, previous research is extended by producing the first findings on this issue for the highly topical London housing submarket. Second, an explicit investigation is provided of the extent to which decisions regarding empirical design impact upon inferences concerning risk and return in housing markets. More precisely, the present research considers the influence of the components of empirical design upon the subsequent inferences drawn by investigators when examining the relationship between risk, as measured by volatility, and returns. Consequently, it is examined how the significant negative relationship which has featured so prominently in the literature is dependent upon stances taken with regard to decisions on variable definition, sample selection, optimisation methods, dynamic specification, regional disaggregation and modelling techniques. Interestingly, the results of the current analysis show that while negative risk-return relationships are observed for an empirical design with very specific options selected for the sample, modelling technique and approach to dynamic specification employed, a more thorough analysis produces mixed findings.
To achieve its objectives, this paper will proceed as follows. In Section 2 a selected review of the literature on the analysis of volatility and risk in housing markets is presented. The various components of the empirical design employed to examine risk and return in the present analysis are provided in Section 3, with the empirical results from this analysis presented in Section 4. Section 5 provides some concluding remarks.

Literature Review
In this section a selected review of the literature in relation to the analysis of volatility and risk in housing markets is provided. Considering these two issues in turn, house price volatility has received much attention in the empirical literature for a number of years, as illustrated by Foster and Van Order (1984), Crawford and Rosenblatt (1995), Dolde and Tirtiroglu (1997), Crawford and Fratantoni (2003), Miller and Peng (2006), Miles (2008, 2011), Miller and Pandher (2008), Morley and Thomas (2011) and Barros et al. (2015). However, while a wealth of empirical studies have emerged examining volatility, the development of theoretical explanations for its presence has received less attention, with the inertia-based explanation of Case and Shiller (1988, 1989, 1990) and Wheaton's (2015) decomposition of volatility into demand- and supply-side factors being notable exceptions to this. Inspection of the empirical literature examining volatility shows the use of the autoregressive conditional heteroskedasticity (ARCH) model and its various extensions to feature prominently. Interestingly, it can be seen that their use has produced conflicting results. As an illustration of this, while Dolde and Tirtiroglu (1997), Crawford and Fratantoni (2003) and Miles (2008) present evidence of volatility in house prices for the USA, the results of Miller and Peng (2006) are less supportive. Similarly, while the results of Miles (2011) suggest an element of volatility in UK house prices, with just over half of the regions considered providing evidence of significant volatility, the evidence provided by Lin and Fuerst (2014) is more compelling for Canada.
With regard to 'risk', a number of differing perspectives have been adopted in the literature to consider both its presence and its impact. Portfolio risk management provides the motivation for both Huang et al. (2016) and Zhou and Gao (2012). While Huang et al. (2016) consider the management of risk via diversification in real estate investment trust (REIT)-housing and stocks-housing portfolios, Zhou and Gao (2012) employ copula-based methods to explore risk management in real estate securities. A further alternative consideration of the role of risk in housing markets is provided by Tsang et al. (2016), where stochastic dominance analysis is employed to examine the impact of risk on housing purchase decisions in the Hong Kong market. Additional research more closely related to the present analysis is provided by Domian et al. (2015), where two specifications of CAPM are employed to consider the risk-return relationship across Metropolitan Statistical Areas (MSAs) in the USA using the Case-Shiller house price index. Extending the analysis to include leverage and liquidity risks, Domian et al. (2015) provide evidence of geographical variation across MSAs, with counter-cyclical behaviour noted in some areas such as cities in California contrasting with the higher levels of risk noted in, for example, New York. However, the literature on the risk-return relationship in housing is dominated by the use of ARCH-based models, which are employed in the current analysis. Specific examples of this include Dolde and Tirtiroglu (1997), Morley and Thomas (2011, 2016), Lin and Fuerst (2014) and Lee (2017). As noted by Han (2013) and in the studies above, a negative relationship between risk and returns has been noted in this literature. This variation in the sign of the risk-return relationship and its possible dependence on the nature of the approach adopted towards modelling provides the motivation for the current study. To explore this issue and provide an extension to the existing literature, the current analysis explicitly considers the impact of variations in the empirical design of the econometric framework upon the results obtained with regard to the risk-return relationship.

Risk-return and the Housing Market
In this section the structure, or components, of the empirical analysis are outlined. The material is structured via consideration of the decisions made concerning variable definition, sample selection, modelling techniques and dynamic specification when undertaking an analysis of risk and return. Typically, these decisions and assumptions are implicit (or unrecognised) when performing empirical analysis. In contrast, the present research makes explicit the alternative options available to investigators in the process of moving from an initial hypothesis of interest motivating empirical analysis to the testable framework from which inferences are drawn.

Data
To consider the nature of risk and return within housing markets, an obvious issue to consider concerns the actual series to be examined. In the present analysis, the London housing market is considered. The London housing market is a highly topical example receiving frequent attention within both academia and the media, and hence an attractive choice. The specific series considered herein are those for the 32 boroughs of London plus the aggregate London series. The house price data considered are monthly, seasonally adjusted observations on average house prices over the period January 1995 to December 2015.* While the London submarket has been examined recently by Abbott and De Vita (2012), the data set employed in the present analysis differs from that employed in their examination of housing market convergence. First, the aggregate London price series considered here was not considered by Abbott and De Vita (2012). Second, the 32 boroughs of London are considered herein without the addition of the City of London local authority district.† Third, in a further departure from the previous study, the house price series are considered in nominal terms as well as in real (inflation adjusted) terms following deflation using the consumer price index. Consideration of both nominal and real data recognises the common use of real data in analysis of risk and return, along with the argument within the economics literature that decision making is often undertaken by consideration of nominal values (Shafir et al., 1997).‡ Fourth, the sample considered differs from that employed by Abbott and De Vita (2012) as it is observed at a monthly, rather than quarterly, frequency, with this higher frequency allowing an improved analysis of volatility. Finally, seasonally adjusted, rather than unadjusted, observations are considered for a sample which starts a year earlier and finishes over 6 years later than that of Abbott and De Vita (2012). As a result, the current analysis considers a sample of 252 observations compared to 54 in Abbott and De Vita (2012). Therefore, despite initially appearing similar, the data in the current analysis differ markedly from those in Abbott and De Vita (2012) due to sample span, number of observations, frequency, seasonal adjustment, aggregation and the nominal/real measurement dichotomy. Importantly, the presence of these options illustrates the relevance of the issue of empirical design discussed later in this paper.
Denoting the natural logarithms of the house price series as $p_t$, house price returns are calculated as their difference $r_t = \Delta p_t$.§ The use of changes in house prices to measure returns to housing is an issue that warrants some discussion.¶ This issue can be illustrated via consideration of the work of Bayer et al. (2010), where the return to housing is given as a combination of house price changes plus rental income. However, while this has a clear justification, the literature is dominated by studies measuring returns as house price changes without the inclusion of rental income. Examples of studies adopting this approach include the works of Dolde and Tirtiroglu (2002), Miller and Peng (2006) and Morley and Thomas (2011, 2016). In light of the prevalence of the use of house price changes as a measure of returns, this approach will be adopted in the present study to allow consideration of the returns to purely holding housing as an asset or, alternatively expressed, the returns to owner occupation.
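As a minimal sketch of the return calculation $r_t = \Delta p_t$ described above, the following computes log-differenced returns from a sequence of price levels. The function name `log_returns` and the example prices are hypothetical and are not taken from the data set used in the paper.

```python
import math


def log_returns(prices):
    """Return r_t = ln(P_t) - ln(P_{t-1}) for a sequence of positive price levels."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    log_p = [math.log(p) for p in prices]
    # Differencing loses the first observation: n prices give n - 1 returns.
    return [later - earlier for earlier, later in zip(log_p, log_p[1:])]


# Hypothetical average prices for four consecutive months (illustrative only).
example_prices = [100_000.0, 101_000.0, 100_500.0, 102_000.0]
example_returns = log_returns(example_prices)
```

Because differencing removes the first observation, a price series of 252 monthly observations yields an effective sample of 251 returns.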
The use of house price data for London in both disaggregated and aggregate form and in both nominal and real terms reflects clear decisions in the design of the empirical analysis undertaken. These variable-related assumptions or decisions are denoted here as $V_i$. Similarly, options available concerning decisions on the sample employed can be denoted as $S_i$. Within the present analysis, the full or maximum sample available is considered as a starting point for the empirical analysis. However, to explore the sample dependence of inferences drawn, rolling samples are considered also. To include a relatively large number of observations in each of the rolling samples examined, the 251 observations of the full effective sample are employed to create an additional 72 samples containing 180 observations.** Via the use of these rolling samples, robustness of results across alternative sampling periods is explored.
† Abbott and De Vita (2012) include the City of London in their analysis. While this is a local authority district within Greater London, it is very different in nature to the boroughs of London. In addition, data for the City of London are not available for the preferred frequency considered herein.
‡ The CPI series was obtained from the Office for National Statistics (https://www.ons.gov.uk/).
§ Examination of the order of integration of the returns series (both nominal and real) using the Im et al. (2001) panel unit root test resulted in rejection of the null. Hence the returns series are all treated as stationary processes. Further details are available from the authors upon request.
¶ We are grateful to the editor and an anonymous referee for raising the issue of measuring returns. As noted, the measurement of returns has been extended by Bayer et al. (2010) to include rental income. Interestingly, while this has a positive impact upon the level of returns, factors such as renovation costs and mortgage payments which have a negative impact are not considered.

Quantitative Finance and Economics, Volume 1, Issue 3, 272-287

Alternative modelling techniques
The prediction of a positive relationship between the returns on an asset and its associated risk is a standard feature of financial theory. However, as noted by Han (2013), a negative relationship between risk and returns has been noted repeatedly in the housing market literature (Dolde and Tirtiroglu, 1997; Morley and Thomas, 2011; Lin and Fuerst, 2014). When considering the examination of risk-return relationships, an obvious and typically employed model to utilise is the GARCH-M specification (Engle et al., 1987). With returns on house prices denoted as $r_t$, a standard GARCH(1,1)-M model comprises a mean equation (1) and a conditional variance equation (2). Application of (1)-(2) allows examination of the risk-return relationship via the coefficient $\delta$ attached to the conditional standard deviation term ($\sigma_t$) in the mean equation. Although this specification is commonly applied in the literature, as has been noted by Scruggs (1998), variations on this model can be considered in terms of the specification of both the mean and variance equations. With regard to the variance equation, an obvious alternative to consider is the exponential GARCH (EGARCH) model of Nelson (1991). In addition to allowing a broader coverage of behaviour to capture potential asymmetric responses, the EGARCH model is attractive to practitioners as it does not require consideration of the non-negativity constraints associated with the $\{\phi_1, \phi_2\}$ parameters in the GARCH model of (2) above. The benefits of the EGARCH specification are apparent in its widespread adoption in the literature, with Miller and Peng (2006), Lee (2009), Miles (2011), Morley and Thomas (2011, 2016) and Lin and Fuerst (2014) all employing this model. Extension of the GARCH-M model of (1)-(2) to consider an analogous EGARCH(1,1)-M specification results in the model of (3)-(4).
To allow for the heavy, or thick, tails observed in financial series, the above models are estimated using the generalised error distribution (GED) for the error $u_t$. In addition to allowing for heavy-tailed errors, use of the GED affords flexibility in the analysis conducted via unrestricted estimation of its underlying shape parameter ($\nu$) for every model considered. Hence the degree of thickness of the tails of the error distribution can be tailored to the series, with movement from the Normal distribution ($\nu = 2$) to heavier tailed processes ($\nu < 2$) arising due to the estimation of smaller values for the shape parameter. Following the above discussion, the options available for these model-related components of the empirical design are denoted as $M_i$.
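For reference, a textbook form of the two specifications, written to be consistent with the symbols named in the text ($r_t$, $\delta$, $\sigma_t$, $u_t$, $\phi_1$, $\phi_2$, $\nu$), is sketched here. The intercept symbols ($\mu$, $\phi_0$), the allocation of $\phi_1$ to the ARCH term and $\phi_2$ to the GARCH term, and the EGARCH parameter names ($\gamma_0, \ldots, \gamma_3$) are assumptions, and the EGARCH variance equation is shown in one common parameterisation that may differ in detail from the one estimated.

```latex
\begin{align}
r_t &= \mu + \delta\,\sigma_t + u_t \tag{1}\\
\sigma_t^2 &= \phi_0 + \phi_1 u_{t-1}^2 + \phi_2\,\sigma_{t-1}^2 \tag{2}\\[6pt]
r_t &= \mu + \delta\,\sigma_t + u_t \tag{3}\\
\ln \sigma_t^2 &= \gamma_0
  + \gamma_1 \left|\frac{u_{t-1}}{\sigma_{t-1}}\right|
  + \gamma_2\,\frac{u_{t-1}}{\sigma_{t-1}}
  + \gamma_3 \ln \sigma_{t-1}^2 \tag{4}
\end{align}

% Kurtosis of the GED as a function of its shape parameter \nu:
\[
\kappa(\nu) = \frac{\Gamma(1/\nu)\,\Gamma(5/\nu)}{\Gamma(3/\nu)^{2}},
\qquad \kappa(2) = 3, \qquad \kappa(1) = 6 .
\]
```

In this form the non-negativity point is visible directly: (2) models $\sigma_t^2$ in levels, so its parameters must be restricted to keep the variance positive, whereas (4) models $\ln \sigma_t^2$ and is positive for any parameter values. The kurtosis expression illustrates the role of the shape parameter, returning the Normal value of 3 at $\nu = 2$ and larger values as $\nu$ falls below 2.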

Dynamic specification
With regard to the specification of the mean equations in (1) and (3), these static expressions can be expanded to include a dynamic structure as noted by Scruggs (1998). Inclusion of an autoregressive component of order $p$ results in the GARCH(1,1)-M specification of (5)-(6) and the EGARCH(1,1)-M model of (7)-(8). Estimated risk-return coefficients $\delta$ can then be drawn from models employing alternative lag structures, or values of $p$. Investigators could consider the risk-return coefficients obtained from the static models discussed above ($\delta_0$) or the maximum and minimum values ($\delta_{max}$, $\delta_{min}$) observed to be statistically significant across a range of values of $p$. Alternatively, a more typical approach is to consider coefficients obtained from models employing an optimised value of $p$, with the Akaike Information Criterion (AIC) frequently utilised as the means of selecting the optimum lag length. These coefficients are denoted herein as $\delta_{AIC}$, with the optimum lag length defined as that generating the minimum value of the AIC, and with the AIC calculated from the log-likelihood and the sample size ($T$) of the estimated model. In light of the monthly frequency of the data considered in the present analysis, lags from a maximum value of $p = 12$ down to a minimum of no lags ($p = 0$) are considered herein. Continuing the above notation, these options or alternative possibilities concerning dynamic specification can be denoted as $D_i$.
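The lag selection step described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the dynamic mean equation is taken to have the usual form $r_t = \mu + \sum_{i=1}^{p} \theta_i r_{t-i} + \delta \sigma_t + u_t$ (the symbols $\mu$ and $\theta_i$ are assumed), the AIC is taken in its per-observation form $-2\ell/T + 2k/T$ with $k$ the number of estimated parameters, and the log-likelihood values and parameter counts in the example are hypothetical.

```python
def aic(log_likelihood, n_params, n_obs):
    """Per-observation AIC, -2*l/T + 2*k/T (an assumed form of the criterion)."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return (-2.0 * log_likelihood + 2.0 * n_params) / n_obs


def select_lag_by_aic(fits, n_obs):
    """Pick the lag length with the smallest AIC.

    fits maps each candidate lag p to a (log_likelihood, n_params) pair taken
    from an already estimated model. Ties are broken in favour of the shorter lag.
    Returns the chosen lag and the AIC value for every candidate.
    """
    if not fits:
        raise ValueError("at least one fitted model is required")
    scores = {p: aic(ll, k, n_obs) for p, (ll, k) in fits.items()}
    best_p = min(scores, key=lambda p: (scores[p], p))
    return best_p, scores


# Hypothetical results for p = 0..3; each extra lag adds one parameter.
hypothetical_fits = {0: (600.0, 7), 1: (604.5, 8), 2: (605.0, 9), 3: (605.2, 10)}
best_lag, aic_by_lag = select_lag_by_aic(hypothetical_fits, n_obs=251)
```

In this made-up example the gain in log-likelihood from the first lag outweighs the penalty for the extra parameter, while further lags do not, so the criterion selects $p = 1$. The coefficient $\delta$ from the model at the selected lag is what the text labels $\delta_{AIC}$.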

Empirical design components
When considering the risk-return relationship via the use of (E)GARCH-M models, the parameter of interest is the risk coefficient in the relevant mean equation. The two issues of importance concerning this coefficient are its significance, so as to determine whether a significant relationship exists, and, if so, whether it is positive, as predicted by theory, or negative. However, as the above discussion in this section has made explicit, a series of decisions are required to structure the subsequent empirical analysis to permit testing of this hypothesis and inferences to be drawn. This movement from the initial focus upon the risk-return coefficient through to inference upon its nature is summarised in Figure One. As this illustration depicts, the hypothesis of interest ($H$) providing the motivation for empirical analysis is surrounded by assumptions and decisions relating to variable selection and definition ($V_i$), sample selection ($S_i$), the models employed ($M_i$) and dynamic specification ($D_i$). It is argued that as a result of being embedded within these surrounding assumptions/options, any resulting inferences will be dependent upon not just the truth or falsity of the underlying hypothesis of interest, but also the impact of the decisions relating to $\{V_i, S_i, M_i, D_i\}$. Recognition of this 'jointness of testing' as a result of moving from an initial hypothesis to a composite testable form is present in the philosophy of science, particularly in relation to the Duhem-Quine thesis and the (im)plausibility of 'crucial experiments' allowing the evaluation of hypotheses of interest in isolation from surrounding factors.†† This issue has been considered in the economics literature also, where Cross (1982) provides a theoretical analysis and discussion of the auxiliary assumptions associated with the empirical examination of monetarism. In the present study this form of analysis is extended to provide an empirical evaluation of the impact of variation in empirical design in relation to a very topical issue. As a result of this detailed consideration, the robustness of risk-return inferences is explored rather than a specific, single analysis being considered.
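To give a sense of how many distinct specifications sit behind a single inference on $\delta$, the design components $\{V_i, S_i, M_i, D_i\}$ can be enumerated as a grid. The sketch below uses the counts stated in the text (33 series, nominal and real returns, two models, lags $p = 0, \ldots, 12$, and 72 rolling samples); the function name `design_grid`, the dictionary layout and the assumption that all 33 series enter the rolling analysis are illustrative choices, not details from the paper.

```python
from itertools import product

MEASURES = ("nominal", "real")      # V_i: measurement of returns
SERIES = tuple(range(33))           # V_i: 32 boroughs plus the aggregate London series
MODELS = ("GARCH-M", "EGARCH-M")    # M_i: modelling technique
LAGS = tuple(range(13))             # D_i: lag lengths p = 0, 1, ..., 12


def design_grid(n_samples=1, models=MODELS):
    """Yield one dictionary per combination of the design components."""
    for series, measure, sample, model, lag in product(
        SERIES, MEASURES, range(n_samples), models, LAGS
    ):
        yield {"V": (series, measure), "S": sample, "M": model, "D": lag}


# Full-sample analysis: a single sample, both models.
full_sample_count = sum(1 for _ in design_grid())

# Rolling analysis: 72 samples, EGARCH-M only (assuming all 33 series are used).
rolling_count = sum(1 for _ in design_grid(n_samples=72, models=("EGARCH-M",)))
```

Under these assumptions the full-sample exercise alone involves 1,716 estimated models, which is why a conclusion drawn from any single cell of the grid is a joint statement about the hypothesis and the design choices that define that cell.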

Results
The results obtained from estimation of the GARCH(1,1)-M and EGARCH(1,1)-M models over the full sample are presented in Tables One and Two. As a result of the use of two models, alternative options for $M_i$ are considered. Similarly, the use of both nominal and real returns for 33 regions (32 boroughs plus the aggregate London series) provides results for alternative $V_i$. Further to this, the tabulated results present a variety of risk-return coefficients obtained from alternative approaches to dynamic specification ($D_i$). In particular, statistically significant (at the 5% level) estimated risk-return coefficients from static models ($\delta_0$), obtained from AIC optimisation of the lag length ($\delta_{AIC}$), and the maximum and minimum values obtained across alternative lag specifications ($\delta_{max}$, $\delta_{min}$) are provided to evaluate the impact of variations in $D_i$. While Tables One and Two allow examination of the impact of variation in $\{V_i, M_i, D_i\}$, the results are generated using the full sample available and hence reflect use of a single value for $S_i$. In recognition of this, further results are presented in Tables Three and Four, where rolling samples are employed to consider the impact of variations in $S_i$. These full sample and rolling sample results are considered in turn below.
†† See Harding (1976) for a very readable collection of essays concerning the Duhem-Quine thesis.

Full sample results
Turning to the results for the GARCH(1,1)-M model in Tables One and Two, there is evidence of a significant, negative risk-return relationship for a number of series examined. More precisely, there are 9 (13) series for which the static model produces a significant coefficient for nominal (real) returns. In contrast, results obtained from use of AIC optimisation depict a single significant risk-return coefficient for both the nominal and real series. Hence, alternative decisions on both $V_i$ and $D_i$ influence the extent to which significant results are obtained. In terms of $V_i$, real returns produce a greater number of significant results than nominal returns, and (in)significance varies widely across regions. Similarly, the influence of decisions concerning $D_i$ is apparent via the difference in findings for coefficients from static and AIC optimised models. However, the extent of the detection of significant findings is relatively low given that 33 series are examined, as is reflected in the small number of instances in which significant (at the 5% level) $\delta_{max}$ and $\delta_{min}$ coefficients are observed across the full set of lag lengths considered.
Considering the results for the EGARCH(1,1)-M model, a vast increase in significant results for the static model is apparent, with nominal and real returns producing 29 and 27 significant negative coefficients respectively. In addition, nominal and real returns generate 2 and 4 significant positive coefficients respectively for the static model. Consequently, variation in $M_i$ via the movement from GARCH-M to EGARCH-M has a substantial impact upon the detection of significant risk-return relationships and leads also to the introduction of positive coefficient estimates to the analysis. Turning to the results for the AIC optimised coefficients, the nominal (real) series produce 4 (6) positive coefficients and 5 (9) negative coefficients. This represents a far more balanced outcome than that observed for the static model. The issue of the sign of the risk-return coefficient is more apparent when considering $\delta_{max}$ and $\delta_{min}$ for the EGARCH(1,1)-M model, where only one series fails to produce significant coefficients when considering nominal and real returns. Interestingly, these latter results show that, with regard to the sign of the risk-return coefficient, 31 (33) series produce negative coefficients for the nominal (real) series, while 14 (17) series produce positive coefficients. This range of outcomes for the EGARCH model compared to the GARCH specification, along with the variation in results observed under use of AIC optimisation and static models, illustrates clearly the impact of $M_i$ and $D_i$ upon inferences. While the use of nominal or real returns has an influence upon the results derived, with a greater number of significant findings observed for the latter, the impact of $V_i$ upon inferences is most apparent in terms of the effects of disaggregation, with widespread variation observed for the alternative series examined. In summary, these findings display a variation in sign and significance which indicates that the counterintuitive negative risk-return relationship depicted by static EGARCH-M models is far less prevalent when alternative decisions are made concerning dynamic specification and modelling technique within the empirical design.

Rolling sample results
To explore the impact of the selected sample period upon inferences and whether the conclusions drawn from Tables One and Two are dependent upon the use of the specific full sample available, the EGARCH(1,1)-M model is estimated over 72 rolling samples of 180 observations. The results obtained from this analysis are presented in Tables Three and Four. For each of the series considered, the maximum and minimum statistically significant values for $\delta_{AIC}$ across these 72 samples are reported. This provides information on the variation in, or range of, results arising from consideration of different samples. Further information on this variation is provided by $C_{AIC}$, which presents the percentage of samples for which statistically significant values of $\delta_{AIC}$ are observed (under the heading 'total') and their sign (under the headings 'positive' and 'negative'). To illustrate these results, consider the findings in Table Three for the first series (Barking). It can be seen that significant values of $\delta_{AIC}$ from 0.65 down to −6.13 are observed across the 72 samples, and that while 19% of samples return significant positive $\delta_{AIC}$ values and 3% return significant negative values, $\delta_{AIC}$ is insignificant for the remaining 78% of samples. As $\delta_{AIC}$ denotes the optimised value of $\delta$ according to the AIC across 13 alternative lag lengths ($p = 0, 1, \ldots, 12$), further significant values of $\delta$ are potentially available within each sample across the 12 other (non-optimal) lag lengths considered. In recognition of this, $C_{ALL}$ is reported to provide information on the percentage of samples producing significant positive and negative values of $\delta$.
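The construction of the rolling samples and the $C_{AIC}$ summary described above can be sketched as follows. The function names and the example inputs are hypothetical: the paper does not report the underlying counts, so the 14 positive and 2 negative significant estimates used here were simply chosen so that the rounded percentages reproduce the pattern quoted for Barking, and the estimation step that would produce each coefficient and p-value is not shown.

```python
def rolling_windows(n_obs, window):
    """Return (start, stop) index pairs for every fixed-length rolling sample."""
    if window <= 0 or window > n_obs:
        raise ValueError("window must be between 1 and n_obs")
    return [(start, start + window) for start in range(n_obs - window + 1)]


def summarise_coefficients(estimates, p_values, level=0.05):
    """Summarise significant coefficients across samples.

    Returns the largest and smallest significant estimates (None if there are
    none) and the percentage of samples with significant positive, significant
    negative and significant (total) coefficients.
    """
    if not estimates or len(estimates) != len(p_values):
        raise ValueError("estimates and p_values must be non-empty and equal in length")
    significant = [d for d, p in zip(estimates, p_values) if p < level]
    n_samples = len(estimates)
    n_positive = sum(1 for d in significant if d > 0)
    n_negative = sum(1 for d in significant if d < 0)
    return {
        "max": max(significant) if significant else None,
        "min": min(significant) if significant else None,
        "pct_positive": 100.0 * n_positive / n_samples,
        "pct_negative": 100.0 * n_negative / n_samples,
        "pct_total": 100.0 * len(significant) / n_samples,
    }


# 251 returns and a window of 180 observations give 72 rolling samples.
windows = rolling_windows(n_obs=251, window=180)

# Hypothetical estimates for one series across the 72 samples (illustrative only).
hypothetical_delta = [0.65] * 14 + [-6.13] * 2 + [0.10] * 56
hypothetical_pvals = [0.01] * 16 + [0.50] * 56
summary = summarise_coefficients(hypothetical_delta, hypothetical_pvals)
```

The same summary applied to coefficients from every lag length, rather than only the AIC-selected one, would correspond to the $C_{ALL}$ columns, where a sample counts towards a sign if any of its lag lengths returns a significant coefficient of that sign.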
Considering the results in Tables Three and Four, 28 (31) regions produce negative values of $\delta_{AIC}$ for nominal (real) returns, while 25 (30) regions produce positive values. These figures are clearly very similar in terms of the extent of negative/positive values. With regard to the percentages of samples for each series where $\delta_{AIC}$ was negative/positive, this is again balanced, with an average of 14% (23%) of samples returning negative values and 18% (23%) producing positive values for nominal (real) returns. Therefore, the extension of the analysis to consider alternative sample periods has resulted in both increased detection of significant risk-return relationships using the AIC and evidence of its variability, with a positive relationship being more frequent than a negative relationship. The results for $C_{ALL}$, where the percentage of significant values of $\delta$ are reported across all lag lengths considered over the 72 samples, show that nearly all samples return significant negative coefficients, with the percentages being 95% and 96% respectively for nominal and real returns. In contrast, on average less than half the samples return significant positive coefficients, with the average percentage of samples being 42% for both nominal and real returns. Therefore, while consideration of all available significant risk-return coefficients results in a prevalence of counterintuitive negative values, inspection of coefficients obtained from optimisation of the lag length leads to a relatively balanced finding in terms of negative and positive values.
Notes: The first two columns of figures provide maximum and minimum values of $\delta_{AIC}$ significant at the 5% level over 72 rolling samples. $C_{AIC}$ is the percentage of samples generating significant $\delta_{AIC}$ values, with the percentage of negative and positive values along with the total percentage reported. $C_{ALL}$ provides the percentage of samples generating significant $\delta$ values, with the percentages for both positive and negative coefficients provided.

The above findings illustrate the variation in results concerning risk and return in the housing market due to alternative decisions regarding the use of models, variables, dynamic specification and samples. These findings have illustrated the empirical relevance of the jointness of hypothesis testing as depicted in Figure One. More specifically, the analysis has made explicit the dependence of the often cited counterintuitive finding of a negative risk-return relationship upon the approach taken to empirical design. While the full sample results for the EGARCH-M model show the prevalence of negative coefficients, results observed for AIC optimisation and rolling samples do not support this finding. To illustrate further the variability of results across alternative samples and series (that is, alternative $S_i$ and $V_i$), Figures Two and Three present the significant $\delta_{AIC}$ values for three London boroughs using the EGARCH(1,1)-M model. In Figure Two, the values of $\delta_{AIC}$ for real house prices for Waltham Forest are plotted. To ease consideration of the crucial issue of the sign of the coefficient, a horizontal line at zero is included. From inspection of this graph, it can be seen that the first half of the 72 rolling samples is dominated by positive estimated coefficients, while the second half returns only negative values. As a consequence, variability in results is apparent when just the decision regarding samples is allowed to vary and decisions on the model, dynamic specification and variable are held constant. Such variation when just one factor is allowed to vary (and three are not) is compelling evidence against certainty in the nature of the risk-return relationship. Similarly, to illustrate variability across alternative variables, Figure Three depicts analogous results for real house prices in Greenwich and Havering. In this instance, variation across series is illustrated as the former region generates positive $\delta_{AIC}$ values only, while the latter has negative values only. In summary, Figures Two and Three present variability within series and across series respectively.

Concluding Remarks
The above analysis has provided a detailed examination of the risk-return relationship in the London housing market. In addition to providing the first findings within the literature for this issue, the analysis contains a more general message relating to the impact of decisions concerning empirical design upon risk-return inferences. The results provided indicate that while the counterintuitive negative risk-return relationship discussed frequently in the literature does arise, this is most prevalent when very specific decisions are taken with regard to models, dynamic specification and sample periods. In contrast to this, a more flexible and preferred approach involving optimisation generates mixed results, in which a relatively balanced number of positive and negative findings is apparent. The analysis therefore provides a clear message concerning the impact of typically implicit and overlooked decisions on empirical design upon inferences. An obvious future line of research would involve a meta-analysis to consider the linkages between empirical design and inferences in previous research and hence a broader evaluation of the impact of design upon the robustness of conclusions relating to the risk-return relationship.

Figure 1. From hypothesis of interest to inference.

Figure 2. Rolling sample risk-return for Waltham Forest real house prices.

Figure 3. Rolling sample risk-return for Greenwich and Havering real house prices.

Table 1. Risk-return coefficients: Nominal returns.
Notes: The above figures are the static mean equation, AIC optimised, maximum and minimum risk-return coefficients obtained from the GARCH-M and EGARCH-M specifications for nominal house returns. Significance at the 5% and 1% levels is denoted by a and b respectively.

Table 2.
Notes: The above figures are the static mean equation, AIC optimised, maximum and minimum risk-return coefficients obtained from the GARCH-M and EGARCH-M specifications for real house returns. Significance at the 5% and 1% levels is denoted by a and b respectively.

Table 3. Rolling sample risk-return analysis: Nominal returns.