Hyper-local geographically weighted regression: extending GWR through local model selection and local bandwidth optimization

Geographically weighted regression (GWR) is an inherently exploratory technique for examining process non-stationarity in data relationships. This paper develops and applies a hyper-local GWR which extends such investigations further. The hyper-local GWR simultaneously optimizes both local model selection (which covariates to include in each local regression) and local kernel bandwidth specification (how much data should be included locally). These are evaluated using a measure of model fit. The hyper-local GWR approach evaluates different kernel bandwidths at each location and selects the most parsimonious local regression model. By allowing models and bandwidths to vary locally, this approach extends and refines the one-size-fits-all "whole map model" and "constant bandwidth calibration" under standard GWR. The results provide an alternative, complementary and more nuanced interpretation of localized regression. The method is illustrated using a case study modeling soil total nitrogen (STN) and soil total phosphorus (STP) from data collected at 689 locations in a watershed in Northern China. The analysis compares linear regression, standard GWR, and hyper-local GWR models of STN and STP and highlights the different locations at which covariates are identified as significant predictors of STN and STP by the different GWR approaches and the spatial variation in bandwidths. The hyper-local GWR results indicate that the STN processes are more non-stationary and localized than found via a standard application of GWR. By contrast, the results for STP are more confirmatory (i.e., similar) between the two GWR approaches providing extra assurance about the nature of the moderate non-stationary relationships observed. That is, a standard GWR may underestimate localized spatial heterogeneity where it is strongly present (as in the STN case study) and may overestimate it where spatial homogeneity is present (as in the STP case study). 
The overall benefits of hyper-local GWR are discussed, particularly in the context of the original investigative aims of GWR. A hyper-local approach provides a useful counter view of local regression modeling to that found with standard GWR. Where spatial non-stationarity exists, the hyper-local GWR provides a more spatially nuanced indication of the localization than a standard GWR analysis and can be used to suggest the direction of further analyses and investigations. Some areas of further work are suggested.

Introduction
Geographically Weighted Regression (GWR), as first described in Brunsdon et al. [5], is a commonly used approach in spatial analysis. It has at its core the idea that global or whole map statistical models may make unreasonable assumptions of spatial stationarity amongst the processes under investigation [33]. The intention of GWR was to provide an exploratory approach to investigate the spatial nature of relationships between response and predictor variables, and in so doing, to provide a better understanding of the process under consideration. It has conceptual elegance; local regression models are constructed at different locations using data under a moving window or kernel, which are weighted by the distance to the kernel center such that data further away contribute less to the overall model. Because of this, the geographically weighted (GW) framework has been extended to include different types of models including GW principal components analysis [20], GW summary statistics [4], GW discriminant analysis [6], GW variograms [22], GW Structural Equation Models [11], and has been applied in domains with little tradition of local statistical approaches such as remote sensing (e.g., [9,12,15]). The fundamental aims of GWR and GW frameworks are thus to explore spatial relationships in data and processes.
One of the key parts of any GW analysis is to determine an optimal kernel size or bandwidth, as this controls how much data are included in each local model and the degree of smoothing or "localness" in the GW model. Gollini et al. [19] provide a full discussion but in essence the bandwidth determines the scale at which each localized model operates. Smaller bandwidths result in greater local variation in the outputs and larger ones result in outputs that are increasingly closer to the global measure. Optimum kernel bandwidths can be found by minimizing a model fit diagnostic and most GWR implementations use a leave-one-out cross-validation (CV) score, the Akaike Information Criterion (AIC) [1], or a corrected version of the AIC [25]. Essentially, these procedures construct a local model at each location for each bandwidth and then calculate the model fit from all of the local models for that bandwidth. The bandwidth with the best (lowest) score is selected. A standard implementation of GWR frequently determines which covariates to include using a global model selection procedure and then determines the optimal bandwidth using the model fit procedure described above. A GWR generates coefficient estimates at each location and these are commonly mapped to show the spatial variation in the degree to which changes in covariates x are associated with changes in y. Thus bandwidth optimization and model selection are both global in nature: the same covariates and bandwidth are specified for each local regression of GWR.
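As an illustration of the global calibration just described, the leave-one-out CV score for a candidate bandwidth is commonly written as follows. The notation is introduced here for illustration only: $\hat{y}_{\neq i}(h)$ denotes the fitted value at location $i$ from a local model calibrated with bandwidth $h$ and with observation $i$ omitted, and $h^{*}$ denotes the single bandwidth then used for every local regression.

```latex
\mathrm{CV}(h) = \sum_{i=1}^{n} \bigl( y_i - \hat{y}_{\neq i}(h) \bigr)^2,
\qquad
h^{*} = \arg\min_{h} \mathrm{CV}(h)
```

An AIC-based calibration follows the same pattern, with the CV score replaced by the AIC computed across all local models for each candidate bandwidth.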
This paper proposes an enhancement to standard applications of GWR that allows both model selection and bandwidth to vary locally. The aim of such hyper-local approaches to GWR is to provide a still deeper understanding of the spatial nature of the processes under investigation. As with GWR, hyper-local GWR applies a local regression under a moving window or kernel at each location under consideration, but it simultaneously optimizes both the local regression model and the local kernel bandwidth. This is entirely novel: although model selection in GWR has been done [42], it has not been combined with nonconstant bandwidth selection where bandwidths are truly local and unique (i.e. [34,35]). Local model selection helps to identify which covariates are important in explaining the variation in the dependent variable and where they are important. The corresponding local bandwidths in turn provide insight into the local scales of influence. The hyper-local GWR approach provides an alternative interpretation of localized regression by extending GWR through local model selection and local bandwidth optimization. It complements and enriches a standard application of GWR.

Methods
Linear regression, GWR, and the proposed hyper-local GWR were used to construct models of soil total nitrogen (STN) and soil total phosphorus (STP). The analyses used the data described in Wang et al. [41].

Data and study area
The case study data report measurements made at 689 locations in the Liudaogou watershed, within the Loess Plateau, located 14 km west of Shenmu, Shaanxi Province, China. Wang et al. [41] provide a full description of the data including descriptive statistics and correlations amongst the variables used in the study. In brief, this is a small watershed with an altitudinal range of 1081 m to 1274 m, a semi-arid climate and mainly grassland land use. The data were collected at locations on an approximate 100 m by 100 m grid (Figure 1) and analyzed in the laboratory to provide measurements of covariates commonly associated with STN and STP: soil organic carbon (SOCgkg), clay (ClayPC), silt (SiltPC), sand (SandPC), nitrate nitrogen (NO3Ngkg), and ammonium (NH4Ngkg). Some of the variables were transformed using natural logs (STN, SOCgkg, NO3Ngkg, NH4Ngkg) and square roots (STP, ClayPC), as was done by [41].

Linear regression and GWR
A standard linear regression for spatial data is specified as follows:

$$ y_i = \beta_0 + \sum_{j=1}^{m} \beta_j x_{ij} + \varepsilon_i $$

where, for observations indexed by $i = 1, \ldots, n$, $y_i$ is the response variable, $x_{ij}$ is the value of the $j$th predictor variable, $m$ is the number of predictor variables, $\beta_0$ is the intercept term, $\beta_j$ is the regression coefficient for the $j$th predictor variable and $\varepsilon_i$ is the random error term. GWR is similar in form to linear regression, except that GWR calculates a series of local linear regressions rather than one global one. A GWR model has locations associated with the coefficient terms:

$$ y_i = \beta_{0(u_i,v_i)} + \sum_{j=1}^{m} \beta_{j(u_i,v_i)} x_{ij} + \varepsilon_i $$

where $(u_i, v_i)$ is the spatial location of the $i$th observation and $\beta_{j(u_i,v_i)}$ is a realization of the continuous function $\beta_j(u, v)$ at point $i$. The geographical weighting results in data nearer to the kernel center making a greater contribution to the estimation of regression coefficients at each local regression calibration point $k$. For this study, the weights were generated using a bisquare kernel, which is defined by:

$$ w_{ik} = \begin{cases} \left(1 - (d_{ik}/h)^2\right)^2 & \text{if } d_{ik} < h \\ 0 & \text{otherwise} \end{cases} $$

where $d_{ik}$ is the distance between observation $i$ and the kernel centre at regression calibration point $k$, and $h$ is the bandwidth. Here $h$ can be specified as a fixed (constant) distance value, or in an adaptive, varying distance way, where the number of nearest neighbors is fixed (constant). In this case, fixed, distance-based kernel bandwidths were determined using the AIC-based model fit procedure. Fixed bandwidths were chosen to support direct understandings of the spatial scales of relationship non-stationarity and because the data locations are regularly spaced.
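The bisquare weighting described above can be sketched in a few lines. This is a minimal Python illustration and not the code used in the study (which was written in R); the function names `bisquare_weight` and `kernel_weights` are hypothetical, and coordinates are assumed to be planar and in the same units as the bandwidth.

```python
import math


def bisquare_weight(d, h):
    """Bisquare kernel weight for a point at distance d from the kernel centre.

    h is a fixed (distance-based) bandwidth; points at or beyond h get weight 0.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if d >= h:
        return 0.0
    return (1.0 - (d / h) ** 2) ** 2


def kernel_weights(centre, points, h):
    """Weights of all points relative to one regression calibration point."""
    cx, cy = centre
    return [bisquare_weight(math.hypot(x - cx, y - cy), h) for x, y in points]
```

For example, with a 1000 m bandwidth a point 500 m from the kernel centre receives weight 0.5625, while any point 1000 m or more away is excluded from the local regression.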

Hyper-local GWR
In a hyper-local GWR, both the bandwidth and the regression model selection are optimized locally rather than globally across all local models as in a standard GWR. A sequence of bandwidths was investigated (from 200 m to 3700 m in steps of 50 m, n = 63) and at each location regression models of STN and STP were constructed using weighted data falling under the kernel. Then a stepwise AIC model selection procedure was applied, in this case using the stepAIC function in the MASS R package [36]. Thus for each location, 63 local regression models of STN and STP were constructed, and a stepwise AIC was used to determine the locally selected regression model at each location under each kernel size. The AIC score of each selected model was calculated, resulting in 63 AIC scores at each location. The "best" model and bandwidth combination at each location was that with the lowest AIC. The use of AIC as a method for model selection will be returned to in the discussion.
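The procedure described above can be sketched as follows. This is a pure-Python illustration under stated assumptions, not the authors' R implementation: backward elimination by AIC stands in for `stepAIC`, the AIC is taken as n ln(RSS/n) + 2p using the weighted residual sum of squares over points with non-zero weight, and all function names (`weighted_aic`, `backward_select`, `hyper_local_fit`) are hypothetical.

```python
import math


def bisquare(d, h):
    return (1 - (d / h) ** 2) ** 2 if d < h else 0.0


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting; None if singular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            return None
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def weighted_aic(X, y, w, cols):
    """AIC (assumed form) of a weighted least-squares fit: intercept plus columns in cols."""
    rows = [([1.0] + [X[i][j] for j in cols], y[i], w[i])
            for i in range(len(y)) if w[i] > 0]
    n, p = len(rows), len(cols) + 1
    if n <= p:  # too few points under the kernel to fit this model
        return math.inf
    xtwx = [[sum(wi * xi[a] * xi[b] for xi, _, wi in rows) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(wi * xi[a] * yi for xi, yi, wi in rows) for a in range(p)]
    beta = solve(xtwx, xtwy)
    if beta is None:  # singular design, e.g. collinear covariates in the window
        return math.inf
    rss = sum(wi * (yi - sum(bj * xj for bj, xj in zip(beta, xi))) ** 2
              for xi, yi, wi in rows)
    if rss <= 0:  # degenerate perfect fit: skipped in this sketch
        return math.inf
    return n * math.log(rss / n) + 2 * p


def backward_select(X, y, w):
    """Drop covariates one at a time while doing so lowers the AIC."""
    cols = list(range(len(X[0])))
    best = weighted_aic(X, y, w, cols)
    while cols:
        trials = [(weighted_aic(X, y, w, [c for c in cols if c != d]), d) for d in cols]
        aic, drop = min(trials)
        if aic >= best:
            break
        best, cols = aic, [c for c in cols if c != drop]
    return best, cols


def hyper_local_fit(coords, X, y, k, bandwidths):
    """Return (AIC, bandwidth, selected columns) with the lowest AIC at location k."""
    cx, cy = coords[k]
    best = (math.inf, None, [])
    for h in bandwidths:
        w = [bisquare(math.hypot(px - cx, py - cy), h) for px, py in coords]
        aic, cols = backward_select(X, y, w)
        if aic < best[0]:
            best = (aic, h, cols)
    return best
```

Calling `hyper_local_fit` once per sample location yields a locally selected covariate set and a local bandwidth for each location. Whether AIC values computed on differently sized local samples are directly comparable is a modeling choice inherited from the assumed AIC form above.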

Linear regression
Linear regression models of STN and of STP were constructed from the six covariates and a stepwise AIC model selection procedure was applied. Tables 1 and 2 summarize the coefficient estimates and the selected covariates. In the case of STN, the full model is driven by SOCgkg, SiltPC, and NO3Ngkg, which are significantly associated with STN. The model is similar to that described in Wang et al. [41] with an $R^2$ of 0.61 and all of the covariates positively associated with STN except NH4Ngkg. The AIC selected model does not include the ClayPC, SandPC, and NH4Ngkg covariates. Observe that NO3Ngkg is not significantly associated with STN in the AIC selected model (at the 95% level). The significant predictors of STP in the full model were SOCgkg, SiltPC, SandPC, and NH4Ngkg with an $R^2$ value of 0.40, again similar to the findings of Wang et al. [41]. The selected model did not include the covariates for NO3Ngkg and SandPC, but all retained covariates were significant. In both cases the selected models reflect the impact of silt and soil organic carbon in increasing the soil surface area supporting higher absorption capacities, and thus concentrations of STN and STP, as noted by Wang et al. [41]. The AIC selected models are more parsimonious but have weaker $R^2$ and adjusted $R^2$ values, as would be expected.
Table 2: Summary of the coefficient estimates arising from the Full and AIC selected linear regression models of STP.
Note that the selected model does not necessarily include covariates that are significant and that non-significant covariates in the full model may be included in the selected model and may become significant (e.g., the ClayPC covariate for the STP regression). The key point is that the variance in STN and STP can be explained by two competing, but equally valid linear regression models. This concept is repeated locally in the subsequent GWR analyses and is a cornerstone of this paper.

Standard GWR
Linear regression models assume that the contributions to the model made by the different covariates are the same across the study area. In reality, this assumption of process spatial invariance may be violated and GWR seeks to quantify the spatial variation in data relationships. In a standard GWR analysis, covariate selection is typically undertaken globally and the same regression model is constructed locally using weighted data subsets. The coefficient estimates are commonly mapped and local covariate selection (and goodness of fit evaluations) can be done by identifying local covariate t-values that indicate coefficients to be significantly different from zero (e.g., [24]).
The optimal bandwidths for GWR models of STN and STP were found at 1026 m and 1629 m, respectively. These were used to calibrate the GWRs constructed at each of the sample locations in Figure 1. The local coefficient estimates from these are summarized in Tables 3 and 4. The GWR coefficients for STN show considerable spatial variation (via the inter-quartile range, IQR) and much less is found in the local STP models, as also reflected in the larger bandwidth. For example, in the STN GWR model the coefficient estimates for SandPC and NO3Ngkg have IQRs of 0.0408 and 0.1267, respectively, while in the STP GWR model these have relatively small IQRs (0.0023 and 0.0077, respectively). However, note the relatively high variation of the IQRs of the local coefficient estimates in the GWR models compared to the global coefficient estimates. The spatial variations in the coefficient estimates arising from the two GWR models are mapped in Figures 2 and 3 and indicate the relative importance of the contribution made to each local model by each covariate at each location. They confirm that there is much greater spatial variation in the relationships associated with STN than with STP.
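The IQR summaries of local coefficient estimates referred to above can be computed as in the following minimal Python sketch. The helper name `iqr` is hypothetical, and the "inclusive" quartile convention is an assumption; other quartile conventions give slightly different values, and the convention used for Tables 3 and 4 is not stated in the text.

```python
from statistics import quantiles


def iqr(values):
    """Inter-quartile range of a list of local coefficient estimates."""
    # method="inclusive" interpolates linearly between order statistics
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1
```

Applied to the 689 local estimates of one covariate, a larger IQR indicates greater spatial variation in that coefficient.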
The t-values in Figures 2 and 3 show where local coefficients are significant and thus where a covariate is an important predictor of STN or STP. This provides an indication of local covariate selection from the full model and is analogous to the global full models reported in Tables 1 and 2. For example, it is evident in both GWR models that SOCgkg is strongly and significantly associated with STN and STP across all locations, but the strength of this association varies spatially. By contrast, significant coefficient estimates of NO3Ngkg are highly localized in each GWR model, indicating strong associations in the north east and center of the study area with STN and strong associations in the north with STP. In general, significant relationships are much more localized for STN than for STP.
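The idea of reading local covariate selection from local t-values can be sketched as below. This is an illustrative Python fragment only: the function name `significant_mask` is hypothetical, and the critical value of 1.96 (an approximate two-sided 95% threshold) is an assumption, since the threshold used to draw Figures 2 and 3 is not stated here.

```python
def significant_mask(coefs, std_errors, critical=1.96):
    """Flag locations where a local coefficient differs significantly from zero.

    coefs and std_errors hold one value per location for a single covariate.
    """
    return [se > 0 and abs(b / se) > critical for b, se in zip(coefs, std_errors)]
```

Mapping the resulting flags for each covariate shows where that covariate would be retained locally. The flags are uncorrected for multiple testing.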

Hyper-local GWR
The GWR analysis applied the same kernel bandwidth and included the same full set of covariates in each local regression model. Figures 2 and 3 display the spatial distribution of the GWR coefficient estimates and a degree of local model selection is possible through exploration of the local t-values associated with the local coefficient estimates. This is a standard application of GWR, supporting investigations of process heterogeneity with respect to spatially-varying relationships. The hyper-local GWR approach provides an alternative interpretation of localized regression through local model selection and local bandwidth optimization. It builds on previous GWR studies by Paez and Wheeler that have identified analytical advantages when locally-determined, non-constant bandwidths are applied [34,35] and when covariate selection is determined locally [42]. It combines these localized characteristics but the ultimate objective is entirely different to the studies by Paez and Wheeler. Paez et al. [34,35] were concerned about modeling a non-stationary error variance in GWR via a parametric approach and Wheeler [42] sought to address local collinearity in GWR via a lasso approach.
For each of the 689 data points, the hyper-local GWR identified the components of the best fitting model for each of the 63 bandwidths (from 200 m to 3700 m in intervals of 50 m) and returned the AIC score for the model. Thus it was possible to determine the best fitting model, with the lowest AIC score, at each location. [...] from the southeast to the northwest. This suggests that local regressions in this area are informed by data subsets of a similar size to that found with standard GWR (with its constant bandwidth of 1026 m). Elsewhere, the bandwidths are much smaller (200-1000 m), so that local regressions in these areas are informed by much smaller data subsets. The distribution of bandwidths in the hyper-local GWR model is on the whole indicative of increased localized spatial heterogeneity in data relationships, which is more than that suggested by the standard GWR analyses above. Conversely, the STP bandwidths range from 1500-3700 m and are much larger almost everywhere than the constant bandwidth for standard GWR at 1629 m. Thus, most of the local regressions in a hyper-local GWR are informed by much larger data subsets than a standard GWR. Only in the center of the study area are bandwidths from hyper-local GWR of similar size to a standard GWR. The larger bandwidths indicate reduced spatial heterogeneity compared with that found with standard GWR, and suggest spatial homogeneity in the relationships (i.e., tending to the global regression).

Local covariate selection and distribution of coefficient t-values
Investigating the spatial variation in bandwidth size is only one aspect of hyper-local GWR and should be linked to consideration of local covariate selection. Table 5 summarizes how many times each covariate was selected using stepwise AIC at each of the 689 locations in the hyper-local GWR models. There are a number of interesting points. STN model selection for the global regression (Table 1) excluded ClayPC, SandPC, and NH4Ngkg, while these are now selected in 522, 641, and 439 out of 689 hyper-local models, respectively. STP model selection for the global regression (Table 2) excluded SandPC and NO3Ngkg, while these are now selected in 689 and 170 out of 689 local models, respectively. Additionally, for STP three covariates were always selected (SOCgkg, SiltPC, and SandPC), whereas for STN none were. This suggests that there are potentially interesting local interactions between covariates which are missed in standard GWR in which all six covariates are included in the model for all 689 local regressions.
For example, in the STN models (comparing Figures 2 and 5), SandPC is a significant covariate at most locations in the hyper-local GWR model. In the standard GWR model (Figure 2) it is only significant in two sub-regions to the north and center of the study area. Similarly, NH4Ngkg in the standard GWR model of STN is significant in the northwest of the study area, but has a much wider significance in the hyper-local GWR model. These results indicate that when the bandwidth and covariate selection are more localized under the hyper-local GWR, significant non-stationary relationships emerge that are not apparent with standard GWR. Similar interpretations apply to STN relationships with SiltPC, NO3Ngkg, and ClayPC while it appears that STN's relationship to SOCgkg is consistent across both GWR forms.
Note also that hyper-local GWR tends to provide spatially disjoint areas of covariate selection and coefficient significance, reflecting highly localized processes. For the STP process, comparing Figures 3 and 6, there are very similar patterns for significant coefficients from the hyper-local GWR and from the standard GWR for all six covariates, although NO3Ngkg, SandPC, ClayPC, and NH4Ngkg show enlarged localized areas of significance under the hyper-local model. Note that NO3Ngkg is only selected in 170 sample locations in hyper-local GWR (see Table 5) and these are in the north, precisely where the standard GWR shows the NO3Ngkg relationships as significant.
Clearly, these results indicate that when the bandwidth and covariate selection tend towards the global solution, as with the hyper-local GWR of STP, the non-stationary relationships that result from a hyper-local GWR are broadly similar for both forms of GWR. However, where localized spatial heterogeneity is present in data relationships, as with STN, the hyper-local GWR provides a more spatially nuanced indication of the localization than a standard GWR analysis.
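The per-covariate selection counts of the kind summarized in Table 5 can be tallied from the locally selected models as in this minimal Python sketch. The function name `selection_counts` and the input layout (one list of selected covariate names per location) are assumptions made for illustration.

```python
from collections import Counter


def selection_counts(selected_per_location):
    """Count in how many local models each covariate was selected."""
    counts = Counter()
    for covariates in selected_per_location:
        counts.update(set(covariates))  # count each covariate once per location
    return counts
```

A covariate whose count equals the number of locations was selected everywhere, while a low count points to a spatially restricted role.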

Comparisons of global and local model fit
The final analysis compared the three different regression models in the degree to which they (in-sample) predict STN and STP. The scatterplots in Figure 7 show fitted values against observed values for these six models. For STN, the model fits improve with increas[...] standard GWR, as the bandwidths for hyper-local GWR tend to be larger and the process tends towards the global fit. Care must be taken in the interpretation of model fit results, as any form of localized regression will tend to provide improved prediction accuracy the more complex it gets (hence the strong performance of hyper-local GWR for STN). Furthermore, although hyper-local GWR is shown to improve fit for the STN process, this has little predictive value, as hyper-local GWR cannot be used as an out-of-sample predictor. This is because the out-of-sample prediction does not have its own local bandwidth, whereas for standard GWR, the global bandwidth can be used [23]. Thus, hyper-local GWR is solely for guiding spatial exploration and inference, as demonstrated in this study.
It is important to investigate local model fit characteristics so that the outputs in Figures 2 to 6 can be placed in better context and geographically contrasted. Figure 8 compares the local $R^2$ values for standard GWR and hyper-local GWR models for STN and STP and indicates that hyper-local GWR provides a better fit in 503/689 and 5/689 locations for STN and STP, respectively. Thus, for the STN process, the local regressions of standard GWR could be considered sub-optimal in 73% of the locations, whilst for the STP process, the local regressions of standard GWR are, in general, reasonable. The magnitude of the differences is much greater for STN than for STP. If Figure 8 is compared with Figure 4, the areas where a hyper-local approach provides a better model fit for STN directly correspond to those where a much smaller local bandwidth was selected. This behavior is not so apparent for the STP process.
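The location-by-location comparison of local fit reported above amounts to a simple count, sketched here in Python. The function name `share_improved` is hypothetical, and the inputs are assumed to be two equally long lists of local $R^2$ values in the same location order.

```python
def share_improved(r2_standard, r2_hyper):
    """Count and share of locations where hyper-local GWR has the higher local R2."""
    if len(r2_standard) != len(r2_hyper):
        raise ValueError("both inputs must cover the same locations")
    better = sum(1 for s, h in zip(r2_standard, r2_hyper) if h > s)
    return better, better / len(r2_standard)
```

With the counts reported in the text, 503 of 689 locations corresponds to a share of about 0.73 for STN and 5 of 689 to less than 0.01 for STP.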
The maps in Figure 8 confirm what has already been described. For STN, hyper-local GWR suggests a more localized relationship process where local model fit can improve using fewer data points and fewer covariates. Standard GWR is under-fitting the true non-stationary relationship process and this effect is not uncommon (e.g., [21]). Conversely, it is always possible that hyper-local GWR is overfitting. The STN process is, in general, well-informed by the six covariates. For STP, hyper-local GWR suggests more moderately spatially-varying relationships but where local model fits are similar to (slightly weaker than) those found for standard GWR. Thus, the application of hyper-local GWR provides little value to an extended use of nearby data points with often fewer covariates for its local regressions. The STP process is, in general, not well-informed by the six covariates.

Discussion
GWR is an inherently exploratory approach for examining and investigating process non-stationarity in data relationships. The hyper-local GWR extends these investigations further. Whereas a standard GWR employs a one-size-fits-all bandwidth and a one-size-fits-all local regression model, the hyper-local GWR approach evaluates different kernel bandwidths and models at each location. It provides an alternative and complementary interpretation of localized regression by locally selecting the most parsimonious model (by local sample and covariate size), for which spatially distributed coefficient estimates and t-values can also be found. The local selection of the most parsimonious model is analogous to what is commonly done in a global analysis, where a summary of the full model is presented alongside a reduced, selected covariates model.

www.josis.org
The investigations show that where the non-stationarity of relationships tends towards the global, as with STP, the results are similar to a standard GWR (compare Figures 3 and 6). However, where localized spatial heterogeneity and spatial non-stationarity are present, as with STN, the hyper-local GWR provides a more spatially nuanced indication of the localization than a standard GWR analysis (compare Figures 2 and 5). Thus the hyper-local GWR results can be used to guide the direction of the next analytical steps. Further analysis of the STN could consider adopting a more sophisticated spatially-varying coefficient model (e.g., [18]), including models that account for non-linearity (e.g., [2]). Further analysis of STP could consider a spatially-autocorrelated regression given that its GWR analyses were not entirely promising (e.g., [21]).
Determining local bandwidth size and local covariate selection is also in the same spirit as (but with entirely different objectives to) the GWR models of Paez et al. [34,35], Wheeler [42], and Yoneoka et al. [43]. These are analogous to developments in local (attribute-space) regression [29,40] from which GWR originates [5,31], but the GWR models of Paez only do local bandwidths (not local covariate selection) and the GWR models of Wheeler and Yoneoka only do local covariate selection (not local bandwidths). The exploratory and enhanced spatial nuance of hyper-local GWR reflects recent developments within the broad family of GWR methods that has promoted wider consideration of scale and distance. These include hierarchical GWR models [25], consideration of distance metrics [10,30], and flexible bandwidth GWR models [17,26] that select different bandwidths for each dependent/independent data relationship, rather than for each location as here. These multiscale GWR models are closely aligned to the spatially-varying coefficient models of Gelfand et al. [18] and Murakami et al. [32].
There is a computational cost to hyper-local GWR approaches, which evaluate a model for each bandwidth at each location, rather than a standard GWR, which just evaluates a single bandwidth and a single model. In terms of computing time, a standard GWR using the GWmodel package v2.0-5 [19] in R took 2.96 seconds to run on these data. The hyper-local approach took longer (20.7 minutes) because of the number of calculations but also because the algorithm has been transparently (rather than efficiently) coded. The data and code used in this analysis have been made available (see the acknowledgments section).
There are a number of considerations relating to the GWR models applied and demonstrated in this study. The first is collinearity. Standard GWR and hyper-local GWR are not designed to address collinearity issues, but here hyper-local GWR could be adapted to mitigate such issues, in a similar manner to that proposed for standard GWR (e.g., [2,3,8]). In the hyper-local GWR approach described here, local models were selected under different bandwidths (i.e., with different local data subsets) at each location. Such model selection procedures identify the most parsimonious model and implicitly tend to select explanatory variables that are not collinear. The second concerns multiple hypothesis tests (MHTs) and spatial heteroskedasticity in the error term. Presenting GWR t-values in an uncorrected form can lead to the false discovery rate problem. Here the MHT corrections suggested by da Silva and Fotheringham [13] could be adopted for both standard and hyper-local GWR t-value outputs. Similarly, GWR models that account for spatial heteroskedasticity in the error term can be found in Paez et al. [34,35] and in Fotheringham et al. [16] and Shen et al. [37], and the latter provides a procedure to similarly adapt a hyper-local GWR model. However, such GWR models have not been widely adopted, mainly due to inherent inferential issues, and as such are only viewed as exploratory (e.g., see Shen et al. [37]). A third consideration is the choice of kernel, the bandwidth type (fixed by distance, as used here, or fixed by sample size), and the choice of distance metric, all of which affect perspectives of coefficient non-stationarity. It would be interesting to examine the degree of difference between the STN and STP standard and hyper-local GWR models under such different parameterization choices. Gollini et al. [19] provide overviews of these considerations. Fourth, another area of future work is to examine the potential for overfitting with hyper-local methods.
One way to test whether hyper-local GWR describes the study data significantly better than standard GWR would be to adapt the F-test procedure given in Leung et al. [27]. Here, instead of assessing standard GWR against the global linear regression, hyper-local GWR is assessed against standard GWR, where the null hypothesis is no significant difference between models. A fifth and more salient consideration for the research described in this paper is the use of AIC scores to select both local bandwidths and local regression models. AIC [1,25] seeks to optimize model parsimony by trading off prediction accuracy and complexity. Other measures of fit could be applied including some kind of cross-validation measure of residual errors. There have been a number of arguments made in the context of information theory about the choice of model selection method and their associated measures of fit, and Li and Lam [28] review variable selection methods in GWR frameworks. They compared Step-AIC in GWR, GWR-Lasso, and GWR-Ridge models, noting that they are a function of zero-power, one-power, and two-power, respectively, of the explanatory variables in the models. This essentially frames the relationships between model selection within the elastic net. In terms of information criteria, alternatives to AIC exist such as the Bayesian Information Criterion (BIC) and the Deviance Information Criterion (DIC) [39]. Future work will investigate these and CV approaches as they would be expected to result in different local model selection. The key in determining which model selection method to use is to understand the logics of each approach and how they relate to the study objectives and even the underlying objectives of data collection. For example, AIC and BIC provide different approaches for model comparison [7]. BIC seeks to determine the "true" model and, if any particular candidate model represents the genuine data-generating mechanism, BIC will select such a model.
It is said to be asymptotically consistent because it seeks to select the true model. By contrast AIC seeks to pragmatically select a model by trading-off explanations of the data with prediction strength. Despite these theoretical differences, Spiegelhalter et al. [38] note that "it is perhaps therefore rather surprising how often these two criteria produce similar rankings of candidate models" (p. 486) with the only real differences found in the size of the penalty scores [14]. Future work for both hyper-local and standard GWR will investigate the use of different model selection criteria, the logics associated with the local models being constructed and the underlying process spatial heterogeneity.
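For reference, the difference in penalty size mentioned above can be seen from the generic textbook forms of the two criteria. The symbols are introduced here for illustration: $\hat{L}$ is the maximized likelihood, $k$ the number of estimated parameters and $n$ the number of observations. The locally weighted versions used in GWR software may differ in detail.

```latex
\mathrm{AIC} = -2 \ln \hat{L} + 2k,
\qquad
\mathrm{BIC} = -2 \ln \hat{L} + k \ln n
```

The two share the same fit term, so BIC penalizes each additional parameter more heavily than AIC whenever $\ln n > 2$, that is for $n \geq 8$, and therefore tends to favor smaller models as the local sample grows.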

Conclusions
Local statistical approaches such as GWR are inherently exploratory in nature. They seek to confirm or refute spatial heterogeneity in spatial data structure, processes and statistical relationships. The hyper-local GWR approach described in this paper provides a useful counter view of local regression modeling to that found with standard GWR. The results of this study show that a standard GWR analysis may underestimate the degree of localized spatial heterogeneity in data relationships where it is strongly present (as in the STN case study) and may overestimate it where spatial homogeneity is present (as in the STP case study). Standard GWR applies the same regression model at each location and uniformly sets the same kernel bandwidth everywhere. The hyper-local GWR approach evaluates different kernel bandwidths at each location and selects the most parsimonious local regression model. Where spatial non-stationarity exists, the hyper-local GWR provides a more spatially nuanced indication of the localization than a standard GWR analysis and can be used to suggest the direction of further analyses and investigations. Undertaking a hyper-local GWR alongside a standard GWR allows coefficient estimates, t-values and bandwidths to be compared for differences and similarities. Specifically, a dual GWR approach that examines the spatial distribution of local covariate selection and the local bandwidth size supports a deeper understanding of the local and scale-related characteristics of the spatial process under investigation.