Statistical Approaches Used to Assess the Equity of Access to Food Outlets: A Systematic Review

Background
Inequalities in eating behaviours are often linked to the types of food retailers accessible in neighbourhood environments. Numerous studies have aimed to identify if access to healthy and unhealthy food retailers is socioeconomically patterned across neighbourhoods, and thus a potential risk factor for dietary inequalities. Existing reviews have examined differences between methodologies, particularly focussing on neighbourhood and food outlet access measure definitions. However, no review has informatively discussed the suitability of the statistical methodologies employed; a key issue determining the validity of study findings. Our aim was to examine the suitability of statistical approaches adopted in these analyses.
Methods
Searches were conducted for articles published from 2000–2014. Eligible studies included objective measures of the neighbourhood food environment and neighbourhood-level socio-economic status, with a statistical analysis of the association between food outlet access and socio-economic status.
Results
Fifty-four papers were included. Outlet accessibility was typically defined as the distance to the nearest outlet from the neighbourhood centroid, or as the number of food outlets within a neighbourhood (or buffer). To assess if these measures were linked to neighbourhood disadvantage, common statistical methods included ANOVA, correlation, and Poisson or negative binomial regression. Although all studies involved spatial data, few considered spatial analysis techniques or spatial autocorrelation.
Conclusions
With advances in GIS software, sophisticated measures of neighbourhood outlet accessibility can be considered. However, approaches to statistical analysis often appear less sophisticated. Care should be taken to consider assumptions underlying the analysis and the possibility of spatially correlated residuals which could affect the results.


Introduction
Obesity is one of the leading public health concerns globally and is linked to a number of health conditions, such as cardiovascular disease and type 2 diabetes. Individual-level interventions have had limited success in curbing rising rates of overweight and obesity [1,2]. In recent years, research has examined environmental factors which may influence weight gain, in particular, the "obesogenic environment", defined as an environment which facilitates unhealthy behaviours, such as poor diet, and provides limited opportunities to engage in healthy activities, such as physical activity [3]. The environment has been posited as a contributing factor to the higher levels of obesity observed amongst those residing in more socioeconomically disadvantaged areas [4]. In particular, disadvantaged areas have been investigated for the presence of higher levels of fast food outlets [5,6] and reduced access to healthier outlets, such as supermarkets or grocery stores [7][8][9].
In a recent review of ten semi-systematic and systematic review articles, nine of which considered disparities in access to food outlets by neighbourhood-level socio-economic status (SES), Black et al. [10] found that although these reviews tended to suggest food deserts exist in the US, with those living in low SES neighbourhoods identified as having lower access to supermarkets [11][12][13][14] and often greater access to fast food outlets [12,13,15-17], findings from other countries have been equivocal [10]. Such equivocal findings, while potentially due to true differences in these diverse built environment contexts, may also in part be explained by a number of methodological factors, including inappropriate analytical methods or inconsistent approaches when accounting for the spatial autocorrelation in the studies.
In the nine reviews of disparities in access to food outlets, few explicitly discussed the statistical methods employed to examine the associations. When statistical methods were mentioned, these reviews almost exclusively discussed limitations attributable to the cross-sectional design common to published studies, highlighting that such designs do not permit causal inference [12,13,15,17,18]. In their review of access to fast food outlets, Fraser et al. [16] mentioned that the statistical methods in the studies examined were typically simple approaches, such as correlation or simple regression, but did not discuss how appropriate these methods were for the questions being addressed or the structure of the data being considered. The choice of statistical methodology is important as using an inappropriate methodology can lead to incorrect findings [19]. It is important for researchers to verify the assumptions underlying the chosen methodology to ensure that the method is suitable. For example, a two-sample t-test can provide misleading results if it is adopted when the data are not normally distributed and the groups have unequal variances [20]. Thus, in this situation an alternative statistical method would be more suitable to test for differences between groups.
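To make the t-test example concrete, the following is a minimal sketch comparing the pooled-variance (Student) statistic with Welch's statistic, which does not assume equal variances. The data, group names and function names are hypothetical illustrations, not values or methods taken from any reviewed study.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t statistic, assuming equal group variances."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom (unequal variances)."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical outlet densities: a small, highly variable group
# and a larger group with little variation.
low_ses = [0.0, 0.0, 1.0, 9.0]
high_ses = [1.0, 2.0] * 6

t_pooled, df_pooled = pooled_t(low_ses, high_ses)
t_welch, df_welch = welch_t(low_ses, high_ses)
print(f"pooled: t = {t_pooled:.2f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.2f}, df = {df_welch:.2f}")
```

In this made-up example the pooled statistic is larger in magnitude and uses far more degrees of freedom than Welch's version, so the two tests could lead to different conclusions from the same data.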
In another review, Fleischhacker et al. [15] discussed analytical considerations in studies of the distribution of food outlets, such as a lack of detail in the methodology as to how population adjustment was conducted. Importantly, the authors identified four key factors which should be developed further in studies of the fast food environment and its effect on health and behaviour outcomes: software, statistics, sample size, and the size/range of the neighbourhood buffers. While studies have compared the size and range of buffers of food access measures [21], important statistical considerations when dealing with spatial data in this field have not been addressed. In one review of the distribution of fast food outlets, Fraser et al. [16] stated that alternative statistical approaches such as geographically weighted regression, a technique for exploring how relationships vary in space [22], could be utilised in these studies but did not describe the benefits of adopting this technique or mention any other spatial statistical techniques or considerations.
In one review examining associations between the community food environment, defined as the 'number, type, location and accessibility of food outlets' in an area [23], and obesity, Holsten [24] highlighted that "since objects are spatially related and not independent, many analyses should have controlled for spatial autocorrelation". Spatial autocorrelation refers to the degree of similarity of neighbouring observations. Although this is an important methodological issue when examining the equity of access to outlets across neighbouring areas where correlation is likely to be present, it was not discussed in any reviews of the distribution of food outlets. Ignoring spatial autocorrelation through the use of typical parametric statistical techniques, such as linear regression, can lead to erroneously identifying statistically significant associations when in fact none exist or, alternatively, to failing to identify associations when they are present [25][26][27]. In a review of ecological studies which compared analyses with and without adjustment for spatial autocorrelation, Dormann [28] discussed some of the consequences of ignoring spatial autocorrelation on model parameters, namely obtaining biased parameter estimates and "overly optimistic" standard errors, and found that in all studies reviewed the coefficients were affected by spatial autocorrelation. Therefore, the choice of statistical analysis technique employed and the degree of spatial autocorrelation can influence research findings.
The aim of this review was to systematically appraise the existing literature on the equity of access to food outlets to identify the statistical methods used in the analyses. The key focus was to examine the suitability of the methodology employed and to identify any spatial statistical methodologies used. The secondary aim was to assess whether or not spatial autocorrelation was considered.

Inclusion criteria
Articles were eligible for inclusion if they featured an objective measure of the neighbourhood food environment considered as an outcome variable in the analysis. Included studies contained a measure of neighbourhood-level SES (e.g., median household income, socioeconomic index for areas). Papers were excluded if they solely examined within-store produce as an outcome, rather than store availability, such as those which examined healthy food baskets or shelf space use, or if the focus was on dietary or obesity outcomes rather than store availability or distribution. Furthermore, articles were excluded if they did not conduct a formal statistical analysis of the association between neighbourhood-level SES and food store access; that is, studies which only produced descriptive tables or maps of the distribution but did not attempt to identify evidence of an association between food store access and neighbourhood-level SES through statistical tests.

Search strategy
The electronic search was conducted in March 2014 and the search strategy adopted is fully specified in the Appendix. Our search included journal articles published in English since 2000 as existing reviews of food environment literature have shown that the majority of environmental food assessments have occurred during this period [15,17]. Articles were identified using the following databases: Medline Complete, PsychINFO, CINAHL Complete, Web of Science, Global Health, Embase, Scopus and the Cochrane Library. Relevant articles known to the authors were examined to identify key words to use as search terms. Our search terms included combinations of terms referring to food outlets, equity and neighbourhood access as detailed in the Appendix.
Initially, a title scan was conducted in order to discard irrelevant articles identified in the search. A two stage process was adopted when screening abstracts. First, review articles, commentary or discussion articles and intervention studies (in which the focus was on individual outcomes), studies which examined students' diets or the school food environment, and any other studies which did not involve an objective measure of the food environment were excluded. In the second stage, two investigators (KEL and LET) independently assessed the remaining abstracts according to the inclusion criteria to compile a final list of articles. Where there was disagreement, the full article was examined and discussed, with input from all co-authors, to identify if this should be included in the review.

Data extraction
A structured form was created for the data extraction which included information on where the study was conducted, the number of neighbourhoods considered, the statistical analysis approach adopted (including whether or not spatial autocorrelation was considered) and the main findings.

Results
The results from the search are presented in Figure 1. A total of 54 published papers were considered in this systematic review.

Summary of included studies
The 54 included papers, described in Table 1, published between 2002 (no articles published in 2000 and 2001 met the inclusion criteria) and 2014 feature studies of food access and availability from the US (n = 26; 48.1%), Canada (n = 10; 18.5%), the UK (n = 7; 13.0%), New Zealand (n = 4; 7.4%), Australia (n = 3; 5.6%), Brazil (n = 1; 1.9%), Denmark (n = 1; 1.9%), Germany (n = 1; 1.9%), and Sweden (n = 1; 1.9%). The median sample size (i.e., number of administrative units) was 390, although there was a great deal of variability (IQR = 5671.8) and two articles did not report sample sizes [29,30]. The samples ranged from as low as 18 neighbourhoods in one article which examined fast food outlet availability in Cologne, Germany [31] to as high as 65,174 in a recent article which considered the availability of supermarkets, grocery and convenience stores across census tracts for the whole of the US [32]. Just under one third of the articles (n = 16; 29.6%) involved studies of more than 1000 neighbourhoods and the majority of these (n = 12) were national, or urban national studies, while the others (n = 4) were US city or county studies.
Included articles considered a wide variety of food outlet types, such as fast food outlets, supermarkets or grocery stores (typically defined as smaller supermarkets and / or non-chain supermarkets), convenience stores, green grocers, cafés, specialty food stores (e.g., meat markets, fishmongers), and delicatessens. Of these, the most commonly considered outlet types were supermarkets and fast food outlets, with some analyses considering the distribution of both outlet types.
Although the primary purpose of this review was to examine the statistical techniques employed, we have highlighted the key study findings in Table 1. As in other systematic reviews [10,15,16], findings relating to the distribution of supermarkets and grocery stores by neighbourhood-level SES were mixed while results relating to fast food outlet distribution were more consistent, particularly in the US, with greater availability in low SES areas. Findings from the studies which examined the distribution of other food store types varied (Table 1).

Number of available food outlets
The most common type of outcome considered was the number of available food outlets within an administratively defined neighbourhood or a pre-specified buffer distance of the neighbourhood centroid, either geometric or population-weighted centroid [33]; 43 (79.6%) of the 54 articles considered this measure. These outcomes are counts, as they can only be zero or positive whole numbers and, depending on the type of food outlet considered, potentially feature skewed distributions. For example, if the outcome is major fast food chain outlets within a small administrative unit the distribution is likely to be positively skewed, and potentially zero-inflated (i.e., containing more zero values than a given distribution would predict), as many neighbourhoods will have only a small number of outlets while fewer neighbourhoods will have a large number. Thus, statistical approaches such as standard or zero-inflated Poisson and negative binomial regression, which are equipped to deal with distributions of this nature, are likely to be the most appropriate to use for this type of outcome.
The statistical methods adopted in the 43 articles which considered the number of outlets as an outcome are summarised in Table 2. Of these articles, only one third (n = 14) accounted for the fact that the outcome was a count through the use of Poisson [34][35][36] or negative binomial regression [9,32,37-41], Poisson multilevel regression [42] or generalised estimating equations [43], generalised additive models with Poisson errors [44], or a spatial scan statistical approach assuming a Poisson distribution [45]. Negative binomial regression is preferable to Poisson regression when the data are over-dispersed (i.e., when the variance is greater than the mean) as an assumption of the Poisson distribution is that the variance equals the mean. The negative binomial regression has an additional parameter which is able to deal with over-dispersed data and is often useful when the data are zero-inflated as can be the case in analyses of food outlet data. Of the analyses that assumed a Poisson distribution, two [36,43] mentioned examining whether or not over-dispersion was present, finding no evidence of over-dispersion.
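As a rough first check of the equal mean and variance assumption described above, the variance-to-mean ratio and the proportion of zeros can be inspected before choosing between Poisson and negative binomial regression. The sketch below uses hypothetical counts and a hypothetical helper name; it is a descriptive screen only, not a formal over-dispersion test and not the procedure used in any reviewed article.

```python
from statistics import mean, variance

def dispersion_summary(counts):
    """Descriptive screen for over-dispersion and excess zeros in count data."""
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; the ratio is undefined")
    v = variance(counts)  # sample variance (n - 1 denominator)
    return {
        "mean": m,
        "variance": v,
        "variance_to_mean": v / m,  # about 1 under a Poisson assumption
        "prop_zero": counts.count(0) / len(counts),
    }

# Hypothetical fast food outlet counts for ten small neighbourhoods.
outlet_counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 12]
summary = dispersion_summary(outlet_counts)
print(summary)
```

A ratio well above 1, as in this made-up example, would suggest that a negative binomial model deserves consideration; a ratio computed on the raw outcome ignores covariates, so model-based diagnostics would still be needed.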
Other commonly used techniques which considered the outcome as a linear response variable included the one-way ANOVA or MANOVA (10 studies, 23.3%) or linear regression, whether single-level, multilevel or multivariate (6 studies, 14.0%). These techniques all assume that the residuals are normally distributed with mean zero and constant variance. In addition, these techniques assume that the observations are independent, apart from multilevel models which account for clustering in the data. Few studies mentioned considering the distributional assumptions in the analysis. Of the 16 studies, one log-transformed the outcome due to the skewed nature of the distribution [5] and one mentioned using the Kolmogorov-Smirnov test to determine if the assumption of normality was valid for their outcome variable, finding it to be reasonable [46]. Another article, while not discussing assessment of the outcome distribution, mentioned that the data were zero-inflated and thus presented a logistic regression analysis of the presence or absence of the outlet type in the neighbourhood [47]. However, a one-way ANOVA was used for the assessment of the association between the number of food outlets and neighbourhood-level SES. While ANOVA and linear regression can be robust to deviations from normality, the distribution of the number of food outlets would be more suitably dealt with using a method designed to deal with count data. Perhaps one reason for the use of these approaches is that the authors typically converted the food outlet outcome to a rate, either the number per 1,000 or 10,000 individuals, or the number per square mile or kilometre, thus converting a count outcome into a continuous variable prior to fitting the model. However, both Poisson and negative binomial regression are able to model rates by incorporating the log of population size or area as an offset in the model. 
Furthermore, rates are never negative and while techniques such as linear regression can yield expected values that are negative, those based on Poisson or negative binomial regression do not.
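The role of the offset can be written out explicitly. The following is a sketch under assumed notation that does not appear in the original text: Y_i is the outlet count in neighbourhood i, N_i its population (or area), SES_i a single neighbourhood-level SES covariate, and mu_i the expected count.

```latex
\begin{align}
Y_i &\sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = \log N_i + \beta_0 + \beta_1\,\mathrm{SES}_i \\
\log\!\left(\frac{\mu_i}{N_i}\right) &= \beta_0 + \beta_1\,\mathrm{SES}_i \\
\frac{\mu_i}{N_i} &= \exp\!\left(\beta_0 + \beta_1\,\mathrm{SES}_i\right) > 0
\end{align}
```

Because the coefficient on log N_i is fixed at one, the regression coefficients describe the rate per unit of population or area, and the exponential guarantees that the fitted rate is positive.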

Spatial autocorrelation
Only 5 (11.6%) of the 43 articles tested for evidence of spatial autocorrelation (Table 2); all used Moran's I. Values of spatial autocorrelation from Moran's I range from -1 to 1, with 0 indicating no correlation. Z-scores can be calculated for Moran's I values to determine whether or not there is evidence of spatial autocorrelation. However, evidence of spatial autocorrelation can also be determined using permutation tests which provide pseudo significance levels (i.e., pseudo p-values). These are classified as 'pseudo' since the significance is dependent on the number of permutations adopted. Permutation tests can be useful when assumptions underlying Moran's I tests, such as normality, are not appropriate.
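A minimal sketch of Moran's I with a permutation-based pseudo p-value is given below. The function names, the one-sided test for positive autocorrelation, the binary weights and the toy data are all assumptions made for illustration; dedicated spatial software would normally be used in practice.

```python
import random

def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    mean_val = sum(values) / n
    dev = [v - mean_val for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

def permutation_pseudo_p(values, weights, n_perm=999, seed=0):
    """One-sided pseudo p-value: share of random relabellings with I >= observed."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            at_least_as_large += 1
    # The smallest attainable value is 1 / (n_perm + 1), hence 'pseudo'.
    return observed, (at_least_as_large + 1) / (n_perm + 1)

# Hypothetical example: six neighbourhoods in a row, each adjacent to the next.
w = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
outlets = [1, 2, 3, 4, 5, 6]
observed, pseudo_p = permutation_pseudo_p(outlets, w)
print(f"Moran's I = {observed:.2f}, pseudo-p = {pseudo_p:.3f}")
```

With 999 permutations the pseudo p-value cannot fall below 0.001, which illustrates why the reported significance depends on the number of permutations adopted.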
Four articles examined spatial autocorrelation in the food outlet outcome variable(s) [31,48-50] while one assessed residual spatial autocorrelation [43]. Of those that examined spatial autocorrelation in the outcome variable, one found no evidence of spatial autocorrelation but did not report estimates [31]. Another found no evidence of spatial autocorrelation (correlation = -0.04, pseudo-p = 0.46-0.49) in the number of grocery stores per acre, weak evidence of positive spatial autocorrelation of 0.11 (pseudo-p = 0.07-0.08) in the number of fast food outlets per acre, and evidence at the 5% significance level of positive spatial autocorrelation of 0.19 (pseudo-p = 0.01-0.02) in the number of convenience stores per acre [50]. In the third article, the authors reported evidence of spatial autocorrelation of 0.72 (z = 28.36) in the number of supermarkets within 1000m [48]. The fourth article reported positive spatial autocorrelation of between 0.25 and 0.62 (z = 5.66-13.53) for the number of supermarkets, 0.30 to 0.66 (z = 6.94-14.95) for the number of fast food restaurants and 0.29 to 0.41 (z = 6.32-9.73) for the number of convenience stores depending on the buffer size used to define the neighbourhood, with correlation increasing as the buffer increased from 1 to 5 miles [49]. Two of the articles which found evidence of spatial autocorrelation in the outcome did not account for this in the analysis or test for evidence of residual spatial autocorrelation after examining the associations with neighbourhood-level SES [48,49]. Thus, the results from the analyses may have been affected if residual spatial autocorrelation remained. The third article which found evidence of spatial autocorrelation in the outcome conducted bivariate spatial autocorrelation analyses of the food outlet outcome alongside neighbourhood-level SES in order to determine associations [50].
However, the analytical results presented were based on the use of MANOVA which does not take into account the spatial location of the neighbourhoods. Lisabeth et al. [43] examined residual spatial autocorrelation after fitting a multivariate Poisson regression using generalised estimating equations to deal with the clustering of the different stores within census tracts and found no evidence of residual spatial autocorrelation (all p-values > 0.37). However, the authors stated that evidence of residual spatial autocorrelation was identified when using buffer sizes to define neighbourhoods rather than census tracts and thus the estimates of the standard errors from that analysis were not valid. No attempt was made to incorporate the spatial information about the data to account for this residual spatial autocorrelation.

Spatial methods
Although only 5 articles explicitly tested for spatial autocorrelation in the analysis, others incorporated information about the spatial location of the data in different ways. For example, three articles considered the clustering of small administratively defined neighbourhoods within larger area level definitions, such as local authorities or counties, using multilevel modelling [42,51] or clustered standard errors [32]. While these methods deal with the grouping of neighbourhoods, they do not explicitly model spatial location: a neighbourhood may have similar observations to those adjacent to it, yet adjacent neighbourhoods at the edge of a local authority or county will fall into different groups. One analysis included the spatial location as a covariate in the analysis in order to potentially account for any spatial autocorrelation [44]. Residual spatial autocorrelation was not examined in any of these articles.
Only one of the 43 articles adopted a spatial analytical technique to examine associations between the number of food outlets and neighbourhood-level SES. In this analysis, Baker et al. adopted a spatial scan approach in which a circular window of a pre-defined radius is moved across the map to test the null hypothesis that the rate of food outlets is the same in all of the windows, assuming a Poisson distribution for the outcome variable [45]. This technique identifies clusters in which higher or lower rates are observed than expected and adjustment for neighbourhood-level SES can be examined to determine if this explains these clusters.
A small number of studies mention the lack of consideration of spatial autocorrelation in the study limitations [36,44,52].

Distance to the nearest food outlet
Fourteen (25.9%) of the 54 articles considered distance to the nearest food outlet as the accessibility measure, shown in Table 3. Although Hurvitz et al. considered this outcome in addition to the density of outlets, no formal statistical analysis was conducted of the association between the distance and neighbourhood-level SES [38]. Of the fourteen articles, eleven (78.6%) feature in Table 2 as these studies also considered the number of outlets as an outcome measure. Ten of these used the same statistical methods for both the count measure and the distance measure. The most common techniques used were the one-way ANOVA (4 articles, 28.6%) or linear regression, including multivariate linear regression (4 articles, 28.6%). Although these techniques are perhaps more appropriate for distance measures, it is possible that these types of measures could be skewed. Most articles did not mention any assessment of the shape of the distribution or examination of model residuals. In one article, the distance outcome was log-transformed to obtain a normally distributed outcome variable [40]. Another analysis, although using ANOVA, reported median distances suggesting that the data were skewed [53].

Spatial autocorrelation
Four (28.6%) of the 14 articles examined spatial autocorrelation using Moran's I [7,31,48,49] (Table 3), three of which also assessed spatial autocorrelation for the count of food outlets (Table 2). Three articles only considered spatial autocorrelation in the outcome [31,48,49], while one examined residual spatial autocorrelation [7]. Considering spatial autocorrelation in the food outlet distance outcome, one article found evidence of positive spatial autocorrelation of 0.54 (z-score = 21.68) in the distance to the nearest supermarket [48]. Another found varying degrees of positive spatial autocorrelation dependent on the outlet type, ranging from 0.20 (z-score = 4.51) for distance to the nearest fast food restaurant to 0.70 (z-score = 15.17) for distance to the nearest mass merchandiser. The spatial autocorrelations for the distance to the nearest supermarket and nearest grocery store were 0.50 (z-score = 10.41) and 0.61 (z-score = 13.57), respectively [49].
Although these analyses found evidence of spatial autocorrelation in the outcome, neither tested for residual spatial autocorrelation when modelling associations with neighbourhood-level SES, nor took the spatial location into account in the analysis. Schneider and Gruber mentioned that they found no evidence of spatial autocorrelation in the outcome, although they did not explicitly mention testing this for the distance accessibility measure, only the count measure [31]. Zenk et al. found evidence of residual spatial autocorrelation (Moran's I = 0.008, p < 0.001) after fitting ordinary least squares regression and thus used a moving average spatial regression analysis to account for any spatial autocorrelation present in the residuals [7].
Moving average spatial regression, unlike ordinary least squares regression, allows for spatial autocorrelation in the residual terms by taking the spatial location of the neighbourhoods into account. This form of spatial regression considers the influence of local neighbours; that is, it is assumed that observations in one neighbourhood are directly influenced by observations in the closest neighbourhoods but not in the neighbourhoods beyond. In order to fit a moving average spatial regression, it is necessary to define a neighbours matrix to describe the spatial relationships in the data. If, for example, the study region involves 100 administrative units, the neighbours matrix will be a square matrix with 100 rows and 100 columns to represent all of these units. The diagonal entries of the matrix will equal zero as administrative units cannot neighbour themselves. If two administrative units are neighbours then an entry of 1 will be included in the matrix; an entry of 0 indicates that the two units are not neighbours. Commonly, two administrative units are defined as neighbours if they share a common boundary. Alternatively, neighbours could be defined according to distance measures (e.g., defining areas to be neighbours if the distance between the administrative unit centroids is less than 2 km). Zenk et al. did not describe how the neighbours matrix was created but mentioned that accounting for the spatial structure of the data using moving average spatial regression resulted in no remaining residual spatial autocorrelation.
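The two neighbour definitions described above can be sketched as follows. The function names, the list of shared boundaries and the centroid coordinates are hypothetical; in practice the boundary information would come from GIS software rather than being typed by hand.

```python
import math

def contiguity_matrix(n_units, shared_boundaries):
    """Binary neighbours matrix from pairs of units that share a boundary."""
    w = [[0] * n_units for _ in range(n_units)]
    for i, j in shared_boundaries:
        if i != j:  # a unit cannot neighbour itself
            w[i][j] = 1
            w[j][i] = 1
    return w

def distance_neighbours(centroids_km, threshold_km):
    """Binary neighbours matrix: 1 if centroids are closer than the threshold."""
    n = len(centroids_km)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centroids_km[i], centroids_km[j]) < threshold_km:
                w[i][j] = w[j][i] = 1
    return w

# Hypothetical study region with four administrative units in a row.
w_boundary = contiguity_matrix(4, [(0, 1), (1, 2), (2, 3)])
centroids = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (5.0, 0.0)]  # coordinates in km
w_distance = distance_neighbours(centroids, threshold_km=2.0)
print(w_boundary)
print(w_distance)
```

In this toy region the two definitions disagree: units 2 and 3 share a boundary but their centroids are 2.5 km apart, so they are neighbours under contiguity and not under the 2 km rule. This is one reason why reporting how the matrix was built matters.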

Spatial methods
None of the other eight articles considered spatial autocorrelation or spatial analytical methods, although, as with the count outcome, one did examine whether including clustered standard errors affected the results, reporting them to be similar to the results without clustered standard errors [52].

Alternative food outlet accessibility measures
Of the eight articles not discussed in sections 3.2 and 3.3, three considered travel time in minutes to the nearest food outlet [8,54,55]. In each study, the authors acknowledged that the travel times were skewed and, thus, not normally distributed. One analysis adopted linear regression [55] and two used Spearman's rank order correlation [8,54]. None of these articles mentioned which statistical software package they used in the analysis or discussed spatial autocorrelation.
One article considered two binary outcomes (the presence or absence of fast food outlets within 500m, or of supermarkets within 800m, from the geometric centroid of each census block) and fitted logit models in Stata to examine associations with neighbourhood-level SES [56]. Another considered travel times from each census block to the nearest supermarket or fast food outlet and then categorised each census block as either having a shorter time to a supermarket, a shorter time to a fast food outlet, or the same time to each outlet [57]. Using categories rather than actual distance values led to a loss of information about the magnitude of the differences in distance, making it difficult to determine how access to these outlet types could differ. To examine associations between neighbourhood-level SES and these categories of access, the authors fitted a one-way ANOVA of continuous SES score (Socioeconomic Index for Areas, SEIFA).
In a third article, a composite measure of food outlet access was derived by assigning neighbourhoods with a score of 1 for each of three different healthy and three different unhealthy outlets if located within a quarter mile network area [58]. Thus, each neighbourhood would have a score between 0 and 3 for healthy outlets and for unhealthy outlets. This measure is limited in that the scores do not take into account the number of outlets within a neighbourhood (e.g., a score of one is assigned to the neighbourhood regardless of whether it has one supermarket or ten within a quarter mile). The healthy outlet score was subtracted from the unhealthy outlet score to give a range of scores from -3 to 3 for the neighbourhoods considered. This outcome is difficult to interpret given that, for example, a score of zero for neighbourhoods which have neither healthy nor unhealthy outlets within a quarter mile cannot be distinguished from a score of zero for neighbourhoods which have three healthy and three unhealthy outlet types. The association between the outlet score and neighbourhood-level SES was assessed using one-way ANOVA in SAS, although the authors did not mention assessing the shape of the outcome distribution. None of these three studies assessed or mentioned the possibility of spatial autocorrelation in their data.
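The interpretability problem with this composite measure can be illustrated with a short sketch. The function name and the outlet counts are hypothetical; only the scoring rule (one point per outlet type present, healthy score subtracted from unhealthy score) follows the description above.

```python
def composite_score(healthy_counts, unhealthy_counts):
    """Unhealthy minus healthy score; each outlet type scores 1 if present at all."""
    healthy = sum(1 for c in healthy_counts if c > 0)
    unhealthy = sum(1 for c in unhealthy_counts if c > 0)
    return unhealthy - healthy

# Hypothetical counts of three healthy and three unhealthy outlet types
# within a quarter mile network area.
no_outlets = composite_score([0, 0, 0], [0, 0, 0])
all_types = composite_score([1, 4, 2], [3, 1, 10])
one_supermarket = composite_score([1, 0, 0], [0, 0, 0])
ten_supermarkets = composite_score([10, 0, 0], [0, 0, 0])
print(no_outlets, all_types, one_supermarket, ten_supermarkets)
```

A neighbourhood with no outlets and one with every outlet type both score zero, and one supermarket scores the same as ten, which is the loss of information noted in the text.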
The other two studies adopted spatial analytical approaches in the analysis [30,59]. Dai and Wang used a spatial lag model to examine the distribution of weight scaled food outlet accessibility measures (using weights of 0-10 based on name recognition) by neighbourhood-level SES variables [59]. A spatial lag model incorporates a weighted average of the outcome values of neighbouring observations into a regression model to remove any residual spatial autocorrelation. In a spatial lag model, a neighbours matrix (as described previously) is required. The spatial lag term is created by multiplying the neighbours matrix (typically standardised so that the sum of each row is equal to one) by the outcome variable (i.e., the food outlet outcome). Typically, many of the terms in the neighbours matrix are 0 as most neighbourhoods do not neighbour one another. Therefore, for each neighbourhood, the spatial lag term is the weighted average of the observations in the immediately surrounding neighbourhoods. An alternative to the spatial lag model is the spatial error model (although it is possible to incorporate both a spatial lag and a spatial error term in a model) which takes into account the location of observations by modelling the correlation in the error term. In the absence of any clear view as to which is the more appropriate structure to model, model comparison techniques can be adopted to aid in deciding which captures the underlying spatial structure of the data [60,61].
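The construction of the spatial lag term can be sketched in a few lines. The function names, the toy neighbours matrix and the outcome values are assumptions for illustration; the sketch only builds the lag term and does not estimate a spatial lag model, which requires specialised estimation methods.

```python
def row_standardise(w):
    """Scale each row of a neighbours matrix to sum to one (rows with no neighbours stay zero)."""
    standardised = []
    for row in w:
        total = sum(row)
        standardised.append([x / total if total else 0.0 for x in row])
    return standardised

def spatial_lag(w, y):
    """Weighted average of the outcome in each unit's neighbours (row-standardised W times y)."""
    ws = row_standardise(w)
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in ws]

# Hypothetical example: four neighbourhoods in a row, each adjacent to the next.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
outlets = [2, 4, 6, 8]
lag = spatial_lag(w, outlets)
print(lag)  # [4.0, 4.0, 6.0, 6.0]
```

For the second neighbourhood, for instance, the lag is the average of its two neighbours' values, (2 + 6) / 2 = 4. In a spatial lag model this term enters the regression as an additional predictor with its own coefficient.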
Lee and Lim adopted a more complex food outlet accessibility measure by deriving a discrepancy index, in which they calculated the expected demand for an outlet and divided this by the observed number of outlets in the neighbourhood [30]. A ratio of 1 indicates that there are sufficient outlets in the neighbourhood, while <1 indicates that there is an over-supply in the neighbourhood, and >1 indicates that the demand is greater than the supply. The authors used the G-statistic to examine the spatial distribution of the outcome. The G-statistic aids in the identification of clusters by testing the null hypothesis that there is no clustering of the variable of interest (the discrepancy index in this case); that is, that there is complete spatial randomness in the distribution of the variable. The statistic takes high values where higher values of the observations cluster and low values where lower values of the observations cluster [62].
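As an illustration only, the sketch below computes a global statistic of this kind, assuming the Getis-Ord General G form with binary neighbour weights; the variant and weighting scheme used in [30] are not stated here, and the discrepancy index values are hypothetical. The observed value is compared with its expectation under spatial randomness; a formal test would also need the variance of G, which is omitted.

```python
def general_g(values, weights):
    """Getis-Ord General G for non-negative values and a weights matrix.

    Returns (observed G, expected G under spatial randomness).
    """
    n = len(values)
    numerator = 0.0
    denominator = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # pairs of an area with itself are excluded
            numerator += weights[i][j] * values[i] * values[j]
            denominator += values[i] * values[j]
            weight_sum += weights[i][j]
    return numerator / denominator, weight_sum / (n * (n - 1))


# Four hypothetical neighbourhoods arranged in a line: 0 - 1 - 2 - 3.
neighbours = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10, 9, 1, 1]   # high index values sit next to each other
dispersed = [10, 1, 9, 1]   # high index values are separated

g_clustered, g_expected = general_g(clustered, neighbours)
g_dispersed, _ = general_g(dispersed, neighbours)
print(g_clustered, g_dispersed, g_expected)
```

In this toy layout the statistic exceeds its expectation when the high values are adjacent and falls below it when they are separated.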

Summary
Only five articles (9.3%) included in this review adopted a spatial statistical technique in the analysis of the equity of access to food outlets, each using a different technique. These methods were: moving average spatial regression [7], spatial scan statistic [45], G-statistic [30], spatial lag model [59], and bivariate spatial autocorrelation assessment [50]. A sixth study incorporated the spatial location of neighbourhoods in a regression model [44]. Seven (13.0%) of the 54 studies tested for spatial autocorrelation, while only a further three mentioned spatial autocorrelation at all.

i) Supermarkets more common in advantaged neighbourhoods; ii) Grocery stores more common in deprived neighbourhoods; iii) Specialty stores more common in more deprived neighbourhoods but no evidence of a difference in multivariate analysis; iv) Convenience stores more common in more advantaged neighbourhoods. i) Evidence of an association between number of restaurants and deprivation but no clear trend. Highest access in second most affluent area. Second most affluent area has greater odds of having a restaurant than middling and deprived areas. ii) No evidence of a difference in fast food outlet number by deprivation, or in odds of having a fast food outlet. iii) No evidence of a difference in number of cafés by deprivation. Odds of the presence of a café are lower in the second most deprived quintile than the second most affluent. iv) Evidence of an association between deprivation and the number of takeaways but no clear trend. Highest access in second most affluent area. Lower odds of having a takeaway outlet present in the most affluent quintile than the second most affluent.
Meltzer i) Supermarkets (chain); ii) Grocery stores; iii) Convenience stores; iv) Convenience stores with gas stations; v) Specialty food stores (meat markets, fruit and vegetable markets); vi) Full-service restaurants (including cafeterias); vii) Fast food outlets (chain and non-chain); viii) Fast food outlets (chain); ix) Carryout eating places (non-chain delicatessens, bagel or sandwich shops); x) Carryout specialty items (smoothie shops, espresso bars, specialise in one type of food); xi) Bars/taverns.
Median value for homes. Site-specific quintiles of wealth were averaged to create a measure of relative wealth.
i) Supermarkets more prevalent in less deprived areas but no clear trend; ii) Grocery stores more prevalent in more deprived areas; iii) No clear evidence of a difference in convenience stores by deprivation; iv) More convenience stores with gas stations in middling deprivation areas compared to high deprivation areas; v) No clear evidence of a difference in specialty food stores by deprivation; vi) No clear evidence of a difference in full-service restaurants by deprivation; vii) & viii) No clear evidence of a difference in fast-food outlets by deprivation; ix) No clear evidence of a difference in carryout outlets by deprivation; x) No clear evidence of a difference in specialty carryout outlets by deprivation; xi) Lower numbers of bars/taverns in the two most affluent quintiles than the least affluent neighbourhoods. Pearce (bottom quintile), middle (middle three quintiles), and high (top quintile).
chain supermarkets than middle income areas; ii) Low income areas have more non-chain supermarkets than middle income areas; iii) Low income areas have more grocery stores and high income areas have fewer grocery stores than middle income areas; iv) Low income areas have more convenience stores and high income areas have fewer convenience stores than middle income areas. Reidpath, 2002 [77] Melbourne, Australia Neighbourhood poverty. Dichotomised into >20% or ≤20% of population below the federal poverty level.
Neighbourhood minority. % of non-Hispanic white race/ethnicity categorised as low/medium/high but unclear how grouped.
Created a categorical variable: low poverty/low minority, high poverty/low minority, low poverty/medium minority, high Findings were mixed. Descriptive data shows: i) In general, more fast food outlets in high-poverty compared to low-poverty areas, apart from in high-density urban medium-minority areas; ii) In general, more grocery stores/supermarkets in high-poverty compared to low-poverty areas, apart from in non-urban medium-minority areas and high-density high-minority areas; iii) Mixed findings for convenience stores depending on urban density and minority. For example, in both high-density urban areas, areas % of the population with income less than 100% of the federal poverty level. Dichotomised into low income (16.0-63.4%; n=24) and high income (0-15.2%; n=24).

AIMS Public Health
i) Higher number of food stores in low income areas. No test for evidence of a difference; ii) A higher proportion of stores are SNAP accepting in low income compared to high income areas; iii) Proportion of SNAP accepting supermarkets greater in high income areas; iv) Proportion of SNAP accepting grocery stores was greater in low income neighbourhoods compared to high income neighbourhoods; v) Greater number of SNAP accepting convenience stores in low income neighbourhoods but no evidence of a difference; vi) A higher proportion of 'other' stores in low income areas were SNAP accepting than in high income areas. Split into three groups: low deprivation (lowest quartile of weighted and standardised deprivation scores), medium deprivation (middle two quartiles), and high deprivation (highest quartile).
i) High deprivation neighbourhoods had lower distance to nearest fast food outlet than low deprivation areas. The number of fast food outlets within three miles was higher in high deprivation neighbourhoods than in low deprivation areas. There was no evidence of an association between deprivation and the number of outlets within one mile. ii) High and medium deprivation neighbourhoods had lower distance to the nearest fast food opportunity than low deprivation areas. High deprivation neighbourhoods had more fast food opportunities within one and three miles than low deprivation areas. iii) High deprivation neighbourhoods had lower distance to the nearest fast food opportunity with healthier entrees than low deprivation areas. High deprivation areas had higher numbers of opportunities with healthier entrees within one and three miles than low deprivation areas. iv) High deprivation neighbourhoods had lower distance to the nearest fast food opportunity with healthier side dishes than low deprivation areas. High deprivation areas had higher numbers of opportunities with healthier side dishes within one and three miles than low deprivation areas. Smith, 2010 [8]

Moving average spatial regression 1 [7] S+SpatialStats (1) 1 1

* Articles can appear in more than one category. Numbers excluded by full article represent primary exclusion reason. ** This includes shelf-space/display, produce availability, price, quality and marketing

AIMS Public Health, Volume 2, Issue 3, 358-401

Discussion
A number of systematic reviews have considered the evidence supporting inequities in access to food outlets. While these reviews discussed differences between studies in terms of the definitions of access, neighbourhood SES and the neighbourhood boundaries or buffers adopted, none explicitly examined the statistical methodologies employed.
Our review has shown that a variety of methods have been employed to examine the equity of access to food outlets by deprivation, with 17 analytical techniques used to determine associations between the number of food outlets and neighbourhood-level SES and seven techniques used to test for associations between the distance to the nearest outlet and neighbourhood-level SES. It is not possible for us to determine how findings may have been affected by the analytical approach, as this will depend on a number of factors including, for example, the sample size of the study and the validity of the model assumptions. While the assumption of normality, and thus the use of linear regression, t-tests or ANOVAs, may be valid for large sample sizes, it is important to consider precisely what question is being asked and whether the approach utilised is appropriate [82]. In this area of research, the number of food outlets was commonly considered as an outcome variable. This is a count variable, taking only zero and positive integer values. Therefore, the normal distribution, which is symmetric about the expected value and allows negative as well as positive values, is not the most appropriate for dealing with data of this type. Count variables are more suited to analyses using Poisson or negative binomial regression. Although we focussed particularly on the treatment of the outcome variable in these analyses, it is worth noting that treatment of the exposure variable should not be overlooked. In particular, there is often a tendency to adopt arbitrary percentile categorisation of exposure variables (discussed elsewhere [83]).
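When choosing between Poisson and negative binomial regression for an outlet count, a simple first check is whether the counts are overdispersed, since the Poisson distribution assumes that the variance equals the mean. The sketch below uses hypothetical counts and a hypothetical helper name; it is a crude marginal check that ignores covariates and is no substitute for assessing dispersion in the fitted model.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean.

    Values near 1 are consistent with a Poisson distribution; values well
    above 1 suggest overdispersion, for which negative binomial regression
    is often considered.
    """
    return variance(counts) / mean(counts)


# Hypothetical numbers of food outlets in ten neighbourhoods.
outlet_counts = [0, 0, 0, 1, 1, 2, 3, 5, 8, 20]

index = dispersion_index(outlet_counts)
print(f"mean = {mean(outlet_counts)}, dispersion index = {index:.2f}")
```

Many zeros combined with a few large counts, as in this hypothetical example, give an index well above 1.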
When considering analyses of the availability of food outlets by small-area level deprivation, it is important to acknowledge that these studies involve spatial data and thus this feature should also be considered when determining the statistical approaches to employ in the analysis. Our systematic review has shown that this feature is infrequently considered in studies of the equity of access to outlets, with most relying on traditional regression techniques which assume that the residuals are independently distributed; an assumption which should be verified when dealing with spatial data. Thus, it was unclear whether residual spatial autocorrelation remained which could affect the inference from the models. Furthermore, studies which found evidence of spatial autocorrelation infrequently adopted spatial regression techniques to attempt to model the spatial autocorrelation. It therefore appears that there may be some confusion within this field of research about how spatial data can and should be dealt with in analyses. It is possible to draw on examples looking at the distribution of other facilities which have considered the spatial nature of the data [84,85].
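One common way to check for residual spatial autocorrelation is Moran's I, computed on the residuals of a fitted model. The sketch below is a minimal illustration using a hypothetical line of four neighbourhoods and made-up residuals; the function name is an assumption, and significance testing (for example by permutation) is omitted.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a neighbours (weights) matrix."""
    n = len(values)
    centre = sum(values) / n
    z = [v - centre for v in values]  # deviations from the mean
    cross = sum(
        weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n)
    )
    weight_sum = sum(sum(row) for row in weights)
    return (n / weight_sum) * cross / sum(d * d for d in z)


# Four hypothetical neighbourhoods arranged in a line: 0 - 1 - 2 - 3.
neighbours = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered_residuals = [1.0, 1.0, -1.0, -1.0]    # similar residuals adjacent
alternating_residuals = [1.0, -1.0, 1.0, -1.0]  # dissimilar residuals adjacent

print(morans_i(clustered_residuals, neighbours))
print(morans_i(alternating_residuals, neighbours))
print(-1 / (len(neighbours) - 1))  # expected value with no autocorrelation
```

Values above the expected value of -1/(n-1) point towards positive spatial autocorrelation and values below it towards negative autocorrelation.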
One potential reason for the lack of consideration of the spatial nature of the data, other than a possible unfamiliarity with the problems associated with ignoring spatial autocorrelation, may be the functionality of the software used to map data or the users' familiarity with the capabilities of this software. Typically, GIS software packages such as ArcGIS were adopted to map the data and determine the number of outlets within a given region, before transferring the data to a statistical software package to determine if neighbourhood-level SES was associated with the food outlet outcome measure. In transferring the data to the statistical software package, it is likely that the spatial aspects of the data were not retained for consideration in the analysis. Dealing with spatial data can be non-trivial. However, commonly used software packages, such as SAS [86], Stata [87,88] and R [89], offer options for conducting spatial analysis. Other spatial analytical software is available which could be used in studies of this nature. Notably, studies in this review which tested for spatial autocorrelation used either ArcGIS/ArcView or a specialist spatial analytical package, such as GeoDa [90] or S+SpatialStats [91], for this purpose. However, in those studies that employed GeoDa, other statistical software packages, such as SPSS or Stata, were used to test for associations between neighbourhood-level SES and the food outlet outcome, even though GeoDa does provide some options for regression models.
Another possible reason for not considering spatial autocorrelation or spatial regression techniques in these analyses may be the number of neighbourhoods considered in some studies. The larger studies discussed in this review consisted of several thousand observations, meaning that large neighbours matrices are required in order to determine the level of spatial autocorrelation or to fit spatial regression models. This can prove to be computationally intensive. However, various techniques have been proposed to deal with large spatial data sets, including techniques involving sparse matrix operations, in which only the non-zero elements of the neighbours matrix are stored [92,93]. Some studies may not have considered spatial autocorrelation or spatial techniques because the areas considered were not spatially contiguous. However, spatial neighbours do not necessarily have to be defined as those which share a common boundary; distance-based definitions of neighbours can be used, although this raises the question of what distance should be used.
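Both ideas, distance-based neighbours and storing only the non-zero elements, can be combined in a short sketch. The centroid coordinates, the distance threshold, and the function name are hypothetical, and the simple pairwise loop shown here would itself become slow for very large data sets, where spatial indexing is normally used.

```python
from math import hypot


def distance_neighbours(centroids, threshold):
    """Map each area index to {neighbour index: weight}.

    Two areas are neighbours when their centroids lie within `threshold`
    of each other; only these non-zero entries are stored.
    """
    sparse = {i: {} for i in range(len(centroids))}
    for i, (xi, yi) in enumerate(centroids):
        for j in range(i + 1, len(centroids)):
            xj, yj = centroids[j]
            if hypot(xi - xj, yi - yj) <= threshold:
                sparse[i][j] = 1.0
                sparse[j][i] = 1.0  # keep the matrix symmetric
    return sparse


# Hypothetical centroids (in miles) for four non-contiguous areas.
centroids = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 1.0)]
sparse = distance_neighbours(centroids, threshold=1.5)

stored = sum(len(row) for row in sparse.values())
print(sparse)
print(f"{stored} stored weights instead of {len(centroids) ** 2} matrix cells")
```

Changing the threshold changes which areas count as neighbours, which is why the choice of distance needs to be justified.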
There are a number of analytical techniques which can deal with spatial data. These include spatial regression techniques which are able to model associations between areal measures, such as the number of food outlets within a neighbourhood and neighbourhood-level SES, while accounting for the spatial nature of the data. One such approach is the moving average spatial regression described previously, adopted by Zenk et al. [7], which allows the spatial autocorrelation in the data to decline rapidly beyond direct neighbours [94]. In the spatial epidemiology or ecology literature dealing with areal data, conditional autoregressive (CAR) [95–97] or simultaneous autoregressive (SAR) [98–100] models are often used. Spatial autoregressive models expand on traditional regression models by creating a spatial dependence between the outcome observations (e.g., the number of food outlets) or the residuals at neighbouring locations through the use of a weighted neighbours matrix (described previously). This matrix specifies the strength of the interaction between the neighbouring units [28,101,102]. Choosing an appropriate spatial model to adopt in the presence of spatial autocorrelation can be challenging and requires some care [103].
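For orientation, commonly used textbook forms of these models are sketched below. The notation is an assumption and is not taken from the reviewed studies: $y$ is the outcome vector, $X$ the covariate matrix, $W$ the (row-standardised) neighbours matrix, $\varepsilon$ independent errors, and $\rho$, $\lambda$, $\theta$ spatial parameters. Individual studies and software packages may parameterise the models differently.

```latex
\begin{align*}
\text{Spatial lag (SAR lag):} \quad
  & y = \rho W y + X\beta + \varepsilon \\
\text{Spatial error (SAR error):} \quad
  & y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
\text{Spatial moving average:} \quad
  & y = X\beta + u, \qquad u = \varepsilon + \theta W \varepsilon \\
\text{Conditional autoregressive (CAR):} \quad
  & u_i \mid u_{j \neq i} \sim
    N\!\Big(\textstyle\sum_{j \neq i} c_{ij} u_j,\; \tau_i^2\Big)
\end{align*}
```

In the moving average form the error covariance is proportional to $(I + \theta W)(I + \theta W)^{\top}$, so dependence does not extend beyond second-order neighbours, whereas in the autoregressive forms it propagates, with diminishing strength, across the whole study area.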
Other approaches, such as the spatial scan statistic or the G-statistic, are useful for detecting clusters of higher or lower availability of food outlets. Alternative clustering techniques have been proposed in other food environment literature, such as the bivariate K-function [104]. However, this approach has received criticism as to its appropriateness in built environment studies [105].
Clearly, the technique to employ depends on the research question being posed and the underlying nature of the spatial data. Dealing with spatial data is by no means trivial. Therefore, care should be taken to ensure the validity of the assumptions imposed by the modelling approach adopted.

Limitations of the review
Our search strategy was limited to articles published in the English language and, thus, may not have included all relevant papers. While it is beyond the scope of this review to discuss in depth the numerous spatial analytical approaches available, we hope that highlighting possible approaches to account for the spatial nature of the data aids future analyses in this field.

Conclusion
While researchers continue to explore the impact of the neighbourhood environment on disadvantaged groups in society through the examination of the equity of access to food outlets, it is important to highlight that results may differ depending on the analytical approach adopted, particularly given the spatial nature of the data. While much detail is usually provided on the data collection and mapping using GIS software, the description of statistical procedures is often brief and lacks sufficient information. It is recommended that future studies consider the validity of the assumptions underlying the analytical approach adopted and assess the residual spatial autocorrelation following standard modelling, adopting spatial analysis techniques where appropriate.