Abstract
Popular spatial autocorrelation (SA) indices employed in spatial econometrics include the Moran Coefficient (MC), the Geary Ratio (GR), and the join count statistics (JCS). Properties of the first two quantities depend on the spatial weights matrix definition [e.g., binary 0–1 (rook or queen adjacencies), nearest neighbors, inverse inter-point distance, row standardized], which may cause confusion when comparing output from different software packages; to date, JCS calculations have used only binary 0–1 definitions. The MC and GR expected values for linear regression residuals also merit closer examination: although the mean and other details of the sampling distribution of the MC are well known, at least those for the GR are not. The (MC + GR) sum furnishes a potential diagnostic for georeferenced data normality, one that warrants much further explication and scrutiny. The Moran scatterplot is a widely used graphic tool for visualizing SA; this paper formally introduces its Geary scatterplot counterpart (which first appeared informally in 2019), together with some comparisons of the two. Meanwhile, established relationships between the JCS and both the MC and the GR also need additional inspection, especially in terms of their sampling variances. Preliminary analyses summarized in this paper also address derived asymptotic properties, as well as links with the single spatial autoregressive parameter of the simultaneous autoregressive (SAR; spatial error) and autoregressive response (AR; spatial lag) model specifications. This paper describes selected little-known features of these standard SA indices, furthering a better understanding of them and furnishing a more complete set of details about them.
Results from a myriad of empirical spatial economic landscapes [e.g., Puerto Rico, Jiangsu Province, Texas, Houston (Harris County), and the Dallas-Fort Worth (DFW) metroplex] and a variety of planar surface partitionings (including square and hexagonal tessellations, and randomly generated graphs) illustrate the highlighted theoretical and conceptual traits. These results include a corroboration of the contention in the literature that the MC aligns more closely with spatial autoregression, and the GR aligns more closely with geostatistics.
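The join count statistics and their link to the GR can be made concrete with a small, dependency-free Python sketch (the authors used SAS and R; this is not their code). It assumes the conventional binary 0–1 join count definitions and a symmetric rook adjacency matrix on a 3 × 3 square tessellation; `rook_adjacency`, `join_counts`, and `geary_ratio` are hypothetical helper names. Because \(\sum_i (x_i - \bar{x})^2 = n_1 n_0 / n\) for a 0/1 variable, the GR of a binary map can be rewritten in terms of the BW count alone.

```python
# Join count statistics (BB, BW, WW) for a 0/1 map under binary 0-1 rook adjacency,
# plus the Geary ratio computed two ways. Function names are illustrative only.


def rook_adjacency(rows, cols):
    """Binary 0-1 rook adjacency matrix C for a rows x cols square tessellation."""
    n = rows * cols
    c = [[0] * n for _ in range(n)]
    for r in range(rows):
        for k in range(cols):
            i = r * cols + k
            if r + 1 < rows:  # neighbor in the next row
                c[i][i + cols] = c[i + cols][i] = 1
            if k + 1 < cols:  # neighbor in the next column
                c[i][i + 1] = c[i + 1][i] = 1
    return c


def join_counts(x, c):
    """BB, BW, WW counts; the 1/2 factor counts each unordered join once (symmetric C)."""
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    bb = 0.5 * sum(c[i][j] * x[i] * x[j] for i, j in pairs)
    bw = 0.5 * sum(c[i][j] * (x[i] - x[j]) ** 2 for i, j in pairs)
    ww = 0.5 * sum(c[i][j] * (1 - x[i]) * (1 - x[j]) for i, j in pairs)
    return bb, bw, ww


def geary_ratio(y, c):
    """GR = (n - 1) * sum_ij c_ij (y_i - y_j)^2 / (2 * S0 * sum_i (y_i - ybar)^2)."""
    n = len(y)
    s0 = sum(map(sum, c))
    ybar = sum(y) / n
    num = sum(c[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n))
    return (n - 1) * num / (2 * s0 * sum((v - ybar) ** 2 for v in y))


c = rook_adjacency(3, 3)
x = [1, 0, 1, 0, 1, 0, 1, 0, 1]  # checkerboard: every join is black-white
bb, bw, ww = join_counts(x, c)
n, s0, n1 = len(x), sum(map(sum, c)), sum(x)
n0 = n - n1
# For 0/1 data, sum (x_i - xbar)^2 = n1 * n0 / n, so GR = (n - 1) * n * BW / (S0 * n1 * n0)
gr_from_bw = (n - 1) * n * bw / (s0 * n1 * n0)
print(bb, bw, ww, round(geary_ratio(x, c), 4), round(gr_from_bw, 4))
```

For the checkerboard, BB = WW = 0 and BW = 12 (every one of the 12 rook joins links different colors), and both routes give GR = 1.8, signalling strong negative SA.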
Data availability
Data are publicly available via the US Census Bureau and other widely used data-source web pages.
Code availability
Computations were made using standard SAS and R routines.
Notes
Using matrix notation, the MC for vector Y is \(\frac{n}{\mathbf{1}^{\mathrm{T}}\mathbf{C}\mathbf{1}}\,\frac{\mathbf{Y}^{\mathrm{T}}(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n)\,\mathbf{C}\,(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n)\mathbf{Y}}{\mathbf{Y}^{\mathrm{T}}(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n)\mathbf{Y}}\), where n is the number of areal units, C is the n × n spatial weights matrix, 1 is an n × 1 vector of ones, I is the n × n identity matrix, and superscript T denotes the matrix transpose operator. Its GR parallel is \(\frac{n-1}{\mathbf{1}^{\mathrm{T}}\mathbf{C}\mathbf{1}}\,\frac{\mathbf{Y}^{\mathrm{T}}(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n)\,(\langle\mathbf{C}\mathbf{1}\rangle_{\mathrm{diag}}-\mathbf{C})\,(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n)\mathbf{Y}}{\mathbf{Y}^{\mathrm{T}}(\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n)\mathbf{Y}}\), where \(\langle\mathbf{C}\mathbf{1}\rangle_{\mathrm{diag}}\) denotes the diagonal matrix whose diagonal entries are the row sums \(\mathbf{C}\mathbf{1}\).
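A minimal sketch of these two matrix expressions in dependency-free Python (not the authors' SAS or R code), assuming a symmetric binary rook adjacency matrix C for a 3 × 3 grid. It uses the fact that \((\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n)\mathbf{Y}\) is simply the mean-centered vector, and it cross-checks the matrix form of the GR against the familiar scalar form \((n-1)\sum_{ij} c_{ij}(y_i-y_j)^2 / (2\,\mathbf{1}^{\mathrm{T}}\mathbf{C}\mathbf{1}\sum_i (y_i-\bar{y})^2)\), an equivalence that holds for symmetric C. All function names are hypothetical.

```python
# MC and GR following the matrix expressions above, with a scalar-form GR cross-check.
# Assumes a symmetric C (binary rook adjacency here); names are illustrative.


def rook_adjacency(rows, cols):
    """Binary 0-1 rook adjacency matrix C for a rows x cols square tessellation."""
    n = rows * cols
    c = [[0] * n for _ in range(n)]
    for r in range(rows):
        for k in range(cols):
            i = r * cols + k
            if r + 1 < rows:
                c[i][i + cols] = c[i + cols][i] = 1
            if k + 1 < cols:
                c[i][i + 1] = c[i + 1][i] = 1
    return c


def moran_geary(y, c):
    n = len(y)
    s0 = sum(map(sum, c))                      # 1^T C 1
    ybar = sum(y) / n
    z = [v - ybar for v in y]                  # (I - 11^T/n) Y
    ss = sum(v * v for v in z)                 # Y^T (I - 11^T/n) Y (the projection is idempotent)
    zcz = sum(z[i] * c[i][j] * z[j] for i in range(n) for j in range(n))
    d = [sum(row) for row in c]                # diagonal of <C1>_diag
    zlz = sum(d[i] * z[i] ** 2 for i in range(n)) - zcz   # z^T (<C1>_diag - C) z
    return n / s0 * zcz / ss, (n - 1) / s0 * zlz / ss


def geary_scalar(y, c):
    n = len(y)
    ybar = sum(y) / n
    num = sum(c[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n))
    return (n - 1) * num / (2 * sum(map(sum, c)) * sum((v - ybar) ** 2 for v in y))


c = rook_adjacency(3, 3)
y = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # smooth gradient across the 3 x 3 grid
mc, gr = moran_geary(y, c)
print(round(mc, 4), round(gr, 4), round(geary_scalar(y, c), 4))  # 0.5 0.3333 0.3333
```

For this smooth gradient the sketch yields MC = 0.5 and GR = 1/3, both signalling positive SA; a linear rescaling of Y leaves both indices unchanged.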
This database is informative partly because it underlines the role geographic resolution plays in SA assessment, illustrating that the county resolution tends to yield appealing results. Its numbers of areal units also span the primary range for most empirical examples. It includes the often-encountered striping/banding associated with attribute values bounded below by zero. One of its most attractive traits here is that it exemplifies a case for which the Geary scatterplot is preferable to its Moran scatterplot alternative.
\(H_0\colon \rho_{SA} = 0\) and \(H_A\colon \rho_{SA} \neq 0\), where \(\rho_{SA}\) denotes the population SA.
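One conventional way to evaluate these hypotheses for an observed attribute is a z-test built on the MC moments under the normality assumption, as given in the standard SA literature (e.g., Cliff and Ord 1981): \(E(\mathrm{MC}) = -1/(n-1)\) and \(\mathrm{Var}(\mathrm{MC}) = (n^2 S_1 - n S_2 + 3 S_0^2)/((n^2-1) S_0^2) - E(\mathrm{MC})^2\), with \(S_0 = \sum_i\sum_j c_{ij}\), \(S_1 = \tfrac{1}{2}\sum_i\sum_j (c_{ij}+c_{ji})^2\), and \(S_2 = \sum_i (c_{i\cdot}+c_{\cdot i})^2\). The dependency-free Python sketch below illustrates that textbook test under those assumptions; it is not the paper's procedure, regression residuals have different moments, and the function names are hypothetical.

```python
import math


def rook_adjacency(rows, cols):
    """Binary 0-1 rook adjacency matrix C for a rows x cols square tessellation."""
    n = rows * cols
    c = [[0] * n for _ in range(n)]
    for r in range(rows):
        for k in range(cols):
            i = r * cols + k
            if r + 1 < rows:
                c[i][i + cols] = c[i + cols][i] = 1
            if k + 1 < cols:
                c[i][i + 1] = c[i + 1][i] = 1
    return c


def moran_test_normal(y, c):
    """MC, E(MC), Var(MC) under normality, z-score, and two-sided p-value."""
    n = len(y)
    idx = range(n)
    s0 = sum(map(sum, c))
    s1 = 0.5 * sum((c[i][j] + c[j][i]) ** 2 for i in idx for j in idx)
    s2 = sum((sum(c[i]) + sum(c[k][i] for k in idx)) ** 2 for i in idx)
    ybar = sum(y) / n
    z = [v - ybar for v in y]
    mc = n / s0 * sum(z[i] * c[i][j] * z[j] for i in idx for j in idx) / sum(v * v for v in z)
    e_mc = -1.0 / (n - 1)
    var_mc = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - e_mc ** 2
    z_score = (mc - e_mc) / math.sqrt(var_mc)
    p_value = math.erfc(abs(z_score) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return mc, e_mc, var_mc, z_score, p_value


c = rook_adjacency(3, 3)
mc, e_mc, var_mc, z_score, p_value = moran_test_normal([1, 2, 3, 4, 5, 6, 7, 8, 9], c)
print(round(mc, 4), round(e_mc, 4), round(var_mc, 6), round(z_score, 3), round(p_value, 4))
```

For the 3 × 3 gradient map, Var(MC) = 0.053125 and z is about 2.71, so under these assumptions H0 would be rejected at the 5% level.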
Griffith (2003) presents the foundational theory of Moran eigenvector spatial filtering (MESF), the methodology used to construct eigenvector spatial filters (ESFs), with Griffith et al. (2019) furnishing its implementation treatment. In brief, MESF extracts eigenvectors from the doubly centered spatial weights matrix appearing in the MC numerator (these are map patterns representing a spectrum of distinct SA natures and degrees) and then uses these vectors as covariates in standard linear and generalized linear regression techniques to filter SA from residuals and transfer it to what becomes a nonconstant intercept term. This specification relates directly to auto-normal model specifications, and renders regression residuals that mimic being independent. It extends standard linear and nonlinear regression theory to geospatial data analysis while preserving the desired statistical parameter estimator properties of unbiasedness, efficiency, consistency, and sufficiency. The eigenvectors involved are mutually orthogonal and uncorrelated, which facilitates using automated stepwise vector selection to construct a given ESF. This selection problem is plagued by over- and under-fitting obstacles (addressable with cross-validation), as well as by computational complexity arising from screening nearly n eigenvectors. The user-friendly freeware SAAR (https://github.com/hyeongmokoo/SAAR; Koo et al. 2018) and ESF_Tool (https://github.com/esftool/esftool) implement this methodology, as does the spmoran R package.
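The core MESF step, extracting eigenvectors from the doubly centered matrix \(\mathbf{M}\mathbf{C}\mathbf{M}\) with \(\mathbf{M} = \mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n\), can be sketched in dependency-free Python. This is a minimal illustration, not the SAAR, ESF_Tool, or spmoran implementation: it uses a cyclic Jacobi eigen-solver (chosen here only to avoid external libraries), a 3 × 3 rook adjacency example, and hypothetical helper names. It checks the property that makes the eigenvectors interpretable as map patterns: for each eigenvector with nonzero eigenvalue \(\lambda\), its MC equals \(n\lambda/\mathbf{1}^{\mathrm{T}}\mathbf{C}\mathbf{1}\). The regression and stepwise selection steps are not shown.

```python
import math


def rook_adjacency(rows, cols):
    """Binary 0-1 rook adjacency matrix C for a rows x cols square tessellation."""
    n = rows * cols
    c = [[0] * n for _ in range(n)]
    for r in range(rows):
        for k in range(cols):
            i = r * cols + k
            if r + 1 < rows:
                c[i][i + cols] = c[i + cols][i] = 1
            if k + 1 < cols:
                c[i][i + 1] = c[i + 1][i] = 1
    return c


def double_center(c):
    """M C M with M = I - 11^T/n, via row, column, and grand means."""
    n = len(c)
    rm = [sum(row) / n for row in c]
    cm = [sum(c[i][j] for i in range(n)) / n for j in range(n)]
    g = sum(rm) / n
    return [[c[i][j] - rm[i] - cm[j] + g for j in range(n)] for i in range(n)]


def jacobi_eigen(a, tol=1e-18, max_sweeps=50):
    """Cyclic Jacobi eigen-decomposition of a symmetric matrix (pure Python)."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                cs = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * cs
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = cs * akp - sn * akq, sn * akp + cs * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = cs * apk - sn * aqk, sn * apk + cs * aqk
                for k in range(n):  # V <- V P accumulates eigenvectors as columns
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = cs * vkp - sn * vkq, sn * vkp + cs * vkq
    return [a[i][i] for i in range(n)], [[v[k][i] for k in range(n)] for i in range(n)]


def moran(y, c):
    n = len(y)
    ybar = sum(y) / n
    z = [u - ybar for u in y]
    zcz = sum(z[i] * c[i][j] * z[j] for i in range(n) for j in range(n))
    return n / sum(map(sum, c)) * zcz / sum(u * u for u in z)


c = rook_adjacency(3, 3)
n, s0 = len(c), sum(map(sum, c))
mcm = double_center(c)
vals, vecs = jacobi_eigen(mcm)
pairs = sorted(zip(vals, vecs), key=lambda pair: pair[0], reverse=True)
for lam, e in pairs:
    if abs(lam) > 1e-9:  # skip the null space, which contains the vector 1
        print(round(lam, 4), round(moran(e, c), 4), round(n * lam / s0, 4))
```

Sorting by eigenvalue orders the candidate map patterns from the strongest positive to the strongest negative SA, which is the ordering a stepwise selection routine would screen.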
Kelejian and Prucha (2001) extend this MC diagnostic test to limited dependent variable model specifications (e.g., Tobit and binomial/multinomial regression), as well as to the relatively popular spatial AR-SAR specification. Their work offers a blueprint for extending the GR test promoted in this paper to this same variety of residuals, particularly given the results reported in Griffith (2010).
Pre- and post-multiplying a spatial weights matrix by the projection matrix \((\mathbf{I}-\mathbf{1}\mathbf{1}^{\mathrm{T}}/n)\) creates double centering; see Borg and Groenen (2005, p. 262).
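A short dependency-free Python sketch of this double centering, computed both by explicit pre- and post-multiplication and by the equivalent row-mean, column-mean, and grand-mean shortcut. The four-unit strip and the function names are illustrative assumptions. Every row and column of the result sums to zero, so the vector 1 lies in its null space.

```python
# Double centering M C M with M = I - 11^T/n, by explicit products and by means.
# Function names are illustrative; plain lists keep the sketch dependency-free.


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def projection(n):
    """The projection matrix I - 11^T/n."""
    return [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]


def double_center(c):
    """Pre- and post-multiply C by the projection matrix."""
    m = projection(len(c))
    return matmul(matmul(m, c), m)


def double_center_by_means(c):
    """Equivalent shortcut: c_ij minus row mean i minus column mean j plus grand mean."""
    n = len(c)
    rm = [sum(row) / n for row in c]
    cm = [sum(c[i][j] for i in range(n)) / n for j in range(n)]
    g = sum(rm) / n
    return [[c[i][j] - rm[i] - cm[j] + g for j in range(n)] for i in range(n)]


# Rook adjacency for four areal units in a row (a 1 x 4 strip).
c = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
mcm = double_center(c)
print(max(abs(sum(row)) for row in mcm))  # zero up to floating-point error
```

The means-based shortcut avoids two matrix products, which matters when n is large.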
The diagonal of zeros is replaced by a diagonal of negative row sums; see Bapat (2010, Chapter 4).
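Read literally, this note describes the matrix \(\mathbf{C} - \langle\mathbf{C}\mathbf{1}\rangle_{\mathrm{diag}}\); its negative, \(\langle\mathbf{C}\mathbf{1}\rangle_{\mathrm{diag}} - \mathbf{C}\), is the graph Laplacian that appears in the GR numerator of the matrix expression in these notes. The dependency-free Python sketch below (illustrative names, with a four-unit strip as an assumed example) checks that each row of the matrix sums to zero and that the Laplacian quadratic form equals \(\tfrac{1}{2}\sum_i\sum_j c_{ij}(y_i-y_j)^2\), the squared-difference sum on which the GR is built.

```python
# C with its zero diagonal replaced by negative row sums, and its negative (the Laplacian).
# Function names are illustrative.


def negative_rowsum_diagonal(c):
    n = len(c)
    return [[-sum(c[i]) if i == j else c[i][j] for j in range(n)] for i in range(n)]


def quad_form(a, y):
    n = len(y)
    return sum(y[i] * a[i][j] * y[j] for i in range(n) for j in range(n))


c = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
a = negative_rowsum_diagonal(c)
laplacian = [[-v for v in row] for row in a]  # <C1>_diag - C
y = [3.0, 1.0, 4.0, 1.5]
half_sq_diffs = 0.5 * sum(c[i][j] * (y[i] - y[j]) ** 2 for i in range(4) for j in range(4))
print([sum(row) for row in a], quad_form(laplacian, y), half_sq_diffs)  # [0, 0, 0, 0] 19.25 19.25
```

Because the quadratic form is a sum of squared differences, it is never negative and vanishes for a constant map.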
For example, if all areal units have the same number of neighbors [i.e., \(\sum_{j=1}^{n} c_{ij} = k\) for all i], then this fraction reduces to (n – 1)/n, which converges to 1 as n increases; for n = 100, a reasonable sample size, it equals 0.99.
The ability to indicate the correct behavior of a given dataset more often than would be possible by pure chance, before performing a battery of rigorous diagnostics.
References
Anselin L (1988) Spatial econometrics: methods and models. Kluwer, Dordrecht
Anselin L (1996) The Moran Scatterplot as an ESDA tool to assess local instability in spatial association. In: Fischer M, Scholten H, Unwin D (eds) Spatial analytical perspectives on GIS in environmental and socio-economic sciences. Taylor & Francis, London, pp 111–125
Anselin L (2019) A local indicator of multivariate spatial association: extending Geary’s c. Geogr Anal 51:133–150
Arbia G (2006) Spatial econometrics. Springer, Berlin
Bapat R (2010) Laplacian matrices. In: Graphs and matrices. Springer, London, pp 45–55
Boots B, Royle G (1991) A conjecture on the maximum value of the principal eigenvalue of a planar graph. Geogr Anal 23:276–282
Borg I, Groenen P (2005) Modern multidimensional scaling: theory and applications, 2nd edn. Springer, New York
Burridge P (1980) On the Cliff-Ord test for spatial correlation. J Roy Stat Soc B 42:107–108
Chun Y, Griffith D (2013) Spatial statistics and geostatistics. SAGE, Thousand Oaks, CA
Cliff A, Ord J (1973) Spatial autocorrelation. Pion, London
Cliff A, Ord J (1981) Spatial processes. Pion, London
Comber A, Brunsdon C, Radburn R (2011) A spatial analysis of variations in health access: linking geography, socio-economic status and access perceptions. Int J Health Geogr 10(1):1–11
Geary R (1954) The contiguity ratio and statistical mapping. Inc Stat 5(3):115–146
Griffith D (2003) Spatial autocorrelation and spatial filtering: gaining understanding through theory and scientific visualization. Springer-Verlag, Berlin
Griffith D (2009) Spatial autocorrelation. In: Kitchin R, Thrift N (eds) International encyclopedia of human geography. Elsevier, Oxford, pp 308–316
Griffith D (2010) The Moran coefficient for non-normal data. J Stat Plan Inference 140(11):2980–2990
Griffith D (2017) Some robustness assessments of Moran eigenvector spatial filtering. Spat Stat 22:155–179
Griffith D (2018) Generating random connected planar graphs. GeoInformatica 22:767–782
Griffith D (2019) Negative spatial autocorrelation: one of the most neglected concepts in spatial statistics. Stats 2:388–415
Griffith D, Chun Y, Li B (2019) Spatial regression analysis using eigenvector spatial filtering. Elsevier, Cambridge, MA
Griffith D, Layne L (1999) A casebook for spatial statistical data analysis. Oxford University Press, New York
Griffith D, Li B (2017) A geocomputation and geovisualization comparison of Moran and Geary eigenvector spatial filtering, in CPGIS Publication Committee. In: Proceedings of the 25th international conference on geoinformatics, geoinformatics 2017. SUNY/Buffalo, Buffalo, NY, August 2–4, p 4
Griffith D, Paelinck JH (2007) An equation by any other name is still the same: on spatial econometrics and spatial statistics. Ann Reg Sci 41(1):209–227
Griffith D, Paelinck JH (2011) Non-standard spatial statistics and spatial econometrics. Springer-Verlag, Berlin
Griffith D, Agarwal K, Chen M, Lee C, Panetti E, Rhyu K, Venigalla L, Yu X (2022) Geospatial socio-economic/demographic data: the existence of spatial autocorrelation mixtures in georeferenced data—Part I & Part II. Transact GIS 26(1):72–87
Hepple L (1998) Exact testing for spatial autocorrelation among regression residuals. Environ Plan A 30(1):85–108
Kelejian H, Piras G (2017) Spatial econometrics. Academic Press, London
Kelejian H, Prucha I (2001) On the asymptotic distribution of the Moran I test statistic with applications. J Econ 104(2):219–257
Koo H, Chun Y, Griffith D (2018) Integrating spatial data analysis functionalities in a GIS environment: spatial analysis using ArcGIS Engine and R (SAAR). Trans GIS 22:721–736
LeSage J, Pace R (2009) Introduction to spatial econometrics. CRC/Chapman & Hall, Boca Raton, FL
Leung Y, Mei C-L, Zhang W-X (2000) Testing for spatial autocorrelation among the residuals of the geographically weighted regression. Environ Plan A 32:871–890
Li H, Calder C, Cressie N (2007) Beyond Moran’s I: testing for spatial dependence based on the spatial autoregressive model. Geogr Anal 39:357–375. https://doi.org/10.1111/j.1538-4632.2007.00708.x
Luo Q, Griffith D, Wu H (2017) The Moran coefficient and the Geary ratio: some mathematical and numerical comparisons. In: Griffith D, Chun Y, Dean D (eds) Advances in geocomputation: geocomputation 2015—the 13th international conference. Springer, Berlin, pp 253–269
Luo Q, Griffith D, Wu H (2019) Spatial autocorrelation for massive spatial data: verification of efficiency and statistical power asymptotics. J Geogr Syst 21:237–269
Mays G, Smith S (2009) Geographic variation in public health spending: correlates and consequences. Health Serv Res 44(5p2):1796–1817
Paelinck J, Klaassen L (1979) Spatial econometrics. Saxon House, Farnborough
Potter K, Koch F, Oswalt C, Iannone B III (2016) Data, data everywhere: detecting spatial patterns in fine-scale ecological information collected across a continent. Landscape Ecol 31:67–84
Sauer J, Stewart K, Dezman Z (2021) A spatio-temporal Bayesian model to estimate risk and evaluate factors related to drug-involved emergency department visits in the greater Baltimore metropolitan area. J Subst Abuse Treat 131:108534
Sokal R, Oden N, Thomson B (1998) Local spatial autocorrelation in a biological model. Geogr Anal 30:331–354
Tait M, Tobin J (2017) Three conjectures in extremal spectral graph theory. J Comb Theory Ser B 126:137–163
Tiefelsdorf M, Griffith D, Boots B (1999) A variance stabilizing coding scheme for spatial link matrices. Environ Plan A 31:165–180
Wang F (2020) Why public health needs GIS: a methodological overview. Ann GIS 26(1):1–12
Wennberg J, Cooper M (1998) The Dartmouth atlas of health care in Pennsylvania. American Hospital Association, Chicago
Wiedermann W, Hagmann M (2016) Asymmetric properties of the Pearson correlation coefficient: correlation as the negative association between linear regression residuals. Commun Stat Theory Methods 45:6263–6283
Zhang Y, Baicker K, Newhouse J (2010) Geographic variation in the quality of prescribing. N Engl J Med 363(21):1985
Acknowledgements
Compilation of the publicly available data obtained via The Dartmouth Atlas DATA website https://data.dartmouthatlas.org/mortality was funded by the Robert Wood Johnson Foundation, as well as The Dartmouth Clinical and Translational Science Institute under the auspices of award number UL1TR001086 from the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH), and, in part, by the National Institute of Aging under the auspices of award number U01 AG046830.
Funding
No funding source.
Author information
Contributions
Griffith did all SAS computations, and Chun did all R computations. Griffith wrote the initial draft, and Chun and Griffith revised it repeatedly through to its current version. Chun spearheaded the HSA empirical analysis.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix
Selected MC + GR simulation experiment results
Tables 6 and 7 furnish numerical evidence, gleaned from simulation experiments, supporting a reliability interval for the sum MC + GR that offers heuristic guidance about georeferenced data containing various natures and degrees of SA. The general interval distilled in this paper is 0.95 ≤ MC + GR ≤ 1.05 (i.e., 1 ± 0.05). A useful next research step would be to refine this proposition by expressing both its upper and lower bounds as functions of n, ρ, and λ1(C), the principal eigenvalue of C; these are some of the data facets that appear to introduce variation across the columns of these two tables.
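The 1 ± 0.05 interval can be applied as a quick screen once the MC and the GR are in hand. The dependency-free Python sketch below is a hypothetical illustration (the helper names and the 3 × 3 rook example are assumptions), and it encodes only the fixed 1 ± 0.05 band; it does not implement bounds that depend on n, ρ, or λ1(C), which remain a research proposition.

```python
# Heuristic screen: does MC + GR fall inside 1 +/- 0.05?
# rook_adjacency, moran_geary, and mc_plus_gr_check are illustrative names.


def rook_adjacency(rows, cols):
    """Binary 0-1 rook adjacency matrix C for a rows x cols square tessellation."""
    n = rows * cols
    c = [[0] * n for _ in range(n)]
    for r in range(rows):
        for k in range(cols):
            i = r * cols + k
            if r + 1 < rows:
                c[i][i + cols] = c[i + cols][i] = 1
            if k + 1 < cols:
                c[i][i + 1] = c[i + 1][i] = 1
    return c


def moran_geary(y, c):
    """MC and GR from mean-centered values (matrix forms, symmetric C assumed)."""
    n = len(y)
    s0 = sum(map(sum, c))
    ybar = sum(y) / n
    z = [v - ybar for v in y]
    ss = sum(v * v for v in z)
    zcz = sum(z[i] * c[i][j] * z[j] for i in range(n) for j in range(n))
    zlz = sum(sum(c[i]) * z[i] ** 2 for i in range(n)) - zcz
    return n / s0 * zcz / ss, (n - 1) / s0 * zlz / ss


def mc_plus_gr_check(y, c, half_width=0.05):
    """Return (MC + GR, inside), where inside flags 1 - half_width <= MC + GR <= 1 + half_width."""
    mc, gr = moran_geary(y, c)
    total = mc + gr
    return total, abs(total - 1.0) <= half_width


c = rook_adjacency(3, 3)
total, inside = mc_plus_gr_check([1, 2, 3, 4, 5, 6, 7, 8, 9], c)
print(round(total, 4), inside)  # 0.8333 False
```

The smooth gradient map gives MC + GR of about 0.833, outside the band; consistent with the heuristic nature of the interval, such a result flags the data for closer diagnostics rather than settling anything.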
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Griffith, D.A., Chun, Y. Some useful details about the Moran coefficient, the Geary ratio, and the join count indices of spatial autocorrelation. J Spat Econometrics 3, 12 (2022). https://doi.org/10.1007/s43071-022-00031-w
DOI: https://doi.org/10.1007/s43071-022-00031-w