The effects of spatial autoregressive dependencies on inference in ordinary least squares: a geometric approach

  • Original Article
Journal of Geographical Systems

Abstract

There is a common belief that the presence of residual spatial autocorrelation in ordinary least squares (OLS) regression leads to inflated significance levels in beta coefficients and, in particular, inflated levels relative to the more efficient spatial error model (SEM). However, our simulations show that this is not always the case. Hence, the purpose of this paper is to examine this question from a geometric viewpoint. The key idea is to characterize the OLS test statistic in terms of angle cosines and examine the geometric implications of this characterization. Our first result is to show that if the explanatory variables in the regression exhibit no spatial autocorrelation, then the distribution of test statistics for individual beta coefficients in OLS is independent of any spatial autocorrelation in the error term. Hence, inferences about betas exhibit all the optimality properties of the classic uncorrelated error case. However, a second, more important series of results shows that if spatial autocorrelation is present in both the dependent and explanatory variables, then the conventional wisdom is correct. In particular, even when an explanatory variable is statistically independent of the dependent variable, such joint spatial dependencies tend to produce “spurious correlation” that results in over-rejection of the null hypothesis. The underlying geometric nature of this problem is clarified by illustrative examples. The paper concludes with a brief discussion of some possible remedies for this problem.
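The two results summarized above can be illustrated with a small Monte Carlo sketch. It is not the authors' simulation design: the ring-shaped neighbor structure, the sample size n = 30, the dependency value 0.8, the truncated series used in place of an explicit matrix inverse, and all function names are assumptions made for illustration. The sketch relies on the standard identity for simple regression with an intercept, t = sqrt(n − 2)·cos θ / sin θ, where θ is the angle between the mean-centered x and y vectors, so the two-sided t test rejects exactly when cos²θ exceeds t_c² / (t_c² + n − 2).

```python
import math
import random


def sar_draw(n, rho, rng, n_terms=50):
    """Draw u = (I - rho*W)^{-1} eps with eps ~ N(0, I).

    Assumption: W is a row-normalized ring matrix (each unit has two
    neighbors with weight 1/2). The inverse is approximated by the
    fixed-point iteration u <- eps + rho*W*u, i.e. a truncated Neumann
    series, which converges because |rho| < 1.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if rho == 0.0:
        return eps
    u = eps[:]
    for _ in range(n_terms):
        u = [eps[i] + rho * 0.5 * (u[i - 1] + u[(i + 1) % n]) for i in range(n)]
    return u


def cos_angle(x, y):
    """Cosine of the angle between the mean-centered vectors x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    xc = [a - mx for a in x]
    yc = [b - my for b in y]
    dot = sum(a * b for a, b in zip(xc, yc))
    return dot / math.sqrt(sum(a * a for a in xc) * sum(b * b for b in yc))


def rejection_rate(n, rho_y, rho_x, t_crit, sims, seed):
    """Share of draws in which OLS rejects beta_1 = 0, with x independent of y."""
    rng = random.Random(seed)
    df = n - 2
    # |t| > t_crit is equivalent to cos^2(theta) > t_crit^2 / (t_crit^2 + df)
    cos2_crit = t_crit ** 2 / (t_crit ** 2 + df)
    rejections = 0
    for _ in range(sims):
        y = sar_draw(n, rho_y, rng)
        x = sar_draw(n, rho_x, rng)
        if cos_angle(x, y) ** 2 > cos2_crit:
            rejections += 1
    return rejections / sims


N = 30
T_CRIT = 2.0484  # hard-coded two-sided 5% critical value of t with 28 df

# Autocorrelated y, but x free of spatial autocorrelation
size_x_white = rejection_rate(N, 0.8, 0.0, T_CRIT, sims=1000, seed=1)
# Autocorrelation in both y and x
size_joint = rejection_rate(N, 0.8, 0.8, T_CRIT, sims=1000, seed=2)

print("estimated size, rho_x = 0.0:", size_x_white)
print("estimated size, rho_x = 0.8:", size_joint)
```

In this sketch the first estimate should stay near the nominal 0.05 whatever the value of ρ_y, while the second should lie well above it. The exact figures depend on the seed and on the assumed weight matrix.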

Notes

  1. In fact, the same is true for the much broader class of feasible generalized least squares (FGLS) estimators. As pointed out by Green (2003, p. 211) and others, OLS is usually more efficient than FGLS when departures from the classical assumptions of linear models are not too severe. For specific simulation results in the context of temporal autocorrelation, see, for example, Dutilleul and Alpargu (2001).

  2. As with most linear models, this trend line is at best only locally linear. But it may still provide a reasonable description of mean population density within the range shown.

  3. Of course not all correlated deviation patterns will be as smooth as those depicted. This stylized representation is only meant to illustrate a general tendency toward smoothness.

  4. This set of spatial boundaries (as illustrated in Fig. 2 of Smith and Lee 2011a) is taken from Anselin (1988) and also constitutes one of the standard examples used in the GeoDa software.

  5. Queen contiguity implies that equal positive weights, \( w_{ij} > 0 \), are assigned to all distinct neighborhood pairs, ij, and that \( w_{ij} = 0 \) elsewhere.

  6. The restriction, \( |\rho_{y} | < 1 \), ensures that the matrix inverse, \( (I_{n} - \rho_{y} W)^{ - 1} \), exists over the full range of \( \rho_{y} \), so that the autoregressive process has a well-defined reduced form, \( \upsilon = (I_{n} - \rho_{y} W)^{ - 1} \varepsilon \).

  7. This unusually large number of simulations was employed to minimize any possible sampling error in the results of Table 1a below.

  8. The specific values of \( \rho_{y} \) used were in increments of 0.1 from 0 to 0.9, together with the end value, 0.95. Given the singularity of \( (I_{n} - \rho_{y} W)^{ - 1} \) at \( \rho_{y} = 1.0 \), values larger than 0.95 tend to exhibit computational instabilities.

  9. Of course all variance estimates are still based on the standard asymptotic covariance matrix for ML estimation, so that some small-sample bias remains.

  10. There are six separate illustrations in Table 1, labeled (a) through (f). We shall refer to each by its label, such as Table 1a for the present case. Graphical representations of each illustration are given in the longer version of this paper, Smith and Lee (2011a), available online.

  11. In particular, Eq. 2 is precisely Eq. 4 under the null hypothesis, \( \beta_{1} = 0 \).

  12. Note that \( \mu_{y} \) in Eq. 5 below now plays the role of \( \beta_{0} \) in Eq. 2.

  13. The effects of different weight matrices for y and x are illustrated in the longer version of this paper (Smith and Lee 2011a, Section 4.4.3).

  14. An additional restriction on W will be considered in Sect. 4.3.1 below.

  15. We also note that the inverses \( B_{{\rho_{y} }}^{ - 1} \) and \( B_{{\rho_{x} }}^{ - 1} \) are guaranteed to exist when \( \rho_{y} ,\rho_{x} \in [0,1) \). For an analysis of this model in the case of “unit roots” where either \( \rho_{y} = 1 \) or \( \rho_{x} = 1 \), see for example Lauridsen and Kosfeld (2006).

  16. A more systematic analysis would of course involve tables of test-size values allowing variation in both \( \rho_{y} \) and \( \rho_{x} \). However, our purpose here is simply to illustrate the key properties of these testing procedures. Our main objective is to explain these properties in geometric terms.

  17. This still leaves open the over-rejection problem for SEM seen in smaller samples such as Table 1d above. We shall return to this issue in the concluding section of the paper.

  18. Here, it should be noted that the covariance matrix, \( \sigma^{2} M_{2} \), of \( \tilde{\varepsilon } \) has rank \( n - 1 \), so that technically \( \tilde{\varepsilon } \) has a singular normal distribution. This can easily be remedied by replacing \( M_{2} \) with an equivalent reduced form matrix of full column rank, as developed in Appendix 2 of the supplementary material.

  19. All appendices are included in the supplementary material for this paper and can also be found in the longer version of this paper, Smith and Lee (2011a), available online.

  20. Davidson and MacKinnon (1993) also point out that the associated t statistic for this null hypothesis (which is simply the (signed) square root of \( F_{1} \)) corresponds to the cotangent of the angle.

  21. Fortunately, such samples are just large enough to yield nontrivial estimates of slopes such as β1.

  22. As shown in Appendix 2 in ESM, this singular density can be replaced by a proper density, \( \phi (U^{\prime}x) \), where U are eigenvectors for the non-null eigenvalues of M.

  23. Dutilleul also credits earlier work on this topic by Huynh and Feldt (1970), and others.

  24. The consequences of this symmetry property have of course been noted by many authors, dating at least as far back as Fingleton (1999).

  25. In fact, this second assumption is for convenience only and simply avoids the need to introduce generalized inverses.

  26. As shown in the Appendix, this result actually holds for all spatial dependency values, \( \rho_{y} \), which yield a well-defined reduced form in Eq. 8 above, i.e., for which the matrix, \( B_{{\rho_{y} }} = I_{n} - \rho_{y} W \), is nonsingular. But since our main interest is in nonnegative spatial dependencies, we choose here to focus on this case.

  27. In fact, spurious correlation can even arise for completely independent x-samples and y-samples. In particular, if such samples are heteroscedastic, then such differences in variation can produce non-spherical distributions that have the same effects as those for correlated samples. An explicit example of this type is developed in the longer version of this paper (Smith and Lee 2011a, Section 4.3).

  28. The transpose notation here indicates that these are by convention column vectors.

  29. Note also that since all components of z are completely determined by its first component, this n-vector is effectively a sample of size one.

  30. Recall again that we ignore the measure-zero cases in which \( z_{1} = 0 \) and/or \( w_{1} = 0 \).

  31. Note also that this result depends only on symmetry of both the \( z_{1} \) distribution and \( w_{1} \) distribution about zero and does not require normality.

  32. In the more general development in Appendix 2 in ESM, T is an instance of the matrix, \( U^{\prime}_{2} \), for n = 3.

  33. The symbol τ is employed here to avoid confusion with spatial dependency parameters, ρ.

  34. Note that \( \Sigma_{\tau } \) can also be expressed directly as a linear matrix function of τ. In particular, it may be verified that in terms of the row representation of T in Eq. 52, \( \Sigma_{\tau } = T^{\prime}T + \tau (T_{1} T^{\prime}_{2} + T_{2} T^{\prime}_{1} ) \).

  35. Note that as a parallel to JSE models, the spatial dependency parameters \( (\rho_{y} ,\rho_{x} ) \) are here replaced by the correlation parameters \( (\tau_{y} ,\tau_{x} ) \). However, this simple parameterization is only possible in the present setting for the n = 3 case.

  36. Note that this argument is in fact an instance of the more general geometric fact (mentioned in Sect. 3.2 above) that orthogonal projections of spheres are always spheres of lower dimension.

  37. While the simulation size, 5,000, yields a clear visual scatter plot in panel (a1), it is not sufficiently large to overcome the extreme variation in samples of size n = 3. Hence, all histograms in this section [such as in panels (a1) and (a3)] are based on much larger simulations of 100,000 draws. At this simulation size, the true shape of each sampling distribution [such as the uniform distribution in panel (a3)] is much more evident.

  38. A histogram of these cosine values is somewhat less informative since the cosine itself is a very nonlinear function.

  39. For this small sample size, the corresponding rejection level is enormous: F(0.05;1,1) = 161.45.

  40. As in footnote 37 above, each estimated test size in this section (such as the present value of 0.0498) is based on a larger simulation of 100,000 draws to overcome sampling variation.

  41. Interestingly, the result to be developed does not require a zero diagonal. But the spatial error model itself requires zero diagonals to avoid self-referencing in the spatial autoregressive relation of Eq. 4 above.

  42. Such matrices are also said to be irreducible matrices. For further discussion of such weight matrices, see for example Appendix A in Martellosio (2010).

  43. The pair \( (\lambda_{1} ,v_{1} ) \) is often designated as the Perron eigenvalue and eigenvector for W. See Lemma 1 in Appendix 4 in ESM for more detail.

  44. As mentioned in the introduction, this result is largely inspired by the work of Krämer and Donninger (1987) who developed a parallel result for the covariance matrix, \( \text{cov} (y) = \sigma^{2} (I_{n} - \rho_{y} W)^{ - 1} (I_{n} - \rho_{y} W^{\prime})^{ - 1} \). A recent result closely related to Theorem 3 (in the context of testing for spatial autocorrelation) can be found in the proof of Theorem 1 in Martellosio (2010).

  45. Note also that y should technically be indexed by \( \rho_{y} \) to indicate that each specified value of \( \rho_{y} \) implicitly defines a different random vector, y.

  46. The limiting properties of test sizes for this perfect-correlation case are studied in Krämer (2003).

  47. Here, the exceptional subset of \( \varepsilon_{y} \) values are those with \( v^{\prime}_{1} \varepsilon_{y} = 0 \), which has probability zero.

  48. It is also of interest to note that as a spatial pattern, the vector \( v_{1} \) exhibits maximal correlation with its corresponding “spatial influence” pattern, \( Wv_{1} \). In fact, this correlation is perfect, since by definition, \( Wv_{1} = \lambda_{1} v_{1} \) (with \( v_{1} > 0 \)) implies \( \text{corr} (v_{1} ,Wv_{1} ) = 1 \) (which is also closely related to the well-known extremal properties of Moran’s I in terms of eigenvectors, as for example in DeJong et al. 1984 and Griffith 1996). So this type of spuriousness is again associated with an extreme form of spatial correlation.

  49. In this context, it is important to note that the “row normalization” convention often used with spatial weight matrices in fact guarantees that \( v_{1} = 1_{n} \). But as pointed out by Kelejian and Prucha (2010), the validity of this normalization procedure is subject to question. Here, it should also be noted that for matrices close to the “equal weights” matrix with constant off-diagonal components, one must again have \( v_{1} \) close to \( {\text{span}}(1_{n} ) \). Such weight matrices are known to exhibit a variety of special properties with respect to standard testing procedures, as studied for example by Kelejian and Prucha (2002) and Martellosio (2011).

  50. This observation is closely related to the more general result of Krämer and Donninger (1987) showing that OLS will be as efficient as SEM when \( v_{1} \in {\text{span}}(X) \) and W is symmetric (see also Tilke 1993, and Krämer and Baltagi 1996). However, this efficiency result holds much more generally, as recently shown by Martellosio (2011).
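The critical value quoted in note 39 and the cotangent relation mentioned in note 20 can be checked together with a short sketch. It assumes the standard identity t = sqrt(n − 2)·cot θ for simple regression with an intercept, which reduces to t = cot θ when n = 3, and it uses the fact that Student's t with one degree of freedom is the standard Cauchy distribution, whose quantile function is tan(π(p − 1/2)). The function and variable names are illustrative only.

```python
import math


def t_quantile_df1(p):
    """Quantile of Student's t with 1 degree of freedom (standard Cauchy)."""
    return math.tan(math.pi * (p - 0.5))


alpha = 0.05
t_crit = t_quantile_df1(1 - alpha / 2)  # two-sided critical value of t(1)
f_crit = t_crit ** 2                    # upper 5% point of F(1, 1), since F = t^2

# With n = 3 the statistic is t = cot(theta), so |t| > t_crit corresponds
# to angles within theta_crit of 0 or 180 degrees.
theta_crit_deg = math.degrees(math.atan(1.0 / t_crit))

print(round(f_crit, 2))          # 161.45
print(round(theta_crit_deg, 6))  # 4.5
```

Under these assumptions the rejection region covers angles within 4.5 degrees of 0 or 180 degrees on either side, that is 18 of 360 degrees, which matches the nominal 0.05 size when the angle is uniformly distributed.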

References

  • Alpargu G, Dutilleul P (2003a) To be or not to be valid in testing the significance of the slope in simple quantitative linear models with autocorrelated errors. J Stat Comput Simul 73(3):165–180

  • Alpargu G, Dutilleul P (2003b) Efficiency and validity analyses of two-stage estimation procedures and derived testing procedures in quantitative linear models with AR(1) errors. Commun Stat Simul Comput 32(3):799–833

  • Alpargu G, Dutilleul P (2006) Stepwise regression in mixed quantitative linear models with autocorrelated errors. Commun Stat Simul Comput 32:799–833

  • Anselin L (1988) Spatial econometrics: methods and models. Kluwer, Boston

  • Berman A, Plemmons RJ (1994) Nonnegative matrices in the mathematical sciences. SIAM, Philadelphia

  • Bivand R (1980) A Monte Carlo study of correlation coefficient estimation with spatially autocorrelated observations. Quaest Geogr 6:5–10

  • Clifford P, Richardson S, Hémon D (1989) Assessing the significance of the correlation between two spatial processes. Biometrics 45(1):123–134

  • Davidson R, MacKinnon J (1993) Estimation and inference in econometrics. Oxford University Press, New York

  • Davidson R, MacKinnon J (2004) Econometric theory and methods. Oxford University Press, New York

  • DeJong P, Sprenger C, Van Veen F (1984) On extreme values of Moran’s I and Geary’s C. Geogr Anal 16(1):17–24

  • Dutilleul P (1993) Modifying the t test for assessing the correlation between two spatial processes. Biometrics 49(1):305–314

  • Dutilleul P (2008) A note on sufficient conditions for valid unmodified t testing in correlation analysis with autocorrelated and heteroscedastic sample data. Commun Stat Theory Method 37:137–145

  • Dutilleul P, Alpargu G (2001) Efficiency analysis of ten estimation procedures for quantitative linear models with autocorrelated errors. J Stat Comput Simul 69:257–275

  • Fingleton B (1999) Spurious spatial regression: some Monte Carlo results with a spatial unit root and spatial cointegration. J Reg Sci 39(1):1–19

  • Green WH (2003) Econometric analysis, 5th edn. Prentice Hall, New Jersey

  • Griffith D (1996) Spatial autocorrelation and eigenfunctions of the geographic weights matrix accompanying geo-referenced data. Can Geogr 40(4):351–357

  • Huynh H, Feldt S (1970) Conditions under which mean square ratios in repeated measurements designs have exact F-distributions. J Am Stat Assoc 65(332):1582–1589

  • Kelejian HH, Prucha IR (2002) 2SLS and OLS in a spatial autoregressive model with equal spatial weights. Reg Sci Urban Econ 32(6):691–707

  • Kelejian HH, Prucha IR (2010) Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J Econ 157(1):53–67

  • Krämer W (2003) The robustness of the F-test to spatial autocorrelation among regression disturbances. Statistica 63(3):435–440

  • Krämer W, Baltagi B (1996) A general condition for an optimal limiting efficiency of OLS in the general linear regression model. Econ Lett 50(1):13–17

  • Krämer W, Donninger C (1987) Spatial autocorrelation among errors and the relative efficiency of OLS in the linear regression model. J Am Stat Assoc 82(398):577–579

  • Lauridsen J, Kosfeld R (2006) A test strategy for spurious spatial regression, spatial nonstationarity, and spatial cointegration. Pap Reg Sci 85(3):363–377

  • Legendre P, Dale MRT, Fortin M-J, Gurevitch J, Hohn M, Myers D (2002) The consequences of spatial structure for the design and analysis of ecological field surveys. Ecography 25(5):601–615

  • Martellosio F (2010) Power properties of invariant tests for spatial autocorrelation in linear regression. Econ Theory 26(1):152–186

  • Martellosio F (2011) Non-testability of equal weights spatial dependence. Econ Theory. doi:10.1017/S0266466611000089

  • Mur J, Trívez FJ (2003) Unit roots and deterministic trends in spatial econometric models. Int Reg Sci Rev 26(3):289–312

  • Smith TE, Lee KL (2011a) The effects of spatial autoregressive dependencies on inference in OLS: a geometric approach. Working paper, in progress. Available online at: http://www.seas.upenn.edu/~tesmith/Geometry_of_Spurious_Correlation.pdf

  • Smith TE, Lee KL (2011b) Small sample inference for spatial error models, in progress

  • Tilke C (1993) The relative efficiency of OLS in the linear regression model with spatially autocorrelated errors. Stat Pap 34(3):263–270

Acknowledgments

The authors are indebted to Federico Martellosio for valuable comments and suggestions on an earlier draft of this paper. We are also grateful to the two referees for their constructive comments.

Author information

Corresponding author

Correspondence to Tony E. Smith.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOC 744 kb)

About this article

Cite this article

Smith, T.E., Lee, K.L. The effects of spatial autoregressive dependencies on inference in ordinary least squares: a geometric approach. J Geogr Syst 14, 91–124 (2012). https://doi.org/10.1007/s10109-011-0152-x

  • DOI: https://doi.org/10.1007/s10109-011-0152-x
