Consistency of bootstrap approximation to the null distributions of local spatial statistics with application to house price analysis

With the increasing availability of spatially extensive geo-referenced data, much attention has been paid to the use of local statistics to identify local patterns of spatial association, in which the null distributions of local statistics play an essential role in the related statistical inference. As a powerful tool for approximating the distribution of a statistic, the bootstrap method is used in this paper to derive the null distributions of the commonly used local spatial statistics, including the local Getis and Ord's G_i, Moran's I_i and Geary's c_i. Strong consistency of the bootstrap approximation to the null distributions of the statistics is proved under some mild conditions, and the Boston housing price data are analyzed to demonstrate the application of the theoretical results.


Introduction
Exploration of spatial association has long been recognized as an important issue in spatial data analysis. With the increasing availability of spatially extensive geo-referenced data, and given the geological and geographical diversity over a large region, a global structure of spatial association is no longer a realistic assumption for such a data set. Therefore, much attention has been paid to the use of local statistics to identify local patterns of spatial association. The most popular local spatial statistics are perhaps Getis and Ord's G_i [11,18] and Anselin's LISAs [1]. Since their inception, these local statistics have been applied in a variety of fields of spatial data analysis (see, for example, [9,10,14,24]).
In order to test for significance of local spatial association at a reference location, it is essential to derive the null distribution of the local statistics. Normal distributions have been used to approximate the null distributions of some local spatial statistics such as the local Getis and Ord's G_i, Moran's I_i and Geary's c_i (see, for example, [1,11,18]). However, many empirical studies have shown that this approximation is sometimes problematic [1,4,5,27]. Based on the distributional theory of quadratic forms in normal variables, some improved methods have been developed under the assumption that the spatial data are drawn from a normally distributed population (see, for example, [5,13,20–22]). Nevertheless, this assumption might be invalid for some real-world data sets. With the computational power of modern computers, the randomized permutation method, a resampling procedure that randomly relocates the data over the locations, is frequently employed to approximate the null distributions of local spatial statistics (see, for example, [1,12,17]). Recently, Yan et al. [26] suggested a bootstrap method, originally proposed by Efron [8], to approximate the null distributions of the spatio-temporal versions of the local Getis and Ord's G_i, Moran's I_i and Geary's c_i. They showed by simulations that both the bootstrap and the randomized permutation methods can accurately approximate the null distributions of the local statistics, while the bootstrap method seems more efficient than the randomized permutation method in terms of computational time.
However, the theoretical validity of the bootstrap approximation remains to be investigated.
The main objective of this paper is to theoretically investigate the validity of the bootstrap approximation to the null distributions of the local Getis and Ord's G_i, Moran's I_i and Geary's c_i. Under some mild conditions, we prove that the bootstrap approximation is strongly consistent in terms of the Kolmogorov distance on the space of distribution functions. Moreover, the Monte Carlo implementation of the bootstrap approximation for statistical inference is described in detail through a case study of Boston housing prices in order to demonstrate the application of the theoretical results.
The remainder of this paper is organized as follows: the main results are presented in the next section, and their proofs are given in Sect. 3. As an application of the theoretical results, the Boston housing price data are analyzed in Sect. 4. The paper ends with a brief summary.

Main results
Let F(x) be the population distribution and s be the coordinate of a geographical location.
Given n locations s i (i = 1, 2, . . . , n), let W = (w ij (d)) n×n be the symmetric spatial linkage matrix determined by the underlying spatial structure of the n locations or geographical units, where d is a pre-specified distance threshold and w ij (d) (j = 1, 2, . . . , n) are positive for all s j 's within distance d of the location s i excluding s j = s i , and are zero for other s j 's.
Generally, the binary values, zero and one, are assigned to w ij (d) (j = 1, 2, . . . , n) according to the above rule. At each location s j , draw independently X j from the population distribution F(x), forming an independent and identically distributed (i.i.d.) sample (X 1 , X 2 , . . . , X n ) with X j located at s j (j = 1, 2, . . . , n).
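As an illustration (not part of the paper), the binary distance-band linkage matrix described above can be sketched in a few lines of NumPy; the function name and the use of Euclidean distance are our own assumptions:

```python
import numpy as np

def distance_band_weights(coords, d):
    """Binary spatial linkage matrix W = (w_ij(d)): w_ij = 1 if
    0 < dist(s_i, s_j) <= d, and 0 otherwise."""
    coords = np.asarray(coords, dtype=float)
    # pairwise Euclidean distances between the n locations
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    W = ((dist > 0) & (dist <= d)).astype(int)
    np.fill_diagonal(W, 0)  # w_ii = 0, i.e. s_j = s_i is excluded
    return W

# four locations; the last one is isolated under threshold d = 1.5
coords = [(0, 0), (1, 0), (0, 1), (5, 5)]
W = distance_band_weights(coords, d=1.5)
```

Since the distance function is symmetric, the resulting W is symmetric, as required of the spatial linkage matrix.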
Given a reference location s_i, after re-scaling and/or re-centering, the local Getis and Ord's G_i [11] and the local Moran's I_i and Geary's c_i [1] are, respectively, of the forms given in Eqs. (1), (2) and (3).

Remark 1 For G_i(d), it is a natural assumption that E(X_j) = μ = 0. Moreover, we modify the numerator of the G_i(d) statistic to use X_j − X̄ instead of X_j in its original form in order to facilitate the forthcoming proof of the asymptotic property. This modification does not change the interpretation of the statistic.
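For illustration, the familiar (un-re-scaled) forms of the three statistics can be computed as follows. This is a hedged sketch rather than the paper's exact Eqs. (1)–(3), whose re-scaling is not reproduced here; G_i uses the centred numerator X_j − X̄ of Remark 1, and the function name is our own:

```python
import numpy as np

def local_stats(x, W):
    """Un-re-scaled local statistics at every location i:
      G_i: sum_j w_ij (x_j - xbar)          (centred numerator, Remark 1)
      I_i: (x_i - xbar) sum_j w_ij (x_j - xbar) / s2
      c_i: sum_j w_ij (x_i - x_j)^2 / s2
    with s2 = (1/n) sum_j (x_j - xbar)^2."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    z = x - xbar
    s2 = (z ** 2).mean()
    G = W @ z
    I = z * (W @ z) / s2
    # c_i expanded as z_i^2 sum_j w_ij - 2 z_i sum_j w_ij z_j + sum_j w_ij z_j^2
    c = (W.sum(axis=1) * z ** 2 - 2 * z * (W @ z) + W @ (z ** 2)) / s2
    return G, I, c
```

The vectorised expansion of c_i uses (x_i − x_j)² = (z_i − z_j)² with z_j = x_j − x̄, which avoids an explicit double loop.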
Let F_n denote the empirical distribution of the sample (X_1, X_2, …, X_n), that is,

F_n(x) = (1/n) Σ_{j=1}^n I{X_j ≤ x},

where I{A} is the indicator function of the event A. Let (X*_1, X*_2, …, X*_n) be the bootstrap sample drawn from F_n(x) with replacement and located at (s_1, s_2, …, s_n). The bootstrap scenarios of the local Getis and Ord's G_i, Moran's I_i and Geary's c_i are given in Eqs. (4), (5) and (6), respectively, where X̄* = (1/n) Σ_{i=1}^n X*_i. Throughout this paper, we use the notations P, E and Var to indicate the probability, expectation and variance calculated under F(x), and the notations P*, E* and Var* to represent those computed under F_n(x). In what follows, we first introduce the definition of consistency of the bootstrap approximation to the distribution of a statistic and then give the main results of this article.

Definition 1 ([7], Chap. 29) Let F and G be two distributions on a sample space X and ρ(F, G) be a metric on the space of distribution functions. Let (X_1, X_2, …, X_n) be i.i.d. random variables with the common distribution F. For a given statistic T = T(X_1, …, X_n; F), let H_n(x) = P(T(X_1, X_2, …, X_n; F) ≤ x) and H*_n(x) = P*(T(X*_1, X*_2, …, X*_n; F_n) ≤ x) be the distribution function of T and the bootstrap distribution function of T* = T(X*_1, X*_2, …, X*_n; F_n), respectively. We say that the bootstrap approximation for T is weakly consistent under ρ if ρ(H_n, H*_n) → 0 in probability as n → ∞, and strongly consistent under ρ if ρ(H_n, H*_n) → 0 almost surely as n → ∞.

Several metrics, such as the Kolmogorov distance and the Mallows distance, can be employed to measure the consistency of the bootstrap approximation. The Kolmogorov distance, defined by

K(F, G) = sup_{x ∈ R} |F(x) − G(x)|,

is commonly used, where R = (−∞, +∞). In this paper, the Kolmogorov distance is mainly used to investigate the strong consistency of the bootstrap approximation for the local Getis and Ord's G_i, Moran's I_i and Geary's c_i, and the main results are summarized in the following theorems.

Preliminaries and lemmas
To prove the theorems, the Mallows distance (see, for example, [3,15,16]) will be used because of its useful properties relating to the Kolmogorov distance. Let F_p be the set of distribution functions F with ∫_{−∞}^{∞} |x|^p dF(x) < ∞. For F, G ∈ F_p, the Mallows distance between F and G is defined as

d_p(F, G) = inf (E|X − Y|^p)^{1/p},

where 1 ≤ p < ∞ and the infimum is taken over the pairs (X, Y) with the marginal distribution functions of X and Y being F and G, respectively. Throughout this paper, we also write d_p(F, G) and [d_p(F, G)]^2 as d_p(X, Y) and d_p^2(X, Y), respectively, for ease of interpretation.
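In one dimension the infimum above is attained by coupling order statistics, so for two empirical distributions with the same number of atoms the Mallows distance reduces to pairing sorted samples. A small NumPy sketch (the function name is ours, and equal sample sizes are assumed for simplicity):

```python
import numpy as np

def mallows_distance(x, y, p=2):
    """Mallows (Wasserstein-p) distance between two empirical
    distributions with the same number m of equally weighted atoms:
    d_p^p = (1/m) * sum_i |x_(i) - y_(i)|^p over the sorted samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if len(x) != len(y):
        raise ValueError("equal sample sizes assumed in this sketch")
    return float((np.abs(x - y) ** p).mean() ** (1.0 / p))
```

In the spirit of Lemma 3 below, d_p between the empirical distribution of a growing sample and that of a fixed large reference sample from the same population shrinks toward zero.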
Lemma 1 ([23], p. 12) Let X_1, X_2, … be a random variable sequence and X be a random variable with a continuous distribution function. If X_n converges to X in distribution, which we denote X_n ⇒ X, then

sup_{x ∈ R} |P(X_n ≤ x) − P(X ≤ x)| → 0 as n → ∞.

Lemma 2 ([3]) Let G_n (n = 1, 2, …) and G be distribution functions in F_p. Then d_p(G_n, G) → 0 as n → ∞ if and only if both of the following conditions hold:
(1) G_n → G weakly as n → ∞;
(2) ∫ |x|^p dG_n(x) → ∫ |x|^p dG(x) as n → ∞.

Remark 2 Let the distribution functions of X_n and X be G_n and G, respectively. Lemma 2 means that G_n converges to G in the Mallows distance d_p if and only if X_n ⇒ X and E|X_n|^p → E|X|^p.

Lemma 3 Let X_1, X_2, … be an i.i.d. random variable sequence with the common distribution function F ∈ F_p, and let F_n be the empirical distribution function of (X_1, X_2, …, X_n). Then, for almost all sample sequences of X_1, X_2, …,

d_p(F_n, F) → 0 as n → ∞.

Proof By Lemma 2, it is sufficient to prove that F_n → F weakly and that ∫ |x|^p dF_n(x) → ∫ |x|^p dF(x) almost surely. Let Y_i = I{X_i ≤ x} (i = 1, 2, …, n). Since (Y_1, Y_2, …, Y_n) are i.i.d. random variables, we know from the strong law of large numbers that F_n(x) = (1/n) Σ_{i=1}^n Y_i → F(x) almost surely, which indicates F_n → F weakly. Similarly, ∫ |x|^p dF_n(x) = (1/n) Σ_{i=1}^n |X_i|^p → E|X|^p almost surely by applying the strong law of large numbers to (|X_1|^p, |X_2|^p, …, |X_n|^p).

Remark 3
The key to proving Lemma 4 is Minkowski's inequality (see Lemma 8.6 in Bickel and Freedman [3] for the details), which does not require independence between the two sets of random variables. Therefore, the independence assumption on (X_1, X_2, …, X_n) as well as on (Y_1, Y_2, …, Y_n) is indeed not indispensable for guaranteeing the conclusion of the lemma.
Lemma 5 Let X, X_1, X_2, … be a sequence of random variables with their distribution functions belonging to F_p, where p ≥ 2, and let q = p/2. If d_p(X_n, X) → 0 as n → ∞, then d_q(X_n^2, X^2) → 0 as n → ∞.

Proof The conditions imply that the distribution functions of X^2, X_1^2, X_2^2, … belong to F_q. From Lemma 2, we have (i) X_n ⇒ X; and (ii) E(|X_n|^p) → E(|X|^p). The continuous mapping theorem ([23], p. 7) together with (i) yields (iii) X_n^2 ⇒ X^2. The lemma is then proved according to (ii), (iii) and Lemma 2.
Lemma 6 Let X_1, X_2, … be an i.i.d. random variable sequence with E(X_i) = μ and Var(X_i) = σ^2 < ∞, and let F_n and (X*_1, X*_2, …, X*_n) be the empirical distribution function and the bootstrap sample of (X_1, X_2, …, X_n), respectively. Then, for almost all sample sequences of X_1, X_2, …, X̄* → μ in P*-probability as n → ∞.

Proof The condition σ^2 < ∞ implies that E(X_i) = μ exists. By the strong law of large numbers, we have, for almost all sample sequences of X_1, X_2, …, X̄ → μ. Then the desired result can be proved by the continuous mapping theorem.

Lemma 7
If X_n and Y_n are independent random variables for each n, then X_n ⇒ X and Y_n ⇒ Y imply that (X_n, Y_n) ⇒ (X, Y) with X and Y being independent.

Proof Because X_n and Y_n are independent random variables for every n, we have

P(X_n ≤ x, Y_n ≤ y) = P(X_n ≤ x) P(Y_n ≤ y) → P(X ≤ x) P(Y ≤ y)

at every continuity point (x, y) of the limit. Then the lemma is proved.

Proofs of the theorems
In the proofs of the theorems, the following two cases are considered separately because the proof techniques differ essentially between them.
Case 1 Suppose that W_in = Σ_{j=1}^n w_ij(d) < ∞ as n → ∞, which means that the number of observations within the d-distance neighborhood of the reference location s_i is fixed once n is large enough. For a local spatial statistic, this case occurs if all newly arriving observations are placed outside the d-distance neighborhood of the reference location s_i after n reaches some finite integer, say, n_0.

Case 2 Assume W_in → ∞ as n → ∞, which implies that the number of observations within distance d of the reference location s_i goes to infinity as n → ∞.

Proof of Theorem 1 Note the decomposition of G_i(d) given in Eq. (7). Furthermore, the numerators of G_i(d) and G*_i(d) can be expressed as in Eqs. (8) and (9), respectively. For any ε > 0, by the Chebyshev inequality and the assumption that (1/n)W_in^2 → 0 as n → ∞, we obtain Eq. (10); similarly, we have Eq. (11). Since S_n^2 = (1/n) Σ_{i=1}^n (X_i − X̄)^2 → σ^2 < ∞ almost surely according to the strong law of large numbers, we have, for almost all sample sequences of X_1, X_2, …, W_in(X̄ − X̄*) → 0 in P*-probability.

In Case 1, from Eqs. (8) and (10) and the Slutsky theorem, the numerator of G_i(d) and (1/W_in) Σ_{j=1}^n w_ij(d)(X_j − μ) have the same limiting distribution, that of a random variable denoted by Z_0. Similarly, from Eqs. (9) and (11), it can be inferred that the numerator of G*_i(d) and (1/W_in) Σ_{j=1}^n w_ij(d)(X*_j − X̄) have the same limiting distribution for almost all sample sequences of X_1, X_2, …. Moreover, according to Lemmas 2, 3 and 4, we obtain Eq. (12), which implies that the distribution of (1/W_in) Σ_{j=1}^n w_ij(d)(X*_j − X̄) converges to that of Z_0 for almost all sample sequences. On the other hand, let G(d) denote the limit in distribution of G_i(d), defined through Z_0. From the Slutsky theorem, Eq. (12) and the fact that (1/n) Σ_{j=1}^n (X_j − X̄)^2 → σ^2 almost surely, G_i(d) ⇒ G(d). Then, according to Lemma 1 and noting that the distribution function of G(d) is continuous, we have

sup_{x ∈ R} |P(G_i(d) ≤ x) − P(G(d) ≤ x)| → 0 as n → ∞.

Similarly, from Eqs. (7) and (13), we have

sup_{x ∈ R} |P*(G*_i(d) ≤ x) − P(G(d) ≤ x)| → 0 as n → ∞

for almost all sample sequences of X_1, X_2, …. Therefore, the above two equations and the triangle inequality yield the conclusion of the theorem.

In Case 2, given an n, suppose that there are k_n observation locations within distance d of s_i, leading to W_in = √k_n, with k_n → ∞ as n → ∞. Without loss of generality, let (X_1, X_2, …, X_{k_n}) be located within distance d of s_i. Note that X_j − μ (j = 1, …, k_n) are i.i.d. random variables with finite variance σ^2 and k_n → ∞ as n → ∞. Therefore, according to the central limit theorem, we have

(1/W_in) Σ_{j=1}^{k_n} w_ij(d)(X_j − μ) ⇒ Z,

where Z stands for a random variable distributed as the normal distribution N(0, σ^2). It therefore follows from Eqs. (8) and (10) and the Slutsky theorem that the numerator of G_i(d) converges in distribution to Z. Similarly, because X*_j − X̄ (j = 1, …, k_n) are conditionally i.i.d. random variables with variance S_n^2 = (1/n) Σ_{i=1}^n (X_i − X̄)^2 and k_n → ∞ as n → ∞, then, according to the central limit theorem and noting that S_n^2 → σ^2 almost surely as n → ∞, we have, for almost all sample sequences of X_1, X_2, …,

(1/W_in) Σ_{j=1}^{k_n} w_ij(d)(X*_j − X̄) ⇒ Z.

This, together with Eqs. (9) and (11) and the Slutsky theorem, yields G*_i(d) ⇒ G(d) for almost all sample sequences of X_1, X_2, …, where G(d) is now defined through Z in place of Z_0. With a derivation similar to that in Case 1, the theorem is then proved in this case.
Proof of Theorem 2 Notice that the numerator of I_i(d) can be expressed as in Eq. (16), where T_1, T_2 and T_3 denote the three terms of the decomposition. Firstly, from Eq. (10) and the strong law of large numbers, we have T_1 → 0 in probability as n → ∞ in both cases.

Secondly, as mentioned in the proof of Theorem 1, (1/W_in) Σ_{j=1}^n w_ij(d)(X_j − μ) converges in distribution to Z_0 and to Z as n → ∞ in Cases 1 and 2, respectively. Therefore, from μ − X̄ → 0 almost surely and the Slutsky theorem, we have T_2 → 0 in probability as n → ∞ in both cases.

Finally, because (x, y) ↦ xy is a continuous mapping, then, by Lemma 7 and the fact that X_i − μ is independent of (1/W_in) Σ_{j=1}^n w_ij(d)(X_j − μ), we have T_3 ⇒ (X_i − μ)Z_0 and T_3 ⇒ (X_i − μ)Z as n → ∞ in Cases 1 and 2, respectively.
By the Slutsky theorem and Eq. (16), we obtain, as n → ∞, the convergence in distribution of the numerator of I_i(d) in Cases 1 and 2, respectively. Since (1/n) Σ_{j=1}^n (X_j − X̄)^2 → σ^2 almost surely as n → ∞, we know that I_i(d) ⇒ I(d) according to the Slutsky theorem. Therefore, Lemma 1 and the continuity of the distribution function of I(d) guarantee that

sup_{x ∈ R} |P(I_i(d) ≤ x) − P(I(d) ≤ x)| → 0 as n → ∞.

According to the triangle inequality, to prove Theorem 2 it is sufficient to prove

sup_{x ∈ R} |P*(I*_i(d) ≤ x) − P(I(d) ≤ x)| → 0 as n → ∞

for almost all sample sequences of X_1, X_2, …. In a similar way to the treatment of the left-hand side of Eq. (16), we rewrite the numerator of I*_i(d) as in Eq. (17), where T*_1, T*_2 and T*_3 denote the corresponding three terms. First of all, we obtain T*_1 → 0 and T*_2 → 0 in P*-probability for almost all sample sequences. Finally, it is known that d_2(X*_i − X̄, X_i − μ) → 0 for almost all sample sequences, which implies X*_i − X̄ ⇒ X_i − μ as n → ∞. Then, according to Lemma 7 and the fact that X*_i − X̄ is conditionally independent of (1/W_in) Σ_{j=1}^n w_ij(d)(X*_j − X̄), we obtain T*_3 ⇒ (X_i − μ)Z_0 and T*_3 ⇒ (X_i − μ)Z as n → ∞ in Cases 1 and 2, respectively. According to the Slutsky theorem, it follows from Lemma 6 and Eq. (17) that I*_i(d) ⇒ I(d) as n → ∞ in both cases. Noting the continuity of the distribution function of I(d) and using Lemma 1 and the triangle inequality, Theorem 2 is then proved.
Proof of Theorem 3 In Case 1, since W_in = Σ_{j=1}^n w_ij(d) < ∞ as n → ∞, we can write W_in = W_{in_0} = Σ_{j=1}^{n_0} w_ij(d) for some positive integer n_0. According to the triangle inequality, the Hölder inequality and Lemma 3, we obtain a Mallows-distance bound between the numerators of c_i(d) and c*_i(d). It then follows from Lemmas 4 and 5 that c_i(d) converges in distribution, and similarly, from Lemma 6, that c*_i(d) converges to the same limit for almost all sample sequences. The theorem is then proved in this case by using the triangle inequality.

In Case 2, since W_in → ∞ as n → ∞, we can rewrite the numerator of c_i(d) as in Eq. (18), where A, B and C denote its three terms. With the same argument as in the proof of Theorem 1, we obtain the limiting distribution of A. According to the strong law of large numbers, we obtain B → 0 almost surely.

It follows from the Markov inequality that C → 0 in probability as n → ∞. Then the Slutsky theorem, together with the fact that (1/n) Σ_{j=1}^n (X_j − X̄)^2 → σ^2 almost surely, applied to Eq. (18), gives c_i(d) ⇒ c(d) as n → ∞. By the triangle inequality for the Kolmogorov distance, it is then sufficient to prove Eq. (19).

Similarly, the numerator of c*_i(d) can be expressed as in Eq. (20), where A*, B* and C* denote its three terms. Firstly, according to Lemmas 3, 4 and 5, the distribution of (X*_i)^2 − 2μX*_i + μ^2 + σ^2 converges to the distribution of A. According to the strong law of large numbers, we therefore obtain A* ⇒ A as n → ∞ for almost all sample sequences of X_1, X_2, …. Moreover, with the same argument as in the proof of Theorem 1, we rewrite B* accordingly. Noting that (X*_j)^2 − X̄^2 − S_n^2 (j = 1, …, n) are conditionally i.i.d. random variables and using the strong law of large numbers, we obtain B* → 0 almost surely as n → ∞ for almost all sample sequences of X_1, X_2, …. Finally, for any ε > 0, the Markov inequality gives a bound on P*(|C*| > ε). Because (1/n) Σ_{j=1}^n X_j^2 → μ^2 + σ^2 < ∞ almost surely, this bound tends to zero for almost all sample sequences of X_1, X_2, …; that is, C* → 0 in P*-probability as n → ∞ for almost all sample sequences.

In conclusion, according to the Slutsky theorem and Lemma 6, we obtain c*_i(d) ⇒ c(d) as n → ∞ for almost all sample sequences of X_1, X_2, …. Equation (19) is then proved according to Lemma 1.
Remark 4 In the proofs of the three theorems, different approaches are used to prove the consistency of the bootstrap approximations in Cases 1 and 2. In Case 2, the distributions of each local statistic and of its bootstrap counterpart are bridged by the same normal distribution; it can therefore be inferred that the bootstrap approximation performs at least as well as the normal approximation in this case. In Case 1, however, the numerator of each statistic is the sum of a fixed number of random variables as n → ∞, so the limit distribution of each statistic cannot be normal unless the population from which the sample is drawn is itself normally distributed. Therefore, the normal approximation fails to approximate the null distribution of each statistic in this case, whereas the bootstrap approximation still works according to the proof of each theorem. This is possibly the main reason for the empirical finding, mentioned in the introduction, that the normal approximation is sometimes problematic. In practice, the number of neighbors of a reference location is generally very small relative to the sample size and, as shown above, the bootstrap method can still provide a valid approximation to the null distribution of each local statistic. In summary, the bootstrap approximation outperforms the normal approximation, especially in practice.
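The contrast drawn in this remark can be illustrated numerically. The sketch below (our own, not part of the paper) mimics Case 1 with a fixed number k of neighbors and a skewed exponential population: the null statistic keeps a visibly skewed distribution, which no normal law matches, while resampling from a single observed sample reproduces the skewness:

```python
import numpy as np

rng = np.random.default_rng(1)

def skew(a):
    """Sample skewness; zero for any normal distribution."""
    a = np.asarray(a, dtype=float)
    return ((a - a.mean()) ** 3).mean() / a.std() ** 3

# Case 1: the d-neighborhood holds a fixed k observations, so the centred
# numerator is a sum of only k terms and no central limit theorem applies.
# Standard exponential population (mean 1, skewness 2).
k, M = 4, 20000
T = rng.exponential(size=(M, k)).sum(axis=1) - k        # true null draws

# Bootstrap analogue: resample k terms from one observed sample of size 200;
# the skewness of the population is inherited rather than forced to zero.
x = rng.exponential(size=200)
Tstar = rng.choice(x, size=(M, k)).sum(axis=1) - k * x.mean()
```

Both `T` and `Tstar` show clearly positive skewness, so the symmetric normal approximation misses the shape of the null distribution while the bootstrap tracks it.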

Application to the spatial pattern detection of the Boston housing price data
In order to demonstrate the application of the bootstrap approximations, a real-world example based on the Boston housing price data is analyzed for the significance test of local spatial association. As mentioned in Remark 4, the bootstrap method can provide a valid approximation to the null distribution of each local statistic. However, for a local-statistic-based test with the bootstrap approximation, some further issues, such as the Monte Carlo implementation of the bootstrap method and the multiple testing problem, should be considered. The purpose of this section is to present the full process of using the bootstrap approximation in practice.

Description of the data set and determination of the spatial linkage matrix
The Boston housing price data set, which is publicly available in the R package spdep (http://cran.r-project.org/), consists of observations of the median house value (in $1000) of owner-occupied homes and 13 explanatory variables in 506 US census tracts of the Boston area in 1970. Moreover, a list of influential neighbors for each tract is also attached, where a tract is an influential neighbor of another tract if the two tracts share a common part of their boundary.
Here, we chose the median house value, denoted by X henceforth, as the target variable and aim to detect its spatial variation patterns based on the observations x_1, x_2, …, x_n of X in the n = 506 census tracts. The spatial linkage matrix W = (w_ij)_{n×n} was obtained from the list of influential neighbors of each tract. Specifically, w_ij = 1 if tract j is an influential neighbor of tract i; w_ij = 0 otherwise; and w_ii = 0 by convention. The number of neighbors of the 506 census tracts ranges from 1 to 8 with an average of 4.25, which is much smaller than the sample size n = 506.
First of all, we conducted the Kolmogorov–Smirnov test for normality of the observations of the target variable X. The p-value of the test is less than 0.0001, providing strong evidence of non-normality of the observations. As mentioned in Remark 4, the normal approximation to the null distributions of the three local statistics is therefore problematic for this data set, while the bootstrap approximation remains valid.

Monte Carlo implementation of the bootstrap distribution functions
In general, the exact bootstrap distribution of a statistic is difficult to derive, although it is theoretically known for a given sample drawn from the population. In practice, Monte Carlo simulation is commonly used to compute the bootstrap distribution of the statistic. Here, we take the Getis and Ord's G i statistic (we omit the distance threshold d in the statistic here because the spatial linkage matrix was determined without using it explicitly) as an example to show the Monte Carlo procedure. The procedure for the other two statistics is essentially the same.
Let x 1 , x 2 , . . . , x n be the observations of the target variable X with x i located at the location s i . Given a reference location s i , the Monte Carlo procedure for approximating the bootstrap distribution of G i is as follows.
Step 1. Draw a bootstrap sample (x*_1, x*_2, …, x*_n) of size n with replacement from the observations (x_1, x_2, …, x_n) and locate it at (s_1, s_2, …, s_n).

Step 2. Compute the bootstrap value G*_i of G_i according to Eq. (4).

Step 3. Repeat Steps 1 and 2 N times to obtain N bootstrap values G*_i(1), G*_i(2), …, G*_i(N) of G_i.
Step 4. Compute the empirical distribution function of G*_i(1), G*_i(2), …, G*_i(N) and take it as an estimator of the bootstrap distribution of G_i. That is, for each real number x, the bootstrap distribution function of G_i is approximated by

(1/N) Σ_{m=1}^N I{G*_i(m) ≤ x}.  (21)
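Steps 1–4, together with the upper-tail p-value introduced in the next subsection, can be sketched in Python as follows. This is an illustrative implementation with hypothetical names; for simplicity `g_i` computes only the centred numerator Σ_j w_ij(x_j − x̄) rather than the full re-scaled Eq. (1), which does not change the logic of the procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def g_i(x, w_i):
    """Centred local Getis-Ord numerator at the reference location:
    sum_j w_ij (x_j - xbar), with w_i the i-th row of the linkage matrix."""
    x = np.asarray(x, dtype=float)
    return w_i @ (x - x.mean())

def bootstrap_pvalue(x, w_i, N=500):
    """Monte Carlo bootstrap p-value for the G_i test:
    Step 1: resample the observations with replacement;
    Step 2: recompute the statistic on the bootstrap sample;
    Step 3: repeat N times;
    Step 4 / p-value: proportion of bootstrap values >= the observed one."""
    x = np.asarray(x, dtype=float)
    g0 = g_i(x, w_i)                    # observed value G_i^(0)
    n = len(x)
    gstar = np.empty(N)
    for b in range(N):
        xstar = rng.choice(x, size=n, replace=True)
        gstar[b] = g_i(xstar, w_i)
    return (gstar >= g0).mean()         # upper-tail (one-sided) p-value
```

A reference tract whose neighbors all carry unusually large values yields a small p-value, flagging a candidate "hot" spot.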

Alternative hypotheses and p-values of the tests
As pointed out by Getis and Ord [11], G_i measures the concentration, or lack of concentration, of the values of the variable X around the reference location s_i. Therefore, G_i is commonly used to identify a location that is surrounded by large values or by small values of X in its neighborhood. I_i and c_i can be employed to test whether the value of X located at the reference location s_i is similar (local positive autocorrelation) or dissimilar (local negative autocorrelation) to those located at its neighbors. To be specific, in this case study we mainly focused on identifying a location that is surrounded by large values at its neighbors using G_i, that is, a location with an extremely large value of G_i, and on testing local positive autocorrelation using I_i and c_i, that is, a location with an extremely large value of I_i or with an extremely small value of c_i. These objectives amount to the G_i-, I_i- and c_i-based tests for the following alternative hypotheses, respectively:
H_1G: a tract surrounded by neighbors with high housing prices;
H_1I: a tract whose housing price is positively correlated with those of its neighbors;
H_1c: a tract whose housing price is similar to those of its neighbors.
The above alternative hypotheses all lead to one-sided tests. Specifically, the p-value of the G_i test derived from the bootstrap distribution in Eq. (21) is

p_G = (1/N) Σ_{m=1}^N I{G*_i(m) ≥ G_i^(0)},

where G_i^(0) is the observed value of G_i at the reference location s_i, computed according to Eq. (1) with the sample (X_1, X_2, …, X_n) replaced by its observed value (x_1, x_2, …, x_n). Similarly, the p-values of the I_i test and the c_i test are, respectively,

p_I = (1/N) Σ_{m=1}^N I{I*_i(m) ≥ I_i^(0)}  and  p_c = (1/N) Σ_{m=1}^N I{c*_i(m) ≤ c_i^(0)},

where I_i^(0) and c_i^(0) are the observed values of I_i and c_i computed according to Eqs. (2) and (3), respectively.

Method for dealing with multiple testing problem
When a local statistic is used to identify local spatial association in geo-referenced data, the test is generally performed at each location over the study region based on the same observations, which raises a multiple testing problem. Therefore, a given overall significance level, say α, should be properly adjusted in order to keep the overall type I error below α. Although the commonly used Bonferroni and Sidák criteria can readily be used here to adjust the overall significance level, both methods are very conservative, especially when the sample size is large [1]. Caldas and Singer [6] used the so-called false discovery rate (FDR) criterion, developed by Benjamini and Hochberg [2], to handle the multiple testing problem associated with local spatial statistics, and their results demonstrated that the FDR criterion is much more powerful than the Bonferroni and Sidák methods. Therefore, the FDR criterion is employed here to deal with the multiple testing problem in the analysis of the Boston housing price data with the G_i, I_i and c_i statistics. In what follows, we introduce the FDR criterion in its general form.
Suppose that a total of K tests are conducted simultaneously based on a local statistic and that the resultant p-values are p_1, p_2, …, p_K. Sort the p-values in ascending order as p_(1) ≤ p_(2) ≤ ⋯ ≤ p_(K), and let

k_0 = max{k : p_(k) ≤ (k/K)α},

where α is the given overall significance level. The adjusted significance level for each individual test is α_A = (k_0/K)α.
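The FDR adjustment just described can be sketched as follows (illustrative code, with a hypothetical function name):

```python
import numpy as np

def fdr_adjusted_level(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up rule: find
    k0 = max{k : p_(k) <= (k/K) * alpha} over the sorted p-values and
    return the adjusted per-test level alpha_A = (k0/K) * alpha
    (0 if no p-value qualifies)."""
    p = np.sort(np.asarray(pvals, dtype=float))
    K = len(p)
    ks = np.arange(1, K + 1)
    ok = np.nonzero(p <= ks * alpha / K)[0]
    k0 = ks[ok[-1]] if len(ok) else 0
    return k0 * alpha / K
```

An individual test at a location is then declared significant if its p-value does not exceed the returned α_A.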

Testing results with analysis
For the Boston housing price data, the sample size is n = 506. For each of the three local statistics G_i, I_i and c_i, the bootstrap procedure was used to compute the p-value at each of the 506 tracts, with the number of bootstrap replications being N = 500. The overall significance level was set to α = 0.05. Using the FDR criterion, the adjusted significance levels are α_A^G = 0.00198 for G_i, α_A^I = 0.00573 for I_i, and α_A^c = 0.01107 for c_i. The maps of the testing results are shown in Fig. 1, where the black areas represent the tracts whose original p-values are less than the overall significance level.

The result of the I_i test (panels in the second row) shows a pattern similar to that of the G_i test, especially under the overall significance level of α = 0.05. That is, the tracts with housing prices similar to those of their respective neighbors are also located in the middle western region, except for some tracts in the middle eastern part. After the significance level is adjusted to α_A^I = 0.00573, a belt region where positive spatial autocorrelation is significant is clearly shown. Combining the results of the G_i and I_i tests, we know that these common tracts colored in black and their respective neighbors all share high housing prices, indicating "hot" spots of housing prices in the Boston area.
The result of the c_i test (panels in the last row) demonstrates a spatial pattern for the significant tracts totally opposite to that of the I_i test, although both tests focus on detecting tracts that share a similar housing price with their respective neighbors. Given the foregoing analysis showing that the I_i test uncovers the tracts sharing high housing prices with their respective neighbors, it can be inferred that the c_i test identifies the tracts that share low housing prices with their respective neighbors. In view of the structures of the I_i and c_i statistics, the opposite spatial patterns identified by the two tests may imply that a relatively large dispersion generally exists among the high housing prices shared by a reference tract and its neighbors, while the low housing prices shared by a reference tract and its neighbors are relatively homogeneous. Moreover, it can be observed from the figure that the tracts sharing low housing prices with their respective neighbors are more spatially dispersed than those sharing high housing prices with their respective neighbors. That is to say, the "cold" spots of housing prices are spatially dispersed while the "hot" spots cluster in space.

Final remarks
There has been growing interest in using local statistics to explore local patterns of spatial association in geo-referenced data, in which the null distributions of the local statistics play a key role in the related statistical inference. Considering that the bootstrap method can well accommodate non-normality of the data and can easily be implemented on modern computers, we employ in this paper the bootstrap method to approximate the null distributions of the commonly used local spatial statistics, Getis and Ord's G_i, Moran's I_i and Geary's c_i. More importantly, strong consistency of the bootstrap approximation is established, which provides not only a theoretical basis for using the bootstrap method to approximate the null distributions of these three statistics, but also some evidence for why the normal approximation sometimes fails. Furthermore, the practical implementation procedure of the bootstrap tests based on the local spatial statistics is given in full through a case study of the Boston housing price data.
Methodologically, the bootstrap procedure can readily be used to approximate the null distributions of other local spatial statistics such as Ord and Getis's LOSH statistic [19,25]. However, establishing a common theoretical framework for the validity of the bootstrap approximation does not seem easy. Therefore, the consistency of the bootstrap approximation for other local spatial statistics and, furthermore, the convergence rate of the present bootstrap approximation deserve to be investigated in future research.