1 Introduction

Coronavirus disease 2019 (COVID-19) is a declared global pandemic with multiple risk factors (WHO report: coronavirus). A prominent characteristic of the pandemic is the marked geographic variation in COVID-19 prevalence. As of March 2020, several countries, the epicenters of the pandemic, were already badly affected, while others had just confirmed their first few cases. Balsius (2020) attributes this variation in prevalence to the power law distribution (e.g., Newman 2005), namely, the correlation between the arrival time of the disease and the COVID-19 prevalence.

An interesting related debate at the city level is the role of development density in the spread of pandemics. Compact areas facilitate more intensive human interaction and could lead to higher exposure to the infection, which make them the potential epicenter of the pandemic crisis (Glaeser 2011; Eubank et al. 2004). At the same time, dense areas tend to have superior health and educational systems that are more prepared to handle pandemics, leading to higher recovery rates and lower mortality rates (Dye 2008). Densely developed areas also have the infrastructure to more effectively put in place measures that foster social distancing, thus reducing actual rates of infection. Density also could make it easier to provide services for citizens in-need at the time of social distancing orders (Bell et al. 2009). (Hamidi et al. 2020b pp 1–2).Footnote 1

In the context of COVID-19 infection rates, urbanization economies may work via two channels. On the one hand, urbanization economies facilitate human interaction (Glaeser 2011; Hamidi et al. 2020b), which, in turn, may lead to higher infection rates. On the other hand, large cities typically offer better medical services, which, in turn, may promote medical literacy, namely, heightened awareness of the benefits of complying with hygienic practices (washing hands), social distancing rules and wearing masks. Given the growing criticism of compact planning design in California for potentially facilitating the spread of future viruses (Kahn 2020), new empirical evidence might prove to be important.

The Israeli Ministry of Health Report (2019) shows that the concentration of physicians is highest in the largest cities and lowest in the periphery: against a national average of 3.4 physicians per 1,000 persons, Tel Aviv has 5.3 physicians per 1,000 persons, while the Northern and Southern Districts have 2.5–3.5 physicians per 1,000 persons. In a study among pregnant mothers, female literacy was found to be equivalent to an increased number of nurses (Robinson and Wharrad 2001).

Given this geographic variation, it is of interest to examine whether and to what extent infection rates within countries are influenced by differences in urban population densities and socio-economic conditions. Schmitt-Grohé et al. (2020) found no correlation between access to COVID-19 testing and the level of income in New York City. The ten percent of the city's population living in the richest zip codes, with 29 percent of the city's income, received 11 percent of the COVID-19 tests. The ten percent of the city's population living in the poorest zip codes, with only four percent of the city's income, received ten percent of the tests. On the other hand, Clarke and Whitely (2020) argue that economic inequality can help predict COVID-19 deaths in the USA. Another important factor the authors mention is population density: “Population density matters as well since inter-personal transmission of the virus will be higher in densely populated areas.”

The objective of the current study is to assess the influence of these two factors (population density and socio-economic measures)Footnote 2 on coronavirus infection rates, defined as the ratio of the number of infected persons to the total examined city population. These data are calculated based on the Israeli Ministry of Health report, which is updated as of May 11, 2020. Population densities, socio-economic measures and the Gini Index, updated to 2018, are based on the Israel Central Bureau of Statistics (CBS) reports. Given the highly non-uniform distribution of population across cities, the existence of one of the most densely populated cities in the world (Bnei Brak) and diversified populations, Israel provides an interesting case study to explore this hypothesis.

The remainder of this study is organized as follows. Section 2 reports the background and descriptive statistics. Section 3 describes the methodology and Sect. 4 reports the results. Finally, Sect. 5 concludes and summarizes.

2 Background and descriptive statistics

2.1 Background

Israel is considered to be a highly urbanized nation. According to the ICBD report (Israel in Figures—Selected Data from The Statistical Abstract of Israel 2019), in 2018 a total of 88.9% of the Israeli population, consisting of 8,967,600 inhabitants, lived either in cities (74.2%) or municipalities (14.7%) (page 30). The urbanization processes vis-à-vis the credible terrorist rocket threats that Israel faces (Elster et al. 2017) may pose a challenge to the conventional approach of compact urban planning. The map in the appendix demonstrates another challenge to compact planning design, namely, the spread of the COVID-19 pandemic.Footnote 3

“Appendix A2” displays the map of Israel stratified by population densities. Israel is currently populated by 8.97 million inhabitants, of whom above 75% are Jews, approximately 20% are Arabs and the rest are defined as “other.” The total area of the nine statistical regions in the country, including East Jerusalem and the Golan Heights, is 22,072 sq. km., and if lakes are excluded, the total area is reduced to 21,643 sq. km. (Israel CBS: Israel in Figures 2019).

The spatial distribution of population is highly non-uniform, and there is high variability in population densities across regions. This is demonstrated in the table in “Appendix A3”. While most of the southern parts of Israel are sparsely populated (less than 100 persons per sq. km.), population density rises with a shift to the north, until it reaches a peak of 6,276–12,385 persons per sq. km. in the Tel Aviv District (one of the most densely populated regions on a global scale). Shifting further to the central west and the northern parts of Israel, population densities drop to 1,000–2,999 persons per sq. km. in the Haifa, Nazareth, Karmiel and Nahariya sub-districts (ICBD report: Population—Statistical Abstract of Israel 2019- No.70. Available at: https://www.cbs.gov.il/he/publications/doclib/2019/2.shnatonpopulation/02_01e.pdf).

In 2018, the three major cities in Israel were Jerusalem (919,438 residents), Tel Aviv (451,523 residents) and Haifa (283,640 residents). Jerusalem is the biggest city and the nation’s capital. The Israeli Parliament (the Knesset), the Supreme Court, the Israeli Central Bank, and most of the government offices, including the Prime Minister's Office, are all located in Jerusalem. Tel Aviv is the center of the Gush Dan conurbation, Israel's most heavily populated and dense metropolis, and is considered to be the business and financial center of the country (Gat 1996, 1998).

The Haifa region stands out as the most capital-intensive part of Israel because heavy industry has been concentrated in Haifa since Ottoman times. Throughout 1987–2007, the North had the most capital and the Krayot towns the least (Beenstock et al. 2011: 606). Prior to 1985, the governmental policy preferred capital investment in the periphery to investment in the center. During this period, regional policy was designed to prevent depopulation in the periphery for strategic and not just economic reasons. However, investment in the periphery often had a relatively low return on investment.

Following the Economic Stabilization Plan of 1985, regional policy, like other aspects of economic policy, underwent radical changes. Greater emphasis was placed on market forces in trade policy, labor market policy, macroeconomic policy, and innovation policy. Wholesale support for investment in the periphery was diminished in favor of more selective regional incentives such as R&D, high-tech and business incubator projects (Avnimelech et al. 2007). Therefore, it is not surprising that the periphery began to lose its preferred status over the center (Beenstock et al. 2011: 606).

“Appendix A4” displays the map of Israel stratified by the percentage of locally generated income in the budget of Local Authorities. This variable is positively correlated with the Socio-Economic Index, which, in turn, is positively correlated with the Gini Index (0.6986–0.7382, calculated p value for the rejection of zero correlation < 0.01): an increase in the wealth of a Local Authority is associated with higher income inequality. Yet, one cannot reject the null hypothesis that population density and the socio-economic ranking of the Local Authorities are uncorrelated (p = 0.3587; 0.4804). The implication is that the spatial spread of Local Authorities based on socio-economic ranking is uniformly distributed across Israel.

Following Alperovich (1984) and O’Sullivan (2012: 81–82), we ran a rank-size rule test for the 111–255 Israeli Local Authorities. The conventional empirical model for such a test is given by the following equation:

$$G\left( P \right) = AP^{ - \alpha }$$
(1)

where G(P) = number of cities with population P or more; P = population of city; A = constant term; α = Pareto exponent. This parameter has important implications in terms of population distribution across cities. As this parameter becomes lower, populations are less evenly distributed across cities. For more populated and denser cities, one would anticipate more frequent human interaction, on the one hand, which, in turn, may lead to higher infection rates with lower values of α. On the other hand, large cities typically provide better medical services, which, in turn, may promote medical literacy, namely, heightened awareness of the benefits of compliance with hygienic practices (washing hands), social distancing rules and wearing masks (Glaeser 2011; Hamidi et al. 2020b). This, in turn, may lead to lower infection rates with lower values of α.

The estimation results of Eq. (1) for the entire population of 255 cities yield:

$$\begin{aligned} \text{proj}\left[ \ln \left( G\left( P \right) \right) \right] = {} & 12.48522 - 0.8173233\,\ln \left( P \right); \quad R^{2} = 0.9053 \\ & \left( 0.1623 \right) \qquad \left( 0.0166157 \right) \end{aligned}$$

And for the full sample of 238 Local Authorities that include information about infection rates, the Socio-Economic Index and the Gini Index for inequality (95% of the entire population):

$$\begin{aligned} \text{proj}\left[ \ln \left( G\left( P \right) \right) \right] = {} & 13.05622 - 0.8731546\,\ln \left( P \right); \quad R^{2} = 0.9319 \\ & \left( 0.1512 \right) \qquad \left( 0.0153638 \right) \end{aligned}$$

As anticipated, the estimate of \(\alpha\) is low, and the null hypothesis \(\alpha = - 1\) is clearly rejected (99% confidence interval of [−0.8604, −0.7742], [−0.9034, −0.8429]). In his meta-analysis, Nitsch (2005) found that in most studies this parameter is below −1.0. As suggested by the strict interpretation of Zipf’s law, the implication might be a departure from uniform distribution of population across cities in Israel.
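The log-linear form of Eq. (1) can be illustrated with a minimal sketch, assuming that G(P) is approximated by the rank of each city when cities are sorted from largest to smallest. The function name `rank_size_fit` and the synthetic populations are hypothetical and are not taken from the study's data; ties in population are not handled.

```python
import math

def rank_size_fit(populations):
    """OLS of ln(rank) on ln(population); returns (intercept, slope).

    The rank of a city (1 = largest) stands in for G(P), so the slope
    is an estimate of -alpha in Eq. (1).
    """
    pops = sorted(populations, reverse=True)
    n = len(pops)
    x = [math.log(p) for p in pops]
    y = [math.log(rank) for rank in range(1, n + 1)]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Synthetic cities that follow the strict rank-size rule: P_r = 1,000,000 / r.
synthetic = [1_000_000 / r for r in range(1, 51)]
intercept, slope = rank_size_fit(synthetic)
print(f"intercept={intercept:.4f}, slope={slope:.4f}")
```

For these synthetic cities the fitted slope is −1 up to rounding error, which is the benchmark against which the estimated slopes above are compared.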

Given that our sample is restricted to 111 Local Authorities with information about the Socio-Economic Index and the Gini Index for inequality, and COVID19 infection rates above zero, the estimation results of Eq. (1) for the sample of 111 Local Authorities yield:

$$\begin{aligned} \text{proj}\left[ \ln \left( G\left( P \right) \right) \right] = {} & 14.82664 - 1.04103\,\ln \left( P \right); \quad R^{2} = 0.9647 \\ & \left( 0.20133 \right) \qquad \left( 0.01909 \right) \end{aligned}$$

where standard errors are given in parentheses. The null hypothesis of \(\alpha = - 1\) is rejected at the 5% level (p = 0.0338; the 95% confidence interval is [−1.078857, −1.003204]).
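As a rough arithmetic check of this test, the sketch below recomputes the t statistic from the reported slope and standard error, and backs out the critical value implied by the reported 95% confidence interval. The variable names are illustrative only; the p value itself requires the Student t distribution and is not recomputed here.

```python
# Reported estimates for the restricted sample of 111 Local Authorities.
slope = -1.04103      # coefficient on ln(P)
std_err = 0.01909     # its standard error
null_value = -1.0     # strict rank-size rule

# t statistic for H0: slope = -1.
t_stat = (slope - null_value) / std_err

# Critical value implied by the reported 95% confidence interval.
ci_low, ci_high = -1.078857, -1.003204
implied_critical = (ci_high - ci_low) / 2.0 / std_err

print(f"t = {t_stat:.3f}, implied critical value = {implied_critical:.3f}")
```

The statistic is roughly −2.15, whose absolute value exceeds the implied critical value of about 1.98, in line with rejection at the 5% level.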

2.2 Descriptive statistics

Table 1 reports the descriptive statistics, and Fig. 1 shows the histogram of each variable, for the entire sample of 238 Local Authorities and the sample of 111 observations that excludes Local Authorities without COVID-19 cases. As the histograms demonstrate, the distributions of Rate_Infected and Pop_Density are skewed to the right. The skewness is 1.50–1.73 (Rate_Infected) and 2.42–3.18 (Pop_Density). The null hypothesis of a symmetrical distribution is clearly rejected. The calculated adjusted Chi2 statistics with 2 degrees of freedom are 64.15–138.80 for the entire sample of 238 Local Authorities and 26.23–55.49 for the sample that excludes Local Authorities without COVID-19 cases, while the 1% critical Chi2 value is only 9.21034. In contrast, the distributions of the Gini Index and the socio-economic ranking seem to be much more symmetrical: the null hypothesis of a symmetrical distribution is not rejected at the 1% level (p = 0.0156–0.9402).
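Two of the quantities above can be illustrated with a short sketch. The moment-based skewness formula is an assumption (the source does not state which estimator was used), the two small data sets are hypothetical, and the critical value uses the fact that a chi-square variable with 2 degrees of freedom has survival function exp(−x/2).

```python
import math

def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def chi2_critical_2df(alpha):
    """Upper-tail critical value of a chi-square with 2 degrees of freedom.

    Solving exp(-x / 2) = alpha gives x = -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)

right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical, long right tail
symmetric = [1, 2, 3, 4, 5]          # hypothetical, symmetric

print(sample_skewness(right_skewed), sample_skewness(symmetric))
print(chi2_critical_2df(0.01))
```

The closed form reproduces the 1% critical value of 9.21034 quoted in the text.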

Table 1 Descriptive statistics
Fig. 1 Histograms of Variables at a Local Authority Level

2.3 Pearson correlation matrices

Table 2 reports the Pearson Correlation Matrices for the entire sample of 238 Local Authorities and the sample of 111 observations that excludes Local Authorities without COVID19 cases. The table supports the conclusion that there is no collinearity between Population Density and both socio-economic measures (Socio-Economic Ranking and Gini Index). However, there is high collinearity between Socio-Economic Ranking and Gini Index (0.6986–0.7382, where for both correlations the null hypothesis of zero correlation is clearly rejected).Footnote 4

Table 2 Pearson Correlation Matrix

Finally, note the positive Pearson correlations between infection rates and population densities (0.3845 for the full sample, where p < 0.01 for the rejection of the zero correlation null hypothesis, and 0.2250 for the restricted sample, where p = 0.0176 for the rejection of the zero correlation null hypothesis). When the sample is restricted to exclude Local Authorities with zero infection rates, the Pearson correlation between infection rates and socio-economic ranking drops from minus 12.02% (p = 0.0641) to minus 45.68% (p < 0.01). The implication is a rise in the explanatory power of this variable vis-à-vis a drop in the explanatory power of population density with the restriction of the sample.
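The significance tests behind these correlations can be sketched with the standard t statistic for a zero-correlation null hypothesis, assuming that this is the test the reported p values are based on. The function name is illustrative; the correlations and sample sizes are those quoted above.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for H0: zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)

# Infection rate vs. population density, restricted sample (n = 111).
t_density_restricted = correlation_t_stat(0.2250, 111)

# Infection rate vs. socio-economic ranking, full sample (n = 238).
t_ses_full = correlation_t_stat(-0.1202, 238)

print(f"{t_density_restricted:.3f}, {t_ses_full:.3f}")
```

The two statistics are roughly 2.41 and −1.86. Converting them to p values requires the Student t distribution, which is left out of this sketch.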

3 Methodology

Consider the following maximum likelihood objective function of the fractional probit model (e.g., Papke and Wooldridge 1996; Johnston and Dinardo 1997: 61–63,Footnote 5 424–426; Wooldridge 2010):

$$\ln L = \mathop \sum \limits_{j = 1}^{N} \omega_{j} y_{j} \ln \left\{ {G\left( {x_{j}^{^{\prime}} \beta } \right)} \right\} + \mathop \sum \limits_{j = 1}^{N} \omega_{j} \left( {1 - y_{j} } \right)\ln \left\{ {1 - G\left( {x_{j}^{^{\prime}} \beta } \right)} \right\}$$
(2)
$$G\left( {x_{j}^{^{\prime}} \beta } \right) = \Phi \left\{ {x_{j}^{^{\prime}} \beta /\exp \left( {z_{j}^{^{\prime}} \gamma } \right)} \right\}$$
(3)

where j is the index for each Local Authority (for the full model \(j = 1,2,3, \ldots ,238\) and for the model that excludes Local Authorities with zero infection rates \(j = 1,2,3, \ldots ,111\)); \(\omega_{j} = \sqrt {{\text{POP}}_{j} }\) (the square root of the Local Authority population); \(y_{j} = {\text{Infection}}\_{\text{rate}}_{j} = \frac{{{\text{Infected}}_{j} }}{{{\text{Examined}}_{j} }}\) where \(0 \le y_{j} < 1\); \(x_{j}^{^{\prime}}\) is the j-th row of a matrix whose dimensions are 238 × 5 or 111 × 5 (\(x_{j,1}^{^{\prime}} = \vec{1}\) for the constant term; \(x_{j,2}^{^{\prime}} = {\text{Population}}\_{\text{Density}}_{j}\) in persons per square kilometer; \(x_{j,3}^{^{\prime}} = {\text{Population}}\_{\text{Density}}_{j}^{2}\); \(x_{j,4}^{^{\prime}} = {\text{Gini Index}}_{j}\) (a measure that ranges between zero = perfect equality and 1 = perfect inequality) and \(x_{j,5}^{^{\prime}}\) = Socio-Economic ranking of the Local Authority, which ranges between 1 = the lowest and 10 = the highest). Finally, \(\Phi\) is the cumulative normal distribution function; \(z_{j}^{^{\prime}} = x_{j}^{^{\prime}}\) and \(\beta\) and \(\gamma\) are column vectors of the parameters with up to four rows.
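To make Eqs. (2)–(3) concrete, the following is a minimal sketch of the weighted log-likelihood, not the estimation routine used in the study. The function name, the two example rows and the choice to omit a constant from z are assumptions made for illustration; the numerical guard against log(0) is also an addition.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def fractional_probit_loglik(beta, gamma, rows):
    """Weighted log-likelihood of Eqs. (2)-(3).

    rows: iterable of (y, x, z, weight), where 0 <= y <= 1, x holds the
    regressors of the mean and z those of the variance term.
    """
    total = 0.0
    for y, x, z, weight in rows:
        g = _PHI(dot(x, beta) / math.exp(dot(z, gamma)))  # Eq. (3)
        g = min(max(g, 1e-12), 1.0 - 1e-12)               # avoid log(0)
        total += weight * (y * math.log(g) + (1.0 - y) * math.log(1.0 - g))
    return total

# Two hypothetical Local Authorities: (infection rate, x, z, sqrt(POP)).
rows = [
    (0.02, [1.0, 5.0, 25.0], [5.0, 25.0], math.sqrt(10_000)),
    (0.05, [1.0, 20.0, 400.0], [20.0, 400.0], math.sqrt(40_000)),
]
ll_null = fractional_probit_loglik([0.0, 0.0, 0.0], [0.0, 0.0], rows)
print(ll_null)
```

With all parameters set to zero, G equals 0.5 for every observation, so the log-likelihood reduces to the sum of the weights times ln(0.5). In practice, β and γ would be chosen by a numerical optimizer to maximize this function.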

The fractional probit model was pioneered and has been extensively used in biometrics applications (Amemiya 1981: 1484; Johnston and Dinardo 1997: 413). It belongs to the family of discrete choice models. Biologists (medical researchers) employ this sort of model to measure the relationship between survival of an insect = 1; otherwise = 0 (patient recovery = 1; non-recovery = 0) and the dosage of insecticide (drugs). Consequently, it seems plausible to employ this model in an individual-level sample, where the limited dependent variable equals 1/0 if the person was infected/not-infected.

The difference between the Probit and the Fractional Probit model is the definition of the dependent variable. While the dependent variable in the former takes only the values 1 or zero (\(y_{i} = 1/0\) for infected/not-infected), the dependent variable in the latter may take any value bounded between 0 and 1, so that: \(0 \le y_{i} \le 1\) (Papke and Wooldridge 1996: 621). This model fits the definition of Infection Rate as Cases divided by the population of the Local Authority.

With this exception, the discussion of these bounded models is quite similar. It is possible to run an OLS procedure with a limited dependent variable (the Linear Probability Model or LPM). The advantage of this model lies in the fact that the coefficients are readily interpreted as marginal probabilities. Yet, according to Johnston and Dinardo (1997): “A major weakness of the linear probability model is that it does not constrain the predicted value to lie between 0 and 1.” (page 417, italics in the source). Based on the 1988 Population Survey data, the authors highlight this weakness: according to the projected probabilities of becoming a union member (Union = 1/0 for Union/non-union member), 5% of the persons have a minus 10% chance of being a union member (page 417).

The Probit model, whose predictions are bounded between 0 and 1, is based on the cumulative standard normal distribution function. An additional advantage of the probit model is: “In the probit model, the derivative of the probability with respect to X varies with the level of X and the other variables in the model.” (Johnston and Dinardo 1997: 422, italics in the source).

Johnston and Dinardo (1997) also mention the following disadvantage of the probit model: “Observe that the sign pattern of the coefficients is the same one we observed for the linear probability model. However, calculating the change in the probability of union membership with respect to one of the right-hand-side variables is not so simple as it was in the linear probability model.” (page 422).

Finally, the Logit model, whose predictions are bounded between 0 and 1, is given by: \({\text{prob}}\left( {y_{i} = 1} \right) = \frac{{\exp \left( {X_{i} \beta } \right)}}{{1 + \exp \left( {X_{i} \beta } \right)}}\) (Johnston and Dinardo 1997: 424). The main difference between the normal distribution and the logistic distribution is that the latter has more weight in the tails. Yet Johnston and Dinardo (1997: 424–426) demonstrate, in contrast to the LPM, the minor projected differences obtained from the estimation of the probit and logit models.
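The contrast among the three models can be illustrated with a minimal sketch that maps the same linear index to a predicted probability under each model. The index values are hypothetical and the function names are illustrative; the point is only that the LPM prediction can leave the unit interval while the probit and logit predictions cannot.

```python
import math
from statistics import NormalDist

def lpm_prediction(index):
    """Linear probability model: the linear index is the prediction itself."""
    return index

def probit_prediction(index):
    """Probit: standard normal CDF of the linear index."""
    return NormalDist().cdf(index)

def logit_prediction(index):
    """Logit: exp(v) / (1 + exp(v)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-index))

# Hypothetical values of the linear index X_i * beta.
for v in (-0.10, 0.00, 0.50, 1.20):
    print(f"{v:+.2f}  LPM={lpm_prediction(v):+.3f}  "
          f"probit={probit_prediction(v):.3f}  logit={logit_prediction(v):.3f}")
```

An index of −0.10 reproduces the kind of negative projected probability noted for the LPM, whereas the two bounded models return values strictly between 0 and 1 for every index.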

4 Results

4.1 Main results

Table 3 reports the regression outcomes of Eqs. (2)–(3). The bottom of the table gives the outcomes of the Harvey-Godfrey test for heteroskedasticity, where the dependent variable in the auxiliary regression is \(\ln \hat{\sigma }^{2}\) (Ramanathan 2002: 348–350). This specification seems to fit the model described by Eq. (3). The outcomes support heteroskedasticity with respect to population density and the Socio-Economic Index for the full sample of 238 Local Authorities (consisting of 95.39% of the Israeli population), and with respect to population density and the Gini Index for the restricted sample of 111 Local Authorities (consisting of 78.80% of the Israeli population).

Table 3 Regression outcomes

The outcomes reported at the top of Table 3 indicate that the linear model is rejected in favor of the quadratic model. The negative sign of the coefficient of squared population density indicates a parabola with a maximal point: the projected probability of infection is expected to rise with population density at a decreasing pace until it reaches a maximum, after which the projected probability of being infected drops.

Given the difficulty of directly interpreting the coefficients of the fractional probit regressions, and based on the outcomes reported in Table 3, Figs. 2a–b give the projected probability of being infected with COVID-19 with respect to population density. The figures show an anticipated rise in the COVID-19 infection rate with population density from 1.6–2.72% up to a maximum of 5.17–5.238% at a population density of 20,282–20,542 persons per sq. km. Above this benchmark, the anticipated infection rate drops to 4.06–4.50%. The projected infection rate of 4.06–4.50% in the Local Authority with the maximal population density of 26,510 persons per sq. km. equals that projected at 11,979–13,343 persons per sq. km.
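The way such a turning point can be located is sketched below with a simple grid search over the projected probability of Eq. (3). All coefficients are hypothetical (they are not the Table 3 estimates), density is expressed in thousands of persons per sq. km., and the other regressors are held fixed and absorbed into the constant; the variance term here omits a constant, which is an assumption.

```python
import math
from statistics import NormalDist

def projected_rate(density, beta, gamma):
    """Phi(x'beta / exp(z'gamma)) with x = (1, d, d^2) and z = (d, d^2)."""
    index = beta[0] + beta[1] * density + beta[2] * density ** 2
    log_scale = gamma[0] * density + gamma[1] * density ** 2
    return NormalDist().cdf(index / math.exp(log_scale))

def peak_density(beta, gamma, upper, step):
    """Grid search for the density with the highest projected rate."""
    n_steps = round(upper / step)
    grid = [i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda d: projected_rate(d, beta, gamma))

# Hypothetical coefficients, chosen only to produce an inverted U.
beta = (-2.0, 0.04, -0.001)
gamma = (0.0, 0.0)  # no heteroskedasticity in this illustration

peak = peak_density(beta, gamma, upper=30.0, step=0.01)
vertex = -beta[1] / (2.0 * beta[2])  # closed form when gamma is zero
print(peak, vertex, projected_rate(peak, beta, gamma))
```

When γ is zero the grid search recovers the vertex of the parabola, −β₂/(2β₃). With a nonzero γ the peak generally shifts away from the vertex and the curve is no longer symmetric around it, which is why a numerical search is used in this sketch.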

Fig. 2 a Including Local Authorities with no infection (238 Local Authorities). b Excluding Local Authorities with no infection (111 Local Authorities). Notes Based on the outcomes reported in Table 3

Given that in denser regions (above 20,282–20,542 persons per sq. km.) projected infection rates are anticipated to drop, the results of our study reveal a more complex relationship between the COVID-19 infection rate and population density than previously thought, and may support the professional consensus in favor of compact development (Hamidi et al. 2020b).

Returning to Table 3, results show a positive association between projected probability of infection rates and income inequality, and a negative association between projected probability of infection rates and the socio-economic ranking.

4.2 Robustness tests

As a robustness test, we compare the results obtained from the fractional probit with those obtained from the fractional logit and the linear probability models, where robust standard errors are employed and only inherent heteroskedasticity with respect to the group size (\(\sqrt {{\text{POP}}}\)) is considered. The outcomes are given in Table 4 and are consistent with those obtained previously when the fractional probit and the fractional logit models are applied to the full sample of 238 Local Authorities, covering 95% of the Israeli population.

Table 4 Robustness tests of different empirical models

To measure the impact of first-order spatial autocorrelation, we mapped all 238 Local Authorities based on longitude and latitude and sorted them from the southern to the northern parts of Israel. In addition, we corrected for heteroskedasticity, where \(\sigma_{i}^{2} = \frac{{\sigma^{2} }}{N}\) and the weights are \(\sqrt N = \sqrt {{\text{POP}}}\) (the square root of the population of each Local Authority). The outcomes are reported in Table 5, where columns (1) and (3) report the LPM estimates (with statistical tests for first-order spatial autocorrelation) and columns (2) and (4) report the estimates obtained via the Prais–Winsten regression.Footnote 6

Table 5 Robustness Test of Spatial Autocorrelation

Results of the LPM procedure, which ignores first-order spatial autocorrelation, and the Durbin H test,Footnote 7 demonstrate that the null hypothesis of zero autocorrelation (\(\rho = 0\)) is rejected for both the full sample of 238 Local Authorities at the 1% level (where \(\hat{\rho } = 0.3383\)) and the limited sample of 111 Local Authorities at the 5% level (where \(\hat{\rho } = 0.6570\)). Yet, when the Prais–Winsten procedure replaces the LPM, results remain robust both in terms of the sign and significance of the coefficients.
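The two ingredients of this procedure can be sketched as follows, assuming residuals ordered from south to north as described above. The estimator of ρ shown here is one common convention (others divide by the sum of all squared residuals), and the function names and example residuals are hypothetical.

```python
import math

def estimate_rho(residuals):
    """First-order autocorrelation of residuals in their sorted order."""
    num = sum(cur * prev for prev, cur in zip(residuals, residuals[1:]))
    den = sum(prev ** 2 for prev in residuals[:-1])
    return num / den

def prais_winsten_transform(series, rho):
    """Quasi-difference a series, rescaling (not dropping) the first value."""
    first = math.sqrt(1.0 - rho ** 2) * series[0]
    rest = [cur - rho * prev for prev, cur in zip(series, series[1:])]
    return [first] + rest

# Hypothetical residuals that decay geometrically with factor 0.5.
residuals = [1.0, 0.5, 0.25, 0.125]
rho_hat = estimate_rho(residuals)
transformed = prais_winsten_transform(residuals, rho_hat)
print(rho_hat, transformed)
```

In a Prais–Winsten regression the same transformation is applied to the dependent variable and to every regressor, including the constant, before re-running least squares; only the transformation step is shown here.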

An additional concern, which should be addressed, is the high collinearity between the socio-economic measures (Gini, Socio_Economic_Index). According to Johnston and Dinardo (1997): “The more the X’s look alike, the more imprecise is the attempt to estimate their relative effects. This situation is referred to as multicollinearity or collinearity” (pages 88–89). There are two strategies to address this concern.

The first strategy is to argue that the two control variables included in the empirical model (Gini, Socio_Economic_Index) are relevant according to the theory or the logic of the researchers, and, consequently, multicollinearity is not an issue. Indeed, according to Ramanathan (2002): “The danger of multicollinearity is a strong argument against the indiscriminate use of explanatory variables. The importance of theory in formulating models should once again be emphasized. There may be strong theoretical reasons for including a variable even if multicollinearity might make its coefficient insignificant. In this case, the variable should be retained in the model even if multicollinearity exists.” (page 216; a further discussion is given in Johnston and Dinardo 1997: 110–111; Kmenta 1997: 430–432; 442–446). Indeed, the control variables Gini and Socio_Economic_Index capture different dimensions of the socio-economic status of the city. While the former captures the dispersion around the mean of the city's wealth or income, the latter reflects the mean.

The second strategy is to drop the Gini variable from the model and observe the change in the sign of the coefficient and the p-value of the remaining variable. Table 6 gives this robustness test, where the full model is estimated with and without the independent variable Gini. The outcomes of this test demonstrate that the coefficient of Socio_Economic_Index remains negative and statistically significant in the presence and absence of the Gini variable. The implication is that when population density is controlled, the COVID-19 infection rate is anticipated to drop by 0.368–0.748% with a one-point rise in the socio-economic ranking of the city.

Table 6 Collinearity robustness test

5 Summary and conclusions

Given the huge geographic variation in COVID-19 prevalence, the objective of the current study is to assess the influence of population density and socio-economic measures on coronavirus infection rates, defined as the ratio of the number of infected persons to the total examined city population. These are calculated based on the Israeli Ministry of Health report, which is updated to May 11, 2020. Population densities, socio-economic rankings and the Gini Index, updated to 2018, are based on the Israel CBS reports. Israel provides an interesting case study based on the highly non-uniform distribution of population across cities, the existence of one of the most densely populated cities in the world (Bnei Brak) and diversified populations.

Given the debate over whether compact planning may promote the spread of a virus, it is important to provide evidence referring to the relationship between COVID19 spread and population density. Indeed, compact planning promotes greater physical activity and less likelihood of obesity, heart disease, cancer prevalence (Ewing et al. 2014; Sallis et al. 2016; Arbel et al. 2019), higher life expectancy (Hamidi et al. 2018) and consumption of healthier food (Hamidi 2020a, b).

The outcomes of our study may provide support for compact planning design. They demonstrate that, ceteris paribus, projected probabilities of being infected with coronavirus rise with population density from 1.6–2.72% up to a maximum of 5.17–5.238% at a population density of 20,282–20,542 persons per square kilometer. Above this benchmark, the anticipated infection rate drops to 4.06–4.50%. The projected infection rate of 4.06–4.50% in the Local Authority with the maximal population density of 26,510 persons per sq. km. equals that projected at 11,979–13,343 persons per sq. km. Indeed, city planners should weigh the costs and benefits of many risk factors, including the COVID-19 pandemic.

In the conventional urban economics textbooks, high population density represents multi-story structures with smaller building footprints, where the price of land is high (in the central cities, e.g., Mills and Hamilton 1989: 425–434; O’Sullivan 2012: 127–151; Arbel et al. 2019). Yet, when population density is transformed from theory to practice, several econometric problems arise, including: (1) the difference between gross and net population density (McDonald and McMillen, 2011: 121); (2) inherent heteroskedasticity with respect to the group size; and (3) aggregation bias.Footnote 8 Consequently, and in spite of our treatment of this inherent heteroskedasticity problem, the use of empirically measured population densities may be viewed as a limitation of our study.

Yet, referring to the aggregation bias problem, note that it might be rather small. In their textbook, McDonald and McMillen (2011) demonstrate the division of the same region around the center of New York City into 3,761 instead of 50 zones. This division results in only a 9.677% rise in the absolute value of the estimated negative population density gradient, from 12.4 to 13.6% (page 125).
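The relative rise quoted above follows directly from the two gradient values:

$$\frac{13.6 - 12.4}{12.4} = \frac{1.2}{12.4} \approx 0.09677 = 9.677\%$$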