The environmental neighborhoods of cities and their spatial extent

We define the new concept of an environmental neighborhood as the surrounding area influencing the environmental quality at a given point in a city, and propose a novel methodology to measure its spatial extent. We compute the spatial correlation of air quality and urban parameters from high spatial resolution datasets for New York City, where the urban characteristics are averaged over variable urban footprint sizes, ranging from 25 m × 25 m to 5000 m × 5000 m. The scale at which these correlations peak indicates the extent of the neighboring area that influences the deviations of pollutant concentrations from the city-wide average. The results indicate that the scale of these environmental neighborhoods ranges from ∼1000 m (for attributes such as road area or building footprint) down to ∼200 m (for building use or green area). Selecting this optimal neighborhood scale is thus critical for identifying the urban fabric and activity attributes that have the largest influence on air quality; smaller footprints do not contain all the pertinent urban surface information while larger footprints contain irrelevant, potentially misleading information. The quantification of this scale of influence therefore enables more effective and localized policies and interventions to improve urban environmental quality and reduce urban health disparities. More broadly, the findings indicate that, in a wide range of environmental and ecological applications where surface heterogeneity is a primary driver, the scale of analysis is not an external parameter to be chosen, but rather an internal parameter dictated by the problem physics.


Introduction
Urban environmental quality has in the last decade become a global health concern. It has recently been reported at the World Health Organization (WHO) Global Conference on Air Pollution and Health that 'Air pollution-both ambient and household-is estimated to cause 7 million deaths per year' (WHO 2018a). Poor air quality impacts are exacerbated by the urban heat island effect, which itself is exacerbated by climate-change related heat waves that are increasing in frequency (Bou-Zeid 2013, WHO 2018b, Li et al 2019). As reported by the Environmental Protection Agency (EPA), the increased daytime temperatures and reduced night-time cooling, coupled to the higher air pollution levels correlated with urban heat islands and anthropogenic emissions, affect human health by producing general discomfort, heat cramps and exhaustion, respiratory difficulties, heat strokes, and heat-related mortality (Dye 2008, EPA 2019, Goldman and Dominici 2019, Editorial 2019, Caplin et al 2019). Exposure to poor air quality is also linked to increased death risk from pathogenic respiratory diseases such as SARS (Cui et al 2003), and this has prompted recent media reports projecting these findings to the novel SARS-CoV-2 virus (The Washington Post 2020). Interventions to curb this multi-hazard risk have thus never been more crucial, but the science of such interventions remains more qualitative than quantitative.
The introduction of trees and vegetation, green roofs, cool roofs and pavements, as well as urban strategies to enhance urban mechanical ventilation, have proven effective in reducing the intensity of the urban heat island and air contaminant concentration levels (Grimmond and Oke 1999, Yin 2011, Santamouris 2013, Akbari 2016, Kroeger et al 2018, Llaguno-Munitxa and Bou-Zeid 2018). These measures are being adopted in various cities all around the world. Urban planning strategies to alleviate air pollution in cities through traffic restriction policies or Floor Area Ratio (FAR) regulations are also starting to be considered (Editorial 2018, Chandra 2018). Concurrently, many cities are starting to invest in extensive weather and air quality sensing networks to understand their urban environmental quality gradients. The C40 Cities air quality network led by the cities of London in the UK and Bengaluru in India, for example, claims to be implementing the 'most sophisticated air quality air monitoring system' (C40 Cities 2018). The Array of Things (AoT) project in Chicago aims to deploy 500 stationary environmental sensing nodes in the city center to monitor urban floods, heat island and air quality, with the goal of informing strategic investment in urban policy and municipal socioeconomic initiatives. New York City, on the other hand, is following an ambitious goal set by PlaNYC to become the city with the cleanest air of any major US city by 2030. As part of PlaNYC's air strategy, the city's Department of Health and Mental Hygiene's New York City Community Air Survey (NYCCAS) was launched to collect high spatial resolution air pollution data. 150 street level air pollution stations were installed from 2008 to 2010, reduced to 100 from 2010 to 2013 (NYCCAS 2018, Matte et al 2013).
While mobile microclimate and air quality sensing technologies have been in use since the first half of the 20th century (Schmidt 1927, Peppler 1929, Budel and Wolf 1933), recent advancements in IoT technologies are making them increasingly popular (Kuttler and Strassburger 1999, Van Poppel et al 2013, Levy et al 2014). They can very effectively complement the fixed networks surveyed above (Yang and Bou-Zeid 2019a). The use of mobile air quality monitoring networks is enabling the acquisition of high spatial resolution air quality data affordably in various cities such as New York, Zurich, Oklahoma, California and London (Maciejczyk et al 2004, Mueller et al 2016, Apte et al 2017, BBC News 2019). Thus, for the first time in history, through the data collected by dense fixed networks and mobile sensing technologies, we are starting to capture and understand the strong spatiotemporal urban air pollution and temperature gradients present in our cities. What is needed now is a framework for applying this knowledge to quantitatively guide the interventions and policies aimed at mitigating the multi-hazard environmental risk in cities. The effect of local urban intervention strategies on air quality, such as the planting of trees or modifications of urban roughness and city block densities, as well as the effect of urban characteristics such as population or building geometries, have been widely researched in prior literature (Grimmond and Oke 1999, Beckett et al 2000, Hang et al 2012, Vos et al 2013, Gromke and Blocken 2015, Llaguno-Munitxa and Bou-Zeid 2018). Correlations between air quality datasets and urban surface characteristics and anthropogenic activities have also been studied by various researchers (Martilli 2014, Rodriguez et al 2016, Yang et al 2016). However, such studies have mainly utilized idealized models or limited experimental datasets; their uncertainties hence remain high.
This may preclude accurate quantification of the benefits and the design of finely-targeted interventions. A particularly important question concerns the required spatial extent of these interventions to appreciably improve environmental quality in neighborhoods with severe problems (Yang and Bou-Zeid 2019b). The expanding availability of extensive air quality datasets collected at high spatiotemporal resolution, or mobile air quality sensing technologies, now enables analyses that can answer such questions concerning complex urban air quality spatial distributions (Chen et  . It has been recently reported, for example, that concentrations of pollutants can vary by over five times within a single city block, and by over eight times at the city scale due to variability in local emissions and traffic (Apte et al 2017). If we combine detailed urban environmental and attribute datasets, can we distill the linkages of distinct urban characteristics and activity to local air quality? At what scale do these linkages manifest? How can this information guide the design of future cities to improve environmental quality?
This paper presents a novel approach to tackle these questions utilizing high spatiotemporal resolution air quality experimental data collected by NYCCAS from 2008 to 2016, one of the most complete air quality datasets available today that includes data for Nitrogen Oxide (NO), Nitrogen Dioxide (NO 2 ), Particulate Matter <2.5 µm (PM2.5), Ozone (O 3 ), Elemental Carbon (EC), and Sulfur Dioxide (SO 2 ). New York City also provides an extensive open data portal that enables us to relate the air pollutant concentration levels to urban characteristics. Stewart and Oke (2012) developed the Local Climate Zone (LCZ) concept to represent urban land patches with relatively uniform characteristics at the neighborhood scale that create spatially coherent thermal climates. Subsequent studies that used high spatial resolution environmental datasets to infer the influence of land use parameters have primarily focused on the influence of urban attributes on local temperature, concluding that air temperature variability (e.g. Ziter et al 2019) and surface temperature variability (e.g. Buyantuyev and Wu 2009, Jenerette et al 2016) are linked to urban parameters at a very small, local scale. However, comparable analyses for urban air quality (Edussuriya et al 2011, Miskell et al 2015, Zhou et al 2018, Jung et al 2019) have struggled to converge on general conclusions on the links between urban attributes and air pollution, and have adopted arbitrary spatial scales in their analysis (ranging from a 200 m scale to the city scale). Therefore, the spatial extent of the influence of urban parameters and the links between urban

Methods
In this paper, we propose a new approach to overcome these challenges, defining the novel concept of an environmental neighborhood as the area that exerts the most influence on environmental quality at a given point. We then demonstrate how the spatial extent of such a neighborhood can be measured based on a variable-footprint approach. We illustrate these ideas using the unique environmental contaminant and urban parameter datasets available for New York City, although comparable datasets are becoming commonly available in many other metropolises. For each NYCCAS air quality station location, urban parameter metrics have been computed for distinct urban footprint sizes, ranging from 25 m × 25 m to 5000 m × 5000 m, allowing us to understand the correlation between contaminant concentrations and neighboring urban parameters at multiple scales. The scale maximizing this correlation can then be defined as the area of influence of a given urban parameter on local air quality.
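The variable-footprint idea can be illustrated with a minimal Python sketch. It is not the processing chain used in the paper: it assumes the urban parameter is already available as a regular grid of cell values, that stations are given as (row, column) cell indices, and that footprints are expressed as a half-width in cells. The function names `footprint_mean`, `correlation_by_scale` and `peak_scale` are hypothetical.

```python
import math


def pearson(x, y):
    """Linear Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def footprint_mean(grid, row, col, half_width):
    """Mean of the grid cells in a square window centred on (row, col).

    Assumption: the window is clipped at the grid edges, so footprints near
    the boundary average over fewer cells.
    """
    r0, r1 = max(0, row - half_width), min(len(grid), row + half_width + 1)
    c0, c1 = max(0, col - half_width), min(len(grid[0]), col + half_width + 1)
    cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(cells) / len(cells)


def correlation_by_scale(grid, stations, concentrations, half_widths):
    """Correlate station concentrations with footprint-averaged parameter values.

    Returns a dict mapping each footprint half-width to the Pearson coefficient
    computed across all stations.
    """
    result = {}
    for half_width in half_widths:
        averaged = [footprint_mean(grid, r, c, half_width) for r, c in stations]
        result[half_width] = pearson(averaged, concentrations)
    return result


def peak_scale(corr_by_scale):
    """Footprint scale with the largest correlation magnitude."""
    return max(corr_by_scale, key=lambda scale: abs(corr_by_scale[scale]))
```

In this sketch the footprint returned by `peak_scale` plays the role of the area of influence; magnitudes are compared so that negatively correlated parameters such as green cover are treated the same way as positively correlated ones.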
As displayed in figure 1(b), the studied urban footprints range from the immediate neighboring area of a station up to a large fraction of the city. Utilizing the 3 ft × 3 ft spatial resolution land use dataset available for New York City that provides a two dimensional characterization of the urban fabric, the building footprint area Building fa , road area Road a , tree canopy plan area fraction Tree ca , and the green grass/shrub area Green gsa were computed for the different footprint sizes. Given that the traffic count was reported for specific locations within New York City, a traffic flow estimation model was used for the stations and footprints under study (details in the supplementary information). The total building area Building ta , total residential building area Residential ta , total non-residential area Non-Residential ta , and mean building height metric Height bm were computed for all NYCCAS air quality station locations (see figure 1(a) and figure S3), and for each of the studied footprints (see table S1 for further details).

Results and discussion
The linear Pearson correlation coefficients (Pearson 1895) of contaminant concentrations with each other, and of urban parameters with each other, were first computed to identify their linkages and interdependences. Amongst the six contaminants reported by NYCCAS, the highest correlations are observed between PM 2.5 and NO 2 , and between NO 2 and NO, while O 3 tends to be negatively correlated with the other pollutants (see table S2). Table 1 lists the Pearson correlation coefficients between the selected urban parameters for a 1000 m × 1000 m footprint as an illustration. The results of urban parameter correlations suggest that we can omit the Building ta parameter as an independent input given the high correlation it has with the Height bm parameter (see table 1). All other parameters, however, seem to be partially independent and need to be retained. Once the independent urban parameters were selected, the retained ones were qualitatively coalesced into four categories to aid in the interpretation of the results. The first, Pedestrian Level Emissions, groups the urban parameters Road a and Traffic c , which inform us about the activity and emissions from near street level sources. The second category, Green Infrastructure, includes the urban parameters Tree ca and Green gsa , which can potentially act as air pollution sinks. The third category, Building Density, comprises the urban parameters Building ta , Building fa and Height bm that represent the ventilation and sunlight ingress potential of urban street canyons. The final category, Building Type, includes the parameters Residential ta and Non-Residential ta , which reveal information related to emissions from household or tertiary use building facilities, both through pedestrian or rooftop level sources. Now we can proceed to evaluate which urban parameters have the largest impact on air quality, and what their spatial scale of influence is.
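The screening of inter-correlated urban parameters described above can be sketched in Python as follows. This is an illustration only: the paper does not state a numerical cut-off, so the `threshold` of 0.9 is an assumption, and the function name `redundant_pairs` and the parameter values in the usage note are hypothetical.

```python
import math


def pearson(x, y):
    """Linear Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(params, threshold=0.9):
    """List parameter pairs whose |Pearson r| meets or exceeds the threshold.

    params maps a parameter name to its per-station values at one footprint
    scale. The threshold is an assumed value, not one taken from the paper.
    """
    names = sorted(params)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(params[first], params[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs
```

For example, calling `redundant_pairs({"Building_ta": [...], "Height_bm": [...], "Tree_ca": [...]})` with per-station values would flag any pair that carries nearly the same information, from which one member could be dropped as an independent input.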
To that end, the urban parameters were spatially averaged (for urban footprints from 25 m × 25 m to 1000 m × 1000 m); subsequently, three correlation coefficients (the linear Pearson correlation, the power law Pearson correlation and the non-parametric Spearman correlation) were computed for each footprint scale, and then averaged across these footprint scales (only for this analysis). The goal is to (i) confirm that the parametric and non-parametric correlation trends are consistent (which they were) and (ii) check whether the air quality and urban parameter relations are better described by a linear or a power law fit (further details included in the supplementary information and table S3). For most of the cases, a linear fit gave the highest correlation. However, some parameters such as Tree ca or Green gsa show a higher correlation with the proposed Pearson (ln (x) , ln (y)) fit instead, where ln is the natural logarithm of the variable. We henceforth use the correlation (linear or logarithmic) that gave the best fit for each contaminant and urban parameter pair. These correlation coefficient results are depicted in the bar plots of figure 2.
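The comparison between the three coefficients can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the power law fit is taken to be the Pearson coefficient of (ln x, ln y), which requires strictly positive inputs, and the rule of keeping whichever of the linear and power law coefficients has the larger magnitude is one reading of "best fit". The names `ranks`, `spearman` and `best_fit` are hypothetical.

```python
import math


def pearson(x, y):
    """Linear Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Non-parametric Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def best_fit(x, y):
    """Return ("linear", r) or ("power_law", r), whichever |r| is larger.

    Assumption: x and y are strictly positive so that ln(x) and ln(y) exist.
    """
    linear = pearson(x, y)
    power_law = pearson([math.log(v) for v in x], [math.log(v) for v in y])
    if abs(linear) >= abs(power_law):
        return ("linear", linear)
    return ("power_law", power_law)
```

The Spearman coefficient serves only as a consistency check here, since it depends on rank order alone and is therefore unchanged by the logarithmic transform.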
For SO 2 , the highest correlation is with Height bm and Building ta . SO 2 behavior is distinct from the other pollutants since (i) it is mainly emitted from building heating boilers, which explains its strong dependence on Building Density parameters, (ii) it was thus only measured during the winter in NYCCAS, and (iii) a large drop of 68% in its concentrations was noted during the experimental period (NYCCAS 2018, Matte et al 2013) due to heating fuel regulations imposed by NYC, as well as the increasingly strict federal level regulations that have resulted in reductions in emissions of SO 2 . For all other pollutants, Non-Residential ta , which better combines the influence of traffic and urban density, was the best single predictor. Generally, the urban parameters associated with building density and type have the highest correlations with concentrations, i.e. the most impact on air quality. Road and traffic urban parameters show similar trends, with strong correlations especially with NO and NO 2 . The parameters associated with buildings have the opposite effect to those associated with green cover, as expected. PM 2.5 , NO x and EC concentrations display very similar dependences on urban attributes. While these pollutants can also be transported from outside of the city, from sources such as vessels or power plants (NYCCAS 2018, Matte et al 2013), the main sources are local and mostly related to fuel combustion from vehicles and boilers.
O 3 concentrations, which were only measured during the summer months, on the other hand show unique trends compared to the remaining contaminants. An increase of O 3 is observed in areas with denser green canopies, suggesting that O 3 in the city may be partially formed from local biogenic emissions, or transported from the urban boundary layer and troposphere, as reported in prior literature (Kuttler and Strassburger 1999, Sillman 1999, Oke et al 2017, Gu et al 2020). The larger and negative correlations with building density and traffic parameters suggest that the conventional main source of O 3 , which is the reaction of combustion products, and NO x in particular, with volatile organic compounds and sunlight, may not be active or effective in this dense urban area. In dense urban environments, especially in city centers with high buildings like New York, the penetration of sunlight into the street canyon is limited and thus the combustion emissions may not undergo a photoreaction. This trend is already visible in figure S2 where the O 3 concentration is observed to be higher in less dense areas in boroughs outside of the denser Manhattan. Therefore, the reactants producing O 3 might need to be lofted above the urban canopy, react and produce O 3 , which is then re-entrained into the urban canopy layer from the urban boundary layer aloft. Another possible source of O 3 is the transport from the troposphere, and since transported O 3 is less likely to reach the pedestrian level in dense urban environments, that source would also explain the spatial variations observed in the data (Sillman et al 2003, Oke et al 2017). Future research can shed light on how the combination of these factors explains O 3 variability, but what our paper aims to highlight is that the footprint extent needs to be correctly estimated to deduce the influence of urban parameters on air quality.
The same general observations are also supported by figure 3, but now the distinct correlations for the various studied footprints (from 25 m × 25 m up to 5000 m × 5000 m) provide a much more complete picture, and allow us to derive the extents of the environmental neighborhoods. For the urban parameters that play an important role for urban ventilation such as Building fa , the correlations peak at about 1000 m × 1000 m footprints, with a decrease at larger or smaller footprints, showing that the influence of building density is strongest within a few blocks. Thus, utilizing footprints that are too small or too big to understand the effect of building density on local air quality could yield misleading results or could underestimate the impact. The Height bm parameter, on the other hand, plateaus beyond a ~150 m × 150 m footprint scale, revealing that the influence of the height of nearby buildings is the most significant, but that no penalty will be incurred if a larger footprint is used.
For the urban parameters associated with building emissions, that is the Residential ta and Non-Residential ta urban parameters, different trends are observed. For Residential ta , high correlations are observed at very small footprints (~25 m × 25 m), representative of the influence of nearby building emissions on the local concentration. Correlations then drop at mid-range footprint sizes (~300 m × 300 m) and increase again for 1000 m × 1000 m or larger footprints. This unique behavior may be related to the fact that residential household emissions take place on all floors and at roof levels, each of them with a distinct influence area. Residential emissions may therefore have a longer-range influence than those originating from commercial or urban industrial activities that generally take place in areas in closer proximity to the pedestrian level. The Non-Residential ta urban parameter, on the other hand, shows a local peak at around ~150 m × 150 m footprints and plateaus for footprints >200 m × 200 m. The Non-Residential ta urban parameter includes total building floor areas for tertiary, industrial and logistical facilities and thus it is likely that such emissions occur mostly at the pedestrian level, or through high chimneys with a further reach. These also tend to be areas with a higher density of businesses and thus higher traffic (the correlation of Non-Residential ta and Traffic c is 0.63, table 1). Similarly to the Height bm parameter examined before, linking the influence of Non-Residential ta on local air quality at large footprints would not create a substantial penalty since the correlation plateaus. A conceptual diagram of characteristic emission sources in an urban street canyon in New York City is illustrated in figure 4.
The urban parameter categories associated with vehicular emissions, such as Road a , show a peak with the highest correlation at around 1000 m × 1000 m footprints, with a steep decrease in the correlation for smaller and larger footprints. As with the Building fa parameter, Road a mostly affects local air pollution within a distance of a few blocks. The Traffic c urban parameter, on the other hand, shows different trends for the various pollutants, but except for O 3 , it seems a plateau is reached somewhere between 300 and 1000 m footprint scales. It is important to note however that the present study does not get down to the street or building scales, which can present strong gradients especially for the Traffic c parameter. This can be the case for example in proximity to heavy traffic intersections. Therefore, while the specific gradients in the main road arteries or street intersections of the city may not be accurately characterized, the presented footprint study provides a generalized correlation between the urban parameters and the averaged Traffic c parameter at larger spatial and longer temporal scales. Overall for the vehicular emission related parameters, the highest correlations are observed at 500 m × 500 m to 1000 m × 1000 m footprints, especially for the contaminants directly related to vehicular combustion PM 2.5 , NO 2 and NO. The urban parameters related to green infrastructure, Tree ca and Green gsa , show the largest (negative) correlations at around 300 m × 300 m footprints, but while correlations drop for larger footprints with Tree ca , they plateau with Green gsa .
Figure 3. Urban-parameter specific Pearson correlation plots against the NYCCAS mean contaminant concentrations for all studied footprint scales. (a) and (b) for the urban parameters related to urban ventilation capacity (building footprint area, Building fa , and mean building height, Height bm ). (c) and (d) for the building emission category parameters (total residential area, Residential ta , and non-residential building area, Non-Residential ta ). (e) and (f) for the vehicular emission category (total road area, Road a , and total traffic count, Traffic c ). (g) and (h) for the green infrastructure category parameters (tree canopy plan area fraction, Tree ca , and grass and shrub area, Green gsa ).
Of all the studied urban parameters, the highest correlations have been observed with building density and use-related (type) urban parameters, with highest correlations starting at ~150 m × 150 m. The Non-Residential ta parameter was the only one giving correlations exceeding 0.8, and was a very good predictor of NO 2 , PM 2.5 , and NO concentrations. More importantly, the accuracy penalties that would be associated with an inadequate choice of footprint scale, when aiming to understand the local air quality, are the largest for the urban parameters Building fa , Road a and Tree ca . Thus, it is critical that when studying the influence of such urban parameters on the local air quality, the right footprint scale is analyzed.
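The distinction between parameters whose correlation peaks sharply and those whose correlation plateaus can be made concrete with a small Python sketch. The penalty definition (loss in correlation magnitude relative to the best scale), the `tolerance` of 0.05 and the function names are all assumptions made for illustration; they are not quantities defined in the paper.

```python
def scale_penalties(corr_by_scale):
    """Loss in |correlation| at each footprint scale relative to the best scale.

    corr_by_scale maps a footprint side length (in metres) to a correlation
    coefficient.
    """
    best = max(abs(r) for r in corr_by_scale.values())
    return {scale: best - abs(r) for scale, r in corr_by_scale.items()}


def plateau_onset(corr_by_scale, tolerance=0.05):
    """Smallest scale from which all larger scales stay within the tolerance.

    Returns None when the largest footprint already falls outside the
    tolerance, i.e. when the correlation peaks and then decays instead of
    levelling off. The tolerance is an assumed value.
    """
    penalties = scale_penalties(corr_by_scale)
    onset = None
    for scale in sorted(penalties, reverse=True):
        if penalties[scale] <= tolerance:
            onset = scale
        else:
            break
    return onset
```

With hypothetical coefficients such as `{25: 0.2, 150: 0.4, 300: 0.55, 1000: 0.7, 5000: 0.45}`, `plateau_onset` returns `None` and the penalties grow on both sides of the 1000 m footprint, which is the situation where the choice of footprint matters most. A curve that levels off instead yields an onset scale beyond which any larger footprint incurs little penalty.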
The analysis presented in the paper has compiled the data collected from all NYCCAS stations together and thus it not only includes the stations located in the dense Manhattan street canyons, but also those located in parks and within different, less dense boroughs of New York City, as shown in figures S2 and S4. It is then useful to evaluate whether including or excluding the stations located in parks or the stations located in different boroughs affects the trends observed in the correlations. To that end, we analyzed the results of the Pearson correlations between the NYCCAS PM 2.5 and O 3 data and the Building fa and Tree ca parameters, including and excluding the stations located in parks, and only including the data collected by the stations located in Manhattan (figure S4b). As demonstrated in figure S4b, the correlation trends remain consistent whether parks are included in or excluded from the computation, as well as when the district of Manhattan is studied in isolation or all stations belonging to different districts are studied together. This consistency underlines the robustness of our findings.
In order to check whether the analyzed air quality statistics are solely dictated by urban parameters or by another analysis artifact, the trends of the urban parameters alone, for the different footprint scales, are displayed in figure S3. Comparing against figure 3, we note that (i) the correlation scaling trends are generally different from the urban parameter scaling trends and thus the former are not simply inherited from the spatial variability of the latter, and (ii) there is a large variability in the urban parameters at a given scale (figure S3) and in air quality (figure S2) amongst the stations, which indicates that the signal in their correlation trends is large and supports the physical inferences and conclusions made in this paper.

Conclusions
We investigate the correlation of urban air quality and physical attributes of the city at various urban footprint scales ranging from 25 m × 25 m to 5000 m × 5000 m. This allows us to define an environmental neighborhood as the surrounding area of influence whose attributes are the most relevant for predicting air quality anomalies of a given location, and to measure its extent. The analyses are illustrated for the air quality dataset from NYCCAS collected from 2008 to 2016 at 150 air quality stations. The urban parameters are compiled from the New York City land-use dataset at 3 ft × 3 ft resolution, the NYC Open data GIS dataset, the NYC PLUTO dataset for building and lot information, and the NYC traffic dataset for the years 2012-2013. While the spatial extent of the environmental neighborhoods might change for different cities, the main innovations in this study are the concept and methodology, which are generalizable.
Broadly, all combustion-related pollutants, PM 2.5 , NO 2 , NO and SO 2 , show consistent trends: the highest concentrations are in dense urban environments. O 3 on the other hand follows a unique pattern, where the highest concentrations are observed in the less dense areas of the city. The results show that urban fabric parameters associated with building morphology and building use play the most critical role in determining the spatio-temporal gradients of air quality. The correlation coefficients associated with the mean building height (Height bm ) and total non-residential areas (Non-Residential ta ) attain the highest magnitudes (negative for O 3 ) at ≈150 m × 150 m footprints. On the other hand, the urban parameters Building Footprint Area (Building fa ) or Road Area (Road a ) have a larger reach and affect the local concentrations at spatial extents ≈1000 m × 1000 m. Urban parameters associated with green infrastructure, such as tree canopy plan-area fraction (Tree ca ) and grass/shrub (Green gsa ), display negative correlations (except with O 3 ) that peak at ≈300 m × 300 m. A similar trend is observed with the total traffic count Traffic c , which for most of the studied pollutants plateaus beyond ≈300 m × 300 m.
The environmental neighborhood scale deduced in our study pertains to Manhattan, but the same approach can be used to measure it in other cities. The extent of these neighborhoods is unlikely to be consistent across cities, and the generalization of the findings might then appear challenging. Nevertheless, given the present analyses, it would seem plausible to hypothesize that the extent of environmental neighborhoods is closely linked to land use and land cover, and as such a suitable framework to use is that of the local climate zones reviewed in the introduction section (Stewart and Oke 2012, Muller et al 2013b). The LCZ classification can distill the commonalities and unique features between physical neighborhoods and cities, and we expect that the scale of environmental neighborhoods varies between different LCZs, but much less so among areas belonging to the same LCZ type (regardless of what city they are in).
The findings are consequential for urban planning and environmental quality management in many ways. First, they illustrate that selecting the optimal footprint size is critical for finding robust relations between air quality, and potentially other environmental quality parameters, and urban attributes at the neighborhood scale. Smaller footprints do not contain all the pertinent information, while larger footprints contain irrelevant, potentially misleading information; both might result in erroneous conclusions. In other words, the spatial extent is not an input to be chosen but is rather dictated by the urban physics. Second, spatial regression or machine learning models of air quality need to select the right environmental neighborhood scale to maximize model skill, and account for the full range of urban fabric and activity parameters. Finally, air pollutants and urban attributes do not all evolve similarly, underlining the complexity of airflow in cities, but general trends can be detected. The most consequential general trend is that for all urban characteristics the optimal footprint is ~200-1000 m. This implies that, like social neighborhoods, environmental neighborhoods in cities are quite limited, and that areas with worse air quality than the rest of the city can be improved with localized intervention measures to reduce environmental and health disparities.