Methodologic Issues in Using Land Cover Data to Characterize Living Environments of Geocoded Addresses

Estes et al. (2009) presented an interesting analysis of the relationship between blood pressure levels of individuals in four metropolitan regions and their living environments. Remotely sensed data were used to determine urban, suburban, and rural living environments as well as day/night land surface temperatures (LST). These remotely sensed data sets are readily available nationally, increasing the replicability and consistency of the methods.
 
Estes et al. (2009) characterized living environments using the 2001 National Land Cover Dataset (NLCD; Homer et al. 2004). Detailed land cover classes were reclassified into broad categories of urban, suburban, and rural, and the original 30-m resolution raster data were resampled to a 1-km grid using a majority filter to match the resolution of the LST data. Residential addresses were geocoded and their locations compared to the 1-km grid cell values to establish the living environment variables. Several problems result from this particular methodology, which I address below.
 
First, Estes et al. (2009) geocoded the residential addresses using SAS/GIS geocoding software, which employs TIGER data (SAS 2010) from the U.S. Census Bureau for street geocoding. The positional accuracy of TIGER data is not very good (e.g., Zandbergen 2008), and street geocoding in general is not very accurate (Cayo and Talbot 2003; Zandbergen 2009). The street geocoded location of the residence of a particular individual is therefore not very likely to fall inside the same 30-m grid cell as the true location of the residence. For example, the median error of typical street geocoding is on the order of 30–60 m for urban areas, about double that for suburban areas, and much larger in rural areas (Cayo and Talbot 2003; Zandbergen 2009). This is likely to introduce a substantial number of misclassifications. Any point-in-raster overlay where the positional error of the points is of the same order of magnitude as the raster resolution is not very reliable, and the degree of misclassification will vary with the spatial heterogeneity of the land cover data.
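
The sensitivity of a point-in-raster overlay to positional error can be illustrated with a small simulation. The sketch below is not part of the analysis discussed here; it assumes a true location uniformly distributed within a square grid cell and a geocoding error of fixed length in a uniformly random direction. The function name and its parameters are hypothetical.

```python
import math
import random


def same_cell_fraction(error_m, cell_m=30.0, trials=20000, seed=0):
    """Estimate how often a displaced point stays in its original grid cell.

    Assumptions (illustrative only): the true location is uniform within a
    square cell, and the error has a fixed length and a random direction.
    """
    rng = random.Random(seed)
    stays = 0
    for _ in range(trials):
        # True location, uniform within the cell [0, cell_m) x [0, cell_m).
        x = rng.random() * cell_m
        y = rng.random() * cell_m
        # Displace the point by error_m in a random direction.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        gx = x + error_m * math.cos(angle)
        gy = y + error_m * math.sin(angle)
        if 0.0 <= gx < cell_m and 0.0 <= gy < cell_m:
            stays += 1
    return stays / trials


for error in (0.0, 15.0, 30.0, 60.0):
    share = same_cell_fraction(error)
    print(f"error {error:4.0f} m: same 30-m cell in {share:.0%} of trials")
```

Under these assumptions, any error longer than the cell diagonal (about 42 m for a 30-m cell) always moves the point into a different cell, so errors in the range of the reported medians leave few points in their true cell.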
 
Second, the positional errors in street geocoding are not random in nature. Typical street geocoding employs a standard offset from the roads in the placement of the geocoded locations. In many areas, however, the actual residence is located at much greater distances, especially in rural areas. In the 2001 NLCD land cover data, many rural and suburban roads are classified as developed open space. This means that geocoded rural addresses will typically fall on this land cover type, while the actual residence is located on an agricultural or vegetated category. This adds to the occurrences of misclassifications, especially between suburban and rural. 
 
Third, the resampling of the original land cover data from 30 m to 1 km using a majority filter has the undesirable effect that small clusters of one land cover type that are surrounded by larger areas of other types will simply disappear. Estes et al. (2009) clearly acknowledged this and compared the classifications resulting from different resolutions; when resampling from 30 m to 1 km, only 63% of all locations were classified the same. This effect of resampling will vary between study areas. For example, urban development in Atlanta, Georgia, is relatively fragmented, and the resampling results in a substantial reduction of the total urban area (from 2.0% of the study area in the original 30-m grid to 0.94% in the 1-km grid). A more compact urban development pattern, such as that of Chicago, Illinois, is more robust to the effect of resampling.
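
The disappearance of small clusters under a majority filter can be shown with a toy example. The sketch below uses a simplified binary grid (1 = urban, 0 = other) rather than the full set of NLCD classes, and the two 8 x 8 landscapes are hypothetical; both have the same urban share before resampling.

```python
def majority_resample(grid, factor):
    """Aggregate a binary grid (1 = urban, 0 = other) by block majority."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            # A block is urban only if urban cells are a strict majority.
            out_row.append(1 if 2 * sum(block) > len(block) else 0)
        out.append(out_row)
    return out


def urban_fraction(grid):
    """Share of cells in the grid that are classified as urban."""
    cells = [v for row in grid for v in row]
    return sum(cells) / len(cells)


# Two hypothetical 8x8 landscapes with the same urban share (16 of 64 cells).
compact = [[1 if r < 4 and c < 4 else 0 for c in range(8)] for r in range(8)]
fragmented = [[1 if r % 2 == 0 and c % 2 == 0 else 0 for c in range(8)]
              for r in range(8)]

for name, grid in (("compact", compact), ("fragmented", fragmented)):
    coarse = majority_resample(grid, 4)
    print(name, urban_fraction(grid), "->", urban_fraction(coarse))
# prints:
# compact 0.25 -> 0.25
# fragmented 0.25 -> 0.0
```

The compact pattern keeps its urban share after resampling, whereas the fragmented pattern loses all of it, which mirrors the contrast between fragmented and compact urban development described above.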
 
The resampling does overcome some of the misclassifications introduced by the errors in street geocoding. In effect, the land cover type at the exact location of the geocoded address is no longer of greatest interest, and instead the “majority” land cover of the surrounding area is used. However, the effects of street geocoding errors and resampling will vary greatly between study areas, reducing the robustness of the final classifications of study subjects and introducing potential bias. 
 
One approach to overcome some of these problems is to use the 2001 impervious cover data, which is provided as a complement to the 2001 NLCD land cover data. Imperviousness is classified between 0 and 100% and corresponds closely to the different land cover types, albeit providing more detail. The benefit of using impervious cover is that during resampling a simple averaging filter can be used instead of a majority filter. This type of filter produces unbiased results that are not dependent on the spatial heterogeneity of the landscape or the scale of resampling. Similar urban, suburban, and rural categories can be identified and will remain more robust under various resampling scenarios. 
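
A minimal sketch of the averaging approach follows. It assumes grid dimensions that are exact multiples of the resampling factor, and the two 8 x 8 imperviousness grids (in percent) are hypothetical. It shows only that block averaging preserves the overall mean regardless of how the impervious cells are arranged; the thresholds that would separate urban, suburban, and rural categories are not specified here and are left out.

```python
def mean_resample(grid, factor):
    """Aggregate percent imperviousness by block averaging.

    Assumes the grid dimensions are exact multiples of `factor`.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def grid_mean(grid):
    """Mean value over all cells of the grid."""
    cells = [v for row in grid for v in row]
    return sum(cells) / len(cells)


# Hypothetical 8x8 imperviousness grids (percent), both averaging 25%.
compact = [[100.0 if r < 4 and c < 4 else 0.0 for c in range(8)]
           for r in range(8)]
fragmented = [[100.0 if r % 2 == 0 and c % 2 == 0 else 0.0 for c in range(8)]
              for r in range(8)]

for name, grid in (("compact", compact), ("fragmented", fragmented)):
    coarse = mean_resample(grid, 4)
    print(name, grid_mean(grid), "->", grid_mean(coarse), coarse)
# prints:
# compact 25.0 -> 25.0 [[100.0, 0.0], [0.0, 0.0]]
# fragmented 25.0 -> 25.0 [[25.0, 25.0], [25.0, 25.0]]
```

In both cases the total imperviousness is retained at the coarser resolution; the fragmented landscape appears as uniformly low-intensity development instead of vanishing.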
 
The availability of moderate to high resolution remotely sensed data at national and global scales is providing unprecedented opportunities to compare health observations to environmental variables, including land cover and climatic factors. When combining data from different sources, great care should be taken to ensure the accuracy of the input is sufficient to produce reliable results given the specific analysis methods employed. Street geocoding in particular has been underestimated as a source of positional error. In addition, when resampling methods are employed to produce data sets of matching resolution, robust methods are needed to avoid the unnecessary introduction of noise and bias.


doi:10.1289/ehp.0901863
[...] as we did in our methodology to characterize the participants' living environment. With respect to the misclassification that may be introduced due to the resolution used to classify participants, Zandbergen is correct that resampling to different resolutions did change the classification of the participants. However, the results of the analyses were consistent regardless of the resolution of the classification, indicating that while this may influence the exposure itself, it does not influence the relationship between the exposure and the outcome.

Although satellite remote sensing has advanced significantly in recent years, there are inherent weaknesses in the use of this technology. The association between satellite-based aerosol optical depth (AOD_S) and air pollution monitored on the ground can be influenced by a number of factors. In their article, Paciorek and Liu (2009) highlighted the weaknesses of AOD_S to predict the spatial distribution of fine particulate matter ≤ 2.5 µm in aerodynamic diameter (PM2.5). It is a timely article given the increasing importance of indirect methods, including satellite data, to estimate air quality because of scarce and ad hoc spatial-temporal coverage of air pollution monitored by federal regulatory methods.
It is important that the robustness of these methods is evaluated, and Paciorek and Liu's article is such an attempt. However, they failed to address the role of five major factors that can influence the AOD_S-PM2.5 association. These factors include decomposition of AOD_S by aerosol types, mismatch in spatial-temporal resolution, collocation and integration of AOD_S and PM2.5 data, and control for spatial-temporal structure in the statistical model. Consequently, the weaknesses in Paciorek and Liu's study lead me to question their findings.
The columnar measurement of AOD_S consists of aerosols generated by anthropogenic (human) sources (AOD_Sh), such as emissions from industries and vehicles, and natural sources (AOD_Sn), such as water vapor or dust in the air. AOD_Sn, which constitutes a large fraction of AOD_S, is influenced by moving large air masses and exhibits a strong spatial and temporal structure. The concentration of PM2.5, however, can vary significantly within short distances. Therefore, there is a significant mismatch in the magnitude and extent of spatial and temporal variability of AOD_Sn and AOD_Sh; without an adequate control for AOD_Sn, it is difficult to develop a reliable PM2.5 predictive model using AOD_S.
Paciorek and Liu (2009) recognized that the spatial-temporal resolutions of AOD_S and PM2.5 they used were different, but they did not address how the mismatch in the spatial-temporal resolutions of these data can influence their association. The spatial resolutions of MISR (multiangle imaging spectroradiometer), MODIS (moderate resolution imaging spectroradiometer), and GOES (geostationary operational environmental satellite) AOD were 17.6 km, 10 km, and 4 km, respectively, and PM2.5 data were point measurements aggregated across 24 hr. A recent study suggests the strength of the AOD_S-PM2.5 association diminishes with the increase in time interval used for their aggregation. It would have been useful for Paciorek and Liu (2009) to document the implications of the spatial-temporal resolutions and aggregation of AOD and PM2.5 (data they used) on their findings.
AOD_S retrieval and PM2.5 are not available on the same days: AOD_S retrieval is not possible on cloudy days, and PM2.5 data are recorded every third or sixth day. It seems that Paciorek and Liu (2009) averaged all AOD_S at a 4-km pixel (i.e., 16 km² area; monthly and yearly) and all PM2.5 (in the pixels where a monitoring station was situated). This could have resulted in a weak association between AOD_S and PM2.5, because there were systematic temporal gaps in both AOD_S and PM2.5 data sets. A reasonable approach to address this problem is to aggregate AOD_S-PM2.5 data for those days only when both AOD_S and PM2.5 are available.
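
The matching step suggested above can be sketched as follows. The daily values, the day keys, and the function names are hypothetical and serve only to illustrate restricting both series to days on which both are available before averaging.

```python
def matched_pairs(aod_by_day, pm_by_day):
    """Return (day, aod, pm) tuples for days on which both values exist."""
    common_days = sorted(set(aod_by_day) & set(pm_by_day))
    return [(d, aod_by_day[d], pm_by_day[d]) for d in common_days]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Hypothetical daily records keyed by day of month. AOD is missing on
# cloudy days; PM2.5 is sampled every third day.
aod = {1: 0.20, 2: 0.35, 4: 0.15, 7: 0.40, 8: 0.30}
pm25 = {1: 10.0, 4: 8.0, 7: 18.0, 10: 12.0}

pairs = matched_pairs(aod, pm25)
print("matched days:", [d for d, _, _ in pairs])
print("mean AOD, all days:", round(mean(aod.values()), 3))
print("mean AOD, matched days:", round(mean(a for _, a, _ in pairs), 3))
print("mean PM2.5, matched days:", mean(p for _, _, p in pairs))
```

Only days 1, 4, and 7 enter the matched averages in this example; the remaining AOD and PM2.5 records are excluded because the other measurement is missing on those days.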
Paciorek and Liu's method for aggregating 17.6-km and 10-km AOD_S to a 4-km pixel seems problematic. First, a radiative transfer model is used to retrieve AOD_S (Remer et al. 2006), which removes pixels with the upper 50% and lower 20% of the reflectance values. This removal can be systematic. For example, pixels with high reflectivity (such as buildings and roads) are more likely to be removed than the vegetated pixels (i.e., pixels under vegetation canopy). Thus, the centroid of a 10-km AOD_S pixel may not represent the AOD_S value for the entire 10-km area. Second, AOD_S registers a strong spatial-temporal autocorrelation. Thus, time-space kriging that utilizes a large number of data points is appropriate for AOD_S aggregation rather than a single AOD_S value to avoid an area-specific bias.
The robustness of AOD_S retrieval is evaluated by its comparison with the AOD recorded by sunphotometers at AERONET sites (AOD_A) (NASA 2007). The spatial resolution at which AOD_S is retrieved and the spatial-temporal intervals within which these data are aggregated may directly influence its comparison with the AOD_A. This, in turn, can influence the association between AOD_S and PM2.5. Recent literature suggests that 1-km and 5-km AOD_S observe a significantly better association with PM2.5 monitored on the ground than the 10-km AOD_S (Li et al. 2005). Therefore, the optimal spatial resolution of AOD_S retrieval and the optimal spatial and temporal intervals for aggregating these data are critically important for developing time-space resolved estimates of air quality with the aid of AOD_S.
Because meteorologic conditions are largely influenced by the prevailing air masses and do not vary significantly within thousands of miles for a short period of time, the AOD_Sn component of AOD_S is likely to have a strong spatial-temporal structure. PM2.5, which constitutes particulate mass associated with anthropogenic factors, however, varies significantly within short distances from emission sources. Therefore, to develop a PM2.5 predictive model it is important that only AOD_Sh is used instead of AOD_Sn. If such data are not available, an alternative is to indirectly control for AOD_Sn and its associated spatial-temporal structure. Otherwise the predicted PM2.5 surface is likely to have an unrealistic spatial trend, as reported by Paciorek and Liu (2009), as well as unrealistic temporal trends.
The author declares he has no competing financial interests.

University of Iowa, Geography
Iowa City, Iowa
Email: nareshkumar@uiowa.edu

AOD-PM2.5 Association: Paciorek and Liu Respond
doi:10.1289/ehp.0901732R

We thank Kumar for his letter about our article (Paciorek and Liu 2009). We hope this exchange will highlight some of the key issues in using aerosol optical depth (AOD) for air quality purposes, in particular with regard to our focus on epidemiologic use. More dialogue is needed between scientists involved in remote sensing and those studying air pollution exposure and its epidemiologic effects with regard to the challenges and needs involved in making the remote sensing products helpful in applications. Our response to Kumar's letter highlights our perspective on these challenges.
We agree that in our article (Paciorek and Liu 2009) we used AOD in a relatively straightforward way, and we welcome more advanced approaches to making use of AOD; in fact, one of us (Y.L.) is heavily involved in such work. The scientific challenge is to ensure that more advanced techniques can be used over an entire continuous time period and spatial area needed in a given epidemiologic or regulatory context. In our analysis we did our best to make use of currently available AOD products and to adjust for meteorologic variables and large-scale spatial discrepancy between AOD and particulate matter ≤ 2.5 µm in aerodynamic diameter (PM2.5) based on the data available. More sophisticated approaches will hopefully reduce the discrepancy between PM2.5 and AOD, but this does not change the need for rigorous assessment of the use of AOD as a proxy for PM2.5. An important test, which we explored, is the ability of AOD to help improve PM2.5 predictions, beyond reporting correlations between AOD and PM2.5. Furthermore, even with improved approaches in which systematic discrepancy may be alleviated, systematic discrepancy seems unlikely to disappear, and we believe serious consideration of AOD as a proxy for PM2.5 in the future will need to consider the nature of this discrepancy and its implications for the contexts in which AOD is used as a proxy for PM2.5.
To the extent that natural sources of AOD do not correlate with concentrations of ground-level PM2.5, we agree with Kumar that it would be ideal to control for such sources. We used the standard MODIS (moderate resolution imaging spectroradiometer) AOD product because this product would be available to general users; however, it would be appealing if a more tailored AOD retrieval algorithm could be applied over the spatial and temporal domain of interest for a given application. From reading the cited article (particularly p. 3390), we did not see a specific algorithm proposed to decompose AOD into anthropogenic and natural sources or to control for natural sources.
As noted by Kumar, averaging all data (rather than matching in time before averaging) reduces associations. However, when interest lies in developing a proxy for long-term average PM2.5, the average of all monitoring data available at a regular interval should be an unbiased estimate of true PM2.5 at the location, which is the quantity one would like to have everywhere in space. Estimated associations based on matched data therefore are an overly optimistic assessment of AOD as a proxy for true long-term PM2.5. Of course, for shorter intervals, the variability in estimates of true PM2.5 that are based on small numbers of daily samples will contribute to reduced AOD-PM2.5 association, so there are trade-offs in deciding whether to match. One also needs to consider whether using matching introduces bias because missing AOD is associated with particular meteorologic conditions that also likely correlate with PM2.5 levels (Paciorek et al. 2008). Finally, in unpublished work, we have seen moderate improvements in associations when matching, but these improvements were not so large as to suggest that lack of time matching is the key reason for the results seen in our article (Paciorek and Liu 2009). Kumar's reference to results on the diminishing association with longer-term aggregation seems to reinforce our very point: One should be cautious about using AOD as a proxy for PM2.5 when aggregating over time, but this is precisely one of the contexts in which we need proxies for PM2.5. Health analyses do not have the luxury of only analyzing health outcomes that correspond to the time periods (and spatial locations) for which AOD is available or for which AOD is thought to be a more reliable proxy.
Given the pixel-scale AOD retrievals (and the changing MODIS pixels from day to day), to spatially align our various data sources we took the ad hoc approach of assigning to each 4-km grid cell the value of the nearest MODIS AOD pixel overlapping the grid cell, requiring the distance between the cell and pixel centroids to be no greater than the nominal distance between AOD pixel centroids. This does not fundamentally change the AOD spatial pattern but does somewhat blur the original AOD values at the pixel boundaries. We recognize that it is difficult to compare a pixel-wide AOD value to a point observation of PM2.5, and of course one cannot expect AOD to provide information below its nominal resolution. Given this, in our statistical modeling we did our best to account for local sources of variation in PM2.5, namely distance to roads and to point sources, that cause the point-level observations to necessarily differ from the pixel-scale AOD. One would hope that the AOD pixel value represents variation in AOD at the scale of pixels or at somewhat larger resolution (such as distinguishing variation at scales up to 50-100 km) that differentiates urban, suburban, and rural areas. We would like variation at this scale to provide information on PM2.5 variation at the same scale that would improve prediction of PM2.5, but our results unfortunately did not provide evidence for such improvement. It is not completely clear what Kumar is suggesting as an alternative to using the value of AOD assigned to a pixel as representative of the entire pixel area, but it seems to be an approach that uses subpixel-scale information not available in the current MODIS AOD product. This seems promising, and we welcome work on providing AOD at higher resolution and evaluating whether more highly resolved AOD improves predictions of PM2.5. A key issue from this perspective is not the nominal resolution at which AOD is provided but the resolutions at which it is associated with spatial variations in PM2.5.
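
The nearest-pixel assignment described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used in the article: coordinates are taken to be planar (x, y) values in km, the explicit test for overlap between pixel and grid cell is omitted, and the function name, coordinates, and AOD values are hypothetical.

```python
import math


def assign_nearest_pixel(cell_centroids, pixel_centroids, pixel_values,
                         max_distance_km):
    """Give each grid cell the value of the nearest AOD pixel centroid.

    Cells whose nearest pixel centroid is farther than `max_distance_km`
    receive None. Coordinates are assumed to be planar (x, y) in km.
    """
    assigned = []
    for cx, cy in cell_centroids:
        best_value, best_dist = None, math.inf
        for (px, py), value in zip(pixel_centroids, pixel_values):
            dist = math.hypot(cx - px, cy - py)
            if dist < best_dist:
                best_value, best_dist = value, dist
        assigned.append(best_value if best_dist <= max_distance_km else None)
    return assigned


# Hypothetical example: two 10-km AOD pixels and four 4-km grid cells.
pixels = [(5.0, 5.0), (15.0, 5.0)]
aod = [0.20, 0.45]
cells = [(2.0, 2.0), (6.0, 2.0), (14.0, 6.0), (40.0, 2.0)]

print(assign_nearest_pixel(cells, pixels, aod, max_distance_km=10.0))
# prints: [0.2, 0.2, 0.45, None]
```

Neighboring grid cells that fall within the same AOD pixel receive identical values, which is one way to see why the assigned field cannot carry information below the nominal AOD resolution.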
It is not entirely clear what Kumar is suggesting in terms of how to control for natural-source AOD and its spatio-temporal structure. With regard to large-scale discrepancy between AOD and PM2.5 that might mask smaller-scale correspondence, we used AOD and PM2.5 data to estimate and adjust (our calibrated AOD) for large-scale spatial discrepancy that persists over time but found that this did not improve matters, suggesting that small-scale discrepancy between PM2.5 and AOD is a major concern.