GIS Approaches for the Estimation of Residential-Level Ambient PM Concentrations

Spatial estimations are increasingly used to estimate geocoded ambient particulate matter (PM) concentrations in epidemiologic studies because measures of daily PM concentrations are unavailable in most U.S. locations. This study was conducted to a) assess the feasibility of large-scale kriging estimations of daily residential-level ambient PM concentrations, b) perform and compare cross-validations of different kriging models, c) contrast three popular kriging approaches, and d ) calculate SE of the kriging estimations. We used PM data for PM with aerodynamic diameter ≤10 μm (PM10) and aerodynamic diameter ≤ 2.5 μm (PM2.5) from the U.S. Environmental Protection Agency for the year 2000. Kriging estimations were performed at 94,135 geocoded addresses of Women’s Health Initiative study participants using the ArcView geographic information system. We developed a semiautomated program to enable large-scale daily kriging estimation and assessed validity of semivariogram models using prediction error (PE), standardized prediction error (SPE), root mean square standardized (RMSS), and SE of the estimated PM. National- and regional-scale kriging performed satisfactorily, with the former slightly better. The average PE, SPE, and RMSS of daily PM10 semivariograms using regular ordinary kriging with a spherical model were 0.0629, −0.0011, and 1.255 μg/m3, respectively; the average SE of the estimated residential-level PM10 was 27.36 μg/m3. The values for PM2.5 were 0.049, 0.0085, 1.389, and 4.13 μg/m3, respectively. Lognormal ordinary kriging yielded a smaller average SE and effectively eliminated out-of-range predicted values compared to regular ordinary kriging. Semiautomated daily kriging estimations and semivariogram cross-validations are feasible on a national scale. Lognormal ordinary kriging with a spherical model is valid for estimating daily ambient PM at geocoded residential addresses.

Large-scale, population-based epidemiologic investigations of the health effects of ambient air pollution often rely on measurements from a network of air quality monitors maintained by the U.S. Environmental Protection Agency (U.S. EPA 1995a(U.S. EPA , 1995b(U.S. EPA , 2005. The Air Quality System (AQS) is the only national ambient air pollution database currently available for public use in the United States. The availability of individual-level health outcome and covariable data from national-scale studies that often characterize participants over the course of several years enables researchers to study the acute effects of ambient air pollution using individual-level data (Liao et al. , 2005aSullivan et al. 2005;Wellenius et al. 2005;Whitsel et al. 2004). This approach requires measures of daily particulate matter (PM) exposures, ideally assessed as close to the individual level as possible, such as at participant residences or in immediate proximity to participants themselves. Because daily measures of ambient PM concentrations from the AQS are unavailable in the large majority of locations, spatial estimation methods using geographic information systems (GIS) are increasingly being considered to estimate geocoded location-specific ambient PM concentrations, such as kriging methods. Important methodologic and practical issues still need to be resolved, however. This study was designed to a) assess the feasibility of largescale kriging estimation of daily residentiallevel ambient PM concentrations, b) perform and compare cross-validations of different kriging models, c) determine and contrast the most appropriate kriging approaches, and d) calculate the SEs of the kriging estimations.

Materials and Methods
We obtained from AQS the PM 10 and PM 2.5 (PM with aerodynamic diameter ≤ 10 and 2.5 µm, respectively) data from 1993(U.S. EPA 2005. The data from 2000 were used for this study after eliminating duplicate records and converting all measures to the same units and denominator. We calculated "monitor-specific" daily averages based on ≥ 18 hourly measures. Monitor-specific daily averages were set to missing for monitors reporting < 18 hourly measures on any given day. If more than one monitor was operating at the same location on a given day, we then computed "site-specific" daily PM 10 and PM 2.5 averages by taking the mean of the monitors' measures. We also obtained the longitude and latitude for each site from the AQS database. These data served as pollutant-and site-specific daily source data for our study (Liao et al. 2005b).
We geocoded 94,135 addresses of Women's Health Initiative (WHI) Clinical Trial (CT) participant residences and examination sites in the contiguous 48 United States and District of Columbia, after assessing geocoding vendor error . Daily PM 10 and PM 2.5 concentrations and the associated estimation errors (SEs) are estimated at these geographic locations by the Environmental Epidemiology of Arrhythmogenesis in WHI study (Whitsel 2006).
We used ArcView GIS (version 8.3) and its Geostatistical Analyst Extension (ESRI Inc., Redlands, CA) for semivariogram determination and cross-validation and for subsequent spatial estimation of daily location-specific PM concentrations. Three frequently referenced spatial models (spherical, exponential, Gaussian) (Cressie 1993a;Davis 2002) were considered using the weighted least-squares method (Gribov et al. 2004;Jian et al. 1996) to obtain the "optimal" daily semivariogram parameters (range, partial sill, and nugget). Based on the daily semivariograms, we performed ordinary kriging to estimate the daily mean PM concentration and its SE at each of the 94,135 geocoded addresses. Next, we performed the standard cross-validation-an iterative procedure that omits site-specific PM data points one at a time and refits the model using the remaining data to estimate the PM concentration at the site of the omitted observation. We assessed the validity (also termed "goodness of fit") of each semivariogram using three crossvalidation parameters readily available from the ArcView software package: a) the average of prediction error (PE), where PE is the average of the difference between the predicted and measured daily PM values at each monitoring site; b) the average of standardized prediction error (SPE), where SPE is the PE divided by the SE of estimation across all sites; and c) root mean square standardized (RMSS), the standard deviation (SD) of all SPEs across all sites. Additionally, we assessed the goodness of fit of each semivariogram by the average of the SEs of the estimations, generated by the kriging procedure, across all 94,135 geocoded addresses. The expectations for a good-fitting semivariogram and kriging model are an average PE and SPE near 0, an RMSS near 1, and a small SE. If RMSS < 1, there is a tendency toward overestimation of the variance; if > 1, there is a tendency toward underestimation (ESRI Inc. 2001). These criteria were consistently used to guide our model selection processes throughout this study (Liao et al. 2005c).
As an alternative to using the automatically calculated semivariogram (calculated using the weighted least-squares method (Gribov et al. 2004;Jian et al. 1996), one can also manually specify the semivariogram parameters to improve the cross-validation parameters in ArcView. We selected six least satisfactory daily semivariograms throughout year 2000 and manually adjusted the semivariogram parameters to obtain the best achievable average RMSS and SPE (RMSS as close to 1 and average SPE as close to 0 as possible). The cross-validation parameters from the weighted least-squares method-calculated semivariograms were then compared to those of the manually adjusted semivariograms.
We performed daily ordinary krigings on both the original scale (regular ordinary kriging) and the lognormal scale (lognormal ordinary kriging) (Cressie 1993b;Johnston 2001) for all WHI CT addresses for the year 2000 and compared the cross-validation parameters between the two kriging procedures. Lognormal ordinary kriging was used because it has the ability to eliminate the negative predicted values, which is a problem in ordinary kriging, especially when the source data contain extreme values.

Results
Characteristics of the site-specific daily average PM 10 and PM 2.5 concentrations. During 1994During -2003, the number of monitoring sites that provided GIS-usable daily PM 10 data varied widely (range, 120-1,340). On 17% of days, GIS-usable data were provided by ≥ 400 monitoring sites; on 39% of days, by 200-400 sites; and on 44% of days, by 120-200 sites. The corresponding values for PM 2.5 during 1999-2003 were 33% of days by ≥ 400 sites and 67% of days by 148-400 sites. Specific to the year 2000, there were averages of 325 PM 10 and 456 PM 2.5 monitoring sites operating per day across the contiguous United States, with minima and maxima of 148 and 1,061 sites for PM 10 and 178 and 1,019 sites for PM 2.5 . As a result, there were 118,791 site-days during 2000 for which we can retrieve measured PM 10 data and 166,796 site-days for PM 2.5 data. The mean (± SD) of PM 10 and PM 2.5 from these retrievable site-days were 26.29 ± 58.13 and 13.14 ± 8.59 µg/m 3 , respectively, with medians of 21.33 and 11.20 µg/m 3 , respectively. A right-skewed distribution of both PM 10 and PM 2.5 are evident, especially for PM 10 . Figure 1 illustrates the spatial relationships between the geocoded addresses and the PM monitoring sites on an optimal day and a typical day. The mean distance between each address and its nearest PM monitor was 12.35 km, with an SD of 13.98 km, a median of 7.81 km, an interquartile range of 10.53 km, and 99th percentile of 68.36 km.
Comparisons of three widely used spatial models. Tables 1 and 2 present summary statistics of the cross-validation parameters (PE, SPE, and RMSS) comparing three widely used spatial models (spherical, exponential, Gaussian) for PM 10 and PM 2.5 , respectively. In general, both average PE and average SPE are very close to 0, with a very narrow range of variation from the 366 daily cross-validations. More specifically, > 95% of average PEs were within ± 2 µg/m 3 of measured PM 10 , and ± 0.5 µg/m 3 of measured PM 2.5 , an average measurement error that we considered acceptable. In terms of RMSS, we considered > 95% of cross-validations as acceptable, but there were days when RMSS indicated a slight overor underestimation of the prediction variability. These data support the overall validity of using kriging-based estimation approaches to estimate location-specific PM concentrations across the contiguous United States.
Comparisons of default and manually adjusted semivariograms. Table 3 presents the cross-validations and actual kriging estimations from the weighted least-squares mean method calculated semivariogram and manually adjusted semivariogram. For the 6 days when the PE, SPE, or RMSS indicated a less satisfactory default-calculated semivariogram, these three cross-validation parameters could be improved satisfactorily through adjustment of the semivariogram parameters by an operator. However, the application of such "improved" semivariograms to the estimation of PM 10 concentrations at geocoded locations across the United States did not necessarily provide better estimation of location-specific PM (i.e., smaller SEs). To the contrary, the average SEs from the default semivariograms were smaller than those from manually adjusted semivariograms. Because each average SPE of the default-calculated daily semivariograms was close to 0, and each default-calculated daily semivariogram produced a smaller estimation error, we recommend using the defaultcalculated semivariogram, even though the RMSS from the default-calculated semivariogram was not fully satisfactory.
Comparisons of regular versus lognormal ordinary krigings. We applied regular ordinary kriging (spherical model, default-calculated PM 10 monitors on a typical day (201 sites) PM 10 monitors on an optimal day (1,061 sites) WHI participants' residences Northeast Southeast Southwest Northwest Middle north daily semivariograms) to estimate daily PM 10 concentrations at geocoded addresses (n = 94,135) of WHI CT participants and examination sites in the contiguous United States. We examined the estimated PM 10 concentrations and identified 22 days during 2000 when estimated values exceeded the range of observed values. In some cases, the estimated values were negative. The number of addresses affected by this problem ranged from a few on most days to 3.5% of all addresses. This problem was related to skewed PM 10 distributions and to small numbers of extreme outlying values or operating sites on some days. We therefore compared regular ordinary kriging and lognormal ordinary kriging anticipating that lognormal kriging would attenuate this problem. Table 4 lists the 22 days on which regular ordinary kriging yielded estimated PM 10 values that were outside the range of measured values. For comparison, the minima and maxima of the measured and estimated PM 10 concentrations from both regular and lognormal ordinary krigings are also listed in Table 4. In summary, during 2000, lognormal ordinary kriging effectively reduced the number of problematic days from 22 to 1. Even on this one day, lognormal ordinary kriging yielded a minimum value that was closer to the range of measured data than that from regular ordinary kriging. Table 5 shows the mean values of crossvalidation parameters of daily PM 10 semivariograms for both regular ordinary kriging and lognormal ordinary kriging. Cross-validation parameters were within the acceptable range from both regular and lognormal ordinary krigings, except for the 22 "out-of-range" days as defined above. On these out-of-range days, the SPE was well within the acceptable range for both regular and lognormal krigings, but the RMSS was > 1 from both approaches. Even so, for these out-of-range days RMSS from lognormal ordinary kriging was closer to 1 than that from regular ordinary kriging.
We then performed regular and lognormal ordinary kriging to estimate PM 10 concentrations at geocoded addresses of WHI CT participants and examination sites, based on year 2000 PM 10 data (94,135 locations and 366 days). The mean, SD, median, and maximum of the daily mean SE of the estimated PM 10 from the regular ordinary kriging were 27. 36, 83.35, 13.93, and 1160.20 µg/m 3 , respectively. In contrast, those from the lognormal ordinary kriging were 16.29, 6.65, 15.05, and 67.46 µg/m 3 . Clearly, the distribution of the estimation errors from lognormal ordinary kriging was considerably less skewed and had fewer outlying values than that from regular ordinary kriging. Alternative methods (winsorizing extreme PM 10 values; using ArcView's "no-sector" option to search for measured data points from a circle centered around a location that needs of an estimation-i.e., disabling the default "sector" search for measured data points in the four sectors of a circle, reducing the range or nugget) were less effective in estimating predicted values within the range of measured values (data not shown).
Similar to the situation observed in PM 10 estimations, lognormal ordinary kriging also effectively eliminated the negative or out-ofrange problem that occurred in about 5% of PM 2.5 data when using regular ordinary kriging. Other cross-validation parameters were comparable between the lognormal and regular ordinary krigings (data not shown).
Comparisons between national and regional krigings. From the 61 days when 900 or more monitoring sites were operating in the year 2000 in the 48 contiguous states, the first of such days from each month was selected for comparisons between ordinary kriging models on a national versus regional scale. National krigings and cross-validations were performed on these 12 selected days using daily sitespecific PM 10 data. Regional krigings and VOLUME 114 | NUMBER 9 | September 2006 • Environmental Health Perspectives  cross-validations were performed on the same data using the regional map ( Figure 1) that divides the U.S. continent into five regions (northwest, southwest, middle north, southeast, and northeast). These five regions were created based on the assumption that different semivariogram parameters would be needed for different geographic areas. In general, for both regional and national krigings, the average SPE and RMSS from cross-validations of semivariograms calculated for the 12 selected days were very close to 0 and 1, respectively (Table 6)

Discussion
Classical methods often assume that measures are uniformly or randomly distributed. The assumptions are often inappropriate for analysis of environmental measures because values at neighboring locations are rarely independent, particularly over short distances. This form of dependence (spatial autocorrelation) nonetheless makes it possible to interpolate values at unmonitored locations from known values at monitored locations. Kriging is one such interpolation method originally developed by mining engineers (Krige 1966). It is especially attractive in this setting because it takes the spatial autocorrelation structure function (variogram) into account by considering known values from monitored locations, weighting them with values read from the variogram at corresponding distances, and splitting weights among adjacent locations. The method thereby ensures that interpolations do not depend on monitor density (Legendre and Fortin 1989). By doing so, kriging yields best linear unbiased estimates, in this setting, of location-specific daily mean ambient PM concentrations and their SEs. Large-scale population-based epidemiologic investigations of the health effects of ambient air pollution often rely on data collected from a network of air quality monitors maintained by the U.S. EPA-the AQS data (U.S. EPA 1995aEPA , 1995bEPA , 2005. It is revealing to compare kriging with interpolation methods used in the well-known time-series and cohort studies of PM effects on mortality and cardiovascular disease (Abbey et al. 1991(Abbey et al. , 1999Dockery et al. 1993;Katsouyanni et al. 1996Katsouyanni et al. , 2001Miller et al. 2004Miller et al. , 2005Pope et al. 2004;Samet et al. 2000aSamet et al. , 2000b. These studies uniformly estimated PM exposures using area-based arithmetic averaging or nearest-neighbor imputation-alternative methods that have important limitations (Moore and Carpenter 1999). Such limitations include the assumption of homogeneous exposures within study areas and the inability (or failure) to estimate exposures or associated PEs. For example, when daily exposure was of interest and there were no operating PM monitors with a study area, data pairs (daily PM concentrations, death counts) were unavailable in these studies. In addition, when longer-term (monthly to yearly) exposure was of interest, area aggregated exposures were based on available measurements within a given time frame. If there were five 24-hr measures in a month, for example, the monthly average exposure was calculated as the mean of the five readings. In contrast, our kriging-based approach estimated daily mean exposures and SEs at geocoded addresses of participants and their examination sites across the contiguous United States that can be readily integrated over time with little influence of missing data. Studies in the geosciences have also found that kriging provides consistently improved interpolation accuracy over traditional inverse-distance weighting and other, simpler spatial interpolation methods (Zimmerman 1999). Another important advantage of GIS-based estimation over the traditional area-average GIS to estimate residential-level ambient PM Environmental Health Perspectives • VOLUME 114 | NUMBER 9 | September 2006 approach is the availability of both the locationspecific estimated pollutant concentrations and their SEs. Our goal in this study was to contribute methodologic and practical insights toward standardized, semiautomated GIS approaches to estimation of daily air pollution concentrations and their associated estimation errors. The air pollution data estimated using these approaches will support the Environmental Epidemiology of Arrhythmogenesis in WHI study (Whitsel 2006) examining the cardiac effects of air pollution in 68,133 postmenopausal women 50-79 years of age at baseline in the WHI CT (WHI Study Group 1998). Here we describe our experience resolving several important methodologic and practical issues in adopting a systematic, standardized, and semiautomated kriging approach to estimate daily air pollution concentrations and the associated estimation errors at geocoded addresses across the contiguous United States over 10 years.
We successfully downloaded from AQS the PM 10 and PM 2.5 raw data from 1993-2004. We then cleaned, calculated, and reconstructed site-specific daily PM concentration data ready for GIS applications. It is well known that the monitoring sites in AQS are not randomly distributed, which is one of the assumptions in kriging estimation, and the density of the monitoring sites is relatively low given the size of the contiguous United States. However, the AQS is the only currently available nationwide database. Our cross-validation studies suggest that the AQS data can be used as source data for kriging estimation of ambient pollution concentrations at various locations across the 48 contiguous states.
In this study, we performed cross-validation to assess the goodness of fit of various semivariogram and spatial models using four major parameters: the average PE, SPE, RMSS, and SE of estimation. Details can be found elsewhere (Webster and Oliver 2001), but it is worth noting that in addition to using the SE as a measure of the goodness of fit of a kriging model, one could improve the health effects models by incorporating SE in the models to account for the error in the estimation of location-specific PM concentrations. We consider this an important advantage of GIS-based estimation over the traditional area-average approach and are performing studies of using SE in health effects models.
We compared the performance of three widely spatial models (spherical, exponential, Gaussian) for PM 10 and PM 2.5 estimations using regular ordinary kriging on a national scale (Tables 1 and 2). In general, the cross-validation parameters suggest that all three models performed fairly well. Overall, the spherical model seemed to perform slightly better, consistent with the observation that the spatial distribution pattern of ambient air pollutants is closest to the assumption of the spherical model. The spherical model has been used most often in modeling spatially distributed data, providing a further rationale for its use in our large-scale population-based study of the health effects of PM. Furthermore, from the perspective of the cross-validation results, both average PE and average SPE are very close to 0, with a very narrow range of variation from the 366 daily cross-validations. These data support the overall validity of using kriging-based estimation approaches to estimate location-specific PM concentrations across the contiguous United States.
We completed an empirical analysis to investigate whether manually adjusting semivariogram parameters improves a) crossvalidation parameters and b) estimated PM 10 concentrations and their SEs (Table 3). From these data, we conclude that manually adjusting semivariogram parameters improves crossvalidation parameters. However, the application of such "improved" semivariograms to the estimation of PM 10 concentrations at geocoded locations across the United States did not necessarily provide better estimation of locationspecific PM. Therefore, we recommend using the default-calculated semivariogram.
Semivariograms are sensitive to strong positive skewness. As a result, regular ordinary kriging can yield negative predicted values or values exceeding the range of the source data. Kriging works best if the input data have a normal distribution. One solution is to logtransform the input data-using "lognormal kriging." In the ArcView software package, performing lognormal kriging is a standard option. This option log-transforms the input data to normalize its distribution and attenuate the impact of very large values. It also back-transforms the estimated values and the "unbiased" SE of the estimation to the original scale (Cressie 1993b;Johnston 2001). Our results comparing lognormal ordinary kriging versus regular-scale ordinary kriging suggest that lognormal ordinary kriging not only effectively estimated location-specific PM concentrations within the range of the measured data for the days regular ordinary kriging yielded negative or "out of range" PM estimations, but also yielded a smaller average SE than did regular ordinary kriging and estimations. Therefore, our results support the use of lognormal ordinary kriging as an acceptable solution to the problem commonly posed by positively skewed distributions of environmental data. Our comparisons of national-versus regional-scale kriging indicate that, in terms of cross-validation results, both performed similarly. However, such comparisons are based on krigings using the source data from optimal days (when > 900 sites across the country were reporting data), which account for only 17% of all days in a year. Therefore, there is additional justification for using national-scale kriging: Usually, there were very few operating sites within a region. On typical days-when only about 200 monitoring sites were operating-ability to derive stable and meaningful semivariograms was greatly impaired. Regional kriging also poses problems for estimation at locations near regional borders. For example, at locations within Washington State but near the Washington-Idaho border, regional kriging is based solely on PM 10 concentrations in the "Washington/ Oregon, Northern California" region. It is not based on PM 10 concentrations measured immediately across the border in Idaho, despite the real possibility that they would have the largest weights in national-scale kriging estimation. For all these reasons, we recommend national-scale kriging.
Considering the number of study participants and the length of study period (1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003) for the Environmental Epidemiology of Arrhythmogenesis in WHI study, development of an automated procedure enabling large-scale daily krigings and semivariogram cross-validations was critical. In this study, we decided to use ArcView for predicting individuals' PM exposure concentrations because of the flexibility it offers for automation. Because ArcView GIS relies on either the weighted least-squares method or visual adjustment to create semivariograms, we did not compare the relative performance of semivariograms generated using alternative methods such as maximum likelihood and restricted maximum likelihood. For generating semivariograms, we compared only three popular spatial models (spherical, exponential, and Gaussian). Our results, however, do not invalidate alternative spatial models (e.g., power). In the end, we selected the spherical model for our study because it is the most studied model, and its assumption pertaining to the spatial correlation of data is probably closest to our pollutant data. Furthermore, the spherical model seemed to perform as well as or slightly better than the remaining models in terms of crossvalidation parameters.
We chose ordinary kriging instead of universal or simple kriging for several reasons. First, the assumption for simple kriging of a known mean concentration on any given day across space is not practical for our data. Although it may seem more appropriate because of the "varying mean" concentration across the contiguous U.S. assumption, universal kriging requires a predetermined set of "exploratory variables" to explain the varying means. The candidates, many of which are spatial variables, include emissions, land use, population, road network distribution, altitude, rainfall, latitude, climatology, and other quality data. Denby et al. (2005) recently recommended a method that uses measured concentration data in combination with some "exploratory variables" as suggested above. However, their approach may not be feasible for a national-scale study such as ours, because little guiding information is available as to how to identify a set of widely acceptable variables that can be applied to the entire nation. Moreover, even if we could identify a set of exploratory variables, we do not know the forms or shapes of their independent and joint relations to the air pollution measures. Further studies that involve large-scale national data using universal kriging are still needed. In this study, we empirically tested whether the nonconstant mean assumption for universal kriging was needed; we performed five regional ordinary krigings so that different parts of the country would assume a different mean PM concentration. Our data suggested that regional and national ordinary kriging performed similarly. Therefore, our data indirectly validated and supported the use of national ordinary kriging.
Although the primary objective of our study is to assess the short-term relationship between PM and cardiac responses, the proposed kriging method also enables us to calculate the long-term cumulative exposure of an individual by taking into account the change of his or her residences over time, because the WHI study recorded the residential location history over 10 years. Nevertheless, from the environmental perspective, an inherited limitation of the kriging-based approach is that the estimations of the PM concentrations will provide only surrogates, or the best guesses, of the true exposure levels at the locations of interest. Thus, the accuracy of the estimations depends highly on the quality of the measured data and their spatial correlation. Even if the estimations were made with a high level of confidence, they cannot be directly interpreted as the true individual-level exposures. However, to correlate individual level cardiac responses with a surrogate of location-specific exposure, our approach represents one of the best available methods for a large-scale population-based study.
In summary, our investigation of GIS approaches for estimating daily mean geocoded location-specific air pollutant concentrations and their SEs supports the use of a spherical model to perform lognormal ordinary kriging on a national scale. Our findings also support the use of default-generated semivariograms (estimated using the weighted least-squares method) without visual adjustment. We developed a semiautomated program to access and execute ArcView to implement these approaches for large-scale daily kriging estimations and semivariogram cross-validations. Detailed information about this program can be obtained on request.