A National Prediction Model for PM2.5 Component Exposures and Measurement Error–Corrected Health Effect Inference

Background: Studies estimating health effects of long-term air pollution exposure often use a two-stage approach: building exposure models to assign individual-level exposures, which are then used in regression analyses. This requires accurate exposure modeling and careful treatment of exposure measurement error. Objective: To illustrate the importance of accounting for exposure model characteristics in two-stage air pollution studies, we considered a case study based on data from the Multi-Ethnic Study of Atherosclerosis (MESA). Methods: We built national spatial exposure models that used partial least squares and universal kriging to estimate annual average concentrations of four PM2.5 components: elemental carbon (EC), organic carbon (OC), silicon (Si), and sulfur (S). We predicted PM2.5 component exposures for the MESA cohort and estimated cross-sectional associations with carotid intima-media thickness (CIMT), adjusting for subject-specific covariates. We corrected for measurement error using recently developed methods that account for the spatial structure of predicted exposures. Results: Our models performed well, with cross-validated R2 values ranging from 0.62 to 0.95. Naïve analyses that did not account for measurement error indicated statistically significant associations between CIMT and exposure to OC, Si, and S. EC and OC exhibited little spatial correlation, and the corrected inference was unchanged from the naïve analysis. The Si and S exposure surfaces displayed notable spatial correlation, resulting in corrected confidence intervals (CIs) that were 50% wider than the naïve CIs, but that were still statistically significant. Conclusion: The impact of correcting for measurement error on health effect inference is concordant with the degree of spatial correlation in the exposure surfaces. Exposure model characteristics must be considered when performing two-stage air pollution epidemiologic analyses because naïve health effect inference may be inappropriate. Citation: Bergen S, Sheppard L, Sampson PD, Kim SY, Richards M, Vedal S, Kaufman JD, Szpiro AA. 2013. A national prediction model for PM2.5 component exposures and measurement error–corrected health effect inference. Environ Health Perspect 121:1017–1025; http://dx.doi.org/10.1289/ehp.1206010


Introduction
The relationship between air pollution and adverse health outcomes has been well docu mented (Pope et al. 2002;Samet et al. 2000). Many studies focus on particulate matter, speci fically particulate matter ≤ 2.5 µm in aerodynamic diameter (PM 2.5 ) (Kim et al. 2009;Miller et al. 2007). Health effects of PM 2.5 may depend on characteristics of the particles, including shape, solubility, pH, or chemical composition (Vedal et al., in press), and a deeper understanding of these differ ential effects could help inform policy. One of the challenges in assessing the impact of different chemical components of PM 2.5 in an epidemiologic study is the need to assign exposures to study participants based on monitoring data from different locations (i.e., spatially misaligned data). When doing this for many components, the prediction proce dure needs to be streamlined in order to be practical. Whatever the prediction algorithm, using the estimated rather than true exposures induces measurement error in the subsequent epidemiologic analysis. Here we describe a flexible and efficient prediction model that can be applied on a national scale to estimate longterm exposure levels for multi ple pollut ants and that implements existing methods of correcting for measurement error in the health model. Current methods for assigning exposures include landuse regression (LUR) with geo graphic information system (GIS) covariates (Hoek et al. 2008) and universal kriging, which also exploits residual spatial structure (Kim et al. 2009;Mercer et al. 2011). Often hundreds of candidate correlated GIS covari ates are available, necessi tating a dimension reduction procedure. Variable selection meth ods that have been considered in the literature include exhaustive search, stepwise selection, and shrinkage by the "lasso" (Mercer et al. 2011;Tibshirani 1996). However, variable selection methods tend to be computation ally intensive, feasible perhaps when consider ing a single pollutant but quickly becoming impractical when develop ing predictions for multi ple pollutants. A more streamlined alter native is partial least squares (PLS) regression (Sampson et al. 2009), which finds a small number of linear combinations of the GIS covariates that most efficiently account for variability in the measured concentrations. These linear combinations reduce the covari ate space to a much smaller dimension and can then be used as the mean structure in a LUR or universal kriging model in place of individual GIS covariates. This provides the advantages of using all available GIS covariates and eliminating potentially timeconsuming variable selection processes.
Using exposures predicted from spatially misaligned data rather than true exposures in health models introduces measurement error that may have implications for β x , the estimated health model coefficient of interest (Szpiro et al. 2011b We thank the three reviewers for their helpful comments. Research in this publication was supported by grants T32ES015459, P50ES015915, and R01ES009411 from the National Institute of Environmental Health Sciences of the National Institutes of Health (NIH). Additional support was provided by an award to the University of Washington under the National Particle Component Toxicity initiative of the Health Effects Institute and by the U.S. Environmental Protection Agency (EPA), Assistance Agreement RD834796010 (Clean Air Research Centers). This publication was developed under a STAR (Science to Achieve Results) program research assistance agreement, RD831697, awarded by the U.S. EPA. The views expressed in this document are solely those of the University of Washington, and the U.S. EPA does not endorse any products or commercial services mentioned in this publication. The MultiEthnic Study of Atherosclerosis (MESA) is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by NHLBI contracts N01HC95159 through N01HC95169 and UL1RR024156. MESA Air is funded by the U.S. EPA's STAR grant RD831697.
The content of this work is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Background: Studies estimating health effects of long-term air pollution exposure often use a twostage approach: building exposure models to assign individual-level exposures, which are then used in regression analyses. This requires accurate exposure modeling and careful treatment of exposure measurement error. oBjective: To illustrate the importance of accounting for exposure model characteristics in twostage air pollution studies, we considered a case study based on data from the Multi-Ethnic Study of Atherosclerosis (MESA). Methods: We built national spatial exposure models that used partial least squares and universal kriging to estimate annual average concentrations of four PM 2.5 components: elemental carbon (EC), organic carbon (OC), silicon (Si), and sulfur (S). We predicted PM 2.5 component exposures for the MESA cohort and estimated cross-sectional associations with carotid intima-media thickness (CIMT), adjusting for subject-specific covariates. We corrected for measurement error using recently developed methods that account for the spatial structure of predicted exposures. results: Our models performed well, with cross-validated R 2 values ranging from 0.62 to 0.95. Naïve analyses that did not account for measurement error indicated statistically significant associations between CIMT and exposure to OC, Si, and S. EC and OC exhibited little spatial correlation, and the corrected inference was unchanged from the naïve analysis. The Si and S exposure surfaces displayed notable spatial correlation, resulting in corrected confidence intervals (CIs) that were 50% wider than the naïve CIs, but that were still statistically significant. conclusion: The impact of correcting for measurement error on health effect inference is concordant with the degree of spatial correlation in the exposure surfaces. Exposure model characteristics must be considered when performing two-stage air pollution epidemiologic analyses because naïve health effect inference may be inappropriate. citation: Bergen S, Sheppard L, Sampson PD, Kim Szpiro et al. (2011b).
Here we present a case study to illustrate a holistic approach to twostage air pollu tion epidemiologic modeling, which includes exposure modeling in the first stage and health modeling that incorporates measure ment error correction in the second stage. We build national exposure models using PLS and universal kriging, and employ them to estimate longterm average concentrations of four chemical species of PM 2.5 -elemen tal carbon (EC), organic carbon (OC), sili con (Si), and sulfur (S)-selected to reflect a variety of different PM 2.5 sources and forma tion processes (Vedal et al., in press). After developing the exposure models, we derive predictions for the MultiEthnic Study of Atherosclerosis (MESA) cohort. These predic tions are used as the covariates of interest in health analyses assessing associations between carotid intimamedia thickness (CIMT), a subclinical measure of athero sclerosis, and exposure to PM 2.5 components. We apply measurement error correction methods to account for the fact that predicted rather than true exposures are being used in these health models. We discuss our results and their implications with regard to the effect of spatial correlation in exposure surfaces on estimated associations between exposures and health outcomes.

Data
Monitoring data. Data on EC, OC, Si, and S were collected to build the national models. These data consisted of annual aver ages from 2009-2010 as measured by the Interagency Monitoring for Protected Visual Environments (IMPROVE) and Chemical Speciation Network (CSN) of the U.S. Environmental Protection Agency (U.S. EPA 2009). The IMPROVE monitors are a nation wide network located mostly in remote areas. The CSN monitors are located in more urban areas. These two networks provide data that are evenly dispersed throughout the lower 48 states ( Figure 1).
All IMPROVE and CSN monitors that had at least 10 data points per quarter and a maximum of 45 days between measure ments were included in our analyses. Si and S measurements were averaged over 1 January  (Mercer et al. 2011). For NDVI a series of 23 monitor specific, 16day com posite satellite images were obtained, and the pixels within a given buffer were averaged for each image. PLS incorporated the 25th, 50th, and 75th percentile of these 23 averages. The median of "highvegetation season" image averages (defined as 1 April-30 September) and "lowvegetation season" averages (1 October-31 March) were also included. The geographic covariates were preprocessed to eliminate LUR covariates that were too homogeneous or outlierprone to be of use. Specifically, we eliminated variables with > 85% identical values, and those with the most extreme standardized outlier > 7. We logtransformed and truncated all distance variables at 10 km, and computed additional "compiled" distance variables such as mini mum distance to major roads and distance to any port. These compiled variables were then subject to the same inclusion criteria. All selected covariates were mean centered and scaled by their respective SDs.
MESA cohort. MESA is a population based study that began in 2000, with a cohort consisting of 6,814 participants from six U.S. cities: Los Angeles, California; St. Paul, Minnesota; Chicago, Illinois; Winston Salem, North Carolina; New York, New York; and Baltimore, Maryland. Four ethnic/ racial groups were targeted: white, Chinese American, African American, and Hispanic. All participants were free of physician diagnosed cardio vascular disease at time of entrance. [For additional details about the MESA study, see Bild et al. (2002).] These participants were also utilized in the Multi Ethnic Study of Athero sclerosis and Air Pollution (MESA Air), an ancillary study to MESA funded by the U.S. EPA to study the relationship between chronic exposure to air pollution and progression of subclini cal cardio vascular disease (Kaufman et al. 2012). Both the MESA and MESA Air stud ies were approved by the institutional review board (IRB) at each site, including the IRBs at the University of California, Los Angeles (Los Angeles, CA), Columbia University (New York, NY), Johns Hopkins University (Baltimore, MD), the University of Minnesota (MinneapolisSt. Paul, MN), Wake Forest University (WinstonSalem, NC), and Northwestern University (Evanston, IL). All subjects gave written informed consent.
We selected the CIMT end point in MESA as the health outcome for our case study. CIMT, a subclinical measure of athero sclerosis, was measured by Bmode ultrasound using a GE Logiq scanner (GE Healthcare, Wauwatosa, WI), and the end point was quantified as the right far wall CIMT mea sures conducted during MESA exam 1, which took place during 2000). We considered the 5,501 MESA participants who had CIMT measures during exam 1; our analysis was based on the 5,298 MESA participants who had CIMT mea sures during exam 1 and complete data for all selected model covariates.

Methods
The first stage of the twostage approach included building the exposure models using PLS as the covariates in universal kriging models. We used crossvalidation (CV) to select the number of PLS scores, determine how reliable predictions from each expo sure model were, and assess the extent to which spatial structure was present for each pollutant. The health modeling stage of the twostage approach included the health models we fit and the measurement error correction methods we employed. [For more detailed technical exposition, see Bergen et al. (2012).] Spatial prediction models. Notation. Let X t * denote the N* × 1 vector of observed squareroot transformed concentrations at monitor locations; R* the N* × p matrix of geographic covariates at monitor locations; X t the N × 1 vector of unknown squareroot transformed concentrations at the unobserved subject locations; and R the N × p matrix of geographic covariates at the subject locations. Note that for our exposure models, X t * and X t are dependent variables, and R* and R are independent variables. We used PLS to decompose R* into a set of linear combina tions of much smaller dimension than R*. Specifically,

R*H = T*.
Here, H is a p × k matrix of weights for the geographic covariates, and T* is an N* × k matrix of PLS components or scores. These scores are linear combinations of the geo graphic covariates found in such a way that they maximize the covariance between X t * and all possible linear combinations of R*. One might notice similarities between PLS and principal components analysis (PCA). Although the two methods are similar in that they are both dimension reduction methods, the scores from PLS maximize the covariance between X t * and all other pos sible linear combinations of R*, whereas the scores from PCA are chosen to explain as much as possible the covariance of R*. [For more details see Sampson et al. (2013)]. PLS scores at unobserved locations are then derived as T = RH.
Once the PLS scores T and T* were obtained for the subject and monitor ing locations, respectively, we assumed the following joint model for unobserved and observed exposures: Here α is a vector of regression coefficients for the PLS scores, and η and η* are N × 1 and N* × 1 vectors of errors, respectively. Our pri mary exposure models assumed that the error terms exhibited spatial correlation that could be modeled with a kriging variogram parame terized by a vector of parame ters θ = (τ 2 , σ 2 , φ) (Cressie 1992). The nugget, τ 2 , is interpre table as the amount of varia bility in the pollu tion exposures that is not explained by spatial structure; the partial sill, σ 2 , is interpretable as the amount of variability that is explained by spatial structure; and the range, φ, is inter pretable as the maximum distance between two locations beyond which they may no longer be considered spatially correlated. We estimated these parame ters and the regression coefficients α via profile maximum likelihood. Once these parame ters were estimated, we obtained predictions at unobserved locations by taking the mean of X t conditional on X t * and the estimated exposure model parame ters. Because our measurement error correction methods rely on a correctly specified exposure model, we took care to choose the bestfitting kriging variogram to model our data. We ini tially fit exponential variograms for all four pollutants and investigated whether plots of the estimated variogram appeared to fit the empirical variogram well. If they appeared to fit poorly, we investigated spherical and cubic variograms. The exponential variogram fit well for EC, OC, and S, but provided a poor fit for Si (data not shown). We therefore  volume 121 | number 9 | September 2013 • Environmental Health Perspectives examined cubic and spherical variograms and found the spherical variogram provided a much better fit and used it to model Si in our exposure models.
As a comparison to our primary kriging models, we also derived predictions from PLS alone without fitting a kriging variogram. This is analogous to a pure LUR model but using the PLS scores instead of actual geographic covariates. For this analysis η and η* were assumed to be independent, and α was esti mated using a leastsquares fit to regression of X t * on T*. PLSonly predictions at the unob served locations were then derived as the fit ted values from this regression using the PLS scores at the subject locations.
CV and model selection. We used 10fold CV (Hastie et al. 2001) to assess the models' prediction accuracy, to select the number of PLS components to use in the final prediction models, and to compare predictions generated using PLS only to our primary models, which used both PLS and universal kriging. Data were randomly assigned to 1 of 10 groups. One group (a "test set") was omitted, and the remaining groups (a "training set") were used to fit the model and generate test set predic tions. Each group played the role of test set until predictions were obtained for the entire data set. At each iteration, the following steps were taken to crossvalidate our primary mod els (similar steps were followed to derive cross validated predictions that used PLS only): • PLS was fit using the training set, and K scores were computed for the test set, for K = 1,...,10. • Universal kriging parame ters θ and coef ficients α were estimated via profile maxi mum likelihood using the training set. The first K PLS scores correspond to T* in Equation 1, for K = 1,...,10. • Predictions were derived using the first K PLS components and the corresponding universal kriging, using kriging parame ters estimated from the training set. We used the R package pls to fit the PLS. universal kriging was performed using the R package geoR. The bestperforming models were selected out of those that used both PLS and kriging based on their crossvalidated root mean squared error of prediction (RMSEP) and corresponding R 2 . For a data set with N* observations and corresponding predictions, the formulae for these performance metrics are given by These metrics are sensitive to scale; accord ingly, they are useful for evaluating model performance for a given pollutant but not for comparing models across pollutants. Health modeling. Disease model. Multi variable linear regression models were used to estimate the effects of each individual PM 2.5 component exposure on CIMT. Each model included a single PM 2.5 component along with a vector of subjectspecific covariates. Let Y be the 5,298 × 1 vector of health outcomes for the 5,298 MESA participants included in the analysis, W the 5,298 × 1 vector of expo sure predictions on the untransformed scale, and Z a matrix of potential confounders. We assumed linear relationships between Y, the true exposures, and Z, and fit the following equation via ordinary least squares (OLS): [4] Measurement error correction. The model in Equation 4 was fit using the predicted exposures W instead of the true exposures as the covariate of interest. Using predictions rather than true exposures in health modeling introduces two sources of measurement error that potentially influence the behavior of We used the parame ter bootstrap in the context of predictions that use both PLS and universal kriging; the approach would be very similar if PLS alone was used (although we did not implement that correction here).
with corresponding biascorrected effect estimate β x,λ corrected = β x -Bias λ (β x ). 4. Estimate the bootstrap SE as For our implementation of the parame ter bootstrap, we set B = 30,000 and λ = 1.
The goal of the parame ter bootstrap is to approximate the sampling properties of the measurement errorimpacted β x that would be estimated if we performed our twostage analysis with many actual realizations of monitoring observations and subject health data sets. Accordingly, step 2(a) gives us B new "realizations" of our data. For λ = 1, step 2(b) accounts for the classicallike error by resampling the exposure model parame ters.
Step 2(c) accounts for the Berksonlike error by smoothing the true exposure surface.
Step 2(d) then calculates B new β x,j 's, the sam pling properties of which have incorporated all sources of measurement error. Comparing these to the mean of bootstrapped β x,j derived using fixed exposure model parame ters (i.e., λ = 0) gives us an approximation of the bias induced by the classicallike error (step 3), and the empirical SD approximates the SE that accounts for both sources of measurement error (step 4).
We also implemented the parame ter boot strap for λ = 0. This is equivalent to the "partial parametric bootstrap" described by Szpiro et al. (2011b), which accounts for the Berksonlike error only because the exposure surface is still smoothed, but with fixed parame ters.
A desirable trait of the parame ter boot strap is the ability to "tune" the amount of the classicallike error by varying λ, which allows us to investigate how variability in the sampling distribution of (α j , θ j ) affects the bias of β x . This can be useful in refining our boot strap bias estimates by simulation extrapola tion (SIMEX) (Stefanski and Cook 1995). (For additional information on our approach to SIMEX and the results of applying it to the MESA data, see Supplemental Material, pp. 2-3 and Figure S1.)

Results
Data. Monitoring data. Mean concentrations of the four pollutants according to monitoring network are shown in Table 1. EC and OC concentrations measured by CSN monitors tended to be higher than concentrations mea sured by IMPROVE monitors. Average Si and S concentrations measured by CSN moni tors were also higher than the IMPROVE averages; however, relative to their SDs, the differences between CSN and IMPROVE monitors in Si and S concentrations were not as great as the differences between EC and OC concentrations.
Geographic covariates. The geographic variables that we used are listed in Table 2. Most of these variables were used for modeling all four pollutants, but not all. The following variables were used for modeling Si and S but not EC and OC: PM 2.5 and PM 10 emissions, streams and canals within a 3km buffer, other urban or builtup land use within a 400m buffer, lakes within a 10km buffer, industrial and commercial complexes within a 15km buffer, and herbaceous rangeland within a 3km buffer. On the other hand, the following variables were used for modeling EC and OC but not Si and S: industrial land use within 1 and 1.5km buffers.
The distributions of selected geographic covariates are shown according to monitor ing network and MESA locations in Table 1. Although relatively few monitors belonging to either IMPROVE or CSN were within 150 m of an A1 road, there was a larger pro portion of CSN monitors within 150 m of an A3 road (44%) than IMPROVE monitors (19%), consistent with the placement of CSN monitors in more urban locations compared with IMPROVE monitors (Table 1). The median distance to commercial and service centers was much smaller for CSN moni tors (127 m vs. 4,696 m), and the median population density was much larger for CSN monitors (805 persons/mi 2 ) than for IMPROVE monitors (only 3 persons/mi 2 ). Median summer NDVI values within 250 m were slightly smaller for CSN monitors than for IMPROVE monitors, consistent with the placement of IMPROVE monitors in greener areas. Geographic covariate distributions among MESA participant locations were more consistent with the CSN monitors, as is espe cially evident for the number of sites < 150 m from an A3 road and median population den sity (Table 1). Density plots of the geographic covariates for monitoring and subject locations indicated noticeable overlap for all geographic covariates (data not shown), suggesting differ ences in geographic covariates between moni tor and MESA locations were consistent with the concentration of MESA subjects in urban locations, not extrapolation beyond our data.
MESA cohort. Distributions of health model covariates among MESA cohort par ticipants are summarized in Table 3. The mean CIMT (0.68 ± 0.19 mm); mean age (62 ± 10 years); sex (52% female); race (39% white, 12% Chinese American, 27% African American, and 22% Hispanic); and status (44% hypertension status and 15% statin use) were determined by questionnaire (Bild et al. 2002). The highest percentage of par ticipants resided in Los Angeles (19.7%), but the distribution across the six cities was quite homogeneous. Only the 5,298 participants   Table 3 were included in the analysis. Spatial prediction models. Model evaluation. The selected models corresponding to lowest crossvalidated R 2 all used PLS and uni versal kriging. For all four PM 2.5 components and for all numbers of PLS scores, kriging improved prediction accuracy, as indicated by the R 2 and RMSEP statistics for the selected prediction models corresponding to the best performing PLSonly and PLS + universal kriging models (Table 4). Comparing the R 2 with and without universal kriging indicates that EC and OC were not much improved by kriging, whereas universal kriging improved prediction accuracy for Si and even more so for S. The ratio of the nugget to the sill (i.e., τ 2 / σ 2 ) also supports improved predictions with spatial smoothing by kriging. For a fixed range, smaller values of this ratio indicate that con centrations at nearby locations receive greater weight when kriging. We see this relationship in Table 4 where τ 2 /σ 2 was large when uni versal kriging did little to improve prediction accuracy, and very small when universal krig ing helped improve prediction accuracy.
As a sensitivity analysis we also carried out CV using nearest monitor exposure estimates. This method performed very poorly for EC and OC (R 2 s of 0 and 0.06, respectively), relatively poorly for Si (R 2 = 0.36), but per formed well for S (R 2 = 0.88).
Interpretation of PLS. Figure 2 illustrates the geographic covariates that were most important for explaining pollutant variabil ity. Specifically, Figure 2 summarizes the p × 1 vector m, the vector such that Rm equals the 5,298 exposures predicted with PLS only. Each element of m is a weight for a correspond ing geographic covariate. Positive elements in m (i.e., values > 0 in Figure 2) indicate that higher values of the geographic covariate were associated with higher predicted exposure; the larger the absolute value of an element in m, the more the corresponding geographic covari ate contributed to exposure prediction.
Population density was associated with larger predicted values of all pollutants, par ticularly for EC, OC, and S. Industrial land use within the smallest buffer was very predic tive of EC and OC, and evergreen forest land within a given buffer was strongly predictive of decreases in S. NDVI, industrial land use, emissions, and linelength variables were posi tively associa ted with all exposures except Si, whereas all the distancetofeatures variables were negatively associated with all exposures except Si. The NDVI variables were more important for prediction of OC and S than they were for EC. For Si, the NDVI and tran sitional land use variables appeared to be the most informative for prediction, with NDVI negatively and transitional land use posi tively associated with Si exposure. Distance to features appeared to be informative for all four pollutants.
Exposure predictions. Figure 1 shows predicted concentrations across the United States, with finer detail illustrated for St. Paul, Minnesota. The EC and OC predictions were much higher in the middle of urban areas, and quickly dissi pated further from urban centers. S predictions were high across the midwest ern and eastern states and in the Los Angeles area, and lower in the plains and mountains. Si predictions were low in most urban areas, and high in desert states.
Health models. The results from the naïve health model that did not include any measurement error correction, as well as the results from the health model that included bootstrapcorrected point estimates and SEs of β x , are displayed in Table 5. The naïve analysis indicated significant positive associa tions (p < 0.05) of CIMT with OC, Si, and S. There was also a positive but nonsignificant association between CIMT and EC. SEs for the EC and OC health effects were virtually unchanged when measurement error correc tion was implemented, whereas the bootstrap corrected SEs for Si and S were about 50% larger than their respective naïve estimates. The estimated biases resulting from the classical like measurement error were so small as to be uninteresting from an epidemiologic perspective because the point estimates of all four pollutants after implementing measure ment error correction were unchanged out to three decimal places.

Discussion
Summary. Our comprehensive twostage approach to estimating longterm effects of air pollution exposure includes a national predic tion model to estimate exposures to individual PM 2.5 components and corrects for measure ment error in the epidemiologic analysis using a methodology that accounts for differing amounts of spatial structure in the exposure surfaces. In a case study of four components of PM 2.5 and measurement error-corrected associations between these components and CIMT in the MESA cohort, corrected SEs corresponding to pollutants that exhibi ted sig nificant spatial structure (i.e., Si and S) were 50% larger than naïve estimates, whereas cor rected SE estimates for EC and OC were very similar to the naïve estimates. National exposure models. We find that a national approach to exposure modeling is reasonable and performs well in terms of pre diction accuracy. Our primary PLS + universal kriging models resulted in crossvalidated R 2 ≤ 0.95 (for predicting S concentrations) and ≥ 0.62 (for predicting Si) for any of the PM 2.5 components. Use of kriging improved the crossvalidated R 2 for all four pollutants compared with models that used PLS only, although the improvement was not equal across all four pollutants. These results are useful in terms of understanding the spa tial nature of our exposure surfaces. For EC and OC, the R 2 only improved by ≤ 0.09 when kriging was used compared to when PLS alone was used, indicating little large scale spatial structure in these pollutants. For Si, the R 2 improved from 0.36 to 0.62; and for S, from 0.63 to 0.95. This indicates that S (and to a lesser extent Si) had substantial largescale spatial structure that kriging was able to exploit. For all models, using kriging improved R 2 , indicating that no prediction accuracy was lost (and quite a bit stood to be gained, when spatial structure was present) by using PLS+universal kriging as opposed to using PLS alone. Our results also suggest  Table 2): A1 road, nearest road, airport, large airport, port, coastline, commercial or service center, railroad, and rail yard. Variable abbreviations and buffer sizes are indicated in Table 2. Most of the variables shown here were used for modeling all four pollutants, but not all. Variables used for modeling Si and S but not EC and OC were PM 2.5 and PM 10 emissions, streams and canals within a 3-km buffer, other urban or built-up land use within a 400-m buffer, lakes within a 10-km buffer, industrial and commercial complexes within a 15-km buffer, and herbaceous rangeland within a 3-km buffer. The variables used for modeling EC and OC but not Si and S were industrial land use within 1-and 1.5-km buffers. that exposure models such as the ones we have built may be preferable in many cases to simpler approaches such as nearest monitor interpolation. Our models produced cross validated R 2 that were higher than the nearest monitor approach, and our results indicate that unless there is considerable spatial struc ture in the exposure surface, a substantial amount of prediction accuracy may be lost when the nearestmonitor approach is used. We used twostage modeling instead of joint modeling of exposure and health for a variety of reasons. One is pragmatic: Joint modeling is computationally intensive, so our twostage approach is especially desirable when modeling multi ple pollutants. Joint modeling may also be more sensitive to out liers in the health data. Twostage modeling also appeals more intuitively in the context of modeling multi ple health outcomes because it assigns one exposure per participant that can then be used to model a number of different health outcomes. Joint modeling, on the other hand, would assign different levels of the same pol lutant depending on what health outcome was being modeled.
Epidemiologic case study. In this case study, we focused on four PM 2.5 components selected to gain insight into the sources or features of PM 2.5 that might contribute to the effects of PM 2.5 on cardio vascular disease. EC and OC were chosen as markers of pri mary emissions from combustion processes, with OC also including contributions from secondary organic aerosols formed from atmo spheric chemical reactions; Si was chosen as a marker of crustal dust; and S was chosen as a marker of sulfate, an inorganic aerosol formed secondarily from atmospheric chemical reac tions (Vedal et al., in press). The mechanisms whereby exposures to PM 2.5 or PM 2.5 com ponents produce cardiovascular effects such as athero sclerosis are not well understood, although several mechanisms have been pro posed (Brook et al. 2010). [For discussion of other studies examin ing the effects of these pollutants, see Vedal et al. (in press).] The relatively poor performance of nearest monitor interpolation for EC, OC, and Si raises concerns about epidemiologic infer ences based on predictions derived from that method. For S, the only pollutant for which our models and nearestmonitor interpolation performed comparably, the estimated increase in CIMT for a 1unit increase in exposure based on nearestmonitor interpolation was 0.074 ± 0.018, comparable to the naïve infer ence made using predictions from our expo sure models (0.055 ± 0.017). However, there is no way to correct for measurement error using this method, which is another significant advantage of our models.
Naïve health analyses based on exposure predictions from our national models indicated significant associations of CIMT with 1unit increases in average OC, Si, and S, but not EC. Using the parame ter bootstrap to account and correct for measurement error led to notice ably larger SEs and wider CIs for Si and S; however, OC, Si, and S were still significantly associated with CIMT even after correcting for measurement error.
Measurement error correction. For EC and OC, using PLS alone was sufficient to make accurate predictions, whereas the spatial smoothing from universal kriging substan tially improved prediction accuracy for Si and S. It is accordingly no coincidence that the bootstrap corrected SE estimates for EC and OC were unchanged from the naïve estimates, whereas the corrected SE estimates for Si and S were about 50% larger (and the resulting 95% CIs 50% wider) than their respective naïve estimates. The fact that the EC and OC exposure predictions were derived mostly from the PLSonly models, which assumed inde pendent residuals, implies that the Berkson like error was almost pure Berkson error (i.e., independent across location), which was cor rectly accounted for by naïve SE estimates. On the other hand, much more smoothing took place for Si and S, which induced spatial correlation in the residual difference between true and predicted exposure. Accordingly, SEs that correctly account for the Berkson like error in these two pollutants are inflated because the correlated errors in the predictions translate into correlated residuals in the disease model that are not accounted for by naïve SE estimates (Szpiro et al. 2011b). The fact that the SE estimates from the parame ter boot strap using λ = 1 (which accounts for both Berksonlike and classicallike error) and using λ = 0 (which accounts only for Berksonlike error) were so similar further indicates that the larger corrected SE estimates were most likely a result of the Berksonlike error. None of our measurement error analyses indicated that any important bias was induced by the classical like error.
Limitations and model considerations. Although our exposure models performed well, there is still room for improvement in prediction accuracy, especially for the EC, OC, and Si models, which had crossvalidated R 2 that could be improved upon. For these models it is possible that inclusion of addi tional geographic covariates in the PLS would help improve model performance. Examples include woodburning sources within a given buffer for EC and OC concentrations, or dust and sand sources for Si. These covariates are currently not available in our databases. Furthermore, although it is possible to inter pret the individual covariates in PLS compo nents (Figure 2), such interpretations need to be regarded with caution because inclu sion of many correlated covariates can lead to apparent associations that are counter intuitive and the opposite of what might be expected scientifically. Finally, PLS does not consider interactions or nonlinear combinations of the geographic covariates, factors which could improve model performance.
Implications and future directions. Our results show that careful investigation of the exposure model characteristics can help to clarify the implications for the subsequent epidemiologic analyses that use the predicted exposures. As noted by Szpiro et al. (2011a), an overarching framework that considers the end goal of health modeling seems more appealing than treating exposure models as if they exist for their own sake. This analysis serves as an example that will inform ongoing efforts by our group and others to construct and utilize exposure prediction models that are most suitable for epidemiologic studies.
Our epidemiologic inference was based on one health model per pollutant. One might reasonably be interested in how multi ple pol lutants jointly affect health. However, current literature for measurement error correction does not address models that use multi ple predicted pollutants as exposures. Our group is currently working on methods to address this challenge. Point estimates are estimates of the increase in CIMT for a 1-unit increase in each pollutant. a In the case of λ = 1, β x refers to the estimate corrected for any bias from classical-like error. b PB refers to results from parame ter bootstrap implemented with given value of λ.