A Unified Spatiotemporal Modeling Approach for Predicting Concentrations of Multiple Air Pollutants in the Multi-Ethnic Study of Atherosclerosis and Air Pollution

Background: Cohort studies of the relationship between air pollution exposure and chronic health effects require predictions of exposure over long periods of time. Objectives: We developed a unified modeling approach for predicting fine particulate matter, nitrogen dioxide, oxides of nitrogen, and black carbon (as measured by light absorption coefficient) in six U.S. metropolitan regions from 1999 through early 2012 as part of the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air). Methods: We obtained monitoring data from regulatory networks and supplemented those data with study-specific measurements collected from MESA Air community locations and participants’ homes. In each region, we applied a spatiotemporal model that included a long-term spatial mean, time trends with spatially varying coefficients, and a spatiotemporal residual. The mean structure was derived from a large set of geographic covariates that was reduced using partial least-squares regression. We estimated time trends from observed time series and used spatial smoothing methods to borrow strength between observations. Results: Prediction accuracy was high for most models, with cross-validation R2 (R2CV) > 0.80 at regulatory and fixed sites for most regions and pollutants. At home sites, overall R2CV ranged from 0.45 to 0.92, and temporally adjusted R2CV ranged from 0.23 to 0.92. Conclusions: This novel spatiotemporal modeling approach provides accurate fine-scale predictions in multiple regions for four pollutants. We have generated participant-specific predictions for MESA Air to investigate health effects of long-term air pollution exposures. These successes highlight modeling advances that can be adopted more widely in modern cohort studies. Citation: Keller JP, Olives C, Kim SY, Sheppard L, Sampson PD, Szpiro AA, Oron AP, Lindström J, Vedal S, Kaufman JD. 2015. A unified spatiotemporal modeling approach for predicting concentrations of multiple air pollutants in the Multi-Ethnic Study of Atherosclerosis and Air Pollution. Environ Health Perspect 123:301–309; http://dx.doi.org/10.1289/ehp.1408145


Introduction
Cohort studies have shown an association between long-term exposure to air pollution and cardiovascular morbidity and mortality (Brook et al. 2010;Miller et al. 2007;Pope and Dockery 2006). To estimate these associations, epidemiologic studies develop exposure prediction models to predict pollutant concentrations over long periods of time at cohort home addresses based on monitoring data from regulatory networks or study-specific monitoring campaigns. Although early models were based on region-wide averages or nearestmonitor approaches, more current methods include land-use regression (LUR) (Hoek et al. 2008;Jerrett et al. 2007), the use of satellite and remote sensing data (Kloog et al. 2011), geostatistical methods such as kriging (Beelen et al. 2009), generalized additive models (Hart et al. 2009), or a combination of these techniques (Beckerman et al. 2013;Bergen et al. 2013;Mercer et al. 2011).
The Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) is investigating the association between long-term air pollution exposure and measures of cardiovascular health (Kaufman et al. 2012). MESA Air is following a cohort of > 6,000 individuals in six metropolitan regions: Baltimore, Maryland; Chicago, Illinois; Los Angeles, California; New York, New York; St. Paul, Minnesota; and Winston-Salem, North Carolina. The primary exposures of interest in MESA Air are fine particulate matter (≤ 2.5 μm; PM 2.5 ), nitrogen dioxide (NO 2 ), oxides of nitrogen (NO x ), and black carbon, as measured by light absorption coefficient (LAC). One goal of MESA Air is the development of advanced statistical methods that incorporate extensive supplemental monitoring to improve the prediction of intra-urban pollutant variability. MESA Air health effect analyses require spatiotemporal predictions of ambient outdoor concentrations for all four pollutants in all six metropolitan regions for times ranging from 1999 through 2012.
In general, exposure prediction models developed in one city do not transfer well to another city , so prediction models are often study-and city-specific (e.g., Franklin et al. 2012). A challenge for multicity studies such as MESA Air that combine data from subcohorts and include several pollutant measures is generating predictions that are of comparable quality across pollutants and cities. Here we present a unified and flexible spatiotemporal modeling framework for the four MESA Air pollutants. We apply a standardized approach to model selection for all pollutants and regions, allowing the intrinsic flexibility of our modeling framework to account for differences in the way pollution processes behave in different regions.

Methods
To predict outdoor concentration of pollutants at MESA Air participant residences, we fit a separate spatiotemporal exposure prediction model for each pollutant (PM 2.5 , NO 2 , NO x , and LAC) in each metropolitan region. Briefly, our model decomposes the spacetime field of concentrations into spatially varying long-term (i.e., duration of study period) averages, spatially varying seasonal and long-term trends, and spatially correlated but temporally independent residuals, and Background: Cohort studies of the relationship between air pollution exposure and chronic health effects require predictions of exposure over long periods of time. oBjectives: We developed a unified modeling approach for predicting fine particulate matter, nitrogen dioxide, oxides of nitrogen, and black carbon (as measured by light absorption coefficient) in six U.S. metropolitan regions from 1999 through early 2012 as part of the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air). Methods: We obtained monitoring data from regulatory networks and supplemented those data with study-specific measurements collected from MESA Air community locations and participants' homes. In each region, we applied a spatiotemporal model that included a long-term spatial mean, time trends with spatially varying coefficients, and a spatiotemporal residual. The mean structure was derived from a large set of geographic covariates that was reduced using partial least-squares regression. We estimated time trends from observed time series and used spatial smoothing methods to borrow strength between observations. results: Prediction accuracy was high for most models, with cross-validation R 2 (R 2 CV ) > 0.80 at regulatory and fixed sites for most regions and pollutants. At home sites, overall R 2 CV ranged from 0.45 to 0.92, and temporally adjusted R 2 CV ranged from 0.23 to 0.92. conclusions: This novel spatiotemporal modeling approach provides accurate fine-scale predictions in multiple regions for four pollutants. We have generated participant-specific predictions for MESA Air to investigate health effects of long-term air pollution exposures. These successes highlight modeling advances that can be adopted more widely in modern cohort studies. citation: Keller JP, Olives C, Kim SY, Sheppard L, Sampson  accommodates data from the complex monitoring design described below. We modeled on a 2-week time scale because of the 2-week sampling of MESA Air supplementary monitoring instruments; this allows for flexible aggregation of predictions to time scales of interest for health effects analyses (e.g., 12 months before clinic visit). Monitoring data. We used three sources of outdoor air monitoring data. We obtained hourly NO 2 and NO x and daily PM 2.5 concentration measurements in each metropolitan region from 1 January 1999 through 31 March 2012 from the U.S. Environmental Protection Agency (EPA) Air Quality System (AQS) (http://www.epa.gov/ttn/airs/airsaqs/ detaildata/downloadaqsdata.htm), including data from the Interagency Monitoring of Protected Visual Environments (IMPROVE) network (http://vista.cira.colostate.edu/ IMPROVE/). No AQS data were used for black carbon because of differences in collection methods from the MESA Air LAC measurement methods described below. We aggregated hourly data into daily averages and subsequently averaged daily values to the 2-week scale. AQS monitors that had < 2 years of data or had irregular temporal coverage (e.g., operated only in the summer) were not used.
In each metropolitan region, we defined the modeling area to be locations within approximately 75 km of each metropolitan center ( Figure 1). AQS monitors within the modeling regions were considered for inclusion in the model, and predictions at participant residences were restricted to locations within these modeling regions. In New York, MESA Air participants were recruited from both New York City and Rockland County, so the modeling region included locations near both areas. In Winston-Salem, only one AQS monitoring location for NO 2 and NO x met inclusion criteria. To have a complete time series for the 14-year modeling period, an AQS monitor in Charlotte, North Carolina, was included for estimating time trends. In Chicago, the modeling region was further restricted to locations west of -87.5°W longitude because some covariates were unavailable east of that meridian. In Los Angeles, only locations south and/or west of the San Gabriel Mountains were included.
To better capture the within-city variability of pollutant concentrations, MESA Air conducted a supplementary monitoring campaign targeting the study cohort from July 2005 through August 2009. The MESA Air measurements were 2-week cumulative measurements that began and ended on Wednesdays. Measurements of NO 2 and NO x were made using Ogawa passive samplers, and PM 2.5 mass was measured on Harvard Personal Environmental Monitor impactors using Teflon filters. LAC was computed from the Teflon filters via reflectance. A detailed description of the data collection and site selection procedures has been previously published (Cohen et al. 2009).
The MESA Air monitoring campaign included three types of monitoring sites: fixed, home, and snapshot. Fixed sites were operated for the duration of the 4-year MESA Air sampling period to provide long time series of measurements, with one fixed site collocated with an AQS monitor in each region. Samples of participant residences in each metropoli tan region were selected for monitoring as home outdoor sites on a rotating basis, with most locations monitored one to three times in different seasons. Snapshot sites, which measured only NO 2 and NO x , were located in clusters to capture gradients near sources (e.g., primary roadways) and monitored for three 2-week periods, one each in winter, summer, and either spring or fall.
In New York City, data from the New York City Community Air Survey (NYCCAS) were used to supplement the AQS and MESA Air data (Matte et al. 2013; NYC Department of Health 2014). The NYCCAS data consist of 2-week measurements of PM 2.5 , NO 2 , NO x , and LAC collected during December 2008-December 2010 in a manner consistent with MESA Air sampling protocols. Five NYCCAS reference sites (one in each borough) collected measurements throughout the sampling period, and 150 NYCCAS distributed sites were monitored once per season during this time. Because of the similarity in monitoring scheme, we treated NYCCAS reference sites in the same manner as MESA Air fixed sites, and NYCCAS distributed sites in the same manner as MESA Air home sites, in our models. The NYCCAS data and a small subset of the MESA Air 2-week data were centered on different weeks than most of the MESA Air measurements. To align these measurements with the rest of the MESA Air data, we treated these measurements as if they were made 1 week earlier or later, as appropriate.
Between 0.4% (LAC) and 1.6% (NO 2 ) of the pollutant measurements were below the limit of detection (LOD) and were replaced with the value LOD/2. The number of each type of monitoring site by region and pollutant is provided in Table 1. The range of the number of PM 2.5 observations at each monitoring site during the study period is provided in Table 2, along with summary statistics for the site means. Corresponding statistics for NO 2 , NO x , and LAC observations are provided in Supplemental Material, Tables S1-S3.
Geographic covariates. We compiled > 300 geographic covariates for use in the model (see Supplemental Material, Table S4). These covariates included proximity measures (distance to nearest major road, intersection, truck route, railway, railyard, coastline, airport, and port) and buffer measures (major road length, truck route length, land-use category, long-term vegetation index, population density, and emission sources). We included a long-term average of the dispersion model output from a modified implementation of the Caline3QHCR line-source model (Eckhoff and Braverman 1995). The Caline3QHCR model incorporates distance, traffic volume, meteorology, and diurnal traffic patterns in each region.
Geographic covariates with minimal variation or potentially highly influential values were excluded from the modeling process. Specifically, variables were removed if a) > 80% of monitoring sites had the same value, b) > 2% of observations were more than 5 SDs away from mean, c) the standard deviation of the distribution of values at participant residences was more than five times the standard deviation of the distribution of values at monitoring locations, or d) the maximum value was 10% among all monitoring sites (for land-use variables only). These filters were applied separately for each pollutant and region.
Spatiotemporal model. The monitoring data were highly unbalanced, with a small number of locations providing long time series of several years' duration and a larger  (Lindström et al. 2014;Sampson et al. 2011;Szpiro et al. 2010). This model can be written as where C(s,t) represents the log-transformed 2-week average pollutant concentration at location s and time t. The μ(s,t) term is the spatiotemporal mean surface, and the v(s,t) term represents spatiotemporal residual variation. We break down the spatiotemporal mean into components where β 0 (s) is the long-term mean at location s, f i (t) are smooth time trends, and β i (s) are spatially varying coefficients for the time trends. The time trends are estimated from AQS and MESA Air fixed sites (and NYCCAS reference sites in New York) using a procedure developed by Fuentes et al. (2007) and previously described in detail by Sampson et al. (2011). In brief, we applied an expectationmaximization procedure to fill in missing values in the time series and derived the trends from a singular value decomposition. We smoothed the trends using splines, controlling the smoothness with the degrees of freedom (df) parameter. The model assumes the time trends account for enough of the temporal structure that the residuals are uncorrelated in time.
The long-term averages β 0 (s) and time trend coefficients β i (s) are modeled as spatial random fields with a spatial mean, distributed as Here, X i (s) are reduced-dimension summaries of the geographic covariates (described in detail below) at location s, and α i are vectors of coefficients to be estimated. The covariance structure for β i , denoted by Σ i , is either an independence model with variance τ i or a spatial smoothing model with exponential covariance function parameterized by range ϕ i , partial sill σ i , and nugget τ i (Cressie 1993).
The zero-mean spatiotemporal residual term v (s,t) in Equation 1 has a spatial correlation structure and is assumed independent at each time point. It includes a random effect for each time point to model short-term variations that affect an entire region, such as large-scale meteorological events.  Partial least squares (PLS) scores. Rather than include each of the hundreds of geographic covariates directly in the model or use variable selection methods, we reduced the dimensionality of the covariates using PLS. In a manner similar to principal components analysis (PCA), PLS computes linear combinations, called scores, of the columns of a data matrix. Unlike PCA, the PLS procedure constructs scores that maximize the covariance between the scores and an outcome rather than the variance between the scores. A technical explanation of the PLS algorithm is provided by Abdi (2010). Sampson et al. (2013) described the application of PLS for spatial models, and here we describe how we applied the method to spatiotemporal data.
PLS regression requires a single outcome value for each location. Because the MESA Air data are unbalanced time series, we first derived values that could be used as outcomes in PLS regression. For each AQS, fixed, and NYCCAS reference site s, we regressed the time series of observations C(s,t) on the smoothed time trends using ordinary least squares regression with mean function E[C(s,t)] = γ s 0 f 0 (t) + … + γ s m f m (t) to get estimates (γ s 0 ,…, γ s m ) for each location. For each time trend, PLS regression was performed separately with the γ s i as the outcomes and the matrix of geographic covariates as the predictors. This gave a set of PLS scores for each location that was different for each time trend. PLS scores at home and snapshot monitoring sites were predicted using the geographic variables at those locations and the score definitions defined from the regression at fixed sites. PLS regression was performed using the pls package (Mevik et al. 2011), in R (R Core Team; http:r-project.org). Scores were included in the model as the X i (s) in Equation 3. Parameter estimation and model selection. Once the PLS scores X i (s) and time trends f i (t) were computed, the remaining parameters were calculated via maximum likelihood using the SpatioTemporal package, version 1.1.7 (Lindström et al. 2012), in R. We varied several model parameters and used cross-validation to find the best-fitting model in each metropolitan region, as described below. We considered different values for the number of time trends (either 1 or 2), the df for smoothing time trends (either 4 or 8 per year), the number of PLS scores per time trend (2 or 3), and the covariance structure of the β i fields (spatial smoothing or no spatial smoothing).
Cross-validation procedure. The primary interest of MESA Air is in long-term average exposures, so we assessed model performance using cross-validation of longterm averages (LTAs). Because the highly unbalanced structure of the monitoring data means that LTAs at home sites are computed from a handful of observations of a few weeks' duration, whereas LTAs at fixed sites are computed from long time series, we performed cross-validation separately for each site type. For home sites and NYCCAS distributed sites, we used 10-fold cross-validation, which leaves out one-tenth of the data in turn. For AQS, fixed, and NYCCAS references sites, we used leave-oneout cross-validation because the total number of monitors was relatively small. For snapshot sites, we used 10-fold cross-validation, with monitors in the same cluster left out together. For all three schemes, the covariance parameters (but not the time trends or PLS scores) were re-estimated using all but the left-out sites. Pollutant concentrations at the left-out sites were predicted using the parameters estimated from the remaining data.
R 2 CV provides a measure of fit to the 1-1 line, in contrast to the typical regression-based R 2 (R 2 CVreg ), which measures fit to the regression line and is computed as the square of the correlation coefficient between the crossvalidation predictions and the observed values. R 2 CV reflects the contrast of interest because our goal is accurate prediction at unmeasured locations, and it is typically lower than R 2 CVreg . Although R 2 CV was the primary metric for our model evaluation, we present R 2 CVreg for comparison with published results from other authors. Because we are most interested in spatial contrasts between individual exposures within each region, we prioritized home-site over fixed-site R 2 CV and RMSE in the model selection process. In the context of the hierarchical model (Equation 1), it is challenging to separate the spatial and temporal contributions to R 2 CV for cross-validation of temporally sparse data sets such as the home sites and NYCCAS distributed sites. Lindström et al. (2014) proposed three temporally adjusted adaptations of R 2 CV that use data from the AQS and fixed sites as the reference MSE instead of MSE obs in Equation 5 in order to focus on spatial prediction accuracy. R 2 Avg uses the average values at AQS and fixed sites within that region. R 2 Close uses the closest (in absolute distance) AQS or fixed site. R 2 Smooth uses the smoothed time trend at the closest AQS or fixed site.
Prediction at participant locations. Using the best models from each metropolitan region, predictions of pollutant logconcentrations at participant residences were made on a 2-week scale from January 1999 through March 2012. We back-transformed these predictions using exponentiation to return them to the original scale of concentration measurements and computed averages of 2-week predictions over the study period.

Results
Model structure. Table 3 provides an overview of the model structure selected for each metropolitan region and pollutant. Most models have only one time trend, although all the New York models have two. The two smoothed trends for the NO 2 model in Los Angeles are shown in Figure 2. Figure 2 also includes plots of the fitted trends for a selected AQS site and fixed site. For PM 2.5 , there was noticeable heterogeneity of the best models across metropolitan regions (Table 3). New York and Winston-Salem had no spatial smoothing in the longterm PM 2.5 average [β 0 (s)], and no model had spatial smoothing in the PM 2.5 time trend coefficients [β i (s)]. Half of the regions had two time trends, whereas the other half had only a single time trend for PM 2.5 . For NO 2 , all of the models had spatial smoothing for the long-term average, and the same was true for NO x except in New York.
The relative contribution of geographic covariates to the PLS scores varied by region and pollutant. In the Supplemental Material, Figure S1 shows the correlations between covariates and PLS scores for the NO 2 model in Chicago, which are representative of the general patterns in the other regions (data not shown). Overall, the distance-to-feature covariates and vegetation measures [Normalized Difference Vegetation Index (NDVI) and low development, open development, forest, and wetland land use] tended to have the opposite correlation from emissions and traffic measures (A1, A2/A3, and truck route lengths and inter section counts) within buffers.
Model results. Table 4 shows the crossvalidation metrics for all models, broken down by pollutant and region. These metrics assess how well the site means are modeled, incorporating both the spatial and temporal components of the predictions. Scatter plots of predictions and observed values are provided in Figure 3 for AQS and fixed sites and in Supplemental Material, Figure S2, for home sites. Metrics for the snapshot sites (for NO 2 and NO x ) and for AQS and fixed sites (all four pollutants) on the 2-week scale are reported in the Supplemental Material, Tables S5 and S6.
Predictive accuracy was generally good (R 2 CV > 0.6) to excellent (R 2 CV > 0.8) in almost all regions for each pollutant. NO x models in Baltimore and Los Angeles had the best performance at MESA home sites (R 2 CV of 0.92 and 0.91, respectively) (  Table 4, suggesting that we are predicting spatial differences well. For PM 2.5 , however, the temporally adjusted R 2 CV are consistently lower than the unadjusted R 2 CV . Box plots of the long-term averages of predictions at participant residences are provided in Figure 4. On average, predicted concentrations tended to be higher in New York and Los Angeles, consistent with the higher observed monitoring values in those regions. Variability in predictions is also greatest in these two cities, especially in the tails of the distributions.
Supplemental Material, Table S7, provides performance metrics for the New

Discussion
We present here a complex and successful approach to predicting long-term air pollution concentrations for application in a cohort study. Although this approach was tailored to this particular well-characterized cohort study-taking advantage of cohortspecific monitoring, for example-the success of the approach demonstrates modeling improvements that can be adopted for application in future population-based research on spatially varying pollutants. We believe that this approach to capturing variation in within-region pollution highlights advances that should be adopted in the next generation of air pollution cohort studies, both for understanding contrasts at relatively low concentrations in the United States and at the higher concentrations experienced globally.
We describe a unified framework for implementing exposure prediction models of four air pollutants in six metropolitan regions that easily incorporates spatially and temporally unbalanced monitoring data. The application of a consistent modeling framework to all regions and pollutants is important for studies such as MESA Air that use exposure estimates from multiple subcohorts together in health analyses. Although we applied the same approach in all regions, we varied the model structure to best fit the data for each region and pollutant. This unified modeling approach was shown to have very good model performance (R 2 CV > 0.70) for almost all of the pollutants and regions. The architecture for this modeling approach is publicly available through the SpatioTemporal R package (Lindström et al. 2012).
As a result of the success of our spatiotemporal modeling approaches, we are confident in using these approaches to model outdoor pollutant concentrations in epidemiological analyses in this cohort and in other populations residing in these same communities. We have also found that implementation of portions of this approach, such as PLS regression of a large set of geographic covariates combined with spatial smoothing via universal kriging, can be used with good success in other regions to predict pollutant concentrations without the same level of small-area monitoring ). The NYCCAS data increased the spatial density of the monitoring data in New York, which was likely one reason for the improved model performance when the data were included. For LAC, the NYCCAS data provided particular benefit because they allowed the model to be extended through 2010, which would not have been possible with only the MESA Air data.
A majority of models included spatial smoothing in the long-term average. This suggests that although PLS scores derived from geographic covariates can predict much of the spatial variation in the data, benefit is gained from borrowing strength across observations nearby in space.
Differences in the underlying pollutant variability likely caused some of the differences seen in temporally adjusted R 2 CV . PM 2.5 tended to exhibit less small-scale spatial variation and greater temporal variability, leading to temporally adjusted R 2 CV that are noticeably lower than the unadjusted measures. The NO 2 and NO x data tended to exhibit greater spatial variability, and the similarity of the unadjusted and temporally adjusted R 2 CV values suggests that the unadjusted R 2 CV are not overly inflated by well-predicted temporal variation.
The modeling approach presented here does have several limitations. First, we used geographic covariates that were constant in time [although the modeling framework readily extends to spatiotemporal covariates (Lindström et al. 2014)]. Changes in these variables likely occurred during the study decade, but we nonetheless believe that the time-constant geographic variables still provided a useful means to predict long-term pollutant concentrations. Second, the calculation of PLS scores was limited to AQS and fixed sites because they had long time series. For LAC in particular, this means that the scores were based on a very small number of locations because the LAC model relied only on MESA Air data (plus NYCCAS data for New York). Third, the cross-validation model selection procedure conditioned on the time trends and PLS scores. Overfitting may have occurred in the cross-validation of the AQS and fixed sites, because the left-out observations were used to estimate the time trends and PLS scores. However, because the home sites were not used in estimating time trend or in defining the PLS scores, any overfitting was restricted to the AQS and fixed-site cross-validation. This provides further motivation for prioritization of cross-validation metrics from home sites when selecting the best models.

Conclusions
Our unified spatiotemporal modeling method successfully characterized outdoor concentrations of multiple air pollutants at the homes of cohort members in multiple metropolitan regions. This flexible and powerful modeling approach can incorporate an unbalanced monitoring data structure, leveraging data from supplemental monitoring campaigns that increase the spatial coverage of monitoring data. The method was easily transferred between regions and pollutants, allowing for straightforward comparison between model fits across regions. Although aspects of our techniques are particularly tailored to the unique data and resources of MESA Air, lessons learned here can be applied to understand the spatial and temporal variation of pollutants in future cohort studies. Advances in fine-scale modeling resolved in both space and time are important for the next generation of cohort studies assessing health effects of environmental agents.