Comparison of multiple PM2.5 exposure products for estimating health benefits of emission controls over New York State, USA

Ambient exposure to fine particulate matter (PM2.5) is one of the top global health concerns. We estimate the PM2.5-related health benefits of emission reduction over New York State (NYS) from 2002 to 2012 using seven publicly available PM2.5 products that include information from ground-based observations, remote sensing and chemical transport models. While these PM2.5 products differ in spatial patterns, they show consistent decreases in PM2.5 by 28%–37% from 2002 to 2012. We evaluate these products using two sets of independent ground-based observations from the New York City Community Air Quality Survey (NYCCAS) Program for an urban area, and the Saint Regis Mohawk Tribe Air Quality Program for a remote area. Inclusion of satellite remote sensing improves the representativeness of surface PM2.5 in the remote area. Of the satellite-based products, only the statistical land use regression approach captures some of the spatial variability across New York City measured by NYCCAS. We estimate the PM2.5-related mortality burden by applying an integrated exposure-response function to the different PM2.5 products. The multi-product mean PM2.5-related mortality burden over NYS decreased by 5660 deaths (67%) from 8410 (95% confidence interval (CI): 4570–12 400) deaths in 2002 to 2750 (CI: 700–5790) deaths in 2012. We estimate a 28% uncertainty in the state-level PM2.5 mortality burden due to the choice of PM2.5 products, but such uncertainty is much smaller than the uncertainty (130%) associated with the exposure-response function.


Introduction
Ambient exposure to fine particulate matter (defined as particles less than 2.5 μm in aerodynamic diameter) is associated with mortality (Dockery et al 1993, Di et al 2017), cardiovascular (Gauderman et al 2004, Pope et al 2002, 2004), respiratory [...] for SO2, NOx, CO, primary PM2.5 and non-methane volatile organic compounds respectively (US EPA 2018a), which led to a 42% decrease in the national annual average PM2.5 (US EPA 2018b). The reduction in PM2.5 is associated with longer life expectancy (Correia et al 2013, Fann et al 2017), and a decrease in mortality burden over recent decades (Butt et al 2017, Zhang et al 2018). To quantify the health benefits of emission reduction, an important step is to determine the ambient concentration of ground-level PM2.5. In general, ambient PM2.5 is estimated using information from at least one of the following three categories: ground-based observations, atmospheric chemical transport model (CTM) simulations, and remote sensing observations. Early studies (e.g. Pope et al 2004, Jerrett et al 2005) relied on ground-based monitors to estimate PM2.5 exposure. For regions without monitors, PM2.5 distributions can be filled spatially using geostatistical interpolation techniques such as kriging (Jerrett et al 2005, Fann et al 2017) and inverse distance weighting (IDW, Lipsett et al 2011). Another approach is to build relationships between in situ observed PM2.5 and land use, meteorological, and geospatial information using statistical methods (Henderson et al 2007, Paciorek and Liu 2009, Beckerman et al 2013). Such methods can resolve the fine-scale PM2.5 spatial gradient, but their skill depends on the availability of ground-based monitors (Lee et al 2012). CTMs simulate PM2.5 concentrations by solving the mass continuity equations for each PM component given emissions, meteorology, and topography.
CTMs have been used to estimate PM2.5 exposure and its historical or future trends nationwide ([...] Zhang et al 2018) and globally (Anenberg et al 2010, Silva et al 2013, Butt et al 2017), and are especially valuable for regions where long-term ground-based measurements are sparse. However, CTMs generally have coarse spatial resolution (> 12 km), limiting their ability to characterize air pollution at local scales (Wang et al 2016), and are subject to uncertain emissions, meteorology and chemical processes.
Space-based remote sensing products offer global coverage and more than two decades of continuous observations (Kaufman et al 1997, King et al 1999, Kaufman et al 2002). Satellite-retrieved aerosol optical depth (AOD), which is a measure of total light extinction by aerosol, is correlated with the column mass of aerosols (Wang and Christopher 2003, Koelemeijer et al 2006). Satellite-derived AOD is generally incorporated into estimates of PM2.5 in surface air in two ways: (1) forward geophysical approaches that rely on CTMs to simulate the relationship between PM2.5 and AOD (e.g. Liu et al 2004, van Donkelaar et al 2006, 2016); (2) statistical approaches that either directly build a relationship between AOD and PM2.5 (e.g. Gupta et al 2006, Al-Hamdan et al 2009), or add AOD as a predictor along with other land use and meteorological variables in regression models (e.g. Kloog et al 2014, Ma et al 2014, Just et al 2015). Satellite-derived PM2.5 is valuable for filling the spatial gaps over regions with sparse monitors (van Donkelaar et al 2014, 2016), providing observational constraints to models (Anenberg et al 2017, Lacey et al 2017), and improving the predictive power of statistical models (Beckerman et al 2013). However, using satellite AOD to predict PM2.5, especially at shorter time scales, is challenging due to retrieval uncertainties (Martin 2008, Jin et al 2019), missing data due to the inability to retrieve over cloud and snow ([...] Christopher 2008, Levy et al 2009), and the dependence of the PM2.5-AOD relationship on aerosol speciation, vertical distributions, and aerosol optical properties (Chin et al 2002, Gupta et al 2006, Jin et al 2019). Over the US, several PM2.5 products have become publicly available, owing to the increasing availability of observations, both in situ and space-based, and ever-growing computing capacity. However, most epidemiological studies, for practical purposes, rely on a single exposure estimate (e.g. [...]
(2017) find a robust association of PM 2.5 with cardiovascular diseases using multiple PM 2.5 products, but the derived relative risk factor varies. A comparative study by McGuinn et al (2017) over North Carolina finds the urban-rural difference in the relative risk varies with exposure assessment methods. However, objective assessment of the exposure models has long been challenging, mostly due to the lack of externally valid observations (Jerrett et al 2017). To address this gap, we use independent ground-based observations to evaluate seven publicly accessible PM 2.5 products for both urban and rural environments over New York State (NYS). These products include information from ground-based observations, atmospheric models and satellite remote sensing, which cover the most commonly used and up-to-date exposure assessment methods. We then estimate decadal changes in the NYS mortality burden attributable to PM 2.5 exposure using these PM 2.5 products, and assess the extent to which health impact analyses are sensitive to the choice of exposure datasets for NYS.

Data and methods
2.1. PM2.5 products
We collected seven publicly accessible PM2.5 exposure products for NYS. These products cover the commonly used approaches to estimate PM2.5 exposure, and most of them have been applied to health studies (table 1). Table 1 provides short names for each PM2.5 product, along with their spatial and temporal coverage, resolution, and the data sources used to derive PM2.5. All products span multiple years from 2002 to 2012, except the CDC WONDER product, which is only available between 2003 and 2011. We compare differences in PM2.5 by calculating spatial, temporal and population weighted spatial root mean squared differences (RMSD, equations (S1)-(S3); supplementary material is available online at stacks.iop.org/ERL/14/084023/mmedia), and the spatial and temporal correlation coefficients (R S and R T, equations (S4) and (S5)). We define two metrics to characterize the variations in PM2.5 across multiple products: the normalized range (equation (S6)) and the uncertainty (δ PM, calculated from the 95% confidence interval (CI) assuming a t statistical distribution; equation (S9)). Detailed methods are described in the supplementary material. Satellite-retrieved AOD products are used in four datasets, including the two Dalhousie products (Dalhousie_GL; V4.GL.02 and Dalhousie_NA; V4.NA.03), Emory and CDC WONDER, but the methods used to build the PM2.5-AOD relationship differ. The Dalhousie products use a global CTM (GEOS-Chem) to explicitly simulate the PM2.5-AOD relationship (van Donkelaar et al 2016). Although the Dalhousie products are designed for regional domains or larger, we evaluate their performance at the smaller spatial scale of a single state. The Emory product incorporates satellite AOD as a predictor, along with other land use and meteorological variables, in a machine learning model (random forest) (Bi et al 2019).
The CDC WONDER product builds a linear regression model between satellite AOD and ground-based PM2.5, and then merges satellite-derived PM2.5 with spatially interpolated ground-based PM2.5 (Al-Hamdan et al 2014). Each of these approaches uses different AOD products (table 1). Four products include simulated PM2.5 from global or regional atmospheric chemistry models. The Dalhousie products use GEOS-Chem. The CMAQ simulation of PM2.5 was accessed from the US EPA Remote Sensing Information Gateway (RSIG) (US EPA, RSIG 2016). The FAQSD product fuses this CMAQ PM2.5 with AQS observations using a space-time downscaling model (Berrocal et al 2010, 2011). All products except the CMAQ simulation have been calibrated or merged with ground-based observations of 24 h average PM2.5 from the EPA Air Quality System (AQS). To assess the added value of satellite remote sensing and models, we construct another dataset that spatially interpolates the daily AQS observations within NYS using IDW.
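As a rough illustration of the IDW baseline described above, the sketch below interpolates monitor values to a single target point. The exponent of 2, the planar distance, and the monitor values are assumptions made for illustration only; the text does not state the IDW settings that were actually used.

```python
import math

def idw(monitors, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    monitors: list of (x, y, value) tuples; target: (x, y).
    `power` is an assumed exponent (2 is a common default).
    """
    num = 0.0
    den = 0.0
    for x, y, value in monitors:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a monitor
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical monitors: (x in km, y in km, PM2.5 in ug/m3)
monitors = [(0.0, 0.0, 12.0), (10.0, 0.0, 8.0), (0.0, 10.0, 10.0)]
estimate = idw(monitors, (5.0, 0.0))
```

Because every estimate is a weighted mean of monitor values, the result can never exceed the highest monitor or fall below the lowest one, which is consistent with the tendency of an interpolated product to smooth sharp gradients.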

Independent ground-based PM 2.5 observations
We use ground-based observations from the NYC Community Air Quality Survey (NYCCAS) Program to evaluate these PM 2.5 products over urban NYC. NYCCAS collected integrated samples for every 2-week period in each season from 2009 to 2016 at 150 distributed sites (figure S1) over NYC, which are chosen to represent a range of land use, traffic intensity and other characteristics (Matte et al 2013). While NYCCAS and filter-based AQS data are sampled with different instruments, Matte et al (2013) found that the two-week integrated PM 2.5_CAS mirrors PM 2.5_AQS (R 2 =0.96, slope=1.0).
Over a remote area of upstate NY, we use ground-based measurements collected by the Saint Regis Mohawk Tribe (SRMT) Air Quality Program (Benedict et al 2011). SRMT is located in northern NYS, situated in the northwest corner of Franklin County, bordered by St. Lawrence County (figure S1). There are two SRMT sites that collect hourly PM2.5 samples continuously with a tapered element oscillating microbalance monitor during our study period.

2.3. Calculation of the mortality burden due to PM2.5 exposure
We estimate the mortality burden for the PM2.5 products after resampling them to a common grid of 0.01° × 0.01°. We acquire the administrative boundary shapefiles from the Database of Global Administrative Areas (GADM), extract the shapefiles for NYS, and rasterize them to the 0.01° grid, so that each grid cell belongs to one county. The excess mortality attributable to ambient exposure to PM2.5 (ΔMort) is estimated using the health impact function (Zhang et al 2018):

\[ \Delta\mathrm{Mort} = y_0 \times \mathrm{AF} \times \mathrm{Pop} \qquad (1) \]

where y 0 is the baseline mortality rate for specific diseases; Pop is the exposed population aged 25 years and older; AF is the attributable fraction, which is a function of the relative risk (RR):

\[ \mathrm{AF} = \frac{\mathrm{RR} - 1}{\mathrm{RR}} \qquad (2) \]

We use the RR factors from the GBD Study 2010, based on an integrated exposure-response model of Burnett et al (2014) developed from a meta-analysis:

\[ \mathrm{RR}(C) = \begin{cases} 1 + \alpha \left( 1 - e^{-\gamma (C - C_0)^{\delta}} \right), & C > C_0 \\ 1, & C \le C_0 \end{cases} \qquad (3) \]

where C is the annual average ambient concentration of PM2.5; C 0 is the counter-factual level below which no additional risk is assumed; α, γ, and δ are fitting parameters. We acquired the RRs along with their 95% CIs for four causes of disease, including chronic obstructive pulmonary disease (COPD), ischemic heart disease (IHD), lung cancer (LC), and cerebrovascular and ischemic stroke (STROKE) from the Global Burden of Disease Collaborative Network (2013).
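The health impact calculation described above can be sketched as follows. The sketch assumes the commonly cited integrated exposure-response form, RR = 1 + α(1 − exp(−γ(C − C0)^δ)) for C > C0 and RR = 1 otherwise, together with AF = (RR − 1)/RR and ΔMort = y0 × AF × Pop. All parameter values, the baseline rate and the population below are placeholders chosen for illustration; they are not the fitted GBD values used in the study.

```python
import math

def relative_risk(c, c0, alpha, gamma, delta):
    """Integrated exposure-response relative risk at concentration c (ug/m3)."""
    if c <= c0:
        return 1.0  # no additional risk below the counterfactual level
    return 1.0 + alpha * (1.0 - math.exp(-gamma * (c - c0) ** delta))

def attributable_fraction(rr):
    """Fraction of baseline deaths attributable to the exposure."""
    return (rr - 1.0) / rr

def excess_mortality(y0, pop, c, c0, alpha, gamma, delta):
    """Excess deaths for one grid cell: y0 * AF * Pop."""
    rr = relative_risk(c, c0, alpha, gamma, delta)
    return y0 * attributable_fraction(rr) * pop

# Placeholder (hypothetical) parameters, NOT the GBD 2010 fits
params = dict(c0=5.8, alpha=1.0, gamma=0.05, delta=1.0)
y0 = 0.002       # hypothetical baseline deaths per person per year
pop = 100_000    # hypothetical exposed adults in the cell

deaths_high = excess_mortality(y0, pop, 10.5, **params)
deaths_low = excess_mortality(y0, pop, 7.0, **params)
```

With a monotonic RR curve, lowering the concentration from 10.5 to 7.0 μg m−3 lowers the estimated burden even when the baseline rate and population are held fixed.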
We use the county-level baseline mortality rate from the National Center for Health Statistics (CDC 2017) from 2002 to 2012 for each specific disease, following the definition of the GBD study (Lim et al 2012, Zhang et al 2018). We assign the annual county-level baseline mortality to grid cells falling in the county. County-level population data for age ≥25 years are acquired from the CDC WONDER database. Since the population density varies spatially within a county, we distribute the county-level population data for each county by applying the spatial patterns acquired from the Gridded Population of the World (GPW, version 4) data from the Socioeconomic Data and Applications Center (SEDAC).
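One way to picture the population step above is proportional allocation: each grid cell in a county receives a share of the county total equal to its share of the gridded population pattern. The sketch below is an assumption about how such a step could be done, not a description of the authors' code; the function name, the uniform fallback and the numbers are hypothetical.

```python
def distribute_population(county_total, pattern):
    """Split a county total across its grid cells in proportion to `pattern`.

    pattern: gridded population values (e.g. from a GPW-like raster) for the
    cells of one county. Falls back to a uniform split if the pattern is all
    zeros (an assumed convention).
    """
    total = sum(pattern)
    if total == 0:
        return [county_total / len(pattern)] * len(pattern)
    return [county_total * value / total for value in pattern]

# Hypothetical county with four grid cells
cells = distribute_population(1000.0, [1.0, 3.0, 0.0, 6.0])
```

The allocation preserves the county total, so summing the gridded values back to the county level recovers the original county-level count.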

Results
3.1. Comparison across PM2.5 products at multiple scales
Figure 1 compares the spatial distribution of annual average PM2.5 from multiple products in 2002 and 2012 (2003 and 2011 for PM2.5_CDC). The state average PM2.5 ranges from 9.2 μg m−3 (PM2.5_Dal_NA) to 12.1 μg m−3 (PM2.5_Dal_GL) in 2002, and 5.9 μg m−3 (PM2.5_Emory) to 7.9 μg m−3 (PM2.5_FAQSD) in 2012 (figure 2(a)). All products show similar overall patterns with spatial correlation coefficients (R S) ranging from 0.65 to 0.90 (table 2). The Emory product shows sharp gradients of PM2.5 along the highways, while other products show more spatially homogeneous patterns. PM2.5_CMAQ shows the largest spread in PM2.5 across NYS, overestimating PM2.5 over populous urban NYC and underestimating over upstate NY (compared to AQS observations, circles on figure 1), leading to a positive bias of population weighted average (PWA) PM2.5 (figure 2(b)), and larger population weighted RMSD with other products (figure S2(b)). PM2.5_IDW, which only relies on the ground-based monitors, tends to smear urban-rural gradients, thus PWA PM2.5_IDW is lower than other products (figure 2(b)). Excluding the IDW and CMAQ data, the other products show consistent PWA PM2.5 with differences below 10% (table S1).
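The population weighted average (PWA) discussed above weights each grid cell by the number of people living in it. A minimal sketch, using made-up concentrations and populations, shows why a product that overestimates PM2.5 in populous cells yields a PWA above the simple area mean:

```python
def population_weighted_average(pm, pop):
    """Population-weighted mean concentration over grid cells."""
    return sum(c * p for c, p in zip(pm, pop)) / sum(pop)

# Hypothetical cells: one dense urban cell and two sparsely populated cells
pm = [12.0, 8.0, 6.0]      # ug/m3
pop = [1000, 100, 100]     # residents per cell

pwa = population_weighted_average(pm, pop)
simple_mean = sum(pm) / len(pm)
```

Here the urban cell dominates the weights, so the PWA sits close to the urban concentration while the unweighted mean does not.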
While burden-of-disease studies are typically based on annual average PM2.5, building exposure-response functions for acute effects requires the PM2.5 data to accurately capture the temporal variability on shorter time scales. At the monthly scale, the temporal variabilities of statewide average PM2.5_Emory, PM2.5_IDW, and PM2.5_FAQSD are almost identical (R T >0.9, table 2), all closely matching the variability of PM2.5_AQS (R T >0.97). PM2.5_Dal_NA and PM2.5_CDC show weaker correlations with PM2.5_Emory, PM2.5_IDW, and PM2.5_FAQSD. PM2.5_CMAQ, however, shows weak to no correlation with all of the other products (R T <0.55). We attribute this difference to the seasonal cycle of PM2.5_CMAQ, which differs from other products (figure 2(c)). At daily scales, PM2.5_Emory, PM2.5_IDW, PM2.5_FAQSD and PM2.5_CDC closely match (R T >0.8, figure 2(d)). Over NYC, where ground-based monitors are densely distributed, we find consistency across all products except for PM2.5_CMAQ at all scales, with δ PM =10% for annual average PM2.5 after excluding PM2.5_CMAQ (table S1).
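The temporal comparison above rests on two statistics between a pair of time series: a correlation coefficient and a root mean squared difference. The sketch below assumes a Pearson correlation and an unweighted RMSD; the exact definitions are given in the supplementary equations, which are not reproduced here, and the monthly values are invented for illustration.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def rmsd(a, b):
    """Root mean squared difference between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical monthly means (ug/m3): a product with a constant +1 offset
product = [9.0, 7.5, 6.0, 8.0]
observed = [8.0, 6.5, 5.0, 7.0]

r_t = pearson_r(product, observed)
rmsd_t = rmsd(product, observed)
```

The two statistics are complementary: a product with a constant bias can correlate perfectly with observations while still showing a non-zero RMSD, which is why both are reported.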

Evaluation with independent ground-based observations
The intensive NYCCAS measurements are ideal for evaluating whether the PM2.5 products capture the spatial patterns of PM2.5 at the intra-urban scale. Only [...] (figure S4).

Table 2. Spatial/temporal correlation coefficients (R S /R T) for different pairs of PM2.5 data. R S is calculated from the multi-year average PM2.5 gridded to a common grid of 0.1° × 0.1° resolution (equation (S4)). R T is calculated from monthly PM2.5 averaged across NYS (equation (S5)). The dataset best correlated with independent ground-based observations is highlighted in bold. All products are sampled at each site for comparison with ground-based observations (i.e. AQS, NYCCAS, SRMT).
To evaluate the performance of these PM 2.5 products over upstate NY, where the ground-based monitors are sparse, we use the PM 2.5 measurements from two SRMT sites (hereafter PM 2.5_SRMT ). All products correlate more strongly with PM 2.5_SRMT at the St. Lawrence site than the Franklin site. At the St. Lawrence site, PM 2.5_Emory correlates best with the observed PM 2.5_SRMT (R T =0.89, table 2), while PM 2.5_CDC has the smallest RMSD T (1.52 μg m −3 , figure S2(c)). At the monthly scale, PM 2.5_IDW and PM 2.5_Emory are more consistent with PM 2.5_SRMT in the cold season (November to March), and PM 2.5_FAQSD is more consistent with PM 2.5_SRMT from May to September, but overestimates PM 2.5 in winter by 33%. PM 2.5_Dal_NA overestimates PM 2.5 in winter, and underestimates in the warm season (figure S4), though it captures the seasonal cycle and the temporal variability (R T =0.81). At the Franklin site, which is far from the AQS monitors, we find PM 2.5_Dal_NA best captures the observed temporal variability (R T =0.72), though it is overall biased high by 40%. PM 2.5_Emory agrees well with PM 2.5_SRMT in summer, but is biased high in winter. PM 2.5_CMAQ shows an opposite seasonal cycle that peaks in January, leading to the lowest R T value and highest RMSD T with PM 2.5_SRMT among all products (figure S4).

Decadal changes in PM 2.5 and the associated mortality burden
Despite the differences in spatial resolution and PM2.5 derivation methods, all products (excluding PM2.5_CDC) show significant decreases in statewide average PM2.5 by 28% (PM2.5_FAQSD) to 37% (PM2.5_CMAQ) from 2002 to 2012 (figure 1). The ensemble average PM2.5 over NYS decreased by 33% from 10.5 μg m−3 in 2002 to 7.0 μg m−3 in 2012. The decreasing trend is widespread across all counties with 28%-40% decreases in the ensemble mean of county-level PM2.5 (figure S5). The decrease in PM2.5 is largely driven by the decrease in secondary inorganic aerosols attributed to anthropogenic emission reductions (US EPA 2018a, 2018b). The annual average PM2.5 shows larger decreases before 2009, and then levels off (figure 2(a)). The stabilization is partly due to the inter-annual variability in meteorology: the near-surface air temperature, which correlates with PM2.5 over NYS (Porter et al 2015), is overall warmer in 2010 to 2012 than in other years over NYS. Squizzato et al (2018) suggest that PM2.5 started to decline again over NYS from 2013.
The consistent decreasing trend provides evidence that PM2.5-related air quality has improved significantly over NYS, which should decrease the PM2.5-related mortality burden. We apply the integrated exposure-response function of Burnett et al (2014) to seven long-term PM2.5 products. We estimate a 67% decline in the ensemble mean PM2.5-related mortality burden (all causes combined) from 8410 (rounded to three significant figures; 95% CI due to uncertainty in relative risk factor, 4570-12 400) deaths in 2002 to 2750 (CI: 700-5790) deaths in 2012. Using PM2.5_Emory yields the largest absolute decrease in mortality burden, by 5990 (CI: 4050-6860) deaths from 2002 to 2012, while using PM2.5_IDW yields the smallest decrease, by 5130 (CI: 3460-5685) deaths. In terms of relative change, using PM2.5_Emory, PM2.5_IDW, or PM2.5_Dal_NA yields the largest decrease in mortality burden (all three at 74%), while using PM2.5_CMAQ gives the smallest decrease (57%). The decrease in mortality burden combines decreases in PM2.5 with decreases in baseline mortality rates: the ensemble mean PM2.5-related mortality burden decreases by 46% if the baseline mortality rate is kept constant at 2002 levels, and by 36% if PM2.5 concentration is kept constant (figure S6). Among all causes, IHD is the leading cause of PM2.5-related mortality in NYS, contributing 87% of the total mortality (figure S7). The IHD-related ensemble mean mortality decreases from 6230 (CI: [...]

4. Discussion
4.1. Which is the 'best' PM2.5 product?
Determining which PM2.5 product is the 'best' should take into account at least three criteria: resolution, availability and accuracy (table S2). The statistical satellite-based PM2.5 product (PM2.5_Emory) has the finest spatial and temporal resolution, and it captures some of the fine-scale patterns of PM2.5 by incorporating land use and traffic-related information.
Our evaluation with independent observations shows PM2.5_Emory best agrees with ground-based observations for the urban area (PM2.5_CAS) and the rural external SRMT site that is closer to an AQS monitor. Jerrett et al (2017) compare the PM2.5 mortality risk estimated using multiple exposure assessment methods, and they also find the best fit with a statistical land use regression model. However, PM2.5_Emory is a localized product designed for a small region (e.g. NYS in this study). The expansion of this product to wider regions is limited by the availability of ground-based monitors and consistent ancillary data. PM2.5_FAQSD and PM2.5_CDC are available for the entire US with daily resolution but at coarser spatial resolution (~10 km); we find PM2.5_FAQSD performs better over urban areas, while PM2.5_CDC performs better over remote areas (table 2). The global Dalhousie product (PM2.5_Dal_GL), while limited in temporal resolution, has the widest coverage, which is valuable for assessing the PM2.5-related global burden of disease (Cohen et al 2017). The regional Dalhousie product (PM2.5_Dal_NA) is available monthly for North America, and it best correlates with the rural SRMT site farther from any AQS monitor (table 2). Lee et al (2012) compare the predictive capabilities of the Dalhousie product versus spatially interpolated PM2.5, and they similarly find the Dalhousie product is more accurate than spatially interpolated data for areas 100 km or further away from monitors. In summary, there is no single product that stands out in all three criteria. Depending on the study design, the choice of PM2.5 product for epidemiological studies should reflect a trade-off among these criteria.

Figure 4(b). δ PM as a function of distance to the nearest AQS monitor with all products included (blue) and the outlier product PM2.5_CMAQ excluded (orange). The calculation is performed on the re-gridded PM2.5 products with 0.1° × 0.1° resolution.

4.2. How do PM2.5 exposure estimates depend on ground-based measurements?
All of the PM2.5 products in table 1 (except PM2.5_CMAQ) either merge AQS observations or use AQS observations to train the model, and their temporal variability is thus almost identical to PM2.5_AQS at AQS sites (R>0.97, table 2), indicating the important role of AQS in driving the temporal variability of these products. Areas surrounding AQS monitors typically have smaller exposure uncertainties than areas where monitors are sparse (figure 4(a)). The largest uncertainty is found over northern NYS, where only one AQS monitor is available. We find all products show better correlation and smaller RMSD T with PM2.5_SRMT at the St. Lawrence site than the Franklin site, also suggesting higher confidence of these products over areas closer to AQS monitors. Figure 4(b) shows δ PM as a function of distance to the nearest AQS monitor, and it increases from 20% for areas close to AQS monitors (< 20 km) to 31% for areas far from monitors (> 80 km). The global geophysical satellite PM2.5 product (PM2.5_Dal_GL) is regarded as having the least reliance on ground-based monitors (van Donkelaar et al 2016). The regional geophysical satellite-based product (PM2.5_Dal_NA) mainly differs from PM2.5_Dal_GL in how biases are adjusted with ground-based observations. We find a large difference in spatial patterns between PM2.5_Dal_NA and PM2.5_Dal_GL, especially in 2002 (figure 1), suggesting calibration with ground-based monitors is important even in the product with the least reliance on ground-based monitors. Much of NYS has sufficient monitors: more than 90% of the state area contains at least one monitor within 100 km. PM2.5 products derived with similar approaches are likely to have larger discrepancies over regions where ground-based monitors are sparse.
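The dependence of δ PM on monitor distance can be explored by computing, for every grid cell, the great-circle distance to the nearest monitor and then averaging δ PM within distance bins. The sketch below is a hypothetical illustration: the haversine distance, the bin edges (0, 20 and 80 km), the coordinates and the δ PM values are all assumptions, and δ PM itself is taken as a precomputed per-cell input.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_monitor_km(lat, lon, monitors):
    """Distance from a cell centre to the closest monitor."""
    return min(haversine_km(lat, lon, m_lat, m_lon) for m_lat, m_lon in monitors)

def mean_by_distance_bin(cells, monitors, edges):
    """Average delta_pm per distance bin; bin i covers [edges[i], edges[i+1])."""
    sums, counts = {}, {}
    for lat, lon, delta_pm in cells:
        d = nearest_monitor_km(lat, lon, monitors)
        for i in range(len(edges) - 1):
            if edges[i] <= d < edges[i + 1]:
                sums[i] = sums.get(i, 0.0) + delta_pm
                counts[i] = counts.get(i, 0) + 1
                break
    return {i: sums[i] / counts[i] for i in sums}

# Hypothetical monitor and grid cells: (lat, lon, delta_pm in %)
monitors = [(43.0, -76.0)]
cells = [(43.0, -76.0, 20.0), (43.1, -76.0, 22.0), (44.0, -76.0, 31.0)]
binned = mean_by_distance_bin(cells, monitors, [0.0, 20.0, 80.0, math.inf])
```

In this toy setup the two cells within about 11 km of the monitor fall in the first bin and the cell about 111 km away falls in the last bin; the middle bin is empty and therefore absent from the result.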

What is the value of satellite remote sensing and model simulations?
Our evaluation with independent observations from SRMT suggests the inclusion of satellite remote sensing improves the representativeness of PM2.5 in remote areas (table 2). Of the four satellite-based products, only the statistical approach (PM2.5_Emory) captures some of the urban spatial variability measured by NYCCAS. For the geophysical approach (PM2.5_Dal_NA and PM2.5_Dal_GL), satellite AOD provides observational constraints over the globe with fine spatial resolution, which outperforms unconstrained model simulations (i.e. PM2.5_CMAQ), though the model-simulated AOD-PM2.5 relationship often introduces large uncertainties (Jin et al 2019). For the AQS-Remote Sensing merged approach (PM2.5_CDC), incorporating satellite AOD better resolves urban-rural gradients of PM2.5 than the product spatially interpolated from AQS observations (i.e. PM2.5_IDW). For the statistical approach, the contribution from satellite AOD is small and less important than land use and meteorological variables (Bi et al 2019). Bi et al (2019) suggest larger enhancement of PM2.5 over roads after incorporating satellite AOD, but the difference is generally small (<0.2 μg m−3). Other studies that use statistical models to predict PM2.5 find that models with satellite-based AOD predict PM2.5 better than those without (Beckerman et al 2013, Ma et al 2014). Among all products, PM2.5_CMAQ is the least accurate: its monthly temporal variability is almost uncorrelated with the others, suggesting that the direct use of this CTM without observational constraints in epidemiological studies will introduce larger uncertainties in the exposure estimate, consistent with Jerrett et al (2017). PM2.5_FAQSD, which fuses CMAQ with AQS data, shows a stronger correlation with other products. It should be noted that we only evaluate a single model version (CMAQ v4.7) in this study.

4.4. Does the choice of PM2.5 products matter for health impact analysis?
Depending on the choice of PM2.5 products, we show the estimated mortality burden varies by 43% (equation (S6)). On average, uncertainty in the exposure-response function causes 130% uncertainty (equation (S10)) in the estimated mortality burden, which is more than a factor of 4 larger than the uncertainty due to the choice of PM2.5 products (δ PM =28%). Previous studies similarly suggest uncertainties in exposure-response functions have larger impacts than uncertainty in exposure estimates (Silva et al 2013, Ford and Heald 2016). The increasing availability of observations (both in situ and space-based) is expected to better constrain the exposure estimate, and thus to further reduce uncertainty in PM2.5 estimates. All products show consistent decreasing trends in PM2.5, and thus decreases in the PM2.5-related mortality burden that vary by 26% across the different products. At low PM2.5 levels, the relationship between PM2.5 and relative risk is approximately linear ([...] Di et al 2017), and thus the uncertainty in the exposure-response function should not strongly influence the long-term trend in the mortality burden. However, it should be noted that the integrated model of Burnett et al (2014) relies on pooling exposure-response functions from studies using different exposure assessment methods, and uncertainty in exposure could cause errors in building the exposure-response functions (Kioumourtzoglou et al 2014, Hart et al 2015). In addition, we only consider the uncertainties in the ambient concentration of PM2.5, but the measured ambient concentration differs from the true personal exposure, and such difference is expected to introduce larger biases in the estimates of relative risks (Zeger et al 2000).

Conclusions
We examined seven long-term (2002-2012) publicly available PM 2.5 products over NYS, which cover the most common exposure assessment methods used in health studies. We use independent ground-based observations to evaluate these products over both urban and rural environments. Among the seven products, the localized statistical satellite-based PM 2.5 data have the finest spatial and temporal resolution, and best accuracy over areas with dense monitors, while the geophysical satellite-based product correlates best with ground-based PM 2.5 at the remote site. Inclusion of satellite remote sensing improves the representativeness of PM 2.5 estimates in a remote area. All products, however, have limited capability to resolve the spatial patterns of PM 2.5 at the intra-urban scale captured by NYCCAS. While the uncertainty in the state-level PWA PM 2.5 is small (δ PM <5% after excluding outlier products), we find larger uncertainties over upstate NY where ground-based monitors are sparse. We highlight the importance of ground-based observations to reduce the uncertainties in PM 2.5 exposure estimate, as well as the independent (i.e. not used to develop the product) observations for objective assessment.
Despite the uncertainties summarized above, all products show a significant decrease in PM2.5 of 28%-37% from 2002 to 2012, which we attribute to the implementation of emission controls. We conclude that emission controls have improved public health across NYS: the multi-product ensemble mean PM2.5-related mortality burden decreased by 5660 deaths (67%) from 8410 (CI: 4570-12 400) deaths in 2002 to 2750 (CI: 700-5790) deaths in 2012. We estimate a 28% uncertainty in the state total mortality burden due to the choice of exposure assessment method, much less than the uncertainty in the integrated exposure-response function (130%). Overall, we conclude that exposure estimates for PM2.5 using combinations of ground-based measurements, remotely sensed and modeled data hold substantial promise, and are rapidly becoming the state of the art for exposure assessment in epidemiological and health impact studies.