Measurement-based assessment of health burdens from long-term ozone exposure in the United States, Europe, and China

Long-term ozone (O3) exposure estimates from chemical transport models are frequently paired with exposure-response relationships from epidemiological studies to estimate associated health burdens. Impact estimates using such methods can inherit biases from the model-derived exposure estimates. We use data solely from dense ground-based monitoring networks in the United States, Europe, and China for 2015 to estimate long-term O3 exposure and calculate premature respiratory mortality using exposure-response relationships derived from two separate analyses of the American Cancer Society Cancer Prevention Study-II (ACS CPS-II) cohort. Using results from the larger, extended ACS CPS-II study, 34 000 (95% CI: 24, 44 thousand), 32 000 (95% CI: 22, 41 thousand), and 200 000 (95% CI: 140, 253 thousand) premature respiratory mortalities are attributable to long-term O3 exposure in the USA, Europe, and China, respectively, in 2015. Results are approximately 32%–50% lower when using an older analysis of the ACS CPS-II cohort. On a region-by-region basis, both sets of results are lower (∼20%–60%) than analogous prior studies based solely on modeled O3, due in large part to high biases in modeled exposure. This study highlights the utility of dense observation networks in estimating long-term O3 exposure and provides an observational constraint on subsequent health burdens for three regions of the world. In addition, these results demonstrate how small biases in modeled estimates of long-term O3 exposure can be amplified in estimated health impacts by nonlinear exposure-response curves.


Introduction
There is strong epidemiological and toxicological evidence linking exposure to ambient ozone (O3) with adverse health impacts (US EPA 2013). While historical research has largely focused on impacts attributable to short-term O3 exposure, a growing body of literature suggests a significant association between long-term ambient O3 exposure and increased premature mortality, in particular from respiratory diseases (Jerrett et al 2009, Lipsett et al 2011, Zanobetti and Schwartz 2011, REVIHAAP 2013, Turner et al 2016). Consequently, exposure-response relationships, specifically those derived from an analysis of the American Cancer Society Cancer Prevention Study-II (ACS CPS-II) cohort (Jerrett et al 2009), have been used to estimate the global health burden from long-term O3 exposure (e.g. Anenberg et al 2010, Lelieveld et al 2013). Due to the spatial and temporal limitations of ground-based monitors, as well as the difficulty of relating the vertical column density of O3 observed by satellites to surface values (Duncan et al 2014), global estimates of long-term O3 exposure are generally derived from the output of state-of-the-art chemical transport models (CTMs). For example, the Global Burden of Disease (GBD) project estimated that … premature mortalities from chronic obstructive pulmonary disease (COPD) were attributable to long-term ambient O3 exposure in 2015 (Cohen et al 2017). Results from other impact studies can vary substantially due to the use of different CTMs to estimate exposure, updates to exposure-response curves, changing theoretical minimum risk exposure levels, varying baseline mortality rates, and different reference years, making inter-study comparisons of long-term O3 exposure health burdens challenging. In addition, there is evidence suggesting that long-term O3 exposure is associated not only with COPD, but also with a broader set of respiratory diseases (Jerrett et al 2009, US EPA 2013, Turner et al 2016). Some studies even report significant associations with increased premature cardiovascular mortality (Lipsett et al 2011, Jerrett et al 2013, Crouse et al 2015, Cakmak et al 2016, Turner et al 2016, Day et al 2017). When these epidemiological updates are incorporated, the estimated health burden attributable to long-term O3 exposure increases (Malley et al 2017, Shindell et al 2018), indicating that efforts to reduce long-term O3 exposure could be more effective in reducing total air pollution-attributable premature mortality than previously recognized (Schwartz 2016).
Many regions of the world, such as the United States, Europe, and China, now have dense ground-based monitoring networks to assess compliance with air quality standards. Using these networks, rather than CTMs, to estimate long-term O3 exposure for health impact assessments has a number of advantages. First, it provides a framework consistent with many of the underlying epidemiological studies, which often use these networks to estimate exposure of the study population (e.g. Jerrett et al 2009, Turner et al 2016). Second, it adds consistency between health burden quantification and the monitoring used for regulatory air quality standards. Third, while the CTMs used to model O3 are extensively evaluated and capable of reproducing significant features of atmospheric chemistry, many health-based O3 exposure metrics are biased high in model predictions (Schnell et al 2015, Seltzer et al 2017). Lastly, seasonal and spatial patterns of observationally derived exposure metrics can be used in model evaluations to help diagnose drivers of bias or provide a reference for bias correction.
In this study, we estimate long-term O3 exposure in the United States, Europe, and China for 2015 exclusively from ground-based observations. We then combine these results with exposure-response relationships to estimate premature mortalities attributable to long-term O3 exposure in each region. We compare health impact estimates across multiple exposure-response curves and averaging metrics, and against previously reported O3 health burdens; discuss the implications of different averaging metrics; and provide seasonal population-weighted exposure concentrations that can be used for model evaluations.

Methods
To estimate premature mortalities attributable to long-term O3 exposure, the exposure-response relationships and averaging metrics reported by Jerrett et al (2009) and Turner et al (2016) were used. Jerrett et al (2009) used data from the ACS CPS-II cohort and air pollution data to estimate changes in various cause-specific deaths attributable to incremental changes in the April-September average of the daily maximum 1 h O3 concentration (6mMDA1). Turner et al (2016) analyzed a larger, extended follow-up of the same cohort using the annual average of the daily maximum 8 h average O3 concentration (MDA8). … This may be due to differences in study design, such as exposure estimation methods, length of follow-up, and number of events (Jerrett et al 2013). Additional cohort studies are required to evaluate the validity of extrapolating these exposure-response relationships globally.
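The two daily statistics underlying these averaging metrics can be illustrated with a minimal sketch, assuming one day of complete hourly data at a single site. The function names are hypothetical, and the MDA8 calculation is simplified (an assumption): only the 17 running 8 h windows that start between 00:00 and 16:00 and end within the same calendar day are considered, and regulatory data-completeness rules are ignored.

```python
import numpy as np


def daily_max_1h(hourly_ppb: np.ndarray) -> float:
    """Daily maximum 1 h concentration (MDA1) from 24 hourly values (ppb)."""
    if hourly_ppb.shape != (24,):
        raise ValueError("expected 24 hourly values")
    return float(np.max(hourly_ppb))


def daily_max_8h(hourly_ppb: np.ndarray) -> float:
    """Daily maximum 8 h running-mean concentration (MDA8), simplified.

    Assumption: only the 17 windows lying entirely within the day are used.
    """
    if hourly_ppb.shape != (24,):
        raise ValueError("expected 24 hourly values")
    # 'valid' mode returns one mean per complete 8 h window (24 - 8 + 1 = 17)
    running_means = np.convolve(hourly_ppb, np.ones(8) / 8.0, mode="valid")
    return float(np.max(running_means))


# Hypothetical diurnal cycle: 30 ppb overnight, peaking at 60 ppb at noon
hours = np.arange(24)
ozone = 30.0 + 30.0 * np.clip(np.sin((hours - 6) * np.pi / 12), 0.0, None)
mda1 = daily_max_1h(ozone)
mda8 = daily_max_8h(ozone)
print(f"MDA1 = {mda1:.1f} ppb, MDA8 = {mda8:.1f} ppb")
```

Because the 8 h average smooths the afternoon peak, MDA8 is always at or below MDA1 on the same day, which is one reason the MDA8-based metric yields lower exposure concentrations than 6mMDA1.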
Ground-based measurements in the United States were retrieved from the Air Quality System (AQS) and the Clean Air Status and Trends Network (CASTNET), in Europe from the European Union air quality e-reporting data repository, and in China from the Beijing Municipal Environmental Monitoring Center and the China National Environmental Monitoring Center. This compilation overlaps substantially with the Tropospheric Ozone Assessment Report dataset (Schultz et al 2017) in the USA and Europe, but vastly expands the extent of observations in China. All results referring to Europe include the 28 European Union Member States, plus Norway and Switzerland. Gridded surface maps were generated using an objective-mapping algorithm that combines a modified form of inverse distance weighting with a declustering scheme and trapezoidal integration (Schnell et al 2014). This algorithm has previously been used to evaluate O3 predictions by a suite of CTMs over North America and Europe (Schnell et al 2015).
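The basic idea of estimating gridded concentrations from irregularly spaced monitors can be sketched with plain inverse distance weighting. This is a simplified stand-in, not the modified inverse-distance-weighting, declustering, and trapezoidal-integration algorithm of Schnell et al (2014); the weighting power, search radius, and function names below are hypothetical choices.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def idw_estimate(site_lat, site_lon, site_ppb, grid_lat, grid_lon,
                 power=2.0, max_dist_km=300.0):
    """Plain IDW estimate at each grid point; NaN where no site is in range."""
    # Pairwise distances, shape (n_grid, n_sites)
    d = haversine_km(grid_lat[:, None], grid_lon[:, None],
                     site_lat[None, :], site_lon[None, :])
    d = np.maximum(d, 1e-6)  # guard against a grid point sitting on a monitor
    w = np.where(d <= max_dist_km, d ** -power, 0.0)
    w_sum = w.sum(axis=1)
    est = np.full(grid_lat.shape, np.nan)
    has_data = w_sum > 0.0
    est[has_data] = (w[has_data] @ site_ppb) / w_sum[has_data]
    return est


# Hypothetical monitors and two grid-cell centres (the second far from any monitor)
site_lat = np.array([40.0, 40.5, 41.0])
site_lon = np.array([-75.0, -74.5, -75.5])
site_ppb = np.array([45.0, 50.0, 40.0])
grid_lat = np.array([40.5, 10.0])
grid_lon = np.array([-75.0, -75.0])
estimates = idw_estimate(site_lat, site_lon, site_ppb, grid_lat, grid_lon)
print(estimates)
```

Leaving cells with no nearby monitor as NaN mirrors how regions without monitoring data receive no exposure estimate; a declustering step, as in the algorithm actually used, would additionally down-weight tight clusters of monitors.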
Daily gridded maps of maximum 1 h and 8 h concentrations were generated and averaged appropriately to calculate each metric (e.g. the annual average for the Turner et al 2016 metric). Details regarding the calculation of the population-weighted exposure concentrations, as well as the implementation and evaluation of the exposure algorithm, can be found in the supporting information (available online at stacks.iop.org/ERL/13/104018/mmedia). Both long-term O3 exposure metrics were calculated at 0.25°×0.25°, 0.5°×0.5°, and 1°×1° grid resolutions. Since the mean bias and average root mean square error of the predicted site values were generally insensitive to grid resolution (see tables S1 and S2), all results presented here use 0.5°×0.5° resolution.
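The averaging step can be sketched as follows, assuming the daily gridded maps are stored as a NumPy array with NaN marking missing cell-days. The 75% completeness threshold and the function name are hypothetical; the internal quality assurance test applied to the actual maps is described in the supporting information and is not reproduced here.

```python
import numpy as np


def period_mean(daily_maps, dates, start, end, min_valid_frac=0.75):
    """Mean of daily gridded maps over [start, end], inclusive.

    daily_maps : array (n_days, ny, nx), NaN for missing cell-days
    dates      : datetime64[D] array of length n_days
    Cells with fewer than min_valid_frac valid days in the period are set
    to NaN (hypothetical completeness rule).
    """
    in_period = (dates >= np.datetime64(start)) & (dates <= np.datetime64(end))
    sub = daily_maps[in_period]
    n_valid = np.sum(~np.isnan(sub), axis=0)
    total = np.nansum(sub, axis=0)
    mean = total / np.maximum(n_valid, 1)  # avoid 0/0 for all-missing cells
    mean[n_valid < min_valid_frac * sub.shape[0]] = np.nan
    return mean


dates = np.arange("2015-01-01", "2016-01-01", dtype="datetime64[D]")
rng = np.random.default_rng(0)
mda1_maps = 50.0 + rng.normal(0.0, 5.0, size=(dates.size, 2, 2))
mda8_maps = mda1_maps - 8.0
mda8_maps[:120, 0, 0] = np.nan  # one cell's monitor offline January to April

six_month_mda1 = period_mean(mda1_maps, dates, "2015-04-01", "2015-09-30")
annual_mda8 = period_mean(mda8_maps, dates, "2015-01-01", "2015-12-31")
print(six_month_mda1)
print(annual_mda8)
```

Because the annual MDA8 metric needs year-round data, a winter outage can remove a cell from the MDA8 map while the same cell still contributes to the April-September 6mMDA1 map.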
Premature mortality attributable to long-term O3 exposure was calculated using previously established methods (Anenberg et al 2010, Silva et al 2016, Malley et al 2017), summarized below:

$$\Delta X = \max\left(0,\ X - \mathrm{TMREL}\right)$$

$$\beta = \frac{\ln(\mathrm{HR})}{\Delta Y}$$

$$\mathrm{AF} = 1 - e^{-\beta\,\Delta X}$$

$$\Delta\mathrm{Mort} = y_0 \times \mathrm{AF} \times \mathrm{Population}$$

where X is the long-term O3 exposure metric in a particular grid box, TMREL is the theoretical minimum risk exposure level (i.e. the 'counterfactual'), ΔX is the O3 exposure in that grid box above the TMREL, β is the exposure-response factor (i.e. the slope of the log-linear relationship between the change in exposure and mortality), HR is the hazard ratio reported in the epidemiological study, which links incremental changes in long-term O3 exposure, ΔY (10 ppb in both studies, albeit in a different long-term exposure metric), to changes in cause-specific mortality rates, AF is the fraction of the disease burden attributable to long-term O3 exposure, y0 is the cause-specific baseline mortality rate, Population is the population count in a particular grid box, and ΔMort is the estimated number of premature, cause-specific mortalities. Further details regarding the population and baseline mortality rates can be found in the supporting information.
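These relationships translate directly into array operations. The sketch below is a minimal illustration, assuming gridded NumPy arrays of exposure and population and a single baseline mortality rate; the function name and all input values are hypothetical and are not the inputs used in this study.

```python
import numpy as np


def premature_mortality(exposure_ppb, population, baseline_rate,
                        hazard_ratio, tmrel_ppb, delta_y_ppb=10.0):
    """Log-linear health impact function evaluated for each grid cell.

    exposure_ppb  : long-term O3 metric per grid cell (ppb), NaN if missing
    population    : persons per grid cell
    baseline_rate : cause-specific baseline mortality rate (deaths/person/yr)
    hazard_ratio  : HR per delta_y_ppb increment in the exposure metric
    tmrel_ppb     : theoretical minimum risk exposure level (ppb)
    """
    delta_x = np.maximum(exposure_ppb - tmrel_ppb, 0.0)  # exposure above TMREL
    beta = np.log(hazard_ratio) / delta_y_ppb             # log-linear slope
    af = 1.0 - np.exp(-beta * delta_x)                    # attributable fraction
    return baseline_rate * af * population                # deaths per grid cell


# Hypothetical three-cell example; the first cell lies below the TMREL
exposure = np.array([25.0, 50.0, np.nan])
population = np.array([5.0e5, 1.0e6, 2.0e5])
deaths = premature_mortality(exposure, population, baseline_rate=5e-4,
                             hazard_ratio=1.12, tmrel_ppb=30.0)
total_deaths = np.nansum(deaths)  # cells without exposure data are skipped
print(deaths, total_deaths)
```

Since exp(−βΔX) = HR^(−ΔX/ΔY), the attributable fraction can equivalently be written as 1 − HR^(−ΔX/ΔY); for the 20 ppb excess in the second cell with HR = 1.12 per 10 ppb, AF = 1 − 1.12^(−2) ≈ 0.203.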
Changes in cause-specific risk varied based on the underlying epidemiological study. For respiratory diseases, hazard ratios of 1.040 (95% CI: 1.013, 1.067) and 1.12 (95% CI: 1.08, 1.16) were used, corresponding to the Jerrett et al (2009) and Turner et al (2016) studies, respectively. … A sensitivity analysis was also carried out to estimate the mortality burdens without a threshold. This test assumes that the standard TMREL values are set by the lowest concentrations observed in the underlying studies rather than by true thresholds below which no impacts occur, and it is included for illustration to provide an upper bound on health impacts.
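One common way to obtain confidence intervals and the no-threshold sensitivity case is to re-run the same calculation with the lower and upper hazard ratio bounds and with the TMREL set to zero. The sketch below assumes that uncertainty comes from the hazard ratio alone (other sources of uncertainty are not represented), uses the respiratory hazard ratio of 1.12 (95% CI: 1.08, 1.16) quoted above, and otherwise relies on hypothetical grid cells, baseline rate, and TMREL.

```python
import numpy as np


def attributable_deaths(exposure_ppb, population, baseline_rate,
                        hazard_ratio, tmrel_ppb, delta_y_ppb=10.0):
    """Total deaths from the log-linear impact function, summed over cells."""
    beta = np.log(hazard_ratio) / delta_y_ppb
    delta_x = np.maximum(exposure_ppb - tmrel_ppb, 0.0)
    return float(np.nansum(baseline_rate * population * (1.0 - np.exp(-beta * delta_x))))


exposure = np.array([35.0, 45.0, 55.0])   # hypothetical grid-cell exposure (ppb)
population = np.array([2e6, 5e6, 1e6])    # hypothetical population
baseline_rate = 4e-4                      # hypothetical respiratory rate (per person/yr)
tmrel = 30.0                              # hypothetical TMREL (ppb)

# Respiratory HR per 10 ppb with its 95% CI bounds
hr = {"central": 1.12, "lower": 1.08, "upper": 1.16}
estimates = {k: attributable_deaths(exposure, population, baseline_rate, v, tmrel)
             for k, v in hr.items()}
# Sensitivity case: no threshold (TMREL = 0)
no_threshold = attributable_deaths(exposure, population, baseline_rate,
                                   hr["central"], 0.0)
print(estimates, no_threshold)
```

Setting the TMREL to zero counts the full exposure in every grid cell as harmful, which is why this case can only raise the estimate relative to the thresholded calculation.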

Results
Observationally derived estimates of the Jerrett et al (2009) averaging metric featured distinct patterns in each of the three regions considered here (figure 1). In the USA, there was a peak exceeding 60 ppb over inland southern California. Because some monitors operate only seasonally, parts of the upper Northwest did not pass the internal quality assurance test, and no results were generated there. Nonetheless, 99% of the population lived in grid boxes for which results were generated, with a population-weighted 6mMDA1 O3 concentration of 49.0 ppb (table 1). Europe featured a decreasing gradient in 6mMDA1 concentrations from south to north, with a peak of approximately 60 ppb in the Po valley region of Italy, consistent with previous analyses (EEA 2017). Ireland and Italy had the lowest and highest population-weighted 6mMDA1 O3 concentrations, respectively (21.3 ppb and 56.7 ppb; see table S3). Overall, the population-weighted 6mMDA1 O3 concentration in Europe was 46.7 ppb (table 1). Across China, 6mMDA1 concentrations increased from south to north, peaking near 90 ppb in the North China Plain. Large areas of western China had no monitoring data, and exposure estimates were not generated there. However, more than 99% of the population resides in grid cells for which results were generated, and the population-weighted 6mMDA1 O3 concentration was 67.9 ppb (table 1).
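The population-weighted concentrations and population coverage fractions quoted in this section can be computed as sketched below, assuming co-registered gridded arrays of concentration (NaN where no estimate was generated) and population. The function name and input values are hypothetical; the procedure actually used is documented in the supporting information.

```python
import numpy as np


def population_weighted_mean(conc_ppb, population):
    """Population-weighted concentration over cells with an estimate,
    and the fraction of the total population living in those cells."""
    valid = ~np.isnan(conc_ppb)
    pw_conc = np.sum(conc_ppb[valid] * population[valid]) / np.sum(population[valid])
    captured = np.sum(population[valid]) / np.sum(population)
    return float(pw_conc), float(captured)


# Hypothetical 2x2 grid; one sparsely populated cell has no estimate
conc = np.array([[60.0, 45.0], [np.nan, 40.0]])
population = np.array([[1.0e6, 3.0e6], [1.0e4, 6.0e6]])
pw, captured = population_weighted_mean(conc, population)
print(f"population-weighted = {pw:.1f} ppb, population captured = {captured:.1%}")
```

Cells without an estimate drop out of both the numerator and the denominator, which is why the share of the population captured is reported alongside each population-weighted value.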
Observationally derived estimates of the Turner et al (2016) averaging metric featured spatial patterns qualitatively similar to the 6mMDA1 concentrations (figure 2), but were smoother, with a narrower range of values. Over the USA, the difference between the 5th and 95th concentration percentiles was 16.2 ppb and 11.2 ppb for the 6mMDA1 and MDA8 concentrations, respectively. MDA8 concentrations could not be calculated for a larger number of grid cells because some monitors go offline during winter months. Nonetheless, with 96% of the USA population still captured by the reporting grid cells, the population-weighted MDA8 concentration was 38.1 ppb. Substantial seasonal variations occur throughout the year, influencing the spatial distribution of the annual MDA8 metric (see figures S1-S6). Population-weighted seasonal MDA8 concentrations peaked during the summer and were 14 ppb lower during the winter (table 1).
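The 5th-95th percentile spread used above to compare the smoothness of the two metrics can be computed as follows. This sketch assumes an unweighted distribution across grid cells with valid estimates (an assumption; an area- or population-weighted spread would differ), and the function name is hypothetical.

```python
import numpy as np


def percentile_spread(conc_ppb, lo=5.0, hi=95.0):
    """Difference between the hi-th and lo-th percentiles of valid grid values."""
    values = conc_ppb[~np.isnan(conc_ppb)]
    p_lo, p_hi = np.percentile(values, [lo, hi])
    return float(p_hi - p_lo)


# Hypothetical grid: values 0..100 ppb plus one cell without an estimate
grid = np.append(np.arange(101, dtype=float), np.nan)
print(percentile_spread(grid))
```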
In Europe, the difference between the 5th and 95th concentration percentiles was 21.9 ppb and 13.2 ppb for the 6mMDA1 and MDA8 concentrations, respectively. Seasonal MDA8 concentrations peaked over the Po valley during the summer (figure S4), but the same region featured low concentrations during the winter (figure S6). Ireland had the lowest population-weighted MDA8 concentration (19.3 ppb), an anomalously low value relative to the rest of the continent. While Italy still had some of the highest population-weighted concentrations (38.8 ppb), exposure was comparable in many other European nations (table S3).
In China, the differences between the 5th and 95th concentration percentiles were 43.7 ppb and 34.5 ppb for the 6mMDA1 and MDA8 concentrations, respectively. With 99% of the population captured in grid cells for which exposure estimates were generated, the population-weighted MDA8 concentration was 45.3 ppb. Large seasonal variations, driven mainly by low winter concentrations in the North China Plain, led to a 31.4 ppb difference in the population-weighted seasonal MDA8 concentrations between the summer and winter months.
The estimated average number of premature respiratory mortalities attributable to long-term O3 exposure for 2015 using the Turner et al (2016) exposure-response relationship was 34 000 (95% CI: 24, 44 thousand), 32 000 (95% CI: 22, 41 thousand), and 200 000 (95% CI: 140, 253 thousand) for the USA, Europe, and China, respectively. When using the Jerrett et al (2009) exposure-response relationship, the premature respiratory mortality impacts were lower: 17 000 (95% CI: 6, 27 thousand), 20 000 (95% CI: 7, 33 thousand), and 135 000 (95% CI: 46, 210 thousand) in the USA, Europe, and China, respectively (table 2; see table S4 for European country-level estimates). While population-weighted O3 concentrations of both averaging metrics are higher in the USA than in Europe, estimates of premature respiratory mortalities attributable to long-term O3 exposure are similar in the two regions. This is largely due to differences in population density and age structure, with some contribution from differences in baseline mortality rates (figure S7). In addition, while exposure concentrations are consistently higher for the 6mMDA1 metric than for the MDA8 metric, health impacts are consistently higher when using the Turner et al (2016) exposure-response relationship because of its larger hazard ratio and lower TMREL.
Normalized results, with impacts reported as premature mortalities attributable to long-term O3 exposure per 100 000 people, show higher health burdens in the USA than in Europe (table 3). This reflects the higher population-weighted O3 concentrations in the USA. Respiratory mortality rates attributable to long-term O3 exposure vary considerably among European countries (table S5), reflecting heterogeneity in population-weighted exposure concentrations (table S3), age demographics, and baseline mortality rates. For all countries considered in this analysis, baseline respiratory mortality rates are highest in the oldest age bin of the population (i.e. 80+). As a result, age demographics strongly influence the health impacts calculated here (table S6), with more than 75% of the respiratory premature mortalities attributable to long-term O3 exposure consistently occurring among the population aged 70 and above.
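Why age structure dominates can be seen with a minimal age-stratified calculation. The sketch assumes the same attributable fraction in every age bin (as when one hazard ratio is applied to all adults in a grid cell) and normalizes by the population of the age bins considered; the population counts, baseline rates, and attributable fraction are hypothetical and are not values from this study.

```python
import numpy as np

# Hypothetical inputs for one region, by age bin (illustration only)
age_bins = ["25-49", "50-69", "70-79", "80+"]
population = np.array([100e6, 80e6, 20e6, 12e6])
baseline_rate = np.array([1.0e-5, 1.5e-4, 1.0e-3, 5.0e-3])  # deaths/person/yr
attributable_fraction = 0.15  # assumed identical across age bins

deaths_by_age = attributable_fraction * baseline_rate * population
total_deaths = deaths_by_age.sum()
# Normalized burden: deaths per 100 000 people in the age bins considered
per_100k = 1e5 * total_deaths / population.sum()
share_70_plus = deaths_by_age[2:].sum() / total_deaths

for label, d in zip(age_bins, deaths_by_age):
    print(f"{label:>6}: {d:8.0f} deaths")
print(f"total = {total_deaths:.0f}, per 100 000 = {per_100k:.2f}, 70+ share = {share_70_plus:.0%}")
```

Even though the two oldest bins hold only a small share of the population in this example, their much higher baseline rates place most of the attributable deaths there.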
When a TMREL is not used, average estimates increase in all three regions (table 2). In addition, the estimated average number of premature cardiovascular mortalities attributable to long-term O3 exposure was 17 000 (95% CI: 9, 26 thousand), 24 000 (95% CI: 12, 36 thousand), and 129 000 (95% CI: 65, 190 thousand) for the USA, Europe, and China, respectively, in 2015. Although the hazard ratio for long-term O3 exposure is larger for respiratory disease than for cardiovascular disease (averages of 1.12 versus 1.03), the higher baseline mortality rate of cardiovascular disease drives these substantial estimated impacts.
To compare directly with the GBD project, COPD-related premature mortalities attributable to long-term O3 exposure were also estimated. Consistent with the Jerrett et al (2009) study, these calculations used the daily maximum 1 h O3 concentration averaged over June-August and a hazard ratio of 1.029 (95% CI: 1.010, 1.048). Health burdens for the USA, Europe, and China in 2015 were 7000 (95% CI: 3, 12 thousand), 11 000 (95% CI: 4, 17 thousand), and 88 000 (95% CI: 32, 139 thousand), respectively. In comparison, the GBD project estimated that there were 11 600, 13 330, and 71 850 premature COPD-related mortalities attributable to long-term O3 exposure in the three regions in 2015 (HEI: Health Effects Institute 2017). The GBD estimates are higher than ours in the USA and Europe and lower in China, suggesting that the GBD exposure estimates did not adequately capture the ∼40% higher population-weighted concentrations in China relative to the other two regions (table 1). As a result, GBD per capita impacts in China are ∼45% larger than those in the USA, whereas we find per capita impacts in China approximately three times greater. When using the Turner et al (2016) averaging metric and a hazard ratio of 1.14 (95% CI: 1.08, 1.21), COPD-related premature mortalities were 22 000 (95% CI: 14, 31 thousand), 21 000 (95% CI: 13, 30 thousand), and 188 000 (95% CI: 116, 259 thousand) for the USA, Europe, and China, respectively. On a region-by-region basis, the respiratory mortality estimates presented here are lower (∼20%–60%) than analogous prior estimates based solely on modeled O3 (table 4). However, as previously noted, each study may use different TMRELs, baseline mortality rates, and reference years. Only one study, Shindell et al (2018), which applied a bias adjustment, generated results comparable to those reported here. An additional reason for the differences between the results presented here and those of prior studies relates to biased exposure estimates and the interaction between these exposure estimates and nonlinear exposure-response curves.
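The per capita comparison can be checked with simple arithmetic on the COPD estimates above. The 2015 population totals below (about 321 million for the USA and 1.371 billion for China) are approximate values assumed for illustration, not the gridded population data used in this study.

```python
# Approximate 2015 populations (assumption, not the study's population data)
POP_USA = 321e6
POP_CHINA = 1371e6

# COPD-related premature mortalities attributable to long-term O3 in 2015
copd_deaths = {
    "GBD": {"USA": 11_600, "China": 71_850},
    "this study": {"USA": 7_000, "China": 88_000},
}


def per_capita_ratio(usa_deaths, china_deaths):
    """Ratio of China's per capita burden to the USA's per capita burden."""
    return (china_deaths / POP_CHINA) / (usa_deaths / POP_USA)


ratios = {source: per_capita_ratio(d["USA"], d["China"])
          for source, d in copd_deaths.items()}
for source, r in ratios.items():
    print(f"{source}: China/USA per capita ratio = {r:.2f}")
```

The assumed population totals cancel when the two ratios are compared with each other, so the roughly twofold gap between the GBD and observation-based ratios does not depend on them.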
For this study, a log-linear exposure-response function (figure S8) was selected since it is most commonly applied in health impact assessments (e.g. Anenberg et al 2010, Silva et al 2013, Silva et al 2016, Malley et al 2017, Shindell et al 2018). However, other forms of exposure-response functions can be used. For example, Di et al (2017) reported a linear relationship between long-term O3 exposure and mortality, and the World Health Organization suggests linear exposure-response relationships for short-term O3 exposure studies (REVIHAAP 2013). The shape of exposure-response curves has previously been discussed in health impact studies focused on exposure to ambient fine particulate matter (Pope et al 2009, Smith and Peel 2010, Apte et al 2015). While prior studies have noted that O3 predictions are consistently biased high in the models typically used to estimate long-term O3 exposure (e.g. Schnell et al 2015, Yan et al 2016, Travis et al 2016, Seltzer et al 2017), how this bias propagates into health impact estimates has yet to be quantified.
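The practical difference between a log-linear and a linear functional form can be sketched by computing the attributable fraction both ways from the same per-ppb slope. The linear form here (relative risk 1 + βΔX, with AF = (RR − 1)/RR) is an assumed illustration and is not necessarily the specification used by Di et al (2017); the exposure increments are hypothetical.

```python
import numpy as np

beta = np.log(1.12) / 10.0                     # per-ppb slope from HR = 1.12 per 10 ppb
delta_x = np.array([5.0, 15.0, 30.0, 60.0])    # hypothetical exposure above TMREL (ppb)

# Log-linear: RR = exp(beta * dX), so AF = (RR - 1) / RR = 1 - exp(-beta * dX)
af_loglinear = 1.0 - np.exp(-beta * delta_x)

# Linear (assumed form): RR = 1 + beta * dX, so AF = beta * dX / (1 + beta * dX)
rr_linear = 1.0 + beta * delta_x
af_linear = (rr_linear - 1.0) / rr_linear

for dx, a, b in zip(delta_x, af_loglinear, af_linear):
    print(f"dX = {dx:4.0f} ppb: AF log-linear = {a:.3f}, AF linear = {b:.3f}")
```

The two forms agree closely for small excess exposures and diverge as ΔX grows, because exp(βΔX) exceeds 1 + βΔX for any ΔX > 0.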

Discussion
To test this interaction, the observationally derived exposure metrics were artificially scaled and the resulting health impacts were recalculated. The new health impact estimates were then compared with the reference impact estimates (figure 3). Since the impact estimates are normalized to a reference case, variations are exclusively due to changes in exposure (i.e. differences in population demographics do not influence these normalized results). When using the Jerrett et al (2009) averaging metric and exposure-response relationship, a 10% increase in exposure (i.e. a 10% high bias in the population-weighted exposure concentration) yields a 29%, 35%, and 18% increase in the estimated health impacts in the USA, Europe, and China, respectively. When using the Turner et al (2016) methodology, a 10% increase in exposure yields a 29%, 44%, and 21% increase in the estimated health impacts in the USA, Europe, and China, respectively.
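The scaling experiment can be reproduced in miniature with synthetic data. The sketch below assumes hypothetical gridded exposures, populations, baseline rate, and TMREL, and simply multiplies every grid cell's exposure by a constant factor before recomputing the total burden; it illustrates the mechanism and does not reproduce the values in figure 3.

```python
import numpy as np


def total_deaths(exposure_ppb, population, baseline_rate, hazard_ratio,
                 tmrel_ppb, delta_y_ppb=10.0):
    """Total attributable deaths from the log-linear impact function."""
    beta = np.log(hazard_ratio) / delta_y_ppb
    delta_x = np.maximum(exposure_ppb - tmrel_ppb, 0.0)
    return float(np.sum(baseline_rate * population * (1.0 - np.exp(-beta * delta_x))))


rng = np.random.default_rng(42)
exposure = rng.uniform(30.0, 50.0, size=1000)   # hypothetical exposure (ppb)
population = rng.uniform(1e4, 1e6, size=1000)   # hypothetical population
baseline_rate, hazard_ratio, tmrel = 5e-4, 1.12, 26.0  # hypothetical values

reference = total_deaths(exposure, population, baseline_rate, hazard_ratio, tmrel)
changes = {}
for scale in (0.9, 1.1, 1.2):
    scaled = total_deaths(scale * exposure, population, baseline_rate,
                          hazard_ratio, tmrel)
    changes[scale] = scaled / reference - 1.0  # normalized change in impacts
    print(f"exposure x{scale}: impacts change by {100 * changes[scale]:+.0f}%")
```

Because the TMREL is subtracted before exposure enters the impact function, a given fractional change in exposure produces a larger fractional change in ΔX, and hence in impacts.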
In this test, normalized impacts for Europe were consistently the most sensitive to changes in the exposure metrics, followed by the USA and then China. Population-weighted concentrations of each metric increase in the same order (table 1). This relationship illustrates how a larger normalized change in health impacts occurs at the lower exposure end of each curve. For example, when using the Turner et al (2016) averaging metric, the USA and China feature average exposures of 38.1 ppb and 45.3 ppb, respectively (table 1). The exposure-response curve using the Turner et al (2016) hazard ratio (figure S8) is steeper at 38.1 ppb than at 45.3 ppb, which leads to the stronger marginal response in impacts (figure 3). This relationship is important from a health impacts perspective and should be considered when assessing how bias in exposure estimates influences health calculations in different regions.
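A first-order expansion of the impact function makes this behaviour explicit. Consider a single grid cell with exposure X above the TMREL (written T), the log-linear attributable fraction AF = 1 − exp[−β(X − T)], and a fractional exposure bias ε. Treating one grid cell in isolation and assuming β[(1 + ε)X − T] ≪ 1 are simplifications made for this derivation.

```latex
\frac{\Delta\mathrm{Mort}\big((1+\epsilon)X\big)}{\Delta\mathrm{Mort}(X)} - 1
  = \frac{1 - e^{-\beta\,[(1+\epsilon)X - T]}}{1 - e^{-\beta\,(X - T)}} - 1
  \approx \frac{(1+\epsilon)X - T}{X - T} - 1
  = \epsilon\,\frac{X}{X - T}
```

The amplification factor X/(X − T) exceeds 1 for any positive TMREL and grows as X approaches T. With a hypothetical T = 25 ppb, for example, it is about 2.9 at 38.1 ppb and about 2.2 at 45.3 ppb.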
The results presented here are subject to several uncertainties. First, the gridding method has a small bias (see figures S9 and S10), although the mean bias of estimated concentrations at each monitor, computed over the complete population of observations, is nearly zero in all three regions. Second, the mapping algorithm inherently assumes that concentrations at non-observed locations can be estimated from nearby observations. The final gridded surface maps (figures 1 and 2) show coherent spatial gradients, which supports this assumption. Third, it is assumed that the gridded surface maps generated here are of sufficient resolution to capture exposure. While all results presented here are at 0.5°×0.5° resolution, additional gridded surface maps of both metrics were calculated at horizontal resolutions of 0.25°×0.25° and 1.0°×1.0°. Health impact estimates at each of these resolutions show little difference (tables S1 and S2). Fourth, the Jerrett et al (2009) averaging metric used here was calculated as the April-September average of the daily maximum 1 h O3 concentration everywhere, rather than with a grid-cell-specific 6 month window that accounts for regional differences in the timing of the O3 season. This was done to provide consistent population-weighted exposure concentrations that can subsequently be used for model and exposure evaluations. To test the influence of this assumption, population-weighted concentrations were calculated for all possible 6 month averaging periods in 2015. The April-September average yielded the highest exposure estimates for the USA, China, and a majority of the European countries.
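The 6 month window test can be sketched as follows. This sketch approximates each 6 month mean by the unweighted mean of six monthly means (months are not weighted by their number of days), considers only the seven windows that fall within calendar year 2015, excludes a cell from a window if any of its months is missing, and uses hypothetical data and function names.

```python
import numpy as np

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def six_month_window_means(monthly_mda1, population):
    """Population-weighted mean MDA1 for each 6 month window within one year.

    monthly_mda1 : array (12, n_cells) of monthly mean MDA1 (ppb), NaN if missing
    population   : array (n_cells,)
    """
    results = {}
    for start in range(12 - 6 + 1):  # Jan-Jun through Jul-Dec: 7 windows
        # A cell missing any month in the window yields NaN and is excluded
        window_mean = monthly_mda1[start:start + 6].mean(axis=0)
        valid = ~np.isnan(window_mean)
        label = f"{MONTHS[start]}-{MONTHS[start + 5]}"
        results[label] = float(np.sum(window_mean[valid] * population[valid])
                               / np.sum(population[valid]))
    return results


# Hypothetical seasonal cycle peaking between June and July, for three cells
month_index = np.arange(12)
seasonal = 40.0 + 15.0 * np.cos((month_index - 5.5) * np.pi / 6.0)
monthly_mda1 = seasonal[:, None] + np.array([0.0, 5.0, -3.0])[None, :]
population = np.array([1e6, 2e6, 5e5])

means = six_month_window_means(monthly_mda1, population)
best = max(means, key=means.get)
print(best, round(means[best], 1))
```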

Conclusions
Gridded surface maps of long-term O3 exposure for 2015 in the USA, Europe, and China were estimated exclusively from ground-based monitoring networks and an objective-mapping algorithm (Schnell et al 2014). This approach differs from the widely used method of chemical transport modeling, which can introduce model biases. Seasonal population-weighted concentrations of two exposure metrics were presented; these can be used by the modeling community for model evaluation, to elucidate drivers of model bias, and possibly as correction factors to reduce persistent bias. Using the Jerrett et al (2009) averaging metric and exposure-response function, 17 000 (95% CI: 6, 27 thousand), 20 000 (95% CI: 7, 33 thousand), and 135 000 (95% CI: 46, 210 thousand) premature respiratory mortalities attributable to long-term O3 exposure in 2015 were estimated for the USA, Europe, and China, respectively. When using the Turner et al (2016) methodology, based on a larger, extended cohort analysis, the estimated health burdens increase to 34 000 (95% CI: 24, 44 thousand), 32 000 (95% CI: 22, 41 thousand), and 200 000 (95% CI: 140, 253 thousand) for the USA, Europe, and China, respectively. After accounting for differences in exposure-response functions, these estimated impacts are lower (∼20%–60%) than previously reported. This difference arises in large part from small biases in modeled exposure being amplified by nonlinear exposure-response curves, highlighting the importance of accurately estimating long-term O3 exposure in health impact assessments. Overall, the results presented here provide an observational constraint on the health burdens from long-term O3 exposure for three major regions of the world.