Tropospheric Ozone Assessment Report: Present-day ozone distribution and trends relevant to human health

This study quantifies the present-day global and regional distributions (2010–2014) and trends (2000–2014) for five ozone metrics relevant for short-term and long-term human exposure. These metrics, calculated by the Tropospheric Ozone Assessment Report, are: 4th highest daily maximum 8-hour ozone (4MDA8); number of days with MDA8 > 70 ppb (NDGT70); SOMO35 (annual Sum of Ozone Means Over 35 ppb); and two seasonally averaged metrics (3MMDA1; AVGMDA8). These metrics were explored at ozone monitoring sites worldwide, which were classified as urban or non-urban based on population and nighttime lights data. values, similar North American and some European and Japanese sites, and positive trends across much of East Asia. Globally, metrics at many sites exhibit non-significant trends. At 59% of all sites there is a common direction and significance in the trend across all five metrics, whilst 4MDA8 and NDGT70 have a common trend at ~80% of all sites. Sensitivity analysis shows AVGMDA8 trends differ with averaging period (warm season or annual). Trends are unchanged at many sites when a 1995–2014 period is used, although fewer sites exhibit non-significant trends. Over the longer period 1970–2014, most Japanese sites exhibit positive 4MDA8/SOMO35 trends. Insufficient data exist to characterize ozone trends for the rest of Asia and other world regions.


Introduction to the Tropospheric Ozone Assessment Report (TOAR) and human health metrics
Tropospheric ozone is a secondary air pollutant that is detrimental to human health (LRTAP Convention, 2015; WHO, 2013a; US EPA, 2013), and to crop and ecosystem productivity (Ainsworth et al., 2012; Mills et al., 2017: TOAR-Vegetation). It is also an important greenhouse gas (Myhre et al., 2013). Since the 1990s the major source regions of anthropogenic emissions that react in the atmosphere to produce ozone have shifted from North America and Europe to Asia (Granier et al., 2011; Cooper et al., 2014; Zhang et al., 2016). This shift, coupled with limited ozone monitoring in most developing nations, has left a number of fundamental outstanding questions: Which regions of the world have the greatest human and plant exposure to ozone pollution? To what extent is ozone changing in the developing world? How can the atmospheric sciences community facilitate access to ozone metrics necessary for quantifying the impact of tropospheric ozone on human health, crop and ecosystem productivity and climate?
To answer these questions the International Global Atmospheric Chemistry Project (IGAC) has developed the Tropospheric Ozone Assessment Report (TOAR): Global metrics for climate change, human health and crop/ecosystem research (www.igacproject.org/activities/TOAR). Initiated in 2014, TOAR's mission is to provide the research community with an up-to-date scientific assessment of the global distribution and trends in ozone from the surface to the tropopause. TOAR's primary goals are to: 1) produce the first tropospheric ozone assessment report using all available surface ozone observations, the peer-reviewed literature and new analyses, and 2) generate easily accessible, documented data on ozone exposure metrics at thousands of measurement sites around the world (urban and non-urban). Through the TOAR-Surface Ozone database (https://join.fz-juelich.de/), these ozone metrics are freely accessible for research on the global and regional-scale impact of ozone on human health, crop and ecosystem productivity and climate (Schultz et al., 2017; hereinafter referred to as TOAR-Surface Ozone Database). The assessment report is organized as a series of peer-reviewed publications in Elementa: Science of the Anthropocene (this Special Feature), with this paper (hereinafter referred to as TOAR-Health) focusing on the global distribution and trends of ozone metrics relevant for human health.
Ozone affects human health through its natural presence in the stratosphere, where it absorbs harmful UV radiation that could otherwise reach the Earth's surface. However, at the Earth's surface ozone is an air pollutant, and inhalation of this powerful oxidant can impair the functioning of the human respiratory and cardiovascular systems through its reaction with the lining of the lung and other surfaces in the respiratory tract (WHO, 2005; US EPA, 2013). The goal of TOAR-Health is to present, for the first time, the global distribution and trends of ozone using all available surface ozone observations. The analysis relies on a variety of ozone metrics that are either used by air quality managers to inform and evaluate strategies to protect human health from the adverse effects of ozone, or are useful for epidemiologists who use the daily maximum 8-hour running mean ozone metric or the daily 1-hour maximum ozone metric to quantify the impact of ozone on human health (section 2). The selection of five health-relevant ozone metrics is discussed in section 3. The TOAR measurement stations and their data availability used to calculate the ozone health metrics, their classification as urban and non-urban stations, their regional aggregation and associated population demographics are outlined in section 4. Present-day spatial distributions of ozone, presented for the five ozone metrics and by population weighting, are analysed in section 5, while decadal changes and long-term trends are evaluated in section 6. Conclusions and uncertainties are presented in section 7.

Surface ozone and human health effects
A summary of the different types of studies used to determine ozone-related health effects, as well as recent risk estimates, is provided in this section. These include observational-based toxicological and clinical or controlled human exposure studies and statistically-based epidemiological studies. Further discussion on exposure and dose definitions and health effect studies pertinent for a wide range of exposure metrics is provided in Lefohn et al. (2017a; hereinafter referred to as TOAR-Metrics).
To facilitate comparison of published studies with the new analyses in TOAR-Health, based on the ozone metrics in the TOAR-Surface Ozone Database, we briefly describe the choice of ozone units reported in this paper. When referencing an observation in ambient air, TOAR follows World Meteorological Organization guidelines (Galbally et al., 2013) and uses the mole fraction of ozone in air, expressed in SI units of nmol mol⁻¹. Under tropospheric conditions the nmol mol⁻¹ is indistinguishable from the volumetric mixing ratio, expressed in units of parts per billion (ppb). To maintain consistency with the ozone human health research community, TOAR-Health uses ppb in reference to a mole fraction or mixing ratio and µg m⁻³ in reference to a concentration. To compare observations or metrics reported in units of ppb or µg m⁻³, TOAR-Health uses a conversion factor of 1 ppb = 2 µg m⁻³ at a reference temperature of 20°C and a standard pressure of 1013.25 hPa.
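The 1 ppb = 2 µg m⁻³ factor can be checked with the ideal gas law. The following is a minimal sketch, not part of the TOAR processing: the function names are hypothetical, and the molar mass of ozone (47.998 g mol⁻¹) and the gas constant are standard values assumed here rather than taken from the paper.

```python
# Sketch: convert an ozone mole fraction (ppb) to a mass concentration (ug/m3)
# using the ideal gas law. Function names are illustrative only.

GAS_CONSTANT = 8.314462618   # J mol-1 K-1 (assumed standard value)
MOLAR_MASS_O3 = 47.998       # g mol-1 (assumed standard value)


def ppb_to_ugm3(ppb, temp_c=20.0, pressure_hpa=1013.25):
    """Mass concentration in ug/m3 for a given mole fraction in ppb."""
    # Volume occupied by one mole of air (m3 mol-1) at the given conditions.
    molar_volume = GAS_CONSTANT * (temp_c + 273.15) / (pressure_hpa * 100.0)
    # ppb * 1e-9 mol/mol * g/mol * 1e6 ug/g / (m3/mol) = ug/m3
    return ppb * 1e-9 * MOLAR_MASS_O3 * 1e6 / molar_volume


def ugm3_to_ppb(ugm3, temp_c=20.0, pressure_hpa=1013.25):
    """Inverse conversion, from ug/m3 back to ppb."""
    return ugm3 / ppb_to_ugm3(1.0, temp_c, pressure_hpa)
```

At the reference conditions the factor comes out slightly below 2, which is why the rounded 1:2 ratio is a convenient approximation; at higher temperatures the factor decreases.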

Ozone-related health effects from different health study types and uncertainties
Human clinical studies, conducted in the range of ambient ozone concentrations, and animal toxicological studies conducted over a wider range of concentrations (e.g. US EPA, 2013) link acute (short-term) and chronic (long-term) exposure to ozone to a range of pulmonary and cardiovascular health-relevant outcomes, such as reduced lung function (WHO, 2005, 2013a). In particular, in a human clinical laboratory study, Schelegle et al. (2009) found statistically significant decrements in lung function in combination with a significant increase in respiratory symptoms following the controlled exposure of thirty-one healthy adults to ozone averaging ~70 ppb (~140 µg m⁻³), ranging from 50 to 90 ppb over ~6 hour exposures. Statistically significant effects on lung function and/or respiratory inflammation, but not respiratory symptoms, have also been reported at 60 ppb (120 µg m⁻³) in other clinical studies (Kim et al., 2011). The US Environmental Protection Agency (EPA) Integrated Science Assessment (ISA) for Ozone (US EPA, 2013) provides a recent review of clinical and animal toxicological effects of ozone on various endpoints including changes in lung function, inflammation and respiratory symptoms, describing decrements in lung function at ≥60 ppb and other adverse effects at 70 ppb and higher.
There is a vast body of literature providing evidence from epidemiological studies, which are based on ambient ozone concentrations in many areas of the world, including the US, Europe, Asia and Latin America, that further demonstrates that short-term or acute exposure to ozone is associated with respiratory and cardiovascular morbidity effects including inhibited lung development, new onset asthma, hospital admissions and premature mortality (e.g. Wong et al., 2008; Romieu et al., 2012; Yan et al., 2013; US EPA, 2013; Bell et al., 2014). In addition, there are many comprehensive reviews, including the World Health Organization's (WHO) Review of Evidence on the Health Aspects of Air Pollution (REVIHAAP) (WHO, 2013a), the Health Risks of Air Pollution in Europe (HRAPIE) project (WHO, 2013b), and other extensive reviews (US EPA, 2013; UK Committee on the Medical Effects of Air Pollution (COMEAP), 2015). Many epidemiological studies suggest adverse health effects occur at lower concentrations than in clinical studies. For epidemiological studies, key uncertainty issues are (i) whether or not the concentration-response function is linear throughout the range of ambient ozone concentrations and (ii) whether there is a threshold or cutoff below which no adverse effects occur (Atkinson et al., 2012; US EPA, 2013; COMEAP, 2015; TOAR-Metrics). Overall, for the quantification of ozone relevant for health impacts from short-term exposure, the REVIHAAP (WHO, 2013a) recommends the use of (i) an all-year metric based on the daily maximum 8-hour running mean (MDA8; see section 2.2), (ii) a linear concentration-response risk function and (iii) cutoffs specifically at 35 ppb and 10 ppb, since the evidence for linearity does not extend to zero. The HRAPIE project (WHO, 2013b) recommends that a 35 ppb threshold be used to quantify mortality attributable to short-term ozone exposure, "to reflect greater confidence in the significant relationship above 35 ppb". HRAPIE also states that additional effort to estimate the impacts of ozone on health when observed ozone is greater than 10 ppb would also be justified, "owing to uncertainty regarding the presence of a threshold for ozone effects".
Another potential uncertainty for epidemiological studies that quantify short-term health effects due to ozone exposure is confounding by temperature and other pollutants (COMEAP, 2015). These confounding influences are usually accounted for in the statistical models used to calculate health effects (discussed below), with some, but not all, epidemiological studies of ozone health effects using two-pollutant models to account for confounding, mainly by particulate matter (PM10 and PM2.5; particle aerodynamic diameter <10 µm and <2.5 µm, respectively). Several studies have suggested that high temperature acts as a modifier of the health effects of ozone exposure, with risk increasing at higher temperatures (e.g. Pattenden et al., 2010; Wilson et al., 2014). There is also epidemiological evidence suggesting risk estimates are higher for older populations and are sensitive to occupational status (Bell et al., 2014).
Epidemiological cohort studies in North America have provided evidence for emerging long-term or chronic effects of exposure to ozone (Jerrett et al., 2009; Smith et al., 2009; Turner et al., 2016; Crouse et al., 2015; Di et al., 2017). In contrast, cohort studies in other regions did not find significant adverse health effects from long-term ozone exposure (Bentayeb et al., 2015; Carey et al., 2013), although there are methodological differences among studies. For example, Bentayeb et al. (2015) used modelled rather than measurement data, and Carey et al. (2013) considered an annual average ozone concentration, whereas Jerrett et al. (2009) used the average over the April to September period or warm months only (the "ozone season"; section 3). The size and spatial extent of the cohort populations also vary between studies, as do the number of deaths for which the relationship with ozone exposure is assessed. As many North American studies are restricted to using ozone data from the warm season (many ozone monitors only operate during the warm season), it is difficult to identify whether a threshold exists for the effects of long-term exposure to ozone (WHO, 2013a). However, Jerrett et al. (2009) found some limited evidence of improved model fit for respiratory mortality using warm-season average 1-hour daily maximum ozone and a threshold of 56 ppb. Turner et al. (2016) also found a statistically significant relationship between annual-average MDA8 ozone (section 2.2) and respiratory and cardiovascular mortality, and a model with a threshold set at 35 ppb improved the association between annual-average MDA8 ozone and respiratory mortality. Similar results were found using warm season ozone metrics for both cardiovascular and respiratory mortality (Turner et al., 2016).

Short and long-term ozone exposure mortality risk estimates
Based on the different types of epidemiological studies described in section 2.1, mortality risk estimates for short- and long-term exposure to ozone have been reported in the literature. Short-term exposure to ozone is often determined using a daily metric, e.g., daily mean, daily maximum 1-hour mean or daily maximum 8-hour running mean (MDA8). The MDA8 metric is one of the most common daily metrics used in many world regions, especially the United States and Europe, and is also one of the most common daily metrics used for regulatory purposes (US EPA Federal Register Notice, 2015; COMEAP, 2015; see section 3). To quantify the number of premature deaths associated with short-term ozone exposure, exposure-response coefficients for a given increment of an ozone metric, e.g., per 10 µg m⁻³ (5 ppb) or per 10 ppb increase in MDA8 ozone, are calculated from single studies or through meta-analysis of multiple epidemiological studies. The WHO HRAPIE project recommends the use of a risk coefficient for premature all-cause mortality of 0.29% (95% CI = 0.14%, 0.43%) per 10 µg m⁻³ exposure to MDA8 ozone concentrations (WHO, 2013b), derived from analysis across 32 European cities, with adjustment for PM10 concentrations. The UK COMEAP (2015) report suggests a similar but slightly higher value of 0.34% per 10 µg m⁻³ increase in MDA8. This value was derived through meta-analysis of epidemiological studies from a wider number of regions (Europe, Asia, North America, Latin America and Australasia), but included effect estimates that were not adjusted for other pollutants. Changes in mortality in relation to changes in ozone concentration, typically over a given time period, can be calculated using these risk estimates along with baseline mortality rates and population estimates (e.g. Fann et al., 2012; Riojas-Rodríguez, 2014; EEA, 2016; Xia et al., 2016).
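To illustrate how a risk coefficient, a baseline mortality rate and a population estimate combine, the sketch below applies the 0.29% per 10 µg m⁻³ coefficient quoted above. It assumes a log-linear concentration-response function and the attributable-fraction form AF = 1 − 1/RR, which is a common choice in health impact assessment but is an assumption here, not a method stated in this paper. The function names and the example inputs in the usage note are hypothetical.

```python
import math

# Risk coefficient quoted in the text (WHO, 2013b): 0.29% increase in
# all-cause mortality per 10 ug/m3 increase in MDA8 ozone.
RISK_PER_INCREMENT = 0.0029
INCREMENT_UGM3 = 10.0


def relative_risk(delta_ugm3, risk=RISK_PER_INCREMENT, increment=INCREMENT_UGM3):
    """Relative risk for an ozone change, assuming a log-linear response."""
    beta = math.log(1.0 + risk) / increment   # slope per ug/m3
    return math.exp(beta * delta_ugm3)


def attributable_deaths(delta_ugm3, baseline_rate, population):
    """Deaths attributable to an ozone increment of delta_ugm3.

    baseline_rate is deaths per person per year; the attributable
    fraction is taken as 1 - 1/RR (an assumed, commonly used form).
    """
    rr = relative_risk(delta_ugm3)
    attributable_fraction = 1.0 - 1.0 / rr
    return attributable_fraction * baseline_rate * population
```

For example, `attributable_deaths(10.0, 0.01, 1_000_000)` evaluates the burden for a hypothetical population of one million with a baseline rate of 1% per year; the inputs are illustrative, not values from any of the cited studies.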
For long-term exposure, based on the results from the Jerrett et al. (2009) cohort study, the HRAPIE project (WHO, 2013b) recommends the use of a risk coefficient for respiratory mortality of 1.4% per 10 µg m⁻³ (5 ppb) increase in average MDA8 ozone for warm season months, but only for MDA8 ozone levels >35 ppb. This effect estimate derives from a single-pollutant model, without adjustment for PM2.5 concentration, and has also been applied in the Global Burden of Disease (GBD) project (Forouzanfar et al., 2016), albeit using the annual maximum of the three-month running mean of the daily maximum 1-hour ozone concentration, rather than the corresponding six-month metric, to account for global variation in the timing of the peak ozone season (Brauer et al., 2016; Cohen et al., 2017). A large number of studies have used exposure-response coefficients derived by Jerrett et al. (2009) for estimating global and regional respiratory-related mortality associated with long-term exposure to ozone (Anenberg et al., 2010, 2012; Lim et al., 2012; Fang et al., 2013; Silva et al., 2013, 2016; Forouzanfar et al., 2015, 2016; Shindell et al., 2012, 2016). For example, the most recent GBD Study (Forouzanfar et al., 2016) estimated that 254,000 chronic obstructive pulmonary disease (COPD)-related deaths globally were attributable to ambient ozone exposure in 2015, an increase of 19% since 2005. Turner et al. (2016) derived updated exposure-response coefficients for the same cohort analysed by Jerrett et al. (2009), which have been used to update global and regional ozone-attributable respiratory mortality estimates (Chossière et al., 2017).
The findings from the studies described above in relation to the pertinent ozone concentrations associated with health effects, thresholds for health effects and suitable data averaging periods have been used to construct a wide range of ozone-related health metrics of which a sub-set of five metrics is discussed in detail in section 3.

Health-related ozone exposure metrics
As discussed above, regulatory agencies worldwide employ air quality standards in the form of guidelines and limit values to safeguard human health from acute or short-term exposure to surface ozone, where ambient ozone concentrations are used as surrogates for human exposure. To date there are no standards that relate specifically to chronic or long-term ozone exposure. Figure 1 shows that many countries use levels of MDA8 (in ppb) as the basic metric, as noted in section 2.2, for creating limits/guidelines, often combined with a number of exceedances that are allowed before violation of ozone standards occurs. For example, the European Commission (under Directive 2008/50/EC) has a target value for MDA8 ozone concentrations of 120 µg m⁻³ (60 ppb) not to be exceeded on more than 25 days per calendar year averaged over 3 years (Figure 1). Table S1 in the Supplemental Materials shows international, regional and national ozone limits, their averaging periods and references for these values. In the US the limit value is 70 ppb and this is associated with the annual 4th-highest MDA8 ozone value, averaged over 3 years (Figure 1, Supplemental Materials: Table S1). Some countries have alternate or additional limit values based on daily 1-hour mean or maximum ozone.
Whereas most ozone limit values are based on the atmospheric concentration in µg m⁻³, ozone monitors report the atmospheric volumetric mixing ratio, typically in ppb. Accurate conversion from mixing ratio (or mole fraction) (ppb) to concentration (µg m⁻³) depends on the atmospheric temperature and pressure, which requires simultaneous monitoring of meteorology. For simplicity the conversion between these units is often based on a fixed temperature and pressure (see Table S1 for this conversion). Above and in section 2, when ozone concentrations are quoted in units of µg m⁻³, the corresponding mixing ratio is also quoted in units of ppb using a 1:2 ratio, on the basis that 1 ppb = 2 µg m⁻³ at a reference temperature of 20°C and standard pressure of 1013.25 hPa.
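The two standards described above share a common structure: an annual statistic of MDA8 is averaged over 3 years and compared with a limit. The sketch below illustrates that structure only. The function names are hypothetical, and regulatory details such as rounding or truncation rules and data completeness requirements are not given in the text and are not reproduced here.

```python
# Sketch of the 3-year averaging logic described in the text.
# Rounding/truncation and data-completeness rules used by regulators
# are deliberately omitted (not specified in the source).

def three_year_mean(values):
    """Mean of exactly three annual values."""
    if len(values) != 3:
        raise ValueError("expected three annual values")
    return sum(values) / 3.0


def meets_us_limit(fourth_highest_mda8_by_year, limit_ppb=70.0):
    """True if the 3-year mean of annual 4th-highest MDA8 is within the limit."""
    return three_year_mean(fourth_highest_mda8_by_year) <= limit_ppb


def meets_eu_target(exceedance_days_by_year, max_days=25):
    """True if days with MDA8 above 120 ug/m3, averaged over 3 years, are <= 25."""
    return three_year_mean(exceedance_days_by_year) <= max_days
```

For instance, annual 4th-highest MDA8 values of 72, 69 and 68 ppb average below 70 ppb, so this simplified check passes even though one individual year exceeds the limit.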
The air quality metrics in the TOAR-Surface Ozone Database in relation to human health can be categorized as a) short-term exposure metrics based on high values of daily concentrations, e.g. the 4th highest MDA8 ozone value in a year (4MDA8); b) short-term exposure metrics of the number of days in a year with MDA8 ozone greater than 70 ppb or another value, e.g. 60 ppb (see examples in Figure 1); and c) short- or long-term exposure metrics with a seasonal or annual averaging or summation period. All reported metric values meet a data capture criterion of >75% (TOAR-Surface Ozone Database). Often the warm season (April-September in the Northern Hemisphere and October-March in the Southern Hemisphere) is used, mainly because some individual states within the US only report data during their "ozone season", and hence site coverage is lower outside of this period. The ozone season is selected because it is the part of the year with highest temperatures and strongest solar radiation and thus the time when photochemical reactions of ozone precursor gases are most likely to produce high ozone levels (Rice, 2014). The full set of health metrics has been detailed in TOAR-Metrics, and is organised according to the range of the ozone distribution to which the metrics correspond, specifically: high ozone concentrations, high and mid-level ozone concentrations, and ozone concentrations from across the distribution. Since these various health metrics are determined from different parts of the distribution of ozone concentrations, their spatial variation may be substantial. Similarly, conclusions about the extent to which various health-relevant ozone metrics have increased, decreased or not changed over time will also depend on the changes in the relative frequency of concentrations in different parts of the ozone concentration distribution that have occurred over the time period of interest (TOAR-Metrics). The features associated with different metrics are also evaluated in Lefohn et al. (2017b).
To reflect the breadth of different health-related indicators used globally, five metrics (four of which use MDA8 for representing daily ozone levels) have been selected from the TOAR database and are shown in Table 1 and outlined below:
1. 4MDA8: The 4th highest MDA8 ozone value represents peak short-term exposure and is used in the US for determining compliance with the National Ambient Air Quality Standards for Ozone. The annual 4th highest value falls in the range of the 98th to 99th percentile of the 365 values of the MDA8 per year. This metric is applied to data from the 6-month warm season only, to augment the number of sites in the US for which this metric can be constructed. This is a reasonable approach since, in most cases, the 4MDA8 ozone value occurs within the warm season (TOAR-Metrics and TOAR-Surface Ozone Database). A unique exception is the occurrence of high wintertime ozone in rural snow-covered regions of the western US, associated with emissions from oil and natural gas extraction (Oltmans et al., 2014).
2. NDGT70: The number of days with MDA8 ozone greater than 70 ppb also represents peak short-term exposure. The benchmark level of MDA8 ozone used in the US is 70 ppb, and 75 ppb in China (see TOAR-Metrics). Standards in Europe for short-term exposure to ozone are based on limit values of 60 ppb (Figure 1). The sensitivity of this metric to a lower benchmark level of 60 ppb (i.e. NDGT60) is discussed in section 5.1, and this metric also forms the basis of section 5.2, which estimates the population exposed to NDGT60 > 25 days. This metric is calculated for all days of the year to enable consistent analyses across the globe. Clinical evidence for impaired lung function due to short-term exposure to ozone at levels of 70 and 60 ppb is discussed in section 2.1.
3. SOMO35: The annual Sum of Ozone Means Over 35 ppb (based on MDA8 ozone), with units of ppb days. This metric is the sum of positive differences between daily MDA8 ozone values and 35 ppb, and is accumulated over the whole year. SOMO35 characterizes the quantity of ozone relevant for the health impacts from short-term exposure and is in line with WHO recommendations for threshold limits as outlined in section 2.1. This metric is used by the European Environment Agency.
4. 3MMDA1: The annual maximum of the 3-month running mean of the daily maximum 1-hour ozone value. This metric has been used by the GBD project to quantify mortality attributable to long-term ozone exposure (see section 2.1). The month during which this metric peaks is assigned based on the midpoint date in the 3-month averaging period, and the spatial variability of the peak 3MMDA1 month is discussed in section 5.1.
5. AVGMDA8: The mean of MDA8 ozone over the 6-month warm season, often termed the "ozone season" (April to September in the Northern Hemisphere and October to March in the Southern Hemisphere). It is one of the metrics used to characterise long-term ozone exposure as discussed in section 2.1. The sensitivity of this metric to the averaging period (annual vs. warm season) is discussed in section 5.2.
Figure 1 (caption, partial): "No standard" indicates that information was available to indicate that no standard was in use or defined. "NA" indicates that no information on standards was found, or that standards may exist but are not 8-hour standards and therefore are not included. DOI: https://doi.org/10.1525/elementa.273.f1
While the first two metrics, 4MDA8 and NDGT70, reflect peak ozone levels, SOMO35 represents mid-high ozone levels summed annually, and 3MMDA1 and AVGMDA8 represent high ozone levels over a 3-6 month season. As noted above, the first three metrics are associated with regulatory standards in different world regions for the protection of human health from acute or short-term exposure to ozone. These five ozone metrics are calculated for all urban and non-urban ozone monitoring stations (section 4.2) available in the TOAR database, as present-day averages for 2010-2014 (section 5), as well as trends between 2000-2014 (section 6).
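The definitions of the five metrics can be sketched in a few lines of code. This is an illustrative sketch only and not the TOAR-Surface Ozone Database implementation: all function names are hypothetical, daily MDA8 and MDA1 values are assumed to be already computed, the caller is assumed to pass only warm-season days where the metric requires it, the >75% data capture criterion is not checked, and the 3-month windows are assumed to be calendar months within a single year.

```python
from datetime import date, timedelta


def fourth_highest(mda8_values):
    """4MDA8: 4th highest daily MDA8 (pass warm-season values only)."""
    return sorted(mda8_values, reverse=True)[3]


def ndgt(mda8_values, threshold_ppb=70.0):
    """NDGT70 (or NDGT60): number of days with MDA8 above the threshold."""
    return sum(1 for value in mda8_values if value > threshold_ppb)


def somo35(mda8_values, cutoff_ppb=35.0):
    """SOMO35: sum of positive differences between MDA8 and 35 ppb (ppb days)."""
    return sum(max(value - cutoff_ppb, 0.0) for value in mda8_values)


def avgmda8(mda8_values):
    """AVGMDA8: mean of MDA8 (pass warm-season values only)."""
    return sum(mda8_values) / len(mda8_values)


def three_month_max_mda1(daily_mda1):
    """3MMDA1: maximum 3-month mean of MDA1 and the middle month of that window.

    daily_mda1 maps datetime.date -> MDA1 (ppb) for one calendar year.
    Windows are Jan-Mar ... Oct-Dec (assumption: no wrap across years).
    """
    by_month = {month: [] for month in range(1, 13)}
    for day, value in daily_mda1.items():
        by_month[day.month].append(value)
    best = None
    for start in range(1, 11):
        window = by_month[start] + by_month[start + 1] + by_month[start + 2]
        if not window:
            continue
        mean = sum(window) / len(window)
        if best is None or mean > best[0]:
            best = (mean, start + 1)   # value, mid-point month
    return best


# Synthetic example year (not real data): 80 ppb in June-August, 40 ppb otherwise.
demo_year = {}
for offset in range(365):
    day = date(2014, 1, 1) + timedelta(days=offset)
    demo_year[day] = 80.0 if day.month in (6, 7, 8) else 40.0
```

With the synthetic `demo_year`, `three_month_max_mda1(demo_year)` selects the June to August window and reports July as the peak month, mirroring the mid-point assignment described for 3MMDA1.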

TOAR stations: Classifications and population characteristics
Methods used to classify sites as urban or non-urban and the length of ozone data records for monitoring sites are described in this section, along with regional aggregation and population characteristics.

Stations and time periods
The TOAR database contains the world's largest collection of ozone metrics, calculated consistently from hourly ozone observations at all available surface monitoring sites around the globe. The data were contributed by national and regional ozone monitoring networks as well as by independent research programs. The data contributors and the methods for calculating the ozone metrics are described in TOAR-Surface Ozone Database. All data have undergone quality control and validation by the air quality agencies that collected the data (TOAR-Surface Ozone Database). For this analysis we utilize ozone metrics derived from over 4,800 monitoring sites worldwide (1,470 from North America, 1,935 from Europe, 1,239 from South, Southeast and East Asia, and 176 from other regions of the world). This study marks the first time that a range of ozone health metrics has been assessed worldwide across all available ozone monitoring sites. The number of sites and the length of the period for which measurement data are available vary greatly by region. For example, in Europe, stations with data for 20 years or longer are typically located in the Nordic countries, the UK, Germany, Austria and Switzerland, with only a few sites in southern and eastern Europe. Many of the stations in the US and a few WMO Global Atmosphere Watch (GAW) stations also have two to three decades of data. However, available station data, particularly from networks in developing countries, may only span a few years to a decade. There is a considerable dearth of measurements across Africa, the Middle East, South and Southeast Asia, and South America, as shown in Figure 2.
Table 1: The five health-related ozone metrics and a description of their calculation (see also Lefohn et al., 2017a)
a) 4MDA8. Description: 4th highest MDA8 ozone value, calculated from data for the 6-month warm season. Averaging period: warm season. Units: ppb.
b) NDGT70. Description: number of days with MDA8 ozone greater than 70 ppb, calculated for all days in a year. Averaging period: annual. Units: days.
c) SOMO35. Description: the sum of the positive differences between the daily maximum 8-h ozone mixing ratio and the cut-off value set at 35 ppb (70 µg m⁻³), calculated for all days in a year. Averaging period: annual summation. Units: ppb × days.
d) 3MMDA1. Description: annual maximum of the three-month average of the daily 1-hour maximum ozone value. Three-month running mean values were assigned to the mid-point of the 3-month period. Averaging period: annual. Units: ppb.
e) AVGMDA8. Description: 6-month warm season mean of MDA8. Averaging period: warm season. Units: ppb.
For the TOAR analysis, present-day distributions (section 5.1) cover the 5-year period of 2010-2014, with each station required to have hourly data from at least 3 years within this period. For the trend analysis in section 6, data from 2000 to 2014 were required, with no more than 2 years missing from either end of this period; longer trend periods were also considered. These constraints limit the number of stations available for trend analysis, but provide the necessary data for robust trend assessment. See the Supplemental Materials to TOAR-Surface Ozone Database for a full description of stations and the details of the data requirements.
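The station selection rules above can be expressed as simple checks on the set of years for which a station has data. The sketch below is one possible reading of those rules, with hypothetical function names: "no more than 2 years missing from either end" is interpreted as the first available year being no later than 2002 and the last no earlier than 2012. Gaps in the middle of the record and the hourly data capture criteria are not handled here.

```python
# Sketch of the station eligibility rules, under the stated interpretation.

def eligible_present_day(years_with_data, start=2010, end=2014, min_years=3):
    """True if at least min_years distinct years of data fall in start..end."""
    in_period = {year for year in years_with_data if start <= year <= end}
    return len(in_period) >= min_years


def eligible_trend(years_with_data, start=2000, end=2014, max_missing_at_end=2):
    """True if at most max_missing_at_end years are missing at each end."""
    in_period = {year for year in years_with_data if start <= year <= end}
    if not in_period:
        return False
    missing_at_start = min(in_period) - start
    missing_at_end = end - max(in_period)
    return (missing_at_start <= max_missing_at_end
            and missing_at_end <= max_missing_at_end)
```

A station with data for 2002 to 2012 would pass the trend check under this reading, whereas one starting in 2003 would not.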

TOAR station classification using global gridded metadata
Historically, station locations have been classified by type, such as: urban, suburban, rural, remote, background, or baseline depending on the network. However, there are limitations associated with using these classifications on the global-scale, since different interpretations of these classifications are likely used by different agencies around the world. This will introduce inconsistencies in site type classifications across continental regions. Therefore, some harmonization is required to link health-related ozone metrics to a more consistent site type classification that can be applied to all stations in the global TOAR database.
One key consideration for health-related ozone metrics is characterizing population exposure in urban or non-urban environments. This distinction between urban and non-urban sites is important for estimating population-weighted exposures and obtaining insights into health-related ozone trends for resident populations. For example, very low ozone concentrations result from titration of ozone by nitric oxide (NO) in areas with high nitrogen oxides (NOx) emissions, typically found in urban centres (Monks et al., 2015). However, deposition to the surface can also lead to similarly low values in areas characterized by strong static stability at night (Garland and Derwent, 1979; Fowler et al., 2009). Also, there are large populations living in non-urban areas (e.g. 60 million people live in rural areas in the US, 30 million in Brazil, and 660 million in China (UN, 2016)); the health exposure of non-urban populations to ozone is significant and will differ from that of urban dwellers. For the purposes of the overall TOAR assessment, all the TOAR stations were categorized as urban, rural or unclassified. This classification is based on the combined use of several high-resolution global gridded data sets to provide objective criteria for determining whether a station is considered urban (see TOAR-Surface Ozone Database) and is based on the year 2010. For the purposes of this study, sites are classified as either urban or non-urban. Therefore, non-urban stations include all stations except those classified as urban, i.e. all rural and unclassified stations. The high-resolution datasets used for the urban and non-urban classifications are:

1. Human population (Socioeconomic Data and Applications Center; SEDAC/CIESIN, 2015): Gridded Population of the World (GPW), v3 (hereafter GPWv3). This is a gridded world population dataset at ~5 km resolution.
2. NOAA night-time lights of the world at 0.925 km resolution (Elvidge et al., 2014).
The urban site classification was set with thresholds so that sites included would be robustly "urban" across the globe. For example, sites in North America, where the population density of urban areas is much lower than in Asia, needed to be included in the urban classification. Following a number of iterations, a global "urban" classification was achieved by means of the following criteria: a) population density >15,000 people/km², and b) nighttime lights (at 1 km resolution) ≥60 (dimensionless light intensity).
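The two threshold criteria can be written as a single test per station. The following is a minimal sketch with hypothetical function and parameter names; it covers only criteria a) and b) as stated above, and it collapses the TOAR rural and unclassified categories into "non-urban" as done in this study. Any additional screening of the wider surroundings of a site is not included because no threshold for it is given here.

```python
# Sketch of the urban / non-urban split using criteria a) and b) only.

POPULATION_DENSITY_MIN = 15000   # people per km2, strict ">" per the text
NIGHTLIGHT_1KM_MIN = 60          # dimensionless intensity, ">=" per the text


def classify_site(population_density, nightlight_1km):
    """Return 'urban' if both criteria are met, otherwise 'non-urban'."""
    is_urban = (population_density > POPULATION_DENSITY_MIN
                and nightlight_1km >= NIGHTLIGHT_1KM_MIN)
    return "urban" if is_urban else "non-urban"
```

Because both conditions must hold, a brightly lit but sparsely populated site, or a dense but dimly lit one, is assigned to the non-urban group.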
Nighttime lights within a 25 km radius of the monitoring site were also examined to rule out spurious assignments in rural areas (note that the nighttime light data become saturated at 63 on the same scale as above; see TOAR-Surface Ozone Database for details of the classification procedure). The use of two types of metadata ensures consistency in site classification globally. The subset of urban sites included 1,453 of 4,801 stations based on the period 2010-2014, and is depicted in Figure 2. These urban stations are representative of relatively dense urban environments, and this classification excludes some sites that may be considered as urban by local or regional air quality managers. For example, Perth, Australia is a city with 2 million inhabitants, but its large area results in population densities near the city's five ozone monitors that do not meet the threshold criteria for urban classification. For more details see TOAR-Surface Ozone Database. The number and percentage of urban stations aggregated for different continental regions is described in section 4.3. Further independent classification of the sites based on these proxy data, in addition to tropospheric NO2 column data at 0.1° resolution from the OMI satellite instrument (Krotkov et al., 2016), was carried out using Ward's hierarchical cluster analysis (Ward, 1963; Kaufman and Rousseeuw, 1990). This method produces six clusters that resemble the TOAR classifications (Supplemental Materials: Figures S2 and S3). This independent site classification indicates that the cut-offs for each proxy variable used to demarcate urban and non-urban (i.e. rural and unclassified) sites provide a relatively consistent classification. The six clusters into which sites were grouped in this separate cluster analysis distinguished elevated stations, rural and urban sites, with three intermediate categories.
The majority of sites grouped in the rural and urban TOAR classifications were similarly grouped in the rural and urban clusters, respectively, while the majority of unclassified sites were grouped in intermediate clusters, indicating that they had mixed characteristics.

Regional aggregation of stations and station population characteristics
The global distribution of stations assigned to the continental region divisions used in the TOAR assessment is derived from the Task Force on Hemispheric Transport of Air Pollution (TF-HTAP) phase II experiment regions (www.htap.org) and is depicted in Figure 3. These HTAP II regions were used for regional aggregations in sections 5 and 6.
Gridded population data at ~5 km resolution were also assigned to each ozone station in the TOAR database and used to calculate an average regional human population density for the 15 TOAR regions shown in Figure 3 (Table 2). The population density was also calculated for both urban and non-urban stations and averaged for each region. In addition, the percentages of urban stations in each region are provided. Outside of North America, Europe and East Asia, each with over 1000 ozone monitoring sites, the number of sites in other world regions is small (14-57; Table 2), as also evident in Figure 3. The number of monitors per 100 million people further shows the greater relative coverage in North America and Europe compared to Asia and the other regions, except Oceania which has a relatively small population. The number of monitors per 100 million people is lowest in Sub-Saharan Africa. However, we note the challenge of attempting to provide a single metric of monitor coverage across all of South and East Asia, an extremely large region with 3.5 billion inhabitants. National, provincial, or city-level air quality monitoring programs have been established only recently and to varying degrees. Except for a few individual sites, validated data over longer periods are only available for Hong Kong, Japan, and South Korea (combined population of 190 million). Monitoring sites in Asia also differ from Europe and North America in terms of their representation, having a higher percentage of urban sites and greater population densities.

Present-day ozone metrics
In this section, global and regional present-day distributions of the health-related ozone metrics for urban and non-urban sites are presented (section 5.1); in addition the fraction of the population exposed to NDGT60 > 25 days is estimated (section 5.2).

Distribution of present-day ozone metrics
Present-day (2010-2014) average distributions of the five health-related ozone metrics a) 4MDA8, b) NDGT70, c) SOMO35, d) 3MMDA1 and e) AVGMDA8 at urban and non-urban stations around the world are shown in Figure 4. In general, the patterns shown for 4MDA8 (Figure 4a) and NDGT70 (Figure 4b) are quite similar. As discussed in section 3, these two metrics focus on the highest values of the ozone distribution, thus their magnitude is determined to a large extent by episodes of high photochemical ozone production. Colette et al. (2016) also find that over Europe the NDGT60 metric is closely related to 4MDA8. The spatial patterns for SOMO35 which covers a wider range of hourly ozone values and is accumulated annually, as well as 3MMDA1 and AVGMDA8, which are averaged seasonally (Figure 4c-e), also show similarities to each other. However, differences between these two groups of metrics are also apparent. High values for 4MDA8 and NDGT70 extend across the United States, Europe and East/South Asia at both urban and non-urban sites (Figure 4a, b). Many sites in the western US (especially southern California), southern Europe (notably Northern Italy and Greece), Japan, and South Korea and northern India are characterized by 4MDA8 values at or above 85 ppb (Figure 4a) and/or NDGT70 > 25 days (Figure 4b).
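To make the distinction between the peak-focused and the cumulative metrics concrete, the sketch below computes 4MDA8, NDGT70 and SOMO35 from a series of daily MDA8 values. It is a simplified illustration: the function names and sample values are hypothetical, SOMO35 is taken as the sum of daily MDA8 excesses above 35 ppb, and the data-capture rules applied in the TOAR database are not reproduced.

```python
def fourth_highest_mda8(mda8):
    """4MDA8: the 4th highest daily maximum 8-hour ozone value (ppb)."""
    ranked = sorted(mda8, reverse=True)
    if len(ranked) < 4:
        raise ValueError("need at least four daily values")
    return ranked[3]


def ndgt(mda8, threshold=70.0):
    """NDGT: number of days with MDA8 strictly above the threshold (ppb)."""
    return sum(1 for value in mda8 if value > threshold)


def somo35(mda8, cutoff=35.0):
    """SOMO35: sum of daily MDA8 excesses over 35 ppb (ppb day)."""
    return sum(max(value - cutoff, 0.0) for value in mda8)


# Hypothetical week of MDA8 values in ppb (not TOAR data)
sample = [30, 40, 72, 85, 90, 71, 60]
print(fourth_highest_mda8(sample), ndgt(sample), ndgt(sample, 60.0), somo35(sample))
```

Only days above 70 ppb contribute to NDGT70 and only the top few days set 4MDA8, whereas every day above 35 ppb adds to SOMO35, which is why the two groups of metrics respond differently to episodic and to persistent ozone.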
Globally, the number of sites with 4MDA8 > 85 ppb is similar for both site types but there are slightly more non-urban (201 out of 2943) compared to urban stations (157 out of 1396) with NDGT70 > 25 days. However, in relative terms, high values of these two metrics are more frequent at urban sites than at non-urban sites. Lower values for 4MDA8 and fewer exceedance days for NDGT70 occur in higher mid-latitude regions, notably Canada, Scandinavia and the UK. When the threshold for NDGT is lowered from 70 to 60 ppb (Supplemental Materials: Figure S1) the spatial distributions of NDGT60 are still fairly similar to those of NDGT70 across both site types. Naturally, there are more stations with NDGT60 exceeding 25 days compared to NDGT70, with occurrences of exceedances of this threshold in both eastern as well as western North America, in Central as well as Southern Europe and widespread occurrences across East Asia. There are generally insufficient data for characterizing these distributions for other parts of Asia, Africa and South America. However, the available sites in the Southern Hemisphere tend to have lower 4MDA8 values and fewer NDGT70 exceedance days than those in the Northern Hemisphere at similar latitudes. The other three metrics (SOMO35, 3MMDA1, and AVGMDA8) also show large values in the western US, southern Europe and Asia for both urban and non-urban sites (Figure 4c-e). Higher ozone levels in the western US have been attributed to a number of factors, including intercontinental transport of Asian pollution, stratospheric intrusions and wildfires, as well as the combination of high elevations and an exceptionally deep convective boundary layer which allows high altitude ozone plumes to reach the surface (Lin et al., 2017; Langford et al., 2017).
For SOMO35, more non-urban sites (45) than urban sites (7) across the globe have high values (SOMO35 > 7000 ppb day), and the percentage of sites with SOMO35 > 7000 ppb day is also higher for non-urban than for urban sites (1.53% versus 0.50%). For the other two metrics (3MMDA1 and AVGMDA8) the difference between urban and non-urban sites is less clear (Figure 4d, e). In Europe, there is a prominent north to south gradient in these three metrics (more so than for 4MDA8 and NDGT70), with higher values in southern Europe (Figure 4c-e). A similar pattern of higher SOMO35 values in southern France compared to northern France, with a more distinct north-south gradient in SOMO35 compared to NDGT60 (termed EU60) for the longer time period 1999-2012, is reported by Sicard et al. (2016). Sites in Asia show high levels but no clear spatial patterns except in Japan, where values for the three metrics are higher in the southern than in the northern half of the country. Although stations are sparse in the Southern Hemisphere, those available have lower values for all three metrics relative to those in the Northern Hemisphere for both site types.
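The site counts quoted in this section can be turned into relative frequencies with a few lines of arithmetic. The sketch below assumes that the SOMO35 counts refer to the same site totals quoted for NDGT70 (2943 non-urban and 1396 urban sites), an assumption that reproduces the stated 1.53% and 0.50%; the helper name is illustrative.

```python
def percent(count, total):
    """Share of sites, in percent."""
    return 100.0 * count / total


# Counts quoted in the text; reusing the NDGT70 totals for SOMO35 is an assumption.
NON_URBAN_SITES, URBAN_SITES = 2943, 1396

ndgt70_non_urban = percent(201, NON_URBAN_SITES)  # sites with NDGT70 > 25 days
ndgt70_urban = percent(157, URBAN_SITES)
somo35_non_urban = percent(45, NON_URBAN_SITES)   # sites with SOMO35 > 7000 ppb day
somo35_urban = percent(7, URBAN_SITES)

print(f"NDGT70 > 25 days: {ndgt70_non_urban:.1f}% non-urban, {ndgt70_urban:.1f}% urban")
print(f"SOMO35 > 7000:    {somo35_non_urban:.2f}% non-urban, {somo35_urban:.2f}% urban")
```

On these figures the peak metric NDGT70 is exceeded at a larger share of urban than non-urban sites, while the opposite holds for SOMO35.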
The month during which the peak of 3MMDA1 occurs at each station is shown in Figure 5. High latitude stations in the northern hemisphere tend to have maxima in the spring (April-May), due to a variety of factors including peak occurrences of stratospheric intrusions, photochemistry involving precursors built up during the winter-time and, in some regions, biomass burning either as forest fires or for land clearance (Monks et al., 2000, 2015). Further south in North America, Europe and East Asia, peak 3MMDA1 is shifted later to summer months (June, July, August, with a few sites showing maxima in March and September). This change in peak timing is largely the result of increased photochemical production from anthropogenic and biogenic precursors (Monks, 2000; Parrish et al., 2012). Overall, most northern hemisphere mid-latitude sites exhibit peak ozone levels of 3MMDA1 in boreal spring or summer. The East Asian monsoon exerts a controlling influence on ozone in Southern China. In summer, southerly transport associated with clean maritime air masses and cloudy weather leads to relatively low surface ozone levels, often resulting in the annual minimum (Wang et al., 2009; Lam et al., 2001). In late autumn and early winter months, ozone  period when peak ozone concentrations (daily maximum 1h and MDA8) occur, albeit with a few exceptions.
The five ozone metrics were also characterised at the continental, or regional, level (as defined in section 4.3) and are shown in box and whisker plots in Figure 6, which allows differences in the distribution of values for each ozone metric between site type and region to be explored in more detail. There were not enough stations in many of the TOAR regions to produce adequate regional representation, so only Europe, East Asia, and North America were included. All other regions had a maximum of 57 stations (Table 2), with most of the 15 regions having fewer than 20 stations. In contrast, the 3 regions in Figure 6 each had more than 1000 stations (Table 2).
The median and interquartile range values for the five metrics for both non-urban and urban sites are generally higher in East Asia than in Europe and North America, especially for 4MDA8, NDGT70, and 3MMDA1 (Figure 6a, b, d). For SOMO35 and AVGMDA8 (Figure 6c, e) the interquartile ranges are wider for North America and Europe than for East Asia, while the median values for North America and East Asia are similar. For most metrics median values and values for the interquartile range are lowest in Europe, which can be attributed to lower values in northern Europe (Figure 4). In general, the interquartile ranges of most metrics at non-urban sites in North America and Europe are either similar to or slightly greater than at urban sites, while the interquartile ranges for non-urban sites in East Asia are similar to or slightly less than for the urban sites (Figure 6). Maximum (whisker) values for the two peak metrics 4MDA8 and NDGT70 are also higher for East Asia but are more similar across the regions for the other three metrics across both site types. Maximum outlier values are generally higher at non-urban sites than at urban sites across the three regions, but are approximately equal for 4MDA8, 3MMDA1 and AVGMDA8 in North America (Figure 6). These results qualitatively agree with the findings described above for Figure 4, and may reflect the higher ratio of non-urban to urban sites for North America and Europe. The highest category shown in Figure 4 frequently lies somewhere in the range of maxima outliers in Figure 6, which in turn represent values above the 95th percentile. The most notable exception occurs for NDGT70 in East Asia for both site types where it corresponds roughly to the 75th percentile value.

Proportion of monitored population exposed to high ozone levels and changes between 2000 and 2014
Present-day ozone levels (section 5.1) and trends (section 6) in different regions may be associated with very different population densities. As noted earlier, there are much higher urban population densities in Asia compared to North America and Europe. In this section, we consider the population within a 5 km radius around a TOAR ozone monitoring station, hereafter referred to as the "monitored population", and estimate their exposure in terms of exceedances of one metric: NDGT60. Present-day distributions showing the percentage of the monitored population exposed to NDGT60 for more than 25 days per year were produced for countries within Europe and by state for the US (Figure 7). This assumes that ambient ozone concentrations measured at the monitor location are representative of population exposure (see e.g. Meng et al. 2012; US EPA 2013 for a discussion of the validity of this assumption). The number of stations in each country or state used in the analysis, as well as the percentage of the country/state population within 5 km of an urban or non-urban TOAR station (usually under 5% but up to 8% for urban stations), is shown in the Supplemental Materials (Figures S4 and S5). However, because the monitored population of a state or country is estimated over a small geographical area, these results may not be representative of the total population of that European country or US state. Present-day (2010-2014 average) and 2000-2014 trend data were used to estimate ozone levels in 2000.
The percentage of the non-urban monitored population exposed to ozone levels >60 ppb for 25 or more days is generally either similar to, or greater than, the corresponding percentage of the urban monitored population in both 2000 and 2010-2014. In Italy and Greece, the exposed share of the monitored population is greater than or equal to 40% for these two time periods. There is a decrease in the percentage of the population exposed to NDGT60 > 25 days per year in Europe at both urban and non-urban stations between the year 2000 and the period 2010-2014 (Figure 7a), which is up to 30 or 40% in several countries. In many southern states in the US more than 50% of the monitored population is exposed to NDGT60 for >25 days per year in both urban and non-urban areas, in 2000 and in 2010-2014, with several northern states experiencing such exposures only in 2000. Similarly, there is a decrease in the percentage of the population exposed to NDGT60 > 25 days per year between 2000 and 2010-2014 across states in the US at both urban (typically ~20%) and non-urban sites (up to 40%) (bar two US states for non-urban locations; Figure 7b).
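The exposure estimate described in this section can be sketched as a population-weighted share, together with a linear back-extrapolation to the year 2000. Both functions are illustrative assumptions rather than the procedure actually used: the names are hypothetical, overlapping 5 km circles are not de-duplicated, and anchoring the 2010-2014 average at the mid-year 2012 is a choice made here for the example.

```python
def percent_exposed(stations, threshold_days=25):
    """Share (%) of the monitored population at stations with NDGT60 > threshold.

    stations: iterable of (population_within_5_km, ndgt60_days) pairs.
    """
    stations = list(stations)
    total = sum(pop for pop, _ in stations)
    if total == 0:
        raise ValueError("monitored population is zero")
    exposed = sum(pop for pop, days in stations if days > threshold_days)
    return 100.0 * exposed / total


def estimate_start_value(present_value, slope_per_year, present_year=2012, start_year=2000):
    """Back-extrapolate a metric linearly from the present-day average to start_year."""
    return present_value - slope_per_year * (present_year - start_year)


# Hypothetical stations in one country or state (not TOAR data)
stations = [(1000, 30), (3000, 10), (1000, 26)]
print(percent_exposed(stations))         # share exposed in the present-day period
print(estimate_start_value(20.0, -1.0))  # NDGT60 in 2000 for a -1 day/year trend
```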

Long-term trends of ozone metrics relevant to human health
The global distributions of 15-year trends for 2000-2014 for the five ozone metrics are presented in this section. The commonality and differences in trends amongst the five metrics are then outlined. The sensitivity of the AVGMDA8 metric to averaging period and of the five metrics to the trend period and its length are discussed. These results are compared with emissions trends and other trend studies in the literature for similar periods.

Ozone metric trends for 2000-2014
Trend analysis was carried out for four time periods: 10 years, 15 years (referred to as the main trend period), 20 years and >25 years. The main 15-year trend period covering 2000-2014 was selected so that a greater number of sites (mainly those in East Asia, where data were unavailable for the two longer trend periods) could be included. The 20-year time period covered 1995 to 2014 with the criteria that at least 16 years of data are present with no more than 2 missing years either at the beginning or end of the period. In addition, for stations with substantially longer time-series, trends for 1970-2014 were calculated. In this 45-year time period most sites have less than 35 years of data and very few have data prior to 1975. Thus trends were calculated with the criterion that a site must have at least 25 years of data. To be able to include more sites with shorter data sets, a decadal change from 2005 to 2014 (inclusive, and with at least 7 years of data) was also calculated. The terminology "change" is used to reflect the difficulty of annual trend detection with less than a decade of data (i.e. 7-10 data points; e.g. Fischer et al., 2011). In this section, only trends for the period 2000 to 2014 are shown. However, a full set of figures including the longer and shorter periods is included in the Supplemental Materials (Figures S7-S9). Trend analysis is based on the non-parametric methods described in detail by TOAR-Metrics, in which a Mann-Kendall test is used to determine the statistical significance (p-values) associated with each trend calculation. The Theil-Sen estimator is applied to calculate a quantitative trend estimate for the five ozone metrics for all sites. These statistical methods were applied uniformly across all ozone time series in the TOAR-Surface Ozone Database, as described in TOAR-Surface Ozone Database.
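The data-completeness criteria for the different trend periods lend themselves to a short check. The sketch below is one possible reading, with hypothetical function names: "no more than 2 missing years either at the beginning or end" is interpreted as the first and last years with data each lying within 2 years of the period boundaries.

```python
def eligible_20_year(years_with_data, start=1995, end=2014, min_years=16, max_edge_gap=2):
    """1995-2014 criterion: enough years, and no long gap at either end of the period."""
    years = {year for year in years_with_data if start <= year <= end}
    if len(years) < min_years:
        return False
    return min(years) - start <= max_edge_gap and end - max(years) <= max_edge_gap


def eligible_min_years(years_with_data, start, end, min_years):
    """Simple count criterion, e.g. >= 25 years in 1970-2014 or >= 7 years in 2005-2014."""
    return sum(1 for year in set(years_with_data) if start <= year <= end) >= min_years


print(eligible_20_year(range(1997, 2015)))                  # 18 years, 2 missing at start
print(eligible_20_year(range(1998, 2015)))                  # 3 missing at start
print(eligible_min_years(range(2008, 2015), 2005, 2014, 7)) # 7 years of data
```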
For the TOAR assessment the following terminology is used when describing trend results: a trend associated with a p-value ≤ 0.05 is a statistically significant trend; a trend with a p-value of 0.05-0.1 is referred to as indicative of a trend; a trend value with p-value = 0.1-0.34 is described as having a weak indication of change; and a trend with a p-value > 0.34 is referred to as weak or no change. These bounds on p-values are based on analysis of the regional average daytime ozone trend across eastern North America by Chang et al. (2017) using a generalized additive mixed model (GAMM). They found that trends at individual sites with p-values up to 0.34 consistently displayed cohesive regional relationships with the pattern of trends whose p-values were < 0.05.
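For readers unfamiliar with these non-parametric methods, the following is a minimal, self-contained sketch of a Theil-Sen slope, a Mann-Kendall p-value and the p-value terminology listed above. It is not the TOAR-Metrics implementation: the function names are hypothetical, the Mann-Kendall variance omits any correction for ties or autocorrelation, the normal approximation is used for the p-value, and the handling of values falling exactly on a category boundary is an assumption.

```python
import math
from itertools import combinations
from statistics import median


def theil_sen_slope(years, values):
    """Median of the slopes between all pairs of points (metric units per year)."""
    points = list(zip(years, values))
    slopes = [(v2 - v1) / (t2 - t1)
              for (t1, v1), (t2, v2) in combinations(points, 2) if t2 != t1]
    return median(slopes)


def mann_kendall_p(values):
    """Two-sided Mann-Kendall p-value for a time-ordered series (no tie correction)."""
    n = len(values)
    # S counts later-minus-earlier sign agreements over all pairs.
    s = sum((later > earlier) - (later < earlier)
            for earlier, later in combinations(values, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return math.erfc(abs(z) / math.sqrt(2.0))


def describe_trend(p_value):
    """Map a p-value to the terminology used in the text."""
    if p_value <= 0.05:
        return "statistically significant"
    if p_value <= 0.10:
        return "indicative of a trend"
    if p_value <= 0.34:
        return "weak indication of change"
    return "weak or no change"


# Hypothetical 4MDA8 series falling by 1 ppb per year over 2000-2014
years = list(range(2000, 2015))
series = [50 - i for i in range(15)]
print(theil_sen_slope(years, series), describe_trend(mann_kendall_p(series)))
```

Because the Theil-Sen estimate is a median of pairwise slopes, a single anomalous year (for example an unusually hot summer) has little influence on the estimated trend.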
The  Figures S7-S9). The results from the individual site trends are summarized for North America, Europe and East Asia in Figure 8 using vector plots for each of the five metrics. Note that the East Asia region has stations located mainly in Japan and South Korea and a number of stations in Hong Kong; elsewhere in East Asia and notably mainland China there are insufficient sites with available data for trend analysis which results in an uneven distribution of stations across East Asia. The distributions of trends and changes (positive/negative, including p-value) for all sites in 15 continental regions (Figure 3) for each of the five metrics are also shown in Figure 9.
As discussed in section 3, the different health-based metrics relate to various parts of the ozone distribution, with the 4MDA8 and NDGT70 metrics most sensitive to peak ozone levels. The geographical distributions of the trends in these two metrics are fairly similar (Figure 8a, b; see also Section 6.2), as was also reflected in the present-day levels (Figure 4a, b). Most stations in North America and a number of stations in Europe exhibit significant negative trends with rates of decrease equal to or greater than 1 ppb per year for 4MDA8 and 1 day per year for NDGT70 (Figures 8a, b and 9a, b). In particular in the US the majority of sites (up to 70%) show statistically significant (p < 0.05) reductions in these two metrics. However, in Europe whilst up to ~18% of sites show a statistically significant downward trend, a much higher fraction of non-significant trends (p > 0.05), i.e. weak negative to weak or no change, is seen for these two metrics, especially at the urban sites (Figure 9a, b). These downward trends are in broad agreement with the results in Section 5.2 showing that a considerable number of European countries and US states experienced a decrease in the fraction of the population exposed to NDGT60 > 25 days. This is an indication of reduced exposure to peak levels of ozone related to photochemical episodes over the 2000-2014 period in these two regions. Non-significant trends are likely due to large interannual variability in ozone due to meteorology (section 6.4). Very few stations experience statistically significant positive trends in the U.S. or Europe in either of these peak exposure-related metrics (Figures 8a, b and 9a, b). A few sites in Spain, both urban and non-urban, show statistically significant positive trends for 4MDA8 (Figure 8a).
The distribution of trends in 4MDA8 and NDGT70 in East Asia differs from that in North America and Europe. A number of East Asian stations (~10-30%) exhibit statistically significant positive trends at both site types (Figure 9a, b). In particular, for South Korea and Hong Kong, most urban and non-urban stations exhibit significant positive trends in 4MDA8 (up to 2 ppb per year) and NDGT70 (up to 2 days per year) (Figure 8a, b). No stations in these two regions of East Asia show statistically significant decreases for these two metrics (Figure 9a, b). For both South Korea and Hong Kong, statistically significant increases are more prominent at urban (~50-80%) than at non-urban (up to 40%) sites (Figure 9a, b). In Japan, fewer stations (~5-10%) experience statistically significant positive trends in both metrics, but there are more sites (up to ~22%) with significant negative trends, typically in more southerly locations and also a high fraction of sites that indicate weak to no change (Figures 8a, b and 9a, b). However, for Japan the results are strongly sensitive to the trend period selected (section 6.4). The significant and large positive trends across parts of East Asia contrast with those in North America where significant negative trends are more typical; whilst for Japan the larger number of stations with non-significant weak to no changes are more similar to changes experienced at urban stations in Europe. Overall, the trends at sites in North America and Europe and southern parts of Japan indicate that those populations have experienced a reduction in exposure to short-term peak levels between 2000 and 2014, although at many sites in these regions the decreasing trends are non-significant; but for the same period the peak exposure levels have risen in Hong Kong, South Korea and other parts of Japan. Globally, the proportion of non-urban and urban sites with statistically significant negative trends is slightly larger for 4MDA8 compared to NDGT70.
The other three metrics, SOMO35 (representative of mid-high ozone levels) and 3MMDA1 and AVGMDA8 (sensitive to high ozone levels), are summed annually or averaged seasonally. For these three ozone metrics the results are somewhat similar to those for the two peak metrics for non-urban sites but are more mixed for urban sites in North America and Europe (Figures 8c-e and 9c-e). In common with the peak-focused metrics, 3MMDA1 and AVGMDA8 show significant negative trends for many (up to 60%) non-urban sites in North America (Figures 8d, e and 9d, e). For SOMO35, while most non-urban sites in North America exhibit significant negative trends, this proportion is smaller than for 3MMDA1 and AVGMDA8 (Figure 9c-e). For Europe, ~20% of non-urban sites show significant negative trends in these three metrics (Figure 9c-e). However, a considerable number of urban sites in North America and a large proportion of sites in Europe for both site types have non-significant trends (weak negative or no change) in these three metrics (Figures 8 and 9). Unlike for the two peak metrics, in both North America (especially Canada) and Europe (mainly southern Europe), the SOMO35 and AVGMDA8 metrics exhibit positive trends at 5-15% of the urban stations (Figure 9c, e). These findings further suggest reduced exposure to high levels of ozone in parts of North America and Europe, but increased exposure to moderate to high ozone levels at a small proportion of urban locations, although many sites in these two regions have non-significant trends.
For East Asia, the results for these three metrics are again different from the other two continental regions. In general, the trends in SOMO35, 3MMDA1 and AVGMDA8 show similar patterns to those of 4MDA8 and NDGT70 for non-urban and urban stations (Figures 8, 9). The majority of South Korean and Hong Kong sites show significant positive trends in these three metrics for non-urban (up to 50%) and notably for urban (60-80%) sites. For Japan, the results are again mixed with similar proportions of sites showing significant positive and negative trends for SOMO35 and AVGMDA8, but more significant negative trends for 3MMDA1 (~30%), in addition to a large fraction of sites showing weak or no indication of change for both site types. The results for South Korea and Hong Kong indicate that both the high peaks and the mid-high ozone levels have increased in East Asia during 2000-2014, whilst the situation for Japan is less clear. It is important to note that the sites in some regions, such as East Asia, are extremely unevenly distributed (as shown in Figure 8) so that the trends in ozone metrics likely do not apply to the entirety of these regions (Chang et al., 2017). Sites in South and Central America and Oceania also show significant negative trends, while sites in the Middle East depict mainly positive trends or weak to no change (Figure 9). However, the number of sites in these latter regions is very small (Figure 9) and thus robust conclusions cannot be drawn for these regions.
The trends from 2000 to 2014 in the five ozone health-based metrics are summarized for East Asia, Europe and North America in Figure 10, which shows the median, interquartile range and spread in trend values for the stations within each region. Data from all stations are included, regardless of the p-value for the trend. For Europe and North America for each metric and site type, the median and most (if not all) of the interquartile range lie below zero. The interquartile ranges for these two regions are also similar (although slightly smaller for Europe). However, median values of trend estimates for East Asian urban sites are predominantly positive, especially for the SOMO35 and AVGMDA8 metrics (as discussed for Figure 9). East Asia, which is the region generally with the smallest number of stations for a given metric, typically has much larger interquartile ranges compared to the other two regions. This arises due to the mixed patterns of positive changes for South Korea and Hong Kong and positive and negative changes for Japan discussed above, indicating that this region with its sparsity of sites is not very homogeneous with respect to ozone trends. In general, across the five metrics the largest positive trend estimates (whisker maxima in Figure 10), as expected, are found in East Asia compared to the other two regions, for both site types. The largest negative trend estimates (whisker minima in Figure 10) for most metrics except NDGT70 and AVGMDA8 also occur in East Asia for both non-urban and urban site types.
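A regional summary of this kind (median and interquartile range of site-level trend estimates, irrespective of p-value) can be sketched with the standard library. The function name and sample slopes are hypothetical, and the "inclusive" quartile convention is an assumption, since the convention used for Figure 10 is not stated here.

```python
from statistics import quantiles


def summarize_trends(slopes):
    """Quartiles and interquartile range of site-level trend estimates."""
    q1, q2, q3 = quantiles(slopes, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1}


# Hypothetical 4MDA8 trends (ppb per year) at five sites in one region
print(summarize_trends([-2, -1, 0, 1, 2]))
```

A region whose median and upper quartile both lie below zero, as described for Europe and North America, has at least three quarters of its sites with negative trend estimates.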

Differences and commonalities in trends across the five ozone metrics
The behaviour of the trends across the five metrics is given in Table 3. For both non-urban and urban sites, 59% (Table 3) of sites have a common trend in all five metrics, i.e. all metrics show either a significant positive trend, a significant negative trend, or all are non-significant.
Hence for approximately 41% of sites included in this analysis, conclusions about the trend in health-relevant ozone depend on the specific metric selected. Globally, the most common difference in trends across the five metrics is a significant decrease in 4MDA8 (4.2% of non-urban, and 4.7% of urban sites) with non-significant trends in all other metrics, followed by a significant decrease in both peak metrics 4MDA8 and NDGT70 (2.8% of non-urban, and 2.0% of urban sites) with non-significant trends in the three other metrics. The largest proportion of sites (non-urban and urban) with these two common trend differences is in North America (~9%), whilst East Asian sites do not show any occurrences of a significant decrease in both peak metrics 4MDA8 and NDGT70 and non-significant trends otherwise. Other combinations of trend patterns across the five metrics occur at 34% of all sites (Table 3). There are no occurrences of a statistically significant decrease in either 4MDA8 alone or in 4MDA8 and NDGT70, combined with an increase in the other three metrics. The degree of commonality between the two peak metrics, 4MDA8 and NDGT70, is also assessed in Table 4, which displays the percentage of sites at which 4MDA8 and each of the four other metrics have common trends. Globally, a common trend between 4MDA8 and NDGT70 is estimated at 86% and 84% of all non-urban and urban sites, respectively. For comparison, 73%/75%, 79%/80% and 76%/73% (Table 4) of non-urban/urban sites globally have a common trend between the 4MDA8 metric and SOMO35, 3MMDA1 and AVGMDA8, respectively. At non-urban sites in North America the largest common trend is a significant downward trend in 4MDA8 and the other metric, whilst at other locations a non-significant trend in 4MDA8 and the other metric is the most common pair combination (Table 4). For urban sites in East Asia, a significant increasing trend is more common than a significant decreasing trend for all pair combinations (Table 4).
In addition, at North American sites, the difference between the number of sites with common trends in the peak metrics (4MDA8 and NDGT70) and between 4MDA8 and the other three metrics is larger than for Europe and East Asia (Table 4).
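The notion of a "common trend" used in this section can be expressed compactly: each metric at a site is placed in one of three classes (significant positive, significant negative, non-significant) and a site has a common trend when all of its metrics fall in the same class. The sketch below is illustrative, with hypothetical names and data, and assumes p ≤ 0.05 as the significance threshold.

```python
def trend_class(slope, p_value, alpha=0.05):
    """Classify one trend as 'sig+', 'sig-' or 'ns' (non-significant)."""
    if p_value <= alpha and slope > 0:
        return "sig+"
    if p_value <= alpha and slope < 0:
        return "sig-"
    return "ns"


def has_common_trend(site_trends):
    """site_trends maps metric name -> (slope, p_value); True if all classes agree."""
    return len({trend_class(s, p) for s, p in site_trends.values()}) == 1


def percent_common(sites):
    """Share (%) of sites whose metrics all share one trend class."""
    sites = list(sites)
    return 100.0 * sum(has_common_trend(site) for site in sites) / len(sites)


METRICS = ("4MDA8", "NDGT70", "SOMO35", "3MMDA1", "AVGMDA8")
# Hypothetical sites: one with all metrics decreasing, one where only 4MDA8 is significant
site_a = {m: (-1.0, 0.01) for m in METRICS}
site_b = {m: (-0.2, 0.50) for m in METRICS}
site_b["4MDA8"] = (-1.0, 0.01)
print(percent_common([site_a, site_b]))
```

Restricting the dictionaries to two metrics gives the pairwise comparison between 4MDA8 and each other metric.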

Ozone trend sensitivities to averaging period and to trend period
The averaging period (i.e. seasonal vs. annual) over which a metric is calculated can also affect the estimated trend. This sensitivity is examined in relation to the averaging period for the AVGMDA8 ozone metric, which has been used for long-term warm season exposure (section 3). While for 73% of sites globally both summer and annual average MDA8 increase significantly, decrease significantly, or have non-significant trends, at 27% of sites globally the 2000-2014 trend in the summertime average MDA8 metric (i.e. AVGMDA8 as used in this study) is different to the trend estimated using the annual average MDA8. Hence, the averaging period is an important factor in terms of the trend result for AVGMDA8. Both averaging periods have been used for MDA8 by Turner et al. (2016) (Section 2.1). The most common differences between summer and annual average MDA8 are a significant decrease in summer-average MDA8 with no significant trend in annual-average MDA8 (9% of sites globally), and a significant increasing trend in annual-average MDA8 with no significant trend in summer-average MDA8 (6% of sites globally). In particular, for both urban and non-urban sites, there is a larger proportion of sites with increasing trends at European and North American sites for annual-average MDA8, and a smaller proportion of sites with significant decreasing trends (Supplemental Materials: Figure S6). There is also a large percentage of sites with non-significant trends for annual-average MDA8.
The analysis so far has focused on the 2000-2014 time period. Changes over 10 years, and trends for 20 years and >25 years, are shown in the Supplemental Materials (Figures S7-S8 and S9 for 4MDA8 and SOMO35 respectively). The results for these other three trend periods are broadly consistent with those in Figures 8 and 10, depicting the main features noted in section 6.1, i.e. significant negative trends at many stations in North America and Europe and significant positive trends at many sites in East Asia. Considering the shorter change period of 2005-2014 there are more stations that exhibit a non-significant change for both metrics over North America and Europe (Supplemental Materials: Figures S7, S9). However, for sites with negative 4MDA8 trends in Europe, these trends are typically steeper for the 2005-2014 period (-2 ppb per year), compared to 2000-2014 (-1 ppb per year) (Supplemental Materials: Figure S7). For the SOMO35 metric, both steeper negative and positive trends are shown for Europe for 2005-2014 compared to 2000-2014 (Supplemental Materials: Figure S9). In Japan, as for North America and Europe, there is also a tendency towards steeper significant negative trends in 4MDA8 of -2 ppb per year for 2005-2014; and for the rest of the East Asian sites there are more sites with non-significant trends (Supplemental Materials: Figures S7, S8).
Differences in trends between the 2000-2014 and 1995-2014 periods are summarized in Table 5 for all five metrics. For most sites there is no change in terms of statistical significance or direction of trend between the 2000-2014 and 1995-2014 periods (73-78% of all sites globally; Table 5). For the remaining sites, where a change occurs between 1995-2014 and 2000-2014, there are no occurrences of a shift from a statistically significant increasing trend to a statistically significant decreasing trend and vice versa. The broad changes in 4MDA8 and SOMO35 between the different trend periods described above also apply to the other three metrics.
For North America, the largest change is from a non-significant change in 2000-2014 to a statistically significant negative trend in 1995-2014 at both site types for all five metrics (ranging from 9-30% of sites across metrics and site types; Table 5). In Europe, the largest change for 4MDA8 and NDGT70 is also a change from non-significant in 2000-2014 to a negative trend in 1995-2014 (11-20% of sites; Table 5). For the other three metrics for Europe the results are more mixed often with similar numbers of non-urban sites showing changes from non-significant to significantly negative and vice versa between the two time periods. For these same three metrics, at urban European sites the largest change is from non-significant in 2000-2014 to significantly positive in 1995-2014 (5-9% of sites; Table 5). This may suggest that emission controls have been somewhat effective in reducing positive trends as suggested by Colette et al. (2016) when comparing 1990-2001 and 2002-2012 at rural stations in Europe.
For East Asia (mainly Japan), as for urban sites in Europe, some sites (7-44% depending on metric and site type; Table 5) exhibit a change from a non-significant trend in 2000-2014 to a significant increasing trend over the period 1995-2014, whilst some other sites (5-27%; Table 5) switch from a statistically significant negative trend for 2000-2014 to non-significant change for 1995-2014 for both non-urban and urban sites. As highlighted earlier, these changes in East Asia almost exclusively reflect changes that have occurred at Japanese sites, due to the lack of monitoring sites in other East Asian countries in 1995-2014 (96% of sites with sufficient monitoring in both the 2000-2014 and 1995-2014 periods were in Japan). A number of non-urban sites in Oceania also depict a change from a non-significant change in 2000-2014 to a statistically significant trend in 1995-2014, but the total number of sites is too few for robust conclusions to be drawn.
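The period-to-period comparison described above can be sketched as a tabulation of trend categories. This is a minimal illustration, not the procedure used to build Table 5: the function names, the three category labels and the example site values are all hypothetical, and the 0.05 significance threshold is taken from the p < 0.05 level quoted elsewhere in this paper.

```python
from collections import Counter

def classify(slope, p_value, alpha=0.05):
    """Return 'neg', 'pos' or 'ns' (non-significant) for one site-level trend."""
    if p_value >= alpha:
        return "ns"
    return "neg" if slope < 0 else "pos"

def transition_shares(trends_a, trends_b, alpha=0.05):
    """Percentage of shared sites in each (category in A, category in B) pair.

    trends_a and trends_b map a site id to a (slope, p_value) tuple.
    Only sites with a trend estimate in both periods are counted.
    """
    shared = sorted(set(trends_a) & set(trends_b))
    counts = Counter(
        (classify(*trends_a[s], alpha), classify(*trends_b[s], alpha))
        for s in shared
    )
    return {pair: 100.0 * n / len(shared) for pair, n in counts.items()}

# Hypothetical (slope, p-value) pairs for illustration only.
trends_2000_2014 = {"A": (-0.4, 0.20), "B": (-1.1, 0.01), "C": (0.3, 0.60), "D": (0.8, 0.03)}
trends_1995_2014 = {"A": (-0.6, 0.02), "B": (-0.9, 0.01), "C": (0.2, 0.40), "D": (0.5, 0.04), "E": (0.1, 0.90)}

shares = transition_shares(trends_2000_2014, trends_1995_2014)
unchanged = sum(share for (a, b), share in shares.items() if a == b)
print(shares)
print(unchanged)
```

Site "E" is ignored because it has no estimate for the first period, mirroring the restriction to sites with sufficient data in both periods.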

Comparison with emission trends and other studies of these metrics
These findings for the five metrics are qualitatively in agreement with the documented trends in ozone precursor emissions from emission inventories, with significant reductions of NOx and carbon monoxide (CO) emissions in North America and in Europe, and increases in most of East Asia, notably China (e.g. Granier et al., 2011; Zhao et al., 2013). For North America, controls on power generation and motor vehicles were implemented in the late 1990s and early 2000s, leading to reduced NOx and CO emissions thereafter (Figure 3; Granier et al. 2011). European emissions have shown a steady decline since 1980 (Figure 2, Granier et al. 2011), which has continued since 2000 (EEA, 2016). Hence, these emission controls appear to have impacted ozone trends in 2000-2014 for the five metrics. NOx emissions in China increased at a rate of 5.9% for the period 1995-2010 (Zhao et al., 2013). Subsequently, in 2011 a stringent NOx emission standard for thermal power plants was issued in China (Zhao et al., 2013), which has caused a decrease in tropospheric column NO2 regionally as observed from satellites (Duncan et al., 2016; Krotkov et al., 2016; Liu et al., 2016; Miyazaki et al., 2017; Van der A et al., 2017). This recent reduction in NOx emissions, however, has not stopped the observed increasing trend of ozone in parts of mainland China and Hong Kong where long-term ozone observations are available (Ma et al., 2016; Sun et al., 2016; Wang et al., 2017).
Reduced NOx emissions are expected to lead to both fewer high peaks and fewer low minimum values, resulting in a narrowing of the ozone distribution (Simon et al., 2015). Statistically significant reductions for peak ozone metrics at a substantial number of sites in our study affirm the impact of emission reductions on reducing peak ozone levels. Also, the more mixed direction of change for urban compared to non-urban stations for some metrics, notably SOMO35 in North America and Europe, may suggest the influence of NOx emissions reductions over this 15-year period in increasing ozone minima, due to less ozone titration by NO. Thus, urban sites may show mixed positive and negative trends depending on the chemical environment and the extent to which that location is NOx limited vs. NOx saturated. However, it is also noted that there are some stations in Japan which show negative trends, despite increasing NOx emissions for the East Asian region. Furthermore, it is the lower end of the ozone distribution that is most strongly affected by this process, hence other drivers, especially regional to local meteorology, may influence these trend results. In particular, interannual variability in meteorology may well be the cause of the large number of non-significant trend results found in all regions and globally (section 6.1), and noted in a European context by Colette et al. (2016). In addition, despite the extensive data quality assurance that went into the TOAR database, issues with changed calibration or operating procedures over time remain (TOAR-Surface Ozone Database). Caution should be exercised when stations in one small region show more mixed trend signals than elsewhere.
Chemistry-transport model studies of the impact of recent changes of ozone precursor emissions, both regionally and globally as outlined above, consistently show that the local response of ozone levels has been a decrease in North America and Europe and an increase in East Asia (Verstraeten et al., 2015;Zhang et al., 2016;Lin et al., 2017). Furthermore, other measurement and model studies comparing the response of mid-range vs. high ozone values show that the ozone decreases in the US and Europe are more pronounced for the highest ozone values, while sites in China show ozone increases for both mid-range and high ozone values (Derwent et al., 2010;Simon et al., 2015;Lefohn et al., 2017b). Hence our trend results agree with these findings in terms of identifying regions with substantial increases or decreases in high levels of ozone displayed by the five metrics.
Specifically, examining the 4MDA8 metric regionally, Lefohn et al. (2017b) reported that the majority of sites analyzed in Europe (276 sites) and in the US (196 sites) experienced reductions at the high-end of the hourly ozone concentration distribution, leading to negative 4MDA8 trends at a majority of sites in the US and some sites in Europe, assessed over at least 20 years up to 2013/2014. It was also found that the sites in Europe experienced substantially fewer occurrences of statistically significant (increasing or decreasing) trends than the US sites. In contrast, at five of six Hong Kong sites the 4MDA8 metric increased significantly. These results, that cover a similar time period to that in our study, are generally in good agreement with our findings presented in Figures 8a, b and 9a, b (section 6.1). For SOMO35, our results (section 6.1) are also similar to those reported in Lefohn et al. (2017b). In their study, at most EU and US sites analyzed either a negative trend or no change was found, whilst SOMO35 increased significantly at four of the six Hong Kong sites.
In a recent study of ozone trends across Europe from the European Monitoring and Evaluation Programme (EMEP) the 4MDA8 and SOMO35 metrics were also examined (Colette et al., 2016). For the period 2002-2012, statistically significant decreases were observed at rural EMEP stations for 20% and 50% of the sites for 4MDA8 and SOMO35, respectively. In our study, for both 4MDA8 and SOMO35 statistically significant decreases were found at 20% of non-urban sites in Europe (section 6.1). Median ozone decreases were 12% for 4MDA8 and 30% for SOMO35 for this period across the EMEP network, which are comparable to median ozone changes in these two metrics for 2000-2014 in our study (not shown). The largest negative trends were observed at the stations with the highest levels of peak ozone in the beginning of the trend period (Colette et al., 2016). The EMEP stations are located exclusively in rural areas, and therefore the estimated trend across these stations does not capture the full range of non-urban ozone environments across Europe. For France over the period 1999-2012, Sicard et al. (2016) showed larger region-average statistically significant negative trends in SOMO35 at rural sites than at urban sites. The direction of the trend in SOMO35 was also more variable across urban sites than across rural sites, as similarly found across Europe in our study (section 6.1). Their findings were similar for the NDGT60 metric, although more urban stations had a negative trend in this metric compared to SOMO35 during this period (Sicard et al., 2016).
In addition, for the EMEP stations the sensitivity of the trend results for 4MDA8 and SOMO35 to two different time periods, 1990-2001 and 2002-2012, was analysed by Colette et al. (2016). The decreasing trend for 4MDA8 was quite steady over the 1990-2001 and 2002-2012 periods, with 11% and 12% median relative decreases across the network, respectively. In contrast, SOMO35 trends were very different for the two periods, with a median trend of a 1.6% relative increase over 1990-2001, whereas a sharp 30% decrease was observed for the 2002-2012 period, highlighting the effectiveness of European emissions controls for this health metric (Colette et al., 2016). The difference between the two metrics lies in the stronger sensitivity of SOMO35 not only to high ozone levels but also to mid and baseline levels. When comparing our trend results, there were more sites in Europe with a statistically significant positive trend in 1995-2014 compared to 2000-2014, but only for urban locations (Section 6.3). This further highlights the strong sensitivity of trend results over Europe, as well as for North America and notably for Japan, to the trend period.

Conclusions
The goal of this paper, TOAR-Health, is to present the global distribution and trends of ozone using all available surface ozone observations and relying on ozone metrics that are relevant to human health. Using the TOAR-Surface Ozone Database, global and regional present-day distributions and trends for the period 2000-2014 are analyzed for five health-relevant ozone metrics. For analyses of present-day distributions (averaged for 2010-2014) data from 4,801 global monitoring sites were utilized, whilst for trend analysis for 2000-2014 data from 2,600 sites were used. These ozone health metrics are derived from clinical and epidemiological studies that examine health outcomes associated with short- and long-term exposure to surface ozone and that are typically based on daily maximum 8-hour running mean ozone (MDA8) mixing ratios. The five metrics are: the 4th highest MDA8 (4MDA8); the number of days per year with MDA8 > 70 ppb (NDGT70); annual Sum of Ozone Means Over 35 ppb (SOMO35); annual maximum of the 3-month running mean of daily 1-hour ozone (3MMDA1); and the warm season average MDA8 (AVGMDA8). The first three of the five metrics are also associated with regulatory standards to protect human health from short-term exposure to ozone. The 4MDA8 and NDGT70 metrics reflect peak ozone levels. SOMO35 represents mid-high ozone values summed over the whole year. The last two metrics (3MMDA1/AVGMDA8) are averaged annually/seasonally to provide a perspective on long-term exposure. These health metrics are examined for two globally applicable objective site categories, urban and non-urban, which are determined from gridded metadata associated with urban characteristics: population and night-time lights. Globally, 1,453 sites are classified as urban and 3,348 as non-urban according to this categorization for present-day. A further semi-objective classification using hierarchical cluster analysis is in line with these classifications, supporting the metadata approach.
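The MDA8-based metric definitions listed above can be illustrated with a short sketch for one site and one year. It rests on several assumptions that are not taken from this section: SOMO35 is computed as the sum of daily MDA8 excesses over 35 ppb (its usual definition), the warm season is set to April-September (a Northern Hemisphere choice), and no data-capture criteria are applied. The function name and input layout are hypothetical, and 3MMDA1 is omitted because it requires daily 1-hour maxima rather than MDA8.

```python
def health_metrics(daily, warm_months=(4, 5, 6, 7, 8, 9)):
    """Compute four MDA8-based metrics for one site-year.

    daily: list of (month, mda8_ppb) tuples, one entry per valid day.
    warm_months: months treated as the warm season (an assumption here).
    """
    values = [v for _, v in daily]
    warm = [v for m, v in daily if m in warm_months]
    ranked = sorted(values, reverse=True)
    return {
        # 4th highest daily maximum 8-hour mean of the year
        "4MDA8": ranked[3] if len(ranked) >= 4 else None,
        # number of days with MDA8 above 70 ppb
        "NDGT70": sum(1 for v in values if v > 70.0),
        # accumulated excess of MDA8 over 35 ppb (ppb days)
        "SOMO35": sum(max(v - 35.0, 0.0) for v in values),
        # warm-season mean MDA8
        "AVGMDA8": sum(warm) / len(warm) if warm else None,
    }

# Hypothetical daily values for illustration only.
daily = [(1, 30.0), (4, 50.0), (6, 72.0), (7, 80.0), (7, 65.0), (8, 71.0), (11, 36.0)]
metrics = health_metrics(daily)
print(metrics)
```

A real calculation would also need the data completeness rules described in TOAR-Metrics, which this sketch leaves out.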
For the present-day 5-year average period (2010-2014), the distributions of the two metrics that measure peak concentrations show similar patterns across the world regions, and are also similar for both site types. Stations located in the major ozone precursor emissions regions of North America, Europe and East Asia display the highest values for the two peak metrics (4MDA8, NDGT70), notably in California, parts of southern Europe and across East Asia. For the other three metrics (SOMO35, 3MMDA1, AVGMDA8) there is a clearer North-South gradient for Europe and in Japan, and a hotspot of peak values in California. Overall, across these three continental regions, East Asia has the highest and Europe the lowest values for the distributions of present-day ozone for the five metrics at the urban and non-urban sites within each region. The month of maximum ozone for the 3MMDA1 metric occurs in Northern Hemisphere spring at the high mid-latitude sites and in summer for most other mid-latitude sites in the US, southern Europe and East Asia. The seasonal behaviour of the East Asian winter monsoon leads to a November ozone peak in Hong Kong. Thus, in most Northern Hemisphere locations peak ozone levels are most likely to occur during the warm season, except for parts of East Asia.
The percentage of the population within 5 km of an ozone monitoring station exposed to NDGT60 > 25 days per year (based on populations around TOAR monitoring stations) shows similar spatial patterns to the results from the five ozone metrics, with higher values in southern Europe and in California. All countries in Europe and most US states also show a decrease from 2000 to 2014 in the monitored population exposed to NDGT60 > 25 days per year.
Trends in the five ozone metrics are calculated for the 15-year period 2000-2014. As for present-day distributions, the results for non-urban and urban stations are broadly similar. In addition, at many sites across the globe, there are non-significant trends for these metrics, likely due to interannual variability in meteorology that affects ozone. For the peak exposure metrics (4MDA8 and NDGT70) a considerable number of stations in North America and in Europe show large statistically significant negative trends (p < 0.05) (ca. 1 ppb per year for 4MDA8 and ca. 1 day per year for NDGT70) or non-significant changes. In contrast, over East Asia the trends vary by sub-region; most sites in South Korea and Hong Kong, in particular at urban locations, exhibit large statistically significant positive trends (p < 0.05), but there are both positive and negative significant trends over Japan. The other three metrics (SOMO35, 3MMDA1, AVGMDA8) show more mixed results in terms of the sign of their trends, although strong negative trends are found for many non-urban sites in North America as well as a considerable number of stations in Europe. Urban sites in Europe typically show weak positive and negative changes in SOMO35 and AVGMDA8. For East Asia, the results are again generally different, and similar to those of peak exposure metrics, with significant increases across both site types in Hong Kong and South Korea. For Japan, there are mixed results for the SOMO35 and AVGMDA8 metrics, whilst for 3MMDA1 a larger proportion of sites have negative trends. There is a tendency toward negative trends for several other world regions, although the low numbers of sites preclude robust conclusions. For the three regions discussed above, considering all sites and all trend estimates (i.e. all p-values), the spread in the interquartile ranges is similar for North America and Europe but much larger for East Asia, with overall negative median trend estimates between 2000 and 2014 for North America and Europe for most ozone health metrics but generally positive median trend estimates for East Asia. These trend results qualitatively agree with trends in ozone precursor emissions, with significant emission reductions in North America and Europe, and increases in parts of East Asia.
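To make the notions of trend magnitude and significance concrete, the following sketch estimates a slope and a p-value for one annual metric series. This section does not restate the trend method, so the choice here is an assumption: a Theil-Sen slope (median of pairwise slopes) paired with a Mann-Kendall test using the normal approximation without tie correction, a common non-parametric combination for such series. The function names and the example series are hypothetical.

```python
import math
from itertools import combinations
from statistics import median

def theil_sen_slope(years, values):
    """Median of all pairwise slopes, in units of `values` per year."""
    return median(
        (v2 - v1) / (y2 - y1)
        for (y1, v1), (y2, v2) in combinations(zip(years, values), 2)
        if y2 != y1
    )

def mann_kendall_p(values):
    """Two-sided Mann-Kendall p-value (normal approximation, no tie correction)."""
    n = len(values)
    # S statistic: number of increasing pairs minus number of decreasing pairs
    s = sum((b > a) - (b < a) for a, b in combinations(values, 2))
    if s == 0:
        return 1.0
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # continuity-corrected standard normal score
    z = (s - 1) / math.sqrt(var_s) if s > 0 else (s + 1) / math.sqrt(var_s)
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical 4MDA8 series declining by 1 ppb per year over 2000-2014.
years = list(range(2000, 2015))
series = [80.0 - i for i in range(len(years))]

slope = theil_sen_slope(years, series)
p_value = mann_kendall_p(series)
print(slope, p_value)
```

With real data the series would contain interannual variability, which is what makes many of the reported trends non-significant; the idealized series above is only meant to show the mechanics.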
The differences and commonalities in trends across the five metrics are also explored. Considering all five metrics, a common trend (i.e. significant decrease, significant increase or non-significant) occurs at 59% of all (non-urban and urban) sites. The most common pattern at the 41% of sites where the trends diverge is a significant decrease in 4MDA8 and a non-significant trend for the other metrics, followed by a significant decrease in 4MDA8 and NDGT70 and a non-significant trend for the other metrics. In addition, the 4MDA8 and NDGT70 metrics had a common trend at ~80% of all sites.
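The share of sites with a common trend across metrics can be computed as in the sketch below. It assumes each site already carries one trend category per metric ("neg", "pos" or "ns"); those labels, the function name and the four example sites are hypothetical and serve only to show the bookkeeping.

```python
def common_trend_share(site_categories, metrics):
    """Percentage of sites whose trend category is identical across `metrics`.

    site_categories maps a site id to a dict {metric name: category}.
    Sites missing any of the requested metrics are skipped.
    """
    eligible = [c for c in site_categories.values() if all(m in c for m in metrics)]
    if not eligible:
        return None
    same = sum(1 for c in eligible if len({c[m] for m in metrics}) == 1)
    return 100.0 * same / len(eligible)

ALL_FIVE = ("4MDA8", "NDGT70", "SOMO35", "3MMDA1", "AVGMDA8")

# Hypothetical categories for illustration only.
sites = {
    "A": dict.fromkeys(ALL_FIVE, "neg"),
    "B": {**dict.fromkeys(ALL_FIVE, "ns"), "4MDA8": "neg"},
    "C": {**dict.fromkeys(ALL_FIVE, "ns"), "4MDA8": "neg", "NDGT70": "neg"},
    "D": dict.fromkeys(ALL_FIVE, "ns"),
}

share_all = common_trend_share(sites, ALL_FIVE)
share_peak = common_trend_share(sites, ("4MDA8", "NDGT70"))
print(share_all, share_peak)
```

Sites "B" and "C" reproduce the two divergence patterns named in the text: a significant decrease in 4MDA8 only, and a significant decrease in both peak metrics.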
A key issue for further research is to understand the sensitivity of the ozone health metric trends to the site type classifications, particularly across a range of urbanization, for example, low density suburban areas in North America and Australia compared to high density urban centers in Asia such as Hong Kong, Tokyo, Beijing or Delhi. A comparison of the urban classification developed and used in this study to the urban indicators associated with the GPWv3 dataset Global Rural-Urban Mapping Project (GRUMP), v1 urban (SEDAC/CIESIN 2015) would be highly beneficial, as well as establishing the sensitivity of our urban classification to the underlying population data by using the latest GPWv4 global population dataset. In addition, these station classifications are based on the year 2010; changes in land use driven by population growth or development may change the extent of urbanization around a station and further work is needed to assess how this urban classification may vary over the specified trend time periods.
Sensitivity analyses of the metric trend results to the averaging period (i.e. seasonal or annual) and to the trend data period have been performed. Using an annual rather than a seasonal averaging period for the AVGMDA8 metric results in different trends at 27% of all sites. The sensitivity of trend estimates to different time period lengths is also investigated for 4MDA8 and SOMO35. With a shorter 2005-2014 trend period, the magnitude of decreasing trends became larger at some stations in Europe. Globally, for stations with sufficient data, trends for the 2000-2014 period are similar to trends for the longer period of 1995-2014. In North America and Europe some sites with non-significant trends in 2000-2014 have a significant negative trend, and some urban sites in Europe and Japan have a significant positive trend, in 1995-2014. Over the longer 1970-2014 period (many fewer sites), more sites in North America have significant downward trends for both metrics, whilst more urban sites in Europe show significant trends, both positive and negative, than in 2000-2014. For Japan, differences are more apparent, with both negative and positive trends across the country for 2000-2014, but for the longest period (1970-2014) most sites display positive trends for the two metrics examined: 4MDA8 and SOMO35. For other parts of East Asia such long measurement records are not available.
For East Asia, in particular China, further investigation of different trend periods would be useful to aid in establishing whether recent emissions controls in the region implemented in 2011 impact the ozone health metrics used in this study. Lefohn et al. (2017b) highlight several reasons that suggest it may be too early to detect such changes in the trends and why longer records may be needed, in particular to discern a trend versus the interannual variability in meteorology, which is a likely cause of a large proportion of non-significant trends in the five metrics in our study. This includes modification of the Asian monsoon by regional climate phenomena such as El Niño Southern Oscillation. In addition, since the month of maximum ozone varies across this continental region, e.g. a late spring/early summer peak in Japan but a November peak in Hong Kong, metrics that employ warm season peak values or averages may need to consider this variation.
Finally, there are many uncertainties associated with assumptions for trend calculations in this study which are discussed in TOAR-Surface Ozone database and TOAR-Metrics. However, the general convergence of results across the five metrics suggests the main features of change in ozone for health-relevant and policy-related metrics over the 2000-2014 period are well captured. Therefore, the TOAR database of surface ozone health metrics provides an exciting opportunity for further research, and shows that beyond the dense ozone monitoring networks in North America, Europe, Japan and South Korea ozone monitoring in most countries is sparse and would require significant expansion to characterize the ozone air quality impacting their citizens, especially across large regions of Asia and Africa. Ozone monitoring in some countries, such as China and Thailand, is more extensive than is indicated by the TOAR database, which is limited to validated data sets that were contributed by air quality agencies and research groups. It is hoped that future TOAR assessments will be able to close some of the biggest gaps and provide a more complete global description of ozone and its relevance for human health. In support of current research needs, TOAR has assembled the world's largest collection of ozone metrics, calculated consistently for all available monitoring sites. These metrics are publicly available to the research community and TOAR encourages their use for future analyses that quantify the impact of ozone on human health.

Data Accessibility Statement
The TOAR data portal on PANGAEA (https://doi.pangaea.de/10.1594/PANGAEA.876108) contains ozone statistics (including metrics for assessing health, vegetation, and climate impacts), trend estimates, and graphical material. In particular, all maps and box and whisker plots of the ozone metrics used in this paper are archived at: https://doi.pangaea.de/10.1594/PANGAEA.876109.
The TOAR data portal also provides free and unrestricted access. All use of TOAR surface ozone data should include a reference to the TOAR-Surface Ozone Database (Schultz et al., 2017).

Supplemental Files
The supplemental files for this article can be found as follows: