Tropospheric ozone assessment report: Global ozone metrics for climate change, human health, and crop/ecosystem research

Assessment of spatial and temporal variation in the impacts of ozone on human health, vegetation, and climate requires appropriate metrics. A key component of the Tropospheric Ozone Assessment Report (TOAR) is the consistent calculation of these metrics at thousands of monitoring sites globally. Investigating temporal trends in these metrics required that the same statistical methods be applied across these ozone monitoring sites. The nonparametric Mann-Kendall test (for testing the significance of trends) and the Theil-Sen estimator (for estimating the magnitude of trends) were selected to provide robust methods across all sites. This paper provides the scientific underpinnings necessary to better understand the implications of and rationale for selecting a specific TOAR metric for assessing spatial and temporal variation in ozone for a particular impact. The rationale and underlying research evidence that influence the derivation of specific metrics are given. The forms of 25 metrics (4 for model-measurement comparison, 5 for characterization of ozone in the free troposphere, 11 for human health impacts, and 5 for vegetation impacts) are described. Finally, this study categorizes health and vegetation exposure metrics based on the extent to which they are determined only by the highest hourly ozone levels, or by a wider range of values. The magnitude of the metrics is influenced both by the distribution of hourly average ozone concentrations at a site and by the extent to which a particular metric is determined by relatively low, moderate, and high hourly ozone levels. Hence, for the same ozone time series, changes in the distribution of ozone concentrations can result in different changes in the magnitude and direction of trends for different metrics. Thus, dissimilar conclusions about the effect of changes in the drivers of ozone variability (e.g., precursor emissions) on health and vegetation exposure can result from the selection of different metrics.
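The two trend methods named above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the TOAR implementation: it assumes one metric value per year, uses the usual normal approximation with a tie-corrected variance and a continuity correction, and makes no adjustment for serial correlation. The function names and the example series are hypothetical.

```python
import math
from statistics import median


def mann_kendall(y):
    """Two-sided Mann-Kendall test; returns (S, Z, p_value).

    Normal approximation with tie-corrected variance and continuity
    correction; no adjustment for serial correlation (assumption).
    """
    n = len(y)
    # S: sum of the signs of all forward pairwise differences
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = y[j] - y[i]
            s += (d > 0) - (d < 0)
    # Variance of S, reduced for each group of t tied values
    counts = {}
    for v in y:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t * (t - 1) * (2 * t + 5) for t in counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return s, z, p


def theil_sen_slope(x, y):
    """Median of all pairwise slopes (y[j] - y[i]) / (x[j] - x[i])."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x) - 1)
              for j in range(i + 1, len(x))
              if x[j] != x[i]]
    return median(slopes)


# Hypothetical annual metric values (ppb) for 1995-2004
years = list(range(1995, 2005))
values = [41.2, 40.8, 41.9, 42.3, 41.7, 42.8, 43.1, 42.6, 43.9, 44.0]
print(mann_kendall(values), theil_sen_slope(years, values))
```

Because both methods depend only on the ordering of values (Mann-Kendall) or on medians of pairwise slopes (Theil-Sen), a single outlying year has limited influence, which is the robustness property motivating their use across heterogeneous sites.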

-4: Percentage of sites included in TOAR for which the trends (from 1995 through 2014) in each metric from Lefohn et al. (2016) were in the same direction as all other exposure metrics from Lefohn et al. (2017).

Additional Metrics Derived from Long-term Observations at Baseline Monitoring Sites
Polynomial "shape factors" that describe long-term trends are calculated for selected baseline sites and made available in table form, as described by Parrish et al. (2014). Parrish et al. (2014) applied a systematic approach to quantifying long-term changes in baseline ozone, based on the available data sets collected at relatively isolated sites from the mid-twentieth century to the present. This approach yielded metrics that represent long-term changes in seasonally averaged baseline ozone levels at northern mid-latitudes and are suitable for evaluating the analogous ozone trends calculated by models. The metrics themselves are the coefficients of polynomial fits to the measured levels following normalization to the year-2000 intercepts of the data. The intercepts (Table S-1) provide metrics for comparing absolute ozone levels at specific locations; they should be interpreted as the seasonally averaged, near-surface, baseline ozone level in the year 2000 with the interannual variability removed. Thus, these metrics are suitable for comparison both with Coupled Climate Models (CCMs), which simulate their own meteorology that is not intended to reproduce the actual meteorological variability, and with chemical transport models, which are driven by assimilated observed meteorology. The polynomial coefficients (Table S-2) provide metrics that characterize relative ozone changes over broad regions of the northern mid-latitude lower troposphere. These polynomial coefficients define the black curves illustrated in Figures 5 and 6 of Section 6 (Model Performance).
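The idea of an absolute year-2000 intercept plus relative (normalized) polynomial coefficients can be illustrated with a small least-squares fit. This sketch assumes a quadratic in (year - 2000) and uses a plain normal-equations solver from the standard library; the polynomial degree, the function names `polyfit_centered` and `shape_factors`, and the solver are illustrative assumptions, not the exact procedure of Parrish et al. (2014).

```python
def polyfit_centered(years, values, degree=2, ref_year=2000):
    """Least-squares polynomial in (year - ref_year); returns [a0, a1, ..., a_degree]."""
    x = [yr - ref_year for yr in years]
    m = degree + 1
    # Normal equations (X^T X) c = X^T y with X[i][k] = x_i**k
    ata = [[sum(xi ** (j + k) for xi in x) for k in range(m)] for j in range(m)]
    aty = [sum((xi ** j) * yi for xi, yi in zip(x, values)) for j in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, m):
            f = ata[r][col] / ata[col][col]
            for k in range(col, m):
                ata[r][k] -= f * ata[col][k]
            aty[r] -= f * aty[col]
    # Back substitution
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        rest = sum(ata[r][k] * coef[k] for k in range(r + 1, m))
        coef[r] = (aty[r] - rest) / ata[r][r]
    return coef


def shape_factors(years, values, degree=2, ref_year=2000):
    """Year-2000 intercept (absolute level) and coefficients normalized by it (relative change)."""
    coef = polyfit_centered(years, values, degree, ref_year)
    intercept = coef[0]
    return intercept, [c / intercept for c in coef[1:]]
```

Centering the time axis on 2000 makes the constant term directly interpretable as the year-2000 level and keeps the normal equations well conditioned for multi-decade records.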
Fourier series expansions of the seasonal cycles are calculated for selected baseline sites and made available in the TOAR database. Fourier series expansions of the seasonal cycles of ozone at marine boundary layer (MBL) sites around the globe have been shown to provide critical tests of the model treatment of some of the physical processes that control tropospheric ozone levels (Parrish et al., 2015). The annual average plus two sine terms, the fundamental (period = 1 year) and the second harmonic (period = 1/2 year), are the only significant contributors to the seasonal cycles of ozone at these sites. Figure 1 in the paper illustrates one example. Thus, the seasonal cycle can be defined by:

$$y = Y_0 + A_1 \sin(\chi + \phi_1) + A_2 \sin(2\chi + \phi_2) \qquad (1)$$

Here the variable $\chi$ spans one year in radians, from 0 to $2\pi$. The five parameters of these three terms (Table S-3) provide metrics to which model calculations can be quantitatively compared: $Y_0$ (the annual average) and the amplitudes ($A_1$, $A_2$) and phases ($\phi_1$, $\phi_2$) of the two sine terms. An additional metric, also included in Table S-3, is the root-mean-square deviation (RMSD) of the monthly averages from the functional form of Equation (1).

Table S-3 (excerpt)
Site | Y0 (a) (ppb) | A1 (ppb) | φ1 (rad) | A2 (ppb) | φ2 (rad) | RMSD (b) (ppb)
Storhofdi, Iceland | 38.5 ± 0.5 | 6.2 ± 0.7 | 0.53 ± 0.12 | 3.2 ± 0.7 | -2.24 ± 0.23 | 2.1
(a) Y0 represents the annual average ozone level evaluated over the complete period covered by the measurements.
(b) RMSD is the root-mean-square deviation of the monthly average data from the Fourier series fits.
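Equation (1) can be fitted to a complete set of 12 monthly means without a general solver, because for equally spaced points the constant, sine, and cosine terms are orthogonal, so the discrete Fourier sums are the exact least-squares coefficients. The sketch below assumes mid-month placement, $\chi_m = 2\pi(m + 0.5)/12$; that phase origin is an assumption, so the fitted phases are comparable to Table S-3 only if the same origin is used. The function name is hypothetical.

```python
import math


def fourier_seasonal_fit(monthly):
    """Fit y = Y0 + A1*sin(chi + phi1) + A2*sin(2*chi + phi2) to 12 monthly means.

    Assumes one value per calendar month (Jan..Dec) placed at mid-month,
    chi_m = 2*pi*(m + 0.5)/12. Returns (Y0, A1, phi1, A2, phi2, RMSD).
    """
    n = len(monthly)
    if n != 12:
        raise ValueError("expected 12 monthly means")
    chi = [2 * math.pi * (m + 0.5) / n for m in range(n)]
    params = [sum(monthly) / n]  # Y0: annual average
    for k in (1, 2):
        # Exact least-squares coefficients for equally spaced points
        s = 2 / n * sum(y * math.sin(k * c) for y, c in zip(monthly, chi))
        co = 2 / n * sum(y * math.cos(k * c) for y, c in zip(monthly, chi))
        # A*sin(k*chi + phi) = A*cos(phi)*sin(k*chi) + A*sin(phi)*cos(k*chi)
        params += [math.hypot(s, co), math.atan2(co, s)]
    y0, a1, p1, a2, p2 = params
    fitted = [y0 + a1 * math.sin(c + p1) + a2 * math.sin(2 * c + p2) for c in chi]
    rmsd = math.sqrt(sum((y - f) ** 2 for y, f in zip(monthly, fitted)) / n)
    return y0, a1, p1, a2, p2, rmsd
```

With missing months the orthogonality no longer holds, and a general least-squares solve would be needed instead.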

Detailed Description of TOAR Human Health Metrics
The procedure for the calculation of the metrics (including data completeness and rounding) is described in Schultz et al. (2017). … (US EPA, 2006), as the 8-h running mean for a particular hour is based on the ozone concentration for that hour and the previous 7 hours. This means that, for example, the 8-h running mean for the hour starting at 00:00 on a particular day calculated using the EU method (preceding hours) uses the same 8 hourly ozone concentrations as the 8-h running mean for the hour starting at 17:00 on the previous day calculated using the US EPA method (following hours). This method for calculating 8-h running means has also been adopted elsewhere, e.g., in South African air quality standards (SANS, 2011). Hence, on a particular day, 7 of the 24 running 8-h ozone concentrations used to calculate the daily maximum 8-h ozone concentration differ between the two methods. Including both the EU and the US EPA protocols allows the two methods to be compared. Previous applications of the 4th highest 8-h average concentration metric to ozone measurements at European monitoring stations used the EU protocol to derive the 8-h running means (Tripathi et al., 2012).
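The 7-hour offset between the two protocols can be checked directly. This sketch assumes a continuous hourly series with `None` for missing hours and, as a simplified and assumed completeness rule, requires at least 6 valid hours per 8-h window; `running_8h` is a hypothetical name.

```python
def running_8h(hourly, protocol):
    """8-h running means of a continuous hourly series (None = missing).

    protocol="EU": the mean for hour h covers hours h-7..h (hour h and the 7 preceding).
    protocol="US": the mean for hour h covers hours h..h+7 (hour h and the 7 following).
    Windows that run off either end of the series, or that have fewer than
    6 valid hours (assumed completeness rule), are returned as None.
    """
    n = len(hourly)
    out = []
    for h in range(n):
        lo, hi = (h - 7, h + 1) if protocol == "EU" else (h, h + 8)
        if lo < 0 or hi > n:
            out.append(None)
            continue
        vals = [v for v in hourly[lo:hi] if v is not None]
        out.append(sum(vals) / len(vals) if len(vals) >= 6 else None)
    return out


# Two days of hourly values; index 24 is 00:00 on day 2, index 17 is 17:00 on day 1
hourly = [float(v) for v in range(48)]
eu = running_8h(hourly, "EU")
us = running_8h(hourly, "US")
print(eu[24], us[17])  # both average hours 17..24
```

The EU value labelled 00:00 and the US value labelled 17:00 on the previous day average the same eight hours; only the hour to which the mean is attributed differs.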

c) 4th Highest 8-h Average Concentration for Each Year
The form of this metric is based on the US EPA protocol adopted on 1 October 2015 (US Federal Register, 2015).
The US EPA's ozone NAAQS was changed from 0.075 ppm to 0.070 ppm (8-h averaging time, annual fourth-highest daily maximum averaged over 3 years) on 1 October 2015. The US EPA noted that, compared with (1) estimates of ozone exposures of concern and (2) estimates of ozone-induced lung function decrements, the Agency's conclusions concerning the level of the standard placed less confidence in epidemiologic-based risk estimates (US EPA, 2015). In establishing the new ozone standard, the US EPA indicated that it was confident that reducing the highest ambient ozone concentrations would result in substantial improvements in public health, including reducing the risk of ozone-associated mortality. The Agency noted that it was far less certain about the public health implications of changes in relatively low ambient ozone concentrations. The Agency therefore concluded that reducing precursor emissions to meet the new, lower ozone standard would result in important reductions in ozone concentrations in the highest part of the air quality distribution, where the scientific evidence provided the strongest support for adverse health effects.
Unlike under the form of the 2008 ozone standard (0.075 ppm 8-h average), the maximum daily 8-h average (MDA8) ozone concentration for a given day is the highest of the 17 consecutive 8-h averages, beginning with the 8-h period from 7:00 a.m. to 3:00 p.m. and ending with the 8-h period from 11:00 p.m. to 7:00 a.m. the following day (i.e., the 8-h averages starting at each hour from 7:00 a.m. to 11:00 p.m.). The MDA8 is determined for each day with ambient ozone monitoring data, including days outside the ozone monitoring season if those data are available. Because each day's windows span 7:00 a.m. to 7:00 a.m. the following day, the MDA8 values on two consecutive days have no hours in common, which prevents double counting of high overnight ozone events.
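The 17-window MDA8 and the annual 4th highest value can be sketched as follows. The completeness handling is a simplified assumption (at least 6 valid hours per 8-h window and at least 13 of the 17 windows per day), and the regulatory data-substitution and truncation rules are not reproduced; `mda8_epa2015` and `fourth_highest` are hypothetical names.

```python
def mda8_epa2015(hourly):
    """Daily maximum 8-h average from the 17 windows starting 07:00..23:00.

    `hourly` is a continuous series starting at 00:00 on day 0 (None = missing).
    The window starting at 23:00 uses the first 7 hours of the following day.
    Simplified completeness rules (assumptions): >= 6 valid hours per window,
    >= 13 of the 17 windows per day; otherwise the day is None.
    """
    n_days = len(hourly) // 24
    out = []
    for d in range(n_days):
        windows = []
        for start in range(d * 24 + 7, d * 24 + 24):
            chunk = hourly[start:start + 8]
            vals = [v for v in chunk if v is not None]
            if len(chunk) == 8 and len(vals) >= 6:
                windows.append(sum(vals) / len(vals))
        out.append(max(windows) if len(windows) >= 13 else None)
    return out


def fourth_highest(daily_values):
    """4th highest valid daily value in a year (None if fewer than 4 valid days)."""
    vals = sorted((v for v in daily_values if v is not None), reverse=True)
    return vals[3] if len(vals) >= 4 else None
```

Note that the last day of a series generally lacks the following-day hours its late windows need, so it may be reported as incomplete.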

d) Maximum daily 8-h average over the entire year.
For each day in a year, the maximum of the 24 8-h running means is calculated according to the EU protocol, and the highest of these daily values constitutes the maximum daily 8-h average over the entire year. The annual maximum daily 8-h average ozone value is the basis for the EU long-term ozone objective for the protection of human health, set at 120 µg m-3 (60 ppb) (European Council Directive 2008/50/EC). This metric is calculated for ozone data at all European sites archived in the EU AirBase data repository (http://www.eea.europa.eu/data-and-maps/data/aqereporting), and it has been used previously to assess temporal and spatial variation in health-relevant ozone across Europe (EEA, 2014a; EEA, 2014b; Guerreiro et al., 2014; Tripathi et al., 2012). This metric emphasizes the magnitude of the high-percentile ozone concentrations occurring during a given year, which generally result from regional or local photochemical episodes, rather than baseline concentrations (Royal Society, 2008). Additionally, this metric can be combined with another metric, the number of exceedances of daily maximum 8-h ozone values greater than 60 ppb (outlined below), to characterize both the magnitude of exceedance of the EU long-term ozone objective and the frequency with which exceedance occurs. Other international air quality guidelines and standards are also based on maximum daily 8-h average values, including the WHO air quality guideline (100 µg m-3, 50 ppb; WHO, 2006) and air quality standards in India (50 ppb; … 2015). Both the EU directive and the WHO guideline give limit values in µg m-3, with 1 ppb approximated as 2 µg m-3.
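A sketch of this metric together with the companion count of days above 60 ppb, assuming EU-style running means (hour h and the 7 preceding hours), an assumed 6-of-8 valid-hour rule, and that windows reaching back before the start of the series are simply skipped. The function names are hypothetical.

```python
def daily_max_8h_eu(hourly):
    """Daily maxima of EU-style 8-h running means (hour h and the 7 preceding hours).

    `hourly` is a continuous series starting at 00:00 on day 0 (None = missing).
    Windows before the start of the series are skipped (simplification);
    a window needs >= 6 valid hours (assumed completeness rule).
    """
    n_days = len(hourly) // 24
    out = []
    for d in range(n_days):
        means = []
        for h in range(d * 24, d * 24 + 24):
            if h - 7 < 0:
                continue
            vals = [v for v in hourly[h - 7:h + 1] if v is not None]
            if len(vals) >= 6:
                means.append(sum(vals) / len(vals))
        out.append(max(means) if means else None)
    return out


def annual_summary_eu(hourly, threshold_ppb=60.0):
    """(annual maximum daily 8-h average, number of days above threshold_ppb)."""
    daily = [v for v in daily_max_8h_eu(hourly) if v is not None]
    return max(daily), sum(v > threshold_ppb for v in daily)
```

Because EU windows reach back into the previous day, an evening episode can also raise the next day's daily maximum, which is one source of difference from the US EPA protocol.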

e) Maximum daily 1-h average over the entire year.
The annual maximum daily 1-h average ozone concentration is the highest 1-h ozone concentration measured at a site during the year. This metric can be used to compare health-relevant ozone at a site against the EU 'information threshold', set at 180 µg m-3 (90 ppb). The directive also sets an alert threshold of 240 µg m-3 (120 ppb). The EU information threshold has been established as the hourly ozone concentration 'beyond which there is a risk to human health from brief exposure for particularly sensitive sections of the population', and about which the public must be informed (European Council Directive 2008/50/EC). Previous studies have used the information threshold as a basis for assessing temporal and spatial variation in short-term peak ozone (EEA, 2014a; Jenkin, 2008). This metric quantifies the highest hourly average ozone concentration and provides no information about the distribution of ozone concentrations below the peak value. However, it can be combined with another metric, the number of exceedances of daily maximum 1-h ozone concentrations greater than 90 ppb (outlined below), to characterize both the maximum exceedance of the EU information threshold and the frequency with which exceedance occurs. Other international air quality standards are based on the maximum hourly value, including those established in India (…).

Subsequently, the World Health Organization (WHO) Review of Evidence on the Health Aspects of Air Pollution (REVIHAAP, 2013) synthesis report provided recommendations for quantifying ozone relevant to health effects associated with short-term exposure, which are in line with those used to calculate SOMO35 (and SOMO10, see below). Specifically, the SOMO35 metric is accumulated from daily maximum 8-h ozone concentrations over the whole year.
Associations between daily maximum 8-h ozone and mortality have been reported in winter as well as summer in 23 European cities (Gryparis et al., 2004) and in 21 East Asian cities (Chen et al., 2014). Calculation of SOMO35 also assumes a linear concentration-response relationship, which has been reported in a range of epidemiological studies (Atkinson et al., 2012; Bell and Dominici, 2008; Gryparis et al., 2004). However, other studies have reached different conclusions on the suitability of a linear concentration-response function (Stylianou and Nicolich, 2009). Finally, REVIHAAP also specified two cutoff values, 10 ppb (20 µg m-3; see the SOMO10 metric description below) and 35 ppb (70 µg m-3), as 'the epidemiological evidence on linearity does not extend down to zero'. The recommendation of two cutoffs reflected the 'not consistent' evidence for a threshold for short-term exposure, and the cutoffs are similar to no-effect levels reported in several epidemiological studies (Gryparis et al., 2004; Pattenden et al., 2010). The SOMO35 metric is a standard statistic calculated on ozone time series archived within the EU AirBase data repository. The magnitude of the SOMO35 metric is determined both by the magnitude of ozone values above 35 ppb measured at a site and by the frequency with which these values occur.

The Sum Of Means Over 10 ppb (SOMO10) metric is the annual sum of the positive differences between the daily maximum 8-h average ozone values and a cutoff of 10 ppb (20 µg m-3), calculated for all days in a year. The SOMO10 metric is calculated in the same way as SOMO35, but with the lower cutoff concentration, and reflects the epidemiological evidence of associations between short-term ozone exposure and lower ozone concentrations, as summarized in REVIHAAP (2013).
For example, associations between ozone and mortality have been reported at concentrations as low as 10 ppb (Bell and Dominici, 2008; Kim et al., 2004; Stylianou and Nicolich, 2009). Similarly, Bell et al. (2006) concluded that 'a "safe" ozone level can only exist at very low concentrations'. Hence SOMO10 quantifies health-relevant ozone across a larger part of the ozone distribution than SOMO35.
Both the SOMO10 and SOMO35 metrics attempt to quantify short-term health-relevant ozone across the range of concentrations that contribute to this impact. The magnitude of each metric is determined by the trends occurring across the ozone distribution above 10 ppb (SOMO10) and above 35 ppb (SOMO35), respectively. Hence, opposing trends in different parts of the ozone distribution act on SOMO10 and SOMO35 in opposite directions. For example, if a decrease in relatively high percentile ozone concentrations coincides with an increase in relatively low ozone concentrations, SOMO10 and SOMO35 reflect the net change in ozone exposure resulting from both trends. The two cutoff levels reflect the inconsistent evidence on the shape of the ozone concentration-response relationship for health effects associated with short-term exposure.
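Both metrics reduce to one function of the daily maximum 8-h values and the cutoff. This sketch works in ppb-days, skips missing days (`None`), and does not apply any scaling for incomplete data coverage (an assumption); `somo` is a hypothetical name.

```python
def somo(mda8_values, cutoff_ppb):
    """Sum of daily positive differences (MDA8 - cutoff) over all valid days, in ppb-days."""
    return sum(max(v - cutoff_ppb, 0.0) for v in mda8_values if v is not None)


# Two hypothetical two-day "years": the high day falls by 10 ppb, the low day rises by 10 ppb
year_a = [60.0, 20.0]
year_b = [50.0, 30.0]
print(somo(year_a, 35), somo(year_b, 35))  # SOMO35 decreases
print(somo(year_a, 10), somo(year_b, 10))  # SOMO10 is unchanged
```

The toy example illustrates the point made above: the increase at the low end lies below the 35 ppb cutoff, so SOMO35 registers only the decrease, whereas SOMO10 sees both changes and they cancel.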
Controlled human laboratory studies have shown that higher hourly average ozone concentrations elicit a disproportionately greater pulmonary function response than lower hourly average values; thus, a nonlinear relationship exists between ozone dose and pulmonary function (FEV1) response (Hazucha and Lefohn, 2007; Lefohn et al., 2010b; US Federal Register, 2015). Lefohn et al. (2010b) reanalyzed data from five controlled human ozone exposure experiments reported by Hazucha et al. (1992), Adams (2003, 2006a, 2006b), and Schelegle et al. (2009). These investigators exposed subjects (healthy young adults) to multi-hour variable/stepwise ozone concentration profiles that mimicked typical diurnal patterns of ambient ozone concentrations. Lefohn et al. (2010b) reported a common response pattern across most of the studies, which provided the basis for developing a lung function (FEV1)-based 4th highest W90 5-h cumulative exposure index. Based on the reanalysis of the realistic exposure profiles used in these experiments, they proposed an alternative form of the human health 8-h standard, similar to the W126 exposure index (Lefohn et al., 1988) that the US EPA currently uses to assess vegetation effects. The W90 exposure index assigns lower weights than the W126 metric to hourly average values at and below 50 ppb.
The W90 index is $\mathrm{W90} = \sum_i w_i C_i$, with weight $w_i = 1/[1 + M \exp(-A\,C_i/1000)]$, where $M = 1400$, $A = 90$, and $C_i$ is the hourly average ozone mixing ratio in ppb. The M and A constants were derived based on the observations noted in Lefohn et al. (2010b). The W90 index has units of ppb-hrs. The W90 metric is a non-threshold index, described as the sigmoidally weighted sum of all hourly ozone values, where each hourly ozone mixing ratio is given a weight that increases from zero to one with increasing concentration. The W90 metric is based on the range of typical hourly average values experienced under ambient conditions.
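The W90 weighting and sum defined above translate directly into code. The helper `max_5h_w90` is hypothetical: the text does not specify how the 5-h cumulative values are aggregated into the "4th highest" statistic, so it only illustrates a 5-h cumulative W90 over consecutive complete hours.

```python
import math


def w90_weight(c_ppb, m=1400.0, a=90.0):
    """Sigmoidal weight w = 1 / (1 + M*exp(-A*C/1000)), with C in ppb."""
    return 1.0 / (1.0 + m * math.exp(-a * c_ppb / 1000.0))


def w90_index(hourly_ppb):
    """Sum of w_i * C_i over the supplied hours (ppb-hrs); missing hours skipped."""
    return sum(w90_weight(c) * c for c in hourly_ppb if c is not None)


def max_5h_w90(hourly_ppb):
    """Hypothetical helper: largest W90 sum over any 5 consecutive complete hours."""
    sums = [w90_index(hourly_ppb[i:i + 5])
            for i in range(len(hourly_ppb) - 4)
            if None not in hourly_ppb[i:i + 5]]
    return max(sums) if sums else None


for c in (20, 50, 100):
    print(c, round(w90_weight(c), 4))
```

Printing the weights at a few concentrations shows the sigmoidal shape: values near 20 ppb receive almost no weight, while the weight approaches one only well above 100 ppb.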
i) The annual and seasonal percentiles (median, 5th, 25th, 75th and 95th) of hourly average concentrations over the full 24-h day.
Ozone is influenced by multiple factors (e.g., precursor concentrations, local meteorology, transport of air masses, and deposition). These factors determine the production, loss and transport of ozone, and hence the concentrations and frequency distribution of ozone at any site. Changes in precursor emissions, meteorology, transport, etc., can alter the frequency distribution of ozone concentrations. Therefore, investigating spatial and temporal changes in this frequency distribution can help identify why hourly average concentrations at certain sites change. The 5th, 25th, 50th (median), 75th and 95th percentiles of hourly average ozone concentrations are the concentrations below which 5%, 25%, 50%, 75%, and 95% of the hourly values, respectively, fall. Such statistics can be used as ozone metrics for assessing impacts on human health, vegetation, and climate change. These metrics can be calculated from hourly average ozone observations over different time intervals; in the TOAR assessment, for example, annual and seasonal percentiles are calculated.
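A sketch of the annual and seasonal percentile metrics using the standard library. The interpolation rule (`method="inclusive"`, linear interpolation between order statistics) and the DJF/MAM/JJA/SON season definition are assumptions; TOAR may use a different percentile convention, and December is grouped with January and February of the same calendar year here for simplicity. The names `percentile_summary` and `seasonal_percentiles` are hypothetical.

```python
from statistics import quantiles

SEASON_OF_MONTH = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def percentile_summary(hourly_ppb):
    """5th, 25th, 50th, 75th and 95th percentiles of the valid hourly values."""
    vals = sorted(v for v in hourly_ppb if v is not None)
    # n=100 returns the 99 cut points P1..P99; "inclusive" interpolates
    # linearly between order statistics (assumed convention)
    cuts = quantiles(vals, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in (5, 25, 50, 75, 95)}


def seasonal_percentiles(records):
    """records: iterable of (month, hourly value) pairs; one summary per season."""
    groups = {}
    for month, value in records:
        groups.setdefault(SEASON_OF_MONTH[month], []).append(value)
    return {season: percentile_summary(vals) for season, vals in groups.items()}
```

Tracking all five percentiles separately over time is what allows divergent trends at the low and high ends of the distribution to be seen, which a single mean would hide.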
Annual and seasonal percentiles (median, 5th, 25th, 75th and 95th) summarize ozone concentrations at five key points across the frequency distribution of hourly average ozone concentrations within a year and a season, respectively. Long-term changes in these percentile metrics help in assessing how factors such as emission changes, global climate, and long-range transport affect ozone levels (Lefohn et al., 2010b). Long-term changes in the concentrations of ozone precursors can cause trends in different parts of the frequency distribution of ozone concentrations (Lefohn et al., 1998, 2010; Brönnimann et al., 2002; Xu et al., 2008; Simon et al., 2015), and these trends are not necessarily consistent with one another. Therefore, studying long-term variations of ozone using these percentile metrics can help avoid the misinterpretation of causes that may arise when relying on a single summary statistic (e.g., the mean ozone concentration). As indicated in Section 1, shifting distributions affect human health and vegetation metrics. Shifts among the hourly average concentrations can result from increased or reduced NOx titration, as well as from changes in background ozone mixing ratios (Lefohn and Cooper, 2015). Many studies indicate that the responses of human health and vegetation (injury and damage) to ozone exposure are often nonlinear (Hazucha and Lefohn, 2007; Kickert and Krupa, 1991; Lefohn et al., 1988; Massman et al., 2000), so that different weighting factors should be applied to hourly average ozone concentrations. Seasonal and annual percentile ozone metrics provide important data for this weighting in exposure-response studies.

Numbers of exceedances of daily maximum 1-h values greater than 90, 100, and 120 ppb per year indicate the yearly number of non-attainment occurrences if the threshold values for a daily maximum 1-h ozone standard were 90, 100, and 120 ppb, respectively.
In health effect studies, the 1-h average is often used in the parameterization of exposure-response relationships (Hazucha and Lefohn, 2007). There is no evidence of a population threshold for ozone below which no effect is measurable. However, it is clear that higher hourly average ozone concentrations cause greater physiological responses (US EPA, 2013). Therefore, lowering the daily maximum hourly ozone level is one way to reduce the adverse effects of ozone on human health. Daily maximum 1-h values were widely used as the basis of ozone air quality standards before daily maximum 8-h values were introduced. In 1979, the US EPA adopted a daily maximum 1-h value of 120 ppb as the air quality standard for ozone. This 1-h standard was revoked by the US EPA in 2005, but some areas have continuing obligations under it (https://www.epa.gov/criteria-air-pollutants/naaqs-table). In some countries, the daily maximum 1-h value is still used as the ozone standard. For example, Japan uses a daily maximum 1-h value of 60 ppb as its ozone standard (http://www.env.go.jp/en/air/aq/aq.html). China uses both a daily maximum 8-h value (75 ppb) and a daily maximum 1-h value (93 ppb) as ozone standards for both residential and commercial areas.
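The annual maximum 1-h value and the exceedance counts at 90, 100, and 120 ppb both derive from the daily 1-h maxima. This sketch assumes a continuous hourly series starting at 00:00 with `None` for missing hours, treats any day with at least one valid hour as valid (a simplification), and counts values strictly greater than each threshold; the function names are hypothetical.

```python
def daily_max_1h(hourly_ppb):
    """Daily maximum hourly value for each complete 24-h block (None if no valid hour)."""
    days = [hourly_ppb[i:i + 24] for i in range(0, len(hourly_ppb) - 23, 24)]
    return [max((v for v in d if v is not None), default=None) for d in days]


def exceedance_counts(hourly_ppb, thresholds=(90, 100, 120)):
    """Number of days whose daily 1-h maximum is strictly greater than each threshold."""
    dmax = [v for v in daily_max_1h(hourly_ppb) if v is not None]
    return {t: sum(v > t for v in dmax) for t in thresholds}


# Three hypothetical days with peaks of 40, 95 and 130 ppb
hourly = [40.0] * 24 + [95.0] + [40.0] * 23 + [130.0] + [40.0] * 23
print(max(v for v in daily_max_1h(hourly) if v is not None), exceedance_counts(hourly))
```

The annual maximum alone would report only the 130 ppb peak; the counts add how often each level is exceeded.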

l) Running mean of the 3-month average of the daily 1-h maximum.
Short-term increases in ozone have been linked to a wide array of health responses, including increases in daily mortality (Thurston and Ito, 2001; Bell et al., 2007; Bell et al., 2004). Studies of the impacts of chronic exposure, which is generally thought to have the greatest population health impact, are less common. One of the most influential studies of the impact of chronic ozone exposure on mortality is an analysis of the American Cancer Society (Cancer Prevention Study II) cohort of nearly 500,000 individuals residing in 96 Metropolitan Statistical Areas across the U.S. (Jerrett et al., 2009). This study identified ozone as a risk factor for mortality from respiratory diseases that was independent of co-exposure to particulate matter. The quantitative relationship between ozone exposure and mortality from this study is the basis for the estimates of disease burden attributable to long-term ozone exposure in the Global Burden of Disease (GBD) analyses (Lim et al., 2012; Forouzanfar et al., 2015).

Jerrett et al. (2009) evaluated the risk of mortality associated with the averages of the second (April through June) and third (July through September) quarters of the year.
Since the ozone (summer) season varies across the globe, estimates of ozone's contribution to the global disease burden (Lim et al., 2012; Forouzanfar et al., 2015) calculated a running 3-month average of daily 1-h maximum values for each (0.1 × 0.1) grid cell over a full year and selected the maximum of these values, to best approximate a location-specific seasonal ozone average consistent with the above epidemiologic analyses of chronic ozone exposure impacts on mortality (Brauer et al., 2012, 2016). Results from the ACS CPS-II analyses have been widely applied to quantify premature deaths associated with long-term ozone exposure (Anenberg et al., 2010; Forouzanfar et al., 2015; Lim et al., 2012; Shindell et al., 2016). The TOAR long-term trend results (1995-2014) indicate that this human health metric appears to be more closely associated with the higher hourly concentrations within the distribution than with the entire distribution (see Section 4). Together with this metric, TOAR reports the day of the year on which the running 3-month average reaches its maximum value.
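The metric can be sketched as a sliding mean over daily 1-h maxima. The 91-day window length, the trailing (end-of-window) alignment used to report the day, and the rule that any missing day invalidates a window are all assumptions, as is the name `max_running_3month`; windows crossing the year boundary (relevant to Southern Hemisphere summers) are not handled.

```python
def max_running_3month(daily_max_1h, window=91):
    """Maximum running `window`-day mean of daily 1-h maxima, and its day index.

    The returned index is the last day of the best window (0 = first day of
    the series, so day of year = index + 1 for a series starting on 1 January).
    Any None inside a window makes that window invalid (assumption).
    """
    best, best_day = None, None
    for end in range(window - 1, len(daily_max_1h)):
        chunk = daily_max_1h[end - window + 1:end + 1]
        if None in chunk:
            continue
        mean = sum(chunk) / window
        if best is None or mean > best:
            best, best_day = mean, end
    return best, best_day
```

Because the reported day depends on whether the window is aligned to its start, centre, or end, the alignment convention must match the one used in the TOAR database before day-of-year values are compared.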

a-h) W126
In 1985, A.S. Lefohn developed the form of the W126 ozone exposure index as an ozone metric closely related to vegetation response. Lefohn and Runeckles (1987) proposed the use of a sigmoidally weighted index for assessing vegetation effects, based on evidence indicating that higher concentrations are relatively more important than mid and lower values in affecting vegetation (Musselman et al., 1983). Lefohn et al. (1988) mathematically described and applied the W126 exposure index to develop exposure-response relationships. The W126 metric is a non-threshold index, described as the sigmoidally weighted sum of all hourly ozone concentrations observed during a specified daily and seasonal time window, where each hourly ozone concentration is given a weight that increases from zero to one with increasing level. The index is $\mathrm{W126} = \sum_i w_i C_i$, with weight $w_i = 1/[1 + M \exp(-A\,C_i/1000)]$, where $M = 4403$, $A = 126$, and $C_i$ is the hourly average ozone mixing ratio in ppb. The M and A constants were chosen to weight hourly average levels (1) near one at ≥ 100 ppb and (2) very low below 40 ppb. The low weighting below 40 ppb was based on the assumption at the time that hourly average background ozone mixing ratios were mostly below this value (US EPA, 2006). As is recognized today, hourly average concentrations associated with background ozone can, at limited times and locations, be significantly higher as a result of stratospheric-tropospheric transport to the surface (Lefohn et al., 2011, 2012, 2014; Emery et al., 2012; Lin et al., 2012; US Federal Register, 2015). For the vegetation (W126) and human health (W90) sigmoidally weighted exposure indices, the weightings are similar except at the lower levels (compare Figures 2 and 3).
The US EPA (US Federal Register, 2015) concluded that ozone effects in plants were cumulative; that higher ozone concentrations appeared to be more important than lower concentrations in eliciting a response; that plant sensitivity to ozone varied with time of day and plant development stage; and that quantifying exposure with indices that cumulate hourly ozone concentrations and preferentially weight the higher concentrations improved the explanatory power of exposure-response models for growth and yield, compared with indices based on mean and peak exposure values. Because the W126 index weights higher exposures more heavily than mid- and low-level exposures, the US EPA endorsed its use. The Agency concluded that protection of vegetation from adverse effects can be provided by an 8-h ozone standard of 70 ppb that limits cumulative 3-month seasonal W126 exposures to 17 ppm-hrs or lower. The 70 ppb 8-h ozone standard adopted in the US EPA's 2015 decision (US Federal Register, 2015) thus serves as a surrogate for achieving W126 values at or below 17 ppm-hrs. The US EPA applies the three-year average of the annual 4th highest daily maximum 8-h average to determine whether the standard is exceeded.
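A sketch of a 3-month W126 value in ppm-hrs, following the common US EPA convention (assumed here) of weighting hours from 08:00 to 19:59, summing by month, and taking the largest sum over three consecutive months. The regulatory adjustment of monthly sums for missing data and the three-year averaging are not reproduced; the function names are hypothetical.

```python
import math


def w126_weight(c_ppb, m=4403.0, a=126.0):
    """Sigmoidal weight w = 1 / (1 + M*exp(-A*C/1000)), with C in ppb."""
    return 1.0 / (1.0 + m * math.exp(-a * c_ppb / 1000.0))


def monthly_w126(hourly_by_day):
    """Weighted sum (ppb-hrs) over 08:00-19:59 for one month.

    hourly_by_day: list of 24-value lists (None = missing); no missing-data
    adjustment is applied (assumption).
    """
    return sum(w126_weight(c) * c
               for day in hourly_by_day
               for c in day[8:20] if c is not None)


def max_3month_w126_ppm_hrs(monthly_sums_ppb_hrs):
    """Largest sum over 3 consecutive months, converted from ppb-hrs to ppm-hrs."""
    return max(sum(monthly_sums_ppb_hrs[i:i + 3])
               for i in range(len(monthly_sums_ppb_hrs) - 2)) / 1000.0
```

The resulting value can be compared with the 17 ppm-hrs level discussed above; because low concentrations receive very small weights, the sum is dominated by hours well above about 60 ppb.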

i-r) AOT40
AOT40 is the sum of the differences between the hourly mean ozone concentration at the top of the canopy and 40 ppb, accumulated over all daylight hours within a specified period in which the hourly mean exceeds 40 ppb. In the 1990s, in response to a growing understanding that plants respond to ozone accumulated above a threshold rather than to a long-term average (Fuhrer et al., 1997), the Convention on Long-range Transboundary Air Pollution (CLRTAP) developed and recommended AOT40 as a threshold-based metric for ozone risk assessment in European plant ecosystems (described in CLRTAP, 2017). In recent years, the CLRTAP has adopted the flux-based metric PODY (see Section 2.3.4) in preference to AOT40, as this metric has greater biological relevance and is better correlated with field evidence of effects (Mills et al., 2011a). It was not feasible to include PODY metrics in TOAR. AOT40 is used as a legislative standard in Europe (Directive 2008/50/EC), accumulated over a standard daily time window (0800-1959 h) and a standard period (May to July). The CLRTAP has established AOT40-based critical levels for crops, grasslands and forests using vegetation-specific time intervals and AOT40 accumulated during daylight hours. When applied at the global level, a variety of accumulation windows are required (see TOAR-Vegetation and http://www.igacproject.org/activities/TOAR).
For all the AOT40 TOAR metrics, the following calculation steps were made:
Step 1: Determine the receptor-specific accumulation period.
Step 2: Collate the hourly mean ozone values over the accumulation period.
Step 3: For each hourly mean value within the selected hours, that is, daylight hours (when global radiation is > 50 W m-2), specific hours of the day (0800-1959) or nighttime hours (when global radiation is < 5 W m-2), subtract 40 ppb, and then sum the positive differences (hours at or below 40 ppb contribute zero).
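Step 3 can be sketched for the three hour-selection variants, assuming Steps 1 and 2 have already produced parallel lists for the accumulation period (hourly ozone in ppb, local hour of day, and radiation in W m-2). Radiation is taken as an input rather than estimated, and no correction for missing hours or canopy height is applied (assumptions); `aot40` is a hypothetical name.

```python
def aot40(hourly_ppb, hours_of_day, radiation_wm2=None, mode="fixed"):
    """AOT40 (ppb-h) over an accumulation period already selected in Steps 1-2.

    mode="fixed":    hours 08:00-19:59 local time
    mode="daylight": hours with radiation > 50 W m-2
    mode="night":    hours with radiation < 5 W m-2
    Missing hours (None) are skipped without any scaling (assumption).
    """
    total = 0.0
    for i, c in enumerate(hourly_ppb):
        if c is None:
            continue
        if mode == "fixed":
            include = 8 <= hours_of_day[i] <= 19
        elif mode == "daylight":
            include = radiation_wm2[i] > 50.0
        elif mode == "night":
            include = radiation_wm2[i] < 5.0
        else:
            raise ValueError(f"unknown mode: {mode}")
        if include:
            total += max(c - 40.0, 0.0)  # only the excess above 40 ppb counts
    return total


# Four hypothetical hours: (ozone ppb, hour of day, radiation W m-2)
ozone = [30.0, 50.0, 60.0, 45.0]
hours = [7, 8, 12, 20]
rad = [60.0, 10.0, 300.0, 2.0]
print(aot40(ozone, hours, mode="fixed"), aot40(ozone, hours, rad, "daylight"), aot40(ozone, hours, rad, "night"))
```

The same hourly data give different AOT40 values under each selection rule, which is why the fixed-hour, daylight, and nighttime variants are reported as separate metrics.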
For the purposes of the TOAR database, the AOT40 ozone metric was calculated using the reported hourly ozone values at each site, and no adjustment was made to account for the canopy height of vegetation types. This contrasts with guidelines applied regionally (e.g. in the EU), which adjust to typical canopy heights (CLRTAP, 2017).
i) AOT40 (3-month, 0800-1959 h)
The timing of the two three-month accumulation periods for agricultural crops should reflect the period of active growth of wheat and rice, respectively, and be centered on the timing of anthesis, as summarized in TOAR-Vegetation and http://www.igacproject.org/activities/TOAR.
The timing of the start of the growing season for horticultural crops is more difficult to define because these crops are repeatedly sown over several months in many regions. For local application, appropriate 3-month periods should be selected.
This metric does not apply to forests.
j) AOT40 (6-month, 0800-1959 h)
This metric typically applies to forests and long-lived perennial vegetation, such as grasslands. The default exposure windows for the accumulation are suggested in TOAR-Vegetation. These time periods do not take altitudinal variation into account, should be viewed as indicative only, and should only be used where local information is not available.
l) AOT40 (12-month, 0800-1959 h)
This metric is calculated as stated above, but the accumulation period is the entire year. It applies to species with a year-long growing season (e.g., Mediterranean evergreen forests, tropical and subtropical moist climate forests).

m) AOT40 (3-month, daylight over the period when clear sky radiation > 50 W m-2)
This metric is calculated as i), but rather than being accumulated over the hours from 08:00 to 19:59, it is restricted to those hours when clear sky radiation exceeds 50 W m-2. This value is recommended by CLRTAP (2017).

n) AOT40 (6-month, daylight over the period when clear sky radiation > 50 W m-2)
This metric is calculated as j) and with the same rationale as m).

o) AOT40 (7-month, daylight over the period when clear sky radiation > 50 W m-2)
This metric is calculated as k) and with the same rationale as m).

p) AOT40 (3-month, nighttime over the period when clear sky radiation < 5 W m-2)
This metric is calculated as m), but includes only nighttime hours, when stomata are expected to be closed. It is restricted to those hours when clear sky radiation is below 5 W m-2. Although a threshold of 0 W m-2 would be more biologically based, the arbitrary 5 W m-2 threshold ensures that only nighttime values are included and is easily handled in a large database. Radiation values are again estimated according to Blanco-Muriel et al. (2001); the condition "irradiance < 5 W m-2" is similar to the condition "solar elevation angle < -5 degrees" and excludes dawn and dusk hours entirely.
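
q) AOT40 (6-month, nighttime over the period when clear sky radiation < 5 W m⁻²)
This metric is calculated as n) and with the same rationale as p).

r) AOT40 (7-month, nighttime over the period when clear sky radiation < 5 W m⁻²)
This metric is calculated as o) and with the same rationale as p).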
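Because daylight hours require clear-sky radiation above 50 W m⁻² and nighttime hours require it below 5 W m⁻², the hours in between (dawn and dusk) enter neither the daylight variants (m) onward) nor the nighttime variants (p) onward). A minimal sketch that makes this explicit, assuming the clear-sky radiation for each hour has already been estimated and that the input contains only the hours inside the chosen window. The functions are hypothetical, and the "twilight" bucket is an illustrative diagnostic, not a TOAR metric.

```python
AOT40_THRESHOLD_PPB = 40.0
DAYLIGHT_MIN_WM2 = 50.0  # daylight: clear-sky radiation > 50 W m-2
NIGHT_MAX_WM2 = 5.0      # nighttime: clear-sky radiation < 5 W m-2


def classify_hour(rad):
    """Assign an hour to 'day', 'night', or the excluded 'twilight' band."""
    if rad > DAYLIGHT_MIN_WM2:
        return "day"
    if rad < NIGHT_MAX_WM2:
        return "night"
    return "twilight"  # 5 <= rad <= 50: counted in neither daylight nor nighttime AOT40


def aot40_by_period(hours):
    """Return AOT40 (ppb h) per period from (ozone_ppb, clear_sky_wm2) pairs."""
    totals = {"day": 0.0, "night": 0.0, "twilight": 0.0}
    for o3, rad in hours:
        if o3 is None or rad is None:
            continue
        totals[classify_hour(rad)] += max(o3 - AOT40_THRESHOLD_PPB, 0.0)
    return totals


sample = [(60.0, 300.0), (55.0, 20.0), (48.0, 0.0)]
print(aot40_by_period(sample))  # {'day': 20.0, 'night': 8.0, 'twilight': 15.0}
```

In this example, the 15 ppb h accumulated during the dawn/dusk hour would appear in neither the daylight nor the nighttime AOT40.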

s-v) Daily 12-h Average
Besides AOT40, the daily 12-h (0800-1959 h) mean ozone exposure metric (M12) has been widely used to characterize crop exposures and to establish crop-specific exposure-response relationships, which relate a quantifiable mean to a reduction in crop yield (Heck et al., 1988; Jäger et al., 1992; Legge et al., 1995; Van Dingenen et al., 2009). These studies resulted in the derivation of robust exposure-response relationships for a number of key agricultural crops. Notably, the growing-season time window on which M12 exposure-response relationships are based depends on vegetation type and climate zone. However, in post-experimental data analyses, cumulative metrics such as the SUM06 and W126 indices (US EPA, 2013; US Federal Register, 2015) better fit the yield-loss observations for experiments conducted in the US, and thus received greater focus (Tingey et al., 1991; Lefohn and Foley, 1992; Mauzerall & Wang, 2001). As a result, fewer studies use M12 as an exposure metric for assessing crop or tree injury and damage.

s) Daily 12-h average averaged over 3 months (0800-1959 h)
Similarly to AOT40, this metric is relevant for agricultural crops; the same timings of the representative growing seasons for wheat and rice summarized in TOAR-Vegetation and http://www.igacproject.org/activities/TOAR were used to derive this metric.
t) Daily 12-h average averaged over 6 months (0800-1959 h) (monthly periods specified in accompanying documentation)
This is a typical time window for deciduous forests and semi-natural vegetation. The default exposure windows are shown in TOAR-Vegetation and http://www.igacproject.org/activities/TOAR. This metric does not apply to forests whose growing period is shorter than 6 months (e.g., boreal forest).

u) Daily 12-h average averaged over 7 months (0800-1959 h) (monthly periods specified in accompanying documentation)
This is a typical time window for deciduous forests and semi-natural vegetation in temperate moist or subtropical climate zones. The default exposure windows are shown in TOAR-Vegetation and http://www.igacproject.org/activities/TOAR. This metric does not apply to forests whose growing period is shorter than 7 months (e.g., boreal forest).

v) Daily 12-h average averaged over 12 months (0800-1959 h)
This is the typical time window for evergreen forests and semi-natural vegetation in Mediterranean and subtropical climate zones.
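For the M12 metrics s) to v), a minimal sketch, assuming M12 is obtained by first averaging the 08:00-19:59 hourly values of each day and then averaging those daily means over the selected window, with hourly data supplied as (timestamp, ozone in ppb) pairs. The function name `m12` and the `min_hours_per_day` completeness rule are hypothetical placeholders; the data-capture criteria actually used in TOAR are not reproduced here.

```python
from collections import defaultdict
from datetime import date, datetime


def m12(records, start, end, min_hours_per_day=1):
    """Mean of daily 08:00-19:59 ozone means (ppb) over [start, end].

    records: iterable of (datetime, ozone_ppb); None values are skipped.
    Days with fewer than min_hours_per_day valid hours are dropped
    (hypothetical completeness rule). Returns None if no day qualifies.
    """
    by_day = defaultdict(list)
    for ts, o3 in records:
        if o3 is None:
            continue
        if start <= ts.date() <= end and 8 <= ts.hour <= 19:
            by_day[ts.date()].append(o3)
    daily_means = [
        sum(v) / len(v) for v in by_day.values() if len(v) >= min_hours_per_day
    ]
    if not daily_means:
        return None
    return sum(daily_means) / len(daily_means)


records = (
    [(datetime(2014, 6, 1, h), 40.0) for h in range(8, 20)]
    + [(datetime(2014, 6, 1, 3), 100.0)]  # night-time hour, excluded from M12
    + [(datetime(2014, 6, 2, h), 60.0) for h in range(8, 20)]
)
print(m12(records, date(2014, 5, 1), date(2014, 7, 31)))  # 50.0
```

Because each day is averaged first, a day with only a few valid hours carries the same weight as a complete day, which is why some completeness rule is needed in practice.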

w) Stomatal flux
Since the late 1990s, the CLRTAP in Europe has recognized that ozone effects on vegetation are related to the uptake of ozone through the stomatal pores of plants. The DO3SE model (Emberson et al., 2000) was adopted by CLRTAP for calculating the accumulated stomatal flux of ozone from hourly ozone values together with conductance-modifying factors: temperature, vapor pressure deficit, light (irradiance), soil water potential (SWP) or plant available water (PAW), ozone level, and plant development stage (phenology). For this metric, the hourly mean instantaneous stomatal flux of ozone based on the projected leaf area (PLA), F_st (in nmol m⁻² PLA s⁻¹), is accumulated above a stomatal flux threshold of Y nmol m⁻² PLA s⁻¹. The Phytotoxic Ozone Dose above a flux threshold of Y (PODY, formerly named AFstY), i.e., the accumulated stomatal flux, is calculated for the appropriate time window as the sum over time of the differences between the hourly mean values of F_st and Y for the periods when F_st exceeds Y. The threshold Y varies between species, as do the parameterizations of each flux-modifying factor (e.g., temperature or soil moisture), reflecting the different stomatal dynamics of different species. Two types of PODY model exist: PODYIAM, which has a simplified parameterization and is suitable for large-scale integrated assessment, and PODYSPEC, which uses a species-specific parameterization of the flux model. Local and regional parameterizations of PODYSPEC and PODYIAM have been defined in CLRTAP (2017) for a range of crop, tree, and grassland species or species groups, and used to define 21 critical levels above which negative effects of ozone on crop yield, biodiversity, and tree growth are expected (CLRTAP, 2017).

x) Seasonal Percentiles (median, 5th, 25th, 75th, 95th, 98th, and 99th)
The magnitude of each exposure and dose metric is influenced by the hourly average concentrations occurring within the ozone distribution. As outlined in Sections 1 and 2, the changes in the magnitude and trend pattern of exposure indices relevant for both vegetation and human health are determined by the relationship between each metric and the changes that occur in different parts of the ozone distribution. Shifts among the hourly average levels can result from changes (increases or reductions) in the extent of NOx titration, photochemical ozone production, background ozone levels, and ozone deposition (Lefohn and Cooper, 2015; Monks et al., 2015). The seasonal 5th, 25th, 50th, 75th, 95th, 98th, and 99th percentile values of hourly ozone levels are of interest so that each percentile can be trended on a seasonal basis. This information is very helpful in relating the changes in the exposure and dose metrics to the changes occurring across the ozone distribution in different seasons.
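A minimal sketch of how seasonal percentiles can be computed for each year and then trended with the Theil-Sen estimator, the trend-magnitude method named for TOAR. It assumes the hourly values for one season of each year are already grouped in a dict keyed by year, and it uses linear interpolation between sorted values for the percentiles (the same convention as NumPy's default); the exact percentile convention used in TOAR is not stated here, so this choice is an assumption. The Mann-Kendall significance test is not shown, and all function names are hypothetical.

```python
from itertools import combinations
from statistics import median

PERCENTILES = (5, 25, 50, 75, 95, 98, 99)


def percentile(values, p):
    """p-th percentile by linear interpolation between sorted values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no data")
    rank = (len(xs) - 1) * p / 100.0
    lo = int(rank)  # floor, since rank >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def seasonal_percentiles(hourly_by_year, ps=PERCENTILES):
    """{year: [hourly ozone for one season]} -> {year: {p: value}}."""
    return {yr: {p: percentile(v, p) for p in ps} for yr, v in hourly_by_year.items()}


def theil_sen_slope(xs, ys):
    """Median of all pairwise slopes; robust to a few outlying years."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    return median(slopes)


# Illustrative values (not TOAR data): one outlying year barely moves the slope.
years = [2010, 2011, 2012, 2013, 2014]
p95 = [10.0, 12.0, 14.0, 16.0, 100.0]
print(theil_sen_slope(years, p95))  # 2.0
```

In use, one would build a yearly series for each percentile from `seasonal_percentiles` and pass it to `theil_sen_slope`, giving a separate trend for each part of the distribution.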

Table S-4:
Percentage of sites included in TOAR for which the trends (from 1995 through 2014) in the TOAR exposure metrics (columns) were in the same direction (i.e., decreasing, increasing, or no significant change) compared to a set of metrics (rows), which included those in Lefohn et al. (2017).