Regional trend analysis of surface ozone observations from monitoring networks in eastern North America, Europe and East Asia

Surface ozone is a greenhouse gas and pollutant detrimental to human health and crop and ecosystem productivity. The Tropospheric Ozone Assessment Report (TOAR) is designed to provide the research community with an up-to-date observation-based overview of tropospheric ozone’s global distribution and trends. The TOAR Surface Ozone Database contains ozone metrics at thousands of monitoring sites around the world, densely clustered across mid-latitude North America, western Europe and East Asia. Calculating regional ozone trends across these locations is challenging due to the uneven spacing of the monitoring sites across urban and rural areas. To meet this challenge we conducted a spatial and temporal trend analysis of several TOAR ozone metrics across these three regions for summertime (April–September) 2000–2014, using the generalized additive mixed model (GAMM). Our analysis indicates that East Asia has the greatest human and plant exposure to ozone pollution among investigating regions, with increasing ozone levels through 2014. The results also show that ozone mixing ratios continue to decline significantly over eastern North America and Europe, however, there is less evidence for decreases of daytime average ozone at urban sites. The present-day spatial coverage of ozone monitors in East Asia (South Korea and Japan) and eastern North America is adequate for estimating regional trends by simply taking the average of the individual trends at each site. However the European network is more sparsely populated across its northern and eastern regions and therefore a simple average of the individual trends at each site does not yield an accurate regional trend. This analysis demonstrates that the GAMM technique can be used to assess the regional representativeness of existing monitoring networks, indicating those networks for which a regional trend can be obtained by simply averaging the trends of all individual sites and those networks that require a more sophisticated statistical approach.


Introduction
Tropospheric ozone is a greenhouse gas and pollutant detrimental to human health and crop and ecosystem productivity (REVIHAAP, 2013;U.S. Environmental Protection Agency, 2013;LRTAP Convention, 2015;Monks et al., 2015). Since 1990 a large portion of the anthropogenic reactive gas emissions that produce ozone have shifted from North America and Europe to Asia (Granier et al., 2011;Cooper et al., 2014;Zhang et al., 2016). This rapid shift, coupled with limited ozone monitoring in developing nations, presents a challenge to the scientists trying to summarize and understand recent changes in ozone at the global scale. To address this challenge the International Global Atmospheric Chemistry Project (IGAC) developed the Tropospheric Ozone Assessment Report (TOAR): Global metrics for climate change, human health and crop/ecosystem research (www.igacproject.org/TOAR). Initiated in 2014, TOAR's mission is to provide the research community with an up-to-date scientific assessment of tropospheric ozone's global distribution and trends from the surface to the tropopause. TOAR has produced the world's largest database of surface ozone metrics from hourly observations at over 9000 sites around the globe. These ozone metrics are freely accessible for research on the global-scale impact of ozone on climate, human health and crop/ecosystem productivity (Schultz et al., 2017). Figure 1 illustrates the coverage of the TOAR surface ozone database across North America, Europe and East Asia (additional sites are available for other regions of the world) and shows the annual ozone trends at each station of the April-September average daytime ozone value for the period 2000-2014. In general these observed trends reflect recent changes in ozone precursor emissions. Several studies have documented the decrease of surface ozone across the eastern United States in response to decreases of domestic ozone precursor emissions (Kim et al., 2006;Lefohn et al., 2010;Cooper et al., 2012;Simon et al., 2015;Lin et al., 2017) and also across much of western Europe (Derwent et al., 2010;Simpson et al., 2014;European Environment Agency, 2016). In contrast, China has experienced decades of emissions increases (Zhao et al., 2013) and several studies have documented increasing ozone at the few sites available for assessing long term trends (Ma et al., 2016;Sun et al., 2016;Xu et al., 2016;Li et al., 2017;Wang et al., 2017). However, some regions of East Asia have experienced emissions decreases in recent years such as Beijing, the Pearl River Delta, Taiwan and Japan, and more works is required to understand the response of surface ozone (Duncan et al., 2016;Krotkov et al., 2016;Liu et al., 2016;Miyazaki et al., 2017;Van der A et al., 2017).
While Figure 1 provides a great deal of detail regarding the regional distribution of trends, the wide range of trend values across urban and rural areas, especially for Europe, makes it difficult to describe the overall regional trend. If one were to simply average all trend values across a region, how should they be weighted in terms of their spatial representation, and what is our confidence that a regional mean trend would be statistically significant?
This paper aims to answer these questions by applying a generalized additive mixed model (GAMM) to determine the systematic regional variations of several ozone metrics across eastern North America, Europe and East Asia. Quantifying a regional ozone trend is complicated by temporal and spatial variabilities. Also, this estimation is vigorously challenged by data inhomogeneity in time and by the irregularity of the spatial distribution of stations, as well as by interruptions in observational records. Furthermore, measurement practices may change over the years at a given site, impacting the observed quantity of ozone. The typical problems in the analysis of multi-site datasets are discussed by Chandler (2005); Wagner and Fortin (2005); Paciorek et al. (2009).
Our main focus is to derive regional trends using an advanced and more accurate GAMM approach. The analysis begins in Section 2 which describes the TOAR dataset applied here and briefly summarizes preliminary trend analyses: the approach of fitting a separate regression model for each station time series. We discuss the narrowness of this approach as it ignores the spatial dependence between sites. Section 3 explains how the GAMM can be used to represent ozone's systematic regional variations (i.e. dependence of the mean ozone level on space and time) in terms of the random adjustments of station-specific effects over time. In Section 4, we demonstrate our approach to determine the monthly systematic regional variations of summertime (April-September) ozone across eastern North America, including the regional trends in rural and urban sites, and investigate the change of spatial patterns by year. We then expand our analysis to Europe and East Asia. In Section 5 we conduct a trend analysis using summertime means rather than monthly means in order to efficiently investigate additional surface ozone metrics. Finally, to demonstrate the useful information on regional ozone trends afforded by sites with relatively weak trends, we illustrate the ability of GAMM to quantify regional ozone trends even when sites with the most robust trends are removed from the analysis. In Section 6, we provide a summary of our trend analysis.

TOAR dataset and preliminary analysis
We use two surface ozone metrics in our monthly trend analysis: (1) , 2013). Note that if less than 75% of data are present (i.e. less than 9 hours for daytime average or 6 hours for DMA8), the value is considered missing. These metrics are volumetric mixing ratios in units of ppb (i.e. parts per billion by volume) and retrieved from April to September over 2000-2014 from the TOAR database. For the summertime mean trend analysis in Section 5, these metrics are averaged across the months of April-September and are also retrieved from the TOAR database (Schultz et al., 2017). Note that when extracting DMA8 for the 6-month summertime period the value returned is the 4th highest daily 8-hour maximum of the April-September aggregation period, which aligns with the US EPA National Ambient Air Quality Standard for ozone (https://www.epa.gov/criteria-air-pollutants/ naaqs-table); but when extracting DMA8 for individual months the value is simply the monthly mean.
In this study we consider four explanatory variables at the location of each ozone monitoring site: (1) Station elevation: the value of the station elevation in meters above sea level, as obtained from the google maps API (Application Programming Interface); (2) Population density: the population density in a 5 km radius around the station location, the unit is people per km 2 ; (3) NO x emissions: annual anthropogenic surface NO x emissions for the year 2010 from the HTAP_v2.2 (Hemispheric Transport of Air Pollution) global emissions inventory (Janssens-Maenhout et al., 2015) (gridded data in 0.1 degree resolution and in units of grams of NO 2 m -2 yr -1 ); values range from 0 to 1000; (4) OMI (Ozone Monitoring Instrument) tropospheric column NO 2 : 5-year average (2011)(2012)(2013)(2014)(2015) high-resolution NO 2 column value from the OMI satellite instrument in units of 10 15 molecules cm -2 . Values are in the range of 0 to 20.80. High values are indicative of regions with fresh emissions of nitrogen oxides, an important ozone precursor primarily emitted by fossil fuel combustion. All of these variables were made available through the TOAR database and further details on their sources are provided by Schultz et al. (2017). The TOAR dataset also identifies stations that are "rural, low elevation", "rural, high elevation or mountain", and "urban" sites using an objective methodology based on satellitedetected nighttime lights, OMI tropospheric column NO 2 and population density. Roughly one half of all stations in the database are characterized by one of these labels. For the other half, the categorization is not robust, therefore these stations are labeled as "unclassified" (Schultz et al., 2017).
The trend analysis provided by the TOAR database records the results of regression analyses of ozone time series for each selected station during 2000-2014. The analysis fits the time series itself without including any explanatory variables. The assessment method is the Sen-Theil estimator and p-values were derived from Mann-Kendall tests (Theil, 1950;Sen, 1968;Kendall, 1975). We provide an illustration of summaries from many individual trends across eastern North America in the supplemental material for the interested reader (see Figures S1-S2 and  Table S1. DOI: https://doi.org/10.1525/elementa.243.s1).
Separate modeling of single surface ozone time series is the simplest approach for a trend analysis. However, for assessing regional trends this approach has been criticized because it does not account for the representativeness of a site in the ozone monitoring network, and ignores spatial dependency (e.g. ozone can change at neighboring stations in a similar manner due to changes in meteorology) and may thus cause a statistically less powerful and possibly misleading analysis for the assessment of regional trends (Thompson et al., 2001). This approach can also be severely biased by failing to account for the spatial dependency or irregularity (i.e. sub-regions of the network are more heavily weighted) of the station network. Therefore an advanced technique is required to quantify the systematic regional variations.

Methods: Generalized additive mixed model
The generalized linear model (GLM) is a mathematical extension of the classical linear regression model, which assumes a specific relationship (presumed linear dependence, but a polynomial relationship is allowed) between response and covariates via a link function (e.g. identical, log or logit). The GLM framework is widely used in the study of environmental time series (Chandler and Wheater, 2002;Yan et al., 2002;Chandler, 2005;Yang et al., 2005). In order to alleviate the linear constraints in the GLM, the generalized additive model (GAM) allows that one or more covariates depend on nonparametric smooth functions. Each unknown smooth function is represented by a linear combination of spline basis functions, i.e. linear covariates in a GLM are partly replaced by nonparametric spline functions in a GAM (Hastie and Tibshirani, 1990).
We consider an ozone time series where observations were averaged to monthly or seasonal means over a number of years. Nonlinear seasonal and interannual variations can thus be assessed by using the spline basis representation within the GAM framework. The GAM is also widely applied to the spatial analysis of regular or irregular data in order to account for geographical variability (Wood and Augustin, 2002;Wood, 2004). The generalized additive mixed model (GAMM) is an extension of GAM and allows for the incorporation of complex autocorrelation, and is therefore flexible for multi-site modeling (Carslaw et al., 2007;Ambrosino and Chandler, 2013;Park et al., 2013).
To describe the general framework in multi-site modeling, let y(i, t) be the ozone value at station i and time t, then the GAMM can be expressed as: y It decomposes the observations into the following additive components: 1) Linear terms xβ: x denotes the vector of explanatory variables listed in the previous section (including an intercept µ, representing the overall mean), β is the coefficient vector. Note that the demographic and physiographic information (e.g. station elevation) for each station remain unchanged over time in the TOAR dataset. 2) Smooth terms f (·): the covariates that are considered with a functional nature and thus modeled as nonlinear functions. The systematic regional variation can be regarded as a function of space and time. An explicit parameterization for this space-time variation is generally impractical, but in many cases it would be feasible by using spline smoothing controlled by the dimension of spline basis functions (Wood et al., 2016). In this study we consider three smooth components: seasonal (within-year), interannual (between-year) and purely spatial effects (i.e. not varying with time); each term is represented by a linear combination of spline basis functions. We refer to the interannual effect as the (deseasonalized) regional trend. We explain at the end of this section how to choose the type of spline basis function for representing underlying structure of these smooth terms. 3) Station-specific effects b(i, t): statistical models often assume independent observations. The model does not recognize that a series of observations is produced from the same station with a particular instrument. The random effects are introduced to avoid violating this assumption and therefore permit the clustering of observations by stations.
The residual noise series, ε(i, t), is modeled as an autoregressive process of the order 1. In the regional trend analysis we do not intend to estimate the spatial variations for individual months (i.e. the temporally varying spatial patterns). Instead, we add the station-specific random effects, b(i, t) (including random adjustments to the ozone baseline as well as random adjustments to the slope of the trend in each station and each year), to account for unobserved heterogeneity or correlation. These random effects enable the self-adjustment of the difference of an individual trend against the regional trend. Indeed, since we assume there is an "overall and averaged" regional trend, the measurements from each station should reveal at least some deviations from the average. As a result, the estimates for the explanatory variables may become less confident, but the autocorrelation of the residuals will be reduced. Moreover, since f 2 (interannual) is representing the overall and averaged regional trend in the study region, the individual trend for a station i can thus be represented by f 2 (interannual) + b (i, t).
For the model implementations, we choose spline basis functions for each smooth term and station-specific effects. Spline functions are known to provide an efficient approach for numerical computation. Each spline function is evaluated at knots and so we need to choose the number and locations of these knots in order to create a flexible and appropriate smooth system. The degree of smoothness can be controlled by the maximum number of knots K: the number is required to be a good compromise between computational feasibility and fidelity to the data (Wood, 2006). The seasonal cycle and interannual trend can be represented via basis expansions: where {w 1 } and {w 2 } are associated coefficients to be estimated, {φ 1 } is the penalized regression cyclic cubic splines (assumed with periodic nature) and places the knots at each month (K 1 = 6); and {φ 2 } is the penalized regression cubic splines (provided a convenient basis for computational efficiency) and places the knots at each year (K 2 = 15). The same basis functions are also employed for the station-specific effects. A large number of knots is required for the spatial variations to minimize complications from the irregular distribution of the network stations. For the spatial variations, the smooth function can be expressed as: where (L, l) is the collection of latitude and longitude from each site, {φ 3k (L, l)} is the collection of spatial spline basis functions evaluated at each location (L, l), and {w 3k } is the collection of associated coefficients. This procedure is a method of spatial interpolation/kriging for which the interpolated values are modeled by the Gaussian process penalized regression splines (Kammann and Wand, 2003), based on the Matérn covariance. A greater number of basis functions allows the fitted surface to be more complex and have a higher spatial resolution. If the value of K 3 is too small, the basis representation will not have enough degrees of freedom to represent local variability. Note that the irregularity also exists in the temporal domain, the so called time sampling issue (Tiao et al., 1990), but it is difficult to address explicitly if the observations are aggregated into a (regular) monthly average. The irregularity in the spatial domain, however, cannot be ignored in our analysis, therefore a higher dimension of basis functions is required for the capture of local variations (whenever appropriate). The choice of the number of knots is currently post hoc. The number of knots for each smooth term is chosen so that a further increase of knots would have negligible impact on the result (here K 3 = 80). More details about spline functions in the trend analysis can be found in Park et al. (2013) and Wood et al. (2016). The estimation is implemented in R package mgcv.

Summertime ozone trend analysis
In this section, we show how the systematic regional variations can be determined by employing the GAMM technique. We focus on the summertime period, which TOAR defines as the 6-month warm season (April-September in the Northern Hemisphere and October-March in the Southern Hemisphere). There are two reasons to take this approach: (1) many sites in the USA only measure ozone in the warm season, therefore our analysis for the USA will have less data interruptions; (2) Emissions have different effects on ozone in the warm and cold season. In the warm season emissions tend to produce ozone, while in the cold season fresh emissions tend to destroy ozone in urban areas. By focusing on the warm season it will be easier to interpret the results. We first provide a detailed demonstration of our approach as applied to a temporal trend analysis for rural and urban ozone measurements in eastern North America, and extend to a spatial and temporal trend analysis for accommodating all categories of monitoring sites. After demonstrating the methodology, we then expand our analysis to Europe and East Asia, two other regions of the world with enough stations to perform a reliable trend analysis.

Surface ozone in rural and urban sites
A total of 64,567 observations from 756 stations is used to construct the summertime ozone regional trend over eastern North America, including urban (140 sites), rural (273 sites) and unclassified (343 sites). We do not account for the spatial variations in this section as rural and urban sites are non-separable in space. Figure 2 shows the estimated summertime cycles and long-term changes for daytime average (blue) and DMA8 (red) ozone, separated by rural and urban sites. The Sen-Theil estimator is also fit to illustrate the tendency. The curvature in the estimated trend is generally slight, therefore the linearity would not be inappropriate in this case. We emphasize that the regional trend is referenced to the curve directly derived by the GAMM, the regression line merely enables us to summarize the overall tendency. It should be noted that autocorrelation is not taken into account for the Theil-Sen estimator and Mann-Kendall test in the TOAR dataset (though could be by using bootstrap simulations (Kunsch, 1989) or by incorporating an autoregressive process (Hamed and Rao, 1998)). The positive autocorrelation of the residuals can result in the underestimation of the uncertainty for the slope. In our approach the GAMM accounts for the autocorrelation by incorporating an AR(1) model and the autocorrelation is generally negligible for the resulting regional trend, it is therefore reasonable to use the Sen-Theil method. Note that all the nonlinear smooth terms can be regarded as "anomalies" (i.e. departures from an overall mean level, adjusted by explanatory variables in some cases). The estimation of means from which to calculate such anomalies introduces uncertainty, which is displayed within the band between dashed lines. The model intercept, µ, in each scenario was added back to these anomalies in order to compare the results over the original scale (we are doing so throughout the paper).
The Sen-Theil estimators (and 95% confidence interval for the slope estimated by a bootstrap method) for the regional trends from daytime average and DMA8, with p-values from the Mann-Kendall test to detect tendency, are shown in part of Table 1. The intercept and slope  values in the Table are referenced to the year 2000. The results show that rural ozone decreased relatively faster than urban ozone in both metrics; daytime average ozone in urban sites does not reveal substantial changes over 15 years. The DMA8 reveals a larger decline than the daytime average in both rural and urban sites.
The advantage of using the GAMM over the simple method described in the previous section is that it enables us to learn about associations in the environmental system by the visualization of uncertainty as well as the explanatory variable analysis. However, a practical issue is that when the number of observations is very large, standard errors (SEs) of estimates in the regression model become very small. As a consequence, most p-values for explanatory variables turned out to be highly statistically significant (i.e. very small p-value). In the large dataset statistical significance tends to diverge from practical significance. Hence we are conservative with the results even when the coefficients reach statistical significance. The results should not be over-interpreted.
The detailed summary statistics (mean, SE and p-value) of the fixed effects, i.e. the β covariate coefficients, separated by rural and urban sites with different metrics, are provided in Table S2. We present the main finding as follows: due to the current version of the TOAR database has only 1 year average of OMI NO 2 and NO x emission data, the correlation should be interpreted in a geographical sense. For example, similar to a higher ozone level typically observed at site located in a higher elevation (Vingarzan, 2004), NO 2 column reveals a strong positive correlation with rural ozone (i.e. higher correlations between ozone and NO 2 are observed in rural sites than urban sites, which is consistent to the result of Safieddine et al. (2013)). NO x emissions reveal a significant factor for both rural and urban sites with an opposite correlation. The emissions tend to be negatively correlated with rural ozone and positively correlated with urban ozone. Population density is less crucial for rural ozone.
4.1.2 Regional trend analysis After partitioning out the seasonal cycle and spatial variation, the regional trend from DMA8 ozone shows a faster decline than for the daytime average, as in the previous analysis (see Figure 2(c) and (d)).
The map presented here is the spatial prediction from statistical interpolation across the gap between each site, based on the GAMM fitting result. Any area more than 5% of the regional width from the nearest ozone monitoring site is left blank on the map (i.e. we only interpolate any gap less than 5% regional width, as extrapolation or interpolation across too large a distance tends to cause greater uncertainty). The interpolation method is assumed to be realizations of a Gaussian random spatial process from the GAMM estimation. It spatially interpolates values as linear combinations of the original observations (a weighted average of the observations in the neighborhood of the location), and this constitutes the spatial inference of quantities in unobserved locations. The GAMM itself accounts for the spatial weight estimated from each site that best describes the set of observed data. Therefore the spatial interpolation is independent from the estimation of interannual trends and seasonal cycles. The spatial distribution shows a lower mean level in the north and south of the region and a higher mean level in the middle area for both metrics. Since the regional trend is the primary focus of this work, the estimation of station-specific effects is relegated to the supplemental material (see Figure S3).
Summarizing the explanatory variables analysis indicates that a similar pattern can be found in station elevation and population density in both metrics (please refer to Table S2 for detailed numbers); elevation has a significant positive correlation with mean ozone level. A significant and negative relationship is observed between population density and ozone level. NO x emissions reveal an insignificant contribution due to an opposite effect in urban and rural sites (as in the previous section), and the significance could be neutralized in the whole regional analysis. The results from linear regression in Table 1 suggest a 4.2 and 12.2 ppb reduction of daytime and DMA8 ozone, respectively, in summertime over 2000-2014.

Investigation of spatial patterns
In order to investigate the changes of the spatial structure of the summertime mean of daytime average ozone and the 4th highest DMA8 (one summertime value per year per site, as opposed to using monthly means) over the same period, we interpolate the summertime distributions over the study region each year with a statistical method and then carry out the regression analysis to the regional summertime means from the interpolation. The analysis step is firstly fitting a surrogate statistical model based on the irregularly spaced observations (L i , l i ), i = 1, …, 756 sites, then smoothing out the irregularity by predicting the values over a regular network (L * , l * ) (here we use 0.5° × 0.5° regular grid), and finally averaging the predicted values over all grid points in the region (i.e. spatial aggregation). This model projection is particularly useful if we aim to compare the spatial distribution from observations to satellite data or global atmospheric chemistry model output, because we can project the interpolation from the surrogate statistical model onto the exact same grid point as the satellite data or atmospheric chemistry model output (Chang et al., 2015). We separate the spatial interpolations from the trend analysis (i.e. only spatial variations are evaluated), thus we can investigate the spatial patterns in different years. The station network in eastern North America is dense enough to allow us to do so. The spatial aggregation approach aims to estimate the ozone distribution in each year; the long-term regional mean changes are based on the results of 15 summertime averages of ozone distribution over a designed regular grid within the monitoring network. This approach implicitly assumes that the regional change can be represented by a series of estimated summertime means and the rate of change is the same for each site. This technique would be intuitive and straightforward under this assumption. However, the spatial aggregation approach may not have enough degrees of freedom to capture all of the temporal variability in the observations. In addition, the spatial coverage of the station network might change due to time series interruptions (e.g. the records at the Canadian sites are only available through 2013). Hence we only use this approach to investigate the spatial patterns. Figure 4 displays the approximated regional daytime average surface ozone distributions in 2000,2004,2009  and 2014. An ozone reduction can be found over the central area of the region. The ozone distributions of the 4th highest DMA8 in the selected years are shown in Figure 5. The 4th highest DMA8 shows a relatively clear decline in the whole region. There are two reasons for the low ozone in 2004. One is that summer 2004 was unusually cool, associated with meteorological conditions that were not conducive for stagnant air pollution events. This was was also the year that power plants across the eastern USA began using scrubbers to reduce NO x emissions, which is one of the main reasons why ozone has decreased across the eastern US over the past decade (Frost et al., 2006;Kim et al., 2006). The regression intercepts (and slopes) for the regional mean changes of daytime average ozone and the 4th highest DMA8 are 41.20(-0.22) and 77.11(-1.01), respectively. All three approaches discussed in this paper provide similar results for the daytime average ozone trend. However, the results cannot be directly compared for the DMA8. For the simple method described in Section 2 and this section, we used the summertime 4th highest DMA8. For the GAMM approach in the previous section (Figure 3), we used the monthly mean of DMA8. Therefore the decrease here of 15.2 ppb cannot be directly compared to the decrease of 12.2 ppb in Figure 3. All the results still indicate that surface ozone decreased over 2000-2014 in both metrics.

Europe
We select the sites located in Europe up to 66°N (North of 66°N is the Arctic circle according to the European region defined by the Task Force on HTAP). As a result, a total 76,520 observations from 1,007 stations are used in Europe, including urban (260 sites), rural (290 sites) and unclassified (457 sites). Figure 6(a) and (b) display the estimated summertime cycles and long-term changes. The spatial variations reveal similar patterns in Figure 3(c) and (d): lower values in western and northern Europe and higher values in southeast Europe. Both metrics indicate that a large spike occurred in 2003 (a well-known event associated with an extreme heatwave), followed by a small spike in 2006. The overall tendency of the regional trend for daytime average ozone is slightly decreasing over 2000-2014. A small amount of reduction can be observed in the DMA8.
The Sen-Theil estimators for summertime regional trends from daytime average and DMA8, with p-values from the Mann-Kendall test, are shown in part of Table 1. One important finding is that decreasing rural ozone can be observed in both metrics. Another finding suggests that decreasing DMA8 is detected in urban sites.
The detailed summary statistics of the fixed effects can be found in Table S3. Our main result is provided as follows: elevation has a significant positive correlation with mean ozone level. Population density shows that the correlation with ozone level is negative in urban sites and positive for rural sites (only for DMA8); the overall statistics show a negative impact (this finding is the same as the results in eastern North America). NO x emissions reveal an insignificant contribution to DMA8 in urban sites. NO 2 column tends to negatively correlate with ozone in urban sites and daytime ozone in rural sites, whereas the DMA8 has a positive correlation in rural sites.

East Asia
A unique overall trend is difficult to determine in East and Southeast Asia due to a large spatial gap between Japan and Taiwan. The results will be highly uncertain if we interpolate across such a gap, and also because there is a strong network asymmetry between East Asia (557 stations) and Southeast Asia (19 stations). In addition, an analysis indicates that the systematic variations in East and Southeast Asia behave differently. Therefore we separate the analysis over East Asia (including Japan and South Korea) and Southeast Asia (including Taiwan and Hong Kong). A total of 42,792 observations from 557 stations are used in East Asia, including urban (217 sites), rural (39 sites) and unclassified (217 sites). Figure 7 displays the estimated smooth terms in the model. The estimated regional trends are increasing for both metrics. A high mean level of ozone can be observed in south Japan. The Sen-Theil estimators for summertime regional trends and the Mann-Kendall test statistics are shown in Table 1. Urban ozone in East Asia shows a higher increasing rate than rural ozone. These results suggest a 6.8 and 4.5 ppb growth of the daytime average and DMA8 in urban ozone, respectively. The explanatory variable analysis shows that elevation and population density have a significant positive and negative correlation with mean ozone level, respectively. NO x emissions reveal an insignificant contribution to the DMA8, and have a significant contribution to daytime average with opposite correlation in rural and urban sites. NO 2 column tends to negatively correlate with overall statistics, whereas it shows a positive correlation with DMA8 when rural and urban sites are separated (see also  Table S4).
There are too few stations to estimate accurately the spatial variations in Southeast Asia, thus the spatial variations will not be evaluated. A total of 1,632 observations from 19 stations are used in Southeast Asia, including urban (11 sites), rural (3 sites) and unclassified (5 sites). We do not separate the urban and rural sites here as the results would not be robust for only 3 rural sites available. We only show the results on the seasonal cycles, long-term changes and the explanatory variables analysis. Figure 8 displays the estimated summertime cycles and long-term changes. A different seasonal effect is observed in contrast to the other regions: a marked drop from May-July can be found in both metrics (about 20 ppb decline in DMA8). The regional trend turns out to be linear increasing for both metrics. The increasing rate of the DMA8 is more than twice as great as the daytime average (see also Table 1). The result also shows that mean ozone level reveals a significant positive correlation with elevation and NO x emissions, and a negative correlation with population density. Tropospheric column NO 2 reveals a negative contribution to daytime average and a positive contribution to DMA8 in southeast Asia (See Table S5).

Summertime means trend analysis
In order to explore the regional trends of other ozone metrics, we carry out the trend analysis using annual summertime values (i.e. one value per site per year, calculated for the period from April to September) of the following four metrics: (1) daytime average: this metric has already been discussed and it will be relevant to the climate community, especially at rural sites because it gives a broad overview as to how the mid-range ozone values are changing, which can be compared to global models with relatively coarse horizontal resolution; (2) summertime mean of all daily 8-hour maximum values (avgdma8epax). This metric is used to determine the mortality due to longterm ozone exposure and is of great interest to researchers who study the impact of ozone on human health; (3) AOT40 is defined as cumulative ozone exposure over a threshold of 40 ppb. This is a metric designed to study the impacts of ozone exposure on vegetation; (4) A useful metric for the human health community is the number of days per summertime period in which the maximum daily 8-hour average exceeds 70 ppb (NVGT070). A potential complication for this metric is that some sites never exceed 70 ppb so their value is always zero. Due to different data characteristics, additional treatments are required for AOT40 and NVGT070. Therefore in this section the analysis is laid out by metric, rather than by region. This arrangement also enables us to directly compare the ozone pollution in the three regions with the most extensive ozone monitoring networks: eastern North America, Europe and East Asia. In order to assess the quality of spatial coverage of monitoring sites in different regions, we compare our approach to the results from the simple method of averaging all individual trends (as described in Section 2). Similar results are expected if the monitoring network is well developed with good spatial coverage. Since we use the summertime data in this section, no seasonal cycle will be evaluated, and we only focus on the estimation of the regional trends and spatial variations (without going through the details of the station-specific effects).
In this analysis we quantified regional ozone trends using all available stations regardless of the strength and statistical significance of the trend at each site because even sites with weak trends provide useful information that can be considered for the calculation of the regional trend (Chandler and Scott, 2011). To explore the contribution to the regional trend by sites that have statistically significant trends (p < 0.05) and sites that have insignificant trends (p > 0.05), we remove sites from the analysis sequentially according to p-value, beginning with the lowest p-values. This analysis is applied to eastern North America using the summertime mean of daytime average and DMA8 in the final step.

Daytime average
We first present the summertime daytime average trend analysis across eastern North America, Europe and East Asia. Figure 9 shows the estimated interannual trends and spatial variations based on summertime means over 2000-2014. The Sen-Theil estimator is less sensitive to the extreme event in 2003 and the ozone mean level across Europe remains steady over the study period. The mean daytime ozone reveals that there has been a gradual decline in eastern North America, while ozone shows a sharp rise in East Asia after 2011 (>10 ppb). No extreme ozone levels are found over eastern North America on the regional scale. A lower mean concentration can be found in northern and western Europe. In East Asia, the mean ozone level in the Republic of Korea is lower than Japan. The area with the highest mean level in Japan is Wakayama. The first half of Table 2 shows the Sen-Theil estimators for the detrended line by different regions. Rural ozone reveals a relatively higher baseline than urban ozone in all regions. Rural ozone in eastern North America reveals a steeper decline than in Europe. There is no significant trend for urban ozone in Europe. A slight decline of urban ozone can be detected in eastern North America. Both rural and urban ozone are increasing in East Asia, with urban ozone increasing faster than rural ozone.
The last two columns in Table 2 report the mean and standard deviation (SD) of all available individual trend estimates (i.e. "regional mean approach"). In cases when the slopes from the GAMM and regional mean approaches are similar, we conclude that the station network is well covered in this region, and a sophisticated statistical approach might not be required to assess the regional trend. This is the case for eastern North America and East Asia. A discrepancy is expected for the results in Europe due to the network being more scattered across northern and eastern Europe (the same scenario can be observed in the analysis of the summertime mean of DMA8, described below). Figure 10 shows the estimated interannual trends and spatial variations based on the summertime mean of DMA8 over 2000-2014. The trend and spatial features are similar to the summertime daytime average in the previous section. The second half of Table 10 displays the Sen-Theil estimators of the detrended line. The difference from the previous analysis is that the DMA8 reveals a larger decline than daytime average for both rural and urban ozone in eastern North America and Europe, while urban ozone shows insignificant changes in Europe. DMA8 at rural and urban sites in east Asia shows similar levels of increment as daytime average.

AOT40
The AOT40 values are summertime cumulative values and the range of values is relatively more wide spread than daytime average and DMA8. For instance, the range of AOT40 values in eastern North America over 2000-2014 is 8 to 46,234 ppb hr. In order to improve the linearity and the fit, we transform the AOT40 values by using the natural logarithm. Therefore the results should be interpreted by their exponential values. Figure 11 Table 2: The Sen-Theil estimators for summertime mean of daytime average and DMA8 regional trends, p-values are derived from Mann-Kendall test (GAMM approach), along with the means (SDs) of all individual intercepts and slopes (Regional mean approach displays the estimated trends and spatial distributions for different regions. The highest AOT40 mean concentration was found in Wakayama, on the southern side of Japan, corresponding to roughly 34,000 ppb hr. There is a cluster of high AOT40 values in southern Europe, corresponding to roughly maximal 32,000 ppb hr over land. The highest AOT40 value in eastern North America is about 22,000 ppb hr. The first half of Table 3 reports the Sen-Theil intercepts and slopes of the estimated regional trends in these three areas. The linear regression results suggest AOT40 decreased by half (from ∼20,800 to ∼11,200 ppb hr) in eastern North America over 2000-2014. In the same period AOT40 decreased from ∼12,900 to ∼10,200 ppb hr in Europe and increased from ∼15,700 to ∼19,600 ppb hr in East Asia. Most of the increase in East Asia is driven by the years after 2011, see Figure 11(a).

NVGT070
NVGT070 is the cumulative number of days per summertime period in which the maximum daily 8-hour average exceeds 70 ppb, and the values are treated as non-negative integer values, in contrast to daytime average and DMA8 which are treated as continuous values. Therefore the statistical model assumption needs to be adjusted. Poisson regression is a generalized form of regression analysis used to model count data. It assumes the response has a Poisson distribution, and assumes the logarithm of its expected value can be modeled by a linear combination of unknown covariates. There are two major issues in employing the Poisson regression: (1) A common problem with Poisson regression is excess zeros, for instance, 1,677 zeros out of 10,949 (15.3%) observations in eastern North America. The high proportion of zeros is often used to justify the use of zero-inflated models, although this sort of model is only appropriate when none of the covariates help to explain the zeros in the data (Wood et al., 2016). In our investigation the zero NVGT070 values are highly clustered in space, suggesting the need for a process with a spatially varying structure, rather than zero inflation; (2) A characteristic of the Poisson distribution is that its mean is equal to its variance. In certain circumstances, it will be found that the observed variance is greater than the mean; this is known as overdispersion and indicates that the model is not appropriate. There are several approaches to tackle this issue, here we adopt a negative binomial regression instead.
Negative binomial regression can be considered as a generalization of Poisson regression since it has the same mean structure as Poisson regression and it has an extra parameter to model the overdispersion. We briefly introduce the structure of negative binomial regression as follows. Let ỹ(i, t) be the NVGT070 value at station i and year t, then the probability mass function is modeled as: where Γ(·) is the gamma function and θ > 0 is the heterogeneity parameter. This structure has a property ( ) to accommodate the overdispersion. Figure 12 shows the NVGT070 summertime mean trends and spatial distributions over different regions. A marked difference from previous analyses on different metrics is that a larger coverage of high values in mean NVGT070 can be found over southern Japan, corresponding to at least 20 days per summertime period when the maximum daily 8-hour average exceeds 70 ppb.
The second half of Table 3 shows the Sen-Theil estimators of the detrended line. The mean NVGT070 value decreases from 12 days to less than 1 day in eastern North America over 2000-2014 (the overall mean from all sites is 1.47 in eastern US vs. 0.97 by model prediction over eastern North America in 2014, this discrepancy might be due to the lack of Canadian data in the year 2014). In the same period the mean NVGT070 decreases from 6 days to 2 days in Europe (the average of all available sites in 2014 is 3.15 days) and increases from 15 days to over 20 days in East Asia (the average of all available sites in 2014 is 24.33 days).

Sensitivity analysis of the trend to the representativeness of sites
We use 756 sites located over eastern North America in summertime as an illustration (with the same metrics described in Section 5.1 and 5.2). We conduct a sensitivity analysis for the long-term mean estimations by removing stations sequentially, according to the p-value for the slope of the trend at each site. We only illustrate the impact of removing stations on the trends, the rest of the estimations will not be shown. Figure 13(a) shows the regional daytime average trend for eastern North America using all 756 available stations (red, labeled as ALL). The slope is statistically significant with a slope of -0.30. We then threw out the 273 stations with p-values less than 0.05 to see the impact on the regional trend. Because these dropped stations have the strongest negative trends, the slope relaxed to -0.20 but it was still statistically significant (orange). We then threw out all stations with p-values less than 0.10 (a total of 312 stations thrown out) for a similar result. The trend relaxed to -0.17 and was still statistically significant (light blue). We then threw out all stations with p-values less than 0.15 (a total of 400 stations thrown out and over half of sites removed at this point) for a similar result. The trend was similar at -0.15 and was statistically significant (dark blue). Using a different metric we found similar results for the DMA8 (see Figure 13(b)), however the slope dropped faster due to more stations removed in each iteration. Table 4 shows the linear regression coefficients for the regional trend for further experiments. The daytime average regional trend remains negative and significant even after removing 551 stations with p-values less than 0.40. Despite the slope dropping faster for DMA8, the trend remains negative and significant after removing 663 (87.7% of sites) stations with p-value less than 0.50. Beyond this point there are few stations left for the regional analysis and the slope of the regional trend then becomes insignificant. We also provide an example of how the result of spatial kriging can be affected by the similar throwing out procedure in Figures S4 and S5, this example also suggests that useful information can be gleaned from many individual trends with p-values larger than 0.05, 0.10 or even 0.34.

Conclusions
This paper provides a trend analysis of summertime surface ozone in eastern North America, Europe and East Asia for several metrics during 2000-2014. Our approach assumes that there is an overall and averaged seasonal cycle and an interannual trend in the study region. The expected achievement in this approach lies in the combination and adjustment of the deviations from each station to the overall regional trend. All of the components in the GAMM are not new techniques, however, this sophisticated incorporation with a focus on overall variations of multiple time series for large and irregular spatial datasets has not been accounted as a whole in previous studies.   All of our approaches in this paper are easy to implement under moderate computational costs, and are suitable for application to the TOAR dataset.
The main results are summarized as follows: 1) In eastern North America surface ozone has decreased strongly in summertime (although the daytime average trend at urban sites is less certain).
The summertime mean of DMA8 shows a larger decrease than daytime average. AOT40 is reduced by roughly half (from ∼20,800 to ∼11,200 ppb hr) over the 15-year period. The average modeled value of NVGT070 decreased to less than 1 day in 2014.
2) The regression result of the ozone trends in Europe shows that significant decreases of daytime average and summertime mean of DMA8 are only detecTable in rural sites. AOT40 and NVGT070 decreased significantly in both rural and urban sites. The spatial distributions estimated from different metrics display a similar result: lower values in western and northern Europe and higher values in southern Europe.
3) All the metrics indicate that surface ozone increased over East Asia, with statistically significant trends of 0.40 and 0.37 ppb yr -1 estimated for summertime mean of daytime average and DMA8, respectively. AOT40 also reveals a significant increase of 260 ppb hr yr -1 . The linear regression predicts the NVGT070 value reached 20 days in summertime 2014. All the trends show a steep increase from 2011-2014. 4) The monitoring network is well covered and developed in eastern North American and East Asia, assessed by several metrics. A consistent result in Europe is difficult to achieve due to relatively limited monitoring sites over northern and eastern Europe. 5) The results from the sensitivity analysis clearly demonstrate that regional trends calculated from just the sites with relatively weak trends are spatially consistent with the regional trends calculated from all sites.
The GAMM has been shown to be applicable to an analysis of the TOAR dataset. It properly accounts for relevant covariate information before producing spatial distributions and regional trends. The GAMM can also take into account other factors known to affect surface ozone. For example the present study did not consider the wellknown association between weather and ozone due to lack of meteorological data in the TOAR database. In some regions, ozone is highly correlated with temperature as warm temperatures not only affect reaction rates but they are also associated with stagnant conditions conducive to boundary layer accumulation of ozone precursors (Porter et al., 2015;Pusede et al., 2015;Shen et al., 2016). Continuing development of the TOAR database will permit the inclusion of meteorological variables at all stations (from observations or reanalysis). These data will allow future studies to account for meteorological adjustment of ozone concentrations to provide additional insight, and a more elaborated interpretation, into regional or global scale surface ozone variability (Camalier et al., 2007).