Tropospheric Ozone Assessment Report: Database and metrics data of global surface ozone observations

terms of regions without monitoring, and in terms of regions that have monitoring programs but no public access to the data archive. Therefore future improvements to the database will require not only improved data harmonization, but also expanded data sharing and increased monitoring in data-sparse regions.


Introduction
Ozone in the troposphere is relevant to human health and the environment in several respects (Cooper et al., 2014; Monks et al., 2015; Schultz et al., 2015). High ground-level ozone concentrations impact the human respiratory system and impair the growth of vegetation. Furthermore, ozone is a greenhouse gas and plays a key role in photooxidation processes in the troposphere. Ozone is a secondary air pollutant, i.e. it is not emitted directly but formed in the troposphere as a result of chemical reactions of precursor gases such as nitrogen oxides, carbon monoxide, and volatile organic compounds (VOC). Ozone is lost chemically through photo-dissociation, reaction with HO₂ or NO₂ radicals, unsaturated VOC, or halogens. It is also lost through deposition at the surface and uptake by plants, or heterogeneous reactions involving aerosol. The global average photochemical lifetime of tropospheric ozone is between 20 and 25 days (Young et al., 2013), but generally less than 5 days in the summertime surface boundary layer. The local lifetime varies considerably depending on altitude, geographic location, season, temperature, humidity, and atmospheric composition.
Surface or ground level ozone is a term used to describe the ozone mole fraction in ambient air that humans and plants experience. It is typically measured by sampling air between 2 m and 10 m above the surface. Historical observations of surface ozone mixing ratios (mole fractions) range from zero to over 400 nmol mol⁻¹ (Bartel and Temple, 1952; Riveros et al., 1998; Lacasaña-Navarro et al., 1999). As documented in this article, current measurements rarely exceed 200 nmol mol⁻¹ (see Table 5 in section 5). "Zero ozone" (i.e. ozone at sub-nmol fractions) is often found in urban environments with high levels of nitrogen oxides at night, when excess amounts of NO emitted from combustion sources react with ozone to form NO₂ (for example Wang et al., 2012). At rural sites, low to very low mole fractions of ozone can be found at night due to ozone destruction at the underlying soil and plant surface (Galbally, 1968). Very low ozone mixing ratios are also seen during springtime in the Arctic troposphere where ozone is destroyed by reactions with halogens (e.g. Tarasick and Bottenheim, 2002; Helmig et al., 2007; Simpson et al., 2007). Marine sites in clean tropical environments frequently report ozone mole fractions in the 10-20 nmol mol⁻¹ range (e.g. Oltmans et al., 2006, 2012), while rural continental sites in the mid-latitudes show typical average mole fractions between 30 and 80 nmol mol⁻¹ in the Northern Hemisphere, and 15 to 25 nmol mol⁻¹ in the Southern Hemisphere (Galbally et al., 1986; Oltmans et al., 2012). The highest mole fractions are found in or downwind of major conurbations (e.g. Seinfeld et al., 1991). Mountain sites also generally exhibit higher mole fractions, in particular during the influence of stratospheric intrusions (e.g. Cristofanelli et al., 2006).
In spite of many years of research and substantial monitoring of surface ozone on the regional and global scales, scientists have been unable to answer the most basic questions: Which regions of the world have the greatest human and plant exposure to ozone pollution? Is ozone continuing to decline in nations with strong emission controls? To what extent is ozone increasing in the developing world? How can the atmospheric sciences community facilitate access to ozone metrics necessary for quantifying ozone's impact on climate, human health, and crop/ecosystem productivity?
To answer these questions the International Global Atmospheric Chemistry Project (IGAC) developed the Tropospheric Ozone Assessment Report (TOAR): Global metrics for climate change, human health, and crop/ecosystem research (www.igacproject.org/TOAR). Initiated in 2014, TOAR's mission is to provide the research community with an up-to-date scientific assessment of tropospheric ozone's global distribution and trends from the surface to the tropopause. TOAR's primary goals are: (1) Produce the first comprehensive tropospheric ozone assessment report based on all available surface ozone observations, the peer-reviewed literature and new analyses, and (2) generate easily accessible, documented ozone exposure and dose metrics at thousands of measurement sites around the world (urban and non-urban). The assessment report is organized as a special feature of Elementa.
Assessing the global distribution of tropospheric ozone near the surface and its trends in time is scientifically challenging, because ozone is a reactive gas with variable lifetime and consequently non-stationary distribution. Furthermore, there is inadequate data coverage in many regions of the world, combined with inhomogeneous data quality and metadata information, data access, and language issues. Prior attempts to summarize the global distribution of tropospheric ozone and its trends (e.g. Cooper et al., 2014;Sofen et al., 2016a) were limited to readily accessible data from large networks maintained by the World Meteorological Organisation, and North American, and European institutions, which introduced substantial geographical bias in the analyses. In the framework of TOAR the most comprehensive database possible of global surface ozone observations has been established at Forschungszentrum Jülich in Germany. The database contains surface ozone data sets with hourly time resolution collected from all accessible data sources worldwide, including regional or national air quality monitoring networks, multi-national programmes, and individual researchers' data. These data and the associated metadata that describe measurement sites and instrumentation have been augmented with several pieces of information from global gridded data sets. As a result, the TOAR database contains the world's largest collection of surface ozone observations in homogeneous form and allows for consistent analyses across all networks and in many world regions. While the TOAR database includes measurements from several hitherto inaccessible sources, a certain analysis bias remains due to the much denser observation networks in the Northern hemisphere mid-latitudes than anywhere else.
The TOAR database constitutes the foundation of all major analyses of surface ozone distributions and trends throughout the TOAR special feature. In particular TOAR-Health (Fleming et al., 2017, this issue), TOAR-Vegetation (Mills et al., 2017, this issue), and TOAR-Climate (Gaudel et al., 2017, this issue) draw heavily on data and data products from the database described in this article. In spite of its value for tropospheric ozone research, we would like to emphasize, however, that the TOAR database is not a primary data archive and has no intention to replace or substitute any existing data center for environmental observations, nor does it have any obligation to maintain updated data records or inform countries or other legal entities in legally binding form. Establishing the TOAR data archive has only been possible through cooperation with many officially endorsed data archives, and these remain the primary repositories for the vast majority of ozone observations that are now accessible through TOAR. TOAR provides a variety of ozone metrics based on hourly observations, but not the hourly observations themselves.
This article describes the TOAR database and the ozone data products (including standard statistics as well as metrics relevant for assessing health, vegetation, and climate impacts, and trend statistics) that have been generated from hourly averages of continuous surface observations. It further demonstrates new possibilities of surface ozone analyses that have become possible through linking ozone data sets with global metadata, and it highlights the necessity and problems of a posteriori data quality assessment. The article structure is as follows: Section 2 summarizes the data sets that have been identified and made available for the TOAR database. Section 3 describes the different methods for public access to the TOAR database and the surface ozone data products. Section 4 details the procedures that have been applied to harmonize the ozone data and metadata in the TOAR database, including the extended metadata that were added from several global, gridded data sets. Section 5 discusses ozone quality control issues. Section 6 presents the ozone metrics data sets and demonstrates the enhanced analysis potential made possible through this work. We discuss ozone changes with (station) altitude, regional ozone differences, and seasonal cycles of ozone in different latitude bands. Finally, section 7 presents conclusions including commentary on the current state of the global surface ozone observation network and recommendations for its future development. The paper is accompanied by detailed technical documentation on: the TOAR ozone metrics and metrics data products (Supplemental Material 1), and the Jülich Open Web Interface (JOIN; Supplemental Material 2). All TOAR surface ozone data products including also standard graphics and software, are available at the PANGAEA data publishing portal (https://doi.org/10.1594/PANGAEA.876108). 
For more detailed descriptions of ozone metrics, the rationale for adopting certain metrics, and for the actual analyses of the present-day surface ozone distribution and trends with respect to health, vegetation, and climate impacts, the reader is referred to the other articles of the TOAR special feature.

Available global surface ozone observations
Surface ozone measurements commenced in the 19th century out of scientific curiosity and because it was believed "that the presence of ozone maintains health, and its absence is a cause of serious maladies" (Verdi, 1874). With the discovery of the ozone layer in the stratosphere (Fowler and Strutt, 1917) surface ozone measurements continued in the first half of the 20th century as an adjunct to explore atmospheric composition. In the early 1950s, ozone was identified as the key component of photochemical smog in Los Angeles (Haagen-Smit, 1952). This led to the monitoring of surface ozone in the US and the subsequent discovery of photochemical smog in Australia (Galbally, 1971), the UK (Atkins et al., 1972; Derwent and Stewart, 1973), and Japan (Kondo and Akimoto, 1975). In Canada, ozone monitoring started in the early 1970s to investigate the cause of "Tobacco fleck" in southern Ontario (Cole and Katz, 1966).
The first global baseline ozone measurements at remote sites were initiated in response to the International Geophysical Year in 1957. The earliest global network of Background Air Pollution Monitoring (BAPMoN) was established under the auspices of the World Meteorological Organisation (WMO) in 1969. Some of the measurements initiated under this umbrella in the 1970s continue to the present, within the framework of WMO's Global Atmosphere Watch (GAW) programme (Schultz et al., 2015). More information on historic ozone measurements and ozone measurement techniques can be found in TOAR-Observations (Tarasick et al., 2017, this issue).
Since the late 1970s, air quality monitoring networks with ozone measurements have been established in several countries in Europe, North America, Australia, Japan, and South Korea. Over time, many other countries also established at least some air quality monitoring sites with ozone monitors. Often, the data from these networks are now available in near-realtime (see for example https://aqicn.org/map/world/). However, the rapid reporting of these data precedes the required quality control, which is needed for the purposes of TOAR. Access to quality-controlled data from the networks' archives is more difficult due to either restrictive government regulations or the lack of supra-regional data archives. In some world regions, multi-national networks or databases have been installed that complement national monitoring and data provision. Examples are the East Asia (Acid Deposition) Network (EANET) or the European Environment Agency Airbase system. Unfortunately, as described in section 5, the data from these multi-national archives are not always fully consistent with the original data reported at the regional or national level. The TOAR database maintains these "duplicate" data records as individual data series and applies a merging procedure to select the most appropriate data for analysis (see section 5).
The focus of current air quality monitoring networks generally lies in urban and suburban areas. However, in the context of acid deposition monitoring, ozone monitoring stations were also established in rural areas. These stations allow for observations of regional baseline concentrations and attribution of high ozone episodes. Examples are the US National Park Service Gaseous Pollutant Monitoring Program, the US Environmental Protection Agency Clean Air Status and Trends Network (CASTNET), the Canadian Air and Precipitation Monitoring Network (CAPMoN), the European Monitoring of Environmental Pollution (EMEP) programme, and also the Acid Deposition Monitoring Network in East Asia (EANET).
Other surface ozone observations, typically of shorter duration, have been made during field campaigns or in support of ozone impact studies on forest or agricultural vegetation. Finally, a few programs have recorded multi-year ozone measurements from mobile platforms such as railway trains (for example TROICA: Oberlander et al., 2002;Pankratova et al., 2011), or ships (EU project APICE: Velchev et al., 2011).
For establishing the TOAR database of surface ozone concentrations we focused on data from stationary platforms, with hourly time resolution, and time series that are longer than 2 years. The vast majority of measurements were made with the UV absorption technique (see Tarasick et al., 2017, this issue). Passive sampling data were not considered due to the low time resolution. These choices were made in order to allow a globally uniform calculation of ozone metrics for the analyses in TOAR-Metrics (Lefohn et al., 2017, this issue), TOAR-Health, TOAR-Vegetation, and TOAR-Climate. Furthermore, these criteria allow characterization of at least some inter-annual variability, and to assess the robustness of the derived ozone metrics at each site. In order to achieve a well-defined data set and allow for some quality assurance of the ozone data (see section 5), the database was closed for new submissions in July 2016. Only corrections to existing data sets were accepted after this date. The most recent measurements that entered the TOAR analyses are from 2015, although most of the results presented throughout TOAR do not extend beyond 2014. In Europe, many datasets included in TOAR do not extend beyond the year 2012 due to changes in the Airbase data reporting system, which coincided with the build-up of the TOAR database and prevented inclusion of more recent Airbase data before closure of data submissions.
Data availability and accessibility of existing long-term surface ozone observations varies considerably among the countries and multi-national networks. Some networks maintain comprehensive, well-managed databases and allow open access through ftp, web downloads or interoperable web services. Such data were readily retrieved and included in the TOAR database. In cases where open access is not available or language problems prevented us from accessing or interpreting data directly, we tried to negotiate access to long-term archived data, which was particularly successful for data from Japan, and South Korea. In other regions, data collection and harmonisation remains fragmented and many different data providers must be addressed individually in order to obtain data. Major efforts were undertaken especially in Australia, South Africa, and South America to collect the available data and make them accessible to TOAR.
In some countries concerns about misuse or misinterpretation of the data prevent local authorities from openly sharing information, especially in the form of time-resolved hourly concentration values. Also, large requests for hourly data can impose a substantial work load on agency staff and they may not be in a position to engage in the necessary reprocessing of historic data if this is not a direct part of their mandate. In the case of research data, not all scientists who are involved in ozone measurements are fully supportive of the TOAR open data policy for different reasons. This may have prevented them in some cases from freely sharing their data with the TOAR database curators. We would like to note in this context that the TOAR database contains special provisions to restrict open access to the ozone metrics at a limited number of sites if this is requested by the data provider. In such cases, only the aggregated metrics data products (see section 6) are freely available. However, it is important that these metrics are derived consistently with the metrics at all other stations in the database, so only hourly data are entered into the database.
A first attempt about 10 years ago by the first author of this article to collect and harmonize global surface ozone observations resulted in a set of about 400 data files which contained publicly accessible data from four networks, namely CASTNET, EANET, EMEP, and GAW. These data were then used in the evaluation of global chemistry models, for example in the context of multi-model experiments on the hemispheric transport of air pollution (e.g. Fiore et al., 2009; Rasmussen et al., 2012). In 2015, the results of a similar ozone data collection effort were published by Sofen et al. (2016b), including data from about 6,600 sites and 8 different networks with open data archives. The authors noted the existence of other, not readily accessible ozone data, but did not see themselves in a position to acquire and process such data. The primary intention of Sofen et al. (2016b) was again to provide surface observations for the evaluation of global chemistry models, and therefore their main products are gridded data sets containing spatial averages of monthly mean or other statistically aggregated ozone concentrations. The present work benefitted from the cooperation with E. Sofen on the identification of data format and data quality issues (see discussion in sections 4 and 5).
Going beyond previous ozone data collection efforts, the TOAR initiative has worked closely with data providers from around the world in order to increase coverage beyond the regions where data are easily available from open data archives. Through the creation of regional working groups and the efforts of the many co-authors on this article, it has been possible to integrate the hourly ozone data from 9,690 stations, many of which had never been available for internationally coordinated research. Some of these data, for example from Australia and Japan, date back to the early 1980s and thus fill important gaps in our knowledge about global surface ozone during that period. Table 1 lists the ozone monitoring networks and data sources of the TOAR database, and Table 2 provides a summary of the database holdings grouped by region. Regions are labelled as defined by the Task Force on Hemispheric Transport of Air Pollution (TFHTAP; Koffi et al., 2016). Large gaps remain in the global coverage of surface ozone measurements. Very few data exist in Africa, Central America, Central Asia, South Asia, and the Middle East. Where data are available from these regions, the time series are often short and sometimes appear to have some data quality issues. Figure 1 shows the temporal evolution of the number of ozone data records in the database by region. In general the data coverage has increased over time in all world regions. The dropoff in the most recent years is mostly due to delays in the final processing and validation of current data. In North America, most data originate from AQS and NAPS ( Table 1). The earliest data from AQS that are included in the TOAR database are from 1980, the earliest records from NAPS are from 1974. As we did not have NAPS data after 2013, the coverage in North America drops slightly during the final two years. In Europe, the majority of data are from Airbase with the earliest records from Great Britain dating back to 1973. 
Since Airbase changed their reporting system and data format for data after 2012, we could not include more recent data in the TOAR database. In East Asia, the earliest data are from Japan, dating back to 1976. Data from South Korea have been made available to us beginning in 2000. The TOAR database also includes data from 26 Chinese stations including 15 in Hong Kong, some of which date back to 1990. Unfortunately, we have not been able to obtain ozone data from the vast Chinese air quality network which commenced in 2012. Similarly, data from only 7 Taiwanese stations were included while at least 50 stations currently report ozone concentrations (AQICN, 2017).
The earliest data from Australia and New Zealand date back to 1978, while the majority of measurements in this region commenced around 1992. In Mexico and the Caribbean the network density has steadily increased since the mid-1980s. There are no other data from Central America. Data coverage decreases after 2010 in the TOAR database due to the fact that we obtained these data from individual researchers and not from official agencies and their analyses did not always include the most recent years. The earliest officially available data from Argentina and Chile are from 1994 and 1997, respectively, while ozone data from Brazil are available since 1998. Data classified as oceanic in the database are mostly from coastal sites and they are labelled OCN only because of inaccuracies in the global gridded map that was used to assign the TFHTAP region code to each station. True oceanic sites are American Samoa (GAW), Sable Island (NAPS, OTHER), Ieodo Ocean Research Station (OTHER), Ogasawara (EANET), and Minamitorishima (GAW). We note that several other island sites are not included in the OCN region. For example, Amsterdam Island (OTHER) belongs to region SAF, and Bermuda (GAW) belongs to Middle and Central America in spite of their remote locations (this has been coded in the TFHTAP gridded file). In future versions of the database, a better designation of island sites would be desirable.
The earliest data from Africa and the Middle East are from Cape Point, South Africa (GAW), which began in 1983. Assekrem, Algeria (GAW), and Amsterdam Island (OTHER) commenced in 1997 and 1995, respectively. Beginning in 2000, data from up to 20 stations are also available from Tehran, Iran. Unfortunately, these seem to have some data quality issues. From Southern Asia, 7 data sets could be obtained from India, and 1 data set from Nepal. From South East Asia, 5 data sets are from Indonesia, 4 from Thailand, 3 from Malaysia, and 1 from Vietnam. Central Asia has one data set from the station Issyk-Kul in Kyrgyzstan (GAW), and the RBU region contains data from 8 stations in Russia, 1 in Armenia, and 1 in Poland (misclassified because it is located on the border with Belarus). There are a total of 26 ozone data sets from the Arctic (north of 66°N), with 8 from Finland, 5 from Norway, 4 from the USA, 4 from Canada, and the rest from Denmark, Greenland, Sweden, and Russia. The US station Barrow has the oldest data going back to 1973.
In the Antarctic region (south of 66°S), a total of 11 stations contribute data (Table 2), with the oldest being the Amundsen-Scott South Pole station, going back to 1975. Some of these data from the early 1970s predate the establishment of the modern UV standard, and will have used the KI method, or a chemiluminescent method calibrated to the KI standard (Tarasick et al., 2017, this issue).

Figure 1: Number of ozone data records in the database by region over time. For this figure, a data record is defined as a station which has at least 3600 hours (~5 months) of valid ozone data in a given year. Regions are labelled according to the TFHTAP definitions (see Table 2). DOI: https://doi.org/10.1525/elementa.244.f1

Access to TOAR data
All TOAR data except for the original hourly time series are freely available for scientific and policy use. However, we want to stress that, in spite of close collaboration between the TOAR database curators and most data providers, the TOAR data products do not constitute official data and are therefore not to be used in analyses with legal implications such as exceedance monitoring. The actual TOAR database with the hourly observations continues to develop (for example, we recently included ozone precursor and meteorological data for European stations), while the pre-compiled data products, which have been used in other parts of the assessment, represent a frozen snapshot in order to guarantee reproducibility of results.
For access to the pre-compiled TOAR data products including present-day monthly, seasonal, summertime, and annual data, trend datasets, and gridded datasets, we refer the reader to the data publication on PANGAEA (https://doi.org/10.1594/PANGAEA.876108). The metrics files are provided as simple CSV files. Gridded data products intended for model evaluation purposes are provided as NetCDF files. More detailed descriptions of the file formats and variables are provided in Supplemental Material 1.
The PANGAEA data portal also contains an extensive collection of plots similar to those presented in TOAR-Health, TOAR-Vegetation, and TOAR-Climate.
The live database can be accessed via the interactive web interface of the Jülich Open Web Interface (JOIN; https://join.fz-juelich.de), and through the Representational State Transfer (REST) services which are also provided by JOIN. Details are provided in the JOIN user guide (Supplemental Material 2).
The JOIN web interface allows for easy access to metadata and data from individual measurement sites. It includes several filtering options to select stations based on various metadata criteria (faceted search). Hourly, daily, monthly, seasonal, summertime, and annual data can easily be visualized as time series and, with the exception of hourly data, are also available for download as text files. The user can also generate comprehensive data summary plots, which contain a time series including data capture information, average annual, weekly, and diurnal cycles with distinction between night and day or season, respectively, frequency distributions, and trend information (see example in Figure 2 below).
For instructions on how to generate such plots, see Supplemental Material 2. Via the REST interface of JOIN, most information from the TOAR database is also available in an interoperable way. This allows for software access to TOAR data without the requirement to log into the JOIN web interface.
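As an illustration of such software access, the following minimal sketch only assembles a request URL and does not contact the server. The base path, the service name "search", and all query parameter names are assumptions made for this example; the authoritative list of services and parameters is given in the JOIN user guide (Supplemental Material 2).

```python
from urllib.parse import urlencode

# Assumption: base path of the JOIN REST services. Verify against the JOIN user guide.
JOIN_REST_BASE = "https://join.fz-juelich.de/services/rest/surfacedata"


def build_join_url(service: str, **params: str) -> str:
    """Assemble a request URL for a (hypothetical) JOIN REST service.

    Parameters are sorted by name so that the resulting URL is reproducible.
    """
    query = urlencode(sorted(params.items()))
    url = f"{JOIN_REST_BASE}/{service}/"
    return f"{url}?{query}" if query else url


# Hypothetical service and parameter names, used for illustration only.
example_url = build_join_url(
    "search",
    station_country="Germany",
    parameter_name="o3",
    format="json",
)
print(example_url)
```

The resulting URL could then be passed to any HTTP client; keeping URL construction separate from the network call makes it easy to adapt the sketch once the actual service names are known.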

Data collection
The vast majority of data in the TOAR database were obtained from a few well-managed networks which maintain their own databases and apply their own quality control measures to varying degrees (see Sofen et al., 2016b for a discussion on typical data and metadata issues).
In each case we established contact with the database managers and individual data providers and asked for the best available data collection and metadata information. In many cases, this included extensive discussions of metadata and data quality issues with the data providers and archive managers. As noted above, in some regions the TOAR regional working groups and other individuals invested considerable effort to collect and harmonize ozone data sets and submit them to the database. A simple template in ASCII format was designed and shared for these submissions. We generated standardized data summary plots (Figure 2) from each data set that was inserted into the database and we shared these with the data providers to ensure that the processing was successful and also to prompt another critical look at the data.

Reported metadata and their quality control
The TOAR data submission template requested detailed metadata information, including, but not limited to the station location and altitude, station type ('traffic', 'industry', or 'background'), station type of area ('urban', 'suburban', 'rural', or 'remote'), and details about the measurement method, the data set PI, and the contributing organisation. Not all of this information is always available, and as of yet there is no standardized vocabulary applied to many of these metadata fields, limiting the ability to search for specific types of data in the database. For example, there are more than 50 different terms used to describe the most common ultraviolet absorption measurement method; in this case the information has been standardized. We performed extensive checking of station metadata, primarily with respect to the station location. In several hundred cases we tried to manually verify the station location through Google Maps (Google, 2017), and often found either incorrect or imprecise coordinates which were then corrected. Often, these checks were prompted by detecting different coordinate values for identical stations in different data repositories. In extreme cases, the station locations differed by more than 30 km from each other (for example for Somerton, GB0044R between EMEP and Airbase). In most cases the differences were on the order of a few hundred meters or less. One means of checking station coordinates was to compare the reported station altitudes with the topographic elevations from Google Maps. This not only revealed many cases with highly inaccurate station elevations (for example, many of the older AQS sites apparently had altitudes given in feet instead of meters, or they reported altitude above ground), but could also be used to correct the reported location of the site, in particular in mountainous terrain. For example, the reported station altitude of the Indian station Mt. Abu was 1680 m asl, while Google Maps returned an altitude of only 1180 m at the reported coordinate values of 24.6°N, 72.7°E. The station building could be visually identified on the Google Maps satellite image at the more precise location 24.653056°N, 72.779167°E. Using these coordinates to retrieve a new Google Maps altitude yielded 1663 m asl, which is very close to the reported station altitude. Note, however, that differences in station altitude can also occur for sites where sampling occurs on tall towers. Some of these were identified and documented in the station_comments attribute, but we likely missed several other towers, because information on inlet heights is generally not available.
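The two plausibility checks described above (distance between coordinate pairs, and reported versus topographic altitude) can be sketched as follows. This is an illustrative sketch, not the procedure used for the TOAR database: the function names, the 100 m tolerance, and the three result labels are assumptions chosen for the example. Only the Mt. Abu coordinates and altitudes are taken from the text.

```python
import math

FEET_TO_M = 0.3048  # exact conversion factor from feet to meters


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def check_altitude(reported_m, reference_m, tol_m=100.0):
    """Compare a reported station altitude with a topographic reference value.

    tol_m is a hypothetical tolerance. Returns one of:
    'consistent'    - values agree within the tolerance
    'possibly_feet' - values agree if the reported value is read as feet
    'mismatch'      - neither interpretation agrees
    """
    if abs(reported_m - reference_m) <= tol_m:
        return "consistent"
    if abs(reported_m * FEET_TO_M - reference_m) <= tol_m:
        return "possibly_feet"
    return "mismatch"


# Mt. Abu example from the text: reported vs. visually verified coordinates.
shift_km = haversine_km(24.6, 72.7, 24.653056, 72.779167)
print(f"coordinate shift: {shift_km:.1f} km")

print(check_altitude(1680.0, 1180.0))  # reported altitude vs. elevation at reported coordinates
print(check_altitude(1680.0, 1663.0))  # reported altitude vs. elevation at corrected coordinates
```

A check of this kind only flags candidates for manual inspection; as noted above, tower sites can legitimately differ from the terrain elevation.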
The coordinate corrections were also sent to the data providers for verification. More recently, we have begun documenting all coordinate corrections by introducing additional database fields. For example, three different station altitude values are maintained (station_reported_alt, station_google_alt, and station_etopo_alt, see Table 3 below), and we use a station_alt_flag to document which of these is regarded as the most trustworthy piece of information. This value is then returned as station_alt. Similarly, we have begun to document the confidence we have in the station coordinates that are saved in the database (for details see the description of the database layout in Supplemental Material 2).
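The selection logic for station_alt can be sketched as below. The field names follow the text, but the encoding of station_alt_flag as the strings "reported", "google", and "etopo" is an assumption for illustration; the actual encoding is documented in Supplemental Material 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StationAltitude:
    """Altitude metadata of one station (values in m asl, None if unknown)."""

    station_reported_alt: Optional[float]
    station_google_alt: Optional[float]
    station_etopo_alt: Optional[float]
    # Hypothetical flag encoding: "reported", "google", or "etopo".
    station_alt_flag: str

    @property
    def station_alt(self) -> Optional[float]:
        """Return the altitude value that the flag marks as most trustworthy."""
        candidates = {
            "reported": self.station_reported_alt,
            "google": self.station_google_alt,
            "etopo": self.station_etopo_alt,
        }
        if self.station_alt_flag not in candidates:
            raise ValueError(f"unknown station_alt_flag: {self.station_alt_flag!r}")
        return candidates[self.station_alt_flag]


# Mt. Abu values from the text; no ETOPO value is given there, so it is left unset.
mt_abu = StationAltitude(
    station_reported_alt=1680.0,
    station_google_alt=1663.0,
    station_etopo_alt=None,
    station_alt_flag="reported",
)
print(mt_abu.station_alt)
```

Keeping all three values and exposing the preferred one through a single accessor preserves the original submission while still giving users one consistent altitude field.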
Despite these efforts many issues remain with respect to inaccuracies in the station information, and this may limit the applicability of these data for impact assessments. For example, while modern web services and GIS applications would make it feasible to relate air pollutant concentrations to the distance of pollution sources such as roads or industrial plants, such analyses will produce erroneous results if the station coordinates are not maintained with high accuracy. Furthermore, many impact studies also require information about the sampling height of the measurement. This information is, unfortunately, currently not included in the TOAR database, because there are too few data sets that report this site characteristic.

Gridded metadata
In order to improve the characterization of stations and their environment, and to allow for globally consistent data aggregation by site-specific criteria, we obtained several high-resolution global gridded data products. As described above these additional metadata are also used to quality control the metadata information that is provided with the original data. Table 3 provides an overview of the station metadata added to the data submissions through extraction from global gridded data products. Additional information can be found in the description of the database layout in Supplemental Material 2.
The global gridded products are provided in different resolutions, are valid for different years, and may contain errors themselves. It is therefore not advisable to rely on one individual piece of metadata for selecting or aggregating surface ozone data. For example, we tried to correlate several "pollution indicators" (nighttime lights, population density, NO x emissions, and NO 2 tropospheric column densities), and found large scatter among these variables even though they are qualitatively consistent (Figure 3). Correlation coefficients r range from 0.46 to 0.83. The lowest correlation is found between nighttime lights and log(NO 2 columns), while the highest correlation is calculated for the pair of nighttime lights and log(population density). Nevertheless, taken together these indicators allow us to make some clear distinctions between more and less pollution-impacted environments, and with their help it has been possible for the first time to develop a globally applicable, robust station classification scheme. Table 4 lists the criteria that were applied to mark all stations in the TOAR database as "urban", "rural, low elevation", "rural, high elevation", or "unclassified". The main intention of this classification is to identify sites which should have a clear urban, or clear rural signature. Therefore, the criteria have been chosen so that only about one half of all stations are classified as either urban or rural. All stations which do not fall in one of these categories are labelled "unclassified". The thresholds listed in Table 4 were determined experimentally, starting from various definitions of "urban" and "rural" obtained from web searches and varying the thresholds until we achieved the most convincing results. 
We verified the classification scheme manually by checking that about 100 sites on all continents, which we know to be either urban or rural, are actually classified as such, and we checked another 100 sites or so by inspecting the station location on Google maps in high resolution. This classification scheme is used extensively in TOAR-Health and TOAR-Vegetation. Examples are also shown in section 6 of this paper.
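The structure of such a rule-based classification can be sketched as below. The variable names follow those used in the rule expressions (with the "station_" prefix omitted), but every threshold in this sketch is a placeholder chosen for illustration and is not one of the Table 4 values; the real scheme also uses more indicators than shown here.

```python
from dataclasses import dataclass


@dataclass
class StationMeta:
    population_density: float  # km^-2
    nightlight: int            # integer index
    omi_no2_column: float      # 1e15 molec. cm^-2
    alt: float                 # m

# Placeholder thresholds (assumptions for illustration, NOT Table 4).
URBAN_MIN_POPULATION = 15000.0
URBAN_MIN_NIGHTLIGHT = 60
RURAL_MAX_POPULATION = 3000.0
RURAL_MAX_NIGHTLIGHT = 25
RURAL_MAX_NO2_COLUMN = 8.0
HIGH_ELEVATION_MIN_ALT = 1500.0


def classify(s: StationMeta) -> str:
    """Assign one of the four TOAR category labels to a station."""
    if (s.population_density > URBAN_MIN_POPULATION
            and s.nightlight >= URBAN_MIN_NIGHTLIGHT):
        return "urban"
    is_rural = (s.population_density <= RURAL_MAX_POPULATION
                and s.nightlight <= RURAL_MAX_NIGHTLIGHT
                and s.omi_no2_column <= RURAL_MAX_NO2_COLUMN)
    if is_rural:
        if s.alt > HIGH_ELEVATION_MIN_ALT:
            return "rural, high elevation"
        return "rural, low elevation"
    # Everything without a clear urban or rural signature.
    return "unclassified"


print(classify(StationMeta(5000.0, 40, 6.0, 100.0)))
```

Requiring several indicators to agree, rather than relying on a single one, reflects the observation that the individual gridded products scatter considerably against each other.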

Data quality control
The assembly of so many long-term ozone measurement records invariably raises questions about the consistency and comparability of these data. Different rules and procedures for the quality assurance of the measurements and of the data management are in place in the various networks and at individual sites. Furthermore, instruments, operators, and calibration procedures may change over time which can lead to more or less visible changes in the data record (cf. Zurbenko et al., 1996).
The principal sources of systematic data errors are:
- Measurement errors, i.e. errors in the set-up or operation of the instrument, calibration errors, inadequate instrument operating conditions (power failures, lack of air conditioning, etc.), or inconsistencies arising from instrument changes or maintenance;
- Sampling errors, i.e. an ill-positioned measurement site, improper set-up, or inadequate material of the inlet line, etc.;
- Data processing errors, i.e. false flagging of suspicious or erroneous measurements, neglect of documenting special conditions, such as local pollution sources, unit conversion errors, errors when applying calibration results, arithmetic errors when averaging higher frequency data to the standard hourly resolution (including the neglect of data capture criteria);
- Data submission errors, i.e. formatting errors, misinterpretation of flagging values, use of incorrect units, wrong time stamps due to incorrect time zone specifications or ambiguities with respect to stamping the beginning or end of an averaging interval.
On top of these systematic errors there are of course measurement uncertainties due to the measurement principle of the instrument, potential interferences, uncertainties of calibration, and instrument noise. The limited information on data quality that is available in current surface ozone data sets precludes a meaningful and systematic use of metadata for the identification of potential data inaccuracies. Furthermore, such metadata would not protect against data processing and data submission errors, of which we found many in the data that were made available to TOAR. We did not keep track of all individual data errors that were identified. A rough, subjective estimate is that more than 95% of the hourly ozone values do not show any obvious quality issues. Most of the questionable data concern complete time series from some fifty sites where reported ozone mole fractions are frequently interrupted and the values, diurnal and seasonal patterns do not match any expectation. These series are easily identified and excluded from further analysis. About 1% of the remaining data show questionable or erroneous features during some parts of the time series. These can be individual outlier values or, for example, calibration shifts which persist throughout one year or part of a year. Even if these cases are only 1% of the data, several hundred time series are affected, and without accounting for such data problems, the TOAR statistics and trend analyses would lead to wrong results. The most prevalent, obvious errors are readily exposed through visual inspection of time series, or from database queries searching for negative or anomalously high concentration values. As an example, Table 5 lists the largest mixing ratios that are found in the data between 2008 and 2015 after we had completed our data quality inspections (see below). Each entry in Table 5 is accompanied by a comment based on a visual inspection of the respective time series around the maximum ozone occurrence.
Note to Table 4: The prefix "station_" is omitted from the variable names in the rule expressions for clarity. For details of the selected variables, see Table 3. Population density is reported in km -2, nightlight is an integer index, omi_no2_column is given in 10^15 molec. cm -2, and altitudes are in m.
Four types of erroneous values were most common, and we flagged such errors in hundreds of data series: (i) extreme outliers, (ii) large negative concentration values, (iii) extended periods of very low ozone with little variability, and (iv) periods with obvious unit conversion errors. In some cases of unit errors we were able to correct the data instead of flagging them. It must be noted that identification of data errors is not always straightforward. Outliers, for example, may also arise from real events like stratospheric intrusions, and unit conversion errors may be masked by drifts in instrument sensitivity or calibration. Due to the sheer amount of data we had to process and screen, not all discovered data quality issues were systematically logged. However, in many cases erroneous or questionable data were flagged, and the data providers were contacted. Clearly, for the future development of the TOAR database, it would be desirable to develop algorithms to pre-screen the data and alert the station operators about potentially flawed time series. While we have indeed begun such developments, there has not been enough time to conduct sufficient testing, to integrate such tests into a quality control system, and to optimize the tests in order to reduce false positive alarms and potential misses to the extent possible.
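A pre-screening pass for three of the four error types listed above (extreme outliers, large negative values, and extended periods of very low ozone with little variability) could look like the following sketch. It is not the algorithm under development for TOAR; all thresholds are assumptions chosen for illustration, and unit conversion errors are deliberately left out because they usually need comparison against neighbouring sites or other archives.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Illustrative thresholds in nmol mol-1 (assumptions, not TOAR settings).
MAX_PLAUSIBLE = 500.0
MIN_PLAUSIBLE = -5.0
FLAT_MAX_VALUE = 2.0    # "very low ozone"
FLAT_MAX_RANGE = 0.5    # "little variability" within a run
FLAT_MIN_HOURS = 72     # "extended period"


def find_flat_runs(hourly: Sequence[Optional[float]],
                   min_hours: int = FLAT_MIN_HOURS) -> List[Tuple[int, int]]:
    """Return (first, last) index pairs of long, low, nearly constant runs."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    lo = hi = 0.0
    # A trailing None acts as a sentinel that closes the last open run.
    for i, v in enumerate(list(hourly) + [None]):
        low = v is not None and v <= FLAT_MAX_VALUE
        if (low and start is not None
                and max(hi, v) - min(lo, v) <= FLAT_MAX_RANGE):
            lo, hi = min(lo, v), max(hi, v)
            continue
        # The current run (if any) ends before index i.
        if start is not None and i - start >= min_hours:
            runs.append((start, i - 1))
        if low:
            start, lo, hi = i, v, v
        else:
            start = None
    return runs


def prescreen(hourly: Sequence[Optional[float]]) -> Dict[str, list]:
    """Collect indices of values that deserve a manual look."""
    return {
        "outliers": [i for i, v in enumerate(hourly)
                     if v is not None and v > MAX_PLAUSIBLE],
        "negatives": [i for i, v in enumerate(hourly)
                      if v is not None and v < MIN_PLAUSIBLE],
        "flat_runs": find_flat_runs(hourly),
    }
```

Such a screen only raises candidates. As noted above, an outlier may be a real event such as a stratospheric intrusion, so flagged values still need inspection before they are marked as erroneous.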
A particular data quality issue concerns very small or negative concentration values. Small negative concentrations can arise in ozone measurements due to the measurement principle of UV absorption instruments. However, assuming appropriate equipment with adequate signal to noise ratios and detection limits, and carefully maintained and calibrated instruments we would not expect mixing ratios lower than -1 or -2 ppbv in an hourly mean value (see Galbally and Schultz, 2013). Yet, at some stations mixing ratios of -5 ppbv or even lower were reported. These are likely associated with calibration errors and cannot easily be corrected, because it is unclear if the data should be shifted or truncated. Hence, such data have to be accepted as is (they are not flagged in the TOAR database), but the user should be aware that the data uncertainty will be large. It should be noted that individual data centers or data providers handle negative concentrations differently. Without detailed knowledge of their procedures, the resulting uncertainties of the data can be only poorly assessed.
Fortunately, at least in Europe, North America, Japan, and Korea, the density of stations is high enough that unusual patterns emerge during data analysis such as the mixing ratio and trend maps that were generated for TOAR-Health, TOAR-Vegetation, and TOAR-Climate. Indeed, in a few dozen cases such anomalies were seen on preliminary versions of the TOAR plots and they prompted us to perform a closer inspection of the respective time series, which in turn resulted in some unit corrections and the flagging of several outlier values.
Figure 4: Comparison of an arbitrarily selected subset of hourly ozone data from three adjacent Romanian stations between October 2008 and January 2009. The three sites are named DJ-3, DJ-4, and DJ-5, respectively, and they are no more than 9 km apart from each other. All three sites are associated with EDGAR HTAP NO x emissions of 6.22 g m -2 year -1, and average NO 2 column of 2.3⋅10^15 cm -2, as they apparently all fall into the same 0.1° grid cell. DJ-3 is an urban traffic site with a nighttime light intensity value of 61 (the TOAR classification of this site is "unclassified"). DJ-4 is a suburban, industrial site with a nighttime lights value of 26, and DJ-5 is a suburban background site with a nighttime lights value of 16. The TOAR classification labels both DJ-4 and DJ-5 as "rural, low elevation". The apparent inconsistencies in the ozone data from DJ-4 are likely caused by NO x emissions from the nearby power plant (see map). DOI: https://doi.org/10.1525/elementa.244.f4
For some regions with a limited number of sites but suspicious features in individual time series, we plotted the data from neighbouring sites together in order to check their consistency and identify the most suitable time series. One example of such analysis is shown in Figure 4 for three Romanian stations from the EEA Airbase. In this example, ozone data from three stations which are less than 9 km apart from each other are compared. Two of the stations (DJ-3 and DJ-5) exhibit similar patterns of variability, although the ozone mole fractions differ considerably during at least a third of the depicted period. Ozone variations at the third site (DJ-4) initially also follow those from the other two sites but then change their behaviour. Ozone mole fractions at DJ-4 are then substantially lower than at the other two sites and only rarely correlate with them. We excluded all data from DJ-4, because this station was clearly visible as an outlier in all TOAR analyses.
In retrospect, after re-investigating this case for the preparation of this manuscript, this may have been a mistake, because the observed ozone variations and lower mole fractions at DJ-4 could be real as there is a thermal power plant about 700 m away from the station. However, even so, it is questionable if such data with very strong local influence are suitable for TOAR analyses. Future analyses of this kind could greatly benefit from the inclusion of ozone precursor data, i.e. NO 2 or NO x measurements, and from more precise NO x emission data at finer resolution.
The TOAR database contains several "duplicate" records where data from one station had been submitted to different data archives. One would expect these time series to lie indistinguishably on top of each other, and in most cases they do. However, we did find instances with obvious discrepancies, ranging from periods missing in one time series but not the other, and outliers flagged in one data set but not the other, to different concentration values due to subtle differences in the application of data coverage rules. For example, in one network hourly values were considered valid only if two 30-minute average values were valid, whereas the presence of only one valid 30-minute value sufficed in the other network. In the case of a European mountain station (Jungfraujoch, Switzerland) the comparison of data from the different archives also led to the discovery of a unit conversion error in one of the databases. Here, different reference temperature and pressure values for the conversion from ppbv to µg m -3 were assumed by the data provider and the data center, and for a high-altitude site this has a significant effect. Note that instead of removing duplicate records from the database we established a selection and merging table (see Supplemental Material 1).
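Both sources of discrepancy between duplicate records described above can be illustrated with a short sketch. The conversion uses the ideal gas law; the function names, the min_valid parameter, and the two reference states in the example are assumptions made for illustration and are not the settings of any particular network or data center.

```python
from typing import Optional, Sequence

R_GAS = 8.314462618   # molar gas constant, J mol-1 K-1
M_OZONE = 47.998      # molar mass of ozone, g mol-1


def ppbv_to_ugm3(ppbv: float, temp_k: float, pres_pa: float) -> float:
    """Convert an ozone mixing ratio to mass concentration (ug m-3)."""
    mol_air_per_m3 = pres_pa / (R_GAS * temp_k)   # ideal gas law
    return ppbv * 1e-9 * mol_air_per_m3 * M_OZONE * 1e6


def ugm3_to_ppbv(ugm3: float, temp_k: float, pres_pa: float) -> float:
    """Inverse conversion; must use the SAME reference state as above."""
    return ugm3 / ppbv_to_ugm3(1.0, temp_k, pres_pa)


def hourly_mean(half_hours: Sequence[Optional[float]],
                min_valid: int) -> Optional[float]:
    """Average 30-minute values; None if too few of them are valid."""
    valid = [v for v in half_hours if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)


# The same 100 ug m-3 value read back under two assumed reference states.
at_20c = ugm3_to_ppbv(100.0, 293.15, 101325.0)
at_0c = ugm3_to_ppbv(100.0, 273.15, 101325.0)
print(round(at_20c, 1), round(at_0c, 1))

# The same half-hourly input under the two coverage rules.
print(hourly_mean([40.0, None], min_valid=2))   # strict rule: no hourly value
print(hourly_mean([40.0, None], min_valid=1))   # lenient rule: 40.0
```

Because the conversion factor scales with pressure and inversely with temperature, a mismatch between the reference state used for the forward and the backward conversion produces a constant relative offset in the whole time series, which is why comparing duplicate records can expose it.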
In summary, this review showed that large uncertainties remain in the reported surface ozone data from many networks and data centers. In spite of substantial efforts to identify and flag erroneous and suspicious parts of the data, there may be many time series with anomalies or inconsistencies which escaped our attention. Conversely, as demonstrated by one example above, we may have erroneously flagged legitimate data in some cases, because of insufficient information. Fortunately, as evidenced by the analyses presented in TOAR-Health, TOAR-Vegetation, and TOAR-Climate, the impacts of remaining data errors appear to be minor, because the concentration and trend maps show largely consistent features across wide regions. Where trends vary within a region, this may be a combination of local impacts (changes of local emission sources) and data quality issues. As discussed with the example of three Romanian stations in the context of data consistency above, detailed knowledge about individual stations and instrument performance is required in order to avoid misinterpretation of ozone trends. In order to derive meaningful results that are robust representations of regional ozone changes, automated data filtering methods must be further developed. The objective station classification of TOAR is only a beginning in this regard.

Metrics data sets
Precompiled data sets with extensive sets of statistical quantities, including metrics defined for assessing ozone impacts on health, vegetation, and climate, and ozone trend statistics, have been made available for the TOAR report and wider research use at the PANGAEA data portal (https://doi.org/10.1594/PANGAEA.876108). These data products are provided in the form of easily readable ASCII files (csv format), or NetCDF (http://www.unidata.ucar.edu/software/netcdf/) files in the case of gridded products for the evaluation of numerical models. A detailed description of the available surface ozone data products can be found in Supplemental Material 1.
The primary TOAR data products which are also used extensively in the analyses of TOAR-Health, TOAR-Vegetation, and TOAR-Climate, are the 5-year aggregate data sets and the trend statistics. Table 6 lists the time periods and the conditions that must be met by an ozone series to be included in the respective metrics or trend files. Note that ozone series are here defined as merged series (see section 8 in Supplemental Material 1) where appropriate.
All aggregate metrics and trend files are available for 6 different aggregation periods during the year (Table 7). The extractions proceed in two steps. In the first step, intermediate files are generated which contain metrics data for each individual year during the selected time interval (these are stored as "yearly statistics" on the TOAR data portal). In the second step, the aggregate or trend data sets are produced, thereby reducing the information from one line per year per station to one line per station.
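The two-step extraction can be sketched as follows, using the annual median as a stand-in for the full set of metrics. The data layout, the function names, and the inclusion criteria (75% data capture per year, at least 3 valid years) are assumptions for illustration; the conditions actually applied are those listed in Table 6.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, List, Optional, Tuple

Hourly = Dict[Tuple[str, int], List[Optional[float]]]


def yearly_statistics(hourly: Hourly, min_coverage: float = 0.75,
                      hours_per_year: int = 8760) -> List[dict]:
    """Step 1: one row per station and year that meets the coverage rule."""
    rows = []
    for (station, year), values in sorted(hourly.items()):
        valid = [v for v in values if v is not None]
        if len(valid) / hours_per_year < min_coverage:
            continue  # too little data in this year
        rows.append({"station": station, "year": year,
                     "median": median(valid)})
    return rows


def aggregate(rows: List[dict], min_years: int = 3) -> Dict[str, float]:
    """Step 2: one value per station, averaged over the valid years."""
    by_station: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        by_station[row["station"]].append(row["median"])
    return {station: mean(vals) for station, vals in by_station.items()
            if len(vals) >= min_years}
```

Storing the intermediate yearly rows, as is done on the data portal, allows the second step to be repeated with different periods or criteria without going back to the hourly data.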
All aggregated metrics and trend files preserve the complete metadata information that is provided for each site in the TOAR database. This makes it possible to relate metrics and trends to site characteristics such as population density, nighttime light intensity, etc.
Figures 5-8 demonstrate the value of the TOAR surface ozone data products and the possibility to distinguish sites by their metadata. Figure 5 shows annual median ozone mixing ratios versus station altitude compiled from more than 5000 data sets measured between 2008 and 2015. The data show a large spread in low altitudes ranging from around zero to above 50 nmol mol -1 . Ozone mole fractions generally increase with altitude (e.g. Staehelin et al., 1994;Chevalier et al., 2007), although the maximum values above 3000 m are only slightly larger than those in the lowest 500 m. Rural sites tend to exhibit larger ozone mole fractions than urban sites and there is no tendency of increasing values with altitude in urban environments. This indicates that ozone in the urban environment is primarily controlled by local chemical processes and less influenced by large scale advection. Figure 6 shows maps of monthly mean gridded daytime ozone averages in January and July between 2010 and 2014. The data represented in these maps are intended for the evaluation of global chemistry models. The grid size of 5° × 5° longitude-latitude was chosen in order to find a compromise between maximizing the data coverage and preserving regional-scale features. Data sets with other grid resolutions are also available on the TOAR data portal. While the maximum daytime average mole fractions are seen over Europe and North America during July, the largest values actually occur over East Asia in April and May (see Figure 7 and map plots on the TOAR data portal for details). The lowest values are seen over the Southern Ocean and South America during winter (i.e. July).
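The gridding behind maps such as Figure 6 can be illustrated with a simple binning sketch. The alignment of the 5° × 5° cells to multiples of 5°, the dictionary layout of the station records, and the unweighted averaging are assumptions; the altitude limit of 2000 m is the one stated for Figures 6 and 7.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[float, float]


def grid_cell(lon: float, lat: float, size: float = 5.0) -> Cell:
    """Return the south-west corner of the cell containing (lon, lat)."""
    return (math.floor(lon / size) * size, math.floor(lat / size) * size)


def grid_means(stations: Iterable[dict], size: float = 5.0,
               max_alt: float = 2000.0) -> Dict[Cell, Tuple[float, int]]:
    """Average station values per cell; also return the station count."""
    cells: Dict[Cell, List[float]] = defaultdict(list)
    for s in stations:
        if s["alt"] >= max_alt:
            continue  # high-altitude sites are excluded from these maps
        cells[grid_cell(s["lon"], s["lat"], size)].append(s["value"])
    return {cell: (sum(vals) / len(vals), len(vals))
            for cell, vals in cells.items()}
```

Returning the number of stations alongside the mean makes it possible to distinguish well-sampled cells from cells that rest on only a few sites, in the way the line styles of Figure 7 do.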
In Figure 7, seasonal cycles from the gridded daytime averages are shown at selected latitudes. Arctic and Antarctic sites exhibit the largest mole fractions during winter, while mid-latitudes show maxima either in summer or spring, and tropical sites generally do not exhibit any pronounced seasonality. There are some distinct differences in the seasonal cycles of ground-level ozone in different regions (longitudes). For example, in the Northern Hemisphere subtropics (15°-30°N), Asian stations show a clear impact of the summer monsoon (low values in June and July, see green lines). A similar effect is also seen over the Southeastern US (purple line), but it is practically absent over Europe and other parts of North America. In the temperate regions of the Northern Hemisphere (30°-45°N and 45°-60°N) some regions exhibit a maximum in summer (July-August), while others show a springtime maximum. This has been extensively discussed in the literature (e.g. Derwent et al., 1998;Monks, 2000;Wang et al., 2003;Cooper et al., 2014). The seasonal cycles of ozone at stations in Southern Asia and Eastern Asia at 30°-45°N again indicate the influence of the Asian summer monsoon, although the minimum during June-July is less well developed than at 15°-30°N.
In Figure 8 we compare the seasonal cycles of different metrics for all rural sites in the latitude range 30°-45°N. The 5 th percentiles all maximize in spring and the ozone mole fractions are astonishingly consistent among regions.
The summertime values fall into two groups: one group shows low ozone due to the summer monsoon, particularly in Asia, while the other group shows a steady decrease of the 5 th percentiles towards the winter minimum. Median values exhibit greater variability: while some regions have the maximum in summer, others show again a springtime maximum. The 95 th percentiles are more consistent with respect to the seasonal cycle, with all regions displaying a broad summer maximum, except for East Asia, where the maximum occurs in April and May and it is also considerably larger than the maxima elsewhere.

Conclusions
The TOAR database constitutes the world's largest collection of surface ozone data achieved so far and is the primary source of information for the TOAR exploratory analyses of the global distribution and trends of tropospheric ozone using metrics relevant to the impact of ozone on human health, vegetation, and climate. Data from more than 9,600 sites around the world have been brought together in one single repository and thereby coverage in many world regions has substantially increased compared to previous global ozone data collections. However, important gaps still remain in several parts of the world as there are few measurements in Africa, Central and South America, Central and Southern Asia and the Middle Eastern region (or data from these regions were not accessible). Centralized extractions of aggregated metrics data sets applying consistent filtering and statistics routines to the entire data set for all TOAR analyses are made available through the TOAR data portal at the PANGAEA publishing site. Additional methods to access TOAR data are the JOIN web interface and the associated REST services. The TOAR database allows for novel ways of analysing surface ozone data and trends as a result of the extensive data collection, the data quality control efforts, the multi-faceted metadata, and the consistent processing of all surface ozone observations. One particular example is the first objective, globally consistent station characterisation through combination of various global, gridded data sets. Demonstrations of new surface ozone analyses are given in this paper.
An important lesson from the TOAR database building effort was the recognition of the value of intensive communication with ozone data centers and other data providers and the appreciation of their work, expressed also by the long author list of this paper. As there is no global standardization of atmospheric composition metadata, data formats and data access methods, the data collection effort is rather complex and error-prone (see discussion in Sofen et al., 2016b). Only by intensive communication with the original data providers has it been possible to resolve errors and ambiguities, and we hope that the feedback they have received from the TOAR database curators will help to improve the quality and consistency of the original ozone data repositories.
Figure 7: Each panel contains all seasonal cycles within one 15° latitude band. Colors (i.e. hue) represent the longitude (in 30° bins). Solid lines represent grid boxes where at least 30 sites have valid data (i.e. at least 3 years with 75% coverage), dashed lines represent grid boxes with 6 to 29 sites, and dotted lines represent grid boxes with less than 6 sites. As in Figure 6, all stations at altitudes less than 2000 m and with at least 3 years of data during the interval were included. DOI: https://doi.org/10.1525/elementa.244.f7
To enhance the usability and value of surface ozone and air quality observations worldwide, the community must make further progress on the data and metadata harmonization and quality control.
We recommend to managers of air quality data that: (i) data centers improve the documentation of their metadata and include more information on the measurement sites, the instruments and calibration techniques in their repositories; (ii) station coordinates be provided with better accuracy and station locations be verified in order to allow analysis of ozone data together with high resolution geographical information, thereby enabling new applications such as automatic search for nearby pollution sources; (iii) more automated quality control tools be developed and harmonized among data centers. Automated quality control tools should be based on rigorous statistical methods and may benefit from new methods of big data analytics (e.g. "deep learning") in order to test data sets for consistency in space and time and use additional information such as meteorological fields or ozone precursor data to test the plausibility of reported ozone concentrations. Finally, we express our hope that access to quality controlled hourly observations of air quality will be facilitated through the implementation of state-of-the-art web services. At present a lot of manual intervention is needed in order to collect all available observations. Over the coming years we aim to further expand and continue updates of the TOAR database. Recent additions include ozone precursor and meteorological data from European stations. While the existence and accessibility of the TOAR database is guaranteed for several years, its further development will depend on the availability of funding.
The database and tools that have been developed in the context of TOAR constitute an important step towards a global data architecture (see for example https://www.rd-alliance.org/about-rda/who-rda.html) and are used as examples in presentations and discussions on this topic. This work shows considerable potential but also reveals many issues with respect to building the fully interoperable, federated data repositories (https://en.wikipedia.org/wiki/Federated_database_system) of the future. In particular, it clearly demonstrates that the cooperative involvement of many people is needed in order to bring such systems to life and to make sure that the underlying information is of known quality, robust, documented, and traceable.

Data Accessibility Statement
General access to TOAR data is free and unrestricted through the JOIN web interface (https://join.fz-juelich.de/) and its associated REST service (see documentation in Supplemental Material). This applies to all data products on daily and coarser time resolution. The hourly ozone data in the TOAR database are not publicly available due to restrictions imposed by individual data providers. For use of the interactive JOIN web interface, registration is required. Many of the original hourly ozone observations are, however, available from the original data center websites (see Table 1 of this article). The TOAR data portal on PANGAEA (https://doi.org/10.1594/PANGAEA.876108) contains ozone statistics (including metrics for assessing health, vegetation, and climate impacts), trend estimates, and graphical material. The TOAR data portal also provides free and unrestricted access. All use of TOAR surface ozone data should include a reference to this article.

Supplemental Files
The supplemental files for this article can be found as follows:
• Supplement 1. Documentation of TOAR surface ozone data products. DOI: https://doi.org/10.1525/elementa.244.s1
• Supplement 2. Documentation of the Jülich Open Web Interface for accessing TOAR surface ozone data. DOI: https://doi.org/10.1525/elementa.244.s2

Note
1 In most cases the results provided through TOAR will be very close to official statistics, but there are subtle differences in the implementation of various TOAR metrics which may change the results of such analysis. Furthermore, as described in section 5, we performed extensive additional quality control on the TOAR database records and even though we reported data flagging changes or unit errors back to the data providers, these may not always be adopted in the original archives. Finally, some data flagging schemes can be ambiguous with respect to interpreting "unusual" data as either valid or invalid. Such data may be valid for TOAR purposes, but may be unsuitable for judging whether a given station attained the legal air quality criteria or not.