Full-coverage mapping and spatiotemporal variations of ground-level ozone (O 3 ) pollution from 2013 to 2020 across China Remote

Ozone (O 3 ) is an important trace and greenhouse gas in the atmosphere, posing a threat to the ecological environment and human health at the ground level. Large-scale and long-term studies of O 3 pollution in China are few due to highly limited direct ground and satellite measurements. This study offers a new perspective to estimate ground-level O 3 from solar radiation intensity and surface temperature by employing an extended ensemble learning of the space-time extremely randomized trees (STET) model, together with ground-based observations, remote sensing products, atmospheric reanalysis, and an emission inventory. A full-coverage (100%), high-resolution (10 km) and high-quality daily maximum 8-h average (MDA8) ground-level O 3 data- set covering China (called ChinaHighO 3 ) from 2013 to 2020 was generated. Our MDA8 O 3 estimates (pre-dictions) are reliable, with an average out-of-sample (out-of-station) coefficient of determination of 0.87 (0.80) and root-mean-square error of 17.10 (21.10) μ g/m 3 in China. The unique advantage of the full coverage of our dataset allowed us to accurately capture a short-term severe O 3 pollution exposure event that took place from 23 April to 8 May in 2020. Also, a rapid increase and recovery of O 3 concentrations associated with variations in anthropogenic emissions were seen during and after the COVID-19 lockdown, respectively. Trends in O 3 con- centration showed an average growth rate of 2.49 μ g/m 3 /yr ( p < 0.001) from 2013 to 2020, along with the continuous expansion of polluted areas exceeding the daily O 3 standard (i.e., MDA8 O 3 = 160 μ g/m 3 ). Sum- mertime O 3 concentrations and the probability of occurrence of daily O 3 pollution have significantly increased since 2015, especially in the North China Plain and the main air pollution transmission belt (i.e., the “ 2 + 26 ” cities). However, a decline in both was seen in 2020, mainly due to the coordinated control of air pollution and ongoing COVID-19 effects. This carefully vetted and smoothed dataset is valuable for studies on air pollution and environmental health in China.


Introduction
Ozone (O 3 ) is an important atmospheric trace gas, where O 3 in the stratosphere plays a crucial role in absorbing ultraviolet rays, protecting the environment and humans. Tropospheric O 3 (< 12 km above the ground) is mainly produced by anthropogenic activities, affecting radiative forcing at a global scale with implications on climate change (Checa-Garcia et al., 2018;Chen et al., 2007;Gaudel et al., 2018;Shindell et al., 2013;Sinha and Toumi, 1997). Exposure to high surface O 3 levels is highly related to increased human health risks, including cardiovascular and respiratory diseases (Bell et al., 2004;Lim et al., 2019;Turner et al., 2015). It also affects the ecosystem and agricultural production, e.g., inhibiting plant growth, promoting leaf senescence, and affecting crop yields (Ainsworth et al., 2012;Mills et al., 2018;Rai and Agrawal, 2012;Sitch et al., 2007).
Since the middle of the twentieth century, many countries around the world have observed tropospheric and ground-level O 3 . In 2013, the Chinese Ministry of Environment and Ecology (MEE) established a national air quality observation network to monitor real-time O 3 , particulate matter (PM), and other near-surface air pollutants (MEE, 2018). However, the construction and maintenance of ground networks require substantial manpower and material resources. As such, monitoring stations are sparsely distributed. Satellite remote sensing can make up for such a deficiency by providing spatially continuous atmospheric O 3 distributions. The Ozone Monitoring Instrument (OMI) on the Aura satellite, launched in 2004, provides a variety of widely used daily, global-coverage trace gas products, e.g., O 3 , nitrogen dioxide (NO 2 ), and sulfur dioxide (SO 2 ). Existing techniques from space mainly provide the total column O 3 , tropospheric O 3 , and ozone profiles at different vertical ranges (Liu et al., 2010). Near-surface O 3 typically accounts for only a few percent of total column O 3 , and the retrieval sensitivity to nearsurface O 3 from ultraviolet measurements is limited. In some cases, tropospheric total column amounts can be helpful for understanding global-and regional-scale features, but values for O 3 in the planetary boundary layer are challenging to obtain and at exposure heights (~2 m), even more so. It is thus particularly difficult to extract near-surface O 3 concentrations from satellite measurements.
In recent years, much effort has been made to estimate near-surface O 3 concentrations using three main methodologies: chemical transport models, statistical models, and artificial intelligence. Chemical transport methods mainly use mature models, e.g., WRF-Chem, CMAQ, and GEOS-Chem, to simulate O 3 at the ground level by considering chemical reactions and the transport of air pollutants (Di et al., 2017;Hu et al., 2016;Qiao et al., 2019;Wang et al., 2016a;Wang et al., 2015). Statistical models fit the relationships between the measured air pollution and their potential influential factors (e.g., satellite retrievals, precursors, and meteorology) by applying different regression methods, such as Land Use Regression (LUR; Beelen et al., 2009;Huang et al., 2017;Kerckhoffs et al., 2015;Son et al., 2018), Bayesian maximum entropy (BME; Adam-Poupart et al., 2014;Chen et al., 2020), the generalized additive model (GAM; Li et al., 2020b), and geographically weighted regression (GWR, Zhang et al., 2020). Artificial intelligence, i.e., machine and deep learning, allows for obtaining more accurate parameter estimates by mining valuable information from big data using different methods, e.g., neural network (Di et al., 2017), random forest (RF; Li et al., 2020b;Zhan et al., 2018), and eXtreme Gradient Boosting (XGBoost; Li et al., 2020b).
In general, chemical/numerical methods can provide high spatiotemporal coverage of near-surface O 3 simulations but are computationally intensive. Predictions with any chemical mechanism are sensitive in nonlinear ways to emissions and meteorology. Statistical models have been widely adopted because of their simplicity and rapidity, but they are sensitive to outliers and easily affected by collinear variables, leading to poor estimates. Artificial intelligence has become very popular recently due to its strong data-mining ability, but they are always directly applied and neglect the spatiotemporal heterogeneity of air pollution. Most past related studies are limited by input data sources, e.g., satellite total column gas products (e.g., OMI/Aura) with missing values, and meteorological products, e.g., National Centers for Environmental Protection (NCEP), Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2), and ERA-Interim, at low spatiotemporal resolutions (e.g., 3-6 h, 0.25 • -0.625 • ).
Over the years, PM pollution has decreased significantly due to implemented environmental protection and control measures Wei et al., 2021a-c;Xue et al., 2021;Zhang et al., 2019). By contrast, surface O 3 pollution has increased in China Lu et al., 2018;Wang et al., 2016b;Wang et al., 2020), creating a major public health concern (Shen et al., 2019). Compared with PM studies, research on ground-level O 3 in China is more meager. Therefore, aimed at addressing the above problems, according to the idea of ensemble learning and considering the spatiotemporal variations in O 3 pollution, we extended a space-time extremely randomized trees (STET) model to derive daily ground-level O 3 concentrations with full spatial coverage at a resolution of 10 km from 2013 to 2020 across China. We also validated our O 3 estimates at different spatiotemporal scales and investigated the variations in daily and multi-year O 3 pollution via time series analyses across China.  (Fig. 1). We first removed invalid values and abnormal values due to instrument calibration issues. More importantly, since 31 August 2018, the reference state of gas observations was changed from the standard condition (i.e., 273 K and 1013 hPa) to room temperature and pressure (i.e., 298 K and 1013 hPa). The new measurements of O 3 concentrations (in μg/m 3 ) were thus correspondingly rescaled by a factor of 1.09375 (MEE, 2018). For data presented here, 1 μg/m 3 is equivalent to 0.467 ppbv. Additionally, we averaged maximum O 3 concentrations over eight hours in a day to obtain MDA8 O 3 values at each station in China for each year from 1 January 2013 to 31 December 2020.

Potential factors affecting surface O 3
Surface O 3 , a secondary air pollutant, is the characteristic product of complex photochemical reactions affected by numerous natural and human factors. Most satellites (e.g., OMI) provide only total-column or tropospheric O 3 retrievals, rather than lower tropospheric O 3 retrievals where there are large differences in O 3 content. Long-term satellite O 3 products with high spatial resolutions are rarely available, and those existing satellite retrievals have numerous missing values. In our study, we provide a new approach for estimating high-resolution surface O 3 concentrations with full coverage using two crucial meteorological parameters, namely, solar radiation intensity and surface temperature (Bloomer et al., 2009;Lee et al., 2014;Li et al., 2020a). Thus, available surface solar radiation downwards (or downward shortwave radiation, DSR) and air temperature (TEM) measurements are used as main predictors for the ground-level O3 estimation.
Remote sensing measurements of OMI/Aura total-column O 3 products (Pawan, 2012) were also considered. NO 2 concentrations may have large impacts on O 3 , so OMI/Aura tropospheric NO 2 products were utilized (Krotkov et al., 2019). The LandScan™ product was also selected to provide population distribution (POP) information. Also Moderate Resolution Imaging Spectroradiometer land cover type (LUC) and NDVI products, and Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) data were employed to describe land-use and terrain changes across China. Anthropogenic emissions from fossil fuel combustion, industrial production, and vehicle exhaust are precursors affecting the formation of surface O 3 (Li et al., 2020a). Direct emissions of three main O 3 precursors, i.e., nitrogen oxides (NO x ), volatile organic compounds (VOCs), and carbon monoxide (CO), whose concentrations were provided by the Multiresolution Emission Inventory for China (MEIC) (Li et al., 2017a;Zheng et al., 2018), were thus used. Table S1 summarizes the ground-based, satellite remote sensing, atmospheric reanalysis, and model emission datasets used in this study. Except for meteorological conditions, the spatiotemporal resolutions of other auxiliary data are coarser than our targeted model resolution. The coarser-spatial-resolution variables (e.g., emissions and NO 2 ) show smaller variations in space than meteorological variables. The coarsertemporal-resolution variables (i.e., DEM, LUC, and POP) change little over time. In addition, they are generally less important than the two main predictors (i.e., DSR and TEM) in estimating surface O 3 . Therefore, similar to previous studies Wei et al., 2021b;Zhan et al., 2018;Zhang et al., 2020), all finer and coarser-resolution auxiliary data were aggregated and resampled (regridded) to the same spatial resolution of 0.1 • × 0.1 • using the bilinear interpolation approach, and the same time interval.

STET modeling
In this study, a STET model was extended for estimating ground-level O 3 concentrations (Wei et al., 2020(Wei et al., , 2021a, based on the ensemble learning method called extremely randomized trees (extra-trees, or ERT) (Geurts et al., 2006).

Model training
First, all selected factors with potential effects on surface O 3 were input to the ERT model for model training, followed by four main steps: 1) A training-and-validation dataset (N) was generated by collocating surface O 3 measurements, satellite data, meteorological variables, and model emissions at each surface monitoring station on each day of one year. The entire training dataset was then used to construct each decision tree. 2) For each binary tree, a random split (S, a) was first generated according to surface O 3 measurements by randomly selecting one arbitrary number (a c ) between the maximum (a max ) and minimum (a min ) values. The training samples were next randomly assigned to two branches. 3) All auxiliary feature attributes (a 1 , …, a k ) in the node were traversed to get bifurcation values (s 1 , …, s k ) for all feature attributes, based on the Gini index (GI; Jiang et al., 2009). The best split (s*) was then determined when the scoring function was satisfied: score(s*, S) = max[Score(s i , S)] (Geurts et al., 2006). 4) A decision tree was established using the classification-andregression-trees (CART) algorithm (Breiman et al., 1984). Thousands of decision trees were constructed by repeating the above steps. Last, all weak classifiers were combined to form a strong classifier, i.e., ERT, allowing for parallel processing.
Fig. S1 shows a basic diagram of how trees begin their learning. Geurts et al. (2006) provide detailed information on how the ERT model works. The ERT model enables us to evaluate the importance of each independent variable for the surface O 3 estimation, assigning an importance score to each variable, calculated according to the GI. It normalizes the cumulative changes in GIs before and after node branching for each feature during the model training (Jiang et al., 2009). The higher this score, the more important is this feature in the decision-tree construction. Variables with high scores make greater contributions to the model performance. Low-score variables have smaller effects on the model and may generate redundant information (Wei et al., 2020(Wei et al., , 2021a. Variables with importance scores less than 1% are eliminated from the model to improve its efficiency and avoid overfitting caused by redundant input variables. Per our analysis of each feature importance, DSR and TEM are the two most important variables for model construction, with high importance scores of 32% and 14%, respectively (Fig. S2). OMI NO 2 and O 3 products are also highly valuable, with importance scores of 6% and 5%. However, they can only provide trace gas information about the troposphere and the whole atmosphere. Other meteorological variables (especially ET and RH) and two land-related variables (i.e., DEM and NDVI) also have significant impacts on O 3 estimates, with importance scores ranging from 2% to 7%. The emissions of three main O 3 precursors (i.e., NO x , VOCs, and CO) influence the model to some degree, with importance scores of about 2%. In general, all 18 selected independent variables have an impact, with importance scores >1.5%. They are, therefore, all kept in the model.

Model extension
In the second stage, we extended a STET model for the surface O 3 estimation by considering the autocorrelation of O 3 pollution in space and its differences in time series using the original ERT model (Wei et al., 2021a). The position of one point in space is expressed by its longitude and latitude and the Haversine great-circle distances to the four corners and the center of the study region (i.e., 73.6 • E-134.8 • E, 15.8 • N-53.7 • N). The time is expressed by the day of the year (DOY), set to identify each raw data record on each day under different air pollution conditions. The above-mentioned independent variables, along with space and time terms, are input into the model to build a robust ground-level O 3 estimation specific to China.

Validation method
In this study, the widely adopted out-of-sample (sample-based) 10fold cross-validation (10-CV) method was selected to test the overall model performance in estimating near-surface O 3 concentrations (Di et al., 2017;Li et al., 2020b;Liu et al., 2020;Zhan et al., 2018). This method stipulates that all data samples are first randomly divided into 10 groups, of which 9 groups (i.e., 90% of the samples) are used for model training, and the rest (i.e., 10% of the samples) are used for model validation. This operation runs 10 times to ensure that all samples have been tested (Rodriguez et al., 2010).
Furthermore, an additional out-of-station (station-based) 10-CV method was employed to test the spatial predictive ability of the model in areas without ground-based measurements (Li et al., 2017b;Wei et al., 2020;Wu et al., 2021). It is performed using measurements from ground-based O 3 monitoring stations. The stations are randomly divided into 10 groups, of which data samples from 9 groups (i.e., 90% Fig. 2. Flowchart of the mapping process of the ChinaHighO 3 dataset in our study. of the stations) and the remaining group (i.e., 10% of the stations) are employed for model training and validation, respectively. The training and validation samples thus represent data samples collected at different locations in space. This method enables us to evaluate the predictive accuracy of the model at locations where ground-based O 3 measurements are unavailable.
In addition, several main statistical metrics were used, including ordinary least squares (OLS; Zdaniuk, 2014) regression (e.g., slope and intercept), coefficient of determination (R 2 ), root-mean-square error (RMSE), mean absolute error (MAE), and mean relative error (MRE). Deseasonalized O 3 monthly anomalies were adopted to calculate temporal trends (Wei et al., 2019) and used to analyze the long-term spatiotemporal variations in O 3 pollution across China. Fig. 2 shows the flowchart of the mapping process of the ChinaHighO 3 dataset in our study.

Overall model performance
First, we validated the overall performance of the developed model using the out-of-sample approach at different spatial scales. Collocated are more than 3.5 million data samples (N = 3,567,344) from 2013 to 2020 across China. The MDA8 O 3 estimates for the whole of China are highly consistent with surface measurements (CV-R 2 = 0.87), with the slope and y-intercept equal to 0.87 and 11.8 μg/m 3 , respectively (Fig. 3a). The mean RMSE, MAE, and MRE values are 17.10 μg/m 3 , 11.29 μg/m 3 , and 18.38%, respectively, over the entire domain. Note that the overall accuracy of O 3 estimates has significantly improved compared to results derived from the original ERT model (i.e., CV-R 2 = 0.78, slope = 0.81, RMSE = 22.39 μg/m 3 , and MAE = 14.88 μg/m 3 ) (Geurts et al., 2006). This confirms the necessity for including spatiotemporal information about O 3 pollution.
We further tested the model performance in typical regions in China. The model works best over the Beijing-Tianjin-Hebei (BTH) region (Fig. 3c) and the North China Plain (NCP, Fig. 3b), with out-of-sample CV-R 2 values of 0.91 and 0.89, respectively, and slopes from linear regression close to 1.0 (0.91 and 0.89, respectively). The model performance is slightly poorer (e.g., CV-R 2 = 0.85-0.86, and slope = 0.84-0.86) in the Yangtze River Delta (YRD, Fig. 3d), the Pearl River Delta (PRD, Fig. 3e), and the Sichuan Basin (SCB, Fig. 3f). Overall, the model uncertainty is generally small and stable with small differences (e.g., RMSE = 18.9-21.3 μg/m 3 , MAE = 12.1-13.6 μg/m 3 , and MRE = 17.6-24.7%). These results suggest the varying robustness of our model at the regional scale in China, stemming chiefly from variable input parameters in terms of their density and accuracy.
The model performance for each separate year (Table 1) was also evaluated for the whole of China. The overall accuracy of the MDA8 O 3 estimates in the years since 2017 (i.e., out-of-sample CV-R 2 = 0.89-0.93, RMSE = 11.9-15.6 μg/m 3 , MAE = 7.9-10.8 μg/m 3 , and MRE = 10.3-15.0%) is generally better than that of the previous years (i.e., outof-sample CV-R 2 = 0.79-0.82, RMSE = 19.1-22.4 μg/m 3 , MAE = 12.9-14.9 μg/m 3 , and MRE = 21.4-31.8%). The continuous increase in the density of monitoring stations, resulting in a sharp increase in the number of data samples, and instrument improvements and quality control upgrades explain this (Wei et al., 2021a). Overall, our model works well for the study period considered and for individual years in the study domain.
On the individual-station scale (Fig. 4), the sample size varies from site to site due to differences in the observational record and the number of useful data samples from 2013 to 2020. Except for a few stations established later in the study period, most stations have a sufficient number of data samples (Fig. S3a), with an average sample size (N) of 2230 and with more than 83% of the stations having at least 5 years of data samples (i.e., N > 1825). In terms of model accuracy, CV-R 2 values exceed 0.8 at ~83% of the stations, especially those located in central and eastern China (CV-R 2 > 0.9

Spatial predictive capability
Next, we focus on evaluating the ability of our model to predict surface O 3 spatially using the out-of-station approach at varying spatial scales. Over the entire domain, our surface O 3 predictions are well correlated to observations (e.g., CV-R 2 = 0.80, slope = 0.84), with mean RMSE, MAE, and MRE values of 21.10 μg/m 3 , 13.87 μg/m 3 , and 23.18% (Fig. S4a). These values are somewhat lower than the out-of-sample validation results (Fig. 3a), indicating a strong spatial predictive ability. The spatial predictive ability of the model also gradually increases over the years (Table 1), possibly due to the same reasons discussed above. On the regional scale, the predictive ability of the model varies, i.e., better surface O 3 predictions were observed in the BTH (CV-R 2 = 0.87, RMSE = 21.51 μg/m 3 ; Fig. S4c) and NCP (CV-R 2 = 0.84, RMSE = 21.99 μg/m 3 ; Fig. S4b) regions, while relatively less accurate predictions were observed in the YRD, SCB, and PRD regions (CV-R 2 < 0.80, RMSE >22.6 μg/m 3 ; Fig. S4d-f). Compared with the out-of-sample results ( Fig. S4: b-f), the accuracy changed little. The evaluation metrics of the former two regions decreased slightly less than those of the other three regions. This is mainly due to the different densities of monitoring stations in each region.
Furthermore, the spatial predictive ability of the model shows spatial differences on the individual-station scale (Fig. S5). The predictive accuracy of surface O 3 concentrations is poor in most stations located in western China, with large estimated uncertainties (e.g., CV-R 2 < 0.  Fig. 5 shows the time series of the evaluation metrics used to evaluate how well the model estimated daily O 3 surface concentrations from 2013 to 2020. The daily sample size is large, ranging from 8003 to 10,060, with an average of 9766 and remaining unchanged over time (Fig. S3b). This is due to the unique advantage of the ChinaHighO 3 dataset, namely, its full coverage of China. While the performance varies somewhat seasonally, the magnitudes of the changes are moderate throughout the year, with CV-R 2 s ranging from 0.69 to 0.89 (average = 0.81), exceeding 0.75 on about 88% of the days in a year. The absolute uncertainties (i.e., RMSE and MAE) of the O 3 estimates vary seasonally, i.e., low in spring and winter but high in summer. By contrast, the relative uncertainty (MRE) shows an opposite seasonal variation. Surface MDA8 O 3 concentrations are relatively high in summer in most midlatitude regions of China (Gong et al., 2018).  Combining ground-based observations, satellite remote sensing data, atmospheric reanalysis products, and emission inventory, we generated the ChinaHighO 3 dataset using the STET model, which belongs to one of the series of long-term, full-coverage, high-resolution and high-quality ground-level air pollutants for China (i.e., ChinaHighAirPollutants, CHAP). The ChinaHighO 3 dataset includes daily MDA8 O 3 maps from 1 January 2013 to 31 December 2020, which was released on 30 December 2020 and is constantly updated. It overcomes the problem of missing data in optical remote sensing products caused by cloud contaminations and can provide full-coverage ground-level O 3 distributions over China. Monthly, seasonal, and annual MDA8 O 3 maps from 2013 to 2020 are also available (Table S2). The spatial pattern of O 3 in autumn (average = 80.9 ± 10.1 μg/m 3 ) was similar but generally higher than that in winter across China, especially in southeastern areas. Fig. S8 shows zoomed-in summer mean O 3 maps for four key regions in China. The ChinaHighO 3 dataset can reflect and describe well the distribution of and variation in ozone pollution at the local, even urban, scale due to its high spatial resolution of 10 km. All four typical regions experienced different degrees of O 3 pollution in summer, especially the BTH (average = 142.9 ± 14.5 μg/m 3 ) and YRD (average = 113.9 ± 13.7 μg/m 3 ) regions.

A short-term severe O 3 pollution event
In addition to the atmospheric environment and air quality, shortterm exposure to air pollution has a significant impact on human health (Almeida et al., 2011;Giani et al., 2020;Wong et al., 2007). It is difficult for traditional ground-based observations and previous satellite estimates to accurately capture air pollution on a wide scale due to the uneven distribution of surface monitoring stations. Our study helps make up for this deficiency by monitoring the pollution exposure at unprecedented spatiotemporal scales, i.e., generating spatially continuous and full-coverage daily surface O 3 maps. This allows users to quickly obtain more accurate estimates of the distribution of and variation in O 3 pollution at any location, especially in those areas with no or minimal ground-based measurements.
We closely examined a severe surface O 3 pollution episode that occurred from 23 April to 8 May in 2020 in eastern China (Fig. 8). Before 25 April, O 3 was at a low level across the whole country, then gradually  increased. On 28 April, the O 3 levels in all "2 + 26" cities ( Fig. 1), the main air pollution transmission belt in the BTH and its surrounding area ("2" refers to Beijing and Tianjin, and "26" refers to 26 prefecture-level cities in Hebei Province), had exceeded the ambient air quality standard, i.e., MDA8 O 3 = 160 μg/m 3 (Fig. S9) On 30 April, BTH experienced the maximum level of O 3 pollution (average = 232.1 ± 47.2 μg/m 3 ), remaining high until 2 May, when >50% of the cities in China exceeded the daily ozone standard. The air quality was significantly improved in northern BTH starting on 3 May, but central and southern China still suffered from light to moderate pollution, with some cities experiencing severe pollution. This national heavy pollution event lasted for nearly a week.
Surface O 3 concentrations were generally low in SCB before 25 April, gradually forming into regional pollution on 26 April, polluting the Chengdu Plain and southern and northeast Sichuan to varying degrees. By 28 April, most cities exceeded the ambient air quality standard. Pingyuan and southern Sichuan were heavily polluted, and O 3 concentrations remained high, reaching a maximum on 3 May, with an average value of 184.3 ± 29.8 μg/m 3 in SCB (Fig. S9). On 6 May, the polluted air moved southward, gradually decreasing in pollution intensity. After 7 May, accompanied by cooling and precipitation, this episode of ozone pollution ended, and the air quality improved to good or excellent. This episode of severe regional pollution lasted for about 11 days, the first severe ozone pollution event with a long duration and wide coverage in Sichuan province since the start of 2020.

Changes during the COVID-19 pandemic
The coronavirus disease (COVID-19) broke out in Wuhan, Hubei Province at the end of 2019, quickly spreading to the whole country due mainly to the Spring Festival (WHO, 2020;Zu et al., 2020). To prevent the further spread of COVID-19, Hubei Province went into lockdown starting at 10 am on 23 January 2020, soon followed by almost all other major cities in China and lasting for about three weeks (Su et al., 2020;Tian et al., 2020). To gain further insight into ozone changes associated with COVID-19, O 3 changes in China are examined before (Period I: 1-25 January), during (Period II: 26 January to 17 February), and after (Period III: 18 February to 31 March) the COVID-19 outbreak. Considering the increase in O 3 in recent years, only compared are the relative differences in O 3 concentrations across eastern China between 2020 and 2019 during the three periods (Fig. 9).
Before the COVID-19 outbreak, O 3 concentrations remained near historical values, with relative changes within ±10%. During the lockdown, significant increases in O 3 concentrations were seen in most parts of eastern China, especially in Hubei Province and its surrounding area, showing a relative change of >40%. Because O 3 formation rates over northern China are under a NO x -saturated regime, a reduction in NO x would enhance O 3 generation rates (Benish et al., 2020;Liu et al., 2021;Shi and Brasseur, 2020). By contrast, an opposite decline in O 3 concentrations was observed in the PRD, mainly caused by meteorological changes and reductions in both VOCs and NO x emissions during the lockdown . In addition, different from northern China, O 3 formation rates over the PRD are under a NO x -limited regime, so the same reduction in NO x would diminish O 3 generation rates Wang et al., 2021). After the COVID-19 outbreak, O 3 concentrations changed little (within ±10%) compared with concentrations in the previous year in most areas of eastern China, indicating that life had returned to normal. In southern China, there was a contrasting increase in O 3 concentrations, likely related to increases in NO x and temperature . The rate of ozone production varies nonlinearly with VOC and NO x emissions, and air quality can initially worsen when NO x emissions are reduced, but the total amount of ozone produced ultimately increases with increasing NO x emissions (Lin et al., 1988). Strict NO x controls must eventually be implemented to protect the environment and human health.   Fig. 10 shows the MDA8 O 3 trends (μg/m 3 /yr) during the study period (2013− 2020), calculated from monthly anomalies across China. Surface O 3 concentrations varied from national to regional scales during the recent eight years. In general, most areas of the country show significant increasing O 3 pollution, with an average of 2.49 μg/m 3 /yr (p < 0.001), especially in central China (> 5 μg/m 3 /yr, p < 0.05) and the NCP (~4.42 μg/m 3 /yr, p < 0.001). The BTH and YRD regions had the stronger increasing trends, i.e., 3.84 and 3.43 μg/m 3 /yr (p < 0.001), respectively. The other two typical regions, i.e., SCB (~1.78 μg/m 3 /yr, p < 0.001) and PRD (~1.41 μg/m 3 /yr, p < 0.05), show relatively low but clear increasing trends. The increase in O 3 over city clusters is closely associated with a decrease in NO x emissions and PM 2.5 concentrations Wang et al., 2020;Wei et al., 2021a;Zhang et al., 2019) and meteorological variations (Li et al., 2020a). By contrast, seen are opposite weakening trends in several coastal provinces in southern China (e.g., Guangxi and Zhejiang).

Long-term variations in the recent decade
Next, we investigated surface O 3 variations under the background of different implemented environmental policies (Table 2) Taking into account the seasonal differences in O 3 discussed above, we next focus on the spatiotemporal variations in summertime mean MDA8 O 3 concentrations from 2013 to 2020 over eastern China (Fig. 11). Ozone levels remained at a high level in summer among different years in China, with an average value of >90 μg/m 3 . It was higher during the period 2017-2019 than in previous years, especially in the NCP (>120 μg/m 3 ). This was closely associated with rising temperatures and an increase in the number of hot days in the NCP (Li et al., 2020a). Changes in O 3 have been diverse in the recent eight years, e.g., O 3 concentrations were higher in 2014 than in 2013 in most areas of China, yet generally decreased in 2015, especially in southern China. Ozone pollution had increased significantly since 2016, reaching a maximum in 2019 (~117.4 ± 23.6 μg/m 3 ), especially in the NCP (~159.7 ± 14.1 μg/m 3 ). This may be due to the decreasing PM 2.5 concentrations by ~15% in the NCP (Li et al., 2020a;Wei et al., 2021a); NO x emission reduction is also suspected as an important driving factor of O 3 increase in recent years. However, the dominant reason remains controversial. By contrast, in 2020, overall O 3 pollution levels decreased in China and in most typical regions in China (Table S3). The coordinated control measures of fine PM and O 3 implemented by the Chinese government (Xiang et al., 2020) may explain this, as well as the ongoing effects of COVID-19 in China. These results are highly consistent with those previously reported, based on ground-based measurements made from 2013 to 2019 (Li et al., 2020a;Lu et al., 2020;Wang et al., 2020). Our predicted results also show similar patterns in spatial distribution compared to those derived from OMI/Aura satellite observations Zhang et al., 2020) and air quality model simulations (Hu et al., 2016;Xue et al., 2020).
We also calculated the percentage of O 3 -polluted days (i.e., MDA8 O3 > 160 μg/m 3 ) for each grid in eastern China for each year from 2013 to 2020 (Fig. 12). In 2013 and 2014, O 3 pollution was mainly found in the eastern and southern provinces of China. Overall, the probability of occurrence was generally low (< 10%) in most areas. The area covered by O 3 pollution generally decreased from 2014 to 2015, especially in southern China. From then on, the area covered by O 3 pollution continuously expanded until 2020, covering most areas of eastern China. More importantly, the probability of occurrence of O 3 pollution increased significantly from 2017 to 2019, especially in the NCP. For example, 23% of the days in 2019 exceeded the accepted O 3 standard. At the regional scale, the proportion of days exceeding the daily O 3 standard also gradually increased in four typical regions, reaching 21%, 12%, 7%, and 3% in the BTH, YRD, PRD, and SCB regions in 2019, respectively (Fig. 13). By contrast, the probability of occurrence of O 3 pollution declined in most areas of northern China (e.g., NCP, BTH, and YRD) in 2020. Similar conclusions have been reported in previous studies Xue et al., 2020;Zhan et al., 2018).

Uncertainty and error analysis
We first investigated the effects of varying the number of training samples on modeled surface O 3 concentrations. For this purpose, we gradually increased the proportion of training samples from 50% to 90% for model building, with the rest of the samples used for validation by applying different N-fold (i.e., 2, …, 10) CV methods using 2020 data from China (Table S4). Overall, with an increase in training samples, the overall accuracy and spatial predictive ability of the STET model gradually improved, with increasing CV-R 2 values and decreasing estimation uncertainties. Small changes in each evaluation indicator were found, even when the training sample changed by as much as 40%, indicating that our model is stable and robust (e.g., CV-R 2 > 0.90 and RMSE <14.1 μg/m 3 ). This is mainly attributed to the unique advantage of the fullcoverage mapping, which provides a large enough sample size to cover most surface O 3 conditions and variations across mainland China. It also benefits from the robustness of ensemble learning, which has a strong anti-noise ability (Breiman, 2001;Geurts et al., 2006).
We trained and built separate models for each characteristic region and compared their prominent features (Fig. S10) and model   (Table S5) against the national model. The top-scoring features for the regional models are similar to those for the national model, e.g., ERA5 DSR, TEM, ET, RH, and OMI NO 2 and O 3 (Fig. S2). However, there were numerical differences in the importance scores for each variable. The model had different accuracies and spatial predictive abilities at the regional scale, with causes closely related to the density and spatial distribution of ground-level monitoring stations. Geographic, meteorological, and population conditions are also different in each region. The performance of the national model was generally better with smaller estimation uncertainties than any one regional model, but differences in the statistical metrics were small. The national model involves a much bigger number of data samples that can cover more O 3 conditions. It can also consider the impact of adjacent regions, especially transition areas. Full-coverage mapping provides the richest dataset to train a robust model.

Comparison with related ozone datasets
We next compared our ChinaHighO 3 dataset with long-term atmospheric reanalysis products generated from chemical models, including MERRA-2 and ERA5, which have similar spatiotemporal coverages. For this purpose, 3-h MERRA-2 and 1-h ERA5 O 3 mixing ratio (unit: kg kg − 1 ) simulations at horizontal resolutions of 0.625 • × 0.5 • and 0.25 • × 0.25 • were collected to calculate daily 14:00 local time MDA8 O 3 concentrations at the ground level (μg/m 3 ) for the year 2020 in China, validated with corresponding ground-based measurements (Fig. S11). The ground-level O 3 simulations from the chemical reanalysis products are poor, with large uncertainties (e.g., R 2 < 0.1 and RMSE >47 μg/m 3 ). The main reason is that the chemical reactions in the assimilation models are substantially simplified, mainly reflecting the impact of dynamic processes on stratospheric and tropospheric O 3 (Knowland et al., 2017). Our surface O 3 estimates are highly consistent with ground-based measurements (e.g., R 2 = 0.96 and RMSE = 8.6 μg/m 3 ), a significant improvement over the chemical reanalysis products.
We also compared the O 3 estimates derived from our ensemble approach and from basic kriging techniques by simply interpolating data from all available O 3 stations in China. MDA8 maps obtained by two widely used kriging techniques, namely, ordinary kriging and universal  kriging, have similar spatial patterns and can basically capture information about surface O 3 (Fig. S12a-b). They are in good agreement with our results in southeastern China, with small differences within ±5%. However, significant differences >40% are seen in central, southwest, and northwest China (Fig. 12c-d). This is closely associated with the decreasing density of ground-based monitoring stations that the kriging methods are highly dependent on. In particular, severe block noise easily occurs in areas with few or no observation stations (areas outlined in red in Fig. S12). In addition, kriging results change smoothly in space in areas with complex terrains (e.g., southwest China). The main reason is that kriging only considers spatial correlations among observation stations, neglecting natural and human factors that impact air pollution (Fig. S2). Ensemble learning can make up for this deficit by making full use of data mining to build a robust conversion model from abundant, potentially influential factors on air pollution. Cross validation further illustrates that our model can obtain more reliable O 3 predictions by significantly decreasing the uncertainties by 16-28% compared to kriging techniques.
In addition, different studies have relied on different main predictors, i.e., key variables input to the model for estimating surface O 3 concentrations, such as satellite-based total-column O 3 /NO 2 or CH 2 O,  MERRA-2 reanalysis data, model simulations, or in situ observations. These other O 3 datasets contain a large number of missing values at coarse or false (e.g., forced resampling) spatial resolutions (i.e., 0.25 • -0.625 • ), limited by input data sources. Our study overcomes these issues and is an improvement on previous studies, providing a daily fullcoverage (spatial coverage = 100%) and true-spatial-resolution (~0.1 • × 0.1 • ) O 3 dataset for China generated from two main predictors, i.e., DSR and TEM. The dataset developed here constitutes a nearly continuous record of ground-level O 3 concentrations from 2013 to 2020 in China.

Summary and conclusions
Ground-level O 3 is a major pollutant affecting human health. To compensate for the sparse and inhomogeneous coverage of groundbased O 3 networks and the low data quality, missing values, and low resolution of many existing satellite-based O 3 estimates, we applied a spatiotemporal extremely randomized trees machine-learning model to develop a long-term, near-surface ozone product that can overcome or lessen the above limitations. Besides O 3 training data, input variables include surface solar radiation downwards, air temperature, meteorological variables, land use information and topography, population distribution information, and a pollution emission inventory. The MDA8 O 3 product (ChinaHighO 3 ) with full coverage across China at a spatial resolution of 10 km from 2013 to 2020 was generated.
The estimates were evaluated against surface observations at varying spatiotemporal scales and compared with previous related studies. CV results illustrate that our model has a high overall accuracy (spatial predictive ability), with average out-of-sample (out-of-station) CV-R 2 , RMSE, MAE, and MRE values of 0.87 (0.80), 17.10 (21.10) μg/m 3 , 11.29 (13.87) μg/m 3 , and 18.38 (23.18) %, respectively. Note that currently, we can only evaluate the surface O 3 predictions by removing parts of the base dataset using different 10-CV approaches. Also, note that assessing the accuracy of predictions in locations where O 3 measurements have never been made still remains a challenging task. Overall, the China-HighO 3 product is superior to existing ones in terms of model accuracy, spatial coverage and resolution, and data record length.
Benefiting from the unique advantages of the ChinaHighO 3 dataset, a recent (April to May 2020) short-term national and regional severe O 3 pollution event was well captured. Also observed was a rapid increase in O 3 pollution during the COVID-19 lockdown, especially in Hubei and surrounding provinces (e.g., an increase of >30%), followed by a return to normal levels after the lockdown ended in China. This was not a repudiation of NO x controls. A long-term analysis also showed that O 3 concentrations have significantly increased by 2.49 μg/m 3 /yr (p < 0.001) in China from 2013 to 2020, especially in the North China Plain (~4.42 μg/m 3 /yr, p < 0.001). In addition, summertime O 3 concentrations after 2017 were much higher than in previous years due to rising temperatures and an increase in the number of hot days. The number of days exceeding the ambient O 3 air quality standard (MDA8 O 3 = 160 μg/ m 3 ) and the areal extent of high O 3 levels were also shown to be gradually increasing across China, especially in the "2 + 26" cities in the North China Plain. Our ChinaHighO 3 dataset will thus be useful for related studies on air pollution in China, especially those studies focused on environmental health.
In this study, daily MDA8 O 3 concentrations were first estimated to examine daily and interannual changes in surface O 3 in China. Since hourly ground measurements and core input predictors (i.e., downwelling solar surface radiation and air temperature) are available, our model can be employed to reasonably predict hourly surface O 3 concentrations. This will help toward reproducing and investigating the diurnal patterns of and variations in surface O 3 . Considering similarly available ground measurements and after determining the appropriate core input predictors (e.g., aerosol optical depth and tropospheric gas column amounts), our model can also be applied and extended to estimate other species of ground-level air pollutants (e.g., PM, NO 2 , SO 2 and CO). Our future studies will focus on this.

Declaration of Competing Interest
None.