Spatially continuous mapping of hourly ground ozone levels assisted by Himawari-8 short wave radiation products

ABSTRACT Ground-level ozone concentration has shown a noteworthy increasing trend in eastern China over the past 20 years, exerting a considerable influence on atmospheric chemistry, climate change, and air pollution. To support epidemiological research and prevent further environmental pollution, it is imperative to monitor the spatio-temporal distribution of ground-level ozone pollution. Geostationary satellites are a promising means of compensating for the limitations of ground-based measurement and polar-orbiting observation because they acquire regional atmospheric measurements at high spatio-temporal resolution, yet their products inevitably suffer from spatial discontinuity caused by cloud cover. This study established an effective model to produce spatially continuous estimates of hourly 5 km ground-level ozone (O3) concentrations covering most of China by integrating, for the first time, the hourly Himawari-8 short wave radiation (SWR) product from 1 March 2018 to 28 February 2019. The R2 values of the established model for all available samples under direct fitting, 20-fold cluster-based CV, 10-fold site-based CV, and 10-fold sample-based CV reach 0.95, 0.69, 0.87 and 0.89, respectively, implying superior universality and spatial scalability. The R2 values of seasonal site-based CV range from 0.78 (Winter) to 0.86 (Autumn), and those of hourly site-based CV range from 0.73 (0900 BST) to 0.86 (1300 BST). In addition, this study displays the spatial distribution of the estimated ground-level ozone at the temporal scales of season, week, and day. Summer (87.1 ± 28.2 μg/m3) proves to be the most polluted season, and the least polluted season is winter (59.4 ± 16.1 μg/m3). 1400 BST appears to be the most polluted hour (87.4 ± 26.6 μg/m3) and 0900 BST is the least polluted hour (59.1 ± 16.8 μg/m3). A remarkable “weekend effect of ozone” has been detected in northeast Hebei, the Sichuan Basin, and the Yangtze River valley.


Introduction
Ozone, one of the most significant atmospheric trace gases, plays a dual role depending on its location (Ramanathan and Dickinson 1979). In the stratosphere, ozone is a beneficial shield that protects life on earth by absorbing harmful shortwave radiation from the sun. In the troposphere, however, ozone acts as a harmful oxidant, greenhouse gas and pollutant, influencing atmospheric chemistry, climate change and air pollution, respectively (Swackhamer 1993). High-concentration ozone near the surface has been proven to be a noxious atmospheric pollutant and a characteristic component of urban photochemical smog (Wang et al. 2017); it is mainly produced by a series of intricate chemical reactions between volatile organic compounds (VOCs) and nitrogen oxides (NOx) under sunlight. Owing to rapid industrial development and the expanding use of fossil fuels such as petroleum, massive amounts of ozone precursors have been discharged into the atmosphere through human activities, and the increase in near-surface ozone pollution has gradually become a prominent regional environmental problem that needs to be solved. Ozone persists in the atmosphere for a long time and can be transported over long distances; it strongly irritates the human eyes and respiratory tract and causes cardiovascular, cerebrovascular and respiratory diseases (Spektor et al. 1988). Animals and plants exposed to high-concentration ozone pollution also face a severe health threat. Researchers have demonstrated a noteworthy ascending trend of surface ozone concentration in the east of China over the past 20 years (Shao et al. 2006; Xu et al. 2008; Wang et al. 2009; Sun et al. 2016; Wei, Li et al. 2022). There are serious photochemical pollution problems in major urban agglomerations such as Beijing-Tianjin-Hebei, the Pearl River Delta, Chengdu-Chongqing and the Yangtze River Delta (Xue et al. 2014). The latest atmospheric monitoring results suggest that the major air pollutants PM2.5 and PM10 have been brought under control since 2013, while ground-level ozone pollution continues to deteriorate. It can be concluded that mitigating ground-level ozone pollution will become a primary mission for China's environmental management departments. Consequently, to support epidemiological research and prevent further environmental pollution, it is imperative to monitor the spatio-temporal distribution of ground-level ozone pollution.
Traditional environmental monitoring methods, which mainly depend on ground sampling and fixed-site monitoring, are highly accurate and reliable. However, regional ground-level ozone concentration cannot be obtained in this way because ground monitoring sites are sparse. Chemical transport models (CTMs), such as WRF-CMAQ and GEOS-Chem, have been extensively used to estimate surface ozone levels; they can achieve high-spatial-resolution estimation by fusing measurements and multiple models, yet they still carry a large degree of uncertainty (Fu et al. 2022) and lack hourly resolution. Meteorological reanalysis datasets can supply spatially continuous estimates of hourly ground-level ozone loading, but their spatial resolution remains limited. Although polar-orbiting satellites are able to provide long-term sequential observation at a regional scale (Pei et al. 2022; He et al. 2021), their observation frequency is insufficient to meet the requirements of hourly atmospheric monitoring for analysing the daily variation of ground-level ozone pollution. Geostationary satellites, in contrast, are capable of acquiring real-time atmospheric measurements, yet they are inevitably affected by cloud cover. Cloud filtering is therefore needed to eliminate the measurements affected by clouds, which leads to spatial discontinuity in regional estimation results (Wang et al. 2022). Still, geostationary observation is a promising approach to compensate for the limitations of ground-based measurement and polar-orbiting observation (Chen et al. 2023) and, by utilizing effective datasets, to achieve spatially continuous estimation of hourly ground-level ozone concentration.
As one of the most essential conditions for photochemical reactions, downward surface solar radiation is a crucial variable in the modeling estimation of ground-level ozone pollution. Ozone mainly absorbs ultraviolet radiation below 0.3 μm; there is a weak absorption at 9.6 μm, while the absorption at 4.75 μm and 14 μm is weaker still and almost imperceptible (Bogumil et al. 2001; Nassar et al. 2008; Boynard et al. 2009; Anton et al. 2011; Bak et al. 2012; Damiani et al. 2012). The Himawari-8 Short Wave Radiation (SWR) dataset provides ultraviolet A (UVA), ultraviolet B (UVB) and SWR, which can be applied for high-resolution (5 km, 1 hour) ozone estimation. This study is the first attempt to apply the Himawari-8 Short Wave Radiation dataset as the key input for surface ozone estimation. The data deficiency caused by cloud cover is almost negligible in the SWR dataset, which allows spatially continuous ozone estimation (Frouin and Murakami 2007; Yu et al. 2018). In this study, spatially continuous estimates of hourly 5 km ground-level ozone concentrations are built over mainland China, based on the Himawari-8 Short Wave Radiation dataset and a Bagged-tree (BT) model. The study region, datasets and satellite-based models are described in section 2 and section 3. Then, hourly ozone concentrations are evaluated and validated through cross-validation (CV). The spatio-temporal distribution of hourly and seasonal ozone concentrations is displayed in section 4. The possible causes of bias and spatio-temporal heterogeneity are discussed in section 5.

Study region
The Chinese government has initiated nationwide air quality monitoring to improve the air quality in China. During the selected time period of this study, hourly site-based O3 observations were acquired from 1501 atmospheric monitoring sites operated by the China National Environmental Monitoring Center (CNEMC). The monitoring stations managed by CNEMC are unevenly distributed, with far more sites in the east than in the west (Figure 1). Measurements from isolated monitoring sites cannot reveal the regional distribution pattern of pollution or the surrounding controlling factors, such as terrain and weather conditions, that contribute to ground-level ozone at regional scale.

Datasets
The datasets used in this research include in situ O3 loading, SWR measurements, and ancillary datasets related to variations in O3 level, comprising background O3, precursor pollutants, meteorological, land type, topographic, and atmospheric scattering factors. All the datasets cover the period from 1 March 2018 to 28 February 2019. Detailed descriptions of these datasets are provided in Table 1.

Himawari-8 short wave radiation products
As the new generation of geosynchronous meteorological satellite operated by the Japan Meteorological Agency (Bessho et al. 2016), Himawari-8 is equipped with an advanced optical sensor with fine radiometric, spectral and spatial resolution and is better able to distinguish meteorological variations than other prevailing geosynchronous satellites. Himawari-8 has a wide field of view that covers the region from 80° E to 160° W in the east-west direction and from 60° N to 60° S in the north-south direction (Kurihara, Murakami, and Kachi 2016). The short revisit time (about 2.5 minutes for the sector area and 10 minutes for the Full Disk) enables high temporal resolution. Building on these advantages, diverse high-quality products have been developed for a variety of meteorological monitoring demands.
Given the spectral absorption characteristics of ozone, solar radiation has been adopted as the most pivotal parameter in the estimation of surface ozone pollution (Li et al. 2022; Song, Li et al. 2022; Wei, Li et al. 2022), particularly the ultraviolet part of the spectrum between 270 and 330 nm, where the conspicuous absorption peaks and valleys of the ozone Hartley and Huggins bands are easily captured (Levelt et al. 2006). Additionally, solar radiation contains essential information related to ozone formation, such as atmospheric temperature and the necessary conditions for photochemical reactions. The Himawari-8 Short Wave Radiation (SWR) dataset provides UVA, UVB and SWR, which can be applied for high-resolution (5 km, 1 hour) ozone estimation. The data deficiency caused by cloud cover is almost negligible in the SWR dataset, which allows spatially continuous ozone estimation. Compared with several widely used products (ERA-Interim, CERES-SYN and MERRA-2), the SWR dataset derived from Himawari-8 offers the highest accuracy and spatio-temporal resolution (Frouin and Murakami 2007; Yu et al. 2018).
The SWR parameterization applied in the Himawari-8 SWR products is based on the plane-parallel radiation transmission theory (Frouin and Murakami 2007). On the assumption that the planetary atmosphere can be modeled as a clear-sky atmosphere positioned above the clouds, the influences of the clear-sky atmosphere and the clouds are decoupled and processed separately. The solar radiation reaching the surface is expressed in terms of E_clear, the solar radiation that would arrive at the earth's surface if the cloud/surface layer neither reflected nor absorbed; A, the cloud/surface albedo; A_s, the ocean surface albedo; and S_a, the spherical albedo. This decoupled model is simple and practical: it is not necessary to discriminate between the clear and cloudy components of a pixel beforehand, and no cloud filtering is required. At present, the accuracy of this product has mostly been evaluated against actual measurements at marine stations and less often against measurements at land stations (Yu et al. 2018). Still, the potential of this product for surface ozone estimation is demonstrated by the high correlations (UVA: 0.57, UVB: 0.54, SWR: 0.56, the three highest correlation coefficients among all selected parameters) between the variables contained in this product and surface ozone measurements at monitoring sites.
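For orientation, the decoupled relation behind this description is commonly written in the form below. This is a hedged reconstruction rather than a verified transcription: the symbol E for the solar radiation reaching the surface and the exact arrangement of the terms are assumptions based on the quantities defined above, and should be checked against Frouin and Murakami (2007).

```latex
E = E_{\mathrm{clear}}\,(1 - A)\,(1 - A_s)^{-1}\,(1 - S_a A)^{-1}
```

In this assumed form, setting A equal to A_s cancels the first two factors and leaves E = E_clear (1 − S_a A_s)^{-1}, so only the multiple reflections between the surface and the atmosphere remain.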

Atmospheric scattering factors
As discussed above, surface ozone pollution is closely associated with the intensity of surface ultraviolet radiation, which is heavily affected by the air pollution loading. Atmospheric particulate matter diminishes surface ultraviolet radiation through absorption or scattering, slowing the photolysis rate of nitrogen dioxide (Li et al. 2005) and thus affecting the near-surface ozone concentration. Researchers found that the concentration of near-surface ozone decreased with increasing atmospheric aerosol (Jacobson, Jacobson, and Mark 1998; Castro et al. 2001) and PM10 concentration (Nishanth et al. 2014). Meanwhile, aerosol optical depth (AOD) also affects the photolysis rate of ozone precursors, thus affecting ozone generation (He and Carmichael 1999; Crutzen and Lelieveld 2001).
The original AOD and PM10 data obtained from the Himawari-8 MASINGAR product have been processed into spatially continuous fields at the Meteorological Research Institute and are provided by the JAXA P-Tree System (https://www.eorc.jaxa.jp/ptree/) of the Japan Aerospace Exploration Agency; the AOD and PM10 datasets are assimilated by Himawari L3 aerosol optical thickness at 00, 03,  (Ziemke et al. 2006; Yang et al. 2010; Ziemke et al. 2011). The surface ozone concentration at any specific place can be regarded as the sum of the regional background ozone concentration and the local photochemical ozone concentration (Lin et al. 2011). In this article, the tropospheric background ozone concentration is computed by taking SCO as the vertical correction of monthly averaged TCO.

Ozone precursor
The chemical formation mechanism of ozone is extremely complicated, and the nonlinear relationship between ozone and its precursors (NOx and VOCs) has become the greatest difficulty in preventing and controlling ozone pollution. Reductions in the emissions of ozone precursors must follow a scientifically sound proportion and be adapted to local conditions. Unreasonable emission reduction policies may aggravate local ozone pollution (Sillman 1999). As the precursor concentration changes, the mechanism of ozone generation shifts; it can be regarded as a chain cycle reaction that not only generates ozone but also consumes it. During the ozone-consuming phase of the chain cycle reaction, a reduction of ozone precursors may conversely result in a growth of ozone. Areas where O3 production is mainly controlled by NOx should give first priority to reducing NOx emissions, and the same is true of VOC emissions in areas where O3 production is controlled by VOCs. Therefore, ozone precursors are deemed another vital parameter in the modeling estimation of ground-level ozone pollution. Although remote sensing cannot directly retrieve atmospheric VOCs, HCHO is not only a significant VOC component but also a main oxidation product of other VOCs, so it reflects the atmospheric VOC content to some extent. Likewise, the NO2 column concentration reflects the content of atmospheric NOx. Therefore, NO2 and HCHO (0.25°×0.25°, daily) derived from OMI/Aura are introduced into the modeling estimation of ground-level ozone pollution. Monthly averaging is applied to NO2 and HCHO to cope with the missing data caused by the orbit gaps of Aura. In addition, median filtering is applied to HCHO to handle the orbital noise of Aura. Additionally, because of the demand for collective heating, large-scale coal burning brings about severe coal-smoke pollution in urban areas, and the concentration of carbon monoxide (CO) (Chandra, Ziemke et al. 2009), which is also one of the important precursors of O3 (Parrish, Trainer et al. 1998) and may play a major role in the ozone generation process, is markedly higher than in the non-heating period. A previous study found a clear positive correlation of 0.89 between ozone increment and CO increment during sporadic biomass burning (Yarragunta et al. 2020). The CO product derived from Measurements Of Pollution In The Troposphere (MOPITT) was applied as a predictor to further improve the precision of the proposed method (NASA/LARC/SD/ASDC 2000).

Land type data
Land use type (LUC) reflects the distribution of population and roads, which is closely related to ozone precursor emissions to some extent. The MODIS Land Cover Type Product (MCD12Q1) provides science data sets (SDSs) containing global land cover maps for six different land cover legends at 500 m spatial resolution and annual time steps. These maps are derived from the Moderate Resolution Imaging Spectroradiometer (MODIS).
There are manifold interactions between vegetation and atmospheric ozone (Guenther et al. 2012; Hong et al. 2012). The normalized difference vegetation index (NDVI) was used as an auxiliary factor for land cover type. MODIS 16-day NDVI products, namely, "CMG 0.05 Deg 16 days NDVI" in "MOD13C1/MYD13C1," were obtained from the data center of the US National Aeronautics and Space Administration. The lushness of surface vegetation is positively correlated with NDVI.

Topographic factor
Topography is a vital factor in the diffusion and deposition of atmospheric ozone pollution (Tao et al. 2012). The topographic data were acquired from the Consortium for Spatial Information of the US Geological Survey (USGS) (http://srtm.csi.cgiar.org/).

Strategies to generate spatially continuous results
In this study, several strategies are used to obtain spatially continuous hourly ground ozone levels. 1) Unlike the original Aerosol Property datasets provided by Himawari-8, which contain masses of missing values, the Himawari-8 Short Wave Radiation products (UVA, UVB, SWR) used in this study rely on the plane-parallel radiation transmission theory, in which the data deficiency caused by cloud cover is almost negligible; they provide round-the-clock surface short-wave radiation without spatial gaps over most parts of China (>80° E). Likewise, the atmospheric scattering factors (AOD and PM10) used in this study derive from the Model of Aerosol Species IN the Global AtmospheRe (MASINGAR), which supplies hourly forecasts of aerosol properties. This product is assimilated with Himawari L3 aerosol optical thickness and the L3 NRT dataset provided by MODIS/Terra+Aqua, and likewise provides round-the-clock aerosol properties without missing data across the whole territory of China. 2) Data gaps inevitably occur in the products derived from OMI and MLS (SCO, TCO, NO2, HCHO) because of the gaps between satellite orbits. This research applied monthly averaging to the products derived from OMI and MLS, and HCHO was additionally subjected to median filtering after the monthly averaging to deal with orbital noise. 3) CO was filled directly by nearest-neighbor interpolation because its missing extent is minor. After these operations, all input datasets are spatially continuous.
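The three gap-handling operations named above (monthly averaging, median filtering, nearest-neighbor filling) can be sketched on small toy grids as follows. This is a minimal illustration under stated assumptions, not the processing chain used in the study: the function names, the 3×3 filter window, the use of NaN to mark gaps, and the grid-index distance are all hypothetical choices.

```python
import math
from statistics import median

NAN = float("nan")

def monthly_mean(daily_grids):
    """Average a list of 2-D daily grids cell by cell, skipping NaN gaps."""
    rows, cols = len(daily_grids[0]), len(daily_grids[0][0])
    out = [[NAN] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            valid = [g[r][c] for g in daily_grids if not math.isnan(g[r][c])]
            if valid:
                out[r][c] = sum(valid) / len(valid)
    return out

def median_filter(grid, radius=1):
    """Replace each valid cell by the median of the valid cells in its window."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if math.isnan(grid[r][c]):
                continue
            window = [grid[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))
                      if not math.isnan(grid[i][j])]
            out[r][c] = median(window)
    return out

def fill_nearest(grid):
    """Fill NaN cells with the value of the nearest valid cell (index distance)."""
    rows, cols = len(grid), len(grid[0])
    valid = [(r, c) for r in range(rows) for c in range(cols)
             if not math.isnan(grid[r][c])]
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if math.isnan(grid[r][c]):
                nr, nc = min(valid, key=lambda p: (p[0] - r) ** 2 + (p[1] - c) ** 2)
                out[r][c] = grid[nr][nc]
    return out

# Toy example: two "daily" grids with orbit gaps marked as NaN.
day1 = [[1.0, NAN, 3.0], [NAN, NAN, 6.0]]
day2 = [[3.0, 2.0, NAN], [NAN, NAN, 8.0]]
monthly = monthly_mean([day1, day2])   # cells missing on every day stay NaN
filled = fill_nearest(monthly)         # remaining gaps take the nearest value
smoothed = median_filter(filled)       # suppresses isolated noisy cells
```

Averaging first removes most orbit gaps because a cell only needs one valid day in the month; the nearest-neighbor step is then left with the few cells that were never observed.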

Estimation models
The decision tree method is characterized by straightforward interpretability and is used extensively for regression and classification (Margineantu and Dietterich 2003; Luo et al. 2022; Wu et al. 2022). Compared with the most advanced supervised algorithms, however, a single decision tree proves inferior. The mainstream ensemble methods built on multiple decision trees are bagging and boosting. Of the two, bagging is less sensitive to outliers and costs less training time. As the name "bagging" implies, multiple models are packed into the same bag, and the bag is then used as a new model to make predictions. In other words, multiple models are combined into one large model whose final prediction is determined by the small models: the minority is subordinate to the majority (i.e. the predictions are averaged in regression). The key to bagging is that the models in the bag should not be correlated with each other, and the less correlated, the better. This decorrelation comes mainly from the different samples used to train each model. Secondly, the higher the accuracy of each model, the more valuable its decisions, even if the model is over-fitted. Because each model should be as accurate as possible on its training set, and accuracy is probably proportional to model complexity, over-fitting of an individual model is normal and justifiable. Such over-fitting affects only the separate models in the bag; because their outputs are combined in the end, the whole bag is not easily over-fitted. Random Forest is probably the most widely used model based on bagging. The difference between the Random Forest model and the Bagged Tree is that, when splitting a node of an individual decision tree, Random Forest does not use all the features supplied in the samples but randomly selects a subset of features. In this study, we did not want the predictors to be randomly selected for model training but wanted all predictors to participate in the model. Hence, the bagged-tree method is adopted in this study for ground-level ozone estimation, which effectively improves the prediction accuracy and avoids overfitting (Banfield et al. 2007).
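The bagging idea described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the model trained in the study: the helper names (`build_tree`, `fit_bagged`, `predict_bagged`), the synthetic step-function data and the default settings are hypothetical. Each tree is grown on a bootstrap resample, every split searches all features (the point that distinguishes a bagged tree from a Random Forest), and the regression output is the average over the trees.

```python
import random
from statistics import mean

def build_tree(rows, targets, min_leaf=2):
    """Grow a regression tree; every split searches ALL features."""
    if len(rows) < 2 * min_leaf or len(set(targets)) == 1:
        return mean(targets)                      # leaf: mean of its targets
    best = None                                   # (sse, feature, threshold)
    for f in range(len(rows[0])):
        for t in sorted(set(r[f] for r in rows))[1:]:
            left = [y for r, y in zip(rows, targets) if r[f] < t]
            right = [y for r, y in zip(rows, targets) if r[f] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:
        return mean(targets)
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] < t]
    ri = [i for i, r in enumerate(rows) if r[f] >= t]
    return (f, t,
            build_tree([rows[i] for i in li], [targets[i] for i in li], min_leaf),
            build_tree([rows[i] for i in ri], [targets[i] for i in ri], min_leaf))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node

def fit_bagged(rows, targets, n_trees=25, min_leaf=2, seed=0):
    """Train each tree on its own bootstrap resample of the samples."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sampling with replacement
        trees.append(build_tree([rows[i] for i in idx],
                                [targets[i] for i in idx], min_leaf))
    return trees

def predict_bagged(trees, row):
    """Regression output of the bag: the average over all trees."""
    return mean(predict_tree(t, row) for t in trees)

# Synthetic data: the target jumps from 10 to 50 when the first feature reaches 20.
rows = [[float(x), float(x % 3)] for x in range(40)]
targets = [10.0 if x < 20 else 50.0 for x in range(40)]
trees = fit_bagged(rows, targets, n_trees=15)
low = predict_bagged(trees, [5.0, 2.0])
high = predict_bagged(trees, [35.0, 2.0])
```

Restricting each split to a random subset of the feature indices inside `build_tree` would turn this sketch into a Random Forest, which is exactly the step the study chose to avoid.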
Grid search was adopted for hyperparameter optimization: the range and step size of each hyperparameter are set, and model performance is tested under every hyperparameter combination. The number of estimators, the minimum number of observations per leaf, and the number of ensemble learning cycles were tuned to attain better modeling performance, and the optimization goal is to minimize the generalization error, assessed by the determination coefficient on a validation set that accounts for 10% of all the samples.
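A grid search with a 10% hold-out validation set can be sketched generically as below. The sketch is an assumption-laden illustration: `grid_search`, `fit_knn` and `predict_knn` are hypothetical names, and a toy nearest-neighbor regressor stands in for the bagged-tree model so that the block stays self-contained; in practice the grid would hold the hyperparameters listed above.

```python
import itertools
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination of predictions against observations."""
    m = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - m) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot

def grid_search(fit, predict, X, y, grid, val_fraction=0.1, seed=0):
    """Try every combination in `grid`; keep the best validation R2."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_val = max(1, int(len(X) * val_fraction))
    val, train = idx[:n_val], idx[n_val:]
    best = None
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        model = fit([X[i] for i in train], [y[i] for i in train], **params)
        preds = [predict(model, X[i]) for i in val]
        score = r2_score([y[i] for i in val], preds)
        if best is None or score > best[1]:
            best = (params, score)
    return best

# Toy stand-in model (hypothetical): k-nearest-neighbor regression in 1-D.
def fit_knn(X, y, k):
    return (X, y, k)

def predict_knn(model, x):
    X, y, k = model
    nearest = sorted(range(len(X)), key=lambda i: abs(X[i] - x))[:k]
    return sum(y[i] for i in nearest) / len(nearest)

X = [i / 10 for i in range(100)]
y = [x * x for x in X]
grid = {"k": [1, 5, 25]}
best_params, best_score = grid_search(fit_knn, predict_knn, X, y, grid)
```

Passing `fit` and `predict` as callables keeps the search loop independent of the model, so the same loop works for any estimator whose hyperparameters can be given as keyword arguments.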

Spatio-temporal matching
All the aforementioned data were reprocessed to be spatially and temporally consistent, forming a matched dataset that serves as the fundamental samples for model training. Because the datasets used in this study have different spatial resolutions, they were first interpolated to the same spatial resolution and grid size as Himawari-8 SWR (5 km). Then, the factors around each ozone monitoring site were extracted using the following criteria. 1) The matching spatial distance should be within a 5 km radius of the site center. 2) Relative to the measuring time of Himawari-8 SWR, the matching time difference is less than 1 hour for datasets with hourly temporal resolution and less than 24 hours for datasets with daily temporal resolution; for the remaining datasets, the record with the minimal time difference in the time series is used. Lastly, the hourly average measurements of ground-level ozone were paired with the aforementioned factors within the study area.
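The two matching criteria above (a 5 km radius and a maximum time difference) can be sketched as follows. This is a minimal illustration, assuming great-circle distance on a spherical Earth and records stored as dictionaries; the function names, field names and sample coordinates are hypothetical.

```python
import math
from datetime import datetime, timedelta

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def match_site(site, obs_time, records, max_km=5.0, max_dt=timedelta(hours=1)):
    """Return the record closest in time among those within the radius and
    time window of the site observation, or None if nothing qualifies."""
    candidates = [
        rec for rec in records
        if haversine_km(site["lat"], site["lon"], rec["lat"], rec["lon"]) <= max_km
        and abs(rec["time"] - obs_time) < max_dt
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda rec: abs(rec["time"] - obs_time))

# Hypothetical site and gridded records (coordinates are illustrative only).
site = {"lat": 39.9, "lon": 116.4}
obs_time = datetime(2018, 6, 30, 10, 0)
records = [
    {"name": "A", "lat": 39.92, "lon": 116.40, "time": datetime(2018, 6, 30, 10, 20)},
    {"name": "B", "lat": 40.00, "lon": 116.40, "time": datetime(2018, 6, 30, 10, 5)},
    {"name": "C", "lat": 39.90, "lon": 116.41, "time": datetime(2018, 6, 30, 12, 30)},
]
matched = match_site(site, obs_time, records)
```

Here record B is rejected because it lies roughly 11 km from the site and record C because it is 2.5 hours away in time, so only record A is matched. For daily datasets the same function can be called with `max_dt=timedelta(hours=24)`.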

Evaluation approaches
In this study, all spatio-temporally matched samples are used for model training, and the trained model is then used to generate estimates for the same samples, which is referred to as direct fitting (Fit). N-fold cross-validation (N-fold CV) was implemented to investigate the performance of the model in estimating ozone concentrations. The widely applied sample-based CV randomly divides the samples into N parts, trains the model on N-1 randomly selected parts, and evaluates the trained model on the remaining part. This step is repeated N times so that all samples are tested (N-fold SamCV). For site-based CV, 90% of the monitoring sites were selected for model training and the remaining 10% were used to estimate ground-level ozone concentration; this was likewise repeated 10 times so that all sites were tested (10-fold SCV). Through 10-fold CV, all the estimated results were checked against the corresponding site-based observations. Subsequently, four assessment indicators, namely, the determination coefficient (R2), mean absolute error (MAE), root-mean-square error (RMSE), and relative RMSE (rRMSE), were adopted to assess the validation performance.
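The difference between sample-based and site-based folds, together with the four indicators, can be sketched as below. The sketch rests on assumptions that the text does not spell out: R2 is computed as the coefficient of determination, rRMSE as RMSE divided by the mean observation, and the function names and synthetic site identifiers are hypothetical.

```python
import math
import random

def sample_based_folds(n_samples, n_folds=10, seed=0):
    """Split individual samples at random; a site may appear in train and test."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    for k in range(n_folds):
        test = idx[k::n_folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        yield train, test

def site_based_folds(site_ids, n_folds=10, seed=0):
    """Split by monitoring site; all samples of a held-out site go to test."""
    sites = sorted(set(site_ids))
    random.Random(seed).shuffle(sites)
    for k in range(n_folds):
        held_out = set(sites[k::n_folds])
        test = [i for i, s in enumerate(site_ids) if s in held_out]
        train = [i for i, s in enumerate(site_ids) if s not in held_out]
        yield train, test

def metrics(obs, est):
    """R2, MAE, RMSE and rRMSE (RMSE / mean observation, an assumed definition)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    rmse = math.sqrt(ss_res / n)
    return {
        "R2": 1 - ss_res / ss_tot,
        "MAE": sum(abs(o - e) for o, e in zip(obs, est)) / n,
        "RMSE": rmse,
        "rRMSE": rmse / mean_obs,
    }

# 200 synthetic samples spread over 20 hypothetical sites.
site_ids = [f"S{i % 20}" for i in range(200)]
site_folds = list(site_based_folds(site_ids))
sample_folds = list(sample_based_folds(len(site_ids)))
example = metrics([2.0, 4.0], [3.0, 3.0])
```

Because a held-out site contributes no samples to training, site-based folds test spatial prediction at unmonitored locations, whereas sample-based folds mostly test interpolation in time at known sites.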
Correspondingly, the 10-fold SCV simulates the situation in which measurements at monitoring sites are missing because of equipment failure or other reasons; it evaluates whether the proposed method can fill in the measurements of absent sites and generate estimates that are representative of areas near monitors. Furthermore, cluster-based cross-validation was applied to examine the spatial extension capability of the method, assessing model performance when estimating unsampled areas far from monitors. Following Young et al. (2016), this study used 20-fold cluster-based cross-validation, in which K-means spatial clustering groups all the monitoring sites into different clusters (20-fold cluster-based CV). Figure 3 shows the seasonal comparison between the estimation results and the site-based observations. In general, the proposed method performs best in autumn, with R2 reaching 0.94, 0.65, 0.86 and 0.89 in Fit of all available samples, 20-fold cluster-based CV, 10-fold SCV and 10-fold SamCV, respectively. Conversely, the method performs worst in winter, with R2 reaching 0.90, 0.52, 0.78 and 0.84 in the above four experiments, respectively, but it still features outstanding spatial scalability and accuracy. Possible reasons for the inferior performance in winter will be discussed in section 5. Based on 10-fold SCV, the RMSE ranges from 13.9 (Winter) to 22.6 (Summer) μg/m3. Based on 20-fold clusterCV, the RMSE ranges from 20.1 (Winter) to 34.4 (Summer) μg/m3. Based on 10-fold SamCV, the RMSE ranges from 11.9 (Winter) to 18.8 (Summer) μg/m3.
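The cluster-based cross-validation described above can be sketched as a K-means grouping of site coordinates followed by leave-one-cluster-out folds. This is a minimal sketch under stated assumptions: plain Lloyd iterations with Euclidean distance on longitude/latitude, hypothetical function names, and synthetic site coordinates; the study uses 20 clusters, whereas the toy call uses 3.

```python
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda j: (p[0] - centers[j][0]) ** 2
                                        + (p[1] - centers[j][1]) ** 2)
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[j])   # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def cluster_folds(site_coords, k=20, seed=0):
    """One fold per non-empty spatial cluster: hold out the whole cluster."""
    names = sorted(site_coords)
    labels = kmeans([site_coords[n] for n in names], k, seed=seed)
    folds = []
    for j in range(k):
        test = [n for n, l in zip(names, labels) if l == j]
        if test:
            train = [n for n, l in zip(names, labels) if l != j]
            folds.append((train, test))
    return folds

# Twelve hypothetical sites as (longitude, latitude) pairs in three loose groups.
sites = {f"S{i}": (100.0 + (i % 3) * 10 + 0.1 * i, 30.0 + (i % 3) * 5)
         for i in range(12)}
folds = cluster_folds(sites, k=3)
```

Holding out an entire spatial cluster removes all nearby monitors from training, which is why this test is harsher than site-based CV and yields lower R2 values.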

Performance of estimation models
The diurnal comparison between the estimation results and the site-based observations is depicted in Figure 4, where BST refers to Beijing Standard Time. The bias of the proposed method over the study period is evaluated by comparing the temporally averaged estimated ozone concentration with the measurements of monitoring sites. As shown in Figure 5, the daily average positive deviation is 1.65 μg/m3, and the daily average negative deviation is −1.77 μg/m3. The hourly average bias ranges from −10.60 to 10.87 μg/m3, and the absolute hourly mean bias reaches 2.62 μg/m3.
At the same time, the spatial distribution of the error indicators was calculated to verify the estimated ground-level ozone concentration at different site locations. A small number of ground sites have too few matching samples, which may cause considerable bias because of the lack of site-based observations (the blue dots in Figure 6(a)). Excluding the few ground stations with fewer than 1,000 matching samples, the average R2, MAE and RMSE under SCV reach 0.86, 13.1 μg/m3 and 17.3 μg/m3, respectively. As shown in Figure 6, typical districts with severe atmospheric ozone pollution, such as the Yangtze River Delta, the Sichuan Basin, and the North China Plain, show better estimation performance, with relatively high R2 and low MAE and RMSE. Relatively large errors occur at sites along the southeast coast of China, where the MAE and RMSE are clearly higher than in other areas; this will be elaborated in Section 5. Figure 7 shows the 20-fold clusterCV validation results, with colored borderlines separating the clusters for contrast. Comparing Figures 6(b) and 7(b), a distinct difference appears in R2: most R2 values of the 10-fold SCV validation are above 0.8, while those of the 20-fold clusterCV validation decline markedly, especially in districts where monitoring stations are sparse. As concluded above, SCV is considered to be more accurate because nearby observations may be randomly   clusterCV, as the fitting slope relatively decreases in Figure 2.
If a specific cluster with a higher average concentration than other regions is used for verification, the training set may not contain sufficient high-concentration samples, which may lead to underestimation. For instance, within the time frame of this research, the ozone concentration in Northeast China appears to be relatively higher than that in other regions, while the sparse distribution of monitoring stations and insufficient training samples caused a decline in accuracy and underestimation in cluster-based validation.

Spatial distribution of estimated ground-level ozone
Figure 8 displays the spatial distribution of seasonal average ground-level ozone estimates in most parts of China. Unfortunately, there are no available estimation results in western China (<80° E) because of the limited full-disk range of the Himawari-8 datasets. According to Figure 8, there are significant differences in estimated ozone concentration between geographical locations and seasons, implying substantial spatio-temporal heterogeneity. Overall, summer proves to be the most contaminated season, with the average estimated ozone concentration reaching 87.1 ± 28.2 μg/m3, and the least contaminated season is winter, with the estimated ozone concentration decreasing to 59.4 ± 16.1 μg/m3. In addition, the average estimated ozone concentrations in spring and autumn are 86.3 ± 23.2 μg/m3 and 67.6 ± 19.5 μg/m3, respectively. Because the average estimated ozone concentration in spring is very close to that in summer, comparing the standard deviations of the two suggests that more serious pollution arises in spring in most parts of China, whereas a more prominent spatial heterogeneity of ozone pollution is detected in summer, given its larger standard deviation, which corresponds to previous studies (Chan et al. 2003; Nassar et al. 2008). To verify the accuracy of the spatial distribution of seasonal average estimated ozone concentration depicted in Figure 8, Figure 9 displays site-based seasonal mean ground-level ozone concentrations on the basis of Figure 8. The estimated results generally agree well with the site-based values in Figure 9. However, there are noticeable deviations in northern Xinjiang and northern Heilongjiang owing to the sparse site distribution and fewer matching samples, as revealed in Figure 6(a).
Notable regionality and hourly change of ozone concentrations can be found in Figure 10. Temporally, the diurnal estimated ozone concentration shows a remarkable unimodal variation: 1400 BST appears to be the most polluted hour (87.4 ± 26.6 μg/m3), when the strongest surface solar radiation occurs, and 0900 BST is the least polluted hour (59.1 ± 16.8 μg/m3), owing to the titration reaction and dry deposition at night (Ding et al. 2007; Schnell et al. 2016). In areas with low boundary layer height, such as southern China and the Sichuan Basin, the dry deposition of ground-level ozone is stronger at night (Clifton et al. 2020). Additionally, the dilution and dissipation of ozone are detected after 1400 BST as sunlight intensity and temperature decrease, and may also be associated with the accelerated wind speed that accompanies the rise of the boundary layer height at noon (Yawei et al. 2017; Liu et al. 2022). The regional estimated ozone concentration varies with solar position: the accumulation of ozone pollution first shows up in eastern China (64.8 ± 20.1 μg/m3, 1000 BST); subsequently, the ozone pollution in eastern China begins to decrease, while that in western China begins to worsen as the sun moves westward. To verify the accuracy of the spatial distribution of hourly average estimated ozone concentration depicted in Figure 10, Figure 11 displays site-based hourly mean ground-level ozone concentrations on the basis of Figure 10. The hourly estimation result displayed in Figure 11 is basically consistent with the average measurements of monitoring sites. Except for locations with fewer matching samples and sparse site distribution, it seems that the dissipation of ozone pollution with afternoon solar radiation is not as fast as the foregoing conclusion drawn from Figure 10, since the concentration of surface ozone in southeastern China still stays at a high level at 1600 BST. The possible reasons will be introduced in Section 5.
According to site-based measurements, on 30 June 2018 the maximum 8-h average ground-level O3 (MDA8 O3) in 13 provinces and regions exceeded 160 μg/m3 (the secondary concentration limit for surface ozone pollution in China's ambient air quality standard), and the pollution was concentrated in the North China Plain and the Northeast China Plain, mostly in the afternoon. A total of 62 monitoring stations in the Beijing-Tianjin-Hebei region (out of 80 available stations in the region) recorded hourly mean surface ozone of more than 200 μg/m3. At 1600 BST on this date, the average hourly ozone concentration in Shandong Province reached a staggering 393 μg/m3, the nationwide peak of average hourly ozone level. Furthermore, to characterize the estimation performance of the proposed method on a single day and to show its advantages in estimating surface ozone pollution in typical regions, the spatial distribution of hourly estimates on 30 June 2018 is displayed in Figure 12 as an example, with the Beijing-Tianjin-Hebei region shown as a typical region with severe surface ozone pollution. The site-based hourly mean ground-level ozone concentrations on 30 June 2018 are depicted in Figure 13 to further verify the estimates in Figure 12. Figure 13 suggests that the proposed method still performs well for single-day estimation, since the hourly estimates are basically consistent with the measurements of the monitoring sites. Figures 12 and 13 show that the proposed method portrays a typical surface ozone pollution event well, both over the full spatial coverage and in a representative local area, as it reveals regional properties of ground-level ozone pollution that match the site-based observations at a relatively high spatio-temporal resolution. It is worth noting that, in the part circled by the blue dotted line in the lower part of Figure 13(a), a monitoring site named 1035A differs from the surrounding estimated regional ozone level. We speculate that this deviation may relate to insufficient spatial resolution, since each estimate is merely the mixed result over a spatial grid cell (5 km). Other sites close to 1035A (sites 1029A to 1034A) show high values in the ozone estimation, and the estimation cannot generate different results within the same grid cell. Another possibility is an equipment failure at this site, which would lead to the inconsistency with the estimation.
The "weekend effect of ozone" in urban areas is detected by comparing the changes associated with emissions on weekdays and weekends, which provides a practical, measurement-based foundation for sensitivity studies of source emissions and for analysis of the factors influencing urban ozone generation mechanisms, and thus a scientific basis for the formulation of ozone control policies (Swackhamer 1991). The distributions of averaged estimated ground-level ozone concentration on weekdays and weekends, calculated to analyze the weekend effect of ozone, are displayed in Figure 14(a) and 14(b). There seems to be little difference between the two when they are compared directly, while a distinctive dissimilarity emerges after the difference is calculated, as captured in Figure 14(c). The estimated ozone concentrations on weekdays (261 days) during the study period are regarded as one array and those on weekends (103 days) as another, while Chinese traditional holidays are not discussed separately. Because the two arrays do not conform to the normal distribution, the Levene test is adopted to demonstrate the significant difference between the two. The Levene test is a special analysis of variance (F test) that tests the homogeneity of variance and is relatively robust when the data do not conform to the normal distribution. Areas with a pronounced weekend effect according to the Levene test (h = 1, p < 0.05) are presented in Figure 14(d).
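The weekday/weekend comparison described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the test is applied grid cell by grid cell, that the estimates are stored as arrays shaped (days, rows, columns), and that the median-centred variant of the Levene test is used. The function name `weekend_effect` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def weekend_effect(weekday, weekend, alpha=0.05):
    """Return (weekday mean - weekend mean, Levene significance mask).

    weekday, weekend: arrays shaped (n_days, n_rows, n_cols); NaN marks
    missing estimates. This array layout is a hypothetical choice.
    """
    diff = np.nanmean(weekday, axis=0) - np.nanmean(weekend, axis=0)
    significant = np.zeros(diff.shape, dtype=bool)
    for idx in np.ndindex(diff.shape):
        a = weekday[(slice(None),) + idx]
        b = weekend[(slice(None),) + idx]
        a, b = a[~np.isnan(a)], b[~np.isnan(b)]
        if a.size < 2 or b.size < 2:
            continue  # too few days to test this grid cell
        # center="median" is an assumption; the paper does not state it.
        _, p_value = stats.levene(a, b, center="median")
        significant[idx] = p_value < alpha
    return diff, significant


# Synthetic 2 x 2 grid with 261 weekdays and 103 weekend days.
rng = np.random.default_rng(0)
weekday = rng.normal(80.0, 5.0, size=(261, 2, 2))
weekend = rng.normal(80.0, 5.0, size=(103, 2, 2))
weekend[:, 0, 1] += 20.0                         # higher weekend mean
weekend[:, 1, 1] = rng.normal(80.0, 50.0, 103)   # much larger weekend spread

diff, significant = weekend_effect(weekday, weekend)
```

Because the Levene test compares the spread of the two samples, the cell with the inflated weekend variance is flagged, while the difference map `diff` carries the sign of the weekend effect (negative where weekend ozone is higher).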
The mechanism of ozone generation differs markedly with NOx concentration. In northeast Hebei, the ozone concentration on weekdays is significantly lower than that on weekends, and this feature is barely perceptible in the surrounding areas, which may relate to the high NOx concentration over this area. When the NOx concentration stays at a high level, excess NO reacts with and consumes ozone, which is called the "NO titration effect" (Sillman 1999). Under this circumstance, controlling NOx emissions leads to an increase in local ozone concentration (Bower et al. 1989). The decrease in NOx from automobile exhaust caused by reduced vehicle traffic on weekends weakens the titration effect and thus increases the local ozone concentration. In contrast, in the Sichuan Basin and the Yangtze River valley, the ozone concentration on weekdays is significantly higher than that on weekends. On weekends, the amount of ozone precursors from automobile exhaust in these areas may be greatly reduced, which can be associated with the decrease in ozone concentration. From this perspective, policies on energy conservation, emission reduction and vehicle traffic restriction may be effective ways to reduce atmospheric ozone pollution in these areas. Admittedly, the above analysis only provides a reference method for diagnosing the ozone generation mechanism and merely represents the situation during the study period; specific policy formulation needs to be combined with the latest data.

Comparison with other studies
This study exploits the advantages of geostationary satellites in spatio-temporal resolution and obtains satisfactory results in hourly estimation of surface ozone concentration over the study region. It achieves outstanding accuracy among the studies summarized in Table 2, with the highest site-based CV R2 (0.87) and the lowest MAE and RMSE (13.30 and 18.30 μg/m3, respectively). Most analogous studies in China use the maximum 8-h average ground-level O3 (MDA8) as the model response, so their temporal resolution remains at the daily scale. In this study, hourly ground-level ozone estimation is realized by directly using the site-based hourly ground-level ozone concentration measurements as the model response for the regional estimation results. Moreover, the data deficiency caused by cloud cover is almost negligible in the SWR dataset, which allows this study to generate spatially continuous ozone estimates. Solar radiation was also used as a key parameter in studies summarized in Table 2, for example downward shortwave radiation (Wei, Li et al. 2022), Thermal InfraRed (TIR) bands (Wang et al. 2022; Li, Yang et al. 2022) and surface ultraviolet (UV) at 380 nm (Song, Li et al. 2022). In this study, four parameters pertaining to solar radiation (UVA, UVB, SWR, STRD) are used to comprehensively simulate the sunlit conditions of ozone generation. Besides, the method used in this study has a strong theoretical basis, as the ozone generation mechanisms associated with precursors have been thoroughly elaborated and the regional "weekend effect of ozone" has been uniquely diagnosed. The principles of variable selection and the causes of the deviations and noteworthy phenomena in this research are also analyzed in detail to support further work in related fields.

Contributing factors to spatiotemporal change of ground O3 levels
Multiple factors contribute to the seasonal spatio-temporal heterogeneity of ground-level ozone distribution. Photochemical reactions are promoted by favorable meteorological conditions in summer, such as high temperature, low humidity and strong solar radiation. The ground-level ozone concentration in the southern coastal areas decreases markedly in summer, which can be associated with the dilution of ozone and its inland migration under the influence of the summer monsoon (Yi et al. 2009; Bian et al. 2012; Jia et al. 2017; Wang et al. 2018). Furthermore, frequent precipitation during summer in southern China inhibits the formation of O3 and promotes its decomposition: under high relative humidity, free radicals such as OHx contained in water vapor rapidly decompose O3 into O2, thus reducing the concentration of O3 (Qi et al. 2017). The comparatively dry weather in northern China in summer further facilitates photochemical reactions, causing a noteworthy discrepancy in ozone pollution between North and South China. The overall high ozone pollution in spring may be related to the frequent stratosphere-troposphere O3 exchange at mid-low latitudes during spring (Cooper and Moody 2000). With plants growing vigorously in spring, the emission of plant-derived VOCs may be a natural source of increased ozone concentration (Sadiq et al. 2017).

Sensitivity analysis of SWR datasets
Following the practice of previous studies (Young et al. 2016), a sensitivity analysis of the Himawari-8 SWR products was performed by examining the change in model performance with and without them, in order to demonstrate their significance to the ground-level ozone estimation model. The *R value based on RMSE, calculated as *R = 1 − RMSE²/var(Obs), is selected as the indicator for the sensitivity analysis. Unlike the conventional coefficient of determination (R2), which merely reflects precision, *R reveals more about the bias of the estimation model.
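To make the difference between the two indicators concrete, the sketch below computes *R as defined above next to a squared Pearson correlation. It is only an illustration: the function names are hypothetical, the population variance (NumPy's default) is assumed for var(Obs), and the four observation values are made up.

```python
import numpy as np


def star_r(obs, est):
    """*R = 1 - RMSE^2 / var(Obs), ignoring pairs that contain NaN."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    valid = ~(np.isnan(obs) | np.isnan(est))
    obs, est = obs[valid], est[valid]
    mse = np.mean((est - obs) ** 2)   # RMSE squared
    return 1.0 - mse / np.var(obs)    # population variance is an assumption


def pearson_r2(obs, est):
    """Squared Pearson correlation, insensitive to a constant offset."""
    return np.corrcoef(obs, est)[0, 1] ** 2


obs = np.array([40.0, 60.0, 80.0, 100.0])   # made-up observations, ug/m3
biased = obs + 10.0                          # estimate with a constant bias

r2_value = pearson_r2(obs, biased)           # stays at 1 despite the bias
star_r_value = star_r(obs, biased)           # 1 - 100/500 = 0.8
```

A constant offset of 10 μg/m3 leaves the squared correlation at 1 but lowers *R to 0.8, which is the sense in which *R is more sensitive to bias.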
As shown in Figure 15(a), the model with the SWR dataset shows a slight improvement in estimation precision; the largest improvement in *R is 0.03, which occurred in December 2018. From this point of view, it may appear that introducing the SWR dataset does not greatly improve model performance. However, this result can be explained. On the one hand, several meteorological factors already contain sufficient information from the SWR dataset, as the absolute values of the correlation coefficients between the SWR predictors and BLH, TEMP and RELH exceed 0.49, 0.42 and 0.34, respectively. On the other hand, because of the high mutual correlation among these predictors, the SWR predictors did not show prominent importance scores in the BT model, which can explain the indistinct sensitivity analysis. It is worth noting, however, that the main intention of introducing the SWR dataset as a model predictor was not to further improve model performance but to extend the spatio-temporal resolution of ground-level ozone estimation.

Inferior performance of BT model to retrieve O3 in typical season and region
The inferior CV performance in winter may relate to predictors whose influence becomes insignificant. The correlation coefficients between the surface ozone measured at monitoring sites and the main parameters, including NO2, HCHO and TEMP, decrease to −0.09, −0.05 and 0.10 in winter, compared with 0.31, 0.16 and 0.39 in summer, respectively. With weaker solar radiation and lower temperature for photochemical reactions in winter, it may be more difficult for the model to simulate the photochemical mechanism and to capture the relationship between surface ozone and the related parameters. Moreover, stagnant weather conditions in winter may cause ozone pollution to persist longer, so that weather conditions may not act instantly on ozone variation, which undermines the temporal matching mechanism established in Section 3.2.
Regionally, relatively large errors are found on the southeast coast of China, and a bias in the hourly spatial distribution of estimated ground-level ozone at 1600 BST is found in eastern China. According to the criteria described in Section 3.2, the matching time difference between datasets with hourly temporal resolution is restricted to within 1 hour, which may be an inadequate reaction time for the ozone generation process. This implies that ozone variation lags behind the environmental variables (Wang et al. 2017; Chen et al. 2020), so that the dissipation of ozone pollution with changing weather conditions is not as fast as suggested by the estimated ground-level ozone at 1600 BST in eastern China. The blocking effect of mountains in central China and the complicated hilly terrain along the southeast coast may aggravate ozone accumulation, intensifying the lag in the dissipation of ozone. However, topographic factors do not show high significance in the estimation model, signifying that the model used in this study may not simulate ozone retention very well. The above discussion can also explain the relatively large errors on the southeast coast of China, as the MAE and RMSE there are basically the same as those in other areas once the estimates for 1500 and 1600 BST are removed. These problems may be mitigated by adding a spatio-temporal weight function (Wang et al. 2021) and adjusting the variable structure.

The potential to estimate MDA8
Although this study was developed for estimating hourly ground ozone levels, the potential of the proposed method to estimate MDA8 is also tested, and the estimated results are compared with MDA8 datasets generated by previous studies. In this part, the sliding eight-hour average surface ozone concentration (A8O) measured by monitoring stations is taken as the response of the hourly model prediction. The estimated MDA8 is calculated as the maximum of the estimated A8O on a given day. It is worth noting that the SWR dataset used in this study has no available values at night. According to the data matching principle proposed in Section 2.2, no estimates are generated before sunrise or after sunset. In other words, the MDA8 values calculated in this paper are not the maximum A8O within the 24 hours of a day but the maximum A8O between sunrise and sunset, which may cause some deviation. The generated MDA8 is compared with the corresponding results of ChinaHighAirPollutants (CHAP) (Wei, Li et al. 2022) and Tracking Air Pollution in China (TAP, http://tapdata.org.cn) (Xue et al. 2020; Xiao et al. 2022). A nearest-neighbor search is used to match the available measurements of monitoring sites with the estimates of these datasets, and statistical indicators are calculated to judge the quality of each dataset. As displayed in Figure 16, the CHAP dataset shows the best performance in estimating MDA8, with the lowest RMSE of 14.3 μg/m3, while this study performs slightly worse, with the same R2 value and an RMSE of 19.4 μg/m3. This study and CHAP are based solely on satellite remote-sensing measurements, whereas the MDA8 values of TAP are simulated and fused from WRF-CMAQ and remote-sensing measurements. It is speculated that WRF-CMAQ may introduce a certain degree of bias, since the R2 value of TAP is 0.56 (Fu et al. 2022). The spatial resolution of both compared datasets is 10 km, whereas that of this study is 5 km, which is slightly finer. This study therefore still shows strong potential for estimating MDA8, even though nighttime results are lacking for the aforementioned reasons.
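The effect of restricting MDA8 to daytime hours can be illustrated with a short sketch. It rests on assumptions that the text does not state: the eight-hour window is taken as trailing (hours t−7 to t), a window needs at least six valid hourly values, and the daytime hours are 0900 to 1600. The function names and the idealized hourly series are hypothetical.

```python
import numpy as np


def trailing_8h_average(hourly, window=8, min_valid=6):
    """Sliding 8-h mean (A8O) ending at each hour; NaN where undefined.

    The trailing window and the min_valid threshold are assumptions.
    """
    hourly = np.asarray(hourly, dtype=float)
    a8o = np.full(hourly.shape, np.nan)
    for t in range(window - 1, hourly.size):
        chunk = hourly[t - window + 1 : t + 1]
        valid = chunk[~np.isnan(chunk)]
        if valid.size >= min_valid:
            a8o[t] = valid.mean()
    return a8o


def daytime_mda8(a8o, hours=range(9, 17)):
    """Maximum A8O over the hours that have estimates (assumed 0900-1600)."""
    values = np.asarray(a8o, dtype=float)[list(hours)]
    if np.all(np.isnan(values)):
        return np.nan
    return np.nanmax(values)


# Idealized day: 40 ug/m3 overnight, 120 ug/m3 from hour 10 to hour 19.
hourly = np.full(24, 40.0)
hourly[10:20] = 120.0

a8o = trailing_8h_average(hourly)
mda8_full_day = np.nanmax(a8o)       # 120.0, window ending at hour 17
mda8_daytime = daytime_mda8(a8o)     # 110.0, best window ends at hour 16
```

In this made-up series the highest eight-hour window ends after 1600, so the daytime-only maximum (110 μg/m3) falls below the full-day MDA8 (120 μg/m3), which is the kind of deviation noted above.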

Conclusions
This study establishes a method to produce spatially continuous estimates of hourly 5 km ground-level ozone concentrations based on the Himawari-8 Short Wave Radiation dataset using a BT model. The spatio-temporal distribution of hourly ground-level ozone is estimated with the proposed method over most parts of China from 1 March 2018 to 28 February 2019. The main findings are summarized as follows.
(1) Himawari-8 Short Wave Radiation products show potential for spatially continuous estimation of hourly ground-level ozone concentration, with the highest correlation coefficients with the surface ozone measurements of monitoring sites among all selected parameters.
(2) In the comparison and verification analysis between the estimation results and the site-based observations, the R2 values of FIT, 20-fold clusterCV, 10-fold SCV and 10-fold SamCV reach 0.95, 0.69, 0.87 and 0.89, respectively, implying that the method applied in this study has superior universality, as the estimates perform well across seasons and daytime hours, and spatial scalability for areas both near to and far from monitoring sites. The hourly average bias ranges from −10.60 to 10.87 μg/m3, and the absolute hourly mean bias reaches 2.62 μg/m3.
(3) The R2 values of seasonal SCV range from 0.78 (Winter) to 0.86 (Summer and Autumn), and those of hourly SCV range from 0.73 (0900 BST) to 0.86 (1300 to 1600 BST). The averaged R2 value of sample-based CV calculated separately for each site reaches 0.88. These results indicate high-level spatio-temporal universality.
(4) A distinct spatio-temporal heterogeneity is found in the seasonal and hourly estimated distribution of ground-level ozone concentration, which is caused by multiple factors.
A remarkable "weekend effect of ozone" has been detected in northeast Hebei, the Sichuan Basin and the Yangtze River valley, indicating complicated ozone generation mechanisms in urban areas.
Future work will focus on applications of the hourly regional ground-level ozone concentrations. In addition, we will address the existing problems by further improving the spatio-temporal matching principle, introducing factors that represent spatio-temporal weights and extending the study period, in order to achieve more precise results.

Figure 1. Spatial distribution of the discrete O3 monitoring network operated by the China National Environmental Monitoring Center (CNEMC). (Elevation was derived from the consortium for spatial information of the US geological survey.)

Figure 2a denotes the 20 clusters based on the spatial K-means method; monitoring sites in different clusters are assigned different colors to distinguish them. Figure 2(b-e) shows comparison and verification analysis between the estimation results and the

Figure 2. Scatterplot of estimated ground-level ozone concentrations from the established model versus corresponding site-based observations for (a) 20 clusters based on the spatial K-means method, (b) all available samples, (c) 20-fold cluster-based CV, (d) 10-fold site-based CV, (e) 10-fold sample-based CV. The RMSE, MAE, rRMSE, number of samples (N), R2 and the linear-regression function are displayed in each scatterplot.

Figure 3. Scatterplot of seasonal estimated ground-level ozone concentrations from the established model versus corresponding site-based observations for (a-d) direct fitting of available data for each season, (e-h) 20-fold cluster-based seasonal CV, (i-l) 10-fold site-based seasonal CV, and (m-p) 10-fold sample-based seasonal CV. The RMSE, MAE, rRMSE, number of samples (N), R2 and the linear-regression function are displayed in each subplot.

Figure 4. Scatterplot of diurnal estimated ground-level ozone concentrations from the established model versus corresponding site-based observations for (a-h) 20-fold cluster-based CV for each diurnal time period, (i-p) 10-fold sample-based diurnal CV, (q-x) 10-fold site-based diurnal CV. The RMSE, MAE, rRMSE, number of samples (N), R2 and the linear-regression function are displayed in each subplot.

Figure 5. Variation of (a) daily averaged estimated O3 concentrations versus measured O3 and (b) daily averaged estimation bias.

Figure 6. Spatial distribution of estimation performance of the established model versus corresponding site-based observations: (a) sample number, (b) R2, (c) MAE, (d) RMSE.

Figure 7. Spatial distribution of estimation performance of the established model versus corresponding cluster-based observations: (a) sample number, (b) R2, (c) MAE, (d) RMSE.

Figure 8. Spatial distribution of seasonal average estimated ozone concentration.

Figure 9. Verification of spatial distribution of seasonal average O3. The site-based seasonal mean ground-level ozone concentrations are represented by colored circles.

Figure 10. Spatial distribution of average estimated ground-level ozone concentrations for (a) all available data, and (b-i) different diurnal hours (0900 BST-1600 BST).

Figure 11. Verification of spatial distribution of hourly average O3. The site-based hourly mean ground-level ozone concentrations are represented by colored circles.

Figure 12. Spatial distribution of hourly estimated ozone concentration on a specific day (June 30th, 2018 is taken as an example, and the Beijing-Tianjin-Hebei region is displayed as a typical region with severe surface ozone pollution. The top picture shows the whole country, and the bottom picture shows the Beijing-Tianjin-Hebei region).

Figure 13. Verification of spatial distribution of hourly estimated ozone concentration on a specific day (The top picture shows the whole country, and the bottom picture shows the Beijing-Tianjin-Hebei region). The site-based hourly mean ground-level ozone concentrations on the corresponding date are represented by colored circles (June 30th, 2018 is taken as an example, and the Beijing-Tianjin-Hebei region is displayed as a typical region with severe surface ozone pollution).

Figure 14. Distribution of averaged estimated ground-level ozone concentration for (a) weekdays, (b) weekends, (c) difference between weekdays and weekends and (d) salient regions under different ozone generation mechanisms.

Figure 15. (a) Monthly *R values of estimation models with or without the Himawari-8 SWR dataset. (b) Importance scores of the features included in the BT model.

Figure 16. Comparison among this study, CHAP and TAP on the estimation performance for MDA8.

Table 1. Collection of datasets used in this study.
umn ozone at global scale (http://aura.gsfc.nasa.gov/). Numerous studies have demonstrated the combination of OMI and MLS can be used to simulate tropospheric ozone

Table 2. Comparison of previous O3 studies focused on China.