Inferring Near-Surface PM2.5 Concentrations from the VIIRS Deep Blue Aerosol Product in China: A Spatiotemporally Weighted Random Forest Model

Much of the population is exposed to PM2.5 (particulate matter) pollution in China, and establishing a high-precision PM2.5 grid dataset will be very valuable for air pollution and related studies. However, limited by the traditional models themselves and input data sources, PM2.5 estimations are of low accuracy with narrow spatial coverage. Therefore, we develop a new spatiotemporally weighted random forest (SWRF) model to improve the estimation accuracy and expand the spatial coverage of PM2.5 concentrations using the latest release of the Visible infrared Imaging Radiometer (VIIRS) Deep Blue (DB) aerosol product, along with meteorological variables, and socioeconomic data. Compared with traditional methods and the results of previous similar studies, our satellite-derived PM2.5 distribution shows better consistency with surface-measured records, having a high out-of-sample (out-of-station) cross-validation (CV) coefficient of determination (CV-R2), root mean squared error (RMSE), and mean absolute error (MAE) of 0.87 (0.85), 11.23 (11.53) μg m−3 and 8.25 (8.78) μg m−3, respectively. The monthly, seasonal, and annual mean PM2.5 were also successfully captured (CV-R2 = 0.91–0.92, RMSE = 4.35–6.72 μg m−3). Then, the spatial characteristics of PM2.5 pollution in 2018 were investigated, showing that although air pollution has diminished in recent years, China still faces a high PM2.5 pollution risk overall, especially in winter (average = 50.43 + 16.81 μg m−3). In addition, 19 provinces or administrative regions have annual PM2.5 concentrations >35 μg m−3, particularly the Xinjiang Uygur Autonomous Region (~55.25 μg m−3), Tianjin (~49.65 μg m−3), and Henan Province (~48.60 μg m−3). Our estimated surface PM2.5 concentrations are accurate, which could benefit further research on air pollution in China.


Introduction
In the past decade, air pollution in China has become a hot topic. Among all air pollutants, PM 2.5 (particulate matter with the aerodynamic diameter ≤2.5 µ m) has been identified as the major component of composite air pollution in China. With the rapid increase in urban loading throughout China, particulate-laden (i.e., smoggy) air is emitted from industrial sources during production processes and from domestic emissions, particularly in urban agglomerations throughout China [1][2][3]. Furthermore, high PM 2.5 concentration loading has caused frequent heavy pollution episodes in those key regions in recent years [4]. More importantly, public health studies have confirmed that particulate matter is a crucial causative factor in respiratory, cardiovascular, and autoimmune diseases [5][6][7][8], and Rohde and Muller (2015) estimated that air pollution kills approximately 700,000 to 2,200,000 people per year in China. Moreover, fine particulate matter could also affect the material and energy cycles of the Earth-atmosphere system [9]. Hence, related research on PM 2.5 is important for resolving air pollution and environmental health problems.
In the above context, the Chinese government has monitored the PM 2.5 concentration in real time since 2013, for which in situ observation sites were established primarily in urban areas across mainland China [10]. Despite these endeavors, however, the temporal and spatial distributions of measurement stations are highly discontinuous, introducing major uncertainties into related research. Therefore, mapping the PM 2.5 concentration distribution across China is vital. Fortunately, in tandem with the advances made in satellite remote-sensing technology, it is feasible to construct high-precision and -resolution PM 2.5 records. The aerosol optical depth (AOD) refers to the integration of the extinction coefficient of the medium in the vertical direction, which is used to describe the reduction of light by aerosol [11,12]. It has been used as the most important indicator to derive ground-level PM 2.5 concentrations due to their stable and positive relationships [13]. The moderate-resolution imaging spectroradiometer (MODIS) Dark Target (DT) and Deep Blue (DB) algorithm aerosol products with a spatial resolution of 10 km have been extensively applied [14,15]. However, since the launch of Terra, the first Earth Observing System (EOS) AM-1 orbiting satellite, on 18 December 1999, this satellite has been in service for over 20 years, which is far beyond its designed service life. As an extension and improvement of MODIS and the Advanced Very High Resolution Radiometer (AVHRR), a new sensor, the Suomi National Polar-orbiting Partnership (NPP) Visible infrared Imaging Radiometer (VIIRS) satellite was successfully launched in 2011. The VIIRS aerosol products (VAOOO) are generated by an algorithm similar to the DT algorithm at a spatial resolution of 6 km [16]. However, large uncertainties were discovered in the VIIRS VAOOO AOD product with large missing values, especially for bright surfaces [17,18]. Fortunately, a new aerosol product, i.e., the VIIRS Version 1 DB aerosol (AERDB) product, was generated using the DB algorithm over land and a satellite ocean aerosol retrieval algorithm over the ocean and was released by the National Aeronautics and Space Administration (NASA) in February 2018. Owing to improvements in the radiative transfer model, surface reflectance and aerosol type, this new product yields a much better accuracy with wider spatial coverage than the VIIRS VAOOO AOD product, especially for bright surfaces [19].
Three main approaches have been widely used to derive PM 2.5 concentrations: physical mechanisms [20], statistical modeling [21] and machine learning [22,23]. Many of these methods were applied to establish the VIIRS AOD-PM 2.5 relationship on different spatial scales. Wu et al. (2016) selected a spatiotemporal statistical model to construct a prediction-based PM 2.5 map for the Beijing-Tianjin-Hebei (BTH) region based on the VIIRS AOD product, meteorological data, and other auxiliary data, resulting in a R 2 of 0.72 for the model validation. Subsequently, Yao et al. (2018) indicated that the VIIRS AOD-based model could explain 76% of the PM 2.5 variations in the BTH and work better than the MODIS AOD-based model (~71%), hence suggesting that using the VIIRS AOD product is meaningful. On this basis, a two-stage model incorporating both spatial and temporal information was created to establish a PM 2.5 map for China according to the VIIRS AOD product, and the cross-validation (CV) R 2 was 0.60 [24]. In addition, studies have been conducted to invert the PM 2.5 distribution based on the VIIRS AOD product in many regions of China [10,24]. However, the consistency between the model-predicted and ground-measured PM 2.5 remains poor because the VIIRS VAOOO AOD product lacks determinacy and traditionally derived models are unstable. Accordingly, because of its superior data mining capability and ability to construct regression relationships, machine learning has been widely employed to derive accurate near-surface PM 2.5 .
Although there are existing studies making gridded PM 2.5 data of the China region from different satellite AOD products, there are still many limitations that need to be improved. On the one hand, the overall accuracy of PM 2.5 estimates in previous studies is generally low and limited by the developed models, which have poor data mining ability or ignored the spatiotemporal differences of air pollution. On the other hand, most previous studies are based on the widely used MODIS satellites, which have been in service for more than 20 years and may decommission soon in the future. Thus, VIIRS will play an important role in extending the EOS long-term observations [25,26]. However, recent related PM 2.5 studies are mainly relying on VIIRS VAOOO AOD products, which have a large number of missing values over bright surfaces due to the limitation of the DT algorithm [17,18]. Therefore, focusing on these issues, we developed a new spatiotemporally weighted random forest (SWRF) model based on ensemble learning by considering the spatiotemporal characteristics of air pollution to improve the estimation accuracy and spatial coverage of PM 2.5 concentrations in China from the newly released VIIRS DB AOD products. Section 2 introduces the data and establishes the SWRF model. Section 3 evaluates the performance of the SWRF model and presents a comparison with other relevant studies and traditional models; moreover, the PM 2.5 spatial distribution is investigated in China in this section. Section 4 provides the summary and conclusions. For this research, hourly PM 2.5 records were obtained from 1583 ground-based monitoring stations operating in 2018 in China to build the AOD-PM 2.5 relationships. To be consistent with the satellite overpass time, hourly PM 2.5 measurements from 13:00 to 14:00 local time were averaged regard as the daily mean PM 2.5 concentration for model fitting and verification. Figure 1 displays the locations of in situ observation stations in monitoring air quality across China. In general, the number of sites in East China is much higher than that in Western China, and the distribution of sites in the former is more uniform than that in the latter. For the urban agglomeration regions in particular, e.g., the BTH, Yangtze River Delta (YRD) and Pearl River Delta (PRD) regions, the urban and suburban areas in China are characterized by a high station density. In this study, daily Level-2 VIIRS AERDB product data with a spatial resolution of 6 km were collected for 2018 to establish the AOD-PM 2.5 relationship. For VIIRS aerosol products, only the cloud-free DB AOD retrievals (550 nm) with the highest quality (i.e., quality assurance = best) in 2018 covering the whole of China are employed [25,26]. In addition, AOD measurements provided by the Aerosol Robotic Network (AERONET) at 10 stations across China in 2018 were collected to evaluate the accuracy of the VIIRS DB AOD product. For AOD measurements, the AERONET Version 3 Level 2.0 data with cloud screened and quality assured were selected [27]. Figure 2 shows the agreement between the satellite-based and ground-measured AODs in China, demonstrating excellent consistency between the VIIRS AERDB retrievals and AERONET AODs with a high R 2 , low root mean squared error (RMSE), and low mean absolute error (MAE) of 0.91, 0.11, and 0.07, respectively. The high quality of these data allows a stable AOD-PM 2.5 relationship to be established.

Meteorological Dataset
Meteorological factors can impact PM 2.5 concentrations [26,27], and surface meteorological measurements are available in China, but they are in situ observations, and there are only 720 public base stations across China, which are more sparsely distributed and number much less than PM 2.5 monitoring stations. Thus, it cannot meet the needs of our study to generate the grid data in our study paper. By contrast, ERA-Interim reanalysis data can provide spatial continuous surface meteorological measurements and has a considerable accuracy across China [28,29]. Here, eight relevant meteorological factors, namely, the 2-m temperature (TEM), surface pressure (SP), wind direction (WD) and speed (WS), relative humidity (RH), evaporation (ET) precipitation (PRE) and the boundary layer height (BLH), were collected [30] to help build the AOD-PM 2.5 relationship. The temporal and spatial resolutions of the ERA-Interim meteorological variables were 3 hours and 0.125 • × 0.125 • , respectively.

Other Multiple Datasets
In this study, the VIIRS normalized difference vegetation index (NDVI) was collected with a monthly temporal resolution and a spatial resolution of 750 m to explain the spatial heterogeneity of PM 2.5 concentrations caused by vegetation coverage in China. However, land use and altitude also have an important impact on PM 2.5. Therefore, VIIRS annual land cover data (LUC, 750 m) and a Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM, 90 m) were obtained. Furthermore, as socioeconomic factors also play a crucial role in air pollution monitoring, monthly nighttime light (NTL) data were obtained from the VIIRS Day/Night Band (DNB) product with a spatial resolution of 500 m. All auxiliary data were resampled using bilinear interpolation to the same spatial resolution of 0.06 • ×0.06 • to be consistent with VIIRS AOD.

Spatiotemporally Weighted Random Forest Model
Aimed at the issues of weak data mining ability and ignoring the spatiotemporal heterogeneities of PM 2.5 pollution for current existing models, in this study we developed a totally new spatiotemporally weighted random forest (SWRF) model by involving the spatial and temporal information into the ensemble learning to improve the PM 2.5 estimations in China. The SWRF model includes two stages: the first stage is a temporally weighted random forest (TRF) model [31], which is used to build the preliminary relationship between PM 2.5 and explanatory variables by considering the different air pollution conditions among varying days in a year. The second stage employs the geographically weighted regression (GWR) model, which is used to eliminate the residual by considering the spatial autocorrelations and differences of points in space. Figure 3 shows the flowchart and structure of the SWRF model.  In this paper, we employed two independent 10-fold cross-validation (10-CV) 257 methods [39] based on the data samples (i.e., out-of-sample or sample-based) and the 258 PM2.5 monitoring stations (i.e., out-of-station or station-based), respectively. Data samples 259 or PM2.5 monitoring stations were randomly divided into 10 subsets, where nine were 260 used as training data, and one was used as validation data. This procedure was repeated 261 10 times until all the data had been tested. The training and validation data were totally 262 independent in the sample and spatial scales, which have been widely used to evaluate 263 the overall accuracy and spatial prediction ability of the model [40][41][42]. In addition, four 264 statistical indicators were employed: the regression line (slope and intercept), R 2 , RMSE, 265 In the first stage, to describe the time information more accurately, a time-weighted matrix was established to reflect the temporal difference in PM 2.5 concentrations among different days in a year. The matrix includes the day of the year (DOY, ranging from 1 to 366) and weighted time distances, where the DOY represents a time interval of equal weight (e.g., 1 day) used to identify each data record at one point on different days in a year. This matrix can reflect the different pollution conditions on each day and the continuity Remote Sens. 2021, 13, 505 6 of 17 between adjacent days. In addition, PM 2.5 pollution shows obvious seasonal variations [32]: summer (winter) is usually the cleanest (dirtiest) season; thus, seasonal differences were also considered. To describe the indeterminacy caused by seasonal fluctuations, 4 seasonal time nodes, i.e., 15 January, 16 April, 16 July, and 16 October, were selected to represent the middle of winter, spring, summer, and autumn, respectively. Then, the minimum time intervals from the above four timing nodes of each day in a year were calculated based on the inverse distance weighting (IDW) method [33]. Accordingly, a time-weighted matrix was established to depict the PM 2.5 variations on daily and seasonal scales, and the matrix can be expressed as: where TW M gt indicates the time-weighted matrix in grid g on day t, DOY gt represents the DOY index in grid g on day t, and SprD gt , SumD gt , AutD gt and WinD gt are the temporal distances for four seasons.
Except for TW M gt , other variables to be entered RF model include AOD, BLH, ET, NDVI, NTL, RH, PRE, SP, TEM, WD, and WS. We first performed a correlation analysis to ensure the interrelation between PM 2.5 and each independent variable to ensure the results are statistically significant (Table 1). Among them, AOD, evaporation, land cover, night light, relative humidity, surface pressure, and wind direction showed positive influences on the PM 2.5 concentration; in contrast, negative relationships were found between PM 2.5 and boundary layer height, digital elevation model, NDVI, precipitation, temperature and wind speed. These significant interrelations with PM 2.5 indicate that all the above variables could contribute to estimating the PM 2.5 concentration distribution. In addition, the multicollinearity problem may arise due to many independent variables employed herein. Therefore, we used the variance inflation factor (VIF) method to test and eliminate collinearity among these predictors (Table 1). There are obvious collinearity problems among independent variables if the VIF values are >10 [34]. The results show that except for the strong collinearity between SP (VIF = 10.09) and DEM (VIF = 10.10), all the other variables are totally independent with small VIF values (less than 10). However, SP and DEM are used in different stages in the SWRF model; thus, the collinearity between SP and DEM can be ignored, and all of the aforementioned variables were involved in estimating PM 2.5 . Therefore, the aforementioned 11 independent variables and the temporal term are input to the RF model as: where PM 2.5 − Pre gt indicates the preliminary estimated PM 2.5 concentrations at the surface in grid g on day t. Furthermore, PM 2.5 pollution also shows significant spatial heterogeneities because of the difference of natural conditions and human activities at the regional scale. Therefore, in the second stage, AOD, LUC, and DEM were selected to calculate geographical weights to express the spatial autocorrelations and differences at a point in space and to correct the spatial uncertainty in the PM 2.5 estimates obtained in the first stage. The residuals of preliminary near-surface PM 2.5 concentrations (i.e., the differences between the measured and predicted PM 2.5 concentrations) were calculated as the explained variable, and a geostatistical method was used to revise the PM 2.5 spatial heterogeneity as follows: where PM 2.5 − resi gt indicates the simulated PM 2.5 residual obtained by the RF model in grid g on DOY t, a 0g represents the intercept in grid g, a 1 µ gt , υ gt , a 2 µ gt , υ gt and a 3 µ gt , υ gt represent the slopes of AOD, DEM, and LUC, and ε0 gt is the error term.
In addition, Traditional statistical methods have been widely used in the estimation of surface PM 2.5 concentration. To compare the inversion ability of our SWRF model to pervious methods, five traditional widely used PM 2.5 estimation models, including the generalized additive model (GAM), the multiple linear regression (MLR), linear mixed effect (LME), geographically weighted regression (GWR) methods and the traditional 2-stage approach, were selected for comparison used the same dataset with SWRF model [21,[35][36][37][38].

Valuation Approaches
In this paper, we employed two independent 10-fold cross-validation (10-CV) methods [39] based on the data samples (i.e., out-of-sample or sample-based) and the PM 2.5 monitoring stations (i.e., out-of-station or station-based), respectively. Data samples or PM 2.5 monitoring stations were randomly divided into 10 subsets, where nine were used as training data, and one was used as validation data. This procedure was repeated 10 times until all the data had been tested. The training and validation data were totally independent in the sample and spatial scales, which have been widely used to evaluate the overall accuracy and spatial prediction ability of the model [40][41][42]. In addition, four statistical indicators were employed: the regression line (slope and intercept), R 2 , RMSE, and MAE. In general, the out-of-station results were slightly worse than the out-of-sample results, further illustrating the robustness of our model, which was mainly due to the advantages of integrated learning. out-of-station results were slightly worse than the out-of-sample results, further illustrating the robustness of our model, which was mainly due to the advantages of integrated learning.

Spatiotemporal-Scale Validation
The spatial and temporal consistency between our PM2.5 estimates and the surface measurements was also verified. Figure 5 presents the regional sample-based 10-CV results in Chinese main urban agglomerations (i.e., BTH, YRD, and PRD). The satellite-derived PM2.5 show the highest CV-R 2 (0.89) and strongest regression lines (slope =

Spatiotemporal-Scale Validation
The spatial and temporal consistency between our PM 2.5 estimates and the surface measurements was also verified. Figure 5 presents the regional sample-based 10-CV results in Chinese main urban agglomerations (i.e., BTH, YRD, and PRD). The satellite-derived PM 2.5 show the highest CV-R 2 (0.89) and strongest regression lines (slope = 0.86) but the largest RMSE (12.65 µg m −3 ) and highest MAE (9.35 µg m −3 ) in the BTH region because this agglomeration is characterized by the most severe air pollution, followed by the YRD region with average out-of-sample CV-R 2 , RMSE, and MAE values of 0.87, 9.74 µg m −3 , and 7.41 µg m −3 , respectively. However, we reached the opposite conclusion in the PRD; i.e., the model in this region yields the lowest sample-based CV-R 2 of 0.83 but the smallest RMSE and MAE values of 8.35 and 6.47 µg m −3 , respectively. The increased frequency of clouds over South China results in a smaller number of data samples. The reduction of the number of training samples can affect the learning and training ability of the model, which is one of the potential reasons for affecting the accuracy of the results [43]. However, it will not have a great impact on the regional scale. The numbers of data samples of YRD and PRD regions are 17,407 and 1984, respectively, with a difference of about 9 times, but the difference of CV-R 2 is only −0.04. In addition, PRD region has a much lower level of PM 2.5 pollution, with most data samples falling within the range of 0-100 µg m −3 . Figure 6 illustrates the model accuracy and uncertainty (i.e., R 2 and RMSE) at each PM 2.5 monitoring station across China. The results suggest obvious spatial differences in model performance: higher CV-R 2 values are observed mainly in East China, especially in Henan Province (CV-R 2 = 0.98), whereas lower CV-R 2 values are observed primarily in Western China, especially in the Xinjiang Uygur Autonomous Region (CV-R 2 <0.6). In addition, except for some individual sites located in Xinjiang (RMSE >20 µg m −3 ), our model yields small uncertainties with RMSEs <15 µg m −3 at most stations across China. This is mainly attributed to the lack of monitoring stations as well as the frequent occurrence of sandstorms in Western China, increasing the uncertainty of PM 2.5 retrievals. Nevertheless, the average site-scale CV-R 2 and RMSE are 0.81 and 11.39, respectively, and approximately 82.5% and 82.2% of all surface sites express a high CV-R 2 >0.7 and a low RMSE <15 µg m −3 . These results illustrate that, in China, our SWRF model can successfully estimate PM 2.5 at site scale. 0.86) but the largest RMSE (12.65 μg m −3 ) and highest MAE (9.35 μg m −3 ) in the BTH re-298 gion because this agglomeration is characterized by the most severe air pollution, fol-299 lowed by the YRD region with average out-of-sample CV-R 2 , RMSE, and MAE values of 300 0.87, 9.74 μg m −3 , and 7.41 μg m −3 , respectively. However, we reached the opposite con-301 clusion in the PRD; i.e., the model in this region yields the lowest sample-based CV-R 2 of 302 0.83 but the smallest RMSE and MAE values of 8.35 and 6.47 μg m −3 , respectively. The 303 increased frequency of clouds over South China results in a smaller number of data 304 samples. The reduction of the number of training samples can affect the learning and 305 training ability of the model, which is one of the potential reasons for affecting the accu-306 racy of the results [43]. However, it will not have a great impact on the regional scale. 307 The numbers of data samples of YRD and PRD regions are 17,407 and 1984, respectively, 308 with a difference of about 9 times, but the difference of CV-R 2 is only −0.04. In addition, 309 PRD region has a much lower level of PM2.5 pollution, with most data samples falling 310 within the range of 0-100 μg m −3 . Furthermore, the temporal performance of the SWRF model as a function of the DOY was also investigated. In 2018, the CV-R 2 varies from 0.27 to 0.98 with an average of 0.73 in China. In general, approximately 65.5% of all days have a high CV-R 2 >0.7, with only 8 days yielding a low CV-R 2 <0.4. The PM 2.5 predicts are always less correlated with the ground measurements (CV-R 2 = 0.67) but with lower uncertainties (RMSE = 8.32 µg m −3 ) on summer days than on days in the other seasons; by contrast, higher CV-R 2 values > 0.72 but larger RMSEs >15 µg m −3 are always observed on spring and winter days. This disparity occurs because in summer, the dominant natural (e.g., meteorological) conditions complicate the AOD-PM 2.5 relationship; by contrast, spring and winter are characterized by a higher number of severely polluted days due to intense human activity (e.g., heating) and a higher frequency of sand-dust storm days, especially in North China [44,45]    however, the GAM does not consider the spatiotemporal heterogeneity of air pollution. Thus, the GWR and LME models were selected for comparison; however, their accuracies are unsatisfactory with overall low sample-based (station-based) CV-R 2 of 0.55 (0.53) and 0.67 (0.66), respectively, and low RMSEs of 17.32 (17.65) µg m −3 and 16.48 (16.77) µg m −3 because these two models consider only one factor of spatial and temporal information. Thus, a 2-stage model combining the LME and GWR models was employed for the estimation, demonstrating improvements in the accuracy and predictive ability with better evaluation metrics (e.g., CV-R 2 = 0.70-0.72, RMSE=15.59 -15.72 µg m −3 , MAE=12.41-12.87 µg m −3 ). Nevertheless, the two-stage model is much less accurate than our SWRF model due to the considerably weaker data mining ability of the former and the poor integration of spatiotemporal information. Moreover, we compared our results with those of previous PM2.5 studies using the 383 VIIRS AOD product at the national and regional scales in China ( Table 2). All the listed 384 studies performed the independent validation using the same out-of-sample 10-CV 385 method, making it comparable. We found that our model is more accurate than the time 386 fixed effects regression (TFER) model (CV-R 2 = 0.72, RMSE = 22.07 μg m −3 ) and the com-387 bination of the TFER and GWR models in the BTH region (CV-R 2 = 0.72, RMSE = 19.72 μg 388 m −3 ) [14,46]. In addition, our model outperforms the LME (CV-R 2 = 0.64, RMSE = 18.02 μg 389 m −3 ), LME + GAM (CV-R 2 = 0.69, RMSE = 15.82 μg m −3 ), and LME + GWR (CV-R 2 = 0.70, 390 RMSE =15.73) models in Central China [23]. Furthermore, our model is superior to the 391 spatially structured adaptive 2-stage model (TFER + GWR) (CV-R 2 = 0.60, RMSE = 21.67 392 μg m −3 ) in the whole of China. This is mainly due to the much stronger data mining abil-393 ity of our model. In addition, we further optimized the introduction of spatiotemporal 394 information and selected the AERDB AOD product with higher accuracy and wider 395 coverage than the VAOOO DT AOD product in our study.  Moreover, we compared our results with those of previous PM 2.5 studies using the VIIRS AOD product at the national and regional scales in China ( Table 2). All the listed studies performed the independent validation using the same out-of-sample 10-CV method, making it comparable. We found that our model is more accurate than the time fixed effects regression (TFER) model (CV-R 2 = 0.72, RMSE = 22.07 µg m −3 ) and the combination of the TFER and GWR models in the BTH region (CV-R 2 = 0.72, RMSE = 19.72 µg m −3 ) [14,46]. In addition, our model outperforms the LME (CV-R 2 = 0.64, RMSE = 18.02 µg m −3 ), LME + GAM (CV-R 2 = 0.69, RMSE = 15.82 µg m −3 ), and LME + GWR (CV-R 2 = 0.70, RMSE =15.73) models in Central China [23]. Furthermore, our model is superior to the spatially structured adaptive 2-stage model (TFER + GWR) (CV-R 2 = 0.60, RMSE = 21.67 µg m −3 ) in the whole of China. This is mainly due to the much stronger data mining ability of our model. In addition, we further optimized the introduction of spatiotemporal information and selected the AERDB AOD product with higher accuracy and wider coverage than the VAOOO DT AOD product in our study.

Spatial Distribution of PM 2.5 in China
Ground-based observations can provide high-frequency and high-precision PM 2.5 data at the individual station scale; however, PM 2.5 monitoring stations distributed unevenly and varied greatly in number at the regional scale (e.g., only 51 stations in Hebei province) and, in addition, most sites were distributed in urban areas. This cannot accurately observe the air pollution from a wide-scale, leading to inevitable overestimations due to the obvious urban-rural differences. By contrast, satellite remote sensing can make up for this deficiency by generating spatially continuous data, which can provide more accurate distribution and variations of PM 2.5 pollution, especially for those areas without stations, such as suburban/rural areas. Therefore, based on the SWRF model, we generated spatially continuous PM 2.5 distributions for 2018 at a spatial resolution of 6 km across China. Figure 9 shows the satellite-derived and surface-based annual averaged PM 2.5 among China. Our satellite retrievals reveal spatial patterns that are highly consistent with the ground measurements at most sites in China, especially in North and Central China, which suffer from severe air pollution. In general, the 2018 annual mean PM 2.5 among China was 36.47 ± 12.45 µg m −3 . Although air pollution has been alleviated recently, more than 45% of the area in China still exceeds the national air quality standard (PM 2.5 = 35 µg m −3 ). The main polluted regions are the North China Plain and Sichuan Basin, which are characterized by developed economies, large populations, and rapid industrial development. In addition, the Taklimakan Desert area of Xinjiang also shows an extremely high PM 2.5 pollution level because it is a main dust source area and experiences frequent dust storms in spring. In contrast, lower PM 2.5 (<15 µg m −3 ) are mainly observed in Southwest and Northeast China, which exhibit high vegetation coverage.
We also investigated the local PM 2.5 concentrations in three city clusters in China (BTH, YRD, and PRD region). In general, the BTH region shows the worst air quality with an annual mean PM 2.5 of 45.60 ± 10.44 µg m −3 . Higher PM 2.5 exist in the southern BTH region with widely distributed manufacturing and industry [48]. In contrast, the pollution in the northern BTH region is much lighter than in the northern part due to the high vegetation coverage and fewer human activities. In the YRD, the annual mean PM 2.5 is 43.68 ± 8.31 µg m −3 . In particular, the southern part of the YRD suffers from high PM 2.5 pollution: highly developed urbanization and transportation increase the emissions of primary pollutants and promote the production of secondary pollutants. In the PRD, the PM 2.5 pollution is relatively low (average = 37.32 ± 5.30 µg m −3 ), but this level still exceeds the national air pollution standard. The occurrence of relatively strong wet settlement and an advanced industrial structure are both valuable for decreasing the PM 2.5 in PRD.   Figure 11 shows the spatial distributions of the seasonal PM 2.5 in 2018 across China. Winter shows the most severe air pollution with the highest mean concentration of 50.43 ± 16.81 µg m −3 ; approximately 80.76% of China exceeded the acceptable air quality standard in the winter of 2018. The Xinjiang Uygur Autonomous Region, Sichuan Basin, and North China Plain are the major highly polluted areas. Fossil fuel combustion during the heating season and the unfavorable weather conditions for the dissipation of PM can explain this. By contrast, summer has the least PM 2.5 pollution, and except for the desert areas of Northwest China, over 88% of the country has low PM 2.5 concentrations (below 35 µg m −3 ). The pollution level in autumn is similar to that in summer with an average value of 30.77 ± 9.55 µg m −3 . In spring, higher PM 2.5 > 80 are observed, mainly in the Taklimakan Desert due to frequent sandstorms.  458 and North China Plain are the major highly polluted areas. Fossil fuel combustion during 459 the heating season and the unfavorable weather conditions for the dissipation of PM can 460 explain this. By contrast, summer has the least PM2.5 pollution, and except for the desert 461 areas of Northwest China, over 88% of the country has low PM2.5 concentrations (below 462 35 μg m −3 ). The pollution level in autumn is similar to that in summer with an average 463 value of 30.77 ± 9.55 μg m −3 . In spring, higher PM2.5 > 80 are observed, mainly in the 464 Taklimakan Desert due to frequent sandstorms.

Conclusions
The number of satellite-based studies on the near-surface PM2.5 concentration in China is increasing. However, these studies are based mainly on MODIS aerosol products with a coarse spatial resolution, and the MODIS satellites have been in service for

Conclusions
The number of satellite-based studies on the near-surface PM 2.5 concentration in China is increasing. However, these studies are based mainly on MODIS aerosol products with a coarse spatial resolution, and the MODIS satellites have been in service for more than 20 years, exceeding their design life. In our study, by contrast with the traditional models that have weak data mining ability and ignore the spatiotemporal heterogeneities of air pollution, we developed a new spatiotemporally weighted random forest (SWRF) model based on the idea of ensemble learning to improve PM 2.5 estimations in China. The temporal information is represented by a time matrix calculated using the inverse distance weighting method, which is used to identify the differences in air pollution conditions among different days and seasons in a year. The spatial information is determined according to the geographically weighted regression to describe the autocorrelations of points in space. Validation results show that our model has a high accuracy and a strong predictive ability with average CV-R 2 of 0.87 and 0.85, respectively, and RMSEs of 11.23 µg m −3 and 11.53 µg m −3 . In addition, the proposed model also works well at varying spatial and temporal scales. More importantly, comparison results show that the model performance has been significantly improved after considering the spatiotemporal information, and our model outperforms the traditional models and those models developed in previous related studies. Based

Data Availability Statement:
The data presented in this study are available on request from the co-first authors.

Conflicts of Interest:
All authors declare no conflict of interest.