Improving PM2.5 Forecasting and Emission Estimation Based on the Bayesian Optimization Method and the Coupled FLEXPART-WRF Model

In this study, we evaluated estimates and predictions of the PM2.5 (fine particulate matter) concentrations and emissions in Xuzhou, China, using a coupled Lagrangian particle dispersion modeling system (FLEXPART-WRF). A Bayesian inversion method was used in FLEXPART-WRF to improve the emission calculation and mixing ratio estimation for PM2.5. We first examined the inversion modeling performance by comparing the model predictions with PM2.5 concentration observations from four stations in Xuzhou. The linear correlation analysis between the predicted PM2.5 concentrations and the observations shows that our inversion forecast system is much better than the system before calibration (with correlation coefficients of R = 0.639 vs. 0.459, respectively, and root mean square errors of RMSE = 7.407 vs. 9.805 μg/m3, respectively). We also estimated the monthly average emission flux in Xuzhou to be 4188.26 Mg/month, which is much higher (by ~10.12%) than the emission flux predicted by the multiscale emission inventory data (MEIC) (3803.5 Mg/month). In addition, the monthly average emission flux shows obvious seasonal variation, with the lowest PM2.5 flux in summer and the highest flux in winter. This pattern is mainly due to the additional heating fuels used in the cold season, resulting in many fine particulates in the atmosphere. Although the inversion and forecast results were improved to some extent, the inversion system can be improved further, e.g., by increasing the number of observation values and improving the accuracy of the a priori emission values. Further research and analysis are recommended to help improve the forecast precision of real-time PM2.5 concentrations and the corresponding monthly emission fluxes.


Introduction
PM 2.5 refers to atmospheric particulates with aerodynamic diameters less than 2.5 µm in ambient air and is one of the six major atmospheric pollutants [1]. Although atmospheric PM 2.5 particulates account for a small proportion of the particles in the earth's atmosphere, they have an important impact on air quality, air visibility, the atmosphere radiation balance and precipitation [1,2]. China's rapid industrialization and urbanization have resulted in decreased air quality, especially in urban an accurate air pollutant emission inventory is very important. Such an inventory can provide fundamental information for understanding air pollution formation and help guide decision making with respect to air pollution control. Recently, various air pollutant emission inventories have been established at various scales. For example, the emission database for global atmospheric research (EDGAR) was developed by scientists at the Netherlands Organization for Applied Scientific Research (TNO) and the National Institute for Public Health and the Environment (RIVM). Zhang et al. (2009) established an inventory of air pollutant emissions in Asia in 2006 to support the Intercontinental Chemical Transport Experiment-Phase B (INTEX-B) [26]. The 2012 emission database based on China's multiscale emission inventory data (MEIC) developed by Tsinghua University is a highly representative pollution emission inventory product in China. However, its coarse spatiotemporal resolution and considerable uncertainty limits its application [27]. To fundamentally understand PM 2.5 emissions and solve pollution problems on a regional scale, a large amount of manpower and material resources are needed. Therefore, optimization of the emission database is urgently required.
Thus, the main objectives of this paper are to optimize the primary PM 2.5 emission inventory, to increase the accuracy of the real-time PM 2.5 concentration forecasting, and to calibrate the hourly FLEXPART-WRF simulation system and monthly emission inventory using ground monitoring data and the Bayesian optimization method. This model makes it possible to correct the spatiotemporal variation in the PM 2.5 emission flux.
This paper focuses on evaluating the performance of the Bayesian inversion method in improving air quality forecasting using the FLEXPART-WRF modeling system at high resolutions over the domain of Xuzhou, China. The model evaluation exercise covers the full year of 2016 and includes observations from the 58 air quality monitoring stations in the study area. The results show that the Bayesian optimization method can improve the prediction accuracy of the model.
The study area is introduced in Section 2.1. The data and model are described in Sections 2.2 and 2.3, respectively. The inversion method used to improve the accuracy is described in Section 2.4. The results and discussion are presented in Section 3, and the conclusion is given in Section 4.

Study Area
The research area of this study is Xuzhou city in Jiangsu Province, China, which covers an area of 11,258 square kilometers and had a total population of 8.76 million in 2017. Economic development has been accompanied by serious pollution since the 1980s. Considering the impact of pollutants in the surrounding area of the Xuzhou region, we selected Xuzhou city and its surrounding area as the research area for inversion of the emission inventory ( Figure 1). In view of the operation efficiency of the model, the study scope cannot be too large. Therefore, the core research area should be kept close to the center of the study area, so that the influence of surrounding area on the prediction can be captured as much as possible.

Meteorological Data
The numerical weather prediction model WRF Version 3.6 was employed to model the meteorological field at hourly time steps with nested domains with resolutions of 27 km, 9 km and 3 km ( Figure 2). The boundary values were acquired from the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis data (http://www.ecmwf.int) with a resolution of 28 km and a 6-hourly interval. These meteorological outputs were used to force the Lagrangian particle model FLEXPART [28].

Meteorological Data
The numerical weather prediction model WRF Version 3.6 was employed to model the meteorological field at hourly time steps with nested domains with resolutions of 27 km, 9 km and 3 km ( Figure 2). The boundary values were acquired from the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis data (http://www.ecmwf.int) with a resolution of 28 km and a 6-hourly interval. These meteorological outputs were used to force the Lagrangian particle model FLEXPART [28].

Meteorological Data
The numerical weather prediction model WRF Version 3.6 was employed to model the meteorological field at hourly time steps with nested domains with resolutions of 27 km, 9 km and 3 km ( Figure 2). The boundary values were acquired from the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis data (http://www.ecmwf.int) with a resolution of 28 km and a 6-hourly interval. These meteorological outputs were used to force the Lagrangian particle model FLEXPART [28].

A Priori PM 2.5 Emission Inventory
An a priori air pollutant emission inventory was acquired from the emission inventory of China (i.e., MEIC, http://www.meicmodel.org/), which includes all major anthropogenic emission sources in East Asia at a resolution of 0.25 • × 0.25 • (Figure 3). Major uncertainties in the PM 2.5 emission data in the MEIC inventory were mostly confined to the residential combustion and transportation sectors because their activity data are not well reported through national and provincial statistical systems. The total Xuzhou anthropogenic emissions of PM 2.5 in 2010 were estimated to be 3803.5 Mg/month based on the MEIC data. An a priori air pollutant emission inventory was acquired from the emission inventory of China (i.e., MEIC, http://www.meicmodel.org/), which includes all major anthropogenic emission sources in East Asia at a resolution of 0.25° × 0.25° (Figure 3). Major uncertainties in the PM2.5 emission data in the MEIC inventory were mostly confined to the residential combustion and transportation sectors because their activity data are not well reported through national and provincial statistical systems. The total Xuzhou anthropogenic emissions of PM2.5 in 2010 were estimated to be 3803.5 Mg/month based on the MEIC data.  Figure 4a shows the research area for inversion of the emission inventory. Red dots indicate the locations of observation sites providing hourly PM2.5 concentration data following a quality control assurance procedure that guarantees confidence in the use of these data. To identify and eliminate data outliers, a data quality control algorithm was applied independently to the PM2.5 observations for each observation site. This algorithm has the following components and thresholds [29] 3. Single spike. If the difference in PM2.5 concentration between the present hour (ℎ ) and both the previous hour (ℎ ) and the following hour (ℎ ) is greater than 50 μg m , the data for the present hour (ℎ ) are rejected. 4. Averaged spike. If the mean concentration over three consecutive hours (ℎ , ℎ , ℎ ) is greater  Figure 4a shows the research area for inversion of the emission inventory. Red dots indicate the locations of observation sites providing hourly PM 2.5 concentration data following a quality control assurance procedure that guarantees confidence in the use of these data. To identify and eliminate data outliers, a data quality control algorithm was applied independently to the PM 2.5 observations for each observation site. This algorithm has the following components and thresholds [29] In addition to the above quality control procedure, a check for frequently missing data at each site was implemented. Following the above tests, approximately 25% of the data were eliminated. An optimizing method was applied to the valid sites that remained, as shown by the red marks in Figure 4b. The research area of Xuzhou contains four sites. For convenience, all the times referenced hereafter are in Beijing Local Time (BLT).

Observations
Atmosphere 2018, 9, x FOR PEER REVIEW 6 of 17 than the data in the previous hour (ℎ ) and the following hour (ℎ ) by more than 80 μg m , the data for hours (ℎ , ℎ , ℎ ) are rejected.
In addition to the above quality control procedure, a check for frequently missing data at each site was implemented. Following the above tests, approximately 25% of the data were eliminated. An optimizing method was applied to the valid sites that remained, as shown by the red marks in Figure  4b. The research area of Xuzhou contains four sites. For convenience, all the times referenced hereafter are in Beijing Local Time (BLT).

Model Descriptions
To analyze the corresponding transport of PM2.5 pollution, the coupled FLEXPART-WRF model was employed to calculate forward Lagrangian particle dispersion [5,30,31], driven by the meteorological variables modeled using the WRF model. The FLEXPART-WRF model has been applied and validated for mid-range and long-range transport cases [28,[32][33][34]. Compared to an adjoint Eulerian model, the advantage of the LPDM is that no initial diffusion occurs due to the release of the adjoint tracer into a finite-size grid cell, which is independent of a computational grid and has, in principle, infinitesimally small resolution. The Lagrangian model has low computational costs. Furthermore, long-range transport can be simulated more accurately, as no artificial numerical diffusion occurs. In view of the simulation precision of the FLEXPART-WRF system, this system has been previously used for a variety of applications. For example, Srinivas et al. (2014) estimated radiation risk following the 2011 Fukushima nuclear accident using FLEXPART-WRF [35], and the model could reproduce observed deposited activity, albeit with some deviations related to model errors in wind and precipitation data from the WRF model. Gentner et al. (2014) used FLEXPART-WRF in backward mode to determine the source-receptor relationship (SRR) of anthropogenic emissions of organic carbon [36]. Another advantage of FLEXPART-WRF model is that it can obtain the contribution rate of pollutants sources to the receptor points, which makes it possible to improve the accuracy of pollutant forecast using the Bayesian optimization method. However, research on PM2.5 using the FLEXPART model is considerably limited because the current version of FLEXPART-WRF does not support the direct simulation of PM2.5 concentrations. Most studies are based on the off-line coupling of the FLEXPART-WRF model with the WRF-Chem model or the CMAQ model [37]. The emphasis and innovation of this study is to improve the prediction accuracy of PM2.5 by combining the FLEXPART-WRF model with the Bayesian optimization method. Considering the computing time cost and the characteristics of the FLEXPART-WRF model, the experiment design to simulate the pollutant particles as passive tracer air parcels without considering chemical reactions and dry deposition [38]. The chemistry and the dry deposition errors were treated as the errors of pollution sources. In this research, the spatial distribution of the SRR was determined by the Lagrangian particle dispersion model (FLEXPART-WRF). The SRR represents the contribution of each potential source for the receptors (observation sites). The forward simulation was adopted in

Model Descriptions
To analyze the corresponding transport of PM 2.5 pollution, the coupled FLEXPART-WRF model was employed to calculate forward Lagrangian particle dispersion [5,30,31], driven by the meteorological variables modeled using the WRF model. The FLEXPART-WRF model has been applied and validated for mid-range and long-range transport cases [28,[32][33][34]. Compared to an adjoint Eulerian model, the advantage of the LPDM is that no initial diffusion occurs due to the release of the adjoint tracer into a finite-size grid cell, which is independent of a computational grid and has, in principle, infinitesimally small resolution. The Lagrangian model has low computational costs. Furthermore, long-range transport can be simulated more accurately, as no artificial numerical diffusion occurs. In view of the simulation precision of the FLEXPART-WRF system, this system has been previously used for a variety of applications. For example, Srinivas et al. (2014) estimated radiation risk following the 2011 Fukushima nuclear accident using FLEXPART-WRF [35], and the model could reproduce observed deposited activity, albeit with some deviations related to model errors in wind and precipitation data from the WRF model. Gentner et al. (2014) used FLEXPART-WRF in backward mode to determine the source-receptor relationship (SRR) of anthropogenic emissions of organic carbon [36]. Another advantage of FLEXPART-WRF model is that it can obtain the contribution rate of pollutants sources to the receptor points, which makes it possible to improve the accuracy of pollutant forecast using the Bayesian optimization method. However, research on PM 2.5 using the FLEXPART model is considerably limited because the current version of FLEXPART-WRF does not support the direct simulation of PM 2.5 concentrations. Most studies are based on the off-line coupling of the FLEXPART-WRF model with the WRF-Chem model or the CMAQ model [37]. The emphasis and innovation of this study is to improve the prediction accuracy of PM 2.5 by combining the FLEXPART-WRF model with the Bayesian optimization method. Considering the computing time cost and the characteristics of the FLEXPART-WRF model, the experiment design to simulate the pollutant particles as passive tracer air parcels without considering chemical reactions and dry deposition [38]. The chemistry and the dry deposition errors were treated as the errors of pollution sources. In this research, the spatial distribution of the SRR was determined by the Lagrangian particle dispersion model (FLEXPART-WRF). The SRR represents the contribution of each potential source for the receptors (observation sites). The forward simulation was adopted in this experiment, and each emission source had a release rate of 5000 computational particles per hour. Concentrations were sampled on a 3-D output grid every 3 min and used for calculating hourly averages. The trajectories of individual particles were computed using a zero-acceleration scheme, in which particle movements are dependent on the grid-scale wind and the turbulent fluctuations as follows: Equation (1) is accurate to the first order, and the trajectory equation is integrated as follows [30]: where t is time, ∆t is the time increment, X is the position vector, and v = v + v t + v m is the wind vector, which is composed of grid-scale wind v, turbulent wind fluctuations v t and mesoscale wind fluctuations v m .

Generating the Source-Receptor Relationship
In this study, the source of SRR has different meaning compared to the source stated by Gentner et al. (2014) [36]. In the paper of Gentner et al. (2014), the source indicates petroleum industry and dairy operations, while, in this paper, it refers to the location of the PM 2.5 emissions. In the present study, 30-day SRRs were determined by releasing 5000 particles per hour from the potential sources. Here, the potential sources represent the potential locations. Considering the number of emission sources and the lower speed of the serial method, the parallel method is adopted to run the FLEXPART-WRF model. The contaminated area was divided into 30 pieces and run separately and finally, the model results were merged, and the receptor contributions from each pollution source were acquired. To apply the Bayesian optimization method, the potential source locations should be consistent with the grid locations from the MEIC a priori inventory.
In this paper, when simulating the SRR and inverting the PM 2.5 inventory, the first two months of the results are not examined because this period corresponds to the typical spin-up period.

Construction Cost Function and Calculation
By combining the Bayesian inversion method and receptor observations, the PM 2.5 emission and the corrected real-time PM 2.5 were obtained. The principle of inversion is to optimize the emission flux density by minimizing the mismatch between observed and simulated concentrations. The basic equation of the inversion approach can be written as follows: where M is the matrix of SRR, y is a vector of the PM 2.5 concentrations (µg/m 3 ) at a receptor, and x is the emission flux vector (Mg/grid/month). The equation can be written as follows: where m is the time series of observations and n is the number of emission grids. Y m is a vector that has m rows. Assuming that x = x − x a and y = y o − M · x a , where x a , x and y o are the a priori source vector, a posteriori source vector and observation vector, respectively, the cost function is described as follows [39,40]: The best estimate x was obtained by solving the equation ∇ x J(x) = 0 as follows: where δ o and δ x represent the standard errors related to the observations and a priori values, respectively. We are unable to precisely obtain the uncertainty information of each potential source and observation site. Therefore, this study adopted the method from Stohl et al. (2009) to deal with the uncertainty of greenhouse gases [40]. The potential source emission uncertainty is where t represents the time series of the simulation in the future, which can be understood as the number of hours. Because of the dramatic changes in the PM 2.5 concentrations in response to changes in tropospheric meteorological factors, this paper considers only the contributions from source region-based SRR calculations rather than background values.

Verification Statistics
The Pearson product-moment coefficient of linear correction is related to the degree of correlation between two fixed distance variables.
The root mean square error (RMSE) is often used to evaluate model performance because it quantifies the magnitude of the error in the model [41,42].
In addition, the index of agreement (IOA) is used in the model evaluation. IOA is defined as follows: where i is the ith paired (model-observation) data point, N is the total number of paired data points, and C (m,i) and C (o,i) are the ith modeled and observed mixing ratios, respectively.

Evaluation of the Site Optimizations
For the remainder of the paper, we use observation data for a one-month period in the optimization using the Bayesian optimization schemes. The use of such one-month time series observation data can not only meet the forecast time requirements but also greatly improve the prediction accuracy. Studies have shown that longer simulation times do not result in better reconstruction of emission sources because of the rapid decrease in emission sensitivity and the increase in model error. Table 1 summarizes the statistical measures of the model simulations using a priori and a posteriori emission inventories relative to observations. The correlation coefficient R between the observations and inversion results was ∼ 0.858, compared with that of ∼ 0.808 using the a prior inventory. The values of RMSE were~4.039 and~8.648 µg/m 3 , respectively. The IOA is another metric that can be used to assess improvements resulting from the application of the Bayesian method, and the IOA increased on average by 21.45%. The Bayesian optimization method adequately removes the systematic mean bias component. The unsystematic component can be reduced significantly through the optimization method. The mean bias of the Bayesian inversion system is 53.2% less than that of the raw results.  Figure 5 shows the comparison of the yearly mean diurnal variation of PM 2.5 between simulation and observation at four sites in Xuzhou. The simulations with inventory data originating from MEIC (a priori inventory) agree well with the observations. Compared to the observations, the raw model tends to overestimate PM 2.5 at night for all four sites. The MEIC inventory data (a priori inventory) are largely skewed towards highly positive PM 2.5 errors, with average errors larger than 10-25 µg/m 3 . In contrast, the optimization results are more consistent with the observation curve. The Bayesian optimization method tends to bring the optimization values closer to the observations than the raw model forecasts. In addition, the results show that, at nighttime and in the early morning, PM 2.5 concentrations were higher than at midday because, during the night and early morning, the boundary layer is generally low in height and has weaker convection than during the daytime. However, the simulation deviation is also large using the a priori inventory, which may not have considered the relatively low levels of anthropogenic emissions from factories and motor vehicles in the evening.
The optimization results are not always superior to those of the raw model. For example, the optimization effect of the Huaita-Xuzhou observation site is not as good as the raw model values at 2:00-5:00, although the difference is very small (Figure 5a). This discrepancy may be explained by the fact that the optimization effect is affected not only by the emission inventory but also by the meteorological conditions, the model errors and the pollution emergency event [10,14]. Overall, the results of the Bayesian inversion system in this study are closer to the actual observations, and the inversion emission inventory has greater credibility.

Evaluation of the Site Forecasts
The forecast values of three days from the a posteriori inventory calculated by the inversion model were compared with observations at four sites in Huaita-Xuzhou, Huanghexincun-Xuzhou, Nongkeyuan-Xuzhou and Xinchengqu-Xuzhou. Application of the Bayesian optimization method produces PM2.5 forecast values that are much closer to the observation values than the raw model. The raw model tends to have larger fluctuations than the observations on almost all days at these stations. As shown in Table 2, the correlation coefficient R between the predicted and observed PM2.5 values was improved using the inversion system, with values of ~0.639 versus ~0.459 after and before using inversion, respectively. Accordingly, the values of RMSE were ~7.407 and ~9.805 µg/m 3 after and before calibration, respectively. The IOA for the forecasts increased on average by 16.14%. As expected, the optimization method greatly reduced the systematic error, which was an improvement over the raw model forecasts. However, Figure 6 reveals that, while the Bayesian method helped to reduce the systematic errors in the model forecasts, some residual error still exists in the adjusted forecasts. This residual error mainly caused by without considering chemical reactions that are affected by meteorological factors and emissions, and these errors cannot be adequately resolved by the inversion system.  Figure 6 shows that the forecast values calculated from the a priori inventory were overestimated to some extent. The forecast values from the a posteriori inventory, by contrast, not

Evaluation of the Site Forecasts
The forecast values of three days from the a posteriori inventory calculated by the inversion model were compared with observations at four sites in Huaita-Xuzhou, Huanghexincun-Xuzhou, Nongkeyuan-Xuzhou and Xinchengqu-Xuzhou. Application of the Bayesian optimization method produces PM 2.5 forecast values that are much closer to the observation values than the raw model. The raw model tends to have larger fluctuations than the observations on almost all days at these stations. As shown in Table 2, the correlation coefficient R between the predicted and observed PM 2.5 values was improved using the inversion system, with values of ∼ 0.639 versus ∼ 0.459 after and before using inversion, respectively. Accordingly, the values of RMSE were~7.407 and~9.805 µg/m 3 after and before calibration, respectively. The IOA for the forecasts increased on average by 16.14%. As expected, the optimization method greatly reduced the systematic error, which was an improvement over the raw model forecasts. However, Figure 6 reveals that, while the Bayesian method helped to reduce the systematic errors in the model forecasts, some residual error still exists in the adjusted forecasts. This residual error mainly caused by without considering chemical reactions that are affected by meteorological factors and emissions, and these errors cannot be adequately resolved by the inversion system.  Figure 6 shows that the forecast values calculated from the a priori inventory were overestimated to some extent. The forecast values from the a posteriori inventory, by contrast, not only exhibited an improved correlation with observations but also corrected the system error. Comparisons for 17 August, 18 August and 19 August show that the model performed quite well on 17 August and 18 August. However, the model for 19 August did not perform as well as that for 17 and 18August. Previous research has suggested that poor forecast performance with the extension of the forecast time is common among air quality models and may be caused by the difficulty in simulating stagnant weather conditions [43]. This finding may be partially attributed to a combination of effects associated with the current model grid resolution, which is not sufficient to resolve the highly variable meteorological conditions, spatial and temporal variability in emissions, and, consequently, the evolution of the chemistry leading to PM 2.5 formation.
Atmosphere 2018, 9, x FOR PEER REVIEW 11 of 17 17 August and 18 August. However, the model for 19 August did not perform as well as that for 17 and 18August. Previous research has suggested that poor forecast performance with the extension of the forecast time is common among air quality models and may be caused by the difficulty in simulating stagnant weather conditions [43]. This finding may be partially attributed to a combination of effects associated with the current model grid resolution, which is not sufficient to resolve the highly variable meteorological conditions, spatial and temporal variability in emissions, and, consequently, the evolution of the chemistry leading to PM2.5 formation. Other scholars have used different methods to improve the forecast accuracy of PM2.5. For example, Borrego et al. (2011) evaluated the application of two bias-correction techniques, the multiplicative ratio and the Kalman filter, to improve air quality forecasts [44]. Their statistics showed that both bias-correction techniques improved the raw forecast skills, and the forecasts of PM2.5 improved significantly, with R values ranging from 0.5 to 0.64. The results confirm the advantage of the application of bias-correction techniques for air quality forecasts. However, the Bayesian inversion system, unlike the Kalman filter and multiplicative ratio methods, improves the forecasting accuracy by optimizing the priori inventory rather than merely correcting model forecast values. Therefore, the Bayesian inversion system developed in this study can not only produce better prediction results but also estimate a high-resolution emission inventory. Other scholars have used different methods to improve the forecast accuracy of PM 2.5 . For example, Borrego et al. (2011) evaluated the application of two bias-correction techniques, the multiplicative ratio and the Kalman filter, to improve air quality forecasts [44]. Their statistics showed that both bias-correction techniques improved the raw forecast skills, and the forecasts of PM 2.5 improved significantly, with R values ranging from 0.5 to 0.64. The results confirm the advantage of the application of bias-correction techniques for air quality forecasts. However, the Bayesian inversion system, unlike the Kalman filter and multiplicative ratio methods, improves the forecasting accuracy by optimizing the priori inventory rather than merely correcting model forecast values. Therefore, the Bayesian inversion system developed in this study can not only produce better prediction results but also estimate a high-resolution emission inventory. Figure 7 shows the spatial distribution of the PM 2.5 a posteriori emission fluxes in March, June, September and December, representing spring, summer, autumn and winter, respectively. The results show that the emission of PM 2.5 was significantly higher in winter than in summer due to fossil fuel combustion for heating. In contrast, the fluxes of MEIC (Figure 3) and the monthly average calculated by the inversion system (Figure 8a) show that the PM 2.5 emission flux might be overestimated to some extent by the a priori inventory in the heavy pollution area (Figure 9a), which may be related to the recent prevention and control of pollution. However, a comparison of the a posteriori and INTEX-B emission inventories (Figure 8b), which was conducted by the National Aeronautics and Space Administration (NASA), shows that the a posteriori PM 2.5 emissions are generally higher than the INTEX-B emission flux (Figure 9b). One of the possible reasons for this finding is that the INTEX-B emission inventory has a low spatial resolution and time lag. Overall, the spatial features of the a posteriori values were generally consistent with the latest regional emission inventory in the study areas.  A comparison of PM2.5 emissions at a city level between a priori and a posteriori values is shown in Table 3 and Figure 10 (the a posteriori inventory and MEIC statistical values). Table 3 shows the adjustment for the a posteriori values (difference between a posteriori and a priori) for Xuzhou, Suqian, Hefei and Jining, where the inversion results were statistically 384.76 Mg/month and 451.50 Mg/month larger than the a priori values in Xuzhou and Suqian, respectively. The statistics showed that the a posteriori PM2.5 emissions increase by 10.12% and 27.84% for Xuzhou and Suqian, respectively. However, the inversion results were smaller than the a priori values in Hefei and Jining A comparison of PM 2.5 emissions at a city level between a priori and a posteriori values is shown in Table 3 and Figure 10 (the a posteriori inventory and MEIC statistical values). Table 3 shows the adjustment for the a posteriori values (difference between a posteriori and a priori) for Xuzhou, Suqian, Hefei and Jining, where the inversion results were statistically 384.76 Mg/month and 451.50 Mg/month larger than the a priori values in Xuzhou and Suqian, respectively. The statistics showed that the a posteriori PM 2.5 emissions increase by 10.12% and 27.84% for Xuzhou and Suqian, respectively. However, the inversion results were smaller than the a priori values in Hefei and Jining by 561.41 Mg/month and 634.32 Mg/month, respectively, which showed that the a posteriori emissions decreased by 10.83% and 9.58%, respectively. The differences between a priori and posteriori can stem from three reasons: the errors of inversion system, a priori inventory errors and the effect of time lag. The a priori inventory was from the MEIC data of 2010, but the study period was 2016. The impact of time lag mainly refers to the changes in pollutant emissions caused by the rapid development of china's economy [45]. During this period, the rapid growth of urbanization and rising consumption of fossil fuels increased the emissions of pollutant, while the implementation of industrial energy saving and emission reduction measures also reduced the emissions of pollutant in the meantime.          Figure 11 shows the trends of monthly PM2.5 for the cities of Xuzhou, Jining, Hefei, and Suqian in 2016. The PM2.5 emission is lower in the summer and higher in the winter because abundant heating fuel is needed in the winter, which produces many fine particulates in the atmosphere. However, the PM2.5 emissions increase to some extent due to the large-scale burning of crops straw in the summer harvesting season. The obvious deviations of the PM2.5 emission flux at certain locations in the diagram could be partly attributed to the failure to account for chemical transformation, and the corresponding SRR might exclude the emission signal associated with open biomass burning sources.

Conclusions
In this study, the emission inventory and real-time PM2.5 concentration in Xuzhou are inverted by the Bayesian optimization method based on tower observations. The general feature of the inversion results is that the a posteriori values are consistent with the latest MEIC emission inventory, and the prediction accuracy is greatly improved. The R and RMSE values between the observations and the forecast values were improved by approximately 39.2% and 24.5%, respectively, using the inversion system. However, the present results imply that the PM2.5 emissions in the Xuzhou region might be significantly underestimated by bottom-up statistical methods. In general, the present PM2.5  Figure 11 shows the trends of monthly PM 2.5 for the cities of Xuzhou, Jining, Hefei, and Suqian in 2016. The PM 2.5 emission is lower in the summer and higher in the winter because abundant heating fuel is needed in the winter, which produces many fine particulates in the atmosphere. However, the PM 2.5 emissions increase to some extent due to the large-scale burning of crops straw in the summer harvesting season. The obvious deviations of the PM 2.5 emission flux at certain locations in the diagram could be partly attributed to the failure to account for chemical transformation, and the corresponding SRR might exclude the emission signal associated with open biomass burning sources.  Figure 11 shows the trends of monthly PM2.5 for the cities of Xuzhou, Jining, Hefei, and Suqian in 2016. The PM2.5 emission is lower in the summer and higher in the winter because abundant heating fuel is needed in the winter, which produces many fine particulates in the atmosphere. However, the PM2.5 emissions increase to some extent due to the large-scale burning of crops straw in the summer harvesting season. The obvious deviations of the PM2.5 emission flux at certain locations in the diagram could be partly attributed to the failure to account for chemical transformation, and the corresponding SRR might exclude the emission signal associated with open biomass burning sources.

Conclusions
In this study, the emission inventory and real-time PM2.5 concentration in Xuzhou are inverted by the Bayesian optimization method based on tower observations. The general feature of the inversion results is that the a posteriori values are consistent with the latest MEIC emission inventory, and the prediction accuracy is greatly improved. The R and RMSE values between the observations and the forecast values were improved by approximately 39.2% and 24.5%, respectively, using the inversion system. However, the present results imply that the PM2.5 emissions in the Xuzhou region might be significantly underestimated by bottom-up statistical methods. In general, the present PM2.5

Conclusions
In this study, the emission inventory and real-time PM 2.5 concentration in Xuzhou are inverted by the Bayesian optimization method based on tower observations. The general feature of the inversion results is that the a posteriori values are consistent with the latest MEIC emission inventory, and the prediction accuracy is greatly improved. The R and RMSE values between the observations and the forecast values were improved by approximately 39.2% and 24.5%, respectively, using the inversion system. However, the present results imply that the PM 2.5 emissions in the Xuzhou region might be significantly underestimated by bottom-up statistical methods. In general, the present PM 2.5 emission inventory inversion implies a 10.12% increase between 2010 and 2016 in Xuzhou. Although our analysis suggests that the Bayesian method can significantly reduce the systematic errors, the relative improvements of the real-time PM 2.5 forecast values from the application of this method were limited in terms of the temporal extent of the forecast. The inversion approaches still have considerable uncertainties due to the limited observation sites and the method limitations. Our future work will involve an assessment of inversion accuracy by including multiple sites in Xuzhou, and the seasonal variation will be addressed using a long-term observation dataset. At present, this study only verified the posteriori inventory is better than the a priori inventory for improving the forecast accuracy of PM 2.5 . We also need to combine the models of WRF-Chem and CMAQ for further verification.
In the applications presented in this study, the Bayesian optimization method is applied only at discrete points of the monitors. The extension of these methods for the development of optimized spatial map forecasting surface-level PM 2.5 distributions is an area for further research.