Revealing the sulfur dioxide emission reductions in China by assimilating surface observations in WRF-Chem

The anthropogenic emission of the sulfur dioxide (SO2) over China has significantly declined as the consequence of clean air actions. In this study, we have developed a new emission inversion system based on a Four-Dimensional Local Ensemble Transform Kalman Filter (4D-LETKF) and the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) to dynamically update the SO2 emission grid by grid over China by assimilating the ground-based hourly SO2 observations. Sensitivity tests for the assimilation system have been conducted firstly to tune four system parameters: 20 ensemble size, horizontal and temporal localization lengths, and perturbation size. Our results reveal that the same random perturbation factors used throughout the whole model grids with assimilating observations within about 180 km can efficiently optimize the SO2 emission, whereas the ensemble size has only little effect. The temporal localization by assimilating only the subsequent hourly observations can reveal the diurnal variation of the SO2 emission, which is better than that to update the the magnitude of SO2 emission every 12 hours by assimilating all the observations within the 12-hour 25 window. The inverted SO2 emission over China in November 2016 has declined by an average of 49.4% since 2010, which is well in agreement with the “bottom-up” estimation of 48.0%. Larger reductions of SO2 emission are found over the priori higher source regions such as the Yangtze River Delta (YRD). The simulated SO2 surface mass concentrations using two distinguished chemical reaction mechanisms are both much more comparable to the observations with the newly inverted SO2 emission than those with the priori emission. These indicate that the newly developed emission inversion system can 30


Introduction
China and India are the top two emitters of the anthropogenic sulfur dioxide (SO2) in the world (Li et al., 2017a). SO2 is a 35 toxic air pollutant and the precursor of sulfate aerosol, leading to the acidification of the atmosphere and the current heavy haze problem in China (Wang et al., 2016a;Huang et al., 2014;Yao et al., 2018). Sulfate aerosol can further perturb the radiative energy budget on Earth through directly scattering solar radiation (Goto et al., 2011) and hydrological cycle by aerosol-cloud interactions (Ramanathan et al., 2001;Sato et al., 2018;Rosenfeld et al., 2019). Sulfate coating on dust leads to a shorter lifetime of dust by increasing the deliquescence of the mixed dust, inducing a great impact on radiative properties 40 and climate modelling (Zhang et al., 2003;Bauer et al., 2007;Fu et al., 2009;Wang et al., 2013;Qi et al., 2013;Penner, 2019).
Hydrophilic polluted continental aerosols such as sulfate and mixed dust serve as cloud condensation nuclei (CCN) and thus have a substantial effect on cloud properties and the initiation of precipitation (Rosenfeld et al., 2008). The liquid and ice water paths of dust-contaminated clouds were found obviously smaller than those of dust-free conditions over Eastern Asia (Huang et al., 2006b;Huang et al., 2006a). Asian dust altering cloud microphysics and precipitation was revealed by 45 observations and model simulations Liu et al., 2019b;Liu et al., 2019a). This, in turn, plays a key role in the climate system. To mitigate climate change and control air quality, the emission control policies especially for SO2 implemented by China since 2006 cover all the major source sectors and have become increasingly stringent over time (Zhang et al., 2012). Consequently, the decreasing trends of SO2 loading over China have been revealed by satellite observations, demonstrating SO2 emissions in China have declined by 75% during 2007-2016 Li et al., 50 2017a). The relative change rate of SO2 emission in China during 2010-2017 is also estimated as -62% by using the bottomup emission inventory .
The timely precise emission inventories such as SO2 are the primary inputs to models for air quality prediction and mitigation. All the atmospheric chemistry and aerosol models rely on their descriptions of the emissions virtually, which are mostly from the "bottom-up" emission inventories. The "bottom-up" emission inventories are compiled based on indirect 55 information such as activity data and emission factors (Zhang et al., 2009;Kurokawa et al., 2013;Zheng et al., 2018). Due to the uncertainties of the activity rates and emission factors, large discrepancies of global and regional emissions are identified among different emission inventories Granier et al., 2011). It demonstrates that there is still no consensus on the best estimates for the emissions of atmospheric compounds. Moreover, the "bottom-up" anthropogenic emission inventories often lag several years behind the present and may quickly become outdated , leaving the 60 model without up-to-date emission inventories.
The emission inversion approach can feed historical and near-real-time observations into the models, providing a top-down approach to estimate and timely update the primary emissions of air pollutants (Streets et al., 2013). Generally speaking, variational and ensemble data assimilation approaches are the two widely used methodologies to estimate the emission 3 fluxes of gases (such as NOx, CO, VOCs) (Tang et al., 2011;Qu et al., 2017;Wu et al., 2020;Miyazaki et al., 2012b;Cheng et 65 al., 2010;Feng et al., 2020a) and/or aerosols (Dai et al., 2019a;Cohen and Wang, 2014;Peng et al., 2017;Yumimoto et al., 2008). The NOx emission changes over China during the COVID-19 epidemic were inferred from surface NO2 observations based on ensemble data assimilation approach (Feng et al., 2020b). The emission reductions during the 2015 China Victory Day Parade were successfully detected with an ensemble data assimilation system (Chu et al., 2018). The SO2 emission inventories over China were updated on monthly or seasonal time scales assuming a linear relationship between SO2 70 emissions and satellite observed SO2 column amounts (Koukouli et al., 2018;Lee et al., 2011), known as the mass balance approach (Martin, 2003), although the sulfur chemistry especially in polluted areas as well as by the interactions of clouds should be nonlinear (Goto et al., 2011;Liao et al., 2003). Fioletov et al. (2015) described a new mass balance approach to simultaneously estimate the SO2 lifetimes and emissions from large SO2 point sources using satellite measurements. Based on the variational data assimilation approach in the framework of the GEOS-Chem adjoint model, Wang et al. (2016b) 75 developed a new sophisticated inverse modeling (IM) method to timely update monthly anthropogenic SO2 emissions by assimilating the Ozone Monitoring Instrument (OMI) SO2 satellite measurements. The nonlinear full sulfur chemistry and lifecycle in the atmosphere were accounted for the first time to conduct the top-down estimation of the anthropogenic SO2 emissions from the GEOS-Chem adjoint model (Wang et al., 2016b). However, the great limitation to the application of variational data assimilation approach is the requirement of developing the complicated adjoint model (Henze et al., 200780 Liang et al., 2020. The ensemble data assimilation approach requires neither linearization of the observation operator and nor an adjoint model, therefore it is much more easily implemented and flexible (Evensen, 2003). Additionally, the ensemble data assimilation and the variational data assimilation use the flow-dependent and pre-calculated model error covariances respectively (Descombes et al., 2015;Zang et al., 2016). Based on the Ensemble Square Root Filter (EnSRF) approach (Chen et al., 2019a), the recent SO2 emission changes from the year 2010 in China were successfully updated to improve the model 85 forecast skill. An ensemble Kalman filter data assimilation system was developed to simultaneously optimize the chemical initial conditions and emissions including SO2 with multi-species chemical observations . The effects of meteorological assimilation on SO2 emission inversions were also studied recently (Peng et al., 2020).
Retrievals of SO2 from satellite-based spectrometers are often contaminated by factors such as interference between ozone and SO2, and there are significant regional differences between different satellite instruments (Fioletov et al., 2013). This 90 subsequently induces the inconsistency of the inversed regional emissions by assimilating different satellite observations (Lee et al., 2011). Meanwhile, satellite observations are usually assimilated on the monthly time scale due to data availability.
Compared with satellite observations, the surface SO2 observations have higher accuracy and temporal frequency. Therefore, the assimilation of intensive direct surface SO2 observations can provide more spatial-temporal characteristics of emission (Chen et al., 2019a). The China National Environmental Monitoring Centre (CNEMC) started to monitor hourly 95 concentrations of PM2.5 (particulate matter with diameter ≤ 2.5 micrometers), PM10, SO2, NO2, CO and O3 since 2012, and it had included 1436 monitoring sites from 369 cities by March 2017 (Wu et al., 2018). Those important direct intensive surface SO2 observations provide a new chance to estimate the more spatial-temporal characteristics of the SO2 emission in 4 China (Chen et al., 2019a).
Due to the limited ensemble members, the Ensemble Kalman filter (EnKF) generally has a spurious long distance correlation 100 problem (Houtekamer and Mitchell, 2001;Miyazaki et al., 2012a). Compared with the EnKF, the Local Ensemble Transform Kalman Filter (LETKF) can assimilate measurements simultaneously over different model grids in the parallel architecture (Miyoshi et al., 2007;Hunt et al., 2007). Generally speaking, the LETKF computational time is robust with increasing observations, while that of most other ensemble Kalman filter is essentially proportional to the number of observations (Miyoshi et al., 2007). Moreover, the global analysis is linear combinations of the ensemble members in different regions, 105 which is not confined to the limited ensemble members and provides better results in many cases (Ott et al., 2004). A Four-Dimensional LETKF (4D-LETKF) was recently developed to assimilate hourly aerosol optical properties observed by satellite, which can avoid frequent switching between the assimilation and the ensemble aerosol forecasting to significantly reduce computational load (Dai et al., 2019b). In current study, we implement a 4D-LETKF in the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem). Our major objectives are to investigate whether 4D-LETKF 110 together with the intensive CNEMC SO2 observations can be applied to quantitatively estimate the spatially resolved changes of SO2 emissions in China and how sensitive are the estimated SO2 emissions to the system parameters of the 4D-LETKF.
The reminder of the paper is organized as follows. In section 2, the methodology of our emission inversion system is described in detail. Sect. 3 presents our experimental designs and purposes. The emission inversion results and validations 115 are provided in Sect. 4 before concluding in Sect. 5.

Methodology
In order to optimize the SO2 emissions in this study, we need to formally minimize a scalar cost function in a Bayesian framework (Hunt et al., 2007;Huneeus et al., 2012). can be formulated as the sum of the departures of a potential gridded 120 SO2 emissions x and the corresponding simulated SO2 surface mass concentrations to the a priori SO2 emissions # and the CNEMC observed surface SO2 concentrations $ : where is the observation operator that forward the SO2 emissions to the simulated CNEMC measurements; and are the covariance matrix of the error statistics of the a priori SO2 emissions and CNEMC observations. 125

Forward model and observation operator
The relationship between the emission and the surface concentration of short-lived reactive gas SO2 is mainly determined by the atmospheric chemical reactions, transport and deposition. The fully coupled "online" Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) version 4.1.2 (Grell et al., 2005) is served as the forward model to relate the SO2 emissions to the simulated observations of surface mass concentration in current study, which can reflect the complex 130 nonlinear relationship between atmospheric chemical concentrations and emissions. Our primary aim is to understand how sensitive are the estimated SO2 emissions to the parameters of the assimilation system, which requires huge computing 5 resources for sensitivity experiments as described later. Therefore, the model is configured with a domain covering most of China as shown in Fig. 1 with a relatively low horizontal resolution of 50 km and 32 vertical levels (Snyder et al., 2015). A state-of-the-art and highly nonlinear gas phase chemical mechanism named the second generation Regional Acid Deposition 135 Model (RADM2) (Stockwell et al., 1990) coupled with the Goddard Global Ozone Chemistry Aerosol Radiation and Transport aerosol model (Chin et al., 2000;Chin et al., 2002) (i.e., chem_opt = 303) is adopted to simulate the atmospheric sulfur cycle. The RRTMG radiation scheme with prognostic aerosols is selected to consider the aerosol direct effect on atmospheric radiation and photolysis calculations (Iacono et al., 2008). The other main selected physics are identical to those of Dai et al. (2019a). The initial and lateral boundary meteorological conditions are from the NCEP Final (FNL) Analysis. 140 To reduce the uncertainties associated with the meteorological fields and facilitate a more straightforward comparison of simulations and observations, the predicted wind (u, v), temperature (t), and specific humidity (q) by WRF dynamical core are also nudged to the NCEP FNL analysis every 6 hours (Dai et al., 2018). The meteorological fields in the Planetary Boundary layer (PBL) are not nudged. The WRF-Chem simulated surface gridded SO2 volume mixing ratios in the unit of parts per million ( ) are firstly converted to micrograms per cubic meter ( / " ) for comparing to the observations 145 (Chen et al., 2019a) and then linearly interpolated to the CNEMC site locations.

SO2 observations and uncertainties of CNEMC
The quality-assured and controlled measurements of hourly SO2 surface mass concentration from the CNEMC, which is partly purposefully built for assimilation (Wu et al., 2018), are used to minimize the cost function . There are a total of 1424 sites in November 2016, and those sites span most of central and eastern China and primarily locate in urban and suburban 150 areas (Peng et al., 2017). Due to unresolved emission variations between urban and suburban, model may have large representativeness errors. To overcome the spatial scale gaps and to produce more representative observations, the superobservation is adopted to average all observations located within a model grid cell (Miyazaki et al., 2012a). Altogether 463 of 7221 model grid cells are covered by the super-observations (Fig. 1). The locations of the super-observations are assumed as the locations of the covered model grid cells. To independently verify the assimilation results, we further randomly 155 eliminate the super-observations located in 155 of the 463 grid cells to be assimilated. In other words, the assimilated and independent verification observation sites are randomly decided. The observation error covariance matrix is assumed diagonal, in other word, the observational error covariance is assumed uncorrelated. The observation error of CNEMC is calculated as same as Chen et al. (2019a), which contains both the measurement and representativeness errors. In the assimilation data quality control process, SO2 observation leading to absolute innovation exceeding three times of the prior 160 total spread is considered as an outlier and discarded. The innovation is calculated as observation minus the model simulated ensemble mean observation determined from the first guess filed, and the prior total spread is the square root of the sum of the background ensemble variance and the observational error variance (Chen et al., 2019a;Rubin et al., 2016).

4D-LETKF
The 4D-LETKF assimilation approach generalizes a flow-dependent from ensemble simulation and finds the minimum of 165 the cost function as following five formulas (Cheng et al., 2019): : where ̅ # and ̅ ' represent the ensemble mean of the first guess (a priori) and analysis (a posteriori) SO2 emissions in this study; the ensemble perturbation matrix is calculated as ( ) − ̅ , { = 1, 2, … , } , which represents the ensemble size; the matrix : ' is the Kalman gain, which specifies the increment between the first guess and the analysis; the vector @ # represents the first guess SO2 surface concentrations averaged over the ensemble members; the matrix # is calculated as 175 # ( ) − @ # , { = 1, 2, … , }; represents the identity matrix. The ensemble analyses are calculated as the sum of the ̅ ' and each of the columns of ' , which is serving as part of a priori emission information for the next analysis as described later.
The multiplicative inflation factor is used to avoid the filter divergence, which is fixed at 1.1 to inflate the analysis covariance as same as our previous studies (Dai et al., 2019b;Cheng et al., 2019). In our implementation of the 4D-LETKF, the temporal and spatial localizations are achieved by multiplying the !& by a factor ( ) as described in section 3, which 180 makes the effect of an observation on the analysis decays smoothly to zero as the time and physical distance of the observation from the analysis grid point increases (Hunt et al., 2007).
As shown in Fig. 2, each assimilation cycle with 4D-LETKF includes two steps: a first guess and a state analysis. In our implementation, the first guess is the WRF-Chem ensemble forecasting for 12 hours with hourly model output. The state analysis optimizes the SO2 emissions in the past 12 hours. The advantages of 4D-LETKF used here are threefold: (1) each 185 member of the ensemble WRF-Chem simulations is continuously integrated for 12 hours, therefore, this avoids frequent switching between the ensemble WRF-Chem forecasts and the assimilation (Peng et al., 2017;Chen et al., 2019a); (2) the asynchronous observations can be assimilated to the optimize the current state (Hunt et al., 2007;Dai et al., 2019b); (3) the assimilation window time of 12 hours could avoid filter convergence and divergence by finite ensemble samples, since more frequent assimilation forces the experiments more closer, inducing the underestimation of the model spread and 190 overconfidence in the first guess state estimate (Schutgens et al., 2010;Miyazaki et al., 2012a;Hunt et al., 2007).

State variable and forecast model for emission
In this study, the state variable to be optimized is the SO2 emission. A forecast model for emission is required to propagate observation information and determine the first guest for the next assimilation cycle (Miyazaki et al., 2012a). We adopt the same forecast model for SO2 emission proposed by Chen et al. (2019a). The forecast model for SO2 emission weights 75% 195 and 25% toward the SO2 emission ensemble * ! ' from the previous analysis and the static initial prior ensemble *+ as following formula: 7 are used to calculate the first guess SO2 emission ensemble * !"# # in sequence for the next assimilation cycle. The SO2 200 emission inversion depends on the forecast model, therefore, sensitivity experiments for various different emission forecasts are conducted to tune the assimilation system as given in Table 1. The detail settings of the sensitivity experiments will be described in next section. As shown in Figs. S1 and S2 in the Supplement, the temporal and spatial distributions of the ensemble spread of the forecast emissions * !"# # are significantly sensitive to the assimilation system parameters. The initial prior ensemble of SO2 emission is generated by perturbing the freely public available MIX Asian inventory for November 205 2010 (Li et al., 2017b). For example, the SO2 emission for ensemble member at a given location ( , ) is calculated as , ( , ) ( , ) (Rubin et al., 2016), and the perturbation , ( , ), { = 1, 2, … , } , follows a lognormal distribution in thedimensional space. The mean and the variance of the perturbations ( , ) are equal to 1 and the MIX SO2 uncertainty (i.e., 35%) (Li et al., 2017b). The horizontal perfect correlated and random uncorrelated perturbations are both created to generate the initial prior ensemble *+ and the associated first guess SO2 emission ensemble * !"# # as described later. The spatial 210 distribution of the ensemble spread of the *+ with either horizontal perfect correlated or random uncorrelated perturbations has the similar pattern as the MIX Asian inventory , which is generally equal to 35% multiplying . In MIX inventory, anthropogenic emissions are aggregated into five sectors: power, industry, residential, transportation, and agriculture.
However, only the combined total emission is used in the model and updated in the analysis. It aims to decrease the degree of freedom in the analysis (Miyazaki et al., 2012a). Ten chemical species including both gaseous and aerosol species are 215 included in MIX inventory (Li et al., 2017b).  (Chen et al., 2019b). An improved speciation framework for mapping Asian anthropogenic emissions 220 of non-methane volatile organic compounds (NMVOC) to multiple chemical mechanisms (Li et al., 2014), is adopted to prepare the initial hourly anthropogenic emissions every 12 hours with two separated emission files (i.e., io_style_emissions = 2). We do not apply any diurnal variation for the MIX emissions. Therefore, the initial priori emissions are identical throughout the 24 hours. The emissions of aerosol species for WRF-Chem are prepared according to the study of Peng et al. (2017). Notably, only the SO2 emission is perturbed and optimized by CNEMC SO2 observations in this study. 225 The chemical initial conditions (i.e., atmospheric SO2 concentrations) for the next forward simulation of the WRF-Chem ensemble are also needed to be updated with the optimal emission ensemble from the previous analysis (Peng et al., 2015;Peters et al., 2005), and this is achieved by recalculation of the WRF-Chem ensemble with the optimized emissions ( Fig. 2). In other word, the WRF-Chem ensemble is performed twice in one assimilation cycle. Theoretically, the uncertainties of the forecast SO2 concentrations by recalculation of the WRF-Chem ensemble are dependent on the 230 optimized emissions. Lower uncertainties of the initial SO2 conditions for the next assimilation cycle should be found with higher accurate optimized SO2 emissions, which in turn makes the SO2 emission inversion more reasonable. Sensitivity 8 experiments for the SO2 emission inversions as described in next section are performed to choose the best assimilation system parameters.

Experimental design
The effectiveness of 4D-LETKF is highly dependent on having sufficient spread in the ensemble members in order for the observations to impact the first guess (Rubin et al., 2016;Dai et al., 2019b;Hunt et al., 2007). The ensembles represent the uncertainty in the model first guess, therefore, the method for generating the ensemble is an important consideration for an optimal "top-down" emission inversion. Meanwhile, 4D-LETKF allows a flexible choice of observations to be assimilated 240 for a specific grid point through horizontal, vertical, and temporal observation localizations (Miyoshi et al., 2007;Dai et al., 2019b;Cheng et al., 2019). The observation localization gradually reduces the effect of an observation as the increasing departure from the analysis grid. In this study, the horizontal localization factor is calculated as the Gaussian function (Miyoshi et al., 2007): where is the localization length and is defined as the physical distance between the observation and the analysis grid, 245 and we force the localization factor to zero at 3.65 times the localization length (Zhao et al., 2015). In other word, we ignore observations beyond the cutoff distance. The tuneable horizontal and temporal localization lengths are defined in the physical distance (km) and time (hour), respectively. The vertical localization is not applied for the SO2 emission inversion in this study, in other word, we trust the vertical profiles of the emission factors from the Model Inter-Comparison Study for Asia (MICS-Asia) phase III (Chen et al., 2019b). 250 A correct choice of the assimilation system parameters such as the ensemble size and correlation length is important to improve the data assimilation performance (Miyazaki et al., 2012b). A series of sensitivity experiments are performed to tune the assimilation system as listed in Table 1. A control experiment assuming the same emissions in November 2016 as in November 2010 (i.e., the standard MIX emissions) is conducted as the deterministic simulation to assess the influence of data assimilation. Considering the GOCART aerosol scheme uses a simple representation of the aerosol chemistry for 255 reducing the computational load, we also conduct another deterministic simulation using a more sophisticated aerosol chemical scheme named Model for Simulating Aerosol Interactions and Chemistry (MOSAIC) coupled with the "lumpedstructure" Carbon Bond Mechanism (CBMZ) (Zaveri et al., 2008) (i.e., chem_opt = 9) to investigate the effects of different chemistry and aerosol schemes on SO2 oxidation. The data assimilation experiments are divided into three groups. In the first group, same random perturbation factor throughout the whole domain emission grids including vertical and temporal spaces 260 per member is applied to the MIX SO2 emission to generate 10 ensemble members for the WRF-Chem ensemble forward simulation. The spatial correlation coefficients among the initial prior ensemble of SO2 emissions over every two model grids are equal to one, and this makes the spatial correlations among the grids points of the forecast emissions are also equal 9 to one. The same random perturbation factor generates a perfect correlation of emission in both the spatial and temporal spaces, however, this should not be seen as overly restrictive (Schutgens et al., 2010). Firstly, the "bottom-up" SO2 emission 265 inventories are to a large extent based on the used activity rates and emission factors . Therefore, with the same random perturbation factors we effectively create an ensemble of inventories derived with different activity rates and emission factors. Secondly, the same emission standards for SO2 emission mitigation are implemented in China , and this induces the SO2 reductions should be correlated at a certain extent in both spatial and temporal spaces.
Thirdly, the analysis is conducted locally in 4D-LETKF, and the analysis at two grids separated by a distance over about 7.3 270 times the localization length is mostly independent (Schutgens et al., 2010). In this group, the strongest temporal localization is applied to assimilate only the observations within 1 hour of the local patch center. In other word, the hourly SO2 emission is optimized using only the CNEMC SO2 observation within the subsequent one hour, making the inverted SO2 emission variable hour by hour. The difference of the experiments in this group is only the horizontal localization length, which is assumed as 10km, 30km, 50km, and 100km respectively. The purpose of the experiments in this group is to investigate the 275 effects of horizontal localization length on SO2 emission inversion. Based on the results in the first group as described latter in Section 4, the second group of experiments by fixing the horizontal localization length of 50km are subsequently performed with 10, 20, 40 ensemble members to investigate the effects of ensemble size on SO2 assimilation. In this group, we remove the temporal localization to investigate the effects of temporal localization on the SO2 emission inversion. In other word, the hourly SO2 emission is optimized using all the CNEMC SO2 observations within the 12 hours assimilation 280 window, making the inverted SO2 emission constant within every 12 hours. In the third group, the experiments are performed as same as those of the second group except that the ensembles are generated by independently perturbing the emission in horizontal space but dependently in vertical and temporal spaces. Those last two groups of experiments are used to investigate the effects of the ensemble size and perturbation factor on SO2 emission inversion. The sensitivity data assimilation experiments are all performed for 10 days over the period of 00:00 UTC 8 November to 00:00 UTC 18 285 November 2016. The global model MOZART-4/GEOS-5 provides the initial and lateral boundary conditions used in this study (https://www.acom.ucar.edu/wrf-chem/mozart.shtml, last access: 10 August 2020). Since we don't know the uncertainties of the global model MOZART-4/GEOS-5, the initial and lateral boundary chemical fields are not perturbed in this study. The first three days are used as the spin-up of the data assimilation system, and the subsequent simulation results for one week are analysed in the next section. Based on the sensitivity tests of the SO2 emission inversion system, the 290 experiment H50kmT1hE10Ps, which generally performing better than other experiments, is extended to 00:00 UTC on 1 December 2016. This provides a longer period of 20 days to further validate the assimilation system. We also perform a recalculation experiment with the sophisticated CBMZ/MOSAIC scheme and the updated SO2 emissions to verify the new SO2 emission and the associated effects of SO2 emission reduction.

Sensitivity of the inverted SO2 emission to the assimilation parameters
The spatial distribution of the MIX SO2 emission in November 2010 at the model lowest layer is shown in Figure 3a, which is serving as the base of the initial priori SO2 emission for our experiments in November 2016. The hotspots of the anthropogenic SO2 emission are apparently found over the economically developed areas such as the North China Plain 300 (NCP), the Yangtze River Delta (YRD), and the Peral River Delta (PRD). The Multi-resolution Emission Inventory for China (MEIC, http://www.meicmodel.org, last access: 15 February 2021) developed by Tsinghua University can provide the updated SO2 emission in November 2016 (Fig. 3b), which is used as the independent "bottom-up" SO2 emission to validate our inverted SO2 emission. It is apparent that significant negative changes of SO2 emission are found over the priori higher source regions such as the NCP, YRD, and PRD between the year of 2010 and 2016, which are in agreement with the 305 changes of the column SO2 concentrations observed by satellite . As the consequence of clean air actions  With horizontal localization length of 50 km, the spatial distribution of the inverted SO2 emission by removing the temporal localization is shown in Fig. 3g. It is clearly found that the inverted SO2 emissions over the Shandong Province, YRD and PRD without temporal localization are lower than those with temporal localization, inducing larger negative bias and RMSE (Fig. 4f). It demonstrates that it is important to reveal the diurnal variations of the SO2 emission (Wang et al., 2010). The experiment with temporal localization can reveal the hourly variation of the SO2 emission by assimilating only the 335 subsequent hourly observations, whereas the experiment without temporal localization only adjust the magnitude of SO2 emission every 12 hours by assimilating all the observations within the 12-hour window. The inverted SO2 emission with horizontal random uncorrelated perturbations gets closer to the independent MEIC one as increasing the size of the ensemble member . However, the performances of the horizontal 345 distribution and magnitude of the inverted SO2 emission using 40 ensemble members with horizontal random uncorrelated perturbations are even obviously worse than those using 10 ensemble members with horizontal correlated perturbations. It demonstrates that the independent emission perturbations over each model grid tend to underestimate the model spread due to the current limited ensemble members and the cancellation of neighbouring cells (Pagowski and Grell, 2012;Schutgens et al., 2010). 350 The mean bias and RMSE of SO2 emission over China by using the MIX SO2 emission in November 2010 for that in November 2016 are 2.70 and 9.78 !) ℎ !& , respectively (Fig. 4a). For the inverted SO2 emission by data assimilation, the bias and RMSE reduction rates (Miyazaki et al., 2012b)  where -. and -. are the mean bias and RMSE between the inverted SO2 emission and the MEIC SO2 emission in November 2016. As shown in Fig. 5, it is obviously found that (1) the inverted SO2 emission in every assimilation experiment can both reduce the bias and RMSE; (2) the randomly correlated perturbation factor is superior to the randomly uncorrelated perturbation factor in reducing the bias and RMSE, and it is generally unaffected by the ensemble size; (3) the experiment H50kmT1hE10Ps shows the best performance in both reducing the bias and RMSE, decreasing the bias and 360 RMSE by 94.5% and 45.4% respectively.  Fig. S6 in the Supplement. It is apparent that significant RMSEs and positive biases 370 are found over the priori SO2 emission hotspot regions such as the NCP, YRD, and PRD in both the two free run experiments, whereas slight RMSEs and negative biases are both found over northwestern China. Furthermore, the horizontal distributions of both the biases and RMSEs of the two free run experiments are generally similar. As given in Table 2, the relative differences of the RMSEs in the FR and FR_CM experiments are both less than 1% over the assimilated and independent sites, although the mean biases in the FR_CM experiment tend to both slightly smaller than those in the FR 375 experiment. Those demonstrate that the biases and RMSEs between the simulated and observed surface SO2 concentrations are not induced by the uncertainties of the different chemical reaction mechanisms but due to the uncertainties of the used biases of the SO2 surface concentrations in both the assimilated and independent sites are benefitted from the temporal 13 localization, although the RMSEs are slightly increased. It is interesting that the smallest RMSE of the SO2 surface concentrations over the independent sites is also found in the H50kmT1hE10Ps experiment with value of 36.20, which the inverted SO2 emissions are also best in agreement to the independent MEIC ones. This further indicates the assimilation system parameters used in this experiment are suitable for the SO2 emission inversion, decreasing the biases of SO2 surface 400 concentrations over assimilated and independent sites by 87.2% and 88.9% respectively. The underestimation of the surface SO2 concentration with the original MIX emission over northwestern China such as the Gansu province is potentially attributable to the increasing SO2 emissions due to energy industry expansion and relocation over northwestern China (Ling et al., 2017). The SO2 emissions and surface concentrations over the Gansu province are increased to reduce the negative biases in the assimilation experiments as shown in Figs. S4 and S6 in the Supplement, indicating our emission inversion 405 system also works well when the prior emissions are underestimated. However, the simulated surface SO2 concentrations with the inverted emissions are still underestimated over the Gansu province. The reason for the underestimation is twofold:

Sensitivity of the surface SO2 concentration to the emission
(1) there are limited observations to be assimilated over northwestern China because the observation sites are sparse; (2) the initial priori MIX SO2 emission over northwestern China is small and underestimated, inducing the model uncertainty is small relative to the observation one. This translates to a reduced impact of the observation on the priori emission. 410 H50kmT1hE10Ps experiment, as expected, shows the best performance with a peak closer to 0 in both the assimilated and independent sites. 420

SO2 reduction in China and associated effects
Based on the aforementioned sensitivity tests of the SO2 emission inversion system, the experiment H50kmT1hE10Ps is The SO2 surface concentrations simulated by the FR_CM experiment are sometime lower than those in the FR experiment especially over the YRD and PRD subregions, indicating the overestimations of the SO2 emission reduction by the Topdown approach over the YRD and PRD are probability due to the simple aerosol chemistry schemes used in 445 RADM2/GOCART (Chin et al., 2000). This is proved as the simulated SO2 surface concentrations in YRD with the concentrations and underestimating the associated ensemble spread. The latter induces the inverted emission to be 460 overconfident in the background emission (Hunt et al., 2007). Since the emissions are constant over time in the priori MIX inventory, the diurnal variations of the SO2 emission reduction over China and NCP both reveal higher emission reductions in the nighttime, inducing the SO2 emissions in the nighttime are lower than those in the daytime (Fig. 11c). This is generally reasonable as less human and economic activities happen in the nighttime (Chen et al., 2019b). Figure 12 shows the spatial distributions of the averaged surface concentrations of the sulfate, ammonium, nitrate, and PM2.5 465 over 11 November to 1 December 2016 simulated with the CBMZ/MOSAIC mechanism and the original MIX emissions, and the absolute and relative changes of the associated aerosol surface concentrations with the newly inverted emissions by data assimilation. It is found that the SO2 emission reductions induce the sulfate surface concentrations reduced up to 10 / " (50%) over the center China, and this is due to the sulfate aerosols are dominated by the productions in-cloud oxidations (Chin et al., 2000;Goto et al., 2015) and more cloud are found over the center China Ma et al., 470 2014). The nitrate surface concentrations are found slightly increased in the center China as the reductions of sulfate aerosols, and this is due to the emissions of the nitrate precursors (i.e., NO and NO2) are not updated in this study and NH4NO3 is formed only in sulfate-poor aerosols (Zaveri et al., 2008;Chen et al., 2016). The synergy effects of sulfate-nitrateammonium induce slightly reductions of ammonium surface concentrations, decreasing the PM2.5 surface concentrations about 10 !" (10%) over the center China. 475

Conclusions
The timely precise emission inventories are crucial to air quality prediction and mitigation. To dynamically update the Sensitivity tests for the emission inversion system demonstrate that the assumption of the covariance error matrix of the a priori SO2 emissions has the largest effect on the inverted emissions. The random perfectly correlated emission perturbations throughout the whole model grids with horizontal localization length of 50 km can best reproduce the independent MEIC SO2 emissions, decreasing the MIX emission bias and RMSE by 94.5% and 45.4% respectively. The independent emission perturbations over each model grid tend to underestimate the model spread due to the current limited ensemble members and 490 the cancellation of neighbouring cells. With the random perfectly correlated emission perturbations, the ensemble size has only little effect on the inverted SO2 emissions and the ensemble forecast with 10 members seems feasible to reveal the SO2 reductions in China. The temporal localization by assimilating only the subsequent hourly observations can reveal the diurnal variation of the SO2 emission, which is better than that to update the magnitude of SO2 emission every 12 hours by assimilating all the observations within the 12-hour window. 495 The known overestimates of the prescribed SO2 emissions in November 2016 as same as that in November 2010 are successfully detected as the simulated SO2 surface concentrations especially over the SO2 emission hotspot subregions with two distinguished chemical reaction mechanisms are both significantly positive biased. The simulated SO2 surface concentrations with the inverted SO2 emissions in all assimilation experiments show more comparable to the observations, and the performances of the simulated SO2 surface concentrations are clearly affected by the inputs of the different inverted 500 SO2 emissions due to assimilation system parameters. This indicates that the uncertainties of the different chemical reaction mechanisms in simulating SO2 concentrations are much smaller than those of the SO2 emissions. The smallest RMSE of the simulated and observed SO2 surface concentrations over the independent verification sites is also found in the experiment that the inverted SO2 emissions are best in agreement to the independent MEIC ones, decreasing the biases of SO2 surface concentrations by 88.9%. 505 The SO2