A three-dimensional variational data assimilation system for aerosol optical properties based on WRF-Chem: design, development, and application of assimilating Himawari-8 aerosol observations

Abstract. This paper presents a three-dimensional variational (3DVAR) data assimilation (DA) system for aerosol optical properties, including aerosol optical depth (AOD) retrievals and lidar-based aerosol profiles, which was developed for the Model for Simulating Aerosol Interactions and Chemistry (MOSAIC) within the Weather Research and Forecasting model coupled to Chemistry (WRF-Chem) model. For computational efficiency, 32 model variables in the MOSAIC_4bin scheme are lumped into 20 aerosol state variables that are representative of mass concentrations in the DA system. To directly assimilate aerosol optical properties, an observation operator based on the Mie scattering theory was employed, which was obtained by simplifying the optical module in WRF-Chem. The tangent linear (TL) and adjoint (AD) operators were then established and passed the TL/AD sensitivity test. The Himawari-8 derived aerosol optical thickness (AOT) data were assimilated to validate the system and investigate the effects of assimilation on both AOT and PM2.5 simulations. Two comparative experiments were performed with a cycle of 24 h from November 23 to 29, 2018, during which a heavy air pollution event occurred in North China. The DA performances of the model simulation were evaluated against independent aerosol observations, including the Aerosol Robotic Network (AERONET) AOT and surface PM2.5 measurements. The results show that Himawari-8 AOT assimilation can significantly improve model AOT analyses and forecasts. Generally, the control experiments without assimilation seriously underestimated AOTs compared with observed values and were therefore unable to describe real aerosol pollution. The analysis fields closer to observations improved AOT simulations, indicating that the system successfully assimilated AOT observations into the model. In terms of statistical metrics, assimilating Himawari-8 AOTs only limitedly improved PM2.5 analyses in the inner simulation domain (D02); however, the positive effect can last for over 24 h. Assimilation effectively enlarged the underestimated PM2.5 concentrations to be closer to the real distribution in North China, which is of great value for studying heavy air pollution events


Besides, numerical simulations conducted by atmospheric chemistry models or air quality models have increasingly played an essential role in aerosol analysis and prediction. Significant progress has been achieved in recent years; however, accurate aerosol modeling remains challenging given the large uncertainties associated with 90 aerosol emissions, initial conditions, and complex interactions with meteorological processes. Solving these uncertainties is of great significance for improving aerosol modeling. Data assimilation (DA), a statistically optimal approach combining observations with numerical model outputs, can reduce uncertainties in the initial aerosol fields. Chemical DA, especially aerosol DA, has gradually developed to improve the 95 prediction of air quality in recent years. In early studies of aerosol DA, the optimal interpolation (OI) technique was employed to assimilate the total mass concentrations of PM 2.5 or PM 10 using a control variable scheme (Tombette et al., 2009). The variational algorithm was also employed in some studies. Niu et al. (2008) used the three-  (Wu et al., 2002;Kleist et al., 2009) has been widely applied to aerosol DA (Pagowski et al., 2010;Liu et al., 2011;Jiang et al., 2013;Feng et al., 2018；Pang et al., 2018. The GSI tool was preliminary developed for the Goddard Chemistry Aerosol Radiation and Transport (GOCART) aerosol scheme (Chin et al., 2000) using the 110 3DVAR algorithm. To overcome the systematic underestimation of the GOCART scheme in the assimilation context, researchers developed an aerosol DA system based on the Model for Simulating Aerosol Interactions and Chemistry (MOSAIC) aerosol scheme (Li et al., 2013;Zang et al., 2016;Wang et al., 2020;Liang et al., 2020). For instance, Li et al. (2013) lumped eight aerosol species within MOSAIC into five control 115 variables and then constructed a 3DVAR DA system to assimilate PM 2.5 mass concentrations, and the results showed that DA has a beneficial effect on both the initial field and PM 2.5 forecasts within a 24 h period. Although the four-dimensional variational (4DVAR) technique has been extensively used in meteorological operations (Gauthier et al., 2007), it has only been employed to assimilate atmospheric chemical compositions 120 such as O 3 , SO 2 , and CO based on a simple offline chemical transport model (CTM) because of the high computational cost and complex adjoint model (Eibern and Schmidt, https://doi.org/10.5194/gmd-2021-215 Preprint.  natural to incorporate them into models via assimilation. The key issue of directly assimilating aerosol optical properties is the establishment of an observation operator and its adjoint for variational methods. Liu et al. (2011) added the forward AOD operator and its adjoint module within the Community Radiative Transfer Model (CRTM) (Han et al., 2006) to the GSI for the first time and successfully assimilated MODIS AODs. This 130 extended assimilation tool was then employed to assimilate various AOD retrievals from different platforms (Schwartz et al., 2012;Saide et al., 2014;Tang et al., 2017;Pang et al., 2018;Ha et al., 2020), and achieved encouraging results. Similar to AOD, assimilating lidar aerosol profiles also involves the complex forward operator and its adjoint Wang et al., 2014). In order to simplify the observation 135 operator, an approximate approach was utilized to directly assimilate aerosol profiles.
For example, Liang et al. (2020) employed the Interagency Monitoring of Protected Visual Environments (IMPROVE) equation which is the linear link between the extinction coefficient and aerosol chemical species mass as the forward operator to construct a 3DAVR DA system and then assimilated ground-based lidar aerosol profiles 140 and PM 2.5 mass concentrations simultaneously. Also, some researchers have used sequential approaches, such as the ensemble Kalman filter, to advance aerosol DA (Schutgens et al., 2010;Yumimoto et al., 2016;Sekiyama et al., 2016;Dai et al., 2019).
Nevertheless, the ensemble based aerosol forecasts are very expensive duo to the heavy computational load, especially online meteorology-chemistry modelling, it is difficult to 145 widely implement them in the operational air quality DA systems (Pang et al., 2021). Following Li et al. (2013) and You (2017), this study further extends the assimilation of aerosol optical properties. Using an observation operator based on the Mie scattering theory, a comprehensive 3DVAR DA system aiming for aerosol optical properties, including AOD retrievals and aerosol profiles, is developed for the MOSAIC aerosol 150 scheme within the Weather Research and Forecasting model coupled to Chemistry (WRF-Chem) model for the first time. The remainder of this paper is organized as follows. Sect. 2 presents the aerosol DA system in detail. The data and experimental methods used in this study are described in Sect. 3. The background error statistics necessary for the assimilation experiment are analysed in Sect. 4. The results are 155 summarized in Sect. 5, discussing the assimilation effects. Finally, a summary is presented in Sect. 6, along with discussions on the limitations of this study and suggestions for future research.
2 Aerosol data assimilation design 2.1 Model description 160 WRF-Chem is an advanced online coupled meteorology-aerosol model (Grell et al., 2005) that can simultaneously simulate meteorological fields and atmospheric chemical compositions including aerosols. It has been widely used in air quality forecasting and aerosol-related studies (Chen et al., 2016). Aerosol processes are treated by modules or schemes in WRF-Chem, such as GOCART (Chin et al., 2000), MOSAIC (Zaveri et al.,165 2008), and Modal Aerosol Dynamics Model for Europe (MADE) (Ackermann et al., 1998). There is no size information but total mass for sulfate, black carbon (BC), and organic carbon (OC), while there is size information only for dust and sea salt in GOCART, in addition to no description of second organic aerosol (SOA), resulting in its numerical efficiency. Due to more detailed descriptions of dust, GOCART has been 170 applied more extensively in dust aerosol research. MADE is a modal aerosol scheme that describes more aerosol species than GOCART, including sulfate, ammonium salt, back carbon, organic carbon, sea salt, nitrate, dust, and SOA. Moreover, it employs three lognormal modes, that is, Aitken, accumulation, and coarse, to describe aerosol size distributions in detail. Although such a scheme is ideal for aerosols, it consumes more 175 computational resources; therefore, its applications are limited. As a newly developed scheme, MOSAIC is a sectional aerosol scheme that incorporates tradeoffs between detailed descriptions of aerosol chemical species, size distributions, and computational cost. Previous studies have shown that this scheme has a good ability to simulate the compound aerosol pollution process in China (Gao et al., 2015;Chen et al., 2016;Chen 180 et al., 2019). The MOSAIC scheme divides atmospheric aerosols into eight species, including BC, OC, nitrate (NO 3 − ), sulfate (SO 4 2− ), chloride (Cl − ), sodium (NA + ), ammonium salt (NH4 + ), and other unclassified inorganic mass (OIN). At the same time, 4 or 8 discrete size sections or bins are employed to represent the size distribution of each species. In this study, we selected four bins for computational efficiency. The first, 185 second, third, and fourth size sections are set to be 0.039-0.1 μm, 0.1-1.0 μm, 1.0-2.5 μm, and 2.5-10.0 μm, respectively. The sum of the eight species in the first three sections corresponds to PM 2.5 , whereas the sum of all the sections corresponds to PM 10 . This approach ensures that aerosols are represented efficiently and accurately. Thus, it can be concluded that the MOSAIC aerosol mechanism of multiple species in multi-190 particle size sections has an advantage in anthropogenic aerosol studies over other schemes. Therefore, we conducted aerosol analyses and forecasts using MOSAIC, and a DA system was developed for the MOSAIC scheme.
The WRF-Chem version 4.0 was used to perform assimilation simulation experiments.
Both physical and chemical parameterization schemes are indispensable for numerical 195 simulations. The main parameterization schemes used in this study include the WRF single-moment 6-class microphysics scheme (WSM6, Hong and Lim, 2006), the Rapid Radiative Transfer Model for General Circulation Model (RRTMG) longwave and shortwave radiation scheme (Iacono et al., 2008), the Noah land-surface scheme (Chen and Dudhia, 2001), the Yonsei University (YSU) atmospheric boundary layer scheme 200 ( Hong and Lim, 2006), the Grell-Freitas convective parameterization scheme, the Fast-J photolysis scheme (Ruggaber et al., 1994), the Regional Acid Deposition Model, version 2 (RADM2, Stockwell et al., 1990)/MADE/Second Organic Aerosol Model (SORGAM, Schell et al., 2001) anthropogenic emissions, and the MOSAIC_4bin scheme described above. 205 The configuration of the two-level nested simulation domain is shown in Fig. 1a, including most of East Asia in Domain 1 (denoted by D01, hereafter) with a horizontal grid spacing of 27 km, and the entirety of North China as well as parts of East and Central China in Domain 2 (denoted by D02, hereafter) with a horizontal resolution of 9 km, 1/3 that of D01. To ensure a detailed simulation of aerosol vertical distributions, 40 210 vertical layers were modelled in the simulation, and it is worth mentioning that the vertical resolution was not set to be uniform but decreased with height. The lowest layer is at the surface, whereas the top reaches 50 hPa.

Basic formulation
The 3DVAR algorithm has been extensively used for aerosol analysis and forecasts, such 215 as the GSI tool, because of its high computational efficiency and the advantages of handling unconventional observations. Thus, it was employed to construct a DA system aiming for aerosol optical properties in this study. The 3DVAR algorithm can produce an aerosol analysis field with minimum analysis error covariance after a correction to the background field through the introduction of various observation information. For this 220 purpose, an incremental approach was adopted, similar to the operational use in meteorology (Courtier et al., 1998). In its incremental formulation, 3DVAR attempts to minimize the objective function J.
where x is the increment, corresponding to an aerosol state vector that defines the state 225 variables of three-dimensional grid, also known as control variables in the DA process.
At the minimum, the resulting analysis increment x a is added to the background x b to provide the analysis x a . B is the background error covariance matrix, and d is the innovation vector, which is expressed as: The search for a minimum solution to the objective function usually involves a numerical iterative process using a descent algorithm. However, it is difficult to solve using Eq. (1) because it includes the inverse of B. We used the methods of Li et al. (2013) to deal with the inversion of B. First, B can be represented as the product of submatrices (Bannister, 2008), B=DCD T , where D is the background error standard deviation (STD) matrix, and 245 C is the background error correlation matrix. Second, a Cholesky factorization is applied to C because it is a symmetric and positive definite matrix. The Cholesky factorization is: where the matrix C 1/2 is a lower triangular matrix. Using this Cholesky factorization, we can transform the control variables x to z through: Finally, substituting Eq. (4) into Eq. (1), we obtain the desired form of Eq. (1) as The transformed objective function is generally better conditioned, and thus, this transformation expedites convergence when it is iteratively minimized. Along with the 255 objective function J(  z) computed at each iterative step, the derivative of J(  z) with respect to z is computed as: The iteration starts with z = 0 and does not finish until the convergence condition is met or it reaches the maximum number of iterations. The descent algorithm used is the 260 limited memory BEGS method (LBFGS) (Liu and Nocedal, 1989). Finally, a return from the resulting z at the minimum to x a is obtained through Eq. (4).

Control variables
As discussed above, the basic framework of Li et al. (2013)  number of control variables, much more than those in meteorological DA, will cause a heavy burden on computational and memory resources and even lead to computational 275 non-convergence when the cost function is iteratively minimized. Therefore, a reduction in the model variables is essential for a stable and efficient assimilation system, meaning that the model variables should be lumped into fewer control variables in the DA process.
The control variables generally depend on the available observations to be assimilated.
For example, when assimilating routine aerosol measurements, such as the total mass 280 concentrations of PM 2.5 , and PM 10 , the lumped control variables represent the total mass and ammonium salt (NH4 + ). Thus, we lumped these species so that the species treated by 290 the assimilation system were reduced to black carton, organic carton, the summation of sulfate, nitrate, and ammonium salt, the summation of chlorides and sodium salts, which are quite rare inland, and other unclassified inorganics, and these five species were denoted by EC, OC, SSN, CN, and OIN, respectively (You, 2017), the size information of which was retained using the same four bins as in Sect. adding that to its background value will produce an aerosol analysis.

Observation operator and its adjoint
The and Barnard et al. (2010). 320 Although computing aerosol optical properties with WRF-Chem outputs involves many aerosol variables, as shown in Fig. 2 which will have an adverse impact on operational use. The methodology described by in WRF-Chem (Fast et al., 2006). It employs a Chebychev polynomial expansion to fit extinction efficiency, absorption efficiency, scattering efficiency, asymmetry factor, and backscattering efficiency, based on a sample generated from Mie calculations; for 355 example, Q ext can be given by the averaged refractive index for the size bin in question, and they are found using bilinear interpolation over a set of stored coefficients. Once A i is obtained, Q ext is easily computed using Eq. (7). This method is fast and results in maximum errors of just a few percent. More details regarding this methodology can be found in Fast et al. (2006).
Similarly, optical efficiencies are determined by the wet radius r i and refractive index m i 365 as well as the wavelength  more efficiently than the Mie calculation, which also reduces the difficulties of developing the DA system. Hence, we directly transported the OP module to construct the forward observation operator. To perform efficiently, some routine codes unnecessary for the assimilation system were removed so that the forward codes were dramatically reduced compared to the OP module, which is convenient for 370 establishing the TL and adjoint codes. Finally, optical properties are determined by summation across all four size bins. For example, the extinction coefficient is given by: and AOD is the column integration of b ext over the vertical layers. Obviously, the optical properties simulated by the forward operator are distributed over the three-dimensional 375 grid points. To directly compare the observations, spatial interpolation is needed. As mentioned in Sect. 2.2, the TL and AD operators are used to compute the cost function and its derivative with respect to the control variables, respectively. Sourcecode transformation based on the chain rule is usually used to construct TL and adjoint codes, which is an augmentation of forward operator codes that have been already 380 established and tested. Adjoint coding involves strict rules (Zou et al., 1997;Giering and Kaminski, 1998) and is also a heavy task if completed manually. The TL and adjoint codes were generated using the automatic differentiation tool TAPENADE V. were examined to ensure that they were correct prior to real application, and they passed TL/AD sensitivity test; for more details on how to check the TL and adjoint codes, please refer to Zou et al. (1997). Manual interventions are required when these generated codes 390 are incorporated into the DA system, especially in the case of variable calculations on three-dimensional grid points. In addition, because the optical parameters are computed independently at each point, the forward, TL, and adjoint codes are properly organized in a parallel mode to further reduce the computation time.
With the increase in aerosol observations, the simultaneous assimilation of aerosol 395 observations from various platforms has become a trend. This can be achieved by adding the summation associated with the corresponding observation items to the second term in the cost function described by Eq. (1). For this purpose, the system was developed to assimilate as many aerosol measurements as possible so that it has more potential for aerosol analysis and forecasting. In contrast to aerosol optical properties, assimilating mass concentrations is elementary and easily performed using only a simple linear operator. The system developed here can assimilate optical properties, including extinction coefficient, backscattering coefficient, AOD, and even total attenuated backscattering coefficient (Sekiyama et al., 2010), and mass concentrations, including total PM 2.5 or PM 10 mass and individual chemical species mass, simultaneously or 405 separately, with a rational introduction of desired observational data, making it possible for further study.

Data and methods
Two comparative experiments were performed to assess the performance of assimilating aerosol optical properties, which have the same model configurations described in Sect. The Himawari-8 AOT product was selected for assimilation by this system because it has a much higher temporal coverage than that of polar-orbiting satellites, which is 425 promising for aerosol DA, and has also been successfully assimilated using other methods Yumimoto et al., 2016;Dai et al., 2019). The Himawari-8 level 2 AOT is retrieved at 500 nm with a 10-minute observation interval as well as a 0.05°spatial resolution, however the data is noticeably noisy. The level 3 AOT, including AOT_Pure and AOT_Merged, an improved hourly product, is an optimal 430 estimation of AOT at a certain time rather than an estimate of the average state over an hour. AOT_Pure is a subset of level 2 AOT with strict quality control of cloud contamination, and AOT_Merged is the spatial and temporal optimum interpolation of AOT_Pure within an hour (Kikuchi et al., 2018). In this study, we focused on assimilating the latest version of the Himawari-8 level 3 AOT_Merged at 500 nm, which 435 contains as many AOT retrievals as possible with a horizontal resolution of 0.05°×0.05°.
The original AOT data is commonly thinned before directly assimilating to avoid seriously overestimated increments caused by the much higher spatial resolution of AOT data than that of the model Dai et al., 2019;Ha et al., 2020).
Similar to Ha et al. (2020), we thinned the original AOT data over the D01 mesh (27 km) 440 and D02 mesh (9 km), respectively, using the mean value of all the data points in one grid cell. A case of thinned AOTs retrieved at 0300 UTC on November 25, 2018, in D02 is shown in Fig. 3b, the number of the data is 13100 with a maximum value of 1.801.

The AOT observations represent a heavy aerosol pollution episode that occurred in North
China, yet there is a lack of aerosol information in some heavily polluted regions due to 445 cloud contamination, meaning that optical retrievals alone are not sufficient to thoroughly study aerosols. The Himawari-8 AOT is retrieved in the visible and near- To evaluate the performance of Himawari-8 AOT assimilation, three common statistical metrics, including the correlation coefficient (CORR), root mean squared error (RMSE), and mean bias (BIAS), were utilized (Boylan and Russell, 2006). It should be noted that 455 compared with observations is the WRF-Chem D02 simulation, the results given below were computed using D02 outputs. First, we investigated the effects of assimilation on AOT simulations using assimilated Himawari-8 AOTs and independent observations, including MODIS AOD and AERONET AOT observations. Second, we investigated the effects of assimilating AOTs on PM 2.5 analysis and forecasting using hourly surface mass 460 concentration observations (Fig. 1b)

Statistics of background error covariance
Background error covariance is an important issue in data assimilation, which not only specifies the spread of observation information in the background field, namely the way in which the observations affect the background values, but also determines the relative weight of observational and background information across the analysis field. In practice, however, the error covariance B-matrix is too large for a multi-variable aerosol DA to be calculated numerically. For instance, the number of D02 grid points used here is in the order of 10 6 , in addition to 20 state variables, the number of elements in B is, therefore, 10 7 ×10 7 . This size will result in difficulty for computing and storing B, therefore a simplification of B is required. Following the studies of Bannister (2008) and Li et al. 475 (2013), the B-matrix was reduced to background error STD D, horizontal correlation matrix, and vertical correlation matrix, which can be computed separately. These three submatrices have dramatically fewer dimensions than B, so they become computationally treatable. Because the forecast error is unknown, most studies use model outputs to statistically estimate error covariance via modeling or parameterization, such 480 as the NMC method (Parrish and Derber, 1992), which has been regularly used to calculate background error covariance for traditional meteorological fields such as temperature and wind and is also appropriate for aerosol mass concentrations (Benedetti and Fisher, 2007;Liu et al., 2011;Li et al., 2013). This study also utilizes the NMC method to calculate background error STDs, horizontal correlation, and vertical 485 correlation based on differences between 48 h and 24 h forecasts valid at the same time (i.e., 0000 UTC) within a period of one month (November 2018). Because each aerosol state variable has a different background error covariance from others, which has been demonstrated by error statistics (see below), there is a need to estimate the error covariance for each variable to achieve better assimilation performance. 490 The error STD D-matrix of each variable is diagonal and was directly estimated as a domain average at every model level using WRF-Chem D01 and D02 outputs, respectively, and its vertical distribution (only for D02) is shown in Fig. 4 For further simplification, we assumed that horizontal correlations are isotropic (Kahnert et al., 2008), which means that horizontal correlations are just a function of distance and 510 have nothing to do with direction. Consequently, the horizontal correlation can be fitted using a one-dimensional Gaussian function. The correlation between two arbitrary points x 1 and x 2 can be expressed as c(x 1 , control variables in the third size bin are shown in Fig. 5. A salient and common feature of these vertical correlations is that they decrease with height and have strong relation to the boundary layer heights, which means that aerosols are mainly stacked in the 545 boundary layer and tend to accumulate closer to the ground. At the same time, consistent with the horizontal correlations, vertical correlations differ among aerosol variables.
SSN3 has a relatively large vertical scale, whereas CN3 has a relatively small vertical scale, which is consistent with the horizontal features.

Effects on AOT simulations
AOT is of great value for studying aerosol activities, which can be simulated by the forward operator within the DA system. In general, assimilating AOT certainly improves its analysis according to the basic principle of the 3DVAR algorithm, unless it is not successfully assimilated. It is noted that the wavelength variable necessary for computing 555 AOTs described in Sect. 2.4 was set to be 500 nm, the wavelength at which the Himawari-8 AOTs are retrieved. A comparison between the simulated AOTs in the background field and analysis is usually employed to demonstrate the positive effects of assimilation. For illustration, the simulated AOTs as well as the so-called AOT increments at an initialization of 0300 UTC on November 25, 2018, are shown in Fig. 6. 560 The increments, which are differences between the analysis and the background field, can be considered as the improvements generated by assimilation, including magnitude and range, and these increments are spatially consistent with the observations, which means that the observations have an important effect on the assimilation results. Obviously, the simulated AOTs in the background field are dramatically underestimated 565 (Fig. 6a) compared with the observed Himawari-8 AOTs (Fig. 3b), while the analysis brings the AOTs closer to the observations, which is indicated by the prominently positive increments (Fig. 6c). At the same time, assimilation also decreases the AOTs over other regions, with negative increments marked in blue. The background field is generally unable to describe the real pollution, especially in the case of heavy pollution; 570 however, the analysis after assimilation can provide a relatively accurate pollution situation (Fig. 6b).
The distributions shown in Fig. 6 express the effects of assimilating AOT on its analysis.
To quantitatively evaluate the effects, the three metrics described above, CORR, RMSE, and BIAS, were computed through all the data pairs between the simulated and observed 575 AOTs after spatial interpolation from regular grid points to the corresponding observational locations. The larger and closer to 1 CORR, the better the assimilation performance, while the smaller RMSE and the closer to 0 BIAS, the better the assimilation performance. Besides, the assimilated Himawari-8 AOTs were used to compute the metrics, and another independent observation MODIS AOD was employed 580 to fully evaluate the effects of assimilation on the analysis. The Terra MODIS level 2 AOD data (MOD04_L2) were used for validation in this study. As this polar-orbiting satellite passes over the equator at 10:30 local time, we collected all the data between 0000 UTC and 0600 UTC, rather than at a given time, to obtain more observations, matching the simulated values at the initial time (i.e., 0300 UTC). It is worth mentioning 585 that MODIS AOD is retrieved at 550 nm and the simulated AOT is at 500 nm, which will pose some but not largely significant effects on the evaluation. The experiment lasted consecutively for a week in a cycle of 24 h, which contained seven initializations, Beijing, Beijing-CAMS, Beijing_PKU, Beijing_RADI, XiangHe, and XuZhou-CUMT, experiment significantly enlarged AOTs so that they became closer to the observations. This indicates that assimilation significantly improves AOT simulation. As can be seen, the assimilation benefits vary with sites; for instance, assimilation improves the AOT 615 simulation at XuZhou-CUMT less than that at other sites (Fig. 8f), as well as the forecasting time; for example, the assimilation benefits for analyses can reach 24 h in the case of November 25, while they last less than 24 h in the case of November 24. The available observations largely account for this variation. A high pollution event took place on 26 November in North China so that AOTs over 1.6 were measured in Beijing 620 (Fig. 8b), which can also be demonstrated by ground-level PM 2.5 observations (not shown here), but there is few Himawari-8 observations for the event to be assimilated due to cloud contamination. As a result, the assimilation experiment had the same performance as the control experiment, which is unable to describe the high pollution event. It has been concluded that the introduction of AOT observations by assimilation is 625 beneficial to capture heavy pollution levels (Rubin et al., 2017).

Effects on PM 2.5 simulations
PM 2.5 mass concentrations draw large attention from both the public and researchers.
They can be directly modeled using WRF-Chem and are conventionally measured at ambient air quality monitoring stations. As the DA system was developed based on the 630 MOSAIC scheme, it should hopefully improve aerosol analyses and subsequent forecasts, especially for PM 2.5 . North China is located in D02 and is known for its high levels of air pollution; therefore, WRF-Chem D02 outputs were directly employed to investigate the effects of assimilating Himawari-8 AOTs on regional PM 2.5 forecasts. As described in Sect. 2.3, the assimilation process will produce the increments of 20 635 control variables. Of course, we can analyse every increment to assess the effects of AOT assimilation on the corresponding aerosol species simulations. Because there is a lack of observations for aerosol species at each size section, the total increment of PM 2.5 is analysed instead, which is simply a summation of increments over five assimilated species in the first, second, and third size bins. For illustration, Fig. 9  Assimilation directly aims to improve aerosol analyses. As shown in Fig. 10, the data dots between simulated and observed PM 2.5 concentrations were also analysed according  (Wang et al., 2020), however, the use of PM 2.5 concentrations to evaluate the effects of 665 AOT assimilation is not objective and comprehensive because there is a discrepancy between PM 2.5 and AOT observations. For example, no assimilation benefits in some highly polluted areas are generated because of the lack of AOT retrievals, so the PM 2.5 observations cannot reflect the benefits from AOT assimilation.
To investigate the effects of AOT assimilation on PM 2.5 forecasts, time series of three 670 metrics regarding the forecast range (i.e., 24 h) were computed using hourly WRF-Chem D02 outputs and observations. As shown in Fig. 11, in terms of both CORR and RMSE, the assimilation experiment performed better than the control experiment, indicating that the benefits for analyses from AOT assimilation can last up to 24 h. It is noted that the assimilation benefits vary with integration time, decreasing in a fluctuating manner. The 675 computed BIAS indicates that AOT assimilation improves PM 2.5 forecasts within 24 h, but can vary for certain times. As discussed above, assimilation significantly enlarges the simulated PM 2.5 concentrations, but an overcorrection, namely, the simulated values surpass observations, occurs approximately 7-8 h from the initial time (Fig. 11c), which may be ascribed to the dramatically noisy AOT retrievals, as well as an imperfection of 680 the observation operator for aerosol optical properties. 15.12 μg m  3 , and  9.81 μg m  3 , respectively, which means that the assimilation experiment had a better performance in PM 2.5 forecasts than the control experiment.
These metrics indicate that AOT assimilation improves regional PM 2.5 forecasts, especially in the case of heavy pollution. was added to assimilate aerosol optical properties, which consisted of the forward observation operator and its TL and AD codes. We properly reduced the OP module 705 (Fast et al., 2006) in WRF-Chem to establish the forward operator, then the TL and AD codes were generated using an automatic differentiation tool and tested to ensure that they were correct. The system can assimilate aerosol optical properties such as extinction coefficient profile, AOD, and mass concentrations, simultaneously or separately, and these should be applied for further studies in the future. Background error statistics, including SDs, horizontal correlation length scales, and vertical correlations of 20 control variables, were estimated using monthly WRF-Chem outputs based on the NMC method, which are also necessary for the assimilation process.
Our results showed that background error statistics distinctly vary among these control variables, which also illustrates the necessity of building a multi-variable aerosol DA 725 system. analyses, but not significantly, in D02 and the assimilation benefits can last more than 24 h. Assimilation significantly enlarges the underestimated PM 2.5 concentrations to be closer to the real distribution in North China during heavy pollution. The averaged surface PM 2.5 concentrations over D02 were better simulated during the whole pollution 740 period after assimilation compared with corresponding observations, which means that AOT assimilation improves regional PM 2.5 simulations.
In this study, the observation errors of AOT retrievals were simply set as a constant.
However, they should be determined by the retrieval uncertainty, or should be variable at least. Additionally, different thinning schemes for AOT retrievals may have different 745 results. Consequently, these questions should be studied further. As more aerosol optical property observations become available, combined assimilation of optical properties and routine observations, such as aerosol extinction profiles and mass concentrations, has become popular. As described above, the system developed in this study has great Holben, B. N., Eck, T. F., Slutsker, I., Tanré, D., Buis, J. P., Setzer, A., Vermote, E., Reagan, J. A., Kaufman, Y. J., Nakajima, T., Lavenu, F., Jankowiak, I., and Smirnov, A.: AERONET -A