Constraining non-methane VOC emissions with TROPOMI HCHO observations: impact on summertime ozone simulation in August 2022 in China

Non-methane volatile organic compounds (NMVOCs), which are crucial precursors of O 3 , have a significant impact on atmospheric oxidative capacity and O 3 formation. However, both anthropogenic and biogenic NMVOC emissions remain subject to considerable uncertainty. Here, we extended the Regional multi-Air Pollutant Assimilation System (RAPAS) using the ensemble Kalman filter (EnKF) algorithm to optimize NMVOC emissions in China in August 2022 by assimilating TROPOspheric Monitoring Instrument (TROPOMI) HCHO retrievals. We also simultaneously optimized NO x emissions by assimilating in situ NO 2 observations to account for the chemical feedback among VOCs–NO x –O 3 . Furthermore, a process-based analysis was employed to quantify the impact of NMVOC emission changes on the chemical reactions governing O 3 formation and depletion. NMVOC emissions decreased substantially, by 50.2 %, especially in the middle and lower reaches of the Yangtze River, revealing a prior overestimation of biogenic NMVOC emissions due to an extreme heat wave. Compared to the forecast with prior NMVOC emissions, the forecast with posterior emissions significantly improved HCHO simulations, reducing biases by 75.7 % and indicating a notable decrease in posterior emission uncertainties. The forecast with posterior emissions also effectively corrected the overestimation of O 3 in forecasts with prior emissions, reducing biases by 49.3 %. This improvement can be primarily attributed to a significant decrease in the RO 2 + NO reaction rate and an increase in the NO 2 + OH reaction rate in the afternoon, which together limit O 3 generation. Sensitivity analyses emphasized the necessity of considering both NMVOC and NO x emissions for a comprehensive assessment of O 3 chemistry. This study enhances our understanding of the effects of NMVOC emissions on O 3 production and can contribute to the development of effective emission reduction policies.


Introduction
Since the Chinese government implemented the Air Pollution Prevention and Control Action Plan in 2013, NO x emissions have decreased notably (Zheng et al., 2018). Despite these advances, however, O 3 pollution persists and in some cases has worsened (Ren et al., 2022). The increase in O 3 concentrations can be attributed not only to adverse meteorological conditions but also, predominantly, to unbalanced joint control of non-methane volatile organic compounds (NMVOCs) and nitrogen oxides (NO x ; Li et al., 2020). NMVOCs are vital precursors of O 3 and strongly influence atmospheric oxidation capacity, thereby altering the lifetimes of other pollutants. Accurately quantifying NMVOC emissions is therefore essential for investigating their impact on O 3 chemistry and for formulating emission reduction policies.
Anthropogenic NMVOC emissions have traditionally been estimated using a bottom-up method. However, the accuracy and timeliness of these estimates are limited by the scarcity of local measurements of emission factors, incomplete and unreliable activity data, and the diverse range of species and technologies involved (Cao et al., 2018; Hong et al., 2017). Furthermore, model-ready NMVOC emissions carry additional uncertainty from spatial and temporal allocation using various "proxy" data for different source sectors (Li et al., 2017a). Li et al. (2021) reported substantial discrepancies among emission estimates from various studies, ranging from 23 % to 56 %. Biogenic NMVOC emissions are typically estimated using models such as the Model of Emissions of Gases and Aerosols from Nature (MEGAN; Guenther et al., 2012) and the Biogenic Emission Inventory System (BEIS; Pierce et al., 1998). In these models, emissions are computed by multiplying plant-specific standard emission rates by dimensionless activity factors. Nonetheless, apart from inaccuracies in the distribution of plant functional types, empirical parameterizations, especially of the responses to temperature and drought stress, can introduce substantial uncertainties (Angot et al., 2020; Seco et al., 2022; Jiang et al., 2018). Warneke et al. (2010) determined isoprene emission rates from field measurements and compared them with MEGAN and BEIS estimates, finding that MEGAN tended to overestimate emissions while BEIS consistently underestimated them. Similarly, Marais et al. (2014) found that MEGAN's isoprene emission estimates were 5-10 times higher than canopy-scale flux measurements from African field campaigns.
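The multiplicative structure described above (standard emission rate times activity factors) can be illustrated with a minimal Python sketch. It is not MEGAN's algorithm: as a labeled assumption, it keeps only a single temperature activity factor of the exponential form exp[β(T − T_s)], with β = 0.09 K −1 and T_s = 303 K, a form often used for light-independent terpene emissions, and the standard emission rate is a hypothetical value.

```python
import math

def emission_rate(standard_rate, temp_k, beta=0.09, temp_std_k=303.0):
    """Emission = standard emission rate x dimensionless activity factor.

    Illustrative simplification (assumption): only an exponential
    temperature activity factor is used; real models multiply several
    factors (light, leaf age, soil moisture, ...).
    """
    gamma_t = math.exp(beta * (temp_k - temp_std_k))  # temperature activity factor
    return standard_rate * gamma_t

standard_rate = 1.0  # hypothetical standard emission rate (arbitrary units)
for temp in (298.0, 303.0, 308.0, 313.0):
    print(f"T = {temp:.0f} K: emission = {emission_rate(standard_rate, temp):.2f}")
```

With these assumed parameters, a 10 K warming above the standard temperature multiplies the emission by exp(0.9) ≈ 2.5, which shows why any error in the temperature response is strongly amplified during a heat wave.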
A top-down approach using observational data has been developed for estimating VOC emissions. For instance, techniques based on aircraft- and ground-based field measurements, such as the source-receptor relationship algorithm with a Lagrangian particle dispersion model (Fang et al., 2016), mixed layer gradient techniques (Mo et al., 2020), eddy covariance flux measurements (Yuan et al., 2015), and box models (Wang et al., 2020), have been employed to complement or verify bottom-up results. However, these approaches do not comprehensively account for the complex nonlinear chemical reactions and transport processes that VOCs undergo in the atmosphere. Formaldehyde (HCHO) and glyoxal (CHOCHO) are key intermediates in the atmospheric oxidation of many VOCs (Hong et al., 2021; Liu et al., 2012). Satellites can readily detect them from space as vertical column densities (VCDs), so these observations are widely used to estimate NMVOC emissions. A commonly used approach assumes that the observed HCHO/CHOCHO columns are locally and linearly related to VOC emission rates (Palmer et al., 2006; Liu et al., 2012). However, this approach neglects the spatial offset between emission and column caused by chemistry and transport. Chaliyakunnel et al. (2019) conducted a Bayesian analysis to derive an optimal estimate of VOC emissions using HCHO measurements over the Indian subcontinent. Their results indicated that biogenic VOC emissions modeled by MEGAN v2.1 were overestimated by approximately 30 %-60 %, whereas anthropogenic VOC emissions from the REanalysis of the TROpospheric chemical composition (RETRO) inventory were underestimated by 13 %-16 %. Cao et al. (2018) employed the GEOS-Chem model and its adjoint, constrained by tropospheric HCHO and CHOCHO columns from the GOME-2A and Ozone Monitoring Instrument (OMI) satellite instruments, to quantify Chinese NMVOC emissions. They found a low bias in MEGAN, in contrast to the significant overestimation reported by Bauwens et al. (2016), especially in southern China.
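The locally linear approach mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the method of any cited study: a column Ω is assumed to follow Ω = S·E + B, where the effective yield S and background B are fitted by least squares (`numpy.polyfit`) to hypothetical model pairs of emission E and column Ω, and an observed column is then inverted cell by cell. Because each grid cell is treated independently, the sketch exhibits exactly the limitation noted above: it ignores the spatial offset caused by transport and chemistry.

```python
import numpy as np

# Hypothetical model output for one region (arbitrary units): VOC emission
# rates and the HCHO columns they produce, assuming omega = S * E + B.
E_model = np.array([1.0, 2.0, 3.0, 4.0])
omega_model = np.array([3.0, 5.0, 7.0, 9.0])

# Least-squares fit; polyfit returns coefficients highest power first.
S, B = np.polyfit(E_model, omega_model, deg=1)

def emission_from_column(omega_obs):
    """Invert an observed column to a local emission rate via the linear fit."""
    return (omega_obs - B) / S

print(f"yield S = {S:.2f}, background B = {B:.2f}")
print(f"emission for an observed column of 6.0: {emission_from_column(6.0):.2f}")
```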
Several studies have explored the implications of inverted VOC emissions for surface O 3 . For instance, Zhou et al. (2023) used concurrent VOC measurements with a Eulerian box model to constrain anthropogenic VOC emissions in the Pearl River Delta (PRD) region; emissions were reduced by 15 %-36 %, and simulations of VOCs and O 3 improved. Local model biases in simulating the oxidation of NMVOCs and O 3 are closely related to uncertainties in NO x emissions (Wolfe et al., 2016; Chan Miller et al., 2017). To address this, Kaiser et al. (2018) applied an adjoint algorithm to estimate isoprene emissions over the southeastern US after adjusting anthropogenic NO x emissions downward by 50 % to correct NO 2 simulations. They found that isoprene emissions from MEGAN v2.1 were overestimated by 40 % on average, slightly less than the 50 % reduction reported by Bauwens et al. (2016). Souri et al. (2020) simultaneously optimized NMVOC and NO x emissions in East Asia using Ozone Mapping and Profiler Suite (OMPS-NM) HCHO and OMI NO 2 retrievals. After the adjustment, predominantly anthropogenic NMVOC emissions from the mosaic Asian anthropogenic emission inventory (MIX-Asia) 2010 increased over the North China Plain (NCP), whereas predominantly biogenic NMVOC emissions from MEGAN v2.1 decreased over southern China. However, the posterior simulations exacerbated the overestimation of O 3 in northern China.
Most studies on the inversion of NMVOC emissions or their impact on O 3 have neglected the uncertainties in the NO x -dependent production and loss of NMVOC oxidation products and O 3 . An iterative nonlinear joint inversion of NO x and NMVOCs using multi-species observations is expected to reduce the uncertainties in both emissions and is well suited to the intricate VOCs-NO x -O 3 relationship. In this study, we extended the Regional multi-Air Pollutant Assimilation System (RAPAS) with the ensemble Kalman filter (EnKF) assimilation algorithm to optimize NMVOC emissions over China using TROPOspheric Monitoring Instrument (TROPOMI) HCHO retrievals, which offer high spatial coverage and resolution. To quantify the impact of NMVOC emissions on O 3 more accurately, NO x emissions were simultaneously adjusted using nationwide in situ NO 2 observations. Process analysis was then employed to quantify the chemical pathways of O 3 formation and loss. By constraining both types of emissions top-down, this study aims to provide deeper insight into how optimizing NMVOC emissions affects O 3 and to support the development of appropriate emission reduction policies.

Data assimilation system
The RAPAS system (Feng et al., 2023) couples a regional chemical transport model (CTM), used to simulate atmospheric composition, with ensemble square root filter (EnSRF) assimilation modules (Whitaker and Hamill, 2002), used to infer anthropogenic emissions by assimilating surface observations (Feng et al., 2020, 2022). The inversion follows a two-step procedure within each inversion window: the emissions are first inferred and then input into the Community Multiscale Air Quality Modeling System (CMAQ) to simulate the initial conditions for the next window. Meanwhile, the optimized emissions are carried over to the next window as prior emissions. This two-step strategy propagates errors between windows and optimizes emissions iteratively, and it has been shown to make the system robust in estimating emissions (Feng et al., 2023). In this study, we extended the framework to assimilate TROPOMI HCHO retrievals in order to optimize NMVOC emissions. Concise descriptions of the forecast model, data assimilation approach, and experimental settings follow.
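The two-step cycle can be illustrated with a deliberately tiny Python sketch. Everything in it is hypothetical: `toy_ctm` is a scalar linear stand-in for a CMAQ run, the observations are synthetic and noise-free, and a single scalar emission is estimated. Step 1 perturbs the prior, runs the ensemble, and updates the ensemble-mean emission with a Kalman gain estimated from the ensemble; step 2 reruns the model with the posterior emission to produce the next window's initial condition, and the posterior becomes the next prior.

```python
import numpy as np

rng = np.random.default_rng(0)
SENSITIVITY = 2.0  # hypothetical concentration change per unit emission
CARRYOVER = 0.3    # hypothetical fraction of concentration carried into the next window

def toy_ctm(initial_conc, emission):
    """Hypothetical linear stand-in for a one-window CTM run."""
    return CARRYOVER * initial_conc + SENSITIVITY * emission

def infer_emission(prior, rel_unc, init_conc, obs, obs_err_var, n_ens=50):
    """Step 1: ensemble-based Kalman update of the mean emission."""
    ens = prior + rel_unc * prior * rng.standard_normal(n_ens)  # perturbed priors
    sim = toy_ctm(init_conc, ens)                                # ensemble forecast
    gain = np.cov(ens, sim)[0, 1] / (sim.var(ddof=1) + obs_err_var)
    return ens.mean() + gain * (obs - sim.mean())

true_emission, prior = 5.0, 10.0   # the first guess is deliberately biased
true_conc, model_conc = 0.0, 0.0   # initial conditions of the first window
for window in range(10):
    true_conc = toy_ctm(true_conc, true_emission)        # synthetic "observation"
    posterior = infer_emission(prior, 0.4, model_conc, true_conc, obs_err_var=0.25)
    model_conc = toy_ctm(model_conc, posterior)           # step 2: next initial condition
    prior = posterior                                     # posterior -> next prior
    print(f"window {window}: posterior emission = {posterior:.3f}")
```

Because the initial condition of each window is itself produced with the posterior emission, a residual error left in one window shows up as an observation-model mismatch in the next one and is corrected there, which is the error-propagation property the paragraph describes.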

Atmospheric transport model
The Weather Research and Forecasting model (WRF v4.0; Skamarock and Klemp, 2008) and CMAQ (v5.0.2; Byun and Schere, 2006) were used to simulate meteorological conditions and atmospheric chemistry, respectively. WRF simulations were conducted at 27 km horizontal resolution on a 225 × 165 grid covering mainland China (Fig. 1). CMAQ was run over the same domain but with three grid cells removed from each side of the WRF domain. The vertical settings in WRF and CMAQ were the same as in Feng et al. (2020). To account for rapid urban expansion, we updated the urban and built-up land surface information using the 2022 MODIS Land Cover Type product (MCD12C1) Version 6.1. Chemical lateral boundary conditions for NO, NO 2 , HCHO, and O 3 were extracted from the output of a global CTM, the Whole Atmosphere Community Climate Model (WACCM), at 0.9° × 1.25° resolution and 6 h intervals (Marsh et al., 2013). Boundary conditions for the other NMVOCs were taken directly from background profiles. In the first data assimilation (DA) window, initial chemical conditions (excluding NMVOCs) were also derived from the WACCM outputs, whereas in subsequent windows they were obtained by forward simulation using the optimized emissions from the previous window. Table S1 in the Supplement lists the detailed physical and chemical configurations. To assess the impact of updated NMVOC emissions on O 3 production efficiency, we further separated the contributions of the primary chemical processes to O 3 levels using the CMAQ integrated reaction rate (IRR) analysis.
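The relation between the two grids is simple index arithmetic. The sketch below assumes, for illustration only, that 225 is the west-east and 165 the south-north dimension and that fields are stored as (south-north, west-east) arrays; removing three cells on each side then leaves a 219 × 159 CMAQ grid.

```python
import numpy as np

WRF_NX, WRF_NY = 225, 165  # west-east, south-north cells (orientation assumed)
TRIM = 3                   # cells removed on each side for the CMAQ domain

def trim_to_cmaq(field, trim=TRIM):
    """Drop `trim` boundary cells on every side of a 2D (ny, nx) WRF field."""
    return field[trim:-trim, trim:-trim]

wrf_field = np.zeros((WRF_NY, WRF_NX))
cmaq_field = trim_to_cmaq(wrf_field)
print(cmaq_field.shape)  # (159, 219)
```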

EnKF assimilation algorithm
The emissions are constrained using the ensemble square root filter (EnSRF) algorithm introduced by Whitaker and Hamill (2002). Because each ensemble member is propagated through the full CTM, this approach accounts for spatiotemporal variations in both transport and chemistry when relating emissions to concentrations. During the forecast step, the background ensemble is generated by perturbing the prior emissions. The perturbations are typically drawn from Gaussian distributions with a mean of zero and a standard deviation equal to the prior emission uncertainty in each grid cell. Ensemble CMAQ runs are then performed to propagate the background errors for each ensemble member of the state vector.
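A minimal sketch of the ensemble generation described above follows. It assumes the prior uncertainty is expressed as a fraction of the prior emission in each cell (40 % for NMVOCs and 25 % for NO x are the values adopted in this study); the prior field is hypothetical, and clipping negative draws to zero is an added assumption that the text does not state.

```python
import numpy as np

def perturb_emissions(prior, rel_uncertainty, n_ens=50, seed=None):
    """Draw an ensemble of emission fields around a gridded prior.

    Each cell gets zero-mean Gaussian noise whose standard deviation equals
    rel_uncertainty * prior in that cell.
    """
    rng = np.random.default_rng(seed)
    prior = np.asarray(prior, dtype=float)
    noise = rng.standard_normal((n_ens,) + prior.shape)
    ensemble = prior + noise * rel_uncertainty * prior
    # Assumption (not from the text): clip rare negative draws to zero.
    return np.clip(ensemble, 0.0, None)

prior_nmvoc = np.full((20, 30), 2.0)  # hypothetical uniform prior field
ensemble = perturb_emissions(prior_nmvoc, rel_uncertainty=0.40, seed=1)
print(ensemble.shape, round(float(ensemble.mean()), 3), round(float(ensemble.std()), 3))
```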
In the analysis step, the ensemble mean $\bar{X}^a$ of the analyzed state is regarded as the best estimate of emissions; it is obtained by updating the background ensemble mean $\bar{X}^b$ through the following equations:

$$\bar{X}^a = \bar{X}^b + K\left(y - H\bar{X}^b\right)$$

$$K = P^b H^T\left(H P^b H^T + R\right)^{-1}$$

where $y$ is the observational vector; $H$ is the observation operator mapping model space to observation space; $y - H\bar{X}^b$ quantifies the differences between observed and simulated concentrations; $P^b H^T$ describes how uncertainties in emissions relate to uncertainties in simulated concentrations; and the Kalman gain matrix $K$, which depends on the background error covariance $P^b$ and the observation error covariance $R$, determines the relative weight of observations and background in the analysis.

https://doi.org/10.5194/acp-24-7481-2024 Atmos. Chem. Phys., 24, 7481-7498, 2024
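The mean update above, together with the square-root update of the ensemble perturbations that distinguishes the EnSRF, can be sketched in Python following the serial formulation of Whitaker and Hamill (2002). Assumptions: the observation operator is a linear matrix `H` (in RAPAS it is a CMAQ run plus sampling), observation errors are uncorrelated (diagonal R, as in this study), and no localization is applied.

```python
import numpy as np

def ensrf_update(X, y, H, r):
    """Serial EnSRF analysis (after Whitaker and Hamill, 2002).

    X : (n_state, n_ens) background ensemble of state vectors (emissions)
    y : (n_obs,) observations
    H : (n_obs, n_state) linear observation operator (an assumption here)
    r : (n_obs,) observation error variances, i.e. the diagonal of R
    Returns the analysis ensemble; its mean is the best emission estimate.
    """
    X = np.array(X, dtype=float)  # work on a copy
    n_ens = X.shape[1]
    for i in range(len(y)):       # uncorrelated errors allow one obs at a time
        x_mean = X.mean(axis=1)
        Xp = X - x_mean[:, None]                 # state perturbations
        hx = H[i] @ X                            # simulated observation per member
        hx_mean = hx.mean()
        hxp = hx - hx_mean                       # observation-space perturbations
        hpht = hxp @ hxp / (n_ens - 1)           # H P^b H^T (scalar)
        pht = Xp @ hxp / (n_ens - 1)             # P^b H^T (vector)
        K = pht / (hpht + r[i])                  # Kalman gain
        x_mean = x_mean + K * (y[i] - hx_mean)   # mean update
        alpha = 1.0 / (1.0 + np.sqrt(r[i] / (hpht + r[i])))
        Xp = Xp - alpha * np.outer(K, hxp)       # reduced-gain perturbation update
        X = x_mean[:, None] + Xp
    return X

# One state variable, three members: background mean 10, variance 4.
Xb = np.array([[8.0, 10.0, 12.0]])
Xa = ensrf_update(Xb, y=np.array([14.0]), H=np.array([[1.0]]), r=np.array([4.0]))
print(Xa.mean(), Xa.var(ddof=1))
```

In this scalar example K = 0.5, so the analysis mean moves from 10 to 12, and the analysis variance is 2 (up to rounding), which equals (1 − K) times the background variance. Reproducing this theoretical value without perturbing the observations is the point of the reduced gain `alpha`.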
The state variables are NO x and NMVOC emissions. To reduce the degrees of freedom in the analysis and avoid the difficulty of estimating spatiotemporal variations in background errors for individual species, we optimize the lumped total NMVOC emissions. During the forecast step, the total NMVOC emissions are disaggregated into individual species using bottom-up statistical information. For a consistent comparison between simulations and observations, model-simulated NO 2 was sampled at the time and location of the surface NO 2 measurements, whereas model-simulated HCHO was horizontally sampled to match the TROPOMI HCHO VCD retrievals and then integrated vertically.
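The vertical integration step of the HCHO observation operator can be sketched for a single model column. Assumptions: layer-mean pressure, temperature, and geometric thickness are available, air is treated as an ideal gas, and no satellite averaging kernel is applied (the text does not mention one). The function and argument names are hypothetical.

```python
import numpy as np

K_BOLTZMANN = 1.380649e-23  # J K-1

def hcho_vcd(mixing_ratio, pressure_pa, temperature_k, thickness_m):
    """Vertically integrate an HCHO profile in one column to a VCD (molec. cm-2).

    mixing_ratio : HCHO volume mixing ratio per layer (mol/mol)
    pressure_pa, temperature_k, thickness_m : layer-mean pressure (Pa),
        temperature (K), and geometric thickness (m)
    """
    n_air = np.asarray(pressure_pa) / (K_BOLTZMANN * np.asarray(temperature_k))  # molec. m-3
    column_m2 = np.sum(np.asarray(mixing_ratio) * n_air * np.asarray(thickness_m))  # molec. m-2
    return column_m2 / 1.0e4  # m-2 -> cm-2

# 1 ppbv in a single 1000 m layer at surface conditions: about 2.5e15 molec. cm-2
print(f"{hcho_vcd([1e-9], [101325.0], [288.15], [1000.0]):.3e}")
```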
In this study, the DA window was set to 1 d, and daily TROPOMI HCHO columns were used as observational constraints in our inversion framework. The ensemble size was set to 50 to balance computational cost and inversion accuracy. To reduce the impact of spurious long-distance error correlations, the Gaspari-Cohn function (Gaspari and Cohn, 1999) was used for covariance localization, so that observations influence state variables only within a specified cutoff radius and do not degrade distant state variables. The optimal localization scale depends on factors such as the assimilation window, the dynamical system, and the lifetime of the chemical species. Given an average wind speed of 2.8 m s −1 (Table S2 in the Supplement) and a DA window of 1 d, the localization scales for NO 2 and HCHO, both highly reactive species with lifetimes of only a few hours, were set to 150 and 100 km, respectively.
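The Gaspari-Cohn weight is a compactly supported fifth-order piecewise rational function. In the sketch below, the "cutoff radius" is assumed to be the distance at which the weight reaches zero (twice the function's half-width c); whether RAPAS defines its 150 and 100 km scales this way is an assumption. In an EnKF, such weights are multiplied element-wise (a Schur product) with the ensemble estimate of P b H T before the gain is formed. For scale, 2.8 m s −1 sustained over 10 h corresponds to roughly 100 km of transport.

```python
import numpy as np

def gaspari_cohn(dist_km, cutoff_km):
    """Gaspari-Cohn (1999) localization weight; zero at and beyond cutoff_km.

    Assumption: cutoff_km = 2c, where c is the function's half-width.
    """
    c = cutoff_km / 2.0
    r = np.atleast_1d(np.abs(np.asarray(dist_km, dtype=float))) / c
    w = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    w[inner] = -0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3 - (5.0 / 3.0) * ri**2 + 1.0
    w[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3 + (5.0 / 3.0) * ro**2
                - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return w

distances = np.array([0.0, 25.0, 50.0, 75.0, 100.0, 150.0])
print(gaspari_cohn(distances, cutoff_km=100.0))  # HCHO scale
print(gaspari_cohn(distances, cutoff_km=150.0))  # NO2 scale
```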

Observation data and errors
Considering the availability of HCHO data, we used daily offline retrievals of tropospheric HCHO columns from the Sentinel-5P (S5P) L3 TROPOMI data obtained through Google Earth Engine (De Smedt et al., 2018). The S5P satellite follows a near-polar sun-synchronous orbit at an altitude of 824 km with a 17 d repeat cycle and crosses the Equator at 13:30 local solar time (LST) on the ascending node. The spatial resolution at nadir was refined to 3.5 km × 5.5 km on 6 August 2019. Following the recommendations in the S5P HCHO product user manual, we excluded pixels with a qa_value below 0.5 for the HCHO column number density and below 0.8 for the aerosol index (AER_AI). The remaining high-quality pixels, with minimal snow/ice or cloud interference, were averaged onto the 27 km model grid. Figure 1b shows the coverage and number of TROPOMI HCHO retrievals in August 2022 after processing. Although the filtered data are spatially nonuniform, most grid cells have observations on more than half of the days, particularly in southern China, where NMVOC emissions are higher. Validation against a global network of 25 ground-based Fourier transform infrared spectroscopy (FTIR) column measurements (Vigouroux et al., 2020) shows that TROPOMI overestimates HCHO columns by 25 % in clean regions (< 2.5 × 10 15 molec. cm −2 ) and underestimates them by 30 % in polluted regions (> 8 × 10 15 molec. cm −2 ). We therefore set the measurement error to 30 %. To evaluate the effect of retrieval errors on the emission estimates, we conducted a sensitivity experiment in which the HCHO columns were empirically bias-corrected according to the error characteristics described above (Fig. S1 in the Supplement). The posterior emissions increased by 12.8 % compared with those in the base experiment (EMDA), indicating that the existing retrieval errors in HCHO measurements likely influence the estimated NMVOC emissions. The representation error was neglected because the TROPOMI pixels are much finer than the 27 km model grid cells onto which they are averaged.
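The pixel screening and gridding steps can be sketched as follows. Assumptions: pixels arrive as flat arrays with hypothetical names (`qa_hcho`, `qa_aer`), the 0.8 threshold is applied to a QA value accompanying the aerosol index, and the target is a regular latitude-longitude grid rather than the projected 27 km model grid. The 30 % relative measurement error from the text is attached to each gridded column.

```python
import numpy as np

def grid_tropomi_hcho(lat, lon, hcho, qa_hcho, qa_aer, lat0, lon0, dlat, dlon,
                      ny, nx, qa_hcho_min=0.5, qa_aer_min=0.8, rel_err=0.30):
    """Screen pixels by QA and average them onto a regular (ny, nx) grid.

    Returns the gridded mean column (NaN where empty), the pixel count per
    cell, and a relative (30 %) observation error for each gridded column.
    """
    keep = (qa_hcho >= qa_hcho_min) & (qa_aer >= qa_aer_min)
    iy = np.floor((lat[keep] - lat0) / dlat).astype(int)
    ix = np.floor((lon[keep] - lon0) / dlon).astype(int)
    inside = (iy >= 0) & (iy < ny) & (ix >= 0) & (ix < nx)
    flat = iy[inside] * nx + ix[inside]             # flattened cell index
    total = np.bincount(flat, weights=hcho[keep][inside], minlength=ny * nx)
    count = np.bincount(flat, minlength=ny * nx)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = np.where(count > 0, total / count, np.nan).reshape(ny, nx)
    return mean, count.reshape(ny, nx), rel_err * mean

# Hypothetical pixels (columns in 1e15 molec. cm-2); the third fails the QA test.
lat = np.array([0.1, 0.2, 0.1, 1.5])
lon = np.array([0.1, 0.3, 0.2, 0.5])
hcho = np.array([2.0, 4.0, 100.0, 6.0])
qa_h = np.array([0.9, 0.9, 0.3, 0.9])
qa_a = np.ones(4)
mean, count, err = grid_tropomi_hcho(lat, lon, hcho, qa_h, qa_a, 0.0, 0.0, 1.0, 1.0, 2, 2)
print(mean)
```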
To account for the chemical feedback among VOCs-NO x -O 3 , we also simultaneously optimized NO x emissions by assimilating in situ NO 2 observations. The dense, high-precision monitoring network provides sufficient constraints for emission inversion (Fig. 1a). Hourly averaged surface NO 2 observations were obtained from the national air quality monitoring stations of the Ministry of Ecology and Environment of the People's Republic of China (https://air.cnemc.cn:18007/, last access: 5 May 2023). Where multiple stations fell within the same grid cell, one randomly chosen site was reserved for validation, and the remaining sites were averaged for assimilation to mitigate the impact of error correlation (Houtekamer and Zhang, 2016). In total, 1276 stations were used for assimilation, and an additional 425 independent stations were used for verification (Fig. 1a). The observation error covariance matrix R includes both measurement errors and representation errors. The measurement error is defined as $\varepsilon_0 = 1.0 + 0.005 \times \Pi_0$, where $\Pi_0$ is the observed NO 2 concentration. Following Elbern et al. (2007) and Feng et al. (2018), the representation error is defined as $\varepsilon_r = \gamma\,\varepsilon_0\sqrt{l/L}$, where γ is a tunable parameter (here, γ = 0.5), l is the grid spacing (27 km), and L is the radius of the observation's influence area (here, L = 0.5).
The total observation error (r) was defined as $r = \varepsilon_0^2 + \varepsilon_r^2$. The observation errors are assumed to be uncorrelated, so R is a diagonal matrix.
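The error model above translates directly into code. In this sketch, `grid_spacing` and `influence_radius` must be in the same unit; because the text gives L = 0.5 without a unit, no default is assumed, and the usage line sets the influence radius equal to the grid spacing purely as a hypothetical choice. The total error r is treated as a variance, consistent with its role on the diagonal of R.

```python
import numpy as np

def no2_obs_error_cov(obs, grid_spacing, influence_radius, gamma=0.5):
    """Diagonal observation error covariance for surface NO2 observations.

    eps0  = 1.0 + 0.005 * obs                                     measurement error
    eps_r = gamma * eps0 * sqrt(grid_spacing / influence_radius)  representation error
    r     = eps0**2 + eps_r**2                                    total error (as defined above)
    """
    obs = np.asarray(obs, dtype=float)
    eps0 = 1.0 + 0.005 * obs
    eps_r = gamma * eps0 * np.sqrt(grid_spacing / influence_radius)
    r = eps0**2 + eps_r**2
    return np.diag(r)  # uncorrelated errors -> diagonal R

# Hypothetical observations; influence_radius = grid_spacing is illustrative only.
R = no2_obs_error_cov([20.0, 100.0], grid_spacing=27.0, influence_radius=27.0)
print(np.diag(R))
```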

Prior emissions and uncertainties
The prior anthropogenic NO x and NMVOC emissions for China were obtained from the most recent Multi-resolution Emission Inventory for China for 2020 (MEIC, http://www.meicmodel.org/, last access: 8 May 2023; Zhang et al., 2009). For anthropogenic emissions outside China, we used the mosaic Asian anthropogenic emission inventory (MIX) for the base year 2010 (Li et al., 2017b). Daily emissions, obtained by arithmetically averaging the combined monthly emission inventory over the days of the month, were used as the first guess. Ship emissions were taken from the shipping emission inventory model (SEIM) for 2017, which is based on observed vessel automatic identification system data (Liu et al., 2017). Biomass burning emissions were taken from the Global Fire Emissions Database version 4.1 (GFED v4.1, https://www.globalfiredata.org/, last access: 8 May 2023; van der Werf et al., 2017; Mu et al., 2011). Biogenic NO x and NMVOC emissions were calculated using MEGAN (Guenther et al., 2012).
As mentioned above, the optimized emissions are carried over to the next DA window as prior emissions for iterative inversion. Biogenic emissions are disaggregated to hourly values using the daily varying temporal profiles from MEGAN before being used as model inputs. Day-to-day emission variations are expected to dominate the emission uncertainty. To compensate for model errors and avoid filter divergence, we consistently applied an uncertainty of 25 % to NO x emissions in each model grid cell in every DA window, as in Feng et al. (2020). NMVOC emissions typically have larger uncertainties than NO x emissions (Li et al., 2017b). Based on model evaluation, the uncertainty in NMVOC emissions was set to 40 % (Kaiser et al., 2018; Souri et al., 2020; Cao et al., 2018). In a sensitivity experiment with doubled prior uncertainty (80 %), posterior NMVOC emissions differed by only 0.2 % (Fig. S2 in the Supplement). Because the two-step inversion strategy corrects residual errors from the previous assimilation window in the current window, RAPAS depends relatively little on the prior uncertainty settings. Following Feng et al. (2023), this study also accounts for uncertainties in CO, SO 2 , primary PM 2.5 , and coarse PM 10 emissions to capture the chemical feedback between species.
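Splitting a daily emission into hourly model inputs is a normalization of a temporal profile. The sketch below does not reproduce MEGAN's daily varying profiles: the daytime-peaked profile is hypothetical (zero at night, as is typical for light-dependent isoprene), and the only property enforced is that the hourly values sum to the daily total.

```python
import numpy as np

def daily_to_hourly(daily_total, hourly_profile):
    """Split a daily emission total into 24 hourly values.

    hourly_profile: 24 non-negative weights; they are normalized so the
    hourly values sum exactly to the daily total.
    """
    w = np.asarray(hourly_profile, dtype=float)
    if w.shape != (24,) or np.any(w < 0) or w.sum() == 0:
        raise ValueError("need 24 non-negative weights with a positive sum")
    return daily_total * w / w.sum()

# Hypothetical daytime-peaked profile, zero at night.
hours = np.arange(24)
profile = np.clip(np.sin(np.pi * (hours - 6) / 12), 0.0, None)
hourly = daily_to_hourly(120.0, profile)
print(np.round(hourly, 2))
```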

Experimental design
During the summer of 2022, southern China experienced a severe heat wave. The combination of high temperatures and drought strongly affected vegetation growth and NMVOC emissions, thereby influencing O 3 production (Wang et al., 2023). We therefore focused on August 2022 as an ideal period for testing the capabilities of our DA system. Before the emission inversion, an improved initial field was generated for 00:00 UTC on 1 August 2022 through a 5 d simulation with 3D-Var data assimilation at 6 h intervals. Daily emissions were then continuously updated over the entire month of August (EMDA). We also designed a sensitivity experiment (EMS), in which NO x emissions were not optimized, to illustrate the importance of optimizing NO x emissions when quantifying VOC-O 3 chemistry. To validate the posterior NO x and NMVOC emissions from EMDA, we compared two parallel forward simulations, CEP and VEP, driven by the prior and posterior emissions, respectively, against NO 2 and HCHO measurements. To investigate the impact of optimizing NMVOC emissions on the secondary production and loss of surface O 3 , a forward simulation (CEP1) was conducted with the prior NMVOC emissions and the posterior NO x emissions. Another forward simulation (CEP2) used the posterior emissions from EMS to evaluate their performance. All experiments used identical meteorological fields and the same gas-phase and aerosol modules. Table 1 summarizes the emission inversion and validation experiments conducted in this study.

Inverted emissions
Figure 2 shows the spatial distribution of temporally averaged prior and posterior NMVOC emissions along with the differences in NMVOC emissions.Hotspots of prior NMVOC emissions were prevalent across much of central and southern China.However, posterior NMVOC emissions were predominantly concentrated in the NCP, Yangtze River Delta (YRD), PRD, and Sichuan Basin (SCB), areas characterized by high levels of anthropogenic activity.
High emissions were also located in parts of central and southern China, where the warm climate favors biogenic NMVOC emissions. Constraining the inversion with TROPOMI HCHO observations led to widespread decreases of approximately 60 %-70 % over these areas, indicating that the prior biogenic NMVOC emissions were substantially overestimated. In northwestern China, NMVOC emissions increased moderately. Potentially significant TROPOMI retrieval errors in polluted regions could have amplified the emission decreases (Text S2 in the Supplement). In addition, uncertainties in the MEGAN parameterization, particularly in the vegetation responses to temperature and drought stress, have significant implications for NMVOC emission estimates (Angot et al., 2020; Jiang et al., 2018). Zhang et al. (2021) highlighted that the temperature-dependent activity factor in MEGAN increases markedly with rising temperature. P. Wang et al. (2021) pointed out that the lack of a drought scheme is one of the causes of the overestimation of isoprene emissions in MEGAN. Opacka et al. (2022) optimized the empirical parameter in the MEGAN v2.1 soil moisture stress algorithm, which significantly reduced isoprene emissions and improved agreement between modeled and observed HCHO temporal variability in the central US. During the study period, China experienced a severe heat wave, which may have further limited MEGAN's ability to capture the impacts of high temperatures and drought on vegetation, resulting in a significant overestimation of NMVOC emissions (Wang et al., 2022). Overall, biogenic NMVOC emissions decreased by 53.7 %, more than the 43.4 % decrease in anthropogenic NMVOC emissions (Fig. S3 in the Supplement). The total decrease of 50.2 % in our inversion is comparable in magnitude to those reported for southern China (Bauwens et al., 2016; Zhou et al., 2023), the southeastern US (Kaiser et al., 2018), Africa (Marais et al., 2014), India (Chaliyakunnel et al., 2019), Amazonia (Bauwens et al., 2016), and parts of Europe (Curci et al., 2010) but opposite to the large-scale emission increase over China found by Cao et al. (2018). For NO x (Fig. S4 in the Supplement), nationwide total emissions decreased by 10.2 %, with the main reductions concentrated in the NCP and YRD, parts of central China, and most key urban areas.
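As a rough consistency check, the sector and total percentages above can be related through a weighted average. Assuming, as a simplification, that total NMVOC emissions are the sum of biogenic and anthropogenic emissions only (ignoring ship and biomass burning contributions), the 50.2 % total decrease implies that biogenic sources made up roughly two-thirds of the prior NMVOC emissions. This is an inference from the reported percentages, not a value reported in the text.

```python
def implied_biogenic_share(total_drop, bio_drop, anth_drop):
    """Solve total_drop = w * bio_drop + (1 - w) * anth_drop for the prior share w.

    Assumes total = biogenic + anthropogenic emissions (a simplification).
    """
    return (total_drop - anth_drop) / (bio_drop - anth_drop)

w = implied_biogenic_share(total_drop=50.2, bio_drop=53.7, anth_drop=43.4)
print(f"implied biogenic share of prior NMVOC emissions: {w:.2f}")  # 0.66
```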
Table 2 shows the changes in biogenic NMVOC emissions across different land cover types (Fig. S5 in the Supplement) after inversion. The largest reduction in biogenic emissions occurred in woody savannas, accounting for 26.9 % of the overall reduction, followed by savannas and croplands, accounting for 21.2 % and 17.2 %, respectively. Among all vegetation types, broadleaf evergreen forests, recognized as the primary source of isoprene emissions (H. Wang et al., 2021), showed the greatest uncertainty, with their NMVOC emissions reduced by 66.2 %. Standard emission rates in MEGAN are derived from leaf- or canopy-scale flux measurements and extrapolated globally to regions with similar land cover characteristics on the basis of very limited observations (Guenther et al., 1995). This approach introduces biases because emission rates vary widely among plant species.
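The percentages in this paragraph measure two different things: 26.9 %, 21.2 %, and 17.2 % are shares of the overall reduction, whereas 66.2 % is the relative reduction of one land cover type's own emissions. The sketch below computes both from per-type prior and posterior totals; the type names and numbers are hypothetical and are not taken from Table 2.

```python
def reduction_metrics(prior_by_type, posterior_by_type):
    """Per land cover type: (relative reduction %, share of total reduction %)."""
    cuts = {k: prior_by_type[k] - posterior_by_type[k] for k in prior_by_type}
    total_cut = sum(cuts.values())
    return {
        k: (100.0 * cuts[k] / prior_by_type[k], 100.0 * cuts[k] / total_cut)
        for k in prior_by_type
    }

# Hypothetical totals (arbitrary units), not the values in Table 2.
prior = {"type_A": 10.0, "type_B": 40.0}
posterior = {"type_A": 4.0, "type_B": 30.0}
metrics = reduction_metrics(prior, posterior)
for land_type, (relative, share) in metrics.items():
    print(f"{land_type}: relative reduction {relative:.1f} %, share {share:.1f} %")
```

Here type_A is cut proportionally far more (60 % versus 25 %) yet contributes less to the total reduction (37.5 % versus 62.5 %) because its prior emissions are smaller.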

Evaluations for posterior emissions
The NO x emissions were first evaluated indirectly by comparing the forward-simulated NO 2 concentrations with measurements. As shown in Fig. S6 in the Supplement, the CEP with prior emissions exhibited positive biases in eastern China and negative biases in western China. When posterior emissions were used in the VEP, simulation performance improved substantially: biases were limited to within ±3 µg m −3 , and correlation coefficients exceeded 0.7 across the entire region. Figure 3 presents the simulated HCHO VCDs using prior and posterior NMVOC emissions, along with their biases. Both experiments showed high VCDs over central and eastern China, especially in the YRD and SCB. However, the CEP substantially overestimated HCHO columns across most of mainland China, with the largest bias reaching 12 × 10 15 molec. cm −2 in central China. In contrast, after the inversion the VEP markedly improved both the magnitude and the spatial distribution of simulated HCHO columns relative to TROPOMI retrievals. More than 84 % of the area exhibited biases of less than 1 × 10 15 molec. cm −2 , with no significant spatial pattern. Overall, the biases in simulated HCHO VCDs decreased by 75.7 % after the inversion. These results demonstrate the effectiveness of our system in reducing uncertainty in both NO x and NMVOC emissions.

Implications for surface O 3
Figure 4 shows the spatial distribution of the mean bias (BIAS), root mean square error (RMSE), and correlation coefficient (CORR) of simulated O 3 concentrations in the CEP1 and VEP experiments relative to the assimilated observations. Outside northwestern China, CEP1 significantly overestimated O 3 throughout the domain, with a BIAS of 20.5 µg m −3 . In the VEP, modeled O 3 chemical production decreased, especially in southern China, where NMVOC emissions decreased significantly. Overall, the observation-constrained NMVOC emissions reduced the BIAS by 49.3 %, to 10.4 µg m −3 . The RMSE also improved noticeably owing to the assimilation of HCHO observations, decreasing from 30.9 to 23.3 µg m −3 . Despite the significant reduction in NMVOC emissions after inversion, notable overestimations persisted in northern provinces such as Liaoning, Hebei, Shanxi, and Shaanxi. This may be attributed to weak NMVOC constraints resulting from insufficient observations during the study period (Figs. 1b and 3d). The remaining discrepancies between simulations and observations may reflect intricate urban-rural sensitivity regimes and O 3 photochemistry that CMAQ may not fully represent, masking any potential improvement expected from the constrained emissions (see Sect. 4.4). The CORR was comparable between CEP1 and VEP, indicating that CMAQ captured the temporal variation in O 3 concentrations well in both experiments. The biases at the independent sites were similar to those at the assimilated sites (Fig. S7 in the Supplement).
At these independent sites, the BIAS and RMSE in the VEP decreased by 46.7 % and 23.4 %, respectively, relative to CEP1.
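The evaluation statistics used in this section can be written in a few lines. The sketch assumes the usual conventions (BIAS as the mean of simulation minus observation, Pearson correlation for CORR, and the reduction expressed relative to the CEP1 value); applied to the rounded values 20.5 and 10.4 µg m −3 , it reproduces the 49.3 % BIAS reduction quoted above.

```python
import numpy as np

def bias(sim, obs):
    """Mean bias, mean(sim - obs)."""
    return float(np.mean(np.asarray(sim, dtype=float) - np.asarray(obs, dtype=float)))

def rmse(sim, obs):
    """Root mean square error."""
    diff = np.asarray(sim, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))

def corr(sim, obs):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(sim, obs)[0, 1])

def pct_reduction(before, after):
    """Relative reduction (%) of a positive error metric."""
    return 100.0 * (before - after) / before

print(round(pct_reduction(20.5, 10.4), 1))  # BIAS, CEP1 -> VEP: 49.3
print(round(pct_reduction(30.9, 23.3), 1))  # RMSE, CEP1 -> VEP: 24.6
```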
Figure 5 shows the time series of simulated and observed hourly O 3 concentrations and their RMSEs at surface monitoring sites. The VEP reproduced diurnal O 3 variations better than CEP1, particularly the elevated O 3 concentrations at noon. Constraining the NMVOC emissions also reduced the RMSE throughout the entire study period. The time-averaged BIAS and RMSE decreased from 20.6 and 37.3 µg m −3 to 10.6 and 31.0 µg m −3 , respectively. We also evaluated the simulations for seven key cities (Beijing, Shanghai, Guangzhou, Wuhan, Chongqing, Yinchuan, and Changchun, representing North, East, South, Central, Southwest, Northwest, and Northeast China, respectively); the biases in the VEP with posterior emissions were significantly reduced in all of them (Fig. S8 in the Supplement). Overall, assimilating HCHO column observations effectively reduced NMVOC emission uncertainties and consequently improved simulations of HCHO and O 3 . These improvements provide a basis for further research into the implications of emission optimization for regional O 3 photochemistry.
As crucial O 3 precursors, NMVOCs play a significant role in modulating O 3 production. Here we used IRRs to elucidate how the constrained NO x and NMVOC emissions change O 3 production and loss at the surface. Figure 6 compares the simulated maximum daily 8 h average (MDA8) surface O 3 levels and net reaction rates before and after the inversion. CEP1 overestimated O 3 levels, with a BIAS of 22.6 % relative to the observed O 3 concentrations. This overestimation corresponded to the high net chemical O 3 rates in these areas (Fig. S9 in the Supplement). After inversion, net O 3 rates decreased in most regions, and the VEP results aligned closely with observations, with a BIAS of 9.2 %. As shown in Fig. 6e and f, the changes in O 3 production rates closely track the changes in NMVOC emissions (Fig. 2). Discrepancies in specific regions may be attributed to the complex nonlinear relationships between O 3 and its precursors, which depend on the prevailing chemical regime and regional transport. In addition, changes in O 3 production, rather than in O 3 loss, predominantly drive the overall decrease in O 3 concentrations.
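The MDA8 metric used in Fig. 6 is the largest of the running 8 h means within a day. The text does not state which convention was used for windows that cross midnight, so the sketch below makes a simplifying assumption: only the 17 windows lying fully inside the day's 24 hourly values are considered.

```python
import numpy as np

def mda8(hourly_o3):
    """Maximum daily 8 h average from 24 hourly values of one day.

    Simplification (assumption): only the 17 windows fully inside the day
    are used; air-quality standards differ on windows crossing midnight.
    """
    x = np.asarray(hourly_o3, dtype=float)
    if x.shape != (24,):
        raise ValueError("expected 24 hourly values")
    running_means = np.convolve(x, np.ones(8) / 8.0, mode="valid")  # 17 values
    return float(running_means.max())

day = np.full(24, 50.0)
day[10:18] = 100.0  # hypothetical 8 h afternoon plateau
print(mda8(day))    # 100.0
```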
https://doi.org/10.5194/acp-24-7481-2024 Atmos. Chem. Phys., 24, 7481-7498, 2024

Figure 7 shows the differences in the six principal pathways of O3 formation and loss between simulations with prior and posterior emissions. The HO2 + NO and RO2 + NO reactions are treated as O3 formation pathways, whereas O3 loss comprises the NO2 + OH, O3 + HO2, O3 + NMVOCs, and O1D + H2O reactions (Wang et al., 2019). Our analysis focuses on the period from 12:00 to 18:00 China standard time (CST), and the differences were computed by subtracting the simulation with posterior emissions from that with prior emissions. Once emitted, NMVOCs are rapidly oxidized by atmospheric hydroxyl (OH) radicals. The substantial decrease in NMVOC emissions therefore reduced the production of hydroperoxy (HO2) and organic peroxy (RO2) radicals (Fig. S10 in the Supplement). Because HO2 and RO2 react with NO to form NO2, whose photolysis produces O3, this reduction in HO2/RO2 levels diminished O3 production (Fig. 7a and b). Changes in O3 production via the RO2 + NO reaction were strongly correlated with changes in NMVOC emissions (Fig. 2), consistent with the findings of Souri et al. (2020). In NMVOC-rich environments, a decrease in NMVOC emissions typically raises OH concentrations because less OH is consumed by NMVOC oxidation. Accordingly, we found an enhanced NO2 + OH reaction rate in eastern and central China. In response to the higher HOx concentrations over these areas, O3 loss through the O3 + HOx pathway also increased. Furthermore, O3 loss through reactions with NMVOCs decreased substantially, especially in southern China, where isoprene emissions are abundant. This reduction was primarily attributable to the lower NMVOC and O3 levels.
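The pathway bookkeeping described above can be sketched as follows, assuming 24 hourly rates per pathway (in any consistent unit) indexed by CST hour, and treating the afternoon window as 12 ≤ h < 18 (whether 18:00 is included is an assumption). The pathway keys and function names are illustrative, not CMAQ-IRR output names.

```python
from statistics import mean
from typing import Dict, List

# Pathway labels follow the text; the string keys themselves are illustrative.
PRODUCTION = ("HO2+NO", "RO2+NO")
LOSS = ("NO2+OH", "O3+HO2", "O3+NMVOCs", "O1D+H2O")


def afternoon_mean(hourly: List[float], start: int = 12, end: int = 18) -> float:
    """Mean of 24 hourly rates over local hours start <= h < end (CST assumed)."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values indexed by local hour")
    return mean(hourly[start:end])


def pathway_differences(prior: Dict[str, List[float]],
                        posterior: Dict[str, List[float]]) -> Dict[str, float]:
    """Afternoon-mean rate per pathway, prior minus posterior (CEP1 - VEP)."""
    return {k: afternoon_mean(prior[k]) - afternoon_mean(posterior[k])
            for k in PRODUCTION + LOSS}


def net_rate(rates: Dict[str, float]) -> float:
    """Net chemical O3 rate: production pathways minus loss pathways."""
    return sum(rates[k] for k in PRODUCTION) - sum(rates[k] for k in LOSS)
```

Because the net rate is linear in the pathway rates, `net_rate(pathway_differences(prior, posterior))` equals the prior-minus-posterior change in the net O3 rate, so the per-pathway terms can be read directly as contributions to that change.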
Although the NMVOC + O3 reaction proceeds substantially more slowly than the NMVOC + OH reaction, this pathway remains important for oxidizing NMVOCs and forming HOx in forested areas (Paulson and Orlando, 1996).
The difference in O1D + H2O is primarily driven by the decrease in O3 photolysis. Although O3 loss decreases in some chemical pathways, the change in O3 production dominates the overall change in O3 concentration.

Li et al. (2019) identified a substantial overestimation of annual surface O3 in East Asia, ranging from 20 to 60 µg m−3. Notably, the NCP exhibited substantial overestimations, with most models overestimating O3 by 100 %-200 % in May-October. Despite our optimization of O3 precursor emissions, the posterior simulations still overestimate O3 to some degree (Fig. 4), suggesting the influence of systematic errors, for example in the meteorological fields, spatial resolution, model treatment of nonlinear photochemistry, and other physical processes. WRF generally reproduces the temporal variation and magnitude of meteorological conditions over China (Fig. S11 in the Supplement), with small biases of −0.5 °C, −5.3 %, 0.3 m s−1, and −42.4 m for 2 m temperature, 2 m relative humidity, 10 m wind speed, and planetary boundary layer height, respectively. However, at the relatively coarse spatial resolution, NO titration in urban areas may not be well represented, which can lead to an overestimation of O3 there. Additionally, model-inherent errors arising from model structure, parameterization, and simplified or missing chemical mechanisms inevitably affect the O3 simulations. For example, Li et al. (2018) reported that heterogeneous reactions of nitrogen compounds could weaken the atmospheric oxidation capacity and thus reduce surface O3 concentrations by 20-40 µg m−3 in polluted regions of China. These reactions have not been fully incorporated into the CMAQ chemical mechanisms, and reasonable and effective algorithms for addressing model errors through assimilation are still lacking (Houtekamer and Zhang, 2016).

O3 concentrations are positively correlated with NOx (VOC) emissions in NOx-limited (VOC-limited) regions and negatively correlated with them in VOC-limited (NOx-limited) regions (Tang et al., 2011). Therefore, uncertainty in NOx emissions can affect the model diagnosis of O3-NOx-VOC sensitivity and introduce substantial model errors in the HCHO yield from VOC oxidation. In the base inversion experiment (EMDA), we simultaneously assimilated NO2 and HCHO observations to optimize NOx and NMVOC emissions. To evaluate the impact of optimized NOx emissions on O3-VOC chemistry, the EMS experiment ignored the uncertainty in NOx emissions and optimized only NMVOC emissions. Compared with EMDA, EMS yielded correspondingly lower NMVOC emissions in areas where NOx is substantially overestimated (Fig. 8b). This may be because HCHO is produced promptly under high-NOx conditions, so the inversion compensates for the excess simulated HCHO by reducing NMVOC emissions (Chan Miller et al., 2017). Figure S12 in the Supplement compares the concentrations and RMSEs of simulations using posterior emissions from the EMS and EMDA experiments. CEP2 showed a larger RMSE than VEP, highlighting the need to optimize NOx emissions simultaneously when evaluating the impact of NMVOC emission optimization on O3. Additionally, CEP2, which uses prior NOx emissions, produced lower O3 levels over parts of the NCP and YRD and over some urban areas (Fig. 8c), but with larger biases and RMSEs (Fig. 8d). The reduction in NMVOC emissions contributed partly to this decrease in O3. More importantly, these areas are typically VOC-limited (Wang et al., 2019; W. Wang et al., 2021), so the overestimated NOx emissions (Fig. S4) excessively suppress O3 accumulation through titration, distorting the evaluation of NMVOC contributions to O3. Such a large disparity also seriously affects O3 source apportionment, the delineation of precursor-sensitive areas, and the formulation of emission reduction policies.

Summary and conclusions
In this study, we extended the RAPAS assimilation system with the EnKF algorithm to optimize NMVOC emissions using TROPOMI HCHO retrievals. Using MEIC 2020 anthropogenic emissions and MEGAN v2.1 biogenic emissions as the prior, we inferred NMVOC emissions over China in August 2022. Importantly, we implicitly accounted for the chemical feedback among VOCs, NOx, and O3 by simultaneously adjusting NOx emissions using nationwide in situ NO2 observations. Furthermore, we quantified the impact of the NMVOC emission inversion on surface O3 pollution using CMAQ-IRR.
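The EnKF analysis step at the core of such an inversion can be illustrated with a generic, minimal sketch. It assumes a stochastic (perturbed-observation) EnKF that assimilates uncorrelated observations one at a time, a state vector of emission scaling factors, and a linear operator `H` standing in for the CMAQ run that would map each member's emissions to simulated HCHO columns or NO2 concentrations. The function name `enkf_update` and this setup are hypothetical; RAPAS's actual implementation (for example localization, inflation, or its exact state definition) is not reproduced here.

```python
import random
from statistics import mean
from typing import List


def enkf_update(ensemble: List[List[float]], H: List[List[float]],
                y: List[float], r: List[float],
                rng: random.Random) -> List[List[float]]:
    """Serial stochastic EnKF analysis with perturbed observations (hypothetical sketch).

    ensemble: N members, each a state vector of n emission scaling factors.
    H: m x n linear observation operator, a stand-in for the CTM (assumption).
    y: m observations; r: their error variances (errors assumed uncorrelated).
    """
    members = [list(m) for m in ensemble]  # work on a copy
    n_mem, n_state = len(members), len(members[0])
    for h_row, y_j, r_j in zip(H, y, r):
        # Predicted observation for each member under the current state
        hx = [sum(h * x for h, x in zip(h_row, m)) for m in members]
        hx_mean = mean(hx)
        var_hx = sum((v - hx_mean) ** 2 for v in hx) / (n_mem - 1)
        x_mean = [mean(m[k] for m in members) for k in range(n_state)]
        # Sample covariance between each state element and the predicted observation
        cov = [sum((m[k] - x_mean[k]) * (v - hx_mean) for m, v in zip(members, hx))
               / (n_mem - 1) for k in range(n_state)]
        gain = [c / (var_hx + r_j) for c in cov]  # Kalman gain for this observation
        for m, v in zip(members, hx):
            # Perturb the observation with noise drawn from its error distribution
            innovation = y_j + rng.gauss(0.0, r_j ** 0.5) - v
            for k in range(n_state):
                m[k] += gain[k] * innovation
    return members


# Illustrative use with synthetic numbers (not data from this study)
rng = random.Random(42)
prior = [[rng.gauss(2.0, 0.5)] for _ in range(200)]  # one scaling factor, 200 members
posterior = enkf_update(prior, H=[[1.0]], y=[1.0], r=[0.01], rng=rng)
print(round(mean(m[0] for m in prior), 2), round(mean(m[0] for m in posterior), 2))
```

In this toy case the prior overestimates the observed quantity, so the analysis pulls the ensemble mean down toward the observation and shrinks the ensemble spread, which is the qualitative behaviour expected when an overestimated prior is confronted with accurate observations.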
Constraining the inversion with TROPOMI HCHO observations reduced NMVOC emissions in August 2022 by 50.2 % relative to the prior. Emissions decreased across the domain, most markedly over densely forested areas of central and southern China, especially broadleaf evergreen forests, implying a considerable overestimation of biogenic NMVOC emissions. The observation-constrained emissions significantly improved the simulations of surface NO2 and HCHO columns, reducing biases by 97.4 % and 75.7 %, respectively, which highlights the effectiveness of RAPAS in reducing the uncertainty in NOx and NMVOC emissions. After isolating the impact of NOx emission changes, the posterior NMVOC emissions significantly mitigated the overestimation in the prior O3 simulations, decreasing surface O3 biases by 49.3 %. This is mainly attributed to a substantial decrease in the RO2 + NO reaction rate (a major O3 production pathway) and an increase in the NO2 + OH reaction rate (a major O3 loss pathway) during the afternoon, resulting in a decrease

Data availability. The observations used for assimilation and the optimized emissions from this study can be accessed at https://doi.org/10.5281/zenodo.10079006 (Feng and Jiang, 2023).
Author contributions. SF and FJ conceived and designed the research. SF developed the data assimilation code, analyzed the data, and prepared the paper with contributions from all co-authors. FJ supervised and assisted in conceptualization and writing. TQ, NW, MJ, SZ, JC, FY, and WJ reviewed and commented on the paper.

Competing interests.
The contact author has declared that none of the authors has any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Regarding the maps used in this paper, please note that Figs. 1-4, 6-8, and the key figure contain disputed territories.

National Key R&D Program of China (grant no. 2022YFB3904801), the National Natural Science Foundation of China (grant nos. 42305116 and 42377102), the Natural Science Foundation of Jiangsu Province of China (grant no. BK20230801), and the Hangzhou Agricultural and Social Development Scientific Research Project (grant no. 202203B29). The authors also gratefully acknowledge the use of the High-Performance Computing Center (HPCC) blade cluster system of Nanjing University for the numerical calculations in this paper.

Review statement. This paper was edited by Zhibin Wang and reviewed by two anonymous referees.

Figure 1. Model domain and observation network (a) and number of TROPOMI HCHO retrievals in each grid cell during August 2022 (b). The dashed red frame delineates the CMAQ computational domain; black squares denote surface meteorological measurement sites; navy triangles indicate sounding sites (Text S1 in the Supplement); and red and blue dots represent air pollution measurement sites, where red dots are used for assimilation and blue dots for independent evaluation.

Figure 3. Simulated HCHO vertical column densities using prior (a) and posterior (b) NMVOC emissions, along with their biases (c and d) versus TROPOMI measurements.All model results were sampled at TROPOMI overpass time.

Figure 4. Spatial distribution of mean bias (BIAS; a and b), root mean square error (RMSE; c and d), and correlation coefficient (CORR; e and f) for simulated O3 using prior (a, c, and e, CEP1) and posterior (b, d, and f, VEP) emissions versus assimilated observations.

Figure 5. Time series comparison of hourly surface O3 concentrations (µg m−3) and RMSE (µg m−3) from the CEP1 and VEP experiments versus all observations at 1701 monitoring sites. The blue and red values on the graph represent the time-averaged statistics in the CEP1 and VEP experiments, respectively.
O3 concentrations over China tend to be overestimated in chemical transport modeling studies. For example, Li et al. (2019) intercompared 14 state-of-the-art CTMs with O3 observations within the framework of the Model Intercomparison Study for Asia (MICS-Asia) III.

Figure 6. Comparisons of (a, b) simulated maximum daily 8 h average (MDA8) O3 concentrations, (c, d) net reaction rates, and (e, f) differences in production and loss rates between the CEP1 and VEP experiments at the surface. Surface MDA8 O3 values (circles) from the national air quality monitoring stations are overlaid in (a) and (b).

Figure 7. Differences in six major pathways of O3 production and loss between the CEP1 and VEP experiments at the surface. Time period: August 2022, 12:00-18:00 CST. PO3 and LO3 represent the pathways of O3 formation and loss, respectively.

Figure 8. Spatial distribution of (a) posterior emissions in the EMS experiment, (b) differences in posterior emissions between EMS and EMDA, and differences in (c) simulated O3 concentrations and (d) RMSE between the CEP2 and VEP experiments. Unlike EMDA, EMS did not optimize NOx emissions.

Financial support. This research has been supported by the National Key Research and Development Program of China (grant no. 2022YFB3904801), the National Natural Science Foundation of China (grant nos. 42305116 and 42377102), the Natural Science Foundation of Jiangsu Province (grant no. BK20230801), and the Hangzhou Science and Technology Bureau (grant no. 202203B29).

Table 1. The assimilation, sensitivity, and validation experiments conducted in this study.

Table 2. Prior and posterior biogenic NMVOC emissions and their differences for different land cover types.