Ozone response modeling to NOx and VOC emissions: Examining machine learning models

Current machine learning (ML) applications in atmospheric science focus on forecasting and bias correction for numerical modeling estimations, but few studies examined the nonlinear response of their predictions to precursor emissions. This study uses ground-level maximum daily 8-hour ozone average (MDA8 O 3 ) as an example to examine O 3 responses to local anthropogenic NOx and VOC emissions in Taiwan by Response Surface Modeling (RSM). Three different datasets for RSM were examined, including the Community Multiscale Air Quality (CMAQ) model data, ML-measurement-model fusion (ML-MMF) data, and ML data, which respectively represent direct numerical model predictions, numerical predictions adjusted by observations and other auxiliary data, and ML predictions based on observations and other auxiliary data. The results show that both ML-MMF (r = 0.93 – 0.94) and ML predictions (r = 0.89 – 0.94) present significantly improved performance in the benchmark case compared with CMAQ predictions (r = 0.41 – 0.80). While ML-MMF isopleths exhibit O 3 nonlinearity close to actual responses due to their numerical base and observation-based correction, ML isopleths present biased predictions concerning their different controlled ranges of O 3 and distorted O 3 responses to NOx and VOC emission ratios compared with ML-MMF isopleths, which implies that using data without support from CMAQ modeling to predict the air quality could mislead the controlled targets and future trends. Meanwhile, the observation-corrected ML-MMF isopleths also emphasize the impact of transboundary pollution from mainland China on the regional O 3 sensitivity to local NOx and VOC emissions, which transboundary NOx would make all air quality regions in April more sensitive to local VOC emissions and limit the potential effort by reducing local emissions. Future ML applications in atmospheric science like forecasting or bias correction should provide interpretability and explainability, except for meeting statistical performance and providing variable importance. Assessment with interpretable physical and chemical mechanisms and constructing a statistically robust ML model should be equally important.


Introduction
Air pollution has gained great attention owing to its adverse effects on human health (Apte et al., 2018;Kuo et al., 2021), climate , agriculture (Tai and Martin, 2017), ecosystems (Zarnetske et al., 2021), and visibility (Huang et al., 2014). In regional air quality management, controlling local anthropogenic emissions is a common way to improve regional air quality. Predicting air quality under designed emission control scenarios by using chemical transport models (CTMs) like the Community Multiscale Air Quality (CMAQ) model has been much studied (Arnold and Dennis, 2006;Che et al., 2011).
Moreover, to meet the prompt and various needs of policymakers, response surface modeling (RSM) was developed to assess the improved or changed air quality based on designed emission control strategies without extra CTM simulations. That is, RSM can retrieve the nonlinear equation between ambient air pollutant concentrations (e.g. O 3 ) and multiple precursors emissions (e.g. NOx and VOCs) from multiple emission sources based on an ensemble of CTM simulations with changing O3 precursors emissions Xing et al., 2018;Zhu et al., 2015), and the user can apply the retrieved nonlinear equation to timely estimate the changed air pollutant concentrations based on input emission ratios and support emission control strategies. For example, RSM has been intensively applied to assess the NOx and VOC emission control strategies of O 3 pollution Xing et al., 2018;Zhao et al., 2015). RSM can identify ambient O 3 sensitivity to NOx and VOC emission by O 3 isopleth and further classify the isopleth regimes into two chemical regimes, NOx-limited and VOC-limited. In the NOx-limited regimes, O 3 increases with increased NOx emissions and exhibits limited response to VOC emissions, and vice versa. Classification of the O 3 formation regime in the isopleth can assist policymakers to determine whether NOx or VOC emissions should be controlled preferentially in emission control strategies (Gipson et al., 1980). However, although RSM was employed to improve regional air quality in several previous studies, these studies were still based on simulated results and neglected the bias between the modeled estimations and observations in the benchmark case (Kelly et al., 2021;Li et al., 2022;Xing et al., 2022), which could largely affect the nonlinearity between pollutants and precursor emission changes.
To forecast air quality and support air quality policies, machine learning (ML) or machine intelligence has been rapidly developed and intensively implemented in environmental science and air quality management (Kang et al., 2018;Zheng et al., 2021). ML in atmospheric science applications for predicting air quality is mostly driven by historical air quality data, real-time monitoring data, measurements, satellite images, land-use information, or emission activity-related indices. ML is relatively easy to execute and can provide more accurate predictions compared with CTMs, which still need complicated datapreparing processes and computationally-intensive time and resources, and have a larger modeling bias. However, ML is also debated and remains low persuasiveness due to its black-box modeling process and failure to provide interpretability and explainability concerning physical/chemical mechanisms (Fu et al., 2022;Zheng et al., 2021).
In previous applications, ML can serve as a bias corrector to adjust modeling results. Several measurement-model fusion (MMF) (Fu et al., 2022) techniques in post-analysis have been developed in recent years to adjust CTM results based on observations (Geng et al., 2020;Lu et al., 2020;Requia et al., 2020;Sayeed et al., 2021). Air pollutant estimations without correction by observations could underestimate or overestimate improved air quality and derived environmental benefits (Fu et al., 2022). In addition, ML can also forecast air quality based on historical observations and other auxiliary data (e.g. meteorological and land-use data) without involving CTM results and still have good performance (Ausati and Amanollahi, 2016;Song et al., 2015;Zhou et al., 2019). However, whether ML either serves as a bias corrector or a forecaster, few ML studies examined pollutants' sensitivity to their precursor emissions based on observation-corrected results.
In this study, we used the maximum daily 8-hour O 3 average (MDA8 O 3 ) as the target index and selected Taiwan as the study region due to its island geography, high-density air quality monitoring stations (n = 73), and three-year-updated emission inventory. The goal of this study is to (1) verify the capability of ML to correct CMAQ modeling results (denoted as ML-MMF data) and predict MDA O 3 without CMAQmodeled results (denoted as ML data) in the benchmark-case modeling performance and (2) examine O 3 nonlinear responses to all anthropogenic NOx and VOC emission ratios by O 3 isopleths by using CMAQ, ML-MMF, and ML modeling results.

Methodology
The technical work is illustrated in Fig. 1. Three types of data (CMAQ, ML-MMF, and ML) were prepared for RSM; CMAQ data were direct outputs from CMAQ-modeled results; ML-MMF data are the corrected estimates by observation and CMAQ inputs (emissions, boundary conditions, meteorology, and land-use), and the constructed model was employed to predict O 3 based on the changed CMAQ outputs, the other CMAQ inputs and changed emissions (kg/day) under different emission scenarios; ML outputs are predictions constructed by using O 3 observations and CMAQ inputs, and the constructed model was utilized to predict O 3 based on the changed emissions (kg/day) and the other CMAQ inputs under different emission scenarios. Second, RSM was executed for each dataset to predict O 3 under different NOx and VOC emission ratios in proportion to the baseline emissions (emission ratio = 1) in the benchmark case. O 3 isopleths were finally constructed for each dataset and were validated by out-of-samples and observations.

Data preparation
Air quality regions in this study were categorized into six regions: Northern (NT), Chu-Miao (CM), Central (CT), Yun-Chia-Nan (YCN), Kao-Ping (KP), and Eastern (ET). A total of 73 air quality monitoring stations with hourly O 3 measurements were included (Fig. 2). The modeling period included January, April, July, and October 2016 to represent different seasons. Meteorological fields were firstly simulated by WRF (version 3.8), and hourly O 3 concentrations were simulated by the CMAQ model (version 5.2) with the gas-phase chemistry module, Carbon Bond 6, (Sarwar et al., 2008) and aerosol module, AERO6 (Appel et al., 2013) mechanisms. The configurations of the simulation domain nested four layers from East Asia (81 km × 81 km) to Taiwan island (3 Fig. 1. Technical flowchart of this study. km × 3 km) which covers 90 (row) × 135 (column) horizontal grid cells (Lai and Lin, 2020). Daily emission data from Taiwan Emission Data System (TEDS) version 10.0 with 3 km × 3 km resolution developed by Taiwan EPA including industrial, mobile, fugitive, and biogenic sources were used. Before CMAQ modeling, emission distribution and speciation were processed by the USEPA SMOKE program (U.S. EPA, 2018), which can apply temporal and spatial allocation and chemical speciation for industrial, mobile, and fugitive sources from TEDS. The modeling performance assessment of meteorology factors and CAMQ-modeled O 3 are shown in Tables A1 and A2, respectively.
The details of the developed ML-MMF and ML framework were illustrated in Appendix I, and the major difference between ML-MMF and ML framework is predicting O 3 with and without CMAQ direct outputs for prediction, as shown in the following conceptual equations: where CMAQ is CAMQ output; Emis are emission data (kg/day); BC are boundary conditions; Met are meteorological variables; Land are landuse variables. A total of 4039 grid cells with local emission, meteorology, and land-use information was considered. Basically, Both ML-MMF and ML models employed five techniques including the k-nearest neighbors' regression (KNN) (Kramer, 2013), regression tree (RT) (Loh, 2008), random forest (RF) (Deng et al., 2001), gradient boosted tree models (GBM) (Natekin and Knoll, 2013), and convolutional neural network (CNN) (Albawi et al., 2017), which can construct the nonlinearity between input variables (emissions, boundary conditions, meteorology, and land-use data) and observed O 3 concentrations in the benchmark case. The best model with higher accuracy, reasonable spatial predictions, and no overfitting tendency was selected to predict O 3 concentrations (Appendix I); 60% and 40% of the data set were selected as the training dataset and the testing dataset, respectively; The 10-fold cross-validation was also conducted to optimize hyperparameters of the training dataset based on the uncertainty of modeling performance. The selected model for ML-MMF and ML would be further utilized to predict O 3 based on CMAQ outputs (not for the ML model), the changed emissions (kg/day), and the other fixed auxiliary data (boundary conditions, meteorology, and land-use data) under different emission scenarios.
The variables selected for ML-MMF and ML were related to emissions of precursor species including NOx and VOCs, boundary conditions that affect the background level of O 3 , NOx, and VOCs, meteorological factors involved with photochemical reactions and transport fluxes of air, and time-independent land-use geographical information (Table 1). Boundary conditions are the averaged values of all the grid cells around the modeling domain, and the temporal variations of meteorological and air pollution information during modeling period were considered. Meteorological factors on 850 hPa and 690 hPa represent the weather conditions of the mixing layer and lower troposphere layer (Lu et al., 2021). MDA8 O 3 would be the dependent variable because O 3 concentrations usually are higher during the daytime due to the existence of ultraviolet energy (Ware et al., 2016). For each month, the modeling performance was evaluated by correlation coefficient (r), mean absolute error (MAE), and root mean square error (RMSE) as shown in the following equation.

Response surface modeling (RSM)
RSM can retrieve non-linear O 3 responses to anthropogenic NOx and VOC emission ratios, which are changed ratios of emission compared with baseline emission in the benchmark case (emission ratio = 1). First, to generate the control matrix of anthropogenic NOx and VOC emission ratios for each air quality region, the Latin hypercube sampling (LHS) method was utilized to estimate enough sample size of the number of CMAQ simulations for RSM while providing enough statistical power and saving computing resources. The LHS design can provide flexibility, which selects a number of runs based on limited computing resources (USEPA, 2006), and also can capture the nonlinearity of ozone. The number of CMAQ runs for RSM was decided by the emission ratio range (0 %-200 %) and a number of control sectors such as power plants and mobile sources, the accuracy of O 3 response to NOx and VOC emissions, and the available computing resources. Typically, the number of CMAQ runs for RSM around 50 simulations is enough for constructing a statistically robust model . Extra simulations for individual air quality regions were also conducted due to higher O 3 concentrations in spring, fall, and winter. Emission ratios of NOx and VOC are designed from 0% to 200%, and the number of CMAQ runs for six air quality regions is shown in Table A3.
In each air quality region, RSM involves multiple precursor emissions (NOx and VOC) and anthropogenic emission sectors (industrial, mobile, and fugitive sources) from multiple cities and counties, as shown in Fig. A1 and Table A4. Multiple city/county-level NOx and VOC emission sectors (Table A4) were used for each air quality region. Self-adaptive RSM (SA-RSM) based on regression method and stepwise selection was employed to predict O 3 based on designed emission ratios of NOx and VOC, and the followed multidimensional kriging method was used to illustrate O 3 isopleths to show O 3 nonlinear response to NOx and VOC emission ratios USEPA, 2006;Xing et al., 2019Xing et al., , 2018. In each air quality region, the averages of grid cells with daily MDA8 O 3 higher than 60 ppb were used for RSM based on WHO and Taiwan EPA standards (World Health Organization, 2005). Also, higher O 3 concentrations were identified to be more sensitive to anthropogenic emissions of NOx and VOC . In each region, the following equation linking the O 3 concentration (ΔConc) in each grid to city/ county-level NOx or VOC emission ratios of the emission sectors and stepwise selection that can automatically select polynomial variables for all grid cells with 0.15 of the entering level and the leaving level were utilized : where ΔConc is the changed daily MDA8 O 3 concentration (response) from the baseline scenario (the benchmark case, emission ratio = 1) in the grid (m, n); ΔE i is the changed emission ratios from 1 of k city/ county-level NOx or VOC emission sectors; ΔE j is the other changed emission ratios from 1 of emission sector other than ΔE i ; a i , b i , and c j are the nonnegative integer powers of ΔE i and interaction terms of ΔE i and ΔE j . We set a i from 1 to 3 and b i and c j from 1 to 2, respectively. A total of 2782 CMAQ simulations were used for RSM. To assure the performance of RSM, a total of 107 CMAQ simulations were used for out-of-sample validation (Table A3). The out-of-samples were randomly selected based on the specified emission ratios (0 %-200 %) by the LHS method as well. The RSM performance was evaluated by correlation coefficient (r), Mean normalized error (Mean NE), and Maximum NE (Max NE) (Xing et al., 2018) as shown in the following equations.
where Finally, the response of O 3 concentrations to changes in anthropogenic NOx and VOC emission ratios from all sources and all cities and counties in each air quality region was illustrated by O 3 isopleths, which can be divided into NOx-limited (NOx-sensitive and VOC-rich) and VOClimited (VOC-sensitive and NOx-rich) regimes and used to help determine whether NOx or VOC emissions should be controlled more aggressively in strategies to alleviate ground-level O 3 concentrations (Gipson et al., 1980).

Benchmark case modeling performance
The performance of CMAQ, ML-MMF, and ML modeling compared with observations is shown in Table 2. Compared with ML-MMF and ML predictions, the CMAQ predictions have lower performance for all months (r = 0.41-0.80, RMSE = 13.45-21.19 ppb). After CMAQ data were adjusted by the auxiliary data (emission, meteorology, boundary condition, and land-use data) and corrected by observations, ML-MMF modeling presents significantly better performance for all months (r = 0.93-0.94, RMSE = 4.49-7.43 ppb). The improved performance highlights the benefits of adding auxiliary data in ML-MMF applications.
Even though excluding CMAQ output and purely using auxiliary data for ML modeling, the ML model still maintains comparable modeling performance (r = 0.89-0.94, RMSE = 4.62-8.94 ppb). Both ML-MMF and ML have no overfitting tendency considering their similar performance in training and testing data. Averages of observed and modeled MDA8 O 3 for CMAQ, ML-MMF, and ML in selected months are presented in Fig. 3. Fig. 3(a) shows obvious overestimations of CMAQ compared with observations, especially in the regions along with the west side of the mountains and the central basin. Fig. 3(b)(c) shows the ML-MMF and ML estimations are much close to observations and have lower modeling bias (RMSE and MAE) compared with CMAQ.
Monthly concentrations of CMAQ, MMF, and ML are presented in Fig. A2. Among all selected months, Fig. A2(a) shows CMAQ remain overestimated in all months and has spatial specificity in each month, in which CT, YCN, and KP region are overestimated in January, NT, CM, and CT regions are overestimated in April, and whole west regions overestimated in July and October. On the other hand, Fig. A2(b)(c) shows the ML-MMF model and ML model present similar spatial distributions for all months except for the ET region in July and October. The different estimations in the ET region are due to fewer air quality monitoring stations in the region, so the ML model needs to learn the data from the monitoring sites in the western regions.

Adjusted O 3 seasonal patterns
In Taiwan, frequent long-range transboundary pollution events from mainland China with higher air pollutant concentrations occur in spring, fall, and winter when northeast monsoon prevails Huang et al., 2019). Since there is no standard method to identify longrange transboundary pollution in Taiwan, our method to classify event and non-event days is illustrated in Appendix II. In the modeling period, there were 14, 13, and 3 event days in January, April, and October, respectively, and no event days occurred in July. Fig. 4 presents the monthly MDA8 O 3 average of event and non-event days based on ML-MMF estimations. In January, transboundary plumes mostly contained higher PM, NOx, and CO, so there is no significant O 3 concentration difference between the event and non-event days. In April, the MDA8 O 3 averages of 13 event days show a significant impact on the whole island. Compared with event days, O 3 exceedances on non-event days mostly occur in central mountainous areas, which may result from higher biogenic VOC emissions in spring (Chang et al., 2005). In July, higher O 3 level in northern Taiwan is due to the prevalence of southwest and south monsoon, and NOx, VOC, and formatted O 3 transport from southern regions accumulated in northern regions. In October, both event days and non-event days show similar distribution, but event days present higher O 3 concentration in western areas, especially in the NT region.

RSM performance
Because higher O 3 concentrations are more sensitive to anthropogenic NOx and VOC emissions , we performed RSM for O 3 exceedance days (MDA8 O 3 > 60 ppb) for April, July, and October and excluded January due to no O 3 exceedance days. RSM performances by using CMAQ, ML-MMF, and ML data for six air quality regions (NT, CM, CT, YCN, KP, and ET) of the event and non-event days in April, July, and October are illustrated in Table A5. Most of the RSM results meet the statistical requirement of mean NE < 3 % and max NE < 10 % considering r (0.855-1.000), mean NE (0.02 %-2.62 %), and max NE (0.08 %-9.88 %), except for the CM region's October (max NE = 11.48 %) and the YCN region's April (max NE = 11.38 %) and October (max NE = 12.04 %) of CMAQ data and KP's April (r = 0.759) of ML-MMF data. Good RSM performance represents that RSM can reproduce CMAQ, ML-MMF, or ML outputs by only using county/city-level NOx and VOC emission ratios without extra CMAQ simulations or ML-MMF/ML modeling. The higher max NE may be due to unstable RSM performance when NOx and VOC emission ratios are very low (0 %) or very  (Luo et al., 2021). Although the limited performance could be improved by adding more simulations under these extreme emission ratios, these scenarios are still impractical and hardly verified by observations. For example, it is almost impossible to measure pollutant concentrations without any anthropogenic emissions in urban areas. Furthermore, the validation of the baseline emission ratio (emission ratio = 1, Fig. A3) compared with observations shows a satisfactory performance (Mean NE = 2-6 %). Second, the higher bias may imply the high impact of non-local emissions, which means air pollutants or their precursors from upwind countries or other air quality regions dominate downwind local air quality, so local emissions only have a limited impact on local air quality and hardly explain the temporal variation of observed concentrations. As previously mentioned, in April and October, long-range transboundary air pollution transported from mainland China and plumes from upwind regions frequently accumulate and deteriorate air quality in southern air quality regions.

O 3 sensitivity to local NOx and VOC emissions
Monthly combined O 3 isopleths of all air quality regions in April, July, and October for the event and non-event days by using CMAQ, ML-MMF, and ML data are presented in Fig. 5, and the O 3 isopleths for the individual zone are shown in Figs. A4-A8. All isopleths for each air quality region show the averages of grid cells having MDA8 O 3 concentrations higher than 60 ppb in the region, and the x-axis and y-axis represent the NOx and VOCs emission ratios for all sources in all cities and counties in the region.
First, ML-MMF isopleth is considered as the better tool to explain O 3 nonlinear responses to anthropogenic NOx and VOC emissions due to its observation-based adjustment and CTM basis, and ML-MMF isopleths show that the improved effort by reducing local emissions should be less than what CMAQ model simulated. For example, the CMAQ isopleth in July ranges from 72 ppb to 90 ppb concerning different emission ratios, but the ML-MMF isopleth presents a narrower range, which is from 62  ppb to 68 ppb. The reduced range of isopleths emphasizes the importance of adjustment by observations, and CMAQ-based RSM results may bias the improved air quality by emission control strategies.
Second, the ML-MMF O 3 isopleths show changed O 3 sensitivity after fusing with observations. For example, CMAQ isopleths in April with a combined regime of NOx-limited and VOC-limited trends (Fig. 5(a)) are corrected to VOC-limited in ML-MMF isopleths (Fig. 5(b)) for the event and non-event days. If further comparing CMAQ modeled NOx and VOCs with observations (Fig. A9), CMAQ modeled estimations in the benchmark case were found to overestimate NOx in the NT region and underestimate VOCs compared with observations for all regions, which means the real O 3 -NOx-VOC system should be more VOC-limited and NOx-rich.
Third, ML O 3 isopleths present diverse O 3 ranges and distorted NOx and VOC regimes compared with ML-MMF O 3 isopleths, which implies ML without the support from CMAQ output could provide biased air quality predictions concerning its different O 3 response to emissions, although ML model performances well in the benchmark case. For example, for the non-event days of the NT region in April (Fig. A4), the ML isopleth identifies an obvious VOC-limited trend, but the ML-MMF isopleth shows a combined NOx-limited and VOC-limited trend. For the other months, the ML O 3 isopleths also present a lower O 3 concentration level and narrower range, even though the trend may be similar to the ML-MMF O 3 isopleths.
The disparity between ML-MMF and ML O 3 isopleths implies that previous air quality forecasting studies (Ausati and Amanollahi, 2016;Song et al., 2015;Zhou et al., 2019) without involving CTM results and only using historical observations or other auxiliary data to predict future air quality may be potentially biased, even though the ML models met statistical requirements. Because the ML model could deviate the pollutant's responses to its precursor emissions from the real responses of air pollutant concentrations due to its lack of physical and chemical basis. One major explanation is the lower variable importance priority of emission variables in the ML model. The variable importance priority of emission variables is relatively lower than other variables like meteorological variables (Fig. A11), so the model missed the link between O 3 responses with NOx and VOC emissions. The lower variable importance priority is because most emission inventories are constructed by topdown methodology with routine monthly and weekly variation, the regular temporal variation of emission data has limited capability to explain dramatically varied pollutants. On the contrary, CMAQ-modeled O 3 has a higher importance priority in the ML-MMF model (Fig. A11), so its nonlinear information to NOx and VOC emissions still remains in ML-MMF outputs. Another potential reason is that the ML model only learns grid-level information without considering the air pollutants transported and reacted between grids, so the ML predictions are unable to reflect the accumulated pollutants from the upwind grids or air quality regions, even adding zonal emissions as independent variables cannot improve the performance. Moreover, the limited sample size from observations could be not large enough for the ML model, because O 3 exceedance (>60 ppb) does not always occur around the monitoring stations within limited exceedance days (Fig. A10). Therefore, the ML model under certain seasons (e.g. non-event days in April) has limited data to construct a robust model. In summary, we suggest not using ML O 3 isopleths for emission control strategies.
Fourth, transboundary pollution from upwind may change O 3 sensitivity to local NOx and VOC emissions. In the NT region, which is the first region suffering from transboundary pollution from mainland China, the ML-MMF isopleths (Fig. A4) of non-event days in April remain a combined regime of NOx-limited and VOC-limited, but for event days in April, the isopleth changes to serious VOC-limited. Because the northeast monsoon carries high NOx concentrations from the upwind, the NT region turns to a NOx-rich atmosphere and local O 3 response becomes more sensitive to VOC emissions. Even though there is no local NOx emission (emission ratio = 0), transboundary NOx still can support local O 3 formation. Moreover, if taking out boundary conditions and local meteorology during the fusion process (Fig. A12), boundary conditions and local meteorology play a significant role that changing O 3 sensitivity, which also proves the significant impact of transboundary pollution. The changed O 3 sensitivity affected by outside pollution from the upwind was also reported in California, where wildfires in the late summer would emit large amounts of VOCs that can be transported to urban areas and significantly change the urban O 3 sensitivity (Wu et al., 2022).

Fig. 6.
Monthly O 3 isopleths for event and non-event days in selected months based on ML-MMF data.

Suggested emission control preference
Monthly O 3 isopleths for the event and non-event days based on ML-MMF data are shown in Fig. 6. The O 3 isopleths in April show VOClimited trends for both the event and non-event days for all western regions, but the range reveals a limited improved effort by reducing VOC emission, which is around 0.25-4.5 ppb. Transboundary NOx could also accumulate for days and affect O 3 sensitivity to local emissions on nonevent days. In July, controlling NOx emissions is a more effective way to lower O 3 concentrations in the CM, CT, YCN, and ET regions, and the NT and KP regions present combined NOx-limited and VOC-limited regimes. The more VOC-limited trend in the NT and KP regions may be due to more population and vehicle emissions in these regions.
In October, O 3 isopleths are similar between the event and non-event days, because transboundary pollution in October mostly carries PM and SO 2 , which are not related to the formation of O 3 . NT, CT, and YCN regions have a combined NOx-limited and VOC-limited trend, and the potential improvement can be significant (58-76 ppb) in the NT region. The more VOC-limited trend in the CM and KP regions could be due to the impact of local geography. Owing to the many plateaus and hills in the CM and KP regions, the wind would be slowed down by higher terrain roughness, causing the accumulation of pollutants like NOx from the upwind regions. In the ET region, the O 3 isopleths suggest an obvious NOx-limited trend for all months. In summary, most of the regions are VOC-limited in April and October but NOx-limited in July. The suggestion of controlling VOC emissions in fall and winter is also coherent with the other study in Taiwan .

Limitations and future works
This study still has some limitations. First, the ML-MMF model needs to rely on enough monitoring stations (sample size) and good-quality of emission inventory to build a robust model. Multiple monitoring stations can explain the variance of space-related variables like emissions, meteorology, and land uses between sites and provide enough statistical power, and the emission inventory should reflect the various emissions around the monitoring stations. In our case, the employed TEDS emission inventory is updated every three years with finer to 1 km spatial resolution, thus it can reflect emissions around stations within space and time. For those regions or countries without enough monitoring stations, using remote sensing data from satellites may be an alternative way to obtain observations, but satellite data could be still biased by cloud and vertical column densities (Wu et al., 2022). For regions or countries without proper emission inventory, using a global emission database like ECLIPSE (Stohl et al., 2015) could be a potential solution to provide emission data around stations, but the spatial distribution method from national emissions needs careful assessment.
Second, the ML-MMF RSM results or O 3 isopleths are hardly validated by observations. Even though the ML-MMF O 3 isopleths with baseline emission ratio (emission ratio = 1) were validated by observations and have a satisfactory performance (Fig. A3, Mean NE = 2-6 %), O 3 responses under extreme emission scenarios are hardly validated by observations. Because there are no observations to validate the estimations under extreme emission scenarios like zero (emission ratio = 0) or double (emission ratio = 2) anthropogenic emissions. Thus, the margins of O 3 isopleths still remained potential uncertainty. Further study to assess the uncertainty of O 3 isopleths should be investigated in the future.

Conclusion
ML models become mainstream in atmospheric science because of more available real-time monitoring data and measurements. But its black-box modeling process, failure to involve physical and chemical mechanisms, and weak cooperation with the existing numerical models lower its stringency and persuasion. Except for developing more sophisticated model designs to increase predicting accuracy, the interpretability and explainability of predictions should be evaluated as well. In this study, the capability of the ML model to serve as a bias corrector (ML-MMF output) based on CMAQ modeled results and a forecaster (ML output) was examined and applied to predict O 3 nonlinear responses to anthropogenic NOx and VOC emissions. Three types of data were examined for constructing O 3 isopleths by RSM: CMAQ data, ML-MMF data (a bias corrector), and ML data (a forecaster).
Compared with CMAQ predictions (r = 0.41-0.80), both ML-MMF (r = 0.93-0.94) and ML predictions (r = 0.89-0.94) showed significantly improved performance in the benchmark case. While ML-MMF isopleths exhibit O 3 nonlinearity close to actual responses due to their numerical base and observation-based correction, ML isopleths present different O 3 ranges and distorted NOx and VOC-limited regimes compared with ML-MMF O 3 isopleths even though the ML model meets the statistical requirement in the benchmark case. Without involving CMAQ results, the ML model presents biased predictions concerning its different O 3 responses to NOx and VOC emission ratios. It also implies that only using historical observations or other auxiliary data without support from CMAQ or other CTMs to forecast the air quality could mislead the future trend. Meanwhile, after being corrected by observations, ML-MMF data present changed O 3 sensitivity compared with CMAQ data. The corrected O 3 isopleths emphasize the impact of transboundary pollution from mainland China on the local O 3 sensitivity, which transboundary NOx in April would make all air quality regions in Taiwan more sensitive to local VOC emissions and limit the potential effort by reducing local VOC emissions.
It is advisable for future ML applications in atmospheric science like forecasting or bias correction to provide interpretability and explainability while requiring modeling performance. Failing to interpret the interaction between predicted air quality, emissions, and environmental factors may mislead controlled targets and air quality policies. Assessment with interpretable physical and chemical mechanisms and constructing a statistically robust ML model should be equally important.