Typhoon-associated air quality over the Guangdong–Hong Kong–Macao Greater Bay Area, China: machine-learning-based prediction and assessment

The summertime air pollution events endangering public health in the Guangdong–Hong Kong–Macao Greater Bay Area are connected with typhoons. The wind of the typhoon periphery results in poor diffusion conditions and favorable conditions for transboundary air pollution. Random forest models are established to predict typhoon-associated air quality in the area. The correlation coefficients (root mean square errors) for the air quality index (AQI) and the PM2.5, PM10, SO2, NO2 and O3 concentrations are 0.84 (14.88), 0.86 (10.31 µg m⁻³), 0.84 (17.03 µg m⁻³), 0.51 (8.13 µg m⁻³), 0.80 (13.64 µg m⁻³) and 0.89 (22.43 µg m⁻³), respectively. Additionally, prediction models for non-typhoon days are established. The feature importance output of the models reveals the differences in the meteorological drivers between typhoon days and non-typhoon days. On typhoon days, the air quality is dominated by local source emission and accumulation as the sink of pollutants reduces significantly under stagnant weather, while it is


Introduction
The rapid and continuous economic and industrial development of China in recent decades has resulted in a mounting air pollution problem in the country. Major atmospheric pollutants, such as particulate matter (PM), O3, SO2 and NO2, not only have important impacts on ecosystems, traffic safety and weather/climate but also seriously exacerbate human health issues and increase morbidity and mortality from cardiovascular and respiratory diseases (Che et al., 2005, 2014; Lolli et al., 2018; Lolli, 2021; Zheng et al., 2020; Zhu et al., 2021). The Guangdong-Hong Kong-Macao Greater Bay Area (GBA), located in southern China, comprises nine municipalities of Guangdong Province, including Guangzhou and Shenzhen, and two Special Administrative Regions, Hong Kong and Macao. With a high population density of over 1100 people per square kilometer, the GBA is one of China's most heavily populated and urbanized areas. As a result, the area sees a high intensity of air pollutant emissions and frequent air pollution events (Deng et al., 2008, 2011; Hou et al., 2019). Besides the intense emission of pollutants, the other main factor influencing the air quality is the weather circulation pattern (Yang et al., 2018, 2020a; Zong et al., 2021). For instance, when light breezes and a temperature inversion layer occur in the surface layer of the GBA, the air quality deteriorates (Tong et al., 2018a; Ding et al., 2004; Huang et al., 2005; Yang et al., 2012). In contrast, the air quality is good when the wind speed in the area is high, for example under the strong southerly winds in summer and the northerly winds that cross the northern mountains in winter (Chen et al., 2016; Tong et al., 2018a, b).
The GBA is continually affected by typhoons in summer (Ying et al., 2014; Lu et al., 2021), and as they make landfall, the air quality and synoptic situation in the region change significantly (Ding et al., 2004; Huang et al., 2005; Lam et al., 2005; Feng et al., 2007; Wei et al., 2007, 2016; Yang et al., 2012; Wang et al., 2022). The causes of typhoon-associated air pollution can be summarized as follows. On the one hand, the downdraft of the typhoon periphery leads to a large-scale temperature inversion layer, so that light winds and adverse pollutant diffusion conditions prevail in the area (Feng et al., 2007; Yang et al., 2012; Deng et al., 2019). Additionally, pollutants in the upper level are transported down to the lower atmosphere, where they accumulate under the impact of the downdraft. Consequently, the accumulation of local source emissions is aggravated, degrading the air quality, sometimes severely (Huang et al., 2005; Wei et al., 2016). On the other hand, the various wind patterns of the typhoon periphery (mostly northerlies during pollution events) provide favorable conditions for transboundary air pollution from both outside the GBA and cities inside the GBA (Chow et al., 2018; Lam et al., 2018; Luo et al., 2018; Deng et al., 2019; Yim et al., 2019; Yang et al., 2019). However, two issues with respect to typhoon-associated air quality in the GBA have yet to be fully understood. (1) Which local meteorological factors play the dominant role in the change in different atmospheric pollutants during typhoon processes? (2) What are the differences in the dominant local meteorological factors between typhoon and non-typhoon processes? These two issues are of great significance to the forecasting of air quality and to air pollution adaptation in the GBA.
Quantitative analysis and the prediction of pollutant concentrations have become a focus in this field of study. Existing methods include numerical forecasting, statistical forecasting and machine learning. In terms of numerical forecasting, several models have been developed, such as CMAQ (Community Multiscale Air Quality modeling system; developed by the US EPA) and NAQPMS (a nested air quality prediction modeling system developed by the Institute of Atmospheric Physics, Chinese Academy of Sciences) (Arnold et al., 2003; Li et al., 2011). These models have been applied by some researchers to study typhoon-associated air quality, and the results have revealed the impacts of meteorological conditions on the transportation and diffusion of air pollutants, for example the downdraft, northerly winds and high near-surface air temperatures that boost the photochemical reaction that generates O3 (Wei et al., 2016). Numerical experiments also led to the discovery that the contribution of cross-regional transportation varies with the wind field, and these studies reflect one of the advantages of the numerical modeling method, which is that it can analyze the formation mechanism of a specific pollution event (Huang et al., 2005; Lam et al., 2005). However, this approach also has its drawbacks, such as computational complexity and high data requirements. As for statistical methods, examples include clustering and multiple regression methods based on meteorological factors and weather types (Su et al., 2009; Singh et al., 2012). Although the calculations involved in these statistical methods are simple, their predicted results exhibit uncertainties with large errors and local dependence (Ross et al., 2007; Singh et al., 2012). In contrast, machine learning methods perform very well in terms of accuracy and are already leveraged in many fields such as meteorology and the environmental sciences (Li et al., 2021; Zheng et al., 2021; Bochenek and Ustrnul, 2022; Chen et al., 2022). The forecasting of air quality is no exception. The most widely used algorithms include random forest (RF), support vector machines, extreme gradient boosting (XGBoost) and neural networks. The input variables include meteorological data and traffic flow data. Among the machine learning models, RF is an ensemble machine learning algorithm based on decision trees, which has certain advantages in capturing the nonlinear relationship between variables. Attempts made to employ RF in predicting air quality have produced promising results (Kamińska, 2018; Bai et al., 2019; Hu et al., 2021; Ding et al., 2022; Liu et al., 2022).
It is clear from the literature, as reviewed above, that there is a definite link between typhoons and air quality in the GBA. Nevertheless, the meteorological determinants of different kinds of pollutants during a typhoon event are still unclear. There is also little research on applying machine learning to predicting air quality with typhoon location and intensity data, and the accuracy of such predictions remains unknown. Therefore, in order to improve the accuracy of air quality prediction for the GBA during typhoon processes, the present research establishes an RF prediction model of typhoon-associated air quality in the GBA with air quality data (air quality index (AQI), PM2.5, PM10, SO2, NO2, O3) from 39 air quality monitoring stations in 10 cities in the region, the China Meteorological Administration (CMA) tropical cyclone best-track dataset for 2014-2020, and meteorological data from the fifth major global reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ERA5). Also, for the non-typhoon days (NTY days) in the typhoon season (June-September), RF prediction models based on meteorological elements are established to analyze the changes in local meteorological determinants. The aim of this study is to improve the prediction and assessment of typhoon-associated air quality in the GBA, which not only is important from a scientific viewpoint but also has considerable practical application value for tackling the socioeconomic effects of typhoons and associated air quality.
Data and methods

Data
The present study takes 39 air quality monitoring stations in 10 cities in the GBA (Guangzhou, Shenzhen, Zhuhai, Foshan, Zhaoqing, Jiangmen, Huizhou, Dongguan, Zhongshan, Hong Kong) as research objects. Three of these stations, from Guangzhou, Shenzhen and Hong Kong, are used for independent testing. [...] Using the model constructed with the data above, the future air quality under the effect of a typhoon can be predicted. To be specific, the forecasted air quality can be acquired by replacing the ERA5 reanalysis meteorological data with the ECMWF's forecast field and introducing the predicted typhoon location and intensity (for example, from the CMA). The distribution of all typhoon samples and air quality monitoring stations is shown in Fig. 1. The data preprocessing procedure is given in Sect. S1 in the Supplement.

RF model
The RF algorithm, first proposed by Breiman (2001), is a kind of ensemble machine learning algorithm. The process for establishing the model is as follows. Random samples are drawn with replacement from the training set, and a large number of decision trees are trained on them. For each tree, the error at each node is calculated, and the node is split using the minimum error as the criterion until the designated maximum tree depth is reached. The average of the outputs of all trees is taken as the model output.
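The procedure described above (bootstrap sampling, tree training and averaging) can be sketched as follows. This is a minimal illustration on synthetic data and not the configuration used in the present study; the function names `fit_bagged_trees` and `predict_bagged_trees`, the number of trees and the maximum depth are all assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_bagged_trees(X, y, n_trees=50, max_depth=6, seed=0):
    """Train n_trees regression trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw n_samples row indices with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        # Each split minimises the squared error (the default criterion);
        # growth stops at the designated maximum depth.
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_bagged_trees(trees, X):
    """Average the predictions of all trees to obtain the ensemble output."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)


# Synthetic data, for illustration only.
rng = np.random.default_rng(42)
X_demo = rng.uniform(-1.0, 1.0, size=(500, 4))
y_demo = np.sin(3.0 * X_demo[:, 0]) + 0.5 * X_demo[:, 1] ** 2

trees = fit_bagged_trees(X_demo, y_demo)
y_hat = predict_bagged_trees(trees, X_demo)
```

Breiman's algorithm can additionally restrict each split to a random subset of the features; this option is omitted here so that the sketch follows the description in the text.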
One of the strengths of the RF model is that it can calculate the importance of features based on impurity, which means that it can calculate a feature's importance by the degree of error reduction brought about by it. The higher the importance value is, the more influential the feature will be. Because of these advantages, RF models have been applied to analyze causal relationships between variables and provide a powerful tool for determining the dominant factors among variables (Wang et al., 2019; Yang et al., 2020b; Zeng et al., 2020; Li et al., 2021; Venter et al., 2021; Chen et al., 2022).
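As an illustration of impurity-based feature importance, the following sketch fits scikit-learn's `RandomForestRegressor` to synthetic data and ranks the inputs by its `feature_importances_` attribute. The feature names are borrowed from the paper for readability only; the data, the coefficients and the hyperparameters are assumptions of the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 1000
# Hypothetical predictors; the values are random numbers, not observations.
feature_names = ["d2m", "PBLH", "U10m", "V10m"]
X = rng.normal(size=(n, len(feature_names)))
# Synthetic target: by construction it depends strongly on the first column
# and weakly on the second; the last two columns are pure noise.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# feature_importances_ is the mean impurity decrease, normalised to sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
ranking = sorted(importances, key=importances.get, reverse=True)
for name in ranking:
    print(f"{name}: {importances[name]:.3f}")
```

Impurity-based importance measures how much each feature contributes to the model's error reduction; permutation importance is a commonly used alternative when features differ strongly in scale or cardinality.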
Figure 2 presents the technology roadmap for establishing the RF model, which is described as follows.
Step 1: data acquisition and matching. This paper uses the scikit-learn package in Python to construct the RF forecast model with the typhoon location and intensity data (on typhoon days, TY days), the location of monitoring stations and meteorological data as input variables, and the AQI and concentrations of PM2.5, PM10, SO2, NO2 and O3 as the predicted variables.
Step 2: RF model establishment and cross-validation. First, the dataset (data from 36 stations) is randomly divided into a 70 % training set and a 30 % testing set. The hyperparameter tuning and model training process is conducted on the training set. Hyperparameter tuning refers to determining the best hyperparameters, i.e., the parameters that must be set manually in advance. The testing set is used for evaluating the RF model's ability to predict unseen data. To avoid the bias caused by the splitting of the training and testing sets, 10-fold cross-validation (CV) is adopted in the training set; that is, the training set is divided into 10 parts, 9 of which are used as the training set of the tuning process, and the performance is then tested on the remaining part, called the validation set. This ensures that the optimal parameters found for the model are not affected by data partitioning. The hyperparameters adjusted in the present study are described in Sect. S2. Afterward, the training set and the testing set are applied to the optimal model, and the feature importance of the model output is analyzed to obtain the dominant meteorological factors of each model.
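A minimal sketch of Step 2 with scikit-learn is given below, assuming synthetic data and a small, hypothetical search grid (the hyperparameters actually tuned are those described in Sect. S2). It uses `train_test_split` for the 70 %/30 % split and `GridSearchCV` with `cv=10` for the 10-fold cross-validation inside the training set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.2, size=400)

# 70 % training set, 30 % testing set, split at random.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Hypothetical search grid, chosen only to keep the example fast.
param_grid = {"max_depth": [4, 8], "min_samples_leaf": [1, 5]}

search = GridSearchCV(
    RandomForestRegressor(n_estimators=30, random_state=1),
    param_grid,
    cv=10,  # 10-fold CV: 9 parts for fitting, 1 part for validation
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

# best_estimator_ is refitted on the whole training set with the best grid point.
best_model = search.best_estimator_
test_r = np.corrcoef(y_test, best_model.predict(X_test))[0, 1]
print(search.best_params_, round(test_r, 3))
```

The testing set is touched only once, after tuning, so that the reported skill is not influenced by the hyperparameter search.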
Step 3: model prediction and verification. Once the optimal model is established, the training set, testing set and data from testing stations in Guangzhou, Shenzhen and Hong Kong are applied to the model separately, and a series of model evaluation metrics are calculated, including the mean absolute error (MAE), root mean square error (RMSE), bias, correlation coefficient between the observed and predicted value (R), standard deviation of the observation (SD_O), standard deviation of the prediction (SD_P), and index of agreement (IA). The definitions of these metrics are given in Table 1. Among these indicators, the smaller the bias, MAE and RMSE are, the better the model performs; the closer R and IA are to 1, the more ideal the result is; and the closer SD_O and SD_P are, the better the model is overall. If RMSE is lower than SD_O, the IA is high, and SD_O is close to SD_P, then the prediction is satisfactory (Lu et al., 1997).
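The evaluation metrics of Step 3 can be computed as in the sketch below. The formulas are the commonly used definitions and are an assumption here: in particular, IA is taken to be Willmott's index of agreement, the standard deviations are population values, and bias is the mean of prediction minus observation. The function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(pred, obs):
    """Return bias, MAE, RMSE, R, SD_O, SD_P and IA for paired sequences."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    diff = [p - o for p, o in zip(pred, obs)]  # prediction minus observation

    bias = sum(diff) / n
    mae = sum(abs(d) for d in diff) / n
    rmse = math.sqrt(sum(d * d for d in diff) / n)

    # Population standard deviations (divide by n).
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)

    # Pearson correlation coefficient between prediction and observation.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs)) / n
    r = cov / (sd_o * sd_p)

    # Willmott's index of agreement (assumed form of IA).
    denom = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
                for p, o in zip(pred, obs))
    ia = 1.0 - sum(d * d for d in diff) / denom

    return {"bias": bias, "MAE": mae, "RMSE": rmse, "R": r,
            "SD_O": sd_o, "SD_P": sd_p, "IA": ia}


# Made-up observed and predicted values, for illustration only.
obs = [40.0, 55.0, 70.0, 90.0, 60.0]
pred = [42.0, 50.0, 75.0, 85.0, 63.0]
metrics = evaluate(pred, obs)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these definitions, the criterion of Lu et al. (1997) quoted above amounts to checking `metrics["RMSE"] < metrics["SD_O"]`, a high `metrics["IA"]`, and similar `SD_O` and `SD_P`.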

TY-associated model
The RF model is applied to the AQI and five pollutants to establish six distinct RF models (the hyperparameters of the six models can be seen in Table 2; 70 % of the samples from 36 stations are used as the training set and 30 % as the testing set).

https://doi.org/10.5194/amt-16-1279-2023 Atmos. Meas. Tech., 16, 1279-1294, 2023

Table 1. The definition of evaluation metrics of the model. Notation: p_i is the predicted value, O_i is the observed value, N is the sample size, Ō is the mean of the observed values, p̄ is the mean of the predicted values, and φ_i is the difference between the predicted and observed values.
The training and testing results for the AQI, PM2.5 and PM10 are shown in Fig. 3a, b, d, e, g and h. The R values between the observed and predicted value of the training set (testing set) are 0.986 (0.843), 0.986 (0.859) and 0.983 (0.837), respectively; the RMSEs are 5.43 (14.88), 3.88 µg m⁻³ (10.31 µg m⁻³) and 6.33 µg m⁻³ (17.03 µg m⁻³); and the biases are 0.10 (−0.07), 0.06 µg m⁻³ (0.20 µg m⁻³) [...] some extent. When there is rainfall related to a typhoon in the GBA, wet deposition will reduce the concentration of PM. The reason why total precipitation is less important than d2m could be that the latter always has a value and is more variable. The similarity between the AQI, PM2.5 and PM10 results reveals that the major pollutant in the ambient air of the GBA during typhoon events is PM, since the AQI value is the highest of the individual air quality indices. The other category is the PBLH-driven type, which includes SO2, NO2 and O3. The major meteorological influence in this case during typhoon events is the PBLH. Nevertheless, the situation for SO2 is unlike that of the other two. The most important variables affecting the SO2 concentration after the PBLH are U10m and V10m. Indeed, this is the highest U10m and V10m importance among the six models, indicating that the SO2 in the GBA may mainly derive from transboundary transportation. The variable importance for NO2 and O3 exhibits very similar characteristics because they are both pollutants closely related to photochemical reactions. Under certain conditions, the free radical reaction of NO2 can generate O3 (Lam et al., 2005, 2018; Zhang et al., 2013; Deng et al., 2019). It is also worth noting for these two pollutants that the PBLH, which has the highest rank of importance among all variables, is more than twice as important as the second-highest variable, and this is distinct from the other four models.
Additionally, this paper uses three testing stations in Guangzhou, Shenzhen and Hong Kong, which are excluded from the training and testing sets mentioned above, to further investigate the generalization ability of the model. The results for the AQI, PM2.5 and PM10 on TY days are shown in Fig. 4a-c. The R (RMSE) values for AQI, PM2.5 and PM10 are 0.868 (11.70), 0.900 (7.16 µg m⁻³) and 0.841 (13.45 µg m⁻³), respectively. As for SO2, NO2 and O3, Fig. S4a-c shows that the R (RMSE) values are 0.496 (5.38 µg m⁻³), 0.538 (27.94 µg m⁻³) and 0.878 (22.45 µg m⁻³), respectively. These results are not significantly different from the results for the previous 36 stations, indicating that models trained with data from some stations predict equally well at new locations; that is, the RF model successfully captures the correlation between the typhoon's location and the monitoring stations' location. Though the input stations changed, the model still produces accurate predictions based on the relative position of the station and the typhoon.
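The station hold-out test described above can be sketched as follows, assuming a synthetic table with one row per (station, time) pair. The station coordinates, the typhoon longitude series, the target formula and the choice of held-out station indices are all hypothetical; the point of the example is only that whole stations, rather than random rows, are withheld from training.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n_stations, n_times = 12, 80

# Hypothetical station coordinates and a shared typhoon-longitude time series.
st_lat = rng.uniform(22.0, 24.0, size=n_stations)
st_lon = rng.uniform(112.0, 115.0, size=n_stations)
t_lon = rng.uniform(110.0, 130.0, size=n_times)

# One row per (station, time) pair: station lat, station lon, typhoon lon.
station_id = np.repeat(np.arange(n_stations), n_times)
X = np.column_stack([
    st_lat[station_id],
    st_lon[station_id],
    np.tile(t_lon, n_stations),
])
# Synthetic target that depends on the typhoon position relative to the station.
y = 40.0 + 1.5 * (X[:, 2] - X[:, 1]) + rng.normal(scale=1.0, size=len(X))

# Withhold every row of three stations; train on the remaining nine.
held_out = np.isin(station_id, [0, 1, 2])
model = RandomForestRegressor(n_estimators=100, random_state=2)
model.fit(X[~held_out], y[~held_out])

# Skill at locations never seen during training.
r_new = np.corrcoef(y[held_out], model.predict(X[held_out]))[0, 1]
print(round(r_new, 3))
```

Splitting by station rather than by row avoids the situation in which neighbouring time steps of the same station appear in both the training and the evaluation data.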
Overall, the model has outstanding predictive ability for the AQI and five air pollutants and makes correct predictions for the new stations that are unseen in the training stage. The present study also highlights that the typhoon location variables of Tlat and Tlon are more important than the typhoon intensity variables of Tpres and Tws, showing that the major driving factor in modifying the synoptic situation in the GBA and thereby changing the AQI value is typhoon location. The role of typhoon intensity requires further study. The dominant meteorological drivers of typhoon-associated air quality are also revealed by the RF model: for the AQI and concentrations of PM2.5 and PM10 it is d2m, while for SO2, NO2 and O3 it is the PBLH.

NTY-associated model
We then use the meteorological data, station location and air quality data of the NTY days during the typhoon season (June-September) to build RF models (the hyperparameters of the six models can be seen in Table 2). Similarly, the model prediction accuracy and output feature importance are calculated and compared with the results of TY days. The training and testing results for the AQI, PM2.5 and PM10 are shown in Fig. 5a, b, d, e, g and h. The R values between the observed and predicted value of the training set (testing set) are 0.979 (0.745), 0.978 (0.744) and 0.978 (0.708), respectively; the RMSEs are 5.52 (15.11), 3.60 µg m⁻³ (9.68 µg m⁻³) and 5.65 µg m⁻³ (15.45 µg m⁻³); and the biases are 0.19 (0.57), 0.12 µg m⁻³ (0.27 µg m⁻³) and 0.18 µg m⁻³ (0.48 µg m⁻³). Compared with the prediction results of the TY days, the prediction accuracy is significantly reduced, and the R values all fall below 0.8. Figures S5-S7 show that the R (RMSE) values of the testing set for SO2, NO2 and O3 are 0.452 (7.00 µg m⁻³), 0.744 (11.63 µg m⁻³) and 0.867 (24.18 µg m⁻³), respectively. The prediction accuracy of the model is significantly poorer compared with the model of TY days, and it can be seen that the maximum pollutant concentration on NTY days is significantly larger than that on TY days, indicating that the period of air quality deterioration in the GBA coincides with the period of typhoon activity.
The feature importance of model predictions on NTY days is significantly different from that on TY days. For AQI and PM2.5, the meteorological driver is the meridional wind (V10m), while for PM10 it is the latitude of the monitoring station (lat). Considering that the southern part of the GBA is close to the sea, and the farther north one goes, the farther it is from the sea, V10m can represent the strength of the sea breeze and land breeze, and lat can be seen as the distance from the sea. By contrast, their meteorological determinants are all d2m on TY days, and this change indicates that the typhoon deters the pollutants from being blown away and replaced by clean air from the ocean, which is the major sink of pollutants on NTY days. Therefore, haze occurs. As for the pollutants classified as the PBLH-driven type (SO2, NO2 and O3), their meteorological drivers on NTY days are V10m, St and PBLH, respectively.
Consistent with the TY-associated model, three testing stations from Guangzhou, Shenzhen and Hong Kong are introduced into the NTY-associated model. The results for the AQI, PM2.5 and PM10 on NTY days are shown in Fig. 4d-f. The R (RMSE) values for AQI, PM2.5 and PM10 are 0.835 (11.65), 0.825 (7.24 µg m⁻³) and 0.740 (12.76 µg m⁻³), respectively. As for SO2, NO2 and O3, Fig. S4d-f [...]
In general, the prediction results indicate that the RF model can accurately and effectively capture the mechanism of the impact of typhoons on air quality. Additionally, differences in meteorological determinants between TY and NTY days also have important implications for air quality in the GBA: for PM, the prevailing sea breeze is the major scavenging mechanism on NTY days and is deterred by the various wind patterns of the typhoon periphery on TY days, while for SO2, NO2 and O3, on TY days, their concentrations are strongly affected by the PBLH, and the effects of local emission and accumulation are more dramatic than transboundary air pollution, causing pollution events. In contrast, on NTY days, transboundary air pollution is more obvious than the local pollutant emissions. These findings shed new light on the control of regional air pollution in the GBA; that is, different strategies should be adopted on TY and NTY days. On NTY days, countermeasures should focus more on source emission control and make full use of the diffusion and cleaning effect of the sea breeze to reduce air pollution. Coordinated emission reduction in the region should be strengthened to reduce the concentration of pollutants in the entire region at the same time. On TY days, more focus should be on increasing the sink of pollutants (which is decreased due to the static and stable weather of the typhoon periphery). Countermeasures should be taken to increase the sedimentation and decomposition of pollutants in the area, such as more road watering.

Model-predicted correlation between air quality and typhoon center location
To further investigate the RF model's ability to capture the correlation between typhoon location and air quality in the GBA, each position within the research area (at a spatial interval of 0.5°) is input into the RF model as the position of the typhoon to predict the AQI and concentrations of PM2.5, PM10, SO2, NO2 and O3 (the typhoon intensity and meteorological variable values are the averages of all typhoons within the specified spatial interval). Figure 6 shows the average of the predictions across all stations. All six models predict a low level of air pollution in the GBA when the typhoon is located in the sea area southwest of the GBA, close to Hainan Island. This is because of the relatively clean southerly winds from the sea brought by the cyclonic circulation, significant wind speed and precipitation when typhoons are located here. All these meteorological conditions are highly favorable for the deposition and removal of pollutants, and the result is consistent with the findings of previous studies (Yang et al., 2012; Chow et al., 2018; Luo et al., 2018; Yang et al., 2019). By contrast, the air quality in the GBA deteriorates when a typhoon is located over the waters from the Philippines to the island of Taiwan and in the most northerly area over the waters near 30° N. The maximum average concentration of PM10 exceeds 80 µg m⁻³. It is worth noting that the spatial distribution characteristics of the AQI, PM2.5 and PM10 are very similar because the primary pollutant in the GBA during typhoon weather is likely to be PM, as mentioned above. The distribution of typhoons during SO2 pollution weather is mainly over the sea area to the east of the island of Taiwan (16-27° N), with the maximum SO2 concentration predicted by the model reaching 20 µg m⁻³. However, the prediction results for NO2 and O3 are scattered, which may be because their associated photochemical reactions are greatly affected by solar radiation, so the concentrations of these two pollutants possess diurnal variation, which will cause uncertainty in the predictions of the RF model. Nevertheless, the model still accurately captures the overall spatial distribution characteristics; when a typhoon is located over the waters on the southwest side of the GBA, near Hainan Island, the pollutant concentrations are low, but when a typhoon is over the waters near the island of Taiwan (117-125° E), they are high.
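The grid experiment described above can be sketched as follows: a fitted model is queried with a typhoon center placed at every node of a 0.5° grid, and the predictions are reshaped into a map. The training table, the domain bounds and the single weather feature are hypothetical, and, as a simplification, the non-location input is fixed to its sample mean rather than to the per-cell typhoon averages used in the present study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 2000
# Hypothetical training table: typhoon centre (Tlat, Tlon) plus one weather feature.
t_lat = rng.uniform(10.0, 30.0, size=n)
t_lon = rng.uniform(105.0, 130.0, size=n)
pblh = rng.uniform(200.0, 1500.0, size=n)
# Synthetic AQI that rises when the centre lies further east (illustration only).
aqi = 40.0 + 1.2 * (t_lon - 105.0) - 0.01 * pblh + rng.normal(scale=3.0, size=n)

model = RandomForestRegressor(n_estimators=100, random_state=3)
model.fit(np.column_stack([t_lat, t_lon, pblh]), aqi)

# Candidate typhoon centres every 0.5 degrees over the (assumed) domain.
lat_grid = np.arange(10.0, 30.0 + 0.5, 0.5)
lon_grid = np.arange(105.0, 130.0 + 0.5, 0.5)
lon2d, lat2d = np.meshgrid(lon_grid, lat_grid)

# Simplification: the weather feature is held at its overall sample mean.
X_grid = np.column_stack([
    lat2d.ravel(),
    lon2d.ravel(),
    np.full(lat2d.size, pblh.mean()),
])
# Predicted AQI as a function of typhoon centre, shaped (n_lat, n_lon).
aqi_map = model.predict(X_grid).reshape(lat2d.shape)
print(aqi_map.shape)
```

A map such as `aqi_map` can then be contoured to show which typhoon positions the model associates with high or low pollution.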

Case verification
This paper takes Typhoon Danas (2019) as an independent case to analyze the model's ability to predict typhoon-associated air quality over the GBA. For better evaluation of the RF model, Typhoon Danas's data have been removed from the dataset in the training and testing steps. The active time of Danas was 14-21 July 2019, with a minimum central pressure of 980 hPa. It did not make landfall in China, and its path traveled northwards along the eastern coast of the island of Taiwan. During this typhoon event, a significant pollution episode occurred in the GBA (Fig. 7). The synoptic chart shows that northerly winds from inland prevailed in the GBA during the event (17-19 July), which caused pollutants to be transported from inland to the GBA. Meanwhile, the GBA was under high pressure, which was also unfavorable for the diffusion of pollutants (Fig. S8). Figure 7 presents the observed and predicted AQI value and concentrations of PM2.5, PM10, SO2, NO2 and O3. As Fig. 7a depicts, the track of Danas was L-shaped, which coincides quite well with the typhoon locations that cause pollution as predicted by the RF model. Around 16 July, the typhoon turned north over the sea near the Philippines and then moved along 123° E longitude, gradually increasing in intensity. The observed data also show a pollution event in the GBA during this period. First, we examine the spatial distribution of the AQI (Fig. 8a-b). The AQI of the GBA is higher in the northern area than in the southern area during the pollution event. This may be because the southern part is closer to the sea and is affected by a stronger sea breeze, and the RF model successfully predicts this distribution with high accuracy. The distributions of PM2.5 (Fig. 8c-d) and PM10 (Fig. 8e-f) are similar, but the model slightly overestimates their concentrations. The spatial distributions of the SO2, NO2 and O3 (Fig. 8g-l) concentrations are relatively scattered, and, except for the underestimated concentration of SO2, the predicted results are quite accurate.
Regarding the numerical accuracy of the prediction, Table 3 lists the model evaluation metrics calculated from the GBA-averaged model output. In terms of MAE and RMSE, the largest values are for the predicted O3, which are 15.047 and 18.319 µg m⁻³, respectively. Meanwhile, the smallest MAE (RMSE) is found for PM2.5 (SO2), which is 4.117 (4.876) µg m⁻³. The R values between the observations and predictions of the AQI and five pollutants all exceed 0.7, with those of the AQI and O3 even exceeding 0.85. The bias values of the predicted AQI and five pollutants are all less than 0, indicating that the RF model tends to underestimate in this case. The RMSEs of the results for the AQI, PM2.5, PM10, NO2 and O3 are lower than the SD_O values, and the SD_O and SD_P values of all the pollutants are quite close. Furthermore, the IA is high. Among all the models, the IA of the AQI, PM2.5 and O3 exceeds 0.9, indicating that these three air quality parameters perform the best in this case. The evaluation metrics of the results in 10 cities are listed in Tables S1-S10 in the Supplement, revealing that 39 (66 %) of all air quality parameter predictions in these cities have an RMSE less than the SD_O, and 31 (53 %) have an IA exceeding 0.8. Generally, the best-performing pollutants are PM2.5 and O3, as judged by the metrics, while the performance with respect to SO2 needs improvement. The MAE and RMSE values obtained by city are both larger than the values obtained by the average over the entire GBA because the averaging process eliminates some random errors.
In summary, the evaluation metric results are extremely encouraging and indicate a satisfactory prediction by the RF model of the Danas-associated air quality in the GBA. Moreover, the RF model obtains temporal information from the diurnal variation in the input features such as typhoon intensity to accurately predict the diurnal fluctuations in NO2 and O3, which reflects the model's ability to capture the nonlinear relationship and its potential for tackling complex prediction problems.

Conclusions and discussions
Typhoons are highly active weather systems in summer that have substantial effects on the synoptic situation in the entire southern part of China, including the Guangdong-Hong Kong-Macao Greater Bay Area. In addition to causing violent winds, rainfall and storm surges in the area close to their location, typhoons also affect the background circulation situation in areas more distant from their immediate vicinity. For instance, the typhoon periphery downdraft brings about light winds, stagnant weather, high temperatures and a low planetary boundary layer, and consequently it has a detrimental impact on the generation, transportation and diffusion of air pollutants, causing hazy weather. The Guangdong-Hong Kong-Macao Greater Bay Area, located at the southernmost tip of the Chinese mainland, is often affected by typhoons. Therefore, air pollution events associated with typhoons in the GBA are prevalent in summer. The present study employs the RF model to predict the typhoon-associated air quality quantitatively. The R (RMSE) values of the testing set for the AQI, PM2.5, PM10, SO2, NO2 and O3 are 0.843 (14.88), 0.859 (10.31 µg m⁻³), 0.837 (17.03 µg m⁻³), 0.510 (8.13 µg m⁻³), 0.799 (13.64 µg m⁻³) and 0.894 (22.43 µg m⁻³), respectively. To test the generalization ability of the model, three monitoring stations in Guangzhou, Shenzhen and Hong Kong are selected as testing stations and are excluded from the training procedure. For these three stations, the R (RMSE) values for AQI, PM2.5, PM10, SO2, NO2 and O3 on TY days are 0.868 (11.70), 0.900 (7.16 µg m⁻³), 0.841 (13.45 µg m⁻³), 0.496 (5.38 µg m⁻³), 0.538 (27.94 µg m⁻³) and 0.878 (22.45 µg m⁻³), respectively. The results are satisfactory overall. Then, the model is verified using the case of Typhoon Danas (2019). The results are averaged over the GBA, and the R (RMSE) values of the AQI, PM2.5, PM10, SO2, NO2 and O3 are 0.862 (7.458), 0.841 (5.136 µg m⁻³), 0.793 (8.135 µg m⁻³), 0.727 (4.876 µg m⁻³), 0.827 (5.633 µg m⁻³) and 0.952 (18.319 µg m⁻³), respectively. The prediction is accurate for both the air quality of one city and the average air quality in the GBA. In contrast, when meteorological data are used to predict the air quality of NTY days, the accuracy is significantly lower than that for TY days, indicating that the impact mechanism of typhoons on air pollution is accurately captured by the model and that it is important for the improvement in model prediction accuracy. Another important finding of the present study is the difference in feature importance output by the RF model on TY days and NTY days. On TY days, the meteorological driver of AQI, PM2.5 and PM10 is d2m, which represents the air humidity, while SO2, NO2 and O3 are dominated by the height of the boundary layer. By contrast, on NTY days the dominant meteorological factors change, and the importance of variables representing regional transportation and sea breeze diffusion is significantly higher than on TY days. These findings suggest that the prevailing sea breeze is the major scavenging mechanism of pollutants on NTY days and is deterred by the various wind patterns of the typhoon periphery on TY days. This implies that different control strategies should be adopted on TY days and NTY days. On TY days, countermeasures should be taken to increase the sink of pollutants in order to compensate for the effect of the weakened sea breeze and the hindered diffusion of pollutants caused by the static and stable weather of the typhoon periphery.
Moreover, the present study also highlights the following.
1. The feature importance output by the RF model indicates that the typhoon location is more important than the intensity, suggesting that the location of the typhoon center is the most significant factor in modifying the synoptic condition and thereby changing the air quality.
2. When typhoon center locations sampled at a spatial interval of 0.5° are input into the RF model, the prediction is consistent with previous studies: the air quality in the GBA deteriorates when the typhoon passes over the waters near the island of Taiwan.
3. The concentrations of NO2 and O3 vary diurnally as a result of their photochemical reactions in the atmosphere, and the RF model predicts this diurnal cycle with high accuracy because of the diurnal variation in input variables such as air temperature, PBLH, typhoon intensity and wind speed.
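The location sweep described in the second point can be sketched as follows. This is an illustrative outline only: `predict` is a hypothetical stand-in for a trained model, the box bounds are arbitrary placeholders rather than the study's domain, and the dummy predictor below carries no physical meaning. The feature names `Tlon`, `Tlat` and `Tpres` follow the variable names used in the paper.

```python
def make_grid(lon_min, lon_max, lat_min, lat_max, step=0.5):
    """Regular lon/lat grid of candidate typhoon center positions."""
    n_lon = int(round((lon_max - lon_min) / step)) + 1
    n_lat = int(round((lat_max - lat_min) / step)) + 1
    return [
        (lon_min + i * step, lat_min + j * step)
        for j in range(n_lat)
        for i in range(n_lon)
    ]


def sweep_typhoon_center(predict, base_features, grid):
    """Re-run the predictor with the typhoon center moved to each grid point.

    All other input features are held fixed at the values in base_features.
    """
    results = {}
    for lon, lat in grid:
        features = dict(base_features, Tlon=lon, Tlat=lat)
        results[(lon, lat)] = predict(features)
    return results


# Placeholder bounds and a dummy predictor, for illustration only.
grid = make_grid(110.0, 112.0, 20.0, 21.0, step=0.5)
dummy_predict = lambda f: f["Tlon"] + f["Tlat"]
predicted = sweep_typhoon_center(dummy_predict, {"Tpres": 980.0}, grid)
worst_location = max(predicted, key=predicted.get)
print(len(grid), worst_location)
```

Holding the remaining features fixed isolates the response to the typhoon position, at the cost of ignoring how the local meteorology would co-vary with the track in reality.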
Overall, the RF model achieves good results in predicting typhoon-associated air quality. Compared with approaches adopted in previous research, such as numerical simulation and statistical modeling, the RF model has the advantages of high accuracy and convenient application, and it produces a precise quantitative prediction of typhoon-associated air quality in the GBA. At the same time, the feature importance revealed by the model sheds new light on regional pollution control on typhoon days. The impact of typhoons on air quality is not limited to the GBA, and the model structure provided in the present study can be applied conveniently to other areas, which gives it significant application value for air pollution prevention and control. It is worth mentioning that, because typhoon tracks vary substantially, not all typhoons affect the air quality in their area of impact. The R and RMSE values in the case study are better than those of the whole dataset, reflecting that some typhoons in the dataset do not directly affect the air quality in the GBA. Meanwhile, as mentioned earlier, the air quality is also affected by factors such as source emissions. The RF model's prediction of the air quality in the GBA under these scenarios merits further study.

Figure 1. Overview of the data used in this study: (a) tracks of the studied typhoons (only those typhoons within the dotted box area are introduced into the model); (b) locations of the 39 observation stations.

Figure 2. Flow chart of the study framework.

Figure 4. The results of AQI, PM2.5 and PM10 of three testing stations predicted by the RF model: (a) AQI of TY days; (b) PM2.5 of TY days; (c) PM10 of TY days; (d) AQI of NTY days; (e) PM2.5 of NTY days; (f) PM10 of NTY days.

Figure 5. The results of NTY days' AQI, PM2.5 and PM10 predicted by the RF model: (a) training set of AQI; (b) testing set of AQI; (c) feature importance of AQI; training set of (d) PM2.5 and (g) PM10; testing set of (e) PM2.5 and (h) PM10; feature importance of (f) PM2.5 and (i) PM10.

Figure 6. The correlation between air quality over the GBA and typhoon center location as predicted by the model for (a) AQI, (b) PM2.5, (c) PM10, (d) SO2, (e) NO2 and (f) O3. The scatter points indicate the average air quality value when the typhoon is located at the corresponding position.

Figure 7. Track of Typhoon Danas (2019) and the observed and model-predicted air quality (the value of a city is the mean value of all its stations): (a) track and minimum pressure of Typhoon Danas from 20:00 CST (China standard time) on 15 July 2019 to 14:00 CST on 20 July 2019; (b) the observed AQI value; (c) the model-predicted AQI value; (d) the observed PM2.5 concentration; (e) the model-predicted PM2.5 concentration; (f) the observed PM10 concentration; (g) the model-predicted PM10 concentration; (h) the observed SO2 concentration; (i) the model-predicted SO2 concentration; (j) the observed NO2 concentration; (k) the model-predicted NO2 concentration; (l) the observed O3 concentration; (m) the model-predicted O3 concentration.

Table 2. The best hyperparameters of the model.

As for the MAE and IA, the RF model also performs well. The IA of the testing set is as high as 0.894, 0.906 and 0.895. The red points in the training set are mostly close to the diagonal line, which means that the RF model makes an accurate prediction for the majority of the samples. Although the data points for the testing set are not as dense as those for the training set, the most frequent samples are still relatively close to the y = x line, indicating that the RF model has good predictive ability for unseen data. Concerning the feature importance (Fig. 3c, f, i), the dominant factor of the AQI is d2m, which represents the atmospheric humidity, followed by the static stability. These two factors have similar importance values, indicating that the meteorological determinants of the AQI in the GBA during typhoon events are humidity and static stability. Among the typhoon information data, the importance of Tlon and Tlat is intermediate among all the variables, while the importance of Tpres and Tws is the lowest. It can be concluded that the typhoon center location rather than the typhoon intensity is the key to modifying the synoptic situation in the GBA, thereby changing the AQI value. Similarly, Figs. S1-S3 in the Supplement show the R (RMSE) values of the testing set for SO2, NO2 and O3.
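For reference, the MAE and IA mentioned above can be computed as in the short sketch below. It assumes that IA denotes Willmott's index of agreement, IA = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)², where O and P are observed and predicted values and Ō is the observed mean; the exact definition used in the study is not restated here, so this form is an assumption, as are the function names.

```python
def mae(obs, pred):
    """Mean absolute error, in the same units as the inputs."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed definition of IA).

    Ranges from 0 (no agreement) to 1 (perfect agreement).
    """
    mean_o = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(obs, pred))
    if den == 0:
        raise ValueError("IA is undefined when all values equal the observed mean")
    return 1.0 - num / den


# Toy values for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 3.0]
print(mae(observed, predicted), index_of_agreement(observed, predicted))
```

Unlike R, the IA penalizes systematic bias as well as poor co-variation, which is why the two are usually reported together.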

Table 3. Evaluation metrics of the model prediction for the case of Danas in the GBA.