Urban PM2.5 Diffusion Analysis Based on the Improved Gaussian Smoke Plume Model and Support Vector Machine

With the acceleration of urbanization in China, haze has become a growing threat to human health. However, comprehensive research on the diffusion and evolution of PM2.5 is still lacking. Therefore, this study proposed an improved Gaussian smoke plume model that considered the influence of multiple factors, such as rain wash, gravity sedimentation, and surface rebound, on PM2.5. Additionally, the evolution of PM2.5 was predicted by selecting 9 factors with a large influence. In the prediction, a support vector machine (SVM) and radial basis function kernel were adopted to construct classifiers and obtain the maximum distinction degree, respectively. Finally, the diffusion simulation and experimental evolution prediction were verified using data obtained from nine PM2.5 monitoring stations in Wuhan. The experimental results showed that the algorithm could obtain considerably accurate simulation results of the PM2.5 diffusion with low error for measured values. Therefore, this model may be useful in government plans for formulating strategies that control and reduce environmental pollution.


INTRODUCTION
Industrialization in China has remained dynamically innovative over the past 30 years; however, as a consequence, China is now facing significant environmental problems (Bell et al., 2014). Common environmental problems include air pollution, water pollution, and soil pollution. Among them, air pollution, especially PM2.5 in the air, threatens human health directly (Chafe et al., 2015). PM2.5 refers to fine particulate matter in air with a diameter less than or equal to 2.5 µm. These particles can absorb bacteria, viruses, and harmful pollutants (Atkinson et al., 2014). The concentration of PM2.5 is closely related to haze weather, which directly affects human health and causes various diseases, such as cardiovascular diseases (Kloog et al., 2015). Therefore, the governance of haze weather and PM2.5 has become a focus of the government, environmental protection departments, and the general public (Kioumourtzoglou et al., 2016).
Thus far, numerous scholars have studied the causes, related factors, and harmful effects of PM2.5 on the environment and human health. Huang et al. (2017) studied PM2.5 diffusion using NOAA, NAQFC, and other indexes; Kim et al. (2017) conducted long-term prediction of PM2.5 index diffusion using the history prediction model; Hu et al. (2015) explored PM2.5 diffusion using U.S. MODIS data with a three-phase model; Gan (2012) completed diffusion prediction through a point source Gaussian smoke plume model for the first time; Gao et al. (2017) predicted winter PM2.5 concentration in the north of China and also conducted diffusion research using the Gaussian plume model; Cao et al. (2016) conducted PM2.5 diffusion research using the multi-scale adaptive KLMS kernel function; Zhou et al. (2003) comprehensively analyzed the influencing factors of atmospheric pollution through weather forecasting and a statistical method, and identified the influencing factor associated with PM2.5; Di et al. (2016) conducted PM2.5 diffusion research using the chemical transmission model and predicted PM2.5 using linear regression; Zhang et al. (2013) predicted the evolution of PM2.5 through multiple linear regressions and achieved good prediction; You et al. (2016) predicted the evolution of PM2.5 using the geometric weighting regression model; Zhan et al. (2017) completed PM2.5 evolution prediction of China's fog areas using the airspace dominant machine learning method; Huang et al. (2014) achieved accurate prediction of PM2.5 with a non-point source Gaussian smoke plume model, considering the influence of various factors on PM2.5 to ensure more accurate prediction results. On this basis, Liu and Ding (2015) determined the influencing factors of PM2.5, identified the key factors with the largest impact on PM2.5 through a principal component analysis model, and established the theoretical basis for the evolution prediction of PM2.5. Another study built a Gaussian plume model by considering factors such as wind speed, simulated the diffusion of PM2.5, and obtained ideal results.
According to the existing literature, the diffusion simulation and evolution prediction of PM2.5 still require further study. In this study, an improved Gaussian smoke plume model was built on the basis of the existing Gaussian smoke plume model. The new model takes rain wash, gravity sedimentation, and surface rebound into full consideration. For the evolution prediction of PM2.5, the key influencing factors were extracted from the literature through principal component analysis. They were then transformed into feature vectors and input into a support vector machine (SVM) for training. Next, the trained SVM was used to predict the PM2.5 evolution at subsequent times. Finally, a corresponding simulation experiment was conducted using data provided by nine PM2.5 monitoring sites in Wuhan, and the improved algorithm proposed in the paper was verified.

Gaussian Smoke Plume Model
The Gaussian smoke plume model is a pollutant diffusion model of PM2.5 in the atmosphere established on the assumption of a normal distribution. Among the numerous diffusion prediction models, Gaussian smoke plume models based on turbulence statistics can provide more accurate predictions when simulating actual conditions (Ristic et al., 2015). Therefore, research on diffusion models of PM2.5 generally adopts Gaussian smoke plume models. The distribution of the Gaussian function can help simulate all kinds of random processes. The current Gaussian smoke plume model includes a point diffusion model, a closed diffusion model, and a surface diffusion model, which are applied to the diffusion simulation of different pollutants. The point diffusion model is usually used in the diffusion modeling of PM2.5.
Gaussian smoke plume models can simulate the concentration distribution of PM2.5 and identify areas of dangerously high concentration. They also indicate the diffusion radius and reflect how the concentration peak at a given diffusion point changes over time. Fig. 1 shows the classic Gaussian point source smoke plume diffusion model. In the model, the PM2.5 point pollution source is located at the coordinate origin at t = 0; after time t, the contamination concentration at the point (x, y, z) in 3D space is denoted C(x, y, z). According to the calculation equation of surface flux, the flux Q per unit area per unit time could be obtained:

$$Q = -\sigma\,\mathrm{grad}\,C(x, y, z) \tag{1}$$

where σ denotes the diffusion coefficient and grad C(x, y, z) denotes the pollution concentration gradient; the negative sign reflects that pollutants diffuse from high to low concentration areas. The transformation equation of pollutants, Eq. (2), could be obtained from turbulent diffusion and atmospheric molecular diffusion theory. In Eq. (2), the X axis is the downwind diffusion direction of the pollution source, the Y axis is the horizontal direction, and the Z axis is the vertical upward direction; $\sigma_x$, $\sigma_y$, and $\sigma_z$ are the diffusion coefficients in the corresponding directions, which are closely related to the atmospheric stability and the downwind distance. A further coefficient signifies the dissolution degree. Eq. (2) is a parabolic partial differential equation in infinite space. The source function at the coordinate origin was taken as the initial condition:

$$C(x, y, z, 0) = Q\,\delta(x, y, z) \tag{3}$$

where Q denotes the total amount of pollutants and δ(x, y, z) is the intensity function of the point source. Solving Eq. (2) under this initial condition gives the Gaussian smoke plume model C(x, y, z, H(m)) of Eq. (4), in which H(m) signifies the effective emission height of PM2.5 and k denotes the average wind speed. The model assumes that the wind is horizontal and blows in the X-axis direction. Through the definition of C(x, y, z, H(m)), the model can simulate the PM2.5 concentration at any point in three-dimensional space by means of a normal distribution. Because the model is based on turbulence statistics, the following assumptions were made during modeling: (1) pollutants follow a Gaussian distribution in space; (2) the wind speed is uniform in space, and the wind direction does not change; (3) the distribution of pollutants is continuous in space; (4) the diffusion process follows the law of conservation of energy, and the conversion rate is 100%.
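As a concrete illustration of the point source model, the sketch below evaluates the standard textbook steady-state Gaussian plume with total ground reflection. This form is an assumption made for illustration and is not necessarily identical to Eq. (4); the function name `gaussian_plume` and all numerical inputs are hypothetical.

```python
import math


def gaussian_plume(q, k, sigma_y, sigma_z, y, z, h):
    """Textbook steady-state plume concentration (assumed form, not the paper's Eq. (4)).

    q: source strength (mass per unit time)
    k: average wind speed along the X axis
    sigma_y, sigma_z: diffusion coefficients at the downwind distance of interest
    y, z: crosswind offset and height of the receptor
    h: effective emission height H
    """
    crosswind = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    direct = math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
    # Mirror source below the ground, i.e. total reflection at the surface.
    image = math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2))
    return q / (2 * math.pi * k * sigma_y * sigma_z) * crosswind * (direct + image)


# Hypothetical inputs chosen only to show the shape of the result.
c_axis = gaussian_plume(q=100.0, k=3.0, sigma_y=50.0, sigma_z=30.0, y=0.0, z=0.0, h=20.0)
c_off = gaussian_plume(q=100.0, k=3.0, sigma_y=50.0, sigma_z=30.0, y=80.0, z=0.0, h=20.0)
print(c_axis, c_off)  # the ground-level value on the plume axis is the larger one
```

Under assumption (2) the wind speed k enters only as a dilution factor in the denominator, which is why doubling k halves every concentration in this form.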

Gaussian Smoke Plume Model Modified with Multiple Factors
In the actual prediction of PM2.5, interference conditions such as air humidity, gravity sedimentation, and earth surface reflection exist (Hosseini et al., 2016). Traditional Gaussian smoke plume models predict diffusion inaccurately when several such factors act together. To address this, modified models were established for the three most influential factors (air humidity, gravity sedimentation, and earth surface reflection) so that the Gaussian smoke plume model satisfies these constraints and predicts the PM2.5 concentration in air more accurately.
(1) Air humidity factor: When air humidity is high, as during fog, rain, and snow events, some particles in the air are washed out, and soluble gases are incorporated into the water; at higher humidity, some PM2.5 particles undergo erosion or sedimentation, which changes the diffusion region (Elperina et al., 2016). The fraction of PM2.5 that is washed out or settles was described by a coefficient φ. This coefficient is related to the strength of the moisture effect, I, through two parameters a and b, which are usually set to empirical values. To account for erosion or sedimentation, the original source point strength Q(x) was adjusted using φ and the average wind speed k.
According to this change of the source point strength, the Gaussian smoke plume model in Eq. (4) was modified. Setting z = 0 in the modified model gives the concentration distribution of ground PM2.5, and setting z = y = 0 gives the concentration distribution of PM2.5 along the ground axis. The influence of erosion or sedimentation thus enters the Gaussian smoke plume model as a weight, which allows the model to produce accurate simulation results of PM2.5 diffusion in a high-humidity environment.
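The humidity correction can be sketched as follows, assuming the commonly used power-law washout coefficient φ = a·I^b and an exponential depletion of the source strength over the travel time x/k. Both forms are assumptions for illustration, not the paper's own equations, and the values of a, b, and I are hypothetical.

```python
import math


def washout_coefficient(intensity, a, b):
    """Assumed power-law relation phi = a * I**b between moisture strength and washout."""
    return a * intensity ** b


def depleted_source_strength(q0, phi, x, k):
    """Assumed source strength remaining after travelling x downwind at wind speed k."""
    return q0 * math.exp(-phi * x / k)


# Hypothetical empirical parameters and moisture strength.
phi = washout_coefficient(intensity=2.0, a=1e-4, b=0.8)
for x in (0.0, 1000.0, 5000.0):
    print(x, depleted_source_strength(q0=100.0, phi=phi, x=x, k=3.0))
```

Substituting the depleted strength for Q in a plume formula is what turns the washout effect into a distance-dependent weight on the concentration field.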
(2) Gravity sedimentation factor: In actual observation scenarios, PM2.5 particle diffusion is influenced by rain wash as well as by sedimentation due to gravity (Ristic et al., 2015). The sedimentation velocity generally depends on the combined action of the particles' own gravity and the force exerted on the particles by the air. It can be expressed by the Stokes formula:

$$V_s = \frac{\rho g D^2}{18\mu} \tag{11}$$

where ρ denotes the density of PM2.5 particles, g is the acceleration due to gravity, D is the diameter of PM2.5 particles, µ is the air reacting force coefficient, and $V_s$ signifies the sedimentation velocity of particles under the action of gravity. Under the downward speed produced by gravity sedimentation, PM2.5 particles descend as they travel along the X axis (Eq. (12)); after a distance x in the X-axis direction, the height has changed to H − ($V_s$ x)/µ. According to the diffusion theory and momentum transfer theory, this height change produces a corresponding amount of ground sedimentation (Eq. (13)), and the ground PM2.5 concentration distribution was modified accordingly (Eq. (14)).
(3) Earth surface reflection factor: Earth surface reflection is another major factor influencing PM2.5 particles. When PM2.5 particles are washed or settle onto a building surface or the ground, some of them rebound into the air, where they can still cause secondary diffusion (Juodis et al., 2016). The rebounded PM2.5 particles also require weighting, and therefore the original Gaussian smoke plume model was modified. The ground rebound coefficient in the rebound model was assumed to be α. For a point P in space, the pollutant concentration can be considered as the sum of two parts: the PM2.5 concentration produced by the sedimentary volume and the PM2.5 concentration produced by the ground reaction. The sum of the two parts is the PM2.5 particle concentration at the current point P (Eq. (15)). In addition, the bounce produced by building surfaces needs to be taken into account. Buildings generally produce a tilted bounce. With a building rebound coefficient β, the revised PM2.5 particle concentration at a point in space is given by Eq. (16). In Eqs. (15) and (16), α = 0, β = 0 implies that pollutants were completely absorbed by the ground and building surfaces; in contrast, α = 1, β = 1 implies that the pollutants were completely rebounded.
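The gravity and ground reflection corrections can be combined in a short sketch. It assumes the Stokes settling velocity, a plume centreline that sinks by V_s·x/k over the travel time x/k (using the wind speed k is an assumption of this sketch), and a partial ground reflection that weights the mirror-source term by α. The building coefficient β is left out, and the particle density, air viscosity, and other inputs are hypothetical.

```python
import math


def stokes_velocity(rho, d, mu, g=9.81):
    """Stokes settling velocity of a small sphere: rho * g * d**2 / (18 * mu)."""
    return rho * g * d ** 2 / (18 * mu)


def ground_concentration(q, k, sigma_y, sigma_z, h, x, y, v_s, alpha):
    """Ground-level (z = 0) concentration with a sinking plume and partial reflection.

    Assumed form: the effective height drops by v_s * x / k, and the reflected
    (mirror-source) term is weighted by the ground rebound coefficient alpha.
    """
    h_eff = max(h - v_s * x / k, 0.0)
    lateral = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    # At z = 0 the direct and mirror terms coincide, hence the factor (1 + alpha).
    vertical = (1 + alpha) * math.exp(-h_eff ** 2 / (2 * sigma_z ** 2))
    return q / (2 * math.pi * k * sigma_y * sigma_z) * lateral * vertical


# Hypothetical particle: 2.5 µm diameter, density 1500 kg/m^3, air viscosity 1.81e-5 Pa·s.
v_s = stokes_velocity(rho=1500.0, d=2.5e-6, mu=1.81e-5)
absorbed = ground_concentration(100.0, 3.0, 50.0, 30.0, 20.0, 1000.0, 0.0, v_s, alpha=0.0)
rebounded = ground_concentration(100.0, 3.0, 50.0, 30.0, 20.0, 1000.0, 0.0, v_s, alpha=1.0)
print(v_s, absorbed, rebounded)
```

With these inputs the settling velocity is a fraction of a millimetre per second, so for PM2.5 the height correction stays small over distances of a few kilometres, whereas the reflection weight changes the ground concentration by up to a factor of two.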
The modified Gaussian smoke plume model can simulate the diffusion of PM2.5 while accounting for air humidity, gravity sedimentation, and earth surface reflection in actual situations. In addition, the diffusion area in three-dimensional space can be simulated through the model. Nevertheless, beyond modeling the diffusion scope, the PM2.5 value at each sampling point of the diffusion also needs to be estimated, and this value is influenced by many factors.

EVOLUTION AND PREDICTION OF PM 2.5 CONCENTRATION BASED ON SVM
According to the correlation analysis of PM2.5, the concentration level is influenced by many factors, generally CO2, SO2, O3, H2O, temperature, and humidity. Thus, it is difficult to obtain good generalization through a mathematical model. In this study, from the viewpoint of machine learning, nine important influencing factors of PM2.5 were used as feature vectors; prediction and regression were then performed with machine learning so that the evolution of the PM2.5 concentration in the area over time could be predicted automatically. Statistically, there was no obvious linear relationship between the air pollution factors and PM2.5, and various nonlinear relationships further complicated the model description. On this basis, an SVM was trained on the influencing factors, and PM2.5 evolution was predicted using the trained model. The model can use the kernel method to describe all kinds of nonlinear relations, adapts well to different samples, and is suitable for solving various nonlinear problems.
SVM is a type of machine learning classifier based on the statistical learning method (Hou et al., 2016). Mapping low-dimensional input vectors to a high-dimensional feature space through a nonlinear mapping allows the feature vectors to be split by a hyperplane in the high-dimensional space. For this purpose, the classification surface requires optimization, namely, maximizing the interval between the data and the classification surface. The optimal classification plane w · x + b = 0 was analyzed using SVM, where w denotes the weight, x denotes the input vector, and b is the additive bias. Solving the optimal classification plane means solving the quadratic programming problem of w and b:

$$\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^{2} \quad \text{s.t.} \quad y_i\,(w\cdot x_i + b)\ \ge\ 1,\quad i = 1,\dots,n$$

where $y_i \in \{-1, +1\}$ is the class label of sample $x_i$ and n is the number of samples. The optimal w and b of this quadratic programming problem could be obtained through the Lagrange multiplier method and the KKT conditions. Then, the optimal results of the sample classification could be obtained.
In practice, almost no real data set is linearly separable, and linearly inseparable data sets cannot be classified through linear classification alone. For this reason, SVM introduces slack variables and allows some samples to be wrongly classified, which lets it handle data sets that are nearly linearly separable. However, as shown in Fig. 2, some data sets cannot be handled through slack variables, because a linear classification plane would misclassify many samples and yield poor final classifier performance. Therefore, SVM adopts the concept of the kernel function: the linearly inseparable data are mapped through the kernel function to a high-dimensional space, in which a linear classification plane can be used effectively for classification.
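The idea illustrated in Fig. 2 can be reproduced on the classic XOR pattern. The sketch below uses an explicit, hand-picked feature map rather than a kernel (a kernel achieves the same effect implicitly); the map `feature_map` and the plane `w`, `b` are hypothetical choices for illustration.

```python
def feature_map(x1, x2):
    """Lift a 2D point to 3D by appending the product x1 * x2."""
    return (x1, x2, x1 * x2)


# XOR-like labels: no straight line in the (x1, x2) plane separates the two classes.
samples = [((1, 1), 1), ((-1, -1), 1), ((1, -1), -1), ((-1, 1), -1)]

# In the lifted space the plane z3 = 0 separates them (hand-picked w and b).
w, b = (0.0, 0.0, 1.0), 0.0


def decision(point):
    lifted = feature_map(*point)
    return sum(wi * zi for wi, zi in zip(w, lifted)) + b


# A positive value of label * decision means the sample is on the correct side.
margins = [label * decision(point) for point, label in samples]
print(margins)
```

All four margins are positive, so a linear plane in the lifted space classifies every sample correctly without any slack.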
With the kernel function, the quadratic optimization problem of SVM could be converted as follows:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i}\xi_i \quad \text{s.t.} \quad y_i\,\bigl(w\cdot\varphi(x_i) + b\bigr)\ \ge\ 1 - \xi_i,\quad \xi_i \ge 0$$

where $\xi_i$ denotes the slack factor, φ(x) denotes the nonlinear mapping associated with the kernel function, C is the penalty coefficient, and $y_i$ is the label of sample $x_i$. Similarly, the optimal solution of the quadratic programming could be obtained using the Lagrange multiplier method and the KKT conditions, and the nonlinear SVM classification plane could then be expressed in terms of the kernel function $K(x_i, x_j)$. Based on the idea of the kernel function, PM2.5 diffusion can readily be predicted using SVM with multiple kernel functions. In general, multiple characteristics are required for prediction and classification with SVM, and the prediction of PM2.5 evolution can be completed by building the related characteristics of PM2.5. According to related studies on the evolution process of PM2.5, the influencing factors that have significant relationships with PM2.5 generally include air, temperature, humidity, pressure, and wind speed. In this study, according to the literature, nine influencing factors were selected from many factors for the prediction of PM2.5 (SO2, NO2, O3, PM10, CO, temperature, humidity, air pressure, and wind speed). We input the 9D features to the SVM in the form of vectors and predicted the PM2.5 evolution by building SVM training and prediction processes. For the kernel function, the RBF kernel, which maps the data to a suitable high-dimensional space, was selected. In the process of prediction, the collected training set was assumed to be {$x_i$, $y_i$, i = 1, 2, 3, …, 9}, and nine samples were considered. In the training set, x denotes the feature vector composed of the nine influencing factors, and y signifies the specific PM2.5 concentration value obtained from every sample. The y values were regarded as the tags for regression prediction in actual use; regression could then be completed by feeding an input feature vector to the trained model.
The training process of SVM was aimed at finding an optimal separating hyperplane $w^T x + b = 0$ on which the 9D feature vectors x could be distinguished to the highest degree. When the data have a low degree of differentiation, SVM provides the kernel function to solve the problem. The original features can be converted to a high-dimensional space through different kernel functions. After the conversion, the characteristics have stronger linear separability, and generalization across different categories of data is improved. Because the nine features extracted in the study were independently and identically distributed in space and time, and the variables had almost no relationship with one another, they could be effectively distinguished through the radial basis function. Then, the prediction and regression of PM2.5 could be completed through the SVM separating hyperplane. Thus, the radial basis function was selected as the SVM kernel function in the paper.
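A minimal sketch of the radial basis function on 9D feature vectors is given below. It assumes the common parameterization K(u, v) = exp(−gamma·‖u − v‖²); the width parameter `gamma` and the standardized feature values are hypothetical, not the paper's data.

```python
import math


def rbf_kernel(u, v, gamma):
    """Assumed RBF form: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def gram_matrix(vectors, gamma):
    """Pairwise kernel values, the quantity an SVM solver works with."""
    return [[rbf_kernel(u, v, gamma) for v in vectors] for u in vectors]


# Hypothetical standardized features in the order
# SO2, NO2, O3, PM10, CO, temperature, humidity, air pressure, wind speed.
features = [
    [0.2, 0.5, -0.1, 1.2, 0.3, -0.8, 0.4, 0.1, -0.5],
    [0.1, 0.4, 0.0, 1.0, 0.2, -0.7, 0.5, 0.1, -0.4],
    [-1.0, -0.9, 1.1, -1.3, -0.8, 1.2, -0.6, -0.2, 1.0],
]
gram = gram_matrix(features, gamma=0.1)
print(gram[0][1], gram[0][2])  # similar samples give values near 1, dissimilar ones near 0
```

In practice a library implementation would normally be used instead of hand-written kernels; for example, scikit-learn's `sklearn.svm.SVR` accepts `kernel="rbf"` together with `C`, `epsilon`, and `gamma`. Whether the original experiments used such a library is not stated in the paper.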

Fig. 2. Mapping a nonlinear data set to a linearly separable data set in a higher-dimensional space through the kernel function.

Experimental Scenario and Data
PM2.5 data from Wuhan were taken as the analysis object. Wuhan, Hubei Province, is located in the middle and lower reaches of the Yangtze River at 30.5931°N, 114.3054°E, and covers an area of 8000 km², of which the urban area accounts for around 800 km² and forest cover accounts for more than 25%. Its year-round climate combines temperate continental and subtropical monsoon characteristics, with adequate annual rainfall, sufficient sunlight, and four distinct seasons; it is hot in summer and cold in winter. There are nine PM2.5 monitoring sites in Wuhan, which publish real-time monitoring data every hour. The monitoring data include the SO2, NO2, O3, PM10, and CO concentrations required for the SVM prediction in this study. Combined with the real-time monitoring indexes of temperature, humidity, air pressure, and wind speed, every index required for PM2.5 regression and prediction could be obtained. Fig. 3 shows the distribution of PM2.5 monitoring stations in Wuhan.

Experimental Process and Parameter Solving
(1) Solving diffusion parameters of the improved Gaussian smoke plume model: To solve the parameters of the improved Gaussian plume model, the atmospheric stability must first be determined. In general, atmospheric stability includes six levels, namely, extremely unstable, unstable, slightly unstable, moderately stable, stable, and highly stable. In the analysis and calculation of the experimental data, parameters were built for the highly stable atmospheric environment because Wuhan lies at a moderate latitude and has relatively good atmospheric stability. In addition, the diffusion parameters in the x and y directions were assumed to be the same, so only the two parameters $\sigma_y$ and $\sigma_z$ along the y and z directions were actually used. The diffusion coefficients were determined from the parameters γ and ε, which capture the effect of wind speed on PM2.5 particle diffusion, and from X, the observation data at the current time point.
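One widely used way to parameterize diffusion coefficients is a power law in the downwind distance, σ = γ·x^ε, with one (γ, ε) pair per direction. The sketch below assumes this form purely for illustration; it may differ from the equation used in the study, and the parameter values are hypothetical rather than those of Table 1.

```python
def diffusion_coefficient(gamma, epsilon, x):
    """Assumed power-law form sigma = gamma * x**epsilon (x: downwind distance)."""
    return gamma * x ** epsilon


# Hypothetical parameter pairs for the y and z directions.
gamma_1, epsilon_1 = 0.10, 0.90  # horizontal, sigma_y
gamma_2, epsilon_2 = 0.06, 0.80  # vertical, sigma_z

for x in (100.0, 1000.0, 5000.0):
    sigma_y = diffusion_coefficient(gamma_1, epsilon_1, x)
    sigma_z = diffusion_coefficient(gamma_2, epsilon_2, x)
    print(x, sigma_y, sigma_z)
```

Under this assumed form, a more stable atmosphere corresponds to smaller γ and ε, so the plume stays narrower at the same distance.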
The improved Gaussian smoke plume model was solved using the data of the PM2.5 monitoring stations from January and February 2015. Solving the model requires consideration of the three proposed factors, namely, rain wash, gravity settling, and surface reflection. Because rain wash is more complex to calculate than the other factors, it was excluded from the experiment. Gravity settling was therefore considered first. According to Eqs. (11-13), the key parameter $W_d$ of the gravity settling factor was calculated. On this basis, the key parameters α and β of the surface reflection factor were calculated according to Eq. (15). Finally, Q′ and H(m) of the Gaussian model under ideal conditions were calculated according to Eqs. (14) and (16). The parameter solving results are given in Table 1. Parameters $\gamma_1$ and $\gamma_2$ were obtained through observation, and parameters $\varepsilon_1$ and $\varepsilon_2$ were obtained through calculation based on Eq. (24).
(2) Solving PM2.5 source strength: The collected readings were first converted to concentrations because the actual diffusion prediction involves PM2.5 concentrations. The conversion is based on the following relation (Zhu et al., 2017):

$$\mathrm{IAQI}_P = \frac{\mathrm{IAQI}_{Hi} - \mathrm{IAQI}_{LO}}{BP_{Hi} - BP_{LO}}\,\bigl(C_P - BP_{LO}\bigr) + \mathrm{IAQI}_{LO}$$

where $\mathrm{IAQI}_P$ signifies the individual air quality index (sub-index) of PM2.5; $C_P$ is the mass density of PM2.5; $BP_{Hi}$ is the upper pollution concentration limit closest to $C_P$; $BP_{LO}$ is the lower pollution concentration limit closest to $C_P$; $\mathrm{IAQI}_{Hi}$ is the sub-index corresponding to $BP_{Hi}$; and $\mathrm{IAQI}_{LO}$ is the sub-index corresponding to $BP_{LO}$. Solving this relation for $C_P$ converts the PM2.5 values recorded by the stations into mass density. The locations of the collection stations are shown in Fig. 3; the data were formatted to simplify the calculation of $C_P$, and $C_P$ was calculated using data from all the collection stations. From the total PM2.5 mass concentration calculated in this way, the source intensity Q could be expressed as the PM2.5 mass concentration per unit time per unit volume. Assuming a mass concentration C within time t and volume V, the source intensity Q could be solved from $C_P$, with t = 1 h and V = 10 km³.
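The conversion between the sub-index and the mass concentration is a piecewise linear interpolation and can be sketched as below. The breakpoint table is the 24-hour PM2.5 table as commonly quoted for the Chinese standard HJ 633-2012; it is included here as an assumption and should be checked against the standard before use. The function names are hypothetical.

```python
# (IAQI, concentration in µg/m^3) breakpoints for 24-hour PM2.5,
# as commonly tabulated for HJ 633-2012 (assumed here, verify before use).
BREAKPOINTS = [(0, 0), (50, 35), (100, 75), (150, 115),
               (200, 150), (300, 250), (400, 350), (500, 500)]


def iaqi_from_concentration(c_p):
    """Forward relation: interpolate the sub-index between neighbouring breakpoints."""
    for (i_lo, bp_lo), (i_hi, bp_hi) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if bp_lo <= c_p <= bp_hi:
            return (i_hi - i_lo) * (c_p - bp_lo) / (bp_hi - bp_lo) + i_lo
    raise ValueError("concentration outside the tabulated range")


def concentration_from_iaqi(iaqi):
    """Inverse relation: recover the mass concentration C_P from a reported sub-index."""
    for (i_lo, bp_lo), (i_hi, bp_hi) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if i_lo <= iaqi <= i_hi:
            return (bp_hi - bp_lo) * (iaqi - i_lo) / (i_hi - i_lo) + bp_lo
    raise ValueError("sub-index outside the tabulated range")


print(concentration_from_iaqi(125))  # halfway between the 75 and 115 µg/m^3 breakpoints
```

Because each segment is linear and the breakpoints are shared, the two functions are exact inverses of each other on the tabulated range.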
(3) Calculation of the distance between collection stations: The distance between collection stations is closely related to the diffusion of PM 2.5 . According to the latitude and longitude relationship between each collection station, the shortest distance between each collection station could be calculated from the great-circle distance. Assuming that (φ 1 , λ 1 ),(φ 2 , λ 2 ) denotes the latitude and longitude of two collection stations, the great circle distance between two collection stations on the sphere could be built through the definition of the haversine function and cosine of two angles.
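The great-circle distance d is then

$$d = 2r\,\arcsin\left(\sqrt{\sin^{2}\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1\,\cos\varphi_2\,\sin^{2}\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$

where r is the radius of the sphere.

After the above parameter solving, source transformation, and distance calculation, the diffusion results of PM2.5 pollution at the nine monitoring points could be computed. Table 2 shows the diffusion results and statistical values.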
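The haversine distance between two collection stations can be computed with a few lines of code. In the sketch below, the mean Earth radius of 6371 km is an assumed constant, the first coordinate pair is the location of Wuhan given earlier in the paper, and the second pair is a hypothetical nearby point, not an actual station.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Haversine great-circle distance between two (latitude, longitude) pairs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the arcsin domain.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


# Wuhan reference point from the paper and a hypothetical second point.
print(great_circle_km(30.5931, 114.3054, 30.6500, 114.4000))
```

Over the few tens of kilometres that separate urban monitoring stations, the result differs only marginally from a flat-map approximation, but the haversine form stays numerically stable for very small separations.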
As shown in Table 2, Yuehu, Huaqiao, Qingshan, Ziyang, and Donghu, which lie in the heavy industry area of Wuhan, had higher PM2.5 concentrations and more obvious diffusion of PM2.5. In contrast, Wujiashan, Jiangtan, and other scenic and residential areas had lower PM2.5 concentrations and a smaller diffusion range, which is consistent with the basic law of the environment. In addition, remote areas such as Zhuankou, an agricultural area with a lower local population density, had the lowest PM2.5 concentration without any diffusion of PM2.5. In short, the Gaussian smoke plume model established in the paper can effectively analyze PM2.5 concentrations and diffusion under different environmental conditions. The model built in this study considers gravity settling, building surface rebound, and rain wash, so it can produce reliable results that correspond to actual conditions and adapt to various cases. The statistical results of the nine monitoring stations differ only slightly from the measured values. In general, such small differences are caused by factors such as temperature, humidity, and wind direction. These minor factors should be taken into consideration in follow-up studies so as to increase the reliability of the results.
(4) Predicting PM2.5 evolution with SVM: The indexes of SO2, NO2, O3, PM10, and CO could be obtained from the PM2.5 monitoring stations. Combined with real-time parameters such as temperature, humidity, air pressure, and wind speed provided by the Wuhan Meteorological Agency, multidimensional characteristics could be generated from the nine real-time parameters. Then, through SVM training with a month as the unit, PM2.5 diffusion and evolution could be predicted for the follow-up months. In the experiment, the concentration evolution of PM2.5 at the nine monitoring stations in Wuhan was predicted for the eight follow-up months. Fig. 3 shows the comparison of the predicted values and real values of the concentration evolution for the PM2.5 monitoring stations in Wuhan over these eight months. In addition, a contrasting experiment was conducted with the same feature vectors using the linear regression model (Di et al., 2016). Fig. 4 shows the comparison of the evolution prediction results of the SVM model and the linear regression model with the real results.
As the comparison results show, there were no significant differences between the predicted and actual values of PM2.5 evolution from March to May, while the differences were larger from June to October. The climate, temperature, rainfall, and humidity in Wuhan changed only slightly from March to May and had little impact on PM2.5. With the later changes in temperature and humidity, coupled with the impact of monsoon winds, the assumptions of the prediction model no longer held. Thus, the gap between the predicted and actual values gradually increased. The comparison of the linear regression model and the SVM model shows that the SVM model had clearly higher precision than the linear regression model in the last few months. With a longer prediction time, the prediction model requires higher-dimensional features, which results in poorer fitting of the machine learning model and eventually decreases the prediction precision. However, the SVM model adopted the RBF kernel, which can cope with higher-dimensional features and provides more robust prediction. Nevertheless, the samples used in SVM modeling were insufficient. If samples under the various climates throughout the year could be obtained, the SVM modeling could be performed with PM2.5 data for the whole year to obtain better results. The SVM model built in this study generalizes to a certain extent and can better predict PM2.5 evolution.

CONCLUSION
With rapid industrialization, haze has become an increasingly serious threat in China. To effectively control haze, the PM2.5 concentration must first be decreased. A large number of studies have shown that the formation of PM2.5 is complicated and influenced by several interacting factors. In order to simulate and predict PM2.5 evolution more effectively, the Gaussian smoke plume model was rebuilt based on the practical situation and a consideration of multiple influential factors. Feature vectors were constructed by selecting the nine factors with the largest influence on PM2.5, and an SVM model was established. The model achieved good diffusion simulation and good evolution prediction. However, because various factors influence PM2.5, the current Gaussian smoke plume model and SVM model still cannot simulate and predict PM2.5 values with full accuracy. Therefore, more factors should be considered in future work. In addition, the Gaussian smoke plume model and SVM model should be modified to ensure that the simulation and prediction of PM2.5 are closer to the actual measured values.