PM2.5 Concentration Prediction Based on Pollutant Pattern Recognition Using PCA-clustering Method and CS Algorithm Optimized SVR

Environmental issues, particularly air pollution, are a matter of concern for people all around the world. Excessively high PM2.5 levels harm people's physical and mental health, so more accurate PM2.5 concentration predictions are critical for government air pollution control. In this paper, we explored the relationship between pollutants (PM10, SO2, NO2, O3, CO) and meteorological factors (atmospheric pressure, relative humidity, air temperature, wind speed, wind direction, cumulative precipitation) that affect the generation and transmission of PM2.5. To better predict the concentration of PM2.5, we innovatively combined principal component analysis (PCA) and clustering methods to extract pollutant variables and patterns as important PM2.5 concentration predictors for different models, such as support vector regression (SVR), multivariate nonlinear regression (MNR), and artificial neural network (ANN). Compared with the MNR and ANN models, SVR presented better prediction accuracy. Moreover, the cuckoo search (CS), cross-validation (CV), and particle swarm optimization (PSO) algorithms were used to further optimize the parameters of SVR. To evaluate the above PM2.5 concentration prediction results, we introduced several evaluation indicators, including the root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and Pearson correlation coefficient (R) between predicted and measured values. The obtained results confirmed that when the pollutant data was divided into three patterns, the best prediction accuracy was achieved by the CS-SVR model.


INTRODUCTION
Particulate matter (PM) in the atmosphere has received a lot of attention in recent decades because of its significant impact on human health. PM2.5, with an aerodynamic diameter of less than 2.5 μm, is made up of harmful substances such as heavy metals and carcinogenic organic compounds. It can easily and deeply penetrate the human lungs and cause serious health issues (Thomaidis et al. 2003, Yuan et al. 2019, Badaloni et al. 2017). Exposure to high PM2.5 levels is correlated with increases in respiratory and cardiovascular diseases (Ostro et al. 1999, Biancofiore et al. 2017) and population mortality (Di et al. 2017, Liang et al. 2018). It has also been proven that prenatal exposure to PM2.5 can decrease corpus callosum volume and affect children's neuropsychological development (Mortamais et al. 2019, Suades-González et al. 2015). International environmental organizations and countries all over the world pay great attention to the negative effects of PM2.5. Following the WHO guideline and considering China's current situation, in 2012 the Ministry of Ecology and Environment (MEE) published the Chinese ambient air quality standards, in which the daily and annual average PM2.5 limits were set as 75 and 35 μg·m⁻³, respectively (State Bureau of Environment Protection 2012). In Beijing, for example, days with air quality surpassing the MEE PM2.5 limit accounted for 43.5% of the total, higher than the corresponding figures for other pollutants such as O3, PM10, and NO2 (Beijing Municipal Ecology and Environment Bureau 2018, 2019). More accurate PM2.5 concentration predictions help not only people in planning their daily activities but also government regulation.
PM2.5 concentration is influenced by a number of factors, the most important of which are pollutant emission factors and meteorological conditions. The former take part in the chemical process of PM2.5 formation, and the latter influence the dissipation of PM2.5 (Liang et al. 2015, Wang et al. 2015). With the development of statistical methods, data mining, and artificial intelligence technology, researchers hope to use simpler and more effective methods to predict PM2.5 concentrations, and many efforts have been devoted to data algorithms and their optimization. Marsha and Larkin (2019) used a multiple linear regression scheme to forecast daily PM2.5 concentrations using the previous day's PM2.5 measurements as well as fire- and smoke-related variables from satellite observations. Sun et al. (2013) used hidden Markov models to forecast daily average PM2.5 concentrations for the next 24 hours. Liu and Sun (2019) used the complementary ensemble empirical mode decomposition algorithm, in which a random forest was applied to the decomposition sequences, to effectively capture the trend of PM2.5 concentration.
In recent years, more effective data mining methods like artificial neural networks and support vector machines have also been successfully implemented in air pollution forecasting. Artificial neural networks, principal component analysis, and k-means clustering technology were combined by Franceschi et al. (2018) to forecast the PM10 and PM2.5 concentrations in Bogotá, Colombia. A hybrid model based on principal component analysis (PCA) and cuckoo search algorithm (CS) optimized least square support vector machine (LSSVM) method was developed by Sun and Sun (2016) to predict PM2.5 concentrations. Gan et al. (2018) proposed a new method based on the secondary-decomposition-ensemble learning paradigm to forecast hourly PM2.5 concentration, in which the least square support vector was used to model all reconstructed components independently. These findings show that the support vector machine method is very effective at predicting PM2.5 concentrations.
In this work, we introduced the PCA-clustering method into CS algorithm optimized SVR for the prediction of PM2.5 concentrations in Beijing. First, we investigated the correlation between pollutant factors, meteorological factors, and PM2.5 concentrations, and extracted the pollutant variables and patterns using the PCA-clustering method to assist prediction. Then, comparative studies on parameter optimization algorithms for SVR were carried out, including the cross-validation (CV), particle swarm optimization (PSO), and cuckoo search (CS) algorithms, to achieve better prediction efficiency. Evaluation metrics such as RMSE, MAE, MAPE, and R were introduced as part of the process. Finally, to further verify the effectiveness of our method, other predictive models such as multivariate nonlinear regression (MNR) and artificial neural network (ANN) were also tested. The obtained results indicated that the PCA-clustering approach with SVR optimized by the CS algorithm produced the best prediction accuracy.

Sites and Data
Yizhuang station in Beijing has the most advanced meteorological observation equipment in China, enabling it to provide the most accurate data. Therefore, we model and simulate the PM2.5 predictions with the pollutant data and meteorological data from the Yizhuang observation station as shown in Fig. 1.
The pollutant factors include PM10, SO2, NO2, O3, and CO, which were collected from the Beijing Municipal Environmental Monitoring Center, and the meteorological factors include atmospheric pressure (P), relative humidity (RH), air temperature (T), wind speed (WS), wind direction (WD), and the 20:00-20:00 cumulative precipitation (CP), which were collected from the National Meteorological Information Center. The details of the original data are shown in Table 1. The 24-h averages of the pollutant factors were calculated for the purpose of PM2.5 concentration prediction. Fig. 2 shows the PM2.5 concentration and temperature of Yizhuang station from October 14, 2014, to December 31, 2017. When a variable's data was missing for several days in a row, the data for the associated dates was removed, and the sporadic missing data was imputed using the EM imputation method.
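The gap-handling step above can be sketched as follows. This is a minimal illustration, not the authors' code: the `max_gap` threshold and the toy series are hypothetical, and simple linear interpolation stands in here for the EM imputation used in the paper.

```python
import numpy as np
import pandas as pd

def preprocess_daily(df, max_gap=3):
    """Drop dates that fall inside long missing runs; fill sporadic gaps.

    max_gap is a hypothetical cutoff (the paper only says "several days in
    a row"); interpolation is a stand-in for the paper's EM imputation.
    """
    bad = pd.Series(False, index=df.index)
    for col in df.columns:
        na = df[col].isna()
        # length of the consecutive-NaN run each row belongs to
        run_len = na.groupby((~na).cumsum()).transform("sum")
        bad |= na & (run_len > max_gap)
    kept = df.loc[~bad]
    return kept.interpolate(limit_direction="both")

# toy example: one 5-day gap (rows dropped) and one single-day gap (imputed)
idx = pd.date_range("2017-11-01", periods=12, freq="D")
pm10 = pd.Series([80, 81, np.nan, 83, 84, np.nan, np.nan,
                  np.nan, np.nan, np.nan, 90, 91], index=idx)
clean = preprocess_daily(pd.DataFrame({"PM10": pm10}))
```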
As shown in Fig. 2, the trends of PM2.5 concentration and temperature are opposite. Low PM2.5 values were observed in the warm period from April to September, while high PM2.5 values were observed on the cold days from October to March of the following year, some of which even exceeded 500 μg·m⁻³. Considering that high-level PM2.5 concentration has a great impact on people's lives, we use the atmospheric environment data of cold days to establish the prediction model for PM2.5 concentration.

PCA-Clustering Method to Extract Pollutant Variables and Pollutant Patterns
The principal component analysis (PCA) algorithm has been widely applied to reduce the dimension of a data set on the premise of retaining the main variance. A new set of variables, called principal components (PCs), is obtained by the PCA transformation. To simplify the structure of the dataset, only the first few PCs with large variance are usually chosen to reflect the information of the original variables in practice. In most cases, a cumulative variance contribution rate of more than 85% for the first several principal components is appropriate. For the purpose of this study, PCA was combined with the correlation coefficients between PM2.5 concentration and the related covariates to find the primary influencing factors. Moreover, k-means clustering was further introduced to extract pollutant patterns.
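A minimal sketch of this PCA-clustering pipeline, assuming scikit-learn and synthetic data in place of the pollutant measurements; that the clustering runs on the retained PC scores (rather than the raw variables) is our assumption here:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                     # stand-in for 6 pollutant variables
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)    # induce correlation so few PCs dominate

Xs = StandardScaler().fit_transform(X)

# keep the leading PCs whose cumulative variance contribution exceeds 85%
pca = PCA().fit(Xs)
cum = np.cumsum(pca.explained_variance_ratio_)
n_pc = int(np.searchsorted(cum, 0.85)) + 1
scores = PCA(n_components=n_pc).fit_transform(Xs)

# k-means on the retained PC scores yields the pollutant pattern label
pattern = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
```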
Support vector machine regression model: The support vector machine (SVM), originally developed by Vapnik in the 1990s (Vapnik 1995, 1998), is one of the most robust and accurate data mining algorithms, mainly including support vector machine classification (SVC) and support vector machine regression (SVR). It is very flexible in solving all kinds of nonlinear classification and regression problems (Wu & Kumar 2013). In this paper, SVR has been used to build the PM2.5 concentration prediction model with satisfying results.
In the SVR model, the training data is set as $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$, where $x_i$ is the input variable and $y_i$ is the corresponding dependent variable. To learn a $g(x)$ close to $y$, the SVR linear regression model can be written as follows:

$g(x) = w^T x + b$ …(1)

where $w, b$ are the pending parameters. To obtain larger intervals and tolerate a small amount of noisy data, slack variables $\xi_i$ and $\hat{\xi}_i$ are further introduced, and the SVR regression problem can be expressed as follows:

$\min_{w, b, \xi_i, \hat{\xi}_i} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} (\xi_i + \hat{\xi}_i)$
$\text{s.t.} \quad g(x_i) - y_i \le \epsilon + \xi_i, \quad y_i - g(x_i) \le \epsilon + \hat{\xi}_i, \quad \xi_i \ge 0, \ \hat{\xi}_i \ge 0$ …(2)

where parameter $C$ is the penalty factor of the optimization problem and $\epsilon$ is the regression tolerance. Meanwhile, Lagrange multipliers $\mu_i \ge 0$, $\hat{\mu}_i \ge 0$, $\alpha_i \ge 0$, $\hat{\alpha}_i \ge 0$ are introduced to build the Lagrange function as follows:

$L(w, b, \alpha, \hat{\alpha}, \xi, \hat{\xi}, \mu, \hat{\mu}) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}(\xi_i + \hat{\xi}_i) - \sum_{i=1}^{m}\mu_i \xi_i - \sum_{i=1}^{m}\hat{\mu}_i \hat{\xi}_i + \sum_{i=1}^{m}\alpha_i\big(g(x_i) - y_i - \epsilon - \xi_i\big) + \sum_{i=1}^{m}\hat{\alpha}_i\big(y_i - g(x_i) - \epsilon - \hat{\xi}_i\big)$ …(3)

In Eq. (3), by fixing $\alpha, \hat{\alpha}, \mu, \hat{\mu}$, calculating the partial derivatives with respect to $w, b, \xi, \hat{\xi}$ and setting the results to 0, the following equations are obtained:

$w = \sum_{i=1}^{m}(\hat{\alpha}_i - \alpha_i)x_i, \quad 0 = \sum_{i=1}^{m}(\hat{\alpha}_i - \alpha_i), \quad C = \alpha_i + \mu_i, \quad C = \hat{\alpha}_i + \hat{\mu}_i$ …(4)

Putting the four equations of (4) into Eq. (3), and adding the Karush-Kuhn-Tucker (KKT) conditions to the obtained dual problem, Eq. (5) is achieved as follows:

$\max_{\alpha, \hat{\alpha}} \ \sum_{i=1}^{m}\big(y_i(\hat{\alpha}_i - \alpha_i) - \epsilon(\hat{\alpha}_i + \alpha_i)\big) - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}(\hat{\alpha}_i - \alpha_i)(\hat{\alpha}_j - \alpha_j)x_i^T x_j$
$\text{s.t.} \quad \sum_{i=1}^{m}(\hat{\alpha}_i - \alpha_i) = 0, \quad 0 \le \alpha_i, \hat{\alpha}_i \le C$ …(5)

with the KKT conditions $\alpha_i \hat{\alpha}_i = 0$, $\xi_i \hat{\xi}_i = 0$, $(C - \alpha_i)\xi_i = 0$, $(C - \hat{\alpha}_i)\hat{\xi}_i = 0$.

To efficiently solve the above optimization problem, the SMO algorithm is used. After determining the optimal Lagrange multipliers, the values of $w$ and $b$ can be obtained. Accordingly, the final SVR model can be defined as follows:

$g(x) = \sum_{i=1}^{m}(\hat{\alpha}_i - \alpha_i)k(x_i, x) + b$ …(6)

where $\phi(x)$ is the nonlinear mapping function that maps the data into a linear feature space of higher dimension. The kernel function $k(x_i, x) = \phi(x_i)^T \phi(x)$, satisfying Mercer's condition, can be used instead of the mapping function to avoid the complex dimension and computation problems. In this paper, the radial basis function is used as the kernel function (Eq. (7)):

$k(x_i, x) = \exp\left(-\dfrac{\|x - x_i\|^2}{2s^2}\right)$ …(7)

where $s^2$ is the width of the kernel.
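The ε-SVR model with the RBF kernel of Eq. (7) can be sketched with scikit-learn, whose `SVR` uses an SMO-type solver internally; note that scikit-learn parameterizes the kernel through `gamma`, which corresponds to 1/(2s²) in Eq. (7). The sinc toy data and parameter values below are illustrative, not the paper's:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sinc(X).ravel() + 0.05 * rng.normal(size=300)  # noisy 1-D target

C, s2 = 10.0, 0.5   # penalty C and kernel width s^2 (illustrative values)
model = SVR(kernel="rbf", C=C, gamma=1.0 / (2.0 * s2), epsilon=0.05)
model.fit(X, y)
pred = model.predict(X)
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
```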

SVR Optimized by the Cuckoo Search Algorithm
In the SVR nonlinear prediction model with radial basis function as kernel function, the penalty C and the width s 2 are the parameters. In this paper, the cuckoo search (CS) algorithm is introduced to optimize these two parameters to improve the efficiency and accuracy of prediction.
Yang & Deb (2009) presented the cuckoo search (CS) natural heuristic method, which mimics cuckoo brood parasitism behavior. The algorithm is enhanced by Levy flight rather than a simple isotropic random walk. The CS algorithm combines global search and local search, which are balanced by the discovery probability (P_a). This makes it possible to explore the search space more efficiently in the global scope, achieving the global optimum with a higher probability. While the PSO algorithm may converge prematurely to a local optimum, which is not necessarily the global optimal solution, CS can usually converge to the global optimum.
In the D-dimensional space, the population of nests is $n$, $X_{Nestpop} = [X_1, X_2, \dots, X_n]^T$, and each nest is a solution to the problem, represented by a D-dimensional vector $\{X_i = [X_{i1}, X_{i2}, \dots, X_{iD}]^T \mid i = 1, 2, \dots, n\}$.
After the nest population is formed randomly, CS updates the individuals through two paths:

i) the cuckoo uses the Levy-flight-based Eq. (8) to find a nest and lay an egg:

$X_i^{t+1} = X_i^t + \alpha \otimes S$ …(8)

In Eq. (8) and Eq. (9), $S$ is the random step size obeying the Levy distribution:

$S = \dfrac{u}{|v|^{1/\beta}}, \quad u \sim N(0, \sigma_u^2), \ v \sim N(0, 1)$ …(9)

where $\alpha$ is the scaling factor of the step size, which is set to 0.01, and $\beta$ is set to 1.5.

ii) the host uses a random walk to rebuild its nest after finding the alien egg with the probability $P_a$ (Eq. (12)):

$X_i^{t+1} = X_i^t + \gamma \cdot Heaviside(P_a - \hat{\epsilon}) \cdot (X_j^t - X_k^t)$ …(12)

where $P_a = 0.25$ is recommended, $\gamma$ and $\hat{\epsilon}$ are random numbers subject to a uniform distribution, and $X_j^t$ and $X_k^t$ are two randomly selected nests of generation $t$. $Heaviside(P_a - \hat{\epsilon})$ is the Heaviside step function: $Heaviside(P_a - \hat{\epsilon}) = 1$ when $P_a > \hat{\epsilon}$, and $Heaviside(P_a - \hat{\epsilon}) = 0$ when $P_a < \hat{\epsilon}$.
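The two update paths can be sketched as a generic CS minimizer. This is an illustrative implementation under our assumptions (Mantegna-style Levy steps, a sphere test function, and bounds chosen for demonstration), using α = 0.01, β = 1.5, and P_a = 0.25 as in the text:

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(size, beta=1.5, rng=None):
    """Mantegna's algorithm for Levy-distributed steps (beta = 1.5)."""
    rng = rng or np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, size)
    v = rng.normal(0, 1, size)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_search(f, bounds, n=20, pa=0.25, alpha=0.01, iters=300, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, float).T
    d = lo.size
    nests = rng.uniform(lo, hi, (n, d))
    fit = np.apply_along_axis(f, 1, nests)
    for _ in range(iters):
        # i) Levy-flight move toward new solutions (Eq. (8))
        cand = np.clip(nests + alpha * levy_step((n, d), rng=rng), lo, hi)
        cf = np.apply_along_axis(f, 1, cand)
        better = cf < fit
        nests[better], fit[better] = cand[better], cf[better]
        # ii) abandon a fraction pa of nests via a random walk between nests
        step = rng.uniform(size=(n, d)) * (nests[rng.permutation(n)]
                                           - nests[rng.permutation(n)])
        cand = np.clip(nests + (rng.uniform(size=(n, 1)) < pa) * step, lo, hi)
        cf = np.apply_along_axis(f, 1, cand)
        better = cf < fit
        nests[better], fit[better] = cand[better], cf[better]
    best = int(fit.argmin())
    return nests[best], float(fit[best])

# illustrative run on the 2-D sphere function
best_x, best_f = cuckoo_search(lambda x: float(np.sum(x ** 2)),
                               [(-5, 5), (-5, 5)])
```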
The flow chart of SVR prediction optimized by the CS algorithm (CS-SVR) is shown in Fig. 3.

Evaluation index for prediction results: To investigate the accuracy of the different PM2.5 concentration prediction models, four evaluation indexes are applied, including the Pearson correlation coefficient (R), root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). R reflects the relevance between the observed value and the predicted value (Eq. (12)). Mean squared error (MSE) is the expected value of the square of the difference between the predicted value and the observed value. Correspondingly, RMSE is the square root of MSE, which is more intuitive in order of magnitude (Eq. (13)); the smaller the RMSE value, the better the accuracy of the prediction model. MAE represents the mean of the absolute error between the predicted value and the observed value, which better reflects the actual prediction error (Eq. (14)). MAPE is used to better compare different models on the same set of data (Eq. (15)).
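The four indexes can be computed as below; the equation numbers refer to the text, and a nonzero observed series is assumed for MAPE:

```python
import numpy as np

def evaluate(obs, pred):
    """Return R, RMSE, MAE, MAPE between observed and predicted series."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    r = np.corrcoef(obs, pred)[0, 1]              # Pearson correlation (Eq. (12))
    rmse = np.sqrt(np.mean((pred - obs) ** 2))    # Eq. (13)
    mae = np.mean(np.abs(pred - obs))             # Eq. (14)
    mape = np.mean(np.abs((pred - obs) / obs))    # Eq. (15), obs must be nonzero
    return r, rmse, mae, mape

# tiny worked example with hypothetical concentrations
r, rmse, mae, mape = evaluate([100, 50, 80], [110, 45, 80])
```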

The Results of Extracting Pollutant Variables for Pattern Calculation
The atmospheric environment affecting PM2.5 concentration consists of pollutant factors and meteorological factors. The correlation coefficients (R) between the PM2.5 concentration and the pollutant and meteorological variables are shown in Table 2. The R values between the pollutant factors and PM2.5 concentrations are mostly higher than 0.5, while the R values for the meteorological variables are between 0.1 and 0.4. These results indicate that pollutant factors have a greater impact on the PM2.5 concentrations. Therefore, we decided to extract pollutant patterns to improve the prediction accuracy. The PCA-clustering method was employed for extracting the needed variables.
In the beginning, all 12 atmospheric environment variables, including 6 pollutant components (PM2.5, PM10, SO2, NO2, O3, CO) and 6 meteorological factors (P, RH, T, WS, WD, CP), were examined by the PCA method, and the results are shown in Tables 3 and 4. In our previous work, we confirmed that relative humidity (RH), temperature (T), and wind speed (WS) have a more significant impact on the concentration level of PM2.5. As a result, the PCA method was also applied to a set of nine variables (PM2.5, PM10, SO2, NO2, O3, CO, RH, T, and WS) (Tables 5 and 6). As shown in Table 3 and Fig. 4, the first three principal components explain 70.57% of the data variation, with the first one accounting for 40.39%. For the 9-variable calculation, however, the first three principal components explain 82.017%, and the first one accounts for 51.321% (Table 5 and Fig. 4). Therefore, 3 meteorological variables (RH, T, WS) will be introduced for predicting PM2.5 concentrations in the next part.
The rotated component matrix in Table 4 shows the interpretive competency of each variable for each principal component in the 12-variable (6 pollutant and 6 meteorological variables) computation. PM2.5, PM10, SO2, NO2, and CO all strongly explain the first principal component, whereas O3 shifts to the second principal component. The same pattern can be seen in the results of the 9-variable calculation (6 pollutant and 3 meteorological variables), as shown in Table 6. Based on the study above, PM10, SO2, NO2, and CO are used in the extraction of pollutant patterns, which will be generated using the clustering approach and used as a key input factor for the subsequent prediction model.

The Results of PM2.5 Concentration Prediction by CS-SVR Model
All cold-day data from October to the following March in 2014-2017 was used as the training set, while the data from November and December 2017 was used as the testing set for PM2.5 concentration prediction. The population of the nest is set to 20 in the CS optimization procedure, and the recommended discovery probability P_a = 0.25 is used. The search range of both the penalty C and the width s², the parameters to be optimized, is set to [0.01, 100]. In the optimization process, 100 iterations have been carried out.
First, three prediction models including CS-SVR, multivariate nonlinear regression (MNR), and artificial neural network (ANN) were studied using 11 variables. As shown in Fig. 5, the absolute values of the relative errors of CS-SVR are much lower than those of MNR and ANN. The prediction accuracy of the different models was further compared using the four evaluation indexes R, MAE, RMSE, and MAPE (Table 7). The R value between the predicted and observed PM2.5 concentrations for CS-SVR (0.9430) is higher than those of ANN (0.9342) and MNR (0.9326). Meanwhile, the MAE, RMSE, and MAPE indicators of CS-SVR decreased by 30.10%, 10.22%, and 70.87% compared with MNR, and by 13.88%, 3.4451%, and 51.48% compared with ANN, respectively. All of these findings indicate that the CS-SVR model outperforms the MNR and ANN models in predicting PM2.5 concentrations. In addition, different optimization methods for SVR were also investigated (Table 7). All four indexes (R, MAE, RMSE, and MAPE) of CS-SVR are better than those of CV-SVR and PSO-SVR.
In addition, eight factors were used in the CS-SVR prediction model. As demonstrated in Fig. 6 and Table 8, CS-SVR still outperforms ANN and MNR in terms of prediction performance, with a higher R value and smaller MAE, RMSE, and MAPE indicators. Therefore, from the above two groups of comparative experiments, the prediction accuracy of the CS-SVR model is better than that of the other two models in terms of each index.
Interestingly, although using fewer variables, the 8-variable CS-SVR prediction shows acceptable prediction accuracy. However, the R value of the 8-variable CS-SVR prediction is slightly lower than that of the 11-variable one (0.9388 vs. 0.9430). To further improve the prediction accuracy of the 8-variable CS-SVR model, we used the PCA-clustering method to extract the pollutant pattern as an additional variable for PM2.5 concentration prediction. The obtained prediction results taking the pollutant pattern variable into account are shown in Table 9. It is quite clear that when the pollutant pattern variable is involved in the calculation, the prediction accuracy of all models improves. In particular, for the CS-SVR model, when the pollutant data was divided into three patterns (k = 3) by k-means clustering, the best prediction accuracy was achieved. The R value increased to 0.9455, while the MAE, RMSE, and MAPE values decreased to 11.2523, 16.7114, and 0.3006, respectively, the lowest values of all models. As a result, the PCA-clustering extracted pollutant-pattern-based CS-SVR model predicts PM2.5 concentrations the best.
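One way to wire the pattern variable into the model is to append the cluster label to the feature matrix before fitting. This sketch uses synthetic data, a hypothetical column layout, and a raw-valued label column; the paper does not state the encoding, and one-hot encoding would be an alternative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 8))              # stand-in for the 8 predictors
y = 2.0 * X[:, 0] + rng.normal(size=150)   # stand-in for PM2.5 concentration

# pollutant pattern (k = 3) from the pollutant columns, added as a 9th predictor
pollutants = X[:, :5]                      # hypothetical: first 5 columns are pollutants
pattern = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(pollutants))
X_aug = np.column_stack([X, pattern])

model = SVR(kernel="rbf", C=10.0, gamma=0.1).fit(X_aug, y)
```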

CONCLUSION
In this work, we applied principal component analysis (PCA)-clustering-based pollutant pattern recognition and CS algorithm optimized SVR to PM2.5 concentration prediction. In comparison with a prediction based on all meteorological factors, relative humidity, air temperature, and wind speed could be chosen to provide an acceptable prediction accuracy. To further improve the prediction results, a new pollutant pattern variable (k) was extracted by the PCA-clustering method and added to the calculation. Support vector regression outperforms the multivariate nonlinear regression and artificial neural network models, as shown by indices such as RMSE, MAE, MAPE, and R. Furthermore, the cuckoo search algorithm was used to further optimize the parameters in the SVR process, which resulted in a better prediction of PM2.5 concentration than the cross-validation and particle swarm optimization algorithms. According to these comparative studies, the best PM2.5 concentration prediction accuracy could be obtained by the CS-SVR model with three pollutant patterns (k = 3). Pollutant and meteorological data from more observation stations will be introduced in the future to further prove the reliability of our prediction model and achieve higher prediction accuracy.