A Multiple Kernel Learning Approach for Air Quality Prediction

Air quality prediction is an important research issue due to the increasing impact of air pollution on the urban environment. However, existing methods often fail to forecast high-polluting air conditions, which is precisely what should be highlighted. In this paper, a novel multiple kernel learning (MKL) model that embodies the characteristics of ensemble learning, kernel learning, and representation learning is proposed to forecast the near-future air quality (AQ). The centered alignment approach is used for learning kernels, and a boosting approach is used to determine the proper number of kernels. To demonstrate the performance of the proposed MKL model, it is compared to the classical autoregressive integrated moving average (ARIMA) model; widely used parametric models like the random forest (RF) and support vector machine (SVM); a popular neural network model, the multiple layer perceptron (MLP); and the long short-term memory (LSTM) neural network. Datasets acquired from a coastal city, Hong Kong, and an inland city, Beijing, are used to train and validate all the models. Experiments show that the MKL model outperforms the other models. Moreover, the MKL model has better forecast ability for high health risk category AQ.


Introduction
With the development of the economy and society all over the world, most metropolitan cities are experiencing elevated concentrations of ground-level air pollutants, especially in fast-developing countries like India and China. Exposure to air pollution can affect everyone, but it can be particularly harmful to people with a heart disease or a lung condition, elderly people, and children. Studies show that long-term exposure to fine particulate air pollution or traffic-related air pollution is associated with natural-cause mortality, even at concentration ranges well below the standard annual mean limit value [1,2]. Therefore, building an early warning system, which provides precise forecasts and also raises health alarms to local inhabitants, will provide valuable information to protect humans from damage by air pollution.
Currently, three major approaches are used to forecast real-time air quality: simple empirical approaches, advanced physically based approaches, and machine learning approaches.
Simple empirical approaches like the persistence method and the climatology method are based on assumptions or hypotheses; that is, thresholds of forecasted meteorological variables can indicate the future pollution level [3]. They are computationally fast but have low accuracy and are primarily used as references by other methods. Advanced physically based approaches like chemical transport models (CTMs) simulate the formation and accumulation of air pollutants by solving the conservation equations and transformation relationships among the mass of various chemical species and physical states. They can provide valuable insights for understanding pollutant diffusion mechanisms, but they are computationally expensive, demand reliable meteorological predictions, and require a high level of expertise [4].
Machine learning methods are computationally fast and cost-effective and can provide promising prediction accuracy. Various machine learning methods have been applied to predict air quality. Widely used methods include classical autoregressive moving average (ARMA) methods like the autoregressive integrated moving average (ARIMA) [5], support vector machine (SVM) methods like the support vector classifier (SVC) [6,7], ensemble methods like the random forest (RF) [8,9], artificial neural network (ANN) methods like the multiple layer perceptron (MLP) [10,11], and deep learning methods like the long short-term memory neural network (LSTM NN) [12,13].
Among the models mentioned above, ARIMA is a time series model and is often used as a baseline. The performance of the SVM model often hinges on the appropriate choice of the kernel. A kernel in SVM introduces nonlinearity into the problem by mapping input data implicitly into a Hilbert space where it may then be linearly separable [14]. Neural network models, especially deep neural networks, can automatically learn representations from raw data, but it takes a long time and a large volume of data to train a well-behaved network.
Multiple kernel learning (MKL) has been proposed as an alternative to cross validation, feature selection, metric learning, and ensemble methods. MKL refers to using multiple kernels instead of a single one; most algorithms that make use of the kernel trick, such as SVM and kernel ridge regression (KRR), can take advantage of MKL. In MKL, feature combination and classifier training are done simultaneously, and different data formats can be used in the same formulation. In addition, the inherent ability to combine linear and nonlinear kernels makes MKL promising for information fusion problems. There is a significant amount of work in the literature on combining multiple kernels [15,16]. Various applications indicate that performance gains can be achieved by linear and nonlinear kernel combinations using MKL methods [17-19].
In this paper, a novel multiple kernel learning-based air quality prediction approach that can inherently capture the characteristics of the heterogeneous time, meteorology, and air pollutant data is proposed. Real datasets from a coastal city, Hong Kong, and an inland city, Beijing, are used to demonstrate the effectiveness of the proposed approach. Comprehensive comparison experiments with ARIMA, RF, SVCs, MLP, and LSTM are conducted.
Though some of the algorithms can automatically learn representative features of the data, pretraining feature engineering is still necessary and significantly affects the models' performance. In addition, hyperparameter tuning is critical for all the parametric models. Therefore, in this paper, special attention is paid to the feature engineering and parameter tuning process. The methodologies applied to the Hong Kong and Beijing datasets are similar; therefore, Hong Kong is used for demonstration in most of the paper. The main contributions of this paper are as follows: (1) A multiple kernel learning approach is introduced into the domain of air quality prediction for the first time. Multiscale predictions of the next 1, 3, 6, 9, and 12 hours' air quality of an inland city, Beijing, and a coastal city, Hong Kong, are presented.
(2) The proposed method can effectively capture the air quality features from the hybrid time, meteorology, and air pollutant data. The experimental results demonstrate the advantages of this approach over some widely used models, especially in the prediction of severe air pollution conditions. The rest of the paper is organized as follows: Section 2 presents the methodology of the multiple kernel learning algorithm; data preparation is introduced in Section 3; in Section 4, extensive experimental results and necessary discussions are presented; and Section 5 concludes this paper.

Methodology
While classical kernel-based classifiers such as SVCs are based on a single kernel, in practice, it is often desirable to base classifiers on combinations of multiple kernels since data points typically come from multiple heterogeneous sources. A kernel implicitly represents a notion of similarity for the data, and different kernels accommodate different nonlinear mappings; MKL provides a way to combine different notions of similarity. Using a specific kernel may be a source of bias, and MKL provides a way to select optimal kernels and parameters from a larger set of kernels. In the air quality prediction case, the source data come from different modalities. Therefore, in this paper, instead of using just a single kernel, which is usually more suitable for a homogeneous data source, multiple kernels are combined, and the classical and empirically successful support vector classifier is used as the base learner.
A detailed introduction to the kernel support vector machine is given in Appendix A. In this section, the multiple kernel learning approach is described first, and then the centered alignment method for learning kernels is introduced.

Multiple Kernel Learning.
MKL is conceptually similar to single kernel learning; in other words, single kernel learning is a special case of MKL. In MKL, the final kernel is learnt as a combination (linear or nonlinear) of many base kernels from the data itself:

κ_η(x_i, x_j) = f_η(κ_1(x_i^1, x_j^1), κ_2(x_i^2, x_j^2), ..., κ_P(x_i^P, x_j^P)),

where f_η : R^P → R is the combination function, κ_m is the m-th base kernel function defined on the m-th feature representation, and η parameterizes the combination function.
It is also possible to integrate η into the kernel functions, where it is optimized during training.
Most existing MKL algorithms fall into the first category and try to combine predefined kernels in an optimal way. Commonly used kernels are the linear, polynomial, radial basis function (RBF), and sigmoid kernels.
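As a sketch, the four predefined kernels can be instantiated as Gram matrices with scikit-learn's pairwise helpers (toy data; the coefficient values are illustrative, not the tuned ones from Table 6):

```python
import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))  # 20 toy samples, 6 features

# One Gram matrix per predefined kernel type.
base_kernels = {
    "linear": linear_kernel(X),
    "poly": polynomial_kernel(X, degree=3, coef0=1.0),
    "rbf": rbf_kernel(X, gamma=0.5),
    "sigmoid": sigmoid_kernel(X, gamma=0.01, coef0=0.0),
}

for name, K in base_kernels.items():
    assert K.shape == (20, 20)               # square Gram matrix
    assert np.allclose(K, K.T, atol=1e-10)   # kernel matrices are symmetric
```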

Advances in Meteorology

The kernels can be combined in different ways, and each way has its own combination parameter characteristics. Generally, linear combination methods are used, and they fall into two basic categories: the unweighted sum (i.e., using the sum or mean of the kernels as the combined kernel) and the weighted sum. In the weighted sum case, the combination function is linearly parameterized:

κ_η(x_i, x_j) = Σ_{m=1}^{P} η_m κ_m(x_i^m, x_j^m),

where η denotes the kernel weights. Different versions of this approach differ in the restrictions they put on η: the linear sum allows arbitrary real values η_m, the conic sum requires each η_m to be nonnegative, and the convex sum additionally requires η to sum to 1. The conic sum and convex sum are special cases of the linear sum, but the former two are used more often because the relative importance of the combined kernels can be read off from the kernel weights. Furthermore, the kernel weights of the conic and convex sums correspond to scaling the feature spaces when they are nonnegative [20].
In this paper, the conic sum restriction is used, as the convex sum is a special case of the conic sum. The resulting decision function of the multiple kernel support vector classifier (MKSVC) is defined as

f(x) = sign( Σ_{i=1}^{n} α_i y_i Σ_{m=1}^{P} η_m κ_m(x_i^m, x^m) + b ).

There are four important parameters: the number of kernels (P), the inner kernel coefficients of each kernel, the features to use for each kernel (x_i^m), and the weight (η_m) of each kernel. In this paper, the inner kernel coefficients are obtained by optimizing the single kernel-based learners. η is obtained by the centered alignment approach proposed in [32]. P is obtained through a boosting approach, by iteratively adding a new kernel until the performance stops improving (the kernels are added based on the weights learned by the centered alignment approach, with higher-weight kernels first). As for the features used by each kernel, for simplicity, the canonical multiple kernel learning approach is used, namely, one kernel combination over all feature representations. The pseudocode of the MKSVC is described in Algorithm 1.
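A minimal sketch of the weighted-sum MKSVC step: base Gram matrices are combined with nonnegative (conic) weights and fed to an SVC with a precomputed kernel. The weights here are placeholders; in the paper they come from the centered alignment approach:

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_train = rng.normal(size=(40, 5))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)  # toy labels
X_test = rng.normal(size=(10, 5))

def combined_gram(XA, XB, eta):
    """Conic sum of base kernels: K_eta = sum_m eta_m * K_m, eta_m >= 0."""
    kernels = [linear_kernel(XA, XB),
               rbf_kernel(XA, XB, gamma=0.5),
               polynomial_kernel(XA, XB, degree=2)]
    return sum(w * K for w, K in zip(eta, kernels))

eta = np.array([0.5, 0.3, 0.2])       # placeholder nonnegative weights
clf = SVC(kernel="precomputed", C=1.0)
clf.fit(combined_gram(X_train, X_train, eta), y_train)
pred = clf.predict(combined_gram(X_test, X_train, eta))
assert pred.shape == (10,)
```

Note that prediction needs the cross-Gram matrix between test and training samples, which is why `combined_gram` takes two sample sets.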

Centered Alignment Method for Learning Kernels.
Centered alignment is used as a similarity measure between kernels or kernel matrices. Given p kernel matrices Κ_1, Κ_2, ..., Κ_p, centered kernel alignment learns a linear combination of kernels resulting in a combined kernel matrix:

Κ_η = Σ_{q=1}^{p} μ_q Κ_cq,

where p is the number of kernels, μ_q is the centered kernel weight, and Κ_cq is the centered kernel matrix:

Κ_cq = (I − (1/m) 1 1^T) Κ_q (I − (1/m) 1 1^T),

where I is the identity matrix, 1 ∈ R^{m×1} denotes the vector with all entries equal to one, and Κ_q is the original kernel matrix. The alignment between two kernel functions Κ and Κ′ is defined by

ρ(Κ, Κ′) = ⟨Κ_c, Κ_c′⟩_F / (‖Κ_c‖_F ‖Κ_c′‖_F),

where Κ_c and Κ_c′ are the centered versions of Κ and Κ′, ⟨·, ·⟩_F denotes the Frobenius product, and ‖·‖_F the Frobenius norm defined by ‖Κ‖_F = √⟨Κ, Κ⟩_F; ρ(Κ, Κ′) ∈ [0, 1] by definition.
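A NumPy sketch of the centering and alignment computations above, on toy data; the proportional-weight rule at the end is the independent alignment-based weighting the text goes on to describe:

```python
import numpy as np

def center(K):
    """Center a Gram matrix: K_c = (I - 11^T/m) K (I - 11^T/m)."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ K @ H

def alignment(K1, K2):
    """Centered alignment rho(K1, K2) via the Frobenius product and norms."""
    K1c, K2c = center(K1), center(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
target = np.outer(y, y)                       # ideal target kernel y y^T

sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
base = [X @ X.T, np.exp(-0.5 * sq)]           # linear and RBF Gram matrices

# Weight each base kernel in proportion to its alignment with the target.
weights = np.clip([alignment(K, target) for K in base], 0.0, None)
weights = weights / weights.sum()             # normalized conic weights
```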

Algorithm 1: MKSVC.

Start
  First, get the inner kernel coefficients by optimizing the single kernel-based learners (κ_m(x_i, x_j))
  Second, get the weight of each kernel by the centered kernel alignment algorithm (η)
  Third, get the number of kernels by the boosting approach (P)
  Fourth, get the combined optimized kernel κ_η(x_i, x_j) = Σ_{m=1}^{P} η_m κ_m(x_i^m, x_j^m)
  Then, use SVC as the base learner and optimize it with a general optimization algorithm
Return

The alignment between each base kernel and the target kernel is computed independently by using the training samples, and the centered kernel weight can be chosen proportional to that alignment. Thus, the resulting kernel matrix is defined by

Κ_η = Σ_{q=1}^{p} ρ(Κ_cq, y y^T) Κ_cq,

where y is the vector of training labels.
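The boosting-style choice of P can be sketched as follows, assuming the base kernels are already ordered by their alignment weights; the data, weights, and split are illustrative:

```python
import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
tr, va = np.arange(60), np.arange(60, 80)    # simple train/validation split

# Base Gram matrices, assumed already sorted by centered-alignment weight.
grams = [linear_kernel(X), rbf_kernel(X, gamma=0.3),
         polynomial_kernel(X, degree=2), sigmoid_kernel(X, gamma=0.01)]
eta = [0.4, 0.3, 0.2, 0.1]                   # placeholder weights, highest first

best_score, P = -np.inf, 0
for p in range(1, len(grams) + 1):
    K = sum(w * G for w, G in zip(eta[:p], grams[:p]))
    clf = SVC(kernel="precomputed").fit(K[np.ix_(tr, tr)], y[tr])
    score = clf.score(K[np.ix_(va, tr)], y[va])
    if score <= best_score:                  # stop once a new kernel no longer helps
        break
    best_score, P = score, p
```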

Data Preparation
In this paper, two datasets are used: one is from Hong Kong, a coastal city whose air condition is relatively good, and the other is from Beijing, an inland city whose air condition is relatively poor. The Hong Kong (HK) dataset contains two years of hourly meteorology and pollutant data between 1 February 2013 and 31 January 2015, collected from HK's Sha Tin air quality monitoring station [21] and weather forecast station [22]. The Beijing dataset contains five years of hourly PM2.5 and meteorology data between 1 January 2010 and 31 December 2014, collected from the UCI machine learning repository [23].

Prediction Target.
The prediction targets in this paper are the air quality health index (AQHI) in Hong Kong and the PM2.5 individual air quality level (IAQL) in Beijing. AQHI and IAQL are scales designed to help people understand the impact of air quality on health. Unlike raw air pollutant concentrations, these air quality indices provide the public with advice on how to protect their health during air quality levels associated with low, moderate, high, and very high health risks. They also provide advice on how to improve air quality by proposing behavioral changes to reduce the environmental footprint [24,25].
For any given hour, the AQHI is calculated from the sum of the percentage excess risk (%AR) of daily hospital admissions attributable to the 3-hour moving average concentrations of four criteria air pollutants: ozone (O3), nitrogen dioxide (NO2), sulphur dioxide (SO2), and particulate matter (PM) (respirable suspended particulates (RSP or PM10) or fine suspended particulates (FSP or PM2.5), whichever poses a higher health risk).
The IAQL is classified based on the individual air quality index (IAQI), which is calculated according to a formula published by China's Ministry of Environmental Protection (MEP) [26]. The pollutant with the highest IAQI among SO2, NO2, O3, carbon monoxide (CO), PM2.5, and PM10 at a given time is called the primary or dominant pollutant, and its IAQI is chosen as the overall AQI value. In China, PM2.5 is the primary pollutant most of the time; therefore, its IAQI is usually the overall AQI.
The detailed information for calculating the AQHI and IAQI is given in Appendix B. These indices are health protection tools used to make decisions to reduce short-term exposure to air pollution by adjusting activity levels during increased levels of air pollution. Table 1 shows the health risks with the corresponding air quality classifications.

Performance Metric.
In this paper, accuracy, mean square error (mse), weighted precision (wp), weighted recall (wr), and weighted f1-score (wf) are used to evaluate the effectiveness of all the algorithms. The precision (P) of a class is calculated by the formula TP/(TP + FP), where TP is the number of true positives and FP is the number of false positives. Recall (R) is the proportion of instances of a given class that are correctly classified, that is, TP/(TP + FN), where FN is the number of false negatives. The f1-score is the harmonic mean of precision and recall [27]. The weighted variants average the per-class scores weighted by the number of true instances of each class.
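These metrics can be computed with scikit-learn's implementations (the paper's models are also built on scikit-learn); the toy labels below are illustrative only:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, mean_squared_error,
                             precision_recall_fscore_support)

y_true = np.array([1, 2, 3, 3, 2, 1, 4, 4, 2, 3])
y_pred = np.array([1, 2, 3, 2, 2, 1, 4, 3, 2, 3])  # two off-by-one-level errors

acc = accuracy_score(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
wp, wr, wf, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)

assert np.isclose(acc, 0.8)   # 8 of 10 correct
assert np.isclose(mse, 0.2)   # two errors, each off by one level
```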
For accuracy and mse,

accuracy(y, ŷ) = (1/n) Σ_{i=1}^{n} 1(ŷ_i = y_i),
mse(y, ŷ) = (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)^2,

where ŷ_i is the predicted value of the ith sample, y_i is the corresponding true value, and n is the number of samples.

Featured Data.
Take the dataset of HK for example. The following air pollutant data features are contained: FSP, NO2, NOx, O3, RSP, and SO2 (the unit of measurement of all the air pollutants is μg/m3). Air pollutant data samples are shown in Table 2.

The following meteorology data features are contained: T, P0, P1, δP, H, WD, WP, and dew. Meteorological samples are shown in Table 3.
The following time stamp features are contained: month, the day of the week (week), the day of the month (day), and the hour of the day (hour). There may be a yearly trend in the air quality, but only a limited number of years of data are available, so "year" is not included in the feature set.

Feature Transformation
(1) Encoding Wind Direction. Among the data obtained, the wind direction is nonnumeric (e.g., "east", "east-southeast"). It has to be converted to numerical values so that the algorithms can make use of it. One-hot encoding (e.g., "east" is encoded as a binary indicator vector with a single 1 in the position corresponding to "east") and label encoding (e.g., "east" is encoded as 1, "south" is encoded as 2, etc.) were tried in this paper. Figure 1 shows the forecast performances of the RF, MLP, and SVC_linear (SVC with linear kernel) algorithms when the wind direction was encoded by one-hot encoding and label encoding, respectively, with the parameters of the algorithms unchanged. From the figure, it is obvious that label encoding is superior to one-hot encoding on this dataset. Therefore, in this paper, the wind direction was label encoded.
(2) Missing Data Imputation. Linear interpolation was used in this paper to interpolate the missing values in the two datasets:

V_{s+t} = V_s + t (V_e − V_s) / n, t = 1, ..., n − 1,

where V_s and V_e are the last and next observed values bounding the gap, V_{s+t} denotes the missing value t steps after time s, and n is the time gap between V_s and V_e.
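pandas' linear interpolation implements this gap filling directly for an hourly series; a minimal sketch:

```python
import numpy as np
import pandas as pd

# V_s = 10 and V_e = 16 bound a gap of n = 3 steps.
s = pd.Series([10.0, np.nan, np.nan, 16.0])
filled = s.interpolate(method="linear")

assert np.allclose(filled.values, [10.0, 12.0, 14.0, 16.0])
```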
(3) Data Normalization. Normalization or standardization of either input or target variables tends to make the training process better behaved. Normalization scales the feature values into the range [0, 1]:

x′ = (x − x_min) / (x_max − x_min).

Standardization transforms the feature values to have zero mean and unit variance:

x′ = (x − μ) / σ,

where μ and σ are the mean and standard deviation of the feature. To see whether normalization or standardization helps, both were tried and compared with the case without any processing. Again, RF, MLP, and SVC_linear were used as the validation algorithms. The results are shown in Figure 2. The figure shows that, generally, models benefit from normalization or standardization, especially the neural network model. Normalization is slightly better than standardization; therefore, in this paper, the data were normalized.

Note (Table 3 legend): T, air temperature (degrees Celsius) at 2 meters above the Earth's surface; P0, atmospheric pressure at weather station level (millimeters of mercury); P1, atmospheric pressure reduced to mean sea level (millimeters of mercury); δP, pressure tendency, the change in atmospheric pressure over the last three hours; H, relative humidity (%) at a height of 2 meters above the Earth's surface; WD, mean wind direction (compass points) at a height of 10-12 meters above the Earth's surface over the 10-minute period immediately preceding the observation; WP, mean wind speed at a height of 10-12 meters above the Earth's surface over the 10-minute period immediately preceding the observation (meters per second); dew, dew point at 2 meters above the Earth's surface (degrees Celsius).
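The normalization and standardization transforms described in this subsection map directly onto scikit-learn's MinMaxScaler and StandardScaler; a minimal sketch:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # one toy feature column

X_norm = MinMaxScaler().fit_transform(X)     # scaled into [0, 1]
X_std = StandardScaler().fit_transform(X)    # zero mean, unit variance

assert np.isclose(X_norm.min(), 0.0) and np.isclose(X_norm.max(), 1.0)
assert np.isclose(X_std.mean(), 0.0) and np.isclose(X_std.std(), 1.0)
```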

Advances in Meteorology
Meteorological (M) data features: <T, P0, P1, δP, H, WD, WP, dew>. Air pollutant (AP) data features: <FSP, NO2, NOx, O3, RSP, SO2>. Time data features: <month, week, day, hour>. The target is to forecast the near-future AQHI. However, not all the features above are related to the AQHI, so finding out the features which are correlated with the target would be beneficial. The historical pollutant and meteorology values may impact the future air quality, as the simple empirical approaches assume, so finding out the influential historical time lag would be important as well.
(1) Feature Correlation Analysis. In this paper, Spearman's correlation analysis was used due to the possible nonlinear relationships between variables. Spearman's rank correlation coefficient measures the monotonic association between two variables and relies on the rank order of values [28]. The formula for Spearman's coefficient is

ρ_s = cov(rank_x, rank_y) / (σ(rank_x) σ(rank_y)),

where rank_x and rank_y are the ranked (sorted) values of variables x and y, cov(·) is the covariance, and σ(·) is the standard deviation. Figure 3 shows the Spearman correlation coefficients between the features of the HK dataset. Correlation scores range from −1 to 1: perfect positive correlation is 1, and perfect negative correlation is −1. The figure shows that FSP, O3, RSP, SO2, P0, and P1 have strong positive correlations with the AQHI, while T, H, and dew have strong negative correlations with the AQHI. Cohen's standard [29] was used in this paper to select the correlated features: features with an association smaller than 0.30 are discarded. The picked features are as follows: <FSP, NO2, NOx, O3, RSP, SO2, T, P0, P1, δP, H, dew, WP, WD, month, hour>.

(2) Temporal Correlation Analysis. Intuitively, historical data from different periods have different effects on future time lags. More recent events have a stronger influence on the current status, while earlier events have a weaker influence. Denote the current time as t, the historical time lag as h, and the future time lag as f; then the prediction time is t + f (f = 1, 3, 6, 9, 12) and the influential historical time is t − h (h = 1, 2, ..., n).
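This multiscale setup amounts to slicing the series into (history window, future value) pairs; a minimal sketch with toy data:

```python
import numpy as np

def make_samples(series, h, f):
    """Return X of shape (n, h), the values at t-h+1..t, and y, the value at t+f."""
    X, y = [], []
    for t in range(h - 1, len(series) - f):
        X.append(series[t - h + 1:t + 1])
        y.append(series[t + f])
    return np.array(X), np.array(y)

series = np.arange(20.0)                 # toy hourly series
X, y = make_samples(series, h=9, f=3)

assert X.shape == (9, 9)                 # nine 9-hour windows fit in 20 points
assert y[0] == 11.0                      # window 0..8 predicts the value at t+3
```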
The multiscale prediction task is represented in Figure 4. In this paper, the LSTM NN model, which is capable of learning long time series, was used to select the appropriate influential historical time lag [30].
The network architecture of the LSTM model used in the paper is shown in Figure 5, which is the same as the LSTM-extended network proposed in [13]. The main input is the air pollutant data, and the auxiliary input is the time and meteorology data.
There are two LSTM layers and one output layer, which is a fully connected layer with 11 neurons corresponding to the number of classes. The number of neurons in the LSTM layers has to be tuned. For simplicity, the number of neurons in each LSTM layer was set to the same value, chosen from a candidate set of {50, 100, 200, 500, 1000, 2000}. The setting that yielded the best performance was chosen based on several comparative experiments. The LSTM achieved the best performance when the number of neurons was 1000; therefore, in this paper, the number of neurons in the LSTM layers was set to 1000.
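A hedged Keras sketch of a network of this shape: two stacked LSTM layers over the pollutant sequence and an auxiliary time/meteorology input merged before the 11-class softmax output. The exact merge point and input sizes of the LSTM-extended network in [13] may differ, and the tiny layer width here is only to keep the sketch light (the paper uses 1000 units):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

seq = keras.Input(shape=(9, 6), name="air_pollutants")   # 9-hour window, 6 pollutants
aux = keras.Input(shape=(12,), name="time_meteorology")  # 4 time + 8 meteorology features

x = layers.LSTM(8, return_sequences=True)(seq)           # 1000 units in the paper
x = layers.LSTM(8)(x)
x = layers.concatenate([x, aux])                         # assumed merge point
out = layers.Dense(11, activation="softmax")(x)          # 11 AQHI classes

model = keras.Model([seq, aux], out)
pred = model.predict([np.zeros((2, 9, 6)), np.zeros((2, 12))], verbose=0)
assert pred.shape == (2, 11)
```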
The future 1, 3, 6, 9, and 12 hours' AQHIs were predicted in this paper. For each future time lag, the influences of different historical time lags were examined. The results are given in Table 4; the evaluation metric is the weighted f1-score (f1 in Table 4).
The corresponding curve graph is given in Figure 6. The results show that different future time lags (F-lag in Table 4) correspond to slightly different optimal historical time lags (H-lag in Table 4). The general influential time of historical data for a specific future time's AQHI is around 9 hours.
Notably, the results show that the prediction performance is poor for future time lags larger than 6, indicating that long-term prediction tasks are intrinsically more difficult. A small time lag cannot guarantee enough long-term memory inputs for the LSTM model, while large time lags admit an increased number of unrelated inputs, which increases the model's complexity and the difficulty of learning useful features. According to the above experiments, for simplicity, 9 was selected as the influential historical time lag for all future time lags.

Since RF, MLP, and SVC are widely used air quality forecast models, they were fine-tuned in this paper in order to make a fair comparison with the MKSVC, and the LSTM in this paper has the same structure as the LSTM-extended model proposed in [13]. Figure 7 shows the experimental flow. All algorithms were designed and tested in the same operating environment (Python 3.5.3, Windows 10, Intel® Core™ i7-5500U CPU @2.40 GHz, 16.0 GB RAM).

Parameter Optimization.
Parameter optimization refers to the method of finding optimal parameters for a machine learning algorithm. This is important since the performance of any machine learning algorithm depends to a huge extent on the values of its parameters. For each prediction time lag, the parameters are different for each algorithm, which means an optimal model needs to be tuned for each prediction task and each algorithm. The ways to obtain the parameters of the MKSVC are detailed in Section 2, and those of the LSTM in Section 3.2.2. For the other algorithms, the parameter tuning process of the one-hour future time lag prediction task is presented in the following part; the multiscale prediction tasks have identical fine-tuning processes. First, the grid search interval of a parameter is narrowed by analyzing the influence curve of a single parameter on the training score and the validation score. For instance, by varying the kernel coefficient γ of the RBF kernel in SVC_rbf, the γ-score curve can be obtained as shown in Figure 8. The yellow line denotes the score over the training set, the purple line represents the score on the validation set, and the shadow represents the variance.
The figure shows that, at first, both the training and validation scores rise as γ increases. However, when γ reaches around 0.5, a further increase results in an increase of the training score but a decrease of the validation score, which signifies that the model is overfitting. According to this influence curve, the grid search interval of γ in the next step can be narrowed to between 0.0 and 1.0.
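This single-parameter influence curve corresponds to scikit-learn's validation_curve; a sketch with toy data standing in for the HK dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
gammas = np.logspace(-3, 1, 5)

# One row per gamma value, one column per CV fold.
train_scores, valid_scores = validation_curve(
    SVC(kernel="rbf"), X, y, param_name="gamma", param_range=gammas, cv=5)

assert train_scores.shape == (5, 5)
# At the largest gamma the model overfits: training score exceeds validation score.
assert train_scores[-1].mean() + 1e-9 >= valid_scores[-1].mean()
```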
Based on the influence curves, the grid search intervals of the main parameters of the ARIMA, RF, MLP, and SVCs are shown in Table 5. The RF, MLP, and SVCs used in this paper are implemented in the scientific toolbox scikit-learn [31], and ARIMA is implemented in statsmodels [32]. The unlisted parameters are set to their defaults.
Then, a grid search with 5-fold cross validation was applied to find the optimal parameters. By exhaustively considering all parameter combinations in Table 5, the optimal parameter settings of the ARIMA, RF, MLP, and SVCs were obtained, as shown in Table 6. After obtaining the inner kernel coefficients of all the base kernels, the centered kernel alignment method described in Section 2.2 was used to get the optimal weight for each kernel.
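The grid search step can be sketched with GridSearchCV; the grid values here are illustrative, not the exact ranges of Table 5:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, n_features=8, random_state=1)

# Exhaustive search over the (narrowed) parameter grid with 5-fold CV.
grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 0.5, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X, y)

assert search.best_params_["C"] in grid["C"]
assert search.best_params_["gamma"] in grid["gamma"]
```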

Comparison.
For HK, one year's data was used for training, and the other year's data was used for testing. For Beijing, the first two years' data was used for training, and the other three years' data was used for testing. Comparisons of the predictions for the future 1, 3, 6, 9, and 12 hours are given below.

Predict the AQHI of Hong Kong.
The tables below show the performances of the algorithms for forecasting the future 1, 3, 6, 9, and 12 hours' AQHI in Hong Kong. From the tables, the following conclusions can be drawn: (1) MKSVC performs best on all three prediction tasks shown. SVC models with linear, RBF, and polynomial kernels perform better than the other models except the MKSVC, while the sigmoid kernel SVC consistently makes the poorest predictions. Generally, all the models make better predictions for light pollution than severe pollution due to the bias towards majority classes, which demonstrates that the task of severe air pollution prediction is challenging. As can be seen from the experiments, the long-term prediction task is difficult, and so is the task of predicting severe air pollution. Though the proposed multiple kernel learning-based approach demonstrated relatively good performance in terms of both long-term prediction and severe air pollution prediction, more sophisticated methods need to be explored in order to build a more comprehensive and effective air quality forecasting system.
The %AR of each pollutant depends on its concentration and a risk factor that was derived from local health statistics and air pollution data. The %AR is then compared to a scale to obtain the appropriate banding of the AQHI.
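A hedged sketch of the %AR computation: a commonly documented form takes the added risk of each pollutant as (exp(β · C) − 1) × 100% and sums over the four criteria pollutants, using the β coefficients listed in Appendix B. The concentrations below are illustrative 3-hour moving averages in μg/m3:

```python
import math

# Regression coefficients from Appendix B (PM entry uses the PM10 coefficient).
BETA = {"NO2": 0.0004462559, "SO2": 0.0001393235,
        "O3": 0.0005116328, "PM": 0.0002821751}

conc = {"NO2": 60.0, "SO2": 10.0, "O3": 80.0, "PM": 50.0}  # illustrative values

# Assumed form: %AR = sum_p (exp(beta_p * C_p) - 1) * 100%.
percent_ar = sum((math.exp(BETA[p] * conc[p]) - 1) * 100 for p in BETA)
assert percent_ar > 0
```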

Figure 2: Comparison with and without normalization.

Figure 5: The LSTM network architecture used in this paper.
Figure 6: Influences of different historical time lags over different future time lags.

Figures 11 and 12 are the confusion matrices of MKSVC and SVC_rbf when forecasting the next hour's PM2.5 IAQL of Beijing. The same conclusion can be drawn as for HK: generally, all the models make better predictions for light pollution than severe pollution due to the bias towards majority classes, which demonstrates that the task of severe air pollution prediction is challenging.

Table 1: Air quality classifications and health risks.

Table 4: Influences of different historical time lags over different future time lags.

Table 5: Main parameters and their tuning ranges of the used algorithms. p: AR specification; d: integration order; q: MA specification; C: regularization coefficient in SVC; n_estimators: number of trees in the forest; max_depth: maximum depth of the tree; max_features: maximum number of features when looking for the best split; min_samples_split: the minimum number of samples required to split an internal node; min_samples_leaf: the minimum number of samples required to be at a leaf node; solver: algorithm used in the optimization problem; hidden_layer_sizes: hidden layer size; alpha: regularization term parameter in MLP; activation: activation function for the hidden layer; gamma: kernel coefficient for 'rbf', 'poly', and 'sigmoid'; degree: degree of the polynomial kernel function; coef0: independent term in kernel functions for 'poly' and 'sigmoid'. *[a, b; c] means within range [a, b], increase by c every iteration; {} means a set of values.

Table 6: The optimal parameter settings of the algorithms.

Table 7: Performance comparison for predicting the next hour's AQHI in HK.

Table 8: Performance comparison for predicting the future 3 hours' AQHI in HK.

(2) Time series models like ARIMA and LSTM fail to compete with the widely used parametric models like RF, MLP, and SVCs; as the future time lag increases, the time series models' performance decreases, while the parametric models keep achieving very satisfying results. (3) Among the well-performing SVC models, the linear kernel model performs best, which demonstrates

Table 9: Performance comparison for predicting the future 6 hours' AQHI in HK.

(1) The proposed MKSVC algorithm offers better predictive ability than the other models. (2) The proposed MKSVC algorithm is capable of forecasting severe air pollution much better than the other models. (3) The widely used parametric models RF, MLP, and SVC exhibit better prediction performance than the time series models ARIMA and LSTM. (4) Feature transformation and feature selection play a significant role in making better air quality forecasts.
β(NO2) = 0.0004462559, β(SO2) = 0.0001393235, β(O3) = 0.0005116328, β(PM10) = 0.0002821751, and β(PM2.5) = 0.0002180567 are the added health risk factors (technically known as regression coefficients) of the respective pollutants [24].

B.2. Calculation of IAQI. Each pollutant's individual AQI is called its IAQI. The pollutant with the highest IAQI among the six pollutants at a given time is called the primary or dominant pollutant, and its IAQI is chosen as the overall AQI value:

IAQI_p = ((IAQI_Hi − IAQI_Lo) / (BP_Hi − BP_Lo)) (C_p − BP_Lo) + IAQI_Lo,
AQI = max{IAQI_1, IAQI_2, IAQI_3, ..., IAQI_n},     (B.3)

where C_p is the mass concentration of air pollutant p, BP_Hi is the high value of the concentration limit (from the reference table in [25]), BP_Lo is the low value of the concentration limit (also from [25]), IAQI_Hi is the value corresponding to BP_Hi in the same reference table, and IAQI_Lo is the value corresponding to BP_Lo. The detailed breakdown of the China AQI for PM2.5 concentrations is shown in Table 17.
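The piecewise-linear formula (B.3) can be sketched for PM2.5. The breakpoint table below follows the commonly published MEP 24-hour PM2.5 limits and is an assumption; Table 17 in the paper is authoritative:

```python
import bisect

BP   = [0, 35, 75, 115, 150, 250, 350, 500]   # concentration limits (ug/m3), assumed
IAQI = [0, 50, 100, 150, 200, 300, 400, 500]  # corresponding index values

def pm25_iaqi(c):
    """Piecewise-linear IAQI for a PM2.5 concentration c, per formula (B.3)."""
    i = bisect.bisect_right(BP, c) - 1        # find the bracketing segment
    i = min(i, len(BP) - 2)
    lo_c, hi_c = BP[i], BP[i + 1]
    lo_i, hi_i = IAQI[i], IAQI[i + 1]
    return (hi_i - lo_i) / (hi_c - lo_c) * (c - lo_c) + lo_i

assert pm25_iaqi(35) == 50     # breakpoint maps exactly
assert pm25_iaqi(55) == 75.0   # midpoint of the 35-75 segment
```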