Real-time air quality assessment based on the EM-TOPSIS-PROMETHEE integrated assessment method

The goal of this article is to discuss the application of the EM-TOPSIS-PROMETHEE integrated assessment method to the real-time assessment of urban air quality and to compare it with other methods. Using a web crawler based on Selenium and Python, we obtained the concentrations of five air pollutants in 154 cities. We imported the data into MATLAB, used the entropy evaluation method (EM) to obtain the weight of each air pollutant indicator, and then used TOPSIS-PROMETHEE to obtain an air quality score for each city. This method is more comprehensive than AQI, and it can also be combined with IAQI. Finally, we discuss the advantages and disadvantages of this method.


Introduction
Air pollution is one of the main factors affecting the environment. The main causes of air pollution include industrial emissions, forest fires, evaporation of salts from the ocean surface, and other natural and anthropogenic factors. Air pollution not only harms human health but also significantly affects the climate (decreased solar radiation, increased precipitation, and acid rain). In China, the problem of air pollution is very serious.
The most commonly used simple measure of air quality is the AQI (air quality index). The AQI is calculated as the maximum of the IAQI values, where the IAQI is the air quality index of a single pollutant. Using the AQI alone to grade air quality is therefore unwise, because it reflects only one pollutant, albeit the most important one [1].
The goal of this research is to discuss the application of EM-TOPSIS-PROMETHEE to assessing the air quality of cities [2]. The criterion of air quality assessment is the concentration of air pollutants: the lower the concentration, the better the air quality. In this article, air quality is determined by the concentrations of five air pollutants: PM10, PM2.5, CO, NO2, and SO2, all of which can be found on weather sites. We estimated the air quality based on the concentrations of these five pollutants in 154 Chinese cities at 14:00 on July 13, 2020 (Beijing Time).
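The limitation of taking a maximum can be illustrated with a minimal sketch. The IAQI values below are hypothetical and chosen only for illustration; the function name `aqi_from_iaqi` is likewise an assumption, not part of any standard library.

```python
def aqi_from_iaqi(iaqi):
    """Return (AQI, dominant pollutant), where AQI is the largest IAQI value."""
    pollutant = max(iaqi, key=iaqi.get)
    return iaqi[pollutant], pollutant

# Hypothetical IAQI values for two cities (illustration only).
iaqi_city_a = {"PM10": 60, "PM2.5": 80, "CO": 20, "NO2": 30, "SO2": 10}
iaqi_city_b = {"PM10": 79, "PM2.5": 80, "CO": 78, "NO2": 79, "SO2": 77}

aqi_a, dominant_a = aqi_from_iaqi(iaqi_city_a)
aqi_b, dominant_b = aqi_from_iaqi(iaqi_city_b)
print(aqi_a, dominant_a)
print(aqi_b, dominant_b)
```

Both hypothetical cities receive the same AQI, although city B is worse on every other pollutant. This is the kind of information a multi-criteria method is meant to retain.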
The Entropy Method (EM) is used to determine the weight of each indicator (concentration of a pollutant). TOPSIS and PROMETHEE are used to determine the score of each city.
We assume that there are m objects that we need to assess (cities) and n indicators (air pollutants):

$$X = [x_{ij}]_{m \times n}, \tag{1}$$

where $X$ is the evaluation matrix and $x_{ij}$ is the concentration of the j-th air pollutant in the i-th city.

First, we normalize the data using min-max normalization:

$$x^*_{ij} = \begin{cases} \dfrac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}, & \text{if indicator } j \text{ is positive}, \\[2ex] \dfrac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}, & \text{if indicator } j \text{ is negative}. \end{cases} \tag{2}$$

For a positive indicator j, the larger $x_{ij}$, the better; for a negative indicator, the smaller $x_{ij}$, the better. All air pollutant concentrations are negative indicators. Thus, we obtain the normalized score matrix

$$X^* = [x^*_{ij}]. \tag{3}$$

To determine the weight of each indicator, we can use EM. Principal Component Analysis (PCA), the Analytic Hierarchy Process (AHP), and EM are the three most commonly used methods for calculating weights. PCA can reduce the number of indicators. For example, if there are 14 indicators, they can be condensed into 4 (also called factors or principal components) by using PCA. Since we have only five indicators (the concentrations of five pollutants), we do not need this method. AHP requires experts to compare the importance of the various pollutants and is very difficult to carry out (mainly because the author is not an expert in this field and is not acquainted with any such experts).

EM is based on the concept of entropy, which originates in thermodynamics. It describes the average amount of information and determines the weight of each indicator accordingly. The advantage of the model is that it is an objective assessment method: compared with AHP, EM does not require human experts. The disadvantage of the model is that it evaluates the importance of an indicator only by the values of $x^*_{ij}$, ignoring their physical meaning. In addition, EM cannot reduce the dimension of the data (the number of indicators) as PCA does.
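The min-max normalization of negative indicators described above can be sketched as follows. This is a minimal illustration in plain Python, not the MATLAB code used in the study. The concentrations are hypothetical, the function name is an assumption, and the handling of a constant column (mapped to 1.0) is an assumption that the text does not address.

```python
def min_max_normalize(matrix, negative=True):
    """Column-wise min-max normalization of a matrix given as a list of rows.

    negative=True treats every column as a negative indicator
    (smaller raw value -> larger normalized score).
    """
    normalized_columns = []
    for col in zip(*matrix):
        lo, hi = min(col), max(col)
        span = hi - lo
        if span == 0:
            # Assumption: a constant column cannot discriminate; map it to 1.0.
            normalized_columns.append([1.0] * len(col))
        elif negative:
            normalized_columns.append([(hi - x) / span for x in col])
        else:
            normalized_columns.append([(x - lo) / span for x in col])
    # Transpose back so that each row is again one city.
    return [list(row) for row in zip(*normalized_columns)]

# Hypothetical concentrations: rows are cities, columns are PM10, PM2.5, CO.
x = [
    [20.0, 10.0, 0.5],
    [40.0, 30.0, 1.0],
    [60.0, 20.0, 1.5],
]
x_star = min_max_normalize(x)
for row in x_star:
    print(row)
```

The cleanest hypothetical city (first row) receives a score of 1 for every pollutant, and the city with the highest concentration of a pollutant receives 0 for it.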
The algorithm for determining the weight of each indicator (air pollutant) is as follows:

1. Calculation of the probability $p_{ij}$:
$$p_{ij} = \frac{x^*_{ij}}{\sum_{i=1}^{m} x^*_{ij}}$$
2. Calculation of the entropy of indicator j:
$$e_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij}$$
3. Calculation of the redundancy of information entropy:
$$d_j = 1 - e_j$$
4. Calculation of the weight of each indicator:
$$w_j = \frac{d_j}{\sum_{j=1}^{n} d_j}$$

The concentrations of the five main air pollutants in different cities of China and the indicator weights are presented in Table 1.

Then we use TOPSIS to assess the air quality of each city. This method places no restrictions on the distribution or amount of data, and the calculation is simple. The advantage of the model is that it scores each object (city) by its relationship to the positive and negative ideal objects. The disadvantage is that when there are few objects (cities) to evaluate, the positive and negative ideal objects are not typical or representative, which reduces the credibility of the evaluation results. In the example in this article, however, there are many objects (over 150 cities).
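The four entropy-method steps listed above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the authors' MATLAB code: the input is assumed to be an already normalized score matrix with a positive sum in every column, terms with zero probability are skipped (the usual convention that 0 ln 0 = 0), and the sample matrix is hypothetical.

```python
import math

def entropy_weights(x_star):
    """Entropy-method weights for a normalized score matrix (list of rows)."""
    m = len(x_star)       # number of objects (cities)
    n = len(x_star[0])    # number of indicators (pollutants)
    redundancy = []
    for j in range(n):
        column = [row[j] for row in x_star]
        total = sum(column)           # assumed > 0
        entropy = 0.0
        for value in column:
            p = value / total         # step 1: probability
            if p > 0:                 # assumption: 0 * ln(0) is treated as 0
                entropy -= p * math.log(p)
        entropy /= math.log(m)        # step 2: entropy scaled to [0, 1]
        redundancy.append(1.0 - entropy)   # step 3: redundancy
    total_redundancy = sum(redundancy)
    return [d / total_redundancy for d in redundancy]   # step 4: weights

# Hypothetical normalized scores: the first indicator is concentrated near 1,
# the second is spread over the whole range.
x_star = [
    [1.00, 1.0],
    [0.95, 0.5],
    [0.90, 0.0],
]
weights = entropy_weights(x_star)
print(weights)
```

In this hypothetical example the indicator whose values are concentrated in a narrow range receives a much smaller weight than the indicator whose values are spread out.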
Let us introduce the concept of the positive and negative ideal objects. The positive ideal object takes the best value of each air pollutant concentration across all cities. By contrast, the negative ideal object takes the worst value of each air pollutant concentration across all cities. For example, if we need to compare only Beijing and Shanghai (rather than 154 cities), and we take the pollutant concentrations of these two cities from Table 1, then the pollutant concentrations of the positive ideal object for these two cities are: PM10: 21 μg/m³, PM2.5: 23 μg/m³, CO: 0.8 μg/m³, NO2: 14 μg/m³, and SO2: 2 μg/m³.
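First, we define the matrix Z:

$$Z = \begin{bmatrix} z_1^+ & z_2^+ & \cdots & z_n^+ \\ z_1^- & z_2^- & \cdots & z_n^- \end{bmatrix}, \qquad z_j^+ = \max_i x^*_{ij}, \qquad z_j^- = \min_i x^*_{ij},$$

where $z_j^+$ represents the value of indicator j of the positive ideal object and $z_j^-$ represents the value of indicator j of the negative ideal object.

The weighted geometric distance $D_i^+$ between object i and the positive ideal object is

$$D_i^+ = \sqrt{\sum_{j=1}^{n} w_j \left(z_j^+ - x^*_{ij}\right)^2},$$

where $w_j$ is the weight of indicator j.

The weighted geometric distance $D_i^-$ between object i and the negative ideal object is

$$D_i^- = \sqrt{\sum_{j=1}^{n} w_j \left(z_j^- - x^*_{ij}\right)^2}.$$

The score of object i is determined as follows:

$$S_i = \frac{D_i^-}{D_i^+ + D_i^-}.$$

Obviously, $0 \le S_i \le 1$.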
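The TOPSIS scoring described above can be sketched as follows. This is a minimal plain-Python illustration, not the authors' implementation. It assumes that the weights multiply the squared differences inside the square root, that the ideal objects are taken column-wise from the normalized matrix (where larger is better), and that the cities are not all identical, so the denominator is non-zero. The input values are hypothetical.

```python
import math

def topsis_scores(x_star, weights):
    """TOPSIS closeness score for each row of a normalized score matrix."""
    n = len(weights)
    # Positive / negative ideal objects: best / worst normalized value per indicator.
    z_pos = [max(row[j] for row in x_star) for j in range(n)]
    z_neg = [min(row[j] for row in x_star) for j in range(n)]
    scores = []
    for row in x_star:
        # Assumption: weights are applied to the squared differences.
        d_pos = math.sqrt(sum(w * (zp - v) ** 2
                              for w, zp, v in zip(weights, z_pos, row)))
        d_neg = math.sqrt(sum(w * (v - zn) ** 2
                              for w, zn, v in zip(weights, z_neg, row)))
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical normalized scores for three cities and two indicators.
x_star = [
    [1.0, 1.0],
    [0.5, 0.5],
    [0.0, 0.0],
]
scores = topsis_scores(x_star, weights=[0.5, 0.5])
print(scores)
```

A city that coincides with the positive ideal object scores 1, a city that coincides with the negative ideal object scores 0, and all other cities fall in between.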
Next, we use PROMETHEE II to assess the air quality of each city. The method compares the objects (cities) in pairs on each indicator (concentration of an air pollutant), sums the results, and calculates the positive and negative flows for each object. The disadvantage of the model is that we have to determine a preference function and a weight for each indicator, which makes the procedure complex.
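The algorithm inputs are shown in Table 2. In addition to the score matrix $X^*$ and the weights $w_k$, it is also necessary to select a preference function $P_k(x^*_{ik}, x^*_{jk})$ for each indicator (air pollutant). $P_k(x^*_{ik}, x^*_{jk})$ represents the score of object i (the i-th city) compared to object j (the j-th city) when examining the k-th indicator (the concentration of the k-th air pollutant). Let $d = x^*_{ik} - x^*_{jk}$ and define the preference function $P_k(x^*_{ik}, x^*_{jk})$ as:

$$P_k(x^*_{ik}, x^*_{jk}) = f(d).$$

The function $f(d)$ must satisfy the restrictions: 1.

Here we select the most commonly used function:

$$f(d) = \begin{cases} 0, & d \le 0, \\ 1, & d > 0. \end{cases}$$

In other words, we assume that if $x^*_{ik} \le x^*_{jk}$, then $P_k(x^*_{ik}, x^*_{jk}) = 0$, and if $x^*_{ik} > x^*_{jk}$, then $P_k(x^*_{ik}, x^*_{jk}) = 1$. After determining the preference function $P_k(x^*_{ik}, x^*_{jk})$, we can determine the positive and negative flows of each object (city) according to the following algorithm:

$$\phi_i^+ = \frac{1}{m-1} \sum_{\substack{j=1 \\ j \ne i}}^{m} \sum_{k=1}^{n} w_k P_k(x^*_{ik}, x^*_{jk}), \qquad \phi_i^- = \frac{1}{m-1} \sum_{\substack{j=1 \\ j \ne i}}^{m} \sum_{k=1}^{n} w_k P_k(x^*_{jk}, x^*_{ik}),$$

where $\phi_i^+$ is the positive flow of the i-th object and $\phi_i^-$ is the negative flow of the i-th object, $m$ is the total number of objects (m = 154, the number of cities in our research), and $n$ is the total number of indicators (n = 5, the number of air pollutants in our research). PROMETHEE II ranks the objects by the net flow $\phi_i = \phi_i^+ - \phi_i^-$.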
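The pairwise comparison and flow calculation can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not the authors' code. It uses the step preference function described above (1 if the first city is strictly better on an indicator, 0 otherwise), divides the flows by m - 1 as in the standard PROMETHEE formulation (a constant factor that does not change the ranking), and takes the net flow as the PROMETHEE II result. The input values are hypothetical.

```python
def promethee_flows(x_star, weights):
    """Positive, negative, and net flows for each row of a normalized score matrix."""
    m = len(x_star)

    def aggregated_preference(a, b):
        # Step preference function: an indicator contributes its weight
        # only where city a is strictly better than city b.
        return sum(w for w, va, vb in zip(weights, a, b) if va > vb)

    positive, negative = [], []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        positive.append(
            sum(aggregated_preference(x_star[i], x_star[j]) for j in others) / (m - 1))
        negative.append(
            sum(aggregated_preference(x_star[j], x_star[i]) for j in others) / (m - 1))
    net = [p - q for p, q in zip(positive, negative)]
    return positive, negative, net

# Hypothetical normalized scores for three cities and two indicators.
x_star = [
    [1.0, 1.0],
    [0.5, 0.0],
    [0.0, 0.5],
]
positive, negative, net = promethee_flows(x_star, weights=[0.5, 0.5])
print(positive, negative, net)
```

The first hypothetical city is better than both others on every indicator, so it has the largest net flow. The net flows always sum to zero, because every pairwise preference counts once as a positive flow and once as a negative flow.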
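In the end, we combine the results of TOPSIS and PROMETHEE II. First, we use min-max normalization to normalize the TOPSIS scores $S_i$ and get $S'_i$, and to normalize the PROMETHEE II net flows $\phi_i$ and get $\phi'_i$. Then we define the final score as:

$$F_i = \alpha S'_i + \beta \phi'_i,$$

where $\alpha$ and $\beta$ depend on the user's preference. In our research, we assume that $\alpha = \beta = 0.5$.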
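The combination step can be sketched as follows. This is a minimal plain-Python illustration with hypothetical inputs. The function and parameter names are assumptions, and the two equal coefficients of 0.5 follow the choice stated above. The sketch assumes that neither input list is constant, so the min-max denominators are non-zero.

```python
def min_max(values):
    """Min-max normalization of a list of scores (larger is better)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def combined_scores(topsis, net_flows, alpha=0.5, beta=0.5):
    """Weighted sum of the normalized TOPSIS scores and PROMETHEE II net flows."""
    s_norm = min_max(topsis)
    f_norm = min_max(net_flows)
    return [alpha * s + beta * f for s, f in zip(s_norm, f_norm)]

# Hypothetical TOPSIS scores and net flows for three cities.
topsis = [1.0, 0.5, 0.0]
net_flows = [1.0, -0.5, -0.5]
final = combined_scores(topsis, net_flows)
print(final)
```

Normalizing both results to the same range before the weighted sum keeps either method from dominating merely because of its scale.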

Results
Using Python + Selenium, we obtained the concentrations of five air pollutants (PM10, PM2.5, CO, NO2, and SO2) at 14:23 on July 13, 2020 in 154 cities. The data can be found on the China Weather website (Table 1) [3]. The calculation results are shown in Table 3. For a more convenient comparison, Table 4 shows the concentrations of the major air pollutants in the cities from Table 3. After min-max normalization, we obtain the results presented in Table 5.

The ranking obtained by this method is generally in line with expectations. However, when comparing some of these cities, it is difficult to judge immediately which city's air quality is better or worse. For example, the PM10 and 2 values of Benxi are better than those of Jiayuguan, while the PM2.5 and CO values of Benxi are worse than those of Jiayuguan. But the weight of the 2 , calculated by the Entropy Method, is smaller; therefore, Benxi's advantage in 2 has little influence on its position in the ranking.

Figure 2 shows that after min-max normalization, the value of the 2 is mainly concentrated in the range from 0.9 to 1. EM treats such data as containing less information, so the weight of 2 is smaller than those of the other indicators. The values of the other indicators are more widely spread, so their weights are greater. The weights of the indicators are presented in Table 6.

To compare TOPSIS, PROMETHEE, and TOPSIS-PROMETHEE, we normalized the results of these three methods by using min-max normalization and calculated their standard deviation (SD), the average difference between the best score and the other scores (DAB), and the difference between the best and second-best scores (DFS). The calculation is as follows.
Let $s'_i$ denote the score of the i-th city after min-max normalization, and let $\bar{s}'$ be the mean score:

$$\bar{s}' = \frac{1}{m} \sum_{i=1}^{m} s'_i,$$

where $m$ is the number of cities. Then we obtain the following expressions for SD and DAB:

$$SD = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left(s'_i - \bar{s}'\right)^2}, \qquad DAB = \frac{1}{m-1} \sum_{i=1}^{m} \left(\max_k s'_k - s'_i\right).$$

The larger SD, DAB, and DFS are, the better the method separates the cities. From Table 7, we can conclude that PROMETHEE is the best method in this example, TOPSIS-PROMETHEE is the second best, and TOPSIS is the worst. However, on other examples PROMETHEE may perform poorly. Since TOPSIS-PROMETHEE combines the two methods, it usually has good reliability. In other articles that compared these methods, TOPSIS-PROMETHEE is often the best assessment method. Therefore, the use of TOPSIS-PROMETHEE for air quality assessment is still reasonable.
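The three comparison metrics can be sketched as follows. This is a minimal plain-Python illustration, not the authors' MATLAB code. It assumes a population standard deviation (division by the number of cities), a DAB averaged over the other m - 1 cities, and a DFS taken as the gap between the two highest normalized scores. The input scores are hypothetical.

```python
import math

def discrimination_metrics(scores):
    """Return (SD, DAB, DFS) of min-max normalized scores."""
    lo, hi = min(scores), max(scores)
    s = [(v - lo) / (hi - lo) for v in scores]   # min-max normalization
    m = len(s)
    mean = sum(s) / m
    # Assumption: population standard deviation (divide by m).
    sd = math.sqrt(sum((v - mean) ** 2 for v in s) / m)
    ranked = sorted(s, reverse=True)
    # Average gap between the best score and each of the other scores.
    dab = sum(ranked[0] - v for v in ranked[1:]) / (m - 1)
    # Gap between the best and the second-best score.
    dfs = ranked[0] - ranked[1]
    return sd, dab, dfs

# Hypothetical scores of three cities produced by one method.
sd, dab, dfs = discrimination_metrics([0.9, 0.6, 0.3])
print(sd, dab, dfs)
```

Larger values of all three metrics mean that the method spreads the cities further apart, which makes the ranking easier to read.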

Conclusion
EM-TOPSIS-PROMETHEE is a reliable method for assessing air quality. The widely used AQI takes into account only one pollutant; therefore, its use in scientific research (especially for ranking) is not very reasonable. The main advantage of EM-TOPSIS-PROMETHEE is that it fully combines several methods and considers several air pollutants [2]. The disadvantages of this method are also obvious. Real-time air quality assessment depends on the data collected by the web crawler, so the quality of the results depends on the quality of the data source. Moreover, the method is very complex: we need to select the preference functions when using PROMETHEE.
For further research we can:
• select other preference functions when using PROMETHEE to assess air quality. This article has used the simplest preference function.
• use IAQI values as the elements of the matrix. In this article, the concentrations of pollutants are used as the elements of the matrix, but the relationship between the harm of an air pollutant and its concentration is obviously not just proportional; therefore, it is inappropriate to use the concentrations of pollutants as the elements of the matrix. We can consider using the IAQI of each pollutant. However, we have not yet found a good data source for collecting IAQI values. IAQI also depends on the country: the American AQI standard is different from the Chinese standard [4].