Abstract
The purpose of this study was to compare machine learning models for predicting the daily air quality index (AQI) and the air quality grade (AQG). The study used publicly available data from 2014 to 2019 on six pollutants (PM10, PM2.5, NO2, SO2, CO, and O3). Four models (random forest (RF), gradient boosting (GB), Lasso regression (LASSO), and a stacked regressor) were used to predict AQI, and six models (K-nearest neighbors (KNN), support vector machine (SVM), decision tree (DT), multilayer perceptron (MLP), random forest (RF), and a stacked classifier) were used to forecast AQG. The models were evaluated with R-squared (R2), root mean square error (RMSE), mean absolute error (MAE), accuracy (ACC), the Matthews correlation coefficient (MCC), and the F1 score. For AQI prediction, the stacked model performed consistently well across all metrics, with an R2 of 0.973, an RMSE of 7.568, and an MAE of 4.596, outperforming LASSO, GB, and RF. This indicates that stacking offset the weaknesses of the individual models and produced more accurate predictions. For AQG, the stacked model also performed better on every metric, with an ACC of 0.970, an MCC of 0.960, and an F1 score of 0.970, outperforming MLP, KNN, SVM, DT, and RF. The study concludes that stacked generalization models can forecast the air quality index and grade with high efficiency and precision, and that they mitigate the overfitting concerns associated with the individual models.
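The abstract does not give implementation details, so the following is only a minimal scikit-learn sketch of the two stacked models it describes, scored with the same six metrics. Several parts are assumptions, not the authors' method: the data are synthetic (not the CNEMC records); the AQI target is computed from same-day concentrations using the 24-h IAQI breakpoints of China's HJ 633-2012 technical regulation (O3 as the 8-h average, clipped at IAQI 300); the grades are numbered 1 to 6 by the standard AQI bands; the stacks are assumed to use the listed individual models as base learners; and the meta-learners (`RidgeCV`, `LogisticRegression`), hyperparameters, and weighted F1 averaging are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestClassifier,
                              RandomForestRegressor, StackingClassifier,
                              StackingRegressor)
from sklearn.linear_model import Lasso, LogisticRegression, RidgeCV
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             mean_absolute_error, mean_squared_error, r2_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumed 24-h IAQI breakpoints (HJ 633-2012). Column order matches the abstract.
IAQI = [0, 50, 100, 150, 200, 300, 400, 500]
BREAKPOINTS = {
    "PM10": [0, 50, 150, 250, 350, 420, 500, 600],    # ug/m3
    "PM2.5": [0, 35, 75, 115, 150, 250, 350, 500],    # ug/m3
    "NO2": [0, 40, 80, 180, 280, 565, 750, 940],      # ug/m3
    "SO2": [0, 50, 150, 475, 800, 1600, 2100, 2620],  # ug/m3
    "CO": [0, 2, 4, 14, 24, 36, 48, 60],              # mg/m3
    "O3": [0, 100, 160, 215, 265, 800],               # ug/m3, 8-h; clipped at IAQI 300
}

def daily_aqi(conc):
    """AQI = max over pollutants of the piecewise-linear sub-index (IAQI)."""
    return np.max([np.interp(conc[:, j], bp, IAQI[:len(bp)])
                   for j, bp in enumerate(BREAKPOINTS.values())], axis=0)

def aqi_grade(aqi):
    """Grades 1..6 for AQI bands 0-50, 51-100, 101-150, 151-200, 201-300, >300."""
    return np.digitize(aqi, [50, 100, 150, 200, 300], right=True) + 1

# Hypothetical synthetic data: correlated lognormal concentrations.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 1))                       # shared "pollution episode" factor
scales = np.array([100, 60, 40, 20, 1.0, 90])     # hypothetical typical levels
X = scales * np.exp(0.7 * z + 0.3 * rng.normal(size=(n, 6)))
y_aqi = daily_aqi(X)
y_grade = aqi_grade(y_aqi)

X_tr, X_te, a_tr, a_te, g_tr, g_te = train_test_split(
    X, y_aqi, y_grade, test_size=0.25, random_state=0)

# Stacked regressor: RF, GB and LASSO as base learners, ridge as meta-learner.
stack_reg = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
        ("lasso", make_pipeline(StandardScaler(), Lasso(alpha=0.1))),
    ],
    final_estimator=RidgeCV(),
    cv=5,  # meta-features come from out-of-fold predictions
)

# Stacked classifier: KNN, SVM, DT, MLP and RF as base learners.
stack_clf = StackingClassifier(
    estimators=[
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("mlp", make_pipeline(StandardScaler(),
                              MLPClassifier(max_iter=500, random_state=0))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

stack_reg.fit(X_tr, a_tr)
a_pred = stack_reg.predict(X_te)
reg_scores = {
    "R2": r2_score(a_te, a_pred),
    "RMSE": np.sqrt(mean_squared_error(a_te, a_pred)),
    "MAE": mean_absolute_error(a_te, a_pred),
}

stack_clf.fit(X_tr, g_tr)
g_pred = stack_clf.predict(X_te)
clf_scores = {
    "ACC": accuracy_score(g_te, g_pred),
    "MCC": matthews_corrcoef(g_te, g_pred),
    "F1": f1_score(g_te, g_pred, average="weighted"),
}
print(reg_scores)
print(clf_scores)
```

The scores printed by this sketch reflect the synthetic data only and are not comparable to the values reported above. The key design point of stacked generalization is visible in `cv=5`: the meta-learner is trained on out-of-fold predictions of the base models, so it learns how far to trust each model without seeing predictions made on that model's own training data.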
Availability of data and materials
The entire dataset used in this study was obtained from the China National Environmental Monitoring Centre (CNEMC, http://www.cnemc.cn).
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Editorial responsibility: Chenxi Li.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Aram, S.A., Nketiah, E.A., Saalidong, B.M. et al. Machine learning-based prediction of air quality index and air quality grade: a comparative analysis. Int. J. Environ. Sci. Technol. 21, 1345–1360 (2024). https://doi.org/10.1007/s13762-023-05016-2