Machine learning-based prediction of air quality index and air quality grade: a comparative analysis

Aram, S. A.; Nketiah, E. A.; Saalidong, B. M.; Wang, H.; Afitiri, A.-R.; Akoto, A. B.; Lartey, P. O.

doi:10.1007/s13762-023-05016-2

Original Paper
Published: 07 June 2023

Volume 21, pages 1345–1360 (2024)

International Journal of Environmental Science and Technology

S. A. Aram¹,² (ORCID: orcid.org/0000-0003-1280-2185),
E. A. Nketiah³,
B. M. Saalidong⁴,
H. Wang²,⁵,
A.-R. Afitiri⁶,
A. B. Akoto⁷ &
P. O. Lartey⁸


Abstract

The purpose of this study was to compare machine learning models for predicting the daily air quality index (AQI) and classifying the air quality grade (AQG). The study used publicly available data from 2014 to 2019 for six pollutants (PM₁₀, PM₂.₅, NO₂, SO₂, CO, and O₃). Four models (random forest (RF), gradient boosting (GB), least absolute shrinkage and selection operator (LASSO) regression, and a stacked regressor) were used to predict AQI, while six models (k-nearest neighbors (KNN), support vector machine (SVM), decision tree (DT), multilayer perceptron (MLP), random forest (RF), and a stacked classifier) were used to predict AQG. The regression models were evaluated with the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE), and the classification models with accuracy (ACC), the Matthews correlation coefficient (MCC), and the F1 score. For AQI prediction, the stacked model performed consistently well across all metrics, with an R² of 0.973, an RMSE of 7.568, and an MAE of 4.596, outperforming LASSO, GB, and RF. This indicates that the stacked model was able to offset the weaknesses of the individual models and provide more accurate predictions. For AQG, the stacked model also performed best across all metrics, with an ACC of 0.970, an MCC of 0.960, and an F1 score of 0.970, outperforming MLP, KNN, SVM, DT, and RF. The study concluded that stacked generalization models can forecast the air quality index and grade with high efficiency and precision, while mitigating the overfitting concerns associated with individual models.
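The stacking approach summarized above can be sketched with scikit-learn's `StackingRegressor` and `StackingClassifier`. This is a minimal illustration on synthetic data, not the authors' code: the six synthetic features only stand in for the six pollutants, and the hyperparameters, the 75/25 train/test split, the linear and logistic meta-learners, and weighted F1 averaging are all assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestClassifier,
                              RandomForestRegressor, StackingClassifier,
                              StackingRegressor)
from sklearn.linear_model import Lasso, LinearRegression, LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             mean_absolute_error, mean_squared_error, r2_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

SEED = 0

# Synthetic stand-ins (assumption): 6 features play the role of the 6 pollutants.
X_reg, y_reg = make_regression(n_samples=600, n_features=6, noise=10.0, random_state=SEED)
X_clf, y_clf = make_classification(n_samples=600, n_features=6, n_informative=4,
                                   n_redundant=0, n_classes=3, random_state=SEED)


def fit_stacked_regressor(X, y):
    """Stack RF, GB and LASSO; an assumed linear meta-learner combines them."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=SEED)
    stack = StackingRegressor(
        estimators=[
            ("rf", RandomForestRegressor(n_estimators=100, random_state=SEED)),
            ("gb", GradientBoostingRegressor(random_state=SEED)),
            ("lasso", make_pipeline(StandardScaler(), Lasso(alpha=0.1))),
        ],
        final_estimator=LinearRegression(),
        cv=5,  # meta-learner is trained on 5-fold out-of-fold predictions
    )
    pred = stack.fit(X_tr, y_tr).predict(X_te)
    return {
        "R2": r2_score(y_te, pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_te, pred))),
        "MAE": mean_absolute_error(y_te, pred),
    }


def fit_stacked_classifier(X, y):
    """Stack KNN, SVM, DT, MLP and RF; an assumed logistic meta-learner combines them."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=SEED, stratify=y)
    stack = StackingClassifier(
        estimators=[
            # Distance- and gradient-based models are wrapped with feature scaling.
            ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
            ("svm", make_pipeline(StandardScaler(), SVC())),
            ("dt", DecisionTreeClassifier(random_state=SEED)),
            ("mlp", make_pipeline(StandardScaler(),
                                  MLPClassifier(max_iter=1000, random_state=SEED))),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=SEED)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    pred = stack.fit(X_tr, y_tr).predict(X_te)
    return {
        "ACC": accuracy_score(y_te, pred),
        "MCC": matthews_corrcoef(y_te, pred),
        "F1": f1_score(y_te, pred, average="weighted"),  # averaging scheme is assumed
    }


reg_scores = fit_stacked_regressor(X_reg, y_reg)
clf_scores = fit_stacked_classifier(X_clf, y_clf)
print("AQI-style regression:", {k: round(v, 3) for k, v in reg_scores.items()})
print("AQG-style classification:", {k: round(v, 3) for k, v in clf_scores.items()})
```

In both ensembles the meta-learner is fit on out-of-fold predictions (`cv=5`), which is what lets stacking weigh the base models without simply rewarding whichever one overfits the training data most.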

Availability of data and materials

The entire dataset used in this study was obtained from the China National Environmental Monitoring Centre (CNEMC, http://www.cnemc.cn).
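The abstract does not state how AQI and AQG are derived from the pollutant records. For CNEMC data the usual reference is China's technical regulation HJ 633-2012: each pollutant's individual AQI (IAQI) is linearly interpolated between concentration breakpoints, the AQI is the maximum IAQI, and the AQI is binned into six grades. The sketch assumes that scheme applies to this dataset; it includes only the PM₂.₅ and PM₁₀ 24-hour breakpoints, skips rounding, caps concentrations above the table, and uses assumed English grade labels. The breakpoints should be checked against the regulation before use.

```python
from bisect import bisect_right

# Assumed 24-h breakpoints from HJ 633-2012 (µg/m³); only PM2.5 and PM10 shown.
IAQI_LEVELS = [0, 50, 100, 150, 200, 300, 400, 500]
BREAKPOINTS = {
    "PM2.5": [0, 35, 75, 115, 150, 250, 350, 500],
    "PM10": [0, 50, 150, 250, 350, 420, 500, 600],
}
# Assumed grade labels: (upper AQI bound, label); anything above 300 is the top grade.
GRADES = [(50, "Excellent"), (100, "Good"), (150, "Lightly polluted"),
          (200, "Moderately polluted"), (300, "Heavily polluted")]


def iaqi(pollutant: str, conc: float) -> float:
    """Linearly interpolate the individual AQI between the bracketing breakpoints."""
    bp = BREAKPOINTS[pollutant]
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    if conc >= bp[-1]:
        return float(IAQI_LEVELS[-1])  # simplification: cap at the top of the table
    i = bisect_right(bp, conc) - 1  # segment with bp[i] <= conc < bp[i + 1]
    lo_c, hi_c = bp[i], bp[i + 1]
    lo_i, hi_i = IAQI_LEVELS[i], IAQI_LEVELS[i + 1]
    return lo_i + (hi_i - lo_i) * (conc - lo_c) / (hi_c - lo_c)


def aqi(concentrations: dict) -> float:
    """The AQI is the maximum IAQI over the pollutants supplied."""
    return max(iaqi(p, c) for p, c in concentrations.items())


def aqg(aqi_value: float) -> str:
    """Map an AQI value to its (assumed) grade label."""
    for upper, label in GRADES:
        if aqi_value <= upper:
            return label
    return "Severely polluted"


day = {"PM2.5": 55.0, "PM10": 100.0}
print(aqi(day), aqg(aqi(day)))  # 75.0 Good
```

In this example both pollutants happen to give an IAQI of 75, so the day's AQI is 75 and falls in the second grade.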

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Authors and Affiliations

¹ College of Safety and Emergency Management Engineering, Taiyuan University of Technology, Taiyuan, People’s Republic of China
S. A. Aram
² Center of Shanxi Engineering Research for Coal Mine Intelligent Equipment, Taiyuan University of Technology, Taiyuan, 030024, People’s Republic of China
S. A. Aram & H. Wang
³ College of Mathematics, Taiyuan University of Technology, Taiyuan, People’s Republic of China
E. A. Nketiah
⁴ Department of Geosciences, Taiyuan University of Technology, Taiyuan, People’s Republic of China
B. M. Saalidong
⁵ College of Mechanical and Vehicle Engineering, Taiyuan University of Technology, Taiyuan, 030024, People’s Republic of China
H. Wang
⁶ Institute of Environmental Technology, Chair of Biotechnology of Water Treatment, Brandenburg University of Technology, BTU Cottbus-Senftenberg, 03046, Cottbus, Germany
A.-R. Afitiri
⁷ Department of Environment and Geography, Macquarie University, Sydney, Australia
A. B. Akoto
⁸ Ministry of Education Key Laboratory of Interface and Engineering in Advanced Materials, Research Centre of Advanced Materials Science and Technology, Taiyuan University of Technology, Taiyuan, People’s Republic of China
P. O. Lartey

Corresponding author

Correspondence to S. A. Aram.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Editorial responsibility: Chenxi Li.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Aram, S.A., Nketiah, E.A., Saalidong, B.M. et al. Machine learning-based prediction of air quality index and air quality grade: a comparative analysis. Int. J. Environ. Sci. Technol. 21, 1345–1360 (2024). https://doi.org/10.1007/s13762-023-05016-2

Received: 07 November 2022
Revised: 19 April 2023
Accepted: 20 May 2023
Published: 07 June 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s13762-023-05016-2
