Machine learning-based prediction of air quality index and air quality grade: a comparative analysis

  • Original Paper
  • International Journal of Environmental Science and Technology

Abstract

This study compared machine learning models for predicting the daily air quality index (AQI) and classifying the air quality grade (AQG). It used publicly available data from 2014 to 2019 for six pollutants (PM10, PM2.5, NO2, SO2, CO, and O3). Four models were used to predict AQI: random forest (RF), gradient boosting (GB), Lasso regression (LASSO), and a stacked regressor. Six models were used to forecast AQG: K-nearest neighbors (KNN), support vector machines (SVM), decision tree (DT), multilayer perceptron (MLP), random forest (RF), and a stacked classifier. The models were evaluated using statistical measures including R-squared (R2), root mean square error (RMSE), mean absolute error (MAE), accuracy score (ACC), Matthews correlation coefficient (MCC), and F1 score. For AQI prediction, the stacked model performed consistently across all metrics, with an R2 of 0.973, an RMSE of 7.568, and an MAE of 4.596, outperforming LASSO, GB, and RF. This suggests that the stacked model reduced the weaknesses of the individual models and produced more accurate predictions. For AQG, the stacked model also performed best across all metrics, with an ACC of 0.970, an MCC of 0.960, and an F1 of 0.970, outperforming MLP, KNN, SVM, DT, and RF. The study concluded that stacked generalization models can forecast air quality index and grade with high efficiency and precision while mitigating the overfitting concerns associated with individual models.
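As a minimal sketch of how the three regression measures named above are commonly computed, the following Python function returns R2, RMSE, and MAE from their standard definitions. The function name `regression_metrics` and the sample AQI values are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical daily AQI values, for illustration only (not data from the study).
observed = [50.0, 80.0, 120.0, 150.0]
predicted = [55.0, 75.0, 125.0, 145.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```

Higher R2 and lower RMSE and MAE indicate a closer fit. Because RMSE squares each error before averaging, it penalizes large errors more heavily than MAE does.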

Availability of data and materials

The entire dataset used in this study was obtained from the China National Environmental Monitoring Centre (CNEMC, http://www.cnemc.cn).

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Corresponding author

Correspondence to S. A. Aram.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Editorial responsibility: Chenxi Li.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Aram, S.A., Nketiah, E.A., Saalidong, B.M. et al. Machine learning-based prediction of air quality index and air quality grade: a comparative analysis. Int. J. Environ. Sci. Technol. 21, 1345–1360 (2024). https://doi.org/10.1007/s13762-023-05016-2

  • DOI: https://doi.org/10.1007/s13762-023-05016-2
