Estimation of urban AQI based on interpretable machine learning

Wang, Siyuan; Ren, Ying; Xia, Bisheng

doi:10.1007/s11356-023-29336-5

Research Article
Published: 14 August 2023

Volume 30, pages 96562–96574, (2023)

Environmental Science and Pollution Research

Abstract

Air pollution is an increasingly serious problem. Accurate and efficient prediction of air quality can help prevent air pollution and improve the quality of human life. The air quality index (AQI) is a dimensionless measure that describes air quality quantitatively. In this study, machine learning (ML) methods were used to estimate the AQI, with Shijiazhuang, China, as the study area and pollutant concentrations and meteorological factors as model inputs. Specifically, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Random Forest (RF) models were used. The experimental results show that the XGBoost model captures the AQI variation trend well, with an R² of 0.929, which is 0.3% and 2.3% higher than the R² of the RF and LightGBM models, respectively. In addition, a SHAP-based model interpretation method reveals the key factors behind AQI variation: PM₂.₅ and PM₁₀ contribute positively to the AQI, whereas the AQI is less sensitive to meteorological factors. Finally, Beijing, Shanghai, Xi’an, and Guangzhou were selected to test the model’s validity, and model performance remained good. Our study shows that applying ML approaches to air quality prediction is beneficial for efficiently assessing cities’ future air quality.
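
The two quantities the abstract relies on, R² for model accuracy and Shapley-style attributions for interpretation, can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: `toy_model`, its coefficients, the input values, and the baseline are hypothetical, and the exact enumeration below merely stands in for what the SHAP library estimates efficiently for tree models such as XGBoost.

```python
from itertools import combinations
from math import factorial


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical AQI-like response; the coefficients are made up."""
    pm25, pm10, wind = z
    return 1.2 * pm25 + 0.5 * pm10 - 0.1 * wind


# Hypothetical observation and reference point (PM2.5, PM10, wind speed).
x = [80.0, 120.0, 3.0]
baseline = [40.0, 70.0, 2.0]
phi = shapley_values(toy_model, x, baseline)

for name, contribution in zip(["PM2.5", "PM10", "wind"], phi):
    print(name, round(contribution, 3))
```

In this toy setting the attributions sum to `toy_model(x) - toy_model(baseline)`, and the particulate terms dominate while the wind term is small and negative. That is the kind of pattern the abstract reports, although here it is built in by the made-up coefficients rather than learned from data.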

Data availability

The datasets used in this study are available at https://air.cnemc.cn:18014/ and https://power.larc.nasa.gov/data-access-viewer/.

Funding

This work was supported by the Yan’an City Science and Technology Development Program (No. 203010096), the Yan’an University Doctoral Program (No. 20504306), the Shaanxi Provincial Talent Program (No. YAU202305399), the Yan’an University 14th Five-Year Major Research Program (No. YAU202313738), and the Graduate Education Innovation Program of Yan’an University (No. YCX2023008).

Author information

Authors and Affiliations

School of Mathematics and Computer Science, Yan’an University, Yan’an, 716000, China
Siyuan Wang, Ying Ren & Bisheng Xia

Contributions

Siyuan Wang: conceptualization, methodology, data analysis, data collection, and writing—original draft. Ying Ren: formal analysis, investigation, and validation. Bisheng Xia: supervision, validation, writing—review and editing, and funding acquisition.

Corresponding author

Correspondence to Bisheng Xia.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Responsible Editor: Marcus Schulz

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, S., Ren, Y. & Xia, B. Estimation of urban AQI based on interpretable machine learning. Environ Sci Pollut Res 30, 96562–96574 (2023). https://doi.org/10.1007/s11356-023-29336-5

Received: 11 April 2023
Accepted: 10 August 2023
Published: 14 August 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s11356-023-29336-5
