Regression model and method settings for air pollution status analysis based on air quality data in Beijing (2017–2021)

  • Regular Paper
  • International Journal of Data Science and Analytics

Abstract

Regression analysis is an essential tool for modeling and analyzing data, and it is used in many fields for predictive analysis and for discovering relationships between variables. However, guidelines on model characteristics, dataset selection, and method settings for using regression models to explore a region's air pollution status have not been set out in detail. This paper applies regression analysis to air quality data from Beijing for 2017 to 2021 in order to study the characteristics of regression models, provide research guidance, and update air pollution research findings based on this dataset. The paper draws the following conclusions: (1) PM2.5 and NO2 are positively correlated on the test set drawn from these 5 years, with a correlation coefficient of 0.7036 obtained using linear regression. The coefficients of determination on the small-scale test sets for 2017, 2019, and 2021 are much lower than the one derived from the 5-year dataset, so a single-year dataset is not well suited to linear regression analysis. (2) The polynomial regression model achieves a higher coefficient of determination on the training set than the linear regression model, which makes it more suitable for regression analysis on a 1-year dataset. (3) PM2.5 and NO2 concentrations are strongly positively correlated with whether the air is polluted, with a correlation coefficient of 0.9697 on the test set from these 5 years. Logistic regression classifies air pollution status from these two pollutants' concentrations with an accuracy of 0.9430. In addition, this paper proposes parameter settings for the logistic regression method provided by the Python third-party library sklearn. Specifically, L2 regularization performs better on the 2017–2021 dataset, whereas L1 regularization works better on a 1-year dataset. Raising the inverse of the regularization strength to 1.8 further optimizes the regularization.
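The three comparisons summarized in the abstract can be illustrated with a minimal sketch, assuming NumPy and scikit-learn are installed. It is not the authors' code and does not use the Beijing data: the readings are synthetic, and the variable names, the PM2.5 threshold of 75 used to label a day as polluted, the polynomial degree of 2, the 80/20 split, and the `liblinear` solver are all assumptions made for illustration. Only the penalty types (L1, L2) and the value 1.8 for the inverse regularization strength (the `C` parameter of sklearn's `LogisticRegression`) come from the abstract.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for daily readings (assumption: NOT the Beijing dataset).
rng = np.random.default_rng(0)
n_days = 1800
no2 = rng.uniform(10.0, 100.0, size=n_days)              # hypothetical NO2 values
pm25 = 0.9 * no2 + rng.normal(0.0, 15.0, size=n_days)    # hypothetical PM2.5 values
pm25 = np.clip(pm25, 1.0, None)                          # keep concentrations positive
polluted = (pm25 > 75.0).astype(int)                     # hypothetical pollution label

# Pearson correlation between the two pollutants.
pearson_r = np.corrcoef(no2, pm25)[0, 1]

# (1) and (2): linear vs. polynomial regression of PM2.5 on NO2.
X_no2 = no2.reshape(-1, 1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_no2, pm25, test_size=0.2, random_state=0
)
linear = LinearRegression().fit(X_tr, y_tr)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X_tr, y_tr)

# .score() returns the coefficient of determination (R^2) for regressors.
r2_linear_train = linear.score(X_tr, y_tr)
r2_poly_train = poly.score(X_tr, y_tr)
r2_linear_test = linear.score(X_te, y_te)
r2_poly_test = poly.score(X_te, y_te)

# (3): logistic regression on both pollutant concentrations.
X_cls = np.column_stack([pm25, no2])
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(
    X_cls, polluted, test_size=0.2, random_state=0, stratify=polluted
)


def fit_logistic(penalty, C):
    """Fit a scaled logistic regression with the given penalty type and C."""
    model = make_pipeline(
        StandardScaler(),
        # liblinear supports both the "l1" and "l2" penalties.
        LogisticRegression(penalty=penalty, C=C, solver="liblinear"),
    )
    return model.fit(Xc_tr, yc_tr)


# .score() returns classification accuracy for classifiers.
acc_l2 = fit_logistic("l2", 1.8).score(Xc_te, yc_te)
acc_l1 = fit_logistic("l1", 1.8).score(Xc_te, yc_te)

print(f"Pearson r (NO2, PM2.5): {pearson_r:.4f}")
print(f"R^2 train  linear={r2_linear_train:.4f}  poly={r2_poly_train:.4f}")
print(f"R^2 test   linear={r2_linear_test:.4f}  poly={r2_poly_test:.4f}")
print(f"Accuracy   L2={acc_l2:.4f}  L1={acc_l1:.4f}")
```

Because the degree-2 model contains the linear model as a special case, its training-set coefficient of determination cannot be lower than the linear one, so a training-set comparison alone does not show which model generalizes better; the test-set scores printed alongside give a fairer picture. Repeating the run on a smaller subset of rows is one way to mimic the single-year versus 5-year comparison.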

Data availability

All data generated or analyzed during this study are included in this article.

Acknowledgements

We acknowledge data support from the “Weather Hindcast Website (http://tianqihoubao.com/)” and the “Beijing Municipal Ecological and Environmental Monitoring Center (http://www.bjmemc.com.cn/).” We also thank the reviewers and editors for their valuable comments and suggestions, and we sincerely thank our corresponding author for her instruction and supervision.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Contributions

All authors contributed to the study conception and design. Specifically, material preparation, data collection, and analysis were performed by SW and XL. Data visualization was accomplished by SW. The first draft of the manuscript was written by SW and XL, and all authors commented on previous versions of the manuscript. The whole study was supervised by MW and SW. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Minjuan Wang.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wa, S., Lu, X. & Wang, M. Regression model and method settings for air pollution status analysis based on air quality data in Beijing (2017–2021). Int J Data Sci Anal (2023). https://doi.org/10.1007/s41060-023-00415-7

  • DOI: https://doi.org/10.1007/s41060-023-00415-7