Abstract
Regression analysis is an essential tool for modeling and analyzing data, and it is used in many areas for predictive analysis and for discovering relationships between variables. However, guidelines on model features, dataset selection, and method settings for using regression models to explore the air pollution status of a region have not been detailed. This paper applies regression analysis to air quality data from Beijing for 2017 to 2021 in order to study the characteristics of regression models, provide research guidance, and update air pollution research with recent data. The main conclusions are as follows. (1) PM2.5 and NO2 are positively correlated on the test set drawn from these 5 years, with a correlation coefficient of 0.7036 under linear regression. The coefficients of determination on the small-scale test sets for 2017, 2019, and 2021 are much lower than the one derived from the 5-year dataset, so a single-year dataset is not well suited to linear regression analysis. (2) The coefficient of determination of polynomial regression on the training set is higher than that of the linear regression model, so polynomial regression is the more appropriate choice for regression analysis on a 1-year dataset. (3) PM2.5 and NO2 concentrations are strongly positively correlated with whether the air is polluted, with a correlation coefficient of 0.9697 on the test set from these 5 years. The accuracy of logistic regression in classifying air pollution status from these two pollutants' concentrations reaches 0.9430. In addition, this paper proposes parameter settings for the logistic regression method provided by the third-party Python library sklearn. Specifically, L2 regularization performs better on the 2017–2021 dataset, whereas L1 regularization works better on a 1-year dataset. Raising the inverse of the regularization strength to 1.8 further improves the regularized model.
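The comparisons summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use the Beijing data: the data are synthetic, and the pollution labeling rule, the polynomial degree of 2, the `liblinear` solver, and the feature scaling step are all assumptions made for illustration. Only the L1/L2 penalty types and the inverse regularization strength of 1.8 (the `C` parameter of sklearn's `LogisticRegression`) come from the abstract.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (ASSUMPTION: not the Beijing 2017-2021 dataset).
n = 1000
no2 = rng.uniform(5.0, 120.0, size=n)
pm25 = 0.9 * no2 + 0.004 * no2**2 + rng.normal(0.0, 15.0, size=n)
pm25 = np.clip(pm25, 1.0, None)

# Hypothetical labeling rule for "polluted" (ASSUMPTION: thresholds are illustrative).
polluted = ((pm25 > 75.0) | (no2 > 80.0)).astype(int)

# --- Linear vs. polynomial regression of PM2.5 on NO2 ---
X_reg = no2.reshape(-1, 1)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, pm25, test_size=0.2, random_state=0
)
linear = LinearRegression().fit(Xr_train, yr_train)
poly = make_pipeline(
    PolynomialFeatures(degree=2),  # degree 2 is an assumption
    LinearRegression(),
).fit(Xr_train, yr_train)

# Coefficient of determination (R^2) on the training set for both models.
r2_linear_train = r2_score(yr_train, linear.predict(Xr_train))
r2_poly_train = r2_score(yr_train, poly.predict(Xr_train))

# --- Logistic regression on (PM2.5, NO2) with L1 and L2 penalties ---
X_clf = np.column_stack([pm25, no2])
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_clf, polluted, test_size=0.2, random_state=0, stratify=polluted
)


def fit_logistic(penalty):
    """Fit a scaled logistic regression with the given penalty and C=1.8."""
    model = make_pipeline(
        StandardScaler(),  # scaling is an assumption, not stated in the abstract
        # liblinear is chosen here because it supports both 'l1' and 'l2'.
        LogisticRegression(penalty=penalty, C=1.8, solver="liblinear"),
    )
    return model.fit(Xc_train, yc_train)


accuracy = {
    penalty: accuracy_score(yc_test, fit_logistic(penalty).predict(Xc_test))
    for penalty in ("l1", "l2")
}

print("train R^2 (linear):    ", round(r2_linear_train, 4))
print("train R^2 (polynomial):", round(r2_poly_train, 4))
print("test accuracy by penalty:", accuracy)
```

Because the degree-2 model contains the linear model as a special case, its training-set coefficient of determination cannot be lower than the linear one. Whether L1 or L2 gives the higher test accuracy depends on the data, so the numbers printed here say nothing about the results reported for Beijing.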
Data availability
All data generated or analyzed during this study are included in this article.
Acknowledgements
We acknowledge the data support from the “Weather Hindcast Website (http://tianqihoubao.com/)” and the “Beijing Municipal Ecological and Environmental Monitoring Center (http://www.bjmemc.com.cn/).” We also thank the reviewers and editors for their valuable comments and suggestions, and we sincerely thank our corresponding author for her instruction and supervision.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
All authors contributed to the study conception and design. Specifically, material preparation, data collection, and analysis were performed by SW and XL. Data visualization was accomplished by SW. The first draft of the manuscript was written by SW and XL, and all authors commented on previous versions of the manuscript. The whole study was supervised by MW and SW. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
About this article
Cite this article
Wa, S., Lu, X. & Wang, M. Regression model and method settings for air pollution status analysis based on air quality data in Beijing (2017–2021). Int J Data Sci Anal (2023). https://doi.org/10.1007/s41060-023-00415-7
DOI: https://doi.org/10.1007/s41060-023-00415-7