Relevance analysis and short-term prediction of PM2.5 concentrations in Beijing based on multi-source data
Introduction
High levels of haze and PM2.5, usually caused by coal combustion, vehicle emissions, industrial processes, and petroleum usage, have become an increasingly serious public crisis in urban areas (Kong et al., 2015, Tan et al., 2015). In 2009, the annual average concentration of PM2.5 in Milan was 30 μg m-3 (Perrone et al., 2012), which was higher than the World Health Organization's short-term (24-hour) guideline value of 25 μg m-3 (Caggiano et al., 2011). In January 2013, Beijing experienced a severe and long-lasting haze episode during which the daily mean PM2.5 concentration exceeded 75 μg m-3 (the Second Grade National Standard for China) on 22 days and exceeded 35 μg m-3 (the First Grade National Standard for China) on 27 days (He et al., 2014). Moreover, as many studies have shown, the health risks from PM2.5 exposure can be serious, due primarily to the chemical composition of PM2.5. Its carcinogenic constituents can raise the risk of pulmonary disease, emphysema, and lung and nasal cancer (Diaz and Dominguez, 2009, Matus et al., 2012, Gursumeeran Satsangi et al., 2014). To respond efficiently to the PM2.5 problem, at both the individual and the governmental level, it is necessary to develop effective models that analyze the correlation between PM2.5 and potentially related factors and that predict PM2.5 concentrations as a time series.
Most research on the PM2.5 problem focuses on either Sampling Analysis or Atmospheric Dispersion Model Simulation. Sampling Analysis can describe the characteristics of PM2.5 in a given region to a certain extent. A PM-visibility correlation study was carried out in the Yangtze River Delta in China using sampled PM data and meteorological data from 1980 to 2012. The results showed that fine PM, such as PM2.5, was the key influence on visibility in this region. Fine PM affects both the PM itself and the relative humidity, ultimately altering visibility (Cheng et al., 2013). Additional studies with sampled PM data showed that PM2.5 follows a characteristic seasonal cycle (Zhao et al., 2013), and that PM2.5 conditions in large Chinese cities differ greatly according to the region in which the city is located (Yang et al., 2011). Atmospheric Dispersion Model Simulation is used mainly to capture the dynamic mechanisms that drive the dissemination and transmission of PM2.5. To model the distribution and dispersion of PM2.5 in particular regions, researchers have applied the CMAQ Model (Pun et al., 2006, Liu et al., 2008, Wang et al., 2012, Chemel et al., 2014), the WRF-Chem Model (Saide et al., 2011, Marcelo et al., 2012), the GEOS-CHEM Model (Hu et al., 2009, Zhang et al., 2013), and other atmospheric dispersion models.
Despite the studies mentioned above, the ability to analyze and predict PM2.5 from the perspective of atmospheric dynamics is still limited because of the complexity of the formation and development of PM2.5 (Kirk and Peter, 2007, Kirk and Kristen, 2011, Zhang et al., 2016). Recently, studies of air pollution from a Big Data perspective have appeared. Some studies described the qualitative correlation between PM2.5 and meteorological factors, traffic factors, human mobility, etc., through a Data Co-Training Method (Zheng et al., 2013). Another way to qualitatively analyze the correlation between PM2.5 and related factors is Correlation Coefficient Analysis, which shows which factors are most closely related to PM2.5 (Chen et al., 2014, Yang et al., 2015). Some quantitative methods based on Neural Network models have been developed to forecast PM2.5 concentrations (Russo et al., 2013, Arhami et al., 2013, Fu et al., 2015). In studies that use data mining methods to analyze correlations and estimate the current PM2.5 concentration from related factors, the RMSE between predicted and observed PM2.5 was roughly 20 to 30 (Xie et al., 2015, You et al., 2016, Fu et al., 2015).
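As an illustration of the Correlation Coefficient Analysis mentioned above, the following minimal sketch ranks candidate factors by the absolute value of their Pearson correlation with PM2.5. It is not the procedure of the cited studies: the function names `pearson_r` and `rank_factors`, the factor names, and all numeric values are hypothetical and chosen only to make the example runnable.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def rank_factors(pm25, factors):
    """Sort factors by |r| with PM2.5, strongest association first."""
    scores = {name: pearson_r(values, pm25) for name, values in factors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up daily values for illustration only (not data from the paper).
pm25 = [35.0, 80.0, 120.0, 60.0, 150.0, 40.0]
factors = {
    "pm10": [50.0, 110.0, 160.0, 90.0, 200.0, 70.0],
    "wind_speed": [4.0, 2.5, 1.5, 3.0, 1.0, 3.8],
    "temperature": [5.0, 3.0, 6.0, 2.0, 4.0, 7.0],
}

ranking = rank_factors(pm25, factors)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

Ranking by the absolute value keeps strongly negative associations (such as wind speed in this toy data) near the top, since a factor that disperses PM2.5 is as informative as one that rises with it.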
Big data encompasses many different data types, ranging from social media data (mobile app data, microblog data, internet search engine data, etc.) to purely physical data. For example, as the concentration of PM2.5 increases, complaints and comments about it in microblogs increase substantially. Studies of the correlation between PM2.5 and social media data help in understanding PM2.5 and thus facilitate information distribution and early warning of PM2.5. This paper established a correlation analysis model relating PM2.5 to physical data (meteorological data, other pollutant concentration data) and social media data (especially microblog data) based on the Multivariate Statistical Analysis method and the BPNN method. The correlation analysis models were evaluated using the RMSE. In the case study in Beijing, China, a city that has faced a complex PM2.5 problem in recent years, the RMSE between the estimated and actual PM2.5 concentrations reached 26.69 for the Multivariate Statistical Analysis method and 24.06 for the BPNN method. A short-term prediction of PM2.5 was made from historical PM2.5 data using the ARIMA Time Series model, and the relationship between the RMSE and the prediction time-lag was also studied. This study helps to realize real-time monitoring, analysis and early warning of PM2.5, and also broadens the application of big data and multi-source data mining methods.
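The RMSE indicator, and the way it can change with the prediction time-lag, can be sketched as follows. This is a hedged illustration only: in place of the ARIMA model used in the paper, it uses a naive persistence forecast (the value `lag` steps ahead is predicted to equal the current value), and the function names and the PM2.5 series are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def persistence_rmse_by_lag(series, max_lag):
    """RMSE of the naive forecast x[t + lag] = x[t] for lag = 1..max_lag.

    This is a stand-in baseline, not the ARIMA model from the paper.
    """
    if not 1 <= max_lag < len(series):
        raise ValueError("max_lag must be between 1 and len(series) - 1")
    results = {}
    for lag in range(1, max_lag + 1):
        predicted = series[:-lag]  # values known at forecast time
        observed = series[lag:]    # values that actually occurred lag steps later
        results[lag] = rmse(predicted, observed)
    return results


# Made-up daily mean PM2.5 values for illustration only.
daily_pm25 = [40.0, 55.0, 90.0, 130.0, 160.0, 120.0, 70.0, 45.0, 60.0, 95.0]

lag_errors = persistence_rmse_by_lag(daily_pm25, max_lag=3)
for lag, error in lag_errors.items():
    print(f"lag {lag}: RMSE = {error:.2f}")
```

Any forecasting model can be substituted for the persistence step by producing `predicted` values for each lag and passing them to `rmse` together with the matching observations.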
Overview
This paper focused on relevance analysis and short-term prediction of PM2.5 using multi-source data. The physical data used in this study included meteorological data (regional average rainfall, daily mean temperature, average relative humidity, average wind speed, maximum wind speed) and other pollutant concentration data (CO, NO2, SO2, PM10). We selected microblog data to represent social media data, since microblogging is a widely used platform with a large number of users in China. Fig. 1 shows the
Study area and data
Beijing covers an area of 16 808 square kilometers and has a population of nearly 21.7 million. PM2.5 levels are currently extremely high in highly-populated city clusters such as the Beijing-Tianjin-Hebei region (Zhang and Cao, 2015). Beijing is thus a typical city faced with a serious PM2.5 problem. Fig. 3 shows the study area. The black marks on the map represent the main air monitoring stations for each district in Beijing.
The time period used in this study was from 1st January to 31st
Relevance analysis of PM2.5 and the related factors based on Multivariate Statistical Analysis
We collected physical data (meteorological data, including regional average rainfall, daily mean temperature, average relative humidity, average wind speed, maximum wind speed, and other pollutant concentration data, including CO, NO2, SO2, PM10) and social media data (microblog data) from 2014. Correlation scatter diagrams of PM2.5, other physical data, and social media data are plotted using the database in Fig. 4. For the content transmission of microblog is common in specific news,
Conclusion
Correlation analysis and short-term prediction of PM2.5 based on multi-source data mining were explored in this paper. Two methods were developed to analyze the correlation of PM2.5 and other possibly related factors: physical data (meteorological data, including regional average rainfall, daily mean temperature, average relative humidity, average wind speed, maximum wind speed), other pollutant concentration data (CO, NO2, SO2, PM10) and social media data (microblog data). This paper also
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 71473146), the China Clean Development Mechanism Foundation (Grant No. 2013049), and the Collaborative Innovation Center of Public Safety.
References (33)
- et al. (2011). PM2.5 measurements in a Mediterranean site: two typical cases. Atmos. Res.
- et al. (2014). Application of chemical transport model CMAQ to policy decisions regarding PM2.5 in the UK. Atmos. Environ.
- et al. (2014). Observation of aerosol optical properties and particulate pollution at background station in the Pearl River Delta region. Atmos. Res.
- et al. (2013). Long-term trend of haze pollution and impact of particulate matter in the Yangtze River Delta, China. Environ. Pollut.
- et al. (2009). Health risk by inhalation of PM2.5 in the metropolitan zone of the City of Mexico. Ecotoxicol. Environ. Saf.
- et al. (2009). New algorithms and their application for satellite remote sensing of surface PM2.5 and aerosol absorption. Aerosol Sci.
- et al. (2015). Variation of polycyclic aromatic hydrocarbons in atmospheric PM2.5 during winter haze period around 2014 Chinese Spring Festival at Nanjing: insights of source changes, air mass direction and firework particle injection. Sci. Total Environ.
- et al. (2012). Health damage from air pollution in China. Glob. Environ. Change
- et al. (2012). Sources of high PM2.5 concentrations in Milan, Northern Italy: molecular marker data and CMB modelling. Sci. Total Environ.
- et al. (2013). Air quality prediction using optimal neural networks with stochastic variables. Atmos. Environ.
- Forecasting urban PM10 and PM2.5 pollution episodes in very stable nocturnal conditions and complex terrain using WRF-Chem CO tracer model. Atmos. Environ.
- Urban air quality and regional haze weather forecast for Yangtze River Delta region. Atmos. Environ.
- Characterization of haze episodes and factors contributing to their formation using a panel model. Chemosphere
- Predicting hourly air pollutant levels using artificial neural networks coupled with uncertainty analysis by Monte Carlo simulations. Environ. Sci. Pollut. Res.
- Time series analysis, forecasting and control. J. R. Stat. Soc. Ser. A General.
- Prediction of particular matter concentrations by developed feed-forward neural network with rolling mechanism and gray model. Neural Comput. Appl.