Correlation Analysis Between PM2.5 Concentration and Meteorological, Vegetation and Topographical Factors in the Urbanized Ecosystem in Beijing, China

With the economic growth and massive industrialization, the air quality of China in general and industrial regions in specific has saturated with different health hazard pollutants. Among the pollutants, PM2.5 is posing some serious threats to the society. In this study we evaluated the correlation between PM2.5 concentration and 12 different meteorological, vegetation and topographical factors in Beijing, China. We used the Difference Index (DI) method and dark pixel method to retrieve the PM2.5 concentration of 30m and 1km spatial resolution. Spearman correlation analysis method was used to analyse the correlation between PM2.5 concentration and three types of 12 factors. The results showed that the forest land can play a major role in decreasing the PM2.5 concentration in the air, as in this study a significant drop of (18.78%) was observed in PM2.5 concentration in the regions having coniferous forest. Moreover, the PM2.5 reduction rate was positively correlated with forest vegetation coverage (FVC). Our results demonstrated that relative humidity, air pressure and water vapour pressure were positively correlated with PM2.5, while air temperature and wind speed were negatively correlated. The altitude and slope showed a weak negative correlation with PM2.5 concentration, while, aspect was very weakly correlated with the PM2.5 concentration. The findings of this study could help design the urban green space planning and air pollutioncontrol in the heavily populated urban ecosystems.


INTRODUCTION
In recent decades China has experienced an exponential infrastructural and developmental growth. With the economic growth and massive industrialization, the air quality of China, in general, and industrial regions in particular have saturated with different health hazard pollutants. Among the pollutants, the PM 2.5 is posing some serious threats to society. The distribution of PM 2.5 is affected by the emission, diffusion and propagation conditions of air pollution sources. Moreover, the atmospheric chemical reactions, under specific conditions, diffusion of pollen and other particles in the air also facilitate its distribution. Recently, many research studies have been focused on the PM 2.5 spatial distribution and related factors , as it provides a reference for the effective control and management of atmospheric particles.
The diffusion and propagation of PM 2.5 is a very complex system, which is affected by many factors. Some studies established a direct correlation of spatial distribution of PM 2.5 with meteorological factors, such as wind speed and air pressure etc. (Feng 2019). Table 1 exhibits the findings of some studies that are conducted to assess the correlation between temperature and particle concentration in 10 years (Yang 2009, Li 2011, Deng 2012. Wang 2019a, Hou 2019). It can be seen that the correlation between the same factors with PM 2.5 concentration may be different in different regions. The correlation of the same region, factor and different time is not completely consistent.
The leaves, branches and trunks of plants have microstructure, rough texture, and secretions such as the wax layer that can absorb PM 2.5 in its surface. The complex three-dimensional structure of forest vegetation, coniferous and broad-leaved forests canopy, and groove like tissue on leaves surface having a certain degree of humidity and roughness, that can effectively intercept and retain fine PM 2.5 found in the air ). In addition, vegetation cover and types and canopy shading and transpiration play an important role in the construction of a suitable environment for particle deposition, effectively avoiding the environment that is not conducive to PM 2.5 deposition . Nowak et al. (2006) showed that trees can remove air pollutants and improve urban air quality. The estimated total annual pollutant removal can reach up to 711 thousand tons in the urban ecosystem using trees. Liu et al. (2014) studied the correlation between PM 2.5 concentration and forest area in the Fifth Ring Road of Beijing and their results showed that PM 2.5 concentration was negatively correlated with air temperature and positively correlated with relative humidity.
Vegetation cover is an important index to measure the state of vegetation and plays an effective role in the concentration of PM 2.5 (Nowak 2006. Above ground vegetation biomass is the most direct expression of the structure and function of the forest ecosystem, which can reflect the environmental quality of the ecosystem (Zhang & Buren 2020). Wind directly affects the propagation of PM 2.5 , air temperature and air pressure play an important role in the diffusion, and dilution and transport of PM 2.5 ). In addition, the PM 2.5 has certain hygroscopic property (Han & Wang 2016). The relative humidity and water vapour pressure reflect the moisture content and stability of the atmosphere (Wang et al. 2020) and play a significant role in the distribution and concentration of PM 2.5 . Therefore, this study selected 5 meteorological factors including air temperature (temp), relative humidity (RH), wind speed (V_wind), air pressure (p) and water vapour pressure (e). At the same time, the terrain conditions affect climate and vegetation growth. Zhang (2019) studied the concentration of particles and the geographical topography in Zhangzhou of 2017-2018, found that the geographical topography of this city is not conducive to the dilution and diffusion of air pollutants. Terrain factors, such as altitude and slope, affect the meteorological conditions (e.g. temperature and wind) and vegetation growth to a certain extent. This study focused to estimate the influence of terrain factors including altitude, slope, position and direction on the distribution of PM 2.5 .
In this study, we evaluated the correlation of atmospheric particulate PM 2.5 concentration with 3 types of 12 factors (Table 2).

Overview of the Study Area
Beijing, the capital of China (39°28'-41°03'N，115°25'-117°30'E) is spanning over an area of 16410.54 km 2 . It is surrounded by mountains on the north, east and west side. The terrain is high in the northwest and low in the southeast, with the warm and humid zone and semi-humid continental monsoon climate. Frequent haze in winter and PM 2.5 are the main pollutants throughout the year. Beijing has set up 35 ground air quality monitoring stations and 17 meteorological stations (Fig. 1). The zonal vegetation types in Beijing are warm temperate deciduous broad-leaved forests and temperate coniferous forests (Feng et al. 2015). The results of the Eighth National Forest Resources Inventory from 2009 to 2013 showed that the forest coverage rate of Beijing is 35.84%, with the total area of forest land of 1013500 hectares, and the forest area of 588100 hectares.

Data Acquisition and Processing
This paper aims to correlate the temporal and spatial distribution of PM 2.5 with meteorological, topographical, vegetation and other factors using data obtained from multiple sources  ( Table 3). The data acquisition and analysis were carried out on the days with less cloud and better remote sensing image acquisition effect. The satellite remote sensing data were acquired of the year 2016 for dates including March 1, May 6, June 8, June 9, June 10, July 24, August 4, October 19, December 6, December 14, December 18 and December 19 in 2016, respectively. All the spatial data were unified as WGS_1984_UTM_Zone_50N coordinate system.

Inversion of PM 2.5 Concentration
At present, most researchers use the data of ground monitoring stations and Geographic Information System (GIS) spatial interpolation to obtain the spatial-temporal distribution of PM 2.5 . This kind of method is greatly influenced by the number and distribution of sampling points. There are 35 air quality monitoring stations in our study area. The monitoring stations are unevenly distributed in the area as are

Data Acquisition and Processing
This paper aims to correlate the temporal and spatial distribution of PM 2.5 with meteorological, topographical, vegetation and other factors using data obtained from multiple sources (    more concentrated in the central area as compared to suburbs (Fig. 1). The accuracy of PM 2.5 concentration obtained by this method was poor. According to the research of Su et al. (2015) the correlation coefficient "r" between aerosol optical thickness (AOD) data retrieved from NPP_VIIRS (National Polar-orbiting Partnership_Visible Infrared Imaging Radiometer Suite) 750m surface reflectance data and AERONET (AErosol RObotic NETwork) ground aerosol remote sensing data is 0.7920. In this paper, we used this method to retrieve AOD by using the surface reflectance and calibration data of VIIRS sensor of NPP satellite, through altitude correction, using 6S model, establishing look-up table, and resampling it to 1km resolution. According to the transit time of the NPP satellite on the same day, we selected the closest ground monitoring station to collect the PM 2.5 concentration data for model fitting. Due to the seasonal difference of AOD, the four seasons including Spring (March-May), Summer (June-August), Autumn: (September-November), Winter (December-February) were respectively fitted as per the division of meteorology to retrieve the concentration and distribution of PM 2.5 on that day in Beijing. The model accuracy of four seasons are 0.9299 (spring) , 0.8181 (summer) , 0.8958 (autumn) and 0.8597 (winter) respectively. To analyze the effect of vegetation on the concentration of PM 2.5 we followed our previous work (Feng et al. 2018) and used near-infrared and infrared bands of Landsat 8 data to establish the remote sensing Difference Index (DI), and inversed the 30m particle concentration distribution. The comparison of PM 2.5 distribution maps which was retrieved using the aerosol inversion method (VIIRS data) and the difference index method (Landsat data) on March 1, 2016 is given in Fig. 2

Vegetation Data Acquisition
Quantification of forest land type: Based on the sub-compartments survey data of 2016 in Beijing obtained from Beijing forestry bureau, we carried out a geographical analysis of the land type of forest in the study area. According to Technical Regulations for Forest Resources Planning and Design in China, the land classification of the data was   As the forest land is a qualitative factor, it needs to be quantified and assigned for carried out to generate 1km resolution grid map of the forest resources found in our study area (Fig. 3).
As the forest land is a qualitative factor, it needs to be quantified and assigned for statistical analysis. According to the available data, there are very few data about young afforested land, sandy wasteland, other suitable forestlands, cutting slash and burned blank, therefore, in the statistical analysis, they are classified into the first level classification of young forest and suitable forest land and site land (Table 4). To intuitively analyze the impact of forest land on PM 2.5 , four second level non-forest lands were classified into one category. The water areas in non-forest land were not included in the statistical analysis.
Vegetation coverage and aboveground biomass data acquisition: In this paper, MODIS (Moderate-resolution Imaging Spectroradiometer) 09GQ surface reflectance data from 2016 were used to obtain vegetation coverage (250m), while the MRT (MODIS Reprojection Tool) was used to re-project and splice the mod09 data. Cutting and band operation were carried out in ENVI to calculate vegetation coverage FVC, which was then resampled in ArcGIS software to 1km resolution. Due to the small change of vegetation coverage in a short period, the data with good inversion effect of May 6 (spring), July 24 (summer), October 17 (autumn) and December 19 (winter) for the year 2016 were selected as the coverage data of each season (Fig. 4).
Based on the above-ground biomass data of global 1km forest provided by the national integrated earth observation data-sharing platform, the biomass distribution map of Bei- Auxiliary production forestland auxiliary production forestland 180 9 Non-forestland farmland 200 water area unused land Other land

Other land
Vegetation coverage and aboveground biomass data acquisition: In this paper, MODIS (Moderate-resolution Imaging Spectroradiometer) 09GQ surface reflectance data from 2016 were used to obtain vegetation coverage (250m), while the MRT (MODIS Reprojection Tool) was used to re-project and splice the mod09 data. Cutting and band operation were carried out in ENVI to calculate vegetation coverage FVC, which was then resampled in ArcGIS software to 1km resolution. Due to the small change of vegetation coverage in a short period, the data with good inversion effect of May 6 (spring), July 24 (summer), October 17 (autumn) and December 19 (winter) for the year 2016 were selected as the coverage data of each season (Fig. 4). Based on the above-ground biomass data of global 1km forest provided by the jing (Fig. 5) was generated after cutting. According to the acquisition time of the measured data, the biomass data were only used for summer and autumn seasons. Forest profile data acquisition: To analyze the impact of vegetation on the concentration of PM 2.5 , three large forest areas, Tiantan Park, Orson centre and northern mountain area, were selected to obtain the vegetation coverage and analysed it on a high-resolution scale.
In ERDAS software, two sections of horizontal and vertical lines were drawn, which run through the forest area and extend to two pixels outside the forest area. The PM 2.5 concentration and the forest vegetation coverage on the section line were extracted based on Landsat data. To facilitate viewing, we enlarged the vegetation coverage value by 300%. Calculation of PM 2.5 reduction rate in forest area: In the equation (3) t is the PM 2.5 concentration of air particles in the forest area, and x b is the concentration outside the forest area. The PM 2.5 concentration at the two ends of the profile line is taken as x b , then the reduction rate of PM 2.5 (ρ) can be calculated by the following formula: ion rate of PM2.5 (ρ) can be calculated by the following formula: ted the reduction rate of each point on the section line of Fig. 6, and fit gram model with the extracted vegetation coverage value using the l to predict the reduction rate when the vegetation coverage exceeds 90% ward to vegetation coverage = 1). In the modelling, 30% of data were …(1) We calculated the reduction rate of each point on the section line of Fig. 6, and fit the scatter diagram model with the extracted vegetation coverage value using the extended model to predict the reduction rate when the vegetation coverage exceeds 90% (extending forward to vegetation coverage = 1). In the modelling, 30% of data were used to verify the accuracy of the model (R 2 i = 0.649).

Meteorological Data Acquisition
The meteorological data were acquired from the MICAPS (Meteorological Information Combine Analysis and Process System) based on the data of 17 meteorological stations in Beijing. The daily average data corresponding to the time in the inversion model of PM 2.5 was selected for statistical analysis. The relative humidity data were obtained by querying the temperature and dew point. The spherical variogram of the Kriging interpolation tool of ArcGIS was used to interpolate the data of 5 meteorological factors into Beijing. Then we resampled the data to 1km. The distribution of five meteorological factors is shown in Fig. 6 (Data of 1 st March 2016).

Acquisition and Quantification of Terrain Data
Using the DEM (Digital Elevation Model) data of 1km spatial resolution downloaded from the scientific data centre in the arid area of the cold region website (http://westdc.westgis. ac.cn), the altitude data of Beijing were obtained after cutting. 3D analysis tools in ArcGIS were used to extract the slope and aspect data. In the forest resource survey, the slope position was usually divided into six categories: ridge, upper, middle, flat, lower and valley to study the influence of site conditions on vegetation growth. The distribution of atmospheric particles in a small range was not very different due to their existing forms. Therefore, the slope position was divided into three categories upper, middle and lower (Zhao et al. 2014). The slop direction data was transformed into the vector in ArcGIS. After that, we extracted the maximum elevation (Hmax) and the minimum elevation (Hmin) of each  ope direction and slope position are qualitative factors. To carry out analysis, they were assigned and quantified respectively (Fig. 7, Table 5,  lope direction and slope position are qualitative factors. To carry out analysis, they were assigned and quantified respectively (Fig. 7, Table 5,

…(3)
The slope direction and slope position are qualitative factors. To carry out correlation analysis, they were assigned and quantified respectively (Fig. 7, Table 5, Table 6).

Sample Data Collection
Using the create random points tool in ArcGIS, 3000 random sampling points were created in Beijing, and the minimum allowable distance was set as 1km (Fig. 8). PM 2.5 concentration data and 12 kinds of impact factors data on the sampling points were extracted for statistical analysis. After removing the data that failed to acquire due to the quality of remote sensing image and abnormal data whose deviation from the average is more than three times of the standard deviation, 20473 groups of valid data were obtained.

Spearman Correlation Analysis Method
Spearman correlation analysis can provide the common changing trend of two random variables under the linear or non-linear correlation, and the conclusion can still be obtained under the condition of unclear overall distribution and lack of overall information (Cameriere et al. 2019). The calculation formula is:

Spearman Correlation Analysis Method
Spearman correlation analysis can provide the common changing trend of tw random variables under the linear or non-linear correlation, and the conclusion can st be obtained under the condition of unclear overall distribution and lack of overa information (Cameriere et al. 2019). The calculation formula is: Where, r is the correlation coefficient, n is the number of samples. Suppose that th original data xi, yi have been arranged in the order of large to small, xi 1 , yi 1 is the ran of xi, yi, and di is the difference of the rank of xi 1 , yi 1 . The absolute value range correlation coefficient 'R' (0~1) is used to judge the correlation strength, the larger th value, the stronger the correlation (Table 7).

Analysis of the Impact of Vegetation on PM2.5
It can be seen from the comparison between the left and right graphs of

Spearman Correlation Analysis Method
Spearman correlation analysis can provide the common changing trend of random variables under the linear or non-linear correlation, and the conclusion can be obtained under the condition of unclear overall distribution and lack of ove information (Cameriere et al. 2019). The calculation formula is: Where, r is the correlation coefficient, n is the number of samples. Suppose that original data xi, yi have been arranged in the order of large to small, xi 1 , yi 1 is the r of xi, yi, and di is the difference of the rank of xi 1 , yi 1 . The absolute value range correlation coefficient 'R' (0~1) is used to judge the correlation strength, the larger value, the stronger the correlation (Table 7).

Analysis of the Impact of Vegetation on PM2.5
It can be seen from the comparison between the left and right graphs of Fig. 9 at the forest edge (including the vegetation blank area in the middle of the forest), concentration of PM2.5 decreased significantly while in the interior of the forest change was smaller.

…(5)
Where, r is the correlation coefficient, n is the number of samples. Suppose that the original data x i , y i have been arranged in the order of large to small, x i 1 , y i 1 is the rank of x i , y i , and d i is the difference of the rank of x i 1 , y i 1 . The absolute value range of correlation coefficient 'R' (0~1) is used to judge the correlation strength, the larger the value, the stronger the correlation (Table 7).

Analysis of the Impact of Vegetation on PM 2.5
It can be seen from the comparison between the left and right graphs of Fig. 9 that at the forest edge (including the vegetation blank area in the middle of the forest), the concentration of PM 2.5 decreased significantly while in the interior of the forest this change was smaller.
The reduction rate of PM 2.5 concentration (Fig. 10) was not in a simple linear correlation with the vegetation coverage. As with an increase of vegetation coverage, a decrease in the reduction rate of PM 2.5 was observed, but when the vegetation coverage was too high (>83%), a decrease in the reduction rate took place.
The PM 2.5 concentration in all different forest land types was lower than that of non-forest land ( Table 8). The most into Beijing. Then we resampled the data to 1km. The distribution of five meteorological factors is shown in Fig. 6 (Data of 1 st March 2016).

Acquisition and Quantification of Terrain Data
Using the DEM (Digital Elevation Model) data of 1km spatial resolution downloaded from the scientific data centre in the arid area of the cold region website (http://westdc.westgis.ac.cn), the altitude data of Beijing were obtained after cutting.
3D analysis tools in ArcGIS were used to extract the slope and aspect data. In the forest resource survey, the slope position was usually divided into six categories: ridge, upper, middle, flat, lower and valley to study the influence of site conditions on vegetation growth. The distribution of atmospheric particles in a small range was not very different due to their existing forms. Therefore, the slope position was divided into three categories upper, middle and lower (Zhao et al. 2014). The slop direction data was transformed into the vector in ArcGIS. After that, we extracted the maximum elevation   The slope direction and slope position are qualitative factors. To carry out correlation analysis, they were assigned and quantified respectively (Fig. 7, Table 5, Table 6).  Fig. 7: Altitude, slope, aspect and TPI distribution of Beijing.
The slope direction and slope position are qualitative factors. To carry out correlation analysis, they were assigned and quantified respectively (Fig. 7, Table 5, Table 6).

Sample Data Collection
Using the create random points tool in ArcGIS, 3000 random sampling points were created in Beijing, and the minimum allowable distance was set as 1km (Fig. 8). PM2.5 concentration data and 12 kinds of impact factors data on the sampling points were extracted for statistical analysis. After removing the data that failed to acquire due to the quality of remote sensing image and abnormal data whose deviation from the average is more than three times of the standard deviation, 20473 groups of valid data were obtained.

Correlation Analysis Between Influence Factors and PM 2.5
The correlation analysis results of daily average PM 2.5 concentration and meteorological factors and vegetation factors are given in Table 9 and Table 10, while Table 11 shows the correlation analysis result of annual average PM 2.5 concentration and terrain factors.
The correlation between air temperature and PM 2.5 was not completely consistent for the studied seasons. It was recorded negative in summer, autumn and winter while positive in spring. In terms of the correlation coefficient, the correlation between air temperature and atmospheric particulate was weak (< 0.4), among which the correlation The reduction rate of PM2.5 concentration (Fig. 10) was not in a simple linear correlation with the vegetation coverage. As with an increase of vegetation coverage, a decrease in the reduction rate of PM2.5 was observed, but when the vegetation coverage was too high (>83%), a decrease in the reduction rate took place.  The correlation between relative humidity and PM 2.5 was significantly positive in summer (R = 0.475) and winter (R= 0.368), while a weak correlation was observed for spring and autumn seasons, with the correlation coefficients less than 0.2.
Wind speed and PM 2.5 were negatively correlated in spring and winter, with the correlation coefficient greater than 0.4. In summer, the correlation was extremely weak (R < 0.1), while in autumn, the correlation between wind speed and atmospheric particles was not significant (sig > 0.05).
There was a significant positive correlation between the concentration of PM 2.5 and forest land types for all four studied seasons (sig < 0.01, R > 0). According to the table of assignment of forest land type (Table 3), the assignment was    gradually increasing from closed forest land to non-forest land, indicating that forest has a decreasing effect on the PM 2.5 concentration. Vegetation coverage, biomass and PM 2.5 concentration showed a significant negative correlation, with a higher correlation coefficient for summer (-0.503) and autumn (-0.508).
In this study, a weak correlation was observed between terrain factors and PM 2.5 concentration. Altitude was negatively correlated with PM 2.5 concentration (R = -0.266) while, the correlation between PM 2.5 and the slope direction (R = -0.054), and slope position (R = 0.023) concentration was poor.

Analysis of Impact of Vegetation on PM 2.5 and Correlation
Results of this study showed how different factors can affect the PM 2.5 concentration in an area. Our study outcomes showed that the coniferous forest can significantly reduce PM 2.5 concentration. The special leaf shape and three-dimensional microstructure of coniferous forest land has a larger contact area and can effectively intercept and retain fine atmospheric particles in the air. Moreover, the coarse texture and secretion promote the adsorption of particulates, which can cause a stronger PM 2.5 reduction effect than mixed, broad-leaved and other forest lands (Yu 2019). In terms of time, the negative correlation between vegetation coverage, biomass and PM 2.5 were larger in summer and autumn, which may be related to the growing season of main tree species in Beijing, concentrated in late spring to early autumn, thus all physiological functions of vegetation reached the best state, and the effect on PM 2.5 could have improved (Bai et al. 2020).
When the forest vegetation coverage was greater than 0.83, a decrease in the PM 2.5 concentration was observed. The dense canopy and complex morphological structure where the vegetation coverage is too high affect the air turbulence in the forest area, which is not conducive to the diffusion of atmospheric particles (Zhang et al. 2018). Moreover, when the vegetation coverage is dense, the microclimate has high humidity and low temperature in the forest area. The hygroscopicity of atmospheric particles makes it easier to gather in the area with higher humidity (Liu et al. 2016).

Correlation Analysis Between Meteorological Factors and PM 2.5
Overall, the temperature was negatively correlated with the PM 2.5 concentration, as the surface temperature increases, the convection between the surface and the upper cold air gets stronger. As a result, the turbulence is enhanced, which could be conducive to the diffusion of atmospheric particles (give a reference for this statement). In winter, the correlation coefficient value was very small, which may be related to the frequent inversion phenomenon in winter. Inversion in meteorology refers to the abnormal phenomenon that the air temperature over the ground increases with the height under certain weather conditions. At the same time, the atmospheric structure of inversion layer is stable, which is not conducive to the diffusion of atmospheric particles (Wang & Ni 2019). The correlation in spring was positive, which may be related to the increase of dust, vegetation floating pollen and other particles in the atmosphere, so temperature increase causes the acceleration of atmospheric flow, increasing particulate concentration. Air particles have obvious hygroscopicity, and the high humidity air often causes the agglomeration of the air particles, resulting in the heavy pollution of the air particles (Long et al. 2017).
Wind speed has an important influence on the diffusion, dilution, transportation and propagation of atmospheric particles. In a certain range, the greater the wind speed, the faster the diffusion and propagation of atmospheric particles. When the wind speed is high, an area with a lot of dust on the ground could produce fugitive dust. Besides, Beijing is located in the monsoon climate zone, therefore, in the summer monsoon blows from the ocean to the land, bringing the moisture content along with. The hygroscopicity of atmospheric particles causes the concentration of atmospheric particles to rise to a certain extent. In the area with high air pressure, the atmospheric convection and diffusion of atmospheric particles are weak, hence the concentration value is high.
Water vapour pressure was significantly correlated with PM 2.5 . This is because when the water vapour pressure is high, the water vapour content in the atmosphere causes the partial pressure of water vapour to be high, and water vapour is easy to attach to the fine atmospheric particles to form the pollution of atmospheric particles (Bao et al. 2017).

Correlation Analysis Between Terrain Factors and PM 2.5
Due to vertical movement of particles with air convection, their concentrations become lower at higher altitude. In addition, with the increase of altitude, the air temperature also decreases, which is not conducive for the diffusion of particles (Wang 2019b). In our results, a weak negative correlation was observed between slope and PM 2.5 concentration; the steeper the slope is, the lower the concentration of atmospheric particles. The possible reason is that the slope affects the air movement, to a certain extent. As the air collision and turbulence phenomenon at the steep slope is more frequent than that at the flat slope and the particles diffuse quickly with the airflow .