Estimation of PM 2.5 Concentrations over Beijing with MODIS AODs Using an Artificial Neural Network

Three years of Aerosol Optical Depths (AODs) retrieved from the Moderate Resolution Imaging Spectroradiometer (MODIS) and five meteorological parameters from the NCEP FNL reanalysis data are used to generate an Artificial Neural Network (ANN)-based nonlinear model for estimating the surface PM 2.5 concentrations over Beijing. To increase the number of both the training and forecasting samples for better training results and to guarantee the continuity and representativeness of the samples, the MODIS AODs are gridded with seasonally dependent window sizes. The past PM 2.5 concentrations simulated by the ANN model are compared with the real observations for six years from 2008 to 2013. The results indicate that the ANN model can effectively simulate the surface PM 2.5 concentrations, and the mean bias, correlation coefficient, and root-mean-square error between these data are −16.10, 0.73, and 55.43, respectively. This study also demonstrates that the Planetary Boundary Layer Height (PBLH) is the most important meteorological factor in constructing the ANN model. Compared to the linear regression model using only AOD, the correlation coefficient can be increased from 0.68 to 0.76 with the ANN model by using both the AOD and the PBLH data. (Citation: Lyu, H., T. Dai, Y. Zheng, G. Shi, and T. Nakajima, 2018: Estimation


Introduction
Air pollution is one of the most important forms of environmental pollution, especially in developing countries undergoing rapid industrialization and urbanization, such as China (Guo et al. 2011). Particulate matter with a diameter less than 2.5 μm (PM 2.5) is one of the most important pollution indices for air pollution. Exposure to PM 2.5 can cause health problems and increase morbidity and mortality rates, causing serious harm to both the economy and human health (Brunekreef et al. 2002; Pope et al. 2009; Drury et al. 2010; Gao et al. 2015; Goto et al. 2016). The observation of surface PM 2.5 concentrations is of great value to science and public health security.
However, the history of surface PM 2.5 concentration observations is limited, especially in developing countries, because of the high expense of the observation instruments and related personnel. China, for example, began national PM 2.5 observations in 2014. The surface PM 2.5 concentrations of 367 cities of China are available from the national air quality real-time publishing platform of the China Environmental Monitoring Station (http://106.37.208.233:20035/). Before 2014, only certain major cities, such as Beijing, conducted observations of PM 2.5 concentrations. The remaining cities only have approximately 3 years of PM 2.5 observations.
Compared with the short observation history of surface PM 2.5 concentrations, the AOD product from the Moderate Resolution Imaging Spectroradiometer (MODIS) has a much longer history. Thus, many studies have been conducted to estimate surface PM 2.5 concentrations using MODIS AOD data and other meteorological data (Chu et al. 2003; Gupta et al. 2006; Koelemeijer et al. 2006; Liu et al. 2007; Hoff et al. 2009).
Linear regression is a conventional method for estimating PM 2.5 from MODIS AOD. It is fast to compute and easy to apply. However, linear regression cannot handle the non-linearity that is often seen in PM 2.5 estimation. To address this problem, methods such as Artificial Neural Networks (ANNs) are often used, for example, to forecast future PM 2.5 concentrations (Perez et al. 2006; Feng et al. 2015).
In this study, we first build a nonlinear model to estimate surface PM 2.5 concentrations over Beijing with MODIS AOD and NCEP FNL reanalysis data using an Artificial Neural Network (ANN). Compared with the conventional linear regression method, ANNs can handle the non-linearity in the estimation well, are fast to compute, and have a moderate operational cost. Moreover, ANNs usually achieve higher accuracy than the conventional linear regression method. In particular, the MODIS AODs used in this study are gridded with seasonally dependent window sizes to both increase the number of training and forecasting samples and guarantee the continuity and representativeness of the samples. We then aim to identify the relative importance of the factors used in the model, including AOD and five meteorological factors.

Data
The data used in this study include the Collection 6 MODIS Level-2 AOD data, the NCEP FNL reanalysis data (6-hourly) and the surface PM 2.5 concentration data for Beijing (hourly).

MODIS AODs
In this study, the Collection 6 Level-2 MODIS (Levy et al. 2013) AODs at 550 nm from Terra (MOD04_L2) are used to represent the AODs of Beijing. The MODIS Level-2 aerosol products are raster datasets with a pixel size of 10 km × 10 km (Mcmillan et al. 2008) and must be extracted from MODIS imagery using a particular window size following the methodology adopted by Ichoku et al. (2002). The size of the horizontal window should be chosen carefully to maintain both continuity and representativeness. If the size of the window is too small, the number of valid observations will be small and not sufficiently continuous, which is not good for the following model construction. However, if the size of the window is too large, the data will lack representativeness and accuracy.
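The window-size trade-off can be made quantitative by checking how quickly the AOD correlation with the center box decays with distance. The following Python sketch is only an illustration under stated assumptions: boxes are indexed by hypothetical integer offsets (i, j) from the center box, missing retrievals are stored as None, all helper names are invented here, and the default threshold of 0.7 is the value adopted in this study. It is not the code used to produce the figures.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length series, skipping pairs with a missing (None) value."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def ring_group(i, j):
    """Group index of box (i, j): 0 for the center box, 1 for the 8 boxes around it, and so on."""
    return max(abs(i), abs(j))


def mean_r_by_group(series_by_box):
    """Average correlation with the center box (0, 0) for each group of boxes."""
    center = series_by_box[(0, 0)]
    sums = {}
    for (i, j), series in series_by_box.items():
        r = pearson_r(series, center)
        if math.isnan(r):
            continue  # not enough valid pairs for this box
        total, count = sums.get(ring_group(i, j), (0.0, 0))
        sums[ring_group(i, j)] = (total + r, count + 1)
    return {g: total / count for g, (total, count) in sorted(sums.items())}


def largest_group_above(mean_r, threshold=0.7):
    """Largest group g such that every group from 0 to g has a mean R of at least the threshold."""
    best = None
    for g in sorted(mean_r):
        if mean_r[g] < threshold:
            break
        best = g
    return best
```

Running `mean_r_by_group` separately on the time series of each season and passing the result to `largest_group_above` would give one group index per season, which can then be converted to a window size from the box size.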
To identify the proper window size, the MODIS AOD data are gridded into boxes with a size of 25 km × 25 km centered over Beijing (39.9°N, 116.3°E), after which we can get one-dimensional arrays, which are the time series of the gridded AODs in different boxes. The time series of the gridded AODs are then classified according to the distance to the center box, as shown in Fig. 1. The numbers marked on the boxes represent the group of the boxes and the corresponding time series of the AODs. For example, the center box is classed into Group 0. The 8 boxes outside of Group 0 are classed into Group 1. The rest of the classification is conducted in the same manner.
The correlation coefficients (R) between the AOD time series over each grid box and those over the center box are calculated. To consider the different aerosol sources and meteorological characteristics in different seasons, the variations in the seasonal average correlation coefficients for 2013 are calculated as shown in Fig. 2. The horizontal distributions of R have clear and significant seasonal variations, indicating that different window sizes should be used in different seasons. To choose the window sizes for different seasons quantitatively, the seasonal and yearly average correlation coefficients are calculated and presented in Fig. 3. The correlation coefficients tend to decrease more slowly with increasing distance in the summer (June, July, and August) and autumn (September, October, and November) than in the winter (December, January, and February) and the spring (March, April, and May). To guarantee not only the accuracy, indicated by a comparatively high correlation coefficient, but also the representativeness, indicated by a sufficient number of valid AODs in the time dimension, the threshold correlation coefficient is set to 0.7 in this study. This value corresponds to window sizes of approximately 100 km in the spring and winter and 150 km in the summer and autumn.

PM 2.5
The PM 2.5 concentration data used in this study are from the observations at the US Embassy (http://www.stateair.net/web/historical/1/1.html), which began hourly surface PM 2.5 concentration observations in 2008, representing a longer history of observations than that of the national air quality real-time publishing platform. The observed hourly PM 2.5 concentrations at 10:00 BJT and 11:00 BJT are averaged to obtain the PM 2.5 concentrations at 10:30 BJT, corresponding to the observation time of the MODIS Terra AODs over Beijing. If the PM 2.5 observations at 10:00 BJT or 11:00 BJT are missing, then the observation before or after the missing observation is used. If the data from 9:00 BJT to 12:00 BJT are all missing, the PM 2.5 concentration is set to missing.
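The matching rule for the 10:30 BJT value can be sketched as follows. This is one possible reading of the rule and not the authors' code: the function name and the dictionary layout (hour in BJT mapped to a concentration, or None when missing) are assumptions, and so is the choice to fall back to a single value when only one side of 10:30 has data.

```python
def pm25_at_overpass(hourly):
    """Estimate the 10:30 BJT PM2.5 value from hourly observations.

    hourly maps an hour in BJT (9, 10, 11, 12) to a concentration,
    with None or an absent key meaning the observation is missing.
    """
    def first_valid(hours):
        for hour in hours:
            value = hourly.get(hour)
            if value is not None:
                return value
        return None

    before = first_valid((10, 9))   # 10:00, else fall back to 09:00
    after = first_valid((11, 12))   # 11:00, else fall back to 12:00
    available = [v for v in (before, after) if v is not None]
    if not available:
        return None                 # everything from 09:00 to 12:00 is missing
    return sum(available) / len(available)
```

For example, `pm25_at_overpass({10: 80.0, 11: 100.0})` averages the two neighbouring hours, while an empty dictionary yields None so that the sample can be dropped.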

Meteorological data
Five meteorological parameters from the NCEP FNL reanalysis product, which has a horizontal resolution of 1 degree × 1 degree and a temporal resolution of 6 hours, are used to investigate the effect of the meteorology on the PM 2.5 concentrations: Planetary Boundary Layer Height (PBLH), surface temperature at 2 meters (T 2), relative humidity at 2 meters (RH), surface wind speed (WS), and surface wind direction (WD). To be consistent with the MODIS AODs and PM 2.5 concentrations, the above parameters are interpolated to 10:30 BJT at the location of the center box (i.e., the US Embassy site at 39.9°N, 116.3°E).
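As an illustration of the temporal matching, 10:30 BJT corresponds to 02:30 UTC (BJT is UTC+8), which falls between the 00:00 and 06:00 UTC times of a 6-hourly product. The sketch below assumes simple linear interpolation in time, which the text does not specify, and uses made-up PBLH values and hypothetical function names. Wind direction is a circular quantity, so a plain linear blend is not appropriate across the 0°/360° wrap; that case is deliberately left out.

```python
def bjt_to_utc_hour(bjt_hour):
    """Convert a decimal hour in Beijing Time (UTC+8) to a decimal hour in UTC."""
    return (bjt_hour - 8.0) % 24.0


def interpolate_in_time(value_t0, value_t1, t0_hour, t1_hour, target_hour):
    """Linearly interpolate a scalar between two analysis times (assumed scheme)."""
    if not t0_hour <= target_hour <= t1_hour:
        raise ValueError("target time must lie between the two analysis times")
    weight = (target_hour - t0_hour) / (t1_hour - t0_hour)
    return (1.0 - weight) * value_t0 + weight * value_t1


overpass_utc = bjt_to_utc_hour(10.5)  # 2.5, i.e. 02:30 UTC
# Made-up PBLH values (in meters) at 00:00 and 06:00 UTC, for illustration only.
pblh_at_overpass = interpolate_in_time(600.0, 1200.0, 0.0, 6.0, overpass_utc)
```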

Methodology
A nonlinear model for estimating the surface PM 2.5 concentration of Beijing is constructed with the MODIS AODs and the five meteorological parameters. To better estimate PM 2.5 concentrations, an ANN is used in this study. Compared with general statistical regressions, an ANN is more capable of multivariate nonlinear fitting, and this method has been used successfully in air pollution estimation in the past (Balls et al. 1996; Gardner et al. 1999; Reich et al. 1999; Sofuoglu et al. 2006; Nagendra et al. 2008; Demir et al. 2010). A typical ANN consists of three parts: an input layer, which is used to receive information from outside; an output layer, which is used to output the results of the network calculation; and one or more hidden layers, which lie between the input layer and the output layer. The hidden layers neither accept external signals directly nor send signals directly to the outside. All the layers consist of units, and the units are connected by directed arcs with particular weights. A weight indicates the strength of the interaction between two interconnected artificial neurons in two different layers. By analogy with a biological neuron, the input layer corresponds to the dendrites, the output layer corresponds to the axon, and the hidden layers correspond to the cell nucleus. The whole ANN training process is analogous to the training of a neural reflex.
To construct the ANN, we use the multilayer perceptron function in SPSS, version 22 (Corp Intel. 2013). Three years (2014−2016) of MODIS AODs and meteorological parameters from the NCEP FNL reanalysis product are used to build the ANN model. There are a total of 653 groups of valid input data in this ANN construction. The input data are first normalized to a range of [0, 1], which can accelerate the calculation and eliminate the influences of single samples. The number of hidden layers is chosen as one to avoid overtraining, which may lead to poor results in the subsequent estimation (Kim et al. 2008; Yegnanarayana 2009). The hyperbolic tangent is chosen as the activation function of the hidden layer, and the identity function is chosen as that of the output layer, as this choice leads to better training results than the other options.
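The structure described here (inputs scaled to [0, 1], one hidden layer with a hyperbolic tangent activation, and an identity output) can be sketched in a few lines of plain Python. This is a minimal illustration and not the SPSS implementation: the function names, the dictionary layout of the network, and the range of the random initial weights are assumptions, and the training step is omitted entirely.

```python
import math
import random


def min_max_normalize(column):
    """Rescale one input variable to the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]  # constant column: no spread to rescale
    return [(value - low) / (high - low) for value in column]


def init_network(n_inputs, n_hidden, seed):
    """Random initial weights and thresholds (biases); the range is an arbitrary choice."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b_out": rng.uniform(-0.5, 0.5),
    }


def forward(net, inputs):
    """One forward pass: tanh hidden layer, identity (linear) output unit."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]
```

With six normalized inputs (AOD, PBLH, T 2, RH, WS, and WD), `forward(init_network(6, 7, seed=0), sample)` returns a single untrained estimate; different seeds give different initial weights.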
The number of units in the hidden layer is also an important parameter in training an ANN. Typically, when the number of units is between $2\sqrt{n} + u$ and $2n + 1$, where n represents the number of units of the input layer and u represents the number of output layer units, the training effect is comparatively good (Fletcher et al. 1993). With n = 6 inputs and u = 1 output, this range is approximately 6 to 13. Therefore, eight different numbers of units, ranging from 6 to 13, are tested to choose the best one for the model construction.
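The candidate range can be checked with a short calculation. The sketch below assumes that the lower bound is rounded up to the next integer, which is consistent with the range of 6 to 13 used here; the function name is hypothetical.

```python
import math


def hidden_unit_candidates(n_inputs, n_outputs):
    """Integer hidden-layer sizes between 2*sqrt(n) + u and 2n + 1 (lower bound rounded up)."""
    lower = math.ceil(2 * math.sqrt(n_inputs) + n_outputs)
    upper = 2 * n_inputs + 1
    return list(range(lower, upper + 1))


# Six inputs (AOD and five meteorological parameters) and one output (PM2.5).
candidates = hidden_unit_candidates(6, 1)
```

For six inputs and one output, `candidates` holds the eight values from 6 to 13.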
To eliminate the influence of the randomly generated initial weights and threshold values used in the ANN, a methodology similar to ensemble prediction is used in the construction of the ANN. Twenty different ANN constructions with the same parameters are run for each number of units. These 20 ANNs have identical parameter settings and differ only in the initial weights and threshold values, which are randomly generated by the SPSS software. Then, the results of the 20 experiments are averaged. The results show that the average of the experiments (the so-called ensemble prediction result) is clearly better than the individual experiments in terms of the correlation coefficient.
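The averaging step itself is simple; a minimal sketch is given below, assuming that each ensemble member has already produced one estimate per sample and that the estimates are stored as equal-length lists (the function name and data layout are assumptions).

```python
def ensemble_mean(member_predictions):
    """Average the estimates of several ensemble members, sample by sample.

    member_predictions is a list with one list of estimates per member;
    all members must cover the same samples in the same order.
    """
    n_members = len(member_predictions)
    n_samples = len(member_predictions[0])
    if any(len(member) != n_samples for member in member_predictions):
        raise ValueError("all members must provide the same number of estimates")
    return [sum(values) / n_members for values in zip(*member_predictions)]
```

Because the error of an average can never exceed the average of the individual error magnitudes, the ensemble mean is at least as close to the observations, in the root-mean-square sense, as a typical single member.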
Table 1 lists the statistics comparing the ensemble prediction results with the real PM 2.5 observations, including the mean error (ME), mean absolute error (MAE), root-mean-square error (RMSE), and correlation coefficient (R). To simplify the calculation, only one year (2013) of the data is used. The influence of the number of units in the hidden layer is small (if this value makes a difference at all, it does not greatly influence the result). We choose 7 as the number of units, as this value has a slightly better statistical performance than the other candidates.
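For reference, the four statistics can be computed as in the sketch below. The sign convention (estimate minus observation, so that a negative ME indicates underestimation) is an assumption that matches the discussion of the results, and the function name is hypothetical.

```python
import math


def evaluate(estimated, observed):
    """Return ME, MAE, RMSE and the Pearson correlation coefficient R."""
    n = len(observed)
    errors = [e - o for e, o in zip(estimated, observed)]  # estimate minus observation
    mean_est = sum(estimated) / n
    mean_obs = sum(observed) / n
    cov = sum((e - mean_est) * (o - mean_obs) for e, o in zip(estimated, observed))
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(d) for d in errors) / n,
        "RMSE": math.sqrt(sum(d * d for d in errors) / n),
        "R": cov / math.sqrt(var_est * var_obs),
    }
```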

Result and discussion
As shown in Fig. 4, the surface PM 2.5 concentrations from 2008 to 2013 estimated by the ANN model are compared with the real observations of PM 2.5 concentrations for validation.
The ANN model clearly has a good ability to estimate surface PM 2.5 concentrations with the AOD and meteorological data. The mean error, mean absolute error, root-mean-square error, and correlation coefficient are −16.11, 35.66, 55.43, and 0.73, respectively. The main drawback is that the ANN model tends to underestimate the surface PM 2.5 concentrations when the observed PM 2.5 concentrations are extremely high. This result is probably caused by the limited number of extreme values in the training sample and could be addressed in future work by tuning the loss function to increase the penalty for large errors in the ANN. The ANN uses six factors (AOD, PBLH, T 2, RH, WS, and WD) to estimate PM 2.5 concentrations, and it is valuable to analyze the relative importance of the six factors. The independent variable importance analysis, a function of SPSS, is used to identify the relative contributions of the six factors to the ANN. Sensitivity studies are performed to calculate and identify the relative importance of each predictor variable in the neural network. As described above, 20 models with the same parameters are trained, and the results of the independent variable importance analysis are averaged, as shown in Table 2.
It is clear that the importance of AOD is much larger than that of the other factors, indicating that AOD is the most important factor.The PBLH has a larger independent variable importance than all other meteorological factors, indicating that PBLH is the most important meteorological factor in constructing the ANN model.
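The idea behind ranking the input factors can be illustrated with permutation importance, a different and simpler technique than the SPSS independent variable importance analysis used in this study. In the sketch below, which is an assumption-laden stand-in rather than a reproduction of the SPSS procedure, each input column is shuffled in turn and the resulting increase in mean squared error is normalized so that the importances sum to one; `predict` is any hypothetical callable that maps one row of inputs to an estimate.

```python
import random


def permutation_importance(predict, rows, targets, seed=0):
    """Normalized increase in mean squared error when each input column is shuffled.

    rows is a list of lists (one list of input values per sample).
    """
    rng = random.Random(seed)

    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(targets)

    baseline = mse(rows)
    increases = []
    for k in range(len(rows[0])):
        column = [r[k] for r in rows]
        rng.shuffle(column)  # break the link between this factor and the target
        shuffled = [r[:k] + [v] + r[k + 1:] for r, v in zip(rows, column)]
        increases.append(max(mse(shuffled) - baseline, 0.0))
    total = sum(increases)
    if total == 0:
        return [0.0] * len(increases)
    return [inc / total for inc in increases]
```

A factor that the model ignores receives an importance of zero, while a factor that dominates the estimate receives a value close to one.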
To quantify the contribution of PBLH in constructing the ANN, a new ANN model with only two factors, AOD and PBLH, is built with the same methodology as the ANN model above. The parameters of the ANN are all the same except that the number of units in the hidden layer is set to 3, based on the above finding that the number of units in the hidden layer makes little difference to the results. Only one year (2013) of the input data is used to simplify the calculation. The result is shown in Fig. 5. All the statistical results of the ANN model using the two factors AOD and PBLH, except for the mean error, are better than those of the general linear regression method. The correlation coefficient of the ANN model using only AOD and PBLH is 0.76, which is much higher than that of the linear regression (R = 0.68) and slightly smaller than that of the ANN using all six factors (R = 0.78). This result further indicates that meteorological factors, PBLH at the very least, should be considered when estimating PM 2.5 concentrations with MODIS AODs.
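The linear regression baseline used in this comparison can be sketched as an ordinary least-squares fit of PM 2.5 on AOD alone. The text does not state how the regression was fitted, so the closed-form fit below, the function names, and the example numbers are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_line(slope, intercept, x):
    """Apply the fitted line to new AOD values."""
    return [slope * a + intercept for a in x]


# Made-up AOD and PM2.5 values, for illustration only.
slope, intercept = fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

The estimates from `predict_line` can then be scored against the observations with the same statistics as the ANN estimates, which keeps the comparison like for like.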

Conclusion and prospect
In this study, an Artificial Neural Network (ANN) model is constructed to estimate surface PM 2.5 concentrations in Beijing. The inputs of the ANN model are the MODIS AOD and five meteorological parameters from the NCEP FNL reanalysis data, namely the Planetary Boundary Layer Height (PBLH), surface temperature at 2 meters (T 2), relative humidity at 2 meters (RH), surface wind speed (WS), and surface wind direction (WD). After investigating the horizontal correlations of the MODIS AODs surrounding Beijing, the MODIS AODs are gridded with seasonally dependent window sizes of 150 km in summer and autumn and 100 km in winter and spring. The comparison of the past PM 2.5 concentrations estimated by the ANN model and the observed values from 2008 to 2013 indicates that the ANN model can be used to estimate surface PM 2.5 concentrations effectively. The mean bias, correlation coefficient, and root-mean-square error between the estimated and observed values are −16.10, 0.73, and 55.43, respectively. Among the five meteorological factors used in the ANN, PBLH is the most important. Sensitivity studies with one year of data indicate that the correlation coefficient of the ANN model using only AOD and PBLH is 0.76, which is much higher than that of the linear regression (R = 0.68) and slightly smaller than that of the ANN using all six factors (R = 0.78).
The methodology used in this study can also be extended to any other city in the world if the city has a sufficiently long observation history.
Edited by: D. Goto

Fig. 1. The classification of the gridded AOD data with the group identifier marked in the boxes.

Fig. 4. Comparison of the simulated and observed PM 2.5 concentrations from 2008 to 2013.

Fig. 5. Comparison of PM 2.5 observations, the simulation using all factors, the simulation using AOD & PBLH and the linear regression of AOD in 2013.

Table 1. Statistical results of the different numbers of units in the hidden layer of the ANN.

Table 2. Independent variable importance of input factors.