Particulate matter concentration mapping from MODIS satellite data: a Vietnamese case study

Particulate Matter (PM) pollution is one of the most important air quality concerns in Vietnam. In this study, we integrate ground-based measurements, meteorological and satellite data to map temporal PM concentrations at a 10 × 10 km grid for the entire of Vietnam. We specifically used MODIS Aqua and Terra data and developed statistically-significant regression models to map and extend the ground-based PM concentrations. We validated our models over diverse geographic provinces i.e., North East, Red River Delta, North Central Coast and South Central Coast in Vietnam. Validation suggested good results for satellite-derived PM2.5 data compared to ground-based PM2.5 (n = 285, r2 = 0.411, RMSE = 20.299 μg m−3 and RE = 39.789%). Further, validation of satellite-derived PM2.5 on two independent datasets for North East and South Central Coast suggested similar results (n = 40, r2 = 0.455, RMSE = 21.512 μg m−3, RE = 45.236% and n = 45, r2 = 0.444, RMSE = 8.551 μg m−3, RE = 46.446% respectively). Also, our satellite-derived PM2.5 maps were able to replicate seasonal and spatial trends of ground-based measurements in four different regions. Our results highlight the potential use of MODIS datasets for PM estimation at a regional scale in Vietnam. However, model limitation in capturing maximal or minimal PM2.5 peaks needs further investigations on ground data, atmospheric conditions and physical aspects.


Introduction
Asia is fast developing due to industrialization and urbanization. Together with the development, air pollution concerns on human health are also increasing (MacNee and Donaldson 2003, Atkinson et al 2012, Krzyzanowski et al 2014). PM 2.5-10 consists mainly of crustal particles mechanically generated from agriculture, mining, construction, road traffic, and related sources, as well as particles of biological origin. Particulate matter with diameter less than 2.5 μm (PM 2.5 ) consists mainly of combustion particles from motor vehicles and the burning of coal, fuel oil, and wood, but also contains some crustal particles from finely pulverized road dust and soils (Laden et al 2000).
In several cities of Asia including Vietnam, the pollution levels exceed the World Health Organization (2006) Guideline values. In particular, exposure to PM 2.5 may lead to human respiratory diseases and even mortality (Pope et al 2009, Fann et al 2012, Kloog et al 2013. Specific to Vietnam, a recent report from the Environmental Performance Index (EPI) suggests that the quality of the environment in Vietnam has steadily dropped compared to other nations (EPI 2014). As per the EPI report, in the general environmental index, Vietnam is ranked 136 out of 178. Further, air quality in Vietnam is lagging with a rank of 170 and it is forecasted to worsen in the near future (EPI 2014). Of the different pollutants in Vietnam, particulate matter pollution is serious because of rapid growth of industrial activities, traffic operations and forest/ agricultural fires . Also, a recent National Environmental Report on air environment (Vietnam Environment Adminstration 2013) for Vietnam suggests that dust pollution based on total suspended particle (TSP), PM 10 , PM 2.5 and PM 1.0 surpassed national standards in several cities with impacts on human health. Thus, continuous mapping and monitoring of PM in different regions of Vietnam gain significance. PM 2.5 measurements through ground based stations although measure the concentrations with high level of accuracy and frequency, they are of limited geographic coverage in Vietnam. Also, point measurements from ground based monitoring stations are not necessarily representative of regional concentrations and regional variability is difficult to assess from point measurements alone . Recent studies suggest the potential use of satellite remote sensing technologies for mapping and monitoring of pollutants and relating them to ground sources (Badarinath et al 2009, Monks and Bierelle 2011, Kharol et al 2012, Vadrevu et al 2014. Specific to PM, the use of satellite instruments to estimate surface PM concentration is considered as an effective way to extend point based measurements to wide spatial scales (Duncan et al 2014). Satellite-derived PM maps are gradually becoming basic layers for air quality, human health and disaster management , van Donkelaar et al 2011.
Satellite aerosol products, which represent spatial distribution of particles in the vertical direction of the atmosphere, provide a practical solution for PM estimation. Several researchers used aerosol optical depth (AOD) or aerosol optical thickness (AOT) derived from satellite data for PM estimation (Chu et al 2003, Wang and Christopher 2003, Engel-Cox et al 2004, Badarinath et al 2007, Gupta and Christopher 2008, Lee et al 2011, Pelletier et al 2007, Schaap et al 2009, Hirtl et al 2014, Zha et al 2010, Ma et al 2014. Methods for estimating PM vary from linear regression (LR) or multiple linear regression (MLR) to the non-linear regression methods such as artificial neural network (ANN), support vector regression (SVR) and self organizing map (SOM) (Gupta and Christopher 2009a, 2009b, Yahi et al 2011, Hirtl et al 2014. Recently, modelling systems such as GEOS-Chem or CMAQ are also used for relating AOT to PM (Liu et al 2007a, Liu et al 2007b, Liu et al 2007c. In Vietnam, most of the earlier studies on PM characterization focused on ground measurements (Hien et al 2002). Within the framework of the Asian regional air pollution research network (AIRPET), PM 2.5 and PM 10 were studied by Kim Oanh et al (2006). Recently the effect of regional meteorology on mass and composition of PM was investigated in Ha Noi during December 2006-February 2007 (Hai and Kim Oanh 2013) and in a mining town in Quang Ninh province (Northern Vietnam) in both dry and wet seasons from 2009 to 2010 (Hang and Kim Oanh 2014). Compared to these studies, relatively few studies focused on using satellite remote sensing data for estimating pollution. For example, air pollution maps at high spatial resolution were estimated directly from SPOT 2, Landsat 5, Landsat 8 data for Quang Ninh and Ha Noi (Luong et al 2010, Nguyen andTran 2014). Earlier, our team carried out initial investigations for PM estimation using of satellite aerosol products in Ha Noi and obtained promising results .
In this study, we use satellite remote sensing for estimating PM for entire Vietnam. We mapped temporal PM 2.5 concentrations by integrating MODIS satellite data and ground based PM data from December 2010 to September 2014. We also validated our results for four regions.

Study area
Vietnam is the easternmost country on the Indochina Peninsula in Southeast Asia. The total area is nearly 332 210 km 2 and extends from (8°27′ N, 102°8′ E) to (23°23′ N, 109°27′ E) with the population of 90.5 million as of 2014. Vietnam is divided into seven different climatic zones which are based on summer and winter over northern regions (i.e.: North West-NW, North East-NE, Red River Delta-RRD, North Central Coast-NCC) and rainy and dry seasons to the South (South Central Coast-SCC, Central Highlands-CH and South East-SE). In this study, we applied our model and validated in four different regions i.e., NE, RRD, NCC and SCC ( figure 1(a)).
The figures 1(b)-(d) present variation of meteorological parameters for different regions. Over four studied regions, NE and RRD have cold and dry winter from December-February. During summer from June-August, high temperature and rain are observed. Moving to the South, SCC has dry season from February-August and rainy period from September-December with peak rainy month in November. The average annual temperatures have not changed largely (from 25°C-33°C in average). However, SCC together with NCC is one of hottest regions in Vietnam during summer time from June-August. NCC is a special case with climate characteristics of both northern and southern regions. NCC has cold weather as NE and RRD during December and February but dry and rainy seasons which shifts one month before in comparison with SCC seasons.

Datasets and methodology
3.1. Datasets MODIS instruments provide near-daily measurements of global coverage. The prefix MOD and MYD are reserved for data from MODIS onboard Terra (AM overpass) and Aqua (PM overpass), respectively. MODIS aerosol products at 10 km (i.e., Optical_-Depth_Land_And_Ocean at 0.550 μm of MOD04 in Collection 5.1 and AOD_550_Dark_Target_Deep_-Blue_Combin-ed of MYD04 in Collection 6) and MODIS meteorological products at 5 km (Skin Temperature of MOD07 and MYD07 in Collection 6) from 2009-2014 were investigated to estimate PM 2.5 maps for Vietnam. Currently, there are seven AERONET ground stations in Vietnam which include Bac_Giang, Bach_Long_Vy, Bac_Lieu, Nghia_Do, Nha_Trang, Red_River_Delta and Son_La (figure 1(a)). AERONET AOT from 2009-2014 at seven stations were compared to MOD04 and MYD04 AOT for accuracy.
The Center for Environmental Monitoring (CEM) at Vietnam Environment Administration (VEA) provides updated PM concentrations together with hourly meteorological parameters (i.e.: temperature, pressure, radiation, wind speed and relative humidity) at stations. The PM data at six CEM stations (Phu Tho, Ha Noi, Hue, Da Nang, Ha Long and Khanh Hoa) were considered from December 2010 to September 2014 (figure 1(a)) but period differed for stations. Data at Phu Tho, Ha Noi, Hue and Da Nang stations were used for modeling while data at Ha Long and Khanh Hoa stations in 2014 were utilized for independent validation process. Further, meteorological data from the National Center for Hydro-Meteorological Forecasting (NCHMF) covering 98 ground stations for temperature, relative humidity and precipitation from 2005-2013 has been used. At each meteorological station, temperature and relative humidity are measured at 13:00 but precipitation at 13:00 and 24:00 every day (Vietnam time zone).

Methodology
The methodology for PM 2.5 estimation over Vietnam from MODIS satellite images is carried out by integrating ground based PM 2.5 and MODIS data and then using Multiple Linear Regression and universal Kriging techniques. The detail processing steps are illustrated in figure 2.

Data pre-processing and integration
Since satellite and ground datasets have different temporal and spatial characteristics, they need to be integrated for modeling and testing process of PM estimation. Satellite data are first resampled to a grid of 10 km over Vietnam using a bilinear function and integrated using time and location constrains following Ichoku et al (2002). We considered only cloud-free aerosol data pixels which had distances to a ground station within a radius of 25 km but the directed temperature pixel over a ground station because of temperature coarse spatial resolution and lightly variant. Meanwhile, ground measurements are averaged within a temporal window of 60 min coinciding with the satellite overpasses. The optimal thresholds were selected by experiments.

Feature selection
The selection of temporal parameters (i.e. aerosol and meteorological factors from satellite images) for regression model is based on correlation assessment of each factor to PM at ground level. Besides, monthly temperature, relative humidity and precipitation by region using all 98 NCHMF stations are considered for the regression model to characterize climate regions in Vietnam. The use of relative humidity as seasonal indicator is also found in Liu et al (2007b) and Liu et al (2007c).

Multiple linear regression
Linear regression or multiple linear regression (MLR) are considered as a common and valid methodology to estimate PM from aerosol and other meteorological factors (Chu et al 2003, Wang and Christopher 2003, Engel-Cox et al 2004, Gupta and Christopher 2009b, Schaap et al 2009, Zha et al 2010, Lee et al 2011. In our approach, different MLR models have been applied to estimate PM 2.5 concentrations from satellite-derived AOT, satellite-derived temperature and monthly and regional indicators (i.e. temperature, relative humidity and precipitation) normalized in range of (−1, 1). The least square fitting technique determines MLR coefficients based on the least square errors of a model over a training dataset after that, the Bayesian model average (BMA) technique calculates Bayesian information criterion (BIC) and post probability for each MLR model and selects the best one that minimizes BIC and maximizes posterior probability. Finally, the Cook's distance (Cook's D) is applied on the selected models to identify and remove the samples that have significant impacts on the estimated coefficients.

Universal kriging
The universal Kriging is applied on regression PM maps to interpolate values for the entire region that also included areas frequently impacted by clouds. The spatial correlation of PM 2.5 values is modeled by fitting a spherical variogram to the experimental semi-variance accounting for all valid geo-locations in a PM 2.5 image. Several models such as Gaussian, spherical, and exponential were tested to fit data. Results suggested the spherical model as the best with the lowest mean square error between the variogram model and experimental data.
After the same, we performed universal Kriging on each MOD and MYD PM 2.5 image to obtain an interpolated map together with its associated error map.

Evaluation
To evaluate the estimated PM concentration from our approach, we used four statistical indicators i.e., Pearson's correlation coefficient (R), coefficient of determination (r 2 ), root mean square error (RMSE) (i.e.: absolute error) and relative error (RE) (i.e.: percentage error). In addition, mean fractional bias (MFB) and mean fractional error (MFE) have been used to assess model performance (Boylan and Russell 2006).
Where N is number of samples. M i and O . i are modeled and observed PM mass concentration, respectively. In the above equation, goal is the level of accuracy close to the best estimation and criteria is the level of accuracy acceptable for standard modeling purposes.

Meteorological and satellite data correlations with PM
The impact of meteorological variations, i.e., temperature (Temp), pressure (Pres), radiation (Rad), wind speed (Wsp) and relative humidity (RH) and satellitederived aerosol (MOD04 and MYD04) on PM of 1, 2.5 and 10 μm diameters has been evaluated. The results clearly identify the influence of temperature and aerosol parameters on PM concentrations (see figure 3). Results in figure 3(a) suggested stronger influence of temperature on PM than pressure, radiation, wind speed and relative humidity and the relationship gradually decreasing from PM 1 , PM 2.5 to PM 10 . Correlation coefficient (R) of temperature with PM are −0.561, −0.509 and −0.366 for PM 1 , PM 2.5 and PM 10 , respectively. Hien et al (2002) has been confirmed that air temperature is an important determinant for his PM 2.5 estimation model in summer, meanwhile inverse correlations of temperature and PM 2.5 were explained as a trend of observing atmospheric dispersion under warm air than cold air masses. The effect on PM at different sizes is similar to the temperature for MOD04 but MYD04 dataset ( figure 2(b)). R of 0.527, 0.522 and 0.429 were obtained for MOD04 and 0.617, 0.482 and 0.592 for MYD04 datasets and PM 1 , PM 2.5 and PM 10 datasets.

Satellite-derived aerosol and temperature validation
The MOD04 and MYD AOT at 0.550 μm are compared to AERONET aerosol data at 0.550 μm calculated using log-linear interpolation from two AOT values of two closest channels 0.500 and 0.675 μm . The correlation coefficient between MOD04 and MYD04 to AERONET AOT are 0.867 and 0.887, respectively. Figure 4 presents their scatter plots.
We compared MODIS temperature (MOD07 and MYD07 skin temperature) and CEM temperature from December 2010 to September 2014 over four stations (Phu Tho, Ha Noi, Hue, and Da Nang). Satellite data showed strong correlation with CEM temperature (0.878 and 0.799 for MOD07 and MYD07s temperature, respectively). The MOD07 temperature and PM correlation (table 1) was even higher than station temperature and PM data (figure 3).

Predictor variables and model selection
The BMA technique suggests five and four regression models for MOD and MYD dataset, respectively (see table 2). Each regression model with different variables (i.e.: MODIS derived aerosol-AOT t , MODIS derived temperature-Temp t , regional monthly temperature/ relative humidity/Precipitation-Temp mr , RH mr , Prec mr ) is evaluated using r 2 and posterior probability of model correction. The best regression models are MOD #1 and MYD #1 which use satellite-derived aerosol (AOT t ) and monthly and regional temperature (Temp mr ) as predictor variables and then, gain high r 2 and highest posterior probability. The low r 2 for MYD models can be explained by outliers in the MYD dataset. Therefore, Cook's distance using 4/(n-p-1) are PM 2.5 estimated from MOD and MYD models. AOT t-MOD and AOT t-MYD are MOD04 and MYD04 AOT. Temp mr is mean monthly temperature for each region. Table 3 presents regression results of the MOD and MYD regression models. Results suggested r 2 and RE of 0.602 and 33.348% for MOD data and 0.577 and 53.353% for MYD data. Figure 5 shows scatter plots of fitted PM 2.5 mass coentrations with ground based PM data from CEM stations. The above results are comparable to the other results found iliterare. For example, Christopher 2009a, 2009b achieved r 2 of 0.462 and 0.547 for hour PM 2.5 estimation from MODIS AOT and meterological factors using multiple linear regression and neural network techniques over southeastern US. Liu et al (2007b) predicted daily PM 2.5 from MODIS AOT, high RH season in Eastern and Western US with adjusted r 2 of 0.42 and 0.21, respectively. Without meteorological data, Lee et al (2011) calculated daily PM 2.5 directly from MODIS AOT with r 2 varying from 0.12-0.88 using linear regression and 0.82-1.00 using mixed effect models in New England region.  )   Figure 6 shows satellite-derived PM 2.5 versus ground measured PM 2.5 at Phu Tho, Ha Noi, Hue and Da Nang stations. Satellite-derived PM 2.5 was able to replicate average of ground PM 2.5 spatial Table 3. MOD and MYD regression model's results on filtered dataset using 4/(n-p-1) threshold in which n and p are number of samples and degree of freedom, respectively.

MOD MYD Total
Year patterns, however limited in capturing maximal or minimal peaks. We also analyzed results for different locations (table 6). Satellite-derived PM 2.5 at Phu Tho station has the best correlation (r 2 =0.412 and RE=39.693%) in comparison with ground-based PM 2.5 . At Ha Noi station, predicted PM 2.5 has low quality with r 2 =0.158 and RE=36.195% mainly due to large variations in the Ha Noi dataset. Large error was observed at Da Nang station (RE=45.656% and r 2 =0.281) while moderate results were obtained for Hue station (RE=32.747% and r 2 =0.300). MFBs show that PM 2.5 is underestimated in Phu Tho, Ha Noi, and Hue datasets but overestimated in the Da Nang dataset. However, four station's MFBs and MFEs still meet the goals. We attribute the errors to geolocation. For example, all CEM stations are located closely to rd sides whereas satellites signals represent an average of 10 km leading to high errors on satellite-derived PM 2.5 .

Independent validation
We used Ha Long and Khanh Hoa stations for independent validation. They are located in the NE and SCC regions respectively. Ha Long is in Quang Ninh which is a mountainous and coastal province with four distinct seasons and dominated by tourism and coal mining (90 percent of coal output of Vietnam). Khanh Hoa is mostly mountainous and coastal with two distinct seasons (i.e. rainy season from September to December and dry season on other months). It has strong industry and services and a small agriculture sector.     concentration is less in NE than in RRD. However, in both regions, level of PM 2.5 is lower in this period than in winter due to summer rains. Considering PM 2.5 satellite maps in NE and RRD (figure 8), similar trends in winter and summer can be observed.
SCC with rainy and dry seasons has low PM 2.5 concentration level and no seasonal trend which is confirmed by Da Nang, Khanh Hoa station measurements ( figure 8(d)), PM 2.5 maps (figure 10) and the previous study (Vietnam Environment Administration 2013).
NCC is a special case with two seasons as SCC (i.e., dry season from March to August and rainy season from August to the next January) but cold and still effected by very strong NE monsoon from the South during rainy season. As presented in figures 8(c) and 10, the seasonal pattern in NCC is more similar to NE and RRD than SCC. The significantly cold months are from December to January. Due to low temperature and precipitation, NCC climate be similar to NE and RRD's ones during these months.
Over four regions, PM 2.5 concentrations decrease from the South to the North. In addition, stronger PM 2.5 is observed in March especially in NE and RRD and we attribute the same to vegetation fires, especially agricultural residue burning . More data is needed to substantiate the causative factors of particulate pollution. In conclusion, satellite-derived PM2.5 maps were able to replicate seasonal and spatial trends of ground-based measurements in four different regions. Results highlight the potential use of MODIS datasets for PM estimation at a regional scale in Vietnam.

Conclusion
In this study, we developed an approach for temporal particulate matter concentration mapping over Vietnam using MODIS satellite images. PM 2.5 maps from December 2010 to September 2014 at a 10 km spatial resolution were developed through integrating ground based PM and MODIS data using multiple linear regression and universal Kriging techniques. Data representing four climatology regions in Vietnam i.e., North East, Red River Delta, North Central Coast and South Central Coast are used for modeling and   validation. Validation on satellite derived PM 2.5 datasets showed significant results in comparison with ground-based measurements. In addition, validation on two independent datasets (i.e.: Ha Long in North East and Khanh Hoa in South Central Coast region) showed similar results suggesting spatial stability and robustness of our methodology. Seasonal trends of PM 2.5 concentration over NE, RRD, NCC and SCC regions showed consistency between ground stations and satellite-derived PM 2.5 maps. Over four regions, PM 2.5 concentrations decreased from the South to the North and we attribute the patterns to regional climatology. Further, limitation in capturing maximal and minimal PM 2.5 peaks is being addressed. More concerning on ground data, atmospheric conditions and physical aspects for regression model should be done to improve model performance.