
Using satellite data on remote transportation of air pollutants for PM2.5 prediction in northern Taiwan

Abstract

Accurate PM2.5 prediction is part of the fight against air pollution and helps governments manage environmental policy. Satellite remote-sensing aerosol optical depth (AOD) processed by the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm allows us to observe the transportation of remote pollutants between regions. This paper proposes a composite neural network model, the Remote Transported Pollutants (RTP) model, for such long-range pollutant transportation that predicts more accurate local PM2.5 concentrations given such satellite data. The proposed RTP model integrates several deep learning components and learns from the heterogeneous features of various domains. We also detected remote transportation pollution events (RTPEs) at two reference sites from the AOD data. Extensive experiments using real-world data show that the proposed RTP model outperforms the base model that does not account for RTPEs by 17%–30%, 23%–26%, and 18%–22%, and state-of-the-art models that account for RTPEs by 12%–22%, 12%–14%, and 10%–11%, at +4h to +24h, +28h to +48h, and +52h to +72h, respectively.

Introduction

Rapid urban development and industrialization in recent years have increased air pollution, especially PM2.5, particulate matter with an aerodynamic diameter of less than 2.5 micrometers (μm) that cannot be filtered through nasal passages, leading to health problems such as respiratory and cardiac diseases [1–3]. Many nations have built urban stations to monitor the presence of PM2.5 in the environment. The resulting datasets can be used to better understand and predict PM2.5 [4]. However, the prediction of PM2.5 levels is a difficult problem, as the dispersion of pollutants depends heavily on meteorological characteristics and terrain, in addition to the activities of the inhabitants [5, 6]. The prediction is further complicated by factors such as pollutant migration from outside the observed area. In this work, such long-range transport of air pollutants is called a Remote Transportation Pollution Event (RTPE) and is driven by wind and other meteorological effects [7]. The aerosol optical depth (AOD) from the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm is a natural solution for understanding air quality over large areas, especially as AOD has a strong correlation with PM2.5 [8, 9]. In this study, we considered pollutants transported from northeast Asia through the East China Sea to Taiwan [10, 11].

Different researchers have used satellite-based AOD measurements to estimate and predict PM2.5 due to their high correlation with PM2.5 and their large spatial coverage [12–14]. The AOD of the MAIAC algorithm at 1 km resolution produces better performance than other algorithms at 10 km resolution [2]. Most models use AOD for end-to-end training, where the input to the model is AOD and other related meteorological data, and the output is PM2.5 of the same area. In this work, we used MAIAC AOD data from a large remote area as input, while the output is local PM2.5 for a part of northern Taiwan. AOD is not used to directly predict local PM2.5; instead, the proposed model first predicts RTPEs, which are later combined with other data to predict local PM2.5.

In the literature, predictive models are designed using the classical dispersion approach that focuses on identifying the root cause of PM2.5 from emissions, chemicals, climatology, or a combination of these factors [15, 16]. For example, the Community Multiscale Air Quality Model (CMAQ) [17] is designed to study air pollution on a global scale. The challenges of this approach include the failure to capture the complex relationships of features that affect PM2.5 [18]. Furthermore, it does not perform well in capturing the spatial and temporal distribution of PM2.5 and suffers from high computation costs, especially for a model with complex high-order equations [19].

There are studies that simulate and quantify RTPEs to Taiwan [20–24]. Most of them use trajectory statistics (TS) and chemical transport modeling (CTM) approaches to discover the source of RTPEs. TS uses the frequency of backward trajectories in an area to determine whether pollution is due to remote pollutants. CTM involves a brute-force method with two simulations: one without pollutants from the local area and a normal simulation. The difference between these simulations determines the RTPE amounts from the remote area. In this work, we predict RTPEs with the help of several deep neural networks.

The recent development of deep learning in the prediction of air pollution shows the ability to outperform the classic dispersion approach and the statistical approach. Deep learning relies on large historical datasets to capture complex interactions among various features of different datasets. For PM2.5 prediction, various deep learning methods involve different machine learning techniques to capture different knowledge from large datasets. Convolutional neural networks (CNN) [25–28] are used to capture spatial knowledge, long short-term memory (LSTM) [27, 28] models are adopted for temporal knowledge, and fully connected (FC) neural network models extract complex interactions between those datasets. Convolutional long short-term memory (ConvLSTM) is another technique that combines CNN and LSTM to predict PM2.5 [26, 29, 30]. However, deep learning with an end-to-end training approach that directly applies the CNN and LSTM components does not perform well on complicated and heterogeneous data [27]. Specifically, the AOD image inputs incur high computation costs and long training times, and can only obtain a sub-optimal solution. Furthermore, PM2.5 prediction becomes more challenging in the long term (for example, for 48 to 72 hours) as there are influences from both known and unknown factors [19]. In this work, we adopted ConvLSTM, CNN, and FC to improve the prediction accuracy of PM2.5.

The philosophy behind deep learning is to produce good results given a sufficient training dataset [31]. Due to incomplete and missing data from different observation stations, ensemble machine learning approaches [18, 32, 33] are used to improve the prediction of PM2.5. In an ensemble model, a linear combination of the outputs of different individual deep learning models is used for PM2.5 prediction, delivering better results than individual prediction results. Popular ensemble machine learning models include AdaBoost (AB) [32], bagging regression (BG), random forest (RF) [34, 35], extreme gradient boosting (XGB) [36–38], and the generalized additive model (GAM) [33, 39]. In this work, we instead used a composite neural network, which outperforms those ensemble models.

Recently, a composite neural network framework that combines different pre-trained deep learning models [27, 40] has been proposed to resolve complicated applications such as PM2.5 prediction. A composite neural network is a collection of pre-trained neural network models that forms a large neural network to yield greater learning capabilities without the burden of high model training expenses. Each pre-trained model utilizes the knowledge from a different dataset, and their outputs are connected in an acyclic tree construction. The outputs are then ensembled after constraining the weight of each component to a specific value using a defined function, rather than being ensembled by a weighted average as in an ensemble machine learning model.

This paper answers two main questions. The first concerns the identification of the occurrence of RTPEs in a local area. The second concerns the incorporation of knowledge about RTPEs to improve the local prediction of PM2.5. These questions create three challenges in terms of deep learning design and practice. The first challenge is how to prioritize the factors that influence the capture of remote pollutants, as air quality is affected by multiple factors [9, 28, 41], each with its own spatial and temporal distribution. The next challenge is how to identify factors in the design of a neural network model to capture the complex interactions between them for better PM2.5 prediction. The third is how to fuse and train the proposed neural network model on large heterogeneous datasets for improved efficiency and prediction results.

We addressed the first challenge by noting that the AOD data and weather data of remote areas are typically provided in coarse-grained grids. Generally, RTPEs are caused by monsoons and frontal surfaces, which are synoptic; therefore, we considered wind speed, wind direction, and related features. To tackle the second challenge, the model of [27] was selected as the pre-trained Base model and extended with another large deep learning model, the spatio-temporal remote information neural network (STRI), to form the proposed remote transported pollutant composite neural network (RTP model). The STRI model incorporates long-range pollutants for PM2.5, grasps spatio-temporal features from remote areas, and learns the spatial correlation between remote AOD and local PM2.5. For the third challenge, we broke the new STRI component into two parts: one for feature extraction and another for prediction. This reduces the number of training parameters and thus the computational cost of the training process with virtually equivalent prediction performance. The four contributions of this work are:

  1. The proposed composite neural network RTP model efficiently captures RTPEs and significantly improves PM2.5 prediction in comparison with the Base model and state-of-the-art models.
  2. Addressing the challenges of using RTPEs as features for local PM2.5 prediction; these challenges are addressed in a combined fashion to learn from the selected features and models.
  3. Developing a classification algorithm to classify RTPEs at two reference sites at different PM2.5 levels and increase rates.
  4. Applying a composite neural network [42] to develop neural network models incrementally, demonstrating the design rationale and contribution of each component for PM2.5 prediction.

Materials

Study area

The area under study comprises a remote area, where we captured RTPEs, and a local area (the Taipei area), where we performed the prediction of PM2.5. The remote area, within the East China Sea, consists of four tiles, as shown in Fig 1, where each tile covers an area of 1200 × 1200 km. RTPEs from northeastern Asia cross that sea towards Taiwan, and the deposited RTPEs in Taiwan originate from outside Taiwan [20–24]. The Taipei area (271.8 km2) contains 18 Environmental Protection Administration (EPA) monitoring stations, of which we used the two northern-shore stations Wanli and Tamsui to demonstrate the existence of RTPEs.

Fig 1. Study area.

Left side: four tiles (adapted from NASA), labeled 1 (h28v06), 2 (h29v06), 3 (h28v05), and 4 (h29v05), with Taiwan in the middle between tiles 1 and 2. Right side: zoomed map of Taiwan with the two stations Wanli (red circle) and Tamsui (green circle).

https://doi.org/10.1371/journal.pone.0282471.g001

Dataset

We used the 2014 to 2016 data to evaluate the proposed neural network models. The 2014 and 2015 data were used for training the model and the 2016 data were used for testing. The data for the extended local satellite dataset (ESD) model evaluation were prepared at daily granularity, whereas for the RTP model the data were at hourly granularity.

For both granularities, the sample size (N) of the testing dataset varied with the target prediction time; for example, for the next day (+1day) N = 6390, while for the next 3 days (+3day) N = 12,708. For the case of +4H-hour prediction with H ∈ {1, …, 18}, the corresponding number of samples is at most 35120 × H. For example, N for +4, +8, +64, and +72 hours is 35120, 70160, 558080, and 626400, respectively. Table 1 shows the sample size (N) for every prediction hour.

Table 1. Sample size (N) summary for hourly (+hr) prediction.

https://doi.org/10.1371/journal.pone.0282471.t001

Observed air quality concentration.

We obtained hourly air quality data from the EPA website (data.epa.gov.tw), consisting of PM10, PM2.5, carbon monoxide (CO), nitrogen oxides (NOx), ozone (O3), and sulfur dioxide (SO2). The Taipei area was divided into a grid with a total of 1140 (30 × 38) grid cells, and we used the four-nearest-neighbors (4-NN) method to fill grid cells with empty values. To evaluate ESD, we converted these datasets to daily intervals.
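As an illustration of the 4-NN filling step, the sketch below fills the empty cells of a 30 × 38 Taipei grid from the 18 station readings. Averaging the values of the four nearest stations, and the helper names, are assumptions for illustration; the paper only states that a 4-NN method was used.

```python
import numpy as np
from scipy.spatial import cKDTree

def fill_grid_4nn(grid, station_rc, station_values):
    """Fill NaN cells of a 2-D grid from the 4 nearest stations.

    grid           : (30, 38) array, NaN everywhere except station cells.
    station_rc     : (S, 2) array of station (row, col) indices.
    station_values : (S,) array of observed values at the stations.
    Averaging the 4 nearest stations is an assumption for illustration.
    """
    tree = cKDTree(station_rc)
    rows, cols = np.where(np.isnan(grid))
    _, idx = tree.query(np.column_stack([rows, cols]), k=4)  # 4 nearest stations per cell
    grid[rows, cols] = station_values[idx].mean(axis=1)
    return grid

# Example: 18 EPA stations with hourly PM2.5 readings on a 30 x 38 Taipei grid.
rng = np.random.default_rng(0)
station_rc = rng.integers(0, [30, 38], size=(18, 2))
station_values = rng.uniform(5, 60, size=18)
grid = np.full((30, 38), np.nan)
grid[station_rc[:, 0], station_rc[:, 1]] = station_values
filled = fill_grid_4nn(grid, station_rc, station_values)
```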

Meteorological data.

The hourly meteorological data were obtained from the Central Weather Bureau (CWB) website (opendata.cwb.gov.tw). Each reading includes wind speed and direction, rainfall, pressure, temperature, and humidity. The data covered 77 grid cells in the Taipei area; therefore, we used 4-NN to fill the cells without monitoring stations. Again, we averaged these datasets into daily readings for the ESD evaluation.

Remote meteorological data.

We used the National Centers for Environmental Prediction (NCEP) final (FNL) global analysis data (rda.ucar.edu), which cover the entire globe. These data are provided on 28 × 28 km2 grids every six hours, and we converted them to hourly intervals using linear interpolation. The data include the meteorological features temperature, pressure, vertical velocity (VVEL), absolute vorticity (ABSV), lifted index, wind speed, and wind direction. The wind speed (denoted as ws) and direction (denoted as θ) are represented as u and v components, i.e., ws × cos(θ) and ws × sin(θ). The u component is the horizontal wind speed toward the east (zonal velocity) and the v component is the horizontal wind speed toward the north (meridional velocity). The ws, θ, temperature, VVEL, and ABSV were considered at pressure levels from 10 mb to 1000 mb. These data were used for evaluation of the RTP model only.
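The wind decomposition and the six-hourly-to-hourly conversion can be sketched as follows; the numeric values are synthetic, and the pandas-based time interpolation is an assumption used to illustrate the stated linear interpolation.

```python
import numpy as np
import pandas as pd

# Six-hourly FNL readings for one remote grid cell (synthetic values).
times = pd.date_range("2016-01-01", periods=5, freq="6H")
ws = pd.Series([8.0, 10.0, 12.0, 9.0, 7.0], index=times)         # wind speed
theta = pd.Series([30.0, 45.0, 60.0, 90.0, 120.0], index=times)  # wind direction (deg)

# Decompose wind into u (zonal, eastward) and v (meridional, northward) components
# following the paper's stated formulas ws*cos(theta) and ws*sin(theta).
u = ws * np.cos(np.deg2rad(theta))
v = ws * np.sin(np.deg2rad(theta))

# Linearly interpolate the six-hourly components to an hourly interval.
hourly = pd.date_range(times[0], times[-1], freq="H")
u_hourly = u.reindex(hourly).interpolate(method="time")
v_hourly = v.reindex(hourly).interpolate(method="time")
```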

Satellite MAIAC AOD dataset.

This is satellite AOD data at 1 × 1 km2 resolution created using the MAIAC algorithm, updated twice a day and downloaded from the National Aeronautics and Space Administration (NASA) website (ladsweb.modaps.eosdis.nasa.gov). The remote area is covered by four satellite tiles (Fig 1) with AOD data. The AOD is used to evaluate both the RTP and ESD models, but the data pre-processing differs for each model.

For the ESD evaluation, we used tiles 1 and 2 (Fig 1) to fill the AOD data in the Taipei area; for missing grid cells, we used the mean of their neighboring cells (3 × 3) to fill the AOD data. For the RTP evaluation, we calculated the daily mean AOD value for each grid cell in all tiles. We assumed the AOD value is the same for the whole day and thus repeated the same value 24 times to match the hourly readings. Furthermore, we downscaled all tiles to produce a coarser (smaller) spatial representation. A similar downscaling approach was used on satellite images for precipitation [26] with mean pooling; in this work we use maximum pooling to maintain the distribution of values in each tile. In the end, each tile is reduced to a spatial dimension of 300 × 300. The downscaled tiles fit the available memory of the graphics processing unit (GPU) and reduce the computational cost.
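A minimal sketch of the max-pooling downscaling follows: each 1200 × 1200 tile at 1 km resolution is reduced to a 300 × 300 grid, which implies a 4 × 4 pooling window (the window size is inferred from the stated dimensions rather than given explicitly).

```python
import numpy as np
from skimage.measure import block_reduce

# One MAIAC tile of daily-mean AOD at 1 km resolution: a 1200 x 1200 grid.
tile = np.random.rand(1200, 1200).astype(np.float32)

# 4 x 4 max pooling keeps the peaks of the AOD distribution in each block
# and reduces the tile to a 300 x 300 grid that fits GPU memory.
downscaled = block_reduce(tile, block_size=(4, 4), func=np.max)
assert downscaled.shape == (300, 300)
```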

Method

The description of the methods of the study is based on the composite neural network. The idea is to access or design several pre-trained deep learning models for different tasks and then treat them as the components of the final composite neural networks, ESD and RTP.

Proposed models

This subsection reports the tasks and the architectures of the following neural network models: the STRI model, the Base model, the local satellite data (LSD) model, the RTP model, and the ESD model.

STRI model.

As depicted in Fig 2, STRI predicts the PM2.5 concentrations of the 18 EPA stations in Taipei using meteorological and AOD features from remote areas together with local meteorological features and PM2.5 values. Because both the STRI model and its input dataset are larger than the GPU memory limitation, and to reduce computational costs, STRI was divided into the STRI_fe and STRI_p submodels (i.e., components). There are two phases of training: the first trains the STRI_fe model, and the second fine-tunes the STRI_p model with features fed from the STRI_fe model.

Fig 2. STRI model.

Structure of STRI, STRI_fe and STRI_p models, where red dashed circle denotes extraction of spatio-temporal features from remote areas.

https://doi.org/10.1371/journal.pone.0282471.g002

The inputs of the STRI_fe model are the current four downscaled satellite tiles and the remote weather data. Each tile is represented by a four-dimensional (4d) tensor [t, c, w, h] corresponding to time, channel, width, and height. Considering the available memory and computational resources, the model applies average pooling over the three dimensions (3d) [c, w, h] of each tile along the time axis to reduce its dimension and outputs Tq. The CNN layers receive Tq to capture spatial correlation and aggregate information between grid cells. The output of the pooling layer on the 4d tensor is denoted by Pq:

$P_q = L\big(f(v \odot c + b)\big)$ (1)

where L represents the pooling layer, c is the convolutional feature from the convolutional layer, b is the additional bias, v is a vector with the same size as c, and f(·) is an activation function. The convolutional feature is obtained as

$c = K * T_q$ (2)

where Tq is the downsampled AOD data, * represents the convolutional operation, and K is the convolutional kernel. To speed up the training process, we applied batch normalization [43] between the ConvLSTM layers in the STRI_fe model. The output of the ConvLSTM for each tile (HT1, HT2, HT3, HT4) is concatenated and then flattened into a one-dimensional (1d) tensor |e|.

On the right-hand side of the STRI_fe component, the same ConvLSTM structure with batch normalization is applied to the current remote weather dataset to extract spatio-temporal features, which represent historical weather patterns of wind and other features associated with time and location. The 4d tensor output from this ConvLSTM, denoted by HW, is flattened into a 1d tensor |x|, which is then merged with |e| to form another 1d tensor [g], denoted as Rp. Rp contains the extracted spatio-temporal features of remote pollutants with their corresponding weather features, and is transferred to the STRI_p model after being converted to a 2d tensor.

In the second training phase, the STRI_p model is further refined with the fixed STRI_fe model to reduce the training time, model complexity, and number of model parameters for improved prediction results. STRI_p receives a sequence of Rp and local sequences of PM2.5 and meteorological data, which also include future weather forecasts. Future weather forecast data are included to reflect weather fluctuations, because the current weather is not sufficient for long-term prediction, i.e., beyond 24 hours [28]. All input features are then merged together to form a 2d tensor, denoted as HR. Finally, fully connected (FC) layers are applied to HR to learn the complex interactions between the features extracted from the remote area and the local features and to make predictions. More details of the STRI model configuration can be found in S1 Table.
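To make the described structure concrete, the following Keras sketch outlines the two-part design: per-tile pooling, CNN, and ConvLSTM branches with batch normalization in STRI_fe, and FC layers over the merged remote features Rp and local inputs in STRI_p. The time steps, filter counts, local feature sizes, and weather-grid dimensions are illustrative assumptions; the actual configuration is given in S1 Table.

```python
from tensorflow.keras import layers, Model

T, H, W = 6, 300, 300     # time steps (assumed) and downscaled tile size
LOCAL_FEATS = 240         # flattened local PM2.5 + weather sequence (size assumed)
N_STATIONS = 18

def tile_branch(name):
    """One remote-tile branch: average pooling -> CNN -> ConvLSTM -> batch norm."""
    inp = layers.Input((T, H, W, 1), name=name)
    x = layers.AveragePooling3D(pool_size=(1, 5, 5))(inp)                    # shrink [w, h]
    x = layers.TimeDistributed(layers.Conv2D(8, 3, activation="relu"))(x)    # spatial correlation
    x = layers.ConvLSTM2D(8, 3, return_sequences=False)(x)                   # spatio-temporal features
    x = layers.BatchNormalization()(x)
    return inp, layers.Flatten()(x)                                          # |e| contribution

# STRI_fe: four tile branches plus a remote-weather ConvLSTM branch, merged into Rp.
tile_inputs, tile_feats = zip(*[tile_branch(f"tile_{k}") for k in range(1, 5)])
wx_in = layers.Input((T, 28, 28, 7), name="remote_weather")                  # grid/feature sizes assumed
wx = layers.ConvLSTM2D(8, 3, return_sequences=False)(wx_in)
wx = layers.BatchNormalization()(wx)
wx = layers.Flatten()(wx)                                                    # |x|
rp = layers.Concatenate(name="Rp")(list(tile_feats) + [wx])
stri_fe = Model(list(tile_inputs) + [wx_in], rp, name="STRI_fe")

# STRI_p: FC layers over Rp and the local PM2.5/meteorology/forecast sequences.
rp_in = layers.Input((stri_fe.output_shape[-1],), name="Rp_in")
local_in = layers.Input((LOCAL_FEATS,), name="local_pm25_weather")
h = layers.Concatenate()([rp_in, local_in])                                  # merged features H_R
h = layers.Dense(256, activation="relu")(h)
h = layers.Dense(128, activation="relu")(h)
out = layers.Dense(N_STATIONS, name="pm25")(h)
stri_p = Model([rp_in, local_in], out, name="STRI_p")
```

In this two-phase setup, STRI_fe is trained first and then frozen, and only the small STRI_p head is fine-tuned, which mirrors the training strategy described above.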

Base model.

The Base model [27] was designed for PM2.5 prediction at the 18 EPA stations using local influential factors within the Taipei area, represented as 30 × 38 = 1140 cells. This model uses 21 features from the EPA and 26 features from the CWB. Among the 1140 cells, there are 18 EPA stations and 77 CWB stations. The Base model is itself a composite neural network combining six heterogeneous models as its components: one LSTM, two FC (fully connected), and three ConvLSTM models, where each component has its own input data and expected task. For example, the ConvLSTMs extract spatio-temporal knowledge from the EPA, CWB, and weather forecasting datasets, respectively, and the two FC models are expected to automatically distill the information from the EPA and CWB data. The trained weights of this Base model are kept fixed in all subsequent steps.

Fig 3 shows the Base model for the next 72-hour prediction. For the next 24-hour and 48-hour predictions, the Base model has the same architecture but different details, such as activation functions and weights (W).

Fig 3. Base model.

Structure for Base model for next 24 and 48 hours.

https://doi.org/10.1371/journal.pone.0282471.g003

LSD model.

This model considers only local AOD data in the Taipei area, which makes it a simpler composite neural network than STRI. We fill the area with AOD, PM2.5, weather forecast, and meteorological data, all aligned as daily readings, and use them as input to the LSD model.

The LSD model (Fig 4(a)) starts with a series of CNNs on the AOD data to capture the spatial correlation among neighbors along the temporal axis. A pooling layer is then applied after the CNN to reduce the spatial dimensions and aggregate features between the grid cells, producing the output Ko. The model applies the same series of CNN and pooling layers to the current meteorological, air quality, and weather forecast data, producing Kl. The model then combines Ko and Kl using an Add layer, and an LSTM is applied to extract temporally related features.

Fig 4. LSD, RTP and ESD models.

(a) Structure of LSD model, (b) RTP model, and (c) ESD model.

https://doi.org/10.1371/journal.pone.0282471.g004

Finally, an FC layer is used to learn the interactions and correlations between all features in a nonlinear way [28], and the final FC layer produces the PM2.5 prediction.
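A minimal Keras sketch of the LSD structure just described is given below: per-day CNN and pooling on the AOD grid and on the meteorological/air-quality/forecast grid, an Add layer merging Ko and Kl, an LSTM for temporal features, and FC layers for the final prediction. The history length, filter counts, feature counts, and the choice of max pooling are illustrative assumptions.

```python
from tensorflow.keras import layers, Model

T, H, W = 3, 30, 38       # days of history (assumed) and Taipei grid size
N_STATIONS = 18

aod_in = layers.Input((T, H, W, 1), name="local_aod")
met_in = layers.Input((T, H, W, 8), name="local_met_aq_forecast")  # feature count assumed

def cnn_pool(x):
    """Per-day CNN followed by pooling to aggregate neighbouring grid cells."""
    x = layers.TimeDistributed(layers.Conv2D(16, 3, padding="same", activation="relu"))(x)
    x = layers.TimeDistributed(layers.MaxPooling2D(2))(x)
    x = layers.TimeDistributed(layers.Flatten())(x)
    return layers.TimeDistributed(layers.Dense(64, activation="relu"))(x)

ko = cnn_pool(aod_in)                       # K_o: AOD spatial features per day
kl = cnn_pool(met_in)                       # K_l: meteorology/air-quality/forecast features
x = layers.Add()([ko, kl])                  # merge the two feature streams
x = layers.LSTM(64)(x)                      # temporal features across the day sequence
x = layers.Dense(64, activation="relu")(x)  # nonlinear interaction between all features
out = layers.Dense(N_STATIONS, name="pm25_next_day")(x)
lsd = Model([aod_in, met_in], out, name="LSD")
```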

RTP model.

Consisting of a pre-trained Base model and an STRI_p component, RTP is a composite neural network that handles knowledge from RTPEs. The RTP structure has two components (Fig 4(b)), each trained separately, after which they are used as pre-trained components in the RTP model with a series of ReLU functions (ReLU()) for improved overall local PM2.5 prediction at the 18 EPA stations.

The RTP model predicts PM2.5 concentrations using composite techniques on the inputs from the two components. The PM2.5 prediction results from the Base model (O) and STRI_p (X) for the 18 stations for the next 4, 8, …, up to 72 hours are used as input for RTP. RTP then predicts PM2.5 at the same hourly intervals for the same stations. The objective is to improve local PM2.5 prediction by accounting for RTPEs.
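As a sketch of this composition, the predictions O from the Base model and X from STRI_p can be concatenated and passed through a short stack of ReLU-activated FC layers that re-predicts PM2.5 for the same stations and horizons. The layer widths below are assumptions; in the actual model the two components are pre-trained and kept fixed while the composite head is trained.

```python
from tensorflow.keras import layers, Model

N_STATIONS, N_HORIZONS = 18, 18    # 18 stations, 4-hour steps from +4h to +72h

# Predictions from the two pre-trained components (their weights stay frozen).
base_out = layers.Input((N_HORIZONS, N_STATIONS), name="O_base")
stri_out = layers.Input((N_HORIZONS, N_STATIONS), name="X_stri_p")

# Composite head: ReLU-activated FC layers that weigh local knowledge (O)
# against remote-transport knowledge (X) and re-predict PM2.5.
x = layers.Concatenate()([layers.Flatten()(base_out), layers.Flatten()(stri_out)])
x = layers.Dense(256, activation="relu")(x)
x = layers.Dense(256, activation="relu")(x)
out = layers.Dense(N_HORIZONS * N_STATIONS)(x)
out = layers.Reshape((N_HORIZONS, N_STATIONS), name="pm25")(out)
rtp = Model([base_out, stri_out], out, name="RTP")
```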

ESD model.

We changed the topology of the Base model to form the ESD model and applied it to the daily prediction of PM2.5. The ESD model, with a series of ReLU functions (Fig 4(c)), is composed of the Base model and the LSD model. The AOD data in LSD consist of columnar pollutant measurements, as opposed to ground measurements. The difference between the RTP and ESD models is that ESD uses the LSD component with local AOD knowledge to improve daily PM2.5 prediction, whereas RTP utilizes STRI_p, which learns the remote AOD knowledge.

Given the PM2.5 prediction outputs from the Base model (P) and the LSD model (Y) for the 18 stations for the next 1 to 3 days, the ESD model predicts PM2.5 for the same stations at the same daily intervals.

Evaluation

This section reports the experimental environment, the settings and the evaluation for the training of deep learning models.

Experimental environment and setting

The models were trained on an NVIDIA GPU and implemented in Keras with the TensorFlow backend. All models were evaluated using the root mean square error (RMSE), correlation coefficient (R), and mean bias error (MBE). The RMSE and R evaluate how well the predicted values represent the true values, while the MBE estimates the average bias in the prediction. These metrics are defined as:

$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_t - \hat{y}_t)^2}$ (3)

$R = \frac{\sum_{t=1}^{n}(y_t - \bar{y})(\hat{y}_t - \bar{\hat{y}})}{\sqrt{\sum_{t=1}^{n}(y_t - \bar{y})^2}\,\sqrt{\sum_{t=1}^{n}(\hat{y}_t - \bar{\hat{y}})^2}}$ (4)

$\mathrm{MBE} = \frac{1}{n}\sum_{t=1}^{n}(\hat{y}_t - y_t)$ (5)

where $y_t$ and $\hat{y}_t$ are the true and predicted values at timestamp $t$, $\bar{y}$ and $\bar{\hat{y}}$ are their means, and $n$ is the total number of instances in the sample. In this work we consider the mean of the RMSE values over all monitoring stations.
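For reference, the three metrics can be computed per station as in the NumPy sketch below (the sample values are synthetic and for illustration only); the per-station RMSE values are then averaged as described above.

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def corr_coef(y_true, y_pred):
    return np.corrcoef(y_true, y_pred)[0, 1]

def mbe(y_true, y_pred):
    # Positive values indicate overestimation, negative values underestimation.
    return np.mean(y_pred - y_true)

# Synthetic hourly PM2.5 values for one station.
y_true = np.array([12.0, 15.0, 20.0, 35.0, 28.0])
y_pred = np.array([10.0, 16.0, 22.0, 30.0, 27.0])
print(rmse(y_true, y_pred), corr_coef(y_true, y_pred), mbe(y_true, y_pred))
```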

Classification of remote pollutants

In this section we answer the first question by classifying the PM2.5 levels at the Wanli and Tamsui stations in the Taipei region that are affected by RTPEs. These two stations are considered RTPE indicators because (1) they are located by the seashore, as shown in Fig 1, and (2) their background PM2.5 values are stable and relatively low, so the occurrence of an RTPE causes a rise. We therefore started by producing PM2.5 predictions using the STRI_fe and STRI_p models without considering the local PM2.5 influence factor. In our prediction experiment, we considered only November to May, the months when RTPEs have the greatest impact on northern Taiwan [24].

The proposed classification algorithm classifies the PM2.5 concentrations for the next 24, 48, and 72 hours (+24h, +48h, +72h) that are affected by RTPEs. Such RTPEs are understood to flow across the East China Sea to Taiwan; however, due to variations in wind direction, not all pollutants reach Taiwan. Thus, we seek to ascertain the amount of pollutants reaching Taiwan, or the increased concentration caused by the remote pollutants, which corresponds to two conditions for RTPEs. Condition (1): the arrival of RTPEs raises PM2.5 concentrations beyond a certain threshold at these stations. Condition (2): the PM2.5 concentration increases over two consecutive hours; that is, if RTPEs arrive in the current hour (t), then the difference between the current PM2.5 and that of the previous hour (t−1) must be positive. An RTPE is said to occur if the peak value simultaneously satisfies both conditions.

Classifying remote transportation pollution events.

We defined three threshold values for each of the two conditions. For Condition (1), i.e., the EPA threshold (Epa_tshd), based on the finding of Chuang et al. [24] that RTPEs in northern Taiwan account for PM2.5 concentrations ranging from 31 to 39, we selected 30, 33, and 36 in our experiments. For Condition (2), i.e., the differential threshold (Diff_tshd), the true and predicted PM2.5 concentrations were converted to first-order difference vectors, after which the differential thresholds 0.5, 1.0, and 1.5 were set.
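A simplified sketch of this two-condition check applied to an hourly PM2.5 series is given below; it flags every hour that exceeds Epa_tshd and whose first-order difference exceeds Diff_tshd. The helper name is hypothetical, and the full procedure additionally restricts the check to peak values and is detailed in S1 Appendix.

```python
import numpy as np

def classify_rtpe(pm25, epa_tshd=30.0, diff_tshd=0.5):
    """Flag hours at which both RTPE conditions hold.

    pm25 : 1-D array of hourly PM2.5 at a reference station (Wanli or Tamsui).
    Condition (1): the concentration exceeds the EPA threshold.
    Condition (2): the increase over the previous hour exceeds the differential threshold.
    """
    pm25 = np.asarray(pm25, dtype=float)
    diff = np.diff(pm25, prepend=pm25[0])      # PM2.5(t) - PM2.5(t-1)
    return (pm25 > epa_tshd) & (diff > diff_tshd)

# Applied separately to the ground-truth series and to each predicted series
# (+24h, +48h, +72h) for every (Epa_tshd, Diff_tshd) combination.
events = classify_rtpe([22, 25, 31, 34, 33, 29], epa_tshd=30, diff_tshd=0.5)
```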

Fig 5 is an example with the ground truth (GT) and the predicted results for the next one hour (+1h) and the next four hours (+4h) at the Wanli station. The Epa_tshd and Diff_tshd used in the example are 30 and 0.5, respectively. The green dashed line indicates Epa_tshd, and the colored dots represent the peaks from the different predicted hours that exceed the two thresholds. The 26 red dots represent the total number of RTPEs predicted at +1h, compared to the 69 ground-truth events; the accuracy of RTPE detection is thus 37.7%.

Fig 5. Classify remote pollutant.

Prediction of +1h and +4h with ground truth at Wanli station. The dots and stars at the bottom show all peaks that meet both EPA and Differential conditions.

https://doi.org/10.1371/journal.pone.0282471.g005

Note that RTPEs are defined by conditions that depend on the given Epa_tshd and Diff_tshd thresholds. By definition, a True Positive (TP) RTPE occurs when both the ground truth and the model prediction exceed Epa_tshd and Diff_tshd. A True Negative (TN) event occurs if neither the ground truth nor the prediction exceeds the given thresholds. False Positive (FP) and False Negative (FN) events are defined similarly. We used these to calculate the accuracy (A), precision (Pr), recall (R), and F1 score. Specifically, the F1 score is defined as

$F1 = \frac{2 \cdot Pr \cdot R}{Pr + R}$ (6)

The formulas for the remaining metrics can be found in [44]; further classification details are provided in S1 Appendix.
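Given ground-truth and predicted boolean event masks (for example, produced by the classification sketch above), the four metrics follow directly from the confusion-matrix counts, as in this short sketch:

```python
import numpy as np

def event_scores(gt_events, pred_events):
    """Accuracy, precision, recall, and F1 from two boolean RTPE masks."""
    gt, pr = np.asarray(gt_events), np.asarray(pred_events)
    tp = np.sum(gt & pr)          # both ground truth and prediction flag an event
    tn = np.sum(~gt & ~pr)
    fp = np.sum(~gt & pr)
    fn = np.sum(gt & ~pr)
    accuracy = (tp + tn) / gt.size
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```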

Results and discussion

Performance of ESD model

In Table 2 we compare the daily PM2.5 prediction results of the ESD model with those of its components and the RTP_ktile models (with k = 2, 4 representing the number of tiles), where Δ% is the relative improvement in RMSE over the Base model. The Base model outperforms the LSD model for +1day in both R and RMSE, but for +2day and +3day the LSD component outperforms the Base model due to the application of AOD data. The prediction underestimation (negative MBE) of the Base model is low compared with LSD; moreover, the scatter plots of observed versus predicted PM2.5 (Fig 6) show fewer outliers than for the LSD model.

Fig 6. Observed PM2.5 vs predicted PM2.5 scatter graphs.

Plots of the association between observed and predicted PM2.5 for the Base model and the proposed models for the next 1-day prediction.

https://doi.org/10.1371/journal.pone.0282471.g006

Table 2. Results for the Base, LSD, ESD and RTP models (number of samples N = 6390, 12744, 18954).

https://doi.org/10.1371/journal.pone.0282471.t002

By applying heterogeneous AOD data, the ESD model improves the prediction RMSE by 12.68%, 11.45%, and 6.65% for +1day, +2day, and +3day, respectively. It also shows a better R value than the Base model, with a prediction bias (underestimation) of 0.304. The result demonstrates that the ESD's topological change, with the addition of new local AOD knowledge, decreases the prediction error between the true and predicted values.

Table 2 also shows that RTP_2tile and RTP_4tile outperform all three of those models for all target days in RMSE and R. For the MBE bias, both the RTP models and the Base model show positive (overestimation) and negative prediction values for different target days; however, RTP has fewer outliers than all the other models, as shown in the scatter plots of observed versus predicted PM2.5 (Fig 6). Overall, the results show that the RTP model captures RTPEs from remote AOD data, which helps improve the prediction performance for all days. Relative to the Base model, RTP_4tile provides the greatest improvement in prediction RMSE, by 25.77%, 28.96%, and 21.17% for +1day to +3day. RTP_4tile also outperforms RTP_2tile on all three days, which demonstrates that enlarging the remote area helps improve the local prediction of PM2.5. This matches our idea of enlarging the remote area from two tiles to four tiles to capture more RTPEs.

Prediction of RTPEs

To answer the first question raised in the introduction, i.e., to predict RTPEs, we first predicted the local PM2.5 for the two stations using only the local PM2.5 and weather as input to the STRI_p model together with the extracted spatio-temporal features from remote areas. We predicted the RTPEs by applying the thresholds Diff_tshd and Epa_tshd to the PM2.5 predictions. To observe the general performance, we used combinations of various thresholds, with Epa_tshd = 30, 33, 36 and Diff_tshd = 0.5, 1.0, 1.5. For example, we combined Diff_tshd = 0.5 with Epa_tshd = 30, 33, 36, and also combined Diff_tshd = 1.0 with Epa_tshd = 30, 33, 36, as shown in Tables 3 and 4. Tables 3–5 show the classification results in terms of accuracy (A), precision (Pr), recall (R), and F1 score (F1). The first column indicates the data used; for instance, "P" represents the local PM2.5 values, "EP" represents the remote spatio-temporal features from the four tiles, and "W" represents the local weather features.

We first note the increases in accuracy and the other metrics for many forecasts when EP and W are added to the model, which demonstrates the contribution of RTPEs to increased PM2.5 concentrations at the stations. A similar trend is observed for each Epa_tshd, although the accuracy is lower at higher thresholds. For example, for the next-24-hour (+24h) predictions for the Wanli station with a Diff_tshd of 0.5, the accuracy is 0.72, 0.62, and 0.46 for Epa_tshd thresholds of 30, 33, and 36, respectively.

This shows that as the PM2.5 threshold increases, the prediction of PM2.5 tends to be conservative and cannot follow the PM2.5 increase, resulting in low accuracy. In terms of precision and recall, the highest recall of 0.44 is observed at Wanli, whereas for Tamsui the highest is 0.41, both at thresholds of 0.5 (Diff_tshd) and 30 (Epa_tshd) for +24h. For precision, the highest scores are 0.24 and 0.30, at thresholds of 0.5 (Diff_tshd) and 36 (Epa_tshd), for the Wanli and Tamsui stations, respectively. Overall, these results demonstrate the effectiveness of the STRI model for RTPE prediction at these two stations in northern Taiwan.

Performance of RTP model

In this section we answer the second question, namely improving local PM2.5 predictions using knowledge about RTPEs. We discuss the results of different training approaches for the RTP components, the knowledge captured from RTPEs, and the results of the RTP models in comparison to other models.

The effect of training strategies on prediction performance was evaluated by comparing the results from a full STRI model with those using the STRI_fe and STRI_p components, as described above. STRI_p yields better prediction results than the full STRI model in both RMSE and R (Fig 7a and 7b). Since training a full STRI model on a single GPU is challenging, STRI_fe was used for feature extraction and STRI_p for prediction. As STRI_p consists of a small number of layers, it converges quickly during training, leaving more room for model fine-tuning. Thus, the improved performance of STRI_p validates our idea of breaking the full STRI model into two components.

Fig 7. STRI, STRI_p and RTP performance.

Top: performance of the STRI and STRI_p components from +4h to +72h in (a) RMSE and (b) R. Bottom: results in (c) RMSE and (d) R given remote pollutant and local features using the STRI_p component.

https://doi.org/10.1371/journal.pone.0282471.g007

Secondly, the effects of the extracted remote pollutants and local features on the STRI_p component were evaluated. Experiments were conducted starting with one feature and incrementally adding features while observing the results in terms of RMSE and R. Fig 7c and 7d show the results for various features, including spatio-temporal features from two and four tiles (t12 and t1234) as well as the local PM2.5 (P) and weather (W) features from the 18 stations. The model input sequence is therefore P, t12 (tiles h28v06 and h29v06), the remaining two tiles t34 (tiles h28v05 and h29v05), and then W.

A significant gap in RMSE and R was observed between using P alone and adding remote pollutant data from tiles t12 (P+t12) and t1234 (P+t1234) (Fig 7(c) and 7(d)). The impact of expanding the range from tiles t12 to t1234 was also observed; this impact is not present between +4h and +24h, possibly because events from t34 require additional time to have an effect. This fits our goal of expanding the range to four tiles to improve prediction by capturing more RTPEs. The gap was attributed to long-term rather than short-term (+4h to +24h) weather fluctuations. Overall, the results show that the STRI_fe component captures knowledge from RTPEs by learning spatio-temporal behavior from AOD data with the corresponding weather features from remote areas.

Thirdly, we evaluated the RTP model performance across the four seasons of the year using the prediction results from +4h up to +48h. We divided the dataset into four three-month periods and evaluated the prediction results using RMSE and R. Season one (S1) runs from January to March, season two (S2) covers April to June, season three (S3) covers July to September, and season four (S4) covers the remaining months (October to December). Fig 8a and 8b show that RTP performs better in RMSE and R for S1 than for all other seasons at every prediction hour. S1 is the period when winter is at its peak, and it is the same period in which the northeasterly winter monsoon winds transport pollutants from central and northeastern Asia to Taiwan [45]. The better performance of RTP in S1 is therefore probably attributable to the high level of pollutants during that period in the training dataset. In addition, S2 and S3 represent the spring and summer seasons, which normally have low levels of PM2.5; this is reflected in the performance of the RTP model in those periods. S4 is the period when winter starts and PM2.5 levels are expected to increase; however, the RTP performance does not follow that scenario. Overall, the RTP performance matches the seasonal variation in PM2.5 levels in the northern part of Taiwan.

Fig 8. RTP model performance.

RTP performance in (a) RMSE and (b) R over the four seasons of the year. (c) Relative RMSE improvement (%) of all models with reference to the Base model.

https://doi.org/10.1371/journal.pone.0282471.g008

Fourth, we evaluated the performance of the RTP model in comparison with the Base model [27] and other state-of-the-art ensemble models with the same settings: linear regression (LR) [46], AB [32], BG, RF [34, 35], XGB [36–38], and GAM [33, 39]. Note that XGB yields better performance than the gradient boosting machine [47] because it uses a more regularized model formalization to control over-fitting [48]. We also show RTP performance when using RTPEs from two tiles (RTP_2tile) and four tiles (RTP_4tile) to show the impact of remote pollutants on the local prediction of PM2.5.

Fig 8(c) shows the relative prediction improvements in RMSE of both RTP models and the other state-of-the-art models with respect to the Base model from +4h to +72h; the greater the improvement, the better the model does in comparison to the Base model. The figure shows that RTP_4tile yields the greatest improvements: 17%–30%, 23%–26%, and 18%–22% for +4h to +24h, +28h to +48h, and +52h to +72h, respectively. Similarly, RTP_2tile provides large improvements: 13%–24%, 17%–23%, and 13%–17% for +4h to +24h, +28h to +48h, and +52h to +72h. XGB and GAM, in turn, improve on the Base model by 6%–8%, 10%–12%, and 8%–11% for +4h to +24h, +28h to +48h, and +52h to +72h, respectively, with scores similar to those of the LR model. AB is outperformed by the Base model for most hours; RF is also, but to a lesser extent. RTP_2tile and RTP_4tile outperform the other models due to their composite neural network design [27], which provides high flexibility and the learning capability to model nonlinear associations between input features. The performance of RTP_4tile over RTP_2tile further demonstrates the importance of the enlarged remote area for capturing more RTPEs.

Conclusion

To characterize the occurrence of remote transportation pollution events (RTPEs), we define them as a combination of a threshold and a one-hour increment in PM2.5 concentration, and then design an algorithm to classify PM2.5 concentrations into RTPEs. The proposed RTPE definition and algorithm are evaluated for an area in northern Taiwan and the corresponding satellite data, and we believe that the proposed method can be applied elsewhere. In particular, the evaluation shows that a well-designed deep learning model extracts knowledge from satellite data that improves the accuracy of PM2.5 prediction.

It is worth noting that RTPEs can be captured using the proposed composite RTP model, and RTPEs can then be applied to improve the prediction of PM2.5. The proposed RTP comprises two main components: a pre-trained Base model and the STRI model, which capture the knowledge of local PM2.5 concentrations and of RTPEs, respectively. In addition, STRI learns the spatio-temporal characteristics of AOD data and weather features through its component STRI_fe, and then predicts local PM2.5 through the other component, STRI_p, whose performance is validated in the empirical study. Local PM2.5 predictions using the RTP model outperform the Base model and other state-of-the-art ensemble models by 12%–30%, 12%–18%, and 10%–14% for +4h to +24h, +28h to +48h, and +52h to +72h, respectively. The ESD model, although it considers only local AOD data, still improves PM2.5 prediction, as evidenced by RMSE scores lower than the Base model's by 12.68% for +1day, 11.45% for +2day, and 6.65% for +3day.

The outstanding performance of the STRI model in the prediction of PM2.5 will help government policy makers take measures, such as controlling traffic in areas expected to have high levels of PM2.5. They may also use this information for warning systems and to plan mitigation actions that reduce the risk to public health. In addition, the same information can be used by individuals to organize their activities, such as deciding whether to exercise outside.

Future work will focus on expanding the remote area, using data that are updated at a higher frequency compared to AOD data, and considering other possible features and models.

Supporting information

S1 Appendix. More details on classifying remote pollutants.

https://doi.org/10.1371/journal.pone.0282471.s002

(PDF)

Acknowledgments

We thank the NASA, NCEP, FNL, CWB, and EPA teams for the data used in our work.

References

  1. Pandey JS, Kumar R, Devotta S. Health risks of NO2, SPM and SO2 in Delhi (India). Atmospheric Environment. 2005;39(36):6868–6874.
  2. Lee M, Kloog I, Chudnovsky A, Lyapustin A, Wang Y, Melly S, et al. Spatiotemporal prediction of fine particulate matter using high-resolution satellite images in the Southeastern US 2003–2011. Journal of exposure science & environmental epidemiology. 2016;26(4):377–384. pmid:26082149
  3. Liu ST, Liao CY, Kuo CY, Kuo HW. The effects of PM2.5 from Asian dust storms on emergency room visits for cardiovascular and respiratory diseases. International journal of environmental research and public health. 2017;14(4):428. pmid:28420157
  4. Song Z, Chen B, Huang J. Combining Himawari-8 AOD and deep forest model to obtain city-level distribution of PM2.5 in China. Environmental Pollution. 2022;297:118826. pmid:35016979
  5. Gibson MD, Kundu S, Satish M. Dispersion model evaluation of PM2.5, NOx and SO2 from point and major line sources in Nova Scotia, Canada using AERMOD Gaussian plume air dispersion model. Atmospheric Pollution Research. 2013;4(2):157–167.
  6. Chen Z, Chen D, Zhao C, Kwan Mp, Cai J, Zhuang Y, et al. Influence of meteorological conditions on PM2.5 concentrations across China: A review of methodology and mechanism. Environment international. 2020;139:105558. pmid:32278201
  7. Hung WT, Lu CH Sarah, Wang SH, Chen SP, Tsai FJ, Chou Charles CK. Investigation of long-range transported PM2.5 events over Northern Taiwan during 2005–2015 winter seasons. Atmospheric Environment. 2019;217:116920.
  8. Xu Q, Chen X, Yang S, Tang L, Dong J. Spatiotemporal relationship between Himawari-8 hourly columnar aerosol optical depth (AOD) and ground-level PM2.5 mass concentration in mainland China. Science of the Total Environment. 2021;765:144241. pmid:33385809
  9. Guo J, Xia F, Zhang Y, Liu H, Li J, Lou M, et al. Impact of diurnal variability and meteorological factors on the PM2.5-AOD relationship: Implications for PM2.5 remote sensing. Environmental Pollution. 2017;221:94–104. pmid:27889085
  10. Lai IC, Brimblecombe P, et al. Long-range transport of air pollutants to Taiwan during the COVID-19 lockdown in Hubei province. Aerosol and Air Quality Research. 2021;21(2):200392.
  11. Chuang MT, Chou Charles CK, Lin N, Takami A, Hsiao TC, Lin TH, et al. A simulation study on PM2.5 sources and meteorological characteristics at the northern tip of Taiwan in the early stage of the Asian haze period. Aerosol and Air Quality Research. 2017;17(12):3166–3178.
  12. Kloog I, Koutrakis P, Coull BA, Lee HJ, Schwartz J. Assessing temporally and spatially resolved PM2.5 exposures for epidemiological studies using satellite aerosol optical depth measurements. Atmospheric Environment. 2011;45(35):6267–6275.
  13. Chudnovsky A, Lyapustin A, Wang Y, Tang C, Schwartz J, Koutrakis P. High resolution aerosol data from MODIS satellite for urban air quality studies. Open Geosciences. 2014;6(1):17–26.
  14. Di Q, Kloog I, Koutrakis P, Lyapustin A, Wang Y, Schwartz J. Assessing PM2.5 exposures with high spatiotemporal resolution across the continental United States. Environmental Science & Technology. 2016;50(9):4712–4721. pmid:27023334
  15. Arystanbekova NK. Application of Gaussian plume models for air pollution simulation at instantaneous emissions. Mathematics and Computers in Simulation. 2004;67(4-5):451–458.
  16. Kim MJ, Park RJ, Kim JJ. Urban air quality modeling with full O3–NOx–VOC chemistry: Implications for O3 and PM air quality in a street canyon. Atmospheric Environment. 2012;47:330–340.
  17. Appel KW, Bhave PV, Gilliland AB, Sarwar G, Roselle SJ. Evaluation of the community multiscale air quality (CMAQ) model version 4.5: sensitivities impacting model performance; part II—particulate matter. Atmospheric Environment. 2008;42(24):6057–6066.
  18. Xiao QY, Chang H, Geng GN, Liu Y. An ensemble machine-learning model to predict historical PM2.5 concentrations in China from satellite data. Environmental Science & Technology. 2018;52(22):13260–13269.
  19. Luo Z, Huang J, Hu K, Li X, Zhang P. AccuAir: Winning solution to air quality prediction for KDD Cup 2018. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2019. p. 1842–1850.
  20. Lin CY, Liu SC, Chou CC, Liu TH, Lee CT, Yuan CS, et al. Long-range transport of Asian dust and air pollutants to Taiwan. Terr Atmos Ocean Sci. 2004;15(5):759–784.
  21. Lin CY, Wang Z, Chen WN, Chang SY, Chou C, Sugimoto N, et al. Long-range transport of Asian dust and air pollutants to Taiwan: observed evidence and model simulation. Atmospheric Chemistry and Physics. 2007;7(2):423–434.
  22. Chuang MT, Fu JS, Jang CJ, Chan CC, Ni PC, Lee CT. Simulation of long-range transport aerosols from the Asian Continent to Taiwan by a Southward Asian high-pressure system. Science of the Total Environment. 2008;406(1-2):168–179. pmid:18790518
  23. Chen TF, Chang KH, Tsai CY. Modeling direct and indirect effect of long range transport on atmospheric PM2.5 levels. Atmospheric Environment. 2014;89:1–9.
  24. Chuang MT, Lee CT, Hsu HC. Quantifying PM2.5 from long-range transport and local pollution in Taiwan during winter monsoon: An efficient estimation method. Journal of Environmental Management. 2018;227:10–22. pmid:30172155
  25. Zhang JB, Zheng Y, Qi DK, Li RY, Yi XW, Li TR. Predicting citywide crowd flows using deep spatio-temporal residual networks. Artificial Intelligence. 2018;259:147–166.
  26. Sønderby CK, Espeholt L, Heek J, Dehghani M, Oliver A, Salimans T, et al. MetNet: A neural weather model for precipitation forecasting. arXiv preprint arXiv:2003.12140. 2020.
  27. Yang MC, Chen MC. Composite neural network: Theory and application to PM2.5 prediction. IEEE Transactions on Knowledge and Data Engineering. 2023;35(2):1311–1323.
  28. Yi X, Zhang J, Wang Z, Li T, Zheng Y. Deep distributed fusion network for air quality prediction. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2018. p. 965–973.
  29. Hu WS, Li HC, Pan L, Li W, Tao R, Du Q. Feature extraction and classification based on spatial-spectral ConvLSTM neural network for hyperspectral images. arXiv preprint arXiv:1905.03577. 2019.
  30. Xingjian S, Chen Z, Wang H, Yeung DY, Wong WK, Woo WC. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In: Advances in Neural Information Processing Systems; 2015. p. 802–810.
  31. Shi X, Chen Z, Wang H, Yeung DY, Wong WK, Woo WC. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Advances in Neural Information Processing Systems. 2015;28.
  32. Zhai B, Chen J. Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China. Science of The Total Environment. 2018;635:644–658. pmid:29679837
  33. Shtein A, Kloog I, Schwartz J, Silibello C, Michelozzi P, Gariazzo C, et al. Estimating daily PM2.5 and PM10 over Italy using an ensemble model. Environmental Science & Technology. 2019;54(1):120–128.
  34. Chen B, Song Z, Pan F, Huang Y. Obtaining vertical distribution of PM2.5 from CALIOP data and machine learning algorithms. Science of The Total Environment. 2022;805:150338. pmid:34537706
  35. Song Z, Chen B, Huang Y, Dong L, Yang T. Estimation of PM2.5 concentration in China using linear hybrid machine learning model. Atmospheric Measurement Techniques. 2021;14(8):5333–5347.
  36. Wong PY, Lee HY, Chen YC, Zeng YT, Chern YR, Chen NT, et al. Using a land use regression model with machine learning to estimate ground level PM2.5. Environmental Pollution. 2021;277:116846. pmid:33735646
  37. Chen ZY, Zhang TH, Zhang R, Zhu ZM, Yang J, Chen PY, et al. Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China. Atmospheric Environment. 2019;202:180–189.
  38. Bagheri H. A machine learning-based framework for high resolution mapping of PM2.5 in Tehran, Iran, using MAIAC AOD data. Advances in Space Research. 2022;69(9):3333–3349.
  39. Kulkarni P, Sreekanth V, Upadhya AR, Gautam HC. Which model to choose? Performance comparison of statistical and machine learning models in predicting PM2.5 from high-resolution satellite aerosol optical depth. Atmospheric Environment. 2022; p. 119164.
  40. Meng X, Karniadakis GE. A composite neural network that learns from multi-fidelity data: Application to function approximation and inverse PDE problems. Journal of Computational Physics. 2020;401:109020.
  41. Wu C, Zhang S, Wang G, Lv S, Li D, Liu L, et al. Efficient heterogeneous formation of ammonium nitrate on the saline mineral particle surface in the atmosphere of East Asia during dust storm periods. Environmental Science & Technology. 2020;54(24):15622–15630. pmid:33256403
  42. Yang MC, Chen MC. PM2.5 forecasting using pre-trained components. In: 2018 IEEE International Conference on Big Data (Big Data). IEEE; 2018. p. 4488–4491.
  43. Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. 2015.
  44. Zumel N, Mount J, Porzak J. Practical Data Science with R. Manning, Shelter Island, NY; 2014.
  45. Lin CY, Liu SC, Chou CCK, Huang SJ, Liu CM, Kuo CH, et al. Long-range transport of aerosols and their impact on the air quality of Taiwan. Atmospheric Environment. 2005;39(33):6066–6076.
  46. Yang Q, Yuan Q, Li T. Ultrahigh-resolution PM2.5 estimation from top-of-atmosphere reflectance with machine learning: Theories, methods, and applications. Environmental Pollution. 2022;306:119347. pmid:35483482
  47. Pu Q, Yoo EH. Ground PM2.5 prediction using imputed MAIAC AOD with uncertainty quantification. Environmental Pollution. 2021;274:116574. pmid:33529896
  48. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. p. 785–794.