Time-series prediction model of PM2.5 concentration based on LSTM neural network

LSTM (long short-term memory) is a neural network well suited to modeling and predicting changes in time series. This paper designs a PM2.5 concentration prediction model based on the LSTM neural network. The model is implemented in Python under the TensorFlow framework and trained on real weather data from 2015-01-01 to 2019-01-01; the trained model can accurately predict future changes in PM2.5 concentration. Using RMSE and DC as evaluation indexes, the LSTM neural network model is compared with a traditional BP (back propagation) neural network model. The results show that the LSTM model achieves higher precision and a more stable prediction effect, demonstrating that its time-series prediction ability is superior. As an effective method for predicting PM2.5 concentration data, it can provide a basis for pollutant forecasting.


Introduction
With the continuous development of industrial society, environmental pollution has worsened, and air pollution in particular is severe. PM (particulate matter) is an important index for measuring the degree of air pollution: the higher its concentration in the air, the more serious the pollution. PM2.5 readily adsorbs toxic and harmful substances (e.g. heavy metals and microorganisms), stays in the atmosphere for a long time and is transported over long distances; it not only affects the balance of ecosystems but also harms human health [1]. How to predict PM2.5 concentration so that reasonable prevention and control measures can be taken has become a hot topic.
Existing PM2.5 concentration prediction models mainly include mechanism models, traditional time-series models and machine-learning models. Mechanism models simulate future air pollution through the physicochemical processes of air pollutants; this kind of model is not only complicated but also requires parameters that are difficult to obtain accurately, so its accuracy is not high [2]. Time-series models predict future PM2.5 concentration from historical sequence information alone and cannot capture the complicated nonlinear relationships in the PM2.5 data, so their prediction effect is poor. Shallow machine-learning networks, represented by the artificial neural network and the improved grey neural network model, can model complex nonlinear relationships, but because their network structure is relatively simple, the prediction effect is ordinary [3]. Deep learning, as a newer machine-learning approach, shows significant advantages in handling complex time-series relationships: it effectively learns the features of large amounts of input data and provides new research ideas and methods for PM2.5 time-series prediction. Most existing prediction studies build concentration models using only the historical PM2.5 concentration, or only various meteorological elements, as independent variables, and ignore their joint influence. This paper proposes an LSTM neural network prediction model for PM2.5 under the TensorFlow deep-learning framework, using both the historical concentration and various meteorological elements as independent variables. Real weather data are used to train the model and verify the prediction results, which are then compared with those of a BP neural network prediction model built for the same task.
The model shows good nonlinear learning ability in PM2.5 data prediction, and its predictions are more accurate than those of the BP neural network.

LSTM Neural Networks
A neural network is a mathematical model that processes information in a way similar to the synaptic connection structure of the brain; it studies massive amounts of historical data to find the best way to solve a problem [4]. Predicting PM2.5 pollutants requires finding the mapping relationship in the historical data and then using it to predict future data, so a neural network model can be applied to this prediction problem. Among neural networks, the RNN (recurrent neural network) is remarkably effective at processing time-series data. Unlike a basic feedforward network, the RNN focuses on recurrence: it establishes weight connections not only between the layers but also between the neurons within each layer across time steps, and uses this sequential information to predict future changes (shown in Fig.1).

Fig.1 Schematic diagram of RNN network model
The input of a neural network unit is x_t and its output is y_t. Each operation of the RNN depends on the previous calculation results, so in theory sequence information of any length can be used. However, as the time interval of the training data increases, the RNN's learning ability weakens, and the model suffers from vanishing or exploding gradients.
The LSTM neural network, a variant of the RNN, controls the forgetting or adding of information through a cell state designed like a 'highway', letting information pass selectively through 'gates', and thereby avoids the problems the RNN encounters with long time series.
The gate structure of the LSTM neural network includes three kinds: the input gate, the output gate and the forget gate, which learn from and predict the data by controlling the input, output and historical dependence of the neurons (shown in Fig.2):

f_t = σ(W_f·[y_{t-1}, x_t] + b_f)    (1)
C_t = f_t × C_{t-1} + i_t × C̃_t    (2)
i_t = σ(W_i·[y_{t-1}, x_t] + b_i)    (3)
C̃_t = tanh(W_C·[y_{t-1}, x_t] + b_C)    (4)
o_t = σ(W_o·[y_{t-1}, x_t] + b_o)    (5)
y_t = o_t × tanh(C_t)    (6)

In the above formulas, C is the cell state, W is the corresponding weight coefficient matrix, and b is the corresponding bias term. The forget gate reads the output y_{t-1} of the previous neuron and the current input x_t, and outputs a value in [0,1] that multiplies the previous cell state (formula (1)). An output of 1 represents complete 'memory' of the previous cell state, and an output of 0 complete 'oblivion'; in this way the long-term memory of the network model is guaranteed (formula (2)). The input gate receives x_t and y_{t-1} to obtain the gate value i_t and the new candidate vector C̃_t (formulas (3), (4)). Then the output gate result o_t of the neuron is obtained from x_t and y_{t-1} (formula (5)).
The final output y_t of the current neuron in formula (6) is obtained by multiplying the current cell state C_t, processed by the tanh layer, with the result o_t from the output gate.
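The gate computations above can be sketched in a few lines of NumPy. This is an illustrative single-step implementation of the standard LSTM equations, not the paper's TensorFlow code; the weight/bias containers and names are assumptions made for clarity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, y_prev, C_prev, W, b):
    """One LSTM time step following formulas (1)-(6).
    W and b hold the weight matrices/bias vectors of the
    forget (f), input (i), candidate (C) and output (o) gates."""
    z = np.concatenate([y_prev, x_t])        # [y_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ z + b['f'])       # forget gate, formula (1)
    i_t = sigmoid(W['i'] @ z + b['i'])       # input gate, formula (3)
    C_tilde = np.tanh(W['C'] @ z + b['C'])   # candidate state, formula (4)
    C_t = f_t * C_prev + i_t * C_tilde       # cell-state update, formula (2)
    o_t = sigmoid(W['o'] @ z + b['o'])       # output gate, formula (5)
    y_t = o_t * np.tanh(C_t)                 # hidden output, formula (6)
    return y_t, C_t
```

Because o_t lies in (0,1) and tanh(C_t) in (-1,1), the output y_t is always bounded, which keeps the recurrence numerically stable.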

BP Neural networks
The BP neural network is one of the most widely used artificial neural network models. It is a multilayer feedforward network trained by the error back-propagation algorithm, and it can learn and store a large number of input-output pattern mappings [5]. Its network topology includes an input layer, one or more hidden layers and an output layer. Input data first reach the input layer and then pass through the hidden layers, whose training and learning produce the output layer result. Each hidden layer is composed of multiple neurons, and the state of the neurons in each layer affects only the state of the neurons in the next layer. The topology of the BP neural network is shown in Fig.3. If the results obtained at the output layer do not meet the expected target, the error signal is propagated back along the original connection path and the weight of each neuron is adjusted; the forward and backward calculations are repeated until the results meet the requirements.
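The forward/backward cycle described above can be sketched on a toy regression task. This is a minimal illustration of error back-propagation with one hidden layer and plain gradient descent, under assumed layer sizes and learning rate, not the network configuration used later in the paper.

```python
import numpy as np

# Toy data: learn t = (x1 + x2) / 2 from 64 random samples.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (64, 2))
t = (X[:, :1] + X[:, 1:]) / 2

W1, b1 = rng.standard_normal((2, 8)) * 0.5, np.zeros(8)   # hidden layer
W2, b2 = rng.standard_normal((8, 1)) * 0.5, np.zeros(1)   # output layer
lr = 0.1
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)          # forward pass: hidden layer
    y = h @ W2 + b2                   # forward pass: output layer
    e = y - t                         # error at the output layer
    dW2 = h.T @ e / len(X)            # back-propagate the error signal
    dh = (e @ W2.T) * (1 - h**2)      # chain rule through tanh
    dW1 = X.T @ dh / len(X)
    W2 -= lr * dW2; b2 -= lr * e.mean(0)   # adjust the weights
    W1 -= lr * dW1; b1 -= lr * dh.mean(0)
mse = float((e**2).mean())            # error shrinks as training repeats
```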

Experimental data sets
The real data from 2015-01-01 to 2019-01-01 are selected as the experimental data set. The data set consists of 24 hourly weather records per day over the four years, 35040 (24×365×4) records in total, each including the PM2.5 concentration, temperature, air pressure, wind direction, wind speed, accumulated hourly snowfall and accumulated hourly rainfall at that time point (shown in Fig.4).

Data preprocessing
Due to human operation errors or equipment failures, some data may be missing during acquisition. Observation of the data set shows no long continuous gaps, and the PM2.5 series is continuous, so in the experiment missing values are filled with the mean of the series. Of the seven kinds of data above, all are numerical except the wind direction; because wind direction is a categorical value with no numerical meaning, it is one-hot encoded.
Fig.5 Wind direction coding diagram
The LSTM model is sensitive to the scale of the input data, and the indexes differ significantly in order of magnitude. To reduce the influence of these differences on the prediction, the completed data are normalized using the following formula:

x' = (x − x_min) / (x_max − x_min)

Here x is the original data, x_min is the minimum value of the original data, and x_max is the maximum value of the original data. The normalized data (a value in [0,1]) reduce the data scale while preserving the trend of the data.
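The three preprocessing steps can be sketched as follows. The direction labels and input format here are illustrative assumptions, not the paper's exact data layout.

```python
import numpy as np

def preprocess(pm25, wind_dir, directions=('N', 'E', 'S', 'W')):
    """Sketch of the preprocessing described above: mean imputation,
    min-max normalization, and one-hot wind-direction encoding."""
    pm25 = np.asarray(pm25, dtype=float)
    # 1. fill isolated missing values (NaN) with the series mean
    pm25[np.isnan(pm25)] = np.nanmean(pm25)
    # 2. min-max normalization: x' = (x - x_min) / (x_max - x_min)
    x_min, x_max = pm25.min(), pm25.max()
    pm25_norm = (pm25 - x_min) / (x_max - x_min)
    # 3. one-hot encode the categorical wind-direction column
    onehot = np.array([[1.0 if d == ref else 0.0 for ref in directions]
                       for d in wind_dir])
    return pm25_norm, onehot
```

For example, `preprocess([10, nan, 30], ['N', 'S', 'S'])` fills the gap with the mean 20, scales the series to [0, 0.5, 1], and returns a 3×4 one-hot matrix for the wind directions.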

LSTM model
In the LSTM model, the loss function measures the proximity between the calculated value and the measured value, and the model parameters are determined by minimizing the loss function. The mean absolute error (MAE) is chosen as the loss function:

MAE = (1/m) Σ_{i=1}^{m} |y0_i − yf_i|

In the formula above, m is the number of samples, y0 is the measured PM2.5 concentration, and yf is the concentration calculated by the model.
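The loss function is straightforward to express in code; a small NumPy sketch:

```python
import numpy as np

def mae(y_measured, y_predicted):
    """Mean absolute error loss: MAE = (1/m) * sum(|y0 - yf|)."""
    y0 = np.asarray(y_measured, dtype=float)
    yf = np.asarray(y_predicted, dtype=float)
    return float(np.mean(np.abs(y0 - yf)))
```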
The experimental data are divided into two parts: the first three years serve as the training set, and the data of the fourth year serve as the test set, against which the model's predictions are compared to test its prediction effect. The training data are fed iteratively into the designed LSTM model. After continual tuning, the final model parameters are: 60 hidden neurons, a training batch size of 72, an initial learning rate of 0.001, and 80 iterations.
In addition, the AdamOptimizer is selected to adjust the learning rate dynamically. It adapts the learning rate from the two perspectives of the gradient mean and the gradient variance, and works well in practice.
Fig.6 Loss value and iterations of the LSTM model
From the diagram it can be seen that the loss function decreases gradually and then stabilizes, with no underfitting or overfitting, which indicates that the LSTM model's learning rate is reasonable.
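The adaptive behaviour of Adam can be illustrated with a single-step update rule in NumPy. This is a textbook sketch of the Adam algorithm (with its standard default coefficients), not TensorFlow's internal implementation.

```python
import numpy as np

def adam_step(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: the step size adapts per parameter using running
    estimates of the gradient mean (m) and gradient variance (v)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad        # gradient-mean estimate
    v = beta2 * v + (1 - beta2) * grad**2     # gradient-variance estimate
    m_hat = m / (1 - beta1**t)                # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)
```

Repeatedly applying the step to the gradient of f(x) = x² drives x toward the minimum even though the raw gradient magnitude changes at every step.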

BP model
The BP neural network in this paper is set up with 1 input layer, 3 hidden layers and 1 output layer. The numbers of neurons in the 3 hidden layers are 3, 5 and 7, and the learning rate and iteration number are set to 0.01 and 1000, respectively.
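A forward pass through this architecture can be sketched as below. The 7-feature input width is an assumption based on the data set description (one-hot wind direction would widen it), and the activation choices are illustrative.

```python
import numpy as np

# Hidden layers of 3, 5 and 7 neurons as configured above;
# 7 input features (assumed) and 1 output (predicted PM2.5).
rng = np.random.default_rng(2)
layer_sizes = [7, 3, 5, 7, 1]
weights = [rng.standard_normal((a, b)) * 0.1
           for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(b) for b in layer_sizes[1:]]

def forward(x):
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)              # hidden-layer activation
    return h @ weights[-1] + biases[-1]     # linear output layer

y = forward(rng.standard_normal(7))         # one predicted concentration
```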

Training results
Using the trained LSTM and BP models, forecast data (predict_steps) are generated for the next 200 hours; the final experimental results are shown in the figures.

Model evaluation indicators
The accuracy of the models is evaluated using RMSE (root mean square error) and DC (deterministic coefficient).
(1) RMSE:

RMSE = sqrt( (1/m) Σ_{i=1}^{m} (y0_i − yf_i)² )    (10)

(2) DC:

DC = 1 − Σ_{i=1}^{m} (y0_i − yf_i)² / Σ_{i=1}^{m} (y0_i − ya)²    (11)

In the formulas above, m is the number of samples, y0 is the measured PM2.5 concentration, yf is the concentration calculated by the model, and ya is the average of the measured concentrations. RMSE measures the average proximity between the predicted results and the measured values: the smaller its value, the higher the prediction accuracy. DC reflects how well the prediction process fits the measured process as a whole: the greater its value, the higher the model accuracy, and vice versa.
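Both evaluation indexes are easy to compute directly; a NumPy sketch:

```python
import numpy as np

def rmse(y_measured, y_predicted):
    """Root mean square error, formula (10)."""
    y0, yf = np.asarray(y_measured, float), np.asarray(y_predicted, float)
    return float(np.sqrt(np.mean((y0 - yf) ** 2)))

def dc(y_measured, y_predicted):
    """Deterministic coefficient: 1 minus the ratio of squared prediction
    error to squared deviation from the measured mean."""
    y0, yf = np.asarray(y_measured, float), np.asarray(y_predicted, float)
    ya = y0.mean()
    return float(1.0 - np.sum((y0 - yf) ** 2) / np.sum((y0 - ya) ** 2))
```

A perfect prediction gives RMSE = 0 and DC = 1; DC falls toward (or below) 0 as predictions degrade to the level of simply guessing the mean.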

Conclusions
This paper selects real pollution and weather data as the research object, constructs future PM2.5 concentration prediction models based on the LSTM neural network and the BP neural network, and evaluates model accuracy using the root mean square error and the deterministic coefficient as indexes. The main conclusions are as follows: on average, the root mean square error of the LSTM neural network model stays below 26.7, while that of the BP neural network exceeds 30 many times. In terms of the deterministic coefficient, the LSTM model maintains a prediction accuracy above 0.922, which is higher than the maximum accuracy of the BP model. It can be seen that the LSTM model has high accuracy and good applicability for the studied weather data.
To sum up, compared with the BP neural network, the LSTM neural network has the ability to learn long-term dependencies, which provides an effective solution for long-period PM2.5 prediction.