Bi-GRCN: A Spatio-Temporal Traffic Flow Prediction Model Based on Graph Neural Network



Introduction
Traffic flow prediction aims to predict the future traffic flow of roads from historical traffic flow data. It is an important part of Intelligent Transportation Systems and provides a scientific basis for traffic planning and control [1,2]. Based on the predicted traffic conditions, the transport department can deploy and guide vehicles in advance to reduce congestion, and transport agencies can select appropriate routes to improve travel efficiency [3]. However, because traffic flow has complex spatial and temporal characteristics, real-time and accurate traffic flow prediction is a major challenge. Traffic flow exhibits correlation and dependence in both time and space; therefore, jointly considering its temporal and spatial characteristics is the key to traffic flow prediction. The temporal characteristics of traffic flow refer to the periodicity [4] and trend [5] of traffic conditions over time. Traffic flow data change periodically: for example, the traffic flow in the morning and evening peak periods on weekdays is significantly higher than at other times, while the traffic flow in the early morning is very low. Traffic flow data also follow trends over time, and the traffic flow at past times affects the traffic flow at future times, especially at adjacent times. For example, the traffic flow at a road checkpoint from 8:00 to 8:15 A.M. affects the traffic flow at the same checkpoint from 8:15 to 8:30 A.M. The spatial correlation [6] of traffic flow means that the traffic condition of any road in the network is affected by other roads and is correlated with its adjacent or connected roads.
The spatial dependence [7] of traffic flow means that traffic conditions on upstream roads propagate to downstream roads, and conditions on downstream roads in turn have a backward effect on upstream roads; that is, geographically adjacent areas show strong spatial dependence. For example, if novice drivers on an upstream road drive slowly, the resulting congestion directly leads to congestion on the downstream road, and if traffic on the downstream road is slow, the speed on the upstream road is affected accordingly. Traditional traffic flow prediction methods [8,9] predict future traffic flow by considering the temporal correlation of traffic flow data and learning the characteristics of historical traffic flow; examples include the Kalman filtering model (KFM) [10,11], the Autoregressive Integrated Moving Average (ARIMA) model [12,13], the k-nearest neighbor model [14,15], and Bayesian models [16,17]. These methods consider how traffic conditions change over time but ignore the influence of space, so they cannot predict traffic conditions accurately. To better describe the relationship between traffic flow and its spatial characteristics, neural networks have been introduced to model the spatial characteristics of traffic flow data. However, traditional neural networks are usually designed for regularly arranged Euclidean data such as text, images, and audio, and are not suited to irregular road networks with complex topology. Therefore, traditional neural networks cannot fully exploit the spatial characteristics of traffic flow.
To better learn the complex spatial dependence and temporal correlation of traffic flow data and predict traffic flow more accurately, this paper proposes a spatio-temporal traffic flow prediction model based on a new Graph Neural Network (GNN), called the Bidirectional-Graph Recurrent Convolutional Network (Bi-GRCN). The main contributions of this paper are as follows:
(1) To address the spatial dependence of traffic flow data, the Graph Convolutional Network (GCN) is introduced and improved, and a new GNN-based spatio-temporal traffic flow prediction model is proposed. The spatial relationship between traffic flow and the road network is studied, and an unweighted adjacency matrix is constructed to represent the connections between roads. Learning with GCN allows the model to better capture the spatial dependence in traffic flow data.
(2) A component that extracts temporal features is built on the Bidirectional Gated Recurrent Unit (Bi-GRU). Bi-GRU processes the input sequence in both the forward and backward directions and performs well in feature extraction. Because traffic flow is time-series data with temporal correlation, Bi-GRU is used to capture the temporal correlation hidden in the series and to learn the relationship among current, historical, and future traffic flow data, from which the predicted values are obtained.
(3) Spatio-temporal fusion is adopted to improve the prediction ability of the model. Traffic flow data integrate spatial and temporal information. The model learns the temporal correlation across the time slices of the traffic flow data and the hidden spatial dependence within each time slice, and fuses the temporal and spatial features through a fully connected layer to improve prediction accuracy.
The rest of the paper is organized as follows: Section 2 reviews related research on traffic flow prediction. Section 3 introduces the definitions and methods of traffic flow prediction in detail. Section 4 explains the Bi-GRCN model for traffic flow prediction. Section 5 evaluates the prediction performance of Bi-GRCN on real-world traffic data, including model parameters, results analysis, and model interpretation. Section 6 concludes the paper.

Related Work
Existing traffic flow prediction models can be divided into traditional models and models based on machine learning. Commonly used traditional models include the Historical Average Model (HAM) [18], the Kalman Filtering Model (KFM) [10,11], and the Autoregressive Integrated Moving Average model (ARIMA) [12,13]. HAM uses the average of historical traffic flow as the prediction, and its calculation is simple and efficient. KFM is a linear regression analysis model with high precision and flexible selection of predictors. ARIMA forecasts traffic flow by analyzing the relationship between historical and current traffic flow data and is highly interpretable. Classical machine learning methods commonly used for traffic flow prediction include K-Nearest Neighbor (KNN) [14,15], Support Vector Machine (SVM) [19,20], and Decision Tree (DT) [21]. KNN finds the K historical periods whose traffic flow is closest to that of the period being predicted; however, it has high computational complexity. SVM forecasts traffic flow with a trained SVM model; however, its prediction ability depends on the choice of kernel function. DT classifies and forecasts traffic flow through successive feature selection; it is fast and accurate but prone to overfitting.
Deep learning models that consider the temporal correlation of data include Recurrent Neural Networks (RNN) [22,23], Long Short-Term Memory (LSTM) [24], and the Gated Recurrent Unit (GRU) [25], while models that jointly consider spatial dependence and temporal correlation include the Convolutional Neural Network (CNN) [26,27], the Deep Belief Network (DBN) [28], and the Stacked Autoencoder (SAE) [29]. RNNs use a recurrent (self-loop) connection to learn the temporal correlation of traffic flow data. LSTM passes temporal information through gate units and uses a memory cell to store continuously updated information, capturing both short-term and long-term temporal correlation in traffic flow data. GRU has a simpler structure and fewer parameters than LSTM, so it trains faster and runs more efficiently. CNN is a classical feedforward deep learning model that can capture the spatial dependence and temporal correlation of data at the same time. DBN consists of multiple Restricted Boltzmann Machines (RBM) [30] and can learn traffic flow under the influence of spatial dependence between roads. SAE consists of multiple stacked autoencoders and learns multi-level features, so it can effectively mine the spatial dependence and temporal correlation in traffic flow data.
In recent years, Graph Neural Networks (GNN) [31,32] have become one of the most discussed topics in deep learning research, showing state-of-the-art performance in various traffic applications [33] such as traffic congestion, traffic safety, travel demand, autonomous driving, and traffic monitoring. Because GNNs can capture spatial dependencies represented by non-Euclidean graph structures, they are well suited to traffic prediction problems; examples include the Diffusion Convolutional Recurrent Neural Network (DCRNN) [34], the Graph Attention Network (GAT) [35], and Graph WaveNet [36].
The Binary Graph Convolutional Network (Bi-GCN) [37] binarizes both the network parameters and the input node features, and Bi-Directional Graph Convolutional Networks (Bi-GCN) [38] operate on both top-down and bottom-up propagation structures. Graph convolution has also been introduced into a segmentation task together with an improved Laplacian [39]. In [40], historical days are selected through contextual mining and added for daily traffic flow forecasting. A deep-learning-based method for daily traffic flow forecasting that incorporates contextual factors and traffic flow patterns has been introduced [41], and a deep neural network based on historical traffic flow data and contextual factor data has been proposed [42]. Because GNN-based methods can use various graph formulations, they have been extended to other transportation modes. Against this background, this paper proposes a new GNN-based deep learning model [43] that captures complex spatio-temporal characteristics from traffic flow data to further improve prediction accuracy.

Related Definition.
Traffic information is spatio-temporal data that has both spatial dependence and temporal correlation.
Therefore, traffic conditions are affected not only by historical traffic conditions but also by upstream and downstream relationships in the road network. The purpose of traffic flow prediction is to predict future traffic conditions from historical information. Traffic conditions are usually described by variables such as traffic flow, vehicle speed, and road occupancy; in this study, they are measured by vehicle speed. To account for the spatial and temporal characteristics of vehicle speed, the speed data are transformed into a spatio-temporal matrix containing the time series of historical traffic conditions and the spatial characteristics of road connections, which is then used to predict vehicle speed over a future period.
Definition 1. The road network is described as an unweighted graph $G = (V, E)$, which represents the spatial dependence between traffic roads. Here $V = \{v_1, v_2, \ldots, v_N\}$ is the set of traffic roads, $v_i$ represents one road (link) in the network, and N is the number of roads. E is the set of all edges in the road graph, reflecting the connections between roads. The adjacency matrix $A \in \mathbb{R}^{N \times N}$ shown in equation (1) stores the connection information of the roads in graph G:
$$
A_{ij} = \begin{cases} 1, & e_{ij} \in E \\ 0, & \text{otherwise} \end{cases} \tag{1}
$$
The matrix A therefore contains only the elements 0 and 1, where $e_{ij}$ represents the edge from $v_i$ to $v_j$ in the graph. In this way, the graph structure is transformed into an unweighted adjacency matrix A, as shown in Figure 1.
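A minimal sketch of how equation (1) turns a list of connected road pairs into the unweighted adjacency matrix A. It assumes roads are indexed 0 to N − 1 and treats every connection as undirected (so A is symmetric), matching the data description in which 1 simply means that two roads are connected; the function name build_adjacency and the toy edge list are hypothetical.

```python
import numpy as np


def build_adjacency(num_roads, edges):
    """Unweighted adjacency matrix of equation (1).

    edges: iterable of (i, j) pairs meaning road i and road j are connected.
    Connections are assumed undirected, so both A[i, j] and A[j, i] are set.
    """
    A = np.zeros((num_roads, num_roads), dtype=np.float32)
    for i, j in edges:
        A[i, j] = 1.0
        A[j, i] = 1.0
    return A


# Hypothetical 4-road chain: 0-1, 1-2, 2-3
A = build_adjacency(4, [(0, 1), (1, 2), (2, 3)])
print(A.astype(int))
```

If the road graph were directed (for example, one-way streets), only A[i, j] would be set and A would no longer be symmetric.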
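Definition 2. The traffic information on the road network provides the temporal attribute features of the roads and is expressed as the feature matrix $X \in \mathbb{R}^{N \times P}$, where N is the number of roads and P is the number of temporal attribute features of each road. Each feature is the average vehicle speed on a road section during a t-minute interval:
$$
x_t^i = \frac{1}{m} \sum_{j=1}^{m} v_t^j
$$
where m is the number of vehicles on the i-th road section in the t-minute interval, $v_t^j$ is the average speed of the j-th of these vehicles, and $x_t^i$ is the average speed of vehicles on the i-th road section in that interval.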
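The averaging in Definition 2 can be sketched as below, assuming the raw taxi trajectories have already been matched to roads and reduced to hypothetical (road_id, minute, speed) records, where speed is one vehicle's average speed on that road. Records are bucketed into t-minute intervals (15 minutes in the experiments) and averaged per road; the helper name road_speed_features is illustrative.

```python
from collections import defaultdict


def road_speed_features(records, interval_minutes=15):
    """Average vehicle speed per road per interval: x_t^i = (1/m) * sum_j v_t^j.

    records: iterable of (road_id, minute, speed) tuples (hypothetical format),
    where speed is one vehicle's average speed on that road.
    Returns {(road_id, interval_index): mean_speed}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for road_id, minute, speed in records:
        key = (road_id, minute // interval_minutes)  # bucket into t-minute slots
        sums[key] += speed
        counts[key] += 1  # m = number of vehicles in this slot
    return {key: sums[key] / counts[key] for key in sums}


records = [(0, 3, 30.0), (0, 10, 50.0), (0, 16, 20.0), (1, 5, 40.0)]
print(road_speed_features(records))
# {(0, 0): 40.0, (0, 1): 20.0, (1, 0): 40.0}
```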
Definition 3. Spatio-Temporal Graph $G_t$ of Traffic Information. The spatio-temporal traffic information contains both the spatial characteristics of the road network and the time-series characteristics of traffic conditions, and is expressed as $G_t = (V, E, X_t)$. In this way, $G_t$ represents traffic conditions, described by vehicle speed, that change dynamically over time. V is the set of traffic roads, E is the set of all edges in the road network, and $X_t$ is the vehicle speed feature matrix at time t.
Traffic flow prediction can therefore be regarded as learning a mapping function f that, given the temporal feature matrix X and the road network topology G, computes the traffic information over the next T time steps:
$$
[X_{t+1}, \ldots, X_{t+T}] = f\left(G; (X_{t-n+1}, \ldots, X_{t-1}, X_t)\right)
$$
where n is the length of the historical time series and T is the length of the time series to be predicted.
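The mapping f is trained on pairs of input and target windows cut from the speed series. The sketch below uses a hypothetical make_windows helper, stores time along the rows (the layout of the experimental feature matrix, i.e. the transpose of X in Definition 2), and pairs each history $(X_{t-n+1}, \ldots, X_t)$ with the next T steps.

```python
import numpy as np


def make_windows(series, n, T):
    """Cut a (num_steps, N) speed series into supervised samples.

    Each input holds n consecutive steps (X_{t-n+1}, ..., X_t) and each target
    the following T steps (X_{t+1}, ..., X_{t+T}).
    Returns arrays of shape (samples, n, N) and (samples, T, N).
    """
    inputs, targets = [], []
    for start in range(len(series) - n - T + 1):
        inputs.append(series[start:start + n])
        targets.append(series[start + n:start + n + T])
    return np.stack(inputs), np.stack(targets)


series = np.arange(20, dtype=float).reshape(10, 2)  # 10 steps, 2 roads (toy data)
xs, ys = make_windows(series, n=4, T=2)
print(xs.shape, ys.shape)  # (5, 4, 2) (5, 2, 2)
```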

3.2. Overview.
We propose Bi-GRCN, a traffic flow prediction model composed of GCN and Bi-GRU. First, the historical data, which carry spatial characteristics, are input into the GCN, which captures the topological structure of the road network to obtain spatial features. Second, the resulting time series with spatial features are input into the Bi-GRU, which obtains bidirectional temporal features through forward and backward information transmission between gate units. Finally, the spatio-temporal features are fused in a fully connected layer to produce the traffic flow prediction. The framework of Bi-GRCN is shown in Figure 2.

The Proposed Method
The key problem to be solved in traffic flow prediction is to obtain the complex spatial dependence and temporal correlation of traffic flow data.

Spatial Dependency Modeling.
Real-world traffic flow varies with the topology of the road network. The commonly used CNN can extract spatial features, but only from regular Euclidean data, so it cannot capture the spatial dependence of complex road networks. GCN can process non-Euclidean data and has been applied successfully to image classification, document analysis, and other fields. Because traffic flow depends spatially on road topology, this paper uses GCN to process traffic flow data and better capture its spatial characteristics. The structure of GCN is shown in Figure 3. GCN constructs a filter in the Fourier domain and applies it to the nodes of the graph to capture the spatial relationships between nodes, and the GCN model is built by stacking multiple convolution layers. Each layer is computed as
$$
H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)
$$
where $\tilde{A} = A + I_N$ is the adjacency matrix with added self-connections, $I_N$ is the identity matrix, $\tilde{D}$ is the degree matrix of $\tilde{A}$ (with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$), $H^{(l)}$ is the output of layer l, $H^{(l+1)}$ is the output of layer l + 1, $W^{(l)}$ is the weight matrix, and σ is the sigmoid activation function.

In this model, a two-layer GCN is used to obtain the spatial characteristics of the traffic flow data:
$$
f(X, A) = \sigma\left(\hat{A}\, \mathrm{ReLU}\left(\hat{A} X W_0\right) W_1\right)
$$
where $\hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is the symmetrically normalized adjacency matrix obtained from A with self-connections. $W_0 \in \mathbb{R}^{P \times H}$ is the weight matrix from the input layer to the hidden layer and $W_1 \in \mathbb{R}^{H \times T}$ is the weight matrix from the hidden layer to the output layer, where P is the length of the feature matrix X (the number of temporal features per road), H is the number of hidden units, and T is the prediction length. ReLU is a commonly used activation function in neural networks.
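A NumPy sketch of the preprocessing step $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$ and the two-layer forward pass $f(X, A)$. The weights are random placeholders rather than trained parameters, and the sizes (3 roads, P = 12, H = 8, T = 4) are illustrative assumptions.

```python
import numpy as np


def normalize_adjacency(A):
    """A_hat = D~^{-1/2} (A + I_N) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])        # add self-connections
    degree = A_tilde.sum(axis=1)            # D~_ii = sum_j A~_ij
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Scaling rows and columns is equivalent to D~^{-1/2} A~ D~^{-1/2}
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def two_layer_gcn(X, A_hat, W0, W1):
    """f(X, A) = sigmoid(A_hat @ ReLU(A_hat @ X @ W0) @ W1).

    X: (N, P), W0: (P, H), W1: (H, T); returns (N, T).
    """
    hidden = relu(A_hat @ X @ W0)
    return sigmoid(A_hat @ hidden @ W1)


rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3 roads in a chain
A_hat = normalize_adjacency(A)
X = rng.normal(size=(3, 12))         # N = 3 roads, P = 12 historical steps
W0 = rng.normal(size=(12, 8)) * 0.1  # H = 8 hidden units
W1 = rng.normal(size=(8, 4)) * 0.1   # T = 4 prediction steps
out = two_layer_gcn(X, A_hat, W0, W1)
print(out.shape)  # (3, 4)
```

Because $\hat{A}$ depends only on the road graph, it can be computed once before training and reused at every time step.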
Each traffic road is abstracted as a node, and GCN learns the spatial characteristics of the traffic flow data from the adjacency matrix built from the road IDs and the connections between roads. The spatial dependence characteristics of traffic roads are shown in Figure 4.

Temporal Correlation Modeling.
Real-world traffic flow fluctuates over time. The most commonly used neural network for processing time series is the RNN, but RNNs suffer from exploding and vanishing gradients and have difficulty retaining information over long sequences. LSTM, a variant of RNN, effectively alleviates these problems. LSTM consists of an input gate, a forget gate, and an output gate: the input and forget gates control which input information is retained and which is forgotten, and the output gate exports the current state. However, LSTM has a complex structure and a long training time. GRU replaces the input gate and forget gate of LSTM with a single update gate, which reduces model complexity and training time and improves training efficiency.
As shown in Figure 5, $x_t$ is the traffic flow information at time t. $z_t$ is the update gate, which controls how much of the state from the previous time step is carried into the current state. $r_t$ is the reset gate, which controls how much of the previous state is ignored. $\tilde{h}_t$ is the candidate memory that stores the new information at time t. $h_{t-1}$ is the hidden state at time t − 1, and $h_t$ is the output state at time t. The GRU computes the state at time t from the hidden state at time t − 1 and the current traffic flow data. It therefore captures the traffic flow information at the current time while retaining information from historical times, which allows it to learn temporal correlation.
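The gate interactions described above can be written as one GRU step. This sketch follows the text's convention that $z_t$ retains the previous state, i.e. $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$ (some references swap the roles of $z_t$ and $1 - z_t$); the concatenated-input weight layout and the parameter names are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_cell(x_t, h_prev, p):
    """One GRU step. x_t: (D,), h_prev: (H,); W_*: (H, D + H), b_*: (H,)."""
    xh = np.concatenate([x_t, h_prev])
    z = sigmoid(p["W_z"] @ xh + p["b_z"])  # update gate z_t
    r = sigmoid(p["W_r"] @ xh + p["b_r"])  # reset gate r_t
    # Candidate memory: the reset gate scales how much of h_{t-1} is used
    cand = np.tanh(p["W_h"] @ np.concatenate([x_t, r * h_prev]) + p["b_h"])
    # z_t keeps the previous state; (1 - z_t) admits the candidate
    return z * h_prev + (1.0 - z) * cand


def init_gru_params(input_dim, hidden_dim, rng):
    """Random weights for a GRU acting on [x_t, h_{t-1}] (layout is an assumption)."""
    shape = (hidden_dim, input_dim + hidden_dim)
    params = {}
    for gate in ("z", "r", "h"):
        params["W_" + gate] = rng.normal(scale=0.1, size=shape)
        params["b_" + gate] = np.zeros(hidden_dim)
    return params


rng = np.random.default_rng(0)
params = init_gru_params(input_dim=3, hidden_dim=5, rng=rng)
h = np.zeros(5)
for x_t in rng.normal(size=(4, 3)):  # 4 time steps, 3 features each (toy data)
    h = gru_cell(x_t, h, params)
print(h)
```

When the update gate saturates near 1, the cell simply copies $h_{t-1}$ forward, which is how the GRU keeps historical information.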
Because traffic flow data have bidirectional temporal correlation between historical and future data, Bi-GRU is used to learn in both directions at the same time and fully extract the temporal correlation. The structure of Bi-GRU is shown in Figure 6.
$h_{t-1}$, $h_t$, and $h_{t+1}$ are the outputs at times t − 1, t, and t + 1. $z_t^f$ and $z_t^b$ are the update gates of the forward and backward GRUs at time t, and $r_t^f$ and $r_t^b$ are the corresponding reset gates. $h_t^f$ is the memory storing the forward information at time t, and $h_t^b$ is the memory storing the backward information at time t.
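A Bi-GRU can be sketched as two independent GRUs, one reading the input window from oldest to newest and one from newest to oldest, whose states at each step are concatenated as $[h_t^f; h_t^b]$. The concatenation, the zero initial states, and the GRU convention $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$ are assumptions about details the text does not fix; all names are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_cell(x_t, h_prev, p):
    """One GRU step with h_t = z_t * h_{t-1} + (1 - z_t) * candidate."""
    xh = np.concatenate([x_t, h_prev])
    z = sigmoid(p["W_z"] @ xh + p["b_z"])
    r = sigmoid(p["W_r"] @ xh + p["b_r"])
    cand = np.tanh(p["W_h"] @ np.concatenate([x_t, r * h_prev]) + p["b_h"])
    return z * h_prev + (1.0 - z) * cand


def init_gru_params(input_dim, hidden_dim, rng):
    shape = (hidden_dim, input_dim + hidden_dim)
    params = {}
    for gate in ("z", "r", "h"):
        params["W_" + gate] = rng.normal(scale=0.1, size=shape)
        params["b_" + gate] = np.zeros(hidden_dim)
    return params


def bi_gru(xs, p_fwd, p_bwd, hidden_dim):
    """Run a forward and a backward GRU over xs of shape (steps, D).

    Returns (steps, 2 * hidden_dim) where row t is [h_t^f, h_t^b].
    """
    steps = xs.shape[0]
    forward, backward = [], [None] * steps
    h_f = np.zeros(hidden_dim)
    for t in range(steps):              # oldest -> newest
        h_f = gru_cell(xs[t], h_f, p_fwd)
        forward.append(h_f)
    h_b = np.zeros(hidden_dim)
    for t in reversed(range(steps)):    # newest -> oldest
        h_b = gru_cell(xs[t], h_b, p_bwd)
        backward[t] = h_b
    return np.stack([np.concatenate([f, b]) for f, b in zip(forward, backward)])


rng = np.random.default_rng(1)
xs = rng.normal(size=(6, 3))  # 6 time steps, 3 features (e.g. GCN outputs), toy data
p_fwd = init_gru_params(3, 5, rng)
p_bwd = init_gru_params(3, 5, rng)
states = bi_gru(xs, p_fwd, p_bwd, hidden_dim=5)
print(states.shape)  # (6, 10)
```

The two directions have separate weights, so the backward GRU can learn different patterns from the later steps of the window than the forward GRU learns from the earlier ones.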
Bi-GRCN obtains the topology of the road network through GCN and the dynamic change of traffic flow over time through Bi-GRU. It then combines the complex spatial and temporal features in a fully connected layer to produce the final traffic flow prediction.

Loss Function.
$Y_t$ is the actual traffic speed and $\hat{Y}_t$ is the traffic speed predicted by Bi-GRCN. The goal of training is to minimize the error between the actual and predicted traffic speed. The loss function of Bi-GRCN is
$$
\text{loss} = \lVert Y_t - \hat{Y}_t \rVert + \lambda L_{reg}
$$
where λ is a hyperparameter and $L_{reg}$ is an L2 regularization term introduced to avoid overfitting.
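A sketch of this loss, assuming the norm is the Frobenius norm of the residual and $L_{reg}$ is the sum of squared weights. The value of λ used in the paper is not stated here, so the example passes a hypothetical one.

```python
import numpy as np


def bigrcn_loss(y_true, y_pred, weights, lam):
    """loss = ||Y_t - Y_hat_t|| + lambda * L_reg.

    ||.|| is taken as the Frobenius norm of the residual and L_reg as the sum
    of squared entries of all weight matrices (both assumptions).
    """
    error = np.linalg.norm(y_true - y_pred)
    l_reg = sum(float(np.sum(w ** 2)) for w in weights)
    return error + lam * l_reg


y_true = np.array([[3.0, 0.0]])
y_pred = np.zeros((1, 2))
weights = [np.array([1.0, 2.0])]
print(bigrcn_loss(y_true, y_pred, weights, lam=0.1))  # 3.5
```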

Experimental Data.
The experimental dataset consists of taxi trajectories in Shenzhen from January 1 to January 31, 2015. The study area comprises 96 main roads in Luohu District. The experimental data consist of an adjacency matrix representing spatial dependence and a feature matrix representing temporal correlation. The adjacency matrix has 96 rows and 96 columns describing the spatial dependence among the 96 roads; the row and column indices correspond to the road numbers. Each value in the adjacency matrix indicates the connection between two roads: 0 means that the two roads are not connected, and 1 means that they are connected. The feature matrix describes how vehicle speed on each road changes over time: each column represents a road, and each row contains the speeds of all roads in one time period. Vehicle speed is calculated every 15 minutes, which gives 96 periods per day and 31 × 96 = 2976 rows in total. 70% of the data are used as the training set and 30% as the test set to predict vehicle speed 15, 30, 45, and 60 minutes ahead.
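The sizes quoted above follow from simple arithmetic on the 15-minute sampling interval. The sketch also shows a 70/30 split, assumed here to be a cut in time order without shuffling (which keeps future speeds out of the training set), and converts the four horizons into numbers of steps.

```python
INTERVAL_MINUTES = 15
DAYS = 31  # January 1 to January 31, 2015

steps_per_day = 24 * 60 // INTERVAL_MINUTES  # 96 periods per day
total_steps = DAYS * steps_per_day           # 2976 rows in the feature matrix


def chronological_split(rows, train_ratio=0.7):
    """Split in time order (assumed: no shuffling) so test rows follow training rows."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


train, test = chronological_split(list(range(total_steps)))
# Prediction horizons expressed as numbers of 15-minute steps ahead
horizons = {minutes: minutes // INTERVAL_MINUTES for minutes in (15, 30, 45, 60)}
print(steps_per_day, total_steps, len(train), len(test), horizons)
# 96 2976 2083 893 {15: 1, 30: 2, 45: 3, 60: 4}
```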

Baseline Methods.
To evaluate the performance of the proposed method, Bi-GRCN is compared with the following baseline methods:
HA [18]: the average of historical traffic flow data is used as the predicted traffic flow.
ARIMA [12,13]: traffic flow data are treated as a random time series. Non-stationary data are transformed into a stationary series through repeated differencing, and the prediction is then obtained with an Autoregressive Moving Average (ARMA) model [44].
SVR [45]: Support Vector Regression (SVR) applies regression analysis based on the principle of SVM [19,20] to traffic flow prediction. Traffic parameters such as vehicle speed are input into the trained SVR, which outputs the traffic flow prediction for the corresponding period. The choice of kernel function is key to SVR; this paper uses a linear kernel.
GCN [31,32]: GCN is a GNN [43] that uses the convolution operation. Traffic flow with spatial relationships is input into the trained GCN, which outputs the traffic flow prediction for the corresponding period.
GRU [25]: GRU uses gate units to select and forget information at the same time, and it trains efficiently. Traffic flow with temporal attributes is input into the trained GRU, which outputs the traffic flow prediction for the corresponding period.
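Two of the simplest baselines can be sketched in a few lines: HA, assumed here to predict each road's mean over its input window, and the differencing that ARIMA applies before fitting the ARMA part. Fitting the ARMA model itself is omitted; in practice a statistics library would handle it.

```python
import numpy as np


def historical_average(windows):
    """HA baseline: predict each road's mean over its input window.

    windows: (samples, n, N) -> predictions (samples, N), reused for every horizon.
    """
    return windows.mean(axis=1)


def difference(series, d=1):
    """d-th order differencing, the 'I' step ARIMA applies before fitting ARMA."""
    return np.diff(series, n=d)


windows = np.array([[[1.0, 10.0], [3.0, 20.0]]])  # 1 sample, n = 2, N = 2 (toy data)
print(historical_average(windows))
quadratic = np.array([1.0, 4.0, 9.0, 16.0])
print(difference(quadratic, d=1))  # first differences still trend upward
print(difference(quadratic, d=2))  # second differences are constant
```

The toy quadratic series shows why "multiple" differences may be needed: one difference still leaves a trend, while the second yields a stationary (constant) series.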

Evaluation Methods.
Four metrics are used to evaluate the performance of Bi-GRCN, Root Mean Square Error (RMSE), Mean Absolute Error (MAE), accuracy, and explained variance (var), as shown in equations (15) to (18).

Journal of Advanced Transportation
Mean Absolute Error (MAE):
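The four metrics can be computed as in the sketch below, which assumes the definitions commonly used in GCN-based traffic speed prediction (equations (15) to (18) may differ in detail): accuracy $= 1 - \lVert Y - \hat{Y} \rVert_F / \lVert Y \rVert_F$ and var $= 1 - \mathrm{Var}(Y - \hat{Y}) / \mathrm{Var}(Y)$, alongside the usual RMSE and MAE.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """RMSE, MAE, accuracy and var under the assumed definitions."""
    residual = y_true - y_pred
    return {
        "RMSE": float(np.sqrt(np.mean(residual ** 2))),
        "MAE": float(np.mean(np.abs(residual))),
        # 1 - relative Frobenius-norm error
        "Accuracy": float(1.0 - np.linalg.norm(residual) / np.linalg.norm(y_true)),
        # explained variance: 1 - Var(residual) / Var(target)
        "var": float(1.0 - np.var(residual) / np.var(y_true)),
    }


y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
print(evaluate(y_true, y_true + 1.0))
```

The example with a constant offset of 1 gives RMSE = MAE = 1 but var = 1, because explained variance ignores a constant bias; this is one reason to report it alongside the error metrics.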

Hyperparameters.
The hyperparameter settings determine the prediction performance of Bi-GRCN. In the experiments, the main hyperparameters of Bi-GRCN are the batch size, the number of training epochs, the learning rate, and the number of hidden units. After comparing batch sizes of 32 and 64, we set the batch size to 32. After comparing 3000 and 5000 training epochs, we set the number of training epochs to 3000. The learning rate is set manually to 0.001. The number of hidden units is one of the most important parameters of a deep learning model, and different values strongly affect the prediction results. To choose the best value, we experiment with hidden unit counts of {16, 32, 64, 80, 96, 100, 128} and analyze how prediction precision changes.
In Figure 7, the horizontal axis represents the number of hidden units and the vertical axis represents RMSE and MAE; the prediction error is smallest when the number of hidden units is 128. In Figure 8, the horizontal axis represents the number of hidden units and the vertical axis represents accuracy and var; prediction precision is highest when the number of hidden units is 128. Since all four evaluation metrics in Figures 7 and 8 give the best result at 128 hidden units, we set the number of hidden units to 128 in the experiments.
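The hidden-unit search is a one-dimensional grid search. In this sketch, evaluate_fn is a hypothetical callback that would train Bi-GRCN with a given number of hidden units and return an error such as RMSE on held-out data; the example plugs in a toy objective, not the paper's results.

```python
def select_hidden_units(candidates, evaluate_fn):
    """Grid search: return the candidate with the lowest error and all scores.

    evaluate_fn(hidden_units) -> error (e.g. RMSE) is a hypothetical callback that
    would train the model with that many hidden units and score it on held-out data.
    """
    scores = {h: evaluate_fn(h) for h in candidates}
    best = min(scores, key=scores.get)
    return best, scores


CANDIDATES = [16, 32, 64, 80, 96, 100, 128]

# Toy stand-in objective (not experimental results): error shrinks with capacity.
best, scores = select_hidden_units(CANDIDATES, lambda h: 10.0 / h)
print(best)  # 128
```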

Comparison of Experiments Using Different Models.
We set the batch size to 32, the number of training epochs to 3000, the learning rate to 0.001, and the number of hidden units to 128 in the Bi-GRCN model. 70% of the overall dataset is used as the training set, and the remaining 30% is used as the test set.
The Bi-GRCN model is trained using the Adam optimizer.
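For reference, a minimal NumPy version of the Adam update rule, with the learning rate of 0.001 used here as the default. The other constants (β1 = 0.9, β2 = 0.999, ε = 1e−8) are the common defaults from the original Adam paper and are assumed rather than reported; in practice a deep learning framework's built-in Adam optimizer would be used.

```python
import numpy as np


class Adam:
    """Minimal Adam optimizer updating a list of NumPy arrays in place."""

    def __init__(self, params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = params
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [np.zeros_like(p) for p in params]  # first-moment estimates
        self.v = [np.zeros_like(p) for p in params]  # second-moment estimates
        self.t = 0

    def step(self, grads):
        self.t += 1
        for p, g, m, v in zip(self.params, grads, self.m, self.v):
            m[:] = self.beta1 * m + (1 - self.beta1) * g
            v[:] = self.beta2 * v + (1 - self.beta2) * g * g
            m_hat = m / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = v / (1 - self.beta2 ** self.t)
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


# Toy check: minimize sum((w - 3)^2); a larger lr than 0.001 keeps the demo short.
w = np.zeros(2)
opt = Adam([w], lr=0.1)
for _ in range(500):
    opt.step([2.0 * (w - 3.0)])
print(w)
```

Because each parameter's step is scaled by its own running gradient magnitude, Adam is relatively insensitive to the very different scales of the GCN, Bi-GRU, and fully connected weights.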
The prediction performance of the model is evaluated on the dataset at four horizons: 15, 30, 45, and 60 minutes. The prediction results of Bi-GRCN and the baseline methods are shown in Table 1.

Analysis of Experimental Results.
Spatio-temporal prediction capability. To verify whether Bi-GRCN captures spatial and temporal features from the dataset, we compare it with GCN and GRU. Compared with GRU, which considers only temporal features, the accuracy of Bi-GRCN for 15-, 30-, 45-, and 60-minute forecasting increases by approximately 2.59%, 1.16%, 0.25%, and 0.85%, indicating that Bi-GRCN additionally captures spatial dependence. Compared with GCN, which considers only spatial features, the accuracy of Bi-GRCN for 15-, 30-, 45-, and 60-minute forecasting increases by approximately 11.77%, 14.11%, 14.72%, and 16.02%, indicating that Bi-GRCN captures temporal correlation well. The accuracy comparison between GRU and Bi-GRCN is shown in Figure 9.
The accuracy comparison between GCN and Bi-GRCN is shown in Figure 10.
Model prediction ability. According to Table 1, Bi-GRCN outperforms the other baseline models. For 15-minute prediction, the RMSE of Bi-GRCN is lower than that of GRU, GCN, HA, ARIMA, and SVR by approximately 5.29%, 18.9%, 3.63%, 35%, and 7.57%, respectively, indicating that Bi-GRCN captures spatial dependence and temporal correlation well. ARIMA performs worst mainly because it has difficulty handling long non-stationary series, and GCN performs poorly because it considers only spatial dependence and ignores the temporal correlation of traffic flow data. The RMSE of the various models is shown in Figure 11.
Long-term prediction ability. In Figure 12, the horizontal axis represents the prediction horizon and the vertical axis represents the four evaluation metrics. RMSE and MAE measure the prediction error of Bi-GRCN, while accuracy and var measure its prediction accuracy. The results show that the prediction error and accuracy of Bi-GRCN change little as the horizon increases, indicating that the model is stable. Bi-GRCN obtains the best prediction results at every horizon tested. Therefore, Bi-GRCN can be used not only for short-term traffic flow prediction but also for medium-term and long-term traffic flow prediction.
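The percentages above can be read either as relative changes or as differences in percentage points, and the text does not say which. The helpers below, with hypothetical numbers not taken from Table 1, show both readings so that the two are not confused.

```python
def relative_change(baseline, model):
    """Relative change of `model` vs `baseline`, in percent of the baseline value."""
    return (model - baseline) / baseline * 100.0


def point_difference(baseline, model):
    """Absolute difference of two ratios in [0, 1], in percentage points."""
    return (model - baseline) * 100.0


# Hypothetical numbers, not taken from Table 1
print(relative_change(5.0, 4.0))     # RMSE 5.0 -> 4.0: a 20% relative reduction
print(relative_change(0.80, 0.84))   # accuracy: about +5 percent relative
print(point_difference(0.80, 0.84))  # accuracy: about +4 percentage points
```

For error metrics such as RMSE, a negative relative change means the model reduced the error.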

Conclusion
We propose Bi-GRCN, a new GNN-based traffic flow prediction model that combines GCN and Bi-GRU. The traffic network is modeled as a graph in which roads are represented by nodes, connections between roads by edges, and the traffic flow information on each road by node attributes. Using real traffic data, we compare Bi-GRCN with other neural network models and traditional traffic prediction methods. The experimental results show that Bi-GRCN achieves higher accuracy and better prediction performance than GCN and GRU, and is also more effective than the traditional methods HA, ARIMA, and SVR. Because weather, weekdays, holidays, traffic accidents, and other factors also affect the prediction results, we will consider these factors in future research.
Data Availability
The terms of use of the data used in this study do not allow the authors to distribute or publish the data directly. However, the data can be obtained directly from the following webpage: https://opendata.sz.gov.cn/.

Conflicts of Interest
The authors declare that they have no conflicts of interest.