Prediction of Large Scale Spatio-temporal Traffic Flow Data with New Graph Convolution Model

Prompt and accurate prediction of traffic flow is quite useful. It will help traffic administrator to analyze the road occupancy status and formulate dynamic and flexible traffic control in advance to improve the road capacity. It can also provide more precise navigation guidance for the road users in future. However, it is hard to predict spatiotemporal traffic flow data in large scale promptly with high accuracy caused by complex interrelation and nonlinear dynamic nature. With development of deep learning and other technologies, many prediction networks could predict traffic flow with accumulated historical data in time series. In consideration of the regional characteristics of traffic flow, the emerging Graph Convolutional Network (GCN) model is systematically introduced with representative applications. Those successful applications provide a possible way to contribute fast and proper traffic control strategies that could relieve traffic pressure, reduce potential conflict, fasten emergency response, etc.


Background and current status
Traffic problems such as frequent traffic congestion, serious traffic accidents, and long commuting times have seriously reduced the travel experience of passengers and the efficiency of traffic operations [1]. To cope with these problems, researchers work on improving the traffic control strategies based on prediction of future traffic stratus [2]. Traffic flow is one of important road conditions to access [3]. Based on prompt and accuracy perdition, better and fast-adjusted traffic control and guidance could be applied. Therefore, reliable traffic flow prediction is also one of the key factors to upgrade the traffic system from "passive adjust" to "active control in advance"; even prediction of future short-term traffic status of road sections is quite useful to prevent congestion deteriorate. For traffic management departments, early detection of traffic instability and abnormal potential risks based on reliable prediction data can improve a large number of existing traffic management control applications, such as traffic calming, signal control, etc.; for road users, real-time route updates and adjustments based on dynamic traffic prediction results can adjust travel time and routes before congestion develops, thus providing vehicles to plan a driving path to avoid congested road sections and congested intersections or to plan a path with the shortest driving time for vehicles to improve traffic efficiency.
The current traffic prediction also faces the following challenges, as shown in Figure 1: (1) analyzing the spatial correlation of the road network: some roads are adjacent to each other and have different degrees of influence on upstream and downstream traffic volumes, so the traffic flow in this part is spatially correlated, and it is a challenge to consider the spatial location relationship to correlate the neighboring traffic flow characteristics [4]; (2) unlike the regular network layout, the traffic map structure is irregular; (3) the nonlinear retention of medium and long time prediction models: the traffic flow changes drastically at the peak time, which is difficult to predict, especially as the prediction time increases, the nonlinear retention ability of the model decreases and the time series signal gradually decays, so how to better correlate the time series relationship of traffic flow to maintain the steady-state time series prediction is also a long-term challenging task [5].

Related works
Traffic flow forecasting is based on historical traffic flow data to predict future traffic flow, which is a typical regression problem of traffic network time series [6]. In order to solve the traffic flow forecasting problem, factors such as traffic patterns, data types, spatial locations, and time periods need to be considered. Nowadays, many computational forecasting methods have been widely used in traffic flow prediction and have achieved good research results. As shown in Figure 2, common traffic flow prediction methods can be divided into three major categories [7]. ① early traffic flow prediction methods; ② machine-learning-based traffic flow prediction methods; ③ deep-learning-based traffic flow prediction methods.

Early traffic flow prediction methods
Early traffic flow prediction methods mainly model the relationship between traffic flow, speed, and density and regress the traffic flow data as well as optimize the parameters to achieve the fitting prediction of traffic data, mainly including statistical models and traffic simulation.

Typical methods
• Miska et al. [8] proposed cellular automata (CA) to simulate each participant of different flows and their interaction phenomena. • Ngoduy et al. [9] proposed the use of static and dynamic assignment methods to allocate traffic on a simulated road network.
• Stephanedes et al. [10] have applied the historical mean model (HA) in urban traffic control systems in 1984.
• Kumer et al. [11] used the Autoregressive Integrated Moving Average (ARIMA) model to represent the predicted traffic flow in the form of a mathematical model.

Advantages and disadvantages
Models such as statistical mathematical models and traffic simulations can describe this traffic flow prediction as a time series problem approximately. However, simulation systems and simulation tools still need to consume a lot of computational power and skilled parameter settings to reach a steady state, and it is more difficult to get accurate prediction results from this prediction model due to the complexity of traffic scenarios. Besides, these methods based on statistics are only applicable to linear data, while traffic flow data are nonlinear and complex; thus, such methods are not capable of handling complex nonlinear traffic data.

Traffic flow prediction methods based on machine learning
With the demand for high accuracy in intelligent traffic scenarios, the shortcomings of traditional prediction methods that cannot model the complex state of traffic flow become more and more prominent, and machine learning methods gradually take an important place in traffic flow prediction tasks.

Typical methods
• YS et al. [12] proposed a traffic flow prediction method using support vector machine regression. The idea of using support vector machine method is to map the low-dimensional nonlinear traffic data to a high-dimensional space by introducing a kernel function before linear classification.
• Zhu et al. [13] predict the path traffic volume and roadway flow by building a Bayesian network model. • Qi et al. [14] proposed a Hidden Markov Model (HMM) for short-term highway traffic prediction.

Advantages and disadvantages
Machine learning methods can better model the stochastic processes and nonlinear properties of traffic flows and have mostly better performance compared with traditional models. However, such methods do not consider the spatial and temporal correlation of traffic flow data and require extensive feature engineering. Therefore, it is difficult to solve complex traffic flow prediction problems.

Traffic flow prediction methods based on deep learning
Deep learning has been very successful in the fields of computer vision, speech recognition, and natural language processing, and more and more scholars are applying deep neural networks (DNNs) to various real-world scenario tasks. In traffic flow prediction, the models can be classified into road section prediction and area prediction according to their prediction range characteristics.

Typical methods
• Chen et al. [15] proposed a convolutional neural network (CNN)-based traffic flow prediction method using time series folding for multi-scale learning.
• Yu et al. [17] proposed a Long-Short Term Memory (LSTM)-based method for traffic flow prediction on road networks during peak periods.
• Cho K et al. [18] proposed a gated recurrent unit that can establish links for traffic data at adjacent moments and preserve the memory by gating and other means to learn long-term dependencies of traffic flow sequences.
• Yu et al. [19] proposed a spatiotemporal graph convolutional network (STGCN) for traffic flow velocity prediction on a multi-scale traffic network.
• The ST-ResNet [20] model restricts the input to grid data rather than graph structure in the traffic prediction problem, which makes it difficult to make predictions on complex highway data.
• Geom-GCN [21] proposed a network to update the node representation, but it could not capture the distance dependence between nodes.
• The DCRNN model (2018ICLR) [22] models spatial correlation as a diffusion process on directed graphs to model the translation of traffic flows and proposes diffusion convolutional recurrent neural networks capable of capturing spatial and temporal dependencies between time series using the seq2seq framework.
• The GMAN model (2020AAAI) [23] uses an attention mechanism to model dynamic spatial and nonlinear temporal relationships, respectively.
ASTGCN [24] considers only low-order neighborhood relationships between nodes and ignores correlations between different historical time periods.

Advantages and disadvantages
The core of the traffic prediction problem lies in how to effectively capture the spatiotemporal dimensional features and correlations of the data. Traditional convolutional neural networks can effectively extract local features of data, but can only work on standard grid data. The graph convolution can directly extract features from graph structured data and automatically mine the spatial patterns of traffic data. The convolution operation along the time axis can extract the temporal patterns of traffic data. Therefore, this paper focuses on the deep learning model based on graph convolutional network to capture the spatial and temporal characteristics of traffic data and effectively solve the traffic flow prediction problem.

Future directions for exploring traffic flow prediction models
GCN has become a mainstream method in the field of traffic flow prediction, but it started late, and its theoretical foundation and research depth are far from enough. At present, it still faces many problems that need to be solved. There are three main directions as follows.
1. Dynamic graph modeling: Most graph structures processed by GCN networks are static graphs, and there are fewer models involving dynamic graph structures. The graph structure of static graphs is static and unchanging, while the vertices and edges of dynamic graphs change randomly or even disappear, making it difficult to follow the rules.
2. Heterogeneous graph modeling: Homogeneous graph means that nodes and edges are only one type, and this kind of data is easier to handle. The heterogeneous graph refers to the type of nodes and edges, the same node and different node connections will show different properties, the same edge and different node connections will also show different relationships, heterogeneous graph structure is relatively complex to deal with. However, the heterogeneous graph is the most relevant scenario to the actual problem.
3. Deepening the model structure: One of the inherent advantages of GCN is that it smoothes the graph signal, but as its layers keep deepening, its training results are highly susceptible to over smoothing. Since graph convolution is a special form of Laplacian smoothing, the smoothing operation makes the signal more consistent at the feature level as the graph convolution aggregates the features of neighboring nodes, thereby causing the signal to lose its diversity and leading to a sharp performance degradation in the relevant prediction task, a phenomenon that is more pronounced on small data sets. Therefore, GCNs cannot be stacked continuously and deeply like general convolutional models, but shallow neural networks suffer from limited perceptual field and feature extraction capabilities.

Traffic prediction based on graph convolution
This section introduces the principles and techniques related to traffic flow prediction based on graph neural networks. First, an overview of graph neural networks is given, and the graph convolutional networks (GCNs) [25] used to capture the spatial dependence of traffic flows in the road network are introduced separately in this paper. Secondly, the transformation of graph structure into actual road traffic graph structure modeling method is introduced; finally, this paper models the GCN on urban road networks and uses the topology of the GCN capture graph to handle the spatiotemporal traffic prediction task, and the application scenarios of traffic flow prediction are added at the end of the paper.

Graph theory
Graph is a common data structure that is an important object of study in the field of computer and data science [26]. A graph usually consists of two elements, Vertex and Edge, where the vertices correspond to an abstract representation of the object of study and the edges represent the interconnection between two of the objects. Graphs are often used to represent things and specific relationships between things; in fact, graphs can represent any system with binary relationships. Graphs have a very wide range of applications in real life; social networks of human life, citation systems, urban transportation networks, and biochemical molecules can be effectively represented by graph structures.

Basic concept
In graph theory, a graph is usually represented as a set of vertices and edges [27], , v n f g is the set of vertices and the elements in this non-empty set are the vertices of the graph. The set of edges can be denoted as E. A graph can be classified into directed and undirected graphs depending on whether the edges in the graph have directionality or not. If the edges of a graph have directionality, such edges are called directed edges as in Figure 3(a), and if the edges of a graph have no directionality, then the corresponding graph is an undirected graph as in Figure 3 (b). The graphs can be classified into weighted and unweighted graphs according to the presence or absence of specific weights of the edges in the graph [28]. Each edge in a weighted graph has a real weight as in Figure 3(c), which represents the degree of connection between two vertices or the "distance" between two vertices. For example, in a traffic network, the weight of an edge can characterizes the physical distance between two vertices. In contrast, the other category is the unweighted graph, which can also be understood as the unweighted graph in which all the edge weights are equal.

Algebraic representation of graphs
As a common data structure, graphs have many kinds of algebraic representations, and common storage representations include adjacency matrices [29], adjacency tables, and association matrices. Among them, adjacency matrices are widely used in graph representation learning because they can represent the constructional properties of graphs well and are easy to combine with matrix operations to understand the structural features of graphs.
If two vertices of an edge in a graph are v i and v j , then v i and v j are said to be their respective neighbors. We define the set of neighbors of v i as N v i ð Þ: The degree of v i is defined as: the number of edges with v i as the endpoint, denoted as deg v i ð Þ, and therefore, In a directed graph, the degree of a vertex can be divided into out-degree and in-degree. The number of directed edges starting at vertex v i is called the out-degree of v i , and the number of directed edges ending at vertex v i is called the in-degree of v i . The sum of the entry and exit degrees of a vertex is equal to the degree of the vertex, and the sum of the degrees of all nodes is equal to twice the number of all edges. e ij and e ji represent edges in different directions between two vertices, e.g., e ij represents an edge in the direction from i to j, while e ji is the opposite.
The degree matrix is a matrix of the degrees of the vertices, so that the elements at the main diagonal positions are the vertex degrees and the remaining elements are 0. Accordingly, the directed graph has an entry degree matrix and an exit degree matrix. The adjacency matrix is a matrix used to represent the relationship between vertices. For graph G ¼ V, E ð Þ, the adjacency matrix can be expressed as: The core idea of the adjacency table of a graph is to have a neighbor table for each vertex of the vertex set. The association matrix is used to represent the direct association of nodes and edges and is defined as: The Laplace matrix [30] is a special matrix that is often used in graph theory to study the structural properties of graphs. The Laplace matrix is defined as L ¼ D À A, where D is the degree matrix of the graph and A is the adjacency matrix of the graph. Figure 4 shows the Laplace matrix representation of a simple graph.

Graph convolutional networks
Previous classical convolutional networks based on deep learning mostly consider regular data in Euclidean space in processing data. When inputting ordered data with fixed dimensions (e.g., images, speech, video, etc.), the convolutional operation and the capture and compression of the pooling layer make the network fitting effect remarkable. However, when faced with sequentially disordered road network traffic data with variable dimensions, the suitability of the traditional convolution operation decreases. However, graph neural networks (GNNs) can handle the abovementioned irregular graphs by passing node features into the neural network during iteration and outputting the node states. The original GNNs converge the hidden state to a fixed point based on the "immobile point" theory, which is ineffective for extracting edge information, and in the specific scenario represented by the graph, some feature information is shared among nodes due to the fixed convergence, making the actual information obtained scarce. Therefore, two types of Graph Convolutional Network (GCN) based on frequency domain and null domain are generated. Two types of GCN models: null domain convolution is the same as the traditional convolution method, which can convolve directly at the pixel point of the picture; frequency domain convolution needs to start from the graph signal processing, treating the kernel in the convolution as a filter and the learned features as signals for weighted summation.
As shown in Figure 5, the common network framework for graph convolution is illustrated. First, the neighboring nodes of the input graph structure are updated with a layer of convolution operation, and then a layer of ReLU activation function is added to obtain the basic convolution layer plus activation function structure. The above structure is stacked sequentially until the number of stacked layers reaches the prediction of the model, and the output part transforms the node features into labels for the relevant tasks. Unlike GNN circular iterative parameter sharing, GCN is a multilayer stack and the parameters are different for each layer.
Further, it mainly includes graph convolution based on the spectral domain (frequency domain) and the null domain. The spectral domain approach is to construct CNN simulations into the spectral domain by considering the localization of graph convolution through spectral analysis, such as Spectral Graph Convolution (SGC), which mainly focuses on the continuous derivation and improvement of the core formulations of spectral graph theory to reduce the computational power of the model from the perspective of optimization parameters. Empty domain methods perform convolution filters directly on the nodes of the graph and their neighborhoods, such as Diffusion Graph Convolution (DGC).

Spectral Domain-based Graph Convolution Network (SGC)
Spectral domain approach [31]: The absence of graph translation invariance poses difficulties in defining convolutional neural networks in the nodal domain. The spectral domain approach uses the convolution theorem to define the graph convolution from the spectral domain. The spectral domain graph convolution network is proposed based on graph signal processing, where the convolution layer of the graph neural network is defined as a filter, i.e., the filter removes the noise signal to obtain the result of the input signal. In practical applications, it can only be used to process graph structures that are undirected and have no information on the edges. The Fourier transform of the signal f(x) and its inverse transform are: where φ varphi denotes the Fourier transform. It can be found that the Fourier transform that changes the time domain to the spectral domain is essentially the integral of the summation f x ð Þ with exp Àiwx ð Þas the basis vector. Defining the graph G of the input signal as a characteristic decomposable Laplace matrix L ¼ D À A, the normalized Laplace matrix L is defined as: where D denotes the degree matrix of graph G, A denotes the adjacency matrix, and I n is the unit matrix of order n. After performing the eigendecomposition, it can be expressed as the universal structure L ¼ UΛU T . where Λ is a matrix with each eigenvalue as a diagonal element, and U is a vector matrix composed of eigenvectors corresponding to each eigenvalue. Since U is an orthogonal matrix, the basis of the conventional Fourier transform exp Àiwx ð Þis then replaced by U T and expressed in matrix form to obtain the Fourier transform of the signal x on the graph as: where x refers to the original representation of the signal,x refers to the signal x after transforming it to the spectral domain, and U T denotes the transpose of the eigenvector matrix for doing the Fourier transform. The inverse Fourier transform of the signal x is: Using the Fourier transform and the inverse transform on the graph, the graph convolution operation can be implemented as follows.
where * G as denotes the graph convolution operator, x denotes the signal in the node domain on the graph, g is the graph convolution kernel, and ⨀ refers to the Hadamard product, which denotes the multiplication of the corresponding elements of two vectors. By replacing the vector U T y with the diagonal array g θ theta, the Hadamard product is transformed into a matrix multiplication. The graph convolution operation is denoted as Ug θ U T x.
To solve the excessive computation of Laplace eigenvalues and eigenvectors, Defferrar et al. [32] proposed ChebNet based on Chebyshev polynomials. The eigenvalue matrix is approximated by Chebyshev polynomials, and the Chebyshev polynomials are as follows.
where θ is the Chebyshev coefficient; T kΛ À Á is the k-th order Chebyshev polynomial ofΛ;Λ ¼ 2∧ λ max À I n ,Λ are the normalized eigenvalue diagonal matrices. Thus, the convolution operation can be expressed as follows: where 2L λ max À I n ; the computational complexity of the graph convolution calculation is reduced from O N 2 À Á to O LE ð Þ by replacing the Chebyshev expansion T kΛ À Á with the eigen-decomposition part of the frequency domain convolution g θ in the original GCN, effectively avoiding the computational part of the eigen-decomposition, where E is the number of edges in the input graph and L is the order of the Laplace operator polynomial. ChebNet results in a significant reduction in computational complexity and a significant improvement in computational efficiency.
After that, Kipf et al. used first-order Chebyshev polynomials and simplified the spectral graph convolution by restricting the parameters in order to make ChebNet have better local connectivity properties.
Then the graph convolution calculation is simplified as: Where: θ 0 and θ 1 are free parameters, shared by the whole graph. Let θ ¼ θ 0 ¼ Àθ 1 , i.e., the two parameters are transformed into a one-parameter model, then the graph convolution is calculated as: However, since I n þ D À 1 2 AD À 1 2 is the eigenvalue of 0, 2 ½ , which may lead to the problem of disappearing, exploding or unstable values of neural network gradients, D À 1 2ÃD À 1 2 is used instead of I n þ D À 1 2 AD À 1 2 for normalization.

Graph Convolutional Network (DGC) based on spatial domain
The spatial domain approach: spatial-based graph convolutional networks were first proposed in Neural Network for Graphs (NN4G), which is different from the spectral domain graph convolutional neural network from signal processing theory, the spatial domain graph convolutional neural network starts from the nodes in the graph, designs the aggregation function to gather the features of neighboring nodes, adopts the message propagation mechanism, and thinks about how to accurately and efficiently use the features of neighboring nodes of the central node to update the features of the central node. The essence of CNN is weighted summation, and the spatial domain graph convolutional neural network is based on the basic construction process of CNN to accomplish the purpose of GNN aggregation of neighboring nodes from the perspective of summation. Since the nodes in the graph are unordered and the number of neighboring nodes is uncertain, one idea of the spatial domain graph convolutional neural network is (1) to fix the number of neighboring nodes and (2) to sort the neighboring nodes. If the above two tasks are completed, the non-Euclidean structured data becomes ordinary Euclidean structured data, and naturally the traditional algorithm can be completely migrated to the graph. Among them, step (1) also facilitates the application of GNN to graphs with many nodes.
Currently, GCN has become a fundamental model for traffic flow prediction research and a benchmark method for experiments. Although neither the airdomain graph convolution network nor the frequency-domain graph convolution network is proposed for the traffic flow prediction problem, the natural graph structure property of traffic data makes GCN show high efficiency and accuracy in the field of traffic flow prediction than the traditional methods.

Modeling experiments for traffic prediction
This section will first give a specific definition of the traffic flow prediction problem and then give the flow of the traffic flow prediction model based on spatiotemporal characteristics.

Construction of traffic road network graph structure
Traffic prediction is a typical time series prediction problem [33], and its road network traffic flow data exhibits a high degree of periodicity, which provides a great deal of potential for traffic prediction. Figure 6 shows the traffic data for the first week of December for individual toll stations on the Shaanxi Provincial Freeway, demonstrating a high degree of periodicity.
Given the first M flow observations, the flow data measured at the n sensor stations at time step H can be viewed as a matrix of size M Â N. The most likely flow measurements predicted at the next H time steps are: where F t ∈ R n is a vector of observations for n road segments at time step t, where each element records the historical observations for a single road segment. For unordered road network traffic data, the observations F t are not independent and can be viewed as graph signals defined on an undirected graph G with weight as shown in Figure 7, the graph is expressed in terms of an adjacency matrix G t ¼ V t , E, W ð Þ . F t is a finite set of vertices corresponding to the observations of the n toll stations in the traffic.E is a set of edges representing the connections between stations, and W ∈ R nÂn represents the weighted adjacency matrix of G t .

Traffic datasets
Traffic flow prediction by deep learning requires a large amount of data support, that is, real-world road traffic speed data. With the continuous improvement of  traffic facilities, the amount of traffic data has also produced an explosive growth. Traffic flow prediction is precisely based on huge traffic data, so understanding the current common traffic data is the basis for achieving traffic flow prediction. The sources of traffic data mainly include road fixed-point detectors, vehicle GPS records, bus IC cards, license plate recognition, cell phone data, etc. We have made the common traffic data used for traffic flow prediction as Table 1.

Data preprocessing
Traffic flow data mainly detects parameters such as speed, flow rate, time, etc. The data collection process may result in detection equipment failure, instrument error, software failure, communication interference, environmental noise, etc., and even sudden road failure may have a great impact on the data, resulting in real-time data may be missing or abnormal, so the overall process of validity processing of this type of traffic data according to its type is shown in Figure 8.
• Abnormal data processing The preprocessing methods of abnormal data can be divided into two categories: Data rejection. Data rejection can be used when there is less erroneous data in the traffic data. The rejection of individual erroneous data will not affect the integrity and trend of the data, but if the proportion of erroneous data is large, the rejection method cannot be adopted because too much rejection of erroneous data will destroy the integrity of the data and its trend. Peak denoising. Since traffic data is highly nonlinear and the traffic data at peak hours can be very significant, i.e., the noise oscillation region during peak hours, peak denoising is needed. Commonly used methods such as empirical mode decomposition (EMD), i.e., fluctuation decomposition in the local oscillation part of the trend change.
• Missing data processing [34] Missing data is caused by hardware and software factors that do not detect data at the detection end or packet loss during data communication. In road traffic, this can be due to excessive vehicle density and inaccurate data collection by traffic flow detection instruments, data failures in transmission, and many other reasons for gaps in the collected data, such as missing data at a point in time, a certain period, or several periods of time. Typically, there are two classical missing patterns in time series data as shown in Figure 9 below. Figure 9(a) indicates that the exported toll records have randomly lost observations at a single toll station, and the white circles indicate the missing values. Figure 9(b) indicates that there are several consecutive time points in the records of multiple toll stations with no observed values, which is  a more common pattern of missing spatiotemporal traffic data. The green curve in the green panel represents the observed values and the gray curve represents the missing values. This situation requires correlating and processing the missing data, and then repairing the data using interpolation and smoothing algorithms, prior to dimensionlessizing the data using initialization operators to consider the fact that the units and orders of magnitude of the characteristic series of influencing factors are not uniform.
• Data normalization/normalization Generally, the obtained traffic data are scattered, and the distribution characteristic curve presented by the data is fuzzy, and the distribution cannot be determined. Therefore, the data do not satisfy the normal distribution and need to be normalized to regularize the data and improve the comparability between the data to facilitate the subsequent model prediction. The data are z-core normalized to approximately satisfy the normal distribution, so that the weights are more evenly distributed in the subsequent model training, i.e., where μ, σ are the mean and standard deviation.

Classical graph convolution framework
To solve the problem of non-Euclidean structure of traffic network data, graph neural networks are often used to model spatial dependencies in traffic networks, and then convolution is used to fundamentally improve the efficiency of graph analysis and network construction from frequency and spatial domains, i.e., Graph Convolutional Network (GCN). Graph Convolution extends traditional convolution to graph-structured data, and powerful methods such as graph convolutional networks and their variants are widely used for these spatiotemporal network data prediction tasks with good performance. Most existing graph convolutional traffic flow forecasts are spatiotemporal in nature, since most traffic data sets have both spatial and temporal attributes. The development of traffic flow prediction models based on graph convolutional networks is presented as in Figure 10. In this paper, five of the most typical and most referenced models will be selected for illustration.

STGCN predicts traffic flow
The STGCN model proposed by Yu et al. [19] (2018AAAI) for the first time uses graph structures to model traffic networks while using graph convolution to model spatiotemporal sequences and uses pure convolutional structures to extract spatiotemporal features from the graph structures simultaneously.
STGCN is composed of several spatiotemporal convolutional blocks, each of which is formed as a "sandwich" structure with two gated sequential convolution layers and one spatial graph convolution layer in between. The framework STGCN consists of two spatiotemporal convolutional blocks (ST-Conv blocks) and a fullyconnected output layer in the end. Each ST-Conv block contains two temporal gated convolution layers and one spatial graph convolution layer in the middle. The residual connection and bottleneck strategy are applied inside each block.
The model, although using convolution instead of LSTM-like patterns, does speed up training, but it also leads to missing historical data information and can only achieve short-term prediction, not long-term prediction, and graph convolution captures information between different nodes to model spatial models, which does not seem to make good use of the potential relationships between different regions.

DCRNN for predicting traffic flow
The DCRNN model proposed by Li et al. [22] (2018ICLR) models spatial correlation as a diffusion process on directed graphs, thus modeling the transformation of traffic flow, and proposes diffusion convolution recurrent neural networks that can capture the spatial and temporal dependence between time series using a framework of seq2seq. To address these challenges, we propose to model the traffic flow as a diffusion process on a directed graph and introduce Diffusion Convolutional Recurrent Neural Network (DCRNN), a deep learning framework for traffic prediction that incorporates both spatial and temporal dependency in the traffic flow. Specifically, DCRNN captures the spatial dependency using bidirectional random walks on the graph and the temporal dependency using the encoderdecoder architecture with scheduled sampling.

GMAN prediction of traffic flow
The GMAN model (2020AAAI) proposed by Zheng et al. [23] uses a spatiotemporal attention mechanism to model dynamic spatial relationships and nonlinear temporal relationships separately, while using a gating mechanism to adaptively fuse the information extracted by the spatiotemporal attention mechanism.  Because the whole traffic is a network, the error of one node is amplified by other nodes, which affects the final prediction results. To solve the above problem, GMAN adopts an encoder-decoder architecture, where encoder is used to extract features and decoder to predict. A transformed attention layer is applied in between these two to transform the encoded traffic features to generate a sequential representation of future time steps as the input to the decoder. Here Encoder and Docoder are composed of ST-attention block. Then the authors use an STE block to combine the spatial and temporal information and then input into the ST-ATTENTION block to solve the problem of complex time-space correlation. Finally, the experimental results of the article on two real-world traffic prediction tasks (i.e., traffic volume prediction and traffic speed prediction) demonstrate the superiority of GMAN.

Graph WaveNet prediction of traffic flow
Wu et al. [35] propose in this paper a novel graph neural network architecture, Graph WaveNet, for spatial-temporal graph modeling. The model uses the idea of diffusion convolution in extracting spatial features of road networks and adds a novel adaptive connection matrix to make up for the deficiency of fixed topology in extracting spatial features and employs dilated causal convolution and gate mechanism on time series without the traditional RNNs cycle, which is validated by METR-LA and PEMS-BAY data sets, GWN in terms of training effect and time good results were achieved.

ASTGCN prediction of traffic flow
Guo et al. [24] propose a novel attention-based spatial-temporal graph convolutional network (ASTGCN) model to solve traffic flow prediction problem. ASTGCN mainly consists of three independent components to respectively model three temporal properties of traffic flows, i.e., recent, daily-periodic, and weeklyperiodic dependencies. More specifically, each component contains two major parts: 1) the spatial-temporal attention mechanism to effectively capture the dynamic spatial-temporal correlations in traffic data; 2) the spatial-temporal convolution, which simultaneously employs graph convolutions to capture the spatial patterns and common standard convolutions to describe the temporal features. The output of the three components is weighted fused to generate the final prediction results.
These five typical traffic flow prediction models above are compared in Table 2.

Application scenarios for traffic flow prediction
Many researchers have already applied the proposed traffic flow prediction models to various traffic scenarios and achieved excellent results. For example, by predicting the traffic flow of a roadway in advance, it can provide drivers with more advanced travel routes. In addition, it can also provide prerequisites for traffic light optimization, etc. In this paper, three scenarios are chosen to illustrate the application of traffic flow prediction in the context of highways. These three scenarios are the work that has been done by our team so far, and the reason for choosing the highway is that the work in this area is more mature.

Scenario I. quantitative assessment on truck-related road risk for the safety control
Traffic conditions of truck flow is one of the critical factors influencing transportation safety and efficiency, which is directly related to traffic accidents, maintenance scheduling, traffic flow interruption, risk control, and management. The estimation of the truck flow of various types could be better to identify the irregular flow variation introduced by various trucks and quantitatively assessed the corresponding road risks.
Jin et al. [36] first improved on the gated recursive unit (GRU) based on a deep learning approach to estimate various types of truck traffic. Then a multiple logistic regression method was proposed to classify the road risk into three classes: safe, risky, and dangerous. According to the CSV trend, road risks are classified into three categories as shown in Figure 16. Different risk classes can guide traffic

Models
Model's characteristics

STGCN (2018)
(1) Compared with traditional spatio-temporal models (RNN, LSTM) based on recurrent neural networks, the STGCN model combines graph convolution and gated time convolution, and for the first time uses pure convolutional layers to extract time and space information at the same time.
(2) The STGCN model uses one-dimensional convolution to learn information in the time dimension and is not limited by the prediction data at the previous time point, so that the model can better capture the drastic changes in the data (such as traffic flow data during peak hours).
(3) Due to the characteristics of the convolutional structure, STGCN model is parallelized at the input, with fewer parameters and faster training speed, allowing the model to process large-scale networks with higher efficiency.

DCRNN (2018)
(1) In view of the dynamic characteristics of traffic flow, DCRNN model introduces diffusion convolution when modeling spatial dependence and considers forward propagation and back propagation and is more suitable for traffic networks with a directed graph structure.
(2) When the DCRNN model models the time dependence, the matrix multiplication in the GRU model is changed to diffusion convolution, and the diffusion convolutiongated recurrent unit (DCGRU) is obtained.

ASTGCN (2019)
(1) The ASTGCN model effectively learns the dynamic spatiotemporal correlation of traffic data through the spatiotemporal attention mechanism.
(2) The ASTGCN model designed a multichannel network structure from multiple time periods, combining global time and space information to improve prediction accuracy GWN (1) In terms of spatial dependency acquisition, The GWN model constructs an adaptive dependency matrix that can retain the implicit spatial relationship of the road network.
(2) In the acquisition of time dependence, The GWN model adopts stacked dilated 1D convolution and does not need to consider the problem of information disappearance too long ago and can extract longer time dependence than RNN-based cyclic convolutional networks.

GMAN (2020)
(1) The GMAN model refines the complexity of time and space. It is divided into dynamic spatial correlations and Non-linear temporal correlations, and the attention mechanism is introduced.
(2) The GMAN model solves the cumulative error problem of stepwise prediction. A transformation attention layer is added between the encoder and the decoder, so that historical and predicted traffic characteristics can be converted. control and management and broadcast traffic information to drivers to help them choose their travel routes. Finally, the road risk calculated by the predicted truck traffic is shown in Figure 17, from which the road risk status can be obtained at every moment.

Scenario II. Improved manpower planning for highway toll gate
In China, the relatively heavy queues at freeway toll booths and service areas during peak hours, coupled with the saturation of manpower scheduled during offpeak hours, are undoubtedly a huge obstacle to efficient and cost-effective freeway operations. Therefore, it needs an intelligent manpower planning strategy to simultaneously ensure the efficiency of highway transportation management and road user satisfaction.
Jin et al. [37] addressed a high-precision prediction of vehicle flow based on historical multisource traffic data. Based on the prediction results, an improved manpower planning strategy is proposed to schedule the work accordingly. And the method was tested on a randomly selected toll station as an example, as Figure 18 shows the daily traffic pattern of the highway Hechizhai toll station.
The results show from Figure 19 that the upper part and the lower part show the lane opening at the entrance and exit of toll gates in one week, respectively. Two  narrow black dotted lines indicate the morning and evening peak hours. During the morning peak, it is obvious that two entry lanes are not used while the number of toll lanes of the exit has reached its upper threshold. The opposite phenomenon is seen in the evening peak. Therefore, we suggest that one or two of the entrance lanes can be set as a reversible lane so that the traffic pressure can be released in  peak hours. Note that the usage condition of this suggestion is that the entrance and exit of the toll gate must be adjacent such as Hechizhai toll gate.

Scenario III. Capacity analysis of toll stations
Yuan et al. [38] used the results of traffic prediction to analyze the capacity of a toll station and used different queuing models to describe the capacity of typical lanes and compared the delay time and queue length of each model and obtained that the single-way model is more efficient in a typical system. The traffic index of the multiplex lane is also simulated, and the specific simulation process is shown in Figure 20, and the capacity of the multiplex lane is obtained to be larger than that of the typical MTC lane, which can relieve the traffic pressure during the peak hours.

Conclusions
Traffic is the main driving force of urban development, and real-time and accurate traffic flow prediction is the key to the application of intelligent transportation system. Graph convolutional neural network is an efficient model for processing graph data and has received a lot of attention from researchers in the past few years. This section attempts to summarize the recent graph convolutional neural network models and their applications to traffic flow prediction.  clarified, and the structure and characteristics of different prediction models are further categorized and reviewed.
2. This section provides the traffic prediction problem with constructed traffic graph structure. Some public traffic data sets that are widely used in scientific research worldwide are introduced for traffic prediction experiments, including their data sources, data contents, and data acquisition addresses, and the whole data processing process is analyzed.
3. The application scenarios of traffic flow prediction are discussed. There are two applications are provided: 1) with prediction of the traffic flow of truck, the transportation safety and efficiency could further assess.
2) The work schedule arrangement could also improve based on the prediction of traffic flow to avoid manpower waste and allow more passing gate to open in the peak hours. Other than those applications, there still many aspects worth to explore.