Dynamic Spatiotemporal Causality Analysis for Network Traffic Flow Based on Transfer Entropy and Sliding Window Approach

With the rapid development of sensor and communication technologies, a large amount of spatiotemporal traffic data has been accumulated, presenting the characteristics of big data. The potential information and regularities of traffic state evolution can be extracted from these huge traffic flow time series and applied to intelligent transportation systems. This study proposes a dynamic spatiotemporal causality modeling approach to analyze traffic causal relationships in large-scale road networks. The transfer entropy algorithm, which measures both the amount and the direction of information transmission, is used to detect the spatiotemporal causality of network traffic states from extensive traffic time series data. A combination of Gaussian kernel density estimation and a sliding window approach is proposed to calculate the transfer entropy and to construct dynamic spatiotemporal causality graphs based on a causality significance test. The affected coefficient, influence coefficient, input degree, and output degree are defined to evaluate the causal interaction of traffic states among different road segments and to identify the critical roads and potential bottlenecks of the existing road network. Experimental results based on real-world traffic sensor data indicate that the structures of traffic causality graphs are time-varying; that the cause-effect interaction among road segments is more significant during peak periods than during nonpeak periods; and that the critical road segments can be identified, being mainly located at the intersections of arterial roads, where large traffic flows converge and disperse.


Introduction
The rapid development of sensing and communication technologies in transportation promotes the accumulation of huge multisource spatiotemporal traffic data collected by loop detectors, vehicle GPS, and mobile phones [1], presenting the characteristics of traffic big data. Valuable knowledge can be extracted from these observational spatiotemporal traffic data and applied in data-driven intelligent transportation systems (ITS) [2]. The diversity, uncertainty, and huge volume of traffic big data also bring greater challenges to ITS. Potential traffic evolution characteristics can be extracted from extensive historical data through data mining techniques such as correlation analysis and association rule mining. Some studies have integrated the extracted traffic correlations or association rules into traffic flow prediction models to improve prediction accuracy [3][4][5]. However, these data mining techniques cannot capture directional causal interactions. For events A and B, an association relationship can be extracted from the statistical rule "if A occurs, B occurs", but it remains unclear whether the occurrence of A leads to the occurrence of B or whether other factors make A and B appear simultaneously [6]. Similarly, correlation analysis can determine whether A is related to B but cannot verify the cause-effect relationship between them. Causal relationships among objects, events, or variables are widespread in the natural and social sciences [7]. Causality can be detected from observational nonstationary time series data, and its strength also needs to be quantified [8,9]. Many causal models have been proposed based on probability theory, graph theory, Bayesian networks, etc. [10]. Spatiotemporal traffic causality can likewise be discovered from the complex network traffic state.
For example, if a traffic jam occurs on road segment $r_a$ at time $t$, the upstream adjacent road segment $r_b$ may become congested at the next time $t+1$ due to the shock waves of traffic flow; $r_a$ can then be regarded as the cause segment of $r_b$.
In addition to the propagation of traffic waves in the physical network, the dissemination of traffic information can also give rise to spatiotemporal causality. For example, if a serious traffic accident occurs on road segment $r_a'$, the traffic management department releases the accident information on navigation platforms and guides drivers to avoid the congested segment $r_a'$. Then, more drivers choose another route, which leads to serious congestion on segment $r_b'$. Although $r_b'$ is far from $r_a'$ in space, real-time traffic information sharing strengthens the spatiotemporal causality between them.
However, few studies have focused on spatiotemporal traffic causality modeling so far. Previous studies on traffic causality either define causal relationships from prior knowledge, such as the chronological order of traffic jam events, which cannot quantify the underlying causal strength [11,12], or use data-driven Bayesian network methods, which are unsuitable for large-scale road networks because of their sophisticated parameter estimation [13][14][15]. Traffic causal dependencies are usually combined with network traffic evaluation, traffic outlier detection, or traffic state prediction [11,16].
Motivated by the lack of research on traffic causality and the challenges of traffic big data analysis, this study develops a dynamic spatiotemporal traffic causality research framework to capture the underlying causal knowledge of the network traffic state for decision making in ITS. The main scientific contributions are as follows: (i) the transfer entropy algorithm is used for the first time to extract traffic cause-effect interactions from extensive traffic time series data; (ii) spatiotemporal traffic causality can be calculated dynamically and efficiently with the sliding window technique; (iii) dynamic spatiotemporal causality graphs can reveal traffic causal structures and identify the critical road segments and potential bottlenecks of the existing road network; and (iv) the proposed approach can be applied in real-time traffic management systems and combined with practical applications such as network traffic state evaluation and prediction. The remainder of the paper is organized as follows. Section 2 summarizes the literature on traffic causality analysis. Section 3 describes the study framework and the transfer entropy method. Section 4 describes spatiotemporal causality modeling for network traffic flow. Section 5 presents computational experiments based on real-world traffic sensor data. The conclusions are summarized in Section 6.

Literature Review
Spatiotemporal data mining approaches have been widely applied to the analysis and prediction of traffic congestion propagation. Inoue et al. [17] proposed a frequent pattern mining method to extract traffic congestion patterns from traffic sensor data and demonstrated the generation, diffusion, and dissipation of traffic congestion from a data-driven perspective. Chawla et al. [18] proposed an optimized mining algorithm framework for inferring the root causes of anomalies from large-scale taxi GPS data. Xiong et al. [19] developed a propagation graph approach to predict near-future traffic congestion patterns based on large real-world vehicle trajectory data.
Previous studies on spatiotemporal traffic causality mainly adopt simple prior knowledge methods and define causal relationships according to the chronological order of traffic jams or abnormal conditions. Liu et al. [11] extracted the spatiotemporal causal interactions among traffic outliers by constructing outlier causality trees according to the temporal order and spatial contiguity of the detected outliers. Kapoor et al. [12] studied the causality of traffic congestion at road intersections and how congestion propagates from one point of the road network in all directions, and they predicted possible propagation patterns.
In addition, dynamic Bayesian network methods have been used for spatiotemporal traffic causality modeling. Chu et al. [13] proposed a time-varying dynamic Bayesian network for traffic causality modeling, studied the macro structure of regions based on vehicle trajectory data, and extracted the dependency structure of road junctions from sensor data. Queen and Albers [14] proposed a multivariate dynamic Bayesian network model to capture the conditional independence and causality of traffic flow time series. The causality between variables in a Bayesian network and the lagged causality between time series in a dynamic Bayesian network can be identified by setting external interventions. Nguyen et al. [15] identified traffic congestion propagation patterns from spatiotemporal traffic data and estimated congestion propagation probabilities with a dynamic Bayesian network. Potential causal structures can be extracted with dynamic Bayesian network modeling, but estimating the parameters consumes substantial computing resources, especially for large-scale road networks. The above spatiotemporal traffic causality modeling methods are either too simple to fully capture the underlying spatiotemporal causality and nonlinearity or too computationally complex for large-scale road network traffic state analysis.
Recently, information-theoretic causal approaches, which can measure and quantify causality, have attracted much attention [7]. The Granger causality test is an effective method for identifying potential causality in time series data [20][21][22]. Its principle is as follows: if variable Y can be better predicted using the historical values of both X and Y than using only the historical values of Y, X can be regarded as a Granger cause of Y. Li et al. [16] developed a Granger causality-based causal dependence mining approach for traffic prediction and revealed, through a causal dependence graph, the relationship between the road network structure and the correlation among traffic flow time series. However, Granger causality analysis requires assuming a linear or nonlinear functional form for the relationship between variables.
Transfer entropy is a relatively recent information-theoretic method that can evaluate causal correlation because of its asymmetry. Unlike the Granger causality method, transfer entropy does not require assuming the form of the causal relationship between variables; it is suitable for analyzing long time series of nonlinear systems and has been widely applied in neuroscience [23], chemistry [24], finance [25], industrial processes [26], and other fields. Because it measures both the direction and the amount of information transmission, transfer entropy is suitable for nonlinear spatiotemporal causality modeling of network traffic flow.

Study Framework.
This study proposes a dynamic spatiotemporal causality modeling framework, as shown in Figure 1. First, transfer entropy is adopted to detect spatiotemporal traffic causality from the huge volume of traffic time series data for the large-scale road network. A combined Gaussian kernel density estimation and sliding window approach is proposed to calculate the transfer entropy matrix, which denotes the dynamic nonlinear causal relationships among the traffic states of different road segments. Second, the causal correlation coefficient matrix is calculated from the transfer entropy matrix, and the affected coefficient and influence coefficient are defined to identify potential bottlenecks and critical road segments in the road network for different time periods. Finally, dynamic spatiotemporal causality graphs are established based on the causality significance test, and the input and output degrees are proposed to evaluate the spatiotemporal causality of network traffic states. The advantages of this framework are as follows: (i) nonlinear cause-effect interactions can be extracted from huge traffic flow time series data, providing deeper insight into the complex network traffic state; (ii) causal orientation and strength can be determined from the asymmetry of transfer entropy; (iii) the sliding window approach ensures the computational efficiency of transfer entropy for large-scale road networks; and (iv) the dynamic spatiotemporal traffic causality graphs can reveal time-varying traffic causal structures.

Basic Concepts for Information Entropy.
The basic concepts of information entropy proposed by Shannon [27] are briefly explained here. Let $x_i$ $(i = 1, 2, \ldots, n)$ denote the states of a discrete variable $X$. The information $I(x_i)$ of state $x_i$ is defined as

$$I(x_i) = -\log p(x_i), \tag{1}$$

and $I(x_i) \ge 0$. The larger the probability of $x_i$, the smaller the information $I(x_i)$ and the smaller the uncertainty of $x_i$; conversely, the smaller the probability of $x_i$, the larger the information $I(x_i)$ and the larger the uncertainty of $x_i$. When the probability of $x_i$ is 1, $I(x_i)$ is 0. The information entropy $H(X)$ is defined as the mathematical expectation of $I(x_i)$ over $X$:

$$H(X) = -\sum_{i=1}^{n} p(x_i)\log p(x_i). \tag{2}$$

Information entropy reflects the average uncertainty and information content of $X$: the larger $H(X)$, the larger the amount of information carried by $X$; the smaller $H(X)$, the smaller the amount of information carried by $X$.
Mutual information $M(X, Y)$ quantifies the information shared by two correlated variables:

$$M(X,Y) = \sum_{x}\sum_{y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}. \tag{3}$$

The larger $M(X, Y)$, the stronger the correlation between $X$ and $Y$. However, mutual information is symmetric, $M(X,Y) = M(Y,X)$, and therefore cannot represent the direction of information transfer.
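To make (1) to (3) concrete, the following minimal Python sketch computes plug-in (frequency-count) estimates of entropy and mutual information from discrete samples. The function names are illustrative and not taken from the paper; mutual information is computed through the equivalent identity $M(X,Y) = H(X) + H(Y) - H(X,Y)$, which also makes its symmetry explicit.

```python
import numpy as np
from collections import Counter

def entropy(x, base=2.0):
    """Plug-in Shannon entropy H(X) = -sum p(x) log p(x) from discrete samples."""
    counts = np.array(list(Counter(x).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)) / np.log(base))

def mutual_information(x, y, base=2.0):
    """M(X, Y) = H(X) + H(Y) - H(X, Y); symmetric in X and Y by construction."""
    joint = list(zip(x, y))          # samples of the pair (x, y)
    return entropy(x, base) + entropy(y, base) - entropy(joint, base)

# Usage: a fair binary variable carries 1 bit, and M(X, Y) == M(Y, X).
x = [0, 1, 0, 1, 1, 0, 0, 1]
y = [0, 1, 0, 1, 1, 0, 1, 1]
print(entropy([0, 1, 0, 1]), mutual_information(x, y), mutual_information(y, x))
```

Because the same value is returned for either argument order, mutual information alone cannot say which variable drives the other, which motivates the directional measure below.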

Transfer Entropy.
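In 2000, Schreiber [28] proposed transfer entropy to measure the amount of information transfer and the asymmetric interaction between systems based on information theory. The transfer entropy $TE_{Y\to X}$ for two discrete systems $X$ and $Y$ is calculated as

$$TE_{Y\to X} = \sum p\left(x_{i+1}, x_i^{(k)}, y_i^{(l)}\right)\log\frac{p\left(x_{i+1}\mid x_i^{(k)}, y_i^{(l)}\right)}{p\left(x_{i+1}\mid x_i^{(k)}\right)}, \tag{4}$$

where $x_i$ and $y_i$ represent the state values of $X$ and $Y$ at time $i$, respectively; $x_i^{(k)} = [x_i, x_{i-1}, \ldots, x_{i-k+1}]$ and $y_i^{(l)} = [y_i, y_{i-1}, \ldots, y_{i-l+1}]$ are the historical states of $X$ and $Y$; and $p(x_{i+1}\mid x_i^{(k)}, y_i^{(l)})$ and $p(x_{i+1}\mid x_i^{(k)})$ are the conditional probabilities. Because of its asymmetric nature, transfer entropy has been regarded as an indicator of causality.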
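A minimal plug-in sketch of (4) for discrete series with $k = l = 1$ is shown below. It is an illustration under stated assumptions (discrete states, probabilities estimated by frequency counts, hypothetical function name), not the estimator used in this study, which relies on kernel density estimation. The synthetic example makes $X$ copy $Y$ with a one-step lag, so information flows only from $Y$ to $X$ and the two directions give very different values.

```python
import numpy as np
from collections import Counter

def transfer_entropy_discrete(x, y, base=2.0):
    """Plug-in estimate of TE_{Y->X} for discrete series, history lengths k = l = 1."""
    x, y = list(x), list(y)
    # Samples (x_{i+1}, x_i, y_i) for i = 0 .. len(x) - 2
    triples = list(zip(x[1:], x[:-1], y[:-1]))
    n = len(triples)
    c_abc = Counter(triples)                          # counts of (x_{i+1}, x_i, y_i)
    c_ab = Counter((a, b) for a, b, _ in triples)     # counts of (x_{i+1}, x_i)
    c_bc = Counter((b, c) for _, b, c in triples)     # counts of (x_i, y_i)
    c_b = Counter(b for _, b, _ in triples)           # counts of x_i
    te = 0.0
    for (a, b, c), cnt in c_abc.items():
        p_cond_full = cnt / c_bc[(b, c)]              # p(x_{i+1} | x_i, y_i)
        p_cond_self = c_ab[(a, b)] / c_b[b]           # p(x_{i+1} | x_i)
        te += (cnt / n) * np.log(p_cond_full / p_cond_self)
    return float(te / np.log(base))

# Usage: X copies Y with one step of delay, so x_{i+1} = y_i.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 5000)
x = np.roll(y, 1)
print(transfer_entropy_discrete(x, y))   # TE_{Y->X}: close to 1 bit
print(transfer_entropy_discrete(y, x))   # TE_{X->Y}: close to 0
```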
Transfer entropy represents the reduction in the uncertainty (conditional entropy) of $x_{i+1}$ obtained when both $x_i^{(k)}$ and $y_i^{(l)}$ are known, compared with the case where only $x_i^{(k)}$ is known. If the state of $X$ at a given time is completely determined by its own history and is unrelated to $Y$, the transfer entropy is 0. The parameters $k$ and $l$ are the lengths of the historical state vectors (embedding dimensions) of $X$ and $Y$ used in the calculation of transfer entropy. As $k$ and $l$ increase, more computational resources and data are required to estimate the joint probability density. Considering the time delay of information propagation, Bauer et al. [29] modified the calculation of transfer entropy by introducing the prediction horizon $h$:

$$TE_{Y\to X} = \sum p\left(x_{i+h}, x_i^{(k)}, y_i^{(l)}\right)\log\frac{p\left(x_{i+h}\mid x_i^{(k)}, y_i^{(l)}\right)}{p\left(x_{i+h}\mid x_i^{(k)}\right)}. \tag{5}$$

Schreiber's definition of transfer entropy assumes that the system can be approximated by a stationary Markov process, in which the current state depends only on past states within a limited time period. If the Markov assumption is not satisfied, transfer entropy may not be suitable for measuring the causal relationships of the system [24]. The evolution of traffic flow has been regarded as satisfying the Markov property [30]. Therefore, transfer entropy is suitable for spatiotemporal causality modeling of network traffic flow.
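As the prediction horizon $h$ increases, the reference state $x_i^{(k)}$ in (5) lies further behind the predicted state $x_{i+h}$. This study therefore adopts the modified transfer entropy proposed by Shu and Zhao [24]:

$$TE_{Y\to X} = \sum p\left(x_{i+h}, x_{i+h-1}^{(k)}, y_i^{(l)}\right)\log\frac{p\left(x_{i+h}\mid x_{i+h-1}^{(k)}, y_i^{(l)}\right)}{p\left(x_{i+h}\mid x_{i+h-1}^{(k)}\right)}, \tag{6}$$

in which the reference state of $X$ is $x_{i+h-1}^{(k)}$ rather than $x_i^{(k)}$; this form is more suitable for estimating the transfer entropy under time delay.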
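The two horizon-dependent definitions differ only in which past value of $X$ serves as the reference state. The sketch below builds the aligned sample arrays for $k = l = 1$ under both conventions; it assumes that the Shu and Zhao modification conditions on $x_{i+h-1}$ instead of $x_i$. The function name and the `reference` argument are hypothetical and chosen for illustration only.

```python
import numpy as np

def te_samples(x, y, h=1, reference="shu_zhao"):
    """Align samples for TE_{Y->X} with prediction horizon h and k = l = 1.

    reference="bauer":    (x_{i+h}, x_i,       y_i)   as in (5)
    reference="shu_zhao": (x_{i+h}, x_{i+h-1}, y_i)   modified form (assumed)
    """
    if h < 1:
        raise ValueError("prediction horizon h must be >= 1")
    if reference not in ("bauer", "shu_zhao"):
        raise ValueError("reference must be 'bauer' or 'shu_zhao'")
    x, y = np.asarray(x), np.asarray(y)
    i = np.arange(len(x) - h)                 # usable time indices (0-based)
    future = x[i + h]                         # predicted state x_{i+h}
    past_x = x[i] if reference == "bauer" else x[i + h - 1]
    return future, past_x, y[i]               # y[i] is the source history y_i

# Usage: with h = 3 the modified form conditions on x_{i+2} instead of x_i.
f, px, py = te_samples(np.arange(10), np.arange(100, 110), h=3)
print(f[:2], px[:2], py[:2])
```

For $h = 1$ both conventions coincide with Schreiber's definition (4); they diverge only for longer horizons.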

Calculation Method for Transfer Entropy.
The joint probability density in (5) is estimated by kernel density estimation. The probability density is estimated as

$$p(x) = \frac{1}{N}\sum_{i=1}^{N} K(x - x_i), \tag{7}$$

where $N$ is the number of samples and $K(x - x_i)$ is the value of the kernel function centered at sample $x_i$. The probability density $p(x)$ is thus the average of the kernel function values over the samples. Kernel density estimation does not depend on a prior distribution of the data and is also applicable to non-Gaussian data. The Gaussian kernel function is used to estimate the probability density of the traffic state of each road segment:

$$K(x - x_i) = \frac{1}{\sqrt{2\pi}\,\theta}\exp\left(-\frac{(x - x_i)^2}{2\theta^2}\right), \tag{8}$$

where the parameter $\theta$ is the kernel bandwidth, that is, the width of the window used to calculate the kernel function values.
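Equations (7) and (8) can be sketched directly with NumPy broadcasting: each evaluation point receives the average of Gaussian kernels centered at the samples. This is a minimal illustration with a hypothetical function name; the bandwidth `theta` must be supplied by the caller, because the text does not state how $\theta$ is chosen.

```python
import numpy as np

def gaussian_kde_1d(samples, points, theta):
    """Evaluate p(x) = (1/N) * sum_i K(x - x_i) with a Gaussian kernel of bandwidth theta."""
    samples = np.asarray(samples, dtype=float)
    points = np.atleast_1d(np.asarray(points, dtype=float))
    u = (points[:, None] - samples[None, :]) / theta       # shape (len(points), N)
    kernel = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * theta)
    return kernel.mean(axis=1)                             # average over the N samples

# Usage: the estimated density integrates to approximately 1.
grid = np.linspace(-10.0, 12.0, 4401)
dens = gaussian_kde_1d([0.0, 1.0, 2.0], grid, theta=0.5)
print(dens.sum() * (grid[1] - grid[0]))
```

A common normal-reference rule of thumb sets $\theta \approx 1.06\,\hat\sigma N^{-1/5}$; whether such a rule suits traffic speed data is an assumption that would need checking.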
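The joint probability density $p(x, y)$ is estimated as

$$p(x,y) = \frac{1}{N}\sum_{i=1}^{N} K(x - x_i,\, y - y_i), \tag{9}$$

and the corresponding joint Gaussian kernel function is

$$K(x - x_i,\, y - y_i) = \frac{1}{2\pi\theta^2}\exp\left(-\frac{(x - x_i)^2 + (y - y_i)^2}{2\theta^2}\right). \tag{10}$$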
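Combining the joint Gaussian kernels with (6) gives a complete estimator. The sketch below assumes $k = l = 1$, a single bandwidth on z-scored data, and a rule-of-thumb bandwidth; it approximates the expectation in (6) by averaging the log-ratio over the observed samples. These choices, and the function names, are assumptions made for illustration rather than details confirmed by the text.

```python
import numpy as np

def _kde_at_samples(data, theta):
    """Gaussian product-kernel density of the sample cloud, evaluated at every sample row."""
    n, d = data.shape
    diff = data[:, None, :] - data[None, :, :]                    # (n, n, d)
    kernel = np.exp(-0.5 * np.sum((diff / theta) ** 2, axis=2))   # (n, n)
    return kernel.mean(axis=1) / (np.sqrt(2.0 * np.pi) * theta) ** d

def transfer_entropy_kde(x, y, h=1, theta=None):
    """KDE estimate of TE_{Y->X} in nats, k = l = 1, reference state x_{i+h-1}."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = (x - x.mean()) / x.std()          # z-score so one bandwidth fits all dimensions
    y = (y - y.mean()) / y.std()
    n = len(x) - h
    a = x[h:]                             # x_{i+h}
    b = x[h - 1:h - 1 + n]                # x_{i+h-1}
    c = y[:n]                             # y_i
    if theta is None:
        theta = 1.06 * n ** (-1.0 / 5.0)  # rule-of-thumb bandwidth (assumption)
    p_abc = _kde_at_samples(np.column_stack([a, b, c]), theta)
    p_ab = _kde_at_samples(np.column_stack([a, b]), theta)
    p_bc = _kde_at_samples(np.column_stack([b, c]), theta)
    p_b = _kde_at_samples(b[:, None], theta)
    # log[p(a | b, c) / p(a | b)] = log[p(a, b, c) p(b) / (p(b, c) p(a, b))], averaged over samples
    return float(np.mean(np.log(p_abc * p_b / (p_ab * p_bc))))

# Usage: x is driven by the previous value of y, so TE_{Y->X} > TE_{X->Y}.
rng = np.random.default_rng(1)
y = rng.standard_normal(400)
x = np.zeros(400)
for t in range(399):
    x[t + 1] = 0.8 * y[t] + 0.6 * rng.standard_normal()
print(transfer_entropy_kde(x, y), transfer_entropy_kde(y, x))
```

The pairwise kernel matrices cost $O(n^2)$ memory per window, which is one practical reason to keep the window width $w$ moderate.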
The interactions among variables vary with time. A sliding window technique is used to calculate the transfer entropy between variables dynamically along the timeline, which reduces the sample size of each calculation and improves the efficiency of causal relationship analysis. The sliding window is described by its width $w$ and its moving step length $l$ (here $l$ denotes the step length, not the history length of $Y$ in (4)). The original state space is divided into $n$ consecutive subspaces $S_i$; each window consists of $w$ time intervals, and the step length $l$ is smaller than $w$. The window width $w$ should not be too small; otherwise, the limited samples within each window reduce the accuracy of kernel density estimation.
The moving step length $l$ should not be too large; otherwise, the window cannot track variations in the information transmission process in a timely manner. As shown in Figure 2, for a time series of $L$ time intervals, the sliding window moves with a fixed step length $l$. For each window, the probability densities and the transfer entropy are calculated, yielding a transfer entropy series of length $p = \lfloor (L - w)/l \rfloor + 1$, which equals $L - w + 1$ when $l = 1$.

A road network consisting of $m$ road segments illustrates the calculation of the transfer entropy matrix. The traffic state of each road segment is treated as a variable, so the traffic system has $m$ variables in total. The transfer entropy between every pair of variables is calculated, yielding a two-dimensional transfer entropy matrix for each sliding window:

$$T_{m\times m} = \begin{bmatrix} 0 & t_{12} & \cdots & t_{1m} \\ t_{21} & 0 & \cdots & t_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ t_{m1} & t_{m2} & \cdots & 0 \end{bmatrix}, \tag{11}$$

where $t_{ij}$ denotes the transfer entropy from road segment $r_i$ to road segment $r_j$. Because transfer entropy is directional, $T_{m\times m}$ is not symmetric: in general, $t_{ij} \ne t_{ji}$. The diagonal elements are 0.
After the diagonal zeros are removed, the transfer entropy matrix $T_{m\times m}$ of the $i$th sliding window is transformed into the row vector $te_i = [t^i_{12}, t^i_{13}, \ldots, t^i_{1m}, t^i_{21}, t^i_{23}, \ldots, t^i_{m(m-1)}]$. After the window slides from the beginning to the end of the time series, $p$ transfer entropy vectors are obtained. Then, all the transfer entropy vectors are stacked into a transfer entropy matrix of dimensions $p \times (m^2 - m)$ for the road network traffic state, which represents the information transferred among different road segments. Because each calculation uses only the limited data within one window, the sliding window improves the computational efficiency of transfer entropy, which makes the approach suitable for real-time traffic management systems.
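The window bookkeeping above can be sketched as follows. The pairwise estimator is passed in as a callback `te_fn(target, source)` (a hypothetical interface, so that any estimator, for example a KDE-based one, can be plugged in), and `step` plays the role of the step length $l$. The sketch assumes $t_{ij}$ is the transfer entropy from $r_i$ to $r_j$ and uses the row-major off-diagonal ordering of $te_i$.

```python
import numpy as np

def sliding_te_matrix(data, w, step, te_fn):
    """Stack per-window off-diagonal TE values into a p x (m^2 - m) matrix.

    data  : array of shape (L, m), one column per road segment.
    te_fn : callable te_fn(target, source) returning TE_{source -> target}.
    Column order per window: t_12, t_13, ..., t_1m, t_21, ..., t_m(m-1),
    with t_ij = TE_{r_i -> r_j}.
    """
    data = np.asarray(data, dtype=float)
    L, m = data.shape
    starts = range(0, L - w + 1, step)        # p = floor((L - w) / step) + 1 windows
    off_diag = [(i, j) for i in range(m) for j in range(m) if i != j]
    rows = []
    for s in starts:
        win = data[s:s + w]                   # the w time intervals of this window
        rows.append([te_fn(win[:, j], win[:, i]) for i, j in off_diag])
    return np.array(rows)

# Usage with a toy stand-in for a real estimator (illustration only).
data = np.tile(np.arange(3.0), (20, 1))       # L = 20 intervals, m = 3 segments
toy_te = lambda target, source: float(source.mean() - target.mean())
print(sliding_te_matrix(data, w=10, step=1, te_fn=toy_te).shape)   # (11, 6)
```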

Causality Significance Test Method.
For causal inference, a causal relationship is assumed to exist between any two different traffic state variables $X$ and $Y$, and the causality then needs to be verified against the observed data. Causality analysis discriminates cause variables from effect variables. Transfer entropy is asymmetric because the amounts of information transferred in opposite directions differ. To characterize the direction and strength of causality, the causal correlation coefficient $\rho_{X,Y}$ is defined to model the causal strength [29]. The causal orientation and strength are measured by the difference between $TE_{Y\to X}$ and $TE_{X\to Y}$:

$$\rho_{X,Y} = TE_{Y\to X} - TE_{X\to Y}.$$

When $TE_{Y\to X}$ is larger than $TE_{X\to Y}$, $Y$ is the cause variable of $X$, and information is transferred in the direction $Y \to X$. Conversely, when $TE_{Y\to X}$ is smaller than $TE_{X\to Y}$, $X$ is the cause variable of $Y$, and information is transferred in the direction $X \to Y$. When the two are equal, $\rho_{X,Y} = 0$, and there is no causality between $X$ and $Y$. Because of data noise and interference, $\rho_{X,Y}$ is generally not exactly 0, and a very small value does not indicate significant causality. It is therefore necessary to set a threshold on the causal correlation coefficient to define significant causality; this step is the causality significance test. If the magnitude of $\rho_{X,Y}$ exceeds the threshold, the causality between $X$ and $Y$ is significant.
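The decision rule above fits in a few lines. This sketch assumes, as an interpretation of the text, that significance is judged on the magnitude of $\rho_{X,Y}$ while its sign gives the direction; the function name is hypothetical.

```python
def causal_direction(te_y_to_x, te_x_to_y, eps):
    """Return (rho, direction) with rho_{X,Y} = TE_{Y->X} - TE_{X->Y}.

    The sign of rho gives the direction; |rho| must reach eps to count as significant.
    """
    rho = te_y_to_x - te_x_to_y
    if abs(rho) < eps:
        return rho, "none"                     # difference too small: not significant
    return rho, ("Y->X" if rho > 0 else "X->Y")

# Usage with the threshold eps = 0.05 used later in the experiments.
print(causal_direction(0.12, 0.03, 0.05))      # Y is the cause of X
print(causal_direction(0.04, 0.02, 0.05))      # no significant causality
```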
The causality significance test can be regarded as a hypothesis testing problem for determining causal relationships. The null hypothesis is that there is no causality between $X$ and $Y$; it is retained when $\rho_{X,Y}$ is small. If $\rho_{X,Y}$ is large enough, the null hypothesis is rejected, and a causal relationship exists between $X$ and $Y$. Bauer et al. [29] used the Monte Carlo method to construct surrogate time series for the causality significance test, which should satisfy two conditions: the causality between $X$ and $Y$ is completely destroyed, and the statistical distributions of $X$ and $Y$ remain unchanged.
This study uses the method proposed by Duan et al. [31] to disrupt the original time series $X$ and $Y$ of $L$ time intervals. The new time series $X'$ and $Y'$ are constructed as

$$X' = [x_i, x_{i+1}, \ldots, x_{i+M-1}], \qquad Y' = [y_j, y_{j+1}, \ldots, y_{j+M-1}], \tag{14}$$

so that the statistical distributions of the reconstructed time series $X'$ and $Y'$ are consistent with those of the original time series $X$ and $Y$.
Here, $M$ is the length of $X'$ and $Y'$; $i$ and $j$ are randomly selected from $\{1, 2, \ldots, L - M + 1\}$; and $|i - j| \ge e$, where $e$ is much larger than the prediction horizon $h$, which ensures that there is almost no causal correlation between $X'$ and $Y'$. Repeating this construction $N$ times yields the causal correlation coefficients $\boldsymbol{\rho} = [\rho_1, \rho_2, \ldots, \rho_N]$ of the surrogate pairs $X'$ and $Y'$. The causality significance test is then carried out as

$$\rho_{X,Y} \ge \varepsilon = \mu_\rho + 3\sigma_\rho, \tag{15}$$

where $\mu_\rho$ and $\sigma_\rho$ are the mean and standard deviation of $\rho_1, \rho_2, \ldots, \rho_N$, and $\varepsilon$ is the significance threshold. When the causality coefficient $\rho_{X,Y}$ is smaller than $\varepsilon$, there is no significant causal relationship between $X$ and $Y$; when it is larger than $\varepsilon$, there is significant causality between $X$ and $Y$.

Journal of Advanced Transportation
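The surrogate test in (14) and (15) can be sketched as a simple Monte Carlo loop. In this illustration (hypothetical function name and signature), `rho_fn` is any routine that returns the causality coefficient for one pair of segments; start indices are 0-based, so they range over $0, \ldots, L - M$ rather than $1, \ldots, L - M + 1$, and the population standard deviation is used for $\sigma_\rho$.

```python
import numpy as np

def surrogate_threshold(x, y, M, e, N, rho_fn, rng=None):
    """Significance threshold eps = mu_rho + 3 * sigma_rho from N surrogate pairs.

    Each surrogate pair is X' = x[i:i+M], Y' = y[j:j+M] with |i - j| >= e,
    which breaks the original coupling while keeping each marginal distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    L = len(x)
    if L - M < e:
        raise ValueError("series too short for the requested offset e")
    rhos = []
    while len(rhos) < N:
        i, j = rng.integers(0, L - M + 1, size=2)   # 0-based start indices
        if abs(i - j) >= e:                          # keep only well-separated segments
            rhos.append(rho_fn(x[i:i + M], y[j:j + M]))
    rhos = np.array(rhos)
    return float(rhos.mean() + 3.0 * rhos.std())

# Usage with a toy coefficient that reports the offset between segment starts.
x = np.arange(100.0)
eps = surrogate_threshold(x, x, M=20, e=30, N=50,
                          rho_fn=lambda a, b: abs(a[0] - b[0]),
                          rng=np.random.default_rng(0))
print(eps >= 30)
```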

Network Traffic State Evaluation.
To evaluate the network traffic state, the influence coefficient and the affected coefficient are defined for each road segment. For road segment $i$, the influence coefficient $R_{out}(i)$ is the sum of the transfer entropy from road segment $i$ to the other road segments in the network, which describes the impact of segment $i$ on the other segments:

$$R_{out}(i) = \sum_{j \ne i} TE_{r_i\to r_j}. \tag{16}$$

Similarly, the affected coefficient $R_{in}(i)$ is the sum of the transfer entropy from the other road segments to road segment $i$, which describes the impact of the other segments on the target segment $i$:

$$R_{in}(i) = \sum_{j \ne i} TE_{r_j\to r_i}. \tag{17}$$

This provides a data-driven method for identifying potential bottlenecks and critical road segments from the perspective of spatiotemporal causality analysis. Road segments with large $R_{in}(i)$ can be regarded as potential bottleneck segments, which are most likely to be affected by the traffic states of other road segments in the network. Road segments with large $R_{out}(i)$ can be regarded as critical road segments, which are most likely to affect the traffic states of other road segments.
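With the transfer entropy matrix stored so that `T[i, j]` is the transfer entropy from $r_i$ to $r_j$ (an assumed storage convention), (16) and (17) reduce to row and column sums. The following is a minimal sketch with a hypothetical function name.

```python
import numpy as np

def influence_and_affected(T):
    """Return (R_out, R_in) for a TE matrix with T[i, j] = TE_{r_i -> r_j}.

    R_out(i) = sum_j T[i, j]  (influence coefficient, row sum)
    R_in(i)  = sum_j T[j, i]  (affected coefficient, column sum)
    """
    T = np.array(T, dtype=float)      # copy, so the caller's matrix is not modified
    np.fill_diagonal(T, 0.0)          # exclude self-transfer (j != i)
    return T.sum(axis=1), T.sum(axis=0)

# Usage: rank candidate critical segments (large R_out) and bottlenecks (large R_in).
T = [[0, 0.2, 0.1], [0.05, 0, 0.3], [0, 0.1, 0]]
r_out, r_in = influence_and_affected(T)
print("critical:", int(np.argmax(r_out)), "bottleneck:", int(np.argmax(r_in)))
```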

Dynamic Spatiotemporal Traffic Causality Graphs.
The time-varying network traffic state leads to dynamic spatiotemporal causality graphs. Because transfer entropy is asymmetric, the spatiotemporal traffic causality graphs are directed graphs that represent the dynamic causal structure of the traffic state variables, as shown in Figure 3. The road network consists of $n$ road segments, denoted by the nodes $r_1, r_2, \ldots, r_n$. Directed edges indicate significant causal relationships between the traffic states of two road segments. The structures of the spatiotemporal causality graphs can differ considerably between time slices; for example, $r_i$ is the cause segment of $r_j$ at time $t-1$, while there is no link between them at $t$ and $t+1$. The causal strength between two road segments defines the weight of the directed edge. The causality coefficient $\rho_{r_i,r_j}$ of road segments $r_i$ and $r_j$ during $[t-1, t]$ is calculated from $TE_{r_j\to r_i}$ and $TE_{r_i\to r_j}$ according to (13). The causality matrix $W_t$ at time $t$ is then obtained by binarizing the coefficients with the significance threshold $\varepsilon$:

$$w^t_{ij} = \begin{cases} 1, & \rho_{r_j,r_i} = TE_{r_i\to r_j} - TE_{r_j\to r_i} \ge \varepsilon, \\ 0, & \text{otherwise}, \end{cases} \tag{18}$$

so that $w^t_{ij} = 1$ denotes a directed edge from the cause segment $r_i$ to the effect segment $r_j$. Edges with strong causal correlation receive weight 1, and those without obvious causal correlation receive weight 0, which removes redundant connections when the causality graphs are constructed. The calculation process for the causality matrix is shown in Figure 4.
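Binarizing one window's transfer entropy matrix into a causality matrix $W_t$ can be sketched with a single antisymmetric difference. This assumes `T[i, j]` stores the transfer entropy from $r_i$ to $r_j$ and that $\varepsilon > 0$, so at most one of $w_{ij}$ and $w_{ji}$ can be 1; the function name is hypothetical.

```python
import numpy as np

def causality_matrix(T, eps):
    """Binary W_t: W[i, j] = 1 if r_i is a significant cause of r_j (eps > 0 assumed).

    T[i, j] = TE_{r_i -> r_j}, so rho[i, j] = TE_{i->j} - TE_{j->i} = rho_{r_j, r_i}.
    """
    T = np.asarray(T, dtype=float)
    rho = T - T.T                     # antisymmetric: rho[j, i] = -rho[i, j]
    W = (rho >= eps).astype(int)      # keep only significant directed edges
    np.fill_diagonal(W, 0)            # no self-loops
    return W

# Usage: r_0 -> r_1 and r_2 -> r_0 pass the threshold 0.05; the rest are pruned.
T = [[0, 0.12, 0.01], [0.02, 0, 0.03], [0.2, 0.04, 0]]
print(causality_matrix(T, eps=0.05))
```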
Based on the dynamic spatiotemporal causality graphs, four indicators are proposed to evaluate the impact of each road segment in the road network from the perspective of causal dependence. The input degree $D^t_{in}(i)$ denotes the impact of the traffic states of the other $n-1$ road segments on that of $r_i$ at time $t$, and the output degree $D^t_{out}(i)$ denotes the influence of the traffic state of $r_i$ on the other $n-1$ road segments:

$$D^t_{in}(i) = \sum_{j \ne i} w^t_{ji}, \tag{19}$$

$$D^t_{out}(i) = \sum_{j \ne i} w^t_{ij}, \tag{20}$$

where $w^t_{ij}$ is the $(i, j)$ element of $W_t$, equal to 1 when $r_i$ is a significant cause of $r_j$ at time $t$. The sum of input degrees $SumD_{in}(i)$ and the sum of output degrees $SumD_{out}(i)$ quantify the cause-effect relationship between $r_i$ and the other road segments during the time period $T$:

$$SumD_{in}(i) = \sum_{t \in T} D^t_{in}(i), \tag{21}$$

$$SumD_{out}(i) = \sum_{t \in T} D^t_{out}(i). \tag{22}$$
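Given a sequence of binary causality matrices, one per time slice and with the convention that `W[t, i, j] = 1` means $r_i$ significantly causes $r_j$, the four indicators (19) to (22) are axis sums. A minimal sketch with a hypothetical function name:

```python
import numpy as np

def degrees(W_seq):
    """W_seq: array of shape (num_slices, n, n) with zero diagonals.

    Returns (D_in, D_out, SumD_in, SumD_out):
    D_out[t, i] = sum_j W[t, i, j]  (edges leaving r_i at slice t)
    D_in[t, i]  = sum_j W[t, j, i]  (edges entering r_i at slice t)
    SumD_* adds the per-slice degrees over all slices of the period T.
    """
    W = np.asarray(W_seq)
    d_out = W.sum(axis=2)             # row sums per slice, shape (num_slices, n)
    d_in = W.sum(axis=1)              # column sums per slice, shape (num_slices, n)
    return d_in, d_out, d_in.sum(axis=0), d_out.sum(axis=0)

# Usage: two time slices of a three-segment network.
W1 = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]
W2 = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
_, _, sum_in, sum_out = degrees([W1, W2])
print(sum_in, sum_out)
```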

Data Description.
The expressway network of Shanghai, China, is used to test the proposed causality analysis method. Traffic flow data are collected by loop detectors distributed over the network, as shown in Figure 5. The data preprocessing, including data aggregation, missing data estimation, and noise reduction, is described in detail in our previous study [32]. Traffic speed data from May 6, 2014, for 432 road segments of the expressway network are used to verify the proposed spatiotemporal traffic causality approach. The time interval of the traffic data is 10 min.

Sensitivity Analysis for Transfer Entropy.
In this study, considering the limited computational resources, the parameters $k$ and $l$ are set to 1. The maximum time delay is set to 40 minutes; with 10-min time intervals, the prediction horizon $h$ therefore takes values from 1 to 4. The transfer entropy between any two segments can be calculated in both directions. For example, $r_{220}$ is the upstream road segment of $r_{223}$, and the variations of $TE_{r_{220}\to r_{223}}$ and $TE_{r_{223}\to r_{220}}$ are shown in Figure 6.
The transfer entropy in both directions between $r_{220}$ and $r_{223}$ is listed in Table 1. For example, $TE_{r_{223}\to r_{220}}$ is larger than $TE_{r_{220}\to r_{223}}$ at the evening peak time 18:30, whereas $TE_{r_{220}\to r_{223}}$ is larger than $TE_{r_{223}\to r_{220}}$ at the nonpeak time 13:30. Therefore, the downstream segment $r_{223}$ has a more obvious impact on the upstream segment $r_{220}$ during evening peak congestion, while the upstream segment $r_{220}$ has a more obvious impact on the downstream segment $r_{223}$ during the nonpeak period.
In addition to the parameters $k$, $l$, and $h$, the causality coefficient threshold $\varepsilon$ also needs to be set. When the spatiotemporal causality graph is constructed, the number of remaining directed edges with significant causality decreases as $\varepsilon$ increases. The transfer entropy and causality coefficients between all pairs of segments are calculated from the traffic speed data set of the entire road network on May 6, 2014. The mean $\mu_\rho$ and standard deviation $\sigma_\rho$ of the causality coefficients for the surrogate sequences $X'$ and $Y'$ are 0.0151 and 0.0116, respectively, so the threshold is $\varepsilon = \mu_\rho + 3\sigma_\rho = 0.0151 + 3 \times 0.0116 = 0.0499 \approx 0.05$. The effect of different settings of $\varepsilon$ on the structures of the causality graphs is shown in Table 2.
Considering computational complexity, the three key parameters are set to $k = 1$, $l = 1$, and $h = 1$. The time-varying transfer entropy in both directions, $TE_{r_{220}\to r_{223}}$ and $TE_{r_{223}\to r_{220}}$, is shown in Figure 7, and the variation of the traffic causality coefficient $\rho_{r_{220},r_{223}}$ is shown in Figure 8. The transfer entropy between these adjacent road segments varies considerably over time.
The direction and amount of information transmission differ markedly between time periods. For example, between 10:00 and 18:00 information is transmitted mainly in the direction $r_{223}\to r_{220}$, whereas between 18:00 and 21:00 it is transmitted mainly in the direction $r_{220}\to r_{223}$. The distribution of causality coefficients for the network traffic flow is concentrated, as shown in Figure 9.

Spatiotemporal Traffic Causality Analysis.
Transfer entropy values among road segments fluctuate greatly in time and space, reflecting variations in information transfer. Each road segment may be a potential cause or effect segment. The influence coefficient $R_{out}(i)$ and affected coefficient $R_{in}(i)$ of all road segments in the morning peak, evening peak, and nonpeak periods are shown in Figure 10, where each bubble represents one road segment and the bubble size denotes its average speed. The position of each bubble is determined by both $R_{out}(i)$ and $R_{in}(i)$, which quantitatively describe the causal interaction of the network traffic flow state. The bubbles are most scattered in the morning peak period. Table 3 lists the potential bottleneck segments with the largest $R_{in}(i)$ and the critical road segments with the largest $R_{out}(i)$, and their spatial locations in the road network are shown in Figure 11. The critical road segments are mainly distributed in the central and western regions of Shanghai.
Not all traffic causal correlations are significant. In this section, the threshold $\varepsilon$ is set to 0.05 for the causality significance test. The sums of input degrees $SumD_{in}(i)$ and output degrees $SumD_{out}(i)$ are calculated for the morning peak period (7:00-10:00), the nonpeak period (13:00-16:00), and the evening peak period (17:00-20:00); their distributions are shown in Figures 12 and 13, respectively. The output degree distribution is more concentrated than the input degree distribution. On the whole, $SumD_{in}(i)$ is larger in the morning peak hours than in the evening peak hours, whereas in the nonpeak hours its distribution is scattered with smaller values. Thus, during peak congestion, road segments are more likely to be affected by the traffic states of other road segments. Similarly, $SumD_{out}(i)$ is larger in the morning peak hours than in the evening peak and nonpeak hours, so road segments are then more likely to affect the traffic states of other road segments. In general, the causal interaction among road segments is more significant during peak periods than during nonpeak periods.

Spatiotemporal Traffic Causality Visualization and Evaluation.
The spatiotemporal traffic causality graphs of the Shanghai expressway network at 8:30 (morning peak), 13:30 (nonpeak), and 18:30 (evening peak) are visualized in Figure 14, showing the spatial distribution of input and output degrees. The circles represent the output degrees of the expressway segments: the larger the circle, the larger the output degree and the more significant the impact on the traffic states of other road segments in the network. The directed causal edges describe the cause-effect relationship between two segments; the arrowhead of each edge points to the affected road segment, and its tail connects to the cause road segment. The density of arrows around a circle therefore reflects the impact of other road segments on the target segment. The distributions of circles and directed causal edges differ between periods. The output degrees of road segments in the peak periods are generally larger than those in the nonpeak period. In the morning peak, road segments with larger output degrees are mainly located on the north-south expressway, the east-west expressway, the inner ring road, and the southern middle ring road. In the evening peak, road segments with larger output degrees are mainly located on the eastern inner ring expressway. Generally, these critical road segments are mainly located at the intersections of arterial roads, which undertake the convergence and dispersion of large traffic flows and have more significant impacts on the traffic states of other roads. The circles in the eastern outer ring area are smaller than those in other areas, and the directed causal edges are also sparser, especially during the nonpeak period, indicating that the road segments in this area have no significant cause-effect interaction with other road segments.
The main reason for this phenomenon is that the eastern region developed relatively late and has weaker network accessibility, so its road segments are less likely to be affected by the traffic states […] Figure 15. The specific spatial structures of $r_{302}$, $r_{391}$, $r_{286}$, and $r_{181}$ are shown in Figure 16. The yellow segments are the critical road segments, and the green segments are the entrance and exit ramps or the interchange ramps. Road segments $r_{302}$, $r_{391}$, and $r_{286}$ are located at intersections and near expressway entrances and exits, where traffic flows are intricate. Segment $r_{181}$ is located in the middle of the east-west expressway, the main corridor in Shanghai, which carries the largest traffic volume in the east-west direction. These critical road segments are usually congested, which may affect the traffic states of other segments in the road network.
Generally, the identified critical road segments are consistent with the spatial structure and traffic conditions of the road network, which supports the suitability of transfer entropy for evaluating the causal interaction of network traffic flow. Real-time traffic control measures can be applied to the time-varying critical road segments or potential bottlenecks to prevent traffic jams and improve operational efficiency. Furthermore, potential flaws in the road network structure can be remedied in the future.

Conclusions
This study proposes a novel dynamic spatiotemporal causality modeling framework that represents the information transmission of network traffic flow and identifies the potential bottlenecks and critical road segments of the existing road network. The Gaussian kernel density estimation method is used to calculate the transfer entropy among different road segments. To reveal the dynamic variation of traffic causality, a sliding window technique is applied to the transfer entropy calculation. A causality significance test is then performed to construct the spatiotemporal causality graphs.
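The pipeline summarized above (Gaussian kernel density estimation of transfer entropy, a sliding window, and a significance test) can be sketched as follows. This is a minimal sketch under explicit assumptions, not the authors' implementation. It assumes a history length of 1, a Gaussian product kernel with Scott's-rule bandwidths, and a plug-in estimator that evaluates each density at the sample points. It also substitutes a shuffle-surrogate test for the paper's significance test, whose exact form is not given in this section. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def _kde_at_samples(data):
    """Gaussian product-kernel KDE evaluated at each sample point.

    data: array of shape (n, d). The bandwidth per dimension follows Scott's rule,
    h = std * n ** (-1 / (d + 4)). Returns densities of shape (n,).
    """
    n, d = data.shape
    std = data.std(axis=0, ddof=1)
    std = np.where(std > 0, std, 1.0)                  # guard against constant columns
    h = std * n ** (-1.0 / (d + 4))
    diff = (data[:, None, :] - data[None, :, :]) / h   # pairwise scaled differences
    kern = np.exp(-0.5 * (diff ** 2).sum(axis=2))      # (n, n) kernel matrix
    return kern.sum(axis=1) / (n * np.prod(h) * (2 * np.pi) ** (d / 2))

def transfer_entropy(x, y):
    """Plug-in estimate of TE_{Y->X} in nats, history length 1.

    TE = mean over t of log[p(x_{t+1}, x_t, y_t) p(x_t) / (p(x_t, y_t) p(x_{t+1}, x_t))]
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    x1, x0, y0 = x[1:], x[:-1], y[:-1]
    p_xxy = _kde_at_samples(np.column_stack([x1, x0, y0]))
    p_x = _kde_at_samples(x0[:, None])
    p_xy = _kde_at_samples(np.column_stack([x0, y0]))
    p_xx = _kde_at_samples(np.column_stack([x1, x0]))
    return float(np.mean(np.log(p_xxy * p_x / (p_xy * p_xx))))

def sliding_te(x, y, window, step):
    """TE_{Y->X} recomputed on each sliding window [s, s + window)."""
    starts = range(0, len(x) - window + 1, step)
    return np.array([transfer_entropy(x[s:s + window], y[s:s + window]) for s in starts])

def te_pvalue(x, y, n_surr=50, seed=0):
    """Shuffle-surrogate test (assumed): p-value of observed TE vs. TE with y permuted."""
    rng = np.random.default_rng(seed)
    te_obs = transfer_entropy(x, y)
    null = np.array([transfer_entropy(x, rng.permutation(y)) for _ in range(n_surr)])
    return te_obs, (1 + np.sum(null >= te_obs)) / (1 + n_surr)

# Synthetic example: the downstream series x is driven by the previous value of y.
rng = np.random.default_rng(1)
n = 300
y = rng.normal(size=n)
x = np.zeros(n)
for t in range(n - 1):
    x[t + 1] = 0.8 * y[t] + 0.3 * rng.normal()
print(transfer_entropy(x, y), transfer_entropy(y, x))  # TE_{Y->X} should exceed TE_{X->Y}
print(sliding_te(x, y, window=100, step=50).shape)     # (5,)
```

Applying `te_pvalue` to every ordered pair of segments within each window, and keeping pairs with p < ε, would yield one directed causality graph per window. The plug-in estimator reuses each sample in its own density estimate, which introduces a small bias at short window lengths, and the different bandwidths used for the joint and marginal densities are a simplification.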
This study can effectively extract potential nonlinear causal relationships from massive traffic data and provides a data-driven research framework for identifying the critical road segments and potential bottlenecks in the road network. The detected dynamic spatiotemporal traffic causality can be combined with traffic prediction in real-time traffic management systems. The experimental results based on traffic sensor data from the Shanghai expressway network indicate that the transfer entropy of network traffic flow is asymmetric and fluctuates significantly over space and time. The output and input degrees in the peak time are generally larger than those in the nonpeak hours, reflecting more information transfer and stronger causal interaction in the network traffic flow. The critical road segments with larger output degrees are mainly located at intersections, where they bear the convergence and dispersion of large traffic flows and have significant impacts on the traffic states of other segments in the road network. The causal interaction among road segments with smooth traffic conditions in the nonpeak time is weaker than that in the peak time.
This study does not consider the connectivity of the road network. In future work, we plan to integrate the network topology into the spatiotemporal traffic causality analysis and then develop a model for identifying traffic congestion propagation patterns. In addition, the traffic causality analysis can be further combined with traffic congestion prediction.

Data Availability
The data used to support the findings of this study are not publicly available. Please contact the corresponding author for details.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.