Delay causality network in air transport systems

To better understand the mechanism of flight delay propagation at the system-level, we built a delay causality network (DCN) based on the Granger causality test. Through topological analysis of DCNs, we found that only about a quarter of airports were involved in delay propagation during the peak travel period and large airports affected by many upstream airports impact fewer downstream airports. Furthermore, temporal analysis of DCNs indicates that the culprits of delay propagation in the air transport system are not a fixed set of airports; instead, they vary daily depending on the operational environment.


Introduction
With increasing globalization, the world's civil aviation industry has been growing at a fast pace (Barnhart et al., 2009). The flight delay problems that have resulted from the rapid development of the civil aviation industry have become a worldwide challenge (Czerny, 2010). Flight delays have negative impacts on several aspects, such as passengers, airlines, and air transport systems (Ahmadbeygi et al., 2008). Delayed flights throw travel plans into disarray, often making passengers dissatisfied with the airlines (Britto et al., 2012). Airline companies also suffer, not only paying for the resource waste caused by delays but also having to invest more to improve passenger satisfaction (Zou and Hansen, 2014). Due to flight delays, air transport systems are faced with reduced efficiency and increased security risk, leading to economic loss and environmental pollution (JEC report, 2008). A recent study reported that the total direct cost induced by flight delays was nearly $28.9 billion in the United States in 2007 (Ball et al., 2010) (see Table 1).
An initial flight delay can be attributed to several causes, such as air carrier issues, extreme weather, and air traffic control (BTS report, 2017). A propagated delay, however, occurs because flights share connected resources (Kafle and Zou, 2016). The most common shared resource is the aircraft (Zou and Hansen, 2014). Because the same aircraft flies multiple flight legs, the delay of an earlier flight can affect the subsequent flights of the same aircraft (Lan et al., 2006). Passengers are another shared resource: if connecting passengers are still on a delayed inbound flight, the next flight may be held to wait for them. Flight crews also switch between different aircraft, causing the delay from one flight to propagate across multiple flights (Beatty et al., 1999; Wang et al., 2017). For these reasons, a small initial delay may grow into larger delays later and lead to much worse situations (Li et al., 2014; Meng and Zhou, 2011). Therefore, research on the mechanism of delay propagation is timely yet challenging.

Literature review
The traditional Approximate Network Delays (AND) model was originally conceptualized in a prototype form of three-airport networks (Malone, 1995). Pyrgiotis et al. (2013) enriched the AND model and investigated the delay propagation based on 34 US airports. Their results showed that delay propagation tends to mitigate daily airport demand profiles and push more demands into late evening hours. Nayak (2010, 2011) used the multivariate simultaneous equation regression (MSER) model to study the impact of a single airport on the others, and vice versa. Their results revealed that major airports have a higher impact on the average delay. Later, Hao et al. (2014) used the MSER model and the Federal Aviation Administration (FAA) system-wide analysis capability (SWAC) simulation model to quantify the impact of the three airports in the New York area on delays throughout the airport network, finding that the delays within the New York area are lower than expected. Fleurquin et al. (2013) developed the maximum connected subgraph of congested airports for assessing the level of delays across the entire system. They also introduced a model that comprehends aircraft rotation, passenger connectivity, and airport congestion as well as crew rotation to simulate the propagation of delays. This model can simulate the congestion of the system accurately. Then, they proposed a new model involving slot reallocation and swapping to simulate the propagation of reactionary delays in Europe (Campanelli et al., 2014). Afterward, Campanelli et al. (2016) used these two models to simulate flight delay propagation and assessed the effect of disruptions in the US and European aviation networks.
Despite the advances in understanding flight delay propagation, few studies have investigated delay propagation by considering the interdependence of delay time series. Thus, a systematic framework probing the causal relationship among airports continues to be elusive. Recent years have witnessed a growing interest in the inference of causal interactions in complex systems (Wahl et al., 2017; Stokes and Purdon, 2017). Thanks to theoretical innovation, the application fields of causality tests have grown to include biology (Stokes and Purdon, 2017), ecology (Sugihara et al., 2012), social sciences (Frank et al., 2018), physics (Martin et al., 2016) and economics (Song et al., 2008). All of these studies have a common theme: the details of the temporal mutual influence of units are difficult to understand, and causality tests are used to detect the interaction patterns in dynamical systems by time series analysis. These prior results have shown that causality tests yield new insights into large-scale complex systems. The air transportation system is also a typical large-scale complex system. Due to its complexity, the mechanisms of delay propagation are not fully understood, especially the interdependencies of different airports. Causal analysis may provide a new perspective on this problem. In this study, we adopted Granger causality (Granger, 1969) as the main method, owing to its pioneering contribution to the causation problem (Frank et al., 2018). We then built a delay causality network (DCN) based on the Granger causality test and investigated the topological and temporal properties of the DCNs, offering insights into the features of specific airports.

Contributions and outline
We apply a theoretical framework of causality testing to study delay propagation in the complex airport system. By considering the delay propagation problem from the perspective of delay time-series interdependence, this approach can capture the interaction patterns of delay between different airport pairs. Due to the large number of airports and their complex relationships, the features of delay propagation cannot be understood from information at the individual airport level alone. We construct DCNs to characterize the global structure and dynamics of delay propagation, revealing the direction and range of delay propagation at the network level. Although air transport systems have been abstracted to directed/undirected, weighted/unweighted networks in previous studies, existing studies have mostly considered static graphs, meaning that the dynamics of the network could not be considered explicitly (Ren and Li, 2018). In our research, the edges of a DCN are the results of daily temporal interactions, representing the functional connectivity and underlying operational conditions. The theory and application of complex networks are used to further reveal the properties of DCNs. We use the degree, reciprocity, clustering coefficient, community, and other complex network metrics to describe delay propagation and find that only about a quarter of airports in China face delay propagation during the peak travel period. The results also reveal that large airports affected by many upstream airports actually impact fewer downstream airports. By studying the connected clusters formed by high-degree airports, we find that the culprits of delay propagation in the air transport system are not a fixed set of airports; instead, they vary daily depending on the operational environment. These findings not only help us understand the complex aviation system better, but also support air traffic managers in decision-making.
According to the findings, the air traffic managers can develop effective countermeasures to prevent the delay propagation in particular links to alleviate network-wide flight delay. Furthermore, through causality test, if one variable acts as the cause for another one, actively intervening on the first would lead to the changes in the second. Thus, policy makers could potentially adopt the proposed method to analyze the interaction of airports in order to identify critical ones. It will help them to make decisions on resource allocation for improving airport capacity.
The remainder of the paper is organized as follows: Section 2 introduces the methodology, including the Granger causality test, construction of the DCN, and network analysis; Section 3 reports the results; and Section 4 presents the conclusions and discussion.

Delay time series
We use delay time series to represent the on-time performance of an airport. We focus on daily time series because daily interactions are the finest temporal resolution in the flight data set. For airport i, we construct its delay time series $Y_i$ by splitting one day into 24 time intervals. The value of each time interval is the average delay $d_i(t)$, defined as

$$d_i(t) = \frac{D_i(t) + h\,c_i(t)}{s_i(t)},$$

where $D_i(t)$ represents the total delay of departure flights at airport i during $(t, t+1)$, and $c_i(t)$ and $s_i(t)$ represent the numbers of cancelled flights and scheduled departure flights at airport i during $(t, t+1)$, respectively. Traditional methods do not consider flight cancellations. However, ignoring cancellations may bias the picture of airport operations under extreme conditions. Cancellations should be taken into account as a delay metric for assessing the performance of air transport systems (Xiong and Hansen, 2009). According to the regulations of the FAA, the Civil Aviation Administration of China (CAAC), and the European Aviation Safety Agency (EASA), h can be taken as the equivalent delay time of a cancellation (h = 180 min). With this method, a daily delay time series can be constructed as a series of data points $d_i(t)$ indexed sequentially as a function of time.
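The construction of one airport's daily series can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name, the input layout (one list entry per hourly interval), and the handling of hours with no scheduled departures are assumptions; only the cancellation penalty h = 180 and the ratio of penalized total delay to scheduled departures come from the text.

```python
from typing import List

CANCELLATION_PENALTY = 180.0  # h, the equivalent delay of one cancellation (value from the text)


def average_delay_series(total_delay: List[float],
                         cancelled: List[int],
                         scheduled: List[int],
                         h: float = CANCELLATION_PENALTY) -> List[float]:
    """Return the average delay d_i(t) for each interval of one day at one airport."""
    series = []
    for delay_sum, n_cancelled, n_scheduled in zip(total_delay, cancelled, scheduled):
        if n_scheduled == 0:
            # Assumption: an interval without scheduled departures contributes zero delay.
            series.append(0.0)
        else:
            series.append((delay_sum + h * n_cancelled) / n_scheduled)
    return series


# Hypothetical three-interval excerpt; a full day would have 24 entries.
example = average_delay_series(
    total_delay=[300.0, 0.0, 450.0],
    cancelled=[0, 0, 1],
    scheduled=[10, 0, 9],
)
```

In the third interval a single cancellation adds 180 to the 450 of accumulated delay, so the nine scheduled departures average 70.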

Causality test
In our research, causality reveals the impacts between airport pairs and reflects the interaction of airport delays. If the delays observed at one airport can explain the delays appearing at a second one after several hours, there exists a causality relationship. Here, Granger causality (GC) helps determine the existence and direction of the influence between two airports based on their delay time series. A time series $Y_j$ is said to cause the time series $Y_i$ if it can be shown that the values in $Y_j$ provide statistically significant information about the future values in $Y_i$.
First, the Granger causality test uses an unrestricted regression equation to obtain the residual sum of squares $RSS_U$:

$$Y_i(t) = \sum_{m=1}^{p_{ij}} a_m Y_i(t-m) + \sum_{m=1}^{p_{ij}} b_m Y_j(t-m) + \varepsilon(t),$$

where $\varepsilon(t)$ is the error term, and $a_m$ and $b_m$ are coefficients. In addition, $p_{ij}$ stands for the lag, indicating that the current value is regressed on the values in the past $p_{ij}$ hours. Then, the null hypothesis that j does not cause i is defined as

$$H_0: b_1 = b_2 = \cdots = b_{p_{ij}} = 0.$$

Second, we apply a restricted regression equation to obtain the residual sum of squares $RSS_R$:

$$Y_i(t) = \sum_{m=1}^{p_{ij}} a_m Y_i(t-m) + \varepsilon(t).$$

Finally, the F-statistic and p-value are adopted to test the null hypothesis:

$$F = \frac{(RSS_R - RSS_U)/p_{ij}}{RSS_U/(w - 2p_{ij} - 1)},$$

where w is the sample size of each time series. When the p-value is less than the chosen significance level α (5% by default), the null hypothesis is rejected. In that case the lagged values of $Y_j$ belong in the regression, and $Y_j$ is the cause of $Y_i$; hence, the delay at airport i is partly attributed to airport j.
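The two regressions and the resulting F statistic can be sketched in dependency-free Python. This is an illustrative sketch under stated assumptions, not the authors' code: the regressions have no intercept (the text names only the lag coefficients and the error term), the denominator uses w − 2p − 1 degrees of freedom as in the common textbook form of the statistic, and the helper names `_solve`, `_rss`, and `granger_f` are hypothetical. The p-value step is omitted because it needs the F distribution.

```python
import random
from typing import List, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (a must be non-singular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def _rss(rows: List[List[float]], target: List[float]) -> float:
    """Residual sum of squares of the least-squares fit of target on rows (normal equations)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    beta = _solve(xtx, xty)
    return sum((y - sum(bi * xi for bi, xi in zip(beta, r))) ** 2
               for r, y in zip(rows, target))


def granger_f(y_i: List[float], y_j: List[float], lag: int) -> Tuple[float, int, int]:
    """F statistic and degrees of freedom for the null 'Y_j does not cause Y_i'."""
    w = len(y_i)
    target = y_i[lag:]
    # Restricted model: only the past values of Y_i.
    own = [[y_i[t - m] for m in range(1, lag + 1)] for t in range(lag, w)]
    # Unrestricted model: past values of Y_i and of Y_j.
    both = [own_row + [y_j[t - m] for m in range(1, lag + 1)]
            for own_row, t in zip(own, range(lag, w))]
    rss_r = _rss(own, target)
    rss_u = _rss(both, target)
    dof = w - 2 * lag - 1  # assumed denominator degrees of freedom
    f_stat = ((rss_r - rss_u) / lag) / (rss_u / dof)
    return f_stat, lag, dof


# Synthetic 24-point series in which Y_j drives Y_i with a one-step lag.
rng = random.Random(0)
y_j = [rng.uniform(0.0, 60.0) for _ in range(24)]
y_i = [0.0]
for t in range(1, 24):
    y_i.append(0.5 * y_i[t - 1] + 0.8 * y_j[t - 1] + rng.uniform(-1.0, 1.0))
f_stat, dfn, dfd = granger_f(y_i, y_j, lag=1)
```

A large `f_stat` relative to the critical value of the F distribution with (`dfn`, `dfd`) degrees of freedom rejects the null hypothesis. Solving the normal equations directly keeps the sketch short; a production implementation would more likely rely on an established statistics library.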

Individual test
Through the causality test process above, we can assess the delay time series of each airport pair for each day. However, before applying the method to delay propagation, the following criteria need to be met:
• Because causality detection is based on regression, the current delays of an airport should not be independent but closely related to its own delays in earlier time periods as well as to those of other airports. This is reasonable because the current delays at one airport cannot be separated from the delays that occurred previously at the same airport or at upstream airports.
• Lag $p_{ij}$ equals the average flying time between the airport pair plus the turnaround time in airport j. Airport activities are represented by departure delay time series, the interactions of which are captured by the pairwise causality test. We postulate that the impact remains effective until the aircraft of the connecting flight (from airport i to airport j) departs from airport j, i.e., executes its subsequent flight. Statistical analysis shows that the turnaround time of nearly 95% of the flights is less than 120 min. Therefore, we postulate that the lag $p_{ij}$ equals the average flying time between the airport pair plus 120 min.
When these two criteria are satisfied, the causality relationship of an airport pair can be determined. Here, we use four airports as examples to demonstrate the method for determining causality relationships. For an airport pair, we construct their departure delay time series and obtain the average flight time between them. Then, GC is used to conduct the pairwise test. According to Fig. 1, when the p-value is less than 5%, the causality relationship can be accepted, and a directed edge connects these two airports. The causality relationship between airport 1 and airport 3 represents the interactions caused by delay propagation, i.e., delays in airport 1 are affected by airport 3. This relationship is not exclusive: delays in airport 1 could be partly attributed not only to airport 3 but also to other airports. Moreover, there are many airports in the air transport system; thus, determining the relationships among different airports is very complicated. Pairwise analysis cannot handle the complexity of system-level delay propagation.

Network analysis
As aforementioned, due to the large number of airports and complex interactions, the features of delay propagation cannot be understood from the information at the individual airport level alone. Complex network theory and its associated metrics and tools present an apposite approach to study the air transport system beyond what is offered by classical techniques (Cook et al., 2015). Thus, a network-level analysis is adopted to capture the global structure of the functional interaction.
We construct a DCN of one day using the above method and analyze it with the tools provided by network science (Cardillo et al., 2013;Wang et al., 2014;Du et al., 2016). Fig. 2(a) presents a sample network to demonstrate the network analysis method. Some topologies with practical significance are introduced here to help analyze a system-level delay propagation.
Degree of an airport reflects the number of airports that have delay propagation links with it. In a directed network, airport i has in-degree ($k_i^{in} = \sum_{j=1}^{N} a_{ji}$) and out-degree ($k_i^{out} = \sum_{j=1}^{N} a_{ij}$) links, where $a_{ij} = 1$ if airport i affects airport j and 0 otherwise. They denote the number of airports affecting airport i and the number of airports affected by airport i, respectively. The total degree of node i is $k_i = k_i^{in} + k_i^{out}$. For example, in Fig. 2(b), airport 1 is affected by airport 3 and affects airports 2 and 3. Thus, $k_1^{in}$ is one, whereas $k_1^{out}$ is two. The average degree of the network in Fig. 2(a) is $\langle k^{in} \rangle = \langle k^{out} \rangle = M/N = 1.5$, where M is the number of edges and N is the number of airports, meaning that each airport affects 1.5 others on average.
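The degree definitions translate directly into a counting loop over the edge list. The sketch below is illustrative only: the `degrees` helper and the three-node edge list are hypothetical, and only the links of airport 1 follow the worked example (affected by airport 3, affecting airports 2 and 3). An edge (i, j) is taken to mean that delays at airport i affect airport j.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (i, j): delays at airport i affect airport j


def degrees(nodes: List[int], edges: List[Edge]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Return (k_in, k_out) for every airport in a directed network."""
    k_in = {n: 0 for n in nodes}
    k_out = {n: 0 for n in nodes}
    for source, target in edges:
        k_out[source] += 1  # source affects one more airport
        k_in[target] += 1   # target is affected by one more airport
    return k_in, k_out


# Hypothetical edge list reproducing the links of airport 1 from the example.
nodes = [1, 2, 3]
edges = [(3, 1), (1, 2), (1, 3)]
k_in, k_out = degrees(nodes, edges)
mean_degree = len(edges) / len(nodes)  # <k_in> = <k_out> = M / N
```

Because every edge contributes once to an out-degree and once to an in-degree, the two averages always coincide at M/N.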
Reciprocity parameter represents the bidirectional nature of delay propagation links between airport pairs. As shown in Fig. 2(c), reciprocity means that airport i affects airport j while airport j also affects airport i ($a_{ij} = a_{ji} = 1$). The parameter R, defined following da Rocha (2009), measures the overall symmetry of a directed network. The maximum R is 1, implying that the delay propagation between all airport pairs is bidirectional. Larger R values indicate that a network is more symmetric.
Clustering coefficient is used to quantify the inherent clustering tendency of airports (Watts and Strogatz, 1998; Fagiolo, 2007). The clustering coefficient of an airport is the fraction of pairs of its neighbor airports (airports with delay propagation links to it) that themselves have a direct delay propagation link (i.e., it reflects the number of triangles in the network). In Fig. 2(d), the two neighbor airports of Airport 2 interact with each other, forming a clique, whereas the two neighbor airports of Airport 5 have no relationship. Thus, the clustering coefficient of node 2 is larger than that of node 5. For a network, the overall clustering coefficient is denoted $C_D$; it is 0.177 for the network in Fig. 1(e).
Largest Connected Cluster is introduced to represent the extent of delay propagation (i.e., the disaster area). The largest connected cluster (Broder et al., 2000) is a group in which airports are connected by propagation links. To make it represent the disaster area of delay propagation, we set an effective baseline for the members of a cluster: an airport is considered only when it affects several other airports ($k_i^{out}$ exceeds a threshold). If we postulate that the threshold is $\langle k^{out} \rangle = 1.5$, Fig. 2(e) shows that the largest connected cluster is formed by airports 1, 2, and 3. The size of the largest connected cluster, $M_d$, is 3. To measure the cluster similarity of daily networks, the Jaccard index J is introduced, which is defined as

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|},$$

where both A and B are finite sets comprising the airport members. J equals one if the airport sets of the two clusters are the same, and zero if they have no airport in common.
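The cluster extraction and the Jaccard comparison can be sketched as follows. Several choices here are assumptions rather than statements from the text: connectivity is treated as weak (edge direction is ignored once the high out-degree airports are selected), the out-degree filter is applied before components are computed, two empty sets are given a Jaccard index of one, and the function names and the small edge list are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]  # (i, j): delays at airport i affect airport j


def largest_cluster(edges: Iterable[Edge], threshold: float) -> Set[int]:
    """Largest weakly connected group among airports with k_out above the threshold."""
    edge_list = list(edges)
    k_out: Dict[int, int] = {}
    for source, _ in edge_list:
        k_out[source] = k_out.get(source, 0) + 1
    members = {n for n, k in k_out.items() if k > threshold}
    # Undirected adjacency restricted to the qualifying airports.
    neighbours: Dict[int, Set[int]] = {n: set() for n in members}
    for source, target in edge_list:
        if source in members and target in members:
            neighbours[source].add(target)
            neighbours[target].add(source)
    best: Set[int] = set()
    seen: Set[int] = set()
    for start in members:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:  # depth-first search over one component
            node = stack.pop()
            for nxt in neighbours[node] - component:
                component.add(nxt)
                stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical network: airports 1, 2 and 3 each affect two others, airport 4 only one.
edges = [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (4, 5)]
cluster = largest_cluster(edges, threshold=1.5)
similarity = jaccard(cluster, {2, 3, 4})
```

With the threshold of 1.5 only airports 1, 2 and 3 qualify, so the cluster has size 3; comparing it with the set {2, 3, 4} gives two shared airports out of four in total.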
Community is used to evaluate whether the delay propagation among airports can be divided into several sub-regions, each of which has dense delay propagation links internally and sparse links with the rest of the system (Arenas et al., 2007). Fig. 2(g) shows the two communities of the sample network. Furthermore, Modularity is designed to measure the strength of division of a network into communities (Newman, 2006). $Q_d$ in a directed DCN is defined as

$$Q_d = \frac{1}{M} \sum_{i,j} \left[ a_{ij} - \frac{k_i^{out} k_j^{in}}{M} \right] \delta(c_i, c_j),$$

where $c_i$ denotes the community of vertex i. The δ function yields one if vertices i and j are in the same community, and is zero otherwise. M is the total number of edges, and $Q_d$ is 0.280 for the network in Fig. 1.
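For a given partition, modularity reduces to a double loop over airport pairs in the same community. The sketch below assumes the standard directed form, in which the observed link $a_{ij}$ is compared with the expectation $k_i^{out} k_j^{in} / M$; the function name, the community labels, and the six-node network are hypothetical, and the community detection step itself is not covered.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (i, j): delays at airport i affect airport j


def directed_modularity(edges: List[Edge], community: Dict[int, int]) -> float:
    """Directed modularity of a partition given as {airport: community label}."""
    m = len(edges)
    k_in = {n: 0 for n in community}
    k_out = {n: 0 for n in community}
    for source, target in edges:
        k_out[source] += 1
        k_in[target] += 1
    edge_set = set(edges)
    q = 0.0
    for i in community:
        for j in community:
            if community[i] != community[j]:
                continue  # delta(c_i, c_j) = 0
            a_ij = 1.0 if (i, j) in edge_set else 0.0
            q += a_ij - k_out[i] * k_in[j] / m
    return q / m


# Hypothetical network: two directed triangles joined by the single link 3 -> 4.
edges = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4), (3, 4)]
two_groups = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
one_group = {n: 0 for n in range(1, 7)}
q_two = directed_modularity(edges, two_groups)
q_one = directed_modularity(edges, one_group)
```

Splitting the network into its two triangles gives 18/49 (about 0.37), while putting every airport in one community gives zero, which is the baseline any partition has to beat.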
Network motifs are used to show the local relationship pattern among any three airports. They are defined as recurrent sub-graphs G' in network G (Shen-Orr et al., 2002). For example, Fig. 2(f) shows that the three airport groups have a similar relationship pattern. One important tool to evaluate the significance level of motifs is the Z-score, which is defined as follows:

$$Z(G') = \frac{F(G') - \overline{F_r(G')}}{\sigma_r(G')},$$

where $F(G')$ is the frequency of G' in the DCN, and $\overline{F_r(G')}$ and $\sigma_r(G')$ are the mean and standard deviation of its frequency in randomized networks.
Network Randomization is used to construct a random network that is compared with the DCN. Starting with a given number of nodes (8 nodes, equal to that of the sample network) and zero edges, at each step one new edge is chosen uniformly from the set of possible edges until the number of edges is equal to 12. During the process of randomization, self-connections and duplicated edges are prohibited.
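The randomization procedure and the Z-score comparison can be sketched with the Python standard library. This is an illustration under assumptions: drawing all edges at once without replacement is used as an equivalent of adding one uniformly chosen new edge per step, the sample standard deviation is used for the spread of the randomized values, and the function names are hypothetical. Motif counting itself is not included.

```python
import random
import statistics
from typing import List, Tuple

Edge = Tuple[int, int]


def random_directed_network(n_nodes: int, n_edges: int, rng: random.Random) -> List[Edge]:
    """Draw n_edges distinct directed edges uniformly, without self-connections."""
    possible = [(i, j) for i in range(n_nodes) for j in range(n_nodes) if i != j]
    # Sampling without replacement rules out duplicated edges.
    return rng.sample(possible, n_edges)


def z_score(observed: float, random_values: List[float]) -> float:
    """Standardize an observed statistic against values from randomized networks."""
    mean = statistics.mean(random_values)
    spread = statistics.stdev(random_values)  # sample standard deviation; must be non-zero
    return (observed - mean) / spread


# Same size as the sample network described in the text: 8 nodes and 12 edges.
network = random_directed_network(8, 12, random.Random(0))
# Hypothetical motif counts: 5 in the observed network, 1 to 3 in randomized ones.
z = z_score(5.0, [1.0, 2.0, 3.0])
```

In practice the statistic of interest (a motif count, reciprocity, clustering, or modularity) would be evaluated on many such randomized networks, for example 1000, before the comparison is made.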

Data description
The dataset analyzed in this paper was provided by the CAAC, comprising all flight information in July 2012 in China. July is the peak travel period in China and flight delays are typically severe. The database contained 219,845 domestic scheduled flights connecting 224 airports. The average delay of all flights during the study period was 42 min; July 1 was the best day, with an average delay of 23.1 min, and July 22 was the worst day, with an average delay of 65.1 min. Details of the data set are shown in Table 2.

Analysis of China air transport system

Basic properties of DCN
To perform a system-level analysis of delay propagation, we build a DCN using the pairwise GC test based on the flight data described in Table 2. We focus on the daily time series and construct $DCN_d$ for day d. There are 224 airports in the China air transport system. For the DCN of each day, 50,176 (224 × 224) GC tests are performed. July 22 was the worst day in terms of flight delays, so $DCN_{22}$ serves as a typical network to show the delay propagation properties. After removing airports with no connections in $DCN_{22}$, we find that $DCN_{22}$ contains only 53 nodes and 242 edges (Fig. 3), which means that only about a quarter of airports have delay propagation links with other airports. According to the airport category definitions of the FAA, the airports in $DCN_{22}$ consist of 32 large airports, 16 medium airports, and 5 small airports (FAA, 2017). Although small airports represent the majority of the system, they are rarely involved in the spread of delays. Large and medium airports are easily embroiled in delay propagation.
For this specific DCN 22 , we can answer the following questions about delay propagation by using network analysis tools.
1. How many airports does each airport affect or is it affected by? For a certain airport i in $DCN_{22}$, $k_i^{in}$ is the number of airports that each partly cause a delay in airport i, while $k_i^{out}$ is the number of airports whose delay is partly caused by airport i. Here, $\langle k^{in} \rangle = \langle k^{out} \rangle = M/N = 242/53 \approx 4.57$, indicating that each airport affects approximately 5 airports and is affected by about 5 airports.
2. Are the delay propagation links between airport pairs bidirectional? The reciprocity parameter of $DCN_{22}$ is R = 0.20. For comparison, 1000 randomized networks with the same number of nodes and edges were generated by network randomization; their average R' is only 0.01, which is much less than R = 0.20. Thus, $DCN_{22}$ is more symmetric than expected by chance. One possible reason is that two-way flights between airport pairs cause delays to propagate in both directions.
3. What is the clustering tendency of airports? The overall clustering coefficient $C_D$ is 0.191, more than twice that of a randomized network ($C'_D$ = 0.088), indicating that airports on July 22 had a tendency to cluster.
4. Can the delay propagation among airports be divided into several sub-regions? A community detection algorithm is adopted to analyze $DCN_{22}$. Modularity is used to evaluate the strength of division of a network into communities (Newman, 2006); a larger modularity value indicates a more obvious community structure. The modularity value of $DCN_{22}$ is 0.219 and the average modularity value of 1000 randomized networks is 0.213. Thus, no evidence suggests that delay propagation on July 22 could be clearly divided into sub-regions. Moreover, the average modularity value of the other 30 DCNs is 0.244, indicating that delay propagation on the other days in July is more regional.
5. How serious is the delay propagation? We use the largest connected cluster to represent the seriousness of delay propagation. Members of the connected cluster were selected through a $k^{out}$ threshold so that the cluster contains sets of high out-degree airports whose delays affect many other airports. We set the threshold so that $k_i^{out}$ must exceed the average out-degree of the network ($\langle k^{out} \rangle = 4.57$, $k^{out} \in \mathbb{Z}$), which selects the airports that affect five or more other airports. $M_d$ of $DCN_{22}$ is 26, indicating that 26 airports are contained in the disaster area of delay propagation.
Du et al. Transportation Research Part E 118 (2018) 466-476
The interactions among airports are mainly caused by delayed connecting flights. Thus, in theory, more flights at a particular airport should result in a higher probability of interactions. To investigate the relationship between the number of flights and the number of influences, we consider the $k_i^{in}$ ($k_i^{out}$) of airport i and its number of flights on July 22 in Fig. 4(b). The results show that airports with more flights tend to have a larger $k_i^{in}$ ($k_i^{out}$). Due to the randomness of daily operations, the daily flow and the degree of an airport only share a similar trend. We therefore consider the correlation over the entire month. The monthly average degree is calculated and denoted by $\bar{k}_i^{out}$ ($\bar{k}_i^{in}$). The correlations between $\bar{k}_i^{out}$ ($\bar{k}_i^{in}$) and the average daily traffic flow $q_i$ of each airport are shown in Fig. 4a and b. Both $\bar{k}_i^{in}$ and $\bar{k}_i^{out}$ are positively correlated with $q_i$, indicating that airports with more flights have a higher probability of affecting others or being affected. Moreover, Fig. 4a shows a linear relationship on a log-linear graph, and Fig. 4b exhibits a linear relationship on a log-log graph. Hence, we further explored the relationship between $\bar{k}_i^{in}$ and $\bar{k}_i^{out}$ by defining the ratio

$$r_i = \frac{\bar{k}_i^{out}}{\bar{k}_i^{in}}.$$

Fig. 4c displays the traffic flow of the airports (a larger circle means a higher flow) as well as their r values, with red indicating an r value less than 1 and blue an r value more than 1. Note that for airport i, r < 1 means that the outward links of airport i are fewer than its inward links, i.e., more airports affect airport i than are affected by it, while r > 1 means the opposite. From Fig. 4c, we can conclude that airports with larger flows tend to have smaller r, i.e., larger airports are affected by many upstream airports but impact fewer downstream airports. The conclusion for medium and small airports, however, is exactly the opposite. The same information is presented by the histogram in Fig. 4d, where the airports are ordered according to their flow ranking.
Nevertheless, such findings are contrary to expectations. The largest airports are located in developed cities and bear the pressure of heavy traffic. Why do these largest airports affect fewer downstream airports and reduce the number of propagation paths of delays? Recently, the CAAC has implemented a no take-off limit regulation for the largest airports. Flights departing from these airports are not subject to traffic flow management initiatives. This measure has kept delays at these largest airports from spreading out. Moreover, because the largest airports are already close to saturation, ATC often publishes traffic flow management initiatives that postpone upstream flights to later hours, ensuring that the traffic demands of the busiest airports are compatible with their capacities. As a result of such initiatives, upstream flights are delayed, which brings delays to the largest airports. However, the operations at the largest airports remain relatively normal and stable. This two-pronged strategy for arrival and departure flights results in significant differences between the largest airports and others.

Temporal properties of DCNs
Another significant question is the following: what features does the DCN present when the system is seriously delayed? To answer this question, we evaluate the Pearson correlation coefficient, PCC (Pearson, 2006), and P-value (Wasserstein and Lazar, 2017) between daily topologies of the DCN and the average daily delay of all flights $D_d$ during July 2012. Table 3 shows that there exists a rather weak correlation between $D_d$ and the topologies, except for the size of the largest cluster $M_d$, which confirms the association between the connected cluster formed by high out-degree airports in the DCN and the level of delay.
Table 3: The first column represents topologies of the DCN, and the second and third columns represent the PCC and P-value, respectively.
We took this point to the next level by analyzing temporal properties in the DCNs. Fig. 5a and b show the clusters in the DCNs of July 1 and July 22, respectively. The former is the least congested day, whereas the latter is the most congested, measured by flight delays in China's aviation system. The largest cluster in the DCN changes dramatically. On July 1, only six airports belong to the connected cluster, whereas on July 22, the largest cluster covers most airports. Fig. 5c shows $M_d$ and $D_d$ on each day of July, exhibiting the consistency between the two curves. This result implies that delay propagation induces high-level delays, and high-level delays facilitate delay propagation, making the delay more severe. Fig. 5d shows the similarity of the largest clusters on different days with the Jaccard index. Interestingly, the index is relatively low for most days, implying that the airports belonging to the largest cluster change substantially over time. Thus, although the clusters formed by airports affecting several others in the DCN are highly correlated with $D_d$, the culprits of the delay propagation are not always the same set of airports.
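The correlation between a daily topology measure (such as the cluster size $M_d$) and the average daily delay $D_d$ is a plain Pearson coefficient over the days of the month. A minimal sketch follows; the function name and the four-day toy series are hypothetical and do not reproduce the data behind Table 3, and the accompanying P-value is not computed.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy series: a perfectly increasing and a perfectly decreasing relationship.
rising = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
falling = pearson([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

Applied to 31 daily values of $M_d$ and $D_d$, a coefficient close to one would correspond to the strong association described above, and values near zero to the weak correlations reported for the other topologies.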

Local properties of DCNs
The global features of delay propagation are captured by the DCNs' properties, yet delay propagation cannot be fully understood without analyzing local relationship patterns. Network motifs are a useful tool to fill this gap. They are defined as recurrent relationship patterns among any three airports, and $Z(G')$ is used to evaluate the significance level of each motif. As shown in Fig. 6, each type of motif varies sharply from day to day, which can be attributed to the ever-changing system state. The average traffic flow at each position of the motifs is also shown, and the color of the nodes represents the size of the airport. Furthermore, a notable observation is that the average Z-score of G'12 is the highest on most days, which indicates that three large airports having bidirectional interactions is the more likely pattern on a given day. In fact, motifs are highly heterogeneous in DCNs, although some local properties of DCNs are revealed. From G'1, G'3, and G'9 in Fig. 5, we can see that medium airports appear frequently at the origin position of these motifs. For the question "who is the first piece of delay propagation dominoes?", this analysis turns our attention to medium airports, although large airports still play the dominant role in delay propagation.

Conclusion
In this study, we investigated the mechanism of delay propagation among airports from a new perspective, i.e., building a delay causality network (DCN) based on the relationship of the delay time series of each airport and applying network analysis tools to reveal the macroscopic appearance of delay propagation. To demonstrate the method, we built DCNs with the data set of flight delays of Chinese commercial airports in July 2012. As a typical bad weather day, July 22 was chosen to present the network analysis results of DCNs. We found that on average each airport affected approximately 5 airports and was affected by about 5 airports as well. However, airports of different sizes were not alike. Large airports, i.e., airports with high traffic flow, were affected by more airports than they impacted downstream. Medium and small airports were the opposite. The relationship between the in-degree and out-degree of airports suggests that some of the largest airports reduced delay propagation paths. The reciprocity parameter showed the bidirectional nature of delay propagation paths between airport pairs of the DCN. Community and modularity indicated that delay propagation on July 22 could not be divided into several sub-regions. Nevertheless, we did find clusters in the analysis, which were formed by high-degree airports. We also found that the average daily delay of all flights was highly correlated with the size of the largest connected cluster. Moreover, the airports belonging to the connected cluster varied substantially with time, suggesting that the culprits of delay propagation were not a fixed set of airports.
Fig. 5(d): Similarity of airports belonging to connected clusters of different days. $J_l$ is the Jaccard index between the airport set of the connected cluster on each day and that on July 22 (whose connected cluster has the most airports); $J_b$ is the Jaccard index between the airport set of the connected cluster on each day and that on the prior day.
Understanding the delay propagation mechanism is very important for both air traffic management and aviation planning. For air traffic managers, identifying delay propagation paths could help them weigh the impacts of traffic management initiatives, for example slot controls and schedule optimization, and choose the best combinations to improve the efficiency of the aviation system. Aviation planners can apply the proposed method to identify the critical airports in the network in terms of delay propagation. Such information will help them make decisions on network capacity expansion and resource allocation. The counterintuitive result of large airports impacting fewer downstream airports may draw aviation planners' attention towards medium and small airports regarding delay mitigation, which may have been neglected previously.
The use of DCNs to study flight delay propagation could be extended further. For example, our DCN was an unweighted network, although edge weights could be taken into account, as they reflect the degree of causation. It would also be interesting to compare the DCNs of the aviation systems in different countries and investigate the practices of those countries. Such international comparison may yield insights for mitigating flight delays.