Air delay propagation patterns in Europe from 2015 to 2018: an information processing perspective

The characterisation of delay propagation is one of the major topics of research in air transport management, due to its negative effects on the cost-efficiency, safety and environmental impact of this transportation mode. While most research works have naturally framed it as a transportation process, the successful application of network theory in neuroscience suggests a complementary approach, based on describing delay propagation as a form of information processing. This allows reconstructing propagation patterns from the dynamics of the individual elements, i.e. from the evolution observed at individual airports, without the need for additional a priori information. We here apply this framework to the analysis of delay propagation in the European airspace between 2015 and 2018, describe the evolution of the observed structure, and identify the role of individual airports in it. We further use this analysis to illustrate the limitations and challenges associated with this approach, and to sketch a roadmap of future research in this evolving topic.


Introduction
The complexity of many, if not most, real-world systems that surround us is the result of some kind of transport process. One may for instance think of social networks, in which individuals interact by transporting and interchanging information and emotions [1][2][3]; power systems, transporting energy in its different forms [4]; or the brain, in which neurons interact by means of chemical and electrical signals [5]. The most natural analysis involves considering them as pure transportation systems: the researcher directly analyses the movements taking place within them, i.e. tracks the movement of items through time and space, in order to synthesise various metrics about their behaviour. On the other hand, a much more abstract and powerful approach can be envisioned: individual movements are discarded, to look instead at how information is processed at different places. In other words, the focus is on how information is distributed among, combined at, and modified by the different elements composing the system. This represents a shift from a transportation to an information processing approach.
The epitome of this paradigm change is neuroscience and the study of the human brain, i.e. the study of how the brain performs complex computations to respond to internal and external stimuli. While early neuroimaging and electrophysiological studies typically aimed at identifying patches of activation or local time-varying patterns of activity, there is now consensus that task-related brain activity has a more complex spatially and temporally extended character [6,7]. As a consequence, complex network theory [8][9][10][11], a statistical mechanics understanding of graph theory [12], is progressively becoming standard in neuroscience. Brain activity is then represented through functional networks, constructed from association matrices of pairwise measures of functional connectivity (e.g. correlation or coherence) estimated from electrophysiological recordings [13][14][15]. The result is an abstract and coarse-grained representation of how information is transmitted and processed in the brain, and of the functional structures dynamically created to support cognitive tasks.
A relatively less studied, yet socially relevant system in which this methodological shift can be observed is air transport. While complexity science and network theory have long been used in its characterisation [16][17][18], the air transport system has mostly been analysed from a transport perspective, consistent with the nature of its objectives. Such an approach has especially been applied in the analysis and characterisation of delay propagation, one of the most important research topics in air transport management. Delays have profound implications for the cost-efficiency [19] and safety of the system [20], and contribute to the negative impact of air transport on the environment [21]. To illustrate, the Federal Aviation Administration estimates that US flight delays cost $22bn yearly: eliminating delays would thus cover one third of the whole national health care budget of Spain (€72.8bn in 2017). Additionally, 1 minute of ground delay implies between 1 and 4 kg of additional fuel consumption, a figure one order of magnitude higher in the case of airborne delay [21]. Delays have thus extensively been studied, usually by constructing large-scale simulations, as for instance in [22][23][24][25][26][27][28][29][30], or by relying on statistical and machine learning techniques [31][32][33][34][35].
In spite of a substantial body of literature, the mechanisms underlying delay propagation are still poorly understood, and mitigation policies are broad in scope, i.e. they tend to penalise all delays, irrespective of their role in the global dynamics. The reasons for this can be traced back to the limitations inherent in simulation-based studies, including the limited availability of real data (e.g. on connecting passengers and on airlines' operational policies), the intrinsic uncertainty of the system's dynamics [36], and the difficulty of validating synthetic models. A better understanding of air transport architectural interactions may come from the study of how the system processes information. When aircraft travel between two airports, they do not only transport passengers and goods, but also transmit information about the status of the departure airport (and of the whole crossed airspace) to the destination. One airport receiving (possibly delayed) flights and dispatching them to other airports is not just managing the movement of the aircraft, but is also receiving, processing and retransmitting information about the system. A parallelism can be envisioned between the human brain and air transport: airports in the latter have a function similar to that of neurons in the former, with neurotransmitters and aircraft being just the instruments for moving and processing information. In recent years, a limited number of attempts have been made to apply this approach to model the propagation of delays in air transport [37][38][39][40][41][42], although limited by data availability and technical caveats.
In this contribution we propose a large-scale analysis of the structure and evolution of the delay propagation network in Europe, leveraging the techniques developed in network neuroscience in the last decades, and more recent instruments for data pre-processing and network analysis. More specifically, we describe the delay propagation patterns across the 50 largest European airports during four years, from 2015 to 2018, using a modified version of the celebrated Granger causality that compensates for the presence of missing values [43]. The resulting functional networks are analysed using standard network science metrics, to describe how the topology has changed through time in response to external events. We further show how the causality relations can be simplified, in order to cluster airports according to their main role in the delay propagation. To the best of our knowledge, this is the most complete study of network delay propagation based on functional networks to date, both in terms of temporal extension and of data processing and representation, and as such it complements studies based on other data analysis and modelling techniques. We conclude by drawing some lessons on the limitations of, and challenges offered by, this interpretation of delay propagation as information processing, and by sketching a roadmap for future studies.

Traffic and delay data
Air traffic data were obtained from the EUROCONTROL's R & D data archive [44], a large and freely available repository of information about the European airspace and all commercial flights crossing it. Of interest for this study were aircraft trajectories, which are created by merging flight plans with data from air navigation service providers' flight data systems, radar and datalink communications among others. The data set has a temporal coverage of four years, from 2015 to 2018; four months are available for each year (March, June, September and December), and therefore data are discontinuous. From the full set of flights, only those landing at the 50 largest European airports (according to their number of passengers in 2015) have further been considered.
Given a flight, its delay at landing has been estimated as the difference between the actual and the scheduled landing times. All flights that have landed at a given airport and in a given hour have then been aggregated, to obtain a time series of average hourly delays for each airport. An overview and the temporal evolution of the number of flights are reported in figure 1, while additional details about the 50 considered airports (including names, International Civil Aviation Organization (ICAO) codes and number of operations) can be found in table A1 in appendix A.
A close inspection of the time series reveals that they are non-stationary, as delays are usually higher at noon and lower during weekends. In order to correct this, as required by the subsequent Granger causality test, the delays are normalised by applying a Z-score detrend procedure defined as:

D′(d, h) = [D(d, h) − ⟨D(·, h)⟩] / σ(D(·, h)),     (1)

where D′(d, h) is the normalised delay for day d and time h, D(d, h) the original delay, and ⟨D(·, h)⟩ and σ(D(·, h)) the average and standard deviation of the delays over all days at the specific time h. The result is a stationary time series of zero average. D′(d, h) also indicates how many standard deviations the observed delay is from the mean value, that is, how usual or unusual the delay is for that specific time of the day. In this sense, a positive value of the normalised delay indicates that the system behaved worse than expected (it experienced a higher delay), and a negative value indicates that it did better than expected for that time of the day.

Detecting delay propagation: the Granger causality metric
The Granger causality test [45], developed by the Nobel laureate in economics Clive Granger on top of the prediction theory of Wiener [46], is one of the best-known statistical tests for evaluating the presence of predictive causality [47] between pairs of time series. While this test has been analysed in numerous scientific works, for the sake of completeness the main elements of its mathematical formulation are discussed here below. Suppose a universe U, representing all elements (both observable by and hidden to the researcher) composing a system and relevant for a given problem. Within U we consider two elements A and B, respectively described by two time series a and b. Let us further suppose that these time series fulfil some basic conditions, including being stationary and regularly sampled. B is said to 'Granger-cause' A if:

σ²(a | U⁻) < σ²(a | U⁻ \ b⁻),     (2)

where σ²(a | U⁻) stands for the error, in terms of the variance of the residuals, when forecasting the time series a using the past information of the entire universe U; and σ²(a | U⁻ \ b⁻) the error when the information about time series b is removed. In other words, B is causing A if including information about the past of B helps predict the future of A, this being the origin of the term predictive causality [47]. While the forecast of equation (2) can be performed through any algorithm [48][49][50], a common and simple solution is the autoregressive-moving-average model. In this case, two linear models are fitted on the data, respectively called the restricted and unrestricted regression models:

a_t = C · (a_{t−1}, …, a_{t−m})ᵀ + ε_t,
a_t = C′ · [(a_{t−1}, …, a_{t−m})ᵀ ⊕ (b_{t−1}, …, b_{t−m})ᵀ] + ε′_t.     (3)

m here refers to the model order, the symbol ⊕ denotes the concatenation of column vectors, C and C′ contain the model coefficients, and ε_t and ε′_t are the residuals of the models. Equation (2) is then usually written as σ²(ε′_t) < σ²(ε_t). In order to assess the statistical significance and obtain a p-value, an F-test is performed to check whether the coefficients of C′ associated with the time series b are different from zero, i.e.
whether b actually has an impact on the prediction.

One important limitation of the Granger test is its sensitivity to missing values [43,51]. To illustrate, and considering the application at hand, many airports do not operate around the clock, such that some hours can have no operations associated with them, resulting in a zero average delay. This zero is not equivalent to having no delays, but instead represents a missing value: we cannot know what the expected delay at the airport would have been, had a flight landed at that time. Such spurious values can bias the regression model at the basis of the Granger test, and consequently reduce the number of detected causal relationships. In order to solve this issue, we here substitute equation (3) with linear models in which the weight of missing values is set to zero. This effectively implies that missing elements are excluded from the calculation of the autoregressive models, and that the Granger causality test is performed only on the values deemed valid. As discussed in [43], this improves the sensitivity of the Granger test even when a significant fraction of values are missing, and allows recovering most of the original causal relationships.
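The idea of zero-weighting missing samples can be sketched as follows (a simplified illustration rather than the exact implementation of [43]: rows whose lagged window contains a missing value are simply dropped from the least-squares fit, which is equivalent to assigning them zero weight):

```python
import numpy as np

def lagged_design(a, b, m):
    """Build the unrestricted design matrix [a_{t-1..t-m}, b_{t-1..t-m}]
    and target a_t, dropping every row whose window contains a NaN."""
    n = len(a)
    rows, ys = [], []
    for t in range(m, n):
        window = np.concatenate([a[t - m:t], b[t - m:t], [a[t]]])
        if np.isnan(window).any():
            continue  # a missing value somewhere in the window: skip row
        rows.append(window[:-1])
        ys.append(a[t])
    return np.asarray(rows), np.asarray(ys)

def residual_variance(X, y):
    """Variance of the residuals of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ coef)
```

The restricted model of equation (3) is obtained by keeping only the first m columns of the design matrix; b is then deemed to Granger-cause a when the unrestricted residual variance is significantly lower than the restricted one, as assessed by the F-test.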
As a final note, two facts about the Granger causality are worth highlighting. First of all, the Granger test, as implemented in equation (3), is linear, and can therefore only detect the linear part of a non-linear relationship. This is not a major problem for the application here considered, as delay propagation is mostly a linear process [52]. Secondly, while the test name includes the word causality, it does not necessarily measure true causality-as highlighted by Granger himself [53]. To be more exact, this test assesses the directed lagged interactions between joint processes, or the quantification of information transfer across multiple time scales. In spite of this, and for the sake of simplicity, the relationships detected by this test will here be called causal.

Propagation networks reconstruction and analysis
The Granger test has been applied to the detrended time series D (d, h) of section 2.1, to reconstruct 16 propagation networks, i.e. one for each available year-month. Each network is composed of 50 nodes, one for each airport; and is represented by an adjacency matrix A, of size 50 × 50, where the element a ij has a value of 1 to indicate that there is a directed edge from node i to j (i.e. the delay at airport i 'Granger-causes' the delay at airport j), and 0 otherwise [9,11].
As customary in functional network reconstruction, the elements of A are obtained by applying the Granger causality test over the time series of each pair of airports; the maximum lag is set to 6, i.e. we discard any propagation that would require more than 6 hours, a time interval longer than the duration of any intra-European flight. Additionally, it is worth noting that the output of the Granger test is a p-value. In order to avoid the increased probability of type I errors as a consequence of the multiple comparisons required by the reconstruction process, we applied a Bonferroni correction, rejecting the null hypothesis of the test for an effective significance level equal to the nominal one divided by the number of comparisons performed.

In order to characterise the obtained networks, the following topological metrics have been calculated:

(a) Link density. Number of existing edges in the network, divided by the maximum number of edges that the network could have:

l = L / [N(N − 1)],

where L is the number of edges and N the number of nodes of the network.

(b) Diameter. Maximum distance between any pair of nodes in the network, such distance being defined as the number of edges in the shortest path connecting them. In the context of delay propagation it represents the maximum number of flights that are needed to disseminate the delays throughout the network.

(c) Transitivity. Fraction of triplets of nodes that are included in triangles, also representing the tendency of nodes to form clusters [54]. It is mathematically defined as:

T = 3 × (number of triangles) / (number of connected triplets of nodes).

A high transitivity means that the network contains triplets of airports that are strongly connected, such that a delay originating in one of them is easily disseminated to the other airports of the group.

(d) Assortativity. Tendency of edges to connect nodes of similar degrees [55]. Mathematically it is defined as the Pearson correlation coefficient between the degrees k_i and k_j of the nodes i and j at the two ends of each edge, computed over all edges of the network. Positive (respectively, negative) values of assortativity indicate that nodes tend to connect with nodes of similar (different) degree.
In the network of delay propagation, a positive assortativity indicates that airports propagating delays tend to connect with each other.
(e) Efficiency. Measure of how efficiently the network can exchange information between nodes, defined as the average of the inverse distances between pairs of nodes, i.e. the inverse of their harmonic mean [56]:

E = [1 / (N(N − 1))] Σ_{i ≠ j} 1 / d_ij,

where d_ij is the distance between nodes i and j.

(f) Information content (IC). Metric evaluating the existence of regular patterns in the adjacency matrix of the network, calculated as the amount of information lost when pairs of nodes are iteratively merged [57]. The lower the IC, the more complex the structure of the network, indicating the presence of some kind of meso-scale structure.
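Most of these metrics are available in networkx; below, a minimal sketch (as a simplifying assumption of this example, diameter, transitivity and efficiency are computed on the undirected projection of the propagation network, which is not necessarily the paper's exact choice):

```python
import numpy as np
import networkx as nx

def network_summary(A):
    """Topological metrics of a binary adjacency matrix (numpy array).

    Diameter, transitivity and efficiency are computed on the undirected
    projection for simplicity; assortativity uses the directed degrees.
    """
    G = nx.from_numpy_array(A, create_using=nx.DiGraph)
    U = G.to_undirected()
    return {
        "link_density": nx.density(G),
        "diameter": nx.diameter(U) if nx.is_connected(U) else float("inf"),
        "transitivity": nx.transitivity(U),
        "assortativity": nx.degree_assortativity_coefficient(G),
        "efficiency": nx.global_efficiency(U),
    }
```

For instance, on a toy network made of a reciprocal triangle (nodes 0, 1, 2) with a pendant node 3, the transitivity is 0.6 and the efficiency 5/6.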
It is worth noting that some of the aforementioned metrics cannot be directly compared when the corresponding networks have different characteristics. For instance, the diameter depends not only on the structure of the network, but also on its link density; comparing two networks of different link densities can thus yield misleading results. In order to solve this, a set of random networks (here, 500) is generated, with the same number of nodes and edges as the network under evaluation. The Z-score of the value m of the metric is then computed as:

Z = (m − μ_M) / σ_M,

where μ_M and σ_M are the average and standard deviation of the metric as obtained in the random networks.

Besides these network metrics, the following three measures of centrality are calculated for each node:

(a) Out-degree: number of edges leaving a node. Airports with high out-degree are those that distribute more delays to the rest of the network, i.e. they are responsible for initiating the propagation process.

(b) In-degree: number of edges arriving at a node. Airports with the highest in-degree are those that receive more delays from other airports.

(c) Betweenness centrality: metric counting the number of times a node is included in the shortest paths between pairs of nodes. The betweenness centrality c_B of a node w is defined as:

c_B(w) = Σ_{s, t ∈ V} P_w(s, t) / P(s, t),

where V is the set of nodes, P(s, t) is the number of shortest paths connecting s and t, and P_w(s, t) the number of those paths passing through w. The airports with the highest betweenness centrality are those that control the flow of information (delays) via connected paths.
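The normalisation against random networks can be sketched as follows (assuming, as a simplification, an Erdős-Rényi G(n, m) null model with the same number of nodes and edges as the original network):

```python
import numpy as np
import networkx as nx

def metric_zscore(G, metric, n_random=500, seed=0):
    """Z-score of metric(G) against random directed graphs with the same
    number of nodes and edges (Erdos-Renyi G(n, m) ensemble)."""
    rng = np.random.default_rng(seed)
    n, m = G.number_of_nodes(), G.number_of_edges()
    null = [
        metric(nx.gnm_random_graph(n, m, directed=True,
                                   seed=int(rng.integers(2 ** 31))))
        for _ in range(n_random)
    ]
    return (metric(G) - np.mean(null)) / np.std(null)
```

The three centralities are directly available as dict(G.out_degree()), dict(G.in_degree()) and nx.betweenness_centrality(G). As a sanity check, a network built from disjoint reciprocal triangles has a strongly positive transitivity Z-score, since the null ensemble rarely produces such dense clustering.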

Detecting propagation roles
While the previously described functional network, and specifically the associated adjacency matrix A, completely encodes all information about causal relationships in the system, it also presents the drawback of not being easily interpretable. To illustrate, a single node can be at the receiving and sending ends of multiple links, each one of them with a different strength; it can thus be difficult to synthesise whether an airport is a net source or a net receiver of delays. In order to yield a clearer representation, we here resort to an approach based on clustering nodes according to their causality role [58]. Specifically, we hypothesise that each node (i.e. each airport) can be assigned to one of two clusters, respectively representing the group of net sources of causality (denoted as C1) and of net receivers (C2). Given one assignation of all nodes to clusters, the system is then simplified by reducing it to a network composed of two nodes, C1 and C2; these are represented by two time series, defined as the weighted mean of the time series of the airports composing them. Afterwards, a metric J is calculated as a function of pV_{1,2} and pV_{2,1}, the p-values of the Granger test performed respectively between C1 → C2 and C2 → C1 (see [58] for its exact definition). A low value of J indicates both the presence of a strong causality C1 → C2, i.e. C1 is forcing C2, and of a non-statistically-significant causality C2 → C1, such that C2 has a passive role. The best clustering of nodes is finally obtained as the one minimising J.
It is worth noting that the minimisation process can be extremely computationally expensive, as a brute-force search would have to evaluate 2^N solutions (with N being the number of nodes, here N = 50). As proposed in [58], an approximation of the optimal solution is here obtained through a dual annealing (DA) optimisation, a stochastic approach combining the classical simulated annealing with a final local search [59,60]. In order to further discard local minima, the DA optimisation has been executed 50 times using random initial conditions, and the solution associated with the minimal J has been retained.
In order to extract more complete information about airport roles, we further consider the case of three clusters, representing sources, receivers and brokers, the last being airports that propagate delays between sources and receivers. This requires updating the definition of J in order to account for all possible combinations of three clusters; see [58] for further details.

Structure and evolution of the delay propagation network
As customary in complex network theory, we start by analysing the evolution through time of the topological metrics defined in section 2.3, see figure 2. It can be appreciated that, in general, metrics display strong oscillations, reflecting important changes in the underlying network structure. Assortativity and diameter have values close to zero, thus indicating no strong topological structure; this is nevertheless not valid for transitivity, efficiency and IC, all having large absolute values. While the negative IC implies the presence of a non-specified mesoscale structure [57], the large transitivity and low efficiency suggest a network composed of a large number of triadic relations. In order to further analyse this aspect, figure 3 reports the evolution through time of the Z-score of the four most frequent motifs, all of them having a triangular structure. On average, these four motifs have a Z-score of 8.43, as opposed to the Z-score of 2.66 of the four motifs that do not include a triangle (designated as motifs 1 to 4 in [61]).
March 2016 displays a singular behaviour, with large (in absolute value) density, transitivity, efficiency and IC (see figure 2); the same can also be observed in the evolution of the second motif of figure 3. This behaviour is due to two main factors. Firstly, a terrorist attack on March 22nd targeted the Brussels airport, disrupting the connectivity of the network in the following days, with most flights rescheduled to nearby airports, and further generating delays in the following months as a consequence of stricter security checks. Secondly, the system experienced a 17% increase in reactionary delays (from an average 3.85 minutes in March 2015 to 4.52) due to a five-fold increase in en-route delays following French industrial actions [62]. The negative effects of these events are clearly reflected in the network structure, also highlighting how two localised occurrences (one affecting a single airport, and one a single airspace) can have system-wide consequences.
We then move to studying the centrality of airports and how it evolves through time. This is accomplished by creating a ranking of the airports in each time period, according to the three centrality measures explained in section 2.3; the average ranking position across the four years is then calculated for each airport. This metric thus represents how instrumental each airport has been, on average, in the propagation dynamics, with low values (i.e. top positions in the ranking) indicating a more important role. A list of the most central airports is reported in table A2 in appendix A. When plotted against the number of flights, a statistically significant negative relationship is detected between the out-degree ranking of an airport and its size (see figure 4, linear fit, ρ = −0.039 and r = −0.3781). Such relationship is nevertheless lost when considering the in-degree or the betweenness. This indicates that large airports tend to generate and transmit delays; but, on the other hand, that the roles of delay broker (i.e. acting as a bridge that propagates delays between airports) and of delay receiver are more evenly distributed across the whole network. Additionally, the aforementioned correlations are not statistically significant when the degree of nodes is weighted according to the average delay of the source or destination airports, suggesting that delay propagation patterns are not directly linked to delay magnitudes.
We further analyse the relationship between the average delay per airport, normalised by the total number of operations, and the measures of centrality (out-degree and betweenness). In order to obtain a complementary view, for each time period a linear model is fitted between the arrival delay and the centrality; the slope of such fit is then represented as a function of time in figure 5. The linear model is significant only in some time periods, marked with an asterisk. There is nevertheless a general positive correlation, indicating that airports suffering from high levels of landing delays are also those spreading those delays throughout the network.
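Each per-period fit of this kind can be obtained with scipy.stats.linregress, retaining the slope and flagging the periods in which the fit is statistically significant (an illustrative sketch; variable names are hypothetical):

```python
import numpy as np
from scipy.stats import linregress

def fit_slope(delay, centrality, alpha=0.05):
    """Linear fit centrality -> delay for one time period.

    Returns the slope of the fit and whether it is statistically
    significant at level alpha (periods marked with an asterisk)."""
    res = linregress(centrality, delay)
    return res.slope, res.pvalue < alpha
```

Repeating the call for every year-month and plotting the slopes against time reproduces the kind of curve shown in figure 5, with significant periods highlighted.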

Global delay propagation roles
As a complementary view to what was presented in the previous section, we here apply the clustering analysis proposed in [58] and described in section 2.4. We start with a simplified situation with two clusters: cluster 1 including all airports that are net sources of delay propagation, and cluster 2 those that are net receivers. This greatly simplifies the interpretation of the role of each airport in the network, as it is now described by a single binary value, as opposed to a combination of centralities as in section 3. The left panel of figure 6 reports the evolution of the assignation of airports to clusters. As previously seen, the structure of the network evolves substantially over time, and as a consequence, so does the assignation. Still, some airports display a mainly static role, with some being almost always delay generators or absorbers (see the top and central right panels of figure 6). The bottom right panel of the same figure also illustrates the presence of a correlation between the size of each airport, measured through its number of flights, and the number of months it has been classified in the first cluster. In other words, and confirming the previous analyses, large airports are mostly responsible for the generation of delay propagation patterns.
While the previously shown classification into two clusters presents the advantage of maximally simplifying the network structure, this may come at the cost of an over-simplification; in other words, by forcing nodes into two categories, complex propagation dynamics may be lost. We thus performed the same analysis with three clusters, with results reported in figure 7. As introduced in [58], this allows the detection of a new role located between the sources and the receivers of delays, i.e. that of delay brokers: airports that neither generate nor absorb delays, but mostly transmit them from and to other airports. While this analysis reduces the risk of over-simplification, it has to be noted that it is still a simplification of a complex propagation dynamics; as such, it allows a better understanding of the main propagation patterns, at the price of discarding smaller details. As can be seen in figure 7, results are qualitatively similar, and the airports responsible for generating delays are mostly the same as in figure 6. Additionally, the correlation between the generation of delays and the traffic volume increases from ρ = 0.548 to ρ = 0.745, highlighting the role of major airports in the propagation dynamics.
We finally analyse the temporal evolution of the airport assignation to two clusters, by splitting the available data in two time windows, i.e. 2015-2016 and 2017-2018; the corresponding results are reported in figure 8.

Results validation and stability
In order for this information-based analysis of delay propagation to have a real impact, one essential aspect is the validation of the results it yields, especially since there is not yet a clear consensus on the techniques that ought to be used, nor on the way data have to be pre-processed. This is nevertheless not a simple endeavour, as a direct intervention in the system, e.g. to check the effects of a given solution, is usually not feasible. Given this state of affairs, we here propose the use of two types of indirect validation.
As a first step, one can try to establish whether the evolution of the reconstructed networks is related to events that happened in the system, such that the latter can explain (and somehow validate) the former. As already seen in section 3, the abnormal network structure observed in March 2016 can easily be explained by the extreme disruptions suffered by the system that month. Explaining the behaviours observed in other time windows is nevertheless complicated, as no additional events of such magnitude have happened. One can further correlate the strength of the network connectivity with some macroscopic metrics describing the behaviour of the system. To illustrate, the correlation between the average log10 of the link p-values for each network and the average amount of reactionary delays for the corresponding month (as reported in CODA's 'all-causes delay to air transport in Europe' monthly reports) is negative (ρ = −0.093, r² = 0.062, p-value = 0.391). Although not statistically significant, this result seems to confirm that months with more delay propagation are reflected in more strongly connected networks.
As a second way of validating the results, we here consider a null model in which causality links between any two airports A and B are calculated using two different months. As the air transport system mostly resets every night, delays seldom propagate between consecutive days; any causality test across different months should therefore yield large p-values, with few links being statistically significant. Figure 9 reports the histograms of the −log 10 of the p-values of links (left panel), and of the −log 10 J of the causality clustering analysis (right panel), for both the real data and the null model. It can be appreciated that results become less statistically significant when considering different months, as expected; and specifically, that the number of significant links drops tenfold (from 4935 to 583, i.e. to less than one link per airport per network) in the case of the null model.
We finally analyse how stable the results are when considering different time series lengths. Specifically, figure 10 reports the evolution of the number of statistically significant links (left panel) and of the average log10 J in the causality clustering analysis (right panel) as a function of the length of the time series used to assess the Granger causality, expressed in number of days. As expected, longer time series yield more links, as the causality test is able to detect weaker interactions. On the negative side, the two metrics do not stabilise with one month of data; in other words, more links could possibly be obtained, were more data available. This highlights the need for more complete data sets, beyond what is currently offered by, for instance, the EUROCONTROL's R & D data archive [44]; but it also raises a note of caution against studies focusing on very short time intervals, as e.g. analyses of the evolution of causality on a daily basis [40], as the majority of the causality links may be undetectable with current techniques.

Discussion
We here presented an analysis of the structure and evolution of the network created by delay propagation in European air transport, covering the period from 2015 to 2018 and the 50 largest airports.
As seen in figure 2, the monthly propagation networks have a highly variable structure, reflecting the events that drive the creation and subsequent spreading of delays, as for instance the terrorist attack on Brussels airport in March 2016. Such variability mainly affects the global (or macro-scale) structure of the network; the micro-scale structure is notably more consistent, especially when considering the role of individual airports. As shown in figure 6, many airports maintain their global cluster assignment throughout time; and only three of them changed cluster in a statistically significant way when comparing 2015-2016 with 2017-2018, see figure 8. This points towards the presence of two opposing forces: a structural one, according to which some airports have a stable propagation role, resulting from their connectivity, traffic volume, procedures, equipment, etc; and the appearance of random events throughout the system. While the former pushes the propagation network towards a fixed state, the latter events can appear at any location and time, thus effectively acting as a random rewiring.
The delay propagation network is dominated by triangular structures, as shown by the high transitivity (see figure 2) and the high Z-score of triangular motifs (figure 3), in agreement with what was previously found in [39,40]. The network is also dominated by large airports, which have a higher probability of starting a delay propagation, as shown by the negative correlation between the out-degree and the ranking (figure 4), and the positive correlation between size and time spent in cluster 1 (figures 6 and 7). Such correlations are nevertheless lost in the case of the in-degree and of the betweenness, suggesting that the dissipation of delays is a process to which all airports contribute, independently of their size. This is at odds with what has been reported in other studies: specifically, [39] identified a clear negative correlation between degree and airport size, while [40] reported a positive one.
Such discrepancies can be due to many factors, including the use of different data sets with different numbers of airports; the different geographical area considered in [40], which also implies different prioritisation rules for flights and hence different delay mitigation strategies; and the way data are pre-processed, as discussed in [43]. This raises questions about the reproducibility and validation of results; beyond the analyses presented in section 5, this topic will be discussed in the next section.
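The triangular-motif analysis discussed above can be illustrated with a hypothetical sketch (not the paper's code): the transitivity of a network, here randomly generated as a stand-in for a reconstructed propagation network, is compared against degree-preserving rewirings to obtain a Z-score:

```python
# Illustrative sketch: Z-score of the transitivity of a network against
# a degree-preserving null model. The network here is a random stand-in,
# not a reconstructed delay propagation network.
import networkx as nx
import numpy as np

G = nx.gnp_random_graph(50, 0.12, seed=3, directed=True)
observed = nx.transitivity(G.to_undirected())

null = []
for k in range(20):
    H = G.to_undirected()
    # Rewire while preserving the degree sequence.
    nx.double_edge_swap(H, nswap=2 * H.number_of_edges(), max_tries=10**6, seed=k)
    null.append(nx.transitivity(H))

z = (observed - np.mean(null)) / np.std(null)
print(f"transitivity = {observed:.3f}, Z-score = {z:.2f}")
```

A large positive Z-score would indicate an over-representation of triangles with respect to what the degree sequence alone would yield.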
As a final note, it is worth discussing what operational knowledge can be extracted from the results here presented; in other words, how these results can guide the development of strategies aimed at reducing delay propagation. Given that large airports are mostly responsible for starting the propagation process (as reflected in their high out-degree), the simplest solution would be to deploy additional resources there. This is nevertheless not optimal, for several reasons. First of all, large airports already operate close to their maximum capacity, and expanding it is usually not a simple process, as illustrated by the case of the construction of a third runway at London Heathrow Airport [63]. Beyond expanding capacities, another solution may involve increasing the efficiency of their usage. Yet, 30 of the 50 airports here considered are already included in the Airport Collaborative Decision Making programme, exchanging relevant information in real time about aircraft turn-round and pre-departure processes [64][65][66]; expanding the programme to all top-50 airports is expected to bring an increase in en-route capacity, but not necessarily a more efficient use of airport capacity [67,68]. Many research initiatives have also highlighted the increase in efficiency that can be achieved using machine learning and other data analysis tools, see for instance [34,[69][70][71][72]; the implementation path for these approaches is nevertheless not clear. Additionally, this would represent an example of a policy broad in scope, i.e. targeting all large airports indiscriminately, as opposed to pinpointing specific cases and tackling them in a more focused and cost-effective way.
The results here presented highlight two different potential solutions. The first one involves finding those airports that have a high centrality, e.g. in terms of out-degree, but also a reduced number of operations, and identifying which operational aspects are hindering their performance; candidates may include Düsseldorf Airport (EDDL) or Geneva Airport (LSGG), which are operating well below their theoretical capacity. The second one involves changing the priority rules of flights, such that aircraft departing from high out-degree airports would be prioritised when landing at high-betweenness airports, in order to break the propagation chain. Yet, the validation and the economic assessment of these strategies go beyond the scope of this work.

Conclusions: can functional networks disrupt delay propagation?
Will functional network representations of delay propagation ever bring about a revolution in air transport?
As seen in the analysis here presented, interpreting delays as a form of information being processed throughout the network brings a radically new way of understanding their propagation, and has the potential to constitute a conceptual quantum leap. Yet, in spite of a promising start with a constant, if modest, flow of new research works in the area, the analysis of air transport delay propagation through functional networks still has a long way to go, and several theoretical and practical problems must be solved before it can generate a tangible impact. Below we sketch a roadmap based on the lessons learnt in this work, drawing parallels with the solutions developed in neuroscience over the last decades for dealing with functional representations of brain activity.
First of all, the metrics used in the reconstruction of functional networks are based on different assumptions about the data and the relationships between the system's elements, and therefore yield different (and at times conflicting) views of the interaction structure. While this problem is well known in the neuroscience literature [73,74], little is known about how delay propagation patterns are affected by the choice of the connectivity metric [37,41]. As shown in [43], even the same causality metric can yield different results depending on how missing values are dealt with. It is further possible that a correct evaluation of these propagation structures will require tailored, yet-to-be-developed metrics, along the lines of what was proposed in [41]. It must further be noted that the problem of functional network reconstruction from data is still an open one, even from a theoretical point of view, and that researchers within statistical physics are still improving our understanding of the process [75][76][77][78]. The problem here considered is also strongly connected with other research topics in statistical physics, like the dynamics of coupled oscillators with and without delays [79][80][81][82][83]. A stronger connection between both fields will therefore be essential.
Secondly, once delay propagation networks are reconstructed, the next logical step is the extraction of topological metrics representing their structure. Naturally, the metrics initially used were those standard in network theory, e.g. transitivity or modularity; see e.g. [39,40] and section 2.3. Specific topological metrics may nevertheless be needed to describe domain-specific structures, as is the case in neuroscience for the cost efficiency [84] or the leverage centrality [85].
As a third point, there is a need for validation of the obtained results, a process that can in turn be used to validate the selection of both the connectivity and the topological metrics. In this regard, air transport is a complex domain. On one hand, a validation based on changing the state of the system and observing the resulting differences in its evolution is impractical, due to the high costs this would entail, both economic and in terms of mobility. On the other hand, the system seldom experiences large-scale disruptive events that could be used as different conditions, akin to pathologies in neuroscience [13]. The remaining alternative is possibly the creation of synthetic models, able to generate time series representing realistic dynamics of airports and aircraft, and whose parameters can be tuned to recreate different propagation patterns. While models and toy models are common in air transport [25,86,87], one tailored to network analysis has not hitherto been proposed.
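A minimal sketch of such a synthetic generator, under assumed dynamics (exponential delay absorption, linear coupling along a given adjacency matrix, and an overnight reset), might look as follows; it is an illustration of the idea, not an existing model:

```python
# Toy generator of hourly airport delay series with tunable propagation.
# All dynamics here (decay, coupling, reset) are assumptions for the sketch.
import numpy as np

def synthetic_delays(adjacency, hours=24 * 30, decay=0.8,
                     coupling=0.1, noise=1.0, seed=0):
    """Generate hourly delay series for n airports.

    adjacency[i, j] != 0 means airport j propagates delay to airport i.
    """
    rng = np.random.default_rng(seed)
    n = adjacency.shape[0]
    delays = np.zeros((n, hours))
    for t in range(1, hours):
        own = decay * delays[:, t - 1]                       # delays absorbed over time
        received = coupling * adjacency @ delays[:, t - 1]   # propagated delays
        shocks = np.maximum(rng.normal(0, noise, n), 0)      # new disruptions
        delays[:, t] = np.maximum(own + received + shocks, 0)
        if t % 24 == 0:
            delays[:, t] *= 0.1                              # overnight reset
    return delays

A = np.array([[0, 1], [0, 0]])  # airport 1 propagates to airport 0
series = synthetic_delays(A)
print(series.shape)
```

Varying the coupling strength would allow recreating different propagation patterns, against which connectivity metrics could then be benchmarked.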
As a final step, delay propagation analyses cannot remain confined within the academic world, but should instead be used to guide and evaluate policies aimed at improving the system. This will require the inclusion of complementary aspects, e.g. safety and cost analyses, which are usually not considered in more theoretical works; and, in turn, the participation of multiple stakeholders, from network managers to airlines.