A study on centrality measures in weighted networks: A case of the aviation network

Abstract: Identifying influential spreaders is a crucial issue in complex networks because it helps control propagation processes. An aviation network is a typical complex network, and accurately identifying its key city nodes helps to defend against network attacks and to control the spread of diseases. In this paper, a method for identifying key nodes in undirected weighted networks, called weighted Laplacian energy centrality, is proposed and applied to an aviation network constructed from real flight data. Based on an analysis of the topological structure of the network, critical cities are identified, and simulation experiments are then conducted on these key city nodes from the perspectives of network dynamics and robustness. The results indicate that, compared with other methods, weighted Laplacian energy centrality identifies the city nodes with the greatest spreading influence in the network. From the perspective of network robustness, removing the identified key nodes also destroys the robustness of the network quickly and accurately.


Introduction
Many real-world networks can be regarded as complex networks, such as social networks [1] and transportation networks [2]. The development of complex systems theory has provided a strong theoretical foundation for the analysis of complex networks in real life. Studies in [3][4][5] have shown that the failure of a few nodes in a network often leads to cascading failures and can even cause widespread network paralysis. Therefore, the identification of key nodes in networks has become a popular and important research topic. Currently, the study of network centrality is mostly based on network topology [6][7][8], since the topology of a network contains a lot of functional information about the network. Depending on how much topological information is considered, identification methods range from local to global. The simplest local method is degree centrality [9], which is simple and efficient, and suitable for large-scale networks. Degree centrality applied to weighted networks is known as strength centrality [10], which defines the strength of a node as the total weight of its connections. The most classic global centrality methods are closeness centrality [11] and betweenness centrality [12]. Nodes with high closeness centrality have the best position from which to observe information propagation in the network, while nodes with high betweenness centrality can directly influence the flow of information in the network. To apply these two methods to weighted networks, the shortest paths were redefined as weighted shortest paths in [13]. In recent years, Ma et al. [14] have applied graph energy to identify central nodes in networks. In [15], we proposed Laplacian energy centrality for undirected unweighted networks, which showed promising results. However, this method has not been extended to weighted graphs.
The aviation network is a complex network formed by cities as nodes and air routes between them as edges. It serves as a platform for social communication and economic development. Events such as adverse weather conditions, public health emergencies, and terrorist attacks can result in airport closures or flight suspensions, impacting the transportation efficiency of the entire aviation network and causing significant economic losses. Therefore, identifying key nodes in the aviation network has become a crucial issue. The theoretical analysis methods of complex networks are well suited to the study of the topological structure of such networks [16][17][18][19]. In [20][21][22][23], a global aviation transportation network model was established, and the analysis of this model shows that the air transport network exhibits small-world characteristics and scale-free properties. Key city nodes in transportation networks often have a significant impact on the operational performance of the network. Shen et al. [19] used principal component analysis (PCA) to identify key transportation cities in the transportation network and improve its performance. Lordan et al. [24] studied the robustness of the global aviation transportation network using complex network theory, proposed a method to determine key airports and intentionally attack the network based on different importance indicators, and compared the corresponding evaluation results. This method can accurately identify the central nodes and opens some paths to future research in this area. Mo et al. [25] introduced classical centrality methods to analyze airport nodes based on the analysis of the aviation network structure, showing that traditional centrality methods can effectively identify the airport's structural system, but the paper did not consider the weights of the network. Li et al. [26] applied the concept of minimum connected dominating sets to complex networks. This method can simultaneously identify both key nodes and key edges in the network, which is of practical significance for studying network resilience and backbone network construction. It is highly integrated with the actual situation of air transportation, but the experimental data in that paper has limitations and the applicability of the theory is not well verified. Lou et al. [27] analyzed the changes in the robustness of network structures after attacks on different airline companies. However, the aforementioned literature focused only on network robustness and did not verify the identified key nodes from the perspective of network dynamics.
Based on the above research, in this paper we extend Laplacian energy centrality to weighted networks and apply it to identify key nodes in the Chinese aviation network. We validate the identified key nodes from the perspectives of network propagation dynamics and network robustness. The rest of this paper is structured as follows: Section 2 describes the method used for network construction, the topology of the network, and several classical centrality measures. Section 3 presents the principles of the weighted Laplacian energy centrality method. In Section 4, we validate the weighted Laplacian energy centrality from the perspectives of influence and robustness. We give a conclusion in the last section.

Preliminaries
In this section, we first describe how the Chinese aviation network is constructed. We then introduce the constructed network and its topological structure, whose analysis reveals various information about the network. Finally, we list the other centrality measures that serve as comparisons to the method proposed in this paper.

Aviation network model
The flight data between cities of China from January 1-7, 2022 were obtained from the Ctrip app. The aviation network is abstracted as $G = (V, E, W)$, where the node set $V = \{v_1, v_2, v_3, \ldots, v_n\}$ represents all cities with air transport. If a city has two or more airports, they are merged into one node. The edge set $E$ represents the connections between cities: if there are flights between two cities $v_i$ and $v_j$, then an edge $v_i v_j$ exists between them. The weight of an edge $v_i v_j$, denoted by $w_{i,j}$, is the total number of round-trip flights between $v_i$ and $v_j$ from January 1-7, 2022; routes between two different airports in the same city, which would form self-loops, are excluded. The weight set $W$ contains the weights of all edges. The aviation network constructed in this paper consists of 198 nodes and 2379 edges. The topological structure of the flight data on January 1, 2022, is shown in Figure 1.
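The construction rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the airport codes, the `AIRPORT_TO_CITY` mapping, and the `(origin, destination)` record format are hypothetical and are not taken from the Ctrip data, and flights in both directions are simply accumulated on one undirected edge.

```python
from collections import defaultdict

# Hypothetical airport-to-city mapping; airports of one city share a node.
AIRPORT_TO_CITY = {
    "PEK": "Beijing", "PKX": "Beijing",
    "PVG": "Shanghai", "SHA": "Shanghai",
    "CTU": "Chengdu",
}

def build_city_network(flights, airport_to_city):
    """Return {frozenset({city_a, city_b}): weight} from (origin, destination) records."""
    weights = defaultdict(int)
    for origin, destination in flights:
        a = airport_to_city[origin]
        b = airport_to_city[destination]
        if a == b:
            continue  # two airports of the same city: would be a self-loop, so skip
        weights[frozenset((a, b))] += 1  # both directions count toward one undirected edge
    return dict(weights)

# Hypothetical flight records for illustration only.
flights = [("PEK", "PVG"), ("SHA", "PKX"), ("PVG", "PEK"), ("PEK", "PKX"), ("CTU", "SHA")]
network = build_city_network(flights, AIRPORT_TO_CITY)
```

Keying edges by a `frozenset` of the two city names makes the edge undirected by construction, so no separate symmetrization step is needed.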

Degree and degree distribution
The node degree is an important indicator that reflects the statistical characteristics of the interconnections between nodes in a network. The degree $k_i$ of a node $v_i$ is defined as the number of edges connected to $v_i$. The five cities with the largest node degrees in the aviation network shown in Figure 1 are Beijing, Shanghai, Chengdu, Guangzhou, and Shenzhen, which are the major hub cities in China.
The degree distribution [28] is the proportion of nodes with degree $k$ in the network, denoted as $p(k)$. Figure 2 shows the degree distribution of the aviation network depicted in Figure 1. From Figure 2, it can be observed that the degree distribution of this network approximates a power-law distribution. This is consistent with the fact that most airports have very few routes, while a few large hub airports connect directly to a large number of other airports.
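As a small illustration of the definition of $p(k)$, the following Python sketch counts node degrees in a toy star-shaped graph. The adjacency dictionary is made up for the example and is not the aviation network.

```python
from collections import Counter

def degree_distribution(adjacency):
    """Return {k: p(k)}, the fraction of nodes that have degree k."""
    n = len(adjacency)
    counts = Counter(len(neighbors) for neighbors in adjacency.values())
    return {k: c / n for k, c in sorted(counts.items())}

# Toy graph: one hub connected to three leaves.
adjacency = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
p = degree_distribution(adjacency)
```

Here three of the four nodes have degree 1 and one node has degree 3, the hub-and-spoke pattern that a heavy-tailed degree distribution reflects at larger scale.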

Weighted average path length
In an unweighted network, the average path length is defined as the average of the shortest path lengths between any two nodes in the network, calculated as follows:
$$L = \frac{1}{N(N-1)} \sum_{i \neq j} d_{ij},$$
where $N$ represents the number of nodes in the network and $d_{ij}$ represents the length of the shortest path between nodes $v_i$ and $v_j$. A lower value of $L$ indicates better connectivity in the network.
Calculating the shortest path between nodes in a weighted network requires consideration of the weights, where the weight between two neighboring nodes indicates the length of the path between them. By the definition of the average shortest path, a smaller value indicates better network connectivity. In the aviation network, however, if we use the number of flights as the weight between two adjacent nodes, larger values indicate more flights, which means better connectivity. Thus in this paper, we first normalize the weights of the network and then take the reciprocal of the normalized weights, so that strong connections are shorter than weak connections. The normalization method was introduced in [29]; in it, $W_{i,j}$ represents the normalized weight, $w_{i,j}$ denotes the original weight between nodes $v_i$ and $v_j$ before normalization, and $m$ represents the total number of edges in the network $G$.
The weighted shortest path length $s^w_{ij}$ [30] between node $v_i$ and node $v_j$ is calculated as follows:
$$s^w_{ij} = \min\left(\frac{1}{W_{i,c}} + \cdots + \frac{1}{W_{d,j}}\right),$$
where $c$ and $d$ refer to the nodes traversed by the shortest path between node $v_i$ and node $v_j$. The average path length of a weighted network with $N$ nodes is then calculated in this paper as follows:
$$L_w = \frac{1}{N(N-1)} \sum_{i \neq j} s^w_{ij}.$$
The average path length of the Chinese aviation network in Figure 1 is calculated to be 2.5097. This relatively small value indicates that only a few transfers are required to reach any city, which satisfies the current needs of air transportation in China.
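The weighted shortest path and the average path length can be sketched with Dijkstra's algorithm on reciprocal weights. This is an illustration under an explicit assumption: the normalization formula of [29] is not reproduced here, so the sketch divides each weight by the mean edge weight as a stand-in. The function names and the toy weights are hypothetical, and the graph is assumed to be connected.

```python
import heapq

def normalize(weights):
    """Assumed stand-in normalization: divide each weight by the mean edge weight."""
    mean = sum(weights.values()) / len(weights)
    return {edge: w / mean for edge, w in weights.items()}

def to_adjacency(weights):
    """Turn {(u, v): W} into a symmetric dict-of-dicts adjacency structure."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def weighted_distances(adj, source):
    """Dijkstra where crossing an edge costs 1 / W, so strong ties are short."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + 1.0 / w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def average_path_length(adj):
    """Mean of s^w_ij over all ordered pairs i != j (connected graph assumed)."""
    nodes = list(adj)
    n = len(nodes)
    total = 0.0
    for u in nodes:
        dist = weighted_distances(adj, u)
        total += sum(dist[v] for v in nodes if v != u)
    return total / (n * (n - 1))

# Toy weights (flight counts) for illustration only.
raw = {("A", "B"): 4, ("B", "C"): 4, ("A", "C"): 1}
adj = to_adjacency(normalize(raw))
```

In this toy graph the direct edge A-C is weak, so the weighted shortest path from A to C goes through B, which is the behaviour the reciprocal transformation is meant to produce.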

The clustering coefficient
The clustering coefficient of a node is the proportion of actual connections among the neighbors of that node. It is commonly used to measure the density of connections between nodes and the strength of community structures within a network. Let $E_i$ be the number of edges that exist among the neighbors of the node $v_i$. The clustering coefficient $C_i$ of a node $v_i$ with degree $k_i$ is defined as $E_i$ divided by the maximum number of possible edges among its neighbors, that is,
$$C_i = \frac{2E_i}{k_i(k_i - 1)}. \tag{2.5}$$
The clustering coefficient $C$ of the network is the average of the clustering coefficients of all nodes in the network. Note that $C \in [0, 1]$. A relatively high average clustering coefficient means that the nodes in the network are closely connected to each other, forming a stronger community structure. The clustering coefficient of the network shown in Figure 1 is calculated to be $C = 0.7365$, which indicates that the network is closely connected. Because the weighted average path length of this network is also relatively small, we conclude that the Chinese aviation network exhibits the characteristics of small-world networks.
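The definition of the clustering coefficient can be checked on a toy graph with the following Python sketch. The graph is made up for the example, and nodes with fewer than two neighbors are assigned a coefficient of 0, which is a common convention assumed here rather than one stated in the text.

```python
from itertools import combinations

def clustering(adj, node):
    """C_i = 2 E_i / (k_i (k_i - 1)); taken as 0 when k_i < 2 (assumed convention)."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Network clustering coefficient C: the mean of C_i over all nodes."""
    return sum(clustering(adj, v) for v in adj) / len(adj)

# Toy graph: a triangle A-B-C with a pendant node D attached to A.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
```

Node A has three neighbors with only one edge among them, so its coefficient is 1/3, while B and C each have two mutually connected neighbors and reach the maximum value of 1.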

Strength centrality (SC)
In weighted networks, the importance of a node $v_i$ is determined by the sum of the weights of the edges incident to $v_i$. The formula for strength centrality [10] is as follows:
$$SC(v_i) = \sum_{v_j \in N_i} w_{i,j},$$
where $w_{i,j}$ represents the weight between nodes $v_i$ and $v_j$, and $N_i$ represents the set of neighbors of $v_i$.

Betweenness centrality (BC)
Betweenness centrality [30] assigns higher centrality values to nodes that lie on the shortest paths between many pairs of nodes. In simpler terms, nodes with higher betweenness centrality often have a direct influence on the flow of information in the network. In this study of the aviation network, the shortest path between two nodes is the path with the minimum weighted shortest path length between them. The betweenness centrality of a node $v_i$ is given by the following equation:
$$BC(v_i) = \sum_{k \neq i \neq j} \frac{\delta_{kj}(i)}{\delta_{kj}},$$
where $\delta_{kj}$ represents the number of weighted shortest paths between $v_k$ and $v_j$, and $\delta_{kj}(i)$ represents the number of weighted shortest paths between $v_k$ and $v_j$ that pass through the node $v_i$.

Closeness centrality (CC)
Closeness centrality [13] is based on the idea that the closer a node is to the rest of the nodes in the network, in terms of the weighted average distance, the faster information can spread from it throughout the network. The normalized value of closeness centrality is essentially the inverse of the average distance and can be expressed as follows:
$$CC(v_i) = \frac{N - 1}{\sum_{j \neq i} s^w_{ij}},$$
where $N$ is the number of nodes in the network and $s^w_{ij}$ is the weighted shortest path length defined above.

Eigenvector centrality (EC)
Eigenvector centrality [31] holds that the importance of a node depends on both the number of its neighboring nodes and the importance of those neighbors. The more important the neighbors connected to a particular node, the more important that node is considered. The calculation formula is as follows:
$$x_i = c \sum_{j=1}^{N} a_{ij} x_j,$$
where $c$ is a proportionality constant and $x = (x_1, x_2, \ldots, x_N)^T$. After multiple iterations to reach a stable state, it can be written in matrix form as follows:
$$x = cAx.$$
Here, $A = (a_{ij})$ is the adjacency matrix of the network, and $x$ is the eigenvector corresponding to the eigenvalue $c^{-1}$ of the matrix $A$.
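The "multiple iterations to reach a stable state" can be illustrated by power iteration. The sketch below is a minimal illustration, not the implementation used in the experiments: the function name, the max-normalization, the stopping tolerance, and the toy graph are all assumptions, and convergence is only expected for connected, non-bipartite graphs.

```python
def eigenvector_centrality(adj, iterations=200, tol=1e-12):
    """Power iteration x <- A x, rescaled so that the largest entry equals 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: sum(w * x[u] for u, w in adj[v].items()) for v in adj}
        norm = max(new.values())
        new = {v: value / norm for v, value in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x

# Toy graph with unit weights: a triangle A-B-C plus a pendant node D attached to A.
adj = {
    "A": {"B": 1.0, "C": 1.0, "D": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0},
    "D": {"A": 1.0},
}
x = eigenvector_centrality(adj)
# The eigenvalue 1/c can be read off from any node, here node A.
lam = sum(w * x[u] for u, w in adj["A"].items()) / x["A"]
```

After convergence, `x` satisfies $Ax = \lambda x$ up to numerical tolerance with $\lambda = c^{-1}$, and the hub A receives the largest score.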

Weighted Laplacian energy centrality
In this section, we extend a centrality method called Laplacian energy centrality [15] to weighted networks. For this, we first give some definitions.
Let $G = (V, E, W)$ be a weighted network with $n$ nodes and $m$ edges, without loops and multi-edges, where the node set is $V(G) = \{v_1, v_2, v_3, \ldots, v_n\}$ and the edge set is $E(G) = \{e_1, e_2, \ldots, e_m\}$. Any edge $e = v_i v_j$ in $E(G)$ has a weight $w_{i,j}$, which is contained in the set $W(G)$. It is clear that $w_{i,j} = w_{j,i}$ and $w_{i,i} = 0$; we also set $w_{i,j} = 0$ if $v_i$ and $v_j$ are not adjacent. Then the adjacency matrix of $G$ is defined as
$$A(G) = (w_{i,j})_{n \times n}.$$
For each row $i$, we define its sum as
$$s_i = \sum_{j=1}^{n} w_{i,j} = \sum_{v_j \in N_i} w_{i,j},$$
where $N_i$ is the set of neighboring nodes of node $v_i$; $s_i$ is the strength of node $v_i$. The degree matrix of $G$ is defined by
$$D(G) = \mathrm{diag}(s_1, s_2, \ldots, s_n).$$
The Laplacian matrix of the weighted network $G$ is defined as
$$L(G) = D(G) - A(G).$$
Some well-known properties of the Laplacian matrix $L(G)$ are listed as follows:
• $L(G)$ is symmetric, singular, and positive semi-definite;
• All eigenvalues are real and nonnegative;
• The smallest eigenvalue is always 0.
The eigenvalues of $L(G)$ can be arranged as
$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n = 0.$$
Using the spectral features of $L(G)$, the third Laplacian energy of $G$ is defined as
$$E^3_L(G) = \sum_{i=1}^{n} \lambda_i^3.$$
The third Laplacian energy centrality $LC_w(v_i)$ of a node $v_i$ in the weighted network $G$ (the unweighted version was defined in [15]) is then defined as
$$LC_w(v_i) = E^3_L(G) - E^3_L(G - v_i),$$
where $G - v_i$ denotes the network obtained from $G$ by deleting the node $v_i$ and its incident edges. In this paper, we call the above centrality weighted Laplacian energy centrality. For the reason for choosing the third power of the eigenvalues, readers can refer to [15]. Next, we give an expression of $LC_w(v_i)$ in terms of the local information of the node $v_i$.
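Theorem 3.1. (weighted Laplacian energy centrality): Let $G$ be a weighted network with $n$ nodes. For a node $v_i \in V(G)$, we have
$$LC_w(v_i) = s_i^3 + \sum_{v_j \in N_i} \left(3 s_i w_{i,j}^2 - 2 w_{i,j}^3\right) + 3 \sum_{v_j \in N_i} w_{i,j} \left(s_j^2 + \sum_{v_k \in N_j} w_{j,k}^2\right) - 6 \Delta^w_i,$$
where $s_i$ is the strength of node $v_i$ and $\Delta^w_i$ is the sum of the weights of the triangles containing the node $v_i$ (the weight of a triangle means the product of the weights of its three edges).
Proof. Let $\Delta^w$ be the sum of the weights of all triangles in $G$. From the definition of $E^3_L(G)$, we have
$$E^3_L(G) = \sum_{i=1}^{n} \lambda_i^3 = \mathrm{tr}(L^3) = \mathrm{tr}\left((D - A)^3\right) = \mathrm{tr}(D^3) - \mathrm{tr}(D^2 A) - \mathrm{tr}(DAD) - \mathrm{tr}(AD^2) + \mathrm{tr}(DA^2) + \mathrm{tr}(ADA) + \mathrm{tr}(A^2 D) - \mathrm{tr}(A^3).$$
Since $\mathrm{tr}(D^2 A) = \mathrm{tr}(DAD) = \mathrm{tr}(AD^2) = 0$, $\mathrm{tr}(DA^2) = \mathrm{tr}(ADA) = \mathrm{tr}(A^2 D) = \sum_{k=1}^{n} s_k \sum_{j=1}^{n} w_{k,j}^2$, and $\mathrm{tr}(A^3) = 6\Delta^w$, from the above equation we obtain
$$E^3_L(G) = \sum_{k=1}^{n} s_k^3 + 3 \sum_{k=1}^{n} s_k \sum_{j=1}^{n} w_{k,j}^2 - 6 \Delta^w.$$
Therefore, comparing this expression for $G$ with the same expression for the network obtained by deleting $v_i$, in which each neighbor $v_j$ of $v_i$ has strength $s_j - w_{i,j}$ and the triangles containing $v_i$ are lost, we obtain
$$LC_w(v_i) = s_i^3 + \sum_{v_j \in N_i} \left(3 s_i w_{i,j}^2 - 2 w_{i,j}^3\right) + 3 \sum_{v_j \in N_i} w_{i,j} \left(s_j^2 + \sum_{v_k \in N_j} w_{j,k}^2\right) - 6 \Delta^w_i.$$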

□
If we only consider the case that all edge weights in the network are nonnegative, then from the above theorem and the fact that $3 s_i w_{i,j}^2 - 2 w_{i,j}^3 \geq 0$, we conclude that $LC_w(v_i) \geq 0$ for any node $v_i$. Moreover, from the above theorem one can easily obtain the expression of Laplacian energy centrality for unweighted networks given in [15]. In the rest of the paper, $LC_w$ is written as LC for short.
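A local expression of this kind can be checked numerically on a small example. The Python sketch below rests on an explicit assumption: $LC_w(v_i)$ is taken to be the drop in the third Laplacian energy, computed as $\mathrm{tr}(L^3)$, when $v_i$ and its incident edges are deleted. The local form used in `lc_local` is the one obtained by expanding $\mathrm{tr}((D-A)^3)$ under that assumption, namely $s_i^3 + 3 s_i q_i + \sum_{v_j \in N_i} (3 w_{i,j} s_j^2 + 3 w_{i,j} q_j - 2 w_{i,j}^3) - 6\Delta^w_i$ with $q_k = \sum_j w_{k,j}^2$. All function names and the toy weights are hypothetical.

```python
from itertools import combinations

def laplacian(adj, nodes):
    """Weighted Laplacian D - A of the subgraph induced by `nodes`."""
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    L = [[0.0] * n for _ in range(n)]
    for v in nodes:
        for u, w in adj[v].items():
            if u in index:
                L[index[v]][index[u]] -= w
                L[index[v]][index[v]] += w
    return L

def trace_cubed(M):
    """tr(M^3), which equals the sum of cubed eigenvalues for a symmetric M."""
    n = len(M)
    return sum(M[i][j] * M[j][k] * M[k][i]
               for i in range(n) for j in range(n) for k in range(n))

def lc_by_deletion(adj, v):
    """Assumed definition: energy of G minus energy of G with v deleted."""
    nodes = list(adj)
    rest = [u for u in nodes if u != v]
    return trace_cubed(laplacian(adj, nodes)) - trace_cubed(laplacian(adj, rest))

def lc_local(adj, v):
    """Closed form using only strengths, squared weights and triangles at v."""
    s = {u: sum(adj[u].values()) for u in adj}
    q = {u: sum(w * w for w in adj[u].values()) for u in adj}
    nbrs = adj[v]
    triangles = sum(nbrs[a] * nbrs[b] * adj[a][b]
                    for a, b in combinations(nbrs, 2) if b in adj[a])
    neighbor_terms = sum(3 * w * s[u] ** 2 + 3 * w * q[u] - 2 * w ** 3
                         for u, w in nbrs.items())
    return s[v] ** 3 + 3 * s[v] * q[v] + neighbor_terms - 6 * triangles

# Toy weighted graph: triangle A-B-C plus a pendant node D attached to A.
adj = {
    "A": {"B": 2, "C": 1, "D": 3},
    "B": {"A": 2, "C": 4},
    "C": {"A": 1, "B": 4},
    "D": {"A": 3},
}
ranking = sorted(adj, key=lambda v: lc_local(adj, v), reverse=True)
```

The local form needs only the strengths and squared weights of a node and its neighbors plus the triangles through the node, so it avoids computing any eigenvalues, which is what makes this kind of centrality practical on larger networks.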

Experiment and analysis
The performance of LC in identifying central nodes is evaluated in terms of network dynamics and robustness. We present experimental results comparing weighted Laplacian energy centrality with a series of other methods under these metrics.

Evaluation metrics
In this paper, we assess the importance of the identified key nodes from two perspectives: the influence of central nodes on information propagation, and the degree of change in network robustness after the network is attacked. The SIR (Susceptible-Infected-Recovered) model, defined in the next subsection, is used to simulate information propagation, while the primary indicators of network robustness are the relative size of the maximum connected subgraph and the network efficiency.

SIR model
To analyze the failure process of the network under dynamic conditions, it is essential to study a propagation dynamics model, which can capture the dynamic connections between the various factors in the network and reveal the laws that the system follows under dynamic conditions. In recent years, some scholars have used the SIR model [32] to simulate the spread of disease in the global aviation network, proposed strategies to curb the spread, and provided a risk assessment system. The SIR model can be used to simulate both the spread of disease in the aviation network and the impact of flight delays.
In the SIR model, each node is in one of three states: susceptible (S), infected (I), or recovered (R). In the initial stage, the states of the individual nodes in the network are set. At each time step, nodes in the infected state (I) attempt to infect their susceptible neighbors (S) with an infection rate $\beta$, and recover to the immune state (R) with a probability $\gamma$. Recovered nodes are immune: they can neither be infected nor infect others. The propagation process ends when no nodes in the infected state (I) remain in the network. The dynamics are described by
$$\frac{dS(t)}{dt} = -\beta S(t) I(t), \qquad \frac{dI(t)}{dt} = \beta S(t) I(t) - \gamma I(t), \qquad \frac{dR(t)}{dt} = \gamma I(t),$$
with $n = S + I + R$, where $S$, $I$, and $R$ represent the numbers of susceptible, infected, and recovered nodes at time $t$, respectively.
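The node-level process described above can be sketched as a discrete-time simulation. This is a minimal illustration, not the simulation code used for the experiments: the function name, the callable `beta(u, v)` for edge-dependent infection probabilities, and the toy path graph are assumptions.

```python
import random

def simulate_sir(adj, seeds, beta, gamma, rng):
    """Run one discrete-time SIR outbreak.

    beta(u, v) is the probability that infected u infects susceptible v in one
    step; gamma is the per-step recovery probability (must be > 0 to terminate).
    Returns the number of nodes that are infected or recovered after each step.
    """
    infected = set(seeds)
    recovered = set()
    history = [len(infected)]
    while infected:
        newly_infected = set()
        for u in sorted(infected):  # sorted for reproducibility with a seeded rng
            for v in sorted(adj[u]):
                if v in infected or v in recovered:
                    continue  # only susceptible neighbors can be infected
                if rng.random() < beta(u, v):
                    newly_infected.add(v)
        newly_recovered = {u for u in sorted(infected) if rng.random() < gamma}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
        history.append(len(infected) + len(recovered))
    return history

# Toy path graph A - B - C with a certain-infection, certain-recovery setting.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
full_spread = simulate_sir(adj, ["A"], lambda u, v: 1.0, 1.0, random.Random(0))
no_spread = simulate_sir(adj, ["A"], lambda u, v: 0.0, 1.0, random.Random(0))
```

Passing `beta` as a function of the edge lets the same loop handle a uniform infection rate as well as rates that depend on edge weights. With non-trivial probabilities, the history would be averaged over many runs.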

Network robustness
The robustness of a complex network refers to the ability of the network to resist damage when it suffers different degrees of damage [33]. The robustness of the aviation network studied in this paper refers to the ability of the network to maintain its overall transportation function when natural disasters, public health emergencies, terrorist attacks, and other emergencies trigger route disruptions or airport closures. Specifically, it concerns whether destinations can still be reached under attack and, if so, whether transportation efficiency is preserved as far as possible. An attack can change both the structure and the transportation efficiency of the network. We therefore introduce two of the most classical measures of network robustness: the relative size of the maximum connected subgraph and the weighted network efficiency.
The maximum connected subgraph is the largest connected component that remains after a network is attacked, and it is an index reflecting the connectivity of the network. The relative size of the maximum connected subgraph $S$ is defined as the ratio of the number of nodes in the maximum connected subgraph after the attack to the total number of nodes in the original network, which is calculated as follows:
$$S = \frac{N'}{N},$$
where $N'$ is the number of nodes in the largest connected component after the attack, and $N$ is the number of nodes in the original network.
Network efficiency captures the effect of structural changes on the shortest path distances between nodes after a network has been attacked. For unweighted networks, the shortest path between two nodes $v_i$ and $v_j$ is a path with the least number of edges, and the shortest path length $d_{ij}$ is the number of edges on it. The efficiency between two nodes is the inverse of the shortest path length between them; if there is no path between two nodes, the shortest path length is infinite and the efficiency is 0, which does not affect the result of the calculation. The efficiency of the entire network, denoted by $E$, is defined as the average efficiency over all pairs of nodes:
$$E = \frac{1}{N(N-1)} \sum_{i \neq j} \frac{1}{d_{ij}}.$$
In the case of a weighted network, the shortest path length between two nodes takes the edge weights into account and is denoted by $s^w_{ij}$. Therefore, the weighted network efficiency $E_w$ is defined as:
$$E_w = \frac{1}{N(N-1)} \sum_{i \neq j} \frac{1}{s^w_{ij}}.$$
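Both robustness measures can be sketched in Python as follows. This is an illustration under assumptions: edges store normalized weights and are traversed at cost 1/W, the efficiency is averaged over the nodes still present in the graph passed in, and all names and the toy path graph are hypothetical.

```python
import heapq

def remove_nodes(adj, removed):
    """Return a copy of the network without the removed nodes and their edges."""
    removed = set(removed)
    return {u: {v: w for v, w in nbrs.items() if v not in removed}
            for u, nbrs in adj.items() if u not in removed}

def largest_component_size(adj):
    """Number of nodes in the largest connected component (depth-first search)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def distances(adj, source):
    """Dijkstra with edge cost 1 / W; unreachable nodes are simply absent."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + 1.0 / w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def weighted_efficiency(adj):
    """Average of 1 / s^w_ij over ordered pairs; unreachable pairs contribute 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for u in adj:
        total += sum(1.0 / d for v, d in distances(adj, u).items() if v != u)
    return total / (n * (n - 1))

# Toy path graph A - B - C - D with unit normalized weights.
path = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 1.0},
        "C": {"B": 1.0, "D": 1.0}, "D": {"C": 1.0}}
attacked = remove_nodes(path, ["B"])
relative_size = largest_component_size(attacked) / len(path)
```

Removing B splits the path into {A} and {C, D}, so the relative size of the largest component drops to 0.5 and the efficiency falls because A can no longer reach any other node.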

Influence experiment
This paper simulates the spreading process in the aviation network using the SIR model. In the initial state, the top 10 ranked nodes obtained by a given method, as shown in Table 1, are the initially infected nodes, and the remaining nodes are susceptible. If we used the infection threshold $\beta_c$ [34] as the probability of infection in this aviation network, the probability of infection between any two adjacent nodes would be the same. However, the edges of the network have weights: an edge with a large weight means more round-trip flights between the two cities, so the probability of propagation between two cities should depend on the weight of the edge, and the infection probability $\beta_{ij}$ of an edge with a large weight should be larger. According to this principle, we set the infection rate $\beta_{ij}$ between two adjacent nodes $v_i$ and $v_j$ as a function of the edge weight $w_{i,j}$ and of $w_{max}$, the maximum weight among all edges in the aviation network. The recovery rate $\gamma$ of the network is set to 1. The SIR propagation experiments for each method are repeated 100 times and the results are averaged. The increase in the number of nodes that have been infected as time increases is plotted in Figure 3. It can be observed that the number of infected nodes increases rapidly for $t < 5$ and gradually stabilizes for $t > 5$. Once propagation has stabilized, the top 10 ranked nodes derived from the LC method, used as the initially infected nodes, have infected the largest number of nodes in the network. This demonstrates that the top 10 ranked nodes derived from the LC method have the greatest spreading influence in the network. The top ten nodes of EC and LC are the same. Because infection is probabilistic, the SIR simulations produce different results for the two methods, but they still rank first and second. Protecting the most influential hub cities in the aviation network identified by the LC method can therefore effectively stop the spreading process of similar disease outbreaks.

Robustness experiment
In Figure 4, the X-axis represents the percentage of attacked nodes, that is, the ratio of the number of nodes removed from the network to the initial number of nodes in the network, and the Y-axis represents one of five robustness metrics: the average degree $\langle k \rangle$, the average clustering coefficient $C$, the relative size of the maximum connected subgraph $S$, the global efficiency $E_w$, and the average path length $L_w$. The network is subjected to two attack modes: random attacks and intentional attacks.
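The attack protocol can be sketched as a loop that removes nodes in a given order and records one metric after each removal. The sketch below tracks only the relative size $S$ and makes several assumptions that the text does not specify: the ranking is computed once on the intact network rather than recomputed after each removal, node degree is used as a stand-in score for the intentional attack, and the star graph and function names are hypothetical.

```python
import random

def largest_component_size(adj):
    """Number of nodes in the largest connected component (depth-first search)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def attack_curve(adj, order):
    """Remove nodes in `order`; return S = N'/N after each removal."""
    n = len(adj)
    remaining = {u: set(nbrs) for u, nbrs in adj.items()}
    curve = []
    for node in order:
        for v in remaining.pop(node):
            remaining[v].discard(node)
        curve.append(largest_component_size(remaining) / n)
    return curve

# Toy star network: hub H connected to four leaves.
adj = {"H": {"a", "b", "c", "d"}, "a": {"H"}, "b": {"H"}, "c": {"H"}, "d": {"H"}}

score = {u: len(nbrs) for u, nbrs in adj.items()}  # stand-in for a centrality score
targeted = sorted(adj, key=lambda u: (-score[u], u))  # intentional attack order
shuffled = sorted(adj)
random.Random(0).shuffle(shuffled)  # random attack order

targeted_curve = attack_curve(adj, targeted[:2])
random_curve = attack_curve(adj, shuffled[:2])
```

On the star graph, removing the hub first isolates every leaf at once, whereas removing a leaf first leaves four of the five nodes connected. Swapping `score` for another centrality changes only the attack order, not the loop.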
The trends of the average degree $\langle k \rangle$ of the aviation network under random and intentional attack modes are shown in Figure 4a. The change of the average degree reflects the robustness of the network and its resistance to interference. As can be seen in Figure 4a, in the random attack mode $\langle k \rangle$ decreases slowly throughout the process, whereas in the intentional attack mode $\langle k \rangle$ decreases exponentially. The trend shows that attacks based on the LC method minimize $\langle k \rangle$ the fastest compared to the other methods. The trends of the clustering coefficient $C$ of the Chinese aviation network under random and intentional attack modes are shown in Figure 4b. In the random attack mode, the value of $C$ changes smoothly, indicating that the impact of random interference on local transportation efficiency is small at the initial stage. In the intentional attack mode, by contrast, the value of $C$ decreases sharply and collapses rapidly, and air transportation quickly becomes paralyzed, indicating that intentional attacks have a large impact on local transportation efficiency. The LC method is very close to DC and EC in attack effectiveness, and is generally superior to BC and CC.
The trends in the relative size of the maximum connected subgraph $S$ of the Chinese aviation network under random and intentional attack modes are shown in Figure 4c. In the random attack mode, $S$ decreases linearly and continuously throughout the process, which shows that the aviation network is resistant to random attacks. In the intentional attack mode, $S$ decreases dramatically. At %n = 20, the decreasing trend slows down, by which point the network has nearly collapsed, and at %n = 25, $S$ is almost equal to 0. The trend shows that attacks based on the LC method minimize $S$ the fastest compared to the other methods.
The trends of the global efficiency $E_w$ of the Chinese aviation network under random and intentional attack modes are shown in Figure 4d. In the random attack mode, $E_w$ stays flat throughout and the global efficiency does not decrease as nodes are attacked, indicating that random attacks have little impact on the robustness of the network. In the intentional attack mode, all of the methods lead to a sharp drop in $E_w$. Between %n = 8 and %n = 19, the difference between the various methods is not large, and $E_w$ under the LC method drops to its lowest value at %n = 19 and then remains stable. The trend shows that the LC method minimizes the global efficiency of the network faster.
The trend of the average shortest path $L_w$ is shown in Figure 4e. For the aviation network, a smaller $L_w$ means that fewer transit cities are needed in the air transportation process. In the random attack mode, the value of $L_w$ changes smoothly with slight fluctuations. At %n = 13.5, the value of $L_w$ increases slightly, but it does not differ much from the initial average shortest path of the network, and it remains stable afterwards. These changes show that random attacks have little effect on the overall convenience of the aviation network. In the intentional attack mode, $L_w$ first increases sharply and then decreases sharply at %n = 13.5. The fastest increase of $L_w$ is caused by attacks based on the LC method. This shows that intentional attacks have a significant impact on the overall convenience of the aviation network in the initial stage, because the cities removed in the initial stage are the central hub cities. Once they are removed, intercity air transportation requires multiple transfers, so the value of $L_w$ increases dramatically. After these two sharp changes, the remaining nodes are affected less and less, or not at all, by further removals, and the number of transfers falls back to the average level; after %n = 23, the convenience of the entire network is gradually lost completely. The above changes indicate that intentional attacks have a large impact on the overall convenience of the aviation network. In the initial stage, the LC method breaks the network at the fastest speed. In the later stage, the BC method leads the other methods in maximizing the average shortest path of the network, while the LC method remains superior to the remaining methods.

Figure 3. SIR propagation graph of the aviation network.

Figure 4. Variations of five robustness metrics for node-attacked networks.

Table 1. Top 10 major cities obtained by different methods.