Approximating Betweenness Centrality to Identify Key Nodes in a Weighted Urban Complex Transportation Network

The key nodes in a complex transportation network have a significant influence on the safety of traffic operations, connectivity reliability, and the performance of the entire network. However, the identification of key nodes in existing urban transportation networks has mainly focused on nonweighted networks and the network information of the nodes themselves, which do not accurately reflect their global status. Thus, the present study proposes a key node identification algorithm that combines traffic flow features and is based on weighted betweenness centrality. This study also uses weighted roads to construct an L-space weighted transportation network and an approximate algorithm for betweenness centrality in order to reduce the complexity of the calculations. The results of the simulation indicate that the proposed algorithm is not only capable of identifying the key nodes in a relatively short amount of time, but it does so with high accuracy. The findings of this study can be used to provide decision-making support for road network management, planning, and urban traffic construction optimization.


Introduction
Since Watts and Strogatz [1] developed a general network with small-world properties and Barabasi and Albert [2] created a scale-free network, the study of complex networks has attracted increasing attention from scholars and industry professionals [3,4]. Moreover, previous studies have suggested that urban transport networks have small-world features, scale-free features [5], or both [6]. Such networks typically satisfy the structural and functional characteristics of complex networks. For example, Sienkiewicz et al. [7] indicated that the degree distribution of an urban transport network in Poland follows either a power-law distribution or an exponential distribution.
In general, an urban transport network is composed of various types of roads that are associated with different functions, grades, and locations with a certain density. Within each network, the nodes are the junctions where two or more highways converge. Overall, the nodes and the connecting sections between the nodes are the key components of an urban traffic structure. Since the key nodes in an urban transportation network have a significant influence on the safety, reliability, and overall performance of the network, the identification of key nodes and the examination of their complexities have been the subjects of numerous studies [8,9]. In addition, identifying the key nodes in a network can support accurate and effective traffic control as well as guidance for existing transport networks. Previous research has highlighted the importance of nodes [10][11][12][13] and provided new perspectives and methods for evaluating node importance in urban complex road networks. For example, such work may help traffic operators and managers to better plan budgets for infrastructure development [1]. Song et al. [14] introduced urban flow intensity indicators in the evaluation index system to elucidate the importance of road network nodes. They also used factor analysis to avoid random subjective values in the node calculations and the K-means clustering method to distinguish the levels of each node for further analysis. Wang et al. [15] focused on traffic flow characteristics, travel speeds, delays, and intersection saturation rate in order to distinguish the key nodes in a traffic network, which is superior to simply relying on the experience-based judgment of traffic engineers. Moreover, Zuo et al. [16] used road network effectiveness and efficiency as evaluation indicators to identify the key nodes as well as the links between them. Dreyfuss and Giat [9,17] proposed a risk model that can be used to identify and map the critical nodes within a network.
Although the abovementioned research has made significant contributions to the identification of key nodes, several critical issues need to be further investigated: (1) the widely used K-means clustering method typically suffers from low computational efficiency, because the algorithm needs to constantly adjust the sample classification and recalculate the new cluster centers; (2) most existing algorithms can only handle the local information of the involved network, while the global information is often ignored; (3) most existing studies only consider the physical structure of the urban road network, while some traffic-related factors are typically neglected.
To overcome these limitations, this study proposes an enhanced Betweenness Centrality (BC) algorithm to better identify the key nodes in a transport network. BC [18], a well-known index that ranks the importance of a node, is the ratio of the number of shortest paths through the node to the number of all shortest paths in the network. In this regard, the higher the BC of the node, the more important the node is in the network. However, the computational complexity of the BC calculation is high and greatly restricts the size of the computable network; in the case of a complex traffic network in a large city, the time required for the algorithm is significant. Under such conditions, the index loses its practical application value. Therefore, an enhanced BC is proposed, in which the fast algorithm proposed by Ulrik Brandes (UB) [19] is embedded into the traditional BC calculation in order to reduce its complexity. The time cost of the algorithm can then be greatly reduced, and the proposed algorithm is capable of dealing with a large urban complex network. The enhanced BC reflects the role and influence of the corresponding nodes in the whole network. Because BC is a global quantity, the algorithm can overcome the abovementioned research shortcomings. To verify the effectiveness and efficiency of the proposed algorithm, Nanjing, the capital city of Jiangsu Province, China, is used as the case study.

Key Node Evaluation Model for a Weighted Urban Complex Transportation Network
Problem Description. Based on the primal approach [20], the research problem is represented as $G = \{V, E\}$, in which $V = \{v_1, v_2, \ldots, v_N\}$ represents the set of nodes in the traffic network and $v_i$ represents node $i$. The nodes represent the crossroads and the demarcation sections where the geometric factors of the roads undergo major changes. Moreover, $E = \{e_1, e_2, \ldots, e_M\}$ represents the set of edges that follow the footprints of actual mapped streets, while $e_j$ represents edge $j$.
Definition of Key Nodes in an Urban Complex Transportation Network. In large-scale complex networks, not all nodes are equal. The most important nodes in a weighted network are those whose removal results in the greatest increase in the shortest distance between two specified nodes [21]. In an urban network, the key nodes play a central role in the entire network, since they are not only affected by the network topology, but they are also affected by the traffic flow in the road network. In terms of structural characteristics, key nodes also play a pivotal and controlling role in the road network. In fact, the failure of key nodes will lead to the loss of local connectivity in the road network and even the deterioration of global connectivity. In extreme cases, the overall efficiency of the transportation network will sharply decline.
The nonhomogeneous topology of an urban complex transportation network also determines the importance of the nodes in the network. The importance of the nodes usually depends on two aspects: the position of the nodes in the network (e.g., the center node and the noncenter node) and the connectivity capability of the nodes. In regard to the latter aspect, the shorter the path through the node, the greater the connectivity and importance of the node to the entire network.
Evaluation Index. The importance of a node is closely related to its spatial location in the network. As a spatial network, the urban road network demonstrates a strong compactness and complexity in two ways. First, the number of node edges in the urban road network is large, and second, any two nodes are connected, with each node associated with multiple edges. Thus, in addition to the characteristics of the most complex weighted networks, urban road networks include features that differ from abstract networks. Such aspects can help determine the topology of urban transport networks.
The key evaluation indicators of complex transportation networks mainly include the following:
(1) Degree and Degree Distribution of the Nodes. The degree is the simplest and most important concept in regard to evaluating the characteristics of a node. It is also a fundamental parameter that describes the local characteristics of a network. Moreover, the degree of a node is the number of its connections. Thus, high-degree nodes have a greater impact on the entire network. The degree distribution is the proportion of the nodes with degree $k$ in the entire network.
(2) Average Path Length. The average path length, denoted as $L$, is the average length of the shortest path between any two nodes in the network. It is given by the following equation:

$$L = \frac{1}{N(N-1)} \sum_{i \neq j} d_{ij},$$

where $N$ is the number of nodes in the network and $d_{ij}$ denotes the shortest distance between node $i$ and node $j$. When $i = j$, $d_{ij} = 0$.
(3) Clustering Coefficient. The clustering coefficient of the network is defined as the average clustering coefficient for all of the nodes in the network. It is given by the following equation:

$$C = \frac{1}{N} \sum_{i=1}^{N} C_i,$$

where $C_i$ is the clustering coefficient of node $v_i$. When the network is a globally coupled network, the clustering coefficient is 1. However, when the network does not have any edges, the clustering coefficient is 0.
(4) Betweenness Centrality (BC). BC, which is a global centrality index, is defined by the following equation:

$$BC(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}.$$

In this case, $\sigma_{st}$ represents the number of shortest paths from node $s$ to node $t$, while $\sigma_{st}(v)$ is the number of those shortest paths that go through node $v$. Overall, BC can reflect the role and influence of the nodes in the entire network.
(5) Road Network Connectivity. Road network connectivity is the ratio of the number of the shortest paths between all of the nodes in the road network (after node $v_i$ loses its effectiveness) to the number of the shortest paths between all of the nodes in the normal road network. It is defined by the following equation:

$$R_i = \frac{S_i}{S},$$

where $S$ is the number of the shortest paths between all of the nodes in a normal road network, $S_i$ is the number of the shortest paths between all of the nodes in the road network (after node $v_i$ loses effectiveness), and $R_i$ refers to the connectivity reliability of the road network.
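As an illustration of the first three indicators, the following is a minimal Python sketch, assuming an unweighted, undirected, connected network stored as a dictionary of neighbor sets. The function names and the toy network are hypothetical and do not come from the study; the local coefficient $C_i = 2E_i / (k_i(k_i - 1))$ is the standard definition and is assumed here because the text only defines the network-level average.

```python
from collections import Counter, deque

def degree_distribution(adj):
    """P(k): fraction of nodes whose degree equals k."""
    counts = Counter(len(neighbors) for neighbors in adj.values())
    n = len(adj)
    return {k: c / n for k, c in counts.items()}

def average_path_length(adj):
    """Mean hop distance over ordered pairs of distinct nodes (connected graph assumed)."""
    n = len(adj)
    total = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())  # d_ss = 0 contributes nothing
    return total / (n * (n - 1))

def average_clustering(adj):
    """Mean of the local coefficients C_i = 2 * E_i / (k_i * (k_i - 1))."""
    total = 0.0
    for v, neighbors in adj.items():
        nb = list(neighbors)
        k = len(nb)
        if k < 2:
            continue  # C_i is taken as 0 for nodes with fewer than two neighbors
        links = sum(
            1 for i in range(k) for j in range(i + 1, k) if nb[j] in adj[nb[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

# Hypothetical toy network: a triangle (0, 1, 2) with a tail node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
p_k = degree_distribution(toy)
apl = average_path_length(toy)
cc = average_clustering(toy)
```

For a weighted road network, the breadth-first search would be replaced by a weighted shortest-path search such as Dijkstra's algorithm.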

Key Node Identification Model for a Weighted Urban Transportation Network That Integrates Road Network Traffic Characteristics
An urban transport network is a physical network with mileage and traffic capacity. It also includes both the characteristics of complex networks and distinctive traffic characteristics. Thus, this study proposes a key node identification model for a weighted urban complex road network that integrates the network topology structure and traffic network characteristics. For this purpose, the weighted traffic network includes weighted indicators of the road grades, which can be used to distinguish the important nodes in the transportation network. It also introduces an approximate algorithm for BC in order to reduce the complexity of the calculations.
In related research on urban road networks, BC and node degree indicate the importance of the nodes. Since BC reflects the impact of the nodes on the entire urban transportation network, it is an important quantification method for studying the characteristics of the network structure.
Overall, BC is an important global quantity, but its calculation must traverse the shortest path between any pair of nodes in the graph as well as record the route of the shortest path. This calculation can be difficult, since the computational complexity restricts the size of the computable network. Moreover, when the scale of the network is large, it is not feasible to employ conventional calculation methods. Thus, scholars, both at home and abroad, have conducted extensive and in-depth research on the estimation of BC. In 2001, Brandes presented [22] an efficient algorithm for calculating BC, in which the complexity of the algorithm in the unweighted network was $O(NM)$, where $N$ was the number of nodes and $M$ was the number of edges. Tang Jintao et al. [23] proposed an approximate calculation method, CDZ, based on local centrality, while Bergamini and Meyerhenke [24] presented a fully dynamic algorithm for BC approximation in weighted and unweighted graphs and showed that the algorithm can achieve substantial speedups.
The BC algorithm proposed by Brandes was the fastest algorithm at the time. The core idea of the algorithm is to select any node as the source node, use a breadth-first search (or Dijkstra's algorithm in a weighted network) to find the shortest paths from the source node to the other nodes in the network, and calculate the contribution of these shortest paths to the BC of every node that lies on them. The BC of each node in the network is then the accumulation of these contributions over all source nodes.
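The idea described above can be sketched in Python as follows. This is a minimal sketch of the unweighted variant of Brandes' algorithm, assuming an undirected network stored as adjacency lists; the function name and the toy star network are hypothetical, and a weighted road network would need Dijkstra's algorithm in place of the breadth-first search.

```python
from collections import deque

def brandes_betweenness(adj):
    """Exact BC of every node; each unordered pair (s, t) is counted twice."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:              # w is discovered for the first time
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Phase 2: accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Hypothetical toy network: a star with hub 0 and leaves 1, 2, 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
bc = brandes_betweenness(star)
```

In this toy network every shortest path between two leaves passes through the hub, so only the hub receives a nonzero score. Dividing the scores by 2 gives the count over unordered pairs in an undirected network.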
According to the BC algorithm by Brandes, the number of nodes in the network is set as $N$. Then, taking node $v_s$ as the source node, a breadth-first traversal of the network is performed. The shortest paths from node $v_s$ to the other nodes in the network determine the contribution of this source to the centrality of node $v$, as shown in the following equation:

$$\delta_s(v) = \sum_{t \neq s, v} \frac{\sigma_{st}(v)}{\sigma_{st}},$$

where $\sigma_{st}$ is the number of shortest paths from node $s$ to node $t$ and $\sigma_{st}(v)$ is the number of those paths that pass through node $v$. The BC of node $v$ is as follows:

$$BC(v) = \sum_{s \neq v} \delta_s(v).$$

After a source node $v_i$ is selected at random, the contribution of its shortest paths to node $v$ is recorded as $X_i = \delta_i(v)$, and the approximate BC of node $v$ is estimated from these sampled contributions. Overall, this algorithm significantly reduces the computational complexity of BC. More specifically, since the complexity in the unweighted network is $O(NM)$ and the complexity in the weighted network is $O(NM + N^2 \log N)$, the calculation can be completed rather quickly, even in a large-scale urban transport network.

Hence, the proposed algorithm in this study integrates the characteristics of an urban transportation network and uses weighted roads to construct an L-space weighted transportation network. Moreover, the analysis by Wang et al. of the vulnerability of a road network [25] indicated that one of the key factors that affect the importance of nodes is road grade. Thus, the identification model in the present study not only integrates the characteristics of an urban transportation network, but also constructs a weighted network according to road grade. In this case, the BC of node $v$ is expressed as in (10), where $d_v$ is the weight of node $v$.
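The random source selection described above can be illustrated with a common sampling estimator, in which the contributions of $k$ randomly chosen source nodes are summed and scaled by $N/k$. The following is a minimal Python sketch under that assumption, for an unweighted, undirected network; the estimator form, the parameter `k`, the `seed` argument, and the function names are assumptions for illustration and are not taken from the study.

```python
import random
from collections import deque

def source_dependencies(adj, s):
    """Contribution delta_s(v) of source s to every node v (unweighted BFS)."""
    sigma = {v: 0 for v in adj}
    dist = {v: -1 for v in adj}
    preds = {v: [] for v in adj}
    sigma[s], dist[s] = 1, 0
    order, queue = [], deque([s])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if dist[w] < 0:
                dist[w] = dist[u] + 1
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
                preds[w].append(u)
    delta = {v: 0.0 for v in adj}
    for w in reversed(order):
        for u in preds[w]:
            delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0  # a source does not lie between itself and other nodes
    return delta

def approximate_betweenness(adj, k, seed=0):
    """Estimate BC(v) as (N / k) * sum of delta_s(v) over k sampled sources."""
    nodes = list(adj)
    n = len(nodes)
    rng = random.Random(seed)
    sources = rng.sample(nodes, k)  # k distinct sources, sampled without replacement
    totals = {v: 0.0 for v in nodes}
    for s in sources:
        delta = source_dependencies(adj, s)
        for v in nodes:
            totals[v] += delta[v]
    return {v: n / k * total for v, total in totals.items()}

# Hypothetical toy network: a star with hub 0 and leaves 1, 2, 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
exact = approximate_betweenness(star, k=4)     # all sources used: no sampling error
estimate = approximate_betweenness(star, k=2)  # half of the sources
```

When `k` equals the number of nodes, every source is used once and the estimate coincides with the exact BC; smaller values of `k` trade accuracy for a proportional reduction in running time.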
In general, roads are divided into freeways, trunk roads, secondary roads, and branch roads, according to road width and speed limit, among other characteristics. Each class is assigned a value to denote its significance; this is the value of $d_v$ in (10). According to the literature [25], the values of $d_v$ are suggested to be set as 10, 8, 5, and 3, respectively. In addition, different intersections in an urban traffic network have different effects on traffic. For example, the traffic flow at the intersections of freeways or trunk roads is obviously higher than that at the intersections of secondary roads or branch roads. Accordingly, once the intersections of the former become congested, network-level traffic congestion can easily follow.
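The class values above can be encoded as a simple lookup, as in the following minimal Python sketch. The text does not state how an intersection that joins roads of different classes receives its weight, so the rule used here (the node takes the value of its highest-class road) is purely an assumption for illustration, as are the function name and the sample intersections.

```python
# Class values suggested in the text, following [25].
ROAD_CLASS_WEIGHT = {"freeway": 10, "trunk": 8, "secondary": 5, "branch": 3}

def node_weight(incident_road_classes):
    """Hypothetical rule: a node takes the weight of its highest-class road."""
    return max(ROAD_CLASS_WEIGHT[c] for c in incident_road_classes)

# Hypothetical intersections and the classes of the roads that meet there.
intersections = {
    "A": ["freeway", "trunk"],
    "B": ["secondary", "branch"],
    "C": ["branch", "branch"],
}
weights = {name: node_weight(classes) for name, classes in intersections.items()}
```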
A complex topological traffic network of Nanjing is constructed and represented by $G = (V, E, W)$. The set of nodes in the network is represented by $V = \{v_1, v_2, \cdots, v_N\}$, the number of network nodes is $N$, the set of edges in the network is represented by $E = \{e_1, e_2, \cdots, e_M\}$, and $W = \{w_{ij}\}$ refers to the weight of the edge between node $i$ and node $j$. Let $A$ be the adjacency matrix of $G$, so that Nanjing's weighted traffic network map information is stored as adjacency matrix $A = (a_{i,j})$. In this case, $a_{i,j} = 1$ indicates that nodes $i$ and $j$ are directly connected, whereas $a_{i,j} = 0$ indicates that they are not directly connected.
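The storage scheme described above can be sketched in Python as follows, assuming an undirected network with 0-indexed nodes and an edge list of `(i, j, weight)` triples. The function name, the edge list, and the weight values are hypothetical and serve only to show the layout of the two matrices.

```python
def build_adjacency(num_nodes, edges):
    """Return the 0/1 adjacency matrix A and the weight matrix W."""
    a = [[0] * num_nodes for _ in range(num_nodes)]
    w = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j, weight in edges:
        a[i][j] = a[j][i] = 1        # undirected: mirror each entry
        w[i][j] = w[j][i] = weight
    return a, w

# Hypothetical 4-node network; the weights are illustrative values only.
edges = [(0, 1, 10), (1, 2, 8), (2, 3, 5), (0, 2, 3)]
A, W = build_adjacency(4, edges)
degrees = [sum(row) for row in A]    # row sums of A give the node degrees
```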
The key node identification steps are as follows:
(1) Construct a topology of Nanjing's traffic network and number the nodes in the network.
(2) Assign each node a weight $d_v$ to denote its significance.
(4) Calculate the degree distribution $P(k)$, clustering coefficient, and shortest path length, and analyze the network characteristics.
(6) Synthesize the above to identify the key nodes.
The algorithm in this study is presented in Algorithm 1.

Experiment Analysis
The city of Nanjing is surrounded by mountains on three sides, with the Yangtze River running from north to south.
During the period of the Republic of China (1912-1949), Zhongshan Road, known as the "National Meridian," was used as the main axis of the road network structure. After years of construction and development, the city of Nanjing completed its chessboard-style transportation network (see Figure 1). As shown in Figure 1, the major highway network of Nanjing is composed of 15 freeways and a beltway. Known as "Longitude Six Latitude Nine," they form the fast-moving road network, which includes an inner ring and two outer rings with peripheral freeways that span in all directions. The total length of the overall network is approximately 325 kilometers. Overall, Nanjing's transportation network is well planned, compared to the networks of Jinan, Harbin, Beijing, and other large cities. However, the congestion during the morning and evening peak travel periods at some key nodes has been difficult to resolve. According to the China Urban Traffic Analysis Report [26], during the first quarter of 2017, Nanjing's Congestion Index was ranked 20th in China. Thus, traffic congestion, especially in Nanjing and other large cities in China, remains a focus among transportation experts.
The spatial relationships between road intersections in Nanjing were modeled according to the characteristics of its transportation network. The intersections of the road segments were abstracted as nodes in order to establish an L-space weighted complex network. For the purpose of this study, the smaller streets with less traffic have been ignored. The topology of Nanjing's traffic network, based on the aforementioned conditions, is illustrated in Figure 2.
All of the results in this study are only applicable to the data provided in Figure 2. According to the results of adjacency matrix $A = (a_{i,j})_{N \times N}$, the average clustering coefficient of Nanjing's traffic network is 0.0439, and the average path length is 6.46. The findings show that the network is a small-world network that includes some properties of a random network. The degree distribution of the network, based on adjacency matrix $A = (a_{i,j})_{N \times N}$, is presented in Figure 3.
As shown in Figure 3, the degree distribution approximately follows a power-law distribution in the double logarithmic coordinate system, indicating that the nodes in the network are heterogeneous. Moreover, since the degrees of the majority of the nodes are concentrated near the average degree of the network, the overall network structure is reasonable and the connectivity of the road network is acceptable. However, although such a network is robust to random attacks, it is still vulnerable to malicious attacks. In this regard, effectively identifying and monitoring the key nodes in the network helps ensure its overall performance.
In order to verify the proposed algorithm, the algorithms in [15,16] were compared with the algorithm in the present study. Table 1 presents a comparison of the top 10 key nodes among the three algorithms.
As shown in Table 1, some of the key nodes in the three algorithms are the same, but they are ranked differently. The proposed weighted network algorithm, which integrates the characteristics of the traffic network, is more accurate. Moreover, based on actual road network operations, these nodes are consistent with the nodes in heavy traffic, indicating that they play an important supporting role in the connectivity of the road network. According to Nanjing Big Data, Nanjing's peak morning traffic period is between 7:10 a.m. and 9:30 a.m. Thus, the traffic patterns during this time period are the subjects of focus. Overall, the findings are consistent with the actual situation. For example, during peak hours, various nodes, such as 46, 49, 98, 221, 288, 299, 309, and 560, experience the heaviest traffic (see Figure 4).
As shown in Figure 2, the congested intersections during the peak morning traffic period in Nanjing are mainly concentrated in nodes 46, 93, 150, 288, 225, 299, 52, 313, 560, etc. These nodes have been identified by the proposed algorithm. However, some of the key nodes in the network have not been identified in [15,16]. Meanwhile, the accuracy of the algorithms in [15] and [16] is somewhat similar. For instance, nodes 93, 288, 150, 52, and 313, which have a low degree of node capability, were not identified in [15], since the characteristics of the road network do not fully reflect the role of the node in the network. Moreover, the identification of nodes 93, 46, 225, and 288 in [16] is inaccurate. Although the efficiency of the overall road network and the efficiency of these nodes are not high, the locations of these nodes in the network still play a pivotal role.
Overall, the proposed algorithm in this study makes up for the shortcomings of the algorithms in [15,16], since it can accurately identify the key nodes in an urban road network.These key nodes are important for focusing on future expansion and reconstruction projects, while considering the locations that are most susceptible to heavy traffic congestion.
In order to further analyze the performance of the proposed algorithm in this study and its effect on Nanjing's transportation network, these critical nodes are invalidated in descending order of importance (see Figure 5).
According to Figure 5, when the number of key node failures is less than 10, the network performance of the three algorithms is somewhat similar. However, with a further increase in the number of key node failures, the network performance under the node ordering of the proposed algorithm decreases significantly. For example, when the number of key node failures reaches 30, the network efficiency drops to 10%. This indicates that the proposed algorithm can effectively identify key nodes in a complex urban transportation network. Moreover, when these key nodes fail, the nodes themselves and many neighboring nodes become unreachable. In sum, the key nodes identified by this algorithm are more accurate and reasonable than those identified by the other two algorithms.
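The failure experiment above can be sketched in Python as follows. The text does not define "network efficiency," so this sketch assumes the common global efficiency measure, the mean of $1/d_{ij}$ over all ordered node pairs of the original network, with unreachable or removed pairs contributing zero. The function names and the toy star network are hypothetical.

```python
from collections import deque

def global_efficiency(adj, removed, n_total):
    """Mean of 1/d_ij over ordered pairs of the original n_total nodes."""
    alive = [v for v in adj if v not in removed]
    total = 0.0
    for s in alive:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in removed and w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n_total * (n_total - 1))

def failure_curve(adj, ranked_nodes):
    """Relative efficiency after removing nodes one by one in ranked order."""
    n = len(adj)
    base = global_efficiency(adj, set(), n)
    removed = set()
    curve = [1.0]  # intact network, by definition
    for v in ranked_nodes:
        removed.add(v)
        curve.append(global_efficiency(adj, removed, n) / base)
    return curve

# Hypothetical toy network: a star with hub 0 and leaves 1, 2, 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
hub_first = failure_curve(star, [0])   # remove the key node first
leaf_first = failure_curve(star, [1])  # remove a peripheral node first
```

In the toy network, removing the hub disconnects every remaining pair, whereas removing a leaf leaves most of the efficiency intact, which mirrors how the curves of different rankings can be compared.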

Conclusion
The key nodes in an urban traffic network have a significant influence on the safety and reliability of the entire road network structure. Therefore, the present study analyzed the characteristics of Nanjing's transportation network, abstracted the actual road network as a complex network, established an L-space weighted complex network according to road grade, and proposed a weighted complex network key node identification model suitable for this particular network. Moreover, it introduced an approximate algorithm for BC in order to reduce the complexity of the calculations. In order to verify the performance of the proposed algorithm, it was compared with the algorithms presented in [15,16]. Based on the results, the key nodes identified by the proposed algorithm were more accurate and the sorting was more reasonable. An analysis of Nanjing Big Data also indicated that the results of this study are consistent with real-life situations. The findings of this study can be used to provide decision-making support for road network management, planning, and urban traffic construction optimization.

Figure 5: Comparison of Nanjing's transportation network based on the three algorithms and the key node failures.