Characterizing the Structure of the Railway Network in China: A Complex Weighted Network Approach

Understanding the structure of the Chinese railway network (CRN) is crucial for maintaining its efficiency and planning its future development. To advance our knowledge of CRN, we modeled it as a complex weighted network and explored its structural characteristics through statistical evaluation and spatial analysis. Our results show that CRN is a small-world network whose train flow follows a power-law decay, demonstrating that CRN is a mature transportation infrastructure with a scale-free structure. CRN also shows significant spatial heterogeneity and hierarchy in its regionally uneven train flow distribution. We then examined the nodal centralities of CRN using four topological measures: degree, strength, betweenness, and closeness. Nodal degree is positively correlated with strength, betweenness, and closeness. Unlike in a typical scale-free network, the most connected nodes in CRN are not necessarily the most central, owing to underlying geographical, political, and socioeconomic factors. We proposed an integrated measure based on the four centrality measures to identify the global role of each node and the multilayer structure of CRN, and we confirm that stable connections hold between different layers of CRN.


Introduction
Transportation networks have enormous impacts on national economic and social activities [1,2]. As argued by Alderighi et al. [3], the structure of a network shapes the operational strategy and service quality of the system. Thus, it is crucial to examine the structural characteristics of transportation infrastructures. Complex network theory has been widely used to analyze the structural properties of various real-world transportation networks, including airport networks [4-6], shipping networks [7], subway networks [8], bus networks [9,10], and railway networks [11-13].
Like other transportation systems, railway networks are closely linked to sustainable regional development [14]. Particular attention has been paid to the topological properties of railway networks. Sen et al. [12] first employed complex network theory to study India's railway network, revealing the system's small-world character. Soh et al. [13] showed that Singapore's railway network is almost fully connected and that its hub nodes experience disproportionately heavy traffic. The Chinese railway network (CRN) is one of the largest railway networks in the world and contributes significantly to the country's development [15]. However, the structural performance of CRN has not been given sufficient attention in authoritative journals; previous research has focused mainly on the structural properties of CRN through purely statistical analyses of stations' connections [11,16-18] while generally overlooking the spatial properties of CRN.
During the last decade, China's railway infrastructure experienced a construction boom, increasing the operating mileage of CRN dramatically from 78,000 km in 2007 to 127,000 km in 2017. The structure of CRN may therefore have changed substantially and warrants a thorough reexamination. In this paper, we modeled CRN as a weighted network and employed measures from complex network theory to explore its structural characteristics. Along with its statistical properties, we also evaluated CRN's spatial heterogeneity and hierarchy. Additionally, because a single centrality measure fails to capture the overall importance of a node in the railway network, we proposed a data-driven integrated measure based on four centrality measures (degree, strength, betweenness, and closeness) to comprehensively quantify the importance of each node. This measure provides meaningful insights into the national roles of cities in CRN, which helps reveal the multilayer structure of CRN.
The rest of this paper is structured as follows: Section 2 introduces the data and structure measures. Section 3 reports the statistical and spatial properties of CRN at a global scale. Section 4 presents the centrality measures of nodes and explores their relationships. Section 5 proposes an integrated measure to reveal the role of each node in CRN. Section 6 concludes the paper.

Data Sources and Network Structure Measures

Data Sources.
The data set analyzed was provided by a railway bureau of China. It includes information on all railway stations and over 7,000 domestic scheduled passenger trains in China in 2017. For transportation networks, a widely used methodology for abstracting a system into a complex network is the P-space method [19], illustrated in Figure 1. The P-space representation contains railway stations as nodes and places a connection between two nodes if a train connects the station pair. Multiple trains may serve the same connection, and the weight of the connection (line) is the total number of trains between the pair of nodes. Following previous studies [4], we treated cities instead of railway stations as nodes. All railway stations in the same city are attributed to that city. For example, Beijing railway station, Beijing West railway station, and Beijing South railway station, all located in Beijing, are assigned to the node for Beijing City. If a train connects any station in Beijing to any station in another city, this is considered a connection between Beijing and that city. As a result, CRN has $N = 1192$ nodes (i.e., cities) and 67,594 edges.
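The city-level P-space construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the station-to-city mapping, the toy train list, and the function name `build_p_space` are hypothetical and are not taken from the bureau's data set or the authors' code.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical station-to-city mapping (illustrative only).
STATION_CITY = {
    "Beijing West": "Beijing",
    "Beijing South": "Beijing",
    "Shijiazhuang": "Shijiazhuang",
    "Zhengzhou East": "Zhengzhou",
    "Jinan West": "Jinan",
}

# Each train is the ordered list of stations it serves (toy data).
TRAINS = [
    ["Beijing West", "Shijiazhuang", "Zhengzhou East"],
    ["Beijing South", "Jinan West"],
    ["Beijing West", "Shijiazhuang"],
]


def build_p_space(trains, station_city):
    """Return {frozenset({city_a, city_b}): number of trains linking them}."""
    weights = defaultdict(int)
    for stops in trains:
        # Collapse stations to cities; the set keeps each city once per train.
        cities = sorted({station_city[s] for s in stops})
        # In P-space, every pair of cities served by the same train is linked.
        for a, b in combinations(cities, 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)


edges = build_p_space(TRAINS, STATION_CITY)
for pair, weight in sorted(edges.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), weight)
```

In this toy example, the two Beijing stations collapse into one node, and the Beijing-Shijiazhuang edge receives weight 2 because two trains serve that city pair.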

Network Structure Measures.
We applied a variety of topological measures to explore the structural characteristics of CRN. The first two measures summarize the global-scale properties of CRN, and the following four measures characterize nodes' centralities in the network.

Average Path Length.
Average path length [20] is defined as the average value of the shortest path lengths between all node pairs in a network:
$$L = \frac{1}{N(N-1)} \sum_{i \neq j} d_{ij},$$
where $N$ is the number of nodes and $d_{ij}$ is the shortest path length between node $i$ and node $j$.
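As a rough sketch, the average path length of an unweighted graph can be computed with one breadth-first search per node. The adjacency dictionary `toy` and the function names are hypothetical, and the code assumes a connected graph.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adj):
    """Mean shortest path length over all ordered pairs of distinct nodes.

    Assumes the graph is connected.
    """
    n = len(adj)
    total = sum(sum(shortest_path_lengths(adj, s).values()) for s in adj)
    return total / (n * (n - 1))


# Toy path graph A - B - C - D (hypothetical).
toy = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(average_path_length(toy))
```

For the four-node path graph, the twelve ordered pairs have a total distance of 20, so the function returns 20/12.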

Clustering Coefficient.
The clustering coefficient [20] of node $i$ is as follows:
$$C_i = \frac{2E_i}{k_i(k_i - 1)},$$
where $k_i$ is the number of neighbors connected to node $i$, $E_i$ is the actual number of edges connecting the $k_i$ neighbors, and $k_i(k_i - 1)/2$ is the largest possible number of edges between these neighbors.
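The definition above can be illustrated with a short sketch. The toy graph and function names are hypothetical. Setting the coefficient to 0 for nodes with fewer than two neighbors, and averaging nodal values to obtain a network-level figure, are common conventions assumed here rather than choices stated in the text.

```python
def clustering_coefficient(adj, node):
    """C_i = 2 * E_i / (k_i * (k_i - 1)); taken as 0 when k_i < 2 (assumed)."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count edges among the neighbors, each unordered pair once.
    links = sum(1 for u in neighbors for v in neighbors
                if u < v and v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the nodal clustering coefficients (assumed network-level value)."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)


# Toy graph: triangle A-B-C plus a pendant node D attached to C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(clustering_coefficient(toy, "C"), average_clustering(toy))
```

Node C has three neighbors with one edge among them, so its coefficient is 1/3, while A and B each sit in a closed triangle and score 1.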

Degree Centrality.
The degree of a node [21] is defined as the number of neighbors in the network connected to that node. It is represented as follows:
$$k_i = \sum_{j \in V} a_{ij},$$
where $V$ represents the set of all nodes in the network except node $i$, and $a_{ij}$ is defined as 1 if there is a connection between nodes $i$ and $j$, and 0 otherwise.

Strength Centrality.
As an extension of degree, strength centrality combines the connectivity and train flow information of a node [21]. Strength is formalized as follows:
$$s_i = \sum_{j \in V} w_{ij},$$
where $w_{ij}$ represents the weight of the edge between nodes $i$ and $j$.
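Degree and strength can both be read off a weighted edge list in a single pass, as in the sketch below. The edge list and its train counts are hypothetical, and the code assumes each city pair appears only once in the list.

```python
# Hypothetical weighted edge list: (city_a, city_b, number_of_trains).
EDGES = [
    ("Beijing", "Tianjin", 120),
    ("Beijing", "Jinan", 60),
    ("Tianjin", "Jinan", 45),
    ("Jinan", "Qingdao", 30),
]


def degree_and_strength(edges):
    """Return two dicts: neighbor count and summed edge weight per node."""
    degree, strength = {}, {}
    for a, b, w in edges:
        for node in (a, b):
            degree[node] = degree.get(node, 0) + 1      # one more neighbor
            strength[node] = strength.get(node, 0) + w  # add the train flow
    return degree, strength


degree, strength = degree_and_strength(EDGES)
print(degree["Jinan"], strength["Jinan"])
```

In this toy list, Jinan has the highest degree (3) while Beijing has the highest strength (180), which shows how the two measures can rank nodes differently.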
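
Betweenness Centrality.
The betweenness centrality of node $i$ sums, over all pairs of other nodes, the share of shortest paths that pass through node $i$:
$$b_i = \sum_{j \neq i \neq k} \frac{\sigma_{jk}(i)}{\sigma_{jk}}.$$
Here, $\sigma_{jk}$ counts the number of possible shortest paths between nodes $j$ and $k$, $\sigma_{jk}(i)$ denotes the number of shortest paths between nodes $j$ and $k$ that pass through node $i$, and $\sigma_{jk}(i)/\sigma_{jk}$ represents the proportion of the shortest paths between nodes $j$ and $k$ that pass through node $i$.

Closeness Centrality.
The closeness centrality [21] of node $i$ is represented as follows:
$$c_i = \frac{N - 1}{\sum_{j \in V} d_{ij}},$$
where $N$ is the number of nodes in the network, $V$ represents the set of all nodes in the network except node $i$, and $d_{ij}$ is the shortest path length between nodes $i$ and $j$.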
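A minimal sketch of both measures for an unweighted, undirected, connected graph is given below. Betweenness uses Brandes' accumulation scheme, which is one standard way to count shortest paths and not necessarily the procedure used by the authors. Dividing betweenness by the number of node pairs that exclude the node is an assumed normalization, and the toy graph and function names are hypothetical.

```python
from collections import deque


def bfs_counts(adj, s):
    """Visit order, hop distances, shortest-path counts, predecessors from s."""
    dist = {s: 0}
    sigma = {v: 0 for v in adj}
    sigma[s] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([s])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]   # extend every shortest path reaching u
                preds[v].append(u)
    return order, dist, sigma, preds


def closeness(adj, i):
    """(N - 1) divided by the summed hop distances from node i."""
    _, dist, _, _ = bfs_counts(adj, i)
    return (len(adj) - 1) / sum(dist.values())


def betweenness(adj):
    """Brandes' accumulation; needs at least three nodes for the scaling."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, _, sigma, preds = bfs_counts(adj, s)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):      # farthest nodes first
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adj)
    # Every unordered pair was counted twice; then divide by the number of
    # pairs that exclude the node (assumed normalization).
    scale = 2.0 * ((n - 1) * (n - 2) / 2.0)
    return {v: b / scale for v, b in score.items()}


# Toy star graph: hub H linked to three leaves (hypothetical).
toy = {"H": ["a", "b", "c"], "a": ["H"], "b": ["H"], "c": ["H"]}
print(betweenness(toy)["H"], closeness(toy, "H"), closeness(toy, "a"))
```

In the star graph, every shortest path between two leaves passes through the hub, so its normalized betweenness is 1 and the leaves score 0.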

The Statistical and Spatial Properties of CRN
The Small-World Property of CRN.
The small-world property is common in complex networks and has been observed in many other complex systems. A small-world network is a network with a short average path length and a large clustering coefficient. Random graphs have short average path lengths but small clustering coefficients, whereas regular lattices have large clustering coefficients but long average path lengths; a small-world network combines the short paths of the former with the clustering of the latter. The small-world property measures the transportation efficiency of a network at the global scale.
The average path length of CRN is 2.21, which means passengers need to take only two to three trains on average to travel between any pair of the 1192 cities of China. The maximum shortest path length between a pair of cities in CRN is 5, and city pairs of this type are rare. In addition, 71% of the city pairs are connected by two or fewer steps (topological distance), confirming that CRN is a mature and efficient transportation infrastructure. In China, railroads and airlines are in fierce competition. The average path length of CRN is similar to that (2.23) of the Chinese airline transportation system [22]. However, CRN covers many more cities (1192 cities) than the Chinese airline transportation system (203 cities), offering valuable service to remote and small cities.
As previously noted, the clustering coefficient can be used to describe the cliquishness of CRN. The clustering coefficient of CRN is 0.68, which is substantially larger than that of a random network ($C_{\text{rand}} \approx 0.095$) of the same size (the same number of nodes and edges). We can conclude from these findings that CRN is a small-world network.
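The random-network benchmark can be checked with a one-line calculation. In an Erdős-Rényi random graph the expected clustering coefficient equals the connection probability, which is the edge density. Whether the authors obtained their value this way is an assumption; the sketch only shows that the reported node and edge counts reproduce the reported figure.

```python
N = 1192    # number of cities (nodes) reported for CRN
E = 67594   # number of edges reported for CRN

# Expected clustering of a random graph with the same size = edge density.
c_rand = 2 * E / (N * (N - 1))

print(round(c_rand, 3))         # random-graph benchmark
print(round(0.68 / c_rand, 1))  # how many times larger CRN's clustering is
```

Under this assumption, the observed clustering coefficient of 0.68 is roughly seven times the random-graph value.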

The Scale-Free Structure of CRN.
A scale-free network is one with a power-law degree distribution $p(k) \propto k^{-\gamma}$ with an exponent parameter $\gamma$. Such a network is regarded as robust to random node failure because a large portion of nodes have few connections with others. However, edge-weight information is crucial for analyzing CRN as a weighted complex network. We thus analyzed the scale-free property of CRN via its edge-weight information. The edge-weights of CRN were counted using train flow information (the number of trains between city pairs), resulting in an average value of 5 and a range of 1 to 623. Around 20% of edges have larger weights than the average weight. This phenomenon is referred to as the "20/80" rule or the Pareto principle in other transportation systems [22]. The statistical distribution of the edge-weight was fitted to reveal the pattern of train flow and is shown in Figure 2. The obtained cumulative distribution exhibits a long tail with few extreme values, indicating a scale-free structure. A nonlinear least squares method was used to estimate the scaling parameter, and the results confirmed that the cumulative distribution obeys a power-law function with the exponent parameter $\gamma = 1.07$.
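The fitting step can be sketched as follows. Note the substitution: the text reports a nonlinear least squares fit, whereas this sketch uses ordinary least squares on the log-log cumulative distribution, a simpler stand-in that can give a different estimate on real data. The function names and the synthetic points are hypothetical.

```python
import math


def ccdf(values):
    """Empirical P(X >= x) for each distinct x, as two parallel lists."""
    ordered = sorted(values)
    n = len(ordered)
    xs, ps = [], []
    for idx, x in enumerate(ordered):
        if not xs or x != xs[-1]:
            xs.append(x)
            ps.append((n - idx) / n)
    return xs, ps


def fit_power_law(xs, ps):
    """Least-squares line through (log x, log p).

    Returns (gamma, c) for the model p = c * x ** (-gamma).
    """
    lx = [math.log(x) for x in xs]
    lp = [math.log(p) for p in ps]
    mx, mp = sum(lx) / len(lx), sum(lp) / len(lp)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxp = sum((a - mx) * (b - mp) for a, b in zip(lx, lp))
    slope = sxp / sxx
    return -slope, math.exp(mp - slope * mx)


# Synthetic check: points placed exactly on p = x ** (-1.07).
xs = [1, 2, 5, 10, 50, 100, 600]
ps = [x ** -1.07 for x in xs]
gamma, c = fit_power_law(xs, ps)
print(gamma, c)
```

On real edge weights, `ccdf` would supply the `(xs, ps)` pairs. The synthetic points only confirm that the regression recovers a known exponent.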
The power-law distribution shows that a large portion of edges carry low train flow intensity, implying a high level of heterogeneity in city connections within CRN. Similar heterogeneity of traffic flows was also found in other transportation systems, such as the Singapore railway system [13] and the bus transportation network in China [10]. Generally, train flows are appropriate indicators of passenger flows between node pairs [8], and thus we may infer that passenger flows between city pairs also exhibit significant heterogeneity in CRN.

Spatial Heterogeneity and Hierarchy of CRN.
Railway networks are spatial networks embedded in geographical space. The above statistical analyses reveal the heterogeneity of CRN but fail to illustrate the spatial characteristics of the network. To uncover the underlying spatial structure, we mapped CRN into a connected graph in a geographic coordinate system (see Figure 3). CRN shows a clear difference between the southeast and northwest sides of the Hu Line, a demarcation line for China's population proposed by the prominent geographer Hu Huanyong. The southeastern terrain of China, dominated by plains of low elevation, has a high population density and intense economic activity, while the northwestern part of China is dominated by plateaus and mountains, resulting in a low population density and an underdeveloped economy. The unbalanced populations and economic development help explain the uneven distribution of CRN in southeast and northwest China.
To further explore the spatial heterogeneity and hierarchy of CRN, we constructed 28 subnetworks by province (a subnetwork includes the cities in a province and the connections between those cities). The inner network density of each subnetwork (density $= 2M/(N(N-1))$, where $M$ is the number of edges between cities in the province and $N$ is the number of cities in the province), namely, the connection strength between cities inside a provincial administrative region, was calculated (Table 1). The densities of the subnetworks in southwest and northwest China, including Sichuan, Guizhou, Yunnan, Tibet, Gansu, Sinkiang, and Qinghai provinces, are 0.038, 0.061, 0.025, 0.006, 0.064, 0.069, and 0.038, respectively, substantially smaller than those of the provinces of east China. This confirms the observations in Figure 3. Moreover, we applied the Gini coefficient (G-value) to measure the spatial heterogeneity of CRN [23], which is formulated as follows:
$$G = \frac{m + 1}{m} - \frac{2}{m^2 \bar{w}} \sum_{i=1}^{m} r_i w_i,$$
where $m$ denotes the number of edges in the network; $w_i$ is the edge-weight, namely, train flow; $r_i$ represents the rank of the edge when edges are sorted by edge-weight in descending order; and $\bar{w}$ is the mean value of edge-weight. The G-value ranges from 0 to 1, and a larger G-value indicates a more heterogeneous network. The G-value of CRN was calculated as 0.77, indicating the significant heterogeneity of intercity train flows in CRN.
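The density and Gini calculations can be sketched as below. The Gini function uses the common rank-based form with edges ranked by weight in descending order, which is an assumption about the exact form intended; the function names and toy inputs are hypothetical.

```python
def density(num_nodes, num_edges):
    """Share of possible city pairs that are connected: 2M / (N (N - 1))."""
    return 2.0 * num_edges / (num_nodes * (num_nodes - 1))


def gini(weights):
    """Rank-based Gini coefficient of a list of edge weights.

    G = (m + 1) / m - 2 * sum(r_i * w_i) / (m ** 2 * mean),
    where r_i is the rank of edge i in descending order of weight.
    """
    m = len(weights)
    mean = sum(weights) / m
    ranked = sorted(weights, reverse=True)   # rank 1 = heaviest edge
    weighted_ranks = sum(r * w for r, w in enumerate(ranked, start=1))
    return (m + 1) / m - 2.0 * weighted_ranks / (m * m * mean)


print(density(4, 3))          # toy subnetwork: 4 cities, 3 edges
print(gini([5, 5, 5, 5]))     # perfectly even train flows
print(gini([0, 0, 0, 10]))    # all flow concentrated on one edge
```

Equal weights give a G-value of 0, while concentrating all flow on one of $m$ edges gives $(m-1)/m$, which approaches 1 as the network grows.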
The scale-free property of train flows also implies a hierarchical structure for CRN. Figure 3 shows that a small number of city pairs are intensely connected by hundreds of trains (warm color lines), and the majority of city pairs are connected by limited numbers of trains (cool color lines and gray lines). Notably, cities with good connectivity and intensive traffic flows form the framework of CRN and occur along "the four vertical and four horizontal" railway corridors. The four vertical lines include the Beijing-Shijiazhuang-Zhengzhou-Wuhan-Changsha-Guangzhou line, the Beijing-Tianjin-Jinan-Hefei-Nanjing-Shanghai line, the Beijing-Qinhuangdao-Shenyang-Ha'erbin line, and the Shanghai-Hangzhou-Fuzhou-Xiamen-Guangzhou line. The four horizontal lines are the Xuzhou-Shangqiu-Zhengzhou-Xi'an-Baoji-Lanzhou line, the Qingdao-Jinan-Taiyuan-Shijiazhuang line, the Shanghai-Nanjing-Wuhan-Chongqing-Chengdu line, and the Shanghai-Hangzhou-Changsha-Guiyang-Kunming line. The intersections of the four vertical and four horizontal lines are the core cities of CRN with the most connections and train traffic, such as Beijing, Shanghai, Zhengzhou, Wuhan, and Changsha. All these cities are national or regional economic and political centers, indicating that strong political factors influence the hierarchical structure of CRN.

Degree and Strength.
To gain deeper insights into the structure and evolution of CRN, we calculated the degree centrality (connectivity) and strength centrality of each city. For CRN, nodal degree ranges from 1 to 673, with an average value of 113. Of the cities, 64% are less connected than average. The average value of nodal strength is 504, with a range from 2 to 6694. Of the cities, 73% have lower strength than average. Figures 4(a) and 4(b) display the cumulative distributions of nodal degree and strength, both of which approximately follow an exponential decay. This suggests that CRN evolves randomly, unlike airline networks, whose degree distributions tend to follow a power-law distribution [4,6,24].
A possible explanation for the difference in degree and strength distributions between CRN and airline networks stems from the organization of the two types of networks. Generally, airline transportation networks adopt a hub-and-spoke service strategy, and expansions of airline networks coincide with the preferential attachment model, which draws expansions to more connected hubs; this is known as the rich-club effect [3]. Moreover, two airports are usually connected by a nonstop air route, and intermediate airports are rare. However, to ensure service to more cities, a train route usually covers 10-20 railway stations. In the P-space representation, one route generates a fully connected graph, which raises the degree and strength of intermediate stations. In addition, a railway station can only handle a limited number of railway tracks and trains, resulting in relatively homogeneous distributions instead of power-law distributions.
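The effect of the P-space representation on intermediate stations can be illustrated with a single route. The 15-stop route below is hypothetical, chosen within the 10-20 station range mentioned above.

```python
from itertools import combinations


def p_space_edges(stops):
    """All station pairs served by one train: a complete graph on its stops."""
    return list(combinations(stops, 2))


route = [f"S{i}" for i in range(1, 16)]   # hypothetical 15-stop route
edges = p_space_edges(route)
degree = {s: sum(s in e for e in edges) for s in route}

print(len(edges))          # n * (n - 1) / 2 edges from a single route
print(set(degree.values()))
```

A single 15-stop route yields 105 edges and gives every stop, including the intermediate ones, a degree of 14, which helps explain why degree is spread relatively evenly across stations.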

Betweenness and Closeness.
Special attention should also be paid to nodal betweenness and closeness centralities, which measure the global centrality and accessibility of nodes in the network, respectively. Nodal betweenness ranges from 0 to 0.053 in CRN, with an average value of 0.001. Of the nodes, 82% exhibit lower betweenness than average, which suggests that a small number of powerful nodes dominate control over the whole network. In addition, 13% of the cities have a betweenness value of 0, meaning that no shortest path passes through them. All of these nodes are terminal cities of railway routes, and most of them are located in less-developed regions with low population densities. Cities with high betweenness (e.g., the top 30) are mainly provincial capitals with advanced economies and high population densities. These cities compose the core layer of CRN and account for most of the transfers in the network. The cumulative distribution of nodal betweenness is plotted in Figure 5(a), which reveals a monotonically decreasing trend that can be approximated by an exponential function.
The closeness of nodes ranges from 0.29 to 0.69, with an average value of 0.46. Figure 5(b) presents the cumulative distribution of nodes' closeness, which shows an unfamiliar pattern of an inverse "S" curve. This shows that few cities have extremely weak closeness (only 20% of the cities have closeness under 0.4). This finding further confirms that CRN is an efficient infrastructure network.

The Relationships between Degree and the Other Three Measures.
The top 20 cities ranked by degree are listed in Table 2. Notably, the rankings of cities vary with the measure used. For example, Shanghai ranks second in degree, strength, and closeness but twelfth in betweenness. We explored the relationships between degree and strength, betweenness, and closeness to uncover further information on the topological structure of CRN. Figure 6(a) shows that the relationship between nodal degree and strength can be fitted by a power-law function rather than the expected linear one. A similar trait was also found in the US intercity airline transportation network [24]. The main reason behind this is that a small number of hubs have many connections and handle much more traffic flow than the peripheral nodes.
Nodal degree is positively correlated with betweenness and closeness (see Figures 6(b) and 6(c)), which can be fitted by a power function and a linear function (for degree ≥ 100), respectively. Notably, variations in degree and betweenness are less consistent. An important question explored by previous studies on the relationship between nodal degree and betweenness is whether the most connected nodes are also the most central. In many complex networks, including randomized networks [5], Internet networks, and social networks [25], nodal degree and betweenness have a strong linear relation. In contrast, CRN shows some anomalies: certain cities have small degree and large betweenness (circled by red ellipses in Figure 6(b)).
To explore the underlying reasons for these anomalies, cities were classified into five tiers based on nodal degree and betweenness, respectively (see Figures 7(a) and 7(b)). The prominent differences between the two maps indicate the strong influences of socioeconomic, political, and geographical factors on the development of CRN. For example, Lasa and Haikou are in the bottom tiers in terms of nodal degree but belong to the top tier in terms of nodal betweenness. Lasa and Haikou are located in remote and peripheral areas of China and are weakly connected with other cities. However, the two cities are economic hubs, local political centers, and the respective gateways to Tibet Autonomous Region and Hainan Province. Thus, they play critical roles in connecting the small cities scattered around them to other cities in the network.

The Role of Cities in CRN
Centrality measures, including degree, strength, betweenness, and closeness, reflect different aspects of cities' importance in CRN. As noted above, the weak connection of a city does not imply unimportance, because the city may have high betweenness and play a bridging role. Conversely, a city exhibiting good connectivity is not necessarily globally central in the network. That is, a single measure fails to capture the overall importance of any city in CRN. A typical example can be obtained through a comparison of Kunming and Jinzhou. The former is the economic center and capital city of Yunnan Province as well as the gateway to southwest China, whereas the latter is a less-developed, medium-sized noncapital city in central China. Kunming has the second-highest betweenness centrality (0.033), but its connectivity (396) is weaker than that of Jinzhou (463), whose betweenness is only 0.009. Overall, Kunming plays a more important role in CRN than Jinzhou. However, this cannot be easily discerned using degree alone, the most commonly used measure. Luan et al. [26] proposed a multiple-criteria indicator based on the degree, betweenness, and closeness centralities to capture nodes' importance and the hierarchical structure of a network. Inspired by this idea, we propose an integrated measure based on the four centrality measures and define it as a hub indicator to quantify the global roles of cities in the network. This measure is as follows:
$$H_i = \lambda_k \tilde{k}_i + \lambda_s \tilde{s}_i + \lambda_b \tilde{b}_i + \lambda_c \tilde{c}_i,$$
where $\lambda_k$, $\lambda_s$, $\lambda_b$, and $\lambda_c$ are the respective weights of the unified (normalized) nodal degree $\tilde{k}_i$, strength $\tilde{s}_i$, betweenness $\tilde{b}_i$, and closeness $\tilde{c}_i$. The values of $\lambda_k$, $\lambda_s$, $\lambda_b$, and $\lambda_c$ are calculated using the coefficient of variation method, a data-driven method for determining weights (see the references for details on the procedure) [27,28]. $\lambda_k$, $\lambda_s$, $\lambda_b$, and $\lambda_c$ are calculated to be 0.22, 0.14, 0.56, and 0.08, respectively, and $H_i$ is in the range of 0.001 to 0.98 with an average value of 0.28.
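The weighting scheme can be sketched as follows, under several assumptions that the text does not spell out: each weight is taken as the measure's coefficient of variation (population standard deviation divided by mean) divided by the sum of the four coefficients, and the "unified" values are taken to be min-max normalized. The function names and the four-city toy data are hypothetical.

```python
import statistics


def min_max(values):
    """Rescale a column to [0, 1] (assumed form of the 'unified' values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def cv_weights(columns):
    """One weight per measure, proportional to its coefficient of variation."""
    cvs = [statistics.pstdev(col) / statistics.mean(col) for col in columns]
    total = sum(cvs)
    return [cv / total for cv in cvs]


def hub_indicator(columns):
    """Return (scores, weights): weighted sum of normalized measures per node."""
    weights = cv_weights(columns)
    normalized = [min_max(col) for col in columns]
    n = len(columns[0])
    scores = [sum(w * col[i] for w, col in zip(weights, normalized))
              for i in range(n)]
    return scores, weights


# Toy data for four hypothetical cities A-D (not values from the paper).
degree = [600, 400, 450, 10]
strength = [6000, 2000, 1800, 50]
betweenness = [0.05, 0.03, 0.01, 0.0]
closeness = [0.70, 0.60, 0.62, 0.30]

scores, weights = hub_indicator([degree, strength, betweenness, closeness])
print([round(w, 2) for w in weights])
print([round(s, 2) for s in scores])
```

Because the weights sum to 1 and each normalized measure lies in [0, 1], the indicator also lies in [0, 1]. Measures that vary more across cities receive larger weights, which is consistent with betweenness receiving the largest reported weight (0.56).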
Following the classification rules suggested by Guimerà et al. [5] and Du et al. [4], we divided the cities into four categories based on the value of $H_i$ via the k-means clustering algorithm: (1) national core cities with $0.62 \le H_i < 0.98$, (2) bridge cities with $0.42 \le H_i < 0.62$, (3) peripheral cities with $0.26 \le H_i < 0.42$, and (4) ultraperipheral cities with $0 \le H_i < 0.26$. The spatial distribution of the city categorizations is plotted in Figure 8. Notably, core and bridge cities are mainly national or local economic and political centers scattered along the "four vertical and four horizontal" railway corridors. Most of the peripheral and ultraperipheral cities are located in remote or peripheral regions and are less developed. This categorization is significant and consistent with the organization and evolution of CRN.
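A one-dimensional k-means pass over hub-indicator values can be sketched as below. This is a plain Lloyd's iteration with a deterministic, evenly spread initialization, both of which are assumptions since the text does not describe the implementation; the toy scores are hypothetical.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread initial centers over the sorted values.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)]
               for j in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j]))
                  for v in values]
        # Move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members)
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical hub-indicator scores for ten cities.
toy_scores = [0.05, 0.10, 0.12, 0.30, 0.33, 0.35, 0.50, 0.55, 0.90, 0.95]
centers, labels = kmeans_1d(toy_scores, k=4)
print(labels)
```

With four clusters, the boundaries between adjacent centers play the role of the category thresholds listed above. On real data, the result of k-means can depend on the initialization.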
There are 33 cities and 524 edges (0.8% of the total edges) in the core layer, 158 cities and 8,023 edges (11.9%) in the bridge layer, 451 cities and 10,838 edges (16.0%) in the peripheral layer, and 550 cities and 2,095 edges (3.1%) in the ultraperipheral layer. Moreover, there are 4,752 edges (7.1%) between the core layer and the bridge layer; 9,830 edges (14.5%) between the core layer and the peripheral or ultraperipheral layer; 25,490 edges (37.7%) between the bridge layer and the peripheral or ultraperipheral layer; and 6,024 edges (8.9%) between the peripheral layer and the ultraperipheral layer. A remarkable finding from this analysis is that stable connections hold between different layers in CRN, which is substantially different from the Chinese airline network, where most connections (63%) are within the core layer and minimal connections (0.25%) exist between the core layer and the peripheral layer [4]. This finding further demonstrates that CRN is a mature and efficient infrastructure network.

Conclusion
We investigated CRN by modeling it as a complex weighted network. Our findings suggest that CRN is a small-world, scale-free infrastructure network with a small average path length (2.21) and a large clustering coefficient (0.68). Unlike other complex networks such as the Internet and social and biological networks, the distributions of nodal centralities of CRN, including degree, strength, betweenness, and closeness, exhibit patterns of exponential functions or an inverted "S" shape. Nodal degree is positively correlated with nodal strength, betweenness, and closeness. However, our analysis reveals that the most connected cities are not necessarily the most central because of the influences of social, political, and geographical factors.
Train traffic in CRN follows a power-law distribution, implying heterogeneity and hierarchy of the network. To illustrate the underlying reasons for this pattern, we mapped a topological connectivity graph of CRN in a geographic coordinate system. Our findings show that uneven population distributions and economic clout account for the uneven distribution of CRN services between southeast and northwest China. The "four vertical and four horizontal" railway corridors establish the major connections and train flows of CRN and pass through cities that are national or local political centers with advanced economies and dense populations. This indicates that the uneven distribution of CRN services reflects a strong political influence. Nodal degree, strength, betweenness, and closeness quantify the importance of cities in CRN from different perspectives. However, no single measure can uncover the role of cities on a global scale. Thus, we proposed an integrated indicator that reveals the multilayer structure of CRN. It classifies cities into four categories (core cities, bridge cities, peripheral cities, and ultraperipheral cities). Unlike the Chinese airline network, CRN has remarkably stable connections between different layers of the network, demonstrating CRN's accessibility and efficiency.
Our research has some limitations. China's railway transportation system comprises different types of trains, such as G-number, D-number, C-number, Z-number, T-number, and K-number trains, with varying speeds and capacities. This research focuses on the connectivity of CRN and represents trains between cities as weighted edges in the topological network without considering train type. Assigning different weights to different train types would enable a more comprehensive analysis but requires substantial effort. This is now included in our agenda for future research. Another direction is to examine the networks formed by different types of trains separately, for example, China's high-speed railway network, composed of G-number and D-number trains. Preliminary results suggest that China's high-speed railway network exhibits certain properties distinct from those of CRN as a whole. Further analyses in this direction are in progress.

Figure 1: (a) A railway network consisting of two train routes, train route 1 in orange and train route 2 in blue. (b) The P-space network formed by the two train routes in Figure 1(a), where the orange lines denote node pairs connected through train route 1, and the blue lines denote node pairs connected through train route 2.

Figure 2: The cumulative distribution of train flow (edge-weight) in CRN.

Figure 3: The spatial distribution of train flows in CRN. The size of the nodes reflects the degree of the city, and the colors of the lines indicate train flows. The black dashed line denotes the Hu Line, which demarcates the concentration of population of China. Cities along the four vertical and four horizontal railway corridors are marked as (1) Beijing, (2) Tianjin, (3) Qinhuangdao, (4) Shenyang, (5) Ha'erbin, (6) Taiyuan, (7) Shijiazhuang, (8) Jinan, (9) Qingdao, (10) Urumchi, (11) Lanzhou, (12) Xi'an, (13) Zhengzhou, (14) Xuzhou, (15) Nanjing, (16) Shanghai, (17) Hangzhou, (18) Wuhan, (19) Chongqing, (20) Chengdu, (21) Nanchang, (22) Changsha, (23) Huaihua, (24) Guiyang, (25) Kunming, (26) Ningbo, (27) Fuzhou, (28) Xiamen, (29) Shenzhen, (30) Guangzhou.

Figure 4: (a) The cumulative distribution of the nodal degree in CRN. (b) The cumulative distribution of the nodal strength in CRN.

Figure 5: (a) The cumulative distribution of the nodal betweenness in CRN. (b) The cumulative distribution of the nodal closeness in CRN.

Figure 6: (a) The relationship between nodal degree and strength in CRN. (b) The relationship between nodal degree and betweenness in CRN. (c) The relationship between nodal degree and closeness in CRN.

Figure 7: (a) The spatial distribution of cities classified by nodal degree value in descending order. (b) The spatial distribution of the cities classified by nodal betweenness value in descending order.

Figure 8: The spatial distribution of the city categorizations in CRN.

Table 1: Densities of subnetworks divided by provinces.

Table 2: The top 20 cities ranked by degree.