Urban Network and Regions in China: An Analysis of Daily Migration with Complex Networks Model

Abstract: This paper analyzed the urban network and urban regions in China using a complex network model. Data on daily migration among 348 prefecture-level cities from the Baidu Map location-based service (LBS) Open Platform were used to calculate urban network metrics and to delineate boundaries of urban regions. Results show that the urban network in China displays an obvious hierarchy in terms of attracting and distributing population and controlling regional interaction. Regional integration has become increasingly prominent, as administrative boundaries and natural barriers no longer have strong impacts on urban connections. Overall, 18 urban regions were identified according to urban connectivity, and the degree of urban connection is higher among cities in the same urban region. Due to geographical proximity and close interaction, several provincial capital cities form an urban region with cities from neighboring provinces instead of those from the same province. Identification of urban region boundaries is of significant importance for sustainable development and policymaking on the demarcation of urban economic zones, urban agglomerations, and future adjustment of provincial administrative boundaries in China.

within the framework of social network theory and methodology. This paper measured connectivity characteristics among Chinese cities, identified urban regions (communities of cities) to reflect the strength of intercity linkages, and delineated boundaries of urban regions. Due to the availability of data, the analysis was confined to 369 prefecture-level cities in China.


Introduction
A city is an open system, which regularly exchanges materials, products, services, information, and capital with external environments. As Taylor has pointed out, a city can only exist in the network formed by the interaction with other cities [1]. Castells believed that our society is formed by various flows, and that the accumulation of wealth and power in cities is not achieved through what is possessed in cities, but through various flows [2]. Cities cannot exist independently, thus to a large degree, cities are defined by their exchanges of inputs and outputs with other cities and rural areas. Therefore, with the world becoming increasingly urbanized, the relationship of a city with other cities has come to be the essential element determining the status of the city. In addition, urban systems are increasingly characterized by 'networks' in the era of globalization. Consequently, the urban network perspective has been one of the primary approaches to exploring the complex issue of urban systems.
Within the framework of the urban network approach, analysis of relational data between cities is the key focus. Characterized by traffic flows as well as the global expansion and spatial reorganization of transnational corporations among cities, the functional connections among cities have become an important means of interpreting urban linkages and are often used to study the organizational patterns and changes of the world urban system [1]. For example, the Globalization and World Cities (GaWC)

Daily Migration Data
Daily migration data were obtained by aggregating individual movements extracted from Baidu's location-based services (LBS) open platform. Baidu, often regarded as the Chinese equivalent of Google, is the dominant Chinese internet search engine company. The LBS platform is the search provider's positioning facility that serves more than 5 billion daily positioning requests sent through its various products, such as Baidu Maps and other apps. With its cloud computing capabilities and the wide penetration of positioning technologies on mobile devices, Baidu is able to fully and instantly reflect population migration. Daily population movement data, which are recorded and constructed from the positioning information of users' mobile phones, form the so-called 'Baidu migration' data. Data on people's movement within each city were discarded, while daily movements between cities were extracted and recorded. Daily migration flows among China's prefecture-level cities, excluding Sansha, Hong Kong, Macao, and cities in Taiwan, were constructed for about three months in 2015 (February 7 to May 16, 98 days in total). Sansha is a prefecture-level city officially established in 2012 to administer the island groups and their surrounding waters in the South China Sea; its transportation connection with other parts of the nation relies mainly on voyages, which is quite different from other cities in China. Hong Kong, Macao, and Taiwan have special regulations for entering into and exiting from mainland China. Average daily migration between each pair of cities was calculated and recorded in a 369 × 369 asymmetric square matrix. During this period, the population flows reached 121 million in total and 1.24 million per day on average.
The asymmetric square matrix represents a complex network of population migration among cities. Daily movement of people from a given city to a number of receiving cities forms a set of multidirectional connections. Movement of people between any pair of cities can occur in both directions in physical space, implying that the migration network is a weighted directed spatial network. The network consists of nodes (i.e., cities) connected by links representing the migrants to and from each city. From this perspective, the topology and behavior of the network can be examined through the calculation of selected network metrics.
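The construction of the averaged origin-destination matrix described above can be illustrated with a minimal sketch. The record layout `(day, origin, destination, count)`, the function names, and the toy city labels are assumptions made for illustration only; the actual export format of the Baidu platform is not described in this paper.

```python
from collections import defaultdict


def average_daily_flows(records, n_days):
    """Average directed flows per city pair over the observation period.

    records: iterable of (day, origin, destination, count) tuples
             (a hypothetical layout, not the platform's real format).
    """
    totals = defaultdict(float)
    for _day, origin, dest, count in records:
        if origin == dest:
            continue  # movements within a city are discarded
        totals[(origin, dest)] += count
    return {pair: total / n_days for pair, total in totals.items()}


def to_matrix(flows, cities):
    """Arrange averaged flows in a square matrix; rows are origins."""
    index = {city: i for i, city in enumerate(cities)}
    matrix = [[0.0] * len(cities) for _ in cities]
    for (origin, dest), weight in flows.items():
        matrix[index[origin]][index[dest]] = weight
    return matrix


# Toy data: three cities observed over two days.
records = [
    (1, "A", "B", 100), (2, "A", "B", 140),
    (1, "B", "A", 60), (2, "B", "A", 80),
    (1, "A", "C", 20),
    (1, "A", "A", 999),  # intracity record, ignored
]
cities = ["A", "B", "C"]
flows = average_daily_flows(records, n_days=2)
matrix = to_matrix(flows, cities)
```

In this toy case the flow from A to B (120 per day) differs from the flow from B to A (70 per day), which is why the matrix is asymmetric and the network is directed.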

Data Applicability
The Baidu migration dataset is the collective result of individual human behaviors. The dataset consists of origins, destinations, and flow paths. Although the origins and destinations in a population flow network have specific physical locations, due to data collection limitations and constraints, each is often linked to a territorial area. For instance, origins and destinations may be countries, provinces, cities, or smaller administrative districts. In other words, population flows build networks between territorial areas. In this case, although the population flow network exists in a physical space, the population flows usually discussed are just projections of a real network at certain spatial scales [12]. Population flow data can be divided into two basic types according to their spatial scales: those occurring at the intracity level, such as commuting flow networks [13], and those taking place at the intercity or regional level, such as migration flow networks [14,15]. This study applied the Baidu migration data to investigate regional-level migration flow networks. Because migration flow networks can be regarded as complex networks, various kinds of network metrics (e.g., degree distribution) are tentatively applied to the structural analysis of migration flow networks [16]. Such metrics play a major role in uncovering the characteristics and nature of migration flow networks, and hence, the spatial pattern of urban connectivity.

Measurement
With recent development of graph theory and social network analytical tools, exploration of network organization has received increasing attention in academia. In particular, studies on spatial organization and patterns of urban interaction using the complex network model have become a major theme in transport planning and transportation geography research [17]. The basic approach within the complex network framework is to derive typical complex network metrics to investigate characteristics and the nature of the network. Among these metrics, centrality is the most widely used one, which consists of degree centrality (DC), betweenness centrality (BC), and closeness centrality (CC).
Communities in a network are the dense groups of the vertices, which are tightly coupled to each other inside the group and loosely coupled to the rest of the vertices in the network. Community detection, also known as community mining, denotes the identification of the community structure within the network through topological relationships and properties [14]. In our case, cities are well connected within communities and less connected among communities. Community structure is the most widely studied structural feature of complex networks. Consequently, community detection plays a pivotal role in understanding the functionality of complex networks.

Degree Centrality and Betweenness Centrality
Degree centrality is defined as the number of links incident upon a node (i.e., the number of ties that a node has). DC is primarily used to measure the significance and influence of a node in the network. A higher value of DC indicates a greater distributing capacity of a certain node in the network. The calculation formula for DC is shown below:

$$C_D(i) = \sum_{j=1,\, j \neq i}^{g} x_{ij}$$

where $g$ is the number of nodes in the network, and $\sum_{j} x_{ij}$ is the total number of direct connections between node $i$ and the other $g-1$ nodes, which denotes the total inflow and outflow of urban population flows. For example, each node of a line has a degree centrality of 2, as either node has one inflow connection and one outflow connection. In our case, DC was applied to measure the significance of a certain city in the urban network in distributing migrants and controlling interactions among cities.
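As a rough illustration of this measure, the sketch below computes a weighted degree centrality as the sum of inflows and outflows of each city from an origin-destination matrix. The function name and the toy matrix are hypothetical, and the sketch assumes that rows are origins and columns are destinations; it is not the exact procedure used to produce the results reported here.

```python
def degree_centrality(matrix):
    """Weighted degree of each node: total outflow plus total inflow.

    matrix[i][j] is the average daily flow from city i to city j
    (assumed layout: rows are origins, columns are destinations).
    """
    n = len(matrix)
    centrality = []
    for i in range(n):
        outflow = sum(matrix[i][j] for j in range(n) if j != i)
        inflow = sum(matrix[j][i] for j in range(n) if j != i)
        centrality.append(outflow + inflow)
    return centrality


# Toy matrix for three cities A, B, C.
flows = [
    [0, 120, 10],  # A sends 120 to B and 10 to C
    [70, 0, 0],    # B sends 70 to A
    [0, 0, 0],     # C sends nothing
]
dc = degree_centrality(flows)
```

Here city A has the largest value because it both sends and receives the most travelers, which matches the reading of DC as a city's capacity to gather and distribute population.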
As an indicator of a node's centrality in a network, betweenness centrality is defined as the number of shortest paths from all vertices to all others that pass through the node concerned. BC aims to measure a node's capacity to control information across the network. A node that falls on the shortest paths between other nodes has a higher BC and a greater ability to control communication [18]. The calculation of BC is shown in the following formula:

$$C_B(v) = \sum_{i \neq v \neq j} \frac{\partial_{i,j}(v)}{\partial(i,j)}$$

where $\partial(i,j)$ is the number of shortest paths between nodes $i$ and $j$, and $\partial_{i,j}(v)$ is the number of those shortest paths that pass through node $v$. Taking a line as an example again, either node has a betweenness centrality of 1, as all the paths passing through the node are also the shortest ones.
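The shortest-path counting behind BC can be sketched with Brandes' accumulation scheme. This is a minimal illustration under stated assumptions: edges are treated as unweighted and directed, every node appears as a key of the adjacency dictionary, and the function name and the small hub-and-spoke example are hypothetical rather than taken from the paper's data or software.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalised betweenness for a directed, unweighted graph.

    adj maps each node to a list of its successors; every node must
    appear as a key (assumption of this sketch).
    """
    bc = {v: 0.0 for v in adj}
    for source in adj:
        order = []                      # nodes in order of discovery
        preds = {v: [] for v in adj}    # predecessors on shortest paths
        sigma = {v: 0 for v in adj}     # number of shortest paths from source
        dist = {v: -1 for v in adj}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                    # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}   # dependency of source on each node
        while order:                    # accumulate from the farthest nodes
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                bc[w] += delta[w]
    return bc


# Hub-and-spoke toy network: all travel between A, B, C passes through H.
star = {"H": ["A", "B", "C"], "A": ["H"], "B": ["H"], "C": ["H"]}
bc = betweenness_centrality(star)
```

In the toy network the hub H lies on the shortest path of all six ordered pairs of peripheral cities, so it acts as the only 'bridge', while the peripheral cities have a betweenness of zero.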

Community Detection
Recently, many algorithms have been proposed for community detection [19][20][21][22]. Among them, the fast unfolding algorithm has gained wide acceptance and has been widely used in the field of community detection [20]. This paper employed the fast unfolding algorithm for community detection. The algorithm uses the modularity function to characterize how distinct the communities of a network are and to reveal the characteristics of local agglomeration within the network. A higher value of modularity indicates a better division of communities [23]. In a weighted network, modularity (Q) is computed as follows:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(C_i, C_j)$$

where $m$ is the total number of edges in the network (the total edge weight in a weighted network); $A_{ij}$ is the weight of the edge linking node $i$ and node $j$; $k_i$ and $k_j$ are the sums of the weights of all edges linking with node $i$ and node $j$, respectively; $C_i$ and $C_j$ represent the communities to which node $i$ and node $j$ belong, respectively; and $\delta(C_i, C_j)$ equals 1 if $C_i = C_j$ and 0 otherwise. Theoretically, the value of modularity (Q) falls between 0 and 1. The closer the value is to 1, the more obvious the community structure is. Empirically, the value of Q usually falls into the range from 0.3 to 0.7 in complex networks representing a particular phenomenon in the real world, such as population flows.
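To make the modularity function concrete, the following sketch evaluates Q for a given assignment of nodes to communities. It covers only the evaluation of Q, not the fast unfolding optimisation itself, and it assumes a symmetric weight matrix (for a directed flow matrix, one simple option is to add the flows in both directions first). The function name and the toy graph are hypothetical.

```python
def modularity(weights, communities):
    """Modularity Q of a partition of a weighted, undirected network.

    weights: symmetric matrix, weights[i][j] is the edge weight A_ij.
    communities: communities[i] is the community label C_i of node i.
    """
    n = len(weights)
    strength = [sum(row) for row in weights]   # k_i
    two_m = sum(strength)                      # 2m
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:   # delta(C_i, C_j) = 1
                q += weights[i][j] - strength[i] * strength[j] / two_m
    return q / two_m


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
w = [[0] * 6 for _ in range(6)]
for a, b in edges:
    w[a][b] = 1
    w[b][a] = 1

q_two_groups = modularity(w, [0, 0, 0, 1, 1, 1])
q_one_group = modularity(w, [0, 0, 0, 0, 0, 0])
```

Splitting the toy graph into its two triangles gives Q = 5/14 (about 0.36), whereas placing all nodes in one community gives Q = 0, which illustrates why a higher Q is read as a clearer community structure.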

Degree Centrality
The spatial distribution of DC is shown in Figure 1a. The Jenks natural breaks classification method is used to describe the distribution according to the amount of daily average population distribution for each city. The cities under investigation are classified into five categories with daily average population distribution ranging from 0-5952, 5952-16,441, 16,441-41,469, 41,469-74,881, and >74,881. The top 20 cities are as follows: Beijing, Shanghai, Suzhou, Tianjin, Shenzhen, Guangzhou, Langfang, Dongguan, Hangzhou, Chongqing, Nanjing, Baoding, Urumqi, Xi'an, Wuhan, Foshan, Xuzhou, Ganzhou, Ningbo, and Chengdu. It is obvious that the highest-level cities, namely those with the highest values of DC, are concentrated mainly in the coastal areas, especially China's four major urban agglomerations (i.e., Beijing-Tianjin-Hebei, the Yangtze River Delta, the Pearl River Delta, and Chengdu-Chongqing). In addition, an obvious belt-shaped distribution of urban communities was identified along the Beijing-Shanghai high-speed railway. However, the high-level cities in the midwest of China are mainly provincial capital cities. The differences in the distribution hierarchy of provinces are also significant. There are no third- or higher-level distribution cities in Tibet, Ningxia, Hainan, Gansu, Fujian, Hunan, Yunnan, Guangxi, Inner Mongolia, Liaoning, Heilongjiang, and Jilin. Urumqi represents the highest level in northwest China, while Chongqing and Chengdu are the largest population concentration and distribution centers in southwest China. Shenzhen, Guangzhou, and Dongguan have the highest levels in Guangdong province, whereas the eastern, western, and northern parts of Guangdong province do not have high-level concentration and distribution centers. Fujian, Heilongjiang, Liaoning, and Jilin in northeast China, as well as Yunnan, Guizhou, and Guangxi in southwest China, do not have cities at the third level or above.
Some provinces show exceptions, in which the provincial capital cities are not the highest-level cities. For example, the highest-level city of Jiangxi province is Ganzhou rather than its capital city, Nanchang. In addition, the level of Suzhou in Anhui province is slightly higher than the capital city Hefei, and the levels of Langfang and Baoding in Hebei province are higher than the capital city Shijiazhuang. Similarly, Shenzhen also has a higher level than the capital city Guangzhou in Guangdong province.
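The Jenks natural breaks classification used to group cities into levels can be sketched as a small dynamic program that chooses class boundaries so that the total within-class sum of squared deviations is as small as possible. The implementation below is a plain illustration under that definition; the function name and the sample values are hypothetical, and the mapping software used for Figure 1 may apply a different (faster or approximate) variant.

```python
def jenks_breaks(values, n_classes):
    """Upper bound of each class under a natural-breaks classification."""
    data = sorted(values)
    n = len(data)
    if n_classes < 1 or n_classes > n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any slice in O(1).
    pre = [0.0]
    pre_sq = [0.0]
    for x in data:
        pre.append(pre[-1] + x)
        pre_sq.append(pre_sq[-1] + x * x)

    def ssd(i, j):
        """Sum of squared deviations from the mean of data[i:j]."""
        total = pre[j] - pre[i]
        return (pre_sq[j] - pre_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total deviation for data[:j] split into c classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk back through the stored split points to recover class bounds.
    uppers = []
    j = n
    for c in range(n_classes, 0, -1):
        uppers.append(data[j - 1])
        j = split[c][j]
    return uppers[::-1]


sample = [1, 2, 3, 10, 11, 12, 50, 51, 52]
breaks = jenks_breaks(sample, 3)
```

For the sample values the three classes end at 3, 12, and 52, that is, the breaks fall in the large gaps of the sorted data rather than at equal intervals.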

Betweenness Centrality
Betweenness centrality (BC) measures the ability to control communication among cities in the global structure of a network, which indicates the probability of a node serving as a 'bridge'. Hence, nodes with high BC play the role of hubs in cities' interaction networks. Figure 1b illustrates the spatial variation of BC, which can be classified into five levels using the Jenks natural breaks method; namely, 0-85, 85-212, 212-383, 383-781, and >781. The top 20 centers controlling communication among cities are as follows: Beijing, Chengdu, Guangzhou, Shenzhen, Shanghai, Chongqing, Xi'an, Wuhan, Hangzhou, Suzhou, Tianjin, Nanjing, Shijiazhuang, Dongguan, Jinan, Nanning, Kunming, Fuzhou, Xiamen, and Foshan. Chongqing and Chengdu, Beijing and Tianjin, Shanghai and Hangzhou, Zhengzhou and Wuhan, and Guangzhou and Shenzhen are the communication control centers in western, northern, eastern, central, and southern China, respectively. However, there is a lack of strong control centers in northeast, southwest, and northwest China. In northwest China, the top three cities with the highest BC index are the provincial capital cities, namely Urumqi, Lanzhou, and Xining, whereas the top three cities with the highest DC index are Urumqi, Xining, and Haidong.

Community Detection
Community detection provides a new way to understand the functional relationships shaping the spatial organization of a region and the territorial structures derived from such shaping. The open-source network analysis and interactive visualization tool Gephi v0.9.2 was used to implement community detection. Using a 3D rendering engine to display graphs in real time, Gephi supports almost all types of graphical networks, including complex networks, and speeds up the exploration of relationships within data. The resolution parameter was repeatedly adjusted until the modularity reached its maximum of 0.532, which falls into the range between 0.3 and 0.7, and the corresponding partition was taken as the final result [21]. The distribution of the communities identified is shown in both Figure 2 and Table 1, revealing a significant pattern of spatial clustering. Eighteen clusters of cities, or communities, were identified. Cities within communities are well connected, while cities between communities are less connected. There were no geographical elements in the calculation of the fast unfolding algorithm to detect communities, yet the results reveal strong influences of geographical proximity on the structure of the clusters.

The results of community detection of Chinese cities shed new light on the old yet important problem of regionalization. The delineation of urban regions is an important aspect of understanding the spatial structure of the urban system. Previous studies hold different views on the division of urban regions in China. For example, Gu claimed that China has 33 urban economic regions (two tier) [24], whereas Zhou and Zhang contended that there are 11 urban economic regions (two tier) [25]. Chen et al. divided China into 19 urban economic regions according to highway passenger flows [26]. The prominent feature of the detected communities is that cities are well connected within each community, while less connected among communities.
Population mobility can be taken as a measurement of economic connection between cities, and the boundaries of the detected urban communities have a high similarity to those of the urban economic zones. The number of detected urban communities matches the results of Chen et al. [26], while the spatial structure of urban economic regions is different. The main features of communities identified in this study are described as follows.
Firstly, urban linkages have obviously moved beyond administrative boundaries and natural geographical barriers. City regions show an increasing interprovincial integration. In Tibet, Xinjiang, Qinghai, Ningxia, Jilin, Heilongjiang, Jiangsu, Anhui, Guangdong, and Hainan, cities are grouped with other cities of the province to which they administratively belong, indicating a certain extent of an administrative district economy. However, almost all the communities' boundaries have moved beyond the provincial administrative boundaries, denoting that the restrictive effect of the administrative boundaries on the regional economy has been weakened. Cities in several provinces are separated into different communities. For example, Gansu is divided into three parts: the north part joins Xinjiang to form Community 13; the south part combines with Ningxia, central Inner Mongolia, Shaanxi, and parts of Shanxi to be Community 7; and the west part merges with Qinghai and Aba of Sichuan to shape Community 9. In addition, due to geographical proximity and close urban linkages, the results reveal that some provincial capital cities link up with neighboring cities in adjacent provinces, instead of those from the same province, to establish a community. For example, Nanjing, the capital city of Jiangsu province, joins hands with Ma'anshan and Chuzhou in Anhui province to form Community 2, while the other cities in Jiangsu province are under the realm of influence of Shanghai. Shanghai leads Community 5, consisting of the majority of cities from Anhui and Zhejiang provinces and part of Henan province. This phenomenon supports the argument for alleviating the restrictive effects of administrative boundaries on regional economic integration. Rapid development of transportation infrastructure has weakened the effect of geographical barriers, such as huge mountains and rivers, on urban linkages.
The Wuyi Mountains and Taihang Mountains are the boundaries between Fujian-Jiangxi provinces and Shanxi-Hebei provinces, respectively. The Nanling Mountains meander along the boundaries linking Guangdong, Guangxi, Hunan, and Jiangxi provinces. However, these mountains do not stand out in the community division map as dividing lines among city regions.
Secondly, divergences exist between the regional boundaries identified in this paper (e.g., the Yangtze River Delta (Community 5), Beijing-Tianjin-Hebei (Community 1), Pearl River Delta (Community 4), and Chengdu-Chongqing (Community 12)) and the boundaries of the urban economic zones or urban agglomerations defined by the government or academia. The scope of Beijing-Tianjin-Hebei (Community 1) consists of Beijing, Tianjin, Hebei, central and western Inner Mongolia, northern Shandong, and eastern Shanxi, which exceeds the scope defined in the 'Beijing-Tianjin-Hebei Urban Agglomeration Plan', under which the Beijing-Tianjin-Hebei Urban Agglomeration only consists of Beijing, Tianjin, and 11 cities in Hebei province (Shijiazhuang, Zhangjiakou, Qinhuangdao, Tangshan, Baoding, Langfang, Xingtai, Handan, Hengshui, Cangzhou, and Chengde). The Yangtze River Delta agglomeration (Community 5) encompasses most parts of Jiangsu province (except Nanjing, Xuzhou, and Lianyungang) and Anhui province (except Suqian and Bozhou), Eastern Henan (including Xinyang and Zhoukou), and Northern Zhejiang province (Hangzhou, Shaoxing, Jiaxing, and Ningbo). This classification is also quite different from the scope of the Yangtze River Delta urban agglomeration as defined in the 'Yangtze River Delta Urban Agglomeration Plan' approved by the State Council in 2016. Furthermore, the radiation effects of the cities in the north of Shanghai are obviously stronger than those in the south. The Pearl River Delta urban agglomeration (Community 4) covers Guangdong, Guangxi, Hunan, Jiangxi, most parts of Fujian, Southeast Guizhou, and Eastern Yunnan, and its scope is significantly larger than the existing definition of the Pearl River Delta agglomeration, which refers to nine cities within Guangdong Province (Guangzhou, Foshan, Zhaoqing, Shenzhen, Huizhou, Dongguan, Zhuhai, Zhongshan, and Jiangmen).
Nevertheless, Community 4 is smaller than the so-called Pan Pearl River Delta region (including Guangdong, Guangxi, Fujian, Jiangxi, Hainan, Hunan, Sichuan, Yunnan, Guizhou, Hong Kong, and Macao). The Chengdu-Chongqing agglomeration (Community 11) includes Chongqing, eastern Sichuan, northwest Guizhou, northern Yunnan, etc., which is larger than the scope of the Chengdu-Chongqing urban agglomeration (including Chongqing and 15 cities in Sichuan, which are Chengdu, Deyang, Mianyang, Meishan, Ziyang, Suining, Leshan, Ya'an, Zigong, Luzhou, Neijiang, Nanchong, Yibin, Dazhou, and Guangan) as defined in the 'Regional Planning of Chengdu-Chongqing Economic Zone' endorsed by the State Council in 2011.
Lastly, a typology of urban regions in China was formulated based on the value of DC and its spatial distribution. Communities of cities can be classified into four types according to their spatial patterns: monocentric mode, dual-nuclei mode, polycentric mode, and incompletely developed mode. Cities with DC at or above the third level are defined as the central cities of a community. In general, Communities 2, 7, 12, 13, and 17 fall into the monocentric category, while Communities 11 and 15 are identified as dual-nuclei mode; Communities 1, 4, and 5 are classified as polycentric mode, and

Discussion
This study has revealed a prominent trend of interprovincial integration of regional development in China. The majority of urban regions identified in this study have moved beyond the provincial-level administrative boundaries and traditionally important geographic barriers, which contradicts relevant studies arguing that city communities often match well with administrative boundaries [27][28][29]. Administrative boundaries have become less important in shaping the spatial pattern of urban connectivity. The results of this study also confirmed the existence of an administrative economy in China, yet it is limited to the relatively less developed regions. It is evident that, along with social and economic development, the boundary effects of administrative division and geographic barriers on urban and regional integration are weakening. This is argued to be the fundamental reason why a number of provinces have been separated into two or three urban regions. In particular, Gansu province is divided into northern, southern, and western parts, and Sichuan province is divided into western and eastern parts. This finding has important policy implications as it can shed light on possible adjustments of provincial administrative boundaries in the future. Provincial capitals, such as Nanjing and Lanzhou, are closely connected to cities from adjacent provinces and constitute a de facto urban region, which implies that there may be a need to adjust administrative boundaries to better cater for urban interactions. This argument even holds for the boundaries overlapping with major mountain ranges, such as the Taihang, Nanling, and Wuyi Mountains. Traditionally these boundaries have been a critical factor in delineating provincial territories. However, according to the city communities identified in this study, they may not constitute the boundaries of urban regions any more.
This finding differs from previous arguments that such areas usually form a relatively closed or semi-closed regional economy [26]. Part of the reason for this difference lies in the data. Highway flow data were used in Chen et al., on which natural barriers could impose a significant impact [26], whereas this study examined population flow data, which were less confined by natural geographic barriers. The measurement of socioeconomic intercity interactions, such as the population flow data employed in this study, can provide a more systematic and comprehensive picture of the organization of the urban network. Hence, we argue that the demarcation of urban regions (as urban economic zones or urban agglomerations) should pay more attention to socioeconomic linkages rather than geographic boundaries. This should be particularly emphasized in making regional policies and planning proposals.
Thus far, the boundaries and spatial patterns of urban agglomerations in China are still inconclusive and, to a large extent, dependent on the measurement used by researchers as well as the major concerns of regional development plans. A systematic way of capturing the dynamics of diverse urban linkages is of fundamental importance for delineating urban agglomerations in China. This paper argues that urban linkage is critical to examining the mechanism of the urban system and urban agglomeration, and thus is an essential aspect in formulating regional sustainable development strategies. However, location still matters, and the radiation capacity of cities in urban regions is an important factor in explaining the function, role, and relative position of a city in the urban network hierarchy. This study reveals that the composition of the four prominent urban regions is different from the well-documented urban agglomerations in China. The boundaries of Community 1 and Community 4 are distinct from the Beijing-Tianjin-Hebei and Pearl River Delta urban agglomerations as defined by the government or academia [30]. Big data, such as cellphone signaling data, have already been used to study urban agglomeration from the perspective of urban linkages [31]. Using big data on population flow, this paper applied the method of complex network analysis to examine the spatial pattern of urban regions in China, demonstrating that this method is useful for identifying the core city and unfolding the polycentricity of regional configuration. The results show that complex network analysis may shed light on urban agglomeration studies and urban system planning.

Conclusions
By using Baidu migration data, this study adopted social network analysis to investigate the pattern and characteristics of daily population flow in urban China, leading to our demarcation of urban regions in China. A prominent hierarchy exists among cities in terms of their roles in gathering and distributing population and regional resources. Cities at the top of the hierarchy are mainly located in coastal regions and key transportation corridors. However, the capacity of a city as an intermediary control is largely determined by its location and the condition of its transport infrastructure. Apart from regional centers, such as Beijing, Shanghai, and Guangzhou, cities with advantageous logistic capabilities such as Jinan, Chengdu, Xi'an, Wuhan, Zhengzhou, Fuzhou, and Xiamen also have a high betweenness centrality (BC). Chongqing and Chengdu, Beijing and Tianjin, Shanghai and Hangzhou, Zhengzhou and Wuhan, and Guangzhou and Shenzhen are the 'bridges' of the west, north, east, middle, and south of China, respectively. Natural and administrative boundaries are no longer the major barriers to urban linkages, implying a prominent trend of regional integration, as urban regions have obviously crossed over the administrative boundaries. Huge mountain ranges, such as the Taihang, Nanling, and Wuyi Mountains, are important interprovincial boundaries, but they no longer constitute the boundaries of urban regions as identified in this study. Interestingly, this study found that a provincial capital city may form an urban region with cities from its neighboring provinces due to geographical proximity and close urban linkages. For example, cities in Jiangsu province are divided into two different urban regions: the provincial capital city Nanjing leads the group of Ma'anshan and Chuzhou from Anhui province, while the rest of the cities in this province fall into the group led by Shanghai.
The spatial organization of urban regions in China can be classified into four types: monocentric, dual-nuclei, polycentric, and incompletely developed mode. The identified regional boundaries of the Yangtze River Delta (Urban region 5), Beijing-Tianjin-Hebei (Urban region 1), the Pearl River Delta (Urban region 4), and Chengdu-Chongqing (Urban region 12) are quite different from the boundaries of the urban economic zones or urban agglomerations proposed by the government or academia. This finding can shed light on policymaking for the adjustment of urban economic zones, regional sustainable development, and urban agglomerations in China.
Two major limitations should be noted for the Baidu migration data used in this study. Due to the restrictions of data collection and preprocessing, travel by those who do not use Baidu Maps is not recorded in the dataset. In addition, for the recorded travels, a trip with a transfer station along the way was identified as two separate trips. Despite its inherent limitations, the Baidu migration dataset proves able to reflect the daily population flow dynamics in urban China, and to reveal urban interactions and linkages in a comprehensive and systematic way. With regard to revealing dynamic relationships between cities, the dataset shows advantages over conventional data sources such as census or transportation statistics. By exploiting the daily migration flows, this research sought to investigate the structure of urban systems in China. Yet the analysis was confined to the prefecture-level cities, due to the availability of such data. Future research may further investigate China's urban network with data at finer scales, such as the county level or even the town level, which can enrich the details of the investigation and deepen our understanding.