Modelling impacts of high-speed rail on urban interaction with social media in China’s mainland

ABSTRACT High-Speed Rail (HSR) has increasingly become an important mode of inter-city transportation between large cities. Inter-city interaction facilitated by HSR tends to play a more prominent role in promoting urban and regional economic integration and development. Quantifying the impact of HSR’s interaction on cities and people is therefore crucial for long-term urban and regional development planning and policy making. We develop an evaluation framework using toponym information from social media as a proxy to estimate the dynamics of such impact. This paper adopts two types of spatial information: toponyms from social media posts, and the geographical location information embedded in social media posts. The framework highlights the asymmetric nature of social interaction among cities, and proposes a series of metrics to quantify such impact from multiple perspectives – including interaction strength, spatial decay, and channel effect. The results show that HSRs not only greatly expand the uneven distribution of inter-city connections, but also significantly reshape the interactions that occur along HSR routes through the channel effect.


Introduction
High-Speed Rail (HSR) offers advantages over air travel in on-time departure and arrival, travel comfort, and lower CO2 emissions (Yang et al. 2018a; Tian et al. 2021). Hence, many countries have been making great efforts to develop HSRs, such as those being planned under the Biden administration (Savidge 2021). The effects of regional rail investments are a longstanding topic of interest in the fields of regional/urban planning, real estate, housing, tourism, transportation, politics and economic geography, because regional rail is a popular public investment used to promote the economic agglomeration of the regional economy, the mobility of the skilled labor force, and the reduction of greenhouse gas emissions (Pagliara and Mauriello 2020; Shao, Tian, and Yang 2017; Yu et al. 2020; Audikana 2021; Tian et al. 2021). Researchers examine regional rail benefits from various perspectives. A conventional approach, for example, compares land or housing values along the rail corridor before and after the start of the rail service. However, fewer studies investigate regional rail effects in terms of urban interaction, although promoting urban interaction is a primary objective of regional rail investment (Guo, Li, and Han 2020). Examining the urban interaction that results from regional rail projects has important implications for the benefit-cost analysis of regional rail investment and planning.
Furthermore, scholars of regional science, urban studies, and geography have long been interested in the interactions between cities and regions, as they convey the spatial structure of a region (Zhen et al. 2019). Urban interaction involves the movement of people and cargo between places. It can also refer to virtual connections, such as information communication between places (Ye and Wei 2019). Conventionally, survey data were used to measure urban interaction, such as volumes of passengers between two cities (Xiao et al. 2013), migration flows (Flowerdew and Lovett 2010), trade flows (Hesse 2010), and telecommunications (Guldmann 2005). Compared with survey data, social media data offer a high temporal resolution and are accessible in large volumes at minimal cost (Cao et al. 2015; Wang et al. 2016; Li et al. 2017; Yang et al. 2019; Ghani et al. 2019).
HSR, a type of regional rail that runs between cities much faster than traditional rail, significantly impacts urban systems, regional development, and the level of urban spatial interaction (Zhang, Wan, and Yang 2019). Owing to its high speed and large capacity, it has largely become a substitute for domestic air transportation between cities (Chen 2017; Gao, Su, and Wang 2019). The development of high-speed rail is a key urban infrastructure investment and economic development policy in a number of countries (Wang, Acheampong, and He 2020b), especially developing countries. HSRs are reshaping the map of spatial interaction and reconstructing the network structure of cities. The inter-city interactions and relationships brought by HSR tend to play a more prominent role in promoting urban integration and development in areas with more developed economies (Guo, Li, and Han 2020), including knowledge-intensive economies (Wang, Acheampong, and He 2020b). The HSR network, with its reduced travel time, has greatly increased the mobility of people between cities and facilitated the spatial redistribution of urban population, with population gain and loss further exacerbated by HSR (Deng et al. 2019). HSR inter-regional commuting can be seen as a type of "temporary" migration (Guirao, Campa, and Casado-Sanz 2018); it gives rise to a horizontal and polycentric city network, promotes regional integration (Xu et al. 2019), and leads to a decrease in the disparity in population accessibility (Wang and Duan 2018). Understanding and quantifying the impact of HSR's interaction on cities is therefore crucial for long-term urban and regional development policy and planning. Quantifying the dynamic inter-city interaction brought by HSR is challenging. Commonly used indicators, such as per capita income or industrial output, fail to completely capture the effects of HSR.
Besides, since these indicators describe the characteristics of individual cities, it is challenging to use them to quantify the interaction between cities, especially if their mutual influences are uneven. Big spatio-temporal data extracted from social media not only contain rich spatial and temporal information, but also contain information that details the interactions between cities (Han, Tsou, and Clarke 2015; Andris 2016; Zhang et al. 2021).
This paper develops a framework for measuring the bi-directional interaction between cities impacted by HSR, using toponyms from social media as a proxy. The contributions of this paper are summarized below:
(1) Designing new metrics for HSR impacts by examining spatial attenuation effects and asymmetric interactions between cities.
(2) Deriving a method to examine the channel effect of HSR, which refers to whether the two cities located at the endpoints of the HSR obtain far greater benefits than other cities through which the HSR passes.
(3) Presenting a case study using social media data to demonstrate the applicability of the framework.

Spatial impact of HSR
Researchers have previously analyzed the line topology of HSR and studied the relationship between HSR and urban spatial structure (Zhang, Wan, and Yang 2019). Urban accessibility to HSR is evaluated using network analysis. Inspired by the gravity model for the global city network (Taylor and Derudder 2004), it was found that intercity relatedness, as well as the radiation and gravity models along the Beijing-Shanghai HSR, were consistent (Liao 2014). The network centrality indicator was then used to investigate the spatial impact of HSR in urban networks (Cao, Feng, and Zhang 2019; Wang, Du, and Huang 2020a); it was found that, with the opening of the HSR network, overall connectivity between cities increased significantly (Jiao, Wang, and Jin 2017). These static studies focus on the spatial impact of the HSR line itself. Previous research indicates that the development of HSR has greatly improved tourism, and affected labor migration and house values in small and medium cities situated along an HSR line (Chen and Haynes 2015). Other studies take third-party data into consideration to measure the effect of HSR on everyday life and the urban economy. For example, survey data and regression models have been used to examine the effect of HSR on tourism (Guirao and Campa 2015), labor and economic activity (Guirao, Lara-Galera, and Campa 2017; Li et al. 2016), housing prices (Cheng, Loo, and Vickerman 2015; Diao, Zhu, and Zhu 2016), nighttime light data (Deng et al. 2020), and ticket fares (Cavallaro et al. 2020). Passenger flow data have allowed the spatial structure of 99 HSR cities in China to be analyzed, showing that the city network is multi-centered, particularly in the central and eastern cities (Yang et al. 2018b).
The impacts of high-speed rail development on airport-level traffic have been examined using airport-level data, considering not only the availability of air-HSR intermodal linkage between the airport and the HSR station but also the position of the airport's city in the HSR network (Liu et al. 2019). HSR and intercity coach timetable data have been utilized to evaluate city centrality and city-pair connectivity and to compare hierarchical structures (Wang, Du, and Huang 2020a). Nevertheless, researchers continue to face limitations in evaluating the impact of HSR. The spatial and temporal scales of economic and demographic data are limited by administrative districts, and some detailed data, such as the number of passengers and survey data of scenic spots, are difficult to obtain (Ye and Carroll 2011).

Spatial relatedness in the big data era
On the other hand, spatiotemporal interaction data is accessible via social media data, which is widely utilized in mapping disasters (Zheng et al. 2018), social activities (Tsou et al. 2013), public health (Hay et al. 2013), spatial planning (Massa and Campagna 2014), and event detection (Sakaki, Okazaki, and Matsuo 2013). All of these studies illustrate the use of social media in the spatial social sciences. In examining HSR impact on tourism, for instance, check-in data collected revealed a high correlation between HSR and tourism. As such, this data can act as a proxy for real tourism arrivals (Liu and Shi 2017). Existing studies have also focused on mining spatial and temporal distribution patterns, instead of measuring spatial interaction between geographic units.
Inter-city flows based on big data provide new perspectives for evaluating spatial interactions. Such connections can be derived using transportation and human travel data (Liu, Andris, and Ratti 2010; Phithakkitnukoon et al. 2011) or telecommunications data, but these data are difficult to obtain due to privacy and security concerns. Social media account follower data have therefore been used to analyze topological features and to map the spatial distribution of inter-city social interactions in China (Li et al. 2013). Data of this type can also be challenging to work with, because the user status information visible on social media is rarely updated, making it difficult to measure the impact of HSR.
Toponyms (place names) from social media also provide a new type of data for mining spatial patterns and relationships (Meijers and Peris 2018). Previously, such data have been used to examine how food environments influence food choices (Chen and Yang 2014). They have also been used to address the connective strength and differences between geographical entities (Lin, Wu, and Li 2019), and to simulate urban growth in metropolitan areas (Lin and Li 2015). In addition, toponym data have been used to understand social media users' geographical awareness of US cities by studying how Twitter users exchanged and recognized toponyms from various US cities (Han, Tsou, and Clarke 2015). That study was based on the notion of toponym co-occurrence, which assumes there is an inter-city association when the names of two cities occur together within the same text or web page. Toponym co-occurrence treats two toponyms in the same text as equally weighted, so its results reveal undirected rather than directed interaction. Thus, methods that rely on toponym co-occurrence cannot be directly used to quantify asymmetric interaction between cities.
Human behavior can also be reflected in the number of social media posts made by a person, a community, or the population as a whole, as well as in the type of content posted. Social events, including emergencies, big events, holidays, and vacations, are common subjects for social media posts (Xu et al. 2020; Ghani et al. 2019; Dolan et al. 2019). In this paper, we utilize toponym data and geotag information from social media posts to model inter-city interactions and the changes in these interactions resulting from HSR.

Methods
In existing studies (Han, Tsou, and Clarke 2015;Meijers and Peris 2018), mentioning the name of a place in a tweet would indicate the awareness of that location. Given this assumption, we use "awareness" to represent the basic interactions between cities implicit in social media data.

Representing urban interactions with social media data
The city interactions used in this study are very similar to those found in web page texts collected with search engines (Devriendt et al. 2011). Rather than relying only on spatial information (toponyms) co-occurring in web page texts, this paper adopts two types of spatial information: toponym information from social media posts, and location information included in social media posts. A post whose text contains the toponym of city j and that is posted in city i represents an instance of interaction from city i to city j. We refer to this as awareness from city i to city j. When a post mentions more than one city, each city is counted once.
From a spatiotemporal perspective, inter-city social interaction extracted from social media can be expressed as a four-tuple (toponym, location, count, time duration). The "toponym" is the name of city j, the "location" is the name of city i, the "count" is the number of posts, and the "time duration" is the temporal slot for evaluation. A data example is shown in Table 1.
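The four-tuple can be pictured as a small record type. The Python sketch below is illustrative only: the class name `AwarenessRecord`, its field names, the "YYYY-MM" period label, and the sample counts are all assumptions, not values from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AwarenessRecord:
    toponym: str   # city j, the city named in the post text
    location: str  # city i, the city the post was sent from
    count: int     # number of matching posts
    period: str    # time slot, here a "YYYY-MM" label (assumed format)


# Hypothetical sample values, for illustration only.
records = [
    AwarenessRecord("Shenzhen", "Beijing", 120, "2010-01"),
    AwarenessRecord("Beijing", "Shenzhen", 340, "2010-01"),
]


def awareness(records, origin, target, period):
    """Total awareness from `origin` (location) to `target` (toponym) in one period."""
    return sum(
        r.count
        for r in records
        if r.location == origin and r.toponym == target and r.period == period
    )
```

Keeping the toponym and the location as separate fields is what makes the interaction directed: swapping the two arguments of `awareness` generally gives a different value.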

A framework
A larger value of awareness between two cities indicates a higher level of interaction between them. The suggested framework measures the strength of such relatedness along two dimensions. The first dimension covers the statistical perspective and the spatial perspective. The second dimension has two scales: global and channel effect. All inter-city interactions are considered and measured on the "global" scale, while the "channel" scale is used to express characteristics between the terminal cities of an HSR line. The framework of relations between cities is defined from both the statistical dimension and the spatial dimension, as shown in Table 2.
In fact, the awareness between cities is directional. A large value of awareness of city A on city B on social media not only means that there is a strong influence from city B on city A, but also represents an equally strong dependence of city A on city B. As such, in-awareness is defined as awareness received from other cities, and out-awareness as awareness given to other cities. Each index in Table 2 consists of two sub-indices that evaluate in-awareness and out-awareness separately; these sub-indices are listed in Table 3. The relatedness index, spatial relatedness index, and channel effect index are detailed in Sections 3.2.1, 3.2.2, and 3.2.3, respectively.
The indicators in the framework are examined for different time periods, and the dynamic effects of HSR can be evaluated by observing the changes in the indices before and after the HSR opening.
In a specific period, the relatedness of N cities in social media may be represented by an N × N matrix built from the four-tuples in Table 1. The entry C_ij is the awareness from city i to city j, that is, the number of posts by all users located in city i whose text contains the name of city j. A larger C_ij denotes stronger awareness from city i to city j. Because cities i and j differ, C_ij is usually not equal to C_ji.
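A minimal Python sketch of how such a matrix could be assembled from (toponym, location, count) entries for one period follows. The function name, the plain nested-list representation, and the sample counts are assumptions chosen for illustration.

```python
def build_awareness_matrix(cities, tuples):
    """Return C where C[i][j] is the post count from city i (location) to city j (toponym)."""
    index = {name: k for k, name in enumerate(cities)}
    n = len(cities)
    C = [[0] * n for _ in range(n)]
    for toponym, location, count in tuples:
        i = index[location]  # row: where the post was sent from
        j = index[toponym]   # column: which city the post mentions
        C[i][j] += count
    return C


# Hypothetical counts for a single period.
cities = ["Beijing", "Wuhan", "Shenzhen"]
sample = [
    ("Wuhan", "Beijing", 50),     # posts in Beijing mentioning Wuhan
    ("Beijing", "Wuhan", 200),    # posts in Wuhan mentioning Beijing
    ("Beijing", "Beijing", 900),  # self-awareness of Beijing
]
C = build_awareness_matrix(cities, sample)
```

In this toy example `C[0][1]` and `C[1][0]` differ, which is the asymmetry the framework is designed to capture; the diagonal holds the self-awareness values.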

Relatedness index
Because C_ij is the number of posts from city i that mention city j, it is strongly connected with demographic factors (e.g. population, population composition, economy) of the two cities. In existing research, the populations of the two cities are employed to normalize the index. A drawback of this method is that the number of posts is not always significantly correlated with population, and it is difficult to quantify the relationship between the number of posts and demographic factors. As shown in Figure 2 in section 4.1, the relationship between the number of posts and population varies across cities.
Self-awareness is therefore employed as the standard value to model inter-city relatedness, so as to account for the underlying population distribution. Both an awareness index and a dependence index are proposed to evaluate regional relatedness, based on the asymmetry of regional relatedness and on self-awareness.
To examine the total awareness that one city receives from all the other cities, the in-awareness index (IAI) of city i is calculated as: where n is the number of cities. A higher IAI_i shows that city i receives more attention from all the cities.
Not only does C_ij represent the awareness from city i to city j, but it can also be interpreted as an output from city i to city j. To examine the dependence of a city on the totality of the cities, the out-awareness index (OAI) of city i is calculated as:
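Since both indices are described as sums of awareness rates with self-awareness as the standard value, one possible reading can be sketched in Python. The normalization used here, dividing each flow by the self-awareness of the city where the posts originate, is an assumption made for illustration and is not confirmed by the text. The function names and the sample matrix are hypothetical.

```python
def rate(C, origin, target):
    """Awareness from origin to target relative to origin's self-awareness (assumed form)."""
    if C[origin][origin] == 0:
        raise ValueError("self-awareness is zero; rate undefined")
    return C[origin][target] / C[origin][origin]


def in_awareness_index(C, i):
    """Sum of the rates at which all other cities mention city i."""
    return sum(rate(C, j, i) for j in range(len(C)) if j != i)


def out_awareness_index(C, i):
    """Sum of the rates at which city i mentions all other cities."""
    return sum(rate(C, i, j) for j in range(len(C)) if j != i)


# Hypothetical 3-city awareness matrix (rows: posting city, columns: mentioned city).
C_sample = [
    [100, 10, 5],
    [40, 50, 5],
    [30, 10, 20],
]
```

Under this assumed normalization, a large city with high self-awareness that is frequently mentioned elsewhere gets a high in-awareness index and a low out-awareness index, as city 0 does in the sample matrix.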

Spatial relatedness index
The Global Awareness Index (GAI) (Han, Tsou, and Clarke 2015) estimates in-awareness based on spatial decay in social media. The mathematical expression for calculating the GAI is given by: where G_ij denotes the normalized spatial distance, calculated as the actual spatial distance divided by half the length of the Earth's circumference, and P_i is the population of the i-th city. The higher the GAI, the more in-awareness city i receives from all the observed cities, and the greater the influence city i has on all the cities. In the GAI model (Han, Tsou, and Clarke 2015), the population is used to normalize the value of GAI to eliminate the impact of inter-city differences in size. For the same considerations as IAI and OAI, this paper adopts local-awareness instead of population to model the spatial awareness index. In the SIAI model, G_ij is adopted to denote the spatial weight between the i-th city and the j-th city. For a more general model, G can be a spatial adjacency matrix, a distance matrix, or a reverse distance matrix. Since spatial interaction is bidirectional, the index should be modeled separately in both directions. As a result, the GAI is extended to the SIAI index, and its calculation formula is as follows: where w_ij is the spatial weight matrix, which is a variant of G. For n cities located along an HSR line, w_ij is given by:

$$w_{ij} = \frac{d_{ij}}{\max\{d_{ij} \mid i = 1, \ldots, n;\ j = 1, \ldots, n\}}$$

For a city, the higher the SIAI, the stronger its influence in space.

Table 3. Sub-indices of the indices in Table 2.
Index | Sub-index | Description
Relatedness index | In-Awareness Index (IAI) | The sum of the in-awareness rate for a city
Relatedness index | Out-Awareness Index (OAI) | The sum of the out-awareness rate for a city
Spatial relatedness index | Spatial In-Awareness Index (SIAI) | The sum of the spatial in-awareness rate for a city
Spatial relatedness index | Spatial Out-Awareness Index (SOAI) | The sum of the spatial out-awareness rate for a city
Channel effect index (TS) | NA | A quantitative index of channel effect

As with SIAI_i, the spatial out-awareness index is denoted by SOAI_i. A higher SOAI_i indicates that the i-th city has a stronger spatial out-awareness of all the other cities.
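The two distance normalizations described in this section can be sketched in Python. The constant for the Earth's circumference (the equatorial value, about 40,075 km), the function names, and the sample distances are assumptions for illustration; the text does not say which circumference value or distance source is used.

```python
# Assumption: equatorial circumference of the Earth, in kilometres.
HALF_EARTH_CIRCUMFERENCE_KM = 40075.0 / 2


def gai_normalized_distance(distance_km):
    """G_ij in the GAI model: distance divided by half the Earth's circumference."""
    return distance_km / HALF_EARTH_CIRCUMFERENCE_KM


def hsr_spatial_weights(d):
    """w_ij for cities along one HSR line: each distance divided by the largest distance."""
    d_max = max(max(row) for row in d)
    if d_max == 0:
        raise ValueError("all distances are zero")
    return [[x / d_max for x in row] for row in d]


# Hypothetical pairwise distances (km) between three cities on one line.
d_sample = [
    [0, 100, 400],
    [100, 0, 300],
    [400, 300, 0],
]
w_sample = hsr_spatial_weights(d_sample)
```

With the line-specific normalization the farthest pair of cities always gets weight 1, so weights are comparable within one HSR line but not across lines of different length.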

Channel effect
In general, cities at both ends of the HSR line tend to be important large cities. When an HSR line begins operations, the interaction between the two endpoint areas becomes greatly strengthened. How to quantitatively evaluate changes in interactions is a key concern for high-level decision makers and urban planners. According to economic theory, the channel effect refers to the behavior of majority shareholders, because they transfer capital and profits from the company to themselves (Carlson and Zmud 1999). The channel effect suggests that the more resources that are gained from the given channel, the richer communication using that channel, and ultimately the richer the perception of the channel. In this study, the channel effect refers to whether the two cities located at the endpoints of the HSR obtain far greater benefits than other cities through which the HSR passes.
The channel effect is reflected by changes in the spatial in-awareness index and the spatial out-awareness index. The spatial awareness index is adopted here to estimate the channel effect because the channel effect mainly concerns how to evaluate the unbalanced influence of cities within a particular geographical area. The channel effect can be estimated by: where the i-th city and the j-th city are the two cities in which the HSR line endpoints are located. TS_ij and TS_ji are not equal because the interactions are asymmetric. The two indices are combined into TS to check the overall channel effect of the high-speed rail line as follows: Under the randomization assumption, the expected value of TS is: When TS is greater than its expected value, this indicates a channel effect on the HSR line under study. The larger the value, the stronger the channel effect of the HSR line.
Similarly, a channel effect index can be derived from an economic perspective as follows: where TE_ij is the channel effect from city i to city j, and GDP_j is the GDP of city j. The TE index can serve as a reference for examining the TS index.

Data
HSRs in China were first built in 2004 following an announcement by the Chinese government of a long-term plan for railway development. The Beijing-Tianjin high-speed railway, which opened on 1 August 2008, was the first HSR line to reach 350 km per hour. Since then, China has built the largest HSR network in the world and has improved its existing railways over the course of a decade. As of the end of 2014, the total length of China's HSR exceeded 16,000 km, accounting for more than 50% of the total length of global HSR. China built the Beijing-Shenzhen HSR line, shown in Figure 1, between 2005 and 2012. The line consists of three parts, the Wuhan-Guangzhou rail, the Guangzhou-Shenzhen rail, and the Beijing-Wuhan rail, which began running on 26 December 2009, 26 December 2011, and 26 December 2012, respectively. The line connects many cities and has been in operation for several years, accumulating a presence and influence on social media. For this reason, we use this line to conduct our research.
The Beijing-Shenzhen HSR line, which connects three of China's four first-tier cities, is the main transportation corridor from northern China to southern China. The HSR passes through 35 cities, which are classified into four groups by size and role, as shown in Table 4.
The First Group (FC) is composed of Beijing, Guangzhou, and Shenzhen, which are three of China's four first-tier cities. The Second Group (SC) consists of Shijiazhuang, Zhengzhou, Wuhan, and Changsha, which are provincial capital cities. The Third Group (TC) includes 17 prefecture-level cities. The MC group consists of small, county-level cities. Every MC city is part of a prefecture-level city in the TC group, so the cities in the MC group are excluded from the following experiments. The interaction between the first three groups (24 cities) was selected as the research focus for this study.
Sina Weibo is a Chinese social media platform described as a cross between Twitter and Facebook. Each post on Weibo is limited to 140 Chinese characters and some pictures. Weibo provides a web search query that can sort through and show posts, as well as report the number of posts, based on filters such as post location, content, and time. An example search URL is "https://s.weibo.com/weibo/?q=深圳&region=custom:11:1000&typeall=1&timescope=custom:2010-01-01-0:2010-01-31-23", where "深圳" is the Chinese name of Shenzhen; "custom:11:1000" restricts the results to posts located in Beijing; "typeall=1" means the returned results include all types of posts; and "timescope=custom:2010-01-01-0:2010-01-31-23" restricts the results to posts made in January 2010. A crawler was developed to simulate the post search process, and the results were saved as a dataset. The dataset consists of four-tuples (toponym, location, count, time duration) as shown in Table 1. As such, the interaction between cities in each period forms an origin-destination matrix.
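The structure of the search URL can be sketched with Python's standard `urllib.parse` module. The function names are hypothetical, and the sketch only assembles and parses the query string in the form quoted above; it does not contact Weibo, and whether the service still accepts these parameters is not something the text establishes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://s.weibo.com/weibo/"


def build_search_url(toponym, region_code, start, end):
    """Assemble a search URL from a toponym, a region code, and a time window."""
    params = {
        "q": toponym,                          # city name searched in the post text
        "region": f"custom:{region_code}",     # where the posts were sent from
        "typeall": 1,                          # include all types of posts
        "timescope": f"custom:{start}:{end}",  # time window of the posts
    }
    # Keep ":" unescaped so the custom filters read as in the example URL.
    return BASE_URL + "?" + urlencode(params, safe=":")


def query_of(url):
    """Parse the query string back into a dict, for checking a built URL."""
    return parse_qs(urlsplit(url).query)


url = build_search_url("深圳", "11:1000", "2010-01-01-0", "2010-01-31-23")
```

The Chinese toponym is percent-encoded by `urlencode`; `query_of(url)["q"]` recovers the original characters.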
Inter-city interaction C_ij was recorded at monthly granularity for the 24 research cities in the three groups. The dataset contained 576 records (24 × 24) per month. A total of 27,648 (24 × 24 × 48) records were accumulated over 4 years, so the series from one city to another comprises 48 values of C_ij.
To examine the correlation between posts and HSR, the cities were classified into 2 groups. The cities in the first group, named "JW", are located along the Beijing-Wuhan rail, and the cities in the second group, named "WG", are located along the Wuhan-Guangzhou rail. The number of posts between the two groups and within each group was calculated and plotted in Figure 2. In December 2012, the Beijing-Wuhan rail started running, so the "JW" and "WG" cities became directly connected by HSR. During this period, the number of posts from "JW" to "WG" increased rapidly. This is clear evidence that posts on social media can reflect changes in the relationship between cities.
There are fluctuations in the number of posts per month as shown in Table 5, largely because posting behavior is closely related to human behavior, which carries a degree of uncertainty.
To reduce the uncertainty in the number of posts, the records of every C_ij were classified into four groups by year. The median value over a calendar year was employed to represent the value for that year. The new data, consisting of the median values for each year, contained 2034 records for 24 cities over 4 years.
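The yearly aggregation can be sketched in Python as follows. The function name, the "YYYY-MM" month labels, and the sample counts are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def yearly_medians(monthly_counts):
    """Map each year to the median of its monthly counts for one city pair.

    `monthly_counts` maps a "YYYY-MM" label (assumed format) to a post count.
    """
    by_year = defaultdict(list)
    for month, count in monthly_counts.items():
        by_year[month[:4]].append(count)
    return {year: median(values) for year, values in sorted(by_year.items())}


# Hypothetical monthly counts; February 2010 is an outlier month.
monthly_sample = {
    "2010-01": 10,
    "2010-02": 300,
    "2010-03": 12,
    "2011-01": 20,
    "2011-02": 22,
}
medians_sample = yearly_medians(monthly_sample)
```

Using the median rather than the mean keeps a single unusually busy month, such as one containing a major event, from dominating the yearly value.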

Population and inter-city social interaction
Population and four yearly groups of C_ii (the awareness from city i to itself) were examined for variation from 2010 to 2013. The data from the five groups are normalized by the maximum value of each group, as shown in Figure 3. The cities in the figure are arranged from left to right according to total population.
As shown in Figure 3, a noteworthy difference exists between population and the value of local-awareness (C_ii). The C_ii values of the three FC cities (Beijing, Guangzhou, and Shenzhen) are significantly higher than those of the other cities. A Pearson correlation analysis is employed to examine the correlation between population and local-awareness for these cities. It reveals their differences: the coefficients between population and C_ii are 0.436, 0.486, 0.466, and 0.483 in the 4 years. This shows that population as a variable has some limitations in eliminating the impact of city size on human activities on social media.
The stability of C_ii over time is further examined. As shown in Figure 3, the trend of C_ii remains relatively constant over the 4 years. A Pearson correlation analysis was also conducted to examine the correlation of C_ii among the four yearly groups. The correlation coefficients between the groups are greater than 0.89, with p-values below 0.001 (2-tailed). This means that, although social media user activity is uncertain, C_ii is statistically relatively stable.
The above analysis shows that C_ii, or local-awareness, likely reflects social media activity more accurately than population does for city i. In addition, the distribution of the basic indices is examined. The distributional pattern of C_ij is shown in Figure 4. This distribution is long-tailed and consistent with previous research (Scellato et al. 2010).
Finally, the significance of C_ii is examined. Of the total 2034 records, 66 showed C_ij > C_jj, that is, concern for other cities greater than local-awareness. This number, accounting for 3.24% of the total, is statistically significant at the 0.05 level. The number of records with C_ij > C_ii, which stands for cities with a greater in-awareness than local-awareness, is 14 of 2034. This number, accounting for 0.66% of the total, is statistically significant at the 0.01 level. In other words, both inter-city in-awareness and inter-city out-awareness have significant spatial autocorrelation. In summary, the model based on C_ii can better evaluate inter-city interaction than a model based on population.
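For readers who want to reproduce this kind of check, a Pearson coefficient can be computed with a few lines of Python. This is a generic textbook implementation, not the authors' code; the function name and the toy inputs are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Toy inputs: one perfectly increasing and one perfectly decreasing relationship.
r_up = pearson_r([1, 2, 3], [2, 4, 6])
r_down = pearson_r([1, 2, 3], [3, 2, 1])
```

Applied to two yearly vectors of C_ii across the 24 cities, a value close to 1 would correspond to the stability reported above.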

Inter-city relatedness
The inter-city in-awareness index and out-awareness index from 2010 to 2013 are calculated and shown in Table 6. Table 6 illustrates that the mean and standard deviation of the inter-city in-awareness index decrease over time, and the median similarly decreases in each of the preceding 3 years. In general, the trends show that inter-city influence and city differences are decreasing. They also demonstrate that inter-city dependence and its differences are lessening, as the mean, median, and standard deviation of inter-city out-awareness also continue to decrease. Although the total rate of each city's awareness from other cities is increasing (the conclusion in the last section according to the X index), both the inter-city in-awareness index and the inter-city out-awareness index are on a downward trend. According to the definitions of IAI and OAI, this means that local-awareness has grown faster than in-awareness.
Obviously, there are some gaps between the trends of IAI and OAI. These gaps are examined using the difference in the value of IAI and OAI in the same group in the first year (2010), and in the last year (2013) as shown in Figure 5.
As can be seen from Figure 5, in most cities the value of out-awareness is greater than the value of in-awareness. To detail those differences, the ranked IAI and ranked OAI for the seven central cities from 2010 to 2013 are listed in Table 7.
As can be seen from Table 7, the in-awareness index and out-awareness index are negatively correlated. The seven central cities (FC cities and SC cities) occupy the top seven spots in the rankings of the in-awareness index from 2010 to 2013, even though their out-awareness indices are very low.

Spatial in-awareness and spatial out-awareness
Spatial in-awareness and spatial out-awareness are used to evaluate inter-city interaction based on spatial decay theory. The values of these indices from 2010 to 2013 are shown in Figure 6.
The cities in Figure 6 are sorted from left to right according to their grouping (FC, SC, TC). To illustrate the differences between central cities and TC cities in SIAI and SOAI, the average levels of the central cities (FC cities and SC cities) and of the TC cities have been examined over the 4 years, as shown in Table 8. Table 8 illustrates that the SIAI and SOAI of the seven central cities are lower than those of the other cities. In other words, these seven central cities have less in-awareness and out-awareness than the other cities.
This paper considers the increase in SIAI and SOAI from 2010 to 2013 as shown in Figure 7 to examine the impact of HSR. Table 8 also shows that the average SIAI of six of the seven central cities is on an upward trend while that of 11 of the 17 TC cities is decreasing. In other words, HSR improves the degree of influence of the central cities.
The increments of the out-awareness indices are depicted in Figure 7(b). The average out-awareness and the individual out-awareness of five of the seven central cities increased between 2010 and 2013, while the average out-awareness and the individual out-awareness of 15 of the 17 TC cities decreased. These findings suggest that HSR improves the spatial out-awareness of central cities in the TC cities.

Impact and travel time
The great advantage of HSR is that it reduces inter-city travel times and provides additional convenience. The reduction in time is an important factor that causes people to choose HSR as their mode of transport.
Previous studies have verified that HSR creates a significant change in inter-city relatedness within a two-hour travel time. To quantify the interaction between HSR impact and travel time, we use SIAI and SOAI to measure inter-city relatedness from 2010 to 2013.
Nearly 100 trains run through the cities on the Beijing-Shenzhen HSR. Even between the same two cities, trip times vary slightly due to stopping patterns and schedule changes. To simplify the calculation of the trip cost, trip time is set as the travel distance divided by the average HSR speed. The distance between cities can be approximated by the spatial distance because the HSR line is relatively straight. Results from the previous section show that HSR has opposing impacts on the seven central cities (FC and SC cities) and the 17 non-central cities (TC). The spatial relatedness indices, SIAI and SOAI, are therefore examined separately for two groups, according to whether or not a city is a central city. The experimental results are shown in Figure 8.
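The trip-time simplification described above can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: the spatial distance is taken to be the great-circle (haversine) distance, the average speed of 250 km/h is a hypothetical placeholder because the value used in the study is not given here, and the city coordinates are approximate city-centre positions.

```python
import math

EARTH_RADIUS_KM = 6371.0
ASSUMED_AVG_SPEED_KMH = 250.0  # hypothetical value; not taken from the paper


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_time_h(origin, dest, speed_kmh=ASSUMED_AVG_SPEED_KMH):
    """Approximate trip time in hours: straight-line distance / average speed."""
    return great_circle_km(*origin, *dest) / speed_kmh


# Approximate (latitude, longitude) of two cities on the line.
wuhan = (30.59, 114.31)
changsha = (28.23, 112.94)

distance = great_circle_km(*wuhan, *changsha)
hours = trip_time_h(wuhan, changsha)
print(f"{distance:.0f} km, {hours:.2f} h at the assumed average speed")
```

Using a single average speed ignores differences in stopping patterns, which is the simplification the text describes. The resulting times are best read as a consistent ordering of city pairs rather than as timetable values.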
On the local scale, the spatial in-awareness and spatial out-awareness of the seven central cities have improved significantly since the HSR began operations. These trends are consistent with the global trends. The SIAI curve peaks at 3 h and decreases slightly after that. The SOAI curve peaks at 1.5 h and remains stable over larger spatial scales. The strengthened SIAI and SOAI for central cities indicate that the local relatedness of each is strengthened and that their centrality is enhanced.
The trends of SIAI and SOAI in non-central cities run opposite to those in the central cities. The SIAI and SOAI curves bottom out at 1 h and 1.5 h, respectively, and then begin to rise slowly. These data show that non-central cities have less influence and connection with the HSR in operation. Figure 8 shows that the impact of HSR on the two groups varies with travel time, which reflects the difference in radiation between them. HSR improves the radiation of the central cities within a distance of 3 h or more, while it decreases the non-central cities' radiation at short distances (about 1-1.5 h). This change in radiation shows that the influence of HSR is not balanced: the central cities' influence covers a larger spatial scope than that of the non-central cities.

Channel effect
The channel effect means that the connection between the starting city and the ending city of an HSR line is stronger than their connections with other cities. It can be quantified by the difference between the endpoint relatedness of the HSR and the total relatedness. Two different channels formed on the Beijing-Shenzhen HSR between 2010 and 2013, because the line first began service from Wuhan to Shenzhen and opened the rest of the line later. The first channel, from Wuhan to Shenzhen, opened in 2012. It ceased to be a channel in 2013, when the Beijing-Wuhan section began running and a new channel from Beijing to Shenzhen was formed. Both channel effect indices are examined from 2010 to 2013 as shown in Table 9, where TE represents the channel effect index derived from the perspective of the economy.
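The idea of comparing endpoint relatedness with total relatedness can be illustrated with a small sketch. It does not reproduce the paper's index. It assumes a directed table of interaction counts (mentions of a destination city in posts from an origin city), measures the share of an origin's outgoing interaction that goes to one endpoint, and compares that share with a uniform baseline of 1/(n-1). The function names and the toy counts for cities "A", "B", and "C" are hypothetical.

```python
def directed_share(counts, origin, dest):
    """Share of origin's interaction with other cities that is directed at dest."""
    row = counts[origin]
    total = sum(v for city, v in row.items() if city != origin)
    if total == 0:
        return 0.0
    return row.get(dest, 0) / total


def uniform_baseline(n_cities):
    """Share each other city would receive if interaction were spread evenly."""
    return 1.0 / (n_cities - 1)


# Hypothetical toy counts: counts[origin][dest] = mentions of dest from origin.
counts = {
    "A": {"A": 50, "B": 30, "C": 10},
    "B": {"A": 5, "B": 40, "C": 15},
    "C": {"A": 20, "B": 20, "C": 60},
}

a_to_b = directed_share(counts, "A", "B")   # 30 / (30 + 10)
b_to_a = directed_share(counts, "B", "A")   # 5 / (5 + 15)
baseline = uniform_baseline(len(counts))    # 1 / 2

# A positive excess in one direction and a negative one in the other
# shows how the measure can be asymmetric between the two endpoints.
print(a_to_b - baseline, b_to_a - baseline)
```

Under this assumed definition, a system of 24 cities would have a uniform baseline of 1/23, or about 0.043. Treating each direction separately is what allows the two endpoints of a line to show different channel strengths.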
Channel effect values for the year in which an HSR line began running are shown in bold.
It can be seen from Table 9 that the TE index, which is based on the inter-city economy, has some limitations when evaluating the channel effect: 1) The TE index does not reflect the inter-city channel effect correctly, as the TE indices in bold show a lower-than-expected value, which is 0.0413 in this case. 2) The TE index does not necessarily indicate the impact of the HSR. For example, the TE index does not change significantly when a new HSR begins to run.
The TS index represents the inter-city influence caused by HSR and serves as a comparison point for the TE index. It shows a significant channel effect, as the TS indices are far greater than the expected value of 0.043 in both groups. In the first group, the TS indices in Table 9 decreased in 2010 and 2011 but increased in 2012 after the HSR began running. The index decreased again in 2013, when Wuhan-Shenzhen was no longer a channel because Wuhan was no longer an endpoint of the HSR. In the second group, the TS index increased significantly when the Beijing-Shenzhen channel formed. In addition, the channel effect is not symmetrical, especially in the first group, where the index from Wuhan to Shenzhen (0.319) is far greater than the index from Shenzhen to Wuhan (0.043).

Table 7. Ranked IAI and ranked OAI for the seven central cities, 2010 to 2013.

City       Type  Population  IAI rank                 OAI rank
                             2010  2011  2012  2013   2010  2011  2012  2013
Beijing    FC    12.97       1     1     1     1      24    24    21    16
Shenzhen   FC    2.87        2     2     2     3      23    23    24    24
Guangzhou  FC    8.22        3     4     6     5      19    19    12    14
Wuhan      SC    8.21        4     5     4     6      21    21    18    17
Changsha   SC    6.60        5     3     3     2      22    22    15    21
Zhengzhou  SC    10.72       6     6     5     4      20    17

Discussion
HSR promotes economic growth (   Yang 2017). The accompanying change in inter-city relationships is challenging to quantify precisely because it is multidimensional and spans multiple disciplines. Data recording the interaction between cities provide a new perspective from which to quantify these uneven relationships over time. Because social media data carry both spatial and temporal information, this paper uses them to capture interactions between cities and to develop a methodological framework for evaluating inter-city relatedness and HSR influence. The framework generates a set of new metrics to advance the study of the impact of HSR.
In line with studies of spatial accessibility, HSR construction would lead to spatial cohesion for cities (Li et al. 2018). This study examines fine-grained imbalances from the perspective of central cities and non-central cities. The central cities have high in-awareness and low out-awareness, while non-central cities have lower in-awareness. Interestingly, the results also show that non-central cities benefit more from HSR than central cities. This provides computational support for explaining why policy makers hope to have HSR built through their cities.
The results of this study suggest that HSR increases the spatial imbalance between central cities and non-central cities. Specifically, HSR improves the spatial interaction of central cities and reduces that of non-central cities. The spatial impact of HSR also differs between the two groups: the central city affects inter-city interactions at travel times of more than 3 h, while the non-central city has a greater impact within 2 h of travel. Consequently, policymakers in central and non-central cities should adopt different strategies when formulating policies.
Viewing the HSR line as a corridor, the cities at both ends of the HSR form a natural channel between them. The end cities are usually big cities, and the channel effect between the two end cities of an HSR tends to be more significant. At the same time, the Spatial and Social Equity Railway Indexes (SpREI and SoREI) have been derived to assess the variation in travel times, number of connections, prices, and population affected by HSR (Cavallaro et al. 2020). These two indexes reveal that inter-city connection has decreased due to the significant increase in ticket prices, so the introduction of HSR might generate inequalities in territorial connections. Hence, policy makers are expected to provide compensation. Because of its large spatial coverage and high-volume investment, the impact of HSR construction on the economy, politics, culture, and society is large and long-lasting.
This study is limited by the bias of social media content, because most users are young people. In addition, there may be some noise or missing data in the interaction data because some cities have more than one name, while some cities share the same name. The significantly large value of C ii for Guangzhou in 2010 arises because Guangzhou hosted the 16th Asian Games from 12 November to 27 November 2010. A follow-up study will develop algorithms to handle bursts of mentions of a city caused by special events when calculating urban interaction.

Conclusion
This research develops a methodological framework to evaluate the impacts of regional rail investment, namely HSR in China's mainland, on the urban system. The framework measures inter-city in-awareness and out-awareness, and incorporates spatial decay to define spatial in-awareness and spatial out-awareness. It also proposes a measurement for the channel effect. The relatedness model in this framework can be used to examine inter-city interaction: it not only evaluates asymmetric inter-city interaction, but also evaluates inter-city relatedness more effectively than the traditional methodology. Furthermore, the framework sensitively reflects changes in inter-city relatedness from a new perspective and provides an effective method to evaluate HSR impacts. Finally, the Beijing-Shenzhen HSR, which connects three of China's four first-tier cities, is used as a case study to demonstrate the applicability of the framework.
Follow-up research can incorporate more HSR lines to estimate the overall impact of HSR. In addition, migration data, traditional rail system data, and economic data between cities can be combined with social media data to assess HSR's impact on inter-city traffic.