Exploring the Spatiotemporal Patterns of Passenger Flows in Expanding Urban Metros: A Case Study of Shenzhen

Despite extensive investigations of urban metro passenger flows, how their spatiotemporal patterns evolve as urban metro networks are extended is not well understood. Using Shenzhen as a case study city, we begin by analyzing the evolving network topology of the Shenzhen Metro. Subsequently, leveraging long-term smart card data, we analyze the evolving spatiotemporal patterns of passenger flows and develop an analytical approach to pinpoint the major passenger sources of urban metro congestion. While the passenger travel demand and the passenger flow volumes kept increasing with the extension of the urban metro network, the major passenger sources were very stable in space, highlighting an inherent invariance in the evolution of the urban metro system. Finally, we analyze the impact of population and land use factors on the passenger flow contributions of passenger sources, obtaining useful clues for foreseeing future passenger flow conditions.


Introduction
Characterized by high speed, large capacity, punctuality, and low pollution, the urban metro is widely regarded as a green and efficient transportation mode [1,2] and is favored by commuters in big cities [3]. With the rapid growth of passenger travel demand, however, the volumes of passenger flow often exceed the capacities of urban metro sections [4] (e.g., the train load rates of some sections in the Beijing Subway often exceeded 120% [5]). Excessive passenger congestion could cause prolonged train dwell times and an increased likelihood of stranded passengers. This situation, in turn, might result in crowding on platforms [6], accidents such as passengers being caught in screen doors [7], and declines in operational efficiency and service levels [8]. To avoid excessive passenger congestion, many passenger flow control [9][10][11], route guidance [12][13][14], and train timetable optimization approaches [15][16][17] have been proposed and implemented in practice [9,18].
Understanding the spatiotemporal patterns of urban metro passenger flows can facilitate the development of effective congestion mitigation strategies. Some previous studies focused on the analysis of passenger congestion in stations. Yu et al. estimated the passenger flow volume of each station and identified the congested stations of Nanjing Metro [19]. Lu et al. and Peng et al. studied how passenger congestion is formed in stations [20,21], discovering that critical facilities (e.g., security checks, automatic ticket gates) may become overloaded under large passenger flows, leading to passenger congestion in stations. Moreover, some researchers used video data to monitor the congestion in stations. Zhou et al. analyzed the video data collected at the passages, stairs, and platforms of a station in Ningbo Metro and proposed the Pedestrian Crowd Level (GPC) index to quantify the congestion level [22]. In addition, section congestion, which usually refers to the passenger flow volume of a section exceeding its capacity or the number of passengers on a train exceeding its capacity, was also investigated. Kang et al. located the congested sections and analyzed the impact of land use on section congestion [23]. Haywood and Koning used the survey data and smart card data of the Paris Subway to uncover the relationship between train congestion and travel time [24].
Although urban metro passenger flows have been extensively investigated, previous studies have paid little attention to how passenger congestion is formed. To address this, the concept of passenger source was proposed [25,26], and previous results indicated that passengers using congested sections mostly originate from a few stations (i.e., major passenger sources) [27]. Building on this discovery, some pioneering works improved passenger flow control approaches based on passenger source information, generating more effective and efficient passenger control schemes [13]. However, how the spatiotemporal patterns of passenger sources evolve with the extension of urban metro networks is still not well understood. To fill this research gap, we conduct a comprehensive analysis of the evolving patterns of the topology, the passenger travel demand, and the passenger sources of an expanding urban metro (i.e., Shenzhen Metro). The geographic information data and smart card data used were collected in November 2014, April 2016, August 2016, December 2016, and January 2018, during which Line 11, Line 7, and Line 9 were successively put into operation. This allows us to explore how the spatiotemporal patterns of passenger flows and passenger sources evolve with the changing network topology.
The remaining sections of this paper are organized as follows. Section 2 introduces the data used in the present study. Section 3 details the methods. Section 4 presents the main results. Section 5 discusses the significance of the contributions of this research to current knowledge and practice. Section 6 summarizes the results and findings and discusses the research limitations and future directions.

Study Area
Shenzhen is a big city in the south of China. As a core city of the Guangdong-Hong Kong-Macao Greater Bay Area, Shenzhen plays an important role in China's economic and technological development and has attracted a large number of new residents in recent years. The Shenzhen Metro is the backbone of the city's public transportation network. It has been expanding since its opening in 2004, connecting more residential, commercial, and industrial areas in the city. In this study, we use Shenzhen as the case study area to explore the spatiotemporal patterns of passenger flows and passenger sources in the expanding urban metro (Figure 1).

The Smart Card Data of Shenzhen Metro
The obtained smart card data cover five time periods, in November 2014, April 2016, August 2016, December 2016, and January 2018. In the smart card data, each record is generated when a passenger swipes their smart card to enter or exit an urban metro station. Each record documents the card ID, the station ID, the station name, the line ID, the time, the tapping type, and the gate ID (Table 1). The card ID is anonymized to protect the privacy of passengers. The tapping type '21' indicates that the passenger enters a station, whereas the tapping type '22' indicates that the passenger exits a station. The time is recorded in the format of 'year-month-day-hour-minute-second'. For example, '20180115221151' means that the recorded time is 22:11:51, 15 January 2018 (Table 1).

The preprocessing of the smart card data involves the following steps. First, duplicate records are removed, and records outside the operational period are filtered out. Subsequently, the records of each passenger are sorted chronologically. Each pair of consecutive entry and exit records is identified as a passenger trip, and a total of 55,113,024 passenger trips are identified. For each of the five data collection periods, the average daily passenger flow of each Origin-Destination (OD) pair is calculated.
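The trip-identification step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tuple layout `(card_id, station_id, time_str, tap_type)` and the function name `extract_trips` are assumptions made for illustration, and the operational-period filter is omitted because the operating hours are not stated in the text.

```python
from collections import defaultdict

ENTER, EXIT = "21", "22"  # tapping types as described in the text


def extract_trips(records):
    """Pair consecutive entry/exit taps of each card into trips.

    records: iterable of (card_id, station_id, time_str, tap_type) tuples
             (hypothetical layout), with time_str as 'YYYYMMDDhhmmss'.
    Returns a list of (card_id, origin, destination, t_in, t_out) tuples.
    """
    by_card = defaultdict(list)
    for rec in set(records):  # a set drops exact duplicate records
        by_card[rec[0]].append(rec)

    trips = []
    for card_id, taps in by_card.items():
        # Fixed-width timestamps sort chronologically as plain strings.
        taps.sort(key=lambda r: r[2])
        for prev, cur in zip(taps, taps[1:]):
            # A trip is an entry tap immediately followed by an exit tap.
            if prev[3] == ENTER and cur[3] == EXIT:
                trips.append((card_id, prev[1], cur[1], prev[2], cur[2]))
    return trips
```

Counting the resulting trips by (origin, destination) and dividing by the number of observed days would then give the average daily OD passenger flow.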

The Operational Data of Shenzhen Metro
The basic information of the studied urban metro network is outlined in Table 2. Using the operational data released on the official website of Shenzhen Metro (https://www.szmc.net, accessed on 6 June 2019), we estimate the train load rate of each urban metro section. The capacity of each train on Line 3 is 1440 passengers, whereas the capacity of each train on other lines is 1860 passengers. As shown in Table 3, the train departure interval is progressively reduced to accommodate the growing passenger travel demand.

Methods
This study develops a framework to explore the evolving spatiotemporal patterns of passenger flows and passenger sources in an expanding urban metro. First, the passenger flow volume of each urban metro section is estimated. Second, the passenger sources of congested sections are identified. Third, Gini coefficients are employed to evaluate the heterogeneity of section passenger flows and of the passenger flow contributions of passenger sources. Finally, the evolving spatiotemporal patterns of passenger flows and passenger sources are analyzed. The details of the proposed analytical framework are shown in Figure 2.

Estimating the Section Passenger Flow Volume
We assume that all passengers use the shortest path measured via travel time, which is calculated using the Dijkstra algorithm [28]. For passengers using a single line, the travel time is the sum of the travel time of each section. For passengers using more than one line, the travel time is composed of the section travel time and the transfer time. The passengers' perceived transfer time is longer than the actual transfer time [29]. Based on the method proposed by Wang et al. [30], the perceived transfer time $T'_r$ is estimated by multiplying the transfer time $T_r$ by an amplification factor $\beta$ [31]:

$$T'_r = \beta T_r$$

Here, we infer a passenger's path using the perceived travel time:

$$t(i, j) = \sum_{s=1}^{n} t_s + \sum_{r=1}^{m} \beta T_r$$

where $t(i, j)$ is the perceived travel time from station $i$ to station $j$, $t_s$ is the travel time of section $s$, $n$ is the number of sections, and $m$ is the number of transfers. After inferring the path of each passenger trip, the passenger travel demand is assigned to the urban metro network, and the passenger flow volume of each section is estimated.

A traversal method is used to determine the value of $\beta$. We calculate the root mean square error (RMSE) between the actual average travel time $\bar{t}(i, j)$ and the estimated travel time $t(i, j)$. The parameter $\beta$ is set to the value generating the minimum RMSE.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{(i, j)} \left( \bar{t}(i, j) - t(i, j) \right)^2} \quad (3)$$

In Equation (3), $N$ denotes the number of OD pairs, $\bar{t}(i, j)$ is the average actual travel time from station $i$ to station $j$, and $t(i, j)$ is the estimated travel time from station $i$ to station $j$.
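The path inference and the traversal over $\beta$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the network is modeled with hypothetical (station, line) states so that a transfer becomes an edge weighted by $\beta T_r$, and the dictionary layouts and the function names `perceived_time` and `calibrate_beta` are invented for the example.

```python
import heapq
import math


def perceived_time(sections, transfers, beta, origin, dest):
    """Shortest perceived travel time from origin to dest via Dijkstra.

    sections:  {(line, u, v): minutes}, directed section travel times
    transfers: {(station, line_a, line_b): minutes}, actual transfer times
    Each state is a (station, line) pair; a transfer costs beta * T_r.
    """
    graph = {}
    for (line, u, v), t in sections.items():
        graph.setdefault((u, line), []).append(((v, line), t))
        graph.setdefault((v, line), [])
    for (st, a, b), t in transfers.items():
        graph.setdefault((st, a), []).append(((st, b), beta * t))
        graph.setdefault((st, b), [])

    # Start from every line serving the origin station.
    dist = {node: 0.0 for node in graph if node[0] == origin}
    heap = [(0.0, node) for node in dist]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        if node[0] == dest:
            return d  # first settled state at dest is the shortest
        for nxt, w in graph[node]:
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return math.inf


def calibrate_beta(sections, transfers, observed, candidates):
    """Traverse candidate beta values; return the one with minimum RMSE.

    observed: {(origin, dest): average actual travel time in minutes}
    """
    def rmse(beta):
        errs = [(t_obs - perceived_time(sections, transfers, beta, i, j)) ** 2
                for (i, j), t_obs in observed.items()]
        return math.sqrt(sum(errs) / len(errs))
    return min(candidates, key=rmse)
```

In practice the candidate list would be a fine grid of $\beta$ values; the grid range and step are not given in the text and would have to be chosen by the analyst.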

Identifying the Passenger Sources of Congested Sections
We use the load rate to quantify the level of congestion of an urban metro section. The load rate of an urban metro section $k$ is defined as the ratio of the section passenger flow volume $V_k$ to the section capacity:

$$\text{load rate of section } k = \frac{V_k}{f_k C}$$

where $f_k$ is the number of trains passing section $k$ in a given time window, and $C$ is the capacity of a train. The sections with a load rate exceeding 1.0 are identified as the congested sections. Consequently, passenger sources are defined as the boarding stations of the passengers using congested sections.
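The two definitions above can be sketched in Python as follows. The function names, the dictionary layouts, and the choice to count a passenger once even when the path crosses several congested sections are assumptions made for illustration; the text does not specify how such passengers are counted.

```python
from collections import Counter


def load_rates(volume, trains, capacity):
    """Load rate of each section k: V_k / (f_k * C).

    volume:   {section: passengers in the time window}
    trains:   {section: number of trains in the time window}
    capacity: {section: capacity of one train on that section's line}
    """
    return {k: volume[k] / (trains[k] * capacity[k]) for k in volume}


def source_contributions(trip_paths, congested):
    """Passengers contributed by each boarding station to congested sections.

    trip_paths: iterable of (origin_station, [section, ...], passengers)
    congested:  set of congested sections
    Assumption: a passenger is counted once, however many congested
    sections the path uses.
    """
    contrib = Counter()
    for origin, path, pax in trip_paths:
        if any(sec in congested for sec in path):
            contrib[origin] += pax
    return contrib
```

A typical use would compute `rates = load_rates(...)`, take `congested = {k for k, r in rates.items() if r > 1.0}`, and pass that set to `source_contributions`.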

Analyzing the Heterogeneity of Passenger Flows
The Gini coefficient is widely used to quantify the inequality among the values of a frequency distribution [32]. As shown in Equation (6), we propose a Gini coefficient $GN_1$ to analyze the heterogeneity of section passenger flows. The value of $GN_1$ ranges from 0 to 1. A small $GN_1$ indicates small differences among the section passenger flow volumes, whereas a large $GN_1$ implies that the section passenger flow volumes vary greatly. Hence, $GN_1$ evaluates the degree of unevenness of section passenger flows. Similarly, as shown in Equation (7), we propose another Gini coefficient $GN_2$ to evaluate the heterogeneity of the passenger flow contributions of passenger sources. A small $GN_2$ indicates that the passenger flow contributions of different passenger sources are similar, whereas a large $GN_2$ suggests that some passenger sources contribute much more passenger flow to congested sections.

$$GN_1 = \frac{\sum_{i=1}^{m} \sum_{j=1}^{m} \left| s_i - s_j \right|}{2m \sum_{i=1}^{m} s_i} \quad (6)$$

$$GN_2 = \frac{\sum_{i=1}^{y} \sum_{j=1}^{y} \left| p_i - p_j \right|}{2y \sum_{i=1}^{y} p_i} \quad (7)$$
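As a worked illustration, the sketch below computes a Gini coefficient in Python. It assumes the standard mean-absolute-difference form, in which the sum of all pairwise absolute differences is divided by twice the number of values times their total; the function name `gini` is hypothetical. The same function would serve for both $GN_1$ (section passenger flow volumes) and $GN_2$ (passenger flow contributions of passenger sources).

```python
def gini(values):
    """Gini coefficient: sum_ij |x_i - x_j| / (2 * n * sum_i x_i).

    Returns 0.0 for an empty list or an all-zero list, where the
    ratio is undefined (a convention chosen for this sketch).
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    diff_sum = sum(abs(a - b) for a in values for b in values)
    return diff_sum / (2 * n * total)
```

For example, four sections with equal volumes give a coefficient of 0, while a network in which one of four sections carries all the flow gives 0.75; the double loop is quadratic in the number of values, which is unproblematic for a few hundred sections.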
In Equation (6), $s_i$ and $s_j$ represent the passenger flow volumes of sections $i$ and $j$; $m$ is the total number of sections. In Equation (7), $p_i$ and $p_j$ represent the passenger flow contributions of passenger sources $i$ and $j$ to congested sections; $y$ is the total number of passenger sources.

The Shenzhen Metro expanded in 2016 with Line 11 opening on 28 June, and Lines 7 and 9 opening on 28 October. The obtained geographic information data cover the time periods before and after the opening dates of the new lines, allowing us to analyze the topology changes in the urban metro network. Using the geographic information data, we generate the networks of the Shenzhen Metro during different time periods (Figure 3). In the generated urban metro networks, nodes represent urban metro stations, and links represent urban metro sections. The degree $k_i$ of node $i$ equals the number of links connected to it, and the average degree of the network is $\langle k \rangle = \frac{1}{N} \sum_{i} k_i$, where $N$ is the number of nodes. The path length $l_{ij}$ from node $i$ to node $j$ is defined as the least number of links connecting $i$ and $j$. Consequently, the average path length of the network is $\langle l \rangle = \frac{1}{N(N-1)} \sum_{i \neq j} l_{ij}$.
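The two topology measures can be sketched in Python with a breadth-first search. The sketch assumes an undirected, connected network stored as a hypothetical adjacency dictionary, and it averages over reachable ordered node pairs; neither detail is confirmed by the text.

```python
from collections import deque


def average_degree(adj):
    """Mean number of links per node; adj maps node -> list of neighbours."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_path_length(adj):
    """Mean least number of links over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs
```

For a three-station line A-B-C, both measures equal 4/3: the degrees are 1, 2, and 1, and the six ordered pairs have path lengths summing to 8.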
Figure 4 shows the network topology changes from 2014 to 2018. First, the number of nodes (stations) increases from 118 to 166, which greatly extends the service area of the Shenzhen Metro (Figure 4a). Accordingly, the number of links (sections) increases from 252 to 380, and the total length of the sections increases from 178 km to 286 km (Figure 4b,c). Second, the lines become better connected with each other as the urban metro expands; the number of transfer stations increases from 13 to 28 (Figure 4d), leading to an increase in the average degree $\langle k \rangle$ (Figure 4e). Third, the average path length of the urban metro network decreases, which implies that the transport efficiency is improved from the topological point of view (Figure 4f). In addition, taking the decreasing train departure interval into account (see Table 3), we conclude that the Shenzhen Metro has evolved to cover a larger service area with higher transport efficiency.

The Evolution of Passenger Travel Demand
As expected, the passenger travel demand keeps growing throughout the four observed years with the expansion of the urban metro network (Figure 5a). However, the average trip distance of passengers first increases and then decreases, reaching the highest value in August 2016, when Line 11 had just been put into operation (Figure 5b). The prominent increase in average trip distance from April 2016 to August 2016 could be attributed to the opening of Line 11, which attracted passengers living in the northwestern area of Shenzhen and generated many long-distance trips. Interestingly, the average trip distance decreases noticeably after Line 7 and Line 9 were put into operation. An explanation is that a large part of Line 7 and Line 9 is located in the central urban area, where short trips are more frequent.

The temporal pattern of passenger travel demand is quite stable across the five studied time periods from 2014 to 2018 (Figure 6). The passenger flow volume exhibits two peaks, one in the morning and one in the evening. In addition, the hourly passenger flow volume keeps increasing throughout the four years (Figure 6), which is consistent with the growing pattern of passenger travel demand (Figure 5a). In December 2016, when Line 7 and Line 9 had just been put into operation, the passenger travel demand rose noticeably in the evening period. An explanation is that many stations of Line 7 and Line 9 are located in the central urban area of Shenzhen, where more trips are generated in the evening by passengers who work late or engage in recreational activities.

The OD passenger travel demand is defined as the daily average number of passengers taking trains from one station (origin) to another station (destination). As shown in Figure 7, the OD passenger travel demand can be well approximated using a power-law distribution in each of the five studied time periods, with minor variations observed in the power-law coefficients. While the majority of OD pairs have relatively low passenger flow volumes, a few specific OD pairs exhibit significantly high passenger travel demand. For instance, in November 2014, the OD passenger travel demand from Ping'zhou Station to Shenzhen University Station was 6564 passengers, whereas the average OD passenger travel demand was merely 121. This observation underscores the heterogeneous spatial distribution of passenger travel demand, demonstrating a universal pattern irrespective of the network scale and the operation plan.

The Evolution of Section Passenger Flow
We use a morning peak hour (7:30-8:30 a.m.) as a case study time window. Using the passenger flow assignment method proposed by Si et al. [29] (see Section 3.2), we estimate the passenger flow volume of each urban metro section. While the passenger flow volumes of the majority of sections are less than 20,000 passengers/h, the passenger flow volumes of a few sections exceed 40,000 passengers/h (Figure 8a). The volumes of section passenger flow can be well approximated using a truncated power-law distribution $P(f)$ with $\beta = 0.16$, $f_0 = 1801143.35$, and the exponential cutoff $f_{cut} = 13110.15$. The coefficient of determination [33] ($R^2$) between the data and the truncated power-law function is 0.89, showing a good fitting performance. As the section passenger flow exceeds the exponential cutoff, $P(f)$ exhibits a rapid decline. The heterogeneity of the section passenger flow is fundamentally driven by the heterogeneously distributed passenger travel demand, which naturally generates more passenger flow in some sections and less in others.

The Gini coefficients of the section passenger flows remain at a notably high level throughout the four years of observation (Figure 8b). The Gini coefficient also varies over time, predominantly influenced by the evolving network topology and passenger travel demand. A slight decrease in the Gini coefficient occurs from November 2014 to April 2016, indicating a relatively more even spatial distribution of section passenger flows during this interval. However, with the opening of Line 11, the Gini coefficient increases from April 2016 to August 2016. This rise can be attributed to the typically lower initial passenger flow volumes of a new line [34], which accentuate the heterogeneity of the section passenger flows. A similar increase in the Gini coefficient is observed from August 2016 to December 2016, coinciding with the opening of Line 7 and Line 9. Finally, the Gini coefficient decreases from December 2016 to January 2018, which aligns with our expectations. This decrease corresponds to the gradual increase in section passenger flows on the three new lines, contributing to a more balanced distribution of section passenger flows within the urban metro network.

Despite the expanded network size and the enhanced section capacity, the Shenzhen Metro becomes more congested. This is manifested in the increasing number of congested sections from 2014 to 2018 (Figure 9a). Fuller use of urban metro infrastructure can increase the profit of the operating company [35,36] and alleviate traffic congestion on roads [37][38][39]. However, we must note that the increasing passenger congestion degrades the level of service of the urban metro. As shown in Figure 9b, the number of passengers using congested sections keeps increasing from 2014 to 2018.
To analyze the evolving spatial distribution of section passenger flow, we analyze the change in the passenger flow volume of each section $i$ between two consecutive time periods, from $f_i(t-1)$ to $f_i(t)$, where $f_i(t)$ is the passenger flow volume of section $i$ in time period $t$ (e.g., April 2016) and $f_i(t-1)$ is the passenger flow volume of section $i$ in time period $t-1$ (e.g., November 2014). For each pair of consecutive time periods $t-1$ and $t$, we identify the congested sections in time period $t-1$, and then analyze their passenger flow condition in time period $t$.
First, with the growth of passenger travel demand, the congested sections become more congested from November 2014 to April 2016 (Figure 10a). Subsequently, with the opening of Line 11, the passenger flow volumes of some sections of Line 1 decrease in August 2016, implying that Line 11 can share some of the transportation load imposed on Line 1 (Figure 10b). Yet, the other congested sections become more congested, which is probably caused by the new passengers induced by Line 11 (Figure 10b). With the opening of Line 7 and Line 9, three congested sections on Line 4 become less congested, which is also a sign of passenger flow diversion (Figure 10c). Finally, as expected, the congested sections become more congested with the growing passenger travel demand from December 2016 to January 2018 (Figure 10d). In summary, the opening of new lines may temporarily alleviate the congestion of some sections, but it eventually attracts more regular passengers to the urban metro.

The Evolution of Passenger Sources of Congested Sections
Passenger congestion may cause various problems such as crowding risks, a low service level, and low operation efficiency [6,40]. This calls for an in-depth exploration of how passenger congestion is formed. To achieve this, we analyze the passenger flow volume $g_i$ contributed by each passenger source $i$ to the congested sections. As shown in Figure 11a, the passenger flow contributions of passenger sources ($G$) follow truncated power-law distributions across the five studied time periods, with the fitted parameters shown in Table 4. When the ranks of passenger sources exceed the exponential cutoff $g_{cut}$ (ranging from 11 to 14), the passenger flow contributions of the passenger sources exhibit a sharp decline. This indicates that most passenger sources contribute limited passenger flows to congested sections, while a few passenger sources are the key passenger sources causing urban metro congestion.
The passenger flow contributions of passenger sources are heterogeneously distributed, as reflected in the high Gini coefficients throughout the four observational years (Figure 11b). The Gini coefficient increases from November 2014 to December 2016, during which three new lines were put into operation. One explanation is that passengers living in the vicinity of the major passenger sources may be attracted to visit the regions covered by the new stations. Consequently, more passenger flows are generated at the major passenger sources, and the heterogeneity of passenger flow contributions to congested sections is enhanced. This, however, inevitably increases the management pressure at the major passenger sources. On the other hand, the Gini coefficient decreases from December 2016 to January 2018, during which no new lines were put into operation. One explanation is that the passenger travel demand of the new lines gradually increases during this period, the section passenger flows become more evenly distributed, and the heterogeneity of the passenger flow contributions of passenger sources is reduced.
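To make the heterogeneity measure concrete, the sketch below computes a Gini coefficient from a list of per-station contributions. It assumes the standard rank-weighted formula on ascending-sorted values; the definition used in the paper is not restated in this section, and the function name `gini` is illustrative.

```python
import numpy as np

def gini(values):
    """Gini coefficient of non-negative values.

    Returns 0 for a perfectly even distribution and approaches 1 when a
    single element holds almost everything.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula on ascending-sorted data.
    index = np.arange(1, n + 1)
    return float(2.0 * np.sum(index * x) / (n * total) - (n + 1.0) / n)

even = gini([1, 1, 1, 1])          # every station contributes equally
concentrated = gini([0, 0, 0, 1])  # one station contributes everything
```

For n stations, the fully concentrated case evaluates to (n - 1) / n, so values from networks with different station counts are only approximately comparable.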
The passenger sources are sorted in descending order according to their passenger flow contributions to the congested sections. The top-ranked stations that cumulatively contribute 50% of the passenger flows of congested sections are defined as the major passenger sources of the urban metro congestion. Figure 12 shows the spatial distribution of the identified major passenger sources of Shenzhen Metro. During 7:30-9:30 a.m., the major passenger sources are mainly distributed in the northern parts of Line 1, Line 3, and Line 4 (Figure 12a,b). An explanation is that many residential districts are located in these regions, from which a large number of passengers travel to the central urban area during morning rush hours. These passengers contribute the majority of the passenger flows of the congested sections. During 5:30-7:30 p.m., the major passenger sources are mainly distributed in the central urban area of Shenzhen (Figure 12c,d), from which many passengers return home during evening rush hours.
Next, we analyze the impact of population and land use factors on the passenger flow contributions of passenger sources. Here, we define the service area of an urban metro station as the 1 km circular region around the station. Using the fine-grained population data provided by the WorldPop Project (https://www.worldpop.org, accessed on 12 May 2024), we estimate the population within the service area of each station, a factor that potentially influences the volumes of passenger flow. In addition, leveraging the Point of Interest (POI) data provided by Gaode Map (https://lbs.amap.com, accessed on 15 January 2018), we estimate the numbers of different types of POIs within the service area of each station to evaluate the land use properties (Table 5). The multiple linear regression (MLR) approach [41] is used to analyze the correlations between the passenger flow contributions of passenger sources and the land use and population factors. Based on the F-statistics and the associated p-values (all smaller than 0.05), the generated MLR models are significant. As indicated in Table 5, the population and land use factors have different impacts on the passenger flow contributions of passenger sources during the morning (7:30-9:30 a.m.) and evening (5:30-7:30 p.m.) peak hours. In the morning, the number of residences has a positive impact, while the number of companies and the number of public facilities have negative impacts on the volumes of passenger flow. Conversely, in the evening, the number of companies and the number of public facilities have positive impacts, while the number of domestic services has a negative impact on the volumes of passenger flow. This result highlights that the abundance of a certain type of POI around a station may enhance the station's contribution to urban metro congestion in a specific time window. Moreover, this finding provides a guideline for anticipating the passenger flow conditions of stations to be opened in the future, which could facilitate more reasonable planning and more efficient management of expanding urban metros.
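The regression step can be illustrated with a small ordinary least-squares sketch. Everything in it is assumed for illustration: the three feature columns (population, residences, companies), the synthetic coefficients, and the helper name `fit_mlr` are hypothetical, and the significance tests reported in the paper (F-statistic, p-values) are omitted.

```python
import numpy as np

def fit_mlr(features, target):
    """Ordinary least squares with an intercept.

    Returns (beta, r_squared), where beta[0] is the intercept and
    beta[1:] are the coefficients of the feature columns.
    """
    X = np.column_stack([np.ones(len(features)),
                         np.asarray(features, dtype=float)])
    y = np.asarray(target, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot

# Synthetic service-area data for 60 stations (hypothetical values).
rng = np.random.default_rng(0)
features = rng.uniform(0, 100, size=(60, 3))  # population, residences, companies
true_beta = np.array([5.0, 0.02, 0.5, -0.3])  # intercept + three slopes
target = true_beta[0] + features @ true_beta[1:] + rng.normal(0, 0.1, size=60)

beta, r_squared = fit_mlr(features, target)
```

In this synthetic morning-peak setup, the positive slope on residences and the negative slope on companies mirror the signs reported in the text; the magnitudes carry no meaning.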

Discussion
The first prominent feature of the present study is its analysis of long-term smart card data. A number of studies have analyzed passenger flows in urban metros; however, most of them were based on short-term smart card data. Within such short observation periods, the structure of the studied urban metro network does not change, and the passenger travel demand is quite stable. Hence, most of those studies analyzed only an isolated slice of the passenger flow status during a certain time period. The dynamic process by which passenger flows evolve with the extension of the urban metro network was not sufficiently investigated. In this study, we try to fill this research gap by employing long-term smart card data spanning four years, uncovering the evolving spatiotemporal patterns of passenger flows.
Indeed, analyzing the long-term smart card data yielded several new findings. We discovered a counterintuitive phenomenon: building new lines does not alleviate urban metro congestion. This phenomenon could be explained by the growing attractiveness of the expanding urban metro to passengers. In addition, we discovered an inherent invariance in the evolution of the urban metro system. Although the topology and structure of the urban metro network, the passenger travel demand, and the number of congested sections kept changing across the four observational years, the spatial distribution of the major passenger sources, which is mainly determined by the land use pattern, was quite stable.
The second prominent feature of this study is the proposed passenger source analytical approach, which is applied to uncover the inherent laws driving the formation of urban metro congestion. This is an improvement on previous research, which mostly focused on passenger flow analysis. Given that the identified major passenger sources are the key contributors to urban metro congestion, the passenger source information could facilitate the improvement of urban metro operation strategies (e.g., short-turning plans, passenger flow control). To better show the important role of major passenger sources in alleviating urban metro congestion, we designed a simple proof-of-concept simulation experiment.
In the designed simulation experiment, the passenger travel demand is reduced to alleviate urban metro congestion, and two strategies are tested. In strategy S1, a certain number of passengers from the major passenger sources are randomly selected and controlled to not use the urban metro. In strategy S2, a certain number of passengers from all urban metro stations are randomly selected and controlled to not use the urban metro. Taking the congested section "Bao'an Center Station-Xin'an Station" as an example, we estimate the number of controlled passengers required to decrease its passenger flow by a fraction f (ranging from 4% to 20%). As shown in Table 6, far fewer passengers need to be controlled under strategy S1 than under strategy S2. Controlling passengers from major passenger sources can therefore alleviate the congestion more effectively. Given that the operator cannot forbid passengers from using the urban metro, the strategies of this simulation experiment cannot be applied directly in practice. However, the simulation result highlights the importance of pinpointing major passenger sources. In practice, more sophisticated operation strategies can be developed based on the passenger source information to improve operation efficiency and reduce management cost.
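The comparison between the two strategies can be approximated with an expected-value calculation. The sketch below is not the simulation used in the paper: it assumes that passengers are held back uniformly at random at the selected stations, so that a station's contribution to the congested section shrinks in proportion to the share of its passengers held back. The station demands, contributions, and function names are hypothetical.

```python
import numpy as np

def major_sources(contrib, share=0.5):
    """Indices of the top-ranked stations whose cumulative contribution
    to the congested section first reaches `share` of the total."""
    contrib = np.asarray(contrib, dtype=float)
    order = np.argsort(contrib)[::-1]              # descending by contribution
    cumulative = np.cumsum(contrib[order]) / contrib.sum()
    k = int(np.searchsorted(cumulative, share)) + 1
    return order[:k]

def controlled_passengers(demand, contrib, f, stations):
    """Expected number of passengers to hold back at `stations` so that
    the section flow drops by fraction `f` (uniform random selection)."""
    demand = np.asarray(demand, dtype=float)
    contrib = np.asarray(contrib, dtype=float)
    target = f * contrib.sum()                     # flow reduction required
    reachable = contrib[stations].sum()            # flow these stations supply
    if target > reachable:
        raise ValueError("selected stations cannot supply the reduction")
    return float(target / reachable * demand[stations].sum())

# Hypothetical data: boardings per station and their flow on one section.
demand = np.array([1000.0, 800.0, 5000.0, 6000.0])
contrib = np.array([400.0, 300.0, 200.0, 100.0])

major = major_sources(contrib)                     # stations 0 and 1
s1 = controlled_passengers(demand, contrib, 0.10, major)
s2 = controlled_passengers(demand, contrib, 0.10, np.arange(demand.size))
```

In this toy setting, strategy S1 needs far fewer controlled passengers than S2 because the major sources send a much larger share of their passengers through the congested section.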

Conclusions
Exploring the spatiotemporal patterns of passenger flows and passenger sources in expanding urban metros is essential for the sustainable development of urban transportation. With the opening of new lines, the Shenzhen Metro has evolved to cover more service areas and achieve a higher transportation efficiency. However, the passenger travel demand grew faster, and the urban metro became more congested over the four observational years. Although the opening of new lines may temporarily alleviate the congestion of some urban metro sections, it eventually induces more passenger travel demand. To understand how this congestion forms, we propose a passenger source analytical approach to explore the inherent laws driving the formation of urban metro congestion. Interestingly, only a few key passenger sources account for a large share of urban metro congestion, and the simulation experiment suggests that developing operation strategies targeting these key passenger sources may achieve a better congestion mitigation effect. Finally, we explore the impact of land use and population factors on the passenger flow contributions of passenger sources. The results can help in anticipating the passenger flow conditions of stations to be opened in the future and facilitate more reasonable planning and more efficient management of expanding urban metros.

Figure 1. The studied area and the Shenzhen Metro network.

Figure 2. Analytical framework of this study.

Figure 3. The evolving networks of Shenzhen Metro. (a) The Shenzhen Metro network in November 2014 and April 2016. (b) The Shenzhen Metro network in August 2016. (c) The Shenzhen Metro network in December 2016 and January 2018. (d) Enlarged view of Line 7 and Line 9.

Figure 4. The network topology changes in the expanding urban metro. (a) The number of stations. (b) The number of sections. (c) The total length of the sections. (d) The number of transfer stations. (e) The average degree of the network. (f) The average path length of the network.

Figure 5. The evolution of passenger travel demand. (a) The passenger travel demand. (b) The average trip distance.

Figure 8. The evolution of section passenger flow. (a) Distribution of section passenger flow. (b) The Gini coefficient of section passenger flow.

Figure 9. The number of congested sections (a) and the number of passengers using congested sections (b) throughout the five studied time periods.

Figure 10. The changes in passenger flow volumes of the congested sections during a morning peak hour (7:30-8:30 a.m.). (a) Comparison between November 2014 and April 2016. (b) Comparison between April 2016 and August 2016. (c) Comparison between August 2016 and December 2016. (d) Comparison between December 2016 and January 2018.

Figure 11. The evolution of passenger flow contributions of passenger sources. (a) Passenger flow contributions of passenger sources across the five studied time periods. The scattered points of different colors represent the actual data, and the corresponding colored lines represent the fitted functions. (b) The Gini coefficient of passenger flow contributions.

Figure 12. The major passenger sources of Shenzhen Metro in January 2018.

Table 1. Information of the used smart card data.

Table 2. Basic information of Shenzhen Metro.

Table 3. Peak hour train departure intervals (in minutes) during the five studied time periods.

Table 4. Fitted parameter values across the five studied time periods in Equation (10).

Table 5. The coefficients of the multiple linear regression models.

Table 6. Comparison of strategy S1 and strategy S2.