Data-Driven Method to Estimate the Maximum Likelihood Space–Time Trajectory in an Urban Rail Transit System

Abstract: The Urban Rail Transit (URT) passenger travel space–time trajectory reflects a passenger's path choice and the components of URT network passenger flow. This paper proposes a model to estimate a passenger's maximum-likelihood space–time trajectory using Automatic Fare Collection (AFC) transaction data, which contain the passenger's entry and exit information. First, a method is presented to construct a space–time trajectory under the tap-in/tap-out constraint. Then, a maximum likelihood space–time trajectory estimation model is developed to achieve two goals: (1) to minimize the variance in a passenger's walk time, including the access walk time, egress walk time and transfer walk time when a transfer is included; and (2) to minimize the variance between a passenger's actual walk time and the expected value obtained by manual survey observation. Considering the computational efficiency and the characteristics of the model, we decompose the passenger's travel links and convert the maximum likelihood space–time trajectory estimation problem into a single-quadratic programming problem. Real-world AFC transaction data and train timetable data from the Beijing URT network are used to test the proposed model and algorithm. The estimation results are consistent with the clearing results obtained from the authorities, and this finding verifies the feasibility of our approach.


Introduction
As an important part of urban public transport, Urban Rail Transit (URT) serves a growing number of citizens, and its network scale is growing rapidly. With the rapid development of URT, networks have expanded from a single line into multiple lines, forming a complex network. The Beijing Subway has developed from a simple network of four subway lines into a complex network of twenty-one subway lines in the past 10 years. On the one hand, passengers have more route choices in a complex network. There is typically more than one feasible spatial route between the Origin and Destination (OD) for passengers to travel. For example, approximately 73.9 percent of OD pairs had two or more spatial routes within the Shanghai subway in July 2015. Consequently, there is an urgent need to study passenger network flow within a complex URT network. On the other hand, complex subway networks are typically operated by several companies, especially in China. Because all fare payments are collected through a common Automatic Fare Collection (AFC) system, high accuracy is required to allocate revenues according to ridership shares.
[...] a passenger is left behind at the individual level can be estimated. The output of the model is the probability that a passenger boards each feasible train. At the aggregate level, the degree of train load and station crowding are estimated based on the assignment results with satisfactory accuracy regardless of transfers. That paper focused mainly on journeys without transfers. Even though that paper stated that the problem with transfers could be formulated in a similar way in principle, the diversity of spatial routes due to the complexity of the URT network was ignored.
In [27], a methodology was proposed to estimate the most likely space-time path by mining the AFC data. The model took the left-behind phenomenon into consideration (in which passengers are prevented from boarding the first arriving train due to the crowd) and incorporated a time-expanded network to formulate the passengers' space-time path. Assuming the independence of passengers, the most likely space-time path estimation model was developed. At the macroscale level, the network passenger flow distribution result estimated with the proposed method is consistent with the actual data. At the microscale level, all passengers' detailed space-time paths are estimated. However, the accuracy of the estimation results relies heavily on the probability that passengers can board a specific train vehicle. Table 1 provides a systematic comparison of the key modeling components in the existing network flow assignment research.
Table 1 (recoverable row only): [19]; Minimizing the deviation between path and conductor checks; BFA; √ √ √.
Building on the studies compared in Table 1 and on our earlier work, a data-driven method to assign the maximum likelihood space-time trajectory is proposed, and individual differences are taken into consideration. In contrast to the traditional methods, the proposed method outputs all the passengers' detailed travel information, which indicates the passengers' movements among activity locations with respect to time. Moreover, an estimation model using the station walk time parameters, which can be obtained easily and accurately, is developed to reduce the dependence on data and improve the reliability and accuracy of the estimation results.
This paper decomposes the journey within the URT network into different types of activities, including time spent on the access walk, on the platform, on the train, and on the egress walk. Transfer walk and additional platform waiting are also included when a transfer occurs. Assuming that all trains run punctually according to their schedules, the passengers' space-time trajectories are constructed based on the train timetable data and URT network topology. Since the walking speed of a passenger generally fluctuates within a small range, one of our aims is to minimize the variance among the passenger's walk activities. Our other aim is to minimize the variance between the passenger's actual walk time and the expected value, which is obtained by a manual survey. Taking this individual difference into account, the maximum likelihood space-time trajectory estimation model is proposed. The proposed estimation model presents a method to calculate the weight of each space-time trajectory. Because a train departs a station at a fixed time according to a schedule, the passenger's arrival time and departure time can be calculated once his/her space-time trajectory is assigned. Thus, the total time a passenger spends at a station can be calculated. Then, a quadratic problem is formulated to solve the estimation problem. If the activities of the passengers at different stations are independent, the quadratic problem can be converted to a set of single-quadratic problems to improve computational efficiency.
The main contributions of this paper are as follows:

1. A data-driven methodology to estimate a passenger's detailed travel information is developed. Passengers' detailed trajectories can be used for further study, such as analysis of path-choice behavior.

2. A method to estimate walk parameters of subway stations using AFC data and train schedules is developed.

3. At the aggregate level, we develop outputs of various kinds of statistical reports for operators, such as time-independent network passenger flow distribution and time-independent congestion of train vehicles and stations.
This paper is organized as follows. Section 2 analyzes the passenger travel components and introduces the main idea of estimating the maximum likelihood space-time trajectory based on AFC data. Section 3 describes the model, followed by an introduction of the solution algorithm in Section 4. In Section 5, numerical experiments on a real-world network are presented. The final section presents the paper's conclusions along with a summary of remarks and future research steps.

Conceptual Illustrations
This section first analyzes the passenger's travel trajectory components. Then, we illustrate how to use a time-expanded network to represent a passenger's space-time trajectory.

Passenger Travel Trajectory Components
A URT system is a closed system that includes a pay zone and a free zone. A passenger enters the pay zone once he/she passes an entry gate and leaves when passing an exit gate by swiping his/her smart card. The AFC system records a passenger's entry and exit information and produces complete transaction data. Figure 1a illustrates a simple urban rail network topology, which is made up of three railway lines, and Figure 1b presents a complete journey from station A to station E by path A→C→E, which contains one transfer.
As shown in Figure 1b, in general, the travel time comprises the access walk time, Platform Waiting Time (PWT), On-Train Time (OTT), egress walk time and transfer walk time (if a transfer is included). The mean values of the access walk time and egress walk time are typically measured by a manual survey or a simulation model, depending on the station volume. The access walk time is defined as the elapsed time after entry and before arriving at the midpoint of the platform(s), as represented by arrow 1 in Figure 1b. Similarly, the egress walk time, represented by arrow 5 in Figure 1b, is defined as the time spent walking from the midpoint of the platform to the exit gate(s), assuming that the egress walk starts immediately after the train arrives.
The OTT is determined by the departure time and arrival time. According to the punctual assumption, the arrival time and departure time at all stations are determined and fixed. Therefore, the OTT is constant once a space-time path is assigned for a passenger.
The transfer walk time is defined as the elapsed time after the end of the previous on-train travel and before the beginning of the next on-train travel, assuming that the transfer walk starts immediately after the train arrives. This time is represented by arrow 3 in Figure 1b.
The PWT is defined as the elapsed time after the end of access and before the beginning of on-train travel, i.e., the PWT is the time from the passenger's arrival at the midpoint of the platform to the moment the boarded train starts to move.
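The decomposition above can be illustrated with a minimal Python sketch for a journey without transfers. It assumes punctual trains and that the egress walk starts as soon as the train arrives, as stated in the text; the function names and the sample times are hypothetical and are not taken from the paper's data. Only the sum of access walk time and PWT is observable from the tap-in time and the train departure, which is why the sketch reports them together.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def decompose_journey(entry: str, train_departure: str,
                      train_arrival: str, exit_time: str) -> dict:
    """Split a no-transfer journey into its observable time components.

    Hypothetical helper: the split between access walk and platform wait
    is NOT identified here, only their sum (departure minus tap-in).
    """
    t_in, t_dep, t_arr, t_out = (to_seconds(t) for t in
                                 (entry, train_departure, train_arrival, exit_time))
    if not t_in <= t_dep <= t_arr <= t_out:
        raise ValueError("times must be ordered: entry, departure, arrival, exit")
    return {
        "access_plus_wait": t_dep - t_in,   # access walk time + PWT
        "on_train": t_arr - t_dep,          # OTT, fixed by the timetable
        "egress": t_out - t_arr,            # egress walk time
    }


parts = decompose_journey("10:00:00", "10:04:30", "10:20:00", "10:22:10")
print(parts)
```

The three components always add up to the tap-out time minus the tap-in time, which is the total travel time recorded by the AFC system.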

Passenger's Space-Time Trajectory
According to the previous section, travel time contains four parts. For journeys that require one (or more) transfer(s), the transfer walk time and additional PWT are also included. Thus, the space-time trajectory comprises five types of links: the access link, transfer link, egress link, platform wait link and train link. The first three are walk links. Walk links are connected through train links, so no two walk links are adjacent. Here, we take the OD from station A to station E as an example.
As shown in Figure 1a, there are two feasible spatial routes for the passenger to choose. The first route is A→C→E, which requires one transfer. The other one is A→B→D→E. During off-peak hours, a passenger can board the first available train after his/her arrival at the platform. Figure 2a,c displays the space-time paths of the different routes that the passenger may choose for his/her travel.
In constructing the time-expanded network, as shown in Figure 2, a transfer station is replaced by two or more stations, depending on the number of railway lines passing through the station. For example, Station C is a transfer station served by Line 1 and Line 3. Therefore, Station C is replaced by Station C′ and Station C″ in the time-expanded network.
As shown in Figure 2, there are usually two or more feasible space-time trajectories to assign for given entry and exit information. Figure 2a,b indicates that the passenger traveled via spatial path A→C→E. However, the passenger spends much more time walking from the entry gate to the platform in Figure 2b than in Figure 2a. Figure 2c indicates that the passenger traveled by a different spatial route. In a complex URT network, there are generally two or more feasible spatial routes for passengers to choose for their travels. Even though the entry and exit information can be recorded accurately by the AFC system, it is hard to tell the specific space-time trajectory from AFC transaction data alone. This paper aims to develop a methodology to estimate the maximum likelihood space-time trajectory with the given entry and exit information recorded by the AFC system.
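The replacement of a transfer station by one copy per line can be sketched as follows. This is a minimal illustration under assumed data structures: the dictionary layout, the `expand_transfer_stations` name and the `name@line` identifier scheme are hypothetical, and only the A/C example with Line 1 and Line 3 comes from the text.

```python
def expand_transfer_stations(lines_by_station: dict) -> list:
    """Create one station copy per line serving each physical station.

    Copies of the same transfer station keep the same 'name', so they can
    later be recognised as belonging to one physical station.
    """
    expanded = []
    for name, lines in lines_by_station.items():
        for line in sorted(lines):
            expanded.append({"id": f"{name}@{line}", "name": name, "line": line})
    return expanded


# Station A is served by one line; Station C is a transfer station
# served by Line 1 and Line 3, so it is split into two copies.
stations = expand_transfer_stations({"A": ["Line 1"], "C": ["Line 1", "Line 3"]})
for station in stations:
    print(station)
```

Keeping the shared name on every copy is what later allows a transfer walk link to be drawn between the platforms of the same physical station.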

Maximum Likelihood Space-Time Trajectory Estimation Models
We now describe the formal problem statement for the passenger's space-time trajectory model to minimize the variance among the passenger's walking speeds and mean values. First, the variables used in the mathematical formulations are defined. Those variables describe the construction of the time-expanded network and passenger's space-time trajectory. Then, the method of constructing the time-expanded network and the maximum likelihood space-time trajectory estimation model are proposed. Tables 2 and 3 give the related notations, input parameters and decision variables of the corresponding problem.

Notations
Table 2. Notations and input parameters.
Symbol: Definition
$(i, j, t_1, t_2)$: Index of space-time link indicating the actual movement at the entering time $t_1$ and leaving time $t_2$ on the spatial link $(i, j)$, $(i, j, t_1, t_2) \in E$
$s_{name}$: Name of station $s$, $s \in S$
$i_{station}$: Index of the station that spatial node $i$ belongs to, $i \in N$
$o_p, d_p$: Index of the origin node and destination node of passenger $p$, $p \in P$, $o_p, d_p \in N$
$c^{t_1,t_2}_{i,j,p}$: Time cost for passenger $p$ on link $(i, j, t_1, t_2)$
(symbol missing): Interval time of train departure at station $s$ at time $t$, $s \in S$, $t \in T$
$tt_{s,p}$: Total time consumption of passenger $p$ at station $s$, $s \in S$, $p \in P$

Table 3. Decision variables used in the mathematical formulation.
Variable: Definition
$t_{i,p}$: Time at which passenger $p$ arrives at node $i$, $p \in P$, $i \in N$
$z^{t_1,t_2}_{i,j,p}$: 0-1 binary variable: 1 if passenger $p$ passes spatial link $(i, j)$ from time stamp $t_1$ at node $i$ to $t_2$ at node $j$; 0 otherwise

Maximum Likelihood Space-Time Trajectory Estimation Model
Problem statement. Given the AFC transaction data and train timetable data, the passenger space-time trajectory estimation problem aims to assign the most likely space-time paths to the passengers.
Space-time flow balance constraints. To depict a time-dependent tour in the space-time network, we formulate a set of flow balance constraints for each passenger and each node. The first term in this constraint represents the number of times passenger p leaves the node, and the second term represents the number of times passenger p arrives at the node. The flow constraint states that the number of departures must equal the number of arrivals unless the node is an origin node or a destination node. If the node is the origin node of passenger p, the number of departures minus the number of arrivals must equal 1. If the node is the destination node of passenger p, the number of arrivals minus the number of departures must equal 1.
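One way to write the flow balance constraint described above, using the notation of Tables 2 and 3, is given below. This is a reconstruction from the verbal description and should be read as a sketch; the exact indexing of the summations and the treatment of the origin and destination time stamps in the original formulation are assumptions.

```latex
\sum_{(j,t_2):\,(i,j,t_1,t_2)\in E} z^{t_1,t_2}_{i,j,p}
\;-\;
\sum_{(j,t_2):\,(j,i,t_2,t_1)\in E} z^{t_2,t_1}_{j,i,p}
=
\begin{cases}
 1, & i = o_p,\\
-1, & i = d_p,\\
 0, & \text{otherwise},
\end{cases}
\qquad \forall\, p \in P,\ \forall\, (i,t_1) \in V .
```

The first sum counts the space-time links on which passenger $p$ leaves node $i$ at time $t_1$, and the second sum counts the links on which the passenger arrives at node $i$ at time $t_1$, matching the two terms discussed in the text.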
Activity time cost constraints. In the real world, the time spent on any activity is not less than zero. Therefore, $c^{t_1,t_2}_{i,j,p} \ge 0$ for every passenger $p \in P$ and every space-time link $(i, j, t_1, t_2) \in E$.
Total time consumption constraints. Given a passenger's AFC transaction data, the times at which the passenger passed the entry gate and the exit gate are known. The total travel time is then the difference between the exit time and the entry time.
Platform wait time constraints. Assuming that all passengers can board the first arriving train after they reach the platform, the PWT should be less than the interval between the departure time of the last train that leaves before the passenger's arrival at the platform and the departure time of the first train that leaves after it.
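Read the other way round, the platform wait time constraint bounds the passenger's platform arrival time once the boarded train is fixed: the passenger must have reached the platform after the previous departure and no later than the boarded train's departure. The sketch below illustrates this with the standard-library `bisect` module; the function name and the sample departure times (in seconds) are hypothetical.

```python
import bisect


def platform_arrival_window(departures, boarded_departure):
    """Return (previous_departure, boarded_departure) for one platform.

    Under the first-train assumption, the platform arrival time lies
    strictly after the previous departure (None if there is none) and
    no later than the departure of the boarded train.
    """
    ordered = sorted(departures)
    k = bisect.bisect_left(ordered, boarded_departure)
    if k == len(ordered) or ordered[k] != boarded_departure:
        raise ValueError("boarded_departure is not in the timetable")
    previous = ordered[k - 1] if k > 0 else None
    return previous, boarded_departure


# Departures every 300 s; the passenger boards the train leaving at t = 900 s.
print(platform_arrival_window([600, 900, 1200], 900))
```

With this window, the PWT (boarded departure minus platform arrival) is automatically non-negative and smaller than the headway between the two trains.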
Objective function. This paper aims to estimate the maximum likelihood space-time trajectory for all passengers based on two aims. Since the walking speed of a passenger generally fluctuates within a small range, one of our aims is to minimize the variance among the passenger's walk activities. The other aim is to minimize the variance between the passenger's actual walking time and its expected value. The objective function is given as follows.
In formulation (5), $w^{t_1,t_2}_{i,j}$ is an attribute of the space-time link $(i, j, t_1, t_2)$ and is a 0-1 binary variable. If the space-time link $(i, j, t_1, t_2)$ is a walk link, $w^{t_1,t_2}_{i,j}$ equals 1. Otherwise, $w^{t_1,t_2}_{i,j}$ equals 0. Because both aims relate to the passenger's walk activities, only the set of space-time walk links is searched when calculating the utility value of a feasible space-time path, which reduces the feasible region and improves computational efficiency. In general, the spatial location of the end node of a space-time walk link should be either a platform or an exit gate. Thus, the objective function can be converted into formulation (6).

Solution Algorithms
This section presents an algorithm to estimate the maximum likelihood space-time trajectory based on AFC transaction data and the time-expanded network. A space-time trajectory is fully determined by $z^{t_1,t_2}_{i,j,p}$ and $t_{j,p}$, where $j \in N_W$. The variable $z^{t_1,t_2}_{i,j,p}$ determines the spatial route and the train(s) the passenger boards, whereas the passenger's walk time and PWT are determined by $t_{j,p}$, where $j \in N_W$. Thus, the passenger maximum likelihood space-time trajectory estimation problem can be converted into a time-expanded network trajectory generation and assignment problem.

Feasible Trajectory Generation Algorithm
A space-time trajectory indicates a passenger's movements among activity locations with respect to time. Such a trajectory contains five types of links: the access walk link, platform wait link, train link, transfer walk link and egress walk link. Considering that trains run according to schedules, train links are fixed and can be built based on the train timetable data. However, the travel time of passengers can vary. Therefore, the passenger's walk time and PWT are uncertain even if the train he/she boards is determined. Some constraints used to formulate the space-time trajectory are given below.
Transfer station constraints. As described in Section 2, a transfer station is replaced by two or more stations. If station $s$ and station $s'$ are different copies of the same transfer station, they share the same name: $s_{name} = s'_{name}$, $s, s' \in S$ (7).
With constraints (2)-(4) and (7), an algorithm is proposed to show how to construct a space-time trajectory network based on the URT train timetable data and AFC data. Figure 3 shows the procedure for building the space-time trajectory network, and Figure 4 gives an example that illustrates the procedure.

Algorithm. Building the space-time trajectory network
Input: URT network, train timetable data, AFC data
Output: space-time trajectory network
Step 1. Initialize parameters and variables.
Input the URT train timetable and AFC data, initialize the parameters of the algorithm, and initialize the variables $t_{i,p}$ and $z^{t_1,t_2}_{i,j,p}$.
Step 2.1 Extend the transfer stations according to the number of subway lines serving them. The stations that represent the same transfer station have the same station name. Add all stations to $S$.
Step 2.2 Replace each station by four spatial nodes that represent the entry gate, exit gate, platform and track of the station. These four nodes are either the start node or the end node of a passenger's activity links. Add all spatial nodes to $N$. Add all spatial nodes that represent platforms to $N_W$.
Step 3. Build the space-time node set.
Step 3.1 Generate train departure space-time nodes and arrival space-time nodes. Extend the track spatial nodes according to the train arrivals and departures. Add these space-time nodes to $V$.
Step 3.2 Generate passengers' entry space-time nodes and exit space-time nodes. Extend the spatial nodes that represent entry gates or exit gates according to the passengers' entry and exit records. Add these space-time nodes to $V$.
Step 3.3 Generate platform space-time nodes. Extend each platform spatial node according to the departures of the trains serving the platform. Add these space-time nodes to $V$.
Step 4. Build the space-time link set.
Step 4.1 Build train space-time links. Connect the space-time departure node and arrival node of the same train according to the sequence of space-time nodes visited by the train. Add these space-time links to $E$.
Step 4.2 Build walk space-time links. First, build the access walk space-time links between entry space-time nodes and platform space-time nodes, subject to the activity time cost constraint and the constraint that the platform arrival time should be less than the exit time. Assuming that passengers get off the train immediately when transferring at a transfer station or arriving at their destination station, build the egress walk space-time links between platform space-time nodes and exit space-time nodes, and the transfer walk space-time links between platform space-time nodes. Add these space-time links to $E$ and $E_W$.
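For a single OD pair on one line, the effect of Steps 3 and 4 can be illustrated with a much simplified Python sketch that enumerates the train runs compatible with a passenger's tap-in and tap-out times. It is not the authors' implementation: the `TrainRun` type, the function name and the numbers are hypothetical, transfers are ignored, and times are plain seconds.

```python
from typing import List, NamedTuple


class TrainRun(NamedTuple):
    train_id: str
    dep_origin: int   # departure time at the origin platform (s)
    arr_dest: int     # arrival time at the destination platform (s)


def feasible_trajectories(entry: int, exit_time: int,
                          runs: List[TrainRun]) -> List[dict]:
    """List train runs whose walk links have non-negative time cost.

    A run is feasible if it departs no earlier than the tap-in time and
    arrives no later than the tap-out time. For each feasible run, the
    time spent at the origin station (access walk + PWT) and the egress
    walk time follow directly from the timetable.
    """
    result = []
    for run in runs:
        if entry <= run.dep_origin and run.arr_dest <= exit_time:
            result.append({
                "train_id": run.train_id,
                "station_time_origin": run.dep_origin - entry,
                "egress_walk": exit_time - run.arr_dest,
            })
    return result


runs = [TrainRun("T1", 500, 1400), TrainRun("T2", 800, 1700), TrainRun("T3", 1100, 2000)]
for candidate in feasible_trajectories(600, 2100, runs):
    print(candidate)
```

In this toy case the first train leaves before the passenger taps in, so two candidate trajectories remain; choosing between them is the subject of the weighted assignment step.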

Weighted Assignment
Since passengers are independent individuals, it can be assumed that the passengers' maximum likelihood space-time trajectories are independent of each other. It is therefore reasonable to estimate the maximum likelihood space-time trajectory for the passengers one by one. A set of feasible space-time trajectories can be obtained with the algorithm in Section 4.1. Here, we develop an algorithm to calculate the weight of each feasible space-time path.
By decomposing a passenger's travel trajectory links, the train link(s) and egress walk link are fixed for a specific space-time trajectory. That is to say, the train(s) taken by a passenger are determined for a specific trajectory. Thus, the time spent at each station is fixed and can be calculated. What is uncertain is the arrival time at the platform and the PWT. Our aim is to determine the value of $t_{j,p}$, where $j \in N_W$, to minimize the variance among the passenger's walking speeds and the deviation between the passenger's walking time and its expected value. The estimation problem can be summarized using the following function.
Problem P1:
In constraints (9), the equality constraint is the total time consumption constraint at a station. For a specific passenger and a specific space-time trajectory, the entry time, exit time and the train(s) boarded by him/her are fixed. In other words, the total time consumption is a fixed value once a passenger's space-time trajectory is assigned. For the origin station, for example, the passenger's entry time is recorded by the AFC system and the departure time from the origin station is the departure time of the train taken by him/her. Thus, the total time consumption of the passenger's activities at the origin station is a known value, and the time costs $c^{t_1,t_2}_{i,j,p}$ at different stations are independent of each other. Therefore, Problem P1 is a single-quadratic programming problem over a given range and can be solved easily using the monotonicity of the objective function.
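The monotonicity argument can be made concrete with a small Python sketch for one station. Because the exact form of Problem P1 is not reproduced here, the sketch assumes a simplified objective, the squared deviation of the walk time from its expected value, with the walk time restricted by the total station time and the platform wait time constraint. A convex quadratic is monotone on each side of its vertex, so the constrained minimizer is the vertex clamped to the feasible interval. All names and numbers are hypothetical.

```python
def best_walk_time(total_station_time: float, headway: float,
                   expected_walk: float):
    """Minimise (x - expected_walk)**2 over the feasible walk times x.

    Assumed feasible interval:
      lower bound: max(0, total_station_time - headway), so that the
                   platform wait does not exceed the headway (the bound
                   is treated as closed here, for simplicity);
      upper bound: total_station_time, so that the wait is non-negative.
    """
    lower = max(0.0, total_station_time - headway)
    upper = total_station_time
    walk = min(max(expected_walk, lower), upper)   # clamp the vertex
    cost = (walk - expected_walk) ** 2
    return walk, cost


# Expected walk time 120 s, headway 180 s.
print(best_walk_time(270, 180, 120))   # expected value is feasible
print(best_walk_time(400, 180, 120))   # passenger must have walked longer
print(best_walk_time(60, 180, 120))    # passenger must have walked faster
```

Applying this per station and summing the costs gives one possible weight for a candidate trajectory, which is the quantity compared across trajectories in the assignment step.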

Numerical Experiments
This section shows the numerical experiment conducted using the proposed method based on real-world data from the URT network of Beijing, China between 10:00 a.m. and 12:00 a.m. We developed a software system using C#, Windows Presentation Foundation (WPF) and human-computer interaction technology. The system aims to provide the subway corporations with an intelligent tool to manage URT basic data, assign network flow, analyze the characteristics of the network flow distribution and allocate ticket income. Figure 5 shows the real-world Beijing URT network topology built with our software.
As shown in Figure 5, the Beijing URT network consists of 17 lines and 338 stations. Two subway lines, Line 4 and Line 14, are operated and managed by Beijing MTR Corporation. The remaining 15 subway lines are operated and managed by Beijing Subway. There are 53 transfer stations in total where passengers can interchange from one subway line to another. A transfer station is the intersection of the subway lines that serve it and is represented by a node. Three of them are served by three subway lines, and the remaining fifty transfer stations are each served by two subway lines. In addition, the airport line is independent of the other subway lines. That is to say, passengers must swipe their smart cards when transferring to or from the airport line.
One of our aims is to confirm whether the computation time required is practical. Another aim is to verify the effectiveness of the method. First, we present the data used in this numerical experiment. Second, we show the process of how to estimate the maximum likelihood space-time trajectory.

Input Data
The numerical experiment employs train timetable data and AFC transaction data obtained from the Beijing Subway. Selected portions of the train timetable data are given in Table 4. The time-expanded network is constructed based on these data. Table 5 shows part of the AFC transaction data observed on 9 May 2016. The AFC transaction data include the card ID, origin station, destination station, entry time and exit time. The card ID is the unique identifier of a smart card and represents a passenger; it is essential for matching entry and exit records. The entry time and exit time are recorded by the AFC system when passengers pass the entry gate or exit gate and swipe their smart cards. They are accurate and recorded to the nearest second. The time format is hh:mm:ss.

Results and Discussion
In this study, we present the estimation results from several aspects and compare these results with the manual survey results, including the travel time parameters and flow statistics. Table 6 shows the estimation results. We used a computer with an Intel Xeon E5-2640 (2.4 GHz) processor and 8 GB memory; this system took 4.2 min to finish estimating all AFC transaction data between 10:00 a.m. and 12:00 a.m. In terms of the calculation time, the proposed methodology is fast enough for routine daily processing of transaction data.

Detailed Space-Time Trajectory Information
A space-time trajectory reflects not only the spatial route chosen by passengers but also the trains that passengers take. Figure 6 shows the set of feasible space-time trajectories for the passenger whose card ID is 15093452342. The time marked in the figures indicates the time when the passenger arrives at the spatial location, with the hour omitted.
As shown in Figure 6, there are four feasible space-time trajectories in total for this passenger given his/her entry and exit information. The first three space-time trajectories shown in Figure 6 indicate that the passenger transferred at Nanluoguxiang subway station. The differences among Figure 6a,b,c are that the times spent on walk activities differ and that the passenger boards different trains. The last trajectory, however, identifies Chaoyangmen subway station as the transfer station. The detailed space-time trajectory information is given in Table 7, and Table 8 shows the basic walk time expectations obtained by the manual survey.
Clearly, the maximum likelihood space-time trajectory is the first one. Relative to the first trajectory, the relationship between access walk time and transfer walk time is exactly the opposite. The other two trajectories have the same problem in terms of walk time allocation. In addition, it is unreasonable that, under the last trajectory, the passenger would have spent almost ten times longer on the egress walk to tap out. In contrast, the walk time distribution in the first path is much more reasonable. On the one hand, the actual walk times are closer to their expected values. On the other hand, this trajectory keeps the passenger's access walking speed and egress walking speed as consistent as possible.
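The comparison of candidate trajectories can be sketched in Python as follows. The scoring rule is an assumption made for illustration, not the paper's formulation (5) or (6): each walk time is divided by its expected value, and the score adds the variance of these ratios (consistency of walking speed) to their mean squared deviation from 1 (closeness to the survey expectation). The walk times below are invented placeholders and are not the values of Tables 7 and 8.

```python
from statistics import pvariance


def trajectory_score(actual_walk, expected_walk):
    """Lower is better. Inputs are walk times (s) for the access,
    transfer and egress links of one candidate trajectory."""
    ratios = [a / e for a, e in zip(actual_walk, expected_walk)]
    consistency = pvariance(ratios)                        # spread of relative speeds
    deviation = sum((r - 1.0) ** 2 for r in ratios) / len(ratios)
    return consistency + deviation


# Hypothetical candidates: (actual walk times, expected walk times).
candidates = {
    "a": ([100, 150, 90], [100, 140, 90]),
    "d": ([100, 200, 900], [100, 180, 90]),   # egress ten times the expectation
}
scores = {name: trajectory_score(*times) for name, times in candidates.items()}
best = min(scores, key=scores.get)
print(best)
```

A candidate with one walk link far from its expectation, such as the long egress walk in "d", receives a much larger score on both terms, which mirrors the qualitative argument made in the text.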

Estimation Result of Walk Time Parameters
The walk time parameters are among the most important parameters of a subway station. In general, they comprise the access walk time and the egress walk time; for a transfer station, the transfer walk time is also included. These parameters are related to the station layout and infrastructure properties, such as the length and width of passageways, which determine how long passengers spend walking in a station. In this paper, we assume that all passengers are independent individuals and that their space-time trajectories do not interfere with each other. Furthermore, individual differences are considered. Table 9 compares the access walk time parameters estimated with the proposed method to the parameters based on manual survey observations for the top six stations in tap-in passenger flow. As shown in Table 9, the relative deviations between the estimation result and manual survey are approximately 5%. The overall difference between the manual survey and the estimation result is small and can be tolerated. Figure 7 shows the access walk time distribution of four different subway stations. Among them, there is one regular station, the Beijing Railway station, and three transfer stations. The Beijing West Railway station and Beijing South Railway station are transfer stations served by two subway lines, while Xizhimen is a transfer station served by three subway lines.
According to Figure 7, the distribution of the access walk time resembles a normal distribution. The time that passengers spend walking from the entry gate to the platform is concentrated within a certain range. Comparing these four distributions, we find that the variance of the access walk time is smallest at the Beijing Railway station and largest at Xizhimen station. This result arises because the layout of a regular station is typically simpler than that of a transfer station. Relative to transfer stations, regular stations generally have fewer entrances and exits. In addition, passengers at a transfer station have more ways to reach the platform, through both access walkways and transfer walkways. These factors lead to a greater variance of access walk time for transfer stations.
Combining the comparison in Table 9 and the estimation results shown in Figure 7, it is clear that the estimation results are consistent with the survey results. While the manual survey is time-consuming and expensive, the proposed method yields a satisfactory estimate of the walk time based only on the AFC transaction data and the train schedule.
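As a minimal sketch of how such a comparison could be computed, the following Python snippet summarizes estimated access walk times per station and reports the relative deviation from a surveyed value. The function name, the station labels, and all sample values are hypothetical and serve only to illustrate the calculation; they are not the data behind Table 9 or Figure 7.

```python
from statistics import mean, pvariance

def access_walk_summary(samples_by_station, survey_by_station):
    """Summarize estimated access walk times (seconds) for each station.

    Returns the mean and variance of the estimates and the relative
    deviation of the estimated mean from the surveyed value.
    """
    summary = {}
    for station, samples in samples_by_station.items():
        estimated = mean(samples)
        surveyed = survey_by_station[station]
        summary[station] = {
            "mean": estimated,
            "variance": pvariance(samples),
            "relative_deviation": abs(estimated - surveyed) / surveyed,
        }
    return summary

# Hypothetical estimates from individual trajectories and hypothetical survey values.
samples = {
    "regular_station": [60, 62, 58, 60],
    "transfer_station": [80, 100, 120, 100],
}
survey = {"regular_station": 63, "transfer_station": 96}

summary = access_walk_summary(samples, survey)
for station, stats in summary.items():
    print(station, stats)
```

In this illustration the transfer station shows the larger variance, which is the pattern described for Figure 7, while both relative deviations stay below 5%.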

Distribution of the URT Network Passenger Flow
The distribution of the URT network passenger flow is one of the most important characteristics of the URT network. This distribution reflects the time-dependent travel demand and its trend from the macroscale perspective. At the microscale level, the URT network passenger flow is made up of individual passengers. In other words, the passengers' space-time trajectories compose the time-dependent distribution of the URT network passenger flow. Figure 8 presents the passenger flow distribution of the URT network between 10:30 a.m. and 11:30 a.m. The thickness of the line indicates the passenger flow volume, and the color indicates the load carried by the subway section.
According to Figure 8, the load of every section is less than 1, and the majority of passengers are travelling downtown in the study period. The most congested section is the Caishikou-Xuanwumen section, whose load is almost 1. The relatively congested segments are located mainly around the central business district (CBD) and railway stations, such as Beijing South Railway station and Sanlitun CBD. The passenger traffic in these segments is approximately 7000 to 9000 persons per hour, and the load rate is about 50%. The load rate of the suburban segments is around 35% or less.
At the network level, according to Table 10, the estimated results are consistent with the clearing results from TOCC, which means that the network passenger flow distribution estimated with the developed method is correct at the macroscale level. The difference, relative to the clearing results provided by TOCC, lies at the microscale level. The proposed method estimates the maximum likelihood space-time trajectories for all the passengers. In other words, in addition to the network passenger flow distribution, all the passengers' detailed space-time trajectories can be estimated.
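The link between individual trajectories and section-level flow can be sketched as follows. This Python snippet assumes that each estimated trajectory is reduced to the ordered list of sections it traverses within the study period, and that the load rate of a section is its passenger flow divided by its capacity over the same period. The station labels, capacities, and trajectories are hypothetical.

```python
from collections import Counter

def section_flows(trajectories):
    """Count how many passengers traverse each section.

    trajectories: one list of (from_station, to_station) sections per passenger.
    """
    flows = Counter()
    for sections in trajectories:
        flows.update(sections)
    return flows

def load_rates(flows, capacity):
    """Return flow / capacity for every section that carries passengers."""
    return {section: flows[section] / capacity[section] for section in flows}

# Hypothetical trajectories of three passengers on a line A-B-C.
trajectories = [
    [("A", "B"), ("B", "C")],
    [("B", "C")],
    [("A", "B"), ("B", "C")],
]
# Hypothetical capacity (passengers) of each section in the study period.
capacity = {("A", "B"): 4, ("B", "C"): 4}

flows = section_flows(trajectories)
rates = load_rates(flows, capacity)
print(dict(flows), rates)
```

Because the flows are built from individual trajectories, the same data can be sliced by train or by time interval if each section record also carries the train boarded, which is the microscale detail that the clearing results do not provide.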

Conclusions
Network passenger flow is one of the most important characteristics of URT and the basis for train scheduling. Thus, it is important for subway operators to characterize the time-dependent network passenger distribution and enhance the efficiency of the URT system. This paper proposed a data-driven methodology to estimate the maximum likelihood space-time trajectory based on bulk transaction data. The proposed method uses the expected values of station walk time as input data instead of the distribution function of the station walk time. This approach reduces the challenges associated with data acquisition and improves the accuracy of estimation results.
Furthermore, the estimation result describes each passenger's travel in detail, including the actual access walk time, the platform wait time and the trains boarded. The passenger's path-choice behavior can be estimated based on the detailed space-time trajectories, which is valuable for operators' dispatching decisions, especially in the case of unexpected accidents. Moreover, the train load and the congestion of the stations can also be inferred.
Future research will focus on three major areas. First, we will extend the model to incorporate left-behind passengers due to crowding and improve the methodology so that the space-time trajectory can be estimated without station walk time parameters. Second, we will consider arrivals with a time variance, with the aim of developing a practical method to solve the estimation problem for the whole day. Finally, we aim to optimize the URT timetable based on the estimation results.