A passenger-pedestrian model to assess platform and train usage from automated data

We present a transit model that, based on automated fare collection and train tracking data, describes pedestrian movements in train stations and vehicle-specific train ridership distributions. Our approach differs from existing models in that we describe on-board passenger dynamics and pedestrian dynamics at stations in a joint framework. We assume that travelers first decide on the train(s) they will take, and then pursue their journey through the network by successively choosing pedestrian paths, waiting positions on platforms, and specific train cars. Travelers explicitly maximize their travel utility. We model how crowding influences their walking speeds, and how it affects travel utility both at stations and on-board. To illustrate the framework, we present a real case study of a major Dutch rail corridor, for which we collect a rich set of passenger, pedestrian and train operation data. We observe a good agreement of model estimates with empirical observations, and discuss the use of the model for various transit-related problems including level-of-service assessment, crowding estimation, transit optimization, and integrated investment appraisal.


Introduction
A key characteristic of transit systems is their level of crowding, i.e., the accumulation of travelers on platforms, access ways, and in trains (Haywood et al., 2017). Typically, the number of travelers in a transit system is subject to strong spatiotemporal fluctuations (Hermant, 2012). Understanding these fluctuations, as well as the underlying interactions between travelers, infrastructure and rolling stock, is of key relevance to improve safety, comfort and efficiency (Raveau et al., 2014).
Significant progress has been made in understanding passenger dynamics at the level of transit networks, as well as in capturing pedestrian dynamics at the level of transit stations. For instance, fare collection and train tracking data are now routinely used to estimate train ridership (Zhu et al., 2017), and transit assignment models are employed to analyze and mitigate disruptions (Cats et al., 2016). Likewise, the measurement, analysis and modeling of pedestrian dynamics in train and metro stations has received considerable research attention (Xu et al., 2014; Hänseler et al., 2016).
Certain problems, such as the estimation of platform usage or the balancing of investment funds between rolling stock and pedestrian facilities, require simultaneous consideration of network-wide and station-specific traveler dynamics. Indeed, the usage of a train or platform depends on the position of access ways, and on the length and stopping position of trains in each station (Kim et al., 2014). Such usage patterns are a key determinant of dwell times, platform safety, or station throughput (Szplett and Wirasinghe, 1984; van den Heuvel, 2016). So far, no modeling framework exists that explicitly takes into account the intertwining of traveler dynamics at the network and station level.
The objective of this paper is to present a passenger-pedestrian model that, based on automated transit data, can describe traveler dynamics both in stations and in trains. The solution to this problem facilitates a number of applications, including (i) detailed estimation of platform usage for safety analysis or performance optimization (e.g. by adjusting train stop positions), (ii) analysis of car-specific on-board accumulation for comfort assessment, or for crowding information systems, (iii) investment appraisal when both in-vehicle and at-station benefits are relevant, and, indirectly, (iv) disruption management.
Our approach is inspired by agent-based transit assignment models, which we extend by a pedestrian model to capture walking and waiting behavior in stations. We consider travelers as individual decision makers whose movement along a transit trip is described both while riding and while moving and waiting at stations. Users are assumed to first decide on their transit itinerary, before considering choices at the station level, such as the choice of a specific walking route, or of a waiting position on a platform (Hoogendoorn and Bovy, 2004; Bureau Spoorbouwmeester, 2012). In describing the traveler dynamics, we rely where possible on data, namely automated fare collection (AFC) and train tracking data. Such data is increasingly available, and contributes to a high model accuracy.
In the remainder, we first review the relevant body of literature, present the model framework, and then apply it to a real case study.

Background
We briefly present related transit assignment models, focusing on simulation-based and data-driven approaches, as well as pedestrian movement models for train stations. We then propose a new approach that is hybrid in that it considers both the network and station level, and in that it contains both model- and data-driven components.

Approaches towards modeling transit assignment
Transit assignment models describe the propagation of travelers within a network, taking passenger demand and network characteristics as input. A large number of models have been proposed, for instance within a frequency- or schedule-based framework (Spiess and Florian, 1989; Nuzzolo et al., 2001). In the context of this work, disaggregate approaches are of interest, i.e., those that consider individual travelers and train services. Among them, we distinguish agent-based models and data-driven approaches.
Agent-based assignment models explicitly describe traveler behavior. Relying on simulation, they consider behavioral aspects such as route, train or departure time choice. One of the first agent-based transit models is due to Wahba and Shalaby (2005), who assume that passengers choose their transit itinerary, stop and departure time based on a desired arrival time and a given origin-destination pair. By traveling repeatedly through the network, passengers build up individual knowledge of walking, waiting and in-vehicle cost, on which their decision making is based (see also Teklu, 2008). More advanced frameworks have been proposed by Toledo et al. (2010) and Cats et al. (2016), who focus on dynamic congestion effects e.g. in the form of on-board crowding, denied boardings or service reliability.
Data-driven approaches focus on the assignment of passengers to individual trains by relying on automated transit data (Pelletier et al., 2011). A wide range of approaches has been proposed, with early work employing rule-based methods (e.g. Bagchi and White, 2005), and more recent work focusing on probabilistic frameworks (Hörcher et al., 2017; Zhu et al., 2017). Increasing availability of AFC and train tracking data has enabled several widely cited case studies e.g. in Japan (Kusakabe et al., 2010), China (Sun and Xu, 2012) or Singapore (Sun et al., 2012).

Station models
At the level of transit stations, pedestrian traffic assignment models are used to describe local travel activities, way finding and movements of passengers. They are henceforth referred to as 'station models.' Station models are useful to predict the level-of-service of rail access facilities, and may be loosely divided into macro-, micro- and mesoscopic approaches. Macroscopic station models consider pedestrians as a continuum, and comprise network flow models (Lee et al., 2001), cell transmission models (Hänseler et al., 2014), or models based on Petri-nets (Kaakai et al., 2007). Microscopic approaches consider pedestrians and their interactions individually, including activity-based models (Hoogendoorn and Bovy, 2004; Campanella, 2016), social force-based models (King et al., 2014; Helbing and Molnár, 1995), or cellular automata (Davidich et al., 2013). Mesoscopic station models consider individual pedestrians, but describe their movements in terms of macroscopic relationships (Tordeux et al., 2018). Among the better-known approaches are queueing systems (Xu et al., 2014; Jiang et al., 2015) and hybrid models (Daamen, 2004).

Research approach
The above-mentioned approaches describe traveler dynamics at the network- and the station-level separately. They neglect how the two levels interact. In practice, at-station pedestrian behavior and in-vehicle passenger behavior are intertwined in several ways:
• The distribution of passengers on platforms influences on-board crowding and vice versa (Lam et al., 1999; Liu et al., 2016).
• Travel utility, a central factor in traveler behavior, comprises components pertaining to both in-vehicle and at-station time (Tirachini et al., 2013; Guo and Wilson, 2011).
• Demand micro-patterns caused by arriving and departing trains yield circular dependencies between congestion in pedestrian facilities and boarding and alighting flows (Hermant, 2012; Hänseler et al., 2017b).
Our aim is to integrate the two levels of traveler dynamics in a way that is computationally feasible for large-scale applications, and consistent. By the latter we mean that characteristics of travelers are preserved along their transit trip, and that they take nonlocal consequences of their decisions into account (such as how the choice of a train car influences walking distances at their destination and vice versa). This notably precludes a station model (Srikukenthiran, 2015).
To achieve this aim, we envisage an agent-based transit assignment model that fully incorporates a mesoscopic pedestrian model for each station. We assume that fare transaction data and train tracking data are available. Hence, we rely on data for processes that are readily observable (namely train operations and passenger demand), and on behavioral models where direct observations are difficult (such as for platform dynamics). Our focus is on busy railway networks with considerable crowding (i.e., passengers may not be seated, and level-of-service on platforms may be low), but we assume that passengers are able to board any desired train (no denied boardings).
Our main contribution lies in the development of a hybrid approach that consistently describes traveler behavior both on-board and at stations. Our approach is capable of estimating spatiotemporal distributions of crowding levels on platforms and in trains, which is of direct use for safety and comfort considerations. Moreover, it serves as an important building block for real-time applications, for instance for information provision systems.

Model framework
The proposed model requires disaggregate travel demand and train tracking data as input. In a hierarchical way, it first assigns each traveler a sequence of train runs, and then a detailed path that specifies which pedestrian routes and train cars they use. The model takes the influence of crowding on walking speeds and on path choice explicitly into account. Rich model output allows for detailed analysis of traveler dynamics both at the station and network level.
Fig. 1 shows a scheme of the proposed model. We discuss each of its parts in the following. Alongside the more general framework, to illustrate the model, we present an example specification. It is geared towards commuting traffic in passenger rail networks, and focuses on elementary travel activities, namely walking, waiting, and riding, as well as level changes. Travelers are assumed to go straight to a platform when entering the railway system. Upon arrival, they go to some exit unless they seek a connecting train, in which case they walk to the corresponding platform (Davidich et al., 2013). No distinction is made between different fare levels, and travelers are assumed to be capable of using all infrastructures, including stairways and escalators. In Section 4, we will apply the example specification to a case study.
F.S. Hänseler, et al. Transportation Research Part A 132 (2020) 948-968
Appendix A provides an overview of the notation. Appendices C and D describe technical aspects relevant for an implementation of the model.

Model input
Three types of model inputs are required:
Disaggregate travel demand, specifying for each traveler a departure time, an origin, and a destination. Such information is increasingly available from automated fare collection systems. AFC systems can be classified based on: (i) whether they include tap-in only or tap-in and tap-out records, typically corresponding to flat and distance-based fare schemes; (ii) whether fare validation is bound to vehicles or station gates. Depending on these two aspects, destination (for boarding only) and vehicle (for station validation) inferences might be applied. In the absence of supplemental data, methods to infer the alighting station of a given tap-in record (e.g. Trépanier et al., 2017; Munizaga and Palma, 2012; Gordon et al., 2013) and for inferring the vehicle boarded by each passenger (Hörcher et al., 2017; Zhu et al., 2017) may be applied. Alternatively, if no disaggregate AFC data is available, travel demand may also be obtained from a high-level transit assignment model (Nuzzolo et al., 2012).
Train tracking data, describing the itinerary of train runs, their composition and capacity. Such data is often available from signaling and control systems. If train tracking data is unavailable, the planned timetable in combination with an appropriate delay model may be used (Higgins and Kozan, 1998; Goverde, 2007).
Infrastructure characteristics, describing the rail network and rail access facilities, are finally required. Infrastructure characteristics include the topology of train stations, specifying the dimensions and capacity of platforms, platform access ways, and station halls, as well as stopping positions of trains.
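To make the first two input types concrete, the following minimal sketch shows one possible record layout. All class and field names (`TripRecord`, `TrainStop`, `departure_time_s`, and so on) are hypothetical and chosen for illustration only; they are not taken from the authors' implementation. The helper `boardable` ignores walking time between the ticket gate and the platform, which the full model treats explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TripRecord:
    """One traveler as derived from fare collection data (hypothetical fields)."""
    traveler_id: str
    origin: str
    destination: str
    departure_time_s: float  # check-in time, seconds after midnight


@dataclass(frozen=True)
class TrainStop:
    """One stop of a train run as derived from train tracking data (hypothetical fields)."""
    run_id: str
    station: str
    platform: str
    arrival_time_s: float
    departure_time_s: float
    n_cars: int


def boardable(trip: TripRecord, stop: TrainStop) -> bool:
    """True if the traveler checks in at this station before the train departs.

    Simplification: walking time to the platform is ignored.
    """
    return (trip.origin == stop.station
            and trip.departure_time_s <= stop.departure_time_s)
```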

System representation
We consider individual travelers in a continuous-time, discrete-space model. Time is denoted by $t \in (t^-, t^+)$, where $t^-$ and $t^+$ represent the start and end time of the period of interest. Space is represented by a set of train stations, $\mathcal{S}$, of which each station $s \in \mathcal{S}$ is associated with a directed graph $\mathcal{G}_s$ representing its rail access facilities. A station graph $\mathcal{G}_s = (\mathcal{N}_s, \mathcal{L}_s)$ is described by a set of nodes, $\mathcal{N}_s$, and a set of pedestrian links, $\mathcal{L}_s$ (see Fig. 2). Certain elements of pedestrian facilities, such as stairs or corridors, translate naturally into links, and others naturally into nodes, such as entrance/exit points. For other elements, such as waiting halls or platforms, the decomposition into links and nodes depends on the desired granularity (Løvås, 1994). A node $n \in \mathcal{N}_s$ through which travelers enter or leave the pedestrian network is referred to as origin/destination node, and their set forms a subset of $\mathcal{N}_s$. [...] departure time $t^{\mathrm{dep}}_{r,s}$ and a platform, which are all assumed known from train tracking data.
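As a minimal sketch of this representation, a station graph can be stored as a set of nodes and a list of directed pedestrian links. The class names, the `length_m` attribute and the toy station below are hypothetical and serve only to illustrate the data structure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """Directed pedestrian link between two nodes (hypothetical structure)."""
    tail: str
    head: str
    length_m: float


@dataclass
class StationGraph:
    """Directed graph of the rail access facilities of one station."""
    nodes: set = field(default_factory=set)
    links: list = field(default_factory=list)
    od_nodes: set = field(default_factory=set)  # nodes where travelers enter or leave

    def add_link(self, tail, head, length_m):
        """Add a directed link and register both end nodes."""
        self.nodes.update((tail, head))
        self.links.append(Link(tail, head, length_m))

    def successors(self, node):
        """Nodes that can be reached from `node` over a single link."""
        return [link.head for link in self.links if link.tail == node]


# Toy station: an entrance, the top of a stairway, and a platform area.
g = StationGraph()
g.add_link("entrance", "stair_top", 20.0)
g.add_link("stair_top", "platform_mid", 8.0)
g.add_link("platform_mid", "stair_top", 8.0)  # links are directed, so the reverse is explicit
g.od_nodes.add("entrance")
```

Because links are directed, bidirectional facilities such as stairways are represented by two links, which allows direction-specific flows and speeds to be tracked separately.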

Traveler decisions
To describe traveler decisions, we introduce the notions of a transit itinerary and of a travel path. A transit itinerary $i$ is a sequence of train runs between train stations. A travel path $h$ is a sequence of segments, $h = (w_1, w_2, \ldots)$, where a segment $w$ is represented either by a train car that carries a traveler between two train stations, or by a pedestrian link.
We assume that travelers first decide on a transit itinerary, subject to which they choose their travel path.This assumption is appropriate in the context of scheduled services with high regularity and no denied boardings.
Let $\mathcal{I}_y$ denote the set of attractive transit itineraries of traveler $y$. It may be obtained from empirical observations, or from a suitable choice set generation model (e.g. Cats, 2011). The probability of traveler $y$ to choose a transit itinerary $i \in \mathcal{I}_y$ can be generally expressed as
$$P_y(i) = f\left(i \mid n^{\mathrm{o}}_y, n^{\mathrm{d}}_y, t^{\mathrm{dep}}_y, \mathcal{I}_y\right), \qquad (1)$$
where $f(\cdot)$ denotes the transit itinerary choice model, which depends on the traveler's origin node, destination node, departure time and choice set. We assume that some appropriate model is available, which may be a simple all-or-nothing assignment model, or a complex choice model that ensures transfers, and minimizes travel fare and travel time.
Subject to the choice of a transit itinerary $i$, a traveler $y \in \mathcal{Y}$ is assumed to attach a utility to each feasible travel path $h \in \mathcal{H}_i$, with $\mathcal{H}_i$ the set of available travel paths. The path-specific utility can be expressed as
$$U_{y,h} = u_{y,h} + \varepsilon_{y,h}, \qquad (2)$$
where $u_{y,h}$ denotes the deterministic part, and $\varepsilon_{y,h} \sim \mathrm{EV}(0, 1)$ an i.i.d. extreme value type I error. Travel utility typically comprises components of in-vehicle, walking and waiting time as well as transfer penalties, and may depend on various factors, such as personal attributes, or prevailing crowding levels. Note that correlations between path alternatives can in principle be captured by including a correction term as proposed by Ben-Akiva and Bierlaire (2003) or more recently by Fosgerau et al. (2013).
We assume that path utility is made up of link-wise additive components. The deterministic part of the utility then takes the form
$$u_{y,h} = \sum_{w \in h} u_{y,h,w}(z_{y,h,w}), \qquad (3)$$
where $u_{y,h,w}$ is the utility of traveler $y$ associated with segment $w$ on path $h$, and $z_{y,h,w}$ the corresponding vector of attributes. These attributes characterize the travel path and personal preferences and may be subject to learning and adaptation, such as the expected level of on-board crowding, or subject to real-time information, such as the delay of an anticipated train run.
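With i.i.d. extreme value type I errors of unit scale, utility maximization leads to multinomial logit choice probabilities over the available paths. The sketch below illustrates this together with the segment-wise additive utility. The path names and utility values are hypothetical, and the function names are not taken from the authors' implementation.

```python
import math


def path_utility(segment_utilities):
    """Deterministic path utility as the sum of its segment utilities."""
    return sum(segment_utilities)


def logit_probabilities(utilities):
    """Multinomial logit probabilities for a dict {path: deterministic utility}."""
    u_max = max(utilities.values())  # subtract the maximum for numerical stability
    weights = {h: math.exp(u - u_max) for h, u in utilities.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Two hypothetical paths, each with a walking, a waiting and a riding segment.
utilities = {
    "front_car": path_utility([-0.8, -0.5, -2.0]),
    "rear_car": path_utility([-0.3, -0.5, -2.4]),
}
probs = logit_probabilities(utilities)
```

In this toy example the rear car has the slightly higher total utility and therefore the higher choice probability, although the front car remains a likely choice because of the random error term.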
We assume that travelers have full information on their available choice set, and that they select alternatives that maximize their expected utility. The progress of individual travelers is modeled as a sequence of consecutive travel decisions in combination with a learning process, which we specify in the form of an exponential filter (Hickman and Bernstein, 1997; Cantarella and Cascetta, 1995). Details may be found in Appendix B.
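The idea of an exponential filter can be sketched as follows: after each iteration, a traveler's expectation of an attribute is updated as a weighted average of the previous expectation and the latest experience. The smoothing weight `alpha` and the example load factors are hypothetical; the specification actually used is given in Appendix B and may differ.

```python
def exponential_filter(expected, observed, alpha=0.2):
    """Blend the previous expectation with the latest experienced value.

    alpha is a hypothetical smoothing weight in (0, 1]; alpha = 1 means that
    only the most recent experience counts.
    """
    return (1.0 - alpha) * expected + alpha * observed


# A traveler initially expects a load factor of 0.8 but repeatedly experiences 1.2.
expected_load = 0.8
for observed_load in [1.2, 1.2, 1.2]:
    expected_load = exponential_filter(expected_load, observed_load)
```

The expectation moves towards the experienced value with every iteration, but never overshoots it.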

Context and assumptions
We assume that an exogenous, case study-specific transit itinerary choice model of the form of Eq. (1) is available. Railway operators typically provide recommendations of transit itineraries, and often the corresponding choice fractions are empirically known. The specific model we will use in the case study is described by Banninga et al. (2016, in Dutch).
The utility of a travel segment (see Eq. (3)) is assumed to depend on the time spent on each activity. Let $\mathcal{B}_w$ denote the set of activities available on segment $w$, including e.g. walking, waiting, or ascending a stairway. Taking into account spatiotemporal changes in level-of-service, the utility of traveler $y$ on segment $w$ is specified as
$$u_{y,w} = -\beta^{0}_{\mathrm{ivt}} \sum_{b \in \mathcal{B}_w} \int_{t^{\mathrm{in}}_{y,w}}^{t^{\mathrm{out}}_{y,w}} \mu_{y,w,b}\big(z_w(t)\big)\, \mathbb{1}_{y,b}(t)\, \mathrm{d}t, \qquad (4)$$
where $t^{\mathrm{in}}_{y,w}$ and $t^{\mathrm{out}}_{y,w}$ are the times at which traveler $y$ enters and leaves segment $w$. In Eq. (4), $z_w(t)$ generically denotes the level-of-service of segment $w$ at time $t$, and $\mathbb{1}_{y,b}(t)$ is an indicator equaling one if traveler $y$ pursues activity $b \in \mathcal{B}_w$ at time $t$. The variable $\mu_{y,w,b}$ represents a time multiplier that depends on the level-of-service and the travel activity, and $\beta^{0}_{\mathrm{ivt}}$ is a constant associated with typical in-vehicle conditions. Note that, since the number of transfers is determined by the transit itinerary, a transfer penalty would not influence the travel path choice, and is thus omitted.
In the context of Eqs. (2) and (4), the constant $\beta^{0}_{\mathrm{ivt}}$ describes the residual deviance, i.e., the amount of unexplained variation. Its value depends on the specification of the time multiplier, on the targeted user group, and is country-specific. To obtain an approximate estimate, a preliminary stated-preference survey among Dutch commuters has been carried out, yielding a value of $\beta^{0}_{\mathrm{ivt}} = 6.35 \cdot 10^{3}\,\mathrm{s}^{-1}$. The details of this survey are beyond the scope of the present work, and not relevant for the illustration of the proposed modeling framework.
For train cars, the level-of-service is captured by the load factor, i.e., the ratio between car occupancy and seat capacity. Among others, Wardman and Whelan (2011) have studied the in-vehicle time perceived by seated and standing passengers as a function of the load factor. Their specification is adopted in this work and shown in Fig. 3a.
For walking facilities, the level-of-service is captured by the pedestrian density (Fruin, 1971). At free-flow conditions, walking and waiting are valued at 1.66 and 1.11 units of in-vehicle time, respectively (Wardman, 2004, Table 6, inter-urban rail passengers). Walking is thus given a premium of approximately 50% compared to waiting, attributed to the physical effort (see also Douglas and Karpouzis, 2005). With increasing density, the time multiplier increases. Fig. 3b shows the proposed relationship, which is adopted from Douglas and Karpouzis (2005). For stairways and escalators, no density-dependent specification is available, and hence a constant valuation is assumed. Specifically, a time multiplier of 2.12 and 3.09 is assumed, not distinguishing between the upward and downward usage directions (Daamen et al., 2005).
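The role of the time multipliers can be illustrated with a small calculation that converts at-station time into equivalent in-vehicle time. The sketch uses the free-flow multipliers quoted above (1.66 for walking and 1.11 for waiting, consistent with the stated walking premium of about 50%) and assumes piecewise-constant conditions, so that the time integral reduces to a weighted sum. The function name and the example durations are hypothetical.

```python
# Free-flow time multipliers in units of in-vehicle time, as quoted in the text.
FREE_FLOW_MULTIPLIER = {"walking": 1.66, "waiting": 1.11}


def weighted_time(activity_durations, multipliers=FREE_FLOW_MULTIPLIER):
    """Sum of activity durations (s), each weighted by its time multiplier.

    activity_durations is a list of (activity, duration_s) pairs during which
    the level-of-service is assumed to be constant.
    """
    return sum(multipliers[activity] * duration
               for activity, duration in activity_durations)


# A traveler walks 60 s to a waiting position and then waits 120 s.
segment = [("walking", 60.0), ("waiting", 120.0)]
equivalent_ivt = weighted_time(segment)  # in-vehicle-equivalent seconds
walking_premium = FREE_FLOW_MULTIPLIER["walking"] / FREE_FLOW_MULTIPLIER["waiting"] - 1.0
```

Under crowded conditions, the constant multipliers would be replaced by density-dependent values such as those shown in Fig. 3b.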

Traveler movements
Traveler movements are considered to be the result of a stochastic service system formed by the transit network and its users. This stochastic system governs the times at which a traveler $y$ enters and leaves a segment $w$, denoted by $t^{\mathrm{in}}_{y,w}$ and $t^{\mathrm{out}}_{y,w}$, respectively. The service times depend on the state of the corresponding facility element, denoted by the vector $x$. For a pedestrian area $a$, the state vector $x_a$ groups the number of travelers $m_\ell$ on each associated link $\ell \in \mathcal{L}_a$.
In pedestrian areas, the service time of a pedestrian link is equal to the time required to traverse it. Specifically, for a traveler $y$ on link $\ell \in \mathcal{L}_a$, the traversal time is given by
$$t^{\mathrm{out}}_{y,\ell} - t^{\mathrm{in}}_{y,\ell} = \frac{d_\ell}{v_{y,\ell}}, \qquad (5)$$
where $d_\ell$ denotes the length of link $\ell$ and $v_{y,\ell}$ the corresponding walking speed. Walking speed is influenced by various factors such as age, gender, the timetable and prevailing network conditions. It may be expressed as a state-dependent random variable that is governed by a potentially direction-dependent or stochastic fundamental diagram (Nikolić et al., 2016; Hänseler et al., 2017a),
$$v_{y,\ell} = \mathrm{FD}_{y,\ell}(x_a), \qquad (6)$$
where $\mathrm{FD}_{y,\ell}$ denotes the density-speed relationship associated with traveler $y$ on link $\ell$. Typically, under free-flow conditions, walking speeds vary substantially across pedestrians. As more pedestrians occupy space, frictions among pedestrians increase, reducing velocity and its variability (Tregenza, 1976). Various specifications of Eq. (6) are available in the literature, also for inclined areas such as stairs or escalators (Weidmann, 1992).
Upon being served by a facility element, travelers move to a next facility element provided that space is available, and that the resulting flows at interfaces such as doors do not exceed capacity. If capacity limits are reached, travelers are temporarily prevented from proceeding. They experience waiting time while continuing to consume space in their current facility element. A similar consideration holds for the assignment of seats to passengers, where standing incurs lower travel utility. Details may be found in Appendix C.

Specification
Fig. 3. Example specification of time multipliers as obtained from the literature (Wardman, 2004; Wardman and Whelan, 2011; Douglas and Karpouzis, 2005; Daamen et al., 2005).
Walking speed is specified as a composite model consisting of an individual component and a mean density-speed relationship. The individual component $\xi_y$ takes into account heterogeneity in walking speed across travelers. It is modeled as a random variable with Gaussian distribution $\xi_y \sim \mathcal{N}(1, 0.215)$, which is appropriate in absence of socioeconomic information (Daamen, 2004). A limitation of this approach is that it does not consider longitudinal differences in free-flow speed. Importantly, if a departure is imminent, prospective train passengers would likely accelerate their pace to reach a connection in time. To account for this, free-flow speeds are resampled for travelers that have missed a train in a previous iteration and whose assigned speed is below average.
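The sampling and resampling of the individual speed component may be sketched as follows. Two assumptions are made that the text does not settle: the value 0.215 is interpreted as a standard deviation, and implausibly small draws are rejected using a hypothetical lower bound `lower`. Function names are illustrative only.

```python
import random


def sample_speed_factor(rng, mean=1.0, std=0.215, lower=0.1):
    """Draw an individual speed factor from a Gaussian distribution.

    Assumptions: 0.215 is read as a standard deviation, and draws below the
    hypothetical bound `lower` are rejected to avoid non-positive speeds.
    """
    while True:
        factor = rng.gauss(mean, std)
        if factor >= lower:
            return factor


def resample_if_slow(factor, missed_train, rng, mean=1.0):
    """Redraw the factor for travelers who missed a train while walking below average."""
    if missed_train and factor < mean:
        return sample_speed_factor(rng)
    return factor


rng = random.Random(42)  # fixed seed for reproducibility
factors = [sample_speed_factor(rng) for _ in range(1000)]
```

Note that a single redraw does not guarantee an above-average speed; over repeated learning iterations, however, slow travelers who keep missing their train are redrawn until they no longer do.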
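To describe the density-dependence of mean walking speed, the fundamental diagram by Weidmann (1992) is used. Unlike most other specifications, the Weidmann relationship has been calibrated for various facilities, including stairways, ramps and horizontal walkways. It specifies the walking speed of traveler $y$ on pedestrian link $\ell \in \mathcal{L}_a$ as
$$v_{y,\ell} = \xi_y\, v_{\mathrm{free}} \left\{ 1 - \exp\left[ -\gamma \left( \frac{1}{k_a} - \frac{1}{k_{\mathrm{jam}}} \right) \right] \right\}, \qquad (7)$$
where $\xi_y$ is the individual speed component of traveler $y$, and the prevailing pedestrian density $k_a$ in area $a$ is given by
$$k_a = \frac{1}{A_a} \sum_{\ell \in \mathcal{L}_a} m_\ell, \qquad (8)$$
with $m_\ell$ the number of travelers on link $\ell$ and $A_a$ the surface of area $a$. The parameter $k_{\mathrm{jam}}$ denotes jam density, $v_{\mathrm{free}}$ the free-flow speed, and $\gamma$ is a shape parameter. Table 1 provides facility-specific values.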
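A minimal sketch of the density-speed relationship is given below, assuming the commonly cited form of Weidmann's fundamental diagram, v = v_free {1 - exp[-gamma (1/k - 1/k_jam)]}, together with parameter values often quoted for level walkways (v_free = 1.34 m/s, gamma = 1.913 ped/m², k_jam = 5.4 ped/m²). These defaults and the function names are assumptions for illustration; the facility-specific values of Table 1 should be used in practice.

```python
import math


def weidmann_speed(density, v_free=1.34, gamma=1.913, k_jam=5.4):
    """Mean walking speed (m/s) at a given pedestrian density (ped/m^2).

    Default parameters are values commonly quoted for level walkways and are
    assumptions of this sketch.
    """
    if density <= 0.0:
        return v_free  # empty area: free-flow speed
    if density >= k_jam:
        return 0.0     # jam density reached: no movement
    return v_free * (1.0 - math.exp(-gamma * (1.0 / density - 1.0 / k_jam)))


def area_density(travelers_per_link, area_m2):
    """Density of an area as the total number of travelers on its links per m^2."""
    return sum(travelers_per_link) / area_m2


def individual_speed(speed_factor, density):
    """Walking speed of a traveler with a given individual speed factor."""
    return speed_factor * weidmann_speed(density)


# Example: 15 and 25 travelers on the two links of a 40 m^2 platform area.
k = area_density([15, 25], 40.0)
v = individual_speed(1.1, k)
```

The speed decreases monotonically with density and vanishes at jam density, so that traversal times grow sharply as a platform area fills up.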

Model output
As with any simulation-based model, there is flexibility in defining the desired model outputs. In the context of model validation, two groups of data sources may be distinguished:
Pedestrian traffic data, describing passenger movements at train stations. Such information is useful to quantify the safety and level-of-service of rail access facilities, for instance in terms of flows or densities. Various measurement technologies exist to observe such data in practice (Bauer et al., 2009; U.S. Department of Transportation, 2013).
Passenger ridership data, describing the usage of trains. Such data is useful to describe on-board crowding, and to estimate the resulting service level. Automated measurements of ridership may be based on in-vehicle Wi-Fi sensors, door count systems, or axle load sensors (Kim et al., 2014; Nielsen et al., 2014).

Case study
We carry out a case study of the Utrecht-Schiphol rail corridor in the Netherlands (see Fig. 4). This case study is of interest for two reasons:
1. The corridor is amongst the busiest in Europe, with a continuously high need for maintenance (Nederlandse Spoorwegen, 2016). We discuss how the framework can provide clues for investment appraisal by comparing travel cost associated with in-vehicle and at-station time.
2. The corresponding platform in Utrecht has been declared 'overloaded in the near future' by the responsible government agency (Kleinhout, 2016). There is an elevated risk of passengers falling on tracks, and generally a high level of congestion on platform exit ways. To monitor the situation, the central part of the platform has been equipped with a pedestrian tracking system at a cost of 200,000 € (Vaatstra, 2017). We investigate to what extent our approach can describe platform usage by comparing model prediction and sensor observation.

Case study description
The Utrecht-Schiphol corridor connects the largest train station in the Netherlands with Amsterdam's airport, a major transportation hub in Europe. In the train timetable of 2017, direct train service is available four times per hour with a capacity of up to 1128 seats per train. The total journey amounts to between 30 and 33 min, depending on the train run.

Table 1
Parameters for density-speed relationship and flow capacity in various walking facilities as reported by Weidmann (1992, W), Cheung and Lam (1998, CL), Daamen (2004, D) and Bodendorf et al. (2014, B). Speeds shown for stairways and escalators pertain to their horizontal component. For escalators, the speeds apply to walking pedestrians; for standing escalator users, the speed is given by the mechanical speed of the escalator, amounting typically to 0.5 m/s (Bodendorf et al., 2014). As in Singh et al. (2016), it is assumed that 30% of travelers, drawn at random, walk on escalators, while the remaining 70% stand. (For interpretation of the critical flow $\varphi_{\mathrm{crit}}$, see Appendix C. Values provided for level walkways and stairways follow from the corresponding parametrization of the fundamental diagram, whereas the value indicated for escalators represents an empirical estimate from Bodendorf et al. (2014).)
The direct connection serves four train stations, namely Utrecht Centraal (henceforth referred to as Utrecht), Amsterdam Bijlmer Arena (Bijlmer), Amsterdam Zuid (Amsterdam Zuid), and Schiphol Airport (Schiphol).These train services originate upstream of Utrecht and terminate in Schiphol.Such 'corridor trains' are served by platform 5/7 in Utrecht, platform 2/3 in Bijlmer, and platforms 3/4 both in Amsterdam Zuid and Schiphol.'Corridor passengers' start and end their train journey at one of the aforementioned stations.In contrast, 'auxiliary passengers' board upstream of Utrecht, or use trains that do not serve the Utrecht-Schiphol route.They are taken into account to reproduce the prevailing crowding conditions, and amount to less than 20% of total passengers on corridor trains.
Fig. 5 shows the configuration of the platforms used by corridor passengers, as well as the train stopping positions for various train lengths.Platforms are longitudinally partitioned into areas of the same length as train cars, i.e., into approximately 30 m long segments.Laterally, they are subdivided when necessary to account for spatial heterogeneity, in particular if built facilities are present along the centerline of the platform.Escalators and their service direction are indicated by arrows.
Travelers check in and out at ticket validation terminals that are placed above or below the platform level.These terminals are either turnstiles as in the case of Utrecht, Bijlmer and Amsterdam Zuid, or free-standing validators as in Schiphol.The walking space of the entire area restricted by ticket validators is modeled.In Bijlmer and Amsterdam Zuid the corresponding areas outside platforms are relatively small.In contrast, in Utrecht and Schiphol, they are vast, and free-flow conditions are assumed.This is a simplifying assumption, motivated by our focus on platforms and the typically dominant crowding levels on platform access ways.
On working days, demand reaches a maximum during the morning peak period between 7 and 9 a.m., which is studied in the following analysis (van den Heuvel et al., 2016). To ensure that the crowding experienced by travelers is as realistic as possible, an additional 30 min are considered before and after. During the resulting three-hour period, on average between 35,000 and 40,000 passengers use approximately 140 trains serving any of the aforementioned platforms. Rolling stock consists primarily of three train types, namely suburban trains comprising four to eight cars, and interregional single- and bi-level trains with up to 12 cars. They differ in terms of passenger and door capacity. Passenger capacity amounts to 45-55 seats per car for suburban trains, 63 for interregional, and 84-94 for bi-level interregional train cars (NS Reizigers, 2004). Crush capacities are approximately twice as high, but in practice only reached during extreme events which are not of interest in this case study. Each train car is equipped with two doors, associated with flow capacities of 1.02, 0.88 and 1.12 ped/s per door for suburban, interregional and bi-level interregional trains, respectively (Wiggenraad, 2001).
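To give a feeling for these numbers, the sketch below computes the load factor of a train car and a lower bound on the time needed for a group of passengers to pass through its doors. It assumes that both doors are used equally and that boarding and alighting flows do not obstruct each other, which is an idealization; the function names and example values are hypothetical.

```python
def load_factor(occupancy, seats):
    """Ratio between car occupancy and seat capacity."""
    return occupancy / seats


def min_exchange_time(passengers, doors=2, door_capacity=1.02):
    """Lower bound (s) on the time for passengers to pass through a car's doors.

    Assumes an even split across doors and no interference between boarding
    and alighting flows; door_capacity is in ped/s per door.
    """
    return passengers / (doors * door_capacity)


# A bi-level interregional car with 90 seats carrying 108 passengers.
lf = load_factor(108, 90)

# 51 passengers alighting from a suburban car (1.02 ped/s per door).
t_suburban = min_exchange_time(51)

# The same group at an interregional car (0.88 ped/s per door) needs longer.
t_interregional = min_exchange_time(51, door_capacity=0.88)
```

Even under these favorable assumptions, a few dozen passengers per car occupy the doors for roughly half a minute, which illustrates why the distribution of passengers along the platform matters for dwell times.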
To describe transit itinerary choice, a proprietary model developed specifically for the Dutch railway network is used (Hoogenraad et al., 2013;van den Heuvel and Hoogenraad, 2014).For every origin-destination pair and departure time, it generates a set of relevant train itineraries, associates a generalized travel time with each alternative, and estimates choice probabilities based on empirically calibrated distribution functions.These choice probabilities implicitly take into account the chance of catching a train at the origin station.In case of multi-leg journeys, the probability of reaching a transfer connection is considered, and for larger stations also potential time spent on subsidiary travel activities such as shopping.Train itineraries with multiple legs include in particular Utrecht and Schiphol, with a high share of transfers to and from corridor trains.Transfer passengers using any of the four corridor stations are thus explicitly taken into account.

Available data
A rich set of data sources is available, describing travel demand, transit operations, and pedestrian traffic at stations:
Automatic fare collection data is available for passengers carrying a smart card, comprising check-in and check-out stamps at their station of origin and destination. The average penetration rate of smart cards for both check-ins and check-outs amounts to 96% for Utrecht and Amsterdam Zuid, 91% for Bijlmer, and 67% for Schiphol. The reduced values in Bijlmer and Schiphol are due to print-at-home tickets provided for nearby events and single-use tickets used by international travelers, respectively. Penetration rates specifically for the morning peak period are not known, but assumed to be higher due to a large share of smart card holders among commuters. Specifically for this study, to fulfill prevailing privacy requirements, the fare collection data needs pre-processing. Details are found in Appendix D.
Train tracking data is available, specifying for each train the realized arrival and departure times, assigned platforms, as well as the rolling stock type. The stopping position of trains is dependent on the number of cars, and can be inferred (see Fig. 5).
Pedestrian flow counts are available on selected platform access/exit routes in Utrecht, Amsterdam Zuid and Schiphol, as indicated by camera symbols in Fig. 5. Flow sensors are bi-directional, providing minute-by-minute counts for access and exit flows. In Utrecht, access ways connecting platform 5/7 to the station hall are surveilled, whereas access ways to two underground tunnels are not monitored. In Amsterdam Zuid, the two central access ways are surveilled, while a third, more lateral route is not equipped. For platform 3/4 in Schiphol, all access routes are equipped.
Pedestrian density measurements are available on platform 5/7 in Utrecht for a 100 m-stretch on the narrow side of the platform, as mentioned at the beginning of the section. The covered part is indicated by the dotted area in Fig. 5. The density measurements are obtained by a network of 25 stereo cameras, and are available on a minute-by-minute basis at a spatial granularity of approximately 10 m².
The above data is available for the time period between March 8 and March 31, 2017. Demand and transit operation data are used for model estimation, and pedestrian traffic data is used for validation. The analysis focuses on working days, and specifically on Tuesdays, Wednesdays and Thursdays, leaving a set of eleven days for detailed investigation.
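The day selection described above can be reproduced with a few lines of standard-library Python. The sketch below enumerates all Tuesdays, Wednesdays and Thursdays in the stated period; the function name `study_days` is an illustrative choice, and public holidays or data outages are not considered.

```python
from datetime import date, timedelta


def study_days(start, end, weekdays=(1, 2, 3)):
    """Return all dates in [start, end] that fall on the given weekdays.

    Weekdays follow date.weekday(): Monday = 0, so (1, 2, 3) selects
    Tuesdays, Wednesdays and Thursdays.
    """
    days = []
    current = start
    while current <= end:
        if current.weekday() in weekdays:
            days.append(current)
        current += timedelta(days=1)
    return days


days = study_days(date(2017, 3, 8), date(2017, 3, 31))
print(len(days))  # 11
```

The resulting set contains eleven days, including March 22, 2017, which falls on a Wednesday.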

Results
Up to 24 independent simulation runs are carried out per day, with 250 learning iterations each. While not optimal in terms of computational cost, this configuration is sufficient to reliably estimate all performance indicators of interest, which is our main goal. The computation time of one iteration on a single core of a 2.3 GHz CPU amounts to approximately one minute, and multiple runs can be efficiently parallelized (see Hänseler, 2017, for a freely available implementation). While all available days are considered to study day-to-day fluctuations, special focus is given to the analysis of the morning peak period of March 22. On this day, all sensors were functioning, and train traffic was not disrupted. Considering a single day makes it possible to study fast dynamics, such as platform exit flows after a train arrival, which otherwise get averaged out by deviations in train arrival times across days.
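The computational budget implied by this configuration follows from simple arithmetic. The sketch below uses the figures quoted above (24 runs, 250 iterations, roughly one minute per iteration) and assumes a simulated period of 180 minutes; the function names are illustrative.

```python
def core_hours_per_day(runs=24, iterations=250, minutes_per_iteration=1.0):
    """Single-core computation time (hours) for one simulated day."""
    return runs * iterations * minutes_per_iteration / 60.0


def speedup_vs_real_time(simulated_minutes=180.0, minutes_per_iteration=1.0):
    """Ratio of simulated time to computation time for one iteration."""
    return simulated_minutes / minutes_per_iteration


print(core_hours_per_day())     # 100.0 core hours if all 24 runs are executed
print(speedup_vs_real_time())   # 180.0, i.e. one iteration runs ~180x faster than real time
```

Since the runs are independent, the 100 core hours can be distributed over as many cores as there are runs.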

Consistency
To investigate the consistency of smart card data and the transit itinerary choice model, we first compare cumulative flows on monitored platform access routes. Fig. 6 shows for each station the total flows on escalators and stairways that are equipped with a count sensor. Individual curves represent observations and estimates of March 22, and the gray band the observation range associated with the entire set of days, i.e., the band spanned by the smallest and largest observed cumulative flow curve. Fig. 6a shows that flows on the monitored platform access routes in Utrecht are overestimated by 11.2%. This may be due to errors in the transit itinerary and travel path choice facilitated by a large number of transfer passengers, or sensor saturation leading to reduced flow observations. In contrast, platform exit flows in Schiphol (Fig. 6c) are underestimated by 19.2%, which is expected due to the lower penetration rate of smart cards in that station. In Amsterdam Zuid (Fig. 6b), the inflow is overestimated and the outflow underestimated, which is consistent with the overestimation in Utrecht and the underestimation in Schiphol. Absolute errors are generally smaller due to less freedom in local pedestrian route choice, and a lower share of transfers.
Overall, observation and estimation agree up to a degree that is comparable to the width of the fluctuation band, implying that the available smart card and pedestrian count data are consistent. Demand micro-peaks, discernible by the step-wise character of inflow curves, corroborate the consistency of alighting and walking dynamics and observed timetable realizations.
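The consistency check above rests on two simple quantities: the relative deviation between estimated and observed total flow, and the band spanned by the observed cumulative curves across days. A minimal sketch follows, assuming minute-by-minute counts as plain lists. The function names and the toy numbers are illustrative, and the band is approximated here by the pointwise minimum and maximum over days.

```python
from itertools import accumulate


def cumulative(counts):
    """Cumulative flow curve from per-minute counts."""
    return list(accumulate(counts))


def relative_deviation(estimated, observed):
    """Relative deviation of total flow; positive values mean overestimation."""
    total_observed = sum(observed)
    if total_observed == 0:
        raise ValueError("observed total flow must be non-zero")
    return (sum(estimated) - total_observed) / total_observed


def observation_band(daily_counts):
    """Pointwise lower and upper envelope of cumulative curves over several days."""
    curves = [cumulative(day) for day in daily_counts]
    lower = [min(values) for values in zip(*curves)]
    upper = [max(values) for values in zip(*curves)]
    return lower, upper


# Toy data (not from the case study): three minutes of counts.
observed = [10, 20, 20]
estimated = [12, 22, 21]
print(f"{relative_deviation(estimated, observed):+.1%}")  # +10.0%
```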

Validation
As mentioned, the platform serving corridor trains in Utrecht, and in particular the narrow areas adjacent to track 5, have been declared prone to overloading. Likewise, the two central access ways in Amsterdam Zuid represent a well-known capacity bottleneck. The performance in these two areas is studied in the following to assess the validity of the model.
Fig. 7 shows the exit flows along the two central exit routes in Amsterdam Zuid. Flow rates on March 22 are shown as curves, while the observation range of the 11-day dataset is shown as a shaded area. An aggregation period of 60 s is used, imposed by the measurement data. Root-mean-square errors (RMSE) and mean absolute errors (MAE) are provided to quantify the agreement between model estimates and observations. The observed stairway flow, shown in Fig. 7a, does not reach capacity during the entire observation period. In contrast, the escalator flow is saturated on multiple occasions, as depicted in Fig. 7b. The location of flow peaks is relatively well reproduced by the model, with minor differences in their magnitude. One reason for the latter is rooted in simplifications in the modeling process. For instance, it is likely that a person approaching a congested exit way anticipates the discomfort and decelerates in advance, thereby smoothing the observed flow. Such behavior is not considered by the model. At a coarser time aggregation, such as the widely used five-minute level (Ross, 2000), differences between model prediction and observation decrease rapidly, albeit at the cost of averaging out flow peaks.
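The two error measures and the effect of a coarser time aggregation can be illustrated with a short sketch. It assumes equally long series of per-minute flow values; the helper names and the toy series are illustrative and not taken from the case study.

```python
import math


def rmse(estimated, observed):
    """Root-mean-square error between two equally long series."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / len(observed))


def mae(estimated, observed):
    """Mean absolute error between two equally long series."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(e - o) for e, o in zip(estimated, observed)) / len(observed)


def aggregate(series, factor):
    """Average consecutive blocks of `factor` values (incomplete tail is dropped)."""
    blocks = len(series) // factor
    return [sum(series[i * factor:(i + 1) * factor]) / factor for i in range(blocks)]


# Toy example: peaks shifted by one minute between estimate and observation.
observed = [0.0, 2.0, 0.0, 2.0]
estimated = [2.0, 0.0, 2.0, 0.0]
print(rmse(estimated, observed), mae(estimated, observed))          # 2.0 2.0
print(rmse(aggregate(estimated, 2), aggregate(observed, 2)))        # 0.0
```

In this toy case the error vanishes entirely after aggregation, but so do the peaks themselves, which is the trade-off noted in the text.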
Fig. 8 shows pedestrian densities on the equipped platform area in Utrecht. Prior to train arrivals, prospective passengers start accumulating on platforms, yielding a gradual increase in density. Upon arrival, incoming passengers alight, leading to a short peak, followed by a rapid decrease in pedestrian density. Maximum observed densities are slightly above 1 ped/m². On other days, density values exceeding 1.2 ped/m² are observed (see shaded area), although their occurrence is rare and can typically be explained by specific circumstances such as train cancelations.
A longitudinal comparison of the density profiles along the equipped 100 m long area (Fig. 8a-c) shows only small differences, which, however, must be put in perspective given the total platform length of 430 m. Both RMSE and MAE indicate good agreement between estimates and observations, in particular if the substantial day-to-day fluctuations are considered (gray band). Certain differences in peak densities are perceivable, which may be due to the presence of first and second-class cars with differing loads. Also, it can be seen that the model predicts a faster accumulation of passengers prior to train departures. This deviation is likely due to subsidiary travel activities, such as getting a coffee before departure, which are not modeled.
If the entire platform length is considered, longitudinal differences in density become apparent in the model estimate. Fig. 9 shows the evolution of density for the five-minute interval on March 22 for which the maximum density has been measured. During the considered time span, two trains of ten cars arrive on both sides of the platform. At 8:04:47, train #3522 arrives on platform 7 (wider half of platform), with an estimated 567 alighting and 683 boarding passengers. Despite the large passenger volume, pedestrian densities remain well under 1 ped/m², with the exception of exit ways. At 8:06:11, train #3020 arrives on platform 5 (narrow side), with 691 alighting and 688 boarding passengers. Due to the narrow platform area and a concentration of passengers around the central platform, densities reach significantly higher values. These estimates, both regarding the lateral and longitudinal evolution of density, are in agreement with qualitative observations during peak periods.
Fig. 7. Exit flow from platform 3/4, Amsterdam Zuid, along the two central exit routes (see Fig. 5 for position of sensors). Observations and model estimates are shown with a granularity of 60 s. Dashed lines represent capacities (obtained from Table 1).
Fig. 8. Density evolution on monitored area of platform 5/7, Utrecht.

Travel utility and train load distributions
Beyond the prediction of pedestrian traffic indicators, the model can be used to investigate the perceived travel utility, or passenger load distributions on-board trains. Fig. 10 shows the estimated travel cost grouped by origin-destination pairs, assuming a value of time of 9.62 €/h for Dutch commuters (Pel et al., 2014). The average of all available weekdays is shown, and error bars denote the corresponding standard deviation. It is apparent that, especially for short trips, the contribution of walking and waiting is significant, and may exceed in-vehicle cost. This is in line with related studies which also consider the travel cost contribution of walking and waiting, although typically without taking the impact of station crowding on walking speeds and comfort into account (e.g., Cats et al., 2016). For transit systems operating close to capacity, consideration of such dependencies is important for an accurate appraisal of infrastructural or operational measures, such as the evaluation of rolling stock investments against investments in rail access facilities.
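The conversion from travel time components to monetary cost can be sketched as follows. This simplified example applies the value of time of 9.62 €/h uniformly to all components and ignores the crowding-dependent weighting used by the model. The component durations are hypothetical and merely illustrate how walking and waiting can dominate on a short trip.

```python
VALUE_OF_TIME_EUR_PER_H = 9.62  # Dutch commuters (Pel et al., 2014), as quoted in the text


def travel_cost(minutes_by_component, value_of_time=VALUE_OF_TIME_EUR_PER_H):
    """Monetary cost (EUR) per trip component, without crowding multipliers."""
    return {name: value_of_time * minutes / 60.0
            for name, minutes in minutes_by_component.items()}


# Hypothetical short trip: durations in minutes (not taken from the case study).
cost = travel_cost({"walking": 4.0, "waiting": 6.0, "in_vehicle": 8.0})
for name, euros in cost.items():
    print(f"{name}: {euros:.2f} EUR")
print(cost["walking"] + cost["waiting"] > cost["in_vehicle"])  # True
```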
Fig. 11 shows the estimated passenger distribution of train #3522 when arriving in Bijlmer, Amsterdam Zuid and Schiphol. Train #3522 has one of the highest riderships, particularly between Utrecht and Bijlmer, and is normally operated with ten cars. The estimated distribution pertains to the average of considered weekdays, except for March 30, on which only eight cars were available. Naturally, as trains move along the corridor, the percentage of auxiliary passengers decreases.
The load distributions when arriving at Bijlmer (Fig. 11a) and Amsterdam Zuid (Fig. 11b) are high and nearly uniform, with differences of less than 13% between the train car with the highest and lowest load. This is expected, as walking efforts play a smaller role compared to the influence of in-vehicle crowding when passenger loads are high. The highest load in Bijlmer is found in cars #7 and #8, offering direct access to the single platform exit route. The estimated distribution in Amsterdam Zuid is bimodal, with one peak pertaining to the central platform exit ways, and one towards the front of the train associated with the lateral exit way at the head of the platform. When reaching Schiphol, the level of in-vehicle crowding is low, implying lower in-vehicle cost, and thus a higher influence of walking cost. Differences in the load distribution are more pronounced, with the occupancy in the busiest car being almost twice as high as in the least utilized car. Passenger load is particularly low in cars at the front of the train, which are not adjacent to any exit way.

Discussion
We first assess the proposed model by commenting on its strengths and limitations, and then discuss its scope of application.

Strengths and limitations
The results show that the proposed framework is capable of producing passenger flows and platform densities that are in good quantitative agreement with observations, and able to realistically describe generalized travel costs and car load distributions. This is encouraging since the required model inputs, namely automated fare collection and train tracking data, are increasingly available. The proposed simulation approach can potentially account for measurement errors if systematic correlations between measurement errors and passenger load levels can be established. Most station models require inputs that are less readily available, such as origin-destination demand at the station level, or pedestrian flow and density counts, which in this work have been used for model validation only. Likewise, transit assignment models may require case-specific assumptions, specifying for instance walking and waiting times, or the likelihood of successful connections in busy train stations. In contrast, many of the parameters required by the proposed framework are more general, and can be specified based on the literature.
Nevertheless, the issue of optimally specifying trip assignment models is yet to be tackled. Notably, the influence of crowding on travel behavior is fundamental, and still subject to research (Pel et al., 2014). To capture such impacts, the current specification borrows a wide range of parameters from experiments performed in various geographical areas. This is not ideal, and yet yields satisfactory results. A dedicated calibration based on revealed preference data is likely to further improve the model performance, as would a detailed testing and 'calibration' of the space discretization (Openshaw, 1984).
A point of criticism may pertain to the hierarchical choice of transit itinerary and travel path, i.e., the assumption that travelers decide on a (sequence of) train runs before choosing individual pedestrian links and train cars. Such a structure is appropriate for rail systems as studied in this work, and practical since it allows for a seamless integration of empirical itinerary choice models. Yet for applications with large shares of multi-leg trips or when capacity limits result in denied boardings, it is unrealistic to assume that the itinerary is determined beforehand. In such cases, a sequential choice model also for transit itineraries is more appropriate (Spiess and Florian, 1989). A framework for such a choice structure in principle exists (e.g., Cats et al., 2016), although the consistent specification of an integrated choice model requires further research.

Scope of application
The proposed model is well-suited for problems that require consideration of passenger dynamics along the entire transit travel path. In the following, we highlight three fields of application.
Scenario evaluation and investment appraisal. Scenarios that affect both transit operations and station facilities can be evaluated. Empirical research has shown, for instance, that local adjustments in train stopping positions can significantly reduce dwell times when boarding and alighting processes are time-critical (van den Heuvel, 2016). The analysis of such measures is not amenable to classical station models, as they typically assume uniform passenger distributions along a platform (Daamen, 2004; Vaatstra, 2017). Besides adjusting train stop positions, the model may be used to optimize transfer times, to identify platform sections requiring more space for pedestrian traffic, or more generally to determine performance and safety bottlenecks. Similarly, it is suitable for balancing investments between the network and the station level, as discussed previously.
Provision of crowding information. To yield a more balanced use of platforms and trains, transit operators increasingly employ crowding information systems. We illustrate this using two examples from the Netherlands. Fig. 12a shows a 180-meter long luminous band that indicates the occupancy of an arriving train. Crowding levels are directly measured by infrared counters that have to be installed at each train door, which in practice is prohibitively expensive. Fig. 12b shows a smartphone application providing crowding information based on historical observations. While more viable from an economic viewpoint, the usefulness of such crowding estimates is limited by the lack of real-time information. Our model, in contrast, can provide vehicle-specific crowding estimates in real time based on readily available AFC data.
Disruption management. In conjunction with a rail operations and demand estimation model, the proposed framework can be used as a building block for a disruption management system. It can readily estimate passenger loads in train cars, pedestrian densities on platforms, and flows on access ways. Such information is useful to develop and evaluate mitigation strategies, both for offline and online use. For instance, given a delayed train, one may assess holding strategies to increase the likelihood of successful connections. Moreover, one may change the stopping position to alleviate platform crowding. To evaluate such measures, the model can provide performance indicators such as travel time losses or the number of missed connections.

Demonstration of scenario assessment
We illustrate the model's capability to support the assessment of 'what if' scenarios using two examples involving Utrecht Centraal Station:
'No escalators': We assume that both escalator pairs on platform 5/7 fail and are closed for pedestrian traffic. Such a disruption may result from a local electricity outage and subsequent repair work.
'Lateral train stopping': We assume that all trains on platform 5, irrespective of their length, stop at the far right, i.e., at the stopping position for trains with 10 cars (see Fig. 5). This may be a planning scenario aimed at reducing pedestrian congestion along the central platform sections.
We analyze the two scenarios for the morning peak period of March 22, 2017. On this day, train #3020 arrives with a reduced number of cars during the busiest period (arrival at 08:06:10, operated with 7 cars), emphasizing the influence of the change in stopping position.
Fig. 13 shows pedestrian densities at 08:06:30 on platform 5/7 for the base case and the two scenarios. Owing to the outage of the nearby escalators, the 'no escalators' scenario shows a high density on the central stairways. The 'lateral train stopping' scenario reveals an increased use of the right half of the platform, and less elevated densities in the narrow, central platform areas. Hence, from a crowd management safety viewpoint, shifting train stopping positions may indeed be worthwhile in that specific case.
Fig. 14 shows the total number of passengers on platform 5/7 as a function of time. Six service levels are distinguished, revealing varying comfort and security (Fruin, 1971). The 'no escalators' scenario shows that, due to the restrained platform egress, passengers remain longer on platforms and experience lower service quality, including unsafe conditions at times. Between 8:00 and 8:30, total egress times increase by several minutes. Disruption management may take this into account, for instance by holding connecting trains. In contrast, lateral train stopping positions hardly affect passenger accumulation and egress times, with a slight increase in the level-of-service.

Conclusions
A passenger-pedestrian model has been presented that, based on automated fare collection and train tracking data, simultaneously describes traveler behavior at stations and on-board. Travelers are assumed to choose pedestrian paths, platform waiting positions and individual train cars in a way that maximizes their utility. They explicitly consider the downstream consequences of choices, such as the anticipated variation of crowding in a train car, or the walking distance at their destination. The necessary knowledge is obtained within a day-to-day learning framework.
The interaction between travelers, rail access facilities and rolling stock is represented at the aggregate level. Passenger crowding, both in-vehicle and at stations, is endogenous, and its impact on individual travelers is explicitly modeled. The developed pedestrian model is applicable to multi-directional flows, as they may emerge on a walkway or a stairway, and considers static and dynamic facility capacities, such as the maximum accumulation on a platform area, or the capacity of an escalator.
The model is applied to a case study of a major rail corridor in the Netherlands. The studied two-hour peak period involves approximately 35,000 passengers and 140 trains, depending on the day. Besides fare collection and train tracking data, a rich data set consisting of pedestrian counts and density observations on platforms is collected. In conjunction with an empirically calibrated transit itinerary choice model, the proposed framework is able to realistically reproduce level-of-service indicators such as flows and densities at the station level, as well as to plausibly predict further performance indicators such as ridership distributions. Such results are difficult to achieve with 'classical' transit assignment or station models, or they require additional assumptions, e.g., regarding platform waiting distributions. The computational speed is about two orders of magnitude faster than real time, which makes the model applicable to extensive networks, or for time-critical applications.
In the long term, we see considerable potential in the application of the model for integrated investment appraisal, real-time estimation of crowding conditions, or for use in disruption management.

Appendix B. Recursive travel path choice
When reaching the end of a path segment $w$, a traveler is assumed to choose a next path segment $w' \in W_w$ that maximizes the expected remaining travel utility. The set $W_w$ comprises all available path segments in the particular decision context, i.e., either walking links or train cars for a prospective train trip. The decision process is Markovian in that the choice of the next segment depends only on the current one, given an a priori chosen transit itinerary and individual traveler attributes (Bertsekas, 2017).
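The specification of the choice set $W$ is straightforward for walkway choices, or platform waiting section choices for prospective train passengers. For the train car choice of a traveler boarding from platform area $a$, let the choice set be given by
$$W_a = \{\, c : \mathbb{1}^{\mathrm{adj,board}}_{a,c} = 1 \,\} \qquad \text{(B.1)}$$
where the indicator $\mathbb{1}^{\mathrm{adj,board}}_{a,c}$ equals one if $a$ and train car $c$ are adjacent, and zero otherwise. The resulting choice set is contingent on the definition of adjacency. We present a definition of adjacency based on geometrical considerations below.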
The inverse problem pertains to the platform section choice of alighting passengers.The choice set for a traveler riding on-board train car c alighting in train station s is given by , comprises all available subpaths connecting path segment w to the destination node n d that are associated with transit itinerary i.The operator E (•) provides the expected value, for which typically downstream attributes need to be predicted.The logit-structure of the choice model, Eq. (B.3), follows from the assumed distribution of the random utility component.The log sum in Eq. (B.3) is a direct consequence of a travel segment generally being associated with multiple travel paths.
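The recursive structure described above, in which the attractiveness of a segment includes the log-sum over all downstream alternatives, can be illustrated with a generic recursive logit on a small acyclic network. This is a minimal sketch under simplifying assumptions (unit scale parameter, deterministic utilities, no itinerary or traveler attributes) and not the exact specification of Eq. (B.3). The network, the utility values and all function names are hypothetical.

```python
import math
from functools import lru_cache

# Hypothetical toy network: segment -> possible next segments (must be acyclic).
SUCCESSORS = {
    "platform": ["car_front", "car_rear"],
    "car_front": ["exit"],
    "car_rear": ["exit"],
    "exit": [],
}
# Hypothetical systematic utilities of traversing each segment (negative costs).
UTILITY = {"platform": 0.0, "car_front": -1.0, "car_rear": -2.0, "exit": -0.5}


@lru_cache(maxsize=None)
def downstream_value(segment):
    """Expected maximum utility of continuing from `segment` (log-sum term)."""
    successors = SUCCESSORS[segment]
    if not successors:  # destination reached, nothing left to traverse
        return 0.0
    return math.log(sum(math.exp(UTILITY[s] + downstream_value(s)) for s in successors))


def choice_probabilities(segment):
    """Logit probabilities of choosing each next segment from `segment`."""
    weights = {s: math.exp(UTILITY[s] + downstream_value(s)) for s in SUCCESSORS[segment]}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


print(choice_probabilities("platform"))
```

Because each alternative's weight includes the value of everything downstream of it, a train car that is slightly less comfortable but closer to the destination exit can still be the more likely choice.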
Eq. (B.4) implicitly requires estimation of downstream attributes of each choice alternative. To that end, an exponential filter is employed (Cantarella and Cascetta, 1995), which forecasts attribute $z$ in iteration $k$ by convex combination of the attribute forecast in the previous iteration and of the value realized in that same iteration,
$$z_{k}^{\mathrm{forecast}} = \alpha\, z_{k-1}^{\mathrm{experienced}} + (1 - \alpha)\, z_{k-1}^{\mathrm{forecast}} \qquad \text{(B.5)}$$
where $\alpha \in (0, 1)$ denotes the learning rate. Wahba and Shalaby (2005) suggest a learning rate of 0.7 for a similar application, which is adopted in this work. This specification implies that passengers consistently weight recent experiences more than older ones, stimulating exploration in path choice behavior. The choice of the learning rate mainly influences the speed of convergence, which may be further explored if computational considerations are crucial.
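The exponential filter can be written in a few lines. The sketch below assumes that the learning rate weights the most recent experience, in line with the remark that recent experiences count more than older ones; the function name and the example values are illustrative.

```python
def update_forecast(previous_forecast, experienced, learning_rate=0.7):
    """Convex combination of the last experienced value and the last forecast."""
    if not 0.0 < learning_rate < 1.0:
        raise ValueError("learning_rate must lie in the open interval (0, 1)")
    return learning_rate * experienced + (1.0 - learning_rate) * previous_forecast


# Hypothetical example: a traveler initially expects 10 min of walking time,
# but experiences 20 min in every iteration.
forecast = 10.0
for iteration in range(1, 6):
    forecast = update_forecast(forecast, experienced=20.0)
    print(iteration, round(forecast, 3))
```

With a learning rate of 0.7 and a constant experienced value, the remaining forecast error shrinks by a factor of 0.3 per iteration, which illustrates why the learning rate mainly governs the speed of convergence.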
Specification.The adjacency indicators for pairs of train cars and pedestrian areas are defined based on their proximity, and the corresponding choice sets are constructed by simulation-based sampling.
It is assumed that the stopping position of each train in every station is deterministically known. Consider a train dwelling in a station as depicted in Fig. 15. The distances of the stopping position of the train head from the front and rear of an adjacent area $a$ are denoted by $q_a^+$ and $q_a^-$, respectively (see Fig. 2). The positions of the front and rear of car $c$ relative to the head of the train are denoted by $q_c^+$ and $q_c^-$, respectively. Assuming a uniform distribution of pedestrians within an area $a$, the longitudinal position of a traveler relative to the tip of the train is described by $q \sim U(q_a^+, q_a^-)$. If a traveler boards the closest train car, the adjacency indicator for boardings from area $a$ to car $c$ is given by
$$\mathbb{1}^{\mathrm{adj,board}}_{a,c} = \begin{cases} 1 & \text{if } q \in [q_c^+, q_c^-), \\ 0 & \text{otherwise.} \end{cases}$$
The choice set resulting from Eq. (B.1) includes for every boarding traveler one alternative, contingent on the sampled position $q$.
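The sampling-based construction of the boarding choice set can be sketched as follows. The example assumes that all positions are measured as distances from the train head, increasing towards the rear of the train, so that the front of a car has a smaller coordinate than its rear. This orientation, the car length of 25 m and the function names are assumptions made for illustration.

```python
import random


def sample_position(area_front, area_rear, rng):
    """Draw a traveler position uniformly within the extent of a platform area."""
    low, high = sorted((area_front, area_rear))
    return rng.uniform(low, high)


def adjacent_car(position, car_bounds):
    """Index of the car whose half-open interval [front, rear) contains `position`.

    `car_bounds` is a list of (front, rear) distances from the train head.
    Returns None if the position lies beyond the train.
    """
    for index, (front, rear) in enumerate(car_bounds):
        if front <= position < rear:
            return index
    return None


# Hypothetical train of four cars, each 25 m long.
cars = [(i * 25.0, (i + 1) * 25.0) for i in range(4)]
rng = random.Random(42)
q = sample_position(40.0, 60.0, rng)   # platform area spanning 40 m to 60 m
print(q, adjacent_car(q, cars))        # the sampled traveler boards car 1 or car 2
```

Each sampled traveler obtains exactly one adjacent car, so an area that overlaps two cars distributes its travelers over both in proportion to the overlap lengths.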
The same approach is used to determine the adjacency indicator for alighting passengers.For each traveler, a random 'position' is drawn, and the alighting indicator is specified as = + 1 1 if q [q ,q ), 0 otherwise.In the propagation of travelers across the network, several capacity constraints are considered: For pedestrian areas, a maximum number of pedestrians is defined, which is denoted by a occ and referred to as occupation capacity.Similarly, for the interface between two areas a and a a flow capacity is considered, which is denoted by  an indicator that equals 1 if traveler y moves from area a to a at time t and 0 otherwise, the capacity limit for the interface between area a and a at time t can be expressed as The aggregation period needs to be large enough such that 1 a a , flow , and small enough for the interface flow to be sufficiently restrictive in case of demand fluctuations.Ideally, the capacity thresholds a a , flow are chosen in conjunction with to obtain a meaningful specification.For capacity limits during boarding and alighting, the corresponding flow for train car c is  Various priority schemes can be envisaged for space and seat assignment, ranging from random to first-come-first-served or more complex approaches.
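The interplay of occupation and flow capacities can be illustrated with a small helper that determines how many waiting travelers may cross an interface during one aggregation period. The sketch reads the requirement on the aggregation period as meaning that at least one traveler must be able to pass per period. This reading, the function name and the example values are assumptions and do not reproduce the exact formulation of the model.

```python
import math


def admitted_travelers(waiting, flow_capacity, period, occupancy, occupation_capacity):
    """Number of waiting travelers allowed to move into the target area.

    flow_capacity:        interface capacity in ped/s
    period:               aggregation period in s
    occupancy:            travelers currently in the target area
    occupation_capacity:  maximum number of travelers in the target area
    """
    flow_limit = math.floor(flow_capacity * period)
    if flow_limit < 1:
        raise ValueError("aggregation period too short for this flow capacity")
    space_limit = max(occupation_capacity - occupancy, 0)
    return min(waiting, flow_limit, space_limit)


# Hypothetical interface with 1.12 ped/s and a 60 s aggregation period.
print(admitted_travelers(100, 1.12, 60, occupancy=0, occupation_capacity=500))    # 67
print(admitted_travelers(100, 1.12, 60, occupancy=480, occupation_capacity=500))  # 20
```

Travelers that are not admitted remain in their current area and are considered again in the next period. Which of them proceed first depends on the priority scheme, which is not modeled here.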
Specification.From Table 1 and the functional form of the assumed density-speed relationship, occupation and flow capacities can be readily derived.The occupation capacity of area a is given by = k .local exit gate group is sampled; (iv) In case a connecting train is sampled, step (ii) is applied again, otherwise the synthesis of the current trip is terminated.
For trains serving only a single station within the considered corridor, boarding passengers are assigned a generic downstream station as destination, and alighting passengers a generic upstream station as origin.

Fig. 2. Space representation of a train station.Walkable space (gray) is partitioned into areas (delimited by dotted lines), and represented by a network of directed links (pairs of links as double-headed arrows) that connect internal nodes (black circles) and origin-destination nodes (stars).A train, represented by individual train cars (rounded rectangles), is dwelling at the upper platform.

Fig. 4. Railway network of the Netherlands, with the Utrecht-Schiphol train corridor highlighted.

Fig. 5. Platforms of the Utrecht-Schiphol corridor. Train movement is from left to right, top to bottom.

Fig. 6. Platform access and egress flows on monitored routes as observed and estimated for March 22, 2017 (individual curves), with observation range representing weekdays between March 8 and March 30 (gray band). In brackets, sensor IDs are shown (see Fig. 5 for their location).

Fig. 11. Average passenger load distribution of train #3522 when reaching Bijlmer, Amsterdam Zuid and Schiphol for the 11-day dataset. The dashed line denotes the seat capacity, amounting to 94 seats per car.

Fig. 12. Two examples of information provision systems for on-board crowding. Green indicates that a train car is largely empty, orange means 'semi-crowded,' and red means 'full.'

Fig. 14. Level-of-service distribution on platform 5/7 of Utrecht Centraal for the morning peak period of March 22, 2017.
is an indicator that equals one if train car c and pedestrian area a are adjacent, and zero otherwise.Based on Eq. (2) and the aforementioned choice sets, the probability of traveler y on path segment w to choose segment W w each train car, we define a seat and standing capacity, c seat and c stand with units of pedestrians, and a capacity determining maximum boarding and alighting exchange rates per car, c ex with units of pedestrians per time.Upon being served by a facility element w, travelers may move to a next facility element W w w , provided that two conditions are met.First, space must be available in the target facility element w , which in case of a pedestrian area a at time T capacities at area interfaces and at train doors may not be exceeded.Assuming an aggregation period , for boarding and alighting, respectively.If capacity limits are reached, travelers are temporarily refrained from proceeding.A similar consideration holds for the assignment of seats to passengers.The number of seated passengers train car c is restricted by its seat capacity, i.e.,

Fig. 15. Position of idling train with respect to platform, indicating the position of the train head (diamond), and its distance from the highlighted pedestrian area and train car.