Limitations of Recursive Logit for Inverse Reinforcement Learning of Bicycle Route Choice Behavior in Amsterdam

Used for route choice modelling by the transportation research community, recursive logit is a form of inverse reinforcement learning. By solving a large-scale system of linear equations, recursive logit allows estimation of an optimal (negative) reward function in a computationally efficient way that performs well for large networks and a large number of observations. In this paper we review examples of recursive logit and inverse reinforcement learning models applied to real-world GPS travel trajectories and explore some of the challenges in modeling bicycle route choice in the city of Amsterdam using recursive logit, as compared to a simple baseline multinomial logit model with environmental variables. We discuss conceptual, computational, numerical and statistical issues that


Introduction
Bicycling in Amsterdam is serious business: one third of the daily movements in Amsterdam by residents and visitors are made by bike, and almost half of the commute trips between work and home are made by bike. Furthermore, bicycling has seen steady growth, not just in Amsterdam but in all (major) cities in the Netherlands. This leads to increasing congestion for bicyclists, especially at intersections with traffic lights. Addressing this requires policy changes such as new infrastructure and changes in traffic signal policies. To create effective policies, policy makers need tools that give insight into the behaviour of bicyclists and that model when and where they will bicycle.
Our previous research in Koch et al. [10] found that bicyclists in Amsterdam rarely take the shortest path and make more detours than car drivers. It is thus less straightforward to predict the routes taken by the average bicyclist and, consequently, to estimate the number of bicyclists on a given street. However, with the advent of new data collection techniques such as GPS on smartphones, more and more data is available on the revealed preferences of bicyclists, allowing researchers to develop new ways to model influences on bicycle route choice. In this study we attempt to quantify the influence of environmental and spatial planning features on the route choice decisions of everyday bicyclists in Amsterdam by estimating a model that can predict where and how these cyclists travel in and around Amsterdam.

Discrete choice modeling of travel routes
Since the 1970s discrete choice modeling has been a leading method to understand the choice behaviour of individuals in a wide range of fields such as marketing, economics and transportation. Described by McFadden et al. [12] in 1973, discrete choice modeling has subsequently been extended over the decades in order to overcome specific limitations such as overlapping alternatives and correlations over time.
The study of route choice specifically is more complicated than a choice between easily enumerable distinct alternatives, as route choice is typically a sequence of choices at each intersection, each transit stop, each mode, etc. This leads to a very large choice set that is theoretically infinite due to loops. Often there is also a large overlap between different route alternatives, which creates difficulties for choice modeling. We highlight two commonly used approaches. The first, established in the 1990s, models route choice using a collection of observed paths and, for each observation, a set of paths generated by a route choice generator. This approach has been used to estimate models such as multinomial logit (MNL) and mixed logit. It comes with limitations: as discussed in Koch et al. [10], these route choice generators do not necessarily create realistic routes, and Frejinger et al. [8] argue that parameter estimates can vary significantly depending on the bias of the route choice generator. To address the overlap between different alternative paths and the resulting correlations, multiple extensions have been proposed that attempt to avoid erroneous path probabilities and substitution patterns. The two most popular are path size logit [1] and C-Logit [5], which decrease the utility of overlapping paths in proportion to their overlap with other paths in the choice set.
A second approach is to achieve a consistent choice set by sampling, as proposed by Frejinger et al. [8]. This approach attempts to set up a sampling protocol that yields unbiased parameter estimates from the route choice sets, neutralizing the bias introduced by the route choice generator.

Bicycle route choice
In 2010, Menghini et al. [13] published a seminal route choice model for bicyclists estimated from a large sample of GPS observations in a revealed preference study with 2435 persons logging 73,493 trips in Zurich, Switzerland. Using this data they estimated a multinomial logit model and used breadth-first search link elimination (BFS-LE) to generate choice alternatives for each observed trip. They included six different variables in the choice model: length of the route, average absolute gradient change, maximum gradient change, percentage of marked bicycle paths along the route, number of traffic lights and the path size measure. Accounting for the similarity between alternatives with the path size measure, their model showed that the elasticity with respect to trip length was nearly four times larger than that with respect to the percentage of bicycle paths along the route. The only other explanatory variable that had an impact, albeit a small one, was the product of length and the maximum gradient along the route.
In prior research with the same specific sub-selection of the data, we found in Koch et al. [10] that bicyclists in Amsterdam often deviate from the shortest path, more so than car drivers, indicating that different and possibly more factors affect the routes bicyclists in Amsterdam take. In Koch et al. [10] we focused on the concept of route complexity: counting the number of locations where people deviate from the shortest path, in the interest of improving route choice generation techniques and potentially gaining more insight into the motivations behind the route choices of bicyclists. In this study we explore other effects on route choice using different methodologies, without looking at route complexity or where people deviate from the shortest path. In future research we intend to combine both streams of work.
Zimmermann et al. [19] showed that it is possible to estimate bicycle route choice without the restrictiveness of pregenerated route choice sets by modeling route choice as a sequence of choices via recursive logit. For comparison with the recursive logit model, we estimate a simple baseline multinomial logit model using a synthetically generated choice set. The synthetic approach allows us to generate additional plausible route alternatives outside the set of observed routes. This means that we can include all observations, even between origin-destination pairs with only a single observation, unlike the study by Ton et al. [16], which is limited to trips traversing the inner city of Amsterdam because the density of trips in the suburbs is insufficient for their empirical approach to work there.

Methods
In this section we review two variants of choice modeling without choice sets: an analytical solution via recursive logit and a computational approximation via inverse reinforcement learning.

Dynamic discrete choice modeling of travel link sequences
An alternative approach uses a link-based Markov decision process to model route choice as a series of sequential decisions. First proposed by Fosgerau et al. [7], it computes choice probabilities efficiently by solving the Bellman equations as a system of linear equations.
An incidence matrix $M$ is established that defines the exponentiated utility of performing action $a$ from state $k$:
$$M_{ka} = \begin{cases} e^{\frac{1}{\mu} v(a|k)} & \text{if } a \in A(k), \\ 0 & \text{otherwise,} \end{cases}$$
where $A(k)$ denotes the set of actions available from state $k$. The size of the incidence matrix is given by $|A|$, the number of states in $A$, plus the dummy links $d$ that represent the termination states at the destination. As the dummy links $d$ have no successors, the row $k = d$ is zero. Secondly, Fosgerau et al. [7] define a vector $z$ of size $|A| \times 1$ with $z_k = e^{\frac{1}{\mu} V(k)}$ and a vector $b$ of size $|A| \times 1$ with $b_k = 0$ for $k \neq d$ and $b_d = 1$. Given the identity matrix $I$, Fosgerau et al. [7] write the linear equation:
$$z = Mz + b \iff (I - M)\,z = b \tag{3.1}$$
This system has a solution if $I - M$ is invertible, which might not be the case. As Fosgerau et al. [7] note, this is highly dependent on the balance between the number of paths that connect the nodes in the network and the size of the instantaneous utilities $\frac{1}{\mu} v(a|k)$. They note that this issue is particularly important to consider when estimating a model because, depending on the value of $\beta$, $I - M$ can be ill-conditioned or even singular. Fosgerau et al. [7] note that this limits the possible values of the parameters: when equation 3.1 does not yield a valid solution for at least one observation, the log likelihood function is not defined. They suggest dealing with this issue by starting at a feasible point (meaning a large enough magnitude in the parameters) and then being conservative in the initial step size of the line search algorithm, at the price of an increased number of iterations.
Mai et al. [11] propose an improvement to Fosgerau et al. [7] that reduces the number of linear systems that need to be solved. By collecting all observed destinations in a matrix $b$ of size $|A| \times |D|$, it becomes possible to solve the problem in one pass instead of solving the system for each destination separately, which gives a 30-fold performance gain in their example. They use this performance gain to propose a mixed recursive logit, which allows for random taste variation by adding a random value to the utility function and evaluating the model for $n$ draws in each iteration. They perform a case study in two cities: a car route choice model in the Swedish city of Borlänge, with 466 destinations and 1832 observations, and a bicycle route choice model in Eugene, Oregon, with 286 destinations and an unknown number of observations.
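To make the linear-system formulation concrete, the following minimal sketch solves the system for a single destination on a toy network. It assumes the system has the form $(I - M) z = b$ with $M_{ka} = e^{v(a|k)/\mu}$, as in the recursive logit literature. The five-state network, the link lengths, the utility $v(a|k) = \beta \cdot \text{length}_a$ and the values of $\beta$ and $\mu$ are all hypothetical and chosen only for illustration; this is not the implementation used in this study.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("I - M is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

# Hypothetical toy network: states 0..3 are links, state 4 is the dummy link d.
successors = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
length = {0: 1.0, 1: 2.0, 2: 3.0, 3: 1.0, 4: 0.0}  # the dummy link has zero cost
beta, mu, d = -1.0, 1.0, 4                          # illustrative values only

n = len(successors)
# M[k][a] = exp(v(a|k) / mu) if a can follow k, else 0; here v(a|k) = beta * length[a].
M = [[math.exp(beta * length[a] / mu) if a in successors[k] else 0.0
      for a in range(n)] for k in range(n)]
b = [1.0 if k == d else 0.0 for k in range(n)]
I_minus_M = [[(1.0 if k == a else 0.0) - M[k][a] for a in range(n)]
             for k in range(n)]

z = solve_linear(I_minus_M, b)  # z[k] = exp(V(k) / mu)

# Link choice probabilities P(a|k) = M[k][a] * z[a] / z[k].
P = {k: {a: M[k][a] * z[a] / z[k] for a in successors[k]}
     for k in range(n) if successors[k]}
```

Because every entry of $z$ is the exponential of a value function, a usable solution must be strictly positive; a negative or zero entry signals that the log likelihood cannot be evaluated for that parameter value.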

Collecting data on bicycle movements
For this study we used the 2016 FietsTelweek ("Bicycle Counting Week") data set (Bikeprint [3]), which is available on their website. During the week of 19 September 2016, approximately 29,600 bicyclists volunteered to track their bicycle movements using a smartphone app. For this case study we limited the data to bicycle trips to and/or from the municipalities of Amsterdam, Diemen, Amstelveen and Ouder-Amstel, leaving around 29,684 trips.
This app ran in the background, collecting all movements by the bicyclists using the phone's GPS and acceleration sensors. The cyclists used their bikes in the way often seen in the Netherlands: as transportation to and from work, the supermarket, school, etc. For privacy reasons the resulting data was anonymized by the data provider before being made publicly available by (i) removing user information, making it impossible to trace multiple trips to a single person, (ii) rounding the trip departure time to the nearest hour, and (iii) removing a random length between 0 and 400 meters from the start and the end of each trip to obfuscate its true origin and destination.
In prior research based on this data we found in Koch et al. [10] that bicyclists in Amsterdam often deviate from the shortest path, more so than car drivers, indicating that different factors are at play in the route choice of bicyclists in Amsterdam. In Koch et al. [10] we focused on the concept of route complexity: counting the number of locations where people deviate from the shortest path, in the interest of improving route choice generation techniques and potentially gaining more insight into the motivations behind the route choices of bicyclists.

Generation of alternatives
To find out what kind of alternatives exist for each observed path, we applied synthetic route choice generation using the Double Stochastic Generation Function (DSGF) method described by Nielsen [15]. The DSGF approach produces heterogeneous routes because both the link costs and the parameters used in the cost function are drawn from probability distributions. This way it can generate random paths simply by computing shortest paths, since the cost of each route is based on random factors. Halldórsdóttir et al. [9] showed that DSGF achieves a high coverage level in replicating routes taken by bicyclists and that it performs well up to 10 kilometers. Furthermore, Bovy and Fiorenzo-Catalano [4] state that the method guarantees, with high probability, that attractive routes are included in the choice set while unattractive routes are left out.
We used an existing implementation of DSGF, specifically POSDAP by ETH Zurich [6], working on a street network provided by the data collection team of the FietsTelweek, which they imported from OpenStreetMap. We slightly modified POSDAP to execute at most a given number of M = 128 iterations (instead of running for a given duration) so that it behaves identically on different machines. For some origin-destination pairs POSDAP was not able to find as many as N 0 routes in M iterations, in which case we use all routes found. The choice sets are written to CSV files for further processing.
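The idea behind doubly stochastic generation can be sketched in a few lines: in each iteration, first draw the parameters of the cost function, then perturb every link cost, and finally run an ordinary shortest-path search. The sketch below is not POSDAP and makes no claim about its internals. The toy network, the uniform cycle-way discount, the log-normal link noise and the helper names `shortest_path` and `dsgf_choice_set` are all assumptions made for illustration.

```python
import heapq
import random

def shortest_path(graph, cost, origin, dest):
    """Dijkstra on {node: [successor, ...]} with positive cost[(u, v)]."""
    dist = {origin: 0.0}
    prev = {}
    done = set()
    heap = [(0.0, origin)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dest:
            break
        for v in graph[u]:
            nd = d + cost[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dest not in done:
        return None
    path = [dest]
    while path[-1] != origin:
        path.append(prev[path[-1]])
    return tuple(reversed(path))

def dsgf_choice_set(graph, length, cycleway, origin, dest, iterations=128, seed=0):
    """Collect the unique shortest paths found under randomly drawn costs."""
    rng = random.Random(seed)
    routes = set()
    for _ in range(iterations):
        # First stochastic layer: draw a cost-function parameter (assumed form).
        cycleway_factor = rng.uniform(0.5, 1.0)
        cost = {}
        for link, metres in length.items():
            base = metres * (cycleway_factor if cycleway[link] else 1.0)
            # Second stochastic layer: multiplicative noise on each link cost.
            cost[link] = base * rng.lognormvariate(0.0, 0.3)
        path = shortest_path(graph, cost, origin, dest)
        if path is not None:
            routes.add(path)
    return sorted(routes)

# Hypothetical four-node network with link lengths in meters.
graph = {"A": ["B", "C"], "B": ["D", "C"], "C": ["D"], "D": []}
length = {("A", "B"): 100.0, ("A", "C"): 120.0, ("B", "D"): 120.0,
          ("C", "D"): 100.0, ("B", "C"): 30.0}
cycleway = {("A", "B"): False, ("A", "C"): True, ("B", "D"): False,
            ("C", "D"): True, ("B", "C"): False}

routes = dsgf_choice_set(graph, length, cycleway, "A", "D", iterations=128, seed=1)
```

Fixing the number of iterations and the random seed, as in this sketch, is one way to obtain the same choice set on different machines.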

Environmental variables
To collect a set of variables that would reasonably impact route choice of bicyclists we collected and processed open data sources to compute various explanatory variables describing each route.
First of all, for each link in the network we include the length of that link as distance and, if that link is a dedicated cycle-way, we also include the length as oncycleway. Additionally we have a variable traveltime based on the length and an estimated speed derived from the GPS observations.
To include data about the environment of each link, we extracted information from data made openly available by the city of Amsterdam. Firstly, we pulled potentially relevant variables from a geographical data-set with land-use zones. To combine the street network with other relevant geographical data-sets, we cut each street link into small segments of 5 meters and determined the distance from each segment to a geographical feature in the land-use data-set. The variable nearwater measures the length of street situated close to water bodies such as the canals of Amsterdam, (small) lakes, rivers and other water bodies wider than 6 meters. To determine a preference for routes through parks and forests we did the same for the variable neargreen, which measures the length of street situated within a 25 meter radius of 'green' land used for parks, forests and meadows.
For a more fine-grained indication of the level of green and trees along a route, we used a data-set with the location of each individual tree in Amsterdam to determine what portion of each street segment is covered by trees. Our reasoning is that the number of trees influences route choice, as trees can provide shade on hot days and shelter against the wind in storm conditions. To determine the variable neartree we measured the length of street within 30 meters, to the left or right, of one or more trees. This way a street along a row of trees is counted for its full length. We chose the distance of 30 meters between road and tree based on various situations where rows of trees are situated along bicycle roads in Amsterdam.
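The segment-based measurement described above can be sketched as follows. This is a minimal illustration and not the processing pipeline used in this study: it assumes planar coordinates in meters, a straight link, trees given as points, and a hypothetical helper name `near_length`. Each link is cut into pieces of at most 5 meters, and a piece counts towards the variable when its midpoint lies within the given radius of at least one feature.

```python
import math

def near_length(start, end, points, radius, step=5.0):
    """Length of the straight link start-end whose segments lie near a point."""
    (x0, y0), (x1, y1) = start, end
    total = math.hypot(x1 - x0, y1 - y0)
    if total == 0.0:
        return 0.0
    n_segments = max(1, math.ceil(total / step))
    segment_length = total / n_segments
    covered = 0.0
    for i in range(n_segments):
        t = (i + 0.5) / n_segments  # relative position of the segment midpoint
        mx, my = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
        if any(math.hypot(mx - px, my - py) <= radius for px, py in points):
            covered += segment_length
    return covered

# Hypothetical example: a 100 m link with one tree 10 m to its side.
neartree = near_length((0.0, 0.0), (100.0, 0.0), [(50.0, 10.0)], radius=30.0)
```

The same function could be reused for the other proximity variables by swapping the point set and the radius; polygon features such as water bodies or parks would need a point-to-polygon distance instead of the point-to-point distance assumed here.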
To measure the effect of residential areas, the variable nearresidential measures the length of street in residential areas. The variable nearretail describes the length within areas designated for 'Shops, malls and hotels-restaurants-pubs', 'Public offices and services' and 'Cultural, social, medical, educational'.
To see if the vicinity of busy roads, a major source of noise and pollution, has any impact on route choice, we used a data-set with the noise contour map of road traffic in Amsterdam. This data-set is produced by a model that estimates the level of exposure to traffic noise. The map distinguishes four noise levels of at least 55, 60, 65 and 70 decibels respectively. The variables nearXdb represent the length of street passing through these exposure zones.
Based on the idea that tram lines in Amsterdam form radial arteries towards the heart of the city, we construct the variable neartram, indicating the portion of the route that is situated within 100 meters of tram rails to the left or right of the path, measured using segments of 10 meters.
Finally we wanted to see if the number and frequency of traffic signals has a measurable effect on route choice. We included this in two ways: first the exact number of traffic signals with ntrafficsignal and secondly the frequency of traffic signals trafficsignalfreq where the number of signals is divided by the length of the route.
Since Amsterdam has no elevation changes beyond the occasional bridge, we did not include any elevation changes as a variable.

Baseline Multinomial Logit Model
Based on MNL models with a single variable per model, we removed some variables that were highly correlated with each other. Firstly, we decided to include only the highest and lowest levels of traffic noise exposure. While the four variables were significant on their own, there was not much difference in the estimated coefficient values between 60 and 70 decibels. Secondly, we only included the absolute number of traffic signals, as the frequency of traffic signals per kilometer had a lower t-test score. The choice model was estimated using PandasBiogeme [2]; the estimation results are reported in Table 1.
The results of the choice model in Table 1 are in line with what we expected and with what has been seen in other studies. Bicyclists are prepared to travel longer in order to use nicer and safer routes. In the model we see very significant effects of dedicated cycle-way infrastructure and of parks, meadows and forests along the route. The effect of trees alone along the route is less significant, yet still positive. We also see the result we expected for noise exposure: a significant avoidance of very heavy traffic noise exposure of 70 decibels or higher, and a smaller yet still significant effect of noise exposure of 55 decibels or higher. The attraction we expected for routes along tram lines, which may serve as landmarks for navigation, is consistent with the positive beta. For the effects of land use along the route, we see a negative value for routes with more residential land use, likely because it is easier to navigate around residential neighbourhoods than to go through them. Bicyclists also seem to prefer routes along areas with retail land use.
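For readers less familiar with the baseline model, the sketch below shows how a multinomial logit turns route attributes into choice probabilities and a log likelihood. It is not the PandasBiogeme specification used for Table 1. The attribute values, the three-route choice set and the coefficients are hypothetical and chosen only to make the example runnable.

```python
import math

def mnl_probabilities(alternatives, beta):
    """Softmax over linear-in-parameters utilities, one per route alternative."""
    utilities = [sum(beta[name] * value for name, value in alt.items())
                 for alt in alternatives]
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(observations, beta):
    """Sum of log probabilities of the chosen alternative in each choice set."""
    return sum(math.log(mnl_probabilities(alternatives, beta)[chosen])
               for alternatives, chosen in observations)

# Hypothetical choice set: one observed route (index 1) and two generated ones.
choice_set = [
    {"distance": 3.0, "oncycleway": 1.0, "ntrafficsignal": 4},
    {"distance": 3.4, "oncycleway": 2.8, "ntrafficsignal": 2},
    {"distance": 3.1, "oncycleway": 0.5, "ntrafficsignal": 6},
]
beta = {"distance": -1.0, "oncycleway": 0.5, "ntrafficsignal": -0.1}  # illustrative

probs = mnl_probabilities(choice_set, beta)
ll = log_likelihood([(choice_set, 1)], beta)
```

Estimation then amounts to searching for the coefficients that maximize this log likelihood over all observed trips. Unlike the recursive logit system, the expression is defined for any finite coefficient values, whatever their sign.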

Recursive Logit Model Experiments
In this section we describe a series of experiments in modeling bicycle route choice in the city of Amsterdam using a recursive logit model with the observed bicycle routes in our data, environmental explanatory variables, and the Amsterdam street network, without generating route choice alternatives.

Recursive logit with environmental variables
Our initial attempt was to model the Amsterdam network with each intersection as a node and the streets as actions, following the example of Zimmermann and Frejinger [18]. This resulted in a network with approximately 46,000 links and 30,000 observations, which we carefully checked for full connectivity and the absence of isolated graphs. Our motivation for modeling intersections rather than links as states was to lower the total number of states to be modeled, under the assumption that turn angles might have a low influence on bicycle route choice in Amsterdam. We tested the recursive logit model with the five variables length, oncycleway, nearwater, neargreen and near55db. However, we were unable to get the solver to give plausible results for equation 3.1, as the solver would return incorrect results.
Fig. 1: Two fixed paths to the same destination along the boundaries of the graph (in red and green), plus an example of one randomly generated path (in blue). All paths start at the top left corner and end respectively at the large red/green circle and the large blue circle.
Subsequently we simplified the study area to just the Amsterdam city center area containing only about 4500 links, excluding the entire municipality and surrounding suburbs. Again we carefully controlled for full connectivity and no isolated graphs. This too did not lead to plausible estimation results.
Based on the remark by Fosgerau et al. [7] on dense networks and the number of alternative paths, our next action was to simplify the street network in the Amsterdam city center and remove all footpaths to reduce the complexity of the network. Again we carefully controlled for full connectivity and no isolated graphs. We accordingly also removed all observations of GPS trajectories cycling over footpaths. This too did not lead to plausible estimation results.
Finally, to bring our model closer to the studies in the literature, we created an edge-based network instead of the intersection-based network. In the adapted implementation, each state is a street segment and each action is a move to another street segment. This link-to-link approach makes it possible to create new boolean features that indicate turns, left turns and u-turns, similar to the Borlänge model in Fosgerau et al. [7] and Mai et al. [11]. For the entire city of Amsterdam this model contains 40063 links as states with 137724 transitions between states; for the city center area only, it consists of 4204 links as states with 15234 transitions.
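The conversion from an intersection-based to a link-based network can be sketched as follows. This is an illustrative construction and not the code used in this study: the coordinate convention (x to the east, y to the north), the 40 degree left-turn threshold and the helper name `link_transitions` are assumptions. Every directed link becomes a state, and every pair of consecutive links becomes a transition annotated with turn features.

```python
import math

def link_transitions(links, coords, left_turn_deg=40.0):
    """Enumerate link-to-link transitions with u-turn and left-turn flags."""
    outgoing = {}
    for u, v in links:
        outgoing.setdefault(u, []).append((u, v))
    transitions = []
    for u, v in links:
        for _, w in outgoing.get(v, []):
            heading_in = math.atan2(coords[v][1] - coords[u][1],
                                    coords[v][0] - coords[u][0])
            heading_out = math.atan2(coords[w][1] - coords[v][1],
                                     coords[w][0] - coords[v][0])
            angle = math.degrees(heading_out - heading_in)
            angle = (angle + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
            is_uturn = (w == u)
            # Counter-clockwise (positive) angles are left turns in this convention.
            is_left = (not is_uturn) and angle > left_turn_deg
            transitions.append(((u, v), (v, w), {"uturn": is_uturn, "left": is_left}))
    return transitions

# Hypothetical junction B with approaches from A and exits towards C and D.
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0), "D": (2.0, 0.0)}
links = [("A", "B"), ("B", "A"), ("B", "C"), ("B", "D")]

transitions = link_transitions(links, coords)
features = {(k, a): f for k, a, f in transitions}
```

The number of states in such a network equals the number of directed links, and the number of transitions grows with the number of exits at each junction, which is in line with the larger transition counts reported above.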

Discussion
Given our experience with the Amsterdam model, we highlight several challenges during the estimation of the recursive logit model. We reflect on why our initial plan for estimating the bicycle choice behavior in the entire city of Amsterdam with environmental variables was feasible with the baseline multinomial logit model, but faced numerical complications with the recursive logit model.

Negative reward formulation
In the original paper on recursive logit, Fosgerau et al. [7] mention that, in order to formulate the path choice problem as a dynamic discrete choice model in which the utility maximization problem is consistent with a dynamic programming problem, the deterministic utility component is required to have a negative value: $v_n(a|k) = v(x_{n,a|k}; \beta) < 0$.
As an experiment we set up a network based on a simple grid layout with 625 intersections, allowing the user to move left, right, up and down. There is one diagonal connection from the top left corner to the bottom right corner. We included each segment between the intersections as a single unit of distance. See Figure 1 for a visualization of a 10 by 10 grid. We set up four different variables: $\beta_{distance}$ for the unit distance, $\beta_{intersection}$ that counts each intersection passed, $\beta_{left}$ that counts each move towards the left side of the grid, and $\beta_{diagonal}$ that counts each diagonal move. We included two observations across the top and right of the grid, an observation across the left and bottom of the grid, and a series of 10 random observations that have a strong preference to move diagonally when possible. This model was estimated with a log likelihood of -10 and $\beta_{distance} = -1.54467129$, $\beta_{intersection} = -2.04467129$, $\beta_{diagonal} = -2.09161539$, $\beta_{left} = -81.34025$.
What we observed is that altering the attribute of a single link of this model so that the utility of that link becomes positive made the linear solver unable to return a valid solution, so that no log likelihood could be computed and no model could be estimated.
An implication is that, when using recursive logit, you should aim to include only costs in your utility function $u$. In practice this may turn out to be tricky, as cost variables may be correlated with reward variables that are not included in your model. For example, heavy traffic near a bicycle path may seem like a cost variable at first, but as such roads are likely equipped with street lights, in contrast to a path through a dark and empty park, such a variable may turn out to have a negative cost.
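The sensitivity to a positive link utility described above can be reproduced on a network small enough to solve by hand. The sketch below is a hypothetical three-state example, not the grid experiment itself: state 1 can move to state 2 or exit to the dummy destination $d$, and state 2 can only move back to state 1. Assuming the recursive logit system $z = Mz + b$, substituting $z_2 = M_{21} z_1$ and $z_d = 1$ gives $z_1 = M_{1d} / (1 - M_{12} M_{21})$, which is only a valid exponentiated value function when the loop term $M_{12} M_{21}$ is below 1.

```python
import math

def origin_z(v_forward, v_back, v_exit, mu=1.0):
    """Closed-form z at state 1 for the loop 1 -> 2 -> 1 with exit 1 -> d.

    Returns (z_1, valid); valid is False when the loop term reaches 1, in
    which case z_1 is negative or undefined and log(z_1) cannot be evaluated.
    """
    loop = math.exp(v_forward / mu) * math.exp(v_back / mu)
    if loop == 1.0:
        return float("nan"), False  # I - M is exactly singular
    z_1 = math.exp(v_exit / mu) / (1.0 - loop)
    return z_1, loop < 1.0

# All utilities negative (pure costs): the solution is positive and usable.
z_cost, ok_cost = origin_z(v_forward=-1.0, v_back=-1.0, v_exit=-1.0)

# One link given a positive utility (a reward): the solution turns negative.
z_reward, ok_reward = origin_z(v_forward=-1.0, v_back=2.0, v_exit=-1.0)
```

In this toy case a small positive utility on one link would still leave the loop term below 1, so the sketch only illustrates the mechanism: once the utility accumulated around any cycle is non-negative, traversing the cycle forever becomes attractive and the system no longer has a valid solution. In a dense network with many cycles this threshold may be reached much sooner.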

Valid initial parameters and length of observations
To take a closer look at how difficult it can be to determine a valid initial parameter prior to the iterative solution of the system, we proceeded to look solely at travel time, without any other features in the model. To do so, we manually computed the log-likelihood function for values of the $\beta_{travel-time}$ parameter in the range between -1 and -25. We saw that a valid log likelihood exists only in a small window of $\beta_{travel-time}$ between approximately -18.02 and -21.01. For magnitudes $|\beta_{travel-time}| \le 18$ the equation system would return an invalid sign for the log likelihood; for magnitudes $|\beta_{travel-time}| \ge 21.01$ at least one of the observations would return $\exp(V) = 0$ at the starting value.
This narrow range was achieved with the number of links in each observation limited to 40. If we allowed observations with more links, we were unable to find any window of initial parameters in which the log likelihood function is valid. We see numerical issues as the root cause of this. As a long recursion sums the utility of each link, and the exponential turns this sum into extreme values, we expect these results to be caused by overflows and underflows in the solver.
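The underflow mechanism suspected above can be illustrated with a single chain of links without alternatives, for which the exponentiated value at the origin is simply $\exp(\beta \sum_i c_i / \mu)$. The sketch below is a hypothetical illustration in double precision: the per-link travel time of 0.5 and the value $\beta = -20$ are arbitrary, are not taken from the Amsterdam data, and are not meant to reproduce the limit of 40 links.

```python
import math

def origin_z(beta, link_costs, mu=1.0):
    """exp(V / mu) at the origin of a chain of links with no alternatives."""
    return math.exp(beta * sum(link_costs) / mu)

beta = -20.0           # illustrative parameter value
cost_per_link = 0.5    # hypothetical travel time per link

# 74 links: exp(-740) is still representable, as a subnormal double.
z_74 = origin_z(beta, [cost_per_link] * 74)

# 75 links: exp(-750) silently underflows to exactly 0.0.
z_75 = origin_z(beta, [cost_per_link] * 75)

# The log likelihood needs log(z), which is undefined once z has underflowed.
log_defined_74 = z_74 > 0.0
log_defined_75 = z_75 > 0.0
```

Under these assumptions, one additional link is enough to move an observation from a finite log likelihood to an undefined one. A larger parameter magnitude or a longer route has the same effect, which would be consistent with the valid window narrowing as longer observations are admitted.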

The distribution of values of features and network degree centrality
Subsequently we attempted a similar experiment with the only feature in the model being $\beta_{length}$, which is correlated with $\beta_{travel-time}$. We were unable to find an exact value of $\beta_{length}$ that is valid, but deduce that it lies somewhere between -413.6 and -413.7, based on where the solver returns a valid solution but $\exp(V) = 0$. To look at the difference between both variables we refer to the histograms of the distributions plotted in Figure 2. Based on the descriptive statistics we

The number of alternative choice options
Another difference with existing studies in the literature is that, due to the complexity of the bicycle infrastructure in Amsterdam, the number of possible options is higher than we would see in car route choice, or in a city without cycle-paths on both sides of major roads or roads in both directions (for cyclists) along the canals.

Conclusion
Recursive logit is a promising solution for inverse reinforcement learning on specific route choice problems. However, when designing your model and variables it is very important to keep the limitations of the linear equation system in mind. These limitations can make it impossible to estimate your model or can lead to wrong estimates.
As recursive logit may fail to converge if even a single link has a (high) reward instead of a cost, it is important to think through whether your variables are always costs for all links in the network. This can be hard in practice, as assumptions can be deceiving. For example, you might model a bridge as a cost because there is a small slope involved, whereas in reality people might prefer a route over a bridge as a sightseeing opportunity. Furthermore, preferences can differ by person or vary over the time of day. For example, a park might be a beneficial detour during the day, but at night an empty, badly lit park that feels unsafe might be worth a detour around it instead.

Recommendations
For future study we are interested in the precise computational details that lead to the invalid estimates by the solver when faced with numerical overflow and underflow issues.
We are also looking into how well extensions of algorithms based on the maximum entropy IRL of Ziebart et al. [17] will function with the Amsterdam bicycle network, given the successful implementation of inverse reinforcement learning for bicycle paths in the work by Mo [14]. However, that study noted similar limitations regarding the size of the state space.
In our initial experiments with implementing the maximum entropy IRL of Ziebart et al. [17] on even the simplified problem of the Amsterdam city center, with observations of fewer than 30 links, we encountered overflows when calculating the local action probabilities, as the expected reward for each state grew exponentially even when applying a discount factor.
Our recommendation is to search for a way to dramatically simplify the state space by finding ways to abstract the decision process. One possibility would be to apply principles of path complexity such as those in Koch et al. [10].