1 Introduction

Improvements in public transport systems have the potential to improve the lives of millions. In the sustainable development goals of the United Nations Development Programme, goal 11 (“Sustainable Cities and Communities”) states the need to “provide access to safe, affordable, accessible and sustainable transport systems for all, improving road safety, notably by expanding public transport” [1]. Operations Research can help achieve these goals by making better use of the existing infrastructure.

One part of the process is to design timetables that allow passengers short travel times. Since the introduction of the periodic event scheduling problem (PESP) in the late 1980s (see [2]), optimizing periodic timetables has been a major challenge. In this model, we assume that services are offered periodically (e.g., there is a train departure of a specific line from a station at 20 min past the hour throughout the day). Periodic timetabling models are surprisingly flexible in the aspects that can be included [3], and have been driving algorithmic development through the theoretical (see, e.g., [4, 5]) and computational challenges that they pose (see, e.g., [6,7,8,9,10,11]). Algorithmic development requires instances that allow the research community to compare performance results. The PESPlib benchmark set [12] has been offering such an opportunity in the case of pure periodic timetabling problems.

Recent research has increasingly gone beyond the possibilities of PESP by looking at more integrated models for public transport [13,14,15]. This approach is particularly natural in the case of periodic timetabling: to find a good timetable, we need to know the routes that the passengers wish to use, but passenger routes depend on the timetable. In classic timetabling problems, we simply fix a passenger routing that is calculated heuristically. This can lead to solutions that are far from optimal, see [16, 17], even if we follow an iterative timetabling and passenger routing strategy; see, for example, the discussion in [18]. For this reason, integrated models that optimize both the timetable and the passenger routes simultaneously need to be considered. Solution methods for integrated models have been discussed in many recent publications, see, e.g., [17, 19,20,21,22,23,24].

This creates a demand for new benchmark instances that allow researchers to compare the results achieved by different solution methods. The purpose of this paper is to address this demand by introducing TimPassLib [25], a new, freely available benchmark set of instances for the integrated periodic timetabling and passenger routing problem.

The remainder of this paper is structured as follows. In Sect. 2, we give a formal definition of the integrated problem that we study. The problem instances that constitute TimPassLib are described in Sect. 3. We conclude our work in Sect. 4 and describe the data format in Appendix A.

2 Problem Definition

We first introduce the classic periodic timetabling problem with fixed passenger routes, which is based on the periodic event scheduling problem [2]. We consider a set of events \({\mathcal {E}}\). For each event \(i\in {\mathcal {E}}\), we would like to schedule a time \(\pi _i\) in the discrete interval \(\{0,\ldots ,T-1\}\), where T denotes the period length. The schedule is repeated every T time units. In timetabling, typical events are the arrival or the departure of a train at a station. Additionally, there is a set of activities \({\mathcal {A}}\subseteq {\mathcal {E}}\times {\mathcal {E}}\) that connect events with each other. Such an activity may model a train driving from one station to the next, a train waiting at a station for passengers to board or alight, or a group of passengers changing from the arrival of one train to the departure of another train. Another class of activities, which is not related to passengers, can model headways between trains or the synchronization of lines running more frequently than once every T time units. Events and activities together form the event-activity network \(\mathcal {N} = ({\mathcal {E}},{\mathcal {A}})\). Each activity \(a\in {\mathcal {A}}\) has a lower bound \(\ell _a\in \mathbb {N}\) and an upper bound \(u_a\in \mathbb {N}\) that reflect requirements on the minimum and maximum duration of the activity. We denote by \(y_a\) the slack of an activity, i.e., the difference between its duration and its lower bound. Finally, we assume that a weight \(w_a\) is known for each activity, which represents the number of passengers who wish to use this activity.
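To make the ingredients of the event-activity network concrete, the following Python sketch stores events, activities with bounds \(\ell_a \le u_a\) and weights \(w_a\), and the period length T. The class names, field names and kind labels are hypothetical choices for illustration only; they are not the TimPassLib file format, which is described in Appendix A.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """A periodic event, e.g., the departure or arrival of a line at a station."""
    id: int
    station: str
    kind: str  # hypothetical labels such as "departure" or "arrival"


@dataclass(frozen=True)
class Activity:
    """A directed activity a = (i, j) with bounds l_a <= u_a and passenger weight w_a."""
    source: int
    target: int
    kind: str  # hypothetical labels: "drive", "wait", "change", "headway", "sync"
    lower: int
    upper: int
    weight: float = 0.0


@dataclass
class EventActivityNetwork:
    """The network N = (E, A) together with the period length T."""
    period: int
    events: dict[int, Event] = field(default_factory=dict)
    activities: list[Activity] = field(default_factory=list)

    def add_event(self, event: Event) -> None:
        self.events[event.id] = event

    def add_activity(self, activity: Activity) -> None:
        # A is a subset of E x E, so both endpoints must already be events.
        if activity.source not in self.events or activity.target not in self.events:
            raise KeyError("activity endpoints must be existing events")
        if not 0 <= activity.lower <= activity.upper:
            raise ValueError("bounds must satisfy 0 <= lower <= upper")
        self.activities.append(activity)


# A tiny illustrative network (T = 60): a drive from A to B, a dwell at B,
# and a change at B to a second line departing from B.
ean = EventActivityNetwork(period=60)
for e in (Event(1, "A", "departure"), Event(2, "B", "arrival"),
          Event(3, "B", "departure"), Event(4, "B", "departure")):
    ean.add_event(e)
ean.add_activity(Activity(1, 2, "drive", lower=8, upper=12, weight=120))
ean.add_activity(Activity(2, 3, "wait", lower=1, upper=5, weight=80))
ean.add_activity(Activity(2, 4, "change", lower=3, upper=62, weight=40))
```

The change activity in the example has \(u_a - \ell_a = T - 1\), so any periodic transfer time of at least 3 is admissible.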

The periodic timetabling problem can then be formulated as the following optimization problem:

$$\begin{array}{llrlr} &\text{min} &\sum\limits_{a\in\mathcal{A}} w_a (y_a + \ell_a) \\ &\text{s.t. } & y_a &= [\pi_j - \pi_i - \ell_a]_T & \forall\, a=(i,j)\in\mathcal{A}, \\ & & 0 \le y_a &\le u_a - \ell_a & \forall\, a\in\mathcal{A}, \\ & & \pi_i &\in \{0,\ldots,T-1\} & \forall\, i\in\mathcal{E}.\end{array}$$
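For a fixed timetable \(\pi\), the slacks \(y_a\) and the objective value of the problem above can be evaluated directly. The following Python sketch assumes, purely for illustration, that activities are given as tuples \((i, j, \ell_a, u_a, w_a)\) and the timetable as a dictionary from events to times; the function names are hypothetical.

```python
def periodic_slack(pi_i: int, pi_j: int, lower: int, period: int) -> int:
    """y_a = [pi_j - pi_i - l_a]_T; Python's % with a positive period is non-negative."""
    return (pi_j - pi_i - lower) % period


def pesp_objective(activities, pi, period):
    """Objective sum_a w_a (y_a + l_a) of a periodic timetable with fixed routing.

    activities: iterable of (i, j, lower, upper, weight) tuples
    pi:         dict mapping each event to its time in {0, ..., T-1}
    """
    if any(not 0 <= t < period for t in pi.values()):
        raise ValueError("event times must lie in {0, ..., T-1}")
    total = 0
    for i, j, lower, upper, weight in activities:
        y = periodic_slack(pi[i], pi[j], lower, period)
        if y > upper - lower:  # y >= 0 holds automatically
            raise ValueError(f"activity ({i}, {j}) exceeds its upper bound")
        total += weight * (y + lower)
    return total


T = 60
pi = {1: 0, 2: 7, 3: 9}
activities = [
    (1, 2, 5, 10, 100),  # drive: duration 7
    (2, 3, 1, 3, 80),    # wait: duration 2
    (3, 1, 50, 59, 10),  # wraps around the period: duration 51
]
print(pesp_objective(activities, pi, T))  # 100*7 + 80*2 + 10*51 = 1370
```

The third activity shows why the modulo bracket is needed: event 1 is scheduled "before" event 3 within the period, yet the activity from 3 to 1 has a well-defined duration of 51 once it wraps into the next period.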

Solution approaches based on mixed-integer programming typically focus on one of two sets of variables: either the timetable variables \(\pi _i\) (node-based formulations) or the slack variables \(y_a\) (cycle-based formulations) [26]. The symbol \([\cdot ]_T\) denotes the modulo bracket \([x]_T = \min \{ x + zT: x+zT \ge 0,\ z\in \mathbb {Z}\}\) and is usually modeled by introducing additional integer variables.
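The modulo bracket can be checked directly against its definition. The Python sketch below (the function name is hypothetical) computes the smallest admissible integer z, namely \(z = \lceil -x/T \rceil\), and confirms that the result coincides with the non-negative remainder \(x \bmod T\), which in Python is `x % T` for T > 0. In MIP formulations, this z is typically the integer variable that is introduced explicitly for each activity.

```python
def modulo_bracket(x: int, T: int) -> int:
    """[x]_T = min{x + zT : x + zT >= 0, z integer}, following the definition."""
    # The smallest admissible z is ceil(-x / T), which equals -(x // T) for T > 0.
    z = -(x // T)
    return x + z * T


# The bracket agrees with the non-negative remainder x mod T (x % T in Python).
for x in (-125, -60, -59, 0, 7, 60, 125):
    assert modulo_bracket(x, 60) == x % 60
print([modulo_bracket(x, 60) for x in (-59, 125)])  # [1, 5]
```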

In the integrated periodic timetabling and passenger routing problem, we model the activity weights \(w_a\) in more detail. We assume that an origin–destination (OD) matrix is given, where every entry \(d_{st}\) denotes the number of passengers who wish to travel from origin station \(s\in V\) to destination station \(t\in V\), and V denotes the set of all stations. Note that multiple events in the event-activity network can be assigned to the same station. Let \(P_{st}\) denote the set of all simple paths in \(\mathcal {N}\) that connect an event corresponding to station s with an event corresponding to station t and use only drive, wait, or change activities as described above. For each path \(p\in P_{st}\), let \(f_p\) denote the fraction of passengers that travel from s to t along p. This means that for each OD pair \((s,t)\) with \(d_{st} > 0\), we require

$$\begin{aligned} \sum _{p\in P_{st}} f_p = 1. \end{aligned}$$

The passenger weights \(w_a\) for each activity \(a\in {\mathcal {A}}\) are then determined as

$$\begin{aligned} w_a = \sum _{(s,t) \in V\times V:d_{st} > 0} \sum _{p\in P_{st}: a \in p} d_{st} f_p\quad \forall a \in {\mathcal {A}}. \end{aligned}$$
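The weight formula simply accumulates, for every activity, the demand share of all paths that contain it. A minimal Python sketch, assuming that paths are given explicitly as tuples of (hypothetical) activity identifiers together with their fractions \(f_p\):

```python
from collections import defaultdict


def activity_weights(demand, paths, fractions, tol=1e-9):
    """w_a = sum over OD pairs (s, t) and paths p containing a of d_st * f_p.

    demand:    {(s, t): d_st}
    paths:     {(s, t): [p_1, p_2, ...]}, each path a tuple of activity ids
    fractions: {(s, t): [f_1, f_2, ...]}, aligned with paths[(s, t)]
    """
    w = defaultdict(float)
    for od, d in demand.items():
        if d <= 0:
            continue  # only OD pairs with d_st > 0 are routed
        if abs(sum(fractions[od]) - 1.0) > tol:
            raise ValueError(f"fractions of OD pair {od} must sum to 1")
        for path, f in zip(paths[od], fractions[od]):
            for a in path:
                w[a] += d * f
    return dict(w)


demand = {("A", "C"): 100, ("A", "B"): 40}
paths = {
    ("A", "C"): [("drive_AB", "wait_B", "drive_BC"),
                 ("drive_AB", "change_B", "drive2_BC")],
    ("A", "B"): [("drive_AB",)],
}
fractions = {("A", "C"): [0.75, 0.25], ("A", "B"): [1.0]}
print(activity_weights(demand, paths, fractions))
```

Here the drive from A to B is shared by both OD pairs and receives weight \(100 \cdot 0.75 + 100 \cdot 0.25 + 40 = 140\).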

From a passenger’s perspective, travel time is not the only criterion for choosing a connection. Another frequently considered criterion is the number of transfers along a journey [27]. This can be included in the model by adding a penalty whenever passengers use a change activity, which represents the discomfort of a transfer compared to a direct connection. Let c denote this penalty value and let \({\mathcal {A}}_{\text{change} }\subseteq {\mathcal {A}}\) denote the set of change activities in the event-activity network. The objective function is extended by the term

$$\begin{aligned} \sum _{a\in {\mathcal {A}}_{\text{change} }} cw_a \end{aligned}$$

to include change penalties.
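With the penalty term, every passenger on a change activity is charged c additional time units on top of the activity duration. A small Python sketch of the extended objective, assuming hypothetical dictionaries that map each activity to its kind and lower bound, its slack, and its weight:

```python
def integrated_objective(activities, slack, weight, change_penalty):
    """sum_a w_a (y_a + l_a) + sum_{a in A_change} c * w_a.

    activities: {a: (kind, lower)}; slack: {a: y_a}; weight: {a: w_a}
    """
    travel_time = sum(weight[a] * (slack[a] + lower)
                      for a, (_kind, lower) in activities.items())
    penalty = sum(change_penalty * weight[a]
                  for a, (kind, _lower) in activities.items() if kind == "change")
    return travel_time + penalty


activities = {"drive": ("drive", 5), "change": ("change", 3)}
slack = {"drive": 2, "change": 4}
weight = {"drive": 100, "change": 20}
# travel time 100*7 + 20*7 = 840, penalty 5*20 = 100
print(integrated_objective(activities, slack, weight, change_penalty=5))
```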

The integrated periodic timetabling and passenger routing problem is now to find both the timetable \((\pi ,y)\) and the passenger routes and weights \((f,w)\) simultaneously. The resulting model is non-linear: since the weights \(w_a\) depend linearly on the route variables \(f_p\), the objective contains the bilinear terms \(w_a y_a\). Observe that we can assume the variables \(f_p\) to be binary: since the model does not contain capacity constraints, there is always an optimal solution in which all passengers of an OD pair are routed along a single shortest path. Hence, the overall model for our problem can be summarized as follows [16, 18, 21, 28]:

$$\begin{array}{lrll} \text{min} & \sum\limits_{a\in\mathcal{A}} w_a(y_a + \ell_a) & + \sum\limits_{a\in\mathcal{A}_{\text{change}}} c\,w_a & \\ \text{s.t.} & y_a & = [\pi_j - \pi_i - \ell_a]_T & \forall\, a=(i,j)\in\mathcal{A}, \\ & \sum\limits_{p\in P_{st}} f_p & = 1 & \forall\, (s,t) \in V\times V: d_{st} > 0, \\ & w_a & = \sum\limits_{\substack{(s,t)\in V\times V:\\ d_{st}>0}} \sum\limits_{\substack{p\in P_{st}:\\ a\in p}} d_{st} f_p & \forall\, a\in\mathcal{A}, \\ & 0 \le y_a & \le u_a - \ell_a & \forall\, a\in\mathcal{A}, \\ & \pi_i & \in \{0,\ldots,T-1\} & \forall\, i\in\mathcal{E}, \\ & f_p & \in \{0,1\} & \forall\, (s,t)\in V\times V: d_{st}>0,\ p\in P_{st}. \end{array} \tag{$\star$}$$
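Once the timetable \((\pi, y)\) is fixed, the remaining routing problem in (\(\star\)) decomposes into independent shortest-path problems, one per OD pair, with activity cost \(y_a + \ell_a\) plus c on change activities. The Python sketch below illustrates this with a multi-source Dijkstra search that starts from every event of the origin station and stops at the first settled event of the destination station. It is an assumption-laden illustration (dictionary inputs, hypothetical kind labels and function names), not the TimPassLib verification script.

```python
import heapq
from collections import defaultdict

# Only these (hypothetical) activity kinds may appear on passenger paths;
# headway or synchronization activities are ignored during routing.
PASSENGER_KINDS = {"drive", "wait", "change"}


def route_passengers(events, activities, demand, duration, change_penalty):
    """Route every OD pair on a shortest path for a fixed timetable.

    events:     {event_id: station}
    activities: {a: (i, j, kind)}
    demand:     {(s, t): d_st}
    duration:   {a: y_a + l_a}, the periodic duration under the fixed timetable
    Returns (objective, {(s, t): [activities on the chosen path]}).
    """
    outgoing = defaultdict(list)
    for a, (i, j, kind) in activities.items():
        if kind in PASSENGER_KINDS:
            cost = duration[a] + (change_penalty if kind == "change" else 0)
            outgoing[i].append((j, cost, a))

    objective, routes = 0, {}
    for (s, t), d in demand.items():
        if d <= 0:
            continue
        # Multi-source Dijkstra: every event at station s is a possible start.
        dist = {e: 0 for e, station in events.items() if station == s}
        heap = [(0, e) for e in dist]
        heapq.heapify(heap)
        pred, settled, target = {}, set(), None
        while heap:
            du, u = heapq.heappop(heap)
            if u in settled:
                continue
            settled.add(u)
            if events[u] == t:  # first settled event at t is the closest one
                target = u
                break
            for v, cost, a in outgoing[u]:
                if du + cost < dist.get(v, float("inf")):
                    dist[v] = du + cost
                    pred[v] = (u, a)
                    heapq.heappush(heap, (dist[v], v))
        if target is None:
            raise ValueError(f"no passenger path from {s} to {t}")
        path, node = [], target
        while node in pred:  # walk back to a source event
            node, a = pred[node]
            path.append(a)
        routes[(s, t)] = path[::-1]
        objective += d * dist[target]
    return objective, routes


events = {1: "A", 2: "B", 3: "B", 4: "C", 5: "B", 6: "C"}
activities = {
    "d1": (1, 2, "drive"), "w1": (2, 3, "wait"), "d2": (3, 4, "drive"),
    "c1": (2, 5, "change"), "d3": (5, 6, "drive"),
    "h1": (3, 5, "headway"),  # not usable by passengers
}
duration = {"d1": 10, "w1": 2, "d2": 16, "c1": 4, "d3": 8, "h1": 3}
demand = {("A", "C"): 100, ("A", "B"): 50}
print(route_passengers(events, activities, demand, duration, change_penalty=10))
```

In this example, a change penalty of 10 makes the direct route from A to C (duration 28) cheaper than the transfer route (10 + 4 + 10 + 8 = 32), whereas with a penalty of 0 the transfer route (22) wins. Passing the lower bounds \(\ell_a\) as durations instead of \(y_a + \ell_a\) yields the kind of shortest-path routing that underlies the lower bound reported for the data sets.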

3 Data Sets

All benchmark data sets presented in this paper are available at https://timpasslib.net [25]. The file format is described in Appendix A. The website collects the best known solutions and bounds for each data set. A potential new solution can be checked by means of a verification script written in Python, which also computes an optimal routing for a given timetable.

In the remainder of this section, we briefly describe the TimPassLib data sets. All instances are based on a set of lines, where we interpret a line as a sequence of stations without repetitions. Table 1 presents an overview of some key features of the instances. The columns contain the following information:

instance:

The name of the instance. More information can be found in the following subsections.

stations:

The number of stations.

lines:

The number of operated lines. Note that a line can have a frequency higher than one.

OD pairs:

The number of OD pairs with \(d_{st}>0\).

OD total:

The total number of passengers.

events:

The number of events in the event-activity network.

activities total:

The total number of activities in the event-activity network.

activities fixed:

The number of activities in the event-activity network with \(\ell _a=u_a\).

activities free:

The number of activities in the event-activity network with \(u_a-\ell _a=T-1\).

activities restricted (restr.):

The number of activities in the event-activity network with \(\ell _a<u_a<\ell _a+T-1\).

reference objective (ref. obj.):

The objective value w.r.t. (\(\star\)) of the best timetable computed by the concurrent PESP solver of [7] within a wall-time limit of one hour. The computations have been executed on an Intel Xeon CPU E3-1270 v6 running at 3.80 GHz with 32 GB RAM using Gurobi 9.5.2 [29] as MIP solver. The weights for the periodic timetabling problem have been obtained from a passenger routing with respect to the lower bounds \(\ell _a\) and the change penalty c. The aim of this reference objective is to provide initial solutions that were all computed with the same computational resources.

lower bound:

A lower bound on the objective value of (\(\star\)) obtained by routing all passengers on shortest paths w.r.t. lower activity bounds \(\ell _a\) and change penalty c.

gap:

The gap per passenger between reference objective and lower bound:

$$\begin{aligned} \frac{\text{reference objective} - \text{lower bound}}{\text{OD total}}. \end{aligned}$$

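Since every feasible solution has an objective value of at least the lower bound, this gap bounds the possible improvement per passenger over the reference solution and can thus be used to estimate the potential gains of solving the integrated problem.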
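The activity categories and the gap per passenger defined above are simple to compute. A minimal Python sketch, with hypothetical function names and made-up numbers that are not taken from Table 1:

```python
from collections import Counter


def classify_activity(lower: int, upper: int, period: int) -> str:
    """Category of an activity: fixed, free or restricted."""
    span = upper - lower
    if span == 0:
        return "fixed"
    if span == period - 1:
        return "free"
    if 0 < span < period - 1:
        return "restricted"
    return "other"  # e.g., span >= T, which the three categories do not cover


def gap_per_passenger(reference_objective: float, lower_bound: float,
                      od_total: float) -> float:
    """(reference objective - lower bound) / OD total."""
    return (reference_objective - lower_bound) / od_total


bounds = [(5, 5), (1, 60), (2, 10), (3, 4)]  # hypothetical (l_a, u_a) pairs, T = 60
print(Counter(classify_activity(l, u, 60) for l, u in bounds))
print(gap_per_passenger(1_000_000, 900_000, 50_000))  # 2.0
```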

Table 1 Overview of key features for the data sets

3.1 Hamburg

The instance Hamburg models the suburban commuter rail network of S-Bahn Hamburg. The infrastructure is mostly independent from other railways and has only a few single-track sections. The network is operated with six lines, where one of the lines has two branches. The period length is 10, as all lines run every 10 min during rush hour. Bounds for travel and dwell times are derived from the annual timetable for 2023. The passenger demand is based on data from trains with sensors for automatic passenger counts, which is publicly available on the open data portal of Deutsche Bahn AG [30]. We solve a linear program that fits this passenger flow to a gravity model derived from the total number of passengers boarding and alighting at each station.

3.2 Schweiz_Fernverkehr

The instance Schweiz_Fernverkehr is an excerpt of the ICE, TGV, InterCity and InterRegio trains within Switzerland and contains 80 lines. The period length is 120 min. The data is based on GTFS timetable data for 2023 and on station passenger counts, which are both publicly available on the open data platform of the Swiss Federal Office of Transport [31]. We use a gravity model based on geographical distances to sample an OD matrix.
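The text does not specify the form of the gravity model. As a generic illustration only, the Python sketch below assumes a power-law distance deterrence, \(d_{st} \propto P_s P_t / \mathrm{dist}(s,t)^{\beta}\), where \(P_s\) is a station passenger count, planar coordinates in place of true geographical distances, and independent sampling of a fixed total number of trips. All function names, the exponent \(\beta\), and the example stations, counts, and coordinates are hypothetical; this is not necessarily the procedure used to build the instance.

```python
import math
import random
from collections import Counter


def gravity_weights(counts, coords, beta=2.0):
    """Unnormalized weights counts[s] * counts[t] / dist(s, t) ** beta for s != t.

    counts: {station: passenger count}; coords: {station: (x, y)} in planar units.
    Stations are assumed to have pairwise distinct coordinates.
    """
    weights = {}
    for s, count_s in counts.items():
        for t, count_t in counts.items():
            if s != t:
                distance = math.dist(coords[s], coords[t])
                weights[(s, t)] = count_s * count_t / distance ** beta
    return weights


def sample_od_matrix(counts, coords, total_trips, beta=2.0, seed=0):
    """Draw total_trips trips independently, with probability proportional to the weights."""
    weights = gravity_weights(counts, coords, beta)
    pairs = list(weights)
    rng = random.Random(seed)
    trips = rng.choices(pairs, weights=[weights[p] for p in pairs], k=total_trips)
    return dict(Counter(trips))


# Hypothetical stations X, Y, Z with made-up counts and coordinates (km).
counts = {"X": 300, "Y": 200, "Z": 100}
coords = {"X": (0.0, 0.0), "Y": (10.0, 0.0), "Z": (0.0, 40.0)}
od = sample_od_matrix(counts, coords, total_trips=1000)
print(od)
```

A fixed seed makes the sampled OD matrix reproducible, which matters when the result is published as a benchmark.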

3.3 toy and toy_2

The instances toy and toy_2 are based on a small artificial data set in the software library LinTim [32]. The line concepts are generated algorithmically and consist of two and six lines, respectively. The period length is 60. Note that the same instances are used in [17].

3.4 regional

The instance regional is based on the regional train network in Lower Saxony, Germany, and is available as part of the software library LinTim [32]. The line concept is generated algorithmically and consists of 8 lines. The period length is 60. Note that the same instance is used in [17].

3.5 grid

The instance grid is a benchmark data set, originally introduced in [33] and available as part of the open source data set [34] as well as the software library LinTim [32]. The line concept is generated algorithmically and consists of 8 lines. The period length is 60. Note that the same instance is used in [17].

3.6 long-distance

The instance long-distance is inspired by the long-distance train network in Germany and is part of the developer version of the software library LinTim [32]. The line concept is generated algorithmically and consists of 42 lines. The period length is 60. Note that the same instance is used in [17].

3.7 metro

The instance metro is based on the metro system in Athens, Greece, and is available as part of the software library LinTim [32]. The line concept consists of 4 lines. The timetable is planned in increments of 6 s, and the period length is 150 such time units, i.e., 900 s = 15 min. Note that the same instance is used in [17].

3.8 Erding_NDP_S020 and Erding_NDP_S021

The instances Erding_NDP_S020 and Erding_NDP_S021 are based on the transport supply in Erding, Germany, and are available as part of an open source data set [34]. The corresponding line concepts consist of 21 lines and the period length is 60.

3.9 Stuttgart

The instance Stuttgart is based on the transport supply in Stuttgart, Germany, and is available as part of an open source data set [34]. The corresponding line concept consists of 156 lines and the period length is 3600, i.e., the timetable is planned in seconds.

3.10 RxLy

This group consists of 16 instances, named R1L1 to R4L4. They extend the 16 core instances of PESPlib [12], which in turn are based on long-distance train network data for Germany from the software library LinTim [32]. The number of lines ranges from 54 to 133, and the period length is 60.

4 Outlook

Integrating passenger routing into periodic timetabling is a natural step to resolve the mutual interdependence of both problems. For more than ten years, the PESPlib library [12] has stimulated a variety of research on periodic timetabling with fixed routing. However, recent improvements on its instances have been only marginal; moreover, since they disregard passenger flows, they do not necessarily improve passenger comfort. With the initiation of TimPassLib, we hope to foster research on periodic timetabling with passenger routing, to extend the scope of PESPlib by a solid base of benchmark instances for the integrated problem, and to contribute to the attractiveness of public transport with mathematical optimization methods. We therefore look forward to submissions of new solutions or lower bounds as well as new benchmark instances. Furthermore, we are considering extending TimPassLib with more optional features in the future, including vehicle capacities and different routing models.