Estimating the external costs of travel on GPS tracks

Providing people with information on the external costs of their mobility, including generated emissions and congestion, has been shown to influence travel behaviour. We present a methodology for estimating transport externalities at the link level on GPS data, with the aim of providing more accurate estimates of external costs. Emission values for various pollutants are calculated for each link using the HBEFA database, accounting for vehicle type, road category and traffic conditions. For congestion, average link delays are computed for each hour of the day. We apply this methodology to GPS trips collected during a national mobility pricing study. The methodology is validated against the Swiss national estimates. The range of per-kilometer values for various external costs is compared to the Swiss norm values. The observed trip-level heterogeneity in per-kilometer external costs supports the hypothesis that using average values in the analysis of external costs is insufficient.


Introduction
It is increasingly recognized that both the environmental and social costs of travel need to be internalized to manage the demand on already strained transport networks by encouraging shifts in travel patterns. In this direction, there is a growing body of evidence that informal feedback on energy use can encourage more efficient behaviour, both regarding home energy use (Faruqui et al., 2010) and travel behaviour (Taniguchi et al., 2003; Fujii and Taniguchi, 2006). However, providing accurate and individualized feedback on external costs in transport is particularly challenging, primarily due to the heterogeneous nature of travel behaviour and the difficulties inherent in data collection at the individual level.
The main external costs of transportation can be divided into two groups: those that affect other users in the network, namely congestion and accident risks, and those that affect people outside the system, such as noise and emissions (Button, 2004). These two categories are called intra- and inter-sectoral, respectively. The impact of congestion is primarily the loss of time spent waiting or slowed down in traffic, whereas emissions and noise have both environmental and health consequences.
Most of the literature on external costs in transportation focuses on road transport, partly because this is where the external costs are the highest (Maibach et al., 2008). The external costs arise because road users consider only their own costs of travel, known as the Marginal Private Cost (MPC), and not their contribution to the total societal costs of road usage, known as the Marginal Social Cost (MSC), since the private costs (in time) of each trip rise with the number of drivers in the network. This is the primary intra-sectoral external cost. Combined with the inter-sectoral costs of pollution and noise, the unregulated level of travel exceeds the optimum for the network. As such, by imposing a price or tax equal to the difference between the MPC and MSC, the benefit

Background
Building on Pigou's two-road model (Pigou, 1920), Vickrey's bottleneck model (Vickrey, 1963) has become a key model for examining congestion effects in a network (Arnott et al., 1993; Van Den Berg and Verhoef, 2011), as well as social optima via pricing (Chakirov, 2016) and pricing schemes (Laih, 1994). However, Arnott et al. (2001) note that traditional macroscopic models focus on link congestion, while ignoring or simplifying other elements of congestion such as modal congestion, parking, interactions with pedestrians and spillback effects. In particular, the importance of value of time heterogeneity among individuals in road pricing models has been recognized by numerous researchers (Small and Yan, 2001; Verhoef and Small, 2004). Modern traffic microsimulation frameworks such as MATSim (Horni et al., 2016) are specifically designed to incorporate many of these various heterogeneities, making them useful for such modelling.
Fellendorf and Vortisch (2000) developed one of the first approaches for the microsimulation of pollutant emissions. A traffic flow model is used to calculate the speed and acceleration of each vehicle at a 1-s frequency, and available engine maps are used to calculate the emissions. More recently, Kraschl-Hirschmann et al. (2011) coupled the microscopic traffic flow simulator (VISSIM) with a microscopic emissions model (PHEM) to investigate the impact of traffic signalling on emissions. Ma et al. (2015) used the output from a travel-diary survey to build a microsimulation of Beijing, and estimated the CO2 emissions and possible reductions.
Kaddoura and Kickhöfer (2014) developed an agent-based marginal-cost pricing approach for congestion and applied it successfully to a large-scale scenario of Greater Berlin (Kaddoura, 2015). When considering the internalization of congestion costs, a particular contribution of this work was to assign the external congestion costs to the causing agents.
In particular, they note that it is simple to calculate the incurred time-loss through congestion for each agent, but much more challenging to map it back to the causing agents. The approach calculates each agent's contribution to the delays on travelled links using a queue-based node-link model including spillback.
In real networks, such an approach would require knowing the location of every driver connected to a particular incident of congestion, to determine who was affected. Quantifying the monetary value of the delay would then require knowing each affected driver's willingness to pay for a unit of travel time savings, a measure known as the value of travel time savings (VTTS). This is clearly unrealistic as it would involve tracking a large proportion of the population.

MATSim and the Switzerland scenario
MATSim is a powerful tool for performing agent-based transport simulations. A population is represented by a set of agents who try to optimize their daily travel plan over repeated iterations of the model, through a process called re-planning. It can handle scenarios consisting of millions of agents travelling on a city, regional or national transport network. It is designed as a modular event-based framework, where the actions of agents, such as a departure, arrival, link entry or exit, are events which are passed around the framework.
This event-based design makes the traffic flow simulation framework, separate from the re-planning component, well suited for processing individual-level mobility data at the trip level. The traffic flow simulation is based on a first-in first-out queue model where each link is represented as a queue with three attributes: the non-congested (freespeed) travel time $t_{free}$, the flow capacity $c_{flow}$ and the storage capacity of the link $c_{storage}$. The link queues are typically updated every second, and agents are moved from one link to the next if the freespeed travel time on the link has passed, enough time has passed since the last vehicle left the link (the inverse of $c_{flow}$), and there is enough capacity $c_{storage}$ on the following link. Importantly, an agent who leaves a link prevents all following agents from leaving that link for the time of $1/c_{flow}$, and the tracking of agents restricted by $c_{storage}$ on the incoming links allows the consideration of spillback on congestion.
The IVT Switzerland Scenario builds on the work of Bösch et al. (2016) to provide a MATSim scenario for all of Switzerland (Hörl, 2020) with a synthetic population for 2019. It represents a typical working day in Switzerland. As a MATSim scenario, the population consists of individual agents, each with daily travel plans (preferences) and socio-demographic characteristics. These agents represent the entire population of Switzerland on a network generated from OpenStreetMap (Haklay and Weber, 2008). The scenario is available in 1%, 10% and 100% samples with respectively increasing runtimes (Hörl, 2020). The simulations here are carried out using a discrete mode choice approach developed by Hörl et al. (2019). The initial travel demand, which comes from census data, is routed along the shortest path based on non-congested travel times. Then, at each iteration, a small fraction of agents is selected for re-planning. Each feasible mode alternative for each agent is routed along the shortest path based on the updated travel times from the previous iteration, and one is selected based on a discrete mode choice model. This process is repeated over several iterations until equilibrium.

Analysis of road transport externalities in Switzerland
Numerous sources are available for the analysis of external costs in Switzerland, including standards, government reports and databases. These sources guide and inform the evaluation of new and existing infrastructure projects. The Swiss Federal Office for Spatial Development (ARE, 2020) produced a report on the external costs and benefits of transport in Switzerland, built on the methodology developed by Ecoplan/Infras (2019). It presents the most recent external cost-benefit analysis for the Swiss transport system, primarily focusing on external environmental, health and accident-related costs. Specifically, external costs for 12 different cost categories are computed, differentiated according to three different perspectives: transport mode (road/rail/air/water, passenger/freight, vehicle type), transport user and heavy vehicles.
For the modelling of road transport pollutant emissions in Switzerland (and other European countries), emission factors are commonly taken from the Handbook Emission Factors for Road Transport (HBEFA) (De Haan and Keller, 2004; Keller et al., 2017). The HBEFA database contains emission factors for a range of vehicle categories and traffic situations, differentiated by emission type, pollutant and year. The HBEFA is the standard for road pollutant analysis in Germany, Switzerland and Austria, and is supported by the European Commission.
The Swiss Federal Office for the Environment (FOEN, 2010) also uses the HBEFA to provide a detailed analysis of past and predicted future pollutant emissions, covering road transport in Switzerland from 1990 to 2035. Emissions values are calculated for three emissions types: emissions when the engine is in hot operating condition, cold-start emissions and evaporation emissions. The calculation of these values requires both traffic volume data and the emissions factors from the HBEFA for each emission type. The (Keller and Wüthrich, 2016) and then again from 2015 to 2017 (Keller and Wüthrich, 2019). In this study, vehicle hours of delay for Switzerland were estimated and the proportion attributable to heavy vehicles determined. From 2013 onwards, this was achieved by combining and aligning INRIX traffic flow data and traffic demand data from the National Passenger Transport Model. The time lost per road section was calculated by subtracting the free-flow travel time from the actual travel time, where traffic jams are considered to occur only when the actual speed is less than 65% of the free-flow speed. This approach only considers flow congestion, and not queuing delays. For the other years, online data from the Swiss Federal Roads Office (FEDRO) counting stations was used. A summary of their results is provided in Table 1. The values provide a useful estimate of delay costs in Switzerland. However, the use of an "at least" approach will tend to underestimate the lost time and the associated delay costs (Keller and Wüthrich, 2016). This is particularly the case for non-motorway road segments, where long road lengths and imprecise speed data can influence results.
For the monetization of externalities, the Swiss Association of Road and Transportation Experts (VSS) has published a series of norms (SN 641 82*: Cost Benefit Analysis for Road Traffic) aimed at guiding the assessment of monetary effects and the cost benefit analysis of transport projects, policies and regulations. Norms SN 641 820 (Basic Standard), SN 641 822a (Travel Time Costs for Passenger Traffic) and SN 641 828 (External Costs) are of particular interest in the context of external cost evaluation. They provide standard values for time costs and willingness to pay per vehicle type and trip purposes as well as standard methods for evaluating the monetary impacts of air pollution and climate impacts.

Limitations of the aggregate values
The values from the Swiss standards are only available as CHF/km or CHF/h. Although some external costs are given under an urban/rural or motorway/non-motorway classification, there is no temporal or spatial variation. One hypothesis of this paper is the following: for private car travel, the variation in external costs is significant, and justifies a disaggregate approach to the calculation of external costs. In the following work, this is done for the calculation of private car emissions and congestion delays. For noise, this was found to be too computationally expensive to do on a national scale, and hence per-kilometer values from the norms are still used. Other researchers have previously identified the usefulness of disaggregate noise models (Kaddoura et al., 2017; Kuehnel and Moeckel, 2020). However, their noise calculations were carried out only on a city level and are based on the German RLS-90 approach (Bundesminister für Verkehr, 1990; Forschungsgesellschaft für Straßen- und Verkehrswesen (FGSV), 1997), which differs from the norms used in Switzerland.

Methodology
In this section, the methodology for estimating externalities on GPS data is presented. The approach requires that the GPS data be already segmented into trip-stages and labelled with the transport mode used. This can be done with one of many methods; for an overview, see Zheng (2015). In the case of this paper, the data was segmented and labelled with the transport mode by the GPS tracking app 'Catch-my-Day', developed by MotionTag GmbH (MotionTag GmbH, 2020).
The methodology requires a few static data inputs for the calculation of various externalities. Reference values for both emitted air pollutants and caused congestion are required. For the calculation of emissions, the HBEFA database (version 3.3) is used (Keller et al., 2017). For congestion, average 15-min-interval values of the delay caused by a vehicle present on that link are calculated for each link using a 10% sample from the 2019 MATSim scenario for Switzerland (see Section 2.1). This is done using the approach of Kaddoura and Kickhöfer (2014), described in more detail in Section 3.5.
A multistage pipeline has been developed for estimating car-based externalities on labelled GPS traces using the MATSim framework. The pipeline consists of the following steps, described in more detail below:
1. Cleaning of GPS data
2. Map matching to the MATSim network using Graphhopper
3. Calculation of link entry and exit times
4. Conversion to MATSim events
5. Estimation of externalities on MATSim events
6. Monetization of the externalities
More broadly, the pipeline is grouped into two stages: the first creates a series of MATSim events representing the map-matched path of the GPS traces; the second processes those events using the previously mentioned reference values to estimate the generated emissions and delays. Fig. 1 illustrates how data flows through the externalities pipeline. The objects in bold are those developed as part of this paper. Dotted lines indicate data inputs from static sources, and solid lines are the flow of the GPS-based trip data through the model. The lack of flows inside the MATSim framework is intentional, as those modules are built on top of the MATSim event framework. This is discussed further in Section 3.8.

Data cleaning
GPS data accuracy can vary considerably depending on the sensor used, the surrounding environment and even geographical location. Hence, any GPS points not within 200 m of a segment of the Swiss road network are removed before map matching.
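The distance filter can be illustrated with a minimal sketch. It assumes that GPS points and road segments are already available in a projected, metre-based coordinate system, and it uses a brute-force search over all segments; a real implementation on a national network would need a spatial index. The function names and the tuple-based data layout are hypothetical and are not taken from the pipeline.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (planar coordinates in metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both end points coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def filter_points(points, segments, max_dist_m=200.0):
    """Keep only the points lying within max_dist_m of at least one segment."""
    return [
        p for p in points
        if any(point_segment_distance(p, a, b) <= max_dist_m for a, b in segments)
    ]


# Hypothetical example: one straight 1 km road segment along the x-axis.
segments = [((0.0, 0.0), (1000.0, 0.0))]
points = [(500.0, 150.0), (500.0, 250.0), (1100.0, 0.0), (1300.0, 0.0)]
kept = filter_points(points, segments)
```

In this example the points 250 m beside the road and 300 m beyond its end are dropped, while the other two are kept.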

Map matching with Graphhopper
To map trip legs to the MATSim network, the Graphhopper (Graphhopper, 2018) map-matching library was modified to support matching to a MATSim network instead of OpenStreetMap. Graphhopper uses a Hidden Markov Model (Newson and Krumm, 2009) to identify candidate links for each GPS point, with an error radius, σ, which in our case was set to 200 m, equivalent to the filtering distance used to exclude GPS points.
An unlimited distance between consecutive points is allowed. The Graphhopper routing engine then identifies the best route between the set of candidate links, where a minimum of two matched GPS points is required. However, the standard implementation of Graphhopper does not calculate the entry and exit timestamps for each link in the network, which are needed to calculate the time spent and average speed on each link. Additionally, in the absence of high-frequency GPS measurements or additional sensor information, there may be insufficient GPS measurements to pinpoint the entry and exit times for each link. The MATSim-compatible version of Graphhopper has been extended to return the entry and exit times for each link, including links where few or no GPS measurements are available.
(a, b) gives the summed travel time over a set of links in the network. A helper function $t\_between(a, b)$ returns the time needed to travel between projected points and the vertices of a link $l$, travelling at the free-flow speed for that link: for example, from $p'_{l,e}$ to the end of the link, or from the start of the link to the first projected point on that link, $p'_{l,s}$. In MATSim, the assumption holds that an agent always starts and ends somewhere on a link. Hence, only the exit time for the first link and the entry time for the last link need to be calculated. Additionally, $entry\_t(l_j) = exit\_t(l_{j-1})$, $\forall j = 1..n$. As such, the algorithm can be separated into two cases: ({i,…,j,…,k}) where $l_i$ and $l_k$ are the most recent and next link with $P(l_k) \neq \emptyset$, respectively; else $exit\_t(l_j) = entry\_t(l_j) + t\_between(l_j, p'_{l_j,e})$.
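To make the idea of assigning times to links without GPS measurements more concrete, the following sketch distributes the time elapsed between two known timestamps over the intermediate links in proportion to their free-flow travel times. This proportional split is an assumption made purely for illustration and is not necessarily the rule used in the extended Graphhopper implementation; the function name and its arguments are hypothetical. The sketch does respect the constraint that the entry time of a link equals the exit time of the preceding link.

```python
def interpolate_link_times(t_start, t_end, freeflow_times):
    """Assign (entry, exit) times to consecutive links between two known timestamps.

    ASSUMPTION (illustration only): the elapsed time t_end - t_start is split
    across the links in proportion to their free-flow travel times.
    """
    total_free = sum(freeflow_times)
    if total_free <= 0:
        raise ValueError("free-flow travel times must sum to a positive value")
    scale = (t_end - t_start) / total_free
    times = []
    entry = t_start
    for t_free in freeflow_times:
        exit_time = entry + t_free * scale
        times.append((entry, exit_time))
        # The entry time of the next link equals the exit time of this one.
        entry = exit_time
    return times


# Two links with 10 s and 20 s free-flow time, traversed in 60 s overall.
link_times = interpolate_link_times(0.0, 60.0, [10.0, 20.0])
```

Here the observed 60 s are twice the free-flow total of 30 s, so each link receives twice its free-flow travel time.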

Calculation of link entry and exit times
The sequence of links with entry and exit times is then converted to valid MATSim events for each person and date.
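A minimal sketch of such a conversion is shown below. It uses plain dictionaries rather than the MATSim event classes, and the event type names (`departure`, `link_leave`, `link_enter`, `arrival`) are illustrative labels, not identifiers from the MATSim API. It reflects the convention that a trip starts and ends somewhere on a link, so the first link only has an exit and the last link only has an entry.

```python
def to_events(person_id, departure_time, arrival_time, link_ids, exit_times):
    """Build a time-ordered event list for one trip leg.

    exit_times[i] is the exit time of link_ids[i] for every link but the last;
    the entry time of the following link is taken to be the same instant.
    """
    if len(exit_times) != len(link_ids) - 1:
        raise ValueError("expected one exit time per link except the last")
    events = [{"time": departure_time, "type": "departure",
               "person": person_id, "link": link_ids[0]}]
    for i, t in enumerate(exit_times):
        events.append({"time": t, "type": "link_leave",
                       "person": person_id, "link": link_ids[i]})
        events.append({"time": t, "type": "link_enter",
                       "person": person_id, "link": link_ids[i + 1]})
    events.append({"time": arrival_time, "type": "arrival",
                   "person": person_id, "link": link_ids[-1]})
    return events


# Hypothetical trip over three links.
events = to_events("p1", 0, 100, ["a", "b", "c"], [30, 70])
```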

Estimation of emission externalities
To estimate the externalities of each trip leg, the generated events are processed using the MATSim framework, extended with two additional modules. The first, developed by Hülsmann et al. (2011) and Kickhöfer and Nagel (2016), applies the HBEFA factors to calculate the emitted pollutant amounts incurred on each link, based on the observed travel speed on that link. The emissions factors are taken from the HBEFA database (version 3.3). A few extensions have also been made to this module. The module was originally designed to work with simulation output from MATSim, where real-world boundary conditions (speeding) and data artifacts are not present. Hence, average speeds on each link are now capped at the freespeed of the link. Furthermore, the road types for assigning emissions factors are extracted from OpenStreetMap, rather than a VISUM model, as was done in the original Berlin Scenario. These improvements have been contributed back to the MATSim codebase, in accordance with the open-source principles of the MATSim framework.
The HBEFA provides four traffic states, free-flow, heavy, saturated and stop&go, while MATSim considers only two in its queuing model -free-flow or queuing (to exit the link). Hülsmann et al. (2011) align these by assigning the difference between the actual travel time and the free-flow travel time on a link (the congestion) to the HBEFA stop&go traffic state, and the rest to free-flow. In doing so they ignore the heavy and saturated states. However, in the original paper, they also suggest an alternative version, which accommodates all 4 HBEFA traffic states, using the average speeds of each traffic state provided in the HBEFA. In this paper, we implement this method, allowing for all 4 HBEFA traffic states to be considered in the emissions model.
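The two-state alignment can be sketched as follows, reading the description literally: the delay on a link (actual minus free-flow travel time) is assumed to be driven at the stop&go speed, and the remaining link length is assigned to free-flow. The sketch also caps the observed speed at the free speed of the link. The speeds and emission factors in the example are placeholder values, not HBEFA data, and the function names are hypothetical; the four-state variant implemented in this paper is not shown.

```python
def split_link_distance(length_m, travel_time_s, v_free_ms, v_stop_go_ms):
    """Return (free_flow_m, stop_go_m) for one link traversal.

    ASSUMPTION: the delay is spent driving at the stop&go speed, and the
    distance covered in that time (at most the link length) is stop&go.
    """
    free_time_s = length_m / v_free_ms
    # Cap the observed speed at the free speed: the travel time used here
    # never drops below the free-flow travel time.
    travel_time_s = max(travel_time_s, free_time_s)
    delay_s = travel_time_s - free_time_s
    stop_go_m = min(length_m, delay_s * v_stop_go_ms)
    return length_m - stop_go_m, stop_go_m


def link_emissions_g(length_m, travel_time_s, v_free_ms, v_stop_go_ms,
                     ef_free_g_km, ef_stop_go_g_km):
    """Emissions in grams for one traversal, given per-state factors in g/km."""
    free_m, stop_go_m = split_link_distance(
        length_m, travel_time_s, v_free_ms, v_stop_go_ms)
    return (free_m * ef_free_g_km + stop_go_m * ef_stop_go_g_km) / 1000.0


# Placeholder values: 1 km link, 20 m/s free speed, 5 m/s stop&go speed.
split = split_link_distance(1000.0, 150.0, 20.0, 5.0)
grams = link_emissions_g(1000.0, 150.0, 20.0, 5.0, 100.0, 400.0)
```

With a free-flow travel time of 50 s and an observed 150 s, the 100 s of delay correspond to 500 m in stop&go, so half of the link is charged at each factor.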
The emissions module outputs quantities in non-monetized terms. These are then converted to monetary damages using the most current norm values for Switzerland derived from the "Nachhaltigkeitsindikatoren für Strasseninfrastrukturprojekte" (NISTRA) (Federal Roads Office (FEDRO), 2017), which is itself based on the Swiss Standard SN 641 820 (Swiss Association for Road and Transportation Professionals (VSS), 2013). For this work, the values were revised for the year 2019, and the values used are presented in Table 2. For PM10 emissions, distinct normative values were available for urban and rural areas. Links in the network were assigned the rural or urban classification based on the Swiss building codes (Federal Office for Spatial Development (ARE), 2017). Links in unbuilt areas were assigned as rural, and all others as urban. The assignment was done based on the midpoint of the link.
The NISTRA does not specify whether its monetization values are average or marginal. However, it is widely recognized that for air pollution costs, the marginal costs are virtually the same as the average costs, as numerous epidemiological studies have shown that the relationship between pollutants and health effects is almost linear (Van Essen et al., 2019).
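The monetization step amounts to multiplying each emitted quantity by a cost factor, with an area-specific factor where one is available. The sketch below illustrates this under the assumption that cost factors are expressed in CHF per metric ton; the numbers are placeholders chosen for easy arithmetic and are not the values of Table 2, and the function name and data layout are hypothetical.

```python
def monetize_emissions(emissions_g, cost_chf_per_t, area="urban"):
    """Convert emitted grams per pollutant into CHF.

    cost_chf_per_t maps a pollutant either to a single CHF/t value or to a
    dict keyed by area ("urban"/"rural") when distinct values exist.
    """
    total_chf = 0.0
    for pollutant, grams in emissions_g.items():
        factor = cost_chf_per_t[pollutant]
        if isinstance(factor, dict):
            factor = factor[area]
        total_chf += grams / 1_000_000.0 * factor  # grams -> metric tons
    return total_chf


# PLACEHOLDER cost factors (not the norm values).
costs = {"CO2": 100.0, "PM10": {"urban": 500.0, "rural": 200.0}}
emitted = {"CO2": 2_000_000.0, "PM10": 1_000_000.0}
urban_chf = monetize_emissions(emitted, costs, area="urban")
rural_chf = monetize_emissions(emitted, costs, area="rural")
```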

Estimation of congestion externalities
The calculation of the experienced delay is a simple affair, if one makes the broad assumption that all delay is attributable to other users in the system, and not external causes such as signal control, rogue pedestrians and extraordinary events. However, calculating the true caused delay to other users in the network would require GPS traces for all users of the transport network. As such we use an average model of caused delay from the output of the MATSim scenario for Switzerland. This method gives the average marginal external cost for travelling on each link in a certain time window. The approach of Kaddoura and Kickhöfer (2014) is used to calculate the caused congestion on each link by an agent. The approach has a number of diverging implementations, and in this paper we apply version 3, where the delays caused to each agent are allocated to the agents ahead in the queue until the delay is fully internalized. For each link in the network, the entry and exit times of each simulated agent are stored as a queue of potential delay-causing agents. Each time an agent exits a link with a delay, this queue is iterated through, and each causing agent pays for $1/c_{flow}$, the delay they caused on that link, until the delay is internalized. If an agent exits a link without delay, the previously stored queue on that link is reset. Any remaining non-internalized delay is considered to be a result of $c_{storage}$, which is carried over to the next link closer to the bottleneck and then distributed to the stored queue for that link. In this manner, the delays caused by an agent to other agents on other links in the network can be accounted for. For a specific example of how this algorithm works, the reader is referred to Section 3.2 of Kaddoura and Kickhöfer (2014).
A 30-h MATSim simulation period is used to allow all trips to conclude, and the average delay caused by a vehicle for each link in the network over a set of time windows covering an entire day (24 h) is computed. Let $x_{l,t,a}$ be the delay caused on link $l$ at time $t$ by agent $a$ to all other agents in the network which might have been affected on other links. $A_{l,t}$ is the number of agents who passed through link $l$ in time period $t$. The average delay caused by travelling on link $l$ at time $t$ is then given by
$$\bar{x}_{l,t} = \frac{1}{A_{l,t}} \sum_{a} x_{l,t,a},$$
where the sum runs over the agents who passed through link $l$ in time period $t$. This gives a matrix of dimensions $L \times (1440/T)$, where $L$ is the number of links in the MATSim network, and $T$ the size of the time period in minutes. The value of 1440 corresponds to 24 h in minutes. For a trip matched to the MATSim network, it is then trivial to obtain the average caused marginal delay on each traversed link and calculate the average marginal delay caused by the trip. In an ideal world, the time loss caused to each agent in the simulation would be monetized individually based on the VTTS of the affected agent during that trip, before the aggregation was performed. However, as this information is not contained in the MATSim scenario, congestion externalities were monetized using the Swiss reference value of time (VOT), and the monetization factor can be applied to the aggregated values.
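The allocation logic on a single link can be sketched as follows. This is a simplified illustration rather than the implementation of Kaddoura and Kickhöfer (2014): it handles one link in isolation, charges the most recently departed agents first (an assumed iteration order), and merely returns the delay that could not be allocated instead of carrying it over to the downstream link. All names are hypothetical.

```python
from collections import defaultdict


def allocate_delays(exits, flow_capacity_per_s):
    """Allocate delays on one link to the agents that left the link earlier.

    exits: list of (agent_id, delay_s) in the order agents leave the link.
    Returns (caused delay per agent, total delay left unallocated).
    """
    headway_s = 1.0 / flow_capacity_per_s  # time one vehicle blocks the exit
    caused = defaultdict(float)
    queue = []  # potential delay-causing agents since the last free exit
    unallocated_s = 0.0
    for agent, delay_s in exits:
        if delay_s <= 0:
            # An undelayed exit resets the stored queue for this link.
            queue.clear()
        else:
            remaining = delay_s
            # ASSUMPTION: the nearest agents ahead are charged first.
            for causer in reversed(queue):
                if remaining <= 0:
                    break
                charge = min(headway_s, remaining)
                caused[causer] += charge
                remaining -= charge
            # What is left would be attributed to storage capacity (spillback).
            unallocated_s += remaining
        queue.append(agent)
    return dict(caused), unallocated_s


# Flow capacity of 0.5 vehicles per second, i.e. 2 s per vehicle.
exits = [("a", 0.0), ("b", 2.0), ("c", 4.0), ("d", 0.0), ("e", 3.0)]
caused, leftover = allocate_delays(exits, 0.5)
```

In the example, agent c's delay of 4 s is split between b and a at 2 s each, and the undelayed exit of d resets the queue, so only d can be charged for e's delay; the remaining 1 s is left unallocated.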
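The averaging and the per-trip lookup can be sketched as below. The sketch stores the matrix sparsely as a dictionary keyed by link and time bin, expects one record per link traversal (including traversals that caused no delay, since they count towards the number of agents), and treats missing entries as zero delay. How times beyond 24 h from the 30-h simulation are mapped to bins is not handled here. Function names and the record layout are hypothetical.

```python
from collections import defaultdict


def average_caused_delay(records, bin_minutes=15):
    """Average caused delay per (link, time bin).

    records: iterable of (link_id, time_s, caused_delay_s), one per traversal.
    """
    bin_s = bin_minutes * 60
    sums = defaultdict(float)
    counts = defaultdict(int)
    for link_id, time_s, caused_delay_s in records:
        key = (link_id, int(time_s // bin_s))
        sums[key] += caused_delay_s
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def trip_caused_delay(avg_delay, traversals, bin_minutes=15):
    """Sum the average caused delay over the links traversed by one trip.

    traversals: iterable of (link_id, time_s); unknown entries count as zero.
    """
    bin_s = bin_minutes * 60
    return sum(avg_delay.get((link_id, int(time_s // bin_s)), 0.0)
               for link_id, time_s in traversals)


records = [("l1", 0, 10.0), ("l1", 100, 0.0), ("l1", 900, 6.0), ("l2", 30, 4.0)]
avg = average_caused_delay(records)
trip_delay = trip_caused_delay(avg, [("l1", 50), ("l2", 60), ("l3", 10)])
```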

Other external costs
Where monetization by discrete link based quantities is not implemented or possible, externalities are calculated directly using available CHF/km values. Ideally, noise emissions from car travel would be calculated based on the surrounding population, other noise emitters and the presence of noise reduction features. The Swiss norms (SN 641 828) provide decibel thresholds, above which health-related costs for the affected persons are to be considered. Using the MATSim scenario for Switzerland in a manner analogous to congestion, noise cost values per link per time could be calculated. However, it was unfeasible within the scope of this project to develop and integrate a nation-wide noise model. Instead, a normative value for noise emissions in CHF/km (differentiated by mode) is used. An in-depth spatially and temporally variant consideration of noise for all motorized modes is left as future work to improve the described methodology.
For walking and cycling modes, there are no pollutants or significant congestion externalities to calculate (at least in Switzerland, and excluding E-bikes). Hence, health benefits and damages are calculated in the pipeline on a CHF/km basis.
The pipeline is designed to support the marginal external cost calculations for other modes for which only per-kilometer values are available. Furthermore, the pipeline is adaptable to support the mapping of public transit trips to links in a MATSim transit network. This would enable the calculation of link-level congestion externalities on public transport, if data on crowding was available. Maibach et al. (2008) estimate crowding externalities to be roughly 50% of the VOT. However, they also note that the VOT varies greatly by transport mode, trip purpose and travel distance. The externality would also depend on the definition of "crowding".

Calibration of the congestion model
Congestion externalities were computed using the MATSim framework, as described in Section 2.1. A 10% scenario for Switzerland for 2019 was first simulated for 40 iterations so as to reach equilibrium in terms of mode shares. Since MATSim is a stochastic simulation, this equilibrium is one within a distribution of possible outcomes. Therefore, the average caused congestion per link per time was computed as described in Section 3.5 over a further 30 iterations, where the agents could only reroute their trips during replanning. Delays only contribute to the average congestion if the corresponding travel speed was less than 65% of the free-flow travel speed on that link, consistent with the methodology used by Keller and Wüthrich (2016).
The median congestion across these additional 30 iterations was then computed and converted to a per-kilometer cost (Fig. 2). During each additional MATSim iteration, 10% of agents are allowed to modify their chosen routes. This occasionally results in multiple agents simultaneously choosing to travel on previously non-congested links in the next iteration, and if this number of agents is large, it might take several iterations before they are randomly selected and routed along other less-congested paths. Thus, these oscillations in route choice can still result in high median congestion costs on a few links at certain times; in the current case, the maximum median cost per kilometer is nearly CHF 4,500. Therefore, the congestion costs per kilometer were capped at the 95th percentile value of the distribution, corresponding to a maximum cost per kilometer of just under CHF 2. After capping, the average cost per kilometer over all links exhibiting congestion is 0.22 CHF/km. A comparison of different capping thresholds is presented in Section 4.3.
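The median-and-cap step can be sketched as follows. The percentile is computed with linear interpolation between order statistics, which is one of several common definitions and is an assumption here, as is the choice to take the percentile over all supplied link and time-window entries. The example uses the 50th percentile on three entries only to keep the arithmetic easy to follow; function names are hypothetical.

```python
import statistics


def percentile(values, q):
    """q-th percentile (0-100), linearly interpolated between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values given")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def capped_median_costs(costs_by_iteration, q=95.0):
    """Median cost per key across iterations, capped at the q-th percentile.

    costs_by_iteration: dict mapping a (link, time window) key to a list of
    per-kilometer costs, one value per simulation iteration.
    """
    medians = {key: statistics.median(vals)
               for key, vals in costs_by_iteration.items()}
    cap = percentile(list(medians.values()), q)
    return {key: min(m, cap) for key, m in medians.items()}, cap


# Toy input: key "c" mimics a link with oscillating, very high costs.
costs = {"a": [1, 2, 3], "b": [0, 0, 10], "c": [100, 200, 300]}
capped, cap = capped_median_costs(costs, q=50.0)
```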

Software architecture
To enable compatibility with both MATSim and Graphhopper, the pipeline has been built to run on the Java virtual machine. Java version 1.8 or above is required. MATSim version 11.0 and Graphhopper 0.12.0 are used. A script has been developed to divide the run into multiple instances of the pipeline, allowing results to be computed in parallel.
The input and output stages of the pipeline are also modular, meaning that data can be read or written from either a database or JSON files, allowing for the integration with different data sources and other models.
The externalities pipeline described in this paper takes advantage of the event-based framework in MATSim. Each module is designed as an EventHandler in an event-listener framework, where a function is called when certain events are fired, such as when a vehicle enters or exits a link. Sequences of these events are used to determine values such as travel times and average speeds on links. These handlers in turn generate new events such as a WarmEmissionEvent, containing the amounts of various pollutants produced by an agent travelling on a link.
On receiving a link-exit event, the congestion module determines the estimated congestion caused on that link based on the link id and exit time. The monetization module processes arrival events -which denote the end of a trip -to tally up all the produced externalities, and compute the monetary damages. All externalities are available on a trip leg level.
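The event-listener pattern can be illustrated with a small, self-contained sketch. It is written in Python for brevity and does not use the MATSim API: the `EventBus`, the handler classes and the dictionary-based events are all hypothetical stand-ins. One handler derives an average link speed from a pair of enter and leave events and fires a new event, which a second handler collects, mirroring how a handler can generate further events for downstream modules.

```python
class EventBus:
    """Minimal stand-in for an event manager: passes each event to all handlers."""

    def __init__(self):
        self._handlers = []

    def register(self, handler):
        self._handlers.append(handler)

    def fire(self, event):
        for handler in self._handlers:
            handler.handle(event, self)


class LinkSpeedHandler:
    """Derives the average speed on a link and fires a 'link_speed' event."""

    def __init__(self, link_lengths_m):
        self._lengths = link_lengths_m
        self._entered = {}

    def handle(self, event, bus):
        if event["type"] not in ("link_enter", "link_leave"):
            return
        key = (event["person"], event["link"])
        if event["type"] == "link_enter":
            self._entered[key] = event["time"]
        elif key in self._entered:
            duration = event["time"] - self._entered.pop(key)
            if duration > 0:
                bus.fire({"type": "link_speed", "person": event["person"],
                          "link": event["link"], "time": event["time"],
                          "speed_ms": self._lengths[event["link"]] / duration})


class SpeedCollector:
    """Collects the derived speed events, as a downstream module would."""

    def __init__(self):
        self.speeds = []

    def handle(self, event, bus):
        if event["type"] == "link_speed":
            self.speeds.append((event["link"], event["speed_ms"]))


bus = EventBus()
collector = SpeedCollector()
bus.register(LinkSpeedHandler({"l1": 100.0}))
bus.register(collector)
bus.fire({"type": "link_enter", "person": "p1", "link": "l1", "time": 0})
bus.fire({"type": "link_leave", "person": "p1", "link": "l1", "time": 10})
```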

Results
Using the output of the Swiss MATSim scenario, which represents a simulated day for a synthetic population of Switzerland, the externalities pipeline is first validated by comparing the computed externalities with emission and congestion estimates from previous Swiss external cost reports. The pipeline is then applied to GPS tracking data collected within the course of the MOBIS mobility pricing study (Molloy et al., 2021) to demonstrate the heterogeneity in external costs that can be observed in the data.

Emissions
To validate the estimation of pollution externalities using MATSim, the total emissions are calculated using a 10% MATSim scenario for Switzerland and compared to the reference values available in the literature. To accommodate the new vehicle registration statistics according to Blessing (2013) and Bianchetti et al. (2016), the personal vehicle fleet composition was adjusted to match the mileage-weighted fleet composition projected by the Swiss Federal Office for the Environment (FOEN, 2010) for 2020 (Table 3). Emission values are estimated for the following pollutants: CO2, CH4, N2O, PM10 (exhaust and non-exhaust) and NOx. Fig. 3 shows hourly CO2 emissions estimated from the MATSim scenario. As expected, these correlate with typical commuter patterns: two distinct peaks during the morning and evening rush-hour, low emissions in the morning and at night and higher values around noon.
Fig. 3. Hourly CO2 emission values for Switzerland in kilotons (metric). Pollutants other than total CO2 are omitted as their values are negligible in comparison.
The MATSim computed emissions values are then compared to those estimated by FOEN for 2020. Since MATSim simulates a single workday, the MATSim emissions values are scaled to match the total yearly travel distance by car reported by FOEN for 2020. Table 4 compares the total estimated emissions values for both MATSim and FOEN in metric tons per year. Deviations are likely due to the fact that emissions factors depend on the exact type of petrol or diesel engine.

Congestion
Contrary to emissions, congestion caused cannot directly be estimated and assigned to the causers from GPS traces alone, since information on how many other drivers were present on the road at that given moment is lacking. Hence, MATSim is used to estimate the marginal congestion externalities during a typical workday.
To assess the suitability of this approach, we compare the total calculated congestion costs over the course of a 10% simulation of the MATSim scenario with those computed by Keller and Wüthrich (2019). As a calibration step to account for unresolved oscillation effects in the MATSim scenario, the per-kilometer congestion costs are limited to the 95th percentile (see Section 3.7). The total congestion costs from the scenario are scaled to be equivalent to the total yearly travel distance reported by Keller and Wüthrich, separately for motorways and non-motorways. The comparison between the congestion costs in the MATSim scenario for Switzerland and the reference values is presented in Table 5 for different percentile thresholds, with the 95th percentile threshold in bold font. The corresponding per-kilometer congestion cost for each threshold is also reported.
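
The percentile threshold can be sketched as follows. This is an illustration under stated assumptions: the limit is applied here as a cap (values above the threshold are set to the threshold rather than discarded), and the percentile uses linear interpolation. Neither choice is confirmed by the text, and the function names and example values are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def cap_at_percentile(costs_per_km, q=95.0):
    """Cap every per-kilometer cost at the q-th percentile of the sample."""
    threshold = percentile(costs_per_km, q)
    capped = [min(cost, threshold) for cost in costs_per_km]
    return capped, threshold


# Hypothetical per-kilometer congestion costs in CHF/km with one extreme value.
example_costs = [0.0, 0.01, 0.02, 0.03, 5.0]
capped, threshold = cap_at_percentile(example_costs)
print(threshold, capped)
```

Repeating the call with other values of `q` gives the kind of threshold comparison reported in Table 5.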
The total yearly congestion values, and thus the resulting costs, estimated in MATSim for motorway and non-motorway road segments are lower and higher, respectively, than those estimated by Keller and Wüthrich. This may be due to several factors. On the MATSim side, the model simulates passenger vehicles as well as trucks during a typical workday, and therefore accounts neither for seasonal variations in travel demand nor for extraordinary circumstances such as large events, accidents and holiday traffic, which also affect yearly congestion. Unlike emissions, which mainly depend on the total distance travelled, congestion is highly dependent on the actual demand patterns, road infrastructure and travel behaviour. Thus, any deviations in the mode shares, route choice or road capacities affect the computed congestion values. In addition, the grouping of the MATSim estimates by road segment type is based on OSM data, which might differ from the classification used by Keller and Wüthrich. Finally, Keller and Wüthrich state that they have taken an "at least" approach in estimating delays and that the values for non-motorway segments are highly underestimated. A combination of these effects likely explains the deviation between the estimates.

Sensitivity analysis
Taking the 95th percentile provides a good calibration against the overall total costs calculated by Keller and Wüthrich. However, the MATSim congestion cost estimates are sensitive to the chosen threshold. The sensitivity is evident for both motorway and non-motorway road types. We propose that this sensitivity stems from the long tail in the distribution of the per-kilometer congestion cost values (see Fig. 2), which arises at certain bottlenecks in the MATSim network for Switzerland, where route-choice oscillations result in an all-or-nothing switching behaviour between routes. This oscillatory behaviour remains an open problem. Once it is solved, the thresholds may no longer be needed. The sensitivity analysis suggests that before applying this methodological approach in other study regions, individual characteristics of the scenario and network need to be taken into account through a calibration step as was performed above.

Capturing the heterogeneity in external costs
As noted in the introduction, it is important to capture the temporal, spatial, and individual variation in external costs when assessing the policy implications of proposed measures to tackle emissions and congestion. Using a set of over 1.6 million car trips collected from 3,680 participants during the MOBIS study, the external costs for each trip were calculated using the methodology presented in this paper. Fig. 4 demonstrates the heterogeneity in external costs observed in the GPS data, as opposed to the available average per-kilometer reference values taken from Table 6. The range of the external costs is smaller for pollution emissions than for congestion. The mean values are still consistent with the reference values. Using the map-matching computed with GraphHopper, the motorway share of each trip is computed so that the reference values for highway and non-highway kilometers can be applied separately, rather than just an average.
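
Applying separate reference values by motorway share amounts to a distance-weighted sum, sketched below. The function name and the per-kilometer rates are hypothetical placeholders and not the reference values from Table 6.

```python
def reference_trip_cost(distance_km, motorway_share, motorway_rate, other_rate):
    """External cost of one trip from average per-kilometer reference rates.

    motorway_share: fraction of the trip distance on motorways, in [0, 1]
    motorway_rate, other_rate: reference costs in CHF/km
    """
    if not 0.0 <= motorway_share <= 1.0:
        raise ValueError("motorway_share must lie between 0 and 1")
    motorway_km = distance_km * motorway_share
    other_km = distance_km - motorway_km
    return motorway_km * motorway_rate + other_km * other_rate


# Hypothetical example: a 20 km trip, half of it on motorways.
cost = reference_trip_cost(
    distance_km=20.0, motorway_share=0.5, motorway_rate=0.02, other_rate=0.06
)
print(round(cost, 4))
```

With fixed rates, the resulting per-kilometer cost of a trip depends only on its motorway share, which is why such reference values cannot reflect time-of-day effects.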
Fig. 5 compares the hourly variation of the two methods at the trip level. In subplot (a), a roughly constant split between highway and non-highway travel throughout the day results in a nearly constant average hourly external cost with the ARE method, even during the middle of the night. On the other hand, average values with the MATSim method vary between 0 and 0.05 CHF/km depending on the time of day. The small spike around hour 24 is due to agent behaviour in the MATSim scenario, and could benefit from further calibration. Subplot (b) shows the total externalities caused over the observation period. Here, the increase in traffic during the peak hours does lead to some temporal variation in the total emissions with the ARE method, but much less than with the MATSim method.
In Table 7, a summary of the external costs for different periods of the day is presented for the MATSim method. The morning and evening peaks cover 7:30 am to 9:30 am and 4:30 pm to 7:30 pm respectively. Here one can see that while the maximum marginal external costs are high (e.g. 15 CHF for one trip in the evening peak), the 95th percentile is much lower. The average external cost per kilometer in the morning peak is 1.63 times higher than the daily off-peak average, and the evening peak is 2.22 times higher. The per-trip values have similar ratios. The minimum values are not shown, being zero for all time periods.
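
The period comparison can be sketched as below, using the peak windows stated above. Several details are assumptions made for illustration: trips are assigned to a period by departure time, the windows are treated as half-open intervals, and the average is weighted by distance. The trip records are hypothetical.

```python
from datetime import time

# Peak windows as given in the text; treated here as half-open intervals.
MORNING_PEAK = (time(7, 30), time(9, 30))
EVENING_PEAK = (time(16, 30), time(19, 30))


def period_of(departure):
    """Classify a departure time as morning peak, evening peak or off-peak."""
    if MORNING_PEAK[0] <= departure < MORNING_PEAK[1]:
        return "morning_peak"
    if EVENING_PEAK[0] <= departure < EVENING_PEAK[1]:
        return "evening_peak"
    return "off_peak"


def mean_cost_per_km(trips):
    """Distance-weighted mean external cost (CHF/km) for each period.

    trips: iterable of (departure_time, distance_km, external_cost_chf)
    """
    totals = {}
    for departure, distance_km, cost_chf in trips:
        period = period_of(departure)
        cost_sum, km_sum = totals.get(period, (0.0, 0.0))
        totals[period] = (cost_sum + cost_chf, km_sum + distance_km)
    return {p: c / km for p, (c, km) in totals.items() if km > 0}


# Hypothetical trips, for illustration only.
example_trips = [
    (time(8, 0), 10.0, 0.4),
    (time(17, 0), 10.0, 0.6),
    (time(12, 0), 20.0, 0.4),
    (time(23, 0), 20.0, 0.4),
]
rates = mean_cost_per_km(example_trips)
morning_ratio = rates["morning_peak"] / rates["off_peak"]
evening_ratio = rates["evening_peak"] / rates["off_peak"]
print(rates, morning_ratio, evening_ratio)
```

Averaging the per-kilometer cost of each trip without distance weights is an equally plausible reading and would give different ratios for the same data.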

Discussion
As Verhoef (2002) stipulates, it is important to consider external costs on an individual level. Not only is this important in understanding the spatial and temporal distribution of the external costs within a transport network, it is also an important step towards understanding the potential impact of pricing policies. The method presented here shows how an agent-based transport microsimulation can be applied to real-world trip data to calculate external costs at an individual trip level. This approach captures much more variation than the use of average values.
The validation of the model in Section 4 identifies some discrepancies between the reference values and the output of the proposed method, resulting from the various limitations of both approaches. However, an exploration of the trip-level heterogeneity on the real-world data indicates that the mean per-kilometer averages for both congestion and emissions are very close to the reference averages.
The main insights follow from an exploration of the temporal variation captured by the congestion model. The mean values hide much of the temporal variation in external costs, and there are implications for policy analysis and transport planning. While average external cost values for congestion and emissions are currently used for the cost-benefit analysis of new transport projects in Switzerland, the analysis in this paper shows that the respective total societal congestion costs and benefits would be distorted for policies aiming at reducing peak-hour congestion. One such policy might be a mobility pricing scheme, where the use of the average congestion values would lead either to an ineffective price structure or to the benefits of the scheme being undervalued.
The influence of the emissions model on the variation in external costs is less pronounced, but an effect is still evident. In particular, the external costs of PM10 emissions are a large component of the estimated externalities, and these vary greatly depending on the age, size and engine type of the vehicle (Rexeis et al., 2013). These effects could be considered using the vehicle data reported by the study participants. The results indicate that reliance on the average per-kilometer values of external costs neglects the large variation around these averages, which undervalues the use of more efficient vehicles.
Table notes: a Keller and Wüthrich (2016) and Keller and Wüthrich (2019). b Federal Roads Office (FEDRO) (2017) and Rexeis et al. (2013).
There are some limitations to the approach used. The performance of the map-matching step is reliant on the quality of the GPS input, the segmentation and mode detection. In a small minority of cases, the map matching is not consistent with the route chosen, which may lead to an over- or underestimation of the external costs. The assumption is also made that the owner used their reported vehicle, and not another one. Additionally, the MATSim model used to estimate the link-level external costs does not take into account other variations in demand that may affect the delay caused by a driver. These include accidents, changes in road conditions, and day-to-day variation in traffic. The external cost of scheduling delays is also not incorporated into the calculation of external costs, though this would be possible using the MATSim framework. As Arnott et al. (1990) note, the scheduling delay costs can be equal to the costs from congestion delays. The model also relies on the assumption that the MATSim scenario accurately reflects average conditions on the network, although thresholds were needed to account for outliers resulting from the oscillating behaviour in the scenario.
In future work, observed travel times and delays from real-world GPS data could be used to make link-level adjustments to the model on a day-to-day basis to account for these variations.

Conclusion
This paper presents a methodology for estimating the externalities on GPS traces using the MATSim framework. A MATSim scenario for Switzerland is used to provide aggregate estimates of caused congestion for 15-minute time periods. Pollutant emission factors are taken from the HBEFA. The suitability of the MATSim scenario for this purpose is evaluated by validating the Switzerland-wide externalities against published reference values. The agent-based aspect of MATSim allows for a much finer calculation of externalities by taking into account the heterogeneity in both the population and travel behaviour. The validation step indicates that the aggregated congestion model calculated on the MATSim scenario for Switzerland is suitable for this purpose, with some caveats. Although the total external costs of congestion obtained from the scenario for motorways and other roads are lower and higher, respectively, than the reference values when considered separately, the combined estimates are within 8% of the published values. An analysis of the heterogeneity in the external costs shows that the approach captures important variation in the external costs for different externalities, around mean values which are consistent with the published values for Switzerland. These results indicate that a proper consideration of the individual, spatial and temporal variation in external costs, and not just the mean values, is important to the analysis of potential transport policies and projects.