Impact on Network Performance of Probe Vehicle Data Usage: An Experimental Design for Simulation Assessment

Probe-based technologies are proliferating as a means of inferring traffic states. Technology companies are interested in traffic data for computing the best routes in a traffic-aware manner, and they also provide real-time traffic information with certain temporal accuracy. This paper analyses and evaluates how data provided by a fleet of probe cars can be used to develop a navigation service and how the penetration rate of this service affects a set of city-scale KPIs (Key Performance Indicators) and driver KPIs. The case study adopts a model-driven approach in which microscopic simulation emulates real-size fleets of probe vehicles that provide position and speed data. A noteworthy feature of the behaviour modelling is that drivers are segmented according to their knowledge of network conditions for selected trips: experts, regular drivers, and tourists. The paper presents and discusses the modelling approach and the results obtained from an experimental Barcelona CBD model designed to evaluate the penetration rates of probe vehicles and route guidance. An analysis of the simulation experiments reveals remarkable links among city-scale KPIs, which is a novelty from a multivariate point of view. A simulation-based framework for results analysis and visualization is also introduced in order to simplify the analysis of simulation results and easily visualize OD paths for driver segments.


Introduction
Mobility is a key component in urban areas and should be addressed as part of a complex system, since it is a nonisolated component that strongly interacts with all the other components.
There is wide consensus about the brand new family of services that will be enabled by advances in intervehicular communications. From the early considerations of equipped vehicles as a network of mobile sensors [1] to more recent surveys [2], the capabilities for effectively monitoring traffic conditions have been studied. In urban areas, vehicles equipped with onboard sensors are expected to reach high concentrations in the near future. See [3] for a survey of works and applications related to traffic state monitoring.
In a parallel line, researchers have been motivated by advancements in vehicle positioning (onboard sensors) and in wireless communications that support Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) applications. They have investigated how the collected data could be used to generate travel time information (see [4,5]), and this research could be considered complementary to the estimation of link or path travel times from GPS probe vehicles. In summary, two complementary approaches are explored in the literature. On the statistical side, [6] presents an overview of the statistical approaches, which was later improved upon by [7] by combining GPS data with data from other travel time sources. A variant of these statistical models that exploit GPS data is analysed in [8] to identify the network paths whose travel times are estimated in [9]. An academic approach relying on traffic flow theory has been adopted by [3,10] for the estimation of fundamental variables in arterials or urban motorways using Probe Vehicle Data (PVD) and Edie's definitions [11]. V2I applications under incident conditions have been developed and route guidance strategies evaluated by [12].
In the context of exploiting mobile data, a popular trend [13,14] explores the use of handover information in cellular networks to estimate the traffic level of service, although they note important limitations on the overall performance.
The use of PVD has been investigated in some research projects such as Mobile Millennium or CarTel [1,15,16], which included a pilot traffic-monitoring system using the GPS in cellular phones to gather traffic information, process it, and distribute it back to phones in real time. Products and companies performing mobile crowdsourcing (Google Traffic, INRIX, and TomTom Traffic) allow for real-time data gathering. Machine learning applications for estimating travel time delays due to road work from GPS data have been proposed by [17].
The impacts of route guidance on travel time, safety, and the environment have been intensively investigated [12,18-21], usually in relation to their benefits under incident conditions and together with quantitative assessments of the potential impacts of real-time routing guidance and advisory warning messages to guided vehicles. Some other authors have analysed different types of reactive [22] and proactive [23] route guidance policies using simulation, but their treatment of traffic state estimation is simplistic.
The aim of this paper is to present a simulation-based platform that allows modelling several penetration rates for a fleet of PVD vehicles that feed travel time estimation between Points of Interest (POIs) and several penetration rates of route guidance for connected vehicles while considering driver behaviour and route choice models. Travel time estimates between POIs from PVD are critical inputs in Kalman filtering formulations, which some authors have addressed in the context of dynamic OD matrix estimation [24]: travel time availability and reliability from PVD guarantees a simplified linear formulation approach [25]. The resulting analysis platform can be easily adapted to any microscopic traffic simulator with the required extended functionalities, and it has been tested using a model of the Barcelona Business District developed and calibrated in the past for previous projects [3]. The conceptual framework is described first, and the following section describes the simulation experiments, in which a large fleet of PVD vehicles is accounted for according to penetration rates (V2I) and additional factors considered in the experimental design. The next section provides an analysis of the results based on a set of network and driver KPIs, which are jointly considered by applying multivariate analysis techniques. The paper ends with our conclusions.

Simulation Testbed Framework
Figure 1 shows the simulation testbed framework, which comprises an Execution Controller, a Traffic Simulation Module, a Results Processing Module, and a Visual Analytics Module. The visualization and analytics tool has been implemented using the Shiny [26] web application framework for R, which simplifies the development of simulation results analyses by incorporating them into interactive web applications. The methodological framework can be implemented with any traffic simulation software whose utilities allow user-defined applications implementing the system's required functions to be integrated via API, thus guaranteeing the transferability of the approach.
The Traffic Simulation Module was programmed using API extensions, and its components are as follows: emulation of PVD; automatic split of demand into vehicle classes according to driver type definition; estimation of lane and link travel times from data collected from PVD emulation; and customization of route choice models for guiding drivers according to the implemented navigation strategies. The assessment of navigation strategies (route guidance) is not the goal of the current work, although it is the aim of a follow-up project.
The simulation results analysis is performed by two fundamental and independent components: the Results Processing and Visual Analytics modules. The Results Processing Module in Figure 1 covers all preprocessing of data automatically. In this way, the Visual Analytics Module can use the results of the traffic simulation environment directly without the need for a manual update.

Traffic Simulation Module.
The traffic simulation component includes a microscopic traffic simulator and a set of custom modules and functions that were developed using API extensions. In addition to the functionalities already described at the beginning of this section, it also generates time-dependent system tables as well as link and lane traffic data by driver class.
In this work, an Aimsun [27] model was available from previous projects. The Aimsun functional architecture and the interaction libraries (Aimsun API) support the extended modelling utilities that are required.
The exchange of information between the API applications and the microsimulator can be made at every simulation step (0.5 sec). The programming languages in which Aimsun provides its API are C++ and Python. While Python is used to easily collect some of the data, C++ is needed for emulating the probe vehicles for performance reasons.

Probe Vehicle Data Emulation.
This work assumes that the V2I and V2V technology is on board in probe cars. In a previous study by some of the authors, field test data from a fleet of 3 probe cars are discussed for Barcelona's CBD [3]. Collected and filtered data were used to calibrate the emulation of PVD by an API included in the Traffic Simulation Module (see [3]). Only basic vehicle sensors and no frontal camera data were used previously in the probe vehicles. The API extension for emulating PVD depends on onboard sensors and technological specifications.
The aim of the current work is to emulate the "real-time data" of probe cars that are used in connected car guidance systems under different levels of probe car penetration. To this end, a reduced set of sensors for probe cars has been assumed, thus providing data for vehicle position and speed at each simulation step.

Execution Controller.
A simulation experiment consists of a set of replication executions that are launched from a controller; each of them is set up by preprocessing and is postprocessed at the end of the simulation in order to collect the results generated by the replication. KPIs are grouped into either network KPIs or driver type KPIs.
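The controller loop described above can be sketched as follows. This is only an illustrative outline under stated assumptions: the names `run_experiment`, `ReplicationResult`, and the three `demo_*` callbacks are hypothetical, and the stub "simulation" simply draws random numbers instead of calling a traffic simulator.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ReplicationResult:
    """Results of one replication, grouped as network KPIs and driver type KPIs."""
    experiment_id: str
    seed: int
    network_kpis: Dict[str, float] = field(default_factory=dict)
    driver_kpis: Dict[str, float] = field(default_factory=dict)


def run_experiment(experiment_id: str,
                   seeds: List[int],
                   preprocess: Callable[[str, int], dict],
                   simulate: Callable[[dict], dict],
                   postprocess: Callable[[dict], Dict[str, Dict[str, float]]],
                   ) -> List[ReplicationResult]:
    """Launch one replication per seed: set up, simulate, then collect KPIs."""
    results = []
    for seed in seeds:
        config = preprocess(experiment_id, seed)   # set up the replication
        raw = simulate(config)                     # run the (stubbed) simulation
        kpis = postprocess(raw)                    # collect results at the end
        results.append(ReplicationResult(experiment_id, seed,
                                         kpis["network"], kpis["driver"]))
    return results


# Hypothetical stand-ins for the real pre/postprocessing and simulator call.
def demo_preprocess(experiment_id: str, seed: int) -> dict:
    return {"experiment": experiment_id, "seed": seed}


def demo_simulate(config: dict) -> dict:
    rng = random.Random(config["seed"])
    return {"mean_speed_kmh": 20.0 + rng.random(),
            "expert_tt_min": 10.0 + rng.random()}


def demo_postprocess(raw: dict) -> Dict[str, Dict[str, float]]:
    return {"network": {"mspeed": raw["mean_speed_kmh"]},
            "driver": {"expert_tt_min": raw["expert_tt_min"]}}


results = run_experiment("base", [1, 2, 3, 4, 5],
                         demo_preprocess, demo_simulate, demo_postprocess)
```

Passing the three stages as callbacks keeps the loop independent of any particular simulator, which is in line with the transferability argument made for the framework.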

Results Processing and Visual Analytics Modules.
The results of the simulation executions are stored on a server. The Results Processing Module (see Figure 1) contains the set of processes that are in charge of finding new execution results from recently performed simulation replicas and applying to them the postprocessing that the Visual Analytics Module needs.
The Visual Analytics Module in Figure 1 covers the visualization of input data, the model details, and the simulation results from any replication execution in the experimental design. It also consists of two implementations: that of the visualization application and the one that serves access to the web. The visualization is implemented with the R-Shiny framework [26]. For performance reasons, C++ is used in some of the application processes.

Modelling Issues
Drivers are split into six groups according to guidance availability and their knowledge of network and traffic conditions. This work emphasizes driver behaviour modelling issues, which have not been considered in related papers in the literature [12,18,19,22,23,28,29].

Driver Behaviour.
The first group is expert drivers, i.e., those who know the network and historic traffic conditions for the selected horizon of study. They are modelled with route choice selection and proportions by following experienced travel times where both satisfy dynamic user equilibrium (DUE) [30] and assume a historic demand pattern. DUE paths and proportions are loaded into the simulation environment from a precalculated binary file. The second group is regular drivers, i.e., those with knowledge of the network and historic traffic conditions for recurrent trips (50% randomly selected) but who use the main streets based on free-flow conditions for nonrecurrent trips (50%). Expert and regular drivers exhibit driving characteristics related to the car-following model, such as reaction times, desired speed, and acceptance of speed limitations. According to the calibrated profile of Barcelona drivers, reaction time is 1.0 s, reaction time at stop is 1.35 s, and reaction time at traffic light is 1.35 s, while speed acceptance and Minimum Intervehicular Distance are assumed to be truncated normally distributed, with the former having mean 1.1, sd 0.1, min 0.9, and max 1.3 and the latter having mean 1.0 m, sd 0.3 m, min 0.5 m, and max 1.5 m. The third group is tourist drivers, who have limited knowledge of network and traffic conditions and use $k$-shortest paths based on free-flow conditions and main streets. Tourist drivers behave roughly with a 25% increment in reaction times, means, and limits (same standard deviation), and they strictly adhere to the speed limits, while Minimum Intervehicular Distance is truncated normally distributed with mean 1.25 m and sd 0.1 m, between 0.75 and 1.5 m. Finally, guided drivers constitute a design-dependent proportion of any expert, regular, or tourist driver class, and they are modelled with a 100% acceptance of navigation advice.
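As an illustration of the driver parameters above, the following sketch draws per-vehicle attributes from truncated normal distributions by rejection sampling. The dictionary layout and function names are hypothetical. Two points are assumptions rather than statements from the text: tourist reaction times are taken as exactly 1.25 times the expert/regular values, and strict adherence to speed limits is represented by a fixed speed acceptance of 1.0.

```python
import random


def truncated_normal(rng, mean, sd, low, high):
    """Draw from a normal distribution truncated to [low, high] (rejection sampling)."""
    while True:
        value = rng.gauss(mean, sd)
        if low <= value <= high:
            return value


TOURIST_FACTOR = 1.25  # assumed reading of the "25% increment" for tourists

# (mean, sd, min, max) tuples; None means "not sampled, use the fixed fallback".
DRIVER_PROFILES = {
    "expert": {
        "reaction_time_s": 1.0,
        "reaction_time_stop_s": 1.35,
        "reaction_time_light_s": 1.35,
        "speed_acceptance": (1.1, 0.1, 0.9, 1.3),
        "min_gap_m": (1.0, 0.3, 0.5, 1.5),
    },
    "tourist": {
        "reaction_time_s": 1.0 * TOURIST_FACTOR,
        "reaction_time_stop_s": 1.35 * TOURIST_FACTOR,
        "reaction_time_light_s": 1.35 * TOURIST_FACTOR,
        "speed_acceptance": None,  # strict adherence, modelled here as 1.0
        "min_gap_m": (1.25, 0.1, 0.75, 1.5),
    },
}
# Expert and regular drivers share the same car-following characteristics.
DRIVER_PROFILES["regular"] = dict(DRIVER_PROFILES["expert"])


def sample_driver(rng, driver_class):
    """Return the car-following attributes of one simulated driver."""
    profile = DRIVER_PROFILES[driver_class]
    acceptance = profile["speed_acceptance"]
    return {
        "driver_class": driver_class,
        "reaction_time_s": profile["reaction_time_s"],
        "reaction_time_stop_s": profile["reaction_time_stop_s"],
        "reaction_time_light_s": profile["reaction_time_light_s"],
        "speed_acceptance": 1.0 if acceptance is None
        else truncated_normal(rng, *acceptance),
        "min_gap_m": truncated_normal(rng, *profile["min_gap_m"]),
    }


rng = random.Random(42)
experts = [sample_driver(rng, "expert") for _ in range(1000)]
tourists = [sample_driver(rng, "tourist") for _ in range(1000)]
```

Rejection sampling is adequate here because the truncation bounds lie within two standard deviations or more of the mean, so few draws are discarded.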

Travel Time Estimation.
The selected approach is based on the first model presented in [5] for estimating lane travel times. The time-window concept, which can be viewed as a rolling horizon for updating lane/link travel time estimates, is critical for understanding the approach. The time-window interval is a design factor in the conducted simulation experiments because, together with the penetration rate of PVD, it determines lane-level (link-level) vehicle data availability. Hence, as the time-window interval is increased, the percentage of lanes (links) with available data and the number of data units also increase.
Position and speed data provided by the PVD can be emulated for each simulation step (0.5 sec) or at any multiple of that interval. For the experimental study described in this paper, PVD are assumed to be provided every 2 sec in order to limit the computational burden of executing replications; the detection interval is thus a configurable parameter set to 2 sec for the current study. Many alternative travel time estimators could be developed, but that is not the aim of this work and it is considered a topic for further research. The particular proposal implemented here considers three cases for every lane:
(i) Case 1, "No PVD in the last window": the lane travel time value is the same as the lane travel time of the most recent time-window. If no data are available for this lane, then the lane travel time is set to the free-flow travel time.
(ii) Case 2, "PVD from just one car in the last window": the travel time is computed from the following terms:
(a) $t_1$: the estimated travel time for the fraction of the lane up to the first detection of the vehicle. It is computed by dividing the length of this fraction by the probe vehicle's instantaneous speed at the first observation in this lane for the considered detection interval, which gives a time measure.
(b) $t_\Delta$: the difference in seconds between the first and last detection intervals.
(c) $t_2$: computed in the same way as $t_1$, by dividing the distance from the last observation to the beginning of the next section in the trajectory by the probe vehicle's instantaneous speed at the last observation (detection interval) of the time-window.
Finally, the travel time of the considered lane is $t_1 + t_\Delta + t_2$.
(iii) Case 3, "PVD available from more than one car in the last time-window": the weighted travel time of the lane is $\frac{1}{W}\sum_{v=1}^{n} w_v\, t_v$, where $t_v$ is the travel time of car $v$ computed in the same way as in Case 2, $w_v$ is the fraction of the section length in $[0, 1]$ where probe car $v$ is detected in the lane, and $W = \sum_{v=1}^{n} w_v$ normalizes the weights.
Hence, the weights $w_v$ handle the lane changing behaviour of vehicles. In other words, if a vehicle travels in the slower lane at first, then switches to the faster lane and travels for some time before switching back to the slower lane and exiting the segment, then three independent PVD detections are assumed, which prevents underestimation of the travel time for the slower lane and overestimation of the travel time for the faster lane.
The estimation of link travel times from PVD is obtained as the mean of travel times for streams. Lane travel time estimates are combined into stream travel times according to the turning movements that are allowed.
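The three cases of the lane travel time estimator can be sketched as below. This is a minimal illustration, not the implemented API extension. It assumes that each probe car contributes a time-ordered list of `(time_s, position_m, speed_mps)` observations within the lane, that positions are measured from the lane entrance, that the weighted mean in Case 3 is normalized by the sum of the weights, and that a small speed floor (`min_speed_mps`, a hypothetical parameter) avoids division by zero for stopped vehicles.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

# One PVD observation inside a lane: (time_s, position_m, speed_mps).
Observation = Tuple[float, float, float]


def vehicle_travel_time(track: Sequence[Observation], lane_length_m: float,
                        min_speed_mps: float = 0.5) -> float:
    """Case 2: travel time of one probe car as t1 + t_delta + t2."""
    first, last = track[0], track[-1]
    t1 = first[1] / max(first[2], min_speed_mps)           # entrance -> first detection
    t_delta = last[0] - first[0]                           # first -> last detection
    t2 = (lane_length_m - last[1]) / max(last[2], min_speed_mps)  # last detection -> exit
    return t1 + t_delta + t2


def lane_travel_time(tracks: List[Sequence[Observation]], lane_length_m: float,
                     previous_tt_s: Optional[float], free_flow_tt_s: float) -> float:
    """Lane travel time for one time-window from the probe tracks seen in it."""
    tracks = [t for t in tracks if t]
    if not tracks:  # Case 1: keep the last estimate, else fall back to free flow
        return previous_tt_s if previous_tt_s is not None else free_flow_tt_s
    if len(tracks) == 1:  # Case 2
        return vehicle_travel_time(tracks[0], lane_length_m)
    # Case 3: weight each car by the fraction of the lane where it was detected.
    times = [vehicle_travel_time(t, lane_length_m) for t in tracks]
    weights = [(t[-1][1] - t[0][1]) / lane_length_m for t in tracks]
    total_weight = sum(weights)
    if total_weight == 0:  # every car seen only once: plain mean (assumption)
        return mean(times)
    return sum(w * tt for w, tt in zip(weights, times)) / total_weight


def link_travel_time(stream_travel_times_s: Sequence[float]) -> float:
    """Link travel time as the mean of its stream travel times."""
    return mean(stream_travel_times_s)


# A car at a constant 10 m/s observed every 2 s on a 100 m lane.
car_a = [(0.0, 20.0, 10.0), (2.0, 40.0, 10.0), (4.0, 60.0, 10.0)]
# A slower car seen only near the lane entrance.
car_b = [(0.0, 0.0, 5.0), (2.0, 10.0, 5.0)]
single = lane_travel_time([car_a], 100.0, None, 7.2)
combined = lane_travel_time([car_a, car_b], 100.0, None, 7.2)
```

In the example, the single-car estimate recovers the 10 s that a vehicle at 10 m/s needs for 100 m, and the two-car estimate is pulled toward the car that was observed over a longer stretch of the lane.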

Navigation Strategies.
A navigation application is modelled as being available to a common percentage in driver classes, since PVD can be used to estimate travel times in network links and thus travel times between OD POIs. Route guidance assessment is not the aim of this paper (see [20] for an interesting discussion about the topic); thus, three routing strategies are implemented according to the possibilities of the Aimsun [27] platform. Stochastic Route Choice (SRC) modelling accounts for the estimated instantaneous $k$-shortest travel time paths, as defined by either user-defined link costs (link travel time estimates from probe car data) or user-defined stream costs (lane travel time estimates combined into stream travel times). Also, 100% rerouting across the trip is enabled (at every time-window interval of 1.5, 3, or 6 min) in order to account for dynamic and instantaneous travel-time-based route choice; $k$ is set to 3 in all the experiments. A proportional SRC is calibrated to determine the probability of selecting a route among the $k$-shortest travel time paths calculated using instantaneous travel times. Thus, the choice probability $P_i$ of a given alternative path $i$ is $P_i = CP_i^{-\alpha} / \sum_{j} CP_j^{-\alpha}$, where $CP_i$ is the cost of path $i$. The proportional parameter is set to $\alpha = 1.2$ after calibration in order to consider the probability as inversely proportional to path costs.
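A proportional route choice of this kind can be illustrated with a few lines of code. The sketch assumes the common proportional form in which the probability of a path is proportional to its cost raised to a negative exponent, with the exponent set to 1.2 as in the text; the function names and the example costs are hypothetical, and the exact expression used by the simulator should be checked against its documentation.

```python
import random
from typing import List, Sequence


def proportional_route_probabilities(path_costs: Sequence[float],
                                     alpha: float = 1.2) -> List[float]:
    """Choice probabilities inversely proportional to path cost ** alpha."""
    if any(cost <= 0 for cost in path_costs):
        raise ValueError("path costs must be strictly positive")
    weights = [cost ** -alpha for cost in path_costs]
    total = sum(weights)
    return [w / total for w in weights]


def choose_path(rng: random.Random, path_costs: Sequence[float],
                alpha: float = 1.2) -> int:
    """Draw the index of one path according to the proportional probabilities."""
    probabilities = proportional_route_probabilities(path_costs, alpha)
    return rng.choices(range(len(path_costs)), weights=probabilities, k=1)[0]


# Hypothetical instantaneous travel times (s) of the 3 shortest paths of one OD pair.
costs = [300.0, 360.0, 450.0]
probs = proportional_route_probabilities(costs)
chosen = choose_path(random.Random(7), costs)
```

With an exponent above 1, the cheapest path receives more than its plain inverse-cost share, while the other alternatives keep a nonzero probability; this spreads guided vehicles over several routes rather than sending all of them to the single shortest one.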
The key point is that a dynamic (real-time) routing strategy for connected cars (guided cars) is applied from PVD-derived travel times. Travel time estimates used in $k$-shortest path calculations might depend on either lane-based travel time estimates or overall link-based travel time estimates, both of which rely on PVD sent to a centralized subsystem.
Travel times between POIs can be inferred from OD path travel times and route choice proportions.They can then feed Kalman filtering formulations to estimate dynamic OD matrices, as proposed by the authors.

Design of Experiments
The selected scenario is the Barcelona CBD, known as "L'Eixample" (see [3] for details), which comprises 7.46 km² and 250,000 inhabitants. The study horizon is 1 h, accounting for 42,500 trips. Passenger car demand is modelled as 15 min time-sliced demand whose origin-destination pattern reproduces the 9-10 h morning period in L'Eixample. The model includes the description of the 50 bus routes operating in the area and accounts for frequencies and stops for boarding/alighting. The design factors are as follows:
(i) driver type configuration (TD factor): percentage split of drivers into expert, regular, and tourist classes, with base level 40-50-10;
(ii) guidance penetration (GP factor): percentage of cars that are connected cars whose route choice decisions follow those advised by a Navigation Tool fed by PVD: (a) base level: 0%, (b) alternative levels: 10%, 20%, 30%, 70%, 80%, 90%, and 100%;
(iii) demand pattern (DP factor) into 4 levels (0%, 10%, 20%, and 30%) referring to a perturbation of the historic demand pattern in OD pairs belonging to the fourth percentile trip distance (according to Manhattan distance); they account for 42,500, 44,600, 46,860, and 48,600 trips, respectively: (a) base level: 0% means historic demand pattern, (b) alternative levels: 10%, 20%, and 30% increments;
(iv) probe vehicle penetration percentage (PVD factor), modelled as common to any driver type into 4 levels: 0%, 10%, 20%, and 30%; the base level is 0%, which indicates route guidance based on free-flow travel times; an additional Ground Truth level consisting of travel time estimates based directly on "simulated Ground Truth" was also included in some initial experimentation;
(v) time-window length (TW factor), which is the rolling horizon interval considered for the estimation of traffic variables from PVD: (a) base length: 3 min, (b) alternative lengths: 1.5 min and 6 min (TW has no effect when 0% PVD is set);
(vi) navigation strategy (NS factor), which models driving recommendations based on either lane-level or link-level PVD when PVD is available: the base level is lane-level, but it has no effect when 0% PVD is set.
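To give a sense of the size of this design space, the sketch below enumerates a full factorial design over the factor levels. The GP, DP, PVD, TW, and NS levels come from the list above. The TD factor is an assumption: only the base split 40-50-10 is named in the text, so three placeholder labels are added to reach four TD levels, a number chosen because it reproduces the 3,072 combinations reported for the full factorial design. The count of distinct scenarios after collapsing TW and NS at 0% PVD is an illustrative side calculation, not a figure from the paper.

```python
from itertools import product

# Factor levels; TD alternatives are hypothetical placeholders (see lead-in).
FACTOR_LEVELS = {
    "TD": ["40-50-10", "TD-alt-1", "TD-alt-2", "TD-alt-3"],
    "GP": [0, 10, 20, 30, 70, 80, 90, 100],   # guidance penetration (%)
    "DP": [0, 10, 20, 30],                    # demand increment (%)
    "PVD": [0, 10, 20, 30],                   # probe vehicle penetration (%)
    "TW": [1.5, 3.0, 6.0],                    # time-window length (min)
    "NS": ["lane", "link"],                   # navigation strategy
}
REPLICATIONS_PER_EXPERIMENT = 5

# Every combination of factor levels, as a list of {factor: level} dicts.
full_design = [dict(zip(FACTOR_LEVELS, combo))
               for combo in product(*FACTOR_LEVELS.values())]


def canonical(scenario):
    """Collapse TW and NS when PVD is 0%, since they then have no effect."""
    if scenario["PVD"] == 0:
        return (scenario["TD"], scenario["GP"], scenario["DP"], 0, None, None)
    return tuple(scenario[name] for name in FACTOR_LEVELS)


distinct_scenarios = {canonical(s) for s in full_design}
total_replications = len(full_design) * REPLICATIONS_PER_EXPERIMENT

print(len(full_design), total_replications, len(distinct_scenarios))
```

Even after removing the redundant combinations, the number of scenarios remains far beyond what can be simulated at roughly two hours per replication, which motivates a reduced optimal design.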
Mean travel time (min) is considered to be the critical KPI, both from a driver satisfaction point of view and for researchers interested in the assessment of travel times between POIs inferred from PVD. A detailed analysis on the base scenario for all design factors found that 5 replications provide a global 5% relative precision in mean travel time (min) at 95% confidence for any driver type, while the greatest absolute error was about 1/3 min (Tourist). Table 1 shows absolute/relative errors at different confidence levels for driver types when 5 replicas are considered. Thus, the base scenario is set at factor levels: TD factor 40-50-10, GP factor 0%, DP factor 0%, and PVD factor 0%. Therefore, TW and NS levels are irrelevant. Running the full factorial design is unfeasible for computational reasons, since 3,072 × 5 = 15,360 replications would be needed. Therefore, the first set of experiments (first round) was constrained in order to identify nonaliased factor main effects according to the Fedorov algorithm [31] for optimal designs: 29 experiments were obtained (thus, 145 replications were executed, each one taking around 2 h on an Intel Core i7-4790 CPU (frequency of 3.6 GHz), 4 cores, 8 GB DDR3 memory, and Windows 8.1 (x64 system)).
A second round of simulation experiments for the base level 40-50-10 in TD factor and 3 min for time-window (TW) factor was launched to quantify the most significant factors found in the first round of simulations. In decreasing order of importance, these are the demand pattern (DP), PVD penetration, guidance penetration (GP), and navigation strategy (NS) factors. The second round design consisted of 140 new replications.
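The precision check behind the choice of 5 replications can be sketched with a Student-t confidence interval for the mean. This is an illustrative calculation only: the travel time values are hypothetical, not results from the paper, and the t quantiles are hard-coded for 4 degrees of freedom (5 replications) to keep the example free of external libraries.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple

# Two-sided Student-t quantiles for 4 degrees of freedom (5 replications).
T_QUANTILES_DF4 = {0.90: 2.132, 0.95: 2.776, 0.99: 4.604}


def precision(samples: Sequence[float], confidence: float = 0.95) -> Tuple[float, float]:
    """Return (absolute error, relative error) of the mean at the given confidence."""
    if len(samples) != 5:
        raise ValueError("the hard-coded quantiles only apply to 5 replications")
    half_width = T_QUANTILES_DF4[confidence] * stdev(samples) / sqrt(len(samples))
    return half_width, half_width / mean(samples)


# Hypothetical mean travel times (min) of one driver type over 5 replications.
tourist_tt_min = [12.1, 12.4, 11.9, 12.3, 12.2]
abs_err, rel_err = precision(tourist_tt_min, 0.95)
print(f"absolute error {abs_err:.3f} min, relative error {100 * rel_err:.2f}%")
```

A design would be accepted under this criterion when the relative error stays below the 5% target for every driver type; otherwise more replications would be needed.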

Results and Discussion
Network KPIs are affected by design factors when either gross effects or net effects are considered (i.e., elaborated from the linear model for each global network KPI in all design factors). A gross effect indicates the impact of a factor when it is considered alone; it is the mean of the KPI at the different levels of the selected factor. A net effect means that the effects of all other factors (except for the selected factor) are taken into account before calculating the effect for each level of the selected factor. A heatmap representation has been chosen to summarize the gross and net effects of design factors on network KPIs (see Figure 2). Clearly, demand pattern (DP) and guidance penetration (GP) strongly affect all network KPIs. The gross effects of driver class configuration (TD) impact total travel time and emission KPIs. The net effects of PVD, NS, TW, and TD factors decrease when the other factors are considered, while the net effects of the DP and GP factors remain the same or are even magnified for some KPIs, such as those related to emissions and total travel distance. There is a reduction in the net effects of DP on the mean travel time for covering one km (mtt.s.km) and the mean delay by km (mdelay.s.km) once the remaining factors are controlled for. Dendrograms of KPIs and design factors are built from bottom to top according to the most similar items. They start by assigning each item to its own cluster and proceed to find the closest (most similar) pair of clusters and then merge them into a new single cluster. DP and GP effects are similar on the factor side, although on the KPI side similarities arise between emission KPIs, travel time (mtt.s.km), and delay (mdelay.s.km) for covering one km.
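The bottom-up construction of a dendrogram described above can be illustrated with a short agglomerative clustering sketch. The linkage criterion is not stated in the text, so average linkage with Euclidean distance is used here as an assumption, and the two-dimensional "effect profiles" of the factors are hypothetical numbers chosen only to make the merge order easy to follow.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[str, ...]


def average_linkage(a: Cluster, b: Cluster,
                    points: Dict[str, Sequence[float]]) -> float:
    """Mean pairwise Euclidean distance between the members of two clusters."""
    return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerative_merges(points: Dict[str, Sequence[float]]) -> List[Tuple[Cluster, Cluster]]:
    """Start with one cluster per item and repeatedly merge the closest pair."""
    clusters: List[Cluster] = [(name,) for name in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], points))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)      # the merged pair becomes a new single cluster
        merges.append((a, b))
    return merges


# Hypothetical effect profiles of four design factors on two KPIs.
effect_profiles = {
    "DP": (0.9, 0.8),
    "GP": (0.8, 0.7),
    "TW": (0.1, 0.0),
    "NS": (0.1, 0.1),
}
merges = agglomerative_merges(effect_profiles)
for left, right in merges:
    print(left, "+", right)
```

The list of merges, read from first to last, corresponds to reading a dendrogram from its leaves up to its root.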
Network KPIs are numeric variables with different scales and internal correlation; thus a normalized PCA (principal component analysis) is applied, which explains almost 95% of the total inertia (variability in KPIs) in the first factorial plane. Each axis in the plane models a "hidden variable" that combines the original ones. The first and second axes explain, respectively, 56% and 39% of the total variability. The projection of the cloud of replicas and KPIs onto the first factorial plane reveals the meaning of the hidden variables (see Figure 3(a)).
The first factorial axis (see Figure 3(a)) is a size axis that is positively associated with the most contributive variables, such as the total travel time (ttt.h), density, and the CO2 and NOx emissions, as well as the DP factor (total demand). Mean speed points in the opposite direction. This axis clearly represents the quantity of trips (total demand), and it is obviously dominated by the DP factor. Once the total number of trips is controlled for, the network performance is represented on a second axis that is orthogonal to the first one. The network throughput flow (mflow) and the total travelled distance (ttdis) are positively correlated, indicating an increase in total distance when throughput flow increases. This total travelled distance is also more positively related to fuel consumption than to CO2 and NOx emissions. On the negative side of the second axis, there is also a positive correlation with density when we look at the mean delay (mdelay.s.km) and the mean travel time (mtt.s.km) needed to cover a km.
According to the overlapped representation of ellipses around the centres of demand pattern levels, low increments over historic demand are located to the left of the interpreted size axis 1; high increments go to the right; and the largest variability range changes from axis 1 to axis 2, thus increasing congestion. Clearly, the second axis is a hidden variable that indicates quality of service in terms of congestion (better in the positive part of axis 2). Once the demand pattern factor DP has been clearly identified, the rest of the factor levels are located in the diagram to gain interpretability on the quantity-quality hidden variables plane.
In Figure 3(b), the meaning of the quantity-quality axes is reinforced for the second round of simulations, which has more homogeneous KPI results in terms of variability. Clearly, the second axis is a latent congestion-level axis where delay and mean time for covering one km (mdelay.s.km and mtt.s.km) are 180 degrees opposite to the throughput rate. Once the demand factor DP has been clearly identified, the rest of the factor levels are located in the diagram to gain interpretability from the quantity-quality latent plane (Figure 3(a)).
For the second round of simulations in the base level of driver type configuration (TD), the meaning of the quantity-quality latent plane is reinforced (Figure 3(b)). Blue and yellow arrows in Figure 3 from left to right join the gravity centres for DP level ellipses from low to high demands.
Marginal net effects on mean speed and fuel consumption have been analysed for design factor levels in Figure 4. It is noteworthy that a guidance penetration (GP factor) increment up to 30% benefits mean speed, but the mean speed tends to decrease for guidance levels over 30%, since guidance strategies are merely reactive. Nevertheless, as demand increases (DP factor), mean speed decreases almost linearly. However, the fuel consumption KPI tends to increase as either GP or DP increases.
Clearly, increasing the guidance penetration (GP) has a positive effect on performance in terms of mean speed (up to a 30% limit), but also on fuel consumption KPIs, as shown in Figure 4; and lane-level guidance outperforms the results of link-level guidance for the two selected KPIs in Figure 4.
A diagram in the style of a macro fundamental diagram [32] is presented in Figure 5(a). It considers flow, density axis, and contour curves for speed, and it concerns the first round of simulations, with replications grouped according to unsupervised clustering techniques based on factorial space distances. Five clusters are identified by hierarchical classification rules. Note that there are two clusters (3 and 4) at the break point of congestion: one is over the curve indicating a large throughput (network performance, cluster 4); the second (cluster 3) is mostly under the curve, which indicates lower performance (throughput). The difference between these two groups relies on the 30% DP with 70%, 80%, and 90% GPs in cluster 4 over the curve, while cluster 3 points exhibit 20% DP and low GP levels. Cluster 5 belongs to 30% DP, with no situations in which serious damage to network performance occurs due to guidance being provided. In Figure 5(b), simulations in cluster 2 have 20% DP, so congestion is found for GP over 30%. Routes that are experienced according to recurrent conditions (historic demand) are not valid when an increasing demand appears; therefore, the reactive navigation strategies tend to maintain the level of service by increasing total travel distance and fuel consumption. Otherwise, the level of service decays and NOx and CO2 emissions increase.
Finally, Table 2(a) presents interesting figures for the means of speed and trip travel time (according to driver KPIs) in different scenarios for the second round of simulations. No guidance is the reference for each class of driver and DP level. Under base demand (0% DP), all drivers tend to increase their speed moderately when probe vehicle penetration grows (guidance is available for a variable part of the population according to experimental design), but nonguided expert and regular driver speeds are better than those for guided drivers (since recurrent congestion conditions are present). As more demand is injected into the network (DP 20% and 30%), guided versus nonguided expert differences increase in favour of guided ones, but even nonguided vehicles from any class benefit from incremental PVD penetration. In Table 2(b), the mean trip travel time for guided vehicles is always less than trip travel time for nonguided vehicles when demand increases (20% and 30% DP), when low PVD penetration is present (10% PVD) and the overall benefit within each driver class is enhanced as PVD increases. Nevertheless, as PVD increases, nonguided drivers benefit from shorter mean travel times. A detailed analysis of the results suggests that reactive navigation strategies should be considered in detail, because longer distance routes are produced for guided vehicles and, therefore, speed is higher for guided vehicles while their final trip travel times are not reduced since they follow longer routes.

Conclusions
The research carried out was based on an approach consisting of a general framework and simulation architecture for emulating and evaluating penetration rates of probe car fleets and navigation services for a subset of drivers. The contribution of this paper lies in a detailed simulation of driver classes. Reactive navigation strategies were calculated according to travel times estimated from PVD, and they have been shown to be advantageous for guided and nonguided cars up to a certain level of around 30% of guided vehicles for any PVD penetration. Nevertheless, reactive guidance increases the level of service at the expense of increasing guided trip lengths (the overall trip travel time decreases), thus increasing fuel consumption. This is unhelpful in terms of sustainability. Additionally, when demand increases and the level of service for drivers decreases, it is worth using route guidance for a fraction of the vehicles because this reduces NOx and CO2 emissions (despite the longer distances and consumption). Any assessment of mobility services must therefore consider several KPIs, since a simple increment in driver speed or a reduction in travel time does not automatically translate into gains in sustainability (fuel consumption and emissions reduction). In recurrent traffic conditions, navigation devices are not suitable for expert drivers, but those drivers do benefit from their being used by the other cars. Furthermore, when the travel times involved in guidance are estimated from a fleet of PVD, the overall effect on network KPIs is positive even at the lowest tested penetration rate of 10%. Hence, after extensive analysis of the results obtained for several KPIs, it can be concluded that OD travel times between POIs inferred from PVD appear to be reasonable inputs for simplifying dynamic OD matrix estimation formulations that are being developed by the authors in an ongoing work (also proposed in [24]). Finally, PVD penetration should represent a nonnegligible percentage of drivers, since the results for 20-30% PVD penetration are consistently better for any KPI at the network level than those obtained for a 10% PVD penetration rate (a fleet of 4,250 probe vehicles in the base scenario).
(i) Free-flow $k$-shortest paths when no probe fleet is available
(ii) Stochastic Route Choice: lane-based, according to instantaneous traffic conditions (lane travel times) as inferred from data provided by PVD
(iii) Stochastic Route Choice: link-based, according to instantaneous traffic conditions (link travel times) as inferred from data provided by PVD

Collected KPIs.
Default statistics in traffic microsimulation platforms are usually very rich. Statistics have been collected every 90 s and stored in an SQLite database for each replication. Driver KPIs collected for each expert, regular, and tourist driver type (either guided or nonguided) are as follows:
(i) Mean travel time to cover 1 km (s/km)
(ii) Mean speed per vehicle (km/h)
(iii) Mean delay while covering 1 km (s/km)
(iv) Mean travel distance per vehicle (km)
(v) Mean travel time per vehicle (min)
Network KPIs are global statistics (all driver classes, buses included) over the whole simulation horizon, and they are as follows:
(i) Total travel distance (km), ttdis
(ii) Total travel time (h), ttt.h
(iii) Fuel consumption (l), fuelc
(iv) Total CO2 emissions (kg), co2
(v) Total NOx emissions (kg), nox
(vi) Mean travel time to cover 1 km (s/km), mtt.s.km
(vii) Mean delay time while covering 1 km (s/km), mdelay.s.km
(viii) Density (veh/km), density
(ix) Mean speed (km/h), mspeed
(x) Mean flow (veh/h) (throughput measure), mflow
(xi) Throughput rate (%), completed trips divided by total demand, thrputrate
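As an illustration of how driver KPIs of this kind could be derived from per-replication SQLite storage, the sketch below aggregates a small in-memory table. The schema (`trips` with one row per completed trip) and the sample rows are hypothetical, since the actual database layout is not described, and per-km indicators are computed here as means of per-vehicle ratios, which is one of several possible definitions.

```python
import sqlite3

# Hypothetical schema: one row per completed trip of a replication.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trips (
        driver_class  TEXT,
        guided        INTEGER,
        distance_km   REAL,
        travel_time_s REAL,
        delay_s       REAL
    )
""")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?, ?)",
    [
        ("expert", 0, 2.0, 360.0, 60.0),
        ("expert", 0, 4.0, 480.0, 120.0),
        ("tourist", 1, 3.0, 540.0, 90.0),
    ],
)

KPI_NAMES = ["tt_s_per_km", "speed_kmh", "delay_s_per_km", "distance_km", "tt_min"]


def driver_kpis(connection):
    """Driver KPIs per (driver class, guided flag), averaged over vehicles."""
    query = """
        SELECT driver_class, guided,
               AVG(travel_time_s / distance_km),
               AVG(distance_km / (travel_time_s / 3600.0)),
               AVG(delay_s / distance_km),
               AVG(distance_km),
               AVG(travel_time_s / 60.0)
        FROM trips
        GROUP BY driver_class, guided
    """
    return {(row[0], bool(row[1])): dict(zip(KPI_NAMES, row[2:]))
            for row in connection.execute(query)}


def throughput_rate(completed_trips: int, total_demand: int) -> float:
    """Throughput rate (%): completed trips divided by total demand."""
    return 100.0 * completed_trips / total_demand


kpis = driver_kpis(conn)
completed = conn.execute("SELECT COUNT(*) FROM trips").fetchone()[0]
rate = throughput_rate(completed, total_demand=4)
```

Keeping the aggregation in SQL means that the same query can be run unchanged on the database of every replication before results are combined across an experiment.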

Figure 2: Gross (a) and net (b) effects of design factors on network KPIs. All replicas. Colour proxy developed from (partial) Spearman correlation. First round of simulations.

Figure 3: Biplot of principal component analysis of network KPIs. 90% confidence ellipses around demand levels. (a) All configurations, first round of simulations, and (b) only base driver class, second round of simulations.

Figure 4: Speed (a) and fuel consumption (b). Network KPI net effects on design factor levels. All driver class configurations. First round of simulations.

Figure 5: Flow-density-speed diagram with replications grouped by unsupervised clustering techniques. (a) First round of simulations. (b) Second round of simulations.

Table 1: Relative (%) and absolute errors (min) for long trips (Manhattan distance origin to destination centroid over 3 km) when 5 replicas are considered in the base scenario at 90%, 95%, and 99% confidence levels.

Table 2: Mean speed (top) and mean travel time (bottom) for driver type 40-50-10 expert-regular-tourist (second round of simulations) according to DP and PVD factors.