COMPUTING TRAVEL TIMES FROM FILTERED TRAFFIC STATES

Abstract. This article experimentally assesses the influence of sensor data rates on travel time estimates computed from filtered traffic speed estimates. Using velocity data obtained from GPS smartphones and inductive loop detector data collected during the Mobile Century experiment near Berkeley, CA, and an evolution equation for average velocity along the roadway, an estimate of the traffic state is obtained via ensemble Kalman filtering. A large-scale batch of computations is run to produce estimates of traffic velocity with varying degrees of input data, and instantaneous and a posteriori dynamic travel times are compared to travel times recorded using license plate re-identification. We illustrate that dynamic travel time estimates can be computed with less than 10% error regardless of the data source, and that existing inductive loop detector data can significantly improve the accuracy of travel time estimates when GPS data is sparse.


1. Introduction.
1.1. Objective. Probe data will likely become ubiquitous in the not too distant future, due to the rapid expansion of consumer generated probe data from cellphones, personal navigation devices, and intelligent vehicles. As the use of probe data for traffic monitoring increases, so does the need to understand the benefits and trade-offs between GPS data and conventional data sources. Yet, a complete analysis of the trade-offs between probe data and fixed sensors is difficult, because the value of the data from any sensor (GPS equipped probe vehicles, inductive loop detectors, etc.), is dependent on the specifics of the sensing technology, the method used to process the data, and the specific traffic monitoring objective in question.
As navigation devices in vehicles become more common, either on smartphones or through integrated in-vehicle systems, the need to estimate travel times between any two arbitrary points on the network in real-time will become increasingly important. One approach to solving this problem is to construct explicit travel time estimators for all origin destination pairs on the network. Obviously, this approach quickly becomes intractable as the size of the network grows to scales relevant to commercial traffic monitoring companies. A computationally efficient alternative is to estimate the traffic speed throughout the network. Then, with the speeds on each link known, travel time estimates can be computed as requested by the navigation devices.
The latter approach to travel time estimation motivates our work, which is to study the following question. To what degree can GPS probe data act as a substitute for conventional traffic monitoring technologies such as inductive loop detectors, for the purpose of estimating travel times? Specifically, we focus our attention on estimating travel times by integrating various data volumes from inductive loop detectors and GPS equipped probe vehicles into a velocity flow model equivalent to the Cell Transmission Model [11,12], using an estimation technique known as ensemble Kalman filtering (EnKF). A preliminary version of this work was presented in conference form in [23].
This article emphasizes the experimental performance of a flow-model based estimator using data collected during a one day field experiment known as Mobile Century [17]. The data set collected during this experiment is unique because of the large number of GPS equipped probe vehicles representing 2-5% of the traffic flow, the dense coverage of working inductive loop detectors on the experiment site, and the availability of travel time data obtained from video license plate reidentification. Thus, although the results presented in this article are still limited in geographic scope and in time, to the best of the authors' knowledge, they are based on the most comprehensive publicly available GPS data set to date [30].
1.2. Related work. Several field experiments have been conducted to assess the applicability of cell phone-based measurements for traffic monitoring [1,2,3,8,22,25,28,29], including data generated from cell phone towers, which produces less accurate vehicle position and speed measurements compared to GPS. Bar-Gera [3] compared several months of network data from cellphones to inductive loop detector data on a 14 km freeway segment in Israel, and found them to be in good agreement. Liu et al. [22] evaluated a different network-based cell phone system in Minnesota, and compared travel times to license plate re-identification, and found the system generated results with varying accuracies. A summary of the major network-based cell phone experiments to date can be found in Liu et al. [22].
Several studies have also been conducted to assess the trade-offs between inductive loop detector data and data collected from GPS equipped probe vehicles. In Kwon et al. [20], it is shown that annual estimates of total delay, average duration of congestion, and average spatial extent of congestion can be made with less than 10% error by using either inductive loop detectors placed with half-mile spacing, or by using probe vehicle runs at a rate of about three vehicles an hour. Approximately four to six days of data is needed for reliable estimates from either data source.
The work of Herrera et al. [16] compares a nudging algorithm and a mixture Kalman filtering algorithm to examine how the addition of probe vehicle measurements sampled at a fixed time interval can decrease errors in estimating traffic velocity. On a 0.4 mile stretch of roadway, sampling 5% of the traffic at 150 second intervals with inductive loops at both ends of the domain leads to a 16% improvement over the inductive loop detector data alone. The article also uses the Mobile Century experiment data to compare three scenarios of time-based sampling of probe vehicles, finding that probe data outperforms inductive loop detector data for estimating traffic velocity if a sufficient number of measurements can be obtained from probe vehicles. The present work uses the same data set from Mobile Century, but we now consider nearly one thousand scenarios to compare probe data to inductive loop detector data.
The works [4,7,10] are also closely aligned with the present work. In Cristiani et al. [10], experimental mobile sensor data is used to calibrate a flux function and to determine an initial condition for a traffic flow model instantiated for a stretch of highway in Rome, Italy. The model is then simulated forward in time to make accurate travel time estimates. Blandin et al. [4] use forward simulation with boundary data obtained from sensors to compare estimates of the traffic state from a calibrated scalar traffic model with a calibrated 2 × 2 phase transition model [9]. Instead of open loop simulation with a calibrated model, the traffic prediction in [7] is updated by assimilating historical data in the forecast.
1.3. Methodology overview and organization of the article. In order to assess the trade-offs between velocity data collected from GPS smartphones and velocity data obtained from inductive loop detectors, it is necessary to define the process by which the data is transformed into an estimate of travel time. In this article, we rely on a velocity estimation algorithm developed as part of the Mobile Millennium project [30]. The algorithm combines velocity measurements from GPS smartphones or inductive loop detectors with a model of traffic evolution, using a technique known as ensemble Kalman filtering (EnKF) to produce an improved estimate of the velocity field, from which the travel time is computed. The resulting travel time computed from this process is then compared to the travel times recorded from the license plate re-identification video data.
With the data processing algorithm determined, we create a number of scenarios in which the volume of probe data and number of inductive loop detectors made available to the processing algorithm are adjusted. For example, this allows us to compare the accuracy of computing travel times when all of the probe data is made available, to travel times which are computed when only some of the probe data is available, to travel times when some probe data is available and some inductive loop detector data is available. In this way, we can quantify the trade-offs of various amounts of data from probes and inductive loop detector data in terms of increased or decreased accuracy of the computed travel times.
In order to describe and quantify what probe data is made available to the travel time processing algorithm, we detail two metrics of importance to probe data, namely the penetration rate and the sampling rate. The penetration rate is defined as the percentage of cars on the roadway reporting probe data compared to the overall traffic flow, including the vehicles which do not send data. In addition to increasing the number of measurements, as the penetration rate increases, the sample of vehicles which generate measurements is more likely to be representative of the total traffic flow. The sampling rate refers to the frequency at which data is collected from the probe vehicles, and can be used to increase or decrease the number of measurements made available for estimating travel times from the same vehicles. The probe data collection technique used in this work collects data from probe vehicles at fixed points in space using a technique known as Virtual Trip Lines (VTLs) [18] invented by Nokia. By decreasing the spacing between the VTLs, the probe vehicles will send more measurements, with smaller spacing between measurements.
In order to modify the amount of data obtained from inductive loop detectors, the number of inductive loop detectors which are made available to the processing algorithm is adjusted. Because this article focuses on a real highway, it is not possible to modify the location of the inductive loop detectors. Instead, given a fixed number of inductive loop detectors to include for a given scenario, we select the specific loop detectors such that they achieve as uniform a spacing along the highway as possible.
The work presented in this article summarizes the findings of a large scale computational study in which an ensemble Kalman filtering estimation algorithm is used to produce traffic estimates, and to characterize the dependency of the solution on the amount of sensing data used for the computation. The contribution of the article is thus the method used to assess the potential gains provided by probe and inductive loop data.
The remainder of this article is organized as follows. The key features of the processing algorithm used for velocity estimation are given in Section 2, and the methods for computing travel times from the velocity field are described. The data collected from the Mobile Century experiment is detailed in Section 3.1, and in Section 3.2, the techniques for generating scenarios with various amounts of input data from probe vehicles and inductive loops are presented. In Section 4, the results of nearly one thousand scenarios using various amounts of inductive loop detector data and probe data for travel time estimation are analyzed. Finally, the discussion in Section 5 summarizes the results.
2. Algorithm for estimating travel times. The processing algorithm used in this work is based on a velocity estimation algorithm developed in the Mobile Millennium system. The algorithm takes velocity data from inductive loop detectors and probe vehicles as input, combines the data with a physical model of traffic evolution, and produces an improved estimate of the velocity along the corresponding stretch of roadway. Using this improved estimate of velocity, an estimated travel time is computed using an instantaneous method and a dynamic method, to compare against the travel times recorded from video data. A brief overview of this process is described in this section.
2.1. Mobile Millennium highway traffic estimation algorithm. The velocity estimation algorithm developed in the Mobile Millennium system is based on a discretization of a traffic flow model known as the Lighthill-Whitham-Richards (LWR) partial differential equation [21,24] which describes the evolution of traffic density on the highway. In its discrete form, this model is also known as the Cell Transmission Model [11,12]. In order to simplify the velocity estimation problem, this model is transformed into an equivalent velocity evolution equation [27] operating on a 30 second time step. The complete mathematical details of the employed traffic velocity evolution equation, and the fusion of velocity measurement data with the evolution equation using ensemble Kalman filtering (EnKF) are presented in [27]. We give an overview of the model and filtering algorithm next.
2.1.1. Velocity traffic dynamics. The seminal LWR equation, as proposed in [21,24], reads:
$$\frac{\partial \rho(x,t)}{\partial t} + \frac{\partial q(x,t)}{\partial x} = 0$$
where $\rho(x,t)$ and $q(x,t)$ respectively denote the density and flow at location $x$ and time $t$. Additionally, we denote $v(x,t)$ the velocity field on the highway, and assume the velocity can be specified as a function $V$ of the density only. This allows the flux $q$ to be defined as a function of the density:
$$q(x,t) = \rho(x,t)\, V(\rho(x,t))$$
The widely used triangular flux function assumes a constant velocity in free-flow and a hyperbolic velocity in congestion:
$$V(\rho) = \begin{cases} v_{\max} & \text{if } \rho \le \rho_c \\ -w_f \left( 1 - \dfrac{\rho_{\max}}{\rho} \right) & \text{otherwise} \end{cases}$$
where $v_{\max}$, $\rho_{\max}$, $\rho_c$ and $w_f$ are respectively the maximum velocity, the maximum density, the critical density at which the flow transitions from free-flow to congested, and the backwards propagating wave speed. Because the triangular velocity function is constant in free-flow, and therefore not strictly monotonic, it cannot be inverted.
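In order to use the triangular model in a velocity setting, we approximate it by the Smulders velocity function [26], with a linear expression in free-flow and a hyperbolic expression in congestion:
$$V(\rho) = \begin{cases} v_{\max} \left( 1 - \dfrac{\rho}{\rho_{\max}} \right) & \text{if } \rho \le \rho_c \\ -w_f \left( 1 - \dfrac{\rho_{\max}}{\rho} \right) & \text{otherwise} \end{cases} \tag{3}$$
For continuity of the flux at the critical density $\rho_c$, the additional relation $\frac{\rho_c}{\rho_{\max}} = \frac{w_f}{v_{\max}}$ must be satisfied. The Smulders velocity function (3) can be inverted to obtain the density as a function of velocity:
$$V^{-1}(v) = \begin{cases} \rho_{\max} \left( 1 - \dfrac{v}{v_{\max}} \right) & \text{if } v \ge v_c \\ \dfrac{\rho_{\max}}{1 + v / w_f} & \text{otherwise} \end{cases}$$
where $v_c$ is the critical velocity: $v_c = V(\rho_c)$. The LWR PDE is discretized using a Godunov numerical scheme to obtain a discrete density evolution equation. After discretization, we can apply a variable change on the densities using the inverse velocity function $V^{-1}$. Let $K$ and $i_{\max}$ be two positive integers. We discretize time (indexed by $k$) and space (indexed by $i$) in $K$ time steps of length $\Delta T = T/K$ and $i_{\max}+1$ space cells of length $\Delta x = L/(i_{\max}+1)$. Then, according to the Godunov scheme, the velocity value $v_{k+1,i}$ of cell $i$ at time step $k+1$ can be computed as:
$$v_{k+1,i} = V\left( V^{-1}(v_{k,i}) - \frac{\Delta T}{\Delta x} \left( g(v_{k,i}, v_{k,i+1}) - g(v_{k,i-1}, v_{k,i}) \right) \right) \tag{5}$$
where $g(v_1, v_2)$ represents the numerical flux between consecutive cells with respective velocities $v_1$ and $v_2$. In the case of the Smulders model, we obtain:
$$g(v_1, v_2) = \begin{cases} v_2 \, \dfrac{\rho_{\max}}{1 + v_2 / w_f} & \text{if } v_c \ge v_2 \ge v_1 \\ v_c \, \rho_{\max} \left( 1 - \dfrac{v_c}{v_{\max}} \right) & \text{if } v_2 \ge v_c \ge v_1 \\ v_1 \, \rho_{\max} \left( 1 - \dfrac{v_1}{v_{\max}} \right) & \text{if } v_2 \ge v_1 \ge v_c \\ \min\left( V^{-1}(v_1)\, v_1,\; V^{-1}(v_2)\, v_2 \right) & \text{if } v_1 \ge v_2 \end{cases} \tag{6}$$
We note that the evolution of the velocity field at each discrete point on an edge is well defined by (5) and (6), except at the boundary points $v_{k,0}$ and $v_{k,i_{\max}}$. At these boundaries, the equations contain references to the ghost cells $v_{k,-1}$ and $v_{k,i_{\max}+1}$, which do not lie in the physical domain. The values of $v_{k,-1}$ and $v_{k,i_{\max}+1}$ are given by the prescribed boundary conditions to be imposed on the left and right side of the domain respectively. Extension of the model to networks follows a standard approach [15], and is detailed in [27].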
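As an illustration of the velocity update on a single edge, the following Python sketch implements the Smulders velocity function, its inverse, a Godunov numerical flux expressed in velocities, and one time step with ghost cells. It is a minimal sketch under stated assumptions, not the Mobile Millennium implementation: the function names (`make_smulders`, `numerical_flux`, `ctm_v_step`) and the parameter values in the usage note are hypothetical, and the network extension is omitted.

```python
def make_smulders(v_max, rho_max, w_f):
    """Build V, its inverse, and the critical velocity for the Smulders model."""
    rho_c = rho_max * w_f / v_max           # from rho_c / rho_max = w_f / v_max
    v_c = v_max * (1.0 - rho_c / rho_max)   # critical velocity V(rho_c)

    def V(rho):
        if rho <= rho_c:
            return v_max * (1.0 - rho / rho_max)   # linear free-flow branch
        return w_f * (rho_max / rho - 1.0)         # hyperbolic congested branch

    def V_inv(v):
        if v >= v_c:
            return rho_max * (1.0 - v / v_max)
        return rho_max / (1.0 + v / w_f)

    return V, V_inv, v_c


def numerical_flux(v1, v2, V_inv, v_c):
    """Godunov flux between an upstream cell (v1) and a downstream cell (v2)."""
    def flow(v):
        return V_inv(v) * v

    if v1 >= v2:          # upstream at least as fast: take the smaller flow
        return min(flow(v1), flow(v2))
    if v1 >= v_c:         # both cells in free flow
        return flow(v1)
    if v2 <= v_c:         # both cells congested
        return flow(v2)
    return flow(v_c)      # congested upstream, free flow downstream: capacity


def ctm_v_step(v, v_left, v_right, dt, dx, V, V_inv, v_c):
    """Advance the cell velocities v by one time step.

    v_left and v_right are the ghost cell velocities given by the
    boundary conditions (an assumption of this sketch: one edge only).
    """
    padded = [v_left] + list(v) + [v_right]
    updated = []
    for i in range(1, len(padded) - 1):
        inflow = numerical_flux(padded[i - 1], padded[i], V_inv, v_c)
        outflow = numerical_flux(padded[i], padded[i + 1], V_inv, v_c)
        rho = V_inv(padded[i]) - dt / dx * (outflow - inflow)
        updated.append(V(rho))
    return updated
```

For example, with the hypothetical values `v_max = 30` m/s, `rho_max = 0.125` veh/m and `w_f = 5` m/s, a 30 second time step requires cells of at least 900 m for the scheme to respect the usual stability condition `v_max * dt <= dx`.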

2.1.2. Traffic state estimation. Given the velocity field $v_k = [v_{k,0}, \ldots, v_{k,i_{\max}}]^T$ on all space cells at time step $k$, the velocity field at time $k+1$ can be obtained by applying:
$$v_{k+1} = f(v_k) + w_k \tag{9}$$
where $f$ represents the update algorithm as described in (5), (6), (7) and (8). The term $w_k \sim (0, Q)$ is the white, zero mean model noise with covariance $Q$. At every time step $k$, the measurements $y_k$ are related to the traffic state through the observation equation:
$$y_k = H_k v_k + \nu_k$$
where $H_k \in \{0, 1\}^{z_k \times (i_{\max}+1)}$ encodes the location of the $z_k$ data sources that sent measurements during that time interval, and $\nu_k$ represents the measurement errors, which are assumed to follow a zero mean distribution with a measurement error covariance matrix $R_k$. Note that $H_k$ depends on the time due to the fact that the location where measurements are recorded changes over time [27]. The recursive state estimation problem for linear time invariant systems is commonly solved using the Kalman filtering algorithm [19]. New measurements can be integrated optimally at every time step, using only the estimate at the previous time step. An a priori estimate at time $k$ given data through time $k-1$, denoted $v_{k|k-1}$, is computed by evolving the a posteriori estimate $v_{k-1|k-1}$ through a linear evolution equation. Then, measurements at time $k$ are collected and used to construct a corrected estimate $v_{k|k}$ by minimizing the trace of the posterior error covariance matrix $P_{k|k}$.
Due to the nonlinearity of (9), a Kalman filter cannot be used, and an ensemble Kalman filter (EnKF) approach is chosen instead, as described in [13] and [14]. The EnKF circumvents the need for linearizing the evolution equation to fit the Kalman filtering framework, and instead approximates the evolution of the error covariance matrix $P_{k|k}$ through an ensemble (sample) approximation. The algorithm is detailed in Algorithm 1.
Algorithm 1 Ensemble Kalman filter
1: Initialization: Draw $J$ ensemble realizations $v^j_{0|0}$ (with $j \in \{1, \cdots, J\}$) from a process with a mean $\bar{v}_{0|0}$ and covariance $P_{0|0}$.
2: Model prediction: Update each of the $J$ ensemble members according to the CTM-v forward simulation algorithm $f$:
$$v^j_{k|k-1} = f(v^j_{k-1|k-1}) + w^j_{k-1}$$
Then update the ensemble mean $\bar{v}_{k|k-1}$ and covariance $P_{k|k-1}$ according to:
$$\bar{v}_{k|k-1} = \frac{1}{J} \sum_{j=1}^{J} v^j_{k|k-1}, \qquad P_{k|k-1} = \frac{1}{J-1} \sum_{j=1}^{J} \left( v^j_{k|k-1} - \bar{v}_{k|k-1} \right) \left( v^j_{k|k-1} - \bar{v}_{k|k-1} \right)^T$$
3: Measurement update: Obtain measurements, compute the Kalman gain $G_k$, and update the estimate of the state:
$$G_k = P_{k|k-1} H_k^T \left( H_k P_{k|k-1} H_k^T + R_k \right)^{-1}, \qquad v^j_{k|k} = v^j_{k|k-1} + G_k \left( y_k - H_k v^j_{k|k-1} + \nu^j_k \right)$$
4: Return to 2.

A few remarks on the performance of the velocity estimation algorithm described above are in order. First, it is noted that the algorithm was designed as part of the Mobile Millennium system, where it is not possible to track probe vehicles for privacy reasons, and thus no continuous GPS records from probes are assumed to be available for the estimation algorithm. In practice, it is expected that the performance of the estimation algorithm could be improved when tracking of individual probe vehicles is allowed. Second, it should be noted that the flow model requires some historical flow information to calibrate the model. In this study, historical inductive loop detector data from PeMS was used to estimate constant mainline inflow and outflows, and thus all results presented use inductive loop data in this way.
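The prediction and measurement update cycle of the ensemble Kalman filter can be sketched with NumPy as follows. This is a generic perturbed-observation EnKF written under stated assumptions, not the Mobile Millennium code: the function name `enkf_step`, the Gaussian model and measurement noise, and the row-per-member array layout are choices made for this illustration.

```python
import numpy as np


def enkf_step(ensemble, f, Q, y, H, R, rng):
    """One EnKF cycle: model prediction followed by a measurement update.

    ensemble : (J, n) array, one a posteriori member per row at time k-1
    f        : function mapping a state vector to the next state vector
    Q, R     : model and measurement noise covariances (assumed Gaussian)
    y, H     : measurement vector (z,) and observation matrix (z, n)
    rng      : numpy.random.Generator used for all random draws
    """
    J, n = ensemble.shape

    # Model prediction: propagate each member and add model noise.
    model_noise = rng.multivariate_normal(np.zeros(n), Q, size=J)
    forecast = np.array([f(member) for member in ensemble]) + model_noise

    # Ensemble mean and sample covariance of the forecast.
    mean = forecast.mean(axis=0)
    anomalies = forecast - mean
    P = anomalies.T @ anomalies / (J - 1)

    # Kalman gain computed from the sample covariance.
    G = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)

    # Measurement update with independently perturbed observations.
    perturbed = y + rng.multivariate_normal(np.zeros(len(y)), R, size=J)
    innovation = perturbed - forecast @ H.T
    return forecast + innovation @ G.T
```

In a usage such as `enkf_step(members, f, Q, y, H, R, np.random.default_rng(0))`, the rows of `H` would each contain a single one marking the cell in which a loop detector or VTL reported a velocity during the time step.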
Next, the methods for computing the instantaneous and dynamic travel times from an estimated velocity field are described.

2.2. Methods for computing travel times. The position of a vehicle $p(t)$ moving at the average speed of traffic evolves according to:
$$\frac{dp(t)}{dt} = v(p(t), t), \qquad p(t_0) = p_0 \tag{16}$$
where $v(x,t) = V(\rho(x,t))$. Thus, the dynamic travel time $\tau_{dyn}(t_0)$ of a vehicle starting a trip at time $t_0$ at position $p_0$ is computed by solving (16) for the vehicle position, then solving:
$$p(t_0 + \tau_{dyn}(t_0)) = p_0 + L$$
where $L$ is the distance of the trip. Note that in general, the dynamic travel time may be infinite if the velocity drops to zero and remains zero indefinitely, although this is not an issue in practice.
A limitation of (16) is that it relies on knowledge of the velocity field up to $\tau_{dyn}$ into the future. Computing the dynamic travel time in real time therefore requires a short term forecast of the velocity field, which introduces additional computational overhead. Moreover, when the future forecast is an open loop model prediction without data to correct the estimate, the error in the velocity estimate increases. It is also worth noting that the dynamic travel time for individual vehicles is measurable, for example via license plate re-identification.
As an alternative, a common simplifying assumption is that the velocity field does not evolve during the travel time computation. This simplification gives rise to the instantaneous travel time, $\tau_{inst}$. To compute the instantaneous travel time, the position of the vehicle starting at $p_0$ at time $t_0$ is assumed to evolve according to:
$$\frac{dp(t)}{dt} = v(p(t), t_0), \qquad p(t_0) = p_0 \tag{17}$$
Then, the instantaneous travel time can be computed by solving (17) for the vehicle position, then solving:
$$p(t_0 + \tau_{inst}(t_0)) = p_0 + L$$
The instantaneous travel time has several advantages, most notably that it does not require the overhead of computing the velocity field forward in time, which can be intensive for large network predictions with an accurate discretization. This comes at the cost of decreased accuracy when the traffic state is changing quickly. Moreover, infinite travel times are a more serious concern for the instantaneous travel time, which becomes infinite if the velocity is zero at any point along the route. To keep the estimates bounded, a minimum velocity greater than zero can be used during the integration of the vehicle's position.
Since the velocity field is the a posteriori filtered estimate, the instantaneous and dynamic travel times are approximated by integrating through the discrete velocity field $v_{k|k,i}$. Specifically, the velocity field in (16) or (17) is approximated by:
$$v(p(t), t) \approx v_{k|k,i}$$
where $i = \lfloor p(t)/\Delta x \rfloor$, with $\lfloor \cdot \rfloor$ denoting the floor operator. When computing the dynamic travel time, the time index $k$ is updated according to $k = \lfloor t/\Delta T \rfloor$, and fixed as $k = \lfloor t_0/\Delta T \rfloor$ for the instantaneous travel time.
Because the estimated velocity field is piecewise constant in each cell over each time step, the solution of (16) or (17) can be computed exactly. Assuming the current position of the vehicle is in cell $i$, the vehicle updates its position each $\Delta T$ by computing the time $\tau$ required to reach the boundary between cells $i$ and $i+1$ when traveling at the speed $v_{k|k,i}$. If $\tau < \Delta T$, the vehicle reaches the boundary within the time step. The vehicle travels at the speed $v_{k|k,i}$ until the boundary is reached, and travels at the speed $v_{k|k,i+1}$ afterward. If the boundary is not reached, the vehicle travels at $v_{k|k,i}$ for the full time step. The position update is summarized as
$$p(t + \Delta T) = \begin{cases} p(t) + v_{k|k,i}\, \Delta T & \text{if } \tau \ge \Delta T \\ (i+1)\Delta x + v_{k|k,i+1} \left( \Delta T - \tau \right) & \text{if } \tau < \Delta T \end{cases} \qquad \text{with } \tau = \frac{(i+1)\Delta x - p(t)}{v_{k|k,i}}$$
After $\Delta T$, the position of the vehicle is updated, $k$ is incremented, the new cell index $i$ is computed, and the process is repeated.
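The integration through a piecewise constant velocity field can be sketched in Python as below, for both the instantaneous and the dynamic travel time. This is an event-driven variant of the position update (it advances to the next cell boundary or time step boundary, whichever comes first), written under assumptions that are not part of the original algorithm: the function name `travel_time`, the list-of-rows layout `velocity[k][i]`, the default minimum velocity `v_min`, and the choice to hold the last available velocity field when a dynamic trip outlasts the estimated horizon.

```python
def travel_time(velocity, p0, t0, trip_length, dx, dt, dynamic=True, v_min=0.1):
    """Travel time through a field that is constant per cell and time step.

    velocity[k][i] is the estimated speed in cell i during time step k.
    dynamic=True updates the time index as the vehicle travels;
    dynamic=False freezes the field at the departure time (instantaneous).
    """
    n_cells = len(velocity[0])
    end = p0 + trip_length
    if end > n_cells * dx:
        raise ValueError("trip extends beyond the estimated velocity field")

    k = int(t0 // dt)            # time index at departure
    step_end = (k + 1) * dt      # end of the current time step
    i = int(p0 // dx)            # cell index at departure
    p, t = p0, t0
    while p < end:
        # Assumption: reuse the last field if the horizon is exceeded.
        row = velocity[min(k, len(velocity) - 1)]
        v = max(row[i], v_min)   # bounded below to keep travel times finite
        target = min((i + 1) * dx, end)   # next cell boundary or trip end
        tau = (target - p) / v            # time needed to reach the target
        if dynamic and t + tau > step_end:
            # The time step ends first: move within the cell, refresh field.
            p += v * (step_end - t)
            t = step_end
            k += 1
            step_end += dt
        else:
            # The target is reached within the current time step.
            p = target
            t += tau
            i += 1
    return t - t0
```

With a two-step field `[[20, 10, 20], [20, 20, 20]]` (m/s), 1000 m cells and 30 s steps, a 3000 m trip starting at time zero takes 200 s with the instantaneous method, because the slow middle cell is frozen in place, and 150 s with the dynamic method, because the congestion has cleared by the time the vehicle arrives.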
Note that higher resolution numerical reconstructions are available for the dynamic travel time reconstructions [5] if discretization errors become significant.
3. Experimental data.
3.1. The Mobile Century experiment. In this section, the key features of the data collected during the Mobile Century experiment are presented. For a complete description of the experiment, the interested reader is referred to [17]. The Mobile Century field experiment was a one-day test in the San Francisco Bay Area which collected GPS data from cell phones in probe vehicles, inductive loop detector data, and travel time data from license plate re-identification video data. The experiment took place on February 8th, 2008, and involved 100 probe vehicles equipped with Nokia N95 cell phones which repeatedly drove a stretch of the I-880 freeway near Union City, CA, generating 2,200 vehicle trajectories.
The experiment site is also covered with 17 working inductive loop detector stations which feed measurements into the PeMS system [6]. The inductive loop detectors record the sensor occupancy and vehicle counts every 30 seconds, which are processed by a Mobile Millennium filtering algorithm in order to obtain the 30 second average velocity at the sensor. At 5 minute intervals, the PeMS system produces an estimate of the 5 minute average velocity at the sensor, which is shown in Figure 1b for the northbound traffic. The locations of the inductive loop detector stations are shown in Figure 1a.
Finally, as part of the experiment, high definition video cameras were temporarily installed to record license plates of northbound traffic. The travel times recorded from the re-identified vehicles traveling northbound are shown in Figure 2. Most re-identified drivers experienced heavy evening congestion, with travel times increasing to 15-20 minutes.

3.2. Algorithms for data selection. To assess the trade-offs between different amounts of probe data and inductive loop detector data for the purpose of estimating travel times, we algorithmically select different subsets of loop and GPS probe data from the Mobile Century experiment [17], and use these subsets as inputs to the estimation process described in the previous section. We now describe the scenarios which modify the type and amount of the data which is made available for estimation, and the selection criteria which are used to generate the scenarios.

3.2.1. Selection of inductive loop detector data. In order to modify the number of inductive loop detector stations which are made available for computing travel times, a simple selection criterion is developed. Specifically, given a fixed number of stations to include, the loops are selected in order to minimize the variance of the distance between consecutive sensors, resulting in approximately uniformly spaced sensors.
We consider a stretch of highway of length $L$, starting at $x = 0$ and ending at $x = L$, with $n$ inductive loop detector stations located at $x_1, x_2, \cdots, x_n$, as shown in Figure 3. Let $S_i$ denote the spacing between sensor $i$ and $i+1$. In order to treat the boundaries without explicit knowledge of sensors outside the domain $x \in [0, L]$, it is assumed that only half of the first inter-station spacing $S_0$ and of the last inter-station spacing $S_n$ lie in the domain of interest. The weighted average spacing between the $n$ sensors is given by:
$$\bar{S} = \frac{1}{n} \left( \frac{1}{2} S_0 + \sum_{i=1}^{n-1} S_i + \frac{1}{2} S_n \right) = \frac{L}{n}$$
where the first and last spacings have a weight $\frac{1}{2}$, since only half of these spacings actually lie within the $[0, L]$ domain. Note that the average spacing is independent of the specific locations of the sensors $x_i$, and consequently cannot be used as a selection criterion.
Instead, we use a selection criterion which explicitly takes the uniformity of the inter-station distances $S_i$ into account. This is achieved by minimizing the variance $\sigma^2$ of the inter-station spacings $S_i$, given by:
$$\sigma^2 = \frac{1}{n} \left( \frac{1}{2} \left( S_0 - \bar{S} \right)^2 + \sum_{i=1}^{n-1} \left( S_i - \bar{S} \right)^2 + \frac{1}{2} \left( S_n - \bar{S} \right)^2 \right)$$
Again, the first and last spacings have a weight $\frac{1}{2}$, since only half of these spacings actually lie within the $[0, L]$ domain.
In practice, rather than minimizing the variance $\sigma^2$, it is convenient to minimize an equivalent loop detector placement criterion, denoted $\tilde{S}$, which shares the same minimizer as $\sigma^2$. The best set of $m$ inductive loop detector stations is then given by:
$$U^*(m) = \arg\min_{U \,:\, |U| = m} \tilde{S}(U)$$
where $|U|$ represents the number of elements in the set $U$. The resulting selections for the inductive loop detector stations are shown in Table 1.
In the case when the chosen inductive loop detector stations are uniformly spaced within the section of interest, the criterion $\tilde{S}$ is equal to the average spacing $\bar{S}$. Because $\bar{S}$ serves as a lower bound for $\tilde{S}$, the difference between $\tilde{S}$ and $\bar{S}$ indicates the degree of non uniformity of the sensor spacings caused by the fixed set from which the sensors are selected. Table 1 shows that the difference between the inductive loop detector placement criterion $\tilde{S}(U^*(m))$ and its lower bound, the average inductive loop detector spacing $\bar{S}(m)$, is small, indicating that the sensor spacing is relatively uniform.
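The station selection can be illustrated with a brute-force search in Python. The sketch below minimizes the weighted variance of the inter-station spacings directly, with half weights on the two boundary spacings as described above; it does not use the equivalent placement criterion. The function names, the dictionary of station positions and the exhaustive search over subsets are assumptions made for illustration, and the positions in the usage note are invented values, not the Mobile Century station locations.

```python
from itertools import combinations


def weighted_spacing_variance(positions, length):
    """Weighted variance of spacings for sensors on the segment [0, length].

    Only half of the first and last spacings lie inside the domain, so the
    full boundary spacings are twice the distance to the domain edge and
    they enter the variance with weight 1/2.
    """
    xs = sorted(positions)
    m = len(xs)
    mean = length / m                    # weighted average spacing
    first = 2.0 * xs[0]                  # full first spacing S_0
    last = 2.0 * (length - xs[-1])       # full last spacing S_m
    interior = [b - a for a, b in zip(xs, xs[1:])]
    total = 0.5 * (first - mean) ** 2 + 0.5 * (last - mean) ** 2
    total += sum((s - mean) ** 2 for s in interior)
    return total / m


def select_stations(stations, length, m):
    """Return the m station labels whose spacing is the most uniform.

    stations maps a label to a position along the highway.
    """
    return min(
        combinations(sorted(stations), m),
        key=lambda labels: weighted_spacing_variance(
            [stations[label] for label in labels], length
        ),
    )
```

For instance, with hypothetical stations `{1: 1.0, 2: 2.5, 3: 5.0, 4: 7.5, 5: 9.0}` on a segment of length 10, `select_stations(stations, 10.0, 2)` picks stations 2 and 4, for which all three spacings equal 5 and the variance is zero. An exhaustive search remains cheap for 17 stations, since the largest case has 24,310 subsets.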

3.2.2. Penetration rate for probe data. A few remarks about probe penetration rates are important to make, before criteria to modify the penetration rate are discussed. In general, the penetration rate is difficult to determine for probe vehicles specifically because it depends on the number of equipped probe vehicles, the total traffic flow, and the evolution of the traffic flow in space and time. Typically, only the total number of equipped probe vehicles is known to probe data providers. Similarly, the total traffic flow can only be estimated from counts recorded by inductive loop detectors at predefined locations. Finally, because the evolution of the traffic flow is not under the control of the probe vehicles, it is nearly impossible to a priori specify a penetration rate which is both uniform in space, and uniform in time.

Table 1. Inductive loop detector selection results. Given a number $m$, the selection algorithm returns the set $U^*(m)$ of $m$ inductive loop detector stations which minimizes the inductive loop detector placement index $\tilde{S}(U^*(m))$. The labels in $U^*(m)$ correspond to the labels of the inductive loop detectors in Figure 1a.
Because of the inherent difficulty in specifying the penetration rate a priori, we instead elect to directly modify the number of equipped probe vehicles as a proxy for modifying the penetration rate. The number of equipped probe vehicles in this study varies from 0% to 100% of the 2,200 Mobile Century vehicle trajectories, increasing by increments of 10%. Over the eight hour experiment, this corresponds to an average rate of probe vehicles between 27.5 veh/hr and 275 veh/hr. At a probe rate of 275 veh/hr, the 20 minute average penetration rate at the center of the experiment site ranges between 1.5% and 3% (Figure 4b). When a subset of the vehicle trajectories is required, the subset is determined by selecting the trajectories at random before the simulation.
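The proxy described above, drawing a fixed percentage of the trajectories at random before the simulation, can be sketched in Python as follows. The function names, the integer `percent` argument and the fixed `seed` are assumptions made for illustration; the original selection procedure is not specified beyond being random.

```python
import random


def select_trajectories(trajectory_ids, percent, seed=0):
    """Draw percent% of the trajectories at random, without replacement."""
    ids = list(trajectory_ids)
    count = len(ids) * percent // 100
    # A seeded generator makes the subset reproducible across runs.
    return sorted(random.Random(seed).sample(ids, count))


def probe_rate(num_trajectories, duration_hours):
    """Average number of probe trajectories per hour."""
    return num_trajectories / duration_hours
```

With 2,200 trajectories over eight hours, a 10% subset contains 220 trajectories, so `probe_rate(220, 8)` gives 27.5 veh/hr, and the full set gives 275 veh/hr.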

3.2.3. Space-based sampling. In order to modify the number of measurements used from each probe vehicle trajectory under spatial sampling, the number of locations where measurements are collected is modified. The locations where measurements are obtained are encoded through the placement of virtual trip lines (VTLs), which can be viewed as virtual geographic markers which trigger vehicles to send measurements when the vehicle trajectory intersects the VTL. A complete description of the VTL sampling strategy is given in Hoh et al. [18].
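To make the idea of space-based sampling concrete, the Python sketch below extracts one measurement each time a trajectory crosses a VTL. It is only an illustration under simplifying assumptions: the trajectory is a list of (time, position) samples along the roadway, the crossing time is linearly interpolated, and the reported speed is the average speed between the two surrounding samples. The function name `vtl_measurements` is hypothetical, and the actual VTL system described in [18] is not reproduced here.

```python
def vtl_measurements(trajectory, vtl_positions):
    """Return (crossing_time, vtl_position, speed) for each VTL crossing.

    trajectory    : list of (time, position) samples, increasing in time
    vtl_positions : positions of the virtual trip lines along the roadway
    """
    measurements = []
    for (t_a, x_a), (t_b, x_b) in zip(trajectory, trajectory[1:]):
        if x_b <= x_a:
            continue  # vehicle did not advance, so no line can be crossed
        speed = (x_b - x_a) / (t_b - t_a)
        for s in vtl_positions:
            if x_a < s <= x_b:
                # Linear interpolation of the time at which the line is hit.
                measurements.append((t_a + (s - x_a) / speed, s, speed))
    return sorted(measurements)
```

For a trajectory sampled at `[(0, 0), (10, 200), (20, 300)]` and VTLs at 100, 250 and 400, the sketch returns two measurements, at times 5 and 15 with speeds 20 and 10. Adding more VTLs to the same stretch yields more measurements from the same trajectory.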
Because the VTLs are virtual, it is possible to place them anywhere on the experiment site. The determination of optimal VTL placement is complex, so instead we elect to place the VTLs uniformly across the experiment site. The number of VTLs, denoted by $n_{VTL}$, tested in our scenarios varies from 9 VTLs to 99 VTLs, increasing by increments of 10 VTLs. This corresponds to an average VTL density between 0.79 and 8.68 VTL/mi. Note the number of VTLs used on the experiment site is significantly higher than the number of inductive loop detector stations. This is possible because unlike inductive loop detector stations, the marginal cost of virtual trip lines is small. The number of probe vehicle measurements used in each simulation is shown in Figure 4a.
3.3. Summary of scenarios considered. Below, we give a summary of the various combinations of input data used for computing travel times in this work.
• Number of inductive loop detectors. Nine sets of inductive loop detector data, ranging from scenarios with zero to 16 stations, increasing by increments of two.
• Number of probe data measurements. The amount of probe data is modified in two ways.
  - Penetration rate. Eleven penetration rates are considered, ranging from scenarios with no probe data, to scenarios when 100% of the 2,200 probe vehicle trajectories are used, increasing by increments of 10%. The nonzero rates correspond to an average rate of probe vehicles between 27.5 veh/hr and 275 veh/hr.
  - Number of measurements per vehicle. Ten sets of VTL locations are considered, ranging from scenarios with 9 (0.79 VTL/mi) to 99 VTLs (8.68 VTL/mi), increasing by increments of 10 VTLs.
In total, 917 scenarios are created by instantiating scenarios with all combinations of the 9 sets of inductive loop detector data sets, 11 probe penetration rates, and the 10 VTL sets. In the remainder of this section, we describe the specific algorithms which select the data for each scenario. The scenarios tested are summarized in Table 2.
4. Results and discussion. In this section, we present the results of the 917 runs with varying amounts of probe and inductive loop detector data. We also vary the type of travel time computed (instantaneous or dynamic). First, the quantification of error is described, then the computational results are presented.

4.1. Error metric. Because validation data is available for dynamic travel times via license plate re-identification, an error metric is used to compare the velocity estimation algorithm output that has been converted to travel times with the travel time measured from video recordings. By using the travel time error as a performance metric, estimation algorithm results can be compared with the results obtained when using different types and quantities of the input data.
Since the license plate re-identification data provides a distribution of individual vehicle travel times (see Figure 2), we define the true travel time as a one minute moving average of the recorded travel times. Figure 2 also shows the division of the experiment into four time periods that represent the different phases of the traffic during the experiment. These periods are (i) the morning accident, where travel times are decreasing as an incident clears, (ii) a free flow period during the middle of the day when travel times are low, (iii) a congestion building period before the evening rush hours, and (iv) full congestion during the evening rush hours. Because traffic conditions differ across these time intervals, the error is computed for each interval in addition to the full day.
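The ground-truth construction described above can be sketched as follows. This is a minimal illustration, assuming a window centered on each evaluation time and times given in seconds; the text specifies only a one minute moving average, so the window alignment, the function name, and the toy data are assumptions.

```python
import numpy as np


def moving_average_truth(entry_times, travel_times, eval_times, window=60.0):
    """Average the recorded travel times of vehicles entering within
    +/- window/2 seconds of each evaluation time (centered window: an assumption).
    Returns NaN where no vehicle was re-identified in the window."""
    entry_times = np.asarray(entry_times, dtype=float)
    travel_times = np.asarray(travel_times, dtype=float)
    half = window / 2.0
    truth = []
    for t in eval_times:
        mask = (entry_times >= t - half) & (entry_times < t + half)
        truth.append(travel_times[mask].mean() if mask.any() else np.nan)
    return np.array(truth)


# Toy data (hypothetical): entry times and travel times in seconds.
entries = [0.0, 10.0, 20.0, 100.0]
recorded = [600.0, 620.0, 640.0, 700.0]
truth = moving_average_truth(entries, recorded, eval_times=[10.0, 100.0, 300.0])
```

Averaging over a short window smooths the vehicle-to-vehicle spread visible in the raw re-identification data while still tracking changes in traffic conditions.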
The travel time error is computed as follows. Let $K$ be the number of estimates in the period for which the error is computed, with each estimate indexed by $k$. Let $\tau_v(t_k)$ be the mean travel time from the video data for vehicles entering at time $t_k$, and let $\tau_{inst}(t_k)$ and $\tau_{dyn}(t_k)$ be the estimated mean travel times computed with the instantaneous and dynamic methods at time $t_k$, respectively. The mean absolute percent error (MAPE) for the travel time computed with the instantaneous method is:
\[ \mathrm{MAPE} = \frac{1}{K} \sum_{k=1}^{K} \frac{\left| \tau_{inst}(t_k) - \tau_v(t_k) \right|}{\tau_v(t_k)}, \]
while the MAPE for the travel time computed with the dynamic method is computed similarly.
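A minimal sketch of the error computation, assuming the standard MAPE definition (the mean over the K estimates of |estimate − video| / video, expressed as a fraction). The function name and the sample values are hypothetical.

```python
import numpy as np


def mape(tau_video, tau_est):
    """Mean absolute percent error, returned as a fraction (0.05 means 5%).

    tau_video : reference travel times tau_v(t_k) from the video data
    tau_est   : estimated travel times (instantaneous or dynamic) at the same t_k
    """
    tau_video = np.asarray(tau_video, dtype=float)
    tau_est = np.asarray(tau_est, dtype=float)
    return float(np.mean(np.abs(tau_est - tau_video) / tau_video))


# Hypothetical travel times in seconds for K = 3 estimates.
tau_v = [600.0, 500.0, 400.0]
tau_dyn = [660.0, 500.0, 380.0]
err = mape(tau_v, tau_dyn)   # per-estimate errors: 10%, 0%, 5%
```

Because each term is normalized by the reference travel time, the same absolute error weighs more heavily when the reference travel time is short: a 30 s error is 5% of a 600 s trip but 2.5% of a 1200 s trip. Restricting the inputs to the estimates that fall within one of the four time periods gives the per-period error.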
4.2.1. Implementation. The estimates were computed using the existing Mobile Millennium [27] highway model summarized in Section 2, which is implemented in the Java programming language. The model was run 917 times with various data inputs. Each run consisted of computing the evolution of the mean speed field and computing both instantaneous and dynamic travel times every 30 seconds, from 10 am to 6 pm. The runs took 320 CPU-hours and were distributed over 8 servers equipped with 2.2 GHz dual core AMD Opteron CPUs and 8 GB RAM, which reduced the computation time to 39 hours. Although the full batch took nearly two weeks of CPU time, the implementation is efficient enough to make computational experiments of this scale practical. Specifically, each simulation requires propagating 100 ensemble members through the discretized network flow model and repeatedly correcting the model predictions with the EnKF algorithm over a period of eight hours, before the instantaneous and dynamic travel time estimates are computed. In total, each eight-hour numerical experiment takes 20 min to complete, which means it runs about 24 times faster than real-time.
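To make the correction step concrete, the sketch below shows a generic stochastic (perturbed-observation) EnKF analysis step in Python with NumPy. It is not the Mobile Millennium Java implementation: the linear observation operator H, the function name, and the toy numbers are assumptions chosen for illustration, and the model propagation step is omitted.

```python
import numpy as np


def enkf_analysis(X, y, H, R, rng):
    """Generic stochastic EnKF analysis step (illustrative, not the paper's code).

    X : (n, N) forecast ensemble, one column per ensemble member
    y : (m,)   observations, e.g. measured speeds
    H : (m, n) observation operator (assumed linear here)
    R : (m, m) observation error covariance
    """
    n, N = X.shape
    A = X - X.mean(axis=1, keepdims=True)   # ensemble anomalies
    HA = H @ A
    PHt = A @ HA.T / (N - 1)                # sample estimate of P H^T
    S = HA @ HA.T / (N - 1) + R             # innovation covariance
    K = PHt @ np.linalg.inv(S)              # Kalman gain
    # Each member assimilates its own noisy copy of the observation.
    noise = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    Y = y[:, None] + noise
    return X + K @ (Y - H @ X)


# Toy example: 5 cells, 100 members, one speed measurement in cell 2.
rng = np.random.default_rng(0)
X_f = 60.0 + 10.0 * rng.standard_normal((5, 100))   # forecast speeds (mph)
H = np.zeros((1, 5))
H[0, 2] = 1.0
y = np.array([30.0])      # measured speed (mph)
R = np.array([[4.0]])     # measurement noise variance
X_a = enkf_analysis(X_f, y, H, R, rng)
```

In this toy case the forecast spread in the observed cell is much larger than the measurement noise, so the analysis ensemble mean in that cell moves most of the way toward the measurement and its spread shrinks.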

4.2.2. Using only inductive loop detector data in the model. The first analysis of the traffic estimates is based on the results obtained when using inductive loop detector data as the only input to the model. These results give us a baseline for the comparison between probe and loop detector data. A total number of 17 runs were conducted based only on the inductive loop detector data, by varying the number of sensors according to the selection algorithm in Section 3.2. Both instantaneous and dynamic travel times were computed. The labels of the inductive loop detector stations used in the estimation are presented in Table 1 as a function of the number of stations selected (see also Figure 1a).
The results of these runs are shown in Figure 5. The subfigures show the estimation error broken down by time of day, as defined in Figure 2. During the morning accident (Figure 5a), the dynamic travel times converge to estimates with 7% error, while the instantaneous estimates remain above 20% error. The instantaneous and dynamic estimates have between 6% and 7% error during the free flow and congestion building periods (Figures 5b and 5c), and 13% error during the full congestion period (Figure 5d), with the instantaneous and dynamic estimates performing similarly.
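The gap between the two estimators during the clearing incident can be illustrated with a sketch of how each travel time is commonly computed from a space-time speed field. The sketch assumes speeds that are piecewise constant per cell and time step, a uniform cell length, and a speed field held at its last value beyond the horizon. The function names and toy numbers are hypothetical and are not taken from the experiment.

```python
import numpy as np


def instantaneous_travel_time(speed, k, cell_len):
    """Travel time (hours) using only the speed snapshot at time step k."""
    return float(np.sum(cell_len / speed[k]))


def dynamic_travel_time(speed, k, cell_len, dt):
    """Travel time (hours) of a virtual vehicle entering at step k that
    experiences the speed field as it evolves (needs later time steps)."""
    n_steps, n_cells = speed.shape
    t = k * dt
    for i in range(n_cells):
        remaining = cell_len
        while remaining > 1e-12:
            step = min(int(t / dt + 1e-9), n_steps - 1)
            v = speed[step, i]
            # Time left before the speed field is updated (never, at the last step).
            t_left = (step + 1) * dt - t if step < n_steps - 1 else float("inf")
            t_cross = remaining / v
            if t_cross <= t_left:
                t += t_cross
                remaining = 0.0
            else:
                t += t_left
                remaining -= v * t_left
    return t - k * dt


# Toy clearing incident: 3 cells of 1 mi, 0.1 hr steps, 10 mph then 60 mph.
speed = np.full((5, 3), 60.0)
speed[0, :] = 10.0
inst = instantaneous_travel_time(speed, 0, cell_len=1.0)
dyn = dynamic_travel_time(speed, 0, cell_len=1.0, dt=0.1)
```

In this toy case the instantaneous estimate is about 18 minutes, while the virtual vehicle needs only about 8 minutes because traffic speeds up while it is en route. When the speed field is steady, the two estimates coincide, which matches the similar performance reported for the slowly varying periods.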
The number of inductive loop detector stations used tends to have a positive impact on the quality of the estimate when less than eight inductive loop detector stations are used. Note that the curve is not monotonically decreasing. This is because when only a few sensors are deployed, the error becomes highly dependent on the placement of the sensors. It is expected that an optimal sensor placement  (Table 1). However, using data from more than eight inductive loop detector stations does not improve the quality of the estimates. If fewer than three inductive loop detectors are used, the estimation error is unacceptably high, at some points reaching as high as 100% error.

4.2.3. Using only VTL data in the model. The second part of the analysis consists of the travel time estimates obtained when using VTL data only. The changing parameters of the input data are the number of VTLs deployed on the experiment site and the rate of the probe vehicles used to produce speed measurements at the locations of VTLs. The estimation errors of the travel times obtained with the dynamic method are shown in Figure 6.
In each of the time periods, estimates of the travel time can be achieved with less than 6% MAPE, with sufficient probe vehicles and virtual trip lines. However, when more than 137.5 veh/hr are used with more than 2.54 VTL/mi, only small improvements in the accuracy of the estimates can be achieved. When compared with inductive loop detectors, using 137.5 veh/hr and 2.54 VTL/mi performs as well as the estimates using more than eight inductive loop detector stations during the morning accident, free flow, and congestion building periods, but has less than half the error of inductive loops during the full congestion period.

4.2.4. Fusing VTL and loop detector data. The dynamic travel time estimation error using both VTL and loop detector data simultaneously is assessed in Figure 7, where the change in the dynamic travel time MAPE due to the addition of data from six inductive loop detectors is computed. The results shown are a representative subset of all the runs performed when mixing the two data types.
At low probe data rates during the morning accident, free flow, and congestion building periods, adding inductive loop detector data increases the accuracy of the dynamic travel time estimates. For example, during the morning accident (Figure 7a), with a probe rate of 27.5 veh/hr and a VTL density of 0.79 VTL/mi, adding inductive loop detector data reduced the error from 29% to 8%. During the full congestion period, the dynamic travel time estimate accuracy decreased when inductive loop detector data was added at low probe rates (27.5 veh/hr). This is likely because the estimates based on virtual trip line data alone were unusually accurate, even outperforming simulations with more probe vehicles. At higher penetration rates (above 137.5 veh/hr), adding data from the six inductive loops has a negligible effect, increasing or decreasing the accuracy only slightly. The exception is the free flow period, when the MAPE increased (between 0.05 and 0.08) even at high probe rates when 0.79 VTL/mi were used. Errors in the free flow period are magnified by the small base travel time, which is under 10 minutes and is in fact not constant during the period (see Figure 2). Moreover, it is clear from Figure 1 that there is an area of heavy congestion around postmile 26 even during the free flow period, which is difficult to capture correctly with sparse sampling.
Instantaneous travel times can be determined at any time on any route using the speed estimates, and used as a proxy for dynamic travel times. As was shown for the inductive loop detector data in Figures 5b and 5d, instantaneous and dynamic travel time estimates are very similar when traffic conditions change sufficiently slowly. The same holds when estimating travel times from probe data. The instantaneous travel time errors in Figure 8a show an interesting result: they suggest that adding more probe data increases the travel time estimation error.
However, this result is expected, and can be explained by focusing on the scenarios (in Figure 8a) in which the penetration rate of the probe vehicles is low and no loop detectors are used. Here, the instantaneous travel time estimate performs well, and may seem like a valid estimate of the true travel time during the incident. However, this gives a misleading indication of the quality of these travel time estimates. The good performance of the instantaneous estimate arises because the current state of the traffic (the speed field) is very poorly captured in the underlying scenario, and the speed of the traffic is heavily overestimated. This causes the instantaneous travel time estimate to act as a good predictor of the future traffic conditions, namely, of the clearing incident. When the number of probe measurements increases, the speed field is captured more accurately, and the increased error in the travel time estimate is caused by the instantaneous approximation itself.
5. Summary. In this study, we examined the trade-offs between velocity data collected from GPS smartphones in probe vehicles and velocity data obtained from inductive loop detectors, for the purpose of computing travel times on a stretch of highway.
This work used experimental probe data obtained from the Mobile Century field experiment and loop detector data from PeMS. The measurements were combined with a mathematical traffic model in a highway traffic estimation algorithm using a data assimilation technique called ensemble Kalman filtering, developed as a part of the Mobile Millennium project. The results of the algorithm were compared against the true travel times experienced by the drivers, obtained through license plate re-identification. A number of scenarios were created in which the volume of the probe data and number of inductive loop detector stations available for the estimation algorithm could be adjusted.
Several practical results were obtained from the extensive experimental studies, including:
• Achieving 10% error for dynamic travel times. In this study, it was found that dynamic travel time estimates can be achieved with less than 10% error when using a flow model with data assimilation, by using either inductive loop detector data, probe data, or a mixture of both. Moreover, the estimates from virtual trip line-based probe data can achieve a higher degree of accuracy when all available probe data is used compared to estimates from inductive loop detectors when all inductive loops on the experiment site are used, although in general the performance is similar.
• Minimum loop detector spacing for travel time estimation. In this study, using data from more than eight inductive loop detector stations (average spacing 0.83 miles) did not give additional benefit in the travel time estimation. The error remains between 6% and 13% depending on the time of day, regardless of the added loop detector stations.
• Diminishing travel time accuracy improvement. When sampling probe vehicles at a rate of 137.5 veh/hr with more than 2.54 VTL/mi, increasing the number of probe measurements by adding more probe vehicles or additional trip lines yields only a small improvement in travel time accuracy.
• A mixture of probe and loop detector data in travel time estimation. It was found that complementing loop detector data with probe vehicle data gives better travel time estimates, especially at low penetration rates. For example, if using loop detectors spaced more than 2.11 miles apart, probe data can give an increase of over 50% in the travel time accuracy.
The above findings may assist practitioners in identifying the value of flow model based traffic monitoring systems for computing travel times.