Uncertainty quantification in a macroscopic traffic flow model calibrated on GPS data

The objective of this paper is to analyze the inclusion of one or more random parameters into the deterministic Lighthill-Whitham-Richards traffic flow model and to use a semi-intrusive approach to quantify uncertainty propagation. To verify the validity of the method, we test it against real data coming from vehicle-embedded GPS systems, provided by Autoroutes Trafic.


Introduction
Macroscopic traffic flow models consisting of (systems of) partial differential equations have been used to simulate traffic flows on road networks for decades [25]. Yet, these models are usually fully deterministic, and the coupling with real data poses severe difficulties, which require advanced data assimilation techniques (see e.g. [29] and references therein) and may result in poor prediction outcomes.
In this paper, we focus on the basic Lighthill-Whitham-Richards (LWR) first order model [17,22], augmented with random variables in the velocity function and the initial condition to account for real data uncertainty. This model is specifically designed to cope with the traffic data set we were provided, which consists of floating car data coming from embedded GPS devices. For alternative stochastic traffic flow models, we refer the reader to [4,7,14,16,23,24,27].
Several stochastic methods have been proposed in the literature to evaluate uncertainty propagation in stochastic PDE models. The so-called non-intrusive methods, like Monte Carlo or stochastic collocation [18], allow the use of the underlying deterministic code, but suffer from a slow convergence rate and the curse of dimensionality. On the other hand, intrusive methods, like polynomial chaos expansion [21], require deep modifications of the deterministic simulation code and are not suitable for discontinuous solutions. In this work, we choose the semi-intrusive approach proposed by Abgrall and Congedo [3]. The underlying idea is to extend the spatial computational domain by the probabilistic components and to compute conditional expectations of the flux in the probabilistic direction. To evaluate the expectations of the flux, we use a piecewise polynomial approximation computed by a reconstruction method. This polynomial reconstruction is the same as the one used for finite volume methods in space, except that the measure is no longer the standard Lebesgue measure but the probabilistic one. Compared to the above-mentioned approaches, the Abgrall-Congedo method requires only minor modifications of the deterministic code and ensures rapid convergence. It can be applied to any probability distribution function, also in the case of correlated random variables, and it is suitable for discontinuous solutions.
We test the model against real data collected on the sector of the French highway A8, la Provençale, between Antibes and Nice. This section is very busy and is suitable for studying congested traffic. For this analysis, we rely on the preliminary data treatment and model calibration performed in [6].
The paper is organized as follows. Section 2 describes the data set. In Section 3 we recall the deterministic LWR model and the corresponding numerical scheme used later. In Section 4 we introduce the stochastic setting and the random variables of interest. The semi-intrusive approach is described and tested in Section 5. Section 6 is devoted to the validation against real data, and conclusions and perspectives are presented in Section 7.

GPS data: a case study
The traffic data available for this research were provided by the company Autoroutes Trafic [1]; they are presented and treated in [6, Section 4]. They correspond to a sector of the French A8 highway, also called la Provençale, between exit no. 45 (Antibes) and exit no. 49 (Nice St Isidore), for a total length of 17.5 km, see Figure 1a. In this study, we consider the direction from Antibes to Nice St Isidore, denoted as Direction 1. Data were collected on four Tuesdays (March 19 and 26 and April 2 and 9, 2013) from 6 a.m. to 11 a.m. and are divided into two categories: GPS data and magnetic loop detector data. GPS data, supplied by Coyote embedded systems, include the device ID, the position and the velocity of the car, sent every minute. To illustrate the behaviour of the data, in Figure 1b we report in a space-time plot the speeds registered on March 19. Loop detector data are supplied by the highway operator ESCOTA [2] at 16 locations and consist of hourly and 6-minute flux averages. They will be used for the estimation of the real number of cars travelling on the considered section, see Section 4.2.

The deterministic LWR model
Macroscopic traffic flow models describe the evolution of the position of vehicles on an (infinite) road identified with the real line in terms of averaged quantities, such as the density ρ = ρ(t, x), t ∈ R+, x ∈ R, and the average speed of cars v = v(t, x). The first model was introduced in the mid-1950s by Lighthill, Whitham and Richards [17,22] and is based on the conservation of the number of cars, which is expressed by the following scalar conservation law:

∂_t ρ + ∂_x q(ρ) = 0,    (3.1)

where q = ρ v is the traffic flow. To close the equation, the LWR model assumes that v = v(ρ) is a non-increasing function of the density. In this work, we use a modified Newell-Daganzo velocity function [9,20], which is characterized by a linearly decreasing free-flow speed and a hyperbolic velocity in the congested regime, and we add a downward jump at ρ = ρ_c to model the capacity drop observed in real traffic (cf. Fig. 2):

v(ρ) = v_max (1 − ρ/ρ_a)     if 0 ≤ ρ ≤ ρ_c,
v(ρ) = ω_f (ρ_max/ρ − 1)     if ρ_c < ρ ≤ ρ_max,

where v_max is the maximal free-flow speed, ω_f is the backward propagating wave speed, ρ_c is the critical density (the limit density between the fluid and congested phases), ρ_max is the maximal density corresponding to a bumper-to-bumper situation, and ρ_a is a further parameter accounting for the capacity drop, so that

v_max (1 − ρ_c/ρ_a) > ω_f (ρ_max/ρ_c − 1).

This choice has been shown to best fit our data, as results from [6]. Approximate solutions to (3.1) can be computed via the Godunov finite volume method [13], which is equivalent to the supply-demand method (or Cell Transmission Model) used by transportation engineers [9,11,15].
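The velocity function above can be sketched numerically with the calibrated parameters reported in Section 6.1; the functional form is the reconstruction given above, so the code below is an illustration, not the authors' implementation:

```python
# Sketch of the modified Newell-Daganzo velocity (km/h) with capacity drop,
# using the calibrated parameters from Section 6.1: linear free-flow branch,
# hyperbolic congested branch, downward jump at rho_c.
V_MAX, OMEGA_F = 125.0, 17.0                  # km/h
RHO_C, RHO_MAX, RHO_A = 120.0, 614.0, 300.0   # vehicles/km

def velocity(rho):
    if rho <= RHO_C:                          # free-flow phase: linear decrease
        return V_MAX * (1.0 - rho / RHO_A)
    return OMEGA_F * (RHO_MAX / rho - 1.0)    # congested phase: hyperbolic

def flow(rho):
    return rho * velocity(rho)                # q = rho * v
```

With these parameters the free-flow speed at ρ_c is 75 km/h while the congested branch starts at about 70 km/h, which is consistent with the 70 km/h phase-separation speed used in Figure 4.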
Let ∆x and ∆t be respectively the space and time grid sizes, x_i = i∆x, i ∈ Z, the grid points and C_i = [x_{i−1/2}, x_{i+1/2}[ the space mesh cells. We aim at constructing approximate solutions ρ_∆ such that ρ_∆(t^n, ·) ≈ ρ^n_i on C_i. In order to iteratively define ρ_∆, the demand and supply functions are usually defined respectively as

D(ρ) = q(min(ρ, ρ_c)),    S(ρ) = q(max(ρ, ρ_c)).

In the case of a discontinuous flux function, the above definitions are modified as follows [28]:

D(ρ) = q(ρ) if ρ < ρ_c,    D(ρ) = q(ρ_c−) if ρ ≥ ρ_c,
S(ρ) = q(ρ_c+) if ρ ≤ ρ_c,    S(ρ) = q(ρ) if ρ > ρ_c,

where q(ρ_c±) denote the left and right traces of q at ρ_c.
The Godunov numerical flux at the interface x_{i+1/2} is then defined by

F^n_{i+1/2} = min( D(ρ^n_i), S(ρ^n_{i+1}) ),    (3.5)

and the recursive numerical scheme is given by

ρ^{n+1}_i = ρ^n_i − (∆t/∆x) ( F^n_{i+1/2} − F^n_{i−1/2} ),    (3.6)

under the classical CFL condition ∆t λ^n ≤ ∆x, where λ^n is the maximum of the absolute values of the wave speeds at time t^n, see [28, Section 5] for more details. The scheme (3.6) is initialized taking ρ^0_i = (1/∆x) ∫_{C_i} ρ_0(x) dx for a given initial datum ρ_0 : R → [0, ρ_max].
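One Godunov/supply-demand update of the form (3.5)-(3.6) can be sketched as follows; for simplicity the sketch uses a continuous Greenshields-type flux (the paper's flux with capacity drop would use the traces q(ρ_c±) in the demand and supply, as above):

```python
# One Godunov/supply-demand time step on a concave flux, sketched with a
# simple Greenshields-type flow q(rho) = rho * v_max * (1 - rho/rho_max).
V_MAX, RHO_MAX = 100.0, 600.0
RHO_C = RHO_MAX / 2.0                 # critical density = argmax of q

def q(rho):
    return rho * V_MAX * (1.0 - rho / RHO_MAX)

def demand(rho):                      # non-decreasing part of q
    return q(min(rho, RHO_C))

def supply(rho):                      # non-increasing part of q
    return q(max(rho, RHO_C))

def godunov_step(rho, dx, dt):
    """Advance the list of cell averages rho by one time step (3.6)."""
    ext = [rho[0]] + rho + [rho[-1]]  # zero-order extrapolation at boundaries
    flux = [min(demand(ext[i]), supply(ext[i + 1])) for i in range(len(ext) - 1)]
    return [u - dt / dx * (flux[i + 1] - flux[i]) for i, u in enumerate(rho)]
```

The CFL condition of (3.6) requires ∆t · max|q'| ≤ ∆x, i.e. ∆t ≤ ∆x / v_max for this flux.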

Stochastic scalar conservation laws
Following [18,19,26], we consider N = 2 independent random variables X := (X_1, X_2) defined on a probability space P = (Ω, B_Ω, µ), where ω := (ω_1, ..., ω_s) ∈ Ω := Π_{i=1}^s [a_i, b_i] is the sample space of random parameters, B_Ω is the σ-algebra of the Borel sets on Ω and µ the Lebesgue measure. We assume that X_i : Ω → R is measurable and we denote by f_{X_i} : R → R+ its probability density function. Since the variables are independent, their joint density is given by the product f_X(x) = Π_i f_{X_i}(x_i). Hence, for any measurable real-valued function g, its expected value is given by

E[g(X)] = ∫ g(x) f_X(x) dx.

We are interested in the stochastic Cauchy problem

∂_t ρ(t, x; X) + ∂_x F(ρ(t, x; X); X) = 0,    ρ(0, x; X) = ρ_0(x; X),    (4.1)

where the flux F(ρ; X) := ρ v(ρ; X) may have a jump discontinuity in ρ at ρ = ρ_c; we thus refer to the corresponding theory [5,10,12]. Let us assume that F(ρ; X) can be decomposed as the sum of a Lipschitz function and a jump term involving the Heaviside function H, where H̄ denotes the multivalued extension of H (H̄(0) ∈ [0, 1]). We say that F is jump continuous at ρ_c if the left and right limits at ρ = ρ_c exist and are finite. A function ρ is a stochastic entropy solution if, for µ-a.e. ω ∈ Ω, ρ(·, ·; X(ω)) is an entropy solution of the corresponding deterministic problem. Well-posedness results for problem (4.1) in the case F ∈ W^{1,∞}(R; R) can be found in [19, Theorem 3.3] and [18, Theorem 3.11].
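The product-density formula for E[g(X)] can be illustrated with a small tensorized midpoint quadrature on the box Ω; the function names and the grid-based rule are ours, chosen only to make the formula concrete:

```python
# Expectation E[g(X1, X2)] of independent variables via the product density
# f_X(x1, x2) = f1(x1) * f2(x2), computed with a tensorized midpoint rule
# on the box Omega = [a1, b1] x [a2, b2] (a sketch, not the paper's scheme).
def expectation(g, f1, f2, box, n=400):
    (a1, b1), (a2, b2) = box
    h1, h2 = (b1 - a1) / n, (b2 - a2) / n
    total = 0.0
    for i in range(n):
        x1 = a1 + (i + 0.5) * h1          # midpoint in the first direction
        for j in range(n):
            x2 = a2 + (j + 0.5) * h2      # midpoint in the second direction
            total += g(x1, x2) * f1(x1) * f2(x2) * h1 * h2
    return total
```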

Random velocity
To account for vehicle speed variability, we consider a stochastic velocity function of the form

v(ρ; X_1) = (1 + X_1) v(ρ),

where X_1 expresses the perturbation from the equilibrium velocity v. The distribution of the perturbation depends on several factors, for instance the drivers' behavior and the weather conditions. We use real GPS data to fit the distribution function. To this end, we plot speed values against densities, separate the density domain into cells, compute the median speed of each cell, and find a spline curve interpolating the medians, see Figure 3.
We then compute the y-distance of each point from the spline and, in order to analyze the normality of the distributions, we draw the QQ-plots (Figure 4) for the congested and free-flow phases separately. Our goal is to apply a suitable perturbation to the velocity in order to reproduce this deviation from the median value. Hence, we have to adjust the parameters of the distribution of these perturbations. We compute the relative distances from the median; their distributions are shown in Figure 5. Their standard deviations are respectively 0.12 and 0.23. For simplicity, we use the same perturbation in free-flow and congested conditions. We take σ = 0.2, which corresponds to the normal distribution N(µ = 0, σ = 0.2). Since we need a distribution with bounded support, we replace the normal distribution by the triangular law T[−0.5, 0, 0.5], whose standard deviation is also approximately 0.2.
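The claim that T[−0.5, 0, 0.5] matches the fitted σ = 0.2 can be checked with the closed-form variance of a triangular law (a quick verification, not part of the paper's pipeline):

```python
# Standard deviation of the triangular law T[a, c, b] (minimum a, mode c,
# maximum b): Var = (a^2 + b^2 + c^2 - a*b - a*c - b*c) / 18.
import math

def triangular_std(a, c, b):
    return math.sqrt((a * a + b * b + c * c - a * b - a * c - b * c) / 18.0)

print(triangular_std(-0.5, 0.0, 0.5))  # ~0.204, close to the fitted sigma = 0.2
```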

Random initial datum
The second random variable we consider is a perturbation applied to the initial condition, to compensate for the lack of information on the penetration rate of GPS data. In our specific case, the equipped vehicles represent only a small percentage of the total volume of traffic. This percentage varies with the time of the day and, indirectly, with the traffic density. To estimate the percentage of equipped vehicles with respect to the total traffic density, we use the real flux measurements obtained by the magnetic loop detectors. In particular, we consider measurements taken at km 180, in the direction west-east (Antibes to Nice), and we compare them with the GPS data at the same time and location, see Figure 6. Percentages per hour and per day are collected in Table 1. We can observe that the penetration rate of GPS-equipped vehicles follows the same pattern every day: early in the morning the percentage of cars equipped with a Coyote system is very low (0.9-1.4%), then it grows until it reaches its maximum between 9 and 10 a.m. (2.3-3%), and finally it decreases again. This is probably due to the fact that Coyote systems are mainly used by specific categories of drivers, such as taxi or truck drivers. Taking a least-squares interpolation of the percentages of equipped vehicles (Fig. 7a), we obtain rescaled values (Fig. 7b) and we can analyze how the GPS flux varies with respect to the real flux and, consequently, estimate the uncertainty. We first compute the rescaled GPS data as a function of the flux. Since we know when traffic at km 180 is in free-flow or congested condition, we can distinguish the flux values (Fig. 8a) and compute the corresponding density through the inverse of the flux function. We eventually compute the relative error as the difference between the two fluxes divided by the real one, and we model the perturbation on the initial density as an exponential function of the density multiplied by a random variable:

ρ_0(x; X_2) = ρ_0(x) (1 + X_2 β e^{−α ρ_0(x)}),

where α and β are positive parameters. Fitting the real data, we can set α and β to model the maximal perturbation; it is then quite easy to find an exponential function that contains all the data (Fig. 8b). We set the first constraint as a perturbation of 100% at ρ = 0, which implies β = 1, and the second one as a perturbation of 60% at ρ = ρ_c, which gives α = ln(5/3)/ρ_c. We consider a uniform distribution X_2 ∼ U([−1, 1]). Remark that this choice ensures that the density does not attain negative values.

A stochastic LWR model for uncertainty quantification in GPS data
We are thus led to consider problem (4.1) with the random velocity and initial datum introduced above, where ρ(t_0, x) is the density reconstructed from data at time t = t_0, see Section 6.1. In the next sections, we explain how to estimate the space-time evolution of the statistical moments of the above model, and compare the results with our data set.

Numerical scheme
We aim at computing approximate solutions ρ_∆(ω) of (4.1) by an iterative procedure ρ^{n+1}_∆(ω) = H(ρ^n_∆(ω)). Following [3], besides the space and time discretization introduced in Section 3, we introduce a partition of the probability space Ω ⊂ R^s, i.e. a set of mutually disjoint subsets Ω_j, j = 1, ..., N, covering Ω:

Ω = ∪_{j=1}^N Ω_j,    Ω_j ∩ Ω_k = ∅ for j ≠ k.

We look for approximate solutions which are piecewise constant on each cell C_i × Ω_j. For every Ω_j, its probability measure is

µ(Ω_j) = ∫_{Ω_j} dµ(ω).

Therefore, we want to approximate the solution of (4.1) by the conditional expectation

ρ^n_{i,j} ≈ (1/(µ(Ω_j) ∆x)) ∫_{Ω_j} ∫_{C_i} ρ(t^n, x; X(ω)) dx dµ(ω).

Once the conditional expectation is computed, the mean and the variance of ρ_∆(ω) on the cell C_i at time t^n are given by

E[ρ^n_i] = Σ_{j=1}^N µ(Ω_j) ρ^n_{i,j},    Var[ρ^n_i] = Σ_{j=1}^N µ(Ω_j) (ρ^n_{i,j} − E[ρ^n_i])².

Since the operator H is nonlinear, we need to compute the conditional expectation E[g(X) | Ω_j] of a function evaluation g(X) in Ω_j, given the conditional expectations E[X | Ω_j]. For each Ω_j, we wish to find a polynomial P_j ∈ P^p(R^s) of degree p, defined on a stencil S_j, i.e. a set S_j = {Ω_k}_{k∈I_j} with Ω_j ∈ S_j, such that

(1/µ(Ω_k)) ∫_{Ω_k} P_j(ω) dµ(ω) = E[X | Ω_k]    for every Ω_k ∈ S_j.

This means that the conditional expectation of the reconstructed polynomial P_j is equal to the conditional expectation of X in each Ω_k of the stencil. Once this polynomial P_j is known, we can estimate E[g(X) | Ω_j] by using a quadrature formula in each Ω_k.
We focus now on the case of a one-dimensional probability space Ω ⊂ R, the two-dimensional framework being a straightforward generalization.
The Godunov scheme (3.5)-(3.6) applied to (4.1) gives

ρ^{n+1}_{i,j} = ρ^n_{i,j} − (∆t/∆x) E[ F^n_{i+1/2}(X) − F^n_{i−1/2}(X) | Ω_j ].

As we said before, the expectation of the numerical flux can be approximated as

E[ F^n_{i+1/2}(X) | Ω_j ] ≈ (1/µ(Ω_j)) ∫_{Ω_j} min( D(P_{i,j}(ω)), S(P_{i+1,j}(ω)) ) dµ(ω),    (5.4)

where P_{i,j} is a piecewise polynomial reconstruction of the density in the geometric cell C_i and probabilistic cell Ω_j, computed on the stencil S_j.
In order to compute the integral (5.4), we use a third order Gaussian quadrature rule on each probabilistic cell Ω_j. The most important step is computing the polynomial reconstruction, which defines the order of the method. For any cell Ω_j, we define a polynomial P_j ∈ P^p(R) of degree p, described by a stencil S_j = {Ω_j, Ω_{j_1}, ..., Ω_{j_p}}, where j_1, ..., j_p ≠ j. Since in the space variable we use a first order finite volume method, we expect that the convergence order will not change when increasing the order of the polynomial in the probability space. We will test the approach with piecewise constant and piecewise linear interpolations (using a MUSCL scheme), analyzing the convergence orders.
• p = 0: first order reconstruction. We take S_j = {Ω_j} and constant polynomials P_j = E[X | Ω_j].
• p = 1: ENO second order reconstruction. We evaluate two linear polynomials and keep the least oscillatory one. We introduce the average mid-points ω̄_j := E[ω | Ω_j]. For the cell Ω_j, we define two polynomials of degree 1: p−_j is constructed using the cells Ω_{j−1} and Ω_j, and p+_j is defined on Ω_j and Ω_{j+1}. For p+_j we have

p+_j(ω) = a + b (ω − ω̄_j),

such that E[p+_j | Ω_j] = E[X | Ω_j] and E[p+_j | Ω_{j+1}] = E[X | Ω_{j+1}]. Since E[ω − ω̄_j | Ω_j] = 0 by definition of ω̄_j, the resulting 2 × 2 system yields

a = E[X | Ω_j],    b = (E[X | Ω_{j+1}] − E[X | Ω_j]) / E[ω − ω̄_j | Ω_{j+1}],

and similarly for p−_j. We choose P^n_{i,j} equal to the candidate that realizes the least oscillation, i.e. the one with the smallest |b|. The procedure is summarized in Algorithm 1.
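The ENO slope selection above can be sketched as follows, using the cell midpoints as a stand-in for the conditional averages of ω (a simplification of the formulas above, valid exactly for a uniform measure on equal cells):

```python
# ENO choice of the p = 1 slope in the probabilistic direction: build the two
# candidate slopes from the conditional means over (Omega_{j-1}, Omega_j) and
# (Omega_j, Omega_{j+1}), keep the least oscillatory one (smallest |slope|).
def eno_slope(u_left, u_center, u_right, w_left, w_center, w_right):
    """u_* are the conditional means E[X|Omega_*], w_* the cell midpoints."""
    s_minus = (u_center - u_left) / (w_center - w_left)
    s_plus = (u_right - u_center) / (w_right - w_center)
    return s_minus if abs(s_minus) <= abs(s_plus) else s_plus
```

Near a discontinuity of the solution in ω, the large one-sided slope is discarded, which is what keeps the reconstruction non-oscillatory.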

Numerical tests
To validate the approach, we show some results concerning the stochastic conservation law (4.1) with the random flux function (4.4) without capacity drop and a piecewise constant (deterministic) initial datum ρ_0(x) = ρ_L = 10 for x < x_0 and ρ_0(x) = ρ_R = 80 for x > x_0, with x_0 = 0.5. In this case the mean and the variance can be computed analytically: for each realization of the random variable, the solution is given by a single shock wave moving with speed

s(X) = ( q(ρ_R; X) − q(ρ_L; X) ) / (ρ_R − ρ_L),

and we deduce

μ(t, x) = ρ_L P( s(X) > (x − x_0)/t ) + ρ_R P( s(X) ≤ (x − x_0)/t ).

We consider both uniform and triangular probability distributions and the piecewise constant and ENO polynomial reconstructions described above, which give similar results. To analyze the L¹-convergence of the two polynomial reconstructions, we denote by μ_∆ and μ respectively the means computed by the semi-intrusive method and analytically. We then keep the space mesh fixed, while refining the probabilistic discretization, and compute the error err(∆) = ‖μ_∆ − μ‖_{L¹}.
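The analytic mean of the random Riemann problem (as reconstructed above) can be sketched for an illustrative shock-speed distribution; the uniform law on [−1, 1] below is our assumption, only meant to show the mechanics of the formula:

```python
# Analytic mean for the random Riemann problem: each realization is a single
# shock of random speed S, so
#   mu(t, x) = rho_L * P(S > (x - x0)/t) + rho_R * P(S <= (x - x0)/t).
# Here S is illustrated as uniform on [s_min, s_max] (an assumption).
RHO_L, RHO_R, X0 = 10.0, 80.0, 0.5

def cdf_uniform(s, s_min, s_max):
    return min(1.0, max(0.0, (s - s_min) / (s_max - s_min)))

def mean_density(t, x, s_min=-1.0, s_max=1.0):
    p = cdf_uniform((x - X0) / t, s_min, s_max)  # P(S <= (x - x0)/t)
    return RHO_L * (1.0 - p) + RHO_R * p
```

Far to the left of all possible shock positions the mean equals ρ_L, far to the right it equals ρ_R, and in between it interpolates through the shock-speed distribution.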
Figure 11a shows that the errors corresponding to the first and second order interpolations are very close, since they are dominated by the first order spatial approximation. Figure 11b displays the probabilistic convergence curves for N = 2, ..., 2^10, corresponding to different space mesh sizes ∆x = 2^−n 10^−2, n = 1, ..., 4. This confirms that if the probabilistic discretization is fine enough, the main contribution comes from the spatial error.
To evaluate the computing time, we compare the Abgrall-Congedo approach using a piecewise constant polynomial reconstruction of the density with the Monte Carlo method. Figure 12 plots the computing time as a function of the L¹-error, for ∆x = 0.001 and a triangular distribution T[−0.5, 0, 0.5] for the speed perturbation. We remark that the Abgrall-Congedo approach is significantly faster than Monte Carlo, see also Table 2.

Uncertainty quantification in traffic flow simulations

Model calibration
We calibrate the fundamental diagram following [6, Section 5]. In particular, for the section around km 180 in Direction 1, we take v_max = 125 km/h, ω_f = 17 km/h, ρ_max = 614 vehicles/km, ρ_c = 120 vehicles/km and ρ_a = 300 vehicles/km. To estimate the initial condition at some fixed time t_0 ≥ 6:00 a.m., for every space cell C_i we consider a weighted average of the data measured in the hour preceding t_0, with an exponentially decreasing weight (for more details see [6,8]). As initial condition, we then compute

ρ^{t_0}_i = Σ_k e^{−a(t_0 − t_k)} ρ^data_i(t_k) / Σ_k e^{−a(t_0 − t_k)},

where ρ^data_i(t_k) is the density measured in cell C_i at time t_k. The coefficient a depends on the reliability of the data and on the variability of the traffic in the hour before t_0. After some tests, we decided to fix a ∈ [1.5, 2], to give more weight to data close to t_0, and we increase it if we need to consider a longer period.
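The exponentially weighted average can be sketched as follows (the normalization is the reconstruction given above; function and argument names are ours):

```python
# Sketch of the initial-condition estimate: an exponentially weighted average
# of the densities measured before t0, with decay rate a (larger a gives
# more weight to recent measurements).
import math

def initial_density(measurements, times, t0, a=1.5):
    """measurements[k] is the density observed at time times[k] <= t0 (hours)."""
    weights = [math.exp(-a * (t0 - tk)) for tk in times]
    return sum(w * m for w, m in zip(weights, measurements)) / sum(weights)
```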
To avoid dealing with boundary conditions, we run the simulations on a larger space domain, so that information coming from the upstream and downstream boundaries does not reach the domain of interest.

Conclusion and perspectives
In this paper, we applied a semi-intrusive approach to evaluate uncertainty propagation in traffic simulations and we tested the results against real data collected on a busy sector of the French A8 highway close to Nice. The results are quite satisfactory, in particular when traffic does not change phase. From the application point of view, these preliminary results can be improved by considering higher order finite volume methods for the spatial discretization, and by taking into account boundary conditions and their uncertainty. Besides, the well-posedness of stochastic conservation laws with discontinuous flux functions needs to be further investigated.

(a) The considered section of the A8 highway, with the location of the loop detectors (map data: ©2013 Google). (b) Scatterplot of GPS data of March 19, taken from [6].

Figure 1: The A8 highway between Antibes and Nice.
Free-flow phase (speed values higher than 70 km/h). Congested phase (speed values smaller than 70 km/h).

Figure 5: Densities of the relative distances from the median.
Loop detector data and rescaled GPS data.
Rescaled GPS density versus real density. Initial perturbation shaped on real data.
Figures 9 and 10 show the results obtained with the ENO reconstruction for ∆x = 0.01 and ∆x = 0.002, respectively, and N = 40 probabilistic cells in [−1, 1].

Figure 11: Convergence analysis in the probabilistic direction.

Figure 16: Space-time plots of the computed mean travel (blue) ± standard deviation (green) and the actual travel (red) when the traffic is in free flow.

Figure 18: Space-time plots of the computed travel (blue) ± standard deviation (green) and the actual travel (red) when the traffic phase changes (from congested to free flow (a) and from free flow to congested (b)).

Table 2: Computing times for the Abgrall-Congedo approach and the Monte Carlo method.