Statistical rigidity of vehicular streams—theory versus reality

By means of standardized statistical tests applied to empirical traffic data (recorded at the Expressway R1 in Prague, Czech Republic) we verify a hypothesis that vehicular clearances are distributed via Generalized Inverse Gaussian distribution (GIG) g ( x ) ∝ e − β / x e − λ x ( x > 0 , β ≥ 0 , λ > 0 ) . We formalize mathematical theory explaining recent results obtained by means of advanced statistical analysis applied to vehicular/pedestrian microstructure. For these purposes we use (and generalize) approaches applied in the theory of counting processes. Quantities standardly analyzed in vehicular headway modeling (headway, multi-headway, interval frequency, and statistical rigidity) are here reformulated into formal mathematical definitions and then analytical predictions for statistical rigidity of particle systems with GIG-distributed headways are compared with empirical behavior. We show that a connection between clearance distribution and rigidity in real-road data is not tight as in theoretical structures. This discrepancy is explained as a consequence of the fact that interaction rules acting in vehicular systems (unlike level processes studied) are not short-ranged, which supports a hypothesis that mutual interactions exist among several succeeding cars (as investigated in Krbálek et al 2018 Physica A 491, 112).


Abstract
By means of standardized statistical tests applied to empirical traffic data (recorded at the Expressway R1 in Prague, Czech Republic) we verify a hypothesis that vehicular clearances are distributed via Generalized Inverse Gaussian distribution (GIG) We formalize mathematical theory explaining recent results obtained by means of advanced statistical analysis applied to vehicular/pedestrian microstructure. For these purposes we use (and generalize) approaches applied in the theory of counting processes. Quantities standardly analyzed in vehicular headway modeling (headway, multi-headway, interval frequency, and statistical rigidity) are here reformulated into formal mathematical definitions and then analytical predictions for statistical rigidity of particle systems with GIG-distributed headways are compared with empirical behavior. We show that a connection between clearance distribution and rigidity in real-road data is not tight as in theoretical structures. This discrepancy is explained as a consequence of the fact that interaction rules acting in vehicular systems (unlike level processes studied) are not short-ranged, which supports a hypothesis that mutual interactions exist among several succeeding cars (as investigated in Krbálek et al 2018 Physica A 491, 112).

Introduction and motivation
Vehicular headways (gaps or clearances) represent, as is well known, fundamental quantities investigated in empirical/theoretical studies of traffic flow [1 -3]. Their statistical distributions characterize how vehicles are distributed on a roadway/freeway [4,5] or on intersections [6]. Therefore, headway distributions represent a crucial factor for mathematical predictions of freeway/intersection capacities. At present, vehicular headway modeling is an extensively developing discipline which reveals a number of remarkable features in vehicular microstructure. Historical overview and achievements of this discipline are synoptically summarized in the survey [1].
On the other hand, vehicular headways (and clearances) are closely interconnected to the fundamental macroscopic quantities, namely traffic density ñ and intensity I (flux). Correct localization of a traffic system in the ID plane strongly depends on accuracy in enumerating the local intensity and local density. Both quantities are, however, calculated with help of random variables like number of cars occurring inside a freeway-segment of a given length, or number of cars crossing a given point during a certain time interval. In fact, a stochastic component of those variables (denoted generally as interval frequencies) plays a principal role in all theoretical branches of the theory. A deeper (and theoretically substantiated) understanding of statistical linkages among usual traffic variables is, therefore, highly desirable.
In this paper we aim to reformulate a standard traffic description into a mathematical language working with associated stochastic variables. Since a standard mathematical theory dealing with events consecutively coming in time is the theory of counting processes [7,8] we will try to use its approaches to reveal a linkage between clearance distribution and interval frequency of the same particle system. Particularly, we plan to interconnect clearance distribution (time or spatial) with the second central moment of interval frequency. In the theory of • (Segmental data processing) Variance of real-road clearances is rapidly changing with traffic density [2][3][4]. It corresponds to the fact that statistical resistivity (measuring a level of resistance to statistical perturbations) depends strongly (as analyzed in [10,23,26]) on parameters of actual traffic state (intensity, density, mean velocity). Therefore, necessary part of intended estimation procedure should be a segmentation (i.e. restriction to small sub-areas of ID plane-as discussed in [26,29] and applied in [23,26], for example), which prevents a mixing of states with different vigilance of drivers.
• (Balancing criterion) Suggested analytical probability density g(x) should belong to a family B of balanced distributions (see appendix A.1). This is a property generally-valid for all one-dimensional particle ensembles, in which number of interacting neighbors is substantially smaller than number of all particles (see a theoretical derivation published in [13]).
• (Empirical criterion) According to recent knowledge all empirical clearance distributions (measured in a congested phase) show the heavy plateau at the origin (see figure 3 in [6] and an explanation in [23]). Such a plateau can be understood as a consequence of strong repulsions among closely-occurring vehicles whose drivers make an effort to prevent a possible crash. Some of older approaches simulate this plateau by means of shifted densities (shifted negative exponential [18], shifted log-normal [19]), which therefore are not smooth. Contemporary models work with smooth functions g(x) (like GIG distributions [12]) fulfilling the following property In fact, it corresponds to the mathematical definition of the plateau in x=0.

Origin of empirical traffic data
Induction double-loop data analyzed in this paper have been provided by Road and Motorway Directorate of the Czech Republic. They have been measured at several segments of highway circuit R1 in Prague, Czech Republic, where traffic is saturated for most of the day. For a following data-processing they have been used data from fast lanes only where number of long vehicles is negligible. This procedure guarantees a homogeneity of the data. We have used 12 data-files, each having data from measurements lasting three months approximately.

Hypothesis of GIG-distributed clearances
Being motivated by the effort to estimate a probability density for clearances gauged at European expressways we suggest (similarly to [4-6, 10-12, 23, 26]) to search for a suitable candidate inside the two-parametric family of Generalized Inverse Gaussian densities [30,31] , 0 , 0 , 2 x Dx which respect all the theoretical criteria discussed in the previous subsection. We note that the function (2) does not represent an ad-hoc assumption only. Quite the contrary, this function has been found (see [5]) to be a steady-state distribution for distances between particles in thermodynamical traffic gas. The parameter β can be understood (see [5,6,12,23]) as statistical resistivity of the gas whose interaction rules are based on a verified premise that a repulsive force (preventing a collision of cars in real-road traffic) is inversely proportional to a square of distance between particles-similarly to the well-known dynamics equations in Intelligent Driver Model [32]. In (2), Θ(x) stands for the Heaviside unit-step function and the normalization factor reads where  n ( ) x stand for the Macdonald's function of the νth order. Furthermore, the scaling constant D is related to the first moment (the expectation) In this part of the text we present results of two standard procedures estimating an parametric vector (β, D), which are based on a maximization of log-likelihood and a minimization of functional metrics (Kolmogorov distance) , 6 x respectively. Here (x 1 , x 2 , K, x n ) is a vector of empirical clearances;x is the associated sample mean,x is the harmonic mean, H(x) is the empirical distribution function of the sample (x 1 , x 2 , K, x n ), and b is the distribution function of the GIG distribution (2). Practically, the proposed procedure is implemented as follows. Empirical data extracted from congested traffic (and divided into samples of fifty succeeding time clearances) are sorted by a local density (for a definition of a local density please see A.4). Then all the ordered samples are divided into 100-samples sets, each with n=5000 clearances. Altogether we have N=91 such data sets with average local densities varying from 30 to 66 veh/km. In fact, such a sorting can be identified as a segmental data processing discussed in section 2.1. For each of the 91 data sets the afore-mentioned estimators are applied. An evolution (induced by changes in density) of point estimates a r g m a x l n , , 7 is summarized in figures 1 and 2.
To decide on the quality of the statistical model proposed we carry out a standardized statistical test (χ 2 -test in the bootstrap approach [33]). With respect to previous considerations we formulate a null-hypothesis in the expected form H 0 : Empirical time clearances (detected in congested traffic flows) are GIG-distributed. For each of the 91 above described data sets represented by the corresponding value of the local density we repeat the following procedure: • Denoting S={x 1 , x 2 , K, x n } the set of n=5000 clearances we randomly pick 300 clearances from S and test whether H 0 is/may not be rejected at significance level α=0.05.
• Such a random choice is repeated k=1000 times and average rate of rejection f MLE , f MDE is then evaluated (according to [33]).
The obtained rejection rates for the assumed values of the local density are drawn in figure 3. In this diagram we see that rejection rates differ only slightly from an ideal value determined by the chosen significance level α. In fact, with exceptions the suggested hypothesis is rejected in less than 15% of all cases. To be more specific, since the samples were drawn independently the CLT implies that the rejection rates are, for each value of the density, approximately N(α, α(1−α)/k) distributed under H 0 . Taking into account this asymptotic distribution, we plot in figure 3, for each density, the interval where the rejection rate should lie with probability 99% under H 0 Figure 1. Estimated resistivity as a function of density. We plot values of statistical resistivity β estimated by MLE (maximum likelihood estimator (5)) and MDE (minimum distance estimator (6)), respectively. Due to the scaling E = ( ) x 1, the resistivity is dimensionless. The dependence on traffic density (visualized by curves) is approximated by quadratic functions  (the grey zone). In the figure one can see a very good correspondence of the observed rejection rates with the calculated asymptotic interval, particularly for densities larger than 46 veh/km. For comparison, in appendix A.3 we analyze an alternative hypothesisH : 0 Empirical time clearances (detected in congested traffic flows) are Gamma-distributed. We show (see figure A1) that average rejection rate is, in this case, about 50%, which 1) rejects a suspicion of Gamma-distributed clearances; 2) demonstrates that values of a rejection rate in figure 3 are sufficiently low to support the hypothesis H 0 .
To conclude, as directly follows from the presented outputs, the hypothesis H 0 seems to be a mathematically relevant (taking into account the balancing criterion), empirically substantiated (reflecting the empirical criterion discussed in section 2.1), and statistically verified, which (among other things) explains a motivation for the second part of this paper.

Convention note
For purposes of this article, all types of headways (time headway, distance headway, time clearance, distance clearance) are called level spacings. Terminology used in this paper is inspired by the Random Matrix Theory [34] where the term level spacing represents distance between ordered eigenvalues in the unfolded spectrum of a random matrix.

Fundamental Terms and Properties
The key object we inspect here is a level counting process introduced by the following definition.
Definition 1. Let  Î ( ) S n n be a level spacing sequence and L 1 an independent random variable with the density l = - The sequence  Î ( ) L n n is then called a level sequence and the process  n ( ) t t 0 is said to be a level counting process.
In fact, in terms of Renewal Theory [7] the level counting process is a specific variant of delayed renewal process, where the distribution of L 1 is chosen specifically through the density for the nonzero intensity parameter λ. From this similarity one can profit when deriving some beneficial relations and theorems (see Appendix A.2).

Rigidity and Its Properties
The term rigidity refers to the variance of a level counting process and it represents one of the most powerful tools for describing and comparing various one-dimensional particle-like systems. The formal definition is provided below.

Definition 2. Let
Var t for  Î + t 0 is said to be the rigidity.
Analogously, the variance of a renewal process can be defined. The function is then said to be the number variance. Nevertheless, there are numerous disadvantages of this quantity compared to the rigidity. Concerning the theoretical issues, the rigidity is actually much more convenient to handle analytically. It is confirmed by the assertion of the following theorem, which provides the expression of the rigidity through the corresponding renewal function.
Theorem 1. Let Δ be a rigidity and  the corresponding renewal function. Denote l as the related intensity parameter. Then Proof. To get the desired relation, it is first necessary to derive the expression for the second moment of the underlying level counting process  n ( ) t t 0 . Applying the assertion of the theorem 9 then yields After plugging this result into the definition of the variance of the process and using the relation (24), we obtain the desired assertion Another way of deriving the expression is exploiting the assertion of the theorem 4 and the fact that the increments of a renewal process is a uniformly integrable process. Then, the variance of the increments converge to the variance of the level counting process where the formula for the number variance is The operator årepresents the convolution with respect to generalized functions. The formula (11) is analogous to the equality (9) and its visually even more compact, however, the relation (9) is indeed analytically more simple. The renewal function might be actually quite a complicated one and thus, the expression    may be extremely hard to get. Moreover, the derivative of the rigidity can be easily obtained, i.e. it is of the form satisfied for those  t 0 at which the renewal function is continuous. We will now turn our focus onto dealing with the asymptotic behavior of rigidity. Apparently, the derived relation (12) is just shifted and scaled function Combining that result with the well known asymptotic expansion coming from the renewal theory, the derivative of rigidity subsequently converges in infinity to the constant l s 1 . So that this convergence holds, it is necessary to additionally assume the mentioned continuity of the renewal function on the neighborhood of infinity. Below, a more general result which does not require such an assumption is provided.
Theorem 2. Let D be a rigidity and let the corresponding renewal function satisfy the asymptotic expansion (13). Denote l as the related intensity parameter and s 2 the variance of the related level spacing. Then Proof. The proof is based on the relation (9). Using the expansion (13), the limit of the integral part of the investigated difference can be then expressed as where h is non-negative. The interchange of the limit and the integral is eligible since the integrand is bounded function on  + 0 . In conclusion, the final result is given by The theorem immediately implies that the rigidity must be asymptotically linear once the renewal function has an asymptote.
We will now focus on the derivation of the intercept of the asymptote for the rigidity, which allows us to completely determine the asymptotic behavior of the corresponding rigidity. To accomplish that, the Laplace transform is necessary to be applied. Namely, the mapping defined as for > p 0 and F representing some general distribution function will be used. This is very useful tool in the counting process theory since it allows one to determine the asymptotic behavior of a function through the behavior of its Laplace transform on the right neighborhood of zero. Such a procedure applied in the case of rigidity is given in the proof of the following theorem. Proof. From the definition of variance and the fact that the expected value of a level counting process is linear, it suffices to perform the asymptotic expansion for the second moment of the process. Denoting where the Laplace transform has been applied on the relation (10). Another step requires expressing the Taylor polynomial of the function  f ( ) p p of the second order. It is of the particular form Plugging this result into the definition of the variance gives the final asymptotic expansion for the rigidity (15). Clearly, the existence of the expansion of the first order for rigidity is equivalent to the existence of the asymptotic expansion of the second order for the second moment  2 . ,

Simulation experiment
In order to illustrate and check the obtained asymptotic results we carry out the following simulation experiment. We generate 10 000 independent realizations of level counting and renewal processes for Î [ ] t 0, 7 with step equal to 0.1, i.e. the length of every realization is 71 observations. On the basis of these 10 000 realizations we then calculate for each t the empirical rigidity and number variance, respectively to the process.
The level spacings of the renewal and level counting processes are generated using iid variables coming from the GIG distribution with the density given in (2) and the scaling constant This particular choice of constant D implies that the expectation of the GIG distribution is approximately equal to 1 so that the intensity parameter used in the definition of the level counting process (cf. definition 1) is l = 1 and the corresponding first element L 1 of the level sequence has the density e e d ò = - In the case of renewal process the first level spacing S 1 has the same GIG distribution as the rest of the spacings, i.e. the one described above.
For the assumed level counting process the asymptotic expansion of rigidity given in (15) takes on the form is a so-called compressibility (see also [12,13]), is the intercept, and The results obtained for rigidity and number variance are presented in figure 4. From this figure we can observe the following. First, the asymptotic approximation of rigidity works well and convergence is quick for small values of β and is getting slower with increasing value of the parameter β. Second, there is a clear visible difference between the behavior of rigidity and number variance even though the corresponding processes differ only in the first level spacing. It seems, when considering a fixed value of b, that the asymptote for number variance has the same slope as the one for rigidity (as predicted by the theory) but the intercepts are different. Again the difference increases with growing value of the parameter β.

Rigidity in empirical traffic samples
In this section we present a numerical analysis of the statistical rigidity extracted (similarly to [10]) from empirical traffic data described in section 2.2. Following the same segmentation procedure as that applied in the section 2.3, we calculate the quantity D( ) t in all the samples being sorted by a local density. Then a course of associated rigidities is plotted in figure 5 where it is clearly evident that an asymptotical trend is linear for all density bands, i.e.
This fully corresponds to the theory discussed. However, the compressibility  ĉ ( )-determined from a linear tail of an empirical dependency D( ) t by a linear regression-shows a substantial deviation from the theory presented. To be specific, a regression coefficient  ĉ ( ) and analytical value (17) enumerated for bD , (plotted in figures 1 and 2) differ significantly, as visualized in figure 6. The direct consequence of such inconsistency is the fact that the assumptions of the above-discussed theorems must necessarily be violated in real traffic flows. It means that clearances of neighboring vehicles (as random variables) are not independent which implies that traffic flow does certainly not represent a system with short-ranged interactions (according to [12]). Hence we assert that for a given driver the number of actively-followed vehicles is greater than one. It supports a contemporary opinion (published in [12]) that vehicular systems are driven by potentials/forces/rules that are definitely middle-ranged. It means that driver's decision routines include more vehicles moving in front of a reference car.

Summary and conclusions
In the first part of the article we discuss three substantial attributes that should be taken into account if one aims to estimate vehicular clearance distributions. We explain that a necessary part of such estimations is a data segmentation preventing an undesirable mixing of states with different stochastic resistivity or compressibility. Also, we assert that any proposed statistical model (distribution family) should belong to the class of balanced distributions since otherwise the model does not fulfill a necessary condition which is generally valid in all onedimensional particle systems with short-or middle-ranged interactions. The third essential attribute of a suggested estimation procedure is a requirement that probability density g(x) should generate a plateau in = x 0, which implies (for smooth probability density functions) that = ¢ =  =¼= ( ) ( ) ( ) g g g 0 0 0 0.This requirement is an output of earlier studies of traffic data.
Taking into account all these attributes we suggest the GIG distribution as a suitable candidate for a respective estimation procedure. In the next part of the text we prove (by means of standardized statistical tests and the bootstrap approach) that a hypothesis on GIG-distributed vehicular clearances is justified in all subareas of ID-plane (a rejection rate is significantly low) and that stochastic resistivity β (which is a free parameter in GIG family) increases almost linearly with traffic density.
In the next part of the text we derive a direct theoretical connection between a clearance distribution and a statistical rigidity (property being observed in several investigations of traffic microstructure). We show that the statistical rigidity has a perspicuous linear asymptote whose slope χ (called a compressibility) is (in systems where spacing are scaled to the unit average) equal to the variance s 2 of spacings. All these results have been thoroughly tested and verified with help of simulation experiments.  Furthermore, we present results of statistical analysis working with data measured on Czech freeways. We show that empirical rigidity generates (in accordance with the theory) a linear tail whose slope depends on traffic density. Inconsistently with the theory, however, the empirical compressibility do not correspond to the variance s 2 detected from clearance distributions. To be specific, in all sub-areas of ID-plane the theory underestimates a real value of the compressibility. This discrepancy represents an interesting open problem in a field of vehicular headway modeling. The elementary explanation (if we disregard dynamical segregation effects such as a clustering or occurrence of dramatically faster vehicles behind a slower one) is, as understandable, hidden in the fact that vehicular spacings do not fulfill the fundamental prerequisite in the definition of a level process. Indeed, vehicular headways/clearances/gaps do not represent an iid sequence of random variables and analytical predictions therefore deviates from reality. Such a deviation generates, however, a following very strong conclusion. Vehicular systems (in a congested phase) do not represent a short-ranged system. It means, beyond all doubt, that driver reacts to more front-running vehicles, which results in a middle-ranged decision making. This knowledge is in a consonance with previously-published findings [12].
To conclude, we show that in short-ranged one-dimensional systems (where spacings are iid) the spacing distribution unambiguously determines a statistical rigidity so that a respective compressibility χ is equal to the variance s 2 of spacings. In contrast, a significant deviation c s ¹ 2 detected in empirical measurements supports a recent opinion that the number of actively-followed vehicles (in all typical vehicular systems) is greater than one. Unfortunately, these findings are omitted in most traffic models commonly used.
(iv) (Lebesgue integrability): is Lebesgue integrable; (v) (balancing criterion): w $ > 0 so that The parameter ω (unambiguously determined by the balancing criterion) is then called a balancing index and denoted by ( ) g inb . The class of balanced densities is denoted by B. The balanced density g(x) will be referred to as the normalized (balanced) density if d  ò = ( ) g x x 1. Therefore, every normalized density can be understood as the probability density function (pdf) of a certain continuous random variable.

A.2. Level counting process theory
The essential object we inspect here is a level counting process (see some important results published in [7,8,[35][36][37]). To completely understand its nature, it is first necessary to deal extensively with a renewal process. However, providing the results of the renewal theory is not the purpose of this paper. Additionally, the theory is quite well known and understood [7]. Nevertheless, we will at least provide the definition of the process since there is a newly used term needed for further investigations.
The sequence  Î ( ) S n n is then called the level spacing sequence and the process  ( ) N t t 0 is said to be a renewal process.
The term level spacing comes from the theory of random matrices [34] where the level spacings refer to the spacings between consecutive random eigenvalues. In this paper, the term will eventually represent the spacings between two consecutive vehicles on a road. Definition of the level counting process  n ( ) t t 0 is introduced in the main part of the paper (see definition 1). The level sequence, which determines the level counting process, is entirely defined through the level spacing sequence which determines the renewal process. It is then not a surprise that the two counting processes are very alike in terms of their structures. This close relation can be described through the equation In other words, the spacings between two consecutive random variables in the sequence of levels have the distributions determined by the sequence of level spacings. Additionally, that distribution depends only on the difference between indexes of the levels. Such property is apparently satisfied regardless of the distribution of L 1 . If the distribution of this first random variable is general, the corresponding counting process is then truly called a delayed renewal process which is also quite a well known kind of process (see [7]). Nevertheless, in the definition of a level counting process, the distribution of L 1 is chosen specifically through the density l = - for the non-zero intensity parameter λ. This particular choice puts the two counting processes into another close and important relation which is provided in the following theorem (proved in [7]). The proposition above says that the random process consisting of the increments of a renewal process converge in distribution to the level counting process. To put it in another way, the properties of a level counting process are actually given by the limiting properties of the corresponding renewal process. Before we prove some crucial results, we first need the formula providing the relation between the distribution of levels and the corresponding level spacings.

Theorem 4. Let
Proof. In the case where n=1, the proposition contains the random variable S 0 . Because S 0 is defined to be zero almost surely, the respective distribution function takes only the value one on the non-negative part of the real line. According to the distribution of L 1 introduced in the definition 1, the relation (23) is satisfied.
Suppose now that the natural n is greater than one. Using the definition of L n as the sum of two independent random variables L 1 and -S n 1 , it is possible to write where  t 0. That is already the desired expression considering the fact that S n represents the sum of n iid random variables all having the same distribution as that of S 1 ., Equipped with the theorem 5, we can now proceed to the most important result of this section. Namely, it will be shown that the expected value of a level counting process is a linear function. Theorem 6. Let  n ( ) t t 0 be a level counting process and l the respective non-zero intensity parameter. Then Here, we will use the well known result from the renewal theory stating that the expected value of a renewal process can be written as the series of all the distribution functions of the corresponding level spacings. Mathematically written, it holds for  t 0. In fact, this proposition can be actually generalized to a counting process of any kind. Its application on a level counting process along with the relation (23) yields , From the results above, we can see that the expected value of arbitrarily chosen increments of some fixed length are still the same for a level counting process. As a matter of fact, there is a much stronger proposition saying that the increments even have an identical distribution. Such property is called homogeneity and the respective theorem (proved in [7]) is given below.
Theorem 7. A level counting process satisfies the properties of a homogenous process, i.e.
The essential results of the paper are based on the expression of the moments of a level counting process through the moments of a renewal process. To conveniently perform the derivation of such a relation, we will first establish the corresponding terminology and notation.  The definition of the renewal functions makes sense since for all  Î k and  Î + t 0 , the moments E( ) N t k are finite. Such property holds even for the moments of a general counting process as discussed in [7]. With this kind of notation, the mentioned expression for the moments of a level counting process have more elegant and slick form. However, before providing its proof, the following auxiliary proposition needs to be proved. Theorem 8. Let  k represent the k-th renewal function for  Î k and let l be the corresponding non-zero intensity parameter. Then The proof is based on expressing the relation (23) through the probability distribution of the corresponding renewal process  ( ) N t t 0 . The essential equality used also in (25) then allows one to write To get the last equality, the sums need to be swaped and consequently, the definition of the function  i as the i-th moment of a renewal process is necessary to be applied. , Just proven relation involves already obtained result related to the linearity of the expected value of a level counting process. Namely, since  = 1 0 on the non-negative part of the real line, the equality (26) for k=0 represents the derivative of E n ( ) t as discussed in the proof of the theorem 6. The generalization of the relation (24) is also included in the following proposition.