Statistical Modeling of Time Headway on Urban Roads: A Case Study in Baghdad

Time headway is an important feature of traffic engineering, determined by measuring the time elapsed between successive vehicles crossing a point on the road. This paper proposes time headway models for each lane of selected urban road sites based on headway distributions. The study was carried out in Baghdad, the capital of Iraq, where field headway data were collected on the Mohamed Al-Qassim and Qanat Al-Jaish highways. Data were obtained from video recordings analyzed with image processing in Python. Each road consisted of six lanes, three in each direction, and the three lanes in the forward direction of travel were studied on both highways. Seven flow rates were considered for the headway data to evaluate the flow state of each lane. EasyFit 5.5 was used to select an appropriate headway distribution for each flow rate and lane based on the Kolmogorov-Smirnov test. The results demonstrated that the best-fitting model differed from lane to lane. The selected distributions were then used to develop a model through nonparametric regression using Theil's slope estimator.

Keywords: time headway; headway distribution; urban road; headway model; flow rate; traffic volume


I. INTRODUCTION
Traffic modeling is a fundamental tool in solving traffic engineering problems. Traffic simulation depends on the modeling of time headway because it is an important characteristic in microscopic traffic flow theory. The temporal distance between two successive vehicles is known as time headway [1]. It is the time difference between the arrivals of a leading vehicle and the following vehicle at a reference point. Time headway depends on the flow rate of vehicles: short time headways correspond to high flow rates. These properties are critical for roadway system planning, analysis, design, and operation [2]. Time headways can be used in various traffic evaluations, such as safety, capacity, and level of service [3]. Time headway reaches its minimum when traffic flow reaches its maximum. To increase route capacity and reduce vehicle delays, traffic engineers can employ detailed modeling and assessment of vehicle headway distributions. A statistical model of time headway is essential for both theoretical and simulation-based traffic modeling. In addition, headway distributions are used in digital simulations that mimic multi-lane and highway traffic [4]. Traffic volume, the proportion of heavy vehicles, lane position, road layout, time, and weather affect the time headway distribution of vehicles and the appropriate models [5]. Lane distribution and speed are affected by the lateral clearance on the roadsides [6]. Advanced technologies, such as image processing techniques, are used to collect and analyze the data because of their effectiveness compared to sensor and loop detector methods [7].
Several studies have examined the modeling of time headway distributions. A study on the effect of lane position on time headway distributions at high levels of traffic flow was presented in [8]. Based on headway distribution analysis, this study assessed driving behavior in several highway lanes in Isfahan, Iran. The results demonstrated that the best-fitting model for the passing lane differed from that of the middle lane because of driver behavior. Numerous drivers adopted unsafe headways under car-following conditions in the passing lane, demonstrating the high risk-taking tendency of the driver population, which led to considerable differences in the capacities and statistical distribution models of the two lanes. The proper model for the passing lane was the shifted lognormal, while for the middle lane it was the shifted gamma. A traffic detector based on a laser beam sensor was used to model a suburban arterial in South Korea [9]. The observed data were divided into five flow states. The test rejected randomness for a flow rate of 5-9veh/m. Hence, theoretical modeling was performed for the remaining flow rates of 10-14, 15-19, 20-24, and 25-29veh/m. The goodness-of-fit tests, performed using the Kolmogorov-Smirnov (K-S) method, revealed that the best fits were the Johnson SB model for a flow rate of 10-14veh/m and the Johnson SU model for the remaining three flows. The log-logistic model was the second-best fit for flow rates of 20-24 and 25-29veh/m. Time headway under high flow conditions was studied on Palestine Street, an arterial street in Baghdad, using two different data collection periods [10]. The quality of fit was evaluated using the nonparametric chi-square test to determine whether the logistic distribution suited the empirical time headway data. The logistic probability distribution function described the time headway on the congested Palestine Street for scale parameters ranging from 0.85 to 1.5.
The differences in time headway distributions were studied in [11] for four flow levels, 0-600, 600-1200, 1200-1800, and 1800-2400PCU/h, on two-lane and four-lane roads in Assam, using traffic volume data collected by cameras on four-lane roads. The Burr and Log-Pearson distributions were found to be suitable for 600-1200PCU/h, while the lognormal and log-logistic distributions fit the headways for traffic flows above 1200PCU/h. In [12], the statistical distribution of vehicle headways was analyzed for freeway and arterial roads in Riyadh, Saudi Arabia, under low-volume traffic conditions. The findings demonstrated that the shifted exponential and gamma distributions fit the data on freeways and arterials, respectively. The time headway distributions of vehicles traveling on an urban expressway in Bangkok, Thailand, were examined in [1] using the exponential, lognormal, and generalized extreme value (GEV) distributions. This study showed that the GEV distribution could describe more than 90% of the empirical distributions on most lanes and sections of the expressway. In contrast, the exponential distribution was the least practical, as it only described the empirical distributions under ultra-low traffic conditions.

II. DATA COLLECTION

Field data were collected at two sites, as shown in Figures 1 and 2. The first site was the Mohamed Al-Qassim Highway, a 20km-long major arterial expressway, and the second was the Qanat Al-Jaish Highway, a 22km-long major arterial expressway in Baghdad. The camera was mounted on footbridges approximately 10m above the road surface to monitor traffic flow. Table I shows the camera coordinates. The selected sections consisted of six main lanes, three lanes in each direction, termed left or lane one, middle or lane two, and right or lane three. The study sections were located away from ramps, loops, and horizontal or vertical curves.
Data were collected during the daytime under ideal weather and good visibility conditions. The data were selected to cover different levels of traffic flow, and stop-and-go conditions were excluded. Traffic data for site one were collected for 21 hours over three days: 3, 4, and 5/5/2021. Two hours of the second day, 1:00 PM and 2:00 PM, were excluded due to stop-and-go conditions. Data for site two were collected for 18 hours over three days: 22, 23, and 24/2/2021. The spot speed of vehicles passing each site was measured using a Bushnell 101911 speed gun, set to measure speed in km/h and pointed in the direction of the moving vehicles. This speed gun transmits signals to moving objects and can capture vehicles moving faster than 16km/h [13]. The average speed was determined for 15min intervals, using a sample of 100veh/h/ln randomly selected from each lane. Figure 3 shows the recording direction and the equipment used. The video clips were analyzed using Python to extract the data, as shown in Figure 4, and the extracted data were checked for accuracy. The extracted data included the number of vehicles, vehicle classification, and time headway for each lane. Figures 5 and 6 depict the traffic volumes observed during the study. Seven flow rates were used to represent the different flow states of each lane and provide accurate headway data. Tables II and III summarize characteristics of the time headways and the average speed, obtained from spot speed measurements, for each flow rate. The flow rate was calculated using:

$$q = \frac{3600}{h_{avg}}$$

where q is the flow rate (veh/h/ln) and h_avg is the average headway (sec) in a lane.
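A minimal sketch of converting a lane's headways into a flow rate, computed as 3600 divided by the average headway in seconds, is shown below in Python, the language the study used for video analysis. The function name `flow_rate_from_headways` and the sample headways are hypothetical and are not data from the study.

```python
from statistics import mean


def flow_rate_from_headways(headways_s):
    """Flow rate in veh/h/ln from time headways in seconds: q = 3600 / h_avg."""
    if not headways_s:
        raise ValueError("at least one headway is required")
    h_avg = mean(headways_s)  # average headway in the lane (s)
    if h_avg <= 0:
        raise ValueError("average headway must be positive")
    return 3600.0 / h_avg


# Hypothetical headways (s) for one lane; not data from the study.
sample = [1.8, 2.6, 2.2, 3.4, 2.0]
print(round(flow_rate_from_headways(sample)))  # 1500 (average headway 2.4 s)
```

Because flow is the reciprocal of the mean headway, halving the average headway doubles the flow rate, which is why short headways correspond to high flows.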
Time headway was defined as the time between the arrivals of the front bumpers of successive vehicles at a reference point or line in the lane on the video playback screen. Headways were computed simultaneously for multiple lanes, as shown in Figure 7.
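Given the instants at which successive front bumpers cross the reference line, each headway is the difference between consecutive crossing times in the same lane. The sketch below assumes crossings are logged as video frame indices at a known frame rate; the 30 fps value, the lane labels, the frame indices, and the function name `lane_headways` are hypothetical.

```python
def lane_headways(crossing_frames, fps=30.0):
    """Compute time headways (s) per lane from reference-line crossing frames.

    crossing_frames maps a lane label to the frame indices at which
    successive front bumpers crossed the reference line in that lane.
    """
    headways = {}
    for lane, frames in crossing_frames.items():
        times = sorted(f / fps for f in frames)  # frame index -> seconds
        # Headway = arrival time of the follower minus that of the leader.
        headways[lane] = [t2 - t1 for t1, t2 in zip(times, times[1:])]
    return headways


# Hypothetical crossing frames for the three lanes of one direction.
crossings = {
    "left": [0, 45, 99, 150],
    "middle": [12, 90, 171],
    "right": [30, 210],
}
h = lane_headways(crossings)
print({lane: [round(x, 2) for x in hs] for lane, hs in h.items()})
# {'left': [1.5, 1.8, 1.7], 'middle': [2.6, 2.7], 'right': [6.0]}
```

Keeping one list per lane lets all lanes be processed from the same video pass while their headway distributions are analyzed separately.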
III. AUTOMATIC VEHICLE COUNTING AND CLASSIFICATION METHOD

Data were extracted from the videos using Python and image-processing techniques to obtain vehicle counts, vehicle classifications, and time headways for each lane. Python and the OpenCV library were used to count the vehicles. The code was developed in the PyCharm IDE and displayed with Prettify, and the NumPy, cv2 (OpenCV), and csv libraries were used. The developed method combined grayscale conversion and thresholding to improve the quality and accuracy of vehicle detection. A dilation filter was used to help track the position of each vehicle, and the detected vehicles were categorized into distinct classes and tallied separately for each lane to provide useful information for traffic flow analysis. The method's flowchart is depicted in Figure 8. A vehicle was counted when its contour touched the Region of Interest (ROI). The ROI was formed by drawing three imaginary lines diagonally across the road, one for each of the three lanes [14], each drawn by connecting two ROI points. When the front bumper of a vehicle touched its lane's line, the counter was incremented, indicating that the vehicle's movement had been observed. A virtual detector was used for classification: a rectangular box within the ROI, in which vehicles were identified based on their dimensions, i.e., width and height [15]. Object colors were tracked using a histogram-based technique. Contour properties, such as solidity and contour area, were extracted and compared with assumed values to determine whether a vehicle was a passenger car or a heavy vehicle. Contours represent the boundaries of connected pixel regions in an image. The contours of foreground objects were detected after background subtraction, with the video frames binarized to obtain the foreground.
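The counting and classification rules described above can be sketched without any video I/O. This is a simplified illustration, not the authors' code. It assumes each vehicle has already been tracked as a sequence of front-bumper positions (the tracker is not shown) and detects a crossing of a diagonal virtual line through a sign change of a 2-D cross product. The thresholds `HEAVY_MIN_AREA` and `HEAVY_MIN_SOLIDITY` are hypothetical stand-ins for the study's unreported assumed values.

```python
# Hypothetical thresholds: the study's assumed values are not reported here.
HEAVY_MIN_AREA = 12000      # contour area in pixels^2
HEAVY_MIN_SOLIDITY = 0.85   # contour area / convex hull area


def side_of_line(p, a, b):
    """Cross product sign: positive on one side of line a-b, negative on the other."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def count_crossings(tracks, a, b):
    """Count vehicles whose front-bumper point reaches or crosses line a-b.

    tracks maps a vehicle id to its (x, y) positions in successive frames.
    The line through a and b is treated as infinite, and each vehicle is
    counted at most once.
    """
    counted = set()
    for vid, points in tracks.items():
        for p, q in zip(points, points[1:]):
            s1, s2 = side_of_line(p, a, b), side_of_line(q, a, b)
            if s1 * s2 < 0 or (s1 != 0 and s2 == 0):  # crossed or touched
                counted.add(vid)
                break
    return len(counted)


def classify(contour_area, hull_area):
    """Label a detection as 'heavy' or 'passenger' from area and solidity."""
    solidity = contour_area / hull_area if hull_area > 0 else 0.0
    if contour_area >= HEAVY_MIN_AREA and solidity >= HEAVY_MIN_SOLIDITY:
        return "heavy"
    return "passenger"


# A diagonal virtual line in image coordinates (pixels) and three tracked vehicles.
A, B = (0, 300), (640, 260)
tracks = {
    1: [(100, 250), (100, 280), (100, 310)],  # crosses the line
    2: [(400, 100), (400, 150), (400, 200)],  # stays above it
    3: [(500, 200), (500, 290)],              # crosses the line
}
print(count_crossings(tracks, A, B))  # 2
print(classify(15000, 16000), classify(6000, 7000))  # heavy passenger
```

Counting each track once prevents a vehicle that lingers on the line for several frames from incrementing the counter repeatedly.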
OpenCV was used to threshold the frames into black-and-white images: thresholding divides each frame into two levels, i.e., background and foreground, and a binary threshold was used to segment the vehicles. Vehicle counts obtained using the manual and automated methods were compared, and the automated method was observed to provide accurate results quickly, as shown in Table IV.

IV. METHOD
EasyFit v5.5 was used to examine the collected data and select an appropriate distribution model, as shown in Figure 9. The distributions for each flow are shown in Table V. The analysis evaluated the goodness of fit using the K-S test, and the maximum likelihood method was used to estimate the parameters of the fitted distributions [16]. The K-S test is commonly used to detect whether two datasets, or a dataset and a theoretical distribution, differ significantly, based on the maximum absolute difference between their cumulative distribution functions [17]. Because the K-S test is nonparametric and does not require prior assumptions about the underlying distribution, it was applied to the observed data for distribution fitting. Five distributions were considered in this study, most of which have previously been used for time headway modeling of highways and freeways: Pearson 6 [18], inverse Gaussian [19], generalized Pareto [20], Dagum [21], and generalized gamma. The following process was used to determine the appropriate time headway distribution model per lane:
• The goodness of fit of the distribution of all headways collected in each lane was examined.
• The goodness of fit of the distribution of the headways occurring for every flow scope (veh/h/ln) was evaluated for each lane. The flow scopes are presented in Tables II and III.
• The acceptable models were identified for each lane using the above two steps.
• A standardized model was developed for each lane across its flow ranges using Theil's slope estimator.
The parameters of time headway distribution models were estimated based on the previous steps.
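The fitting and testing steps above can be sketched with SciPy as a stand-in for EasyFit, which is the tool the study actually used. This is a minimal sketch under stated assumptions: each candidate is mapped to its SciPy counterpart (`betaprime` with a scale parameter for Pearson 6, and `burr`, SciPy's Burr Type III, for Dagum), the location parameter is fixed at zero, and the headways are synthetic. EasyFit's parameterizations, location handling, and ranking may differ. The K-S statistic is D = max |F_n(x) - F(x)|, the largest gap between the empirical and fitted CDFs; because the parameters are estimated from the same data, the resulting p-values are only approximate.

```python
import numpy as np
from scipy import stats

# Candidate models named in the text, mapped to their SciPy equivalents.
CANDIDATES = {
    "Pearson 6": stats.betaprime,
    "Inverse Gaussian": stats.invgauss,
    "Generalized Pareto": stats.genpareto,
    "Dagum": stats.burr,
    "Generalized Gamma": stats.gengamma,
}


def fit_and_test(headways, alpha=0.05):
    """Fit each candidate by MLE (location fixed at 0) and run a K-S test.

    Returns (name, D, p_value, h) tuples sorted by the K-S statistic D,
    where h = 1 means the fit is rejected at level alpha and h = 0 means not.
    """
    results = []
    for name, dist in CANDIDATES.items():
        try:
            params = dist.fit(headways, floc=0)  # maximum likelihood estimates
        except Exception:
            continue  # skip a candidate whose optimizer fails
        res = stats.kstest(headways, dist.cdf, args=params)
        h = int(res.pvalue < alpha)
        results.append((name, float(res.statistic), float(res.pvalue), h))
    return sorted(results, key=lambda r: r[1])  # rank 1 = smallest D


# Synthetic headways (s) for illustration only; not the study's data.
rng = np.random.default_rng(42)
headways = rng.gamma(shape=2.5, scale=1.0, size=500)
for rank, (name, d, p, h) in enumerate(fit_and_test(headways), start=1):
    print(f"{rank}. {name:20s} D={d:.3f} p={p:.3f} h={h}")
```

Running the same function on the headways of each flow scope, lane by lane, reproduces the structure of the per-lane procedure: a ranked list of candidates with a reject/accept decision for each.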

V. RESULTS
The appropriate distribution for each flow scope was determined by the K-S test, and the results are shown in Tables VI and VII. The goodness of fit of the models was tested at a 5% level of significance. For each test, h=1 indicates that the compatibility of the headway distribution with the fitted model is rejected, and h=0 that it is not rejected. The model parameters were estimated from the headway data for each test, as shown in Tables VIII and IX. The parameters for each flow rate were estimated using the conventional Maximum Likelihood Estimate (MLE), chosen because it is an asymptotically unbiased estimator [22]. The results showed that the flow scopes in the left lane of site 1 followed the Pearson type 6 distribution: the p-values for flow rates of 937, 1291, 1366, 1493, and 2015veh/h/ln were greater than 0.05, while those for 1679 and 1812veh/h/ln were less than 0.05 but greater than 0.02. The flow scopes in the middle lane followed the inverse Gaussian distribution, and those in the right lane followed the generalized Pareto distribution. The left lane at site 2 followed the Dagum distribution with p-values greater than 0.05, except for the flow rate of 985veh/h/ln, whose p-value was less than 0.05 but greater than 0.02. The flow scopes in the middle and right lanes of site 2 followed the generalized gamma distribution.
The p-value, the decision to reject or accept each distribution for each flow, and the rank indicating which distribution is most compatible with the data were calculated. These results were used to determine the appropriate distribution for each flow rate. The parameters for each lane were then modeled using Theil's slope estimator in MATLAB: the coefficients of a simple linear regression model were determined from the parameters estimated for each flow rate. Theil's slope estimator was used for the linear regression because it is simple to apply to small samples [23]. A t-test was performed to test the regression coefficients against the null hypothesis, as shown in Tables X and XI. Each estimated distribution parameter corresponded to a mean flow rate (F), so each parameter was expressed as a linear function of F with constant β0 and slope β1, i.e., parameter = β0 + β1·F. The parameters can then be calculated as functions of the mean flow rate in each lane. As seen in Table XII, the close agreement between the modeled and observed flow values in each lane indicates the strength of the models. At the same flows, the mean headway in the left lane is less than that in the middle lane, which in turn is less than that in the right lane, and the median headways follow the same ordering. This indicates that the mean and median headways vary consistently across lanes, and that driver behavior differs between lanes even when the flows are the same.
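The regression step can be sketched in Python as an analogue of the MATLAB computation. Theil's estimator takes the slope β1 as the median of all pairwise slopes; this sketch takes the intercept β0 as the median of y_i - β1·x_i, which is one common choice. The flow rates below are those reported for the left lane of site 1, but the parameter values are hypothetical (generated from 3.2 - 0.0011·F with one deliberately corrupted value) and are not the study's estimates.

```python
from itertools import combinations

import numpy as np


def theil_sen(x, y):
    """Theil's slope estimator for y = beta0 + beta1 * x.

    beta1 is the median of all pairwise slopes; beta0 is the median of
    y_i - beta1 * x_i (one common intercept choice).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]  # skip pairs with equal flows
    beta1 = float(np.median(slopes))
    beta0 = float(np.median(y - beta1 * x))
    return beta0, beta1


def parameter_at(flow, beta0, beta1):
    """Evaluate a distribution parameter as a linear function of mean flow F."""
    return beta0 + beta1 * flow


# Flow rates (veh/h/ln) reported for the left lane of site 1.
flows = np.array([937, 1291, 1366, 1493, 1679, 1812, 2015], dtype=float)
# Hypothetical parameter estimates on the line 3.2 - 0.0011 * F, with one
# corrupted value to show the estimator's robustness to an outlier.
params = 3.2 - 0.0011 * flows
params[3] = 3.0

b0, b1 = theil_sen(flows, params)
print(f"beta0 = {b0:.4f}, beta1 = {b1:.6f}")
print(f"parameter at F = 1500: {parameter_at(1500, b0, b1):.4f}")
```

Despite the outlier, the median-based estimates recover the underlying line, whereas ordinary least squares would be pulled toward the corrupted point. SciPy's `scipy.stats.theilslopes` gives the same slope; passing `method="joint"` selects the intercept used here.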

VI. CONCLUSION
This study investigated the characteristics of time headway modeling for each lane on two urban highways in Baghdad. Field observations based on video recordings and simulations of the two sites exhibited different statistical distributions of time headways for different flow ranges under mixed traffic conditions. Five probability distributions were considered for modeling the time headways. Furthermore, the fitted models differed between lanes, depending on drivers' lane choice and their movement within the lane. The results are summarized as follows:
• A total of 39 hours of traffic data were collected, and the proportion of heavy vehicles ranged from 9% to 38% of the total flow.
• According to the field observations, the minimum and maximum time headways on the Mohamed Al-Qassim Highway were 0.166 and 59.85sec, respectively, while the Qanat Al-Jaish Highway had a minimum time headway of 0.2sec and a maximum of 41.84sec. For comparison, in the USA it is considered very difficult to follow a vehicle safely with a time headway of less than 2sec [24]. In Germany, the recommended minimum distance headway in meters is half the speedometer reading in km/h, which means that a car traveling at 80km/h should maintain a distance of at least 40m. In Sweden, the National Road Administration recommends a time headway of 3sec in rural areas, and the police use a critical time headway of 1sec as a reference for imposing fines [25]. The much shorter headways observed in Baghdad reflect the behavior of drivers on Iraqi roads. Such headways constitute traffic violations that may lead to accidents, and these violations increase with traffic volume.
• The comparison between automated and manual vehicle counting showed an accuracy of 96% for the total flow and 88-97% for individual lanes. These results, together with the shorter data extraction time, show the effectiveness of the automated method.
• Several statistical models were selected using EasyFit, and the parameters for each flow rate were estimated using MLE. The goodness of fit was examined using the K-S test.
• For the Mohamed Al-Qassim Highway, the left lane had a mean flow rate of 1432veh/h/ln for the observed data and 1591veh/h/ln for the modeled data using the Pearson type 6 distribution, showing a 6% error. The middle lane had a mean flow rate of 1210veh/h/ln for the observed data and 1261veh/h/ln for the modeled data using the inverse Gaussian distribution, a 4% error. The right lane had a mean flow rate of 459veh/h/ln for the observed data and 484veh/h/ln for the modeled data using the generalized Pareto distribution, a 5% error.
• For the Qanat Al-Jaish Highway, the left lane had a mean flow rate of 1587veh/h/ln for the observed data and 1608veh/h/ln for the modeled data using the Dagum distribution, an error of 1.3%. The middle lane had a mean flow rate of 1122veh/h/ln for the observed data and 1125veh/h/ln for the modeled data using the generalized gamma distribution, an error of 0.3%. The right lane had a mean flow rate of 635veh/h/ln for the observed data and 620veh/h/ln for the modeled data using the generalized gamma distribution, an error of 2.4%.
These results show that driver behavior differs between lanes, even at the same flow rates.
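The German rule of thumb cited in the conclusions can be converted into a time headway for comparison with the other values. Assuming the rule means a gap d in meters equal to half the speedometer reading V in km/h, and converting speed with v = V/3.6 m/s, the implied time headway does not depend on speed:

$$t = \frac{d}{v} = \frac{V/2}{V/3.6}\ \text{s} = \frac{3.6}{2}\ \text{s} = 1.8\ \text{s}$$

For V = 80km/h this gives d = 40m and v ≈ 22.2m/s, so t ≈ 1.8sec, close to the 2sec cited for the USA and far above the minimum headways of 0.166 and 0.2sec observed on the two Baghdad highways.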