Accident prediction modelling for expressways: a review

Expressways are needed in every country for the fast movement of goods and people over long distances, but these facilities also have the highest severity rate among all road categories. Over the last three decades, researchers around the world have used different accident prediction modeling methods to examine the effect of traffic volume, road geometry, and environmental factors on accident frequency on expressways. The purpose of this review paper is to identify an appropriate modeling method for predicting accident frequency on expressways in a developing country like India, where traffic conditions, vehicular characteristics, and driver behaviour are very different from those in developed countries, and to identify the research areas where the findings are inconclusive. The literature review suggests that, among the models used so far, the correlated random parameter model is the most advanced, as it simultaneously accounts for both the heterogeneous effects of explanatory factors across road segments and the cross correlations among the random parameter estimates. Findings related to variables such as percentage of heavy vehicles, vertical gradient, and lane width are not conclusive. The safety effects of variables such as speed limit, roadway lighting, pavement type, and fog have been less studied on expressways. The effect of allowing two-wheelers on expressway safety has not been studied yet, and accidents related to fatigue or drowsiness also need examination.


Introduction
Road traffic accidents are an important cause of death and injury in all countries, causing more than 1.35 million deaths and about 50 million injuries globally each year. About ninety percent of these deaths and injuries occur in developing countries [1]. Among the developing countries, India has the highest number of traffic deaths in the world, with 151,113 persons killed in 2019 [2]. Road traffic deaths and injuries place an unnecessary burden on the health system as well as on the economy of developing countries.
IOP Publishing doi: 10.1088/1757-899X/1236/1/012011
[...] handling over-dispersion and under-dispersion. Its results are seriously affected when both the sample size and the sample mean are small [3]. To overcome these limitations and enhance the predictive power of the Poisson model, many variants of the Poisson model are in use.
The zero-truncated Poisson (ZTP) model [21], also known as the conditional Poisson distribution or the positive Poisson distribution, is used when the sample has to be truncated because of limitations of observability and the omission of certain accident data. Hosseinlou et al. [19] developed a ZTP model for the estimation of accidents and potential violations on Iran's expressways and compared the results with the Poisson model. The results show that the ZTP model could fit the data better than the Poisson model for a truncated sample.
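The truncation idea can be illustrated with a minimal sketch, which is not the specification estimated in [19]. The Poisson probabilities are renormalized over the positive counts, so that P(Y = y | Y > 0) = e^(-λ) λ^y / (y! (1 - e^(-λ))). The function names and the rate λ = 1.5 are illustrative assumptions.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """Ordinary Poisson probability of observing y accidents."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def ztp_pmf(y: int, lam: float) -> float:
    """Zero-truncated Poisson: Poisson probability rescaled to counts y >= 1."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, lam) / (1.0 - math.exp(-lam))

def ztp_mean(lam: float) -> float:
    """Mean of the zero-truncated Poisson, lam / (1 - exp(-lam))."""
    return lam / (1.0 - math.exp(-lam))

lam = 1.5  # hypothetical accident rate per segment (assumption)
total = sum(ztp_pmf(y, lam) for y in range(1, 60))
print(round(total, 6), round(ztp_mean(lam), 4))
```

Because the zero class is removed, the mean of the truncated distribution is always larger than λ, which is why fitting an ordinary Poisson model to a sample without zero counts can be misleading.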
The negative binomial/Poisson gamma (NB) model has been widely used for expressways by various researchers during the last three decades. It is an extension of the Poisson model [3] that effectively deals with the over-dispersion of accident data. However, it has limitations in addressing the correlation in data due to temporal and location specific effects, as it assumes that the accident frequencies on road segments are independent [7,22-24].
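How the NB model relaxes the Poisson restriction of equal mean and variance can be checked numerically. The sketch below assumes the common NB2 parameterization, in which the variance is μ + αμ² for mean μ and dispersion α. This parameterization, the function names, and the values μ = 2 and α = 0.5 are assumptions made for illustration and are not taken from the reviewed studies.

```python
import math

def nb_pmf(y: int, mu: float, alpha: float) -> float:
    """NB2 probability with mean mu and dispersion alpha (variance mu + alpha*mu^2)."""
    r = 1.0 / alpha          # shape of the gamma mixing distribution
    p = r / (r + mu)         # probability parameter of the NB distribution
    log_pmf = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(p) + y * math.log(1.0 - p))
    return math.exp(log_pmf)

def moments(pmf, upper: int = 400):
    """Numerical mean and variance of a count distribution on 0..upper."""
    probs = [pmf(y) for y in range(upper + 1)]
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return mean, var

mu, alpha = 2.0, 0.5  # hypothetical mean accident count and dispersion (assumptions)
mean, var = moments(lambda y: nb_pmf(y, mu, alpha))
print(round(mean, 4), round(var, 4))  # the variance exceeds the mean
```

As α approaches zero the variance approaches the mean and the NB model collapses to the Poisson model.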
The Poisson lognormal model can account for over-dispersion [25] and has been found to be more flexible than the NB model [3]. However, its estimation is complex. Caliendo et al. [17] developed a negative multinomial model for expressways to address the problem of correlation among observations. Like the Poisson and NB models, the negative multinomial model cannot handle under-dispersion [3].
A univariate negative binomial (UVNB) model was developed by Li et al. [26] to estimate accident likelihood by collision type on expressways in the state of Florida. Univariate or independent count NB models do not address the correlation between different severity levels [26]. To address this correlation, bivariate or multivariate approaches can be applied. However, these models are complex to estimate because a correlation matrix has to be formulated [3]. A bivariate negative binomial (BVNB) model was applied by Chen et al. [27] to analyze the effect of geometric elements on expressway safety. A multivariate negative binomial (MVNB) model was applied by Anastasopoulos [28] to estimate injury-severity frequencies for expressways in Indiana.
A multivariate Poisson lognormal (MVPLN) model was estimated by Li et al. [26] to assess the impact of independent variables simultaneously on various types of accidents, and the results showed that the prediction accuracy of the MVPLN model was higher than that of the UVNB model. The MVPLN model can handle negative correlations, whereas the MVNB model cannot [29]. Wang et al. [30] also recommended the MVPLN model for estimating weather related accidents by accident type and severity, and their results showed less biased parameter estimates than the UVPLN model.
Haq et al. [31] applied a zero inflated negative binomial (ZINB) model to expressways in the state of Wyoming to develop APMs for truck accidents. The ZINB model is suited to over-dispersed accident frequency data with excess zeros and offers better goodness-of-fit than the NB model [32]. This model assumes that there are two different states of the accident generating process: an accident free state and a normal accident prone state. However, the ZINB model does not correctly reproduce the accident-data generating process [32]. The multivariate zero inflated negative binomial (MVZINB) model can handle correlation among different injury-severity levels and accident types [28,33].
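The two-state assumption can be written as a mixture: with probability π a segment is in the accident free state and records zero accidents, and with probability 1 - π the count follows an NB distribution. The sketch below illustrates this mixture under an assumed NB2 parameterization. The names `zinb_pmf` and `pi_zero` and all numerical values are hypothetical and are not taken from [31] or [32].

```python
import math

def nb_pmf(y: int, mu: float, alpha: float) -> float:
    """NB2 probability with mean mu and dispersion alpha."""
    r = 1.0 / alpha
    p = r / (r + mu)
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    + r * math.log(p) + y * math.log(1.0 - p))

def zinb_pmf(y: int, mu: float, alpha: float, pi_zero: float) -> float:
    """Mixture of an accident free state (prob. pi_zero) and an NB count state."""
    count_part = (1.0 - pi_zero) * nb_pmf(y, mu, alpha)
    return (pi_zero + count_part) if y == 0 else count_part

# Hypothetical values chosen only for illustration (assumptions).
mu, alpha, pi_zero = 1.2, 0.8, 0.3
p0_nb = nb_pmf(0, mu, alpha)
p0_zinb = zinb_pmf(0, mu, alpha, pi_zero)
print(round(p0_nb, 4), round(p0_zinb, 4))  # zero inflation raises P(Y = 0)
```

The mixture raises the probability of a zero count above what the NB part alone would give, and the overall mean becomes (1 - π)μ.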
The unobserved heterogeneity arising from temporal and spatial effects results in biased parameter estimates [34]. Many researchers [7,35-38] have used the random effect negative binomial (RENB) model to address the problem of unobserved heterogeneity while estimating accident frequency on expressways. The results showed that the RENB model performed better than the NB model in handling unobserved heterogeneity. However, if the assumptions regarding the variability of spatial and temporal effects are not correct, the RENB model may be insufficient [37]. Some researchers [11,18,39] developed random effect models for expressways to address the unobserved variations at different sites. Zeng et al. [18] developed a random effect model with an autoregression-1 (AR-1) correlation structure (REAR-1) to address the chronological association among accidents occurring in consecutive months, as well as the unobserved variations and temporal correlation present in the data. Many researchers have used the random parameter negative binomial (RPNB) model to address unobserved heterogeneity in the estimation of accidents on expressways [35,37,38,40-43]. The RPNB model is an extension of the RENB model [37] and has more flexibility in handling unobserved heterogeneity than the latter. However, the parameter estimation process for the RPNB model is more complicated. The RPNB model also cannot address the probable correlation between the varied effects of explanatory factors [38]. Hou et al. [38] developed a correlated random parameter negative binomial (CRPNB) model for expressway tunnel safety in China, and the results showed better goodness of fit than the RPNB model. The CRPNB model was found to account better for the cross correlations among the random-parameter estimates as well as for the heterogeneous effects of explanatory factors [43].
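The difference between uncorrelated and correlated random parameters can be sketched with a commonly used formulation, assumed here rather than taken from [38] or [43]: the segment-specific parameter vector is β_i = β + Γω_i, where ω_i is standard normal and Γ is a lower-triangular matrix. A diagonal Γ gives independent random parameters, while non-zero off-diagonal elements introduce cross correlation through the covariance ΓΓ'. All names and numerical values below are hypothetical.

```python
import math
import random

def covariance_from_cholesky(gamma):
    """Return V = Gamma * Gamma^T for a lower-triangular matrix Gamma."""
    n = len(gamma)
    return [[sum(gamma[i][k] * gamma[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def draw_segment_parameters(beta_mean, gamma, rng):
    """One draw of beta_i = beta_mean + Gamma * omega_i with omega_i ~ N(0, I)."""
    omega = [rng.gauss(0.0, 1.0) for _ in beta_mean]
    return [b + sum(gamma[i][k] * omega[k] for k in range(len(omega)))
            for i, b in enumerate(beta_mean)]

def sample_correlation(xs, ys):
    """Pearson correlation of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical mean effects of two explanatory factors (assumptions).
beta_mean = [0.8, -0.3]
gamma_uncorrelated = [[0.2, 0.0], [0.0, 0.1]]   # RPNB-style: diagonal Gamma
gamma_correlated = [[0.2, 0.0], [0.06, 0.08]]   # CRPNB-style: off-diagonal term

rng = random.Random(42)
draws = [draw_segment_parameters(beta_mean, gamma_correlated, rng)
         for _ in range(20000)]
v = covariance_from_cholesky(gamma_correlated)
implied_corr = v[0][1] / math.sqrt(v[0][0] * v[1][1])
empirical_corr = sample_correlation([d[0] for d in draws], [d[1] for d in draws])
print(round(implied_corr, 3), round(empirical_corr, 3))
```

In an estimated model the elements of Γ are unknown parameters. The additional off-diagonal terms are what make the correlated specification more flexible and also more demanding to estimate.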
Due to the increasing availability of geo-referenced accident data and increasing computational power, Liu et al. [44] estimated a geographically weighted negative binomial (GWNB) model to predict accident frequencies on expressways. The GWNB model is capable of handling spatial heterogeneity and outperforms the traditional NB model. However, it does not address temporal heterogeneity. GWNB models are highly localized and cannot be generalized to other locations [44].
The random parameters negative binomial-Lindley (RPNB-L) model has been used by Rahman Shaon et al. [42] for handling expressway accident data with unobserved heterogeneity and high over-dispersion, and the results showed that this model outperformed the RPNB model.
APMs for expressways have also been described in the Highway Safety Manual (HSM) supplement [45]. These prediction models consist of a base safety performance function (SPF), accident modification factors (AMFs), and a calibration factor to estimate the predicted average accident frequency. Using the HSM approach, La Torre et al. [9] analyzed single and multiple vehicle casualty and severe accidents for Italian expressways and provided a trustworthy tool for expressway agencies to deal with potential safety issues.
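The structure of this approach, a base SPF multiplied by the AMFs and a calibration factor, can be sketched as follows. The functional form of the base SPF, its coefficients, and the AMF values are hypothetical placeholders chosen for illustration. They are not the values published in the HSM supplement [45] or calibrated in [9].

```python
import math

def base_spf(aadt: float, length: float, a: float = -7.0, b: float = 0.9) -> float:
    """Hypothetical base SPF: length * exp(a + b * ln(AADT)), accidents per year."""
    return length * math.exp(a + b * math.log(aadt))

def predicted_frequency(aadt, length, amfs, calibration=1.0):
    """Predicted frequency = base SPF x product of AMFs x calibration factor."""
    return base_spf(aadt, length) * math.prod(amfs) * calibration

# Illustrative segment: the AADT, length, AMFs and calibration factor are assumptions.
n_base = base_spf(40000, 2.0)
n_pred = predicted_frequency(40000, 2.0, amfs=[1.10, 0.95], calibration=1.2)
print(round(n_base, 3), round(n_pred, 3))
```

An AMF above 1 represents a site condition that increases the expected frequency relative to base conditions, an AMF below 1 represents a condition that decreases it, and the calibration factor adjusts the prediction to local conditions.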
The generalized estimating equation (GEE) is an extension of generalized linear models that accommodates a correlated dependent variable [46]. This method offers different approaches for handling serial correlation, including independent, dependent, autoregressive 1, exchangeable, and unstructured correlation structures [3]. The GEE method was used by Mohammadi et al. [47] and Zheng et al. [46] to model the yearly accident counts on expressways and accommodate the temporal correlation problem.
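Three of the working correlation structures named above can be written out for the repeated yearly counts of a single segment. This is a minimal sketch of the matrices only and does not perform GEE estimation. The function names and the value ρ = 0.5 are assumptions.

```python
def independent(n: int):
    """Independence: no correlation between yearly counts of the same segment."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def exchangeable(n: int, rho: float):
    """Exchangeable: the same correlation rho between any two years."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def ar1(n: int, rho: float):
    """Autoregressive 1: correlation rho**|i - j| decays with the time lag."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

# Four yearly observations with a hypothetical correlation parameter (assumption).
for row in ar1(4, 0.5):
    print(row)
```

Under the autoregressive 1 structure, counts from adjacent years are more strongly correlated than counts several years apart, whereas the exchangeable structure treats every pair of years alike.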
To account for the effect of unobserved heterogeneity and spatio-temporal correlation in the data, Bayesian spatial and temporal approaches have also been proposed for accident prediction [11]. The conditionally autoregressive (CAR) model is the most common Bayesian spatial model and is capable of handling correlation among consecutive segments on an expressway [11]. The spatial Poisson lognormal model is useful for addressing unobserved heterogeneity and spatial correlation among neighboring road segments of an expressway [35].
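The way a CAR prior ties a segment to its neighbors can be sketched under two assumptions that are not taken from [11]: first-order adjacency between consecutive segments, and the intrinsic CAR form, in which the spatial effect of segment i, given the others, is normally distributed with mean equal to the average of its neighbors' effects and variance σ²/n_i, where n_i is the number of neighbors. The names and values below are hypothetical.

```python
def chain_neighbors(n: int):
    """Adjacency for n consecutive segments: each segment borders the next one."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def car_conditional(i: int, phi, neighbors, sigma2: float):
    """Conditional mean and variance of the spatial effect phi[i] (intrinsic CAR)."""
    adjacent = neighbors[i]
    mean = sum(phi[j] for j in adjacent) / len(adjacent)
    variance = sigma2 / len(adjacent)
    return mean, variance

# Hypothetical spatial effects for four consecutive segments (assumptions).
phi = [0.2, 0.0, 0.6, 0.1]
neighbors = chain_neighbors(len(phi))
cond_mean, cond_var = car_conditional(1, phi, neighbors, sigma2=1.0)
print(neighbors[1], cond_mean, cond_var)
```

An interior segment borrows information from the segments on both sides, so its conditional variance is smaller than that of an end segment, which has a single neighbor.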

Non-Parametric Accident Prediction Models
Non-parametric models do not require any prior assumption about the relationship between the dependent variable and the explanatory variables [20,48]. The Artificial Neural Network (ANN) model is one of the commonly used models; it is suitable for nonlinear interactions among variables and allows the inclusion of a large number of variables. However, ANN does not provide interpretable parameter estimates for the explanatory factors [20], and the results cannot be applied to data sets with different specifications [49].
Classification and Regression Tree (CART) is a good tool for prediction and classification problems. This model can effectively handle collinearity and outliers, which may be important issues with regression models [50]. It was found to be a better substitute for NB regression models in analyzing expressway accidents. However, the CART model has its own limitations: it does not make use of continuous and ordinal variables, and it has problems in handling the interactions between independent variables and in performing elasticity or sensitivity analysis [50].

Significant Explanatory Variables
The literature suggests that the choice of explanatory variables for a model has been based on the availability of data, the significant variables established in earlier research, and new variables that can be accurately calculated [51]. Significant explanatory variables found in previous studies and their impacts on accident frequency are discussed in the following paragraphs.

Exposure Variables.
The main exposure variables selected are traffic volume and segment length [27]. Almost all researchers found traffic volume to be a significant independent variable affecting accident frequency. Most researchers [7, 16, 17, 20, 28, 30, 40-43, 46, 47, 52-55] found that annual average daily traffic (AADT) or ADT was positively correlated with accident frequency. Zeng et al. [18] used monthly total traffic (MTT) instead of AADT to avoid information loss in a time-varying variable and found a positive association with accident frequency. Persaud and Dzbik [53] used hourly traffic volume in a microscopic model to capture daily variation in traffic volume. Many researchers [17,18,40,41,43,46,55] found expressway segment length to be positively correlated with the number of accidents. Wen et al. [56] used daily vehicle kilometers traveled (DVKT) as a variable and found it to be positively associated with accident frequency. Similarly, Wen et al. [11] used monthly vehicle kilometers traveled (MVKT) and found it to be positively correlated with accident frequency.

Traffic Conditions.
Traffic composition has been used as an important independent variable by various researchers to identify safety effects on expressways. Some researchers [20,24,31] found that a higher percentage of trucks in the traffic volume on expressways leads to more accidents, which could be attributed to the speed difference between light and heavy vehicles. However, other researchers [11,47,57] found that a high fraction of trucks and trailers in the traffic volume decreases accident risk. This contradictory finding could be associated with cautious driving by light vehicle drivers close to heavy vehicles, the better driving capabilities of heavy vehicle drivers as compared to light vehicle drivers, and increased homogeneity in traffic due to a higher percentage of heavy vehicles, which reduces traffic conflicts [11,47,57]. This ambiguous conclusion needs more investigation in future. Speed limits for cars and trucks were used by Hou et al. [37] as independent variables, and it was observed that a lower speed limit for trucks and a high speed limit for cars may reduce truck accidents. Hosseinlou et al. [19] found a positive correlation between speed and weekly accidents on expressways in Iran. However, "the safety effects of speed limits are still not fully understood and should be further investigated" [37].

Geometric Design Elements.
Design consistency, horizontal alignment, and vertical alignment are the main components of expressway geometric design that affect traffic safety. Montella et al. [58] found two design consistency measures, namely operating speed consistency and driving dynamics consistency, to be significant independent variables. Design consistency is the conformance of an expressway's geometry with driver expectancy. Accident frequency on horizontal curves increases with an increase in operating speed reduction and with a decrease in the difference between friction demand and friction supply [58].
Hou et al. [37] found that accidents were significantly associated with curvature, deflection angle, curve direction, and length of horizontal curve. Curvature has been found to be positively correlated with accident occurrence [24,37,54]. Wen et al. [56] found that curvature was positively associated with property damage accidents only. Haq et al. [31] found that horizontal curve radius has a decreasing impact on truck related accidents. Studies also show that setting more gradual horizontal curves can reduce the number of accidents [46]. The deflection angle was found to be positively correlated with the probability of accidents, and left turn curves were observed to be more dangerous than right turn curves, which calls for different designs of safety measures [37,52]. The length of the horizontal curve was found to be negatively associated with the likelihood of accident occurrence [37].
Expressway sections with steep grades were found to be positively correlated with accident likelihood due to escalating speed variation between light and heavy vehicles [18,37,48]. However, Wen et al. [11] and Ma et al. [7] found a negative correlation between vertical gradient and the probability of accident occurrence. This conclusion contradicts the previous investigations [18,37,39,48,56] and needs to be investigated further. Downgrade sections of expressways were found to be more unsafe than upgrade sections, which could be attributed to brake failure in heavy vehicles caused by frequent brake application to control speed [7,24,31,37,58]. An increase in vertical grade was found to be associated with more injury accidents on expressways, which could be attributed to the adverse impact of steep grades on stopping sight distance [27,56]. Hou et al. [37] found that vertical curve separations increase safety on expressways due to improved sight distance.

Cross-section Elements.
Many researchers [7,18,24,30,37,41,48] found a positive correlation between the number of lanes on an expressway and accident probability, due to escalating lane change opportunities and consequently rising conflicts within the traffic.
Saeed et al. [37] found that lane width has a negative correlation with injury accidents. Ture Kibar et al. [14] concluded that truck involvement in accidents reduces with increased lane width. However, Mohammadi et al. [47] found a positive correlation between lane width and accident occurrence. Thus, this contradictory finding requires further investigation. Undersized and oversized lane factors were found to be statistically significant and positively related to accident frequency for urban expressways [55]. Segments with a lane width of less than 3.25 m and more than 3.75 m were considered undersized and oversized, respectively [55].
Some researchers found that adding a climbing lane along sections with steep grades has a decreasing impact on accident frequency [24,31,37]. This could be attributed to the reduced interaction between light and heavy vehicles after a climbing lane is added.
Hou et al. [37] found that an increase in the median barrier offset can significantly lower accidents, as a larger median barrier offset reduces the driver's tension and anxiety about hitting the barrier. It was also found that segments with median barriers experience fewer accidents [40,42].
Median width was found to be significant in affecting safety on expressways [16]. A negative correlation was found between median width and the number of accidents on expressways [27,42,54].

Category of Segment.
Higher accident frequency was observed on segments in service areas, in tunnels, and within interchanges than on basic segments under similar conditions, probably because of frequent braking, accelerating, and lane changing maneuvers on these segments [43].

Pavement Related Variables.
Pavement condition factors significantly associated with accident frequency on expressways include distress ratio, rutting depth, and International Roughness Index (IRI) [37]. Friction and pavement type were also found to significantly affect accident frequency. Saeed et al. [43] found that segments with excellent pavement condition (IRI ≤ 100) had a higher number of injury accidents due to the associated higher speeds. Chen et al. [27] found that a 1 percent increase in pavement serviceability index would decrease no-casualty accidents on expressways by 0.26 percent. Increasing IRI was found to be positively associated with accident frequency [40]. Increased friction had a positive impact on accident frequency [42]. A change in pavement type (from concrete to asphalt) almost always increases the probability of accidents [42]. No explanation of this phenomenon is found in the literature, and hence it needs to be investigated further.

Some Other Variables.
Traffic Sign: Installation of more traffic signs, up to a specific number, would have a positive impact on accident frequency [46].
Ramp Density: Ramp density and ramp combination are statistically significant and positively related to accident frequency for urban expressways [55].
Lighting: In comparison to both side lighting, point lighting, no lighting, and continuous lighting were found to have a negative effect on accidents [41]. However, the effect of this variable has been less studied.
Weather Related Variables: Haq et al. [31] found that the number of rainy days per year has an increasing impact on truck related accidents on expressways. Strong winds increased the difficulty of controlling heavy vehicles on curves and consequently the frequency of accidents [37].
Pavement Marking: Zhao et al. [59] found raised pavement marking to be an effective measure for reducing accidents in expressway tunnels. Lee et al. [60] observed a significant reduction in the average driving speed of vehicles at expressway ramps when transverse pavement marking was used, and consequently found a reduction in accident frequency. Yang et al. [61] compared the effect of traditional marking, fishbone-shaped marking, and edge-rate marking on the speed reduction of vehicles in the expressway deceleration lane and found that fishbone-shaped marking was effective in reducing vehicle speed, with reductions ranging from 12.3 km/h to 15.2 km/h.
Some researchers [11,30,47] have tried to understand the interaction effects of risk factors on accident frequency. Mohammadi et al. [47] observed that accident frequency at an intersection in an urban area decreases with an increase in speed limit and, at a speed limit of 55 mph, with an increase in the number of lanes. Wang et al. [30] found that the total number of rain related accidents reduces with an increase in median width, but single-vehicle rain related accidents increase. This might be due to reduced driving visibility and reduced lane control ability of the driver, which result in median-related run-off-road accidents in the presence of a wider median [30]. The interaction between curve and rain increased accident frequency on expressways, which might be attributed to the decline in skid resistance in rainy weather [11,30]. The interaction between rain and vertical gradient reduced the number of accidents, which might be due to more alert driving behavior under difficult climatic conditions [11]. Wen et al. [11] also found that the interaction of wind speed and vertical gradient could increase accident likelihood.

Conclusions
This paper reviews the findings of more than 60 research articles related to accident prediction modeling on expressways in order to identify suitable modeling methods, specific data requirements, issues related to data quality and quantity, and criteria for selecting a variable for the model. One of the major objectives of this work was to identify the areas where the results are still inconclusive. The literature review shows that most of the APMs have been developed for expressways in developed countries and that only a few studies are available on APMs for expressways in developing countries, which may be attributed to the later development of expressway type facilities in these countries. However, the traffic composition, vehicle characteristics, driver behavior, and environmental characteristics in developing countries like India are very different from those in developed countries. Therefore, the APMs developed for expressways in developed countries cannot be directly applied in developing countries.
The correlated random-parameter model is the most advanced approach among the parametric modeling methods to date, as it can account for both the heterogeneous effects of explanatory factors across road segments and the cross correlations among the random-parameter estimates. At the same time, this approach is computationally expensive.
Non-parametric methods of accident prediction modeling for expressways have better predictive capabilities than parametric methods. However, these models do not quantify the effect of the various explanatory variables on accident frequency.
Various independent variables (related to exposure, traffic condition, expressway design, cross-section, segment category, pavement condition, weather condition, and interaction effects) have been used to develop APMs for expressways during the last three decades. Findings related to variables such as percentage of heavy vehicles, vertical gradient, and lane width are not conclusive and require further investigation. The safety effects of variables such as speed limit, roadway lighting, pavement type, and fog have been less studied in the literature and should be investigated further. The literature review suggests that no study has been conducted so far on the effect of allowing two-wheelers on expressways on accident frequency. Accidents related to fatigue or drowsiness, which is one of the leading causes of accidents on expressways in India, also need examination.