Analysis of Crash Severity for Hazard Material Transportation Using Highway Safety Information System Data

Crashes involving hazardous material shipments cause great loss of life and property damage every year, which makes crash severity a major concern in the routing and scheduling of such shipments. Although abundant studies have examined the relationship between different factors and crash severity, analysis of the severity of hazard material transportation (HMT) crashes is very limited, and road, vehicle, driver, and environmental factors have not been well considered in previous studies. This article analyzed the influence of various factors on HMT crash severity using Highway Safety Information System data. A random forest combined with an ordered logistic model was used for factor analysis. The results showed that annual average daily traffic, fatigues/asleep, number of lanes, speeding, adverse weather, and light are the six most important factors affecting HMT crash severity. Unlike in non-HMT crashes, driver factors (e.g., driver age, gender, and drug/alcohol influence) were not found to be significantly related to crash severity. Speeding should be strictly forbidden for HMT drivers, considering the potentially increased crash severity, and increasing the level of lighting can help reduce the number of severe crashes. Corresponding recommendations were provided based on the regression results.


Introduction
Transport of hazardous materials plays a critical role in continued economic development. The characteristics of hazardous materials are becoming more complex, and the transport volume of hazardous material is continuously increasing (Nam & Mannering, 2000). Depending on the type of hazardous material being shipped, a crash may mean an explosion and fire, significant injuries to the motorists involved, or environmental damage if the truck rolls over and a spill occurs. More than 13% of trucks on the roads carry some type of hazardous material (Schmidt & Price, 1979). It is estimated that about 200 hazmat trucks are involved in fatal crashes and 5,000 in nonfatal crashes annually (Craft, 2004). Among those hazardous material crashes, more than half happened on roads (Oggero et al., 2006), releasing hazardous material alongside the road. People's health and the environment can be directly or indirectly affected by hazardous material crashes (Nicolet-Monnier & Gheorghe, 2013), and a release often requires specially trained personnel to deal with the event. Although hazardous material crashes are rare events, their potential adverse consequences raise serious concerns for all stakeholders affected by hazard material transportation (HMT; Zografos & Androutsopoulos, 2008).
To ensure safe transport, route selection for HMT has its own requirements. It has to meet three main objectives: to minimize the total risk of crashes, to minimize the operation time, and to minimize the number of people exposed (Ma et al., 2013). Ma et al. (2018) applied a road screening algorithm based on a genetic algorithm and a Levenberg-Marquardt neural network for HMT route selection; a total of 15 features of each road were involved in the developed algorithm. Fabiano et al. (2002) developed a model to estimate the risk of HMT and to select better routes. Later, Fabiano and Palazzi (2010) analyzed the relationship between HMT and road tunnels. They suggested providing adequate and well-designed ventilation in the tunnel to tackle fires related to HMT, and designing the tunnel with a positive slope from a safety viewpoint. Crash risk evaluation has been the principal criterion for determining the optimal route for HMT (List et al., 1991; Zografos & Davis, 1989). As route selection needs to balance three objectives, there is no "absolutely safe" route (Pradhananga et al., 2010).
Crash frequency and crash severity are two concerns in understanding the relationship between crash occurrences and various risk factors (Wu & Xu, 2018b). In this article, crash severity means crash injury severity, and only the injury of the driver is considered. As the frequency of crashes related to HMT is relatively low, crash severity has been a major concern of the HMT system. Crash severity analysis is widely used for safety evaluation, traffic planning, and traffic management, so a comprehensive understanding of the effects of risk factors on crash severity would be very useful (Goniewicz et al., 2016; Huang et al., 2008). Uddin and Huynh (2018) used crash data from the Highway Safety Information System (HSIS) to investigate the influence of different factors on HMT crash severity and found that rural locations, light conditions, and gender were related to a high probability of major injuries. HMT crash risk at highway-rail grade crossings has also been analyzed using the Federal Railroad Administration (FRA) crash data set with a logistic regression model; the results showed that temperature, weather condition, and vehicle types were highly related to HMT crash risk. Another study explored the impacts of different factors on the probability of HMT crashes and found that signal and communication causes were key factors for hazmat release. Wilkinson (2018) analyzed the factors related to the release of radioactive materials in severe air crash environments.
Although many studies (Bener et al., 2003; Bochner, 1998) have investigated the influence of different factors on crash severity, few of them have focused on crashes related to HMT. A critical issue in crash severity analysis for HMT is the lack of data due to the low frequency of such events (Bener et al., 2003). The HSIS is a multistate database that contains crash, roadway inventory, and traffic volume data for a select group of states, which makes it a suitable data source for HMT crash severity analysis (Pour-Rouholamin & Jalayer, 2016). This article analyzed the influence of various factors on HMT crash severity using HSIS data from four states: California (CA), Minnesota (MN), North Carolina (NC), and Ohio (OH). Corresponding recommendations were provided based on the analysis results.

Materials and Method
Five-year crash data for the four states were extracted from the HSIS database. The crash data, roadway data, and vehicle data were linked using the accident report number, county route, and milepost information, and a total of 2,484 HMT crash records were identified. HMT crash severity was divided into five ordered levels: Property Damage Only (PDO; P), Complaint of Pain (C), Other Visible Injury (B), Severe Injury (A), and Fatal (K). Among those records, 1.97% were fatal crashes (K), 29.79% were injury crashes (A, B, or C), and 68.27% were PDO crashes (P). Multiple factors may influence the severity of HMT crashes (Harwood et al., 1989). They can be roughly grouped into four categories: driver factors, vehicle factors, road factors, and environmental factors. For HMT, the vehicle types are usually limited to heavy-duty trucks; therefore, vehicle factors are not considered in this research. A total of 18 potential influencing factors were extracted from the HSIS database. To better interpret the influence of categorical factors, codes were assigned to the categorical independent variables. After excluding records with blank values for the investigated factors, 2,340 records were left for further analysis. A summary of the variables and their corresponding codes in the database is documented in Table 1.
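The linkage step described above can be illustrated with a minimal sketch. The field names (`report_no`, `county_route`, `milepost`, `begin_mp`, `end_mp`, `hazmat`) and the toy rows are hypothetical and are not taken from the HSIS schema; the sketch also assumes that vehicles are matched to crashes by report number and that roadway segments are matched by route and milepost range.

```python
# Hypothetical field names and toy rows; the real HSIS files may differ.
crashes = [
    {"report_no": "A1", "county_route": "CA-005", "milepost": 12.4, "severity": "P"},
    {"report_no": "A2", "county_route": "CA-005", "milepost": 30.1, "severity": "K"},
]
vehicles = [
    {"report_no": "A1", "hazmat": True},
    {"report_no": "A2", "hazmat": False},
]
roads = [
    {"county_route": "CA-005", "begin_mp": 10.0, "end_mp": 20.0, "lanes": 4, "aadt": 42000},
    {"county_route": "CA-005", "begin_mp": 20.0, "end_mp": 40.0, "lanes": 6, "aadt": 61000},
]


def link_records(crashes, vehicles, roads):
    """Join crash, vehicle, and roadway rows into one record per hazmat crash."""
    vehicles_by_report = {}
    for v in vehicles:
        vehicles_by_report.setdefault(v["report_no"], []).append(v)

    linked = []
    for c in crashes:
        # Keep only crashes that involve at least one hazmat vehicle.
        involved = vehicles_by_report.get(c["report_no"], [])
        if not any(v["hazmat"] for v in involved):
            continue
        # Find the roadway segment on the same route that contains the milepost.
        segment = next(
            (r for r in roads
             if r["county_route"] == c["county_route"]
             and r["begin_mp"] <= c["milepost"] < r["end_mp"]),
            None,
        )
        if segment is not None:
            linked.append({**c, "lanes": segment["lanes"], "aadt": segment["aadt"]})
    return linked


hmt_crashes = link_records(crashes, vehicles, roads)
```

In this toy example only crash A1 involves a hazmat vehicle, so it is the only record retained, together with the lane count and AADT of its roadway segment.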
Many models have been developed for the analysis of crash severity and its contributing factors. Discrete response models (e.g., multinomial logit models, nested logit models, mixed logit models, ordered logit models [OLMs], and ordered probit models) have been widely applied in previous studies (Savolainen et al., 2011). In mixed logit models, the impact of variables on severity levels can vary across observations. Chen and Chen (2011) applied a mixed logit model to identify key factors influencing crashes on rural highways. In another study, a time-varying mixed logit model was developed for the analysis of vehicle merging behavior in work zones (Weng et al., 2018). The random parameters bivariate ordered probit model is another widely used model for crash severity analysis that can also address unobserved heterogeneity. Chen et al. (2019) used it to analyze the factors influencing the injury severity of drivers in rear-end collisions. Other models, such as partial proportional odds (PPO), decision trees, multivariate Poisson regression (Ma & Kockelman, 2006), and artificial neural networks (Delen et al., 2006), have also been used for crash severity analysis. Each method has its own advantages and disadvantages, so there is no uniform standard for model selection. In this article, we used a combination of random forest (RF) and OLM, named RF-OLM, for factor analysis.
The RF, a decision tree-based method, has attracted growing interest from researchers (Das et al., 2009; Kaplan & Prato, 2012) because of its excellent performance in handling multiple independent factors, both numerical and categorical. In this research, RF is applied to rank the importance of the suspected factors for HMT crash severity. The principle of RF regression is the minimum mean square error (MMSE). For any feature $A$ and division point $s$, the data are divided into two sets, $D_1$ and $D_2$. The RF evaluates the squared error within $D_1$ and within $D_2$, and selects the feature and division point for which the sum over $D_1$ and $D_2$ reaches its minimum. This procedure can be presented as
$$\min_{A,s}\left[\min_{c_1}\sum_{x_i \in D_1(A,s)}(y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in D_2(A,s)}(y_i - c_2)^2\right] \quad (1)$$
where $y_i$ is the response of sample $x_i$, and $c_1$ and $c_2$ are the average values of the samples in $D_1$ and $D_2$, respectively. RF reduces overfitting by averaging several trees. In this research, RF is applied to examine variable importance before the OLM regression.
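The split rule described above can be sketched in a few lines. This is a minimal illustration of a single split search under the squared-error criterion, not the RF implementation used in the study; a full RF would additionally draw bootstrap samples and random feature subsets for each tree. The function names and toy data are illustrative assumptions.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Return (feature_index, threshold, total_sse) minimizing SSE(D1) + SSE(D2)."""
    best = None
    n_features = len(rows[0])
    for j in range(n_features):
        # Candidate division points: every observed value of feature j.
        for s in sorted({row[j] for row in rows}):
            d1 = [y for row, y in zip(rows, targets) if row[j] <= s]
            d2 = [y for row, y in zip(rows, targets) if row[j] > s]
            if not d1 or not d2:
                continue  # a split must leave samples on both sides
            total = sse(d1) + sse(d2)
            if best is None or total < best[2]:
                best = (j, s, total)
    return best


# Toy data: the target is fully explained by whether feature 0 exceeds 2.
rows = [(1, 5), (2, 3), (3, 8), (4, 1)]
targets = [1.0, 1.0, 3.0, 3.0]
feature, threshold, total = best_split(rows, targets)
```

Here the search selects feature 0 with division point 2, because that split leaves zero squared error in both subsets.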
The results from RF are used in the subsequent stepwise regression for OLM. The "mean decrease in accuracy" generated by RF is used to rank the importance of the variables: variables with larger values are ranked as more important. The idea of the mean decrease in accuracy is to disorder (permute) the values of each feature and measure the resulting change in model accuracy on the out-of-bag samples. If the feature is important, disordering its values reduces the accuracy of the model (Wu & Xu, 2018a).
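One common way to write this measure, given here as an assumption rather than as the authors' formula, is $\mathrm{MDA}_j = \frac{1}{T}\sum_{t=1}^{T}\left(a_t - a_t^{(j)}\right)$, where $a_t$ is the accuracy of tree $t$ on its out-of-bag samples and $a_t^{(j)}$ is the accuracy after the values of feature $j$ are permuted. The sketch below illustrates the permutation idea on a fixed toy data set with a stand-in predictor; it does not reproduce the per-tree out-of-bag bookkeeping of a real RF, and all names are illustrative.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows whose predicted label matches the observed label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)


def mean_decrease_in_accuracy(predict, rows, labels, feature, repeats=5, seed=0):
    """Average accuracy drop after shuffling the values of one feature."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        # Rebuild each row with the shuffled value in place of the original.
        permuted = [
            row[:feature] + (value,) + row[feature + 1:]
            for row, value in zip(rows, column)
        ]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats


# Toy data: the label depends only on feature 0; feature 1 is noise.
rows = [(i % 2, i % 3) for i in range(20)]
labels = [row[0] for row in rows]


def predict(row):
    """Stand-in for a fitted model: it looks only at feature 0."""
    return row[0]


drop_informative = mean_decrease_in_accuracy(predict, rows, labels, feature=0)
drop_noise = mean_decrease_in_accuracy(predict, rows, labels, feature=1)
```

Shuffling the informative feature lowers the accuracy, whereas shuffling the noise feature leaves it unchanged, which is the behavior the ranking relies on.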
OLM is a regression model for ordinal dependent variables. We assume the model takes the form of Equation (3):
$$y^* = x^{\mathsf{T}}\beta + \varepsilon \quad (3)$$
where $y^*$ is the exact but unobserved dependent variable, $x$ is the vector of independent variables, $\varepsilon$ is the error term, and $\beta$ is the vector of regression coefficients to be estimated. Let $y$ represent the observed category of $y^*$; OLM then uses the observations on $y$, which are a form of censored data on $y^*$, to fit the parameter vector $\beta$. It should be noted that the OLM is essentially a linear classifier, so outliers can greatly affect its performance. In this article, however, the data were preprocessed, so the influence of data variance on the OLM should be very limited.
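To make the link between the latent variable and the observed severity categories concrete, the following sketch computes category probabilities from a linear predictor and a set of cutpoints (intercepts). It assumes a logistic error term and the cumulative form $P(y \le k) = \mathrm{logistic}(\zeta_k - x^{\mathsf{T}}\beta)$, which is the sign convention documented for `polr`. The cutpoint values are made up for illustration and are not estimates from this study.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(eta, cutpoints):
    """P(y = k) for k = 0..K-1 given linear predictor eta and increasing cutpoints.

    Uses the cumulative form P(y <= k) = logistic(cutpoint_k - eta).
    """
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


# Made-up cutpoints for five ordered levels P < C < B < A < K (four thresholds).
cutpoints = [0.8, 1.6, 2.9, 4.0]
probs_low = category_probabilities(eta=0.0, cutpoints=cutpoints)
probs_high = category_probabilities(eta=0.6919, cutpoints=cutpoints)
```

A larger linear predictor shifts probability mass away from the lowest category and toward the more severe ones, which is how a positive coefficient should be read in this model.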

Results
The importance ranking of the 16 independent variables obtained with RF is illustrated in Figure 1. The top five factors influencing crash severity were annual average daily traffic (AADT), number of lanes, road surface, adverse weather, and speeding. Removing AADT decreased the accuracy by 26.0%, indicating that AADT has the highest influence on HMT crash severity (similar to non-HMT crashes). Driver age and drug/alcohol influence had limited impact on crash severity: removing them decreased the accuracy by only 2.5% and 2.6%, respectively. As drug/alcohol use is usually strictly prohibited for HMT drivers by law and by the regulations of HMT companies, the percentage of records with drug/alcohol influence is very low (0.8%), which explains the low importance of this variable in Figure 1.
Previous studies (Goniewicz et al., 2016) have found that road surface condition and weather are strongly correlated with each other. "Road surface" was therefore excluded from the potential contributing factors to eliminate collinearity among the independent variables. As drug/alcohol influence and driver age have limited influence on the mean decrease in accuracy, those two variables were also removed before the OLM regression. The polr function in the R package MASS was used to implement the model. As 13 independent variables are involved in the initial model, it is necessary to examine the significance of their influence on HMT crash severity. Analysis of variance (ANOVA; Van Houten & Malenfant, 2001) is used for null hypothesis testing in this article. Table 2 shows the ANOVA results of the initial model.
The results in Table 2 show that many independent variables have a p value higher than .1, indicating that the null hypothesis could not be rejected for them. Only AADT and fatigues/asleep were significant at the .05 level. Apparently, some unnecessary predictors exist in the current OLM. Unnecessary predictors add noise to the estimation of the quantities of interest, waste degrees of freedom, and can cause collinearity. A backward elimination method was therefore applied for variable selection (Michael Shenoda, 1998), using the p value and the Akaike information criterion (AIC) for predictor removal. AIC is an estimator of the quality of a statistical model relative to other models fitted to the same data. A good model should have a lower AIC value and a lower p value for each independent variable.
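For reference, the standard definition of AIC is given below. The symbols $k$ and $\hat{L}$ are introduced here for illustration and do not appear in the original text.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

Here $k$ is the number of estimated parameters (for an ordered model, the regression coefficients plus the intercepts) and $\hat{L}$ is the maximized value of the likelihood. Removing a predictor lowers $k$ but usually lowers $\hat{L}$ as well, so AIC decreases only when the loss in fit is small relative to the parameters saved.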
Our proposed backward elimination is briefly introduced as follows: (a) start with all predictors in the model; (b) remove the predictor with the highest p value; (c) refit the model, calculate AIC, and go to step (b); and (d) stop when the minimum AIC is found. The generated AIC of different models is summarized in Table 3.
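Steps (a) to (d) can be sketched as a short loop. The callable `fit` is a hypothetical stand-in for refitting the OLM; it is assumed to return the AIC and a mapping from predictor name to p value. The toy AIC and p values are invented for illustration and are not results from this study. The sketch stops at the first removal that fails to lower AIC, which is one reading of step (d).

```python
def backward_elimination(predictors, fit):
    """Drop the predictor with the highest p value until AIC stops improving.

    `fit(predictors)` is a hypothetical callable returning (aic, p_values),
    where p_values maps each predictor name to its p value.
    """
    current = list(predictors)
    best_aic, p_values = fit(current)
    history = [(tuple(current), best_aic)]
    while len(current) > 1:
        worst = max(current, key=lambda name: p_values[name])
        candidate = [name for name in current if name != worst]
        aic, candidate_p = fit(candidate)
        history.append((tuple(candidate), aic))
        if aic >= best_aic:
            break  # removing `worst` no longer lowers AIC; keep the current model
        current, best_aic, p_values = candidate, aic, candidate_p
    return current, best_aic, history


# Toy stand-in for model fitting: fixed AIC and p values per predictor set.
TOY_P = {"aadt": 0.01, "speeding": 0.04, "light": 0.08, "gender": 0.70}
TOY_AIC = {
    frozenset(TOY_P): 3010.0,
    frozenset({"aadt", "speeding", "light"}): 3005.0,
    frozenset({"aadt", "speeding"}): 3008.0,
}


def toy_fit(predictors):
    """Look up the invented AIC and p values for a predictor set."""
    return TOY_AIC[frozenset(predictors)], {name: TOY_P[name] for name in predictors}


selected, selected_aic, history = backward_elimination(list(TOY_P), toy_fit)
```

In the toy run, the loop drops `gender`, then finds that dropping `light` raises AIC and therefore keeps it, mirroring the comparison between two successive models in an AIC table.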
Table 3 shows that Model 8 has the lowest AIC. When light is removed from Model 8, the newly generated Model 9 has a higher AIC, which means that light should be included in the regression model. The final selected Model 8 contains six independent variables: fatigues/asleep, speeding, number of lanes, AADT, light, and adverse weather. The regression result is illustrated in Table 4.
In addition to the six predictors, there are four intercepts (4 = number of severity levels − 1) in the results. The order of the variables (importance from high to low) is AADT, fatigues/asleep, number of lanes, speeding, adverse weather, and light. This order is somewhat different from the RF results in Figure 1, which also confirms that unnecessary predictors in the initial model can bias the regression results. The estimates in the output are given in units of ordered logits, or ordered log odds. For example, for a one-unit increase in fatigues/asleep (going from No to Yes), there is a 0.6919 increase in severity on the log odds scale, given that all of the other variables in the model are held constant. Because the coefficients are on the log scale, they are somewhat difficult to interpret. Another way to interpret the OLM is to convert the coefficients into odds ratios, which are shown in Table 5. For a one-unit increase in fatigues/asleep, that is, going from 0 (No) to 1 (Yes), the odds of a more severe crash are multiplied by 2.00, given that all of the other variables in the model are held constant. This means that fatigues/asleep can increase the severity of HMT crashes. The odds ratio of speeding is 1.31, indicating that speeding can increase HMT crash severity, as expected. The odds ratio of the number of lanes is less than 1.00, showing that the presence of more lanes on the road can decrease HMT crash severity, possibly because more lanes reduce the conflicts between HMT vehicles and other vehicles. When AADT is less than 50,000, the odds ratio varies from 1.10 to 1.42, indicating that crashes tend to be more severe with increased AADT. When AADT is more than 50,000, the odds ratio drops to 0.94, which may be caused by the speed reduction under heavy traffic (Rangel et al., 2013). The calculated odds ratio for light indicated that darkness can increase the severity of HMT crashes.
Adverse weather has an odds ratio of 0.86, which shows that adverse weather does not necessarily increase the severity of HMT crashes. It should be noted that HMT drivers are required to have a commercial driver license (CDL) and are usually well trained to drive under adverse weather. In this respect, HMT drivers' behavior is different from non-HMT drivers' behavior.
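The conversion between log odds and odds ratios used in this section is a simple exponentiation. The short sketch below checks it with the two values quoted in the text, the 0.6919 coefficient for fatigues/asleep and the 0.86 odds ratio for adverse weather; the function name is an illustrative choice.

```python
import math


def odds_ratio(coefficient):
    """Convert an ordered-logit coefficient (log odds) into an odds ratio."""
    return math.exp(coefficient)


fatigue_or = odds_ratio(0.6919)     # coefficient reported for fatigues/asleep
weather_log_odds = math.log(0.86)   # log odds implied by the reported odds ratio

print(round(fatigue_or, 2))         # 2.0
print(round(weather_log_odds, 3))   # -0.151
```

An odds ratio above 1 corresponds to a positive coefficient and higher severity, whereas an odds ratio below 1, as for adverse weather, corresponds to a negative coefficient.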

Conclusion
The factors influencing HMT crash severity levels were investigated in this article. A total of 2,340 crashes were used for analysis. RF was used to initially examine the relationship between 16 suspected factors and HMT crash severity, and it was found that AADT had the highest influence on HMT crash severity, whereas driver age and drug/alcohol influence had limited impact among the investigated factors. The OLM was then applied for regression, with a backward elimination process to filter the important factors. According to the regression results, AADT, fatigues/asleep, number of lanes, speeding, adverse weather, and light are the six most important factors influencing HMT crash severity. Unlike in non-HMT crashes, driver factors (e.g., driver age, gender, and drug/alcohol influence) were not found to be significantly related to crash severity. The impact of AADT on HMT crash severity does not follow a linear trend: in general, increasing AADT (<50,000) generates more severe crashes, but when the AADT is heavy (>50,000), the severity of crashes decreases. The results of this method can be used for HMT route selection.

Discussion
As HMT trips are usually long, fatigues/asleep is a critical issue and can greatly increase the severity of HMT crashes. To reduce fatigues/asleep, HMT companies or HMT policy-makers may need to require drivers to rest after they reach a predefined working time. Speeding should be strictly forbidden for HMT drivers, considering the potentially increased crash severity. The number of lanes should be considered when planning HMT routes: a higher proportion of road sections with more lanes helps reduce crash severity.
Other geometric factors, such as the presence of intersections, divided/undivided roads, and horizontal/vertical alignment, have limited influence on crash severity. It should be noted that although crash severity may not increase when AADT becomes high, high AADT can increase crash frequency according to the Highway Safety Manual (Goniewicz et al., 2016). Increasing the level of lighting can help reduce the number of severe crashes, which is in good agreement with the finding of Pour-Rouholamin and Jalayer (2016). As for weather, the odds ratio is less than 1, indicating that adverse weather may not increase the number of severe crashes. This is somewhat different from general crashes.
In this research, we applied the combination of RF and OLM for the analysis of HMT crash severity. Other methods may achieve similar or even higher accuracy compared with RF-OLM, and their performance requires further investigation. This article did not consider the influence of released hazardous materials on the nearby population. For future research, the population along the route should also be considered for HMT route planning. The HSIS does not contain information on different accident scenarios (e.g., toxic release, fire, explosion) or potential environmental damage; more data need to be collected to analyze the pollution related to HMT crashes. It should be noted that the same factor may have different effects on crash severity in different observations; adding random coefficients to the OLM is a possible way to address this limitation in future studies. A comparison of the results of this article with findings from different countries will be conducted in the next step.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.