Analysis of Rear-End Crash on Thai Highway: Decision Tree Approach



Introduction
Crash numbers on Thailand's highways are continuously increasing [1][2][3]. Crash type statistics reveal that rear-end collisions are the second most common type of collision. However, the highest number of fatalities occurs as a result of rear-end collisions (Figure 1). Therefore, strategies to decrease the number and severity of rear-end crashes are urgently needed.
There are two important issues in the study of rear-end crashes at intersections. First, a study of the causes of rear-end crashes, focusing on at-fault and not-at-fault drivers, has found that most crashes are caused by drivers not leaving enough space between their own car and the car in front [4]. Therefore, the cause of a rear-end collision is the car behind [5]. This study focuses on the characteristics of the at-fault driver, that is, the driver of the car behind that crashes into the car in front, by applying Quasi-Induced Exposure Methods [6].
These methods have been widely used in the field of traffic accident research. The principle of these methods is to identify the at-fault driver based on the accident report [7,8] by supposing that the distribution of not-at-fault drivers closely represents the distribution of exposure to accident hazards [9,10]. Second, this research explores the relevance of the high fatality rate caused by rear-end crashes. Fatal crashes must be considered in a study of rear-end crash characteristics that focuses on ways to reduce fatalities. Sullivan and Flannagan [11] studied fatal crash risks and found that darkness is the risk factor causing the greatest number of fatalities, due to the invisibility of vehicles parked along the roadside. Wiacek et al. [12] found that the greater the difference in velocity between the struck car and the striking car, the higher the number of rear-end crash fatalities. Moreover, if a truck is involved in a rear-end crash, the chances of fatality are further increased.
Previous research has found that factors causing death in rear-end crashes include driver characteristics affecting braking, such as gender, age, and alcohol or substance abuse [21]. Use of a seatbelt has been found to be another important contributing factor to rear-end fatalities [22]. Vehicle type is an important factor in all accident types [23], but especially in rear-end crashes. If the types of vehicle involved in a crash are very different, the chances of a severe outcome are higher [24]. Speed limit factors also affect the severity of the crash [12]. Other important characteristics of fatal crashes are physical road characteristics and visibility [21].
A statistical analysis of rear-end crashes involves independent variables, such as weather conditions, vehicle type, and seat belt use, and dependent variables, such as at-fault versus not-at-fault driver and fatal versus nonfatal rear-end crash.
The distribution analysis method has been widely used to generalize whether an estimated parameter exists. If there is an estimated parameter, the relationship between independent and dependent variables is considered. If there is no estimated parameter, data are investigated proportionally. Yan and Radwan [9] have stated that there are limitations to the use of parametric analysis (binary logistic regression) due to the difficulty in using it to investigate the relationship between two variables. Thus, an appropriate alternative is nonparametric analysis, such as a decision tree or classification tree (DT). This is an algorithmic arrangement that partitions data into proportions according to the specified dependent variables (also known as data mining) [25]. It is therefore appropriate for analyzing complex independent data [9,26]. A decision tree is a structure that includes a root node, branches, and leaf nodes [27]. Yan and Radwan [9] used DT to study rear-end crash data in Florida, USA, by analyzing two models. The first was an analysis of which accidents involved rear-end crashes, and the second was an analysis of the characteristics of drivers who could potentially become at-fault drivers.
Figure 1: Crashes on Thai highways by crash type.
In choosing a model for this study, other models that can analyze the relationship between independent variables and target or categorical variables were considered. A traditional model is multiple logistic regression, which has been widely used [24]. Another common model is the multinomial logit model, which theoretically analyzes data using the nested logit model (NLM) and can examine hierarchical dependent variables [28]. The odds ratio is used to interpret probability. The advantage of this method is the ability to compare the effects of explanatory variables on dependent variables, especially when independent variables are statistically significant. However, a limitation of each of these models is the inability to find relationships between explanatory variables. The Decision Tree Model (DT), however, potentially solves this problem. As mentioned earlier, rear-end crashes cause a high number of fatalities. Therefore, this model, which simultaneously identifies relationships between independent variables, may allow findings to be applied to policy development. For example, an examination of whether the different ages of drivers in different traffic lanes affect the role of the driver (at-fault/not-at-fault) in a crash can influence the development of effective policy. Research by Khan et al. [29], which compared DT with an ordinal discrete choice model, confirmed that DT can help to address issues of multicollinearity and variable redundancy. Among studies that have analyzed rear-end crashes (Table 1), most have analyzed crash frequency, followed by crash severity (fatal/nonfatal). One study has analyzed both crash frequency and severity outcome [30]. However, the crash data used in that study came from a country whose roads, conditions, and driver behaviors differ from those in Thailand, leading to the development of a very different model. No focused road crash study of highways in Thailand has applied the DT model to the reduction of the number of rear-end crash fatalities and fatal rear-end crashes. This research will discuss model consistency with the number of fatalities, comparing the two models with previous studies as a guideline for conducting future research.

Highway Crash Reporting
This study used Department of Highways (DOH) road accident data from 2011 to 2015. These data included dates, road segments, physical characteristics of accident scenes (e.g., straight road, curved road, work zone, median, intersection), environmental conditions (e.g., rain, lighting conditions, time of accident), cause-and-effect data (e.g., driving over the speed limit), and injury data (including fatalities, serious injuries, and minor injuries). The information provided by the DOH may not cover all accidents. In cases of minor collisions where the parties came to an agreement, accidents were not recorded.
Rear-end collisions were selected from these data and divided into three main types according to the movement of the front car prior to the collision [5]. These three are (1) going straight, with the front car traveling at normal speed, (2) decelerating, with the front car slowing down, such as when turning or executing a u-turn, and (3) stopping, with the front car parked on the roadside or on the hard shoulder or stopped at traffic lights. After screening, there were 2,096 cases of rear-end collision. As vehicle data had to be considered in this analysis, driver and vehicle factors were added to the model. The dataset comprised 5,445 vehicles involved in accidents.
Descriptive statistics, shown in Table 2, define the dependent variables: (1) the at-fault driver is the driver of the striking car, while the not-at-fault driver is the driver of the struck vehicle; (2) a fatal rear-end crash is a collision with at least one fatality, either at the accident scene or at the hospital, while a nonfatal rear-end crash is a rear-end crash without a fatality. All 22 independent variables of the two models are shown against the values of the two dependent variables. The data description is presented to help illustrate the overall picture created by the data [31,32]. After cleaning the data for driver exposure, there were 2,458 at-fault drivers and 2,096 not-at-fault drivers. With regard to crash fatalities, 1,156 vehicles were involved in fatal rear-end crashes, and 3,396 vehicles were involved in nonfatal rear-end crashes. According to vehicle type (Veh_type), medium cars, such as private cars
Table 1: Comparison with other studies in the analysis of rear-end crashes.
Note: At-fault or not-at-fault drivers are assumed to be related to rear-end crash frequency.
usually include a depressed median and barrier. Some pairs exhibited no relationship, such as driver age and road slope, or driver gender and road surface type.

Classification Tree and Building Model.
This study used a decision tree or classification tree (DT) model for rear-end crash data analysis, which started by determining target variables (dependent variables). Two models were constructed. Model#1 analyzed at-fault/not-at-fault drivers. In order to consider this variable, the driver factor was only selected for the first and second vehicles, as the first vehicle was clearly identifiable as accident-prone. Therefore, 4,192 vehicles (2,096 rear-end crashes) were analyzed in the model. Model#2 was an analysis of factors resulting in fatal and nonfatal rear-end collisions. Therefore, data included the two or more vehicles
and pickup trucks had a 28.0% chance of being at-fault. With regard to fatal rear-end collisions, larger vehicles were the cause of 11.7% of collisions, and small vehicles, such as motorcycles, were the cause of 8.5% of accidents (Figure 2(a)). Light condition (env_light) was the dominant environmental factor affecting fatalities, with 42.2% of rear-end collision fatalities occurring at night in the absence of light, 27.9% occurring at night with light, and 22% occurring in the daytime (Figure 2(b)).
The distribution of continuous variables is shown in Table 3. With regard to driver age distribution, there was little difference between the ages of at-fault drivers and not-at-fault drivers. The average age of at-fault drivers was 38.04 years and that of not-at-fault drivers was 38.58 years. The mean value of trucks involved in fatal rear-end crashes was 18.2% and in nonfatal rear-end crashes was 16.2%. The DT model was then used for further analysis. Predictions could then be presented as logical if-then conditions at the terminal node. Thus, the data did not require a normal distribution. In other words, the relationships between independent and dependent variables were not required to be linear [33].
Relationships between the independent variables are shown in the pairwise correlation coefficient matrix (Table 4). Two highly correlated pairs were found: (1) the road surface factor (env surface) correlated with weather condition (correlation coefficient = 0.840).
This was particularly evident in cases where there were unusual conditions, such as rain resulting in a wet road surface; (2) the number of traffic lanes correlated with median type (correlation coefficient = 0.621). This relationship is rational, as roads in Thailand typically have four or more traffic lanes and median types
road lane, which can be interpreted to mean that if a driver aged more than 21 years drives on a road with 10 or more traffic lanes (considering only 10 lanes, as there is no frequency of seven lanes), the chance of being the at-fault driver is 61.7%. This is consistent with research by Pande et al. [38]. This may be because roads with many lanes provide greater opportunities for speeding and vehicles are often parked on the roadside. Some less observant older drivers may be at fault for rear-end collisions. For accidents occurring at a median opening (median_ opening), where the median is on a road with fewer than 10 traffic lanes, drivers older than 21 years were more likely to be at-fault. Due to the characteristics of median openings, front car drivers are more likely to reduce speed in order to turn or execute a u-turn. If the car behind is too close, the chances of a rear-end collision are high. Dividing drivers into those aged less than and more than 25 years is a split that has not previously been investigated. This research found that drivers in these two age ranges potentially consist of not-at-fault drivers. When considering drivers aged over 25 years together with median type, there are more at-fault drivers on unoccupied streets with a raised or flush median, with a greater chance of being at-fault than drivers on roads with barriers or depressed medians. The causes of these results were raised median, no median, and painted median. In Thailand, most of these median types are used on roads with low traffic flow, such as in residential areas or urban streets. Therefore, when driving too close, there is a chance of rear-end collision. This is consistent with research conducted by Joon-Ki et al. [40] and Baldock et al. [41], who concluded that spacing on low-speed roads is a major cause of rear-end collisions. However, a study by Das and Abdel-Aty [30] indicated that median type had no effect on the frequency of rear-end collisions.
Overall policy and public relations, therefore, should promote the reduction of rear-end collisions in the following ways: driver training should place special emphasis on drivers under 21 years of age, focusing on driving at the legal speed limit and maintaining an appropriate distance from the vehicle in front. For drivers aged 21 years and older, it is important to pay special attention on roads with more than 10 lanes, and to take greater care at median openings on roads with fewer lanes. In other words, drivers should observe whether the car in front is executing a u-turn. Drivers aged 25 years or older,
involved in a rear-end crash. Out of a total of 4,554 vehicles, 2,096 were involved in those crashes.
The DT model consists of three components: decision nodes, branches, and leaf nodes. Within the DT structure, each decision node displays the variable, and each branch displays one variable value based on decision rules, while leaf nodes exhibit the expected values of target variables [34].
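The three components described above can be sketched as a small data structure. This is only an illustrative sketch: the variable name `person_age` appears in the paper, but the `lanes` field, the thresholds used in the toy tree, and all leaf probabilities are hypothetical values chosen for demonstration, not results of the fitted models.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf node: holds the expected share of the target class (e.g., at-fault)."""
    p_target: float


@dataclass
class Node:
    """Decision node: tests one variable; each branch covers one range of values."""
    variable: str
    threshold: float
    below: Union["Node", Leaf]        # branch taken when value < threshold
    at_or_above: Union["Node", Leaf]  # branch taken when value >= threshold


def predict(tree, record):
    """Follow the if-then rules from the root down to a leaf."""
    node = tree
    while isinstance(node, Node):
        if record[node.variable] < node.threshold:
            node = node.below
        else:
            node = node.at_or_above
    return node.p_target


# Hypothetical toy tree (illustrative numbers only, not the paper's results).
toy_tree = Node(
    variable="person_age", threshold=21,
    below=Leaf(0.60),
    at_or_above=Node(variable="lanes", threshold=10,
                     below=Leaf(0.45), at_or_above=Leaf(0.55)),
)

print(predict(toy_tree, {"person_age": 30, "lanes": 12}))
```

Each path from the root to a leaf reads as one if-then rule, which is why tree output lends itself to policy interpretation.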
SPSS was used to conduct the analysis. To create the DT, the full dataset was first placed in the root node, which shows the proportion of each value of the target variable. This was then split into a number of smaller subsets. Several SPSS growing methods can be used to carry out splitting and growing, including CHAID, CART (labelled CRT in SPSS), and QUEST. Each of these methods has advantages and disadvantages. This study chose CRT for two reasons. First, CRT performs binary node splitting, which is suitable for the interpretation of accident data analysis results [9]. Second, CRT can analyze the influence of variables. This research sought to find the relationship between target variables and other variables, expressed as the rank of each independent (predictor) variable according to its importance to the model [35]. A great deal of previous research has used CRT to analyze accident data [36][37][38], as CRT focuses on maximizing within-node homogeneity. The extent to which a node does not represent a homogeneous subset of cases is an indication of impurity [35].
Choosing the correct splitting criterion is also important. SPSS CRT offers two types of splitting, Gini and Twoing. Gini splitting, which is widely used, maximizes the homogeneity of child nodes with respect to the values of the dependent variable. Gini is based on squared probabilities of membership for each category of the dependent variable [35,36,39]. For CART acceptance, splitting was achieved by using unit misclassification costs. With unit costs, the misclassification risk is the proportion of cases whose predicted category differs from the observed one [29].
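The Gini criterion described above can be illustrated with a short sketch. It assumes the standard definition of Gini impurity, one minus the sum of squared category proportions, and measures a split by the reduction in size-weighted impurity. The function names and the toy labels are illustrative, and the code is not a reproduction of the SPSS implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared category proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(parent, left, right):
    """Impurity reduction achieved by a binary split of parent into left/right."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Toy example with hypothetical labels: a perfectly separating split.
parent = ["fatal", "fatal", "nonfatal", "nonfatal"]
print(gini(parent))                                  # impurity of the mixed node
print(gini_gain(parent, parent[:2], parent[2:]))     # gain of the clean split
```

A candidate split with a larger gain produces more homogeneous child nodes, which is the sense in which Gini splitting maximizes within-node homogeneity.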
To determine the optimal tree model, ten-fold cross-validation was undertaken, which is one of several cross-validation techniques for selecting an appropriate tree size. To avoid over-fitting the model, the maximum tree depth was five levels, the minimum number of cases in a parent node was 150, and the minimum number of cases in a child node was 75 [29].
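The growth limits and the ten-fold partition can be sketched as follows. The three limits (5, 150, 75) and the case count of 4,192 are taken from the text. Everything else is an assumption for illustration: the helper names, counting the root as depth 0, and the shuffling scheme are not taken from SPSS.

```python
import random

MAX_DEPTH = 5      # maximum tree depth stated in the text
MIN_PARENT = 150   # minimum cases in a parent node
MIN_CHILD = 75     # minimum cases in each child node


def may_split(depth, n_parent, n_left, n_right):
    """Return True if a node at `depth` (root = 0, an assumption) may be split."""
    return (depth < MAX_DEPTH
            and n_parent >= MIN_PARENT
            and n_left >= MIN_CHILD
            and n_right >= MIN_CHILD)


def k_fold_indices(n_cases, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Each of the 4,192 Model#1 cases is held out exactly once across the ten folds.
for train, test in k_fold_indices(4192, k=10):
    assert len(train) + len(test) == 4192
```

In each fold the tree is grown on the training indices and scored on the held-out indices, and the average held-out error guides the choice of tree size.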

Results and Discussion
According to the CART results for the two models, when considering misclassification costs for predictive accuracy (Table 5), Model#1 had an overall correctness of 52.9% and Model#2 of 65.1%. Despite these low values, as confirmed by Khan et al. [29] and Kashani and Mohaymany [36], they can be accepted and interpreted.
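Overall correctness of this kind is typically read from a classification table. The sketch below shows the calculation under unit misclassification costs. The counts are hypothetical placeholders, not the values in Table 5, and the function names are illustrative.

```python
def overall_correctness(confusion):
    """Share of cases whose predicted category equals the observed one.

    `confusion[observed][predicted]` holds case counts.
    """
    correct = sum(row.get(obs, 0) for obs, row in confusion.items())
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total


def misclassification_risk(confusion):
    """With unit costs, risk is simply one minus overall correctness."""
    return 1.0 - overall_correctness(confusion)


# Hypothetical classification table (not the paper's data).
table = {
    "at_fault":     {"at_fault": 60, "not_at_fault": 40},
    "not_at_fault": {"at_fault": 45, "not_at_fault": 55},
}
print(overall_correctness(table))   # (60 + 55) / 200
```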

Model#1.
Model#1 (Figure 3) found six major variables related to the target variables. The most significant variable is driver's age (person_age). Drivers aged less than 21 years were at-fault drivers in 57.3% of accidents. This may be because younger drivers are less careful. Ma and Yan [5] and Chandraratna and Stamatiadis [7] found that young drivers are more likely to be at fault than middle-aged drivers. Those aged over 21 years were at-fault in only 48.9% of accidents. The significant variable was
with a minimum of 2-8 traffic lanes on which a large number of trucks are parked, the chances of rear-end crashes are high. Moreover, the second variable, vehicle type (Veh_type), shows that large cars and trucks with six wheels or more result in 39.7% of deaths. This is relevant to the findings of Chen et al. [21] and Chang and Chien [39], who found that the chances of fatality while decelerating and going straight were 53.1% (60/113 crashes). Large vehicles which hit small vehicles on 2-4 lane roads have a high chance of fatality due to the vehicle body size factor [24]. With regard to other crash types, the stopped crash type has a 33% chance of fatality (80/240 crashes). In other words, rear-end crashes occurring when the front car is stopped have a high fatality rate. With regard to medium and small vehicles, the chances of fatal crashes are high when the driver is aged more than 36 years (31.4%).
For drivers who use seatbelts, the second variable of raised and flush median led to a higher chance of fatality than other median types, as these two types exist in areas of low-speed driving. If drivers violate the rules, the chances of rear-end
should take special care on roads with no median, with a raised median, or with a depressed median, and they should maintain a greater distance from the car in front.

Model#2.
The results of Model#2 (Figure 4) reveal 14 variables essential to fatal/nonfatal crashes. The most significant variable was safety equipment (SafertEqui), such as seatbelts or helmets. Those who did not use safety equipment had a 29.5% risk of dying in a rear-end collision. This is consistent with other research which has found that the use of safety equipment can reduce accident severity [21,41]. The next most significant variable was visibility, with a rear-end crash at night with no light having a 49% risk of fatality. This result supports findings by Sullivan and Flannagan [11] and Chen et al. [21]. Low-light driving leads to rear-end crashes against cars parked along roadsides. In addition, a lower volume of night-time traffic leads to drivers driving at higher speeds, which, in turn, causes a greater number of fatalities due to high velocity at impact. In the case of sufficient light (in the daytime and at night with light), the variable of roads
The factors acquired from this analysis can be used to develop transportation office and rural road office policy and public relations practices, in order to reduce the number and severity of rear-end collisions.
It is recommended that future research use parametric and nonparametric analysis to compare these factors in order to better understand the factors affecting crashes. In addition, a further investigation of how lane numbers, median type, and median opening affect the number of rear-end collisions and fatal crashes is called for, as these three variables were important in both models.
collisions will be very high. For example, roads with a flush median type usually have no auxiliary lane to separate turning cars. Therefore, if a speeding car comes from behind, the resulting rear-end crash will be severe. This is consistent with the second variable, median opening, where there is a 48.8% probability of death. For other median types, two to four traffic lanes had 16.2% fatal rear-end crashes. With regard to the leaf node, envi_light was found to be in accordance with Chen et al. [22], who found that collisions occurring at night, both with light and with no light, have a greater chance of fatality than collisions occurring during the daytime.
Policy recommendations to reduce fatalities from rear-end collisions are as follows: promoting awareness of seatbelt use by focusing on the driving license test, and increasing the strictness of law enforcement. For light conditions affecting visibility, drivers must be made aware of the danger of driving on roads with no lights, especially at night. Relevant authorities should consider increasing light installation on roads where the risk of rear-end collision is high. With regard to vehicle type, truckers must increase their awareness of the danger of parking their vehicles on roads with a high risk of rear-end collision, such as where there are no parking lanes and no light. In other words, the relevant departments, such as the DOH, should consider setting up illuminated roadside rest stops for trucks.
The first variable is the small number of lanes (2-4 traffic lanes), which is common in Thailand. The results of the models differed. Model#1 found fewer at-fault drivers in cases of a small number of traffic lanes, while Model#2 found a high chance of fatalities. Future research should analyze this issue with regard to how different numbers of traffic lanes affect the frequency and severity of rear-end collisions. Another variable which was significant in both models was median type. Barrier and depressed median types result in a small number of rear-end collisions and a low fatal crash rate. Therefore, when subordinate units of the DOH make road improvements, these two median types should be considered. With regard to median opening points, both models found that rear-end collisions occurring at the median opening had a high incidence of at-fault drivers and caused a high proportion of fatal crashes, as the front vehicle decelerated or executed a u-turn. In these conditions, there is a high probability of a rear-end collision. In the case of fatal crashes, if the following vehicle has not seen the turn signal, a serious rear-end collision will occur.

Conclusion
This research sought to explore two issues related to rear-end crashes. First, to find the factors which increase the number of rear-end collisions. This was achieved by focusing on the driver and environmental characteristics that cause rear-end collisions. Second, to find the factors causing fatal rear-end collisions. Using highway rear-end collision data from 2011 to 2015, nonparametric analysis was conducted on the significance of other variables which affect target variables, using

Discussion of the Two Models.
Considering the overall picture of the two models, similar variables result in frequent rear-end collisions and fatalities.
[Figure: Comparison of the two models (at-fault/not-at-fault; fatal injury).]
Journal of Advanced Transportation
an overview of factors, including drivers, the driving environment, and physical road characteristics. The model results were found to be able to predict rear-end collisions and fatalities with acceptable accuracy. The factors can contribute to a reduction in the number of at-fault drivers, and a reduction in the fatality rate of rear-end collisions.