Comparative Analysis of the Reported Animal-Vehicle Collisions Data and Carcass Removal Data for Hotspot Identification

Transportation management agencies usually record two types of animal-vehicle collision data: reported animal-vehicle collision (AVC) data and carcass removal data. Previous studies have found that these two datasets often exhibit different characteristics. To identify higher-risk animal-vehicle collision sites accurately, this study compared hotspot identification results and the effects of explanatory variables between carcass removal data and reported AVC data. To this end, both the Negative Binomial (NB) model and the generalized Negative Binomial (GNB) model were applied to calculate Empirical Bayesian (EB) estimates using animal collision data collected on ten highways in Washington State. The important findings can be summarized as follows. (1) The explanatory variables have different effects on the occurrence of carcass removals and reported AVCs. (2) The ranking results from EB estimates differ significantly between carcass removal data and reported AVC data. (3) The results of hotspot identification differ between carcass removal data and reported AVC data. However, the ranking results of the GNB models are more consistent than those of the NB models. Thus, transportation management agencies should be cautious when using either carcass removal data or reported AVC data to identify hotspots.


Introduction
Animal-vehicle collisions (AVCs) have long been an active research topic. Van der Ree et al. [1] indicated that mortality from AVCs is a major concern across most developed countries and is expected to become more serious in developing countries over the next few decades. It was estimated that the number of AVCs per year exceeded 1 million in the 1990s [2]. AVCs cause about 155-211 deaths, 13,713-29,000 injuries, and 1 billion dollars in property loss per year [2][3][4][5]. Records from the NHTSA Fatality Analysis Reporting System (FARS) suggest that the average number of fatal AVCs has been increasing year by year [4]. Previous studies found that the number of wild animals decreased significantly due to AVCs [6][7][8], and that billions of wild animals die annually in collisions with vehicles and other transportation modes [9,10].
Most previous AVC studies considered either the reported AVC data or the carcass removal data. For the carcass removal data, Gkritza et al. [31] evaluated the effect of deterministic factors on the frequency and severity of AVCs using Poisson and NB regression models. For the reported AVC data, stepwise logistic regression models were used to identify important factors and high-risk collision locations [32][33][34]. Seiler [35] used a multiple logistic regression model with reported AVC data to predict collisions at nonincident control points. Lao et al. [36] found that a carcass found on the road is likely to have been caused by a collision with a vehicle. However, many previous studies found that the number of carcass removals differs from the number of reported AVCs [37][38][39]. The discrepancy between the two AVC data sources can be explained as follows. First, not all wild animals involved in AVCs die. Second, not all carcasses are reported through the media.
Meanwhile, several researchers have focused on the differences and the relationship between the two datasets. A fuzzy logic-based mapping algorithm was used to merge the two incomplete datasets [36]. Lao et al. [40] developed a diagonal inflated bivariate Poisson regression model to consider the two datasets simultaneously. To predict AVC risk, Visintin et al. [41] proposed a model that considers two types of factors: vehicles and animals.
However, few studies have compared the hotspot identification results obtained from the carcass removal data and the reported AVC data. Thus, the primary objective of this paper is to examine the differences in hotspot identification and in the effects of the explanatory variables between the carcass removal data and the reported AVC data. To this end, both the traditional NB model and the generalized Negative Binomial (GNB) model are applied to calculate the EB estimates. The dispersion parameter of the NB model is fixed, while the GNB model assumes that the dispersion parameter varies from site to site. This study analysed crash data collected on ten highways in Washington State.
The rest of the paper is organized as follows. The second section introduces the EB method based on the NB model and the GNB model. The third section provides the data description and preliminary data analysis. The following section presents the model results and compares the reported AVC data and the carcass removal data using the EB method based on the NB model and the GNB model. Finally, the model results are discussed and summarized.

Materials and Methods
The following two sections introduce Negative Binomial model based and generalized Negative Binomial model based EB methods, respectively.

Negative Binomial Model Based Empirical Bayesian Method. The EB estimate for a site combines two parts: the number of crashes predicted from similar sites and the number of crashes observed at the site. The prediction is usually based on safety performance functions (SPFs), which commonly assume that the crash counts follow a given probability distribution. To date, the NB model is the most popular approach for calculating EB estimates, and the weight factor is determined by the dispersion parameter of the NB model. The NB model has the following structure. The number of crashes during a specific time period is first assumed to follow a Poisson distribution, which is defined by

$$f(y;\mu) = \frac{e^{-\mu}\,\mu^{y}}{y!}$$

where $y$ = response variable (number of crashes) and $\mu$ = mean response of the observation.
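If the Poisson rate is assumed to be gamma distributed, the response variable follows an NB distribution. Thus, the NB distribution can be seen as a mixture of Poisson distributions. Hilbe [42] illustrated the whole derivation of the NB model. The probability mass function of the NB distribution is defined below:

$$f(y;\mu,\alpha) = \frac{\Gamma\left(y+\alpha^{-1}\right)}{\Gamma\left(\alpha^{-1}\right)\Gamma(y+1)}\left(\frac{\alpha^{-1}}{\alpha^{-1}+\mu}\right)^{\alpha^{-1}}\left(\frac{\mu}{\alpha^{-1}+\mu}\right)^{y}$$

where $y$ = response variable; $\mu$ = mean of the observation; and $\alpha$ = dispersion parameter.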
Compared to the Poisson distribution, the NB distribution is appropriate for handling overdispersion (that is, a variance larger than the mean). For $y = 0, 1, 2, \ldots$, the mean of $y$ is $E[y] = \mu$ and the variance is $\mathrm{Var}(y) = \mu + \alpha\mu^{2}$. If $\alpha \to 0$, the variance equals the mean and the NB distribution converges to the Poisson distribution.
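The mean and variance properties stated above can be checked numerically. The following is a minimal sketch, assuming the standard NB parameterization with mean $\mu$ and dispersion $\alpha$ (variance $\mu + \alpha\mu^{2}$); the function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def nb_pmf(y: int, mu: float, alpha: float) -> float:
    """NB probability of observing y crashes, given mean mu and dispersion alpha."""
    r = 1.0 / alpha  # inverse dispersion
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def nb_moments(mu: float, alpha: float, y_max: int = 500):
    """Mean and variance obtained by summing the pmf over 0..y_max."""
    probs = [nb_pmf(y, mu, alpha) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Hypothetical parameter values, not estimates from the Washington data.
mu, alpha = 1.5, 0.8
nb_mean, nb_var = nb_moments(mu, alpha)
print(nb_mean, nb_var, mu + alpha * mu ** 2)

# With a very small alpha, P(y = 0) approaches the Poisson value exp(-mu).
print(nb_pmf(0, mu, 1e-6), math.exp(-mu))
```

The numerically computed variance should agree with $\mu + \alpha\mu^{2}$, and the last line illustrates the convergence to the Poisson distribution as $\alpha \to 0$.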
The dispersion parameter $\alpha$ of the NB model plays an important role in calculating the EB estimates. The EB method was proposed by Hauer (1992) [43] to calculate the long-term mean for site $i$, and it is given by

$$\tilde{\mu}_i = w_i\,\hat{\mu}_i + (1 - w_i)\,y_i, \qquad w_i = \frac{1}{1 + \alpha\,\hat{\mu}_i}$$

where $\tilde{\mu}_i$ = predicted number of crashes per year for site $i$ estimated by the EB method; $\hat{\mu}_i$ = predicted number of crashes per year for site $i$ expected by the SPF; $w_i$ = weight factor defined as a function of $\hat{\mu}_i$ and the dispersion parameter $\alpha$; and $y_i$ = observed number of crashes per year at site $i$.
Generalized Negative Binomial Model Based Empirical Bayesian Method. Traditionally, NB models assume a fixed dispersion parameter $\alpha$ (i.e., all sites share the same dispersion parameter), which is then used to calculate the EB estimates. However, in recent years, some studies have found that the dispersion parameter $\alpha$ is related to the explanatory variables. They also found that the GNB model provides a better statistical fit and describes the dispersion better [25,44]. That is to say, a varying dispersion parameter affects the EB estimates and may potentially improve them [45]. For the GNB model, the difference in the EB estimates between the carcass removal data and the reported AVC data is examined in this section.
When estimating the EB value, the weight factor is influenced by the functional form selected for the dispersion parameter. As discussed in a previous study [46], we considered several functional forms to calculate the dispersion parameter $\alpha_i$. These functional forms are referred to as Models 1 to 3, where $\alpha_i$ = the dispersion parameter at segment $i$; $L_i$ = the segment length in miles for segment $i$; and $\gamma = (\gamma_0, \gamma_1)'$ = coefficients to be estimated.
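The EB calculation with a fixed or a site-specific dispersion parameter can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes the standard EB weight $w_i = 1/(1 + \alpha_i\hat{\mu}_i)$, and the log-linear dispersion function `gnb_dispersion` is a hypothetical stand-in for the functional forms of Models 1 to 3, which may differ. All numeric values are made up.

```python
from math import exp

def eb_weight(mu_hat: float, alpha: float) -> float:
    """Weight placed on the SPF prediction (assumed standard EB weight)."""
    return 1.0 / (1.0 + alpha * mu_hat)

def eb_estimate(mu_hat: float, y_obs: float, alpha: float) -> float:
    """EB estimate: weighted average of the SPF prediction and the observed count."""
    w = eb_weight(mu_hat, alpha)
    return w * mu_hat + (1.0 - w) * y_obs

def gnb_dispersion(length_miles: float, g0: float, g1: float) -> float:
    """Hypothetical log-linear dispersion form; the paper's Models 1-3 may differ."""
    return exp(g0 + g1 * length_miles)

# Hypothetical segments: (SPF prediction, observed count, segment length in miles).
segments = [(2.0, 5.0, 0.4), (0.5, 0.0, 1.2), (1.2, 3.0, 0.1)]

# NB-based EB: every segment shares one dispersion parameter.
alpha_fixed = 0.5
eb_nb = [eb_estimate(m, y, alpha_fixed) for m, y, _ in segments]

# GNB-based EB: the dispersion parameter varies with segment length.
g0, g1 = -0.2, -0.8  # hypothetical coefficients
eb_gnb = [eb_estimate(m, y, gnb_dispersion(length, g0, g1)) for m, y, length in segments]

print(eb_nb)
print(eb_gnb)
```

Passing a scalar dispersion reproduces the NB-based estimate, while passing a per-segment value gives the GNB-based one, so both variants share the same weighting logic.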

Data Description and Preliminary Data Analysis
The collision dataset used in this study was collected on ten highways (I90, US2, SR8, SR20, US97, US101, US395, SR525, US12, and SR970) in Washington State. The dataset includes reported AVC data and carcass removal data over a five-year period from 2002 to 2006 [40]. In this study, 10,475 road segments were chosen as the research targets; that is, the sample size is 10,475. The highways were divided into road segments of different lengths according to specific road characteristics (i.e., median width, lane width, and shoulder type). The carcass removal dataset was gathered from the maintenance files recorded by WSDOT maintenance workers. However, compared with the actual number of collisions, both datasets are underreported, for the following reasons. (1) For the reported AVC dataset, a collision is recorded only when its cost exceeds a threshold. Moreover, due to human factors, not every collision is reported to police officers. (2) For the carcass removal dataset, some carcasses are hidden in roadside facilities and are difficult to find. In addition, not every carcass is removed by professional maintenance workers. Thus, although the two datasets overlap to some extent, there is a large discrepancy between them, as shown in Table 2.
Table 3 provides the summary statistics of characteristics for reported AVCs and carcasses in the Washington data. The reported AVC and carcass removal datasets clearly differ significantly, and the number of carcass removal records is typically larger than the number of reported AVCs. Table 3 also describes the explanatory variables used in the models. Some variables (e.g., restrictive access control, rural or urban, and terrain type) are binary variables. The percentage of binary variables is 43.75%. Figure 1 describes the distribution of binary variables in the defined segments.
Annual average daily traffic (AADT) is scaled into thousands of vehicles. The model also takes three animal habitats into consideration: white-tailed deer habitat, mule deer habitat, and elk habitat, since deer and elk are the most common animals in AVC research in Washington State.
Table 3 also shows the mean, maximum, minimum, and standard deviation (SD) of each variable, and the distribution of zero observations is provided in Table 4. It can be observed from Tables 3 and 4 that both the number of reported AVCs and the number of carcasses per highway segment are overdispersed.
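A quick way to screen count data for overdispersion is to compare the sample variance with the sample mean and to look at the share of zero observations. The sketch below uses a short, made-up list of segment counts purely for illustration; it does not use the Washington data, and the function name is hypothetical.

```python
from statistics import mean, pvariance

def dispersion_summary(counts):
    """Summary used to screen count data for overdispersion and excess zeros."""
    m = mean(counts)
    v = pvariance(counts)
    return {
        "mean": m,
        "variance": v,
        "variance_to_mean": v / m,  # values well above 1 suggest overdispersion
        "zero_share": counts.count(0) / len(counts),
    }

# Hypothetical counts per road segment (not taken from the Washington data).
counts = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 7, 0, 1, 0, 0]
summary = dispersion_summary(counts)
print(summary)
```

A variance-to-mean ratio well above 1 is the pattern that motivates an NB model over a Poisson model.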

Results and Discussion
The modelling results for the carcass removal data and the reported AVC data are provided in this section, which is divided into two parts. In the first part, the NB model with a fixed dispersion parameter is used to compare the carcass removal data and the reported AVC data when estimating the number of crashes for a specific site. In the second part, the difference is analysed using the GNB model (i.e., the NB model with a varying dispersion parameter).

Comparison of the Reported AVC and the Carcass Removal Data Using the NB Model with Fixed Dispersion Parameter. In the NB model, the mean is specified as a function of the explanatory variables described in Table 3. Tables 5 and 6 show the NB modelling results after removing insignificant variables for carcass removal and reported AVCs, respectively. For the carcass removal data, the insignificant variables are truck percentage, terrain type of mountain, and right shoulder width. For the reported AVCs, however, terrain type of rolling, right shoulder width, and rural or urban type are insignificant. These insignificant variables should therefore be eliminated when obtaining the EB estimates from the NB model. On the other hand, restrictive access control is the most significant variable for reported AVCs, while white-tailed deer habitat is the most significant variable for carcass removal.
AADT, speed limit, left shoulder width, white-tailed deer habitat, elk habitat, and mule deer habitat have a positive effect on the likelihood of reported AVCs. AADT, speed limit, terrain type of rolling, lane width, left shoulder width, white-tailed deer habitat, and elk habitat have a positive effect on the likelihood of carcass removal.
AADT is found to increase the likelihood of both carcass removal and reported AVCs. The exposure between traffic and animals is the main cause of animal-vehicle collisions. As mentioned above, AADT is more significant for reported AVCs than for carcass removal. A possible explanation is that, under heavier traffic flow, an AVC is more likely to be reported promptly because more people notice its occurrence. The coefficient of speed limit is positive, and it is less significant for reported AVCs than for carcass removal. Under a higher speed limit, drivers tend to choose a higher speed and need a longer stopping distance. Thus, a driver is less likely to stop within a safe distance to avoid a collision with an animal. As the truck percentage increases, the number of collisions decreases. First, a traveling truck may generate enough noise to drive away the surrounding animals. Second, compared with smaller cars, trucks offer a wider field of view. Moreover, drivers are likely to drive more carefully when more trucks are present. Consequently, the number of crashes declines. Restrictive access control has a decreasing effect on the likelihood of both carcass removal and reported AVCs. The number of crashes may be smaller on roads with restrictive access control because it limits animal activity and makes it difficult for animals to cross the road. Moreover, it is of greater significance for reported AVCs than for carcass removal.
The total number of lanes is found to decrease the likelihood of both carcass removal and reported AVCs, and it is more significant for carcass removal than for reported AVCs. This may be because a road with more lanes is more difficult for animals to cross. In addition, the more lanes there are, the easier it is to find carcasses. Left shoulder width is found to increase the likelihood of both carcass removal and reported AVCs, and its effect is similar for the two datasets. White-tailed deer habitat and elk habitat both have increasing effects on the likelihood of carcass removal and reported AVCs. This finding demonstrates that collisions are more likely at sites with more animals. In addition, white-tailed deer habitat is less significant for reported AVCs than for carcass removal, while elk habitat is of greater significance for reported AVCs than for carcass removal. Terrain type of rolling has an increasing effect on carcass removal, while terrain type of mountain has a decreasing effect on reported AVCs. Two explanations are possible: there are more animals in rolling or mountainous areas than in level terrain, and collision locations in rolling or mountainous areas are likely to be hidden and difficult to find [47]. Rural or urban type has a decreasing effect on carcass removal. This may be because carcasses in urban areas are more likely to be found. Median width decreases the likelihood of both carcass removal and reported AVCs, and its effect is similar for the two datasets. With a wider median, animal activity is limited and the likelihood of crashes on the road may decrease.
Figure 2 shows the comparison of EB estimates between carcass removal and reported AVCs. As demonstrated in Figures 2(a) and 2(b), the expected number of collisions and the weight factor show a similar association pattern for the carcass removal data and the reported AVC data. The expected number of collisions is inversely related to the weight factor; that is, a smaller weight factor corresponds to a larger expected number of collisions. In addition, since the dispersion parameter $\alpha$ estimated from the carcass removal data is greater than that estimated from the reported AVC data, the weight factor for the reported AVC data is generally larger than the weight factor for the carcass removal data. Consequently, the EB estimates from the carcass removal data put more weight on the observed counts than the EB estimates from the reported AVC data do.
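The effect of the dispersion parameter on the weight factor can be illustrated with a short worked calculation. It assumes the standard EB weight $w = 1/(1+\alpha\hat{\mu})$ and uses hypothetical values ($\hat{\mu} = 4$, observed count $y = 1$, and two illustrative dispersion values), not estimates from the paper.

```latex
% Smaller dispersion (alpha = 0.25): more weight on the SPF prediction
w = \frac{1}{1 + 0.25 \times 4} = 0.50, \qquad
\tilde{\mu} = 0.50 \times 4 + (1 - 0.50) \times 1 = 2.5

% Larger dispersion (alpha = 1): more weight on the observed count
w = \frac{1}{1 + 1 \times 4} = 0.20, \qquad
\tilde{\mu} = 0.20 \times 4 + (1 - 0.20) \times 1 = 1.6
```

With the larger dispersion parameter, the estimate moves closer to the observed count, which is the pattern described above for the carcass removal data.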
Figure 3 shows the difference in EB estimate ranking results between the carcass removal data and the reported AVC data. A smaller rank value under either dataset means that the site is more dangerous. If the carcass removal data and the reported AVC data had a similar effect on predicting the number of crashes, the points in Figure 3 would be concentrated around the red line (i.e., y = x). The figure shows clearly that the hotspots identified with the carcass removal data and with the reported AVC data are very different. Further ranking comparison results are provided in Table 7, which shows that the rankings differ significantly. For example, 49.53% of the sites have a ranking difference beyond 1,000 positions between the carcass removal data and the reported AVC data (note that the number of road segments in the Washington data is 10,475). On the whole, the type of data influences the identification of dangerous sites.
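The ranking comparison described above could be computed along the following lines. This is a minimal sketch with made-up EB estimates for five sites; the function names are hypothetical, rank 1 is assigned to the largest EB estimate, and ties are broken by site index, which is an assumption rather than something stated in the paper.

```python
def rank_sites(eb_values):
    """Return the rank of each site (1 = largest EB estimate, i.e. most dangerous)."""
    order = sorted(range(len(eb_values)), key=lambda k: eb_values[k], reverse=True)
    ranks = [0] * len(eb_values)
    for position, k in enumerate(order, start=1):
        ranks[k] = position
    return ranks

def share_rank_diff_beyond(ranks_a, ranks_b, threshold):
    """Share of sites whose two ranks differ by more than `threshold` positions."""
    beyond = sum(abs(a - b) > threshold for a, b in zip(ranks_a, ranks_b))
    return beyond / len(ranks_a)

# Hypothetical EB estimates for five sites under each dataset.
eb_carcass = [3.2, 0.4, 1.1, 2.5, 0.9]
eb_reported = [0.8, 0.3, 2.0, 1.5, 0.1]

ranks_carcass = rank_sites(eb_carcass)
ranks_reported = rank_sites(eb_reported)
print(ranks_carcass, ranks_reported)
print(share_rank_diff_beyond(ranks_carcass, ranks_reported, threshold=1))
```

For the full dataset, the same function with a threshold of 1,000 positions would give the kind of percentage reported in Table 7.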
In order to further measure the differences between the carcass removal and the reported AVC data in identifying the hotspot, two evaluation tests are used, which are similar to the tests proposed by Cheng and Washington [48].
Test I. Data Consistency Test. The number of sites identified as hotspots by both the carcass removal data and the reported AVC data is used to evaluate the consistency of the two types of data. This number is defined as

$$T_{I} = \left|\{k_1, k_2, \ldots, k_{cn}\}_{\text{carcass}} \cap \{k_1, k_2, \ldots, k_{cn}\}_{\text{reported}}\right|$$

where $T_{I}$ is the number of the same sites identified as hotspots using the carcass removal data and the reported AVC data; $\{k_1, k_2, \ldots, k_{cn}\}$ is the set of the $cn$ top-ranked sites under the given dataset; $n$ is the total number of sites; $c$ is the threshold of hotspots; and $k$ is the site ID.
The test compares the two types of AVC datasets, and we consider three cases in terms of the number of hotspots selected. The three cases correspond to considering 1%, 5%, and 10% of all sites as hotspots (i.e., c = 0.01, 0.05, and 0.10). For example, in this study, when c = 0.01, approximately 105 sites (i.e., about 1% of the 10,475 sites) are considered hotspots.
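Test I can be sketched in a few lines once each site has a rank under both datasets. The example below is illustrative only: the ranks are made up for ten sites, the function names are hypothetical, and rounding $c \times n$ to the nearest integer is an assumption about how the hotspot count is obtained.

```python
def hotspot_set(ranks, c):
    """Site IDs whose rank falls within the top c share of all sites."""
    n_hot = max(1, round(c * len(ranks)))  # assumed rounding rule
    return {k for k, r in enumerate(ranks) if r <= n_hot}

def consistency_test(ranks_carcass, ranks_reported, c):
    """Test I: number and share of sites flagged as hotspots by both datasets."""
    hot_carcass = hotspot_set(ranks_carcass, c)
    hot_reported = hotspot_set(ranks_reported, c)
    common = len(hot_carcass & hot_reported)
    return common, common / len(hot_carcass)

# Hypothetical ranks for ten sites (1 = most dangerous).
ranks_carcass = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ranks_reported = [2, 1, 7, 3, 5, 4, 6, 8, 10, 9]

common, share = consistency_test(ranks_carcass, ranks_reported, c=0.3)
print(common, share)
```

A share close to 100% would indicate that the two datasets flag nearly the same sites.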
Table 8 shows the results of Test I for the EB estimate rankings from the carcass removal data and the reported AVC data. If the EB estimates from the two datasets yielded similar hotspot identification (HSID) results, the number of common hotspots would be close to the threshold number and the percentage would approach 100% (note that the number of road segments in the Washington data is 10,475). The table shows clearly that the hotspots identified with the carcass removal data and with the reported AVC data differ significantly.
Test II. Total Rank Differences Test. Taking the ranking difference into account, Test II calculates the total ranking difference of the hotspots identified using the carcass removal data and the reported AVC data. Note that only $c \times n$ hotspots are considered. The test statistic for Test II is

$$T_{II} = \sum_{k=1}^{cn} \left| R\left(k_{\text{carcass}}\right) - R\left(k_{\text{reported}}\right) \right|$$

where $T_{II}$ is the total test statistic; $R(k_{\text{carcass}})$ is the rank of site $k$ obtained using the carcass removal data; $R(k_{\text{reported}})$ is the rank of site $k$ obtained using the reported AVC data; and $k$ is the site ID.
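The Test II statistic described above could be computed as in the following sketch. It assumes that the hotspots are first selected with one dataset (the reference) and that the absolute rank differences are then summed over those sites; the ranks are made up for ten sites, and the function name and the rounding of $c \times n$ are assumptions.

```python
def total_rank_difference(ranks_ref, ranks_other, c):
    """Test II: sum of |rank difference| over the hotspots chosen by ranks_ref."""
    n_hot = max(1, round(c * len(ranks_ref)))  # assumed rounding rule
    return sum(
        abs(r - ranks_other[k])
        for k, r in enumerate(ranks_ref)
        if r <= n_hot
    )

# Hypothetical ranks for ten sites (1 = most dangerous).
ranks_carcass = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ranks_reported = [2, 1, 7, 3, 5, 4, 6, 8, 10, 9]

# Hotspots selected with the carcass removal ranking, then with the reported AVC ranking.
t2_carcass = total_rank_difference(ranks_carcass, ranks_reported, c=0.3)
t2_reported = total_rank_difference(ranks_reported, ranks_carcass, c=0.3)
print(t2_carcass, t2_reported)
```

Because the hotspot set depends on which dataset is used as the reference, the statistic generally takes two different values, one per dataset.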
Table 9: The total ranking difference of the hotspots identified using the carcass removal data and the reported AVCs data.

| Threshold level c | Sum |
| --- | --- |
| c = 1% using the carcass removal data | 67,875 |
| c = 1% using the reported AVCs data | 51,543 |
| c = 5% using the carcass removal data | 406,430 |
| c = 5% using the reported AVCs data | 407,258 |
| c = 10% using the carcass removal data | 989,728 |
| c = 10% using the reported AVCs data | 797,760 |

Note. There are 10,475 road segments in the Washington data.
The total ranking difference of the hotspots identified using the carcass removal data and the reported AVC data for the different threshold levels c is provided in Table 9. For example, the sum of the differences in ranks reaches 989,728 at the 10% threshold level when the carcass removal data are used to identify the hotspots. The sum of the differences in ranks using the carcass removal data is larger than that using the reported AVC data when c = 0.01 and c = 0.10. When c = 0.05, there is only a slight difference between the two datasets. On the whole, the analysis in this part indicates that the result will be one-sided and inaccurate if the hotspots are identified using only the carcass removal data or only the reported AVC data. As a result, the type of data influences the identification of dangerous sites.

Comparison of the Carcass Removal and the Reported AVC by EB Method Based on the GNB Model. With the approach described in the previous sections, the two datasets were analysed using the GNB-based EB method with three models (i.e., Equations (4)-(6)). Tables 10 and 11 show the modelling results after removing insignificant variables for the GNB model with the carcass removal data and the reported AVC data, respectively. As can be seen from Tables 10 and 11, Model 3 outperforms the other two models, since it has the lowest Akaike information criterion (AIC) and Bayesian information criterion (BIC) values.
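Model selection by AIC and BIC can be sketched as follows, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The log-likelihoods and parameter counts below are invented for illustration and are not the values in Tables 10 and 11; the candidate names are placeholders.

```python
import math

def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical fits: name -> (maximized log-likelihood, number of parameters).
candidates = {
    "form_a": (-5120.4, 12),
    "form_b": (-5098.7, 12),
    "form_c": (-5080.2, 13),
}
n_segments = 10475  # number of road segments in the Washington data

scores = {
    name: (aic(ll, k), bic(ll, k, n_segments))
    for name, (ll, k) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(scores)
print(best_by_aic, best_by_bic)
```

Lower values are preferred under both criteria; BIC penalizes additional parameters more heavily than AIC when the sample is large.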
The GNB-based EB estimates for carcass removal and reported AVCs are also compared. As observed in Figures 4(a) and 4(b), the two datasets show similar associations between the modelling values and the weight factor; in other words, the shapes of the scatter distributions in the figure are approximately similar. When the modelling value ($\hat{\mu}_i$) is fixed, the weight factor for the carcass removal data is lower than that for the reported AVCs. Thus, a varying dispersion parameter influences the weight factor of the crash prediction model. In addition, adding a varying dispersion parameter to the NB model has a similar influence on both the reported AVC data and the carcass removal data.
Figure 5 presents the comparison of hotspot identification results between the reported AVC data and the carcass removal data by the EB method based on the GNB model. The depth of the color in the figure represents the density of the points. As shown in Figure 5, the link between the two kinds of data based on the GNB model is more positive than that based on the NB model. The results in Table 12 indicate that there is a significant difference in the ranking between the reported AVCs and the carcass removal. Overall, the gap between the two kinds of data narrows when the GNB model is used.
Similar to the NB model, the two tests described above are used to measure the differences between the carcass removal data and the reported AVC data in identifying hotspots. Table 13 presents the comparison results of hotspot identification between the reported AVC data and the carcass removal data by the EB method based on the GNB model. As shown in Table 13, the percentage based on the GNB model is greater than that based on the NB model; that is, the link between the two kinds of data based on the GNB model is more positive. Moreover, the comparison results provided in Table 14 indicate that there is a significant difference in the ranking between the reported AVCs and the carcass removal. Furthermore, the sum of the differences in ranks using the carcass removal data is different from that using the reported AVC data. Overall, the gap between the two kinds of data narrows when the GNB model is used.

Table 14: The sum of difference in ranks over all identified sites for threshold level c using the carcass removal data and the reported AVC data using the GNB-based EB estimates.

| Threshold level c | Sum |
| --- | --- |
| c = 1% using the carcass removal data | 2,335 |
| c = 1% using the reported AVCs data | 1,426 |
| c = 5% using the carcass removal data | 45,029 |
| c = 5% using the reported AVCs data | 20,914 |
| c = 10% using the carcass removal data | 218,068 |
| c = 10% using the reported AVCs data | 88,179 |

Note. There are 10,475 road segments in the Washington data.

Conclusions
This paper has examined the difference between the reported AVCs data and the carcass removal data in identifying hotspots and the influence of explanatory variables.To accomplish the objectives of this study, the EB method based on the NB model and GNB model, separately, is used to model the animal crash data collected in Washington State.
The important conclusions can be summarized as follows.
(1) Some explanatory variables have different effects on the occurrence of carcass removal data and reported AVC data.
(2) Based on the modelling results from the NB and GNB models, the ranking results from EB estimates differ significantly between the carcass removal data and the reported AVC data.
(3) The results of hotspot identification are significantly different between the carcass removal data and the reported AVC data. However, the ranking results with GNB models are relatively more consistent than those of NB models.
Thus, transportation management agencies should be cautious when analysing the carcass removal data or the reported AVC data to identify AVC-prone sites.
In this study, the EB method based on the NB model and the GNB model was applied to compare the HSID results obtained from the carcass removal data and the reported AVC data collected on ten highways in Washington State. In the future, AVC datasets with more variables (e.g., road classification) from other sites will be collected to validate the findings of this study. In addition, spatial models should also be developed to analyse the carcass removal data and the reported AVC data [49].
In the EB estimator, $w_i$ = weight factor defined as a function of $\hat{\mu}_i$ and the dispersion parameter $\alpha$; and $y_i$ = observed number of crashes per year at site $i$.

Figure 1: The distribution of binary variables in the defined segments.

Figure 2: Weight factors produced by NB model. (b) Associations of weight factors between carcass removal and reported AVCs (axes: weight factor for carcass removal and weight factor for reported AVCs).

Figure 3: Comparison in HSID ranking by the carcass removal and the reported AVC using NB model.

Figure 4: Weight factors produced by GNB model. (b) Associations of weight factors between carcass removal and reported AVCs (axes: weight factor for carcass removal and weight factor for reported AVCs).

Figure 5: Comparison in HSID ranking by the carcass removal and the reported AVC using GNB model.

Table 1: Data collection information.

Table 2: Frequency distribution of reported AVCs and carcass removal in the Washington data.
Explanatory variables defined for segment i in the mean function include terrain type of mountain; lane width in feet; left shoulder width in feet; right shoulder width in feet; white-tailed deer habitat; elk habitat; mule deer habitat; area type (rural or urban); and median width.

Table 3: Summary statistics of characteristics for reported AVCs and carcasses in the Washington data.
Note. † SD = standard deviation. a Reported AVC data record. b Carcass removal data record. c Dependent variable. d Six out of 10,475 segments have only one lane. - = not applicable.

Table 4: The distribution of zero observations in the defined segments.

Table 5: Modelling results of carcass removal for NB models with the Washington data.

Table 6: Modelling results of reported AVCs for NB models with the Washington data.

Table 7: Differences in ranking between the carcass removal and the reported AVC using the NB-based EB estimates.
Note. There are 10,475 road segments in the Washington data.

Table 8: The number of the same sites identified as hotspots using the carcass removal data and the reported AVCs data.
Note. There are 10,475 road segments in the Washington data.

Table 10: Modelling results for the GNB model using the carcass removal data.

Table 11: Modelling results for the GNB model using the reported AVCs data.

Table 12: Differences in ranking between the reported AVC and the carcass removal using the GNB-based EB estimates.
Note. There are 10,475 road segments in the Washington data.

Table 13: The number of hotspots identified by both the carcass removal data and the reported AVCs data using the GNB-based EB estimates.