Goodness-of-fit testing for accident models with low means
Highlights
► We examined test statistics and compared their performance for the GOF of crash models with low sample means.
► We proposed a new test statistic for the Poisson regression model that does not depend on the sample size.
► We provided guidance on how and when to use appropriate test statistics for both Poisson and negative binomial regression models.
Introduction
The modeling of relationships between motor vehicle crashes and underlying factors, such as traffic volume and highway geometric features, has been investigated for more than three decades. The statistical models (sometimes referred to as crash prediction models) from which these relationships are developed can be used for various purposes, including predicting crashes on transportation facilities and determining which variables significantly influence crashes. Recently, many highway safety studies have documented the use of Poisson-gamma (also referred to as negative binomial, or NB) regression models (Miaou and Lum, 1993, Poch and Mannering, 1996, Miaou and Lord, 2003, Maycock and Hall, 1984, Lord et al., 2005, Miaou, 1994, Maher and Summersgill, 1996). On rare occasions, the Poisson model has been found to be suitable for modeling crashes, especially when the crash sample mean is low (Joshua and Garber, 1990, Miaou et al., 1992, Ivan et al., 2000, Lord and Bonneson, 2007). Although the Poisson model is not used as commonly as the NB model, it is frequently used when the sample mean is low because of the nature of count data (see, e.g., Lord and Bonneson, 2007). With the Poisson or NB models, the relationships between motor vehicle crashes and explanatory variables can then be developed by means of the generalized linear model (GLM) framework.
Pearson's X² and the scaled deviance (G²) are two common test statistics that have been proposed as measures of goodness of fit (GOF) for Poisson or NB models (Maher and Summersgill, 1996). Statistical software (e.g., SAS) also uses these two statistics for assessing the GOF of a GLM (SAS Institute Inc., 1999). Unfortunately, transportation safety analysts often deal with crash data characterized by low sample mean values. Under such conditions, the traditional test statistics may not perform well. This has been referred to in the highway safety literature as the low mean problem (LMP). The study by Sukhatme (1938) concluded that, “for samples from a Poisson distribution with mean as low as one, Pearson's X2 test for goodness of fit is not good.” In the field of traffic safety, this issue was first raised by Maycock and Hall (1984) and further discussed by Maher and Summersgill (1996), Fridstrom et al. (1995), and Agrawal and Lord (2006). Wood (2002) proposed a more complex technique, the grouped G² method, to solve this problem. The grouped G² method is based on the knowledge that, through grouping, the data become approximately normally distributed and the test statistics follow a χ² distribution. Some issues (e.g., sample size) regarding this method are discussed in Section 3. It should be noted that different models can be compared by means of Akaike's information criterion (AIC) (Akaike, 1974) or the Bayesian information criterion (BIC) (Schwarz, 1978). However, similar to previous studies (Maher and Summersgill, 1996, Wood, 2002, Agrawal and Lord, 2006), this research studies statistics for the GOF of a given model (either a Poisson model or an NB model), that is, it investigates how well a developed model fits the observed data.
This study has three objectives. The first is to examine all the traditional test statistics and compare their performance for the GOF of GLMs with low sample means. The second is to propose a new test statistic for the Poisson regression model that, unlike the grouped G² method, does not depend on the sample size. The proposed method is easy to use and does not require grouping data, which is time consuming and may not be feasible if the sample size is small. Moreover, as will be shown in this study, the proposed method can be used for lower sample means than documented in previous studies. The third is to provide guidance on how and when to use appropriate test statistics for both Poisson and NB regression models, especially the grouped G² method, which is complex and may not be ready for practitioners.
Section snippets
Statistical models
GLMs represent a class of fixed-effect regression models for dependent variables (McCullagh and Nelder, 1989), such as crash counts in traffic accident models. Common GLMs include linear regression, logistic regression, and Poisson regression. Given the characteristics of motor vehicle collisions (i.e., random, discrete, and non-negative independent events), stochastic modeling methods need to be used over deterministic methods. The two most common stochastic modeling methods utilized for
Goodness-of-fit test statistics
GOF tests use the properties of a hypothesized distribution to assess whether or not observed data are generated from a given distribution (Read and Cressie, 1988). The best-known GOF test statistics are Pearson's X² and the scaled deviance (G²). Pearson's X² is generally calculated as follows: $X^2 = \sum_{i=1}^{n} \left[ (y_i - \mu_i) / \sigma_i \right]^2$, where $y_i$ is the observed data, $\mu_i$ is the true mean from the model, and $\sigma_i$ is the error and is usually represented by the standard deviation of $y_i$. The scaled deviance is
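As a minimal sketch of how the two statistics can be computed for a Poisson model, the Python snippet below sets the variance of each observation equal to its fitted mean. The deviance uses the standard textbook form G² = 2 Σ [y ln(y/μ) − (y − μ)], which is an assumption here because the snippet above is truncated before its definition. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def pearson_x2(y, mu):
    """Pearson's X^2 for a Poisson model, where Var(y_i) = mu_i."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def poisson_deviance(y, mu):
    """Poisson deviance G^2 = 2 * sum[y*ln(y/mu) - (y - mu)].

    The term y*ln(y/mu) is taken as 0 when y = 0 (standard convention).
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


if __name__ == "__main__":
    # Hypothetical observed counts and fitted means (illustrative only).
    y = [0, 1, 0, 2, 0]
    mu = [0.5, 0.8, 0.3, 1.2, 0.4]
    print("X2 =", pearson_x2(y, mu))
    print("G2 =", poisson_deviance(y, mu))
```

Both functions take the fitted means as given, so they can be applied to the output of any fitting routine. For an NB model the variance in the denominator of X² and the deviance formula would both differ.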
Discussion
The results of this study show that the Pearson's X² statistic tends to overestimate GOF values for low μ values, since V(X²) is larger than 2. This is because the components (i.e., $(y_i - \mu_i)^2 / \mu_i$ for Poisson models) will be inflated when the predicted values ($\mu_i$) are low. For instance, with the observed crash dataset in the first case, the Poisson model predicted 1.02 crashes per year for one of the intersections. However, 4 crashes were observed at that intersection. The contribution to X² would
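The intersection example above can be checked with a short calculation. The sketch below assumes the Poisson form of the Pearson component, (y − μ)²/μ, and computes the exact variance of one component by summing over the Poisson probability mass function. For a Poisson variable this variance equals 2 + 1/μ, a standard moment result rather than a figure quoted from the paper, and the helper names are hypothetical.

```python
import math


def pearson_component(y, mu):
    """Contribution of one observation to Pearson's X^2 under a Poisson model."""
    return (y - mu) ** 2 / mu


def component_variance(mu, tol=1e-18):
    """Exact variance of (Y - mu)^2 / mu for Y ~ Poisson(mu).

    Sums over the pmf until the remaining tail is negligible.
    """
    pmf = math.exp(-mu)  # P(Y = 0)
    k = 0
    first_moment = 0.0
    second_moment = 0.0
    while True:
        c = pearson_component(k, mu)
        first_moment += c * pmf
        second_moment += c * c * pmf
        k += 1
        pmf *= mu / k  # recurrence: P(Y = k) = P(Y = k - 1) * mu / k
        if k > mu and pmf < tol:
            break
    return second_moment - first_moment ** 2


if __name__ == "__main__":
    # Intersection from the text: 1.02 predicted crashes, 4 observed.
    print(round(pearson_component(4, 1.02), 2))  # 8.71
    # Variance of one component for a low, a moderate, and a high mean.
    for mu in (0.5, 1.02, 10.0):
        print(mu, component_variance(mu), 2 + 1 / mu)
```

A single site with 4 observed crashes against 1.02 predicted therefore adds about 8.71 to X², while its expected contribution under a correct Poisson model is 1. The per-component variance approaches 2 only as μ grows, which is consistent with the inflation described above for low means.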
Conclusions and future work
The NB regression model is the most commonly used type of model for analyzing traffic crashes. Depending on the characteristics of the data, the Poisson model has been found to be a suitable model on rare occasions. These models help establish the relationship between traffic crashes (response variable) and traffic flow, highway geometrics, and other explanatory variables. To evaluate their statistical performance, GOF tests need to be used. Since crash data are often characterized
Acknowledgements
The authors gratefully acknowledge the comments provided by Dr. Graham Wood from the Department of Statistics at Macquarie University in Sydney, Australia. This study is supported by the National Natural Science Foundation of China (Grant No. 51128802) and the Jiangsu Province leading disciplines of universities program.
References (42)
- et al. A goodness of fit test for the Poisson distribution based on the empirical generating function. Statistics and Probability Letters (1992).
- et al. Measuring the contribution of randomness, exposure, weather, and daylight to the variation in road accident counts. Accident Analysis and Prevention (1995).
- et al. Recent and classical goodness-of-fit tests for the Poisson distribution. Journal of Statistical Planning and Inference (2000).
- et al. Explaining two-lane highway crash rates using land use and hourly exposure. Accident Analysis and Prevention (2000).
- Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter. Accident Analysis and Prevention (2006).
- et al. Modeling crash–flow-density and crash–flow-v/c ratio for rural and urban freeway segments. Accident Analysis and Prevention (2005).
- et al. Comprehensive methodology for the fitting of predictive accident models. Accident Analysis and Prevention (1996).
- et al. Modeling vehicle accidents and highway geometric design relationships. Accident Analysis and Prevention (1993).
- et al. On the Nature of Over-dispersion in Motor Vehicle Crash Prediction Models. Accident Analysis and Prevention (2007).
- Generalized linear accident models and goodness-of-fit testing. Accident Analysis and Prevention (2002).
- Highway Safety Manual.
- Effects of Sample Size on the Goodness-of-Fit Statistic and Confidence Intervals of Crash Prediction Models Subjected to Low Sample Mean Values. CD-ROM.
- A new look at the statistical model identification. IEEE Transactions on Automatic Control.
- The statistical analysis of insect counts based on the negative binomial distributions. Biometrics.
- Empirical likelihood as a goodness-of-fit measure. Biometrika.
- Methods for exact goodness-of-fit tests. Journal of the American Statistical Association.
- A family of power-divergence diagnostics for goodness-of-fit. The Canadian Journal of Statistics.
- Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society, Series B.
- Goodness-of-Fit Statistics for Discrete Multivariate Data.
- Pearson's X2 and the log likelihood ratio statistic G2: a comparative review. International Statistical Review.
- The negative binomial distribution. Annals of Eugenics.