Spatial and temporal aspects of prior and likelihood data choices for Bayesian models in road traffic safety analyses

In a Bayesian regression model, parameters are not constants, but random variables described by posterior distributions. In order to define such a distribution, two pieces of information are combined: (1) a prior distribution that represents previous knowledge about a model parameter and (2) a likelihood function that updates prior knowledge. Both elements are analysed in terms of implementing the Bayesian approach in road safety analyses. A Bayesian multiple logistic regression model that classifies road accident severity is investigated. Three groups of input variables have been considered in the model: accident location characteristics, at-fault driver’s features and accident attributes. Since road accidents are scattered in space and time, two aspects of information source choices in the Bayesian modelling procedure are proposed and discussed: the spatial and the temporal one. In both aspects, priors are based on selected data that generate background knowledge about model parameters – thus, prior knowledge has an informative property. Bayesian likelihoods which modify priors are data that deliver: (1) information specific to a road – in the spatial aspect or (2) the latest information – in the temporal aspect. Research experiments were conducted to illustrate the approach and some conclusions have been drawn.


Introduction
Road traffic safety, as an element of a human-vehicle-road system, has been the subject of scientific and research works for many years. Many researchers and specialists in a wide range of fields and disciplines are involved in the process of recognizing and understanding mechanisms related to a road crash. Many theories and models have been elaborated in order to evaluate the level of road traffic threats, as well as to identify circumstances and cause-and-effect relationships of road accidents. The research area is extensive and covers: simulation and behavioural research (e.g. [8, 9]), elaboration of entropy models (e.g. [1, 12]), investigations of road polygons including road surroundings, and traffic and weather conditions (speed in particular) (e.g. [3, 10]), as well as exploration and mining of real road accident data (e.g. [15, 19]).

Science and Technology
knowledge, which is a resultant probability and a measure of a rational expectation of the event occurrence after getting information from the data. Bayesian thinking, supported by the development of numerical sampling techniques, has created the fundamentals of modern statistics, which enable formulating and solving problems not available in classical statistics. Bayesian regression modelling is a non-classical methodology which is becoming widespread in road traffic safety analyses, mainly because it allows eliminating various weaknesses of classical models. Bayesian regression models are difficult from both conceptual and computational points of view. Nevertheless, they bring a new quality to the development of scientific research methods, and they enable a flexible, though non-standard, approach to modelling issues. The models are used in order to develop safety performance functions (e.g. [6, 7, 13, 16]), including a before-after analysis (e.g. [17]), and also to classify descriptive road accident features, such as driver's behaviour, accident type, or accident severity (e.g. [2, 5, 16]).
The non-classical method of statistical inference was used in the study in order to develop logistic regression models, in which road accident severity is a response variable and selected features describing accident circumstances are input variables. A methodology of defining two basic sources of information for the Bayesian model was elaborated. The research is directed towards establishing informative priors as a general background for the model, and then towards choosing likelihood data in order to obtain posterior knowledge. Both elements would reflect various aspects of road safety research interests.

A Bayesian road accident severity classifier
The subject of the analysis is a statistical classifier: a logistic regression model that classifies road accident severity AcSrv into one of two values (categories): LA – light accident (assumed to be a failure) and FSA – fatal or serious accident (assumed to be a success). Input variables describe the road accident location, the at-fault driver's characteristics and the accident features.
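Logit is the link function in a logistic regression model. The conditional probability P(AcSrv = FSA | X_1, …, X_k) that an accident which occurred under the circumstances described by a set of input variable values is fatal or serious constitutes the argument of the link function:
$$\operatorname{logit} P(\mathrm{AcSrv} = \mathrm{FSA} \mid X_1, \ldots, X_k) = \ln \frac{P(\mathrm{AcSrv} = \mathrm{FSA} \mid X_1, \ldots, X_k)}{1 - P(\mathrm{AcSrv} = \mathrm{FSA} \mid X_1, \ldots, X_k)} = \beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k \qquad (1)$$
The assumed model is relatively simple since the main purpose of the research is not to analyse the influence of the chosen features on the response variable, but to discuss the methodology that helps in developing a Bayesian regression model.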
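The relation between the linear predictor and the probability of a fatal or serious accident can be illustrated with a minimal Python sketch. The function names and the coefficient values are hypothetical and are not taken from the study; only the logit link itself comes from the text.

```python
import math


def logit(p):
    """Logit link: the log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def prob_fsa(beta, x):
    """P(AcSrv = FSA | x) for coefficients beta = [beta_0, beta_1, ..., beta_k]."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return inv_logit(eta)


# Hypothetical coefficients: intercept and two binary input variables.
example_beta = [-1.0, 0.8, -0.4]
example_probability = prob_fsa(example_beta, [1, 0])
```

A positive coefficient raises the log-odds of the FSA category and a negative one lowers them, which is how the direction of a variable's influence is read in the remainder of the paper.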
Contrary to the classical approach, it is assumed that Bayesian regression model parameters are not constants, but random variables. Therefore, each parameter is described by a certain posterior distribution that results from previous (prior) knowledge about the parameter and from the knowledge update using empirical data (Bayesian likelihood data) [18]:
$$P(\beta \mid Y, X) \propto L(Y, X \mid \beta) \, P(\beta)$$
The posterior distribution mean of the parameter β_i accompanying the variable X_i is the measure used to assess the magnitude and the direction of the variable's influence on the response.
According to Bayes' rule, posterior distributions P(β | Y, X) contain information from two sources: prior distributions P(β) and likelihood functions L(Y, X | β). A variety of posterior distributions for a regression parameter β_i is possible (Fig. 1), which is the consequence of the assumptions made about previous knowledge and likelihood data choices. Whenever one of the sources changes, the posterior changes as well.
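How the posterior reacts to a change in either source can be illustrated with a simplified one-parameter sketch. It assumes, purely for illustration, that both the prior and the information carried by the data are normal, in which case the posterior is a precision-weighted combination of the two. This is a textbook simplification and not the logistic model of the study; the function name and the numbers are hypothetical.

```python
def normal_update(prior_mean, prior_sd, data_mean, data_sd):
    """Combine a normal prior with a normal summary of the likelihood data.

    Returns the mean and standard deviation of the resulting normal posterior.
    """
    w_prior = 1.0 / prior_sd ** 2   # precision of the prior
    w_data = 1.0 / data_sd ** 2     # precision of the data summary
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, post_var ** 0.5


# A diffuse prior leaves the data estimate almost unchanged ...
diffuse = normal_update(0.0, 1e6, 2.0, 1.0)
# ... whereas an informative prior pulls the posterior mean towards itself.
informative = normal_update(0.0, 1.0, 2.0, 1.0)
```

With the diffuse prior the posterior mean stays at the data value, while the informative prior moves it halfway towards the prior mean and narrows the distribution.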
Markov Chain Monte Carlo (MCMC) sampling methodology [4, 18] is used in order to obtain posterior distributions P(β | Y, X). Each distribution is calculated from a series of numbers meeting the Markov chain criteria. The Metropolis-Hastings algorithm is among the most popular generators of such series. The Gibbs sampler is also frequently used. The results of the MCMC method depend on: the number of iterations in the chain, the number of burn-in values and the thinning rate. Convergence of the Markov chain to stationarity is a significant issue in the generation process. It gives rise to an output sample from the stationary posterior distribution. Diagnostic tests (e.g. Gelman-Rubin, Geweke, Heidelberger-Welch), as well as trace and autocorrelation plots, are used in order to assess the Markov chain quality.
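The roles of the number of iterations, the burn-in and the thinning rate can be seen in a minimal random-walk Metropolis sketch for a logistic regression with independent normal priors. This is an illustrative pure-Python implementation, not the SAS procedure used in the study; the function names, the proposal step size and the toy data are assumptions.

```python
import math
import random


def log_posterior(beta, X, y, prior_mean, prior_sd):
    """Unnormalised log posterior of a logistic regression with normal priors."""
    # Independent normal log-priors (additive constants dropped).
    lp = sum(-0.5 * ((b - m) / s) ** 2
             for b, m, s in zip(beta, prior_mean, prior_sd))
    # Bernoulli log-likelihood with a logit link: y*eta - log(1 + exp(eta)).
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        lp += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return lp


def metropolis(X, y, prior_mean, prior_sd, n_iter, burn_in, thin,
               step=0.5, seed=0):
    """Random-walk Metropolis; returns the thinned draws kept after burn-in."""
    rng = random.Random(seed)
    beta = list(prior_mean)
    current = log_posterior(beta, X, y, prior_mean, prior_sd)
    draws = []
    for it in range(burn_in + n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, X, y, prior_mean, prior_sd)
        # Accept with probability min(1, posterior ratio);
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
        # Discard the burn-in, then keep every `thin`-th state.
        if it >= burn_in and (it - burn_in) % thin == 0:
            draws.append(list(beta))
    return draws


# Toy data: the first column is the intercept, the second a hypothetical
# binary input variable; y = 1 stands for the FSA category.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
y = [0, 0, 1, 1, 1, 0]
draws = metropolis(X, y, prior_mean=[0.0, 0.0], prior_sd=[1e6, 1e6],
                   n_iter=300, burn_in=50, thin=30)
```

Here 300 post-burn-in iterations thinned by 30 leave 10 draws. By the same arithmetic, the settings reported in the Results section (300000 iterations, thinning of 30) yield the 10000-element chains mentioned there.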

Building a Bayesian road accident severity classifier
A Bayesian regression classifier (1) is created from a two-step Bayesian modelling procedure in which selected aspects of a road accident data investigation are adopted. The proposed approach and its results are strongly data-dependent: a several-year accident data registration period for a network of the same category roads in a given country region is needed (in particular roads supervised by a specific road administration unit). The data are selected in order to focus on either the spatial or the temporal aspect of the model estimation. The whole procedure extends and develops the concept presented in the investigation by Yu and Abdel-Aty [20] on the selection of informative priors for Bayesian models of a safety performance function.
The algorithm of building the Bayesian road accident severity classifier is presented hereafter.

Bayesian Modelling Step 1; defining the priors – the BM-S1 model
There are three general types of prior distributions used in Bayesian regression models: non-informative, semi-informative, and informative. The first one is utilized in road traffic safety analyses more often than the others, although it is dominated by likelihood data in the final output, and mean values of Bayesian model parameters are very close to parameter estimators of a classical regression model. Better results can be obtained when, instead of diffuse non-informative prior distributions, well-defined informative ones are used, because they reflect knowledge of the investigated subject. In order to generate such distributions, suitable data processing is proposed. It is the first step of the above-mentioned procedure, thanks to which the Bayesian BM-S1 model is obtained.

Fig. 1. A graphical interpretation of a Bayesian regression model parameter

There are the following sources of information for the BM-S1 model:
• priors: non-informative normal distributions with zero mean and a very big standard deviation (1E+06),
• Bayesian likelihood (likelihood function): road accident data selected according to the chosen aspect of the analysis, spatial or temporal.
The Bayesian likelihood for the BM-S1 model is defined in the following way:
• for the spatial aspect: all accident data registered on the same category roads in a given country region for an assumed period of time,
• for the temporal aspect: all historical accident data registered on the same category roads in a given country region, excluding the data from the latest (most recent) registration period covering the whole season cycle (a calendar year).
Means and standard deviations of posterior distributions obtained for the BM-S1 model become means and standard deviations of prior normal distributions for the parameters of the Bayesian regression model created in the second step.
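The hand-over between the two steps amounts to summarising the BM-S1 posterior draws by their means and standard deviations and reusing these as the parameters of the normal priors in the second step. A minimal sketch is given below; the function names and the toy draws are hypothetical, while the zero mean and the standard deviation of 1E+06 of the first-step priors come from the text.

```python
import statistics


def noninformative_priors(n_params, sd=1e6):
    """Step 1 priors: normal distributions with zero mean and a very large S.D."""
    return [0.0] * n_params, [sd] * n_params


def priors_from_draws(draws):
    """Turn posterior draws (one parameter vector per draw) into
    the means and standard deviations of normal priors for the next step."""
    columns = list(zip(*draws))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return means, sds


# Hypothetical BM-S1 posterior draws for two parameters.
step1_draws = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
step2_prior_mean, step2_prior_sd = priors_from_draws(step1_draws)
```

Because the resulting standard deviations are far smaller than 1E+06, the second-step priors are no longer diffuse and carry the background knowledge gathered in the first step.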

Bayesian Modelling Step 2; defining the likelihood – the final BM-S2 model
Since normal distributions derived in the first step are not diffuse, they generate informative prior knowledge constituting a basic background (a generalisation) for the final BM-S2 Bayesian model which follows the chosen aspect of the analysis. The likelihood data for the BM-S2 model define a training data set and they are treated as a factor emphasising and clarifying the research context:
• for the spatial aspect: accident data for a given road that are selected from the whole data set modify priors related to the road,
• for the temporal aspect: the latest (most recent) accident data update historical knowledge related to the whole area.
Fatal accident observations are extremely rare in road accident data, which usually results in a weak classification quality of the accident fatality. Therefore, in order to overcome this negative phenomenon and to strengthen the influence of the rare values on the final modelling results, balancing [1, 14, 15] is applied to the likelihood data in the BM-S2 model, forcing smaller differences in the proportions of the values of the response variable AcSrv. Firstly, the primary data set is split into three subsets according to the accident severity AcSrv: light, serious, and fatal. Then, all fatal accident observations are taken to create a 20% stratum in a new training data set. Next, serious and light accident observations are selected at random from the remaining subsets in order to constitute, in the newly created data set, 30% and 50% strata respectively. Finally, the data modification is carried out so as to obtain the binary-valued response variable AcSrv which defines a failure by the light accident severity category and a success by combining the serious and fatal accident severity categories. In such a balanced likelihood data set, the share of fatal accident observations grows considerably and, at the same time, the relatively rare success category does not exceed 50% of the data set size.
The research experiment has been carried out utilising the balancing scheme in each aspect of the data definition for the likelihood function in the BM-S2 model.
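The balancing scheme described above can be sketched as follows. The record layout (a dictionary with a `severity` key), the function name and the rounding of the stratum sizes are assumptions made for illustration; the 20%, 30% and 50% strata and the final binary recoding follow the text.

```python
import random


def balance_likelihood_data(records, seed=0):
    """Build a balanced training set with 20% fatal, 30% serious, 50% light."""
    rng = random.Random(seed)
    by_severity = {"light": [], "serious": [], "fatal": []}
    for rec in records:
        by_severity[rec["severity"]].append(rec)

    # All fatal accidents form the 20% stratum, so the serious and light
    # strata are 1.5 and 2.5 times as large (rounded half up here).
    n_fatal = len(by_severity["fatal"])
    n_serious = (3 * n_fatal + 1) // 2
    n_light = (5 * n_fatal + 1) // 2
    if (n_serious > len(by_severity["serious"])
            or n_light > len(by_severity["light"])):
        raise ValueError("not enough serious or light accidents to fill the strata")

    sample = (by_severity["fatal"]
              + rng.sample(by_severity["serious"], n_serious)
              + rng.sample(by_severity["light"], n_light))
    rng.shuffle(sample)

    # Binary response: LA (failure) for light, FSA (success) otherwise.
    return [dict(rec, AcSrv="LA" if rec["severity"] == "light" else "FSA")
            for rec in sample]


# Hypothetical primary data set: 12 fatal, 100 serious and 500 light accidents.
severities = ["fatal"] * 12 + ["serious"] * 100 + ["light"] * 500
records = [{"id": i, "severity": s} for i, s in enumerate(severities)]
balanced = balance_likelihood_data(records)
```

In this toy case the 12 fatal accidents determine a balanced set of 60 records, half of which belong to the FSA category, so the success category reaches but does not exceed 50% of the data set.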

Data description
The road accident data used in the study, acquired from the SEWiK police database system, were provided by the Police Headquarters of the Świętokrzyskie province, Poland. The accidents registered from 2008 to 2014 on all nine national roads in the province are analysed in the study. The roads are supervised by a national road administration unit (a division of the General Directorate for National Roads and Motorways) because they serve interregional connections.
Prior to the analysis, the data were cleaned and the records with outliers, missing values or extremely rare values that could not be aggregated (considering the physical meaning of the values) were removed. The resultant data set includes 1329 observations and consists of the following variables chosen for the investigation:
• the group of accident location characteristics (input variables):
  • ArTp – area type, with the following values: Bt – built-up area (39.2%), NBt – non-built-up area (60.8%),
  • LgCnd – road lighting conditions, with the following values: NgDrk – night darkness, i.e. no lighting at night (16.6%), PrLg – poor lighting, e.g. dawn, dusk or artificial lighting (usually poor on non-urban roads) at night (14.7%), Dlg – daylight (68.6%),
  • RdSrf – roadway surface conditions, with the following values: NDr – not dry, i.e. wet, snow-covered or ice-covered (38.5%), Dr – dry (61.5%),
• the group of at-fault driver's features (input variables):

Results
The Bayesian regression models were obtained from 10000-element Markov chains generated using the Metropolis algorithm with the following settings: the number of burn-in samples = 50000, the number of final chain iterations = 300000, the thinning rate = 30. All the Markov chains reached stationarity, which was verified by the autocorrelation and trace plots, as well as by the Geweke and Heidelberger-Welch tests. The resultant posterior distributions were unimodal.
The research experiments were conducted using the SAS® software: the built-in MCMC procedure and the author's own SAS 4GL and SAS macro language computer programs.
The data were prepared taking into account:
• for the spatial (S) aspect:
  • BM-S1(S): all the national roads in the Świętokrzyskie province, for the time period 2008-2014 (the data set length is equal to 1329 records),
  • BM-S2(S): the DK74 and DK7 roads for two independent models, for the period 2008-2014 (after balancing, the data set length is equal to 220 and 196 for the DK74 and DK7 roads respectively); the main difference between the roads is that the DK7 road, being part of the European road network, additionally serves international traffic,
• for the temporal (T) aspect:
  • BM-S1(T): all the national roads in the Świętokrzyskie province, for the time period 2008-2013 (the data set length is equal to 1221 records),
  • BM-S2(T): all the national roads in the Świętokrzyskie province, for the year 2014 (after balancing, the data set length is equal to 60 records).
The results of Bayesian modelling for the spatial aspect are presented in Table 1, and for the temporal aspect in Table 2. The BM-S1 models obtained in the first step are called prior models since they deliver informative prior knowledge for the second step. The BM-S2 models obtained in the second step are called posterior models because they are the final classifiers of the modelling procedure. Both tables have a similar structure:
• mean and standard deviation values (Mean (S.D.)) of parameter distributions for the prior models (BM-S1 – prior) and for the posterior models (BM-S2 – posterior),
• reference of each posterior model to its corresponding prior model by determining the index that, for any parameter, compares the posterior distribution mean with its corresponding prior distribution mean. The index is calculated by the expression $(\mathrm{mean}_{\mathrm{posterior}} - \mathrm{mean}_{\mathrm{prior}}) / |\mathrm{mean}_{\mathrm{prior}}|$. The index values are given in the Comparison columns for: DK74 vs. prior, DK7 vs. prior, and 2014 vs. prior,
• comparison of two posterior models for the spatial aspect (for the DK74 and DK7 roads) by showing the difference between the distribution means of the corresponding model parameters, calculated by the expression $(\mathrm{mean}_{\mathrm{posterior(DK74)}} - \mathrm{mean}_{\mathrm{posterior(DK7)}})$. The difference values are given in the Posterior comparison column in Table 1,
• Deviance Information Criterion (DIC) measure calculated from the training data sets: the unbalanced one for the BM-S1 model and the balanced one for the BM-S2 model,
• classification quality assessment measures: sensitivity (the percentage of correctly classified FSA cases), specificity (the percentage of correctly classified LA cases), and the harmonic mean of sensitivity and specificity HMSS (which balances the two measures). All the indices were calculated from the primary likelihood data set for the BM-S1 model and from the primary (not balanced) likelihood data set for the BM-S2 model.
For each parameter of a Bayesian model, the highest probability density (HPD) interval can be constructed unambiguously, provided that the parameter distribution is not uniform. To some extent, the HPD interval corresponds to a confidence interval in classical statistics: if it contains zero, the values of its parameter cannot be clearly interpreted. The uncertainty is also indicated when the absolute value of the parameter coefficient of variation exceeds 50%. Such statistically insignificant parameters are highlighted in red in Tables 1 and 2. The HPD intervals for the statistically significant parameters of the final models (the BM-S2 models obtained in the second step) are illustrated in Figures 2 and 3.
In Tables 1 and 2, and in Figures 2 and 3, all the input variables are grouped according to their substantive meaning, i.e. accident location characteristics, at-fault driver's features, and accident features.

Bayesian models for the spatial aspect
1. The sets of statistically significant input variables are roughly the same in the BM-S1(S) model and in both BM-S2(S) models. The driver's age group proved significant in the BM-S2(S) model for the DK74 road only due to the significance of the coded variable AgGrp_05 (50-65 years old).
2. The directions of the influence of the individual statistically significant variables on the accident severity are the same in the BM-S1(S) model and in both BM-S2(S) models.
3. The nature (magnitude and direction) of the change in the values of the statistically significant posterior parameters (the BM-S2(S) models) in relation to the values of the corresponding prior parameters (the BM-S1(S) model) is road-dependent:
  • the positive influence of the accident location characteristics on the accident severity is greater by more than 20% in the BM-S2(S) model for the DK74 road, whereas the change of the influence in the BM-S2(S) model for the DK7 road is different: there is a rise by 5% in the parameter mean for built-up area ArTp_Bt and a drop by 6% in the parameter mean for night darkness LgCnd_NgDrk,
  • the positive influence of single-track motor vehicle (motorcycle, scooter, and moped) VhTp_Mtr and the negative influence of female driver's gender Gndr_F on the accident severity identified in the prior parameter distributions become smaller by nearly 10% in the posterior distributions for the DK74 road, whereas they remain at almost the same level for the DK7 road,
  • the modification of the parameter prior distribution for the single vehicle accident variable NrVhIn_Sng by using the likelihood data taken from different roads caused different results in the posterior distributions: the parameter mean value rose by 7% for the DK74 road and dropped by 20% for the DK7 road,
  • the range of the change in the parameter posterior distributions for driver's behaviour is different for the DK74 and DK7 roads, which is particularly evident for not giving right of way Bhv_NGvWy (a drop in the mean value by 6.8% and a rise by 11.7% respectively), for incorrect turning or U-turning Bhv_InTrUTr (almost without a change and a drop by 9.1% respectively), and for poor psychophysical condition Bhv_PrPsCn (a rise by 5.2% and a drop by 7.7% respectively).

Bayesian models for the temporal aspect
1. The sets of statistically significant input variables in the BM-S1(T) and BM-S2(T) models differ in two variables: (1) night lighting condition LgCnd_NgDrk is insignificant in the BM-S1(T) model, but significant in the BM-S2(T) model, (2) incorrect turning or U-turning Bhv_InTrUTr is significant in the BM-S1(T) model, but insignificant in the BM-S2(T) model.
2. Similarly to the spatial models, the influence directions of the corresponding statistically significant variables are the same in the BM-S1(T) model in the first modelling step and in the BM-S2(T) model in the second modelling step.
3. The latest information modified the up-till-now (prior) knowledge about the importance of the individual input variables in the posterior model, and in particular it strengthened the following:
  • the positive influence on the fatal or serious accident status of the factors: night lighting condition LgCnd_NgDrk (an increase by 10.6%), inappropriate speed for the prevailing traffic and weather conditions Bhv_InSpPrCn (an increase by 12.9%), incorrect overtaking or bypassing Bhv_InOvBp (an increase by 12.5%),
  • the negative influence on the fatal or serious accident status of the single-vehicle accident variable NrVhIn_Sng (a decrease by 5.9%).
Balancing the likelihood data in the second modelling step, both in spatial and temporal aspects, improves the classification quality of all the final Bayesian models. The values of the quality assessment measures are satisfactory:
• sensitivity is greater than 57%,
• specificity is greater than 65%,
• the HMSS coefficient is greater than 61%.
A general picture of the coefficients of variation for the statistically significant parameters of the models is presented in Fig. 4 in the form of a bubble plot, where the centres represent mean values and the radii are standard deviations of the coefficients. The standard deviation values are similar, irrespective of the step (prior or posterior models) and the aspect (spatial or temporal) of modelling. A slightly greater difference can be noticed for the mean values: they are smaller for the parameters of the second step models, which indicates the better estimation precision of the final posterior models.

Conclusions
Parameters are random variables in Bayesian regression models. Their so-called posterior distributions are obtained by combining systematic (prior) knowledge about the parameters with Bayesian likelihood, i.e. the knowledge derived from data. Some issues concerning the methodology of developing such models for road traffic safety analyses are presented in the study. A logistic regression model that classifies road accident severity is analysed.
Road accident data are treated as a potential source of both information types for the Bayesian model: prior knowledge and Bayesian likelihood. Some researchers apply such an approach in their road safety investigations. In the study, however, a specific interpretation of both sources has been proposed and, consequently, their special application in the modelling process, in which the additional task of obtaining the best possible final classifiers was considered as well.
Prior knowledge about regression parameters can be obtained from data the range of which depends on the subject of the research. If the investigation focuses on the spatial aspect, all accident data recorded on the same technical class roads in a given country region are a possible source of informative priors, creating a reference background to be updated by Bayesian likelihood originating from accident data recorded on a chosen road. Thus, a model related to the road is obtained. If the investigation focuses on the temporal aspect, historical road accident data create an informative prior background, and new accident data from the latest registration period update the priors, providing a new general picture of the safety of the regional road network.
Balancing likelihood data, in both spatial and temporal aspects, positively affects the classification quality of the final Bayesian logistic regression models. The result is particularly important since the level of correct classification of rare success categories, i.e. serious or fatal road accident severity, is crucial.
Bayesian regression models work well when a quasi-complete or complete separation of data points appears [15] in a short data set with qualitative input variables. Classical regression models estimated on the basis of such data are not credible due to some constraints of the maximum likelihood method used in the estimation process. To solve the problem, enlarging the data set (not always efficient for some specific data structures) or a suitable aggregation of categories within chosen qualitative variables (which causes the reduction of information delivered to the model) is recommended. In the research, the quasi-complete separation was detected in the training data set for the temporal aspect. However, no interference with the data was necessary owing to the Bayesian approach to the modelling tasks.
Notwithstanding their complex nature, Bayesian models are becoming more and more widely used in road traffic safety analyses. As shown in the study, they can provide great possibilities in interpreting and utilizing real data. Further studies are recommended to confirm the obtained findings and to widen possible implementations of the discussed technologies.

Fig. 2. HPD intervals for statistically significant parameters of Bayesian models for the spatial aspect

Table 1. Results of Bayesian accident severity classifiers for the spatial aspect

Table 2. Results of Bayesian accident severity classifiers for the temporal aspect
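The indices reported in Tables 1 and 2 (the comparison of a posterior mean with its prior mean, the coefficient of variation used to flag uncertain parameters, and the classification quality measures) can be computed as in the sketch below. The function names and the example values are hypothetical; the formulas follow the definitions given in the text.

```python
def comparison_index(mean_posterior, mean_prior):
    """Relative change of the posterior mean with respect to the prior mean."""
    return (mean_posterior - mean_prior) / abs(mean_prior)


def is_uncertain(mean, sd):
    """Flag a parameter whose absolute coefficient of variation exceeds 50%."""
    return abs(sd / mean) > 0.5


def classification_measures(y_true, y_pred, success="FSA"):
    """Sensitivity, specificity and their harmonic mean (HMSS)."""
    tp = sum(t == success and p == success for t, p in zip(y_true, y_pred))
    fn = sum(t == success and p != success for t, p in zip(y_true, y_pred))
    tn = sum(t != success and p != success for t, p in zip(y_true, y_pred))
    fp = sum(t != success and p == success for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)   # share of correctly classified FSA cases
    specificity = tn / (tn + fp)   # share of correctly classified LA cases
    hmss = 2 * sensitivity * specificity / (sensitivity + specificity)
    return sensitivity, specificity, hmss


# Hypothetical observed and predicted severity categories.
observed = ["FSA", "FSA", "FSA", "FSA", "LA", "LA", "LA", "LA"]
predicted = ["FSA", "FSA", "FSA", "LA", "LA", "LA", "FSA", "FSA"]
sensitivity, specificity, hmss = classification_measures(observed, predicted)
```

Dividing by the absolute value of the prior mean keeps the sign of the comparison index tied to the direction of the shift: a negative parameter that moves towards zero yields a positive index.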