An Investigation into Unsafe Behaviors and Traffic Accidents Involving Unlicensed Drivers: A Perspective for Alignment Measurement

Road traffic plays a vital role in countries’ economic growth and future development. However, traffic accidents are considered a major public health issue affecting humankind. Despite efforts by governments to improve traffic safety, the misalignment between policy efforts and on-the-ground infringements, distractions and breaches reflects regulatory failure. This paper uses the Bayesian network method to investigate unsafe behaviors and traffic accidents involving unlicensed drivers as a perspective for the regulatory alignment assessment. The findings suggest that: (1) unlicensed drivers are more likely to have unsafe driving behaviors; (2) the probability of being involved in a severe traffic accident increases when the drivers are unlicensed and decreases in the case of licensed drivers; (3) young drivers are noticeably more likely to engage in unsafe behaviors, usually leading to serious injuries and deaths, when their driving licenses are invalid; (4) women are more likely to engage in right-of-way violations and to have collisions with no serious injuries, in contrast to unlicensed men drivers, who are involved in other types of traffic accidents resulting in serious injuries.


Introduction
Traffic safety has become a major public health concern all around the world [1][2][3]. The traffic safety problem is multidimensional, and many risk factors, i.e., technical factors (vehicles), environmental factors (the road and infrastructures), human factors (the road users) and their interactions, contribute to causing crashes [4][5][6]. Generally, two approaches to traffic safety studies have been established so far [7,8]. The first approach focuses on advancing engineering and enhancing traffic infrastructures, and the second approach is interested in the driver's individual factors and driving behaviors. Indeed, these two approaches are complementary to each other within the systems perspective of Vision Zero [9]. This global vision is conditioned by the efficiency of the abovementioned approaches and involves a mix of initiatives to address safe mobility issues, i.e., vehicle safety, safety of infrastructures and promotion of road users' behaviors [10,11].
This study investigates drivers' unsafe behaviors and violations and the factors concerning traffic accidents considering driving license status and the influence of demographic variables (i.e., gender and age) as a way to better measure the regulatory alignment. Thus, it contributes to the literature in several ways. First, considering the growing concerns related to unlicensed driving and questions about the magnitude of traffic accidents involving unlicensed drivers, the present study examines their severity in depth. Second, whereas previous studies discussed unlicensed driving mainly as part of unsafe driving behaviors, this paper considers it as a separate risk behavior: it investigates the behaviors of the unlicensed drivers and elaborates the mechanism by which their behaviors are influenced by demographic factors so as to proactively measure the regulatory alignment and enable policymakers to set proper corrections. Finally, to overcome the shortcoming of the approaches used in road safety research and allow better estimation of the risk and uncertainties, the present paper uses the Bayesian network methodology to model the interplay between the driving license status and the behaviors of the drivers and therefore assess the impact of the influential factors, i.e., the demographics. For conceptual clarity, in this paper, the term "traffic accident" is used to refer to the road accidents involving motor vehicles and occurring on roads open to public circulation. The traffic accidents are studied based on two factors: the type (i.e., collision, run over and other types), and the severity of the outcomes (none/mild or severe injuries/death).

Bayesian Network
Road safety has been approached by many disciplines, for instance, transportation engineering, economics, social sciences, psychology and safety research, and each discipline examines a particular facet. Most studies deploy frequentist approaches, which are ad-hoc and account only for the expected values and do not carry the force of deductive logic [31]. As a response to the limitation of the frequentist methods, and to model the interplay between driving license status and drivers' behaviors and to assess the impact of influential factors, the present study uses the Bayesian Network to derive posterior distributions from prior knowledge on the considered factors.
Over recent decades, the Bayesian methodology has been deployed in various subject areas for learning, modeling, forecasting and decision-making [32,33]. As regards regulatory alignment and traffic safety research, Bayesian networks have been used to assess the safety impact of red-light cameras on the reduction of traffic signal violations [34], predict road safety hotspots [35], analyze the causation of road accidents [36,37], measure the influence of drivers' behaviors and psychophysical factors on injury severity and distractions [38], measure the influence of seat-belt use on traffic accident severity [39] and analyze the role of the journey purpose in road traffic injuries [40].
The Bayesian network is a formalism that combines graph and probability theory to provide a compact and natural representation offering effective inference and efficient learning [41]. The directed acyclic graph (DAG) represents the structure of the Bayesian network and qualifies the causal relationship between the variables of interest, while probability theory is responsible for the quantification of the network, that is, the quantification of the probabilistic causal relationships between the variables through the joint probability distribution (Equation (1)) based on the Bayes theorem (Equation (2)) [42]:
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i)) \qquad (1)$$
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \qquad (2)$$
where $P(X_1, \ldots, X_n)$ is the joint probability distribution, $\mathrm{Parents}(X_i)$ are the parents of $X_i$, $P(A \mid B)$ is the a posteriori probability, $P(A)$ is the a priori probability and $P(B \mid A)$ is the likelihood. In this way, Bayesian networks consider the direct and conditional statistical dependencies between all of the study variables in one model. This flexibility allows the measurement of the influence of one or more variables on the target variable based on the a priori and a posteriori probabilities.
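As an illustration of Equations (1) and (2), the following minimal Python sketch builds a three-node network and computes a posterior by enumerating the joint distribution. The structure (age, license, speeding), the state names and every probability value are hypothetical and chosen only for illustration; they are not the network or the estimates of this study, which were obtained with the Bayes Net Toolbox.

```python
from itertools import product

# Hypothetical network: age -> license -> speeding.
# All probabilities are made-up illustration values, not study results.
parents = {"age": (), "license": ("age",), "speeding": ("license",)}
states = {
    "age": ["young", "older"],
    "license": ["valid", "invalid"],
    "speeding": ["yes", "no"],
}
# Conditional probability tables: cpt[node][parent_values][value]
cpt = {
    "age": {(): {"young": 0.3, "older": 0.7}},
    "license": {
        ("young",): {"valid": 0.9, "invalid": 0.1},
        ("older",): {"valid": 0.97, "invalid": 0.03},
    },
    "speeding": {
        ("valid",): {"yes": 0.08, "no": 0.92},
        ("invalid",): {"yes": 0.12, "no": 0.88},
    },
}


def joint(assignment):
    """Equation (1): product of P(X_i | Parents(X_i)) over all nodes."""
    p = 1.0
    for node, pars in parents.items():
        key = tuple(assignment[q] for q in pars)
        p *= cpt[node][key][assignment[node]]
    return p


def total(fixed):
    """Sum the joint over every full assignment consistent with `fixed`."""
    names = list(states)
    s = 0.0
    for values in product(*(states[n] for n in names)):
        a = dict(zip(names, values))
        if all(a[k] == v for k, v in fixed.items()):
            s += joint(a)
    return s


def posterior(query, evidence):
    """P(query | evidence), i.e., Equation (2) evaluated by enumeration."""
    return total({**query, **evidence}) / total(evidence)


# A posteriori probability of an invalid license given observed speeding.
p_invalid_given_speeding = posterior({"license": "invalid"}, {"speeding": "yes"})
```

Enumeration is exponential in the number of variables, so it only suits toy networks; dedicated toolboxes use more efficient inference algorithms.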

Cross-Validation
The practicability of the obtained Bayesian network and its accuracy are assessed using the K-fold cross-validation approach, and the Bayes Net Toolbox [43,44] for Matlab [45] is deployed to perform the cross-validation, generate the Bayesian Network and compute the sensitivity analysis.
In this study, a 10-fold cross-validation has been considered. Accordingly, the data were divided into 10 folds, each containing 10% of the sample, and the remaining 90% of the sample was used to predict the sample in each of the corresponding folds. This operation was repeated ten times, and the entire sample prediction was obtained by joining the 10 folds. The predictive skill of the model was then measured using the area under the receiver operating characteristic (ROC) curve, called AUC. This standard measure for probabilistic binary classifiers ranges between 0 and 1: values below 0.5 indicate predictions worse than chance (systematically inverted), 0.5 corresponds to random prediction and hence an unreliable model, and 1 corresponds to perfect prediction and denotes that the model is reliable.
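The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the Matlab code used in the study: the function names, the interleaved fold assignment, the fixed random seed and the generic `fit`/`predict` callables are all assumptions, and the AUC is computed with the rank (Mann-Whitney) formulation.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a random positive case is scored higher
    than a random negative case; ties count one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(features, labels, fit, predict, k=10):
    """Predict each fold from a model fitted on the other k-1 folds,
    then score the joined out-of-fold predictions with AUC."""
    scores = [None] * len(labels)
    for fold in k_fold_indices(len(labels), k):
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in fold:
            scores[i] = predict(model, features[i])
    return auc(labels, scores)
```

Here `fit(train_features, train_labels)` is expected to return a fitted model and `predict(model, x)` a score for the positive class; any probabilistic classifier, including a learned Bayesian network, could be plugged in.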

Z-Test
To validate the conclusions drawn from the sensitivity analysis, the statistical Z-test was used to measure the significance of the differences between the initial and a posteriori probabilities through the following hypotheses [46,47]:
$$H_0: P_1 = P_2 \qquad H_A: P_1 \neq P_2$$
where $P_1$ is the initial probability and $P_2$ is the a posteriori probability. Under the assumption of binomial distribution, the test statistic, $Z_0$, is given by Equation (4), where $n_1$ and $n_2$ are the sample sizes associated with the probabilities $P_1$ and $P_2$, respectively.
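For readers who wish to reproduce this kind of test, the sketch below implements one common form of the two-proportion Z statistic, which uses the pooled proportion in the standard error. That form is an assumption made here for illustration: it may differ from the authors' Equation (4), and the function names and the sample sizes in the usage example are hypothetical.

```python
from math import sqrt


def two_proportion_z(p1, n1, p2, n2):
    """Z statistic for H0: P1 = P2, using the pooled proportion.

    Assumed textbook form; not necessarily the paper's Equation (4).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se


def is_significant(z, critical=1.96):
    """Two-sided decision at the 95% confidence level."""
    return abs(z) > critical


# Hypothetical sample sizes; only the two probabilities come from the text.
z = two_proportion_z(0.0921, 1000, 0.1712, 1000)
```

With real data, `n1` and `n2` would be the numbers of drivers behind each probability, which are not reported in this section.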

Data Acquisition
The dataset for the study was prepared from three years of official data (2016, 2017 and 2018) of traffic accidents in Spain. The original data were provided by the Traffic National Department of Spain and are made up of three databases, i.e., the accidents, drivers and vehicles databases [48]:
- The drivers database contained data about the drivers involved in the accidents, for instance, age, gender and unsafe driving behaviors.
- The accidents database contained data about the type of accidents and the severity of the injuries, zone, etc.
- The vehicles database contained data about the vehicles involved in the accidents, for instance, the type of the vehicle, vehicle inspection and insurance.
In general, the three databases contained a total of 169 statistical elements (variables) collected from the "Form of Traffic Accidents with Victims" and 306,894 registered traffic accidents in which 524,785 drivers and 539,772 vehicles were involved. Each traffic accident has been registered with a unique registration ID; however, one or more driver(s)/vehicle(s) could have been involved in any registered traffic accident. The traffic accidents involving stationary vehicles, i.e., without drivers, were considered too.
For the purpose of the present study, the dataset used was obtained by filtering the original drivers' database to consider only car and motorcycle drivers and the study key variables, which were grouped into objective variables, i.e., the driving behaviors of the drivers and the traffic accident factors; the variables affecting the behaviors, i.e., the influential variables; and one evidence variable, i.e., the driving license (Figure 1). In doing so, the final dataset contains only a total of 467,431 drivers.

Study Variables
In the present study, special attention is paid to the unsafe behaviors of drivers based on the driving license status (Table 1). Unlicensed driving is defined as illegally operating motor vehicles on the road, putting these drivers themselves and other legitimate drivers at great risk [49]. In the context of the present study, the target variable is the driving license, including valid driving license and invalid driving license, which entails not only driving prior to the eligible age for licensing but also driving unlicensed due to license expiration, suspension and cancellation, or inappropriate class of the license. The objective variables are therefore the unsafe behaviors of drivers and traffic accident factors. In this study, unsafe behaviors were grouped into four main groups: distractive behaviors, speed infringement, other infringements and right-of-way violations (Table 2). As regards the traffic accident factors, two variables were considered: the type of the traffic accident (collision, run over and others) and the severity of the traffic accident, i.e., no injury or mild injuries versus serious injuries or death (Table 3).
The regulatory alignment in the context of road safety is a multidimensional construct and includes a wide range and multivariate combination of influencing factors, for instance, age, gender, decision-making behavior, personality, visibility, road type, zone, time and weather consideration and vehicle characteristics [50,51]. In the present study, the influential factors were grouped into three categories considering the available data: individual factors, situational factors and vehicle factors. However, the interplay between unlicensed driving and unsafe behaviors was assessed considering only the influence of the first group of factors, i.e., the individual factors, which include two demographic variables: age and gender of the drivers (Table 4).

The Bayesian Network Validation
As explained in the methodology section, the validation of the obtained model was performed with a 10-fold cross-validation method and the results are given in Table 5. All of the AUC scores range between 0.69 and 0.96 (with the exception of the affirmative status of the other infringements variable). These scores reflect the accuracy and high performance of the learned Bayesian network and confirm the practicability of the proposed approaches.

Z-Test
The differences between the probabilities used in the discussion of the sensitivity analysis results were examined using the statistical Z-test (results are given in Appendix A). The Z-test was conducted at a confidence level of 95% (α = 0.05, i.e., an admissible error of 5%) under the binomial assumption, for which the critical values $Z_{\alpha/2}$ are ±1.96. Accordingly, all differences whose Z values are less than −1.96 or greater than 1.96 are statistically significant.

Sensitivity Analysis of the Objective Variables Considering the Driving License Status
The initial probabilities for each of the objective variables considering the driving license status were computed, and results are given in Tables 6 and 7. A confidence level of 95% was considered to assess the statistical significance of the changes in probabilities. Values highlighted with an asterisk, *, are statistically significant at a 95% confidence level.
The results in Table 6 show that, for several behaviors, the probabilities of the licensed drivers having safe driving behaviors are about twice those of the unlicensed drivers.
For instance, the probability of committing no right-of-way violations when drivers are licensed is 50.84%, while the probability decreases to 23.27% when the drivers are unlicensed. Similarly, the probability of being involved in a minor traffic accident with no injuries is high at 90.79% when the drivers are licensed, and the probability decreases to 82.88% when the drivers have an invalid driving license.
However, according to Table 7, the probability of speeding is high when drivers are unlicensed, i.e., 12.38%, and decreases to 8.01% when the drivers have a valid driving license. In the case of right-of-way violations, the results show that licensed drivers have the highest probability, i.e., 35.08%, which decreases to 27.79% in the case of unlicensed drivers.
As regards the severity of traffic accidents, the probability of having a serious traffic accident (serious injuries or death) is 9.21% when drivers are licensed and increases to 17.12% when the drivers have invalid driving licenses (a difference of 7.91%).
As regards the types of traffic accidents, results show that the probability of having a collision is high in the case of licensed drivers, i.e., 77.65%, and decreases to 73.56% in the case of unlicensed drivers; however, the probabilities of run-overs and other types of traffic accidents are high in the case of unlicensed drivers, i.e., 9.11% and 17.33%, respectively.
According to these results, the status of the driving license is likely to have an important impact on the driving behaviors of drivers and the severity of traffic accidents.
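The comparisons above reduce to simple arithmetic on the reported probabilities. The short Python sketch below recomputes the difference (in percentage points) and the unlicensed-to-licensed ratio for four of the values quoted in this section; the dictionary keys and the helper name `compare` are illustrative choices, while the numbers are those reported in the text.

```python
# Probabilities (in %) quoted in the text: (licensed, unlicensed).
reported = {
    "no right-of-way violation": (50.84, 23.27),
    "speeding": (8.01, 12.38),
    "serious injuries or death": (9.21, 17.12),
    "collision": (77.65, 73.56),
}


def compare(licensed, unlicensed):
    """Return (difference in percentage points, unlicensed/licensed ratio)."""
    return round(unlicensed - licensed, 2), round(unlicensed / licensed, 2)


summary = {name: compare(*pair) for name, pair in reported.items()}
for name, (diff, ratio) in summary.items():
    print(f"{name}: difference {diff:+.2f} points, ratio {ratio:.2f}")
```

Reading the two figures side by side helps separate absolute changes (for instance, 7.91 points for severe outcomes) from relative ones.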

Sensitivity Analysis of the Objective Variables Considering the Driving License Status and the Individual Factors
According to the objective of the present paper and considering the learned Bayesian network that includes the joint probability distribution of the study variables, a sensitivity analysis was conducted to measure (1) the influence of individual factors and driving license status on drivers' behaviors and (2) the influence of individual factors and driving license status on the type and severity of traffic accidents. A confidence interval of 95% has been considered to assess the statistical significance of probability change.
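The logic of this sensitivity analysis, comparing an initial probability with the a posteriori probability obtained after adding evidence on an individual factor, can be sketched as follows. This is a simplified stand-in: it estimates conditional probabilities as relative frequencies over a small set of hypothetical toy records rather than by inference on the learned Bayesian network, and the field names and counts are invented for illustration.

```python
# Hypothetical toy records: (license, age_group, speeding). Not study data.
records = (
    [("invalid", "young", "yes")] * 3 + [("invalid", "young", "no")] * 7
    + [("invalid", "older", "yes")] * 1 + [("invalid", "older", "no")] * 9
    + [("valid", "young", "yes")] * 2 + [("valid", "young", "no")] * 18
    + [("valid", "older", "yes")] * 1 + [("valid", "older", "no")] * 19
)
FIELDS = ("license", "age_group", "speeding")


def probability(target, evidence):
    """Relative-frequency estimate of P(target | evidence) from the records."""
    rows = [dict(zip(FIELDS, r)) for r in records]
    matching = [r for r in rows if all(r[k] == v for k, v in evidence.items())]
    if not matching:
        raise ValueError("no records match the evidence")
    hits = sum(all(r[k] == v for k, v in target.items()) for r in matching)
    return hits / len(matching)


def sensitivity(target, base_evidence, extra_evidence):
    """Initial probability, a posteriori probability, and their difference."""
    initial = probability(target, base_evidence)
    posterior = probability(target, {**base_evidence, **extra_evidence})
    return initial, posterior, posterior - initial


# Effect of adding "young" as evidence for unlicensed drivers (toy numbers).
initial, posterior, change = sensitivity(
    {"speeding": "yes"}, {"license": "invalid"}, {"age_group": "young"}
)
```

In the study itself, the significance of each such change is then checked with the Z-test.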

Sensitivity Analysis of the Probabilities of the Drivers' Behaviors Based on the Driving License Status and the Individual Factors
As regards the influence of individual factors, the sensitivity analysis results in Table 8 show that the probability of engaging in right-of-way violations increases from 27.79% (initial probability) to 28.39% (a difference of 0.6%) in the case of young unlicensed drivers (<25 years old), and from 27.79% to 29.86% (a difference of 2.07%) in the case of older unlicensed drivers (>60 years old).
The probability of compliance with speed limits increases in the case of drivers older than 40 years old, regardless of the status of their driving licenses.
However, the probability decreases when drivers are younger than 25 years old from 67.03% (initial probability) to 60.34% (a difference of 6.69%) in case of valid driving licenses and from 31.80% (initial probability) to 26.98% (a difference of 4.82%) in the case of invalid driving licenses. Similarly, the results show that the probability of committing speed infringement increases in the case of the young licensed drivers by 6.47% and by 9.21% when they are unlicensed. However, in the case of the older drivers (>60 years old), the probability decreases regardless of the status of their driving licenses. The probability of not having other infringements increases in the case of the older licensed drivers (>60 years old) by 2.75% and decreases by 1.87% in the case of young licensed drivers (<25 years old).
For distracted driving behaviors, the sensitivity analysis results show that the probability of having no distracted driving decreases in the case of the young licensed drivers (<25 years old) by 2%. However, the probability increases by about 2% in the case of older licensed drivers (>60 years old).
Results of the sensitivity analysis of the influence of the gender variable on the behaviors of the drivers, considering the status of the driving license, are given in Table 9. In general, these results suggest that the gender variable does not have an important influence on the driving behaviors of the drivers, regardless of the driving license status. However, some slight changes in the probabilities can be noticed. For instance, the probability of not engaging in right-of-way violations in the case of licensed men drivers shows an increase of 0.63%. However, the probability of engaging in these aberrant behaviors increases by about 2% in the case of women drivers regardless of the driving license status. For speed limit infringement, the probability increases by 0.48% in the case of the unlicensed men drivers.

Sensitivity Analysis of the Probabilities of Traffic Accident Factors Based on Driving License Status and Individual Factors
Results of the sensitivity analysis of the influence of driving license status and individual factors on the traffic accidents are given in Tables 10-13.
As regards the influence of the age variable and the driving license status on the type of the traffic accident, results in Table 10 show that the probability of being involved in a collision increases in the case of older drivers (>60 years old) by 2.25% when their driving licenses are valid and by 2.14% when their driving licenses are invalid. However, the probability of the younger drivers being involved in other types of traffic accidents increases by 3.06% when their driving licenses are valid and by 2.5% in the case of invalid driving licenses.
As regards the severity of the traffic accidents, results in Table 11 show that in the case of older drivers (>60 years old), the probability of having a traffic accident with mild or no injuries increases regardless of the status of their driving licenses and it decreases in the case of serious traffic accidents. However, with younger drivers (<25 years old), the probability of having a severe traffic accident increases, and does so more markedly, by 2%, when they are driving unlicensed.
Results of the influence of drivers' gender and driving license status on the probabilities of traffic accident types are summarized in Table 12.
These results reveal that the probability of having a collision increases by about 2% in the case of women drivers regardless of their driving licenses, while in the case of unlicensed men drivers, the probability increases in the case of other types of traffic accidents.
As regards the severity of the traffic accidents, results in Table 13 show that the probability of having a mild traffic accident with no injuries decreases in the case of men drivers regardless of their driving licenses; however, it increases in the case of women drivers by 2.79% when they have a valid driving license and, more markedly, by about 7% when they hold an invalid driving license. The same results show that the probability of having a serious traffic accident decreases in the case of women drivers regardless of their driving licenses. However, in the case of men drivers, the probability of having a serious traffic accident increases, and more markedly so when their driving licenses are invalid (an increase of 2.38%).

Discussion
Despite the improvements in the legislation and enforcement of laws targeting many traffic risk factors [52] and the fact that the drivers are aware of the adverse outcomes of engaging in unsafe driving behaviors, regulatory breaches continue to be witnessed and severe injuries, disabilities and deaths caused by traffic accidents continue to be recorded. Research studies have either shown the relationships between unsafe behaviors and traffic accidents or explained the contribution of unlicensed drivers to the frequency of traffic accidents. However, the relationship between unlicensed driving and unsafe behaviors has not been investigated.
To assess the regulatory alignment, this study investigated the unsafe behaviors of unlicensed drivers. Such a focus first sheds light on the illegal driving of unlicensed drivers that escapes, in one way or another, follow-up strategies and road safety improvement projects. Second, such a focus proposes a proactive perspective for the assessment and monitoring of regulatory alignment, which is better than doing so reactively depending on traffic accident data, to help policymakers detect real deficiencies and make efficient and effective countermeasures.
In doing so, three years (2016, 2017 and 2018) of data were obtained from the Spanish National Traffic Department, and a Bayesian network has been deployed to provide predictions of changes in the probabilities and estimate how individual factors, i.e., demographic variables, impact the objective variables considering the statistical dependency relationships in the Bayesian network model.
This study demonstrated that licensed drivers are more likely to engage in safe driving behaviors such as respecting speed limits and less likely to be involved in run-over traffic accidents. In contrast, unlicensed drivers were found to engage in more unsafe behaviors like speeding and to have severe traffic accidents. This finding supports previous research studies [29,53] that have reported similar observations on risky driving behaviors of unlicensed drivers such as speeding and non-use of seatbelts, showing that unlicensed drivers form an important part of the profile of regulatory misalignment and that better traffic safety results could be achieved if policymakers and road safety authorities tackle unlicensed driving. As regards the severity of traffic accidents, results of the present study show that the probability of being involved in a minor traffic accident with no injuries is high when the drivers are licensed. In contrast, the probability of having a serious traffic accident leading to death increases when drivers have invalid driving licenses. This finding is in line with conclusions of many scholars [54,55], confirming that unlicensed drivers are more likely to be involved in fatal traffic accidents than licensed drivers, and the severity of such accidents is therefore high.
However, the present study noted some exceptions and found that the probability of licensed drivers engaging in right-of-way violations is higher than that of unlicensed drivers. In our opinion, the explanation lies in the complexity of the phenomenon of driving behavior, which, in such a particular case, is not exclusively influenced by the status of the driving license. The high probability of licensed drivers engaging in right-of-way violations could be explained by the fact that unlicensed drivers become "prudent drivers" in the streets because, in many countries, if the driver is cited for driving without a valid driving license, they may be fined, barred from obtaining a valid driving license for a period of time or incarcerated. Indeed, as reported by [56,57], drivers on roads or highways are more likely to be unlicensed than drivers on streets because on rural roads and highways, fewer public transport and taxi services are available and, considering the long distances, the likelihood of the unlicensed driver encountering the police is slim.
Another finding of notable interest is that both older and younger drivers have unsafe driving behaviors. However, the results showed that each age group is more likely to engage in some unsafe behaviors than in others. For instance, young drivers are more likely to commit speed infringement, especially when their driving licenses are invalid. In contrast, older drivers (>60 years old) are more likely to engage in right-of-way violations. This finding supports the results of [57][58][59], confirming that young unlicensed drivers are the least committed to traffic instructions and violate traffic lights and use mobile phones the most. Adolescence is a critical developmental period that brings many important cognitive, social and emotional changes, affecting these young drivers' ways of dealing with hazard and their proneness to engage in unsafe driving behaviors. Furthermore, as in many studies [60,61], the present study found that young drivers, and particularly young unlicensed drivers, are overrepresented in traffic accidents resulting in most of the serious injuries and deaths.
As regards the influence of the gender variable, the sensitivity results showed that women are more likely to engage in right-of-way violations and to have collisions. It was also found that the probability of having mild traffic accidents increases in the case of women unlicensed drivers. For men drivers, the results suggested that they are more likely to be involved in other types of traffic accidents and that the probability of having a serious traffic accident generally increases when their driving licenses are invalid. In general, these results are consistent with many previous studies [62,63] that have agreed that women take fewer risks than men do when driving and are less involved in fatal traffic accidents.
To this end, it is clear that unlicensed driving is more than an unsafe behavior and that unlicensed driving motivates other unsafe driving performances. Thus, this study provides a direct means for proactively estimating regulatory alignment and allows policymakers to better implement effective and efficient actions that might, first, buffer the impact of unlicensed driving; second, reduce the likelihood of committing other unsafe behaviors; and finally, reduce the severity of traffic law violations and improve the alignment.

Conclusions
It is widely accepted that many people all around the world are killed or suffer disabilities due to traffic accidents. As a result, immense efforts are being made by road safety authorities all over the world to develop alternative ways to improve the behaviors of drivers at the wheel and therefore reduce the heavy costs of traffic accidents.
Relatively little previous research has investigated the mechanisms by which unlicensed driving affects driving performance and drivers' regulatory alignment. In this paper, the interrelations between the alignment and compliance with traffic enforcement regulations, unlicensed driving, unsafe behaviors and traffic accidents were investigated.
As expected, findings of the present study confirmed that unlicensed driving exerts a significant negative impact on drivers' behaviors and consequently their alignment with traffic regulations. Consequently, these findings provide evidence for promoting and improving traffic safety enforcements by targeting unlicensed driving in various safety education and enforcement programs.

Practical Implications
The present study provides a useful conceptualization of the regulatory alignment and the unsafe behaviors of unlicensed drivers that negatively affect traffic safety records. Accordingly, policymakers and practitioners could consider these results as the basis and empirical framework for interventions aimed at addressing unsafe behaviors and improving driving performance by paying more attention to the unlicensed driving problem. The interventions could fundamentally involve two important points: (i) the sanctions for the unlicensed driving should be reviewed, the laws tightened and special attention paid to unlicensed driving in prevention campaigns; and, (ii) since unlicensed driving is illegal and therefore goes underreported, moving towards using electronic driver licenses to deter unlicensed drivers from operating vehicles has become a necessity.
This study has also considered the use of big data techniques allowing, based on prior probabilities, the calculation of posterior probabilities, which is important to approach such public health problems and traffic safety studies.

Limitations and Future Research
The main limitation of the present study lies in the fact that the study variables were limited to those extracted from the database; however, there are many other unsafe driving behaviors and influential factors that could be of interest.
As regards the methodology, the machine-learning technique requires large amounts of data to train the model; consequently, the concept of unlicensed drivers, in this paper, has grouped all of the categories. Thus, it is recommended that future research consider the influence of each category separately. This is because not all unlicensed drivers are similar. For example, a driver whose license was suspended or canceled due to a past driving offense is not the same as a driver whose license has expired. In considering each category separately, the interventions targeting unlicensed driving could be more specific and the focus could be directed to the disqualified drivers only. Moreover, this study has investigated only the influence of individual factors, and follow-up studies could investigate the influence of other factors.
Funding: This work started with funds from the Dirección General de Tráfico (DGT) for the Project "Modelo Cuantitativo de Red Bayesiana con capacidad predictiva de la gravedad del accidente en función de los comportamientos y actuaciones de las personas", ref. SPIP2015-1852 and was pursued with the research project "Modelización mediante técnicas de machine learning de la influencia de las distracciones del conductor en la seguridad vial. Diseño de un sistema integrado: simulador de conducción, eye tracker y dispositivo de distracción. Ref. BU300P18" supported by funds from FEDER (Fondo Europeo de Desarrollo Regional - Junta de Castilla y León).
Acknowledgments: As well as the funders, we would like to thank the Spanish General Directorate of Traffic (DGT) "Dirección General de Tráfico" for providing the data to undertake this study.

Conflicts of Interest:
The authors declare no conflict of interest.