Network models of driver behavior

The way people behave in traffic is not always optimal from the road safety perspective: drivers exceed speed limits, misjudge speeds or distances, tailgate other road users or fail to perceive them. Such behaviors are commonly investigated using self-report-based latent variable models, and conceptualized as reflections of violation- and error-proneness. However, attributing dangerous behavior to stable properties of individuals may not be the optimal way of improving traffic safety, whereas investigating direct relationships between traffic behaviors offers a fruitful way forward. Network models of driver behavior and background factors influencing behavior were constructed using a large UK sample of novice drivers. The models show how individual violations, such as speeding, are related to and may contribute to individual errors such as tailgating and braking to avoid an accident. In addition, a network model of the background factors and driver behaviors was constructed. Finally, a model predicting crashes based on prior behavior was built and tested in separate datasets. This contribution helps to bridge a gap between experimental/theoretical studies and self-report-based studies in traffic research: the former have recognized the importance of focusing on relationships between individual driver behaviors, while network analysis offers a way to do so for self-report studies.

Observations such as these function as an empirical motivation for creating the network models reported in the present paper.
Figure 1 shows schematically the fundamental assumptions of latent variable models. Individual behaviors, such as speeding or misjudging speed, are assumed to reflect the level of the underlying latent variable and measurement error. Background factors such as enjoying speed figure as predictors of the latent variables. Importantly, background factors are unrelated to individual driver behaviors, which are assumed causally inefficacious. Typically, a dependent variable of interest, such as the number of crashes, is regressed on sum variables representing violations and errors 16. One distinct benefit of such models is their simplicity: if the latent variables indeed manage to capture the important commonalities among the individual behaviors, they provide an extremely parsimonious representation of the data. In addition, the model shown in Figure 1 is, on purpose, a rather simple example of a structural equation model; these models are a flexible tool that would allow the researcher to add more latent variables and to represent much more complex relationships among them. Further, the model shown in Figure 1 is not intended to summarize theoretically or practically important findings in traffic research, but rather, together with Figure 2, to illustrate the kinds of relationships that may obtain between the different types of variables in latent variable models and network models, respectively.

Manuscript to be reviewed

Procedure
The "Learning to drive questionnaire", which includes the question items related to attitudes, was filled in after the practical driving test, prior to responding to the DEQ. Informed consent was inferred from returned postal questionnaires in accordance with the social research guidelines of Department of Transport.

Statistical analyses
Network analyses
The network analyses were based on the widely used 27-item version of the DBQ 3, 5, 30 together with items related to driving under the influence of alcohol and drugs, using the cell phone while driving and having to brake or swerve to avoid an accident, resulting in 31 DBQ items. These latter items are often omitted from latent variable models of the DBQ because of their low factor loadings 5, but they were now included due to their potential significance as determinants of other driving behaviors. Two types of network models were estimated. The cross-sectional network model was based on the 31 DBQ items together with nine items related to the background factors. The latter were chosen from among candidate variables using a series of exploratory network analyses (Supplementary Figs. S1-S3). The between-person network 26, 31 was formed by calculating average scores for each respondent across the four time points to represent the overall pattern of driving behaviors during the first years of learning to drive. The benefit of between-person networks is that they are less susceptible than cross-sectional networks to spurious effects and reporting biases 26 such as mood-congruent recall 32, as different biasing effects are likely to cancel each other out.
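The averaging step behind the between-person network can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were run in R); the function name `person_means` and the toy scores are hypothetical, and the respondent is assumed to have answered every item at all four time points.

```python
from statistics import mean

def person_means(waves):
    """Average each item over the time points for a single respondent.

    `waves` holds one list of item scores per time point (hypothetical layout).
    """
    n_items = len(waves[0])
    return [mean(wave[i] for wave in waves) for i in range(n_items)]

# Toy respondent: two items (labelled here as v5 and e15) at four time points.
# The scores are invented for illustration only.
respondent = [[2, 1], [3, 1], [3, 2], [4, 2]]
averaged = person_means(respondent)  # one between-person score per item
```

The resulting per-respondent averages, rather than the scores from any single time point, would then serve as input to the correlation matrix from which the network is estimated.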
In both models, individual questionnaire items correspond to the nodes of the network, while the edges represent partial correlations controlling for all other nodes. The procedure attempts to uncover a graph known as a Gaussian Graphical Model in which each node is independent of the rest given the values of immediately neighboring nodes as described by the Markov properties 33. In other words, an edge connecting two nodes indicates their conditional dependence given the other nodes. The edge weights were scaled to the joint maximum value of the two models to ensure the comparability of the results. All network analyses were performed in R 34 using the packages qgraph 35 and bootnet 36.
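To make the notion of an edge as a partial correlation concrete, the following Python sketch computes partial correlations from a correlation matrix by inverting it and rescaling the off-diagonal entries of the inverse. This is the standard unregularized relationship, not the regularized estimation performed with qgraph and bootnet; the helper names and the three-variable correlation matrix are invented for illustration.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(corr):
    """Partial correlation of each pair of variables given all the others."""
    prec = invert(corr)  # precision (inverse correlation) matrix
    n = len(corr)
    return [[1.0 if i == j
             else -prec[i][j] / (prec[i][i] * prec[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]

# Invented correlations for a chain A - B - C: A and C correlate only via B.
corr = [[1.00, 0.50, 0.25],
        [0.50, 1.00, 0.50],
        [0.25, 0.50, 1.00]]
pcor = partial_correlations(corr)
```

In this toy matrix the marginal correlation between A and C is 0.25, but their partial correlation given B is zero, so a network drawn from `pcor` would contain the edges A-B and B-C and no edge A-C.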
Some of the partial correlations are likely to differ from zero because of sampling variation and can be thought of as false positive findings 9. For this reason, the graphical LASSO 37 was used in estimating the networks based on polychoric correlations. The procedure constrains low values of partial correlations to zero, thus resulting in sparse models 23 [...] 43. The minimization of the GCVE aims at creating models that explain the maximum amount of variance without overfitting the model to the data at hand. This is achieved by trading some increase in bias for a reduction in variance 43.
The data set was randomly split into a training set and a test set (75 / 25 ratio, with N training = 864 and N testing = 288). The uneven ratio was chosen to enable a sufficiently large number of cross-validation splits, with initial model fitting and cross-validation taking place within the training set, followed by fitting the same model in the test set using the R-package glmnet 44. The same penalty, controlled by the regularization hyperparameter λ, was applied to all the predictors, which were standardized prior to analysis. Self-reported mileage was also log-transformed.

The results reported below are based on the maximum number of cases available for the respective analyses as described in the Methods section (N = 1173 for the between-person model and N = 8858 for the cross-sectional network model, respectively). First, the between-person network is shown in Fig. 3. It illustrates between-person differences in the connectivity of the driver behaviors during the first three years of learning to drive. Errors are shown as striped blue nodes and violations as striped red nodes. The presence of an edge between two nodes represents their conditional dependence when controlling for all other nodes.
First, in outline, violations and errors occupy different regions of the graph, and their distinction seems a sensible rough description of the data. However, if all violations and all errors were reflections of respective latent variables, we would expect all violations to be interconnected, all errors to be interconnected 23, and violations and errors to be independent of each other. In contrast, thematically related violations are clustered together and connected by strong edges, aggression-related nodes (v2, v4, v9) and speeding-related nodes (v5, v11) being a case in point. The edges are interpreted as showing, e.g., that drivers who were more likely than others to exceed speed limits within residential areas (v5) were also more likely to do so on highways (v11). Similarly, nodes related to substance abuse (v8, v12) were connected to each other and few other behaviors, except using the cell phone while driving (v6). Similar considerations apply to errors. For instance, the nodes related to forgetting something (e3, e6) shared a strong edge and were connected to few other nodes except absent-mindedness (e13).
Second, certain violations and errors were connected by relatively strong edges. Notably, drivers who exceeded speed limits within residential areas (v5) were more likely than others to tailgate other drivers (e15). Further, the tailgating drivers were more likely to need to brake or swerve to avoid an accident (e19). Similarly, crossing junctions against a red light (v3, a violation) was connected by roughly equally strong edges to other violations and errors.

[...] Table S5). The indices whose CS-coefficient exceeded the recommended value of 0.5 36 are shown in Fig. 4.
Both strength centrality (associations with immediate neighbors) and closeness centrality (associations with all nodes) of the between-person network indicated the presence of a group of nodes that were especially central in the network. Closeness centrality is perhaps the more revealing in this context, as it capitalizes less on single strong edges. The node with the highest closeness centrality was v5 (speeding within a residential area), followed by nodes v3 (crossing junction on red), e15 (tailgating), e10 (queuing, nearly hit car), e17 (fail to check mirror) and v11 (speeding on motorway); nodes e10, v5, e15 and v3 also had high strength centralities. In addition, nodes e14 (getting into a wrong lane) and e7 (failing to notice pedestrians) had a high [...]

The cross-sectional network model (Fig. 5) shows the DBQ variables in relation to various background factors: attitudes (yellow), self-judged improvement needs (green) and self-image as a driver (lilac). The model is based on data collected at the first time point, 6 months post-licensure. It appears at first sight quite different from the between-person model, but this is largely because the node placements are different due to the use of the Fruchterman-Reingold algorithm. Similarities between the two models are revealed by examining the connection strengths and the centrality indices (Fig. 6). The indices whose CS-coefficient exceeded the recommended value of 0.5 36 are shown in Fig. 6. The speeding-related nodes (v5 and v11) were again central together with node e7. Node e16 (missing give way signs) now had a high strength centrality, while in the between-person network the corresponding centrality value was average; on the other hand, the connectivity patterns of the node were similar. The path v11-v5-e15 was also present in this model, even though node e15 was now slightly less central. A prominent difference between the models was that the aggression-related nodes v2, v4 and v9 were more central in the cross-sectional model.

Looking at the background factors, the self-judged improvement needs were strongly interconnected. They were, however, also related to driving behaviors in a revealing pattern. In general, they shared negative associations with different violations. In particular, the judged need for improvement in changing lanes (in4) was related to less speeding (v11) and less pushing into a lane (v7), while the perceived need of improving controlling the car (in1) was associated with having problems with gears (e1) and car controls (e4). Negative attitudes to speeding (a1) and overtaking on the inside (a2) were related to fewer self-reported behaviors of those kinds (v11 and v1, respectively). On the other hand, perceiving oneself as a fast driver (si2) was positively associated with speeding (v5 and v11) and racing from the lights (v10). The drivers' self-judgment of themselves as irritable (si1) was closely associated with the anger-related nodes, even though the node was quite redundant as judged by the clustering coefficient. Self-perceived need of improvement in hazard-perception (in3) had a similarly high value of the clustering coefficient.

The self-reported driving behaviors, age, sex and mileage at the first time point were used in predicting subsequent crashes. As seen in Table 1, the naive Poisson model fit the training data best (as it should), but its performance deteriorated notably in the hold-out data. In fact, the model fit the hold-out data worse than the null model without predictors, offering a dramatic demonstration of how such models sometimes overfit data. The elastic net model fit the training data the worst, but its fit to the hold-out data was almost identical to that of the ridge regression model and is to be preferred due to its parsimony (16 vs. 42 parameters, respectively). Table 1 [...]

[...] (error-proneness and violation-proneness). The empirical part of the study dealt with the first three questions. Research related to the fourth is also discussed, as the context-sensitivity of behavior is an important motivation for network models of human behavior.

The dynamics of the relationships between the typical levels of individual violations and errors were examined by constructing the between-subject network. The analysis showed that drivers who were more likely than others to exceed speed limits within residential areas were also more likely to tailgate the vehicle in front; further, the tailgating drivers were more likely than others to need to brake or swerve to avoid an accident. These associations were interpreted as causal hypotheses: drivers who exceed speed limits will likely catch up with the traffic flow and may as a consequence need to react. Similarly, crossing a junction against a red light (a violation) was associated with failing to notice pedestrians and with other errors. Importantly, such associations between violations and errors are difficult to accommodate within latent variable models of driver behavior.
Exceeding speed limits within residential areas appeared as the most closeness central and the second-most strength central node in the between-subject network. Insofar as the centrality of a node can be taken to reflect the causal connections emanating from that node, a successful intervention of reducing speeding might affect the other behaviors directly or indirectly linked to it. Care must be taken with such interpretations, since the edges in the network may well reflect other factors than a cause-effect relationship 23, 24, 25, 66, 67, but at the very least, such causal hypotheses can be formulated based on the present results. In fact, network models benefit traffic research precisely in that they encourage thinking of the dynamics of the behaviors. In contrast, latent variable models remain silent in this respect, as they conceptualize individual behaviors as causally passive indicators of the latent variables 15. Thus, if we take the latent variable view seriously, we can only influence individual behaviors through manipulating the latent variables: whether we want to reduce drunk driving or speeding, we should aim at the drivers' rule-breaking tendencies, because influencing an individual violation has no effect on other behaviors under the latent variable view.
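The two centrality indices discussed above can be sketched in a few lines of Python. The sketch assumes the conventions commonly used for weighted networks: strength is the sum of absolute edge weights at a node, and closeness is the inverse of the summed shortest-path distances, with the length of an edge taken as the inverse of its absolute weight. The edge weights below are invented for illustration and are not estimates from the study; only the node labels are borrowed from it.

```python
def strength(weights):
    """Sum of absolute edge weights attached to each node."""
    return {node: sum(abs(w) for w in nbrs.values())
            for node, nbrs in weights.items()}

def closeness(weights):
    """Inverse of the summed shortest-path distances from each node.

    Edge length is assumed to be 1 / |weight|, so strong edges are short.
    """
    nodes = list(weights)
    inf = float("inf")
    dist = {a: {b: (0.0 if a == b else inf) for b in nodes} for a in nodes}
    for a, nbrs in weights.items():
        for b, w in nbrs.items():
            if w != 0:
                dist[a][b] = 1.0 / abs(w)
    # Floyd-Warshall: relax every pair of nodes through every intermediate node.
    for k in nodes:
        for a in nodes:
            for b in nodes:
                if dist[a][k] + dist[k][b] < dist[a][b]:
                    dist[a][b] = dist[a][k] + dist[k][b]
    return {a: 1.0 / sum(dist[a][b] for b in nodes if b != a) for a in nodes}

# Hypothetical symmetric toy network; the weights are made up.
toy = {
    "v11": {"v5": 0.4},
    "v10": {"v5": 0.3},
    "v5": {"v11": 0.4, "v10": 0.3, "e15": 0.2},
    "e15": {"v5": 0.2, "e19": 0.25},
    "e19": {"e15": 0.25},
}
node_strength = strength(toy)
node_closeness = closeness(toy)
most_central = max(node_closeness, key=node_closeness.get)
```

In this toy graph the node labelled v5 has both the highest strength and the highest closeness, because every shortest path between the speeding-related nodes and the error nodes runs through it.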
The relationships between background factors and traffic behaviors were examined in the cross-sectional network model based on data collected six months post-licensure. Drivers who perceived themselves in need of improving lane-changing skills were less likely to report speeding and pushing into a lane. Further, the respondents reported behaving according to their attitudes: a preference of decreasing speed limits was associated with less speeding. On the other hand, perceiving oneself as a "fast driver" was associated with more speeding and racing from the lights. In short, individual background factors were related to driver behaviors in an understandable pattern.
Even though it is difficult to explain direct associations between driver behaviors using latent variable models, they pose a challenge to the network view: certain behaviors are more likely to occur together than others; why is this so if there are no latent variables? One can begin to answer by considering relationships between behavior and environment, as illustrated by the following quote from a study on driver irritation and aggression:

"... Drivers who enjoy a somewhat faster speed than other drivers will more often be obstructed by other traffic, and therefore they will become irritated more often and be more likely to educate other road users. They probably also will become more irritated than other drivers when obstructed, because they want a faster progress" 55.
In other words, people have characteristics such as emotions, attitudes and personality components 56 that affect their behavior in traffic, which is not only something to be explained, but also a variable that feeds back into the system of emotions, perceptions and other behaviors. Further, not all behavioral characteristics are equally compatible with each other. For instance, the people described in the quote may be unlikely to make errors related to car controls, which are perhaps related to lack of experience or lack of interest in cars. In technical terms, network models exhibit non-trivial topology 47. Further, they can, as also demonstrated in the present study, accommodate background variables such as beliefs, and account for why beliefs, feelings and behaviors become aligned 10. For instance, the drivers described in the quote are perhaps likely to consider speed limits in general too low to avoid cognitive dissonance between their actions and beliefs.
The above example illustrated stable differences between people. However, people do not always behave in a stable manner; rather, their behavior is context-dependent. The idea has much in common with the cognitive-affective personality system 57 (CAPS) theory developed within personality psychology, which posits if-then rules that describe how someone typically reacts in a certain type of situation. CAPS is influenced by connectionist models, and characterizes behaviors, memories and emotions as differently activated by the situational context and each other. The interconnected elements are described as a network in which activation is sustained by feedback loops. Such situation-specific rules might apply to, e.g., young people driving with their friends vs. with a mother and a baby on board. In the former situation, the driver's repertoire of certain risky behaviors is more highly activated and activation spreads through excitatory links. For example, the threshold of speeding might be lowered, activating an impulse to race other drivers. In short, the network of driving behaviors can be seen as being in two qualitatively different states distinguished by the connection strengths between the behaviors. As another example, it has been shown that young mothers have an elevated crash risk when driving with an infant passenger compared to driving alone 58; the dynamics of their behavior in these contexts are likely to differ.
In addition to presenting network models of driver behavior, this study involved predicting crashes from individual behaviors. In contrast, previous self-report studies have mainly used latent variables in crash prediction (although see Wallén-Warner et al. 59 for an exception). The novel contributions of the present study were three-fold. First, crashes were truly predicted from earlier behavior. Second, predictive models were first fit in a training subsample and then verified in an independent subsample of data. Third, regularized regression was used to prevent overfitting.
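The second and third points can be sketched in Python under stated assumptions. The split below reproduces only the 75 / 25 proportions reported in the Methods (864 and 288 cases); the seed and function names are hypothetical. The penalty function follows the elastic net parametrization documented for glmnet, in which a mixing parameter alpha interpolates between ridge (alpha = 0) and LASSO (alpha = 1); the coefficient values are invented.

```python
import random

def train_test_split(ids, train_fraction=0.75, seed=2024):
    """Randomly split case identifiers into a training and a hold-out set."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def elastic_net_penalty(coefs, lam, alpha):
    """Penalty added to the model's loss: lam * ((1 - alpha)/2 * L2 + alpha * L1)."""
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return lam * ((1.0 - alpha) / 2.0 * l2 + alpha * l1)

train_ids, test_ids = train_test_split(range(1152))

# Invented coefficients: the same vector penalized as ridge and as LASSO.
ridge_penalty = elastic_net_penalty([1.0, -2.0], lam=1.0, alpha=0.0)
lasso_penalty = elastic_net_penalty([1.0, -2.0], lam=1.0, alpha=1.0)
```

Because the L1 part of the penalty can shrink coefficients exactly to zero, an elastic net model can retain fewer predictors than a ridge model fitted to the same data, which is the sense in which it is the more parsimonious of the two.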
Three predictive models were tested, out of which the elastic net model 45 [...] the ridge regression model, so it is to be preferred, other things equal. In the elastic net model, predictors included displaying anger while driving and driving fast (disregarding speed limits and racing from traffic lights). Further, errors in visual search (failing to notice people, missing signs) and in controlling the car (driving off at wrong gear, forgetting the handbrake) were included in the model. Interestingly, hitting something when reversing, which is itself a minor crash 16, predicted future crashes. Remarkably, the naïve Poisson model including all predictors fit the hold-out subsample worse than the null model with no predictors, offering a dramatic demonstration of the dangers of overfitting.
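How a model with predictors can fit hold-out data worse than a null model is easy to reproduce with toy numbers. The Python sketch below uses the Poisson deviance as the fit measure, which is an assumption: the measure reported in Table 1 is not reproduced here. The hold-out counts and both sets of predictions are invented.

```python
from math import log

def poisson_deviance(observed, predicted):
    """Poisson deviance 2 * sum(y * log(y / mu) - (y - mu)); lower is better."""
    total = 0.0
    for y, mu in zip(observed, predicted):
        term = y * log(y / mu) if y > 0 else 0.0  # y * log(y) -> 0 as y -> 0
        total += 2.0 * (term - (y - mu))
    return total

# Invented hold-out crash counts for six drivers.
holdout = [0, 0, 1, 0, 2, 0]

# Null model: every driver gets the same rate (e.g. a training-set mean).
null_pred = [0.5] * len(holdout)

# Overfitted model: confident predictions that do not transfer to new drivers.
overfit_pred = [1.5, 0.05, 0.05, 2.0, 0.1, 0.6]

null_dev = poisson_deviance(holdout, null_pred)
overfit_dev = poisson_deviance(holdout, overfit_pred)
```

With these toy values the overfitted predictions yield a much larger deviance than the constant prediction, mirroring the pattern described for the unpenalized model with all predictors.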
The present study has its limitations. First, network models are motivated by modelling the components of a phenomenon, with a component defined as something having unique causal relations with the rest of the network 56. The DBQ has been psychometrically developed to maximize reliability, which has resulted in a certain redundancy of the items. In this study, this shows as high values of the clustering coefficient for the nodes related to aggression and speeding, which could in future studies be represented by single nodes. On the other hand, developing a novel questionnaire describing potentially causally related behaviors, thoughts and emotional reactions is another option. Further, the behaviors examined here are influenced by other road users, which could not be accounted for. For instance, aggressive behavior is difficult to understand without knowing something about its target. It is naturally possible to take an even more critical perspective and argue that none of the relationships indicate potential causal relationships and driver behaviors should continue to be viewed as being caused by latent variables. Even so, the present study presents a challenge: why are variables that commonly load on different factors strongly correlated? Finally, it is likely that general psychological characteristics, such as impulsivity, conscientiousness, neuroticism, agreeableness, attention and memory capacity etc. would explain the associations observed in the present study. The fact that information on such characteristics was not available is a clear limitation of the present study.
In addition to the substantive questions, some methodological decisions were problematic. First, polychoric correlations that estimate normally distributed variables underlying ordinal input variables were used, even though the variables were positively skewed. Although the practice is widespread in psychometric network models, it is not optimal 60. Further, listwise deletion of missing data was performed; less wasteful methods are under development for network models but not yet available 60. Further, it has been shown that young respondents may give exaggerated responses to questions they find funny, a response bias dubbed "mischievous responding" 61; this could affect items such as drug use while driving. On the other hand, if such biases are transient in nature, between-person networks are likely a suitable method to use 26.
Choosing the correct components of driver behavior is a central issue to be tackled in future network models of driver behavior. In addition, future studies should aim at developing a network theory of driver behavior instead of merely applying a novel modelling tool. A recently developed intricate error taxonomy 54 might provide a good starting point together with factors contributing to such errors. Also, the need for self-report studies will remain in the future even though studies involving instrumented vehicles have become ever more intricate 62, because physical measurements are underdetermined psychologically 29: for instance, a sudden acceleration can be due to either racing from the lights or unfamiliarity with car controls.
Further, the context-dependent nature of driving needs to be taken into account in future studies. Drivers may behave differently depending on who they are traveling with; similarly, investigating drivers' developing situation-awareness of when to desist from violating rules has been called for 29, and network models are an ideal tool for investigating such developmental trajectories 31. Another fascinating future direction is to consider the effects of the driver's state: being fatigued, intoxicated, stressed, in a hurry or in a strong emotional state could conceivably cause the network of driver behaviors to occupy qualitatively different states. Existing task analyses and models of driving situations 63, 64 are likely to be a good starting point, because individuals may behave in a more-or-less stable manner in a certain type of situation, but not across situations 57. Intriguing avenues for future research await those willing to look into the networks of driver behavior.

Conclusions
Representing violations and errors on the road as interconnected networks of behaviors, cognitions and emotions makes it possible to formulate data-driven hypotheses on causal connections between individual violations and errors. For instance, exceeding speed limits may make it more likely for drivers to end up tailgating other vehicles, which in turn may make it more likely that they need to brake or swerve to avoid accidents. In contrast, previous psychometric work has been based on the latent variable view, according to which individual errors and violations function as (nothing but) reflections of underlying psychological properties, error-proneness and violation-proneness. It is argued herein that this is an overly simplified view of the determinants of traffic behavior, and that the network view provides a useful novel point of view in this respect.
More generally, network models show promise for bridging a gap between experimental and theoretical work in traffic research on the one hand and self-report-based research on the other hand. The latter has commonly assumed the existence of a small number of mutually exclusive psychological mechanisms whose operation is reflected in respective sets of driver behaviors (e.g. violations and errors) that can be represented using latent variables or sum scores. On the other hand, the importance of individual driver behaviors, such as speeding, is recognized in theories of driver motivation 48, 49, studies that aim at determining reasons for speeding 50-52 and engineering models of accidents 53. Similarly, an error taxonomy involving action errors, cognitive and decision-making errors, observation errors and information retrieval errors has been proposed 54, indicating the need to differentiate errors in a fine manner. Network models that focus on individual driving behaviors and their interrelationships offer a novel point of view from which to integrate the results of self-report studies with these lines of research.
In addition to presenting network models of driver behavior, this paper involved predicting crashes based on individual errors and violations. This was done using cross-validated penalized regression analysis, which resulted in a model that was both predictive of accidents and generalizable to new data. Similarly to the network models, the predictive models can be [...]

Error
e1: drive away from traffic lights at too high a gear
e2: attempt to overtake and hadn't noticed signalling right
e3: forget where left car in carpark
e4: switch on one thing when meant to switch on other
e5: pull out of junction so far that driver has to let you out
e6: realised have no recollection of road been travelling
e7: failed to notice people crossing when turned into sidestreet
e8: misread signs and taken wrong turning off roundabout
e9: turning left, nearly hit cyclist on inside
e10: when queuing to turn left nearly hit car in front
e11: misjudged speed of oncoming vehicle when overtaking
e12: hit something when reversing that hadn't seen
e13: noticed ending up on a different road than intended
e14: get into wrong lane when approaching roundabout/junction
e15: drive so close to car that would not be able to stop
e16: missed giveway signs and avoided colliding with traffic
e17: failed to check rear-view mirror before manoeuvring
e18: brake too quickly on slippery road or steer wrong in skid
e19: had to brake or swerve to avoid accident

Violation
v1: overtake a slow driver on inside
v2: sound horn to indicate annoyance
v3: crossed junction knowing lights have turned against you
v4: become angered by driver and given chase
v5: disregarded speed limit on residential road
v6: used mobile phone without hands free kit
v7: stay in motorway lane you know will be closed
v8: drive when suspect over legal alcohol limit
v9: become angered by driver and indicated hostility
v10: raced away from traffic lights to beat other driver
v11: disregarded speed limit on motorway
v12: drove after taking drugs which affected you