Characterisation of Motorway Driving Style Using Naturalistic Driving Data

The study of measurable differences between drivers has ramifications for several subfields in traffic and transportation research. Better understanding of the variability in individual driving styles would be especially useful for understanding driver preferences and psychological mechanisms of vehicle control, and for developing more realistic traffic simulations. In our study based on a large naturalistic data set, we investigated the driving style of 76 individuals driving in a motorway setting. We discovered that the majority of between-driver variation in keeping longitudinal and lateral safety margins, lane changing frequency, acceleration and speed preference can be reduced to two dimensions, which we interpret as habitualised motives centred around mental effort and expediency. © 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
The descriptions of individual variability in driving behaviour have been variously called "driving style", "driving patterns", or "driver heterogeneity". The meaning of these terms is rather fuzzy. The difference lies not only in semantics but also in the context in which they appear: the traffic simulation literature speaks of "heterogeneity", while "style" is the preferred term among traffic psychologists. Attempts at definitions have led only to more definitions, as researchers have not worked within the same conceptual or mathematical space. Of the several scales and measures that have been suggested, none has been accepted as the standard. This is partly due to the complexity of the phenomenon itself, since recognising and including the relevant variables that influence habitual driving behaviour is very difficult. In addition, different fields of study may have different aims in their modelling efforts.
In their review, Sagberg, Selpi, and Engström (2015) outlined a framework in which driving style is proposed to contain the persistent habitual behaviours that differ between drivers (or groups of drivers). These habitual behaviours are a product of environmental influence (socio-cultural norms or values), personal preferences and technical or instrumental factors, such as the vehicles that enable the behaviour in the first place. According to Sagberg et al., these habitual behaviours could be described as "global styles". These are composed of "specific styles", which in turn would manifest themselves through various behavioural indicators. The modelling of these global or specific styles is still in a conceptual phase. For instance, behaviours such as "tailgating" form part of an "aggressive style" and manifest as some measurable threshold of following distance or time gap. Sagberg et al. suggest that underlying motives might cause a particular style to form. For example, specific styles associated with aggressiveness might be caused by a desire to punish others for offences.
Research on driving styles is also important in the study of psychological mechanisms for vehicle control. It provides information on the norms and variations of drivers' habitual behaviour as they engage in the perceptual-motor loop of driving a vehicle. In a naturalistic task, this reveals the driver's preferences, which help in analysing the driving task and the motivations behind control actions. The various scales and classifications describe common factors behind everyday actions and stated preferences, which may be used to update existing theories of driver behaviour.
In their work on perception of risk in driving, Näätänen and Summala (1976) and Summala (2007) also establish a role for motives in determining driving behaviour. Certain excitatory motives push drivers towards greater speeds (and hazards), while the subjective feeling of risk limits those behaviours. By their account, driving is a self-paced task in which drivers exert control through various safety margins, regulating the time available for what they are doing. Over time, these behaviours turn into habits that are upheld as a comfortable personal norm in driving. It is important to note that in both cases it is assumed that the motives manifest themselves in observable behaviour. Itkonen et al. (2017) found that "jerky driving" and "tailgating", as measured by acceleration, jerk, and time gap, were correlated on a between-subjects level. These behaviours could be represented by a uni-dimensional measure, which they called "driving intensity". This finding prompted further questions about Sagberg et al.'s proposed structure of driving styles. If people could be placed on a single dimension of "observed longitudinal driving style", perhaps other habitual behaviours could be related to this measure as well.
In this study, we investigated a naturalistic field operational test (FOT) data set containing motorway driving in order to uncover statistical relationships between well-defined driving behaviours. We demonstrate psychologically plausible factors to describe driving style using longitudinal and lateral safety margins, car-following, lane-changing and speed preference. We argue that this work is particularly useful for understanding driver preferences and psychological mechanisms of vehicle control. It also builds a base of knowledge for the simulation of driver heterogeneity. We build on the broad theoretical foundation laid by Sagberg et al. (2015), and argue for informative, robust and replicable measures of driver behaviour.

Previous work
Better measures of driving style are needed. A lot of work has been done in traffic psychology on differences in self-reported driving style through questionnaires (Reason, Manstead, Stradling, Baxter, & Campbell, 1990; French, West, Elander, & Wilding, 1993; Lajunen & Summala, 1995; Taubman-Ben-Ari, Mikulincer, & Gillath, 2004; Martinussen, Møller, & Prato, 2014). Self-reported methods can produce impressive results when coupled with observed measures. Even so, we omit this literature from this paper in favour of observed behaviour. In particular, if we want to achieve higher-fidelity simulation models that account for between-driver differences, we believe we must begin with an inquiry into what is readily measurable.
Some research on variability in driving patterns using observable data has been done in relation to exhaust emissions and fuel consumption. In her study of driving patterns, Ericsson (2000) investigated the variability in the driving patterns of 12 drivers driving a controlled route in Lund, Sweden. The route included five different road types, including urban roads, all but one of which were two-lane roads with a speed limit of 50 km/h. The resulting analysis of variance uncovered the largest significant effect for road type, with between-driver variation coming second. The variables under study were estimated parameters of the speed, acceleration and deceleration distributions. In a follow-up study, Ericsson (2001) conducted a similar experiment with a group of 30 families, each using one of five vehicles of different models. The vehicles could be driven by several people during the two-week trial period. She defined 62 driving pattern parameters and reduced them to 16 factors, which were then correlated with emission models for two vehicles. The main finding of the study was a moderate correlation between acceleration and stopping behaviour and fuel consumption.
As investigations into driving style, Ericsson's studies have some limitations. These include the small group of participants, the variable vehicle types and the lack of measures relating the vehicle to its immediate environment, such as safety margins. Moreover, the study of Ericsson (2001) was not designed to reveal between-driver differences. Instead, it was designed to classify a large group of driving samples without taking individual drivers into account.
The practical importance of empirical grounding for driver variability becomes apparent when considering traffic simulation. Some simulation studies have shown that heterogeneous platoons behave differently from those in traditional microscopic models, where drivers are assumed to be identical (Ossen & Hoogendoorn, 2007). It is possible that the stochastic nature of congestion formation can be better understood by inducing the stochasticity on the level of individual vehicles rather than traffic flows. Simulating the effects of automation on traffic phenomena also requires introducing realistic variability to the human drivers. Car-following models have been criticised for being psychologically implausible or in need of development to incorporate human factors (Van Winsum, 1999; Saifuzzaman & Zheng, 2014). Ossen and Hoogendoorn (2007) studied the effects of introducing heterogeneity in the parameters of a multi-leader car-following model (the so-called Helly model), with the aim of analysing its effects on the stability of traffic. The study found differences between the parameter configurations. However, it is uncertain whether these generalise to real traffic, as neither the car-following models nor the induced heterogeneity were validated empirically. In a follow-up study, Ossen and Hoogendoorn (2011) calibrated several car-following models to novel real-world trajectories. They concluded that different drivers may employ different stimuli in their car-following responses, since different models suited particular drivers better than others. While this is possible, it has been very difficult to calibrate car-following models faithfully to real trajectories. Moreover, during calibration, variation within a single driver and trip can be larger than the between-driver variation (Brockfeld, Kühne, & Wagner, 2004; Kesting & Treiber, 2008).
In a recent study, James, Hammit, and Ahmed (2018) calibrated three car-following models and studied the resulting parameter distributions in relation to biological and socio-economic factors such as age, gender and income. The models were the Gipps model (Gipps, 1981), the Intelligent Driver Model (Treiber, Hennecke, & Helbing, 2000), and Wiedemann 99 from the VISSIM software suite. Minor differences were found across these factors, but no model was found to consistently outperform the others.
In summary, the literature suggests that the current car-following models are unable to capture a single driver's behaviour at the fidelity required for studying between-driver differences.

Objective
In this paper we identify underlying factors in motorway driving style by investigating a dataset from a naturalistic field operational test. This contributes to the theoretical debate on driving style. We also argue that the identified factors can be useful for improving traffic simulation models to account for individual variation.

Methods
The methodology of the study consists of defining the criteria of inclusion for participants and trips, extracting the relevant driving segments of interest, and deciding the methods of statistical analysis.

EuroFOT dataset
The data used in this study come from the Swedish test site of the euroFOT project (Kessler et al., 2012). EuroFOT stands for European Large-Scale Field Operational Tests on In-Vehicle Systems. It was an EU-funded project, active from 2008 to 2012, that studied the effects of Advanced Driving Assistance Systems (ADAS) (Kessler et al., 2012). The Swedish portion of the data was collected from 100 instrumented vehicles driven by more than 200 participants, each of whom contributed a year's worth of driving. The instrumented vehicles were Volvo Cars models XC70 and V70.
EuroFOT was designed to study the effects of ADAS, and its design reflects this with a split into "baseline" and "test" phases. The baseline phase took up four months of the twelve-month driving period. During this period, all driving assistance features except the normal cruise control system were disabled. Only the baseline phase was included in our study, since we did not want the driving assistance systems to interfere with the participants' behaviour. One of the project reports stated that the participants' headway keeping differed depending on the level of automation of the cruise control (Benmimoun, Pütz, Zlocki, & Eckstein, 2013). Therefore, we further excluded driving with the regular, non-adaptive cruise control from our study.
Importantly, we constrained our data to contain only motorway driving. This was done to ensure that the driving environment remained reasonably homogeneous. Most motorways in Sweden have at least two lanes in each direction, permitting overtaking depending on traffic conditions. The minimum amount of motorway driving required for each participant included in this study was 135 min. This lower bound is arbitrary. It is approximately three times the amount of driving required by Itkonen et al. (2017) in a controlled, one-lane simulator study, and we believe this is enough to prevent variation in traffic conditions from having an overbearing effect. Additional, measure-specific constraints are discussed in Section 2.2.
Only data from drivers who had given consent for their data to be used for scientific publications and future studies (i.e., after the euroFOT project) were considered for inclusion. After all the constraints were applied, 76 participants met the criteria for inclusion in the study. Of these, 32% were female, with a mean age of 45.4 years (SD = 9.32). The male participants had a mean age of 49.3 years (SD = 8.90). The female participants drove on average 8.70 h, whereas the male participants drove on average 9.95 h. The participant with the most motorway driving drove for over 32 h.

Measures of driving style
We chose ten observable measures to represent driving style (Table 1). Six measures were based on longitudinal control. The median time gap of each participant was chosen to represent headway keeping (or conversely, tailgating), along with the median time-to-collision when the headway was shrinking. The interdecile ranges (IDR) of the longitudinal acceleration and jerk distributions, along with the medians of the absolute values of longitudinal acceleration and jerk, represent the "jerky driving" style of Sagberg et al. (2015). We interpret the deciles of the distributions as limits of comfort for each driver, as situations at the very extremes are rare and at least partly unintended. For lateral control of the vehicle, we took the median absolute lateral acceleration and the median time-to-lane-crossing. In addition, we used lane changes per hour as an indicator of the tendency to overtake. Finally, for speed preference, we used a standard score of the average deviation from the posted speed limit. For each driver $k$, we computed
$$z_k = \frac{\bar{v}_k - \bar{v}}{S}$$
Here, there were $k$ drivers with $n$ samples each. $\bar{v}_k$ is the average difference, over driver $k$'s samples, between the current speed limit and the driver's instantaneous vehicle speed. $\bar{v}$ is the sample mean of $\bar{v}_k$ over all drivers, and $S$ is the corresponding sample standard deviation.
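The distribution-based measures above reduce each driver's pooled motorway samples to a few robust statistics. The sketch below shows one way to compute them. It is a minimal illustration under stated assumptions and is not the authors' pipeline. The function names, the assumption of a constant sampling interval `dt`, and the derivation of jerk by numerical differentiation of acceleration are all hypothetical.

```python
import numpy as np

def interdecile_range(x):
    """Spread between the 90th and 10th percentiles of a signal."""
    p10, p90 = np.percentile(np.asarray(x, dtype=float), [10, 90])
    return float(p90 - p10)

def distribution_measures(acc, lat_acc, dt):
    """Per-driver summaries of longitudinal and lateral control (hypothetical helper).

    acc, lat_acc: 1-D arrays (m/s^2) of one driver's motorway samples.
    dt: sampling interval in seconds, assumed constant.
    """
    acc = np.asarray(acc, dtype=float)
    lat_acc = np.asarray(lat_acc, dtype=float)
    # Assumption: jerk is the numerical time derivative of longitudinal acceleration.
    jerk = np.gradient(acc, dt)
    return {
        "acc_idr": interdecile_range(acc),
        "jerk_idr": interdecile_range(jerk),
        "acc_median_abs": float(np.median(np.abs(acc))),
        "jerk_median_abs": float(np.median(np.abs(jerk))),
        "lat_acc_median_abs": float(np.median(np.abs(lat_acc))),
    }

def lane_changes_per_hour(n_lane_changes, driving_time_s):
    """Lane-change frequency normalised by total motorway driving time."""
    return n_lane_changes / (driving_time_s / 3600.0)
```

In practice, jerk should be differentiated within each continuous trip segment before pooling, so that trip boundaries do not create spurious spikes.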
The speed preference represents individual drivers' tendency to drive faster or slower than their peers. The measure included only the road sections without a leading vehicle present, in order to describe the speed the drivers were willing to drive in unconstrained traffic. The speed limits varied between 70 and 120 km/h, which is typical for Swedish motorways. This measure does not consider whether speeding behaviour is consistent across speed limits; for example, some drivers may speed only at low speed limits.
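The speed-preference score combines two steps: a free-flow filter and a standardisation across drivers. The sketch below illustrates both. It assumes that per-sample arrays of speed, posted limit and a boolean "leading vehicle present" flag are available. These names are hypothetical. We also choose the sign so that positive values mean driving faster than the limit. The standard score itself only flips sign under the opposite convention.

```python
import numpy as np

def mean_speed_deviation(speed, speed_limit, has_leader):
    """Average deviation from the posted limit, using free-flow samples only.

    Assumption: positive values mean driving faster than the limit.
    """
    speed = np.asarray(speed, dtype=float)
    speed_limit = np.asarray(speed_limit, dtype=float)
    free_flow = ~np.asarray(has_leader, dtype=bool)  # no leading vehicle present
    return float(np.mean(speed[free_flow] - speed_limit[free_flow]))

def speed_preference_scores(mean_deviations):
    """Standard score z_k = (v_k - mean) / S of each driver's mean deviation."""
    d = np.asarray(mean_deviations, dtype=float)
    return (d - d.mean()) / d.std(ddof=1)  # ddof=1: sample standard deviation
```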
The measures were chosen because of the relative ease in interpreting their meaning, and because they can be measured robustly with common instruments, which makes our results replicable.
In many older studies, driver behaviour has been measured with little emphasis on the events and surroundings of the vehicle. Speed choice alone does not tell much about a driver's preferred speed unless we consider the circumstances that the driver is reacting to. For that reason, whenever possible, we place high value on the "stimulus" as well as the "response". Accordingly, we consider speed relative to other drivers, rather than average speed alone, to be the more descriptive measure of habit or preference. Similarly, the time gap distribution is essential to understanding what traffic simulation considers the most important speed-modifying stimulus of all: the vehicle in front of the driver. The time gap, time-to-collision and time-to-lane-crossing are also interpreted more generally as time margins or safety margins, which are regarded as important control variables (Summala, 2007). The importance of time gap and time-to-collision as the main safety indicators in car-following situations has been verified by analysis of real driving data (Liu, Selpi, & Fu, 2018) and by a simulator study (Mai, Wang, & Prokop, 2019).
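The three time margins can be written as simple ratios of a distance to a rate. The sketch below uses common textbook definitions. The paper does not state exactly how these were derived from the euroFOT signals, so the following are assumptions:

* the gap is the bumper-to-bumper distance in metres and speeds are in m/s;
* time-to-collision is defined only while closing in on the leader;
* time-to-lane-crossing is a first-order approximation from the lateral distance to the lane boundary and the lateral speed towards it.

```python
import numpy as np

def time_gap(gap_m, own_speed_ms):
    """Time gap (s): distance to the leader divided by own speed."""
    gap_m = np.asarray(gap_m, dtype=float)
    v = np.asarray(own_speed_ms, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(v > 0, gap_m / v, np.nan)

def time_to_collision(gap_m, own_speed_ms, lead_speed_ms):
    """TTC (s), defined only while the headway is shrinking; NaN otherwise."""
    gap_m = np.asarray(gap_m, dtype=float)
    closing = np.asarray(own_speed_ms, dtype=float) - np.asarray(lead_speed_ms, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(closing > 0, gap_m / closing, np.nan)

def time_to_lane_crossing(dist_to_boundary_m, lateral_speed_ms):
    """First-order TLC (s); lateral speed is positive towards the boundary."""
    d = np.asarray(dist_to_boundary_m, dtype=float)
    vy = np.asarray(lateral_speed_ms, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(vy > 0, d / vy, np.nan)

# Per-driver medians then ignore undefined samples, e.g. np.nanmedian(ttc).
```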

Analysis
For analysing the covariation between the driving style features we used Principal Component Analysis (PCA) and Exploratory Factor Analysis (EFA). For easier interpretation, we reversed the measures time gap, time-to-collision and time-to-lane-crossing.
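Before PCA or EFA, the per-driver measures form a drivers × measures matrix. The reversal of the time margins can be applied to this matrix, followed by standardisation. The sketch below makes several assumptions: that "reversed" means negation, since the paper does not say whether a sign flip or a reciprocal was used, and that the column names are hypothetical. A sign flip only changes the sign of correlations, so higher values then indicate smaller margins.

```python
import numpy as np

# Hypothetical column names for the three time-margin measures.
TIME_MARGINS = ("time_gap", "ttc", "tlc")

def prepare_matrix(measures, names, reversed_names=TIME_MARGINS):
    """Negate time-margin columns, then z-standardise every column.

    measures: array of shape (n_drivers, n_measures).
    names: column names, in the same order as the columns.
    """
    X = np.asarray(measures, dtype=float).copy()
    for j, name in enumerate(names):
        if name in reversed_names:
            X[:, j] = -X[:, j]  # assumed reversal: larger value = smaller margin
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```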

Results
To assess at a glance the amount of shared variance across the measures, a principal component analysis was performed. Fig. 1 shows a scree plot of the explained variance ratio. The first principal component explains 45% of the total variance and the second 22%, with the rest trailing off to single digits. Together, these two components explain 67% of the variation in the data. They are also the only ones with an eigenvalue above 1 (the Kaiser criterion, a traditional demarcator in model selection).
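The scree-plot quantities follow from an eigendecomposition of the correlation matrix. These are the explained variance ratios and the count of eigenvalues above 1. The sketch below uses only NumPy and assumes that PCA was run on the correlation matrix of the standardised measures. The paper does not name its software, so this is an illustration rather than the original analysis.

```python
import numpy as np

def pca_eigen(Z):
    """Eigenvalues, explained-variance ratios and Kaiser count for the columns of Z."""
    R = np.corrcoef(Z, rowvar=False)        # measures x measures correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)    # eigh: R is symmetric; ascending order
    order = np.argsort(eigvals)[::-1]       # sort components by variance, descending
    eigvals = eigvals[order]
    ratio = eigvals / eigvals.sum()         # share of total variance per component
    n_kaiser = int(np.sum(eigvals > 1.0))   # components retained by the Kaiser criterion
    return eigvals, ratio, n_kaiser
```

Because the trace of a correlation matrix equals the number of measures, an eigenvalue above 1 means a component explains more variance than any single standardised measure. With ten measures, this corresponds to more than 10% of the total.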
Based on the component structure, we explored a two-factor solution for the data. Fig. 2 shows the varimax-rotated EFA factor loadings. Most measures loaded on one of the two factors, while the longitudinal time margins show a degree of cross-loading. Table 2 shows the same information in table form, including the unrotated loadings. In this space, the longitudinal acceleration behaviours cluster on one axis, and the speed-preference and lane-changing behaviours cluster on the other. These two clusters appear to be independent of each other. The longitudinal time margins, time-to-collision and time gap, exhibit cross-loading on both factors. Allowing the axes to covary through a non-orthogonal (oblimin) rotation did not markedly simplify the structure further, as the factors show only a small correlation (r(75) = .23, p > .05). The Cronbach alpha for the ten items in question was 0.85.
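Varimax is an orthogonal rotation that maximises the variance of the squared loadings within each factor. This pushes each measure towards loading on a single factor. The sketch below is the widely used SVD-based iteration in its raw form, without Kaiser row normalisation. The paper does not state which implementation or normalisation it used, so this is illustrative only. An orthogonal rotation leaves each measure's communality (row sum of squared loadings) unchanged.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a (measures x factors) loading matrix; returns (rotated, rotation)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)  # start from the unrotated solution
    d = 0.0
    for _ in range(max_iter):
        Lam = L @ R
        # Gradient of the varimax criterion (gamma=1 gives varimax, 0 quartimax).
        target = Lam ** 3 - (gamma / p) * Lam @ np.diag(np.sum(Lam ** 2, axis=0))
        u, s, vt = np.linalg.svd(L.T @ target)
        R = u @ vt  # closest orthogonal matrix to the gradient direction
        d_old, d = d, np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break  # criterion no longer improving
    return L @ R, R
```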
Comparing the measures individually reveals some high correlations. The correlation matrix in Table 3 shows that measures derived from the same distribution correlate highly. An example is the median absolute longitudinal acceleration and its interdecile range (r(75) = .89, p < .0001). Some correlations can be viewed as a sanity check for the validity of the measures. For example, the frequency of lane changing and the time-to-lane-crossing (reversed) correlate (r(75) = .60, p < .0001), as one would expect. Lane changing is logically preceded by closing in on the lane boundary (but not vice versa).
In summary, the covariation in observed style measures can be largely reduced to two factors, with factor 1 encompassing longitudinal safety margins and acceleration behaviour, and factor 2 loading on speed preference and lane changing behaviour. Some cross-loading is present for the longitudinal safety margins.

Discussion
Our results suggest that the majority of between-driver variability in motorway driving can be characterised by two distinct factors. The first encompasses the keeping of longitudinal safety margins and acceleration behaviour, and the second relates to speed preference and lane changing behaviour. We will seek to interpret these results as habitualised behaviours related to drivers' motivations, and tentatively call the first factor intensity and the second expediency. Sagberg et al. (2015) suggested that "global" driving styles may be conceptualised in terms of their underlying motives. Such motives must be systematic and persistent enough to become habitualised and be counted as belonging to a driver's style. Sagberg et al. themselves give examples such as "aggression" or "expediency". However, owing to the lack of empirical evidence on co-occurring driving behaviours, this idea has not progressed beyond its theoretical foundations. Aggression is difficult to operationalise with observed measures without proper context and intent, and it is no surprise that aggressive driving has numerous definitions. Instead, we seek to interpret the empirical results in a way that gives substance to the theoretical frameworks.
Interpreting the two factors as stemming from different habitualised motives, we suggest that the main motive behind factor 1, or "intensity", is to locate an individually preferred level of mental effort relative to the driver's capability. Controlling longitudinal time margins is equivalent to controlling the time available to react to the leading vehicle, and drivers are free to adjust this margin according to their preference (Summala, 2007). Previous studies give tentative support to this conclusion. Pekkanen, Lappi, Itkonen, and Summala (2017) showed that time gaps and attention allocation are tightly coupled in car-following tasks: allocating less attention leads to a longer time gap to the leading vehicle. Similar to this study, Itkonen et al. (2017) found a connection between short time gaps and jerky acceleration and deceleration behaviour. If a driver follows the lead vehicle closely and is willing to expend attention and effort to remain in that position, there will be more rapid accelerations and decelerations.
It is also known that cognitive load can influence driving speed, and vice versa (Recarte & Nunes, 2002; Engström, Markkula, Victor, & Merat, 2017). Drivers prefer speeds that minimise the need to exert cognitive control over the speed, which is presumably the speed they are habituated to drive in a certain environment. Deviating up or down from that speed requires some cognitive control, that is, mental effort. In a car-following context, this may mean that drivers can spend their mental effort either on maintaining a lower speed and not ending up close to the lead vehicle, or on closely following the lead vehicle. The second factor reflected speed preference and lane changing frequency. Here, lane changes are most likely a consequence of higher speed preference, because maintaining a higher speed requires overtaking slower vehicles. For this reason, the second factor could be considered a manifestation of the basic motive of expediency. It is interesting to note that this factor appears independent of, and orthogonal to, the acceleration and jerk behaviour. Faster drivers do exhibit slightly shorter longitudinal time margins. However, this must be interpreted with care, as it could result from momentarily closing the gap before overtaking.
Fig. 1. A scree plot of explained variance and eigenvalues for the ten Principal Components (PCs) from the PCA. The x-axis displays the principal component; the left y-axis shows the percentage of variance explained and the right y-axis shows the eigenvalues.
In comparison to the studies by Ericsson (2000, 2001), which covered several road types, this study is confined to a single, limited environment. It contains a limited but informative group of input variables, including safety margins that carry information on the limits of comfort in interacting with the surroundings. It would be interesting to correlate our measurements with exhaust emissions to see whether they are sufficient for estimating environmental pollution. The differing designs of the studies make it difficult to compare the results, except to state that between-driver variation exists.
The cross-loading of the longitudinal safety margin variables on both factors might suggest that these behaviours overlap. Those high on expediency may not drive behind other vehicles for long, but they may have a habit of getting close while passing them. This would explain the cross-loading on longitudinal following. As both behaviours happen in the same longitudinal space, they are difficult to differentiate in the analysis.
When discussing habitual behaviours, it is necessary to discuss the sources of variation in the observed variables. They can be roughly grouped into three categories: driver-dependent (endogenous), vehicle-based, and environment-based variation. As we are interested in the driving style of the participants, which is the human component, any study should seek to control the other sources of variation. Driving behaviour happens in a real and dynamic environment, and it may not be possible to account for all such variation outside the laboratory. In this study, all the vehicles were built on the same car platform with only minor differences in model, so most of the vehicle-based variation was avoided. We have not taken into account any differences due to varying traffic or weather conditions, although the driving period coincided with the snow-free season in spring and summer.
As the analysis is exploratory rather than confirmatory in nature, it is not reasonable to expect that all variance could be neatly explained by a handful of independent factors. The fact that the data show an interpretable pattern at all gives us hope that the phenomenon is tangible. It also suggests that variation in driving patterns can be dealt with more precisely than has been thought.

Implications for traffic simulation
There are benefits for traffic simulation in having the majority of between-driver variation in motorway traffic explained by two factors. These results can offer an insight into car-following models, where there has been a debate on the correct variables to model, be they optimal velocity (Bando, Hasebe, Nakayama, Shibata, & Sugiyama, 1995), safe following distance (Gipps, 1981) or time headways (Treiber et al., 2000;Van Winsum, 1999). Explicitly modelling time headways (or more precisely time gaps) and speed preferences in car-following and lane-changing models is supported by this study.
Based on our results, independently modelling the desired speed and the desired time headway could be enough to reach a semi-realistic distribution of drivers. This presupposes that the lane-changing model and the car-following model share the parameters. Here, expediency would be reflected in an increase in lane changes and lateral movement (traffic permitting) as overtaking happens. Independently of this, a desired time headway parameter could keep the vehicle an approximately fixed number of seconds behind the leader. This parameter would also determine the urgency of acceleration and deceleration, reflecting the level of attention required for the task.
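As one concrete illustration, the Intelligent Driver Model (Treiber et al., 2000) already has separate parameters for desired speed ($v_0$) and desired time headway ($T$). Each simulated driver could therefore draw these two values independently. The sketch below uses the standard IDM acceleration equation. The sampling distributions and the remaining parameter values are hypothetical placeholders and were not estimated from this study's data. A lane-changing model would read the same `v0` so that the parameters are shared.

```python
import math
from dataclasses import dataclass

import numpy as np

@dataclass
class DriverParams:
    v0: float           # desired speed (m/s), linked here to "expediency"
    T: float            # desired time headway (s), linked here to "intensity"
    a: float = 1.0      # maximum acceleration (m/s^2), placeholder value
    b: float = 1.5      # comfortable deceleration (m/s^2), placeholder value
    s0: float = 2.0     # minimum standstill gap (m), placeholder value
    delta: float = 4.0  # acceleration exponent

def idm_acceleration(p, v, gap, v_lead):
    """Standard IDM acceleration for own speed v, gap to leader and leader speed."""
    dv = v - v_lead  # approach rate, positive when closing in
    s_star = p.s0 + max(0.0, v * p.T + v * dv / (2.0 * math.sqrt(p.a * p.b)))
    return p.a * (1.0 - (v / p.v0) ** p.delta - (s_star / gap) ** 2)

def sample_drivers(n, rng):
    """Draw v0 and T independently; both distributions are hypothetical."""
    v0 = rng.normal(33.0, 2.5, n)              # about 119 km/h +- 9 km/h
    T = rng.lognormal(math.log(1.5), 0.3, n)   # positive, right-skewed headways
    return [DriverParams(v0=float(x), T=float(y)) for x, y in zip(v0, T)]
```

Drawing $v_0$ and $T$ from independent distributions mirrors the near-orthogonality of the two factors. Correlated draws would be needed if the small oblique factor correlation were to be reproduced.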
The results can also be used to estimate the space of acceptable variation in the ten variables in micro-simulation experiments. The variables are easy to record, and if the simulation set-up is valid, should provide a similar pattern of factors and their loadings.
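Comparing a simulated factor pattern with the empirical one requires a similarity measure for loading vectors. One common choice, assumed here rather than taken from the paper, is Tucker's congruence coefficient. It is the cosine of the angle between two loading vectors. Because factor signs are arbitrary, the absolute value is usually inspected. Values close to 1, often above about 0.95, are commonly read as indicating the same factor. The matrices passed in are hypothetical inputs.

```python
import numpy as np

def tucker_congruence(x, y):
    """Congruence between two loading vectors: cosine of the angle between them."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

def congruence_matrix(A, B):
    """Pairwise congruences between the columns (factors) of two loading matrices."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    norms = np.outer(np.linalg.norm(A, axis=0), np.linalg.norm(B, axis=0))
    return (A.T @ B) / norms
```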

Conclusions and future directions
In this paper we have presented evidence that the driving style in motorway driving can be described with two empirically observable factors. We have opted to interpret these factors as motive-based aspects of driving style in a motorway setting and termed them "intensity" and "expediency". These results will hopefully give some substance to the theoretical debate on driving style.
The measures we opted to use are relatively easy to measure with regular vehicles, and include important control variables such as safety margins. Therefore, they should be relatively robust and easy to replicate in future experiments. The exact relationships between speed preferences, mental effort and safety margins are yet to be described in full. Within-subject variation and its relationship with average forms of behaviour in these measures is another point of interest for future research.
In traffic simulation, future challenges include translating the observed variation between drivers into simulation models.
It is an open question whether existing simulation models can reproduce the empirical results, or whether new models, sharing aspects of lateral and longitudinal control, have to be developed.

Funding statement
This work was supported by the Academy of Finland, Grant No. 279827.