Driver and Path Detection through Time Series Classification

Driver identification and path kind identification are becoming critical topics, given the automobile industry's growing interest in improving driver experience and safety and the need to reduce global environmental problems. Since increasingly sophisticated and accurate car sensors and monitoring systems have become available in recent years, several proposed approaches analyze large amounts of real-time data describing the driving experience. In this work, a set of behavioral features extracted by a car monitoring system is proposed to perform driver identification and path kind identification and to evaluate a driver's familiarity with a given vehicle. The proposed feature model is exploited using a time series classification approach based on a multi-layer perceptron (MLP) network to evaluate the effectiveness of the features for these goals. The experiment is carried out on a real dataset composed of 292 observations in total (each observation consists of a given person driving a given car on a predefined path) and shows that the proposed features have a very good driver and path identification and profiling ability.


Introduction
Driver identification consists of discovering the identity of a vehicle driver using what he or she possesses and/or his or her physical and behavioral characteristics [1]. The topic has recently become critical for the automobile industry, driven by an increasing number of ever more sophisticated and accurate car sensors and monitoring systems able to extract information about the driver (e.g., hand geometry, keystroke dynamics, voiceprint). The interest in the topic stems from the fact that driver identification may improve the driver experience by allowing: i) safer driving and intelligent assistance in case of emergencies, ii) more comfortable driving, and iii) a reduction of global environmental problems [2]. As for driver safety, driver identification may help detect changes in the driver (due to possible indisposition or drunkenness) and activate security procedures (for example, a ring may invite the driver to stop). The study of driver behavior on each segment of a road also allows profiling each road section, supporting the activation of alert signals when more caution is required (for example, a vocal message may alert the driver to reduce the brake pressure in a dangerous curve) [3]. Concerning driver comfort, driver identification may, for instance, discover which member of the family is currently driving the car and consequently configure the car equipment automatically (e.g., radio volume and frequency, temperature, or even speed limit) [4,5]. Finally, driver identification can be useful to suggest new car improvements based on driver preferences, or new systems that reduce fuel consumption and pollution based on driving characteristics. Given these advantages of driver profiling and identification, several studies proposed in recent years focus on the identification of physical and behavioral driver features.
The physical features [6,7] are stable human characteristics that have been widely adopted in the banking and forensic domains to guarantee higher security than the more traditional authentication systems based on the ownership of a key (which can be easily bypassed by anyone who gets hold of the key). The behavioral features are used to detect individual personality traits and have become the target of several recent studies, mainly focused on speaker recognition [8]. The limit of these approaches is that they are based on the analysis of only one feature. This may cause high uncertainty in driver identification, especially if a sensor is noisy. For this reason, new approaches based on multimodal identification systems have been introduced [9,10]. They provide a more accurate driver identification based on the detection and analysis of a larger set of behavioral features.
In this work, we propose a set of behavioral features to perform driver identification, path kind identification (i.e., dirt road, road with bumps, highway) and to evaluate driver's familiarity with a given vehicle. Moreover, we exploit a Time Series Classification (TSC) approach based on a Multi-Layer Perceptron (MLP) network to evaluate the effectiveness of the proposed set of features extracted by using a CAN bus monitoring system. The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 describes the background of our study. Section 4 presents the proposed approach discussing the overall classification process. Sections 5 and 6 discuss a set of experiments to answer the proposed research questions using three datasets (made of 292 driving sessions performed by ten drivers on four cars in five different paths). Finally, Section 7 provides conclusive remarks and future directions.

Related Work
In the past, retrieving real-world automotive data was limited by the difficulty of equipping cars with sensors. Since the introduction of the CAN protocol 1 , this limit has been overcome and driving style identification has become a very appealing scenario. The CAN protocol defines a generic communication standard for all the vehicle electronic devices. It can cooperate with the OBD-II diagnostic connector 2 , which provides a candidate list of vehicle parameters to monitor along with how to encode their data.
As a matter of fact, researchers in [11] discuss a driver identification approach that is based on the driving behavior signals observed while the driver is following another vehicle. The analyzed signals (they were measured using a driving simulator) are the following: accelerator pedal, brake pedal, vehicle velocity, and distance from the vehicle in front. The approach obtains an identification rate equal to 81% for twelve drivers and equal to 73% for thirty drivers.
The accelerator and the steering wheel as characteristics to discriminate between different drivers are analyzed in [12]. They employ hidden Markov model (HMM) on the considered features to model the driver characteristics. They build two models for each driver: one trained from accelerator data and another one learned from steering wheel angle data. Consequently, the models are used to identify different drivers obtaining an accuracy equal to 85%.
Another HMM-based approach to model driver human behavior is proposed in [13]. This method employs a simulated driving environment to evaluate the effectiveness of the proposed solution. Van Ly [14] explores the possibility of using the inertial sensors of the vehicle from the CAN bus to build a profile of the driver observing braking and turning events to characterize an individual compared to acceleration events.
Authors in [15,16] represent gas and brake pedal operation patterns with a Gaussian mixture model (GMM). They obtain an identification rate equal to 89.6% using data extracted by a driving simulator and equal to 76.8% for a field test with 276 drivers, resulting in 61% and 55% error reduction, respectively, over a driver model based on raw pedal operation signals without spectral analysis. Considering data from steering wheel angle, brake status, acceleration status, and vehicle speed, researchers in [17] model the driver behavior through HMMs and GMMs with the aim of capturing the sequence of driving characteristics acquired from the CAN bus information. They obtain an accuracy of 69% for action classification and of 25% for driver identification.
Authors in [18] classify real-world mechanical features from the CAN bus with four different classification algorithms: they obtain an accuracy equal to 0.939 using Decision Tree, equal to 0.844 using KNN, equal to 0.961 using Random Forest and equal to 0.747 using the MLP algorithm. Researchers in [19] classify a set of features extracted from the powertrain signals of the vehicle, showing that the learned classifier is able to recognize the human driving style based on the power demands placed on the vehicle powertrain with an overall accuracy equal to 77%.
In reference [20], the features extracted from the accelerator and brake pedal pressure are used as inputs to a fuzzy neural network (FNN) system to ascertain the identity of the driver. Two fuzzy neural networks, namely, the evolving fuzzy neural network (EFuNN) and the adaptive network-based fuzzy inference system (ANFIS), are used to demonstrate the viability of the two proposed techniques.
Summarizing the results obtained by the studies described above, the reported identification rate ranges from 25% [17] to 96.1% [18]. The method we propose reaches a precision rate equal to 99%, outperforming the current literature in terms of precision. Furthermore, the existing methods are often tested in simulated environments, where a plethora of variables (like traffic jams and the number of cars involved in the scenarios) are set a priori.
In contrast, we conduct experiments in a real-world environment, in order to take into account real-world variables that cannot be predicted. In addition, we perform a set of experiments aiming to identify the car owner regardless of the car and the path. The other discussed methods, instead, usually perform the experiments in a single setting: for example, in the experiment proposed in [18] (the method for which the best precision values are obtained) the drivers under analysis drive the same path in the same car.

Background: Time Series Classification approaches
Machine Learning (ML) studies and implements algorithms that learn from monitored data and make predictions about their future values. Here, we focus on TSC algorithms used to classify sequences of observations of a phenomenon [21].
A Time Series (TS) consists of a sequence of discrete-time observations taken at successive, equally spaced points in time. TS are widely used to describe the time course of a phenomenon, to predict its future trend and to relate it to other phenomena under study. For this reason, they are adopted in several domains with different aims. A problem recurring in several domains is TS classification (TSC), which requires training a classifier on a set of cases, where each case contains an ordered set of real-valued attributes and a class label (qualifying its kind or nature). TSC problems arise in a wide range of fields including environmental sciences, computational biology, image processing, and software engineering.
ML approaches are widely adopted to perform such classification. There are two main families of ML algorithms:
• Supervised Learning (SL): the classifier is presented with example inputs and their desired outputs, given by an oracle, and the goal is to learn a general rule that maps inputs to outputs. The learning phase is the process of building a model able to discriminate the classes from a set of records that contain class labels.
• Unsupervised Learning: no labels are given to the learning algorithm, leaving it to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data), or a means towards an end (feature learning).
In this paper, the focus is on supervised algorithms to perform classification of TS data.
A well-known class of classifiers is based on decision trees. A decision tree can be used to predict the future values of an item or to perform classification tasks. The classification consists of splitting a dataset into smaller sets organized as a binary tree-like structure, obtaining an easier description of the distribution of the data [22].
Random Forest [23] is a widely adopted decision tree algorithm that averages multiple deep decision trees (trained on different parts of the same training set) with the aim of reducing the variance. It starts by constructing several decision trees at training time. For each tree, the class (which can be the mode of the classification or the mean of the prediction) is obtained. Random decision forests are used to correct the decision trees' habit of over-fitting to their training set.
Another well-known approach to TS classification is based on Dynamic Time Warping (DTW), which allows an elastic shifting of the time axis to accommodate sequences that are similar but out of phase. DTW is a classic technique for time series classification. It is normally used with instance-based classifiers and can be incorporated into a decision tree algorithm. For constructing the decision trees, the J48 algorithm was exploited [22]. It builds a decision tree from a set of training data and then visits every decision node. At each iteration, it chooses the most active node and splits it, until every leaf is reached and no further splits are possible.
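To make the elastic alignment performed by DTW concrete, the following is a minimal sketch of the classic dynamic-programming DTW distance between two univariate series. The function name `dtw_distance`, the absolute difference used as local cost and the two toy series are assumptions made for this illustration; they are not taken from the experiments of this paper.

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance with |x - y| as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor cells
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical toy series: same shape, the second one delayed by one sample.
s1 = [0, 1, 2, 1, 0]
s2 = [0, 0, 1, 2, 1, 0]
print(dtw_distance(s1, s2))  # prints 0.0
```

The two series are out of phase, yet their DTW distance is zero because the warping path absorbs the delay. A point-wise distance would not even be defined here, since the series have different lengths.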
Other effective approaches to classification problems exploit neural networks. MLPs are among the most popular kind of neural networks and have been used in a wide variety of modeling, classification, prediction and optimization problems. They are sub-classes of more general structures called feedforward neural networks that are capable of approximating generic classes of functions (including continuous and integrable functions).
In this paper, an MLP network architecture is used. An MLP neural network is based on multiple layers of nodes in a directed graph where each layer is fully connected to the next one [24]. In particular, the simplest MLP network model describes a fully connected network with three layers (one input layer, one hidden layer, and an output layer) in which each node is a neuron that uses a nonlinear activation function.
An activation function in an MLP network is a (typically non-linear) function transforming the weighted sum of the inputs to a neuron into an output value. Usually, it maps the range of input signals into a reference interval (such as [−1, 1] or [0, 1]). The activation functions used in typical neural networks are classified into threshold, linear and nonlinear activation functions. In MLP networks the hidden layers have nonlinear activation functions, whereas the output layer may have either a linear or a nonlinear one. The MLP models adopt the following bipolar sigmoidal function (i.e., the hyperbolic tangent sigmoid) as activation function:
$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
whereas a linear activation function is used for the output layer. A single network model, for example, contains at least three fully connected layers (one input layer, one hidden layer and one output layer). The hidden layer neurons are connected to the nodes of the input layer by the W weights, whereas the V weights connect the neurons of the output layer with the neurons of the hidden layer. All the neurons of a layer apply the same activation function, and the weights are what connects neurons between layers.
MLP behavior and performance depend upon three fundamental aspects: (i) the activation functions of the units; (ii) the network architecture; and (iii) the weight of each input connection. Since the first two aspects are fixed, the behavior of the MLP is mainly defined by the current values of the weights. The weights are defined during the training process: they are initially set to random values, and then the instances of the training set are repeatedly presented to the network. The input values of an instance are placed on the input units, and the output of the net is compared with the desired output for this instance. Then, all the weights in the net are adjusted slightly in the direction that brings the output values of the net closer to the desired output.
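As an illustration of the architecture just described, the following sketch computes the forward pass of a network with one hidden layer of hyperbolic tangent units and a linear output layer. The names `forward`, `b_hidden` and `b_out`, the presence of bias terms, and all numeric values are assumptions made for this example; the text above only names the weight matrices W and V.

```python
import math

def forward(x, W, b_hidden, V, b_out):
    """Forward pass: tanh hidden layer followed by a linear output layer.

    W[j] holds the weights from the inputs to hidden neuron j,
    V[k] holds the weights from the hidden neurons to output neuron k.
    """
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W, b_hidden)]
    output = [sum(v * h for v, h in zip(row, hidden)) + b
              for row, b in zip(V, b_out)]
    return hidden, output

# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output neuron.
x = [0.5, -1.0]
W = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 1.0]]
hidden, output = forward(x, W, [0.0, 0.0], V, [0.0])
print(hidden, output)
```

The hidden activations always fall in (−1, 1) because of the hyperbolic tangent, while the linear output layer leaves the final value unbounded.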
Several learning algorithms can be used to train a network, but the best known and most widely used algorithm to estimate the values of the weights is Back Propagation (BP). An MLP network is trained using some form of gradient descent, and the gradients are calculated using backpropagation. For classification, the learning algorithm minimizes the Cross-Entropy loss function ($E_L$), and the weights are updated using Stochastic Gradient Descent (SGD), evaluating the gradient of the loss function as follows:
$$W^{(i+1)} = W^{(i)} - \eta \, \nabla_{W} E_L\left(\hat{y}, y, W^{(i)}\right)$$
where $\eta$ is the learning rate, which controls the step size in the parameter space search, and $W^{(i)}$ denotes the weights at iteration $i$. For a desired output $y$ and a network output $\hat{y}$, the cross-entropy loss function $E_L$ is given by:
$$E_L(\hat{y}, y, W) = -y \ln \hat{y} - (1 - y) \ln(1 - \hat{y}) + \alpha \lVert W \rVert_2^2$$
where $\alpha \lVert W \rVert_2^2$ is a regularization term that penalizes complex models and $\alpha > 0$ is a parameter that controls the magnitude of the imposed penalty.
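The update rule can be sketched on the smallest possible case. The code below is a simplification, not the training procedure used in this work: a single logistic unit stands in for the whole network, so backpropagation through hidden layers is omitted. The function names and the values of `eta` and `alpha` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y, alpha):
    """Binary cross-entropy plus the L2 penalty alpha * ||w||^2."""
    y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    ce = -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
    return ce + alpha * sum(wi * wi for wi in w)

def sgd_step(w, x, y, eta, alpha):
    """One update w <- w - eta * gradient of the loss above."""
    y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    # d(cross-entropy)/dw_i = (y_hat - y) * x_i ; d(penalty)/dw_i = 2 * alpha * w_i
    grad = [(y_hat - y) * xi + 2 * alpha * wi for wi, xi in zip(w, x)]
    return [wi - eta * gi for wi, gi in zip(w, grad)]

# Hypothetical sample (x, y) and hyperparameters.
w0 = [0.0, 0.0]
w1 = sgd_step(w0, [1.0, 2.0], 1, eta=0.1, alpha=0.01)
print(loss(w0, [1.0, 2.0], 1, 0.01), loss(w1, [1.0, 2.0], 1, 0.01))
```

On this sample a single step moves the weights towards the desired output and the loss decreases, which is the behavior the learning rate is meant to control.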
For binary classification, the function computed by the network passes through the logistic function:
$$g(z) = \frac{1}{1 + e^{-z}}$$
to obtain output values between zero and one. In this case, a threshold $t$ assigns samples with outputs larger than or equal to $t$ to the positive class, and the rest to the negative one. When there are $n > 2$ classes, the output of the network is a vector of size $n$. In this case, in order to assign a single class to the input sample, the softmax function is used, which can be defined as:
$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{l=1}^{n} e^{z_l}}$$
where $z_i$ represents the $i$-th element of the input to softmax, which corresponds to class $i$, and $n$ is the number of classes. The result is a vector containing the probabilities that the sample $x$ belongs to each class, and the output class is the one with the highest probability.
In this paper, each dataset has $n$ classes, and an architecture based on $n$ joint models (each with one output layer neuron) has been adopted. All the models are jointly trained with K-fold cross-validation over the classes on the whole input $w$, and hence they all share the same best parameters (i.e., the same number of neurons in the hidden layer, the same learning rate, and the same number of iterations). As already observed, since this architecture, with $n > 2$ models, is designed to detect more than two classes, it requires a softmax stage that takes the $n$ inputs (the probabilities that the input sample belongs to each class) and provides the best class as output.
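The two output stages can be sketched as follows. The function names and the default threshold `t=0.5` are assumptions of this example, and subtracting the maximum before exponentiating is a common numerical-stability device that does not change the result.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict_binary(z, t=0.5):
    """Positive class (1) when the logistic output reaches the threshold t."""
    return 1 if logistic(z) >= t else 0

def predict_multiclass(z):
    """Index of the class with the highest softmax probability."""
    probs = softmax(z)
    return max(range(len(probs)), key=probs.__getitem__)

print(predict_binary(0.0), predict_multiclass([0.1, 2.0, -1.0]))  # prints 1 1
```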
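A minimal sketch of the one-model-per-class arrangement is given below. It only shows how the labels could be turned into one binary target vector per class and how the final class could be picked from the per-model outputs. The helper names, the driver labels and the scores are hypothetical, and the training of each model is omitted.

```python
def one_vs_rest_targets(labels, classes):
    """Binary target vector for each class: 1 for that class, 0 otherwise."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

def pick_class(scores):
    """Return the class whose model produced the highest score."""
    return max(scores, key=scores.get)

# Hypothetical labels of four windows and hypothetical model outputs.
labels = ["DM1", "DF1", "DM1", "DM2"]
targets = one_vs_rest_targets(labels, ["DF1", "DM1", "DM2"])
print(targets["DM1"])                                    # [1, 0, 1, 0]
print(pick_class({"DF1": 0.1, "DM1": 0.7, "DM2": 0.2}))  # DM1
```

Since softmax preserves the ordering of its inputs, taking the highest raw score selects the same class as taking the highest softmax probability.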

The Methodology
This section presents the proposed approach for the classification of drivers and paths using data extracted from vehicle sensors. Figure 1 summarizes the overall mining process, structured as two main sub-processes: (a) Datasets Generation and (b) MLP Training and Time-Series Classification. The remainder of the section describes these two sub-processes in more detail.

Datasets Generation
The Datasets Generation steps are reported in Figure 1-(a). The first step consists of dataset cleaning (the removal of incomplete and wrong data values) and normalization. Cleaning and normalization are necessary since real-world data tend to be noisy, incomplete or even inconsistent. This step applies techniques that fill in missing values, filter out noise, and correct (or remove) inconsistent values in the data set. We adopted the following data cleaning sub-process to polish the data produced by the car monitors and obtain a consistent dataset suitable for statistical inference:
• fix missing values;
• remove noise;
• remove special characters or values;
• verify semantic consistency;
• normalization.
In the last step, numerical attributes are normalized using min-max normalization, which performs a linear transformation of the original data. If $\mathrm{min}_X$ and $\mathrm{max}_X$ are the minimum and maximum values of the attribute $X$, min-max normalization maps a value $v_i$ of $X$ to a value $v'_i$ in the range $[\mathrm{newMin}_X, \mathrm{newMax}_X]$ by computing:
$$v'_i = \frac{v_i - \mathrm{min}_X}{\mathrm{max}_X - \mathrm{min}_X}\,(\mathrm{newMax}_X - \mathrm{newMin}_X) + \mathrm{newMin}_X$$
The normalized data is split into two sets (Training and Test Set generation step): (i) a training set used to train the classifier and (ii) a test set used to assess the performance of the classifier. The dataset partitioning is performed with a K-Fold Cross-Validation approach [25], which splits the data into k equally sized folds. Subsequently, k iterations of training and validation are performed (in each iteration a different fold of the data is used for validation and the remaining folds are used for training). Moreover, to ensure that each fold is representative, the data are stratified before being split into folds. According to [26], this model selection method provides a less biased estimation of the accuracy.
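The normalization and the stratified partitioning can be sketched as follows. This is an illustrative implementation under stated assumptions, not the tooling used in this work: the function names are hypothetical, a constant attribute is mapped to the lower bound, and stratification is obtained by dealing the samples of each class round-robin across the folds.

```python
from collections import defaultdict

def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization of one attribute into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: map everything to the lower bound
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def stratified_folds(labels, k):
    """Return k lists of sample indices with similar class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        for idx in by_class[label]:
            folds[position % k].append(idx)  # deal indices round-robin
            position += 1
    return folds

print(min_max([10, 20, 30]))  # [0.0, 0.5, 1.0]
labels = ["a"] * 6 + ["b"] * 3
folds = stratified_folds(labels, k=3)
print(folds)
```

In each of the k iterations one fold would serve as the validation set and the union of the others as the training set. With the toy labels above, every fold receives two samples of class "a" and one of class "b".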

MLP Training and Time-Series Classification
The MLP Training and Time-Series Classification process is reported in Figure 1-(b) and is based on an MLP network for learning [27].
The adopted time-series classification approach is based on the following main steps:
• Time Series Segmentation, in which the time series are analyzed and divided into segments (i.e., windows);
• Post-processing, in which a representation based on value and trend features is generated for each window;
• Multi-Layer Perceptron Network Model Generation, in which an MLP network is trained;
• Classification, in which the trained classifier is tested on new time-series samples to perform the classification step.
In the first step, the multivariate time series is divided into a sequence of segments by sliding a window (which can be fixed or threshold-based) incrementally across the time series values. In this paper, we experimented with two time-series sliding window approaches: fixed windows and threshold-based windows (which we refer to as start&stop). The former exploits windows of fixed size. In particular, we defined five windows of increasing size ranging from 1 sec to 60 sec. Outside this range, results or performance were not acceptable. The latter segmentation approach defines time series regions based on signal variation thresholds. A start&stop region is defined as the interval in which the vehicle moves off from a steady state until it reaches another steady state. This means that only the transitory parts of the signals are captured. The rationale for using just these portions of the time series instead of the entire signal is intuitive: when signals become constant, they give little information (especially concerning the driving behavior). This has proven very effective in filtering out data that waste classifier memory without helping to discriminate driving behavior or path kind.
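The two segmentation strategies can be sketched on a single signal. Several details below are assumptions, since the text does not fix them: windows are taken as non-overlapping, a sample is considered "moving" when it differs from the previous one by more than a hypothetical `threshold`, and the speed values are invented for the example. Window sizes are expressed in samples, which correspond to seconds when one sample is recorded per second.

```python
def fixed_windows(series, size):
    """Non-overlapping windows of `size` samples; a trailing partial window is dropped."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, size)]

def start_stop_regions(series, threshold):
    """Return (start, end) index pairs of the transitory regions of a signal.

    A region opens at the last steady sample before the signal starts changing
    by more than `threshold` and closes when the signal becomes steady again.
    """
    regions, start = [], None
    for i in range(1, len(series)):
        moving = abs(series[i] - series[i - 1]) > threshold
        if moving and start is None:
            start = i - 1
        elif not moving and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:  # the signal ends while still changing
        regions.append((start, len(series) - 1))
    return regions

# Hypothetical speed signal: steady, accelerating, steady, braking, steady.
speed = [0, 0, 0, 5, 12, 20, 20, 20, 15, 8, 0, 0]
print(fixed_windows(speed, 4))
print(start_stop_regions(speed, threshold=1))  # [(2, 5), (7, 10)]
```

The start&stop variant keeps only the acceleration and braking phases and discards the stretches where the signal is constant.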
After the segmentation, in the Post-processing step, a set of features is evaluated for each time-series window. In particular, each window $w_i$ is represented by a sequence of features $(F^{v}_{1}, \ldots, F^{v}_{n}, F^{t}_{1}, \ldots, F^{t}_{m})$ containing $n$ value-based features and $m$ trend-based features. The first $n$ features represent the discretized values of the time series and take one value in a discretized set (in the [0,1] range). The second set of $m$ features describes the trend of the time series local to the window and is usually represented using shape-based metrics (including moments of various orders, mean, standard deviation, average energy, entropy, skewness, and kurtosis). In this paper, a single feature is used as value-based feature and two moments (standard deviation and skewness) are used as trend-based features. The approach to generating these representations is similar to the trend and value based analysis proposed by [28].
In the Multi-Layer Perceptron Network Model Generation step, an ensemble of $n$ MLP networks is used. It is instantiated for each dataset and uses a number of network models equal to the number of classes to be detected.
The input vectors for the MLP network contain the window representation components (the value and the two moments) computed for each feature reported in Table 1.
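A possible computation of the window representation is sketched below. The text does not specify how the value-based feature is obtained, so this example assumes it is the window mean of a signal already normalized in [0, 1], discretized into a hypothetical number of `levels`. The standard deviation and skewness use the population formulas, and a nearly constant window is assigned zero skewness.

```python
import math

def window_features(window, levels=10):
    """Return (discretized value, standard deviation, skewness) of one window."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / n)
    if std < 1e-12:  # (nearly) constant window: skewness is set to 0
        skew = 0.0
    else:
        skew = sum((v - mean) ** 3 for v in window) / (n * std ** 3)
    value = round(mean * levels) / levels  # value-based feature on a discrete grid
    return value, std, skew

# Hypothetical normalized windows.
print(window_features([0.0, 0.5, 1.0]))       # symmetric window: zero skewness
print(window_features([0.0, 0.0, 0.0, 1.0]))  # right-skewed window
```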
In this step, the MLP network is trained using the class labels available for each set of value-based and trend-based information of each window.
In the classification step, the trained classifier can be used to classify new data and is validated on new samples to assess its performances.

Features Selection
The proposed classification approach is based on some assumptions that are confirmed by the experimental results: i) each person has a different driving style, which can be recognized irrespective of the path or vehicle; ii) there exist features whose distributions of values are well separated between women and men, allowing an effective gender identification; iii) there exist features, related to how the vehicle is stressed by the road, that allow recognizing the road kind (among several types). Based on such considerations, we selected an initial set of sixteen features that are reported in Table 1.
The set of features reported in Table 1 is analyzed in a feature selection step to reduce the dimensionality of the dataset. This step requires analyzing and understanding the impact of each feature on the classification model. The most relevant features are selected by evaluating the correlations among the features (selecting a subset of orthogonal and independent features allows discarding redundant information). To this aim, we used a correlation-based feature selection algorithm (CFS) exploiting correlation matrix filters, as discussed in [29]. The CFS approach rates the features using a correlation-based heuristic evaluation function. This function is biased towards subsets of features that are highly correlated with the class to be detected and uncorrelated with each other. Irrelevant features can be ignored given their low correlation with the class, and redundant features can be eliminated given their high correlation with one or more of the remaining features. A feature is accepted if it improves classification in regions of the example space not already covered by other features. The CFS evaluation function can be expressed as follows:
$$M_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$$
where $M_S$ is the heuristic merit of the subset $S$ containing $k$ features, $\overline{r_{cf}}$ is the mean feature-class correlation and $\overline{r_{ff}}$ is the average feature-feature inter-correlation. The numerator of the equation indicates the capability of a set of features to classify an example, while the denominator indicates how much redundancy there is among the features of the set.
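The merit function translates directly into code. In the sketch below the function name and the correlation values are hypothetical; the example only shows how redundancy among features lowers the merit of a subset whose feature-class correlation is unchanged.

```python
import math

def cfs_merit(k, r_cf, r_ff):
    """CFS merit of a subset of k features.

    r_cf: mean feature-class correlation, r_ff: mean feature-feature correlation.
    """
    return (k * r_cf) / math.sqrt(k + k * (k - 1) * r_ff)

# Hypothetical subsets of 4 features with the same relevance but different redundancy.
low_redundancy = cfs_merit(4, r_cf=0.5, r_ff=0.1)
high_redundancy = cfs_merit(4, r_cf=0.5, r_ff=0.9)
print(low_redundancy, high_redundancy)
```

With a single feature the merit reduces to its feature-class correlation, since the redundancy term vanishes.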

Experiment Setting
In this section, we describe the features extraction process, and the dataset realized to perform the experiments.

Features extraction
We constructed a dataset gathering data from the CAN bus of a set of real vehicles. In order to collect data, the Torque Pro 3 application and Mini Bluetooth ELM327 OBD 2 Scanner were used.
The OBD scanner was installed on the vehicles to produce a self-diagnostic report generated by the onboard monitoring system.
The data is recorded every second while driving by the Torque Pro application, running on an Android smartphone fixed in the car with a suitable mount.

Datasets definition
In this paper, three different datasets 4 ($D_A$, $D_B$, $D_C$) are exploited to answer the proposed research questions. All the datasets refer to the area shown in Figure 2, where the study has been executed: the x and y axes of the figure are associated with longitude and latitude, whereas the color represents one of the five paths.
Each dataset contains two replicas of the same observation obtained with different conditions to avoid bias. $D_A$ is composed of two replicas of 16 observations and has the following characteristics:
• the observations are performed by four drivers ($DF_1$, $DM_1$, $DM_2$, $DM_3$) on 4 cars (Hyundai i20, Lancia y, Fiat Punto and Nissan Note);
• for each replica, four persons drive four cars on the entire track.
$D_B$ is composed of two replicas of 50 observations having the following characteristics:
• each observation describes one of the paths composing the track of Figure 2. As shown by the figure, we consider five consecutive paths (respectively called $P_1$, $P_2$, $P_3$, $P_4$, $P_5$);
• five men ($DM_1$, . . . , $DM_5$) and five women ($DF_1$, . . . , $DF_5$) drive the same car (Hyundai i20).
$D_C$ is composed of two replicas of 80 observations having the following characteristics:
• each observation describes one of the paths reported in Figure 2;
• the observations are performed by four drivers ($DF_1$, $DM_1$, $DM_2$, $DM_3$) on 4 cars (Hyundai i20, Lancia y, Fiat Punto and Nissan Note) and on 5 paths;
• for each replica, each person drives all the 4 cars one time on each path.
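The sizes of the three datasets follow from the design described above, and their sum matches the 292 observations of the whole study. The short check below only restates that arithmetic; the variable names are chosen for this example.

```python
replicas = 2
d_a = replicas * (4 * 4)      # 4 drivers x 4 cars, entire track
d_b = replicas * (10 * 5)     # 10 drivers x 5 paths, one car
d_c = replicas * (4 * 4 * 5)  # 4 drivers x 4 cars x 5 paths
total = d_a + d_b + d_c
print(d_a, d_b, d_c, total)   # 32 100 160 292
```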

Descriptive Statistics
In this section, descriptive statistics are used to describe the features used in this study (a quantitative analysis of the distributions of features has been performed). The analysis shows that several groups of features are well separated and median values do not fall into inter-quartile ranges of the distributions.
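The separation criterion mentioned above (the median of one distribution falling outside the inter-quartile range of another) can be checked as in the following sketch. The readings are invented for illustration and are not taken from the datasets of this study; the helper names are hypothetical, and the quartiles are computed with the default method of `statistics.quantiles`.

```python
import statistics

def five_number_summary(values):
    """Minimum, first quartile, median, third quartile and maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

def median_outside_iqr(a, b):
    """True when the median of `a` lies outside the inter-quartile range of `b`."""
    _, q1_b, _, q3_b, _ = five_number_summary(b)
    return not (q1_b <= statistics.median(a) <= q3_b)

# Hypothetical Engine RPM readings for two drivers.
driver_x = [1500, 1600, 1700, 1800, 1900, 2000, 2100]
driver_y = [2400, 2500, 2600, 2700, 2800, 2900, 3000]
print(five_number_summary(driver_x))
print(median_outside_iqr(driver_x, driver_y))
```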
In order to provide statistical evidence that the features can be considered characteristic of the driver behavior, we show the distributions of the features of four drivers (i.e., $DF_1$, $DM_1$, $DM_2$, $DM_3$) as box plots. Figure 3 shows the distributions for the four drivers of Average Trip Speed, Litres Per 100 Kilometer, Trip Average KPL and Engine RPM.
As shown in the figure, the Average Trip Speed of the four drivers is the same, but it is interesting to observe that the drivers $DM_2$ and $DF_1$ do not present peaks of speed (they keep the same speed variation both in acceleration and deceleration during the entire driving session).
The figure also shows that even if the interquartile ranges are comparable among the drivers, the median values are quite different for different features.
Moreover, for the feature "Litres Per 100 Kilometer", the distributions present different medians for the drivers. This reflects the different fuel consumption of the drivers involved in the experiment.
The driver $DM_1$ is confirmed to be the most aggressive in terms of driving style: indeed, his median is very close to the third quartile, and this is confirmed by the fact that he is the driver who reaches the highest speed (as confirmed by Figure 3). The driver $DF_1$, on the other hand, presents a fuel consumption very close to the one exhibited by $DM_1$ but, as we observe from the box plot in Figure 3, she reaches an average speed very close to the one of  The driver $DM_3$ exhibits an average fuel consumption slightly lower than $DM_2$: this confirms the balanced driving style of the driver.
As shown by the box plots, the medians of the Trip Average KPL (i.e., the ratio between distance traveled and fuel consumed) for $DM_2$ and $DM_3$ exhibit the best values; moreover, $DM_1$ and $DF_1$ present almost the same medians. This means that $DM_2$ and $DM_3$ can travel more kilometers with less fuel than $DM_1$ and $DF_1$.
The distributions in the Engine RPM box plot show that the drivers $DM_2$ and $DF_1$ exhibit the highest Engine RPM values: probably they change gear too late, and hence the engine RPM rises. On the other hand, the $DM_1$ and $DM_3$ drivers present a lower median value for this feature: this means that they do not stress the engine. This is also confirmed by the scatter plots of the distributions shown in Figure 4. The top of the figure shows a long-term average of the kilometers per liter achieved by the drivers (shown in different colors). The scatter plot, in this case, shows that the distributions are well separated. The same considerations hold for the distribution of velocity with respect to path kind, shown at the bottom of the same figure. In this case, the scatter plot reveals that different path kinds have quite different velocity distributions.

Experiments Description
This section describes the experiments conducted to answer the RQs introduced in Section 1.

Experiments design
The set of conducted experiments is summarized in Table 2. For each RQ, the table lists the conducted experiments (the experiment label, its goal, and the involved dataset).
Looking at the table, $RQ_1$ is explored with five experiments ($E_{11}$, $E_{12}$, $E_{13}$, $E_{14}$ and $E_{15}$). $E_{11}$, $E_{12}$ and $E_{13}$ aim to evaluate the effectiveness of the approach in identifying the driver in three cases:
• regardless of the car but fixing the path ($E_{11}$);
• regardless of the path but fixing the car ($E_{12}$);
• regardless of both car and path ($E_{13}$).
Moreover, $E_{14}$ and $E_{15}$ aim to identify the gender of the driver:
• regardless of the car but fixing the path ($E_{14}$);
• regardless of the path but fixing the car ($E_{15}$).
Similarly, $RQ_2$ is explored with a single experiment ($E_{21}$) aiming to evaluate whether it is possible to identify the path kind fixing the car but regardless of the path and the driver. $RQ_3$ is explored by means of two experiments. The first experiment ($E_{31}$) aims at evaluating whether it is possible to identify the driver's familiarity with the vehicle fixing the car but regardless of the path and the driver. The second experiment ($E_{32}$) aims at evaluating whether it is possible to identify the driver's familiarity with the vehicle when car, driver, and path can change.
Each experiment consists of applying the classification method described in Section 4 to a specific dataset (Table 2 reports the dataset considered for each experiment).
In particular, in E 11 the classification method is applied to the dataset D A , since it contains several drivers and a single path. In E 12 , the dataset D B is used to study the influence of the path while fixing the car (since it was built on a single car). In E 13 the dataset D C is used, since it is based on four cars and contains five paths. E 14 is conducted using the dataset D A , since we need multiple cars involving drivers of both genders. Conversely, E 15 is performed on the dataset D B , since it is based on five paths for each driver and contains a well-balanced set of male and female drivers (five men and five women) on a single car. For E 21 we exploited the dataset D B , since it contains ten drivers and five paths and was collected on a single car. RQ 3 is explored in the experiments E 31 and E 32 , involving, respectively, the datasets D A and D B . Based on the classification process described in Figure 1, each considered dataset is cleaned and normalized by performing the Datasets Generation steps. The normalized dataset is used to generate a training set and a test set. Each experiment is performed using the MLP classifier as described in Figure 1.
Moreover, the experiment E 11 is performed using two alternative classifiers (RF and DTW) in order to compare the effectiveness of the MLP classifier with other approaches often used in the literature. Finally, each experiment is also repeated using different window segmentation strategies (1s, 10s, 30s, 60s and the start&stop method) in order to evaluate the impact of the sliding window approach on the classifier performance.
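The fixed-size window strategies can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation used in the experiments: the function name fixed_windows, the sampling rate, and the choice to discard a trailing partial window are all hypothetical, and the start&stop method is not covered because its rules are not specified here.

```python
from typing import List, Optional, Sequence


def fixed_windows(
    samples: Sequence[float],
    rate_hz: float,
    window_s: float,
    step_s: Optional[float] = None,
) -> List[List[float]]:
    """Cut a sampled signal into windows lasting window_s seconds.

    Assumptions (not taken from the paper): samples are evenly spaced at
    rate_hz, and a trailing partial window is discarded. With step_s left
    as None the windows do not overlap; a smaller step_s makes them slide.
    """
    size = int(round(rate_hz * window_s))
    step = size if step_s is None else int(round(rate_hz * step_s))
    if size <= 0 or step <= 0:
        raise ValueError("window and step must span at least one sample")
    last_start = len(samples) - size
    return [list(samples[i:i + size]) for i in range(0, last_start + 1, step)]


if __name__ == "__main__":
    # Hypothetical speed trace sampled once per second.
    speed = [0, 5, 12, 20, 31, 40, 42, 41, 38, 30]
    for segment in fixed_windows(speed, rate_hz=1.0, window_s=3.0):
        print(segment)
```

Each returned segment would then be treated as one instance for the classifier; larger windows give fewer but longer instances, which is the trade-off the window sizes above explore.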
A further consideration concerns the training dataset. It is obtained as a partition of the normalized dataset and is augmented with a column that specifies the classification label associated with the evaluated instance. This column was derived by an expert looking at the evaluated instance in the area considered for the study. This classification is then used both to train the classifier and to perform the evaluation. In the context of RQ 1 , for the experiments E 11 , E 12 and E 13 , the considered datasets are augmented with a column that specifies the driver identity. For E 11 and E 13 the possible driver identity labels are DF 1 , DM 1 , DM 2 , DM 3 (corresponding to all the drivers involved in D A and D C ). For E 12 , which uses the dataset D B , the driver identity label can assume the following values: DM 1 , DM 2 , DM 3 , DM 4 , DM 5 and DF 1 , DF 2 , DF 3 , DF 4 , DF 5 . Similarly, for the experiments E 14 and E 15 the considered datasets are augmented with a column that specifies the driver gender (the driver is labeled as "Male" or "Female"). In the experiment conducted to answer RQ 2 , the dataset D B was augmented with a column that specifies the kind of the path (the path is labeled as "City Street", "Highway" or "Dirt road"), needed to perform the training and the validation. Similarly, in the context of RQ 3 , the explored datasets were augmented with a column that specifies the ownership ("Owner", "Not Owner").

Evaluation Strategy
The validation has been performed using classification quality metrics. The metrics used to evaluate the performance of our approach for the research questions are Precision, Recall, and Accuracy.
Precision has been computed as the proportion of the observations that truly belong to the investigated class (e.g., driver, driver gender, path kind, driver's familiarity) among all those which were assigned to the class. It is the ratio of the number of records correctly assigned to a specific class to the total number of records assigned to that class (correct and incorrect ones):
$$\mathrm{Precision} = \frac{tp}{tp + fp}$$
where tp indicates the number of true positives and fp indicates the number of false positives.
Recall has been computed as the proportion of observations that were assigned to a given class among all the observations that truly belong to the class. It is the ratio of the number of relevant records retrieved to the total number of relevant records:
$$\mathrm{Recall} = \frac{tp}{tp + fn}$$
where tp indicates the number of true positives and fn indicates the number of false negatives.
Accuracy is defined as a statistical measure of how well a binary classification correctly evaluates the instances under analysis with respect to the considered features. Basically, the accuracy is the proportion of true results (both true positives and true negatives) among the total number of instances evaluated:
$$\mathrm{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$$
where tn indicates the number of true negatives.
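As an illustration of the definitions above, the following Python sketch computes per-class precision and recall and the overall accuracy from two lists of labels. The function names and the example labels are hypothetical, and for the multi-class case accuracy is taken here as the fraction of correctly classified instances, which is an assumption about how the metric is applied rather than a detail stated in the text.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def per_class_metrics(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """Return {class: (precision, recall)} computed from tp, fp and fn counts."""
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # guess was assigned to a class it does not belong to
            fn[truth] += 1  # the true class missed this observation
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        assigned = tp[label] + fp[label]
        relevant = tp[label] + fn[label]
        precision = tp[label] / assigned if assigned else 0.0
        recall = tp[label] / relevant if relevant else 0.0
        metrics[label] = (precision, recall)
    return metrics


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of instances whose predicted label matches the true label."""
    correct = sum(truth == guess for truth, guess in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    truth = ["DM1", "DM1", "DF1", "DF1"]  # hypothetical labels
    guess = ["DM1", "DF1", "DF1", "DF1"]
    print(per_class_metrics(truth, guess))
    print(accuracy(truth, guess))
```

Averaging the per-class values gives the average precision and recall reported alongside the per-class figures.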
In the following, for each RQ, the results of the experimentation are described and discussed. The datasets were all replicated once in different conditions of traffic and timing, to avoid the bias of such variables. The results were consistent with the ones obtained on the first replica and hence are not reported in detail. In the online repository 5 all the datasets, along with the replicas, are provided to allow replication of the experiments. Table 3 reports the results for the experiment E 11 performed using the MLP classifier.

Discussion of results
In the first column of Table 3, the adopted window sizes are reported. For each window size, we also specify the number of observations that are collected (for example, considering a 1-second window, the number of collected training segments is 14592). Starting from the third column, the table reports, for each segmentation choice and for each driver (column two), the values of precision, recall, accuracy, and training times obtained by using the MLP classifier. Precision and recall are evaluated for each class (the driver in this context) and as averages. The results of the experiment E 11 performed by using the RF and the DTW classifiers are shown in Table 4 (for brevity, they are shown only for this first experiment). Comparing Tables 3 and 4, we can conclude that the best results are obtained by MLP on almost all the window sizes. Based on the results, we can also conclude that the best segmentation choice for MLP is the threshold-based segmentation. However, MLP is also the slowest classifier in terms of training time. Moreover, MLP and RF perform better than DTW on medium window sizes and for threshold-based segmentation.
The E 11 results are also summarized in Figure 5. The left side of the figure shows the trend of the classification accuracy for MLP, RF, and DTW with respect to the adopted window strategy. The right side shows the training times: RF and DTW are comparable to each other and several times (6x) faster than the MLP network.
As for the experiments E 12 and E 13 , the results are reported in Table 5 and Table 6. As we can see, the data follow the same trends observed for E 11 . The following considerations can be made:
• driver classification on the dataset D A , on a single path, gives better results than the same identification on dataset D B , which is performed on a fixed car but varying the path;
• the effectiveness obtained in E 13 (i.e., regardless of the car and the path) is, as expected, lower than that of E 11 and E 12 .
This is important to estimate the quality of classification for applications where the path or the car is fixed. In those cases, an RF or a DTW approach could be adopted, since precision and recall are higher and could be acceptable (leading to faster training times). Conversely, the MLP approach remains the best classifier in our experimentation in terms of classification quality, and it is the best one to choose for the general case.
Results for precision and recall of the driver gender identification experiments (E 14 and E 15 ) for different window sizes are reported in Table 7 and Table 8, respectively.
The tables show that, even if the best results have been obtained for the threshold-based segmentation, the fixed window of sixty seconds provides quite reasonable results with a much shorter training time. This means that it could be preferred for applications more sensitive to training time.
Table 9 shows the results of the experiment E 21 . It reports, for each segmentation choice, the results (precision, recall, accuracy and training times) of the runs of the MLP classifier for path kind identification among three classes (Highway, City Street and Dirt Road). Precision and recall are evaluated for each class (the path kind in this context) and as averages. Also in this case, the best results are obtained for threshold-based segmentation on the MLP. In particular, the threshold-based segmentation provides the best accuracy of 0.94 (which is also the best accuracy obtained for path kind identification) whereas, for fixed window segmentation, the best accuracy was 0.75 (obtained for the fixed window size of 30s). Fixed windows that are too small (around 1 second) or too wide (more than 60 seconds) provided very poor results, limiting useful window sizes to the range from 10s to 60s.
Table 10 and Table 11 report the results of the experiments E 31 and E 32 , respectively. They report, for each segmentation choice, the results for the MLP classifier in terms of precision, recall, accuracy and training times. The evaluated instance is the familiarity detection, and it is labeled as "Owner" or "Not Owner". Precision and recall are evaluated for each class and as averages.
Threshold-based segmentation on the MLP is confirmed to be the best configuration. For E 31 , the best accuracy (which was also the best overall accuracy achieved for familiarity detection) was 0.97. The performance trend among fixed window sizes is consistent with the other classifiers. For the experiment E 32 , the best accuracy is 0.9, showing that the classifier performs worse when car, driver, and path can change.
Finally, even if it is not possible to directly compare the obtained results with the results of the works discussed in Section 2 (each approach is tested on a different dataset), we can observe that the obtained precision is very encouraging. In particular, we reach a precision of 0.99, compared with the precision of 0.961 obtained in [18] (the best precision described in the related work).
We conclude this section by reporting the trend of the accuracy metric for all the experiments and all the segmentation choices. Figure 6 highlights that the adoption of threshold-based segmentation improves accuracy with respect to fixed windows for all the groups of features.

Threats to validity
In this section the main threats to the validity of our research are discussed.
Construct validity represents the quality of choices about the particular forms of the variables (i.e., the choice of outcome measure or the choice of treatment). These threats concern the relationship between theory and observation. In our proposal, some problems can be introduced by the hypothesis guessing of all the involved
Internal validity is concerned with the possibility that other factors would be more suitable than the proposed features for performing classification. To exclude this eventuality, we performed a specific feature selection step studying correlation and independence for all features available in the OBD II standard. Moreover, in order to best validate the training of the classifier, we adopted a k-fold cross-validation.
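The k-fold cross-validation mentioned above can be sketched in Python as follows. This is a generic illustration and not the procedure used in the study: the value of k, the absence of shuffling, and the function name k_fold_indices are assumptions made only for the example.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k contiguous folds.

    Every sample appears in exactly one test fold; fold sizes differ by at
    most one. No shuffling is applied (an assumption of this sketch).
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    # Ten hypothetical segments split into three folds.
    for train_idx, test_idx in k_fold_indices(10, 3):
        print("train:", train_idx, "test:", test_idx)
```

The classifier would be trained on each train split and evaluated on the matching test split, and the k scores would then be averaged.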
Conclusion validity regards the degree to which the conclusions we state about the relationship (between the treatment and the outcome) are reasonable.
Threats to external validity concern the generalization of our findings. Of course, replication on further projects to confirm or contradict the obtained results is always desirable.

Conclusion
This paper proposes an approach to identify the driver, the familiarity of the driver with the vehicle, and the kind of road, based on the study of a person's behavior while driving. It is based on the assumption that a proper set of behavioral features can be used to: (i) recognize different drivers by capturing their different driving styles; (ii) detect their familiarity with the car; and (iii) detect the kind of road on which they are driving (among several types). Based on this assumption, we extracted, using a monitoring system placed in the cars, an effective set of features whose samples are sent to a time series classification approach. The classifier exploits a supervised learning approach and is based on an MLP network; its performance was compared with classic decision tree classifiers. The proposed time series classifier has proved to be effective at identifying the driver, the driver gender and the road kind after being trained on the proposed set of behavioral features. Specifically, the approach has been evaluated with eight experiments on three datasets made from real data logged on four cars driven by ten drivers in the Naples area. Each experiment explores a specific aspect of the proposed research questions, evaluating the effectiveness of the proposed approach to:
• identify the driver regardless of the car but fixing the path;
• identify the driver regardless of the path but fixing the car;
• identify the driver regardless of both car and path;
• identify the driver gender regardless of the car but fixing the path;
• identify the driver gender regardless of the path but fixing the car;
• identify the path kind fixing the car but regardless of the path and the driver;
• identify the driver familiarity with the vehicle fixing the car but regardless of the path and the driver;
• identify the driver familiarity with the vehicle when car, driver, and path can change.
The obtained results show high accuracy for all the performed experiments. In particular, the proposed approach is very effective in identifying the driver. The best accuracy (0.97) is obtained when identifying the gender of the driver regardless of the car but fixing the path. Good accuracy is also obtained in all the other experiments aiming to perform driver identification (the accuracy value is never less than 0.92). As for driver gender identification (evaluated regardless of the car but fixing the path), the proposed approach shows an accuracy of 0.91, revealing that men and women have different driving styles. Moreover, the proposed approach can also be used to identify the road kind (Highway, City Street, and Dirt Road) by fixing the car but regardless of the path and the driver. The obtained accuracy value is, in this case, equal to 0.94. Effective results are also obtained in the detection of driver familiarity. Here we have a higher accuracy (0.97) in identifying the driver familiarity when the vehicle is fixed. A slightly lower accuracy (0.91) is obtained when car, driver, and path can all change. Further experimentation in this general case should be performed to see whether the size and the number of hidden layers of a single network and the number of networks positively influence the resulting accuracy.
Finally, we also compared the proposed MLP classifier with classic decision tree classifiers. The obtained results show that, even if all the classifiers are characterized by high values of accuracy, the best performance is obtained by using the proposed ensemble MLP classifier. Training times, however, also show that our ensemble classifier is the slowest during the learning phase. This means that our approach is well suited to applications that do not require real-time identification of changing drivers (e.g., the owner can perform continuous training on their car and can be detected with very high levels of accuracy). As future work, we are extending the study by adding a behavioral characterization of the driver using both formal approaches (using model checking) and fuzzy rule extraction from the example data set. This will allow us not only to perform the identification of drivers and paths but also to provide an explanation of how the identified driver is behaving with respect to predefined classes (e.g., polluting driver, aggressive driver, cheap driver). Moreover, the evaluation can be extended to a larger number of drivers, cars, and paths. Finally, the application of the proposed approach to road kind identification can be further explored considering a more accurate road classification model.