Driver Identification Methods in Electric Vehicles, a Review

Abstract: Driver identification is very important for realizing customized services for drivers and road traffic safety for electric vehicles, and it has become a research hotspot in the field of modern automobile development and intelligent transportation. This paper presents a comprehensive review of driver identification methods. The basic process of the driver identification task is proposed as four steps, the advantages and disadvantages of different data sources for driver identification are analyzed, driver identification models are divided into three categories, and the characteristics and research progress of driver identification models are summarized, which can provide a reference for further research on driver identification. It is concluded that on-board sensor data in the natural driving state are objective and accurate and could be the main data source for driver identification. Emerging technologies such as big data, artificial intelligence, and the internet of things have contributed to building deep learning hybrid models with high accuracy and robustness, which represent an important development trend of driver identification methods. Developing a driver identification method with high accuracy, real-time performance, and robustness is an important goal for the future.


Introduction
Electric vehicles are characterized by fast starting, strong power, energy saving, and emission reduction, and they are gradually becoming an important part of the urban public transport system. In the process of the globalization of electric vehicles, several key issues need to be considered, such as the impact of renewable energy and supporting charging facilities, the impact of range and charging time, and the safety of public transportation as it becomes more intelligent, all of which seriously affect the development of electric vehicle technology [1,2]. Driving safety is a research hotspot in the field of electric vehicle technology, and drivers play a key role in the safety of urban public transportation. Driver identification is a basic task in the design of an intelligent cockpit. Once the driver has been accurately identified, the vehicle can create a personalized driving environment and provide a better driving experience and improved safety. Driver identification methods have broad application prospects in advanced driver assistance system design [3], driver profiling [4], fleet management [5], vehicle anti-theft [6], automobile insurance services [7], and online car hailing [8]. For example, a vehicle can automatically adjust the seat position and air-conditioning temperature according to the driver information, load customized parameters for personalized driving assistance, and reduce the risk of vehicle theft and misuse by checking the driver's authorization information. In addition, a fleet can use driving data to build online or offline driver profiles and evaluate drivers accordingly [4], and insurance companies can use the driver information to calculate insurance premiums individually according to each driver's behavior habits [7].
Especially with the spread of shared cars, driver identification can be used in the driver identity authentication systems [8] of online car hailing services to improve service quality. Driver identification methods have become a research hotspot in the field of intelligent connected vehicles and traffic systems [9], with great market application value and important research significance.
In recent years, scholars in the fields of vehicle engineering and intelligent transportation have carried out a great deal of research on driver recognition, resulting in many achievements. The main objective of the driver identification task is to accurately and quickly determine the driver's identity and to support the development of personalized value-added services for drivers and traffic safety. Depending on the target application, researchers design the driver identification task process at different levels of granularity. The basic process of the driver identification task is divided into four steps: data acquisition, data processing, driver identification, and result application, as shown in Figure 1. First, various data related to driving are collected, including driving manipulation, vehicle movement status, and the road environment. Second, the data are preprocessed, for example by cleaning, segmentation, or normalization, to ensure high quality and a proper format, and the data features are designed and selected. The third step is driver recognition: the collected data are fused, a model mapping data features to drivers is established, and the model is used to identify drivers. Lastly, based on the results of driver identification, customized driving assistance, vehicle insurance services, and other engineering applications are carried out. Obtaining high-quality driving data is a key prerequisite for realizing the task of driver identification. With the development of sensor technology and on-board network technology, many data sources are available for driver identification, such as driver biometric data, driving simulation data, and natural on-board sensor driving data.
Each type of data has different characteristics, and the choice of data directly affects identification accuracy and robustness. Another key is to build a high-performance driver recognition model. In the early stage, some scholars employed models such as the hidden Markov model (HMM), Gaussian mixture model (GMM), random forest (RF), support vector machine (SVM), linear discriminant analysis (LDA), artificial neural network (ANN), k-nearest neighbor (KNN), and extra trees (ET) to identify drivers. For example, Choi et al. [10] used the HMM to identify drivers with an accuracy rate of 70%, Miyajima et al. [11] used the GMM to identify drivers with an accuracy rate of 76.8%, and Qian et al. [12] used the SVM model to identify the driver with an accuracy rate of 85%. Ezzini et al. [13] applied RF, KNN, ET, and other models to identify the driver with an accuracy of more than 90%. Although these models can recognize the driver's identity based on a small number of data samples, the recognition process relies on personal experience to manually extract features, and the recognition accuracy is greatly affected by the features chosen.
World Electr. Veh. J. 2022, 13, 207
With the development of new technologies such as big data and artificial intelligence, some scholars have explored the use of deep learning methods such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) networks to build models that further improve driver recognition performance. For example, Rundo et al. [3] used a CNN for driver recognition, with an accuracy rate of more than 90%. These deep learning models can automatically extract features of different depths from data and can achieve higher recognition accuracy given enough data samples. However, they require large data samples and more training time than traditional models to obtain a valid model. In recent years, some scholars have attempted to use hybrid models to further improve recognition quality and reduce training time. For example, Mekki et al. [8] constructed a hybrid model combining a CNN and LSTM, which not only has good robustness and real-time performance but also achieves an accuracy rate of 95% for driver recognition. Hybrid models exploit the complementary characteristics of different models and may have bright application prospects in the field of driver identification.
Prior reviews have also explored driver recognition. The current study differs from prior studies in that it presents key findings on the recognition methods used for driver identification, which is its main contribution to the field. To be more specific, prior reviews focused on driver behavior such as fatigue and distraction, and another systematic survey focused on vehicle behavior such as lane changing and car following. However, to the best of our knowledge, no systematic review focusing on the data and models of driver identification is available in this field yet.
Therefore, the main purpose of this paper is to identify the models used in prior studies, and particularly the features of these models, by reviewing the research on and application of driver identification models in recent years. In addition, the paper proposes the basic process of the driver recognition task; analyzes the advantages and disadvantages of biometric data, driving simulator data, and on-board sensor data used for driver recognition; and compares the characteristics of various models. It also discusses future development trends in driver identification models, data collection and processing, and model construction.
The main contributions of this paper are as follows: (i) It elaborates on recent research on driver recognition models, presenting conclusions on the state of the art in the field and a potential solution to help researchers develop models with higher performance. (ii) It clarifies the characteristics of the data sources for driver recognition, providing a reference for further research on data collection and helping researchers find more suitable data for training driver recognition models. (iii) It serves as a guide for future research on driver identification methods, addressing ambiguities in driver recognition and providing valuable information on technical trends to promote the development of electric vehicles.
The structure of this paper is as follows: a brief description of the data sources for driver identification and the characteristics of different data are presented in Section 2. Driver identification models are discussed and analyzed in Section 3. Lastly, the paper gives some conclusions and prospects.

Data for Driver Identification
Researchers have carried out numerous studies on data sources for identifying drivers. The main data sources can be classified into three categories: biometric data, driving simulator data, and on-board sensor data, as shown in Figure 2.

Biometric Data
Some researchers use driver biometric data to identify drivers. Common data include face [14], action [15], fingerprint [16], voice [17], state [18], gesture [19], and grip pattern [20]. These data directly reflect the biological characteristics of the driver, so they are highly specific and accurate. However, new devices are usually required to acquire them, which takes up space and increases cost. In addition, continuous real-time recognition is difficult, the collected information may infringe on personal privacy, and applications are limited because the data can be easily falsified [21].

Driving Simulator Data
Some early scholars used driving simulation to collect the data needed to identify the driver [22–24]. For example, Woo et al. [22] used a driving simulator to simulate the traffic environment to obtain training data and evaluated the effectiveness of the driving simulation data for constructing the driver identification model. Driving simulation offers strong flexibility for collecting data under various working conditions, roads, traffic, and weather conditions, but its main disadvantage is that a driving simulator has limited ability to reproduce and capture real driving conditions, so the driving simulation data may be inconsistent with real driving data [24].

On-Board Sensor Data
With the rapid development of sensor technology, on-board controller area network (CAN) technology, network communication technology, and intelligent terminals, on-board sensor devices now collect massive amounts of objective, accurate, high-quality data in real driving scenes. These data contain rich information that can be used to identify the driver [10] and have attracted great interest from scholars. Common sources include CAN bus data, GPS data, IMU data [25], and data from smartphones and smart watches. With the application of CAN technology in modern vehicles, on-board CAN bus data have become the main data source for identifying drivers. Some researchers have used on-board CAN bus data for driver identification with success [5,6,26–30]. For example, Kwak et al. [5] used 26 data items related to the fuel, gearbox, wheels, and engine collected from the on-board CAN bus to accurately identify the driver through statistical analysis and feature extraction. Azadani et al. [27] proposed a method that identifies the driver from 30 s of steering data collected by a CAN bus. Some scholars also classify drivers by using CAN bus sensor data [31]. GPS data are another kind of on-board sensor data used to identify drivers. They can be obtained from an on-board navigation system or general handheld devices and have the advantages of low acquisition cost, strong scalability, good practicability, and real-time availability. Some researchers use GPS data to identify drivers [9,32,33]. With the popularity of mobile terminals such as smartphones and smart watches, researchers have also turned to driver identification based on data collected by their built-in sensors [34–37].
To sum up, many data sources can be used for driver identification, and different data have varying characteristics; the strengths and weaknesses of different data are listed in Table 1. The use of on-board sensors to collect vehicle motion state data has little impact on personal privacy and can be used to actively and continuously verify the driver's identity during driving. It has good feasibility, safety, and reliability and is gradually becoming the mainstream method for identifying the driver.
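As an illustration of how a continuous on-board signal might be prepared for identification, the following minimal sketch cuts a signal into fixed-duration windows (such as the 30 s of steering data mentioned above) and rescales each window. The function names, the 10 Hz sampling rate, the 15 s step, and the synthetic steering trace are all assumptions made for this example and are not taken from any of the cited studies.

```python
from typing import List, Sequence


def segment_windows(samples: Sequence[float], rate_hz: float,
                    window_s: float, step_s: float) -> List[List[float]]:
    """Split a uniformly sampled signal into fixed-duration windows."""
    size = int(round(window_s * rate_hz))  # samples per window
    step = int(round(step_s * rate_hz))    # samples between window starts
    if size <= 0 or step <= 0:
        raise ValueError("window and step must cover at least one sample")
    # Only full windows are kept; a trailing partial window is dropped.
    return [list(samples[i:i + size])
            for i in range(0, len(samples) - size + 1, step)]


def min_max_normalize(window: Sequence[float]) -> List[float]:
    """Rescale one window to [0, 1]; a constant window maps to zeros."""
    lo, hi = min(window), max(window)
    if hi == lo:
        return [0.0 for _ in window]
    return [(v - lo) / (hi - lo) for v in window]


# Hypothetical steering-angle trace: 120 s at an assumed 10 Hz sampling rate.
steering = [float(i % 50) for i in range(1200)]
windows = segment_windows(steering, rate_hz=10.0, window_s=30.0, step_s=15.0)
normalized = [min_max_normalize(w) for w in windows]
```

Overlapping windows (a step shorter than the window) yield more training segments from the same trip, at the cost of correlated samples; whether that trade-off is appropriate depends on the dataset.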

Driver Identification Models
According to the characteristics of the models, driver identification methods are divided into three categories: traditional machine learning models, deep learning models, and hybrid models, as shown in Figure 3.

Traditional Model
Machine learning refers to methods that learn general laws from limited observation data and use these laws to make predictions on unknown data. A traditional machine learning model first expresses the data as a set of features and then feeds these features to a classifier to predict the output. The feature representation is obtained mainly through manual experience or feature transformation methods, and the extracted features have a great impact on the recognition accuracy of the model. Common models include SVM, RF, GMM [38,39], HMM [40,41], extra trees [42], LDA [43], GDT [44,45], ANN [46,47], and KNN [18,48]. This paper focuses on the progress made in using SVM and RF models to identify drivers.
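To make the idea of manually designed features concrete, the sketch below turns one window of a driving signal into a small feature dictionary. The choice of statistics, the function name, and the sample speed values are illustrative assumptions; the cited studies each define their own feature sets.

```python
import statistics
from typing import Dict, Sequence


def window_features(window: Sequence[float]) -> Dict[str, float]:
    """Hand-crafted summary statistics for one signal window."""
    # Differences between consecutive samples describe how quickly the signal changes.
    diffs = [b - a for a, b in zip(window, window[1:])]
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "min": min(window),
        "max": max(window),
        "mean_abs_change": statistics.fmean([abs(d) for d in diffs]) if diffs else 0.0,
    }


# Hypothetical vehicle-speed window (values are made up for illustration).
speed_window = [10.0, 12.0, 11.0, 15.0]
features = window_features(speed_window)
```

A classifier such as an SVM or RF would then be trained on one such feature vector per window, which is why the quality of the chosen statistics limits the achievable accuracy.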

SVM Model
The SVM is a binary classification model. Its basic form is the linear classifier with the maximum margin in the feature space. The basic principle of the SVM algorithm is to find the separating hyperplane that correctly divides the training dataset and has the largest geometric margin, as shown in Figure 4. Many scholars use SVMs to carry out driver recognition research [12,21,50–53]. For example, Qian et al. [12] collected signals from the steering wheel, accelerator pedal, and brake pedal by using a driving simulator and identified the driver through a series of multiple SVM models, with an accuracy rate of 85%. Rahim et al. [21] used an SVM to identify the driver based on features extracted from GPS data, with an accuracy rate of 89.16%. Burton et al. [50] extracted features from data such as vehicle speed, steering wheel angle, and pedal pressure and adopted the SVM algorithm to achieve an accuracy rate of 95% within 80 s. Marchegiani et al. [51] proposed a driver recognition framework that combines the SVM and a universal background model. Data collected by the CAN bus, such as the accelerator pedal signal and brake pedal signal, were used to identify four drivers, with an accuracy rate of more than 95%.
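For reference, the maximum-margin principle stated at the start of this subsection is commonly written as the following optimization problem for linearly separable data. The notation is an assumption introduced here and is not taken from the source: x_i is the feature vector of sample i, y_i in {-1, +1} is its class label, and w and b define the hyperplane.

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1,
\qquad i = 1,\dots,N
```

Under these constraints the distance between the two supporting hyperplanes is 2/||w||, so minimizing ||w|| maximizes the margin. Because this formulation separates only two classes, identifying more than two drivers requires combining several binary SVMs, for example in a one-vs-rest scheme; this is a standard construction and not a description of any particular cited study.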

RF Model
The RF model is a nonlinear classifier that uses multiple trees for ensemble learning on samples to complete prediction. It builds an ensemble classification model composed of many decision trees in a random manner, as shown in Figure 5. The RF model processes large amounts of high-dimensional data by randomly selecting input samples and features. It has high training efficiency and low generalization error. Some scholars use it to identify drivers [28,55–59]. For example, Hallac et al. [28] proposed to identify drivers using the RF model for CAN bus data collected during turning and driving, with an accuracy rate of 76.9%. Wang et al. [55] fed statistical features extracted from CAN bus data into an RF model composed of 1000 classification trees, and the recognition accuracy was up to 100%. Luo et al. [56] used the RF model to conduct ensemble learning on samples composed of driving data, and the accuracy of identifying 4 and 15 drivers was 89.14% and 60.36%, respectively. Enev et al. [57] adopted the RF model and realized the accurate identification of 15 drivers by learning from on-board CAN bus data. Lestyan et al. [58] adopted the RF model and used on-board CAN bus data and GPS data. The average accuracy rate of driver identification was 77%, and the highest was 87%. Tanaka et al. [59] used the RF model to identify 1000 drivers and found that the identification accuracy gradually decreased as the number of drivers increased. The main problem of the RF model is that it can overfit on some noisy classification or regression tasks.
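The two sources of randomness described above, random selection of input samples and of features, followed by voting across trees, can be sketched in a few lines. This is a deliberately simplified illustration: each "tree" is a single-split decision stump rather than a full decision tree, and the function names, the two toy features, and the driver labels "A" and "B" are assumptions made for the example.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# (feature index, threshold, label if value <= threshold, label otherwise)
Stump = Tuple[int, float, str, str]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[str], feature: int) -> Stump:
    """Pick the threshold on one feature that misclassifies the fewest samples."""
    best_errors, best_stump = None, None
    for threshold in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best_errors is None or errors < best_errors:
            best_errors = errors
            best_stump = (feature, threshold, left_label, right_label)
    return best_stump


def fit_forest(X: Sequence[Sequence[float]], y: Sequence[str],
               n_trees: int, seed: int = 0) -> List[Stump]:
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        feature = rng.randrange(len(X[0]))        # random feature for this stump
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest: List[Stump], row: Sequence[float]) -> str:
    """Majority vote over all stumps."""
    votes = Counter(left if row[feature] <= threshold else right
                    for feature, threshold, left, right in forest)
    return votes.most_common(1)[0][0]


# Toy data: (mean speed, mean absolute steering change) for two hypothetical drivers.
X = [(30.0, 2.0), (32.0, 3.0), (31.0, 2.5), (29.0, 2.2),
     (50.0, 6.0), (52.0, 7.0), (49.0, 6.5), (51.0, 5.8)]
y = ["A"] * 4 + ["B"] * 4
forest = fit_forest(X, y, n_trees=25, seed=0)
```

An odd number of trees avoids tied votes in the two-driver case. Real RF implementations grow deep trees and re-draw the candidate features at every split, which this sketch omits for brevity.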
Some scholars have also compared the performance of different machine learning models on driver recognition [13,60,61]. For example, Ezzini et al. [13] compared the KNN, RF, ET, decision tree, gradient boosting, and other models for driver recognition and pointed out that the RF and ET models performed best. They also observed that traditional machine learning methods need features to be extracted manually and that the generalization ability of the models is poor. Using the same CAN bus data as input, Kwak et al. [60] compared the driver recognition accuracy of the KNN, RF, and multi-layer perceptron (MLP) algorithms. Jafarnejad et al. [61] proposed a driver identification method based on vehicle sensor data; compared the performance of five classical machine learning classifiers, namely RF, SVM, gradient boosting, AdaBoost, and decision tree; and pointed out that the AdaBoost classifier achieved the best accuracy, reaching 89%.
It can be seen from the above studies that the recognition accuracy of traditional machine learning models generally does not exceed about 90% and is greatly affected by the selected model and input dataset. This is mainly because traditional machine learning methods rely on manual feature extraction, have difficulty capturing complex temporal features, and fit high-dimensional nonlinear data poorly. They also have difficulty handling large data samples.

Deep Learning Model
The deep learning method originates from the artificial neural network. It stacks multiple hidden layers and processes the input layer by layer, transforming an input representation that is only loosely related to the target into a higher-order feature representation that is more closely related to it. This method has been gradually applied to the driver identification task since 2016 and has shown good performance in capturing and characterizing latent driving behaviors [62]. With the advent of the cloud computing and big data era and the great improvement in computing power, applying deep learning to map driving data to driver-specific characteristics has achieved good results in identifying drivers. Common models include the CNN, RNN, and MLP [63]. This paper focuses on research into driver identification methods based on CNNs and RNNs.

CNN Model
A CNN is a neural network that adds convolution operations to a deep neural network (DNN) to automatically extract features and is especially suited to processing data with a grid-like structure. It has the characteristics of local connection, weight sharing, and automatic feature extraction. However, the CNN is limited in its ability to model changes in time series. The CNN is built by alternately stacking convolution layers, pooling layers, and fully connected layers. The typical structure and application are shown in Figure 6.
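Local connection and weight sharing can be seen in a one-dimensional convolution followed by pooling, which is the kind of operation a CNN would apply to a driving signal. The sketch below is a minimal pure-Python illustration; the kernel values, the toy signal, and the function names are assumptions chosen for the example, and a trained CNN would learn its kernel weights from data.

```python
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Valid 1D cross-correlation: the same kernel weights slide over every position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(values: Sequence[float]) -> List[float]:
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def max_pool1d(values: Sequence[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


# Hypothetical steering-angle snippet and a hand-picked kernel that responds to rising values.
signal = [0.0, 0.0, 1.0, 3.0, 3.0, 1.0, 0.0, 0.0]
kernel = [-1.0, 0.0, 1.0]

feature_map = conv1d(signal, kernel)        # [1.0, 3.0, 2.0, -2.0, -3.0, -1.0]
pooled = max_pool1d(relu(feature_map), 2)   # [3.0, 2.0, 0.0]
```

Each output value depends only on three neighboring input samples (local connection), and the three kernel weights are reused at every position (weight sharing), which is also why a single convolution layer sees only a short time span.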
The earliest CNN structure was proposed by LeCun et al. [64] in 1998 for handwritten digit recognition. The challenge of the driver recognition task is to extract and capture the relevant features of each driver's unique driving behavior. A CNN can automatically extract features from driving data, which provides a good solution. Some researchers have used CNN models to identify drivers from driving data [3,27,31,62,65–71]. For example, Jeong et al. [65] normalized, preprocessed, and segmented CAN bus data and used a CNN to identify four drivers, with an accuracy rate of 90%. A real vehicle test showed that the accuracy rate could reach 80% in only 4-5 min. Azadani et al. [62] used feature data such as speed, acceleration, and steering wheel angle as inputs and used a CNN to extract driving-behavior-related features and identify drivers. Chen et al. [66] proposed a multi-channel CNN model for driver classification. Cai et al. [67] used a CNN to identify drivers, with an accuracy rate of 93.5%. Choi et al. [68] proposed a method for driver recognition using a multi-stream convolutional neural network, with an accuracy rate of 98.9%. Azadani et al. [27] proposed a driver identification method using a connected temporal convolutional network based on driver steering behavior analysis. By mapping 30 s of a driver's driving data to a deep feature representation, driver recognition and impostor detection tasks were realized. The model was evaluated on a real driving dataset of 95 drivers, and the results showed that the deep learning model had excellent performance. Hu et al. [69] proposed a general framework for driver identification based on a one-dimensional convolutional neural network using vehicle CAN bus driving status data. In identifying 20 drivers, the macro F1 score reached 99.1%, which is superior to the SVM, MLP, and LSTM, with stable performance and strong robustness.
Some scholars have also improved the CNN for driver recognition. For example, Abdennour et al. [31] introduced residual computation into the CNN and proposed a driver identification method based on a residual convolution network (RCN). This method takes the raw signal sequence of vehicle CAN bus data as input and reaches an accuracy of 99.3%, which is better than traditional machine learning methods.
The disadvantage of the CNN is that it can only capture local temporal features whose span corresponds to the convolution kernel length; it cannot capture long-term temporal features of the data sequence.
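This locality can be illustrated with a plain one-dimensional convolution: each output value depends only on `len(kernel)` consecutive samples, so a change outside that window leaves the output untouched. The sketch below is a generic illustration (no padding, stride 1) and not the architecture of any cited work; the speed values and the kernel are made up.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation (the operation a CNN layer computes), stride 1."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Hypothetical speed samples and a 3-tap kernel
speed = [10.0, 12.0, 15.0, 15.0, 14.0, 20.0, 21.0, 19.0]
kernel = [-1.0, 0.0, 1.0]  # responds to the change across a 3-sample window

features = conv1d(speed, kernel)

# Perturb only the last sample: outputs whose window does not cover it stay equal
perturbed = speed[:-1] + [100.0]
features_perturbed = conv1d(perturbed, kernel)
unchanged = [a == b for a, b in zip(features, features_perturbed)]
```

Only the final entry of `unchanged` is `False`. Stacking layers widens the window, but it remains bounded by the architecture, which is why recurrent models are considered next.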

RNN Model
The RNN is a neural network with temporal memory. Its neurons accept not only the outputs of other neurons but also their own previous output, so the output at each time step results from the interaction of the current input with the entire input history. The basic RNN is a looped network structure composed of an input layer, a hidden layer, a delay unit, and an output layer, as shown in Figure 7. The basic RNN can model time-series data, but it suffers from vanishing and exploding gradients when learning long-term dependencies.
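In a common textbook formulation (the symbols below are assumptions and not necessarily the notation of Figure 7), the basic RNN updates its hidden state $h_t$ from the current input $x_t$ and the previous state $h_{t-1}$:

$$
h_t = \tanh\left(W_x x_t + W_h h_{t-1} + b\right), \qquad y_t = W_y h_t + c
$$

Backpropagating from step $T$ to an earlier step $k$ multiplies one Jacobian per time step:

$$
\frac{\partial h_T}{\partial h_k} = J_T \, J_{T-1} \cdots J_{k+1}, \qquad J_t = \operatorname{diag}\left(1 - h_t \odot h_t\right) W_h
$$

As $T - k$ grows, this product tends either to shrink toward zero or to grow without bound, which is the source of the gradient problem described above.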

Figure 6. CNN structure, adopted from [65].

To solve this problem, researchers introduced gating mechanisms into the basic RNN. The main gated models are the LSTM and the gated recurrent unit (GRU). The LSTM maintains and controls information through a cell state (the upper horizontal line in Figure 8), to which information is added or from which it is removed by the input gate, the forget gate, and the output gate. The recurrent cell structure is shown in Figure 8. The LSTM does not need convolution to obtain discriminative features and can learn and remember temporal correlations in the input sequence, which is conducive to driver recognition based on time-series driving data.
Some scholars used the LSTM model for driver identification research [73-86]. For example, Rundo et al. [73] proposed an algorithm that uses an LSTM architecture to identify automobile drivers, with an accuracy of nearly 99%. Ravi et al. [74] trained an LSTM model on on-board sensor data, combined it with hyperparameter optimization, and achieved high accuracy in driver identification. Khairdoost et al. [75] proposed an LSTM-based deep learning method that uses the driver's gaze and head position together with vehicle dynamics data to predict the driver's actions, with a speed of 3.6 s and an F1 score of 84%. The LSTM model proposed by Girma et al. [76] predicts the driver's identity from the individual driving patterns learned from vehicle telematics data and is robust to noise; its prediction accuracy on three natural driving datasets remains satisfactory and is better than that of traditional machine learning models and the CNN. Choi et al. [77] used LSTM for driver recognition, with an average accuracy of 96.5%. Dang et al. [78] proposed a method that identifies drivers by combining a connected network architecture with the LSTM model, which greatly improves the generalization ability and robustness of model recognition.
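For reference, the gate mechanism can be written in one widely used form. The notation is an assumption chosen for illustration ($\sigma$ is the logistic sigmoid, $\odot$ the element-wise product), and the cited works may use variants:

$$
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
$$

The additive update of $c_t$ lets information pass across many time steps with little attenuation when $f_t$ is close to 1, which is how the LSTM eases the long-term dependence problem of the basic RNN.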
Compared with the LSTM, the GRU has a simpler structure, fewer model parameters, and higher learning efficiency. The GRU introduces an update gate that controls how much of the historical state the current state retains, thereby balancing new input against forgetting. The recurrent unit structure is shown in Figure 9. Some scholars used the GRU model for driver identification research [87-95]. For example, Li et al. [87] used the GRU model to identify drivers with an accuracy of 97.3%. Gahr et al. [88] used a GRU network to identify drivers from steering wheel data in natural driving scenarios; the results show that the GRU model can improve driver identification accuracy by more than 50% compared with the traditional method. Carvalho et al. [91] used the GRU model to identify driving behavior with an accuracy of more than 95%.
To sum up, the deep learning model constructs a network with a certain depth and width, nonlinearly maps the original shallow driving data features layer by layer, and automatically extracts distributed features from massive data and converts them into a good deep feature representation, ultimately improving the accuracy of driver identification. Its most important advantage is that the features used for driver identification are learned autonomously, whereas the traditional machine learning model relies on the experience of domain experts to design features manually. Another main difference is that, for the traditional machine learning model, increasing the quantity of data cannot continuously increase the total amount of knowledge learned, whereas the deep learning model can make use of more driving data and can be applied to the driver identification task once it has obtained sufficient experience. It can mine latent features of on-board sensor data, realize end-to-end learning, reduce the inconsistency between feature representation and classification criteria, overcome the difficulty of capturing complex temporal features, and achieve higher accuracy and robustness. It is therefore increasingly popular in driver identification. However, the deep learning model demands large computing resources, and it raises problems such as the difficulty of attributing the recognition results to individual inputs and the need for a large number of samples [81].
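Returning to the GRU update gate described earlier in this section, the balance between retaining the historical state and writing new information can be sketched for a scalar input and a scalar hidden state. This is a minimal illustration with made-up weights; the function `gru_step`, the parameter layout, and the sample values are assumptions, and the sign convention of the update gate varies between references.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One GRU update for a scalar input x and a scalar hidden state h_prev.

    p maps 'z' (update gate), 'r' (reset gate) and 'n' (candidate state)
    to hypothetical (input weight, recurrent weight, bias) triples.
    """
    z = sigmoid(p["z"][0] * x + p["z"][1] * h_prev + p["z"][2])
    r = sigmoid(p["r"][0] * x + p["r"][1] * h_prev + p["r"][2])
    n = math.tanh(p["n"][0] * x + p["n"][1] * (r * h_prev) + p["n"][2])
    # z near 0 keeps the old state; z near 1 overwrites it with the candidate
    return (1.0 - z) * h_prev + z * n


params = {"z": (0.5, 0.5, 0.0), "r": (0.5, 0.5, 0.0), "n": (1.0, 1.0, 0.0)}

h = 0.0
for x in [0.2, 0.4, -0.1, 0.3]:  # made-up, normalized steering samples
    h = gru_step(x, h, params)

# With a strongly negative update-gate bias the old state is retained
frozen = dict(params, z=(0.5, 0.5, -50.0))
h_kept = gru_step(1.0, 0.7, frozen)
```

The GRU needs three weight groups where the LSTM needs four, which is one concrete sense in which it has fewer parameters.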

Hybrid Model
To make full use of the advantages of different basic models, some scholars attempted to further improve the performance of driver recognition by using hybrid models [8,29,96-101]. In this paper, hybrid models are divided into series, parallel, and compound types. Some researchers used the series hybrid model for driver identification. For example, Zhang et al. [29] built a hybrid model by connecting a CNN and an RNN in series (as shown in Figure 10) and used on-board CAN bus data for driver identification, with an identification accuracy of up to 98.36%. Moosavi et al. [96] also connected a CNN and an RNN in series: the CNN captures the semantic patterns of driver behavior from the on-board CAN data, and the RNN encodes the driving style to establish a mapping relationship, so that drivers are identified effectively.
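The series arrangement can be sketched as a pipeline in which a convolution stage extracts local patterns and a recurrent stage summarizes them over time before classification. The code below is a toy illustration with untrained, made-up parameters; it is not the model of [29] or [96], and all function names are hypothetical.

```python
import math


def conv1d(signal, kernel):
    """Valid 1D cross-correlation, stride 1 (stand-in for the CNN stage)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def rnn_encode(features, w_in, w_rec):
    """Run a scalar tanh RNN over the feature sequence; return the last state."""
    h = 0.0
    for f in features:
        h = math.tanh(w_in * f + w_rec * h)
    return h


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def identify(signal, kernel, w_in, w_rec, class_weights):
    """Series pipeline: CNN features -> RNN summary -> per-driver probabilities."""
    features = conv1d(signal, kernel)             # local patterns
    summary = rnn_encode(features, w_in, w_rec)   # temporal summary
    return softmax([w * summary for w in class_weights])


# Hypothetical, untrained parameters and a made-up speed trace
probs = identify(
    signal=[10.0, 12.0, 15.0, 15.0, 14.0, 20.0],
    kernel=[-1.0, 0.0, 1.0],
    w_in=0.1,
    w_rec=0.5,
    class_weights=[1.0, -1.0, 0.5],  # three hypothetical drivers
)
```

In a real system each stage would have many channels and trained weights; the point here is only the order of the stages, with the recurrent part consuming the convolutional features rather than the raw signal.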
Figure 10. The hybrid model adopted from [29].
Some researchers adopted parallel hybrid models for driver recognition. For example, Mekki et al. [8] proposed a hybrid model that feeds driving data into a CNN and an LSTM in parallel, in the form of univariate data and multivariable time-series data. The model learns the temporal correlation in the driving data series; the resulting driver recognition model has good generalization ability and robustness, with an accuracy as high as 95%. Hammann et al. [97] proposed a hybrid model composed of an LSTM and a ResNet in parallel (as shown in Figure 11) to improve recognition accuracy and efficiency, and the recognition accuracy for five drivers on the UTDrive dataset reached 96.90%.
Figure 11. The hybrid model adopted from [97].
Some researchers proposed composite hybrid models. Moreira et al. [98] developed a driver identification method by combining SVM, RF, boosted C4.5, and LVQ models into an integrated hybrid model (as shown in Figure 12), greatly reducing the generalization error. Jafarnejad et al. [99] proposed a hybrid model architecture composed of embedded modules and an RNN layer, which identifies drivers by extracting features from GPS trajectory data. When the number of drivers is 5-100, the recognition accuracy is at least 86%.
Figure 12. The hybrid model adopted from [98]: (a) preprocessing; (b) model introduction.
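The idea of combining several base classifiers into one decision can be illustrated with a majority vote. Majority voting is used here as a simpler stand-in for the combination rule and is not claimed to be the scheme of [98]; the three rule-based functions below are hypothetical placeholders for trained models such as an SVM or a random forest, and the feature names and thresholds are made up.

```python
from collections import Counter


def majority_vote(votes):
    """Most frequent label; on a tie, the label that appeared first wins."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(base_models, sample):
    """Query every base model and combine their labels by majority vote."""
    votes = [model(sample) for model in base_models]
    return majority_vote(votes), votes


# Hypothetical rule-based stand-ins for trained base classifiers
def by_mean_speed(s):
    return "driver_A" if s["mean_speed"] > 50.0 else "driver_B"


def by_brake_rate(s):
    return "driver_A" if s["brakes_per_km"] < 2.0 else "driver_B"


def by_steering_var(s):
    return "driver_B" if s["steering_var"] > 0.3 else "driver_A"


sample = {"mean_speed": 62.0, "brakes_per_km": 2.5, "steering_var": 0.1}
label, votes = ensemble_predict(
    [by_mean_speed, by_brake_rate, by_steering_var], sample
)
```

Here one base model disagrees and is outvoted, which shows in miniature how an ensemble can reduce the error of its individual members.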
In conclusion, the hybrid model can select the basic model according to the target requirements and data resources and design different types of model combinations or learning networks with different depth and width structures to further improve the accuracy, robustness, and real-time performance of driver recognition.
From the above analysis of the three types of recognition models, it can be seen that each type has different performance characteristics. The traditional machine learning model usually needs expert experience to extract statistical features from the data, so its accuracy is greatly affected by the manual selection of features. The deep learning model offers automatic feature extraction, seamless connection with the classifier, and high accuracy, but it requires a large sample size, consumes considerable computing power and resources, and is greatly affected by the correlation of the feature data and the length of the sample data. The hybrid model makes full use of the advantages of different models and has strong generalization ability and good robustness.

Summaries and Prospects
The driver identification method is an important research direction in the development of intelligent vehicles and advanced transportation systems and plays an important role in engineering fields such as driving assistance and traffic safety. This paper reviews the development trends of techniques for driver identification. Its main contribution is to help future researchers study driver recognition technology in more detail: it provides a reference for further research on data collection, points toward potential solutions for developing models with higher performance, and offers information on technical trends to promote the development of electric vehicles. In addition, the basic process of the driver identification task is summarized into four steps: data sampling, data processing, driver identification, and engineering application. The two key elements of the driver identification task, namely data collection and model construction, are analyzed and discussed. By comparing the advantages and disadvantages of various data, it is pointed out that on-board sensor data contain rich information and can be used as the main data source for identifying drivers. By analyzing the research progress of the traditional machine learning model, deep learning model, and hybrid model, it is shown that the deep learning model has advantages in recognition accuracy and robustness over the traditional machine learning model and will become the mainstream method for accurate driver recognition. It is also shown that the hybrid model can select models according to the task target and data resources, making full use of the advantages of different models to further improve generalization ability and recognition robustness, and therefore has good development prospects.
The characteristics of the three recognition models described in this paper are summarized in Table 2. Although researchers have made some achievements in the performance of driver recognition, much work remains to be done. Based on the description above, driver identification methods have been well developed. However, as a dynamically coupled system, driver identification is subject to many factors in practice, and it still faces multiple challenges, such as improving accuracy and computational efficiency for online applications. The pursuit of higher accuracy, real-time performance, and robustness is an important development trend of driver identification methods. The outlook for their future development can be described in four aspects: higher-quality data, higher-performance models, higher-performance hardware, and higher-performance software. First of all, the reliability of a driver identification method largely depends on the data quality, the quantity of data, and the diversity of driving scenes [102]. Because they are objective, accurate, reliable, massive, non-intrusive for the driver, protective of personal privacy, and cheap to acquire, vehicle on-board multi-sensor data have gradually become the main data source for driver identification. Data augmentation and generation technology [103,104] and automatic data labeling technology [105] have become important means of supplementing data. Second, the driver identification model is the key to accurately identifying the driver. Deep learning methods that overcome the sharp decline in recognition accuracy and generalization ability caused by changes in driver groups or driving scenarios can become the mainstream driver recognition models.
At the same time, different models can be adopted in different stages of driver recognition, so a hybrid model that combines the advantages of various models is a promising direction for further research. Finally, some engineering application scenarios require real-time and accurate driver identification, which poses new challenges to the performance of the hardware and software used in the driver identification system. Software and hardware technologies must complement each other to further improve the accuracy, real-time performance, and robustness of the driver identification system. Scholars need to continue in-depth theoretical and applied research on driver identification methods, which is significant for improving vehicle and traffic safety.

Conflicts of Interest:
The authors declare no conflict of interest.