Improve Aggressive Driver Recognition Using Collision Surrogate Measurement and Imbalanced Class Boosting.

Real-time recognition of risky driving behavior and aggressive drivers is a promising research domain, thanks to powerful machine learning algorithms and the big data provided by in-vehicle and roadside sensors. However, aggressive drivers are infrequent in real traffic, and most machine learning algorithms treat each sample equally, so they are prone to predict normal drivers better than aggressive drivers, who are our real interest. This paper aims to test the advantage of imbalanced class boosting algorithms in aggressive driver recognition using vehicle trajectory data. First, a surrogate measurement of collision risk, called Average Crash Risk (ACR), is proposed to calculate a vehicle's crash risk. Second, each driver's aggressiveness is determined from his/her ACR with three anomaly detection methods. Third, we train classification models to identify aggressive drivers using partial trajectory data. Three imbalanced class boosting algorithms, SMOTEBoost, RUSBoost, and CUSBoost, are compared with cost-sensitive AdaBoost and cost-sensitive XGBoost. Additionally, we try two resampling techniques with AdaBoost and XGBoost. Among all algorithms tested, CUSBoost achieves the highest or the second-highest Area Under Precision-Recall Curve (AUPRC) in most datasets. We find that the discrete Fourier coefficients of the gap are the key feature for identifying aggressive drivers.


Introduction
Active safety systems, such as Forward Collision Warning (FCW), Dynamic Brake Support (DBS), and Autonomous Emergency Braking (AEB), are widely offered by automobile companies in today's new cars. Active safety systems use radar, camera, or ultrasonic sensors to detect vehicles, pedestrians, and obstacles. Once an imminent collision is detected, these systems alert drivers or intervene to avoid crashes. Meanwhile, with the increasing ability to collect driving behavior data, there is an emerging possibility of using machine learning to identify aggressive drivers and prevent crash events. Aggressive drivers behave recklessly, for example by speeding, following improperly, and changing lanes riskily, and put people and property at risk. In recent years, many studies have worked on driving pattern identification based on naturalistic driving experiments [1][2][3], in which experiment vehicles were equipped with cameras to capture driver motions and the surrounding environment [4,5], and with specialized sensors to collect vehicle model to better predict aggressive drivers. Resampling methods oversample the minority samples or undersample the majority samples to make the training data more balanced. Existing research has tried cost-sensitive learning or used resampling methods as data preprocessing, without modifying the machine learning algorithms themselves.
Different from existing studies, this paper implements imbalanced class boosting algorithms, such as SMOTEBoost [31], RUSBoost [32], and CUSBoost [33], which have not yet been applied to the recognition of driving behavior or driving style. Imbalanced class boosting algorithms apply the resampling method in each iteration of boosting to increase the number of minority class samples or decrease the number of majority class samples. These algorithms have been tested on many datasets, and they give better results than using the resampling method only as data preprocessing [31][32][33]. The recognition results of imbalanced class boosting algorithms are compared with those of cost-sensitive boosting and boosting with resampling.
In the paper, the Next Generation Simulation (NGSIM) vehicle trajectory data extracted from video surveillance is adopted to measure the rear-end collision risk for each driver. The advantage of video-extracted vehicle trajectory data is the huge number of vehicles that can be observed simultaneously with relatively low cost of video recording and video processing. A large sample of vehicle trajectory can help better train data-hungry machine learning models. Once the recognition model is well-trained, the identification of drivers can be done using any data source that provides vehicle trajectory information, including video-extracted data, in-vehicle sensor data, and cell-phone data. If the data used for training and identification are from different sources, then the multiple-source data should be calibrated to have consistent measurement accuracy.
Among many aggressive driving behaviors, improper following is the most common one in the NGSIM vehicle trajectory data and its severity can be quantified using the surrogate measurement of rear-end collision. In this paper, the driving aggressiveness is determined and labeled by a proposed surrogate measurement of rear-end collision, which can be calculated automatically using vehicle trajectory data and therefore is more efficient and objective than questionnaires and the subjective judgment of experts. Then different algorithms are applied to identify aggressive drivers with labeled and imbalanced data. The identification results are discussed based on performance indicators.

Materials and Methods
The methodology can be divided into two parts. The first part, Section 2.1. and Section 2.2., explains how we measure the collision risk of each vehicle and label aggressive drivers. In Section 2.1., we propose a new measurement of rear-end collision risk. In Section 2.2., we introduce three anomaly detection methods to determine the threshold of aggressive driving. The second part explains how to train a classification model once each driver has been labeled. Section 2.3. covers feature extraction with the Discrete Fourier Transform (DFT), which transforms a given time series into signal amplitudes in the frequency domain and can reveal driving characteristics hidden in vehicle trajectory data. Section 2.4. introduces imbalanced class boosting algorithms and the other algorithms tested in this paper. The last section presents the performance indicators used to measure the ability of the boosting algorithms. The methodology framework is shown in Figure 1. This research does not involve human participants, and the studied data contains no sensitive or personally identifiable information.

Surrogate Measurement of Collision
As discussed in the Introduction section, many proximal surrogate indicators have been proposed to measure collision risk or evaluate safety level. Among them, Time to Collision (TTC) is commonly used in collision warning systems. It assumes constant driving speed for both leading and following vehicles and ignores the scenario in which the leading vehicle decelerates abruptly, so it may underestimate the crash risk of the following vehicle in unsteady traffic flow. Margin to Collision (MTC) overcomes this disadvantage of TTC by assuming that both leading and following vehicles can decelerate abruptly at the same time. Difference of Space distance and Stopping distance (DSS) considers the following vehicle's reaction time to the leading vehicle's deceleration. Time Integrated DSS (TIDSS) integrates DSS over a time period to show the aggregated crash risk of a given vehicle.
We propose a new surrogate indicator, Average Crash Risk (ACR), to measure a driver's aggressiveness. For each vehicle, Crash Risk (CR) at time point t is calculated based on its DSS:

$$\mathrm{DSS}(t) = \left(\frac{v_l(t)^2}{2\mu g} + d(t)\right) - \left(v_f(t)\,\tau + \frac{v_f(t)^2}{2\mu g}\right)$$

$$\mathrm{CR}(t) = \begin{cases} 0, & \mathrm{DSS}(t) > 0 \\ \dfrac{|\mathrm{DSS}(t)|}{v_f(t)}, & \mathrm{DSS}(t) \le 0 \end{cases}$$

where $v_l$ and $v_f$ are the speeds of the leading and following vehicles, respectively; $\mu$ is the friction coefficient, set to 0.7; $g$ is the acceleration of gravity, 9.8 m/s² (or 32.174 ft/s²); $d$ is the gap between the leading and following vehicles; and $\tau$ is the reaction time of drivers. When the vehicle is accelerating, $\tau$ is set to 1.5 s. When the vehicle is decelerating or idling, $\tau$ is set to 0.7 s. When DSS > 0, the following vehicle has enough time to decelerate and avoid a collision, so the Crash Risk is 0. When DSS ≤ 0, the following vehicle has a potential crash risk, and the Crash Risk is measured as the absolute value of DSS divided by the speed of the following vehicle.
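To measure the overall driving aggressiveness of each driver during the whole car-following process, we calculate the average crash risk as follows:

$$\mathrm{ACR} = \frac{1}{T}\sum_{t} \mathrm{CR}(t)\,\Delta t$$

where the sum runs over all sampled time points of the car-following process; $T$ is the car-following duration; and $\Delta t$ is the sampling interval, 0.1 s.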
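The crash-risk calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DSS expression is the commonly cited form (leader stopping distance plus gap, minus follower reaction and stopping distance), the function names and the tuple layout of a sample are hypothetical, and the guard for a stopped follower is added here to avoid dividing by zero.

```python
MU = 0.7   # friction coefficient used in the text
G = 9.8    # acceleration of gravity, m/s^2


def dss(v_lead, v_follow, gap, tau, mu=MU, g=G):
    """Difference of Space distance and Stopping distance, in metres."""
    lead_stop = v_lead ** 2 / (2 * mu * g)
    follow_stop = v_follow * tau + v_follow ** 2 / (2 * mu * g)
    return (lead_stop + gap) - follow_stop


def crash_risk(v_lead, v_follow, gap, accel_follow):
    """CR at one time step: 0 if DSS > 0, otherwise |DSS| / v_follow."""
    # Reaction time rule from the text: 1.5 s when accelerating, else 0.7 s.
    tau = 1.5 if accel_follow > 0 else 0.7
    value = dss(v_lead, v_follow, gap, tau)
    if value > 0 or v_follow <= 0:  # stopped-follower guard is an assumption
        return 0.0
    return abs(value) / v_follow


def average_crash_risk(samples, dt=0.1):
    """ACR = sum(CR(t) * dt) / T, taking T = len(samples) * dt."""
    if not samples:
        raise ValueError("need at least one trajectory sample")
    total = sum(crash_risk(*s) * dt for s in samples)
    return total / (len(samples) * dt)
```

Each sample is a tuple `(v_lead, v_follow, gap, accel_follow)` in SI units, one per 0.1 s record of a leader-follower pair.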

Average Crash Risk (ACR) Threshold
Once each driver's ACR is calculated based on how they interact with their preceding vehicle, aggressive drivers can be identified as those whose ACR exceeds a certain threshold. However, no empirical or theoretical threshold is available in previous studies, so we apply three anomaly detection methods to find the boundary between normal drivers and aggressive drivers: K-means clustering, the interquartile range rule, and the Xth percentile.

K-means clustering
Given a set of observations $(x_1, x_2, \ldots, x_n)$, where each observation is a d-dimensional real vector, K-means clustering aims to partition the n observations into k groups $C = \{C_1, C_2, \ldots, C_k\}$ to minimize the within-cluster sum of variance. The objective of K-means is:

$$\underset{C}{\arg\min} \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \bar{x}_i \rVert^2$$

where $\bar{x}_i$ is the mean vector within cluster $C_i$.
The K-means algorithm uses the squared Euclidean distance metric and a heuristic to find centroid seeds for k-means clustering. We use k-means to group 299 drivers into 2 clusters: normal and aggressive.
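Because ACR is a single number per driver, the two-cluster case can be illustrated with a small one-dimensional version of Lloyd's algorithm. The sketch below is not the implementation used in the study: the min/max seeding replaces the seeding heuristic mentioned above, and reporting the midpoint between the two centroids as the threshold is an assumption about how a cluster boundary maps to an ACR threshold.

```python
def two_means_threshold(values, iters=100):
    """Lloyd's algorithm with k = 2 on 1-D data.

    Returns (threshold, (low_centroid, high_centroid)), where the threshold
    is the midpoint between the two centroids.
    """
    lo, hi = min(values), max(values)  # simple seeding, an assumption
    if lo == hi:
        raise ValueError("all values are identical; nothing to cluster")
    for _ in range(iters):
        cut = (lo + hi) / 2
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):  # assignments stopped changing
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2, (lo, hi)
```

Drivers whose ACR exceeds the returned threshold would fall in the "aggressive" cluster.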

Interquartile Range Rule
The interquartile range rule is useful in detecting outliers that fall far away from the center of the data. We assume that an aggressive driver's ACR lies far from that of the typical driver and use the following equation to calculate the threshold:

$$\mathrm{Threshold} = Q_3 + 1.5 \times \mathrm{IQR}$$

where $Q_3$ is the 75th percentile of the data, and IQR is the difference between the 75th percentile and the 25th percentile of the data.

The Xth percentile
The Xth percentile method is straightforward. Using the Xth percentile of the data as the threshold is equivalent to assuming that (100−X)% of drivers on the road are aggressive. The proper value of X is vague and subjective. We take the 94th percentile of the ACR as the threshold and use it only as complementary to the other methods above.
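Both rule-based thresholds are short to compute. The sketch below uses Python's `statistics.quantiles`; the 1.5 multiplier is the conventional outlier fence and is assumed here, the function names are hypothetical, and the "inclusive" interpolation method is one of several percentile conventions, so values may differ slightly from other software.

```python
import statistics


def iqr_threshold(acr_values, k=1.5):
    """Upper fence Q3 + k * IQR; k = 1.5 is the conventional choice (assumed)."""
    q1, _, q3 = statistics.quantiles(acr_values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def percentile_threshold(acr_values, x=94):
    """Xth percentile of the data for an integer x between 1 and 99."""
    cuts = statistics.quantiles(acr_values, n=100, method="inclusive")
    return cuts[x - 1]
```

With either function, a driver is labeled aggressive when his/her ACR is above the returned value.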

Discrete Fourier Transform
Since every vehicle has a different length of trajectory data, the time series of gap, speed, and acceleration rate of each vehicle cannot be used directly to identify the driver's driving aggressiveness. Discrete Fourier Transform (DFT) has been applied many times in driving behavior studies to convert time series of driving features to signal amplitude in the frequency domain.
The DFT of a given time series $(x_1, x_2, \ldots, x_N)$ is defined as a sequence of N complex numbers $(\mathrm{DFT}_0, \mathrm{DFT}_1, \ldots, \mathrm{DFT}_{N-1})$:

$$\mathrm{DFT}_k = \sum_{n=0}^{N-1} x_{n+1}\, e^{-i 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1$$

where $i$ is the imaginary unit.
Time series data have temporal structures. The low- and medium-frequency information shows the level, trend, and periodicity of the time series, while the high-frequency part is treated as noise. We therefore keep the first 15 DFT coefficients of each time series as input and discard the rest as noise.
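A minimal sketch of this feature extraction step is given below. It computes the DFT directly from its definition and keeps the amplitudes of the first 15 coefficients; taking the absolute value without any normalization, and zero-padding very short series, are assumptions made for illustration rather than details stated in the text.

```python
import cmath


def dft(series):
    """Direct O(N^2) evaluation of the discrete Fourier transform."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(series))
        for k in range(n)
    ]


def dft_features(series, n_coef=15):
    """Amplitudes of the first n_coef DFT coefficients of one time series."""
    amplitudes = [abs(c) for c in dft(series)[:n_coef]]
    # Pad with zeros if the series has fewer than n_coef samples (assumption).
    return amplitudes + [0.0] * (n_coef - len(amplitudes))
```

Applying `dft_features` to the gap, speed, and acceleration series of a vehicle gives fixed-length inputs regardless of how long the vehicle was observed.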

Imbalanced Class Boosting Algorithms
In real traffic, the proportion of aggressive drivers is much smaller than the proportion of normal drivers. Therefore, the dataset is usually imbalanced with aggressive drivers as the minority class. A popular solution is to fully or partially balance the class distribution by resampling. For example, SMOTE (Synthetic Minority Oversampling Technique) [34] balances the data by synthetically generating more instances of the minority class, and the classifiers can broaden their decision regions for the minority class. RUS (Random Under Sampling) removes examples from the majority class at random until the desired class distribution is achieved.
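The two resampling ideas can be sketched as follows. This is a simplified illustration, not the reference SMOTE implementation: each synthetic point is an interpolation between a randomly chosen minority sample and one of its k nearest minority neighbours, and the function names and fixed default seed are hypothetical.

```python
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        step = rng.random()  # position along the segment base -> other
        synthetic.append([a + step * (b - a) for a, b in zip(base, other)])
    return synthetic


def random_under_sample(majority, n_keep, rng=None):
    """Keep n_keep majority samples chosen uniformly at random."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)
```

SMOTE enlarges the minority class with new points inside its existing region, whereas RUS shrinks the majority class and discards information.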
The first imbalanced class boosting algorithm, SMOTEBoost, was proposed by Chawla et al. [31]. It combines SMOTE with the standard boosting procedure. The standard boosting procedure gives equal weights to all misclassified examples, and sampling distributions in subsequent boosting iterations could have a larger composition of majority class cases. By introducing SMOTE in each round of boosting, the SMOTEBoost algorithm gradually increases the number of minority class samples. The algorithm procedures of AdaBoost and imbalanced class boosting are shown in Figure 2. Seiffert et al. [32] proposed the RUSBoost algorithm, which combines RUS and AdaBoost. In each iteration of boosting, RUS is used instead of SMOTE to balance the classes. CUSBoost [33] is another imbalanced class boosting algorithm that combines under-sampling with AdaBoost. CUSBoost clusters the majority class first and then randomly removes majority samples based on their cluster.
Int. J. Environ. Res. Public Health 2020, 17, x FOR PEER REVIEW
Nine algorithms are tested in the paper (see Table 1). The first group is cost-sensitive boosting, including AdaBoost and XGBoost, which do not resample the training data. Instead, a higher class weight is set for the minority class to offset the imbalance. The second group is standard boosting with resampling. We tried two resampling methods, SMOTE and RUS, which gives four algorithms in the second group: SMOTE + AdaBoost, RUS + AdaBoost, SMOTE + XGBoost, and RUS + XGBoost. SMOTE + AdaBoost, for example, first uses SMOTE on the training data to oversample the minority class and then trains the AdaBoost model on the balanced training data. The third group is imbalanced class boosting, including SMOTEBoost, RUSBoost, and CUSBoost. Hyperparameters of each machine learning model are optimized by Grid Search.
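To make the difference between "resample once, then boost" and "resample inside every boosting round" concrete, the following is a deliberately simplified sketch in the spirit of RUSBoost. It is an assumption-laden illustration, not the published algorithm: it uses binary AdaBoost weight updates with decision stumps instead of the AdaBoost.M2 pseudo-loss, it skips weak learners that are no better than chance, and all function names are hypothetical.

```python
import math
import random


def fit_stump(X, y, w):
    """Weighted decision stump; returns (feature, threshold, polarity)."""
    best = (float("inf"), 0, 0.0, 1)
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[f] > thr else -pol) != yi)
                if err < best[0]:
                    best = (err, f, thr, pol)
    return best[1:]


def stump_predict(stump, row):
    f, thr, pol = stump
    return pol if row[f] > thr else -pol


def rusboost(X, y, rounds=10, seed=0):
    """Labels: +1 for the minority (aggressive) class, -1 for the majority."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    minority = [i for i in range(n) if y[i] == 1]
    majority = [i for i in range(n) if y[i] == -1]
    ensemble = []
    for _ in range(rounds):
        # RUS step: all minority samples plus an equal-sized random majority draw
        idx = minority + rng.sample(majority, min(len(minority), len(majority)))
        total = sum(w[i] for i in idx)
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx],
                          [w[i] / total for i in idx])
        # The weighted error is measured on the full, unsampled training set.
        preds = [stump_predict(stump, row) for row in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        if err >= 0.5:  # no better than chance: skip this weak learner
            continue
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        norm = sum(w)
        w = [wi / norm for wi in w]
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score > 0 else -1
```

Because a fresh majority subsample is drawn in every round, different weak learners see different majority examples, while the sample weights still track errors on the complete training set.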

Performance Evaluation
The performance of a boosting algorithm depends on its power to identify aggressive drivers using vehicle trajectory data. This paper uses four important performance indices: precision rate, recall rate, F1 score, and Area under the Precision-Recall Curve (AUPRC).
Precision rate is defined as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

where TP is the number of aggressive drivers correctly identified, and FP is the number of normal drivers wrongly identified as aggressive drivers.
Recall rate is defined as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where FN is the number of aggressive drivers wrongly identified as normal drivers. The F1 score is the harmonic mean of the precision and recall:

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

A high F1 score requires high values of both precision rate and recall rate.
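These three indices follow directly from the confusion counts. The sketch below assumes labels are encoded as 1 for aggressive and 0 for normal drivers, and returns 0 when a denominator is empty; both choices are conventions adopted here for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (aggressive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```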
A ROC curve (receiver operating characteristic curve) is a graph showing the false positive rate versus the true positive rate for different candidate threshold values between 0 and 1. Similarly, a precision-recall curve is a plot of the precision and the recall for different thresholds. Generally, ROC curves should be used when there are roughly equal numbers of observations for each class. When there is a class imbalance, Precision-Recall curves should be used, because the ROC curve with an imbalanced dataset might be deceptive and lead to incorrect interpretations of the model skill [35]. Therefore, this paper uses AUPRC to compare algorithms' performance, which measures the entire two-dimensional area underneath the entire Precision-Recall curve.
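One common way to estimate the area under the precision-recall curve is the step-wise "average precision" sum, in which the precision at each newly retrieved positive is weighted by the resulting increase in recall. The sketch below uses that estimator as an assumption; the text does not state which integration rule was used, and tied scores are not treated specially here.

```python
def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    y_true holds 1 for aggressive and 0 for normal drivers; scores are the
    classifier's confidence that each driver is aggressive.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("no positive samples")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / n_pos
```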
Stratified K-fold cross-validation is widely used to evaluate the classification algorithm's performance, especially when the dataset is highly imbalanced. Since using all 299 drivers to train the model may cause an overfitting problem and exaggerate the accuracy of the trained model, we divided the 299 drivers randomly into five equal-sized subsets. At each time, four subsets are used for resampling and then training, and the left-out subset is used to assess the performance of the trained model. This process rotates through each subset, and the average accuracy, precision rate, and recall rate represent the performance of the algorithm. Stratified 5-fold cross-validation was repeated five times.
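The splitting scheme can be sketched as below. The key point is that any resampling is applied to the training indices only, so the held-out fold keeps its natural class imbalance. The round-robin assignment and the function names are hypothetical choices for illustration, and the resulting folds are only approximately equal in size.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class out round-robin
    return folds


def repeated_splits(labels, k=5, repeats=5):
    """Yield (train_indices, test_indices) for repeated stratified k-fold."""
    for r in range(repeats):
        folds = stratified_folds(labels, k, seed=r)
        for held_out in range(k):
            test = folds[held_out]
            train = [i for f, fold in enumerate(folds)
                     if f != held_out for i in fold]
            # Resample `train` here if required; never resample `test`.
            yield train, test
```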

Data
Next Generation Simulation (NGSIM) is the most researched public dataset of vehicle trajectories. It was originally collected using cameras, and vehicle trajectory data was extracted through computer vision techniques. One part of the NGSIM data was collected on a segment of the I-80 freeway in Emeryville, California. The segment is approximately 500 m in length and contains six lanes, including a high occupancy vehicle (HOV) lane. The data were collected from 4:00-4:15 pm and from 5:00-5:30 pm on 13 April 2005, 45 min in total. Due to low-resolution cameras and mistracking of vehicles in video images, the NGSIM trajectory data has substantial measurement error [36,37]. Montanino and Punzo [36] reconstructed the I-80 dataset 1 (from 4:00-4:15 pm), which showed significant improvement over the original NGSIM dataset. The reconstructed NGSIM data is available to the public on the U.S. Department of Transportation's public data portal website.
This paper uses Montanino and Punzo's dataset [36] for aggressive driver identification and focuses on 299 leader-follower vehicle pairs (LVPs) on the HOV lane that were not interrupted by lane-changing. Each leader-follower pair has a duration of at least 10 s, and most are longer than 20 s. Every 0.1 s, the leading and following vehicles' speed, acceleration/deceleration rate, and gap were recorded. With the trajectory data of the 299 LVPs, we can determine the driving aggressiveness of the 299 following drivers based on their interaction with the leading vehicle.
Since the data was collected during peak hours, the aggressive driving recognition model developed in this paper may not be suitable for nonpeak hours and weekends. Shinar and Compton [38] found that the likelihood of aggressive driving during peak hours is higher than nonpeak hours and weekends because drivers have higher values of time in peak hours and then more motives to drive recklessly. Congestion may also trigger anger, frustration, and depression and lead to angry and aggressive driving.

Average Crash Risk Threshold
Using the NGSIM trajectory data of 299 LVPs, we can calculate the Average Crash Risk (ACR) for each following vehicle. ACR reflects the driver's aggressiveness in the car-following process. Figure 3 shows the histogram of ACR for all 299 drivers. In total, 35.7% of drivers have an ACR = 0, which means these drivers never have a negative DSS value and always have enough time to avoid a collision, and 65.8% of drivers have an ACR < 0.1, which indicates only occasional and temporary crash risk during the car-following process.
Table 2 shows three different ACR thresholds, each determined by a method discussed in Section 2.2. The K-means clustering algorithm generates the smallest ACR threshold (denoted as ACR 1), 0.14. If drivers with an ACR higher than ACR 1 are labeled as aggressive, then out of 299 drivers there are 43 aggressive ones and 256 normal ones, so aggressive drivers account for 14.4% of the driver population. The ACR threshold is 0.19 under the interquartile range rule (denoted as ACR 2). The 94th percentile of the ACR distribution is 0.28 (denoted as ACR 3). Because the K-means clustering method labels 14.4% of drivers as aggressive, it creates the least imbalanced dataset (about 6:1). Because ACR 1 < ACR 2 < ACR 3, the interquartile range rule and the Xth percentile method label fewer drivers as aggressive, and therefore they increase the imbalance ratio to 9:1 and 14:1, respectively.
With different combinations of inputs and threshold values, we generate five datasets of driver aggressiveness. Datasets 1-3 have the same ACR threshold value and different input features. Datasets 3-5 have the same input features and different imbalance ratios. Comparing the results from Datasets 1-3 shows the importance of input features, and comparing the results from Datasets 3-5 demonstrates the impact of the imbalance ratio on the performance of the boosting algorithms.

Crash Risk and Driving Aggressiveness
There is one question we need to answer before using ACR to measure driving aggressiveness: is a high crash risk equivalent to aggressive driving? Several external factors may increase a vehicle's crash risk. For example, unexpected and abrupt acceleration and deceleration by the leading vehicle, either caused by traffic congestion or the leading vehicle's reckless driving style, may lead to high crash risk for the following vehicle. If the following vehicle is not responsible, or not completely responsible for the high crash risk, then using ACR to measure driving aggressiveness could be problematic.
To rule out the possibility that traffic condition and the leading vehicle's driving style may impact the following vehicle's crash risk, we calculated the correlation between the leader and follower's ACRs based on a simple logic: if traffic condition has an impact on vehicle's ACR, then both the leading and the following vehicle in the same traffic flow should have a similar crash risk level. If the leading vehicle's reckless driving has an influence on the following vehicle, making it more aggressive or more conservative, then the ACRs of two vehicles should also show some degree of correlation.
Among the 299 leader-follower pairs (LVPs), the driving aggressiveness of 264 leading vehicles is also determined, since they are following vehicles in other LVPs. Therefore, we can calculate the correlation of ACR for these 264 LVPs. The Pearson correlation coefficient between the leader's and follower's ACRs is 0.01, which indicates that the two ACRs are nearly uncorrelated and that the leading vehicle's crash risk is not passed on to its follower. This implies that a driver's ACR is not driven by the leading vehicle and can therefore represent the driver's own aggressiveness.
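The leader-follower check relies on the standard Pearson correlation coefficient, which can be computed as sketched below. The function name and the pairing of inputs (one list of leader ACRs, one list of follower ACRs in the same order) are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

A value close to 0 means there is no linear relationship between the ACR of a leader and that of its follower; it does not by itself rule out nonlinear dependence.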

The Performance of Boosting Algorithms
After five repetitions of 5-fold cross-validation, the average precision rate, recall rate, F1 score, and AUPRC of each algorithm are reported in Tables 3-5. We trained nine algorithms with different sets of inputs. First, we only used the DFT coefficients of speed and acceleration from 0-1.5 Hz. Table 3 shows that CUSBoost generates the best AUPRC, 0.715, outperforming XGBoost, whose AUPRC is 0.695. However, CUSBoost's precision rate and recall rate are not particularly high. RUSBoost gives the highest recall rate for aggressive drivers, 92.8%, which means only 7.2% of aggressive drivers are misclassified as normal drivers. Most precision rates are low, except for XGBoost, which gives the highest precision rate, 80.9%, meaning that 19.1% of the drivers identified as aggressive are false alarms. XGBoost has the highest F1 score of 0.639, followed by CUSBoost, due to its high performance in both precision rate and recall rate.
In the second dataset, we introduce the DFT coefficients of the gap between the leading and following vehicles into the input. The results are shown in Table 4. Using the gap as the input, instead of speed and acceleration, significantly improves recognition. The highest AUPRC is 0.917, achieved by XGBoost. CUSBoost has the second-highest AUPRC, 0.912. RUSBoost gives the highest recall rate for aggressive drivers, 96.2%, which means only 3.8% of aggressive drivers are misclassified as normal drivers. XGBoost gives the highest precision rate, 91.0%, meaning that 9.0% of the drivers identified as aggressive are false alarms. SMOTE + XGBoost has the highest F1 score of 0.903, followed by XGBoost and CUSBoost.
In the third dataset, we combine the DFT coefficients of vehicle speed, acceleration, and gap all together as the input. Table 5 shows that SMOTEBoost gives the highest AUPRC of 0.942, while RUSBoost gives the highest recall rate, 0.954. RUS + XGBoost gives the highest Precision rate and F1 score. Figure 4 shows the AUPRC values each algorithm achieved in Dataset 3-5. We find that SMOTEBoost and CUSBoost are robust with a highly imbalanced dataset. SMOTEBoost and CUSBoost have AUPRC higher than 0.9 in all three datasets. When imbalance ratio increases from 6:1 (Dataset 3) to 14:1 (Dataset 5), AdaBoost and XGBoost's AUPRC declines significantly, even with resampling. SMOTE + AdaBoost and RUSBoost can get better AUPRC when imbalance ratio increases; however, their performance is not stable. SMOTE + AdaBoost has the lowest AUPRC among all algorithms in Dataset 3. RUSBoost has the second-lowest AUPRC in Dataset 4.

The Impact of Resampling
We compare the AUPRC of cost-sensitive boosting without resampling and standard boosting with resampling in Figures 5 and 6. By balancing the majority class and the minority class in the training data with resampling, we expected that AdaBoost and XGBoost could be fitted to predict aggressive drivers better and thus achieve a higher AUPRC. However, the results are not consistent. On the one hand, Random Under Sampling almost always pushes AUPRC down compared to cost-sensitive learning. On the other hand, SMOTE resampling slightly raises AdaBoost's AUPRC in Dataset 2 and Dataset 5 but reduces it in Datasets 1, 3, and 4. In all datasets, it is better to train the cost-sensitive XGBoost model directly without any resampling technique. As shown in Figure 5, the AUPRC of SMOTE + XGBoost and RUS + XGBoost is always lower than that of XGBoost.

Discussion

ACR and Aggressiveness
There is one question we need to answer: is a high crash risk equivalent to aggressive driving? There are several external factors that may impact a vehicle's crash risk. For example, frequent/abrupt acceleration and braking by the leading vehicle, either caused by unstable traffic flow or the leading vehicle's reckless driving style, may lead to high crash risk for the following vehicle. If the following vehicle is not responsible, or not completely responsible for the high crash risk, then using ACR or any rear-end collision surrogate indicator to measure driving aggressiveness could be problematic.
To rule out the possibility that traffic conditions and/or the leading vehicle's driving style may impact the following vehicle's crash risk, we calculated the correlation between the leader's and follower's ACRs based on simple logic. If traffic conditions have an impact on a vehicle's ACR, then both the leading and the following vehicle in the same traffic flow should have a similar crash risk level. If the leading vehicle's reckless driving has an influence on the following vehicle, making it more aggressive or more defensive, then the ACRs of the two vehicles should also show some degree of correlation.
Among the 299 leader-follower pairs (LVPs), the driving aggressiveness of 264 leading vehicles is also determined, since they are following vehicles in other LVPs. Therefore, we are able to calculate the correlation of ACR for these 264 LVPs. The Pearson correlation coefficient between the leader's and follower's ACRs is −0.0137, which indicates that the leading vehicle's crash risk is not passed on to its follower and implies that ACR can represent driving aggressiveness.

Algorithm Performance
Based on the results shown in Tables 3-5, we find that the imbalanced class boosting algorithms SMOTEBoost and CUSBoost generally outperform the other boosting algorithms. XGBoost also performs well when the imbalance ratio of the dataset is moderate.
The advantage of imbalanced class boosting is more obvious with a high imbalance ratio. For example, in Dataset 3, aggressive drivers account for 14.4% of all drivers, and the performance difference between XGBoost, which is the best cost-sensitive boosting algorithm, and CUSBoost is small (AUPRC: 0.938 vs. 0.935). By contrast, in Dataset 5, in which only 6.4% of drivers are aggressive, the AUPRC of XGBoost decreases from 0.938 to 0.871, while the AUPRC of CUSBoost only drops slightly, from 0.935 to 0.924.
It is surprising to find that SMOTE + AdaBoost and RUS + AdaBoost give worse results than AdaBoost, and that SMOTE + XGBoost and RUS + XGBoost give worse results than XGBoost. Several existing studies used SMOTE or other resampling methods before boosting, assuming that this would produce better results for imbalanced data. Our results imply that applying a resampling method before a boosting algorithm does not guarantee a better recognition result than cost-sensitive learning. One possible reason is that a one-time resampling of the training data before boosting may skew the data distribution and later weaken the boosting algorithm's power to recognize the test data, because the test data is not balanced by the resampling method.

Model Input
We found that using the discrete Fourier coefficients of acceleration alone as the input gave much worse results than using other inputs. Because of acceleration's poor ability to recognize aggressive drivers, its results are not included in this paper. This is somewhat surprising, since some previous studies found that acceleration, brake pedal, or throttle opening have significant power in driving style classification. For example, Kluger et al. [1] found a distinct difference in acceleration discrete Fourier coefficients between vehicles with safety-critical events and vehicles at baseline. There are several possible explanations. First, acceleration is the first-order derivative of speed, which implies that the information in acceleration is already contained in the DFT coefficients of speed. Second, the trajectory data in this paper were recorded on a highway near the night peak hour, so the acceleration pattern would differ from that of vehicles in free-flow traffic, in local street traffic, or in driving-simulator data generated with no traffic or light traffic. Third, acceleration in NGSIM, even after reconstruction by Montanino and Punzo [34], is still the second-order derivative of vehicle position extracted from video, which limits its fidelity. Errors in vehicle position are amplified and propagated to vehicle acceleration. This innate disadvantage of video-based trajectory data might be the reason for the lack of difference between normal and aggressive drivers' acceleration DFT coefficients.
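The third explanation, error amplification through differentiation, can be made concrete with a small simulation. The sketch assumes a 0.1 s sampling interval, independent Gaussian position errors with a 0.05 m standard deviation, and a vehicle at constant speed, so the true acceleration is zero. All of these values are illustrative assumptions. Real tracking errors are correlated, and reconstructed data are smoothed, so the magnitudes here are not estimates for NGSIM.

```python
import math
import random


def second_difference(x, dt):
    """Central second difference: a simple estimate of acceleration
    from a sampled position series."""
    return [(x[i + 1] - 2 * x[i] + x[i - 1]) / dt ** 2
            for i in range(1, len(x) - 1)]


def std(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


rng = random.Random(0)
dt = 0.1          # assumed sampling interval (s)
sigma_pos = 0.05  # assumed standard deviation of position error (m)

# Constant speed of 20 m/s, so the true acceleration is exactly zero.
positions = [20.0 * k * dt + rng.gauss(0.0, sigma_pos) for k in range(5000)]
acc = second_difference(positions, dt)

# For independent errors, Var = (1 + 4 + 1) * sigma^2 / dt^4.
expected_std = math.sqrt(6) * sigma_pos / dt ** 2
print(f"empirical acceleration noise std: {std(acc):.2f} m/s^2")
print(f"theoretical value:                {expected_std:.2f} m/s^2")
```

Under these assumptions, a 5 cm position error turns into acceleration noise of roughly 12 m/s^2, far above typical car-following accelerations. This illustrates why acceleration derived from video-based positions can carry little usable signal.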

Conclusions
The objectives of this research are threefold: to label drivers using a new collision surrogate measurement, to identify aggressive drivers using imbalanced class boosting, and to determine the key feature for recognition. This paper takes advantage of the reconstructed NGSIM trajectory data to explore the possibility of identifying aggressive drivers. To label each driver's driving aggressiveness, we propose a surrogate measurement of collision risk, Average Crash Risk (ACR), which distinguishes aggressive drivers from others based on their responses in the car-following process. Compared with other labeling methods, such as experts' subjective judgment, questionnaires, or clustering based on speed, acceleration, or wheel steering, a surrogate measurement is more suitable for real-world traffic. The correlation of collision risk between leading and following vehicles was tested. We found that the crash risk of the leading vehicle, measured by the ACR proposed in this paper, has no impact on the following vehicle, and that the driving aggressiveness of the two drivers is independent.
We found that the gap between the leading and following vehicles is the key feature for recognizing aggressive drivers. A vehicle's speed and acceleration can be influenced by its leading vehicle, and we found that using speed and acceleration alone cannot identify the driver's aggressiveness with acceptable precision and recall. By contrast, using the gap alone as the input can train a model with precision and recall both at about the 90% level.
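As an illustration of the gap-based input, the sketch below computes the magnitudes of the first few discrete Fourier coefficients of a gap time series. The number of coefficients, the 1/N scaling, the use of magnitudes, and the synthetic gap series are all assumptions made for illustration. The exact feature construction used in this paper may differ.

```python
import cmath
import math


def dft_magnitudes(series, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients, scaled by 1/N."""
    n = len(series)
    if n_coeffs > n:
        raise ValueError("n_coeffs cannot exceed the series length")
    feats = []
    for k in range(n_coeffs):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        feats.append(abs(coeff) / n)
    return feats


# Hypothetical gap series (m): a 20 m mean gap with a 2 m oscillation
# that completes 5 cycles within the observation window.
n = 100
gap = [20.0 + 2.0 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]

features = dft_magnitudes(gap, 8)
print([round(f, 3) for f in features])
```

In this synthetic case, coefficient 0 carries the mean gap and coefficient 5 carries the oscillation. A feature vector of this kind summarizes both how closely and how steadily a driver follows.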
Imbalanced class boosting algorithms show their ability to handle imbalanced driving data. The more imbalanced the data are, the more necessary it is to use an imbalanced class boosting algorithm rather than a standard classification algorithm. When using the discrete Fourier coefficients of the gap, speed, and acceleration as the input features, SMOTEBoost, RUSBoost, and CUSBoost outperform AdaBoost and XGBoost in Dataset 5, the most imbalanced dataset. Resampling imbalanced data before AdaBoost or XGBoost does not always improve the model's recognition ability. Since SMOTEBoost, CUSBoost, and RUSBoost are variants of AdaBoost that incorporate a resampling method, their performance could potentially be further improved by replacing AdaBoost with more advanced boosting algorithms.
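To show what "AdaBoost with a resampling method" means in practice, the following is a simplified, RUSBoost-style sketch in which the majority class is undersampled afresh in every boosting round. It uses the discrete AdaBoost weight update with decision stumps and assumes that label +1 (aggressive) is the minority class. The published SMOTEBoost, RUSBoost, and CUSBoost algorithms differ in their details, so this is not a reference implementation, and all names are hypothetical.

```python
import math
import random


def fit_stump(X, y, w):
    """Weighted decision stump; returns (feature, threshold, polarity)."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] >= t else -pol) != yi)
                if err < best[0]:
                    best = (err, j, t, pol)
    return best[1:]


def stump_predict(stump, row):
    j, t, pol = stump
    return pol if row[j] >= t else -pol


def rus_boost(X, y, rounds=10, seed=0):
    """Boosting with per-round random undersampling; labels are +1 / -1."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    pos = [i for i in range(n) if y[i] == 1]   # assumed minority class
    neg = [i for i in range(n) if y[i] == -1]  # assumed majority class
    ensemble = []
    for _ in range(rounds):
        # Draw a fresh balanced subset in every round.
        keep = pos + rng.sample(neg, min(len(pos), len(neg)))
        stump = fit_stump([X[i] for i in keep], [y[i] for i in keep],
                          [w[i] for i in keep])
        # Evaluate and reweight on the full training set.
        preds = [stump_predict(stump, row) for row in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical one-feature data: 3 aggressive drivers (+1) among 20.
X = [[float(i)] for i in range(20)]
y = [-1] * 17 + [1] * 3
model = rus_boost(X, y, rounds=5)
print([predict(model, row) for row in X])
```

Because the resampling sits inside the boosting loop, every weak learner sees a different balanced subset, while the sample weights are still updated on the full, imbalanced training set. The same loop structure could in principle wrap a different boosting procedure, which is the direction suggested above.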