Intrusion Detection in Intelligent Connected Vehicles Based on Weighted Self-Information

T.Y


Introduction
The upcoming 5G-Advanced and 6G mobile communication networks can boost the development of the Internet of Vehicles (IoV). In the IoV, by exploiting embedded communication, sensing, and information-processing modules, intelligent connected vehicles (ICVs) can perceive transportation environments and effectively exchange information with pedestrians, peer vehicles, and roadside infrastructure. The integration of communication, sensing, and computation equips ICVs with intelligent services such as advanced driver-assistance systems (ADAS) and autonomous driving [1,2].
Meanwhile, with the increasing number of embedded electronic control units (ECUs) and external communication interfaces, the in-vehicle network (IVN) has evolved from a simple point-to-point control bus into a distributed and heterogeneous communication and control network that supports control and communication among the onboard modules [3]. A diagram of an ICV with a heterogeneous in-vehicle network in the IoV is provided in Figure 1. With external communication interfaces, the IoV can provide intelligent and tailored services to connected vehicles, but it also exposes the IVN to potential malicious network intrusion. Because traditional IVNs were closed, the existing in-vehicle communication protocols, especially the most widely deployed controller area network (CAN), lack security and privacy protection mechanisms, including access control, authentication, and encryption. Due to the lack of access control mechanisms, attackers can directly invade the IVN by compromising external communication interfaces. If a malicious intrusion is not detected in time, the attacker can manipulate the compromised vehicle by controlling the CAN bus. For example, Miller and Valasek remotely intruded on the CAN bus of a Jeep Cherokee running on the highway and forced it to brake and veer to the roadside, after which Chrysler had to recall 1.4 million vehicles [4]. The Keen Security Lab of Tencent achieved remote intrusion and full control of the CAN bus of Tesla Model S vehicles in both parking and driving states through a Wi-Fi interface, forcing Tesla to update its IVN system [5,6].
Moreover, if the compromised vehicle fails to detect malicious network intrusion in time, the intruder can invade and control other connected vehicles through the IoV. However, the cost of updating and manufacturing makes it difficult to replace current in-vehicle network architecture and communication protocols such as the CAN bus. At the same time, due to the limitations of in-vehicle network resources, the network security mechanisms used for computer networks are too complex to be applied to IVNs directly. Therefore, it is critical to develop intrusion-detection methods for IVNs based on existing network architecture and protocols to detect malicious network intrusions in a timely and accurate manner, thus preventing further attacks and threats.
Malicious intruders manipulate the compromised vehicles by injecting messages. Message injection can cause an abnormal pattern in the CAN message information entropy, which is the information entropy of the CAN message arbitration identifier (ID) per unit of time. Therefore, some research efforts have been dedicated to ID entropy-based intrusion detection. Müter and Asaj were the first to detect flooding and injection attacks with a specific ID based on the fluctuations of the CAN message ID information entropy and relative entropy in a unit time window [7]. In [8], the authors further proposed the concept of relative distance to detect an injection attack with a legal ID. Using the information entropy of different message IDs in a unit time window as features, Wu et al. [9] proposed a novel sliding-window strategy with a fixed number of messages to avoid the interference of different baud rates and aperiodic CAN messages with the information entropy. Traditional message information entropy-based intrusion-detection methods can detect flooding attacks and other injection attacks with massive and high-frequency message injections. However, they can hardly detect attacks with few injected messages, which have little impact on the information entropy.
To resolve the above issues, a CAN message ID features-based intrusion-detection method is proposed in this work. First, a sliding window is used to continuously extract a frame of streaming CAN messages. Subsequently, the weighted self-information of the CAN message ID is defined, and both the weighted self-information and the normalized value of an ID are extracted as features. A lightweight one-class classifier, the local outlier factor (LOF), is then used to identify the outliers and detect malicious network intrusion attacks. Simulations have been conducted based on the public CAN dataset provided by the HCR Lab. The proposed method is analyzed using four different one-class classifiers, namely LOF, support vector data description (SVDD), isolation forest (iForest), and EllipticEnvelope. The traditional information entropy-based intrusion-detection methods in the literature [7-9] are adopted as benchmarks. Experimental results indicate that, compared to the benchmarks, the proposed method dramatically improves the detection accuracy of injection attacks, namely denial-of-service (DoS) and spoofing, especially when the number of injected messages is low. The results also show that, considering both the detection accuracy and the time complexity, LOF is the preferred one-class classifier for this work.
The rest of the paper is organized as follows. The structure of the CAN data frame is introduced in Section 2. Afterward, the CAN message ID features-based intrusion-detection method is described in Section 3. In Section 4, the performance of the proposed method is evaluated. Finally, the paper is concluded in Section 5.

CAN Data Frame
The structure of the CAN data frame is shown in Figure 2.

Intrusion Detection Based on CAN Message ID Feature Extraction
The concept of weighted self-information of a CAN message ID is clarified first, followed by the two-dimensional ID features-based intrusion-detection method.

Weighted Self-Information of CAN Message ID
As shown in Figure 2, the message ID occupies 11 bits in the CAN data frame and is used to identify the destination and priority. A lower ID value indicates a higher priority. Within a time window, the probability of the message ID being $i$ is calculated as

$$p_i = \frac{n_i}{n_{all}}, \tag{1}$$

where $n_i$ is the number of messages with ID $i$ within the time window, while $n_{all}$ is the total number of messages within the time window. The self-information is thus determined by

$$I_i = -\log p_i. \tag{2}$$

The weighted self-information of a message ID being $i$ is defined as

$$I^{w}_i = p_i I_i = -p_i \log p_i. \tag{3}$$

The entropy of the message ID within a time window is calculated as

$$H = \sum_{i \in ID} I^{w}_i = -\sum_{i \in ID} p_i \log p_i, \tag{4}$$

where $ID$ is the set of all the message IDs showing up within the time window.
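The quantities above can be illustrated with a minimal sketch. It assumes that the weight applied to the self-information is the ID probability itself, so that the window entropy is the sum of the weighted self-information over all IDs, and it assumes base-2 logarithms; the function name `window_statistics` and the example IDs are hypothetical.

```python
import math
from collections import Counter

def window_statistics(ids):
    """Per-ID probability, self-information, and weighted self-information,
    plus the entropy of one window of CAN message IDs.

    Assumptions: base-2 logarithm; weight = probability of the ID.
    """
    n_all = len(ids)                       # total messages in the window
    stats = {}
    for can_id, n_i in Counter(ids).items():
        p = n_i / n_all                    # probability of ID i
        self_info = -math.log2(p)          # self-information of ID i
        stats[can_id] = {
            "p": p,
            "self_info": self_info,
            "weighted": p * self_info,     # weighted self-information
        }
    # Entropy is the sum of the weighted self-information over all IDs.
    entropy = sum(s["weighted"] for s in stats.values())
    return stats, entropy

# Toy window of four messages (IDs chosen only for illustration).
stats, entropy = window_statistics([0x0316, 0x0316, 0x0260, 0x0130])
```

In this toy window each of the three IDs contributes the same weighted self-information, although their probabilities differ, which is why the normalized ID is useful as a second feature.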

Local Outlier Factor (LOF)
The local outlier factor is a density-based unsupervised outlier detection method [10]. In this work, it is used as a one-class classifier, which can be applied to unknown attack detection. The LOF is detailed as follows.

k-Distance
For a data point $x_i$ in a dataset $X$ of size $N$, the Euclidean distance to another data point $x_j$ in the same set is calculated by

$$dist(x_i, x_j) = \| x_i - x_j \|_2,$$

where $\| \cdot \|_2$ is the $l_2$ norm. The k-distance of data point $x_i$ is denoted as $dist_k(x_i)$. It is defined such that at least $k$ data points $x_j$ in the rest of the set meet the condition $dist(x_i, x_j) \leq dist_k(x_i)$ and at most $k-1$ data points meet the condition $dist(x_i, x_j) < dist_k(x_i)$. Based on the definition of k-distance, the set of k-nearest neighbors (kNN) of the data point $x_i$ is thus defined as

$$Nb_k(x_i) = \{ x_j \in X \setminus \{x_i\} \mid dist(x_i, x_j) \leq dist_k(x_i) \}.$$

Please note that $|Nb_k(x_i)| \geq k$, as there is possibly more than one data point at the $k$th distance.
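A small sketch of the k-distance and the kNN set may help to make the tie case concrete. It follows the standard LOF definitions; the function name and the sample points are hypothetical.

```python
import math

def k_distance_and_neighbors(points, i, k):
    """Return (k-distance of points[i], indices of its k-nearest neighbors).

    All points tied at the k-th distance are kept, so the neighbor set
    can contain more than k points.
    """
    # Euclidean distances from point i to every other point, ascending.
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    k_dist = dists[k - 1][0]                      # distance to the k-th nearest point
    neighbors = [j for d, j in dists if d <= k_dist]
    return k_dist, neighbors

# Two points are equally close to the origin, so with k = 1 both are neighbors.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
k_dist, neighbors = k_distance_and_neighbors(points, 0, 1)
```

Here the neighbor set of the first point has two members although k is 1, which is the case covered by the inequality on the size of the kNN set.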

Reachability Distance
The reachability distance (RD) from data point $x_i$ to $x_j$ is defined as

$$RD_k(x_i, x_j) = \max \{ dist_k(x_j), dist(x_i, x_j) \},$$

where $dist(x_i, x_j)$ is the Euclidean distance between $x_i$ and $x_j$, and $dist_k(x_j)$ is the k-distance of $x_j$. Please note that the RD is directional, such that $RD_k(x_i, x_j)$ may not be equal to $RD_k(x_j, x_i)$.
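The directional nature of the RD can be checked on a toy example. The sketch below uses one-dimensional points for brevity, so the Euclidean distance reduces to an absolute difference; the helper names and the data are hypothetical.

```python
def k_distance(points, i, k):
    """k-distance of points[i] among one-dimensional points."""
    dists = sorted(abs(points[i] - p) for j, p in enumerate(points) if j != i)
    return dists[k - 1]

def reachability_distance(points, i, j, k):
    """RD from points[i] to points[j]: the larger of the k-distance of
    points[j] and the plain distance between the two points."""
    return max(k_distance(points, j, k), abs(points[i] - points[j]))

points = [0.0, 1.0, 2.0, 10.0]
rd_0_to_1 = reachability_distance(points, 0, 1, k=2)  # uses the k-distance of 1.0
rd_1_to_0 = reachability_distance(points, 1, 0, k=2)  # uses the k-distance of 0.0
```

The two values differ because the point at 1.0 has two neighbors at distance 1, whereas the second-nearest neighbor of the point at 0.0 is at distance 2.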

Local Reachability Density
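Based on kNN and RD, the local reachability density (LRD) of data point $x_i$ is given by

$$LRD_k(x_i) = \left( \frac{1}{|Nb_k(x_i)|} \sum_{x_j \in Nb_k(x_i)} RD_k(x_i, x_j) \right)^{-1},$$

which is the inverse of the average reachability distance from $x_i$ to its kNN.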

LOF Score
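The LOF score is finally defined as

$$LOF_k(x_i) = \frac{1}{|Nb_k(x_i)|} \sum_{x_j \in Nb_k(x_i)} \frac{LRD_k(x_j)}{LRD_k(x_i)},$$

where the local reachability density of data point $x_i$ is compared with that of its kNN. The LOF score is used for outlier detection as

$$x_i \text{ is } \begin{cases} \text{an outlier}, & LOF_k(x_i) > \delta, \\ \text{normal}, & \text{otherwise}, \end{cases}$$

where $\delta$ is the detection threshold determined by the specific applications.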
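The four steps of the LOF can be combined into a compact sketch. This is a plain-Python illustration of the standard definitions rather than the implementation used in the experiments; the function name, the sample points, and the threshold value of 1.5 are assumptions made only for this example. It also assumes that no point has k or more exact duplicates, which would make the sum of reachability distances zero.

```python
import math

def lof_scores(points, k):
    """LOF score of every point, following the standard definitions."""
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]

    # Step 1: k-distance and k-nearest neighbors (ties are kept).
    k_dist, nbrs = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        nbrs.append([j for d, j in others if d <= kd])

    # Steps 2 and 3: local reachability density, the inverse of the
    # average reachability distance from a point to its neighbors.
    lrd = []
    for i in range(n):
        total_rd = sum(max(k_dist[j], dist[i][j]) for j in nbrs[i])
        lrd.append(len(nbrs[i]) / total_rd)

    # Step 4: average ratio of the neighbors' density to the point's own.
    return [
        sum(lrd[j] for j in nbrs[i]) / (len(nbrs[i]) * lrd[i])
        for i in range(n)
    ]

# Four points on a unit square plus one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof_scores(points, k=2)
delta = 1.5                                   # hypothetical threshold
outliers = [i for i, s in enumerate(scores) if s > delta]
```

Points inside the square have a score close to 1 because their density matches that of their neighbors, while the distant point has a much lower density than its neighbors and exceeds the threshold.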

Intrusion Detection Based on Extracted ID Features
A block diagram of the proposed ID features-based intrusion-detection method is depicted in Figure 3, and the pseudocode is listed in Algorithm 1. The specific procedures are provided as follows.

Algorithm 1 ID Features-based Intrusion Detection
1: Input: streaming CAN message number n = 0, window size n_all, threshold δ
2: while True do
3:     cumulate streaming CAN message n = n + 1
4:     if n = n_all then
5:         # ID is the CAN message ID set of the time window
6:         for i ∈ ID do
7:             calculate the weighted self-information of ID i by (3) → I_w

• Two features are extracted for each message ID in the window. One is the proposed weighted self-information. The other is the normalized ID, which is calculated as $i_{norm} = i / \text{0x07ff}$, where $i \in ID$, and 0x07ff is the upper limit of CAN message IDs due to the pre-defined length of 11 bits.
• A lightweight one-class (OC) classifier, LOF, which takes the two-dimensional ID features as input, is used to identify the abnormal messages caused by malicious network intrusion.
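The feature-extraction step can be sketched as follows. The sketch assumes non-overlapping windows of a fixed number of messages, a base-2 logarithm, and a weighted self-information equal to the ID probability times its self-information; the function names and the example stream are hypothetical.

```python
import math
from collections import Counter

MAX_CAN_ID = 0x07FF  # upper limit of an 11-bit CAN message ID

def windows(stream, n_all):
    """Yield consecutive windows of n_all message IDs
    (non-overlapping windows are an assumption of this sketch)."""
    buf = []
    for can_id in stream:
        buf.append(can_id)
        if len(buf) == n_all:
            yield buf
            buf = []

def extract_id_features(window_ids):
    """Map each ID in the window to its two-dimensional feature:
    (weighted self-information, normalized ID)."""
    n_all = len(window_ids)
    features = {}
    for can_id, n_i in Counter(window_ids).items():
        p = n_i / n_all
        features[can_id] = (-p * math.log2(p), can_id / MAX_CAN_ID)
    return features

# Hypothetical stream of eight messages split into two windows of four.
stream = [0x0316, 0x0316, 0x0260, 0x07FF] * 2
features_per_window = [extract_id_features(w) for w in windows(stream, 4)]
```

The resulting two-dimensional points could then be passed to any one-class classifier; with scikit-learn, for instance, `LocalOutlierFactor(novelty=True)` can be fitted on attack-free features and queried on new windows.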

Performance Evaluation
In this section, the practical dataset used for simulation experiments and the metrics used for performance evaluation are introduced first. Subsequently, the simulation experiments are described, and an insightful analysis of the experimental results is provided.

Experimental Dataset
The public dataset provided by the HCR Lab of Korea University was adopted for the simulation experiments [11]. The dataset used for intrusion detection is described in Table 1. As illustrated in Table 1, we adopted 30,000 samples of the attack-free dataset that were collected under normal driving conditions. For the analysis of DoS and spoofing attack detection, 200 groups of attacks were randomly injected into the dataset, and the number of injected messages per group ranged from 5 to 50. A DoS attack floods the CAN bus using the message ID with the highest priority, i.e., 0x0000, to prevent normal communications and services. A spoofing attack pretends to be a normal ECU and sends messages with a legal ID, such as 0x0316, to manipulate the vehicle with malicious operations, such as emergency braking and acceleration. In the simulation experiments, the attack-free data were split 80%/20% for classifier training and validation, and the data with attacks were used for testing. The window size was fixed at 50.
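The detection performance is evaluated by accuracy, precision, recall, and F1-score, which are calculated as

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN},$$

$$Precision = \frac{TP}{TP + FP},$$

$$Recall = \frac{TP}{TP + FN},$$

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall},$$

where TP, FN, FP, and TN refer to the true positive, false negative, false positive, and true negative results, respectively.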
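The four metrics reported in this section (accuracy, precision, recall, and F1-score) follow directly from the confusion counts. The sketch below uses the standard definitions; the function name and the counts in the example are hypothetical and are not results from the experiments.

```python
def detection_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall, and F1-score from confusion counts.

    Ratios with an empty denominator are reported as 0.0.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts, used only to exercise the formulas.
metrics = detection_metrics(tp=90, fn=10, fp=5, tn=95)
```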

Experimental Results
To evaluate the performance of the proposed method, the traditional entropy-based method [7], sliding-window entropy-based method [9], and relative distance-based method [8] were adopted as benchmarks. In terms of the lightweight one-class classifiers, in addition to the LOF used in this work, SVDD [14], iForest [15], and EllipticEnvelope [16] were also considered.

Analysis of Attack Group Size
The detection accuracy of DoS and spoofing attacks under different numbers of injected messages per attack group is provided in Tables 2 and 3, respectively. From Tables 2 and 3, we can come to the following conclusions.

• The relative distance-based method [8] can detect the spoofing attack accurately for different attack group sizes but can hardly be applied to DoS attack detection due to its definition. In [8], the DoS attack is detected by the traditional entropy-based method [7].
• The detection accuracy of the DoS attack and spoofing attack using the benchmarks, namely the traditional entropy-based method [7] and the sliding-window entropy-based method [9], is below 90% when the attack group size is smaller than 20, jumping to around 95% when the attack group size increases to 30. The reason for this is that the traditional methods calculate the overall information entropy of all CAN messages within a time window. Thus, the methods can detect an intrusion with massive and high-frequency message injection but can hardly detect an intrusion with few injected messages, which have little impact on the information entropy.
• The proposed method outperforms the benchmarks, especially when the number of injected messages is low. This is because the proposed method extracts the weighted self-information and normalized ID as features, which considers the information entropy of the messages with different IDs individually. Hence, it is more sensitive to the information entropy variation than the traditional methods, which consider the information entropy of all the messages.
• In terms of the proposed method with different one-class classifiers, the detection accuracy of the DoS attack ranks in descending order as LOF > SVDD > EllipticEnvelope > iForest. The detection accuracy of the spoofing attack ranks in descending order as LOF > SVDD > iForest > EllipticEnvelope.
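The argument that a few injected messages barely move the window entropy, while still producing a distinct per-ID feature, can be illustrated on a toy window. The IDs and counts below are made up for illustration and are not taken from the dataset; the sketch again assumes base-2 logarithms and a weighted self-information equal to the ID probability times its self-information.

```python
import math
from collections import Counter

def weighted_self_information(ids):
    """Weighted self-information of every ID in one window."""
    n_all = len(ids)
    return {
        can_id: -(c / n_all) * math.log2(c / n_all)
        for can_id, c in Counter(ids).items()
    }

# Attack-free toy window: 10 hypothetical IDs, 5 messages each (50 in total).
normal_ids = [0x0100 + 0x10 * j for j in range(10)]
normal_window = normal_ids * 5

# Attacked toy window: 45 normal messages plus 5 injected DoS messages.
attack_window = normal_window[:45] + [0x0000] * 5

w_normal = weighted_self_information(normal_window)
w_attack = weighted_self_information(attack_window)

# Window-level view: the entropy is the sum of the per-ID values.
entropy_shift = abs(sum(w_attack.values()) - sum(w_normal.values()))

# Per-ID view: the injected ID yields a feature point with normalized ID 0,
# which has no counterpart among the attack-free features.
injected_feature = (w_attack[0x0000], 0x0000 / 0x07FF)
```

In this toy case the entropy of the window changes only slightly, whereas the injected ID appears as a new point in the two-dimensional feature space that a one-class classifier trained on attack-free data can flag.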

Analysis of One-Class Classifier
To further analyze the performance of the proposed method with different one-class classifiers under the small-scale attack, the accuracy, precision, recall, and F1-score of DoS and spoofing attack detection under the attack group size of 5 are listed in Tables 4 and 5, respectively. For DoS attack detection, the average performance of these four metrics in descending order is LOF > SVDD > EllipticEnvelope > iForest. For spoofing attack detection, the average performance of these four metrics in descending order is LOF > iForest > SVDD > EllipticEnvelope.
The time complexity of one-class classifiers is compared in Table 6. It can be seen that the descending order of the complexity of these classifiers is SVDD > LOF > iForest > EllipticEnvelope.
Overall, in terms of detection accuracy, LOF and SVDD perform better than iForest and EllipticEnvelope. For time complexity, iForest and EllipticEnvelope are less complex than SVDD and LOF. Thus, there is a tradeoff between detection accuracy and time complexity. Considering the detection accuracy and the time complexity, LOF is the preferred one-class classifier for this work.

Conclusions
To protect connected vehicles and IoV systems from being attacked, a CAN message ID features-based intrusion-detection method was proposed in this work. First, a sliding window was used to continuously extract a frame of streaming CAN messages. Afterward, the weighted self-information of the CAN message ID was defined, and both the weighted self-information and the normalized value of an ID were extracted as features. Subsequently, a lightweight one-class classifier, LOF, was used to identify the outliers and detect malicious network intrusion attacks. Simulations were conducted based on a public CAN dataset. The proposed method was analyzed with four different one-class classifiers, namely LOF, SVDD, iForest, and EllipticEnvelope. The three traditional information entropy-based intrusion-detection methods were adopted as benchmarks. The experimental results indicated that the proposed method dramatically improved the detection accuracy of DoS and spoofing attacks compared to the benchmarks, especially when the number of injected messages was low. Furthermore, LOF was the preferred one-class classifier for the proposed ID features-based intrusion detection based on the analysis of the detection accuracy and time complexity.

Conflicts of Interest:
The authors declare no conflict of interest.
