On the Application of Entropy Measures with Sliding Window for Intrusion Detection in Automotive In-Vehicle Networks

The evolution of modern automobiles toward higher levels of connectivity and automation has increased the need to mitigate potential cybersecurity risks. Researchers have shown in recent years that attacks on the in-vehicle networks of automotive vehicles are possible, and the research community has investigated various cybersecurity mitigation techniques and intrusion detection systems that can be adopted in the automotive sector. In comparison to conventional intrusion detection systems in large fixed networks and ICT infrastructures in general, in-vehicle systems have limited computing capabilities and other constraints related to data transfer and the management of cryptographic systems. In addition, it is important that attacks are detected in a short time frame, as cybersecurity attacks in vehicles can lead to safety hazards. This paper proposes an approach for the detection of cybersecurity attacks in in-vehicle networks that takes into consideration the constraints listed above. The approach applies information entropy measures over a sliding window: it is time-efficient, it does not require the implementation of complex cryptographic systems and it still provides a very high detection accuracy. Different entropy measures are used in the evaluation: Shannon Entropy, Renyi Entropy, Sample Entropy, Approximate Entropy, Permutation Entropy, Dispersion Entropy and Fuzzy Entropy. This paper evaluates the impact of the different hyperparameters present in the definition of the entropy measures on a very large public data set of CAN-bus traffic with millions of CAN-bus messages and four different types of attacks: Denial of Service, Fuzzy Attack and two spoofing attacks related to RPM and Gear information.
The sliding window approach in combination with entropy measures can detect attacks in a time-efficient way and with great accuracy for specific choices of the hyperparameters and entropy measures.


Introduction
With the evolution of the automotive industry toward increased levels of connectivity and automation, the potential for cybersecurity attacks is growing as the vehicle is more exposed to digital attacks. A modern vehicle is built with various electronic components including sensors, actuators, Electronic Control Units (ECUs) and communication devices, which are connected to different types of in-vehicle networks. The use of sensors to perceive the surrounding environment (e.g., cameras, LiDARs) will become even more important with increasing levels of automation. In general, each ECU in a vehicle performs a specific function and groups of ECUs are usually connected into a common sub-network (e.g., powertrain). One of the most common in-vehicle network standards in the automotive industry is the Controller Area Network bus (CAN-bus), which is widely used because of its low cost, relatively high reliability and fault tolerance. On the other hand, the CAN-bus was designed at a time when the potential for cybersecurity threats was minimal due to the physical isolation of the vehicle from the outside world. Although tampering with specific functions was reported, it usually targeted specific components of the vehicle rather than the in-vehicle network. Because vehicles are increasingly connected, the technical potential for cybersecurity attacks grows. Researchers have demonstrated the feasibility of various remote attacks, starting with the seminal work by Checkoway described in [1], which has shown that the remote exploitation of a vehicle is possible via a broad range of attack vectors, to the point that remote control of the vehicle can be achieved.
Although the motivations for such attacks can vary and may or may not be related to infringements of automotive regulations, the research community has started to investigate in-vehicle cybersecurity threats and potential countermeasures in detail. While the connectivity of the vehicle and its dependency on quite sophisticated digital components widen the risk of cybersecurity threats, the growing levels of automation increase the potential impact of a cybersecurity threat from a safety point of view, as the driver may be absent or may not be able to react in due time. In the ICT domain, one of the primary techniques to mitigate cybersecurity threats is the Intrusion Detection System (IDS), where the objective is to detect an attack in the shortest time possible so that an appropriate countermeasure (e.g., isolation of a section of an in-vehicle network) can be implemented. As mentioned above, the most important assets in automotive vehicles are ECUs, sensors and actuators, which are connected through various in-vehicle networks like CAN-bus, FlexRay and LIN. A remote attack is therefore likely to be conducted by injecting or manipulating messages in the in-vehicle network, and this is the area where most of the research literature has focused (see the Related Work in Section 2). The advantage of an IDS based on the analysis of in-vehicle traffic is that it does not rely on the implementation of cryptographic solutions in the in-vehicle network, which may be expensive to deploy or infeasible to implement in existing standards or technologies because of technical limitations [2,3]. In this paper, we focus specifically on attacks on the CAN-bus, as it is the most widely deployed in-vehicle network standard in the world and it is mainly used to connect the most critical assets (e.g., ECUs) of the vehicle.
This paper proposes an approach that applies information entropy measures over a sliding window. The approach is time-efficient, it can adapt to changes in the operational context of the vehicle and it provides a very high detection accuracy, as demonstrated by the results presented in this paper. It is based on the calculation of the entropy of the messages transmitted on the CAN-bus network and on the hypothesis that attacks modify the entropy of the CAN-bus traffic, so that a variation of the calculated entropy may indicate a cybersecurity threat. This idea is not new in the literature and recent studies have demonstrated its potential in comparison to other IDS techniques based on machine learning and deep learning (mostly from a time efficiency point of view), but in some cases the entropy-based approach has provided a low detection accuracy or the analysis of the attacks was limited to one or two cases. This paper presents an extensive analysis of a wide variety of entropy measures, which explains the reason for the weak results presented in the literature (e.g., the selected entropy measures did not have significant discriminating power) and supports the identification of the entropy measures that provide the highest classification performance. Different entropy measures are used in the evaluation: Shannon Entropy, Renyi Entropy, Sample Entropy, Approximate Entropy, Permutation Entropy, Dispersion Entropy and Fuzzy Entropy. The paper evaluates the impact of the different hyperparameters present in the definition of the entropy measures on a very large public data set of CAN-bus traffic with millions of CAN-bus messages and four different types of attacks: Denial of Service, Fuzzy Attack and two spoofing attacks related to RPM and Gear information.
The sliding window approach in combination with entropy measures can detect attacks in a time-efficient way and with great accuracy for specific choices of the hyperparameters and entropy measures.
This paper is organized as follows: Section 2 reviews the state of the art of the related work in two main areas: (a) intrusion detection systems in in-vehicle networks with a specific focus on the application of entropy measures and (b) application of the entropy measures used in this paper in various domains for problems of identification and classification. Section 3 describes the overall methodology for intrusion detection including the definition of the entropy measures used in the analysis and the materials (i.e., public data set for in-vehicle attacks) on which the measures are applied. Section 4 provides the results of the evaluation for the different attacks and the different entropy measures in relation to the values of the hyperparameters and the size of the sliding window. Section 5 draws conclusions and describes future developments.

Related Work
As mentioned before, this related work section focuses on two different areas: (a) intrusion detection systems in in-vehicle networks with a specific focus on the application of entropy measures and (b) the application of the entropy measures used in this paper for identification problems in other domains.

Intrusion Detection in In-Vehicle Networks
The basic principle of an IDS for an in-vehicle network is the same as that of an IDS for general communication networks, where the research literature is already quite extensive and, in many cases, research outputs have led to commercial developments. There are already surveys in the literature that extensively identify and catalog the different IDS for in-vehicle networks [4][5][6]. One simplistic classification of the methods to design and implement an IDS divides the literature into two main categories. The first category of methods is based on storing pre-specified signatures of external attacks, inspecting the transmitted packets and checking whether any pattern matches the stored signatures. In this context, machine learning approaches based on the creation of a training data set of known attacks and normal/legitimate traffic are widely used; examples can be found in [7], which uses a one-class Support Vector Machine (SVM), and in [8], which uses deep learning. Although these methods have achieved very high accuracy, they suffer from the high computational cost of creating the training set or the library of signatures/patterns. The second category detects abnormalities using statistical characteristics of the normal range of the data generated by the components in the vehicle and transmitted in the in-vehicle network. This approach is more time-efficient than the approaches in the first category because the extraction of features from the network traffic leads to a significant dimensionality reduction in the data analysis process. On the other hand, this method can be less accurate if the features and the related detectors are not chosen properly. The second category (adopted in this paper) can benefit from the specific characteristics of in-vehicle automotive networks.
As highlighted in [9], one of the most relevant differences between conventional communication networks and vehicular networks from the viewpoint of an IDS is that messages generated and transmitted in in-vehicle networks have uniform and regular characteristics, because the traffic usually conveys control or status information of the vehicle, unlike the traffic generated by users of general networks. Because detection is performed by determining whether the observed traffic deviates from the specific pattern of normal traffic, the probability of error can be reduced compared to general communication networks. It should also be noted that the computational power of the ECUs used in vehicles is generally limited compared to the processing power of the computing platforms in generic communications networks (e.g., Software Defined Networks), and thus complicated classification algorithms (e.g., Deep Learning) may be difficult to implement and execute in the vehicle. For this reason, this paper proposes an approach belonging to the second category, where the detection of the in-vehicle attacks is done using features (i.e., the entropy measures in this case) extracted from the in-vehicle network traffic. It is noted that there are other approaches for IDS (as described in [4]) based on the physical layer fingerprinting of the ECUs in the network to identify masquerade attacks, in which a malicious ECU replaces a legitimate one [10,11], but these techniques are out of the scope of the approach proposed in this paper.
One of the first papers to adopt an entropy-based approach for in-vehicle detection is [12]. Although this can be considered a seminal paper in this area, the experimental evaluation in [12] was limited: it spanned just 15 s of CAN-bus traffic and included only a single class of CAN-bus messages that are not safety-relevant.
This paper is mostly based on the approach proposed in [3,13], but its scope is considerably wider than that of either of the two papers. The first paper in the literature to use entropy to detect in-vehicle attacks was [3], whose authors used Shannon Entropy to detect two types of attacks: a replay attack and a fuzzy attack. The entropy is calculated over a sliding window and evaluated against a threshold k, and a similar approach is used in this paper. This initial study was based on the timing of the messages, and the detection accuracy suffered when the rate of attacks was relatively low. It was demonstrated in a subsequent paper [13] (see the paragraph below for further details) that an approach based on counting the messages is more effective than one based on the timing of the messages. For this reason, this paper uses the number of received CAN-bus messages, while maintaining the information theoretic approach based on entropy measures.
In addition to [3], the paper in the literature most similar to this one is [13], where a sliding window approach is used to detect two different types of attack: a DoS attack and an injection attack. The impact of different window types is evaluated, as well as the threshold used to determine whether an attack is taking place. In comparison to [3], where a time-based approach was used, the authors of [13] use the number of received CAN-bus messages, which is shown to be more effective than the time-based approach. This paper benefits greatly from the approach presented in [13], as it adopts a similar methodology, but it considerably expands the analysis in [13]: it evaluates four attacks and uses a much wider set of entropy measures instead of Shannon Entropy alone. In addition, this paper uses a public data set, which is much larger than the one used in [13] (about 3.5 million CAN-bus messages). The data analysis is dependent on the choice of various hyperparameters, which include the choice of the entropy measures, the size of the sliding window and the threshold to distinguish a statistical anomaly (i.e., one that points to an attack) from the previously calculated normal range. In [13], the last two hyperparameters are optimized using a simulated annealing algorithm. Instead, this paper describes the results according to the variation of the hyperparameter values, to provide a more transparent view of the impact of each hyperparameter on the classification performance.
A very recent paper that adopts a different entropy measure from the previous papers is [9], where the Renyi Entropy of order 2, 3 and 4 is used to detect a DoS attack and a Fuzzy attack for different sizes of the sliding window. This paper is also based on a methodology similar to that of [9], but it takes into consideration a much larger set of entropy measures, including Renyi Entropy.
Another significant difference between this paper and the previous references [3,9,13] is that the analysis is performed on the payload rather than the CAN-ID. This approach is proposed to address a gap in the literature (mostly focused on the analysis of the CAN-ID), to evaluate whether the information theory approach can be applied to payload analysis, and to address the threat where an attacker disguises the injected traffic using a legitimate ID already present in the in-vehicle network. Although it is acknowledged that different vehicle models may have different semantics of the payload data (while still conforming to the CAN-bus standard specifications), the approach proposed in this paper is agnostic to the payload data semantics, as it is based on a data analysis of the in-vehicle CAN-bus traffic and it needs to be executed only on a specific vehicle model or even a specific vehicle. See also Section 2.3 for additional details.

On the Application of Entropy Measures to Classification and Identification Problems
Most of the entropy measures used in this paper have never been used for in-vehicle attacks and there is an absence of related literature. On the other hand, entropy measures are often used for identification problems in other domains. Thus, this subsection reviews the literature on the application of the entropy measures used in this paper (e.g., Sample Entropy, Approximate Entropy, Permutation Entropy, Dispersion Entropy and Fuzzy Entropy) for the purpose of detection and identification in different domains. Please note that the definitions of the different entropy measures are only briefly introduced in this section, as they are described in detail in Section 3.5 of this paper. Permutation Entropy (PeEn) was initially introduced by Bandt and Pompe in [14] and it has been used for many different applications since then. In [15], it has been used for the detection of stealthy attacks on industrial control systems. The approach is based on the consideration that stealthy attacks present some sort of regularity besides the magnitudes. Because industrial control systems and in-vehicle networks share some similarities, this prompted the adoption of PeEn in this paper as well. In the biomedical sector, PeEn is used in [16] to distinguish between normal and pathological gait with very good accuracy.
Sample Entropy (SaEn) and Approximate Entropy (ApEn) have been extensively used in the analysis of physiological signals, where they have often demonstrated superior performance. Although there are many papers in the literature using SaEn and ApEn, we select the two following works because they are similar to our approach: they compare the discriminating power of different entropy measures for classification purposes. The authors in [17] have used Approximate Entropy with other entropy measures for the identification of focal electroencephalogram signals. The entropy measures are applied to the intrinsic mode functions generated by the application of empirical mode decomposition, while in this paper, the Fourier Transform is used. SaEn and other entropy measures have also been used in automatic sleep classification [18].
Dispersion Entropy (DiEn) has been recently introduced by the authors in [19] and it is suggested as an improvement over both PeEn and SaEn. Since its introduction, it has been applied to identification problems in different domains. In particular, for the identification and authentication of wireless communication devices, DiEn has demonstrated an improvement in the classification performance and robustness in the presence of noise [20]. DiEn has also been used to detect and identify gear faults in mechanical applications, where it has shown superior performance in comparison to PeEn and ApEn with the additional advantage of a faster computational time [21].
Fuzzy Entropy has been used in [22] in combination with the empirical Wavelet Transform for the monitoring and diagnosis of motor bearing faults, where it has demonstrated high identification performance. Fuzzy Entropy has been applied in combination with ant colony optimization in [23] to a problem similar to the one presented in this paper: intrusion detection in communication networks, which have different characteristics than in-vehicle network traffic.
In the same context of intrusion detection in communication networks, Renyi Entropy has been compared to Shannon Entropy in [24,25] to classify the traffic as normal or suspicious and to select the most appropriate attributes of the network traffic. Renyi Entropy has demonstrated a superior detection performance, which is the reason it was included in the set of entropy measures used in this paper.

Main Contributions of This Paper in Comparison to Related Work
To summarize, the key aspects of the proposed approach are identified in the following list:

• This paper significantly extends the number and types of entropy measures used to perform the information theory analysis (Shannon, Renyi, Sample, Approximate, Permutation, Dispersion and Fuzzy Entropy) in comparison to the limited number of entropy measures adopted in the literature. Some of these entropy measures (e.g., Dispersion Entropy) were introduced only recently in the literature, in domains other than automotive cybersecurity. The rationale for their use is that such entropy measures have demonstrated their discriminating power in classification problems, and this paper evaluates their application to this specific domain. In addition, the impact of specific hyperparameters (e.g., embedding dimension) present in the definition of some of the entropy measures is evaluated.

• Four different types of attacks (identified as DoS, Fuzzy, RPM and Gear in the rest of this paper), more than are commonly analyzed in the literature, are evaluated on a published data set containing millions of CAN-bus messages.

• The analysis is performed on the CAN-bus message payload rather than the CAN-bus message ID, as commonly done in the literature, because the ID could be subject to masquerading attacks. As highlighted in [26], the analysis of the payload rather than the CAN-bus IDs presents the issue that a large amount of data must be processed, especially if machine learning or deep learning approaches are used. This is the reason an efficient sliding window approach is used instead in this paper, where a large set of entropy measures is applied to reduce the dimensionality of the CAN-bus payload data. It can be remarked that different vehicle manufacturers have different semantics of the payload content in the CAN-bus messages, but the objective of this paper is not to support portability of the attack detection approach across different vehicle manufacturers. The intrusion detection system can be specific to each vehicle or to a vehicle model where the payload format and semantics are the same. The payload-based IDS identifies the key values of the hyperparameters using a data-derived approach, and it is therefore agnostic to the implementation/format of the CAN-bus payload in the vehicle model.

Description of the Controller Area Network Protocol
The CAN-bus protocol was invented by Robert Bosch GmbH and officially released in 1991. It is a message-based protocol, which was designed to allow robust communication among microcontrollers in a vehicle and to meet the specific requirements of the in-vehicle environment, such as real-time processing, strong robustness and cost effectiveness. The CAN-bus protocol uses broadcast communication to transmit messages.
A description of the standard CAN-bus (CAN 2.0) frame structure with the identification of the specific fields is provided in Figure 1. As mentioned before, the focus of this paper is on the analysis of the Data segment of the frame, which can contain up to 8 bytes. Because of the increasing data exchange in in-vehicle networks, in most cases all 8 bytes are used, and in the data set used in this paper (described in Section 3.2) all the CAN-bus messages have a payload of 8 bytes. As described before, different vehicle manufacturers have different semantics of the payload content in the CAN-bus messages, but the objective of this paper is not to support portability of the attack detection approach across different vehicle manufacturers. The intrusion detection system can be specific to each vehicle or to a vehicle model where the payload format and semantics are the same.

Data Sets and Attack Scenarios
This paper uses the data set created by the Hacking and Countermeasures Research Lab described in [26,27]. The data was extracted from a Hyundai YF Sonata using a Raspberry Pi 3 connected through a Y-cable plugged into the OBD-II port, as described in [26,27]. The recorded CAN-bus traffic matches the specification of CAN 2.0, with a CAN-bus message interpretation based on the Hyundai YF Sonata model.
Each data set contains 300 message-injection intrusions. Each intrusion lasts from 3 to 5 s, and each data set contains a total of 30 to 40 min of CAN-bus traffic. The data sets are therefore quite extensive and contain millions of messages, as described in Table 1.

The four attack scenarios are described below:

• In the Denial of Service (DoS) attack, messages with CAN-bus ID '0000' were injected into the in-vehicle network every 0.3 ms.

• In the Fuzzy attack, CAN-bus messages with totally random CAN-bus ID and DATA values were injected every 0.5 ms.

• In the Spoofing attack of type RPM, messages related to the RPM information were injected every 1 ms.

• In the Spoofing attack of type Gear, messages related to the Gear information were injected every 1 ms.

The data sets were created by logging CAN-bus traffic via the OBD-II port of a real vehicle while the message injection attacks were being performed. Additional details are provided in [26,27].

Workflow
The workflow for the processing of the data is described in this section. As described in the introduction, a sliding window approach is implemented, where a set of CAN-bus messages is used to generate a sample for the data analysis process. The number of CAN-bus messages used to create the sample, i.e., the window size used for the analysis, is denoted by the parameter W_s in the rest of this paper. For example, a window size of W_s = 24 generates a sample of 192 bytes, because each CAN-bus message in the data set has a data payload of 8 bytes. On the basis of the results from the literature [13], which provide an indication of the suitable range of window sizes, we identified four different values of the window size W_s: 24, 72, 120 and 168. The expected trade-off is that a larger W_s may lengthen the reaction time (more messages must be collected before a decision is taken), while a smaller window size may provide a lower detection accuracy.
These assumptions are evaluated in Section 4. In a similar way to what has been done in the literature, a sample of size W_s is considered normal/legitimate (note: the terms legitimate traffic and normal traffic have the same meaning in the rest of this paper) if the sample contains only normal CAN-bus messages. The sample is considered malicious (i.e., an attack is being implemented) if it contains even a single CAN-bus message labeled as malicious in the data set. The use of a moving window allows a faster detection of the attack, as the CAN-bus messages are processed in 'batches' (i.e., the samples) rather than one CAN-bus message at a time. In this paper, the choice is to avoid overlap among samples: no CAN-bus message belongs to two samples at the same time. The reason for this choice is to foster time efficiency, as overlap would increase the number of samples to process and therefore the detection time. On the other hand, the proposed approach can be easily extended to overlapping samples, where the percentage of overlap becomes an additional hyperparameter in the analysis.
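The windowing and labeling rule described above can be sketched as follows. This is only an illustrative sketch and not the authors' implementation: the function name `make_samples`, the in-memory representation of the payloads and labels, and the choice to drop a trailing incomplete window are assumptions introduced here.

```python
from typing import List, Sequence, Tuple

PAYLOAD_BYTES = 8  # each CAN-bus message in the data set carries an 8-byte payload


def make_samples(
    payloads: Sequence[bytes],
    is_malicious: Sequence[bool],
    window_size: int,
) -> List[Tuple[bytes, bool]]:
    """Split a message stream into non-overlapping samples of window_size messages.

    A sample is labeled malicious if at least one of its messages is malicious.
    A trailing group with fewer than window_size messages is dropped
    (an assumption of this sketch).
    """
    if len(payloads) != len(is_malicious):
        raise ValueError("payloads and labels must have the same length")
    samples = []
    for k in range(len(payloads) // window_size):
        start = k * window_size
        stop = start + window_size
        sample_bytes = b"".join(payloads[start:stop])
        sample_label = any(is_malicious[start:stop])
        samples.append((sample_bytes, sample_label))
    return samples


# Toy stream of 50 messages in which only message 30 is marked as injected.
payloads = [bytes([i % 256] * PAYLOAD_BYTES) for i in range(50)]
labels = [i == 30 for i in range(50)]
samples = make_samples(payloads, labels, window_size=24)

print(len(samples))        # number of complete windows
print(len(samples[0][0]))  # W_s = 24 messages of 8 bytes each
print([label for _, label in samples])
```

With W_s = 24, the 50 toy messages yield two complete samples of 192 bytes each, and only the second one (messages 24 to 47) is labeled malicious because it contains the injected message.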
There are three main phases in the application of the approach proposed in this paper: the normal traffic estimate, the training phase and the operational detection phase. This paper is mainly focused on the normal traffic estimate and the training phase, but the evaluation of the detection phase is also performed. The evaluation of the classification performance of the different hyperparameters is quite similar to the workflow presented in [13], with two differences: the analysis is performed on the payload rather than the CAN-ID, and no meta-heuristic algorithms are used to identify the optimal hyperparameters. Instead, it is preferred to present graphs showing the impact of each hyperparameter and to leave the choice of the optimal values to the IDS designer. Each phase and the related steps are described in the following paragraphs.
The workflow of the normal traffic estimate is described in Figure 2. This phase is executed only on data labeled as normal. The workflow is composed of the following steps:

• Step 1. The normal traffic portion of the data set is split into non-overlapping windows. Each window is composed of W_s CAN-bus messages.

• Step 2. For each window, the value H_i^(j) of each entropy measure is calculated, where j is the identifier of the window and i is the identifier of the entropy measure. This step is repeated until the whole data set has been analyzed.

• Step 3. For each entropy measure i, the mean μ_i and the standard deviation σ_i are extracted from all the values of H_i^(j) calculated in the previous step.

The training phase is described in Figure 3 and it is composed of the following steps:

• Step 1. The labeled data set is split into non-overlapping windows. Each window is composed of W_s messages. In the rest of this paper, the set of W_s messages is also called a sample.

• Step 2. Each sample is labeled as malicious if it contains at least one CAN-bus message that was initially labeled as malicious. If all the messages are labeled as legitimate, then the sample is labeled as legitimate.

• Step 3. For each sample j and each entropy measure i, the value H_i^(j) is calculated.

• Step 4. For each sample j and each entropy measure i, the value of H_i^(j) is compared against the mean μ_i and the standard deviation σ_i. If the difference between H_i^(j) and μ_i in absolute value is less than a threshold, the sample is predicted as legitimate; otherwise, it is considered malicious. These conditions are formally defined in Equations (1) and (2).

The operational detection phase is described in Figure 4. It builds on the previous phases, as it is the phase where the IDS in the vehicle monitors the in-vehicle network traffic to detect attacks. This phase is composed of the following steps:

• Step 1. The payload data is extracted from a set of sequential CAN-bus messages.

• Step 2. Samples are generated by collecting a set of W_o CAN-bus messages, where W_o is the optimized window size.

• Step 3. The entropy measures identified as optimal are used to calculate H_o^(j) from each sample j.

• Step 4. It is checked whether H_o^(j) is within the range defined by the optimized threshold Fac_thr^(o), as described in Section 3.4.

• Step 5. If Step 4 shows that H_o^(j) is outside the threshold range, an attack is reported and logged.

A significant design choice is related to the portion of the data set used to estimate the mean and the standard deviation (i.e., the normal traffic estimate described in Figure 2) as opposed to the portion used in the training phase, where the optimal values of the hyperparameters are calculated (i.e., described in Figure 3). Although other papers in the literature fix a specific ratio (e.g., half for training and half for testing), this paper evaluates the impact of the size of the training and test sets, which is expressed with the parameter R_TT, defined as the ratio of the portion of the data set used for the normal traffic estimate to the overall data set (for this estimate, only the traffic labeled as normal is used). The training/hyperparameter evaluation and the detection phase are then performed using the remaining (1 − R_TT) fraction of the entire data set. The potential trade-off (to be evaluated in the Results in Section 4) is that a larger training set requires more training time, but it may lead to a more accurate detection of the malicious traffic.
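The three phases above can be illustrated with a minimal sketch. Several elements are assumptions made only for illustration and are not taken from the paper: the byte-histogram Shannon entropy stands in for any of the entropy measures (the paper's own definitions are given in its Section 3.5), the threshold is read as the threshold factor multiplied by the standard deviation, the population standard deviation is used, and the toy samples and the value `fac_thr = 3.0` are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import Sequence, Tuple


def shannon_entropy(sample: bytes) -> float:
    """Shannon entropy (in bits) of the byte-value distribution of one sample."""
    n = len(sample)
    return -sum((c / n) * math.log2(c / n) for c in Counter(sample).values())


def estimate_normal(entropies: Sequence[float]) -> Tuple[float, float]:
    """Normal traffic estimate: mean and standard deviation of the entropy values."""
    return mean(entropies), pstdev(entropies)


def is_legitimate(h: float, mu: float, sigma: float, fac_thr: float) -> bool:
    """Predict legitimate if the entropy deviates from the mean by less than fac_thr * sigma."""
    return abs(h - mu) < fac_thr * sigma


# Hypothetical normal samples of 192 bytes (W_s = 24 messages of 8 bytes),
# cycling through 4 or 5 distinct byte values.
normal_samples = [bytes(i % m for i in range(192)) for m in (4, 5, 4, 5, 4, 5)]
mu, sigma = estimate_normal([shannon_entropy(s) for s in normal_samples])

fac_thr = 3.0  # hypothetical threshold factor
flood_like = bytes(192)          # all-zero payloads, entropy 0
random_like = bytes(range(192))  # 192 distinct byte values, entropy log2(192)

for name, sample in [("normal", normal_samples[0]),
                     ("flood-like", flood_like),
                     ("random-like", random_like)]:
    h = shannon_entropy(sample)
    verdict = "legitimate" if is_legitimate(h, mu, sigma, fac_thr) else "malicious"
    print(f"{name}: H = {h:.3f}, predicted {verdict}")
```

In this toy setting the normal sample stays within the threshold range, while the all-zero and the highly random samples fall outside it in opposite directions, which is the behavior the entropy hypothesis relies on. In the workflow of the paper, the samples used for `estimate_normal` would correspond to the R_TT portion of the normal traffic.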

Performance Metrics
The performance metrics used to evaluate the detection of an attack are similar to those used in the literature: Accuracy, Precision and Recall for a binary classification problem between legitimate traffic and attacks. In this paper, legitimate traffic is the positive class. A True Positive (TP) is a traffic sample (i.e., a set of CAN-bus messages in a window of size W s ) that is predicted by the algorithm as legitimate and is indeed legitimate. A False Positive (FP) is a traffic sample that is predicted to be legitimate but is actually malicious (i.e., part of an attack). A False Negative (FN) is a traffic sample that is predicted to be malicious but is actually legitimate. Finally, a True Negative (TN) is a traffic sample that is predicted to be malicious and is indeed malicious.
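A traffic sample of size W s is considered normal/legitimate or malicious, respectively, according to the conditions defined in the following equations:
$$\left| H^{(j)}_i - u_i \right| < \mathrm{Fac}_{thr} \cdot \sigma_i \qquad (1)$$
$$\left| H^{(j)}_i - u_i \right| \geq \mathrm{Fac}_{thr} \cdot \sigma_i \qquad (2)$$
where $i$ is the identifier of the Entropy Measure, $j$ is the index of the sample, $u_i$ and $\sigma_i$ are the mean and standard deviation of Entropy Measure $i$ estimated on normal traffic, and $\mathrm{Fac}_{thr}$ is the hyperparameter that defines the threshold factor discriminating between normal and malicious traffic.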
The main goal is to maximize the number of correctly predicted traffic samples over the whole data set, which leads to the definition of accuracy as:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Another goal is to minimize the number of FP and FN (or maximize their inverse), and in particular FP, because it is more dangerous for the algorithm to wrongly predict malicious samples as legitimate than the reverse. This leads to the other two metrics used in this paper:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$
These metrics are used to evaluate the proposed algorithm as the hyperparameters described in the previous sections and subsections vary: R TT , W s , Fac thr , the type of attack and the parameters defined for the entropy measures.
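The three metrics can be computed from per-sample predictions as in the short sketch below. It follows the convention stated in this section, where legitimate traffic is the positive class; the function name `classification_metrics` and its boolean inputs are hypothetical.

```python
def classification_metrics(predicted_legit, actually_legit):
    """Accuracy, precision and recall with 'legitimate' as the positive class.

    Both arguments are equal-length sequences of booleans, one per sample.
    """
    tp = fp = fn = tn = 0
    for pred, truth in zip(predicted_legit, actually_legit):
        if pred and truth:
            tp += 1          # legitimate, predicted legitimate
        elif pred and not truth:
            fp += 1          # attack, wrongly predicted legitimate
        elif not pred and truth:
            fn += 1          # legitimate, wrongly predicted malicious
        else:
            tn += 1          # attack, predicted malicious
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

With this convention, a missed attack lowers the precision, while a false alarm on legitimate traffic lowers the recall.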

Entropy Measures
This section describes the entropy measures adopted for the analysis presented in this paper. Beyond the classical or textbook definition of each entropy measure, the focus of this section is to identify the key hyperparameters in its definition that could impact the detection of the attack. In addition, in some cases there are constraints on the length of the time series to which the entropy measure is applied; these are discussed in detail in Section 3.13.

Shannon Entropy
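The Shannon Entropy (ShEn) of a discrete random variable X with possible values x 1 , ..., x n is defined as:
$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)$$
where $p(x_i)$ is the probability that X takes the value $x_i$. For reproducibility of the results presented in this paper, the entropy MATLAB function from MathWorks was used.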

Renyi Entropy
The Renyi Entropy (ReEn) of order α, with α ≥ 0 and α ≠ 1, is defined as:
$$H_{\alpha}(X) = \frac{1}{1-\alpha} \log \left( \sum_{i=1}^{n} p(x_i)^{\alpha} \right)$$
where $p(x_i)$ is the probability $p(x = x_i)$. The limit for α → 1 is the Shannon Entropy defined above.
In this paper, we adopt the values of α = 2, 3, 4, as this is the range of values used in the literature [9]. For reproducibility of the results presented in this paper, the MATLAB implementation of the Renyi Entropy available at [28] was used.
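As an illustration of the role of α, the following Python sketch computes the Renyi Entropy of the empirical distribution of a sample. It is not the MATLAB implementation cited above; the function name `renyi_entropy` and the choice of base-2 logarithms are assumptions made for the example.

```python
import math
from collections import Counter


def renyi_entropy(values, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), in bits."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    n = len(values)
    probs = [c / n for c in Counter(values).values()]
    return math.log2(sum(p ** alpha for p in probs)) / (1 - alpha)
```

For a uniform distribution, the result is the same for every α (2 bits for four equally likely values), and for α close to 1 the value approaches the Shannon Entropy of the same sample.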

Permutation Entropy
PeEn was introduced by Bandt and Pompe in their seminal paper [14]. The concept is to define an entropy measure that takes into account the time causality of the time series (causal coarse-grained methodology) by comparing neighboring values in the series. PeEn is the Shannon Entropy of a sequence of ordinal patterns, which are discrete symbols that encode how consecutive time series entries relate to one another in terms of position and value, and it is defined by the following equation:
$$\mathrm{PeEn}(m) = -\sum_{i=1}^{m!} p_i \log p_i$$
where $p_i$ represents the relative frequencies of the possible patterns of symbol sequences, termed permutations. Each permutation is related to a sequence of m (embedding dimension) values of the original series. A time delay τ can be used in the generation of the permutations from the original series, but for simplicity we set τ = 1 in this paper, while the value of m is a hyperparameter to be optimized. Additional details on the definition of PeEn are provided in [14]. For reproducibility of the results presented in this paper, the MATLAB implementation of the Permutation Entropy provided by the authors in [29] was used.
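The construction of the ordinal patterns can be sketched in Python as follows. This is a simplified illustration rather than the implementation in [29]: equal values are ranked in order of appearance (an assumption that matters for payload bytes, which contain many ties), natural logarithms are used, and the name `permutation_entropy` is hypothetical.

```python
import math
from collections import Counter


def permutation_entropy(series, m, tau=1):
    """Permutation entropy (natural log) for embedding dimension m and delay tau."""
    n_patterns = len(series) - (m - 1) * tau
    if n_patterns <= 0:
        raise ValueError("series too short for the chosen m and tau")
    patterns = Counter()
    for i in range(n_patterns):
        window = [series[i + k * tau] for k in range(m)]
        # Ordinal pattern: positions of the window sorted by value
        # (the sort is stable, so ties keep their order of appearance).
        pattern = tuple(sorted(range(m), key=window.__getitem__))
        patterns[pattern] += 1
    return -sum((c / n_patterns) * math.log(c / n_patterns)
                for c in patterns.values())
```

A monotonic series produces a single pattern and therefore an entropy of zero, while a series that alternates between two values produces two equally frequent patterns for m = 2.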

Dispersion Entropy
DiEn was recently introduced in [21] and it addresses a potential weakness of PeEn, whose definition does not consider the mean value of the amplitudes or the differences between amplitude values. In Dispersion Entropy, the initial series X = x 1 , x 2 , ..., x N is mapped to c classes. Although this mapping can be implemented with various linear or non-linear approaches, the authors in [21] propose to use the Normal Cumulative Distribution Function (NCDF) to map X to the c classes. Then, the implementation of DiEn is similar to that of PeEn, with the generation of dispersion patterns rather than permutations and with the calculation of the probabilities p(π j ) on the basis of an embedding dimension m and the time delay τ. As in the case of PeEn, we set τ = 1 for simplicity. Then, the Shannon Entropy (ShEn) is applied to the probabilities of the dispersion patterns in a similar way to the implementation of PeEn, where permutations are used:
$$\mathrm{DiEn}(X, m, c, \tau) = -\sum_{j=1}^{c^m} p(\pi_j) \log p(\pi_j)$$
It is important to note that the number of possible dispersion patterns that can be assigned to each time series is c^m. This links two main hyperparameters in the application of DiEn and creates a constraint on their values, because c^m < N must hold (where N is the number of CAN-bus message payload bytes in a sample in our specific problem). This is further discussed in Section 3.13.
For reproducibility of the results presented in this paper, the MATLAB implementation of the Dispersion Entropy provided by the authors in [19,29] was used.
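The mapping to classes and the counting of dispersion patterns can be sketched in Python as follows. This is an illustration under stated assumptions, not the cited MATLAB implementation: the NCDF uses the sample mean and sample standard deviation of the series, a value y in (0, 1) is mapped to the class round(c·y + 0.5) clamped to 1..c, Python's rounding of exact halves may differ from MATLAB's, and the name `dispersion_entropy` is hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist, mean, stdev


def dispersion_entropy(series, m, c, tau=1):
    """Dispersion entropy (natural log) with c classes and embedding dimension m."""
    n_patterns = len(series) - (m - 1) * tau
    if n_patterns <= 0:
        raise ValueError("series too short for the chosen m and tau")
    mu, sigma = mean(series), stdev(series)
    if sigma == 0:
        return 0.0  # constant series: only one dispersion pattern occurs
    ncdf = NormalDist(mu, sigma).cdf
    # Map each value to one of the classes 1..c through the NCDF.
    classes = [min(c, max(1, round(c * ncdf(x) + 0.5))) for x in series]
    # Count the dispersion patterns of length m.
    patterns = Counter(
        tuple(classes[i + k * tau] for k in range(m)) for i in range(n_patterns)
    )
    return -sum((v / n_patterns) * math.log(v / n_patterns)
                for v in patterns.values())
```

The result is bounded by log(c^m), which is reached only when all c^m dispersion patterns are equally frequent; this is why a sample much longer than c^m is needed for a meaningful estimate.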

Approximate Entropy
ApEn was initially proposed by Pincus in [30] and it is related to the predictability or regularity of a time series. It was devised as an approximation of the Kolmogorov entropy of an underlying process.
The algorithm to define ApEn is a search for the repetitive patterns of length m commencing at sample i in which the distance induced by the maximum norm differs up to an error threshold r.
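Then, ApEn is defined by the following equation:
$$\mathrm{ApEn}(m, r, N) = \Phi^{m}(r) - \Phi^{m+1}(r)$$
where:
$$\Phi^{m}(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln C_i^{m}(r)$$
with $C_i^{m}(r)$ the number of vectors $x_j \in \mathbb{R}^m$ such that the distance $d(x_i, x_j) < r$, divided by $N - m + 1$ as in [30], and $x_i = [x(i), x(i+1), \ldots, x(i+m-1)]$.
For reproducibility of the results presented in this paper, the MATLAB implementation of the Approximate Entropy provided by the authors in [31,32] was used.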
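A direct, quadratic-time Python sketch of this definition is given below for illustration. It is not the cited MATLAB implementation: the tolerance `r` is passed as an absolute value (in the study it is expressed as a ratio of the standard deviation), the strict inequality d < r is used, and the name `approximate_entropy` is hypothetical.

```python
import math


def approximate_entropy(series, m, r):
    """ApEn(m, r) with the maximum norm; self-matches are counted."""
    n = len(series)

    def phi(dim):
        vectors = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for vi in vectors:
            # Number of vectors (including vi itself) closer than r to vi.
            matches = sum(
                1 for vj in vectors
                if max(abs(a - b) for a, b in zip(vi, vj)) < r
            )
            total += math.log(matches / len(vectors))
        return total / len(vectors)

    return phi(m) - phi(m + 1)
```

Because every vector matches itself, the logarithm is always defined; this is the self-match bias that Sample Entropy was later designed to remove.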

Sample Entropy
SaEn was defined as an evolution of ApEn with the objective of removing the bias of ApEn due to counting self-matches, and it was shown to exhibit better statistical properties than ApEn in many cases [33]. It is computed in a similar fashion to ApEn, described in the previous Section 3.10, but the final step of calculating SaEn becomes:
$$\mathrm{SaEn}(m, r) = -\ln \frac{B^{m+1}(r)}{B^{m}(r)}$$
where $B^{m}(r)$ is defined as the mean of the number of vectors $x_j \in \mathbb{R}^m$ such that the distance $d(x_i, x_j) < r$, with $j \neq i$ so that self-matches are excluded.
For reproducibility of the results presented in this paper, the MATLAB implementation of the Sample Entropy provided by the authors in [33,34] was used.
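The difference from ApEn can be made concrete with a small Python sketch. It is an illustration, not the cited MATLAB implementation: it assumes that the same N − m starting points are used for both template lengths, that `r` is an absolute tolerance, and that the result is reported as infinity when no match exists; the name `sample_entropy` is hypothetical.

```python
import math


def sample_entropy(series, m, r):
    """SaEn(m, r) with the maximum norm; self-matches are excluded."""
    n = len(series)

    def count_matches(dim):
        # Same n - m starting points for template lengths m and m + 1.
        templates = [series[i:i + dim] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i: no self-matches
                if max(abs(a - b)
                       for a, b in zip(templates[i], templates[j])) < r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined ratio, reported here as infinity
    return -math.log(a / b)
```

Since the same normalization applies to both counts, the ratio of the raw match counts equals the ratio of the mean counts used in the definition.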

Fuzzy Entropy
This paper also applies the Fuzzy Entropy (FzEn) defined by the authors in [35], where it is reported that, even if SaEn is slightly faster than FzEn, the latter is more consistent and less dependent on the length of the series to which it is applied. Fuzzy Entropy is based on a concept similar to that of SaEn and ApEn, but the number of vectors that satisfy the distance condition with respect to the tolerance r is calculated using a fuzzy membership function instead of a hard threshold; its parameter p is set to 1 in the analysis done in this paper.
Then the FzEn is calculated as:
$$\mathrm{FzEn}(m, r) = -\ln \frac{\alpha}{\beta}$$
where α is related to the embedding dimension m + 1 and β is related to m. For reproducibility of the results presented in this paper, the MATLAB implementation of the Fuzzy Entropy provided by the authors in [34] was used.

Choice of the Hyperparameters for the Entropy Measures
Each entropy measure identified above is based on the definition of hyperparameters. This section discusses the choice of the values of the hyperparameters in relation to the length N of the sequence of windowed data. In this specific proposal, the value of N in a sample is given by W s × 8, because all the CAN-bus messages in the data set have a full payload of 8 bytes. Apart from ShEn, all the entropy measures depend on hyperparameters: α for ReEn; the embedding dimension m for PeEn, DiEn, SaEn, ApEn and FzEn; the parameter r for SaEn, ApEn and FzEn; and the number of classes c for DiEn. In general, the studies presented in the literature [14,30,36] recommend that N >> m, so that a significant number of patterns is available to estimate the entropy. For DiEn, the authors in [19] recommend that N > c^m. An increase in the value of m also increases the computing time of the entropy features, which goes against the principle of this paper of making the detection process time-efficient. The values of m = 2, 3 are used, as shown in the Results Section 4. ApEn, SaEn and FzEn are also based on the tolerance or threshold value r. In the literature [33], it is recommended to use values of r > 0.1 × σ(X N ), where σ(X N ) is the standard deviation of the series X N of payload bytes in a sample, but the appropriate value of r may change depending on the specific characteristics of X N . In this paper, the value of r is determined using the approach presented in [36], and a similar analysis has been conducted to evaluate where the maximum values of the Approximate Entropy are reached in relation to the ratio between r and the standard deviation. One example of the application of the Approximate Entropy with window size W s = 72, R TT = 3/4 and the Gear attack is shown in Figure 5. Similar results are obtained for the other attacks and the other window sizes.
On this basis, the values of r/σ in the range (0.01, ..., 0.04) are used for the Approximate Entropy. A similar approach is used for SaEn and FzEn, and a value of r/σ > 0.1 is used. Finally, the values of α for ReEn are (1-4), as they are the same values adopted in [9], and it was experimentally found that values of α greater than 4 did not significantly improve detection.
Then, on the basis of the previous definitions of the entropy measures, we identify in Table 2 the list of the features used in the study with the related hyperparameters. As can be seen from Table 2, a total of 34 features are defined and used to calculate the results presented in Section 4. This is a much larger set of entropy measures than what is used in the literature for the detection of in-vehicle attacks.
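The constraints discussed in this section reduce to simple arithmetic, which the following Python sketch makes explicit. The helper names are hypothetical, and the use of the sample standard deviation for σ is an assumption.

```python
from statistics import stdev


def sample_length(window_size, payload_bytes=8):
    """N: number of payload bytes in a sample of `window_size` messages."""
    return window_size * payload_bytes


def dispersion_params_ok(window_size, m, c):
    """Check the DiEn constraint c^m < N for a given window size."""
    return c ** m < sample_length(window_size)


def tolerance(series, ratio):
    """Absolute tolerance r obtained from a ratio r/sigma and the series."""
    return ratio * stdev(series)
```

For example, the smallest window size considered in the study, W s = 24, gives N = 192, so even the largest combination c = 4 and m = 3 (c^m = 64) satisfies the constraint.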

Results
The aim of this section is to provide the results of the approach for the four different attacks (DoS, Gear, Fuzzy, RPM) present in the data set, based on the different hyperparameters. Most of the analysis is conducted for the normal traffic estimate and the training phase, with the objective of identifying the optimal hyperparameters (i.e., Fac thr , R TT and feature id) over the entire data set. This objective is the focus of the first three subsections: Sections 4.1-4.3. In detail, Section 4.1 evaluates the classification performance of each entropy measure for all four attacks. Section 4.2 provides the precision and recall for each entropy measure for a specific value of the threshold Fac thr while changing the values of the window size W s and the ratio R TT . An example of the values obtained for False Positives and False Negatives is also provided. Section 4.3 analyzes the impact of W s and R TT on the classification performance of the Approximate Entropy, which is identified in the previous section as an optimal feature from the classification performance point of view. Then, Section 4.4 provides the results for the detection phase, where the entire data set for all four attacks is split into three portions, one for each phase, on the basis of the ratio R TT : (1) the first portion, equal to the fraction R TT of the entire data set, is used for the Normal Traffic estimate; (2) half of the (1 − R TT ) portion of the entire data set is used for Training; and (3) the remaining half of the (1 − R TT ) portion is used for the Detection phase. Finally, Section 4.5 describes the computing resources used in the analysis and provides the computing time for each of the three phases.

Evaluation of the Accuracy for Each Entropy Measure
As an initial step, it was evaluated how the entropy measures change when an attack is executed. It was found that some entropy measures provide more discriminating power than others. In the proposed approach, this means that the range of values reported by the specific entropy measure is significantly different in the legitimate traffic in comparison to the malicious traffic. An example is shown in Figure 6a,b, respectively for the Dispersion Entropy and the Approximate Entropy, in the presence of the Gear attack for W s = 72 and R TT = 3/4. The figures show only a small segment of the overall in-vehicle traffic that has been evaluated. The pink (or light gray in a b/w representation of this paper) bars represent the set of messages when the attack is implemented (i.e., malicious traffic), while the plot shows the calculated entropy measure. It can be seen that these two specific entropy measures have significant discriminating power because the range of values is significantly different in the legitimate traffic from the malicious traffic: the mean of the entropy measure in the presence of legitimate traffic is quite different from the mean in the presence of malicious traffic. Therefore, a high detection accuracy is possible even for relatively small values of Fac thr .
Not all the entropy measures provide the same clear visual distinction between legitimate and malicious traffic. For this reason, an extensive analysis of all the 34 entropy measures for the four attacks was performed. Figure 7a,b show the accuracy of the proposed approach as the parameter Fac thr varies between 0 and 4 in steps of 0.1 (in units of σ i ) for the DoS attack and R TT = 3/4. Because of the large number of features, two figures were created: Figure 7a for the entropy measures with feature id from Id = 1 to 19 and Figure 7b for the entropy measures with feature id from Id = 20 to 34.
The results are consistent with the literature, where a low value of the threshold Fac thr leads to limited detection performance. It can be seen that, for Fac thr approaching 4, the detection accuracy is very high and eventually reaches almost 100%. This is also consistent with the literature, because DoS attacks significantly impact the entropy values calculated on the in-vehicle traffic. The figures also show that most of the entropy measures exhibit a similar detection performance, with the significant exception of the Dispersion Entropy with c = 4, both with m = 2 (Feature Id = 6 in Figure 7a) and with m = 3 (Feature Id = 21 in Figure 7b). A potential explanation of this deviation is that with c = 4 the calculation of the Dispersion Entropy is noisier than with c = 3, thus leading to a divergent behavior.
For high values of the threshold, both DoS and Fuzzy attacks can be detected with almost 100% accuracy, obtaining results similar to those obtained on the same data set with more sophisticated techniques like Deep Learning [27]. The reason is that both attacks generate traffic that is significantly different from the normal in-vehicle CAN-bus traffic, and entropy measures are able to detect such anomalous behavior if the threshold is large enough.
In a similar way to the DoS attack, the values of accuracy for each feature id are reported for three selected values of Fac thr in Figure 10 for the Fuzzy attack.

As shown in the following results, the detection of spoofing attacks is more challenging because the injected malicious messages are quite similar to legitimate operations (e.g., related to the functioning of the gear). This assumption can be validated by the results presented in Figure 11a,b for the Gear attack. One initial observation is that increasing the threshold value Fac thr to the limit of 4 does not always lead to the optimal detection accuracy for all the entropy measures, as some entropy measures present an optimal value well below Fac thr = 4. These results are significant because they show that the value of Fac thr must be chosen appropriately. The second and more important observation is that the detection performance differs significantly from one entropy measure to another. In particular, Approximate Entropy with m = 2 and r = 0.02 and r = 0.03 (respectively Id = 9 and Id = 10) and Approximate Entropy with m = 3 and r = 0.02 and r = 0.03 (respectively Id = 24 and Id = 25) are able to reach almost 100% detection accuracy for Fac thr = 4 (which is also their optimal value), while the proposed approach is not able to reach a high detection accuracy with some of the other entropy measures. The Dispersion Entropy measures are able to reach a higher classification accuracy than the other entropy measures for low values of Fac thr , but they then reach a plateau around 90% detection accuracy even if the value of Fac thr is increased to the maximum value of 4. The classification based on Sample Entropy provides the worst results of all for this specific type of attack, in particular for m = 2.
The Shannon Entropy and Renyi Entropy used in the literature [9] are in the middle of the ranking of the entropy measures, and they exhibit an optimal detection accuracy for a value of the threshold Fac thr slightly above 2. It is noted that the Dispersion Entropy has the best accuracy for relatively low values of Fac thr , but it then reaches a peak and the accuracy decreases for increasing values of Fac thr , as for many other features. The reason why the accuracy reaches a maximum and then decreases for higher values of Fac thr is that an increase of Fac thr forces the algorithm to classify as legitimate samples containing CAN-bus messages of the RPM and Gear attacks. Because Fac thr is related to the standard deviation of the legitimate in-vehicle CAN-bus traffic, this behavior can be explained by looking again at the example of Dispersion Entropy and Approximate Entropy shown in Figure 6. In particular, Figure 6a shows the large variation of the values of Dispersion Entropy in normal traffic. Larger values of Fac thr may therefore include samples related to attacks, causing the algorithm to lose accuracy as the number of FP may increase. This may explain why the accuracy plot in Figure 11 reaches a maximum and then slowly degrades. On the other hand, Figure 6b for the Approximate Entropy shows that the values of the calculated entropy are in a tight range (i.e., small values of standard deviation). Then, even when Fac thr approaches the value of 4, the algorithm can discriminate with high accuracy legitimate samples from samples containing malicious CAN-bus messages.
The detailed values of the accuracy for Fac thr equal to 3.0, 3.5 and 4.0 are shown for each feature in Figure 12. Similar results are obtained for the other spoofing attack, the RPM attack, as shown in Figure 13a,b.
The choice of the entropy measure significantly affects the accuracy, with the Fuzzy Entropy measures performing worse than the other entropy measures and the Approximate Entropy providing the best accuracy.

To summarize, the Gear and RPM attacks are more difficult to identify than the DoS and Fuzzy attacks. In addition, Gear and RPM attacks require a careful choice of the entropy measure and tuning of the value of Fac thr , because some entropy measures are never able to reach very high accuracy (e.g., 99%) even for high thresholds, and the highest value of Fac thr does not necessarily provide the optimal detection accuracy. To further highlight these significant results, Table 3 presents the optimal accuracy for each entropy measure and the corresponding value of Fac thr where it is obtained. We note that these results were obtained with W s = 72 and R TT = 3/4. Similar results were obtained with the other values of W s and R TT , but they are not provided here for lack of space. The impact of W s and R TT is investigated in the next subsections.
As for the previous results, Figure 14 shows the detailed values of the accuracy for Fac thr equal to 3.0, 3.5 and 4.0 for the RPM attack.
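The search for the optimal threshold of a single entropy measure can be sketched as a sweep over Fac thr from 0 to 4 in steps of 0.1, keeping the value with the highest accuracy. The Python sketch below is an illustration only: a sample is predicted as legitimate when its entropy lies within Fac thr standard deviations of the mean, ties are resolved in favor of the smallest threshold (an assumption), and the name `best_threshold` is hypothetical.

```python
def best_threshold(entropies, is_legit, mu, sigma, max_fac=4.0, steps_per_unit=10):
    """Return (fac_thr, accuracy) maximizing accuracy over a grid of thresholds.

    entropies: entropy value of each training sample.
    is_legit: ground-truth label of each sample (True = legitimate).
    mu, sigma: mean and standard deviation from the Normal Traffic Estimate.
    """
    best_fac, best_acc = 0.0, -1.0
    for k in range(int(max_fac * steps_per_unit) + 1):
        fac = k / steps_per_unit
        # A prediction is correct when "within the threshold" matches the label.
        correct = sum(
            (abs(h - mu) < fac * sigma) == legit
            for h, legit in zip(entropies, is_legit)
        )
        acc = correct / len(entropies)
        if acc > best_acc:  # strict comparison keeps the smallest best threshold
            best_fac, best_acc = fac, acc
    return best_fac, best_acc
```

Repeating the sweep for every feature and keeping the best pair of feature and threshold corresponds to the kind of information summarized in Table 3.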

Recall and Precision for the Gear Attack
The aim of this subsection is to analyze how the recall and precision change with W s and R TT for each of the entropy measures for the specific value of the threshold Fac thr = 2.
Figure 15a,b provide, respectively, the recall and precision for the Gear attack for each of the entropy measures for Fac thr = 2 and W s = 72 as the parameter R TT varies. Please refer to Table 2 for the description of each entropy measure associated with the specific Id appearing on the X axis of the figures. As with the accuracy results (see Figure 11), precision and recall can vary greatly among the entropy measures, and the specific entropy measure must be carefully selected. Figure 15a,b show that the balance between the size of the training set and the test set impacts both metrics, but in particular the precision. It can be seen from Figure 15b that a larger training set (i.e., an increasing value of R TT ) provides higher values consistently across all the entropy measures. This result indicates an important design decision, as a larger value of R TT may provide more stable values of u i and σ i to support a more stable choice of the hyperparameters and an improved detection accuracy. In particular, the precision (as indicated before) is probably more relevant than the recall in this particular detection problem, as the goal is to minimize the FP, where an intrusion is wrongly classified as legitimate traffic, thus allowing the attacker to implement the cybersecurity threat. On the other hand, Figure 15a shows that this trend is not the same across all the entropy measures. For example, the value obtained with Feature Id = 10 (ApEn, m = 2 and r/σ = 0.03) does not change significantly. In addition, as shown in more detail in Section 4.3, the improvement in classification performance due to R TT depends both on the entropy measure and on the value of the threshold Fac thr . All these factors should therefore be taken into consideration.
In another phase of the study presented in this paper, the impact of the window size was evaluated. As in the previous case, only one specific attack is presented for space reasons.
Figure 16a,b provide, respectively, the recall and precision for the Gear attack for each of the entropy measures for Fac thr = 2 and R TT = 3/4 while changing the window size W s . The window size is another important hyperparameter: a small window size may require more time for training, as the data set is divided into a greater number of segments on which the entropy measure must be calculated, but it may provide higher detection accuracy because the CAN-bus messages related to a cybersecurity attack carry proportionally more weight in the sample. The latter aspect is confirmed by Figure 16a,  To complement the previous Figure 16a,b and to provide an independent evaluation of FN and FP, the following Figure 17a

Evaluation of Accuracy in Relation to R TT and W s at the Variation of Fac thr
This subsection shows the impact of the value of the threshold Fac thr for different values of both R TT and W s . Two specific entropy measures (Id = 5 and Id = 10) are selected in relation to the Gear attack.
Figure 18a,b provide the plots respectively for Feature Id = 5 (Dispersion Entropy) and Feature Id = 10 (Approximate Entropy) for different values of the ratio R TT and W s = 72. Two main observations can be derived from Figure 18a,b. The first one is that the optimal value of Fac thr changes considerably with the value of R TT for both entropy measures (similar results are obtained for the other entropy measures, but they are not displayed here for lack of space). The combination of R TT and Fac thr must therefore be carefully identified. The second observation confirms the previous results that the optimal detection accuracy is obtained with high values of R TT : the larger the portion of the data set used to calculate the mean and standard deviation, the more accurate the detection.
Figure 19a,b provide the graphs respectively for Feature Id = 5 (Dispersion Entropy) and Feature Id = 10 (Approximate Entropy) for different values of the window size W s and R TT = 3/4. In this case, the results show that the impact of W s can differ across entropy measures and that the optimal values are obtained through a proper combination of W s with the entropy measure. In fact, in Figure 19a a smaller window size W s = 24 provides lower detection accuracy than larger windows (e.g., W s = 168) for all the values of Fac thr . In Figure 19b, a small window size of W s = 24 provides the best accuracy for most of the values of Fac thr , apart from values near 4, where larger window sizes are more effective. A potential explanation for this behavior is related to the characteristics of each entropy measure. In particular, Dispersion Entropy requires longer time series, because of the c^m < N condition, to provide correct results, while Approximate Entropy can correctly estimate entropy with shorter time series.

Detection
This section provides the results for the detection phase. Although the training analysis in the previous sections was conducted on the entire data set, the evaluation of the detection phase is performed by splitting in half the remainder of the data set (1 − R TT of the entire data set), which is not used for the normal traffic estimate (R TT of the data set). For example, if a value of R TT = 1/2 is used, the first half of the data set is used for the normal traffic estimate, one quarter is used for training and one quarter is used for detection. The calculation has been performed for all the different attacks (i.e., DoS, Fuzzy, Gear, RPM), for all the different window sizes (W s = 24, 72, 120, 168) and for different values of R TT .
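The three-way split described above can be written as a short Python sketch. It assumes that the samples are kept in their original order and that an odd remainder gives the extra sample to the detection portion; the selection of only the traffic labeled as normal for the estimate is omitted for brevity, and the name `split_dataset` is hypothetical.

```python
def split_dataset(samples, r_tt):
    """Split samples into (normal traffic estimate, training, detection)."""
    n_estimate = int(len(samples) * r_tt)
    n_train = (len(samples) - n_estimate) // 2
    estimate = samples[:n_estimate]
    training = samples[n_estimate:n_estimate + n_train]
    detection = samples[n_estimate + n_train:]
    return estimate, training, detection
```

With R TT = 1/2 and 100 samples, the three portions contain 50, 25 and 25 samples, matching the example in the text.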
The results of the analysis are provided in Figure 20, while the values of the reported accuracy for all the attacks and W s = 72 are shown in Table 4, together with the optimal feature id and the optimal Fac thr from the Training phase. In particular, Figure 20a-d show the accuracy respectively for the DoS, Fuzzy, Gear and RPM attacks as R TT varies. The results are consistent with those presented in the previous subsections, where lower values of R TT can provide a relatively low accuracy for the detection of the in-vehicle attack. When the amount of data used for the normal traffic estimate is larger (e.g., values of R TT higher than 0.5), the accuracy increases significantly. This trend is similar for all the attacks. It is noted that the accuracy increases sharply in particular for the Gear and RPM attacks (Figure 20c,d), which are more difficult to detect. Although this is consistent with the other results, it may also be due to the fact that, for such high values of R TT , the driving circumstances were very similar for the training and the detection phases; it is then easier for the algorithm to detect attacks because the optimal features and thresholds used for the detection were calculated in similar driving circumstances, which would explain the very high accuracy. When the training and detection phases are based on the analysis of a relatively large set of data (lower values of R TT ), the driving circumstances may be different, thus lowering the detection accuracy. Future developments of the research presented in this paper could evaluate methods of statistical analysis that identify different optimal features and thresholds for different driving circumstances.
Such an analysis could be quite complex because it must take into consideration the range of different driving circumstances, identify the driving circumstances in which the detection is currently executed and choose the appropriate optimal features and thresholds. This complex analysis is out of the scope of this paper and is reserved for future developments (see Conclusions Section 5). Table 4 complements Figure 20, as it identifies the optimal feature ids and values of Fac thr from the training phase, which are used for the detection phase. The results are consistent with the previous sections, where Approximate Entropy (feature Id = 24) and Dispersion Entropy (feature Id = 21) provide optimal results. Shannon Entropy is also the optimal feature id for the DoS and Fuzzy attacks. The optimal values of Fac thr are generally high (more than 2.9), which is also consistent with the previous results shown in Section 4.1.

Computing Resources Used to Perform the Study
This section describes the computing resources used to perform the analysis and the time needed for each of the three phases for three specific ratios of the used data set. The computing platform used in the study is a mass-market laptop with an i7-8550U CPU at 1.8 GHz and 8 Gigabytes of RAM. Table 5 shows the computing time for selected attacks and parameters (R TT = 1/4 and W s = 72). The times provided in Table 5 are based on the processing of the entropy features already calculated on the CAN-bus messages. The calculation of the entropy measures is estimated to take 40 s for 10,000 CAN-bus messages in the Normal Traffic estimate and Initial Training phase, while it is in the range of 0.45 s to 1.6 s for 10,000 CAN-bus messages in the Detection phase, because only a single entropy measure (the optimal entropy measure identified in the Training phase) has to be calculated in this phase. From these values and the values reported in Table 5, it can be concluded that the Detection phase can be quite fast even using a mass-market processor unit, while the Training phase can take considerable time because of the need to calculate the performance of all the potential entropy measures across a wide range of thresholds. From a practical deployment point of view, the Normal Traffic estimate and Training phase could be performed by a powerful cloud computing facility, while the detection phase must be performed in the vehicle itself.
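As a rough worked example derived only from the figures quoted above, and assuming that the cost scales linearly with the number of messages, the average entropy computation time per CAN-bus message is:

```latex
\frac{40\ \mathrm{s}}{10{,}000} = 4\ \mathrm{ms}
\quad \text{(Normal Traffic estimate and Training, all entropy measures)},
\qquad
\frac{0.45\ \mathrm{s}}{10{,}000} = 45\ \mu\mathrm{s}
\ \ \text{to} \ \
\frac{1.6\ \mathrm{s}}{10{,}000} = 160\ \mu\mathrm{s}
\quad \text{(Detection, single entropy measure)}
```

Under this assumption, the per-message cost of the Detection phase is between 25 and about 89 times lower than that of the other two phases.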

Conclusions
This paper has evaluated the application of different entropy measures, using a sliding window, to the analysis of the payload of CAN-bus messages to identify in-vehicle attacks. Even if this approach has already been proposed in the literature, where it has been shown to be very time-efficient in comparison to other approaches, literature results have mostly focused on specific attacks or specific entropy measures. The analysis presented in this paper is based on an extensive range of entropy measures and different values of the hyperparameters: window size, threshold range and the parameters that are part of the definition of the entropy measures themselves (e.g., embedding dimension). The results show that an adequate selection of the entropy measures and of the values of the hyperparameters can provide a very high detection accuracy. The results are based on a public data set with millions of records and four different attacks, and they can be used to support further research in this area.
Future developments will investigate a more complex analysis, which takes into consideration the specific driving circumstances and identifies different sets of parameters (mean, standard deviation, feature and thresholds) for each driving circumstance, which are then applied to the detection phase.

Conflicts of Interest:
The author declares no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: