A Novel Detection Approach of Unknown Cyber-Attacks for Intra-Vehicle Networks Using Recurrence Plots and Neural Networks

Proliferation of connected services in modern vehicles could make them vulnerable to a wide range of cyber-attacks through the intra-vehicle networks that connect various vehicle systems. Designers usually equip vehicles with predesigned countermeasures, but these may not be effective against novel cyber-attacks. Intrusion Detection Systems (IDSs) serve as an additional layer of defence when the conventional measures implemented by the designers fail. Several intrusion detection techniques have been proposed in the literature, but they have limited capability in detecting novel cyber-attacks. This paper proposes a new Machine Learning (ML)-based IDS for detecting novel cyber-attacks in intra-vehicle networks, specifically in Controller Area Networks (CANs). The proposed IDS generates high-level representations of the CAN messages transmitted on the bus, exploiting their temporal properties as well as their intra- and inter-message dependencies through the use of Recurrence Plots (RPs); these representations are then fed into a bespoke neural network designed and trained to detect novel intrusions. A performance evaluation against state-of-the-art IDS schemes demonstrates the superiority of the proposed IDS.


I. INTRODUCTION
The ubiquitous nature of emerging V2X connectivity systems offers novel services and applications that enable advanced functions and features for modern vehicles, such as Advanced Driver Assistance Systems (ADAS), infotainment, productivity and maintenance services. For a safe, efficient, and comfortable operation of modern vehicles, data is transmitted through intra-vehicle and over inter-vehicle networks depending on system-level requirements [1]. This provides new cyber-attack surfaces for potential intrusions into the vehicle systems, which can put road users' lives at risk if exploited by malicious agents [2], [3].
Modern vehicles are complex cyber-physical systems that embed different components, including Electronic Control Units (ECUs), sensors and actuators. The Controller Area Network (CAN) forms the communication backbone over which these components exchange data in most vehicles. Unfortunately, the CAN protocol has inherent cybersecurity vulnerabilities due to its lack of authentication mechanisms and its broadcast communication method. These vulnerabilities can make vehicles subject to a wide range of cyber threats (e.g., fuzzing and Denial of Service (DoS) attacks) [4]. In addition, the external V2X connectivity of vehicles could expose their intra-vehicle networks to remote attacks, allowing hackers to gain access to safety-critical sub-systems of connected vehicles (e.g., the braking system).
Supplementing conventional security measures (e.g., encryption algorithms), Intrusion Detection Systems (IDSs) are real-time monitoring systems that are becoming an integral part of the security architecture of modern vehicles, detecting cyber-attacks that may have evaded conventional countermeasures. Intra-vehicle IDSs can be categorised as flow-based, payload-based or hybrid [5]. A payload-based IDS inspects the content of messages to detect potential intrusions, whereas a flow-based IDS examines their transmission patterns. Generally, flow-based IDSs are suitable for detecting intrusions that affect the frequency and order of messages (e.g., [6], [7], [8]), but their performance is inadequate when an attack affects message content [5]. Conversely, payload-based IDSs (e.g., [9], [10], [11]) are effective against such attacks but are weak at detecting attacks that affect the timing and sequence of messages (e.g., message injection attacks) [5]. Hybrid IDSs aim to combine the strengths of both approaches [5].
More recently, Machine Learning (ML)-based techniques have been studied in the literature. Unlike rule-based IDSs, in which system designers hard-code the rules used to detect intrusions, deep learning models learn to detect abnormal patterns from training datasets that contain samples of both normal operation and operation under attack. A detailed review of existing ML-based IDSs and their strengths and weaknesses is given in Section II.
This paper presents a novel ML-based hybrid IDS for detecting novel attacks in intra-vehicle networks (i.e., CAN), whose typical deployment is illustrated in Fig. 1. The proposed IDS extracts representative features from the data by examining the local context of a subject message through a 2D representation based on the Recurrence Plot (RP) concept. The content of the subject message, together with its time relative to the previous message, is fed into a Long Short-Term Memory (LSTM) neural network, while the generated 2D representation is fed into a Convolutional LSTM (ConvLSTM). The main advantage of the proposed technique is that it jointly learns intra-message data dependencies and inter-message temporo-contextual dependencies, which improves Key Performance Indicators (KPIs), such as detection accuracy, compared with state-of-the-art ML models. Our performance evaluation shows that the proposed IDS outperforms state-of-the-art ML techniques in its ability to detect novel cyber-attacks.
The main contributions of this work are as follows:
• The design of a new ML-based IDS for detecting novel cyber-attacks in intra-vehicle networks, specifically in the CAN of modern vehicles.
• A method to generate representative inter-message features based on RPs to capture the required temporo-contextual dependencies of messages.
• An evaluation of the performance of the proposed IDS in comparison with state-of-the-art IDSs, showing the superiority of the proposed method.
The rest of the paper is structured as follows: Section II provides a comprehensive review of the state-of-the-art of the related ML-based IDSs. After a short overview of LSTM-based neural networks and RP, Section III details the proposed detection model and positions it within the current landscape of IDSs. Section IV compares and discusses the performance of the proposed technique against the state-of-the-art solutions. To conclude, Section V summarises the findings of this paper and indicates potential future research directions.

II. RELATED WORK
In this section, we discuss related studies, provide an overview of their approaches, as well as their advantages and limitations. Finally, we articulate how the proposed work in this paper fits into the current landscape.
The concept of intrusion detection in intra-vehicle networks was first introduced by Hoppe et al. [12]. Since then, considerable effort has gone into improving IDS KPIs, with a notable shift towards ML approaches, enabled by the increase in available processing power and by promising initial results. ML techniques are currently applied at various levels or interfaces to learn semantic representations of data for detecting anomalies. At the data link layer, research has followed two natural directions, based on the payload and on the flow of data, respectively.
Several researchers have developed IDSs for intra-vehicle networks. Levi et al. [13] described a Hidden Markov Model (HMM)-based detection system trained on data collected from vehicles; the trained HMM, together with a regression model, is used to detect deviations from the expected normal operation. Their approach monitors different interfaces across the system (communication, CAN and operating system interfaces), extracts relevant information based on configurable rules and sends it to the trained model to detect anomalies. A configurable data collector provides a higher level of data abstraction (i.e., events) by modelling the time-series data as states, which has an inherent noise-filtering effect and eliminates the need to retrain the model. The regression model calibrates the likelihood threshold used for detecting anomalies.
Choi et al. [14] took a different direction and proposed VoltageIDS, an automotive IDS that leverages the fact that the electrical signals used to transmit CAN messages depend on the physical configuration of the network, such as cable length, the actual values of the termination resistors, and the actual voltage levels of bits zero and one. VoltageIDS operates in three phases: feature extraction, feature selection and intrusion detection. In the feature extraction phase, VoltageIDS extracts 60 features from the electrical signals of normal CAN messages; the feature selection phase then retains only the most significant ones. In the intrusion detection phase, VoltageIDS builds a supervised multi-class ML classifier (e.g., a Support Vector Machine) using attack-free CAN data. Once deployed, the classifier predicts the class label (i.e., normal or intrusion) of each message.
Kang and Kang [15] built a Deep Neural Network (DNN)-based IDS trained on high-dimensional features extracted from the bit streams of CAN messages. Song et al. [16] adopted Inception-ResNet to develop an IDS for CAN; they evaluated it on a dataset of CAN messages recorded on the CAN bus of a real vehicle, with results outperforming conventional ML methods. Lin et al. [17] developed an IDS based on the Visual Geometry Group (VGG) DNN. Taylor et al. [18] proposed an LSTM neural network for detecting cyber-attacks in intra-vehicle networks, including interleave, drop, discontinuity, unusual and reverse attacks. In the same vein, Loukas et al. [19] presented an LSTM-based IDS, although it focused on attacks that are particularly meaningful for robotic vehicles.
Martinelli et al. [20] applied four fuzzy algorithms to eight features (the eight bytes of the data field of a CAN message) to detect cyber-attacks. The authors showed that these fuzzy classification algorithms can achieve high performance in detecting three types of attack, namely, Denial of Service (DoS), Fuzzy and message injection.
In [21], Zhu et al. proposed a literal multi-dimensional anomaly detection approach using a distributed LSTM framework. The proposed model uses both the time and data dimensions of CAN messages to detect cyber-attacks, and the experimental results showed a detection accuracy of approximately 90%.
Derhab et al. [2] proposed a Histogram-based Intrusion Detection and Filtering (H-IDFS) framework. The proposed framework, first, groups CAN frames into windows and calculates their histograms which are, then, fed into a multi-class classifier to identify windows containing malicious CAN frames. Thereafter, a one-class SVM filters out malicious CAN frames from each malicious window.
Basavaraj and Tayeb [22] designed a DNN-based IDS and evaluated its performance on two real datasets where they achieved a detection accuracy of 98.67% on known attacks.
He et al. [23] proposed Hybrid Similar Neighbourhood Robust Factorisation Machine Model (HSNRFM) for detecting anomalies in the in-vehicle network. Firstly, the HSNRFM performs a dimensionality reduction of the original data, to enhance its robustness. Then, it combines the information of the target message as well as neighbour messages to form the final input features vector of a factorisation ML model used to derive the final prediction value.
Generally, existing studies on intrusion detection in intra-vehicle networks:
• Focus on detecting specific types of cyber-attacks, ignoring novel cyber-attacks [5].
• Neglect the context of messages, whereby a message that appears normal in a given context (i.e., message sequence) may appear abnormal in another.
• Do not discriminate between malicious and normal frames within a malicious window, which could cause the system to drop all the frames within the window, with potentially undesirable effects on the overall safety of the vehicle.
Diverging from existing works, we propose an IDS for intra-vehicle networks that generates two independent views of the CAN traffic to detect different types of cyber-attacks, augmenting the overall detection capability for both known and novel cyber-attacks. This approach combines intra-message features with features derived from the interdependencies among CAN messages, captured through RPs.
The proposed IDS differs from the state-of-the-art in several ways:
• Use of machine learning: We adopt an ML-based approach to identify potential intrusions, rather than relying on predefined rules or patterns. This allows the system to adapt to changing patterns of normal and anomalous behaviour, and potentially to identify novel attacks that the designers of the system did not anticipate.
• High-level representations of CAN messages: The model uses RPs to generate high-level representations of CAN messages that capture the complex relationships and temporal dependencies among them. These representations provide a more detailed view of the data than some other methods, which may allow the system to identify potential intrusions more accurately.
• Analysis of individual messages: Unlike some IDS approaches, the proposed system operates at message-level granularity. It labels each individual message as either normal or anomalous, rather than labelling an entire window of messages as a whole. This allows the system to identify intrusions in individual messages more accurately, instead of relying on the presence of a pattern of anomalies within a group of messages.

III. PROPOSED APPROACH
In this section, we give some background on CAN, Recurrent Neural Networks (RNNs), with a focus on LSTM, and RP. We also describe in detail the proposed model.

A. BACKGROUND
CAN plays a central role in the communication architecture of modern vehicles. Messages transmitted over a CAN exhibit temporal relationships. For example, pressing the accelerator pedal lets more air into the engine; the engine control unit senses the increased airflow and responds by pumping more fuel into the engine. As a result, the vehicle accelerates and the engine's revolutions per minute (RPM) increase. These actions happen in a certain sequence and translate into well-structured time-series traffic under usual driving conditions. However, the temporal relationships between messages in intra-vehicle networks can deviate from these typical patterns when the vehicle is under abnormal conditions, such as cyber-attacks or faults.
Conventional intrusion detection techniques in intra-vehicle networks are predominantly based on the temporal relationships between messages and on their content [21]. From a timing perspective, many CAN messages are periodic, meaning that they normally appear at a regular frequency and follow a sequential pattern [24]. From a data perspective, the content transported by CAN frames with the same CAN ID also exhibits certain patterns and trends under normal conditions. However, these patterns change when the vehicle is under cyber-attack. On the one hand, a DoS attack, in which the attacker injects malicious messages into the network at high rates, affects the frequency of the messages and their sequence. On the other hand, an integrity attack, in which the malicious agent tampers with the data content, alters the data pattern observed within a CAN frame, even though the frame may appear valid from a timing perspective.
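To make the timing argument concrete, the following minimal Python sketch groups messages by CAN ID and computes the gaps between consecutive occurrences of each ID. The toy trace, the ID 0x316 and the 10 ms period are hypothetical values chosen for illustration only; injecting extra frames with the same ID shrinks the observed gaps, which is the kind of frequency change a flow-based view can pick up.

```python
from collections import defaultdict


def inter_arrival_times(messages):
    """Return, for each CAN ID, the gaps between its consecutive occurrences.

    messages: iterable of (timestamp, can_id) pairs in arrival order.
    """
    last_seen = {}
    gaps = defaultdict(list)
    for ts, can_id in messages:
        if can_id in last_seen:
            gaps[can_id].append(ts - last_seen[can_id])
        last_seen[can_id] = ts
    return dict(gaps)


# Hypothetical trace (seconds): ID 0x316 is broadcast every 10 ms.
normal = [(0.000, 0x316), (0.010, 0x316), (0.020, 0x316), (0.030, 0x316)]
# An attacker injects extra 0x316 frames between the legitimate ones.
attacked = sorted(normal + [(0.005, 0x316), (0.015, 0x316)])

for name, trace in (("normal", normal), ("attacked", attacked)):
    gaps = inter_arrival_times(trace)[0x316]
    print(name, "mean period (ms):", round(1000 * sum(gaps) / len(gaps), 2))
```

An integrity attack that only rewrites data bytes would leave these gaps unchanged, which is why the content of the messages has to be examined as well.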

B. DESCRIPTION OF THE PROPOSED MODEL
Based on these insights, we propose an ML-based IDS that detects cyber-attacks in CAN by looking at both the content carried by a message and its relative context. To this end, two views are generated from the received CAN data: the first captures the intra-message dependencies of a CAN message and the second its context. Concatenated, these two views form the input feature vector of a dense neural network that classifies each message as normal or as an intrusion.
Messages received by a CAN node are timestamped either in hardware or in software. Timestamps are not, per se, part of the CAN protocol, but upper-layer protocols may rely on them when the order of message arrival matters. In our model, this information is vital as it gives a temporal context to CAN messages. A typical CAN frame is described in Fig. 2, where the most salient features are the arbitration and data fields. In particular, the arbitration field contains an 11-bit or 29-bit subfield, called the message ID, which uniquely identifies and describes the content of the data field; the data field of a CAN message is at most 8 bytes long.
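The field constraints described above can be captured in a small Python record, shown here as a sketch. The class name `CanFrame` and its attributes are hypothetical; the limits encode the 11-bit (standard) and 29-bit (extended) identifier widths and the 8-byte maximum payload of classical CAN, and the timestamp is stored alongside the frame because it is added by the receiving node rather than transmitted on the bus.

```python
from dataclasses import dataclass

STD_ID_MAX = 0x7FF        # largest 11-bit identifier (standard frame)
EXT_ID_MAX = 0x1FFFFFFF   # largest 29-bit identifier (extended frame)


@dataclass
class CanFrame:
    timestamp: float      # receive time added by the node, not part of CAN itself
    can_id: int
    extended: bool
    data: bytes

    def __post_init__(self):
        limit = EXT_ID_MAX if self.extended else STD_ID_MAX
        if not 0 <= self.can_id <= limit:
            raise ValueError(f"ID 0x{self.can_id:X} does not fit the identifier field")
        if len(self.data) > 8:
            raise ValueError("classical CAN carries at most 8 data bytes")

    @property
    def dlc(self) -> int:
        """Data length in bytes (0 to 8)."""
        return len(self.data)
```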
In our model, we call an ordered set of messages a message window, and the last message in this sequence the subject message of the window. To capture the temporal relationships in a message window, the relative time stamp (RTS) between each message and its predecessor is calculated; together with the message ID (ID), the Data Length (DL) and the data fields (D1, ..., D8), it forms the input feature vector (Input 1) of an RNN that generates the first view, thus capturing the intra-message dependencies, as shown in Fig. 3.
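The notion of a message window and its subject message can be sketched as follows, assuming the per-message feature rows (RTS, ID, DL, D1, ..., D8) are already numeric. The function name `message_windows` is hypothetical; the window slides one message at a time and each window inherits the label of its last (subject) message, which keeps the classification at message-level granularity. The default size of 11 follows the setting reported in Section IV.

```python
import numpy as np


def message_windows(features, labels, window=11):
    """Slide a window one message at a time over a message sequence.

    features: array of shape (N, F), one row per CAN message.
    labels:   array of shape (N,), 0 = normal, 1 = attack.
    Returns windows of shape (N - window + 1, window, F) and, for each
    window, the label of its last row, i.e. the subject message.
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    n = len(features) - window + 1
    if n <= 0:
        raise ValueError("sequence shorter than the window")
    windows = np.stack([features[i:i + window] for i in range(n)])
    subject_labels = labels[window - 1:]
    return windows, subject_labels
```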
RNNs are a type of neural network able to learn temporal relationships in a data sequence. They can be thought of as a sequence of fully connected neural networks in which the state of the RNN at time $t$, $a^{\langle t \rangle}$, is updated from the current input, $x^{\langle t \rangle}$, and the previous state, $a^{\langle t-1 \rangle}$, through the weight matrices $W_{aa}$ and $W_{ax}$. The output at time $t$, $y^{\langle t \rangle}$, follows a standard calculation based on the current state, $a^{\langle t \rangle}$, and the weight matrix $W_{ya}$. These weight matrices are shared across the sequence, as shown in Fig. 4.
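A minimal NumPy sketch of the recurrence described above follows. The choice of tanh for the state update and a sigmoid for the output, as well as the bias vectors `b_a` and `b_y`, are assumptions standing in for the "standard calculation"; the point illustrated is that the same `W_aa`, `W_ax` and `W_ya` are reused at every time step.

```python
import numpy as np


def rnn_forward(x_seq, W_aa, W_ax, W_ya, b_a, b_y):
    """Run a vanilla RNN over x_seq (shape T x n_x).

    a<t> = tanh(W_aa a<t-1> + W_ax x<t> + b_a)
    y<t> = sigmoid(W_ya a<t> + b_y)
    The same weight matrices are applied at every time step.
    """
    a = np.zeros(W_aa.shape[0])          # initial state a<0>
    ys = []
    for x_t in x_seq:
        a = np.tanh(W_aa @ a + W_ax @ x_t + b_a)
        ys.append(1.0 / (1.0 + np.exp(-(W_ya @ a + b_y))))
    return np.array(ys), a               # outputs for every step, final state
```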
RNNs can be used to detect intrusions or cyber-attacks in CAN. For example, the CAN messages in a message window can be treated as a data sequence, so that the classification of the subject message as intrusive or normal depends on the information collected from previous messages (i.e., prior states) and on the subject message itself (i.e., the current input). However, conventional RNNs have limited capabilities in capturing long-term temporal dependencies in long data sequences due to the vanishing gradient problem, which refers to the exponential decrease of the gradient when the weight matrices are updated through back-propagation [19]. An LSTM addresses the vanishing gradient problem by introducing gates that control the flow of information within an LSTM unit [25]. Traditionally, three gates are used: the forget gate, the input gate and the output gate, whose activations are $f_t$, $i_t$ and $o_t$, respectively, as shown in Fig. 5. The forget gate is of particular importance, as it controls how much information from the previous state is passed to the current state.
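For illustration, one step of a standard LSTM cell with forget, input and output gates can be sketched in NumPy as below. The stacked weight layout (`W`, `b`) and the function names are assumptions of this sketch, not the configuration used in the paper. The additive update of the cell state, scaled by the forget gate, is what lets information (and gradients) persist across many steps.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W maps the concatenation [h_prev, x_t] to four stacked blocks
    (forget, input, output, candidate), each of size n_h; b has shape (4*n_h,).
    """
    n_h = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:n_h])             # forget gate: how much of c_prev to keep
    i_t = sigmoid(z[n_h:2 * n_h])       # input gate: how much new content to write
    o_t = sigmoid(z[2 * n_h:3 * n_h])   # output gate: how much of the cell to expose
    c_hat = np.tanh(z[3 * n_h:])        # candidate cell content
    c_t = f_t * c_prev + i_t * c_hat    # additive cell-state update
    h_t = o_t * np.tanh(c_t)            # new hidden state
    return h_t, c_t
```

With the forget gate saturated open and the input gate closed, the cell state is carried over unchanged, which is the mechanism that counters the vanishing gradient.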
The proposed model generates the second view by taking a message window as input to create a 2D texture using the notion of RP. Eckmann et al. [26] proposed RP as a method to visualise recurrent states of dynamical systems [27]. Often, the current state of a dynamical system, represented by a geometrical manifold in the appropriate space, is followed by one future state governed by some transition rule that describes the evolution of the states of the dynamical system over time [28].
This evolutionary concept appears typical of vehicles, whose communication systems exhibit recurring patterns. This implies the presence of an internal mechanism that generates regular and recurrent behaviours/patterns in the data [29]. In this context, we adopt RP to provide high-level explicit representations/features capturing the periodicity of the data, which we hypothesise can lead to an improved detection rate. The RP is defined as follows [29]:
$$R_{i,j} = H\left(\varepsilon - \|x_i - x_j\|\right), \quad x_i, x_j \in \mathbb{R}^d, \quad i, j = 1, \ldots, n,$$
where $x_i$ and $x_j$ are the states of length $d$ (i.e., CAN messages) observed at position/time $i$ and $j$, respectively, $\|\cdot\|$ denotes a norm between the observations, $\varepsilon$ is a threshold for closeness, $n$ is the number of states (i.e., the number of messages) and $H$ is the Heaviside function, defined as [29]:
$$H(z) = \begin{cases} 0, & z < 0, \\ 1, & z \geq 0. \end{cases}$$
Calculating an RP requires setting the closeness threshold $\varepsilon$, but its value is not intuitive to determine. Heuristics such as setting the threshold to 10% of the largest observed distance, or choosing it to obtain a certain percentage of black points, can be used. However, these heuristics do not generalise well across multiple RPs and can make it difficult to assess the similarity between two RPs [29]. Following [29], we eliminate the closeness threshold, as well as the Heaviside function, to keep the granularity provided by the norm. The RP then becomes an $n \times n$ square matrix whose entries $R_{i,j}$ are given by:
$$R_{i,j} = \|x_i - x_j\|, \quad i, j = 1, \ldots, n.$$
The output represents the distances between the different messages in a sequence and can be viewed as a coloured map. As such, the RP is no longer a tool for analysing recurrences within neighbourhoods; instead, it quantifies how close each pair of messages in a sequence is. This is known as an unthresholded RP, distance plot, or self-similarity matrix [29].
The norm $\|\cdot\|$, and the distance it induces, should be chosen carefully. A simple and widely used distance measure is the Euclidean distance. However, because the Euclidean distance is computed coordinate-wise, it does not consider the neighbourhood of each entry, so the context of each data field is not captured. We need a distance function that considers not only the individual fields but also their neighbouring context. Dynamic Time Warping (DTW) measures the similarity between two data sequences while also considering the neighbourhood, i.e., the context, of each data field. However, the complexity of the DTW algorithm is quadratic. To reduce the time and memory needed to calculate RPs, we used FastDTW [30], which approximates the DTW algorithm with the added advantage of linear complexity.
Algorithm 1: RP Generation Using DTW Distance.
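Before turning to DTW, the thresholded and unthresholded RP definitions can be illustrated with the Euclidean norm, the simple baseline mentioned in the text. In this NumPy sketch (the function name `recurrence_plot` is hypothetical), passing a threshold `eps` returns the binary matrix $H(\varepsilon - \|x_i - x_j\|)$ with $H(0) = 1$, and omitting it returns the unthresholded distance matrix.

```python
import numpy as np


def recurrence_plot(states, eps=None):
    """Recurrence plot of n states (shape n x d) under the Euclidean norm.

    With eps given: R[i, j] = H(eps - ||x_i - x_j||), a binary matrix.
    With eps=None:  R[i, j] = ||x_i - x_j||, the unthresholded distance plot.
    """
    states = np.asarray(states, dtype=float)
    diff = states[:, None, :] - states[None, :, :]   # pairwise differences, (n, n, d)
    dist = np.linalg.norm(diff, axis=-1)             # pairwise distances, (n, n)
    if eps is None:
        return dist
    return np.heaviside(eps - dist, 1.0)             # second argument sets H(0) = 1
```

The unthresholded form needs no choice of `eps`, which is the property the text relies on when comparing RPs across windows.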
Algorithm 1 shows the implementation of RP using DTW, and Algorithm 2 the DTW calculation adopted in this paper. In Algorithm 2, $|x_i[k] - x_j[z]|$ denotes the cost of matching the entries $x_i[k]$ and $x_j[z]$ at indices $k$ and $z$ of the CAN messages $x_i$ and $x_j$, respectively. In our implementation, each CAN message is represented by 11 entries/fields (e.g., time stamp, ID, etc.).
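Algorithms 1 and 2 are not reproduced here, so the following is only a sketch of the same idea under stated assumptions: it uses the classic exact DTW dynamic programme with the local cost $|x_i[k] - x_j[z]|$, rather than the FastDTW approximation [30] used in the paper, and fills an $n \times n$ RP for a message window whose rows are assumed to be already normalised. The function names are hypothetical.

```python
import numpy as np


def dtw_distance(a, b):
    """Exact DTW distance between 1-D sequences a and b, with |a[k] - b[z]|
    as the local matching cost (quadratic time and memory)."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)   # accumulated cost matrix
    acc[0, 0] = 0.0
    for k in range(1, n + 1):
        for z in range(1, m + 1):
            cost = abs(a[k - 1] - b[z - 1])
            # extend the cheapest of the three admissible predecessor paths
            acc[k, z] = cost + min(acc[k - 1, z], acc[k, z - 1], acc[k - 1, z - 1])
    return float(acc[n, m])


def dtw_recurrence_plot(window):
    """Unthresholded RP of a message window (n x 11): R[i, j] = DTW(x_i, x_j)."""
    window = np.asarray(window, dtype=float)
    n = len(window)
    rp = np.zeros((n, n))                   # diagonal stays 0: DTW(x, x) = 0
    for i in range(n):
        for j in range(i + 1, n):           # DTW is symmetric, so fill both halves
            rp[i, j] = rp[j, i] = dtw_distance(window[i], window[j])
    return rp
```

Because DTW may match an entry with a neighbouring entry of the other message, two messages whose values are shifted by one position (e.g. `[0, 1, 0, 0]` and `[0, 0, 1, 0]`) can be at DTW distance zero, whereas their Euclidean distance is not. The pure-Python double loop favours readability; FastDTW or a vectorised implementation would be preferable on the full dataset.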
The generated 2D texture forms the input (Input 2) of a ConvLSTM layer, which extracts high-level features of the message window, taking advantage of the properties of Convolutional Neural Networks (CNNs) to capture different data representations [31]. The model concatenates Views 1 and 2, which together encode both the intra- and inter-message dependencies. The output of the "concatenate" layer is then fed to a dense layer that predicts whether the subject message is normal or an attack, as shown in Fig. 3.
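The fusion step can be sketched in NumPy as below, assuming each branch has already produced a fixed-length feature vector. The single sigmoid output unit, the 0.5 decision threshold and the names `fusion_head` and `label` are assumptions of this sketch; the text specifies only that the concatenated views are fed to a dense layer.

```python
import numpy as np


def fusion_head(view1, view2, w, b):
    """Concatenate the two views and apply a single dense sigmoid unit.

    view1: feature vector from the Input 1 (LSTM) branch, shape (h1,)
    view2: feature vector from the Input 2 (ConvLSTM) branch, shape (h2,)
    w, b:  hypothetical dense-layer weights, shape (h1 + h2,), and scalar bias
    Returns the estimated probability that the subject message is an attack.
    """
    z = np.concatenate([view1, view2])     # the "concatenate" layer
    return float(1.0 / (1.0 + np.exp(-(w @ z + b))))


def label(prob, threshold=0.5):
    """Map the dense-layer output to a class label (threshold is an assumption)."""
    return "attack" if prob >= threshold else "normal"
```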

IV. EXPERIMENTS AND ANALYSIS
In this section, we describe the dataset used and the preprocessing required to generate the inputs of our model. We also give the complete experimental setup, including the method employed to tune the hyper-parameters. We conclude the section with a comparative performance evaluation of our model against state-of-the-art ML IDS solutions.

A. DATASET
To evaluate the performance of our model, we used the "CAN-intrusion-dataset" presented in [32]. The dataset contains four types of attacks:
• DoS attack: high-priority CAN messages (e.g., messages with ID '0x000') injected into the CAN bus with a short cycle time (every 0.3 ms).
• RPM/Gear attacks: CAN messages with the specific message IDs of RPM/Gear messages injected into the CAN bus with a cycle time of 1 ms.
• Fuzzy attack: CAN messages with spoofed random message IDs and data injected into the CAN bus with a cycle time of 0.5 ms.
Table 1 shows the statistics of the dataset used in this work. We used the DoS, Gear and RPM datasets for training and the Fuzzy dataset for testing. To evaluate and compare the performance of IDSs, the testing dataset must contain CAN IDs that are not specific to DoS, Gear or RPM attacks but correspond to more generic attacks. This is why the Fuzzy dataset is selected to model novel attacks and serves as the benchmark for measuring the performance metrics of the different models used in this study.

B. PREPROCESSING AND GENERATING OF INPUTS
Let $D = \{(x_1, y_1), \ldots, (x_N, y_N)\}$ be the initial dataset, with $x_t = \{x_t[1], \ldots, x_t[11]\}$ and $y_t \in \{\text{normal}, \text{attack}\}$, where $N$ denotes the number of messages in the dataset, $x_t$ the message at position/time $t$ and $y_t$ the class label of $x_t$. To prepare the data for training and testing, we performed the following steps:
• Data conversion and padding: We first converted the content of all messages in $D$ to a decimal representation. The input feature vector, Input 1, must have a fixed length, but the data length of CAN messages varies from 0 to 8 bytes. Hence, to guarantee that all messages have the same length, we padded messages with DL < 8 with extra bytes of value '-1'. This padding value must be fixed and chosen so that it never occurs in the original dataset.
• Relative-time derivation: We replaced the time stamp of each message in $D$ with the time elapsed since the previous message in the dataset:
$$\mathrm{RTS}_t = x_t[1] - x_{t-1}[1], \quad t = 2, \ldots, N,$$
where $x_t[1]$ is the original time stamp of message $x_t$, which $\mathrm{RTS}_t$ then replaces.
The preprocessed dataset serves as the first input (i.e., Input 1) of the proposed detection model. To generate the second input (i.e., Input 2), we normalised the data and adopted a sliding window of size 11, moving one message at a time.
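The preprocessing steps above can be sketched in Python as follows. Assumptions of this sketch: raw records are `(timestamp, id_hex, data_hex_list)` tuples in arrival order, DL is taken as the number of data bytes, the first message gets a relative time of 0 (the text does not specify this case), and min-max scaling stands in for the unspecified normalisation. The function names are hypothetical.

```python
import numpy as np

PAD = -1  # padding value for missing data bytes; real bytes are always 0..255


def to_feature_rows(raw):
    """raw: list of (timestamp, id_hex, data_hex_list) tuples in arrival order.

    Returns an (N, 11) array with columns [RTS, ID, DL, D1..D8]: hex fields are
    converted to decimal and frames shorter than 8 bytes are padded with PAD.
    """
    rows = []
    prev_ts = None
    for ts, id_hex, data_hex in raw:
        data = [int(h, 16) for h in data_hex]
        rts = 0.0 if prev_ts is None else ts - prev_ts   # first message: assumed 0
        prev_ts = ts
        rows.append([rts, int(id_hex, 16), len(data)] + data + [PAD] * (8 - len(data)))
    return np.array(rows, dtype=float)


def min_max_normalise(X):
    """Scale each column to [0, 1]; constant columns are mapped to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span
```

Padding with a value outside the byte range keeps "missing byte" distinguishable from a genuine zero byte, which is why the text insists the value must never occur in the data.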
The choice of the window size in an IDS is an important design decision that can affect its performance. A large window size can provide a wide context for analysing messages, which may be beneficial for detecting certain types of anomalies or attacks. However, a large window size can also increase the complexity of the IDS, which can lead to longer training times and possibly lower performance if the IDS becomes too complex to effectively learn from the data.
The setting of the window size depends on the specific requirements and constraints of the IDS. Some factors to consider may include the types of attacks or anomalies the IDS is designed to detect, the amount of available training data, and the resources available for training the IDS (e.g., time, computational power).
One approach to finding a suitable window size is to run experiments with different window sizes and evaluate the IDS's performance using metrics such as accuracy, precision and recall. This makes it possible to identify a window size that strikes a good balance between detection performance and reasonable training and runtime complexity.
In our case, each feature vector has 11 elements. Therefore, the minimum number of independent parameters required for linear correlation is 11, which motivates our window size of 11; additional parameters may be added to include nonlinear relationships. As such, the output of this process is a dataset in which each data point is an 11 × 11 array (11 messages × 11 features). A new dataset is then created whose elements are the RPs generated for each of these data points; it constitutes the second input of the proposed detection model. Fig. 6 gives an example of a normal RP vs. an intrusive RP.

C. EXPERIMENT SETTINGS AND HYPER-PARAMETERS TUNING
The proposed IDS has been implemented using the Keras deep learning library with TensorFlow as back-end. The hyper-parameters of both our proposed models and the state-of-the-art ML models were fine-tuned using Autonomio Talos, on a subset of the concatenation of the DoS, Gear and RPM datasets. The hyper-parameters of the Decision Tree (DT) and Random Forest (RF) models were tuned using a grid search method. To evaluate the generalisation capability of the proposed model to novel attacks, we used the Fuzzy dataset as the testing dataset. The selected hyper-parameters are given in Table 2.

D. PERFORMANCE EVALUATION
State-of-the-art metrics are used to measure and compare the performance of the proposed model against well-known IDSs: Accuracy (Acc), Detection Rate (DR), False Positive Rate (FPR), Precision, F1 score and Matthews Correlation Coefficient (Mcc). The positive class consists of intrusive messages and the negative class of normal, i.e., non-intrusive, messages.
Each metric plays a specific role, and some are crucial in the case of imbalanced classes. Acc indicates the ability of a binary classifier to correctly classify messages as intrusive or normal [5]. DR measures the capability of a system to detect intrusive messages; in our scenario, it can be interpreted as the probability that an actual intrusive message is classified as intrusive. For example, a DR of 0.5 means that half of the intrusive messages are detected. It is equally important to classify non-intrusive messages correctly and not flag them as intrusive. FPR measures this unwanted behaviour as the proportion of actual non-intrusive messages classified as intrusive; the closer FPR is to zero, the better. FPR is critical for IDSs when a fail-safe policy is adopted: in such a scenario, a high FPR would divert the system from its normal operation mode and require an expert to investigate each incident to determine whether a genuine intrusion occurred, hence increasing costs. Precision is the percentage of actual intrusions amongst all the messages predicted as intrusions by the classifier; it expresses the confidence that a detected intrusion is genuine. For imbalanced datasets, two further performance measures are selected, namely, F1 and Mcc. F1 is the harmonic mean of DR and Precision, giving no precedence to one over the other. Mcc takes into account all four entries of the confusion matrix (true positives, true negatives, false positives and false negatives) and provides a correlation coefficient between the output of the classifier and the actual labels. Mcc ranges between −1 and +1: a perfect classification yields an Mcc of +1, −1 indicates total disagreement between the classifier's decisions and the actual data, and 0 indicates that the classifier is no better than a random classifier [33].
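These metrics can be computed from the four confusion-matrix counts as in the following sketch, using the standard definitions (the formulas used in the paper are those given in [5]). The function name `ids_metrics` is hypothetical; the sketch assumes non-zero denominators for DR, FPR and Precision, and returns an Mcc of 0 when its denominator vanishes, a common convention.

```python
import math


def ids_metrics(tp, tn, fp, fn):
    """IDS metrics from confusion-matrix counts (positive class = intrusion)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    dr = tp / (tp + fn)                  # detection rate (recall, true positive rate)
    fpr = fp / (fp + tn)                 # normal messages wrongly flagged as intrusive
    precision = tp / (tp + fp)           # share of flagged messages that are intrusive
    f1 = 2 * precision * dr / (precision + dr)   # harmonic mean of DR and Precision
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "DR": dr, "FPR": fpr,
            "Precision": precision, "F1": f1, "Mcc": mcc}
```

For example, a classifier with 40 true positives, 50 true negatives, no false positives and 10 false negatives reaches an Acc of 0.9 and a perfect Precision, yet its DR is only 0.8; Mcc and F1 summarise this imbalance in a single number.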
Formulas of the performance measures used to evaluate the performance of different models can be found in [5].
We evaluated and compared the performance of our model with ML algorithms proposed in the literature for intrusion detection tasks, namely, the DT used in [34] and RF, as well as with the state-of-the-art deep learning approach presented in [19]. Delgado et al. [35] showed that RF is the best-performing classifier among 179 methods tested across 121 classification tasks, and Belavagi and Muniyal [36] showed that RF outperforms other ML algorithms (e.g., SVM and logistic regression) for intrusion detection tasks. We also evaluated the performance of the proposed IDS when the detection model is built using a single input (e.g., Input 1) or both inputs (Input 1&2). Table 2 gives the performance of all models. It shows that DT and RF, as well as the deep learning model presented in [19], have a limited capability in detecting novel cyber-attacks, as indicated by their low Acc, DR, F1 score and Mcc values. These models classify most of the instances in the testing data as normal messages, which also indicates a low generalisation ability.
As seen in Table 2, the proposed model (using Views 1&2) achieves the highest Acc (95.10476%), a six-fold improvement in DR (61.79610%), a marginal drop in Precision (99.99473%) of 0.00517 compared with the deep learning model of [19], and a more than four-fold increase in F1 (0.76240). A similar trend is observed for Mcc, which increases slightly less than three-fold, whilst FPR remains very low. Notably, the proposed model using only View 2 performs worse than both the model using only View 1 and the model using the concatenated views. The low performance of View 2 indicates that the context in which a message appears is not, on its own, sufficient to distinguish between normal and intrusive messages. Exploring the content of each message through View 1 already gives excellent results on its own; however, adding the context in which the message occurs, through View 2, significantly enhances the overall performance on every measure.
To visualise the distinctive features that View 1, View 2, and their combination contribute to separating intrusive messages from regular traffic, we employed a dimensionality-reduction and visualisation technique, t-SNE [37], reducing the features to three dimensions. The results are depicted in Fig. 7, where blue points represent regular messages and red points intrusive messages. As Fig. 7(a) shows, the features provided by View 1 already separate the two classes, although the separation could be improved further. In Fig. 7(b), the points of both classes are intermingled in a single cloud, indicating that View 2 alone cannot yield a well-performing classifier. However, when View 2 is concatenated with View 1, as in Fig. 7(c), the separability of the two classes increases and the points are less scattered than in Fig. 7(a). These are strong visual indicators of the better performance evidenced by the results in Table 2.
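A minimal sketch of this kind of visual check, assuming scikit-learn's `TSNE` with `n_components=3`; the perplexity, the PCA initialisation and the two synthetic feature matrices are illustrative choices, not the paper's settings or data. Since a scatter plot can only be judged by eye, the sketch adds a silhouette score (our addition, not used in the paper) as a rough numerical proxy for how separable the two classes look in the embedding; plotting is omitted.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score


def embed_3d(features: np.ndarray, seed: int = 0) -> np.ndarray:
    """Project per-message feature vectors to 3-D with t-SNE."""
    tsne = TSNE(n_components=3, perplexity=30.0, init="pca", random_state=seed)
    return tsne.fit_transform(features)


def separability(embedding: np.ndarray, labels: np.ndarray) -> float:
    """Silhouette score in [-1, 1]; higher means tighter, more distinct classes."""
    return float(silhouette_score(embedding, labels))


rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)  # 0 = regular, 1 = intrusive (synthetic)
# Hypothetical informative view: the class shifts every feature mean.
informative = rng.normal(size=(200, 20)) + 6.0 * labels[:, None]
# Hypothetical uninformative view: both classes share one distribution.
uninformative = rng.normal(size=(200, 20))

emb_inf = embed_3d(informative)
emb_uninf = embed_3d(uninformative)
print(separability(emb_inf, labels), separability(emb_uninf, labels))
```

The uninformative view mimics the intermingled cloud of Fig. 7(b) and should score near zero, whereas the informative view forms two distinct clusters.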

V. CONCLUSION
This paper proposed an ML-based approach for intrusion detection in intra-vehicle networks. The proposed approach generates two representations (views) of the CAN data, which are then leveraged by machine learning techniques. The views provide high-level features that capture the temporal and intra-message dependencies of CAN messages as well as their context. These views are concatenated and used to predict the class label of each message. The performance of the proposed approach was evaluated and compared with that of state-of-the-art detection techniques. The results demonstrated that combining both views leads to better performance than using a single view. They also demonstrated that the proposed approach outperforms the other state-of-the-art methods in detecting novel intrusions, achieving the highest accuracy (95.10476%), detection rate (61.79610%), F1 score (0.76240), and Matthews correlation coefficient (0.76427), with a low false positive rate (4.82023 × 10⁻⁶).
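The views are built with Recurrence Plots (RPs). The sketch below shows a generic RP, R[i, j] = Θ(ε − ‖s_i − s_j‖), where the s_i are time-delay-embedded states of a sequence. It is a sketch under stated assumptions: the input here is taken to be the data bytes of a single CAN frame, and the function name, embedding dimension `m`, delay `tau` and threshold `eps` are hypothetical illustrative parameters. The exact input, distance and thresholding used by the proposed IDS are not given in this section.

```python
import numpy as np


def recurrence_plot(series, eps=None, m=1, tau=1):
    """Recurrence plot of a 1-D sequence.

    States are time-delay embeddings s_i = (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}).
    With eps=None the unthresholded distance matrix ||s_i - s_j|| is returned;
    otherwise R[i, j] = 1 if ||s_i - s_j|| <= eps, else 0.
    """
    x = np.asarray(series, dtype=float)
    n = len(x) - (m - 1) * tau  # number of embedded states
    if n <= 0:
        raise ValueError("sequence too short for the chosen m and tau")
    states = np.stack([x[i * tau: i * tau + n] for i in range(m)], axis=1)
    dist = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    if eps is None:
        return dist
    return (dist <= eps).astype(np.uint8)


# Hypothetical 8-byte CAN payload; eps=0 marks exactly repeated byte values.
payload = [0x12, 0x34, 0x12, 0x00, 0x00, 0xFF, 0x34, 0x12]
print(recurrence_plot(payload, eps=0))
```

The resulting matrix is symmetric with a unit main diagonal, and each off-diagonal 1 marks a recurrence. An image of this kind encodes dependencies between positions in the sequence and can be fed to a convolutional network.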
Although the proposed approach outperformed the other techniques and achieved promising results, it did not detect all novel cyber-attacks, and ways of improving its detection capability could be investigated further. Despite relying on the structure of CAN messages, our method could, we believe, be easily extended to other protocols with message-based addressing, such as CAN FD and, potentially, FlexRay. It would also be worthwhile to investigate how the current work can be extended to inter-vehicle networks.

CARSTEN MAPLE (Member, IEEE) is currently the Director of the NCSC-EPSRC Academic Centre of Excellence in Cyber Security Research and Professor of cyber systems engineering at the University of Warwick, Coventry, U.K. He is also a Co-Investigator of the PETRAS National Centre of Excellence for IoT Systems Cybersecurity, where he leads on Transport and Mobility, and is a Fellow of the Alan Turing Institute, London, U.K. He has an international research reputation, has authored or coauthored more than 350 peer-reviewed papers, and is a coauthor of the U.K. Security Breach Investigations Report 2010, supported by the Serious Organised Crime Agency and the Police Central e-crime Unit. His research has attracted millions of pounds in funding and has been widely reported through the media.