Malware Detection Based on Graph Attention Networks for Intelligent Transportation Systems

Intelligent Transportation Systems (ITS) aim to make transportation smarter, safer, reliable, and environmentally friendly without detrimentally affecting the service quality. ITS can face security issues due to their complex, dynamic, and non-linear properties. One of the most critical security problems is attacks that damage the infrastructure of the entire ITS. Attackers can inject malware code that triggers dangerous actions such as information theft and unwanted system moves. The main objective of this study is to improve the performance of malware detection models using Graph Attention Networks. To detect malware attacks addressing ITS, a Graph Attention Network (GAN)-based framework is proposed in this study. The inputs to this framework are the Application Programming Interface (API)-call graphs obtained from malware and benign Android apk files. During the graph creation, network metrics and the Node2Vec model are utilized to generate the node features. A GAN-based model is combined with different types of node features during the experiments and the performance is compared against Graph Convolutional Network (GCN). Experimental results demonstrated that the integration of the GAN and Node2Vec models provides the best performance in terms of F-measure and accuracy parameters and, also, the use of an attention mechanism in GAN improves the performance. Furthermore, node features generated with Node2Vec resulted in a 3% increase in classification accuracy compared to the features generated with network metrics.


Introduction
Intelligent Transport Systems (ITS) apply several cutting-edge Information and Communication Technologies (ICT) to transportation and traffic management, and they are among the main emerging technologies being discussed and implemented by governments and the private sector. The main aim of these sophisticated systems is to make transportation smart without affecting or disturbing the current infrastructure [1], while providing safer, more reliable, and environmentally friendly mechanisms [2]. To build such a reliable and smart system, multiple advanced technologies from different application domains such as communication, transportation, engineering, finance, and computer science need to be integrated seamlessly to achieve the maximum benefit [3]. Some of the well-known applications of ITS are automatic number-plate recognition, car navigation, smart traffic signal management, and automatic parking.
Security is one of the main concerns in ITS as it manages various integrated devices and sensors from multiple application domains. There are many opportunities for attackers to exploit as in the case of IoT-based systems. Attackers can damage the complete infrastructure as security threats can misuse and manipulate different services. As such, security and privacy are the main concerns for ITS [4]. For example, attackers can inject malicious software (a.k.a., malware) code triggering different actions, such as confidential data retrieval and remote control of the ITS-based system, which in turn leads to catastrophic events in the worst-case scenario.
Malware detection is one of the major challenges in ITS because many different applications and IoT devices are used. For instance, self-driving vehicles are more vulnerable to hacks because they are connected to the Internet and can receive commands from mobile applications, whereas older cars do not have these advanced features. Such hacks are very dangerous for passengers in the vehicle, people in other vehicles, and pedestrians. It is very difficult to detect this kind of illegal activity in real time. Nevertheless, many machine learning and deep learning techniques have been used to detect these behaviors. Machine learning methods that are generally used in this area are K-Nearest Neighbors, Support Vector Machines, Naive Bayes, Random Forest, and Decision Trees [4][5][6][7][8]. These methods are mostly used for the classification of malware. Recently, deep learning has shown promising results in several application domains. Its robustness and its power to solve complex problems have attracted many researchers. Several deep learning models such as Convolutional Neural Networks (CNN), Artificial Neural Networks (ANN), Boltzmann Machines, and Recurrent Neural Networks (RNN) have been used to detect malware [9][10][11][12].
Malware is growing exponentially and researchers face several challenges in countering it. One challenge is the lack of high-quality, industry-scale public datasets because of potential security concerns. Another is the continuous emergence of new malware types, for which there are no specific rules and regulations that would make the problem easy to solve. A further challenge is that the evaluation of malware in the literature is limited to a specific number of malware types. Other challenges are scalability to large datasets and the computational power required by deep learning-based models. Some of the challenges in ITS include information theft, hacking activities, cyber terrorism, and intelligence gathering [13]. There are two major motives for injecting malware into ITS. The first one is financial: the attacker aims to gain economic profit by damaging the infrastructure and requesting a ransom fee (i.e., ransomware). The other motivation is information gathering, which can be used for different purposes. In this modern era, many different datasets are shared publicly and hackers can access some private information using these public data. Relevant authorities must establish proper legislation and standard procedures. The other major issue is that users lack security awareness, and this leads to different attacks in ITS [3].
Recently, graph-based techniques have been adopted in different application domains because they capture more information about the relationships between nodes and edges. For instance, graph techniques such as Graph Convolutional Neural Networks (GCNs), Graph Neural Networks (GNNs), and Graph Attention Networks (GANs) exploit a rich source of information that can provide better performance compared to the traditional machine learning and deep learning techniques [14,15]. Deep Graph Convolutional Neural Networks (DGCNNs) that learn from the API call sequences are also applied [16]. The advantage of graph-based methods is that they can capture behavioral features and information accurately, which other methods fail to do.
The main objective of this study is to improve the performance of malware detection models using Graph Attention Networks because the performance of current models for ITS is not at an acceptable level yet. Any false positive or false negative can cause serious problems in ITS. Therefore, this paper presents two GNN models for detecting malware. In particular, a novel GNN architecture that combines the strengths of GAN and a node feature generator, Node2Vec, is proposed and its performance is evaluated on two public datasets. The first dataset is ISCX-AndroidBot-2015, which comprises 14 botnet families (https://www.unb.ca/cic/datasets/android-botnet.html, accessed on 10 August 2021).
The contributions of this study are as follows:
• This study proposes a novel framework that applies the GAN model to API call graph data. This model differs from the studies in the literature [17,18];
• This study integrates Node2Vec with the GAN model, which yields richer and adaptive node feature representations;
• The proposed model can be applied to detect malware in ITS that have an integrated mobile application interface.
The paper is organized as follows: Section 2 presents the related studies. Section 3 explains the methods. Section 4 discusses the proposed approach. Results are shown in Section 5. Section 6 presents the discussion and Section 7 concludes the paper.

Related Work
In ITS, infrastructure is connected to the external or public networks. For instance, self-driving vehicles communicate using public wireless communication channels and they operate using built-in equipment such as modems. Moreover, the interface needed to operate these services is provided by a mobile application or a built-in application by the manufacturer, which is mostly based on the Android operating system. Malware detection approaches are generally divided into three main categories, namely static analysis, dynamic analysis, and hybrid analysis. In this section, we discuss Machine Learning, Deep Learning, and Graph-based techniques that can detect malware.

Machine Learning Techniques
Rieck et al. [19] applied the SVM on a dataset including 10,072 data points and classified them into the 14 categories. This study managed to achieve up to 88% accuracy. Firdausi et al. [20] analyzed malware by using dynamic analysis on the malware and benign files. This study collected 220 malware and 250 benign files for classification. Several classifiers such as k-Nearest Neighbour, Support Vector Machines, Decision Trees, Naive Bayes, and Multi-Layer Perceptron were trained on the dataset. The best accuracy was achieved using Decision Trees (i.e., 96.8%). Sahs and Khan [21] used Support Vector Machines to detect malware. The dataset comprises 2081 benign files and 91 malicious Android applications. Rana et al. [22] applied machine learning algorithms on the Android application dataset that deals with permission access. The best accuracy is achieved by using k-Nearest Neighbours (i.e., 96%) and SVM obtained an accuracy of 93%.

Deep Learning-Based Techniques
Recently, Deep Neural Networks have shown promising results in many different application domains. Deep learning-based models such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Auto-Encoders (AE) have achieved better performance in detecting malware. Static approaches use features such as API calls, commands, and permissions [23,24]. On the other hand, dynamic approaches run the Android applications in a controlled environment [25,26].
Sewak et al. [10] used different combinations of deep learning architectures including auto-encoders. The previously reported best accuracy was 98% and the false-positive rate was 1.07%. In this study, features are extracted automatically and the model achieved an accuracy of 99.21% and a false-positive rate of 0.19% [10]. Another study proposed a lightweight PC malware detection system to overcome the time complexity of deep learning models. This system is based on the Convolutional Neural Network (CNN) algorithm, which learns features automatically from the given input, a sequence of group instructions. An accuracy of 95% was achieved on a dataset including 70,000 data points [23]. Alzaylaee et al. [24] proposed a deep learning-based malware detection model called DL-Droid. It detects malicious Android applications by using input generation through dynamic analysis. The dataset comprises 30,000 malware and benign applications. Moreover, experiments are performed by using both dynamic and hybrid features (dynamic + static). With dynamic features, the model achieved an accuracy of 97.8% and, with hybrid features, an accuracy of 99.6%.

Graph-Based Techniques
Recently, Graph Neural Networks (GNN) have received the attention of researchers in the field of cybersecurity. In GNN, each node is associated with a label and the goal is to predict the label of unknown nodes by using the neighborhood information. The edge between two specific nodes contains specific features about its neighbors and this process is known as a neighborhood problem. Generally, embeddings are used to represent the features and neighboring nodes. Xu et al. [27] presented a GNN-based malware detection system whose categorization technique is based on the function call graph. In this study, the Android application graph structure is transformed into vectors and the model classifies the malware families. An accuracy of 99.6% is achieved for malware detection and an accuracy of 98.7% is obtained for classification. Graph Convolutional Network (GCN) is a semi-supervised approach that deals with graph data. It is a variant of the traditional CNN that operates on graph data and relies on a localized approximation of spectral graph convolutions [14]. Gao et al. [17] proposed a GCN-based model named GDroid for malware classification. The idea of this study was to map the Android applications and APIs to a heterogeneous graph and build edge-based relationships. The accuracy obtained is 98.99% and the false-positive ratio is less than 1%. In the case of classification, this study achieves an accuracy of 97%. Other studies [16,18] also utilized the GCN for malware detection and classification.
Graph Attention Network is a neural network architecture that also operates on graph data. Veličković et al. [15] proposed a model that uses an attention mechanism to overcome the shortcomings of previous models. In this study, attention layers are stacked over one another to interact with the neighbors. The main advantage of this method is that it does not depend on the structure of the graph. This study not only achieved better results than the previous ones but also resolved transductive and inductive problems that were discussed in the literature. Kipf et al. presented a Variational Graph Autoencoder (VGAE) for unsupervised learning that applies the VAE to graph data [28]. The basic idea of this framework is to generate new graphs. As the input data is a graph, the general VAE is not applicable because the graph structure is irregular. A feature matrix is generated that represents the feature embeddings of each node. Further, the encoder of the VGAE consists of GCNs; it takes the adjacency matrix and the feature matrix as input and generates latent variables as output. The decoder is the inner product of latent vectors. This model was used for link prediction tasks on Cora, Citeseer, and PubMed and achieved higher accuracy.

Methods
This section explains graph-based classification models, types of attributes, and model evaluation metrics.

Graph-Based Classification Models
Nowadays, a lot of information is represented with graphs such as Google's Knowledge Graph, which helps with Search Engine Optimization (SEO), chemical molecular structures, document citation networks (e.g., document A cited document B), and social media networks (i.e., who linked whom). A graph consists of two main elements: nodes (vertices or points) and edges (connections or lines). For example, in the CORA dataset, which is a document citation network, nodes represent the documents in the network, and an edge connecting one node to another indicates that one document cites the other [29]. Because graphs have an arbitrary number of nodes and a complex topology, end-to-end deep models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), or Autoencoders, which assume that instances are independent, fail to model graph structures [30]. While these models are capable of capturing hidden patterns in regularly structured data (e.g., images, text, video), they fail to capture patterns in graph structures because the graph nodes are interconnected by various edges.
GCNs are a type of deep learning method designed to make inferences on data defined by graph structures. GCNs are neural networks that can be applied directly to graphs and provide an easy way to perform node-level, edge-level, and graph-level prediction tasks [30]. The concept of node embedding in GNNs was introduced to compensate for the failure of CNN in modeling graph networks. Node embedding allows nodes with similar properties in the graph to be projected to nearby points in a d-dimensional embedding space [31].
GCNs utilize adjacency and feature matrices for node embedding. The adjacency matrix represents the existence of edges connecting pairs of nodes. In addition to the adjacency matrix, which models the relationships between nodes, a graph has a feature matrix representing the properties or attributes of each node. If a graph has N nodes and each node has K attributes, the dimension of the feature matrix is N by K [14]. In the example of the CORA dataset, we need to have a corpus containing words from all documents. Each document is represented by a node, while the node features are a bag of words that indicates the presence of each word in the document. In this case, K represents the size of the vocabulary of the corpus (i.e., the total number of unique words), while N is the total number of documents available.
GCNs can perform network training using Spatial Graph Convolution Network and Spectral Graph Convolution Network methods. Spectral-based Graph Convolutional Networks are generally preferred because they are less costly in terms of computation [32]. In neural networks, the following equation is applied to propagate the feature representation to the next layer:
$$H^{[i+1]} = \sigma\left(H^{[i]} W^{[i]} + b^{[i]}\right) \quad (1)$$
Here $H^{[i]}$ is the feature representation at layer i, $W^{[i]}$ and $b^{[i]}$ are the weights and bias of that layer, and $\sigma$ is the activation function. This operation is basically the same as y = mx + b in linear regression, where m is the weights, x is the input features, and b is the bias. The rearrangement of Equation (1) for the first hidden layer (i = 0) is as follows:
$$H^{[1]} = \sigma\left(X W^{[0]} + b^{[0]}\right) \quad (2)$$
In Equation (2), the feature representations in layer 0 are basically the input features (X). This forward propagation process in Artificial Neural Networks differs in GCNs. The underlying idea of Spectral GCN is based on signal/wave propagation. Information propagation between nodes in a spectral GCN is characterized as signal propagation across nodes. Spectral GCNs make use of the eigen-decomposition of the graph Laplacian matrix to implement the information propagation method. Eigen-decomposition is an important tool for understanding graph structure and is similar to the Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) methods used for dimensionality reduction and clustering [32].
The Fast Approximate Spectral Graph Convolutional Networks method uses the adjacency matrix of the graph (A) and the node properties in the forward propagation process of the network. The matrix A represents the connections between the nodes in the forward propagation equation. The presence of A in the forward pass enables the network to learn feature representations based on node connections during learning. Thus, the resulting GCN is a type of message passing network, in which information is propagated across neighboring nodes [14]. With the addition of the adjacency matrix, the forward pass equation is as follows:
$$H^{[i+1]} = \sigma\left(A H^{[i]} W^{[i]} + b^{[i]}\right) \quad (3)$$
Adding A to the forward pass and taking the dot product of A and H simplifies the process of constructing the feature representation of the model. The feature representations generated by the dot product of the adjacency matrix and the node features are basically equal to the sum of the neighboring node features. However, while the AH operation uses the attributes of the neighboring nodes to create the feature representations, it does not use the attributes of the node itself. To solve this problem, self-loops are added to each node of the graph, that is, the diagonal elements of the adjacency matrix A are changed to 1. The feature matrix X is then multiplied by this modified matrix, denoted Â, so that each node's own features are used together with the neighboring node features when calculating the node representations [33].
Because the matrix elements have different numerical ranges in the AH dot product, network training can suffer from numerical instability and vanishing gradients, as in artificial neural networks. In order to prevent this situation, a data pre-processing step similar to the normalization process in neural networks should be performed. Normalization in GCNs is done using the degree matrix (D). The degree matrix expresses the number of edges to which each node in the graph is connected. In GCN, the normalization process is done by computing the inverse of the D matrix and taking its dot product with ÂH. Another graph neural network used in our study is the Graph Attention Network (GAN). Unlike GCNs, where each neighbor node contributes equally to generating the central node representation, GANs have an attention mechanism that assigns different importance to each neighbor node's contribution [15].
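The self-loop and degree-normalization steps described above can be illustrated with a minimal sketch. It uses plain Python lists instead of a tensor library, omits the bias term, and assumes ReLU as the activation; the toy graph, feature values, weights, and the function names gcn_aggregate and gcn_layer are illustrative assumptions and are not taken from the implementation used in this study.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_aggregate(adj, features):
    """Return D^-1 (A + I) H: the mean of each node's own and neighbor features."""
    n = len(adj)
    # Add self-loops by setting the diagonal to 1, so each node keeps its own features.
    a_hat = [[1.0 if i == j else float(adj[i][j]) for j in range(n)] for i in range(n)]
    # Degree of each node in the self-loop graph (row sums of A_hat).
    degrees = [sum(row) for row in a_hat]
    summed = matmul(a_hat, features)
    # Multiplying by D^-1 divides every row by the degree of that node.
    return [[value / degrees[i] for value in row] for i, row in enumerate(summed)]


def gcn_layer(adj, features, weights):
    """One layer: ReLU(D^-1 (A + I) H W), with the bias omitted for brevity."""
    projected = matmul(gcn_aggregate(adj, features), weights)
    return [[max(0.0, value) for value in row] for row in projected]


# Toy path graph 0 - 1 - 2 with two features per node (illustrative values).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
H = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
W = [[1.0, -1.0],
     [0.5, 0.5]]

aggregated = gcn_aggregate(A, H)
output = gcn_layer(A, H, W)
```

In this sketch every neighbor receives the same weight 1/degree; an attention-based layer would replace these fixed weights with learned, neighbor-specific coefficients.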

Node2Vec Embedding
Node2Vec is an embedding method that transforms nodes in a graph into dense and low-dimensional attribute representations. Node2Vec considers edges and edge weights between nodes during the vector creation process. Similar representations are created for nearby nodes in the network while the structure of the original network is preserved during the representation process. Node2Vec generates the feature representation of each node in the graph via a second-order random walk. The main difference between the second-order walk and the first-order walk is that the transition from one node to the next depends not only on the current state but also on the previous state [34].
In the second-order walk, a bias factor called alpha is used to calculate the transition probabilities between nodes. There are five parameters that need to be determined in the Node2Vec embedding process. These are the size of the feature embedding, the number of random walks to be executed for each node, the maximum number of nodes to be visited for each walk, and the p and q parameters for determining the alpha value [35].
In Node2Vec, each node in the graph is used as a starting point and a certain number of random walks are created from each of these points. The walks generated for each node form a corpus, which is given as an input to the Word2Vec model to generate node representations. The aim in the training of Word2Vec is to maximize the probability of predicting the correct context nodes given the central node. The Word2Vec model outputs an embedding vector of the predefined size for each node in the graph [36]. To get rich representations, Node2Vec takes advantage of flexible parameters in exploring neighborhoods in the graph, helping to balance the exploration and exploitation trade-off involved in graph-optimization problems [34].
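As an illustration of the walk generation step, the following minimal sketch produces second-order biased walks on an unweighted, undirected graph. The bias values (1/p for returning to the previous node, 1 for a neighbor shared with the previous node, 1/q otherwise) follow the commonly cited Node2Vec formulation; the toy graph, its method names, and the parameter values are hypothetical, and the Word2Vec training step is not shown.

```python
import random


def node2vec_walk(graph, start, walk_length, p, q, rng):
    """Generate one second-order biased random walk starting at `start`.

    `graph` maps each node to the set of its neighbors (unweighted, undirected).
    """
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is chosen uniformly.
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)   # return to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)       # stay one hop away from the previous node
            else:
                weights.append(1.0 / q)   # move outward, two hops from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def build_corpus(graph, num_walks, walk_length, p, q, seed=0):
    """Start `num_walks` walks from every node; the walks form the Word2Vec corpus."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(num_walks):
        for node in sorted(graph):
            corpus.append(node2vec_walk(graph, node, walk_length, p, q, rng))
    return corpus


# Hypothetical call graph with made-up method names.
toy_graph = {
    "main": {"init", "send"},
    "init": {"main", "send"},
    "send": {"main", "init", "encrypt"},
    "encrypt": {"send"},
}
corpus = build_corpus(toy_graph, num_walks=2, walk_length=5, p=1.0, q=0.5)
```

With q below 1 the walk is biased toward nodes farther from the previous node, while a small p makes it more likely to return to the previous node and stay local.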

Performance Evaluation Metrics
Although accuracy is the most used measure in performance measurement, it does not provide sufficient information to demonstrate the class discrimination ability of the model. Besides accuracy, the F-measure metric is also used to assess the performance of the model in distinguishing between different class instances. Accuracy and F-measure metrics are calculated based on the Confusion Matrix (CM). CM simply refers to the number of correctly and incorrectly classified samples per class in a binary classification task (Table 1). True positive (tp), false positive (fp), false negative (fn), and true negative (tn) are the matrix elements that are used to calculate the aforementioned metrics.

Table 1. Confusion matrix.

Actual / Predicted as    Positive    Negative
Positive                 tp          fn
Negative                 fp          tn

Accuracy indicates the ratio of the number of correctly predicted samples to the total number of samples. However, where the difference between the fp and fn values is too large, precision, recall, and F-measure metrics need to be considered. Precision is the ratio of the true positive samples to the positively predicted samples (Equation (5)). Recall represents the ratio of correctly classified positive samples (tp) to the total number of actual positive samples (Equation (6)).
$$\text{Precision} = \frac{tp}{tp + fp} \quad (5)$$
$$\text{Recall} = \frac{tp}{tp + fn} \quad (6)$$
A low precision means that the model produces a large number of false positive samples, while a low recall indicates that the model result contains a large number of false negatives [37]. F-measure is defined as the harmonic mean of precision and recall. F-measure considers both false positive and false negative samples in the evaluation and can directly measure the class discrimination of the models. In addition, F-measure can measure the performance of models trained on unbalanced datasets [38]. Based on the confusion matrix, F-measure is calculated as follows:
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
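The metrics defined in this section can be computed directly from the four confusion matrix entries. The following minimal sketch does so; the function name classification_metrics and the example counts are illustrative assumptions and do not correspond to results reported in this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F-measure from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Illustrative counts only.
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=90)
```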

Framework
The proposed malware detection framework can be considered as an end-to-end model that takes the Android apk files as inputs and decides whether these files are malware or not as output (Figure 1). This framework consists of four steps. In the first step, Android apk files used in model training were collected from two datasets. To detect malware applications on ITS devices, a new dataset was collected by compiling public datasets. While 1843 benign apk files were obtained from the CICMalDroid [39] dataset, apk files containing malware were collected from the ISCX-AndroidBot-2015 [40] dataset.
API-call graphs, which represent calling relationships between methods in a computer program, were created from apk files by the Androguard tool (https://androguard.readthedocs.io/en/latest/, accessed on 10 August 2021). After the call graph generation, the attributes of the nodes in the graphs were determined. At this step, two feature generation approaches were implemented.
In the first approach, four features were generated for each node using four different graph topology metrics. In-degree, out-degree, closeness, and Katz centrality were employed as network metrics. With the help of these metrics, information was obtained from the nearby local regions of each node. In the second approach, the Node2Vec model was used as a feature generator to expand the local regions of the nodes. With Node2Vec, 50-dimensional feature representations were generated for each node. The third step of the framework carries out the model training process. At this step, two popular graph neural network models, namely the GAN and GCN architectures, were employed for malware detection. A total of four model combinations were created for classification. In the last step of the framework, the predictive performance of the models was assessed. Accuracy and F-measure metrics were used during the evaluation process.
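To make the first feature generation approach concrete, the following minimal sketch computes the four topology features for each node of a small directed call graph. It is not the implementation used in this study: closeness is computed here as the number of reachable nodes divided by the sum of outgoing shortest-path distances, Katz centrality is left unnormalized, the alpha and beta values are assumptions, and the edge list uses made-up method names.

```python
from collections import deque


def bfs_distances(successors, source):
    """Shortest-path distances (in hops) from `source` along outgoing edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in successors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def katz_centrality(predecessors, alpha=0.1, beta=1.0, iterations=100):
    """Fixed-point iteration x_v = alpha * sum(x_u for callers u of v) + beta."""
    scores = {node: 0.0 for node in predecessors}
    for _ in range(iterations):
        scores = {
            node: alpha * sum(scores[u] for u in predecessors[node]) + beta
            for node in predecessors
        }
    return scores


def topology_features(edges):
    """Return {node: [in_degree, out_degree, closeness, katz]} for a directed graph."""
    nodes = sorted({n for edge in edges for n in edge})
    successors = {n: set() for n in nodes}
    predecessors = {n: set() for n in nodes}
    for caller, callee in edges:
        successors[caller].add(callee)
        predecessors[callee].add(caller)
    katz = katz_centrality(predecessors)
    features = {}
    for node in nodes:
        dist = bfs_distances(successors, node)
        reachable = len(dist) - 1
        # Simplified closeness: reachable nodes divided by the sum of their distances.
        closeness = reachable / sum(dist.values()) if reachable else 0.0
        features[node] = [
            len(predecessors[node]),
            len(successors[node]),
            closeness,
            katz[node],
        ]
    return features


# Hypothetical caller -> callee edges.
edges = [("main", "init"), ("main", "send"), ("init", "send"), ("send", "encrypt")]
features = topology_features(edges)
```

Stacking the per-node lists row by row gives an N by 4 feature matrix for each graph, in contrast to the N by 50 matrix produced by the Node2Vec approach.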

Framework Properties to Detect Malware for ITS
Security threats have increased due to the increasing connectivity of vehicles. Malicious software can flow into the internal network of the vehicle if an infected device is connected to the vehicle, which in turn can create a backdoor that allows attackers to elevate account privileges [41]. As such, we have to detect malware in self-driving vehicles. In their paper, Park and Choi (2020) also used the same dataset as we have used. The idea in this paper is to focus on malicious software in the Android OS because such malware can have a detrimental effect on many ITS.
Malware detection problems encountered in ITS can be approached from two perspectives. The first perspective is related to the malware issues that occur in Vehicle-to-Device Communications [42]. This type of communication, which is defined as vehicle-to-everything, mostly includes Android-based smartphones as the basic component. Service information about the vehicle such as fuel consumption, filter status, battery status, and vehicle anomalies such as insufficient tire pressure can be detected with the help of applications installed on smartphones. In the early years, communication between devices and smartphones was provided locally via serial communication or Bluetooth interfaces. With the emergence of the Internet of Things (IoT), vehicle manufacturers placed Telematic Control Units (TCUs) in vehicles, which provide access to vehicles over mobile networks [43]. As such, information about both the vehicle and the driver became easy to collect and manage. However, extracting information regarding the vehicles and driving patterns introduces different threats.
The most common threat is the transmission of the vehicle information to third parties by using malicious code injected into the software of the vehicles. Another threat is that some services of the vehicles are disabled by malware during the vehicle software update over the Internet. Therefore, preventive intervention is needed to protect both the software of the vehicle and the server traffic against malware. From this perspective, we can state that our proposed model presents a graph-based solution that is capable of catching malicious software code both in vehicles and devices during Vehicle-to-Device communication.
The second perspective is that malware can also trigger hacking attacks such as leaking private/confidential information or denial of service (DoS). In the case of Android OS in particular, malware usually enters the system from a web page or an email attachment without the user's intention or knowledge. This malware can collect user and device information and transmit it to a remote server. Malicious software can also initiate a backdoor service that allows the attacker to gain access to the device and control it. This is particularly dangerous when an Android-based device is connected to an autonomous vehicle [44]. When the device hijacked by the malware code is integrated with the autonomous vehicle, the hijacker can transmit malicious code to the vehicle's built-in software in order to make the autonomous functions malfunction. Our proposed model can be used to detect malware on Android-based devices with the help of Graph Attention Networks and, as such, prevent the infection of the functions of autonomous vehicles.
The performance of graph neural networks depends directly on the quality of the node features. In both GCN and GAN models, the representations produced for each node are taken into account when performing the classification process. In order to evaluate the performance of the node features, both the static network attributes and the features generated by traversing the graphs are used. It has been observed that the robust features produced by exploring graphs with the Node2Vec method lead to a performance increase in both GCN and GAN models. Hence, the proposed model is generic enough to be used in different malware detection scenarios including self-driving vehicles.

Experimental Results
Experiments were performed on a dataset created from the combination of two public datasets. Since graph data require high computational power, experiments were run on a computer with an Intel i7 7700 HQ processor with GTX 1070 Graphics Processing Unit (GPU) support. The PyTorch Geometric module [45] of the PyTorch framework was utilized to create graph neural networks. Compared to Keras and TensorFlow, PyTorch provides rich and diverse options in generating graph neural networks with the help of the PyTorch Geometric package. PyTorch Geometric integrates with several graph modules such as NetworkX for the easy processing of graph data. The first graph neural network was Gconv, a variant of the GCN model. The second network was the Graph Attention Network, the attention-boosted version of GCN. Both network architectures consist of five layers, excluding the input layer. The three layers after the input layer are consecutive convolution layers, in which abstract feature representations of the specified size are produced for each node. In order to use these outputs in graph classification, dimensionality reduction was performed with the global pooling layer. At the last stage, the outputs produced in the global pooling layer were given to the softmax layer, which decided whether the apk file was malware or not. The hyperparameters of the created architectures are shown in Table 2. Experiments with the two network architectures were performed using a 10-fold cross-validation approach. Although the most commonly used technique in performance evaluation is the hold-out method, which divides the dataset into a training set and a test set, this approach does not test on every instance in the dataset and can bias the performance evaluation. In order to handle this issue, 10-fold cross-validation was employed for the assessment of our model performance.
Cross-validation is easy to understand and is less prone to biased estimates of predictive performance. During training, every fold was trained for 100 epochs with a batch size of 64. Although there were many feasible optimizers, such as AdaBelief, Adagrad, and RMSprop, Adam was selected due to its fast convergence and high accuracy [46]. To prevent over-fitting during model training, both dropout and regularizer layers were added after the convolution and global pooling layers. In addition, early stopping was applied by checking the decrease in the training error every 15 iterations. The Node2Vec model was used to generate the node features. Unlike the DeepWalk and random walk models, which assign equal probabilities to each neighbor node when generating random paths, Node2Vec uses the parameters p and q to bias the walk: p controls how likely the walk is to return to the node it just left, and q controls how readily it moves outward to nodes farther from that node.
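The role of p and q can be made concrete with a small Python sketch of a single step of a Node2Vec-style biased walk. It follows the standard Node2Vec formulation rather than the exact implementation used in the experiments; the function name node2vec_step_probs, the toy graph, and the values p = 2.0 and q = 0.5 are assumptions chosen only for illustration.

```python
def node2vec_step_probs(graph, prev, curr, p=1.0, q=1.0):
    """Transition probabilities for the next step of a walk that moved prev -> curr."""
    weights = {}
    for nxt in graph[curr]:
        if nxt == prev:
            weights[nxt] = 1.0 / p   # step back to the node just left
        elif nxt in graph[prev]:
            weights[nxt] = 1.0       # stay close: nxt is also a neighbor of prev
        else:
            weights[nxt] = 1.0 / q   # move outward, away from prev
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


# Hypothetical undirected toy graph stored as adjacency sets.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

# The walk has just moved from "a" to "b"; with p > 1 and q < 1 it prefers
# to explore outward ("d") over staying local ("c") or returning ("a").
probs = node2vec_step_probs(toy_graph, prev="a", curr="b", p=2.0, q=0.5)
print(probs)
```

Setting p = q = 1 makes all three weights equal, which recovers the uniform neighbor selection of an unbiased random walk.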
The hyperparameters of the Node2Vec model are listed in Table 3. During model training, different experimental setups were evaluated by changing the number of hidden units in the convolutional layers of the GAN and GCN models. The numbers of hidden units selected for the proposed framework were 16, 32, 64, and 128. To test the statistical significance of the experimental results, the Wilcoxon Signed Rank Test was applied at a significance level of 0.05.
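For readers unfamiliar with the Wilcoxon Signed Rank Test, the following Python sketch computes the test statistic and an exact two-sided p-value for paired per-fold scores. It is a minimal illustration, not the procedure used to produce the reported results: the function names are assumptions, and the two score lists are made-up placeholder counts of correctly classified samples per fold, not values from the experiments.

```python
from itertools import product


def signed_ranks(x, y):
    """Signed ranks of the non-zero paired differences x - y (ties get average ranks)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1
    return [r if d > 0 else -r for r, d in zip(ranks, diffs)]


def wilcoxon_exact(x, y):
    """Return (statistic, exact two-sided p-value) by enumerating all sign patterns."""
    ranks = [abs(r) for r in signed_ranks(x, y)]
    total_rank = sum(ranks)
    w_plus = sum(abs(r) for r in signed_ranks(x, y) if r > 0)
    statistic = min(w_plus, total_rank - w_plus)
    # Under the null hypothesis every rank is positive or negative with equal chance.
    extreme = 0
    patterns = 0
    for signs in product((0, 1), repeat=len(ranks)):
        plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(plus, total_rank - plus) <= statistic:
            extreme += 1
        patterns += 1
    return statistic, extreme / patterns


# Placeholder per-fold counts for two hypothetical models (10 folds each).
model_a = [48, 47, 49, 46, 48, 47, 49, 48, 47, 46]
model_b = [45, 46, 44, 45, 43, 44, 45, 46, 41, 42]
statistic, p_value = wilcoxon_exact(model_a, model_b)
print(statistic, p_value, p_value < 0.05)
```

The exhaustive enumeration is practical only for small samples such as ten folds; for larger samples a library routine with a normal approximation would be the usual choice.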
The results of the 10-fold cross-validation are shown in Tables 4 and 5. The predictive performance of the GAN model was significantly higher than that of the GCN model for both node feature types. The highest accuracy was obtained by the GAN model with Node2Vec-generated features. This model provided an accuracy of 0.961 with an F-measure of 0.941 using 64 hidden units in its convolutional layers. The same model achieved an accuracy of 0.955 and an F-measure of 0.938 when using 128 hidden units. For the combination of Node2Vec and GCN, the highest classification accuracy was 0.933 (F-measure of 0.918), again using 64 hidden units. When the results of the GAN and GCN models were compared, the use of the attention mechanism in GAN resulted in a performance increase of about three percent. Node features generated with the Node2Vec model also improved performance: compared to the results obtained with the network metrics, there was an improvement of approximately two percent with Node2Vec in both the GAN and GCN models.
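The two reported measures can be computed from the counts of a binary confusion matrix, as in the short Python sketch below. It assumes that the F-measure is the usual F1 score with malware as the positive class, which is not stated explicitly here; the function name and the counts are hypothetical and are not taken from Tables 4 and 5.

```python
def accuracy_and_f_measure(tp, fp, tn, fn):
    """Accuracy and F1 score from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)   # share of predicted malware that is malware
    recall = tp / (tp + fn)      # share of actual malware that was detected
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, f_measure


# Hypothetical counts for one test fold of 100 apk files.
accuracy, f_measure = accuracy_and_f_measure(tp=40, fp=10, tn=45, fn=5)
print(round(accuracy, 3), round(f_measure, 3))
```

Because the F-measure ignores true negatives, it can be lower than the accuracy when benign samples are classified more reliably than malware samples.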

Discussion
Experimental studies have limitations and threats to validity. In this study, experiments were carried out on data drawn from two public datasets. The performance of the proposed model on other datasets might be slightly different; however, we do not expect a large change in performance. We focused on malware detection in ITS; however, there are also other threats that need to be considered. A complete security framework for an ITS must address these additional threats instead of focusing only on malicious software. Other researchers might develop new models using new deep learning algorithms and reach better performance than that reported in this study. We used widely adopted evaluation approaches in this study; however, the results might be slightly different if the evaluation strategy is changed.
The main difference between GCNs and GANs is that GANs use attention mechanisms that assign greater weights to more important nodes, walks, or patterns. To generate node representations, GCNs consider only neighboring node representations and weigh the neighbor representations equally. GANs, on the other hand, combine random walks or outputs from multiple candidate models, as well as representations of neighboring nodes, to produce node representations. These outputs are combined using attention weights that are learned adaptively during the training of the network.
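The contrast between equal and attention-based weighting of neighbors can be sketched in a few lines of Python. This is a deliberately simplified illustration: the equal-weight function uses a plain mean instead of the degree-normalized sum of a real GCN layer, the attention function follows the common graph attention formulation (a LeakyReLU score over concatenated features, followed by a softmax) without a learned weight matrix, and the attention vector is fixed by hand rather than learned. All names and numbers are assumptions for illustration.

```python
import math


def mean_aggregate(features, neighbors):
    """Equal weighting: every neighbor contributes the same share."""
    dim = len(features[neighbors[0]])
    return [sum(features[n][d] for n in neighbors) / len(neighbors) for d in range(dim)]


def attention_aggregate(features, node, neighbors, attn_vec, slope=0.2):
    """Attention weighting: neighbors are weighted by a softmax over scores."""
    def score(n):
        concat = features[node] + features[n]            # [h_node || h_neighbor]
        raw = sum(a * v for a, v in zip(attn_vec, concat))
        return raw if raw > 0 else slope * raw           # LeakyReLU
    scores = [score(n) for n in neighbors]
    top = max(scores)                                    # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    alphas = [e / sum(exps) for e in exps]
    dim = len(features[node])
    aggregated = [
        sum(alpha * features[n][d] for alpha, n in zip(alphas, neighbors))
        for d in range(dim)
    ]
    return aggregated, alphas


# Hypothetical 2-dimensional node features; node 0 has neighbors 1, 2, and 3.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [0.0, 0.0]}
neighbors = [1, 2, 3]
attn_vec = [0.5, 0.5, 1.0, 0.5]  # fixed here; learned during training in practice

equal = mean_aggregate(features, neighbors)
attended, alphas = attention_aggregate(features, 0, neighbors, attn_vec)
print(equal, attended, alphas)
```

With these toy values, node 2 receives the largest attention coefficient, so the attended representation is pulled toward its features, whereas the mean treats all three neighbors identically.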
Our proposed model has a general structure that can be extended to many areas in which the data can be represented as nodes and edges. Tasks in bioinformatics, social network analysis, and transportation management systems are some examples of areas where our model can be adopted.

Conclusions
A typical ITS consists of several complex and emerging technologies, including autonomous vehicles, payment applications, management applications, communication applications, and real-time traffic flow controls. Many parties, such as nations, cyber-criminals, and hacktivists, might have different motives to cause chaos in ITS. Roadside boards, surveillance cameras, and emergency sirens have already been hacked. Since Intelligent Transportation Systems include many different software components, detecting malicious software in ITS with high performance is crucial. This study aimed to improve the performance of malware detection models using Graph Attention Networks (GAN). The proposed model integrated Node2Vec and GAN. Experimental results showed that node features created with Node2Vec provide better accuracy than features generated with network metrics. The GAN-based detection model reached an accuracy of 0.961 and an F-measure of 0.941 and outperformed the GCN-based model. Future work will evaluate the performance of the model against adversarial machine learning attacks and will involve new case studies. We will also cover the use of deep learning models in intelligent transportation systems from the perspective of Explainable Artificial Intelligence (XAI).