Artificial intelligence-driven malware detection framework for internet of things environment

The Internet of Things (IoT) environment demands a malware detection (MD) framework to protect sensitive data from unauthorized access. This study develops an image-based MD framework. The authors apply image conversion and enhancement techniques to convert malware binaries into RGB images. You Only Look Once (Yolo V7) is employed to extract the key features from the malware images. Harris Hawks optimization is used to optimize the DenseNet161 model, which classifies images as malware or benign. The IoT malware and Virusshare datasets are used to evaluate the proposed framework's performance. The results show that the proposed framework outperforms current MD frameworks. It achieves an accuracy and F1-score of 98.65 and 98.5 on the IoT malware dataset and 97.3 and 96.63 on the Virusshare dataset. In addition, it achieves an area under the receiver operating characteristic curve and the precision-recall curve of 0.98 and 0.85 on the IoT malware dataset and 0.97 and 0.84 on the Virusshare dataset. These results suggest that the proposed framework can be deployed in the IoT environment to protect its resources.


INTRODUCTION
2016; Fathurrahman, Bejo & Ardiyanto, 2022). The existing shallow neural networks and classical ML models demand a longer training duration due to fewer hidden layers (Lirim Ashiku, 2021). They memorize the training data, which makes it difficult to generalize to a new environment. Deep learning (DL) methods have become increasingly applicable to identifying and analyzing threats with ever-growing malware datasets. Recent studies focus on employing convolutional neural networks (CNN) to classify malware. Deep CNNs facilitate the development of detection systems based on malware images and enable the MD framework to identify the crucial features of malware.
The features learned at lower layers are strengthened in higher layers. These characteristics support CNNs in producing an effective outcome (Kumar, Janet & Neelakantan, 2022). In addition, the computational cost is minimized by limiting the size of the dataset. Grayscale values range from 0 to 255 and shift gradually between the two extremes of black and white.
Furthermore, grayscale images can be created from a malware binary. Properties such as texture, intensity, and wavelet can be retrieved from the resulting images (Liu et al., 2020a). Recent studies also suggest that an RGB image can provide more information for classifying malware images. The primary difficulty of visualization methods is computing the texture similarity of a grayscale image. These methods effectively decrypt obfuscated code. However, they are computationally expensive due to the complexity of extracting texture features from malware images. Large datasets make the feature extraction methods less efficient. Malware is constantly evolving, updating, and producing new versions of itself.
Consequently, improving the performance of the MD framework with low hardware configuration and extracting relevant information from raw binary data are the primary motivational factors for this study. The study intends to develop an MD framework using the CNN model. In addition, it applies efficient image enhancement and object detection techniques to improve the proposed framework's performance.
IoT devices require intrusion detection systems that are vastly improved and highly secure. Traditional machine learning algorithms cannot identify sophisticated cyber breaches because of their static design. DL allows for a more in-depth analysis of network data and for spotting anomalies. Recent studies reveal the crucial role of DL in processing complex images (Falana et al., 2022). Malware programmers typically modify a small section of the malware code to develop a new mutant. Thus, visualizing malware as a coloured image offers the benefit of differentiating the various components of the malware binary (Jian et al., 2021; Falana et al., 2022). An image-based DL-driven detection method is highly scalable, flexible, and cost-effective (Jian et al., 2021). It can evaluate vast amounts of data and automatically alter security systems to identify malware or security breaches with minimum processing resources.
The contributions of the study are: i) An effective technique to generate images from malware binaries. It overcomes the challenges of the existing RGB image generation technique. In addition, it reduces the possibility of data loss during the image generation process.
ii) A feature extraction technique for extracting the key features from the malware images. It provides the critical features for the CNN models. By presenting a set of crucial features, the performance of the MD model is improved.
iii) A hybrid CNN model for detecting malware in the IoT environment. It addresses the limitations of the existing MD techniques by employing an image-based detection technique. In addition, it demands a minimum hardware and software configuration compared to the recent CNN models.
iv) The proposed model achieved a significant outcome in detecting malware in the IoT environment compared to the current models in terms of accuracy, precision, recall, and F1-measure.
The remaining part of this study is organized as follows: "Literature Review" outlines the recent MD using images and binary files. The study's methodology is discussed in "Materials and Methods". "Results" and "Discussion" highlight the performance analysis of the proposed framework and compare it with the recent MD frameworks. Finally, "Conclusion" concludes this study.

LITERATURE REVIEW
The field of image processing has extensively employed CNNs to generate practical outcomes (Smmarwar, Gupta & Kumar, 2022). Weight sharing and convolution kernels are used in CNNs to overcome the limitations of earlier neural network techniques (Asam et al., 2021). Recently, researchers have focused on improving the performance of malware visualization techniques and reducing their computation cost. This section covers visualization-related studies, including malware identification using statistical similarity measures, machine learning, and deep learning. Traditional MD techniques primarily analyse the properties of harmful code (Conti, Khandhar & Vinod, 2022). Advanced machine learning-based MD techniques also use these properties to identify new forms of destructive code. However, these technologies failed to detect new malware variations.
Several malware analysis visualization methods have been suggested recently. Makandar & Patrot (2017) developed a novel approach for detecting malware using image features. They generated two-dimensional grayscale graphics from the structure of the compressed binary executable. Based on the findings, binary texture analysis proved more precise and efficient. Venkatraman, Alazab & Vinayakumar (2019) proposed an image-based model for detecting malware. Vasan et al. (2020) proposed an approach for converting raw binaries into colour images and detecting malware families. They employed data augmentation for processing the imbalanced dataset. Malimg malware and IoT-android mobile datasets were used for performance evaluation. The outcome shows that the model can identify hidden code and malware families with limited resources. Liu et al. (2020b) introduced a reinforcement method that relies on ML to identify various forms of malware and its variations. Naeem et al. (2020) developed an MD method for the Industrial Internet of Things (IIoT). To track and record information about incoming and outgoing traffic, the authors developed a sniffer gateway. Awan et al. (2021) introduced an image-based malware classification. They employed the VGG-19 network to classify 25 well-known malware images. Jian et al. (2021) suggested a unique deep neural network-based visual MD methodology. They established that three-channel RGB images are superior to grayscale images for malware identification.
Similarly, Sharma, Sharma & Kalia (2022) proposed an Xception CNN-based MD framework for classifying malware images. The authors stated that the model achieves a better outcome than the current frameworks. Yadav et al. (2022) developed an MD framework using Android malware images. Obaidat et al. (2022) proposed a CNN model. Falana et al. (2022) developed a technique to convert malware binaries into an image to support the process of malware classification. A slight variation in an image assists CNN models in identifying critical malware. They employed three benchmark datasets: MaleVis, Malimg, and Virusshare. The findings suggested that the model achieves an average accuracy of 96.77%. The recent techniques focused on pattern-based MD. However, this approach has many drawbacks, including a high false positive rate that causes many valid activities to be incorrectly labelled as intrusive. It also demands more critical training data. In addition, the existing methods require high-end computation resources to generate an effective outcome. Table 1 outlines the features of the existing MD frameworks.

MATERIALS AND METHODS
Based on the study's objective, the authors propose a DL-based framework to classify malware and benign images. Figure 1 highlights the three phases of the proposed framework. In phase 1, the authors convert the binaries into images. The images are pre-processed and resized to 600 × 600 pixels. The authors employ You Only Look Once (Yolo) V7 to identify critical features from the images. Phase 2 involves Harris Hawks optimization (HHO) to fine-tune the DenseNet161 parameters to identify malware from the datasets. Finally, phase 3 evaluates the performance of the proposed framework. Two malware datasets, IoT_malware and Virusshare, are used in this study; they are available from the Elmasry dataset (Malware, 2021) and the Virusshare dataset (Virusshare, 2021), respectively. The IoT_malware dataset is a recently developed malware image dataset. It includes IoT malware images in two categories, benign and malware. The unpacked Executable and Linkable Format (ELF) binaries for malware and benign applications are represented in image format. In addition, the Virusshare dataset contains instances of multiple malware families. The description of the datasets is provided in Table 2. Based on the software-defined networking (SDN) framework, the researchers framed the network model shown in Fig. 2 for implementing the proposed model. In the control layer, the binaries are converted into images and transformed into RGB images. Yolo V7 extracts the crucial objects. Finally, the fine-tuned CNN model classifies the malware and benign images.
In phase 1, the authors follow the approach of Falana et al. (2022) to convert binaries into an image. Let $B_1, B_2, \dots, B_n$ and $M_1, M_2, \dots, M_n$ be the sets of benign and malware binaries, respectively. Let $D$ be a space that holds the benign and malware binaries. Therefore, $D_i$ represents a binary, which may be benign or malicious. Figure 3 shows the conversion of binaries and grayscale images into RGB images.
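The byte-to-pixel idea behind this conversion can be illustrated with a minimal sketch. It is not the conversion algorithm of Falana et al. (2022) or of this study: the function names, the fixed row width of 256, the zero padding, and the channel replication used to obtain three channels are all assumptions made for illustration.

```python
import math

import numpy as np


def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Map each byte (0-255) of a binary to one pixel, row by row.

    The row width and the zero padding of the last row are assumptions.
    """
    if not data:
        raise ValueError("empty binary")
    buf = np.frombuffer(data, dtype=np.uint8)
    height = math.ceil(buf.size / width)
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: buf.size] = buf
    return padded.reshape(height, width)


def grayscale_to_rgb(gray: np.ndarray) -> np.ndarray:
    """Placeholder three-channel image: copy the intensity into R, G and B.

    The study uses its own RGB conversion; replication is only a stand-in.
    """
    return np.stack([gray, gray, gray], axis=-1)


if __name__ == "__main__":
    image = bytes_to_grayscale(bytes(range(10)), width=4)
    print(image.shape, grayscale_to_rgb(image).shape)
```

In practice the binary would be read from disk with `open(path, "rb").read()` and the resulting image resized to the input resolution expected by the downstream model.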
The following algorithm transforms the binaries ($D$) into a grayscale image. During the image pre-processing phase, the grayscale images ($G$) are converted into RGB images ($RGB$). Initially, the luminosity method converts a grayscale image into an RGB image. Equation (1) represents the conversion process of $G$ into $RGB$.
where $a_k$ and $b_k$ are linear coefficients. Equations (3)-(5) outline the process of brightness equalization using adaptive gamma correction.
where $b$ is the image intensity, and $c$ and $d$ are the image's height and width. In this phase, the RGB images are enhanced in order to assist the DenseNet model. High standard deviations in the images are adjusted to reduce the variation in pixel values. An image fusion filter is applied to remove noise by blurring the images. It adjusts uneven pixel values and removes chromatic aberration. In the subsequent step, Canny edge detection is employed to identify the edges. Non-maximum suppression is used to thin out the edges. The intensity of the edges is classified using the double threshold method. Finally, contrast limited adaptive histogram equalization is employed to improve the image quality.
Furthermore, the authors employ Yolo V7 (Wang, Bochkovskiy & Liao, 2022) to extract meaningful features from the images. Yolo V7 achieves a superior outcome with fewer computational resources. It generates an output faster without any pre-trained weights. It uses a CNN for extracting features and predicting class probabilities. Yolo V7 overcomes the challenges of its previous versions. It contains an extended efficient layer aggregation network and a compound model scaling technique that support the proposed model in detecting malware images. The Yolo V7 architecture includes residual blocks, bounding boxes, and intersection over union (IoU). It divides the images into multiple grids (residual blocks) with equal dimensions. Each grid is a region used to highlight the object. Each bounding box consists of a width ($w$), a height ($h$), a class ($c$), and a center ($x$, $y$).
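The brightness equalization step can be illustrated with a plain gamma correction. Because Eqs. (3)-(5) are not reproduced in this text, the sketch below does not implement the authors' adaptive gamma correction: the rule that picks gamma so the mean intensity maps to mid-gray is a hypothetical heuristic, and both function names are assumptions.

```python
import numpy as np


def gamma_correct(image: np.ndarray, gamma: float) -> np.ndarray:
    """Apply out = 255 * (in / 255) ** gamma to an 8-bit image."""
    normalized = image.astype(np.float64) / 255.0
    corrected = np.rint(255.0 * normalized ** gamma)
    return np.clip(corrected, 0, 255).astype(np.uint8)


def mean_based_gamma(image: np.ndarray, eps: float = 1e-6) -> float:
    """Hypothetical heuristic: choose gamma so the mean intensity maps to 0.5.

    Dark images (mean < 0.5) get gamma < 1 and are brightened;
    bright images get gamma > 1 and are darkened.
    """
    mean = float(np.clip(image.mean() / 255.0, eps, 1.0 - eps))
    return float(np.log(0.5) / np.log(mean))


if __name__ == "__main__":
    dark = np.full((4, 4), 64, dtype=np.uint8)
    gamma = mean_based_gamma(dark)
    print(gamma, gamma_correct(dark, gamma)[0, 0])
```

The remaining steps named in the text (blurring, Canny edge detection, and contrast limited adaptive histogram equalization) are typically taken from an image processing library rather than written by hand.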
Equation (9) represents the bounding box that highlights a region in Fig. 4.
where $P_c$ is the probability of an object in the bounding box ($bb$). IoU is a metric for evaluating the performance of Yolo V7. It measures Yolo V7's ability to detect the features of the malware dataset. Equation (10) shows the expression of IoU.

$$\mathrm{IoU} = \frac{\text{Area of overlap of actual and predicted malware feature}}{\text{Area of union of actual and predicted malware feature}} \quad (10)$$
Yolo V7 computes the IoU score for each object detection process. An IoU score greater than 0.5 indicates better performance of the object detection model.
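The ratio in Eq. (10) can be computed directly for two axis-aligned boxes. The sketch below is a generic illustration rather than Yolo V7's internal code; the corner-based box layout and the helper that converts from the center, width, and height form are assumptions.

```python
from typing import Tuple

# Box given by its corners: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def center_to_corners(x: float, y: float, w: float, h: float) -> Box:
    """Convert a (center x, center y, width, height) box to corner form."""
    return (x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0)


def iou(actual: Box, predicted: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(actual[2], predicted[2]) - max(actual[0], predicted[0]))
    inter_h = max(0.0, min(actual[3], predicted[3]) - max(actual[1], predicted[1]))
    intersection = inter_w * inter_h
    area_actual = (actual[2] - actual[0]) * (actual[3] - actual[1])
    area_predicted = (predicted[2] - predicted[0]) * (predicted[3] - predicted[1])
    union = area_actual + area_predicted - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2 x 2 boxes that share a 1 x 1 corner: IoU = 1 / 7.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

With the 0.5 threshold mentioned above, this example pair of boxes would not count as a good match.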
In phase 2, the authors optimize the DenseNet161 model using the HHO algorithm, a recent swarm-based optimization technique for improving the performance of complex models. It tunes the CNN model's parameters to improve the classification accuracy. It minimizes the error rate and searches for the optimal learning rate for identifying the malware and benign images. The architecture of DenseNet161 comprises convolutional layers, activation functions, pooling layers, and a dropout layer. Each layer acquires information from the previous layers and guides the subsequent layers. DenseNet161 simplifies the connecting pattern among the layers. It reuses malware and benign image features and enhances the network's performance. In addition, it requires a limited number of parameters compared to its counterparts. The growth rate controls the number of feature maps that each layer adds. Each dense block includes two convolutions, and each dense layer contains two operations that extract malware and benign features and reduce the feature depth. In the context of MD, HHO identifies the effective parameters (number of pooling layers, dropout layers, and convolutional layers) for generating the outcome.
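The feature reuse described above means that the channel count inside a dense block grows linearly with the number of layers. The sketch below works through that arithmetic. The DenseNet-161 settings it uses (96 initial feature maps, a per-layer rate of 48, blocks of 6, 12, 36, and 24 layers, and transitions that halve the channels) are the commonly published configuration and are assumptions here; this text does not state them, and the function names are illustrative.

```python
from typing import Sequence


def dense_block_channels(in_channels: int, num_layers: int, growth_rate: int) -> int:
    """Each dense layer concatenates growth_rate new feature maps to its input."""
    return in_channels + num_layers * growth_rate


def densenet_feature_channels(
    init_features: int,
    block_config: Sequence[int],
    growth_rate: int,
    compression: float = 0.5,
) -> int:
    """Channel count after the last dense block of a DenseNet-style backbone."""
    channels = init_features
    last = len(block_config) - 1
    for index, num_layers in enumerate(block_config):
        channels = dense_block_channels(channels, num_layers, growth_rate)
        if index != last:
            # Transition layers between blocks compress the channel count.
            channels = int(channels * compression)
    return channels


if __name__ == "__main__":
    # Assumed DenseNet-161 configuration (not taken from this study).
    print(densenet_feature_channels(96, (6, 12, 36, 24), 48))
```

The final channel count is the width of the feature vector that a classification head, such as the fully connected and dropout layers discussed in the results, would receive after global pooling.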
The malware and benign images are treated as the rabbit (prey) in the HHO search environment. The HHO searching strategies support the proposed framework in classifying the images. The exploration and exploitation phases assist the DenseNet161 model in locating the malware and benign images. Let $q$ be the equal chance between the DenseNet161 parameters. The HHO exploration phase for the proposed framework is modelled in Eq. (11).
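Eq. (11) itself is not reproduced in this text. Assuming it follows the standard HHO position update, a plausible form written with the symbols defined in the next paragraph is:

```latex
M(t+1) =
\begin{cases}
M_{rand}(t) - Y_1 \,\bigl| M_{rand}(t) - 2\,Y_2\, M(t) \bigr|, & q \ge 0.5 \\[4pt]
\bigl( M_{malben}(t) - M_m(t) \bigr) - Y_3 \,\bigl( LB + Y_4\,(UB - LB) \bigr), & q < 0.5
\end{cases}
```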
where $M(t+1)$ is the location of the DenseNet161 parameters in the subsequent iteration, $M_{malben}(t)$ is the location of the malware and benign image, $M(t)$ is the present position of the hawks, and $Y_1$, $Y_2$, $Y_3$, $Y_4$, and $q$ are arbitrary numbers between 0 and 1 that are updated at each iteration. $LB$ and $UB$ are the lower and upper bounds of each variable, $M_{rand}(t)$ is an arbitrary DenseNet161 parameter from the present population, and $M_m$ is the average location of the parameters. Equations (12) and (13) represent the soft besiege of malware and benign images in the HHO environment.
$$M(t+1) = \Delta M(t) - E \,\bigl| J\, M_{malben}(t) - M(t) \bigr| \quad (12)$$
$$\Delta M(t) = M_{malben}(t) - M(t) \quad (13)$$
where $\Delta M(t)$ is the difference between the location of the malware and benign image and the present position in iteration $t$, $E$ is the parameter that represents the transition between soft and hard besiege, and $J$ is the jump strength. Hard besiege is described in Eq. (14).
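A minimal sketch of the two besiege updates follows. The soft besiege mirrors Eqs. (12) and (13). Eq. (14) is not reproduced in this text, so the hard besiege uses the form found in the standard HHO algorithm, which is an assumption, as are the function names and the 0.5 threshold on |E| for switching between the two. The rapid-dive variants of HHO are omitted.

```python
import numpy as np


def soft_besiege(m: np.ndarray, m_malben: np.ndarray, energy: float, jump: float) -> np.ndarray:
    """Eqs. (12)-(13): M(t+1) = dM - E * |J * M_malben - M|, with dM = M_malben - M."""
    delta = m_malben - m
    return delta - energy * np.abs(jump * m_malben - m)


def hard_besiege(m: np.ndarray, m_malben: np.ndarray, energy: float) -> np.ndarray:
    """Assumed standard HHO form: M(t+1) = M_malben - E * |dM|."""
    delta = m_malben - m
    return m_malben - energy * np.abs(delta)


def besiege_step(m: np.ndarray, m_malben: np.ndarray, energy: float, jump: float) -> np.ndarray:
    """Use soft besiege while |E| >= 0.5, otherwise hard besiege (assumed rule)."""
    if abs(energy) >= 0.5:
        return soft_besiege(m, m_malben, energy, jump)
    return hard_besiege(m, m_malben, energy)


if __name__ == "__main__":
    position = np.array([1.0, 2.0])  # current candidate parameters
    target = np.array([3.0, 1.0])    # best candidate found so far
    print(besiege_step(position, target, energy=0.8, jump=1.5))
    print(besiege_step(position, target, energy=0.2, jump=1.5))
```

In a hyperparameter search, each position vector would encode candidate settings such as a dropout ratio or learning rate, clipped to the bounds LB and UB after every update.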
In phase 3, the authors apply precision, recall, F1-measure, accuracy, Matthews correlation coefficient (MCC), and Kappa to evaluate the proposed framework's performance. The dataset is divided into a train set (70%) and a test set (30%). In the MD environment, precision is the fraction of images assigned to a class (malware or benign) that truly belong to that class. Recall is the fraction of images of a class that are correctly detected. F1-score is the harmonic mean of precision and recall. Accuracy is the fraction of all malware and benign images that are classified correctly. MCC measures the correlation between the predicted and the actual malware and benign labels.
Furthermore, it summarizes the confusion matrix in a single value. Cohen's Kappa compares the observed classification accuracy with the accuracy expected by chance. It addresses the evaluation bias by accounting for the chance of reaching the correct classification through a random guess. In addition, the error rate and computation cost are calculated for each classification.
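These metrics can all be derived from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions with malware as the positive class; the function name and the example counts are illustrative and are not results from this study.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute evaluation metrics from confusion-matrix counts.

    tp/fn: malware images detected / missed; tn/fp: benign images kept / flagged.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / total if total else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    # Expected agreement by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2 if total else 0.0
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
        "kappa": kappa,
    }


if __name__ == "__main__":
    for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
        print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC and Kappa both drop sharply when one class is mostly misclassified, which is why they are informative on imbalanced datasets.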

RESULTS
In this section, the authors highlight the experimental outcome of this study. The proposed model is implemented on a Windows 10 Professional system with an i7 processor and a GTX 1080 Ti GPU (11 GB). Python 3.9 with the Keras (Keras, 2022) library is employed for developing the proposed framework. The same hardware and software configuration is used for the training phase. During the training phase, the DenseNet161 parameters are supervised by the HHO algorithm. The authors train the DenseNet161 model with the IoT datasets under the HHO environment to identify critical parameters for generating an optimal outcome. During the training phase, the proposed MD framework generates an optimal result at the 32nd and 37th epochs for the IoT malware and Virusshare datasets, respectively. Furthermore, the authors extended the training to the 37th and 40th epochs for the IoT malware and Virusshare datasets. However, there is no significant improvement in the model's performance. Thus, epoch values of 32 and 41 and dropout ratios of 0.3 and 0.5 are assigned for the IoT malware and Virusshare datasets, respectively. Based on the outcome of the hyperparameter optimization, an array of five layers comprising two fully connected layers, three dropout layers, and an activation function is integrated with the DenseNet161 model. The hyperparameter tuning process identifies an optimal set of DenseNet161 parameters to detect malware images from the dataset. Table 3 outlines the performance of the proposed framework. In the testing phase, the trained DenseNet161 model achieves an average accuracy, precision, recall, F1-measure, MCC, and Kappa of 98.65, 98.7, 98.3, 98.5, 97.5, and 97.65, respectively, for the IoT malware dataset. HHO assists the DenseNet161 model in generating optimum results. The outcome demonstrates the strong performance of the proposed MD model. The high values of MCC and Kappa indicate that the proposed model classifies the images with high precision on the imbalanced dataset.
Likewise, Table 4 shows the proposed framework's performance on the Virusshare dataset. Compared to the IoT malware dataset, the Virusshare dataset contains many more files. Moreover, the image conversion model supports the proposed framework in converting the binaries into RGB images. The proposed model achieves an optimal accuracy on the Virusshare dataset. The feature extraction process assists the proposed model in identifying the crucial features of the images. Figures 5A and 6B illustrate the performance of the proposed framework on the IoT malware and Virusshare datasets. They show that the model effectively classifies the malware and benign images. In addition, the proposed model addresses the overfitting challenges on the IoT malware and the Virusshare datasets. Table 5 presents the comparative analysis of the MD frameworks. The proposed framework outperforms the recent MD frameworks. The high value of Kappa suggests the effectiveness of the proposed MD framework on the imbalanced dataset. In addition, it highlights the importance of the proposed MD framework in handling true and false positives. However, the framework of Falana et al. (2022) produces a reasonable outcome on the IoT malware dataset. Likewise, Table 6 presents the results of the comparative analysis on the Virusshare dataset. The proposed MD framework obtained a superior MCC and Kappa on the Virusshare dataset. Yolo V7 and the HHO algorithm enable the proposed framework to produce a superior outcome. In addition, the image enhancement technique helps the proposed framework identify the key objects. However, the frameworks of both Falana et al. (2022) and Vasan et al. (2020) achieve results similar to the proposed framework. Figure 6 reflects the performance of the individual MD frameworks on the IoT malware and the Virusshare datasets, respectively. The proposed feature extraction method offers    Table 7 outlines the error rate of MD frameworks.
The proposed framework produces fewer errors than the other frameworks on the IoT malware (14.2%) and Virusshare (15.6%) datasets. The feature extraction phase assists the proposed framework in generating a superior outcome compared to the other frameworks. Finally, Table 8 presents the computational complexities of the MD frameworks in classifying the malware images. Compared to the existing frameworks, the proposed framework requires fewer parameters, a lower learning rate, and less computation time.

DISCUSSION
The authors developed an image-based MD framework for identifying malware and benign files in the IoT environment. An image conversion technique converts malware and benign binaries into grayscale images. Furthermore, the grayscale images are enhanced to RGB images. An object identification technique extracts the key features from the images. Yolo V7 is a recent CNN technique for identifying the crucial elements of malware and benign images. The HHO algorithm is used to optimize the DenseNet161 model for classifying malware and benign images. It identifies the critical parameters of the DenseNet model in order to detect malware within a limited amount of time. DenseNet161 contains a set of hyper-parameters that reinforce the model in finding the crucial objects in the images. Predictive accuracy and detection rates are the primary metrics for evaluating MD frameworks. The primary step in securing a system and gaining control over further malware spread is accurately discovering previously undetected instances. Improving the detection accuracy of a proposed method may result in false alarms. Attempts to reduce false alarms may have an unintended negative effect on detection efficiency. As a result, the proposed model is evaluated with the F1 measure, the harmonic mean of precision and recall, which balances both factors. In addition, MCC and Kappa are used to measure the efficiency of the proposed framework.
The image format enables the MD framework to serve multiple types of platforms. In addition, the CNN model can identify slight variations in textures and patterns in the images. Thus, the proposed model supports the SDN framework in offering a protective environment for the IoT devices. The study uniquely integrates image enhancement, object detection (Yolo V7), and a hyper-parameter tuned CNN model (HHO-DenseNet161). Image enhancement and object detection reduce the computation overhead of the proposed model. The hyperparameter optimization tunes key parameters such as the number of dropout layers and epochs. The fine-tuned model classifies the images with limited resources. In addition, the computation cost for constructing Bi-LSTM is higher than that of the proposed method. The framework of Sharma, Sharma & Kalia (2022) generated a better outcome; however, its computation cost is higher than that of the proposed MD framework. The framework of Falana et al. (2022) comprises a CNN and a generative neural network for classifying the malware images. However, it lacks a feature engineering or extraction process to identify the critical features from the images. In addition, the complex architecture requires additional computation time to generate the outcome. In line with the Vasan et al. (2020) framework, the recommended MD framework applied the HHO algorithm to fine-tune the DenseNet161. Figure 6 reflects the MD performance on the IoT malware and Virusshare datasets. It shows that the proposed MD framework outperforms the recently developed image-based MD frameworks. In line with the studies (Obaidat et al., 2022; Yadav et al., 2022; Smmarwar, Gupta & Kumar, 2022), the proposed model achieves a superior outcome. The significant improvement in the feature extraction and image classification processes enabled the proposed MD framework to achieve a better outcome. The existing models (Chaganti, Ravi & Pham, 2022; Kumar, Janet & Neelakantan, 2022) generated a reasonable outcome.
However, their computation cost was very high compared to the proposed MD framework. Tables 7 and 8  The presently offered MD technologies are only effective on traditional networks. These models are difficult to apply on IoT networks or do not possess the flexibility and robustness necessary to ensure secure operations. The study's outcome reveals that the proposed framework is appropriate for securing the IoT. It is adaptable, distributed, resilient, and does not require many computational resources. Many IoT devices used in environmental and agricultural applications, including temperature and humidity sensors, are battery-powered and deployed in distant places, necessitating an MD technique that is both computationally and energy-efficient to extend the battery life of these devices. The proposed framework can be applied in environmental and agricultural applications to minimize energy consumption and protect the network. IoT-based systems in smart cities rely on various devices, such as security cameras, that collect personal information and need stringent security protocols to prevent unauthorized access. Safeguarding the IoT system against malware is critical for the well-being of the workforce and the sustained improvement of the Industrial IoT. Thus, the proposed MD framework can offer an effective industrial working environment and safeguard crucial computing resources in industrial settings.
The proposed model yields reliable results and aids in identifying malware in IoT networks. In future investigations, several limitations should be addressed. The CNN's multiple layers increase training time and demand a GPU. Nevertheless, the current IoT framework facilitates the high-end software and hardware configuration needed to implement a DL-based detection method. In addition, the proposed MD model is a lightweight application compared to the recent models. Therefore, the proposed MD model can operate on multiple IoT platforms. The study's findings reveal that the proposed MD model requires limited computational resources. The hyperparameter-tuned CNN model achieved a better outcome. The existing CNN and recurrent neural network approaches failed to present a crucial pattern from the malware binaries due to data loss and irrelevant features. The Yolo V7 model assists the proposed MD framework by providing the key features of malware. The proposed image-based MD framework overcomes the challenges of the existing approaches.
The proposed technique may suffer from imbalanced datasets. Data preprocessing is required to improve image quality and deliver high performance. There is a possibility of losing critical features due to multiple features. The inability to use coordinate frames might render the graphics unfavorably. The architecture of the proposed model necessitates a sizable quantity of data to yield a strong result. However, the researchers introduced image enhancement and feature extraction to handle the shortcomings of the CNN model. Incorporating feature selection results into the images' internal representation can yield positive results.

CONCLUSION
The authors present the image-based MD framework for the IoT environment in this study. The malware binaries are converted into images to improve the quality of the malware classification approach. In addition, an image enhancement technique is employed to convert the grayscale images to RGB images. An object identification method is used for feature extraction to support the trained convolutional neural network approach. For classifying the malware images, the authors employed the DenseNet161 model with the support of the Harris Hawks optimization algorithm. The performance evaluation was conducted on IoT malware and Virusshare datasets. The experimental outcome shows that the proposed framework is suitable for real-time applications.
Moreover, the framework is lightweight and demands a low computation cost for generating effective results. Thus, the framework can be applied to small and large-scale industries. It outperforms the compared frameworks on the IoT malware and Virusshare datasets. However, additional experimentation is needed to improve the performance of the proposed MD framework. In the future, the authors intend to extend the framework with a generative adversarial network to generalize its implementation to other malware image datasets.