Hancitor malware recognition using swarm intelligent technique

ABSTRACT


INTRODUCTION
Malware is a program created to disrupt computer operation and to gain access to private computer systems. Once the malware is executed, its author exploits the access gained to the infected device and collects personal information and anything else stored on the device without the consent of its owner. Currently, malware is used to steal important commercial and banking information. It is widely used against government websites, corporate sites, and banks to collect protected information or to disrupt their operation in general. Malware is also used against individuals to obtain personal information, such as bank card numbers. The different types of malware cause damage because they are difficult to remove once installed on the victim's machine. The severity ranges from minor inconvenience to irreparable harm that requires reformatting the hard drive.
Malware is considered a major threat to the Internet and to computer networks today, and it comes in various forms such as Trojan horses, viruses, worms, botnets, spyware, and adware [1]. FireEye reported that 47% of organizations had encountered malware-related security breaches. Malware is constantly growing in volume, diversity, and speed. As a result, malware has become complex and uses new and advanced methods to infect computers and smartphones [1].
Many techniques have been used for malware classification and recognition [2]. A deep neural network has been used to recognise dynamic malware behavior, where the network could recognise future malware through the use of a generative adversarial network. Swarm intelligent techniques have been used for malware recognition in [3], [4], where particle swarm optimization (PSO) is used to build a malware recognition system; in these studies the researchers improved the recognition rate with PSO. Vinod and Dhanya [5] used PSO in the recognition algorithm to improve the recognition rate, and recognised malware using a statistical approach. Another approach to recognising malware is given in [6], where the researchers used a data mining method. Other researchers used machine learning to recognise malware [7]-[11].
Other studies used hybrid approaches, which enhance performance in recognising unknown malware [12]-[17]. For example, [12] proposes a malware recognition system for Android based on a hybrid intelligent concept that combines a support vector machine (SVM) with evolutionary algorithms (genetic algorithm (GA) and PSO), referred to as Droid-HESVMGA and Droid-HESVMPSO respectively, to increase the precision of malware recognition. Methods based on naive Bayes, SVM, and decision trees have been used as recognisers, with the boosted decision tree reported as a top method for recognising malware [13]. A malware recognition method based on image processing, which depicts malware binaries as grayscale images, has been proposed in [14]. A k-nearest neighbor technique with the Euclidean distance is used for malware recognition. Firdausi et al. [15] presented a proof of concept for using hybrid intelligence to recognise malware based on five recognisers, i.e., k-nearest neighbors (kNN), naive Bayes, J48 decision tree, SVM, and multilayer perceptron (MLP) neural network. Santos et al. [16] built the OPEM system, which uses four algorithms as recognisers (decision tree, kNN, Bayesian network, and SVM) to recognise unknown malware; similar work is done in [17] to recognise malware using SVM, IB1, DT, and RF. Anderson et al. [18] proposed a method that uses a support vector machine recogniser.
Malware classification and recognition using swarm intelligent techniques are significant areas in the recognition of malicious applications. The method chosen by malware recognition experts depends on the problem and the dataset, regardless of whether the output data are categorical or numerical; therefore, swarm intelligent techniques can be used for recognition, forecasting, and estimation of malware [3]. In this paper, swarm intelligent (SI) algorithms are used to recognise Hancitor malware because SI algorithms are well established in recognition and classification applications: they are based on simple ideas, are accurate and easy to implement, do not require progressive information, and are often used to solve a wide range of problems covering many applications [19]. SI algorithms are divided into two categories by type. The first is the insect-based category, for example ant colony optimization (ACO) and artificial bee colony (ABC). The second is the animal-based category, which includes PSO, artificial fish, and grey wolf optimization (GWO) [19], [20]. Two swarm intelligent algorithms, one from each category, are used in this paper to recognise Hancitor malware: the GWO algorithm from the animal-based category and the ABC algorithm from the insect-based category. GWO is used in this paper to recognise Hancitor malware because it maintains information about the search space over the course of iterations [19]-[21], uses memory to store the best solution obtained, has few parameters to adjust, and is easy to implement [19]. GWO is also capable of solving optimization problems [20]-[24] through its social hierarchy.
The second SI algorithm used to identify Hancitor malware is the ABC algorithm. ABC requires only a minimal understanding of the problem area and does not require complex training data, because the recruiter bee updates itself with the attribute correlation and updates directly on the performance of the classification category rather than on knowledge of the waggle dance [25], [26]. Therefore, these types of procedures have greater potential for improving classification accuracy. In ABC data classification, the algorithm mimics the behavior of insects finding the best food source and building an ideal nest structure. The bees distribute the workload among themselves, which helps avoid misclassifying data that are spectrally homogeneous or subject to spectral interference. Dancing behavior aids in optimal design. The waggle dance is one of the mechanisms for sharing information about existing food sources, which makes it a good candidate for developing a new intelligent search for the optimal solution [25], [26].
The rest of the paper is organized as follows. Section two gives the theoretical background, covering malware recognition techniques and Hancitor malware and its danger. Section three explains the swarm intelligent techniques used in malware recognition, namely the GWO and ABC algorithms. Section four presents the proposed model, section five the results of the comparison, and section six the conclusion and future work.

MALWARE RECOGNITION TECHNIQUES
Hancitor is sometimes called Tordal or Chanitor. The malware has been around since 2014. Hancitor infects users' devices through malicious spam campaigns, and it mostly reaches devices through Microsoft Office files. Once the user downloads and opens the malicious file, the malware either uses a lure to trick the victim into enabling macros or uses an exploit. After that, Hancitor is either downloaded from the C2 server or dropped from the Office file. The next step is its execution, during which the malware downloads the main payload, usually a Trojan such as Pony, Vawtrak, or DELoader.
Hancitor infects the victim's machine in several ways. One of these uses .DOC attachments that take advantage of Microsoft's dynamic data exchange (DDE) technique. The user must first download the file and then activate macros, ignoring multiple security warnings [27]. Malware authors use lures to trick users into doing that. Some phishing emails contain an invoice or a fake payment-related document to make the user download it. In addition, attackers provide instructions for enabling macros. If the user complies, malicious macros download Hancitor, or it is dropped from the document. In some malspam campaigns, Hancitor was delivered to victims in .RTF documents that used an exploit to run a PowerShell command, which downloaded the loader to the computer.
Another infection route uses Excel spreadsheets as the decoy document. From December 17, 2018, the Hancitor executable was embedded in Excel spreadsheets; on January 28, 2019, Hancitor was dropped onto a vulnerable Windows host after the spreadsheet was opened in Excel and macros were enabled. However, the Hancitor campaign changed its decoy document on February 5, 2019, going back to Word documents instead of Excel spreadsheets. After macros were enabled, the Hancitor executable was retrieved from a web server hosted on the same IP address (but a different domain) as the initial Word document. The Hancitor infection of February 5, 2019, is shown in Figure 1. When Hancitor initially infects a system, it sends a POST request to its command and control (C&C) server with information on the infected system. Figure 2 shows the Hancitor-infected system.

Comput. Sci. Inf. Technol. Hancitor malware recognition using swarm intelligent technique (Laheeb M. Ibrahim) 105

WHY SWARM INTELLIGENCE (SI)
SI is a branch of artificial intelligence (AI) and was introduced by Gerardo Beni and Jing Wang in 1989. There are many reasons for using SI algorithms, including flexibility, ease of use, speed of implementation, versatility, self-learning capability, and adaptability to external variations. Also, SI used

Grey wolf optimization (GWO)
The GWO algorithm was proposed in [20], [22], [23] and relies on the social hierarchy and hunting habits of grey wolves in finding prey. There are four hierarchy levels of individuals in the GWO community, namely alpha, beta, delta, and omega, assigned according to fitness. The search process is modelled on the hunting behavior of grey wolves through searching for prey and attacking the prey, which covers exploitation. In the GWO algorithm, the hunting method helps to determine the location of the prey [20]. The mathematical model of GWO is explained in [28], and the GWO algorithm for Hancitor infection malware recognition is presented as follows.

GWO algorithm [20]
Initialize the parameters a, A, C
Initialize GWO population GWOi (i =
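The pseudo-code above is cut off in this copy, so the following is only a minimal Python sketch of the commonly published GWO scheme, not the authors' MATLAB implementation. The function name `gwo_minimize`, the parameters `n_wolves`, `max_iter`, `bounds`, and `seed`, and the sphere objective in the usage lines are all assumptions made for illustration. The only value taken from the paper is that `a` decreases from 2 to 0.

```python
import random


def gwo_minimize(fitness, dim, bounds, n_wolves=30, max_iter=100, seed=0):
    """Minimise `fitness` with a basic grey wolf optimizer (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Initialise the pack at random positions inside the search bounds.
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_score = None, float("inf")

    for t in range(max_iter):
        # Rank the pack: the three fittest wolves act as alpha, beta, delta.
        ranked = sorted(wolves, key=fitness)
        alpha, beta, delta = (list(w) for w in ranked[:3])
        alpha_score = fitness(alpha)
        if alpha_score < best_score:
            best_pos, best_score = alpha, alpha_score

        a = 2.0 - 2.0 * t / max_iter  # a decreases linearly from 2 towards 0

        new_wolves = []
        for wolf in wolves:
            new_pos = []
            for j in range(dim):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a   # coefficient A
                    C = 2.0 * rng.random()           # coefficient C
                    D = abs(C * leader[j] - wolf[j])  # distance to the leader
                    estimate += leader[j] - A * D
                # New coordinate is the mean of the three leader-guided moves,
                # clipped to the bounds (clipping is an assumption of this sketch).
                new_pos.append(min(hi, max(lo, estimate / 3.0)))
            new_wolves.append(new_pos)
        wolves = new_wolves

    # Account for the positions produced by the last update.
    final = min(wolves, key=fitness)
    if fitness(final) < best_score:
        best_pos, best_score = list(final), fitness(final)
    return best_pos, best_score


if __name__ == "__main__":
    # Hypothetical objective: squared distance to the origin.
    position, score = gwo_minimize(lambda x: sum(v * v for v in x), dim=3, bounds=(-5.0, 5.0))
    print(position, score)
```

In a recognition setting, the objective passed in would be the distance-based fitness described in the proposed model rather than the toy function used here.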

Artificial bee colony (ABC)
The ABC algorithm is an optimization algorithm based on the intelligent foraging behavior of honey bee swarms, proposed by Derviş Karaboğa in 2005. The colony in the ABC algorithm consists of three types of bees: employed bees, onlookers, and scouts. In ABC, there is only one artificial employed bee for each food source, so the number of employed bees in the colony equals the number of food sources around the hive. Employed bees go to their food source, return to the hive, and dance in this region. Employed bees whose food source has been abandoned become scouts and begin to search for a new food source. Onlookers watch the dances of the employed bees and choose food sources according to the dances.
ABC differs from other swarm intelligence algorithms in that the potential solutions are represented by the food sources, not by the individuals in the population. The quality of a potential solution is expressed as a fitness value, which is calculated from the value of the objective function of the problem. In the ABC algorithm, onlookers and employed bees carry out the exploitation process in the search space, while the scouts control the exploration process. The phases of the ABC algorithm and its mathematical equations are given in [29].

Pseudo-code of the ABC algorithm for constrained optimization problems [30]
) × 0.5 (3)
− Produce a new solution υij, as in (1), for each onlooker bee in the neighbourhood of the solution selected depending on pi, and evaluate it
− Apply the selection process between υi and xi based on Deb's method
− Use the "limit" parameter to determine the abandoned solutions for the scout; if they exist, replace them with new randomly produced solutions, as in (4)
− Memorize the best solution achieved so far
− cycle = cycle+1
− End do
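The steps listed above can be illustrated with a short Python sketch of the basic, unconstrained ABC loop. It is an assumption-laden illustration, not the authors' code: it replaces Deb's constraint-handling rule with plain greedy selection, uses the common roulette-wheel probability proportional to fitness instead of the partly lost equation (3), and the names `abc_minimize`, `n_sources`, `limit`, `max_cycles`, and `seed` are hypothetical.

```python
import random


def abc_minimize(objective, dim, bounds, n_sources=30, limit=30, max_cycles=100, seed=0):
    """Minimise `objective` with a basic artificial bee colony (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    def quality(value):
        # Usual ABC fitness transform for minimisation problems.
        return 1.0 / (1.0 + value) if value >= 0 else 1.0 + abs(value)

    sources = [random_source() for _ in range(n_sources)]
    values = [objective(s) for s in sources]
    trials = [0] * n_sources  # unsuccessful attempts per food source
    start = min(range(n_sources), key=values.__getitem__)
    best, best_value = list(sources[start]), values[start]

    def try_neighbour(i):
        # v_ij = x_ij + phi * (x_ij - x_kj) for one random dimension j and partner k != i.
        k = rng.choice([n for n in range(n_sources) if n != i])
        j = rng.randrange(dim)
        phi = rng.uniform(-1.0, 1.0)
        candidate = list(sources[i])
        candidate[j] = min(hi, max(lo, candidate[j] + phi * (candidate[j] - sources[k][j])))
        value = objective(candidate)
        if value < values[i]:
            # Greedy selection (stands in for Deb's rule in this sketch).
            sources[i], values[i], trials[i] = candidate, value, 0
        else:
            trials[i] += 1

    for _ in range(max_cycles):
        # Employed bee phase: one bee per food source.
        for i in range(n_sources):
            try_neighbour(i)
        # Onlooker phase: sources are picked with probability proportional to fitness.
        weights = [quality(v) for v in values]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_neighbour(i)
        # Scout phase: abandon the most exhausted source once it exceeds `limit`.
        worst = max(range(n_sources), key=trials.__getitem__)
        if trials[worst] > limit:
            sources[worst] = random_source()
            values[worst] = objective(sources[worst])
            trials[worst] = 0
        # Memorise the best solution found so far.
        i = min(range(n_sources), key=values.__getitem__)
        if values[i] < best_value:
            best, best_value = list(sources[i]), values[i]
    return best, best_value


if __name__ == "__main__":
    # Hypothetical objective: squared distance to the origin.
    solution, value = abc_minimize(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))
    print(solution, value)
```

The defaults of 30 food sources, a limit of 30, and 100 cycles mirror the ABC settings reported in the proposed model; everything else is a design choice of the sketch.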

PROPOSED MODEL
In this paper, a recogniser system is designed to distinguish Hancitor traffic from normal traffic so that the network administrator can make the appropriate decision; see Figure 3. In the proposed model, the Hancitor infection method considered is the one that uses .DOC attachments taking advantage of Microsoft's dynamic data exchange (DDE) technique. The NetFlow Traffic Analyzer (NTA) tool, version 9, is used to capture network traffic and obtain the data used in the proposed model. The captured traffic (normal or Hancitor malware traffic) is then stored in the Hancitor data file. From the collected traffic, features related to the underlying network traffic are selected, namely the attributes Source IP, Destination IP, Protocol, Timeline, and Length; see Table 1. After the attributes are selected, the monitored traffic is given as input to the recogniser (GWO and ABC algorithms), which classifies it as Hancitor or normal traffic.
The underlying distributions and the parameters used in the tests are significant. The parameters of the grey wolf optimizer were chosen after running a few tests to obtain satisfactory outcomes. The value of a decreases from 2 to 0. The fitness function, given by the Euclidean distance as in (5), is calculated so that after the first iteration the best solution is the α wolf, and the second and third best solutions are the β and δ wolves.
The parameters of the ABC algorithm were chosen after running a few tests to obtain satisfactory outcomes. ABC was initialized with:
− Population size of ABC: employed bees = 30
− Maximum iteration number = 100
− Limit = 30
The fitness function, given by the Euclidean distance as in (5), is calculated to obtain the best solution after the first iteration.
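Equation (5) is not legible in this copy, so the sketch below only shows the standard Euclidean distance between two numeric feature vectors. How the distance is turned into a class decision is not stated in the text; the nearest-prototype rule in `nearest_label`, the function names, and the example vectors are assumptions made purely for illustration.

```python
import math


def euclidean_distance(candidate, sample):
    """Standard Euclidean distance between two equal-length numeric vectors."""
    if len(candidate) != len(sample):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((c - s) ** 2 for c, s in zip(candidate, sample)))


def nearest_label(sample, prototypes):
    """Return the label whose prototype vector is closest to `sample`.

    `prototypes` maps a label (e.g. "normal", "hancitor") to a vector, such as
    the best solution found by GWO or ABC for that class (an assumption here).
    """
    return min(prototypes, key=lambda label: euclidean_distance(prototypes[label], sample))


if __name__ == "__main__":
    # Hypothetical two-feature prototypes and packet vector.
    prototypes = {"normal": [60.0, 0.2], "hancitor": [540.0, 0.9]}
    print(nearest_label([500.0, 0.8], prototypes))
```

Non-numeric attributes such as IP addresses or protocol names would need a numeric encoding before a distance of this kind can be applied; the paper does not say which encoding was used.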
To measure the performance of the GWO and ABC algorithms in recognising Hancitor malware, two measures are used [4], [16]. The first is the accuracy of packet recognition, which is the percentage of correctly classified packets (normal or Hancitor) among the total number of packets, as in (6). The second is the false alarm rate (FAR), also referred to as the false positive rate (FPR), as in (7).

Packets: The number of protocol packets from the total captured packets
Time span/s: The time between the first and last packet
Average App: Information about NetFlow Traffic Analyzer hardware
Average size: The average size of the header on the packet
Bytes: The number of protocol bytes from the total captured packets
Average Byte/s: The average number of protocol bytes from the total captured packets
Average Bits/s: The average bandwidth of this protocol in relation to the capture time
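Equations (6) and (7) did not survive in this copy. As a hedged illustration, the Python sketch below uses the conventional confusion-matrix definitions of accuracy and false alarm rate, treating Hancitor packets as the positive class; the function names and the example counts are hypothetical, and the authors' exact formulation may differ.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of packets classified correctly (Hancitor or normal)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no packets to evaluate")
    return (tp + tn) / total


def false_alarm_rate(fp, tn):
    """Fraction of normal packets wrongly flagged as Hancitor (false positive rate)."""
    negatives = fp + tn
    if negatives == 0:
        raise ValueError("no normal packets to evaluate")
    return fp / negatives


if __name__ == "__main__":
    # Hypothetical counts: 40 true positives, 50 true negatives,
    # 5 false positives, 5 false negatives.
    print(accuracy(40, 50, 5, 5), false_alarm_rate(5, 50))
```

Multiplying either value by 100 gives the percentage form used when the results are reported.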

EXPERIMENTAL RESULT
The proposed recognition model for Hancitor malware was implemented in MATLAB version R2015a. The model applies the GWO and ABC algorithms to the set of packets stored in the Hancitor data file to recognise Hancitor traffic. Two experiments were conducted, based on two types of packets:
− Packets based on statistic attributes (Length, Packet size limit, Elapsed, Packets, Time span (s), Average pps, Average packet size (B), Bytes, Average bytes/s, Average bits/s), see Table 2; this data set contains 3000 packets.
− Packets based on IPv4 characteristics, see Tables 3 and 4; this data set contains 3000 packets.
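To make the statistic attributes listed above concrete, the following Python sketch derives several of them from a list of captured packets. It assumes each packet is a hypothetical `(timestamp_seconds, length_bytes)` pair and uses the usual capture-summary definitions (for example, average bits/s as eight times average bytes/s); the paper does not state how its capture tool computes these values, so the definitions and the function name `capture_statistics` are assumptions.

```python
def capture_statistics(packets):
    """Summarise a capture given as (timestamp_seconds, length_bytes) pairs."""
    if not packets:
        raise ValueError("capture is empty")
    times = [t for t, _ in packets]
    total_bytes = sum(n for _, n in packets)
    count = len(packets)
    span = max(times) - min(times)  # time between first and last packet

    stats = {
        "packets": count,
        "time_span_s": span,
        "bytes": total_bytes,
        "average_packet_size_B": total_bytes / count,
    }
    if span > 0:
        stats["average_pps"] = count / span
        stats["average_bytes_per_s"] = total_bytes / span
        stats["average_bits_per_s"] = 8 * total_bytes / span
    else:
        # Rates are undefined when all packets share one timestamp.
        stats["average_pps"] = None
        stats["average_bytes_per_s"] = None
        stats["average_bits_per_s"] = None
    return stats


if __name__ == "__main__":
    # Hypothetical three-packet capture.
    print(capture_statistics([(0.0, 100), (1.0, 300), (2.0, 200)]))
```

A vector built from these values could then serve as one input record for the recogniser in the first experiment.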


Table 5. Accuracy of GWO and ABC algorithms for recognition of Hancitor malware

After performing experiments on the two types of data packets using the GWO algorithm, the experimental results show that data based on statistic attributes recognise Hancitor malware better than data based on IPv4 characteristics; see Figure 4. The experiments using the ABC algorithm on the two types of data packets show the same result; see Figure 5. From Table 5 and Figure 6, we note that the recognition accuracy for Hancitor malware using the ABC algorithm is better than that of the GWO algorithm for both types of data (packets based on statistic attributes and packets based on IPv4 characteristics), and that the false alarm rate of the ABC algorithm is lower than that of GWO for both types of data. ABC performs better than GWO in recognising Hancitor malware, even though both use an equal number of packets and the same data, because ABC does not require complex training data and needs only a minimal understanding of the problem area, which helped the accuracy and speed of discrimination.
Using SI to recognise Hancitor malware has significant advantages, because SI exhibits intelligent collective behavior: it provides intelligent solutions to problems through self-organization and communication between the individuals in the swarm, and the seamless coordination of all individual activities does not require a supervisor. The GWO and ABC algorithms detect Hancitor malware correctly, quickly, and with a high recognition rate. ABC is one of the simplest swarm intelligence algorithms and delivers highly accurate results for optimization problems ranging from simple to complex; in this work it proved to be the better choice of the two algorithms for recognising Hancitor malware, and it can be applied to many applications. GWO was also found to be sufficiently competitive with other state-of-the-art metaheuristic methods for recognising Hancitor malware.

CONCLUSION
A new model that uses GWO and ABC to recognise Hancitor malware behavior is proposed. It can help protect users from Hancitor malware attacks. In this research, GWO and ABC were able to recognise Hancitor malware correctly, quickly, and with a precise recognition rate. ABC and GWO give good results in recognising the presence of Hancitor in the network, with an accuracy of 79.2% using ABC and 95.8% using GWO for data based on statistic attributes; for the second type of data, based on IPv4 characteristics, the recognition rate is 94.3% using ABC and 92% using GWO. The prediction of the percentage of infection performs better with ABC than with GWO, and better with data based on statistic attributes than with data based on IPv4 characteristics.

Figure 3. Proposed model

Figure 4. Recognition accuracy result for Hancitor using the GWO algorithm

Figure 5.

Figure 6. Recognition accuracy result for Hancitor using the GWO and ABC algorithms

Table 2. Packets based on statistic attributes from the captured traffic (normal and Hancitor packet)

Table 3. Derived attributes from the captured file using IPv4 characteristics (normal packet)

Table 4. Derived attributes from the captured file using IPv4 characteristics (Hancitor packet)