MODELLING MALWARE PROPAGATION ON THE INTERNET OF THINGS USING AN AGENT-BASED APPROACH ON COMPLEX NETWORKS



INTRODUCTION
Today, any device connected to a communication system may be targeted by unscrupulous and malicious individuals whose main purpose is to access sensitive information. To achieve their goals, they use different specimens of malware [1]. Such malware often goes unnoticed long enough to study the behaviour of the internal network and its elements in order to extract valuable information. Because large numbers of nodes are deployed on communication systems, often in hostile, unattended environments without human supervision, these nodes become a principal target for malware attacks [2]. Agent-based modelling and simulation (ABMS) is an effective way to model and analyze complex networks [3]. The network consists of agents whose activities are monitored concurrently. ABMS provides a set of transition rules that take individual device characteristics into account, which makes it appropriate for malware modelling, where individual device variability is a key consideration [4], [5]. This paper models the malware propagation process on scale-free networks by proposing an agent-based model and simulation. In scale-free networks, new nodes attach preferentially to the nodes with the highest connection probability, that is, to high-degree nodes. Agent-based modelling and simulation are used to model the dynamics of malware propagation on scale-free networks. The diversity of nodes in the scale-free network is captured by varying parameters, such as node mobility, energy consumption and propagation speed, that affect malware spread in the network. The proposed model is further compared with analytical results obtained from previous agent-based modelling and simulation schemes [6]-[9]. The major contributions of this paper are outlined as follows:
1) Creation of an agent-based model and simulation with a decision maker for modelling malware propagation on large networks using a deep-reinforcement learning algorithm.
2) Development of the Susceptible-Infected-Immune-Recovered-Removed (SIIRR) node state transition model, and estimation of individual node performance to compute node reliability using the mean-time-to-failure (MTTF) metric.
The rest of the paper is structured as follows. The related literature on malware propagation is explored in section 2. In section 3, the proposed model is presented, together with details on the application of deep-reinforcement learning to modelling malware propagation. The experimental set-up and simulation of the proposed scheme are discussed in section 4, where metrics such as average energy consumption, average infections over time, node mobility and propagation speed are computed, and the simulation results are validated against analytical results obtained from previous agent-based modelling and simulation schemes. Finally, section 5 concludes the paper and gives future research directions.

RELATED LITERATURE
The recent rise in the use of IoT devices to launch malware attacks has increased researchers' interest in understanding IoT malware propagation and control. In this section, we review recent literature on malware propagation, with a bias towards agent-based modelling, which is the approach taken in our proposed model.
A Markov Random Field (MRF)-based spatio-stochastic framework is applied to complex communication networks, where malicious threats spread through direct interactions and follow the SI state model, as proposed by Karyotis [8]. It also combines Gibbs sampling with simulated annealing to analyze the behaviour of the system under various topological and malware-related metrics. The disadvantage of MRF is that it is not isotropic, since it varies in magnitude according to the direction of measurement. In addition, the reliability of individual nodes is not assessed. A rumour diffusion process is proposed to model the outbreak of malware in [7]. The limitation of this agent-based analytical model is that it is difficult to prove the (global and local) stability of the malware-free equilibrium.
In [10], four aspects of malware propagation were modelled: user mobility, application-level interactions among users, local network structure and network coordination of malware (botnets). The model was tested for a malicious virus such as Cabir spreading among cellular network subscribers via Bluetooth. A queuing-based malware propagation modelling approach was proposed for complex networks with churn [11]. Churn refers to dynamic node variation, and the approach captures the dynamics of SIS-type malware in time-varying networks. It quantified network reliability and improved the robustness of the network against some generic malware attacks. However, because of the dynamic nature of node variation, the approach does not reduce energy consumption, and the spreading speed remains high. Malware propagation over wireless sensor networks was modelled in [12], where the network topologies are based on complete or regular graphs. The first disadvantage of this model is that it does not consider the individual characteristics of sensor nodes, which are an important attribute in modelling node heterogeneity; the second is that parameters such as the transmission rate and the recovery rate are not explicitly defined.
Batool et al. [9] demonstrate that Internet of Things networks can be modelled using a hybrid approach that combines complex network and agent-based models. Constructing elaborate IoT models that capture both emergent behaviour and individual node characteristics remains an open research challenge. To model the IoT as a scale-free network, a new node that wants to join the network requires the degree and distance of all nodes (centrality measures) in the whole network in order to compute the probability of connecting to each existing node. Centrality is a critical measure of how central a node is to communication and connectivity. Betweenness and closeness centralities are calculated in each subnet. The betweenness centrality of a node $i$ is the probability that the shortest path between two randomly selected nodes goes through $i$, and is calculated as:
$$C_B(i) = \sum_{j \neq i,\; k \neq i,\; j \neq k} \frac{N_{sp}(j \xrightarrow{i} k)}{N_{sp}(j \to k)}$$
where $N_{sp}(j \to k)$ is the number of shortest paths from node $j$ to node $k$, and $N_{sp}(j \xrightarrow{i} k)$ is the number of shortest paths from node $j$ to node $k$ that pass through node $i$.
Closeness centrality is a measure of how accessible a node is from the other nodes and is calculated as:
$$C_c(i) = \frac{N - 1}{\sum_{j \neq i} d(i, j)}$$
where $d(i, j)$ is the shortest-path distance between nodes $i$ and $j$; this is the inverse of the average distance from node $i$ to all other nodes. If $C_c(i) = 1$, every other node in the network can be reached from $i$ in one step. Centrality measures are key to determining the influence of nodes that propagate and spread malware.
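As an illustration of the betweenness and closeness definitions above, the following dependency-free Python sketch computes both for an unweighted, undirected graph stored as an adjacency list. It is not the implementation used by Batool et al. [9]; the adjacency-list representation and the optional division by $(N-1)(N-2)$, which turns the sum over ordered pairs into a value in [0, 1] matching the "probability" reading, are assumptions made for illustration.

```python
from collections import deque
from itertools import permutations


def bfs_counts(adj, src):
    """Hop distance and number of shortest paths from src to every reachable node."""
    dist = {src: 0}
    sigma = {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first visit: v is one hop further than u
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj, i, normalized=False):
    """Sum over ordered pairs (j, k), j != k, both != i, of N_sp(j -i-> k) / N_sp(j -> k)."""
    info = {v: bfs_counts(adj, v) for v in adj}
    dist_i, sigma_i = info[i]
    total = 0.0
    for j, k in permutations([v for v in adj if v != i], 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j or i not in dist_j:
            continue  # k or i is unreachable from j
        # Shortest j->k paths through i = sigma(j, i) * sigma(i, k) when i lies on a shortest path.
        if dist_j[i] + dist_i[k] == dist_j[k]:
            total += sigma_j[i] * sigma_i[k] / sigma_j[k]
    n = len(adj)
    return total / ((n - 1) * (n - 2)) if normalized else total


def closeness(adj, i):
    """(N - 1) / sum of shortest-path distances from i (a connected graph is assumed)."""
    dist, _ = bfs_counts(adj, i)
    return (len(adj) - 1) / sum(d for v, d in dist.items() if v != i)


# Star graph: hub 0 joined to leaves 1..4.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(closeness(star, 0), betweenness(star, 0), betweenness(star, 0, normalized=True))
# 1.0 12.0 1.0
```

In the star graph every leaf-to-leaf shortest path passes through the hub, so the hub reaches the maximum normalized betweenness of 1 and a closeness of 1, which matches the one-step reachability case described above.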
The inherent weakness of the deterministic and stochastic models surveyed in our previous work [13] is the full-mix assumption. The full-mix assumption holds that every node has an equal chance of coming into contact with every other node in the network, which is not necessarily the case for malware propagation on IoT networks, where heterogeneity is a key factor. Introducing a decision maker into the model overcomes the key challenge of arriving at an infection decision based on individual node interactions and individual node parameters, not just on contact.

THE PROPOSED MODEL
In this section, a model is formulated for malware propagation over large scale-free communication networks. A scale-free network environment for heterogeneous IoT devices is illustrated in Figure 1. The objectives of modelling malware propagation on large scale-free networks are to: mitigate the effects of malware over large scale-free IoT networks; set flexible simulation parameters (a large number of nodes/devices and a long transmission range); reduce the malware propagation speed in scale-free IoT (SFIoT) networks; and analyze changes in the subnets caused by node mobility between subnets in a time-varying environment.
We consider a network as a graph with $N$ nodes and $M$ edges. The total population of $N$ nodes is divided into $T$ subnets, where subnet $i$ contains $n_i$ nodes, $i = 1, 2, \ldots, T$. The total population of nodes is given by Equation 3:
$$N = \sum_{i=1}^{T} n_i \quad (3)$$
For each subnet, a link between two nodes is added with probability $P_i$, which should satisfy Equation 4, where $K$ denotes the average degree of nodes in the entire network. When a new node is announced to the network, it is attached to existing nodes with high degree; node announcement and preferential attachment continue until a network of $t + N$ nodes has been deployed.
The decision maker-based model of malware propagation on sub-netting-based scale-free networks is built on the SIIRR model states. The decision maker is an agent used for modelling malware propagation. Each node in the network has its own heterogeneous behaviour, and a set of rules is used to model that behaviour. While modelling malware propagation, nodes are classified into five states, and at each time step a node transitions to one of the five possible states listed below. The state transition diagram of the SIIRR model is depicted in Figure 2.
2) Infected (I): A node compromised by malware is called an infected node. In this state, a node propagates the malware infection to all its neighbours.
3) Immune (I): A node that cannot be infected by any other node is called immune. Such nodes have an immunization scheme, such as an anti-malware solution, to detect and block malware.
4) Recovered (R): The state in which the infection has been removed; a recovered node does not get infected again.

5) Removed (R): The node or hub has been affected by the malware and can spread malware at time t.
The flowchart in Figure 3 shows the steps in model formulation. Algorithm 1 shows the detailed procedure for sub-netting-based network construction.
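The preferential-attachment growth and the division of the population into $T$ subnets can be sketched in Python as follows. This is an illustrative Barabási-Albert-style construction, not the paper's Algorithm 1: the seed graph (a complete graph on m + 1 nodes), the number of links m per new node, and the round-robin assignment of nodes to subnets are all assumptions, whereas the paper forms sub-nets from node residual energy and degree.

```python
import random


def preferential_attachment(n_nodes, m, seed=0):
    """Barabasi-Albert-style growth: each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Seed network: a complete graph on m + 1 nodes.
    adj = {v: set() for v in range(m + 1)}
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # Each node appears once per incident edge, so a uniform draw is degree-proportional.
    endpoints = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            endpoints.extend([new, t])
    return adj


def split_into_subnets(adj, n_subnets):
    """Hypothetical round-robin split into T subnets (a stand-in for Algorithm 1)."""
    subnets = [[] for _ in range(n_subnets)]
    for v in sorted(adj):
        subnets[v % n_subnets].append(v)
    return subnets


adj = preferential_attachment(200, m=2)
subnets = split_into_subnets(adj, 5)
# Equation 3: the subnet sizes n_i add up to the total population N.
print(sum(len(s) for s in subnets))  # 200
```

Sampling uniformly from the list of edge endpoints is a simple way to obtain degree-proportional selection, so early high-degree nodes tend to become hubs.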

Modelling Deep-reinforcement Learning in Malware Propagation
A Deep-reinforcement Learning (DRL) scheme is adopted, and its variables are defined on a Continuous Markov Chain Model (CMCM). The main goal of the CMCM in a DRL problem is to maximize the obtained rewards. A DRL problem is described by the tuple
$$\langle S, A, E, R \rangle \quad (5)$$
where $S$ denotes the set of states, $A$ the set of possible actions, $E$ the environment and $R$ the reward function over states and actions. In DRL, the agent has the ability to act, and each action influences its future state.
Rewards depend on the current state and the action performed.
Discount factor (γ): The discount factor controls the importance of future rewards (γ ∈ [0, 1]).
State transition distribution: The probability that action $A$ in state $S$ at time $t$ leads to state $S'$ at time $t + 1$:
$$P_A(S, S') = \Pr(S_{t+1} = S' \mid S_t = S, A_t = A)$$
Policy (π): The policy maps each state to an action, $\pi(S) \rightarrow A$, so that $A_t = \pi(S_t)$. The policy is chosen to maximize the expected discounted reward:
$$\pi^{*} = \arg\max_{\pi} \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(S_t, A_t)\right] \quad (7)$$
where 0 ≤ γ < 1.
In the Q-learning approach, an approximate reinforcement learning algorithm is applied to IoT devices. The Q-value is updated as formulated in Equation 8:
$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ \Re(S_t, A_t) + \gamma \max_{A'} Q(S_{t+1}, A') - Q(S_t, A_t) \right] \quad (8)$$
where $Q(S_t, A_t)$ is the Q-value of the current state $S_t$ when action $A_t$ is selected at time $t$, α is the learning rate, γ ∈ [0, 1] is the discount factor, $\max_{A'} Q(S_{t+1}, A')$ is the maximum possible Q-value in the next state $S_{t+1}$ over the possible actions $A'$, and $\Re(S_t, A_t)$ denotes the reward obtained when action $A_t$ is selected in state $S_t$.
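A minimal tabular sketch of the update in Equation 8 follows. It assumes a lookup table rather than the neural network a full DRL agent would use, and the state labels, the decision-maker actions `isolate` and `allow`, and the reward value are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step following Equation 8; returns the new Q(S_t, A_t)."""
    best_next = max(q[(next_state, a)] for a in actions)  # max over A' of Q(S_{t+1}, A')
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical actions and reward; unseen (state, action) pairs start at 0.
actions = ["isolate", "allow"]
q = defaultdict(float)
q[("R", "isolate")] = 1.0
print(q_update(q, "I", "isolate", reward=1.0, next_state="R",
               actions=actions, alpha=0.5, gamma=0.5))  # 0.75
```

With α = 0.5 and γ = 0.5, the target is 1.0 + 0.5 × 1.0 = 1.5, and the Q-value moves halfway from 0 towards it.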
In Equations (9) to (13), the rate parameters denote the infection rate (S → I), the recovery rate (I → R) and the removed rate, respectively. The total population N (network size) at time t is the sum of the numbers of nodes in the five states. After the scale-free network is formed, all hubs, decision makers and ordinary nodes are set to the susceptible state. At time slot t = 1, one or more nodes are set to the infected state, and at each subsequent time slot t = 2, 3, ..., n, malware propagates from infected nodes to their adjacent nodes through communication links. The node states therefore change at every time slot.
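The time-slot propagation process described above can be illustrated with a small agent-based simulation. This is a hypothetical sketch, not the paper's decision-maker model or its Equations (9) to (13): it assumes that each infected node infects each susceptible neighbour with probability `beta` per slot and then recovers with probability `gamma` or is removed with probability `delta`, and that the set of immune nodes is fixed in advance.

```python
import random

S, I, IM, R, RM = "S", "I", "Im", "R", "Rm"  # susceptible, infected, immune, recovered, removed
STATES = (S, I, IM, R, RM)


def simulate_siirr(adj, infected, immune, beta, gamma, delta, steps, seed=0):
    """Discrete-time agent-based spread; returns the count of nodes per state after each slot."""
    rng = random.Random(seed)
    state = {v: S for v in adj}  # after network formation every node is susceptible
    for v in immune:
        state[v] = IM            # immune nodes can never be infected
    for v in infected:
        state[v] = I             # slot t = 1: seed infections
    history = []
    for _ in range(steps):
        nxt = dict(state)        # synchronous update based on the previous slot
        for v, s in state.items():
            if s != I:
                continue
            for u in adj[v]:     # malware travels only over communication links
                if state[u] == S and rng.random() < beta:
                    nxt[u] = I
            r = rng.random()
            if r < gamma:
                nxt[v] = R       # recovered: never re-infected
            elif r < gamma + delta:
                nxt[v] = RM      # removed
        state = nxt
        history.append({k: sum(1 for x in state.values() if x == k) for k in STATES})
    return history


ring = {v: [(v - 1) % 10, (v + 1) % 10] for v in range(10)}
for counts in simulate_siirr(ring, infected=[0], immune=[5], beta=0.6, gamma=0.1, delta=0.05, steps=5):
    print(counts)
```

Because every node is in exactly one state, the per-slot counts always add up to the network size N, which is the total-population relation stated above.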

Reliability Computation
The reliability of a node is characterized by its Mean Time To Failure (MTTF); most previous schemes in malware modelling have not considered this reliability factor. Specifically, reliability is the probability that the system will perform its intended function according to the specified design. To improve network performance, we consider several metrics for computing reliability: node degree, node mobility rate, node transmission rate and the distance between two nodes.
Node degree is the number of links (in-degree and out-degree) that lead into or out of a node. For each sub-net, the mobility of node $i$ is computed from its Node Current Position (NCP) and its Node Origin Position (NOP). The transmission rate (in Kbps) between two nodes depends on the message size ($D_s$) and the distance between the two nodes ($D_N$), with $C_1$ and $C_2$ as constants. The distance between two nodes at positions $(x_i, y_i)$ and $(x_j, y_j)$ is computed using the Euclidean distance metric:
$$D_N = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$
The reliability of a node, $R(N(t))$, is the probability that the node operates successfully over the interval from time 0 to $t$, as shown in Equation 18:
$$R(N(t)) = \Pr(r > t), \quad t \geq 0 \quad (18)$$
In Equation 18, $r$ is a random variable that denotes the time-to-failure (failure time). With $f(t) = -\frac{dR(N(t))}{dt}$ as the failure density, the mean time to failure is computed by Equation 19:
$$\mathrm{MTTF} = \int_{0}^{\infty} t\, f(t)\, dt = -\int_{0}^{\infty} t\, \frac{dR(N(t))}{dt}\, dt \quad (19)$$
Integrating by parts yields:
$$\mathrm{MTTF} = \Big[-t\, R(N(t))\Big]_{0}^{\infty} + \int_{0}^{\infty} R(N(t))\, dt \quad (20)$$
In Equation 20, $t\, R(N(t)) \to 0$ as $t \to \infty$, so the first term vanishes and only the second term remains:
$$\mathrm{MTTF} = \int_{0}^{\infty} R(N(t))\, dt$$
For each sub-net in a scale-free network, the reliability of the sub-net at time $t$ is then computed. Moreover, the reliability $R(P(t))$ at time $t$ of any path composed of sub-nets in a scale-free network is computed by combining the reliabilities of all sub-nets on the path (sub-net ∈ path). As a result, a sub-net-based scale-free network consists of reliable paths, and the reliability of the network, $R(t)$, is computed over these paths. The topology of the scale-free network is constructed from the actual parameters of each sub-net (node degree and the maximum probability of a node). The proposed scheme is implemented for the Internet of Things, and the reliability of each node in the scale-free network is evaluated under malware propagation.
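The reliability and MTTF relations above can be checked numerically with a short sketch. It assumes an exponential (constant failure rate) reliability function, for which the MTTF is 1/λ, and a series composition of sub-nets along a path, in which the path works only if every sub-net works; the series form is one common choice and is an assumption here, as are the failure rate and the sub-net reliability values.

```python
import math


def mttf(reliability, t_max, steps=100_000):
    """MTTF as the integral of R(t) over [0, t_max] (trapezoidal rule); t_max truncates infinity."""
    h = t_max / steps
    area = 0.5 * (reliability(0.0) + reliability(t_max))
    area += sum(reliability(k * h) for k in range(1, steps))
    return area * h


def exponential_reliability(failure_rate):
    """Assumed constant-failure-rate model: R(t) = exp(-lambda * t)."""
    return lambda t: math.exp(-failure_rate * t)


def series_reliability(values):
    """Assumed series composition: a path works only if every sub-net on it works."""
    return math.prod(values)


node = exponential_reliability(0.01)                # hypothetical failure rate (per second)
print(round(mttf(node, t_max=5_000), 3))            # analytic value is 1 / 0.01 = 100
print(series_reliability([0.99, 0.95, 0.90]))       # reliability of a three-sub-net path
print(math.dist((0, 0), (3, 4)))                    # Euclidean distance D_N between two nodes
```

Truncating the integral at a finite `t_max` is safe here because the neglected tail of an exponential reliability function is negligible once `t_max` is many multiples of the MTTF.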

SIMULATION
In this section, the modelled propagation algorithm is simulated. The proposed scheme was compared with analytical results from published works, based on the performance metrics described in sub-section 4.2: for energy consumption, with the work of Batool et al. [9]; for average infection rate, with the works of [6], [7]-[14]; and for propagation speed and node mobility, with the work of [8].

Experimental Set-up
The model is implemented in NS-3 (version 3.26) for simulation. NS-3 is a discrete-event network simulator written in C++ and mainly supported on Linux; it also provides Python bindings.
In our experiment, the Gauss-Markov (GM) mobility model is used to simulate the mobility of device agents. The Gauss-Markov mobility model captures temporal dependency; that is, it has a memory that correlates the current state with previous states. In this model, the velocity of a device is modelled as a Gauss-Markov stochastic process, since it is assumed to be correlated over time.
In this model, node speed and direction are considered with respect to time, taking into account the previous speed $s_{n-1}$, the previous direction $d_{n-1}$, the mean speed $\bar{s}$ and the mean direction $\bar{d}$. The tuning parameter α (0 ≤ α ≤ 1) sets the degree of randomness, and the random terms are drawn from a Gaussian distribution. The current speed and direction are given by Equations (24) and (25). Four traffic lights for each road lane entering the intersection are considered. The blue circle in the upper right section represents the decision maker entity that manages the traffic light timing. Each node is assumed to move independently with the same average speed. All nodes in the network have the same transmission range of 250 m. The simulated traffic is Constant Bit Rate (CBR). The proposed scheme is implemented in a road traffic system with a single intersection, and the sub-net construction process is then performed based on node residual energy and the degree of the nearest node. In each sub-net, a decision maker is selected. All nodes are connected to a hub; if a node is not connected to the hub, the route between the node and the hub is found using the FIFO rule. Next, nodes sense data, and the decision maker classifies each node's state as susceptible, infected, immune, recovered or removed using Deep-Reinforcement Learning (DRL). A visualization of the formation of a scale-free network can be seen in Figure 5. The simulation settings and parameters are summarized in Table 1.
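The Gauss-Markov speed and direction update described above can be sketched as follows. The noise scales `speed_sigma` and `direction_sigma`, the time step and the start values are assumptions; each position step uses the previous speed and direction before they are updated, and the sketch omits boundary handling for the simulation region and any bounds on speed.

```python
import math
import random


def gm_update(prev, mean, alpha, sigma, rng):
    """x_n = alpha*x_{n-1} + (1 - alpha)*mean + sqrt(1 - alpha^2) * x_rand, x_rand ~ N(0, sigma^2)."""
    return alpha * prev + (1 - alpha) * mean + math.sqrt(1 - alpha ** 2) * rng.gauss(0.0, sigma)


def gauss_markov_path(start, speed, direction, mean_speed, mean_direction, alpha, steps,
                      dt=1.0, speed_sigma=1.0, direction_sigma=0.1, seed=0):
    """Positions of one node; moves with the previous speed/direction, then updates them."""
    rng = random.Random(seed)
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        x += speed * math.cos(direction) * dt
        y += speed * math.sin(direction) * dt
        path.append((x, y))
        speed = gm_update(speed, mean_speed, alpha, speed_sigma, rng)
        direction = gm_update(direction, mean_direction, alpha, direction_sigma, rng)
    return path


# alpha = 1: fully correlated, straight-line motion; alpha = 0: memoryless random motion.
print(gauss_markov_path((0.0, 0.0), 10.0, 0.0, 5.0, 0.0, alpha=1.0, steps=3))
# [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
print(gauss_markov_path((0.0, 0.0), 10.0, 0.0, 5.0, 0.0, alpha=0.75, steps=3))
```

Intermediate values of α blend the previous speed and direction with their means, which is what gives the model its memory.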

Simulation Performance Metrics
The proposed scheme is evaluated on the following performance metrics:
1) Energy consumption: It is the rate of energy used for packet transmission. Energy conservation is an important issue when communicating with other nodes.
2) Average infection rate: It is the number of nodes found to be infected during packet transmission.
3) Propagation speed: It can be computed by finding the number of infected nodes at time t and is based on the threshold value for different states.
4) Node mobility: It has long been recognized as an important factor in modelling malware propagation in the Internet of Things, e.g. in road traffic and smart office application systems. Mobility causes major issues, such as increased energy consumption and connectivity failure, so accounting for it in complex networks helps reduce energy consumption and limit the spread of malware over communication networks.

Comparative Analysis
Statistical analysis of the raw simulation data is carried out: averages (means) and confidence intervals are calculated. The confidence interval of the simulation data is calculated as follows. Five simulation runs, x1, x2, ..., x5, are carried out for each network size.
Since the number of simulation runs is less than 30 (n = 5), the t distribution with n − 1 degrees of freedom is used. For the t distribution to be applied, the data need to follow a normal distribution, so a normality test is carried out to provide evidence that the simulation data are normally distributed. Normal probability plots are used to depict the outcome of the normality test. The Shapiro-Wilk normality test is also applied, since there are fewer than 2000 simulation instances, and it is carried out at all network sizes. The confidence interval is given as [L, U], where L is the lower bound and U is the upper bound of the interval, that is, [L, U] = [average − margin of error, average + margin of error]. The confidence interval is calculated as:
$$\bar{x} \pm t_c \frac{s}{\sqrt{n}}$$
where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the number of runs and $t_c$ is the critical value of the t distribution for the chosen confidence level. A confidence level of 95% is used in this study.
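The margin-of-error calculation above can be sketched with Python's standard `statistics` module. The critical value $t_c$ = 2.776 for n − 1 = 4 degrees of freedom at 95% confidence is hard-coded rather than looked up, and the five sample values are placeholders, not simulation results from this study.

```python
import math
import statistics

# Two-sided 95% Student t critical values indexed by degrees of freedom (n - 1).
T_CRITICAL_95 = {4: 2.776}


def confidence_interval(samples, t_critical=None):
    """Return (L, U) = mean -/+ t_c * s / sqrt(n) for a small sample."""
    n = len(samples)
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    if t_critical is None:
        t_critical = T_CRITICAL_95[n - 1]
    margin = t_critical * s / math.sqrt(n)
    return mean - margin, mean + margin


runs = [1.0, 2.0, 3.0, 4.0, 5.0]  # placeholder values, not results from the paper
print(confidence_interval(runs))
```

For other numbers of runs, the matching critical value can be passed explicitly through `t_critical`.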
The simulation results are subject to the test of normality for each of the network sizes and parameters.Shapiro-Wilk test statistics and the normal probability plots are derived for each of the network sizes and parameters.The normal probability plot is a visual illustration showing whether the data fits a normal probability distribution.The simulation raw results are plotted against the theoretical quantiles.
If the data lie along a straight line, the data fit the normal probability distribution. The test showed that the results for all network sizes were normally distributed, as required for using Student's t distribution to calculate the confidence interval. For illustration, the normal probability plots for a network of 60 nodes are shown in Figure 6.
Shapiro-Wilk test statistics are calculated based on the following hypotheses: H0: The population is normally distributed.H1: The population is not normally distributed.
If the significance value (Sig.) exceeds α = 0.05, H0 cannot be rejected, and the data are treated as normally distributed. The Shapiro-Wilk test statistics indicate that all the simulation data are normally distributed at the 95% confidence level. For example, for the network size of 60 nodes shown in Figure 6, the Shapiro-Wilk test statistics and confidence levels are given in Table 2. In Table 2, Sig. > 0.05, so H0 is not rejected and the data are treated as normally distributed. The upper and lower bounds at the 95% confidence level are also calculated.

Energy Consumption
First, we examine the energy consumption of our proposed scheme and then compare it with the previous scheme. Energy consumption is the quantity of energy used; keeping it low requires efficient energy use in complex communication environments. The tasks considered for energy consumption are sensing, transmission and communication. The total energy consumption was estimated in millijoules (mJ) from the energy consumed for transmission ($E_T$), reception ($E_R$) and the idle state ($E_I$), as given by Equations (27) to (30), where $D$ is the transmission distance, $m$ is the packet length, $\alpha_1$ to $\alpha_4$ are system-dependent parameters, $t_I$ is the idle time and $P_m$ is the packet processing rate of the node. Five simulations were carried out for each network size, and energy consumption was recorded for each run. Figure 7 shows the comparative analysis of average energy consumption: Sub-Figure 7(a) shows the energy consumption at varied network sizes for the proposed scheme, and Sub-Figure 7(b) compares the average energy consumption of the proposed scheme with that of HM-CN [6].
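To illustrate how such energy terms combine, the sketch below assumes the widely used first-order radio model: transmission energy grows with packet length and the square of the distance, reception energy grows with packet length, and idle energy grows with idle time. These functional forms, the path-loss exponent of 2 and all parameter values are assumptions rather than the paper's Equations (27) to (30), and the packet processing rate $P_m$ is not modelled.

```python
def transmission_energy(m_bits, distance, a1, a2):
    """Assumed first-order radio form: E_T = (a1 + a2 * D^2) * m."""
    return (a1 + a2 * distance ** 2) * m_bits


def reception_energy(m_bits, a3):
    """Assumed form: E_R = a3 * m."""
    return a3 * m_bits


def idle_energy(t_idle, a4):
    """Assumed form: E_I = a4 * t_I (energy drawn while the node waits)."""
    return a4 * t_idle


def total_energy(m_bits, distance, t_idle, a1, a2, a3, a4):
    """Total energy as the sum of the transmission, reception and idle components."""
    return (transmission_energy(m_bits, distance, a1, a2)
            + reception_energy(m_bits, a3)
            + idle_energy(t_idle, a4))


# Hypothetical parameter values (mJ-based units), chosen only to exercise the formulas.
print(total_energy(m_bits=1000, distance=10.0, t_idle=2.0, a1=1e-3, a2=1e-5, a3=1e-3, a4=0.5))
```

Under this assumed form, isolating infected nodes lowers the total because those nodes stop contributing transmission and reception terms.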
Previous work, namely HM-CN [6], noted that sensing and communication are the most energy-consuming tasks. Transmission and reception costs are high, especially for short-range communication.
Our proposed scheme addresses these drawbacks and provides a realistic estimate of energy consumption in networks. The proposed scheme is simulated for network sizes of up to N = 200 nodes (20, 40, ..., 200). The decision maker isolates malware-infected nodes, which are then not allowed to communicate or sense. Furthermore, packets are transmitted following the FIFO rule. As a result, the proposed scheme achieves low energy consumption.

Average Infection Rate
Infection rate is an important parameter in modelling malware dynamics and propagation. When modelling malware behaviour, the effect of each node's infection rate must be examined and the average infection rate computed for various network sizes. Simulations were run for varying network sizes, and Figure 8(a) illustrates the resulting infection rates. The infection rates of the proposed scheme are based on the scale (threshold) of malware prevalence, and the scheme is compared with the Dynamic Analysis and Control (DAC) scheme [6], Rumour Spreading Process-Scale Free Networks (RSP-SFN) [14] and Markov Random Field-Complex Communication Networks (MRF-CCN) [8]. A snapshot of the proposed vs. previous schemes in terms of infection rate is depicted in Figure 8(b).
The simulation results show that the proposed scheme yields fewer infections for a given number of nodes. The number of malware infections depends on the threshold α: if α is small, the number of infected hosts increases considerably. In Dynamic Analysis and Control (DAC) [6], the propagation control strategies did not perform well, which decreased the real-time immunity rate and increased the proportion of infected nodes. In Rumour Spreading Process-Scale Free Networks (RSP-SFN) [14], the density of infected nodes varied and increased under different vaccination rates, such as λ=0.  the nodes are not reliable for a long time. This increases the number of infected hosts. In our proposed scheme, reliability is computed at each time interval and also during packet transmission to monitor the infection rates of the nodes in each sub-net.

Node Mobility
In agent-based simulation modelling, node mobility is governed by three factors: movement detection, network connectivity or structure, and location tracking. To observe node mobility, performance was measured between iterations i and i + 1 (2 to 4 seconds apart) in the proposed scheme. When mobility rises above its threshold level, the hub fails, as detected by the decision maker, and data packet transfer times between intermediate nodes increase. Five simulations of the influence of node mobility on malware propagation were carried out with the proposed scheme, and Figure 10(a) plots node mobility in a malware-prone simulation against time for the five simulations. In Agent-based Simulation-Scale-Free Networks (ABS-SFN) [7], if node mobility increases beyond the threshold, the scale-free network may disconnect. The time the malware remains on the network and the malware outbreak in the sub-nets depend on the mobility rate, which strongly influences the spread of network malware. When the mobility rate is below the threshold value, the node in the sub-network dies. A performance comparison of node mobility between the proposed scheme and ABS-SFN [7] can be seen in Figure 10.
Figure 10. Malware effect on node mobility.

CONCLUSION AND FUTURE WORK
Agent-based modelling and simulation of complex networks is a challenging problem. In this paper, we developed a malware propagation model for IoT using an agent-based approach and deep-reinforcement learning on a scale-free network. In the modelled system, Susceptible-Infected-Immune-Recovered-Removed (SIIRR) transitions were formulated. The effect of malware propagation on the model was evaluated using performance metrics such as average energy consumption vs. number of nodes, average infections over time, node mobility over time period t and spreading/propagation speed. Our simulations showed that introducing a DRL-based decision maker results in a more versatile IoT model, in which malware propagation is not based on contact alone.
As future work, we intend to explore model stability analysis and the effect of immunization on different devices in IoT.The stability analysis will entail global and local model equilibrium.For the effect of immunization, we plan to incorporate mechanisms, such as targeted and proportional immunization, in the model.Employing immunization and quarantine mechanisms can offer a promising approach to make the model more realistic and resilient.

Figure 1. Scale-free Internet of Things networks.

Figure 2. SIIRR model transition diagram.
1) Susceptible (S): It is the first state of a node or hub, which may become infected in the future.

Figure 3. Scale-free Internet of Things (SFIoT) malware propagation model.
The agent's success can be estimated using a scalar reward signal. A Q-learning-based reinforcement learning algorithm solves the decision-making problem, where Q-learning estimates the quality of an action in a given state S at time t. The components of the approach are as follows.
Environment (E): The area in which agents communicate with each other.
Agents (A): In a given environment, an agent receives information and performs the corresponding action. The main goal of the agents is to pick the best policy, that is, the one that maximizes the total reward.
States (S): The condition defined by the agent characteristics within the defined transitions.
Actions (A): A transition from one state $S_t$ to another state $S_{t+1}$ at time t + 1.
Reward (R): It represents the closeness of the current state to the true class. It is formulated by Equation 6.

Figure 4 visually illustrates the deep-reinforcement learning approach adopted in the model. The Q-learning model is used to classify nodes into the five possible states. It specifies the transitions of nodes between states, S → I, I → R and R → S, where the recovered and removed states are terminal: nodes do not transition to another state after entering the removed or recovered state. This is represented as the SIIRR model and formulated mathematically in Equations (9) to (13).
The current speed and direction of a node under the Gauss-Markov mobility model are given by:
$$s_n = \alpha s_{n-1} + (1 - \alpha)\bar{s} + \sqrt{1 - \alpha^2}\, s_{x_{n-1}} \quad (24)$$
$$d_n = \alpha d_{n-1} + (1 - \alpha)\bar{d} + \sqrt{1 - \alpha^2}\, d_{x_{n-1}} \quad (25)$$
where $s_{x_{n-1}}$ and $d_{x_{n-1}}$ are random variables drawn from a Gaussian distribution. The simulation of the proposed scheme uses 200 nodes moving in a 5000 m × 5000 m rectangular region for 100 seconds of simulation time. These nodes are vehicles deployed along the road perimeters, and 20 sensors are used for sensing information.

Figure 6. Normal probability plots for network size = 60 nodes for (a) Energy consumption (b) Infection rates (c) Node mobility and (d) Propagation speed.

Propagation Speed
Propagation speed was computed based on the density of nodes. The network topology greatly affects the modelling of malware propagation on IoT-based communication networks, so characterizing the propagation speed, and understanding how it affects the network, is important. The network size was varied in each simulation, and the results of the five simulations are shown in Figure 9(a). The propagation speed of the proposed scheme was compared with those of the previous schemes for varied network sizes, as shown in Figure 10(b). In Agent-based Simulation-Scale-Free Networks (ABS-SFN) [7], the following analytical values were used for the parameters: α(k) = k⁻³, k = 1, 2, ..., n, β = 0.3, ε = 0.01, γ = 0.08 and µ = 0.008, together with a reproductive ratio R0 = 3.9245. As the density of infected nodes increases, the malware propagation speed also increases. The number of infected nodes increases in ABS-SFN, whereas in our proposed scheme the decision maker in each sub-net reduces the number of infected nodes by monitoring its sub-net to determine whether it has been affected by the malware.

Table 1. Simulation settings and parameters.

Table 2. Test on network size of 60 nodes.