Efficient cluster based intrusion detection in homogeneous and heterogeneous WSN

Wireless Sensor Networks (WSNs) offer an excellent opportunity to monitor environments and have many interesting applications in warfare. Intrusion detection in a WSN is of practical interest in many applications, such as detecting an intruder. Intrusion detection is defined as a mechanism for a WSN to detect the existence of inappropriate, incorrect, or anomalous moving attackers. In this paper, I consider a cluster-based architecture under two WSN models: homogeneous and heterogeneous WSNs. Furthermore, I derive the detection probability by considering two sensing models: single-sensing detection and multiple-sensing detection. The intrusion detection model tracks and detects intrusion in homogeneous and heterogeneous WSNs using the intrusion distance and the detection probability under various tracking and detection models.


INTRODUCTION
especially if the WSN is deployed in a hostile environment. It is also important that the WSN be robust to losing some of the sensor nodes, because it can be very easy for an adversary to capture any given node [5]. The general network topology is a dense collection of nodes, randomly distributed over some geographic area. Traffic typically goes from all the sensor nodes to a single sink, called the base station (BS), or is broadcast. A lot of work is currently being done on routing protocols, and not all of the details have been worked out and agreed upon, but in general routing is multi-hop, as in an ad hoc wireless network. Cluster-based routing is a popular idea, because it can exploit the fact that nearby nodes have highly correlated data [1]. In cluster-based routing, the network is divided into clusters, each of which consists of a cluster head (CH) and member nodes (MNs). The MNs send their data to the CH, which aggregates the data before sending it out of the cluster toward the base station. In this paper, I assume that the WSN is in a hostile environment, where the deployers have no physical contact with the nodes, but attackers may. An attacker is someone who tries to disturb the functionality of the network in any way. I will also talk about "intrusion detection". This is defined as identifying an intruder, which is an attacker who has gained control of a node or has injected falsified or repeated packets into the network. This is not to be confused with other "intrusion detection" systems using WSNs, which monitor a physical environment, looking for intruders using a WSN for sensing and collecting information. In this paper, I explain the special circumstances presented by WSNs, summarize work that has been done on security for WSNs, and propose a new cluster-based approach for intrusion detection. In Section 3, I classify the types of attacks on WSNs. In Section 4, I develop the intrusion detection model. In Section 5, I analyze intrusion detection in a homogeneous WSN, and in Section 6 I examine intrusion detection in a heterogeneous WSN. In Section 7, I discuss the overall steps of the cluster-based architecture. In Section 8, I show a simulation of one part of this paper. Finally, I conclude the paper in Section 9.

Related Works
Intrusion detection is one of the critical applications in WSNs, and several approaches for intrusion detection in homogeneous WSNs have recently been presented [8][9][10]. These approaches focus on effectively detecting the presence of an intruder. First, the problem has been investigated from the aspect of the network architecture. Kung et al. [9] take advantage of a hierarchical tree structure to effectively track the movement of an intruder. The hierarchical tree consists of connected sensors and is built upon expected properties of intruder mobility patterns (i.e., the intruder's movement frequency over a region). The hierarchical tree allows efficient recording of the intruder's movement information and supports fast querying from the base station. Another tree structure for tracking an intruder, called a logic object tracking tree, was developed by Lin et al. [10]. The logic object tracking tree reduces the communication cost of data updating and querying by taking into account the physical network topology. In particular, the logic object tracking tree aims to balance the cost of updating and querying so as to minimize the total communication cost.
Secondly, the intrusion detection problem has been considered under the constraint of saving network resources. For example, Chao et al. [11] addressed the issue of tracking a moving intruder through power-conserving operations and sensor collaboration. To achieve this, the authors defined a set of novel metrics for detecting a moving intruder and developed two efficient sleep-awake schemes, called PECAS and MESH, to minimize the power consumption. Ren et al. [8] further studied the trade-off between the network detection quality (i.e., how fast the intruder can be detected) and the network lifetime. The sensor coverage can therefore be carefully designed according to the detection probability with respect to specific application requirements on operating time. The authors then proposed three wave sensing scheduling protocols to achieve the bounded worst-case detection probability. Rather than a static WSN architecture as in the above approaches, Liu et al. [12] modeled the intrusion detection problem in a mobile WSN where each sensor is capable of moving. The authors gave the optimal strategy for fast detection and showed that a mobile WSN improves its detection quality due to the mobility of sensors.

Classification of Attacks
There are four aspects of a wireless sensor network that security must protect: 1) confidentiality, 2) data integrity, 3) service availability, and 4) energy. The first three are addressed by security systems in wired networks and non-energy-constrained wireless networks, but the fourth is unique to the sensor network application. I will thus classify attacks according to which of these aspects they target. In this section, I briefly describe the kinds of attacks in each of these categories and the ways in which the nature of the wireless network causes trouble.

Stealing Data (Confidentiality)
When we think of electronic security, this is the first kind of attack that comes to mind. We want to be able to send messages without enemies being able to figure out the contents. Because of the wireless nature of the WSN, it is easy for an attacker to listen in on all the messages sent in the network, so to maintain confidentiality, the network must encrypt all the messages.
One of the biggest ideas in encryption today is public-key encryption. This is very powerful because it allows one to receive encrypted messages without even sharing a secret key with the sender. But this asymmetry comes at a cost. RSA public-key encryption involves exponentiating the message, which can be quite computationally expensive and is not really feasible in WSNs, where the nodes typically are not capable of doing such computations in a time- and energy-efficient way. Symmetric-key encryption mechanisms, however, pose the problem that if any node is captured by the attacker, the attacker can read the key from its memory and can then masquerade as any other node (compromising data integrity) and listen in on any other conversation.

Altering/Generating False Data (Data Integrity)
Because sensor networks are used to monitor some environment, data integrity is even more important than confidentiality. This is because applications may include tracking objects that physically move through the environment, and since attackers can typically see the physical environment, the only thing they could gain from listening in on the data is a sense of where the sensors are located. On the other hand, if they are able to alter the data so that the data collected by the WSN is incomplete or incorrect, the deployer of the WSN will not know what is really going on in the environment being monitored.
In other networks, the same asymmetric key system that is used for encryption can be used for digital signatures, but this requires a lot of additional overhead. The signature may consist of a lot of additional bytes of data added on to a transmission (which takes additional energy), and verifying the signature can be very computationally expensive. Clearly, different techniques are needed for WSNs.

Attacks on Service Availability
This class of attacks is not at all concerned with the actual data that is being sent. Rather, the goal is to make the network not function properly. This can be done by sending bogus routing information (for example, advertising a route that does not exist). It can also be done by flooding the network with packets (a denial-of-service attack), or even by jamming the frequency at the physical layer. Another interesting type of attack is homing. In a homing attack, the attacker looks at network traffic to deduce the geographic location of critical nodes, such as cluster heads or neighbors of the base station. The attacker can then physically disable these nodes. This leads to another type of attack: the "black hole" attack. In a "black hole" attack, the attacker compromises all the neighbors of the base station, making it effectively a black hole. A final kind of attack on service availability is a desynchronization attack, where the attacker tries to disrupt a transport-layer connection by forging packets from either side.

Denial of Sleep Attacks (Energy)
The constrained energy of WSNs adds a new element that can greatly complicate security issues. Because there is a limited amount of energy available and no way to replenish it, it is not sufficient to make sure that bad data is not used. We also need to make sure that we do not waste energy listening to or re-transmitting bad packets. This introduces a whole new set of possible attacks. These include constantly sending RTS packets to stop nodes from going into a low-power "sleep" state, sending falsified or repeated packets so that nodes waste energy re-transmitting them, or draining the power of a node by forcing it to do excessive computations [3].

Detection Model
There are two detection models in terms of how many sensors are required to recognize an intruder: the single-sensing detection model and the multiple-sensing detection model. The intruder is said to be detected under the single-sensing detection model if it can be identified using the sensing knowledge of one single sensor. In contrast, in the multiple-sensing detection model, the intruder can only be identified using the cooperative knowledge of at least k sensors (k is defined by specific application requirements). For simplicity of expression, multiple-sensing and k-sensing are used interchangeably in the following discussion.
In order to evaluate the quality of intrusion detection in WSNs, we define three metrics as follows.

Intrusion Distance
The intrusion distance, denoted by $D$, is the distance that the intruder travels before it is detected by a WSN for the first time. Specifically, it is the distance between the point where the intruder enters the WSN and the point where the intruder is detected by any sensor(s). Following the definition of intrusion distance, the Maximal Intrusion Distance (denoted by $\xi$, $\xi > 0$) is the maximal distance the intruder is allowed to move before it is detected by the WSN.

Detection Probability
The detection probability is defined as the probability that an intruder is detected within a certain intrusion distance (e.g., the Maximal Intrusion Distance $\xi$).

Average Intrusion Distance
The average intrusion distance is defined as the expected distance that the intruder travels before it is detected by the WSN for the first time.

Intrusion Strategy Model
We consider two intrusion strategies for the movement of the intruder in a WSN, as shown in Fig. 1 and Fig. 2. If the intruder (say, a panzer) already knows its destination before entering the network domain, it follows the shortest path to the destination. In this case, the intrusion path is a straight line (D1) from the entering point to the destination, as illustrated in Fig. 1. The main idea behind this strategy is that straight movement carries the least risk for the intruder, because following a straight line (i.e., the shortest path) toward the destination minimizes the area that it has to explore. The corresponding intrusion detection area S1 is determined by the sensors' sensing range rs and the intrusion distance D1, as shown in Fig. 1, because the intruder can be detected within the intrusion distance D1 by any sensor(s) situated within the area S1.
In contrast, if the intruder does not know its destination, it moves through the network domain in a random fashion. We assume that the intruder aims to minimize the overlap in its path. Thus, the intrusion path of the intruder can be regarded as a non-overlapping curved line (D2), and the intrusion area is accordingly a curved band S2, as illustrated in Fig. 2.
In the above two strategies, if the intruder travels the same distance, i.e., D1 = D2, the corresponding intrusion detection areas approximately satisfy S1 = S2. Therefore, we adopt a straight path in the following discussion, and the analytical results can be directly applied to the case of the curved path. Furthermore, the start point of the intruder can be located on the network boundary or at a random point inside the network domain. For example, the intruder can be dropped from the air and start from any point in the network domain.

Intrusion detection in homogeneous WSN
In this section, we present the analysis of the intrusion detection probability of a homogeneous WSN under the single-sensing detection and multiple-sensing detection models.

Single Sensing Homogeneous Network
Since all the nodes are identical, the main design objective is to guarantee a certain network lifetime (in terms of the number of data-gathering cycles) and at the same time to ensure that all the nodes expire at about the same time, so that very little residual energy is left behind when the network expires. Hence LEACH uses random and periodic rotation of the cluster heads for load balancing. Role rotation also ensures that a node located near the periphery of a cluster is nearer to the cluster head at some other time. Since each node has to be capable of acting as a cluster head, each node needs hardware capable of performing long-range transmissions to the remote base station, complex data computations (if required), and coordination of MAC and routing within a cluster. Since all the nodes are capable of acting as a cluster head, the failure of a few nodes does not seriously affect the working of the scheme. Thus the system is robust to node failures.
In the single-sensing detection model, the intruder can be recognized once it moves into the sensing coverage disk of any sensor. According to the intrusion strategy, the intruder may enter the network domain from any point on the network boundary or from a random point inside the network domain. When the intruder starts from a point on the network boundary, as shown in Fig. 3, given an intrusion distance $D \ge 0$, the corresponding intrusion detection area $S_D$ is almost an oblong area. This area consists of a rectangle with length $D$ and width $2r_s$, with a half disk of radius $r_s$ attached to it, so that $S_D = 2 D r_s + \pi r_s^2 / 2$ [3,8].
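As a numerical illustration of the detection area above, the following minimal sketch evaluates the area of the rectangle plus half disk. It additionally assumes that sensors are deployed as a homogeneous Poisson point process with density `lam`; this deployment model and the function names are assumptions made for illustration and are not stated in the text. Under that assumption, the probability that at least one sensor lies in the detection area is one minus the probability that the area is empty.

```python
import math


def detection_area(d, r_s):
    """Area of a rectangle (length d, width 2*r_s) plus a half disk of radius r_s."""
    return 2.0 * d * r_s + math.pi * r_s ** 2 / 2.0


def single_sensing_probability(d, r_s, lam):
    """P[at least one sensor in the detection area].

    ASSUMPTION: sensors follow a Poisson point process with density lam,
    so the number of sensors in an area S is Poisson with mean lam * S.
    """
    return 1.0 - math.exp(-lam * detection_area(d, r_s))


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    for d in (0.0, 10.0, 50.0):
        print(d, single_sensing_probability(d, r_s=5.0, lam=0.001))
```

With these definitions the probability grows with the intrusion distance, since a longer path sweeps a larger area.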

K-Sensing Detection in Homogeneous WSN
In the k-sensing detection model, an intruder has to be sensed by at least k sensors for intrusion detection in a WSN [4]. The number of required sensors depends on the specific application. For example, the sensing information of at least three sensors is required to determine the location of the intruder. Let $p_k[D = 0]$ be the probability that an intruder is detected immediately once it enters a WSN with node density $\lambda$ and sensing range $r_s$ in the k-sensing detection model, and let $p_k[D \le \xi]$ be the probability that the intruder is detected within the maximal intrusion distance $\xi$ in the k-sensing detection model for the given homogeneous WSN. Let $S_\xi$ be the intrusion detection area with respect to the maximal intrusion distance $\xi$. If there are at least k sensors in the area $S_\xi$, the intruder can be sensed by these k sensors, and the k sensors can collaborate with each other to recognize the intruder. From (3), let $P(i, S_\xi)$ denote the probability that exactly $i$ sensors are located in the area $S_\xi$. Then $\sum_{i=0}^{k-1} P(i, S_\xi)$ is the probability that fewer than k sensors are located in the area, and its complement is the probability that at least k sensors are located in the area $S_\xi$. If this is the case, the intruder can be sensed by at least k sensors of the WSN before it travels a distance of $\xi$. Finally, the probability that the intruder is detected within the maximal intrusion distance $\xi$ in the k-sensing detection model can be derived as

$$p_k[D \le \xi] = 1 - \sum_{i=0}^{k-1} P(i, S_\xi).$$

Fig. 3:
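The complement-of-a-sum argument for k-sensing detection can be sketched numerically. The example below assumes a Poisson deployment with density `lam`, so that the probability of finding exactly i sensors in an area is a Poisson probability; this distribution, the symbol names `xi` and `lam`, and the function names are assumptions for illustration. The detection area reuses the rectangle-plus-half-disk expression from the single-sensing analysis.

```python
import math


def poisson_pmf(i, mean):
    """Probability of exactly i events for a Poisson variable with the given mean."""
    return mean ** i * math.exp(-mean) / math.factorial(i)


def k_sensing_probability(k, xi, r_s, lam):
    """P[at least k sensors lie in the detection area for intrusion distance xi].

    ASSUMPTION: sensor counts in an area S are Poisson with mean lam * S.
    """
    area = 2.0 * xi * r_s + math.pi * r_s ** 2 / 2.0
    mean = lam * area
    # Probability that fewer than k sensors fall inside the area.
    fewer_than_k = sum(poisson_pmf(i, mean) for i in range(k))
    return 1.0 - fewer_than_k


if __name__ == "__main__":
    # Hypothetical parameters: requiring more sensors lowers the probability.
    for k in (1, 2, 3):
        print(k, k_sensing_probability(k, xi=50.0, r_s=5.0, lam=0.005))
```

For k = 1 the expression reduces to the single-sensing case, which is a convenient sanity check.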

Intrusion detection in Heterogeneous WSN

Single Sensing Heterogeneous Networks
Heterogeneous sensor networks use two or more types of nodes with different functionalities. For example, the authors in [4] propose using two types of nodes: type 0 nodes, which act as pure sensor nodes, and type 1 nodes, which act as the cluster head nodes. Some of the salient features of such networks are:
1. Since the cluster head nodes are predetermined, and the sensor nodes use single-hop communication to reach the cluster head nodes, the sensor nodes near the periphery of the cluster have the highest energy expenditure among all the sensor nodes. It is this worst-case energy expenditure that has to be taken into account in dimensioning the battery energy. Thus energy is wasted in the form of residual battery energy in the sensor nodes that are near the cluster heads.
2. Since only the cluster head nodes bear the responsibility of transmitting to the distant base station, the rest of the nodes can be designed with simple hardware that enables short-range communication. Thus the hardware complexity is limited to only a few nodes.
3. A cluster head node serves as the fusion point, as well as the command center, of its cluster. As a result, when a cluster head node fails, all the sensor nodes in that cluster have to be re-assigned to neighboring clusters. In the extreme case, it is possible that all the cluster head nodes fail, thereby bringing down the entire network. Thus the system is less robust to node failure than a homogeneous sensor network.

K-Sensing in a Heterogeneous WSN
In the k-sensing detection model of a heterogeneous WSN with two types of sensors, at least k sensors are required to detect an intruder. These k sensors can be any combination of Type I and Type II sensors.
A Type I sensor has a larger sensing range $r_{s1}$ as well as a longer transmission range $r_{x1}$, and a Type II sensor has a smaller sensing range $r_{s2}$ as well as a shorter transmission range $r_{x2}$.
For instance, if three sensors are required to detect an intruder for a specific application, the intruder can be detected by any of the following sensor combinations:
1. Three Type I sensors,
2. Three Type II sensors,
3. One Type I sensor and two Type II sensors, and
4. Two Type I sensors and one Type II sensor.

[Figure: Trace and Draw Graph, End-End Delay, Watch Dog, Probability, Performance]
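The sensor combinations for a given k can be enumerated mechanically, as in the minimal sketch below. The second function goes one step further and sums over all combinations with fewer than k sensors in total to obtain a detection probability; it assumes that the numbers of Type I and Type II sensors in their respective detection areas are independent Poisson counts with means `mean_1` and `mean_2`. These parameter names and the Poisson assumption are hypothetical and are not taken from the text.

```python
import math


def sensor_combinations(k):
    """All (type_I_count, type_II_count) pairs that add up to exactly k sensors."""
    return [(n1, k - n1) for n1 in range(k, -1, -1)]


def poisson_pmf(i, mean):
    """Probability of exactly i events for a Poisson variable with the given mean."""
    return mean ** i * math.exp(-mean) / math.factorial(i)


def hetero_k_sensing_probability(k, mean_1, mean_2):
    """P[N1 + N2 >= k] for independent counts N1 (Type I) and N2 (Type II).

    ASSUMPTION: N1 and N2 are Poisson with means mean_1 and mean_2.
    """
    # Sum over every pair (i, j) with i + j <= k - 1, i.e. too few sensors.
    fewer = sum(
        poisson_pmf(i, mean_1) * poisson_pmf(j, mean_2)
        for i in range(k)
        for j in range(k - i)
    )
    return 1.0 - fewer


if __name__ == "__main__":
    print(sensor_combinations(3))
    print(hetero_k_sensing_probability(3, mean_1=0.7, mean_2=1.1))
```

For k = 3 the enumeration yields four pairs, matching the four combinations listed for the three-sensor example.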

Cluster-Based Security
As we have seen in the G-MAC example, clusters can provide major advantages in sensor network security. In the case of G-MAC, we let the GS be the CH. The CH can also monitor the traffic coming from each MN and determine whether any of them have been compromised. It can then blacklist these nodes, isolating them from the network. In case a CH is compromised, MNs must also have the ability to decommission the CH if enough MNs agree to do so [7]. This will defend against homing attacks. It is critical that several nodes agree to decommission the cluster head, because if only a few nodes are compromised, they should not be able to take down the cluster head. When a node is removed, its transmissions will be ignored, and nothing will be sent to the node. However, it will still be able to hear and understand broadcast traffic. This can be fixed with a slight modification to μTESLA in which the function $F(\cdot)$ is not fixed at deployment time, but is rather a function that takes two arguments: a cluster key and a key for the broadcast, $K_n$. We can then generate keys using $K_n = F(K_{n+1})$ (3). When a node is removed, the CH can generate a new cluster key and inform each of the remaining nodes individually. An analysis of this protocol modification may be included in future work. When a node is monitoring another node, there are two ways in which it can detect misbehavior: anomaly detection and signature detection [4]. In anomaly detection, the monitoring node looks for deviations from typical behavior. This has a high probability of false alarm, so it is not often used [4]. Signature detection, on the other hand, looks for particular types of misbehavior. This leaves it susceptible to new, creative types of attacks, but there are not really many new actions a misbehaving sensor node could take. The typical actions in this application would be dropping packets, duplicating packets, or causing collisions [6].
We discussed the mechanisms for intrusion detection in Sections 5 and 6, but while we know how to verify packets, we still need to determine the best place to do the checking. It turns out that the CH is the right place to check for authenticity, as discussed in [4]. The simplest idea for intrusion detection would be to have all nodes monitor each other in promiscuous mode, meaning they listen in on all transmissions even if they are not the recipient, but this wastes far too much energy. The other extreme would be to check the packets only at the end (at the base station or its neighbors), but by waiting to check the packet, we run the risk of transmitting a bad packet far too many times, consuming valuable resources. The packet should be checked only a few times, and early in the path. The authors of [4] suggest that this should occur at the cluster head. All packets should first be sent to the cluster head, which checks their authenticity before forwarding them to the rest of the network. But a malicious node can avoid this by neglecting to send its packet to the cluster head first, instead sending it directly to another cluster. To compensate for this, every node should have a probability, p, of sending a packet to its cluster head to be checked. The higher p is, the more energy is consumed in checking and the longer the route becomes, but the more likely we are to catch malicious packets. Selecting the proper p may possibly be done using the method.
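The trade-off in choosing p can be made concrete with a small sketch. It assumes that each forwarding node on a path of `hops` nodes decides independently, with probability `p`, to send the packet to its cluster head, and that a check always catches a malicious packet. Both assumptions, and the function names, are introduced here for illustration only.

```python
def catch_probability(p, hops):
    """P[packet is checked at least once] over `hops` independent forwarding nodes.

    ASSUMPTION: each node independently sends the packet for checking with
    probability p, and a check always detects a malicious packet.
    """
    return 1.0 - (1.0 - p) ** hops


def expected_checks(p, hops):
    """Expected number of cluster-head checks along the path (a proxy for energy cost)."""
    return p * hops


if __name__ == "__main__":
    # Hypothetical path length of 6 hops: higher p catches more but costs more.
    for p in (0.1, 0.3, 0.5):
        print(p, catch_probability(p, 6), expected_checks(p, 6))
```

Under these assumptions the detection benefit saturates as p grows while the checking cost keeps rising linearly, which is one way to see why a moderate p may be preferable.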

The Clustering Algorithm
The clustering algorithm serves to mitigate the problems associated with a low cost wireless network. The algorithm begins with the assumption that each node has been assigned a unique identification number in the pre-deployment phase. These unique IDs will be appended to any message passed through the network so that each successive node knows which node sent the received message. A flow-diagram of the clustering algorithm is given in Appendix B. Before beginning a discussion of the clustering algorithm used, the reader should understand that the following outlines the actions that the wireless sensor network takes after event detection. The clustering algorithm is not, however, responsible for event detection itself, as this could be achieved in a variety of ways. The mission profile to which the WSN is tailored in the pre-deployment phase would ensure proper event detection.
Upon activation the nodes begin the declaration phase, in which each node broadcasts its unique ID to all other nodes within range. When a node receives the declaration transmission from its neighbors, it records the nodes within its immediate surroundings. Next, the nodes begin to establish a hierarchy for message handling by assigning themselves different classes of nodes.

LEACH: Low-Energy Adaptive Clustering Hierarchy
LEACH is a self-organizing, adaptive clustering protocol that uses randomization to distribute the energy load evenly among the sensors in the network. In LEACH, the nodes organize themselves into local clusters, with one node acting as the local base station or cluster-head. LEACH includes randomized rotation of the high-energy cluster-head position such that it rotates among the various sensors in order to not drain the battery of a single sensor. In addition, LEACH performs local data fusion to "compress" the amount of data being sent from the clusters to the base station, further reducing energy dissipation and enhancing system lifetime.
Sensors elect themselves to be local cluster-heads at any given time with a certain probability. Each node makes its decision about whether to be a cluster-head independently of the other nodes in the network and thus no extra negotiation is required to determine the clusterheads. These cluster head nodes broadcast their status to the other sensors in the network. Each sensor node determines to which cluster it wants to belong by choosing the cluster-head that requires the minimum communication energy. Once all the nodes are organized into clusters, each cluster-head creates a schedule for the nodes in its cluster. This allows the radio components of each non-cluster head node to be turned off at all times except during its transmit time, thus minimizing the energy dissipated in the individual sensors. Once the cluster head has all the data from the nodes in its cluster, the cluster-head node aggregates the data and then transmits the compressed data to the base station.
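The self-election and cluster-joining steps described above can be sketched as follows. This is a simplified illustration, not the full LEACH protocol: it uses a fixed election probability `p_head` rather than a rotating threshold, uses Euclidean distance as a stand-in for the minimum communication energy, and forces one head if nobody elects itself. All names and these simplifications are assumptions.

```python
import math
import random


def leach_round(positions, p_head, rng):
    """One simplified clustering round.

    positions: dict node_id -> (x, y); p_head: self-election probability;
    rng: a random.Random instance. Returns dict head_id -> list of member ids.
    """
    # Each node decides independently whether to become a cluster head.
    heads = [node for node in positions if rng.random() < p_head]
    if not heads:
        # ASSUMPTION: fall back to one arbitrary head so every node has a cluster.
        heads = [rng.choice(sorted(positions))]
    clusters = {head: [] for head in heads}
    for node, location in positions.items():
        if node in clusters:
            continue
        # Distance is used here as a proxy for communication energy.
        nearest = min(heads, key=lambda h: math.dist(location, positions[h]))
        clusters[nearest].append(node)
    return clusters


if __name__ == "__main__":
    rng = random.Random(7)
    nodes = {i: (rng.uniform(0, 100), rng.uniform(0, 100)) for i in range(20)}
    for head, members in leach_round(nodes, 0.1, rng).items():
        print(head, members)
```

Calling `leach_round` again with the same generator produces a different set of heads, which mimics the randomized rotation of the cluster-head role across rounds.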
The system can determine, a priori, the optimal number of clusters to have in the system. This will depend on several parameters, such as the network topology and the relative costs of computation versus communication. If there are fewer than the optimal number of cluster-heads, some nodes in the network have to transmit their data very far to reach the cluster-head, causing the global energy in the system to be large. If there are more than the optimal number of cluster-heads, the distance nodes have to transmit to reach the nearest cluster-head does not reduce substantially, yet there are more cluster-heads that have to transmit data over the long-haul distances to the base station, and there is less compression being performed locally. In addition to reducing energy dissipation, LEACH successfully distributes energy usage among the nodes in the network such that the nodes die randomly and at the same rate.

Node Classes
Within the wireless sensor network, there are four (really three) different types of nodes: regular nodes, gateways, gateway pairs, and cluster heads (CHs). After the nodes have declared themselves and received declarations from their neighbors, they then "vote" by broadcasting the highest node ID number received. Any node that receives a vote is promoted to cluster head. For example, if nodes 1, 3, 4 and 9 are all within range of each other, all four nodes would broadcast a vote for node 9 (including node 9 itself). It is important to note that these MicaZ nodes have a limited range compared to the size of the field; thus in any given field, there will be multiple cluster heads. If any single node is within range of two or more cluster heads, it must be a gateway. This is due to the fact that any node receiving a nomination must become a cluster head. Gateways are the nodes that link together two cluster heads that are out of range of each other.
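The declaration and voting steps can be sketched as a small function over a neighbor table. The sketch assumes symmetric radio links, counts a node's own ID when it picks the highest ID heard (consistent with node 9 voting for itself in the example), and omits gateway pairs for brevity. The function name and data layout are hypothetical.

```python
def classify_nodes(neighbors):
    """Assign a role to every node.

    neighbors: dict node_id -> set of node ids within radio range
    (ASSUMPTION: links are symmetric). Returns dict node_id -> role string.
    """
    # Each node votes for the highest ID it knows of, its own ID included.
    votes = {node: max(nbrs | {node}) for node, nbrs in neighbors.items()}
    # Any node that receives at least one vote becomes a cluster head.
    heads = set(votes.values())
    roles = {}
    for node, nbrs in neighbors.items():
        if node in heads:
            roles[node] = "cluster head"
        elif len(nbrs & heads) >= 2:
            # Within range of two or more cluster heads.
            roles[node] = "gateway"
        else:
            roles[node] = "regular"
    return roles


if __name__ == "__main__":
    # The example from the text: nodes 1, 3, 4 and 9 all hear each other.
    clique = {1: {3, 4, 9}, 3: {1, 4, 9}, 4: {1, 3, 9}, 9: {1, 3, 4}}
    print(classify_nodes(clique))
```

In the four-node example every node votes for node 9, so node 9 is the only cluster head and the others remain regular nodes.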

Node Clustering
In a WSN, the communication cost is often several orders of magnitude higher than the computation cost [13]. Therefore, minimizing the energy consumed in communication and maximizing the system lifetime has been a major design goal for WSNs. Cluster-based hierarchical architecture has been emerging as an essential paradigm for minimizing communication cost in WSNs [14].
Node clustering decomposes the entire network into multiple clusters to form a cluster-based WSN. It results in a three-layer hierarchical architecture in the WSN [15], in which each regular/member sensor (MS, at the bottom level) transmits data to its cluster head (CH, at the middle level) instead of to the faraway base station (BS, at the top level). This communication paradigm reduces the number of sensors that participate in long-distance communication and thereby improves the energy efficiency of the overall system. Clustering is performed by assigning each sensor to a CH so as to optimize resource allocation and spatial reuse [16]. Here, we investigate two popular clustering technologies, k-clustering and balanced clustering, and compare their energy efficiency in lattice WSNs.
k-clustering is a framework in which the WSN is divided into non-overlapping sub-networks (i.e., clusters), where each member sensor in a cluster is at most k hops away from the CH [17].

Fig. 5: Graphical representation of two cluster heads linked by a gateway pair
It then becomes mathematically trivial to demonstrate that the only other possible case is that two nodes vote for two different cluster heads that cannot "hear" each other. As a result, these two nodes in the middle become a gateway pair. Nodes that are neither gateways nor cluster heads are simply regular nodes. At this point, the wireless sensor network has obtained a fully (self) realized field.
Fig. 6 illustrates the cluster instances formed by 2-clustering technology for lattice WSNs. The dark/red nodes represent the CHs, the blank nodes denote their corresponding member sensors, and the grey nodes represent member sensors belonging to other clusters. In all three cases, the CH is located at the center of each cluster, and the number of member sensors in each cluster for the square, triangle, and hexagon patterns is 13, 19, and 10 respectively. Furthermore, the resulting cluster for a WSN deployed in the square pattern is still square, while that for WSNs deployed in the triangle and hexagon patterns is hexagonal. All these regular shapes can be repeated over a continuous WSN domain. Similar observations can be made from Fig. 7 for 1-clustering technology in lattice WSNs.
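The cluster sizes quoted for 2-clustering can be checked with a breadth-first search that counts the nodes within k hops of a cluster head on each lattice. The coordinate systems below (axial coordinates for the triangle pattern, a "brick wall" layout for the hexagon pattern) are representation choices assumed for this sketch. The counts it produces include the cluster head itself, and under that convention they agree with the figures 13, 19, and 10.

```python
from collections import deque


def square_neighbors(node):
    """Square lattice: four neighbors per node."""
    x, y = node
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def triangle_neighbors(node):
    """Triangular lattice in axial coordinates: six neighbors per node."""
    x, y = node
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1),
            (x + 1, y - 1), (x - 1, y + 1)]


def hexagon_neighbors(node):
    """Honeycomb lattice in brick-wall coordinates: three neighbors per node."""
    x, y = node
    vertical = (x, y + 1) if (x + y) % 2 == 0 else (x, y - 1)
    return [(x + 1, y), (x - 1, y), vertical]


def cluster_size(neighbors_of, k, head=(0, 0)):
    """Number of nodes within k hops of the head, head included (BFS)."""
    hops = {head: 0}
    queue = deque([head])
    while queue:
        node = queue.popleft()
        if hops[node] == k:
            continue  # do not expand beyond k hops
        for nxt in neighbors_of(node):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return len(hops)


if __name__ == "__main__":
    for name, fn in (("square", square_neighbors),
                     ("triangle", triangle_neighbors),
                     ("hexagon", hexagon_neighbors)):
        print(name, cluster_size(fn, 2))
```

The same function with k = 1 gives the corresponding sizes for 1-clustering on each lattice.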

Balanced Clustering
The balanced clustering technology aims at grouping neighboring sensors such that each cluster is balanced in terms of the number of sensors.
The existing balanced clustering method is based on min-weight matching, which minimizes the total spatial distance between the member sensors and the CH in order to reduce the energy consumed in communication [18]. This scheme is also efficient in balancing the overall system load.
Suppose that about Nc = 25 adjacent sensors should be grouped together to form a cluster in lattice WSNs. Fig. 2.9 shows the cluster instances resulting from balanced clustering with a cluster size of 25. It is clear that the resulting clusters also have regular shapes: square, rectangular, and hexagonal for the square, triangle, and hexagon pattern based deployments respectively. They can be repeated over the entire network domain.

Simulation and verification
To validate the analytical and simulation models, we have carried out simulations using the ns-2 simulator.
As expected, Fig. 9 shows that the intrusion detection probability in the heterogeneous WSN increases at a much faster rate than in the homogeneous WSN as the number of Type I sensors is increased. Especially in the more demanding multiple-sensing detection case, the intrusion detection probability increases even more quickly in the heterogeneous WSN than in the homogeneous case.