Black Hole and Sink Hole Attack Detection in Wireless Body Area Networks

Abstract: In Wireless Body Area Networks (WBANs) for health care, sensors are positioned inside the body of an individual and periodically transfer sensed data to a central station. Black hole and sink hole attacks pose major challenges to healthcare WBANs: sink hole or black hole nodes attract data from the deployed sensor nodes by claiming to offer the shortest path. Identifying these attacks is challenging, and the stakes are high, as a small variation in medicine intake may result in a severe illness. This work proposes a hybrid attack detection framework that applies a Proportional Coinciding Score (PCS) and an MK-Means algorithm, a well-known machine learning technique, to raise attack detection accuracy and decrease computational complexity in the treatment of heart and respiratory conditions. First, the feature count of the gathered training data is reduced through data pre-processing in the PCS. Second, the pre-processed features are sent to the MK-Means algorithm for training and classification. Third, attack detection measures supplied by the intrusion detection system, such as the number of data packets transmitted and received, are evaluated by the MK-Means algorithm. This study demonstrates that the MK-Means framework yields a high detection accuracy with a low packet loss rate, low communication overhead, and reduced end-to-end delay in the network and improves the accuracy of biomedical data.


Introduction
Wireless sensor networks (WSNs) have been one of the most active research areas in the past few years. In WSNs, sensor nodes cooperate with one another by sensing their deployed surroundings and reporting updates to the Sink. A sink hole attack attracts the network traffic of neighboring nodes by advertising itself as owning the shortest route to the Sink.
Based on the existing route, the sink hole node attempts to draw the stream of traffic through a specific area. The rest of the nodes in the network then use this malicious node's path to exchange their data. Given that the affected nodes depend on a malicious node for communication, the sink hole attack easily opens the way for other kinds of attacks, such as gray hole and black hole. A distributed adaptive framework is proposed in [1] on the basis of the subjective opinion and probabilistic logic extension of scheduled automata to identify the likelihood of each sensor node being attacked by a sink hole. Fig. 1 shows the model of the sink hole attack.

Figure 1: Sink hole attack
As presented in Fig. 1, the red sensors represent the compromised nodes, whereas the blue sensors represent the normal sensor nodes, with "S_hole" and "S_node" being the sink hole and sink node, respectively. In [1], subjective opinion models at the Sink evaluate the probabilistic logic iteratively. This evaluation is conducted according to node behaviors, such as the positive and negative observations collected by the distributed sensors, which are adaptively tuned on the basis of timed automata. The timed automata capture the entire network behavior at runtime in the Sink. Given that the routing paths are selected from reliable nodes, the packet loss ratio is reduced.
To determine the discriminative features of other attacks in Wireless Body Area Networks (WBANs), such as black hole and worm hole, the overlap between positive and negative observations is an important criterion to analyze. Malicious nodes exploit the holes that arise during the route discovery process to carry out their malicious intent. One such familiar attack is the black hole attack, in which a node sends fake routing information to a source node and, after placing itself on the path between the source and destination nodes, drops all the data. A malicious node sends a fake reply to the start node claiming that it possesses the shortest route to the end node (i.e., a malicious route reply). The start node establishes a path with this malicious node by sending data packets to it, and the malicious node, in turn, discards the transmitted data packets. Fig. 2 illustrates a sample scenario of the black hole attack.
In Fig. 2, "S_a" denotes the start node (i.e., a sensor), whereas "S_e" denotes the end node; a malicious route reply is obtained from sensor "S_d" and normal route replies from "S_b," "S_f," and "S_c" in response to the route request placed by "S_a." An effective security algorithm against the black hole attack is given in [2], which identifies a secure path and improves data packet processing with minimum end-to-end (E2E) delay and routing overhead. Such an improvement is achieved using a threshold value built on the request paths (P_REQ) sent and reply paths (P_REP) received. However, the existing solutions provided in [3] exclude solutions for other attacks. The sink hole attack in mobile ad hoc networks (MANETs) is distinguished using multi-hop routes from the start node to the end node, as investigated in [4]. A unique cooperative cross-layer approach is presented in [5] to discover the black hole attack by using multipoint relays for establishing the route. Black hole and sentinel attacks are addressed in [6] on the basis of a decision process that includes the Bayes rule and a simple threshold approach. The effect of a cooperated node on the zone routing protocol is examined in [7] to mitigate the black hole attack, resulting in improved packet delivery ratio, throughput, and E2E delay. Although improvement is attained, authentication remains a major issue. To resolve this issue, a technique to discover a pattern on the basis of gathered passwords and an attempt to provide a worker's password using hash are depicted in [8].
Although the subjective logic and trust-based models are often used in different networks, the means to address different types of attacks and the overlap between observations remain unaddressed. The prevention of other types of attacks may not be assured using the subjective logic and trust-based models. This work addresses the shortcomings of the subjective logic and trust-based models and combats the black hole and sink hole attacks through an overlapping score and machine learning techniques. This paper is organized as follows: Section 2 reviews existing methods to overcome the different types of attacks. Section 3 explains the sink hole and black hole attack detection methodology. Section 4 presents the parameters and experimental settings. Section 5 discusses the results for different performance metrics in comparison with up-to-date works. Section 6 concludes with limitations and future scope.

Literature Review
With the wide popularity of next-generation communication networks, such as MANETs and vehicular ad-hoc networks (VANETs), safety has become a major concern. Wireless sensors and delay tolerant network applications have emerged as promising technologies; some of them are used for health care, smart grid, and target monitoring. A geometric-based scheme for detecting black hole and gray hole attacks is investigated in [9]. In [10], secure data fragmentation is designed to avoid and identify the sink hole and Sybil attacks in the presence of fixed and dynamically deployed nodes, yielding high detection accuracy and a low false-positive rate. In addition to issues and taxonomy, future directions against DDoS attacks are presented in [11]. A survey of Sybil defense algorithms used in online social networks is provided in [12].
A mutable black hole detection mechanism is designed in [13] to identify the behavioral changes of nodes. However, the mechanism fails to identify the black hole attack. WBANs are prone to many attacks, which weaken network performance. The black hole attack is a security risk that disrupts normal functioning by dropping all received packets. Even though many mechanisms are compared in [14] to protect the network from the black hole attack, an effective and lightweight security mechanism detects and prevents DDoS attacks.
An attempt is made to investigate the features for ensuring secure routing protocols in VANETs and combat against the black hole attack by applying the stack operations, as given in [15,16]. Another measure to fight against the black hole attack is by using the AODV routing protocol, which is designed in [17]. However, the reliability of the node is not ensured and thus compromises the entire network. A proficient GA-based denial of sleep attack detection is proposed in [18]; it guarantees the reliability of the connector node by using the fitness function estimation.
Node attacks in WBANs are unavoidable, owing to node replication. To address this issue, a novel type named Solo Stage Random Walk Memory with a Distributed Network is proposed in [19]; it guarantees the security of a node with sensible overhead with respect to memory and communication. However, with nodes prone to various attacks, data communication is found insecure in both ways. Although data communication is ensured in [20], attack detection is said to be compromised as network density increases. To address this issue, different types of sink hole nodes are identified in [21] on the basis of several disjoint clusters. An intrusion detection with various machine learning techniques is surveyed and presented in [22,23].
Some secured solutions are provided to protect WBANs from the black hole attack. An improvised hierarchical efficient intrusion detection system (IDS) preserves the sensor network from such an attack [24,25]. The approach depends on the exchange of control packets between the sensor node and base station. However, the security remains unaddressed.
Many security keys are used to secure data transmission from the black hole attack. A hierarchical energy efficient IDS, which preserves the sensor network from the black hole attack, is designed in [26]. The designed system controls the packet exchange between the sensor node and base station. However, the designed system still experiences network overhead issues.
A new probabilistic approach identifies and separates the black hole attack from MANETs. Routing algorithms are not designed to reduce and remove this type of attack. A technique called Novel Honeypot-based Detection and Isolation helps identify and remove the black hole attack. The Honeypot method increases the security rate of MANETs with minimum network overhead. However, such a method is cost-ineffective [26,27]. A new technique identifies the black hole attack by multiple base stations and performs verification through an agent-based technology [28].
From the observations drawn, many of the techniques and/or algorithms are implemented for detecting attacks yet suffer from overlapping discriminative features on the intermediary node, resulting in computational issues in identifying attacks. This work proposes a novel algorithm with the objective of reducing the computational complexity (CC) of identifying the discriminative features by using the proportional coinciding score (PCS).

Methodology of MK-Means
A scenario for remote patient monitoring is considered in this work, where the PCS is used to collect data and trigger an alert when abnormal patterns (i.e., sink hole or black hole nodes) are detected, as followed in [29,30]. The proposed framework is based on the machine learning technique given in [31,32]. It clusters the sensor nodes according to the updated distance (as patient records are dynamic) by using the MK-Means algorithm. Fig. 3 depicts the block structure of the hybrid sink hole and black hole attack detection framework.

Figure 3: Block diagram of MK-Means
The block diagram consists of three different phases, namely, physiological data gathering (PDG) phase, data reduction phase, and classification phase. The description of each module is elaborated below.

PDG
The PDG phase gathers data or features from the physiological parameter measurements presented in the form of data matrix "DMat = [DMat_ij]." In the data matrix, "i" represents the time-related growth values, whereas "j" represents the measured parameter. All gathered values for all measured parameters are stored as a single record incrementally at time "t." It is represented by "DMat_t = (DMat_t1, DMat_t2, ..., DMat_tn)" for each patient and considered packet "P_t." The recorded features are denoted by "Fe = (Fe_1, Fe_2, ..., Fe_n)," where "Fe_i" represents column "i" in data matrix "DMat." The features represent patient metrics, such as heart rate "H_Rate" and blood pressure "B_P." Hence, the data matrix is written as given below.
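A layout consistent with the definitions above, assuming "t" recorded time steps as rows and "n" measured parameters as columns, is sketched here. It is a reconstruction from the surrounding text, so row "t" is the record "DMat_t" and column "j" is the feature "Fe_j"; the exact typesetting of the original equation is an assumption.

```latex
\mathrm{DMat} =
\begin{bmatrix}
\mathrm{DMat}_{11} & \mathrm{DMat}_{12} & \cdots & \mathrm{DMat}_{1n} \\
\mathrm{DMat}_{21} & \mathrm{DMat}_{22} & \cdots & \mathrm{DMat}_{2n} \\
\vdots             & \vdots             & \ddots & \vdots             \\
\mathrm{DMat}_{t1} & \mathrm{DMat}_{t2} & \cdots & \mathrm{DMat}_{tn}
\end{bmatrix}
```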
The PDG module in the hybrid attack detection framework monitors the events and packets, their delivery time, and topological changes; the module also archives feature inputs. The MK-Means framework selects the hidden data that carry evidence of normality or anomaly. A healthcare WSN depends on the accuracy of the received physiological data. False alarms, or erroneous instances when medical personnel must be alerted, may result in fatal health issues [33]. To increase the attack detection accuracy and reduce false alarms (improper diagnosis), this study focuses on black hole and sink hole attacks.

PCS
In the MK-Means framework, an analytical method for selecting features is presented; it is based on an overlap analysis of medical data across classes for each patient. In the data or feature selection module, data packets are passed to the PCS technique to select the best features.
The PCS is determined with the objectives of exploring the commonality between features across classes for different patients and identifying discriminative features. This approach uses the information provided by the training classes along with the testing classes to explore contradictory features among target classes for hybrid attack detection.
Physiological data can generally be presented in the matrix form "DMat = [dmat_ij]," where "DMat ∈ R^(K×N)." Here, "i = 1, 2, ..., K" and "j = 1, 2, ..., N." Each sample record is categorized by a labeled target class "P_j," signifying the patient model. The class labels form a vector "P ∈ R^N" whose "j"th element "p_j" has a single value "Cl," which may be either "1" or "2." Minimum subsets of patient records (i.e., features) are selected using the PCS method, and the data delivered are analyzed. The resulting subset is taken to be the smallest one that suitably classifies the large set of samples in the given training set. This procedure allows repeated samples, such as patient records with identical profiles, to be discarded, thus avoiding overlapping.
Let "Fe" be taken as a set comprising all features (i.e., "|Fe| = N"). Consider also "A_M(Fe)" to be the aggregate mask of the features, which is defined as the logical disjunction of the masks, over the patient records, of the features that belong to the set. It is expressed as given below.
Then, the mathematical formula to cover the maximum number of patient records is as follows: From Eq. (3), the patient record set whose data matrix values have the maximum number of "1" bits is assigned. The objective of the proposed framework is to find the smallest subset, denoted by "Fe*," for which "A_M(Fe*)" matches the aggregate mask "A_M(Fe)" of the whole set of patient records, fulfilling the following assertion.
The above-mentioned procedure is iterated successively and stops when all patient records are processed, that is, when the selected patient records cover almost all the samples. The step-by-step procedure for minimizing the redundancy and maximizing the relevancy by using the PCS is given in Algorithm 1.

Algorithm 1: Minimizing the redundancy and maximizing the relevancy
Input: Patients "N_p," packet "Pkt_t," time "t," features "Fe = (Fe_1, Fe_2, ..., Fe_n)"
Output: Minimal redundancy, maximum relevant features
1: Begin
2: For each packet "Pkt_t" with "N_p" patients
3:     Frame the data matrix with features by using Eq. (1)
4:     Find the unique feature from feature set Fe by using Eq. (2) and apply the logical disjunction among features
5:     Acquire maximum relevance by using Eq. (3) to cover the maximum number of patient records
6:     Minimize the redundancy of features identified in the patient records by using Eq. (4)
7: End for
8: End

As given in Algorithm 1, the PCS in the MK-Means framework assigns the minimum subset of patient records (i.e., not all the attributes of the records are considered; only their relevant attributes), giving the maximum likelihood of classification accuracy in a selected training set. Patient records with a minimal subset are integrated with highly measured patient records on the basis of the "PCS," resulting in an absolute feature selection.
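The selection loop of Algorithm 1 can be illustrated as a greedy cover over per-feature masks. This is a minimal sketch under stated assumptions, not the authors' implementation: each feature is assumed to carry a binary mask over patient records (1 where the feature classifies that record unambiguously), the aggregate mask is taken as the element-wise logical OR of those masks, and the feature covering the most still-uncovered records is picked at every step. The function names and the sample masks are hypothetical.

```python
from typing import Dict, List


def aggregate_mask(masks: Dict[str, List[int]], features: List[str]) -> List[int]:
    """Element-wise logical OR of the masks of the given features."""
    if not masks:
        return []
    n_records = len(next(iter(masks.values())))
    agg = [0] * n_records
    for name in features:
        agg = [a | b for a, b in zip(agg, masks[name])]
    return agg


def select_feature_subset(masks: Dict[str, List[int]]) -> List[str]:
    """Greedily pick features until their aggregate mask equals that of all features."""
    target = aggregate_mask(masks, list(masks))
    uncovered = {i for i, bit in enumerate(target) if bit}
    selected: List[str] = []
    while uncovered:
        # Maximum relevance: the feature covering the most uncovered records.
        best = max(masks, key=lambda f: sum(masks[f][i] for i in uncovered))
        selected.append(best)
        # Minimum redundancy: records already covered are never counted again.
        uncovered -= {i for i in uncovered if masks[best][i]}
    return selected


# Hypothetical masks over five patient records (illustrative values only).
masks = {
    "heart_rate":     [1, 1, 0, 0, 1],
    "blood_pressure": [0, 1, 1, 0, 0],
    "resp_rate":      [0, 0, 1, 1, 0],
    "temperature":    [1, 0, 0, 0, 0],
}
chosen = select_feature_subset(masks)
print(chosen)
```

A greedy cover is not guaranteed to return the smallest possible subset; it is used here only because it keeps the stopping condition of the text (the aggregate mask of the chosen subset matches that of the full feature set) easy to check.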

MK-Means Clustering
K-means clustering is used to partition the sensor nodes into groups, the number of which is represented by variable "K." As patient records (e.g., blood pressure, pulse rate, etc.) change with respect to time, the MK-Means algorithm is used in this work because of the advantage of using rescaled (i.e., varied with respect to time) entity points. K-means clustering involves a set "I" of "N" sensor nodes (entities), a set of features "V," and the data matrix "DMat = [dmat_ij]," where "dmat_ij" is the value of feature "j ∈ J" at entity "i ∈ I." The framework makes a partition "Fe = {Fe_1, Fe_2, ..., Fe_n}" of "I" into "K" non-overlapping subsets generated by Algorithm 1, referred to as clusters, each represented by a centroid "c_k = (c_kv)" in the feature space, with "k = 1, 2, ..., K." Then, the "within-cluster distances to centroid" are measured as given below.
Subsequently, the Minkowski m-metric between M-dimensional points "x = (x_h)" and "y = (y_h)" is defined using the equation given below.
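The two quantities described above can be sketched as follows. The code assumes the standard Minkowski form, the m-th root of the sum over h of |x_h - y_h|^m, and takes each centroid as the arithmetic mean of its cluster, which is the exact minimizer only for m = 2; the paper's own equations may differ in these details. The helper names and the sample points are hypothetical.

```python
from typing import List, Sequence


def minkowski(x: Sequence[float], y: Sequence[float], m: float) -> float:
    """Standard Minkowski m-metric between two points of equal dimension."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1.0 / m)


def centroid(points: List[Sequence[float]]) -> List[float]:
    """Arithmetic mean of the points in one cluster (an assumption of this sketch)."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def within_cluster_distance(clusters: List[List[Sequence[float]]], m: float) -> float:
    """Sum, over all clusters, of the distances from each point to its centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(minkowski(p, c, m) for p in points)
    return total


# Two hypothetical clusters of sensor readings in a two-feature space.
clusters = [
    [[1.0, 2.0], [3.0, 2.0]],
    [[10.0, 10.0]],
]
print(minkowski([0.0, 0.0], [3.0, 4.0], 2))   # Euclidean case, m = 2
print(within_cluster_distance(clusters, 2))
```

With m = 2 the metric reduces to the Euclidean distance, and with m = 1 to the Manhattan distance, which gives a quick sanity check on any implementation.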
With the obtained M-dimensional points, the groups of similar nodes (i.e., similar records) that exist in the network are identified. With the obtained information, the identified records are transferred to the cluster head (CH) nodes. Fig. 4 illustrates the block diagram with a brief working structure of the suggested IDS by using the MK-Means algorithm. The work proposed here assumes that the sink hole or black hole nodes either drop the data or claim to own the shortest path.

Figure 4: MK-means IDS
The CH gathers the data from the sensing devices. The CH nodes are marked by stars, whereas the normal sensor nodes are denoted by filled circles. The Sink position is fixed randomly on the basis of the GCCR [34].
The intrusion detection consists of a correlation measure that estimates the intrusion measure (Int_Msr) from the extracted feature values. The data packets received "DPkt_rec," the data packets transmitted "DPkt_tr," and the node ID "S_i" are used to estimate "Int_Msr." The alert message is activated by an intrusion detection engine that depends on the "Int_Msr" input, which denotes the presence or absence of cooperated nodes (i.e., sink hole or black hole nodes). Fig. 5 shows an example network that includes sink hole and black hole attacks.
As presented in Fig. 5, "S_1" and "D_st" are the source and destination nodes, respectively. Circle "S_h" represents the sink hole node, which attracts the traffic, with "S_4," "S_6," and "S_7" being the compromised nodes. Meanwhile, "S_9" denotes the black hole node, which claims to possess the shortest path, with "S_5" being the compromised node.
The correlation measure is obtained with the aid of the intrusion measure using the data packets received and data packets transmitted for each CH node. It is mathematically measured as follows: With the intrusion measure, the IDS activates an alert message according to the resultant values.

Figure 5: An attack detection generated by the IDS with a sample network
The pseudo code representation for the sink hole and black hole attack detection using a machine learning technique is given in Algorithm 2.

Algorithm 2: Sink hole and black hole attack detection using a machine learning algorithm
Input: Sensor node "S_i," data packet received "DPktR," data packet transmitted "DPktT"
Output: Detection of sink hole or black hole
1: Begin
2: For network sensing nodes "S_i" (i.e., that form cluster) <= N
3:     Calculate in cluster distances by using Eq. (5)
4:     Calculate m-metric between M-dimensional points by using Eq. (6)
5: End for
6: Repeat
7:     Calculate intrusion measure "Int_Msr" by using Eq. (7)
8:     If "Int_Msr_i → ∞," then
9:         The matching "S_i" is the sink hole node.
10:         Isolate "S_i" node
11:         Send alert notice about "S_i" to the cluster member nodes that remain in the network
12:     End if
13:     If "freq (DPktR_i [S_i]) = 0," then
14:         Fix black hole node "S_i"
15:         Isolate "S_i" node
16:         Send alert notice about "S_i" to the cluster member nodes that remain in the network
17:     Else
18:         The matching "S_i" in the network is a normal CH.
19:     End if
20: Until communication process is accomplished
21: End

The IDS comprises a correlation measure module, which calculates the intrusion measure (Int_Msr) from the extracted features by using the PCS. The correlation measure transmits the "Int_Msr" input to the attack detection engine. The detection technique activates the alert message depending on the "Int_Msr," which signifies the role of compromised node as live or not.
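The decision logic of Algorithm 2 can be sketched as below. Since the expression for the intrusion measure is not reproduced in this text, the sketch assumes "Int_Msr" to be the ratio of data packets received to data packets transmitted per CH node, which is consistent with the "Int_Msr → ∞" test (a node that absorbs traffic and forwards nothing). The black hole test follows a literal reading of step 13, a received-packet frequency of zero. The class, the function names, the threshold parameter, and the sample counters are all hypothetical, and the sink hole test is given precedence here as a design choice of the sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterHeadStats:
    node_id: str
    received: int      # DPktR: data packets received (assumed per-node counter)
    transmitted: int   # DPktT: data packets transmitted (assumed per-node counter)


def intrusion_measure(s: ClusterHeadStats) -> float:
    """Assumed form: Int_Msr = DPktR / DPktT, unbounded when nothing is forwarded."""
    if s.transmitted == 0:
        return math.inf if s.received > 0 else 0.0
    return s.received / s.transmitted


def classify(s: ClusterHeadStats, sink_threshold: float = math.inf) -> str:
    """Label a CH node following the order of the tests in Algorithm 2."""
    if intrusion_measure(s) >= sink_threshold:
        return "sink hole"    # Int_Msr grows without bound
    if s.received == 0:
        return "black hole"   # literal reading of freq(DPktR_i[S_i]) = 0
    return "normal"


# Hypothetical counters for three CH nodes.
nodes = [
    ClusterHeadStats("S2", received=40, transmitted=38),
    ClusterHeadStats("Sh", received=120, transmitted=0),
    ClusterHeadStats("S9", received=0, transmitted=0),
]
labels = {n.node_id: classify(n) for n in nodes}
alerts = [node_id for node_id, label in labels.items() if label != "normal"]
print(labels)
print(alerts)
```

In practice a finite value of the hypothetical `sink_threshold` would stand in for "→ ∞", since a sink hole that forwards even a few packets never reaches an infinite ratio.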

Experimental Results
The specified model is executed using Network Simulator-3 (NS-3). The number of nodes, which are deployed in the region on the basis of the random-waypoint model and scenario, varies from 50 to 500. For each sensor node, the model behavior is repeated individually, and mobility is varied by keeping each node stationary for a certain pause period. The parameters used for simulation are shown in Tab. 1.
In this section, the experimental results are presented to demonstrate the efficiency of the hybrid attack detection framework. The proposed framework is compared with the adaptive sink aware (SINK-AWARE) algorithm and secure route discovery in AODV (AODV-SR), as both are analogous to the proposed framework; the location estimation of sink hole and black hole attackers is also conducted using the two algorithms. In the NS-3 implementation, a set of nodes in WBANs is randomly deployed in a "500 × 500 m²" area, and each node has a communication range of 150 m.

Performance Measures
The following performance metrics are applied in the current scenario.

Computational Complexity (CC)
CC involves the time taken to measure the Minkowski m-metric between M-dimensional points "Dis_m(x, y)" and the intrusion measure (Int_Msr) for detecting intrusion.
$$\mathrm{CC} = \mathrm{time}\big(\mathrm{Dis}_m(x, y)\big) + \mathrm{time}\big(\mathrm{Int}_{Msr}\big)$$

Attack Detection Accuracy
The attack detection accuracy is formulated as follows: From Eq. (9), attack detection accuracy "AD_acc" is attained with the complete sensor nodes "S_i," which are obtained from patients "P" and attack (sink hole or black hole) nodes "Att_s" during data transmission in healthcare WBANs.

Packet Loss Rate
Packet loss is said to occur when certain packets fail to reach the intended destination, for example, because of network congestion. It is measured as the ratio of packets lost with respect to the packets sent.
From Eq. (10), packet loss rate "PLR" is obtained from the difference between the data packets transmitted "DPktT_i" and the data packets received "DPktR_i" by "i" nodes.
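The description above can be turned into a short calculation. The sketch assumes that the packet loss rate is the total difference between transmitted and received packets divided by the total transmitted, expressed as a fraction; the exact normalization of the paper's equation is not reproduced here, and the function name and sample counts are hypothetical.

```python
from typing import Sequence


def packet_loss_rate(transmitted: Sequence[int], received: Sequence[int]) -> float:
    """Fraction of packets lost: (sum of DPktT_i - sum of DPktR_i) / sum of DPktT_i."""
    if len(transmitted) != len(received):
        raise ValueError("one transmitted and one received count per node is required")
    total_sent = sum(transmitted)
    if total_sent == 0:
        return 0.0  # nothing was sent, so nothing could be lost
    return (total_sent - sum(received)) / total_sent


# Hypothetical per-node counts: 200 packets sent in total, 190 delivered.
dpkt_t = [50, 50, 100]
dpkt_r = [48, 45, 97]
plr = packet_loss_rate(dpkt_t, dpkt_r)
print(f"{plr:.2%}")
```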

E2E Delay
E2E delay refers to the time taken for a packet to travel across the network from source to destination. It is mathematically evaluated as given below.
E2E delay "Delay_End2End" is the total delay incurred from the transmission of the data packets "DPktT_i" to their successful delivery, taken with respect to the total number of data packets "DPkt_i." It is measured in milliseconds and includes waiting time at interface queues, propagation time, transfer time, and retransmission time.
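A minimal sketch of this metric follows, assuming that the E2E delay is averaged over successfully delivered packets and that each packet carries a send timestamp and, if delivered, a receive timestamp in milliseconds. Averaging over delivered packets only is an assumption of this sketch, as are the function name and the sample timestamps.

```python
from typing import List, Optional, Sequence


def mean_e2e_delay_ms(send_ms: Sequence[float],
                      recv_ms: Sequence[Optional[float]]) -> float:
    """Mean delay in milliseconds over delivered packets (None marks a lost packet)."""
    if len(send_ms) != len(recv_ms):
        raise ValueError("one receive entry per sent packet is required")
    delays: List[float] = [r - s for s, r in zip(send_ms, recv_ms) if r is not None]
    if not delays:
        return 0.0  # no packet was delivered
    return sum(delays) / len(delays)


# Hypothetical timestamps: the third packet is lost in transit.
sent = [0.0, 10.0, 20.0, 30.0]
received = [4.0, 16.0, None, 35.0]
delay = mean_e2e_delay_ms(sent, received)
print(delay)
```

Because queueing, propagation, transfer, and retransmission times are all contained in the difference between the two timestamps, they do not need to be tracked separately in this form.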

Discussion
The results of the MK-Means framework are compared with those of the existing SINK-AWARE algorithm [1] and AODV-SR [2].

Computational Complexity (CC)
For the experiments, 500 sensor nodes with varying sizes of data packets sent at different time intervals are selected. With these sensor nodes, the CC involved in attack detection is identified. Fig. 6 illustrates the CC involved in attack detection averaged over 500 random training sensor nodes. The experimental results show that the proposed framework outperforms the existing techniques SINK-AWARE [1] and AODV-SR [2]. The experimental outcome of the proposed work points to a significant improvement in different deployment scenarios and a reduced CC with the MK-Means. With the deployment of 100 nodes, the CC is observed to be 5.25 milliseconds using MK-Means, 8.89 milliseconds using SINK-AWARE, and 9.23 milliseconds using AODV-SR. The CC involved in attack detection is improved with the application of the PCS, which considers the overlapping of medical data across classes for different patients and evaluates the discriminative features for attack detection. Furthermore, the PCS evaluates the overlapping of features on the basis of the time-related growth values, with which the aggregated mask value is considered, so that the minimum redundant and maximum relevant features are extracted. Overall, the CC involved in attack detection is reduced by 31% compared with SINK-AWARE and 10% compared with AODV-SR.

Impact of Attack Detection Accuracy
The average results of 10 simulation runs, which are performed to measure the accuracy of attack detection, are shown in Fig. 7. The number of sensing nodes varies from 50 to 500 in the experimental scenario. As discussed in previous sections, the attack (sink hole and black hole) detection ratio using the MK-Means framework produces better results than employing contemporary methods. With the deployment of 50 nodes, the attack detection accuracy is 83.93% using MK-Means, 78.53% using SINK-AWARE, and 73.53% using AODV-SR.
The attack detection ratio with the MK-Means framework is compared with SINK-AWARE and AODV-SR in Fig. 7, which shows an improvement using MK-Means. The MK-Means framework differs from SINK-AWARE and AODV-SR, as it incorporates a machine learning algorithm for sink hole and black hole detection. The advantage of applying a machine learning algorithm in the MK-Means framework is that an intrusion measure is used to validate attacks, and the intrusion measure value is a time-related growth value in that it obtains patient information with respect to time. This time-related growth value is then used to decide whether any type of attack or a normal node is observed. This decision, in turn, improves the attack detection ratio by 7.8% against SINK-AWARE and 8.5% against AODV-SR.

Effect of Packet Loss Ratio
The packet loss ratio is evaluated when the source transfers a volume of data (i.e., patient records) to the destination (i.e., laboratory technicians or doctors). The experiments are conducted using 200 data packets of various sizes, and the packet loss rate is measured in packets per second (pps). Fig. 8 illustrates the packet loss rate for the MK-Means framework, SINK-AWARE, and AODV-SR with 200 different data packets. The average packet loss rate of the MK-Means framework increases gradually for different data packets and compares favorably with the two other methods. Fig. 8 also reveals that the average packet loss ratio while performing attack detection is improved by the MK-Means framework. With the MK-Means algorithm, rescaled entity points with various patient features are considered (i.e., due to the non-static nature of patient records as discussed in [35]). The packet loss rate is reduced by considering the rescaled entity points and non-overlapping subsets produced by Algorithm 1. From the extracted features, attack detection is performed, resulting in an improvement of the packet loss ratio by the MK-Means framework of 25.4% with respect to SINK-AWARE and 16.3% with respect to AODV-SR.

Impact of E2E Delay
The E2E delay is measured with different sizes of data packets for all three algorithms, and the resultant values are shown in Fig. 9. The E2E delay is measured in milliseconds. Fig. 9 reveals that the E2E delay is lower when using the MK-Means framework than when using existing methods, such as SINK-AWARE and AODV-SR. This is because the within-cluster distances to the centroid are measured using MK-Means, where similar patient records are transferred to the CH node, as given in [36]. This CH node then obtains the correlation by using the intrusion measure. Therefore, not all the node information is transferred to the IDS; only the patient records updated with respect to time using MK-Means are used as measures for obtaining the correlation value. As a result, the E2E delay of the MK-Means framework is reduced by 37.4% compared with SINK-AWARE and 8.7% compared with AODV-SR.

Conclusion
Healthcare WBANs remain susceptible to security risks, such as sink hole and black hole attacks. To overcome these issues, an intrusion detection framework is proposed for detecting attacks and alerting sensing nodes to minimize data loss. Specifically, an intrusion detection architecture for healthcare WBANs is implemented, which uses the PCS for data reduction and the MK-Means machine learning technique for classification accuracy. The PCS minimizes the feature size, which, in turn, cuts down classification complexity and thereby improves the attack detection accuracy. The proposed attack detection framework identifies the sink hole and black hole nodes computationally and notifies the remaining deployed nodes through an alert message. Furthermore, the numerical results drawn from various executions demonstrate that the proposed attack detection framework reduces the E2E delay and packet loss rate with the help of CH nodes, which send only updated patient records.
Although the CC is reduced, the experiments are conducted with only a few records. To handle voluminous data, the algorithmic techniques must be adapted accordingly, or suitable learning algorithms with cloud data management should be incorporated in the future to adopt this technique in healthcare domains.
Funding Statement: This research received no grant funding. APC was funded by Ştefan cel Mare University of Suceava, Romania.

Conflicts of Interest:
The authors declare no conflicts of interest to report regarding the study.