Clone Detection in Sensor Networks with Ad Hoc and Grid Topologies

In cloning attacks, an adversary captures a sensor node, reprograms it, makes multiple copies, and inserts these copies, into the network. Cloned nodes subvert sensor network processing from within. In a companion paper [2], we show how to detect and remove clones from sensor networks using random key predistribution security measures. Keys that are present on the cloned nodes are detected by using authentication statistics based on key usage frequency. For consistency with existing random key predistribution literature, and ease of explanation, the network in that paper used an Erdos-Renyi topology. In the Erdos-Renyi topology, the probability of connection between any two nodes in the network is uniform. Since the communications ranges of sensor nodes are limited, this topology is flawed. This article applies the clone detection approach from [2] to more realistic network topologies. Grid and ad hoc topologies reflect the node connectivity patterns of networks of nodes with range limits. We provide analytical methods for choosing detection thresholds that accurately detect clones. We use simulations to verify our method. In particular we find the limitations of this approach, such as the number of nodes that can be inserted without being detected.


Introduction
Sensor networks deployed in hostile environments are vulnerable to physical attacks; including node cloning. In a cloning attack, an adversary captures a sensor node and copies its state. She reverse engineers the node, reprograms it, makes multiple copies of it, and inserts these copies back into the network. Chan and Perrig [1] explain how clones can falsify sensor data; extract data from the network; and/or stage denial of service attacks. A typical sensor network will contain hundreds or thousands of nodes in a hostile environment. It will not be possible to constantly monitor nodes to detect potential tampering. Therefore it is necessary to develop a security approach for detecting and removing cloned nodes.
In [2] we present a statistical approach for detecting cloning attacks in networks whose security guarantees are based on random key predistribution. In random key predistribution, first introduced in [3], before deployment each node is equipped with a key-ring of k keys chosen at random from the same large pool of P keys. Given k and P, there is a well defined probability that any two nodes will share at least one cryptographic key and use placement, nodes are often distributed in a regular pattern. Grid topology analysis relies on results from percolation theory [9].
In this paper we analyze a two-dimensional rectangular grid. The same approach can easily be adapted to other tessellations, like those in [9]. Every node in the network (ignoring edge effects) has exactly four nodes in its neighborhood. Communications between a node and its neighbors occur only when they share a common key. Figure 1a shows a 2-dimensional grid of nodes in a 100 by 100 meter square region. The 100 sensor nodes are placed at regular intervals with 10 meters between neighboring nodes. Figure 1b shows the communications links that are established between nodes that use a random key predistribution scheme.

Ad hoc Network Topology
In ad hoc networks, there is no control over node placement. A model of their topologies was proposed by [10]: the sensor network has N nodes. These N nodes are uniformly randomly distributed in a square region (i.e., the x and y coordinates are random variables in the range [0..max].) They can communicate only with nodes that are within their communications range r.
[10]'s ad hoc model assumes that two nodes within range of each other communicate with probability 1. We allow nodes within range of each other to communicate with probability p c [11]. For modeling random key predistribution (with key pool size P and k keys in the keyring), p c is the probability that two nodes share a key [3]: (1) Figure 2a shows an example distribution of nodes in a square region. Again, 100 sensor nodes are placed in a 100 by 100 meter region. Figure 1b shows the communication links enabled by random key predistribution. Unlike the Erdos-Renyi topology, nodes can not communicate if they are separated by a distance greater than the radio range r. Unlike grid networks, little is assumed about node placement. The ad hoc model better captures the topology of networks deployed in hostile environments. Clone Detection in Sensor Networks 211

Clone Detection Protocol
As shown in [2], cloning attacks can be detected by analyzing key usage statistics. We use the fact that each node randomly selects k keys from the pool without replacement to predict the distribution of the number of nodes possessing a given key. The distribution of the number of times a key is used is a direct consequence of the number of nodes possessing the key. When clones are inserted into the network this key usage distribution is changed, since the cloned keys are present on a greater number of nodes than normal. The keys are therefore used more frequently than keys that have not been cloned. By collecting key usage statistics we can determine which keys have been cloned. Note that we consider only the use of keys to create a communications link between two nodes during network initialization.
The sensor network has a pool of P keys for a set of N authorized nodes. Each of the N nodes has a key ring of k keys taken randomly from the pool. C cloned nodes are added to the network. The total number of nodes is T = N + C. A base station B is outside the network with a copy of the entire key pool. Every node has B's public key that is used to communicate securely with the base station. B collects key usage statistics from the network and detects the keys that are used by cloned nodes.
We use counting Bloom filters to collect key usage statistics. Each node makes a counting Bloom filter of the keys it uses to communicate with neighboring nodes. It appends a random number (nonce) to the Bloom filter and encrypts the result using B's public key. This encrypted data structure is forwarded to B. B decrypts the Bloom filters it receives. The nonces are used to foil replay attacks; messages with duplicate nonces are discarded. B performs membership queries on the counting Bloom filters it receives and calculates how many times each of the keys are used in the network.
Keys used too often are considered cloned. B makes a Bloom filter of the cloned keys. For authentication purposes, the filter is encrypted using B's secret key. The Bloom filter is broadcast through the sensor network using a gossip protocol [12]. Each node checks the keys in its key ring for membership in the Bloom filter of cloned keys. Cloned keys are removed from the local key ring and all connections using compromised keys are terminated. More details are in [2].

Threshold Detection
A sensor network has N nodes active in a region. The key pool contains P keys. Each node has a key ring of k keys chosen at random from the pool. Also let n denote the neighborhood size, i.e., the maximum number of nodes that a given node can connect to. In [2] we derived the following theorems for the mean and variance of the number of times a key is used for making connections in the network,

Theorem
The expected number of times a key is used for making connections in the network is: with variance: (Note that M j is not canceled in (2), since it is required to compute the variance in (3).) Where M j is the probability a key is on exactly j nodes, and N j is the expected number of times a key on exactly j nodes is used. If P d (j) the probability a key is on exactly j nodes and C j denotes the number of times that key, k j on j nodes is used for making connections in the network, then In the case of Erdos-Renyi networks, C j (the maximum number of times that key, k j on j nodes can be used for making connections in the network) is given by jÃðjÀ1Þ 2 as indicated in [2]. This is not true for grid and ad hoc networks. Corollary 1. The maximum number of times that key k j on j nodes is used for making connections in the grid network, provided that no two of those j nodes have any other key in common is: Proof. The number of times a key is used by a node for making connections is dependent on the number of nodes in the neighborhood with that key. For a key k j is located on exactly j nodes in the sensor network and a node with k j , the probability p i that the key is on i other nodes in the neighborhood is: The expected number of times a key that is on exactly i other nodes in the neighborhood is used to make connections is:

Clone Detection in Sensor Networks
Since the neighborhood size is n, i can vary from 1 to n; making the expected number of times that the key is used by the node for making connections in it's neighborhood: Because key k j is on exactly j other nodes, the expected number of times it is used to make connections is: For the grid network n = 4, making the number of times that a key on j nodes can be used to create connections in the network, provided that no two of those j nodes have any other key in common is: (10) Figure 3 shows simulation results and predicted values for mean key usage and variance as network size increases for grid networks.
Corollary 2. Maximum number of times a key, k j on j nodes can be used to initialize communications links for an ad hoc network provided that no two of those j nodes have any other key in common is: Where n ¼ P NÀ1 i¼0 i:p i :p c , p i is the probability i nodes are in range of the node and p c is given by (1).
Proof. The expected number of times that the key k j (on exactly j nodes) is used to initialize connections is given by equation (9). For an ad hoc network, there are N sensor nodes with communications range r randomly deployed in a square region of area A. For a node has range r, the area covered by its neighborhood is .r 2 . Since A is the total area, the probability any node lies within the range of a given node is: Since all nodes have the same probability pr of landing in the neighborhood, the number of nodes in the neighborhood will follow a binomial distribution. The probability that any node has i nodes in its neighborhood becomes: The expected degree of a node in the network is therefore (where pc is given by (1)): 214 R. R. Brooks et al.
In practice, nodes at a distance of r or less from the edge of the surveillance domain have a smaller neighborhood. Since the effective area where nodes can be placed is less than .r 2 . These edge effects are a simulation artifact. We do not consider them in our analytical estimations. Luckily, they do not lead to any appreciable divergence between our analytical predictions, and the simulation verification. Figure 4 shows simulation results and predicted values for the mean and variance of (11) as network size increases.
We determine whether or not a key is cloned by comparing U i (the number of times key i is used to establish connections indicated by the Bloom filter data) to k (from Equations (4), (5), and (11)). If U i is significantly higher than k , the key is probably being used by cloned nodes. A key is thought to be cloned if the key fits either (i) a normal curve or (ii) the distribution from Theorem with likelihood less than false positive rate .
The value U i which matches becomes the threshold T. Keys used more than T times are marked as clones. A detailed discussion of hypothesis testing for threshold determination is in [2].  For an ad hoc sensor network of 900 nodes, key pool size P 1000, key ring size k 50, and range r 10 m in a 100 m by 100 m region, Table 1 gives threshold values for various settings of obtained analytically and using simulations. Normal distribution threshold values are obtained using statistical tables. Theorem 2 values are based on setting the    Figure 5 compares analytical predictions of the number of keys (out of 1000 keys) detected by the algorithm as clones for different values of false positive rate and different number of clones inserted in to the system for ad hoc networks. Figure 6 includes both true positives and false positives. Table 2 gives threshold key use values for varying settings of the accepted false positive rate , for a grid sensor network of 1600 nodes with key pool size P 1000 and key ring size k 50. The threshold values are obtained using the same prediction and simulation approaches as for ad hoc networks.   of clones inserted into the system for grid networks. Figure 6 includes both true positives and false positives.

Simulations
This section gives simulation results to verify our clone detection approach. We present results for both grid and ad hoc network topologies. The sensor networks have a key pool size (P) 0f 1000 keys. Each node has 50 keys in its key ring (k). The ad hoc network has 900 nodes. The grid network has 1600 nodes. Figure 7 plots the maximum component size of the network versus the false positive rate for both grid and ad hoc networks. The figure shows an inflection point for component size when the value of is somewhere between 0.3 and 0.5. In ad hoc networks, the inflection point occurs when is approximately 0.8. False positive rates above those values will fracture the network and disable the sensor network. The prediction model in [2], predicts that the phase change in the ad hoc network occurs at = 0.62. Also a known result from percolation theory is that the inflection point for grid networks of 4 neighbors occurs when the  probability of connection between any two nodes is 0.5. The corresponding value of for grid networks is 0.47. These predictions are consistent with our simulation results. Figure 8 shows simulation results for the number of keys (out of 1000 keys) that are detected as clones for different values of and different number of clones inserted in to the system. The values presented include both true and false positives. Figure 9 shows the number of cloned keys detected for different settings and different numbers of clones for both ad hoc and grid networks. These values do not include false positives. The keys that are detected are the keys that are most highly used in the network. As can be seen from the figures, larger values of result in a larger number of cloned keys being detected, but this comes at the cost of a higher number of legitimate keys being falsely detected as clones. This is an acceptable tradeoff, since this result in very few legitimate nodes being rejected from the network.   The protocol is more effective for the ad hoc topology, when = 0.60 all clones are eliminated when 90 or more clones are introduced. Even when a smaller number of clones are inserted in to the system, very few clones remain connected to the network. The clones' ability to cause harm is severely restricted.

Conclusion
This paper focuses on sensor network cloning attacks where an adversary captures a sensor node, reprograms it, and adds multiple modified copies of the node to the network. We examine the feasibility of our clone detection protocol, first presented in [2] for ad hoc and grid network topologies. These topologies better represent realistic sensor network implementations, since they address communications range limitations. We detect cloned keys by looking at how often they are used to establish connections between nodes; this is proportional to the number of nodes that possess the key. Inserting clones into the network skews this distribution. Cloned keys are present on more nodes and are used more often than keys that have not been cloned. The number of cloned keys detected increases with the false positive rate . We quantify the maximum value of , where the network will fracture for both grid and ad hoc networks. This is done using the method in [13].
Our ability to detect and remove clones increases with the number of clones in the network. Our approach does not require any cooperation from the cloned nodes. In [2], we show that the best strategy for the clones is to ignore the protocol. Active tampering with the protocol can only increase the likelihood that the clones will be detected.
We show how to quantify the number of clones that will not be detected for various number of clones inserted into the network. This is useful, in that it indicates a number of clones that may be present even if this protocol is used. Sensor network applications and security measures should be designed in a manner that can tolerate that many malicious participants.
One drawback of the presentation here and in [2] is the use of a centralized authentication mechanism. Centralized authentication using a base station is undesirable. It introduces a single point of failure into the network and can become a significant system bottleneck. It also violates random key predistribution's support of network self-organization, which is a very appealing aspect of the approach. Using multiple key servers in [5] eliminates the need for having a centralized authentication mechanism. That approach is orthogonal to the issues addressed in this paper.