Lightweight Anomaly Detection for Wireless Sensor Networks

Anomaly detection in wireless sensor networks (WSNs) is critical to ensure the quality of sensor data, secure monitoring, and reliable detection of interesting and critical events. The main challenge for anomaly detection algorithms in WSNs is to identify anomalies with high accuracy while consuming minimal network resources. In this paper, two lightweight anomaly detection algorithms for WSNs, LADS and LADQS, are proposed. Both algorithms build on the one-class quarter-sphere support vector machine (QSSVM) and reduce computational complexity by converting the linear optimization problem of QSSVM to a sort problem. Experimental results show that, compared with QSSVM, the proposed algorithms achieve lower computational complexity without reducing anomaly detection accuracy.


Introduction
Wireless sensor networks (WSNs) have been widely used in various applications, including civil and military domains [1]. However, the harsh deployment environment and the constrained capabilities of sensors (energy, CPU, memory, etc.) make WSNs vulnerable to different types of misbehaviors or anomalies. In WSNs, an anomaly or outlier (these terms are used interchangeably in this paper) is defined as a measurement that significantly deviates from the normal pattern of the sensed data [2]. Such anomalies usually correspond to node software or hardware failures, reading errors, unusual events, or malicious attacks. Therefore, it is critical to identify anomalies in sensor data efficiently and accurately to ensure data quality, secure monitoring, and reliable detection of interesting and critical events.
The context of sensor networks and the nature of sensor data make the design of an appropriate anomaly detection technique challenging [3]. The constrained environment of a WSN directly affects anomaly detection algorithms: because nodes have limited computational power and memory, the algorithms should have low computational complexity and a small memory footprint. Moreover, prelabelled data are expensive or difficult to obtain in WSNs, so anomaly detection for WSNs should be able to operate on unlabelled data. In short, the key challenge for anomaly detection algorithms in WSNs is to identify anomalies with high accuracy while consuming minimal network resources.
Recently, support vector machines (SVMs) in the form of the one-class quarter-sphere SVM (QSSVM) have been used for anomaly detection in WSNs because of their reduced computational complexity and their ability to operate on unlabelled data [4,5]. QSSVM converts the quadratic optimization problem of the one-class SVM to a linear optimization problem. A family of QSSVM-based algorithms has been proposed for anomaly detection in WSNs and has shown promising results. However, the main disadvantage of these QSSVM-derived algorithms is the high computational cost of solving a linear programme.
In this paper, we convert the linear optimization problem of QSSVM to a sort problem and propose two lightweight anomaly detection algorithms for WSNs to identify anomalies. Simulations show that our proposed algorithms are able to reduce the computational complexity without reducing the accuracy for anomaly detection.
Our paper makes the following contributions: (1) We present a mathematical proof that the linear optimization problem of QSSVM can be converted to a sort problem. (2) Based on this result, we propose two lightweight anomaly detection algorithms for WSNs, namely, the lightweight anomaly detection algorithm using sort (LADS) and the lightweight anomaly detection algorithm using quick select (LADQS). We show that our algorithms are equivalent to QSSVM but have lower computational complexity.
(3) We present an experimental evaluation of the effectiveness and efficiency of the proposed algorithms on a real-world WSN dataset.
The remainder of this paper is structured as follows. Related work on QSSVM-based anomaly detection models is presented in Section 2. In Section 3, we first describe the principle of QSSVM, then present a method to convert the linear optimization problem of QSSVM to a sort problem, and finally explain our proposed lightweight anomaly detection algorithms. Experimental results and performance evaluation are reported in Section 4. Section 5 concludes the paper and suggests some directions for future research.

Related Work
Anomaly detection typically uses data mining and machine learning techniques to detect abnormal activities in a system [2]. Anomaly detection techniques for WSNs can be categorized into statistical-based, nearest neighbor-based, clustering-based, classification-based, and spectral decomposition-based approaches [6]. Classification is an important model in the data mining and machine learning community: a classification model is learned from known training data and then used to classify unseen test data into different classes. SVM-based techniques are among the most popular classification-based approaches and have been widely used to detect anomalies because they require no explicit statistical model and avoid the curse of dimensionality.
In WSNs, prelabelled data are expensive or difficult to obtain, so several one-class SVM-based anomaly detection techniques have been proposed to process unlabelled data. Their main idea is to use a nonlinear function to map the data vectors from the original input space to a higher dimensional space called the feature space. A decision boundary that encloses the majority of the data vectors in the feature space is then found, and data vectors falling outside this boundary are classified as anomalous. Schölkopf et al. presented a hyperplane-based one-class SVM that separates the data vectors from the origin with a hyperplane; data vectors near the origin are considered anomalous [7]. Tax and Duin proposed a hypersphere-based one-class SVM that fits a hypersphere of minimum radius around the data [8]; data vectors falling outside the hypersphere are considered anomalous. However, these one-class SVM-based techniques still require solving a quadratic optimization problem, which is computationally expensive.
To reduce the computational cost of quadratic optimization, Campbell and Bennett formulated a linear programming approach for the hyperplane-based SVM [9], which attracts the hyperplane towards the average of the distribution of the mapped data vectors. Laskov et al. extended the work of Tax and Duin by proposing a quarter-sphere one-class SVM, which fixes the centre of the hypersphere at the origin, thereby converting the quadratic optimization problem to a linear one and reducing the computational complexity of learning the normal boundary of the data vectors [5,8]. Based on QSSVM, Rajasegarar et al. and Zhang et al. proposed several distributed outlier detection techniques for WSNs [10,11]. Rajasegarar et al. further extended this work by proposing a hyperellipsoidal one-class SVM that uses a linear optimization [12]. However, even a linear optimization, rather than a quadratic one, is computationally expensive, because solving the linear programme takes $O(m^3)$ time, where $m$ is the number of data vectors in the training set.
In this paper, we propose two lightweight anomaly detection techniques which convert the linear optimization problem of QSSVM to a sort problem and consequently reduce computational complexity of learning the normal boundary of data vectors.

Lightweight Anomaly Detection Algorithm for WSNs
In this section, we first introduce the principles of the one-class quarter-sphere support vector machine (QSSVM) proposed in [5]. We then prove that the linear optimization problem of QSSVM can be converted to a sort problem and propose two lightweight anomaly detection algorithms for WSNs.

Principles of the One-Class Quarter-Sphere SVM.
In this subsection we discuss the principles of QSSVM proposed by Laskov et al. [5]. They converted the quadratic optimization problem of the one-class SVM to a linear optimization problem by fixing the centre of the quarter-sphere at the origin. The geometry of the hypersphere SVM is shown in Figure 1.
Assume that the data vectors $\{x_i \in \mathbb{R}^p,\ i = 1, \dots, m\}$ of $p$ variables in the input space are mapped into the feature space by a nonlinear mapping function $\phi(\cdot)$. The constrained optimization problem of QSSVM can be formalized as follows:

$$\min_{R,\,\xi}\ R^2 + \frac{1}{\nu m}\sum_{i=1}^{m} \xi_i \tag{1}$$

subject to

$$\lVert \phi(x_i) \rVert^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, m,$$

where $\{\xi_i : i = 1, 2, \dots, m\}$ are the slack variables that allow some of the data vectors to fall outside the quarter-sphere. The regularization parameter $\nu \in (0, 1)$ represents the fraction of data vectors that are expected to be anomalies.
Obtaining the dual form of the optimization problem allows it to be formulated in terms of dot products of the data vectors in the training set. Using the kernel trick, the dot products $\lVert \phi(x_i) \rVert^2$ are replaced by the kernels $k(x_i, x_i)$, and the dual formulation of (1) becomes

$$\max_{\alpha}\ \sum_{i=1}^{m} \alpha_i\, k(x_i, x_i) \tag{2}$$

subject to

$$\sum_{i=1}^{m} \alpha_i = 1, \quad 0 \le \alpha_i \le \frac{1}{\nu m}, \quad i = 1, \dots, m.$$

It can be seen from (2) that the optimization problem is stated in terms of the dot product of each image vector with itself. This causes an issue with distance-based kernels, such as the RBF kernel, because the diagonal term $k(x_i, x_i)$ is equal for all vectors. The issue can be solved by centering the kernel matrix in feature space, that is, by subtracting the mean of the image vectors from each image vector:

$$\tilde{\phi}(x_i) = \phi(x_i) - \frac{1}{m} \sum_{j=1}^{m} \phi(x_j). \tag{3}$$

There is no explicit vector in feature space that represents the mean; however, the dot products $\tilde{k}_{ij} = (\tilde{\phi}(x_i) \cdot \tilde{\phi}(x_j))$ of the centred image vectors can be obtained from the kernel matrix $K = (k_{ij}) = (\phi(x_i) \cdot \phi(x_j))$ using $\tilde{K} = K - 1_m K - K 1_m + 1_m K 1_m$, where $1_m$ is an $m \times m$ matrix with all values equal to $1/m$ [13]. When the kernel matrix is centred in feature space, its diagonal elements are no longer equal, and the terms $k(x_i, x_i)$ of (2) are replaced by the diagonal elements $\tilde{k}(x_i, x_i)$ of the centred kernel matrix $\tilde{K}$. The value $\sqrt{\tilde{k}(x_i, x_i)}$ can be regarded as the distance between the data vector $x_i$ and the origin of its centred quarter-sphere in the feature space. The dual problem (2) can then be solved.
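Because (2) only uses the diagonal of the centred kernel matrix, each entry can be computed without forming $\tilde{K}$ explicitly. Expanding $\tilde{K} = K - 1_m K - K 1_m + 1_m K 1_m$ for a symmetric $K$ gives $\tilde{k}(x_i, x_i) = k(x_i, x_i) - \frac{2}{m}\sum_j k(x_i, x_j) + \frac{1}{m^2}\sum_{j,l} k(x_j, x_l)$. The Python sketch below evaluates this formula; the function name `centered_kernel_diagonal` is our own (hypothetical), and the input is assumed to be a symmetric kernel matrix given as nested lists.

```python
from typing import List, Sequence


def centered_kernel_diagonal(K: Sequence[Sequence[float]]) -> List[float]:
    """Diagonal of K_tilde = K - 1_m K - K 1_m + 1_m K 1_m for a symmetric K.

    1_m is the m x m matrix whose entries are all 1/m. Only the diagonal is
    returned, because that is all the quarter-sphere problem (2) needs.
    """
    m = len(K)
    # (K 1_m)_ii is the mean of row i; for symmetric K it equals (1_m K)_ii.
    row_means = [sum(row) / m for row in K]
    # (1_m K 1_m)_ii is the same for every i: the mean of all entries of K.
    grand_mean = sum(row_means) / m
    return [K[i][i] - 2.0 * row_means[i] + grand_mean for i in range(m)]


if __name__ == "__main__":
    # Linear kernel k(x, y) = x * y on the 1-D points 0, 1, 5 (mean 2).
    points = [0.0, 1.0, 5.0]
    K = [[a * b for b in points] for a in points]
    # Squared distances to the mean are 4, 1 and 9.
    print(centered_kernel_diagonal(K))  # [4.0, 1.0, 9.0]
```

With a linear kernel the result is simply each point's squared distance to the sample mean, which makes the example easy to check by hand.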
After solving (2) for $\{\alpha_i\}$, the data vectors can be classified as follows. Data vectors with $\alpha_i = 0$ lie inside the hypersphere and are considered normal; their distances from the origin are smaller than the radius of the quarter-sphere. Data vectors with $\alpha_i = 1/(\nu m)$ lie outside the hypersphere and are considered anomalies. Data vectors with $0 < \alpha_i < 1/(\nu m)$ lie on the surface of the hypersphere and are called border support vectors. Moreover, the minimal radius $R$ of the hypersphere can be obtained as $R^2 = \tilde{k}(x_i, x_i)$ for any border support vector $x_i$.
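The classification rule above can be read directly off a dual solution. The sketch below assumes the $\alpha_i$ of (2) have already been obtained from some linear programming solver (not shown), and it uses a small numerical tolerance `tol`, which is our own assumption, to compare floating-point values against $0$ and $1/(\nu m)$. The name `classify_from_alpha` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def classify_from_alpha(
    alpha: Sequence[float],
    diag: Sequence[float],
    nu: float,
    tol: float = 1e-9,
) -> Tuple[List[str], Optional[float]]:
    """Label each data vector from its dual variable alpha_i of problem (2).

    diag[i] is the centred squared distance k~(x_i, x_i). Returns the labels
    and R^2 taken from a border support vector (None if there is none).
    """
    m = len(alpha)
    upper = 1.0 / (nu * m)  # box constraint: 0 <= alpha_i <= 1/(nu*m)
    labels: List[str] = []
    radius_sq: Optional[float] = None
    for a, d in zip(alpha, diag):
        if a <= tol:
            labels.append("normal")    # alpha_i = 0: inside the sphere
        elif a >= upper - tol:
            labels.append("anomaly")   # alpha_i = 1/(nu*m): outside
        else:
            labels.append("border")    # 0 < alpha_i < 1/(nu*m): on the surface
            radius_sq = d              # R^2 = k~(x_i, x_i) for a border SV
    return labels, radius_sq
```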

Lightweight Anomaly Detection Algorithm Using Sort.
In the previous subsection, we introduced the principle of QSSVM, which converts the quadratic optimization problem to a linear optimization problem. The learning process of QSSVM amounts to finding the minimal radius $R$ of the quarter-sphere for the training set, which is obtained from the distance between a border support vector and the origin in the feature space using $R^2 = \tilde{k}(x_i, x_i)$. This process has high computational and memory complexity because it requires solving the linear programme and then finding the data vectors with $0 < \alpha_i < 1/(\nu m)$, that is, the border support vectors. To reduce the cost of modeling QSSVM, we propose a lightweight anomaly detection algorithm that uses sorting to identify anomalies. Our algorithm obtains the minimal radius $R$ of QSSVM, that is, the distance $\sqrt{\tilde{k}(x_i, x_i)}$ between a data vector $x_i$ with $0 < \alpha_i < 1/(\nu m)$ and the origin, from a descending sorted sequence instead of from the solution of a linear optimization problem. In other words, our algorithm is equivalent to QSSVM but converts the linear optimization problem to a sort problem. To do so, we first centre the data vectors at the origin in the feature space by generating the kernel matrix and centering it as in QSSVM [5], which yields formulation (2). Note that the terms $k(x_i, x_i)$ of (2) have been replaced by the diagonal elements $\tilde{k}(x_i, x_i)$ of the centred kernel matrix $\tilde{K}$, as discussed in the previous subsection. Next, we need to find the minimal radius $R$ from (2). Calculating $R$ only requires finding a distance $\sqrt{\tilde{k}(x_i, x_i)}$ whose $\alpha_i$ satisfies (2) with $0 < \alpha_i < 1/(\nu m)$; the linear programme for (2) itself need not be solved. To find $R$, the sequence $\{\tilde{k}(x_i, x_i)\}$ is sorted in descending order, and for convenience we denote the $i$th element of the sorted sequence by $d_i$.
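International Journal of Distributed Sensor Networks

Consequently, the dual formulation (2) simplifies to

$$\max_{\alpha}\ \sum_{i=1}^{m} \alpha_i d_i \tag{4}$$

subject to

$$\sum_{i=1}^{m} \alpha_i = 1, \quad 0 \le \alpha_i \le \frac{1}{\nu m}, \quad i = 1, \dots, m,$$

where $d_1 \ge d_2 \ge \cdots \ge d_m$, $\nu \in (0, 1)$ is the regularization parameter that represents the fraction of outliers, and $m$ is the number of data vectors in the training set.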
Note that there is an implied constraint: $\alpha_i = \alpha_j$ whenever $d_i = d_j$. That is, data vectors at the same distance from the origin in the feature space are either all normal or all anomalous. We now prove that the minimal radius $R$ can be computed directly from the descending sequence $\{d_1, d_2, \dots, d_m\}$ through the parameters $\nu$ and $m$.
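Suppose that $(\alpha_1, \alpha_2, \dots, \alpha_m)$ is the solution of (4), so the objective function attains its maximum under the constraints; denote this maximum by $W_1 = \alpha_1 d_1 + \alpha_2 d_2 + \cdots + \alpha_m d_m$. First, we prove that $\alpha_1$ is the maximum of $\{\alpha_1, \alpha_2, \dots, \alpha_m\}$ if $d_1$ is the maximum of $\{d_1, d_2, \dots, d_m\}$. Assume, to the contrary, that $d_1$ is maximal but $\alpha_1$ is not. Then there exists $j$ such that $\alpha_j$ is the maximum of $\{\alpha_i\}$, with $\alpha_j > \alpha_1$. Exchanging the values of $\alpha_j$ and $\alpha_1$ keeps all constraints of (4) satisfied, and the objective function becomes

$$W_2 = W_1 - \alpha_1 d_1 - \alpha_j d_j + \alpha_j d_1 + \alpha_1 d_j.$$

Subtracting $W_1$ from $W_2$, we have

$$W_2 - W_1 = (\alpha_j - \alpha_1)(d_1 - d_j).$$

By assumption, $\alpha_j > \alpha_1$ and $d_1 \ge d_j$, so $W_2 - W_1 \ge 0$. Moreover, $d_1 \ne d_j$, because otherwise the implied constraint would give $\alpha_1 = \alpha_j$; hence $W_2 - W_1 > 0$. This contradicts the fact that $W_1$ is the maximum of the objective function. Thus $\alpha_1$ is the maximum of $\{\alpha_i \mid i = 1, 2, \dots, m\}$.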
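The exchange argument generalizes: moving weight from a smaller $d_j$ to a larger $d_i$ never decreases the objective of (4), so an optimum fills the box constraint $1/(\nu m)$ starting from the largest $d_i$ until the total weight reaches 1. A minimal Python sketch of this greedy assignment follows; it assumes `d_sorted` is already in descending order and $\nu \in (0, 1)$, and the function name is hypothetical.

```python
from typing import List, Sequence


def greedy_alpha(d_sorted: Sequence[float], nu: float) -> List[float]:
    """Optimal alpha for (4): fill 1/(nu*m) from the largest d_i downwards.

    Assumes d_sorted is in descending order and 0 < nu < 1, so that the
    total capacity m * 1/(nu*m) = 1/nu exceeds the required sum of 1.
    """
    m = len(d_sorted)
    upper = 1.0 / (nu * m)   # box constraint 0 <= alpha_i <= 1/(nu*m)
    remaining = 1.0          # equality constraint: sum_i alpha_i = 1
    alpha: List[float] = []
    for _ in d_sorted:
        a = max(0.0, min(upper, remaining))  # clamp guards against round-off
        alpha.append(a)
        remaining -= a
    return alpha


if __name__ == "__main__":
    d = [9.0, 4.0, 1.0, 0.5]
    alpha = greedy_alpha(d, nu=0.3)          # nu * m = 1.2
    objective = sum(a * x for a, x in zip(alpha, d))
    print(alpha, objective)
```

In this example the first vector receives the full $1/(\nu m)$, the second receives the remainder and is therefore the only border support vector, and the rest receive zero.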
Algorithm 1 (LADS), Step 4: if $\tilde{k}(x_i, x_i) > R^2$, then $x_i$ is an outlier; else $x_i$ is normal.

Theorem 4. The minimal radius $R$ of the quarter-sphere in QSSVM can be obtained by

$$R^2 = d_{(n+1)}, \quad n = \lfloor \nu m \rfloor, \tag{9}$$

where $d_{(n+1)}$ is the $(n+1)$th largest squared distance $\tilde{k}(x_i, x_i)$ between $x_i$ and the origin of its centred quarter-sphere in the feature space.
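A small worked example, with numbers chosen only for illustration (not taken from the experiments), shows how (9) arises. Take $m = 10$ and $\nu = 0.15$, and assume the sorted distances are distinct, $d_1 > d_2 > \cdots > d_{10}$. Filling the box constraint of (4) from the largest $d_i$, as the exchange argument above suggests, gives

$$
\begin{aligned}
\nu m &= 1.5, \qquad n = \lfloor \nu m \rfloor = 1, \qquad \frac{1}{\nu m} = \frac{2}{3},\\
\alpha_1 &= \frac{2}{3}, \qquad \alpha_2 = 1 - \frac{2}{3} = \frac{1}{3} \in \left(0, \frac{2}{3}\right), \qquad \alpha_3 = \cdots = \alpha_{10} = 0,\\
R^2 &= d_2 = d_{(n+1)}.
\end{aligned}
$$

Only $x_{(2)}$ is a border support vector, so $R^2 = d_2$ as (9) states, and the single vector with $\alpha_1 = 1/(\nu m)$ lies outside the quarter-sphere and is flagged as an anomaly.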
From Theorem 4, the minimal radius $R$ of the quarter-sphere in QSSVM can be obtained from a descending sorted sequence using (9), so the linear optimization problem of QSSVM is converted to a sort problem on $\{\tilde{k}(x_i, x_i)\}$. We now propose a lightweight anomaly detection algorithm using sorting (LADS), which is equivalent to QSSVM. The details of LADS are as follows.
After generating the kernel matrix and centering it, the data vectors are centred at the origin in the feature space, as in QSSVM. Their squared distances to the origin are sorted in descending order to form the sequence $\{\tilde{k}(x_{(i)}, x_{(i)})\}$, where $\tilde{k}(x_{(1)}, x_{(1)}) \ge \cdots \ge \tilde{k}(x_{(m)}, x_{(m)})$. The $(n+1)$th largest squared distance $\tilde{k}(x_{(n+1)}, x_{(n+1)})$, where $n = \lfloor \nu m \rfloor$, is chosen from this sequence as $R^2$ by (9). The data vectors are then classified according to $R$: data vectors whose distances to the origin are not larger than $R$ are considered normal, and data vectors whose distances are larger than $R$ are considered outliers. LADS is summarized in Algorithm 1.
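The LADS procedure can be sketched in a few lines of Python. This is our own minimal illustration, not a transcription of Algorithm 1: it assumes the centred squared distances $\tilde{k}(x_i, x_i)$ have already been computed (for example, from a centred RBF kernel matrix), and the name `lads` and its return format are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def lads(diag: Sequence[float], nu: float) -> Tuple[float, List[bool]]:
    """Sketch of LADS.

    diag[i] is the centred squared distance k~(x_i, x_i); 0 < nu < 1.
    Returns R^2 and a flag per data vector (True means outlier).
    """
    m = len(diag)
    n = math.floor(nu * m)                    # n = floor(nu * m), so n <= m - 1
    ordered = sorted(diag, reverse=True)      # descending sequence, O(m log m)
    radius_sq = ordered[n]                    # (n+1)-th largest: 0-based index n
    # Step 4: a vector is an outlier if its squared distance exceeds R^2.
    return radius_sq, [d > radius_sq for d in diag]


if __name__ == "__main__":
    diag = [0.2, 3.0, 0.5, 9.0, 0.1, 1.0, 0.3, 0.4, 0.6, 0.7]
    r2, outliers = lads(diag, nu=0.15)        # m = 10, n = 1
    print(r2, [i for i, flag in enumerate(outliers) if flag])  # prints: 3.0 [3]
```

A vector exactly at the threshold is treated as normal, matching the "not larger than $R$" rule in the text.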

Lightweight Anomaly Detection Algorithm Using Quick Select.
As discussed in the previous subsection, the minimal radius $R$ of the quarter-sphere in QSSVM can be obtained from the descending sequence $\{\tilde{k}(x_{(i)}, x_{(i)})\}$. The computational complexity of obtaining this descending sequence is $O(m \log_2 m)$. In fact, fully sorting the sequence is unnecessary, because $R$ can be determined by (9) once the $(\lfloor \nu m \rfloor + 1)$th largest element of the original sequence $\{\tilde{k}(x_i, x_i)\}$ is found. We therefore propose an anomaly detection method, LADQS, that uses the Quickselect algorithm [14] to find the $(\lfloor \nu m \rfloor + 1)$th largest distance to the origin, that is, the minimal radius $R$. In computer science, Quickselect is a selection algorithm that finds the $j$th largest element in an unordered list. Quickselect uses the same overall approach as quicksort [14]: it chooses one element as a pivot and partitions the data into two parts according to whether each element is less than or greater than the pivot. However, instead of recursing into both sides, as quicksort does, Quickselect recurses only into the side that contains the element it is searching for. This reduces the average computational complexity from $O(m \log_2 m)$ to $O(m)$, although the worst case remains $O(m^2)$. The pseudocode of the LADQS algorithm is shown in Algorithm 2.
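LADQS replaces the full sort with a selection step. The sketch below uses an iterative Quickselect with a random pivot and a Lomuto-style partition into descending order. These implementation choices, the fixed seed, and the names `kth_largest` and `ladqs` are our own assumptions rather than a transcription of Algorithm 2.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def kth_largest(values: Sequence[float], k: int,
                rng: Optional[random.Random] = None) -> float:
    """Return the k-th largest element (1-based) using iterative Quickselect."""
    if rng is None:
        rng = random.Random(0)           # fixed seed for reproducibility
    data = list(values)                  # work on a copy
    target = k - 1                       # 0-based position in descending order
    lo, hi = 0, len(data) - 1
    while lo < hi:
        # Move a random pivot to the end, then partition data[lo..hi] so that
        # elements larger than the pivot come first.
        p = rng.randint(lo, hi)
        data[p], data[hi] = data[hi], data[p]
        pivot, store = data[hi], lo
        for i in range(lo, hi):
            if data[i] > pivot:
                data[store], data[i] = data[i], data[store]
                store += 1
        data[store], data[hi] = data[hi], data[store]  # pivot to final place
        if target == store:
            return data[store]
        if target < store:
            hi = store - 1               # continue only in the left part
        else:
            lo = store + 1               # continue only in the right part
    return data[lo]


def ladqs(diag: Sequence[float], nu: float) -> Tuple[float, List[bool]]:
    """Sketch of LADQS: R^2 is the (floor(nu*m)+1)-th largest k~(x_i, x_i)."""
    n = math.floor(nu * len(diag))
    radius_sq = kth_largest(diag, n + 1)  # average O(m) instead of O(m log m)
    return radius_sq, [d > radius_sq for d in diag]
```

The loop narrows to one side after each partition, which is the single-sided recursion described above written as iteration; the classification step is identical to LADS.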

Experimental Results and Evaluation
This section evaluates the performance of our two techniques in comparison with QSSVM. In our experiments, we used real data gathered at Grand-Saint-Bernard [15], similar to the data used in [16]. For the simulations, we implemented our algorithms and QSSVM in Matlab on a single node of a WSN. For fairness, the reported experimental results are averages of the tests run on 7 different nodes.

Experimental Datasets.
The real data were collected from a closed neighborhood of a WSN deployed at Grand-Saint-Bernard, as shown in Figure 2. The closed neighborhood consists of node 2 and its 6 spatially neighboring nodes, namely, nodes 3, 4, 8, 12, 20, and 14. In our simulations, we test the real data collected between 6 am and 6 pm on September 20, 2007, with two attributes per sensor measurement: ambient temperature and relative humidity. The data are preprocessed and normalized to the range [0, 1]. The number of anomalous measurements is about 10% of the number of normal measurements. Measurements are labeled according to their degree of dissimilarity from one another.
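The section does not specify how the measurements were normalized. Per-attribute min-max scaling is one common way to map readings to [0, 1] and is assumed in this hypothetical sketch; the readings in the example are made up, not taken from the Grand-Saint-Bernard data.

```python
from typing import List, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every attribute (column) linearly to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        # A constant column carries no information; map it to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


if __name__ == "__main__":
    # Made-up (ambient temperature, relative humidity) pairs.
    readings = [[10.0, 50.0], [20.0, 100.0], [15.0, 75.0]]
    print(min_max_normalize(readings))  # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
```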

Experimental Results and Evaluation.
We choose the radial basis function (RBF) kernel to generate the kernel matrices. The RBF kernel function is

$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right),$$

where $\sigma$ is the width parameter of the kernel function. The kernel width parameter $\sigma$ is set to 0.25 in our experiments.
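A minimal Python sketch of building the RBF kernel matrix with $\sigma = 0.25$ follows. It assumes the kernel form $\exp(-\lVert x_i - x_j \rVert^2 / (2\sigma^2))$; some texts parameterize the width differently (for example, without the factor 2), so the exact scaling should be checked against the original implementation. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def rbf_kernel_matrix(X: Sequence[Sequence[float]],
                      sigma: float = 0.25) -> List[List[float]]:
    """K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) (assumed RBF form)."""
    m = len(X)
    K = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):  # the matrix is symmetric, so fill both halves
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = K[j][i] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return K
```

Every diagonal entry equals $\exp(0) = 1$, which is exactly why the kernel matrix has to be centred before (2) can distinguish between data vectors.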
We have examined the effect of the regularization parameter $\nu$ on our two anomaly detection algorithms and on QSSVM. $\nu$ represents the fraction of anomalies in the training set; we varied it from 0.01 to 0.25 in steps of 0.03. We also measured the training time of the three algorithms. Figures 3 and 4 show, respectively, the detection rate and the false alarm rate obtained by LADS with the RBF kernel on the real data. As discussed in Section 3, LADS, LADQS, and QSSVM follow the same classification principle and differ only in how they find the minimal radius $R$, so LADQS and QSSVM classify the data in the same way as LADS; their detection and false alarm results are therefore omitted. Figure 5 shows the training time of the three algorithms. The training time of LADS and LADQS is significantly lower than that of QSSVM, which indicates that our two algorithms have lower computational complexity than QSSVM.
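The section reports the detection rate and the false alarm rate without defining them. We assume the usual definitions, detection rate $= TP/(TP + FN)$ and false alarm rate $= FP/(FP + TN)$, with "anomaly" as the positive class. The hypothetical helper below computes both from boolean labels and also lists the $\nu$ grid described in the text.

```python
from typing import Sequence, Tuple


def detection_and_false_alarm_rate(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """True means 'anomaly'. Returns (detection rate, false alarm rate)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # anomalies caught
    fn = sum(1 for p, a in pairs if not p and a)      # anomalies missed
    fp = sum(1 for p, a in pairs if p and not a)      # normal data flagged
    tn = sum(1 for p, a in pairs if not p and not a)  # normal data accepted
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_alarm_rate


if __name__ == "__main__":
    # The nu grid described in the text: 0.01, 0.04, ..., 0.25.
    nus = [round(0.01 + 0.03 * i, 2) for i in range(9)]
    print(nus)
    print(detection_and_false_alarm_rate([True, True, False, False, True],
                                         [True, False, False, True, True]))
```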
Simulation results show that LADS and LADQS achieve lower computational complexity than QSSVM without reducing anomaly detection accuracy.
Table 1 presents the computational complexity of our techniques, where $m$ denotes the number of data vectors in the training set, $p$ the dimensionality of the measurements, $\nu$ the fraction of anomalies in the training set, and the linear-programming term the computational complexity of solving a linear optimization problem.

Conclusion
In this paper we propose two lightweight anomaly detection algorithms for WSNs, LADS and LADQS. Both algorithms are based on QSSVM but convert its linear optimization problem to a sort problem. Simulation results show that our algorithms reduce the computational complexity while achieving the same anomaly detection accuracy. Our future research includes selecting the optimal value of $\nu$ and implementing our algorithms on multiple sensor nodes in real-life deployments.