Data Mining Techniques for Wireless Sensor Networks: A Survey



Introduction
Advances in wireless communication and microelectronic devices have led to the development of low-power sensors and the deployment of large-scale sensor networks. With their capability for pervasive surveillance, sensor networks have attracted significant attention in many application domains, such as habitat monitoring [1,2], object tracking [3,4], environment monitoring [5][6][7], military [8,9], disaster management [10], and smart environments. In these applications, real-time and reliable monitoring is an essential requirement. These applications yield huge volumes of dynamic, geographically distributed, and heterogeneous data. This raw data, if efficiently analyzed and transformed into usable information through data mining, can facilitate automated or human-induced tactical and strategic decisions. Therefore, it is essential to develop techniques that mine sensor data for patterns in order to make intelligent decisions promptly.
Recently, extracting knowledge from sensor data has received a great deal of attention from the data mining community. Different approaches focusing on clustering [11][12][13][14], association rules [15,16], frequent patterns [17][18][19][20], sequential patterns [21][22][23], and classification [24][25][26] have been successfully used on sensor data. However, the design and deployment of sensor networks create unique research challenges due to their large size (up to thousands of sensor nodes), random and hazardous deployment, lossy communication environment, limited power supply, and high failure rate. These challenges make traditional mining techniques inapplicable because traditional mining is centralized and computationally expensive, and it focuses on disk-resident transactional data. As a result, new algorithms have been created, and some existing data mining algorithms have been modified to handle the data generated by sensor networks. A plethora of knowledge discovery methodologies, techniques, and algorithms have been proposed during the last ten years.
For example, a considerable amount of work on outlier detection in WSNs is presented in [27][28][29]. Most of the techniques examined in [27,28] rely heavily on data mining techniques, but their focus is the detection of irregularities in WSNs data rather than information extraction and analysis. The survey in [29] presents anomaly detection in multiple domains using data mining as well as statistical, information-theoretic, and spectral techniques. Since data mining is a broad discipline and can be applied to data from any domain, more general surveys on data mining techniques are also available, such as [30], where the authors examined machine learning and data mining techniques for analyzing medical data. The classification of data mining techniques in this survey is based on frequent pattern mining, clustering, and classification, and plenty of surveys are available on each of these techniques. For example, frequent pattern mining over data streams is presented in [31,32]. Surveys on clustering algorithms for WSNs are presented in [33,34]. The clustering techniques examined in those papers focus exclusively on the architecture and management of the network rather than on information discovery. A survey on classification methods over data streams is given in [35], where the author examined conventional classification techniques over data streams.
However, none of the above surveys examined data mining techniques that focus on information extraction and analysis from WSNs data. In comparison with the above-mentioned surveys, this paper examines algorithms and approaches specially designed for WSNs data, which not only leads to a different classification, evaluation, and discussion across domains but also presents different choices of solution. We examine how data mining algorithms can be utilized to make sensor network applications intelligent. The research method consists of a review of data mining techniques for WSNs, such as frequent pattern mining, sequential pattern mining, clustering, and classification. A problem-based taxonomy is presented to classify and compare existing data mining techniques adopted for WSNs. In addition, an evaluation of each technique is presented. Based on the limitations of existing techniques and the special characteristics of WSNs, we propose a new hybrid data mining architecture for WSNs, which combines offline learning with distributed and online data processing.
The rest of the paper is organized as follows. After the introduction in Section 1, Section 2 discusses how the traditional data mining process differs from the data mining process in WSNs, together with the challenges of mining WSNs data. In Section 3, a taxonomy for categorizing the existing data mining techniques for WSNs is presented. In Section 4, we analyze a collection of published studies using the taxonomy framework. The comparison of data mining techniques for WSNs is presented in Section 5. The limitations of this work are given in Section 6, and future research directions are presented in Section 7. Finally, the paper ends with the conclusion in Section 8.

Fundamentals of Data Mining in WSNs
2.1. Data Mining Process in WSNs. Data mining in sensor networks is the process of extracting application-oriented models and patterns with acceptable accuracy from a continuous, rapid, and possibly unbounded flow of data streams from sensor networks. In this case, the whole data cannot be stored and must be processed immediately. A data mining algorithm has to be sufficiently fast to process data arriving at high speed. Conventional data mining algorithms are meant to handle static data and use multistep techniques and multiscan mining algorithms for analyzing static data-sets. Therefore, conventional data mining techniques are not suitable for handling the massive quantity, high dimensionality, and distributed nature of the data generated by WSNs. Table 1 summarizes the differences between the traditional and the WSNs data mining processes.
It can be observed from Table 1 that traditional data mining is centralized, computationally expensive, and focused on disk-resident transactional data. It collects data directly at a central site that is not bounded by computational resources. In comparison with traditional data-sets, WSNs data flows continuously through systems with varying update rates. Due to its huge volume and the high storage cost, it is impossible to store the entire WSNs data or to scan through it multiple times. These characteristics of sensor data and the special design issues of sensor networks make it challenging to apply traditional data mining techniques. Hence, it is crucial to develop data mining techniques that can analyze and process WSNs data in a multidimensional, multilevel, single-pass, and online manner. To address these challenges, researchers have modified conventional data mining techniques and also proposed new data mining algorithms to handle the data generated by sensor networks. In the following section, we provide a taxonomy of these data mining techniques based on the discipline from which they adopt their ideas.
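As a small illustration of the single-pass, online requirement discussed above, the following sketch maintains the mean and variance of a sensor stream without storing any past reading. It uses Welford's update rule, which is a standard technique and is not taken from the surveyed papers; the class name `RunningStats` and the sample readings are hypothetical.

```python
class RunningStats:
    """One-pass mean and (population) variance of a stream of readings."""

    def __init__(self):
        self.n = 0        # number of readings seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        # Each reading is touched exactly once and can be discarded afterwards.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n if self.n else 0.0


stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:  # hypothetical sensor readings
    stats.update(reading)
print(stats.mean, stats.variance)
```

The memory footprint is three numbers regardless of stream length, which is the property that multiscan algorithms over stored data-sets lack.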

Taxonomy of Data Mining Techniques for WSNs
In this section, a classification scheme for existing approaches designed for mining WSNs data is presented. The first level of classification follows the data mining technique itself: frequent pattern mining, sequential pattern mining, clustering, or classification. The second level of classification is based upon each approach's ability to process data in a centralized or distributed manner. Since WSNs nodes are limited in resources such as power, computation, bandwidth, and memory, an approach meant for distributed processing requires one-pass algorithms that complete part of the data mining locally and then aggregate the results. The objective of distributed approaches is to limit the number of messages and the communication energy that sensor nodes spend while transferring data to the central server. This also helps to improve the WSNs lifetime and to extract the maximum amount of data from the environment. In centralized processing, by contrast, data from the entire network is collected and stored at a central server for analysis. Since the central server is rich in resources, there are no such constraints on choosing an accurate algorithm. Researchers generally discourage this approach because it generates a huge amount of data flow and communication, which can create bottlenecks and waste communication bandwidth. These two data processing/storage architectures have a large impact on the type of data mining algorithm to choose; therefore, one has to decide on the processing/storage architecture before choosing the data mining algorithm for a WSNs application.
The third level of classification is selected according to the attitude towards solving a specific problem. Research in the WSNs area has focused on two separate kinds of issues, namely, WSNs performance issues and application issues. As WSNs nodes are usually constrained in resources such as energy, communication bandwidth, and memory, resource-aware algorithms are needed to maximize WSNs performance. On the other hand, a WSNs application requires data precision and accuracy, fault tolerance, event prediction, scalability, and robustness, and it often needs abundant use of energy, communication, and redundancy. This leads to a resource tradeoff: either one sacrifices the application's performance in favor of network efficiency, or one aims for the best application performance and deals with network resource issues such as energy in some other way (a larger battery or renewable sources on the nodes). For this reason, WSNs performance-oriented or application-oriented approaches have been selected as the lowest-level classification criterion. The taxonomy of data mining techniques for WSNs is presented in Figure 1.

State of the Art of Data Mining Techniques for WSNs
In this section, data mining techniques designed for WSNs are classified using the taxonomy framework presented in Section 3, and the characteristics and performance analysis of each technique are discussed.

Frequent Pattern Mining.
In this section, we review some of the works that have been proposed for mining frequent patterns from WSNs data. Frequent pattern mining is used to find groups of variables that co-occur frequently in the data-set. The aim is to find the most interesting relations between variables. Traditional frequent pattern mining algorithms [36][37][38][39] are CPU and I/O intensive, which makes it very expensive to mine dynamic WSNs data. Unlike mining a static database, the dynamic nature of WSNs data has led to the study of online mining of frequent itemsets. As a result, traditional frequent pattern mining algorithms have been modified according to the nature of WSNs data. The basic frequent pattern mining technique is association rule mining. The first known association rule mining algorithm is Apriori [40]. It is based on a levelwise candidate generation-and-test methodology that makes several scans over the database. In each iteration, the patterns found to be frequent are used to generate possible frequent patterns (the candidates) to be counted in the next iteration. Therefore, the Apriori technique finds the frequent patterns of length $k$ from the set of already generated candidate patterns of length $k - 1$. In the subsequent step, the association rules are generated by computing the support and confidence of each frequent item in a given database $D$, which are defined as follows:
$$\mathrm{Support}(X) = \frac{\mathrm{Sup}(X)}{|D|},$$
where $\mathrm{Sup}(X)$ is the number of occurrences of $X$ in database $D$. Consider the following:
$$\mathrm{Confidence}(X \Rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)}.$$
This is impractical in the context of sensor networks, as it implies that all data has to be stored somewhere. However, recently, there has been a growing amount of work on discovering frequent item-sets from a data stream of transactions such that every transaction is considered only once and can be deleted afterwards.
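The levelwise generate-and-test idea described above can be sketched as follows. This is a minimal textbook-style illustration over an in-memory list of transactions, not the implementation from [40]; the function names, the sensor identifiers, and the use of an absolute support count as the threshold are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every itemset seen >= min_count times."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items with one scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join: combine frequent (k-1)-itemsets into candidates of length k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Test: one more scan over the database to count the candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


def confidence(freq, antecedent, consequent):
    """Confidence of antecedent => consequent from the mined support counts."""
    a = frozenset(antecedent)
    return freq[a | frozenset(consequent)] / freq[a]


# Hypothetical epochs: each set lists the sensors that detected an event.
epochs = [{"s1", "s2", "s3"}, {"s1", "s2"}, {"s1", "s3"},
          {"s2", "s3"}, {"s1", "s2", "s3"}]
freq = apriori(epochs, min_count=3)
print(confidence(freq, {"s1"}, {"s2"}))
```

Note that every level requires another pass over `transactions`, which is exactly the property that makes the approach hard to apply to an unbounded sensor stream.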
The other basic approach to mining association rules is FP-growth [41], which discovers frequent patterns while reducing the number of database scans to two and eliminating the candidate generation required by Apriori. With the first database scan, the algorithm finds the set of distinct items with their respective support counts (i.e., frequencies) in the database. Then, with the second database scan, the algorithm summarizes the database in the form of a frequency-descending tree (i.e., the FP-tree). The complete set of frequent patterns is then mined from the FP-tree by recursively applying a divide-and-conquer-based pattern growth approach, called the FP-growth algorithm, without any additional database scan. The highly compact FP-tree structure opened a new line of research in mining frequent patterns. However, the static nature of the FP-tree and the two database scans still limit its applicability to frequent pattern mining over WSNs data. Recently, several centralized and distributed solutions have been proposed with the aim of maximizing the WSNs' performance or the application-based performance by applying Apriori-like and FP-growth methods to WSNs data.
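The two-scan construction of the FP-tree can be sketched as below. The sketch covers only tree construction, not the recursive FP-growth mining step, and it is an illustration rather than the code of [41]; the class `FPNode`, the alphabetical tie-breaking rule, and the sample transactions are assumptions.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_count):
    # Scan 1: support count of every distinct item.
    freq = Counter(item for t in transactions for item in set(t))
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root = FPNode(None)
    # Scan 2: insert each transaction with items in frequency-descending order
    # (ties broken alphabetically here, an arbitrary choice for determinism).
    for t in transactions:
        items = sorted((i for i in set(t) if i in freq),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, freq


root, freq = build_fp_tree([["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]],
                           min_count=2)
print(sorted(root.children), root.children["b"].count)
```

Because the most frequent items sit near the root, transactions share long prefixes and the tree stays compact; the cost is that the item order must be known before the second scan starts.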

Centralized Approaches Aim to Solve WSNs' Application-Based Issues.
Halatchev and Gruenwald [42] proposed a centralized methodology called data stream association rule mining (DSARM) to identify missing sensor readings. It uses an association rule mining algorithm to identify sensors that report the same data a number of times in a sliding window, called related sensors, and then estimates the missing data of a sensor by using the data reported by its related sensors. Due to the stream nature of sensor data, applying an association mining algorithm such as Apriori directly to sensor data is not possible. This situation led the authors to propose the DSARM framework, which adapts the Apriori algorithm to make it applicable to the data stream received from sensor nodes. The technique is evaluated by simulation experiments on real data collected by the Department of Transportation in Austin, TX, USA, to estimate missing values in related data streams. Performance evaluations were conducted to compare DSARM with alternative approaches. The results show that, although DSARM requires more memory space and takes longer to produce an estimate than the considered alternatives, it achieves better accuracy of the estimated value. However, DSARM has some limitations. First, it is based on association rule mining over frequent itemsets of size two, which means that it can discover relationships only between two sensors and ignores the cases where missing values are related to multiple sensors. Second, it finds those relationships only when both sensors report the same value and ignores the cases where missing values can be estimated from relationships between sensors that report different values.
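The related-sensor idea can be illustrated with a deliberately simplified sketch. It is not the DSARM algorithm of [42]: it only counts pairwise agreements inside a fixed-size sliding window and lets the related sensors vote on a missing value. The class name, the `min_agree` threshold, and the weighting by agreement count are assumptions made for the example.

```python
from collections import Counter, deque


class RelatedSensorEstimator:
    def __init__(self, window_size, min_agree):
        self.window = deque(maxlen=window_size)  # most recent rounds only
        self.min_agree = min_agree

    def add_round(self, readings):
        """readings: dict sensor_id -> value; a missing sensor is simply absent."""
        self.window.append(dict(readings))

    def related(self, sensor):
        """Sensors that reported the same value as `sensor` often enough."""
        agree = Counter()
        for r in self.window:
            if sensor not in r:
                continue
            for other, value in r.items():
                if other != sensor and value == r[sensor]:
                    agree[other] += 1
        return {s: c for s, c in agree.items() if c >= self.min_agree}

    def estimate(self, sensor, current):
        """Estimate the missing reading of `sensor` from the current round."""
        votes = Counter()
        for other, weight in self.related(sensor).items():
            if other in current:
                votes[current[other]] += weight
        return votes.most_common(1)[0][0] if votes else None


est = RelatedSensorEstimator(window_size=4, min_agree=2)
for rnd in [{"a": 1, "b": 1, "c": 0}, {"a": 0, "b": 0, "c": 0}, {"a": 1, "b": 1, "c": 0}]:
    est.add_round(rnd)
print(est.estimate("a", {"b": 1, "c": 0}))  # sensor "a" is missing in this round
```

Like the pairwise limitation noted above, this sketch only exploits sensors that report identical values.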
Jiang and Gruenwald [43,44] proposed a data estimation technique called CARM (closed item-sets-based association rule mining), which can derive the most recent association rules between the sensors in the current sliding window. The technique is based on a closed frequent item-sets mining algorithm for data streams called CFI-stream [45]. It maintains an in-memory data structure called the direct update (DIU) tree to store closed item-sets. When a new transaction arrives, the algorithm checks each item-set in the transaction against the data stream sliding window online and incrementally updates the closed item-sets' supports. If CARM finds missing values in the sensor readings, instead of generating all possible association rules, it generates only the rules that have strong relationships with the current round of sensor readings in which one or more readings are missing. Based on these rules and the selected closed item-sets, CARM generates the estimated values, which contain item values that are not included in the original readings. Figure 2, redrawn from [43], shows the DIU tree after receiving the first four transactions. It shows that there are currently four closed item-sets in the DIU tree, C, AB, CD, and ABC, and that their associated supports, shown at the upper-right corner, are 3, 3, 1, and 2. A basic set of rules is generated from these frequent item-sets. All other rules can be inferred from this basic rule set.

Centralized Approaches Aim to Maximize WSNs' Performance.
Loo et al. [46] proposed online one-pass algorithms for mining large sensor streams. They mine frequent value sets from sensor stream data by transforming the stream data into an interval list (IL) under the lossy counting framework [47]. Time is divided into equal-size intervals, and a snapshot of the sensor readings is taken whenever a sensor reading is updated. The sensors' values at that snapshot form the value sets stored in the database. An Apriori-based strategy is used to mine the value sets. The analysis of the IL-based (ILB) representation of stream data showed favorable results on a synthetic data-set. However, while computing the IL of a candidate value set, redundant intersection of ILs is inevitable, which affects performance in terms of time and computation cost. The proposed technique is evaluated by comparing the performance of ILB against an application of lossy counting (LC) using a weighted transformation method on a synthetic data-set. According to their experiments, ILB outperforms LC significantly for large sensor networks. Moreover, both the processing time and the memory consumption of ILB are more stable than those of LC.
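For readers unfamiliar with the lossy counting framework [47] mentioned above, the following sketch shows its basic form for single values. It is not the interval-list method of Loo et al. [46]; the class name and the sample stream are hypothetical, and the error bound `epsilon` and query threshold `support` follow the usual formulation of lossy counting.

```python
import math


class LossyCounter:
    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # readings seen so far
        self.entries = {}                    # value -> [count, max undercount]

    def add(self, value):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if value in self.entries:
            self.entries[value][0] += 1
        else:
            self.entries[value] = [1, bucket - 1]
        # At each bucket boundary, drop values that cannot be frequent.
        if self.n % self.width == 0:
            self.entries = {v: e for v, e in self.entries.items()
                            if e[0] + e[1] > bucket}

    def frequent(self, support):
        """Values whose true frequency may reach support * n."""
        threshold = (support - self.epsilon) * self.n
        return {v: e[0] for v, e in self.entries.items() if e[0] >= threshold}


lc = LossyCounter(epsilon=0.25)
for reading in ["a", "a", "b", "a", "c", "a", "a", "d"]:
    lc.add(reading)
print(lc.frequent(support=0.5))
```

The counter stores at most a bounded number of entries, and each reading is processed once, which is why the framework suits sensor streams.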
Chong et al. [48] proposed a rule-learning model that finds strong rules from sensor readings. The rules are used as triggers to control sensor network operations; for example, they can be used to put sensors to sleep or to reduce data transmission in order to conserve energy. To mine the rules, Apriori is modified to count the number of transactions that are frequent instead of the item-sets within transactions, and transactions are processed in batches $B_1, B_2, \ldots, B_n$. Suppose that there is a node $N$ that collects light, temperature, and microphone readings from three other sensor streams $S_0$, $S_1$, and $S_2$. Initially, $N$ is queried to collect all sensory values, which are used to generate a rule of the form $S_i$ implies $S_{i-1}$; once the rule is extracted, only $S_i$ is sent to the base station. Upon receiving the reading $S_i$ and utilizing knowledge of the rule, the reading of $S_{i-1}$ can be inferred. All extracted rules are stored in a rule repository. The proposed method is validated by a simulation implemented in C on a synthetic data-set. In the experiment, the first correlated data received from the sensors is used to extract rules. In the subsequent phase, these rules are used to infer the sensor readings for the next round.
Tanbeer et al. [49] proposed a tree-based data structure called the sensor pattern tree (SP-tree) to generate association rules from WSNs data with one database scan. The main idea of the proposed approach is to obtain the frequency of all event-detecting sensors' data, construct a prefix tree based on that data in any canonical order, and then reorganize the tree in frequency-descending order. Through the reorganization, the SP-tree keeps the frequently event-detecting sensor nodes in the upper part of the tree, which in turn provides high compactness in the tree structure. Once the SP-tree is constructed, the FP-growth mining technique is applied to find the frequent event-detecting sensor sets. Experiments are performed to verify the improvement in memory consumption and runtime that the SP-tree achieves over PLT [50]. The experiments show that the SP-tree outperforms PLT in time and memory consumption. The reason for this gain is twofold: first, the PLT construction requires two database scans, while the SP-tree is constructed by scanning the database only once; second, the mining phase of the SP-tree is highly efficient due to the frequency-descending tree structure.
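The build-then-reorganize idea can be sketched as follows. This is a simplified illustration, not the restructuring procedure of [49]: the tree is built in one scan in alphabetical (canonical) order and then rebuilt by re-inserting its paths in frequency-descending order. The dictionary-based node layout and all function names are assumptions.

```python
from collections import Counter


def insert(tree, items, count=1):
    """Insert a path into a prefix tree stored as nested dicts."""
    node = tree
    for item in items:
        child = node.setdefault(item, {"count": 0, "children": {}})
        child["count"] += count
        node = child["children"]


def paths(tree, prefix=()):
    """Yield (path, number of transactions ending exactly at that path)."""
    for item, child in tree.items():
        path = prefix + (item,)
        ended = child["count"] - sum(c["count"] for c in child["children"].values())
        if ended:
            yield path, ended
        yield from paths(child["children"], path)


def build_sp_tree(epochs):
    tree, freq = {}, Counter()
    for sensors in epochs:            # the single database scan
        items = sorted(set(sensors))  # canonical (alphabetical) order
        freq.update(items)
        insert(tree, items)
    reorganized = {}
    for path, count in paths(tree):   # restructuring works on the tree only
        ordered = sorted(path, key=lambda s: (-freq[s], s))
        insert(reorganized, ordered, count)
    return reorganized, freq


sp_tree, freq = build_sp_tree([["s1", "s2"], ["s2", "s3"], ["s1", "s2", "s3"], ["s2"]])
print(list(sp_tree), sp_tree["s2"]["count"])
```

In this toy run the canonical tree has two branches under the root, while the reorganized tree has one, which illustrates the compactness gained by moving frequent sensors towards the root.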

Distributed Approaches Aim to Solve WSNs' Application-Based Issues.
Romer [51] proposed an in-network data mining technique to discover frequent patterns of events with certain spatial and temporal properties. In this approach, the user specifies the upper bounds maxscope and maxhistory (a variable measured in seconds) for the patterns of interest. Each node in the network collects the events from its neighbors within the maximum scope and keeps a history of their events for the duration of the maximum history. After that, each node applies a mining algorithm to discover the local frequent patterns that satisfy the given parameters. The resulting frequent patterns are converted to association rules that describe an event of type $e$ that occurs at node $n$ with support $s$ and confidence $c$. Local patterns are sent to the sink, where secondary mining is performed to compute the global picture of the entire network. The algorithm is implemented on the BT node (Bluetooth radio) platform [52], and the tradeoff between the scope of the query and resource consumption is evaluated on a real data-set. The results show that reducing the scope of the query decreases resource consumption. The major issues in this approach are the memory consumption of the itemset discovery algorithms and the communication overhead of event collection.
The work in [15] presented a distributed data extraction methodology that aggregates data on the sensor node, which reduces the number of messages during transmission. The distributed solution sends parameters such as support, time-slot size, and historic period from the sink to all nodes within the network. Each sensor node has its own buffer, with one entry per time slot. After each time slot, a node checks whether any messages were received during this time slot; if so, the node sets the corresponding buffer entry. When the historic period ends, each node traverses its buffer; if the number of set entries is greater than or equal to the support value provided initially, the message is transferred to the sink. To evaluate the validity of the distributed approach, it is compared with the centralized methodology on a real data-set. Two experiments were conducted using historic periods of 5 and 10 days, with minimum support values ranging from 10% to 90% and a time-slot size of 30 seconds. All of the reported results show a reduction in the number of messages and in the data size as the support value increases. The major issues in this methodology are the increased cost of the node buffer and the delay of crucial messages when the support value is high.
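The per-node buffering scheme described above can be sketched as follows. This is an illustration of the description only, not code from [15]; the class name `NodeBuffer`, the representation of support as a fraction of the time slots, and the slot indices are assumptions.

```python
class NodeBuffer:
    """One entry per time slot; report to the sink only if the node was active
    in at least `min_support` (a fraction) of the slots in the historic period."""

    def __init__(self, num_slots, min_support):
        self.entries = [False] * num_slots
        self.min_support = min_support

    def end_of_slot(self, slot, message_received):
        # Set the buffer entry if any message arrived during this time slot.
        if message_received:
            self.entries[slot] = True

    def should_report(self):
        # Called once, when the historic period has ended.
        return sum(self.entries) >= self.min_support * len(self.entries)


node = NodeBuffer(num_slots=4, min_support=0.5)
for slot, active in enumerate([True, False, True, False]):
    node.end_of_slot(slot, active)
print(node.should_report())
```

A node that stays below the threshold sends nothing, which is where the message savings come from; the same mechanism also explains why a high support value can suppress or delay crucial messages.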

Distributed Approaches Aim to Maximize WSNs' Performance.
Boukerche and Samarah [50] proposed the positional lexicographic tree (PLT) structure for mining association rules in which the event-detecting sensors are the main objects of the rules regardless of their values. Similar to the FP-growth approach, PLT follows a pattern growth mining technique. The mining begins with the sensor having the maximum rank and generates the frequent patterns from its PLT recursively. Computation is required at each recursion to update the PLT involved in the prefix part of a pattern. Therefore, the requirement of two database scans and the additional PLT update operations during mining limit the efficient use of this approach in handling WSNs data. The performance evaluation is done by comparing the PLT structure with the FP-growth algorithm. According to their results, the PLT structure outperforms FP-growth in terms of CPU time and memory usage for all of the support values used; the performance gain of PLT over FP-growth ranges from 30 percent to 50 percent.

Sequential Pattern Mining (SPM).
Frequent pattern mining has been extended to find more complex structures, as in sequential pattern mining. It discovers frequent subsequences as patterns in a sequence database. A sequence database stores a number of records, where all records are sequences of ordered events, with or without concrete notions of time. A large number of real-world domains, such as user profiling, medicine, local weather forecasting, and bioinformatics, show an inherent tendency to be modeled by means of sequences of events or objects related to each other. This great variety of applications makes sequential pattern mining one of the central topics in WSNs data mining, as shown by the research efforts produced in recent years. Sequential pattern mining techniques in sensor networks are either based on traditional sequential mining algorithms, such as the Apriori-like algorithm [53], the Apriori-based methods GSP [54] and PSP [55], and the pattern growth approaches FreeSpan and PrefixSpan [56,57], or are new algorithms devised specifically for the sensor network environment.

Centralized Approaches Aim to Solve WSNs' Application-Based Issues.
Esposito et al. [58,59] presented a multidimensional relational sequence mining framework to identify hidden frequent temporal correlations between sensor nodes. The algorithm is based on a generic levelwise search method called APRIORI [60] for discovering correlated sensors. The framework exploits a relational language to describe the temporal evolution of a sensor network along with contextual information, and it works in two phases. First, an abstraction step segments and labels the real-valued time series into similar subsequences by using a kernel density estimator approach. Then, the knowledge is enriched by adding interval-based operators between the subsequences obtained in the discretization step, and the relational pattern mining algorithm is extended in order to deal with these new operators. By taking into account the interval-based temporal data along with contextual information about events, it discovers interesting and more human-readable patterns. The framework is evaluated on a real data-set collected from a wireless sensor network made up of 54 Mica2Dot [61] sensors deployed in the Intel Berkeley Research Lab [62]. Each sensor collected topology information, along with humidity, temperature, light, and voltage values, once every 31 seconds. The results show strong correlations among some measurements, which is useful for anomaly detection.
Cook et al. [21] presented the MavHome smart home architecture, which focuses on the creation of an intelligent home that perceives the state of the home through sensors and acts upon the environment through device controllers. An important characteristic of the proposed architecture is the ability to make decisions based on predicted activities. To predict the activities, an algorithm called episode discovery (ED) is proposed, which is based on the work of Srikant and Agrawal [54] for mining sequential patterns from time-ordered transactions. Values that can be predicted include the usage patterns of devices in the home, the movement patterns of the inhabitants, and the typical activities of the inhabitants. Prediction algorithms are applied to the action sequences stored in the inhabitant event history to forecast user actions. Actions can then be automated based on the significance of the mined patterns as well as the predictive accuracy of the next event.
A key disadvantage is that the entire action history must be stored and processed offline, which is not practical for large prediction tasks over a long period of time. Cook et al. demonstrated the effectiveness of MavHome on synthetic smart home data and on real data collected by students using X10 controllers in their homes. The experiments show a predictive accuracy as high as 53.4% on the real data and 94.4% on the synthetic data.
Rabatel et al. [22] presented a strategy to detect anomalies in sensor data in order to improve railway maintenance. They extract sequential patterns from real railway data and identify abnormal behavior. Based on these abnormal findings, alarms are automatically triggered to notify potential failures. This abnormal behavior depends on environmental (weather conditions, travel characteristics) and structural (route, episode index in the route) changes in the data. The PSP [55] algorithm is used to identify the sequential patterns. To handle the environmental conditions, a contextual knowledge-based method is proposed, which is able to provide information on the seriousness and possible causes of a deviation. The proposed technique helps in the proactive maintenance of trains. However, the real-time context could be improved by providing more precise information for anomaly detection.
Guralnik and Haigh [23] use sequential pattern mining to learn the typical behaviors of humans in their homes. Human behavior is inferred by using motion sensors, pressure pads, door latch sensors, and toilet flush sensors. They installed 10-20 sensors of different types in a home and built models of which sensor firings correspond to which activities, in what order, and at what time. For example, "In 60% of the days, the Kitchen-Motion sensor fires between 18h00 and 18h30, and then the Living-Room-Motion sensor fires between 18h20 and 20h00, and then the Bedroom-Motion sensor fires between 19h45 and 22h00." Their algorithm uses these data, together with domain knowledge, to extract the sequences of rooms in which the person was active. These sequences are then analyzed by a human expert to identify complex behavior models. These models can be used to select the appropriate response plan to the actions of elderly people.
Wu et al. [63] proposed a new algorithm for mining sequential alarm patterns (MSAPs) from the alarm data generated by a GSM system. Sequential events are identified from alarm data by defining a time interval between adjacent events. For example, if the time is set to six hours, then the sequential alarm pattern $(a, b, c)$ indicates that $a$, $b$, and $c$ happen in order and that the time intervals between $a$ and $b$ and between $b$ and $c$ are each less than six hours. An example of a sequential alarm sequence, redrawn from [63], is shown in Figure 3.
The number in each circle represents the error ID, and $t_{i,j}$ denotes the time difference between alarm event $i$ and alarm event $j$. The extracted knowledge is useful not only for identifying the relevance between two events but also for predicting the alarm sequence and taking proper steps to prevent the occurrence of the alarms if at all possible. For example, if the network operator detects alarm $i$ occurring at time $t$, the operator should dissipate this alarm before time $t + t_{i,j}$ to alleviate the abnormal situation incurred. The limitation of this technique is that it cannot discover other possible time-interval patterns between the events.
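The time-interval constraint on sequential alarm patterns can be made concrete with a short sketch. It only counts the occurrences of a given pattern in a time-stamped alarm log and is not the MSAP mining algorithm of [63]; the function names, the alarm identifiers, and the log (with times in hours) are hypothetical.

```python
def matches_from(events, start, pattern, max_gap):
    """True if pattern[1:] can follow events[start] with every gap < max_gap."""
    if len(pattern) == 1:
        return True
    t0 = events[start][0]
    for j in range(start + 1, len(events)):
        t, alarm = events[j]
        if t - t0 >= max_gap:
            break  # events are time-sorted, so later ones are too far away
        if alarm == pattern[1] and matches_from(events, j, pattern[1:], max_gap):
            return True
    return False


def count_pattern(events, pattern, max_gap):
    """Number of start events from which the whole pattern occurs in order."""
    events = sorted(events)
    return sum(1 for i, (_, alarm) in enumerate(events)
               if alarm == pattern[0] and matches_from(events, i, pattern, max_gap))


# Hypothetical alarm log: (time in hours, alarm ID).
log = [(0, "A"), (2, "B"), (5, "C"), (10, "A"), (17, "B"), (18, "C")]
print(count_pattern(log, ("A", "B", "C"), max_gap=6))
```

In this log the second "A" is followed by "B" only after seven hours, so it does not start an occurrence when the interval is set to six hours.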
It is observed that there are no centralized solutions that aim to maximize the WSNs' performance.
The work in [64] proposed an object tracking strategy named multilevel object tracking (MLOT) to discover sequential patterns in object tracking sensor networks (OTSNs) by mining the movement log in sensor networks. A multilevel hierarchical structure is adopted by using a clustering mechanism that represents the hierarchical relations among sensor nodes, with the goal of keeping track of moving objects in real time. The movement logs of the moving objects are analyzed by a data mining algorithm called movement pattern generation (MPG) to obtain the movement patterns, which are then used to predict the next position of a moving object and to activate the fewest sensor nodes. MPG is based on Apriori; it uses the frequency of an inference pattern to evaluate the confidence of the pattern, and the pattern with the highest frequency serves as the basis of the prediction.
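The prediction step, in which the most frequent continuation of an observed movement serves as the prediction, can be sketched as follows. This is a simple frequency table over fixed-length prefixes, not the MPG algorithm of [64]; the function names, the prefix length `order`, and the movement logs are assumptions.

```python
from collections import Counter, defaultdict


def learn_rules(paths, order=2):
    """Count, for each run of `order` visited nodes, which node came next."""
    rules = defaultdict(Counter)
    for path in paths:
        for i in range(len(path) - order):
            rules[tuple(path[i:i + order])][path[i + order]] += 1
    return rules


def predict_next(rules, recent):
    """Return (most frequent next node, its confidence) for the recent nodes."""
    counts = rules.get(tuple(recent))
    if not counts:
        return None, 0.0
    node, freq = counts.most_common(1)[0]
    return node, freq / sum(counts.values())


# Hypothetical movement logs: each list is the sequence of sensor nodes visited.
logs = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"], ["E", "B", "D"]]
rules = learn_rules(logs)
print(predict_next(rules, ["A", "B"]))
```

With such a prediction, only the predicted node needs to be woken up, while the remaining neighbors can stay asleep.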

Distributed Approaches Aim to Maximize WSNs' Performance.
Tseng and Lin [65] proposed an object tracking strategy named TMP-mine to discover sequential patterns in object tracking sensor networks (OTSNs) by mining the logs of temporal movement patterns (TMPs). The discovered temporal movement rules (TMRs) are used to predict the next location of objects in order to save energy. In the proposed model, an object is able to record the sensor nodes it visited along with the arrival time at each node. The movement log is collected by equipping the sensor nodes with storage devices. The WSN collects and integrates the movement logs of moving objects. The integrated movement log is used as the input to the TMP-mine data mining method, which uses the pattern growth approach to discover the TMPs. The TMRs are then generated from the discovered TMPs for predicting the next location of a moving object. Suppose that two such rules are discovered by a vehicle tracking system. By dispatching these rules to the corresponding sensor nodes, the tracking can be done in an energy-efficient way. For example, if a car moves with the pattern (Station A → interval 10 min → Station B → interval 5 min), which matches Rule 1, then the node in Station B only has to activate the node in Station C rather than the node in Station D or those around Station B.
Samarah et al. [66] proposed an energy-efficient prediction-based tracking technique that uses sequential patterns (PTSP). This technique predicts the future location of a moving object with the minimum number of sensor nodes while keeping the other sensor nodes in the network in sleep mode. PTSP is based on the inherited patterns of the objects' movements in the network and uses sequential patterns to predict which sensor node the moving object will head to next.
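The prediction step shared by these tracking schemes can be illustrated with a deliberately simplified sketch. It is not the MPG, TMP-mine, or PTSP algorithm: it only counts first-order transitions in a movement log and ignores time intervals. The function names and the toy log are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def mine_transitions(movement_log):
    """Count how often each node is directly followed by each other node."""
    counts = defaultdict(Counter)
    for path in movement_log:
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return counts

def predict_next(counts, current):
    """Return (next_node, confidence) for the most frequent successor, or None."""
    successors = counts.get(current)
    if not successors:
        return None
    node, freq = successors.most_common(1)[0]
    return node, freq / sum(successors.values())

# Hypothetical movement log: each entry is the node sequence visited by one object.
log = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"], ["E", "B", "C"]]
rules = mine_transitions(log)
print(predict_next(rules, "B"))  # only the predicted node would need to be woken up
```

In this toy log, node C follows node B in three of four cases, so the node at B would activate only C and leave D asleep.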

Clustering.
Clustering is unsupervised learning in which the given data is partitioned into subsets so that each subset represents a cluster with distinctive properties. It is considered a useful technique, especially for applications that must scale to a large number of sensor nodes. Clustering also supports data aggregation, which summarizes the overall transmitted data. In the current literature, clustering problems are addressed by either node clustering or data clustering. Recently, a large number of node clustering algorithms have been designed for WSNs [67][68][69][70][71][72][73][74][75][76][77][78][79][80][81][82][83]. These clustering techniques vary widely in their objectives, depending on the node deployment and bootstrapping schemes, the pursued network architecture, the characteristics of the cluster head (CH), and the network operation model. Although node clustering may be related to data clustering, for example, by considering the data similarity of neighboring nodes, many popular node clustering algorithms that partition the sensor nodes into small groups and elect a cluster head for every group do not use data mining techniques directly. In this study, we focus only on data clustering techniques that support efficient data mining and find data correlations among the nodes. Figure 4 shows the data clustering methods commonly used in the data mining process.
The works surveyed here adapt the k-means, hierarchical, and data correlation-based methods. The k-means algorithm takes an input parameter, k, and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high, but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster. The hierarchical method creates a hierarchical decomposition of the given set of data objects by grouping them into a tree of clusters. Data correlation-based clustering, in contrast, forms clusters based on spatial and temporal correlations: nodes whose sensory values are similar within a given threshold are grouped together, and these clusters remain fixed as long as the sensory values stay within the threshold over time. When the values move outside the threshold, the affected sensor nodes communicate with neighboring nodes that belong to other clusters to change their cluster memberships. The drawback of this type of clustering is that it does not consider the residual energy of the nodes. The survey shows that both centralized and distributed clustering solutions aim to maximize WSN performance.
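As a minimal sketch of the k-means procedure described above, the following pure-Python version alternates between assigning each reading to its nearest mean and recomputing the means. The sample readings (temperature, humidity), the seed, and the iteration cap are illustrative assumptions, not values taken from any surveyed work.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on tuples of floats; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # pick k distinct points as initial means
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[idx].append(p)
        # Update step: recompute each mean; keep the old one if a cluster is empty.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[i])
        if new_centers == centers:           # means stopped moving
            break
        centers = new_centers
    return centers, clusters

# Hypothetical (temperature, humidity) readings from six sensor nodes.
readings = [(20.0, 40.0), (20.5, 41.0), (19.5, 39.0),
            (30.0, 70.0), (30.5, 71.0), (29.5, 69.0)]
centers, clusters = kmeans(readings, k=2)
print(sorted(centers))
```

The result depends on the initial means in general; for well-separated groups such as these, the two recovered means coincide with the group averages.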

Centralized Approaches Aim to Maximize WSNs' Performance. Liu et al. [84] proposed centralized, graph-based energy-efficient data collection (EEDC). EEDC is an on-demand clustering algorithm that groups nodes so that members have similar sensor readings; the protocol therefore clusters the network with an awareness of the phenomena being sensed. EEDC is a centralized approach in which the sink compares data from different nodes using a user-defined dissimilarity measure. EEDC models cluster creation as a clique-covering problem by constructing a graph G in which each sensor node is a vertex. An edge (u, v) is drawn if the dissimilarity measure between vertex u and vertex v is less than or equal to the given intracluster dissimilarity threshold max_dst. A cluster is a clique in the graph, and the clustering problem is to cover all vertices in the graph with the minimum number of cliques. This process minimizes the number of clusters and maximizes the energy saving. The sink also dynamically adjusts the clusters based on spatial correlation and the data received from the sensors. The algorithm produces robust and well-balanced clusters. However, because its processing is centralized, it is not suitable for large-scale WSNs.
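The clique-covering formulation can be sketched as follows. Finding a minimum clique cover is NP-hard in general, so this sketch uses a simple greedy heuristic rather than the procedure of [84]; the dissimilarity measure (mean absolute difference) and the sample readings are likewise assumptions chosen for illustration.

```python
def dissimilarity(a, b):
    """Assumed measure: mean absolute difference between two reading series."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def greedy_clique_cover(readings, max_dst):
    """Group nodes so that every pair inside a group is within max_dst.

    Each group is a clique in the graph whose edges join nodes with
    dissimilarity <= max_dst. The greedy pass does not guarantee the
    minimum number of cliques.
    """
    cliques = []
    for node in sorted(readings):
        for clique in cliques:
            if all(dissimilarity(readings[node], readings[m]) <= max_dst for m in clique):
                clique.append(node)
                break
        else:
            cliques.append([node])           # no compatible clique: start a new one
    return cliques

# Hypothetical reading histories held at the sink.
history = {
    "n1": [20.0, 20.5, 21.0],
    "n2": [20.2, 20.6, 21.1],
    "n3": [25.0, 25.5, 26.0],
    "n4": [25.1, 25.4, 26.2],
}
print(greedy_clique_cover(history, max_dst=0.5))
```

With a threshold of 0.5, nodes n1 and n2 form one cluster and n3 and n4 another, so one representative per cluster would suffice for data collection.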

Distributed Approaches Aim to Maximize WSNs' Performance. Guo et al. [85] proposed H-cluster, a distributed algorithm for clustering sensory data. The input of this algorithm is the set of sensory data collected by all of the sensors from the time the WSN starts working up to the current time. The output is a set of cluster features that summarize the clusters of the input sensory dataset. The Hilbert-Map mapping algorithm is used to map a d-dimensional sensory data space into the 2-dimensional area covered by a given WSN. H-cluster has 2 phases: (1) it merges connected grid features with local cluster features of (sensory dimensional) D at each destination node; (2) it combines the connected local clusters into global clusters. Experiments on centralized and distributed data are carried out to compare H-cluster with the C-Corner and C-Center algorithms. During the experiments, four environment attributes are sensed by the sensors: temperature, humidity, light, and voltage. The results show that H-cluster is considerably more efficient in terms of data loss, energy, and cluster quality in small WSNs. The results also show that as the amount of sensory data delivered increases, the amount of data loss also increases, and that energy efficiency decreases as the size of the WSN grows.
Yeo et al. [86] proposed a data correlation-based clustering scheme (DCC) that relies on the similarity of sensor data together with a spatial suppression scheme that reduces the data size. DCC enhances the advertisement phase of HEED [71], in which cluster heads are selected according to their probability of becoming a cluster head; during this phase, sensor nodes communicate with each other, and the resulting clusters are formed from sensor nodes that have similar readings. Spatial suppression is performed at the cluster head, which also computes the difference between each sensor reading and a representative value. If a cluster head has redundant data, it removes the data and keeps only the node identification. The experimental results support the hypothesis that clustering based on data correlation achieves better compression than ordinary clustering based on locality of communication: DCC reduces the data size by 40% through suppression and prolongs network lifetime by 20%-30%. However, for large-scale network applications (nodes > 500), DCC is inefficient because each cluster head needs more energy to collect similar data readings and to communicate with several nodes. DCC is also ineffective when the percentage of similar data readings is low, because more cluster heads are created.
Beyens et al. [87] proposed a cluster-based architecture for wireless sensor networks in which cluster heads spatiotemporally correlate and predict the measurements of the cluster members by executing a prediction model. In their approach, the cluster heads execute the prediction model, while gateway nodes at the circumference of the clusters are responsible for the routing task. The prediction model is used to select a suitable node of the cluster to be activated. The idea is to put a sensor node to sleep when there are no objects in its sensing region.
Yoon and Shahabi [88] presented the clustered aggregation (CAG) algorithm, which forms clusters of nodes that sense similar values within a given threshold (spatial correlation); these clusters remain unchanged as long as the sensor values stay within the threshold over time (temporal correlation). By grouping nodes with similar values, CAG transmits only one reading per group. When the values move outside the threshold, the affected sensor nodes communicate with neighboring nodes that belong to other clusters to change their cluster memberships. CAG guarantees that the result is within a user-specified error-tolerance threshold. Cluster formation is performed while queries are disseminated to the network (query phase), during which nodes sensing similar values are grouped into clusters. Subsequently, CAG enters the response phase, in which only one aggregated value per cluster is transmitted up the aggregation tree. CAG is a lossy clustering algorithm (most sensory readings are never reported) that trades lower result precision for significant savings in energy, storage, computation, and communication.
Taherkordi et al. [67] proposed a communication-efficient distributed protocol for clustering sensory data. A distributed version of the k-means clustering algorithm sends summarized data towards the sink, which reduces the communication volume, transmission time, and power consumption of the sensor nodes. The sensor network is divided into clusters, and only the cluster head nodes communicate with the sink. Initially, the base station transmits the current center locations to the cluster heads. Each cluster head collects data from its sensor nodes and sends the base station the count and vector sum of its local sensory data points, as well as the sum of the squared distances from each local point to its center. On receiving this data from the CHs, the base station updates the cluster means, and the algorithm repeats until the objective function converges. The efficiency of the algorithm is evaluated via simulations. Several programs are run to obtain the average number of transmissions over the network during each test. According to the results, the communication cost is independent of the number of sensors and increases linearly with the number of centers. The major issues are the extra memory needed at the cluster head and the computation power needed to summarize the data before transmitting it to the sink. The algorithm also requires multiple rounds of message passing between the cluster heads and the base station, which may seriously affect communication efficiency when the number of sensors is relatively high.
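The query phase of a CAG-style scheme can be sketched as a breadth-first dissemination in which each node either joins the cluster of the node it heard the query from or becomes a new cluster head. This is an illustrative simplification, not the protocol of [88]: the absolute threshold, the treatment of the sink as a sensing node, and the chain topology are all assumptions.

```python
from collections import deque

def cag_clusters(neighbors, readings, sink, threshold):
    """Return a mapping node -> cluster head formed while a query spreads from the sink.

    A node joins its parent's cluster if its reading is within `threshold`
    of that cluster head's reading; otherwise it becomes a new cluster head.
    """
    head_of = {sink: sink}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        head = head_of[node]
        for nb in neighbors[node]:
            if nb in head_of:                # query already received
                continue
            if abs(readings[nb] - readings[head]) <= threshold:
                head_of[nb] = head
            else:
                head_of[nb] = nb
            queue.append(nb)
    return head_of

# Hypothetical chain topology s - a - b - c - d with temperature readings.
topology = {"s": ["a"], "a": ["s", "b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
temps = {"s": 20.0, "a": 20.3, "b": 20.4, "c": 24.0, "d": 24.2}
heads = cag_clusters(topology, temps, sink="s", threshold=0.5)
# Response phase: only cluster heads report a value.
reported = {h: temps[h] for h in set(heads.values())}
print(heads, reported)
```

Here five readings collapse to two reported values, and every suppressed reading lies within the threshold of the value reported for its cluster, which is the sense in which the scheme is lossy but bounded.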
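The message exchange in such a distributed k-means protocol can be sketched as follows: each cluster head sends, per center, only a count, a vector sum, and a sum of squared distances, and the base station recomputes the means from these summaries. The function names, stopping tolerance, and sample data are assumptions for illustration and do not reproduce the implementation in [67].

```python
def sq_dist(p, c):
    return sum((a - b) ** 2 for a, b in zip(p, c))

def local_summary(points, centers):
    """Cluster head side: one (count, vector sum, squared error) record per center."""
    dim = len(centers[0])
    summary = [{"count": 0, "vec_sum": [0.0] * dim, "sq_err": 0.0} for _ in centers]
    for p in points:
        i = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        rec = summary[i]
        rec["count"] += 1
        rec["vec_sum"] = [a + b for a, b in zip(rec["vec_sum"], p)]
        rec["sq_err"] += sq_dist(p, centers[i])
    return summary

def update_centers(summaries, centers):
    """Base station side: merge the summaries and recompute each mean."""
    new_centers, total_err = [], 0.0
    for i, old in enumerate(centers):
        count = sum(s[i]["count"] for s in summaries)
        total_err += sum(s[i]["sq_err"] for s in summaries)
        if count == 0:
            new_centers.append(tuple(old))   # no points assigned: keep the old center
        else:
            new_centers.append(tuple(
                sum(s[i]["vec_sum"][d] for s in summaries) / count
                for d in range(len(old))))
    return new_centers, total_err

def distributed_kmeans(cluster_data, centers, tol=1e-9, max_rounds=50):
    prev_err = float("inf")
    for _ in range(max_rounds):
        summaries = [local_summary(pts, centers) for pts in cluster_data]
        centers, err = update_centers(summaries, centers)
        if prev_err - err <= tol:            # objective stopped improving
            break
        prev_err = err
    return centers

# Hypothetical data held by three cluster heads (one-dimensional readings).
data = [[(1.0,), (2.0,)], [(3.0,), (10.0,)], [(11.0,), (12.0,)]]
print(distributed_kmeans(data, centers=[(0.0,), (5.0,)]))
```

Each cluster head transmits one fixed-size record per center in every round, regardless of how many sensors it serves, which is consistent with the reported communication cost growing with the number of centers rather than the number of sensors.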
Wang et al. [89] promoted the idea of clustering WSNs based on the queries and attributes of the data. The main motive is to achieve efficient dissemination of data in the network. The concept resembles the data-centric design model of WSNs. The base station issues a request to form clusters. Nodes that hear the request decide whether to nominate themselves as CHs based on their energy. After receiving the base-station request, sensor nodes that intend to become CHs wait for a random time period that depends on their remaining battery supply. If a node nominates itself, it broadcasts an announcement to all nodes. A node joins the CH that it can reach over the fewest hops. Upon hearing a CH announcement from a node whose attribute is different, the recipient node establishes a new cluster for that attribute and becomes a CH. To evaluate the attribute-based clustering scheme, the authors provide a theoretical analysis that compares it with flooding-based schemes. The analysis shows that the attribute-based clustering scheme yields gains over flooding-based schemes when some subregions of the sensor network are targeted more than others, that is, when inquiries are not uniformly distributed over time and space.
Ma et al. [90] proposed the distributed, hierarchical clustering and summarization algorithm (DHCS) for online data analysis and mining in sensor networks. The proposed method clusters sensor nodes based on their current data values as well as their geographical proximity, and it computes a summary for each cluster. The algorithm adopts several techniques, such as difference and hop count thresholds, to model node- and distance-based clustering. Initially, each node treats itself as an active cluster. Then, similar adjacent clusters are merged into larger clusters round by round. In each round, every cluster tries to combine with its most similar adjacent cluster simultaneously. Two clusters can be merged only if each considers the other its most similar neighbor. DHCS terminates when no more merging happens. The final clusters, which cannot be merged any further, are called steady clusters.
The training examples are vectors in a multidimensional feature space with corresponding class labels. A nearest neighbor classifier is a lazy learner that does not process patterns during training [91]. To respond to a request to classify a query vector, it locates the closest training vectors according to the distance metric. The classes of these training vectors are used to assign a class to the query vector.
A rule-based classifier groups the dataset into predefined classes by using "if...then..." rules of the following form: (Condition) → Y, where the condition is a conjunction of attribute tests and Y is a class label. SVM (support vector machine) techniques separate the data belonging to different classes by fitting a hyperplane between them that maximizes the margin. The data is mapped into a higher-dimensional feature space where it can be easily separated by a hyperplane. Furthermore, a kernel function is used to compute the dot products between the mapped vectors in the feature space to find the hyperplane.

Centralized Approaches Aim to Solve WSNs' Application-Based Issues. Chikhaoui et al. [92] proposed a decision tree (DT) based classification technique for sensor data. They applied the classification model to identify persons in a ubiquitous environment. To identify persons, the proposed approach first extracts frequent patterns, called episodes, from the datasets using the Apriori algorithm [53]. The next step evaluates the extracted patterns and assigns weights to these episodes to construct a frequent episode weight matrix (FEWM). Finally, the decision tree (DT) classification algorithm is applied to the FEWM. DT builds a pattern classifier from a labeled training dataset using a divide-and-conquer approach. To build a DT model, it recursively selects the attribute used to partition the training dataset into subsets until each leaf node in the tree has uniform class membership. The proposed approach is validated experimentally using data collected from the Domus Laboratory [93] and the Testbed smart home [94]. The general performance and classification accuracy of the algorithm are evaluated using the Weka framework, version 3.7.0 [95]. The experimental results show good classification. However, using frequent episodes alone, without temporal constraints and deeper analysis, does not guarantee good identification.
Sharma et al. [96] proposed a methodology for classifying sensor data using nearest neighbor trajectory classification (NNTC). The training phase simply stores every training example with its label. To make a prediction for a test example, its distance to every training example is first computed. Then, the k closest training examples are kept, where k is a fixed integer and k ≥ 1; among these k examples, the most frequent label is selected. This label is the prediction for the test example. The algorithm is evaluated by building a classifier from preprocessed training data generated with NS2 [97] and test trajectory data [98] using class labels. The experimental investigation yields a correct classification rate of 92.3%.
Akhlaghinia et al. [99] proposed a prediction technique for smart home environments that predicts the behavior pattern of the occupants. The sensor networks collect a variety of attributes, including environmental changes and the occupants' interactions with the environment. The collected data is then used by the learning approach to construct a classification-based predictive model that predicts the occupancy of the ambient intelligence environment. The occupancy is predicted using fuzzy rules that are modeled from past values of the time series data. In the learning process, input from the sensors is compared with the stored rules to take appropriate action. The prediction-based approach improves energy saving in smart homes and enhances the safety and security of occupants. The results show that the proposed technique is able to predict the combined occupancy time series. However, the model is implemented in a single-user environment and is unable to predict complex environmental patterns in a multi-user environment over a long period.
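The nearest neighbor procedure described above can be written in a few lines. This is a generic k-nearest-neighbor sketch with Euclidean distance, not the NNTC implementation of [96]; the feature vectors and the labels "walk" and "drive" are hypothetical.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Predict the label of `query` from the k closest (features, label) pairs."""
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]        # most frequent label among the k neighbors

# Training phase: simply store every example with its label.
training = [
    ((0.0, 0.0), "walk"), ((0.0, 1.0), "walk"), ((1.0, 0.0), "walk"),
    ((5.0, 5.0), "drive"), ((6.0, 5.0), "drive"), ((5.0, 6.0), "drive"),
]
print(knn_predict(training, (0.5, 0.5)), knn_predict(training, (5.5, 5.5)))
```

All of the work happens at prediction time, which is why the classifier is called a lazy learner; the cost of each prediction grows with the number of stored examples.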

Centralized Approaches Aim to Maximize WSNs' Performance. Gaber et al. [100] proposed lightweight classification (LWClass), a one-pass algorithm for on-board mining of data streams in sensor networks. They used the algorithm output granularity (AOG) [101,102] technique to cope with the limited memory size: the algorithm output rate is changed according to the data rate, available memory, output rate history, and time constraints so that the available memory is filled with generated knowledge. When a new element arrives, the algorithm searches for the nearest instance stored in main memory. Instances are stored in main memory according to a prespecified distance threshold, which represents the similarity that the algorithm accepts in order to treat two or more elements as one element according to their attribute values. If the algorithm finds such an instance, it checks the class label. If the class label is the same, it increases the weight of this instance by one; otherwise, it decrements the weight by one. If the weight becomes zero, the instance is released from memory. The algorithm is empirically validated using synthetic streaming data under the resource-constrained environment of a common handheld computer. The authors of [103] presented a distributed framework for building and deploying predictors in sensor networks. By using the computational power of each sensor, a powerful learning structure is constructed over the whole network. A distributed voting approach is proposed in which each sensor is a leaf of a decision tree (DT) and performs local prediction. Instead of sending the raw data, the local predictive models built on the sensors transmit the target class to the sink. At the sink, the local prediction models are combined to construct a global prediction model. The work also shows how the local models enable sensors to respond to changes in the target by relearning. The proposed framework is especially useful for sensor networks with limited energy, computation, and bandwidth resources. It makes distributed data mining efficient in the presence of moving class boundaries. Data confidentiality is also achieved by transmitting a predictive model instead of the original data to the sink. The distributed prediction model is evaluated using the J48 decision tree (implemented in WEKA) on a variety of datasets for both simple and weighted voting schemes. According to the results, the distributed prediction model has the potential to increase accuracy while reducing model size and runtime compared with a centralized approach. The major issues in this framework are the need for an expensive CPU on each sensor node to build the local predictive model and the extra memory required to store it.
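The weight-update rule described for LWClass can be sketched as below. The sketch omits the AOG rate adaptation entirely, and it assumes that an element with no stored instance within the threshold is inserted as a new instance with weight 1, a detail the description above does not specify. The class name and sample stream are hypothetical.

```python
import math

class LWClassSketch:
    """Stores weighted labeled instances and updates them in one pass."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []                    # each entry: [features, label, weight]

    def _nearest(self, features):
        """Closest stored entry within the distance threshold, or None."""
        best, best_d = None, self.threshold
        for entry in self.entries:
            d = math.dist(entry[0], features)
            if d <= best_d:
                best, best_d = entry, d
        return best

    def update(self, features, label):
        entry = self._nearest(features)
        if entry is None:
            # Assumption: unseen regions of the feature space start a new entry.
            self.entries.append([tuple(features), label, 1])
        elif entry[1] == label:
            entry[2] += 1                    # same class: reinforce
        else:
            entry[2] -= 1                    # different class: weaken
            if entry[2] == 0:                # weight exhausted: release memory
                self.entries = [e for e in self.entries if e is not entry]

model = LWClassSketch(threshold=1.0)
for features, label in [((20.0,), "normal"), ((20.4,), "normal"),
                        ((20.2,), "alert"), ((20.1,), "alert"), ((30.0,), "alert")]:
    model.update(features, label)
print(model.entries)
```

In this stream, the first instance gains weight, is then contradicted twice and released, and the last element starts a new entry, so memory use follows the data rather than growing with every arrival.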

Distributed Approaches Aim to Maximize WSNs' Performance. Malhotra et al. [104] proposed a distributed classification scheme that generates effective feature vectors of low dimension (FVLD) for a wireless audio network. A distributed cluster-based algorithm for the detection and classification of vehicles is proposed. Sensors form clusters on demand to run a classification task based on the produced feature vectors. The monitoring area is divided into clusters, and a cluster head is selected for each cluster. All sensors send their feature vectors to the cluster heads. The cluster head combines all received feature vectors (including its own), executes the classification task using, for example, KNN or ML classifiers, and decides on the class of the unknown vehicle. Two approaches were proposed: the first combines extracted features, and the second combines individual decisions. Classification using decision fusion and a maximum likelihood (ML) classifier led to the best results. ML is also compared with the KNN classifier under various settings of data and decision fusion schemes. The proposed technique produced the best classification accuracy, 89.46%, among all compared approaches.
Flouri et al. [105][106][107] proposed distributed and incremental techniques for learning classification rules in a sensor network using a support vector machine (SVM). The authors proposed two distributed algorithms for training an SVM applied to the classification problem in a WSN: the distributed fix partition SVM (DFP-SVM) and the weighted distributed fix partition SVM (WDFP-SVM). The SVM is trained incrementally on a set of examples called support vectors. The number of support vectors is typically very small compared with the number of all sample values. Moreover, the support vectors (and the offset) give a compressed representation of the separating SVM hyperplane. Sending only the support vectors, instead of all training samples, to the next cluster head is therefore energy efficient because it reduces communication. After training, the required parameters of the kernel functions are transferred to each node for classification. The performance of the proposed approach is evaluated by running a number of simulations and comparing it with a centralized algorithm. The results show that energy consumption decreases when the SVM is trained incrementally compared with the centralized case. However, the challenges for SVM formulations are computational complexity and the choice of a proper kernel function.
Rajasegarar et al. [108] proposed an SVM-based technique for outlier detection in sensor data. This technique uses a one-class quarter-sphere SVM to identify local outliers at each node and to minimize the computational complexity. Sensor data that lies outside the quarter sphere is considered an outlier. Each node communicates only the radius of the sphere to its parent for outlier classification. This technique identifies outliers from data measurements collected over a long time window and is not performed in real time. The technique also ignores the spatial correlation of neighboring nodes, which makes the local outlier results inaccurate. The technique is evaluated using real sensor measurements collected from the deployment of wireless sensors in the Great Duck Island Project [2] for monitoring the habitat of sea birds. The algorithm is implemented in Matlab, and two simulations were run to measure the computational strategy and various kernel functions. The results reveal that the proposed technique achieves significant energy savings in terms of communication overhead in the network.

Comparison of Data Mining Techniques for WSNs
This section identifies the aspects in which the data mining techniques for WSNs discussed above agree and differ. These aspects are used as metrics in the comparative Tables 2, 3, 4, 5, and 6. First, the evaluation aspects are discussed; then, comparative tables are presented to compare and differentiate the existing data mining techniques for WSNs.

Input Sensor Data.
Sensor data can be viewed as a large volume of real-valued data that is continuously collected from WSNs. The type of input sensor data determines which data mining techniques can be used to analyze it. Data mining techniques usually consider the following two characteristics of data.
Attribute. Mining techniques can identify the association between data attributes. Attributes can be homogeneous [50] or heterogeneous [33,48]. A homogeneous attribute means that a single attribute is sensed, for example, temperature only. In the heterogeneous case, each node may be equipped with multiple sensors and can sense multiple attributes, for example, temperature, humidity, and pressure. The data mining technique should be able to identify the correlation between multiple attributes.
Correlation. Two types of data correlation appear at each sensor node. The first type is attribute correlation, that is, dependency among data attributes. The second type is in terms of time and space, that is, temporal and spatial correlation. Temporal correlation indicates that readings from different sensor nodes observed at the same time instant are related, and that readings observed at one time instant are related to the readings observed at the previous time instant. Spatial correlation indicates that readings from sensor nodes that are geographically close to each other are expected to be largely correlated. Capturing spatiotemporal correlation helps to predict future trends in sensor readings and to identify a dead node when the reading from a correlated sensor is missing.
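One common way to quantify these correlations, offered here only as an illustrative sketch, is the Pearson correlation coefficient: applied to a series and its lagged copy it measures temporal correlation, applied to two neighboring nodes it measures spatial correlation, and applied to two attributes of one node it measures attribute correlation. The surveyed techniques use a variety of measures; the function names and sample readings are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length series (undefined for constant series)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def temporal_correlation(series, lag=1):
    """Correlation between readings and the readings `lag` steps earlier."""
    return pearson(series[:-lag], series[lag:])

def spatial_correlation(series_a, series_b):
    """Correlation between the readings of two neighboring nodes."""
    return pearson(series_a, series_b)

# Hypothetical temperature readings from two adjacent nodes.
node_a = [20.0, 20.5, 21.0, 21.5, 22.0]
node_b = [19.8, 20.4, 20.9, 21.6, 21.9]
print(temporal_correlation(node_a), spatial_correlation(node_a, node_b))
```

A value close to 1 for the pair of nodes suggests that one node's reading could stand in for the other's, for example when a reading is missing.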

Processing Architecture.
To apply a data mining technique to sensor data, the model of computation must be determined. There are two general models.
Centralized. The simplest way to analyze WSN data is to use a centralized model. In this approach, all raw data collected from the WSN is transferred to a central server, which maintains a database of readings from all of the sensors. The central server performs extensive offline analysis to find interesting patterns in the aggregated data. As the size of the WSN increases, the amount of data transmitted in the system becomes huge. The obvious drawback of this approach is its high consumption of energy and bandwidth. Furthermore, it does not scale to a very large number of sensors.
Distributed. The other approach uses a distributed model, in which sensor nodes use their processing abilities to carry out some mining tasks locally and transmit only the required, partially processed data, called a local model. Local models contain compact event patterns rather than raw data. For example, data collected from different sensors can be aggregated before being transmitted to the central server. In these systems, an intermediate node called an "aggregator" collects and aggregates the data from different sensors. Since sensor nodes are constrained in resources, the challenge for this approach is to maintain mining accuracy while keeping the communication overhead, memory, and computational cost low.
Node Role. A node can perform three types of role [33] as follows.

(i) Regular Sensor.These are the nodes with limited resources, and they are used to sense the phenomena and send the sensed data to the base station.
(ii) Cluster Head. A cluster head can be a regular sensor node, or it can be rich in resources. In centralized approaches, the cluster head is a regular sensor node that only controls cluster membership. In distributed approaches, besides being responsible for cluster formation, CHs perform aggregation/fusion of the data collected from the sensors. Therefore, they are equipped with significantly more computation and communication resources.
(iii) Relay. This is a node that acts as a medium for transmitting data packets from one node to another.
Node Task. In the centralized approach, the task of a node is to sense the phenomena being monitored and send the sensed data to the base station. In distributed approaches, a node can also perform computation and take action based on the detected phenomena or target.
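The difference between the two processing models can be illustrated with a small sketch in which each aggregator replaces its raw readings with a fixed-size local model (count, sum, minimum, maximum) that the central server can still combine into global statistics. The choice of summary fields and the sample readings are assumptions made for illustration.

```python
def summarize(readings):
    """Aggregator side: compress raw readings into a fixed-size local model."""
    return {"count": len(readings), "total": sum(readings),
            "low": min(readings), "high": max(readings)}

def combine(local_models):
    """Central server side: merge local models without seeing the raw data."""
    count = sum(m["count"] for m in local_models)
    total = sum(m["total"] for m in local_models)
    return {"count": count, "mean": total / count,
            "low": min(m["low"] for m in local_models),
            "high": max(m["high"] for m in local_models)}

# Hypothetical readings gathered by three aggregators (ten readings each).
groups = [[20.0, 21.0] * 5, [30.0, 31.0] * 5, [25.0] * 10]

raw_values_sent = sum(len(g) for g in groups)        # centralized: every reading travels
local_models = [summarize(g) for g in groups]
summary_values_sent = 4 * len(local_models)          # distributed: four numbers per aggregator
print(combine(local_models), raw_values_sent, summary_values_sent)
```

The saving grows with the number of readings per aggregator, but only statistics that can be recovered from the summaries remain available at the server, which reflects the accuracy versus overhead trade-off noted above.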

Application Area.
We also evaluated the types of application that benefit from WSN data mining techniques. Some real-world applications are as follows.
(i) First is environmental monitoring [5-7, 51, 58, 87], in which sensors are deployed in harsh and unattended regions to monitor the natural environment. Data mining techniques can identify when and where an event may occur and trigger an alarm upon detection.
(ii) Second is habitat and health monitoring [1,2,99,109], in which patients/humans are equipped with small sensors at multiple positions on their body to monitor their health or behavior. Data mining techniques can identify abnormal behavior and help to take effective action.
(iii) Third is object tracking [3,4,65,66], in which sensors are embedded in moving targets to track them in real time. Data mining techniques help to improve the estimation of the location of targets and make tracking more efficient and accurate.
(iv) Fourth is WSN performance [46,48,50,51]. WSNs are usually unattended and deployed in harsh environments. Sensor nodes are resource constrained, especially in terms of power. Data mining techniques help to identify faulty or dead nodes. They also help to conserve energy through in-network processing, in which aggregated data is sent to the central side.
(v) Fifth is data analysis [67,84,90]. Data mining techniques help to discover potentially interesting data patterns in a sensor network for a certain application.
(vi) Sixth is real-time monitoring [64,65,85]. Data mining techniques, especially distributed techniques, help to identify certain patterns and predict future events in a given time window, which makes real-time response and action feasible.

Implementation.
Each technique is also evaluated in terms of experimental validation, that is, which dataset is used, which WSN optimization objectives are achieved, and so forth.
Evaluation Method. Analytical modeling, simulation, and real deployment are the most commonly used methods for analyzing the performance of data mining techniques for WSNs.
(i) Analytical Modeling. This method is very complex, and usually certain simplifications are assumed to predict the performance of the proposed scheme. Such assumptions and simplifications may lead to imprecise results with limited confidence.
(ii) Simulation. This is the most popular and effective approach to designing and testing a proposed scheme in terms of cost and time; it also provides a higher level of detail than real implementation. However, selecting a simulation framework that suits the problem and network characteristics is a critical task.
(iii) Real Deployment. It may not be feasible to evaluate the performance of these techniques through real deployment because appropriate hardware is unavailable owing to technical and design limitations. Real deployment usually requires hundreds of sensor nodes, so cost becomes another important issue. In a nutshell, evaluating a technique proposed for WSNs through real deployment gives the most convincing results, although the evaluation process is complex, costly, and time consuming.
Data Source. This refers to the dataset used to experimentally validate the proposed technique. Two types of dataset are generally used: synthetic and real. This survey finds that most of the techniques are validated by simulation on synthetic datasets, largely because of the limited processing power of sensor nodes.
Optimization Objective. Since WSNs are constrained in terms of different resources, each technique is also evaluated by the optimization objective that it achieves. Most of the techniques consider the resource constraints and the different design philosophies of the network. None of them works efficiently for all performance metrics, such as network size, communication overhead, energy efficiency, memory consumption, node mobility, and so forth. The large variations in the performance metrics make it difficult to present a comprehensive evaluation.

Limitations of Existing Data Mining Techniques for WSNs
Tables 2-6 show the characteristics of data mining techniques designed for WSNs.It is observed from comparative analysis that the existing techniques have the following shortcomings.
(i) Most of the techniques do not take heterogeneous data into account and assume that the sensor data is homogeneous [42, 46, 49-51, 65, 87, 110]. They ignore the fact that different attributes together can improve the mining accuracy. In some cases, homogeneous data cannot contribute appropriately toward real-time decisions.
(ii) The majority of techniques consider only the spatial, temporal, or spatiotemporal correlations [65-67, 87, 88] among sensor data of neighboring nodes and do not consider the attribute dependency among sensor nodes. This in turn increases the computational complexity and reduces the accuracy of the mining technique.
(iii) The techniques that consider spatial correlation [51] among sensor data of neighboring nodes suffer from the choice of an appropriate neighborhood range. Techniques that consider temporal correlation among sensor data suffer from the choice of the sliding window size.
(iv) The majority of techniques use a centralized approach [21, 42-44, 46, 58, 84, 101] in which all data is transmitted to the sink node for identifying certain patterns. These techniques cause considerable communication overhead and delay the response. Although the techniques that use a distributed architecture optimize response time and energy consumption, they have the same problem as the centralized approach if the aggregator/cluster head has a large number of member nodes.
(v) Excluding a few, the performance of all of the schemes discussed in this paper has been evaluated with the help of different simulation tools. Although a number of simulators are available and play an important role in developing and testing new techniques, there is always some risk involved, as simulation results may not be accurate. In order to analyze a protocol more effectively, it is important to know the available tools and understand their benefits and limitations. Because performance requirements differ across specific applications, a general tool for sensor networks is still lacking at present.
(vi) The techniques evaluated using analytical modeling [21,23,46,49,100,109] rely on certain simplifications and assumptions to evaluate the performance of the proposed technique. Such assumptions and simplifications may lead to imprecise results with limited confidence. None of the proposed techniques is evaluated using real deployment. Although real deployment is complex, costly, and time consuming, accurate results can only be obtained by using real deployment.
(vii) Excluding a few [22,103,109], the majority of techniques assume that sensor nodes are stationary and do not consider node mobility. Applying these techniques to mobile networks or to networks with dynamically changing topology would be challenging.
(viii) Most of the techniques used synthetic data. Although synthetic data is easily available, there is always a chance that results generated on synthetic data are not accurate.
(ix) As for the data mining techniques themselves, frequent pattern mining approaches [15][16][17][18][19][20] suffer from the choice of proper and flexible support and confidence thresholds. Clustering techniques [11][12][13][14] suffer from the choice of an appropriate cluster width parameter, and computing the distance between data instances in heterogeneous data is computationally expensive, whereas classification-based techniques [24][25][26] require some prior knowledge to classify the incoming data stream. Moreover, learning an accurate classification model is challenging if the number of variables in deployed WSNs is large.
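To make the threshold sensitivity noted in (ix) concrete, the following minimal Python sketch computes support and confidence for a rule over sensor event sets. The sensor identifiers, the epoch data, and the threshold values are hypothetical and chosen only for illustration; they are not taken from any of the surveyed techniques.

```python
from typing import FrozenSet, List


def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


# Hypothetical epochs: each set lists the sensors that reported an event.
epochs = [
    frozenset({"s1", "s2", "s3"}),
    frozenset({"s1", "s2"}),
    frozenset({"s1", "s3"}),
    frozenset({"s2", "s3"}),
]

rule_support = support(epochs, frozenset({"s1", "s2"}))                     # 2 of 4 epochs
rule_confidence = confidence(epochs, frozenset({"s1"}), frozenset({"s2"}))  # 2 of 3 epochs

# The same rule s1 -> s2 is kept or discarded depending on the threshold.
accepted_loose = rule_support >= 0.5   # hypothetical minimum support of 0.5
accepted_strict = rule_support >= 0.6  # hypothetical minimum support of 0.6
```

With these four epochs the rule is accepted at a minimum support of 0.5 and rejected at 0.6, which shows how a small change in the threshold alters the set of reported patterns.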

Future Research Directions
The analysis of existing data mining work on sensor network-based applications shows that there are still shortcomings in existing techniques. Given these shortcomings and the special characteristics of WSNs, there is a need for a data mining technique designed for WSNs. The technique should meet the following requirements.
(i) The technique should combine offline learning mechanisms with distributed and online data processing.
(ii) It should also consider the resource constraints of WSNs and their special characteristics, such as node mobility and network topology.
(iii) The technique should consider heterogeneous data and dependencies among spatial, temporal, and attribute correlations which may exist between adjacent nodes.
(iv) During online mining, the technique should be capable of incremental learning.
(v) The technique should have low computational complexity and be easy to implement.
Based on the aforementioned requirements for WSNs, a hybrid data mining framework is proposed, as shown in Figure 6. In this framework, sensor nodes use their processing abilities to carry out mining locally and transmit only the required, partially processed data, called local models. Single-pass algorithms are applied for network data processing because the data arrives continuously and is not available for a second scan.
Local models contain compact event patterns rather than raw data, which addresses the issue of communication.
(ii) Since data management is a crucial issue in WSNs [111], the proposed framework deals with large-scale WSNs data by splitting the data processing tasks across multiple locations: in-network processing and processing at the central server. In-network processing splits the large task into smaller ones at the node level and at the cluster head, which are distributed over the entire network and execute in parallel. At the node level, the storage capacity of a single node is used to compute the local model, which contains aggregated data from that node, whereas the cluster head acquires data from a group of nodes and aggregates the readings over a certain region or period. As a result, a network model is computed at each cluster head; it contains compact data from a set of nodes and reduces the size of the data to be transmitted. Network models can be integrated at the sink to obtain a global view for real-time applications. Since the sink has restricted resources and cannot process large-scale data for predictive analysis, network models are sent to the central server, where global models can be computed for offline predictive analysis. Historical queries from the user can also be answered by the central server, whereas instant queries can be handled by the sink to support real-time response. With this distribution of data, the proposed framework can feasibly deal with the large amount of data obtained from WSNs.
(iii) It can account for the resource constraints of sensor nodes by using context-awareness techniques. Memory, energy [79], and bandwidth are considered in the implementation of data processing on the sensors; for example, many summarization and aggregation techniques can be adopted to reduce energy and bandwidth consumption.
(iv) The framework can address the quickly changing nature of WSNs data, where characteristics of the monitored process may change over time and render old models outdated. This problem can be addressed using an incremental learning mechanism [39,112] that helps the model incorporate new information.
(v) The framework can identify the spatial-temporal correlation at the local model by using data correlation-based clustering, whereas attribute correlation can be identified at the global model by using multipass data mining algorithms.
We are currently implementing this hybrid framework and expect to complete the implementation in the near future.
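As an illustration of how a single-pass local model could be built at a node and combined at a cluster head, the sketch below keeps a running count, mean, and variance (Welford's update) and merges two such summaries without access to the raw readings. The class name `LocalModel`, the choice of summary statistics, and the readings are assumptions made for this example; the proposed framework does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class LocalModel:
    """Hypothetical compact summary of the readings seen by one node."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, reading: float) -> None:
        """Single-pass (Welford) update; the raw reading is not stored."""
        self.count += 1
        delta = reading - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reading - self.mean)

    def merge(self, other: "LocalModel") -> "LocalModel":
        """Combine two summaries, as a cluster head might do."""
        total = self.count + other.count
        if total == 0:
            return LocalModel()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return LocalModel(total, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance of all readings summarized so far."""
        return self.m2 / self.count if self.count else 0.0


# Hypothetical temperature readings arriving at two nodes.
node_a, node_b = LocalModel(), LocalModel()
for value in (20.0, 21.0, 22.0):
    node_a.update(value)
for value in (30.0, 32.0):
    node_b.update(value)

# The cluster head receives only the two summaries, not the five readings.
network_model = node_a.merge(node_b)
```

Each node transmits three numbers regardless of how many readings it has processed, and because `update` can be called again after a merge, the same structure also supports incremental updating as new data arrives.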

Conclusion
The emerging need for data mining techniques in the field of WSNs has resulted in the development of numerous algorithms. Each of these algorithms solves certain issues related to a particular WSNs type and application. In this paper, we analyzed, discussed, and compared the related existing research approaches. We observed that the techniques intended for mining sensor data at the network side are helpful for making real-time decisions and also serve as a prerequisite for developing effective mechanisms for data storage, retrieval, query, and transaction processing at the central side. Moreover, we have presented a problem-based taxonomy and an overall analysis and review of past research and its limitations, which can provide insights for end users in applying or developing an appropriate data mining method and technology for WSNs. Based on these limitations, we have proposed a hybrid framework which can address the shortcomings of existing work. We have also discussed the challenges of implementing data mining techniques in resource-constrained WSNs. Besides, there are a number of open issues in existing studies which need to be addressed. The set of WSNs applications presented here is by no means exhaustive; it is merely a sample that demonstrates the usefulness and possible applications of data mining methods in sensor networks.
We believe that WSNs applications will become more mature and popular with the advancement of sensor technology, and sensor data will become more information rich. Mining techniques will then be very significant for conducting advanced analysis, such as determining trends and finding interesting patterns, thus enhancing WSNs performance and operation. The intention of this paper is to stimulate interest in utilizing and developing the previous studies into emerging applications.

Figure 1: Taxonomy of data mining techniques for sensor networks.

Figure 3: Example of sequential alarm pattern.

Figure 4: Data clustering for sensor networks.

Figure 5: Classification maps input attribute set (X) to class label (Y).

4.4. Classification. Classification is the task of assigning a new object to one of a set of predefined object categories. A classification model is learned from a set of training data and then classifies new data into one of the learned classes.
Figure 5 shows that classification maps an input attribute set (X) to a class label (Y). Classification-based approaches have adapted traditional classification techniques, such as decision tree-based, rule-based, nearest neighbor-based, and support vector machine-based techniques, depending on the type of classification model that they use. A decision tree is a classifier in the form of a tree; it classifies an instance by starting at the root of the tree and moving through it until it reaches a leaf node, where a class label is assigned. The internal nodes partition the data into subsets by applying test conditions that separate instances with different characteristics. Nearest neighbor-based approaches classify data based on the closest training examples.
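The nearest neighbor idea can be sketched in a few lines of Python. This is a minimal one-nearest-neighbor example under stated assumptions: the function name `nearest_neighbor_label`, the two attributes (temperature and humidity), the class labels, and the training values are all hypothetical, and Euclidean distance is only one possible choice of distance measure.

```python
import math
from typing import List, Sequence, Tuple

TrainingExample = Tuple[Sequence[float], str]


def nearest_neighbor_label(training: List[TrainingExample],
                           sample: Sequence[float]) -> str:
    """Return the class label (Y) of the training example closest to `sample` (X)."""
    if not training:
        raise ValueError("training set must not be empty")
    best_label = training[0][1]
    best_distance = math.inf
    for attributes, label in training:
        distance = math.dist(attributes, sample)  # Euclidean distance
        if distance < best_distance:
            best_distance = distance
            best_label = label
    return best_label


# Hypothetical labeled readings: (temperature in C, relative humidity in %).
training_set: List[TrainingExample] = [
    ((22.0, 45.0), "normal"),
    ((24.0, 40.0), "normal"),
    ((60.0, 10.0), "alarm"),
    ((65.0, 8.0), "alarm"),
]

label_hot = nearest_neighbor_label(training_set, (58.0, 12.0))
label_mild = nearest_neighbor_label(training_set, (23.0, 42.0))
```

Because every prediction scans the whole training set, this form of classification trades a cheap training phase for a per-sample cost that grows with the amount of stored data, which matters on resource-constrained nodes.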

Figure 6: Proposed hybrid framework for sensor network based applications.

Table 1: Difference between traditional and sensor data processing.
For the following reasons, handling sensor data in WSNs is challenging for conventional data mining techniques.
Fast and Huge Data Arrival. The inherent nature of WSNs data is its high speed. In many domains, data arrives faster than we are able to mine. Additionally, spatiotemporal embedding of sensor data plays an important role in WSNs applications. This may cause many classical data processing techniques to perform poorly on spatiotemporal sensor data. The challenge for data mining techniques is how to cope with the
(v) Data Transformation. Since sensor nodes are limited in terms of bandwidth, transforming original data over the network is not feasible. Knowledge structure transformation is an important issue. After extracting models and patterns locally from WSNs data, the output is transferred to the base station. The challenge for data mining techniques is how to efficiently represent data and discovered patterns for transmission over the network.
(vi) Dynamic Network Topology. Sensor networks are deployed in potentially harsh, uncertain, heterogeneous, and dynamic environments. Moreover, sensor nodes may move among different locations at any point in time. Such dynamicity and heterogeneity increase the complexity of designing an appropriate data mining technique for WSNs.
Modeling Changes of Mining Results Over Time. When the data-generating phenomenon is changing over time, the extracted model at any time should be up-to-date. Due to the continuity of data streams, some researchers have pointed out that capturing the change of mining results is more important in this area than the mining results themselves. The research issue is how to model this change in the results.
The highest-level classification is based upon the general data mining classes used, such as frequent pattern mining, sequential pattern mining, clustering, and classification. Most of the frequent pattern mining and sequential pattern mining approaches have adapted traditional frequent mining techniques, such as the Apriori and frequent pattern (FP) growth-based algorithms, to find associations among large WSNs data. Cluster-based approaches have adapted K-means, hierarchical, and data correlation-based clustering, based upon the distance among the data points, whereas classification-based approaches have adapted traditional classification techniques, such as decision tree, rule-based, nearest neighbor, and support vector machine methods, based on the type of classification model that they use. These algorithms have very different and distinct roles; therefore, in order to choose an algorithm for a WSNs application, one has to decide in terms of these top-level classes.

Table 2: Comparison of data mining techniques for wireless sensor networks.
Method. It refers to the data mining algorithm adapted or developed for the unique characteristics of WSNs data. Distributed approaches use one-scan algorithms for real-time processing in order to deal with the high data arrival rate; the mining results are expected to be available within short response times, whereas centralized approaches collect the sensory data at a single site and apply offline multiscan techniques for extensive data analysis.

Table 4: Comparison of data mining techniques for wireless sensor networks, continued.