A Diagnosis-Based Clustering and Multipath Routing Protocol for Wireless Sensor Networks

In wireless sensor networks, where large numbers of sensor nodes are deployed in hostile environments, fault diagnosis is of great importance for ensuring the accuracy of the gathered information and for reducing the energy additionally consumed by faulty nodes. In this paper, we propose an energy-efficient data collection protocol which consists of clustering and multipath routing. Clustering based on fault diagnosis eliminates the possibility of faulty nodes acting as cluster heads (CHs), which reduces energy consumption and the transmission of faulty information. Multipath routing provided by a directed acyclic graph (DAG) increases system fault tolerance. Furthermore, clustering and multihop routing consider residual energy and routing cost, respectively; thus balanced energy consumption is achieved. Performance analysis shows that the message complexity of clustering and fault diagnosis is acceptable. Simulations demonstrate that the protocol has better energy efficiency than other related protocols.


Introduction
In recent years, Wireless Sensor Networks (WSNs) have become an attractive technology for a large number of applications, ranging from monitoring to event detection and target tracking [1]. To design and deploy successful WSNs, many issues need to be resolved such as deployment strategies, energy conservation, fault-tolerant routing in dynamic environments, localization, and fault diagnosis. To extend the network lifetime as long as possible, energy efficiency becomes one of the basic tenets in the WSNs protocol design. There are several possible solutions to balance energy consumption, such as deployment optimization [2], topology control [3], and data aggregation [4].
Among these schemes, clustering provides an effective way of promoting energy efficiency [5][6][7][8][9]. In clustering schemes, sensor nodes are organized into clusters; one node in each cluster is selected as the cluster head (CH), and the other nodes are called cluster members (CMs). Each CM periodically collects local data from the environment and sends it to its CH. When the data from all the CMs has arrived, the CH aggregates it and sends the result to the base station (BS) via single-hop or multihop communication. When the network is partitioned into clusters, data transmission can be classified into two stages, that is, intra- and intercluster communication. Mhatre and Rosenberg have shown that the multihop intercluster communication mode is usually more energy efficient because of the characteristics of the wireless channel [10]. Thus it is better to let CHs cooperate with each other to forward their data.
Because sensor nodes are low-cost and are deployed in large numbers in uncontrolled or even harsh environments, it is common for nodes to become faulty. Faulty nodes have many adverse effects on a data collection protocol, such as a nonuniform distribution of clusters and inaccuracy of the collected information. In addition, too many faulty nodes directly affect the connectivity of the network, resulting in premature network partition, which is an important factor affecting network lifetime. How to identify faulty nodes and eliminate their impact has therefore attracted increasing attention.
Multipath routing between a source and a destination is a promising routing scheme to achieve robustness, load balancing, bandwidth aggregation, congestion reduction, and security compared to the single shortest-path routing that is usually used in most networks [11]. Techniques developed for multipath routing are often based on employing multiple

Related Work
Clustering provides an effective way of prolonging the lifetime of WSNs. Heinzelman et al. [5] first proposed a clustering protocol called LEACH for periodic data gathering applications. It is an application-specific data dissemination protocol that uses clustering to prolong the network lifetime. HEED [6] introduced a variable known as the cluster radius, which defines the transmission power to be used for intracluster broadcast. EEUC [8] and EADUC [9] introduced cluster head competition algorithms which extend LEACH and HEED by choosing CHs with more residual energy. Both of them achieve a good distribution of CHs.
As faults are inevitable in every distributed computer system, especially in WSNs, which consist of a large number of capacity-limited nodes, it is important to be able to determine which nodes are working and which are faulty. Comparison-based diagnosis is a realistic approach to detecting faulty nodes based on the outputs of tasks executed by system nodes. The model is based on comparisons of the outcomes returned by different units executing the same task and uses the invalidation rule of the generalized Maeng and Malek (gMM) model [14,15]. Comparison-based diagnosis, initially used for multiprocessor systems, was first applied to mobile ad hoc networks (MANETs) by Chessa and Santi [16]. Later, Elhadef et al. considered the problem of self-diagnosis of wireless mesh networks (WMNs) and MANETs using the comparison approach [17,18].
Traditional comparison-based fault diagnosis protocols for multiprocessor systems, WMNs, and MANETs are not suitable for WSNs without modification. To the best of our knowledge, fault diagnosis based on the comparison model has not yet been applied to WSNs efficiently. Chen et al. proposed a distributed fault-detection algorithm to locate faulty sensors [19]. It calculates the measurement difference between neighboring sensors at different times to find whether the current measurement of a sensor differs from its previous measurement. Wang et al. provided a cluster-based real-time fault diagnosis aggregation algorithm for WSNs [20]. The protocol is based on the comparison approach and aims at achieving a correct and complete diagnosis for hierarchical WSNs. They assumed that each sensor can transmit data to any other sensor and can communicate directly with the BS, which is unrealistic in practice.

Network Model.
In this paper, we consider a sensor network consisting of $N$ static and homogeneous sensor nodes uniformly deployed over a vast field to continuously monitor the environment. The communication topology of a WSN is usually represented by a graph $G = (V, E)$, where each vertex $v \in V$ represents a sensor node and each edge $(u, v) \in E$ represents a communication link. For any vertex $v \in V$, $N(v)$ is the set of all vertices that are adjacent to $v$ in $G$. We denote the $i$th sensor by $s_i$ and the corresponding sensor node set by $S = \{s_1, s_2, \ldots, s_N\}$. We assume that links are bidirectional, which may be realized using two unidirectional links. We denote a bidirectional link between nodes $s_i$ and $s_j$ as $s_i - s_j$, while the directed link from $s_i$ to $s_j$ is denoted by $s_i \to s_j$. When a link fails, both directed edges have failed. For graph terminology and notation not defined here we refer the reader to [21]. Moreover, we make the following assumptions about the sensor nodes and the underlying network model.
(1) There is a unique identifier for every node. The computing, storage, and energy power of sensors are limited. Nodes are capable of operating in an active mode or a low-power sleeping mode.
(2) There is a stationary base station (BS) located far from the sensing field. BS distributes control messages in one-hop mode, and its energy and computing capability are not limited.
(3) Nodes are location-unaware, but a node can compute the approximate distance to another node based on the received signal strength, if the transmitting power is known.
(4) All nodes are static and homogeneous and are organized into clusters. CMs communicate with their CH in a one-hop manner, while the communication between CHs and the BS is relayed by other CHs.
International Journal of Distributed Sensor Networks
(5) A proper data aggregation mechanism is adopted for energy saving, and there exists a MAC protocol which is executed to resolve contention, providing reliable one-hop broadcast over logical links.
All of these assumptions are typical for wireless sensor networks, which means that our model is general and realistic. We use a simplified model for the communication energy dissipation [22]. Both the free space ($d^2$ power loss) and the multipath fading ($d^4$ power loss) channel models are used, depending on the distance between the transmitter and receiver. The energy spent for transmission of an $l$-bit packet over distance $d$ is
$$E_{Tx}(l, d) = \begin{cases} l E_{elec} + l \varepsilon_{fs} d^2, & d < d_0, \\ l E_{elec} + l \varepsilon_{mp} d^4, & d \ge d_0, \end{cases}$$
where $d_0$ is the crossover distance between the two channel models. The electronics energy, $E_{elec}$, depends on factors such as the digital coding and modulation, whereas the amplifier energy, $\varepsilon_{fs} d^2$ or $\varepsilon_{mp} d^4$, depends on the transmission distance and the acceptable bit error rate. To receive this message, the radio expends energy
$$E_{Rx}(l) = l E_{elec}.$$

The Diagnosis Model.
Each node in the system can be in one of two states: faulty or fault-free. Faults can be classified in different ways. Based on duration, faults can be classified as permanent, intermittent, and transient. A transient fault will eventually disappear without any apparent intervention, whereas a permanent one will remain unless it is repaired and/or removed by an external administrator. Based on how a failed node behaves once it has failed, faults can be either hard or soft. When a node is hard-faulted, it cannot communicate with the rest of the system. In WSNs, a node can be hard-faulted either because it has crashed or because of battery depletion. Soft faults are subtle, since a soft-faulted node continues to operate and to communicate with the other nodes in the system, although with altered behavior. In this paper, we utilize the invalidation rule of the gMM model [14,15], which is summarized in Table 1. In the gMM model, diagnosis is based upon comparison of the results generated by test tasks assigned to pairs of units with a common neighbor. Let $u$ be a unit adjacent to both units $v$ and $w$. If nodes $u$, $v$, and $w$ are fault-free, then the results agree and the comparison outcome is 0. If unit $u$ is fault-free and either $v$ or $w$ is faulty, then the results disagree and the comparison outcome is 1. If unit $u$ is faulty, then the comparison outcome is unreliable (it may be 0 or 1), regardless of the states of $v$ and $w$. Assuming that the topology of the network does not change while the diagnosis is executing, the comparison-based approach relies on the following operations.
(1) Test Request Generation. In order to test adjacent nodes, each node $u$ generates a test sequence number $i$, a test task $T_i$, and the expected result $R_{u,i}$, and sends the test request message TEST REQ($u$, $i$, $T_i$) to its neighbors $N(u)$ at time $t$. Then, node $u$ sends a message $T_{out}$ to initiate the timer. Node
Case 2. If $w \neq u$, that is, $w$ is not the testing node, we should check whether $w$ is $u$'s neighbor. The following two cases arise.
Case 2.1. $w \in N(u)$ at time $t$. In this case, $w \in N(v) \cap N(u)$, that is, $w$ and the tester node $u$ share at least one common adjacent node $v$. Node $w$ received the test request TEST REQ from $u$ and the test response TEST RES from $v$; hence it can compare $R_{v,i}$ with $R_{w,i}$. Node $v$ is diagnosed as fault-free if the comparison outcome is 0, and as faulty otherwise. Figure 1(a) illustrates this case.
If there exists some $z \in N(u)$ such that $R_{z,i} = R_{v,i}$, then both nodes are diagnosed as fault-free; otherwise, if node $z$ (node $v$) has been diagnosed as fault-free, then node $v$ (node $z$) is diagnosed as faulty. Otherwise, the test result $R_{v,i}$ is stored. Figure 1(b) illustrates this case.
(4) Timeout Reception. After sending its test request message, node $u$ sets a timer to $T_{out}$ in order to guarantee that its neighbors will respond within this time bound. Once this bound expires, the testing node $u$ receives the timeout message from the timer and diagnoses all the nodes that did not reply to the test request as faulty.
traditional multiprocessor systems, WMNs, and MANETs are unrealistic. In the following, we sum up these problems and give the corresponding analysis.
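As an illustration of the diagnosis model described in this section, the gMM invalidation rule and the timeout step of the testing node can be sketched in Python. This is a minimal sketch under stated assumptions, not the protocol's implementation: the function names `comparison_outcome` and `diagnose_on_timeout` are hypothetical, the tester is assumed to be fault-free, and the tester is assumed to compare each reply against its own expected result.

```python
import random
from typing import Dict, Optional, Set


def comparison_outcome(u_faulty: bool, v_faulty: bool, w_faulty: bool,
                       rng: Optional[random.Random] = None) -> int:
    """gMM invalidation rule: outcome reported by comparator u for units v and w."""
    if u_faulty:
        # A faulty comparator is unreliable: its outcome may be 0 or 1.
        return (rng or random.Random()).choice([0, 1])
    # A fault-free comparator reports 0 only if both v and w are fault-free.
    return 1 if (v_faulty or w_faulty) else 0


def diagnose_on_timeout(neighbors: Set[str], responses: Dict[str, int],
                        expected: int) -> Dict[str, str]:
    """Tester-side diagnosis once T_out expires (tester assumed fault-free)."""
    status = {}
    for node in sorted(neighbors):
        if node not in responses:
            status[node] = "faulty"      # no reply within T_out
        elif responses[node] == expected:
            status[node] = "fault-free"  # result agrees with the tester's own
        else:
            status[node] = "faulty"      # result disagrees
    return status
```

For example, with neighbors a, b, and c, an expected result of 7, and replies {a: 7, b: 9}, node a is diagnosed as fault-free, b as faulty because its result disagrees, and c as faulty because it did not reply before the timer expired.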

Problems and
On the one hand, we require a fault diagnosis mechanism for WSNs which is energy-efficient and has acceptable message complexity. Sensor nodes are limited in battery energy, computing, and storage capacity, and the protocol design needs to consider the energy efficiency of the network nodes. That is, the amount of calculation performed by nodes should be moderate and the traffic generated by the exchange of messages should be reasonable. Clearly, there should be some mechanism to prevent messages from flooding the network. Owing to the instability of nodes, the number of faulty nodes is far greater than in traditional multiprocessor systems or MANETs. Besides, constrained by wireless communication capacity, the structural properties of the communication topology, in terms of regularity, connectivity, and so forth, are worse than those of interconnection networks. As a result, traditional diagnostic measures cannot meet the fault diagnosis requirements of WSNs.
On the other hand, an excellent clustering mechanism provides a good basis for fault diagnosis. In clustered WSNs, CMs only perform sensing tasks, while CHs, in addition to collecting data from CMs, also have to fulfill a variety of other tasks, such as data aggregation, cluster maintenance, and communication with the BS in a multihop or one-hop way. Efficient mechanisms that reduce and balance the energy consumption of CHs and ensure that CHs are fault-free are crucial, especially in fault diagnosis applications. Besides, because nodes are left unattended after deployment, an adaptive mechanism for handling node faults is required. For instance, the fault conditions can be transmitted to the BS, which processes them uniformly according to the actual requirements.
In this paper, we summarize the protocol design as a clustering problem based on fault diagnosis and a multipath routing problem based upon a DAG. Since the nodes in WSNs are very limited in resources, clusters should not only be small in size, but also be constructed with low communication overhead and computation cost. In addition, the amounts of communication and computation should be scalable, as the networks are typically deployed at large scale. For the multipath routing problem, the backbone network composed of CHs can be abstracted as an edge- and vertex-weighted graph $G(V, E, W, R)$. We view the CHs as the vertex set $V$, the communication links as the edge set $E$, the communication cost (edge weight) as $W: E(G) \to \mathbb{R}^+$, and the residual energy (vertex weight) as $R: V(G) \to \mathbb{R}^+$. In order to provide solutions for the two problems, we believe the following requirements should be met.
(1) Clustering should be completely distributed with acceptable message complexity. Each node independently makes its decisions based on local information, and the result is a set of CHs well distributed over the sensing field.
(2) At the end of clustering, each node is either a cluster head or a member node.
(3) A diagnostic mechanism with acceptable message complexity should be used to eliminate the harmful influence of faulty nodes and to avoid the unnecessary energy consumption they impose.
(4) A multipath routing mechanism should be used to make the transmission of gathered data reliable. In particular, it is necessary to select the cost-aware or energy-aware communication path among all of the possible paths.
(5) Excessive energy consumption by CHs should be avoided, and the energy consumption of network nodes should be kept balanced.

System Design
Our fault diagnosis-based clustering and multipath routing data collection protocol (FDCM) can be divided into two phases mainly as follows. Phase (I): clustering construction based on fault diagnosis; Phase (II): resilient multipath routing selection. In the following, we explain how FDCM works in detail.

Clustering Construction Based on Fault Diagnosis.
In the network deployment phase, the BS broadcasts a HELLO message to all the nodes in the network at a certain power level; the message includes a certain number of candidate CHs selected in advance. After receiving this message, each sensor node checks whether it is a candidate CH. Each candidate CH computes the approximate distance to the BS based on the received signal strength and then executes a distributed cluster head competition algorithm similar to EEUC [8].
Our CH competition is primarily based on the fault status and residual energy of candidate CHs. The size of a cluster is controlled by the competition radius $R$, which is a constant tuned for the typical situation. In addition, let $s_i$ denote a CM and $c_i$ denote cluster $i$.
Before clustering, the BS randomly selects candidate CHs with a certain probability to compete for final CHs. To save energy, nodes that are not selected as candidate CHs keep sleeping until the cluster head competition stage ends.
The distributed clustering algorithm, which is initiated by the BS and executed by each candidate CH, is presented in Algorithm 1. First, each candidate CH broadcasts a COMPETE CH($s_i$.ID, $s_i$.R, $s_i$.RE) message which contains its node ID, competition radius R, and residual energy RE. After the construction of $N_{CH}$ has finished in lines 2-4, each candidate CH checks the fault status of its $N_{CH}$ using the comparison model in lines 5-24. Candidate CH $s_i$ generates a test sequence number $i$ and the corresponding test task $T_i$ and sends a test request message TEST REQ($s_i$.ID, $i$, $T_i$) to its adjacent candidate CH set $s_i.N_{CH}$. Node $s_i$ waits for the responses of $s_i.N_{CH}$ and diagnoses their status according to the comparison model. Lines 25-36 describe the CH competition process. Candidate CHs diagnosed as faulty (soft-faulty nodes) are not eligible for the competition. If $s_j$ belongs to $s_i.N_{CH}$ and $s_i$ receives a FINAL CH message from $s_j$, then $s_i$ gives up the competition immediately. After that, the fault-free candidate CH decides whether it can act as a final CH. In particular, if the constructed adjacent set $N_{CH}$ is empty, then the candidate CH becomes a final cluster head immediately. Once a fault-free $s_i$ finds that its residual energy is greater than that of all the nodes in its $S_{CH}$, it wins the competition.
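The competition rule described above can be sketched as a small centralized simulation in Python. This is only an illustrative sketch, not Algorithm 1 itself: the names `Candidate` and `elect_cluster_heads` are hypothetical, the fault status of each candidate is taken as already diagnosed, neighbor sets are derived from node coordinates (which the real nodes do not know), and ties on residual energy are broken in favor of the lower ID, which is an assumption.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Set


@dataclass
class Candidate:
    node_id: int
    x: float
    y: float
    residual_energy: float
    faulty: bool = False  # fault status, assumed already diagnosed by neighbours


def elect_cluster_heads(cands: List[Candidate], radius: float) -> Set[int]:
    """Return the IDs of the final CHs among the candidate CHs."""
    # Candidates diagnosed as faulty are not eligible to compete.
    alive = {c.node_id: c for c in cands if not c.faulty}
    # N_CH of each candidate: fault-free candidates within the competition radius.
    n_ch: Dict[int, Set[int]] = {
        i: {j for j, d in alive.items()
            if j != i and hypot(c.x - d.x, c.y - d.y) <= radius}
        for i, c in alive.items()
    }

    def key(i: int):
        # Strict order: more residual energy wins; lower ID breaks ties (assumption).
        return (alive[i].residual_energy, -i)

    undecided = set(alive)
    final: Set[int] = set()
    while undecided:
        # A candidate wins if it beats every neighbour that is still competing;
        # a candidate with no competing neighbour wins immediately.
        winners = {i for i in undecided
                   if all(key(i) > key(j) for j in n_ch[i] & undecided)}
        final |= winners
        undecided -= winners
        # Neighbours that hear a FINAL CH message give up the competition.
        for w in winners:
            undecided -= n_ch[w]
    return final
```

With candidates 1 (energy 2.0) and 2 (energy 1.5) five metres apart, an isolated candidate 3, a faulty candidate 4, and a radius of 10 m, the sketch elects candidates 1 and 3: candidate 2 loses to its neighbor, and candidate 4 never competes.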
After all the final CHs have been elected, the previously sleeping nodes are woken up, and each CM chooses the closest CH, that is, the one with the largest received signal strength. Each CM registers with its CH by sending a JOIN CLU message. In order to determine the status of the CMs in each cluster, the CH sends a test request TEST REQ to its member nodes. These nodes compute the tasks and feed back the results to the sender. The lower-layer fault diagnosis algorithm based on the comparison protocol is presented in Algorithm 2. Once the faulty nodes are determined, they are ordered to turn dead. The final CH sets up a TDMA schedule and transmits it to the nodes in the cluster. After the TDMA schedule is known by all nodes in the cluster, the clustering phase is completed and the data transmission stage begins. Based on the execution of Algorithm 1, we can draw the following theorem.
Proof. During the execution of the clustering algorithm, each candidate CH first sends a COMPETE CH message. In order to identify the faulty candidate CHs, which are deprived of eligibility as final CHs, each candidate sends a TEST REQ message and receives TEST RES messages in turn. If a candidate becomes a final CH, it broadcasts a FINAL CH message to declare its win; otherwise it broadcasts a QUIT ELECT message to exit. After the winners have been declared, each regular node broadcasts a JOIN CLU message. So each node has a message complexity of $O(1)$.
Assume that the network size is $N$, the number of candidate CHs is $M$, and the number of final CHs/clusters is $N_c$ ($N_c \le M \le N$). In clustering stage, the overall message overhead is
The theorem is proved.
From Theorem 2 we can conclude that the clustering stage has a low message complexity both for an individual node and for the entire network; thus requirement (1) is satisfied.
Proof. During the execution of Algorithm 1, a node in the sensor network is in one of at most four states: RegularNode, CandidateCH, FinalCH, and deadNode. Here the states RegularNode and FinalCH mean that the node is a CM and a CH, respectively.
In the following we first show that any node is either a final CH or a CM after the execution of Algorithm 1. Initially, apart from the candidate CHs selected in advance, the remaining nodes are all regular nodes. In lines 5-24, each candidate CH learns the fault status of its adjacent candidate CHs. Nodes that are determined to be faulty quit the competition process immediately. In lines 33-36, the faulty candidate CHs are ordered to turn dead (deadNode), and they do not participate in the subsequent work any more. For any candidate CH $s_i$, if it has no adjacent node, then Algorithm 1 executes line 26 and $s_i$ becomes a final CH at once. Furthermore, in lines 25-36, $s_i$ becomes either a final CH (FinalCH) or a CM (RegularNode), and these two outcomes are mutually exclusive.
After the election of final CHs has finished, each CM registers with only one CH based on received signal strength; thus each CM belongs to exactly one cluster. The competition process shows that, for the adjacent CH set $N_{CH}$ of any candidate CH $s_i$, if $s_i$ wins the competition, then the nodes in $s_i.N_{CH}$ quit the competition. Otherwise, if $s_i$ receives a FINAL CH message, it quits the competition too, so only one final CH is allowed in each competition range.
To sum up, the theorem is proved.
Fault diagnosis in this paper means that all faulty nodes within the sensor network are identified correctly and ordered to stop working, and that the fault conditions of each cluster are reported to the BS by the CHs through data transmission. Upon receiving fault information, the BS takes appropriate actions, such as forbidding faulty nodes from taking part in the final CH election in the next round. According to the description and analysis above, the fault diagnosis process consists of two phases. First, it eliminates the candidate CHs to participate in the final CH competition in the process 1
Condition 3. $c_i.CH.E_{RE} > E_{relay}$: this energy relation ensures that relay nodes have enough residual energy for data transmission in practice. A relay node's residual energy must be greater than the total energy needed for receiving and sending data packets. Therefore, to balance network energy consumption, it is necessary to protect the relay nodes' residual energy and to give priority to relay nodes with more remaining energy. According to the model described in Section 3, we have mapped the multipath routing problem into finding communication paths in the weighted graph $G(V, E, W, R)$. On the basis of this mapping, we assume that there is a logical communication link between any two nodes which meet Conditions 1-3 simultaneously. Note that the direction of a link is always from the sources toward the BS; thus a directed connected cost network (a vertex- and edge-weighted digraph) is built. In order to model the mapped multipath routing problem, the definition of a DAG is given, followed by a theorem which satisfies requirement (4).

Definition 5.
A directed acyclic graph (DAG) is a directed graph with no directed cycles. We say that a DAG is rooted at $r$ if $r$ is the only node in the DAG that has no outgoing edges, that is, every other node has at least one outgoing edge.
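The definition can be checked mechanically. The following Python sketch tests whether a directed graph, given as a successor list, is a DAG rooted at a given node; the function name `is_dag_rooted_at` and the dictionary representation are illustrative choices, and acyclicity is tested with Kahn's topological-sort algorithm.

```python
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]  # node -> list of successors


def is_dag_rooted_at(graph: Graph, root: Hashable) -> bool:
    """True if `graph` is acyclic and `root` is its only node without outgoing edges."""
    nodes = set(graph) | {v for succ in graph.values() for v in succ}
    # The root must be the only node that has no outgoing edge.
    sinks = {n for n in nodes if not graph.get(n)}
    if sinks != {root}:
        return False
    # Kahn's algorithm: the graph is acyclic iff every node can be removed
    # in topological order (repeatedly removing nodes with in-degree 0).
    indeg = {n: 0 for n in nodes}
    for succ in graph.values():
        for v in succ:
            indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for v in graph.get(n, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return removed == len(nodes)
```

For instance, the graph with edges a→b, a→BS, and b→BS is a DAG rooted at BS, whereas adding the edge b→a creates a directed cycle and the check fails.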

Theorem 6. The mapped weighted graph $G(V, E, W, R)$ which meets Conditions 1 and 2 simultaneously is a connected DAG rooted at the BS. That is, the graph is connected, directed, and without directed cycles. Furthermore, the BS is reachable from any vertex in $G$.
Proof. As stated before, there is an edge between any two vertices which meet Condition 1; that is, for any two such adjacent vertices $u, v \in V$, edge $(u, v) \in E$. By the distance relation, Condition 2 ensures that the direction of edge $(u, v)$ is always from $u$ to $v$ if $dist(u, BS) \ge dist(v, BS)$. Therefore, graph $G$ is directed.
Suppose that the mapping produces at least two connected components $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ such that no $v_1 \in V_1$ can communicate with any $v_2 \in V_2$.
Note that a connected component is a maximal connected subgraph of $G$. Without loss of generality, assume that $V_2$ lies to the right of $V_1$ and $BS \in V_2$. We have the following two cases.
Case 1. There is no relay CH in $V_2$ via which $v_1$ reaches the BS. Then there must be a cluster head $v_1 \in V_1$ which is able to communicate with $v_2 = BS \in V_2$.
Case 2. There exists at least one relay CH $v_2 \in V_2$ such that $v_1$ reaches the BS via $v_2$. Then $v_1$ communicates with $v_2$ first.
So in both cases there must be at least one directed edge $(v_1, v_2)$ connecting these two connected components, which contradicts the initial assumption that a cluster head in one component cannot communicate with any cluster head in the other component. Therefore, $V_1$ and $V_2$ are connected.
Condition 2 ensures that routes move toward the BS from source CHs via relay CHs. Without loss of generality, we suppose that there is a directed cycle
Therefore, the theorem is proved.
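To make the mapping behind Theorem 6 concrete, the following Python sketch builds the routing graph from CH positions and checks that every CH can reach the BS. Several points are assumptions rather than statements of the paper: Condition 1 is not spelled out in this text and is taken here to mean "within communication range"; Condition 2 is applied with a strict inequality so that two CHs at equal distance from the BS cannot point at each other; the names `build_routing_dag` and `reaches_bs`, the reserved vertex name "BS", and the use of coordinates are all illustrative.

```python
from math import hypot
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def build_routing_dag(ch_pos: Dict[str, Point], bs: Point,
                      comm_range: float) -> Dict[str, List[str]]:
    """Directed edges u -> v for CHs in range of each other, with v closer to the BS."""
    pos = dict(ch_pos, BS=bs)  # "BS" is a reserved vertex name in this sketch
    dist_bs = {n: hypot(p[0] - bs[0], p[1] - bs[1]) for n, p in pos.items()}
    dag: Dict[str, List[str]] = {n: [] for n in pos}
    for u, pu in ch_pos.items():
        for v, pv in pos.items():
            # Assumed Condition 1: u and v are within communication range.
            in_range = hypot(pu[0] - pv[0], pu[1] - pv[1]) <= comm_range
            # Condition 2 (strict variant): the edge always points toward the BS.
            if u != v and in_range and dist_bs[u] > dist_bs[v]:
                dag[u].append(v)
    return dag


def reaches_bs(dag: Dict[str, List[str]], start: str) -> bool:
    """Depth-first search from `start` along directed edges until the BS is found."""
    stack, seen = [start], set()
    while stack:
        n = stack.pop()
        if n == "BS":
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(dag[n])
    return False
```

Because every edge strictly decreases the distance to the BS, no directed cycle can arise in this construction. Reachability of the BS, however, still depends on the communication range being large enough for the given layout, which the sketch checks rather than guarantees.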

Next-Hop Neighbor Choosing.
The reachability relation in a DAG forms a partial order, with which a routing path can be constructed; that is, the DAG provides a multipath routing mechanism. By Theorem 6, any CH can transmit its data along relay CHs to the BS. If any node or link fails, another one is chosen from the candidate nodes and links in the DAG. According to different requirements, the selection of the next-hop node gives priority to the minimum energy strategy, the maximum residual energy strategy, or any other factor. The former forwards packets along the minimum energy path to the BS, while the latter forces packets to move toward the BS through nodes with more residual energy on the routing path. The decision is made depending on the connected network model and the information gathered in the clustering stage.
As an example, the distributed algorithm by which any cluster head $c_i.CH$ looks for its next-hop neighbor is presented in Algorithm 3. Each CH chooses its next-hop neighbor independently according to the distance to the BS. Initially, $c_i.CH$ chooses the neighbor within its communication range which is nearest to the BS. In lines 5-9, if two or more neighbors have the same distance, then the algorithm selects the one with more residual energy, again for the sake of balancing energy. When a cluster head can no longer choose a neighbor, the network becomes partitioned. From the view of the entire network, a spanning tree rooted at the BS with minimum hop counts to the BS is obtained. Figure 2 illustrates the communication path selection process. Each cluster head has 2 J of initial energy; the communication cost is denoted by the edge weight, and the residual energy is represented by the vertex weight. The DAG shows the available communication paths. For sensor node $v_4$, under the minimum energy first strategy, it selects $v_3$ and $v_6$ as the next-hop nodes one after another. Note that when node $v_6$ looks for its next-hop node, it finds that there are two nodes, $v_9$ and $v_{11}$, with the same distance to the BS. At this moment, it selects node $v_9$, which has more residual energy. On the contrary, under the highest residual energy first strategy, the algorithm does not necessarily select the paths which approach the BS most quickly, but rather those with more residual energy. Finally, from the global view, the algorithm outputs a relatively optimal communication spanning tree of the graph $G$ rooted at the BS.
Figure 2: Communication path selection (panels: the available communication path; the minimum energy first routing).
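The next-hop rule described above (nearest to the BS first, more residual energy on a tie) can be sketched in a few lines of Python. The function name `choose_next_hop` and the numeric values in the usage note are hypothetical; they are not taken from Algorithm 3 or Figure 2.

```python
from typing import Dict, List, Optional


def choose_next_hop(neighbors: List[str], dist_to_bs: Dict[str, float],
                    residual_energy: Dict[str, float]) -> Optional[str]:
    """Pick the neighbor nearest to the BS; break ties by larger residual energy."""
    if not neighbors:
        return None  # no candidate left: this CH is cut off from the BS
    # Sort key: smaller distance first, then larger residual energy.
    return min(neighbors, key=lambda n: (dist_to_bs[n], -residual_energy[n]))
```

For node $v_6$ with neighbors $v_9$ and $v_{11}$ at the same (assumed) distance of 40 m from the BS and assumed residual energies of 1.8 J and 1.5 J, the sketch returns $v_9$, matching the tie-break in the example. An energy-first variant would simply rank neighbors by residual energy alone.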

Route Maintenance.
As stated previously, in the connected network each CH independently chooses the neighboring CH with the minimum distance to the BS as its next-hop routing node. The remaining neighbors are maintained in its routing table in order of their distance to the BS. If the optimal neighbor is unavailable due to node or link failure, then the node chooses the next-best one in its routing table, thus providing robust routing. In addition, this multipath routing selection mechanism guarantees, to some extent, that a malicious attacker in the network cannot obtain the communication path simply by eavesdropping on the signal. After each CH has determined its next-hop neighbor, the CHs are ready to start transmitting sensing data. In the intercluster multihop routing stage, Algorithm 3 finds the routing path which approaches the BS most quickly among all the available paths, while in the clustering stage, the candidate CH competition takes residual energy into account. In a round-based protocol, the two stages alternate and thus balance energy consumption among all sensor nodes, which meets requirement (5).
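The routing-table fallback described above can be illustrated with a short Python sketch. The names `build_routing_table` and `next_available_hop` are hypothetical, and how a CH learns that a neighbor has failed is left outside the sketch (the set `failed` is simply given).

```python
from typing import Dict, List, Optional, Set


def build_routing_table(neighbors: List[str],
                        dist_to_bs: Dict[str, float]) -> List[str]:
    """Neighbors ordered by increasing distance to the BS (best next hop first)."""
    return sorted(neighbors, key=lambda n: dist_to_bs[n])


def next_available_hop(table: List[str], failed: Set[str]) -> Optional[str]:
    """First entry of the routing table that is not known to have failed."""
    for n in table:
        if n not in failed:
            return n
    return None  # every neighbor is unavailable: no route toward the BS
```

With neighbors v3, v5, and v6 at assumed distances of 60, 45, and 50 m from the BS, the table is ordered v5, v6, v3; if v5 fails, the CH falls back to v6 without rebuilding the table.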

Simulation Settings.
In this section, we evaluate the performance of FDCM with simulations. Because LEACH [5], HEED [6], and EEUC [8] are the most similar clustering protocols, we use them for comparison. For fault diagnosis-based clustering, FDCM is compared with CRFDA [20]. One hundred sensor nodes are randomly distributed over a region of 100 m × 100 m, as shown in Figure 3. The number of candidate CHs is set to 20% of the total nodes. The BS is located far away from the region, at point (50, 175). The simulation parameters are listed in Table 2.
In our simulations we make the following assumptions. (1) Nodes that are detected as faulty turn into dead mode, that is, they no longer generate information or consume energy. (2) During the network lifetime, nodes may become faulty at any time. The data sent by soft-faulty nodes is invalid. (3) Sensor nodes have idealized sensing capabilities. Ideal MAC layer conditions are assumed, that is, perfect transmission of data on a node-to-node wireless link. (4) In the diagnosis process, we use uniform rules to generate the test task $T_i$ and ignore the energy consumed by its execution.
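Energy accounting in such a simulation rests on the radio model of the network model section (free space $d^2$ loss at short range, multipath $d^4$ loss at long range). A minimal Python sketch follows. The constants below are values commonly used with this radio model in the literature and are assumptions here, not the entries of Table 2; the crossover distance `D0` is likewise assumed to be the point where the two amplifier terms are equal.

```python
from math import sqrt

# Assumed radio parameters (commonly used values, NOT taken from Table 2).
E_ELEC = 50e-9       # electronics energy, J/bit
EPS_FS = 10e-12      # free-space amplifier energy, J/bit/m^2
EPS_MP = 0.0013e-12  # multipath amplifier energy, J/bit/m^4
D0 = sqrt(EPS_FS / EPS_MP)  # assumed crossover distance: both amplifier terms equal


def tx_energy(bits: int, d: float) -> float:
    """Energy (J) to transmit `bits` bits over distance `d` metres."""
    if d < D0:
        return bits * E_ELEC + bits * EPS_FS * d ** 2  # free-space model
    return bits * E_ELEC + bits * EPS_MP * d ** 4      # multipath model


def rx_energy(bits: int) -> float:
    """Energy (J) to receive `bits` bits."""
    return bits * E_ELEC
```

Under these assumed values the crossover distance is roughly 88 m, so a 4000-bit packet sent over 50 m uses the free-space term, while the same packet sent over 100 m uses the multipath term and costs more than twice as much.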

Simulation Results.
Since our protocol is round based, we assume that the sensor network randomly generates a certain number of faulty nodes in a certain round. Ideally, we expect the number of fault-free nodes in the network to decrease correspondingly compared with the previous round. In our simulation, we assume that 3 faulty nodes appear in the 100th, 200th, and 300th rounds, respectively; the result of executing the algorithms is shown in Figure 4. One of the curves shows the result when there are faulty nodes in the network, while the other shows the fault-free case. From the figure, we can see that the number of fault-free nodes drops in each round in which faulty nodes arise. This means that FDCM can correctly detect the faulty nodes in the network. Besides, the comparison shows that the existence of faulty nodes noticeably decreases the network lifetime.
Communication complexity is an important measure of fault diagnosis efficiency. It is one of the reasons why high-complexity fault diagnosis mechanisms from traditional MANETs are unrealistic to use directly in WSNs. By introducing fault diagnosis based on the comparison model into the process in which candidate CHs run for final CHs, FDCM eliminates the chance of faulty nodes participating in the election. Then, within each cluster, only $O(N)$ message complexity is needed to complete the fault diagnosis of the cluster members. Isolating the diagnosis boundary avoids large-scale message diffusion throughout the entire network, which reduces the message complexity significantly. Figure 5 confirms this by comparing the message complexity of different protocols. It shows that the message complexity of FDCM increases nearly linearly. When faulty nodes arise, if they are not diagnosed and excluded from the network, they consume additional energy of other fault-free CHs, which shortens the lifetime of the entire network. In the simulation, we monitor how the number of alive nodes changes with time (round). We find that the network lifetime of FDCM may be influenced by various factors in different situations. The lifetime comparison between the cost-aware and the energy-aware strategies is shown in Figure 6.
For comparison with similar clustering protocols, we run LEACH, HEED, and EEUC and compare their network lifetime with that of FDCM. As shown in Figure 7, FDCM and EEUC perform far better than LEACH and HEED in prolonging network lifetime, which is attributed to their consideration of energy conservation. In FDCM, a certain amount of energy is spent by the nodes involved in fault diagnosis; however, this eliminates the additional energy consumption caused by faulty nodes. More importantly, fault diagnosis ensures the correctness of the collected information.

Conclusions
In wireless sensor networks, where large numbers of sensor nodes are deployed in hostile environments, fault diagnosis is of great importance for ensuring the accuracy of the gathered information and for reducing the energy additionally consumed by faulty nodes. Considering the inherent characteristics of sensor networks, this paper analyzes the issues and challenges of the comparison-based fault diagnosis model for wireless sensor networks and gives the relevant design requirements.
As a complete data collection protocol, the proposed protocol mainly consists of fault diagnosis-based clustering and multipath routing. During the clustering stage, a fault diagnosis approach based on the comparison model is introduced. Fault diagnosis of the network nodes consists of two phases. First, faulty candidate CHs are excluded from the final CH competition during clustering. Second, after clustering is finished, fault diagnosis is performed between each CH and its CMs based on a special comparison model. The CH sends a test request message to its members and determines their fault status according to their responses; faulty nodes are ordered to turn dead. These two phases together complete the diagnosis of all faulty nodes in the network.
In the multipath routing stage, communication characteristics impose certain conditions, which map the original abstract communication graph into a DAG. The new graph determines the feasible communication paths along which any node can transfer data to the BS. In particular, we give an algorithm that greedily selects the next-hop neighbor with the minimum distance to the BS. When multiple nodes are eligible, the node with the highest residual energy is preferred. If any node on the routing path fails, an available path in the DAG is selected based on the highest residual energy or the minimum distance to the BS, until the data reaches the BS. Note that the transmitted data includes node fault status, which can be used in the next round as a basis for the cluster head election; that is, faulty nodes lose the possibility of acting as candidate CHs.
For future work, we will consider two new directions. First, we intend to improve the effectiveness of our algorithm and obtain better performance, such as more accurate diagnosis and lower response time. Second, provided that the message complexity remains acceptable, we will study the possibility of a new diagnosis approach which is appropriate for dynamic topologies.