An Energy-Efficient Fail Recovery Routing in TDMA MAC Protocol-Based Wireless Sensor Network

Urmonov, Odilbek; Kim, HyungWon

doi:10.3390/electronics7120444


by Odilbek Urmonov and HyungWon Kim *

Electronics Engineering Department, School of Electronics Engineering, Chungbuk National University, Cheongju 371763, Korea

* Author to whom correspondence should be addressed.

Electronics 2018, 7(12), 444; https://doi.org/10.3390/electronics7120444

Submission received: 15 November 2018 / Revised: 10 December 2018 / Accepted: 11 December 2018 / Published: 17 December 2018

(This article belongs to the Special Issue Advanced Technologies in Low Power Wide Area Networks (LPWAN))


Abstract

Conventional IoT applications rely on seamless data collection from the distributed sensor nodes of Wireless Sensor Networks (WSNs). The energy supplied to a sensor node is limited and depletes with each cycle of data collection. Therefore, data flow from the network to the base station may cease at any time due to nodes with a dead battery. Replacing batteries in WSNs is often challenging and requires additional effort. To ensure the robust operation of WSNs, many fault recovery routing mechanisms have been proposed. Most of the previous fault recovery routing methods incur considerable delays in recovery and high overhead in either energy consumption or device cost. We propose an energy-efficient fail recovery routing method designed to operate over a data aggregation network topology using a TDMA media access control (MAC). This paper introduces a novel fault recovery routing algorithm for TDMA-based WSNs. It finds an optimal neighbor backup parent (NBP) for each node in a way that reduces the energy consumption. The proposed method allows the NBPs to utilize the time slot of the faulty parent nodes, so it eliminates the overhead of TDMA rescheduling for NBPs. To evaluate the fault recovery performance and energy efficiency of the proposed method, we implemented it in a C++ simulation program. Simulation experiments with an extensive set of network examples demonstrate that the proposed method can extend the network lifetime by 21% and reduce the energy consumption by 23% compared with the reference methods.

Keywords: failure recovery; routing; wireless sensor network; redundant path; backup parent node; network lifetime and power consumption; active period

1. Introduction

The recent advancement of WSNs has enabled a variety of Internet of Things (IoT) applications that penetrate our daily life [1]. Many IoT applications are safety related and mission-critical (e.g., health care, active volcano monitoring, and fire alert), where device failures might cause serious consequences [2,3]. In particular, wireless sensor nodes deployed for environment monitoring periodically send their sensing data to a gateway, called a sink node, over a multi-hop topology [2].

Sensor nodes are widely used in industry to monitor and accumulate data related to a target object. For instance, by deploying sensor nodes, we can receive periodic information about environments such as the wilderness (forests or deserts), special industrial facilities, etc. [2]. We may apply WSNs to obtain up-to-date temperature information or to monitor toxic gas levels in different branches of industry. Large-scale self-organized wireless sensor and mesh networks provide an opportunity to develop Smart Environment and Smart Grid applications [1]. WSNs are critically important to support these advanced applications.

In the past, many WSNs employed a carrier sense multiple access (CSMA) protocol due to its simplicity [4]. Such networks, however, share the medium and therefore suffer from frequent collisions, which incur packet retransmissions and cause extra energy loss. A time division multiple access (TDMA) protocol is regarded as an effective alternative to CSMA, since it can ensure fair and collision-free data forwarding from all nodes, thereby reducing the energy loss [4,5]. Our proposed method is thus based on TDMA. Regardless of the choice of protocol, however, any WSN is susceptible to device failure or battery depletion, and therefore it may lose network connectivity.

Recent studies on WSNs have achieved considerable enhancement in network architecture and data forwarding protocols to reduce the energy consumption [2]. The primary goal of many WSNs is to maximize the network lifetime even under the event of node failures [6]. Hence, it needs a fail recovery method that operates the rest of the WSN to maintain the desired lifetime.

For low-power WSNs, a tree structure topology is often adopted [7], since it permits simple routing paths from all the nodes towards the sink (root) node, which acts as a gateway collecting all the sensing data. In WSNs of tree structure topology, each child node at a lower level forwards its sensing data to its parent node at a higher level until all data are delivered to the sink node [7]. If any parent node fails, then, all nodes in the subtree under the failed parent lose their routing path towards the sink node. A large portion of the network, therefore, can be isolated, resulting in all their sensing data being lost. Figure 1 illustrates such a faulty parent and its isolated subtree marked by a dotted line.

There are many causes of node failures, such as sensor hardware impairment, radio frequency (RF) transceiver malfunction, and battery depletion [8]. In the field of network fail recovery, many previous researchers consider battery depletion the most common cause of node failures [8,9,10]. Our work also assumes battery depletion as the cause of node failures for the sake of presentation, while the proposed recovery algorithm can be extended to any type of node failure.

When the number of faulty nodes exceeds a certain level, the network may cease to operate. The time until the first sensor node runs out of energy is called the First-node Die-Time (FDT). The period from FDT to the time when all the sensor nodes are dead, or the network is completely disabled, is called All-node Die-Time (ADT) [11,12].

As the percentage of faulty nodes in the network exceeds the threshold of the fault ratio, the network is considered as disabled and the remaining alive nodes of the network become useless. The network lifetime is defined as the duration from a network initialization to the time when the network is disabled [13]. Our goal is to restore the connection between isolated nodes and the functioning portion of the network.
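The two lifetime notions above can be illustrated with a minimal Python sketch. The function names, the list of per-node death times, and the threshold value are hypothetical and chosen only for illustration; the paper does not prescribe this computation.

```python
def first_node_die_time(death_times):
    """FDT: the time at which the first sensor node runs out of energy."""
    return min(death_times)


def network_lifetime(death_times, fault_ratio_threshold):
    """Time at which the fraction of dead nodes first exceeds the threshold.

    Returns None if the threshold is never exceeded.
    """
    total = len(death_times)
    for dead, t in enumerate(sorted(death_times), start=1):
        if dead / total > fault_ratio_threshold:
            return t
    return None


# Hypothetical death times (in seconds) of a five-node network.
death_times = [100, 250, 300, 400, 900]
fdt = first_node_die_time(death_times)
lifetime = network_lifetime(death_times, 0.5)
```

With a threshold of 0.5, the network in this example counts as disabled once the third of the five nodes dies.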

Many studies have shown that faults in WSNs fall largely into two groups: (i) transmission faults and (ii) node faults. Node faults are further classified into five categories: power fault (battery depletion), sensor circuit fault, microcontroller fault, transmitter circuit fault, and receiver circuit fault [10,14]. In the case of a receiver or transmitter fault, the sensor node can neither receive nor send its own sensed data or the data forwarded from its child nodes. A sensor circuit failure is considered less critical, as the sensor node can still forward the data from its child nodes [15].

Depending on their hardware condition, sensor nodes are categorized as Normal, Traffic, End, or Dead nodes [10]. According to the results of previous studies, this categorization helps to reduce the percentage of dead nodes in the network, thereby improving the network lifetime.

As the electronic components of wireless sensor nodes are becoming more reliable, battery depletion is considered the most prominent source of failure according to recent reports [16]. For WSNs in a harsh environment, a distributed fault detection (DFD) algorithm was proposed in [17]. The DFD algorithm does not incur additional transmission costs because it uses existing network traffic to identify sensor failures. Due to the exchange of multiple enquiry messages, however, this method may consume more energy during the recovery process. Another common solution for providing fault tolerance (FT) is adding redundant hardware or software [18]. The highly stringent design constraints of WSNs (e.g., limited battery power), however, make it difficult to add such redundancy due to the additional cost.

In [19], the authors proposed a method of fault recovery during the routing process in a WSN. It classifies the fault recovery methods into two main classes based on the improvement in data transmission. The first technique is retransmission, in which the source retransmits the data through another path when the original path fails. The second technique is data replication which duplicates the data to multiple copies over multiple paths. Utilization of multiple paths for the same message may reduce the network efficiency and cause additional contention to the channel access.

In [20], the authors study a temporal classification method that classifies fail recovery techniques as preventive or curative. Preventive techniques attempt to keep the network functioning without any interruption when a fault occurs. In contrast, curative techniques employ a reactive process that interrupts the network functions while it recovers from an identified fault.

The methods of [21] are examples of preventive techniques. They select in advance the second-best routing option as a redundant path to use when a fault appears in the shortest path. To meet the energy efficiency requirement, their algorithm utilizes the largest portion of the shortest path that can still forward the data to determine the redundant path. Since the nodes within the shortest path execute multiple transmissions, they may consume greater energy than the other nodes. An unequal distribution of network load may cause a failure of nodes in the shortest paths. Then, the system frequently executes a fail recovery procedure which makes the nodes consume additional energy.

The authors of [22] proposed a routing protocol that allows real-time fault recovery. It uses the remaining time of each packet and the state of the forwarding candidate set of nodes, and chooses a path dynamically. Upon detection of a failure, sensor nodes change their status to the jump mode and dynamically adjust the probability of jump to increase the ratio of successful transmissions. Updating the state of the nodes requires additional control message exchange which can be costly in the network with limited energy.

In [23,24], meta-heuristic fault detection algorithms are reported that overcome WSN failures and improve the system reliability. Like many previous approaches, however, such fault recovery methods add significant overheads to both hardware and power, and thus are unacceptable for practical IoT networks.

In [16], S. Gobriel et al. recommended classifying the edges between sensor nodes into three types: primary, backup and side edges. Each node selects one parent as a primary parent and zero or more parents as backups. Primary edges form a spanning tree and are used as long as no communication error occurs. If an error occurs in a primary edge, data may be successfully delivered by one of the backup edges. The authors of this work, however, did not clearly specify on what basis their algorithm selects the primary and backup edges.

In this paper, we propose a fault recovery routing algorithm called an energy efficient, neighbor-extended maximal connectivity re-routing (NE-MCR), which does not incur any additional hardware cost, and thus is well suited to WSNs under stringent power constraints. The NE-MCR algorithm conducts an additional route-recovery process after the main routing and TDMA scheduling steps are completed. In this, we identify as faulty nodes the parent nodes that do not respond (acknowledge) to their child nodes within a given time duration. Our method selects the local optimal backup parent nodes during the routing process in a way that ensures most of the child nodes can maintain their connectivity.

In WSNs, a number of different techniques can be applied to detect the failures. Some researchers proposed the use of passive information collection for the purpose of failure detection. In these methods, information crucial to detect the failure can be extracted from regular data packets sent to the sink node [25,26]. In [8], the authors proposed a special framework to detect the failure in WSN. In this model, sensor nodes piggyback checksum tags of path upon all regular messages sent to the sink node. Each node updates the tags with its own node identification (ID) by means of the Fletcher checksum algorithm. After receiving packets from all routes, the sink node inspects their checksum. To identify any changes in a specific path, the sink node injects a series of control messages. Based on the response to these messages, the sink determines and reports the failure. Most failure detection algorithms use additional control information which incurs an overhead in low power WSN. Therefore, in the current work, faulty parents are detected by identifying the nodes that do not acknowledge their child within a given period.

We compare the NE-MCR algorithm with two reference algorithms presented in [11], i.e., exponential and sine cost function-based routing (ESCFR) and double cost function-based routing (DCFR). Our simulation results show that NE-MCR achieves better energy efficiency and a longer network lifetime than the reference algorithms.

We target applications of the proposed method in IoT networks for wireless metering. In this application, every sensor node periodically wakes up at the same time and sends its sensing data towards the sink node via pre-calculated multi-hop routing paths. We consider a contention-free TDMA protocol, where each node transmits in its time slot, which is pre-allocated during the scheduling process after the routing process is done [27]. In essence, each parent node receives the data of all its child nodes and aggregates them into one data packet along with its own sensing data. It then transmits the aggregated data to its parent node in its allocated time slot. To conserve battery power, each parent node switches back to sleep mode once it has transmitted the aggregated data. We also implemented the above procedures on real hardware and demonstrated the data forwarding performance in [28]. To the best of our knowledge, no related research proposes an energy-efficient fail recovery routing algorithm for this specific type of network model.

In most environmental monitoring, facility diagnosis and wireless metering applications, employing energy-harvesting devices such as solar cells is not an effective solution, since sensor devices are usually installed indoors or in the dark basements of buildings. Using a larger battery is not acceptable either, due to the cost and size constraints on the sensor devices, since these applications are often deployed throughout an entire city to monitor temperature, air pollution or toxic gas levels.

The remainder of this paper is organized as follows. In Section 2, we introduce our network topology and energy model for tree-structured WSN. Section 3 elaborates the proposed fault recovery re-routing algorithm. The definition of NE-MCR algorithm’s cost (objective) function is explained in Section 4. Performance evaluation of the NE-MCR algorithm is provided in Section 5, followed by the conclusion in Section 6.

2. Network Topology and Energy Model

2.1. Network Topology with Time Division Multiplexing

This section describes the network topology, scheduling and routing schemes of the proposed method. Energy-efficient data aggregation is often considered as the primary goal of many WSNs. In conventional sensor networks based on simplistic CSMA, as each sensor’s data travels through multi-hop paths, its data is duplicated and transmitted by the nodes along the paths. Such duplicate transmissions, however, often cause excessive energy consumption. In this paper, we consider a more energy-efficient data forwarding method, a TDMA-based aggregate-and-forward method with convergent network topology. In this forwarding method, each node receives sensing data packets from all its child nodes in different time slots, and sends at once an aggregated data packet to its parent node in another time slot. In the network topology considered in this paper, we assume that all the nodes wake up together at a pre-scheduled sensing period, while they stay in a long sleep period in order to save energy.

In [27], the authors proposed a multi-channel TDMA scheduling method where each sensor node has a single radio and selects one channel from a set of RF channels. We also consider a similar TDMA scheduling method in this paper. The scheduling process is conducted after the routing process. In the routing process, each child node selects its parent considering the transmission distance. In the scheduling process, each node selects a time slot while satisfying the constraint that the time slot of a parent node $p_{i}$ is later (higher-numbered) than the time slots of all its child nodes $c_{i}$. This allows all nodes to aggregate and forward the sensing data from the leaf nodes towards the final destination, the sink node. For example, Figure 2a illustrates a network with the TDMA time slots and channels selected for each node by the above routing and scheduling process. This example is cited from [27]. In this network scenario, a sensor node first selects its parent node, and then schedules its transmission in a time slot that is earlier than the slot of the selected parent. Employing different RF channels allows concurrent time slots for large-scale networks. It also mitigates the interference between nodes that use identical slots within the zone of interference. In Figure 2, nodes n31 and n33 select the same time slot. Since they use different channels, their concurrent transmissions do not collide. Although each sensor node has a single radio, it can bridge child nodes that use various RF channels with the sink node. Initially, it tunes to the RF channel of the child node that is allotted the earliest slot. Then, it switches to other channels according to the sequence of the slots assigned to its child nodes. The time consumed for switching from one channel to another is negligible [27]. For instance, in Figure 2, node n11 receives data from its child nodes using channels 1 and 2. Then, it forwards the aggregated data to the sink node. The process of data forwarding in various time slots is depicted in Figure 2b.

The scheduling process also enforces the constraint that multiple nodes can share the same time slot only when their channels are different. Figure 2b shows the result of time-slot scheduling. It uses eight time slots and three channels to complete the aggregate-and-forward process from all nodes to the sink node n0. Shaded boxes denote the time slot allocated to each node, while the colors of the boxes indicate the different channels selected. The dotted arrows represent the forwarding path from a child to its parent node.
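The scheduling constraints described above can be checked mechanically. The following Python sketch is only an illustration under simplifying assumptions: the function name, the dictionary-based data layout, and the tiny example network are hypothetical, and the channel constraint is applied network-wide rather than only within an interference zone as the scheduler of [27] would do.

```python
from collections import defaultdict


def check_schedule(parent, slot, channel):
    """Return a list of constraint violations for a TDMA schedule.

    parent:  dict node -> parent node (the sink has no entry)
    slot:    dict node -> transmit time slot (integer)
    channel: dict node -> RF channel used when transmitting
    """
    violations = []
    # A child must transmit in an earlier slot than its parent.
    for child, par in parent.items():
        if par in slot and slot[child] >= slot[par]:
            violations.append(("order", child, par))
    # Children of one parent need distinct slots (single-radio receiver).
    by_parent = defaultdict(list)
    for child, par in parent.items():
        by_parent[par].append(slot[child])
    for par, slots in by_parent.items():
        if len(slots) != len(set(slots)):
            violations.append(("siblings", par))
    # Nodes sharing a slot must use different channels (simplified: global).
    users = defaultdict(list)
    for node, s in slot.items():
        users[(s, channel[node])].append(node)
    for (s, ch), nodes in users.items():
        if len(nodes) > 1:
            violations.append(("clash", s, ch, tuple(sorted(nodes))))
    return violations


# Hypothetical three-node example: n21 and n22 forward to n11, n11 to sink n0.
parent = {"n11": "n0", "n21": "n11", "n22": "n11"}
slot = {"n11": 3, "n21": 1, "n22": 2}
channel = {"n11": 1, "n21": 1, "n22": 2}
ok = check_schedule(parent, slot, channel)
bad = check_schedule(parent, dict(slot, n21=3), channel)
```

In this example `ok` is empty, while moving n21 into slot 3 violates both the ordering constraint and the slot-and-channel sharing constraint.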

The objective of the routing and scheduling algorithm is to minimize the energy consumption by minimizing the number of time slots without exceeding the specified number of channels. If the network forwards all data using fewer time slots, all its nodes can switch back to sleep mode earlier, leading to less active energy. The authors of [27] proposed a binary linear programming method to solve the time slot allocation problem and implemented a distributed heuristic algorithm in each node. Since the heuristic method in [27] is more practical and can be applied to large-scale networks, we have employed it as the routing and scheduling method.

The resulting network topology of the above routing [5] and scheduling process [27] restricts every node to have only a single egress edge while allowing multiple ingress edges. When the WSN wakes up, all nodes wake up at the same time and measure their sensing data at the same time. Then each node starting from the leaf nodes forwards its data to its parent node. Each parent node waits until all sensing data are received from its child nodes. Then, the parent node aggregates all the received sensing data into one data packet and forwards the aggregated data to its parent node in the next hop. Our system model uses the routing algorithm in [5] and the scheduling algorithm in [27] during the network initialization stage.

2.2. Energy Model of Convergent Network

This section presents an energy model of the proposed convergent networks. Figure 3 shows a subtree of six nodes, where parent node 3 has five child nodes. All five child nodes must be allocated different time slots since they cannot transmit to the same parent node in the same time slot. During the five time slots, node 3 receives data from all its child nodes and consumes reception energy for each ingress (child) node. The sum of reception energy $E_{A}^{ingress}$ of all ingress nodes for a parent node $p$ is expressed by Equation (1).

$$E_{A}^{ingress} = \sum_{i = 1}^{n} E_{rx}^{i} \tag{1}$$

Here, $n$ denotes the number of child nodes, which is five, as shown in Figure 3. We assume that the energy consumed by sensing and data processing is negligible [29] in order to focus on the problem of minimizing the data forwarding energy, which is a primary cause of energy consumption.

In Equation (1), the reception energy $E_{rx}^{i}$ for the data received from node $i$ at node $p$ is given by Equation (2) [30].

$$E_{rx}^{i} = l_{i} \cdot P_{elec} \tag{2}$$

Here, $P_{elec}$ is the power consumed by the transceiver and radio circuit, including the channel coding and modulation circuits, and $l_{i}$ indicates the data length in seconds.

The total energy consumed by each node during one active period of sensing and data forwarding is estimated by Equation (3):

$$E_{A}^{Total} = \sum P_{rx} T_{rx} + \sum P_{p} T_{p} + P_{tx} T_{tx} \tag{3}$$

Here, $E_{A}^{Total}$ is the total energy consumed by a node during its active period; $T_{rx}$, $T_{p}$, and $T_{tx}$ denote the time spent on receiving, processing and transmitting, respectively (the sum of these time segments equals the active period); and $P_{rx}$, $P_{p}$, and $P_{tx}$ indicate the power consumed by the receiving, processing and transmission operations, respectively. Equation (3) can also be expressed in terms of energy by Equation (4), where $E_{A}^{Total}$ is again the energy expense during the active period of the node.

$$E_{A}^{Total} = \sum_{i = 1}^{n} E_{rx}^{i} + \sum_{i = 1}^{n} E_{p}^{i} + E_{tx} \tag{4}$$

Here, $E_{tx}$, $E_{rx}$, and $E_{p}$ denote the transmission, receiving and processing energy, respectively, while $n$ indicates the number of child nodes of the current node.
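As a worked illustration of Equations (1), (2) and (4), the following Python sketch computes the active-period energy of one parent node. The function names and all numeric values (receive power, packet durations, transmission energy) are assumptions made for this example and are not taken from the paper; the processing energy per child defaults to zero, in line with the assumption that it is negligible.

```python
def reception_energy(l_i, p_elec):
    """Equation (2): energy to receive a packet of duration l_i (s) at power p_elec (W)."""
    return l_i * p_elec


def ingress_energy(child_lengths, p_elec):
    """Equation (1): sum of the reception energy over all child nodes."""
    return sum(reception_energy(l_i, p_elec) for l_i in child_lengths)


def active_energy(child_lengths, p_elec, e_tx, e_p_per_child=0.0):
    """Equation (4): receive + processing + transmit energy in one active period."""
    n = len(child_lengths)
    return ingress_energy(child_lengths, p_elec) + n * e_p_per_child + e_tx


# Assumed values: five children, 2 ms packets, 50 mW receive power, 0.2 mJ per transmission.
child_lengths = [0.002] * 5
e_ingress = ingress_energy(child_lengths, p_elec=0.05)
e_total = active_energy(child_lengths, p_elec=0.05, e_tx=0.0002)
```

With these assumed values the parent spends more energy on receiving from its five children than on its single transmission, which is why the number of child nodes matters for the backup parent choice.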

In multi-hop sensor networks, the transmission power is often constrained. If the transmission power is increased beyond that constraint, it may cause interference with other nodes. The transmission distance and packet length are the main arguments of the transmission energy function, and the energy consumption grows with both parameters. Suppose that node-A’s transmission power is initially set to approximately 5 dBm and is intended to cover a 150 m range with a 95% packet delivery ratio (PDR). Node-A may create a 200 m circular interference zone for other neighbor nodes that use the same slot. Hence, these nodes should not transmit when node-A sends its data to its parent. However, nodes at a distance of 250 m are allowed to transmit data using the same slot since their transmission is not interfered with by node-A. If node-A increases its transmission power to 10 dBm, its interference zone enlarges, and it interferes with the nodes within a 250 m range. In addition, node-A now consumes twice as much energy for each transmission, and its battery drains rapidly. Thus, in strictly scheduled TDMA MAC protocol-based WSNs, we cannot merely increase the transmission power due to the above constraints; we can, however, reduce the transmission power when the failure recovery process replaces a failed node with a backup node in a way that reduces the transmission distance [31]. Thus, we assume that every node is assigned a constrained transmission power, which leads to the same maximum transmission energy consumption $E_{tx}$ for every node in its allocated time slot. Although the receiving and processing energy may increase during the fault recovery (backup node selection) procedure, we assume the processing power is negligible compared with the receive and transmit power [32]. On the other hand, an increase in the selected backup parent node’s receiving energy is inevitable, since its number of child nodes grows due to the failure recovery process. When a sensor node that is selected as a backup node ends up with a larger number of child nodes, it consumes more energy to receive the additional sensed data from the additional child nodes.

For the calculation of the network lifetime, we calculate each sensor node’s battery lifetime $L$. The lifetime $L$ of a node is defined as the time from its power-up until its battery outage, which is expressed by Equation (5).

$$L = \frac{E_{initial}}{E_{consumed}} \times t_{cycle} \tag{5}$$

Here, $E_{initial}$ is the initial energy of node $n_{i}$ at its power-up time; $t_{cycle}$ is one full cycle time including data sensing, receiving all child data, and transmitting the aggregated data to $n_{i}$’s parent node; and $E_{consumed}$ denotes the energy consumed by node $n_{i}$ during $t_{cycle}$. $E_{consumed}$ is expressed by Equation (6), ignoring the energy consumption for sensing and data processing.

$$E_{consumed} = E_{tx}(l, d) + n E_{rx}(l) \tag{6}$$

In Equation (6), $n$ denotes the number of child nodes, and the transmission energy $E_{tx}(l, d)$ for data length $l$ and distance $d$ is defined as follows. Using the Friis path loss model, we assume that the transmission power of node $n_{i}$ is selected in a way that the path loss is compensated [33]. According to the radio model used in [34], the transmission energy depends on the distance and packet length, as expressed in Equation (7).

$$E_{tx}(l, d) = \begin{cases} l \cdot E_{elec} + l \cdot E_{Fs} d^{2}, & \text{if } d < d_{0} \\ l \cdot E_{elec} + l \cdot E_{amp} d^{4}, & \text{if } d \geq d_{0} \end{cases} \tag{7}$$

Here, $E_{Fs}$ and $E_{amp}$ are the amplifier energy consumption for distances in free space ($d^{2}$ power loss) and in a channel with multi-path fading ($d^{4}$ power loss), respectively. As noted above, the variable $l$ in Equation (7) is the data length (equivalently, the time required to send one sensed data packet). Since we are concerned with finding backup nodes at a short distance, the free space fading model ($E_{Fs} \times d^{2}$) is more appropriate [35], and thus it is used in the remainder of the paper.
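The piecewise radio model of Equation (7) can be sketched in Python as follows. The constants below are illustrative values commonly used with this kind of first-order radio model and are assumptions, not parameters reported in this paper; here $l$ is taken in bits and the energy constants per bit. The crossover distance is likewise assumed to be the point where the two branches coincide, $d_{0} = \sqrt{E_{Fs} / E_{amp}}$.

```python
import math

E_ELEC = 50e-9       # J/bit, electronics energy (assumed value)
E_FS = 10e-12        # J/bit/m^2, free-space amplifier energy (assumed value)
E_AMP = 0.0013e-12   # J/bit/m^4, multi-path amplifier energy (assumed value)
D0 = math.sqrt(E_FS / E_AMP)  # crossover distance where both branches are equal


def tx_energy(l, d):
    """Equation (7): energy to transmit l bits over a distance of d metres."""
    if d < D0:
        return l * E_ELEC + l * E_FS * d ** 2   # free-space branch
    return l * E_ELEC + l * E_AMP * d ** 4      # multi-path branch


short_hop = tx_energy(1000, 50.0)    # below D0, uses the d^2 branch
long_hop = tx_energy(1000, 150.0)    # above D0, uses the d^4 branch
```

For the short distances relevant to backup parent selection, only the first branch is exercised, which matches the choice of the free-space model in the text.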

Using Equations (6) and (7), Equation (5) can be expressed by Equation (8).

$$L = \frac{E_{initial}}{E_{tx} + n E_{rx}} \times t_{cycle} = \frac{E_{initial} \, t_{cycle}}{l E_{elec} + l E_{Fs} d^{2} + n l E_{elec}} = \frac{E_{initial} \, t_{cycle}}{l E_{Fs} d^{2} + (n + 1) l E_{elec}} \tag{8}$$

Equation (8) indicates that the distance and the number of child nodes are the main components of energy consumption.

Using Equation (8), the proposed algorithm selects a set of optimal backup nodes for all parent nodes in the network as described in the next section.
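To make the dependence in Equation (8) concrete, the short Python sketch below evaluates the node lifetime for different distances and numbers of child nodes. The function name and every numeric value are hypothetical choices for illustration only.

```python
def node_lifetime(e_initial, t_cycle, l, d, n, e_elec, e_fs):
    """Equation (8): lifetime of a node with n children sending l bits over d metres."""
    e_consumed = l * e_fs * d ** 2 + (n + 1) * l * e_elec
    return e_initial * t_cycle / e_consumed


# Assumed parameters: 1 J battery, 60 s cycle, 1000-bit packets.
params = dict(e_initial=1.0, t_cycle=60.0, l=1000, e_elec=50e-9, e_fs=10e-12)
base = node_lifetime(d=50.0, n=2, **params)
more_children = node_lifetime(d=50.0, n=5, **params)
longer_hop = node_lifetime(d=80.0, n=2, **params)
```

Both a larger number of child nodes and a longer hop shorten the lifetime, so a backup parent that is close to the grandparent and does not take on excessive children is preferable from an energy point of view.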

3. Proposed Route-Recovery Algorithms

3.1. Maximum Connectivity Local Rerouting (MCR)

A single sensor node failure can cause branch isolation and thus leave many nodes with broken routes to the sink node. To recover the connectivity, we propose a route-recovery method called maximum connectivity local rerouting (MCR). It quickly replaces the faulty parent node with a backup parent among the child nodes, and thus restricts the rerouting process to only the local nodes within the one-hop subtree of the faulty node. For example, consider the network in Figure 4a, where a parent node $p$ is faulty. Here, MCR selects $c_{k}$ among $p$’s child nodes as the backup parent $MCR\_BP(p)$.

For fast recovery in the event of a fault, MCR operates in two stages. The first stage is a preprocessing step for backup parent selection, which is conducted as a part of the proactive routing algorithm. The second stage is a recovery step, during which a real-time rerouting operation is conducted only when a fault occurs. The recovery step instantly replaces the faulty node with the pre-selected backup parent node. Hence, it does not interrupt the forwarding operations of the other nodes.

The first stage of MCR, the preprocessing algorithm, is executed during the network initialization period. It examines every child node $c_{i}$ of each parent node $p$ for two connectivity conditions: (1) how many sibling nodes of $c_{i}$ are covered by $c_{i}$’s wireless range; and (2) whether $c_{i}$ can reach its grandparent $g$ (the parent of $p$). The method selects the child node that best satisfies these two conditions as the backup parent for its primary parent. If $p$ fails in the future, the selected $c_{i}$ instantly takes over $p$’s role. In other words, $c_{i}$ receives the data from $p$’s other child nodes and forwards the aggregated data to the grandparent $g$. The key advantage of this recovery process is that the selected $MCR\_BP$ $c_{i}$ inherits the time slot of $p$, which eliminates the need for time slot rescheduling of many nodes around $p$. For example, consider the subtree of Figure 3. Suppose that the MCR algorithm’s preprocessing step selected node 3 as the backup $MCR\_BP$. If the parent node 6 fails, node 3 takes over the parent role of node 6 and its time slot, Slot6. Once node 3 receives and aggregates the other siblings’ data, it transmits the aggregated data to the grandparent node using Slot6. The proposed algorithm, therefore, recovers from communication failures without disturbing the surrounding nodes except the siblings of $c_{i}$. In contrast, most of the previous fail recovery algorithms either re-allocate the parent or reroute the orphan child nodes (the nodes that lost their primary parent) to other parents in the neighborhood, thus disrupting many neighbor nodes.

The MCR algorithm’s preprocessing procedure for the backup parent selection is illustrated in Figure 4b. It searches for the best backup parent candidate for every parent node $p_{j}$ amongst $p_{j}$’s child nodes to prepare for the event that $p_{j}$ fails.

Procedure Select-MCR_BP

Assume that $G(V, E)$ represents our network [36], where $V$ is the set of vertices (nodes) and $E$ is the set of edges (links).

For each parent node $p_{j} \in V$ of $G(V, E)$, repeat the following steps:

For each child node $c_{i}$ of the parent node $p_{j}$:
(1) Measure $Connectivity(c_{i})$ = the number of other child nodes $c_{k}$ of $p_{j}$ that are within the wireless range $W_{i}$ of $c_{i}$.
(2) Measure $Distance(c_{i}, g)$ = the distance from $c_{i}$ to the grandparent $g$ of $c_{i}$ (the parent of $p_{j}$).
Select $c_{m}$ as the MCR backup parent $MCR\_BP(p_{j})$ such that $Connectivity(c_{m})$ is maximum and $Distance(c_{m}, g) < W_{m}$, where $W_{m}$ is the wireless range of $c_{m}$.
If $g$ is unreachable for all child nodes $c_{i}$, select $c_{m}$ as $MCR\_BP(p_{j})$ and apply the NE-MCR concept to find NE-$MCR\_BP(p_{j})$ (details are given in the following subsection).
Inform all $c_{k}$’s and $g$ that $c_{m}$ is chosen as $MCR\_BP(p_{j})$.
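The selection steps of the procedure above can be sketched in Python for a single parent node. This is one possible reading of the procedure, not the authors' implementation: the function name, the coordinate-based distance, the tie-break in favor of the child closest to $g$, and the returned flag are all assumptions. When no child can reach $g$, the sketch returns the most connected child with the flag set to `False`, leaving the NE-MCR step to the caller.

```python
import math


def select_mcr_bp(children, grandparent, positions, wireless_range):
    """Pick a backup parent among the children of one parent node p_j.

    children:       list of child node IDs of p_j
    grandparent:    node ID of g (the parent of p_j)
    positions:      dict node ID -> (x, y) coordinates in metres
    wireless_range: dict node ID -> wireless range W_i in metres
    Returns (chosen child or None, True if the chosen child can reach g).
    """
    def dist(a, b):
        return math.dist(positions[a], positions[b])

    def connectivity(c):
        # Step (1): number of siblings inside c's wireless range.
        return sum(1 for k in children
                   if k != c and dist(c, k) <= wireless_range[c])

    if not children:
        return None, False
    # Step (2): keep the children whose distance to g is below their range.
    reachable = [c for c in children
                 if dist(c, grandparent) < wireless_range[c]]
    candidates = reachable or list(children)
    # Maximum connectivity first; assumed tie-break: shortest distance to g.
    best = max(candidates,
               key=lambda c: (connectivity(c), -dist(c, grandparent)))
    return best, bool(reachable)


# Hypothetical layout: three siblings in a row, 20 m above the grandparent g.
positions = {"g": (0, 0), "c1": (-5, 20), "c2": (0, 20), "c3": (5, 20)}
children = ["c1", "c2", "c3"]
wide = select_mcr_bp(children, "g", positions, {c: 25.0 for c in children})
narrow = select_mcr_bp(children, "g", positions, {c: 7.0 for c in children})
```

In the `wide` case every child reaches $g$ and covers both siblings, so the tie-break picks the child nearest to $g$; in the `narrow` case no child reaches $g$, which is the situation the NE-MCR extension is meant to handle.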

When $\mathrm{MCR\_BP}(p_{j})$ has been selected for each $p_{j}$, $p_{j}$ informs its child nodes $c_{k}$ and grandparent node $g$ by broadcasting a message $M(\mathrm{MCR\_BP}(p_{j}), \mathit{slot}(p_{j}))$. Nodes $c_{k}$ and $g$ then record the node ID of $\mathrm{MCR\_BP}(p_{j})$ and the time slot $\mathit{slot}(p_{j})$. This completes the $\mathrm{MCR\_BP}$ selection procedure. Then, during the main data-forwarding operation, if a fault occurs in node $p_{j}$, the real-time recovery process is conducted as follows: all child nodes $c_{k}$ of $p_{j}$ forward their data to the pre-selected $\mathrm{MCR\_BP}(p_{j})$ instead of the failed parent $p_{j}$. Then, $\mathrm{MCR\_BP}(p_{j})$ forwards its data to the grandparent $g$, bypassing the failed node $p_{j}$. The fail recovery process takes place only when a failure occurs in a parent node whose backup node was pre-selected.

We have implemented an MCR simulator in a C program and measured the performance of MCR using an example network of 1000 nodes. We evaluated the number of isolated nodes (nodes that have lost their route) while injecting faults into an increasing number of nodes. Figure 5 compares the number of isolated nodes for the two fail recovery methods. With the MCR fail recovery method, the number of isolated nodes tends to grow linearly, whereas with a naïve route recovery method (based on random selection), it grows exponentially. The significant reduction in the number of isolated nodes is attributed to the fact that MCR can efficiently select optimal backup nodes, whereas the random selection method could not find proper backup parents in many cases. The proposed recovery algorithm selects the node that has maximal connectivity with its siblings. In a sparse network, however, the elected $\mathrm{MCR\_BP}(p_{j})$ may not cover all siblings. Therefore, in Figure 5, our method experiences additional isolated nodes as the number of induced faulty nodes grows.
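The real-time recovery step described above (siblings redirect to the backup parent, which bypasses the failed parent) can be sketched as a next-hop lookup. This is only an illustrative sketch: the function `next_hop`, the dictionary layout, and the example node IDs other than nodes 3 and 6 of Figure 3 are assumptions, and the case where the grandparent has also failed is simply reported as isolated.

```python
def next_hop(node, parent_of, backup_of, failed):
    """Return the node that `node` should transmit to, or None if isolated.

    parent_of: dict child -> primary parent
    backup_of: dict parent -> pre-selected backup parent (one of its children)
    failed:    set of failed node ids
    """
    parent = parent_of.get(node)
    if parent is None or parent not in failed:
        return parent                      # normal operation (sink has no parent)
    backup = backup_of.get(parent)
    if backup is None or backup in failed:
        return None                        # no usable backup: node is isolated
    if node == backup:
        # The backup parent bypasses the failed parent and reuses its slot
        # to reach the grandparent directly.
        grandparent = parent_of.get(parent)
        return None if grandparent in failed else grandparent
    return backup                          # siblings forward to the backup parent


# Hypothetical subtree: parent 6 with children 3, 4, 5 and grandparent 9.
parent_of = {3: 6, 4: 6, 5: 6, 6: 9}
backup_of = {6: 3}
print(next_hop(4, parent_of, backup_of, failed={6}))
print(next_hop(3, parent_of, backup_of, failed={6}))
```

Because the lookup only consults tables that were filled during preprocessing, no rescheduling is needed at failure time, which is the property the text emphasizes.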

In the event of a parent node’s failure, the first action of its pre-selected backup parent (BP) is to take over the faulty parent’s time slot. The BP then notifies all its sibling nodes that it is acting as the backup parent. The selection of BPs, in contrast, is a pre-processing step carried out only once, when the network is initiated and its initial routing is conducted. A detailed pseudo code for the backup parent selection algorithm is given in [37].

The MCR is a fast algorithm with low complexity and no hardware overhead. It may, however, fail to find a BP node when none of the child nodes can reach the grandparent node. We define this problem as an out-of-reach problem. Increasing the wireless range of the child nodes may seem to be a quick solution to the out-of-reach problem. This, however, requires increased transmission power, which shortens the battery life of the network. It also increases the interference with neighbor nodes. It is well known from the Friis path loss model of Equation (7) that the transmission power grows rapidly with increasing distance: the required $E_{tx}$ grows excessively even for a small increase in distance. In this paper, therefore, we only consider a constrained distance, and thus a limited transmission power, for all nodes to preserve the battery lifetime.
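To make the distance dependence concrete, the following sketch evaluates a transmission energy of the form $l E_{elec} + l E_{FS} d^{2}$, which is the form that Equation (12) later expands into. The numeric constants are placeholder values assumed for illustration only; they are not taken from Table 1.

```python
E_ELEC = 50e-9   # J/bit, assumed placeholder (not taken from Table 1)
E_FS = 10e-12    # J/bit/m^2, assumed placeholder (not taken from Table 1)

def e_tx(length_bits, distance_m):
    """Transmit energy: circuit term plus free-space amplifier term."""
    return length_bits * E_ELEC + length_bits * E_FS * distance_m ** 2

# The amplifier term quadruples each time the distance doubles.
for d in (50, 100, 200):
    print(d, e_tx(1000, d))
```

Under these assumed constants, the distance-dependent term already exceeds the circuit term at 100 m, which is why extending the wireless range is an expensive fix for the out-of-reach problem.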

3.2. Neighbor-Extended Maximal Connectivity Routing

To address the out-of-reach problem described above, we propose an enhanced recovery algorithm called Neighbor-Extended MCR (NE-MCR), which is conducted after MCR. In addition to solving the out-of-reach problem of MCR, NE-MCR further reduces the energy consumption. It conducts an extended recovery process only for the nodes that failed to find their backup parent (BP) during the MCR process due to the out-of-reach problem. NE-MCR searches for a neighbor backup parent (NBP) in the neighborhood of the sibling nodes. Suppose that an $\mathrm{MCR\_BP}$ has been chosen by MCR but has an out-of-reach problem. This $\mathrm{MCR\_BP}$ then broadcasts a special out-of-reach message to all its sibling nodes to trigger the NE-MCR algorithm. The NE-MCR algorithm then executes an individual search for the local optimum NBP in each sibling node of the $\mathrm{MCR\_BP}$. Hence, the $\mathrm{MCR\_BP}$ node is the initiator of NE-MCR’s NBP selection procedure. In the end, the $\mathrm{MCR\_BP}$ receives the search results from the sibling nodes, compares them, and determines which NBP to choose as the optimum. The detailed procedure of the NE-MCR algorithm is presented below.

Procedure Select-NE-MCR_BP

For each parent node $p_{j} \in V$ of $G(V, E)$, if $\mathrm{MCR\_BP}$ fails to reach the grandparent $g$, repeat the following steps:

For each child node $c_{i}$ of a parent node $p_{j}$:
- Select $c_{k}$ as $\mathrm{MCR\_BP}(p_{j})$ if it has maximal $\mathit{Connectivity}(c_{k})$ with the other siblings;
- $c_{k}$ informs its sibling nodes $c_{j}$ about the out-of-reach problem;
- Every $c_{j}$ examines each neighbor parent $p_{ji}$ within its range ($\mathit{Distance}(c_{j}, p_{ji}) < W_{j}$) for the following conditions:
  (a)
  Verify if $\mathit{Distance}(c_{j}, p_{ji})$ is minimal;
  (b)
  Verify if the aggregated packet length $\mathit{Length}_{\sum}(l_{ji})$ of $p_{ji}$ is minimal;
  (c)
  Verify if $p_{ji}$ has an extra time slot $t_{ji}^{extra}$ to receive $c_{j}$’s data;
- If $p_{ji}$ best satisfies all three conditions, $c_{j}$ sends $\mathit{Optimal}_{local}(p_{ji})$ to $c_{k}$;
- For the set of $\mathit{Optimal}_{local}(p_{ji})$ received from the nodes $c_{j}$, $c_{k}$ does the following:
- Determine $\mathit{Optimal}_{global}(p_{ji})$ using conditions (a), (b) and (c);
- If $\mathit{Optimal}_{global}(p_{ji})$ was found by $c_{j}$, assign $\mathit{slot}(p_{j})$ to $c_{j}$;
- Else, keep $\mathit{slot}(p_{j})$ assigned to $\mathrm{MCR\_BP}(p_{j})$;
- Send registration request message $M(c_{j}, \mathit{slot}(p_{j}))$ to $\mathit{Optimal}_{global}(p_{ji})$.
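The two-stage search in this procedure (a local optimum per sibling, then a global optimum chosen by the MCR_BP) can be sketched as follows. The `Candidate` record and both function names are hypothetical, and the way conditions (a) and (b) are combined is an assumption: this sketch compares distance first and uses the aggregated packet length only to break ties, whereas the procedure itself does not spell out the weighting.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    nbp: int              # id of the candidate neighbor backup parent p_ji
    found_by: int         # id of the sibling c_j that found it
    distance: float       # Distance(c_j, p_ji)
    agg_length: int       # aggregated packet length of p_ji
    has_extra_slot: bool  # condition (c)

def local_optimum(candidates, wireless_range):
    """Best candidate seen by one sibling, or None if none is eligible."""
    eligible = [c for c in candidates
                if c.has_extra_slot and c.distance < wireless_range]
    # Assumption: distance is compared first, packet length breaks ties.
    return min(eligible, key=lambda c: (c.distance, c.agg_length), default=None)

def global_optimum(local_results):
    """The MCR_BP compares the local optima reported by its siblings."""
    reported = [c for c in local_results if c is not None]
    return min(reported, key=lambda c: (c.distance, c.agg_length), default=None)
```

The `found_by` field of the winning candidate identifies which sibling would be assigned $\mathit{slot}(p_{j})$ in the final steps of the procedure.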

The operations of Procedure Select-NE-MCR_BP are described below for two cases:

Case 1: parent node $p_{j}$ has only one child $c_{1}$.
Case 2: $p_{j}$ has more than one child node $c_{i}$.

First, consider Case 1. Let $p_{j}$ be the target parent node and let $c_{1}$ be the only child node of $p_{j}$. Also let $g$ be the grandparent node of $c_{1}$. Suppose that $c_{1}$ cannot reach its grandparent $g$. In this case, NE-MCR carries out a single local search from $c_{1}$ and selects the optimum NBP within the wireless range of $c_{1}$. Then $c_{1}$ sends a registration request message $M(c_{1}, \mathit{slot}(p_{j}))$ to the NBP. Here, $\mathit{slot}(p_{j})$ indicates the time slot allocated to $p_{j}$ for its transmission in the TDMA protocol. The NBP then registers $c_{1}$ with $\mathit{slot}(p_{j})$, so that if $p_{j}$ ever fails, the NBP expects to receive data from $c_{1}$, not from $p_{j}$, during $\mathit{slot}(p_{j})$. As in the MCR algorithm, for any $p_{j}$ that fails, the NE-MCR algorithm recycles $p_{j}$’s time slot for its child node $c_{1}$. Therefore, the fail recovery process requires no updates to the time slot schedule, leading to a low-complexity, low-power process and reducing the overhead of the whole fail recovery procedure. Otherwise, the child node would have to request a new TDMA slot from the NBP, which would trigger the time-consuming process of rebuilding the time slot table for the entire network.

Now consider Case 2, where parent node $p_{j}$ has more than one child node $c_{i}$. If the MCR algorithm finds no $\mathrm{MCR\_BP}$ that can reach the grandparent $g$, it selects the $\mathrm{MCR\_BP}$ with the maximum $\mathit{Connectivity}(c_{i})$. In this case, the NE-MCR algorithm searches for neighbor backup parents (NBPs) in the neighbor subtrees.

For each child node $c_{j}$ (including $\mathrm{MCR\_BP}$), NE-MCR conducts a search within the wireless range of $c_{j}$ and selects an NBP $p_{ji}$ that is a local optimum for $c_{j}$, if such an NBP exists. The objective function that evaluates the optimality of an NBP is described in Section 4. Once $c_{j}$ determines its $\mathit{Optimal}_{local}(p_{ji})$, it forwards a message $M(c_{j}, p_{ji}, d_{j})$ to $\mathrm{MCR\_BP}$.

The $\mathrm{MCR\_BP}$ collects the $\mathit{Optimal}_{local}(p_{ji})$ messages of all child nodes and determines the globally optimum NBP, $\mathit{Optimal}_{global}(p_{ji})$. If the $\mathit{Optimal}_{global}(p_{ji})$ was originally found by $c_{j}$, the algorithm assigns $\mathit{slot}(p_{j})$ to $c_{j}$ and requests $c_{j}$ to send an $M_{ChildReq}$ message to $p_{ji}$. As in Case 1, if $p_{j}$ ever fails during the forwarding operations, $c_{j}$ uses $\mathit{slot}(p_{j})$ to transmit data to $\mathit{Optimal}_{global}(p_{ji})$. In doing so, $c_{j}$ changes its role from a child to $\mathrm{MCR\_BP}$ and forwards the data from the isolated subtree under $p_{j}$ to $\mathit{Optimal}_{global}(p_{ji})$. If the $\mathit{Optimal}_{global}(p_{ji})$ was originally found by $\mathrm{MCR\_BP}$, the $\mathrm{MCR\_BP}$ keeps using $\mathit{slot}(p_{j})$ and sends a registration request message to the $\mathit{Optimal}_{global}(p_{ji})$. Figure 6 illustrates a flow diagram of the overall NE-MCR algorithm, where the procedure Select-NE-MCR_BP is highlighted with a blue dotted line.

The NE-MCR algorithm conducts the search for $\mathit{Optimal}_{global}(p_{ji})$ only within the wireless range of the nodes $c_{j}$ that are either the $\mathrm{MCR\_BP}$ or one of its siblings. As NBP candidates, it considers only neighbor nodes that do not share the same parent with the currently isolated nodes. For example, Figure 7 shows a subtree in which the target parent node is marked by a red circle and its child nodes are marked in purple. Among the child nodes, one node is selected as $\mathrm{MCR\_BP}$. For the target parent node, the MCR algorithm is conducted only by the purple nodes ($\mathrm{MCR\_BP}$ and its siblings). The green nodes do not participate in the search operation, since they share the same parent (or grandparent) with the purple nodes. Only the blue nodes within the wireless range circles are eligible as NBP candidates.

To measure the optimality of candidate NBPs, we utilize the aggregated packet length of each NBP and the distance between $c_{j}$ and each NBP. In the current work, we assume that the selected NBP has an extra time slot available to receive additional data. When NE-MCR selects an $\mathit{Optimal}_{global}(p_{ji})$ that is closer to the child nodes $c_{j}$, it reduces the transmission power of these nodes and therefore allows them to conserve more energy. Several studies [24,35,38] have emphasized the critical effect of transmission distance on energy consumption. To the best of our knowledge, however, no prior fault recovery method has been reported that, like ours, minimizes the distance from the backup parent to the child nodes in the isolated subtree. In the following section, we discuss how we estimate the energy consumption during the search procedures for backup parents.

4. Constraints and Objectives of NE-MCR Algorithm

In this section, we describe how the proposed network topology aggregates data and how the size of the aggregated data grows. Figure 8 illustrates an example subtree of a network that shows a data aggregation flow. In every active period, a node $n_{i}$ wakes up and obtains its sensing data $D_{i}$ of size $S$ from its sensor. If $n_{i}$ is a leaf node, it forwards $D_{i}$ to its parent node. If $n_{i}$ is a parent node, it receives a set of sensing data $D_{k}$ from all its child nodes $n_{k}$. The parent node $n_{i}$ then aggregates this set of sensing data with its own sensing data. Finally, $n_{i}$ forwards the aggregated data to its parent node. For example, in the subtree, node n38 is chosen as a parent by two leaf nodes, n41 and n42. We assume that the sensing data generated by every node is of the same size $S$. Since n41 and n42 each send a datum of size $S$, n38 concatenates the two data with its own into an aggregated packet of size 3S and forwards it to n25, the grandparent of the two leaf nodes. Figure 8 shows the size of the data aggregated by every node in this way.

As described in Section 2, every node in our network topology is allocated a TDMA time slot. Let $t_{s}$ be the fixed length of each time slot. This fixed slot length constrains the length of the aggregated data in each node. For example, nodes n11 and n12 each aggregate sensing data into a length of 8S, which is the maximum data length in this subtree.

The length of each aggregated datum must be shorter than the time slot constraint $t_{s}$. The routing algorithm selects route paths that meet this constraint. The proposed NE-MCR algorithm also ensures that this constraint is satisfied when it searches for the optimum NBP.
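The aggregation rule and the slot-length check described above can be illustrated with a short sketch. It assumes pure concatenation (each node adds one unit $S$ to whatever its children deliver) and expresses the slot capacity in the same units; the helper names and the three-level tree fragment around n38 are hypothetical, and the recursion is only suitable for small example trees.

```python
def aggregated_sizes(children_of, root):
    """Return {node: aggregated length in units of S} for the subtree at root."""
    sizes = {}

    def visit(node):
        # Own datum (1 unit of S) plus everything received from the children.
        sizes[node] = 1 + sum(visit(child) for child in children_of.get(node, ()))
        return sizes[node]

    visit(root)
    return sizes

def slot_violations(sizes, slot_capacity):
    """Nodes whose aggregated length is not strictly below the slot capacity."""
    return sorted(n for n, s in sizes.items() if s >= slot_capacity)


tree = {25: [38], 38: [41, 42]}   # hypothetical fragment around n38
sizes = aggregated_sizes(tree, 25)
print(sizes[38])                  # n38 carries its own datum plus two leaves
print(slot_violations(sizes, 9))
```

A recovery route that attaches an orphaned subtree to a new parent changes these sizes along the whole path to the sink, which is why the constraint has to be rechecked when an NBP is selected.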

The constraint on the aggregated data length for NE-MCR is given by Equation (9).

$$l_{\mathrm{NBP}_{i}} + l_{\mathrm{MCR\_BP}} < t_{s} \tag{9}$$

Here, $l_{\mathrm{NBP}_{i}}$ is the length of $\mathrm{NBP}_{i}$’s packet, aggregated with the data received from all its child nodes, and $l_{\mathrm{MCR\_BP}}$ is the length of the aggregated packet that $\mathrm{MCR\_BP}$ forwards to $\mathrm{NBP}_{i}$.

The energy model of a tree-structured WSN comprises the sum of the transmission energy consumed by every child node $n_{i}$ and the sum of the reception energy consumed by every parent node while receiving data from its child nodes $n_{i}$. The energy model of normal operation with no failed nodes can be formulated as follows:

$$E_{total} = \sum_{n_{i} \in N} E_{tx}\left(l_{n_{i}}, d(n_{i}, p_{n_{i}})\right) + \sum_{n_{i} \in N} E_{rx}\left(l_{n_{i}}\right) \tag{10}$$

Here, $N$ is the set of all nodes in the network and $n_{i}$ ranges over the nodes in $N$. $p_{n_{i}}$ is the parent node of $n_{i}$, $l_{n_{i}}$ denotes the length of the data aggregated by node $n_{i}$, and $d(n_{i}, p_{n_{i}})$ is the distance between $n_{i}$ and its parent $p_{n_{i}}$. $E_{tx}(l, d)$ is the energy consumed by $n_{i}$ to transmit a datum of length $l$ over the distance $d$, while $E_{rx}(l)$ is the energy consumed by a parent to receive a datum of length $l$ from $n_{i}$.
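A small sketch of how Equation (10) could be evaluated over a tree is given below. The transmit model follows the form that Equation (12) expands into; the receive model $E_{rx}(l) = l E_{elec}$ and both constants are assumptions made for the example (they are not the Table 1 values), and the function and variable names are hypothetical.

```python
E_ELEC = 50e-9   # J/bit, assumed placeholder
E_FS = 10e-12    # J/bit/m^2, assumed placeholder

def e_tx(length, distance):
    return length * E_ELEC + length * E_FS * distance ** 2

def e_rx(length):
    # Assumption: receiving costs only the circuit energy per bit.
    return length * E_ELEC

def total_energy(parent_of, length, distance):
    """Equation (10): transmit plus receive energy summed over all senders."""
    total = 0.0
    for node, parent in parent_of.items():
        total += e_tx(length[node], distance[(node, parent)])  # node transmits
        total += e_rx(length[node])                            # parent receives
    return total


# Two sensor nodes attached directly to the sink (node 0).
parent_of = {1: 0, 2: 0}
length = {1: 100, 2: 200}                 # aggregated lengths in bits
distance = {(1, 0): 10.0, (2, 0): 20.0}   # link distances in metres
print(total_energy(parent_of, length, distance))
```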

In this work, for the sake of simplicity, we assume that the condition in Equation (9) is also satisfied by the parents $g_{i}^{\mathrm{NBP}_{j}}$ of the selected $\mathrm{NBP}_{j}$ and by their successive parents, so that they can receive the additional data from the child nodes.

We now describe the energy model of a network with a fail recovery process for the case where node $n_{i}$ has failed. Assume that the MCR algorithm selects $\mathrm{MCR\_BP}_{i}$ as the MCR backup parent node for $n_{i}$. Assuming that $\mathrm{MCR\_BP}_{i}$ cannot reach the parent of $n_{i}$, suppose further that the NE-MCR algorithm selects $\mathrm{NBP}_{i}$ as a neighbor backup parent. When a failure occurs at $n_{i}$, the new data-forwarding path recovered through the preselected $\mathrm{MCR\_BP}_{i}$ and $\mathrm{NBP}_{i}$ incurs a variable transmission energy $E_{TX_{i}}^{new}$, which can be expressed by Equation (11).

$$E_{TX_{i}}^{new} = E_{tx}^{BP_{i} \to NBP_{i}} + \sum_{1 \leq m \leq M} E_{tx}^{C_{m} \to BP_{i}} \tag{11}$$

Here, $E_{tx}^{BP_{i} \to NBP_{i}}$ is the transmit energy of the recovery route from $\mathrm{MCR\_BP}_{i}$ to $\mathrm{NBP}_{i}$. The second term, $\sum_{1 \leq m \leq M} E_{tx}^{C_{m} \to BP_{i}}$, accounts for the total transmit energy of all other child nodes $C_{m}$ that forward their data to $\mathrm{MCR\_BP}_{i}$. Using Equation (7), we can rewrite Equation (11) as Equation (12).

$$
\begin{aligned}
E_{TX_{i}}^{new} ={} & l_{\mathrm{MCR\_BP}_{i}} E_{elec} + \left( l_{C_{1}} E_{elec} + l_{C_{2}} E_{elec} + l_{C_{3}} E_{elec} + \dots + l_{C_{M}} E_{elec} \right) \\
& + l_{\mathrm{MCR\_BP}_{i}} E_{FS}\, d_{\mathrm{MCR\_BP}_{i}, \mathrm{NBP}_{i}}^{2} + \left( l_{C_{1}} E_{FS} d_{C_{1}}^{2} + l_{C_{2}} E_{FS} d_{C_{2}}^{2} + l_{C_{3}} E_{FS} d_{C_{3}}^{2} + \dots + l_{C_{M}} E_{FS} d_{C_{M}}^{2} \right)
\end{aligned}
\tag{12}
$$

In Equation (12), $E_{elec}$ is the unit energy per data bit consumed by the transceiver circuit. This paper assumes that $E_{elec}$ is constant for all nodes. $l_{\mathrm{MCR\_BP}_{i}}$ denotes the length of the packet that $\mathrm{MCR\_BP}_{i}$ forwards to $\mathrm{NBP}_{i}$, whereas $l_{C_{m}}$ indicates the length of the packet that each other child node $C_{m}$ forwards to $\mathrm{MCR\_BP}_{i}$. $d_{\mathrm{MCR\_BP}_{i}, \mathrm{NBP}_{i}}$ denotes the transmission distance from $\mathrm{MCR\_BP}_{i}$ to $\mathrm{NBP}_{i}$, while $d_{C_{m}}$ indicates the distance from $C_{m}$ to $\mathrm{MCR\_BP}_{i}$. Since the first two terms (those multiplied by $E_{elec}$) are constant, they can be substituted by a single constant $C_{elec}$, so Equation (12) simplifies to Equation (13).

$$E_{TX_{i}}^{new} = C_{elec} + E_{FS} \left( l_{\mathrm{MCR\_BP}_{i}}\, d_{\mathrm{MCR\_BP}_{i}, \mathrm{NBP}_{i}}^{2} + \sum_{1 \leq m \leq M} l_{C_{m}} d_{C_{m}}^{2} \right) \tag{13}$$

Here, $E_{FS}\, l_{\mathrm{MCR\_BP}_{i}}\, d_{\mathrm{MCR\_BP}_{i}, \mathrm{NBP}_{i}}^{2}$ denotes the transmission energy of the link from $\mathrm{MCR\_BP}_{i}$ to $\mathrm{NBP}_{i}$, whereas $E_{FS} \sum_{1 \leq m \leq M} l_{C_{m}} d_{C_{m}}^{2}$ represents the sum of the transmission energy from all $C_{m}$ to $\mathrm{MCR\_BP}_{i}$.

Using Equation (13) as the cost function, the objective of the proposed NE-MCR algorithm is given by Equation (14) under the constraints given by Equations (15)–(17). For every node $n_{i} \in N$, it selects the backup pair ($\mathrm{MCR\_BP}_{i}$, $\mathrm{NBP}_{i}$) that minimizes the cost function $E_{TX_{i}}^{new}$ for that $n_{i}$.

Objective:

$$\text{Minimize } E_{TX_{i}}^{new} \text{ while selecting } \mathrm{MCR\_BP}_{i} \text{ and } \mathrm{NBP}_{i} \text{ for every node } n_{i} \in N \tag{14}$$

Such that:

$$l_{\mathrm{NBP}_{i}} + l_{\mathrm{MCR\_BP}_{i}} < t_{s} \tag{15}$$

$$d(C_{m}, \mathrm{MCR\_BP}_{i}) \leq W \tag{16}$$

$$d(\mathrm{MCR\_BP}_{i}, \mathrm{NBP}_{i}) \leq W \tag{17}$$

Equation (15) restates the constraint of Equation (9): the aggregated data length of $\mathrm{NBP}_{i}$, together with the data forwarded by $\mathrm{MCR\_BP}_{i}$, must stay below the threshold $t_{s}$. Equation (16) stipulates that the child node considered as $\mathrm{MCR\_BP}_{i}$ must be reachable from all other child nodes $C_{m}$ within the wireless range $W$, whereas Equation (17) stipulates that $\mathrm{NBP}_{i}$ must be reachable from the selected $\mathrm{MCR\_BP}_{i}$. In this way, the NE-MCR algorithm finds an optimal backup pair ($\mathrm{MCR\_BP}_{i}$, $\mathrm{NBP}_{i}$) that meets the optimization objective and constraints given by Equations (14)–(17).

For example, Figure 9 illustrates a subtree of a network to depict how NE-MCR selects an optimal pair of $\mathrm{MCR\_BP}_{i}$ and $\mathrm{NBP}_{i}$ for a node $n_{i}$. NE-MCR repeats this selection process for every node $n_{i} \in N$ (where $N$ is the set of all nodes in the network) to find a recovery route for the case where $n_{i}$ actually fails during normal operation. In Figure 9, the potential faulty node $n_{i}$ is highlighted. In this subtree, none of its child nodes $C_{m}$ can reach the grandparent $g$, and thus the MCR algorithm fails to find a backup parent. Therefore, NE-MCR attempts to find an optimal pair ($\mathrm{MCR\_BP}_{i}$, $\mathrm{NBP}_{i}$) as follows. NE-MCR checks each $C_{m}$ and adds it to the candidate set of $\mathrm{MCR\_BP}_{i}$ if it meets the constraint of Equation (16). For each $\mathrm{MCR\_BP}_{i}$ in the candidate set, NE-MCR finds the set of $\mathrm{NBP}_{i}$ nodes that satisfy the constraint of Equation (17) and adds each pair ($\mathrm{MCR\_BP}_{i}$, $\mathrm{NBP}_{i}$) to a set of candidate pairs. NE-MCR then calculates the cost function $E_{TX_{i}}^{new}$ of every candidate pair and selects the pair with the minimum $E_{TX_{i}}^{new}$ as the optimal recovery backup nodes.
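The candidate-pair search described above amounts to an exhaustive minimization of Equation (13) subject to Equations (15)–(17), and can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name, the data layout, and the value of `e_fs` are hypothetical; the constant $C_{elec}$ is dropped because it does not affect which pair is minimal; and the packet that the MCR_BP forwards is assumed to be the concatenation of all children's data.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def select_backup_pair(children, neighbors, W, t_s, e_fs=10e-12):
    """children: {id: (pos, length)}, neighbors: {id: (pos, aggregated_length)}.

    Returns (cost, mcr_bp, nbp) minimising the distance-dependent part of
    Equation (13), or None if no pair satisfies Equations (15)-(17).
    """
    best = None
    # Assumption: the MCR_BP forwards all children's data, concatenated.
    l_bp = sum(length for _, length in children.values())
    for bp, (bp_pos, _) in children.items():
        others = [v for m, v in children.items() if m != bp]
        # Equation (16): every other child must reach the candidate MCR_BP.
        if any(dist(pos, bp_pos) > W for pos, _ in others):
            continue
        child_cost = sum(l * dist(pos, bp_pos) ** 2 for pos, l in others)
        for nbp, (nbp_pos, l_nbp) in neighbors.items():
            d = dist(bp_pos, nbp_pos)
            # Equation (17) (reachability) and Equation (15) (slot length).
            if d > W or l_nbp + l_bp >= t_s:
                continue
            cost = e_fs * (l_bp * d ** 2 + child_cost)
            if best is None or cost < best[0]:
                best = (cost, bp, nbp)
    return best
```

The search is quadratic in the number of children and linear in the number of neighbor candidates per child, which stays small because both sets are bounded by the wireless range.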

5. Results and Discussion

In this section, we analyze the performance of our fail recovery routing approach. To evaluate it, we compare the simulation results of the proposed method with those of the existing ESCFR and DCFR algorithms. These algorithms forward the data of sensor nodes to the base station using a backbone formed by a particular set of nodes. The nodes in the backbone are selected based on a cost function. Before the actual data forwarding, each node uses the cost function to identify the minimum power for the current transmission and the neighbor node with the maximum remaining energy. The backbone can be changed at any time if the result of the cost function becomes less optimal for the corresponding chain of the backbone.

Many previous articles, such as References [21,22,23,24], report that the network lifetime changes drastically with the number of nodes and with the nodes’ transmission range. Therefore, we vary these two network parameters in the simulations.

5.1. Analysis of the Performance of NE-MCR

To generate the simulation results, we used a C++ program based on the WiSer simulation tool introduced in [27]. This tool first generates a spanning tree of the target network by running the routing algorithm presented in [27]. For example, Figure 10 illustrates such a spanning tree. The simulator then allocates TDMA time slots to each node in the spanning tree using the multichannel and multi-hop scheduling algorithm presented in [5].

To examine the reliability of the proposed algorithm, we conducted simulations using example networks of different densities, varying from 100 to 1000 nodes in a 1000 m $\times$ 1000 m area. The sink node was placed in the center of the area. We injected faults into 10 percent of the nodes to evaluate the network connectivity ratio of the MCR, DCFR and NE-MCR algorithms. The network connectivity ratio $\beta_{c}$ is defined as follows:

$$\beta_{c} = M / N \tag{18}$$

Here, $M$ is the number of nodes that can forward their data to the sink node and $N$ denotes the total number of nodes in the network.
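One possible way to compute the ratio of Equation (18) after fault injection is a breadth-first search from the sink over the links that remain usable. The sketch below is illustrative only: the function name and the adjacency-list representation are hypothetical, and it assumes that $N$ counts all sensor nodes (including failed ones) but not the sink.

```python
from collections import deque

def connectivity_ratio(links, sink, failed, total_nodes):
    """Equation (18): fraction of nodes that can still reach the sink.

    links: dict node -> iterable of nodes it shares a usable link with
           (after recovery routes have been applied).
    """
    seen = {sink}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in links.get(u, ()):
            if v not in seen and v not in failed:
                seen.add(v)
                queue.append(v)
    m = len(seen) - 1            # exclude the sink itself
    return m / total_nodes


# Sink 0 with children 1 and 2; node 3 hangs below node 1.
links = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(connectivity_ratio(links, sink=0, failed={1}, total_nodes=3))
```

In this toy example the failure of node 1 also isolates node 3, which is exactly the situation a backup parent is meant to prevent.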

Figure 11a compares the network connectivity ratio of the proposed algorithms with that of DCFR for 10 example networks ranging from 100 to 1000 nodes. The NE-MCR algorithm provides an increasingly higher network connectivity ratio for denser networks. The DCFR algorithm also achieves a higher network connectivity ratio as the density grows, but its behavior fluctuates for networks of 400 to 800 nodes. The MCR algorithm, on the other hand, shows a network connectivity ratio that decreases as the network grows beyond 300 nodes.

Another set of experiments was conducted to evaluate the recovery capability for an increasing number of faults. For all algorithms, we increased the number of faulty nodes until the network came to a complete halt. Figure 11b shows the number of isolated nodes (nodes with a lost connection) as the number of injected faults increases for the network of 1000 nodes. We compare two network conditions: a half-network-isolated condition (red dashed line) and a whole-network-isolated condition (purple dashed line).

Figure 11b shows that NE-MCR reaches the half-network-isolated condition when the fault injection ratio is 40% (400 faulty nodes out of 1000), whereas DCFR and MCR reach this condition much earlier, at fault injection ratios of 32% and 24%, respectively. For NE-MCR, the whole-network-isolated condition comes as late as a fault injection ratio of 65%. In contrast, MCR reaches this condition as early as a fault injection ratio of 40%, and DCFR reaches it when the injected faulty nodes account for 60% of the network nodes. This experiment demonstrates that NE-MCR sustains the operation of the rest of the network significantly longer than the DCFR and MCR algorithms. MCR produces the worst result, which indicates that this algorithm alone cannot solve the out-of-reach problem. The NE-MCR and DCFR algorithms can find alternative or backup recovery solutions for most of the faulty nodes that have the out-of-reach problem.

5.2. Energy Efficiency of NE-MCR

To evaluate the energy efficiency of the proposed algorithm, we compare the network lifetime and energy consumption of NE-MCR with those of the DCFR and ESCFR methods. To focus on energy efficiency, we assume in this experiment that the only cause of node failure is a dead battery. For each example network, the simulation initiates the network operation with every node’s battery fully charged. As each node forwards the aggregated data towards the base station, it gradually drains its battery according to the energy model of Equation (6). Table 1 summarizes the unit energy parameters used by the energy model [11]. The proposed method is a route recovery method, not a node recovery method. Thus, the FDT of the network is not relevant to our performance evaluation.

On the other hand, the ADT is a suitable performance metric. Figure 12a compares the ADT performance of the three fault recovery routing methods, NE-MCR, DCFR and ESCFR, for the 10 network examples of Figure 11a. For NE-MCR and ESCFR, the ADT decreases as the number of nodes increases. Surprisingly, the ADT of DCFR increases when the number of nodes grows from 400 to 500. DCFR uses a different cost function than ESCFR and periodically updates information on the available energy of all neighbor nodes. It therefore balances the network load more effectively than ESCFR by changing the set of nodes in the backbone. For NE-MCR, as the number of child nodes increases, the backup parent nodes receive more data from their child nodes and consume more energy. Nevertheless, it still achieves a better ADT, since it also balances the network load by considering the transmission power and the NBP’s receiving energy during the recovery procedure.

Due to the random injection of faults, some nodes may not find a nearby neighbor node to choose as an NBP. Those nodes spend more transmit energy, since the distance between their $\mathrm{MCR\_BP}_{i}$ and the optimal $\mathrm{NBP}_{i}$ is greater than for other nodes, and thus drain their batteries faster.

Figure 12a shows that until the number of network nodes reaches 400 (red dashed line), the ADT of NE-MCR is substantially greater than the ADT of the other two algorithms. This is because, up to this point, the network density is low and the nodes using DCFR or ESCFR are more likely to choose a parent node that has more available energy to forward their data. As the number of nodes increases, the difference in ADT between these methods shrinks. However, for lower-density networks, the NE-MCR algorithm improves the ADT by 21% on average over the compared methods.

Figure 12b compares the average energy consumption per node for all fail recovery routing methods. The average energy consumption per node grows gradually as the network density increases, since more data is aggregated in each node. For the network of 200 nodes, the proposed NE-MCR algorithm consumes around 40% less energy than the two reference algorithms. For higher-density networks, however, NE-MCR’s energy consumption grows rapidly and its advantage over DCFR shrinks to 16% (for the network with 500 nodes). The performance of DCFR shows some non-linearity, whereas the energy consumption of ESCFR increases linearly as the network density grows.

Figure 12c compares the average energy consumption per node measured with various wireless ranges for each node. While the energy consumption of all three algorithms increases as the wireless range grows, NE-MCR shows substantially lower energy consumption in all wireless ranges tested. For example, for a wireless range of 90 m, NE-MCR consumes about 40% less energy than DCFR. For the same wireless range, ESCFR consumes 48% more energy than the proposed method.

Figure 12d illustrates the ADT measured over various wireless ranges for the proposed NE-MCR and for the DCFR and ESCFR algorithms, for an example network of 600 nodes. Nodes that transmit data over a longer distance consume more energy and therefore drain their batteries faster. NE-MCR performs better due to its key advantage of minimizing the transmit distance for backup parents while balancing the number of child nodes to meet the data size constraints. Consequently, Figure 12d demonstrates that NE-MCR performs around 30% more data collection rounds than the compared algorithms for the network with a 40 m wireless range.

6. Conclusions

This paper presented an energy-efficient fail recovery routing algorithm for tree-topology wireless networks whose nodes can fail due to battery depletion. When the recovery process is initiated, the MCR algorithm determines a back-up node for each parent from its local subtree. This back-up node then uses the faulty parent’s TDMA slot to forward the aggregated data of the subtree to the grandparent. In the implementation stage, we observed that some back-up nodes are not able to connect to their grandparents due to a distance constraint. We increased the transmission power of the back-up nodes so that they could forward their data to the grandparents. This small modification of transmission power, however, caused the following problems: (a) nodes that had been outside the interference zone of the back-up node started to experience collisions if their slots were identical, and (b) the back-up nodes were identified as faulty nodes in later steps, since they used higher power for each transmission. Thus, we applied our second recovery method, NE-MCR, to find back-up parent nodes in different branches of the spanning tree. In this phase, we faced other constraints: (c) the back-up parents selected by NE-MCR were only able to accept a limited number of child nodes, and (d) the slot length was constrained, as it had been assigned in an earlier scheduling phase. Allocating longer slots causes additional energy consumption due to the idle mode of parents that have fewer child nodes. However, in denser networks, more back-up parents were found, and NE-MCR was able to connect the isolated subtree to the optimal back-up parent. We compared the proposed method with reference algorithms over a wide range of network sizes. In comparison with the reference algorithms, NE-MCR provided a substantially higher network connectivity ratio for networks of more than 400 nodes. When compared with the ESCFR and DCFR algorithms, NE-MCR consumed on average 23% less energy, while allowing a 21% longer lifetime for large networks. The proposed algorithm is therefore well suited as a fast recovery solution for low-power networks. In addition, it offers a non-disruptive recovery solution for TDMA networks, since it finds all back-up parents without changing the existing schedule.

Author Contributions

Conceptualization, O.U. and H.W.K.; Methodology, H.W.K.; Software, O.U.; Validation, O.U., H.W.K.; Formal analysis, H.W.K.; Writing—original draft preparation, O.U., H.W.K.; writing—review and editing, H.W.K.; funding acquisition, H.W.K.

Acknowledgments

This work was financially supported by the Center for Integrated Smart Sensors funded by the Ministry of Science, ICT & Future Planning as Global Frontier Project, Korea (CISS-2017M3A6A6066117). This work was also supported by IITP grant funded by the Korean government (No. R7117-19-0164, Development of wide area driving environment awareness and cooperative driving technology which are based on V2X wireless communication). It was also supported by SoC platform and SW development Advanced Education for IOT which is funded by the Ministry of Trade, Industry, and Energy (N00011132).

Conflicts of Interest

The authors declare no conflict of interest.


Figure 1. An example of isolation in a WSN due to a single parent fault.

Figure 2. Example of TDMA-based data forwarding in a WSN: (a) the data flow in the network; (b) the TDMA scheduling table.

Figure 3. Power consumption of parent node in duty cycle.

Figure 4. MCR back-up parent selection procedure. (a) Segment of a network where the primary parent failed; (b) $MCR\_BP$ selection procedure.

Figure 5. Comparison of the number of isolated nodes for the WSN of 1000 nodes when the number of faults is increased.

Figure 6. A flow diagram of the proposed NE-MCR algorithm.

Figure 7. The avoid-the-same-branch concept in the NE-MCR algorithm.

Figure 8. Data aggregation in tree-structured WSN.

Figure 9. Example sub-tree of network where NE-MCR selects an optimal NBP.

Figure 10. Spanning tree-based routing-enabled WSN generated in WiSer C++.

Figure 11. Simulation results regarding: (a) network connectivity ratio; (b) network isolation rate.

Figure 12. Simulation results for energy efficiency considering: (a) ADT versus network density; (b) energy consumption versus network density; (c) energy consumption versus wireless range; (d) ADT versus wireless range.

Table 1. Default value of all network parameters used in the simulation.

Parameter	Value
$E_{elec}$	50 nJ/bit
$E_{Fs}$	10 pJ/bit/m$^{2}$
$E_{amp}$	0.0013 pJ/bit/m$^{4}$
Initial energy	0.5 J
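
To show how the values in Table 1 translate into per-packet energy, the following Python sketch evaluates them under the widely used first-order radio model. This is an assumption: it takes $E_{Fs}$ as the free-space amplifier coefficient applied below a crossover distance $d_0 = \sqrt{E_{Fs}/E_{amp}}$ and $E_{amp}$ as the multi-path coefficient applied above it. The function names `tx_energy` and `rx_energy` are hypothetical and the sketch is not taken from the simulator.

```python
import math

# Table 1 values converted to joules.
E_ELEC = 50e-9          # J/bit, transceiver electronics
E_FS = 10e-12           # J/bit/m^2, free-space amplifier coefficient
E_AMP = 0.0013e-12      # J/bit/m^4, multi-path amplifier coefficient
INITIAL_ENERGY = 0.5    # J per node

# Crossover distance between the d^2 and d^4 regimes (assumed model threshold).
D0 = math.sqrt(E_FS / E_AMP)


def tx_energy(bits: int, distance_m: float) -> float:
    """Energy in joules to transmit `bits` over `distance_m` metres."""
    if distance_m < D0:
        return bits * (E_ELEC + E_FS * distance_m ** 2)
    return bits * (E_ELEC + E_AMP * distance_m ** 4)


def rx_energy(bits: int) -> float:
    """Energy in joules to receive `bits`."""
    return bits * E_ELEC


short_hop = tx_energy(1000, 50.0)    # below the crossover distance
long_hop = tx_energy(1000, 100.0)    # above the crossover distance
```

Under these assumptions the crossover distance is roughly 87.7 m, and sending 1000 bits costs about 75 µJ over 50 m but about 180 µJ over 100 m. This illustrates why raising the transmission power of a back-up node so that it can reach a distant grandparent shortens that node's lifetime.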

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Urmonov, O.; Kim, H. An Energy-Efficient Fail Recovery Routing in TDMA MAC Protocol-Based Wireless Sensor Network. Electronics 2018, 7, 444. https://doi.org/10.3390/electronics7120444

