Investigating the Impact of Optimal Data Transfer Intervals on Failure-Prone Wireless Sensor Networks

Wireless sensor networks (WSNs) typically consist of failure-prone sensor nodes and more reliable sink nodes. To prevent data loss, sensor nodes must regularly transfer their data to sink nodes, so setting an appropriate data transfer interval between them is crucial. This letter presents a method to optimize the data transfer interval using a Markov model. Although the Markov model implies exponentially distributed failure interarrival times, extensive simulations show that the optimal data transfer interval derived from this model still performs well under other failure distributions: the resulting period of data loss differs by less than 7.3% from that obtained with the true optimal data transfer interval. We also discuss how to integrate the proposed method into time division multiple access (TDMA), a communication protocol widely used in WSNs, and show that it can improve data collection time and energy efficiency.


I. INTRODUCTION
Wireless sensor networks (WSNs) play a crucial role in advancing the Internet of Things. These networks typically comprise unreliable sensor nodes (SNs) tasked with collecting environmental data and reliable sink nodes (SKs) tasked with aggregating data from the SNs. Due to the limited battery lifetime and unreliable nature of SNs, data transfers between SNs and SKs must be optimized so that they are energy-efficient and fault-tolerant. Because data transfer consumes relatively high power [1], an SN can save energy by delaying the transfer, temporarily accumulating the collected data in its local storage instead of sending them frequently. However, the transfer should not be delayed for too long, because the risk of data loss increases if a hardware failure occurs before the data are sent to the SK [2]. Hence, setting an optimal data transfer rate is important for WSNs.
Time division multiple access (TDMA) is a widely used protocol for wireless communication, especially in WSNs [3], [4], [5]. In TDMA, SNs are scheduled to send their data alternately so that no communication interference occurs among them. This is achieved by dividing the SNs into several groups: SNs that are physically close to each other, i.e., within each other's interference range, are placed in different groups. Periodic time slots are then assigned to each group, typically in a round-robin fashion, during which SNs belonging to the same group can send their data simultaneously without interference from other SNs. However, this protocol offers less flexibility in controlling data transfers between SNs and the SK, because the time slots depend on the number of groups in the WSN. Consequently, SNs must follow these time slots for their data transfers, which makes it challenging to optimize the transfers in terms of energy efficiency and fault tolerance.
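The grouping step described above amounts to coloring the interference graph. The sketch below is a minimal illustration, assuming the interference relation is given as a symmetric adjacency dictionary (`neighbors`, a hypothetical input format). It uses the greedy Welsh-Powell heuristic, which yields a valid grouping but not necessarily the minimum number of groups (finding that minimum is NP-hard). The helper `slot_start` is likewise hypothetical and assumes equal-length slots served in round-robin order.

```python
from typing import Dict, Set


def greedy_groups(neighbors: Dict[str, Set[str]]) -> Dict[str, int]:
    """Assign each SN to a TDMA group so that no two SNs within each
    other's interference range share a group (greedy graph coloring)."""
    group_of: Dict[str, int] = {}
    # Visit high-degree SNs first (Welsh-Powell ordering); ties broken by name.
    for node in sorted(neighbors, key=lambda n: (-len(neighbors[n]), n)):
        taken = {group_of[m] for m in neighbors[node] if m in group_of}
        group = 0
        while group in taken:  # smallest group index not used by a neighbor
            group += 1
        group_of[node] = group
    return group_of


def slot_start(group: int, frame: int, n_groups: int, slot_len: float) -> float:
    """Start time of `group`'s slot in round-robin frame number `frame`."""
    return (frame * n_groups + group) * slot_len


if __name__ == "__main__":
    topology = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
    groups = greedy_groups(topology)
    n_groups = max(groups.values()) + 1
    for sn, g in groups.items():
        print(sn, "group", g, "first slot at", slot_start(g, 0, n_groups, 10.0))
```

Here the chain a-b-c needs two groups, and the isolated SN d can reuse either one.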
Lakhlef et al. [6] found a way to adjust the data transfer rate in TDMA by dynamically changing the number of groups, along with their members, whenever an SN fails. Although this is an interesting approach, dynamically changing the number of groups (i.e., the topology of the WSN) in TDMA is very expensive, because determining the groups of SNs is a graph coloring problem, which is well known to be NP-hard. Bhatia and Hansdah [7] proposed a fast distributed algorithm that does not require topological changes to the WSN. However, their method incurs high overhead for sending control messages across the SNs to tune their data transfer rate. Nguyen et al. [8] combined topological ordering with the fast distributed algorithm to control the data transfer rate of SNs. Although they significantly reduce the time needed to group the SNs, their method does not consider the possibility of data loss due to hardware failures.
Our work aims to improve the way data are transferred between SNs and SKs, taking hardware failures into account and without changing the topology of the WSN. Using a Markov model (MM), we propose a simple yet powerful analytical formula to tune the data transfer rate of SNs. We show that the optimal data transfer rate is a function of the data transfer time and the failure rate of an SN. Since the MM implies exponentially distributed failure interarrival times, we evaluate our formula under various failure distributions. Finally, we discuss how to integrate it into TDMA with minimal control messages and analyze its effects on the overall performance of WSNs. Table 1 summarizes the differences between our work and related work.

A. Basic Operational Mode of an SN
Fig. 1 depicts how an individual SN assumed in this letter works in an operational mode called the schedule-driven mode [9]. In this mode, the SN continuously senses the environment to collect data for S seconds and stores the data in its memory for M seconds. This sequence repeats for Δ seconds until the scheduled data transfer time is reached. The SN then transfers the accumulated data to an SK for T seconds. If a hardware failure strikes before the data are transmitted to an SK, the accumulated data are lost. The SN is then repaired/replaced, and the process continues. Note that this repeating cycle of sensing and data transfer is similar to that of TDMA, but at this point, we still assume that the length of Δ can be arbitrary. The failure and recovery rates of the SN are denoted by λ and μ, respectively. We express Δ and T as rates, i.e., the data transfer rate δ = Δ⁻¹ and the rate of data transfer time τ = T⁻¹.
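As an illustration of the cycle just described, the following sketch lists the phases of one schedule-driven cycle and converts Δ and T into the rates δ and τ used by the MM. The function names are hypothetical, and the sketch assumes that Δ is an integer multiple of S + M.

```python
from typing import List, Tuple


def schedule_driven_cycle(S: float, M: float, Delta: float,
                          T: float) -> List[Tuple[str, float]]:
    """Phases of one schedule-driven cycle (Fig. 1).

    (sense for S s, store for M s) pairs repeat for Delta seconds, then the
    accumulated data are transferred for T seconds. Assumes Delta is an
    integer multiple of S + M.
    """
    n_pairs = round(Delta / (S + M))
    phases: List[Tuple[str, float]] = []
    for _ in range(n_pairs):
        phases += [("sense", S), ("store", M)]
    phases.append(("transfer", T))
    return phases


def to_rates(Delta: float, T: float) -> Tuple[float, float]:
    """Data transfer rate delta = 1/Delta and rate of data transfer time tau = 1/T."""
    return 1.0 / Delta, 1.0 / T


if __name__ == "__main__":
    print(schedule_driven_cycle(1.0, 4.0, 20.0, 3.0))
    print(to_rates(600.0, 60.0))
```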

B. Proposed MM-Based Optimization
Next, the behavior of the SN discussed in Section II-A is transformed into an MM [10]. The resulting MM is then solved analytically to identify the optimal data transfer interval Δ_opt that minimizes data loss. Fig. 2 illustrates the state transitions of the SN based on the MM, which comprises three states: working (W), data transfer (D), and recovery (R). The edges of the MM represent the transition rates between states. In state W, the SN senses the environment and stores the collected data in its local storage. The SN then transitions to state D at rate δ and, from state D, returns to state W at rate τ. When failures occur at rate λ, the SN transitions to state R and subsequently returns to state W at recovery rate μ. Note that, because an MM is used to model the transitions, the model implicitly assumes exponentially distributed failure interarrival times.
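To make the state-transition structure of Fig. 2 concrete, the sketch below simulates the three-state MM directly in Gillespie style (exponential sojourn times, next state chosen in proportion to the outgoing rates) and reports the long-run fraction of time spent in W, D, and R. The function names and the numerical rates in the usage example are illustrative assumptions, not values from this letter.

```python
import random
from typing import Dict


def mm_rates(delta: float, tau: float, lam: float,
             mu: float) -> Dict[str, Dict[str, float]]:
    """Outgoing transition rates of the three-state MM (Fig. 2)."""
    return {
        "W": {"D": delta, "R": lam},  # start a transfer / fail
        "D": {"W": tau},              # transfer completes
        "R": {"W": mu},               # repair/replacement completes
    }


def simulate_mm(rates: Dict[str, Dict[str, float]], horizon: float,
                seed: int = 0) -> Dict[str, float]:
    """Gillespie-style simulation; returns the fraction of time in each state."""
    rng = random.Random(seed)
    state, t = "W", 0.0                      # the SN starts in state W
    time_in = {s: 0.0 for s in rates}
    while t < horizon:
        out = rates[state]
        total = sum(out.values())
        dwell = min(rng.expovariate(total), horizon - t)  # exponential sojourn
        time_in[state] += dwell
        t += dwell
        # Next state chosen with probability proportional to its rate.
        state = rng.choices(list(out), weights=list(out.values()))[0]
    return {s: v / horizon for s, v in time_in.items()}


if __name__ == "__main__":
    r = mm_rates(delta=1 / 600, tau=1 / 60, lam=1 / 3600, mu=1 / 330)
    print(simulate_mm(r, 2e7, seed=1))
```

With δ = 1/600, τ = 1/60, λ = 1/3600, and μ = 1/330 (all in s⁻¹), the simulated fraction of time in W should approach the steady-state availability P_W(∞) = 1/(1 + δ/τ + λ/μ) ≈ 0.839.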
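We then proceed to solve the MM, which is characterized by the following differential equations:

$$\frac{dP_W(t)}{dt} = -(\delta + \lambda)P_W(t) + \tau P_D(t) + \mu P_R(t) \tag{1}$$

$$\frac{dP_D(t)}{dt} = \delta P_W(t) - \tau P_D(t) \tag{2}$$

$$\frac{dP_R(t)}{dt} = \lambda P_W(t) - \mu P_R(t) \tag{3}$$

Here, the state probabilities at time t for states W, D, and R are denoted by P_W(t), P_D(t), and P_R(t), respectively. Assuming that the SN is initially (t = 0) in state W, P_W(0) = 1 and P_D(0) = P_R(0) = 0. First, we apply the Laplace transform [11], [12] to (1)-(3), resulting in the following three equations:

$$sP_W(s) - 1 = -(\delta + \lambda)P_W(s) + \tau P_D(s) + \mu P_R(s) \tag{4}$$

$$sP_D(s) = \delta P_W(s) - \tau P_D(s) \tag{5}$$

$$sP_R(s) = \lambda P_W(s) - \mu P_R(s) \tag{6}$$

Since reducing data loss corresponds to maximizing the time spent in state W, our next goal is to calculate P_W(t). Substituting P_D(s) and P_R(s) from (5) and (6) into (4), we obtain

$$P_W(s) = \frac{(s + \tau)(s + \mu)}{s\left[s^2 + (\delta + \lambda + \mu + \tau)s + \delta\mu + \lambda\tau + \mu\tau\right]} \tag{7}$$

By applying the inverse Laplace transform to (7), P_W(t) can be expressed as

$$P_W(t) = \frac{\mu\tau}{k_1 k_2} + \frac{(k_1 + \tau)(k_1 + \mu)}{k_1(k_1 - k_2)}e^{k_1 t} + \frac{(k_2 + \tau)(k_2 + \mu)}{k_2(k_2 - k_1)}e^{k_2 t} \tag{8}$$

where k_1 and k_2 are the roots of the quadratic factor in the denominator of (7):

$$k_{1,2} = \frac{-(\delta + \lambda + \mu + \tau) \pm \sqrt{(\delta + \lambda + \mu + \tau)^2 - 4(\delta\mu + \lambda\tau + \mu\tau)}}{2} \tag{9}$$

In reliability theory, P_W(t) is also known as the instantaneous availability, i.e., the probability that a system is operational at a specific time t. For a WSN that has been operating for a sufficiently long time, the steady-state availability of the SN, P_W(∞), is obtained by taking the limit of P_W(t) as t approaches infinity. Since k_1 and k_2 are both negative, the exponential terms in (8) vanish, and since k_1 k_2 = δμ + λτ + μτ, the steady-state availability is

$$P_W(\infty) = \lim_{t \to \infty} P_W(t) = \frac{\mu\tau}{\mu\tau + \delta\mu + \lambda\tau} \tag{10}$$

The optimal data transfer rate δ_opt should be selected such that it maximizes P_W(∞).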
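The closed-form solution (8)-(10) can be cross-checked numerically. The sketch below, using illustrative rates that are assumptions rather than values from this letter, evaluates the closed-form P_W(t) and compares it with a classic fourth-order Runge-Kutta integration of (1)-(3) starting from P_W(0) = 1. All function names are hypothetical.

```python
import math


def k_roots(delta, lam, mu, tau):
    """k1, k2 from (9): roots of s^2 + (delta+lam+mu+tau)s + (delta*mu + lam*tau + mu*tau)."""
    b = delta + lam + mu + tau
    c = delta * mu + lam * tau + mu * tau
    disc = math.sqrt(b * b - 4.0 * c)  # strictly positive for positive rates
    return (-b + disc) / 2.0, (-b - disc) / 2.0


def pw_closed_form(t, delta, lam, mu, tau):
    """Instantaneous availability P_W(t) from (8)."""
    k1, k2 = k_roots(delta, lam, mu, tau)
    return (mu * tau / (k1 * k2)
            + (k1 + tau) * (k1 + mu) / (k1 * (k1 - k2)) * math.exp(k1 * t)
            + (k2 + tau) * (k2 + mu) / (k2 * (k2 - k1)) * math.exp(k2 * t))


def pw_steady_state(delta, lam, mu, tau):
    """Steady-state availability P_W(inf) from (10)."""
    return mu * tau / (mu * tau + delta * mu + lam * tau)


def pw_rk4(t_end, delta, lam, mu, tau, steps=20_000):
    """Integrate (1)-(3) with classic RK4 from P_W(0)=1, P_D(0)=P_R(0)=0."""
    def f(p):
        pw, pd, pr = p
        return (-(delta + lam) * pw + tau * pd + mu * pr,
                delta * pw - tau * pd,
                lam * pw - mu * pr)

    def add(p, k, h):
        return tuple(x + h * y for x, y in zip(p, k))

    h = t_end / steps
    p = (1.0, 0.0, 0.0)
    for _ in range(steps):
        s1 = f(p)
        s2 = f(add(p, s1, h / 2))
        s3 = f(add(p, s2, h / 2))
        s4 = f(add(p, s3, h))
        p = tuple(x + h / 6 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(p, s1, s2, s3, s4))
    return p[0]


if __name__ == "__main__":
    rates = dict(delta=1 / 600, lam=1 / 3600, mu=1 / 330, tau=1 / 60)
    for t in (0.0, 500.0, 2000.0):
        print(t, pw_closed_form(t, **rates), pw_rk4(t, **rates) if t else 1.0)
    print("steady state:", pw_steady_state(**rates))
```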
To determine δ_opt, we first need to calculate the recovery rate μ. To simplify the calculation, we consider its inverse μ⁻¹, which represents the recovery time. Here, μ⁻¹ comprises the time spent collecting data that are subsequently lost due to the failure and the time required to repair/replace the faulty SN. Given the memoryless property of the exponential distribution, we assume that a failure occurs, on average, at the midpoint of the data transfer interval (Δ/2 = 1/(2δ)). Letting c be the time needed to repair/replace the faulty SN, μ⁻¹ can be written as μ⁻¹ = c + 1/(2δ). Substituting μ⁻¹ into (10) gives

$$P_W(\infty) = \frac{1}{1 + \dfrac{\delta}{\tau} + \lambda c + \dfrac{\lambda}{2\delta}}$$

Setting the derivative of the denominator with respect to δ to zero, the value of δ that maximizes P_W(∞) is

$$\delta_{\mathrm{opt}} = \sqrt{\frac{\lambda\tau}{2}}$$

In other words,

$$\Delta_{\mathrm{opt}} = \frac{1}{\delta_{\mathrm{opt}}} = \sqrt{\frac{2T}{\lambda}}$$

This equation shows that, to optimally tune Δ of each SN, we only need to know its data transfer time (T) and failure rate (λ). Since both parameters are local to each SN, the data transfer interval can be tuned with minimal global coordination or control messages across SNs.
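The closed-form interval can be compared against a brute-force search. The sketch below, with hypothetical function names, evaluates Δ_opt = √(2T/λ) and maximizes P_W(∞) = 1/(1 + T/Δ + λ(c + Δ/2)) over a grid of candidate intervals (using δ/τ = T/Δ and μ⁻¹ = c + Δ/2). The example parameters T = 60 s and c = 30 s, with an MTBF of 1 h or 8 h, mirror the experimental setup in Section III-A.

```python
import math


def optimal_interval(T: float, lam: float) -> float:
    """Delta_opt = 1/delta_opt = sqrt(2T / lambda)."""
    return math.sqrt(2.0 * T / lam)


def steady_state_availability(Delta: float, T: float, lam: float, c: float) -> float:
    """P_W(inf) = 1 / (1 + delta/tau + lambda/mu), with delta/tau = T/Delta
    and 1/mu = c + Delta/2 (i.e., c + 1/(2*delta))."""
    return 1.0 / (1.0 + T / Delta + lam * (c + Delta / 2.0))


def grid_search_interval(T: float, lam: float, c: float, lo: float = 1.0,
                         hi: float = 10_000.0, step: float = 0.5) -> float:
    """Brute-force maximizer of P_W(inf) over evenly spaced candidate intervals."""
    n = int((hi - lo) / step) + 1
    candidates = (lo + i * step for i in range(n))
    return max(candidates, key=lambda d: steady_state_availability(d, T, lam, c))


if __name__ == "__main__":
    T, c = 60.0, 30.0                      # transfer and replacement times (s)
    for mtbf_h in (1, 8):
        lam = 1.0 / (mtbf_h * 3600.0)      # failure rate (1/s)
        print(mtbf_h, optimal_interval(T, lam), grid_search_interval(T, lam, c))
```

For T = 60 s, this gives Δ_opt ≈ 657 s (about 11 min) for an MTBF of 1 h and Δ_opt ≈ 1859 s (about 31 min) for an MTBF of 8 h. Because c enters P_W(∞) only through the constant term λc, it does not shift the location of the optimum.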

III. RESULTS AND ANALYSIS
In this section, we answer the following three questions:
1) Since we use an MM to derive Δ_opt, how good is Δ_opt for failure distributions other than the exponential distribution?
2) What if the actual timing of Δ_opt does not fit the time slots decided by the TDMA protocol?
3) What is the effect of Δ_opt on the energy efficiency of WSNs?

A. Experimental Setup
To answer Question (1), we conducted simulations assuming a group of SNs that periodically and simultaneously send data to an SK. This group of SNs is tasked with collecting 90 days' worth of data from the environment. Whenever an SN from the group fails, the group enters a so-called period of data loss, during which data continuously sent by the other healthy SNs do not count as progress toward completing the task because the data from the faulty SN are missing. We considered two mean times between failures (MTBF = 1/λ) for the group: 1 and 8 h. The former corresponds to cheaper SNs and the latter to more expensive (more reliable) ones. We also considered three probability distributions under which failures are generated: the Weibull, gamma, and log-normal distributions. For each distribution, failures were generated by selecting a particular mean (MTBF) and standard deviation (stddev); the stddev was varied from one-quarter to twice the mean to cover more failure scenarios. The data transfer time and the SN replacement time were fixed at 60 and 30 s, respectively. Such a short replacement time is achievable with a fast fault detection algorithm and automatic SN replacement, as in [13], [14]. Finally, we compared the time spent in the period of data loss when using Δ_opt with that when using the true optimal data transfer interval Δ*. Here, Δ* is obtained by sweeping many candidate values of Δ and selecting the one that minimizes the period of data loss for a particular failure scenario. The simulator used in this letter is available online [15].
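The simulator used in this letter is available at [15]; the sketch below is not that simulator but a much-simplified, hypothetical stand-in that illustrates how Δ* can be found by sweeping candidate intervals. It assumes a single SN that senses for Δ seconds and transfers for T seconds, loses everything collected since its last completed transfer if it fails before a transfer finishes, is replaced after c seconds, and draws gamma-distributed failure interarrival times parameterized by mean and stddev. Unlike the simulation in this letter, it does not model failures during replacement, and it uses the total completion time rather than the period of data loss as the objective. All names are illustrative.

```python
import random
from typing import Callable, Iterable, Optional, Tuple


def completion_time(Delta: float, T: float, c: float, target: float,
                    next_failure: Callable[[], float]) -> float:
    """Wall-clock time needed to deliver `target` seconds' worth of sensed data.

    Hypothetical simplified model: sense for Delta s, then transfer for T s.
    A failure before the transfer completes loses all data collected since the
    last completed transfer; the SN is then replaced in c s. next_failure()
    returns a failure interarrival time measured from when the (new) SN starts.
    Note: loops forever if failures always arrive before a cycle can finish.
    """
    t, progress = 0.0, 0.0
    failure_at = t + next_failure()
    while progress < target:
        cycle_end = t + Delta + T
        if failure_at >= cycle_end:   # cycle finished: Delta s of data delivered
            progress += Delta
            t = cycle_end
        else:                         # failure: data lost, SN replaced after c s
            t = failure_at + c
            failure_at = t + next_failure()
    return t


def gamma_sampler(mean: float, stddev: float,
                  rng: random.Random) -> Callable[[], float]:
    """Gamma-distributed interarrival times with the given mean and stddev."""
    shape = (mean / stddev) ** 2
    scale = stddev ** 2 / mean
    return lambda: rng.gammavariate(shape, scale)


def best_interval(candidates: Iterable[float], T: float, c: float,
                  target: float, mean: float, stddev: float,
                  seed: int = 0) -> Tuple[float, float]:
    """Sweep candidate intervals and return (Delta*, completion time)."""
    best: Optional[Tuple[float, float]] = None
    for Delta in candidates:
        rng = random.Random(seed)  # same failure sequence for every candidate
        t = completion_time(Delta, T, c, target, gamma_sampler(mean, stddev, rng))
        if best is None or t < best[1]:
            best = (Delta, t)
    assert best is not None, "no candidates given"
    return best
```

For example, `best_interval(range(300, 1501, 60), 60.0, 30.0, 90 * 86400.0, 3600.0, 1800.0)` sweeps intervals from 5 to 25 min for an MTBF of 1 h and a stddev of 30 min. Reusing the same seed for every candidate (common random numbers) reduces noise when comparing intervals.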
To answer Questions (2) and (3), we first discuss how to integrate Δ_opt into the standard TDMA protocol, in which data transfers can only be performed at certain points in time. We call this integration TDMA_opt. We then calculate the time and energy TDMA_opt needs to collect 90 days' worth of data and compare them with those of the standard TDMA protocol.

B. Impacts of Δ_opt on Various Failure Scenarios
Fig. 3 shows how much slower the WSN completes its given task if the optimal data transfer rate/interval is computed under the assumption that failure interarrival times follow an exponential distribution (data transfer interval = Δ_opt) even when they are more accurately characterized by other distributions (data transfer interval = Δ*). This slowdown is represented by the increase in the period of data loss: the longer the WSN spends in this period, the longer it takes to complete its task. Fig. 3 also shows the differences in duration between Δ_opt and Δ*. Fig. 3(a) and (b) depict the increase in the period of data loss caused by selecting Δ_opt as the data transfer interval for the three distributions as a function of the stddev. For all three distributions, the increase in the period of data loss is less than 7.3%, which is relatively small. By contrast, Fig. 3(c) and (d) show that the difference between Δ_opt and Δ* itself can be significant: Δ_opt can be approximately 11% longer or 20% shorter than Δ*. From these results, we conclude that it is relatively safe to compute Δ_opt assuming exponentially distributed failures, because the resulting increase in the period of data loss is modest.
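Failures in the simulations are generated from a target mean (MTBF) and stddev. The sketch below shows one way to convert these two moments into the native parameters of each distribution; the function names are hypothetical. It also illustrates a fact used in the following discussion: with stddev equal to the mean, the gamma shape becomes 1 and the Weibull shape converges to 1, both of which reduce to the exponential distribution, whereas the log-normal distribution has no exponential special case. The returned parameters match the argument order of Python's `random.gammavariate(shape, scale)` and `random.lognormvariate(mu, sigma)`; note that `random.weibullvariate` takes the scale first and the shape second.

```python
import math


def gamma_params(mean, stddev):
    """(shape, scale) of a gamma distribution with the given mean and stddev."""
    shape = (mean / stddev) ** 2
    return shape, mean / shape


def lognormal_params(mean, stddev):
    """(mu, sigma) of the underlying normal of a log-normal with the given moments."""
    sigma2 = math.log(1.0 + (stddev / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def weibull_params(mean, stddev, lo=0.05, hi=50.0, iters=200):
    """(shape k, scale) of a Weibull distribution with the given mean and stddev.

    The coefficient of variation (CV) of a Weibull depends only on k and
    decreases monotonically as k grows, so k is found by bisection.
    """
    target_cv = stddev / mean

    def cv(k):
        g1 = math.gamma(1.0 + 1.0 / k)
        g2 = math.gamma(1.0 + 2.0 / k)
        return math.sqrt(g2 / g1 ** 2 - 1.0)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cv(mid) > target_cv:
            lo = mid   # CV too large: a larger shape is needed
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    return k, mean / math.gamma(1.0 + 1.0 / k)


if __name__ == "__main__":
    for sd in (15.0, 60.0, 120.0):          # stddev for an MTBF of 60 min
        print(sd, gamma_params(60.0, sd), weibull_params(60.0, sd))
```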
As shown in Fig. 3(c) and (d), for the Weibull and gamma distributions, the smallest differences between Δ_opt and Δ* are obtained when the stddev is close to the MTBF, namely, at 75 min [see Fig. 3(c)] and 360 min [see Fig. 3(d)]. In these two cases, the differences are no more than 2%. As the stddev approaches the mean, the shapes of the probability density and cumulative distribution functions of the Weibull and gamma distributions become closer to those of the exponential distribution, so Δ_opt becomes very close to its true optimal value Δ*. In fact, the exponential distribution is a special case of the Weibull and gamma distributions when the stddev equals the mean. Hence, theoretically, Δ_opt and Δ* should be identical when stddev = MTBF = 60 min (1 h) and stddev = MTBF = 480 min (8 h). However, the minimum differences appear slightly shifted to the neighboring stddev cases [to the right in Fig. 3(c) and to the left in Fig. 3(d)]. This is because our simulation allows failures to occur in any state of the WSN (R, W, or D), which is more realistic, whereas the MM used to derive Δ_opt assumes that failures occur only in the working state (W). This can be seen from the absence of arrows from state D to state R and from state R to itself in Fig. 2. The result is a slight discrepancy in the calculation of Δ_opt; in other words, Δ_opt itself is not truly optimal even for the more realistic simulated exponential cases. Nevertheless, this discrepancy does not affect the overall performance of Δ_opt, as the increase in the period of data loss relative to the optimal one stays below 8%. Thus, we conclude that the exponentially distributed failure interarrival times assumed by the MM remain reasonable.

In the TDMA schedule of Fig. 4, the time slots of the different groups of SNs are colored differently. Data transfers of SNs from a particular group can only be performed during the time slots of that group's color; for example, group 1 can perform data transfers only during the blue-colored time slots. In our integration, Δ_opt controls enabling signals (en_i) that determine whether an SN from group i should execute its data transfer. Data transfers are skipped when en_i = 0 and executed otherwise. The value of en_i turns to 1 after Δ_opt seconds have elapsed since the last data transfer and turns back to 0 after the next data transfer is performed. In this example, SNs from the same group use the same value of Δ_opt. This illustrates that the integration is straightforward and requires no complex control signals to be added to the existing TDMA protocol.

C. Practical Integration of Δ_opt With TDMA (TDMA_opt) and Its Effects on the Overall Performance of WSNs
To investigate the significance of our proposed method in terms of the WSN's data collection time and energy efficiency, we ran simulations with the same parameter setup as in Section III-A, assuming a WSN whose SNs each consist of an Intel StrongARM SA-1100 microprocessor [16] and a Chipcon CC2420 transceiver [17]. With this hardware configuration, the power ratio of data transfer to sensing is approximately 8:1 [1]. We assumed four groups of SNs in the network, i.e., four different colors in the large TimeSlot. Fig. 5 compares the total data collection time and the energy consumption, normalized to those of the standard TDMA, when the WSN is tasked with collecting 90 days' worth of data. TDMA_opt is, on average, 1.25 times faster in completing the task and 2.45 times more energy-efficient than the standard TDMA. These results show that sending data at the right moment significantly affects the overall performance of WSNs. In terms of energy efficiency, this improvement may prolong the WSN's lifetime by a factor of approximately 2.45 compared with the standard TDMA protocol.

IV. CONCLUSION
In this letter, we have proposed a method to calculate an optimal data transfer interval Δ_opt, derived under the assumption of exponentially distributed failures, and investigated its performance under three distributions commonly used to model failures in WSNs. Extensive simulations show that, overall, Δ_opt does not cause a significant increase in the period of data loss, deviating from that of the true optimal Δ* by at most 7.3%. This highlights the robustness of the proposed method to different failure scenarios. We also presented a straightforward approach, called TDMA_opt, for integrating Δ_opt into the standard TDMA protocol with minimal control signals. Simulation results show that TDMA_opt outperforms the standard TDMA in terms of total data collection time and energy consumption by factors of 1.25 and 2.45, respectively. Such improvements, especially the reduction in total energy consumption, are essential for WSNs because they may extend network lifetime.
Despite these promising results, the proposed Δ_opt has limitations. For example, it requires the MTBF of the target SN, which may not be available before deployment in some cases. In addition, Δ_opt is currently not applicable to WSNs with multihop data transfer, such as tree-based WSNs. In such WSNs, an SN may also act as an SK for other SNs; hence, when it fails, the lost data include not only those collected by the SN itself but also those received from its child SNs. In future work, we will address these problems by adding a real-time failure rate prediction mechanism and by extending Δ_opt to tree-based WSNs, respectively.

Fig. 1. Basic operational model of an SN assumed in this letter.

Fig. 4 shows how our proposed Δ_opt is integrated into the TDMA scheduling policy. The figure depicts four different data transfer timings allocated in a round-robin fashion by the TDMA scheduler; in every large TimeSlot, every group of SNs gets exactly one time slot.

Fig. 4. Proposed Δ_opt controls the enabling signals that decide whether a data transfer is performed or not.

TABLE 1. Comparison With Related Work