Hop-Distance Estimation in Wireless Sensor Networks with Applications to Resources Allocation

We address a fundamental problem in wireless sensor networks (WSNs): how many hops does it take to relay a packet over a given distance? For a deterministic topology, this hop-distance estimation reduces to a simple geometry problem. For randomly deployed WSNs, however, a statistical study is needed. We propose a maximum-likelihood decision based on the conditional pdf $f(r \mid H_i)$. Because $f(r \mid H_i)$ is computationally complex, we also propose an attenuated Gaussian approximation for the conditional pdf. We show that the approximation noticeably simplifies the decision process and the error analysis. Latency and energy consumption estimation are included as application examples. Simulations show that our approximation model predicts latency and energy consumption with less than half the RMSE of the linear models.


INTRODUCTION
The recent advances in MEMS, embedded systems, and wireless communications enable the realization and deployment of wireless sensor networks (WSNs), which consist of a large number of densely deployed, self-organized sensor nodes [1]. Potential WSN applications, such as environmental monitoring, often emphasize the importance of location information. Fortunately, with advances in localization technologies, such location information can be estimated accurately [2][3][4][5]. Accordingly, geographic routing [6][7][8] was proposed to route packets not to a specific node but to a given location. A natural question arises: "how many hops does it take to reach a given location?" Predicting the number of hops, that is, hop-distance estimation, is important not only in itself but also because it helps estimate latency and energy consumption, both of which are important to the viability of WSNs.
The question becomes very simple if the sensor nodes are placed manually. However, if sensor nodes are deployed randomly, the answer is beyond the reach of simple geometry; the stochastic nature of random deployment calls for a statistical study.
The relation between the Euclidean distance and the network distance (in terms of the number of hops), also referred to as the hop-distance relation, has recently attracted considerable research interest. In [9], Huang et al. defined the Γ-compactness of a geometric graph $G(V, E)$ as the minimum ratio of the Euclidean distance to the network distance,
$$\gamma = \min_{i \neq j} \frac{d(i, j)}{h(i, j)},$$
where $d(i, j)$ and $h(i, j)$ are the Euclidean distance and network distance between nodes $i$ and $j$, respectively. The constant $\gamma$ is a good lower bound, but it may not be enough to describe the nonlinear relation between Euclidean distance and network distance. In fact, this relation is often treated as linear for convenience; for example, $\lfloor r/R \rfloor + 1$ is widely used to estimate the number of hops needed to reach distance $r$ given transmission range $R$. Contrary to this simple intuition, the relation between Euclidean distance and network distance is far more complex. Fortunately, many probabilistic studies have addressed this question. In [10], Hou and Li studied the 2D Poisson distribution to find an optimal transmission range. They found that the hop-distance distribution is determined not only by node density and transmission range but also by the routing strategy, and they showed results for three routing strategies: most forward with fixed radius, nearest with forward progress, and most forward with variable radius. Cheng and Robertazzi [11] studied the one-dimensional Poisson point distribution and derived the pdf of $r$ given the number of hops. They also pointed out that the 2D Poisson point distribution is analogous to the 1D case, with the length of the segment replaced by the area of the range. Vural and Ekici [12] reexamined the problem in the context of sensor networks, gave the mean and variance of the multihop distance for the 1D Poisson point distribution, and proposed approximating the multihop distance with a Gaussian distribution.

EURASIP Journal on Wireless Communications and Networking
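To make the Γ-compactness definition concrete, the sketch below computes $\gamma$ for a small point set by running a breadth-first search (BFS) from every node to obtain $h(i, j)$. It assumes a unit-disk connectivity model (two nodes are linked if and only if they are within range $R$); the helper names `hop_counts` and `gamma_compactness` are hypothetical and not taken from [9].

```python
import math
from collections import deque


def hop_counts(points, R, src):
    """BFS hop count h(src, j) in a unit-disk graph: nodes are linked iff within R."""
    hops = [None] * len(points)
    hops[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v, p in enumerate(points):
            if hops[v] is None and math.dist(points[u], p) <= R:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


def gamma_compactness(points, R):
    """Minimum of d(i, j) / h(i, j) over all pairs i != j that are connected."""
    gamma = math.inf
    for i, p in enumerate(points):
        for j, h in enumerate(hop_counts(points, R, i)):
            if j != i and h is not None:
                gamma = min(gamma, math.dist(p, points[j]) / h)
    return gamma


# Hypothetical example: four nodes on a line, transmission range 1.2.
print(gamma_compactness([(0, 0), (1, 0), (1.5, 0), (2.4, 0)], 1.2))
```

Because the minimum is taken over all pairs, a single pair of closely spaced neighbors drives $\gamma$ down, which illustrates why a constant bound alone cannot describe the full hop-distance relation.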
Zorzi and Rao [8] derive the mean number of hops of the minimum-hop-count route through simulations and analytic bounds. Chandler [13] derives an expression for the t-hop outage probability for a 2D Poisson node distribution. However, Mukherjee and Avidor [14] argue that one of Chandler's assumptions is relaxed, so his expression is in fact a lower bound on the desired probability. Using the same assumption, they also derive the pdf of the minimum number of hops for a given distance in a fading environment. Although these analytic results are available in the literature, their prohibitive computational complexity limits their applications. Therefore, in this paper we approximate the hop-distance relation to simplify the decision process and the error analysis. Because our target application is resource allocation, only large-scale path loss is considered and fading is ignored. The rest of this paper is organized as follows. The problem of predicting the number of hops is formulated and solved in Section 2. Because this problem has no closed-form solution, we propose an attenuated Gaussian approximation and show how it simplifies the error analysis in Section 2.1. Application examples are presented in Section 3. Section 4 concludes the paper.

ESTIMATION OF NETWORK DISTANCE BASED ON EUCLIDEAN DISTANCE
Suppose the sensor nodes are placed on a plane at random, and $N(A)$, the number of nodes in a given area $A$, follows a two-dimensional Poisson distribution with average density $\lambda$. The problem of interest is to find the number of hops needed to reach a node at distance $r$. We can make a maximum-likelihood (ML) decision,
$$\hat{n} = \arg\max_{n} f(r \mid H_n), \tag{1}$$
where the event $H_n$ is "the minimum number of hops from the source to the specific node at Euclidean distance $r$ is $n$." In the following discussion, we approximate $f(r \mid H_n)$ for the 2D Poisson distribution. Note that $r < R$ implies $H_1$, that is, the specific node is within one hop of the source. We are more interested in the multihop distance relation, especially when $n$ is moderately large.
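When $f(r \mid H_n)$ is only available empirically, as with the Monte Carlo histograms used in this paper, the ML rule above can be applied directly to histogram estimates. The sketch below is one possible way to do so, not the authors' procedure: it normalizes each hop count's histogram separately (so it estimates the conditional pdf) and picks the hop count with the largest density in the bin containing $r$. The function name, the fixed `bin_width`, and the toy data are hypothetical.

```python
from collections import defaultdict


def ml_hops_from_samples(samples, r, bin_width):
    """Histogram-based ML decision: return argmax_n of the estimate of f(r | H_n).

    samples: iterable of (distance, hop_count) pairs, e.g. from Monte Carlo runs.
    Each hop count's histogram is normalised separately, so it estimates the
    conditional pdf f(r | H_n). Returns None if no sample falls in r's bin.
    """
    by_hop = defaultdict(list)
    for dist, hops in samples:
        by_hop[hops].append(dist)

    target_bin = int(r // bin_width)
    best_n, best_density = None, 0.0
    for n in sorted(by_hop):
        dists = by_hop[n]
        count = sum(1 for d in dists if int(d // bin_width) == target_bin)
        density = count / (len(dists) * bin_width)
        if density > best_density:  # ties keep the smaller hop count
            best_n, best_density = n, density
    return best_n


# Hypothetical toy data: (distance in meters, minimum hop count).
toy = [(5, 1), (10, 1), (25, 1), (35, 2), (40, 2), (45, 2), (50, 2), (65, 3)]
print(ml_hops_from_samples(toy, 42, bin_width=10))
```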

Attenuated Gaussian approximation
Since $f(r \mid H_i)$ is awkward to evaluate even with numerical methods, we use histograms collected from Monte Carlo simulations as a substitute for the joint pdf. All simulation data are collected from a scenario in which $N$ sensor nodes were uniformly distributed in a circular region of radius $R_{\text{Bound}}$ meters. For convenience, polar coordinates were used, and the source node was placed at $(0, 0)$. The transmission range was set to $R$ meters. For each setting of $(N, R_{\text{Bound}}, R)$, we ran 300 simulations, in each of which all nodes were redeployed at random. We ran simulations for a wide range of node densities $\lambda$ and transmission ranges $R$. Due to space constraints, only the histograms for $(N = 1000, R_{\text{Bound}} = 200, R = 30)$ are plotted in Figure 1, which shows that $f(r \mid H_i)$ approaches a normal distribution as $i$ increases. Table 1 lists the first-, second-, third-, and fourth-order statistics of $f(r \mid H_i)$.

Skewness is a third-order statistic used to measure symmetry or, more precisely, the lack of symmetry. Skewness is zero for a symmetric distribution; positive skewness indicates a right-skewed distribution, while negative skewness indicates a left-skewed one.
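The Monte Carlo scenario described at the start of this subsection can be sketched as follows, under stated assumptions: nodes are linked if and only if they are within range $R$, and the minimum hop count is obtained by BFS from the source at the origin. Drawing the radius as $R_{\text{Bound}}\sqrt{U}$ with $U$ uniform on $[0, 1)$ gives a uniform density over the disk. The function names are hypothetical; this is an illustration, not the authors' simulator.

```python
import math
import random
from collections import deque


def simulate_deployment(N, R_bound, R, rng):
    """One run: N nodes uniform in a disk of radius R_bound, source at (0, 0).

    Returns (Euclidean distance to source, minimum hop count) for each node
    reachable from the source when links exist between nodes within range R.
    """
    nodes = [(0.0, 0.0)]  # index 0 is the source
    for _ in range(N):
        rho = R_bound * math.sqrt(rng.random())  # sqrt makes the density uniform over area
        theta = 2.0 * math.pi * rng.random()
        nodes.append((rho * math.cos(theta), rho * math.sin(theta)))

    hops = [None] * len(nodes)
    hops[0] = 0
    queue = deque([0])
    while queue:  # BFS yields the minimum hop count
        u = queue.popleft()
        for v, p in enumerate(nodes):
            if hops[v] is None and math.dist(nodes[u], p) <= R:
                hops[v] = hops[u] + 1
                queue.append(v)
    return [(math.hypot(x, y), h) for (x, y), h in zip(nodes[1:], hops[1:]) if h is not None]


def collect_samples(N, R_bound, R, runs, seed=0):
    """Pool (r, hop count) pairs over independent redeployments."""
    rng = random.Random(seed)
    samples = []
    for _ in range(runs):
        samples.extend(simulate_deployment(N, R_bound, R, rng))
    return samples


# Pure-Python BFS is O(N^2) per run, so this uses far fewer runs than 300.
samples = collect_samples(N=1000, R_bound=200, R=30, runs=2)
```

Grouping `samples` by hop count and histogramming the distances gives empirical counterparts of the curves in Figure 1.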
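Definition 1 (see [15]). For a given sample set $X$ of size $n$ with sample mean $\bar{X}$, let $m_2 = \sum (X - \bar{X})^2 / n$ and $m_3 = \sum (X - \bar{X})^3 / n$ be the second- and third-order moments of $X$ about its mean. Then a sample estimate of the skewness coefficient is given by
$$g_1 = \frac{m_3}{m_2^{3/2}}.$$

L. Zhao and Q. Liang

Kurtosis is a fourth-order statistic indicating whether the data are peaked or flat relative to a normal distribution.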
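Definition 2 (see [15]). A sample estimate of the kurtosis of a sample set $X$ is given by
$$g_2 = \frac{m_4}{m_2^{2}} - 3,$$
where $m_4 = \sum (X - \bar{X})^4 / n$ is the fourth-order moment of $X$ about its mean and $m_2 = \sum (X - \bar{X})^2 / n$ is the second-order moment.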
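The two sample statistics can be computed as below. This sketch assumes the simple moment-based forms (all moments divided by $n$, with 3 subtracted from the kurtosis so that both statistics are zero for normal data); other estimators with small-sample corrections exist and would give slightly different values. The function name is hypothetical.

```python
import random


def sample_skewness_kurtosis(xs):
    """Return (g1, g2): moment-based sample skewness and excess kurtosis.

    m_k = sum((x - mean)**k) / n; g1 = m3 / m2**1.5; g2 = m4 / m2**2 - 3,
    so both are close to zero for normally distributed data.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Sanity check on synthetic Gaussian data: both statistics should be near zero.
rng = random.Random(42)
g1, g2 = sample_skewness_kurtosis([rng.gauss(0.0, 1.0) for _ in range(100_000)])
```

Applied to the distances of the nodes at each hop count, these two numbers give a quick check of the Gaussianity condition.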
Skewness and kurtosis are useful in determining whether a sample set is normal. The skewness and kurtosis of a normal distribution are both zero, so significant skewness or kurtosis clearly indicates that the data are not normal. Table 1 shows that the skewness and kurtosis satisfy the Gaussianity condition within the tolerance of error. Furthermore, the postulated distributions and histograms are drawn together in Figures 2(a), 2(b), 2(c), and 2(d), which show a close match in each case. Also, since $f(r \mid H_n)$ attenuates exponentially as $n$ increases, we introduce an attenuation factor to model this behavior.
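Thus, $f(r \mid H_n)$ can be approximated by
$$f(r \mid H_n) \approx \frac{\alpha^{n}}{\sqrt{2\pi}\,\sigma_n} \exp\!\left(-\frac{(r - m_n)^2}{2\sigma_n^2}\right), \tag{2}$$
where $\alpha$ is the equivalent attenuation base, and $m_n$ and $\sigma_n$ are the mean and standard deviation (STD), respectively. Since $f(r \mid H_n)$ attenuates as $n$ increases, $\alpha$ must be less than 1.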
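A minimal sketch of the attenuated Gaussian model (a normal pdf with mean $m_n$ and STD $\sigma_n$, scaled by $\alpha^n$) and of an ML decision that takes the argmax of this model over candidate hop counts. The parameter values in the usage lines are hypothetical and are not fitted to the paper's simulations.

```python
import math


def attenuated_gaussian(r, n, alpha, m_n, sigma_n):
    """alpha**n times the normal pdf with mean m_n and STD sigma_n, evaluated at r."""
    z = (r - m_n) / sigma_n
    return alpha ** n * math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma_n)


def ml_decision(r, alpha, means, sigmas):
    """Hop count n = 1, 2, ... maximising the attenuated Gaussian model at r.

    means[k] and sigmas[k] are m_n and sigma_n for n = k + 1.
    """
    scores = [attenuated_gaussian(r, k + 1, alpha, m, s)
              for k, (m, s) in enumerate(zip(means, sigmas))]
    return 1 + max(range(len(scores)), key=scores.__getitem__)


# Hypothetical parameters (not fitted to the paper's data).
means, sigmas, alpha = [20.0, 45.0, 70.0], [8.0, 8.0, 8.0], 0.9
print(ml_decision(44.0, alpha, means, sigmas))
```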
The specific values of these parameters can be estimated from simulations or computed numerically from the exact pdfs. Our extensive simulations show that, even for moderately large $i$, $f(r \mid H_i)$ has the following properties.
(1) $\sigma_n \approx \sigma_{n-1}$, which means that neighboring joint pdfs have similar spread.
Combined with (1), this tells us that neighboring joint pdfs have nearly identical shape.
As the following discussion shows, these properties greatly simplify the decision rule and the error analysis. Another interesting observation, besides these properties, is that the following equations do not hold. Although these equations sound plausible, they all produce noticeable errors. The aforementioned estimator $\lfloor r/R \rfloor + 1$ for $H_i$, though widely used, performs poorly in light of this study.

Decision boundaries
Following (2), and observing $f(r \mid H_i)$ in Figure 3, a decision is needed only between neighboring hypotheses $H_i$, that is, Using property (1), For large density $\lambda$, property (5) applies, and (9) simplifies to Applying property (1) to (11), Regardless of which approximate solution we choose for $d_n$, the decision rule is given by In other words,
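As an illustration of how the equal-spread property turns the decision between neighboring hypotheses into a single threshold, the sketch below solves $\alpha^{n-1}\mathcal{N}(d; m_{n-1}, \sigma) = \alpha^{n}\mathcal{N}(d; m_n, \sigma)$ for $d$, which gives $d = \tfrac{1}{2}(m_{n-1} + m_n) + \sigma^2 \ln(1/\alpha)/(m_n - m_{n-1})$. This closed form is our own derivation from the attenuated Gaussian model under the assumption $\sigma_{n-1} = \sigma_n = \sigma$; it may not match the paper's own expressions for $d_n$ term by term. For $\alpha = 1$ it reduces to the midpoint of the two means. Function names are hypothetical.

```python
import bisect
import math


def boundary(m_prev, m_next, sigma, alpha):
    """r at which alpha**(n-1) N(r; m_prev, sigma) equals alpha**n N(r; m_next, sigma).

    Assumes equal STDs for the two neighbouring hypotheses.
    """
    return 0.5 * (m_prev + m_next) + sigma ** 2 * math.log(1.0 / alpha) / (m_next - m_prev)


def thresholds(means, sigma, alpha):
    """Boundaries between hop counts k and k + 1 for k = 1, 2, ...; means[k] is m_{k+1}."""
    return [boundary(means[k - 1], means[k], sigma, alpha) for k in range(1, len(means))]


def decide(r, bounds):
    """Hop count n such that r lies between the (n-1)th and nth boundary."""
    return 1 + bisect.bisect_right(bounds, r)


# Hypothetical parameters: with alpha < 1 each boundary shifts past the midpoint.
bounds = thresholds([20.0, 45.0, 70.0], sigma=8.0, alpha=0.9)
print(bounds, decide(44.0, bounds))
```

Because $\alpha < 1$, the lower hop count carries more weight, so its decision region extends slightly beyond the midpoint between the two means.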

Error performance analysis
For our decision rule, a decision error occurs only when the required number of hops is $n$ but our decision is $\hat{n} \neq n$. Thus, the probability of error for a specific $r$ is where $f(H \mid r)$ is related to $f(r \mid H_i)$ by Bayes' rule. The total probability of error is obtained by integrating (15) over all possible $r$. According to the properties above, (17) is approximated by Substituting an appropriate solution for $d_n$ into (19) gives the probability of error to the required accuracy. For example, if we choose (12), Thanks to the Gaussian approximation, the error probability is expressed in terms of Q functions, which is far simpler than a derivation from the original pdfs. This error analysis procedure is general and applicable to other estimators. For example, even when a linear estimator must be used because of limited computational capacity, the above procedure still yields the corresponding error probability.
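The following sketch shows how Gaussian hypotheses and interval decision regions reduce the error probability to a sum of Q functions. It assumes each hypothesis $H_n$ is modeled as a normal distribution with mean $m_n$ and common STD $\sigma$, that the decision regions are separated by given thresholds, and that the hypothesis weights (`priors`) are supplied by the user; all names are hypothetical. Because the thresholds are inputs, the same function also evaluates other interval-based estimators, such as the linear one.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probability(means, sigma, bounds, priors):
    """Total decision-error probability for hop counts n = 1..len(means).

    means[k] is m_n for n = k + 1; bounds[k] separates hop counts k + 1 and k + 2;
    priors[k] is an (unnormalised) weight for H_{k+1}. Each hypothesis is modelled
    as N(m_n, sigma**2); an error occurs when r falls outside its decision cell.
    """
    total = float(sum(priors))
    p_err = 0.0
    for k, (m, w) in enumerate(zip(means, priors)):
        lower = bounds[k - 1] if k > 0 else -math.inf
        upper = bounds[k] if k < len(bounds) else math.inf
        miss_low = q_function((m - lower) / sigma)   # P(r < lower | H_n)
        miss_high = q_function((upper - m) / sigma)  # P(r > upper | H_n)
        p_err += (w / total) * (miss_low + miss_high)
    return p_err


# Hypothetical example: three hop counts with equal weights, sigma = 8 m.
print(error_probability([20.0, 45.0, 70.0], 8.0, [32.5, 57.5], [1, 1, 1]))
```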

APPLICATION EXAMPLES
We provide two application examples in this section: latency estimation and energy estimation. To emphasize the role of the number of hops, we use generic time and energy models. For how to derive parameters such as $T_{rx}$ and $T_{tx}$ for a specific routing scheme, readers are referred to [16, 17].

Latency estimation
We use a simple time model in which the latency increases linearly with the number of hops [18]. Suppose it takes $T_{rx}$ and $T_{tx}$ for a sensor node to process 1 bit of incoming and outgoing message, respectively, and $T_{pr}$ is the time required to transmit 1 bit of message through a band-limited channel. Therefore, the latency introduced by each hop is As shown in Figure 4, given the end-to-end distance $r$, we can find the required number of hops according to (13); thus, a good estimator of the total latency of an $l$-bit message is
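A minimal sketch of a latency model that is linear in the estimated hop count $\hat{n}$. It assumes (our assumption, not a statement of the paper's equations) that the per-bit delay of one hop is $T_{\text{hop}} = T_{rx} + T_{tx} + T_{pr}$ and that each hop delays all $l$ bits of the message, so the total is $\hat{n}\, l\, T_{\text{hop}}$. The timing values are hypothetical.

```python
def hop_latency(t_rx, t_tx, t_pr):
    """Assumed per-bit delay of one hop: receive processing + transmit processing + channel."""
    return t_rx + t_tx + t_pr


def total_latency(n_hat, l_bits, t_hop):
    """Assumed store-and-forward model: each of the n_hat hops delays all l_bits by t_hop."""
    return n_hat * l_bits * t_hop


# Hypothetical timing values (seconds per bit) and an estimated hop count of 3.
t_hop = hop_latency(t_rx=1e-6, t_tx=1e-6, t_pr=4e-6)
print(total_latency(n_hat=3, l_bits=1000, t_hop=t_hop))
```

Because the latency is proportional to $\hat{n}$, any error in the hop-count estimate translates directly into a proportional latency error.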

Energy consumption estimation
The following model, in which perfect power control is assumed, is adopted from [19]. To transmit $l$ bits over distance $r$, the sender's radio expends
$$E_{Tx}(l, r) = \begin{cases} l E_{elec} + l\,\varepsilon_{fs}\,r^{2}, & r < d_0, \\ l E_{elec} + l\,\varepsilon_{mp}\,r^{4}, & r \ge d_0, \end{cases} \tag{23}$$
and the receiver's radio expends
$$E_{Rx}(l) = l E_{elec}. \tag{24}$$
Here $E_{elec}$ is the unit energy consumed by the electronics to process one bit of message, $\varepsilon_{fs}$ and $\varepsilon_{mp}$ are the amplifier factors for the free-space and multipath models, respectively, and $d_0$ is the reference distance that determines which model to use. The first branch of (23) assumes free-space propagation, and the second branch uses a path-loss exponent of 4. The values of these communication energy parameters are set as in Table 2. Let $s_n$ denote the single-hop distance from the $(n-1)$th hop to the $n$th hop; clearly, $s_n \le R$. In our experimental setting, $R = 30$ m $< d_0$, so the free-space model is always used. This agrees well with most applications, in which multihop short-range transmission is preferred to avoid the steep (fourth-power) increase in energy consumption of long-range transmission. Naturally, the end-to-end energy consumption for sending $l$ bits over distance $r$ is given by where $\hat{n}$ is the estimated number of hops for the given $r$ and $r_1$ is the single-hop distance, because the message is relayed hop by hop.
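The sketch below implements the two-branch radio model above and sums one transmission and one reception per hop along a relay path. The parameter values are assumed for illustration only and are not necessarily those of Table 2; choosing $d_0 = \sqrt{\varepsilon_{fs}/\varepsilon_{mp}}$ (where the two amplifier terms are equal) and splitting $r$ into $\hat{n}$ equal hops are hypothetical simplifications, not the paper's method.

```python
import math


def e_tx(l_bits, d, e_elec, eps_fs, eps_mp, d0):
    """Two-branch radio model: electronics plus amplifier energy to send l_bits over d."""
    if d < d0:
        return l_bits * (e_elec + eps_fs * d ** 2)  # free space, path-loss exponent 2
    return l_bits * (e_elec + eps_mp * d ** 4)      # multipath, path-loss exponent 4


def e_rx(l_bits, e_elec):
    """Receiver electronics energy for l_bits."""
    return l_bits * e_elec


def end_to_end_energy(l_bits, hop_distances, e_elec, eps_fs, eps_mp, d0):
    """One transmission and one reception per hop along the relay path."""
    return sum(e_tx(l_bits, s, e_elec, eps_fs, eps_mp, d0) + e_rx(l_bits, e_elec)
               for s in hop_distances)


# Assumed illustrative parameters (not necessarily the values in Table 2).
E_ELEC, EPS_FS, EPS_MP = 50e-9, 10e-12, 0.0013e-12  # J/bit, J/bit/m^2, J/bit/m^4
D0 = math.sqrt(EPS_FS / EPS_MP)                      # about 87.7 m, where the branches meet
# Hypothetical simplification: r = 75 m relayed over n_hat = 3 equal hops.
r, n_hat = 75.0, 3
energy = end_to_end_energy(2000, [r / n_hat] * n_hat, E_ELEC, EPS_FS, EPS_MP, D0)
```

With short hops the electronics term dominates, so the energy grows roughly linearly with the number of hops, which is why a good hop-count estimate matters here as well.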

Simulation
We used the same scenario described in Section 2.1 and varied the node density $\lambda$ and transmission range $R$. In each simulation, the number of hops is estimated for each node using (11) and (13), and the latency and energy consumption are then estimated using (22) and (26), respectively. For comparison with our proposed statistics-based estimator, we choose a widely used linear estimator, linear estimator 1: $\hat{n} = \lfloor r/R \rfloor + 1$, where $r$ is the given distance, $R$ is the transmission range, and $\lfloor r/R \rfloor$ is the largest integer not greater than $r/R$. We plot the average latency and energy consumption in Figures 5(a) and 5(b) and the RMSE in Figures 6(a) and 6(b), respectively. Latency is plotted in units of $T_{\text{hop}}$ and energy consumption in joules. The ripple shape of the RMSE curves arises because decision errors occur more often in the overlapping zones of neighboring $f(r \mid H_i)$. Figure 5 shows that linear estimator 1 performs well at shorter ranges but suffers noticeably at larger ranges, while the other linear estimator does the opposite. The linear estimators, regardless of their parameter values, may significantly underestimate or overestimate latency and energy consumption, as already pointed out in Section 2.1, while our statistics-based model stays close to the actual latency and energy consumption at all ranges except near the border. This is also verified by Figure 6, which shows that our model reduces the RMSE by at least half for both latency and energy consumption. These results show that linear models cannot capture network behavior accurately, as also confirmed by our extensive simulations for other settings of node density and transmission range, which are not shown here due to space constraints.
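The RMSE comparison can be sketched as below: for each distance bin, compare an estimator's hop count with the true minimum hop count and compute the root-mean-square error. Binning by distance is what exposes the ripple pattern discussed above. This assumes latency is proportional to the hop count (so latency error in units of $T_{\text{hop}}$ equals hop-count error for a fixed message size); the function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def linear_estimator_1(r, R):
    """Baseline hop estimate floor(r / R) + 1."""
    return math.floor(r / R) + 1


def rmse(estimates, actual):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimates, actual)) / len(actual))


def rmse_by_range(samples, estimator, bin_width):
    """RMSE of estimator(r) against the true hop count, per distance bin.

    samples: (r, true hop count) pairs; keys of the result are bin start distances.
    """
    bins = defaultdict(lambda: ([], []))
    for r, hops in samples:
        est, act = bins[int(r // bin_width)]
        est.append(estimator(r))
        act.append(hops)
    return {b * bin_width: rmse(est, act) for b, (est, act) in sorted(bins.items())}


# Hypothetical toy data: (distance in meters, true minimum hop count), R = 30 m.
toy = [(5, 1), (25, 1), (35, 2), (50, 2), (55, 3)]
print(rmse_by_range(toy, lambda r: linear_estimator_1(r, 30), bin_width=30))
```

Passing a different `estimator` callable, for example one built from the threshold decision rule, allows both models to be compared on the same simulated samples.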

CONCLUSION
To address the fundamental problem "how many hops does it take to relay a packet over a given distance," we conducted both probabilistic and statistical studies. We proposed a Bayesian decision based on the conditional pdf $f(r \mid H_i)$.
Since $f(r \mid H_i)$ is computationally complex, we also proposed an attenuated Gaussian approximation for the conditional pdf, which noticeably simplifies the decision process and the error analysis. This Gaussian-based error analysis is also applicable to other estimators, including linear ones. We also showed that several linear models, though intuitively sound and widely used, may introduce significant bias. As application examples, we applied our approximation to latency and energy consumption estimation in dense WSNs. Simulations show that our approximation model predicts latency and energy consumption with less than half the RMSE of the aforementioned linear models.