Orthogonal Regression Based Multihop Localization Algorithm for Large-Scale Underwater Wireless Sensor Networks

For large-scale underwater wireless sensor networks (UWSNs) with a minority of anchor nodes, multihop localization is a popular scheme for determining the geographical positions of normal nodes. However, existing multihop localization studies have considered the anchor positions to be free of errors, which is not a valid assumption in practice. In this paper, the problems of nonlinear least squares-based node self-localization schemes are analyzed, and the biased distribution of multihop distance estimation errors is pointed out. Then, the orthogonal regression method is employed for the localization of normal nodes in the presence of anchor position errors. In particular, the influences of errors in independent variables and biases in dependent variables on node coordinate estimation are taken into account simultaneously. Extensive simulation results illustrate the robustness and effectiveness of our method.


Introduction
Underwater wireless sensor networks (UWSNs) have attracted rapidly growing interest from researchers during the last few years. Owing to their easy deployment, self-management, and lack of infrastructure requirements, UWSNs can be applied to a wide range of tasks, such as naval surveillance, earthquake and tsunami forewarning, climate and ocean observation, and water pollution tracking. In these applications, each node needs to collaborate with others in sensing events of interest by exchanging acquired data. To make the data collected from sensor nodes meaningful, the positions of the related nodes are often required. In recent years, various node localization algorithms for UWSNs have been proposed; comprehensive surveys are provided in [1][2][3] and the references therein.
The task of WSN node localization is to determine the positions of normal nodes based on knowledge of the anchor positions and internode distance measurements. Currently, most UWSN localization studies assume that the anchor positions are perfectly known and take only the distance estimation errors into account. This, however, is not the case for UWSNs in practice because of the complex environments [4]. Since the positions of anchor nodes are often obtained through GPS receivers or manual configuration at fixed places, they usually suffer from some uncertainty. A typical example arises in climate and ocean observation, where some anchor nodes may be moved by water. In such cases, the anchor positions may be subject to errors. Therefore, finding the positions of normal nodes in UWSNs with the use of inaccurate anchor positions is an important and challenging research topic [5][6][7].
In addition, the uncertainty of multihop distance measurements also has an important influence on localization accuracy [8][9][10][11]. Most existing approaches assume that the distance measurements are affected by normally distributed noise and determine the distance uncertainty through Monte Carlo analysis or other conventional statistical techniques [12,13]. However, these methods perform well only when the measurement sample size is large and the hypothesis that the measurement noise obeys a Gaussian distribution is satisfied.
In this paper, we mainly concentrate on the scenarios of multihop localization. We first point out the main problems of traditional multihop localization schemes, namely, the failure to consider the errors in independent variables and the biases in dependent variables. Through theoretical and empirical analysis, we find that multihop distance estimation errors are distributed with various biases. We then construct a 3D multihop localization model based on the idea of orthogonal regression and give a simplified scheme for solving for the optimal node coordinates. The final simulation results indicate that our method is robust against both anchor position errors and ranging errors.
The remainder of this paper is organized as follows. Section 2 formulates the multihop localization problem. Section 3 presents the details of our method. Section 4 evaluates the performance of our method through simulations. Finally, Section 5 concludes this paper.

Problem Formulation
Through multihop information exchange, a normal node can obtain multihop distance estimates to all the anchor nodes. Then, the localization of the node can be regarded as the parameter estimation problem of the following nonlinear regression model:

$$ d = \lVert x - \theta \rVert_2 + \varepsilon, $$

where $\theta$ is the regression parameter, that is, the unknown coordinates of the normal node. $x$ is the independent variable, and its observations are the anchors' declared coordinates $\{a_1, a_2, \ldots, a_m\}$.
$d$ is the dependent variable, and its observations are the estimated distances $\{\hat{d}_1, \hat{d}_2, \ldots, \hat{d}_m\}$ between the normal node and the anchor nodes. $\varepsilon$ is the distance estimation error.
A direct and commonly used method for solving for the unknown parameter $\theta$ is the Nonlinear Least Squares Estimator (NLSE):

$$ \hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{m} \left( \hat{d}_i - \lVert a_i - \theta \rVert_2 \right)^2, $$

where $\hat{\theta}$ is the estimated coordinates of the normal node.
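To make the NLSE concrete, the following is a minimal Python sketch that minimizes the sum of squared differences between the estimated distances and the ranges to the declared anchor coordinates. It uses a plain Gauss-Newton iteration; the solver choice, the function names `nlse_localize` and `solve3`, and the sample anchors are illustrative assumptions, not details given in the paper.

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x

def nlse_localize(anchors, dists, init, iters=50, tol=1e-9):
    """Gauss-Newton minimisation of sum_i (d_i - ||a_i - theta||)^2 in 3D."""
    theta = list(init)
    for _ in range(iters):
        JtJ = [[0.0] * 3 for _ in range(3)]
        Jtr = [0.0] * 3
        for a, d in zip(anchors, dists):
            diff = [theta[k] - a[k] for k in range(3)]
            rng = math.sqrt(sum(c * c for c in diff)) or 1e-12
            r = d - rng                      # distance residual for this anchor
            g = [c / rng for c in diff]      # gradient of the range w.r.t. theta
            for p in range(3):
                Jtr[p] += g[p] * r
                for q in range(3):
                    JtJ[p][q] += g[p] * g[q]
        step = solve3(JtJ, Jtr)              # normal equations of the linearised problem
        theta = [theta[k] + step[k] for k in range(3)]
        if max(abs(s) for s in step) < tol:
            break
    return theta

# Hypothetical example: five error-free anchors and exact distances.
anchors = [(0, 0, 0), (100, 0, 0), (0, 100, 0), (0, 0, 100), (100, 100, 100)]
truth = (30.0, 40.0, 50.0)
dists = [math.dist(a, truth) for a in anchors]
estimate = nlse_localize(anchors, dists, init=(50.0, 50.0, 50.0))
```

Note that the sketch treats the anchor coordinates as exact, which is precisely the assumption questioned in this paper.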

Discussion of NLSE Method.
When the position information provided by anchor nodes is accurate and the distance estimation error follows the zero-mean Gaussian distribution, NLSE is a statistically efficient realization of the Maximum Likelihood Estimator (MLE) for the coordinates of the normal node. However, in practice, the preconditions for NLSE as an optimal estimator are not satisfied, which is mainly embodied in the following two aspects.
In the geometric sense, the objective of NLSE is to minimize the sum of squares of the vertical distances from the data points to the fitted curve (or curved surface). It takes account only of the errors in the dependent variables, while ignoring the errors in the independent variables. When both kinds of errors exist simultaneously, the fitting results and stability of NLSE are relatively poor. As discussed in the previous section, the anchor positions are inaccurate. For model (2), these inaccuracies are reflected in the independent variables. If we do not address this issue, we cannot get credible localization results.
In multihop scenarios, there are two sources of distance estimation errors.
(1) Random Errors. Due to environmental noise, the distance measurements between pairs of neighboring nodes suffer from certain random uncertainties, namely, random errors (or ranging errors). According to experimental results, the random errors are normally distributed. This can be further explained by invoking the central limit theorem.
(2) System Errors. By approximating the length of the shortest path to the Euclidean distance, multihop localization schemes can infer the distances between any pairs of nonneighboring nodes, which causes the system errors (or multihop cumulative errors).
Since the system errors are affected by how strongly the multihop paths bend, they are usually larger than the ranging errors, especially in irregular networks. In this paper, the random and system errors are collectively called multihop distance estimation errors.
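The system error can be illustrated with a short Python sketch that computes the shortest-path length over the links allowed by a communication radius and compares it with the Euclidean distance. The function name `multihop_distances`, the use of Dijkstra's algorithm, and the three-node layout are assumptions made for illustration; ranging noise is left out so that only the system error appears.

```python
import heapq
import math

def multihop_distances(positions, radius, source):
    """Shortest-path length from `source` to every node over links no longer than `radius`."""
    n = len(positions)
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v in range(n):
            if v == u:
                continue
            link = math.dist(positions[u], positions[v])
            if link <= radius and d + link < dist[v]:
                dist[v] = d + link
                heapq.heappush(heap, (dist[v], v))
    return dist

# Hypothetical layout: node 2 is out of range of node 0 and is reached via node 1.
nodes = [(0, 0, 0), (10, 0, 0), (10, 10, 0)]
path_length = multihop_distances(nodes, radius=12.0, source=0)[2]
system_error = path_length - math.dist(nodes[0], nodes[2])
```

In this layout the two-hop path has length 20 while the straight-line distance is about 14.14, so the system error is about 5.86 and is positive.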
In most WSN localization studies, the distribution of ranging errors is assumed to be unbiased. In fact, this is not the case. Even if the ranging error follows the ideal zero-mean normal distribution, the mean of the multihop distance estimation errors may not be equal to zero, as shown by the empirical results in the following section. Therefore, when we estimate the coordinates of normal nodes, the distance estimation bias should not be ignored.

Analysis of Distance Estimation Bias.
In this part, we give an empirical analysis of the multihop distance estimation bias (the mean of the multihop distance estimation errors) for 3D UWSNs. In the simulation experiments, sensor nodes with an adjustable communication range are randomly distributed in a 200 × 200 × 200 spatial region. The ranging error follows the zero-mean Gaussian distribution $e \sim N(0, \sigma_r^2)$. Figure 2 shows the statistical results of the normalized multihop distance estimation bias for different values of network connectivity and $\sigma_r$. Both network connectivity and $\sigma_r$ affect the bias, with the former having the larger effect. Since a broken line is never shorter than the straight line between its endpoints, the multihop distance estimation errors usually have a positive bias. When the network connectivity is small, the approximation of the Euclidean distance by the length of the shortest path is the main source of distance estimation errors, and the bias is larger. In contrast, if nodes are densely deployed, an approximately straight multihop path is likely to exist between any pair of nodes, and the bias is usually smaller. With the increase of $\sigma_r$, the influence of ranging errors on multihop distance estimation rises, and the bias gradually decreases to the mean of the ranging errors. Generally, the bias is no more than half of the nodes' communication radius.
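An experiment of this kind can be sketched in Python as below. The sketch draws random nodes in a cube, perturbs each one-hop range with zero-mean Gaussian noise, runs an all-pairs shortest-path computation, and averages the error over non-neighboring pairs. The function name `empirical_bias`, the node count, the seed, the clamping of negative ranges to zero, and the use of the Floyd-Warshall algorithm are assumptions; the paper's own simulation settings are not reproduced here.

```python
import math
import random

def empirical_bias(n_nodes, radius, sigma, side=200.0, seed=0):
    """Mean of (multihop distance estimate - true distance) over non-neighbouring pairs."""
    rng = random.Random(seed)
    pts = [tuple(rng.uniform(0.0, side) for _ in range(3)) for _ in range(n_nodes)]
    true = [[math.dist(p, q) for q in pts] for p in pts]
    est = [[math.inf] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        est[i][i] = 0.0
        for j in range(i + 1, n_nodes):
            if true[i][j] <= radius:
                # Noisy one-hop range; clamped at zero so path lengths stay non-negative.
                est[i][j] = est[j][i] = max(0.0, true[i][j] + rng.gauss(0.0, sigma))
    # Floyd-Warshall: shortest path lengths built from the measured one-hop ranges.
    for k in range(n_nodes):
        for i in range(n_nodes):
            for j in range(n_nodes):
                if est[i][k] + est[k][j] < est[i][j]:
                    est[i][j] = est[i][k] + est[k][j]
    errors = [est[i][j] - true[i][j]
              for i in range(n_nodes) for j in range(i + 1, n_nodes)
              if true[i][j] > radius and est[i][j] < math.inf]
    return sum(errors) / len(errors) if errors else float("nan")

# Hypothetical settings: 60 nodes, communication radius 80, noise-free ranging.
bias_without_noise = empirical_bias(60, 80.0, 0.0)
```

With noise-free ranging, every multihop estimate is at least the true distance, so the returned mean cannot be negative; this is the positive bias caused purely by path bending.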

Orthogonal Regression Method
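We formulate the localization of the normal node as the following CWOR model:

$$ \min_{\beta} \sum_{i=1}^{m} \left[ w_a \lVert \delta_i \rVert_2^2 + w_d \left( \hat{d}_i - d_i - b \right)^2 \right] \quad \text{s.t. } \theta_L \le \theta \le \theta_U,\ b \le b_U, \tag{3} $$

where $\beta = [\delta_1^T, \ldots, \delta_m^T, \theta^T, b]^T \in \mathbb{R}^{(3m+4)\times 1}$ is the regression vector, $\delta_i$ is the position error of the $i$th anchor, $\theta$ is the coordinate vector of the normal node, and $b$ is the distance estimation bias. $d_i = \lVert (a_i - \delta_i) - \theta \rVert_2$ is the Euclidean distance between the corrected position of the $i$th anchor and the normal node. $w_a$ and $w_d$ are the weights of the scalar square terms. $\theta_L$ and $\theta_U$ are, respectively, the lower and upper bounds of $\theta$'s feasible region. In Section 3.1, we restrict the feasible region of $\theta$ to a smaller bounding cube. $b_U$ is the upper bound of $b$. By restricting the range of $\theta$ and $b$, we can prevent convergence to a local optimum and ensure the accuracy of coordinate estimation.
In the framework of MLE, the values of the weights $w_a$ and $w_d$ are $w_a = 1/\sigma_a^2$ and $w_d = 1/\sigma_d^2$, respectively, where $\sigma_a$ and $\sigma_d$ are the standard deviations of the anchor position errors and of the multihop distance estimation errors. As $\sigma_d$ is unknown, we cannot get the exact value of $w_d$. Here we give its approximate value $w_d = 1/(h\sigma_r)^2$, where $h$ is the hop count and $\sigma_r$ is the standard deviation of ranging errors. That means that the weight of a multihop distance estimate should decrease as the hop count increases. When $w_a$ and $w_d$ are equal, the objective of model (3) is to minimize the sum of squares of the orthogonal distances from the data points to the fitted curve (or surface). CWOR can lower the impact of errors in both independent and dependent variables on the parameter estimation, so it is an effective regression method.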
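As an illustration of how such a weighted objective could be evaluated, the Python sketch below sums, over the anchors, a weighted squared anchor correction and a weighted squared distance residual. The exact algebraic form used here (residual equal to estimated distance minus corrected range minus bias), the function name `cwor_objective`, and all argument names are assumptions inferred from the description of the weights and of the regression vector; the bound constraints are not enforced.

```python
import math

def cwor_objective(theta, bias, corrections, anchors, dists, hops, sigma_a, sigma_r):
    """Weighted sum of squared anchor corrections and squared distance residuals."""
    w_a = 1.0 / sigma_a ** 2                      # weight of the anchor-position terms
    total = 0.0
    for a, delta, d_hat, h in zip(anchors, corrections, dists, hops):
        w_d = 1.0 / (h * sigma_r) ** 2            # approximate weight, smaller for more hops
        corrected = [a[k] - delta[k] for k in range(3)]
        resid = d_hat - math.dist(corrected, theta) - bias
        total += w_a * sum(c * c for c in delta) + w_d * resid ** 2
    return total

# Hypothetical data: two anchors, exact distances, no anchor corrections.
anchors = [(3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
zero = [(0.0, 0.0, 0.0)] * 2
perfect = cwor_objective((0.0, 0.0, 0.0), 0.0, zero, anchors, [5.0, 5.0], [1, 2], 2.0, 1.0)
biased = cwor_objective((0.0, 0.0, 0.0), 1.0, zero, anchors, [5.0, 5.0], [1, 2], 2.0, 1.0)
```

In the second call the assumed bias of 1 leaves a unit residual at both anchors, and the two-hop anchor contributes only a quarter as much as the one-hop anchor because of its smaller weight.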

Determination of Feasible Region.
As seen from Figure 1, the feasible region of the normal node can be restricted to the intersection of spheres. The smaller the intersection is, the more accurately the node's coordinates can be pinpointed. In a sense, the size of the feasible region reflects the coordinate estimation accuracy of normal nodes.
However, the intersection of several spheres is difficult to determine because of its irregular shape. In addition, due to the combined uncertainties in anchor positions and multihop distance estimations, the spheres may not intersect. So the calculation of the feasible region can be classified into two cases.
(1) We denote by $\hat{d}_i$ the estimated distance between the normal node and the $i$th anchor, whose declared coordinates are $a_i = (x_i, y_i, z_i)$. In Figure 1, we use $\hat{d}_i$ as the radius and $a_i$ as the center to make spheres. If there is an overlapping area among all bounding spheres, then the feasible region FR can be easily determined by taking the maximum of the low coordinates and the minimum of the high coordinates of all bounding spheres:

$$ \mathrm{FR} = \left[ \max_i (x_i - \hat{d}_i),\ \min_i (x_i + \hat{d}_i) \right] \times \left[ \max_i (y_i - \hat{d}_i),\ \min_i (y_i + \hat{d}_i) \right] \times \left[ \max_i (z_i - \hat{d}_i),\ \min_i (z_i + \hat{d}_i) \right]. $$

(2) There may be no intersection among all bounding spheres in the first case. So we let the communication distance be the radius; then the bounding spheres all intersect. We still use the expression of the first case, replacing the radius $\hat{d}_i$ with the communication distance.
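The two cases can be sketched in Python as follows. Each sphere is enclosed in an axis-aligned box, the boxes are intersected by taking the maximum of the low coordinates and the minimum of the high coordinates, and a second list of radii is tried when the first intersection is empty. The function names, the `fallback_radii` argument (how these radii are derived from the communication distance is left to the caller), and the returned box center are illustrative assumptions.

```python
def bounding_box(centers, radii):
    """Intersect the axis-aligned boxes enclosing each sphere; return None if empty."""
    low = [max(c[k] - r for c, r in zip(centers, radii)) for k in range(3)]
    high = [min(c[k] + r for c, r in zip(centers, radii)) for k in range(3)]
    if any(lo > hi for lo, hi in zip(low, high)):
        return None
    return low, high

def feasible_region(centers, est_dists, fallback_radii):
    """Case (1) with estimated distances, else case (2) with the fallback radii."""
    box = bounding_box(centers, est_dists) or bounding_box(centers, fallback_radii)
    if box is None:
        raise ValueError("bounding boxes do not overlap for either set of radii")
    low, high = box
    center = [(lo + hi) / 2 for lo, hi in zip(low, high)]  # rough initial position
    return low, high, center

# Hypothetical example with two anchors 10 units apart.
low, high, center = feasible_region([(0, 0, 0), (10, 0, 0)], [6, 6], [8, 8])
```

A non-empty box is only a necessary condition for the spheres themselves to overlap, so the box is a conservative outer bound on the feasible region.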

International Journal of Distributed Sensor Networks
After getting the range of FR, we can set $\theta_L = [x_L, y_L, z_L]$ as the lower limit of the feasible region and $\theta_U = [x_U, y_U, z_U]$ as the upper limit. Additionally, we can estimate a rough position of the normal node by using the center of FR:

$$ \theta_0 = \frac{\theta_L + \theta_U}{2}. $$

Simplified Methods for Solving CWOR.
Based on model (3), the objective function consists of $(3 + 1) \times m = 4m$ scalar square terms, and the number of scalar variables in the regression vector is $3m + 4$, where $m$ is the number of anchors. In NLSE, the numbers of scalar square terms and regression parameters are only $m$ and 3. With the introduction of anchor position errors, the size of the CWOR problem increases markedly, and a direct use of the traditional optimization methods for solving the NLSE problem cannot be recommended for solving (3) unless the problem is small. In this part, referring to the idea in [14], we adopt a simplified scheme for solving the CWOR problem. Firstly, set where . . .
Then, we get the estimated coordinates of the normal node, denoted as $\hat{\theta}$.

Discussion of Our Method.
In the framework of MLE, the proposed method for node coordinate estimation is suboptimal because we approximate the variance $\sigma_d^2$ of the multihop distance estimation errors by $(h\sigma_r)^2$. The calculation of the optimal value of model (8) involves iteration, and the amount of computation mainly depends on the number of iterations. Intuitively, when higher accuracy is required, we can set the three iteration thresholds to smaller values, which may increase the computational cost. Conversely, when a lower computational cost is required, we should raise the threshold values. In this section, the three thresholds are collectively called the threshold, and they are set to an equal value. Table 1 shows empirical data on the relationship among the threshold, the localization error, and the computation cost, for reference only. The reference value is the localization error with the threshold equal to 1. In most cases, the number of iterations needed to reach the optimum is no more than 15. Thus, the computation cost of our method is acceptable. In most cases, a threshold of 0.1 gives a good tradeoff between the computation cost and the localization accuracy.

Performance Evaluation
In this section, we conduct simulations to study the performance of our proposed method. To reduce the influence of outliers, we take the average of 100 simulation runs as the final data points. The default simulation parameters are shown in Table 2. We mainly discuss the Average Localization Error (ALE) under different network connectivity and standard deviations. The ALE is normalized by the communication radius $R$:

$$ \mathrm{ALE} = \frac{1}{nR} \sum_{i=1}^{n} \lVert \hat{\theta}_i - \theta_i \rVert_2 \times 100\%, $$

where $n$ is the number of normal nodes, and $\hat{\theta}_i$ and $\theta_i$ are the estimated and true coordinates of the $i$th normal node.
Figure 3 shows the performance of the two methods with different standard deviations $\sigma_a$ of anchor position errors. As seen from Figure 3, $\sigma_a$ affects both methods, and NLSE is the more affected. Compared with NLSE, our method improves the localization accuracy by at least 14% in each case. When $\sigma_a = 8$, the improvement reaches 26%. Generally speaking, our method is robust with respect to anchor position errors and yields much better results than NLSE.

Impact of Ranging Errors.
Figure 4 shows the comparison results of ALE under different standard deviations $\sigma_r$ of ranging errors. With the increase of $\sigma_r$, the accuracies of both methods drop gradually.
When $\sigma_r \le 6$, the accuracy of our method remains generally stable, with an ALE of about 25%-27%. When $\sigma_r$ increases to 7, the ALE of our method is only 30% while that of NLSE is close to 50%. After that, our method produces an average error slightly above 40% when $\sigma_r = 9$, but this is still 13% lower than that of NLSE.
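For reference, the ALE values quoted in this section follow a simple computation: the mean Euclidean error over the normal nodes, divided by the communication radius. A minimal Python sketch, in which the function name and the sample coordinates are assumptions, is:

```python
import math

def average_localization_error(estimates, truths, comm_radius):
    """Mean Euclidean localization error, normalised by the communication radius."""
    errors = [math.dist(e, t) for e, t in zip(estimates, truths)]
    return sum(errors) / (len(errors) * comm_radius)

# Hypothetical example: one node is off by 5 units, the other is exact.
ale = average_localization_error(
    estimates=[(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)],
    truths=[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    comm_radius=10.0,
)
```

Here the mean error is 2.5 units, so with a radius of 10 the result is 0.25, which corresponds to an ALE of 25%.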

Comparison under Anisotropic Network.
In this part, we discuss the adaptability of NLSE and our method to the network topology. Figure 5 shows a typical anisotropic network in which 200 nodes are randomly deployed in a C-shape spatial area. Its network connectivity is 12. Figure 6 shows the accuracy comparison of the two methods in the C-shape network.
Since the shortest path between distant nodes in irregular networks is generally more winding than that in uniform networks, the ALEs of both methods are larger than those in Figure 3. However, our method always performs much better than NLSE. For example, the ALE of our method when the standard deviation is 1 is 30.6%, while the corresponding ALE of NLSE is about 48%, which is a marked improvement. Therefore, our method adapts better to irregular network topologies.

Impact of Network Connectivity.
In this subsection, we simulate an improved localization algorithm named Taylor-LS [9] for comparison and analyze the ALE under different network connectivity. We control the network connectivity by changing the transmission range while keeping the deployment area the same. Figure 7 plots the relationship between the ALE and the network connectivity.
We can observe that the ALE of our scheme decreases significantly with the increase of network connectivity when the anchor percentage is 10%. It should be noted that our scheme can achieve relatively high localization accuracy even with low network connectivity. This indicates the good localization performance of our proposed scheme in sparse regions.
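The way the transmission range controls the network connectivity can be sketched in Python as below, under the assumption that network connectivity means the average number of one-hop neighbors per node. That interpretation, the function name `average_degree`, and the three-node layout are assumptions made for illustration.

```python
import math

def average_degree(positions, radius):
    """Average number of neighbours within `radius` of each node."""
    n = len(positions)
    links = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(positions[i], positions[j]) <= radius
    )
    return 2.0 * links / n  # every link is counted once but serves two nodes

# Hypothetical layout: enlarging the range adds the link between nodes 0 and 2.
nodes = [(0, 0, 0), (10, 0, 0), (10, 10, 0)]
sparse = average_degree(nodes, radius=12.0)
dense = average_degree(nodes, radius=15.0)
```

With the deployment area fixed, a larger transmission range can only add links, so the average degree never decreases as the range grows.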

Conclusions
Due to the uncertainties in anchor positions and the bias in multihop distance estimates, UWSN multihop localization needs to take both the independent variable errors and the dependent variable biases into account. In this paper, we address these issues and give an anchor position error-tolerant multihop localization method based on orthogonal regression for UWSNs. Through extensive simulations, we demonstrate that the proposed method gives more accurate results in various environments. In most cases, compared with the conventional nonlinear least squares methods that ignore anchor position errors and distance estimation biases, our method improves the localization accuracy by at least 10%. In irregular networks, the advantage of our method is more obvious. Beyond improving the tolerance to anchor position errors, reducing the adverse influence of the anchor ratio and anchor placement on multihop localization is a main topic of our further study.