Improved Location Estimation in Wireless Sensor Networks Using a Vector-Based Swarm Optimized Connected Dominating Set

Location estimation in wireless sensor networks (WSNs) has received tremendous attention in recent times. Improved technology and efficient algorithms systematically empower WSNs with precise location identification. However, while algorithms are efficient in improving the location estimation error, the factor of the network lifetime has not been researched thoroughly. In addition, algorithms are not optimized in balancing the load among nodes, which reduces the overall network lifetime. In this paper, we have proposed an algorithm that balances the load of computation for location estimation among the anchor nodes. We have used vector-based swarm optimization on the connected dominating set (CDS), consisting of anchor nodes for that purpose. In this algorithm, major tasks are performed by the base station with a minimum number of messages exchanged by anchor nodes and unknown nodes. The simulation results showed that the proposed algorithm significantly improves the network lifetime and reduces the location estimation error. Furthermore, the proposed optimized CDS is capable of providing a global optimum solution with a minimum number of iterations.


Introduction
Wireless sensor networks (WSNs) are widely used in observing, agriculture, home, and defence environments that are dynamic. Emerging information technology and application requirements have shifted the paradigm of a static network scenario to dynamic environments of WSNs. The mobility of nodes (anchor and unknown) in the network makes it difficult to estimate the location of an unknown node. The accuracy of the location of nodes in such networks is of great importance and depends on location estimation and verification [1,2]. The major drawback of WSNs is the lifetime of the network, as the nodes have limited resources to power up. Compared to unknown nodes, anchor nodes are more privileged, with better power and computational resources. However, unknown nodes drain their resources faster to perform heavy computational tasks. Therefore, the network becomes partially algorithm using a simplex method is shown in Reference [25]. This simplex method is used to optimize the error minimization problem in widely used DV-hop for searching a local optima. The use of the multiobjective genetic algorithm to create a load-balanced virtual backbone is shown in Reference [26]. The mobility of the network increases the iterations of the algorithm to get a local optimum solution, making the overall approach disadvantageous. The identification of balanced nodes in wireless sensor networks for data aggregation has also been shown with a probabilistic network model for a data aggregation tree [27]. Moreover, an algorithm for the identification of a load-balanced maximal dominator node Set (LBMDS) and connector nodes for LBMDS have been developed. An expected allocation probability algorithm (EAP) to solve parent node assignment (PNA) has been proposed. A novel distributed tracking process of the sensor nodes has been introduced [28]. In this approach, a local node using a fuzzy system provides a partial solution, with a centralized algorithm that merges all the partial solutions. The centralized algorithm is based on the calculation of the centroid of the partial solutions. The deviation of the partial solutions increases the complexity of the algorithm, which is an identified concern in this approach.
The solution for swarm configuration-based localization uses the min-max method and swarm optimization [29]. Swarm uses a sequential Monte Carlo localization method to consider mobility and does not require additional hardware. As it can work with seed movement, the mobility factor lowers the cost of implementation without compromising accuracy. We have considered this approach as a candidate, as well for comparison. An ant colony-based optimization approach is described in the literature [30]. A distributed method for parametric regression in a clustered WSN using particle swarm optimization was also introduced recently with multiobjective optimization (MO) [31]. Moreover, the same as the vector-evaluated particle swarm optimization method, it considers the MO problem in two phases; the algorithm obtains a set of candidate network regressors and computes the final model using a weighted averaging rule. However, this parametric regression is not suitable for real time-based scenarios with the mobility effect. A novel optimized localization method using glowworm swarm optimization (GSOL) has been introduced to overcome this drawback [32]. The fitness function of this optimization process follows the deviation or error calculation with the Euclidean distance aspect. After the multi-iterations, all the mobile nodes within the network concentrates around one or multiple nodes providing local optimal location create the global optimal location. We have considered this algorithm also as another candidate for comparison due to its claimed efficiency. Some other techniques for the optimization and load balancing the virtual backbone include: Neural networks [33], the machine learning approach [34], and the differential evolution approach [35].
The aforementioned short survey on the range-free optimization algorithms in WSNs has some drawbacks: (i) The algorithms are not load-balanced and, therefore, partial exhaustion of the network exists; (ii) the redundancy of the clustering nodes makes the process complex; (iii) heavy functioning load on some nodes lowers the network lifetime; and (iv) validation for the overall mobile (unknown and anchor: Both are mobile) environment is not ensured. The present work aims to provide a solution for the abovementioned deficiencies by developing an approach of localization using a VBSO-based CDS construction.

Network Scenario
The present research work considers that the n unknown nodes are located within a plane of M × M area with true locations defined by u i = (x i , y i ) and estimated locations are defined bȳ u i = (x i ,ȳ i ), ∀i = 1, 2, .., n. We have also considered m anchor nodes with known locations defined by a j = (x j , y j ), ∀j = 1, 2, .., m.
The transmission area is considered to be a circle. The centre of the circle consists of the corresponding node itself. The number of unknown nodes connected to an anchor node is variable at different time intervals and, therefore, we have calculated that the expected number of connectivity for each anchor node in the network can be given as: The network of n unknown nodes and m anchor nodes along with their connectivity is considered as a graph G(V, E). G (V , E ) ⊆ G(V, E), where G(V, E) is the original sensor network scenario and G (V , E ) consists only of the optimized backbone anchor nodes considered as optimized CDS and its links to the unknown nodes u 1 , u 2 , ..., u n ∈ U, the set of unknown nodes, such that ∀u i , a j ∈ V {i = 1, 2, .., n and j = 1, 2, .., m}. All the unknown nodes in the set U are the one-hop neighbors of any anchor node present in the CDS such that D ⊆ A, the set of anchor nodes {a 1 , a 2 , .., a m }, V ⊆ V and The proposed network model is mobile in nature for both the anchor nodes and unknown nodes. Relative mobility is required to be calculated to check the mobility direction and the relative distances. The relative mobility between an unknown node u i with respect to anchor node a j at a given time t is given by: where d a,u t = k √ P r and P r is the power received by the receiving antenna and is calculated by the Friis space model [36]. k is constant here. d a,u t is used to calculate the distance and closeness between two nodes. RM a,u t is positive if node u is moving away from a and negative if u is coming closer to a. This relative mobility effects the CDS generation and further the related vectors. This relative mobility is used to generate the updated distance matrices for location estimation.

Vector-Based Swarm Optimized CDS
Our proposed algorithm considers the following assumptions: • The anchor nodes are preinstalled with their own location, also known to the base station (BS).

•
The base station can control all the anchor and unknown nodes. Multiple base stations are allowed, depending on the range of communication.

•
Unknown nodes and anchor nodes have random mobility.

•
According to the privilege concern, BS is the most privileged in terms of storage and resource. Anchor nodes are more privileged, compared to unknown nodes. Therefore, the maximum computing tasks are controlled by BS; anchor nodes and unknown nodes are kept only with a minimum message exchange process. One BS broadcasts the aggregated information to other BS.
The proposed method starts with broadcasting by anchor nodes which have more privileges than the resource constrained unknown nodes. The anchor nodes broadcast the neighbor discovery packet (NDP) as per the following format.
where A_addr is the source address of the anchor node, Leash_C is set to 1 to bound the packet with a limited transmission of one-hop nodes, Seq is the sequence number of the NDP to avoid void routing and stale of packets, and Accept_S is the status of the acceptance of the NDP by an unknown node set at 0. It may happen that in this process of broadcast, another anchor node receives the NDP, in which case, the receiving anchor node will simply discard the NDP. Once this NDP is received by the unknown nodes who are in the one-hop neighborhood of the sender, the unknown nodes reply with a reply neighbor set (RPNS) by decrementing the leash value by 1 and changing the acceptance status to 1. The RPNS is composed of the following components.
Upon receiving the RPNS, the anchor nodes enlist the U_addr as its one-hop neighbor in a neighbor set list (NSL) and send it to the base station (BS) with cryptographic services so that the list cannot be manipulated by any third party attack or compromised insider node.
Anchor → BS : NSL = {A_addr|U_addr 1 , U_addr 2 , ...} After receiving all the NSLs from the anchor nodes, BS calculates the maximum degree anchor node and enlists the anchor node in an empty dominating set D. This operation iterates until at least three anchor nodes are in the dominating set and all the unknown nodes are a one-hop neighbor of at least one of the anchor nodes in the set of anchor nodes A. The process outputs a connected dominating set (CDS), but the problem is the anchor nodes are unbalanced in terms of number connections with unknown nodes. To solve this problem, swarm optimization is performed on the CDS. BS creates binary vector V j for each of the anchor nodes present in the CDS and vector-based swarm optimization is applied for the best solution to achieve an optimized CDS for the backbone. All these n-bit binary vectors (as number of unknown nodes is n) create the initial population of swarm. Fitness value is calculated on each of these vectors and direct and indirect cooperation is calculated. After mutation on the initial population of vectors, boundary values are checked and selection starts again for the next iteration. If the convergence condition is met successfully, we get the output of the best-fit solution for our purpose. Once we get the optimized dominating set, trilateration is performed to get the location of unknown nodes. The algorithm is summarized in Algorithm 1. BS performs: 18: while True do 19: 20: Break while loop; 21: else 22: Calculate the maximum degree anchor nodes a max 23: for k=1 to m' do 28: BS creates binary vector ∨ j , ∀ j = 1, 2, ..., m 29: Apply VBSO algorithm 30: Apply Trilateration 31: end for 32: end procedure

Optimization Process with VBSO
To optimize the obtained CDS, we have applied vector-based swarm optimization (VBSO) on the obtained CDS. The solution for optimizing the CDS exhibits the features of the maximization problem. To solve this problem, we considered the connections of anchor nodes in the CDS with their corresponding one-hop neighboring unknown nodes as n-dimensional vector V j . The proposed method provides a global maximization solution, which is formulated as a pair (V j , f ), where V j ⊆ R n is a bounded set on R n and f : V j → R is n dimensional real valued function. The domain R n of function f is considered the search space. We have to find a pointx ∈ V j so that f (x) is a global maximum on V k . Thus, each element x of R n is considered a candidate solution in the defined search space withx being optimal such that f (x) ≤ f (x). The optimization process for CDS with VBSO algorithm has been summarized in Algorithm 2.

Initialization of Vectors
Each unknown node in the network is allocated to one of the anchor nodes in the CDS. Each anchor node a j and its connections with one-hop neighboring unknown nodes u i is represented by an n-dimensional binary vector V j , ∀j = 1, 2, .., m , where m is the number of anchor nodes in CDS and m ⊆ m.
Let us consider the scenario shown in Figure 1. The scenario consists of five anchor nodes and twelve unknown nodes. The blue circles represent anchor nodes, green circle nodes represent unknown nodes, blue solid lines represent a CDS and green solid lines represent the connection between an anchor node of CDS and unknown node in the network. As per our algorithm, we require at least three anchor nodes in the CDS to calculate a location with the trilateration process. Therefore, anchor nodes 1, 2, and 3 create the required CDS for the purpose for which the vector representation (V1, V2, and V3) is shown in Figure 2. These vectors are considered parent vectors for the VBSO algorithm. The vector representation for each anchor node is formulated as:

Calculate the Fitness Vector
The formulation of the fitness function is a critical and important task in the optimization process, as it helps for fast convergence with the optimum solution. To formulate the fitness function, we have used two parameters, which are also considered in our previous work [37].
Dominating set θ value for the connected dominating set D given as: where deg j is the degree of each dominator anchor node in the set D andd eg is the mean of all the degrees of the dominator in the set D. The calculation of allocation scheme value: |Al θ | = ∑ m j=1 |deg j − ζ| θ 1 θ considers that there are m disjointed sets for m anchor nodes present in the set D, such that N(a j ) ∩ N(a k ) = ∅ and ∀u i ∈ N(a j ), 1 ≤ j ≤ m so that (u i , a i ) ∈ E . N() here represents the neighborhood. deg i is the valid degree of an anchor node a j , i.e., the number of unknown nodes already allocated to the anchor node a j . The objective of the problem defined in this paper is to identify an optimized CDS with minimum D θ and Al θ values. Therefore, the fitness function can be given as: The linear combination of the two metrics CDS θ and Al θ is made to adjust the bias factors W 1 and W 2 , according to the network environment. The above equation emphasizes that to maximize the fitness function for achieving optimum value, the denominator needs to be minimized and the numerator needs to be maximized so that the overall fitness function will be maximized accordingly. The experimented value of θ = 2, as it is analogous for an electro static field.

Reproduction in VBSO
The reproduction method in the VBSO algorithm primarily depends on creating cooperation vectors. Multiple vectors in the search space are combined with vector operations to achieve the cooperation vector.
The direct cooperation vector and differential cooperation vectors are used to create a cooperation vector. The direct cooperation vector at the kth iteration is given by: where V x ,V y ,V z belongs to the initial population vectors, V b is the best vector in the neighborhood, and V r is the random vector and w 1 + w 2 + w 3 + w 4 + w 5 =1. These values are called cooperative weighting coefficients (CWC). Figure 3 represents a direct cooperation vector with V x and V b which actually outputs the global optimum (GO). Following the same line, Figure 4 represents another direct cooperation vector with V x , V y , V b m and V r .  For the scenario of more than three initial population vectors, the selection of three among them is another crucial part which depends on higher fitness values and with a higher probability of selection. The probability is therefore given as: The differential cooperation vector signifies the search around direct cooperation vector. It helps the algorithm to pass local optimums and converge to global one, as shown in Figure 5. Direct cooperation vectors provide a local solution, whereas the differential cooperation vector uses these direct vectors to get the global optimal solution, which is as good or better than all the other feasible solutions. To obtain the unknown global optimum, we have followed k iterations, in which if two simultaneous iterations have output the same values of local optima, we have considered it the global optimum. The differential cooperation vector (V di f f ) of the population is calculated by considering proper portions of differential vectors of a current solution, for example V c , given as: where w 6 + w 7 + w 8 + w 9 + w 10 = 1 (8) Figure 6 shows the generation of the cooperation vector obtained from direct and differential cooperation vectors, which can be given as: In all Figures 3-7, the contour lines represent the cost function that depends on the fitness value of the vectors.

Mutation
To increase the diversity of the population, mutation is applied to the initial population vectors. A number of mutation operators have been suggested in Reference [38][39][40]. Applying direct and differential cooperation implicitly provide mutation in the process. In addition, a second mutation is also applied to transfer the search space origin to a point far enough from the current origin. A random mutation probability p m is selected. A random number r is generated and then compared with p m . The second mutation is applied if rand > p m and rand is in (0, 1). In the mutation process, the random bits in a vector V k are flipped to get the mutation vector V m . The candidate offspring vector V o f f is obtained by the following equation, and also shown in Figure 7:

Boundary Check
After the mutation, new offspring vectors are checked for the boundary condition so that all the offsprings must belong to the search space. An invalid offspring can lead to ab uncoordinated and invalidated search and the convergence will become a delay. Different methods are shown in References [41,42]. We have used the boundary check with a fitness function. If the fitness function of a vector in the current iteration is smaller than the fitness function of the best vector in the previous iteration, then replace the current unfit vector with its parent vector. The formulation of this process is given as: For example, considering the scenario of Figure 2, the fitness value of the best solution is 5.89 (considering r = 2m and θ = 2 (analogy to electro static field). After reproduction and mutation, we have got the offspring vector as in Figure 8. The candidate offspring has a fitness value of 0.278, which is less than the previous best solution of 5.89. Therefore, we need to replace this candidate with one of the parents of the previous iteration by calculating the hamming distance. This vector V o f f has a hamming distance of 3, 6, and 9 from the initial parent vectors V 1 , V 2 , and V 3 , respectively. Thus, this candidate offspring will be replaced by vector V 1 that corresponds to A 1 . This process of retaining the parent of best solution helps to converge faster to the global optimum.

Selection of Offsprings
The identification of the next iteration vectors requires a selection process for suitable offspring vectors. Various methods of selection have been discussed in References [38,43,44]. In our proposed approach, all the valid candidate offsprings are directly transferred to the next iteration. The overall process of the vector-based optimization approach used in our purpose has been summarized in Algorithm 2. The convergence process of the optimization method is depicted in the form of a flowchart in Figure 9. Calculate the allocation value given as:

Localization of Unknown Nodes
Once the optimized CDS is achieved, we can apply any trilateration method to estimate the location of an unknown node u i . The estimated distanced j from an unknown node u i to each dominator anchor node in optimized CDS is given by the respective Euclidean distances between them, as shown below.d where (x,ȳ) is the estimated location of an unknown node of u i .
The calculated distance with relative mobility becomes: The above equation for j = 1, 2, .., m can be transformed into the AX = b form, where: Equation AX = b is solvable by the standards of the least-square [45] solution to provide the value of x given by: The problems identified in the existing algorithms as stated in the section of related work have been addressed by our proposed solution and summarized in Table 1. Table 1. Addressing problem with proposed solution.

Existing Problems Solution in the Proposed Algorithm
Algorithms are not load-balanced and, therefore, partial exhaustion of the network exists.
Load allocation attribute has been used with CDS θ and Al θ .
Redundancy of clustering nodes make the process complex and also causes heavy functioning load on some particular nodes.
Redundancy is removed as the CDS is maintained only by anchor nodes having a maximum degree, so that other nodes can be at rest and can be used later.
Validation for overall mobile (sensor and anchor: Both are mobile) environment is not ensured.
In the proposed solution, we have considered all anchor nodes and sensor nodes to be mobile and therefore support full mobility of the network.

Result and Analysis
The proposed algorithm has been simulated in NS-2 with the following parameters enlisted in the Table 2. The performance of the algorithm has been analyzed in the terms of localization error and network lifetime and number of iterations used to provide the global optimum solution. The location estimation error is defined as a squared error of the estimation and is given by: whered j is the estimated distance and d j is the true distance between unknown node u i and anchor node a j . The average error is given as: The statistical values of localization errors for our different simulation environment are shown in Table 3. Results are shown for the variable number of anchor nodes ranging from 10 to 40, with unknown nodes ranging from 100 to 200. The table also represents the number of anchor nodes used in the VBSO-optimized CDS, by which the given number of unknown nodes has been localized. The result signifies the fact that only 50% to 57% of all the anchor nodes are used. This means that the unused anchor nodes can save their energy for future tasks, which actually elongates the network lifetime. The results of the average localization error have also been compared with three recent optimization-based approaches, as shown in References [29,32,37]. The comparison results are shown in Figure 10. The results emphasize that our proposed approach reduces the average error for localization of unknown nodes. We have observed that with the increasing number of anchor nodes, the proposed algorithm and the work in Reference [37] have a marginal difference of average localization error. However, the advantage of the proposed algorithm lies with the number of iterations for obtaining the global optimum, as shown in Figure 11. The optimization algorithms consider the number of iterations in searching the global optima as an important performance metric. We have compared the number of iterations executed by varying the number of anchor nodes and unknown nodes. Figure 11 shows the results of comparison, which signify that our proposed algorithm is faster in providing the global optimum value with a lower number of iterations of the optimization process.  (d) Figure 10. Average localization error comparison.
The objective of the proposed algorithm is to elongate the network lifetime. Therefore, we have also evaluated the performance of the proposed algorithm in terms of residual energy. The proposed algorithm uses a linear energy model. The overall residual energy of the proposed optimized CDS depends on the constituents, i.e., the dominator anchor nodes. We have considered a timeline, say t1 to t2, for this energy consumption measurement. Residual energy in time t is defined by omitting consumed energy in t from the initial battery power in t − t. Thus, the energy consumption is determined in t. Therefore, the residual energy equation is given as: With the minimum usage of message exchange between anchor nodes and unknown nodes and the handing over the optimization and localization tasks on BS, our proposed approach significantly improves the residual energy of the overall network scenario. Figure 12 describes the result for the network lifetime comparison. The data are represented by varying the anchor nodes from 10 to 40. The data are plotted for a number of unknown nodes vs. the percentage of residual energy. In all the circumstances as shown in Figure 12, the percentage of residual energy for our proposed algorithm is higher than the other algorithms in comparison. Moreover, it shows an effective point for our proposed algorithm: With the increasing number of anchor nodes, the energy residual amount also increases. This signifies that the load is balanced and distributed, for which overall residual energy increases. Furthermore, we have analyzed the complexity of algorithms, which are compared in Table 4, where m is the number of anchor nodes used for localization and n is the number of unknown nodes.

Conclusions
In the present study, a novel algorithm is proposed to optimize anchor nodes in CDS. In this case, the algorithm used random mobility for both the anchor as well as unknown nodes and BS was considered to perform the major tasks, such as optimization of CDS, localization with optimized CDS, etc. The vector-based swarm method used for optimization process converges to the global optima rather than the local optima. By keeping anchor nodes and unknown nodes to a minimum working level, traffic overload in the network and message-processing functions are substantially reduced. The localization process is executed with a transformation matrix of nonlinear equations and applying the standard least-square method. The simulated results were compared with recent algorithms. The accuracy and network lifetime are approximately 30% and 67%, respectively, better than the other compared approaches. This algorithm can be used in a dynamic environment, as it converges to the global optimum solution with a lower number of iterations. Additionally, the method is scalable and works with different network sizes, which makes it adaptable to various applications.
Author Contributions: G.K. and R.S. conceptualized the idea, proposed the methodology and did the software implementation; M.K.R. supervised the experiments; R.T., T.K., J.L., and J.S.P.S. interpreted the results and performed the validation and formal analysis. All authors have proofread the paper.