EMP: Exploiting Mobility Patterns for Collaborative Localization in Sparse Mobile Networks

Location awareness plays an indispensable role in a wide variety of application domains such as environment monitoring and vehicle tracking. In this paper we focus on the localization of mobile users in sparse mobile networks, which arise in many practical scenarios where users are distributed over a vast area. The unique characteristics of sparse mobile networks, such as constant user movement and scarce information from anchors, make accurate localization challenging. By applying entropy analysis to five large datasets of real user traces collected at five sites, we make an important observation: user mobility exhibits strong patterns. Motivated by this observation, we propose EMP, a localization approach that exploits the mobility patterns of users for localization in sparse mobile networks. EMP implements a range-free distributed algorithm with which each user collaboratively estimates its current location by fusing two localization sources, namely network connectivity with other nodes and mobility patterns. Trace-driven simulations demonstrate that EMP significantly improves localization accuracy compared with existing localization approaches.


I. INTRODUCTION
Location awareness plays an indispensable role in a wide variety of domains, such as environment monitoring and vehicle tracking. In this paper we focus on the localization of mobile users or devices in sparse mobile networks, which arise in many practical scenarios where the users or devices are distributed over a vast area. Localization based on the Global Positioning System (GPS) suffers from several limitations. First, GPS may fail indoors or in urban areas with urban canyons. Second, GPS receivers consume considerable power and can quickly drain battery-powered devices.
The unique characteristics of sparse mobile networks present several challenges for accurate localization. First, users constantly change their locations, so localization must be performed in real time. Second, in a sparse network, anchors (nodes that know their locations) are scarce most of the time, so localization must usually be performed with little information from anchors.
There has been extensive research on the localization problem [3] [7] [11] [13] [25] [29]. Existing approaches can be classified into two categories: range-based and range-free. Examples of range-based approaches include Received Signal Strength (RSS) [5] [6], Angle of Arrival (AOA) [19], Time of Arrival (TOA) [32], and Time Difference of Arrival (TDOA) [26]. The performance of range-based approaches is highly dependent on the accuracy of the ranging techniques, which can vary greatly in practice. Range-free approaches [13] [17] [29] use only communication connectivity to compute node locations, and they require a high node density. Unfortunately, the high-density condition does not hold in sparse mobile networks.
By applying entropy analysis to five large datasets of real user traces from two university campuses (NCSU and KAIST), New York City, Disney World (Orlando), and the North Carolina State Fair [16] [23], we make an important observation: user mobility exhibits strong patterns. More specifically, the future location of a user is highly dependent on the current location. In addition, a user tends to move around a few preferred locations.
Motivated by this observation, in this paper we propose EMP, a localization approach that exploits the mobility patterns of users for localization in sparse mobile networks. EMP implements a range-free distributed algorithm with which each user collaboratively estimates its current location by fusing two localization sources, namely network connectivity with other nodes and mobility patterns. When a user meets another user, the location estimate of the encountered user is used to improve its own location estimate. At the same time, the user's mobility pattern is exploited to refine its location estimate, and users are differentiated according to the strength of their mobility patterns.
The technical contributions of the paper are listed as follows.
• By analyzing five real-world user traces with entropy analysis, we reveal that user mobility exhibits strong patterns. As a result, the mobility of a user is characterized by a Markov chain.
• To the best of our knowledge, this is the first attempt to estimate user locations by fusing network connectivity and mobility patterns.
• With trace-driven simulations, we demonstrate that EMP significantly improves localization accuracy compared with existing localization approaches, such as LOCALE [33].
The rest of the paper is organized as follows. In Section II, we review related work. In Section III, we give the system model and the problem statement. In Section IV, we introduce our localization algorithm in detail. Section V presents evaluation results. The conclusion is given in Section VI.

II. RELATED WORK
Many methods have been proposed for localization. They can be classified into two categories: range-based localization and range-free localization. In this section, we review related work in both categories.
Range-based localization algorithms commonly use techniques such as triangulation or trilateration [1], in which the physical distances among nodes are measured. These algorithms require special hardware for measuring distances [6]  [30], [33] which rely on connectivity between different nodes.
LOCALE [33] was proposed for localization of mobile nodes in sparse mobile networks. Each mobile node estimates its own location from the sensory data of its accelerometer. In addition, it refines its location by using the location information from encountered neighbors. It can be viewed as a delayed triangulation localization algorithm for sparse mobile networks.

A. System Model
We consider the localization of a set of mobile users moving within a given region. The set of mobile users is denoted by $M = \{1, 2, \ldots, m\}$. We divide time into slots of length $\tau$, and the whole period is represented by $T = \{\tau_0, \tau_1, \tau_2, \ldots\}$. Initially, at $\tau_0$, a node $i \in M$ is located at $E_i^0$. Since any node's velocity is finite, $\|v_i\| \le v_{\max}$, where $v_{\max}$ is the maximum velocity that any node can reach. Thus, after a time slot $\tau$, the location of $i$ is within a bounded range, $\|E_i^1 - E_i^0\| \le v_{\max}\tau$, where $E_i^1$ is the estimated location of node $i$ at $\tau_1$. The whole region is divided into grids of equal size $(v_{\max}\tau) \times (v_{\max}\tau)$. We use $G$ to denote the set of grids and $g$ to denote a grid in $G$. As shown in Fig. 1, for example, the KAIST campus is divided into $40 \times 60$ grids.
The trajectory of a node can hence be represented as the series of grids that it traverses, $X^i_{\tau_0}, X^i_{\tau_1}, \ldots$, with each $X^i_{\tau_k} \in G$. We make three assumptions. First, mobile nodes are equipped with a low-accuracy dead-reckoning tracking sensor device. Second, all users have access to their historical traces; the historical trace of mobile user $i$ is denoted by $\xi_i$. Third, all users share the same communication range $\gamma$. When the distance between users $i$ and $j$ is at most the communication range, i.e., $d_{ij}(\tau) \le \gamma$, we say that the two users encounter each other at time slot $\tau$.
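The two discrete notions of the system model, mapping a position to its grid cell of side $v_{\max}\tau$ and testing whether two users encounter each other ($d_{ij}(\tau) \le \gamma$), can be sketched in a few lines of Python. The function names, the placement of the grid origin at $(0, 0)$, and the (row, column) indexing are illustrative assumptions, not part of EMP's specification.

```python
import math

def grid_index(x, y, v_max, tau):
    """Return the (row, col) grid cell containing position (x, y).

    Hypothetical helper: cells are squares of side v_max * tau with the
    grid origin assumed at (0, 0), so a node crosses at most one cell
    width per time slot.
    """
    side = v_max * tau
    return int(y // side), int(x // side)

def trajectory_to_grids(positions, v_max, tau):
    """Represent a trajectory as the series of grids it traverses."""
    return [grid_index(x, y, v_max, tau) for x, y in positions]

def encountered(p_i, p_j, gamma):
    """Users i and j encounter each other when d_ij <= gamma."""
    return math.dist(p_i, p_j) <= gamma

# Example: v_max = 2 m/s and tau = 5 s give 10 m x 10 m cells.
print(trajectory_to_grids([(3.0, 4.0), (12.0, 4.0), (25.0, 7.0)], 2.0, 5.0))  # [(0, 0), (0, 1), (0, 2)]
print(encountered((0.0, 0.0), (3.0, 4.0), gamma=5.0))  # True
```

Because the cell side equals the largest distance a node can cover in one slot, consecutive grids in such a trajectory are always the same cell or neighboring cells, which is what makes a grid-level Markov model natural.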

B. Problem Statement
The goal of our algorithm is to estimate the location of each mobile user at any time within a period of interest. The estimated locations of all mobile nodes are denoted by the set $\hat{E}$, and their real locations by the set $E$. The location of mobile user $i$ at time slot $\tau$ is represented by $E_i^{\tau}$. We define $\Delta(\hat{E})$ as the localization error between the estimated and the real locations of the mobile nodes,
$$\Delta(\hat{E}) = \|\hat{E} - E\|_F,$$
where $\|\cdot\|_F$ is the Frobenius norm. Thus, the objective of localizing the mobile nodes is
$$\min_{\hat{E}} \Delta(\hat{E}).$$

Fig. 2. The three major building blocks of EMP (Exploiting Mobility Pattern, Exploiting Connectivity, and Localization Fusion) and their relationship.

A. Overview
EMP is a distributed algorithm designed for localization of nodes in highly sparse mobile networks. In EMP, each node estimates its location jointly from its own tracking sensor devices (3D accelerometer, electronic compass, etc.), its own mobility pattern, and the estimated locations of its encountered neighbors. As shown in Fig. 2, EMP can be divided into three building blocks.
In Exploiting Mobility Pattern, we characterize the mobility pattern of a mobile node with a Markov chain, as introduced in Subsections IV-B and IV-C. In Exploiting Connectivity, the location of a node is refined by using the estimated location of an encountered node, as described in Subsection IV-D. In Localization Fusion, the two localization sources, i.e., connectivity and mobility pattern, are fused to derive a better location estimate, as introduced in Subsection IV-E.

B. Characterizing Mobility Pattern
We first show that user mobility exhibits strong patterns. To this end, we apply entropy analysis to the real-world user traces from two university campuses (NCSU and KAIST), New York City, Disney World (Orlando), and the North Carolina State Fair [16] [23].
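We denote the location of a node by a random variable $X \in G$. The probability that node $i$ is within grid $g_k$ is
$$P(X^i = g_k) = \frac{num(g_k)}{|\xi_i|},$$
where $num(g_k)$ denotes the number of times that $g_k$ appears in the historical trace $\xi_i$ and $|\xi_i|$ is the length of the trace. The marginal entropy can be calculated as
$$H(X^i) = -\sum_{g_k \in G} P(X^i = g_k)\log_2 P(X^i = g_k).$$
Similarly, the joint probability is
$$P(X^i_{\tau+1} = g_j, X^i_{\tau} = g_k) = \frac{num(g_j, g_k)}{|\xi_i| - 1},$$
where $num(g_j, g_k)$ denotes the number of times that the pair $(g_j, g_k)$, i.e., $g_k$ immediately followed by $g_j$, appears in the historical trace $\xi_i$. The conditional probability is
$$P(X^i_{\tau+1} = g_j \mid X^i_{\tau} = g_k) = \frac{P(X^i_{\tau+1} = g_j, X^i_{\tau} = g_k)}{P(X^i_{\tau} = g_k)}.$$
The conditional entropy can then be calculated as
$$H(X^i_{\tau+1} \mid X^i_{\tau}) = -\sum_{g_j, g_k \in G} P(X^i_{\tau+1} = g_j, X^i_{\tau} = g_k)\log_2 P(X^i_{\tau+1} = g_j \mid X^i_{\tau} = g_k).$$
More generally, the conditional entropy given the $k$ most recent locations is denoted by $H(X^i_{\tau+1} \mid X^i_{\tau}, X^i_{\tau-1}, \ldots, X^i_{\tau-k+1})$.
The CDFs of the entropies of the five real user traces are shown in Fig. 3. We observe that the entropies of the user traces are very low. For comparison, a node moving randomly within a $100 \times 100$ field, i.e., uniformly over $10^4$ grids, has a marginal entropy of $\log_2 10^4 \approx 13.29$ bits. This result indicates that the mobility of real users has strong spatiotemporal regularity. Thus, we can use a Markov chain model to characterize the mobility patterns of mobile users.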
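To determine the order of the Markov chain model, we calculate the conditional mutual information
$$I(X^i_{\tau+1}; X^i_{\tau-1} \mid X^i_{\tau}) = H(X^i_{\tau+1} \mid X^i_{\tau}) - H(X^i_{\tau+1} \mid X^i_{\tau}, X^i_{\tau-1}).$$
The conditional mutual information is less than 0.2 bits, which indicates that the location two slots back adds little information beyond the current location, so a first-order Markov chain can model the mobility pattern of a mobile user well.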
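The entropy analysis and the order test above can be reproduced on a single trace of grid indices by plain counting. The following Python sketch is an illustration under stated assumptions: it uses count-based (maximum-likelihood) probability estimates, it evaluates first- and second-order conditional entropies over the same time steps so that their difference is a consistent estimate of the conditional mutual information, and all function names are hypothetical.

```python
import math
from collections import Counter

def marginal_entropy(trace):
    """H(X) in bits, with P(X = g_k) = num(g_k) / |trace|."""
    n = len(trace)
    return sum((c / n) * math.log2(n / c) for c in Counter(trace).values())

def conditional_entropy(trace, order):
    """H(X_{t+1} | X_t, ..., X_{t-order+1}) in bits, for order 1 or 2.

    All estimates start at t = 1, so orders 1 and 2 use the same samples.
    """
    if order not in (1, 2):
        raise ValueError("this sketch supports order 1 or 2 only")
    joint, context_counts = Counter(), Counter()
    for t in range(1, len(trace) - 1):
        context = tuple(trace[t - order + 1 : t + 1])
        joint[(context, trace[t + 1])] += 1
        context_counts[context] += 1
    n = sum(context_counts.values())
    # -p(ctx, next) * log2 p(next | ctx), written with a positive log.
    return sum(
        (c / n) * math.log2(context_counts[ctx] / c) for (ctx, _), c in joint.items()
    )

def conditional_mutual_information(trace):
    """I(X_{t+1}; X_{t-1} | X_t) = H(X_{t+1}|X_t) - H(X_{t+1}|X_t, X_{t-1})."""
    return conditional_entropy(trace, 1) - conditional_entropy(trace, 2)

# A node visiting each of the 100 x 100 = 10^4 grids equally often.
print(round(marginal_entropy(list(range(10_000))), 2))  # 13.29
# A commuter-like trace that alternates between two grids.
commute = [0, 1] * 50
print(conditional_entropy(commute, 1), conditional_mutual_information(commute))  # 0.0 0.0
```

For contrast, a trace such as `[0, 1, 2, 1] * 25`, where the grid after 1 depends on where the node came from, yields a conditional mutual information of about 0.5 bits; this is the kind of second-order dependence that a small value rules out.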
To implement the first-order Markov chain, the state transition matrix $Q = [q_{kj}]$ is calculated by
$$q_{kj} = P(X^i_{\tau+1} = g_j \mid X^i_{\tau} = g_k).$$
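Estimating $Q$ from a historical trace amounts to counting transitions and normalizing each row. This NumPy sketch assumes grids are numbered $0, \ldots, |G|-1$; leaving the rows of grids that the trace never departs from at zero is a hypothetical choice, and a deployed system might smooth them instead.

```python
import numpy as np

def transition_matrix(trace, num_grids):
    """Estimate Q[k, j] = P(X_{t+1} = g_j | X_t = g_k) from a grid-index trace."""
    counts = np.zeros((num_grids, num_grids))
    for k, j in zip(trace[:-1], trace[1:]):
        counts[k, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Grids never left in the trace keep an all-zero row (no evidence).
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Transitions 0->1, 1->1, 1->0, 0->1; grid 2 never appears.
Q = transition_matrix([0, 1, 1, 0, 1], num_grids=3)
print(Q)
```

Here row 0 becomes $(0, 1, 0)$, row 1 becomes $(0.5, 0.5, 0)$, and row 2 stays zero.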

C. Exploiting Mobility Pattern
This building block estimates the location of a mobile node by exploiting the mobility pattern of its user. After the movement of a mobile node is modeled with a first-order Markov chain, its location estimate can be obtained as follows:
• Initially: $\pi_0$, the initial location distribution of the node over $G$.
• After $k$ steps: $\pi_k = \pi_{k-1} Q = \pi_0 Q^k$.
Note that $\pi$ is the location estimate of the mobile node within the field. For illustration, we divide a field into $2 \times 2$ grids. Fig. 4 shows the location estimation process. The initial state $\pi_0 = \langle 1\ 0\ 0\ 0 \rangle$ corresponds to the state in which a node is 100% sure to be located in the northwestern grid of $G$.
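With row vectors, the $k$-step prediction $\pi_k = \pi_0 Q^k$ is a single matrix power. The NumPy sketch below reuses the $2 \times 2$ illustration; the transition probabilities in `Q` and the grid ordering (northwest, northeast, southwest, southeast) are made-up values for demonstration, not taken from any trace, and `predict` is a hypothetical name.

```python
import numpy as np

def predict(pi0, Q, k):
    """Location distribution after k slots: pi_k = pi_0 Q^k (row-vector form)."""
    return pi0 @ np.linalg.matrix_power(Q, k)

# Hypothetical 2 x 2 field; states ordered NW, NE, SW, SE.
Q = np.array([
    [0.5, 0.5, 0.0, 0.0],   # from NW: stay or move east
    [0.0, 0.5, 0.0, 0.5],   # from NE: stay or move south
    [0.5, 0.0, 0.5, 0.0],   # from SW: stay or move north
    [0.0, 0.0, 0.5, 0.5],   # from SE: stay or move west
])
pi0 = np.array([1.0, 0.0, 0.0, 0.0])  # 100% sure the node is in the NW grid
print(predict(pi0, Q, 1))
print(predict(pi0, Q, 2))
```

After one step the mass splits evenly between NW and NE; after two steps it is $(0.25, 0.5, 0, 0.25)$. The spread grows with $k$, which illustrates why the mobility pattern alone loses certainty over long disconnections.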
Clearly, the technique for estimating the location of a mobile node performs well if the mobility pattern of the node is strong.In practice, however, the mobility patterns of some nodes may not be strong.Thus, merely exploiting mobility patterns is insufficient for accurate localization.

D. Exploiting Connectivity
This building block aims to estimate the location of a mobile node by exploiting connectivity between nodes. Its main idea is inspired by LOCALE [33], a distributed technique that uses connectivity to localize mobile nodes. In this subsection, we first introduce how to represent a location with the location estimate (mean) and the certainty (variance), and then describe how a node exchanges its location information with its encountered neighbors.
1) Location Representation: In probability theory, the central limit theorem (CLT) states that, under certain conditions, the mean of a sufficiently large number of independent random variables, each with finite mean and variance, is approximately normally distributed. Based on the CLT, we use the location estimate (mean), denoted by $\hat{E}$, and the certainty (covariance), denoted by $C$, to represent the current location of a node.
In the 2-dimensional case, the probability density function of the location estimate is
$$P(E) = \frac{1}{2\pi\sqrt{|C|}}\exp\!\left(-\frac{1}{2}(E - \hat{E})^{T} C^{-1} (E - \hat{E})\right),$$
where $E$ denotes the true location and the parameter $C$ denotes the certainty. Hence only two parameters are necessary: $C$ for certainty and $\hat{E}$ for the location estimate.
When a node moves through a long period of disconnection, it estimates its location with low-accuracy dead-reckoning tracking devices. These devices are influenced by many factors, e.g., battery condition, wind, and temperature. The location estimate during this period also follows a normal distribution. Since the movement covariance matrix is oriented along the moving direction $\theta$, the correlation coefficient $\rho$ in the local covariance matrix $C_{L_r}$ equals zero, so
$$C_{L_r} = \begin{pmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{pmatrix},$$
where $\sigma_x^2$ and $\sigma_y^2$ are the variances along and across the moving direction.
Before the old estimate distribution $N(E_o, C_o)$ and the relative measurement distribution $N(E_r, C_r)$ can be combined, a transformation is necessary because they are not expressed in the same coordinate system. The rotation matrix is defined as
$$R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
The covariance matrix in the common coordinate system is obtained by rotating the local one:
$$C_r = R(\theta)\, C_{L_r}\, R(\theta)^{T}.$$
Finally, the new location estimate distribution is obtained as the linear combination (sum) of the old estimate distribution $N(E_o, C_o)$ and the relative measurement distribution $N(E_r, C_r)$, namely $N(E_o + E_r,\; C_o + C_r)$.
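The dead-reckoning update can be sketched in a few lines of NumPy. As a hypothetical simplification, the sketch assumes that the relative measurement mean $E_r$ is a displacement of length `step` along the heading $\theta$ and that the along-track and cross-track standard deviations are known constants. The covariance is built diagonally in the local frame, rotated with $R(\theta)$, and added to the old covariance, as for a sum of independent Gaussian variables; the function names are illustrative.

```python
import numpy as np

def rotation(theta):
    """2-D rotation matrix R(theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def dead_reckoning_update(E_o, C_o, step, theta, sigma_along, sigma_across):
    """Combine N(E_o, C_o) with one relative measurement N(E_r, C_r)."""
    R = rotation(theta)
    # Hypothetical measurement model: displacement `step` along heading theta.
    E_r = step * np.array([np.cos(theta), np.sin(theta)])
    # Local covariance is diagonal (rho = 0) because it is aligned with the motion.
    C_Lr = np.diag([sigma_along**2, sigma_across**2])
    C_r = R @ C_Lr @ R.T  # rotate into the common coordinate system
    # Independent Gaussian displacements: means add and covariances add.
    return E_o + E_r, C_o + C_r

# Start exactly known at the origin, then move 2 units due north (theta = 90 deg).
E, C = dead_reckoning_update(np.zeros(2), np.zeros((2, 2)), step=2.0,
                             theta=np.pi / 2, sigma_along=1.0, sigma_across=0.5)
print(E, C)
```

Moving north puts the larger (along-track) variance on the y-axis, giving $C = \mathrm{diag}(0.25, 1)$. Repeated updates only ever add covariance, which is why accuracy degrades during disconnection.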
2) Exchanging Location Information with Encountered Nodes: As mentioned before, our algorithm is distributed, and each mobile node uses its own local coordinate system, as shown in Fig. 5. Therefore, a coordinate transformation is necessary before a node merges the location estimates received from its neighbor nodes.
The process of exchanging location information with encountered nodes is shown in Fig. 6. In Step 1, we transform the location estimate from the local coordinate system to the common coordinate system by rotating it with the rotation matrix $R(\theta)$. In Step 2, the host's location estimate contributes a y uncertainty component along the y-axis.
In Step 3, the location uncertainty of the neighbor also contributes to the y uncertainty component along the y-axis; we add it to the observation component as well.
Fig. 6. The process of exchanging location information with encountered nodes.
In Step 4, along the x-axis, the x uncertainty from the neighbor is added to the x uncertainty component. Since the previous operation has already transformed the coordinates into the relative coordinate system, $\rho$ in $C_{L_o}$ equals zero, i.e., $C_{L_o}$ is diagonal. When the host and the neighbor node are within communication range, the distance $d$ between them is a random variable. Here we assume that the neighbor is uniformly distributed over the disk of radius $\gamma$ around the host. Then $P(d \le r) = r^2/\gamma^2$ for $0 \le r \le \gamma$, so both the median and the root-mean-square distance equal $\gamma/\sqrt{2}$; thus we set $d = \gamma/\sqrt{2}$. The observed location estimate is then calculated from the neighbor's transformed estimate and $d$.
In Step 5, with the help of the observation from the neighbor node, the observed covariance $C_o$ is calculated by rotating $C_{L_o}$ into the common coordinate system, $C_o = R(\theta)\, C_{L_o}\, R(\theta)^{T}$.
In Step 6, the host improves its localization accuracy by merging its own location information with the transformed location information from the neighbor node. Because of device-related factors (tracking sensor devices) and environmental factors, the location certainties $C$ of different nodes differ from each other. Therefore, we combine the two estimates using their certainties $C$ as weights. Denoting the host's estimate by $N(E_h, C_h)$ and the observation by $N(E_o, C_o)$, the merged certainty is
$$C_m = C_h - K C_h,$$
and the merged estimate is
$$E_m = E_h + K (E_o - E_h),$$
where the factor $K$ is defined as
$$K = C_h (C_h + C_o)^{-1}.$$
Then we rotate the new location estimate back to the local coordinate system by
$$C_{L_m} = R(\theta_m)^{T}\, C_m\, R(\theta_m),$$
where $\theta_m$ is the orientation of the principal axis of $C_m = \begin{pmatrix} c_{xx} & c_{xy} \\ c_{xy} & c_{yy} \end{pmatrix}$:
$$\theta_m = \frac{1}{2}\arctan\frac{2 c_{xy}}{c_{xx} - c_{yy}}.$$
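Step 6 and the final rotation back to the local frame can be sketched in NumPy, assuming a Kalman-style covariance-weighted merge with gain $K = C_h(C_h + C_o)^{-1}$. Here `E_h, C_h` denote the host estimate and `E_o, C_o` the neighbor observation; these labels and the helper names are ours, for illustration only. `np.arctan2` replaces a plain arctangent so that the angle stays well defined when $c_{xx} = c_{yy}$.

```python
import numpy as np

def merge_estimates(E_h, C_h, E_o, C_o):
    """Covariance-weighted merge of a host estimate and a neighbor observation."""
    K = C_h @ np.linalg.inv(C_h + C_o)  # gain: trust the less uncertain source more
    E_m = E_h + K @ (E_o - E_h)
    C_m = C_h - K @ C_h
    return E_m, C_m

def principal_angle(C):
    """Orientation theta_m of the major axis of covariance C (radians)."""
    return 0.5 * np.arctan2(2.0 * C[0, 1], C[0, 0] - C[1, 1])

def to_local(C, theta):
    """Rotate a common-frame covariance back to the local frame: R^T C R."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R.T @ C @ R

# Equal certainties: the merged estimate lies halfway and the variance halves.
E_m, C_m = merge_estimates(np.array([0.0, 0.0]), np.eye(2),
                           np.array([2.0, 0.0]), np.eye(2))
print(E_m, C_m)
```

With $C_o = 3C_h$ instead, the gain becomes $K = I/4$, so the result moves only a quarter of the way toward the less certain observation. Rotating by `principal_angle` makes the local covariance diagonal again, matching the $\rho = 0$ convention of the local frame.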

E. Localization Fusion
This building block fuses the location estimate obtained by exploiting connectivity with the one obtained by exploiting mobility patterns. So far, we have derived two location estimate distributions of different forms: the estimate from exploiting mobility patterns is a discrete distribution over grids, while the estimate from exploiting connectivity is continuous. We transform the continuous distribution into a discrete one by sampling.
We use the function $P(E)$ in (15) to describe the location information and sample this distribution at each grid $(i, j)$, which yields values $\pi(i, j)$. After sampling, $\pi(i, j)$ is still continuous-valued. We then apply a uniform quantization process (UQP), which quantizes the probability density into $2^{\nu}$ levels, where $\nu$ is the number of bits used to store a quantized value. The range of the sampled values is thus divided into $2^{\nu}$ quantization regions of equal length, and the quantized values are the midpoints of the quantization regions.
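The discretization step can be sketched as evaluating the 2-D normal density at grid centers and then quantizing the sampled values uniformly. Evaluating the density at grid centers and quantizing over the range [min, max] of the sampled values are our assumptions for illustration; the function names are hypothetical.

```python
import numpy as np

def sample_on_grid(E_hat, C, centers):
    """Evaluate the 2-D normal pdf N(E_hat, C) at each grid center (one per row)."""
    diff = centers - E_hat
    mahal = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(C), diff)
    return np.exp(-0.5 * mahal) / (2.0 * np.pi * np.sqrt(np.linalg.det(C)))

def uniform_quantize(values, nu):
    """Map values to the midpoints of 2**nu equal-length regions over [min, max]."""
    lo, hi = float(values.min()), float(values.max())
    levels = 2 ** nu
    width = (hi - lo) / levels  # length of each quantization region
    if width == 0.0:
        return values.astype(float)  # constant input: nothing to quantize
    idx = np.minimum(((values - lo) / width).astype(int), levels - 1)
    return lo + (idx + 0.5) * width

# 3 x 3 grid of unit cells; the estimate sits at the middle cell's center.
xs, ys = np.meshgrid(np.arange(3) + 0.5, np.arange(3) + 0.5)
centers = np.column_stack([xs.ravel(), ys.ravel()])
pi = sample_on_grid(np.array([1.5, 1.5]), np.eye(2), centers)
print(uniform_quantize(pi, nu=2))
```

The top value is clamped into the last region so that the maximum maps to a valid midpoint rather than to a nonexistent region $2^{\nu}$.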
After this process, the two quantized discrete location distributions are ready to be fused. To fuse them, we use the median percent area error (MPAE) proposed in [33]. As shown in Fig. 7, the MPAE is defined as the area of the smallest circle that contains 50% of the probability mass, i.e., the circle is the 50% certainty line of the 2-dimensional CDF. By this definition, the higher the certainty of a node's location estimate, the smaller its MPAE. As a rule of thumb, we use the reciprocal of the MPAE as the weight of certainty. The certainty weight $w$ is defined as
$$w = \frac{1}{C},$$
where $C$ is the number of grids inside the MPAE circle that contains 50% of the probability mass (Fig. 7).
Given the two distributions, $\pi_\rho$ from exploiting mobility patterns and $\pi_c$ from exploiting connectivity, and their weights $w_\rho$ and $w_c$, we fuse them by computing their weighted average
$$\pi_f = \frac{w_\rho \pi_\rho + w_c \pi_c}{w_\rho + w_c}.$$
After the fusion process, the host's $\pi_\rho$ is replaced by the refined distribution $\pi_f$. This process is the key to improving the accuracy of node localization.
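The certainty weight and the weighted average can be sketched as below. How the 50% circle is located is our assumption: we center it at the mean of the grid distribution and grow its radius until it encloses at least half of the probability mass, counting every grid center inside. The final renormalization guards against quantized inputs that do not sum exactly to one and is a hypothetical addition, as are the function names and the labels `pi_c`, `w_c` for the connectivity-based distribution and its weight.

```python
import numpy as np

def certainty_weight(pi, centers):
    """w = 1 / C, where C counts the grids inside the 50%-mass (MPAE) circle."""
    p = pi / pi.sum()
    center = p @ centers  # assumed circle center: mean of the distribution
    dist = np.linalg.norm(centers - center, axis=1)
    order = np.argsort(dist)
    mass = np.cumsum(p[order])
    radius = dist[order][np.searchsorted(mass, 0.5)]  # smallest radius with >= 50% mass
    return 1.0 / int(np.count_nonzero(dist <= radius))

def fuse(pi_rho, w_rho, pi_c, w_c):
    """Weighted average of the mobility-pattern and connectivity distributions."""
    pi_f = (w_rho * pi_rho + w_c * pi_c) / (w_rho + w_c)
    return pi_f / pi_f.sum()

# 2 x 2 grid: a sharp mobility estimate and a flat connectivity estimate.
centers = np.array([[0.5, 1.5], [1.5, 1.5], [0.5, 0.5], [1.5, 0.5]])
pi_rho = np.array([1.0, 0.0, 0.0, 0.0])
pi_c = np.full(4, 0.25)
w_rho, w_c = certainty_weight(pi_rho, centers), certainty_weight(pi_c, centers)
print(w_rho, w_c, fuse(pi_rho, w_rho, pi_c, w_c))
```

The sharp estimate gets weight 1 (one grid holds all the mass), the flat one gets 1/4 (all four equidistant grids fall inside the circle), so the fused distribution is $(0.85, 0.05, 0.05, 0.05)$ and stays dominated by the more certain source.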

V. PERFORMANCE EVALUATION
A. Methodology and Simulation Setup
To evaluate the performance of EMP, we run trace-driven simulations with two real user traces, from the KAIST campus and New York City. The trace datasets were recorded as follows. GPS receivers recorded the positions of users every 30 seconds, expressed as relative distances to a reference point. Each user may have one or more daily trace files.
We divide the time span of the traces into two segments: the first, from morning to afternoon, covers about 75% of the time, and the second, from afternoon to night, covers the remaining 25%. The first segment serves as the training trace, which is used to generate the mobility patterns of mobile users. The second segment is used to evaluate the localization approaches.
We use the mean absolute error (MAE), which is widely used to evaluate localization algorithms, as our performance metric:
$$\mathrm{MAE}(\tau) = \frac{1}{m}\sum_{i \in M}\|\hat{E}_i^{\tau} - E_i^{\tau}\|.$$
We compare EMP with the following schemes.
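Both error measures used in this paper reduce to one NumPy call each, assuming positions are stored as an $m \times 2$ array: the MAE averages the per-user Euclidean errors, while $\Delta(\hat{E})$ from the problem statement is the Frobenius norm of the whole error matrix. The function names are illustrative.

```python
import numpy as np

def mean_absolute_error(est, true):
    """Average Euclidean distance between estimated and real positions (m x 2)."""
    return float(np.mean(np.linalg.norm(est - true, axis=1)))

def frobenius_error(est, true):
    """Delta(E_hat) = ||E_hat - E||_F over all users."""
    return float(np.linalg.norm(est - true, ord="fro"))

est = np.array([[0.0, 0.0], [3.0, 4.0]])
true = np.zeros((2, 2))
print(mean_absolute_error(est, true), frobenius_error(est, true))  # 2.5 5.0
```

The MAE is easier to interpret as a distance per user, whereas the Frobenius form weights large individual errors more heavily.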
• LOCALE [33]: designed for localization in sparse mobile networks, it uses the location information from neighbor nodes when they are within communication range. The key difference between EMP and LOCALE is that LOCALE does not consider the inherent mobility patterns of users at all.
• Tracking Sensor (TS): it uses only the location information provided by tracking sensor devices.

B. Comparison Over Time
We examine a typical run of the localization schemes. Fig. 8 and Fig. 9 compare EMP, LOCALE, and TS on the KAIST and New York City traces, respectively.
Initially, the MAEs of the three schemes are almost the same, because the mobile nodes are initialized with accurate starting locations. By 100 time units, the MAE of TS has increased greatly, whereas LOCALE and EMP perform almost the same for the first 40 time units. After 40 time units, however, the localization error of LOCALE exceeds that of EMP. These results show that EMP achieves more accurate location estimates than LOCALE and TS.

C. Impact of Number of Nodes
We next investigate the impact of node density on localization performance. To this end, we vary the number of nodes from 10 to 66 for the KAIST trace and from 5 to 25 for the New York City trace. Fig. 10 and Fig. 11 compare the three schemes as the number of nodes varies for both traces. The performance of TS is very poor regardless of the number of users. EMP achieves a localization error almost 50% smaller than that of LOCALE. Both EMP and LOCALE perform better as the number of mobile users increases, which is reasonable because both use cooperative localization to estimate user locations. EMP performs even better because it also takes the mobility patterns of mobile users into consideration.

VI. CONCLUSION
Location information is valuable for many location-dependent application scenarios in mobile networks. Existing range-based localization is costly because of the hardware required for range measurements, and existing range-free localization requires a high node density. Unfortunately, the high-density condition does not hold in sparse mobile networks. By applying entropy analysis to five large datasets of real user traces from two university campuses (NCSU and KAIST), New York City, Disney World (Orlando), and the North Carolina State Fair [16] [23], we have made an important observation: user mobility exhibits strong patterns. Based on this observation, we have presented EMP, a localization approach that exploits the mobility patterns of users for localization of nodes in sparse mobile networks. EMP is a range-free distributed algorithm with which each user collaboratively estimates its current location by fusing two localization sources, namely network connectivity and mobility patterns. When a user meets another user, the location estimate of the encountered user is used to improve its own location estimate. At the same time, the user's mobility pattern is exploited to refine its location estimate, and users are differentiated according to the strength of their mobility patterns. Trace-driven simulations show that EMP achieves significantly better localization performance than existing approaches.

Fig. 3. The CDF of the entropies from five real user traces: NCSU, KAIST, New York City, Orlando Disney World, and North Carolina State Fair.

Fig. 4. The process of state transition for exploiting mobility patterns.

Fig. 5. The representation of the host and neighbor nodes with different coordinate systems.

Fig. 7. The distribution of estimated locations, with the median percent area error marked by the red dashed circle.

Fig. 9. Mean absolute error vs. time (for New York City).

Fig. 11. Mean absolute error vs. number of users (for New York City).