URALP: Unreachable Region Aware Location Privacy against Maximum Movement Boundary Attack

As location-based services have become popular, thereby exposed user locations raised serious privacy concerns. A typical measure for location privacy is to report blurred locations and ensure that other users coexist in the reported region. However, additional knowledge about the user's maximum speed and the territorial information in user's vicinity can allow for the adversary to effectively compromise the user's location privacy. In this paper, we present an anonymization algorithm that effectively counters such attacks while achieving k-anonymity requirements as well as minimum acceptable cloaked region size. We evaluate our anonymization scheme using state-of-the-art simulators for both vehicular movements and pedestrian movements. The experimental results demonstrate the effectiveness and efficiency of our proposed algorithm.


Introduction
With the advanced technologies, for example, wireless communication, global positioning system (GPS), and cellular networks, location-based services (LBSs) are eventually available anywhere anytime. They currently attract millions of mobile users by offering many valuable and important services to users. Common examples include Point-of-Interest services (e.g., finding the closest hospital for heart patients), monitoring traffic condition (e.g., warning traffic congestion reported from probe vehicles), location-aware social networking (e.g., users sharing their locations with friends using Facebook Places or Google Latitude), and location-based advertising (e.g., distributing 50% off coupons to all customers within two kilometers of a store). However, because LBSs need users' locations as well as profiles of users to increase the value of services, problems may arise if the service provider is not trusted for user privacy. For example, an adversary could get user location information from the service provider to locate and track the users. Therefore, users could endanger their privacy by contacting the LBSs directly to send their location information.
One may attempt to preserve users privacy by hiding their identities, using, for example, a pseudonym instead of an identity. However, this is not enough because the user can be reidentified even from anonymized location data [1]. In order to process location-based requests, the LBS needs the location of the user. An attacker, which could be the LBS itself, can infer the identity of the user by associating the location and the request time with a particular individual. This can be easily performed in practice. For example, if one reports her or his location at 3 am every Wednesday, an attacker may then infer who she or he is with the help of a public telephone directory. Another approach to protect the user identity is to employ the concept of location -anonymity [2]. This approach tries to find a set of at least users such that the user is indistinguishable from other − 1 nearby users.
In addition to identity, the users want to hide their exact location. Protecting the user privacy with locationanonymity, however, is not enough to protect her or his location information when the users are located within an area small enough to locate the user precisely. Location obfuscation approach [3][4][5] is used to decrease the precision of a position so that the attacker can only receive coarse-grained position information. A cloaked region, which is larger than a user-specified threshold, called minimum acceptable area, will be sent to the service provider instead of the exact location. Therefore, the attacker knows that the user is located in 2 International Journal of Distributed Sensor Networks the cloaked region but has no clue where the user is exactly situated. In this paper, we aim to protect the user identity and user location by the methods: location -anonymity and location obfuscation.
Consider an attacker Bob who wants to keep track of his teenage daughter Alice. One day, on the way to a bar, Alice would like to find the nearest gas station to fuel her car. However, she does not want to disclose her location so she reports a cloaked region to LBS instead of her exact coordinate. In the bar, she makes another query to see if any of her friends nearby is interested in joining her while hiding her coordinate because she also does not want her father to know that she is in a bar after school. By accessing the service provider's data somehow, Bob may obtain the cloaked location information of Alice's two LBS queries. From the first query, he can infer with high probability that Alice is currently driving in the city, so her speed cannot exceed 30 mph. Then he determines the area where Alice can reach maximum movement boundary (MMB) at the time requesting second query by using Alice's maximum speed. Therefore, knowing the area map and the MMB, Bob can limit the obfuscation area to certain locations (i.e., the bar) by removing all the unreachable regions based on the map and the MMB. We call this type of attacks MMB attack with constrained movement (MMB-CM). The MMB attack has been studied in some previous work [6][7][8]. However, existing approach works only in an open-space environment and its effect is significantly limited for the constrained movements. Attacking based on knowledge about reachable and unreachable areas in victim's vicinity was mentioned in [9]. But combinations of different types of attacks have been rarely considered [10]. We make the following contributions in this paper: (i) We propose URALP (Unreachable Region Aware Location Privacy), a location-cloaking algorithm against MMB-CM attack.
(ii) For the first time in the literature, we evaluate location privacy algorithms using state-of-the-art transportation and pedestrian simulators for realistic vehicle and pedestrian movements.
(iii) We show that the proposed anonymization algorithm is efficient and effective. In particular, we show that the algorithm achieves near-optimal entropy for the user locations.
The rest of this paper is structured as follows. In Section 2, we describe the system architecture, the attack model, MMB-CM attack, and privacy goals. Then we propose our cloaking algorithm in Section 3. Section 4 presents the performance evaluation results of our proposed algorithm. We discuss related work in Section 5, and we conclude in Section 6.

Problem Model
In this section, we describe the system architecture, the attack model, MMB-CM attack, and privacy goals. 2.1. System Architecture. For system architecture, we adopt three-tier model comprised of mobile device, trusted anonymizer, and LBS provider ( Figure 1). We assume that the user does not trust the LBS provider for respecting user privacy. Therefore, when the user wants to submit an LBS query to the provider, the user instead contacts the anonymizer. On behalf of the user, the anonymizer performs the following tasks: (1) receiving an LBS query from the user, containing user identity, location, and privacy requirements including the privacy level , the minimum required area min , and the life time of the query ; (2) anonymizing the query by replacing the user identity with a pseudonym and cloaking the location into a cloaked region, which meets the privacy requirements; (3) submitting the anonymized LBS query to the LBS provider; (4) refining the query results from the provider by leaving out irrelevant results caused by location cloaking; and (5) sending the refined results to the user.
Our system architecture supports both stateful and stateless services. When each query needs to be linked to previous queries by the provider (stateful), the anonymizer keeps the same pseudonym for the same user. When each query is independent (stateless), a fresh pseudonym is generated for each query.

Adversary
Model. Any party with the following capabilities is a potential attacker: (i) being able to access all or some anonymized LBS queries including pseudonym, time, and cloaked region; (ii) having knowledge about the upper bound of the victim's moving speed; (iii) having knowledge about reachable and unreachable areas in victim's vicinity.
For example, the attacker can be a malicious LBS provider or anyone who can access the provider's system such as a law enforcement agency or an intruder. We assume that the user trusts the anonymization server.
The upper bound of the victim's moving speed can be estimated based on the victim's transportation means and the road speed limit. If the user is driving in New York, for example, the road speed limit is 40 km/h in residential areas, 104.6 km/h on freeways, and 88.5 km/h in rural areas [11]. In addition, the vehicle type can also help the attacker to estimate the maximum speed more precisely. If the user is travelling on foot, the moving speed may not exceed 6 km/h. An unreachable area is the area where the victim is unlikely to move through, if not impossible. For example, when driving, the user can only travel on vehicular roads or parking lots, and one can easily identify unreachable areas for the driving user from the area map. If the user is a pedestrian, unreachable regions are places where people cannot walk, such as water, vehicular roads, train tracks, or some dangerous areas.

MMB-CM Attack.
We consider the following MMB-CM attack ( Figure 2). Suppose the attacker wants to know the location of the user . At time , makes a query; then later at +1 makes another query. At time , users , , and cloaked their location as region ( ), and at time +1 users , , and cloaked their location as region ( +1 ). As the attacker estimates the maximum speed of as V , he/she can confine 's location at +1 within the maximum movement boundary MMB ( +1 ), the bound enclosing ( ) with enlargement by V * ( +1 − ). Therefore, 's real location should be inside the intersection of MMB ( +1 ) and ( +1 ). Knowing the geometry of the lake, which is unreachable by the user, the attacker can further confine 's location at time +1 to the smaller area indicated by red color in Figure 2. If the confined area is smaller than the minimum acceptable area set by the user, we conclude that 's location at time +1 is compromised.

Privacy Goals.
As the privacy model we adoptanonymity. However, -anonymity itself is not enough to protect the user's location privacy. For example, even if we find more than users located in the same cloaked region, the location privacy would be compromised if the cloaked region is small enough so that the user feels uncomfortable to reveal it. In summary, we aim to achieve the following privacy goals to meet the user's privacy requirements.
(i) Every successfully anonymized request contains the cloaked region such that at least − 1 other users submit requests with the same cloaked region. Each user may have different value. (ii) Every cloaked region should be larger than or equal to the minimum acceptable area size ( min ) set by the user.

Anonymization Algorithm
In this section, we describe our anonymization algorithm.

Overview.
For a newly arrived request, we find a set of requests (including the new one) to be anonymized, submitted by different users. These users report the same cloaked region to the LBS so that each user can be indistinguishable among other users. So, we aim to find the set of users and the cloaked region that satisfy these conditions as follows: (i) meets the -anonymity requirements of all the users in the set. (ii) The cloaked region meets the minimum acceptable area requirement of all the users in the set. (iii) The cloaked region is contained inside the MMBs of all the users in the set. The first and second condition are necessary to meet the privacy goal described in Section 2.4. The last condition is also necessary. Suppose the cloaked region is not contained in the MMB of a user in the set. Then, part of the cloaked region outside the MMB should be removed from the cloaked region as the user cannot appear on that part.
We find such a -set as follows. First, we find requests from different users, such that each user's location is contained in the MMBs of all the other −1 users. That means that all the other users can appear at the location of the user, which makes them indistinguishable to the attacker. Once we find such a set, the cloaked region is the minimum bounding rectangle of the users. As a result, this rectangle contains the users' locations and is contained in the MMBs of all users. The latter is true because each MMB is a bounding box of the users. If the cloaked region is not large enough to meet the minimum acceptable area requirements of all users, we strategically expand the region until the condition is met.
In the following, we describe the algorithm in more detail.

Finding -Anonymity Set.
All requests waiting for cloaking (called alive requests) are modeled as an undirected graph ( , ), where is the set of all the alive requests and is the set of edges. An edge ( , V) exists in if and only if the two requests and V are issued by different users (also denoted by and V), and their locations fall in the MMB of each other. For example, consider four requests , V, , and from different users in Figure 3. Note that is contained in MMB V and V is contained in MMB ; thus an edge exists between and V. Likewise, there is an edge between V and , and , and and V.
Given such a graph of alive requests, we can find aanonymity set by identifying the maximum clique (or maxclique) in the graph containing the new request node [12]. This ensures that each user is contained in the MMB of each

MBR of , t, and w
Intersection of MMB , MMB t , and MMB w other. We accept this anonymity set if it is larger than theanonymity requirement of each user.
Once the anonymity set is found, we set the initial cloaked region to the minimum bounding rectangle (MBR) of all the users in the clique. In Figure 3, V, , and form a max-clique, and thus a -anonymity set.
The cloaking engine processes each arriving request from mobile users in 3 steps. The first step, called detection, is responsible for updating all max-cliques containing the new request. The second step, called generation, involves generating candidate cloaked regions, which satisfy privacy requirements from the max-cliques. The candidate cloaked region that archives the best utility is chosen as the cloaked region. Finally, the graph will be updated in the last step, called updating. If the request is cloaked successfully or failed to be cloaked because of expiry of its life time, it will be dropped. In the following sections, we detail each step of our algorithm.

Detection.
Upon arrival of a new request, we first add the new node to the graph and detect all max-cliques containing the new node. Then, we incrementally detect the max-cliques by using the update max-clique algorithm in [6]. Finally, because we need to find the -node clique, we only add the max-clique that has number of users greater than the privacy level of the new request to the max-clique set.

Generation.
After updating all max-cliques, we generate the cloaked regions that meet all the requirements as follows: (i) The number of users in the candidate cloaked region has to be larger than the privacy levels of all users in the candidate cloaked region.
(ii) The area of MBR excluding unreachable region has to be larger than the minimum acceptable area of all users in the candidate cloaked region. First, we sort the max-clique set in descending order of clique size. For each max-clique, we generate a candidate cloaked region as the MBR of the cells containing all the clique-members (see Figure 4). Then, we check if the two conditions are met. If the first condition is not satisfied, that is, the clique size is less than a clique-member's privacy level , then we repeatedly remove the user from the clique until the condition is satisfied. If the second condition is not met, that is, the current cloaked region is less than a clique-member's minimum acceptable area size, the MBR is expanded as follows.
At each step, we extend the MBR by one cell-width towards one of four directions that increases the reachable area the most, but not beyond the intersection of the MMBs (see Figure 5). We repeat this step until the MBR is as large as the minimum acceptable area. If we fail to satisfy any of the two conditions, the anonymization fails.
Algorithm 1 gives the pseudocode to generate the cloaked region. The procedure for extending the MBR is described in Algorithm 2.
Input: the new request , max-clique set involving Output: cloaked region (1) sort in descending order of clique size (2) for each do (3) find max , min , max min for (4) ← MBR of cells of users in if ≥ max min then (8) add to ; break; (9) else (10) add returned result of Algorithm 2 to if it does not return false; break; (11) if | | < min then (12) not satisfy; break; (13) else (14) sort the requests in in descending order of their privacy level ; (15) while | | < max do (16) drop the request with the highest from (17) update , | |, max , max min , and (18) if | | ≥ max and ≥ max min then (19) add new to ; break (20) if < max min then (21) add returned result of Algorithm 2 to if it does not return false; break; (22) return the cloaked region that has minimum reachable area We define MCSet to be max-clique set involving the new request , CR being the cloaked region, canCS being each max-clique in MCSet, max k (min k) being the maximum (minimum) privacy level of all users in canCS, max min being the maximum min of all users in canCS, canCR being the area of candidate cloaked region, CRset being set of candidate cloaked regions, UR being the unreachable area, and area being the reachable area.

3.5.
Updating. Each request specifies its own life time . If the request is cloaked successfully or its deadline has passed, it will be dropped. The max-clique set also has to be updated. After removing the leaving request from the max-cliques involving it, we need to check whether the update cliques are still maximal. If the update clique is a subset of any other clique, it is not a max-clique. So it will be removed from the max-clique set.

Evaluation
In this section, we present a set of experiments that show the effectiveness and efficiency of our proposed algorithm. In particular, we evaluate our algorithm for users moving by car and on foot. To our best knowledge, this is the first attempt in evaluating a location privacy method with realistic pedestrian movement data. In the following, we describe the experimental setup, evaluation metrics, and the results.
For pedestrian movements, we used a pedestrian simulator SimWalk [15]. Table 2 shows the experiment parameters. The experiment contains 200 people walking in a bus station ( Figure 6). The area is 164 m by 80 m large, divided into 1.6 m by 1.6 m cells. All experiments are run on an i7-2600 3.4 GHz machine with 4 GB of RAM. We compare our proposed algorithm URALP with the algorithm proposed by Pan et al. [6]. Their algorithm is designed for MMB attacks, but we evaluate it against MMB-CM attacks. We denote this algorithm by MMBPan.

Evaluation Metrics.
We evaluate the scheme by cloaking time and anonymization time for performance, cloaked region size for utility, and success rate and entropy for privacy as follows: (i) Cloaking time is the time used to update all maxcliques and generate a cloaked region. (ii) Anonymization time is the time between the periods when the request arrives and when the request is cloaked successfully. It includes the cloaking time and the waiting time to be cloaked. (iii) Cloaked region size is the size of the final cloaked area when successful. (iv) Success rate is the ratio of the number of successfully anonymized requests to the total number of requests.
(v) Entropy is as shown below.
We measure the entropy of the mobile user's locations as follows. Given a set of cloaked regions { 1 , . . . , } reported to the LBS server, let be the reachable part of ( = 1, . . . , ). Then, for a fixed number , we evenly partition into tiles { 1 , . . . , }. For each , we compute the probability of each tile as = / ∑ where is the number of times that a user appeared in . Then, ∑ = 1 (1 ≤ ≤ ). Now we can compute the average entropy of the user locations bŷ= International Journal of Distributed Sensor Networks When the algorithm makes all the users appear in a specific tile, say the center tile, then the entropy becomes zero, and the attacker can always pinpoint where the user is. As an opposite case, suppose each user appears in each tile with the same probability; then the entropy is maximized (i.e., entropy = log( )) and the attacker cannot guess the user locations except random guess. Our experiments show that our algorithm achieves near-optimal entropy.

Experiment Results.
In the experiments, we vary the maximum movement speed (Figure 7), request life time (Figures 8 and 9), minimum acceptable area (Figures 11 and  12), number of users (Figure 13), privacy level (Figures 14 and  15), and cell size ( Figure 16). We also analyze the achieved privacy using the metric of entropy in Figure 10.
In summary, the proposed algorithm URALP outperforms MMBPan in success rate, cloaking time, anonymization time, and entropy, whether vehicular or pedestrian movements.
As shown in Figures 7(b That indicates that our proposed algorithm achieves higher efficiency than MMBPan. The reason is that the complex computations to calculate the reachable area in each cell were done before doing anonymization. So, in cloaking process, we only use addition operation to calculate the reachable area in cloaked region. Otherwise, without cell generation, MMBPan has to take more time to finish anonymization. (1) Effect of Speed. Figure 7(a) shows that the higher user speed increases the success rate. The reason is that since user's velocity is proportional to the MMB's size of the user, a faster speed leads to a larger size of MMB. Therefore, having an edge existing between two nodes is more possible to achieve. It means the possibility of finding a cloaked region is much easier. Figures 7(b), 7(c), and 7(d) show that, for the two algorithms, the cloaking time, the anonymization time, and the cloaked area are not affected by the change of the speed.
(2) Effect of Life Time. Figures 8(a) and 9(a) study the effect of the life time of requests on the two algorithms. With prolonging , the success rates of all algorithms increase because fewer requests could be dropped due to expiry of their life time. A longer life time means that a request can wait longer to be cloaked successfully. Moreover, the anonymizer in our algorithm has to take time to extend the MBR. Therefore, when is prolonged from 0.5 s to 3 s, the success rate of URALP greatly increases from 74% to 77.7% and from 73% to 87.4% in vehicle simulation and pedestrian simulation, respectively. After that, from 3 s to 15 s, it slightly grows since not many requests need such long time to be cloaked.
From Figures 8(b) and 9(b), as the anonymizer can take more time to extend the MBR, the cloaking time increases with longer . Figures 8(c)  (3) Effect of Minimum Acceptable Area. This section gives the effect of more strict privacy profile on system performance by increasing the value of minimum acceptable area min . From  Figures 11(a) and 12(a), because two algorithms have to find a larger cloaked region to meet an increased min requirement, their success rates drop with larger min . Figures 11(b), 12(b), 11(c), and 12(c) show that the cloaking time and anonymization time increase with larger min . The reason is that it requires more time to find a larger cloaked region and the requests could be waiting for a longer time. Figures 11(d) and 12(d) show that the average cloaking area increases to satisfy the minimum acceptable area requirement.
Note that URALP generates larger cloaked regions than MMBPan. This is because URALP expands the cloaked area to meet the min requirement.
(4) Effect of Number of Users. As shown in Figure 13(a), the success rate decreases with increasing number of users. This is mainly because of the increased workload (see Figure 13(b)), which makes more requests expire and fail to be cloaked. From Figures 13(b) and 13(c), the cloaking time and the process time increase with increasing number of users, since more users imply more max-cliques and longer search time. The average cloaked areas of two algorithms in Figure 13  The reason is that more requests could be removed from the clique whose number of users is smaller than request's privacy level. Those requests have to wait to be cloaked in other cliques. The average cloaked areas increase with higher privacy level since the request needs to find more neighbors to satisfy the privacy level requirement (Figures 14(d) and 15(d)). It causes less requests to expire before their anonymization succeeds. Therefore, the success rate in Figure 16(a) increases greatly. After that, from 100 m to 800 m, it increases slightly. Downside of large cell is, obviously, to have larger cloaked area sizes as shown in Figure 16(d) since MBR is generated from all cells that contain users in max-clique. In other words, it could decrease the quality of service. So, to maximize the success rate while minimising the cloaked region size, the optimal cell size can be chosen based on a specific map. In this paper, we pick the cell size of 100 m and 1.6 m for vehicles and pedestrian, respectively. this experiment, we divided each reachable region into 100 × 100 tiles, that is, = 10000. Therefore, the maximum entropy is log(10000) = 13.287.
In Figure 10, we notice that entropy of our proposed algorithm is significantly higher than that of MMBPan under various conditions such as time, area, and privacy level. The reason is that, in MMBPan, cloaked region is MBR of users in max-clique, so users are mostly located in the boundary of cloaked region. Otherwise, URALP expands the cloaked area to meet the min requirement; thereby entropy of URALP is higher. This indicates that URALP provides more uncertainty so as to reduce the chance of being discovered by adversaries. Particularly, the entropy that URALP achieves is from 12 to 12,6 nearly reaching the maximum entropy. It shows that the distribution of users in cloaked region is almost uniform when using URALP method.

Related Work
The concept of location -anonymization was first studied by Gruteser and Grunwald [2]. Under the centralized location anonymization architecture, they introduced a scheme in which the spatial and temporal accuracy of location information is reduced such that at least users are indistinguishable. This basic model has been extended in several ways. For example, Bettini et al. [16] have introduced the concept of historical -anonymity for preserving privacy. Mokbel et al. [17] have integrated both anonymity and obfuscation to protect privacy using a centralized system. They calculated the obfuscation area of the users in their Casper framework based on the user-defined values of and an area value min indicating that the user wants to hide his/her location within an area size of at least min . Recently, Niu et al. [18]  proposed a dummy location selection algorithm to achieve -anonymity for users in LBS. They have considered both entropy and cloaked region to maintain the entropy while carefully choosing dummy locations to achieve -anonymity.
The MMB attack has been studied in some previous work. The problem was first introduced by Reynold et al. [8]. They proposed two simple solutions, namely, patching and delaying. Patching solution enlarges the previous cloaked region to cover the current one so that the overlapped area with the MMB is at least as large as the current cloaked region. Its disadvantage is that the size of cloaked regions is increasing when time goes past. That will result in expensive query process cost. If there is a constraint Amax, which is maximum cloaking region size a user can tolerate, the increasing cloaked region is easy to exceed Amax. Therefore, success rate of anonymizing will be low. Delaying solution postpones the query until the MMB covers the current cloaked region. However, the user may have moved to another location and the current cloaked region also is changed. If we treat the new cloaked region as the old one, it will affect the accuracy of query result. Time delaying also results in bad services quality. Ghinita et al. [7] proposed spatial and temporal transformations to preserve user privacy. Moreover, they considered the scenario that the attacker knows the placement of sensitive locations. However, in these papers, the identity of the user is known, and the objective is to protect the exact location of the user. In other words, the privacy model is only the granularity of cloaked regions, without considering the location -anonymity. So they fail to protect the user identity in case there is only International Journal of Distributed Sensor Networks one user in the cloaked region. These studies cannot work effectively in the situation of constrained movements. Gedik and Liu [12,19] proposed a personalizedanonymity model and proposed CliqueCloak, which constructs an undirected graph for all the requests that have not been anonymized yet to combine users that can share the same cloaked region. Our work also employs a graph model but differs from the underlying problems and the methods for finding cliques. The proposed algorithm exhaustively searches the graph for cliques covering the new request to generate candidate cloaked region. In contrast, based on [6], to reduce the computational complexity, our algorithm incrementally maintains maximal cliques. Their cloaking algorithm protects location privacy against snapshot location attacks, while our proposed algorithm can prevent the MMB attack with knowledge of all the cloaked location updates.
Similarly, Pan et al. [6] proposed an incremental cliquebased cloaking algorithm, called ICliqueCloak to make sure that all users in the cloaked region are in the MMB of each other. Therefore, the intersection area between the cloaked region and the MMBs is the cloaked region itself. In other words, the cloaked area is not reduced by MMB attack. They used the graph model to solve location -anonymity problem. However, different from our work, this study works in an open-space environment only and its effect is significantly limited for the constrained movements. Moreover, they also performed wrong implementation because of miscalculating the reachable areas of users. In their implementation, moving objects can only move on the roads instead of anywhere; but they calculate that the reachable area is the MBR of the users.
In [20], the authors adapted location entropy measure defined in Cranshaw et al. [21] to study the privacy of 16 International Journal of Distributed Sensor Networks a location. Location entropy measures the frequency of users' visits to a given location. They investigated how entropy impacts user perceptions of location privacy by showing that users are more comfortable in sharing high entropy locations than low entropy locations. Xu et al. [22] also developed a location cloaking technique to resist MMB attack. They built the cloaked regions based on two privacy metrics: entropy and minimum acceptable cloaking area. However, they did not consider the -anonymity to protect user identity.

Conclusion
In this work we have proposed a greedy algorithm against the MMB attack that may infer user's exact location with knowledge of regions in the map. To address this problem, we have employed a grid structure to extend the cloaked region that does not satisfy minimum area requirement. We showed that the existing algorithm against the MMB attack cannot work effectively with considering the accurate calculation of the reachable areas. The presented experiment results examine the effectiveness and efficiency of our proposed algorithm under various settings.