A New Privacy-Preserving Scheme for Continuous Query in Location-Based Social Networking Services

With continuous queries widely used in location-based mobile social networking services, how to effectively protect location privacy for continuous queries has become a hot research topic. In this paper, we analyze the existing location privacy protection systems and algorithms for location-based services. Considering their disadvantages of slow response time and high anonymization cost, we propose a new enhanced greedy cloaking algorithm. It predicts a cloaking area at the initial query and uses it as the cloaking region for the whole query lifetime, based on the combined computation of a privacy monitor, a quality monitor, and a dynamic adjuster. The privacy monitor and the quality monitor control the privacy protection level and the service quality, respectively; the dynamic adjuster adjusts the center point of the cloaking circle dynamically. We use a circle as the form of the cloaking region, which effectively reduces the computation overhead. We compare the algorithm with the earlier greedy algorithm in three aspects. The experimental results show that the enhanced greedy cloaking algorithm outperforms the original greedy algorithm in average response time or anonymization cost.


Introduction
With the rapid development of location-based mobile social networking services (LBMSS) [1,2], location information has become a key factor in improving their quality. However, while users enjoy the convenience and efficiency of LBMSS, location privacy leakage has become more serious than before. For example, Loopt [3] is a popular mobile social networking application that creates a map on the phone showing where users' friends live and where they have been; when users want to keep chatting with their nearest friends, they have to continuously query the friends' locations. An adversary can infer these people's private information from the history of their query contents and locations. Although there are mature measures such as the k-anonymity scheme [4][5][6], most of them address snapshot queries, in which the contents of any two consecutive queries from one user are uncorrelated. In mobile social networks, however, users often need to repeat queries on the same content over a period of time. For instance, in the mobile social navigation application WAZE [7], road traffic information is gathered from reports by all users based on their current positions. To report the traffic situation at their current locations over time, users may need to continuously query the latitude and longitude of where they are. We call this kind of query a continuous query. Directly applying privacy protection algorithms designed for snapshot queries to continuous queries causes problems: adversaries can infer users' exact positions from contextual knowledge [8,9]. As an example, a common traditional location protection method for snapshot queries replaces a user's location with a spatial region containing other users who issue queries in the same time period.
However, if this method is applied to continuous queries, it produces many consecutive spatial regions at different times that are likely to intersect one another, and the intersection can be used to infer the user's location. Later, some researchers [10][11][12] proposed schemes in which a region generated at the first query serves as the cloaked region for the whole life cycle of the continuous query, which removes the intersection inference problem. However, how to select the first cloaked region, and how to prevent the mobile users of a continuous query from converging to one point or becoming too dispersed, remain open research problems.

Background
Traditional privacy protection algorithms such as k-anonymity (which replaces a querying user's location with a region containing at least k users) mainly target snapshot queries, in which a user issues a single query for each point of interest and each query is uncorrelated with the others. In the physical world, however, continuous queries are more common than snapshot queries, for example GPS navigation toward a user's nearest friend over a period of time. When querying users are moving, adversaries can collect the different cloaked regions to infer the users' exact locations or movement routes.
Here we give the details of some typical location privacy inference problems.

Cloaked Regions Intersection Problem.
The cloaked regions intersection problem means that the adversary may infer a user's exact location without any contextual knowledge about the user: the adversary collects the consecutive cloaked regions and computes their intersection to locate the user.
When a user sends a continuous query to the location anonymizer (the server responsible for anonymizing users' queries), the anonymizer generates a different cloaked region at each time point for that query. The adversary can exploit these cloaked regions to find out the user's location. An example is presented in Figure 1.
In Figure 1, mobile user A issues a 3-anonymity continuous query, which generates three different cloaked regions at three consecutive time points. Figure 1(a) shows cloaked region a at time $t_i$, which includes users A, D, and E. Because users keep moving, cloaked region b at $t_{i+1}$ differs from a: it includes users A, E, and F, while D is no longer inside the region. At $t_{i+2}$, cloaked region c is generated, containing users A, B, and H. If an adversary can intercept the data of the three cloaked regions, they only need to compute the intersection to find that the query comes from user A.
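The attack can be illustrated at the level of user sets: since every cloaked region must contain the querier, intersecting the memberships of the regions observed over time leaves only users who appear in all of them. The Python sketch below uses the memberships from the Figure 1 example; the helper name `intersect_regions` is hypothetical and is not part of any scheme discussed in this paper.

```python
# Members of each cloaked region of user A's 3-anonymity continuous query,
# taken from the Figure 1 example.
regions = {
    "t_i":   {"A", "D", "E"},   # cloaked region a
    "t_i+1": {"A", "E", "F"},   # cloaked region b
    "t_i+2": {"A", "B", "H"},   # cloaked region c
}

def intersect_regions(member_sets):
    """Return the users present in every observed cloaked region (hypothetical helper)."""
    sets = list(member_sets)
    common = set(sets[0])
    for s in sets[1:]:
        common &= s
    return common

suspects = intersect_regions(regions.values())
print(suspects)  # {'A'}
```

Each extra time point can only shrink the candidate set, which is why longer continuous queries leak more under per-snapshot cloaking.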

Privacy Inference Problems Based on Contextual Knowledge.
Location data in location-based mobile social networking services commonly consist of users' coordinates {longitude, latitude}, velocity, and direction. We call these data users' contextual information. If one or more of these items are known to the adversary, the adversary can easily infer the users' sensitive information. Figure 2 shows how privacy is inferred when the adversary knows mobile users' direction and velocity.
As shown in Figure 2(a), the black solid point is the user issuing queries and the gray solid points are other mobile users. A 3-anonymity cloaked region, the black rectangle, is generated at $t_1$. Suppose the query user moves parallel to the x axis at a constant speed $v$; the adversary can then compute the possible cloaked region from $v$ and the direction. The adversary shifts the left and right boundaries of the cloaked region at $t_1$ by $v(t_2 - t_1)$; at $t_2$, the gray area, which is the intersection of the inferred cloaked region and the real cloaked region, contains only one user, so the query user is exposed. In Figure 2(b), the velocity of the query user lies in a range $(v_{\min}, v_{\max})$, so the attacker does not know the exact velocity; however, the inference process is similar. The attacker uses $v_{\min}$ and $v_{\max}$ to compute the possible cloaked region on the left and right sides: at $t_2$, the left boundary moves by $v_{\min}(t_2 - t_1)$ and the right boundary by $v_{\max}(t_2 - t_1)$, and the query user is still exposed. Both attacks rely on knowing the users' direction. However, even without direction information, privacy can still leak.
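A one-dimensional sketch of the Figure 2 attack follows, assuming the querier moves along the +x axis and that only the x-extent of each cloaked region matters. The helper names and sample coordinates are hypothetical; setting $v_{\min} = v_{\max} = v$ reduces it to the constant-speed case of Figure 2(a).

```python
def inferred_extent(prev, dt, v_min, v_max):
    """x-extent where the querier can be at t2, given its t1 region `prev`
    and a speed in [v_min, v_max] along +x (assumed direction)."""
    left, right = prev
    return (left + v_min * dt, right + v_max * dt)

def overlap(a, b):
    """Intersection of two closed intervals, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def exposed_users(prev, real, members_x, dt, v_min, v_max):
    """Members of the real t2 region whose x-coordinate lies in the overlap of
    the inferred extent and the real region (the gray area of Figure 2)."""
    gray = overlap(inferred_extent(prev, dt, v_min, v_max), real)
    if gray is None:
        return []
    return [u for u, x in members_x.items() if gray[0] <= x <= gray[1]]

# Hypothetical data: t1 region [0, 4]; t2 region [-2, 2] with three members.
members = {"U1": -1.5, "U2": 0.5, "Q": 1.5}
print(exposed_users((0.0, 4.0), (-2.0, 2.0), members, 1.0, 1.0, 1.0))  # ['Q']
```

Here the inferred extent is $[1, 5]$, its overlap with the real region is $[1, 2]$, and only user Q lies inside it.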
As shown in Figure 3(a), the direction of the mobile user is unknown but lies in an angle range $(\theta_{\min}, \theta_{\max})$, and the velocity is a constant $v$. Based on this knowledge, the adversary can compute a possible cloaked region (the irregular closed region bounded by the dotted line in Figure 3(b)). This means that even when the direction is only known to lie within a range, the earlier attack still works.
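The Figure 3 attack can be approximated by sampling headings: a point $q$ at $t_2$ is consistent with the observed region at $t_1$ if, for some heading $\theta \in [\theta_{\min}, \theta_{\max}]$, stepping back from $q$ by $v(t_2 - t_1)$ along $\theta$ lands inside that region. This Python sketch assumes an axis-aligned rectangular region and a fixed sampling resolution (`steps`); both are illustrative assumptions, and the test is only approximate near the boundary of the irregular region.

```python
import math

def consistent_with_region(q, rect, v, dt, theta_min, theta_max, steps=360):
    """Approximate check: could a user inside `rect` = (x0, y0, x1, y1) at t1,
    moving at constant speed v with heading in [theta_min, theta_max]
    (radians), be at point q at t2 = t1 + dt? Headings are sampled uniformly."""
    x0, y0, x1, y1 = rect
    d = v * dt
    for i in range(steps + 1):
        theta = theta_min + (theta_max - theta_min) * i / steps
        # Step back from q along the sampled heading to a candidate t1 position.
        px = q[0] - d * math.cos(theta)
        py = q[1] - d * math.sin(theta)
        if x0 <= px <= x1 and y0 <= py <= y1:
            return True
    return False

# Hypothetical example: region [0, 2] x [0, 2] at t1, v = 1, dt = 1,
# heading somewhere between east (0) and north (pi / 2).
region = (0.0, 0.0, 2.0, 2.0)
users_t2 = {"U1": (3.0, 1.0), "U2": (-0.5, 1.0), "U3": (2.5, 2.5)}
candidates = [u for u, p in users_t2.items()
              if consistent_with_region(p, region, 1.0, 1.0, 0.0, math.pi / 2)]
print(candidates)  # ['U1', 'U3']
```

Users whose positions fail this test can be ruled out, so the real cloaked region again shrinks toward the querier.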
The above analysis shows that the ability of existing methods to protect the privacy of continuous queries is very limited. The adversary can analyze a user's independent cloaked regions or other background knowledge, and the user's privacy may then be inferred and exposed. Therefore, new privacy-preserving schemes are needed for continuous queries in location-based social networking services.
Several schemes have been proposed for protecting the privacy of continuous queries. Chow et al. [11,12] proposed a scheme in which the users included in the initial cloaked area remain in the cloaked area for the whole query life cycle; in other words, the set of users selected at the first query is still used at the end of the continuous query, as shown in Figure 4.
User A sends a 3-anonymity query. At the initial time $t_1$, {A, E, D} are the users anonymized together in the generated cloaked region; at $t_2$, the cloaked region keeps {A, E, D} as the cloaking users and adjusts its size according to the distances the three users have moved from $t_1$ to $t_2$; the same process is executed at $t_3$. Thus, the adversary cannot identify the real user by intersecting the consecutive cloaked regions. However, the biggest drawback is that users who are close together at the initial time move in different directions and at different speeds as time goes on. They may become dispersed, making the cloaked region so large that it degrades the quality of the location-based social networking service, or they may converge to the same position, which also exposes their privacy. To solve this problem, [13] proposes a privacy model and a quality model that select the mobile users in the initial cloaked region by considering their possible movement areas during the valid query life cycle. The privacy model mainly limits the length and width of the cloaked region to guarantee that the users of a continuous query do not converge to one point; the quality model monitors and adjusts the distortion level to guarantee service quality. The two models have disadvantages, however. First, computing the rectangular region requires two parameters (length and width), which costs server computation. Second, the system must update the length and width in real time by monitoring users' movements and predicting their next positions, which increases the burden on the server.
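The idea of keeping the initial member set fixed can be sketched by recomputing, at each time step, the smallest axis-aligned rectangle covering the same users. The trajectories below are made up to show how the rectangle grows when members move apart, which is the drawback discussed above; the function names are hypothetical.

```python
def bounding_box(points):
    """Smallest axis-aligned rectangle (x0, y0, x1, y1) covering the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def fixed_member_regions(trajectories):
    """trajectories maps each member to its positions at t1, t2, ...
    Returns one cloaked rectangle per time step for the same member set."""
    steps = len(next(iter(trajectories.values())))
    return [bounding_box([traj[t] for traj in trajectories.values()])
            for t in range(steps)]

def perimeter(rect):
    x0, y0, x1, y1 = rect
    return 2 * ((x1 - x0) + (y1 - y0))

# Hypothetical trajectories of {A, E, D} at t1, t2, t3.
trajectories = {
    "A": [(0, 0), (1, 0), (2, 0)],
    "E": [(1, 1), (1, 3), (1, 5)],
    "D": [(0, 1), (-1, 1), (-2, 1)],
}
regions = fixed_member_regions(trajectories)
print([perimeter(r) for r in regions])  # [4, 10, 18]
```

The member set never changes, so intersection attacks fail, but the region perimeter more than quadruples in two steps.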
4 International Journal of Distributed Sensor Networks
To solve the second problem, [14] proposes a method that cloaks enough factors of the mobile users (location coordinates, velocity, and direction) so that the possible cloaked regions calculated by adversaries lie entirely inside the real cloaked regions, as shown in Figure 5. However, although this method can hide users' velocity and direction, its accuracy depends on the estimation of mobile users' future positions. In general, the estimated positions of continuous-query users are not as precise as their actual positions, so the cloaking result may be inaccurate.
Continuous queries are among the most common query types in location-based services; however, current methods for protecting their privacy are few and computationally expensive. We therefore propose a new scheme that reduces the computation needed to estimate users' future locations, in order to relieve the load on anonymizer servers.

Enhanced Greedy Algorithm Scheme
In our scheme, we adopt the greedy algorithm as the basic idea and improve it by changing the form of the cloaked region and adding an adjustment algorithm; at the same time, we improve the privacy and quality models to suit the new setting. The major notations are listed in the Notations section.

Definitions.
Assumption: each user sends only one query at a given time; in other words, each query at a given time corresponds to exactly one user.
Here we give some formal definitions.
Definition 1. Each query from a mobile user is defined as a quintuple $Q = (\mathrm{QID}, \mathrm{LOC}, \mathrm{maxSpeed}, t_s, t_e)$: QID is the identity of the query, which uniquely identifies it; LOC is the coordinate $(x, y)$ of the place where the query is sent out; maxSpeed is the user's maximum moving speed, represented as a vector $(v_x, v_y)$, where $v_x$ is the component along the x axis and $v_y$ the component along the y axis; $t_s$ is the time stamp at which the query is initialized and $t_e$ is the time at which the query expires.
Currently, most cloaking regions are represented as rectangles, given by the coordinates of the bottom-left and top-right corners. In our scheme, we use a circle as the cloaking region, represented by its center coordinate and radius, defined as follows.
Definition 2 (cloaked circle, CS). Each cloaked region, represented as $CS = (\mathrm{CID}, \mathrm{LOC}, \mathrm{Ra}, \mathrm{QS})$, is a spatial-temporal area that includes at least k users: CID is the identity that uniquely identifies the cloaked region; LOC is the center coordinate of the cloaked circle; Ra is the radius of the circle; QS is the set of queries Q included in CS.
As shown in Figure 6(a), the rectangle represents the whole system, points represent users who issue queries, dashed circles represent cloaked circles, black points inside a cloaked circle represent anonymized users, and gray points outside the circle represent the surrounding mobile users. Suppose mobile user U6 issues a 3-anonymity query; then the dashed circle is a qualified cloaked circle.
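The quintuple of Definition 1 and the cloaked circle of Definition 2 map naturally onto two record types. This Python sketch uses hypothetical field names (`t_s`, `t_e`, `max_speed`) and treats coverage as Euclidean distance from the circle center; it illustrates the definitions and is not the authors' implementation.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Query:
    """Definition 1: Q = (QID, LOC, maxSpeed, t_s, t_e); field names hypothetical."""
    qid: str
    loc: Tuple[float, float]        # (x, y) where the query is issued
    max_speed: Tuple[float, float]  # maxSpeed as a vector (v_x, v_y)
    t_s: float                      # time stamp when the query is initialized
    t_e: float                      # time when the query expires

@dataclass
class CloakedCircle:
    """Definition 2: CS = (CID, LOC, Ra, QS)."""
    cid: str
    loc: Tuple[float, float]        # center of the circle
    ra: float                       # radius
    qs: List[Query] = field(default_factory=list)

    def covers(self, q: Query) -> bool:
        return math.dist(self.loc, q.loc) <= self.ra

    def is_k_anonymous(self, k: int) -> bool:
        """At least k queries, all located inside the circle."""
        return len(self.qs) >= k and all(self.covers(q) for q in self.qs)

qs = [Query(f"q{i}", loc, (1.0, 0.0), 0.0, 10.0)
      for i, loc in enumerate([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], start=1)]
cs = CloakedCircle("c1", (0.0, 0.0), 1.0, qs)
print(cs.is_k_anonymous(3), cs.is_k_anonymous(4))  # True False
```

A circle needs only a center and one radius, whereas a rectangle needs two corners; this is the storage side of the computation saving the scheme relies on.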
Definition 3 (distance circle). The distance circle from a query user $q_i$ to another user $q_j$ at time $t$ is the circle centered at the query user's location whose radius is the distance between the two users at that time. If the query user's coordinate is $(x_i, y_i)$ and the other user's coordinate is $(x_j, y_j)$, the center of the distance circle is $(x_i, y_i)$ and its radius is the Euclidean distance $\mathrm{Ra} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$. In Figure 6(b), the dashed circle is the distance circle from user U6 to U9.
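Definition 3 amounts to a center and a Euclidean radius. A minimal Python sketch (the function name is hypothetical):

```python
import math

def distance_circle(query_loc, other_loc):
    """Return (center, radius) of the distance circle from the query user to
    another user: centered on the query user, radius = Euclidean distance."""
    (xi, yi), (xj, yj) = query_loc, other_loc
    ra = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2)
    return (xi, yi), ra

print(distance_circle((1.0, 2.0), (4.0, 6.0)))  # ((1.0, 2.0), 5.0)
```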
Definition 4 (cloaking angle, CA). The cloaking angle is the angle between the x axis and the hypotenuse of the triangle formed by Ra and the radius of the cloaking circle, as shown in Figure 7.
As Figure 7 shows, CA lies between 0° and 90° (0° < CA < 90°); the farther the other user is from the query user, the larger CA is.

Cloaking Model.
We improve the cloaking model with a privacy monitor, a quality monitor, and a dynamic adjuster.
The privacy monitor guarantees that, over the whole query life cycle, no two queries in the cloaked region can converge to one point.
Here, $t_0$ is the generation time of the cloaked circle and $v_{\max} = \max_{q \in CS}(q.\mathrm{maxSpeed})$. Under this condition, the cloaked circle satisfies the privacy monitor requirement.
The quality monitor mainly prevents the queries of the mobile users from becoming too scattered; therefore, we need to forecast each querier's furthest possible location from its maximum velocity.

Enhanced Greedy Cloaking Scheme.
The main idea of the enhanced greedy cloaking algorithm (EGCA) is as follows. When mobile users request location-based services, they send anonymization requests to the anonymizer. After receiving a request, the server checks each unexpired query in the candidate cloaking set QS one by one and judges whether the query being anonymized, together with that candidate, satisfies the quality monitor model; if so, the candidate is added to the anonymizing set; if not, the server moves on to the next query, until all queries in QS have been checked (Algorithm 1). The server then compares the size of the resulting set with the user's anonymity requirement k. If it is at least k, the server forms a cloaked circle and checks whether it satisfies the privacy monitor model; if so, it calls the center adjustment function to move the center of the cloaked circle. If the privacy monitor is not satisfied, or if fewer than k queries were found, the query is inserted into QS. The algorithm can be divided into four parts: the improved greedy algorithm, the quality monitor model, the privacy monitor model, and the center adjustment algorithm (Algorithm 4). The details are as follows.
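The control flow just described can be outlined as follows. This is a hedged Python sketch: the quality check, privacy check, and center adjustment are passed in as hypothetical callables standing in for Algorithms 2 to 4, the circle is assumed to be centered on the querier with a radius reaching the furthest selected member (a distance circle), and `None` signals that the query must wait in QS.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]

def egca_sketch(
    query: Point,
    candidates: Sequence[Point],
    k: int,
    quality_ok: Callable[[List[Point], Point], bool],
    privacy_ok: Callable[[List[Point]], bool],
    adjust_center: Callable[[Point, float, List[Point]], Point],
) -> Optional[Tuple[Point, float, List[Point]]]:
    """Greedy cloaking skeleton (hypothetical names, not the authors' code).

    Returns (center, radius, members), or None when the query must wait in QS.
    """
    selected: List[Point] = [query]
    # Check every unexpired candidate query one by one (quality monitor).
    for cand in candidates:
        if quality_ok(selected, cand):
            selected.append(cand)
    if len(selected) < k:
        return None  # not enough qualified queries yet
    # Circle centered on the querier, radius reaching the furthest member.
    radius = max(math.dist(query, p) for p in selected)
    if not privacy_ok(selected):
        return None  # members could converge to one point
    return adjust_center(query, radius, selected), radius, selected

result = egca_sketch(
    (0.0, 0.0), [(1.0, 0.0), (5.0, 5.0), (0.0, 1.5)], 3,
    quality_ok=lambda sel, c: math.dist(sel[0], c) <= 2.0,
    privacy_ok=lambda members: True,
    adjust_center=lambda center, r, members: center,
)
print(result)  # ((0.0, 0.0), 1.5, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5)])
```

Passing the three checks as callables keeps the greedy loop independent of how each monitor is implemented.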
To ensure that the cloaked region does not become so scattered that it affects service quality and efficiency during the query life cycle, we adopt the quality monitor algorithm, which predicts all possible locations of each querier from its maximum velocity at the initial time; qualified queriers are then selected for anonymization and unqualified ones are excluded, as shown in Figure 8(a).
For the query shown in Figure 8(a), the arrows represent the maximum possible velocities of the queriers and the dashed circles represent all their possible positions in the whole query life cycle. Suppose $q$ is a querier who issues a 2-anonymity request, and $q_1$ and $q_2$ are other queriers waiting to be anonymized; $q_1'$ and $q_2'$ are the furthest possible positions of $q_1$ and $q_2$ relative to $q$. As a conservative calculation, we compute the cloaking angles $\mathrm{CA}(q, q_1')$ and $\mathrm{CA}(q, q_2')$ and select the queriers inside the angle region. In the example shown, $\mathrm{CA}(q, q_1')$ is larger than $\mathrm{CA}(q, q_2')$, which means that the area $q_1$ might move into is larger, so $q_1$ is not selected. The solid circle is the distance circle from $q$ to $q_2$. The details of the algorithm are shown in Algorithm 2.
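The exact geometry of the cloaking angle is given only in Figure 7, so the following Python sketch substitutes a simpler proxy and should not be read as Algorithm 2. Each candidate's furthest predicted position is assumed to be its location plus its maxSpeed vector times the query lifetime, and candidates are ranked by how far that position lies from the querier; in the spirit of Figure 8(a), the candidate that can move furthest away is dropped first. Function names and data are hypothetical.

```python
import math

def furthest_position(loc, max_speed, lifetime):
    """Location reached by moving at maxSpeed (v_x, v_y) for the whole lifetime."""
    return (loc[0] + max_speed[0] * lifetime, loc[1] + max_speed[1] * lifetime)

def select_by_reach(query_loc, candidates, k, lifetime):
    """Proxy for the CA-based selection (an assumption, not Algorithm 2):
    rank candidates by the distance from the querier to their furthest
    predicted position and keep the k - 1 with the smallest reach."""
    reach = {
        cid: math.dist(query_loc, furthest_position(loc, speed, lifetime))
        for cid, (loc, speed) in candidates.items()
    }
    return sorted(reach, key=reach.get)[: k - 1]

# Hypothetical 2-anonymity request in the spirit of Figure 8(a).
candidates = {"q1": ((1.0, 0.0), (1.0, 0.0)),   # fast, moving away
              "q2": ((0.0, 2.0), (0.0, 0.1))}   # slow
print(select_by_reach((0.0, 0.0), candidates, 2, 10.0))  # ['q2']
```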
Algorithm 2 prevents the queriers from becoming too dispersed from the initial time; the problem that queriers might converge to one point must be handled after the queriers are selected. To solve it, we propose a further algorithm, the privacy monitor (Algorithm 3). Working conservatively, it computes all possible future positions of each querier in the cloaked circle and builds an equation set EQ of these positions to judge whether EQ has a solution. If it does, the queriers' possible regions can overlap and the queriers might converge to one point; otherwise, the privacy monitor model is satisfied. In Figure 8(b), the small solid circles are mobile queriers, the large circle is the 3-anonymity cloaked circle, and the dashed circles are the scopes of all possible positions of each querier in the whole query life cycle. The location scope of each querier is thus a disk centered at the start position of its query, whose radius is the product of its maximum possible velocity and the time from the formation of the cloaked circle to the end of the query; we then check whether any of these disks intersect, and if none do, the privacy monitor requirement is satisfied.
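The pairwise version of this check is straightforward with disks: two reachable disks intersect exactly when the distance between their centers is at most the sum of their radii. The Python sketch below assumes each querier's reach is a disk whose radius is the magnitude of its maxSpeed vector times the period length; the function name is hypothetical, and it tests pairs rather than solving the equation set EQ symbolically.

```python
import math
from itertools import combinations

def privacy_monitor_ok(starts, max_speeds, t_start, t_end):
    """starts: start positions (x, y) of the queriers in the cloaked circle.
    max_speeds: maxSpeed vectors (v_x, v_y). Each querier's reachable area is
    a disk centered at its start with radius |maxSpeed| * (t_end - t_start).
    Returns True when no two reachable disks intersect."""
    dt = t_end - t_start
    radii = [math.hypot(vx, vy) * dt for vx, vy in max_speeds]
    for i, j in combinations(range(len(starts)), 2):
        # Two disks intersect iff center distance <= sum of radii.
        if math.dist(starts[i], starts[j]) <= radii[i] + radii[j]:
            return False
    return True

starts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
speeds = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(privacy_monitor_ok(starts, speeds, 0.0, 2.0))  # True
print(privacy_monitor_ok(starts, speeds, 0.0, 6.0))  # False
```

With the longer period, the disks of the first two queriers (radius 6 each, centers 10 apart) overlap, so the circle would be rejected.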
Since we use a circle as the cloaked region and its center is the querier's location, we need to adjust the circle center to prevent adversaries from taking the center of the cloaked circle as the querier's position. The main idea of the adjustment is to generate two random numbers based on the positions inside the circle, as shown in Figure 9.
The algorithm moves the center of the circle while keeping its size unchanged. This effectively prevents adversaries from identifying the querier's position by computing the center of the cloaked circle.
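One way to realize the adjustment is sketched below in Python. Assumptions: the two random numbers are a distance and an angle, the offset is drawn uniformly from a disk of the same radius around the querier (so the querier stays strictly inside), and a candidate center is accepted only if all selected members remain covered, falling back to the original center after `max_tries` attempts. The paper specifies only that two random numbers are generated, so these choices are illustrative.

```python
import math
import random

def adjust_center(query_loc, radius, members, rng=None, max_tries=100):
    """Randomly move the circle center while keeping the radius fixed.

    The offset (two random numbers: distance and angle) is drawn uniformly
    from a disk of the given radius around the querier, so the querier stays
    strictly inside. As an added assumption, a new center is kept only if
    every member is still covered; otherwise the original center is returned.
    """
    if rng is None:
        rng = random.Random()
    for _ in range(max_tries):
        r = radius * math.sqrt(rng.random())   # sqrt gives uniform area density
        phi = rng.uniform(0.0, 2.0 * math.pi)
        center = (query_loc[0] + r * math.cos(phi),
                  query_loc[1] + r * math.sin(phi))
        if all(math.dist(center, m) <= radius for m in members):
            return center
    return query_loc
```

Seeding `rng` (for example `random.Random(0)`) makes a run reproducible in tests; in deployment an unseeded generator keeps the offset unpredictable.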

Evaluation
We simulated mobile users and generated their location data with the Brinkhoff generator [15]. The input is the OldenburgGen road network map, the output is the location data of mobile users' queries, which is stored in an Oracle 10g database, and the algorithms are implemented in Java. We analyze the feasibility and effectiveness of the algorithms in three aspects: average response time, average anonymization cost, and anonymization success rate.

Simulator Configuration (Brinkhoff Generator).
In our experiment, we use the Brinkhoff generator as the simulator; it simulates objects moving on city roads and provides two basic city road maps, OldenburgGen and San Francisco. We adopt the OldenburgGen road network map, which includes 6105 nodes and 7035 edges. The generated data is stored in an Oracle 10g database, the algorithms are implemented in Java, and the simulation computer has an Intel i3 3.20 GHz CPU and 2 GB of memory. The input parameters of the Brinkhoff generator are shown in Table 1.
After configuration, the Brinkhoff generator is run to generate the location data of mobile users and external objects. Figure 10 shows the 395th time stamp after running the Brinkhoff generator.
In total, 263434 records were generated in the moving objects table and 12078 records in the external objects table. The parameters of the algorithms are configured as shown in Table 2.

Simulation Analysis.
We compare our algorithm EGCA with the original greedy cloaking algorithm (GCA) in terms of average response time, average anonymization cost, and anonymization success rate.

Average Response Time.
Average response time is the average time from when a query is sent out until its cloaked region is received. Only successfully anonymized queries are counted. The comparison of average response time is shown in Figure 11.
As Figure 11 shows, the average response time varies with the anonymity degree k: the larger k is, the longer the average response time and the faster it grows. As k increases, more mobile queries must be processed and more computation is needed to obtain the distance circles, which increases the load on the anonymizer and lengthens the response time.

Average Anonymization Costs.
Average anonymization cost represents the cost for the anonymizer to process query requests after they are sent out. In GCA, the average anonymization cost is defined as the average perimeter of the cloaked rectangle over the query life cycle; the larger the average perimeter, the higher the cost. In EGCA, the average anonymization cost is defined as the average perimeter of the cloaked circle over the query life cycle. Figure 12 shows the comparison. In Figure 12, the average anonymization cost of both algorithms rises as k increases. The two curves intersect: before the intersection point, the average anonymization cost of EGCA is higher than that of GCA, while after it, EGCA has the lower cost. Both algorithms use the average perimeter as the anonymization cost, and both the rectangle's perimeter and the circle's diameter are determined by the two furthest queries along the x and y coordinates. Let $r$ be half the distance between the two furthest queries, so that the circle's perimeter is $2\pi r \approx 6.28r$. When the rectangle's perimeter equals $6.28r$, the two costs are identical; when it is smaller than $6.28r$, the average anonymization cost of EGCA is higher, and when it is larger than $6.28r$, the cost of EGCA is lower than that of GCA.
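The cost comparison can be checked numerically under the same convention: the rectangle's perimeter is $2(w + h)$ from the x and y extents, and the circle's perimeter is $2\pi r$ with $r$ half the distance between the two furthest queries. Note that a circle with this diameter need not enclose every query; the Python sketch mirrors the cost definition above rather than computing a minimum enclosing circle. Function names and sample points are hypothetical.

```python
import math
from itertools import combinations

def rectangle_cost(points):
    """GCA-style cost: perimeter 2(w + h) of the axis-aligned extent."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return 2 * ((max(xs) - min(xs)) + (max(ys) - min(ys)))

def circle_cost(points):
    """EGCA-style cost: 2*pi*r, with r half the largest pairwise distance."""
    d = max(math.dist(a, b) for a, b in combinations(points, 2))
    return 2 * math.pi * (d / 2)

diamond = [(0.5, 0.0), (0.5, 1.0), (0.0, 0.5), (1.0, 0.5)]
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(rectangle_cost(diamond), round(circle_cost(diamond), 2))  # 4.0 3.14
print(rectangle_cost(square), round(circle_cost(square), 2))    # 4.0 4.44
```

For the diamond, the rectangle perimeter (4) exceeds $6.28r = \pi$, so the circle is cheaper; for the square, $6.28r = \pi\sqrt{2} \approx 4.44$ exceeds 4, so the rectangle is cheaper, matching the crossover described above.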

Anonymization Success Rate.
Anonymization success rate is the ratio of the number of queries that receive a cloaked circle to the total number of query requests; it reflects the feasibility of an algorithm. Figure 13 compares the anonymization success rates of EGCA and GCA as a column chart for anonymity degrees from 3 to 8. As Figure 13 shows, the success rate of EGCA is equal to or slightly higher than that of GCA when k is relatively small, and slightly lower than that of GCA when k becomes larger. When EGCA searches for qualified mobile queries, its quality monitor condition is fixed and strict, whereas the condition in GCA changes. When k is small, relatively few qualified queries are needed; when k is large, the fixed condition excludes some mobile queries, so the success rate starts to decline and falls below that of GCA. However, the simulation shows that even when k is 10, the success rate remains above 90%, which is acceptable for practical applications.

Conclusion
In this paper, we analyze the existing location privacy protection systems and algorithms for location-based services. Considering their disadvantages of slow response time and high anonymization cost, we propose a new enhanced greedy cloaking algorithm. It predicts a cloaking area at the initial query and uses it as the cloaking region for the whole query lifetime, under the combined control of a privacy monitor, a quality monitor, and a dynamic adjuster. The privacy monitor and the quality monitor control the privacy protection level and the service quality, respectively; the dynamic adjuster adjusts the center point of the cloaking circle dynamically. We use a circle as the form of the cloaking region, which effectively reduces the computation overhead. We compare the algorithm with the earlier greedy algorithm in three aspects. The experimental results show that the enhanced greedy cloaking algorithm outperforms the original greedy algorithm in average response time or anonymization cost.

Notations
Q: Query from a mobile user
QID: ID of a query
LOC: Current location when a query is issued
maxSpeed: Maximum moving speed of a user, represented by a vector
$t_s$: The time stamp at which a query is initialized
$t_e$: The time at which a query expires
CS: Cloaking circle, a spatial-temporal area that includes at least k users
CID: The identity that uniquely identifies a cloaked region
Ra: The radius of the cloaking circle
QS: The set of queries Q included in CS
CA: The angle between the x axis and the hypotenuse of the triangle formed by Ra and the radius of the cloaking circle
Quality monitor
Privacy monitor