δ-Calculus: A New Approach to Quantifying Location Privacy

Abstract: With the rapid development of the mobile wireless Internet and high-precision localization devices, location-based services (LBS) have brought people great convenience in recent years. In LBS, if the original location data are provided directly, serious privacy problems arise. In response to these problems, a large number of location-privacy protection mechanisms (LPPMs), including formal LPPMs (FLPPMs), and their evaluation metrics have been proposed to prevent personal location information from being leaked and to quantify privacy leakage. However, existing schemes consider FLPPMs and evaluation metrics independently, without synergizing them into a unifying framework. In this paper, a unified model is proposed to synergize FLPPMs and evaluation metrics. In detail, a probabilistic process calculus (called δ-calculus) is proposed to characterize obfuscation schemes (a class of LPPM), and α-entropy is integrated into δ-calculus to evaluate their privacy leakage. Further, we use two operators, moving and probabilistic choice, to model nodes' mobility and to compute the probability distribution of nodes' locations, and a renaming function to model privacy leakage. By formally defining the attacker's ability and extending relative entropy, an evaluation algorithm is proposed to quantify the leakage of location privacy. Finally, a series of examples is designed to demonstrate the efficiency of our proposed approach.


Introduction
With the widespread usage of mobile devices equipped with high-precision localization capabilities, such as mobile phones [Yin, Guo, Zhang et al. (2019)], intelligent cars [Tian, Gao, Su et al. (2019)], and wearable devices [Tian, Luo, Qiu et al. (2020)], location-based services (LBS) have gained great success in the mobile wireless Internet. LBS (e.g., navigation, point of interest (POI), and motion data publishing) is changing our daily lives at an unprecedented speed. Specifically, users can guide themselves to places they have never been. With the help of LBS, they can also query nearby POIs based on their location. Recently, publishing running or cycling trajectories has become a new fashion in the circle of friends. Furthermore, trajectory publishing can also help optimize city resources and prevent traffic congestion. In addition, LBS can be easily integrated into many other fields, such as crowdsensing systems [Li, Sun, Lu et al. (2020)], edge computing systems [Tian, Shi, Wang et al. (2019)], and IoT-based networks [Yin, Luo, Zhu et al. (2020)]. However, as users enjoy the convenience of LBS, location privacy has become a major concern. LBS providers can infer users' preferences and behavioral habits by mining users' location information and their search history. What is more, adversaries can obtain users' trajectories by monitoring their communication, and thereby stalk users, rob them, or burgle their empty houses, which seriously threatens the safety of users' lives and property [Yin and Liu (2019)]. Nowadays, a large number of efforts have been spent on preserving location privacy, which can roughly be divided into two categories: (1) the threat analysis of location privacy, the formalization of related attacks, and the design of appropriate LPPMs [Zheng, Cai, Li et al. (2017)], and (2) the evaluation and measurement of location privacy [Olteanu, Huguenin, Shokri et al. (2017)].
Beyond that, several efforts have been spent on recovering the true locations of users from anonymized locations. In the first category, three schemes have been proposed: the elimination scheme (ES) [Abul, Bonchi, Nanni et al. (2014); Pandit, Polina, Kumar et al. (2014); Dong and Pi (2018)], the anonymity scheme (AS) [Sweeney (2002); Freudiger, Manshaei, Hubaux et al. (2013)], and the obfuscation scheme (OS) [Xiao and Xiong (2015); Zhang, Zhong, Han et al. (2016)]. ES confuses the linkage relationship of locations in continuous time series by eliminating parts of the real user's trajectory, thus preventing the trajectory from being leaked. In the anonymity scheme, Sweeney [Sweeney (2002)] interrupted the connection between the true identity and the private information; through this approach, the true identity is protected. In OS, noise is added to the user's information to prevent attackers from discovering the relevance between the user's location and identity, thus reducing the possibility of a location attack [Wang, Yang, Han et al. (2017)]. However, these schemes do not formally and theoretically verify their efficiency. To address this problem, a large number of efforts have been spent on adopting (or designing) formal methods to discover vulnerabilities and improve the protective efficacy of LPPMs [Arapinis, Chothia, Ritter et al. (2010); Guo, Zhang, Zhang et al. (2018)]. Additionally, many metrics [Emara, Woerndl, Schlichter et al. (2015); Niu, Li, Zhu et al. (2015)] have been proposed to measure and evaluate the effectiveness (including accuracy, correctness, and certainty) of LPPMs. The need for formal methods both to design LPPMs and to evaluate their degree of privacy has been widely recognized. However, in existing approaches, the design scheme is often separated from the evaluation metrics, and this separation means that the formal tool selected for evaluating an LPPM might not match the designed LPPM.
As a result, users' location privacy cannot be measured precisely, which makes it harder to guarantee. Therefore, assessment should be closely combined with the designed LPPMs to guarantee location privacy. In this paper, we propose a probabilistic process calculus, called δ-calculus, to formalize LPPMs and measure their privacy level using relative entropy. The main contributions of this paper are as follows:
(1) By adding location operators to the π-calculus, we propose δ-calculus to formalize obfuscation schemes. In detail, a suite of syntax and semantics is designed to formally describe LPPMs. Specially, we design the moving operator to model nodes' mobility and the probabilistic choice operator to compute the probability distribution of nodes' locations. Two examples show that our proposed calculus can efficiently evaluate location traces.
(2) We propose a renaming function to model information leakage. Further, by formally defining the ability of an attacker and extending relative entropy, we propose an evaluation algorithm to evaluate the degree of location privacy.
(3) We use the proposed scheme to evaluate DUMMY-T, a protection scheme for trace privacy. The results demonstrate that our scheme can accurately quantify location privacy.
The rest of the paper is organized as follows: Related work is introduced in Section 2, and Section 3 presents the probabilistic automata used in this paper. We propose the syntax and semantics of δ-calculus in Section 4. Section 5 presents our approach to measuring location privacy. Section 6 reports experimental results. Finally, we conclude our work in Section 7.

Related work
In this paper, we mainly focus on location-privacy protection mechanisms and their metric methods, so we discuss related work in these two aspects.

Location-privacy protection mechanisms
In general, location-privacy protection mechanisms can be divided into three categories: the elimination scheme (ES) [Abul, Bonchi, Nanni et al. (2014); Pandit, Polina, Kumar et al. (2014); Dong and Pi (2018)], the anonymity scheme (AS) [Sweeney (2002); Freudiger, Manshaei, Hubaux et al. (2013)], and the obfuscation scheme (OS) [Xiao and Xiong (2015); Zhang, Zhong, Han et al. (2016)]. In the aspect of ES, by exploiting the inherent uncertainty of the whereabouts of a moving object, Abul et al. [Abul, Bonchi, Nanni et al. (2014)] designed a co-localization-based LPPM to eliminate users' outlier trajectories and cluster the rest, thus enhancing users' privacy. Pandit et al. [Pandit, Polina, Kumar et al. (2014)] proposed a novel server-centric framework (called CLOPRO) that generalizes the query content and protects location privacy in continuous LBS by eliminating some attributes from the original query and confusing the temporal link of locations. Dong et al. [Dong and Pi (2018)] presented a frequent-path-based approach (called TOPF) for preserving privacy in trajectory data publishing. In their work, information about infrequent roads was removed and all trajectories were divided into candidate groups, which provided a balance between data usability and data privacy. However, in ES, attackers can use the uneliminated content of the original trajectory to infer the real location of users. In the anonymity scheme, location privacy is guaranteed by hiding the relationship between users' true identities and their sensitive locations. Generally, the anonymity scheme can be divided into two categories: cloaking methods [Sweeney (2002)] and pseudonym change [Freudiger, Manshaei, Hubaux et al. (2013)]. k-anonymity is a common cloaking technique, which requires at least k users in the anonymity set. Under k-anonymity protection, an attacker cannot distinguish the user from the other k-1 users [Roberto and Rakesh (2005); Niu, Li, Zhu et al. (2014)], thus providing anonymity. Besides basic k-anonymity schemes, several variants have been proposed to protect location privacy. Ye et al. [Ye, Li, Xu et al. (2014)] proposed an l-diversity-based LPPM to maintain the heterogeneity of anonymized trajectories and depersonalize users' characteristics. In the aspect of pseudonym change, the mix zone, where users collectively change their pseudonyms, is a frequently used solution to protect location privacy. Based on frequent pseudonym changes, Beresford et al. [Beresford and Stajano (2003)] proposed the mix zone to prevent the locations users visit from being identified. Besides, many efforts have been spent on combining k-anonymity with pseudonyms. For example, Liao et al. [Liao, Sun, Zhang et al. (2017)] hid the user's real trajectory by combining k-anonymity [Pramanik, Lau, Zhang et al. (2016)], Mix-zone [Liu, Zhao, Pan et al. (2012)], and MixGroup [Yu, Kang, Huang et al. (2016)]. Although this kind of work disrupts the relationship between user ID and query, attackers with background knowledge can still guess the real location of the user. OS reduces location accuracy by adding noise to users' information. Using geo-obfuscation, Wang et al. [Wang, Yang, Han et al. (2017)] proposed a location privacy-preserving framework for assigning tasks that protects users' locations. With added noise, it is difficult for attackers to guess the real location by analyzing the query results. Xiao et al. [Xiao and Xiong (2015)] proposed a systematic solution to preserve location privacy with a rigorous privacy guarantee. Their work reduced the sensitivity of a single node transmission by rendering the real events indistinguishable from fake ones.
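As a concrete illustration of the cloaking idea behind k-anonymity, the sketch below (a minimal construction of our own, not any of the cited algorithms; the function name and the bounding-box region are assumptions) forms an anonymity set from the k-1 nearest users and reports a covering rectangle instead of the exact position:

```python
import math

def cloak(user, others, k):
    """Return a rectangular cloaking region covering the user and the
    k-1 nearest other users, so the user is indistinguishable among k."""
    if len(others) < k - 1:
        raise ValueError("not enough users to form a k-anonymity set")
    nearest = sorted(others, key=lambda p: math.dist(user, p))[:k - 1]
    group = [user] + nearest
    xs, ys = zip(*group)
    return (min(xs), min(ys)), (max(xs), max(ys))  # bounding box corners

region = cloak((2.0, 2.0), [(1.0, 1.0), (3.0, 2.5), (9.0, 9.0), (2.5, 1.5)], k=3)
print(region)
```

Any query issued with `region` instead of the exact coordinates leaves the attacker unable to single out the user from the other k-1 members of the set.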
To balance quality of service and privacy protection, noise should be added accurately, which requires quantifying and evaluating the similarity between the obfuscated trajectory and the real trajectory. However, quantifying this similarity remains an important challenge.

Formal analysis of location privacy and its metrics
Many efforts have been spent on formally analyzing LPPMs and discovering location-privacy vulnerabilities to decrease the associated risk. Arapinis et al. [Arapinis, Chothia, Ritter et al. (2010)] used the applied π-calculus to analyze the unlinkability and anonymity of identities and demonstrated that the French RFID e-passport is linkable. In consequence, a person who uses this e-passport can be traced physically by a malicious attacker. Brusó et al. [Brusó, Chatzikokolakis, Hartog et al. (2010)] defined both untraceability and forward privacy, and formally proved the privacy guarantees of the OSK protocol (an encryption method named after its authors: Miyako Ohkubo, Koutarou Suzuki, and Shingo Kinoshita). Dahl et al. [Dahl, Delaune, Steel et al. (2010)] also used the applied π-calculus to demonstrate that the cryptographic mix-zones (CMIX) protocol does not provide privacy guarantees in specific scenarios. Guo et al. [Guo, Zhang, Zhang et al. (2018)] measured the degree of privacy disclosure by evaluating the privacy-leakage level via the adversary's uncertainty in guessing the user's identity, formalizing the proportion of users in the system that use the pseudonym algorithm. Liu et al. [Liu, Zhao, Pan et al. (2012)] proposed a metric method to quantify a system's resilience to side information, and presented an optimization formulation with cost and traffic constraints to model the placement problem for multiple mix zones. However, formally modeling LPPMs alone is not enough. How to measure the effect of different privacy-protection algorithms is an important criterion in their design. To this end, a large amount of effort has been spent on studying metrics (accuracy, correctness, and certainty) that measure location privacy for specific scenarios [Yin, Sun, Wang et al. (2018)]. For example, Emara et al. [Emara, Woerndl and Schlichter (2015)] used uncertainty to describe the ambiguity of the actual location as captured by posterior distributions. Through this approach, the location privacy of a given user can be quantified [Niu, Li, Zhu et al. (2015)]. The validity of the uncertainty metric relies on knowledge of the probability mass believed by an adversary. However, the assigned probabilities are influenced by the inaccuracy of context information; as a result, the attacker's choice may itself be tainted with uncertainty [Fischer, Katzenbeisser, Eckert et al. (2008)]. Since uncertainty cannot be adopted to accurately evaluate location privacy, many new metrics (e.g., inaccuracy) have been proposed, where, for an observed location and the distributions estimated by an adversary, inaccuracy is defined as the discrepancy between the actual and the estimated posterior distributions of the possible locations. Because the tracking error is taken into account, the inaccuracy metric is an appropriate approach to evaluating privacy. Unfortunately, there is still a gap between current location-privacy protection models and privacy-evaluation algorithms. When the model is not consistent with the evaluation metric, it is difficult to ensure the effect of the LPPM. There is an urgent need for a way to integrate the two in a unified architecture.

Probabilistic automata
In our work, we use probabilistic automata to describe the formal semantics of the proposed δ-calculus. To this end, we briefly review probabilistic automata [Herescu and Palamidessi (2000); Deng, Pang and Wu (2006)] in this section.
Let X be a set of discrete events and pb be a probability function over X; the pair (X, pb) is a discrete probabilistic space, that is, Σ_{x∈X} pb(x) = 1. Given a set Y of discrete events, let Prob(Y) denote the set of probabilistic spaces defined on Y. A probabilistic automaton is a tuple M = (S, s0, A, Δ), where: (1) S is the set of pre-defined states; (2) s0 ∈ S is the initial state; (3) A is a set of actions; (4) Δ is a subset of the Cartesian product of S and Prob(A × S), and each element of Δ is called a transition group. Generally, multiple groups may exist for a given state of a probabilistic automaton. In this case, a scheduler selects one group during an automaton run; formally, a scheduler for M can be represented by a function ξ mapping each reachable state to one of its transition groups. For simplicity, we write s →(a, pb) s′ when the scheduler selects a group in which action a leads from s to s′ with probability pb(a, s′).
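The definitions above can be sketched in code. In the toy model below (the class name and the two-state example are our own assumptions), Δ maps each state to a list of transition groups, each group being a probability distribution over (action, next-state) pairs, and a scheduler ξ resolves the nondeterministic choice among groups:

```python
import random

class ProbabilisticAutomaton:
    """Toy probabilistic automaton M = (S, s0, A, Δ)."""
    def __init__(self, s0, delta):
        self.s0 = s0
        self.delta = delta  # state -> list of transition groups

    def run(self, scheduler, steps, seed=0):
        rng = random.Random(seed)
        state, trace = self.s0, []
        for _ in range(steps):
            groups = self.delta.get(state, [])
            if not groups:
                break                                 # no outgoing transitions
            group = scheduler(state, groups)          # ξ picks one group
            pairs = list(group.items())               # [((action, s'), prob)]
            (action, state), = rng.choices([p for p, _ in pairs],
                                           weights=[w for _, w in pairs])
            trace.append(action)
        return trace

delta = {"s0": [{("send", "s1"): 0.7, ("move", "s0"): 0.3}],
         "s1": [{("recv", "s0"): 1.0}]}
pa = ProbabilisticAutomaton("s0", delta)
print(pa.run(lambda state, groups: groups[0], steps=4))
```

With a fixed seed the run is reproducible, which mirrors how a fixed scheduler turns a nondeterministic automaton into a purely probabilistic one.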

δ-calculus

Syntax
Communication devices (also called nodes), which comprise the mobile wireless Internet, run at locations and may move from one location to another. For simplicity, we use M or N to denote nodes and P or Q to denote processes. The syntax for nodes of δ-calculus is defined as follows.
M, N ::= z⟨P, p_i⟩^{rad}_{loc_i} | M|N | (ν loc)M | 0

where z denotes the node name (e.g., node ID) and rad denotes the communication radius. P stands for a process and p_i is a positive probability; that is, z⟨P, p_i⟩^{rad}_{loc_i} means that node z runs process P with probability p_i at location loc_i, and the maximum communication distance of z is rad. M|N denotes that nodes M and N run in parallel. The restriction operator is denoted by ν, and (ν loc)· is used to constrain the range of locations. 0 denotes an inactive node. A process is defined as follows:

P, Q ::= S̄T.P | S(x).P | ⊕_i p_i P_i | MV_f.P | nil

where S̄T.P and S(x).P denote "sending T over channel S, then running as P" and "receiving x over channel S, then running as P", respectively. The probabilistic choice ⊕_i p_i P_i runs P_i with probability p_i. Assuming that P is now at loc, the moving process MV_f.P denotes that P will reach location loc′_i with probability pb(loc′_i), 1 ≤ i ≤ n. The notation nil is an empty process. In δ-calculus, we use S and T to denote terms, whose syntax is defined as follows:

S, T ::= x | a

where x is an element of a countable set of variables and a is an element of a countable set of channel names.
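For readers who find the grammar easier to follow as data, the syntax above can be transcribed as a small abstract syntax tree (a sketch of our own; the class names are illustrative, not part of δ-calculus):

```python
from dataclasses import dataclass
from typing import Dict, Union

Term = str  # S, T ::= x | a : variables and channel names as strings

@dataclass
class Send:          # send term over chan, then run cont
    chan: Term
    term: Term
    cont: "Process"

@dataclass
class Recv:          # receive var over chan, then run cont
    chan: Term
    var: str
    cont: "Process"

@dataclass
class Move:          # MV_f.P : reach location l with probability dist[l]
    dist: Dict[str, float]
    cont: "Process"

@dataclass
class Nil:           # nil : the empty process
    pass

Process = Union[Send, Recv, Move, Nil]

@dataclass
class Node:          # node z running proc at location loc with radius rad
    name: str
    proc: Process
    loc: str
    rad: float

z1 = Node("z1", Move({"loc1": 0.6, "loc1'": 0.4}, Send("a", "info", Nil())),
          loc="loc1", rad=5.0)
print(z1.name, z1.loc, z1.proc.dist)
```

The node `z1` here corresponds to a device that moves to loc1 or loc1' with probabilities 0.6 and 0.4, then sends `info` on channel `a`.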

Semantics
Combining the standard π-calculus with node names, locations, and communication distances, we obtain δ-calculus. Using a transition system labeled by actions α, β, we define its operational semantics as follows, where τ is a silent action, and ā⟨M⟩ and a(x) denote the output of term M and the input of x on channel a, respectively. Generally, an attacker may deduce the true location of nodes by monitoring their communications. In the COM rule, node z″ accepts data sent by z′ (that is, z″ substitutes T for x) if two conditions hold simultaneously: (1) z″ is within the communication range of z′, where d(loc′, loc″) denotes the physical distance between locations loc′ and loc″, and (2) z′ synchronizes with z″. As shown in the COM rule, z″ obtains the name and location of z′ after interacting with z′. In the PAR rule, the wildcard "*" is either "-", a location, or a node name; note that after PAR is applied, interacting nodes can obtain no new information about locations and names. The REP rule (the replication rule) denotes that a process repeatedly executes at the given location. We use RES1 to denote the constraint on locations, that is, actions at the restricted location are allowed to execute. RES2 means that actions on channels different from y are allowed. If we use δ-calculus to simulate an LPPM S, then S's behavior can be considered as a set of trace distributions, denoted tds(S), which we obtain by unfolding the δ-calculus term. Next, we give two simple examples to show the use of δ-calculus.

Example 1. We assume two entities (a node and a sink, denoted z1 and za, respectively) exist in a wireless communication system. Node z1 delivers information info to za from location loc1 with probability p1 and from location loc1′ with probability 1-p1. Let za always be in the communication range of z1 (i.e., za can always receive info from z1). We can simulate this system in δ-calculus.

Fig. 1 shows the probabilistic execution of Example 1. In Fig. 1, there are 8 execution sequences (i.e., t1 … t8), where trace t1 indicates that if the send action is asynchronous with the receive action (i.e., they do not shake hands), then node za obtains no information about z1 from t1. In traces t3 and t4, za records the locations of z1 (loc1 and loc1′, respectively). In this example, z1 can infer that za stayed at loc with probability 1. Accordingly, za will infer the traces of z1 with probabilities p0p2, p0p3, p0p4, p1p5, p1p6, and p1p7, respectively.

Measuring location privacy
Certainly, an LPPM always reveals some location information. In general, if, under an attack, the amount of location leakage is less than a given threshold, then the leakage can be accepted. This involves two issues: the attacker model, and quantifying location leakage. In this section, we discuss them separately.

Modeling attackers
An adversary may infer location information by playing the role of a normal user to interact with others, and by monitoring their communication. Generally, adversaries are divided into two categories: strong attackers and weak attackers. If an LPPM is secure under a strong attack, then it is also secure under a weak attack; thus, we simulate the strong attacker. Informally, an attacker is said to be strong if it can gather all locations of the normal users anywhere and at any time.

Definition 2. An attacker is strong if, whenever an output action ā⟨T⟩ is performed anywhere at any time, the attacker can gather the location where action a happens.

In Examples 1 and 2, if an attacker can act as the sink, then it is strong (because it obtains the location each time the output action happens). Next, we illustrate an example of a weak attacker.

Example 3. Consider the system:

Quantifying location privacy
Given an LPPM M and an attacker ATT simulated by δ-calculus, we use a set of trace distributions (obtained by unfolding M | ATT, written tds(M | ATT)) to describe ATT's interactions with M. An attacker implicitly obtains nodes' locations by recording these traces. Given a set X of trace distributions, a metric D on X can be defined as a function D : X × X → R+, where R+ is the set of non-negative real numbers. Generally, metric D is required to satisfy the following axioms: non-negativity (for all x1, x2 ∈ X, D(x1, x2) ≥ 0), coincidence (for all x1, x2 ∈ X, D(x1, x2) = 0 if and only if x1 = x2), symmetry (for all x1, x2 ∈ X, D(x1, x2) = D(x2, x1)), and subadditivity (for all x1, x2, x3 ∈ X, D(x1, x3) ≤ D(x1, x2) + D(x2, x3)).
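These axioms can be checked mechanically on a finite sample of points; the sketch below (the helper name and tolerance are our own choices) verifies them for a candidate D:

```python
from itertools import product

def check_metric_axioms(D, points, tol=1e-9):
    """Check non-negativity, coincidence, symmetry, and subadditivity
    of D on every pair/triple drawn from a finite sample of points."""
    for x, y in product(points, repeat=2):
        assert D(x, y) >= -tol                        # non-negativity
        assert (abs(D(x, y)) <= tol) == (x == y)      # coincidence
        assert abs(D(x, y) - D(y, x)) <= tol          # symmetry
    for x, y, z in product(points, repeat=3):
        assert D(x, z) <= D(x, y) + D(y, z) + tol     # subadditivity
    return True

euclid = lambda x, y: abs(x - y)
print(check_metric_axioms(euclid, [0.0, 1.5, 3.0]))
```

A candidate such as squared distance fails the subadditivity check, which is exactly why it is not a metric.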
To protect location privacy, many LPPMs mix false locations with true locations to prevent an attacker from inferring the true location. For simplicity, we use LOC to denote the set of locations. To measure location privacy, we define a renaming function f : LOC → LOC such that, for each location loc in LOC, (1) f(loc) ≠ loc always holds, and (2) f is a bijection on LOC. We use F_LOC to represent the set of all renaming functions on LOC. This means that M is privacy-preserved under D2; the second part is direct. In our paper, we use α-entropy (relative entropy) as a quasi-metric to evaluate location privacy, because relative entropy satisfies the axioms of non-negativity and coincidence.

Definition 4. For discrete probability distributions u and u′, the relative entropy of u′ from u is defined as D_KL(u ∥ u′) = Σ_{i∈I} u_i log(u_i / u′_i), where 0·log(0/0) = 0, 0·log(0/q) = 0, q·log(q/0) = ∞, and I is an index set. Because the behavior of a node is simulated as a set of trace distributions, D_KL is extended to D_EKL as follows (similarly to Kapus [Kapus (2017)]).
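A direct reading of Definition 4 in code (a sketch; the function name is ours) makes the boundary conventions explicit:

```python
import math

def d_kl(u, v):
    """Relative entropy D_KL(u || v) of two discrete distributions,
    with the conventions 0*log(0/0) = 0, 0*log(0/q) = 0, and
    q*log(q/0) = +inf for q > 0."""
    total = 0.0
    for p, q in zip(u, v):
        if p == 0:
            continue            # 0*log(0/anything) contributes nothing
        if q == 0:
            return math.inf     # mass where v assigns none: infinite divergence
        total += p * math.log(p / q)
    return total

print(d_kl([0.5, 0.5], [0.9, 0.1]))   # positive: the distributions differ
print(d_kl([0.5, 0.5], [0.5, 0.5]))   # 0.0: identical distributions
```

Note that D_KL is a quasi-metric only: it is non-negative and zero exactly on identical distributions, but it is not symmetric.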

Definition 5. Given two sets U and U′ of probability distributions, the relative entropy of U′ from U is defined as D_EKL(U, U′) = sup_{u∈U} inf_{u′∈U′} D_KL(u ∥ u′), where inf ∅ = ∞ and sup ∅ = 0.
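Assuming the extension takes a sup-inf form (our reading of the conventions inf ∅ = ∞ and sup ∅ = 0; the source does not state the formula verbatim), it can be sketched as:

```python
import math

def d_kl(u, v):
    # base relative entropy with the usual conventions (0*log(0/q) = 0, etc.)
    total = 0.0
    for p, q in zip(u, v):
        if p == 0:
            continue
        if q == 0:
            return math.inf
        total += p * math.log(p / q)
    return total

def d_ekl(U, V):
    """Extended relative entropy between sets of distributions, assumed
    here to be sup over u in U of inf over v in V of D_KL(u || v),
    with inf(empty) = +inf and sup(empty) = 0."""
    return max((min((d_kl(u, v) for v in V), default=math.inf) for u in U),
               default=0.0)

print(d_ekl([(0.5, 0.5), (0.8, 0.2)], [(0.5, 0.5)]))
```

The `default` arguments implement the two empty-set conventions directly: an empty inner set yields +inf, an empty outer set yields 0.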
Measuring location privacy: Assume that node M is protected under the LPPM and tds(M) denotes M's set of trace distributions; then the amount of location-privacy leakage is measured by the extended relative entropy D_EKL between tds(M) and the trace distributions obtained under the renaming functions in F_LOC.
Example 4. Consider a wireless communication system with a base station zb and a node z1. Assume that node z1 at location loc1 sends data to zb, and that the goal of attacker za is to obtain z1's location by observing their communications. We also assume that the LPPM injects false locations alongside the true one, the true location being reported with probability p_i. The resulting leakage amount of location privacy accords with our intuition: the capability of the LPPM is zero if p_i approaches 0 or 1, and it reaches 1 if p_i approaches 0.5. This shows that our measurement is accurate. According to the above analyses, we summarize our proposed approach to measuring location privacy in Tab. 2.
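The stated intuition, zero capability when the reporting probability nears 0 or 1 and a maximum of 1 at 0.5, matches the shape of the binary-entropy curve; the snippet below illustrates that shape (an illustrative formula of our own, not the paper's exact leakage expression):

```python
import math

def protection_level(p):
    """Binary-entropy sketch of the intuition in Example 4: capability
    is 0 when p approaches 0 or 1 (the attacker is certain which
    location is real) and peaks at 1 when p = 0.5. Illustrative only."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for p in (0.01, 0.25, 0.5, 0.99):
    print(p, round(protection_level(p), 3))
```

The curve is symmetric about p = 0.5, reflecting that "almost always true" and "almost never true" reports are equally easy for the attacker to invert.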

Experiments
In this section, we use the proposed δ-calculus to evaluate trace privacy protected by DUMMY-T [Niu, Gao, Li et al. (2016)]. In DUMMY-T, a set of realistic dummy locations is generated for each snapshot to guarantee a minimum cloaking region and to resist attacks performed by adversaries with background information. Then, DUMMY-T connects the dummy locations into dummy paths while considering location reachability. The idea of DUMMY-T is shown in Fig. 3. Specifically, to hide users' true trajectories from the LBS server, DUMMY-T needs to generate k-1 dummy trajectories based on the k-anonymity method. The trajectory in blue (triangle) is the user's real route, and the routes in green (square) and pink (circle) are dummy paths. Each trajectory can be divided into 5 snapshots with timestamps t1 to t5. The obfuscation region of each snapshot is denoted R1 to R5, and the message from the LBS server at each timestamp is denoted mes1 to mes5. For each snapshot, DUMMY-T generates a set of dummy locations that cannot easily be distinguished from one another, and connects the dummy locations into k-1 dummy paths while considering reachability. Finally, users get several dummy trajectories from which the adversary cannot guess the real one. Using δ-calculus, we can describe DUMMY-T. Next, we discuss the leakage amount of location privacy in the following cases:

Case 1: In this case, the moving functions m1~m5 are assumed to be rational (i.e., the mobile client moves at a reasonable velocity within the time intervals at which it sends its messages; in other words, SERVER cannot perceive any abnormality).
Given a permutation function f and its inverse f⁻¹, we can evaluate the leakage amount of location privacy as follows.
The minimum leakage amount of location privacy is evaluated as zero, which is consistent with our intuition.
Case 2: In this case, at least one of the moving functions m1~m5 is assumed to be irrational (i.e., the distance between two dummy points is greater than a reasonable distance, and SERVER can perceive this abnormality).
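The rationality test that separates Case 1 from Case 2 can be sketched as a maximum-speed check between consecutive snapshots (the function name, the speed model, and the sample values are our own assumptions, not DUMMY-T's exact rules):

```python
import math

def is_rational_path(points, timestamps, v_max):
    """Return True if every hop between consecutive dummy points is
    reachable at speed <= v_max (Case 1); False if some hop would
    require an implausible speed, the abnormality SERVER perceives
    in Case 2."""
    for (p0, t0), (p1, t1) in zip(zip(points, timestamps),
                                  zip(points[1:], timestamps[1:])):
        if math.dist(p0, p1) > v_max * (t1 - t0):
            return False
    return True

path = [(0, 0), (3, 4), (6, 8)]                       # each hop has length 5
print(is_rational_path(path, [0, 1, 2], v_max=6.0))   # True: 5 <= 6 per step
print(is_rational_path(path, [0, 1, 2], v_max=4.0))   # False: 5 > 4 per step
```

A dummy-path generator would run such a check on every candidate path and discard those that fail, so that irrational hops never reach the server.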

Conclusion
In this paper, we propose δ-calculus to formalize obfuscation-based schemes and measure location privacy. Probabilistic automata are adopted to formally characterize the semantics of δ-calculus. Specially, two operators, moving and probabilistic choice, are proposed to model nodes' mobility and compute the probability distribution of nodes' locations. Further, a renaming function is proposed to model privacy leakage. By formally defining the attacker's ability and extending relative entropy, we propose an evaluation algorithm to quantify the leakage of location privacy. Experimental results demonstrate that our scheme can accurately quantify location leakage. Through the proposed δ-calculus, the gap between obfuscation-based schemes and their measurement is narrowed. In the future, the following work should be conducted.
(1) In this paper, we only integrate privacy measurement into obfuscation-based schemes. Obviously, it is necessary to design a formal language that describes both elimination and anonymization schemes and synergizes them into the quantitative measurement framework. Through this approach, more LPPMs can be verified and measured.
(2) Although we develop a measurement algorithm to evaluate privacy leakage, the algorithm is not yet integrated into an existing tool (such as PRISM [Pramanik, Lau, Zhang et al. (2016)]). It is of great importance to integrate them to automatically calculate the privacy level.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.