A Game Theory-Based Analysis of Data Privacy in Vehicular Sensor Networks

Mobile traces, collected by vehicular sensor networks (VSNs), facilitate various business applications and services. However, the traces can be used to trace and identify drivers or passengers, which raise significant privacy concerns. Existing privacy protecting techniques may not be suitable, due to their inadequate considerations for the data accuracy requirements of different applications and the adversary's knowledge and strategies. In this paper, we analyze data privacy issues in VSNs with a game theoretic model, where a defender uses the privacy protecting techniques against the attack strategies implemented by an adversary. We study both the passive and active attack scenarios, and in each scenario we consider the effect of different data accuracy requirements on the performance of defense measures. Through the analysis results on real-world traffic data, we show that more inserted bogus traces or deleted recorded samples show a better performance when the cost of defense measures is small, whereas doing nothing becomes the best strategy when the cost of defense measures is very large. In addition, we present the optimal defense strategy that provides the defender with the maximum utility when the adversary implements the optimal attack strategy.


Introduction
With the advances and wide adoption of wireless communication technologies, vehicles are now often equipped with wireless devices that allow them to communicate with each other (V2V) as well as with roadside infrastructures (V2I). The V2V and V2I communications make driving more safe and improve a driver's driving experiences. Such communication networks are called Vehicular Ad Hoc Networks (VANETs). However, with the increasing needs for sensing and data acquisition in cities, VANETs have turned into Vehicular Sensor Networks (VSNs) [1]. VSNs exploit vehicles and passengers to capture the occurrence of events, such as traffic volume, road surface condition, chemical, and radiation. The location traces in the traffic-related data create various fresh new business applications and services, such as map drawing [2], traffic prediction [3], city planning, and mobile network analysis [4].
However, the places in these location traces that a driver or passenger has visited may reveal his/her sensitive information, such as traffic law violations, political affiliations, and medical conditions [5,6]. Although the information about vehicular mobility traces are often collected in an anonymous way, an adversary can reidentify the true owner of a trace. Because the location information of drivers and passengers can be openly observed in public places, and also can be disclosed voluntarily or inadvertently by themselves, such as a casual conversation, or published media such as news articles or web blogs [4]. The adversary who has partial knowledge of the whereabouts of drivers or passengers (which are called victims), can infer the traces' true owners with high probability by using vehicular mobility constraints and spatiotemporal correlation [4,7].
To reduce spatiotemporal correlation, some frequently proposed privacy protecting techniques suggest reducing 2 International Journal of Distributed Sensor Networks the resolution of the recorded data [8,9] or introducing noise in the data [10][11][12]. However, these techniques may not be suitable for privacy preserving in VSNs due to their inadequate consideration for the adversary's knowledge and its attack strategies. On the other hand, these techniques can not meet the different data accuracy requirements of different applications and services [13]. To address these challenges, we must first analyze the effect of the knowledge and attack strategies and the different accuracy requirements on the performance of defense strategies.
In this paper, we use a game-theoretic model to study the effect of the adversary's knowledge and strategies and the data accuracy requirements on the performance of defense measures. More specifically, we first present location privacy issues in VSNs, including the ability and goal of the adversary and defender. Then, we define a game theoretic model the attack and defense game which models the strategy selection decision behavior of the adversary and defender. In this game, the adversary implements its attack strategies in both the passive and active attack scenarios; the defender uses the frequently proposed privacy protecting approaches to increase the hardness of the adversary to reidentify the victims. Finally, through analysis results on real-world traffic data, we show that attack strategies in different scenarios show different performance. More inserted bogus traces or deleted recorded samples show a better performance when the cost of defense measures is small, whereas doing nothing becomes the best strategy when the cost of defense measures is very large. We also present the optimal defense strategy for each attack strategy. The main contributions of this paper are as follows.
(i) We define an attack and defense game model to capture the strategy selection decision behavior of the adversary and defender, and we show the effectiveness of defense strategies.
(ii) We establish the attack and defense game based on real world traffic data. In particular, for different attack scenarios, we study both the complete information game (the defender knows the adversary's knowledge on whereabouts of victims) and the incomplete information game (the defender does not know the adversary's knowledge).
(iii) Through the Nash equilibriums in these games, we show that the defender can balance the data accuracy and victims' location privacy to obtain the maximum utility when an adversary implements the optimal attack strategy.
The rest of the paper is organized as follows. In Section 2, we discuss related work. Section 3 presents the network model of VSNs and the location privacy issues in VSNs including the ability and goal of the adversary and defender. In Section 4, we define the attack and defense game model and present the attack in different scenarios and defense strategies. In Section 5, we present our main analysis results in the complete and incomplete information game for different attack strategies. We conclude the paper in Section 6.

Related Work
Several recent studies [4,7] have analyzed the privacy risk of mobile traces and found that omitting identifiers from mobile traces does not guarantee anonymity due to the spatiotemporal correlation. Ma et al. [4] show that an adversary who have a relatively small amount of drivers' location snapshots, could infer the true owners of anonymous traces with high probability. Montjoye et al. [7] study fifteen months of human mobility data for one and a half million individuals and find that four spatio temporal points are enough to uniquely identify 95% of the individuals. Therefore, there is an raising need for stronger privacy protection mechanisms.
In general, the existing solutions can be divided into two categories: reducing the resolution of the recorded data [8,14] and introducing noise in the data. Hoh et al. [8] propose a disclosure control algorithm called uncertainty-aware path cloaking algorithm that selectively reveals GPS samples to limit the maximum time-to-confusion for all vehicles. Nergiz et al. [14] adopt the notion of k-anonymity to trajectories. They find a representative trajectory in trajectories so that every trajectory is indistinguishable from other − 1 trajectories.
These approaches of introducing noise have also been extensively studied in [10,12]. Lu et al. [10] create a mix area in social spots to achieve the provable location privacy in VANETs. Huang et al. [12] proposed a solution called silent period to provide user with location privacy preserving in wireless networks. However, these techniques may be not suitable for privacy preserving in VSNs for two reasons. First, these techniques rarely consider the effect of the adversary's knowledge and its attack strategies on the performance of defense strategies. Second, they cannot meet the different data accuracy requirements of different applications and services [13]. These issues are what we want to address in this paper.
Game theory provides the many needed mathematical frameworks for analysis, modeling, and decision processes for network security and privacy issues [15][16][17], so we adopt game theory to study the data privacy issues in VSNs. There are many works on using game theory in security aspects of VANETs. Raya et al. [18] model the revocation problem using a finite dynamic game with mobile nodes as players, who can detect misbehavior with a certain probability. Reidt et al. [19] design a distributed detection error tolerant revocation scheme called karmic-suicide by using a game theoretic approach. In [20], zerosum game, fuzzy game and fictitious play are applied to model the interaction of the attacker and defender. In [10], the authors use game theoretic techniques to prove the feasibility of their pseudonym changing strategy. Freudiger et al. [21] analyze the noncooperative behavior of mobile nodes by using a game theoretic model, where each player aims at maximizing its location privacy at a minimum cost. In this paper, we study a new aspect of privacy by evaluating the effect of the knowledge and attack strategies and the different accuracy requirements on the performance of defense strategies.
International Journal of Distributed Sensor Networks 3

Preliminaries
In this section, we explain our network model, as well as our assumptions and the location privacy issues in VSNs. We conclude by sketching the problem this work aims at solving. In Table 1, we summarize the notations introduced throughout this paper.

Network Model.
In VSNs, vehicles and passengers act as sensors to capture the occurrence of events, such as traffic accidents, traffic distribution, and road weather information [22]. VSNs can perceive the traffic distribution in a city with efficiency and high accuracy, and thus they have been envisioned to have a great potential to revolutionize human's driving experiences and metropolitan-area traffic flow control. Figure 1 illustrates the network model of VSNs. Vehicles use Dedicated Short Range Communication (DSRC) [23] technology to transmit the traffic related information, that is, ⟨time, position⟩ pairs, to Roadside Unit (RSU) via single-hop or multihop communications. Then, RSUs upload the information to a traffic control center through wired networks, so the center can predict the traffic distribution with very low cost and high accuracy. We assumed that a set of traces, each of which recording intermittently the time and corresponding location of a mobile node, are used by various applications, such as traffic prediction, city planning, and mobile network evaluation. These traces are anonymous in that the true identity of a vehicle has been replaced by a random identifier, but one same true identity is always mapped to a same identifier. In the following, we will elaborate the privacy threat on these anonymous traces.

Threat
Model. An adversary tries to identify the complete path histories of one or more victims (drivers or passengers) from the anonymous traces. We assume that the adversary can collect certain side information about one or more victims. Each piece of side information gives the location of a victim at a time instant, although the information may not be exact. In practice, the side information may be obtained through the following means. First, nodes are open to observations in public spaces. Hence, the adversary may obtain the side information directly through meeting the victim by chance or engineered encounters. This case is called an active attack scenario. Second, nodes may disclose information on their whereabouts either voluntarily or inadvertently [11], which is called passive attack scenario. For example, a casual conversation between Alice and Bob may involve where Alice was around 8 am, or it may involve another person's location.
We set the trace of node to be , the set of all nodes' traces is Tr = { }, the side information of a victim as a map, : { } → Θ, where is the time instant at which some side information about the victim's locations is revealed, and Θ is the set of all cell location IDs. Then, the ability of the adversary to reveal the victim's trace can be given in the form of conditional entropy as follows: where Pr( | ( )) is the conditional probability that the victim's trace corresponds to the side information ( ) known by the attacker, and Pr( , ( )) is the joint probability of the trace and the side information ( ).

Defense Measures.
A defender tries to protect the privacy of victims, which is defined by the uncertainty of a node's trace. The trace uncertainty is referred to as the entropy of the probability distribution of a node's trace [9,12]. Let = Pr( = ), ∀ ∈ Tr, denotes the probability that 4 International Journal of Distributed Sensor Networks corresponds to the victim's trace , so the entropy of the probability distribution of can be defined as The larger the entropy ( ) is, the more uncertainty the victim's trace is. The smaller the entropy ( ), the less uncertainty the victim's trace , when ( ) = 0, the victim's trace can be uniquely determined.
Defense objectives for protecting trace privacy can be measured by k-anonymity. An anonymity set is denoted as that includes the nodes with their traces indistinguishable from that of the victim . The k-anonymity model for privacy protection as used in [24] essentially refers to an anonymity set with a minimum size , where the victim is guaranteed to be not distinguishable from at least − 1 nodes with respect to information related to the victim (such as location information). However, not all the nodes in the anonymity set are equally likely to be the victim since an adversary may be able to obtain side information on the nodes.
We use entropy to improve this model here. Let the defender want to provide an anonymity set with a minimum size ; then, the entropy of the distribution of the anonymity set is given by ( ) = − ∑ (1/ )log 2 (1/ ) = log 2 . Hence, the mechanisms for trace privacy protection should meet as follows: where Ω is the side information that the defender believes the attacker has known, and ( | Ω) is the uncertainty of the victim's trace in the case of Ω.
Since privacy is a context-specific property and is socially and/or culturally defined [6,25], the trace privacy needs of individual users may vary, and further different users may require different k-anonymity entropy ( ). But here, we only consider the average of k-anonymity entropy for all users.

Problem Statement.
Given the attacker's strategies and the defender's strategies, the problem is to find the relationship between the attack strategies and the performance of defense strategies, and the relationship between the data accuracy requirements and the performance of defense strategies. Then, based on these anlysis results, we find the optimal defense strategies for different attack scenarios.
In order to address these problems, we must consider (i) how to model the strategy selection decision behavior of the adversary and defender, and (ii) that the defender may not know the adversary's knowledge.

Attack and Defense Game
In this section, we introduce the attack and defense game model to capture the strategy selection decision behavior of the adversary and defender. We first define the game model and the concept of Nash Equilibrium (NE) throughout the paper, and then we present different attack strategies and defense strategies.

Game Model. The game is defined as a triplet ( , , ),
where is the set of players, is the set of strategies, and is the set of payoff functions [26].
Strategy. The set of strategies in the games is = { | 0 < ≤ }, where 1 , 2 are the set of strategies of the adversary and defender, respectively. We will describe them in detail in Sections 4.3 and 4.4.
Payoff Function. When the defender knows the side information that the adversary has collected, we use complete information games. In a complete information game, the payoff function of the adversary is 1 ( 1 , 2 ) = ( | )− 1 , and the payoff function of the defender is 2 ( 1 , 2 ) = ( | Ω) − 2 , where 1 , 2 are the cost of attack and defense, respectively. In order to maximize attack power, 1 is set to 0. The defense cost 2 includes both the cost of implementing defensive strategies and the damage to data accuracy.
Typically, the defender does not know all the side information that the adversary has collected. Hence, we consider the suggestions proposed by Harsanyi [27]. We introduce a new player named Nature, which assigns a type to the adversary according to a prior distribution . can be considered as the side information that the adversary has collected. Then, the payoff functions are expressed as ( 1 , 2 ( )).

Equilibrium Concepts.
In complete information games, Nash equilibrium (NE) can be defined as follows.

Definition 1.
A strategy profile * = ( * , * − ) is a Nash equilibrium if, for each player , In other words, in a NE, none of the players can unilaterally change his strategy to increase his payoff. A player can also play each of his pure strategies with some probability using mixed strategies. A mixed strategy of player is a probability distribution defined over the pure strategies .
In incomplete information games, we adopt the concept of Bayesian Nash equilibrium [21].
where − is the type of player 's opponents, and ( ) is the prior distribution of .
A2 is the more general attack. To some extent, A1 can be considered as a special case of A2, that is, for each , = 0. We assume that the adversary will attempt to use all known information in its inference strategy, by employing some form of Bayesian inference. In applying the Bayesian inference, the adversary can make use of some general knowledge, including constraints on nodal movements imposed by geography of the roads, and general movement preferences of the nodes.
Scenario B: Active Attack. The adversary is active in this scenario that it obtains side information about victims by encountering the victims. The adversary can obtain traces in a real time and gradual fashion, that is, as time progresses, the adversary is provided with the trace information together with the information acquired up to the real time instants. The goal here is to identify as many traces as possible. If is the trace of the adversary , active attack can be divided into the following attacks.
Attack 2 (B2). The adversary moves to maximize the amount of useful side information, that is, there exists at least one pair of and such that ( ) ̸ = ( ). We assume that after encountering a victim, the adversary will not attempt to follow the victim. Because the objective of the adversary is to identify as many trace identities as possible.

Strategies for A1 and A2.
As noted before, the side information often contains noise. The adversary thus needs to perform Bayesian inference or use the maximum likelihood estimator to make the best guess. The goal is, given , to find the that gives the best match. The formulation of such a procedure is described below. Given The goal of the maximum likelihood estimator is to find which maximizes the expression (7). Note that the denominator is a constant. In addition, without any knowledge about how the victim is chosen, we set the priori distribution of the victim to be uniform: as follows, ( ) = 1/ , = 1, 2, . . . , . Hence the solution of the maximum likelihood estimator is given by For A1 and A2, the expression (8) can be given in the following form.
Scenario A1. If the noise in the side information is independent and identically distributed, and obey some given distribution Pr , the expression (8) can be written as where ( )− ( ) = , this location difference is computed using the Cartesian distance between the two cells [4].
Scenario A2. If node mobility obeys the Markov model, (8) can be given by where =̃+ (1 − )̃+ 1 , and when is a constant, the Markov process of node mobility is steady. The expression (9) can be greatly simplified if the noise obeys specific forms, such as normal distribution or uniform distribution. Therefore, the adversary can use some heuristic approaches [4] to identify the victim's trace. In the following we consider four strategies used by the adversary to identify the victim's trace from the published trace set. We first describe them for scenario A1 as follows.
(i) Maximum likelihood estimation approach (MLE). This is the same as formulation (9), that is, the similarity value of trace is given by where the trace with the maximum similarity value is declared to be the victim's. (ii) Minimum square approach (MSQ). When the 's takes normal distribution (0, 2 ), that is, Pr( ( ) | ) = exp{−(∑ | ( ) − ( )| 2 )/(2 2 )}, = 1, 2, . . ., for some constant . Hence, the maximum likelihood estimator is essentially the same as the following minimum square approach: where the trace with the least similarity value is declared to be the victim's. (iii) Basic approach (BAS). The adversary assumes that the noise is zero-mean and has a specific standard deviation ( ), but makes no assumption about its exact distribution. The adversary then computes the similarity value of trace with the side information as follows: where 2 ( , ) = 1 if | − | ≤ 2 and 0 otherwise. Hence, the adversary accepts a trace as a potential candidate if the trace owner appears in a radius of 2× of the revealed location. The trace with the maximum similarity value is declared to be the victim's. (iv) Weighted exponential approach (EXP). In this approach, which is proposed and analyzed in [11], the adversary does not know the type of noise or its magnitude. Similar to BAS, the adversary maximizes the similarity value of trace as follows: where ( ( )) is some weight assigned to the revealed cell ( ) and is a constant.
The above formula can be easily modified for scenario A2. For convenience, the probability that the vehicle is on the cell at time for trace is defined as the function : Θ × { | = 1, 2, . . .} → + as follows: where = (̃) + (1 − ) (̃+ 1 ),̃< <̃+ 1 . Then we have International Journal of Distributed Sensor Networks The four strategies have the same computational complexity, which is linear in the number of pieces of side information and the number of nodes. Notice that we assume attack strategies that only collect the side information about one victim. However, the strategies can be easily extended to the case in which the adversary collects the side information about several victims. In particular, the MLE approach can be used directly without modification, while a properly picked threshold can be used for the other attack strategies to remove traces from consideration if their similarity to the victim's trace is lower than the threshold.

Strategies for B1 and B2.
In the active attack scenario, the adversary observes the participants directly. The published traces are revealed in a real-time and synchronized way with respect to the information collected by the adversary. As there is no noise when additional information is acquired, the adversary does not need to use any inference strategy. Based on the idea of excluding the unmatched traces [4], attack algorithm can be described as follows.
Illustrated in Algorithm 1, the attack algorithm takes as input the traces that are published progressively. The algorithm first assumes that all the traces are candidate traces for each victim. A trace is said to be a candidate trace of a victim if it appears at the same set of times and locations as when/where the adversary meets the victim. As time evolves, the adversary removes candidate traces which do not agree with the observed information about each victim from the set for that victim. When a victim's trace is identified, the identified trace is removed from the candidate set of other victims. Notice that the adversary may not identify a participant at times they meet each other, but the identification can occur at a later time when all but one candidate traces are identified and removed. Hence, the adversary identifies a participant more efficiently when it tries to identify as many participants as possible.
In the scenario B1, is fixed, because the adversary always stays at one position. While changes as the adversary moves in the scenario B2. B2 has experimentally a better performance than B1. The reason is because mobile nodes typically obtain more side information than stationary nodes.  (4) while not ended do (5) for each node met at and ∈ Candidate (6) if met node at location at and ( )! = then (7) remove trace from Candidate ; (8) if |Candidate | = 1 then (9) in Candidate is the identified trace and remove it from other candidate set; (10) end if (11) + +; (12) end if (13)  one or more pseudonyms [28][29][30][31], or a group of users share the same ID [32]. However, the true identity of each vehicle is only replaced by a random identifier in VSNs, so we will not discuss the anonymous techniques. Cloaking techniques, such as introducing noise data or reducing the recorded data, also can be used to protect users' privacy. But using cloaking techniques will impact on the truth of the published information, so we should balance the users' privacy and the data accuracy required by different applications.
In VSNs, the defender's (i.e., the traffic control center or other authorities) objective is to increase the hardness for the adversary to identify a trace in the anonymous published traces. The defender can insert some bogus traces or delete the recorded data at some sampled times to achieve its objective. If { 1 , 2 , . . . , , . . .} is the set of the sampled times in the set of the published traces Tr, the defense strategies can be expressed as follows. Proof. Before the execution of D1, the trace privacy levels of nodes meet as follows: International Journal of Distributed Sensor Networks 9 After the execution of D1, the trace privacy levels of nodes meet as follows: Since log 2 | Tr + | > log 2 | Tr |, D1 provides nodes with a higher trace privacy level.

Lemma 4. If is a concave function and is a random variable,
Proof. For a two-mass-point distribution, the inequality becomes which follows directly from the definition of concave functions. Suppose that the lemma is true for distributions with − 1 mass points. Then, writing = /(1 − ), = 1, 2, . . . , − 1, we have where the first inequality follows from the induction hypothesis and the second follows from the definition of concavity.
Notice that it is reasonable to assume that the probability density function of is concave over the interval [ ( ) − , ( ) + ], because the collected side information ( ) about appears more frequently in the vicinity of . However, these defense strategies will bring some loss in data accuracy. The more the inserted bogus traces and the deleted sampled times, the more unreal the data. Hence, we assume that the defense cost is proportional to the inserted bogus traces and the deleted sampled times. We take into account the application requirements involved in the defense cost in a parameter . Then for A1, can be expressed as, where is a system parameter that indicates the requirement of an application, |Trace| is the number of all the traces, is the number of the inserted bogus traces. Similarly to A1, for A2 can be expressed as: where is a system parameter, is the number of all the sampled times, and is the number of the deleted sampled times in the sampled times.

Analysis of Attack and Defense Game
In this section, we study both the passive and active attack scenarios. In the passive attack scenarios (A1 and A2),  we consider games of complete and incomplete information. As the adversary can obtain the victims' positions accurately in the active attack scenarios (B1 and B2), the defender does not need to infer the side information the adversary obtained. Therefore, we only establish the complete information game in scenario B1 and B2.
The traffic data used in the experiments contains mobility traces of taxis in Beijing, China. It contains GPS coordinates of approximately 28,000 taxis collected in a month in Beijing [33]. The location updates are quite fine-grained the average time interval between two consecutive location updates is less than 30 sec. In the experiments, the adversary tries to identify the trace of one participant (randomly picked from all the participants) by gathering side information. The noisy randomly sampled ⟨time, location⟩ pairs from the trace are revealed to the adversary as side information, which the adversary utilizes to identify the complete movement history of the victim from the anonymous traces. The defender inserts some bogus traces (D1) or deletes the recorded data at some sampled times (D2) against the attack strategies implemented by the adversary.

Analysis of Complete Information Game.
Strategic form for complete information game in A1 is depicted as matrices, as in Table 2. Each player chooses a strategy simultaneously, and has common knowledge about the side information. We assumed that the side information contains 18 pairs of ⟨time, location⟩. In particular, since the adversary is still in scenario A1 when the defender uses strategy D2, the side information which is not at sampled times is not involved in calculating the payoff.  We observe that Nash equilibriums depend on the value of the defense cost in this game. when the defense cost is zero, that is, = 0, (MSQ, = |Trace|/2) is a purestrategy Nash equilibrium in the left strategic form. And (MSQ, = 7 /8) is a pure-strategy Nash equilibrium in the right strategic form. when ̸ = 0, from (27) and (28)  if > 3.89. In other words, = 7 /8 or = |Trace|/2 is the optimal defense strategy when is small, whereas = 0 or = 0 becomes the optimal strategy when is very large. Because a larger means a higher data accuracy, which restricts the defense measures.

Analysis of Incomplete Information Game.
Generally, the defender does not know how much side information obtained by the adversary. To solve the problem, Harsanyi [27] introduce a new player named Nature that turns an incomplete information game into a game with complete but imperfect information. We assume that Nature chooses the probability that side information contains 10 pairs of ⟨time, location⟩ is 1 , the probability that side information contains 15 pairs of ⟨time, location⟩ is 2 , and the probability that side information contains 20 pairs of ⟨time, location⟩ is 1 − 1 − 2 , as shown in Table 3. We observe that MSQ is a strictly dominated strategy for the adversary. Nash equilibriums are (MSQ, = |Trace|/2), (MSQ, = 7 /8) if = 0.
When ̸ = 0, we study which are the optimal defense strategies at different values of , . As shown in Figure 2, the maximum point of each line corresponds to the optimal defense strategy. For example, In In the incomplete information game, the adversary also prefers the strategy MSQ, while the optimal defense strategies depend on the values of , . If the defense cost is small, that is, when 0 < ≤ 1.20, the optimal defense strategy is = |Trace|/2, and when 0 < ≤ 3.51, the optimal defense strategy is = 7 /8. If the defense cost is high, that is, when > 1.24, the optimal defense strategy is = 0, and when > 4.17, the optimal defense strategy is = 0.

Analysis of Complete Information Game.
Strategic form for complete information game in A2 is depicted as shown in Table 4. The defender has the knowledge about the side information obtained by the adversary. The side information contains 18 pairs of ⟨time, location⟩. We observe that Nash equilibriums also depend on the value of the defense cost . When = 0, the pure-strategy Nash equilibrium is (MLE 2 , = |Trace|/2) in Table 4(a), and (MLE 2 , = 7 /8) in Table 4(b). When ̸ = 0, the Nash equilibrium is (MLE 2 , = |Trace|/2) if 0 < ≤ 0.82 and (MLE 2 , = 0) if > 0.82 in Table 4(a). In Table 4(b), the Nash equilibrium is (MLE 2 , = 7 /8) if 0 < ≤ 1.21 and (MLE 2 , = 0) if > 1.21. From Table 2 and Table 4, we observe that the attacks in A2 have a better performance than that in A1, because the adversary can use the side information between two consecutive sampled times. Table 5 depicts the incomplete information game in scenario A2. MLE 2 is a strictly dominated strategy for the adversary. (MLE 2 , = |Trace|/2) and (MLE 2 , = 7 /8) are Nash equilibriums when = 0. Figure 3 depicts the optimal defense strategies at different values of , . The maximum point of each line corresponds to the optimal defense strategy. In Figure 3 Figure 3(f). From Figure 2 and Figure 3, we also observe that the attacks in A2 have a better performance than that in A1.

Discussion.
In the A2 complete information game, the adversary prefers the strategy MLE 2 . When the cost of defense measures is 0 or very low, the defense measures are more effective if there are more inserted bogus traces or deleted sampled times. Compared with A1, the method of deleting sampled times in A2 has a lower performance, but still better than the method of inserting bogus trace. When the cost of defense cost is high, the defender also prefers to do nothing.
In the A2 incomplete information game, the adversary also prefers the strategy MLE 2 , while the optimal defense strategies depend on the values of , . If the defense cost is small, that is, when 0 < ≤ 0.74, the optimal defense strategy is = |Trace|/2, and when 0 < ≤ 1.35, the optimal defense strategy is = 7 /8. If the defense cost is high, that is, when > 1.16, the optimal defense strategy is = 0, and when > 1.70, the optimal defense strategy is = 0.
In the complete information game, the adversary prefers B2 if the time that the attack algorithm performs is short, B1 if the time that the attack algorithm performs is long. It is because that when time is long, the adversary at one position can also meet other victims with a high probability. When = 0, the defense measures are more effective if there are more inserted bogus traces or deleted sampled times. When ̸ = 0, the optimal defense strategies depend on the value of , .

Conclusion
In this paper, we analyze data privacy aspects of VSNs by using a game-theoretic model. We first quantify attack power and defense objectives for recording and comparing to the performance. Then, we define an attack and defense game model which can capture the strategy selection decision behavior of the adversary and defender. We also show the effectiveness of defense strategies. Finally, we establish and analyze the complete information and incomplete information games for passive and active attack scenarios based on real world traffic data. Through the analysis results, we show that attack strategies in different scenarios show different performances. More inserted bogus traces or deleted recorded samples show a better performance when the cost of defense measures is small, whereas doing nothing becomes the best strategy when the cost of defense measures is very large. We also present the optimal defense strategy that provides the defender with the maximum utility when the adversary implements the optimal attack strategy. Therefore, our analysis results are useful for designing appropriate privacy protection mechanisms.