Perturbation-Based Schemes with Ultra-Lightweight Computation to Protect User Privacy in Smart Grid

In smart grid, smart meters are deployed to collect power consumption data periodically, and the data are analyzed to improve the efficiency of power transmission and distribution. The collected consumption data may leak the usage patterns of domestic appliances and thereby damage the behavioral privacy of customers. Most related work on protecting data privacy in smart grid relies on cryptographic primitives, for example, encryption, which induces a large power consumption overhead. In this paper, we make the first attempt to propose solutions that protect user privacy without any cryptographic computation. The privacy in smart grid is formally defined in the paper. Three schemes are proposed: random perturbation scheme (RPS), random walk scheme (RWS), and distance-bounded random walk with perturbation scheme (DBS). One algorithm is proposed for each scheme. All schemes are ultra-lightweight in terms of computation and do not rely on any cryptographic primitive. The privacy, soundness, and accuracy of the proposed schemes are justified by rigorous analysis.


Introduction
Smart grid is a typical application of the Internet of Things, M2M, or IP-based sensor networks. It has been envisioned as a key method to reduce the emission of carbon dioxide and retard climate change by improving the efficiency of power distribution and transmission.
Smart grid relies on smart meters to collect power consumption data at user ends instantly. Smart meters report the power consumption data periodically to the smart grid control center (SGCC). SGCC can thus allocate the necessary power distribution and schedule the required power transmission. In addition, SGCC can shift the power requirements at user ends by delivering power prices to users. Users can then schedule the usage of their household appliances according to the forthcoming price.
As smart meters report the power consumption data periodically, the data may leak user privacy in daily life. For example, the data may be used to deduce user behavior patterns, such as when she gets up (from the use of a microwave oven or toaster in the morning), when she returns home (from the use of an electric stove for cooking in the afternoon), or when she takes a bath or goes to bed (from the use of a water heater or lamps at night). Such privacy concerns have already been acknowledged and reported by NIST [1] and significantly affect the deployment of smart meters.
Although several privacy protection and security improvement schemes for smart grid already exist [2][3][4][5][6], most of them rely on cryptographic primitives, for example, encrypting the uploaded data at smart meters. Cryptographic operations are usually not lightweight, so they induce extra power consumption at smart meters. In addition, data uploading occurs frequently and periodically, so the encryption computation is performed very often. For example, if data are uploaded to SGCC once every 10 minutes, they must be encrypted 144 times a day. Thus, the energy consumed by encryption over a month would be large even at a single smart meter. Moreover, because the number of smart meters in smart grid is huge, the extra power consumption accumulates into an unacceptable waste. Furthermore, if the uploaded data are encrypted at smart meters, they must be decrypted at SGCC, which greatly increases the energy consumption at SGCC. Last but not least, smart meters usually have resource and power constraints, like traditional sensors. As privacy protection must be conducted at smart meters, any computation for privacy protection should consume little energy to respect these constraints, so frequent encryption operations are undesirable. Even when the encryption itself is lightweight, key management remains a difficult deployment issue. Therefore, privacy protection by encryption unfortunately contradicts the energy-saving intention of smart grid; an ultra-lightweight method without any cryptographic computation is mandatory for long-term, large-scale privacy protection.
International Journal of Distributed Sensor Networks
In this paper, we propose perturbation-based schemes with ultra-lightweight computation and without any cryptographic computation. In addition, we formally define and prove their privacy protection strength. We adopt a rigorous method to state, present, and analyze the privacy protection achieved. All our presentations follow formal expressions for better clarity and generality.
The contributions of the paper are as follows: (i) we propose privacy-protection schemes that are ultra-lightweight in terms of computation (and thus energy consumption) and use no cryptographic computation; (ii) we strictly define the requirements on privacy, soundness, and accuracy in smart grid and prove that those requirements are guaranteed.
The rest of the paper is organized as follows. In Section 2 we discuss the basic assumptions and models used throughout the paper. Section 3 provides the detailed description of our proposed models and analysis. Section 4 gives an overview of relevant prior work. Finally, Section 5 concludes the paper.
A smart meter (SM) computes power consumption data and uploads them to SGCC periodically. The period for computing power consumption data at SM is called the sensing period. The period for uploading power consumption data to SGCC is called the uploading period. Without loss of generality, suppose the sensing period and uploading period are both $T$ minutes. The number of sensing times and uploading times in a day will thus be $n = [24 \times 60 / T]$. The total sensing data for a day are denoted as a set $D_S = \{ds_1, \ldots, ds_n\}$. The total uploading data for a day are denoted as a set $D_U = \{du_1, \ldots, du_n\}$. If SM does not hide $D_S$, $D_U$ will be the same as $D_S$.

Problem Formulation
In smart grid, the utility price may vary across time slots. The price information is delivered by SGCC in advance. Users use this information to guide their power consumption, and SM uses it to calculate the monthly utility charge for users. Suppose the prices for the $n$ uploading periods in a day are denoted as a set $P = \{p_1, p_2, \ldots, p_n\}$. Thus, the total utility charge for a day is $\sum_{i=1}^{n} ds_i \cdot p_i$. The total utility charge for a month is the sum of the charges for all days in the month. If the sensing data are changed into the uploading data to protect privacy, the total utility charge for a day must remain correct.
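The two quantities defined so far, the number of periods per day and the daily utility charge, can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names are hypothetical, and rounding the period count up with a ceiling is an assumption about the bracket notation in the text.

```python
import math


def periods_per_day(period_minutes: float) -> int:
    """Number of sensing (uploading) periods in one day.

    Assumption: the bracket in the text is read as a ceiling.
    """
    return math.ceil(24 * 60 / period_minutes)


def daily_charge(readings, prices):
    """Total utility charge for a day: sum of reading * price per period."""
    if len(readings) != len(prices):
        raise ValueError("readings and prices must have the same length")
    return sum(d * p for d, p in zip(readings, prices))
```

With a 10-minute period, `periods_per_day(10)` gives the 144 uploads per day mentioned in the introduction.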

Attack Model and Trust Model.
Only adversaries who attack user privacy are considered in this paper. Some adversaries can eavesdrop on the channels between SM and SGCC; others, located at SGCC, can access all data uploaded by SM. Both kinds of adversary wish to deduce user behaviors in a day by analyzing the data uploaded by SM, namely, $D_U$. As both have the same view of $D_U$, we do not distinguish them further; both are denoted by the same notation $\mathcal{A}$.
SGCC is untrustworthy, as we assume adversaries at SGCC are interested in user privacy. SM should be trustworthy. This is a prerequisite for any further discussion, because the sensing data reside at SM and all possible solutions are executed at SM. Besides, if an SM were untrustworthy, users would not choose it. SM can be easily evaluated and authorized by a Trusted Third Party (TTP).

Security Definition and Design Goal.
Informally speaking, privacy is guaranteed if adversaries (not only at SGCC but also on the channels between SGCC and SM) cannot deduce the user activities in a day. More specifically, we formally state the privacy requirements as follows.
Definition 1. User activities. They are the activities that damage user privacy and are related to using one or multiple household appliances in daily life. They are denoted as a set $UA = \{ua_1, ua_2, \ldots, ua_k\}$, where $ua_j$ ($j = 1, \ldots, k$) is an activity related to one or multiple appliances.
where $\Pr\{A : B\}$ denotes the probability that event $B$ happens after viewing $A$; "$\Leftarrow$" means "is selected from"; "," means that two operations happen consecutively; and "s.t." is shorthand for "such that."
Definition 4. Computational full privacy (denoted as $\mathrm{Privacy}^{c}_{\mathrm{full}}$). Given any $du_i \in D_U$, it is computationally infeasible for any Probabilistic Polynomial Turing Machine (PPTM) adversary $\mathcal{A}$ to find $ua_j \in UA$ such that $(du_i, ua_j) \in R$. That is, the probability that $\mathcal{A}$ succeeds is at most $\mathrm{negl}(\lambda)$, where $\mathrm{negl}(\cdot)$ is a negligible function with security parameter $\lambda$.
Claim 1. Perfect (computational) full privacy can protect user privacy on all user activities in a day, as no activity can be deduced from the data in $D_U$ by any (PPTM) adversary.
In the previous claim, the parenthesized terms correspond to each other. Similarly, perfect (computational) partial privacy can be defined as follows.
Definition 5. Perfect (computational) partial privacy, denoted as $\mathrm{Privacy}^{p(c)}_{\mathrm{partial}}$. Given at least one $du_i \in D_U$, it is (computationally) infeasible for any (PPTM) adversary $\mathcal{A}$ to find $ua_j \in UA$ such that $(du_i, ua_j) \in R$ after viewing $D_U$. Besides, given at least one $ua_j \in UA$, it is (computationally) infeasible for any (PPTM) adversary $\mathcal{A}$ to find $du_i \in D_U$ such that $(du_i, ua_j) \in R$ after viewing $D_U$.
Claim 2. Perfect (computational) partial privacy can protect certain privacy-sensitive activities, as these activities cannot be deduced from $D_U$ by any (PPTM) adversary.
Claim 3. Full privacy is stronger than partial privacy in terms of the number of deducible data in $D_U$. Perfect privacy is stronger than computational privacy in terms of the adversary's ability. That is, $\mathrm{Privacy}^{c}_{\mathrm{partial}} < \mathrm{Privacy}^{c}_{\mathrm{full}}$, $\mathrm{Privacy}^{p}_{\mathrm{partial}} < \mathrm{Privacy}^{p}_{\mathrm{full}}$, $\mathrm{Privacy}^{c}_{\mathrm{partial}} < \mathrm{Privacy}^{p}_{\mathrm{partial}}$, and $\mathrm{Privacy}^{c}_{\mathrm{full}} < \mathrm{Privacy}^{p}_{\mathrm{full}}$, where "$A < B$" means that the privacy protection strength of "A" is weaker than that of "B".
Roughly speaking, full privacy protects all activities; partial privacy protects some activities. Perfect privacy defends against any adversary; computational privacy defends against any PPTM adversary. As perfect full privacy has the strongest privacy strength, we concentrate on perfect full privacy protection in the following.
Definition 6. The full privacy attacking experiment on the scheme $\Pi$ defending against any adversary $\mathcal{A}$, denoted $\mathrm{ExpPrivacy}^{p,\mathcal{A},\Pi}_{\mathrm{full}}$, is defined as follows: (1) the scheme $\Pi$ is executed in the presence of any adversary $\mathcal{A}$; (2) $\mathcal{A}$ fully accesses $D_U$, $UA$, and $R$. Given any $du_i \in D_U$, if $\mathcal{A}$ can find $ua_j \in UA$ such that $(du_i, ua_j) \in R$, $\mathcal{A}$ outputs 1; otherwise, it outputs 0; (3) the experiment outputs 1 if and only if $\mathcal{A}$ outputs 1.
Definition 7. The scheme $\Pi$ that can guarantee perfect full privacy in the presence of any adversary $\mathcal{A}$ (denoted as $\mathrm{Privacy}^{p,\mathcal{A},\Pi}_{\mathrm{full}} = 1$) is defined as follows. For any adversary $\mathcal{A}$ that the scheme $\Pi$ defends against, the probability that the output of the full privacy attacking experiment equals one is 0. That is, $\mathrm{Privacy}^{p,\mathcal{A},\Pi}_{\mathrm{full}} = 1$ if and only if $\Pr\{\mathrm{ExpPrivacy}^{p,\mathcal{A},\Pi}_{\mathrm{full}} = 1\} = 0$.
Therefore, the design goal is to propose a scheme $\Pi$ that satisfies $\mathrm{Privacy}^{p,\mathcal{A},\Pi}_{\mathrm{full}} = 1$ and, importantly, uses ultra-lightweight computation without any cryptographic computation.

Problem Reduction.
To protect the privacy of the sensing data $D_S$, a naive method is to encrypt them at SM and then upload them to SGCC. As SGCC is untrustworthy, SGCC cannot decrypt them and has to consult a TTP. The TTP decrypts the data, but the result cannot be sent to SGCC. Instead, the TTP should compute accumulative values (or metadata) and send them to SGCC for further scheduling and charging. This obviously incurs multiple overheads: a large computation overhead at SM; extra communication overhead at SM and SGCC; an extra entity, the TTP; and key management overhead between SM and the TTP.
As SM is trustworthy, we propose to equip SM with a trusted mixing layer between the sensing layer and the communication layer. That is, SM is modeled as a three-tuple $\langle L_S, L_M, L_C \rangle$, where $L_S$ is a sensing layer that computes the power consumption periodically and outputs $D_S$; $L_M$ is a mixing layer that transforms $D_S$ into $D_U$; and $L_C$ is a communication layer that uploads $D_U$ to SGCC. That is, $SM ::= \langle L_S \Rightarrow L_M \Rightarrow L_C \rangle$ with $f : D_S \rightarrow D_U$ at $L_M$, where "::=" means "is defined as"; "$\Rightarrow$" means data transfer between layers; $f$ is a data transforming function; and "$\rightarrow$" means that the input of the function $f$ is transformed into the output of the function $f$. Therefore, the rest of the paper concentrates on finding an ultra-lightweight transformation function $f$ with $\mathrm{Privacy}^{p,\mathcal{A},f}_{\mathrm{full}} = 1$.
Definition 8. "Bad" data set ($D_B$). It consists of all power consumption data from which one or multiple activities in $UA$ can be deduced. $D_B = \{db_1, db_2, \ldots, db_m\}$, where $m$ is the total number of $db_j \in D_B$ ($j = 1, \ldots, m$).
The characteristics of $D_B$, $UA$, and the deduction relationship set $R$ are as follows.
(1) Without loss of generality, $D_B$ is a sorted set of positive numbers. That is, $db_1 < db_2 < \cdots < db_m$. $db_1$ is equal to or greater than the power consumption of the minimum-consumption appliance in a period. $db_m$ is equal to or less than the total power consumption of all appliances in a period.
(2) Any $db_j \in D_B$ ($j = 1, \ldots, m$) may represent the usage of one appliance in a period. For example, $db_1$ (30 Wh) is the power consumption of a lamp for a period. $db_1$ is related to an event (e.g., $ua_1$) that means the lamp is on in the period.
(3) Any $db_j \in D_B$ ($j = 1, \ldots, m$) may also represent the usage of multiple household appliances. For example, $db_9$ represents two household appliances used simultaneously. $db_9 = db_1 + db_2$, where $db_1$ is the power consumption of the lamp in a period and $db_2$ is the power consumption of the washing machine in the period. Thus, $db_9$ means using the lamp and the washing machine simultaneously in the period.
(4) Similarly, any $ua_j \in UA$ ($j = 1, \ldots, k$) may represent the usage of one appliance or of multiple household appliances simultaneously.
(5) One $db_j \in D_B$ may be related to more than one activity in $UA$. In other words, the mapping $D_B \rightarrow UA$ is not a function, and the mapping $UA \rightarrow D_B$ is a surjective but not injective function.
Definition 9. After transformation $f$, the privacy of $D_S$ is guaranteed (denoted as $\mathrm{Privacy}^{f}_{D_S} = 1$) if, for all $ds_i \in D_S$, $du_i = f(ds_i) \notin D_B$.
Definition 10. After transformation $f$, the soundness of $D_S$ is guaranteed ($\mathrm{Soundness}^{f}_{D_S} = 1$) if the utility summation remains unchanged. That is, $\sum_{i=1}^{n} du_i \cdot p_i = \sum_{i=1}^{n} ds_i \cdot p_i$.
Given this focus, the research problem is reduced to the following: given $D_S$, find an ultra-lightweight transformation $f : D_S \rightarrow D_U$ such that the privacy and soundness of $D_S$ are both guaranteed, that is, $\mathrm{Privacy}^{f}_{D_S} = 1$ and $\mathrm{Soundness}^{f}_{D_S} = 1$. Next, we propose a family of schemes to solve the problem. We list all major notations used in the remainder of the paper in Table 1.
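The two requirements of the reduced problem can be expressed as simple predicates. The sketch below is illustrative only: the function names are hypothetical, and the privacy check assumes the reading used in the analysis of RPS, namely that no uploaded value may belong to the "bad" data set.

```python
import math


def privacy_holds(uploaded, bad_values):
    """Assumed reading of the privacy requirement: no uploaded value is 'bad'."""
    bad = set(bad_values)
    return all(value not in bad for value in uploaded)


def soundness_holds(sensed, uploaded, prices, tol=1e-9):
    """Soundness: the daily utility charge is unchanged by the transformation."""
    def charge(data):
        return sum(d * p for d, p in zip(data, prices))
    return math.isclose(charge(sensed), charge(uploaded), rel_tol=0.0, abs_tol=tol)
```

For instance, replacing the readings `[100, 50]` by `[65, 85]` under unit prices keeps the charge at 150, so soundness holds even though both individual values changed.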

Random Perturbation Scheme (RPS).
We first propose a basic scheme, the random perturbation scheme (RPS), to illustrate our motivation. In RPS, any $ds_i \in D_S$ that equals some $db_j \in D_B$ is perturbed into a new value midway between $db_j$ and $db_{j-1}$ or midway between $db_j$ and $db_{j+1}$. One of the two cases is selected at random. A Random Perturbation Algorithm called RPA is proposed for the transformation $f$ as follows.
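The perturbation described above can be sketched in Python as follows. This is a hedged reconstruction rather than the paper's listing: the function name `rpa`, the argument names, and the use of an exact equality test are assumptions; the loop bounds (all periods but the last, interior "bad" values only) and the final compensation step follow the surviving fragments of Algorithm 1 and its analysis.

```python
import random


def rpa(sensed, bad_values, prices, rng=None):
    """Sketch of the Random Perturbation Algorithm (RPA).

    sensed     -- sensing data for one day
    bad_values -- sorted 'bad' data set
    prices     -- price for each period
    """
    rng = rng or random.Random()
    n, m = len(sensed), len(bad_values)
    uploaded = list(sensed)
    delta = 0.0  # accumulated charge bias of periods 1..n-1
    for i in range(n - 1):
        for j in range(1, m - 1):  # interior bad values have two neighbours
            if sensed[i] == bad_values[j]:
                # Randomly pick the lower or upper neighbour, then move midway.
                if rng.random() < 0.5:
                    neighbour = bad_values[j - 1]
                else:
                    neighbour = bad_values[j + 1]
                uploaded[i] = (bad_values[j] + neighbour) / 2
                delta += (sensed[i] - uploaded[i]) * prices[i]
                break
    # For soundness: fold the accumulated bias into the last period.
    uploaded[n - 1] = sensed[n - 1] + delta / prices[n - 1]
    return uploaded
```

Because the set of "bad" values is strictly sorted, a midpoint between two adjacent entries can never itself be one of them. The sketch does not treat the smallest and largest "bad" values, which have only one neighbour.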

Analysis of Algorithm 1
Proposition 11. After the transformation of algorithm RPA, the soundness of $D_S$ is guaranteed ($\mathrm{Soundness}^{\mathrm{RPA}}_{D_S} = 1$).
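The soundness claim of Proposition 11 can be checked with a short derivation. The symbols are assumptions chosen for illustration: $ds_i$ and $du_i$ are the sensing and uploading data of period $i$, $p_i$ is the price of period $i$, and $\delta$ is the accumulated charge bias of the first $n-1$ periods, which the "for soundness" step of Algorithm 1 folds into the last period.

```latex
\begin{align*}
\delta &= \sum_{i=1}^{n-1} (ds_i - du_i)\, p_i,
  \qquad du_n = ds_n + \frac{\delta}{p_n},\\
\sum_{i=1}^{n} du_i\, p_i
  &= \sum_{i=1}^{n-1} du_i\, p_i + ds_n\, p_n + \delta\\
  &= \sum_{i=1}^{n-1} du_i\, p_i + ds_n\, p_n + \sum_{i=1}^{n-1} (ds_i - du_i)\, p_i\\
  &= \sum_{i=1}^{n} ds_i\, p_i .
\end{align*}
```

The same argument applies to any transformation that ends with this compensation step, regardless of how the first $n-1$ values are changed.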

Random Walk Scheme (RWS).
If the gap between $db_j$ and $db_{j+1}$ ($j = 1, \ldots, m-1$) is small, the perturbation (namely, $\delta$) in RPS will be small. This is stated as the following claim.
Claim 4. If the gap between $db_j$ and $db_{j+1}$ ($j = 1, \ldots, m-1$) is small, the perturbation in RPS will be small.

Proof. Suppose max(|𝑑𝑏
If the perturbation is small, adversaries may guess $ds_i$ correctly and can infer that the activity is one of two candidate activities. To address this issue, we propose a random walk scheme called RWS, in which $ds_i \in D_S$ randomly jumps to a value in $D_B$. In this case, the privacy definition is extended to include unlinkability, in which every $db_j \in D_B$ is equally likely to be the value of $du_i$. Thus, every revealed user activity occurs with equal probability.
Definition 14. After transformation $f$, the privacy of $D_S$ is guaranteed (denoted as $\mathrm{Privacy}^{f}_{D_S} = 1$) if, for any $du_i$, every $db_j \in D_B$ is selected with equal probability. The definition of privacy is thus extended to include both the definition here and Definition 9.
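The random walk idea can be sketched as follows. This is an illustrative reconstruction under stated assumptions, since the listing of Algorithm 2 is not reproduced here: the function name `rwa` is hypothetical, every reading except the last is assumed to jump to a uniformly chosen "bad" value, and the last reading absorbs the accumulated charge bias as in RPA.

```python
import random


def rwa(sensed, bad_values, prices, rng=None):
    """Sketch of the Random Walk Algorithm (RWA)."""
    rng = rng or random.Random()
    n, m = len(sensed), len(bad_values)
    uploaded = list(sensed)
    delta = 0.0  # accumulated charge bias of periods 1..n-1
    for i in range(n - 1):
        # Uniform jump: each bad value is selected with probability 1/m.
        uploaded[i] = bad_values[rng.randrange(m)]
        delta += (sensed[i] - uploaded[i]) * prices[i]
    # For soundness: fold the accumulated bias into the last period.
    uploaded[n - 1] = sensed[n - 1] + delta / prices[n - 1]
    return uploaded
```

A single loop over the periods suffices, which matches the remark in the analysis that the number of loops is $n - 1$. Note that the compensated last value is not constrained here and may be large in magnitude when many jumps are far from the true readings.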

Analysis of Algorithm 2
Proposition 15. After the transformation of algorithm RWA, the soundness of $D_S$ is guaranteed ($\mathrm{Soundness}^{\mathrm{RWA}}_{D_S} = 1$).
Proof. The proof is similar to the proof of Proposition 11. As $\sum_{i=1}^{n} du_i \cdot p_i = \sum_{i=1}^{n} ds_i \cdot p_i$, the total cost of power consumption in a day keeps the correct value. Thus, the soundness of RWA is guaranteed.
Proposition 16. The scheme RWS is ultra-lightweight.
Proof. The number of loops in algorithm RWA is $n - 1$, and the computations in the loop are only simple operations such as modulo, minus, plus, and multiplication. Moreover, algorithm RWA is more lightweight than algorithm RPA. Thus, scheme RWS is ultra-lightweight.

Distance-Bounded Random Walk with Perturbation Scheme (DBS).
In smart grid, the uploading data are used as feedback for the future scheduling of distribution and transmission. This requires that the uploading data accurately represent the power consumption (namely, the sensing data). However, because power distribution and transmission serve not a single SM but a large number of SMs (e.g., at campus, community, or county scale), accuracy over a group of SMs is sufficient for scheduling.
In RPS and RWS, although a bias exists at a single SM (that is, the uploading data are not equal to the sensing data), the uploading data for a large number of SMs can still represent the power consumption at scale. More specifically, the deviation between the summation of uploading data and the summation of sensing data is randomly positive or negative at each SM, so the overall summation remains almost unchanged in expectation at large scale. This is explained as follows.
Definition 18. After the transformation $f$, the accuracy of $D_S$ is guaranteed in expectation for a scheduling area (denoted as $\mathrm{Accuracy}^{f}_{D_S} = 1$) if the summation of $D_U$ equals the summation of $D_S$ over the scheduling area and the scheduling period. More specifically, suppose that each scheduling period consists of $t$ sensing (uploading) periods and each scheduling area consists of $s$ SMs, and let $SUM_U$ and $SUM_S$ denote the total uploading data and the total sensing data over them. The expectations of both are equal, as the expectation of $\delta$ is 0 in a scheduling area. That is, $E(SUM_U) = E(SUM_S)$, as $E(\delta) = 0$, where $E(\delta)$ means the expectation of $\delta$.
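The expectation argument can be written out in one step using linearity of expectation. The indices are assumptions introduced for illustration: $s$ is the number of SMs in the scheduling area, $t$ is the number of sensing (uploading) periods in the scheduling period, and $\delta_{j,i} = du_{j,i} - ds_{j,i}$ is the perturbation at SM $j$ in period $i$.

```latex
\begin{align*}
SUM_U - SUM_S
  &= \sum_{j=1}^{s} \sum_{i=1}^{t} \bigl(du_{j,i} - ds_{j,i}\bigr)
   = \sum_{j=1}^{s} \sum_{i=1}^{t} \delta_{j,i},\\
E(SUM_U) - E(SUM_S)
  &= \sum_{j=1}^{s} \sum_{i=1}^{t} E(\delta_{j,i}) = 0
  \quad \text{whenever } E(\delta_{j,i}) = 0 \text{ for all } j, i .
\end{align*}
```

The result holds only in expectation; for any single scheduling period the realized deviation is generally nonzero, which motivates bounding the perturbation.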
To further guarantee the scheduling accuracy, we propose a distance-bounded scheme, in which the perturbation value (i.e., $\delta$) is bounded. The accuracy is thus guaranteed within a threshold value. The scheme takes advantage of the former two algorithms, RPA and RWA. A distance-bounded algorithm (DBA) for the transformation $f$ is proposed as follows.
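The distance-bounded idea can be illustrated with a short sketch. It is only one possible reading and should not be taken as the paper's Algorithm 3, whose listing is not reproduced here: the function name `dba`, the rule of jumping only to "bad" values within `bound` of the reading, and the omission of any additional perturbation step are all assumptions.

```python
import random


def dba(sensed, bad_values, prices, bound, rng=None):
    """Sketch of a distance-bounded random walk (assumed reading of DBA)."""
    rng = rng or random.Random()
    n = len(sensed)
    uploaded = list(sensed)
    delta = 0.0  # accumulated charge bias of periods 1..n-1
    for i in range(n - 1):
        # Candidate targets: bad values no farther than `bound` from the reading.
        nearby = [v for v in bad_values if abs(v - sensed[i]) <= bound]
        if nearby:
            uploaded[i] = rng.choice(nearby)
            delta += (sensed[i] - uploaded[i]) * prices[i]
    # For soundness: fold the accumulated bias into the last period.
    uploaded[n - 1] = sensed[n - 1] + delta / prices[n - 1]
    return uploaded
```

In this sketch the deviation of every period except the last is at most `bound`, while the last period still carries the soundness compensation and is not bounded.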

Analysis of Algorithm 3
Proposition 20. After the transformation of algorithm DBA, the soundness of $D_S$ is guaranteed ($\mathrm{Soundness}^{\mathrm{DBA}}_{D_S} = 1$).
Proof. The proof is similar to the proofs of Propositions 11 and 15.
Proposition 21. The scheme DBS is ultra-lightweight.
Proof. The proof can be reduced to the proofs of Propositions 12 and 16.
Proof. The scheduling accuracy is the deviation between the summation of uploading data and the summation of sensing data. As proved in Proposition 19, it depends on the number of SMs in the scheduling area and the number of sensing (uploading) periods in the scheduling period. The expected value is proved to be 0, as the expectation of $\delta$ is 0. Concerning the accuracy within one scheduling period, the maximal bias between the summation of uploading data and the summation of sensing data is bounded by the product of the perturbation bound, the number of sensing (uploading) periods, and the number of SMs.

Related Work
The security architectures and overall security requirements in smart grid have been discussed in recent years [3,7]. Currently, the privacy issue in smart grid is starting to attract more attention. The requirements of privacy were explored in some previous works [8][9][10][11], which pointed out the importance and urgency of privacy issues. Efthymiou and Kalogridis proposed a privacy protection scheme via anonymization of data [12]. Their work relied on Escrow and Public Key Infrastructure (PKI); thus the flexibility and scalability may be hampered. Tomosada and Sinohara proposed to use virtual energy demand to estimate the energy load and protect consumer privacy [13], but the estimation may incur much computation overhead, and accuracy may be damaged. Lu et al. [10] proposed an efficient and privacy-preserving aggregation scheme (EPPA). Their scheme relied on the homomorphic Paillier cryptosystem and induces much computation overhead. Cheung et al. [14] proposed a credential-based privacy-preserving power request scheme for smart grid, which relied on an advanced cryptographic primitive, the blind signature. He et al. [15] proposed to use homomorphic encryption for smart grid communications. Compared with all the aforementioned related work, our final scheme does not rely on any cryptographic primitive, yet it achieves provable privacy and remains ultra-lightweight in computation.

Conclusions
In this paper, we proposed three schemes to protect user privacy in smart grid without any cryptographic primitive and with ultra-lightweight computation. They are the random perturbation scheme (RPS), the random walk scheme (RWS), and the distance-bounded random walk with perturbation scheme (DBS). We also proposed one algorithm for each scheme. Our schemes do not rely on any cryptographic computation, are sound in terms of maintaining the correct utility charge, guarantee privacy that was strictly proved, and ensure the scheduling accuracy in power transmission and distribution. All proposed schemes and algorithms were extensively analyzed, which justified their applicability.
2.1. Network Model. Two major entities exist in smart grid: the smart meter (denoted by SM hereafter) and SGCC.
A single $db_j \in D_B$ may be related to several activities in $UA$, because such $db_j$ may be the power consumption of multiple appliances, and different sets of appliances may have the same total power consumption. For example, $db_9 = db_1 + db_2 = db_3 + db_4$. $db_9$ is related to $ua_5, ua_6 \in UA$, where $ua_5$ means using the lamp and the washing machine simultaneously and $ua_6$ means the usage of the other two appliances.

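        end if
    end for
end for
$du_n \Leftarrow ds_n + \delta / p_n$ //For soundness
Algorithm 1: Random Perturbation Algorithm (RPA).
Proof (of Proposition 11). The biases of $du_i$ ($i = 1, \ldots, n-1$) relative to $ds_i$ ($i = 1, \ldots, n-1$) are accumulated into a total value $\delta$. $\delta$ is converted into extra power consumption and added to the last value $du_n$. Thus, $\sum_{i=1}^{n} du_i \cdot p_i = \sum_{i=1}^{n} ds_i \cdot p_i$. The total cost of power consumption in a day keeps the correct value, so $\mathrm{Soundness}^{\mathrm{RPA}}_{D_S} = 1$.
Proposition 12. The scheme RPS is ultra-lightweight.
Proof. The number of loops in algorithm RPA is $(n - 1) \cdot (m - 2)$. The computation in each loop consists only of simple operations such as modulo, minus, plus, division, and multiplication. The computation complexity of algorithm RPA is $O(n \cdot m)$, so algorithm RPA is ultra-lightweight.
Proposition 13. The scheme RPS can guarantee the perfect full privacy ($\mathrm{Privacy}^{p,\mathcal{A},\mathrm{RPS}}_{\mathrm{full}} = 1$).
Proof. It is clear that for all $ds_i \in D_S$, $du_i = f(ds_i) \notin D_B$. Thus, $\mathrm{Privacy}^{\mathrm{RPA}}_{D_S} = 1$. According to the definition of perfect full privacy, $\mathrm{Privacy}^{p,\mathcal{A},\mathrm{RPS}}_{\mathrm{full}} = 1$.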
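The uploading data for them is $SUM_U = \sum_{j=1}^{s} SM_j$, $SM_j = \sum_{i=1}^{t} du_i$. The sensing data for them is $SUM_S = \sum_{j=1}^{s} SM_j$, $SM_j = \sum_{i=1}^{t} ds_i$. The accuracy of $D_S$ is guaranteed if and only if $SUM_U = SUM_S$.
Proposition 19. After the transformation of RPA or RWA, the accuracy of $D_S$ is guaranteed in expectation for a scheduling area ($\mathrm{Accuracy}^{f}_{D_S} = 1$).
Proof. In each sensing (uploading) period, $ds_i$ is changed into $du_i$ at a single SM, with $\delta = du_i - ds_i$. Suppose that each scheduling period consists of $t$ sensing (uploading) periods and each scheduling area consists of $s$ SMs. The uploading data for them is $SUM_U = \sum_{j=1}^{s} SM_j$, $SM_j = \sum_{i=1}^{t} du_i$; the sensing data for them is $SUM_S = \sum_{j=1}^{s} SM_j$, $SM_j = \sum_{i=1}^{t} ds_i$.
Algorithm 3: Distance-Bounded Algorithm (DBA).
Require: $D_S$, $D_B$, BOUND
Ensure: $D_U$, $\mathrm{Privacy}^{\mathrm{DBA}}_{D_S} = 1$, $\mathrm{Soundness}^{\mathrm{DBA}}_{D_S} = 1$
$D_U \Leftarrow f(D_S)$
for $i = 1$ to $n - 1$ do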