EPPDC: An Efficient Privacy-Preserving Scheme for Data Collection in Smart Grid

Unlike the traditional grid, the smart grid builds a real-time connection network between users and the grid company through smart terminals, enabling bidirectional data transmission and information control. In the smart grid, smart meters send various information to the power generators and substations. Frequent data collection supports real-time management, but it raises users' concerns about leakage of private information. To address this problem, an efficient and privacy-preserving data collection (EPPDC) scheme for the smart grid is proposed, based on a blind signature and a key distribution scheme. In the EPPDC scheme, users' data is transmitted to the local aggregator via the building gateway in a privacy-preserving manner. The security analysis indicates that the EPPDC scheme resists replay attacks and provides source authentication, data integrity, confidentiality, unforgeability, nonrepudiation, and evolution of shared keys. Furthermore, the performance analysis shows that the EPPDC scheme has a lower computation cost than existing schemes.


Introduction
Smart grid is one of the most important public infrastructures for smart cities. It builds a real-time connection network between users and the grid company through smart terminals and supports bidirectional data transmission and information control. By relying on it, smart cities can ensure a resilient supply and delivery of energy, which helps them provide many enhanced and innovative functions and operate more efficiently than traditional cities. Furthermore, smart grid can also facilitate coordination between the public and those responsible for public safety, such as urban officials and infrastructure operators [1]. The advantages of smart grid are attracting more and more attention and research in smart city projects. Currently, around one-third of smart city projects are primarily focused on smart grid or other energy innovations, and almost half of smart city strategies include energy-focused projects [2].
Smart grid can support the bidirectional information flow between the power consumer and the utility provider [3]. This two-way interaction allows electricity to be generated in real-time based on consumers' demands and power requests.
As an important technique in smart cities, advanced metering will affect not only the power sector but also other utilities such as gas, heating, and water, which will use smart meters to read and process consumption data remotely [4]. In smart cities, each house may contain a smart meter connected to all electric appliances in the house. The utility transmits requests and commands to the smart meters and gathers and analyzes the power usage data returned by each smart meter. If leaked, that information will indicate not only the amount of energy consumed by each user but also behaviors such as when they are at home, at work, or traveling [5]. Furthermore, attackers who compromise users' home area networks may infer what types of home appliances are in use. If a criminal or malicious attacker can determine when a user is not at home, they may break into the house at such a time. Energy information can also support burglars or provide business intelligence to competitors [6]. With this information, users' habits and lifestyles can be tracked, so information leakage causes a series of problems. Thus, authentication and user privacy preservation are two important security issues for the information flow in smart grid.

International Journal of Distributed Sensor Networks
Thus, there is a need to design schemes that achieve privacy-preserving data transmission between smart meters and the smart grid provider. This paper studies a privacy-preserving data collection scheme for smart grid.

Related Works
The proposed privacy-preserving schemes about smart grid are mainly constructed by two kinds of cryptographic tools, homomorphic encryption [7][8][9] and signcryption [10,11]. By homomorphic encryption, smart meters (SMs) encrypt the messages and send them to gateway (BGW), but gateway cannot get any users' messages without the system private key. Then, gateway signs encrypted messages and sends them to control center (CC). Based on the property of homomorphic encryption, control center can make use of the system private key to recover every user's messages.
Secure data aggregation schemes in smart grid have been investigated by several researchers. Lu et al. proposed a privacy-preserving aggregation scheme [7] based on the homomorphic Paillier cryptosystem. However, [7] assumes that the session keys between SM and BGW are unchanged; once an adversary A compromises the session keys, A can decrypt any previous response message. Based on [7], Li et al. proposed a privacy-preserving demand and response scheme with adaptive key evolution [9]. Both [7] and [9] use homomorphic encryption to achieve privacy preservation, and they support aggregation for some data. In addition, several researchers focused on privacy-preserving aggregation in different conditions by using multiparty computation [12,13], differential privacy [14], and the aggregated pseudostatus variation [15]. Signcryption-based schemes, by contrast, complete the digital signature and encryption of a message in a single step. In particular, SMs signcrypt the messages and send them to the gateway, which cannot learn any user's messages from the encrypted messages. The gateway then signs the encrypted messages and sends them to the control center, which can recover every user's messages with the key shared between CC and SMs. In [10], an identity-based signcryption scheme for smart grid was proposed, but the management of pseudonymous IDs is a problem in [10]. Reference [11] adopted pseudonym technology to achieve user identity anonymity and adopted signcryption to complete digital signature and encryption in a single step in smart grid.
However, the smart grid needs not only to protect users' sensitive information but also to meet their demands for personalized data applications at multiple levels and granularities. Thus many users' messages cannot be aggregated and must be sent to the control center individually. Moreover, existing homomorphic-encryption-based approaches to privacy preservation rely on computationally expensive operations [7,9], which may not be desirable for smart grids with limited bandwidth and computation resources. In addition, the existing signcryption-based schemes in smart grid [10,11] do not provide forward secrecy.
This paper proposes an efficient privacy-preserving data collection scheme for smart grid, which is based on the blind signature and the key distribution scheme. In this scheme, users' data information is transmitted to the local aggregator via gateway, while gateway cannot get any users' messages. Moreover, this scheme can achieve forward secrecy of SM's session key, and evolution of SM's private keys.
The remainder of paper is organized as follows. Section 3 introduces models and design goal. Section 4 describes preliminaries. Section 5 presents the proposed EPPDC scheme. Section 6 shows the security analysis and the computation overhead of the scheme in this paper, respectively. Finally, Section 7 makes a conclusion.

Models and Design Goal
In this section, we give the system model, security model, and the design goal.

System Model. As shown in Figure 1, smart grid is divided into a number of hierarchical networks, comprising the control center (CC), district area networks (DAN), building area networks (BAN), and home area networks (HAN). The CC covers 1 DANs. For the sake of simplicity, we assume that each DAN comprises 2 BANs and each BAN comprises 3 HANs. Each HAN is assigned a smart meter (SM) enabling automated, bidirectional communication between the CC and the HAN users. Meanwhile, each BAN is equipped with a gateway (BGW) and each DAN is equipped with a local aggregator (LAG). Each SM can communicate with LAG via the BGW.
In this paper, the system model of smart grid contains 5 parties, including trusted authority (TA), central aggregator (CAG), LAG, BGW, and SM. LAG is the entity that can directly communicate with CAG on behalf of those geographically dispersed HANs. TA belongs to some independent organizations like Regional Transmission Organizations (RTO) or Independent System Operators (ISO). TA does the system initiation, such as generating public system parameters and assigning private key for each entity.
Figure 2 shows a partial view of this relationship for the smart grid in China. The provincial operator is viewed as the central aggregator CAG, and the municipal operator is viewed as LAG. For example, consider Northwest China. It has a CAG located in Shaanxi and multiple LAGs dispersed in Xi'an, Hanzhong, Yulin, and other towns, and the ISO of Northwest China plays the role of TA. The provincial operator is responsible for generating and transmitting the parameters of CAG, predicting flexible power demand, and managing renewable generation in its province. Generally, the provincial operator can refer to the municipal electricity demand curve to make the power supply plan and the generation dispatching plan a day ahead. The municipal operator is responsible for generating network parameters and aggregating the power demand. At present in China, the communication between the provincial operator and the municipal operator uses a fiber optic link, which is assumed to be secure. The municipal operator can refer to the load curve. According to the preferred load curve, time of use (TOU) prices, and customers' demand, each BAN makes bids in the electricity market. Both the municipal operator and the BAN operator need real-time communication and data management. Usually, wireless communication is used to transfer data between BGW and SM.

Security Model. In our security model, CAG and LAG are trusted by all parties, and it is infeasible for any adversary to compromise them. BGW is honest but curious: it complies with the scheme but may try to learn users' private information while executing it. We consider the following security goals.
(1) Confidentiality: the messages sent to LAG from SM should be confidential; that is, if an adversary A captures the messages, it cannot identify the encrypted messages.
(2) Authenticity and data integrity: BGW and HAN users should be authenticated by LAG and BGW, respectively. Meanwhile, if an adversary A modifies the messages, the malicious operations can be detected.
(3) Privacy preservation: the users' electricity information should not be disclosed to undesirable entities. Privacy preservation requires anonymous authentication and data encryption, which prevent an attacker from getting any information about any of the users. In smart grid, if an adversary A hacks into the database of BGWs, it cannot determine the contents of the ciphertexts. In order to protect the users' privacy, even BGW cannot determine the detailed electricity information of particular users.
(4) Evolution of users' private keys: the evolution of users' private keys should be achieved. If an adversary A compromises any previous private key of a HAN user, A cannot use it currently or in the future.

Design Goal.
Under the above models, our design goal is to develop an efficient privacy-preserving scheme for data collection in smart grid. Specifically, the following two desirable objectives will be achieved.
(1) The proposed privacy-preserving scheme should achieve the message source authentication, data integrity, and the confidentiality of the messages.
(2) The proposed scheme should be cost-effective in terms of computation and communication overheads.

Preliminaries
In this section, we review bilinear pairings, hash function and HMAC [16], group key distribution scheme [17], and Nyberg-Rueppel blind signature technology [18], which will serve as the basis of the proposed scheme.

Hash Function and HMAC. A one-way hash function ℎ(⋅) is said to be secure if the following properties are satisfied.
(1) ℎ(⋅) can take a message of arbitrary length as input and produce a message digest of a fixed-length output.
Hash-based message authentication code (HMAC) is a specific construction for computing a message authentication code (MAC) using a cryptographic hash function in combination with a secret key. Both data integrity and authenticity of a message can be achieved using such a technique. Because it relies only on hash computations, an HMAC value can be computed in a much shorter time than a traditional digital signature. In this paper, we denote the HMAC value on message $m$ using the secret key $K$ by $\mathrm{HMAC}_K(m)$.
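As a minimal sketch of how an HMAC value is computed and checked, the following Python snippet uses the standard `hmac` and `hashlib` modules. The choice of SHA-256, the function names, the key, and the sample message are illustrative assumptions and are not taken from the scheme itself.

```python
import hashlib
import hmac


def compute_hmac(key: bytes, message: bytes) -> bytes:
    """Return the HMAC-SHA256 tag of `message` under the secret `key`."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_hmac(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(compute_hmac(key, message), tag)


# Hypothetical key and message, for illustration only.
shared_key = b"illustrative-shared-key"
reading = b"meter=42;kWh=3.7"
tag = compute_hmac(shared_key, reading)
```

A receiver holding the same key accepts the message only if `verify_hmac` returns `True`; any change to the message or the key leads to a different tag.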

Group Key Distribution Scheme. The purpose of group key distribution is to distribute keys to selected group members so that each of the selected group members shares a distinct personal key with the group manager, while the other group members cannot get any information about the keys. In [17], the group manager broadcasted a message, and all the selected group members could derive their keys from the message. The approach of [17] chose a random $t$-degree polynomial $f(x)$ from $F_q[x]$ and selected $f(i)$ for each group member $U_i$ as the shared personal key. The group manager constructed a single broadcast polynomial $w(x)$ such that, for a selected group member $U_i$, $f(i)$ could be recovered from the knowledge of $w(x)$ and the personal secret $S_i$. But for any revoked group member $U_j$, $f(j)$ could not be determined from $w(x)$ and $S_j$.
In [17], $w(x) = g(x) f(x) + h(x)$ was constructed from $f(x)$ with the help of a revocation polynomial $g(x)$ and a masking polynomial $h(x)$. The revocation polynomial $g(x)$ was constructed in such a way that $g(i) \neq 0$ for any selected group member $U_i$, but $g(j) = 0$ for any revoked group member $U_j$. During the setup phase, each group member $U_v$ had its own personal secret $S_v = \{h(v)\}$, which might be distributed by the group manager through the secure communication channel between each group member and the group manager. Thus, for any selected group member $U_i$, the new personal key $f(i)$ could be computed by $f(i) = [w(i) - h(i)]/g(i)$, but for any revoked group member $U_j$, the new personal key could not be computed because $g(j) = 0$. Specific steps were as follows.
(1) Setup: the group manager randomly picked a $2t$-degree masking polynomial $h(x)$. Each group member $U_i$ got the personal secret $S_i = \{h(i)\}$ from the group manager.
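The polynomial construction above can be sketched in Python as follows. This is an illustrative toy under stated assumptions, not the implementation of [17]: the field size `Q`, the helper names, the polynomial names (`f` for the key polynomial, `h` for the masking polynomial, `g` for the revocation polynomial, `w` for the broadcast polynomial), the way the revocation polynomial is built as a product of linear factors over the revoked identifiers, and the sample identifiers are all assumptions made for the example.

```python
import secrets

Q = 2**61 - 1  # a Mersenne prime, used here only as an illustrative field size


def poly_eval(coeffs, x, q=Q):
    """Evaluate a polynomial (lowest-degree coefficient first) at x mod q."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % q
    return acc


def poly_mul(a, b, q=Q):
    """Multiply two polynomials mod q."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % q
    return out


def poly_add(a, b, q=Q):
    """Add two polynomials mod q."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [(x + y) % q for x, y in zip(a, b)]


def random_poly(degree, q=Q):
    """Pick random coefficients for a polynomial of (at most) the given degree."""
    return [secrets.randbelow(q) for _ in range(degree + 1)]


def revocation_poly(revoked_ids, q=Q):
    """Build g(x) as the product of (x - j) over revoked ids, so g(j) = 0."""
    g = [1]
    for j in revoked_ids:
        g = poly_mul(g, [(-j) % q, 1], q)
    return g


def recover_key(w, g, member_id, personal_secret, q=Q):
    """Compute f(i) = [w(i) - h(i)] / g(i); fails for revoked members."""
    g_i = poly_eval(g, member_id, q)
    if g_i == 0:
        raise ValueError("revoked member: g(i) = 0, key cannot be recovered")
    return (poly_eval(w, member_id, q) - personal_secret) * pow(g_i, -1, q) % q


# Group manager side (hypothetical parameters).
t = 2
f = random_poly(t)          # key polynomial
h = random_poly(2 * t)      # masking polynomial
g = revocation_poly([3, 5]) # members 3 and 5 are revoked
w = poly_add(poly_mul(g, f), h)  # broadcast polynomial

# Member 1 holds the personal secret h(1) and recovers its key f(1).
key_member_1 = recover_key(w, g, 1, poly_eval(h, 1))
```

A revoked member, for example identifier 3 here, hits the `g(i) = 0` branch and cannot divide out the revocation polynomial, which mirrors the argument in the text.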

Nyberg-Rueppel Blind Signature. Blind signatures enable users to obtain valid signatures on a message without revealing its content to the signer. Nyberg-Rueppel blind signature scheme was proposed by Camenisch et al., which was based on the discrete logarithm problem [18]. The scheme had three parts as follows.
(1) Setup system parameters: the system parameters consisted of a prime $p$, a prime factor $q$ of $p - 1$, and an element $g \in \mathbb{Z}_p^*$ of order $q$. The signer's private key was a random element $x \in \mathbb{Z}_q^*$, while the corresponding public key was $y = g^x \pmod{p}$.
(2) Sign: Bob could obtain a valid signature on a message $m$ from Alice without revealing its content to Alice.
(a) Alice randomly selected $\tilde{k} \in \mathbb{Z}_q^*$, computed $\tilde{r} = g^{\tilde{k}} \pmod{p}$, and sent $\tilde{r}$ to Bob.
(b) Bob selected $\alpha, \beta \in \mathbb{Z}_q^*$ at random and computed $r = m g^{\alpha} \tilde{r}^{\beta} \pmod{p}$ and $\tilde{m} = r \beta^{-1} \pmod{q}$. Bob checked whether $\tilde{m} \in \mathbb{Z}_q^*$. If this was not the case, a new $(\alpha, \beta)$ would be chosen until $\tilde{m} \in \mathbb{Z}_q^*$ held. Then, Bob sent $\tilde{m}$ to Alice.
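To make the message flow concrete, the following Python sketch walks through a blind Nyberg-Rueppel signature as the construction of Camenisch et al. is commonly stated, including the signing, unblinding, and verification steps that follow step (b). It is a toy under stated assumptions: the parameters `P = 2039`, `Q = 1019`, `G = 4` are far too small for real use, and all function and variable names (`k`, `r_tilde`, `alpha`, `beta`, `m_tilde`, `s_tilde`) are illustrative choices rather than notation confirmed by this paper.

```python
import secrets

# Toy parameters: Q is a prime factor of P - 1 and G has order Q modulo P.
P, Q, G = 2039, 1019, 4


def rand_zq_star():
    """Return a uniform element of {1, ..., Q - 1}."""
    return secrets.randbelow(Q - 1) + 1


def keygen():
    """Signer key pair: private x, public y = G^x mod P."""
    x = rand_zq_star()
    return x, pow(G, x, P)


def signer_commit():
    """Signer (Alice) picks k and sends r_tilde = G^k mod P."""
    k = rand_zq_star()
    return k, pow(G, k, P)


def requester_blind(m, r_tilde):
    """Requester (Bob) blinds message m; retries until m_tilde is nonzero mod Q."""
    while True:
        alpha, beta = rand_zq_star(), rand_zq_star()
        r = (m * pow(G, alpha, P) * pow(r_tilde, beta, P)) % P
        m_tilde = (r * pow(beta, -1, Q)) % Q
        if m_tilde != 0:
            return r, m_tilde, alpha, beta


def signer_sign(x, k, m_tilde):
    """Signer computes s_tilde = m_tilde * x + k mod Q without seeing m."""
    return (m_tilde * x + k) % Q


def requester_unblind(s_tilde, alpha, beta):
    """Requester derives s = s_tilde * beta + alpha mod Q; (r, s) signs m."""
    return (s_tilde * beta + alpha) % Q


def verify(m, r, s, y):
    """Check m == G^(-s) * y^r * r mod P."""
    return (pow(G, (Q - s) % Q, P) * pow(y, r, P) * r) % P == m % P


x, y = keygen()
k, r_tilde = signer_commit()
m = 1234  # hypothetical message, encoded as an element of Z_P^*
r, m_tilde, alpha, beta = requester_blind(m, r_tilde)
s_tilde = signer_sign(x, k, m_tilde)
s = requester_unblind(s_tilde, alpha, beta)
```

The signer only ever sees `m_tilde`, yet the unblinded pair `(r, s)` verifies against the original message under the signer's public key.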

EPPDC Scheme
Based on the blind signature and the key distribution scheme, this section gives the EPPDC scheme for smart grid, in which the users' data is transmitted to the LAG via BGW in a privacy-preserving manner. The scheme consists of five phases: system initialization, certificate issuing, user registration, data collection, and key evolution.
Then, TA computes its public key $PK_{TA}$ and publishes the tuple $(p, q, g, h, PK_{TA})$ as the system parameters.

Certificate Issuing. During this phase, TA verifies the identity of every entity and issues its certificate. These entities include all the CAGs, LAGs, and BGWs. As an example, TA issues the certificate for a certain LAG as follows.
(1) TA chooses a random number $SK_{LAG} \in \mathbb{Z}_q^*$ as the LAG's private key and computes the LAG's public key $PK_{LAG} = g^{SK_{LAG}} \pmod{p}$.
(2) TA generates the signature $\sigma_{TA,LAG} = \mathrm{Sig}_{SK_{TA}}(PK_{LAG})$, which is a signature on $PK_{LAG}$ using TA's private key $SK_{TA}$.
(3) TA delivers $SK_{LAG}$ and $Cert_{LAG}$ to the LAG, where $Cert_{LAG} = (PK_{LAG}, \sigma_{TA,LAG})$. The delivery of $SK_{LAG}$ must be via a secure channel, such as Secure Sockets Layer (SSL).

User Registration.
Before accessing smart grid, every SM needs to get a certificate from TA and register with the LAG to which it belongs. Assume SM −LAG indicates that a certain SM belongs to LAG i . Figure 3 shows an example of user registration for SM −LAG .

Data Collection.
When a certain LAG needs to make statistical analysis and collect energy information in its DAN, LAG broadcasts the data collection command to its subordinate BGWs . Similarly, each BGW will broadcast the data collection command to its subordinate SMs. As an example, the process of data collection from SM −LAG to LAG is as follows. (3) BGW −LAG sends the message

(I) Each BGW Collects Eligible SMs' Permits
where 2 is the current timestamp. (5) If the value of HMAC is consistent, go to Step (5). Otherwise, go back to Step (1).
where 3 is the current timestamp.
(7) When 3 − 2 < Δ , BGW −LAG verifies the correctness of received messages in Step (5) by HMAC, where Δ is the limit for time difference. Then, BGW −LAG collects all eligible SMs' permits and sends them to LAG .

(II) Generate the Shared Blind Factors between LAG and Every SM
(1) After receiving all the permits from BGW −LAG , LAG confirms the total of permits as BGW −LAG . Then LAG finds the corresponding LAG −SM in its database, where LAG −SM is the shared key of LAG with SM −LAG . And LAG generates the message { BGW −LAG , 3 BGW −LAG } by using Algorithm 2.
(2) LAG sends message In the following, SM −LAG is an example of data collection, which is the same as other SMs.

(III) LAG Collects Data from SM via BGW
where 4 is the current timestamp.
(2) BGW −LAG verifies the validity of the signature in Step (1). If the signature is valid, go to Step (3). Otherwise, BGW −LAG requires SM −LAG to resend the message as in Step (1).
Here we assume that every shared key is valid for one day. An $n$-dimensional shared key is valid for $n$ days during the validity of the permit. Accordingly, both SM and LAG can confirm the intraday shared key LAG−SM from the $n$-dimensional key LAG−SM .
(1) Decrypt Enc PK BGW −LAG [ 1 SM −LAG ] by BGW −LAG 's private key SK BGW −LAG .
if Sig SK LAG (PK SM −LAG ‖ TS ‖ ) is valid then shared keys of LAG with SMs whose permits pass validation.
(4) if HMAC is valid then find LAG −SM in its database according to PK SM −LAG .
Every SM only stores the intraday shared key LAG−SM within the permit validity period. The day before the permit expires, SM applies for registration with LAG again. For an eligible SM, LAG will issue a new permit and form a new shared key LAG−SM with SM. When the previous permit has expired, SM deletes the previous permit and LAG−SM . Thus, the evolution of the shared key LAG−SM is achieved.

Security Analysis and Computation Overhead
In this section, we analyze the security properties and the computation of the EPPDC scheme.
Security Analysis. The EPPDC scheme achieves data collection from SM to LAG via BGW. It resists replay attacks and provides source authentication, data integrity, confidentiality, unforgeability, nonrepudiation, and evolution of shared keys.
Property 1 (correctness). In EPPDC scheme, LAG can verify the blind signature of BGW and recover the message sent by SM.
Proof. During Stage III of data collection in EPPDC scheme, BGW −LAG sends the ( BGW −LAG , SM −BGW −LAG ) message −LAG and PK SM −LAG to LAG . LAG can find the corresponding LAG −SM in its database according to PK SM −LAG and the current timestamp. Then, LAG recovers the current electricity information −LAG using Algorithm 3. The BGW −LAG 's signature on −LAG is ( SM −BGW −LAG , −LAG ).
Property 2 (to resist replay attack). In EPPDC scheme, we assume that there is an adversary A who can intercept and capture the messages sent by SMs. When adversary A resends the messages, BGW can detect the replay attack based on the SM's signature or HMAC with the current timestamp. For the same reason, LAG can also detect the replay attack when an adversary A resends the BGW's messages.
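The replay check described in Property 2 can be illustrated with a short Python sketch that binds a timestamp into an HMAC and rejects stale packets. This is a simplified illustration, not the scheme's exact message format: the packet layout, the function names, the use of SHA-256, and the freshness limit `MAX_SKEW_SECONDS` are assumptions made for the example.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 5.0  # hypothetical limit on the allowed time difference


def _tag(key: bytes, payload: bytes, timestamp: float) -> bytes:
    """HMAC over the payload together with its timestamp."""
    data = payload + b"|" + repr(timestamp).encode()
    return hmac.new(key, data, hashlib.sha256).digest()


def make_packet(key: bytes, payload: bytes, now=None):
    """Sender attaches the current timestamp and an HMAC covering both fields."""
    ts = time.time() if now is None else now
    return payload, ts, _tag(key, payload, ts)


def accept_packet(key: bytes, packet, now=None) -> bool:
    """Receiver rejects packets that are stale or whose HMAC does not match."""
    payload, ts, tag = packet
    now = time.time() if now is None else now
    if not (0 <= now - ts < MAX_SKEW_SECONDS):
        return False  # too old (or from the future): treated as a replay
    return hmac.compare_digest(_tag(key, payload, ts), tag)
```

Because the timestamp is covered by the HMAC, an adversary who resends a captured packet later fails the freshness check, and one who rewrites the timestamp fails the HMAC check. A replay inside the freshness window would still pass this sketch, so a deployment would additionally need to remember recently seen packets.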
Property 3 (confidentiality). In EPPDC scheme, the messages maintain their confidentiality when messages are sent to LAG from SM.
Proof. During Stage III of data collection in EPPDC scheme, SM −LAG has processed the information −LAG into −LAG by blind factors LAG −SM and LAG −SM , before SM −LAG sends the messages to BGW −LAG . We consider the following game played between a challenger C and an adversary A. C runs the system initialization and sends the system parameters to A. A performs a polynomially bounded number of queries (these queries may be made adaptively; that is, each query may depend on the answers to the previous queries). Through these queries, A can obtain many query-response couples. Nevertheless, an adversary A cannot get −LAG , and even BGW −LAG cannot get −LAG either. Thus the messages maintain their confidentiality when they are sent to LAG from SM.

Property 4 (nonrepudiation and unforgeability). During Stage III of data collection in EPPDC scheme, SM −LAG sends −LAG ‖ 4 and its signature to BGW −LAG , which is sent to LAG by BGW −LAG at a later step.
We consider the following game played between a challenger C and an adversary A. A performs a polynomially bounded number of adaptive queries. By queries, A can get many couples of ( −LAG ‖ 4 ) * and Sig SK SM −LAG ( −LAG ‖ 4 ) * . Based on the properties of signature [19], A cannot forge a valid signature of SM −LAG . In the following steps, BGW −LAG signs the message −LAG via blind signature with its private key SK BGW −LAG . By adaptive queries, A can get many couples of messages and signatures. Furthermore, based on the properties of signature, the messages sent by SM −LAG and BGW −LAG are provided with source authentication and data integrity.
Property 5 (forward security). In EPPDC scheme, the permit has a validity period. While the permit is valid, SM and LAG noninteractively share an $n$-dimensional key, where $n$ is the number of days in the validity period. Thus, over these $n$ days, SM and LAG use a different shared key every day.
In the key evolution phase, SM deletes the previous shared key after it has computed the new shared key LAG−SM . If an adversary A compromises a HAN user's SM, it gets the current shared key ( LAG−SM ). Assume that an adversary A can perform a polynomially bounded number of adaptive queries to a challenger C. By queries, A can get many shared keys * LAG−SM . Because the hash function is one-way, A cannot derive any previous shared key LAG−SM from the current shared key ( LAG−SM ). Thus adversary A cannot compute the previous blind factors, and therefore it cannot get previous messages sent by SM. Therefore, EPPDC scheme provides the evolution and forward secrecy of the shared key LAG−SM .
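The forward-secrecy argument rests on the one-wayness of the hash function. One common way to obtain this property is a hash chain, sketched below in Python. This is an illustration of the general pattern only and is not claimed to be the paper's exact derivation of the daily keys; the class name, the domain-separation label, and the use of SHA-256 are assumptions.

```python
import hashlib


def evolve_key(current_key: bytes) -> bytes:
    """Derive the next key by hashing the current one (one-way step)."""
    return hashlib.sha256(b"key-evolution" + current_key).digest()


class DailyKey:
    """Holds only the current key; earlier keys are overwritten when advancing."""

    def __init__(self, initial_key: bytes):
        self._key = initial_key
        self.day = 0

    @property
    def key(self) -> bytes:
        return self._key

    def advance(self) -> None:
        """Move to the next day's key and discard the previous one."""
        self._key = evolve_key(self._key)
        self.day += 1
```

Two parties that start from the same initial key and advance once per day stay synchronized without further interaction. An adversary who captures the current key would have to invert the hash to learn an earlier one.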
Finally, we present the comparison results of security levels in Table 1. It can be seen that scheme [7] and scheme [10] achieve confidentiality, authenticity, and data integrity and scheme [9] cannot resist replay attack.

Computation Overhead. In EPPDC scheme, every SM sends its processed messages and the corresponding signatures to BGW. BGW verifies the validity of SM's signature. For each valid signature, BGW signs the message using the blind signature and sends it to LAG. LAG can verify that the message was indeed sent by BGW and SM. Furthermore, LAG can recover the original message.
In EPPDC, we assume that the SM's signature can be instantiated in the same way as in the existing literature [7][8][9][10][11], using bilinear pairings, which also supports batch verification [7]. So here we mainly discuss the computation complexity of blinding the messages, the signature of BGW, the verification of BGW's signature, and the recovery of the original message. Because the computation complexity is similar in [7,9], which are both realized by homomorphic encryption, we only consider [7] in the following comparisons.
In EPPDC scheme, SM −LAG needs 2 exponentiation operations, 3 multiplication operations, and 1 inverse operation in the underlying multiplicative group to blind its message. In [7], SM −LAG needs 2 exponentiation operations and 1 multiplication operation in $\mathbb{Z}_{n^2}$ to encrypt its message using homomorphic encryption. In [10], SM −LAG encrypts its message using the AES block cipher.
In EPPDC scheme, BGW can perform batch verification. BGW uses the blind signature to sign blinded messages which have been authenticated. Here BGW only needs 1 addition operation and 1 multiplication operation in the underlying multiplicative group. In [7], BGW needs 1 multiplication operation in $\mathbb{G}_1$ and 1 hash operation. In [10], BGW needs 1 addition operation and 2 multiplication operations in $\mathbb{G}_1$.
In EPPDC scheme, LAG verifies that the message was indeed sent by BGW and recovers the original message, which needs 4 exponentiation operations, 6 multiplication operations, 1 addition operation, and 2 inverse operations in the underlying multiplicative group. In [7], LAG can verify that the message was indeed sent by BGW and recover the original message, which needs 1 exponentiation operation and 1 multiplication operation in $\mathbb{Z}_{n^2}$, 2 pairing operations, and 1 hash operation. In [10], LAG can verify that the message was indeed sent by BGW and recover the original message, which needs 2 pairing operations, 1 multiplication operation in $\mathbb{G}_1$, 1 exponentiation operation in one group and 2 exponentiation operations in another, and AES decryption. Since the costs of AES encryption or decryption and of the hash function are negligible compared with exponentiation and pairing operations, here we mainly consider the computation overhead of the other operations. Table 2 gives the test time for the involved cryptography operations [20]. The experiments are conducted on a computer with Intel i5-3210-2.5 GHz CPU and 4-GB RAM.
Table 3 compares the computation complexity for SM, BGW, and LAG when a single message is sent by SM. When $k$ messages are sent by different SMs to the same LAG, LAG can use batch verification to reduce the number of pairing operations from $2k$ to $k + 1$ [7,10].
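The effect of batch verification and of the per-message operation counts can be made concrete with a small cost model in Python. This is a sketch under stated assumptions: it assumes that batch verification in [7,10] reduces the pairing count for k messages from 2k to k + 1, that the per-message LAG operation counts for EPPDC (4 exponentiations, 6 multiplications, 1 addition, 2 inversions) scale linearly with k, and it takes the per-operation timings as caller-supplied inputs, since the measured values of Table 2 are not reproduced here. All function names are illustrative.

```python
def pairings_individual(k: int) -> int:
    """Pairing operations when each of k messages is verified separately."""
    return 2 * k


def pairings_batch(k: int) -> int:
    """Pairing operations when k messages are verified as one batch."""
    return k + 1


def eppdc_lag_ops(k: int) -> dict:
    """LAG operation counts in EPPDC for k messages (assumed linear in k)."""
    return {"exp": 4 * k, "mul": 6 * k, "add": 1 * k, "inv": 2 * k}


def total_time(op_counts: dict, unit_times: dict) -> float:
    """Weighted sum of operation counts and per-operation times."""
    return sum(count * unit_times[op] for op, count in op_counts.items())
```

For example, `pairings_batch(100)` gives 101 pairings instead of the 200 given by `pairings_individual(100)`, while `total_time(eppdc_lag_ops(100), unit_times)` estimates the EPPDC cost for whatever per-operation timings the caller supplies in `unit_times`.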
With the exact operation costs, we depict the variation of computation costs in terms of the number of messages in Figures 4 and 5, for BGW and LAG, respectively. The figures show that the EPPDC scheme largely reduces the computation complexity for both BGW and LAG.

Conclusions
This paper proposes an efficient and privacy-preserving data collection scheme for smart grid, which is based on the blind signature and the key distribution scheme. With this scheme, the users' data is transmitted to the local aggregator through the building gateway without the gateway learning its content. Our analysis of the EPPDC scheme shows that it not only preserves privacy but also has a lower computation cost than existing schemes.