SFSDA: Secure and Flexible Subset Data Aggregation with Fault Tolerance for Smart Grid


Introduction
As technologies such as artificial intelligence, 5G communications, big data, and more become widely available in various fields [1], the smart grid is gradually replacing traditional grids to provide efficient and reliable uninterrupted energy supply to households and businesses, with advantages such as dynamic electricity distribution, automatic pricing, and automatic troubleshooting [2].
In a smart grid (SG), device entities can interconnect and communicate with each other, which greatly increases productivity and automation [3][4][5]. However, many security issues in SGs also pose a potential threat to personal privacy [6][7][8]. In a smart grid, smart meters (SMs) installed in homes collect electricity consumption data from users in real time and periodically submit this data to a control server (CS). CS collects and analyzes the electricity consumption data of all users so that it can forecast electricity demand and adjust prices. While real-time individual electricity consumption data can improve the quality of service for both CS and users, malicious entities can also use it to infer private information about users. For example, many intimate details of users' daily lives can be derived from electricity consumption, such as when the occupants leave and return, which appliances are used, and for how long [9], which seriously threatens personal privacy. Therefore, balancing the tradeoff between data sharing and data privacy remains challenging if the convenience of the smart grid is to be retained [10].
Privacy-Preserving Data Aggregation (PPDA) introduces the concept of edge computing to reduce data latency and communication bandwidth between edge devices [11]. To effectively protect personal privacy, it aggregates all the electricity consumption transmitted by SMs and then transmits the aggregated data to CS. As a result, CS no longer collects a single user's electricity consumption, but the sum of all users' consumption. This allows CS to analyze and process data without knowing individual data, taking full advantage of the convenience of smart grid. In recent years, the fog computing architecture is used in various data aggregation schemes to further improve aggregation efficiency while protecting personal privacy due to the powerful computing and storage capabilities of the Fog node (FN) [12]. Researchers use a variety of techniques for data aggregation that preserve privacy, such as homomorphic encryption, differential privacy, and blind factor. Although existing data aggregation schemes [13][14][15] are able to guarantee users' privacy, most do not consider fine-grained requirements. In practical applications, CS needs not only to collect and analyze the total electricity consumption from all customers, but also to get more detailed and rich statistical characteristics through more detailed electricity consumption. To better determine generating capacity and pricing, for example, CS requires the total electricity consumption and the total number of customers within a given data range. As a result, the utility of the data will be greatly enhanced if the aggregation scheme meets the fine-grained requirements.
Various multi-subset data aggregation schemes [16][17][18] have been proposed in the past few years to meet fine-grained requirements. Zhang et al. [19] found that proper subset adjustment allows CS to gather data in a more focused way. For instance, during the daytime on working days, the subsets ought to be concentrated on the low-power range. However, they found that existing multi-subset aggregation schemes cannot adjust the subsets dynamically: if the subsets are adjusted, the system parameters must be regenerated. To address this shortcoming, Zhang et al. [19] proposed the first flexible subset data aggregation (FSDA) scheme. In their scheme, the total interval $[0, R)$ is divided into $k$ consecutive intervals $[0, R_1), [R_1, R_2), \ldots, [R_{k-1}, R_k)$, and CS can dynamically change the values of $k$ and $R_j$ ($1 \le j \le k$) as required. Specifically, the smart meter $SM_i$ of the $i$-th user ($1 \le i \le n$) first constructs $k$ data slots and fills the slot of the corresponding range with its electricity data. The filled data $M_i$ is then encrypted and sent to FN. Next, FN aggregates the encrypted data. Finally, CS decrypts the aggregated data to get the total electricity consumption of the interval corresponding to each of the $k$ slots. As a result, the scheme can show densely populated data ranges in fine detail and sparsely populated ranges coarsely, which yields higher data utility. However, we have mounted an effective attack on FSDA. When $SM_i$ needs more than one Paillier ciphertext to accommodate $k$ slots (for example, $k > 15$ in the FSDA scheme), only one of the Paillier ciphertexts is filled with electricity data, while the others are Paillier ciphertexts of 0. A malicious FN can then use a ciphertext without data to decrypt the ciphertext that carries the electricity data, thus completely recovering the individual electricity consumption. In addition, FSDA does not consider fault tolerance. Some SMs will inevitably fail due to limited service life, network failure, or natural hazards. A data aggregation scheme must therefore be fault-tolerant to be applied in practice, meaning that it still works even if some SMs fail. It thus remains challenging to design a secure and fault-tolerant data aggregation scheme, and we aim to solve this challenge.

Contributions:
We propose the first flexible multi-subset aggregation scheme with both security and fault tolerance. The main contributions are summarized as follows: (1) We identify a security flaw in the FSDA scheme [19] that discloses personal data. Specifically, when an SM submits more than one Paillier ciphertext (for example, when $k > 15$ in the FSDA scheme), an attacker can use the Paillier ciphertexts themselves to recover the plaintext, that is, the user's electricity consumption. Both the theoretical proof and the experimental results show that our attack for recovering individual electricity consumption is feasible.
(2) A secure and flexible multi-subset aggregation scheme that improves on FSDA is proposed. Different hash functions are introduced for the different Paillier ciphertexts of a single SM to remove the correlation between them, so that even if the SM generates more than one Paillier ciphertext, the ciphertexts do not reveal the individual electricity consumption. Experiments show that the improved scheme enhances privacy while maintaining computational efficiency.
(3) The first flexible subset data aggregation scheme with fault tolerance is constructed. We combine the fault tolerance method proposed in [14] with the improved flexible multi-subset aggregation scheme to improve its applicability in real-world scenarios. Experimental results show that our scheme is computationally efficient for multi-subset aggregation with fault tolerance.
The rest of this paper is organized as follows. Section 2 discusses related works. Section 3 briefly reviews the FSDA scheme and introduces the extended Shamir's threshold secret-sharing scheme (tSSS). Section 4 shows the security flaws of FSDA, and then presents an attack method and its experimental verification. Section 5 describes the improved SFSDA in detail, followed by security analysis and performance evaluation in Sections 6 and 7, respectively. Finally, a conclusion is drawn in Section 8.

Related Works
Data aggregation in the smart grid not only protects data privacy, but also provides useful electricity consumption information for CS, which has become a research hotspot for privacy protection in smart grid in recent years. There are many techniques for achieving data aggregation to protect data privacy, such as homomorphic encryption [20,21] and differential privacy [22,23].
Homomorphic encryption allows data to be computed on the ciphertext, and the result of the additive operation on the ciphertext is equal to the result of adding plaintext and then encrypting it. Therefore, the aggregator can directly aggregate all the encrypted electricity consumption data submitted by SMs without knowing the raw electricity consumption data [24]. In recent years, Liu et al. [25], Ding et al. [26], Zhao et al. [27], Gope et al. [28], Li et al. [29], and Zhang et al. [30] have used homomorphic encryption for data aggregation to protect data privacy. In [29] and [30], aggregators use homomorphic encryption technology to achieve efficient privacy-preserving data aggregation, but the drawback of this scheme is that malicious aggregators have access to raw data. To prevent internal attacks, researchers typically combine homomorphic encryption technology with blind factors for data aggregation. Fan et al. [31] proposed the first data aggregation scheme for internal attackers by using homomorphic encryption and blind factor. However, the scheme only works in the case of short plaintext data, and Bao et al. [32] found that the scheme was not resistant to the public key replacement attack, because both public/private key pairs are generated by users themselves. In addition, differential privacy is often used for data aggregation. Liu et al. [33] protected the raw data by adding random noise, but the aggregation result obtained by this scheme is not accurate enough. Song et al. [34] propose a DMDA scheme. In their scheme, a set of known sum random numbers is used instead of random noise to obtain accurate aggregated data.
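To make the additive property concrete, the following minimal sketch implements textbook Paillier encryption with $g = N + 1$ and checks that multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The primes are toy values chosen only for illustration, and the function names are hypothetical; this is not the implementation used by any of the cited schemes.

```python
import secrets
from math import gcd

# Toy parameters (assumption): two Mersenne primes, far too small for real use.
p, q = 2**31 - 1, 2**61 - 1
N = p * q
N2 = N * N
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lambda = lcm(p-1, q-1)
mu = pow(lam, -1, N)                            # valid because g = N + 1

def encrypt(m):
    """Textbook Paillier: c = (1+N)^m * r^N mod N^2, using (1+N)^m = 1 + N*m."""
    r = secrets.randbelow(N - 2) + 2
    while gcd(r, N) != 1:
        r = secrets.randbelow(N - 2) + 2
    return (1 + N * m) % N2 * pow(r, N, N2) % N2

def decrypt(c):
    """m = L(c^lambda mod N^2) * mu mod N, with L(x) = (x - 1) / N."""
    return (pow(c, lam, N2) - 1) // N * mu % N

c_sum = encrypt(37) * encrypt(58) % N2   # an aggregator only multiplies ciphertexts
print(decrypt(c_sum))                    # sum of the two readings
```

The aggregator never sees 37 or 58; it only handles the two ciphertexts and their product.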
However, none of the above schemes is fault tolerant. To improve robustness and practicability, researchers have done in-depth research on fault-tolerant data aggregation. Xue et al. [15] proposed a fault-tolerant data aggregation scheme that does not rely on a trusted authority (TA). However, once a failed SM returns to normal, its original secret key, including its shares, must be changed via the secure channel, which leads to serious communication overhead and high delay. Lyu et al. [35] proposed a data aggregation scheme with fault tolerance that requires the support of TA and has low computational overhead. However, the fault tolerance of the scheme requires the protocol participants (FN, TA, and the cloud) to carry out an extra round of communication, resulting in a large communication cost. Saleem et al. [36] proposed the FESDA scheme with a non-interactive fault-tolerant method. In their scheme, CS prepares the equivalent ciphertexts of all SMs in advance. If an SM fails, CS can decrypt the aggregated data using the equivalent ciphertext of the failed SM. However, Wu et al. [14] found a security defect in the FESDA scheme and demonstrated that CS can abuse the equivalent ciphertexts of SMs to recover any SM's reading. To remedy this defect, they introduced an extended Shamir's threshold secret-sharing scheme (tSSS) and constructed FPDA, which is an improvement on the FESDA scheme, and demonstrated its computational efficiency.
In addition to fault tolerance, which improves the practicability of a scheme, multi-subset aggregation has been extensively studied in the past few years as a means to enhance the functionality of aggregation schemes. In 2015, Lu et al. [17] realized two-subset aggregation for the first time based on Paillier homomorphic encryption. In the same year, Erikin [37] realized multi-subset data aggregation using the Chinese remainder theorem and homomorphic encryption. To let users join and leave dynamically, Liu et al. [38] constructed the 3PDA scheme, in which users can dynamically join the virtual aggregation domain of data aggregation. The schemes proposed in [16] and [39] are TA-independent and let users join and exit dynamically without complicated initialization.
To obtain more detailed and targeted data with better statistical characteristics, Zhang et al. [19] recently proposed the first flexible subset aggregation scheme FSDA. In some situations, however, their scheme may reveal personal information.
The flexibility and fault tolerance described above are both important in multi-subset aggregation. To the best of our knowledge, no existing multi-subset aggregation scheme supports both fault tolerance and flexible subset data aggregation. Our goal is to construct a secure flexible subset data aggregation scheme with fault tolerance for the smart grid.

Preliminaries
Some preliminaries required for SFSDA scheme are introduced in this section including FSDA scheme and extended Shamir's tSSS.

Brief Review of FSDA Scheme
As shown in Fig. 1, the system model consists of four entity types: (1) TA: it is responsible for generating the system parameters and sending private keys to CS and SMs. TA remains offline once the system is up and running. (2) SMs: each SM collects the multidimensional data, performs cryptographic operations, and then sends a report to FN.
and $\Delta t$ is defined as the submission time interval.
FN calculates the size $l_j + d$ of each data slot, where $l_j = |\Delta R_j \cdot n|$ and $d = |n|$. FN then sends $(\Delta R_j, \Delta t, l_j + d)$ to each $SM_i$.

Individual Data Encryption
$SM_i$ constructs the data slots, where the size of slot $(j)$ is $l_j + d$ and each bit of slot data is set to 0 by default. Then, "1" and the data $D_i - R_{j-1}$ are filled into the first and second parts of slot $(j)$, respectively, where $D_i \in [R_{j-1}, R_j)$ is the reading of $SM_i$. The data structure of the plaintext $M_i$ used for encryption is shown in Fig. 3. Finally, $SM_i$ uses its private key $s_i$ to calculate

$$c_i = (1 + N \cdot M_i) \cdot H(T)^{N \cdot s_i} \bmod N^2, \quad (1)$$

where $H$ is the SHA-256 hash function.
Remark: A ciphertext $c_i$ is originally computed as $c_i = g^{M_i} \cdot H(T)^{N \cdot s_i} \bmod N^2$, and the authors of FSDA show that, because $g = N + 1$, the calculation of $c_i$ can be transformed into Eq. (1). This method reduces the computation time and has also been adopted in PPSO [40] and PPMA [18].
FN aggregates the received ciphertexts as

$$C = \prod_{i=1}^{n} c_i \bmod N^2. \quad (2)$$

Then, FN sends $C$ to CS.
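The transformation mentioned in the remark follows from the binomial expansion $(1 + N)^{M} \equiv 1 + N \cdot M \pmod{N^2}$, since every higher-order term contains $N^2$. The sketch below checks this numerically. The toy primes, the way the SHA-256 digest is reduced into $Z_{N^2}^*$, and the function names are assumptions made for illustration only.

```python
import hashlib
from math import gcd

# Toy parameters (assumption): real deployments use |p| = |q| = 512 bits.
p, q = 2**31 - 1, 2**61 - 1
N = p * q
N2 = N * N

def H(T):
    """SHA-256 of the time stamp, reduced mod N^2 and nudged to be coprime to N."""
    h = int.from_bytes(hashlib.sha256(T.encode()).digest(), "big") % N2
    while gcd(h, N) != 1:
        h += 1
    return h

def encrypt_textbook(M, s, T):
    # c = g^M * H(T)^(N*s) mod N^2 with g = N + 1: one extra modular exponentiation.
    return pow(1 + N, M, N2) * pow(H(T), N * s, N2) % N2

def encrypt_fast(M, s, T):
    # Replace g^M by 1 + N*M, i.e. one multiplication instead of an exponentiation.
    return (1 + N * M) % N2 * pow(H(T), N * s, N2) % N2

M, s, T = 123456, 987654321, "2024-01-01T00:15"
print(encrypt_textbook(M, s, T) == encrypt_fast(M, s, T))
```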

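CS calculates

$$M = \frac{\left(C \cdot H(T)^{N \cdot s_0} \bmod N^2\right) - 1}{N}. \quad (3)$$

CS then gets $M$, in which $D_{j1} = M_{j1}$ is the number of households in the $j$-th subset and $D_{j2} = M_{j2} + R_{j-1} \cdot M_{j1}$ is the corresponding total consumption data in the $j$-th subset.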
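The slot structure can be illustrated without any cryptography, because aggregation simply adds the packed plaintexts. The sketch below packs a reading into $k$ slots, sums the packed values of several meters, and splits the sum back into per-subset counts and totals. The bit layout (count in the high $d$ bits of a slot, offset in the low $l_j$ bits, slot 1 most significant) and the reading of $\Delta R_j$ as $R_j - R_{j-1}$ are assumptions, since Fig. 3 is not reproduced here; the function names are hypothetical.

```python
def slot_widths(bounds, n):
    """d = |n| bits for the counter, l_j = |(R_j - R_{j-1}) * n| bits for the offset sum."""
    d = n.bit_length()
    ls = [((hi - lo) * n).bit_length() for lo, hi in zip(bounds, bounds[1:])]
    return d, ls

def pack(D, bounds, n):
    """Build M_i: put (1, D - R_{j-1}) in the slot whose range contains D, zeros elsewhere."""
    d, ls = slot_widths(bounds, n)
    M = 0
    for j, l in enumerate(ls):
        lo, hi = bounds[j], bounds[j + 1]
        count, offset = (1, D - lo) if lo <= D < hi else (0, 0)
        M = (M << (d + l)) | (count << l) | offset
    return M

def unpack(M, bounds, n):
    """Split the aggregate into (D_j1, D_j2) = (M_j1, M_j2 + R_{j-1} * M_j1) per subset."""
    d, ls = slot_widths(bounds, n)
    out = []
    for j in reversed(range(len(ls))):
        l = ls[j]
        slot = M & ((1 << (d + l)) - 1)
        M >>= d + l
        m1, m2 = slot >> l, slot & ((1 << l) - 1)
        out.append((m1, m2 + bounds[j] * m1))
    return out[::-1]

bounds = [0, 20, 50, 100]            # k = 3 subsets: [0,20), [20,50), [50,100)
readings = [5, 12, 30, 99, 60]       # n = 5 meters
total = sum(pack(D, bounds, len(readings)) for D in readings)
print(unpack(total, bounds, len(readings)))   # (users, consumption) per subset
```

Because each counter has $d = |n|$ bits and each offset field has $|\Delta R_j \cdot n|$ bits, the per-slot sums of up to $n$ meters cannot overflow into a neighbouring slot.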

Subset Adjustment Phase
CS resets the submission setting when a subset or the time interval needs to be adjusted. First, CS redistributes $[0, R)$ into $k$ new continuous intervals, resets the submission setting $(\Delta R_j, \Delta t)$, and sends it to FN. Then, FN updates $(\Delta R_j, \Delta t, l_j + d)$ and sends it to $SM_i$. Finally, $SM_i$ reconstructs the data $M_i$.

Extended Shamir's tSSS
Shamir's $(k, n)$ tSSS shares a secret among $n$ participants such that any $k$ or more participants can reconstruct the secret, whereas fewer than $k$ participants cannot.
In this section, we first introduce the original Shamir's tSSS [41], and then the extended Shamir's $(k, n)$ tSSS used in our scheme.

Shamir's tSSS
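Step 1: Share distribution. Let $P$ be a large prime number. The party that shares a secret $S \in Z_P$ is called the dealer. First, the dealer randomly selects $k - 1$ elements $a_1, \ldots, a_{k-1} \in Z_P$ and sets $a_0 = S$. The dealer then builds the polynomial

$$f(x) = a_0 + a_1 x + \cdots + a_{k-1} x^{k-1} \bmod P.$$

Given that $x_i$ is the serial number of participant $i$, the dealer assigns the share $y_i = f(x_i)$ to that participant.
Step 2: Secret recovery. Substitute $k$ shares into the following equation:

$$S = \sum_{i=1}^{k} y_i \cdot L_i \bmod P,$$

where $L_i = \prod_{j=1, j \ne i}^{k} \frac{-x_j}{x_i - x_j} \bmod P$.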
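The two steps of Shamir's tSSS can be sketched as follows. This is a minimal illustration with a fixed Mersenne prime as $P$ and serial numbers $1, \ldots, n$; both choices and the function names are assumptions, not parameters taken from the scheme.

```python
import secrets

P = 2**127 - 1   # a Mersenne prime, used here as the field modulus (assumption)

def make_shares(S, k, n):
    """Step 1: pick f(x) = S + a_1 x + ... + a_{k-1} x^{k-1} mod P, share y_i = f(x_i)."""
    coeffs = [S] + [secrets.randbelow(P) for _ in range(k - 1)]
    return [(x, sum(a * pow(x, e, P) for e, a in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def recover(shares):
    """Step 2: S = sum_i y_i * L_i mod P with L_i = prod_{j != i} (-x_j) / (x_i - x_j)."""
    S = 0
    for xi, yi in shares:
        L = 1
        for xj, _ in shares:
            if xj != xi:
                L = L * (-xj % P) % P * pow((xi - xj) % P, -1, P) % P
        S = (S + yi * L) % P
    return S

shares = make_shares(123456789, k=3, n=5)
print(recover(shares[:3]), recover(shares[2:]))   # any 3 of the 5 shares suffice
```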

Extended Shamir's tSSS
To achieve fault tolerance for flexible subset data aggregation, we use the extended Shamir's $(k, n)$ tSSS, which works in three steps, as described below.
Step 1: Setup. Choose a large prime number $P$, so that the corresponding finite field is $GF(P)$. Let $N = p \cdot q$, where $p$ and $q$ are two large prime factors of $P - 1$. Let $g$ be an element of order $N$ in $GF(P)$.
Step 2: Share distribution. The dealer calculates the share $y_i$ of the secret $S$ under the modulus $N$ and assigns $y_i$ to the participant $\Upsilon_i$. Then, let the new secret that the dealer wants to share at time $T$ be $S'$, and let $h = H(T) \in Z$ be a time-dependent blind factor. The dealer computes $\Delta_T = S' - g^{h \cdot S} \bmod P$ and makes it public.
Step 3: Secret recovery. Our goal is to recover $S'$ at time $T$. First, the collector obtains the blind shares $Y_i = g^{y_i \cdot h} \bmod P$ of $k$ or more participants and, with the Lagrange coefficients $L_i$ computed modulo $N$, calculates

$$S'' = \prod_{i} Y_i^{L_i} = g^{h \cdot \sum_{i} y_i L_i} = g^{h \cdot (S + kN)} = g^{h \cdot S} \bmod P, \quad (6)$$

where $k \in Z$. Since $g^N \bmod P = 1$, the final equality of Eq. (6) holds. Then, the collector computes $S' = S'' + \Delta_T \bmod P$ to restore the secret $S'$ at time $T$.
It is worth noting that the extended $(k, n)$ tSSS is master-key secure and reusable. The collector cannot get any information about the initial secret $S$, because recovering $S$ is as hard as solving the discrete logarithm problem. Moreover, because the time $T$ is constantly updated, the blind share provided by each participant differs from one time interval to the next. Therefore, the extended $(k, n)$ tSSS is reusable.
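The three steps of the extended tSSS can be traced with small numbers. In the sketch below, $p = 11$, $q = 13$, and $\alpha = 6$ give $N = 143$ and the prime $P = \alpha N + 1 = 859$; these toy values, the choice of serial numbers $1, \ldots, n$, and all function names are assumptions for illustration. The collector only ever sees blinded shares $Y_i$ and the public $\Delta_T$, never the shares $y_i$ or the initial secret $S$.

```python
import hashlib
import secrets

p, q, alpha = 11, 13, 6      # toy values (assumption); n must stay below min(p, q)
N = p * q                    # 143
P = alpha * N + 1            # 859, a prime with N | P - 1

def find_g():
    """Return an element of order exactly N in GF(P)."""
    for x in range(2, P):
        g = pow(x, alpha, P)                       # g^N = x^(P-1) = 1 mod P
        if pow(g, p, P) != 1 and pow(g, q, P) != 1:
            return g

g = find_g()

def H(T):
    return int.from_bytes(hashlib.sha256(T.encode()).digest(), "big")

def deal(S, k, n):
    """Shares y_i of the initial secret S, computed modulo N."""
    coeffs = [S] + [secrets.randbelow(N) for _ in range(k - 1)]
    return {x: sum(a * x**e for e, a in enumerate(coeffs)) % N for x in range(1, n + 1)}

def publish_delta(S, S_new, T):
    """Delta_T = S' - g^(h*S) mod P, made public by the dealer."""
    return (S_new - pow(g, H(T) * S, P)) % P

def blind_share(y, T):
    """Y_i = g^(y_i * h) mod P, sent by a participant at time T."""
    return pow(g, y * H(T), P)

def recover(blinded, delta):
    """S' = prod_i Y_i^(L_i) + Delta_T mod P, with L_i computed modulo N."""
    acc = 1
    for xi, Y in blinded.items():
        L = 1
        for xj in blinded:
            if xj != xi:
                L = L * (-xj % N) * pow((xi - xj) % N, -1, N) % N
        acc = acc * pow(Y, L, P) % P
    return (acc + delta) % P

T = "2024-01-01T00:15"
shares = deal(S=42, k=3, n=5)
delta = publish_delta(42, 777, T)
blinded = {x: blind_share(shares[x], T) for x in (1, 3, 5)}
print(recover(blinded, delta))   # the new secret S' shared at time T
```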

Attack on FSDA Scheme in a Specific Scenario
In this section, the attack method in a specific scenario for FSDA is designed and verified by experiments.

Design of Attack Method
In the FSDA scheme, to realize flexible multi-subset aggregation, $SM_i$ constructs data $M_i$ according to the requirements of CS and then fills the electricity consumption data into one slot of $M_i$. If one $M_i$ cannot hold the number of slots required by CS, $SM_i$ constructs additional data to hold more slots; in this case, the constructed data is $M_i = (M_{i1}, M_{i2})$. Take the parameter values set by the authors of the FSDA scheme as an example, where $|p| = |q| = 512$, the maximum electricity consumption is limited to 100 Wh, and the total number of users does not exceed 5000. In this setting, a Paillier ciphertext can accommodate up to 15 subsets. If 16 or more subsets are needed, $SM_i$ constructs multiple pieces of data, for example $M_i = (M_{i1}, M_{i2})$, and fills the electricity consumption $D_i$ into one of them. Suppose $D_i$ is filled into $M_{i1}$; then $M_{i2} = 0$, and from Eq. (1) we obtain

$$c_i = (c_{i1}, c_{i2}) = \left((1 + N M_{i1}) \cdot H(T)^{N \cdot s_i} \bmod N^2,\; H(T)^{N \cdot s_i} \bmod N^2\right).$$

Both parts of the ciphertext $c_i$ contain the same factor $H(T)^{N \cdot s_i}$, and our attack exploits this correlation, which is the potential security flaw of the FSDA scheme.
In the particular case described above, suppose that the goal of FN is to obtain the individual electricity consumption $D_i^*$ of a user $u_i^*$ at time $T^*$. Our attack works in the following steps.
Step 1: The malicious FN receives the ciphertext $c_i^*$ submitted by $u_i^*$.
Theorem 1 shows that the multiplicative inverse of $c_{i2}^*$ always exists under the given parameters. Since $H(T^*)$ is less than $p$ and $q$, we can easily get $\gcd(H(T^*), p \cdot q) = 1$ and $\gcd(H(T^*)^{N \cdot s_i}, N^2) = 1$.
Therefore, we can conclude that $H(T^*)^{N \cdot s_i}$ contains at least one factor $p$, which contradicts the conclusion $\gcd(H(T^*)^{N \cdot s_i}, N^2) = 1$. Thus, $\gcd(c_{i2}^*, N^2) = 1$.
Step 4: FN gets $M_{i1}^*$ by $M_{i1}^* = (V - 1)/N$, thus obtaining the electricity consumption $D_i^*$ in $M_{i1}^*$. Our attack on FSDA is shown in Fig. 4.
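The attack can be reproduced in a few lines. The sketch below assumes that the intermediate steps consist of inverting $c_{i2}^*$ modulo $N^2$ and multiplying the inverse with $c_{i1}^*$ to obtain $V$, which is consistent with Theorem 1 and Step 4 but is an interpretation of the text above. The toy primes, the reduction of the SHA-256 digest, and the function names are likewise assumptions.

```python
import hashlib
from math import gcd

p, q = 2**31 - 1, 2**61 - 1      # toy primes (assumption)
N = p * q
N2 = N * N

def H(T):
    h = int.from_bytes(hashlib.sha256(T.encode()).digest(), "big") % N2
    while gcd(h, N) != 1:
        h += 1
    return h

def fsda_encrypt(M, s, T):
    """Eq. (1): c = (1 + N*M) * H(T)^(N*s) mod N^2."""
    return (1 + N * M) % N2 * pow(H(T), N * s, N2) % N2

def attack(c1, c2):
    """FN-side attack when c2 encrypts 0, i.e. c2 = H(T)^(N*s) mod N^2."""
    inv = pow(c2, -1, N2)        # exists because gcd(c2, N^2) = 1
    V = c1 * inv % N2            # V = 1 + N*M_i1, the blinding factor cancels
    return (V - 1) // N          # Step 4

s_i, T = 0x1234567, "2024-01-01T00:15"
M_i1 = 4242                                       # slot data carrying the reading
c_i = (fsda_encrypt(M_i1, s_i, T), fsda_encrypt(0, s_i, T))
print(attack(*c_i) == M_i1)                       # FN recovers M_i1 without any key
```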

Experimental Verification
We implemented the FSDA scheme in Java and attacked it. The experiment was performed on a computer with an Intel Core i7 CPU (2.8 GHz) and 16 GB of RAM running Windows 10. In our experiment, the number of users is set to 10000, and each SM encrypts its data with its own private key and transmits it to FN. After FN gets the $c_i$ of all users, it executes our proposed attack method. Experiments show that the electricity consumption of all users can be recovered correctly, and the average recovery time is 372 μs. Therefore, our proposed attack method is feasible and efficient.

System Model and Design Goal
The system model we consider is the same as FSDA scheme, as described in Section 3.1, which consists primarily of four entity types: Offline TA, CS, FNs, and SMs.
In the attacker model, FNs and CS are honest-but-curious. These entities will faithfully implement the protocol, but they are curious about the private information contained in the reports. Moreover, SMs are honest and tamper-resistant. However, there exists an external adversary A that may compromise the SMs, FN, and CS. The adversary goals may include knowing the aggregated and individual SM readings. There are also some considerations for adversary A: Offline TA is thought to be invulnerable and a collusion attack can be initiated by at most any (n − 1) SMs.
Our design goal of the improved scheme is mainly to provide privacy preservation, practicability, and the utility of data in smart grid. Specifically, it includes the following three aspects.
(1) Privacy: All electricity consumption information in smart grid should be protected. The adversary A cannot access the user's data, even if the attacker intercepts the communication data transmitted on an insecure channel. To protect the consumer's personal electricity consumption data, no entity other than the consumer himself can obtain the consumer's electricity consumption data. To protect aggregated data results, no entity other than CS can obtain aggregated data results.
(2) Flexibility: Our scheme provides the same flexible subset adjustment as the FSDA scheme. Compared with traditional multi-subset aggregation schemes, our scheme can provide more targeted aggregated data for CS.
(3) Fault tolerance: Compared with FSDA scheme, our scheme can decrypt the data correctly when a user fails to submit the data. In practical applications, the user's electric meter will often malfunction, and fault tolerance is the key to the application of the aggregation scheme in real life.

Overview
As mentioned above, the FSDA scheme risks disclosing personal data. Therefore, we add an extra secure hash function to the encryption phase of the FSDA scheme, and introduce an extended Shamir's $(k, n)$ tSSS to make the scheme fault-tolerant.
(1) Security enhancement. In the individual data encryption phase, SFSDA differs from the original scheme in that, if the ciphertext $c_i$ consists of more than one Paillier ciphertext (i.e., $c_i = (c_{i1}, c_{i2})$), $c_{i1}$ and $c_{i2}$ use two different hash functions, $H_1$ and $H_2$, respectively. Therefore, in SFSDA, $c_{i1}$ and $c_{i2}$ no longer contain the same factor $H(T)^{N \cdot s_i}$ as they do in the FSDA scheme (see Section 4.1). This improvement effectively resists the proposed attack and prevents personal data from leaking.
(2) Fault tolerance. When some faulty SMs cannot submit their ciphertexts, the private keys involved in Eq. (3) of the FSDA decryption process no longer sum to 0 modulo $\lambda$; hence, CS cannot obtain the aggregated data by decryption. To support correct decryption, an extended Shamir's $(k, n)$ tSSS is used to reconstruct the equivalent ciphertext $H(T)^{N \cdot s_i}$ of a malfunctioning $SM_i$.

Detailed SFSDA Scheme
Without loss of generality, we describe SFSDA scheme in detail, taking the case where SM i needs to construct two pieces of data (i.e., M i = (M i1 , M i2 )) to store subsets as an example.

Initialization
In the initialization phase, TA is mainly responsible for generating system parameters, CS transfers data-related settings to FN, and FN assigns settings to SMs. It is divided into the following three steps.
Step 1: Given the system parameter $\kappa$, set $N = p \cdot q$ and the generator $g = 1 + N$, where $p$ and $q$ are two large prime numbers satisfying $|p| = |q| = \kappa$. Let $P = \alpha N + 1$ be a large prime number, where $\alpha$ is a small integer. Define $H_1$ and $H_2$ as SHA-256 hash functions. TA publishes the public key $K_{pub} = (H_1, H_2, N, g, \alpha)$ and holds the private key $K_{pri} = (\lambda, \mu)$.
Step 2: Given the upper limit $R$ of electricity consumption, CS divides $[0, R)$ into $k$ continuous intervals.
Step 3: FN calculates the size $l_j + d$ of slot $(j)$, where $l_j = |\Delta R_j \cdot n|$ and $d = |n|$. FN sends $(\Delta R_j, l_j + d)$ to each $SM_i$.

Entity Registration
The registration of SMs and CS is completed by TA, which includes the following three steps.
Step 1: TA randomly selects $(n + 1)$ numbers $s_0, s_1, \ldots, s_n \in Z_N^*$ satisfying $\sum_{i=0}^{n} s_i = 0 \bmod \lambda$, and sends $s_0$ to CS and $s_i$ ($i = 1, 2, \ldots, n$) to $SM_i$ as their private keys through the secure channel.
Step 2: For each user $u_i$, TA randomly selects $n$ users (defined as $U_i$) in the residential area. By applying the extended Shamir's $(k, n)$ tSSS with modulus $N$, TA calculates the share $\{s_i\}_j$ ($u_j \in U_i$, $1 \le j \le n$) of $s_i$, then sends all shares $\{s_*\}_j$ ($u_j \in U$, where $*$ is a wildcard) to the corresponding user $u_j$ and sends all $\langle u_i, U_i \rangle$ to FN.
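The role of the zero-sum condition in Step 1 can be checked numerically: when the keys sum to a multiple of $\lambda$, the product of all blinding factors $H(T)^{N \cdot s_i}$ is 1 modulo $N^2$, so the factors cancel during decryption. The sketch below is illustrative only. It uses toy primes, and it derives $s_0$ from the other keys, which is one simple way to satisfy the condition and an assumption about how TA might do it.

```python
import hashlib
import secrets
from math import gcd

p, q = 2**31 - 1, 2**61 - 1                       # toy primes (assumption)
N = p * q
N2 = N * N
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)      # lambda = lcm(p-1, q-1), known to TA only

def H(T):
    h = int.from_bytes(hashlib.sha256(T.encode()).digest(), "big") % N2
    while gcd(h, N) != 1:
        h += 1
    return h

def gen_keys(n):
    """Return [s_0, s_1, ..., s_n] with s_0 + s_1 + ... + s_n = 0 mod lambda."""
    sm_keys = [secrets.randbelow(N - 1) + 1 for _ in range(n)]
    s0 = (-sum(sm_keys)) % lam                    # CS key balances the SM keys
    return [s0] + sm_keys

def blinding_product(keys, T):
    """Product of H(T)^(N*s) over the given keys, modulo N^2."""
    prod = 1
    for s in keys:
        prod = prod * pow(H(T), N * s, N2) % N2
    return prod

keys = gen_keys(5)
print(blinding_product(keys, "2024-01-01T00:15"))   # all keys present: factors cancel
```

If one key is left out of the product, the result is in general no longer 1, which is why a missing report blocks decryption unless an equivalent ciphertext is supplied.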

Individual Data Encryption
SM i needs to construct data M i = (M i1 , M i2 ) at each time T, then fill the electricity data into the corresponding data slot, and finally encrypt the data and transmit it to FN.
Step 1: $SM_i$ constructs the data slots, where the size of slot $(j)$ is $l_j + d$ and each bit of slot data is set to 0 by default.
Step 2: $SM_i$ takes a reading $D_i$ for each time interval $\Delta t$. If $D_i \in [R_{j-1}, R_j)$, "1" and the data $D_i - R_{j-1}$ are filled into the first and second parts of slot $(j)$, respectively.
Step 3: $SM_i$ encrypts the data $M_i$ with its private key $s_i$:

$$c_{i1} = (1 + N \cdot M_{i1}) \cdot H_1(T)^{N \cdot s_i} \bmod N^2, \qquad c_{i2} = (1 + N \cdot M_{i2}) \cdot H_2(T)^{N \cdot s_i} \bmod N^2,$$

and passes $c_i = (c_{i1}, c_{i2})$ to FN.

Data Aggregation
FN checks all received $c_i$ and records the set $\hat{U}$ of users whose reports are missing. If $\hat{U}$ is not empty, then for each $u_i \in \hat{U}$, FN looks up $\langle u_i, U_i \rangle$ and sends a request for the share to each $u_j \in U_i$ ($1 \le j \le n$). After receiving the request, $u_j$ acknowledges the malfunction of $SM_i$ and returns $\tilde{c}_{i,j} = H_1(T)^{\{s_i\}_j}$. FN receives $k$ shares from the users $u_j$ (denoted as $\ddot{U}_i$), and it then calculates $L_j = \prod_{u_w \in \ddot{U}_i, u_w \ne u_j} \frac{-u_w}{u_j - u_w} \bmod N$ and $\alpha \cdot L_j \bmod P$. From these values, FN obtains the equivalent ciphertext of $u_i$. Finally, FN aggregates all equivalent ciphertexts and received ciphertexts into $C$ and passes $C$ and $\hat{U}$ to CS.

Decryption
After obtaining the aggregated data, CS first decrypts the data, and then splits the decrypted data to obtain the aggregation results of each subset. It includes the following two steps.
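Step 1: CS performs the following calculation with its private key $s_0$ on each component $C_b$ ($b = 1, 2$) of the aggregated ciphertext $C = (C_1, C_2)$:

$$V_b = C_b \cdot H_b(T)^{N \cdot s_0} \bmod N^2.$$

CS obtains the corresponding part of $M$ by calculating $(V_b - 1)/N$.
Step 2: CS splits $M$ into $M_1 || M_2 || \ldots || M_k$. Each $M_j$ ($1 \le j \le k$) can be split into $M_{j1} || M_{j2}$. Finally, CS gets the number of users $D_{j1} = M_{j1}$ and the corresponding total consumption $D_{j2} = M_{j2} + R_{j-1} \cdot M_{j1}$ of the $j$-th subset.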
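The normal (failure-free) flow from encryption to decryption can be sketched end to end for the two-ciphertext case. In this illustration, $H_1$ and $H_2$ are modelled as SHA-256 with different prefixes, which is an assumption about how two distinct hash functions could be instantiated; the toy primes, key generation shortcut, and function names are also assumptions. Plaintexts are plain integers here rather than packed slot data.

```python
import hashlib
import secrets
from math import gcd

p, q = 2**31 - 1, 2**61 - 1                       # toy primes (assumption)
N = p * q
N2 = N * N
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)

def H(b, T):
    """H_b(T): SHA-256 with a per-ciphertext prefix (assumed instantiation of H_1, H_2)."""
    data = b"H%d|" % b + T.encode()
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % N2
    while gcd(h, N) != 1:
        h += 1
    return h

def encrypt(M1, M2, s, T):
    """SM side: c_ib = (1 + N*M_ib) * H_b(T)^(N*s_i) mod N^2 for b = 1, 2."""
    return tuple((1 + N * M) % N2 * pow(H(b, T), N * s, N2) % N2
                 for b, M in ((1, M1), (2, M2)))

def aggregate(cts):
    """FN side: multiply the ciphertexts component-wise."""
    C1 = C2 = 1
    for c1, c2 in cts:
        C1, C2 = C1 * c1 % N2, C2 * c2 % N2
    return C1, C2

def decrypt(C, s0, T):
    """CS side: V_b = C_b * H_b(T)^(N*s_0) mod N^2, then (V_b - 1) / N."""
    return tuple((Cb * pow(H(b, T), N * s0, N2) % N2 - 1) // N
                 for b, Cb in ((1, C[0]), (2, C[1])))

T = "2024-01-01T00:15"
plaintexts = [(5, 0), (0, 7), (11, 0)]            # (M_i1, M_i2) of three SMs
sm_keys = [secrets.randbelow(N - 1) + 1 for _ in plaintexts]
s0 = (-sum(sm_keys)) % lam                        # keys sum to 0 mod lambda
C = aggregate(encrypt(M1, M2, s, T) for (M1, M2), s in zip(plaintexts, sm_keys))
print(decrypt(C, s0, T))                          # component-wise sums of the plaintexts
```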

Subset Adjustment Phase
When CS wants to adjust the subset according to the specific situation, it only needs to reset the related settings of the subset.
Step 2: FN recalculates the slot size $l_j + d$. FN then sends $(\Delta R_j, l_j + d)$ to each $SM_i$.
Step 3: $SM_i$ constructs new data $M_i$ containing $k$ slots, where the size of slot $(j)$ is $l_j + d$ ($1 \le j \le k$).

Correctness
The correctness evaluation of SFSDA mainly includes the reconstruction of equivalent ciphertext and the decryption of aggregated ciphertext.

Reconstruction of Equivalent Ciphertext
After FN receives $k$ shares from the users $u_j$ (denoted as $\ddot{U}_i$), it reconstructs the equivalent ciphertext using the extended Shamir's $(k, n)$ tSSS according to Eq. (6), where $k \in Z$.

Decryption of Aggregated Ciphertext
The last equation holds since $\tilde{U} \cup \hat{U} = U$, and $M$ is obtained.

Security Analysis
The security of the scheme is mainly analyzed through the following scenarios.
Scenario 1: The scheme described in Section 5 is semantically secure.
Proof: Both the individual ciphertext submitted by an SM and the aggregated ciphertext computed by FN are valid ciphertexts of the Paillier cryptosystem [42]. Because the security of this public-key cryptosystem is based on the hardness of the composite residuosity class problem over $Z_{N^2}^*$, it has been proven to be semantically secure. Therefore, each plaintext is protected by its corresponding ciphertext.

Scenario 2:
Even if a malicious CS colludes with SMs and FN, it is computationally infeasible for it to obtain private data.
Proof: CS can decrypt aggregated data with its private key $s_0$ because $s_0 + \sum_{i=1}^{n} s_i = 0 \bmod \lambda$ and $H(T)^{N \cdot (s_0 + \sum_{i=1}^{n} s_i)} = 1 \bmod N^2$, but $s_0$ cannot be used to decrypt any individual data $M_i$. If CS wants to obtain $M_i$, the first method is to decrypt a ciphertext with $\lambda$ or $s_i$; the second method is to obtain $H_1(T)^{N \cdot s_i} \bmod N^2$ and $H_2(T)^{N \cdot s_i} \bmod N^2$, calculate their multiplicative inverses, and then use Steps 3 and 4 of the designed attack to obtain the plaintext.
For the first method, on the one hand, $\lambda$ is kept by TA, and no other entity can get it. On the other hand, even if CS colludes with $n - 1$ users $u_i$ ($i = 1, 2, \ldots, n - 1$), it is not computationally feasible to recover $s_n$ from the keys $s_1, s_2, \ldots, s_{n-1}$ obtained from these users and its own $s_0$.
For the second method, the extended Shamir's $(k, n)$ tSSS can be used to recover $H_1(T)^{N \cdot s_i} \bmod N^2$ and $H_2(T)^{N \cdot s_i} \bmod N^2$. If CS wants to reconstruct them, it needs to collude with at least $k$ out of $n$ users. However, as described in the system model, SMs are honest, so this method is not feasible.
Scenario 3: Malicious FN cannot obtain individual data and aggregate data results.
Proof: FN receives all ciphertexts submitted by the users $u_i$ and performs the additively homomorphic operation. If FN wants to obtain individual data or the aggregated data after the homomorphic operation, there are two methods: one is to obtain $\lambda$ or $s_i$, and the other is to obtain $H_1(T)^{N \cdot s_i} \bmod N^2$ and $H_2(T)^{N \cdot s_i} \bmod N^2$. Since FN has neither $\lambda$ nor the private keys $s_i$ of the users and CS, it cannot get the plaintext corresponding to a ciphertext. In the worst case, as with CS, even if FN colludes with $n - 1$ SMs and CS, it cannot recover $s_n$ from $s_i$ ($i = 0, 1, \ldots, n - 1$). In addition, FN can use secret reconstruction to obtain the equivalent ciphertexts $H_1(T)^{N \cdot s_i} \bmod N^2$ and $H_2(T)^{N \cdot s_i} \bmod N^2$. However, to reconstruct equivalent ciphertexts, at least $k$ users must honestly confirm the malfunction of $u_i$. Therefore, since the SMs are honest, FN cannot reconstruct the equivalent ciphertext of a normally operating user.
In the case of user failure, the user $u_i$ cannot submit encrypted data normally, and FN performs the secret reconstruction. After at least $k$ users honestly confirm the failure of $u_i$ and submit shares, FN can successfully reconstruct the equivalent ciphertext. Since the extended Shamir's $(k, n)$ tSSS is master-key secure, FN cannot get any information about the user's private key. In addition, the equivalent ciphertext reconstructed by FN is not reusable, i.e., $H_1(T)^{N \cdot s_i} \bmod N^2$ and $H_2(T)^{N \cdot s_i} \bmod N^2$ reconstructed at the moment $T$ when user $u_i$ fails cannot be used to forge $H_1(T')^{N \cdot s_i} \bmod N^2$ and $H_2(T')^{N \cdot s_i} \bmod N^2$ at a normal moment $T'$. This prevents FN from reusing the equivalent ciphertext to recover the electricity consumption information in a ciphertext submitted by the user.

Scenario 4:
Our scheme can effectively resist the attacks proposed in this paper.
Proof: Unlike the FSDA scheme, our scheme uses different hash functions $H_1$ and $H_2$ in the two Paillier ciphertexts submitted by a user. Therefore, it is not feasible for an attacker to obtain $(1 + N M_i) \bmod N^2$ by calculating the multiplicative inverse of $H_1(T)^{N \cdot s_i} \bmod N^2$ (or $H_2(T)^{N \cdot s_i} \bmod N^2$) and multiplying it with $(1 + N M_i) \cdot H_2(T)^{N \cdot s_i} \bmod N^2$ (or $(1 + N M_i) \cdot H_1(T)^{N \cdot s_i} \bmod N^2$).
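The effect of using two hash functions can be observed by running the same unblinding computation against both constructions. In the sketch below, the hash functions are modelled as SHA-256 with different prefixes, which is an assumed instantiation; the toy primes and function names are also assumptions. With a shared hash the blinding factors cancel, while with distinct hashes a residual factor $(H_1(T)/H_2(T))^{N \cdot s_i}$ remains and the computation no longer returns the plaintext.

```python
import hashlib
from math import gcd

p, q = 2**31 - 1, 2**61 - 1      # toy primes (assumption)
N = p * q
N2 = N * N

def H(tag, T):
    """Hash into Z*_{N^2}; different tags model different hash functions."""
    h = int.from_bytes(hashlib.sha256(tag + T.encode()).digest(), "big") % N2
    while gcd(h, N) != 1:
        h += 1
    return h

def blind(M, s, T, tag):
    return (1 + N * M) % N2 * pow(H(tag, T), N * s, N2) % N2

def attack(c1, c2):
    """Multiply c1 by the inverse of c2 and read off (V - 1) / N."""
    V = c1 * pow(c2, -1, N2) % N2
    return (V - 1) // N

s_i, T, M_i1 = 0x1234567, "2024-01-01T00:15", 4242

# FSDA: both ciphertexts use the same hash, so the blinding factor cancels.
fsda = (blind(M_i1, s_i, T, b"H|"), blind(0, s_i, T, b"H|"))
# SFSDA: c_i1 uses H_1 and c_i2 uses H_2, so the factors differ.
sfsda = (blind(M_i1, s_i, T, b"H1|"), blind(0, s_i, T, b"H2|"))

print(attack(*fsda) == M_i1, attack(*sfsda) == M_i1)
```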

Performance Analysis
In this section, we evaluate the performance of the proposed SFSDA scheme and compare it with FSDA [19], MSDA [39] and FPDA [14] in terms of computation overhead, communication overhead and storage overhead. It is worth noting that fault-tolerant data aggregation will incur additional overhead due to the addition of fault-tolerant functionality in our scheme. Therefore, the performance evaluation of the scheme will evaluate normal data aggregation and fault-tolerant data aggregation respectively.
Two evaluation environments are used. We evaluate the computation costs of data aggregation and decryption on the first platform, a computer with an Intel Core i7 CPU (2.8 GHz) and 16 GB of memory running Ubuntu 18.04. The computation costs of encryption by the SMs are evaluated on the second platform, a Raspberry Pi (RasPi). The RasPi 3 Model B runs the Ubuntu MATE 21.04 operating system and has a 1.2-GHz quad-core 64-bit ARM Cortex-A53 CPU and 1 GB of memory. In our implementation, the basic security parameters are the same as those described for FSDA, where the sizes of $p$ and $q$ are set to 512 bits. To facilitate the comparison with the original scheme, the maximum electricity consumption and the number of users in our scheme are also consistent with the FSDA scheme. The maximum electricity consumption is set to 100 Wh and the total number of users is less than 5000. We choose a random integer from 0 to 100 as each SM's reading. Thus, a Paillier ciphertext can contain at most 15 subsets.

Computation Overhead
To facilitate the evaluation, we define the following notation. An exponentiation operation is denoted as $O_e$, a point multiplication operation as $O_m$, a hash operation as $O_H$, and a point addition operation as $O_a$.
For normal data aggregation, our scheme and the FSDA scheme work in the same way, so their computation overhead is the same. In SFSDA and FSDA, when $SM_i$ produces its report, it requires $\lceil k/15 \rceil$ $O_e$ and $\lceil k/15 \rceil$ $O_m$ to generate the ciphertext. In MSDA, $SM_i$ requires $\lceil k/10 \rceil$ $O_e$ and $\lceil k/10 \rceil$ $O_m$ to generate the ciphertext, and also requires one $O_H$ and two $O_a$ to update its secret key. In FPDA, $SM_i$ requires $k$ $O_e$ and $k$ $O_m$. Fig. 6a shows that when multiple subsets are transmitted, the computation overhead of our scheme matches that of FSDA and is lower than that of the other schemes. Fig. 6c shows that the cost of CS in our scheme remains the lowest among the four schemes, and that the CS costs of all schemes fluctuate only slightly.
For fault-tolerant data aggregation, our scheme incurs extra computation overhead, as shown in Fig. 7. Since CS is not involved in fault-tolerant data aggregation, Fig. 7 only gives the total time for all SMs to generate shares under different failure probabilities and thresholds, and the time for FN to perform reconstruction and aggregation. Figs. 7a and 7b show that, under the same threshold, the total time for SMs to generate shares and the time for FN to perform reconstruction and aggregation increase approximately linearly with the failure probability. They also show that the extra computation overhead incurred by SMs and FN increases significantly as the threshold increases.
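The per-report SM operation counts for normal aggregation given in this subsection can be tabulated for any number of subsets k. The following is a minimal sketch, assuming the per-ciphertext capacities of 15 subsets (SFSDA, FSDA) and 10 subsets (MSDA) imply a ceiling on the number of ciphertexts; the function name `sm_operation_counts` is illustrative, and a count of 0 means only that the text lists no such operation for that scheme.

```python
from math import ceil


def sm_operation_counts(k: int) -> dict:
    """Per-report SM operation counts for k subsets, as listed in the text."""
    packed_15 = ceil(k / 15)  # ciphertexts needed when 15 subsets fit in one
    packed_10 = ceil(k / 10)  # ciphertexts needed when 10 subsets fit in one
    return {
        "SFSDA": {"O_e": packed_15, "O_m": packed_15, "O_H": 0, "O_a": 0},
        "FSDA": {"O_e": packed_15, "O_m": packed_15, "O_H": 0, "O_a": 0},
        # MSDA additionally updates its secret key: one hash, two point additions.
        "MSDA": {"O_e": packed_10, "O_m": packed_10, "O_H": 1, "O_a": 2},
        "FPDA": {"O_e": k, "O_m": k, "O_H": 0, "O_a": 0},
    }


for scheme, counts in sm_operation_counts(15).items():
    print(scheme, counts)
```

For k = 15, SFSDA and FSDA need a single exponentiation and multiplication per report, MSDA needs two of each plus its key update, and FPDA needs fifteen of each, which matches the ordering reported for Fig. 6a.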

Communication Overhead
In our scheme, the failure probability is denoted by β, and the maximum length of a user's sequence number by l. Since p and q are 512 bits each, N is 1024 bits and $N^2$ is 2048 bits. We also assume that the share reported by an SM to FN is about 1024 bits under modulus P = αN + 1, where α is a small integer. Instantiating our SFSDA scheme with β = 5%, l = 20, n = 500 users, a (13, 20) threshold, and k = 15 subsets gives the detailed communication overheads shown in Table 1. During normal data aggregation, each SM transmits a ciphertext of size $\lceil k/15 \rceil |Z^*_{N^2}|$ to FN in our scheme, which is also $\lceil k/15 \rceil |Z^*_{N^2}|$ in FSDA, $\lceil k/10 \rceil |Z^*_{N^2}|$ in MSDA, and $k |Z^*_{N^2}|$ in FPDA. In addition, in MSDA each SM sends an additional $|Z^*_p|$ to every other SM to update its secret key. Therefore, the overheads of all SMs in SFSDA, FSDA, MSDA, and FPDA are 1000, 1000, 127000, and 15000 kb, respectively. Since the aggregated ciphertext has the same length as the ciphertext uploaded by an SM, the communication cost between FN and CS equals the per-SM cost between SM and FN, which is 2, 2, 4, and 30 kb in SFSDA, FSDA, MSDA, and FPDA, respectively. When a user fails to submit data, in our scheme FN sends a sequence number of size l to the SMs as a reconstruction request, resulting in a total additional communication overhead of $l \cdot (\beta \cdot n) \cdot n \approx 10$ kb from FN to SMs (here $\beta \cdot n = 25$ is the number of failed users among the 500, and the final factor is the 20 of the threshold). Each SM returns a share of size $|Z^*_p|$ (about 1024 bits, as assumed above) to FN, resulting in a total communication overhead of $|Z^*_p| \cdot (\beta \cdot n) \cdot n \approx 500$ kb from SMs to FN. FN sends CS the sequence numbers of the failed users, each of size l, resulting in a total communication overhead of $l \cdot (\beta \cdot n) \approx 0.5$ kb from FN to CS. In FPDA, the extra communication overhead is k times that of our scheme. As can be seen from Table 1, for normal data aggregation, the communication overhead of our scheme is consistent with that of FSDA and lower than that of the other two schemes.
For fault-tolerant data aggregation, our scheme can achieve a low communication overhead.
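The figures quoted in this subsection can be rechecked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions, not the authors' evaluation code: it takes 1 kb = 1024 bits (which reproduces the rounded values), reads the last factor in the fault-tolerance formulas as the 20 of the (13, 20) threshold, and models only ciphertext traffic, so the pairwise key-update traffic of MSDA is deliberately left out. All constant and function names are hypothetical.

```python
from math import ceil

KB = 1024            # assumption: 1 kb = 1024 bits
N2_BITS = 2048       # |Z*_{N^2}| for 512-bit p and q
SHARE_BITS = 1024    # share size under modulus P = alpha*N + 1
L_BITS = 20          # sequence number length l
USERS = 500          # number of SMs
SUBSETS = 15         # number of subsets k
FAILED = USERS * 5 // 100  # beta = 5% of the users fail
HOLDERS = 20         # assumption: the 20 of the (13, 20) threshold

# Number of Paillier ciphertexts each SM uploads per report.
CIPHERTEXTS_PER_SM = {
    "SFSDA": ceil(SUBSETS / 15),
    "FSDA": ceil(SUBSETS / 15),
    "MSDA": ceil(SUBSETS / 10),
    "FPDA": SUBSETS,
}


def per_sm_kb(scheme: str) -> float:
    """Ciphertext size one SM sends to FN (also the FN-to-CS size)."""
    return CIPHERTEXTS_PER_SM[scheme] * N2_BITS / KB


def all_sms_kb(scheme: str) -> float:
    """Ciphertext traffic of all SMs; excludes MSDA key-update messages."""
    return USERS * per_sm_kb(scheme)


# Extra traffic of SFSDA when FAILED users do not report.
request_kb = L_BITS * FAILED * HOLDERS / KB    # FN -> SMs
share_kb = SHARE_BITS * FAILED * HOLDERS / KB  # SMs -> FN
notify_kb = L_BITS * FAILED / KB               # FN -> CS

for name in CIPHERTEXTS_PER_SM:
    print(name, per_sm_kb(name), all_sms_kb(name))
print(round(request_kb, 2), share_kb, round(notify_kb, 2))
```

Under these assumptions the per-SM sizes come out as 2, 2, 4, and 30 kb and the fault-tolerance extras as roughly 10, 500, and 0.5 kb, in line with the values stated above. The MSDA total differs from the quoted 127000 kb because its key-update messages are not modelled here.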

Storage Overhead
In the SFSDA and FPDA schemes, to achieve fault tolerance, SM and FN need to store the original share $\{s_i\}$ in addition to their private keys, and FN needs to store an index of size $l \cdot n \cdot n$ for finding the private key and the $\Delta_{i,T}$ sent by TA in advance. Suppose each SM submits its electricity consumption every 15 min, so that it reports at 96 different times a day. If $\Delta_{i,T}$ is updated daily, then its size is $|\Delta_{i,T}| = 96 \cdot n \cdot |Z^*_{N^2}|$. Table 2 shows the storage overhead for each entity. For normal data aggregation, the storage overhead of our scheme is consistent with that of FSDA and FPDA, and the storage overhead of SM is less than that of MSDA. For fault-tolerant data aggregation, SM only stores a small amount of extra data, and most of the storage overhead for handling user failures is borne by FN.

Conclusion
In this article, we identified a potential security flaw in the FSDA scheme that discloses personal privacy. Once the ciphertext sent by SM to FN contains two or more Paillier ciphertexts, the ciphertext itself reveals the encrypted personal electricity consumption, resulting in personal privacy infringement. In addition, the FSDA scheme lacks fault tolerance: once an SM cannot submit data, CS cannot obtain the correct decryption result. To solve these security issues and provide fault tolerance, we proposed a secure and flexible subset data aggregation scheme with fault tolerance (SFSDA). On the one hand, our scheme allows flexible subset aggregation, where the subsets can be adjusted to the needs of CS without compromising user privacy. On the other hand, when the encrypted data of a faulty user cannot be obtained, CS can still obtain the correct aggregation result because the equivalent ciphertext is reconstructed in our scheme. The experimental results show that our scheme is efficient, flexible, and practical.
For future work, we note that existing data aggregation schemes for smart grids only perform sum operations on the data and ignore other important operations, such as linear operations. Therefore, maintaining linear homomorphism in data aggregation for smart grids is an important direction for our future research.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.