Mitigating consumer privacy breach in smart grid using obfuscation-based generative adversarial network

Smart meters allow real-time monitoring and collection of power consumption data of a consumer's premise. With the worldwide integration of smart meters, there has been a substantial rise in concerns regarding threats to consumer privacy. The exposed fine-grained power consumption data results in behaviour leakage by revealing the end-user's home appliance usage information. Previously, researchers have proposed approaches that alter data using perturbation or aggregation, or that hide identifiers using anonymization. Unfortunately, these techniques suffer from various limitations. In this paper, we propose a privacy-preserving architecture for fine-grained power data in a smart grid. The proposed architecture uses a generative adversarial network (GAN) and an obfuscator to generate a synthetic time series. The architecture replaces the existing appliance signatures with those of appliances that are not active during that period while ensuring a minimum energy difference between the ground truth and the synthetic time series. We use a real-world dataset containing power consumption readings for our experiments and use non-intrusive load monitoring (NILM) algorithms to show that our approach is more effective in preserving the privacy level of a consumer's power consumption data.


Introduction
Over the past decade, advances in the industrial and social sectors have drastically increased the demand for energy. For example, the U.S. Energy Information Administration (EIA) projects a nearly 50% increase in world energy usage between 2018 and 2050, led by growth in Asia. Energy consumption in the buildings sector, which includes residential and commercial infrastructure, is estimated to increase by 65% between 2018 and 2050, i.e., from 91 quadrillion to 139 quadrillion British thermal units (BTU) [1]. Global CO2 emissions are projected to more than double by 2050, while the global investment in electrical grid infrastructure is estimated to be around $6 trillion by 2030 [2]. Meeting this ever-increasing demand for energy, together with the need for efficient use of energy resources, reduced carbon emissions, and the integration of multiple renewable energy sources, requires a new electrical grid that incorporates digital and computing technologies to automate and manage the energy supply needs of the 21st century.
A smart grid is an electrical network that integrates information and communication technologies for efficient distribution and consumption of energy resources. The integration of various communication and data processing capabilities transforms the traditional electrical grid into a revolutionized power system, enabling information flow between different entities such as metering, substations, distribution, transmission, and generation [3]. With this increased availability of communication and computing resources, the smart grid offers benefits and potential unknown to the traditional electrical network. For instance, a smart grid, with its broad range of grid-side and consumer-side applications, enables monitoring of energy consumption data, demand response, dynamic pricing, and the exchange of different information messages via its smart infrastructure. It also enables collecting and processing various types of energy-related data through entities such as grid sensors, wide-area monitoring, and distribution energy management systems [4].
One important asset of the smart grid is the Advanced Metering Infrastructure (AMI). The AMI is made up of a set of smart meters, communication modules, a local area network (LAN), a data concentrator (DC), a wide area network (WAN), and the software and hardware of the central system [5]. The AMI allows two-way communication between the consumer's smart meter and the energy supplier for measuring periodic or on-demand fine-grained energy consumption data. Used as feedback, this fine-grained energy consumption data helps reduce cost and reduce consumption by up to 20% through efficient energy management. The European Parliament and the Council of the European Union have taken one such initiative under EU Directive 2006/32/EC to provide energy consumers with accurate measurement and the actual "time of use" of their energy consumption [6].
While such detailed energy consumption feedback benefits the involved stakeholders economically and ecologically, smart meters allow a massive flow of energy information between consumers and suppliers, posing a potential threat to consumers' privacy. Combined with algorithms such as NILM [7], this sensitive energy information can help third parties deduce a consumer's daily routine, appliance usage, working hours, meal hours, the occupants present on the premises, any medical equipment in use, and even living habits such as the time when the TV is watched.
In 2009, the Federal Bureau of Investigation's Cyber Intelligence investigated a widespread incident of power theft related to smart meters. It was found that the miscreants hacked into the smart meters and reprogrammed the power consumption settings, resulting in a loss of US$400 million annually for the Puerto Rico utility [8]. In 2007, Austin Energy/Austin Police conducted a warrantless surveillance program in which consumer usage information was provided to find marijuana growing operations. Law enforcement agencies might also use the data for real-time surveillance [9]. Figure 1 shows how NILM uses various machine learning algorithms to identify individual appliances from a single aggregate power consumption reading of a consumer's premise. Various entities such as law enforcement agencies, marketing agencies, and malicious users may misuse this fine-grained data to profile a consumer and jeopardize their privacy, or to pursue unfair business strategies.
Recent research findings [3] on various privacy-preserving schemes and their implementation conclude that the existing approaches have some practical limitations: first, noise added through various noise distribution techniques does not considerably affect the identification of appliances; second, the effectiveness of these approaches is quantified using information-theoretic metrics and not NILM algorithms; and finally, various auto-encoders and filters can be used to denoise a time series. This paper addresses these problems by proposing a privacy-preserving architecture that combines an obfuscator and a GAN model to generate a synthetic time series that is close to the real time series. The proposed privacy-preserving architecture enables a consumer to obfuscate the power consumption data to help prevent NILM algorithms from inferring the active appliances, thus preventing consumer profiling and preserving privacy. We evaluate our approach using the widely accepted NILMTK [7] framework and publicly available datasets such as the Dutch Residential Energy Dataset (DREDD) [10].

Figure 1. Appliance disaggregation from a single point of power measurement, i.e., the aggregate reading, using NILM algorithms to predict the appliance activity [11].

Contribution
In this paper, we propose a novel privacy-preserving architecture that generates a synthetic time series. Our contributions are as follows:
1. We develop a privacy-preserving architecture to preserve the privacy of consumer activity deduced through disaggregation algorithms. The architecture identifies the inactive states and generates a state combination close to the total aggregate power consumption.
2. We propose a novel hybrid privacy approach to generate indistinguishable synthetic time series data. The proposed approach hybridizes the strength of the generative adversarial network (GAN) with NILM in an adaptive manner. In this work, we develop an obfuscator model that generates the combination of the appliances' inactive states for the GAN discriminator. A customized generator model is devised to produce various robust combinations of states of appliance signatures.
3. We evaluate and quantify the effectiveness of our privacy-preserving architecture by performing disaggregation on the synthetic time series generated by our architecture using NILM algorithms. We show that the disaggregation results are distinguishable from the real dataset using the MEC metric.

Motivation and related works
The Privacy Impact Assessment (PIA) was a comprehensive process of determining the privacy, confidentiality, and risk involved with data collection in the smart grid. Revealing information about residential consumers and activities within the house was one of the concerns reported by the privacy sub-group of the Cyber Security Working Group [12].
The incidents described in the Introduction, namely the 2009 smart meter tampering that cost the Puerto Rico utility US$400 million annually and the 2007 warrantless surveillance program run by Austin Energy/Austin Police, show that these concerns are practical. Besides this, law enforcement agencies might use the data for real-time surveillance [3].
Although NILM enables efficient use of power, it presents severe privacy concerns. The appliance usage inferred by a NILM algorithm can be related to the daily routines, i.e., the behavioral patterns, of a household or to the number of individuals present in a premise. Such sensitive data helps a malicious user build a detailed profile of consumer behavior in a premise and provides a basis for forecasting premise activity, such as when the premise is unattended, work schedules, and other personal activities. Furthermore, marketing agencies can use this data for targeted advertisement of devices not owned by consumers, and law enforcement agencies can use it for mass surveillance. The potential privacy concerns and uses of the data make it a valuable target for data thieves.
Several privacy-preserving approaches have been proposed and used by researchers. We have performed an extensive literature survey on privacy-preserving schemes [3], and in this section we present some of the state-of-the-art approaches.
The battery-based load hiding (BLH) approach uses a rechargeable battery to partially supply the energy demand, manipulating the meter reading to hide the actual energy consumption. [13] proposed a reinforcement learning (RL) based BLH approach to preserve privacy for high-frequency and low-frequency variation data. The RL-BLH algorithm learns a decision policy for choosing pulse magnitudes on the fly without prior knowledge of usage patterns and uses artificially generated data to reduce the time taken to converge to an optimal policy. However, reinforcement learning does not estimate the actual input/output characteristic but only the desired probabilistic behavior. [14] proposed a scheme to address smart meter (SM) privacy concerns using renewable energy sources (RES) and a battery to partially hide the consumption pattern from the utility provider. The proposed scheme uses an information-theoretic approach to minimize the leakage to the utility provider of the consumer's energy consumption data as well as of the energy generated by the RES. However, renewable energy is wasted when the battery is fully charged or when the required energy load is smaller than the generated energy.
Data obfuscation provides a unique opportunity to mask the original energy consumption data by applying random noise [15] or by using an appropriate algebraic transformation on the fine-grained energy usage data [16]. [17] proposed a utility-privacy tradeoff scheme based on random data obfuscation. In the proposed scheme, random obfuscation values drawn from the Laplace distribution are used to mask the real-time data. The scheme also relies on a Key Initialization Centre (KIC) to initialize keys for the smart meters and the control centre, and it has a higher error rate. Furthermore, the KIC uses Paillier encryption for generating encryption parameters, which is computationally expensive.
Data anonymization disassociates the customer identity from its energy consumption data while utilities still receive enough information to compute the required information. These approaches allow the implementation of additional trusted infrastructure. [18] proposed an authentication framework based on anonymization to prevent unauthorized data access and achieve privacy. The framework is designed to prevent service providers from correlating various types of data from a smart meter and to avoid a single point of failure. However, the scheme does not consider the possibility of the Anonymizer (AN), the Electricity Supplier (ES), and the Data Collector (DC) colluding. [19] proposed a privacy-preserving approach based on pseudo-identity. The approach uses a hash tree-based mechanism to achieve data integrity. However, the approach does not prevent insider attacks. Furthermore, anonymization techniques have previously failed on multiple occasions [20,21], with the data being traced back to individuals.
In data aggregation, network aggregators are used for concatenating and summarizing data packets from various devices using functions such as sum or average. [22] proposed the Integrated Authentication and Confidentiality (IAC) protocol to provide efficient and secure AMI communications. The scheme uses hop-by-hop data aggregation and a forwarding approach between the intermediate nodes. The proposed approach does not consider the malfunctioning of intermediate nodes and is also vulnerable to attacks such as replay and forgery attacks. [23] proposed a secure privacy-preserving protocol for smart metering systems that uses multiple gateways for aggregation in a cluster approach. The proposed protocol uses Fully Homomorphic Encryption (FHE) with a randomly generated polynomial (secure MPC) to secure the data. The encrypted data is aggregated in a hierarchical manner without revealing the actual meter readings. However, FHE requires a lattice-based cryptosystem, which is very complex. Implementing a lattice-based cryptosystem therefore involves costly, complex computations and large ciphertext sizes.
Differential privacy is another related privacy concept based on privacy-preserving data mining. The privacy mechanism adds controlled noise to the requested data before it is released. [24] use the Laplace mechanism to hide the consumer's power consumption data in smart meter data sets, achieving ε-differential privacy. [25] applies differential privacy using household batteries: the battery charges and discharges in a bid to hide the original power consumption data. The amount of added noise depends on ε and the sensitivity function. The lower the chosen value of ε, the lower the privacy risk. However, choosing a suitable value for ε is difficult, as a low value may significantly decrease the utility of the data. Furthermore, it is challenging to feed a differentially private dataset to a complex optimization algorithm, since the dataset may lose the practicality of the original power consumption data [26].
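To make the role of ε and the sensitivity concrete, the following is a minimal sketch of the Laplace mechanism in general, not the implementation of [24] or [25]. The function names, the sample readings, and the sensitivity value of 10 W are illustrative assumptions. The noise scale is sensitivity/ε, so a smaller ε yields larger noise.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a zero-mean Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(readings, epsilon, sensitivity, rng):
    """Add Laplace noise with scale sensitivity / epsilon to every reading."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    return [r + laplace_noise(scale, rng) for r in readings]


rng = random.Random(0)
readings = [143.0, 143.0, 41.0, 41.0]  # hypothetical power readings in watts
# Assumed sensitivity of 10 W; a real deployment must derive it from the data.
strong_privacy = laplace_mechanism(readings, epsilon=0.1, sensitivity=10.0, rng=rng)
weak_privacy = laplace_mechanism(readings, epsilon=10.0, sensitivity=10.0, rng=rng)
print(strong_privacy)
print(weak_privacy)
```

With ε = 0.1 the noise scale is 100 W, which hides the readings but also destroys most of their utility; with ε = 10 the scale is 1 W and the readings remain almost unchanged.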

Background
In this section, we introduce the technical concepts related to this work.

Generative adversarial network
GANs are deep generative models [27][28][29] used to produce synthetic images and text. A GAN consists of a generator (G) and a discriminator (D), which compete in a two-player min-max game V(D, G). G learns a mapping G(z) that tries to map the random noise vector z to a realistic time series. D tries to find a mapping D(x) that tells us the probability of the input data being real. This is achieved by minimizing/maximizing the binary cross-entropy [30]:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \quad (3.1)$$

A simplified explanation of Eq. (3.1) is that the generator is trained to produce fake samples while the discriminator is trained to identify the synthetic or fake samples. The competition between the generator and the discriminator helps both improve until the synthetic data is not distinguishable from the real data samples.
Algorithm 1 shows the training process for the generator model G of GANs. D and G are neural networks that try to maximize and minimize the objective, respectively. In other words, the objective of the generator G is to produce fake or synthetic data, while the discriminator D is responsible for detecting the fake data samples. The feedback enables D and G to improve their functions until the synthetic samples are indiscernible from the real data [31].
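The min-max game can be illustrated with plain numbers. The sketch below assumes the standard GAN value function V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))] and evaluates it from hypothetical discriminator outputs; the function names are illustrative, and the code is not Algorithm 1.

```python
import math


def bce(predictions, target):
    """Mean binary cross-entropy of probabilities against a constant 0/1 target."""
    eps = 1e-12  # clamps probabilities to avoid log(0)
    total = 0.0
    for p in predictions:
        p = min(max(p, eps), 1.0 - eps)
        total += -(target * math.log(p) + (1 - target) * math.log(1.0 - p))
    return total / len(predictions)


def gan_losses(d_real, d_fake):
    """Return (discriminator loss, minimax generator loss).

    d_real holds D(x) for real samples, d_fake holds D(G(z)) for synthetic ones.
    """
    d_loss = bce(d_real, 1) + bce(d_fake, 0)  # equals -V(D, G); D minimises it
    g_loss = -bce(d_fake, 0)                  # mean log(1 - D(G(z))); G minimises it
    return d_loss, g_loss


# Hypothetical outputs: a confident discriminator versus a fooled one.
print(gan_losses(d_real=[0.9, 0.8], d_fake=[0.1, 0.2]))
print(gan_losses(d_real=[0.5, 0.5], d_fake=[0.5, 0.5]))
```

When the discriminator outputs 0.5 for every sample, its loss equals 2 log 2, the point at which it can no longer tell real from synthetic data.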
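In NILM, the aggregate power reading at time t is modelled as the sum of the power drawn by the individual appliances plus a term σ(t), where σ(t) represents unaccounted power or noise.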

Multi-State Energy Classifier metric
The Multi-State Energy Classifier (MEC) metric combines both event classification and energy estimation of an appliance state to give a more realistic and accurate evaluation of the performance of the existing NILM techniques [32]. The MEC metric consists of three steps, namely calculating the classification accuracy, the energy estimation accuracy, and the total penalty of the operational states of an appliance. We choose the MEC metric to measure the accuracy of the NILM algorithms for the following reasons: it accurately classifies multiple states of an appliance, it quantifies the accuracy even for values that are far from the original ground truth, and it does not exceed the usual accuracy interval of 0 to 1 for relatively large errors.

Proposed privacy preserving architecture
In this section, we introduce the security and privacy concern that will be addressed and the overall workflow of the proposed privacy-preserving architecture. We also present the hybrid-GAN, as shown in Figure 2. Figure 2 illustrates the overall architecture, which comprises three important steps detailed in the following subsections.

Figure 2. An overview of the proposed architecture to preserve the privacy of the energy consumption data of the consumer.

Privacy concern
We address the following privacy and security concern in this paper: the inference of the use of individual appliances from an aggregate power consumption reading using a NILM algorithm, i.e., consumer profiling. Since our privacy-preserving architecture generates a synthetic time series based on the appliance states that are inactive during a given time period T, it is robust against appliance inference via NILM. Unlike noise addition techniques, the hybrid-GAN effectively reduces the appliance detection accuracy and is immune to noise removal techniques such as auto-encoders and filters.

Architecture workflow
This section presents the proposed privacy-preserving architecture, as shown in Figure 2. Figure 2 illustrates the overall process, which comprises the following steps:
1. The original power reading from the consumer's premise is disaggregated using a NILM technique and given to the data pre-processing step of the obfuscator.
2. The obfuscator process generates a combination of the appliances' inactive states. The obfuscator provides the obfuscated aggregate readings to the discriminator.
3. The GAN process in the proposed architecture is trained on the real dataset consisting of all the state combinations of the appliances used on a consumer's premises.
4. The GAN process generates a synthetic time series that is close enough to the real time series but uses a different combination of appliance states.
In the next section, we explain the obfuscator (O) in detail.

Obfuscator
The obfuscator process generates a combination of inactive states of appliances with total energy nearly equivalent to the original ground truth. Identifying and storing the inactive states of an appliance allows the hybrid-GAN to generate n-combinations of inactive states and select the optimal solution close to the original ground truth power consumption. Figure 3 presents the obfuscation process in detail. The obfuscator takes the ground truth aggregate power $g_{power}$ and the active operational states of the appliances $d_i$, where $i = 1$ to $m$, for the time $T$ and outputs an obfuscated aggregate value. The obfuscator process is subdivided into two steps:

Step 1 Data pre-processing: In the data pre-processing step, the basic idea is to identify the active states and the corresponding power of the appliances, calculate the remaining power, categorize the appliances, and compute the inactive devices and their corresponding states.
The process starts by traversing the data points of the ground truth time series $G_T = \{g_{power}, d_1, \ldots, d_m\}$, where $g_{power} = \sum_{i=1}^{m} d_i$ at time $t$. An appliance object $A_i$ of type <appliance> is instantiated, and the power value $d_i$ is assigned to $A_i$.power. $A_i$ is stored in the devices list of type <PowerReadings>. The process then categorizes the appliance $A_i$ as Always Active or Not Always Active. We categorize the appliances based on the level of privacy concern. Always Active appliances such as a fridge or a smoke alarm do not cause privacy concerns as high as appliances in the Not Always Active category, such as a fan, television, or laptop. The latter help deduce a consumer's activity pattern, which is a serious privacy concern. Hence, we aim to obfuscate only the Not Always Active appliances. For the Always Active appliances, we store their total power in reading.activePower and then calculate the remaining power to be obfuscated, i.e., the total power $g_{power}$ minus reading.activePower. The remaining power is then stored in reading.remPower.
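The categorization and the remaining-power calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Reading` class, the `preprocess` function, the contents of `ALWAYS_ACTIVE`, and the wattages are hypothetical.

```python
from dataclasses import dataclass

# Assumed categorisation; the text names the fridge and smoke alarm as examples.
ALWAYS_ACTIVE = {"fridge", "smoke alarm"}


@dataclass
class Reading:
    total_power: float   # g_power, the aggregate at time t
    active_power: float  # power drawn by Always Active appliances
    rem_power: float     # power left to obfuscate


def preprocess(appliance_power):
    """appliance_power maps appliance name -> d_i (watts) at one time instance."""
    g_power = sum(appliance_power.values())
    active = sum(p for name, p in appliance_power.items() if name in ALWAYS_ACTIVE)
    return Reading(total_power=g_power, active_power=active, rem_power=g_power - active)


reading = preprocess({"fridge": 98.0, "television": 36.0, "laptop": 7.0, "fan": 2.0})
print(reading)
```

Only `rem_power`, the share drawn by Not Always Active appliances, is passed on for obfuscation.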
For a Not Always Active appliance, the process starts by identifying the state of the appliance. The process compares $A_i$.power to Map<State, Power> to obtain the active state of the appliance. Map<State, Power> contains all the appliances, their states, and the corresponding power of each state. The states of an appliance are identified using the appliance state clustering technique described in [32]. Next, the inactive states of the appliance $A_i$ are stored in inActives of type <inActiveProperties>. The same process is performed for all the devices active at the same time instance $t$. This is done to track the change in the state of appliances across consecutive instances.
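A minimal sketch of the state lookup is given below. The contents of `STATE_POWER`, the function name, and the nearest-power matching rule are assumptions made for illustration; the text only states that the measured power is compared against Map<State, Power>.

```python
# Hypothetical Map<State, Power> for two appliances (watts per state).
STATE_POWER = {
    "television": {"off": 0.0, "standby": 7.0, "on": 36.0},
    "fan": {"off": 0.0, "low": 25.0, "high": 45.0},
}


def split_states(appliance, power):
    """Return (active_state, inactive_states) for one appliance reading."""
    states = STATE_POWER[appliance]
    # Assumption: the active state is the one whose power is nearest to the reading.
    active = min(states, key=lambda s: abs(states[s] - power))
    inactive = {s: p for s, p in states.items() if s != active}
    return active, inactive


active, in_actives = split_states("television", 34.5)
print(active, in_actives)
```

The returned inactive states play the role of the inActives entries that feed the combination step.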
Step 2 Generate n-device combinations: The next step in the process involves generating the state combinations for obfuscation. The combination process is executed only when the obfuscator detects a change in the state of appliances between time $t-1$ and $t$. This avoids re-generating the state combinations for consecutive instances with the same active states and reduces computational time. The inputs to the process are the inactive states inActives, the total power $g_{power}$, and a user-supplied parameter thres, as shown in Figure 4.
The parameter thres is used to compute the lower and upper bounds for the total power of a permitted combination, i.e., $thres_{lower}$ and $thres_{upper}$. The process generates 'N-Appliance' combinations of all the appliances $A_i$ and their inactive states and computes the total power of each combination. The process ensures that the appliance states in a combination, i.e., $A_i$.State, are mutually exclusive. A combination is stored in allPossibleCombination if its total power lies between $thres_{lower}$ and $thres_{upper}$. The process then maps the combination to the closest threshold. The N-appliance combination with the minimum distance is stored in combinationDevice. The same process is performed for N = 1 to the number of devices in inActives. At the end of the process, combinationDevice holds the best combination for each N. The entries of combinationDevice are then sorted by minimum distance and by the number of appliances in the combination. Once sorted, the process outputs the first combination in combinationDevice.
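The combination search can be sketched with `itertools`. The following is an illustrative reading of the step above, not the authors' code. It assumes that thres is a fractional tolerance around the target power and that the distance is measured to the target power; the function name and the sample appliances are hypothetical.

```python
from itertools import combinations, product


def best_combination(in_actives, target_power, thres):
    """Pick the combination of inactive states closest to target_power.

    in_actives: {appliance: {state: power}} of inactive states.
    thres: fractional tolerance (assumed semantics) around target_power.
    """
    lower, upper = target_power * (1 - thres), target_power * (1 + thres)
    candidates = []
    appliances = sorted(in_actives)
    for n in range(1, len(appliances) + 1):
        for group in combinations(appliances, n):
            # One state per appliance keeps the states mutually exclusive.
            for states in product(*(in_actives[a].items() for a in group)):
                total = sum(power for _, power in states)
                if lower <= total <= upper:
                    combo = {a: s for a, (s, _) in zip(group, states)}
                    candidates.append((abs(total - target_power), n, total, combo))
    if not candidates:
        return None
    # Sort by distance to the target, then by number of appliances.
    candidates.sort(key=lambda c: (c[0], c[1]))
    _, _, total, combo = candidates[0]
    return combo, total


in_actives = {
    "fan": {"low": 25.0, "high": 45.0},
    "kettle": {"on": 7.0},
    "lamp": {"on": 20.0},
}
print(best_combination(in_actives, target_power=45.0, thres=0.1))
```

The exhaustive search grows quickly with the number of appliances and states, which is consistent with the text's choice to re-run it only when the active states change.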

GAN architecture
In this section, we discuss the proposed GAN of the privacy-preserving architecture, as shown in Figure 2. Most of the existing GANs were developed for image processing and audio applications. However, researchers have also used GANs to preserve the privacy of an original time series by generating a synthetic time series. In [33], the authors use a GAN to preserve the privacy of an individual's sensitive data generated by their movements, i.e., GPS trajectories. To this end, this work proposes a hybrid GAN architecture for the privacy-preserving problems related to consumers' energy consumption data in a smart grid. The proposed architecture is designed to deal with time series datasets and is based on the model presented in [34]. To design a GAN architecture for preserving privacy in time series datasets, several changes and customisations have been made to develop an efficient and effective model:
• Since a smart meter measures energy consumption at a low-frequency sampling rate and sends data at 15 min intervals, we split the time series into daily vectors, i.e., minutes and seconds, using one-hot embedding.
• We set a low decay rate to allow the GAN enough time to capture the pattern of the dataset.
• The aggregate power reading time series is very dynamic in nature. We use a small kernel in order to reduce the number of inaccurate values in the power reading time series.
• We use the Adam optimizer since the aggregate power time series is sparse in nature. We set the beta parameter value to ensure that small weights are assigned to distant gradients.
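The one-hot time embedding mentioned in the list above can be sketched as follows. The vector length of 60 for both minutes and seconds, the function names, and the sample timestamp are assumptions; the text does not specify the exact encoding.

```python
from datetime import datetime


def one_hot(index, size):
    """Return a list of zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def time_vectors(timestamp):
    """Return one-hot minute and second vectors for a reading's timestamp."""
    return one_hot(timestamp.minute, 60), one_hot(timestamp.second, 60)


# Hypothetical timestamp of a single reading.
minute_vec, second_vec = time_vectors(datetime(2015, 7, 5, 14, 15, 30))
print(minute_vec.index(1), second_vec.index(1))
```

Both vectors would accompany the power values as conditioning inputs to the discriminator and the generator.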
In the literature, several convolutional neural network (CNN) models have been developed for image and audio classification problems. However, the same models cannot be applied directly to time series data because the data is represented as time-ordered sequences. CNNs exhibit excellent performance on several challenging applications and can therefore be used to learn from a time series dataset. While recurrent neural networks (RNNs) can also be used for time series data, as they store the temporal information available in a time series, CNN models are computationally lighter and learn by batch. This makes them more suitable for our experiments, as we have resource-constrained devices and data is sent in batches at 5-15 min intervals (depending on the utility provider). Furthermore, whereas an RNN learns from the data of the previous timesteps to make its prediction, a CNN learns by seeing the data from a broader perspective, which is more feasible for our GAN model.
To this end, we use a 1D CNN in which the time series is represented as one-dimensional sequences of data. The CNN model learns to extract features from sequences of observations and then maps the internal features to different activities. It learns the most effective features directly from the raw time series dataset without the need for domain expertise to extract those features. Thus, the proposed model can adaptively deal with various data types and can cope with problem changes that might occur during the development process.
In this work, we develop a 1D CNN for both the discriminator (D) and the generator (G) models. We then conduct experimental tests to find the best parameter values for this GAN model. Considering time and accuracy, we fine-tuned various parameters by setting a small kernel size, i.e., 2, and gradually increasing or decreasing the other parameters to get the best possible output. In these experiments, decay values of 1e-1 and 1e-2 decreased the learning rate rapidly for this model and resulted in poor synthetic time series generation. On the other hand, a smaller decay value resulted in better performance. Most GAN models set the beta parameter to a default value of 0.9, but reducing it to 0.4 provided a more stable training process. Therefore, we use the suggested values shown in Table 1. In the following subsections, we describe the discriminator (D) and the generator (G) in detail.
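The effect of the decay value can be illustrated with a small calculation. The sketch assumes an inverse-time decay schedule, lr_t = lr_0 / (1 + decay · t), which is one common choice; the text does not state which schedule, initial learning rate, or number of steps was used, so all of these are hypothetical.

```python
def decayed_lr(initial_lr, decay, step):
    """Inverse-time decay: lr_t = lr_0 / (1 + decay * step)."""
    return initial_lr / (1.0 + decay * step)


initial_lr = 1e-3  # hypothetical starting learning rate
for decay in (1e-1, 1e-2, 1e-6):
    # Learning rate remaining after 1000 update steps for each decay value.
    print(decay, decayed_lr(initial_lr, decay, step=1000))
```

Under this assumed schedule, a decay of 1e-1 shrinks the learning rate by a factor of about 100 within 1000 steps, whereas a decay of 1e-6 leaves it almost unchanged, which matches the observation that large decay values stall training early.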

Discriminator
The discriminator (D) is trained to differentiate between the generated samples as synthetic and the original samples as real. D consists of multiple 1D CNN layers and takes the sample from the obfuscator, the minute vector, and the seconds vector as input to classify whether the input is real or synthetic. The first layer of the discriminator is a 1D convolutional layer. We set the number of filters for the layer and assign a low value to the kernel size, which specifies the size of the convolutional window. The first 1D convolutional layer is followed by a Leaky Rectified Linear Unit (LeakyReLU) activation. We use LeakyReLU because it fixes the dying ReLU problem, is balanced, and speeds up the training process. The second layer of the discriminator is also a 1D convolutional layer, followed by batch normalization and a LeakyReLU activation. At inference time, batch normalization normalizes its output using the moving averages of the batch mean µ and standard deviation σ. The third layer of the discriminator is again a 1D convolutional layer with batch normalization and LeakyReLU. At the output, we use the sigmoid activation, whose value lies in (0, 1) and is used to predict the probability of the input being real or fake.
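The two activations used in the discriminator can be written out in a few lines. This is a minimal sketch of the standard definitions; the negative slope `alpha=0.2` is an assumed value, since the text does not give one.

```python
import math


def leaky_relu(x, alpha=0.2):
    """LeakyReLU: identity for x >= 0, small slope alpha otherwise."""
    return x if x >= 0 else alpha * x


def sigmoid(x):
    """Numerically stable logistic function; mathematically its range is (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


print(leaky_relu(-5.0), leaky_relu(3.0))
print(sigmoid(-2.0), sigmoid(0.0), sigmoid(2.0))
```

Because LeakyReLU keeps a non-zero slope for negative inputs, units that receive negative pre-activations still pass gradients, which is how it avoids the dying ReLU problem mentioned above.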

Generator
The generator (G) is trained to generate synthetic samples. G consists of multiple neural network layers and takes a latent vector Z, the minute vector, and the seconds vector as input. The first layer of the generator is a transposed 1D convolutional layer. We set the number of filters for the layer and assign low values to the kernel size and the strides. The first transposed 1D convolutional layer is followed by batch normalization and a LeakyReLU activation. The second layer is also a transposed 1D convolutional layer with batch normalization and LeakyReLU activation. The final layer is again a transposed 1D convolutional layer, with the number of filters set to the input dimension, followed by a sigmoid activation at the output.
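To show how a transposed 1D convolution upsamples a sequence, the sketch below implements the operation naively for a single channel without padding. The kernel size of 2 follows the value reported for the model, while the stride of 2, the kernel weights, and the input values are hypothetical.

```python
def conv_transpose_1d(signal, kernel, stride):
    """Naive single-channel 1D transposed convolution without padding."""
    out = [0.0] * ((len(signal) - 1) * stride + len(kernel))
    for i, value in enumerate(signal):
        for j, weight in enumerate(kernel):
            # Each input value scatters a scaled copy of the kernel into the output.
            out[i * stride + j] += value * weight
    return out


upsampled = conv_transpose_1d([1.0, 2.0, 3.0], kernel=[1.0, 0.5], stride=2)
print(upsampled)  # [1.0, 0.5, 2.0, 1.0, 3.0, 1.5]
```

The output length is (len(signal) − 1) · stride + len(kernel), so stacking such layers lets the generator expand a short latent vector into a full-length time series.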

Implementation & results
The implementation and results of the proposed privacy-preserving architecture for generating obfuscated time series are discussed in this section. The architecture is implemented in sequential order, as shown in Figure 3.

Dataset description
We conduct experiments using the DREDD dataset, an open-source real-world dataset for researchers [10]. We use a subset that records aggregated energy consumption and appliance-level energy consumption. The aggregate power readings and the appliance-level readings are collected at a sampling frequency of 1 Hz. The dataset covers various appliance types such as Type I (On-Off) and Type II (Multi-State).

NILM algorithm
We first perform power disaggregation on the ground truth data and measure the appliance detection accuracy of three state-of-the-art algorithms, i.e., Combinatorial Optimization (CO), Factorial Hidden Markov Model (FHMM) [7], and Sparse Viterbi [35], using the MEC metric [32]. The disaggregation algorithms are first trained on the original dataset (aggregate and appliance-level power consumption data) to identify the appliance states. Once the algorithms have learned the appliance states, they are tested on the aggregate power consumption data. Table 2 presents the appliance-level detection accuracy of the algorithms using the aggregate power consumption data as input. The SparseViterbi algorithm shows a consistent detection accuracy of more than 90% for all the appliances in the dataset. In our simulation, we used four thousand (4,000) data points in a biased setting (training and testing on the same dataset samples).
We choose the SparseViterbi disaggregation algorithm to generate the initial input to the hybrid-GAN due to its higher rate of appliance detection, as shown in Table 2. The algorithm is a highly accurate load classification and estimation algorithm. It uses a variant of the Viterbi algorithm and a hidden Markov model (HMM) to disaggregate appliances with complex multi-state power signatures [35]. The algorithm outputs ground truth time series

Data pre-processing
In the first step, the pre-processing component is applied to every ground truth instance at time t = 1 to T. In this process, the active appliances and their states are identified. The appliances are categorized as 'Always Active' or 'Not Always Active'. Based on this categorization, the remaining power to be obfuscated is calculated. The inactive states of appliances are identified as well. The output of this process is inActives, thres, totalPower and remainingPower. Table 3 presents the data pre-processing output for every instance t. The data pre-processing step calls the n-device combination process when a change of appliance states is detected, as shown in Table 3.
Table 3. An output sample of step 1 (data pre-processing) of the obfuscator of the proposed privacy-preserving architecture.
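The pre-processing step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the output keys follow the names given above, while the function name, the appliance names, the wattages and the rule that an appliance contributes inactive states only while it is off are assumptions.

```python
def preprocess_instance(readings, known_states, always_active, thres):
    """Summarize one ground truth instance for the obfuscator.

    readings:      appliance -> power drawn at this instant (W)
    known_states:  appliance -> list of its non-zero power states (W)
    always_active: set of appliances that are never obfuscated
    thres:         user-specified energy-difference threshold (percent)
    """
    total_power = sum(readings.values())
    always_power = sum(p for name, p in readings.items() if name in always_active)
    # Power that has to be reproduced by a combination of other appliances.
    remaining_power = total_power - always_power
    # Assumption: only appliances that are currently off offer inactive states.
    in_actives = {
        name: list(known_states[name])
        for name, p in readings.items()
        if name not in always_active and p == 0
    }
    return {
        "inActives": in_actives,
        "thres": thres,
        "totalPower": total_power,
        "remainingPower": remaining_power,
    }


# Hypothetical instance: fridge is 'Always Active', fan and lamp are off.
readings = {"fridge": 98, "television": 36, "laptop": 7, "fan": 0, "lamp": 0}
known_states = {"television": [36], "laptop": [7, 25], "fan": [29], "lamp": [12, 15]}
out = preprocess_instance(readings, known_states, {"fridge", "heater"}, thres=5)
print(out)
```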

Generate N-device combinations
The second step of the implementation is the makeUniqueCombination process. As shown in Figure 4, the process calculates the upper and lower thresholds and generates n-appliance combinations of inactive states. The process then selects the best solution based on the minimum energy difference, the specified threshold, and the number of devices in the combination. The process outputs an obfuscated aggregate, Obfus. Table 4 presents the ground truth and Table 5 presents the corresponding obfuscator inactive state combination output at every time instance t. The obfuscator re-calculates the states when a change in active appliances is detected in the ground truth, as shown in Table 5.
143  98  36  0  7  2  0  0  0
143  98  36  0  7  2  0  0  0
143  98  36  0  7  2  0  0  0
41  0  0  0  7  2  25  7  0
41  0  0  0  7  2  25  7  0
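The selection logic described above can be sketched as follows. This is a minimal illustration, not the authors' makeUniqueCombination: the symmetric percentage band around the remaining power, the tie-break in favour of fewer devices, the max_devices limit and all appliance names and wattages are assumptions.

```python
from itertools import combinations, product


def make_unique_combination(remaining_power, in_actives, thres, max_devices=3):
    """Pick inactive states whose summed power best matches `remaining_power`.

    in_actives: appliance -> list of candidate power states (W)
    thres:      allowed energy difference in percent (assumed symmetric band)
    """
    lower = remaining_power * (1 - thres / 100)
    upper = remaining_power * (1 + thres / 100)
    best = None
    for n in range(1, max_devices + 1):
        for devices in combinations(sorted(in_actives), n):
            # Try every state of every device in this n-device combination.
            for powers in product(*(in_actives[d] for d in devices)):
                total = sum(powers)
                if not lower <= total <= upper:
                    continue  # outside the threshold band
                # Rank by energy difference first, then by device count.
                key = (abs(total - remaining_power), n)
                if best is None or key < best[0]:
                    best = (key, dict(zip(devices, powers)), total)
    if best is None:
        return None  # no combination satisfies the threshold
    return {"states": best[1], "power": best[2], "difference": best[0][0]}


# Hypothetical inactive states for four appliances that are currently off.
in_actives = {"fan": [29], "lamp": [12, 15], "router": [7], "charger": [2]}
choice = make_unique_combination(43, in_actives, thres=5)
print(choice)
```

In this sketch, the obfuscated aggregate for the instance would be the power of the always active appliances plus `choice["power"]`.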

GAN
The third step uses the GAN to generate a synthetic time series. The discriminator, as explained in Section 4.4.1, is responsible for distinguishing between real and synthetic data samples. The discriminator takes $vector_{minute}$, $vector_{second}$ and $Obfus_T = \{O_1, O_2, \ldots, O_t\}$ as input. The generator, as explained in Section 4.4.2, generates synthetic data samples. The generator takes the latent vector $Z$, $vector_{minute}$ and $vector_{second}$ as input.
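To make the two input signatures concrete, the sketch below assembles one generator input and one discriminator input. It only illustrates how the conditioning could be wired and is not the architecture of Section 4.4: the one-hot encoding of the minute and second, the latent dimension of 16, the simple concatenation and all function names are assumptions.

```python
import random


def one_hot(index, size):
    """Encode `index` as a one-hot list of length `size`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def generator_input(minute, second, latent_dim=16, rng=random):
    """Concatenate a latent vector Z with the minute and second vectors."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return z + one_hot(minute, 60) + one_hot(second, 60)


def discriminator_input(minute, second, power_sample):
    """Concatenate the minute and second vectors with one power sample.

    `power_sample` is either an obfuscated value O_t (real) or a value
    produced by the generator (synthetic).
    """
    return one_hot(minute, 60) + one_hot(second, 60) + [float(power_sample)]


rng = random.Random(0)  # seeded only to make the illustration repeatable
g_in = generator_input(minute=5, second=30, rng=rng)
d_in = discriminator_input(minute=5, second=30, power_sample=143)
print(len(g_in), len(d_in))
```

Conditioning both networks on the same time vectors ties each generated sample to a position in the series.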

Discussion & results
We plot the real ground truth data and the synthetic time series output for a consumer, as shown in Figure 5. Because the obfuscator generates a combination of inactive states whose power is close to the original ground truth, the energy difference is kept to a minimum. The energy difference is bounded by the threshold variable thres, which the user specifies to balance the utility-privacy tradeoff. We exploit the state-identification feature of NILM to make it predict inaccurate states. Figure 6 shows the ground truth (blue) and the disaggregated result (yellow) of the generated synthetic time series for two appliances, i.e., fan and television. In Figure 6, at data sample $t_1$, the ground truth for the appliance Fan is an Off state with a corresponding power of 0 Watts, whereas the NILM algorithm predicts an active state with a corresponding power of 29 Watts. The NILM algorithm therefore predicts the activity of an individual appliance incorrectly with respect to its ground truth.
Table 6 presents the accuracy scores for the disaggregation of different appliances from a synthetically generated timeseries using the SparseViterbi NILM algorithm. We perform disaggregation on aggregate timeseries obfuscated using two approaches, i.e., adding White Gaussian Noise (WGN) and our proposed hybrid-GAN method. We select the appliances fridge and heater as always active devices and treat the others as not always active. As mentioned before, we aim to obfuscate only the not always active devices. The results presented in Table 6 are the disaggregation results for the hybrid-GAN based synthetic timeseries. We measure the appliance detection accuracy of the NILM algorithm using the MEC metric [32]. The amount of noise added is set to 2, 3, 5 and 8% for both approaches to show the variation across noise levels (min: 2%, max: 8%). This results in a variation of 2 to 8% in the power consumption, which is acceptable in a real-world scenario.
The MEC metric quantifies disaggregation performance for each appliance in terms of both energy estimation and state classification. Referring to Tables 2 and 6, the disaggregation accuracy of the SparseViterbi algorithm on the original ground truth power consumption data is 97.02, 97.01, and 97.57% for the appliances Fan, Television and Laptop, respectively. With a noise threshold of 5%, our proposed approach reduces the detection accuracy to 40.16, 39.76, and 61.60% for Fan, Television and Laptop, compared to 91.21, 64.73, and 74.24% using the Gaussian noise approach. Our approach therefore reduces the appliance detection rate more effectively than White Gaussian Noise (WGN).
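The size of the reduction can be read off directly from the accuracy values quoted above. The short calculation below uses only those reported figures at the 5% threshold; the variable and function names are illustrative.

```python
# Detection accuracy (%) as reported in the text for the 5% noise threshold.
original = {"Fan": 97.02, "Television": 97.01, "Laptop": 97.57}
hybrid_gan = {"Fan": 40.16, "Television": 39.76, "Laptop": 61.60}
wgn = {"Fan": 91.21, "Television": 64.73, "Laptop": 74.24}


def accuracy_drop(before, after):
    """Drop in detection accuracy, in percentage points, per appliance."""
    return {name: round(before[name] - after[name], 2) for name in before}


drop_gan = accuracy_drop(original, hybrid_gan)
drop_wgn = accuracy_drop(original, wgn)
for name in original:
    print(f"{name}: hybrid-GAN -{drop_gan[name]} points, WGN -{drop_wgn[name]} points")
```

For these three appliances the hybrid-GAN drop is 56.86, 57.25 and 35.97 percentage points, against 5.81, 32.28 and 23.33 for WGN.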

Conclusions & future works
This paper proposed a new privacy-preserving architecture that generates a synthetic time series based on inactive state combinations. The proposed architecture addresses a critical gap in existing work: the lack of an effective privacy-preserving approach that protects consumer privacy and prevents inference of appliance activity. The proposed architecture solves this using a hybrid-GAN, i.e., by combining the obfuscator with a generative adversarial network to generate a synthetic time series close enough to the real time series. As shown in the results, the proposed architecture has reduced the average appliance detection accuracy of the NILM algorithm between 4-18% for devices with binary and multiple states of operation. In future work, we aim to fully integrate the obfuscator into the GAN. This will enable the GAN to generate specific combinations corresponding to the aggregate power based on constraints and conditions. Furthermore, we aim to make appliance state selection depend on the time of day and on the appliances the consumer actually uses. For example, a BBQ appliance appearing in the early hours of the morning would alert a malicious user that a synthetic time series is in use. Synchronizing the appliance with its time of use will help mislead a malicious user, as the inactive state combinations will be time-aware and will reflect normal consumer activity when analysed.