A data set of a Norwegian energy community

This paper presents a data set designed to represent Norwegian energy communities. As such it includes household consumption data collected from smart meter measurements and divided into consumer groups, appliance consumption data collected from Norwegian households, electric vehicle data regarding charging patterns, simulated photovoltaic power generation data based on temperature and irradiance data sets and wholesale electricity prices. All data sets are further filtered by season, weekday/weekend and time segment, and then fitted to either a normal, exponential or log-normal distribution. The reason for this specific segmentation is the intention to provide a suitable data set for case studies and experiments on energy communities that consider uncertainty, a main challenge to be overcome in the practical implementation of energy community projects. In addition to this filtered version, the previously unpublished raw data sets on household consumption and photovoltaic power generation are also provided.


Specifications
Subject: Electrical and Electronic Engineering
Specific subject area: The data describe electrical energy communities in the Norwegian power system. They contain consumption data from households, consumption data from household appliances, electric vehicle (EV) charging data, calculated power from photovoltaics (PVs) and wholesale electricity prices.
Type of data: Table (.xlsx format)
How data were acquired:
• Household consumption: Smart meter measurements
• Appliance consumption: High-resolution measurements
• Electric vehicles: Secondary studies
• Photovoltaic generation: Agrometeorology Norway's public repository
• Wholesale electricity price: Nordpool's data repository
Data format: Raw; Filtered
Parameters for data collection: The data were collected for use in future simulation studies, so an hourly resolution was required for the raw time series. In addition, because of the filtering process (season, weekday/weekend, time segment), a baseline of more than 5000 data points per time series was applied.

Description of data collection
Norwegian household consumption data from smart meters for 2015. Norwegian household appliance consumption data from the research project ElDeK (2009-2012).

Value of the Data
• This data set combines previously unpublished data sets, filtered data sets and simulated data. It allows for building stochastic case studies and simulations on energy communities within the Norwegian power grid.
• The target audience of the data set is researchers and decision makers in electric power systems who aim to implement simulations and experiments on realistic energy communities.
• The data set can be used in its current form to formulate case studies for various system topologies. In addition, it is possible to add information on the local grid, either from test systems or real networks. As such grid data sets are usually deterministic, the stochastic form of the data presented here complements them and allows for creating stochastic case studies of energy communities.
• Within Europe, decentralized energy community models are growing in importance. In line with this, the European Union defined Renewable Energy Communities [3] and Citizen Energy Communities [4] as legal entities in 2018 and 2019, respectively. This data set provides an opportunity to formulate heterogeneous test communities of varying sizes and varying characteristics, such as the number of EVs or the available solar power generation capacity.

Data Description
A visual summary of all components in the data set is shown in Fig. 1.
In summary, the data set consists of five separate data sets as shown in Table 1. The data is presented as a single .xlsx file consisting of seven individual sheets.
All separate data sets contain filtered values. All filtered values, except the values for the appliances, are based on a three-dimensional filtering process. The segments of this process are shown in Table 2.
Table 1, note a: Includes heating, which in Norway is most commonly electric [5].

Table 2
Three-dimensional filtering process.

The filtering process follows four steps:
1. Separate the data corresponding to dimension 1 (seasons).
2. For each of the four segments created in Step 1, find the data corresponding to dimension 2 (day of week).
3. For each of the eight segments created in Step 2, find the data corresponding to dimension 3 (time of day).
4. For each of the 48 segments, fit the data to the given distribution.
The data for the appliances is not filtered by seasons, and the process therefore starts at Step 2. Hence, only dimension 2 (day of week) and dimension 3 (time of day) are used to filter the appliance data, leading to 12 segments in total. Table 3 shows a summary of the included data points, including starting and ending points of the utilized time series. Note that the data points for the simulated photovoltaic generation are introduced in the following subsections, where each individual component of the data set is also described in detail. Fig. 2 shows a geographical overview of the origin sites of the data set.
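The four filtering steps can be illustrated with a minimal Python sketch. The season boundaries (by month) and the six equal four-hour time segments are assumptions made here for illustration, since the exact segment definitions are given in Table 2; the sample mean and standard deviation are a stand-in for the divergence-based parameter fit described later; and the names `segment_key`, `filter_series` and `fit_normal` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev

# ASSUMPTION: meteorological seasons by month; Table 2 may define them differently.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}
# ASSUMPTION: six equal time segments of four hours each (48 / 8 = 6 segments per day type).
HOURS_PER_SEGMENT = 4


def segment_key(ts):
    """Return the (season, day type, time segment) key of one timestamp."""
    season = SEASON_BY_MONTH[ts.month]                         # dimension 1
    day_type = "weekend" if ts.weekday() >= 5 else "weekday"   # dimension 2
    time_segment = ts.hour // HOURS_PER_SEGMENT                # dimension 3
    return season, day_type, time_segment


def filter_series(timestamps, values):
    """Steps 1 to 3: group an hourly series into its segments."""
    segments = defaultdict(list)
    for ts, value in zip(timestamps, values):
        segments[segment_key(ts)].append(value)
    return segments


def fit_normal(segments):
    """Step 4 (stand-in): describe each segment by its mean and standard deviation."""
    return {key: (mean(vals), pstdev(vals)) for key, vals in segments.items()}


# Synthetic hourly series for 2015, used only to exercise the functions.
start = datetime(2015, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(365 * 24)]
values = [0.5 + 0.3 * ((ts.hour % 12) / 12) for ts in timestamps]

segments = filter_series(timestamps, values)
params = fit_normal(segments)
print(len(segments), "segments")
```

Dropping the season from the key gives the two-dimensional variant used for the appliance data (2 day types times 6 time segments, i.e. 12 segments).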

Household consumption
The data for household consumption are given by the sheets Households_raw and Households_filtered:
• Households_raw consists of the load of four different household groups (Groups A, B, C and D) with hourly resolution for the time period 1 January 2015 to 31 December 2015. The (electrical) load is presented as a ratio of the maximum load over the year, and can therefore be scaled up by multiplying with the hourly peak power over the year for a given household. The four groups have been obtained by clustering smart meter data from 100 households. See Fig. 3 for a visualization of the mean and 0.99 quantiles of the household consumption. Note that heating in Norway is most commonly provided by electrical space heaters [5], so this series also includes their effect.
• Households_filtered shows the filtered values of the corresponding data set as shown in Table 2 and described in the introduction of this section. The data is fitted to a normal distribution with the following parameters: mu, sigma, minimum and maximum. An excerpt of this is shown in Table 4.
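As a small illustration of the scaling described for Households_raw, the following Python sketch multiplies load ratios by a household's yearly peak power. The ratio values and the peak of 8 kW are hypothetical, and `scale_to_kw` is an illustrative helper, not part of the data set.

```python
def scale_to_kw(load_ratios, peak_kw):
    """Convert load ratios (fraction of the yearly peak) into kW for one household."""
    if peak_kw <= 0:
        raise ValueError("peak_kw must be positive")
    return [ratio * peak_kw for ratio in load_ratios]


# Hypothetical excerpt of one group's hourly ratios and a hypothetical yearly peak.
group_a_ratios = [0.21, 0.35, 1.0, 0.48]
household_peak_kw = 8.0

load_kw = scale_to_kw(group_a_ratios, household_peak_kw)
print(load_kw)
```

The same profile can be reused for households of different sizes by changing only the peak value.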

Appliance consumption
The data for the household appliances is given by the sheet Appliances_filtered. It consists of filtered values of electrical loads for three different appliances: dishwasher, dryer and washing machine. These appliances were selected because they allow for load shifting, i.e. delaying their operation in order to minimize electrical consumption during peak hours. This is a common operational problem that energy community implementations could encounter and attempt to solve. The number of households for each appliance is shown in Table 5, along with information on zero values (periods when the appliance is turned off). The data is fitted to an exponential distribution with the following parameters: lambda, minimum and maximum. Similar to the household consumption, the appliance consumption is provided as a ratio of the maximum load over the year, which allows for scaling by multiplying with the hourly peak power over the year. The filtered values correspond to the two-dimensional segmentation explained in the introduction of this section.

Electric vehicle charging
The data for electric vehicle charging is provided by the sheet EVcharging_filtered. It shows the filtered values for charging start probability (%) and charging duration (h). The filtered values again correspond to the segments in Table 2 and the filtering process described in the introduction of this section. The data is fitted to an exponential distribution with the following parameters: lambda, minimum and maximum. The geographical location of the data set can be observed in Fig. 2.
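One possible way to draw charging events from such parameters is sketched below in Python. It assumes that lambda is the rate of an exponential distribution on the normalized range [0, 1], that samples are truncated to this range, and that they are mapped back with the minimum and maximum; none of these conventions is confirmed by the data description, and all parameter values and function names are hypothetical.

```python
import math
import random


def sample_truncated_exponential(lam, rng):
    """Draw from an exponential with rate lam truncated to [0, 1) by inverse-CDF sampling."""
    u = rng.random()
    return -math.log(1.0 - u * (1.0 - math.exp(-lam))) / lam


def rescale(x_scaled, minimum, maximum):
    """Map a normalized value back to the range [minimum, maximum]."""
    return x_scaled * (maximum - minimum) + minimum


def sample_charging_session(start_prob_percent, lam, min_h, max_h, rng):
    """Return a charging duration in hours, or None if no charging starts."""
    if rng.random() >= start_prob_percent / 100.0:
        return None
    return rescale(sample_truncated_exponential(lam, rng), min_h, max_h)


rng = random.Random(1)
# ASSUMPTION: hypothetical parameters for one segment, not values from the sheet.
start_prob = rescale(sample_truncated_exponential(3.0, rng), 0.0, 20.0)  # in %
duration = sample_charging_session(start_prob, lam=1.5, min_h=1.0, max_h=12.0, rng=rng)
print(start_prob, duration)
```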

Photovoltaic power generation
The data for the photovoltaic power generation is presented in sheets PV_raw and PV_filtered:
• PV_raw consists of simulations of generated power from a photovoltaic panel of one module for 15 different locations in Norway (three selected sites located in each price area). The data is hourly for the time period 1 January 2015 to 31 December 2020. The weather stations and corresponding areas are given in Table 6, along with the data points for measured irradiance and temperature data. These irradiance and temperature data sets were used to calculate the simulated photovoltaic power as described in Section 2. Fig. 4 shows the simulated power for one week for NO1. The simulated power for each location can be seen in Fig. 5. The geographic locations of the weather stations can be observed in Fig. 2.
• PV_filtered shows the filtered values for photovoltaic power per price area (NO1-NO5), using PV_raw as input. Again, the filtering segments are shown in Table 2 and the filtering process is described in the introduction of this section. The data is fitted to a normal distribution with the following parameters: mu, sigma, minimum and maximum.

Wholesale electricity price
The wholesale electricity price data is presented in sheet WholesalePrice_filtered. This is day-ahead market data obtained from the Norwegian electricity market Elspot operated by Nordpool and provided via a publicly accessible data platform [6]. The sheet shows filtered values for the different price zones (NO1-NO5) for the period from 26 May 2017 to 2 April 2021. The filtering is again conducted as described in Table 2 and in the introduction of this section. The data is fitted to a log-normal distribution with the following parameters: mu, sigma, minimum and maximum. The geographical distribution of the data set can be observed in Fig. 2.

Experimental Design, Materials and Methods
The data set allows for the formulation of control problems on the residential level. Instead of providing specific numerical values, e.g. in kWh, the data is formulated as ratios representing usage patterns. This allows adjusting the individual data sets to various sizes of households as well as different brands and models of appliances.
As described, this data set consists of both raw and filtered data. The filtered data was obtained by fitting the raw data to one of three distributions: normal, exponential or log-normal. This was done to allow for use in stochastic models. The distribution parameters were selected by minimizing the Kullback-Leibler divergence; the Python script used can be found in [13]. The distribution selected for each data set is shown in Table 7.

To be fitted to the distributions shown in Table 7, the data set was normalized by feature scaling [7], i.e.:

$$x' = \frac{x - \mathrm{minimum}}{\mathrm{maximum} - \mathrm{minimum}} \qquad \text{(1a)}$$

where $x$ is the original value and $x'$ is the scaled value.

Since the filtered data sets and their parameters are normalized, samples must be re-scaled before they can be used. In the re-scaling rule, which carries the case condition "if Household, Photovoltaics", $x_{\mathrm{sample}}$ is the sample (with normalized values) and $x_{\mathrm{sample,\,re\text{-}scaled}}$ is the sample scaled to real values.

The distributions were chosen based on the lowest Wasserstein metric for all segments. An overview of these is shown in Fig. 6.
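The normalization and the re-scaling of samples can be sketched in Python as follows. The sketch assumes that re-scaling is the plain inverse of Eq. (1a) and that normal samples are clipped to the normalized range [0, 1]; both are assumptions made here and may differ from the authors' script in [13]. The parameter values are hypothetical.

```python
import random


def feature_scale(values):
    """Eq. (1a): map values to [0, 1] and return them with the minimum and maximum used."""
    minimum, maximum = min(values), max(values)
    if maximum == minimum:
        raise ValueError("feature scaling needs at least two distinct values")
    scaled = [(v - minimum) / (maximum - minimum) for v in values]
    return scaled, minimum, maximum


def rescale(x_sample, minimum, maximum):
    """ASSUMPTION: inverse of Eq. (1a), mapping a normalized sample to real values."""
    return x_sample * (maximum - minimum) + minimum


def sample_normal_clipped(mu, sigma, rng):
    """Draw a normal sample and clip it to the normalized range (an assumption)."""
    return min(1.0, max(0.0, rng.gauss(mu, sigma)))


raw_load = [2.0, 4.0, 10.0]  # hypothetical raw values
scaled, minimum, maximum = feature_scale(raw_load)

rng = random.Random(42)
# Hypothetical fitted parameters mu = 0.4 and sigma = 0.15 for one segment.
samples = [rescale(sample_normal_clipped(0.4, 0.15, rng), minimum, maximum)
           for _ in range(5)]
print(scaled, samples)
```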

Household consumption
The original household consumption data series obtained from the smart meters has hourly resolution and covers the year 2015. Due to data security concerns, the presented raw data was obfuscated in the following ways:
• individual household labels were removed
• the data was normalized
• instead of single households, the households were aggregated into groups via k-means clustering [7].
This data set is provided in sheet Households_raw and illustrated in Fig. 3.
The elbow plot for the segmentation obtained via clustering of the provided data series is shown in Fig. 7. Based on this, four household consumption profiles (Group A to D) were created. These groups were filtered as described above.
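The idea behind the elbow plot can be sketched with a small, self-contained k-means implementation in Python. This is an illustration on synthetic 24-hour profiles, not the clustering code or data used for the data set; the deterministic farthest-point initialization and the names `k_means` and `farthest_point_init` are choices made here.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(profiles, k):
    """Pick k start centroids: the first profile, then repeatedly the farthest one."""
    centroids = [profiles[0]]
    while len(centroids) < k:
        centroids.append(
            max(profiles, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def k_means(profiles, k, iterations=50):
    """Lloyd's algorithm; returns (centroids, labels, inertia)."""
    centroids = farthest_point_init(profiles, k)
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assign every profile to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in profiles]
        # Move every centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(squared_distance(p, centroids[label])
                  for p, label in zip(profiles, labels))
    return centroids, labels, inertia


# Synthetic data: four daily load shapes with small noise, ten households each.
rng = random.Random(0)
base_shapes = [
    [0.2] * 24,
    [0.8] * 24,
    [0.2] * 12 + [0.8] * 12,
    [0.8] * 12 + [0.2] * 12,
]
profiles = [[v + rng.uniform(-0.02, 0.02) for v in shape]
            for shape in base_shapes for _ in range(10)]

# Elbow data: within-cluster sum of squares (inertia) for k = 1..6.
inertias = {k: k_means(profiles, k)[2] for k in range(1, 7)}
for k, inertia in inertias.items():
    print(k, round(inertia, 3))
```

On this synthetic example the inertia drops sharply up to k = 4 and flattens afterwards, which is the kind of bend an elbow plot is read for.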

Appliance consumption
The original household appliance data series was obtained from the Norwegian research project ElDeK (Electricity Demand Knowledge, 2009-2013), in which 1-minute resolution consumption measurements in kWh of clothes washing machines, dishwashers and dryers were collected using dedicated plug-in instruments [1]. These appliances all allow for load shifting, i.e. postponing electricity demand to a later hour. In total, 75 Norwegian households from four DSOs participated in the study, for periods of four weeks. The number of households for each appliance is shown in Table 5.
The data was created in the following ways:
• The 1-minute resolution measurements of dryer, washing machine and dishwasher were changed to hourly resolution by summation.
• The hourly values were normalized.
• The normalized values were filtered with a two-dimensional categorization, as explained in the introduction of Section 1.
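The first two processing steps can be sketched in Python as follows. The sketch assumes that normalization means dividing by the maximum hourly value, in line with the description of the appliance data as a ratio of the maximum load; the readings and function names are hypothetical.

```python
def minutes_to_hourly(minute_kwh):
    """Sum 1-minute energy readings (kWh) into hourly totals."""
    if len(minute_kwh) % 60 != 0:
        raise ValueError("expected a whole number of hours of 1-minute readings")
    return [sum(minute_kwh[i:i + 60]) for i in range(0, len(minute_kwh), 60)]


def normalize_by_max(values):
    """ASSUMPTION: express each value as a ratio of the maximum value."""
    peak = max(values)
    if peak == 0:
        return [0.0 for _ in values]  # appliance never switched on
    return [v / peak for v in values]


# Hypothetical readings: one idle hour, then two hours at different power levels.
minute_kwh = [0.0] * 60 + [0.01] * 60 + [0.02] * 60
hourly_kwh = minutes_to_hourly(minute_kwh)
hourly_ratio = normalize_by_max(hourly_kwh)
print(hourly_kwh, hourly_ratio)
```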

Electric vehicle charging
The electric vehicle charging data consists of charging start probability in % and charging duration in hours. The original data used to derive these series was obtained from a previously published data set [2]. More specifically, the file Dataset 1_EV charging reports.csv [8] was used to obtain the following information: session ID, user ID, user type, date/hour of vehicle plug-in, and date/hour of vehicle plug-out. Only data for user type = Private was used, i.e. only data for private parking spaces with one user. The data set thus covers 56 vehicles.
Note that in this data set the number of vehicles increases over the duration of the study in which the data points were collected, which leads to Eq. (4) for the filtered data on the vehicle charging start probability. Finally, the data was filtered with the three-dimensional categorization, as explained in the introduction of Section 1.

Photovoltaic power generation
The photovoltaic data was simulated based on publicly available temperature and irradiance data. Both temperature and irradiance data in hourly intervals were obtained from [9] for the 15 weather stations shown in Table 6 for the time period 1 January 2015 to 31 December 2020. The data was additionally cleaned for measurement errors by removing temperatures above 40 °C and irradiances above 1000 W/m². The utilized Python script can be found in [14].
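The cleaning rule can be sketched in Python as shown below. This is not the script from [14]: the record layout and field names are hypothetical, and dropping the whole hourly record when either limit is exceeded is an assumption made here.

```python
MAX_TEMPERATURE_C = 40.0       # limit stated in the text
MAX_IRRADIANCE_W_M2 = 1000.0   # limit stated in the text


def clean_weather(records):
    """Keep only hourly records within both limits (ASSUMPTION: drop the whole record)."""
    return [
        r for r in records
        if r["temperature_c"] <= MAX_TEMPERATURE_C
        and r["irradiance_w_m2"] <= MAX_IRRADIANCE_W_M2
    ]


# Hypothetical hourly measurements from one weather station.
records = [
    {"hour": 0, "temperature_c": 12.5, "irradiance_w_m2": 0.0},
    {"hour": 1, "temperature_c": 85.0, "irradiance_w_m2": 150.0},   # temperature error
    {"hour": 2, "temperature_c": 18.0, "irradiance_w_m2": 4200.0},  # irradiance error
    {"hour": 3, "temperature_c": 21.0, "irradiance_w_m2": 640.0},
]
cleaned = clean_weather(records)
print([r["hour"] for r in cleaned])
```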
Based on these time series, the resulting photovoltaic power $P_t$ of a single module of the type Mitsubishi 255 Wp [11] was calculated as described in [10]. The cell temperature is calculated as described in [12]. The symbols used are:
• $P_{mpp}$ is the maximum power point of the module in W
• $FF$ is the fill factor of the module
• $V_{oc}$ is the open circuit voltage of the module in V
• $I_{sc}$ is the short circuit current of the module in A
• $T_{cell,t}$ is the cell temperature of the module in K
• $T_t$ is the measured temperature in °C
• $T_0$ is the standard module temperature in K
• $E_0$ is the standard irradiance in W/m²
• $E_t$ is the measured irradiance in W/m²
• $\eta_{inv}$ is the inverter efficiency
• NOCT is the nominal operating cell temperature
The resulting data is $P_t$ for each weather station. In addition, the values were filtered with the three-dimensional categorization, as explained in the introduction of Section 1.

Wholesale electricity price
The wholesale electricity price data is the spot market data obtained from the Norwegian electricity market operators' platform [6] . The raw data obtained from the platform was filtered with the three-dimensional categorization as described previously.

Ethics Statement
The household consumption data was collected by a DSO in Trøndelag county, Norway. The individual labels for each household were removed by the DSO. The data was further normalized (dividing by the maximum value), making it impossible to identify individual households based on their maximum consumption. The specific geographic location of the 99 households is not disclosed beyond the fact that they are located in Trøndelag county (which has 468,702 inhabitants).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.