REEDD-CR: Residential electricity end-use demand dataset from Costa Rican households

End-use demand data availability is a catalyst for improving energy efficiency measures and upgrading electricity demand studies. Nevertheless, public residential end-use datasets are limited, and end-use monitoring is costly. The lack of electricity end-use data is even more pronounced in Latin America, where, to the authors' knowledge, there are no public end-use datasets. Hence, we present the Residential Electricity End-use Demand Dataset of Costa Rica (REEDD-CR), containing the results of monitoring 51 Costa Rican households. The dataset includes the aggregated and branch circuit measurements for every home, sampled every 1 min for at least one entire week. The monitored households are distributed throughout the country. In addition, based on these sub-measurements, REEDD-CR includes a dataset of 197 load signatures composed of seven consumption and demand features for eight high-consuming appliances: refrigerator, stove, dryer, lighting, water heating, air conditioning, microwave, and washing machine. The features included in each load signature are average power, peak power, average daily events, average daily energy, day-use factor, night-use factor, and time of use. The single-appliance measurements used to calculate these load signatures are also part of the dataset. The release of REEDD-CR can serve as a tool for appliance modeling, demand disaggregation testing, feedback for energy demand models, and the overall upgrade of electricity supply and demand simulation studies with realistic, disaggregated data.


Value of the Data
• The data can be used for appliance modeling, testing of demand disaggregation algorithms, electricity demand estimations, and the overall upgrade of electricity demand simulation studies.
• Helpful for researchers, utilities, and policymakers working toward understanding consumers' behavior and end-use electricity consumption.
• Useful to understand the contribution of specific devices to the overall electricity consumption of a household.
• The dataset contributes to the limited number of public datasets with real end-use electricity measurements.
• Although the measurements correspond to Costa Rican households, the dataset can also serve as a reference for understanding appliance electricity consumption in other countries and regions.

Data Description
This paper presents the Residential Electricity End-use Demand Dataset of Costa Rica (REEDD-CR). It contains end-use electricity demand measurements from 51 Costa Rican households with a 1-minute sample time. Monitors were installed in each home to record the electricity demanded by the principal (i.e., aggregated) and branch (i.e., disaggregated or individual) circuits.
Every measurement covers at least one entire week and therefore includes both weekdays and weekends. Each measured circuit is labeled with the appliances it supplies.
In addition, REEDD-CR includes a set of load signatures (i.e., representations of an appliance's operation). Every device has a characteristic load signature, which results from the appliance's working pattern, the conditions under which it operates (e.g., temperature and pressure), and the way users operate it (e.g., time of use and power levels) [1]. One method to represent an appliance's load signature is feature extraction [2]. This dataset contains load signatures with seven features for eight common end-uses: refrigerator, stove, dryer, lighting, water heating, air conditioning, microwave, and washing machine. The features relate to the end-uses' power demand and energy consumption (i.e., average power, peak power, average daily events, average daily energy, day-use factor, night-use factor, and time of use).
REEDD-CR has three main groups of data, stored in separate directories: i) sub-measurements, which comprise all of the aggregated and disaggregated electricity demand measurements; ii) single-appliance measurements, which contain the end-use time series used to calculate the load signatures; and iii) load signatures, which correspond to the features calculated for each end-use. The following sections describe each group.

Aggregated and branch circuit measurements
These data correspond to the complete electricity demand measurements, including each home's principal and branch circuits. The 0_submeasurement folder contains a separate file for each household, named S_<n>.csv, where <n> is the household number. In each file, the first column contains the measurement date and time, and the remaining columns contain the data recorded by each channel of the monitoring device. The first row enumerates the measurement channels, and the second row indicates the voltage supply of each circuit, coded as follows: 120 indicates that the circuit is connected to 120 V, 240 indicates that the circuit is connected to 240 V, and 120/240 indicates a circuit with a 240 V connection in which both phases were measured.
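The two-row header described above can be read with Python's standard csv module. The sketch below is a minimal reader under stated assumptions: the timestamp format, the blank first cell of the voltage row, and the use of empty cells for missing values are guesses, not documented properties of the S_<n>.csv files, and the function name read_submeasurement is hypothetical.

```python
import csv
import io
from datetime import datetime

# Voltage-supply codes of the second header row, as described in the text.
VOLTAGE_CODES = {
    "120": "120 V circuit",
    "240": "240 V circuit",
    "120/240": "240 V circuit with both phases measured",
}


def read_submeasurement(stream, time_format="%Y-%m-%d %H:%M:%S"):
    """Parse a sub-measurement file laid out like S_<n>.csv (hypothetical reader).

    Assumed layout (only the two header rows are described in the text):
      row 1:   time column header, then one channel identifier per column
      row 2:   blank first cell, then one voltage-supply code per channel
      rows 3+: timestamp, then one power value per channel (empty cell -> None)
    """
    reader = csv.reader(stream)
    channels = next(reader)[1:]
    voltages = next(reader)[1:]
    times = []
    data = {ch: [] for ch in channels}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        times.append(datetime.strptime(row[0], time_format))
        for i, ch in enumerate(channels):
            # Missing trailing cells are treated like empty cells.
            cell = row[i + 1].strip() if i + 1 < len(row) else ""
            data[ch].append(float(cell) if cell else None)
    return times, dict(zip(channels, voltages)), data


if __name__ == "__main__":
    sample = (
        "Time,1,2,3\n"
        ",120,240,120/240\n"
        "2021-01-01 00:00:00,10.5,,0\n"
        "2021-01-01 00:01:00,11.0,0,1500\n"
    )
    times, volts, data = read_submeasurement(io.StringIO(sample))
    print(len(times), VOLTAGE_CODES[volts["3"]], data["2"])
```

Keeping empty cells as None rather than 0 preserves the distinction the text draws between null values and zero readings in unused channels.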
A separate folder named 0_labels contains the labels for each sub-measurement (i.e., a list of the end-uses connected to the corresponding circuit). These files are named L_<n>.csv, where <n> is the household number; measurement and label files with the same number belong to the same household. The device used to collect the data in REEDD-CR has 14 measurement channels (see Section 2.1). In most cases, empty columns in the label files denote unused channels, which contain null or only zero values in the sub-measurement files. In a limited number of cases, it was not possible to identify the end-uses supplied by some branch circuits. These circuits were monitored but did not record any significant power consumption, which explains the noise (i.e., very low values) in some unlabeled measurements.
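The labeling rules above suggest a simple way to sort channels before analysis. This is a sketch, assuming the labels have already been loaded into a dictionary that maps each channel to its list of end-uses (an empty list standing for an empty label column); the exact layout of L_<n>.csv is not specified here, so the input structure and the function name classify_channels are hypothetical.

```python
def classify_channels(data, labels):
    """Group channels as labeled, unused, or unlabeled (hypothetical helper).

    data:   {channel: [power values or None]}, e.g. from a parsed S_<n>.csv
    labels: {channel: [end-use names]}; an empty list means an empty label column
    """
    groups = {"labeled": [], "unused": [], "unlabeled": []}
    for channel, values in data.items():
        if labels.get(channel):
            groups["labeled"].append(channel)
        elif all(v is None or v == 0 for v in values):
            # Empty label and only null/zero values: an unused input.
            groups["unused"].append(channel)
        else:
            # Empty label but non-zero values: per the text, typically the
            # low-level noise of a circuit whose end-uses could not be identified.
            groups["unlabeled"].append(channel)
    return groups


if __name__ == "__main__":
    data = {"1": [100.0, 120.0], "2": [None, None], "3": [0.0, 0.0], "4": [0.4, 0.0]}
    labels = {"1": ["refrigerator"], "2": [], "3": [], "4": []}
    print(classify_channels(data, labels))
```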
To exemplify the data available in REEDD-CR, Fig. 1 shows the power demand measurements of a typical day in household number 1 of the dataset.

Single end-use measurements
These data contain measurements of individual end-uses. The time series were selected from the sub-measurements with only one end-use or one dominating end-use. The files in this folder (named 1_single_appliance_measurements) are also the raw data used to calculate the load signatures. There is a folder for each end-use: air conditioning (AC, 1 measurement), dryer (D, 12 measurements), lighting (L, 62 measurements), microwave (M, 20 measurements), refrigerator (R, 16 measurements), stove (S, 32 measurements), washing machine (WM, 7 measurements), and water heating (WH, 47 measurements). The files are named <End-use>_<m>, where <End-use> is the initial letter(s) of the end-use and <m> enumerates the end-use sub-measurements. Fig. 2 illustrates the type of single-appliance measurements available in REEDD-CR. Fig. 2a shows the power consumption of water heating systems, which occurs in short periods. In contrast, refrigerators operate throughout the day (Fig. 2b). Stove (Fig. 2c) and lighting (Fig. 2d) usage depends on the time of day, reflecting meal preparation and nightfall, respectively.
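The naming scheme above can be parsed mechanically. The sketch below assumes that a file extension such as .csv may follow <End-use>_<m>, which the text does not state; the function name parse_single_appliance_name is hypothetical. As a consistency check, the per-folder counts listed above add up to 197, the number of load signatures in REEDD-CR.

```python
import re

# Per-end-use measurement counts as listed in the text.
SINGLE_APPLIANCE_COUNTS = {
    "AC": 1, "D": 12, "L": 62, "M": 20, "R": 16, "S": 32, "WM": 7, "WH": 47,
}

END_USE_NAMES = {
    "AC": "air conditioning", "D": "dryer", "L": "lighting", "M": "microwave",
    "R": "refrigerator", "S": "stove", "WM": "washing machine", "WH": "water heating",
}

# <End-use>_<m>, optionally followed by a file extension (assumed, e.g. ".csv").
_NAME_PATTERN = re.compile(r"^(AC|WM|WH|D|L|M|R|S)_(\d+)(?:\.\w+)?$")


def parse_single_appliance_name(filename):
    """Split a name like 'WH_12.csv' into (end-use name, measurement index)."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not an <End-use>_<m> name: {filename!r}")
    return END_USE_NAMES[match.group(1)], int(match.group(2))


if __name__ == "__main__":
    print(parse_single_appliance_name("WH_12.csv"))
    print(sum(SINGLE_APPLIANCE_COUNTS.values()))
```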

Load signatures (2_load_signatures)
The last group of data in REEDD-CR contains the load signatures. For every feature, a device was considered ON whenever its power demand was above 3 W. The features were defined as follows:
• Average power (AveragePower_W): the average power during periods in which the device is active (i.e., ON state), given in watts.
• Use factor (Use_factor): the proportion of the day during which the device is active (i.e., ON). The dataset separates this feature into the day use factor (Day_use_factor, from 6:00 am to 6:00 pm) and the night use factor (Night_use_factor, from 6:00 pm to 6:00 am). The daily use factor is a number between 0 and 1, while the day and night use factors are each a number between 0 and 0.5, because each covers half of the day.
• Time of use (Time_of_use_min): the average time a device stays in use each time it is activated (i.e., the time elapsed between an OFF-to-ON transition and the subsequent ON-to-OFF transition), given in minutes.
Table 1 presents the average values and standard deviation for each feature and device included in REEDD-CR's load signatures.
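The three feature definitions above translate directly into code. The following is a minimal sketch, not the authors' implementation: it assumes a gap-free 1-min series covering whole days, treats 6:00 to 6:00 pm as hours 6 through 17, and skips ON periods cut off by the start or end of the record, all of which are assumptions. Peak power, average daily events, and average daily energy are not defined in this section, so they are left out rather than guessed. The function name load_signature_features is hypothetical; the dictionary keys reuse the column names given in the text.

```python
from datetime import datetime, timedelta

ON_THRESHOLD_W = 3.0  # a device counts as ON when its demand is above 3 W


def load_signature_features(times, power_w):
    """Compute the defined load-signature features from 1-min data (sketch).

    Assumptions: `times` is a gap-free list of datetimes at 1-min spacing that
    covers whole days, and `power_w` holds the matching power values in watts.
    """
    n = len(power_w)
    if n == 0 or n != len(times):
        raise ValueError("times and power_w must be non-empty and equal length")
    on = [p > ON_THRESHOLD_W for p in power_w]

    # Average power: mean demand over ON samples only.
    on_power = [p for p, is_on in zip(power_w, on) if is_on]
    average_power = sum(on_power) / len(on_power) if on_power else 0.0

    # Use factors: ON minutes divided by all minutes in the record, split into
    # day (06:00-18:00) and night (18:00-06:00); each part is at most 0.5.
    day_on = sum(1 for t, is_on in zip(times, on) if is_on and 6 <= t.hour < 18)
    night_on = sum(on) - day_on

    # Time of use: mean length of complete ON periods (OFF->ON ... ON->OFF).
    # Periods cut off by the start or end of the record are skipped.
    durations = []
    run_start = None
    for i in range(1, n):
        if on[i] and not on[i - 1]:
            run_start = i  # OFF -> ON transition
        elif not on[i] and on[i - 1] and run_start is not None:
            durations.append(i - run_start)  # ON -> OFF; one sample = 1 min
            run_start = None
    time_of_use = sum(durations) / len(durations) if durations else 0.0

    return {
        "AveragePower_W": average_power,
        "Use_factor": sum(on) / n,
        "Day_use_factor": day_on / n,
        "Night_use_factor": night_on / n,
        "Time_of_use_min": time_of_use,
    }


if __name__ == "__main__":
    start = datetime(2021, 3, 1)
    times = [start + timedelta(minutes=i) for i in range(1440)]
    power = [0.0] * 1440
    power[420:450] = [100.0] * 30    # 07:00-07:29, daytime use
    power[1140:1150] = [200.0] * 10  # 19:00-19:09, night-time use
    print(load_signature_features(times, power))
```

Dividing the day and night ON minutes by all minutes in the record, rather than by each 12-hour window, is what keeps each of those two factors within 0 to 0.5, as stated above.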

Experimental Design, Materials, and Methods
Before proceeding with the measurements involved in REEDD-CR, the following characteristics were defined as requirements for the dataset:
• To monitor the aggregated profile of every household.
• To include as many end-uses as possible in the monitoring, prioritizing the highest-consuming appliances.
• To ensure clock synchronization among all the end-use sub-measurements and between them and the aggregated profile.
• To include measurements of both weekends and weekdays for every household.
• To cover a wide variety of households, including rural and urban locations, spread across the country.
The following sections describe the methods for obtaining REEDD-CR's sub-measurements and load signatures.

Sub-measurements
The monitoring device used for REEDD-CR was the IoTaWatt. This open-source monitor records electricity demand through 14 inputs connected to current transformers (CTs) that clip around the wires of the circuits of interest [3]. For REEDD-CR, the IoTaWatt was installed in the main circuit panel of each household, so the aggregated and branch circuits could be monitored with a single device. Fig. 3 shows the device. Fig. 4 presents a typical configuration for residential electricity supply in Costa Rica. In each REEDD-CR installation, two CTs were clipped around the two main circuits to monitor the aggregated power demand (nodes A and B in Fig. 4). The remaining 12 CT inputs monitored the branch circuits corresponding to the end-uses of interest. Clock synchronization was a crucial advantage of monitoring from the main circuit panel: because a single device recorded every channel, all measurements were collected simultaneously, avoiding complicated synchronization procedures.
Nevertheless, sub-metering faces several challenges. For instance, a single branch circuit can supply multiple end-uses, so identifying what each branch circuit supplies is crucial. Therefore, the first step of every installation was to label each circuit with its corresponding appliances and CT input.
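Because the two main-circuit CTs and the branch CTs share one clock, the channels can be combined minute by minute. The sketch below is illustrative only: it assumes missing values have already been cleaned out, takes the whole-house demand as the sum of the node A and node B channels, and derives a hypothetical "unmonitored residual" (aggregate minus the monitored branches) that is not a quantity provided in REEDD-CR. All function names are hypothetical.

```python
def aggregate_power(main_a, main_b):
    """Whole-house demand as the per-minute sum of the two main-circuit channels."""
    if len(main_a) != len(main_b):
        raise ValueError("main channels must have the same length")
    return [a + b for a, b in zip(main_a, main_b)]


def energy_kwh(power_w, sample_min=1.0):
    """Energy of a power series in W sampled every `sample_min` minutes, in kWh."""
    return sum(power_w) * sample_min / 60.0 / 1000.0


def unmonitored_residual(main_a, main_b, branches):
    """Per-minute demand not captured by the monitored branch channels.

    Subtracting sample by sample only makes sense because a single device
    timestamps every channel, so index i refers to the same minute everywhere.
    """
    aggregate = aggregate_power(main_a, main_b)
    return [total - sum(branch[i] for branch in branches)
            for i, total in enumerate(aggregate)]


if __name__ == "__main__":
    node_a = [100.0, 200.0, 300.0]
    node_b = [50.0, 50.0, 60.0]
    branches = [[100.0, 100.0, 100.0], [20.0, 100.0, 200.0]]
    print(aggregate_power(node_a, node_b))
    print(unmonitored_residual(node_a, node_b, branches))
    print(energy_kwh(aggregate_power(node_a, node_b)))
```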
Additionally, the available inputs were sometimes insufficient for the number of branch circuits (in this case, in households with more than 12 branch circuits). The following considerations were made to solve this problem:
• Branch circuits that usually have the highest energy consumption were prioritized (i.e., refrigeration, cooking, lighting, and water heating in the case of Costa Rica).
• When possible, two or more branch circuits corresponding to the same end-use (mainly lighting) were monitored with a single CT, i.e., one CT was clipped around more than one insulated wire.
• In most cases, only one CT was used for devices connected to 240 V. Nevertheless, it was critical to check each device's specific supply arrangement before proceeding with the installation. For instance, some stove and oven combos have a 240 V connection, but the stove section and the oven are each connected to 120 V. In these cases, two CTs are needed. Both situations were recorded in the circuit labeling during the monitoring process.
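A rough calculation shows why the stove and oven combo needs two CTs while a true 240 V load does not. This sketch deliberately simplifies: it assumes nominal 120 V legs, unity power factor, and RMS currents, which is not how the IoTaWatt itself computes power, and the function name circuit_power_w is hypothetical. The supply strings reuse the 120, 240, and 120/240 codes of the sub-measurement files.

```python
LEG_VOLTAGE_V = 120.0  # nominal line-to-neutral voltage of each leg


def circuit_power_w(currents_a, supply):
    """Rough power of one branch circuit from its CT current reading(s).

    Simplifying assumptions (not the IoTaWatt's method): nominal voltages,
    unity power factor, and currents given as RMS amperes.
      "120":     one CT on a 120 V circuit
      "240":     one CT on a 240 V load that draws the same current on both legs
      "120/240": two CTs, one per leg, e.g. a stove/oven combo whose sections
                 are each fed at 120 V
    """
    if supply == "120":
        (current,) = currents_a
        return LEG_VOLTAGE_V * current
    if supply == "240":
        (current,) = currents_a
        return 2 * LEG_VOLTAGE_V * current
    if supply == "120/240":
        leg_a, leg_b = currents_a
        return LEG_VOLTAGE_V * (leg_a + leg_b)
    raise ValueError(f"unknown supply code: {supply!r}")


if __name__ == "__main__":
    # Stove section ON (10 A on leg A), oven OFF (0 A on leg B).
    print(circuit_power_w([10.0, 0.0], "120/240"))  # both legs measured
    print(circuit_power_w([0.0], "240"))            # a single CT on leg B misses the stove
    # Two lighting circuits on the same leg, through one CT in the same
    # direction: the CT reads the sum of their currents.
    print(circuit_power_w([1.5 + 2.0], "120"))
```

Treating the combo as a plain 240 V load with one CT either misses one section entirely or double-counts the other, which is why both legs are measured in that case.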
The monitoring process lasted at least one entire week in each home, so both weekdays and weekends were included; however, the starting day of the week varied. Both urban and rural homes were part of the study. These choices aimed to diversify the dataset. Each measurement has a 1-minute sample time.
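Since the starting day varied between homes, a coverage check should not assume that a record begins on a particular weekday. The sketch below, assuming sorted timestamps at a 1-min sample time, verifies that a record spans at least seven days and contains both weekday and weekend samples; the function name covers_full_week is hypothetical.

```python
from datetime import datetime, timedelta


def covers_full_week(times, sample=timedelta(minutes=1)):
    """True if sorted `times` span at least 7 days and include weekdays and weekends."""
    if not times:
        return False
    # The last sample represents one more sampling interval of coverage.
    span = times[-1] - times[0] + sample
    days = {t.weekday() for t in times}  # Monday=0 ... Sunday=6
    has_weekend = bool(days & {5, 6})
    has_weekday = bool(days - {5, 6})
    return span >= timedelta(days=7) and has_weekend and has_weekday


if __name__ == "__main__":
    start = datetime(2021, 3, 3)  # a Wednesday: the starting day need not be Monday
    week = [start + timedelta(minutes=i) for i in range(7 * 1440)]
    print(covers_full_week(week), covers_full_week(week[:-1]))
```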

Load signatures
The load signatures of eight different appliances are represented by seven relevant features extracted from the sub-measurements. The selected features capture both the devices' power demand and their time of use. This provides a comprehensive load signature for each device and helps differentiate devices with similar power demands. The load signatures were calculated only from sub-measurements that monitored a single end-use or in which the end-use of interest was easy to identify and extract (i.e., sub-measurements with multiple end-uses were discarded in this calculation).
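The first part of that selection rule, keeping circuits that supply exactly one end-use, can be automated from the channel labels. The sketch below assumes labels organized as {household: {channel: [end-uses]}}, a hypothetical structure; the second part of the rule (sub-measurements where one end-use clearly dominates) relied on judgment in REEDD-CR and is not reproduced here. The function name single_end_use_candidates is hypothetical.

```python
from collections import defaultdict


def single_end_use_candidates(labels_by_household):
    """Group (household, channel) pairs whose circuit supplies exactly one end-use."""
    candidates = defaultdict(list)
    for household, channels in sorted(labels_by_household.items()):
        for channel, uses in channels.items():
            distinct = set(uses)  # repeated labels of the same end-use count once
            if len(distinct) == 1:
                candidates[distinct.pop()].append((household, channel))
    return dict(candidates)


if __name__ == "__main__":
    labels = {
        1: {"3": ["refrigerator"], "4": ["lighting", "microwave"], "5": []},
        2: {"2": ["lighting", "lighting"], "7": ["refrigerator"]},
    }
    print(single_end_use_candidates(labels))
```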